20/05/2018

Algorithms Behind Modern Storage Systems

Developing storage systems always presents the same challenges and factors to consider. Deciding what to optimize for has a substantial influence on the result. You can spend more time during write in order to lay out structures for more efficient reads, reserve extra space for in-place updates, facilitate faster writes, and buffer data in memory to ensure sequential write operations. It is impossible, however, to do this all at once. An ideal storage system would have the lowest read cost, lowest write cost, and no overhead. In practice, data structures compromise among multiple factors. Understanding these compromises is important.

Researchers from Harvard’s DASlab (Data System Laboratory) summarized the three key parameters database systems are optimized for: read overhead, update overhead, and memory overhead, or RUM. Understanding which of these parameters are most important for your use-case influences the choice of data structures, access methods, and even suitability for certain workloads, as the algorithms are tailored having a specific use-case in mind.

The RUM Conjecture states that setting an upper bound for two of the mentioned overheads also sets a lower bound for the third one. For example, B-trees are read-optimized at the cost of write overhead as well as having to reserve empty space for the (thereby resulting in memory overhead). LSM-trees have less space overhead at a cost of read overhead brought on by having to access multiple disk-resident tables during the read. These three parameters form a competing triangle, and improvement on one side may imply compromise on the other.

Microservices in Adopt?

You’re probably now wondering why or if we even still do recommend a microservices architecture. The flexibility, the independent scalability, the evolutionary characteristics, the strong encapsulation are still very real benefits to a microservices approach. Those benefits are well-articulated elsewhere. We are still firmly committed to using microservice architectures, extending our understanding of those architectures, and continuing to explore tools and approaches that address the issues articulated here. […] However, the costs and downsides of the approach and the level of organizational maturity needed to execute on the approach are the reason that microservices will quite likely never move into Adopt.