Index | About | Me | Jump to Menu Section

Designing data intensive applications

Data Structures

SSTable

Sorted Strings Table (SSTable) is a persistent file format used by ScyllaDB, Apache Cassandra, and other NoSQL databases to take the in-memory data stored in memtables, order it for fast access, and store it on disk in a persistent, ordered, immutable set of files. Immutable means SSTables are never modified. They are later merged into new SSTables or deleted as data is updated. - https://www.scylladb.com/glossary/sstable/

For Things like Primary Key

LSM tree

In computer science, the log-structured merge-tree (also known as LSM tree, or LSMT) is a data structure with performance characteristics that make it attractive for providing indexed access to files with high insert volume, such as transactional log data. LSM trees, like other search trees, maintain key-value pairs. LSM trees maintain data in two or more separate structures, each of which is optimized for its respective underlying storage medium; data is synchronized between the two structures efficiently, in batches.

B-tree

In computer science, a B-tree is a self-balancing tree data structure that maintains sorted data and allows searches, sequential access, insertions, and deletions in logarithmic time. The B-tree generalizes the binary search tree, allowing for nodes with more than two children. Unlike other self-balancing binary search trees, the B-tree is well suited for storage systems that read and write relatively large blocks of data, such as databases and file systems. - https://en.wikipedia.org/wiki/B-tree

B-tree vs LSM-tree

Concepts

Indexes

Terms

Back Compatibility

newer code can read data that was written by older code

Forward compatibility

older code can read data that was written by newer code

Monotonic reads

Monotonic reads ensures that if a process performs read r1, then r2, then r2 cannot observe a state prior to the writes which were reflected in r1; intuitively, reads cannot go backwards. Monotonic reads does not apply to operations performed by different processes, only reads by the same process.

Leaderless replication

Sloppy Quorum

Consider a scenario where we have to assigned N nodes for a particular key. These N nodes are spread across two clusters and none of these clusters have more than W - 1 nodes for this key. Whenever we receive a request from the client to update the key, we send it to these N nodes and if we receive a confirmation from W of these N nodes, we consider the update to be successful. Now if we encounter a network failure between these two node clusters then we won’t be able to reach a quorum for all the upcoming write requests as we are unable to communicate across all N nodes and gather at least W votes to get a quorum.

To handle the above issue we have a modified version of quorum called as Sloppy Quorum. Under sloppy quorum, when we are unable to reach all the N nodes due to a partition failure, we temporarily store the updates on backup nodes. These backup nodes were not initially responsible for storing updates for the required key but they store the updates only in case of partition failure. The updates are stored along with the metadata which describes the original node that the key was required to be stored on.

In simple words:

A real-life example of sloppy quorum & hinted handoff will be if a coworker takes a message on your behalf once you are out on a break and shares the message with you once you are back. If you didn’t had such an understanding coworker with you then you might have missed the message completely. You will have to figure out on your own about what messages you missed and will be scared to leave your desk in future for a break. - https://distributed-computing-musings.com/2022/05/sloppy-quorum-and-hinted-handoff-quorum-in-the-times-of-failure/

Write-a-head logging

In computer science, write-ahead logging (WAL) is a family of techniques for providing atomicity and durability (two of the ACID properties) in database systems. A write ahead log is an append-only auxiliary disk-resident structure used for crash and transaction recovery. The changes are first recorded in the log, which must be written to stable storage, before the changes are written to the database. - https://en.wikipedia.org/wiki/Write-ahead_logging

Log Structure Storage