Corfu: Stripe & Mirror (Part 4)
The last post introduced a global sequencer to reduce contention on writes to the end of a log file. The idea is to be able to distribute the log file across many machines. In this post we will scale up or fictional example using by striping and mirroring data much like a RAID 10 disk configuration.
Lets image that we are still using expensive network disks but that we can afford four of them and each one can support four servers. We can attach four servers with full interconnections to four disks. This allows to use RAID 10 write semantics for both performance and resilience:
RAID 10 is a stripe across a number of mirrored sets. [HDD Tool]
Each server can run a key-value store process which simply applies `mod(2)` to each position to know which pair of mirrored disks to write to. Data is written to two disks to protect against disk crashes. All we need to ensure is that every process attempting to write to a given position writes to the same two disks in the same order Why? If they don’t then two processes can both successfully write a different value to the same position on the two mirrored disks. That would be a data corruption and a violation of our write-once safety semantics.
A really important thing we gain with a “RAID 10”-like stripe and mirror approach is that writes at adjacent positions are mapped onto a different pair or mirrored disks. Two clients with adjacent tickets taken from the global sequencer can concurrently write without contention. Happy days!
Why stop there? Our fictional solution is limited by our expense disks. If we can use commodity servers with local disks as storage nodes then instead of two replica sets of two disks we can use, say, eight replica sets of two disks. Indeed the Corfu paper says that they scaled to sixteen storage nodes. This is because Corfu can use commodity hardware for storage nodes exposing a simple log API over the network. The reference implementation for Corfu is a Java project on Github. This means that the number of mirrored disk pairs that we can concurrently write strips of the distributed log into is arbitrary.