Corfu: Linearizable Consistency (Part 2)

by simbo1905

This is the second post in a series about Corfu. In this post I will outline the basic technique it uses to achieve linearizable consistency. We will then look at a simple adaptation that can later be used to scale the system, which introduces an additional challenge to solve.

Consider multiple processes implementing a key-value store. Let’s imagine that each process is connected to a shared global disk and that the disk implements an atomic append-only file API. Given these circumstances, achieving linearizable consistency is very easy. Every process appends its reads and writes to the end of a shared global log file, and each process then simply follows that file. All processes see all writes linearly sequenced in a single ordering. To get strongly consistent reads we can simply sequence read commands into the log, each executed only by the process that issued it. This is basically a home run when it comes to strong consistency.
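A minimal sketch of that idea, assuming an in-memory list stands in for the shared disk's atomic append-only file (the class and method names here are hypothetical, not from Corfu):

```python
from dataclasses import dataclass, field


@dataclass
class SharedLog:
    """Stand-in for the shared global disk's atomic append-only file API."""
    entries: list = field(default_factory=list)

    def append(self, entry) -> int:
        # The real disk would serialise concurrent appends into one total
        # order; a single in-process list models that here.
        self.entries.append(entry)
        return len(self.entries) - 1


class KVProcess:
    """One key-value store process that follows the shared log."""
    def __init__(self, log: SharedLog):
        self.log = log
        self.kv = {}
        self.applied = 0  # next log position this process will apply

    def put(self, key, value):
        self.log.append(("put", key, value))

    def get(self, key):
        # Sequence the read through the log so it is linearized against
        # every write that precedes it; only this process executes it.
        pos = self.log.append(("get", key))
        self._catch_up(pos)
        return self.kv.get(key)

    def _catch_up(self, upto):
        while self.applied <= upto:
            op = self.log.entries[self.applied]
            if op[0] == "put":
                _, k, v = op
                self.kv[k] = v
            self.applied += 1
```

Because every process replays the same totally ordered log, a `get` issued by one process observes every `put` that any process appended before it.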

Note that this basic approach achieves strong consistency without any complex logic. It also doesn’t need a leader process. The only thing we need is an expensive and fast network drive. All processes can directly use a simple append-only file API in an uncoordinated manner. The shared global disk simply needs to serialise all operations and append them to the file. The problem with this simple approach is that the global shared disk is a single IO bottleneck. From a scalability perspective this isn’t going to fly.

In order to scale this approach and move to commodity server hardware we first need to make a small modification. Rather than having an append-only file API, we need to switch to writing to a specific location just beyond the current end of the shared log file. Why? Well, in a later post we will stripe the logical file across the local disks of many independent commodity servers. An appending API would require us to collect and order writes within a global buffer spanning many machines. That would introduce network IO and a single buffer bottleneck that we want to avoid.

Instead, imagine that each process is reading the shared file sequentially up to the last write it can see. When a process wants to write to the end of the log it sends a write command aimed at the next position beyond the latest value it has read. In this scenario we need to prevent two processes writing to the same position. This isn’t hard. We simply accept the first write that targets a given position in the file and reject any subsequent writes to that position. We only need one bit of additional storage per position to keep track of whether it has been written to.
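Write-once position semantics can be sketched in a few lines. This is a toy model, not Corfu's storage format; the names `WriteOncePosition` and `PositionedLog` are mine:

```python
class WriteOncePosition:
    """One slot of the shared file plus the single extra bit."""
    def __init__(self):
        self.written = False  # the one bit of additional storage
        self.value = None

    def write(self, value) -> bool:
        if self.written:
            return False      # reject: the first writer already won
        self.written = True
        self.value = value
        return True


class PositionedLog:
    """Shared file addressed by position; every position is write-once."""
    def __init__(self):
        self.positions = {}

    def write(self, pos: int, value) -> bool:
        slot = self.positions.setdefault(pos, WriteOncePosition())
        return slot.write(value)

    def read(self, pos: int):
        slot = self.positions.get(pos)
        return slot.value if slot is not None and slot.written else None
```

The boolean returned by `write` is all a client needs: `True` means its value now occupies that position forever, `False` means some other process got there first.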

The challenge with what has been described so far is contention. A client whose write is rejected can simply reread until it finds the new end of the file, then retry its pending write at the next higher position. If the system has very few writes this might be acceptable. In a high-performance system with plenty of concurrent writes, however, the performance would be terrible. In the next post we will introduce a simple technique which Corfu uses to solve this problem.
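That naive reread-and-retry loop looks something like the following sketch (a toy write-once log is inlined so the example is self-contained; all names are hypothetical):

```python
class WriteOnceLog:
    """Toy write-once log: each position accepts only the first write."""
    def __init__(self):
        self.slots = {}

    def write(self, pos: int, value) -> bool:
        if pos in self.slots:
            return False
        self.slots[pos] = value
        return True

    def read(self, pos: int):
        return self.slots.get(pos)


def append_with_retry(log: WriteOnceLog, value, pos: int = 0) -> int:
    """Naive client append: reread to find the tail, retry on rejection."""
    while True:
        # Reread past the writes we can now see to find the end of the file.
        while log.read(pos) is not None:
            pos += 1
        if log.write(pos, value):
            return pos  # our write landed first at this position
        pos += 1        # another writer beat us; try the next slot
```

Every rejected write costs at least one more round of rereading, so under heavy concurrency clients spend most of their time racing for the tail rather than writing.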