slash dev slash null

simbo1905’s ramblings about computers

Tag: Consensus

The network is faster than the disk

In distributed systems, contrary to popular belief, the local disk may not always be faster than the network. While we do use SSD/Flash drives that are considered faster, networks can also offer high speeds and low latency. It is important to carefully test the performance characteristics of both local disk and network resources when designing and deploying distributed systems. I recently came across two articles that tested latencies on AWS.

Read the rest of this entry »

One More Frown Please? (UPaxos Quorum Overlaps)

There was some discussion around UPaxos safety on a gist where Dave Turner was kind enough to clarify a confusion of mine. I had said that we needed an overlap between prepare quorums to avoid a split-brain. This was incorrect and I am greatful for Dave for correcting my misunderstanding. Yet there was something about not having that overlap that was bugging me… This morning I had an “Aha!” moment: if it exists then Trex will perform an optimisation. Yet in Trex this overlap is not enforced.

Read the rest of this entry »

Corfu: The Protocol

This is the eighth and last in a series of posts about Corfu a strongly consistent and scalable distributed log. To quote the paper:

At its base, CORFU derives its success by scaling Paxos to data center settings and providing a durable engine for totally ordering client requests at tens of Gigabits per second throughput.

It does this through a number of existing techniques:

Our design borrows bits and pieces from various existing methods, yet we invented a new solution in order to scale SMR [state machine replication] to larger cluster scales and deliver aggregate cluster throughput to hundreds of clients.

What I really like about Corfu is that it innovates by bringing existing techniques such as Paxos together into something which is a real breakthrough. The protocol itself is very simple. The established techniques it uses to implement the protocol are very simple. Yet when assembled into the solution they become extremely powerful. This post will put the pieces together and discuss the Corfu protocol.

Read the rest of this entry »

Corfu: Safety Techniques

This the sixth post in a series about Corfu. It will cover some of the basic safety techniques which I call “Write Once”, “Write-Ordering” and “Collaborating Threads”. What is remarkable about Corfu is how it uses such well-established techniques to build something extremely scalable. Read the rest of this entry »

Corfu: Copy Compaction

This the fifth post in a series about Corfu. In the last post we discussed striping a log across many mirrored sets of disks for IO concurrency. In this post we will discuss deletions from the global log and copy collection techniques to free the disk space:

CORFU makes it easy for developers to build applications over the shared log without worrying about garbage collection strategies

Copy collection will also allow us to compact the log even when we don’t use deletes which the paper doesn’t mention. Why do we need compaction under write-only usage? Writes are quantized into pages written to directly by clients in an uncoordinated manner. Under realistic workloads many writes may be much smaller than the page size. Without copy compaction Corfu would squander disk space.

Read the rest of this entry »

Corfu: Stripe & Mirror

This the fourth post in a series about Corfu. The last post introduced a global sequencer to reduce contention on writes to the end of a log file. The idea is to be able to distribute the log file across many machines. In this post we will scale up or fictional example using by striping and mirroring to achieve high performance:

[…] we can append data to the log at the aggregate bandwidth of the cluster […] Moreover, we can support reads at the aggregate cluster bandwidth. Essentially, CORFU’s design decouples ordering from I/O, extracting parallelism from the cluster for all IO while providing single-copy semantics for the shared log.

Read the rest of this entry »

 Corfu: Global Sequencer

This is the third post in a series about Corfu. In the last post we outlined how to linearizable consistency without a global buffer. This introduced a write contention challenge. In this post we will get into Corfu's simple fix for this issue. Corfu uses a global ticket service which acts as a sequencer:

Read the rest of this entry »

Corfu: Linearizable Consistency

This the second post in a series about Corfu. To quote the paper

The setting for CORFU is a data center with a large number of application servers (which we call clients) and a cluster of storage units. Our goal is to provide applications running on the clients with a shared log abstraction implemented over the storage cluster.

Why is a shared log abstraction exposed to clients useful? This post will discuss how a shared log can be used to achieve linearizable consistency. To quote the paper:

The semantics provided for read/write operations is linearizability [Herlihy and Wing 1990], which for a log means that once a log entry is fully written by a client or has been read, any future read will see the same content (unless it is reclaimed via a [delete] operation).

We will then look at a simple adaption that we will later see can be used to scale the system which introduces an additional challenge to solve.

Read the rest of this entry »

Corfu: Scaling log replication

Corfu: A distributed shared log is a paper running to 25 pages brought to my attention by The Morning Paper. What I really like about Corfu is how it borrows bits and pieces of established techniques and puts them together to get something which is a real breakthrough. Corfu achieves linearizable consistency of a fault tolerant distributed log using very simple write-once techniques. The real eye open is that it implements a distributed log interface without an IO bottleneck giving extreme performance. To quote the papers conclusion:

Despite almost forty years of research into replicated storage schemes, the only approach so far to scale up capacity and throughput has been to shard data and trade consistency for performance. In this article, we presented the CORFU system, which breaks this seeming tradeoff by organizing a cluster of drives as a single, shared log. CORFU offers single-copy semantics at cluster-scale speeds, providing a scalable source of atomicity and durability for distributed systems. CORFU’s novel client-centric design eliminates any single I/O bottleneck between numerous clients and the cluster, allowing data to stream to and from the cluster in parallel.

 

This series of posts will build up a simple fictional example that demonstrates the key techniques. Then I will glue it all together to describe how Corfu itself works and demystify the role Paxos plays.

Read the rest of this entry »

UPaxos: Unbounded Paxos Reconfigurations

The year 2016 turned out to be a bumper year for pragmatic Paxos discoveries. Hot on the heels of the FPaxos discovery of more flexible quorums comes Unbounded Pipelining in Dynamically Reconfigurable Paxos Clusters or “UPaxos”. This uses overlapping quorums between consecutive cluster configurations, and a leader “casting vote”, to enable cluster reconfigurations in a non-stop manner even when reconfiguration messages are lost. Read the rest of this entry »