The network is faster than the disk

by simbo1905

In distributed systems, contrary to popular belief, the local disk may not always be faster than the network. While we do use SSD/Flash drives that are considered faster, networks can also offer high speeds and low latency. It is important to carefully test the performance characteristics of both local disk and network resources when designing and deploying distributed systems. I recently came across two articles that tested latencies on AWS.

The article Log sync latency explained checked on an EC2 system the latency if fsync. They found that it as 3ms for 256 kb. It took 20ms to sync a 1.6M WAL log.

The article Measuring Latencies Between AWS Availability Zones tested across availability zone ping times across all regions. It found sub-ms latency was normal. Only some remote and slowest regions has 2.5ms latency.

This suggests that we really do not want to use consensus algorithms that force the disk like Raft or Paxos if we can use a consensus algorithm like Viewstamped Replication (VSP).

There is an amazing paper review Paper #74. Viewstamped Replication Revisited by one of the TigerBeetle team that explains VSP for a high-performance database.

Given that the modern paper on VSP was published in 2012, and referenced in the Raft paper, it would seem that it was the state-of-the-art for more than a decade now. It was great to learn that TigerBeetle DB is putting it into production.