slash dev slash null

stuff about puters

Category: Trex

TRex UPaxos Experimental Code

There is now a sketch (hack!) demoing the happy path flow of a UPaxos reconfiguration written up on the TRex wiki.

Paxos Voting Weights

The last UPaxos post took a run through how the UPaxos paper avoids reconfiguration stalls. In this post, we will take a quick look at how the paper describes hot swapping of nodes using voting weights.

UPaxos: Unbounded Paxos Reconfigurations

The year 2016 turned out to be a bumper year for pragmatic Paxos discoveries. Hot on the heels of the FPaxos discovery of more flexible quorums comes Unbounded Pipelining in Dynamically Reconfigurable Paxos Clusters or “UPaxos”. This uses overlapping quorums between consecutive cluster configurations, and a leader “casting vote”, to enable cluster reconfigurations in a non-stop manner even when reconfiguration messages are lost.

Paxos Reconfiguration Stalls

In a previous post I covered using the Paxos engine itself to do cluster reconfiguration as per the 2001 Paxos Made Simple paper. In this post I will cover a problem with that technique known as pipeline stalls. This post is to set the scene for a new technical report published in 2016 which fixes the problem with a state-of-the-art Paxos implementation called UPaxos.

Just say NO to custom hardware for Paxos 

Today’s Morning Paper on “Just say NO to Paxos overhead: replacing consensus with network ordering” was a thrilling disappointment. It’s always with both excitement and trepidation that I read about new developments in distributed consensus; is today the day that I learn that Paxos is obsolete? The NOPaxos paper (“network ordered Paxos”) reviewed at the link above has a title which suggests a breakthrough. Unfortunately, it has far less general applicability, and far higher economic cost to implement, than “vanilla” Paxos.

Failures In Distributed Databases

The “Morning Paper” blog has some fascinating insights into critical production failures in distributed data-intensive databases such as Cassandra and HBase. The study it covers reveals that simple testing can prevent most critical failures. In this blog post we take a quick look to see what an embeddable Paxos algorithm project such as Trex can learn from this study.

The FPaxos “Even Nodes” Optimisation 

Up until 2016, it was well understood that the optimal size for typical Paxos clusters is three or five nodes. With typical replication workloads, it was known that four- or six-node clusters are no better, and in fact worse, than having one fewer node. The FPaxos “Flexible Paxos” paper changes the rules of the consensus game with the “even nodes” optimisation.
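As a rough illustration of why even-sized clusters become interesting, here is a minimal sketch of the quorum arithmetic. It assumes the FPaxos rule that only the leader election (phase 1) quorum and the replication (phase 2) quorum need to overlap, so their sizes must add up to more than the cluster size. The function names are hypothetical and are not taken from TRex.

```python
def majority(n: int) -> int:
    """Classic Paxos: both phases need a simple majority."""
    return n // 2 + 1


def even_nodes_quorums(n: int) -> tuple[int, int]:
    """Return (phase1, phase2) quorum sizes for a cluster of n nodes.

    Assumption: the only safety requirement is phase1 + phase2 > n,
    so an even-sized cluster can replicate with exactly half the nodes
    as long as leader election uses a correspondingly larger quorum.
    """
    phase2 = n // 2 if n % 2 == 0 else majority(n)
    phase1 = n - phase2 + 1  # smallest size that still overlaps phase2
    return phase1, phase2


if __name__ == "__main__":
    for n in (3, 4, 5, 6):
        p1, p2 = even_nodes_quorums(n)
        print(f"n={n}: classic majority={majority(n)}, phase1={p1}, phase2={p2}")
```

Under these assumptions a four-node cluster replicates with two acknowledgements, the same as a three-node cluster, rather than the three that a simple majority would require. The price is a larger quorum of three for the rarer leader election.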

Trex now supports Flexible Paxos (FPaxos) Strategies

2016 has turned out to be a great year for Paxos with the discovery of a more flexible Paxos known as FPaxos. Only a simple change was required to the TRex Paxos JVM library to support this new discovery. You can now supply a pluggable QuorumStrategy which can do things like “grid consensus”. The code on the master branch now has a default strategy which does the “even nodes” optimisation. Enjoy!
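To give a feel for what a grid-style strategy might look like, here is a small sketch in Python. It is not the TRex API: the class and method names are hypothetical, and it assumes a grid scheme in which leader election needs every node in some column while replication needs every node in some row. Any row and any column share exactly one node, which gives the overlap that FPaxos requires.

```python
class GridQuorum:
    """Hypothetical grid quorum check; nodes are numbered row by row."""

    def __init__(self, rows: int, cols: int) -> None:
        self.rows = rows
        self.cols = cols

    def node(self, r: int, c: int) -> int:
        # Node id of the cell at row r, column c.
        return r * self.cols + c

    def phase1_ok(self, votes: set[int]) -> bool:
        # Leader election: some column must have voted in full.
        return any(
            all(self.node(r, c) in votes for r in range(self.rows))
            for c in range(self.cols)
        )

    def phase2_ok(self, votes: set[int]) -> bool:
        # Replication: some row must have voted in full.
        return any(
            all(self.node(r, c) in votes for c in range(self.cols))
            for r in range(self.rows)
        )


if __name__ == "__main__":
    grid = GridQuorum(rows=2, cols=3)  # nodes 0..5
    print(grid.phase2_ok({0, 1, 2}))   # a full row
    print(grid.phase1_ok({0, 3}))      # a full column
```

With a 2 by 3 grid, replication needs three acknowledgements from one row and leader election needs only the two nodes of one column. Which phase uses rows and which uses columns is a choice made here for illustration.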

Bolt-on Causal Consistency

Eventual consistency forces complexity onto an application. Consider a comments system on a blog site where users are discussing with each other. What every user would like to see is “causal consistency” whereby they don’t see a comment until they can also see all the comments that it is “in reply-to”.

In the general case an eventually consistent data store (ECDS) like Cassandra won’t give causal consistency: you can see a comment before you see what the user is replying to. The Morning Paper has an excellent discussion of a paper that shows how 2k lines of code can layer causal consistency over the top of Cassandra using a separate local data store at each node and vector clocks to track ordering.

the morning paper

Bolt-on Causal Consistency – Bailis et al. 2013

“It’ll probably be OK” seems to reflect the prevailing application developer’s attitude to working with eventually consistent stores. Thanks to the work of Bailis et al. on PBS, we can now quantify that ‘probably.’ And it looks pretty good at first glance, with 99+% probabilities achievable after a pretty short window. The greater the volume of transactions you’re processing though, the more this bites you: (a) the window between transactions is shorter, increasing the probability of staleness, and (b) you’re taking a percentage of a bigger absolute number. Let’s say 1M transactions per day, and 99.99% probability of recency under normal conditions – that’s 100 stale reads a day. Is that ok? It depends on your application semantics of course. After a year of operation you’ll have 36,500 stale reads – it’ll probably be ok?!

Presumably you’re using an eventually consistent…

Understanding Paxos

The excellent paper Understanding Paxos has a very detailed explanation of the mechanics of the algorithm. Here is a diagram from the paper showing a node which bridges a network partition. The paper then works through the possible outcomes with diagrams showing every message. Excellent work!

Source: Understanding Paxos