Dealing with failure

FLP proof from 1985 shows that it is impossible for a faulty process in an async model to check if a process has failed or whether it is just slow. In the presence of unreliable communication, we cannot achieve consensus since we may never be able to communicate with a process.

Two army problem
- Two armies, A and B want to attack army C.
- A sends message: “lets attack at dawn.”
- B replies with OK. The message arrives at A, but A realizes B does not know whether their message OK reached A.
- B is not confident that they should attack since they don’t know if their message was received by A.
- Even if A sent the message, “received B’s confirmation.”, A cannot be sure it will arrive.

This problem demonstrates that even with non-faulty processors, agreement is not possible with unreliable communication channels.

This is why use upper bounds on communications to consider a process faulty if it does not respond within that bounded time.

A fail-stop is the condition where a failed process does not communicate. A byzantine fault is where the faulty process continues to communicate but may produce faulty information.

To make a consensus algorithm work against fail-stops, we will need an algorithm that allows the same value to be decided by all processes even if all processes receives at most (n-t) answers. Where there are n processes and of which t are faulty.

A fail-stop resilient algorithm:
- A process broadcasts its preferred value and the number of processes it has seen that also have that value (this count is called cardinality and initiated at 1)
- The process receives (n-t) answers, each containing a preferred value and cardinality.
- The process can change its preferred value to the one preferred by other processes; it updates the corresponding cardinality to the # responses it received with that value + itself.
- Continue this process until a process receives t messages of a single value with cardinality of n/2. This means half the system has agreed on the same value.
- As the number of phases approaches infinity, the probability of consensus not being reached approaches 0.

We can make it easier to develop algorithms by relaxing the definition of asynchrony and allowing partial synchrony. Different types of asynchrony exist in a system:
- Process asynchrony: a process may sleep or be suspended for an arbitrary amount of time.
- Communication asynchrony: no upper bound on the time a message takes to reach its destination.
- Message order asynchrony: messages may be delivered in a different order than sent.

It has been shown it is not sufficient to make processes synchronous but that any of the following cases is sufficient to make a consensus protocol possible:
- put an upper bound on sleep time and message transmission (process + comms synchrony)
- put an upper bound on process sleep time and ordering (process + message order synchrony)
- use message order synchrony and broadcast capability.
- use comms synchrony, broadcast capability, and send/receive atomicity.


Before we understand consensus, which is multiple nodes agreeing on something, we need to understand consistency.

Consistency in data systems is at least eventual - meaning if you read data that is inconsistent, at least it will eventually all be consistent across all nodes. Another way to say this is convergence.
Inconsistency example - we submit a write, want to read what we just submitted, and it’s not there.

Another word for linearizability is atomic consistency.
- If databases provide the illusion to each node that there is only one replica, then replication lag wouldn’t be an issue.
- Replication lag = when I query two different replicas, I get two different results.
- If there is only one replica, all operations on it are atomic. This makes it simpler for apps, which won’t have to worry about the replicas.
- Linearizability is a recency guarantee.

Network interruptions force a choice between availability and linearizability
- If they both stay up, the data won’t be consistent
- To stay consistent, one system has to go down.
- Because they can’t connect to each other.
- If it’s a single-leader database, one of the data centres contains the leaders, and data processed in centres without it will have write errors since they can’t contact the leader node.
- If it’s a multi leader database, both can stay up since data is copied asynchronously, the writes are queued and will be exchanged when network connection is restored.

If you must have linearizability, the tradeoff is availability; one of the replicas during network problems will go down. If the system can handle network problems without going down, its behaviour is not linearizable since the replicas can process requests independently. Meaning not providing the illusion of one copy of data and all atomic operations.
- CAP theorem is this insight: Consistency, Availability, Partition. They’re either consistent or available when partitioned but not both.

Every CPU core has its own memory cache and store buffer - this means they are not linearizable. The reason for dropping linearizability is for performance, not fault-tolerance. In networks of highly variable delays, read and write response times can be high. Weaker consistency models are much faster, so the trade-off is important for latency-sensitive systems.

In other words:

If latency and/or availability is the priority; weaker consistency.
If consistency is the priority; performance and availability may suffer.