Partition Tolerance
System continues to operate despite network failures
“The system continues to operate despite an arbitrary number of messages being dropped or delayed by the network between nodes.” – Eric Brewer
When to use
Always. In a distributed system, partitions will happen. You cannot choose to not have P.
Why it matters
- Reality: Networks are unreliable. Cables get cut. Switches fail.
- Decision: Since P is a given, you must choose between C (Consistency) and A (Availability) when P happens.
Signs of Violation
- (You can’t really violate P in a distributed system, you just fail miserably if you don’t account for it).
- A single network glitch causes the entire cluster to corrupt data or crash.
Explanation
Problem
Two nodes, A and B, lose contact. A user writes to A. Does A accept it?
- If Yes: You chose Availability. B doesn’t know about the write. (Data diverges).
- If No: You chose Consistency. A refuses the write until it can talk to B. (System down).
Solution
Design your system acknowledging that the network is not reliable.
Real world analogy
Two generals trying to coordinate an attack via messengers. If the messenger gets captured (Partition), they can’t agree on the time. They either attack alone (Available, but risky/inconsistent) or wait indefinitely (Consistent, but doing nothing).
Pros and Cons
| Pros | Cons |
|---|---|
Comparison
- Fallacies of Distributed Computing: “The network is reliable” is the #1 fallacy.
Code example
Typescript
Good (Adherence)
PHP
Bad (Violation)
Good (Adherence)