Partition Tolerance

System continues to operate despite network failures

“The system continues to operate despite an arbitrary number of messages being dropped or delayed by the network between nodes.” – Eric Brewer

When to use

Always. In a distributed system, partitions will happen. You cannot choose to not have P.

Why it matters

  • Reality: Networks are unreliable. Cables get cut. Switches fail.
  • Decision: Since P is a given, you must choose between C (Consistency) and A (Availability) when P happens.

Signs of Violation

  • (You can’t really violate P in a distributed system, you just fail miserably if you don’t account for it).
  • A single network glitch causes the entire cluster to corrupt data or crash.

Explanation

Problem

Two nodes, A and B, lose contact. A user writes to A. Does A accept it?

  • If Yes: You chose Availability. B doesn’t know about the write. (Data diverges).
  • If No: You chose Consistency. A refuses the write until it can talk to B. (System down).

Solution

Design your system acknowledging that the network is not reliable.

Real world analogy

Two generals trying to coordinate an attack via messengers. If the messenger gets captured (Partition), they can’t agree on the time. They either attack alone (Available, but risky/inconsistent) or wait indefinitely (Consistent, but doing nothing).

Pros and Cons

Pros Cons
  • Resilience
  • Complexity in handling the split
  • Comparison

    • Fallacies of Distributed Computing: “The network is reliable” is the #1 fallacy.

    Code example

    Typescript

    Good (Adherence)

    PHP

    Bad (Violation)

    Good (Adherence)