# Fault Tolerance
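- Reliability (probability of operating without failure over a given interval) and availability (fraction of time the system is operational)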
- Metrics
    - MTTF = Mean time to failure
    - MTBF = Mean time between failures (for a repairable system, MTBF = MTTF + MTTR)
    - MTTR = Mean time to recovery
- Types of faults
    - Transient vs persistent
    - Malicious vs benign
    - Fail-stop (the component halts); stuck-at-1 or stuck-at-0
    - Byzantine faults: arbitrary, possibly inconsistent behavior
- Safety (nothing bad ever happens) vs liveness (something good eventually happens)
- Fail-safe (on failure, move to a safe state), graceful degradation (keep running with reduced functionality)
- $N$-modular redundancy with majority voting masks up to $m$ failures when $N = 2m + 1$.
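- Hot/cold standby: a hot standby runs alongside the primary with synchronized state and takes over almost immediately; a cold standby must be started (and its state restored) after the primary fails.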
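- Design considerations
    - Fault avoidance: prevent faults during the design phase
    - Fault removal: find and fix faults through testing, debugging, and iteration
    - Fault tolerance: keep operating correctly despite the faults that remain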
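The metrics above combine into steady-state availability, $A = \frac{\text{MTTF}}{\text{MTTF} + \text{MTTR}}$. Below is a minimal Python sketch. It assumes a repairable system that alternates between up periods (mean MTTF) and recovery periods (mean MTTR). The function names and the example numbers are hypothetical and chosen only for illustration.

```python
def mtbf(mttf_hours: float, mttr_hours: float) -> float:
    """Mean time between failures of a repairable system: MTTF + MTTR."""
    return mttf_hours + mttr_hours


def availability(mttf_hours: float, mttr_hours: float) -> float:
    """Steady-state availability: long-run fraction of time the system is up."""
    return mttf_hours / (mttf_hours + mttr_hours)


if __name__ == "__main__":
    # Hypothetical system: fails on average after 999 h up, needs 1 h to recover.
    mttf, mttr = 999.0, 1.0
    a = availability(mttf, mttr)
    print(f"MTBF = {mtbf(mttf, mttr)} h")        # MTBF = 1000.0 h
    print(f"Availability = {a:.3f}")             # Availability = 0.999
    # Expected downtime over one year (8760 h) at this availability.
    print(f"Downtime/year = {(1 - a) * 8760:.2f} h")  # Downtime/year = 8.76 h
```

Even "three nines" of availability still allows roughly nine hours of downtime per year, which is why MTTR (fast recovery) matters as much as MTTF.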
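The $N = 2m + 1$ rule for $N$-modular redundancy can be sketched in Python. At most $m$ faulty outputs can never outvote the $m + 1$ correct ones, so a majority voter masks them. The reliability function makes several assumptions: module failures are independent, each module works with probability $r$, and the voter itself never fails. The function names are hypothetical.

```python
from collections import Counter
from math import comb
from typing import Hashable, Sequence, TypeVar

T = TypeVar("T", bound=Hashable)


def majority_vote(outputs: Sequence[T]) -> T:
    """Return the value produced by a strict majority of the module outputs.

    With N = 2m + 1 modules and at most m faulty ones, the correct value
    appears at least m + 1 times and therefore wins the vote.
    """
    value, count = Counter(outputs).most_common(1)[0]
    if count * 2 <= len(outputs):
        raise ValueError("no strict majority: more than m modules disagree")
    return value


def nmr_reliability(r: float, m: int) -> float:
    """Probability that at least m + 1 of N = 2m + 1 modules are working.

    Assumes independent failures, per-module reliability r, and a perfect voter.
    """
    n = 2 * m + 1
    return sum(comb(n, k) * r**k * (1 - r) ** (n - k) for k in range(m + 1, n + 1))


if __name__ == "__main__":
    # Triple modular redundancy (m = 1): one wrong output is masked.
    print(majority_vote([42, 42, 7]))            # 42
    # For m = 1 this reduces to 3r^2 - 2r^3.
    print(f"{nmr_reliability(0.9, 1):.3f}")      # 0.972
    print(f"{nmr_reliability(0.4, 1):.3f}")      # 0.352
```

The last line shows a design caveat of this model: redundancy only improves reliability when each module is more likely to work than fail ($r > 0.5$). With $r = 0.4$, TMR is less reliable than a single module.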