# Fault Tolerance

- Reliability and availability
  - Metrics
    - MTTF = Mean time to failure
    - MTBF = Mean time between failures
    - MTTR = Mean time to recovery
- Types of faults
  - Transient vs. persistent
  - Malicious vs. benign
  - Fail-stop; stuck-at-1 or stuck-at-0
  - Byzantine faults: arbitrary, possibly inconsistent behavior
- Safety vs. liveness
  - Fail-safe, graceful degradation
- $N$-modular redundancy masks up to $m$ failures when $N = 2m + 1$ (majority voting).
- Hot/cold standby
- Design considerations
  - Fault avoidance: focus on the design phase
  - Fault removal: debugging, iteration
  - Fault tolerance
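The metrics and the $N = 2m + 1$ rule above can be made concrete with a small sketch. It assumes the common convention MTBF = MTTF + MTTR, so steady-state availability is MTTF / (MTTF + MTTR). It also shows a plain majority voter for $N$-modular redundancy. The function names are hypothetical and chosen only for illustration.

```python
from collections import Counter
from typing import Hashable, Optional, Sequence


def availability(mttf: float, mttr: float) -> float:
    """Steady-state availability, assuming MTBF = MTTF + MTTR."""
    if mttf < 0 or mttr < 0 or mttf + mttr == 0:
        raise ValueError("MTTF and MTTR must be non-negative and not both zero")
    return mttf / (mttf + mttr)


def max_masked_failures(n: int) -> int:
    """Largest m with 2m + 1 <= n, i.e. failures a majority voter over n replicas can mask."""
    if n < 1:
        raise ValueError("need at least one replica")
    return (n - 1) // 2


def majority_vote(outputs: Sequence[Hashable]) -> Optional[Hashable]:
    """Return the value reported by a strict majority of replicas, else None."""
    if not outputs:
        return None
    value, count = Counter(outputs).most_common(1)[0]
    # A strict majority exists only if more than half the replicas agree.
    return value if count > len(outputs) // 2 else None


if __name__ == "__main__":
    print(availability(mttf=999.0, mttr=1.0))  # 0.999
    print(max_masked_failures(3))              # 1 (triple modular redundancy)
    print(majority_vote([7, 7, 3]))            # 7: one faulty replica is outvoted
```

With $N = 3$ one faulty output is masked. With $N = 4$ the voter still masks only one failure, because a 2-2 split has no strict majority. This is why $N$ is chosen odd.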