# Transformer Architecture
- A "mini-brain" sitting on each token.
- A mini-brain can pass information to its right.
- A mini-brain has state, i.e. to compute a layer, all it needs is its own previous layers and the outputs of the mini-brains to its left.
- Mini-brains can ask questions and share information.
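- "Backward and downward" mechanism: a mini-brain can only look backward (at tokens to its left) and downward (at lower layers), so information flows only from left to right and from lower layers to higher ones.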
- The only way to get around "downward" is through generation: each newly generated token is fed back in as input, so insights computed in high layers get a chance to reach the low layers of future tokens. This is the basis of chain-of-thought prompting.
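
The left-to-right flow and the "state" idea above can be checked with a small sketch. It assumes a single attention head in NumPy with random weights and no MLP, layer norm, or positional encoding; the names `causal_self_attention`, `attend_last`, `Wq`, `Wk`, and `Wv` are made up for illustration and do not come from any library.

```python
import numpy as np

def causal_self_attention(x, Wq, Wk, Wv):
    """Single-head causal self-attention over x of shape (T, d)."""
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    scores = q @ k.T / np.sqrt(k.shape[-1])
    T = x.shape[0]
    # mask[i, j] is True when j > i, i.e. token j sits to the right of token i
    mask = np.triu(np.ones((T, T), dtype=bool), k=1)
    scores = np.where(mask, -np.inf, scores)
    # Row-wise softmax; masked entries become exactly 0
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v

def attend_last(x_new, k_cache, v_cache, Wq, Wk, Wv):
    """Compute only the newest position from cached keys/values (hypothetical helper)."""
    q = x_new @ Wq
    k = np.vstack([k_cache, x_new @ Wk])
    v = np.vstack([v_cache, x_new @ Wv])
    scores = q @ k.T / np.sqrt(k.shape[-1])
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    return weights @ v

rng = np.random.default_rng(0)
T, d = 5, 8
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))
x = rng.normal(size=(T, d))
out = causal_self_attention(x, Wq, Wk, Wv)

# 1) Information only flows left to right: perturbing the last token
#    leaves every position to its left unchanged.
x2 = x.copy()
x2[-1] += 1.0
out2 = causal_self_attention(x2, Wq, Wk, Wv)
left_unchanged = np.allclose(out[:-1], out2[:-1])
last_changed = not np.allclose(out[-1], out2[-1])

# 2) A mini-brain has state: the newest position can be computed from
#    its own input plus the cached keys/values of the tokens to its left.
k_cache, v_cache = x[:-1] @ Wk, x[:-1] @ Wv
out_last = attend_last(x[-1], k_cache, v_cache, Wq, Wk, Wv)
cache_matches = np.allclose(out_last, out[-1])

print(left_unchanged, last_changed, cache_matches)
```

The second check is the idea behind key/value caching during generation: nothing to the left needs to be recomputed when a new token arrives, because no earlier position can see it.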