# Large Language Model
## Concepts
- [[prompt|Prompt engineering]]
- _Tokens_: the units a model reads and emits; the set of all tokens it knows is its _vocabulary_.
- _Autoregression_: take a likely next token, append it to the prompt, and
  run the model again to get the next token (see the sketch after this list).
- _Sampling_ tokens from possible outputs
- _Temperature_: `0` gives near-deterministic (greedy) output; `1` samples
  according to the model's unmodified probability distribution. Output
  deteriorates at high temperature because low-probability tokens get sampled
  and the text drifts into gibberish.
- [[transformer|Transformer architecture]]
- [[fine-tune|Fine-tuning process]]
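
A minimal sketch of the autoregressive loop with temperature sampling. The `model` callable (token sequence in, next-token logits out) is hypothetical; only the sampling math is the point here.

```python
import numpy as np

def sample_next_token(logits, temperature=1.0, rng=np.random.default_rng()):
    """Sample one token id from raw logits, scaled by temperature."""
    if temperature == 0:
        # Greedy decoding: always take the most likely token (deterministic).
        return int(np.argmax(logits))
    scaled = np.asarray(logits, dtype=np.float64) / temperature
    # Softmax (shifted for numerical stability) turns logits into probabilities.
    probs = np.exp(scaled - scaled.max())
    probs /= probs.sum()
    return int(rng.choice(len(probs), p=probs))

def generate(model, prompt_tokens, n_new, temperature=1.0):
    """Autoregression: append each sampled token and run the model again."""
    tokens = list(prompt_tokens)
    for _ in range(n_new):
        logits = model(tokens)  # hypothetical: maps tokens -> logits over the vocabulary
        tokens.append(sample_next_token(logits, temperature))
    return tokens
```

Higher temperature flattens the distribution (more uniform choices), lower temperature sharpens it toward the argmax.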
## History
- Markov model of natural language introduced by Shannon in 1948.
[_A Mathematical Theory of Communication_](https://people.math.harvard.edu/~ctm/home/text/others/shannon/entropy/entropy.pdf),
which also introduced the concept of information [[entropy]].
- `seq2seq` architecture, encoder + decoder + thought vector, recurrent design.
  Challenge: the thought vector is fixed and finite.
- [@bahdanauNeuralMachineTranslation2016] introduced preserving all encoder
  hidden-state vectors so the decoder can "soft search" over them.
- [@vaswaniAttentionAllYou2017] _Attention is All You Need_ introduced
transformer architecture, removed recurrent circuitry.
- [@radfordImprovingLanguageUnderstanding] proposed _generative pre-trained
transformer_ - GPT architecture, basically transformer with encoder ripped
off. Pre-training on unlabelled text with fine-tuning for specific tasks
worked pretty well.
- GPT-2 increased the training set and model size, making it a multitask learner.
- GPT-3 saw another order-of-magnitude increase in model size and training set.
  [@brownLanguageModelsAre2020] - language models are few-shot learners, the
  start of [[prompt|prompt-engineering]].
## Resources
- [LiteLLM](https://litellm.ai), offers unified API to different models.
- [OpenRouter](https://openrouter.ai), one API key to different models.
- [Artificial Analysis](https://artificialanalysis.ai), an independent site that
  rates the performance of models.
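
A small sketch of the unified-API idea, assuming LiteLLM's `completion()` entry point; the model name and prompt are illustrative, not a recommendation.

```python
# Requires: pip install litellm, plus the provider's API key in the environment.
from litellm import completion

# One call shape for many providers; swap the model string to target a different backend.
response = completion(
    model="gpt-4o-mini",  # illustrative model name
    messages=[{"role": "user", "content": "Explain temperature sampling in one sentence."}],
    temperature=0.2,
)
print(response.choices[0].message.content)
```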