# Large Language Model
## Concepts
- [[prompt|Prompt engineering]]
- The set of _tokens_ a model can emit: its _vocabulary_
- _Autoregression_: take the most likely next token, append it to the prompt,
  and run the model again to get the next token.
- _Sampling_ tokens from the output probability distribution
- _Temperature_: at `0` the output is near-deterministic (always the most
  likely token); at `1` tokens are sampled according to the model's probability
  distribution. Output deteriorates at high temperature because the
  distribution flattens toward uniform, producing gibberish.
- [[transformer|Transformer architecture]]
- [[fine-tune|Fine-tuning process]]
- [[agent|Agent]]
- Another way to look at LLMs: not just auto-completers, but a highly
  effective, neural-network-powered classifier at each token. LLMs also perform
  better when used this way (e.g. tool calling in an agent).
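The autoregression and temperature bullets above can be sketched in a few lines. This is a minimal illustration, not a real implementation: `model` is a hypothetical function that maps a token sequence to raw logits over the vocabulary.

```python
import math
import random

def sample_next(logits, temperature=1.0):
    """Pick the next token id from raw logits.

    temperature -> 0: argmax (greedy, near-deterministic);
    temperature = 1: sample from the model's softmax distribution;
    high temperature: the distribution flattens toward uniform.
    """
    if temperature == 0:
        return max(range(len(logits)), key=lambda i: logits[i])
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract the max for numerical stability
    weights = [math.exp(s - m) for s in scaled]
    return random.choices(range(len(logits)), weights=weights)[0]

def generate(model, prompt_tokens, n_tokens, temperature=1.0):
    """Autoregression: sample a token, append it, run the model again."""
    tokens = list(prompt_tokens)
    for _ in range(n_tokens):
        logits = model(tokens)  # one score per vocabulary entry
        tokens.append(sample_next(logits, temperature))
    return tokens
```

Note the division of logits by temperature before the softmax: this is why `0` collapses to argmax while large values wash out the differences between tokens.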
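The classifier view in the last bullet can also be sketched: instead of sampling freely, restrict the next-token choice to a fixed candidate set (e.g. tool names) and take the highest-scoring one. The `model` function and the candidate token ids are hypothetical placeholders.

```python
def classify(model, prompt_tokens, option_token_ids):
    """Use one forward pass as a classifier: compare the logits of a
    fixed set of candidate token ids and return the most likely one."""
    logits = model(prompt_tokens)  # hypothetical: token ids -> logits
    return max(option_token_ids, key=lambda tid: logits[tid])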
## History
- Markov model of natural language introduced by Shannon in 1948.
[_A Mathematical Theory of Communication_](https://people.math.harvard.edu/~ctm/home/text/others/shannon/entropy/entropy.pdf),
which also introduced the concept of information [[entropy]].
- `seq2seq` architecture, encoder + decoder + thought vector, recurrent design.
  Challenge: the thought vector is fixed-size and finite.
- [@bahdanauNeuralMachineTranslation2016] introduced preserving all the
  encoder's hidden state vectors so the decoder can "soft-search" over them.
- [@vaswaniAttentionAllYou2017] _Attention is All You Need_ introduced
transformer architecture, removed recurrent circuitry.
- [@radfordImprovingLanguageUnderstanding] proposed the _generative pre-trained
  transformer_ (GPT) architecture, basically a transformer with the encoder
  ripped off. Pre-training on unlabelled text followed by fine-tuning for
  specific tasks worked pretty well.
- GPT-2 increased training set and model size, making it a multitask learner.
- GPT-3 saw another order-of-magnitude increase in model size and training set.
  [@brownLanguageModelsAre2020] - language models are few-shot learners, the
start of [[prompt|prompt-engineering]].
## Resources
- [LiteLLM](https://litellm.ai), offers unified API to different models.
- [OpenRouter](https://openrouter.ai), one API key to different models.
- [Artificial Analysis](https://artificialanalysis.ai), an independent site that
rates the performance of models.