# Large Language Model

## Concepts

- [[prompt|Prompt engineering]]
- _Tokens_: the units of text a model reads and emits; the set of all tokens it knows is its _vocabulary_.
- _Autoregression_: the model predicts a likely next token, appends it to the prompt, and runs again on the extended prompt to produce the token after that.
- _Sampling_: choosing the next token from the model's output probability distribution instead of always taking the single most likely one.
- _Temperature_: `0` gives effectively deterministic (greedy) output; `1` samples tokens in proportion to the model's probability distribution. Output deteriorates at high temperature because unlikely tokens get sampled, and once gibberish enters the context the model keeps extending that pattern. (A small sampling sketch is at the end of this note.)
- [[transformer|Transformer architecture]]
- [[fine-tune|Fine-tuning process]]

## History

- Shannon modelled natural language with Markov chains in 1948 in [_A Mathematical Theory of Communication_](https://people.math.harvard.edu/~ctm/home/text/others/shannon/entropy/entropy.pdf), which also introduced the concept of information [[entropy]].
- `seq2seq` architecture: encoder + decoder + thought vector, recurrent design. Challenge: the thought vector is fixed-size and finite, a bottleneck for long inputs.
- [@bahdanauNeuralMachineTranslation2016] introduced keeping all of the encoder's hidden-state vectors so the decoder can "soft search" over them.
- [@vaswaniAttentionAllYou2017] _Attention is All You Need_ introduced the transformer architecture and removed the recurrent circuitry.
- [@radfordImprovingLanguageUnderstanding] proposed the _generative pre-trained transformer_ (GPT) architecture, basically a transformer with the encoder stripped off. Pre-training on unlabelled text followed by fine-tuning for specific tasks worked well.
- GPT-2 increased the training set and model size, making it a multitask learner.
- GPT-3 saw another order-of-magnitude increase in model size and training set [@brownLanguageModelsAre2020]: language models are few-shot learners, the start of [[prompt|prompt engineering]].

## Resources

- [LiteLLM](https://litellm.ai), a unified API to many different models (see the sketch at the end of this note).
- [OpenRouter](https://openrouter.ai), one API key for many different models.
- [Artificial Analysis](https://artificialanalysis.ai), an independent site that benchmarks model performance.
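
## Sketches

A minimal sketch of temperature-scaled sampling over a toy next-token distribution, assuming the model has already produced logits for its vocabulary; the vocabulary and logit values below are made up for illustration:

```python
import numpy as np

def sample_next_token(logits: np.ndarray, temperature: float, rng: np.random.Generator) -> int:
    """Pick a token index from raw logits using temperature-scaled sampling."""
    if temperature == 0:
        # Greedy decoding: always take the most likely token.
        return int(np.argmax(logits))
    scaled = logits / temperature            # temperature < 1 sharpens, > 1 flattens the distribution
    probs = np.exp(scaled - scaled.max())    # softmax, shifted for numerical stability
    probs /= probs.sum()
    return int(rng.choice(len(logits), p=probs))

# Toy example: a 4-token "vocabulary" with hypothetical logits.
vocab = ["the", "cat", "sat", "<eos>"]
logits = np.array([2.0, 1.0, 0.5, -1.0])
rng = np.random.default_rng(0)

for t in (0.0, 1.0, 2.0):
    picks = [vocab[sample_next_token(logits, t, rng)] for _ in range(5)]
    print(f"temperature={t}: {picks}")
```

Autoregressive generation then just repeats this step: append the sampled token to the context, ask the model for a fresh distribution, and sample again.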
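A hedged sketch of the unified-API idea using LiteLLM's OpenAI-style `completion` call. The model identifiers are illustrative assumptions, and the relevant provider API keys are assumed to be set as environment variables; check the LiteLLM docs for exact names.

```python
# Calling two different providers through LiteLLM's unified interface.
# Requires provider keys in the environment (e.g. OPENAI_API_KEY, ANTHROPIC_API_KEY).
from litellm import completion

messages = [{"role": "user", "content": "Explain sampling temperature in one sentence."}]

# Model names below are illustrative; substitute whatever providers/models you actually use.
for model in ("gpt-4o-mini", "claude-3-5-haiku-20241022"):
    response = completion(model=model, messages=messages, temperature=0.2)
    print(model, "->", response.choices[0].message.content)
```

The point of the unified API is that only the `model` string changes; the request and response shapes stay the same across providers.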