# Prompt Engineering for [[llm|LLM]] <!-- cSpell:words snippetize snippetizing RLHF -->

- Prompting steps
  - Context retrieval
  - Snippetizing context
  - Scoring and prioritizing snippets (some may need to be dropped)
  - Prompt assembly

## About the Prompt

- Mind the _truth-bias_ in the prompt content.
- LLMs are all about completing a document.
- Putting user content inside the system message gives users a chance to override the system message.
- Criteria for a prompt (for completion models)
  - Should be similar to texts that the LLM was trained on
  - Should contain all information needed for the completion
  - Should lead to a solution
  - Should have a clear stop
- Dos and don'ts
  - Prefer dos over don'ts.
  - Give the reason for an instruction ("thou shalt not kill, because...").
- Few-shot prompting
  - Usually much easier than instruction-based prompting, since LLMs are good at following examples, but it has limits.
  - Does not scale well when the context is big (long examples or too many examples).
  - Can _anchor_ the model in unexpected ways (bias), especially towards edge cases (it may assume they are as common as typical cases).
  - Can suggest spurious patterns, such as an incidental sorting order -- you never know what pattern the LLM extrapolates.
  - Try to make the model "believe" that it has solved a few of the problems successfully before.

## Context

- Latency matters. It's best to gather as much context as we can and then whittle it down, so context items should be comparable in terms of their value.
- Brainstorm with a mind map to find potential context items.
  - Two dimensions: proximity to the user, and stability of the context.
- Irrelevant information should be avoided (the _Chekhov's Gun_ fallacy): the LLM will reason hard to make sense of all the information it is given. Use [[rag|RAG]] to retrieve relevant context.
- _Summarization_ is needed when the context is too long.
  - Summarize the summaries if the content exceeds the context window.
  - Recursive summaries: summarize at the section level, then the chapter level, then the book level.
  - "Rumor problem": the model can misunderstand things during summarization, and errors compound across levels.
  - Summarization is lossy, so ask for summaries with the final application task in mind. Task-specific summaries are better but cannot be shared among different use cases.

## Assembly of Prompt

- Constraints
  - _In-context learning_: the closer a piece of information is to the end of the prompt, the more impact it has on the model.
  - _The lost middle phenomenon_: the model easily recalls the beginning and end of the prompt, but struggles with information in the middle.
- Structure
  - _Introduction_: guides the focus of the LLM from the very beginning.
  - _Valley of Meh_: content in this valley has reduced impact.
  - _Context_
  - _Refocus_: necessary for longer prompts to bring the model's attention back to the question itself, e.g. "Based on the given information, I am ready to answer the question regarding..."
  - _Transition_: e.g. "The answer is...". In some models this is implied by a question mark.
- Chat vs completion models
  - Chat models benefit from natural multi-round interactive problem solving.
  - Completion models avoid some unhelpful traits from RLHF and allow _inception_, where we dictate the beginning of the answer.
- Document types
  - Dialogue: freeform text, transcript, marker-less, or structured.
  - Analytic report: preferably in [[markdown|Markdown]] format, with an `## Idea` monologue section that can be ignored (chain-of-thought prompting), a `## Conclusion` section as the actual output, and `## Further Reading` treated as a marker for the end of the response.
  - Structured document: XML, YAML, JSON, etc.
- Elastic snippets: given a limited context window, create multiple versions of a context snippet and place the biggest version that fits into the final prompt.
- Relationships between (sub-)prompts
  - Position
  - Importance, assessed with scores or tiers.
  - Dependency, e.g. requirements and incompatibilities between snippets.
- A prompt crafting engine respects these constraints, uses some algorithm (e.g. an additive or subtractive [[greedy]] algorithm) to pick snippets, then reconstructs the prompt according to the positions (see the sketch below).
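Below is a minimal sketch of such a crafting engine using an additive greedy pass. The `Snippet` fields, the whitespace token counter, and the assembly policy are illustrative assumptions, not the interface of any particular library.

```python
from dataclasses import dataclass, field

@dataclass
class Snippet:
    id: str
    text: str
    score: float       # importance (higher = more valuable)
    position: int      # desired position in the final prompt (lower = earlier)
    requires: set[str] = field(default_factory=set)  # ids of snippets this one depends on

def count_tokens(text: str) -> int:
    # Crude stand-in for a real tokenizer.
    return len(text.split())

def assemble_prompt(snippets: list[Snippet], budget: int) -> str:
    """Additive greedy assembly: visit snippets in descending score order,
    keep each one that still fits the token budget and whose dependencies
    are already selected, then lay the survivors out by position."""
    chosen: dict[str, Snippet] = {}
    used = 0
    for s in sorted(snippets, key=lambda s: s.score, reverse=True):
        cost = count_tokens(s.text)
        if used + cost > budget:
            continue                         # dropped: does not fit the budget
        if not s.requires.issubset(chosen):
            continue                         # dropped: a required snippet is missing
        chosen[s.id] = s
        used += cost
    ordered = sorted(chosen.values(), key=lambda s: s.position)
    return "\n\n".join(s.text for s in ordered)
```

A subtractive variant would start from all snippets and drop the lowest-scoring ones until the prompt fits. Note that this simplified pass only satisfies dependencies whose prerequisites score higher, so a real engine would need a more careful selection step.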
## Completion/Response <!-- cSpell:words logprob logprobs -->

- Preamble
  - Structural boilerplate: can be eliminated through prompting.
  - Reasoning: desirable with chain-of-thought prompting.
  - Fluff: should be avoided, e.g. by prescribing the reply format: "Please reply in the following format: 1. result 1, result 2, ..., result n; 2. Disclaimers (if any); 3. Background and explanation (if any)."
- Postscript: cut it off by detecting the end of the actual answer, using stop sequences and ending the stream.
- Recognizing the start and end of the answer.
- Logprob: the averaged logprob is an indicator of the confidence level or quality of the response.

## LLM as Classifier

- When the LLM is used as a classifier, it's important to make sure the options all start with different tokens; otherwise the model will favor the options that share a common prefix, as their logprobs add up.
- _Calibrate_ the model by shifting the logprob by a constant, if needed, for example so that it only answers `No` when it's quite certain. The constant can be found by experimenting or by minimizing the cross-entropy loss, as we do in [[logistic-regression]] (see the sketch below).
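Below is a minimal sketch of logprob-based classification and calibration. It assumes we can read the first-token logprob of each option from the model's response (e.g. via a logprobs option); the dict format, function names, and toy numbers are illustrative, not tied to any particular SDK.

```python
import math

def classify(logprobs: dict[str, float], bias_no: float = 0.0) -> str:
    """Compare first-token logprobs for "Yes" and "No"; bias_no shifts the
    "No" logprob so that, e.g., a negative bias only answers "No" when the
    model is quite certain."""
    return "No" if logprobs["No"] + bias_no > logprobs["Yes"] else "Yes"

def cross_entropy(bias_no: float, examples: list[tuple[dict[str, float], int]]) -> float:
    """Average cross-entropy of the shifted P(No) over labelled examples
    (label 1 = "No"). Minimizing this over bias_no picks the calibration
    constant, like fitting the intercept of a logistic regression."""
    total = 0.0
    for lp, label in examples:
        z = (lp["No"] + bias_no) - lp["Yes"]     # shifted log-odds of "No"
        p_no = 1.0 / (1.0 + math.exp(-z))
        p = p_no if label == 1 else 1.0 - p_no
        total -= math.log(max(p, 1e-12))
    return total / max(len(examples), 1)

# Toy labelled examples (first-token logprobs, gold label; 1 = "No") and a
# grid search over candidate shifts to find the calibration constant.
examples = [({"Yes": -0.2, "No": -1.8}, 0), ({"Yes": -1.1, "No": -0.5}, 1)]
best_bias = min((b / 10 for b in range(-50, 51)), key=lambda b: cross_entropy(b, examples))
print(best_bias, classify({"Yes": -0.9, "No": -0.7}, bias_no=best_bias))
```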