# Prompt Engineering for [[llm|LLM]]
<!-- cSpell:words snippetize snippetizing RLHF -->
- Prompting steps (see the sketch after this list)
- Context retrieval
- Snippetizing context
- Scoring and prioritizing snippets (some may need to be dropped)
- Prompt assembly
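A minimal sketch of these steps in code, assuming snippets and their relevance
scores come from whatever retrieval and scoring machinery is in use (the
`Snippet` type and the 4-chars-per-token estimate are illustrative assumptions):

```python
from dataclasses import dataclass

@dataclass
class Snippet:
    text: str
    score: float  # relevance/importance from the scoring step; higher is better

def assemble_prompt(user_query: str, snippets: list[Snippet], token_budget: int) -> str:
    """Score-ordered, budget-aware assembly: low-priority snippets may be dropped."""
    chosen, used = [], 0
    for s in sorted(snippets, key=lambda s: s.score, reverse=True):
        cost = len(s.text) // 4  # crude token estimate (~4 chars per token)
        if used + cost > token_budget:
            continue             # drop snippets that don't fit the budget
        chosen.append(s.text)
        used += cost
    return "\n\n".join(chosen + [f"Question: {user_query}\nAnswer:"])
```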
## About the Prompt
- LLMs show a _truth-bias_: they tend to take the prompt content as true.
- LLMs are all about completing a document.
- Putting user content inside the system message will give users a chance to
override the system message.
- Criteria for prompt (for completion models)
- Should be similar to texts that the LLM is trained on
- Should contain all information needed to complete
- Should lead to a solution
- Should have a clear stop
- Dos and don'ts
- Prefer dos over don'ts
- Give the reason for an instruction (thou shalt not kill because...)
- Few-shot prompting
- Usually much easier than instruction-based prompting, since LLMs are good at
following examples. But this is also limited:
- Does not scale well when context is big (long examples or too many examples)
- Can _anchor_ the model in unexpected ways (bias), especially towards edge
cases (the model assumes they are as common as typical cases)
- Can suggest spurious patterns, such as any sorting order -- you never know
what pattern is extrapolated by the LLM.
- Try to make the model "believe" that it has solved a few of the problems
successfully before (see the sketch after this list).
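A sketch of few-shot prompting in an OpenAI-style chat message format; the
fabricated prior turns make the model "believe" it has already solved similar
problems successfully (the example texts are made up for illustration):

```python
few_shot_examples = [
    ("Classify the sentiment: 'The battery lasts forever.'", "positive"),
    ("Classify the sentiment: 'It broke after two days.'", "negative"),
]

def few_shot_messages(task_instruction: str, question: str) -> list[dict]:
    messages = [{"role": "system", "content": task_instruction}]
    for example_question, example_answer in few_shot_examples:
        messages.append({"role": "user", "content": example_question})
        # Presenting the answer as a prior assistant turn makes the model act
        # as if it had solved this problem successfully before.
        messages.append({"role": "assistant", "content": example_answer})
    messages.append({"role": "user", "content": question})
    return messages
```

Keep the example set small and representative: a long or skewed set anchors the
model towards whatever edge cases and orderings it happens to show.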
## Context
- Latency matters. It's best to gather as many candidate context items as we
can and then whittle them down, so context items should be comparable in terms
of their value.
- Brainstorm with mind map to find potential context items.
- Two dimensions: proximity to the user and stability of the context
- Irrelevant information should be avoided: _Chekhov's Gun fallacy_, the LLM
will try to reason hard to make sense of all the info. Use [[rag|RAG]] for
context.
- _Summarization_ is needed when context is too long
- Summarize summaries if content exceeds context window.
- Recursive summaries: summarize at the section level, then at the chapter
level, then at the book level (see the sketch after this list).
- "Rumor problem": model could misunderstand things in summarization.
- Summarization is lossy, so ask for a summary with the final application task
in mind. Specific summaries are good but can't be shared among different use
cases.
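A sketch of recursive, task-aware summarization, assuming a hypothetical
`complete()` wrapper around whatever LLM API is in use:

```python
def complete(prompt: str) -> str:
    raise NotImplementedError("call your LLM API here")

def summarize(text: str, task: str) -> str:
    # Keep the final application task in the instruction so the lossy summary
    # retains the details that matter for this use case.
    return complete(
        f"Summarize the following text, keeping details relevant to: {task}\n\n"
        f"{text}\n\nSummary:"
    )

def recursive_summary(chunks: list[str], task: str, max_chars: int = 8000) -> str:
    summaries = [summarize(c, task) for c in chunks]       # section level
    combined = "\n".join(summaries)
    if len(combined) <= max_chars:
        return summarize(combined, task)                   # chapter / book level
    return recursive_summary(summaries, task, max_chars)   # summarize the summaries
```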
## Assembly of Prompt
- Constraints
- _In-context learning_: the closer the information is to the end of the
prompt, the more impact it has on the model.
- _The lost-in-the-middle phenomenon_: the model can easily recall the
beginning and end of the prompt, but struggles with information in the middle.
- Structure
- _Introduction_: guiding the focus of the LLM from the very beginning.
- _Valley of Meh_: the content in this valley is of reduced impact.
- _Context_
- _Refocus_: necessary for longer prompts to bring the model's attention back
to the question itself. e.g. "Based on the given information, I am ready to
answer the question regarding..."
- _Transition_: e.g. "The answer is..." In some models, this is implied by a
question mark.
- Chat vs Completion model
- Chat models benefit from natural multi-round interactive problem solving.
- Completion models avoid some unhelpful traits from RLHF, and allow
_inception_, where we dictate the beginning of the answer.
- Document types
- Dialogues: freeform text, transcript, marker-less, structured.
- Analytic Report: preferably in [[markdown|Markdown]] format, with an
`## Idea` monologue section that can be ignored (chain-of-thought
prompting), the `## Conclusion` section is the actual output, and
`## Further Reading` can be treated as a marker for end of response.
- Structured Document: XML, YAML, JSON, etc.
- Elastic snippet: given a limited context window, create multiple versions of
a context snippet, and place the biggest version that fits into the final
prompt.
- Relationship between (sub) prompts
- Position
- Importance, assessed with scores or tiers.
- Dependency, e.g. requirements and incompatibilities for snippets.
- A prompt crafting engine: respects the constraints, uses some algorithm (e.g.
additive/subtractive [[greedy]] algorithm) to pick snippets, then reconstructs
the prompt according to position (see the sketch after this list).
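A minimal sketch of an additive greedy crafting engine with elastic snippets:
each snippet offers several sizes, the engine places the largest version that
still fits in priority order, then emits everything in position order. Field
names, the token estimate, and the example texts are illustrative assumptions:

```python
from dataclasses import dataclass, field

@dataclass
class ElasticSnippet:
    position: int     # where the snippet belongs in the final prompt
    priority: int     # importance score/tier; higher is picked first
    versions: list[str] = field(default_factory=list)  # longest version first

def craft_prompt(snippets: list[ElasticSnippet], token_budget: int) -> str:
    remaining = token_budget
    placed: list[tuple[int, str]] = []
    for snip in sorted(snippets, key=lambda s: s.priority, reverse=True):
        for version in snip.versions:         # try the biggest version first
            cost = len(version) // 4          # crude token estimate
            if cost <= remaining:
                placed.append((snip.position, version))
                remaining -= cost
                break                         # keep at most one version per snippet
    placed.sort(key=lambda p: p[0])           # reconstruct the prompt by position
    return "\n\n".join(text for _, text in placed)

# Usage sketch: the introduction and the refocus/transition pieces are just
# high-priority snippets pinned to the start and the end via their position.
parts = [
    ElasticSnippet(0, 100, ["You are answering questions about the user's codebase."]),
    ElasticSnippet(10, 50, ["(long context version)", "(short context version)"]),
    ElasticSnippet(99, 100, ["Based on the given information, the answer is:"]),
]
print(craft_prompt(parts, token_budget=800))
```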
## Completion/Response
<!-- cSpell:words logprob logprobs -->
- Preamble
- Structural boilerplate: can be eliminated through prompting.
- Reasoning: desirable with chain-of-thought prompting.
- Fluff: should be avoided. e.g. "Please reply in the following format: 1.
result 1, result 2, ..., result n; 2. Disclaimers (if any); 3. Background
and explanation (if any)."
- Postscript: detect the end of the actual answer with stop sequences, and end
the stream there.
- Recognizing Start and End
- Logprob: the averaged logprob is an indicator of the confidence level or
quality of the response (see the sketch after this list).
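A sketch of turning per-token logprobs into a rough confidence signal, assuming
the API is asked to return them (e.g. a `logprobs` option on OpenAI-style
completion APIs):

```python
import math

def average_logprob(token_logprobs: list[float]) -> float:
    """Mean per-token logprob; closer to 0 means higher model confidence."""
    return sum(token_logprobs) / len(token_logprobs)

def perplexity(token_logprobs: list[float]) -> float:
    """The same signal viewed as perplexity; lower means higher confidence."""
    return math.exp(-average_logprob(token_logprobs))
```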
## LLM as Classifier
- When used as classifier, it's important to make sure options all start with
different tokens. Otherwise, the model will favor the options sharing common
prefixes, as their logprobs add up.
- _Calibrate_ the model by shifting the logprob by a constant, if needed. For
example, only answer `No` if it's quite certain. The constant can be found by
experimenting or by minimizing the cross-entropy loss, as we do in
[[logistic-regression]] (see the sketch below).
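A sketch of calibrating a yes/no classifier by shifting the `Yes` logprob with
a constant bias, found by minimizing the cross-entropy loss on labeled data;
the data layout and the grid search are illustrative assumptions:

```python
import math

def predict(yes_logprob: float, no_logprob: float, bias: float = 0.0) -> str:
    # A positive bias favors "Yes", so "No" is only chosen when the model is
    # quite certain of it; a negative bias does the opposite.
    return "Yes" if yes_logprob + bias > no_logprob else "No"

def cross_entropy(bias: float, data: list[tuple[float, float, int]]) -> float:
    """data rows are (yes_logprob, no_logprob, label) with label 1 meaning Yes."""
    loss = 0.0
    for yes_lp, no_lp, label in data:
        # Softmax over the two shifted logprobs gives the calibrated P(Yes).
        p_yes = 1.0 / (1.0 + math.exp(no_lp - (yes_lp + bias)))
        p = p_yes if label == 1 else 1.0 - p_yes
        loss -= math.log(max(p, 1e-12))
    return loss / len(data)

def fit_bias(data: list[tuple[float, float, int]]) -> float:
    # Grid search over a 1-D constant; any simple optimizer works as well.
    return min((b / 10 for b in range(-50, 51)), key=lambda b: cross_entropy(b, data))
```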