# Scanner and Encoder
> [!note] Definition
> Scanning is the process of **identifying tokens** from the raw text source
> code of a program.
- Go through the source code and sort each token into its "bracket" (its token category).
- Sample tokens:
  - Keywords
  - Identifiers
  - Numbers
  - Strings
  - Comments and whitespace
- Remember to create an enum type that covers every token kind.
- Narrow down the set of candidate tokens as we read each character.
- _[[backtracking|Backtracking]]_: when the scanner reads one character too far, an `unputc` function pushes it back onto the input.
- We have to be rigorous in defining tokens.
  - Example: [JSON](https://json.org)
  - To define tokens rigorously, we need [[regex|regular expressions]].
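The points above (a token enum, narrowing down candidates character by character, and backtracking) can be sketched as a tiny hand-written scanner. This is only an illustration under assumptions: the `KEYWORDS` set, the `#` line-comment syntax, and the names `CharStream`, `read_while`, and `scan` are hypothetical, and `unputc` here simply steps the read position back by one character.

```python
from enum import Enum, auto


class TokenType(Enum):
    KEYWORD = auto()
    IDENTIFIER = auto()
    NUMBER = auto()
    STRING = auto()
    COMMENT = auto()
    WHITESPACE = auto()


KEYWORDS = {"if", "else", "while", "return"}  # hypothetical keyword set


class CharStream:
    """Character source with pushback, in the spirit of `unputc`."""

    def __init__(self, text):
        self.text = text
        self.pos = 0

    def getc(self):
        # Return the next character, or "" at end of input.
        if self.pos >= len(self.text):
            return ""
        ch = self.text[self.pos]
        self.pos += 1
        return ch

    def unputc(self):
        # Backtrack: un-read the most recently read character.
        if self.pos > 0:
            self.pos -= 1


def read_while(stream, first, pred):
    """Extend the lexeme while pred holds, pushing back the first misfit."""
    lexeme = first
    while True:
        ch = stream.getc()
        if ch and pred(ch):
            lexeme += ch
        else:
            if ch:  # only push back a real character, not end of input
                stream.unputc()
            return lexeme


def scan(text):
    stream = CharStream(text)
    tokens = []
    while True:
        ch = stream.getc()
        if not ch:
            return tokens
        # The first character narrows down which token kinds are possible.
        if ch.isspace():
            tokens.append((TokenType.WHITESPACE, read_while(stream, ch, str.isspace)))
        elif ch.isdigit():
            tokens.append((TokenType.NUMBER, read_while(stream, ch, str.isdigit)))
        elif ch.isalpha() or ch == "_":
            word = read_while(stream, ch, lambda c: c.isalnum() or c == "_")
            kind = TokenType.KEYWORD if word in KEYWORDS else TokenType.IDENTIFIER
            tokens.append((kind, word))
        elif ch == '"':
            body = read_while(stream, ch, lambda c: c != '"')
            if stream.getc() != '"':
                raise ValueError("unterminated string")
            tokens.append((TokenType.STRING, body + '"'))
        elif ch == "#":  # assumed line-comment syntax
            tokens.append((TokenType.COMMENT, read_while(stream, ch, lambda c: c != "\n")))
        else:
            raise ValueError(f"unexpected character: {ch!r}")
```

Calling `scan("while x1 42")` yields a keyword, an identifier, and a number, separated by whitespace tokens. The identifier branch shows why pushback is needed: the scanner only learns that `x1` has ended by reading the space after it, which then has to go back onto the input.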
## Regular Expression
- A finite regular expression can describe an infinite (but still enumerable) language.
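- Example (C-style block comment, with `+` as concatenation, `|` as alternation, and `*` as repetition): `SLASH + STAR + (NOT STAR | STAR + STAR* + NOT (STAR | SLASH))* + STAR + STAR* + SLASH`
  - A star inside the body must not be followed by a slash, otherwise the pattern would run past the first `*/`; the closing `STAR + STAR*` also allows comments such as `/***/`.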
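As a quick check of the block-comment idea, the widely used form of this pattern can be written with Python's `re` module. The helper name `match_comment` is hypothetical; the pattern itself is the usual textbook one, where the body is either a non-star character or a run of stars that is not followed by a slash.

```python
import re

# C-style block comment:
#   /\*                 opening "/*"
#   ([^*]|\*+[^*/])*    body: a non-star, or stars not followed by "/"
#   \*+/                closing run of stars followed by "/"
BLOCK_COMMENT = re.compile(r"/\*([^*]|\*+[^*/])*\*+/")


def match_comment(text):
    """Return the block comment at the start of text, or None."""
    m = BLOCK_COMMENT.match(text)
    return m.group(0) if m else None
```

With this pattern, `match_comment("/* a */ b */")` stops at the first `*/`, and an unterminated input such as `"/* open"` does not match at all.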
## Finite Automaton
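- DFA (deterministic finite automaton): exactly one transition for each state and input character.
- NFA (nondeterministic finite automaton)
  - There can be multiple choices at each step.
  - When several accepting states (tokens) match, assign priorities to choose between them.
- RE to NFA: Thompson's Construction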
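A rough sketch of Thompson's construction, followed by a set-based NFA simulation, might look like the following. It assumes the regular expression is already parsed into nested tuples (`"char"`, `"cat"`, `"alt"`, `"star"`), which is a representation chosen for this example. For simplicity every fragment gets its own start and accept state, so concatenation uses a few more epsilon edges than the textbook version. Token priorities are not modelled here.

```python
from itertools import count

EPS = None  # label used for epsilon transitions


class NFA:
    def __init__(self):
        self.edges = {}  # state -> list of (label, target)
        self._ids = count()

    def new_state(self):
        s = next(self._ids)
        self.edges[s] = []
        return s

    def add_edge(self, src, label, dst):
        self.edges[src].append((label, dst))


def build(nfa, node):
    """Return (start, accept) of the NFA fragment for an AST node."""
    kind = node[0]
    start, accept = nfa.new_state(), nfa.new_state()
    if kind == "char":
        nfa.add_edge(start, node[1], accept)
    elif kind == "cat":
        s1, a1 = build(nfa, node[1])
        s2, a2 = build(nfa, node[2])
        nfa.add_edge(start, EPS, s1)
        nfa.add_edge(a1, EPS, s2)
        nfa.add_edge(a2, EPS, accept)
    elif kind == "alt":
        for child in node[1:]:
            s, a = build(nfa, child)
            nfa.add_edge(start, EPS, s)
            nfa.add_edge(a, EPS, accept)
    elif kind == "star":
        s, a = build(nfa, node[1])
        nfa.add_edge(start, EPS, s)       # enter the loop
        nfa.add_edge(start, EPS, accept)  # or skip it entirely
        nfa.add_edge(a, EPS, s)           # repeat
        nfa.add_edge(a, EPS, accept)      # or leave
    else:
        raise ValueError(f"unknown node kind: {kind}")
    return start, accept


def eps_closure(nfa, states):
    """All states reachable from `states` through epsilon edges only."""
    stack, seen = list(states), set(states)
    while stack:
        s = stack.pop()
        for label, dst in nfa.edges[s]:
            if label is EPS and dst not in seen:
                seen.add(dst)
                stack.append(dst)
    return seen


def accepts(node, text):
    """Simulate the NFA by tracking every state it could be in."""
    nfa = NFA()
    start, accept = build(nfa, node)
    current = eps_closure(nfa, {start})
    for ch in text:
        moved = {dst for s in current for label, dst in nfa.edges[s] if label == ch}
        current = eps_closure(nfa, moved)
    return accept in current


# (a|b)*c as a hand-built AST
AB_STAR_C = ("cat", ("star", ("alt", ("char", "a"), ("char", "b"))), ("char", "c"))
```

Here `accepts(AB_STAR_C, "abac")` is true and `accepts(AB_STAR_C, "ab")` is false. Tracking a set of current states is what handles the "multiple choices at each step": the simulation follows all of them at once instead of guessing.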