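# Scanner and Encoder

> [!note] Definition
> Scanning is the process of **identifying tokens** from the
> raw text source code of a program.

- Go through the source code and put each token into its "bracket" (token category).
- Sample tokens:
  - Keywords
  - Identifiers
  - Numbers
  - Strings
  - Comments and whitespace
- Remember to create an enum type covering all tokens.
- Narrow down the scope of possible identifiers as we go.
  - _[[backtracking|Backtracking]]_ -- `unputc` function, which pushes a character back onto the input when we have read one too far.
- We have to be rigorous in defining tokens.
  - Example: [JSON](https://json.org)
  - To rigorously define it, we need [[regex|regular expression]].

## Regular Expression

- A finite RE can describe an infinite (but enumerable) language.
- Block comment (`/* ... */`): `SLASH + STAR + (NOT STAR | STAR + NOT STAR)* + STAR + SLASH`
  - Here `+` is concatenation, `|` is alternation, and `*` is repetition.
  - As written, `STAR + NOT STAR` also matches `*/` inside the body, and a comment ending in `**/` is rejected. A stricter form is `SLASH + STAR + (NOT STAR | STAR + STAR* + NOT (STAR | SLASH))* + STAR + STAR* + SLASH`.

## Finite Automaton

- DFA: exactly one possible action (transition) for each state and input character.
- NFA
  - There could be multiple choices at each step.
  - Assign priorities to the different accepting states (tokens) to break ties when more than one token matches.
- RE to NFA: Thompson's construction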
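
The scanning ideas in these notes (an enum of token kinds, one-character backtracking in the spirit of `unputc`, and the block-comment pattern run as a small two-state automaton) can be sketched as a hand-written scanner. This is a minimal sketch, not a reference implementation: `TokenKind`, `Scanner`, `getc`, `tokenize`, and the `KEYWORDS` set are hypothetical names chosen for illustration, and the token rules are simplified assumptions (integers only, strings without escape sequences).

```python
from enum import Enum, auto


class TokenKind(Enum):
    KEYWORD = auto()
    IDENTIFIER = auto()
    NUMBER = auto()
    STRING = auto()
    COMMENT = auto()
    WHITESPACE = auto()
    SLASH = auto()


KEYWORDS = {"if", "else", "while", "return"}  # hypothetical keyword set


class Scanner:
    def __init__(self, text):
        self.text = text
        self.pos = 0

    def getc(self):
        """Return the next character, or '' at end of input."""
        if self.pos < len(self.text):
            ch = self.text[self.pos]
            self.pos += 1
            return ch
        return ""

    def unputc(self, ch):
        """Backtrack by one character (no-op for the end-of-input marker)."""
        if ch:
            self.pos -= 1

    def _take_while(self, pred):
        # Consume characters while pred holds, then push back the one
        # character that was read too far.
        ch = self.getc()
        while ch and pred(ch):
            ch = self.getc()
        self.unputc(ch)

    def _scan_string(self):
        # Assumption: no escape sequences inside strings.
        ch = self.getc()
        while ch and ch != '"':
            ch = self.getc()
        if ch == "":
            raise SyntaxError("unterminated string")

    def _scan_block_comment(self):
        # Two-state automaton: seen_star records whether the previous
        # character was '*'; '/' in that state ends the comment.
        seen_star = False
        while True:
            ch = self.getc()
            if ch == "":
                raise SyntaxError("unterminated comment")
            if seen_star and ch == "/":
                return
            seen_star = ch == "*"

    def next_token(self):
        start = self.pos
        ch = self.getc()
        if ch == "":
            return None
        if ch.isspace():
            self._take_while(str.isspace)
            kind = TokenKind.WHITESPACE
        elif ch.isalpha() or ch == "_":
            self._take_while(lambda c: c.isalnum() or c == "_")
            word = self.text[start:self.pos]
            # Priority rule: a keyword wins over an identifier.
            kind = TokenKind.KEYWORD if word in KEYWORDS else TokenKind.IDENTIFIER
        elif ch.isdigit():
            self._take_while(str.isdigit)
            kind = TokenKind.NUMBER
        elif ch == '"':
            self._scan_string()
            kind = TokenKind.STRING
        elif ch == "/":
            nxt = self.getc()
            if nxt == "*":
                self._scan_block_comment()
                kind = TokenKind.COMMENT
            else:
                self.unputc(nxt)  # not a comment: give the character back
                kind = TokenKind.SLASH
        else:
            raise SyntaxError(f"unexpected character {ch!r} at offset {start}")
        return kind, self.text[start:self.pos]


def tokenize(text):
    scanner = Scanner(text)
    tokens = []
    tok = scanner.next_token()
    while tok is not None:
        tokens.append(tok)
        tok = scanner.next_token()
    return tokens
```

Calling `tokenize('if x1 /* a**/ 42 / "hi"')` yields `(kind, lexeme)` pairs, including whitespace tokens. The keyword check after scanning an identifier is one simple way to give one accepting state priority over another, and the `/` branch shows why backtracking is needed: the scanner cannot tell a division sign from the start of a comment until it has read one more character.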