lcs - Samuel's Vault

# Longest-Common-Subsequence Problem https://leetcode.com/problems/longest-common-subsequence/ ## Definition Given a sequence $X = \langle x_1, x_2, \dots, x_m \rangle$, another sequence $Z = \langle z_1, z_2, \dots, z_k \rangle$ is a _subsequence_ of $X$ if there exists a strictly increasing sequence $\langle i_1, i_2, \dots, i_k \rangle$ such that $x_{i_j} = z_j$. ## Algorithm - Optimal substructure: the LCS of two sequences contains within it an LCS of prefixes of the two sequences. - Recursive solution - If $x_m = y_n$, find LCS of $X_{m-1}$ and $Y_{n-1}$ - If $x_m \ne y_n$, find the LCS of $X_{m-1}$ and $Y_n$, and that of $X_m$ and $Y_{n-1}$. Pick the longer LCS. $ c[i, j] = \begin{cases} 0, & i = 0 \text{ or } j = 0, \\ c[i - 1, j - 1] + 1, & i, j > 0 \text{ and } x_i = y_i, \\ \max \{ c[i, j - 1], c[i - 1, j] \}, & i, j > 0 \text{ and } x_i \ne y_j t\end{cases} $ ## Approach 1: Memoization - Discuss whether `text1[p1]` is included in the optimal solution or not. - If it is, we can find the first occurrence of this char in `text2` and the problem is reduced. - Time complexity: $O(mn^2)$, as there are $mn$ possible pairs of strings, and solving each of them can take $O(n)$ time, as we're searching the occurrence of a char in a string of length $n$. ```python from functools import lru_cache m, n = len(text1), len(text2) @lru_cache() def helper(p1, p2): # Base case: empty strings if p1 == m or p2 == n: return 0 # Option 1: We don't include text1[p1] in the solution. option_1 = helper(p1 + 1, p2) # Option 2: We include text1[p1] in the solution, as long as a match for it # in text2 at or after p2 exists. first_occurence = text2.find(text1[p1], p2) if first_occurence != -1: option_2 = 1 + helper(p1 + 1, first_occurence + 1) else: option_2 = 0 # Return the best option. return max(option_1, option_2) return helper(0, 0) ``` ## Approach 2: Improved Memoization A more intuitive way of structuring the subproblems. - If the current char is the same, consider them a match. This must be part of the optimal solution. - If not, either skip a char in `text1` or skip a char in `text2`. Time complexity becomes $O(mn)$, as there are $mn$ combinations of substrings (or suffixes, more specifically.) ```python from functools import lru_cache m, n = len(text1), len(text2) @lru_cache(maxsize=None) def memo_solve(p1, p2): # Base case: If either string is now empty, we can't match up anymore # characters. if p1 == m or p2 == n: return 0 # Recursive case 1. if text1[p1] == text2[p2]: return 1 + memo_solve(p1 + 1, p2 + 1) # Recursive case 2. else: return max(memo_solve(p1, p2 + 1), memo_solve(p1 + 1, p2)) return memo_solve(0, 0) ``` ## Approach 3: Bottom-Up [[dp|DP]] - Starting from the final char on both strings. Loop in reversed order. - First letter is the same? add `1` to length of LCS - First letter not the same? Have to kick a letter out, either from `str1` or `str2`. Then take the max of sub problems. ```python # Make a grid of 0's with len(text2) + 1 columns and len(text1) + 1 rows. dp = [[0] * (len(text2) + 1) for _ in range(len(text1) + 1)] # Iterate up each column, starting from the last one. for col in reversed(range(len(text2))): for row in reversed(range(len(text1))): # If the corresponding characters for this cell are the same... if text2[col] == text1[row]: dp[row][col] = 1 + dp[row + 1][col + 1] # Otherwise they must be different... else: dp[row][col] = max(dp[row + 1][col], dp[row][col + 1]) # The original problem's answer is in dp[0][0]. Return it. return dp[0][0] ``` ## Approach 4: Bottom-Up DP with Space Optimization - Note that we're only referencing the previously calculated column and the current column that we're calculating. ```python if len(text2) < len(text1): text1, text2 = text2, text1 # The previous column starts with all 0's and like before is 1 # more than the length of the first word. previous = [0] * (len(text1) + 1) # Iterate up each column, starting from the last one. for col in reversed(range(len(text2))): # Create a new array to represent the current column. current = [0] * (len(text1) + 1) for row in reversed(range(len(text1))): if text2[col] == text1[row]: current[row] = 1 + previous[row + 1] # previous col is to the right else: current[row] = max(previous[row], current[row + 1]) # The current column becomes the previous one. previous = current # The original problem's answer is in previous[0]. Return it. return previous[0] ``` An slightly adjusted version: ```python if len(text2) < len(text1): text1, text2 = text2, text1 previous = [0] * (len(text1) + 1) current = [0] * (len(text1) + 1) for col in reversed(range(len(text2))): for row in reversed(range(len(text1))): if text2[col] == text1[row]: current[row] = 1 + previous[row + 1] else: current[row] = max(previous[row], current[row + 1]) previous, current = current, previous return previous[0] ```