Python – Longest Common Subsequence

The Longest Common Subsequence (LCS) problem is a classic dynamic programming challenge. It involves finding the longest subsequence common to two sequences, where the subsequence does not need to occupy consecutive positions in the original sequences. This problem is widely used in areas such as text comparison and bioinformatics.

Problem Statement

Given two sequences, X and Y, the task is to find the length of the longest subsequence that is common to both sequences. If there is no common subsequence, the result should be 0.

Sample Input and Output

Example 1:

</>
Copy
# Input:
X = "ABCBDAB"
Y = "BDCABC"

# One of the possible LCS is "BCAB"
# Output: 4

Explanation: The longest common subsequence between “ABCBDAB” and “BDCABC” is “BCAB”, which has a length of 4. Note that there can be multiple valid LCS with the same length.

Example 2:

</>
Copy
# Input:
X = "AGGTAB"
Y = "GXTXAYB"

# One of the possible LCS is "GTAB"
# Output: 4

Explanation: The longest common subsequence between “AGGTAB” and “GXTXAYB” is “GTAB”, which also has a length of 4.

Solution Approach

To solve the LCS problem using dynamic programming, we construct a 2D table dp where dp[i][j] represents the length of the LCS for the first i characters of sequence X and the first j characters of sequence Y. The recurrence relation is defined as follows:

if X[i-1] == Y[j-1]: dp[i][j] = dp[i-1][j-1] + 1 else: dp[i][j] = max(dp[i-1][j], dp[i][j-1])

We initialize the first row and first column of the table with 0 because an empty sequence has an LCS of length 0 with any other sequence.

Python Program

Iterative Dynamic Programming Approach

</>
Copy
def lcs_length(X, Y):
    """
    Compute the length of the longest common subsequence (LCS) between sequences X and Y.
    
    Parameters:
    X (str): First sequence.
    Y (str): Second sequence.
    
    Returns:
    int: Length of the LCS.
    """
    m = len(X)
    n = len(Y)
    
    # Initialize a (m+1) x (n+1) DP table with all values as 0.
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    
    # Build the dp table in bottom-up manner.
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if X[i - 1] == Y[j - 1]:
                dp[i][j] = dp[i - 1][j - 1] + 1
            else:
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])
    
    return dp[m][n]

# Test examples
print("LCS Length Example 1:")
X1 = "ABCBDAB"
Y1 = "BDCABC"
print("LCS length for X = '{}' and Y = '{}' is:".format(X1, Y1), lcs_length(X1, Y1))
# Expected output: 4

print("\nLCS Length Example 2:")
X2 = "AGGTAB"
Y2 = "GXTXAYB"
print("LCS length for X = '{}' and Y = '{}' is:".format(X2, Y2), lcs_length(X2, Y2))
# Expected output: 4

Explanation: In this solution, we create a 2D list dp where each entry dp[i][j] holds the length of the LCS for the substrings X[0:i] and Y[0:j]. We iterate through each character of both sequences. If the characters match, we add 1 to the result of the previous subproblem dp[i-1][j-1]; otherwise, we take the maximum of the two possible subproblems dp[i-1][j] and dp[i][j-1]. The final answer is found at dp[m][n], where m and n are the lengths of X and Y, respectively.

Conclusion

In this tutorial, we explored the Longest Common Subsequence problem and demonstrated how to solve it using a dynamic programming approach. We walked through the problem statement, examined sample inputs and outputs with detailed explanations, and implemented an iterative solution in Python. This approach, with a time complexity of O(m*n), is effective for solving the LCS problem and is a common topic in technical interviews.