Longest Common Subsequence

Subsequence
A sequence in mathematics is a an enumerated collection of objects. Unlike sets, they are ordered and can have repeated elements. For example,
A subsequence in mathematics is a sequence that can be derived from another sequence by deleting certain elements without changing the order of the remaining elements. For example,
Longest Common Subsequence
The Longest common subsequence (LCS) of 2 sequences is a subsequence, with maximal length, which is common to both the sequences. The subsequence must appear in the same order in both the sequences and need not be continuous.
For example, for the two strings given below, the length of the LCS is 6.
LCS isn't necessarily unique, there can be more than one common subsequence with the maximum length.
For example, for the two strings given below, the length of the LCS is 3.
Brute Force Solution:
For every subsequence of , check whether it is a subsequence of . Consider two sequences and of length and respectively.
There are subsequences to check in and it takes time to check whether each of these subsequences is a subsequence of or not. This gives a time complexity of:
Dynamic Programming Solution:
The LCS problem follows the optimal substructure property i.e. it can be broken down into smaller subproblems until the solution for the subproblem becomes trivial. It also has overlapping subproblems. Thus, the LCS problem can be solved using dynamic programming by storing the results for the subproblems and reusing them instead of computing them over and over. Consider,
Let be the LCS of A and B of length p.
, and represent the prefixes A[1..i], B[1..j] and C[1..k].
- If then
- If then
We save the results for the subproblems in a 2D array so that we don't need to recompute it again and again.
stores the length of the LCS for the prefixes and
Recursive Formulation
stores the length of the LCS of A and B.
Printing the LCS
To find the length of the LCS, we used a 2D table which stores the length of the LCS for the prefixes. We can use this table to print the LCS for the two sequences.
For strings
The LCS table for these two strings is
0 0 0 0 0 0
0 1 1 1 1 1
0 1 1 1 2 2
0 1 2 2 2 3
0 1 2 2 3 3
0 1 2 3 3 3
0 1 2 3 4 4
We start with the bottom right corner i.e. LCS[n][m].
If the then add this character to the and go to position Otherwise compare and LCS[i][j-1] and go to the position with higher value (The value of will be equal to this value) .
Pseudocode:
i=n,j=m
index=LCS[n][m] //We start from the bottom right corner and add the characters to the LCS from the last
while(i>0 and j>0)
if(a[i]=b[j])
LCS_String[index]=a[i]
index=index-1
i=i-1
j=j-1
else
if(LCS[i][j-1]=LCS[i][j])
j=j-1
else
i=i-1