• + 0 comments

    I won't post the code here, but here's how I solved it:

    1. Setup (spliting words in spaces and special characters)
    2. Create a dictionary counting the ammount of words for each text
    3. Create a global dictionary for the total ammount of words
    4. Normalize each local dictionary, considering the global frequency
    5. Considering each word as a vector, calculate cosine distance between every text from setA versus setB.
    6. Create similarity matrix
    7. Find the permutation matrix to maximize PxS

    Got 100% with this approach