Finding the similarity between documents is an important task. With the rise of machine learning, there are sophisticated techniques that use RNNs to capture the grammar and flow of two documents and determine how similar they are. This might sound like a difficult task, but we have been doing it since the 1980s using Natural Language Processing techniques. In this article, we will use NLP techniques like TF-IDF to achieve this.
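
To give a rough idea of what this looks like in practice, here is a minimal sketch of TF-IDF with cosine similarity in plain Python. It is not taken from the original article: the function names (`tokenize`, `tfidf_vectors`, `cosine_similarity`), the whitespace tokenizer, and the smoothed IDF formula `log((1 + N) / (1 + df)) + 1` are all assumptions chosen for illustration.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: lowercase + whitespace split is good enough for a demo.
    return text.lower().split()


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (dict of term -> weight) per document."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)

    # Document frequency: in how many documents does each term appear?
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))

    # Smoothed IDF (an illustrative choice, not the only possible formula).
    idf = {t: math.log((1 + n_docs) / (1 + c)) + 1 for t, c in df.items()}

    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        # Term frequency is the count normalised by document length.
        vectors.append({t: (c / total) * idf[t] for t, c in counts.items()})
    return vectors


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    docs = [
        "the cat sat on the mat",
        "the cat sat on the sofa",
        "stock prices fell sharply today",
    ]
    vecs = tfidf_vectors(docs)
    print(cosine_similarity(vecs[0], vecs[1]))  # two similar sentences
    print(cosine_similarity(vecs[0], vecs[2]))  # no words in common
```

Documents that share many weighted terms score close to 1, and documents with no terms in common score 0. A real pipeline would likely add better tokenization and stop-word handling, which this sketch leaves out.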
This is a companion discussion topic for the original entry at http://iq.opengenus.org/document-similarity-tf-idf/