Home

This site covers a variety of data science, data engineering, and visualization practices and methods being deployed at digital-first companies.

Content

Identifying Document Similarities from Text Data with Python

A common question for publishers and other companies that analyze large volumes of text data is a seemingly simple one: how similar are documents to one another? The use cases for these algorithms are abundant, from identifying plagiarism to avoiding duplicate-content warnings from search engines, generating contextual document recommendations, and tasks like …
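To make "how similar are two documents?" concrete, here is a minimal sketch in plain Python. It assumes a bag-of-words representation scored with cosine similarity. That is one common choice, not necessarily the method the full post uses. The function names `tokenize` and `cosine_similarity` are hypothetical and defined here for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine_similarity(doc_a: str, doc_b: str) -> float:
    """Cosine similarity of two documents' raw term-count vectors (0.0 to 1.0).

    Hypothetical helper: a simple bag-of-words baseline, not a library API.
    """
    counts_a = Counter(tokenize(doc_a))
    counts_b = Counter(tokenize(doc_b))
    # Dot product over the terms both documents share.
    shared = counts_a.keys() & counts_b.keys()
    dot = sum(counts_a[term] * counts_b[term] for term in shared)
    # Euclidean length of each term-count vector.
    norm_a = math.sqrt(sum(c * c for c in counts_a.values()))
    norm_b = math.sqrt(sum(c * c for c in counts_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat an empty document as similar to nothing
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    print(round(cosine_similarity("The cat sat on the mat.", "The cat sat on a mat."), 3))  # 0.866
    print(cosine_similarity("The cat sat on the mat.", "Stock prices fell sharply today."))  # 0.0
```

Near-duplicates score close to 1.0, and documents with no shared words score 0.0. Raw counts let frequent words such as "the" dominate the score. Weighting terms by TF-IDF before taking the cosine is the usual refinement for tasks like plagiarism or duplicate-content detection.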