TF-IDF (Term Frequency-Inverse Document Frequency) combines two metrics: term frequency (TF) and inverse document frequency (IDF). It applies when we have a collection of documents rather than a single one. It is based on the idea that words that are rare across the collection contain more information about the content of a document than words that are used many times throughout all the documents.

A problem with scoring word frequency is that highly frequent words start to dominate in the document, but may not contain as much “informational content” to the model as rarer but perhaps domain-specific words. …
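To make the idea concrete, here is a minimal sketch in plain Python. It assumes one common textbook variant that the text above does not spell out: TF is a term's count divided by the document length, and IDF is the natural log of the number of documents divided by the number of documents containing the term. The function name `tf_idf` and the three sample sentences are made up for illustration.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per document.

    Assumed variant: tf = count / document length, idf = ln(N / df).
    """
    # Naive tokenization: lowercase and split on whitespace.
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)

    # Document frequency: in how many documents does each term appear?
    df = Counter(term for tokens in tokenized for term in set(tokens))

    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        weights.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


# Hypothetical toy corpus.
docs = [
    "the cat sat on the mat",
    "the dog chased the cat",
    "the stock market fell",
]
scores = tf_idf(docs)

for doc, doc_scores in zip(docs, scores):
    print(doc)
    for term, weight in sorted(doc_scores.items(), key=lambda kv: -kv[1]):
        print(f"  {term:8s} {weight:.4f}")
```

In this toy corpus "the" is the most frequent word, yet it appears in every document, so its IDF is ln(3/3) = 0 and its weight drops to zero. A word like "stock", which occurs in only one document, gets the highest weight there. Library implementations such as scikit-learn's `TfidfVectorizer` use a smoothed IDF and normalization by default, so their numbers will differ from this sketch.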










Ning Chen

