Talk:Tf–idf

From Wikipedia, the free encyclopedia

Usually, the term frequency is just the count of a term in a document (NOT divided by the total number of terms in the document), which is confusing because it isn't really a frequency.

I strongly agree. In all the technical papers I've been reading for my Internet services class at U.Washington, TF is the raw count, so TF*IDF is biased toward longer documents (it usually has higher values for them) and therefore needs to be normalized.
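To make the length bias concrete, here is a minimal sketch in Python. The function names (raw_tf, normalized_tf, idf), the toy documents, and the unsmoothed log(N/df) form of IDF are all assumptions chosen for illustration; they are not taken from the article or from any particular paper.

```python
import math
from collections import Counter

def raw_tf(term, doc_tokens):
    """Term 'frequency' as a plain count of the term in the document."""
    return Counter(doc_tokens)[term]

def normalized_tf(term, doc_tokens):
    """Count divided by document length, i.e. a true relative frequency."""
    return raw_tf(term, doc_tokens) / len(doc_tokens) if doc_tokens else 0.0

def idf(term, corpus):
    """Unsmoothed inverse document frequency: log(N / df). Assumed variant."""
    df = sum(1 for doc in corpus if term in doc)
    return math.log(len(corpus) / df) if df else 0.0

# Hypothetical toy corpus: long_doc is short_doc repeated ten times,
# so both have the same content and differ only in length.
short_doc = "cat sat on the mat".split()
long_doc = short_doc * 10
other_doc = "dogs bark at the moon".split()
corpus = [short_doc, long_doc, other_doc]

term = "cat"
weight = idf(term, corpus)

# With raw counts, the longer document scores ten times higher.
raw_short = raw_tf(term, short_doc) * weight
raw_long = raw_tf(term, long_doc) * weight

# With length-normalized counts, the two documents score the same.
norm_short = normalized_tf(term, short_doc) * weight
norm_long = normalized_tf(term, long_doc) * weight

print(raw_short, raw_long)
print(norm_short, norm_long)
```

Dividing by document length is only one possible normalization; it is used here because it is the one the first comment mentions.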

lowercase

Why is the title of the article in lower case? Why not "TF-IDF"? --ajvol 15:29, 25 November 2006 (UTC)

  • I believe the short story is that tf-idf is a well-known function in the literature, and that is how it is referred to there. I know that in some cases the lowercase form is used to help differentiate it from the uppercase variations that are sometimes used to refer to other equations. Josh Froelich 03:19, 11 December 2006 (UTC)