Algorithm | Document representation | Similarity measure | Data set |
---|---|---|---|
Existing algorithms | |||
SHC Gad and Kamel (2010) | Term weight (word/phrase relationship) | Semantic Similarity | Reuters-21578 and 20-Newsgroups |
ESHC-IntraCVS Gavin and Yue (2009) | Term frequency | Cosine Similarity | UW-CAN dataset, 314 web pages from University of Waterloo |
CBA Shehata (2010); Shehata et al. (2010) | Verb argument structure | Concept similarity measure | ACM abstract articles, Reuters, Brown corpus, Usenet newsgroups |
ICA Liu et al. (2008) | Term occurrence | Jaccard coefficient | 20-Newsgroups corpus |
Proposed algorithms | |||
TMARDC | Term frequency | MARDL, sentence similarity | ACM abstract articles, 20-Newsgroups |
CCMARC | Correlated terms | Semantic similarity | ACM abstract articles, 20-Newsgroups |
CCFICA | Correlated terms | Semantic similarity | ACM abstract articles, 20-Newsgroups |
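
To make two of the similarity measures named in the table concrete, the following is a minimal Python sketch of cosine similarity over term-frequency vectors and the Jaccard coefficient over term-occurrence sets. It uses the generic textbook definitions only; the whitespace tokenizer, the function names, and the two sample sentences are assumptions made for illustration and are not taken from any of the cited algorithms.

```python
import math
from collections import Counter


def term_frequencies(text):
    """Bag of words from a naive lowercase whitespace split (illustrative only)."""
    return Counter(text.lower().split())


def cosine_similarity(tf_a, tf_b):
    """Cosine of the angle between two term-frequency vectors."""
    # Counter returns 0 for missing terms, so unshared terms contribute nothing.
    dot = sum(count * tf_b[term] for term, count in tf_a.items())
    norm_a = math.sqrt(sum(c * c for c in tf_a.values()))
    norm_b = math.sqrt(sum(c * c for c in tf_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def jaccard_coefficient(terms_a, terms_b):
    """Shared terms divided by all distinct terms; frequencies are ignored."""
    a, b = set(terms_a), set(terms_b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


# Hypothetical sample documents, chosen only to exercise the functions.
doc_a = term_frequencies("clustering groups similar documents")
doc_b = term_frequencies("clustering groups related documents")

print(cosine_similarity(doc_a, doc_b))
print(jaccard_coefficient(doc_a, doc_b))
```

Cosine similarity weights each term by how often it occurs, whereas the Jaccard coefficient considers only whether a term occurs at all, which matches the term frequency versus term occurrence distinction in the document representation column.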