Table 2 Techniques adopted in existing and proposed algorithms

From: Correlated concept based dynamic document clustering algorithms for newsgroups and scientific literature

| Algorithm | Document representation | Similarity measure | Data set |
|---|---|---|---|
| Existing algorithms | | | |
| SHC (Gad and Kamel 2010) | Term weight (word/phrase relationship) | Semantic similarity | Reuters-21578 and 20-Newsgroups |
| ESHC-IntraCVS (Gavin and Yue 2009) | Term frequency | Cosine similarity | UW-CAN dataset, 314 web pages from University of Waterloo |
| CBA (Shehata 2010; Shehata et al. 2010) | Verb argument structure | Concept similarity measure | ACM abstract articles, Reuters, Brown corpus, Usenet newsgroups |
| ICA (Liu et al. 2008) | Term occurrence | Jaccard coefficient | 20NewsGroup corpus |
| Proposed algorithms | | | |
| TMARDC | Term frequency | MARDL, sentence similarity | ACM abstract articles, 20Newsgroup |
| CCMARC | Correlated terms | Semantic similarity | ACM abstract articles, 20Newsgroup |
| CCFICA | Correlated terms | Semantic similarity | ACM abstract articles, 20Newsgroup |
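
Two of the similarity measures named in the table, cosine similarity over term frequencies and the Jaccard coefficient over term occurrence, have standard textbook definitions that a short sketch can make concrete. The code below is only an illustration under stated assumptions: whitespace tokenization, raw counts with no weighting, and hypothetical function names and sample documents. It is not the implementation used by any of the algorithms listed.

```python
import math
from collections import Counter


def term_frequencies(text):
    """Lowercase, split on whitespace, and count each term (assumed tokenization)."""
    return Counter(text.lower().split())


def cosine_similarity(tf_a, tf_b):
    """Cosine of the angle between two term-frequency vectors."""
    # A Counter returns 0 for a missing term, so unshared terms add nothing.
    dot = sum(count * tf_b[term] for term, count in tf_a.items())
    norm_a = math.sqrt(sum(c * c for c in tf_a.values()))
    norm_b = math.sqrt(sum(c * c for c in tf_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def jaccard_coefficient(terms_a, terms_b):
    """Number of shared terms divided by the number of distinct terms overall."""
    a, b = set(terms_a), set(terms_b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


# Hypothetical sample documents, chosen only for illustration.
doc_a = "clustering groups similar documents"
doc_b = "clustering groups related documents"
tf_a, tf_b = term_frequencies(doc_a), term_frequencies(doc_b)
cos = cosine_similarity(tf_a, tf_b)    # uses term counts
jac = jaccard_coefficient(tf_a, tf_b)  # uses only term presence
```

Cosine similarity takes how often each term occurs into account, while the Jaccard coefficient only asks whether a term is present, which matches the table's pairing of the two measures with term frequency and term occurrence respectively.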