Table 1 Survey of recent document clustering algorithms

From: Correlated concept based dynamic document clustering algorithms for newsgroups and scientific literature

| Algorithm name with author(s) | Technical abbreviation | Representation | Similarity measure | Data set used |
| --- | --- | --- | --- | --- |
| Threshold Resilient Online Algorithm, Chou and Chen (2008) | IPLSI (Incremental Probabilistic Latent Semantic Indexing) | Latent semantic variables | A latent variable is introduced between documents and terms; cosine function | NIST TDT corpora |
| Efficient Phrase-Based Indexing, Hammouda and Kamel (2004) | Uses DIG (Document Index Graph) for web clustering | Document Index Graph (phrase-based representation) | Phrase-based similarity measure | USENET newsgroups |
| Component-Based Clustering Algorithms, Boris et al. (2012) | IR (Initial Representative), MD (Measure Distance), UR (Update Representatives), EC (Evaluate Clusters), SC (Stop Criterion) | Object-based software representation | CITY, CORREL, COSINE, EUCLID | 10 UCI datasets |
| Temporal Queries and Version Management, Zaniolo and Wang (2008) | XML techniques | V-Document (XML document) | ---- | W3C, World Fact Book |
| Density-Based Methods for Hierarchical Clustering, Chehreghani and Abolhassani (2008) | 3 phases: insertion phase, extraction phase, combination phase | M-tree structure | Relative distance between objects | DMOZ, NEWS, REUTERS |
| XML Schema Matching Algorithm, Alsayed et al. (2009) | NPS (Number Prufer Sequences), LPS (Label Prufer Sequences) | Prufer sequences, schema trees | The distance between two nodes in the schema tree | XCBL, OAGIS |
| Novel Web User Clustering Method, Ling et al. (2009) | A 3-phase COWES algorithm | A web session subtree | DoC (Degree of Change), FoC (Frequency of Change) and SoC (Significance of Change) | Internet Traffic Archive |
| Multi-label Document Clustering Algorithm, Chen et al. (2010) | FMDC (Fuzzy Based Multi-label Document Clustering): fuzzy association rule + existing ontology | Terms and hypernyms representation of documents | Membership functions and document-term matrix | Classic, Re0, R8, and WebKB |
| Incremental Construction of Multilingual Topic Maps, Ellouze et al. (2012) | CITOM (Construction Incremental Topic Map) | Topic map model representation | Topic map pruning process | Multilingual corpora |
| Feature Extraction Algorithm, Yan et al. (2011) | TOFA (Trace-Oriented Feature Analysis) | Bag-of-words model (BOW) | Latent Semantic Indexing (LSI) | 20NG, RCV1, ODP |
| Correlation Similarity Measure Space, Zhang et al. (2011) | CPI (Correlation Preserving Index) | Terms and related terms | Correlation similarity | 20NG |
| Contextual Document Cluster, Rooney et al. (2006) | CDC (Contextual Document Cluster) | Term-document representation | Adjacent document similarity | RCV1 |
| Framework of Wikipedia-Based Clustering, Hu et al. (2009) | Exact-match and relatedness-match | Concept feature vector and category feature vector | Complete linkage as cluster distance measure | 20-newsgroup, TDT2, LA Times |
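
Several rows of Table 1 name cosine or correlation similarity as the measure. The following is a minimal sketch of those two measures on term-frequency vectors, not a reproduction of any surveyed algorithm. The function names and the toy vectors `doc_a`, `doc_b`, and `doc_c` are illustrative assumptions and do not come from the cited papers.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length term vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        # Convention chosen for this sketch: an all-zero vector is similar to nothing.
        return 0.0
    return dot / (norm_u * norm_v)


def correlation_similarity(u, v):
    """Pearson correlation, computed as the cosine of the mean-centred vectors."""
    mean_u = sum(u) / len(u)
    mean_v = sum(v) / len(v)
    centred_u = [a - mean_u for a in u]
    centred_v = [b - mean_v for b in v]
    return cosine_similarity(centred_u, centred_v)


# Hypothetical term-frequency vectors over a four-term vocabulary.
doc_a = [2, 0, 1, 3]
doc_b = [4, 0, 2, 6]   # same term proportions as doc_a
doc_c = [0, 5, 0, 0]   # shares no terms with doc_a

print(cosine_similarity(doc_a, doc_b))
print(cosine_similarity(doc_a, doc_c))
print(correlation_similarity(doc_a, doc_c))
```

Cosine similarity ignores document length, so `doc_a` and `doc_b` score as near-identical. For non-negative term counts it cannot go below zero, whereas correlation similarity can be negative because the vectors are centred first.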