
Table 1 Survey of recent document clustering algorithms

From: Correlated concept based dynamic document clustering algorithms for newsgroups and scientific literature

| Algorithm name with author(s) | Technical abbreviation | Representation | Similarity measure | Data sets used |
|---|---|---|---|---|
| Threshold Resilient Online Algorithm, Chou and Chen (2008) | IPLSI (Incremental Probabilistic Latent Semantic Indexing) | Latent semantic variables | A latent variable is introduced between documents and terms; cosine function | NIST TDT corpora |
| Efficient Phrase-Based Indexing, Hammouda and Kamel (2004) | DIG (Document Index Graph) for Web clustering | Document Index Graph (phrase-based representation) | Phrase-based similarity measure | USENET newsgroups |
| Component-Based Clustering Algorithms, Boris et al. (2012) | IR (Initial Representative), MD (Measure Distance), UR (Update Representatives), EC (Evaluate Clusters), SC (Stop Criterion) | Object-based software representation | CITY, CORREL, COSINE, EUCLID | 10 UCI data sets |
| Temporal Queries and Version Management, Zaniolo and Wang (2008) | XML techniques | V-Document (XML document) | Not specified | W3C, World Factbook |
| Density-Based Methods for Hierarchical Clustering, Chehreghani and Abolhassani (2008) | Three phases: insertion, extraction, combination | M-tree structure | Relative distance between objects | DMOZ, NEWS, REUTERS |
| XML Schema Matching Algorithm, Alsayed et al. (2009) | NPS (Number Prüfer Sequences), LPS (Label Prüfer Sequences) | Prüfer sequences, schema trees | Distance between two nodes in the schema tree | xCBL, OAGIS |
| Novel Web User Clustering Method, Ling et al. (2009) | Three-phase COWES algorithm | Web session subtree | DoC (Degree of Change), FoC (Frequency of Change), SoC (Significance of Change) | Internet Traffic Archive |
| Multi-label Document Clustering Algorithm, Chen et al. (2010) | FMDC (Fuzzy-based Multi-label Document Clustering): fuzzy association rules plus an existing ontology | Terms and hypernyms representation of documents | Membership functions and document-term matrix | Classic, Re0, R8, WebKB |
| Incremental Construction of Multilingual Topic Maps, Ellouze et al. (2012) | CITOM (Construction Incremental Topic Map) | Topic map model representation | Topic map pruning process | Multilingual corpora |
| Feature Extraction Algorithm, Yan et al. (2011) | TOFA (Trace-Oriented Feature Analysis) | Bag-of-words (BOW) model | Latent Semantic Indexing (LSI) | 20NG, RCV1, ODP |
| Correlation Similarity Measure Space, Zhang et al. (2011) | CPI (Correlation Preserving Index) | Terms and related terms | Correlation similarity | 20NG |
| Contextual Document Cluster, Rooney et al. (2006) | CDC (Contextual Document Cluster) | Term-document representation | Adjacent document similarity | RCV1 |
| Framework of Wikipedia-Based Clustering, Hu et al. (2009) | Exact-match and relatedness-match | Concept feature vector and category feature vector | Complete linkage as cluster distance measure | 20 Newsgroups, TDT2, LA Times |
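Several of the surveyed methods compare documents with standard vector measures: city-block, correlation, cosine and Euclidean, plus complete linkage as a distance between clusters. The sketch below shows one common textbook definition of each on plain term-frequency vectors. It is an illustration under stated assumptions, not the surveyed authors' implementations: the function names are hypothetical, "CORREL" is assumed to mean Pearson correlation, and the vectors are assumed to be non-empty and of equal length.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine of the angle between two term-frequency vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # assumed convention: an all-zero vector is similar to nothing
    return dot / (norm_a * norm_b)


def euclidean_distance(a: Vector, b: Vector) -> float:
    """Straight-line (L2) distance between two vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def city_block_distance(a: Vector, b: Vector) -> float:
    """Manhattan (L1) distance: sum of absolute coordinate differences."""
    return sum(abs(x - y) for x, y in zip(a, b))


def correlation_similarity(a: Vector, b: Vector) -> float:
    """Pearson correlation, computed as the cosine of the mean-centred vectors."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    return cosine_similarity([x - mean_a for x in a], [y - mean_b for y in b])


def complete_linkage(cluster_a: List[Vector], cluster_b: List[Vector],
                     distance=euclidean_distance) -> float:
    """Cluster distance = the largest pairwise distance between members."""
    return max(distance(a, b) for a in cluster_a for b in cluster_b)


# Hypothetical four-term documents, used only for illustration.
doc1 = [2, 0, 1, 3]
doc2 = [1, 0, 1, 2]
doc3 = [0, 4, 0, 0]
print(round(cosine_similarity(doc1, doc2), 3))        # 0.982
print(city_block_distance(doc1, doc2))                # 2
print(round(correlation_similarity(doc1, doc2), 3))   # 0.949
print(round(complete_linkage([doc1, doc2], [doc3]), 3))  # 5.477
```

Note that libraries often expose correlation and cosine as distances (one minus the similarity) rather than similarities, so the sign convention should be checked against whichever tool is used.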