Data Mining
Cluster Analysis: Advanced Concepts
and Algorithms
Lecture Notes for Chapter 8
Introduction to Data Mining
by
Tan, Steinbach, Kumar
© Tan,Steinbach, Kumar Introduction to Data Mining 4/18/2004 *
Hierarchical Clustering: Revisited
CURE: Another Hierarchical Approach
CURE
Graph-Based Clustering
Graph-Based Clustering: Sparsification
Graph-Based Clustering: Sparsification …
Limitations of Current Merging Schemes
Chameleon: Steps
Chameleon: Steps …
ROCK (RObust Clustering using linKs)
Obtain a sample of points from the data set
Compute the link value for each set of points, i.e., transform the original similarities (computed by Jaccard coefficient) into similarities that reflect the number of shared neighbors between points
Perform an agglomerative hierarchical clustering on the data using the “number of shared neighbors” as similarity measure and maximizing “the shared neighbors” objective function
Assign the remaining points to the clusters that have been found
Jarvis-Patrick Clustering
SNN Clustering Algorithm
Compute the similarity matrix
This corresponds to a similarity graph with data points for nodes and edges whose weights are the similarities between data points
Sparsify the similarity matrix by keeping only the k most similar neighbors
This corresponds to only keeping the k strongest links of the similarity graph
Construct the shared nearest neighbor graph from the sparsified similarity matrix.
At this point, we could apply a similarity threshold and find the connected components to obtain the clusters (Jarvis-Patrick algorithm)
Find the SNN density of each Point.
Using a user specified parameters, Eps, find the number points that have an SNN similarity of Eps or greater to each point. This is the SNN density of the point
SNN Clustering Algorithm …
Find the core points
Using a user specified parameter, MinPts, find the core points, i.e., all points that have an SNN density greater than MinPts
Form clusters from the core points
If two core points are within a radius, Eps, of each other they are place in the same cluster
Discard all noise points
All non-core points that are not within a radius of Eps of a core point are discarded
Assign all non-noise, non-core points to clusters
This can be done by assigning such points to the nearest core point
(Note that steps 4-8 are DBSCAN)
Features and Limitations of SNN Clustering