This web page is for the article "Finding Subspace Clusters using Ranked Neighborhoods" by Emin Aksehirli, Siegfried Nijssen, Matthijs van Leeuwen, and Bart Goethals which is published in Proceedings of Data Mining Workshop (ICDMW), 2015 IEEE International Conference on. The article can be found here.
Clustering high dimensional datasets is challenging due to the curse of dimensionality. One approach to address this challenge is to search for subspace clusters, i.e., clusters present in subsets of attributes. Recently the cartification algorithm was proposed to find such subspace clusters. The distinguishing feature of this algorithm is that it operates on a neighborhood database, in which for every object only the identities of the k closest objects are stored. Cartification was shown to produce better results than other state-of-the-art subspace clustering algorithms; however, which clusters it detects was also found to depend heavily on the setting of the parameters. In other words, it is not robust to input parameters.
In this paper, we propose a new approach called ranked cartification that produces more robust results than ordinary cartification. We develop a transformation that creates ranked matrices instead of neighborhood databases; we identify clusters in these ranked matrices.
We demonstrate that this method is more robust than cartification in terms of cluster detection.
Please find the pre-compiled and packaged implementation here and its source code here.
Carti-Rank is packaged as a runnable .jar file (carti-rank.jar). You can run the application
on command line with the commands,
java -jar carti-rank.jar data-file theta minlen
Carti-Rank accepts parameters as command line arguments in the specified order.
data-file: Path to the multi dimensional datafile. Please find the
theta: Parameter for the tile mining.
minsup: Minimum length for the cluster.
About the data file:
Carti-Rank outputs the found clusters to the standard output. Each line of output
represents a subspace cluster. Output format:
Size of the cluster - Dimensions of the cluster - Objects of the cluster
10 - 1 2 6 - 0 1 2 3 4 5 6 7 8 9 means a cluster is
detected at 1st, 2nd and 6th subspaces and it has '10' objects, i.e.,
0 1 2 3 4 5 6 7 8 9
java -jar cart-rank.jar ns25.mime 25 20
All the datasets we use in the paper are here.