Bart Goethals

Frequent Pattern Mining Implementations

The Apriori, DIC, Eclat and Fp-growth algorithms generate all frequent itemsets for a given minimal support threshold. The Rules algorithm generates all association rules for a given minimal confidence threshold.
Most are implemented in C++ (using the Standard Template Library).
A survey of most of these methods can be found in my "Survey on Frequent Pattern Mining".

IMPORTANT: All the software is developed for research purposes only!
Some bugs may be present within the software and no guarantees are given!
We would appreciate any comments, bug descriptions, suggestions or succes stories regarding the software.

Eclat (Download)
This implementation is based on the Eclat algorithm (cfr. Zaki et al., 1997).
Several optimizations have been added such as the use of diffsets.

A very short Python implementation can be found here
And an even shorter (tweetable) version is here:

def e(l,p):
 while l:i,t=l.pop();print(p+[i],len(t));e([(j,t&o)for j,o in l if len(t&o)>=m],p+[i])

Fp-growth (Download)
This implementation is based on the Fp-growth algorithm (cfr. Han et al., 2000).

Apriori (Download)
This implementation is based on the apriori algorithm (Agrawal et al., 1996).
Several optimizations have been added and a trie structure is used in stead of the hash-tree structures.
Note, this code only generates frequent sets! For rules, see below.

NDI (Download)
This implementation is based on the NDI algorithm (Calders and Goethals, 2002).
Several optimizations have been added, such as a fast algorithm for the inclusion-exclusion (see Calders and Goethals, KDID 2005).
The tar-ball contains the original breadth-first implementation, but also a depth-first implementation of the basic NDI algorithm (Calders and Goethals, SDM 2005). Note, however, that this depth-first implementation does not use the generalized diffsets as described in the latter paper.

DIC (Download)
This implementation is based on the Dynamic Itemset Counting (DIC) algorithm (cfr. Brin et al., 1997).
The implementation contains no additional optimizations ans seems to perform worse than Apriori on almost all datasets I have tested on. All suggestions or comments are welcome.

Rules (Download)
This implementation generates association rules, based on the Apriori algorithm (cfr. Agrawal et al.,1995). It takes as input a file of frequent sets in the format such as generated by the previous implementations.

Rank-correlated set mining (Download)
This implementation is based on the rank-correlated set mining technique for numerical attributes as described in the paper "Mining rank-correlated sets of numerical attributes" (Calders, Goethals, Jaroszewicz, 2006).

For other free implementations and experimental comparisons, take a look at the FIMI repository.

[Home]