FreeTreeMiner
In recent years, researchers in graph mining have been exploring linear paths as well as subgraphs as pattern languages. Here, we are investigating the middle ground between these two extremes: mining free (that is, unrooted) trees in graph data. The motivation for this is the need to upgrade linear path patterns, while avoiding complexity issues with subgraph patterns. Starting from such considerations, we developed FreeTreeMiner, a system for constraint-based graph mining working on top of a chemical library, and tested it on two datasets from the National Cancer Institute's Developmental Therapeutics Program (DTP), anti-HIV and anti-cancer screening data. The binary is available for download.
Publications
Rückert, U and Kramer, S
(2003).
Generalized Version Space Trees
In: Proceedings of the 2nd International Workshop on Knowledge Discovery in Inductive Databases (KDID-2003), ed. by Jean-Francois Boulicaut and Saso Dzeroski, pp. 119-129.
Rückert, U and Kramer, S
(2004).
Frequent Free Tree Discovery in Graph Data
In: Proceedings of the ACM Symposium on Applied Computing (SAC-2004), pp. 564-570.
Software
Download Linux Binary: FreeTreeMiner-binary
Source: FreeTreeMiner-source
FTM is licensed under the GNU General Public License (GPL) version 2. Please cite [RK04] if you are using the software in a publication.
gSpan'
More recently, we proposed two optimizations for mining molecular databases with gSpan. Both optimizations apply to the enumeration of subgraph occurrences in a graph database, which is, also according to our profiling, the most expensive operation of gSpan. The first optimization reduces the number of subgraph isomorphisms that need to be accessed for proper support computation in considering the symmetries inherent in many chemical molecules,and the second speeds up subgraph isomorphism tests by making use of the non-uniform frequency distribution of atom and bond types. The optimizations are part of a reimplementation of the original gSpan algorithm and are shown to significantly increase the performance on two chemical datasets. The software is made available under the GNU GPL.
Publications
Jahn, K and Kramer, S
(2005).
Optimizing gSpan for Molecular Datasets
In: Proceedings of the Third International Workshop on Mining Graphs, Trees and Sequences (MGTS-2005).
Software
Download Linux Binary & Source: gSpan'
gSpan' is licensed under the GNU General Public License (GPL) version 2. Please cite [JK05] if you are using the software in a publication.
