Johannes Fischer, Volker Heun, and Stefan Kramer (2005)
Fast Frequent String Mining Using Suffix Arrays
In: ICDM '05: Proceedings of the Fifth IEEE International Conference on Data Mining, pp. 609-612, Washington, DC, USA, IEEE Computer Society Press.
Mining frequent strings in databases has many interesting applications, e.g., in computational biology. We focus on a special kind of constraint-based frequent string mining, namely computing all strings that are frequent in one database and infrequent in another. We present a method that finds such strings using the suffix array and the LCP (longest common prefix) array, both of which can be computed very quickly and space-efficiently and which also exhibit good locality of reference. We test our method on several biologically relevant data sets and show that it outperforms existing methods in terms of both time and space.
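
To make the problem statement concrete, here is a minimal Python sketch of the mining task over a generalized suffix array and its LCP array. It is not the authors' algorithm: the arrays are built naively by sorting suffixes, the scan is quadratic, and the function name `mine_frequent_strings` and the thresholds `min_pos` and `max_neg` are hypothetical names chosen for illustration. Support here is assumed to mean the number of strings in a database that contain the pattern.

```python
def common_prefix_len(a, b):
    """Length of the longest common prefix of strings a and b."""
    n = min(len(a), len(b))
    i = 0
    while i < n and a[i] == b[i]:
        i += 1
    return i


def mine_frequent_strings(db_pos, db_neg, min_pos, max_neg):
    """Return {pattern: (support in db_pos, support in db_neg)} for every
    substring contained in at least min_pos strings of db_pos and in at
    most max_neg strings of db_neg. Illustrative only, not optimized."""
    # Generalized suffix array, built naively: every non-empty suffix of
    # every string, tagged with (database label, string index), sorted.
    entries = sorted(
        (s[k:], label, idx)
        for label, db in ((0, db_pos), (1, db_neg))
        for idx, s in enumerate(db)
        for k in range(len(s))
    )
    suffixes = [e[0] for e in entries]
    owners = [(e[1], e[2]) for e in entries]
    n = len(suffixes)

    # lcp[i] = longest common prefix of the suffixes at ranks i-1 and i.
    lcp = [0] * n
    for i in range(1, n):
        lcp[i] = common_prefix_len(suffixes[i - 1], suffixes[i])

    result = {}
    for i in range(n):
        # Prefixes longer than lcp[i] appear for the first time at rank i,
        # so each distinct substring is visited exactly once.
        for length in range(lcp[i] + 1, len(suffixes[i]) + 1):
            # All suffixes sharing this prefix are contiguous: extend the
            # interval while the LCP with the next suffix is long enough.
            j = i
            while j + 1 < n and lcp[j + 1] >= length:
                j += 1
            # Count distinct strings per database inside the interval.
            seen = set(owners[i:j + 1])
            pos = sum(1 for label, _ in seen if label == 0)
            neg = len(seen) - pos
            if pos >= min_pos and neg <= max_neg:
                result[suffixes[i][:length]] = (pos, neg)
    return result
```

For example, with `db_pos = ["abab", "abc", "bab"]` and `db_neg = ["bcc", "cab"]`, calling `mine_frequent_strings(db_pos, db_neg, 2, 0)` reports "ba" and "bab", each found in two strings of the first database and in none of the second. The sketch relies on the property that all occurrences of a pattern occupy one contiguous interval of the suffix array, delimited by LCP values.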