- Graph Mining
- Graph mining is about finding interesting patterns in graphs. Our group has worked on two projects in this field: FreeTreeMiner and gSpan'.
- Inductive Databases
- To fully support the analysis of complex and structured data, new efficient computational methods and suitable interfaces for data exploration have to be developed. Moreover, it is desirable to perform all tasks in the knowledge discovery process, from pre-processing to post-processing, on the basis of query languages. Inductive query languages should allow handling patterns/models as first-class objects, provide the right level of abstraction to the user (i.e., meaningful building blocks of data analysis), and emphasize the compositionality of data mining tasks.
- Quantitative Association Rules
- Recently, the problem of including numerical parameters in patterns and association rules has attracted some interest in data mining. Once numbers are taken into account in pattern mining, defining patterns and rules becomes a non-trivial problem. We present a new approach to quantitative association rules based on half-spaces and show how it can be applied to the problem of gene expression data analysis.
- String Mining
- In many applications, e.g., in computational biology, the goal is to find interesting string or sequence patterns in data. Application areas are, among others, finding discriminative features for sequence classification or segmentation, discovering new binding motifs of transcription factors, or probe design. We propose a new algorithmic framework that solves frequency-related data mining queries on databases of strings in optimal time, i.e., in time linear in the input and the output size.
- Adapted Transfer of Distance Measures for QSAR
- Information and Datasets for the 2010 Discovery Science Publication and the 2012 Computer Journal Submission
- Complex Output
- In practice, the required output of machine learning algorithms is rarely just binary classification, but may involve multiple classes, hierarchical classification, multi-label classification, the possibility of abstention or, more generally, structured output.
- Fast Conditional Density Estimation
- Many methods for quantitative structure-activity relationships (QSARs) deliver point estimates only, without quantifying the uncertainty inherent in the prediction. One way to quantify the uncertainty of a QSAR prediction is to predict the conditional density of the activity given the structure instead of a point estimate. If a conditional density estimate is available, it is easy to derive prediction intervals of activities. In this paper, we experimentally evaluate and compare three methods for conditional density estimation for their suitability in QSAR modeling.
- Margin-Based Rule Learning
- While early work in machine learning focused on human-comprehensible models like decision trees and rules, recent statistical learning approaches more or less exclusively optimize prediction performance at the expense of comprehensibility. In many application areas (e.g., the environmental or medical sciences), however, comprehensible models with a strong theoretical, statistical underpinning are required. Therefore, we propose a synthesis of classical machine learning and statistical learning methods.
- Using Local Models to Improve (Q)SAR Predictivity
- In a recent paper, Benigni and Bossa found that local QSAR models can produce results that are mechanistically interpretable and compare favorably with the known limits in reproducibility of the experimental systems. However, many existing large databases cannot be used directly to build local QSAR models, because they contain diverse sets of non-congeneric structures. We present a novel QSAR approach that detects groups of structures for local QSAR modeling.
- Downloadable software extensions for the AZOrange data mining framework.
- The overall objective of the project was to develop a framework that provides unified access to toxicity data, (Q)SAR models, procedures supporting validation, and additional information that helps with the interpretation of (Q)SAR predictions.
- The main research goal is to gain new insight into the regulation and evolution of complex cellular systems by uniting approaches from various fields of research (such as bioinformatics, bioengineering, biology and biochemistry).
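
As an illustration of the Fast Conditional Density Estimation entry above: once a conditional density of the activity is available, a prediction interval can be read off its quantiles. The following is a minimal sketch, assuming the density has already been estimated and tabulated on a sorted grid of activity values. The function name `prediction_interval` and the grid representation are hypothetical choices made for this example; they are not taken from the methods compared in the paper.

```python
from bisect import bisect_left


def prediction_interval(grid, density, coverage=0.9):
    """Central prediction interval from a density tabulated on a sorted grid.

    Assumption for this sketch: `grid` holds increasing activity values and
    `density` holds the estimated conditional density at those values.
    """
    if len(grid) != len(density) or len(grid) < 2:
        raise ValueError("grid and density need the same length (at least 2)")
    if not 0.0 < coverage < 1.0:
        raise ValueError("coverage must lie strictly between 0 and 1")

    # Cumulative distribution via the trapezoidal rule.
    cdf = [0.0]
    for i in range(1, len(grid)):
        width = grid[i] - grid[i - 1]
        cdf.append(cdf[-1] + 0.5 * (density[i] + density[i - 1]) * width)

    total = cdf[-1]
    if total <= 0.0:
        raise ValueError("density must have positive total mass")
    # Renormalize so that the tabulated density integrates to one.
    cdf = [c / total for c in cdf]

    def quantile(p):
        # First grid index whose cumulative mass reaches p.
        i = bisect_left(cdf, p)
        if i == 0:
            return grid[0]
        if i >= len(grid):
            return grid[-1]
        lo, hi = cdf[i - 1], cdf[i]
        if hi == lo:
            return grid[i]
        # Linear interpolation between neighbouring grid points.
        t = (p - lo) / (hi - lo)
        return grid[i - 1] + t * (grid[i] - grid[i - 1])

    tail = (1.0 - coverage) / 2.0
    return quantile(tail), quantile(1.0 - tail)


# Usage: a flat density on [0, 1], so the central 90% interval
# should run from roughly 0.05 to 0.95.
grid = [i / 100 for i in range(101)]
density = [1.0] * len(grid)
low, high = prediction_interval(grid, density, coverage=0.9)
print(low, high)
```

The central (equal-tailed) interval is only one possible choice; for skewed or multimodal densities, a highest-density region may be more informative.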