Using Local Models to Improve (Q)SAR Predictivity

In a recent paper, Benigni and Bossa [1] found that local QSAR models can produce results that are mechanistically interpretable and compare favorably with the known limits in reproducibility of the experimental systems. However, many existing large databases cannot be used directly to build local QSAR models because they contain diverse sets of non-congeneric structures. We present a novel QSAR approach that detects groups of structures suitable for local QSAR modeling. The algorithm combines clustering with classification or regression to make predictions on chemical structure data. A structural clustering procedure is applied as a preprocessing step, and a local model is then learned for each relevant cluster. Instead of using a single global model (the classical approach), we predict each query instance with local models that are weighted according to its cluster memberships.
The obtained clusterings are overlapping and non-exhaustive. In other words, a compound can be a member of more than one cluster or a member of no cluster at all. The dataset is divided into several parts representing the different structural classes. Compared to global models, which generally try to capture everything at once, local models are high-quality models of small regions of the chemical space, and they often have the advantage of being more interesting and understandable to a domain expert. These local models reflect the classical approach to QSAR, in which only a small set of highly relevant compounds is taken into account when building a model for a specific endpoint.
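The combination of overlapping clusters and weighted local models can be illustrated with a minimal sketch. It is not the published implementation: the single numeric descriptor, the least-squares line used as local model, the function names (`fit_line`, `fit_local_models`, `predict`), the form of the membership weights, and the fallback to a global model for queries that belong to no cluster are all assumptions made for illustration.

```python
from statistics import mean


def fit_line(xs, ys):
    """Least-squares fit of y = a*x + b on one descriptor (assumed model type)."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:
        # Degenerate cluster: all descriptor values equal, predict the mean.
        return 0.0, my
    a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return a, my - a * mx


def fit_local_models(xs, ys, clusters):
    """Fit one local model per cluster.

    clusters maps a cluster id to a list of training indices. Clusters may
    overlap, and some compounds may appear in no cluster at all.
    """
    return {
        cid: fit_line([xs[i] for i in idx], [ys[i] for i in idx])
        for cid, idx in clusters.items()
    }


def predict(x, memberships, local_models, global_model):
    """Weighted average of the local predictions for a query descriptor x.

    memberships maps cluster ids to non-negative weights (hypothetical form).
    If the query belongs to no cluster, fall back to the global model
    (an assumption of this sketch).
    """
    active = {c: w for c, w in memberships.items() if w > 0 and c in local_models}
    if not active:
        a, b = global_model
        return a * x + b
    total = sum(active.values())
    return sum(
        w * (local_models[c][0] * x + local_models[c][1]) for c, w in active.items()
    ) / total


# Toy data: compound 3 belongs to both clusters, compound 7 to none.
xs = [0, 1, 2, 10, 11, 12, 13, 20]
ys = [0, 2, 4, 20, 19, 18, 17, 5]
clusters = {"A": [0, 1, 2, 3], "B": [3, 4, 5, 6]}

local_models = fit_local_models(xs, ys, clusters)
global_model = fit_line(xs, ys)

# A query that is mostly in cluster B and partly in cluster A.
prediction = predict(12, {"A": 0.25, "B": 0.75}, local_models, global_model)
```

In this toy setup the two clusters follow different trends, so a single global line fits neither group well, while each local line fits its own cluster and the membership weights decide how much each one contributes to a query.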

The approach is evaluated together with standard statistical QSAR algorithms on various datasets. The results show that in most cases the application of local models significantly improves the predictive power of the derived QSAR models compared to the classical approach, to models that are induced by a fingerprint-based clustering approach and to locally weighted learning. In summary, the new combined approach is interesting both theoretically as a new synthesis of clustering and QSAR model learning and practically as a new method for making predictions in pharmacological and toxicological applications.

[1] Benigni, R., Bossa, C.: Predictivity of QSAR. J. Chem. Inf. Model. 48 (2008), 971–98

Publications

Buchwald, F, Girschick, T, Seeland, M, and Kramer, S (2011).
Using Local Models to Improve (Q)SAR Predictivity
Molecular Informatics, 30(2-3):205-218.

Files

Datasets for the LoMoGraph experiments in ZIP compression (8.3 MB): .zip

Datasets for the LoMoGraph experiments in TAR.GZ compression (8.3 MB): .tar.gz

Descriptor set (731 B): .txt