Using Local Models to Improve (Q)SAR Predictivity
In a recent paper, Benigni and Bossa found that local QSAR models
can produce results that are mechanistically interpretable and compare
favorably with the known limits in reproducibility of the experimental
systems. However, many existing large databases cannot be used directly
to build local QSAR models, because they contain diverse sets of
non-congeneric structures. We present a novel QSAR approach that detects
groups of structures for local QSAR modeling. The algorithm combines
clustering and classification or regression for making predictions on
chemical structure data. A structural clustering procedure is applied as
a preprocessing step, before a (local) model is learned for each
relevant cluster. Instead of relying on a single global model (the
classical approach), we predict each query instance with local models
that are weighted according to the instance's cluster memberships.
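The weighting step can be illustrated with a minimal sketch. The abstract does not specify the weighting scheme, so everything here is an assumption: the names `predict_weighted`, `memberships`, `local_models` and `global_model` are hypothetical, the weights are simply normalized membership degrees, and the fallback to a global model for compounds that belong to no cluster is one plausible choice rather than the authors' stated method.

```python
from typing import Callable, Dict, Sequence

# A model is any callable that maps a descriptor vector to a predicted value.
Model = Callable[[Sequence[float]], float]


def predict_weighted(
    x: Sequence[float],
    memberships: Dict[str, float],
    local_models: Dict[str, Model],
    global_model: Model,
) -> float:
    """Combine local predictions, weighted by cluster membership (illustrative)."""
    # Keep only clusters the query actually belongs to and that have a model.
    active = {c: w for c, w in memberships.items() if w > 0 and c in local_models}
    if not active:
        # Assumption: a query outside every cluster falls back to a global model.
        return global_model(x)
    total = sum(active.values())
    # Membership-weighted average of the local model predictions.
    return sum(w * local_models[c](x) for c, w in active.items()) / total


# Toy usage with constant models standing in for learned QSAR models.
local_models = {"a": lambda x: 1.0, "b": lambda x: 3.0}
global_model = lambda x: 2.0
in_two_clusters = predict_weighted([0.0], {"a": 0.75, "b": 0.25}, local_models, global_model)
in_no_cluster = predict_weighted([0.0], {}, local_models, global_model)
```

In this toy setup a query that belongs to both clusters receives a blend of the two local predictions, while a query outside all clusters is handled by the global model alone.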
The obtained clusterings are overlapping and non-exhaustive. In other words, a compound can be a member of more than one cluster or of no cluster at all. The dataset is thereby divided into several parts that represent the different structural classes. Compared to global models, which generally try to capture everything at once, local models are high-quality models of small regions of chemical space and are often more interesting and understandable to a domain expert. These local models reflect the classical approach to QSAR, in which only a small set of highly relevant compounds is taken into account when building a model for a specific endpoint.
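To make "overlapping and non-exhaustive" concrete, the following sketch assigns a compound to every cluster whose representative it resembles closely enough. It is an illustration under stated assumptions, not the clustering procedure of this work: compounds are represented as sets of hypothetical structural feature labels, similarity is the Tanimoto coefficient, and the threshold of 0.5 is arbitrary.

```python
from typing import Dict, Set


def tanimoto(a: Set[str], b: Set[str]) -> float:
    """Tanimoto (Jaccard) similarity between two feature sets."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def assign_clusters(
    features: Set[str],
    representatives: Dict[str, Set[str]],
    threshold: float = 0.5,
) -> Dict[str, float]:
    """Return every cluster whose representative is similar enough.

    The result may contain several clusters (overlapping) or none
    (non-exhaustive). Feature labels and threshold are hypothetical.
    """
    memberships = {}
    for name, rep in representatives.items():
        sim = tanimoto(features, rep)
        if sim >= threshold:
            memberships[name] = sim
    return memberships


# Hypothetical cluster representatives described by structural features.
representatives = {
    "nitroaromatic": {"NO2", "aromatic"},
    "aromatic_amine": {"NH2", "aromatic"},
}

both = assign_clusters({"NO2", "NH2", "aromatic"}, representatives)
none = assign_clusters({"Cl", "alkyl"}, representatives)
```

Here the first compound shares two of three features with each representative and so lands in both clusters, while the second shares no features with either and is left unassigned.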
The approach is evaluated together with standard statistical QSAR algorithms on various datasets. The results show that, in most cases, local models significantly improve the predictive power of the derived QSAR models compared to three alternatives: the classical global approach, models induced by a fingerprint-based clustering approach, and locally weighted learning. In summary, the new combined approach is interesting both theoretically, as a new synthesis of clustering and QSAR model learning, and practically, as a new method for making predictions in pharmacological and toxicological applications.
Benigni, R., Bossa, C.: Predictivity of QSAR. J. Chem. Inf. Model. 48 (2008), 971–98