Using Local Models to Improve (Q)SAR Predictivity
In a recent paper, Benigni and Bossa[1] found that local QSAR models
can produce results that are mechanistically interpretable and compare
favorably with the known limits in reproducibility of the experimental
systems. However, many existing large databases cannot be used directly
to build local QSAR models, because they contain diverse sets of
non-congeneric structures. We present a novel QSAR approach that detects
groups of structures for local QSAR modeling. The algorithm combines
clustering and classification or regression for making predictions on
chemical structure data. A structural clustering procedure is applied as
a preprocessing step, before a (local) model is learned for each
relevant cluster. Instead of using a single global model (the classical
approach), we predict each query instance with local models that are
weighted according to its cluster memberships.
The obtained clusterings
are overlapping and non-exhaustive. In other words, a compound can be a
member of more than one cluster or of no cluster at all. The
dataset is divided into several parts representing the different
structural classes. Compared to global models that generally try to
capture everything at once, local models are high-quality models of
small regions of the chemical space that often have the advantage of
being much more interesting and understandable to a domain expert. These
local models reflect the classical approach to QSAR, where only a small
set of highly relevant compounds is taken into account when building a
model for a specific endpoint.
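The combination of overlapping, non-exhaustive clusters with weighted local models can be illustrated with a minimal sketch. It is not the implementation from the paper, and everything specific in it is an assumption made for illustration: compounds are represented as sets of feature indices, cluster membership is decided by Tanimoto similarity to hypothetical seed structures, each local model is an ordinary least-squares fit, membership similarities serve as weights, and a global model is used as a fallback for compounds that belong to no cluster.

```python
import numpy as np

def tanimoto(a, b):
    """Tanimoto similarity between two sets of feature indices."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0

def memberships(fp, seeds, threshold):
    """Return {cluster index: similarity} for every seed the compound is close to.

    The result may hold several clusters (overlap) or none (non-exhaustive).
    """
    sims = {k: tanimoto(fp, seed) for k, seed in enumerate(seeds)}
    return {k: s for k, s in sims.items() if s >= threshold}

def fit_linear(X, y):
    """Ordinary least squares with an intercept term (stand-in for any learner)."""
    A = np.hstack([X, np.ones((len(X), 1))])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

def predict_linear(coef, x):
    return float(np.append(x, 1.0) @ coef)

def fit_local_models(fps, X, y, seeds, threshold=0.5, min_size=3):
    """Fit one global model and one local model per sufficiently large cluster."""
    global_model = fit_linear(X, y)
    member = [memberships(fp, seeds, threshold) for fp in fps]
    local = {}
    for k in range(len(seeds)):
        idx = [i for i, m in enumerate(member) if k in m]
        if len(idx) >= min_size:  # skip clusters too small to model
            local[k] = fit_linear(X[idx], y[idx])
    return global_model, local

def predict(fp, x, seeds, global_model, local, threshold=0.5):
    """Similarity-weighted average of local predictions; global fallback."""
    weights = {k: w for k, w in memberships(fp, seeds, threshold).items()
               if k in local}
    if not weights:  # query belongs to no modeled cluster
        return predict_linear(global_model, x)
    total = sum(weights.values())
    return sum(w * predict_linear(local[k], x)
               for k, w in weights.items()) / total
```

In this sketch, a query that matches a single cluster is predicted by that cluster's model alone, a query that matches several clusters receives a weighted average of their predictions, and a query that matches none is handled by the global model. The threshold, the minimum cluster size and the fallback rule are illustrative choices, not values taken from the paper.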
The approach is evaluated together
with standard statistical QSAR algorithms on various datasets. The
results show that in most cases the application of local models
significantly improves the predictive power of the derived QSAR models
compared to the classical approach, to models induced by a
fingerprint-based clustering approach, and to locally weighted learning.
In summary, the new combined approach is interesting both theoretically
as a new synthesis of clustering and QSAR model learning and practically
as a new method for making predictions in pharmacological and
toxicological applications.
[1] Benigni, R., Bossa, C.: Predictivity of QSAR. J. Chem. Inf. Model. 48 (2008), 971–98
Publications
Buchwald, F, Girschick, T, Seeland, M, and Kramer, S (2011). Using Local Models to Improve (Q)SAR Predictivity. Molecular Informatics, 30(2-3):205-218.
