Marianne Müller and Stefan Kramer (2010)
Integer Linear Programming Models for Constrained Clustering
In: Proceedings of the 13th International Discovery Science DS 2010, ed. by Bernhard Pfahringer, Geoffrey Holmes and Achim Hoffman, pp. 159-173, Springer.
We address the problem of building a clustering as a subset of a (possibly large) set of candidate clusters under user-defined constraints. In contrast to most approaches to constrained clustering, we do not constrain the way observations can be grouped into clusters, but the way candidate clusters can be combined into suitable clusterings. The constraints may concern the type of clustering (e.g., complete clusterings, overlapping or encompassing clusters) and the composition of clusterings (e.g., certain clusters excluding others). In the paper, we show that these constraints can be translated into integer linear programs, which can be solved by standard optimization packages. Our experiments with benchmark and real-world data investigates the quality of the clusterings and the running times depending on a variety of parameters.
