Jana Schmidt, Andreas Hapfelmeier, Ashgar Ghorbani, and Stefan Kramer (2013)
Learning Probabilistic Real-Time Automata from Multi-Attribute Event Logs
Intelligent Data Analysis Special Issue on Dynak Topics, 7(1):to appear.
The growing number of time-labeled datasets in science and industry
increases the need for algorithms that automatically induce
process models. Existing methods for identifying process
models typically work only on single-attribute
events. We propose a new model type
to address the problem of mining multi-attribute
events, meaning that each event is described by a vector of
attributes. The model is based on timed automata, includes
expressive descriptions of states and can be used for making
predictions. A probabilistic real-time automaton is
created, in which each state is annotated by a profile
of events. To
identify the states of the automaton, similar events are
combined by a clustering approach. The method was implemented
and tested on a synthetic, a medical and a biological dataset.
Its prediction accuracy was evaluated on a medical dataset and
compared to a combined logistic regression, which is considered
a standard in this application domain. Moreover, the method
was experimentally compared to Multi-Output HMMs and Petri nets
learned by standard process mining algorithms. The experimental
comparison suggests that the automaton-based approach performs
favorably in several dimensions. Most importantly, we show that
meaningful medical and biological process knowledge can be
extracted from such automata.
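
The idea of clustering similar multi-attribute events into states and
annotating the resulting automaton can be illustrated with a minimal
Python sketch. It is not the authors' algorithm: the leader-style
clustering with a fixed radius, the mean-vector state profile, the
min/max delay interval on each transition, and all names
(assign_states, learn_automaton, radius) are assumptions made here
for illustration only.

```python
from collections import defaultdict
from math import dist


def assign_states(vectors, radius):
    """Leader clustering (an assumed stand-in for the paper's clustering):
    an event joins the first cluster whose leader lies within `radius`,
    otherwise it starts a new cluster. Returns one state label per event."""
    leaders, labels = [], []
    for vec in vectors:
        for idx, leader in enumerate(leaders):
            if dist(vec, leader) <= radius:
                labels.append(idx)
                break
        else:
            leaders.append(vec)
            labels.append(len(leaders) - 1)
    return labels


def learn_automaton(traces, radius=1.0):
    """traces: list of traces, each a list of (timestamp, attribute_vector).
    Returns (profiles, transitions), where profiles maps a state to the mean
    attribute vector of its events, and transitions maps (src, dst) to an
    empirical probability and the observed (min, max) time delay."""
    flat = [vec for trace in traces for _, vec in trace]
    labels = assign_states(flat, radius)

    # State profile: component-wise mean of all events assigned to the state.
    members = defaultdict(list)
    for label, vec in zip(labels, flat):
        members[label].append(vec)
    profiles = {
        state: tuple(sum(col) / len(vecs) for col in zip(*vecs))
        for state, vecs in members.items()
    }

    # Count transitions between consecutive events of the same trace
    # and record the time delay of each one.
    counts, delays = defaultdict(int), defaultdict(list)
    pos = 0
    for trace in traces:
        trace_labels = labels[pos:pos + len(trace)]
        pos += len(trace)
        for i in range(len(trace) - 1):
            edge = (trace_labels[i], trace_labels[i + 1])
            counts[edge] += 1
            delays[edge].append(trace[i + 1][0] - trace[i][0])

    # Normalise counts per source state to obtain transition probabilities.
    out_totals = defaultdict(int)
    for (src, _), n in counts.items():
        out_totals[src] += n
    transitions = {
        edge: {
            "prob": n / out_totals[edge[0]],
            "delay": (min(delays[edge]), max(delays[edge])),
        }
        for edge, n in counts.items()
    }
    return profiles, transitions


# Hypothetical toy log: three traces of two-attribute events.
example_traces = [
    [(0, (0.0, 0.0)), (2, (5.0, 5.0)), (5, (0.1, 0.0))],
    [(0, (0.0, 0.1)), (3, (5.1, 5.0))],
    [(0, (0.0, 0.0)), (1, (0.2, 0.0))],
]
profiles, transitions = learn_automaton(example_traces, radius=1.0)
```

Given a state, the most probable outgoing transition in such a
structure could serve as a simple next-state prediction, which is
the kind of use the abstract refers to.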
