IJMLC 2014 Vol.4(2): 120-126 ISSN: 2010-3700
Anomaly Detection in Application Performance Monitoring Data
Thomas J. Veasey and Stephen J. Dodson
Abstract—Performance issues and outages in IT systems have a significant impact on business. Traditional methods for identifying these issues, based on rules and simple statistics, have become ineffective because of the complexity of the underlying systems, the volume and variety of the performance metrics collected, and the desire to correlate unusual application logging to aid diagnosis. This paper examines the problem of accurately ranking disjoint time periods in raw IT system monitoring data by their anomalousness. Given this ranking, a decision method can be used to identify certain periods as anomalous, with the aim of reducing the many performance metrics and application log messages to a manageable number of timely and actionable reports about unusual system behaviour. To be actionable, any such report should aim to provide the minimum context necessary to understand the behaviour it describes.
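As a minimal sketch of the rank-then-decide workflow described above, the Python below splits one metric into disjoint fixed-length periods, scores each period robustly, ranks the periods from most to least anomalous and then applies a simple threshold as the decision method. The bucket length, the median/MAD score, the 3.5 threshold and all function names are illustrative assumptions, not the authors' model, which the abstract does not specify.

```python
import statistics
from typing import List, Tuple


def bucket_means(values: List[float], bucket_len: int) -> List[float]:
    """Split a series into disjoint fixed-length periods and return each period's mean.

    A trailing partial period is dropped (an illustrative choice).
    """
    n_full = len(values) // bucket_len
    return [
        statistics.fmean(values[i * bucket_len:(i + 1) * bucket_len])
        for i in range(n_full)
    ]


def robust_scores(means: List[float]) -> List[float]:
    """Score each period by its distance from the median, in robust standard units.

    The MAD is scaled by 1.4826 so that it estimates the standard deviation
    for Gaussian data; unlike a mean/std z-score, a few extreme periods
    barely move the median or the MAD.
    """
    med = statistics.median(means)
    mad = statistics.median([abs(m - med) for m in means])
    scale = 1.4826 * mad if mad > 0 else 1e-9  # avoid dividing by zero
    return [abs(m - med) / scale for m in means]


def rank_and_flag(values: List[float], bucket_len: int,
                  threshold: float = 3.5) -> Tuple[List[int], List[int]]:
    """Rank periods from most to least anomalous, then flag those above a threshold.

    Ranking and decision are kept as separate steps, so the decision rule
    can be swapped without touching the scoring.
    """
    scores = robust_scores(bucket_means(values, bucket_len))
    ranking = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    flagged = [i for i in ranking if scores[i] > threshold]
    return ranking, flagged


# Hypothetical metric: a mild repeating pattern with one period (index 5) spiking.
series = [10.0 + (i % 3) for i in range(40)]
series[20:24] = [50.0] * 4
ranking, flagged = rank_and_flag(series, bucket_len=4)
print(flagged)  # [5]
```

Separating the ranking from the decision mirrors the abstract's framing: any ranking method can be paired with a different decision rule, such as a fixed budget of reports per day, without changing how periods are scored.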
In this paper, we argue that this problem is well suited to analysis with a statistical model of the system state and, further, that Bayesian methods are particularly well suited to formulating this model. To support this, we analyse performance data gathered from a real internet banking system. These data highlight several challenges in accurately modelling the system state: very high dimensionality, high overall data rates, seasonality and variability in the data rates, seasonality in the data values, transaction data, mixed data types (continuous, integer and lattice data), bounded data, lags between the onset of anomalous behaviour in different performance metrics, and non-Gaussian distributions. To be successful against the criteria defined above, any approach must be flexible enough to handle all of these features of the data.
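The abstract does not give the form of the model. As one hypothetical illustration of why Bayesian methods suit integer-valued data with varying rates, the sketch below places a conjugate Gamma prior on the Poisson rate of per-period event counts and scores a new period by its posterior predictive tail probability. The Gamma-Poisson model, the prior, the example counts and the function names are assumptions; the sketch ignores seasonality, lags and the other features listed above.

```python
import math
from typing import Sequence, Tuple


def update_gamma(alpha: float, beta: float,
                 counts: Sequence[int]) -> Tuple[float, float]:
    """Conjugate update of a Gamma(shape=alpha, rate=beta) prior on a Poisson rate."""
    return alpha + sum(counts), beta + len(counts)


def predictive_pmf(k: int, alpha: float, beta: float) -> float:
    """Posterior predictive P(X = k) for the next period's count.

    Integrating the Poisson rate out of a Gamma(alpha, beta) posterior gives a
    negative binomial with r = alpha and success probability beta / (beta + 1).
    Computed in log space to avoid overflow for large k or alpha.
    """
    log_pmf = (math.lgamma(k + alpha) - math.lgamma(k + 1) - math.lgamma(alpha)
               + alpha * (math.log(beta) - math.log(beta + 1.0))
               - k * math.log(beta + 1.0))
    return math.exp(log_pmf)


def upper_tail(x: int, alpha: float, beta: float) -> float:
    """P(X >= x): small values indicate an unusually high count.

    Uses 1 - P(X < x), which loses relative precision for extremely small
    tails; that is acceptable for a sketch.
    """
    below = sum(predictive_pmf(k, alpha, beta) for k in range(x))
    return max(0.0, 1.0 - below)


# Hypothetical history of per-period event counts and a weak Gamma(1, 1) prior.
alpha, beta = update_gamma(1.0, 1.0, [10, 12, 9, 11, 10, 8, 12, 10])
typical = upper_tail(9, alpha, beta)    # large: a count near the predictive mean
extreme = upper_tail(30, alpha, beta)   # tiny: an unusually high count
```

Because the posterior predictive integrates over uncertainty in the rate, a short or noisy history yields wider predictive distributions and therefore more conservative tail probabilities than plugging in a single point estimate of the rate.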
Finally, we present the results of applying robust methods to these data; these methods were used effectively to pre-empt and diagnose system issues.
Index Terms—Anomaly detection, APM.
T. J. Veasey and S. J. Dodson are with Prelert Ltd, UK (e-mail: tveasey@
Cite: Thomas J. Veasey and Stephen J. Dodson, "Anomaly Detection in Application Performance Monitoring Data," International Journal of Machine Learning and Computing, vol. 4, no. 2, pp. 120-126, 2014.