19 Oct 2020 |
Research article |
Software Systems, Multimedia and Cybersecurity
Pattern and Periodicity Detection in Cloud Computing Data
Proactive detection of patterns and periodicity in cloud computing workloads can be used to optimize resource provisioning strategies and anticipate performance issues. However, current techniques are restrictive: they require constant intervention by experienced administrators, and they are specific to a particular type of data (CPU utilization rates, network traffic, etc.). To overcome these limitations and to provide a generic and automated tool that can detect patterns of various magnitudes, lengths and types, we propose an approach using a prefix transposition technique. Tests conducted on CPU and throughput datasets from cloud server nodes and IMS environments show that our approach outperforms the autocorrelation technique in detection accuracy.
Keywords: Time series analysis, cloud computing, pattern detection.
Challenges in Resource Analysis of a Cloud Computing System
Virtualization is a fundamental concept underpinning the architecture of cloud computing systems, in which many applications share the same physical infrastructure, optimizing resource usage and reducing operating costs. However, to ensure the smooth operation of these virtualized systems and to improve customer experience (quality of service, QoS), resources must constantly be scaled to respond to demand fluctuations.
Thus, deploying pattern and periodicity detection mechanisms is necessary to facilitate monitoring and analysis of resource behaviour in a cloud computing system. These mechanisms detect periodic trends and patterns, which helps administrators first to proactively detect abnormal system behaviour and then to adjust resource scaling strategies accordingly.
However, existing approaches supporting these mechanisms show certain limitations that restrict their performance. First, they rely on constant intervention by administrators to specify the parameters and criteria of the detection approach, for example, the frequency at which the time series is periodic. Second, these approaches are only effective when the workload periodicity is stationary, i.e. when the length, shape and amplitude of the period are fixed. This second point is a major issue in industrial production environments, since the length, shape and amplitude of the periodic cycles vary greatly depending on the underlying use of the system (e.g. system response to a predictably high demand due to a hockey game at the Bell Centre) or the time of day, week or month (e.g. use of the system in downtown Montreal during the week versus weekend use).
A New Approach
To address these limitations, we propose a new approach to pattern and periodicity detection for cloud computing environments. The approach capitalizes on a prefix transposition technique, which has demonstrated its effectiveness in molecular biology for deducing functional and evolutionary connections between genomes. The proposed solution stands out because it can detect periodic cycles of variable length, amplitude and shape for any type of discrete time series.
To demonstrate its effectiveness, experiments were carried out using datasets from virtualized Web servers and a virtualized OpenIMS platform hosted at École de technologie supérieure. Results show that our approach can accurately detect any type of periodic cycle and that it stands apart from conventional techniques based on the analysis of autocorrelation coefficients.
The proposed approach is divided into two stages.
The first stage, called the preparation stage, involves pre-processing a workload dataset to refine the trend curves and remove outliers.
The main steps of the preparation stage are summarized as follows: the discrete time series of the workload is converted into splines to remove outliers and refine the trend curves. The result is a continuous series that can be re-discretized at desired time intervals.
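The preparation stage can be illustrated with a minimal sketch, under several assumptions that are not taken from the paper: outliers are suppressed here with a small median filter before fitting, the spline is a natural cubic spline through the cleaned samples, and the names median_filter, natural_cubic_spline and prepare are hypothetical. The published method may fit and smooth its splines differently.

```python
from bisect import bisect_right
from statistics import median


def median_filter(values, width=3):
    """Replace each sample by the median of its neighbourhood.

    Assumption: this stands in for the outlier removal described in the text.
    """
    half = width // 2
    return [median(values[max(0, i - half): i + half + 1])
            for i in range(len(values))]


def natural_cubic_spline(xs, ys):
    """Return a function evaluating the natural cubic spline through (xs, ys)."""
    n = len(xs)
    h = [xs[i + 1] - xs[i] for i in range(n - 1)]
    # Tridiagonal system for the second derivatives m, with m[0] = m[-1] = 0.
    a = [0.0] * n
    b = [1.0] * n
    c = [0.0] * n
    d = [0.0] * n
    for i in range(1, n - 1):
        a[i] = h[i - 1]
        b[i] = 2.0 * (h[i - 1] + h[i])
        c[i] = h[i]
        d[i] = 6.0 * ((ys[i + 1] - ys[i]) / h[i]
                      - (ys[i] - ys[i - 1]) / h[i - 1])
    # Thomas algorithm: forward elimination, then back substitution.
    for i in range(1, n):
        w = a[i] / b[i - 1]
        b[i] -= w * c[i - 1]
        d[i] -= w * d[i - 1]
    m = [0.0] * n
    m[-1] = d[-1] / b[-1]
    for i in range(n - 2, -1, -1):
        m[i] = (d[i] - c[i] * m[i + 1]) / b[i]

    def evaluate(x):
        # Locate the segment containing x, clamped to the valid range.
        i = min(max(bisect_right(xs, x) - 1, 0), n - 2)
        t0, t1 = x - xs[i], xs[i + 1] - x
        return ((m[i] * t1 ** 3 + m[i + 1] * t0 ** 3) / (6.0 * h[i])
                + (ys[i] / h[i] - m[i] * h[i] / 6.0) * t1
                + (ys[i + 1] / h[i] - m[i + 1] * h[i] / 6.0) * t0)

    return evaluate


def prepare(samples, step_in, step_out):
    """Clean a series sampled every step_in units and re-sample it every step_out."""
    cleaned = median_filter(samples)
    xs = [i * step_in for i in range(len(samples))]
    spline = natural_cubic_spline(xs, cleaned)
    count = int(xs[-1] / step_out) + 1
    return [spline(k * step_out) for k in range(count)]
```

For example, prepare(cpu_samples, 30, 15) would take readings collected every 30 minutes and return a cleaned series re-discretized every 15 minutes, where cpu_samples is a hypothetical list of CPU utilization values.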
The re-discretized values are then converted into their binary equivalent. This allows transposing and extracting a fixed-size number sequence from each of the values. This number sequence is called a “digital fingerprint”. The digital fingerprint is key in our approach because it is this feature that lets us distinguish the shape and length of a periodic cycle. The bits that were not extracted during the prefix transposition are retained and go through additional filtering steps. These residuals are then used to detect the amplitude of periodic cycles.
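As a rough illustration of the idea, the sketch below splits each value into a fixed-size binary prefix (the fingerprint) and the remaining bits (the residual). The 8-bit width, the 3-bit prefix, the rounding to integers and the function names are all assumptions made for the example; the paper's actual transposition and filtering steps are more involved.

```python
def fingerprint(value, total_bits=8, prefix_bits=3):
    """Split a non-negative integer into (prefix, residual) bit strings."""
    if not 0 <= value < (1 << total_bits):
        raise ValueError("value does not fit in total_bits")
    bits = format(value, f"0{total_bits}b")  # zero-padded binary form
    return bits[:prefix_bits], bits[prefix_bits:]


def fingerprints(series, total_bits=8, prefix_bits=3):
    """Return parallel lists of prefixes and residuals for a whole series."""
    pairs = [fingerprint(int(round(v)), total_bits, prefix_bits)
             for v in series]
    return [p for p, _ in pairs], [r for _, r in pairs]


# 87 is 01010111 in 8 bits, so the prefix is "010" and the residual "10111".
print(fingerprint(87))
```

The prefix keeps only the coarse level of each value, which is what makes cycles of similar shape comparable, while the residual preserves the fine detail that relates to amplitude.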
The second stage, called the decision stage, consists in scanning digital fingerprints using a template filter inserted into a sliding window. A successive pair of ascending and descending patterns confirms the presence of a given periodic cycle. Subsequently, various additional steps are used to count the periodic cycles, and determine their shape, length and amplitude.
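The decision stage could be sketched as follows. This is a simplified illustration, not the published algorithm: the template here is just "monotonically rising" or "monotonically falling" inside a window of assumed width 3, and classify and count_cycles are hypothetical names.

```python
def classify(window):
    """Label a window of fingerprint levels as 'asc', 'desc', or None."""
    steps = [b - a for a, b in zip(window, window[1:])]
    if all(s >= 0 for s in steps) and any(s > 0 for s in steps):
        return "asc"
    if all(s <= 0 for s in steps) and any(s < 0 for s in steps):
        return "desc"
    return None


def count_cycles(prefixes, width=3):
    """Count ascending patterns that are followed by a descending pattern."""
    levels = [int(p, 2) for p in prefixes]  # bit strings to integer levels
    labels = [classify(levels[i:i + width])
              for i in range(len(levels) - width + 1)]
    cycles, state = 0, None
    for label in labels:
        if label == "asc":
            state = "asc"
        elif label == "desc" and state == "asc":
            cycles += 1   # an ascending/descending pair closes one cycle
            state = None
    return cycles


# Two humps encoded as 2-bit fingerprints.
humps = [format(v, "02b") for v in [0, 1, 2, 3, 2, 1, 0, 1, 2, 3, 2, 1, 0]]
print(count_cycles(humps))
```

Because the test only looks at the order of fingerprint levels inside the window, it does not depend on a cycle having a fixed length, which is the property the approach relies on.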
Evaluation of the Proposed Approach
We evaluated our approach with a CPU workload dataset from a virtualized ÉTS Web server. CPU data was collected every 30 minutes over the course of a week. Figure 1 summarizes the various steps involved in converting the raw time series values into digital fingerprints with the adjacent detected periodic cycle patterns—ascending patterns in red, and descending patterns in blue. Figure 2 shows the raw discrete-time series, in a), followed by the continuous series generated using a spline with the detected patterns, in b) and, finally, the periodic cycles detected using an autocorrelation coefficient graph, in c). Table 3 compares the number of periodic cycles detected with our approach to the autocorrelation coefficient analysis technique.
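For readers unfamiliar with the baseline, the autocorrelation technique can be sketched as below. This is a generic textbook version with hypothetical function names, not the exact procedure used in the evaluation: it computes the sample autocorrelation coefficient at each lag and reports the lag of the highest local peak as the period.

```python
import math


def autocorrelation(series, lag):
    """Sample autocorrelation coefficient of a series at the given lag."""
    n = len(series)
    mean = sum(series) / n
    var = sum((x - mean) ** 2 for x in series)
    cov = sum((series[i] - mean) * (series[i + lag] - mean)
              for i in range(n - lag))
    return cov / var


def dominant_period(series, max_lag):
    """Return the lag in [2, max_lag] with the highest local autocorrelation peak."""
    acf = [autocorrelation(series, k) for k in range(max_lag + 2)]
    peaks = [k for k in range(2, max_lag + 1)
             if acf[k] > acf[k - 1] and acf[k] >= acf[k + 1]]
    return max(peaks, key=lambda k: acf[k]) if peaks else None


# A stationary signal that repeats every 12 samples.
signal = [math.sin(2 * math.pi * i / 12) for i in range(96)]
print(dominant_period(signal, 30))
```

This baseline assumes a single fixed lag describes the whole series, which is why it is expected to struggle when cycle length, shape or amplitude change over time.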
This research proposes an approach to pattern and periodicity detection in cloud data with two main benefits: (1) its generic nature, which allows it to be applied to any type of time series, and (2) its ability to detect periodic cycles regardless of their amplitude, length and shape. Our experiments on CPU and throughput datasets from IMS and Web virtualized environments demonstrate that our solution significantly improves the accuracy of pattern and periodicity detection compared to autocorrelation-based approaches, especially in situations of extreme workload variations.
For more information on this research, please refer to the following paper:
St-Onge, Cédric; Kara, Nadjia; Wahab, Omar Abdel; Edstrom, Claes; Lemieux, Yves. Detection of time series patterns and periodicity of cloud computing workloads. Future Generation Computer Systems. Volume 109. pp. 249-261.
Cédric St-Onge is a PhD student and research assistant in the Department of Software and IT Engineering.
Program: Information Technology Engineering
Research laboratories: LASI – Computer System Architecture Research Laboratory
Nadjia Kara is a professor in the Department of Software and IT Engineering. Her research focuses on traffic engineering, telehealth, network resource management and service engineering for communication networks.
Omar Abdel Wahab
Omar Abdel Wahab is an assistant professor at Université du Québec en Outaouais. The main topics of his current research activities are in the areas of artificial intelligence, cybersecurity, cloud computing, and the Internet of Things.
Program: Information Technology Engineering
Claes Edstrom is a senior specialist in cloud computing at Ericsson. His research interests include application transformation, resource management and automation in cloud computing environments.