Time series data differs from other kinds of data in that it naturally contains concept drifts. Concept drifts matter for structured analytics and are crucial for machine learning. In this article, we discuss what concept drifts are, give examples of where they occur, and explain what they imply for machine learning.
What are concept drifts in Time Series Data?
Concept drifts are changes over time in the patterns of the data that a concept is based on. A concept can be thought of as a data value.
The data patterns a concept depends on can change in such a way that an inference or prediction of the concept is no longer precise.
In short, a concept/value drifts over time away from the patterns and rules it was perceived to be related to at other points in time. Concept drifts are caused by changing real-world conditions, whether known or unknown.
For example, a tram will not be equally late on every day, and changed customer behaviour changes the delays. In fact, human concepts such as the days of the week, Monday to Sunday, result in different traffic jams and tram loads, and thus in different delays.
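The weekday effect can be made visible by grouping delays by day of the week. The following is a minimal sketch in Python, assuming a hypothetical list of (date, delay in minutes) observations. The values and the function name `mean_delay_by_weekday` are made up for illustration and do not come from a real tram network.

```python
from collections import defaultdict
from datetime import date
from statistics import mean

WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]

# Hypothetical observations: (service date, delay in minutes).
observations = [
    (date(2024, 1, 1), 6.0),   # Monday
    (date(2024, 1, 6), 2.0),   # Saturday
    (date(2024, 1, 7), 1.0),   # Sunday
    (date(2024, 1, 8), 7.0),   # Monday
    (date(2024, 1, 13), 3.0),  # Saturday
    (date(2024, 1, 14), 2.0),  # Sunday
]


def mean_delay_by_weekday(rows):
    """Return the mean delay per weekday name for (date, delay) rows."""
    grouped = defaultdict(list)
    for day, delay in rows:
        # date.weekday() returns 0 for Monday through 6 for Sunday.
        grouped[WEEKDAYS[day.weekday()]].append(delay)
    return {name: mean(delays) for name, delays in grouped.items()}


delay_profile = mean_delay_by_weekday(observations)
for name, value in delay_profile.items():
    print(f"{name}: {value:.1f} min")
```

With this made-up data, Mondays show a clearly higher mean delay than weekend days, which is the kind of pattern a model would miss if it treated all days alike.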
Concept drifts are battle-deciding factors in Data Science
Time Series Data is the foundation for many Data Science applications. What makes it special is that concept drifts occur naturally in it.
Storing Big Data in a time series format is the foundation for structured multi-dimensional analysis, extended data analysis and machine learning, where detecting or utilizing concept drifts becomes crucial.
There are manifold machine learning use cases in which concept drifts are utilized or need to be detected. Some of them are:
- training forecasting algorithms
- doing manual forecasts
- anomaly detection
- causality and correlation analytics
- operational monitoring
- alerting
Hence, when the concept in the data changes completely or merely drifts, this can cause tremendous problems.
One example with a heavy economic impact is the Coronavirus pandemic of 2020, which upended analytic forecasts and changed people's habits and behaviour, which in turn manifested as different patterns in the data.
Imagine the upheaval in the airline industry and the e-commerce sector: within a few months, entire sectors changed, and the patterns in the data shifted and drifted so completely that analysis had to start from a clean slate. If an airline previously used smart data to notify customers about special offers, that data suddenly became invalid because of country lockdowns.
A simpler kind of concept drift is a minor personnel change in an enterprise, which is often handled as a slowly changing dimension. Imagine that the head of a sales department changes. With the new head, sales skyrocket and forecasts for future sales have to be adjusted. Without accounting for such a real-world change in a multi-dimensional data analytics model, every structured data investigation becomes very difficult.
Put simply, concept drifts can render analytics and trained machine learning models useless.
Therefore, it is crucial for a company to detect and consider concept drifts in its data models, or to use concept drifts as a parameter when training machine learning models.
Natural concept drifts in Big Data
Data often evolves over time, and concept drifts are detectable. They can be caused by scenarios like those described above, but they also arise naturally without an easily determined reason.
Simple scenarios are aging sensors whose readings decrease, or surrounding conditions that lead to changing sensor readings.
For instance, a temperature sensor's readings can drift with the seasons.
Temperature sensor readings may fluctuate between seasons or between locations. When you look at individual readings from day to day, the underlying concept of a season is not really visible, especially because the temperature also changes between day and night.
Seasons are concept drifts that cannot reveal themselves at the very beginning of a project. To detect them, data has to be captured over a longer period, and it needs to be stored and accessed in a flexible way. A Time Series Database is then very useful for slicing and dicing the data over time or for aggregating it over specific time intervals.
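To illustrate why aggregation over longer intervals helps, here is a small Python sketch. It assumes synthetic, hypothetical temperature readings (a smooth yearly cycle plus a fixed day/night swing) and stands in for the kind of time-bucketed query a Time Series Database would run. The function names and all numbers are illustrative assumptions, not measurements.

```python
import math
from collections import defaultdict
from datetime import datetime, timedelta
from statistics import mean


def synthetic_readings(year=2023):
    """Build two hypothetical readings per day: one at night, one in the afternoon."""
    start = datetime(year, 1, 1)
    readings = []
    for day in range(365):
        # Seasonal component: coldest around 1 January, warmest mid-year.
        seasonal = 10.0 - 10.0 * math.cos(2.0 * math.pi * day / 365.0)
        for hour, swing in ((3, -5.0), (15, 5.0)):
            timestamp = start + timedelta(days=day, hours=hour)
            readings.append((timestamp, seasonal + swing))
    return readings


def aggregate_by_month(readings):
    """Return the mean reading per calendar month (1 to 12)."""
    buckets = defaultdict(list)
    for timestamp, value in readings:
        buckets[timestamp.month].append(value)
    return {month: mean(values) for month, values in sorted(buckets.items())}


readings = synthetic_readings()
monthly_mean = aggregate_by_month(readings)
for month, value in monthly_mean.items():
    print(f"month {month:2d}: {value:5.1f}")
```

In this synthetic data, the day/night swing of 10 degrees dwarfs the seasonal change from one day to the next, so the season only becomes visible once the readings are averaged per month over the full year.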
Implications of concept drifts for machine learning
The challenging part in all of this is what follows when concept drift occurs. This possibility makes it necessary to monitor trained machine learning models when dealing with Time Series Data.
Monitoring the performance of these models then helps to detect concept drifts much earlier and to take the necessary actions, such as re-training a model or investigating new patterns that need to be considered.
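One simple way to monitor performance is to track the recent prediction error and compare it with the error seen at training time. The sketch below is only an illustration under stated assumptions: the class name `ErrorMonitor`, the window size, the tolerance factor and the example values are all hypothetical, and real systems may use more robust drift tests.

```python
from collections import deque
from statistics import mean


class ErrorMonitor:
    """Flag possible concept drift when the rolling error grows too large."""

    def __init__(self, baseline_error, window=5, tolerance=2.0):
        self.baseline_error = baseline_error  # mean absolute error at training time
        self.tolerance = tolerance            # allowed multiple of the baseline
        self.errors = deque(maxlen=window)    # most recent absolute errors

    def update(self, predicted, actual):
        """Record one prediction and return True if drift is suspected."""
        self.errors.append(abs(predicted - actual))
        if len(self.errors) < self.errors.maxlen:
            return False  # not enough observations yet
        return mean(self.errors) > self.tolerance * self.baseline_error


# Hypothetical stream: the model always predicts 10, the real values
# stay near 10 at first and then shift to around 20.
actuals = [11, 9, 10, 11, 9, 10, 11, 9, 20, 21, 19, 20]
monitor = ErrorMonitor(baseline_error=1.0, window=5, tolerance=2.0)
alerts = [i for i, actual in enumerate(actuals) if monitor.update(10, actual)]
print("drift suspected at indices:", alerts)
```

In this example the first alert is raised at index 8, the first observation after the shift, which would be the trigger to re-train the model or to investigate the new pattern.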
Deploy-and-forget scenarios are risky in live, productive Time Series applications, because no one can know whether new concept drifts will occur.
Hence, concept drifts mark another end to the age in which an application is developed once and then left to rot as it becomes outdated. When dealing with Big Time Series Data technology, monitoring and continuous deployment and development become a necessity.
Ultimately, Big Data applications based on machine learning and Time Series Data need continuous maintenance, monitoring and improvement to cope with the fact that concept drifts can occur at any time in unforeseen ways.
Sum-up FAQ
What are concept drifts in Time Series Data sets?
Concept drifts are changes over time in the patterns of the data that a concept is based on. A concept can be thought of as a data value. The data patterns a concept depends on can change in such a way that an inference of the concept's value is no longer precise. In short, a concept/value drifts over time away from the patterns and rules it was perceived to be related to at other points in time. Concept drifts are caused by changing real-world conditions, whether known or unknown.
What implications do concept drifts have for application development?
Big Data applications based on machine learning and Time Series Data need continuous maintenance, monitoring and improvement to cope with the fact that concept drifts can occur at any time in unforeseen ways.
In which machine learning use cases do concept drifts play a role?
– training forecasting algorithms
– doing manual forecasts
– anomaly detection
– causality and correlation analytics
– operational monitoring
– alerting
What are examples of concept drifts?
– Weekdays changing public transport delays
– Seasons changing temperature readings
– Aging sensors producing different readings over time
– Smartphone sensor baseline readings differing indoors and outdoors
What is the big deal about concept drifts?
Concept drifts can render analytics and trained machine learning models useless, or change the detected or underlying patterns.