In the previous article on the Internet of Things we discussed how Machine Learning, through its different families of algorithms, can transform a Data Lake of IoT data collected in the field into added value for the process.

There are many solutions derived from this approach, for example:

  • analysis of behavioral patterns to identify abnormal or, conversely, optimal situations;
  • integration of real-time monitoring tools for event prediction;
  • decision support through prescriptive analytics.

The Data Exploration phase, the first step in any Data Science project, allows you to select, within the varied data source built from IoT data collection, the perimeter of the process you are interested in, according to the results to be obtained.
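As an illustration, here is a minimal sketch of this selection step in Python with pandas; the column names (`sensor_id`, `timestamp`, `value`), the asset `pump_A`, and the time window are all hypothetical choices for the example:

```python
import pandas as pd

# Hypothetical raw IoT readings, one row per sensor sample
readings = pd.DataFrame({
    "sensor_id": ["pump_A", "pump_A", "pump_B", "pump_A"],
    "timestamp": pd.to_datetime([
        "2023-05-01 10:00:00", "2023-05-01 10:00:10",
        "2023-05-01 10:00:00", "2023-05-02 09:00:00",
    ]),
    "value": [0.81, 0.84, 1.92, 0.79],
})

# Restrict the "perimeter": one asset of interest, one time window
mask = (
    (readings["sensor_id"] == "pump_A")
    & (readings["timestamp"] < "2023-05-02")
)
perimeter = readings.loc[mask]
print(len(perimeter))  # number of samples kept for the analysis
```

In a real project the same filter would of course run against the Data Lake (e.g. via a query engine) rather than an in-memory frame, but the logic of carving out a perimeter is the same.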

At this point, the dataset must be prepared from the large volume of IoT data, which is historicized at the highest level of detail, at the sampling rate of the sensors from which it is collected.

Data Wrangling of IoT Data

The phase just introduced goes by various names, the most common being Data Wrangling (also known as data munging): this delicate phase is the basis for achieving meaningful results, and its approach varies according to the goal set.

This stage can be seen as careful photographic work: starting from a complete panorama, you must find the optimal set-up to capture the scene at its highlights, enhancing its graphic features.

Starting with IoT data at the maximum level of detail, it can be effective to perform aggregations:

  • on a defined time axis, useful, for example, if you want to implement an online prediction algorithm during the process, as in a Predictive Maintenance system;
  • by product or batch of products, necessary, for example, to correlate the data describing the production process with qualitative feedback, an approach often used for Predictive Quality Analytics.
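Both aggregations can be sketched in a few lines of pandas; the column names (`timestamp`, `batch_id`, `value`), the 1-minute window, and the mean/max summaries are illustrative assumptions, not prescriptions:

```python
import pandas as pd

# Hypothetical sensor samples, tagged with the batch being produced
samples = pd.DataFrame({
    "timestamp": pd.date_range("2023-05-01 10:00", periods=6, freq="20s"),
    "batch_id": ["B1", "B1", "B1", "B2", "B2", "B2"],
    "value": [10.0, 12.0, 11.0, 30.0, 31.0, 29.0],
})

# 1) Aggregation on a defined time axis (1-minute windows here),
#    as one might feed to an online Predictive Maintenance model
by_time = samples.resample("1min", on="timestamp")["value"].mean()

# 2) Aggregation by product batch, ready to be joined with
#    quality feedback for Predictive Quality Analytics
by_batch = samples.groupby("batch_id")["value"].agg(["mean", "max"])

print(by_time)
print(by_batch)
```

The time-window variant yields one row per interval regardless of batch boundaries, while the batch variant yields one row per product, which is what a later join with quality data requires.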

At this point, one can operate like a photographer: using statistical or mathematical aggregating functions, possibly designed ad hoc for the domain, one can capture a snapshot of the process that summarizes the key aspects from which the chosen Machine Learning algorithm can extract knowledge and value.
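Such a "snapshot" might combine standard statistics with a domain-specific function; the peak-to-peak measure below is a hypothetical example of an ad hoc aggregate, as are the column names:

```python
import pandas as pd

# Hypothetical vibration readings collected per production batch
samples = pd.DataFrame({
    "batch_id": ["B1", "B1", "B1", "B2", "B2"],
    "vibration": [0.20, 0.35, 0.25, 0.90, 0.60],
})

def peak_to_peak(series):
    """Ad hoc domain feature: spread between extreme readings."""
    return series.max() - series.min()

# One summary row per batch: the "photograph" of the process,
# mixing generic statistics with the custom aggregate
snapshot = samples.groupby("batch_id")["vibration"].agg(
    ["mean", "std", peak_to_peak]
)
print(snapshot)
```

Each row of `snapshot` is then a candidate observation for the downstream Machine Learning algorithm, with one engineered feature per aggregating function.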

Like any photographer in the digital age, having taken a good photograph, you move into post-production to remove the little noise that keeps it from being perfect: in preparing a dataset for a machine learning algorithm, this stage is Data Cleaning (or Cleansing).

This equally delicate and important process will be the subject of the next article in our series devoted to Machine Learning in the context of the Internet of Things.