The Internet of Things (IoT) has been shown to be very valuable for Business Process Management (BPM), for example, to better track and control process executions. While IoT actuators can automatically trigger actions, IoT sensors can monitor changes in the environment and the humans involved in the processes. These sensors produce large amounts of discrete and continuous data streams, which hold the key to understanding the quality of the executed processes. However, this understanding requires a joint representation of the data generated by the process engine executing the process and the data generated by the IoT sensors. In this paper, we present an extension of the event log standard format XES called DataStream. DataStream enables the connection of IoT data to process events, preserving the full context required for data analysis, even when scenarios or hardware artifacts are rapidly changing. The DataStream extension is designed based on a set of goals and evaluated by creating two datasets for real-world scenarios from the transportation/logistics and manufacturing domains.
Keywords: process management; Industry 4.0; IoT data; process mining; XES
All companies rely on business logic or, more generally, process logic to accomplish business goals. Process logic describes the interaction between machines, humans, software (e.g., ERP, MES), resources (e.g., raw materials), and the environment in order to achieve a predefined business goal. Such process logic is executed by a process engine, which enacts and monitors all the rules contained in the process logic. This process execution can rely on the Internet of Things (IoT), consisting of a network of (smart) machines and sensors. In this case, the process engine and the IoT work together to execute the processes. The engine orchestrates the process activities, using IoT actuators to automate process tasks, while IoT sensors and tags can be used to closely monitor the execution environment and involved resources [[
To understand and improve the processes, the process execution data is stored in an event log, which can later be analyzed using a variety of process mining techniques [[
In the absence of a unified, expressive standard for IoT-enriched event logs, various players in industry and academia are developing their own proprietary formats and database schemas. This results in many highly customized and non-interoperable data formats and procedural applications (examples as of 2023 include Celonis Execution Management System, IBM Process Mining, Fluxicon Disco, Microsoft Process Advisor, and Mehrwerk MPM Process Mining) dealing with process mining, i.e., runtime and ex-post analysis of data streams to check compliance with the business logic, search for the cause of errors, and gain insights about bottlenecks and resource shortages.
In this paper, we present the DataStream XES extension for the uniform representation of IoT-enriched event logs. The extension complements plain XES so that extensive IoT sensor data can be stored in events or traces, but also independently of these concepts if the connection is not clear (yet). The foundations of this extension are rooted in challenge C3 of the BPM&IoT Manifesto [[
- Provide a well-defined set of named XES attributes to describe individual (sensor) events.
- Utilize well-established XES concepts such as lists to group the named attributes for simplified analysis.
- Establish a set of named XES attributes to store many (sensor) events per process event.
- Describe how to store large quantities of (sensor) events, which might occur between the start/end of a process event or a process instance (i.e., establishing a new XES BPAF lifecycle transition).
- Establish a set of named XES attributes to connect (sensor) events to groups of process tasks.
Subsequently, these five goals are evaluated in two different real-world scenarios from the transportation/logistics and manufacturing domains.
The extension adds nested attributes to XES, providing a vocabulary for storing IoT data streams with process logs. Thus, the extension can easily be integrated with existing process execution environments or log aggregation mechanisms, as exemplified in Section 4. The DataStream XES extension is intended to lay a foundation for process mining in IoT environments and to promote re-usability and interoperability.
The structure of the paper is as follows: In Section 2, we describe the theoretical basis for process mining in IoT and the related literature. Section 3 introduces the proposed DataStream XES extension to specify IoT-enriched event logs. In Section 4, we present application scenarios for IoT-enriched event logs in smart manufacturing and public transportation. Section 5 summarizes the results, lists advantages and limitations, and gives an outlook for future research directions.
The recent developments and technologies used in the Industrial Internet of Things (IIoT) [[
Dorsemaine et al. [[
Context in IoT is defined by Dey et al. [[
Serpanos [[
Furthermore, the IoT domain is characterized by a focus on device topologies and middleware that facilitates the interaction between various classes of devices in scenarios such as smart homes and smart cities [[
Often, existing work on IoT does not cover the integration of BPM concepts. However, various approaches from the BPM community investigating aspects regarding the integration of BPM with IoT exist and are discussed in different, recent surveys [[
Chang et al. [[
The work presented in this paper complements all of these approaches by giving them a means to uniformly represent the process and IoT data in a single data structure that can be used either for long-term storage or as the basis for data analysis.
One technique to analyze IoT sensor data together with related event data is process mining. Process mining describes three analysis tasks. The most common is (i) process discovery. Discovery techniques take an event log and produce a process model depicting the process executed in the log [[
Many proposals have been made in the past for storing event logs. The first was MXML, as a simple XML format for audit and trails in process-aware information systems [[
An XES attribute consists of (a) a data type represented by the qualified name of the XML element, (b) a key to denote the type of attribute (unique within its container), and (c) a value (see Listing 1). XES describes six types of attributes that have a value (string, date, int, float, boolean, and id), as well as two additional attributes, container and list, which can hold arbitrary child attributes. All attributes can also be nested (even inside non-container and non-list attributes) [[
<log xes.version="1.0"
     xmlns="http://www.xes-standard.org"
     xes.creator="cpee.org"
     xes.features="nested-attributes">
  <extension name="Concept" prefix="concept" uri="http://www.xes-standard.org/concept.xesext"/>
  <extension name="Lifecycle" prefix="lifecycle" uri="http://www.xes-standard.org/lifecycle.xesext"/>
  <extension name="Identity" prefix="identifier" uri="http://www.xes-standard.org/identity.xesext"/>
  <extension name="Time" prefix="time" uri="http://www.xes-standard.org/time.xesext"/>
  <global scope="trace">
    <string key="concept:name" value="name"/>
  </global>
  <global scope="event">
    <string key="concept:name" value="name"/>
    <string key="lifecycle:transition" value="start"/>
    <date key="time:timestamp" value="1970-01-01T00:00:00.000+00:00"/>
  </global>
  <string key="lifecycle:model" value="standard"/>
  <string key="creator" value="cpee.org"/>
  <string key="library" value="cpee.org"/>
  <trace>
    <string key="concept:name" value="Process 1"/>
    <event>
      <string key="concept:name" value="Task 1"/>
      <string key="lifecycle:transition" value="start"/>
      <date key="time:timestamp" value="1970-01-01T00:00:00.000+00:00"/>
      <string key="name" value="Juergen"/>
    </event>
    <event>
      <string key="concept:name" value="Task 2"/>
      <string key="lifecycle:transition" value="start"/>
      <date key="time:timestamp" value="1970-01-01T00:00:00.000+00:00"/>
      <string key="name" value="Juergen"/>
    </event>
  </trace>
</log>
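The attribute structure described above (type, key, value, optional nested children) can be read with a few lines of Python. The following is a minimal sketch using only the standard library; the helper names `local_name` and `read_attributes` are hypothetical and not part of any XES tooling, and the event snippet is taken from Listing 1.

```python
import xml.etree.ElementTree as ET

# One event from Listing 1, used as sample input.
XES_EVENT = """
<event>
  <string key="concept:name" value="Task 1"/>
  <string key="lifecycle:transition" value="start"/>
  <date key="time:timestamp" value="1970-01-01T00:00:00.000+00:00"/>
  <string key="name" value="Juergen"/>
</event>
"""

def local_name(tag):
    """Strip an XML namespace prefix such as {http://www.xes-standard.org} from a tag."""
    return tag.rsplit("}", 1)[-1]

def read_attributes(element):
    """Return (type, key, value, children) tuples for the attributes below an element.

    The element name gives the data type; nested attributes are read recursively.
    """
    return [
        (local_name(child.tag), child.get("key"), child.get("value"), read_attributes(child))
        for child in element
    ]

event = ET.fromstring(XES_EVENT)
attributes = read_attributes(event)
for attr_type, key, value, children in attributes:
    print(attr_type, key, value)
```

Because the function recurses into child elements, the same code also covers list and container attributes, whose `value` is simply absent.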
Since the requirements for event logs differ depending on the application and domain, XES can be extended. Standard XES extensions include the concept extension, which specifies a generally understood name for events, traces, or the log. In addition, the lifecycle extension can be used to specify different stages in the lifecycle of events and the time extension standardizes the specification of event timestamps [[
Recently, the uptake of new technologies and the gain in maturity of the process mining field have increased the urge to create more powerful event log models. Multiple propositions that relax some assumptions of XES and allow for more flexibility in event data representation have been presented (e.g., [[
When representing a process execution into the OCEL format, information about traces, events, or the overall sequence of steps is only available implicitly, by linking together the models of each object. This means that information is potentially lost, and ambiguities regarding the process model might appear. While theoretically, it is possible to define an object type corresponding to an overall case as understood in XES, it is non-standard and therefore not automatically analyzable. In this case, XES is still a better fit. So, despite its noteworthy flexibility and closer conceptualization to the real-world in case of business processes supported by relational databases, the missing case notion in the OCEL format can become a liability.
In recent publications the temporal and spatial aspects of sensor readings are used for automatically connecting them to process execution [[
Multi-perspective process mining [[
Banham et al. [[
For all of these approaches, datasets have been provided containing a wealth of context data in conjunction with process events. However, these datasets present slightly different granularity levels, slightly different formats, and slightly different semantics.
Wei et al. [[
Based on our literature analysis presented in Section 2, we think that XES is still the best starting point for representation and long-term storage of process data and all connected IoT data. As discussed above, (
Holding the extracted, transformed and aggregated data (i.e., after process mining, or after process execution) in a flexible, structured long-term storage format is imperative. This section discusses how XES can be extended to better support this goal.
XES is built around events. Each process activity execution can lead to a set of events in an XES log file, following the life-cycle (see Section 2) of the execution of that activity in a particular instance, i.e., each activity could lead to a "start" event, to a "complete" event, and to an arbitrary number of events in between, depending on the utilized life-cycle model.
Many XES log files just store one event per executed activity, thus sensor readings could be attached to this event. Other available logs, such as [[
An execution of the model shown in Figure 2 leads to the XES log described in Listing 1. As mentioned in the XES Standard [[
"Log, trace, and event objects contain no information themselves. They only define the structure of the document. All information in an event log is stored in attributes. Attributes describe their parent element (log, trace, etc.). All attributes have a string-based key."
However, as we indicated before, XES cannot be directly used to store IoT-enhanced BP logs. Specifically, we distinguish three different cases (see Figure 3) where IoT data might be connected to process activities, according to which part of the process that data may be relevant for:
- "Single Activity" Context : A time-series of sensor readings from at least one sensor is connected to a single activity, e.g., when the activity represents the machining of a part, collected sensor data might describe various aspects, such as the throughput of coolant while machining, a discrete series of vibration readings, or a function (continuous data) describing the noise generation (volume). All sensor data can be assigned to a particular activity, as the data is relevant between the start and the completion of that activity.
- "Group of Activities" Context : A time-series of sensor readings from at least one sensor is connected to a set of activities. This is especially relevant for environmental sensors, which for example span a multitude of production steps. For instance, temperature changes during several process activities might give insights into certain quality properties of a finished product but cannot be clearly attributed to a single step (e.g., significance of the difference between temperatures measured at the start of activity 1, and at the end of activity n).
- "Trace" Context : A time-series of sensor readings from at least one sensor is connected to a whole trace. This case is analogous to the "Group of Activities" case. It is necessary, for instance, when the sensor readings from a period before and/or after individual activities may be relevant for the process analysis, e.g., when enacting a chemical reaction, the characteristics of a warm-up phase might be important for the outcome but might not be explicitly part of the process model, which only starts with adding ingredients.
Please note that being connected to a "group of activities" or a "trace" also covers the case where one sensor reading is relevant for several different processes or process instances, or for activities in different processes or process instances. We assume that the readings are duplicated in these cases, since we focus on single instances and their activities. As each reading can be uniquely identified, complex relationships between instances and processes will still be visible in the data.
In order to realize these three contexts, we extend XES as depicted in Figure 4.
In the following, we will denote all attributes of our proposed extension with the prefix stream:, to increase the clarity of the description. We will furthermore assume that the stream: prefix is specified in an XES extension—see https://cpee.org/datastream/datastream.xesext (accessed on 8 February 2023).
The core of the extension is "attribute:point", henceforth denoted as stream:point (see previous paragraph). It is a list that contains all the attributes needed to represent an individual sensor value as an XES artifact. Its child attributes are:
- id : uniquely identifies the sensor, e.g., if a gyro-sensor delivers orientation and angular velocity changes separately, the identifiers can be gyro/velocity and gyro/angular_velocity. On the other hand, if the sensor delivers a value pair, the identifier can be gyro.
- source : identifies the source of a sensor value, e.g., a drilling machine is the source of many different sensor readings at all times. The source attribute allows grouping values that may belong together and, thus, make sense to be analyzed together. The source is optional.
- timestamp : A timestamp when the reading was taken. The timestamp is intended to be in ISO 8601 format, including milliseconds (YYYY-MM-DDTHH:mm:ss.sssZ) or microseconds (YYYY-MM-DDTHH:mm:ss.ssssssZ).
- value : The value delivered by the sensor. As sensors can deliver single values (float, int, strings) or complex data (pairs, triplets, deeply structured data, ...), we always assume this is stored as some serialized string representation.
- meta : A straightforward extension point, which allows us to specify an additional list of attributes, which might be important for custom data analysis purposes. Meta is optional.
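The attribute set listed above can be illustrated by building a single stream:point with Python's standard library. This is a sketch under assumptions, not a reference implementation: the function name `make_point`, the sample source `weather-station`, and the choice of JSON as the "serialized string representation" of non-string values are all hypothetical.

```python
import json
import xml.etree.ElementTree as ET
from datetime import datetime, timezone

def make_point(sensor_id, value, timestamp, source=None, meta=None):
    """Build a <list key="stream:point"> element from one sensor reading."""
    point = ET.Element("list", {"key": "stream:point"})
    ET.SubElement(point, "string", {"key": "stream:id", "value": sensor_id})
    if source is not None:  # source is optional
        ET.SubElement(point, "string", {"key": "stream:source", "value": source})
    # ISO 8601 with millisecond precision, as recommended for stream:timestamp.
    ET.SubElement(point, "date", {
        "key": "stream:timestamp",
        "value": timestamp.isoformat(timespec="milliseconds"),
    })
    # Assumption: non-string values are serialized as JSON strings.
    serialized = value if isinstance(value, str) else json.dumps(value)
    ET.SubElement(point, "string", {"key": "stream:value", "value": serialized})
    if meta:  # meta is optional
        meta_list = ET.SubElement(point, "list", {"key": "stream:meta"})
        for key, item in meta.items():
            ET.SubElement(meta_list, "string", {"key": key, "value": str(item)})
    return point

example_point = make_point(
    "humidity",
    62.5,
    datetime(2021, 11, 4, 14, 22, 19, 367000, tzinfo=timezone.utc),
    source="weather-station",
)
print(ET.tostring(example_point, encoding="unicode"))
```

Storing the value as a string keeps the point structure uniform across sensors; consumers that need numbers have to parse the string again.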
The second concept (see Figure 4) is the stream:datastream. It was introduced to group points for the "Single Activity" and "Trace" contexts. Its only (optional) attribute is name, which can be used to describe the purpose of the grouping.
If a set of stream:datastream attributes is included directly at the level of the trace, all stream:point attributes are meant to exist in the "Trace" context: they cannot be attributed to any event or group of events yet.
If a stream:datacontext exists at the trace level, the stream:datacontext has to group multiple events, and it has to contain at least one stream:datastream. This realizes the "Group of Activities" context. Multiple stream:datacontext attributes can exist at trace level, meaning that multiple groups exist.
If a stream:datastream exists at the event level, it has to contain at least one stream:point. Multiple stream:datastream can exist at the event level. While this does not change the meaning of all these points being connected to one event, its purpose might be to further structure the events, e.g., separating two different levels of importance for analysis purposes.
All stream:datacontext attributes might be nested. Nested stream:datacontext attributes convey different layers of connection granularity. For example, some stream:point attributes might be connected to a group (a) of two tasks, and some other stream:point attributes might be connected to a group (b) of two different tasks. A third set of stream:point attributes might then be connected to all tasks in groups (a) and (b), leading to a (c: (a) (b)) nesting, as depicted in Listing 2:
<trace>
  <string key="concept:name" value="Process 1"/>
  <list key="stream:datacontext">
    <list key="stream:datastream">
      <list key="stream:point">
        <date key="stream:timestamp" value="2021-11-04T15:22:19.367+01:00"/>
        <string key="stream:id" value="humidity"/>
        <string key="stream:value" value="62.5"/>
      </list>
    </list>
    [...]
    <list key="stream:datacontext">
      <list key="stream:datastream">
        <list key="stream:point">
          <date key="stream:timestamp" value="2021-11-04T15:22:22.369+01:00"/>
          <string key="stream:id" value="pressure"/>
          <string key="stream:value" value="19"/>
        </list>
        [...]
      </list>
      <event>[...]</event>
      <event>[...]</event>
      [...]
    </list>
    <list key="stream:datacontext">
      <list key="stream:datastream">
        <list key="stream:point">
          <date key="stream:timestamp" value="2021-11-04T15:22:28.369+01:00"/>
          <string key="stream:id" value="temperature"/>
          <string key="stream:value" value="75.3"/>
        </list>
        [...]
      </list>
      <event>[...]</event>
      <event>[...]</event>
      [...]
    </list>
  </list>
</trace>
This leaves us with the special case of overlapping groups, where some stream:point attributes are connected to tasks 1 and 2, while some other stream:point attributes are connected to tasks 2 and 3. As XES is a tree structure, this case can only be solved by creating three stream:datastream attributes with some duplicated stream:point elements.
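To make the nesting semantics concrete, the following Python sketch resolves, for each sensor id, the events covered by its enclosing stream:datacontext. It assumes a reduced, namespace-free variant of Listing 2 in which the elided events are replaced by hypothetical tasks "Task 1" to "Task 4"; the function names are illustrative only.

```python
import xml.etree.ElementTree as ET

# Reduced variant of Listing 2 with hypothetical task names.
TRACE = """
<trace>
  <list key="stream:datacontext">
    <list key="stream:datastream">
      <list key="stream:point"><string key="stream:id" value="humidity"/></list>
    </list>
    <list key="stream:datacontext">
      <list key="stream:datastream">
        <list key="stream:point"><string key="stream:id" value="pressure"/></list>
      </list>
      <event><string key="concept:name" value="Task 1"/></event>
      <event><string key="concept:name" value="Task 2"/></event>
    </list>
    <list key="stream:datacontext">
      <list key="stream:datastream">
        <list key="stream:point"><string key="stream:id" value="temperature"/></list>
      </list>
      <event><string key="concept:name" value="Task 3"/></event>
      <event><string key="concept:name" value="Task 4"/></event>
    </list>
  </list>
</trace>
"""

def is_list(element, key):
    return element.tag == "list" and element.get("key") == key

def event_names(context):
    """Names of all events grouped by a datacontext, including nested contexts."""
    names = []
    for child in context:
        if child.tag == "event":
            names.append(child.find("string[@key='concept:name']").get("value"))
        elif is_list(child, "stream:datacontext"):
            names.extend(event_names(child))
    return names

def points_with_context(context):
    """Map each stream:id to the events of the datacontext that directly holds it."""
    mapping = {}
    events = event_names(context)
    for child in context:
        if is_list(child, "stream:datastream"):
            for candidate in child.iter("list"):
                if candidate.get("key") == "stream:point":
                    sensor_id = candidate.find("string[@key='stream:id']").get("value")
                    mapping[sensor_id] = events
        elif is_list(child, "stream:datacontext"):
            mapping.update(points_with_context(child))
    return mapping

trace = ET.fromstring(TRACE)
outer_context = trace.find("list[@key='stream:datacontext']")
context_map = points_with_context(outer_context)
for sensor_id, tasks in context_map.items():
    print(sensor_id, tasks)
```

In this sample, the humidity point of the outer context (c) covers all four tasks, while pressure and temperature only cover the tasks of their own inner group.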
For long-running tasks, especially when adding IoT data at runtime, it is beneficial to add the IoT data immediately to the XES log, instead of waiting for the next event to occur.
For example, if IoT sensors deliver data during the execution of a task with a duration of 90 min, all data would occur in the "lifecycle:transition complete" event. If the XES log is processed at runtime, e.g., for runtime drift analysis such as in [[
Thus, the introduction of a new lifecycle transition named stream/data, as shown in Listing 3, allows the immediate addition of data to the log. The event might optionally include the context (i.e., id of the task), or not include the context if the event exists at the trace or log level.
<trace>
  <string key="concept:name" value="Process 1"/>
  <event>
    <string key="lifecycle:transition" value="start"/>
    ...
  </event>
  <event>
    <string key="lifecycle:transition" value="stream/data"/>
    <list key="stream:datastream">
      ...
    </list>
    ...
  </event>
  ...
  <event>
    <string key="lifecycle:transition" value="complete"/>
    ...
  </event>
  ...
</trace>
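A logger could emit such intermediate events as soon as readings arrive. The sketch below appends a stream/data event between the start and complete events of a task. The helper names are hypothetical, and using concept:name to carry the optional task context is an assumption, since the text only states that the context may be included.

```python
import xml.etree.ElementTree as ET

def add_event(trace, task_name, transition):
    """Append a plain process event with a name and a lifecycle transition."""
    event = ET.SubElement(trace, "event")
    ET.SubElement(event, "string", {"key": "concept:name", "value": task_name})
    ET.SubElement(event, "string", {"key": "lifecycle:transition", "value": transition})
    return event

def add_stream_event(trace, readings, task_name=None):
    """Append a stream/data event holding one datastream with the given readings.

    readings is a list of (sensor_id, iso_timestamp, value) tuples.
    """
    event = ET.SubElement(trace, "event")
    if task_name is not None:  # assumption: the task context is given by name
        ET.SubElement(event, "string", {"key": "concept:name", "value": task_name})
    ET.SubElement(event, "string", {"key": "lifecycle:transition", "value": "stream/data"})
    stream = ET.SubElement(event, "list", {"key": "stream:datastream"})
    for sensor_id, timestamp, value in readings:
        point = ET.SubElement(stream, "list", {"key": "stream:point"})
        ET.SubElement(point, "date", {"key": "stream:timestamp", "value": timestamp})
        ET.SubElement(point, "string", {"key": "stream:id", "value": sensor_id})
        ET.SubElement(point, "string", {"key": "stream:value", "value": str(value)})
    return event

trace = ET.Element("trace")
add_event(trace, "Task 1", "start")
add_stream_event(trace, [("temperature", "2021-11-04T15:22:19.367+01:00", 18)], task_name="Task 1")
add_event(trace, "Task 1", "complete")

transitions = [
    e.find("string[@key='lifecycle:transition']").get("value")
    for e in trace.findall("event")
]
print(transitions)
```

A consumer that only understands the standard lifecycle model can skip events whose transition is stream/data and still see a valid start/complete sequence.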
The final element introduced in Figure 4 is stream:multipoint. This concept is not necessary from a functional perspective, but allows reducing the size of the log file.
For example, when a set of stream:point attributes all originate from the same sensor and the same source, and contain the same meta information, this information is duplicated over and over. A stream:multipoint (see Listing 4) allows us to state this redundant information once for a set of points:
<trace>
  <string key="concept:name" value="Process 1"/>
  <event>
    <string key="concept:name" value="Task 1"/>
    <string key="lifecycle:transition" value="complete"/>
    <date key="time:timestamp" value="1970-01-01T00:00:00.000+00:00"/>
    <string key="name" value="Juergen"/>
    <list key="stream:datastream">
      <string key="stream:name" value="Temperature"/>
      <list key="stream:multipoint">
        <string key="stream:id" value="keyence/mesurement"/>
        <string key="stream:source" value="keyence"/>
        <list key="stream:point">
          <date key="stream:timestamp" value="2021-11-04T15:22:19.367+01:00"/>
          <string key="stream:value" value="18"/>
        </list>
        <list key="stream:point">
          <date key="stream:timestamp" value="2021-11-04T15:22:20.369+01:00"/>
          <string key="stream:value" value="19"/>
        </list>
      </list>
    </list>
  </event>
</trace>
Alternatively, it can be used to group according to timestamp if a set of sensor readings are taken at discrete points in time (see Listing 5):
<trace>
  <string key="concept:name" value="Process 1"/>
  <event>
    <string key="concept:name" value="Task 1"/>
    <string key="lifecycle:transition" value="complete"/>
    <date key="time:timestamp" value="1970-01-01T00:00:00.000+00:00"/>
    <string key="name" value="Juergen"/>
    <list key="stream:datastream">
      <string key="stream:name" value="Temperature"/>
      <list key="stream:multipoint">
        <date key="stream:timestamp" value="2021-11-04T15:22:19.367+01:00"/>
        <list key="stream:point">
          <string key="stream:id" value="temperature"/>
          <string key="stream:value" value="48.5371"/>
        </list>
        <list key="stream:point">
          <string key="stream:id" value="pressure"/>
          <string key="stream:value" value="12:30-1,12:31-2,3,4,5"/>
        </list>
      </list>
    </list>
  </event>
</trace>
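For analysis, a stream:multipoint typically has to be expanded back into self-contained points. The following sketch copies every simple attribute stated at the multipoint level into each contained stream:point; it works for both groupings shown in Listings 4 and 5. The function name `expand_multipoint` is hypothetical, and the sample values only loosely follow Listing 4.

```python
import xml.etree.ElementTree as ET

# Sample multipoint grouping two readings of one sensor (values are illustrative).
MULTIPOINT = """
<list key="stream:multipoint">
  <string key="stream:id" value="keyence/measurement"/>
  <string key="stream:source" value="keyence"/>
  <list key="stream:point">
    <date key="stream:timestamp" value="2021-11-04T15:22:19.367+01:00"/>
    <string key="stream:value" value="18"/>
  </list>
  <list key="stream:point">
    <date key="stream:timestamp" value="2021-11-04T15:22:20.369+01:00"/>
    <string key="stream:value" value="19"/>
  </list>
</list>
"""

def expand_multipoint(multipoint):
    """Return one dict per stream:point, with shared multipoint attributes copied in.

    Only simple (non-list) attributes are treated as shared; attributes set on
    the point itself take precedence over shared ones.
    """
    shared = {
        child.get("key"): child.get("value")
        for child in multipoint
        if child.tag != "list"
    }
    points = []
    for child in multipoint:
        if child.tag == "list" and child.get("key") == "stream:point":
            own = {attr.get("key"): attr.get("value") for attr in child}
            points.append({**shared, **own})
    return points

expanded = expand_multipoint(ET.fromstring(MULTIPOINT))
for point in expanded:
    print(point)
```

The expansion trades the smaller file size of the grouped form for records that can be filtered and sorted independently.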
In order to evaluate the DataStream XES extension, we present and discuss real-world IoT-enriched event logs from smart factories and the public transportation domain (see Section 4.1 and Section 4.2). The presented event logs have been created through cpee.org, which supports the DataStream XES extension to directly write logs. All presented application scenarios show how sensor data can be grouped, nested and embedded with ordinary XES logs as a basis for future data-oriented analysis tasks.
Examples in this section are displayed in the XES YAML (Yet Another Mark-up Language, https://yaml.org (accessed on 8 February 2023)) serialization because it is more compact and readable. YAML has the following properties: it relies on indentation for structure (like Python), the data types are omitted, the key and value attributes directly result in key-value pairs, e.g., the XML excerpt "
After describing the application scenarios, we discuss the results derived from using the DataStream XES extension in the scenarios to create enriched event logs and describe use cases for process mining analysis (see Section 4.3).
For efficient planning of public transportation services, knowledge about the operation of individual tram lines as well as information about effects which might influence its smooth execution, such as weather or traffic, is needed. The dataset (https://doi.org/10.5281/zenodo.7411234 (accessed on 8 February 2023)) used for this section provides an example for the collection of such data in Vienna, Austria using (
The process model in Figure 5 contains tasks that are responsible for the collection of data, which is signalized by the two curved lines on the top right side of these tasks. Such tasks contain a description of the data probes [[
Corresponding snippets taken out of the created log are shown in Listings 6 and 7. Listing 6 contains a snippet of the weather data in the log as collected by the task shown in Figure 6. In this excerpt of the log, different stream:point elements (as specified in the data probes of the task) are contained in the stream:datastream element.
Thus, (nested) stream:datastream elements can be used as a grouping mechanism that conveys semantic cohesiveness. For example, in Listing 7, different stops, each identified by a name and an id, are children of a "traffic" datastream. Instead of providing a flat list of values, additional semantic depth can be expressed, which can be utilized for visualization or analysis.
event:
  concept:instance: 8968
  concept:name: Get Weather
  [...]
  stream:datastream:
  - stream:point:
      stream:id: temperature
      stream:value: 4.95
      stream:timestamp: 2022-12-06 17:17:01.403889247 +01:00
      stream:source: openweathermap
  - stream:point:
      stream:id: feels_like
      stream:value: 0.56
      stream:timestamp: 2022-12-06 17:17:01.403972224 +01:00
      stream:source: openweathermap
  - stream:point:
      stream:id: pressure
      stream:value: 1017
      stream:timestamp: 2022-12-06 17:17:01.404046393 +01:00
      stream:source: openweathermap
  [...]
Listing 7 contains a snippet of the log created by the task shown in Figure 7. The "traffic" data stream contains information about the traffic at multiple stops. The structure of the collected data is defined by the data probe of the task. In contrast to Listing 6, where a single data stream includes multiple data points, here multiple nested data streams (stream:datastream) are created, each of which includes one data point (stream:point).
event:
  concept:instance: 8968
  concept:name: Get Traffic status
  [...]
  stream:datastream:
  - stream:name: traffic
  - stream:source: tomtom
  - stream:datastream:
    - stream:name: stop
    - stream:id: '5'
    - stream:source: tomtom
    - stream:meta:
        station: Börse
    - stream:point:
        stream:id: trafic
        stream:value: 0.6394647901326838
        stream:timestamp: 2022-12-06 17:16:40.142570851 +01:00
        stream:source: tomtom
        stream:meta:
          intersections_in_proximity: 71
  - stream:datastream:
    - stream:name: stop
    - stream:id: '46'
    - stream:source: tomtom
    - stream:meta:
        station: Schottentor U
    - stream:point:
        stream:id: trafic
        stream:value: 0.5571382984201959
        stream:timestamp: 2022-12-06 17:16:40.142576922 +01:00
        stream:source: tomtom
        stream:meta:
          intersections_in_proximity: 94
  [...]
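Nested data streams such as those in Listing 7 can be flattened into rows that keep their grouping as a path. The sketch below operates on plain Python lists and dictionaries, which is the shape we assume a YAML parser would return for the listing (a list of single-key mappings per stream:datastream). The function names and the path format "name/id" are hypothetical; the values are copied from Listing 7 and shortened to the fields used here.

```python
# Assumed in-memory form of the "traffic" datastream from Listing 7.
traffic = [
    {"stream:name": "traffic"},
    {"stream:source": "tomtom"},
    {"stream:datastream": [
        {"stream:name": "stop"},
        {"stream:id": "5"},
        {"stream:meta": {"station": "Börse"}},
        {"stream:point": {"stream:id": "trafic", "stream:value": 0.6394647901326838}},
    ]},
    {"stream:datastream": [
        {"stream:name": "stop"},
        {"stream:id": "46"},
        {"stream:meta": {"station": "Schottentor U"}},
        {"stream:point": {"stream:id": "trafic", "stream:value": 0.5571382984201959}},
    ]},
]

def first_value(items, key):
    """Return the value of the first single-key mapping that carries the key."""
    return next((item[key] for item in items if key in item), None)

def flatten(datastream, path=()):
    """Yield (path, point) pairs; path names every enclosing datastream."""
    name = first_value(datastream, "stream:name")
    ident = first_value(datastream, "stream:id")
    if name is not None:
        label = name if ident is None else f"{name}/{ident}"
        path = path + (label,)
    for item in datastream:
        if "stream:point" in item:
            yield path, item["stream:point"]
        elif "stream:datastream" in item:
            yield from flatten(item["stream:datastream"], path)

rows = [(path, point["stream:value"]) for path, point in flatten(traffic)]
for path, value in rows:
    print("/".join(path), value)
```

Keeping the path alongside each value preserves the semantic grouping of the log while producing a table that standard analysis tools can consume.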
One goal in the manufacturing domain is the production of small lot sizes while still keeping a high degree of automation. To achieve this, it is possible to use a process execution engine enacting a process model consisting of several standardized subprocesses. These standardized subprocesses are then tailored to the currently produced part by invoking them with the parameters needed for the individual piece/use-case.
In the scenario discussed in Section 4.2 we focus on a process, where several components work together to produce and measure a chess piece. The components are: (
During all of these steps, different data from the involved components is collected and stored in the logs, as proposed in the XES DataStream extension format. This data includes machining data such as the workload of the drive, the axis speed for the different axes, and the actual speed and workload of the spindle, as well as measuring data from the optical measurement. The resulting dataset (https://doi.org/10.5281/zenodo.7477845 (accessed on 8 February 2023)), which adheres to the XES DataStream extension format described earlier, provides the basis for analysis tasks such as process mining, prediction of part quality, or detection of broken tools.
As in Section 4.1, a process model is enacted by the process execution engine cpee.org. The process model includes tasks with data probes, which define how to collect/extract data and attach it to the log (see Figure 6 and Figure 7 for examples). In this scenario, machining data (see Listing 8) as well as measuring data (see Listing 9) are collected.
event:
  concept:instance: 4129
  concept:name: Fetch
  [...]
  stream:datastream:
  - stream:name: MaxxTurn45
  - stream:source: machine
  - stream:point:
      stream:id: State/progStatus
      stream:value: 3
      stream:timestamp: '2022-12-16T18:01:17.000+01:00'
      stream:source:
        proto: opcua
        host: opc.tcp://192.168.10.59:4840
        access: ns=2;s=/Channel/State/progStatus
  [...]
  - stream:point:
      stream:id: Axes/Z/aaLeadP
      stream:value: 405.5
      stream:timestamp: '2022-12-16T18:01:18.000+01:00'
      stream:source:
        proto: opcua
        host: opc.tcp://192.168.10.59:4840
        access: ns=2;s=/Channel/MachineAxis/aaLeadP[u1,2]
  - stream:point:
      stream:id: Axes/X/aaTorque
      stream:value: -2.028
      stream:timestamp: '2022-12-16T18:01:18.000+01:00'
      stream:source:
        proto: opcua
        host: opc.tcp://192.168.10.59:4840
        access: ns=2;s=/Channel/MachineAxis/aaTorque[u1,1]
  [...]
The tasks behind Listings 8 and 9 both extract data and create a data stream consisting of different data points. This enables the collection of many individual values, as shown in Listing 8, but also data collection in scenarios where a few values or only one value is measured over and over again (as in the measuring example shown in Listing 9).
event:
  concept:instance: 4153
  concept:name: Fetch
  [...]
  stream:datastream:
  - stream:name: keyence
  - stream:source: machine
  - stream:point:
      stream:id: measurement
      stream:value: 20.08
      stream:timestamp: '2022-12-16T17:51:44.408+01:00'
      stream:source:
        proto: serial
        access: "/dev/ttyUSB0 115200"
      stream:meta: {}
  - stream:point:
      stream:id: measurement
      stream:value: 20.09
      stream:timestamp: '2022-12-16T17:51:44.423+01:00'
      stream:source:
        proto: serial
        access: "/dev/ttyUSB0 115200"
      stream:meta: {}
  [...]
The stream points shown in Listing 9 represent a time series of measurements of the contour of the chess piece. Without the DataStream extension, these measurements would have been part of the log in a proprietary format as provided by the machine (as part of a PDF file). By representing the measurements in the DataStream format, data extraction and transformation steps can be avoided, and data for different measurement machines becomes readily comparable due to the uniform representation.
For the scenarios described in Section 4.1 and Section 4.2 the DataStream XES extension allowed to collect not only the process flow data but also IoT data collected during the process. Such context data contains important information, e.g., when performing a root cause analysis for process outcome properties such as part quality.
For example, in transportation data set (see Section 4.1) the traffic state is explicitly queried as part of the process model, leading to a value between 0 (no traffic flow) and 1 (free traffic flow). In a traditional process log all the sensor readings (from different crossings) in the vicinity would not be part of the data set but might yield crucial information for process improvement (in this case, e.g., changing the timing of traffic lights, or moving the station).
Another, even clearer example can be highlighted in the manufacturing data set (see Section 4.2) the task of producing a rook from the process point of view only results in "success" or "no success". The over twenty sensors that produced several thousand readings during the two-minute duration of machining, cannot easily be included in a traditional XES file, and were traditionally analyzed separately from the process. The XES DataStream extension provides a means to structure and store these readings in a common format.
Linking/integrating the data directly in the event log, can provide a common basis for future analysis tools that can perform a joint analysis of process and IoT data.
A current limitation is the usability of the data structure in analysis tools, as no implementations exist yet that can consume it. Moreover, because IoT settings produce large amounts of data, the logs can quickly become very large, which might overwhelm some existing process mining tools.
In this paper, an extension to XES has been presented, allowing the joint representation of process event logs and IoT data related to the environment where these events occur. This enables the development of generic data pipelines, process mining approaches, and visualization tools for IoT event logs.
The extension identifies what is required from the IoT perspective to enable the use of BPM methods for IoT (cf. [[
Section 1 defines the goals that should be met by the proposed XES extension. By describing real-world application scenarios (see Section 4) and demonstrating how the goals are achieved using the DataStream XES extension proposed in this paper, it is shown that the approach fulfills the stated goals:
- "Provide a well-defined set of named XES attributes to describe individual (sensor) events." is achieved by defining the named XES attributes in the DataStream Metamodel shown in Figure 4 and using these attributes in the logs of the real-world application scenarios.
- "Utilize well-established XES concepts such as lists to group the named attributes for simplified analysis." is achieved by identifying different granularity levels such as stream:datastream and stream:point in the DataStream Metamodel shown in Figure 4 and assigning the named attributes to their corresponding levels. This simplifies analysis because, for example, stream:points can share attributes of their parent stream:datastream, as shown in Listings 7–9. Sharing attributes de-duplicates data across a set of stream:points and thus reduces data cleaning effort.
- "Establish a set of named XES attributes to store many (sensor) events per process event." is achieved by allowing multiple stream:datastreams, and therefore also multiple stream:points, to be connected to one process event in the DataStream Metamodel shown in Figure 4. This is also shown in Listings 6–9, where multiple stream:datastreams and/or stream:points are present in one process event. Having a set of common attributes instead of a mix of different attributes with slightly different meanings reduces data transformation effort and provides context and meaning to the most basic shared concepts.
- "Describe how to store large quantities of (sensor) events, which might occur between the start/end of a process event or a process instance (i.e., establishing a new XES BPAF lifecycle transition)." is achieved by introducing the stream/data lifecycle transition. The stream/data lifecycle transition allows IoT data to be added to the XES log at any time, which removes the need to wait for the next process event to carry the data collected up to that point. This is described in more depth in Section 3.
- "Establish a set of named XES attributes to connect (sensor) events to groups of process tasks." is achieved by introducing stream/datacontext (see Section stream/datacontext). This allows (sensor) data to be connected to process tasks even when it does not have the same granularity, for example when averaged sensor readings (e.g., temperature, humidity, ...) span multiple process tasks.
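The grouping described in the goals above can be sketched as a small data model. This is an illustrative Python reading of the stream:datastream and stream:point levels, not the normative metamodel of Figure 4. The class names, the `effective_source` helper, the second data stream, and the concrete fallback rule (a point without its own source uses the source of its parent data stream) are assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class StreamPoint:
    """One sensor reading (stream:point level)."""
    id: str
    value: Any
    timestamp: str
    source: Optional[Any] = None  # None means: fall back to the parent
    meta: Dict[str, Any] = field(default_factory=dict)


@dataclass
class DataStream:
    """A named group of points (stream:datastream level)."""
    name: str
    source: Optional[Any] = None
    points: List[StreamPoint] = field(default_factory=list)

    def effective_source(self, point: StreamPoint) -> Any:
        # Assumed sharing rule: attributes set once on the data stream
        # apply to every point that does not override them.
        return point.source if point.source is not None else self.source


@dataclass
class ProcessEvent:
    """A process event that may carry several data streams."""
    name: str
    datastreams: List[DataStream] = field(default_factory=list)

    def all_points(self):
        return [(ds.name, p) for ds in self.datastreams for p in ds.points]


# Values for "keyence" follow Listing 9; the second point omits its source
# on purpose to show the fallback. The "temperature" stream is hypothetical.
keyence = DataStream(
    name="keyence",
    source="machine",
    points=[
        StreamPoint("measurement", 20.08, "2022-12-16T17:51:44.408+01:00",
                    source={"proto": "serial", "access": "/dev/ttyUSB0 115200"}),
        StreamPoint("measurement", 20.09, "2022-12-16T17:51:44.423+01:00"),
    ],
)
temperature = DataStream(
    name="temperature",
    source="hypothetical_sensor",
    points=[StreamPoint("reading", 21.5, "2022-12-16T17:51:45.000+01:00")],
)
event = ProcessEvent("Fetch", [keyence, temperature])
```

In this reading, one process event holds several data streams, each data stream holds many points, and attributes stated once on the data stream do not have to be repeated on every point.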
In the future, a complete event log of a factory shall be parsed, analyzed, and visualized based on the proposed extension format to support process refinement and root cause analysis, in order to promote process re-engineering with the goal of resilient [[
Graph: Figure 1 Related Work.
Graph: Figure 2 Example Process.
Graph: Figure 3 Different Contexts in Which IoT Data Can Be Collected.
Graph: Figure 4 XES + DataStream Metamodel Extension.
Graph: Figure 5 Process Model for Collecting Public Transportation Delay Data (Process model used for creating the dataset described in ) (accessed on 8 February 2023).
Graph: Figure 6 Data Probes for the "Get Weather" Task.
Graph: Figure 7 Data Probe(s) for the "Get Traffic Status" Task.
Graph: Figure 8 One Good and One Bad (With Chip) Chess Piece on the Palette (Figure available at (accessed on 8 February 2023)).
Conceptualization, J.M., J.G., L.M., M.E. and Y.B.; methodology, J.M., J.G., L.M., M.E. and Y.B.; software, J.M. and J.G.; validation, J.M. and M.E.; investigation, J.M., J.G., L.M., M.E. and Y.B.; resources, S.R.-M., E.S.A. and R.B.; data curation, J.M. and M.E.; writing—original draft preparation, J.M., J.G. and L.M.; writing—review and editing, J.M., J.G., L.M., M.E., Y.B., J.-V.B., S.R.-M., E.S.A. and R.B.; visualization, J.M.; supervision, S.R.-M., E.S.A. and R.B.; project administration, S.R.-M., E.S.A. and R.B.; funding acquisition, S.R.-M., E.S.A. and R.B. All authors have read and agreed to the published version of the manuscript.
The datasets utilized in this work are openly available in the Zenodo open access repository at https://doi.org/10.5281/zenodo.7411234 and https://doi.org/10.5281/zenodo.7477845 (accessed on 8 February 2023).
The authors declare no conflict of interest.
By Juergen Mangler; Joscha Grüger; Lukas Malburg; Matthias Ehrendorfer; Yannis Bertrand; Janik-Vasily Benzin; Stefanie Rinderle-Ma; Estefania Serral Asensio and Ralph Bergmann