This page guides you in creating your own event log objects. First, the data model for events is introduced. Second, it is shown how to create eventlog objects. Finally, some common data transformations are shown which can be useful for reshaping event data.

## Event data model

The notion of an event log in bupaR refers to a set of events which are recorded in the context of a process. For instance, suppose the process under consideration takes place in the emergency department of a hospital. A general representation of the data model is shown below. Firstly, each event belongs to a case. A case, in general, is an instance of the process. In the emergency department example, a case would be a visit by a patient.

### Activities and activity instances

Each event relates to the coarser concept of an activity. For instance, activities in our example might be: check-in, surgery, treatment, etc. When an activity is performed, this means that an activity instance is created. While the label surgery refers to an activity, one specific surgery for a specific patient at a specific point in time is an activity instance.

```r
## # A tibble: 5 x 2
##    patient  activity
##      <chr>     <chr>
## 1 John Doe  check-in
## 2 John Doe   surgery
## 3 John Doe treatment
## 4 John Doe   surgery
## 5 John Doe check-out
```

The table above shows a fictitious example of a patient who went through 4 different activities. Note that there are 5 different activity instances, as there were two instances of the “surgery” activity.
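The difference between activities and activity instances can be checked with a few lines of base R. The sketch below is only an illustration: the `visits` data.frame is a hand-made reconstruction of the table above and is not an object provided by bupaR.

```r
# Illustrative reconstruction of the table above (not part of bupaR)
visits <- data.frame(
  patient  = rep("John Doe", 5),
  activity = c("check-in", "surgery", "treatment", "surgery", "check-out"),
  stringsAsFactors = FALSE
)

# An activity is a distinct label; here every row is one activity instance
n_activities <- length(unique(visits$activity))
n_instances  <- nrow(visits)

# Activities that were performed more than once
repeated <- names(which(table(visits$activity) > 1))
```

Under these assumptions, `n_activities` counts the distinct labels, `n_instances` counts the rows, and `repeated` lists the activities that occur in more than one instance.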

### Transactional life cycle

An event is an atomic registration related to an activity instance. It thus contains one (and only one) timestamp. Additionally, the event should include a reference to a life cycle transition. More specifically, multiple events can describe different life cycle transitions of a single activity instance. For example, one event might record when a surgery is scheduled, another when it is started, yet another when it is completed, etc. The figure shows the standard transactional life cycle. While it supports a wide variety of transactional life cycle stages, users are allowed to define their own life cycle transitions.

In the table below, the earlier example is extended: for each activity instance, different statuses of the transactional life cycle can be seen, each with its own timestamp. At this point, each row refers to a single specific event. Note that not all activity instances have the same life cycle transitions, and even different instances of the same activity might have different recorded transitions. For example, in contrast with the first surgery, the second started without being scheduled, probably for reasons of urgency.

| patient | activity | timestamp | status |
|---|---|---|---|
| John Doe | check-in | 2017-05-10 08:33:26 | complete |
| John Doe | surgery | 2017-05-10 08:38:21 | schedule |
| John Doe | surgery | 2017-05-10 08:53:16 | start |
| John Doe | surgery | 2017-05-10 09:25:19 | complete |
| John Doe | treatment | 2017-05-10 10:01:25 | start |
| John Doe | treatment | 2017-05-10 10:35:18 | complete |
| John Doe | surgery | 2017-05-10 10:41:35 | start |
| John Doe | surgery | 2017-05-10 11:05:56 | complete |
| John Doe | check-out | 2017-05-11 14:52:36 | complete |

In order to correlate events which belong to the same activity instance, an activity instance identifier is required. For example, a patient may have gone through several surgeries, each with its own schedule, start and complete events. The activity instance identifier then makes it possible to distinguish which events belong together and which do not. The activity instance identifier is always required and is especially important when activity instances run concurrently. In the table below, the event-activity instance correlation is formally defined by the activity_instance column. It is important to note that this instance identifier should be unique, also across different cases and activities.

| patient | activity | timestamp | status | activity_instance |
|---|---|---|---|---|
| John Doe | check-in | 2017-05-10 08:33:26 | complete | 1 |
| John Doe | surgery | 2017-05-10 08:38:21 | schedule | 2 |
| John Doe | surgery | 2017-05-10 08:53:16 | start | 2 |
| John Doe | surgery | 2017-05-10 09:25:19 | complete | 2 |
| John Doe | treatment | 2017-05-10 10:01:25 | start | 3 |
| John Doe | treatment | 2017-05-10 10:35:18 | complete | 3 |
| John Doe | surgery | 2017-05-10 10:41:35 | start | 4 |
| John Doe | surgery | 2017-05-10 11:05:56 | complete | 4 |
| John Doe | check-out | 2017-05-11 14:52:36 | complete | 5 |
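
The uniqueness requirement on the activity instance identifier can be checked before building an event log. The following is a minimal sketch in base R; `find_ambiguous_ids` is a hypothetical helper written for this illustration (it is not a bupaR function), and the `events` data.frame is a hand-made copy of the relevant columns of the table above.

```r
# Hypothetical helper (not part of bupaR): returns the activity instance
# identifiers that are shared by more than one (case, activity) pair.
find_ambiguous_ids <- function(data,
                               case_col = "patient",
                               activity_col = "activity",
                               instance_col = "activity_instance") {
  pairs <- unique(data[, c(case_col, activity_col, instance_col)])
  ids <- pairs[[instance_col]]
  unique(ids[duplicated(ids)])
}

# Hand-made copy of the table above (timestamps and statuses omitted)
events <- data.frame(
  patient = rep("John Doe", 9),
  activity = c("check-in", "surgery", "surgery", "surgery", "treatment",
               "treatment", "surgery", "surgery", "check-out"),
  activity_instance = c(1, 2, 2, 2, 3, 3, 4, 4, 5),
  stringsAsFactors = FALSE
)

ok_ids <- find_ambiguous_ids(events)  # empty: no identifier is reused

# Reusing identifier 2 for the treatment events makes it ambiguous
bad_events <- events
bad_events$activity_instance[bad_events$activity == "treatment"] <- 2
bad_ids <- find_ambiguous_ids(bad_events)
```

An empty result means every identifier belongs to a single case and activity; any identifier that is returned would need to be renumbered first.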

### Resources

Finally, each event can also contain the notion of a resource: for instance, the administrative clerk who checked in a patient or scheduled their surgery, or the doctor who performed a treatment.

| patient | activity | timestamp | status | activity_instance | resource |
|---|---|---|---|---|---|
| John Doe | check-in | 2017-05-10 08:33:26 | complete | 1 | Samantha |
| John Doe | surgery | 2017-05-10 08:38:21 | schedule | 2 | Danny |
| John Doe | surgery | 2017-05-10 08:53:16 | start | 2 | Richard |
| John Doe | surgery | 2017-05-10 09:25:19 | complete | 2 | Richard |
| John Doe | treatment | 2017-05-10 10:01:25 | start | 3 | Danny |
| John Doe | treatment | 2017-05-10 10:35:18 | complete | 3 | Danny |
| John Doe | surgery | 2017-05-10 10:41:35 | start | 4 | William |
| John Doe | surgery | 2017-05-10 11:05:56 | complete | 4 | William |
| John Doe | check-out | 2017-05-11 14:52:36 | complete | 5 | Samantha |

To sum up, each row in the data should be an event with at least 6 different pieces of required information:

• a timestamp
• a case identifier
• an activity label
• an activity instance identifier
• a transactional life cycle stage
• a resource identifier

Additionally, any number of custom event attributes can be added, e.g. cost.
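
Before creating an event log, it can help to check which of the six required pieces of information are present in a data.frame. The sketch below uses base R only; `missing_fields` is a hypothetical helper, and the column names in `required_fields` simply follow the example tables on this page (your own data may use different names).

```r
# Column names as used in the example tables on this page (an assumption;
# adapt them to the names in your own data)
required_fields <- c("patient", "activity", "activity_instance",
                     "status", "timestamp", "resource")

# Hypothetical helper (not part of bupaR): which required columns are absent?
missing_fields <- function(data, required = required_fields) {
  setdiff(required, names(data))
}

# A data.frame that only records case, activity and timestamp
partial <- data.frame(
  patient = "John Doe",
  activity = "check-in",
  timestamp = as.POSIXct("2017-05-10 08:33:26", tz = "UTC"),
  stringsAsFactors = FALSE
)

absent <- missing_fields(partial)
```

The columns listed in `absent` are the ones that would have to be added or derived, for example with the transformations discussed later on this page.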

## The event log object

Given that the data is in the format discussed above and stored as a data.frame, an event log object can be created with the eventlog function from bupaR as shown below.

```r
library(bupaR)

example_log_1 %>% # a data.frame with the information in the table above
  eventlog(
    case_id = "patient",
    activity_id = "activity",
    activity_instance_id = "activity_instance",
    lifecycle_id = "status",
    timestamp = "timestamp",
    resource_id = "resource"
  )
## Event log consisting of:
## 9 events
## 1 traces
## 1 cases
## 4 activities
## 5 activity instances
##
## # A tibble: 9 x 6
##    patient  activity           timestamp   status activity_instance
##      <chr>     <chr>              <dttm>    <chr>             <dbl>
## 1 John Doe  check-in 2017-05-10 08:33:26 complete                 1
## 2 John Doe   surgery 2017-05-10 08:38:21 schedule                 2
## 3 John Doe   surgery 2017-05-10 08:53:16    start                 2
## 4 John Doe   surgery 2017-05-10 09:25:19 complete                 2
## 5 John Doe treatment 2017-05-10 10:01:25    start                 3
## 6 John Doe treatment 2017-05-10 10:35:18 complete                 3
## 7 John Doe   surgery 2017-05-10 10:41:35    start                 4
## 8 John Doe   surgery 2017-05-10 11:05:56 complete                 4
## 9 John Doe check-out 2017-05-11 14:52:36 complete                 5
## # ... with 1 more variables: resource <chr>
```
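
The data.frame `example_log_1` is used above without being constructed on this page. As a minimal sketch, it could be built by hand in base R as follows. The UTC time zone is an assumption made here for reproducibility; the timestamps are stored as date-time (POSIXct) values, matching the `<dttm>` column type in the output above.

```r
# Hand-made version of example_log_1; the time zone (UTC) is an assumption
example_log_1 <- data.frame(
  patient = rep("John Doe", 9),
  activity = c("check-in", "surgery", "surgery", "surgery", "treatment",
               "treatment", "surgery", "surgery", "check-out"),
  timestamp = as.POSIXct(
    c("2017-05-10 08:33:26", "2017-05-10 08:38:21", "2017-05-10 08:53:16",
      "2017-05-10 09:25:19", "2017-05-10 10:01:25", "2017-05-10 10:35:18",
      "2017-05-10 10:41:35", "2017-05-10 11:05:56", "2017-05-11 14:52:36"),
    format = "%Y-%m-%d %H:%M:%S", tz = "UTC"
  ),
  status = c("complete", "schedule", "start", "complete", "start",
             "complete", "start", "complete", "complete"),
  activity_instance = c(1, 2, 2, 2, 3, 3, 4, 4, 5),
  resource = c("Samantha", "Danny", "Richard", "Richard", "Danny",
               "Danny", "William", "William", "Samantha"),
  stringsAsFactors = FALSE
)
```

In practice the data will usually be read from a file or database instead; the point is only that each row is one event and that the timestamp column holds date-time values and not plain text.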

## Common transformations

Often, data will not come in the format defined above, or will not include all the required values. A few examples, and how to handle them, are given below.

### Lack of transactional life cycle

Often, data is not recorded at the low level of transactions, and only a single timestamp is recorded for each activity instance. In that case, an event is equivalent to an activity instance. For instance, consider the example above, but now we only have the following information.

```r
example_log_2
## # A tibble: 5 x 4
##    patient  activity           timestamp resource
##      <chr>     <chr>              <dttm>    <chr>
## 1 John Doe  check-in 2017-05-10 08:33:26 Samantha
## 2 John Doe   surgery 2017-05-10 09:25:19  Richard
## 3 John Doe treatment 2017-05-10 10:35:18    Danny
## 4 John Doe   surgery 2017-05-10 11:05:56  William
## 5 John Doe check-out 2017-05-11 14:52:36 Samantha
```

When this is the case, it requires domain knowledge to know which transition of the life cycle is recorded. However, most of the time it will be the completion of a task which is recorded. As such, the life cycle transition can be added manually, as well as the activity instance id, which is unique for each row.

```r
library(dplyr) # provides mutate()

example_log_2 %>%
  mutate(status = "complete",                # treat every recorded event as a completion
         activity_instance = 1:nrow(.)) %>%  # one activity instance per row
  eventlog(
    case_id = "patient",
    activity_id = "activity",
    activity_instance_id = "activity_instance",
    lifecycle_id = "status",
    timestamp = "timestamp",
    resource_id = "resource"
  )
## Event log consisting of:
## 5 events
## 1 traces
## 1 cases
## 4 activities
## 5 activity instances
##
## # A tibble: 5 x 6
##    patient  activity           timestamp resource   status
##      <chr>     <chr>              <dttm>    <chr>    <chr>
## 1 John Doe  check-in 2017-05-10 08:33:26 Samantha complete
## 2 John Doe   surgery 2017-05-10 09:25:19  Richard complete
## 3 John Doe treatment 2017-05-10 10:35:18    Danny complete
## 4 John Doe   surgery 2017-05-10 11:05:56  William complete
## 5 John Doe check-out 2017-05-11 14:52:36 Samantha complete
## # ... with 1 more variables: activity_instance <int>
```

### Lack of resources

Since many of the functions in bupaR are targeted towards organizational and performance issues, they expect the presence of the resource attribute. However, in certain cases, this information will not be available, such as for the data in example_log_3.

```r
example_log_3
## # A tibble: 9 x 5
##    patient  activity           timestamp   status activity_instance
##      <chr>     <chr>              <dttm>    <chr>             <dbl>
## 1 John Doe  check-in 2017-05-10 08:33:26 complete                 1
## 2 John Doe   surgery 2017-05-10 08:38:21 schedule                 2
## 3 John Doe   surgery 2017-05-10 08:53:16    start                 2
## 4 John Doe   surgery 2017-05-10 09:25:19 complete                 2
## 5 John Doe treatment 2017-05-10 10:01:25    start                 3
## 6 John Doe treatment 2017-05-10 10:35:18 complete                 3
## 7 John Doe   surgery 2017-05-10 10:41:35    start                 4
## 8 John Doe   surgery 2017-05-10 11:05:56 complete                 4
## 9 John Doe check-out 2017-05-11 14:52:36 complete                 5
```

In order to work around this problem, the easiest solution is to include an empty resource variable.

```r
example_log_3 %>%
  mutate(resource = NA) %>% # placeholder column: no resource information available
  eventlog(
    case_id = "patient",
    activity_id = "activity",
    activity_instance_id = "activity_instance",
    lifecycle_id = "status",
    timestamp = "timestamp",
    resource_id = "resource"
  )
## Event log consisting of:
## 9 events
## 1 traces
## 1 cases
## 4 activities
## 5 activity instances
##
## # A tibble: 9 x 6
##    patient  activity           timestamp   status activity_instance
##      <chr>     <chr>              <dttm>    <chr>             <dbl>
## 1 John Doe  check-in 2017-05-10 08:33:26 complete                 1
## 2 John Doe   surgery 2017-05-10 08:38:21 schedule                 2
## 3 John Doe   surgery 2017-05-10 08:53:16    start                 2
## 4 John Doe   surgery 2017-05-10 09:25:19 complete                 2
## 5 John Doe treatment 2017-05-10 10:01:25    start                 3
## 6 John Doe treatment 2017-05-10 10:35:18 complete                 3
## 7 John Doe   surgery 2017-05-10 10:41:35    start                 4
## 8 John Doe   surgery 2017-05-10 11:05:56 complete                 4
## 9 John Doe check-out 2017-05-11 14:52:36 complete                 5
## # ... with 1 more variables: resource <lgl>
```

### Activity log

Another possibility is that instead of a list of events, a list of activity instances is available. This is the case in example_log_4.

```r
example_log_4
## # A tibble: 5 x 5
##    patient  activity            schedule               start            complete
## *    <chr>     <chr>              <dttm>              <dttm>              <dttm>
## 1 John Doe  check-in                  NA                  NA 2017-05-10 08:33:26
## 2 John Doe check-out                  NA                  NA 2017-05-11 14:52:36
## 3 John Doe   surgery 2017-05-10 08:38:21 2017-05-10 08:53:16 2017-05-10 09:25:19
## 4 John Doe   surgery                  NA 2017-05-10 10:41:35 2017-05-10 11:05:56
## 5 John Doe treatment                  NA 2017-05-10 10:01:25 2017-05-10 10:35:18
```

When this is the case, we proceed by first adding a unique id to define the activity instances, and subsequently by gathering the different timestamp columns using tidyr::gather. Rows for transitions that were not recorded (missing timestamps) are then filtered out.

```r
library(dplyr) # mutate(), filter()
library(tidyr) # gather()

example_log_4 %>%
  mutate(activity_instance = 1:nrow(.)) %>%                # unique id per activity instance
  gather(status, timestamp, schedule, start, complete) %>% # one row per life cycle transition
  mutate(resource = NA) %>%                                # no resource information available
  filter(!is.na(timestamp)) %>%                            # drop transitions that were not recorded
  eventlog(
    case_id = "patient",
    activity_id = "activity",
    activity_instance_id = "activity_instance",
    lifecycle_id = "status",
    timestamp = "timestamp",
    resource_id = "resource"
  )
## Event log consisting of:
## 9 events
## 1 traces
## 1 cases
## 4 activities
## 5 activity instances
##
## # A tibble: 9 x 6
##    patient  activity activity_instance   status           timestamp
##      <chr>     <chr>             <int>    <chr>              <dttm>
## 1 John Doe   surgery                 3 schedule 2017-05-10 08:38:21
## 2 John Doe   surgery                 3    start 2017-05-10 08:53:16
## 3 John Doe   surgery                 4    start 2017-05-10 10:41:35
## 4 John Doe treatment                 5    start 2017-05-10 10:01:25
## 5 John Doe  check-in                 1 complete 2017-05-10 08:33:26
## 6 John Doe check-out                 2 complete 2017-05-11 14:52:36
## 7 John Doe   surgery                 3 complete 2017-05-10 09:25:19
## 8 John Doe   surgery                 4 complete 2017-05-10 11:05:56
## 9 John Doe treatment                 5 complete 2017-05-10 10:35:18
## # ... with 1 more variables: resource <lgl>
```
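
In tidyr 1.0.0 and later, `gather` is superseded by `pivot_longer`, which can also drop the missing timestamps in the same step. The sketch below shows only the reshaping part of the pipeline, as one possible alternative and not as the method used on this page. The `example_log_4` data.frame is rebuilt by hand here, `parse_time` is a hypothetical helper, and the UTC time zone is an assumption.

```r
library(tidyr)

# Hypothetical helper: parse timestamps, keeping NA for unrecorded transitions
parse_time <- function(x) {
  as.POSIXct(x, format = "%Y-%m-%d %H:%M:%S", tz = "UTC")
}

# Hand-made version of example_log_4 (time zone is an assumption)
example_log_4 <- data.frame(
  patient = rep("John Doe", 5),
  activity = c("check-in", "check-out", "surgery", "surgery", "treatment"),
  schedule = parse_time(c(NA, NA, "2017-05-10 08:38:21", NA, NA)),
  start = parse_time(c(NA, NA, "2017-05-10 08:53:16",
                       "2017-05-10 10:41:35", "2017-05-10 10:01:25")),
  complete = parse_time(c("2017-05-10 08:33:26", "2017-05-11 14:52:36",
                          "2017-05-10 09:25:19", "2017-05-10 11:05:56",
                          "2017-05-10 10:35:18")),
  stringsAsFactors = FALSE
)

# One unique identifier per activity instance (one per row)
example_log_4$activity_instance <- seq_len(nrow(example_log_4))

# One row per recorded life cycle transition; missing timestamps are dropped
events_long <- pivot_longer(
  example_log_4,
  cols = c(schedule, start, complete),
  names_to = "status",
  values_to = "timestamp",
  values_drop_na = TRUE
)
```

The resulting `events_long` has one row per event and could then be given a resource column and passed to `eventlog` in the same way as in the pipeline above.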