Event data can be enriched in several ways, using predefined metrics as well as customized approaches.

Appending metrics

The metrics defined here can not only be computed in isolation, but also added directly to the event log as additional information. This is most useful at the case level, but the activity, resource and resource-activity levels are also supported (where available).

To append a metric to the event data, call the metric with the appropriate level and set the argument append = TRUE. For example, consider the throughput time.

patients %>%
    throughput_time(level = "case", append = TRUE)
## Event log consisting of:
## 5442 events
## 7 traces
## 500 cases
## 7 activities
## 2721 activity instances
## 
## # A tibble: 5,442 x 8
##    handling     patient employee handling_id registration_type
##    <fct>        <chr>   <fct>    <chr>       <fct>            
##  1 Registration 1       r1       1           start            
##  2 Registration 2       r1       2           start            
##  3 Registration 3       r1       3           start            
##  4 Registration 4       r1       4           start            
##  5 Registration 5       r1       5           start            
##  6 Registration 6       r1       6           start            
##  7 Registration 7       r1       7           start            
##  8 Registration 8       r1       8           start            
##  9 Registration 9       r1       9           start            
## 10 Registration 10      r1       10          start            
## # ... with 5,432 more rows, and 3 more variables: time <dttm>,
## #   .order <int>, throughput_time_case <dbl>

A new variable, “throughput_time_case”, has now been added to the event log as a case attribute. This new attribute can then be used directly in later analysis.
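Conceptually, appending a case-level metric means computing one value per case and repeating it on every event of that case. The sketch below illustrates this idea in plain dplyr on a small hypothetical log. The data, the column names (case_id, throughput_hours) and the join-based approach are assumptions made for illustration, not the package's actual implementation.

```r
library(dplyr)

# Hypothetical toy log: two cases, one row per event (not the patients data).
events <- tibble(
  case_id  = c("A", "A", "A", "B", "B"),
  activity = c("Registration", "Triage", "Check-out",
               "Registration", "Check-out"),
  time     = as.POSIXct(c(
    "2024-01-01 08:00:00", "2024-01-01 09:00:00", "2024-01-01 12:00:00",
    "2024-01-02 10:00:00", "2024-01-02 11:30:00"
  ), tz = "UTC")
)

# One value per case: hours between the first and the last event.
case_metric <- events %>%
  group_by(case_id) %>%
  summarise(
    throughput_hours = as.numeric(difftime(max(time), min(time), units = "hours")),
    .groups = "drop"
  )

# "Appending" here means joining that value back onto every event of the case.
enriched <- events %>%
  left_join(case_metric, by = "case_id")

enriched
```

Every event of case A carries the same value (4 hours) and every event of case B carries 1.5 hours, which is what makes the column a case attribute.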

For some metrics, several output values are candidates for appending. For example, consider the output of the trace coverage metric.

patients %>% 
    trace_coverage(level = "case")
## # A tibble: 500 x 4
##    patient trace                                         absolute relative
##    <chr>   <chr>                                            <int>    <dbl>
##  1 2       Registration,Triage and Assessment,X-Ray,Dis~      258    0.516
##  2 5       Registration,Triage and Assessment,X-Ray,Dis~      258    0.516
##  3 8       Registration,Triage and Assessment,X-Ray,Dis~      258    0.516
##  4 9       Registration,Triage and Assessment,X-Ray,Dis~      258    0.516
##  5 10      Registration,Triage and Assessment,X-Ray,Dis~      258    0.516
##  6 11      Registration,Triage and Assessment,X-Ray,Dis~      258    0.516
##  7 14      Registration,Triage and Assessment,X-Ray,Dis~      258    0.516
##  8 17      Registration,Triage and Assessment,X-Ray,Dis~      258    0.516
##  9 18      Registration,Triage and Assessment,X-Ray,Dis~      258    0.516
## 10 19      Registration,Triage and Assessment,X-Ray,Dis~      258    0.516
## # ... with 490 more rows

We obtain the absolute number of cases that are covered by each trace, as well as the relative number. Only one of these variables gets appended, and a default is chosen automatically for each metric. The result below shows that the absolute values are appended.

patients %>%
    trace_coverage(level = "case", append = TRUE)
## Event log consisting of:
## 5442 events
## 7 traces
## 500 cases
## 7 activities
## 2721 activity instances
## 
## # A tibble: 5,442 x 9
##    handling     patient employee handling_id registration_type
##    <fct>        <chr>   <fct>    <chr>       <fct>            
##  1 Registration 1       r1       1           start            
##  2 Registration 2       r1       2           start            
##  3 Registration 3       r1       3           start            
##  4 Registration 4       r1       4           start            
##  5 Registration 5       r1       5           start            
##  6 Registration 6       r1       6           start            
##  7 Registration 7       r1       7           start            
##  8 Registration 8       r1       8           start            
##  9 Registration 9       r1       9           start            
## 10 Registration 10      r1       10          start            
## # ... with 5,432 more rows, and 4 more variables: time <dttm>,
## #   .order <int>, trace <chr>, absolute_case_trace_coverage <int>

To change this default, the argument append_column can be set. For instance, we can instead append the relative coverage.

patients %>%
    trace_coverage(level = "case", append = TRUE, append_column = "relative")
## Event log consisting of:
## 5442 events
## 7 traces
## 500 cases
## 7 activities
## 2721 activity instances
## 
## # A tibble: 5,442 x 9
##    handling     patient employee handling_id registration_type
##    <fct>        <chr>   <fct>    <chr>       <fct>            
##  1 Registration 1       r1       1           start            
##  2 Registration 2       r1       2           start            
##  3 Registration 3       r1       3           start            
##  4 Registration 4       r1       4           start            
##  5 Registration 5       r1       5           start            
##  6 Registration 6       r1       6           start            
##  7 Registration 7       r1       7           start            
##  8 Registration 8       r1       8           start            
##  9 Registration 9       r1       9           start            
## 10 Registration 10      r1       10          start            
## # ... with 5,432 more rows, and 4 more variables: time <dttm>,
## #   .order <int>, trace <chr>, relative_case_trace_coverage <dbl>
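To make the difference between the absolute and relative columns concrete, the following sketch recomputes both by hand on a hypothetical three-case log using plain dplyr. The data and column names are assumptions for illustration, and this is only one way to derive such values, not necessarily how trace_coverage computes them internally.

```r
library(dplyr)

# Hypothetical toy log with three cases; rows are already in time order.
events <- tibble(
  case_id  = c("A", "A", "B", "B", "C", "C", "C"),
  activity = c("Registration", "Check-out",
               "Registration", "Check-out",
               "Registration", "X-Ray", "Check-out")
)

# Trace of a case: its activities in order, collapsed into one string.
case_traces <- events %>%
  group_by(case_id) %>%
  summarise(trace = paste(activity, collapse = ","), .groups = "drop")

# Coverage of a case: how many cases (absolute) and what share of all
# cases (relative) follow the same trace.
coverage <- case_traces %>%
  add_count(trace, name = "absolute") %>%
  mutate(relative = absolute / n())

coverage
```

Cases A and B share a trace, so each gets an absolute coverage of 2 and a relative coverage of 2/3, while case C gets 1 and 1/3.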

Custom enrichment

Besides the metrics, more customized enrichments are possible. Suppose we want to indicate which patients have had an MRI scan. Using mutate, we can do this as follows.

patients %>%
    group_by_case() %>%
    mutate(had_MRI = any(handling == "MRI SCAN")) %>%
    ungroup_eventlog()
## Event log consisting of:
## 5442 events
## 7 traces
## 500 cases
## 7 activities
## 2721 activity instances
## 
## # A tibble: 5,442 x 8
##    handling     patient employee handling_id registration_type
##    <fct>        <chr>   <fct>    <chr>       <fct>            
##  1 Registration 1       r1       1           start            
##  2 Registration 2       r1       2           start            
##  3 Registration 3       r1       3           start            
##  4 Registration 4       r1       4           start            
##  5 Registration 5       r1       5           start            
##  6 Registration 6       r1       6           start            
##  7 Registration 7       r1       7           start            
##  8 Registration 8       r1       8           start            
##  9 Registration 9       r1       9           start            
## 10 Registration 10      r1       10          start            
## # ... with 5,432 more rows, and 3 more variables: time <dttm>,
## #   .order <int>, had_MRI <lgl>

Note that group_by_case is a helper function that groups the data by case id. As a result, mutate looks for the MRI SCAN activity in each case separately. The ungroup_eventlog function removes the grouping, so that later analyses are not affected by it.
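The effect of the grouping can be seen on a small hypothetical table in plain dplyr, where group_by on an assumed case_id column stands in for group_by_case. This is a sketch on toy data, not the patients log.

```r
library(dplyr)

# Hypothetical toy log: case A has an MRI scan, case B does not.
events <- tibble(
  case_id  = c("A", "A", "A", "B", "B"),
  activity = c("Registration", "MRI SCAN", "Check-out",
               "Registration", "Check-out")
)

# With grouping, any() is evaluated once per case and its result is
# repeated on every event of that case.
grouped <- events %>%
  group_by(case_id) %>%
  mutate(had_MRI = any(activity == "MRI SCAN")) %>%
  ungroup()

# Without grouping, any() looks at the whole table, so every row is TRUE
# as soon as a single case contains an MRI scan.
ungrouped <- events %>%
  mutate(had_MRI = any(activity == "MRI SCAN"))

grouped
ungrouped
```

In the grouped result only the events of case A are flagged, whereas the ungrouped version flags every event, which is why the grouping step matters.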

Refining enriched data

Using mutate, one can always further refine the enriched variables. For instance, after appending the relative trace coverage, we can create a variable that indicates whether a case followed a frequent or infrequent path. The following code adds a variable frequent, which is TRUE if more than 20% of the cases share the same trace.

patients %>%
    trace_coverage(level = "case", append = TRUE, append_column = "relative") %>%
    mutate(frequent = relative_case_trace_coverage > 0.2)
## Event log consisting of:
## 5442 events
## 7 traces
## 500 cases
## 7 activities
## 2721 activity instances
## 
## # A tibble: 5,442 x 10
##    handling     patient employee handling_id registration_type
##    <fct>        <chr>   <fct>    <chr>       <fct>            
##  1 Registration 1       r1       1           start            
##  2 Registration 2       r1       2           start            
##  3 Registration 3       r1       3           start            
##  4 Registration 4       r1       4           start            
##  5 Registration 5       r1       5           start            
##  6 Registration 6       r1       6           start            
##  7 Registration 7       r1       7           start            
##  8 Registration 8       r1       8           start            
##  9 Registration 9       r1       9           start            
## 10 Registration 10      r1       10          start            
## # ... with 5,432 more rows, and 5 more variables: time <dttm>,
## #   .order <int>, trace <chr>, relative_case_trace_coverage <dbl>,
## #   frequent <lgl>

The new attribute can then be included in further analysis. For instance, does throughput time differ between frequent and infrequent traces?

patients %>%
    trace_coverage(level = "case", append = TRUE, append_column = "relative") %>%
    mutate(frequent = relative_case_trace_coverage > 0.2) %>%
    group_by(frequent) %>%
    throughput_time()
## # A tibble: 2 x 9
##   frequent   min    q1 median  mean    q3   max st_dev   iqr
##   <lgl>    <dbl> <dbl>  <dbl> <dbl> <dbl> <dbl>  <dbl> <dbl>
## 1 TRUE      1.50  4.33   6.15  6.70  8.61 23.1    3.24  4.28
## 2 FALSE     3.07  3.94   4.58  4.97  5.60  7.81   1.68  1.66

We see that frequent traces have a higher throughput time on average. In this specific case, the reason is that the infrequent traces are actually incomplete ones.
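One possible way to check for incomplete cases is to flag every case whose last event is not a closing activity. The sketch below does this in plain dplyr on a hypothetical toy log. The closing activity name "Check-out" and the column names are assumptions for illustration, and the text above does not state how completeness was actually determined.

```r
library(dplyr)

# Hypothetical toy log, rows in time order; case C stops after registration.
events <- tibble(
  case_id  = c("A", "A", "A", "B", "B", "C"),
  activity = c("Registration", "X-Ray", "Check-out",
               "Registration", "Check-out",
               "Registration")
)

# Assumption: a case counts as complete when its last activity is this one.
closing_activity <- "Check-out"

# One row per case, TRUE when the case ends with the closing activity.
completeness <- events %>%
  group_by(case_id) %>%
  summarise(complete = last(activity) == closing_activity, .groups = "drop")

completeness
```

Such a flag could then be used like the frequent variable above, for example to compare metrics between complete and incomplete cases or to drop the incomplete ones before further analysis.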