There are several ways to enrich event data, using predefined metrics as well as custom approaches.

Appending metrics

The metrics defined here can not only be computed in isolation, but also added directly to the event log as additional information. This is most useful at the case level, but is also supported at the activity, resource and resource-activity levels (if available).

Appending metrics to the event data can be done by calling the metric with the appropriate level and setting the append = TRUE argument. For example, consider the throughput time.

patients %>%
    throughput_time(level = "case", append = TRUE)
## Log of 5442 events consisting of:
## 7 traces 
## 500 cases 
## 2721 instances of 7 activities 
## 7 resources 
## Events occurred from 2017-01-02 11:41:53 until 2018-05-05 07:16:02 
##  
## Variables were mapped as follows:
## Case identifier:     patient 
## Activity identifier:     handling 
## Resource identifier:     employee 
## Activity instance identifier:    handling_id 
## Timestamp:           time 
## Lifecycle transition:        registration_type 
## 
## # A tibble: 5,442 x 8
##    handling patient employee handling_id registration_ty~
##    <fct>    <chr>   <fct>    <chr>       <fct>           
##  1 Registr~ 1       r1       1           start           
##  2 Registr~ 2       r1       2           start           
##  3 Registr~ 3       r1       3           start           
##  4 Registr~ 4       r1       4           start           
##  5 Registr~ 5       r1       5           start           
##  6 Registr~ 6       r1       6           start           
##  7 Registr~ 7       r1       7           start           
##  8 Registr~ 8       r1       8           start           
##  9 Registr~ 9       r1       9           start           
## 10 Registr~ 10      r1       10          start           
## # ... with 5,432 more rows, and 3 more variables: time <dttm>,
## #   .order <int>, throughput_time_case <dbl>

A new variable, “throughput_time_case”, has now been added to the event log as a case attribute. This new attribute can then be used directly in later analysis.
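To make concrete what a case attribute looks like in the data, the following is a minimal sketch on a toy data frame using only dplyr. It is not how edeaR computes or appends throughput time; the `events` table, its values and the choice of days as unit are assumptions made for illustration. The point is that the case-level value is repeated on every event row of the case.

```r
library(dplyr)

# Toy event table (hypothetical data): two cases with a few events each.
events <- tibble(
  patient = c("1", "1", "1", "2", "2"),
  handling = c("Registration", "X-Ray", "Check-out",
               "Registration", "Check-out"),
  time = as.POSIXct(
    c("2017-01-02 10:00:00", "2017-01-03 10:00:00", "2017-01-05 10:00:00",
      "2017-01-02 12:00:00", "2017-01-03 12:00:00"),
    tz = "UTC"
  )
)

# Time between the first and last event of each case (in days),
# stored as an extra column on every event of that case.
enriched <- events %>%
  group_by(patient) %>%
  mutate(throughput_time_case =
           as.numeric(difftime(max(time), min(time), units = "days"))) %>%
  ungroup()

enriched
```

Because the value is constant within a case, it can be used to filter or group whole cases without splitting them.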

For some metrics, multiple output values are candidates to be appended. For example, consider the output of the trace coverage metric.

patients %>% 
    trace_coverage(level = "case")
## # A tibble: 500 x 4
##    patient trace                                          absolute relative
##    <chr>   <chr>                                             <int>    <dbl>
##  1 2       Registration,Triage and Assessment,X-Ray,Disc~      258    0.516
##  2 5       Registration,Triage and Assessment,X-Ray,Disc~      258    0.516
##  3 8       Registration,Triage and Assessment,X-Ray,Disc~      258    0.516
##  4 9       Registration,Triage and Assessment,X-Ray,Disc~      258    0.516
##  5 10      Registration,Triage and Assessment,X-Ray,Disc~      258    0.516
##  6 11      Registration,Triage and Assessment,X-Ray,Disc~      258    0.516
##  7 14      Registration,Triage and Assessment,X-Ray,Disc~      258    0.516
##  8 17      Registration,Triage and Assessment,X-Ray,Disc~      258    0.516
##  9 18      Registration,Triage and Assessment,X-Ray,Disc~      258    0.516
## 10 19      Registration,Triage and Assessment,X-Ray,Disc~      258    0.516
## # ... with 490 more rows

We obtain the absolute number of cases that are covered by each trace, as well as the relative number. Only one of these variables gets appended; which one is chosen automatically for each metric. The result below shows that, for trace coverage, the absolute values are appended by default.

patients %>%
    trace_coverage(level = "case", append = TRUE)
## Log of 5442 events consisting of:
## 7 traces 
## 500 cases 
## 2721 instances of 7 activities 
## 7 resources 
## Events occurred from 2017-01-02 11:41:53 until 2018-05-05 07:16:02 
##  
## Variables were mapped as follows:
## Case identifier:     patient 
## Activity identifier:     handling 
## Resource identifier:     employee 
## Activity instance identifier:    handling_id 
## Timestamp:           time 
## Lifecycle transition:        registration_type 
## 
## # A tibble: 5,442 x 9
##    handling patient employee handling_id registration_ty~
##    <fct>    <chr>   <fct>    <chr>       <fct>           
##  1 Registr~ 1       r1       1           start           
##  2 Registr~ 2       r1       2           start           
##  3 Registr~ 3       r1       3           start           
##  4 Registr~ 4       r1       4           start           
##  5 Registr~ 5       r1       5           start           
##  6 Registr~ 6       r1       6           start           
##  7 Registr~ 7       r1       7           start           
##  8 Registr~ 8       r1       8           start           
##  9 Registr~ 9       r1       9           start           
## 10 Registr~ 10      r1       10          start           
## # ... with 5,432 more rows, and 4 more variables: time <dttm>,
## #   .order <int>, trace <chr>, absolute_case_trace_coverage <int>

To change this default, the argument append_column can be set. For instance, we can instead append the relative coverage.

patients %>%
    trace_coverage(level = "case", append = TRUE, append_column = "relative")
## Log of 5442 events consisting of:
## 7 traces 
## 500 cases 
## 2721 instances of 7 activities 
## 7 resources 
## Events occurred from 2017-01-02 11:41:53 until 2018-05-05 07:16:02 
##  
## Variables were mapped as follows:
## Case identifier:     patient 
## Activity identifier:     handling 
## Resource identifier:     employee 
## Activity instance identifier:    handling_id 
## Timestamp:           time 
## Lifecycle transition:        registration_type 
## 
## # A tibble: 5,442 x 9
##    handling patient employee handling_id registration_ty~
##    <fct>    <chr>   <fct>    <chr>       <fct>           
##  1 Registr~ 1       r1       1           start           
##  2 Registr~ 2       r1       2           start           
##  3 Registr~ 3       r1       3           start           
##  4 Registr~ 4       r1       4           start           
##  5 Registr~ 5       r1       5           start           
##  6 Registr~ 6       r1       6           start           
##  7 Registr~ 7       r1       7           start           
##  8 Registr~ 8       r1       8           start           
##  9 Registr~ 9       r1       9           start           
## 10 Registr~ 10      r1       10          start           
## # ... with 5,432 more rows, and 4 more variables: time <dttm>,
## #   .order <int>, trace <chr>, relative_case_trace_coverage <dbl>
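The relation between the two candidate columns can be sketched on toy data with plain dplyr. This is only an illustration under assumed data, not edeaR's implementation: the `events` table is hypothetical and its rows are assumed to be in time order within each case. The relative coverage is the absolute coverage divided by the total number of cases, which matches the output above (258 / 500 = 0.516).

```r
library(dplyr)

# Toy event table (hypothetical data), rows already in time order per case.
events <- tibble(
  patient  = c("1", "1", "2", "2", "3", "3", "3"),
  handling = c("Registration", "Check-out",
               "Registration", "Check-out",
               "Registration", "X-Ray", "Check-out")
)

# One row per case with its trace (the ordered sequence of activities).
traces <- events %>%
  group_by(patient) %>%
  summarise(trace = paste(handling, collapse = ","), .groups = "drop")

# absolute: number of cases sharing the trace;
# relative: that number as a share of all cases.
coverage <- traces %>%
  add_count(trace, name = "absolute") %>%
  mutate(relative = absolute / n())

coverage
```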

Custom enrichment

Besides the metrics, custom enrichments are also possible. Suppose we want to indicate which patients have had an MRI scan. Using mutate, we can do this as follows.

patients %>%
    group_by_case() %>%
    mutate(had_MRI = any(handling == "MRI SCAN")) %>%
    ungroup_eventlog()
## Log of 5442 events consisting of:
## 7 traces 
## 500 cases 
## 2721 instances of 7 activities 
## 7 resources 
## Events occurred from 2017-01-02 11:41:53 until 2018-05-05 07:16:02 
##  
## Variables were mapped as follows:
## Case identifier:     patient 
## Activity identifier:     handling 
## Resource identifier:     employee 
## Activity instance identifier:    handling_id 
## Timestamp:           time 
## Lifecycle transition:        registration_type 
## 
## # A tibble: 5,442 x 8
##    handling patient employee handling_id registration_ty~
##    <fct>    <chr>   <fct>    <chr>       <fct>           
##  1 Registr~ 1       r1       1           start           
##  2 Registr~ 2       r1       2           start           
##  3 Registr~ 3       r1       3           start           
##  4 Registr~ 4       r1       4           start           
##  5 Registr~ 5       r1       5           start           
##  6 Registr~ 6       r1       6           start           
##  7 Registr~ 7       r1       7           start           
##  8 Registr~ 8       r1       8           start           
##  9 Registr~ 9       r1       9           start           
## 10 Registr~ 10      r1       10          start           
## # ... with 5,432 more rows, and 3 more variables: time <dttm>,
## #   .order <int>, had_MRI <lgl>

Note that group_by_case is a helper function that groups the data by case identifier. As a result, mutate looks for the MRI SCAN activity in each case separately. The ungroup_eventlog function removes the grouping, so that later analyses are not affected by it.
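The effect of the grouping can be seen on a small example. The sketch below uses a plain tibble with dplyr's group_by and ungroup as stand-ins for group_by_case and ungroup_eventlog, on the assumption that they behave analogously for this purpose; the `events` table is hypothetical.

```r
library(dplyr)

# Toy event table (hypothetical data): only patient 1 has an MRI scan.
events <- tibble(
  patient  = c("1", "1", "2", "2"),
  handling = c("Registration", "MRI SCAN", "Registration", "X-Ray")
)

# Without grouping, any() is evaluated over the whole table.
ungrouped <- events %>%
  mutate(had_MRI = any(handling == "MRI SCAN"))

# With grouping, any() is evaluated once per case.
grouped <- events %>%
  group_by(patient) %>%
  mutate(had_MRI = any(handling == "MRI SCAN")) %>%
  ungroup()

ungrouped$had_MRI  # TRUE for every row, including those of patient 2
grouped$had_MRI    # TRUE only for the rows of patient 1
```

Forgetting the grouping step would therefore mark every patient as having had an MRI scan as soon as a single case in the log contains one.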

Refining enriched data

Using mutate, one can always further refine the enriched variables. For instance, after appending the relative trace coverage, we can create a variable that indicates whether a case followed a frequent or infrequent path. The following code adds a variable frequent, which is TRUE if more than 20% of the cases share the same trace.

patients %>%
    trace_coverage(level = "case", append = TRUE, append_column = "relative") %>%
    mutate(frequent = relative_case_trace_coverage > 0.2)
## Log of 5442 events consisting of:
## 7 traces 
## 500 cases 
## 2721 instances of 7 activities 
## 7 resources 
## Events occurred from 2017-01-02 11:41:53 until 2018-05-05 07:16:02 
##  
## Variables were mapped as follows:
## Case identifier:     patient 
## Activity identifier:     handling 
## Resource identifier:     employee 
## Activity instance identifier:    handling_id 
## Timestamp:           time 
## Lifecycle transition:        registration_type 
## 
## # A tibble: 5,442 x 10
##    handling patient employee handling_id registration_ty~
##    <fct>    <chr>   <fct>    <chr>       <fct>           
##  1 Registr~ 1       r1       1           start           
##  2 Registr~ 2       r1       2           start           
##  3 Registr~ 3       r1       3           start           
##  4 Registr~ 4       r1       4           start           
##  5 Registr~ 5       r1       5           start           
##  6 Registr~ 6       r1       6           start           
##  7 Registr~ 7       r1       7           start           
##  8 Registr~ 8       r1       8           start           
##  9 Registr~ 9       r1       9           start           
## 10 Registr~ 10      r1       10          start           
## # ... with 5,432 more rows, and 5 more variables: time <dttm>,
## #   .order <int>, trace <chr>, relative_case_trace_coverage <dbl>,
## #   frequent <lgl>

The new attribute can then be included in further analysis. For instance, does processing time differ between frequent and infrequent traces?

patients %>%
    trace_coverage(level = "case", append = TRUE, append_column = "relative") %>%
    mutate(frequent = relative_case_trace_coverage > 0.2) %>%
    group_by(frequent) %>%
    processing_time() 
## Warning: `cols` is now required.
## Please use `cols = c(raw)`
## Warning: `cols` is now required.
## Please use `cols = c(data)`
## # A tibble: 2 x 9
##   frequent   min    q1 median  mean    q3   max st_dev   iqr
##   <lgl>    <dbl> <dbl>  <dbl> <dbl> <dbl> <dbl>  <dbl> <dbl>
## 1 TRUE     0.723 1.04   1.16  1.16   1.28  1.59  0.169 0.239
## 2 FALSE    0.447 0.734  0.951 0.901  1.08  1.28  0.297 0.344

We see that cases following frequent traces have a higher processing time on average. In this specific data set, the reason is that the infrequent traces are actually incomplete ones.
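One way to check such a claim is to flag cases that do not end with the expected final activity. The following is a rough sketch on hypothetical toy data using plain dplyr, not a bupaR function; that a complete case ends with "Check-out" is an assumption made for this example, and rows are assumed to be in time order within each case.

```r
library(dplyr)

# Toy event table (hypothetical data), rows in time order per case.
events <- tibble(
  patient  = c("1", "1", "1", "2", "2", "3"),
  handling = c("Registration", "X-Ray", "Check-out",
               "Registration", "Check-out",
               "Registration")
)

# Assumption for this sketch: a complete case ends with "Check-out".
expected_end <- "Check-out"

# One row per case: its last activity and whether it counts as complete.
completeness <- events %>%
  group_by(patient) %>%
  summarise(last_activity = last(handling),
            complete = last(handling) == expected_end,
            .groups = "drop")

completeness
```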