After preparing the data and creating an eventlog object, we can use bupaR functions to get basic information about the log, as well as its metadata. In this example, we use the patients event log provided by the eventdataR package.

library(bupaR)
eventdataR::patients
## Event log consisting of:
## 5442 events
## 7 traces
## 500 cases
## 7 activities
## 2721 activity instances
## 
## # A tibble: 5,442 x 7
##    handling     patient employee handling_id registration_type
##    <fct>        <chr>   <fct>    <chr>       <fct>            
##  1 Registration 1       r1       1           start            
##  2 Registration 2       r1       2           start            
##  3 Registration 3       r1       3           start            
##  4 Registration 4       r1       4           start            
##  5 Registration 5       r1       5           start            
##  6 Registration 6       r1       6           start            
##  7 Registration 7       r1       7           start            
##  8 Registration 8       r1       8           start            
##  9 Registration 9       r1       9           start            
## 10 Registration 10      r1       10          start            
## # ... with 5,432 more rows, and 2 more variables: time <dttm>,
## #   .order <int>
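An eventlog object is created with bupaR's `eventlog()` constructor. This constructor maps the columns of a data frame to the event log identifiers. The sketch below builds a tiny log from hypothetical toy data. The data frame `toy` and all of its values are made up for illustration, and the column names are borrowed from the patients log.

```r
library(bupaR)

# Hypothetical toy data: two patients, each with one "Registration"
# activity instance recorded as a start event and a complete event
toy <- data.frame(
  patient           = c("p1", "p1", "p2", "p2"),
  handling          = c("Registration", "Registration", "Registration", "Registration"),
  handling_id       = c("1", "1", "2", "2"),
  registration_type = c("start", "complete", "start", "complete"),
  employee          = c("r1", "r1", "r1", "r1"),
  time = as.POSIXct(c("2017-01-02 10:00:00", "2017-01-02 10:15:00",
                      "2017-01-03 09:00:00", "2017-01-03 09:20:00"), tz = "UTC"),
  stringsAsFactors = FALSE
)

# Map each data column to its role in the event log
toy_log <- eventlog(
  toy,
  case_id              = "patient",
  activity_id          = "handling",
  activity_instance_id = "handling_id",
  lifecycle_id         = "registration_type",
  timestamp            = "time",
  resource_id          = "employee"
)

n_cases(toy_log)  # 2
```

The argument names of `eventlog()` mirror the fields printed by `mapping`, which is described next.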

Getting metadata

The mapping function retrieves all the metadata of an event log object, i.e. the mapping between the event log identifiers (case, activity, resource, and so on) and the data fields.

patients %>% mapping
## Case identifier:     patient 
## Activity identifier:     handling 
## Resource identifier:     employee 
## Activity instance identifier:    handling_id 
## Timestamp:           time 
## Lifecycle transition:        registration_type

In this case, we see that the handling field is the activity identifier in the event log, while the patient field is used as case identifier. We can also obtain each of these identifiers individually.

patients %>% activity_id
patients %>% case_id
patients %>% resource_id
## [1] "handling"
## [1] "patient"
## [1] "employee"
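The other fields listed by `mapping` have their own accessors as well. The sketch below assumes the patients log has the mapping printed above. `timestamp()` is called with the package prefix because bupaR's version has the same name as `utils::timestamp()`.

```r
library(bupaR)
patients <- eventdataR::patients

# Accessors for the remaining fields listed by mapping()
activity_instance_id(patients)  # "handling_id"
lifecycle_id(patients)          # "registration_type"
bupaR::timestamp(patients)      # "time" (prefixed to avoid utils::timestamp)
```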

Getting basic information

We can look at a general summary of the event log by calling the summary function.

patients %>% summary
## Number of events:  5442
## Number of cases:  500
## Number of traces:  7
## Number of distinct activities:  7
## Average trace length:  10.884
## 
## Start eventlog:  2017-01-02 11:41:53
## End eventlog:  2018-05-05 07:16:02
##                   handling      patient          employee 
##  Blood test           : 474   Length:5442        r1:1000  
##  Check-out            : 984   Class :character   r2:1000  
##  Discuss Results      : 990   Mode  :character   r3: 474  
##  MRI SCAN             : 472                      r4: 472  
##  Registration         :1000                      r5: 522  
##  Triage and Assessment:1000                      r6: 990  
##  X-Ray                : 522                      r7: 984  
##  handling_id        registration_type      time                    
##  Length:5442        complete:2721     Min.   :2017-01-02 11:41:53  
##  Class :character   start   :2721     1st Qu.:2017-05-06 17:15:18  
##  Mode  :character                     Median :2017-09-08 04:16:50  
##                                       Mean   :2017-09-02 20:52:34  
##                                       3rd Qu.:2017-12-22 15:44:11  
##                                       Max.   :2018-05-05 07:16:02  
##                                                                    
##      .order    
##  Min.   :   1  
##  1st Qu.:1361  
##  Median :2722  
##  Mean   :2722  
##  3rd Qu.:4082  
##  Max.   :5442  
## 

The basic counts shown in the summary can also be retrieved individually, each as a numeric vector of length one.

patients %>% n_activities
patients %>% n_activity_instances
patients %>% n_cases
patients %>% n_events
patients %>% n_traces
patients %>% n_resources
## [1] 7
## [1] 2721
## [1] 500
## [1] 5442
## [1] 7
## [1] 7
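Because these counters return plain numbers, they can be combined with ordinary arithmetic. In the output above, 5442 events / 500 cases = 10.884, which matches the "Average trace length" reported by `summary`. Also, 5442 events / 2721 activity instances = 2, which is consistent with every instance having one start and one complete event, as the `registration_type` counts in the summary show. The sketch below recomputes both ratios. The variable names are illustrative only.

```r
library(bupaR)
patients <- eventdataR::patients

# Events per case (10.884 for the log shown above)
events_per_case <- n_events(patients) / n_cases(patients)

# Events per activity instance (2 for the log shown above: start + complete)
events_per_instance <- n_events(patients) / n_activity_instances(patients)

c(events_per_case = events_per_case, events_per_instance = events_per_instance)
```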

More detailed information about activities, cases, resources and traces can be obtained with the correspondingly named functions activities, cases, resources and traces. For example, consider the overview of the cases of the patients event log below.

patients %>% cases
## # A tibble: 500 x 10
##    patient trace_length number_of_activities start_timestamp    
##    <chr>          <int>                <int> <dttm>             
##  1 1                  6                    6 2017-01-02 11:41:53
##  2 10                 5                    5 2017-01-06 05:58:54
##  3 100                5                    5 2017-04-11 16:34:31
##  4 101                5                    5 2017-04-16 06:38:58
##  5 102                5                    5 2017-04-16 06:38:58
##  6 103                6                    6 2017-04-19 20:22:01
##  7 104                6                    6 2017-04-19 20:22:01
##  8 105                6                    6 2017-04-21 02:19:09
##  9 106                6                    6 2017-04-21 02:19:09
## 10 107                5                    5 2017-04-22 18:32:16
## # ... with 490 more rows, and 6 more variables: complete_timestamp <dttm>,
## #   trace <chr>, trace_id <dbl>, duration_in_days <dbl>,
## #   first_activity <fct>, last_activity <fct>
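The other overview functions work the same way. Each one returns a tibble with one row per activity, resource, trace or case. The sketch below checks that the row counts agree with the counters from the previous section. This is a consistency check and does not document the exact columns, which may vary between bupaR versions.

```r
library(bupaR)
patients <- eventdataR::patients

act <- activities(patients)  # one row per activity, with its frequency
res <- resources(patients)   # one row per resource
trc <- traces(patients)      # one row per distinct activity sequence
cas <- cases(patients)       # one row per case, as printed above

# Each overview should have as many rows as the matching counter reports
c(activities = nrow(act) == n_activities(patients),
  resources  = nrow(res) == n_resources(patients),
  traces     = nrow(trc) == n_traces(patients),
  cases      = nrow(cas) == n_cases(patients))
```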

Basic subsetting

Slicing

An event log can be sliced, which means returning a slice, i.e. a subset, of the event log based on position. For an event log, these positions refer to cases rather than to individual events. The example below selects cases 101 to 200, which together contain 1,102 events.

patients %>%
    slice(101:200)
## Event log consisting of:
## 1102 events
## 2 traces
## 100 cases
## 7 activities
## 551 activity instances
## 
## # A tibble: 1,102 x 7
##    handling     patient employee handling_id registration_type
##    <fct>        <chr>   <fct>    <chr>       <fct>            
##  1 Registration 101     r1       101         start            
##  2 Registration 102     r1       102         start            
##  3 Registration 103     r1       103         start            
##  4 Registration 104     r1       104         start            
##  5 Registration 105     r1       105         start            
##  6 Registration 106     r1       106         start            
##  7 Registration 107     r1       107         start            
##  8 Registration 108     r1       108         start            
##  9 Registration 109     r1       109         start            
## 10 Registration 110     r1       110         start            
## # ... with 1,092 more rows, and 2 more variables: time <dttm>,
## #   .order <int>
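The case-based behaviour of `slice` can be checked with the counters. The sketch below assumes, as the output above shows, that positions 101 to 200 select 100 whole cases together with all of their events. `dplyr` is loaded only to make the `slice` generic available.

```r
library(bupaR)
library(dplyr)
patients <- eventdataR::patients

sliced <- slice(patients, 101:200)

n_cases(sliced)   # 100: the positions refer to cases
n_events(sliced)  # every event of those cases (1102 in the output above)
```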

Sampling

In contrast to the slice function, the sample_n function takes a random sample of n cases from the event log. The code below returns a sample of 10 patients.

patients %>%
    sample_n(size = 10)
## Event log consisting of:
## 118 events
## 2 traces
## 10 cases
## 7 activities
## 59 activity instances
## 
## # A tibble: 118 x 7
##    handling     patient employee handling_id registration_type
##    <fct>        <chr>   <fct>    <chr>       <fct>            
##  1 Registration 26      r1       26          start            
##  2 Registration 35      r1       35          start            
##  3 Registration 49      r1       49          start            
##  4 Registration 191     r1       191         start            
##  5 Registration 227     r1       227         start            
##  6 Registration 255     r1       255         start            
##  7 Registration 402     r1       402         start            
##  8 Registration 464     r1       464         start            
##  9 Registration 468     r1       468         start            
## 10 Registration 484     r1       484         start            
## # ... with 108 more rows, and 2 more variables: time <dttm>, .order <int>

Note that this function can also draw a sample larger than the number of cases in the event log, provided that sampling with replacement is allowed.
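Because `sample_n` draws at random, fixing the seed makes a sample reproducible. The sketch below draws 10 cases, then draws 1000 cases from the 500-case log. For the second draw, it assumes that the eventlog method accepts dplyr's `replace` argument, as the note above implies. How duplicated cases are labelled in the result is not covered here.

```r
library(bupaR)
library(dplyr)
patients <- eventdataR::patients

set.seed(42)  # fix the random draw so the sample is reproducible
small <- sample_n(patients, size = 10)
n_cases(small)  # 10

# 1000 cases from a 500-case log is only possible with replacement
# (assumption: the eventlog method honours dplyr's `replace` argument)
big <- sample_n(patients, size = 1000, replace = TRUE)
```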

A more extensive set of subsetting methods is provided by the edeaR package; see its documentation for more information.
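One example of the subsetting functions in edeaR is `filter_activity()`, which keeps only the events of the listed activities. The sketch below assumes that edeaR is installed. It uses only the `activities` and `reverse` arguments of `filter_activity()`. The activity labels are taken from the summary above.

```r
library(bupaR)
library(edeaR)
patients <- eventdataR::patients

keep <- c("Registration", "Check-out")

# Keep only the events whose activity is one of `keep`
reg_checkout <- filter_activity(patients, activities = keep)

# reverse = TRUE removes those activities instead
others <- filter_activity(patients, activities = keep, reverse = TRUE)

# Check which activity labels remain in each subset
all(as.character(reg_checkout[[activity_id(reg_checkout)]]) %in% keep)
any(as.character(others[[activity_id(others)]]) %in% keep)
```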