After preparing the data and creating an eventlog object, we can use bupaR functions to retrieve basic information and metadata from the log. In this example, we will use the patients event log provided by eventdataR.

library(bupaR)
library(eventdataR)  # attaches the patients event log used in all examples below
patients
## Log of 5442 events consisting of:
## 7 traces 
## 500 cases 
## 2721 instances of 7 activities 
## 7 resources 
## Events occurred from 2017-01-02 11:41:53 until 2018-05-05 07:16:02 
##  
## Variables were mapped as follows:
## Case identifier:     patient 
## Activity identifier:     handling 
## Resource identifier:     employee 
## Activity instance identifier:    handling_id 
## Timestamp:           time 
## Lifecycle transition:        registration_type 
## 
## # A tibble: 5,442 x 7
##    handling patient employee handling_id registration_ty~ time               
##    <fct>    <chr>   <fct>    <chr>       <fct>            <dttm>             
##  1 Registr~ 1       r1       1           start            2017-01-02 11:41:53
##  2 Registr~ 2       r1       2           start            2017-01-02 11:41:53
##  3 Registr~ 3       r1       3           start            2017-01-04 01:34:05
##  4 Registr~ 4       r1       4           start            2017-01-04 01:34:04
##  5 Registr~ 5       r1       5           start            2017-01-04 16:07:47
##  6 Registr~ 6       r1       6           start            2017-01-04 16:07:47
##  7 Registr~ 7       r1       7           start            2017-01-05 04:56:11
##  8 Registr~ 8       r1       8           start            2017-01-05 04:56:11
##  9 Registr~ 9       r1       9           start            2017-01-06 05:58:54
## 10 Registr~ 10      r1       10          start            2017-01-06 05:58:54
## # ... with 5,432 more rows, and 1 more variable: .order <int>

Getting metadata

The mapping function retrieves all the metadata from an event log object, i.e. the relation between event log identifiers and data fields.

patients %>% mapping
## Case identifier:     patient 
## Activity identifier:     handling 
## Resource identifier:     employee 
## Activity instance identifier:    handling_id 
## Timestamp:           time 
## Lifecycle transition:        registration_type

In this case, we see that the handling field is the activity identifier in the event log, while the patient field is used as the case identifier. We can also obtain each of these identifiers individually.

patients %>% activity_id
patients %>% case_id
patients %>% resource_id
## [1] "handling"
## [1] "patient"
## [1] "employee"

Getting basic information

We can look at a general summary of the event log by calling the summary function.

patients %>% summary
## Number of events:  5442
## Number of cases:  500
## Number of traces:  7
## Number of distinct activities:  7
## Average trace length:  10.884
## 
## Start eventlog:  2017-01-02 11:41:53
## End eventlog:  2018-05-05 07:16:02
##                   handling      patient          employee  handling_id       
##  Blood test           : 474   Length:5442        r1:1000   Length:5442       
##  Check-out            : 984   Class :character   r2:1000   Class :character  
##  Discuss Results      : 990   Mode  :character   r3: 474   Mode  :character  
##  MRI SCAN             : 472                      r4: 472                     
##  Registration         :1000                      r5: 522                     
##  Triage and Assessment:1000                      r6: 990                     
##  X-Ray                : 522                      r7: 984                     
##  registration_type      time                         .order    
##  complete:2721     Min.   :2017-01-02 11:41:53   Min.   :   1  
##  start   :2721     1st Qu.:2017-05-06 17:15:18   1st Qu.:1361  
##                    Median :2017-09-08 04:16:50   Median :2722  
##                    Mean   :2017-09-02 20:52:34   Mean   :2722  
##                    3rd Qu.:2017-12-22 15:44:11   3rd Qu.:4082  
##                    Max.   :2018-05-05 07:16:02   Max.   :5442  
## 

The basic counts shown in the summary can also be retrieved individually, each as a numeric vector of length one.

patients %>% n_activities
patients %>% n_activity_instances
patients %>% n_cases
patients %>% n_events
patients %>% n_traces
patients %>% n_resources
## [1] 7
## [1] 2721
## [1] 500
## [1] 5442
## [1] 7
## [1] 7
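
To make the difference between these counts concrete, the sketch below recomputes them with base R on a small, hypothetical toy log (toy_log and all of its values are invented for illustration and are not part of eventdataR). It assumes that rows are already ordered by time and that every activity instance has a start and a complete event. It illustrates what each count refers to and is not how bupaR computes them internally.

```r
# Hypothetical toy log: one row per event, column names mirror the patients mapping.
toy_log <- data.frame(
  patient           = c("1", "1", "1", "1", "2", "2", "2", "2", "3", "3"),
  handling          = c("Registration", "Registration", "X-Ray", "X-Ray",
                        "Registration", "Registration", "X-Ray", "X-Ray",
                        "Registration", "Registration"),
  handling_id       = c("a1", "a1", "a2", "a2", "a3", "a3", "a4", "a4", "a5", "a5"),
  registration_type = rep(c("start", "complete"), times = 5),
  employee          = c("r1", "r1", "r2", "r2", "r1", "r1", "r2", "r2", "r1", "r1"),
  stringsAsFactors  = FALSE
)

# Events are rows; the other counts are numbers of distinct identifier values.
n_events_toy             <- nrow(toy_log)
n_cases_toy              <- length(unique(toy_log$patient))
n_activity_instances_toy <- length(unique(toy_log$handling_id))
n_activities_toy         <- length(unique(toy_log$handling))
n_resources_toy          <- length(unique(toy_log$employee))

# A trace is the sequence of activities within a case.
# Keep one row per activity instance, then collapse the activities per case.
instances <- toy_log[!duplicated(toy_log$handling_id), ]
traces <- vapply(split(instances$handling, instances$patient),
                 paste, character(1), collapse = ",")
n_traces_toy <- length(unique(traces))

print(c(events = n_events_toy, cases = n_cases_toy,
        activity_instances = n_activity_instances_toy,
        activities = n_activities_toy, resources = n_resources_toy,
        traces = n_traces_toy))
```

In the toy log, patients 1 and 2 follow the same sequence of activities, so three cases yield only two distinct traces. The same reasoning explains why the patients log has 500 cases but only 7 traces, and 5442 events but only 2721 activity instances.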

More detailed information about activities, cases, resources and traces can be obtained using the functions of the same name. For example, consider the overview of the cases in the patients event log below.

patients %>% cases
## # A tibble: 500 x 10
##    patient trace_length number_of_activ~ start_timestamp     complete_timestamp 
##    <chr>          <int>            <int> <dttm>              <dttm>             
##  1 1                  6                6 2017-01-02 11:41:53 2017-01-09 19:45:45
##  2 10                 5                5 2017-01-06 05:58:54 2017-01-10 15:41:59
##  3 100                5                5 2017-04-11 16:34:31 2017-04-22 09:58:07
##  4 101                5                5 2017-04-16 06:38:58 2017-04-23 02:55:23
##  5 102                5                5 2017-04-16 06:38:58 2017-04-22 10:50:04
##  6 103                6                6 2017-04-19 20:22:01 2017-04-23 02:36:55
##  7 104                6                6 2017-04-19 20:22:01 2017-04-23 02:07:20
##  8 105                6                6 2017-04-21 02:19:09 2017-04-27 01:09:05
##  9 106                6                6 2017-04-21 02:19:09 2017-05-01 09:54:39
## 10 107                5                5 2017-04-22 18:32:16 2017-04-27 02:45:57
## # ... with 490 more rows, and 5 more variables: trace <chr>, trace_id <dbl>,
## #   duration_in_days <dbl>, first_activity <fct>, last_activity <fct>
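
As a rough illustration of what some of these per-case columns represent, the following base R sketch derives a few of them from a hypothetical toy log (toy_log and its values are invented for illustration). It assumes that trace_length counts activity instances and that duration_in_days is the time between the first and last event of a case, which matches the output above but is not taken from the bupaR source.

```r
# Hypothetical toy log with timestamps; column names mirror the patients mapping.
toy_log <- data.frame(
  patient     = c("1", "1", "1", "1", "2", "2"),
  handling    = c("Registration", "Registration", "X-Ray", "X-Ray",
                  "Registration", "Registration"),
  handling_id = c("a1", "a1", "a2", "a2", "a3", "a3"),
  time        = as.POSIXct(c("2017-01-02 10:00:00", "2017-01-02 12:00:00",
                             "2017-01-03 10:00:00", "2017-01-04 10:00:00",
                             "2017-01-05 08:00:00", "2017-01-05 20:00:00"),
                           tz = "UTC"),
  stringsAsFactors = FALSE
)

# Build one summary row per case and bind the rows together.
by_case <- split(toy_log, toy_log$patient)
case_overview <- do.call(rbind, lapply(by_case, function(d) {
  data.frame(
    patient              = d$patient[1],
    trace_length         = length(unique(d$handling_id)),  # activity instances
    number_of_activities = length(unique(d$handling)),     # distinct activities
    start_timestamp      = min(d$time),
    complete_timestamp   = max(d$time),
    duration_in_days     = as.numeric(difftime(max(d$time), min(d$time),
                                               units = "days")),
    stringsAsFactors     = FALSE
  )
}))

print(case_overview)
```

Here patient 1 has two activity instances spanning two days, while patient 2 has a single instance lasting half a day.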