After preparing the data and creating an eventlog object, we can use bupaR functions to retrieve basic information and metadata from the log. In this example, we will use the patients event log provided by eventdataR.

library(bupaR)
library(eventdataR)  # provides the patients event log used below
patients
## Event log consisting of:
## 5442 events
## 7 traces
## 500 cases
## 7 activities
## 2721 activity instances
## 
## # A tibble: 5,442 x 6
##        handling patient employee handling_id registration_type
##           <chr>   <int>    <chr>       <int>             <chr>
##  1 Registration       1       r1           1             start
##  2 Registration       2       r1           2             start
##  3 Registration       3       r1           3             start
##  4 Registration       4       r1           4             start
##  5 Registration       5       r1           5             start
##  6 Registration       6       r1           6             start
##  7 Registration       7       r1           7             start
##  8 Registration       8       r1           8             start
##  9 Registration       9       r1           9             start
## 10 Registration      10       r1          10             start
## # ... with 5,432 more rows, and 1 more variables: time <dttm>

Getting metadata

The mapping function retrieves all the metadata of an event log object, i.e. the relation between the event log identifiers and the data fields.

patients %>% mapping
## Case identifier:     patient 
## Activity identifier:     handling 
## Resource identifier:     employee 
## Activity instance identifier:    handling_id 
## Timestamp:           time 
## Lifecycle transition:        registration_type

In this case, we see that the handling field is the activity identifier in the event log, while the patient field is used as the case identifier. Each of these identifiers can also be obtained individually.

patients %>% activity_id
patients %>% case_id
patients %>% resource_id
## [1] "handling"
## [1] "patient"
## [1] "employee"
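To make the link between the mapping and the underlying columns concrete, the following minimal sketch builds a tiny event log by hand with the eventlog constructor and then reads the mapping back. The toy data frame and its values are invented for illustration; only the column names mirror the patients log.

```r
library(bupaR)

# Hypothetical toy data: two patients, one Registration each,
# recorded as a start event and a complete event.
toy <- data.frame(
    patient = c("p1", "p1", "p2", "p2"),
    handling = rep("Registration", 4),
    employee = rep("r1", 4),
    handling_id = c(1, 1, 2, 2),
    registration_type = c("start", "complete", "start", "complete"),
    time = as.POSIXct(c("2017-01-02 11:41:53", "2017-01-02 12:40:20",
                        "2017-01-02 11:45:00", "2017-01-02 12:30:00"),
                      tz = "UTC"),
    stringsAsFactors = FALSE
)

# Each identifier is mapped to a column name of the data frame.
toy_log <- eventlog(toy,
                    case_id = "patient",
                    activity_id = "handling",
                    activity_instance_id = "handling_id",
                    lifecycle_id = "registration_type",
                    timestamp = "time",
                    resource_id = "employee")

# The mapping reported here is exactly what was passed to the constructor.
toy_log %>% mapping
toy_log %>% activity_id
```

The identifier functions return the column name as a character string, which is convenient when writing code that should work on any event log regardless of how its columns are named.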

Getting basic information

We can look at a general summary of the event log by calling the summary function.

patients %>% summary
## Number of events:  5442
## Number of cases:  500
## Number of traces:  7
## Number of distinct activities:  7
## Average trace length:  10.884
## 
## Start eventlog:  2017-01-02 11:41:53
## End eventlog:  2018-05-05 07:16:02
##    handling            patient        employee          handling_id  
##  Length:5442        Min.   :  1.0   Length:5442        Min.   :   1  
##  Class :character   1st Qu.:125.0   Class :character   1st Qu.: 681  
##  Mode  :character   Median :249.0   Mode  :character   Median :1361  
##                     Mean   :249.2                      Mean   :1361  
##                     3rd Qu.:374.0                      3rd Qu.:2041  
##                     Max.   :500.0                      Max.   :2721  
##  registration_type       time                    
##  Length:5442        Min.   :2017-01-02 11:41:53  
##  Class :character   1st Qu.:2017-05-06 17:15:18  
##  Mode  :character   Median :2017-09-08 04:16:50  
##                     Mean   :2017-09-02 20:52:34  
##                     3rd Qu.:2017-12-22 15:44:11  
##                     Max.   :2018-05-05 07:16:02

The basic counts that show up in the summary can also be retrieved individually, each as a numeric vector of length one.

patients %>% n_activities
patients %>% n_activity_instances
patients %>% n_cases
patients %>% n_events
patients %>% n_traces
patients %>% n_resources
## [1] 7
## [1] 2721
## [1] 500
## [1] 5442
## [1] 7
## [1] 7
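Because each count is a plain number, the counts can be combined in ordinary arithmetic. The sketch below derives two ratios; the variable names are illustrative and not part of bupaR.

```r
library(bupaR)
library(eventdataR)

# Average number of events per case.
events_per_case <- n_events(patients) / n_cases(patients)

# Average number of events per activity instance.
events_per_instance <- n_events(patients) / n_activity_instances(patients)

events_per_case
events_per_instance
```

With the counts shown above, these work out to 5442 / 500 = 10.884 events per case, which matches the average trace length reported by summary, and 5442 / 2721 = 2 events per activity instance, consistent with each instance having a start and a complete event.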

More detailed information about activities, cases, resources and traces can be obtained using the functions named accordingly. For example, consider the overview of the cases of the patients event log below.

patients %>% cases
## # A tibble: 500 x 10
##    patient trace_length number_of_activities     start_timestamp
##      <int>        <int>                <int>              <dttm>
##  1       1            6                    6 2017-01-02 11:41:53
##  2       2            5                    5 2017-01-02 11:41:53
##  3       3            6                    6 2017-01-04 01:34:05
##  4       4            6                    6 2017-01-04 01:34:04
##  5       5            5                    5 2017-01-04 16:07:47
##  6       6            6                    6 2017-01-04 16:07:47
##  7       7            6                    6 2017-01-05 04:56:11
##  8       8            5                    5 2017-01-05 04:56:11
##  9       9            5                    5 2017-01-06 05:58:54
## 10      10            5                    5 2017-01-06 05:58:54
## # ... with 490 more rows, and 6 more variables: complete_timestamp <dttm>,
## #   trace <chr>, trace_id <dbl>, duration_in_days <dbl>,
## #   first_activity <fctr>, last_activity <fctr>
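The other overview functions are called in the same way. A brief sketch follows; output is omitted, since the exact columns may vary between bupaR versions.

```r
library(bupaR)
library(eventdataR)

# One row per activity, resource, or trace respectively.
activity_overview <- patients %>% activities
resource_overview <- patients %>% resources
trace_overview <- patients %>% traces

activity_overview
```

Each overview has one row per item, so its number of rows should agree with the corresponding count function, e.g. the activities overview has as many rows as n_activities reports.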

Basic subsetting

Slicing

An eventlog can be sliced, which means returning a slice, i.e. a subset, of the eventlog based on row number. For instance, the example below shows how we can select events 101 through 200.

patients %>%
    slice(101:200)
## Event log consisting of:
## 100 events
## 1 traces
## 100 cases
## 1 activities
## 100 activity instances
## 
## # A tibble: 100 x 6
##        handling patient employee handling_id registration_type
##           <chr>   <int>    <chr>       <int>             <chr>
##  1 Registration     101       r1         101             start
##  2 Registration     102       r1         102             start
##  3 Registration     103       r1         103             start
##  4 Registration     104       r1         104             start
##  5 Registration     105       r1         105             start
##  6 Registration     106       r1         106             start
##  7 Registration     107       r1         107             start
##  8 Registration     108       r1         108             start
##  9 Registration     109       r1         109             start
## 10 Registration     110       r1         110             start
## # ... with 90 more rows, and 1 more variables: time <dttm>
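The behaviour of slice may depend on the installed bupaR version: in more recent releases it appears to select cases rather than rows, with slice_events available as the event-level counterpart. The sketch below assumes such a release; if slice_events is not available in your installation, the slice call shown above is the one to use.

```r
library(bupaR)
library(eventdataR)

# Assumption: a bupaR release that provides slice_events().
# Selects rows (events) 101 to 200 of the event log.
event_slice <- patients %>%
    slice_events(101:200)

n_events(event_slice)
```

Note that row order here is the current ordering of the data, not necessarily chronological order.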

Sampling

In contrast to the slice function, which selects by row number, the sample_n function takes a random sample of the event log containing n cases, keeping all events of each selected case. The code below returns a sample of 10 patients.

patients %>%
    sample_n(size = 10)
## Event log consisting of:
## 112 events
## 2 traces
## 10 cases
## 7 activities
## 56 activity instances
## 
## # A tibble: 112 x 6
##        handling patient employee handling_id registration_type
##           <chr>   <int>    <chr>       <int>             <chr>
##  1 Registration       4       r1           4             start
##  2 Registration     115       r1         115             start
##  3 Registration     180       r1         180             start
##  4 Registration     242       r1         242             start
##  5 Registration     272       r1         272             start
##  6 Registration     287       r1         287             start
##  7 Registration     309       r1         309             start
##  8 Registration     345       r1         345             start
##  9 Registration     356       r1         356             start
## 10 Registration     421       r1         421             start
## # ... with 102 more rows, and 1 more variables: time <dttm>

Note that this function can also be used with a sample size larger than the number of cases in the event log, provided you allow sampling with replacement.
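Since the sample is drawn at random, results differ between runs unless a seed is set. The sketch below fixes the seed and also shows a draw with replacement; the replace argument is assumed to follow the dplyr sample_n signature, and the seed value and sample sizes are arbitrary.

```r
library(bupaR)
library(eventdataR)

# Fix the random seed so the same cases are drawn on every run.
set.seed(42)
small_sample <- patients %>%
    sample_n(size = 10)

n_cases(small_sample)

# Assumption: replace = TRUE permits a sample size above the
# number of cases in the log (500 here).
large_sample <- patients %>%
    sample_n(size = 600, replace = TRUE)
```

Without replace = TRUE, requesting more cases than the log contains is expected to fail.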

A more extensive list of subsetting methods is provided by edeaR; see the edeaR documentation for more information.