The filters for event data subsetting can mostly be divided into two types: event filters and case filters. Event filters subset parts of cases based on criteria applied to the events (e.g. the resource that performed them), while case filters subset complete cases based on criteria applied to the cases (e.g. the trace length).

Each filter has a reverse argument, which makes it easy to invert the filter. Furthermore, each filter has an interface alternative, which can be called by adding an i before the function name.

Event filters

Filter activities

The filter_activity function can be used to filter activities by name. It has three arguments:

  • the event log
  • a vector of activities
  • the reverse argument (FALSE or TRUE)

patients %>%
    filter_activity(c("X-Ray", "Blood test")) %>%
    summary
## Number of events:  996
## Number of cases:  498
## Number of traces:  2
## Number of distinct activities:  2
## Average trace length:  2
## 
## Start eventlog:  2017-01-05 08:59:04
## End eventlog:  2018-05-05 01:34:30
##                   handling     patient          employee
##  Blood test           :474   Length:996         r1:  0  
##  Check-out            :  0   Class :character   r2:  0  
##  Discuss Results      :  0   Mode  :character   r3:474  
##  MRI SCAN             :  0                      r4:  0  
##  Registration         :  0                      r5:522  
##  Triage and Assessment:  0                      r6:  0  
##  X-Ray                :522                      r7:  0  
##  handling_id        registration_type      time                    
##  Length:996         complete:498      Min.   :2017-01-05 08:59:04  
##  Class :character   start   :498      1st Qu.:2017-05-06 12:31:43  
##  Mode  :character                     Median :2017-09-08 00:10:11  
##                                       Mean   :2017-09-03 07:11:55  
##                                       3rd Qu.:2017-12-23 02:06:20  
##                                       Max.   :2018-05-05 01:34:30  
##                                                                    
##      .order     
##  Min.   :  1.0  
##  1st Qu.:249.8  
##  Median :498.5  
##  Mean   :498.5  
##  3rd Qu.:747.2  
##  Max.   :996.0  
## 

As one can see, there are only 2 distinct activities left in the event log.
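The effect of the reverse argument can be sketched in base R, independent of the package. The snippet below is only an illustration under stated assumptions: keep_activity is a hypothetical helper (not a package function), and the handling vector is toy data that merely reuses activity names from the patients log.

```r
# Toy vector of activity labels, one per event (hypothetical data).
handling <- c("Registration", "X-Ray", "Blood test", "Check-out", "X-Ray")

# Hypothetical helper: TRUE for every event that survives the filter.
keep_activity <- function(handling, activities, reverse = FALSE) {
  hit <- handling %in% activities
  # reverse = TRUE inverts the selection instead of changing the criteria.
  if (reverse) !hit else hit
}

# Events kept by the plain filter, then by the reversed filter.
handling[keep_activity(handling, c("X-Ray", "Blood test"))]
handling[keep_activity(handling, c("X-Ray", "Blood test"), reverse = TRUE)]
```

With reverse = TRUE, exactly the events dropped by the plain filter are kept, so the two results together always make up the original vector.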

Filter on activity frequency

It is also possible to filter on activity frequency. This filter uses a percentile cut-off (the percentile_cut_off argument, which newer versions of the package rename to percentage). Starting from the most frequent activity, it selects activities until the required percentage of events has been reached. Thus, a cut-off of 80% selects the most frequent activities needed to represent 80% of the events. In the example below the cut-off is 50% and the reverse argument is TRUE, so the selection is inverted: the most frequent activities that together cover 50% of the events are removed, and the remaining, less frequent activities are kept.

patients %>%
    filter_activity_frequency(percentage = 0.5, reverse = TRUE) %>%
    activity_frequency("activity")
## # A tibble: 4 x 3
##   handling   absolute relative
##   <fct>         <int>    <dbl>
## 1 Blood test      237    0.193
## 2 Check-out       492    0.401
## 3 MRI SCAN        236    0.192
## 4 X-Ray           261    0.213
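The selection logic of the frequency filter can be sketched in base R. This is a minimal illustration, not the package's implementation: select_by_frequency is a hypothetical helper, the counts are made-up toy values, and treating the cut-off as "at least this share of events" is an assumption.

```r
# Toy activity counts (hypothetical, not taken from the patients log).
activity_counts <- c(A = 50, B = 30, C = 15, D = 5)

# Hypothetical helper: names of the most frequent activities needed to cover
# at least `percentage` of all events; with reverse = TRUE, the complement.
select_by_frequency <- function(counts, percentage, reverse = FALSE) {
  sorted <- sort(counts, decreasing = TRUE)
  cum_share <- cumsum(sorted) / sum(sorted)
  # Position of the first activity at which the cumulative share
  # reaches the cut-off.
  n_keep <- which(cum_share >= percentage)[1]
  selected <- names(sorted)[seq_len(n_keep)]
  if (reverse) setdiff(names(sorted), selected) else selected
}

select_by_frequency(activity_counts, 0.7)                  # A and B
select_by_frequency(activity_counts, 0.7, reverse = TRUE)  # C and D
```

In this toy case A alone covers 50% of the events, which is below the 70% cut-off, so B is added as well. The reversed call returns whatever was not selected.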

Filter on attributes

The filter_attributes function is a very generic function and can be supplied with conditions on the data set, in the same way as the dplyr::filter function. As such, it allows you to filter on event or case attributes. Multiple conditions can be listed, separated by commas, in which case each comma is treated as "AND". You can use the | symbol to state "OR". Since the patients dataset does not have many additional attributes, the example below uses the resource and the activity. If both conditions were required, this filter would be equivalent to combining filter_activity and filter_resource. However, it has the advantage that the two conditions can also be combined with OR, as is done here.

patients %>% 
    filter_attributes(employee == "r1" | handling == "X-Ray") 
## Event log consisting of:
## 1522 events
## 2 traces
## 500 cases
## 2 activities
## 761 activity instances
## 
## # A tibble: 1,522 x 7
##    handling     patient employee handling_id registration_type
##    <fct>        <chr>   <fct>    <chr>       <fct>            
##  1 Registration 1       r1       1           start            
##  2 Registration 2       r1       2           start            
##  3 Registration 3       r1       3           start            
##  4 Registration 4       r1       4           start            
##  5 Registration 5       r1       5           start            
##  6 Registration 6       r1       6           start            
##  7 Registration 7       r1       7           start            
##  8 Registration 8       r1       8           start            
##  9 Registration 9       r1       9           start            
## 10 Registration 10      r1       10          start            
## # ... with 1,512 more rows, and 2 more variables: time <dttm>,
## #   .order <int>
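The difference between combining conditions with "AND" and with "OR" can be illustrated with plain logical indexing in base R. The data frame below is a hypothetical toy example that only borrows the column names handling and employee from the patients log. Its rows are invented for illustration.

```r
# Toy event table (hypothetical rows, column names as in the patients log).
events <- data.frame(
  handling = c("Registration", "X-Ray", "X-Ray", "Blood test"),
  employee = c("r1", "r5", "r1", "r3"),
  stringsAsFactors = FALSE
)

# Both conditions required ("AND"): the analogue of comma-separated conditions.
both <- events[events$employee == "r1" & events$handling == "X-Ray", ]

# Either condition sufficient ("OR"): the analogue of the | symbol.
either <- events[events$employee == "r1" | events$handling == "X-Ray", ]

nrow(both)    # 1 row satisfies both conditions
nrow(either)  # 3 rows satisfy at least one condition
```

The "OR" result always contains the "AND" result, which is why it cannot be obtained by chaining two separate filters one after the other.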

Filter resources

Similar to the activity filter, the resource filter can be used to filter events by listing one or more resources.

patients %>%
    filter_resource(c("r1","r4")) %>%
    resource_frequency("resource")
## # A tibble: 2 x 3
##   employee absolute relative
##   <fct>       <int>    <dbl>
## 1 r1            500    0.679
## 2 r4            236    0.321

Trim cases

The trim filter is a special event filter, as it also takes the notion of cases into account. It trims cases such that they start with a certain activity and end with a certain activity. It requires two lists: one with possible start activities and one with possible end activities. Each case is trimmed to the part from the first appearance of a start activity to the last appearance of an end activity. When reversed, these slices of the event log are removed instead of preserved.

patients %>%
    filter_trim(start_activities = "Registration",
                end_activities = c("MRI SCAN", "X-Ray")) %>%
    process_map(type = performance())
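The trimming rule for a single case can be sketched in base R. This is a rough illustration under stated assumptions, not the package's code: trim_trace is a hypothetical helper, the example trace is an invented ordering of activity names from the patients log, and returning nothing when no start or end activity is found is an assumption about the edge case.

```r
# Hypothetical helper: trim one trace (a character vector of activities in
# order) from the first start activity to the last end activity.
trim_trace <- function(trace, start_activities, end_activities, reverse = FALSE) {
  first_start <- which(trace %in% start_activities)[1]   # NA if none found
  end_pos <- which(trace %in% end_activities)
  last_end <- if (length(end_pos) > 0) max(end_pos) else NA
  if (is.na(first_start) || is.na(last_end) || last_end < first_start) {
    # Assumption: without a valid start/end pair, no slice is selected.
    inside <- rep(FALSE, length(trace))
  } else {
    idx <- seq_along(trace)
    inside <- idx >= first_start & idx <= last_end
  }
  # reverse = TRUE drops the slice instead of keeping it.
  if (reverse) trace[!inside] else trace[inside]
}

# Invented example trace using activity names from the patients log.
trace <- c("Registration", "Triage and Assessment", "X-Ray",
           "Discuss Results", "Check-out")

trim_trace(trace, "Registration", c("MRI SCAN", "X-Ray"))
trim_trace(trace, "Registration", c("MRI SCAN", "X-Ray"), reverse = TRUE)
```

For this trace the plain call keeps everything from Registration up to and including X-Ray, while the reversed call keeps only Discuss Results and Check-out.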

Case filters

Documentation coming soon