The filters for subsetting event data can mostly be divided into two types: event filters and case filters. Event filters subset parts of cases based on criteria applied to the events (e.g. the resource that performed them), while case filters subset complete cases based on criteria applied to the cases (e.g. the trace length).

Each filter has a reverse argument, which makes it easy to invert the selection. Furthermore, each filter has an interface alternative, which can be called by adding an i before the function name.
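The difference between the two filter types, and the effect of reversing a filter, can be illustrated with a small base R sketch. It does not use the package's own functions: the toy data frame and the helpers event_filter and case_filter_min_length are hypothetical and only mimic the behaviour described above.

```r
# Toy event data (hypothetical, not the patients log): one row per event.
events <- data.frame(
  case     = c(1, 1, 1, 2, 2, 3),
  activity = c("A", "B", "C", "A", "C", "A"),
  stringsAsFactors = FALSE
)

# Event filter: keeps or drops individual events, so cases can be cut into parts.
event_filter <- function(log, activities, reverse = FALSE) {
  keep <- log$activity %in% activities
  if (reverse) keep <- !keep
  log[keep, , drop = FALSE]
}

# Case filter: decides per case, then keeps or drops all of its events together.
case_filter_min_length <- function(log, min_length, reverse = FALSE) {
  trace_lengths <- table(log$case)
  selected <- names(trace_lengths)[trace_lengths >= min_length]
  keep <- as.character(log$case) %in% selected
  if (reverse) keep <- !keep
  log[keep, , drop = FALSE]
}

kept_events    <- event_filter(events, c("A", "B"))                 # events of A and B only
dropped_events <- event_filter(events, c("A", "B"), reverse = TRUE) # everything except A and B
long_cases     <- case_filter_min_length(events, 2)                 # complete cases 1 and 2
short_cases    <- case_filter_min_length(events, 2, reverse = TRUE) # complete case 3
```

Note that the event filter leaves case 1 with only part of its events, whereas the case filter always returns cases in full.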

Event filters

Filter activities

The filter_activity function can be used to filter activities by name. It has three arguments:

  • the event log
  • a vector of activities
  • the reverse argument (FALSE or TRUE)

patients %>%
    filter_activity(c("X-Ray", "Blood test")) %>%
    summary
## Number of events:  996
## Number of cases:  498
## Number of traces:  2
## Number of distinct activities:  2
## Average trace length:  2
## 
## Start eventlog:  2017-01-05 08:59:04
## End eventlog:  2018-05-05 01:34:30
##    handling            patient        employee          handling_id  
##  Length:996         Min.   :  1.0   Length:996         Min.   :1001  
##  Class :character   1st Qu.:125.0   Class :character   1st Qu.:1125  
##  Mode  :character   Median :249.5   Mode  :character   Median :1486  
##                     Mean   :249.5                      Mean   :1373  
##                     3rd Qu.:374.0                      3rd Qu.:1610  
##                     Max.   :498.0                      Max.   :1734  
##  registration_type       time                    
##  Length:996         Min.   :2017-01-05 08:59:04  
##  Class :character   1st Qu.:2017-05-06 12:31:43  
##  Mode  :character   Median :2017-09-08 00:10:11  
##                     Mean   :2017-09-03 07:11:55  
##                     3rd Qu.:2017-12-23 02:06:20  
##                     Max.   :2018-05-05 01:34:30

As one can see, there are only 2 distinct activities left in the event log.

Filter on activity frequency

It is also possible to filter on activity frequency. This filter uses a percentile cut-off and selects the most frequent activities until the required percentage of events has been reached. Thus, a percentile cut-off of 80% selects the activities needed to represent 80% of the events. In the example below, the reverse argument is TRUE, so the selection is inverted: the most frequent activities selected by the 50% cut-off are removed, and the less frequent activities are kept.

patients %>%
    filter_activity_frequency(percentile_cut_off = 0.5, reverse = TRUE) %>%
    activity_frequency("activity")
## # A tibble: 5 x 3
##          handling absolute  relative
##             <chr>    <int>     <dbl>
## 1      Blood test      474 0.1377106
## 2       Check-out      984 0.2858803
## 3 Discuss Results      990 0.2876235
## 4        MRI SCAN      472 0.1371296
## 5           X-Ray      522 0.1516560
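The cut-off logic can be sketched in base R as follows. This is only an illustration, not the package's implementation: the counts and the helper select_by_frequency are hypothetical, and the sketch assumes that an activity is selected only while the cumulative share of events stays at or below the cut-off. The exact boundary rule used by filter_activity_frequency may differ.

```r
# Hypothetical activity counts, not taken from the patients log.
counts <- c(A = 50, B = 30, C = 15, D = 5)

select_by_frequency <- function(counts, percentile_cut_off, reverse = FALSE) {
  # Order activities from most to least frequent.
  sorted <- sort(counts, decreasing = TRUE)
  # Cumulative share of events covered after adding each activity.
  cumulative <- cumsum(sorted) / sum(sorted)
  # Assumption: select activities while the cumulative share is within the cut-off.
  selected <- names(sorted)[cumulative <= percentile_cut_off]
  if (reverse) setdiff(names(sorted), selected) else selected
}

most_frequent  <- select_by_frequency(counts, 0.85)                 # "A" "B"
least_frequent <- select_by_frequency(counts, 0.85, reverse = TRUE) # "C" "D"
```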

Filter on attributes

The filter_attributes function is a very generic function and can be supplied with conditions on the data set, in the same way as the dplyr::filter function. As such, it allows you to filter on event or case attributes. Multiple conditions can be listed, separated by a comma, in which case the comma is treated as “AND”. You can use the | symbol to state “OR”. Since the patients dataset does not have many additional attributes, the example below uses the resource and activity. With a comma, this filter would be the same as combining filter_activity and filter_resource, since both conditions would be required. However, filter_attributes has the advantage that the two conditions can also be combined with OR, as in the example.

patients %>% 
    filter_attributes(employee == "r1" | handling == "X-Ray") 
## Event log consisting of:
## 1522 events
## 2 traces
## 500 cases
## 2 activities
## 761 activity instances
## 
## # A tibble: 1,522 x 6
##        handling patient employee handling_id registration_type
##           <chr>   <int>    <chr>       <int>             <chr>
##  1 Registration       1       r1           1             start
##  2 Registration       2       r1           2             start
##  3 Registration       3       r1           3             start
##  4 Registration       4       r1           4             start
##  5 Registration       5       r1           5             start
##  6 Registration       6       r1           6             start
##  7 Registration       7       r1           7             start
##  8 Registration       8       r1           8             start
##  9 Registration       9       r1           9             start
## 10 Registration      10       r1          10             start
## # ... with 1,512 more rows, and 1 more variables: time <dttm>
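The difference between the two ways of combining conditions can be shown on a toy data frame in base R. The data below are hypothetical, and plain logical indexing stands in for filter_attributes, so this is only a sketch of the AND/OR semantics.

```r
# Toy events (hypothetical values, not the patients log).
events <- data.frame(
  employee = c("r1", "r1", "r2", "r2"),
  handling = c("Registration", "X-Ray", "X-Ray", "Blood test"),
  stringsAsFactors = FALSE
)

# Comma-separated conditions behave like a logical AND: both must hold.
both <- events[events$employee == "r1" & events$handling == "X-Ray", ]

# The | operator keeps events that satisfy at least one of the conditions.
either <- events[events$employee == "r1" | events$handling == "X-Ray", ]
```

Here both contains a single event (the X-Ray performed by r1), while either contains three events.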

Filter resources

Similar to the activity filter, the resource filter can be used to filter events by listing one or more resources.

patients %>%
    filter_resource(c("r1","r4")) %>%
    resource_frequency("resource")
## # A tibble: 2 x 3
##   employee absolute  relative
##      <chr>    <int>     <dbl>
## 1       r1      500 0.6793478
## 2       r4      236 0.3206522

Trim cases

The trim filter is a special event filter, as it also takes the notion of cases into account. It trims cases such that they start with a certain activity and end with a certain activity. It requires two lists: one of possible start activities and one of possible end activities. Each case is trimmed from the first appearance of a start activity to the last appearance of an end activity. When reversed, these slices of the event log are removed instead of preserved.

patients %>%
    filter_trim(start_activities = "Registration", end_activities = c("MRI SCAN", "X-Ray")) %>%
    process_map(type = performance())
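
The trimming rule for a single case can be sketched in base R as shown below. The trace and the helper trim_trace are hypothetical and only illustrate the description above. In particular, treating a case without any start or end activity as having no slice is an assumption of this sketch, not documented behaviour of filter_trim.

```r
# One hypothetical case, with its events in chronological order.
trace <- c("Registration", "Triage", "X-Ray", "Discuss Results", "X-Ray", "Check-out")

trim_trace <- function(trace, start_activities, end_activities, reverse = FALSE) {
  starts <- which(trace %in% start_activities)
  ends   <- which(trace %in% end_activities)
  if (length(starts) == 0 || length(ends) == 0 || min(starts) > max(ends)) {
    # Assumption: without a valid start/end pair, no events fall inside the slice.
    inside <- rep(FALSE, length(trace))
  } else {
    # Slice from the first start activity up to the last end activity.
    position <- seq_along(trace)
    inside <- position >= min(starts) & position <= max(ends)
  }
  # Reversing removes the slice and keeps the remainder.
  if (reverse) trace[!inside] else trace[inside]
}

trimmed <- trim_trace(trace, "Registration", c("MRI SCAN", "X-Ray"))
removed <- trim_trace(trace, "Registration", c("MRI SCAN", "X-Ray"), reverse = TRUE)
```

In this sketch, trimmed runs from Registration up to the second X-Ray, and removed holds only the trailing Check-out event.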

Case filters

Documentation coming soon