Cohort creation — cohort_creation • Rgemini

This function creates a cohort data table based on user-specified inclusion/exclusion criteria. It also returns a table showing how many entries (e.g., encounters) were included/excluded at each step of the cohort creation.

Usage

cohort_creation(
  cohort,
  labels,
  exclusion_flag = NULL,
  show_prct = TRUE,
  group_var = NULL,
  ...
)

Arguments

cohort

(list) A list where each item corresponds to a filtered data.table/data.frame object that contains the cohort at a given step of the cohort inclusions/exclusions. The function automatically combines the inclusion/exclusion steps sequentially and counts the number of entries that remain after each criterion.

For example, given a data.table object called data, a cohort of encounters that are female and older than 65 years can be specified as: cohort = list(data[gender == "F"], data[age > 65]). In this case, the returned cohort inclusion/exclusion table will contain 2 rows listing the number of encounters that are 1) female, and 2) female AND older than 65 years (= final cohort).

Note that if data is a data.frame, you need to filter the relevant rows as follows: cohort = list(data[data$gender == "F", ], data[data$age > 65, ])
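The sequential combination described above can be illustrated with a minimal base R sketch. This is not the package's implementation; the toy data and the `id` row key are hypothetical and only serve to show how independently filtered list items narrow the cohort step by step.

```r
# Hypothetical toy data; `id`, `gender`, and `age` are illustrative columns
data <- data.frame(
  id     = 1:6,
  gender = c("F", "M", "F", "F", "M", "F"),
  age    = c(70, 80, 50, 66, 40, 90)
)

# Each list item is filtered independently from the full data
cohort <- list(
  data[data$gender == "F", ],
  data[data$age > 65, ]
)

# Sequentially keep only the rows that are also present in the next list item
remaining <- cohort[[1]]
n_per_step <- nrow(remaining)
for (step in cohort[-1]) {
  remaining <- remaining[remaining$id %in% step$id, ]
  n_per_step <- c(n_per_step, nrow(remaining))
}

n_per_step    # number of rows remaining after each step
remaining$id  # rows in the final cohort (female AND older than 65)
```

Because every list item is filtered from the full data, the order of the items only affects the intermediate counts, not the final cohort.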

labels

(character) Vector containing a description for each inclusion/exclusion step (needs to be in the same order as corresponding list items in cohort input).

exclusion_flag

(logical) A vector indicating whether a given cohort creation step should be interpreted as an exclusion (rather than an inclusion). If TRUE, the corresponding entries will be removed, and the number (%) of rows that were removed (rather than included) will be shown.

By default, all cohort steps will be interpreted as inclusion steps.
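As a rough illustration of the difference between an inclusion and an exclusion step, the base R sketch below keeps or drops the rows matched by a step. The helper `apply_step()`, the `id` key, and the `died` column are hypothetical and are not part of Rgemini.

```r
# Hypothetical cohort after earlier steps; `id` and `died` are illustrative
cohort_so_far <- data.frame(
  id   = 1:5,
  died = c(FALSE, TRUE, FALSE, FALSE, TRUE)
)

# Rows matched by the step (here: in-hospital deaths)
step_rows <- cohort_so_far[cohort_so_far$died, ]

# Hypothetical helper: inclusion keeps matched rows, exclusion drops them
apply_step <- function(current, step, exclude = FALSE) {
  in_step <- current$id %in% step$id
  if (exclude) {
    current[!in_step, , drop = FALSE]
  } else {
    current[in_step, , drop = FALSE]
  }
}

after_exclusion <- apply_step(cohort_so_far, step_rows, exclude = TRUE)

# For an exclusion step, the reported count is the number of rows removed
n_removed <- nrow(cohort_so_far) - nrow(after_exclusion)
```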

show_prct

(logical) Flag indicating whether to show percentage values (default = TRUE). If FALSE, only raw counts will be shown. Note that the percentages always reflect the % change relative to the N in the previous inclusion/exclusion step.
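To make the "relative to the previous step" convention concrete, the following sketch recomputes the overall percentages from the counts printed in the Examples section. The variable names are illustrative only.

```r
# Overall N after each inclusion step (taken from the example output)
n_steps <- c(all = 10000, female = 4684, age_over_65 = 3392)

# Each percentage uses the N of the previous step as its denominator
pct_included <- round(100 * n_steps[-1] / n_steps[-length(n_steps)], 1)

# An exclusion step reports the rows removed relative to the previous N
n_excluded   <- 280
pct_excluded <- round(100 * n_excluded / n_steps[["age_over_65"]], 1)

# Final cohort size after the exclusion
n_final <- n_steps[["age_over_65"]] - n_excluded
```

Here `pct_included` gives 46.8 and 72.4, `pct_excluded` gives 8.3, and `n_final` is 3112, matching the "Overall N (%)" column of the example.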

group_var

(character) Optional: Name of a grouping variable (e.g., hospital). If provided, cohort numbers will be stratified by each level of the grouping variable (in addition to overall cohort numbers).
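Conceptually, stratifying by a grouping variable means reporting one count per level in addition to the overall count. A minimal base R sketch with made-up data follows; it illustrates the idea only and does not reproduce the function's output format.

```r
# Hypothetical cohort with a grouping column
cohort_data <- data.frame(hospital_num = c(1, 1, 2, 3, 3, 3))

# Overall cohort size
n_overall <- nrow(cohort_data)

# Cohort size per level of the grouping variable
n_by_group <- table(cohort_data$hospital_num)
```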

...

Additional parameters passed to prettyNum to control number formatting (e.g., big.mark = ",").
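For reference, this is how base R's prettyNum behaves on its own when given big.mark; the counts are taken from the example output and the variable name is illustrative.

```r
# Insert a thousands separator into integer counts
formatted <- prettyNum(c(10000L, 3112L), big.mark = ",")
formatted
```

This yields the strings "10,000" and "3,112".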

Value

A list with 2 items:

cohort_data: data.table containing all entries in the final cohort (after applying all inclusions/exclusions)
cohort_steps: data.table showing the number (and %) of entries that were included/excluded at each step of the cohort creation.

Examples

# create dummy data
my_data <- Rgemini::dummy_ipadmdad(10000, n_hospitals = 5)

# convert to data.table for easy filtering
my_data <- data.table::setDT(my_data)

# run cohort_creation
my_cohort <- cohort_creation(
  cohort = list(
    my_data,
    my_data[gender == "F"],
    my_data[age > 65],
    my_data[grepl("^7", discharge_disposition)]
  ),
  labels = c(
    "All GEMINI encounters",
    "Gender = Female",
    "Age > 65",
    "In-hospital death"
  ),
  exclusion_flag = c(FALSE, FALSE, FALSE, TRUE),
  group_var = "hospital_num" # optional: stratify by hospital
)

# get data table containing all entries in final cohort
cohort_data <- my_cohort[[1]]

# print table with N (%) at each inclusion/exclusion step
print(my_cohort[[2]])
#>             Cohort creation step Overall N (%)           1           2
#>     <char>                <char>        <char>      <char>      <char>
#> 1: Incl. 1 All GEMINI encounters         10000        2002        2050
#> 2: Incl. 2       Gender = Female  4684 (46.8%) 943 (47.1%) 895 (43.7%)
#> 3: Incl. 3              Age > 65  3392 (72.4%) 708 (75.1%) 477 (53.3%)
#> 4: Excl. 1     In-hospital death  -280 (-8.3%) -49 (-6.9%) -37 (-7.8%)
#> 5:                  Final cohort          3112         659         440
#>               3           4           5
#>          <char>      <char>      <char>
#> 1:         1980        1982        1986
#> 2: 1011 (51.1%) 944 (47.6%) 891 (44.9%)
#> 3:  804 (79.5%) 733 (77.6%) 670 (75.2%)
#> 4:  -74 (-9.2%) -54 (-7.4%) -66 (-9.9%)
#> 5:          730         679         604