Cohort creation
cohort_creation.Rd
This function creates a cohort data table based on user-specified inclusion/exclusion criteria. It also returns a table showing the number of entries (e.g., encounters) that were included/excluded at each step of the cohort creation.
Usage
cohort_creation(
  cohort,
  labels,
  exclusion_flag = NULL,
  show_prct = TRUE,
  group_var = NULL,
  ...
)
Arguments
- cohort
(list) A list where each item corresponds to a filtered data.table/data.frame object that contains the cohort at a given step of the cohort inclusions/exclusions. The function will automatically combine the inclusion/exclusion steps in a sequential manner, and will then count the number of entries that remain after each criterion. For example, if you have a data.table object called data: To obtain a cohort of encounters that are female and older than 65 years, you can use: cohort = list(data[gender == "F"], data[age > 65]). In this case, the returned cohort inclusion/exclusion table will contain 2 rows listing the number of encounters that are 1) female, and 2) female AND older than 65 years (= final cohort). Note that if data is a data.frame, you will need to filter the relevant rows as follows: cohort = list(data[data$gender == "F", ], data[data$age > 65, ])
- labels
(character) Vector containing a description for each inclusion/exclusion step (needs to be in the same order as corresponding list items in cohort input).
- exclusion_flag
(logical) A vector indicating whether a given cohort creation step should be interpreted as an exclusion (rather than an inclusion). If TRUE, the corresponding entries will be removed and the number (%) of rows that were removed (rather than included) will be shown. By default, all cohort steps will be interpreted as inclusion steps.
- show_prct
(logical) Flag indicating whether to show percentage values (default = TRUE). If FALSE, only raw counts will be shown. Note that the percentages always reflect the % change relative to the N in the previous inclusion/exclusion step.
- group_var
(character) Optional: Name of a grouping variable (e.g., hospital). If provided, cohort numbers will be stratified by each level of the grouping variable (in addition to overall cohort numbers).
- ...
Additional parameters that will be passed to prettyNum for additional formatting of numbers (e.g., big.mark = ",").
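The sequential logic described for cohort and exclusion_flag can be sketched in base R without the package. This is only an illustration of the counting, not the function's actual implementation: the toy data, the id column used to match rows across steps, and the helper name count_steps are all hypothetical.

```r
# Toy data (hypothetical): 8 encounters with an id, gender, age, and death flag
toy <- data.frame(
  id     = 1:8,
  gender = c("F", "F", "M", "F", "M", "F", "F", "M"),
  age    = c(70, 50, 80, 66, 40, 90, 72, 68),
  died   = c(FALSE, FALSE, TRUE, TRUE, FALSE, FALSE, FALSE, FALSE)
)

# One filtered table per step; each filter is applied to the full data.
# With a data.frame, columns are referenced via `$` and a trailing comma is needed.
steps <- list(
  toy,
  toy[toy$gender == "F", ],
  toy[toy$age > 65, ],
  toy[toy$died, ]
)
exclusion_flag <- c(FALSE, FALSE, FALSE, TRUE)

# Hypothetical helper: apply the steps in order, matching rows by id.
# Inclusion steps keep matching ids, exclusion steps drop them.
count_steps <- function(steps, exclusion_flag) {
  keep <- steps[[1]]$id
  n <- length(keep)
  for (i in seq_along(steps)[-1]) {
    ids <- steps[[i]]$id
    keep <- if (exclusion_flag[i]) setdiff(keep, ids) else intersect(keep, ids)
    n <- c(n, length(keep))
  }
  n
}

# N remaining after each step: 8, 5, 4, 3
n_remaining <- count_steps(steps, exclusion_flag)
print(n_remaining)
```

Because every list item is filtered from the full data, the third step on its own contains 6 rows (including males), but only the 4 rows that also passed the previous step remain in the sequential count.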
Value
A list with 2 items:
- cohort_data: data.table containing all entries in the final cohort (after applying all inclusions/exclusions)
- cohort_steps: data.table showing the number (and %) of entries that were included/excluded at each step of the cohort creation.
Examples
# create dummy data
my_data <- Rgemini::dummy_ipadmdad(10000, n_hospitals = 5)
# convert to data.table for easy filtering
my_data <- data.table::setDT(my_data)
# run cohort_creation
my_cohort <- cohort_creation(
  cohort = list(
    my_data,
    my_data[gender == "F"],
    my_data[age > 65],
    my_data[grepl("^7", discharge_disposition)]
  ),
  labels = c(
    "All GEMINI encounters",
    "Gender = Female",
    "Age > 65",
    "In-hospital death"
  ),
  exclusion_flag = c(FALSE, FALSE, FALSE, TRUE),
  group_var = "hospital_num" # optional: stratify by hospital
)
# get data table containing all entries in final cohort
cohort_data <- my_cohort[[1]]
# print table with N (%) at each inclusion/exclusion step
print(my_cohort[[2]])
#> Cohort creation step Overall N (%) 1 2
#> <char> <char> <char> <char> <char>
#> 1: Incl. 1 All GEMINI encounters 10000 2002 2050
#> 2: Incl. 2 Gender = Female 4684 (46.8%) 943 (47.1%) 895 (43.7%)
#> 3: Incl. 3 Age > 65 3392 (72.4%) 708 (75.1%) 477 (53.3%)
#> 4: Excl. 1 In-hospital death -280 (-8.3%) -49 (-6.9%) -37 (-7.8%)
#> 5: Final cohort 3112 659 440
#> 3 4 5
#> <char> <char> <char>
#> 1: 1980 1982 1986
#> 2: 1011 (51.1%) 944 (47.6%) 891 (44.9%)
#> 3: 804 (79.5%) 733 (77.6%) 670 (75.2%)
#> 4: -74 (-9.2%) -54 (-7.4%) -66 (-9.9%)
#> 5: 730 679 604
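The percentages in the Overall column can be reproduced by hand, which makes the "relative to the previous step" rule concrete. The sketch below uses only base R and the overall counts copied from the printed table; the variable names are illustrative. The last line shows what a prettyNum argument such as big.mark does to the raw counts.

```r
# Overall N after each step, copied from the printed table above
# (all encounters, female, female AND age > 65, final cohort)
n_overall <- c(10000, 4684, 3392, 3112)

# Inclusion steps: N kept as a share of the previous step's N
pct_female <- round(100 * n_overall[2] / n_overall[1], 1)  # 46.8
pct_age    <- round(100 * n_overall[3] / n_overall[2], 1)  # 72.4

# Exclusion step: N removed as a share of the previous step's N
n_removed   <- n_overall[3] - n_overall[4]                 # 280
pct_removed <- round(100 * n_removed / n_overall[3], 1)    # 8.3

# Thousands separator, as applied when passing big.mark = "," via `...`
formatted <- prettyNum(n_overall, big.mark = ",")
print(formatted)
```

Note that 72.4% is the share of the 4684 female encounters who are older than 65, not the share of all 10000 encounters.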