Compute daily census
daily_census.Rd
Calculates the daily number of patients that were hospitalized at each site during the specified time period.
The daily census is defined as a cross-sectional count of bed occupancy at a given time of the day (8am by default).
It is calculated as the number of patients who were admitted before and discharged after that point in time
(i.e., the number of patients who occupied a bed at 8am on each day).
Results are returned as raw counts as well as a capacity ratio (= census/capacity
). The capacity ratio indicates
whether the number of hospitalized patients on a particular day was above (>1) or below (<1) the typical occupancy
(or max occupancy) during the time period of interest. By default, capacity is estimated as median(census)
but users can specify other measures of typical/max capacity (see capacity_func
input). Note that if cohorts
are filtered/grouped based on patient characteristics (e.g., diagnosis, age etc.), this indicator simply reflects
a standardized measure of patient counts, rather than a measure of true capacity limitations of medical wards.
Usage
daily_census(
cohort,
time_period = NULL,
scu_exclude = NULL,
group_var = NULL,
capacity_func = "median",
buffer = 30,
time_of_day = "08:00:00",
include_zero = TRUE
)
Arguments
- cohort
(
data.frame
ordata.table
) Cohort table with all relevant encounters of interest, where each row corresponds to a single encounter. Must contain the following columns:genc_id
(integer
): GEMINI encounter IDhospital_num
orhospital_id
(integer
orcharacter
): Hospital IDadmission_date_time
(character
|POSIXct POSIXt
): Date-time of admission in YYYY-MM-DD HH:MM formatdischarge_date_time
(character
|POSIXct POSIXt
): Date-time of discharge in YYYY-MM-DD HH:MM format
If a
group_var
is specified (see below), this should be included in thecohort
table as well.- time_period
(
character
) The start and end points of the study period, e.g.,c('2019-01-01', '2019-12-31')
. Should be specified in valid date format (e.g.,'yyyy-mm-dd'
).- scu_exclude
(
data.frame
|data.table
) Optional table containing special care unit (SCU) encounters. This is only required if patients who were in an SCU should be excluded from the census calculation during time periods where they were in the SCU. Thescu_exclude
table typically refers to theipscu
table (see GEMINI database schema). However, users may want to filter theipscu
table by specific SCU unit numbers that should be excluded from the census calculation. Entries wherescu_unit_number = 99
("No SCU"
) are automatically removed by this function.The
scu_exclude
input table needs to contain the following columns:genc_id
(integer
): GEMINI encounter IDscu_admit_date_time
(character
|POSIXct POSIXt
): Date-time of SCU admission in YYYY-MM-DD HH:MM formatscu_discharge_date_time
(character
|POSIXct POSIXt
): Date-time SCU of discharge in YYYY-MM-DD HH:MM format
For all entries in the
scu_exclude
table that have a validscu_admit_date_time
andscu_discharge_date_time
, the encounter will not be counted towards the census during any time periods where they were in the SCU (i.e., time periods where they did not occupy a bed in a non-SCU ward). If no input is provided for this argument, no exclusion will be applied.- group_var
(
character
) Optional input specifying one (or multiple) grouping variables. Census numbers and capacity ratio will be calculated separately for each level of a given grouping variable. Note: Users do not need to specify hospital as a grouping variable since the function automatically groups the output byhospital_num
. Therefore,group_var
should only be specified if additional grouping (e.g., by different medical subservices, physicians etc.) is required. Multiple grouping variables can be provided as a character vector (e.g.,group_var = c("medical_subservice","physician")
). If no grouping variable is specified (default:group_var = NA
), the function returns the total daily census numbers and capacity ratios per hospital.- capacity_func
(
character
) Optional input specifying the function to be used to define capacity for thecapacity_ratio
output. Can either be a measure of central tendency to obtain typical occupancy (median
,mean
, ormode
) ormax
to obtain maximum occupancy. Thecapacity_ratio
is calculated ascensus/capacity_func(census)
. Default is median (i.e.,capacity_ratio = census/median(census)
). If"mode"
is selected andcensus
has multiple modes, the median of all modes is used to calculate the capacity ratio.Note that capacity is calculated separately for each site (and grouping variable, if any). Other types of measures (e.g., trimmed means), or year-over-year capacity ratios could be calculated based on the raw census counts in the output (see vignette for more details).
- buffer
(
integer
) Buffer period (in days) to be applied at the end of the data availability period. Default is 30 days, i.e., census counts (and capacity ratio) are set toNA
for the last 30 days of available data for each hospital. This is only relevant if the time period of interest is towards the end of the data availability period for a given hospital. For example, iftime_period = c("2020-01-01","2021-12-31")
, and data are only available until Dec 31 2021 for hospital A, census counts will drop sharply during the last few days in Dec 2021 due to a "truncation bias". Specifically, in the example scenario, there are encounters at hospital A who were admitted prior to Dec 31 2021, but have yet to be discharged. Due to the fact that encounters only appear in the GEMINI database once they have been discharged, the census counts in late December will decline abruptly at hospital A because patients who have not been discharged yet are not included in the database. To prevent this decrease in patient numbers from affecting census counts & capacity ratio, outputs will be set toNA
for the last30
days at hospital A.The default buffer period is set to
30
because the vast majority of patients is discharged within 30 days. That is, a 30-day buffer period ensures that most admitted patients have already been discharged, and thus, will be included in our data. However, users may specify a different buffer period since length of stays can vary widely between cohorts. Therefore, users should explore which buffer period makes most sense for the data they work with.Note that data availability differs across hospitals. Therefore, the actual buffer that is applied is calculated separately for each hospital. If a hospital's data availability ends prior to the end of the time period of interest, the buffer includes all last X days of that hospital's data (e.g., 30 days by default). If a hospital's data availability goes past the end of the time period of interest, the actual buffer that is applied is based on the difference between the specified buffer and the latest available discharge date at that hospital. For example, if
buffer = 30
and a given hospital has more than 30 extra days of data past the time period of interest, no additional buffer is applied.- time_of_day
(
character
|numeric
) Optional input specifying the time of the day the census/capacity ratio should be calculated at. Default is 8am (time_of_day = "8:00:00"
). Other times can be specified as a character (e.g.,"10:30"
for 10.30am) or as numeric input (e.g.,14
for 2pm).- include_zero
(
logical
) Flag indicating whether days with census counts of0
should be treated as real 0s or whether they should be returned ascensus = NA
. Note that this is only relevant for edge cases where this function is applied to small cohorts or if users specify agroup_var
with small samples resulting in counts of 0 on some days. In those cases, the decision on whether or not to include 0s can affect the estimated typical capacity (and thus, thecapacity_ratio
output).By default (
include_zero = TRUE
), the function will return days where nogenc_ids
were hospitalized ascensus = 0
. This is the desired behavior in cases where counts of 0 patients are conceptually meaningful. For example, when users want to analyze how many patients with a rare condition are in hospital on a typical day, days wherecensus = 0
represent true counts of 0 patients (assuming the cohort provides sufficient coverage of all hospitalized patients). Note: If data availability periods differ between hospitals,census
counts will only be set to 0 for dates that fall within the min-max date range for that particular site.In other scenarios, users may want to disregard days where
census = 0
by specifyinginclude_zero = FALSE
. For example, when looking at daily census counts per physician, days wherecensus = 0
likely reflect days where a given physician was not on service. Therefore, those days should not be included in the estimate of physicians' typical patient volume.
Value
(data.table
)
data.table with the daily counts of hospitalized patients (census
) at each hospital. Additionally, the
capacity_ratio
(census
relative to the typical occupancy during the time period of interest) will be returned.
Dates within the time period of interest where no data were available at a given site are not included in the output.
For any dates inside the buffer period, census
and capacity_ratio
are returned as NA
.
Examples
## calculate census of all in-patient admissions (ipadm):
if (FALSE) { # \dontrun{
drv <- dbDriver("PostgreSQL")
dbcon <- DBI::dbConnect(drv,
dbname = "db",
host = "domain_name.ca",
port = 1234,
user = getPass("Enter user:"),
password = getPass("password"))
ipadm <- dbGetQuery(dbcon, "select * from admdad") %>% data.table()
ipadm_census <- daily_census(ipadm)
} # }