
Overview

The daily_census function calculates the number of patients that were hospitalized at a given hospital on each day during a time period of interest. The census provides a cross-sectional count of patients that occupied a hospital bed at 8am (by default) each day. Specifically, it is calculated as the number of patients who were admitted before that time point and discharged after that time point. The function also returns a capacity_ratio indicating whether the number of patients on a given day was higher (>1) or lower (<1) than on a typical day, where by default, typical occupancy is based on the median census during the time period of interest.
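
To make the counting rule concrete, the following is a minimal base R sketch on a made-up toy cohort. It is not the function's internal implementation; it only assumes that an encounter is counted when it was admitted at or before, and discharged at or after, the 8am reference time. The object names (`cohort`, `ref_times`, `manual_census`) are illustrative.

```r
# Toy cohort (made-up data); column names follow the cohort table used in this vignette
cohort <- data.frame(
  genc_id = 1:4,
  hospital_num = c(1, 1, 1, 1),
  admission_date_time = as.POSIXct(
    c("2016-04-01 07:30:00", "2016-04-01 09:00:00",
      "2016-04-02 06:00:00", "2016-03-30 12:00:00"),
    tz = "UTC"
  ),
  discharge_date_time = as.POSIXct(
    c("2016-04-03 10:00:00", "2016-04-02 07:00:00",
      "2016-04-02 20:00:00", "2016-04-01 11:00:00"),
    tz = "UTC"
  )
)

# Reference time points: 8am on each day of interest
ref_times <- seq(as.POSIXct("2016-04-01 08:00:00", tz = "UTC"),
                 as.POSIXct("2016-04-03 08:00:00", tz = "UTC"),
                 by = "1 day")

# Count encounters that span each reference time
# (assumption: boundaries are inclusive on both sides)
census <- vapply(seq_along(ref_times), function(i) {
  sum(cohort$admission_date_time <= ref_times[i] &
        cohort$discharge_date_time >= ref_times[i])
}, integer(1))

# Capacity ratio relative to the median census of the period
manual_census <- data.frame(date_time = ref_times, census = census)
manual_census$capacity_ratio <- manual_census$census / median(manual_census$census)
print(manual_census)
```

In this toy example, two patients occupy a bed at 8am on the first two days and one on the third, so the capacity ratio drops below 1 on the last day.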

Note: Although this function can be applied to different cohorts and research contexts, the interpretation of the function outputs may vary according to the inputs users provide. For example, if users pre-filter their cohort based on certain diagnosis criteria, the census counts will only include those patients, and similarly, the capacity_ratio would not reflect a real indicator of capacity, but rather a measure of whether occupancy is above/below typical occupancy for a certain group of patients (also see section on group_var input below).

Moreover, users should use caution when analyzing cohorts with small sample size or when including grouping variables that can result in counts of 0 on some days. In that case, users need to make an informed decision about whether census = 0 is meaningful within the context of their research question, or whether days with patient counts of 0 should be returned as census = NA (i.e., those days are excluded from the capacity_ratio estimate; also see section on include_zero input below).

Running daily_census() with default arguments

To run the daily_census function, users need to provide an input table cohort containing all encounters that are part of their cohort of interest. Typically, the cohort is created based on the admdad table (or a subset thereof) in the GEMINI database. It needs to contain the columns genc_id, hospital_num, admission_date_time, and discharge_date_time.
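
Before calling the function, it may be worth confirming that the cohort table contains the required columns. The snippet below is a small sketch of such a check on a made-up table; `required_cols` and `missing_cols` are illustrative names, not part of the package.

```r
# Columns the cohort table needs, as listed above
required_cols <- c("genc_id", "hospital_num",
                   "admission_date_time", "discharge_date_time")

# Toy cohort that is deliberately missing discharge_date_time (made-up data)
cohort <- data.frame(
  genc_id = 1:2,
  hospital_num = c(1, 2),
  admission_date_time = c("2016-04-01 07:30", "2016-04-02 13:45")
)

# Report any required columns that are absent before calling daily_census()
missing_cols <- setdiff(required_cols, names(cohort))
if (length(missing_cols) > 0) {
  message("Missing columns: ", paste(missing_cols, collapse = ", "))
}
```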

Here is an example of how to load the relevant data and run the daily_census function with default settings:

# Load necessary libraries
library(RPostgreSQL)
library(DBI)
library(getPass)

# Establish database connection
drv <- dbDriver("PostgreSQL")
db <- DBI::dbConnect(drv,
                     dbname = "DB_name",
                     host = "172.XX.XX.XXX",
                     port = 1234,
                     user = getPass("Enter user:"),
                     password = getPass("Enter Password:"))

# query admdad table containing cohort (could be filtered further if necessary)
admdad <- dbGetQuery(db, "SELECT genc_id, hospital_num, admission_date_time, discharge_date_time FROM admdad;") 

# Run default daily_census calculation         
census_output <- daily_census(cohort = admdad) 

head(census_output, 10)

Mock output table (for illustration purposes, not real data):

date_time hospital_num census capacity_ratio
2016-04-01T08:00:00Z 1 237 0.9834025
2016-04-01T08:00:00Z 2 168 1.0181818
2016-04-01T08:00:00Z 3 140 1.0294118
2016-04-01T08:00:00Z 4 159 1.1521739
2016-04-01T08:00:00Z 5 115 0.9913793
2016-04-02T08:00:00Z 1 225 0.9336100
2016-04-02T08:00:00Z 2 157 0.9515152
2016-04-02T08:00:00Z 3 138 1.0147059
2016-04-02T08:00:00Z 4 158 1.1449275
2016-04-02T08:00:00Z 5 108 0.9310345

This table shows an example output for data from 5 hospitals from April 2016 - April 2017. Each row corresponds to a unique combination of date_time (with a reference time of 8am, by default) and hospital ID (hospital_num). The census variable indicates the count of patients that were occupying a bed at 8am each day at a given hospital. The capacity_ratio refers to a relative measure of bed occupancy, which by default is calculated as census/median(census).

Based on this table, users could extract further information, such as the median daily census at each hospital during the study period:

library(data.table) #  we are using data.table operations below, but users could also use tidyverse to analyze the function output

# Compute median census at each site
median_census <- census_output[ , .(median_census = median(census)), by = hospital_num]

hospital_num median_census
1 241
2 165
3 136
4 138
5 116

Additionally, users could plot census and capacity_ratio over time, separately for each hospital:

library(ggplot2)

# Plot census over time
ggplot(census_output, aes(x=as.Date(date_time), y=census, group = hospital_num, color=hospital_num)) +
  geom_line(linewidth=1.5,show.legend = TRUE) +
  scale_x_date(name = 'Date', breaks = seq(min(as.Date(census_output$date_time)), max(as.Date(census_output$date_time)), by="1 months"), date_labels = "%b\n%Y") +
  ggtitle("Mock figure: Daily census by hospital") + theme_classic()

# Plot capacity ratio over time
ggplot(census_output, aes(x=as.Date(date_time), y=capacity_ratio, group = hospital_num, color=hospital_num)) +
  geom_line(linewidth=1.5,show.legend = TRUE) +
  scale_x_date(name = 'Date', breaks = seq(min(as.Date(census_output$date_time)), max(as.Date(census_output$date_time)), by="1 months"), date_labels = "%b\n%Y") +
  ggtitle("Mock figure: Daily capacity ratio by hospital") + theme_classic() +
  geom_hline(yintercept=1)

Note that capacity_ratio fluctuates around 1 (= typical occupancy) where values > 1 correspond to days with higher occupancy than usual.

Optional input arguments

time_period

By default, the function will calculate the census for the whole time period that is available in the cohort table. Note that data availability may differ by hospital, and the function will determine data availability individually for each site. If users only want to calculate the census for a certain time period within their overall cohort, they can provide an optional time_period input specifying a start and end date.

For example, to calculate the census from June 1, 2016 - Dec 31, 2016:

# Compute census for specific time period
census_output <- daily_census(admdad, time_period = c("2016-06-01","2016-12-31")) 

Note that in this case, the capacity_ratio is calculated based on the typical occupancy observed during that same time period (i.e., between June-Dec 2016).

scu_exclude

By default, the total duration of each encounter’s hospital stay is considered in the census counts. However, for certain projects, it may be of interest to exclude any time points where the encounter was in a special care unit (SCU), such as intensive care (ICU). This is relevant if researchers want to analyse bed occupancy at a particular medical ward (e.g., GIM) and only want to count patients who were in fact occupying a bed in that ward on a given day, while excluding any patients who were in an SCU.

In that case, users should provide an scu_exclude table that contains all SCU encounters that should be excluded from the census calculation. Note: The SCU table typically refers to the ipscu table in the GEMINI database. However, users may want to further filter that table to the relevant scu_unit_number values that should be excluded from the census counts. The function automatically removes any entries where scu_unit_number = 99, which refers to encounters with "no SCU". Additionally, only SCU encounters with a valid scu_admit_date_time and scu_discharge_date_time can be excluded from the census. Availability of these variables is low for certain cohorts and SCU units. Therefore, users are advised to carefully inspect the SCU table to make an informed decision about whether to exclude SCU encounters, and if so, which SCU entries to exclude.

# Query SCU table (could be filtered by scu_unit_number if necessary)
scu <- dbGetQuery(db, "SELECT * FROM ipscu;") 

# Compute census excluding SCU
census_output <- daily_census(admdad, scu_exclude = scu) 
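
One way to inspect the SCU table before deciding what to exclude is to check how often both SCU date-times are available per unit. The sketch below uses a made-up toy table with the column names mentioned above; the helper column `has_both_times` and the object `availability` are illustrative, and missing timestamps are assumed to be stored as NA.

```r
# Toy SCU table (made-up data); column names follow those mentioned above
scu_toy <- data.frame(
  genc_id = c(1, 2, 3, 4, 5),
  scu_unit_number = c(10, 10, 45, 99, 45),
  scu_admit_date_time = c("2016-04-01 10:00", NA, "2016-04-03 08:15",
                          NA, "2016-04-05 22:00"),
  scu_discharge_date_time = c("2016-04-02 09:00", "2016-04-04 12:00",
                              "2016-04-03 20:00", NA, "2016-04-07 06:30")
)

# Drop "no SCU" entries, mirroring what the function is described as doing
scu_toy <- scu_toy[scu_toy$scu_unit_number != 99, ]

# Flag entries where both date-times are present (assumption: missing = NA)
scu_toy$has_both_times <- !is.na(scu_toy$scu_admit_date_time) &
  !is.na(scu_toy$scu_discharge_date_time)

# Proportion of usable entries per SCU unit
availability <- aggregate(has_both_times ~ scu_unit_number,
                          data = scu_toy, FUN = mean)
print(availability)
```

Units with a low proportion of usable entries may be poor candidates for exclusion, since most of their SCU stays could not be removed from the census anyway.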

group_var

By default, the census is calculated separately for each hospital (by hospital_num). Users can specify additional grouping variables to obtain patient counts (and capacity ratios) for subgroups of interest, such as different medical subservices or physicians.

Here is a simple example where census is grouped by patients’ gender and age (65 and under vs. over 65):

# Create age category
# (requires age and gender columns in admdad; the example query above did not select them)
admdad$age_cat <- ifelse(admdad$age <= 65, '<=65', '>65')

# Compute census by gender & age categories
census_output <- daily_census(admdad, group_var = c("gender","age_cat")) 

head(census_output, 10)

Mock output table showing census grouping by age & gender categories:

hospital_num date_time age_cat gender census capacity_ratio
1 2016-04-01T08:00:00Z <=65 M 27 1.0384615
1 2016-04-01T08:00:00Z >65 M 87 1.0235294
1 2016-04-01T08:00:00Z <=65 F 28 1.2173913
1 2016-04-01T08:00:00Z >65 F 95 0.9500000
1 2016-04-02T08:00:00Z <=65 M 23 0.8846154
1 2016-04-02T08:00:00Z >65 M 90 1.0588235

Note that capacity_ratio in this mock example should not be interpreted as a real “capacity” indicator, but rather as a measure of whether on a given day there were more (>1) or fewer (<1) patients of a certain age/gender than on a typical day. Capacity_ratio is more useful for grouping variables that correspond to separate medical entities, such as different medical subservices or wards, where capacity_ratio can serve as an indicator of system load vs. capacity. Nevertheless, grouping variables that are based on patient characteristics (age, gender, diagnosis group, illness severity etc.) could be useful to analyse the case mix of hospitalized patients during certain periods of time.

capacity_func

By default, capacity_ratio is defined as census/median(census), which is calculated separately for each hospital and grouping variable (if any). That is, capacity_ratio refers to the daily count of patients relative to typical bed occupancy, where typical bed occupancy is defined as the median census during the time period of interest. Alternatively, users can specify other measures of central tendency to obtain typical occupancy (“mean”, “mode”) or estimate capacity based on the maximum occupancy (“max”).

# Get occupancy relative to max capacity (estimated based on max(census))
census_output <- daily_census(admdad, capacity_func = "max") 
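
To illustrate how the choice of reference value changes the ratio, the sketch below recomputes it by hand on a made-up census vector for a single hospital. It only mirrors the definition census/reference described above; the variable names are illustrative.

```r
# Toy daily census counts for one hospital (made-up data)
census <- c(100, 110, 90, 130, 105)

# Default definition: census relative to the median census
ratio_median <- census / median(census)

# Alternatives described above: relative to the mean or the maximum census
ratio_mean <- census / mean(census)
ratio_max  <- census / max(census)

print(round(data.frame(census, ratio_median, ratio_mean, ratio_max), 3))
```

With "max" as the reference, the ratio can never exceed 1, so it reads as occupancy relative to the highest observed census rather than relative to a typical day.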

time_of_day

By default, the census is calculated at 8am each day during the study period. For example, to obtain the patient counts for April 1st, 2016, the function counts all patients with admission_date_time <= '2016-04-01 08:00:00' and discharge_date_time >= '2016-04-01 08:00:00'. Users can specify a different reference time by providing an optional time_of_day input:

# Calculate census at 2.30pm each day
census_output <- daily_census(admdad, time_of_day = "14:30:00") 

buffer

For time periods that are towards the end of the data availability timeline of a given hospital, users may observe a pattern similar to this, where census counts suddenly drop at the end of the specified time period:

This effect is due to a truncation bias that can occur if the end of the specified time period (here April 2018) is close to the last available date in the overall cohort (e.g., the cohort only contains encounters up to April 2018 in this example). Note that cohorts are typically defined by discharge date. That is, if a cohort includes data from April 2017 - April 2018, it will only include encounters that were discharged during that time. As a result, there may be patients who were admitted during the last week of April 2018, but were not discharged prior to the end of the month. Therefore, they are not part of the cohort, and cannot be counted towards the census.

To prevent this from biasing the census and capacity_ratio estimates, the function automatically checks for data availability at each hospital based on the min and max dates in the cohort input table. If data availability ends prior to (or at the same time as) the end of the time period of interest, a buffer period of 30 days is applied by default. Specifically, the last 30 days at each site will be set to NA. The default setting of 30 days is based on the observation that the vast majority of hospital stays are < 30 days, and therefore, we can be confident that patients who were hospitalized 30 days prior to the end of the specified time period have already been discharged (i.e., are included in the cohort).

The default setting of buffer = 30 will result in the following output, which removes the truncation bias shown in the previous figure by setting the last 30 days of the time period to NA:

Users can specify other buffer periods (in full days) based on their cohort (and the typical length of stay observed in that cohort). For example, to set the buffer period to 10 days, run the following code:

# specify buffer of 10 days at end of time period
census_output <- daily_census(admdad, buffer = 10) 

Note: In this example, if users specify a time_period input that ends at least 10 days earlier than the latest available data, the buffer period will be ignored and all available data will be used to calculate census for the time period of interest. In other words, the actual buffer period that is applied within the function depends on a combination of the specified time period, data availability per site, and the buffer argument in order to ensure that the maximum amount of available data is used in the census calculation.
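
The effect of the buffer can be sketched on made-up data as follows. This is not the function's internal code: it simply masks the last `buffer` days of a toy series and assumes that masked days are left out of the typical-occupancy estimate.

```r
# Toy census for one hospital over 60 days (made-up data)
dates  <- seq(as.Date("2018-03-02"), by = "day", length.out = 60)
census <- rep(100, 60)
census[51:60] <- seq(90, 45, by = -5)  # artificial drop near the end of the data

buffer <- 30                  # buffer length in days
last_available <- max(dates)  # last date with data at this site

# Mask the final `buffer` days so the truncated counts are not reported
census_buffered <- census
census_buffered[dates > last_available - buffer] <- NA

# Capacity ratio based on the unmasked days only (assumption: NAs are ignored)
capacity_ratio <- census_buffered / median(census_buffered, na.rm = TRUE)
print(sum(is.na(census_buffered)))
```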

include_zero

By default, the function includes days where census = 0 in the output (include_zero = TRUE). This means that census counts on those days are considered to be true 0s (i.e., “no patient was hospitalized”), and therefore, 0s are included in the capacity ratio estimate. For example, when analyzing cohorts of patients with a rare disease, days where no patients with that disease were in hospital are conceptually meaningful and should be counted towards the typical occupancy. By contrast, when analyzing the daily census counts per physician, days where census = 0 likely reflect days where a given physician was not on service. Therefore, those days should not be included in the estimate of physicians’ typical patient volume, and thus, users should set include_zero to FALSE (i.e., census counts and capacity_ratio will be returned as NA for days where no patient was associated with a given physician).
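
The following sketch shows, on a made-up vector of daily counts, how treating zeros as NA shifts the typical occupancy and therefore the ratio. It illustrates the idea only and is not taken from the function's source; all variable names are illustrative.

```r
# Toy daily counts for one physician (made-up data); 0 = no patients that day
census <- c(2, 0, 6, 5, 0, 3, 4)

# Zeros treated as true zeros (as with include_zero = TRUE)
typical_with_zero <- median(census)
ratio_with_zero <- census / typical_with_zero

# Zeros treated as missing (as with include_zero = FALSE)
census_no_zero <- ifelse(census == 0, NA, census)
typical_without_zero <- median(census_no_zero, na.rm = TRUE)
ratio_without_zero <- census_no_zero / typical_without_zero

print(c(with_zero = typical_with_zero, without_zero = typical_without_zero))
```

Here the typical occupancy rises from 3 to 4 once the zero days are excluded, so the same daily count yields a lower ratio.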

Calculating customized measures of capacity

In addition to the flexibility provided by the function itself, users may want to obtain additional capacity indicators that are currently not supported by the function. For example, researchers may want to calculate capacity_ratio relative to the typical occupancy on a year-over-year basis. The function calculates capacity_ratio by estimating typical (or max) capacity based on census numbers throughout the whole time period. However, the census may vary on a year-by-year basis. In that case, users could either run the daily_census() function separately for each study year, or they could run the function on the whole time period and calculate a year-over-year capacity_ratio based on the raw census output provided by the function, e.g.:

# calculate capacity_ratio based on median census *per year*
census_output[ , year := year(date_time)]
census_output[ , capacity_ratio_yoy := census/median(census), by = c("hospital_num","year")]

Similarly, users could define other indicators of typical occupancy that are currently not supported by the function (e.g., trimmed means).
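
As one possible sketch of the trimmed-mean idea, the base R example below computes a 20% trimmed mean per hospital on a made-up census table and derives a ratio from it. The table `census_toy` and the columns `typical_trimmed` and `capacity_ratio_trimmed` are hypothetical names chosen for illustration; the trimming fraction is an arbitrary choice.

```r
# Toy census output for two hospitals (made-up data, one outlier at hospital 1)
census_toy <- data.frame(
  hospital_num = rep(c(1, 2), each = 5),
  census = c(100, 102, 98, 101, 300, 50, 52, 49, 51, 48)
)

# Typical occupancy per hospital as a 20% trimmed mean
# (drops the lowest and highest value within each hospital here)
census_toy$typical_trimmed <- ave(
  census_toy$census, census_toy$hospital_num,
  FUN = function(x) mean(x, trim = 0.2)
)

# Capacity ratio relative to the trimmed mean
census_toy$capacity_ratio_trimmed <- census_toy$census / census_toy$typical_trimmed
print(census_toy)
```

Compared with the plain mean, the trimmed mean is less affected by the single outlying day at hospital 1.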