
This function plots the distributions (histograms/barplots) of multiple variables and shows basic summary statistics for each (e.g., median [Q1, Q3], % missing).
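For readers unfamiliar with the notation, the summary statistics mentioned above can be reproduced in base R. The sketch below is illustrative only and is not the package's internal implementation. The helper name `summarize_numeric()` is hypothetical, the quartiles assume R's default quantile definition (type 7), and the `normal` flag mirrors the mean [SD] option described for list inputs.

```r
# Hypothetical helper (not part of the package): formats a numeric vector as
# "median [Q1, Q3]" (or "mean [SD]" when normal = TRUE) and reports % missing.
summarize_numeric <- function(x, normal = FALSE) {
  x <- as.numeric(x)
  # share of NA entries, expressed as a percentage
  pct_missing <- 100 * mean(is.na(x))
  if (normal) {
    label <- sprintf("%.1f [%.1f]", mean(x, na.rm = TRUE), stats::sd(x, na.rm = TRUE))
  } else {
    # quartiles using R's default quantile algorithm (type 7)
    q <- stats::quantile(x, probs = c(0.25, 0.5, 0.75), na.rm = TRUE, names = FALSE)
    label <- sprintf("%.1f [%.1f, %.1f]", q[2], q[1], q[3])
  }
  list(stats = label, pct_missing = pct_missing)
}

x <- c(1:9, NA)
summarize_numeric(x)                # stats "5.0 [3.0, 7.0]", pct_missing 10
summarize_numeric(x, normal = TRUE) # stats "5.0 [2.7]"
```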

Usage

plot_summary(
  data,
  plot_vars = NULL,
  facet_group = NULL,
  show_stats = TRUE,
  prct = FALSE,
  base_size = NULL,
  color = "lightblue",
  ...
)

Arguments

data

(data.frame | data.table)
Table containing data to be plotted.

plot_vars

(character | list)
Character vector or list of variables to be plotted. If plot_vars is not provided (NULL, the default), the function automatically plots all variables, ignoring any encounter/patient/physician ID variables and date-time variables.
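The automatic variable selection described above can be pictured with the following base R sketch. It is an assumption-laden illustration, not the package's actual rule: the helper `select_plot_vars()` is hypothetical, ID columns are assumed to be recognizable by an `id` token in their name, date-time variables are assumed to be stored as `Date`/`POSIXct` objects, and the toy column names are illustrative.

```r
# Hypothetical helper: drop ID-like columns (assumed: an "id" token in the
# name) and date-time columns (assumed: Date/POSIXt classes).
select_plot_vars <- function(data, id_pattern = "(^|_)id($|_)") {
  is_id <- grepl(id_pattern, names(data), ignore.case = TRUE)
  is_datetime <- vapply(
    data,
    function(col) inherits(col, c("Date", "POSIXt")),
    logical(1)
  )
  names(data)[!is_id & !is_datetime]
}

# Toy table with illustrative column names
df <- data.frame(
  genc_id = 1:3,
  patient_id_hashed = c("a", "b", "c"),
  admission_date_time = as.POSIXct(
    c("2020-01-01 10:00", "2020-01-02 11:00", "2020-01-03 12:00"),
    tz = "UTC"
  ),
  age = c(70, 82, 65),
  gender = c("F", "M", "F")
)

select_plot_vars(df) # "age" "gender"
```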

facet_group

(character)
Name of the variable to be used for faceting. This only works if plot_vars specifies a single variable. A separate subplot is then created for each level of facet_group, for example, to plot separate histograms/barplots for each hospital (facet_group = "hospital_num").
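Conceptually, faceting partitions the rows by each level of facet_group and summarizes each subset separately. The base R sketch below shows that idea on a small made-up table; it does not call plot_summary(), and the data are invented for illustration.

```r
# Made-up example data: ages of patients at two hospitals
df <- data.frame(
  hospital_num = c(1, 1, 2, 2, 2),
  age = c(70, 80, 60, 65, 90)
)

# One subset per level of the grouping variable, analogous to one subplot
# per facet_group level
age_by_hospital <- split(df$age, df$hospital_num)

# Summarize each subset separately: median 75 for hospital 1, 65 for hospital 2
vapply(age_by_hospital, median, numeric(1))
```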

show_stats

(logical)
Flag indicating whether to show descriptive stats above each plot.

prct

(logical)
Flag indicating whether y-axis labels should show percentages (%). If FALSE (default), counts (n) are shown.
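As a rough illustration of the difference between the two y-axis scales, the base R sketch below converts category counts into percentages. Whether plot_summary() includes missing values in the percentage denominator is not stated here; this sketch assumes they are included (via useNA = "ifany").

```r
# Made-up categorical variable with one missing value
gender <- c("F", "M", "F", "F", NA)

# Counts (n), analogous to the default y-axis (prct = FALSE)
counts <- table(gender, useNA = "ifany")

# Percentages (%), analogous to prct = TRUE: F = 60, M = 20, NA = 20
percentages <- 100 * prop.table(counts)
percentages
```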

base_size

(numeric)
Numeric input defining the base font size (in pt) for each subplot. By default, the function automatically determines an appropriate size depending on the number of subplots (base_size = 11 for a single subplot).

color

(character)
Plotting color used for "fill". Default is R's built-in "lightblue".

...

Additional arguments passed to ggpubr::ggarrange() that allow for finer control of the subplot arrangement (e.g., ncol, nrow, widths, heights, align; see ?ggpubr::ggarrange for details).

Value

(ggplot)
A ggplot figure with subplots showing histograms/barplots for all variables specified in plot_vars.

Note

These plots are not meant as publication-ready figures. Instead, the goal of this function is to provide a quick and easy means to visually inspect the data and obtain information about distributional properties of a wide range of variables, requiring just a single line of code.

Additional inputs when providing plot_vars as a list

When plot_vars is provided as a list, each list element names the variable to be plotted (plot_var; see the list example below) and can specify additional characteristics for that variable, such as:

  • class (character): variable type, e.g., "numeric", "character", "logical" etc.

  • sort (character): for categorical variables, whether to sort bars by ascending (value matching "^a", e.g., "asc") or descending (value matching "^d", e.g., "desc") frequency

  • binwidth/bins/breaks: for numeric/integer variables, specifying the histogram bins

  • normal (logical): for numeric/integer variables, whether to assume a normal distribution (if TRUE, mean [SD] is shown when show_stats = TRUE)
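The sort option above can be read as a prefix match on the supplied string. The hypothetical helper below (not part of the package) orders category labels by frequency under that reading, so "asc"/"ascending" and "desc"/"descending" behave the same; ties between equally frequent categories are not handled specially.

```r
# Hypothetical helper: order category labels by frequency, treating any
# `direction` starting with "a" as ascending and "d" as descending
order_by_frequency <- function(x, direction) {
  freq <- table(x)
  if (grepl("^a", direction)) {
    names(sort(freq))
  } else if (grepl("^d", direction)) {
    names(sort(freq, decreasing = TRUE))
  } else {
    stop("`direction` should start with 'a' (ascending) or 'd' (descending)")
  }
}

disposition <- c("home", "home", "home", "died", "transfer", "transfer")
order_by_frequency(disposition, "desc") # "home" "transfer" "died"
order_by_frequency(disposition, "asc")  # "died" "transfer" "home"
```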

Examples

# simulate GEMINI data table
admdad <- dummy_ipadmdad(
  n = 10000,
  n_hospitals = 20,
  time_period = c(2015, 2022)
)

## Providing plot_vars as a character vector
plot_summary(
  data = admdad,
  plot_vars = c("age", "gender", "discharge_disposition")
)
#> `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
#> `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
#> `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
#> `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.


## Providing plot_vars as a list input
plot_summary(
  admdad,
  plot_vars = list(
    `Discharge disposition` = list(
      plot_var = "discharge_disposition",
      class = "character",
      sort = "desc"
    ),
    `# Days in ALC` = list(
      plot_var = "number_of_alc_days",
      binwidth = 1,
      breaks = seq(0, 7, 1)
    )
  )
)