
Prepare RxNorm Data for Validation
prepare_pharm_for_validation.RdThis function processes the output returned by rxnorm_query()
into a frequency table to faciliate validation of RxNorm matches by
a Subject Matter Expert (SME). It retrieves previously validated
"raw_input ~ rxnorm_match" pairs (fields returned by rxnorm_query())
from the lookup_pharmacy_mapping table to identify RxNorm matches
that have already been validated by previous projects. The function
additionally applies optional hierarchy filtering and small cell
suppression. The returned output facilitates further analyses of
validated pharmacy entries for the drug(s) of interest.
When the rxnorm_query() object contains unmatched rows of the
pharmacy table, the function returns all columns containing drug-containing
columns of unmatched pharmacy rows, together with their historical
mappings from previous projects.
section @warning
The default version of this function requires database versions
drm_cleandb_v4_1_1/h4h_template_v5_0_1 or newer due to the
dependency on the lookup_pharmacy_mapping table. However, as
a workaround, users can also supply a custom
lookup_pharmacy_mapping table (e.g., based on a flat file or,
for internal GEMINI staff, the pharmacy master mapping DB).
For HPC4Health users working with older database versions: Please
reach out to our team via the ticketing system if you require
access to the pharmacy mapping lookup table.
Usage
prepare_pharm_for_validation(
dbcon = NULL,
rxnorm_res,
hierarchy = TRUE,
cell_suppression = TRUE,
outpath = NULL,
custom_lookup = NULL
)Arguments
- dbcon
(
DBIConnection)DBIconnection to a GEMINI database versiondrm_cleandb_v4_1_1/h4h_template_v5_0_1or newer. This is required to query thelookup_pharmacy_mappingtable. Alternatively, users may provide acustom_lookupinput instead (see below).- rxnorm_res
(
data.frame,data.table,list)
Output returned byrxnorm_query()to be processed for validation. If the RxNorm search returned unmatched rows, this object should be provided as a list containing bothmatched_rowsandunmatched_rows.- hierarchy
(
logical)
IfTRUE(default), the function applies predefined hierarchy filters to retain only the highest-priority drug-containing "raw_input" per pharmacy row (i.e. hierarchy filtering is applied at the level of individual drug orders). The hierarchy rules prioritize drug information in the following order:med_id_generic_name_raw > med_id_brand_name_raw > med_id_hospital_code_raw > med_id_din > med_id_ndc > iv_component_type. For each pharmacy row (identified byrow_numof the pharmacy table), only the highest-priority column that was matched to the drug(s) of interest is retained as the "raw_input" for that row. The hierarchy rules handle potentially redundant or conflicting information of different search columns for the same pharmacy order. It ensures consistent drug-containing information per row and streamlines the validation process. Therefore, it is recommended to set hierarchy toTRUE. Only set toFALSEwhen applying non-standard hierarchy rules or conducting further detailed investigations.When the
rxnorm_resobject contains unmatched rows, hierarchy filters are applied to search for historical mappings. For unmatched rows, hierarchy filters cannot be disabled.- cell_suppression
(
logical)
When set toTRUE(default), pharmacy entries associated with less than 6 unique encounters are cell-suppressed and shown as "<6" in the output table.- outpath
(
character)
Optional file path for saving the output files. Default isNULL, and no file will be exported to folder. If provided, two files are saved:an
.xlsxfile for SME review,an
.RDSfile for analysts (This file should strictly be used within GEMINI's HPC4Health/NORA environment and should NEVER be pushed to a GitLab repository). Compared to the file for SMEs, the .RDS file for analysts contains an additional columnpharm_row_numwith allrow_numvalues in the pharmacy table (as a list) corresponding to a given RxNorm match. The IDs in thepharm_row_numserve as identifiers for analysts to merge the validated frequency table back to the pharmacy table. For example, this can be used to extract individual pharmacy orders that contain the SME-validated drug entry and perform additional filtering (e.g., by order date-time). Note: When anoutpathis provided,cell_suppressionneeds to be set toTRUE.
- custom_lookup
(
data.frame,data.table)
Optional input allowing users to provide a custom version of thelookup_pharmacy_mappingtable (e.g., from a flat file/master mapping DB). Needs to contain the followingcharactercolumns:search_type: Name of column with RxNorm match (e.g.,med_id_generic_name_raw)raw_input: Entry from the pharmacy table RxNorm matched to the drug of interest (e.g.,amoxicillin 50 mg/ml)rxnorm_match: Drug of interest the pharmacy entry was matched to (e.g.,amoxicillin)drug_group: Broader drug group of matched drug (e.g.,antibiotic)project_name: Name of the project drug match was validated for
If a
custom_lookuptable is provided to this function, thedbconargument will be ignored and the custom table will take priority over the lookup table in the database.
Value
A list containing two or three data tables:
sme: Frequency table for SME validation with columns
count,search_type,raw_input,rxnorm_match,drug_group,times_validated,SME_confirm(for SME to confirm each "raw_input ~ rxnorm_match" pair; must provide a logical value TRUE/FALSE),SME_comments(for SME to add comments, e.g. whether the drug should be included in the study, regardless of whether the match itself is validated)analyst: Frequency table for analyst use, with columns
count,search_type,raw_input,rxnorm_match,drug_group,times_validated, andpharm_row_num (as a list)unmatched_rows: Frequency table for unmatched pharmacy rows if they exist in the input
rxnorm_res. Contains pharmacy columns searched byrxnorm_query()and additionally,historically_mapped_to_drugandhistorically_mapped_to_drug_group(agnostic to drugs user searched for),SME_mapped_search_type,SME_mapped_drug,SME_mapped_drug_group,SME_comment
Additionally, if outpath is provided, two files are saved to the
specified folder:
"pharmacy_mapping_for_SME.xlsx" for SME review
"pharm_res_INTERNAL_USE_ONLY.rds" for analyst to use post SME validation
Details
The function takes results from rxnorm_query() and generates
structured frequency tables of pharmacy data for SME
review. Once the validation is complete, analysts can use the output
table to obtain all validated pharmacy entries for the drugs of interest
These are the key processing steps performed by the function:
Normalizes pharmacy data by converting all text values to lowercase, applying ASCII encoding, and trimming off leading and trailing white spaces and periods.
For
rxnorm_matchanddrug_group, additional normalization is applied to convert plural words to their singular form.Retrieves previous validation and adds the
times_validatedcolumn to the returned table indicating how many times each "raw_input ~ rxnorm_match" pair has previously been validated by individual projects (0: never been validated, 1: validated by one project, 2+: validated by two or more projects). The function additionally retrieves information about the broader classification of each drug from previous projects. However, please note that the classification is not standardized and should be used as supplementary information only.Applies hierarchy filtering on
search_typesuch that only drug information at the highest priority is retained for consideration per row of pharmacy table (see details in parameter description ofhierarchy).Computes number of unique
genc_idsassociated with each "search_type ~ raw_input ~ rxnorm_match" entry. Occurrences less than 6 are suppressed to "<6".Generates a frequency table for SME to perform validation for each "raw_input ~ rxnorm_match" pair (note: validation should be agnostic to all other fields). Each row of the frequency table is uniquely identified by
row_id. When anoutpathis provided, the function saves .xlsx/.rds files to the specified directory.Generates a frequency table for analyst use following SME validation. The table is identical to the SME version but includes an additional column
pharm_row_num, which is a list storingrow_numvalues (of the pharmacy table) associated with each "search_type ~ raw_input ~ rxnorm_match" entry. Therow_numis an identifier allowing analysts to trace each entry back to the original pharmacy table.If
rxnorm_query()was executed withreturn_unmatched = TRUE, the function performs a secondary search on unmatched rows. The secondary search matches each drug-containing field (i.e. search_type) with existing mappings in thepharmacy_master_mappingdatabase in order of hierarchymed_id_generic_name_raw > med_id_brand_name_raw > med_id_hospital_code_raw > med_id_din > med_id_ndc > iv_component_type. The highest priority match is returned per row, along with the match's corresponding drug_group. The search is not specific to the drug searched byrxnorm_query(). Instead, it is a broad search against all existing drug mappings found in thelookup_pharmacy_mappingtable (or user-provided custom lookup table). Users may need to apply filters (e.g. via regex) and manual mapping to identify if any entries in the unmatched rows may contain the drug(s) of interest.
Examples
if (FALSE) { # \dontrun{
drv <- dbDriver("PostgreSQL")
# Connect to GEMINI database
dbcon <- DBI::dbConnect(drv,
dbname = "db",
host = "domain_name.ca",
port = 1234,
user = getPass("Username: "),
password = getPass("Password: ")
)
rxnorm_res <- rxnorm_query(
dbcon,
drug_name = c("warfarin"),
return_unmatched = FALSE
)
## Prepare RxNorm output for validation
res <- prepare_pharm_for_validation(
dbcon = dbcon,
rxnorm_res = rxnorm_res,
hierarchy = TRUE,
cell_suppression = TRUE,
outpath = "/user_path/"
)
res$sme # frequency table to be validated by SME
res$analyst # frequency table with `pharm_row_num` to be used by analysts after SME validation.
## For internal GEMINI staff only:
## Provide custom mapping lookup table based on master mapping DB
# connect to master mapping DB
pharm_db <- DBI::dbConnect(drv,
dbname = "pharmacy_mapping",
host = "domain_name.ca",
port = 1234,
user = getPass("Username: "),
password = getPass("Password: ")
)
# query master mapping table
pharm_mapping <- dbGetQuery(
pharm_db, "SELECT * FROM pharmacy_master_mapping;"
) %>%
data.table()
res <- prepare_pharm_for_validation(
rxnorm_res = rxnorm_res,
custom_lookup = pharm_mapping # no `dbcon` input required
)
} # }