{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "395d7f62-3dd8-458c-b42a-d766c761eb34",
   "metadata": {},
   "source": [
    "# Using `spaCy` NER with pyDeid"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "6fa12c0f-7895-4d70-9b85-24ee900c072b",
   "metadata": {},
   "source": [
    "## Basic `spaCy` NER pipeline"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "id": "ae982bee-72f6-4472-afc2-6fefcd5134d9",
   "metadata": {},
   "outputs": [],
   "source": [
    "import spacy"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "3051062e-3912-4650-b906-f64b92718aef",
   "metadata": {},
   "source": [
    "Custom `spaCy` NER pipelines can be added to pyDeid. `spaCy` ships four english pretrained pipelines by default:\n",
    "\n",
    "* `en_core_web_sm`\n",
    "* `en_core_web_md`\n",
    "* `en_core_web_lg`\n",
    "* `en_core_web_trf`\n",
    "\n",
    "Let's use the transformer-based pipeline in conjunction with pyDeid. This begins with downloading the pipeline."
   ]
  },
  {
   "cell_type": "raw",
   "id": "6a102900-209a-4f7e-8cb7-8b79ee1ef7c9",
   "metadata": {},
   "source": [
    "!python -m spacy download en_core_web_lg"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "929d3179-ae6a-4bd7-bd59-230ef5230e40",
   "metadata": {},
   "source": [
    "Now we load the pipeline."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "id": "4490dbb1-d7f3-4efc-90f6-a6a1dd95a8db",
   "metadata": {},
   "outputs": [],
   "source": [
    "nlp = spacy.load(\"en_core_web_lg\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "id": "f0fac81e-6dc9-423d-b57d-cb4ed53c0787",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "['tok2vec', 'tagger', 'parser', 'attribute_ruler', 'lemmatizer', 'ner']"
      ]
     },
     "execution_count": 3,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "nlp.pipe_names"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "2e8bb68d-18ac-47e1-b79e-7f3ea7b24f54",
   "metadata": {},
   "source": [
    "We can now pass the pipeline as-is to `pyDeid` to add to the set of de-identification steps."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "id": "b8a2691d-910c-482d-91c3-0687c2258f1e",
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "Processing encounter 3, note Record 3: : 3it [00:00,  3.02it/s]"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Diagnostics:\n",
      "                - chars/s = 131.73949687826678\n",
      "                - s/note = 0.33146222432454425\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "\n"
     ]
    }
   ],
   "source": [
    "from pyDeid import pyDeid\n",
    "\n",
    "pyDeid(\n",
    "    original_file = \"./../../tests/test.csv\",\n",
    "    note_id_varname = \"note_id\",\n",
    "    encounter_id_varname = \"genc_id\",\n",
    "    note_varname = \"note_text\",\n",
    "    ner_pipeline = nlp,\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "4171cee2-9c4d-461a-a64f-23ff068dd6de",
   "metadata": {},
   "source": [
    "## Add `medspaCy` components"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "c04131f5-599b-478f-880f-694059179c69",
   "metadata": {},
   "source": [
    "Now let's instead use a `medspaCy` tokenizer and sentence parser.\n",
    "\n",
    "We begin by replacing the default tokenizer with the `medspacy_tokenizer`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "id": "6b2dfc4d-ce52-44d4-9cff-fa04b95de9f2",
   "metadata": {},
   "outputs": [],
   "source": [
    "from medspacy.custom_tokenizer import create_medspacy_tokenizer"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "id": "0338b77f-af00-486c-bdc8-61e2791772ec",
   "metadata": {},
   "outputs": [],
   "source": [
    "medspacy_tokenizer = create_medspacy_tokenizer(nlp)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "id": "d54246f9-b098-4a74-b816-4425de26834c",
   "metadata": {},
   "outputs": [],
   "source": [
    "nlp.tokenizer = medspacy_tokenizer"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "3f5a6488-67c9-4f8c-b593-b02ddd041b5c",
   "metadata": {},
   "source": [
    "Next we use `PyRuSH` sentence parsing."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "id": "1a3eda74-1a54-4c00-8052-0c8a1456d604",
   "metadata": {},
   "outputs": [],
   "source": [
    "from medspacy.sentence_splitting import PyRuSHSentencizer"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "id": "eed92698-06f3-46da-b8be-3ba8654bbc09",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "['tok2vec',\n",
       " 'tagger',\n",
       " 'medspacy_pyrush',\n",
       " 'parser',\n",
       " 'attribute_ruler',\n",
       " 'lemmatizer',\n",
       " 'ner']"
      ]
     },
     "execution_count": 9,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "nlp.add_pipe(\"medspacy_pyrush\", before=\"parser\")\n",
    "nlp.pipe_names"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "3d7d3ce3-afa9-467d-972f-3e49b0752aa7",
   "metadata": {},
   "source": [
    "We can pass this pipeline to `pyDeid` similarly."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "id": "38202984-f35b-4612-a4e5-4a06798d6221",
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "Processing encounter 3, note Record 3: : 3it [00:01,  2.77it/s]"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Diagnostics:\n",
      "                - chars/s = 120.9813422064844\n",
      "                - s/note = 0.3609371980031331\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "\n"
     ]
    }
   ],
   "source": [
    "pyDeid(\n",
    "    original_file = \"./../../tests/test.csv\",\n",
    "    note_id_varname = \"note_id\",\n",
    "    encounter_id_varname = \"genc_id\",\n",
    "    note_varname = \"note_text\",\n",
    "    ner_pipeline = nlp,\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "38ca0589-9e60-4a56-9ab7-55368d35c2b0",
   "metadata": {},
   "source": [
    "## Create a custom NER pipeline for `pyDeid`"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "d101896e-d732-42fe-87cb-379bccc7dde7",
   "metadata": {},
   "source": [
    "Given that `pyDeid` can accept any `spaCy` NER pipeline, lets create a custom pipeline with:\n",
    "\n",
    "* A `med7` base model.\n",
    "* The `medspaCy` tokenizer.\n",
    "* The `medspaCy` sentence parser.\n",
    "* A preprocessing component that `truecases` the input text.\n",
    "\n",
    "Note that rather than the `Med7` model, we could easily have used any other `spaCy` model such as `SciSpaCy`."
   ]
  },
  {
   "cell_type": "raw",
   "id": "c7013e7a-3960-4845-a30f-cf16c29a67f6",
   "metadata": {},
   "source": [
    "!pip install https://huggingface.co/kormilitzin/en_core_med7_trf/resolve/main/en_core_med7_lg-any-py3-none-any.whl"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "id": "b3acf73a-3805-49ad-b88e-ad78a15b1f93",
   "metadata": {},
   "outputs": [],
   "source": [
    "med7_model = \"en_core_med7_lg\" # the model downloaded above\n",
    "\n",
    "med7_nlp = spacy.load(med7_model)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "id": "dc6c5ffe-0508-4097-9cfc-50d2b93f9715",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "<function __main__.truecaser(doc: spacy.tokens.doc.Doc) -> spacy.tokens.doc.Doc>"
      ]
     },
     "execution_count": 11,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "from spacy.language import Language\n",
    "from spacy.tokens import Doc\n",
    "import truecase\n",
    "\n",
    "@Language.component(\"truecaser\")\n",
    "def truecaser(doc: Doc) -> Doc:\n",
    "    \"\"\"Apply truecasing to the document text.\"\"\"\n",
    "    truecased_text = truecase.get_true_case(doc.text)\n",
    "    return Doc(doc.vocab, words=truecased_text.split())\n",
    "\n",
    "med7_nlp.add_pipe(\"truecaser\", first=True)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "id": "fdddb515-a362-48ad-b68f-afd78ebad0f8",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "<spacy.pipeline.ner.EntityRecognizer at 0x2bcd7cc8270>"
      ]
     },
     "execution_count": 12,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "base_spacy_nlp = spacy.load(\"en_core_web_lg\")\n",
    "\n",
    "# add medspaCy tokenizer and sentence parser\n",
    "med7_nlp.tokenizer = create_medspacy_tokenizer(med7_nlp)\n",
    "med7_nlp.add_pipe(\"medspacy_pyrush\", before=\"ner\")\n",
    "\n",
    "# replace SciSpaCy NER with spaCy NER\n",
    "med7_nlp.remove_pipe(\"ner\")\n",
    "med7_nlp.add_pipe(\"ner\", source = base_spacy_nlp)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "id": "17533838-8d3e-471e-8407-d6b171fb9071",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "['truecaser', 'tok2vec', 'medspacy_pyrush', 'ner']"
      ]
     },
     "execution_count": 13,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "med7_nlp.pipe_names"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "2875f693-4e8d-4662-b437-bfdb42e6bfc3",
   "metadata": {},
   "source": [
    "Below we create a sample messy unstructured clinical note (produced by Claude 3.5 Sonnet). We will de-identify this note with our custom NER pipeline."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "id": "090892b2-5183-4ebe-b4a7-ebf2600dba10",
   "metadata": {},
   "outputs": [],
   "source": [
    "note = \"\"\"\n",
    "Pt: J. Smith, 45yo M\n",
    "CC: SOB, chest pain x 2 days\n",
    "Hx: HTN, T2DM, +smoker (1ppd x 20yrs)\n",
    "Meds: metformin, lisinopril\n",
    "VS: BP 145/92, HR 88, RR 20, T 37.2, SpO2 97% RA\n",
    "S) pt reports gradual onset SOB w/ exertion, worse w/ lying flat. denies fever, cough, leg swelling. + intermittent L sided chest pain, non-radiating, 6/10, worse w/ deep breath\n",
    "O) appears mildly SOB, speaking full sentences. Lungs: crackles bil bases. Heart: RRR, no m/r/g. Ext: no edema\n",
    "ECG: NSR, no ST changes\n",
    "Labs:\n",
    "CBC - WBC 9.2, Hgb 13.5, Plt 220\n",
    "BMP - Na 138, K 4.2, Cr 1.1, Glu 162\n",
    "Troponin neg\n",
    "CXR: cardiomegaly, no infiltrates\n",
    "A/P:\n",
    "\n",
    "Acute SOB - ?early CHF vs pneumonia\n",
    "\n",
    "start lasix 40 IV\n",
    "repeat CXR in AM\n",
    "trend troponin\n",
    "\n",
    "\n",
    "HTN - hold lisinopril, recheck BP in AM\n",
    "DM - cont home meds, check A1c\n",
    "Smoking - advised cessation, pt interested in patch\n",
    "\n",
    "Dispo: admit obs, f/u w/ cards\n",
    "\"\"\""
   ]
  },
  {
   "cell_type": "markdown",
   "id": "7ec01f63-0230-44c2-9644-ae55ddb0ae0e",
   "metadata": {},
   "source": [
    "Below we use the base spacy display functionality to display the captured entities."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "id": "c9452f83-0c6c-4cd5-b159-68196a4efc9f",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<span class=\"tex2jax_ignore\"><!DOCTYPE html>\n",
       "<html lang=\"en\">\n",
       "    <head>\n",
       "        <title>displaCy</title>\n",
       "    </head>\n",
       "\n",
       "    <body style=\"font-size: 16px; font-family: -apple-system, BlinkMacSystemFont, 'Segoe UI', Helvetica, Arial, sans-serif, 'Apple Color Emoji', 'Segoe UI Emoji', 'Segoe UI Symbol'; padding: 4rem 2rem; direction: ltr\">\n",
       "<figure style=\"margin-bottom: 6rem\">\n",
       "<div class=\"entities\" style=\"line-height: 2.5; direction: ltr\">Pt: J. Smith, 45Yo M CC: Sob, chest pain X \n",
       "<mark class=\"entity\" style=\"background: #bfe1d9; padding: 0.45em 0.6em; margin: 0 0.25em; line-height: 1; border-radius: 0.35em;\">\n",
       "    2 days\n",
       "    <span style=\"font-size: 0.8em; font-weight: bold; line-height: 1; border-radius: 0.35em; vertical-align: middle; margin-left: 0.5rem\">DATE</span>\n",
       "</mark>\n",
       " Hx: Htn, T2Dm, +Smoker (1Ppd X 20Yrs) \n",
       "<mark class=\"entity\" style=\"background: #f0d0ff; padding: 0.45em 0.6em; margin: 0 0.25em; line-height: 1; border-radius: 0.35em;\">\n",
       "    Meds: Metformin,\n",
       "    <span style=\"font-size: 0.8em; font-weight: bold; line-height: 1; border-radius: 0.35em; vertical-align: middle; margin-left: 0.5rem\">WORK_OF_ART</span>\n",
       "</mark>\n",
       " \n",
       "<mark class=\"entity\" style=\"background: #bfe1d9; padding: 0.45em 0.6em; margin: 0 0.25em; line-height: 1; border-radius: 0.35em;\">\n",
       "    Lisinopril\n",
       "    <span style=\"font-size: 0.8em; font-weight: bold; line-height: 1; border-radius: 0.35em; vertical-align: middle; margin-left: 0.5rem\">DATE</span>\n",
       "</mark>\n",
       " vs: \n",
       "<mark class=\"entity\" style=\"background: #7aecec; padding: 0.45em 0.6em; margin: 0 0.25em; line-height: 1; border-radius: 0.35em;\">\n",
       "    BP\n",
       "    <span style=\"font-size: 0.8em; font-weight: bold; line-height: 1; border-radius: 0.35em; vertical-align: middle; margin-left: 0.5rem\">ORG</span>\n",
       "</mark>\n",
       " 145/92, hr \n",
       "<mark class=\"entity\" style=\"background: #e4e7d2; padding: 0.45em 0.6em; margin: 0 0.25em; line-height: 1; border-radius: 0.35em;\">\n",
       "    88,\n",
       "    <span style=\"font-size: 0.8em; font-weight: bold; line-height: 1; border-radius: 0.35em; vertical-align: middle; margin-left: 0.5rem\">CARDINAL</span>\n",
       "</mark>\n",
       " Rr 20, t \n",
       "<mark class=\"entity\" style=\"background: #e4e7d2; padding: 0.45em 0.6em; margin: 0 0.25em; line-height: 1; border-radius: 0.35em;\">\n",
       "    37.2,\n",
       "    <span style=\"font-size: 0.8em; font-weight: bold; line-height: 1; border-radius: 0.35em; vertical-align: middle; margin-left: 0.5rem\">CARDINAL</span>\n",
       "</mark>\n",
       " Spo2 97% \n",
       "<mark class=\"entity\" style=\"background: #7aecec; padding: 0.45em 0.6em; margin: 0 0.25em; line-height: 1; border-radius: 0.35em;\">\n",
       "    RA\n",
       "    <span style=\"font-size: 0.8em; font-weight: bold; line-height: 1; border-radius: 0.35em; vertical-align: middle; margin-left: 0.5rem\">ORG</span>\n",
       "</mark>\n",
       " s) PT reports gradual Onset Sob W/ exertion, worse W/ lying flat . DENIES fever, cough, leg swelling . + intermittent L sided chest pain, \n",
       "<mark class=\"entity\" style=\"background: #bfeeb7; padding: 0.45em 0.6em; margin: 0 0.25em; line-height: 1; border-radius: 0.35em;\">\n",
       "    Non-Radiating, 6/10,\n",
       "    <span style=\"font-size: 0.8em; font-weight: bold; line-height: 1; border-radius: 0.35em; vertical-align: middle; margin-left: 0.5rem\">PRODUCT</span>\n",
       "</mark>\n",
       " worse W/ deep breath O) appears mildly \n",
       "<mark class=\"entity\" style=\"background: #aa9cfc; padding: 0.45em 0.6em; margin: 0 0.25em; line-height: 1; border-radius: 0.35em;\">\n",
       "    Sob,\n",
       "    <span style=\"font-size: 0.8em; font-weight: bold; line-height: 1; border-radius: 0.35em; vertical-align: middle; margin-left: 0.5rem\">PERSON</span>\n",
       "</mark>\n",
       " speaking full sentences . lungs: crackles \n",
       "<mark class=\"entity\" style=\"background: #7aecec; padding: 0.45em 0.6em; margin: 0 0.25em; line-height: 1; border-radius: 0.35em;\">\n",
       "    BIL\n",
       "    <span style=\"font-size: 0.8em; font-weight: bold; line-height: 1; border-radius: 0.35em; vertical-align: middle; margin-left: 0.5rem\">ORG</span>\n",
       "</mark>\n",
       " bases . heart: Rrr, no M/R/G . \n",
       "<mark class=\"entity\" style=\"background: #aa9cfc; padding: 0.45em 0.6em; margin: 0 0.25em; line-height: 1; border-radius: 0.35em;\">\n",
       "    Ext:\n",
       "    <span style=\"font-size: 0.8em; font-weight: bold; line-height: 1; border-radius: 0.35em; vertical-align: middle; margin-left: 0.5rem\">PERSON</span>\n",
       "</mark>\n",
       " no edema Ecg: Nsr, no St changes LABS: CBC - Wbc 9.2, Hgb 13.5, PLT \n",
       "<mark class=\"entity\" style=\"background: #e4e7d2; padding: 0.45em 0.6em; margin: 0 0.25em; line-height: 1; border-radius: 0.35em;\">\n",
       "    220\n",
       "    <span style=\"font-size: 0.8em; font-weight: bold; line-height: 1; border-radius: 0.35em; vertical-align: middle; margin-left: 0.5rem\">CARDINAL</span>\n",
       "</mark>\n",
       " BMP - NA 138, K 4.2, Cr 1.1, Glu 162 Troponin Neg CXR: cardiomegaly, no Infiltrates A/P: acute \n",
       "<mark class=\"entity\" style=\"background: #aa9cfc; padding: 0.45em 0.6em; margin: 0 0.25em; line-height: 1; border-radius: 0.35em;\">\n",
       "    Sob\n",
       "    <span style=\"font-size: 0.8em; font-weight: bold; line-height: 1; border-radius: 0.35em; vertical-align: middle; margin-left: 0.5rem\">PERSON</span>\n",
       "</mark>\n",
       " -? early Chf vs pneumonia start \n",
       "<mark class=\"entity\" style=\"background: #aa9cfc; padding: 0.45em 0.6em; margin: 0 0.25em; line-height: 1; border-radius: 0.35em;\">\n",
       "    Lasix\n",
       "    <span style=\"font-size: 0.8em; font-weight: bold; line-height: 1; border-radius: 0.35em; vertical-align: middle; margin-left: 0.5rem\">PERSON</span>\n",
       "</mark>\n",
       " 40 IV repeat CXR in am trend \n",
       "<mark class=\"entity\" style=\"background: #aa9cfc; padding: 0.45em 0.6em; margin: 0 0.25em; line-height: 1; border-radius: 0.35em;\">\n",
       "    Troponin Htn\n",
       "    <span style=\"font-size: 0.8em; font-weight: bold; line-height: 1; border-radius: 0.35em; vertical-align: middle; margin-left: 0.5rem\">PERSON</span>\n",
       "</mark>\n",
       " - hold \n",
       "<mark class=\"entity\" style=\"background: #7aecec; padding: 0.45em 0.6em; margin: 0 0.25em; line-height: 1; border-radius: 0.35em;\">\n",
       "    Lisinopril,\n",
       "    <span style=\"font-size: 0.8em; font-weight: bold; line-height: 1; border-radius: 0.35em; vertical-align: middle; margin-left: 0.5rem\">ORG</span>\n",
       "</mark>\n",
       " recheck \n",
       "<mark class=\"entity\" style=\"background: #7aecec; padding: 0.45em 0.6em; margin: 0 0.25em; line-height: 1; border-radius: 0.35em;\">\n",
       "    BP\n",
       "    <span style=\"font-size: 0.8em; font-weight: bold; line-height: 1; border-radius: 0.35em; vertical-align: middle; margin-left: 0.5rem\">ORG</span>\n",
       "</mark>\n",
       " in am \n",
       "<mark class=\"entity\" style=\"background: #7aecec; padding: 0.45em 0.6em; margin: 0 0.25em; line-height: 1; border-radius: 0.35em;\">\n",
       "    DM - CONT\n",
       "    <span style=\"font-size: 0.8em; font-weight: bold; line-height: 1; border-radius: 0.35em; vertical-align: middle; margin-left: 0.5rem\">ORG</span>\n",
       "</mark>\n",
       " home Meds, check A1C smoking - advised \n",
       "<mark class=\"entity\" style=\"background: #7aecec; padding: 0.45em 0.6em; margin: 0 0.25em; line-height: 1; border-radius: 0.35em;\">\n",
       "    cessation, PT\n",
       "    <span style=\"font-size: 0.8em; font-weight: bold; line-height: 1; border-radius: 0.35em; vertical-align: middle; margin-left: 0.5rem\">ORG</span>\n",
       "</mark>\n",
       " interested in patch \n",
       "<mark class=\"entity\" style=\"background: #feca74; padding: 0.45em 0.6em; margin: 0 0.25em; line-height: 1; border-radius: 0.35em;\">\n",
       "    Dispo:\n",
       "    <span style=\"font-size: 0.8em; font-weight: bold; line-height: 1; border-radius: 0.35em; vertical-align: middle; margin-left: 0.5rem\">GPE</span>\n",
       "</mark>\n",
       " admit Obs, F/U W/ cards </div>\n",
       "</figure>\n",
       "</body>\n",
       "</html></span>"
      ],
      "text/plain": [
       "<IPython.core.display.HTML object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\n",
      "Using the 'ent' visualizer\n",
      "Serving on http://0.0.0.0:5000 ...\n",
      "\n",
      "Shutting down server on port 5000.\n"
     ]
    }
   ],
   "source": [
    "from spacy import displacy\n",
    "doc = med7_nlp(note)\n",
    "displacy.serve(doc, style=\"ent\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "77200312-7be3-4acc-b10e-9962ed16fa97",
   "metadata": {},
   "source": [
    "We can pass this custom pipeline to `deid_string`, similarly to how we did for `pyDeid`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 20,
   "id": "910fcfd8-ccc1-4754-9521-656e170bf0cb",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<span class=\"tex2jax_ignore\"><div class=\"entities\" style=\"line-height: 2.5; direction: ltr\"></br>Pt: \n",
       "<mark class=\"entity\" style=\"background: #ddd; padding: 0.45em 0.6em; margin: 0 0.25em; line-height: 1; border-radius: 0.35em;\">\n",
       "    J.\n",
       "    <span style=\"font-size: 0.8em; font-weight: bold; line-height: 1; border-radius: 0.35em; vertical-align: middle; margin-left: 0.5rem\">PHI</span>\n",
       "</mark>\n",
       " \n",
       "<mark class=\"entity\" style=\"background: #ddd; padding: 0.45em 0.6em; margin: 0 0.25em; line-height: 1; border-radius: 0.35em;\">\n",
       "    Smith\n",
       "    <span style=\"font-size: 0.8em; font-weight: bold; line-height: 1; border-radius: 0.35em; vertical-align: middle; margin-left: 0.5rem\">NAME</span>\n",
       "</mark>\n",
       ", 45yo M</br>CC: SOB, chest pain x 2 days</br>Hx: HTN, T2DM, +smoker (1ppd x 20yrs)</br>Meds: metformin, lisinopril</br>VS: BP 145/92, HR 88, RR 20, T 37.2, SpO2 97% RA</br>S) pt reports gradual onset SOB w/ exertion, worse w/ lying flat. denies fever, cough, leg swelling. + intermittent L sided chest pain, non-radiating, \n",
       "<mark class=\"entity\" style=\"background: #bfe1d9; padding: 0.45em 0.6em; margin: 0 0.25em; line-height: 1; border-radius: 0.35em;\">\n",
       "    6/10\n",
       "    <span style=\"font-size: 0.8em; font-weight: bold; line-height: 1; border-radius: 0.35em; vertical-align: middle; margin-left: 0.5rem\">DATE</span>\n",
       "</mark>\n",
       ", worse w/ deep breath</br>O) appears mildly S\n",
       "<mark class=\"entity\" style=\"background: #ddd; padding: 0.45em 0.6em; margin: 0 0.25em; line-height: 1; border-radius: 0.35em;\">\n",
       "    OB, \n",
       "    <span style=\"font-size: 0.8em; font-weight: bold; line-height: 1; border-radius: 0.35em; vertical-align: middle; margin-left: 0.5rem\">NAME</span>\n",
       "</mark>\n",
       "speaking full sentences. Lungs: crackles bil bases. Heart: RRR, no m/r/g. Ext:\n",
       "<mark class=\"entity\" style=\"background: #ddd; padding: 0.45em 0.6em; margin: 0 0.25em; line-height: 1; border-radius: 0.35em;\">\n",
       "     no \n",
       "    <span style=\"font-size: 0.8em; font-weight: bold; line-height: 1; border-radius: 0.35em; vertical-align: middle; margin-left: 0.5rem\">NAME</span>\n",
       "</mark>\n",
       "edema</br>ECG: NSR, no ST changes</br>Labs:</br>CBC - WBC 9.2, Hgb 13.5, Plt 220</br>BMP - Na 138, K 4.2, Cr 1.1, Glu 162</br>Troponin neg</br>CXR: cardiomegaly, no infiltrates</br>A/P:</br></br>Acute SOB\n",
       "<mark class=\"entity\" style=\"background: #ddd; padding: 0.45em 0.6em; margin: 0 0.25em; line-height: 1; border-radius: 0.35em;\">\n",
       "     - \n",
       "    <span style=\"font-size: 0.8em; font-weight: bold; line-height: 1; border-radius: 0.35em; vertical-align: middle; margin-left: 0.5rem\">NAME</span>\n",
       "</mark>\n",
       "?early CHF vs pneumonia</br></br>start la\n",
       "<mark class=\"entity\" style=\"background: #ddd; padding: 0.45em 0.6em; margin: 0 0.25em; line-height: 1; border-radius: 0.35em;\">\n",
       "    six 4\n",
       "    <span style=\"font-size: 0.8em; font-weight: bold; line-height: 1; border-radius: 0.35em; vertical-align: middle; margin-left: 0.5rem\">NAME</span>\n",
       "</mark>\n",
       "0 IV</br>repeat CXR in AM</br>trend tr\n",
       "<mark class=\"entity\" style=\"background: #ddd; padding: 0.45em 0.6em; margin: 0 0.25em; line-height: 1; border-radius: 0.35em;\">\n",
       "    oponin\n",
       "\n",
       "\n",
       "    <span style=\"font-size: 0.8em; font-weight: bold; line-height: 1; border-radius: 0.35em; vertical-align: middle; margin-left: 0.5rem\">NAME</span>\n",
       "</mark>\n",
       "</br>\n",
       "<mark class=\"entity\" style=\"background: #ddd; padding: 0.45em 0.6em; margin: 0 0.25em; line-height: 1; border-radius: 0.35em;\">\n",
       "    HTN\n",
       "    <span style=\"font-size: 0.8em; font-weight: bold; line-height: 1; border-radius: 0.35em; vertical-align: middle; margin-left: 0.5rem\">NAME</span>\n",
       "</mark>\n",
       " - hold lisinopril, recheck BP in AM</br>DM - cont home meds, check A1c</br>Smoking - advised cessation, pt interested in patch</br></br>Dispo: admit obs, f/u w/ cards</br></div></span>"
      ],
      "text/plain": [
       "<IPython.core.display.HTML object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "from pyDeid import deid_string, display_deid\n",
    "\n",
    "surrogates, new_note = deid_string(note, ner_pipeline = med7_nlp)\n",
    "\n",
    "display_deid(note, surrogates)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "a63c4917-f110-48c5-bf8a-22e6cbe57774",
   "metadata": {},
   "source": [
    "Perhaps these false positives can be handled by fine-tuning this NER pipeline, which may be part of some future work."
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.8.8"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}