ID: biomedical.clinical.clinical_nlp
Version: 1.0.0
Status: Production
Category: Clinical / Natural Language Processing
The Clinical NLP Toolkit provides comprehensive natural language processing capabilities for clinical text, including electronic health records (EHRs), discharge summaries, clinical notes, pathology reports, and radiology reports. Built on medspaCy, OpenMed, and clinical transformer models (BioBERT, ClinicalBERT, PubMedBERT), this skill extracts structured medical information from unstructured clinical narratives.
Clinical documentation contains 80% of patient health information in unstructured text. This skill transforms free-text clinical notes into structured, queryable data for clinical decision support, research, and population health analytics.
| Entity Type | Examples | Detection Method |
|---|---|---|
| Diseases/Conditions | diabetes mellitus, pneumonia, CHF | BioBERT + medspaCy |
| Medications | metformin 500mg, lisinopril | RxNorm normalization |
| Procedures | colonoscopy, CT scan, biopsy | CPT/SNOMED mapping |
| Anatomy | left ventricle, right lung | Anatomical ontology |
| Lab Values | HbA1c 7.2%, creatinine 1.4 | Numeric extraction |
| Temporal | 3 days ago, since 2019 | Temporal reasoning |
| Assertion | Description | Example |
|---|---|---|
| Present | Currently active finding | "Patient has pneumonia" |
| Absent/Negated | Explicitly ruled out | "No evidence of malignancy" |
| Hypothetical | Conditional/possible | "If patient develops fever" |
| Historical | Past occurrence | "History of appendectomy" |
| Family | Family member condition | "Mother had breast cancer" |
| Uncertain | Hedged/uncertain | "Possible early diabetes" |
| Relation Type | Description | Example |
|---|---|---|
| Drug-Disease | Medication treats condition | metformin → diabetes |
| Drug-Dose | Medication with dosage | lisinopril → 10mg |
| Test-Result | Lab test with value | HbA1c → 7.2% |
| Symptom-Anatomy | Symptom localization | pain → chest |
| Temporal Relations | Time-based associations | fever → 3 days |
| Framework | Description | Strengths |
|---|---|---|
| medspaCy | spaCy for clinical NLP | Rule-based + ML, interpretable |
| OpenMed | Biomedical NLP toolkit | LLM integration, de-identification |
| Transformers | BioBERT, ClinicalBERT | State-of-the-art accuracy |
| SciSpaCy | Scientific text processing | Biomedical entity linking |
| Parameter | Type | Default | Description |
|---|---|---|---|
text |
str |
Required | Clinical text to process |
tasks |
list |
["ner", "negation"] |
NLP tasks to perform |
model |
str |
en_core_med7_lg |
NLP model to use |
output_format |
str |
json |
Output format (json, fhir, csv) |
normalize |
bool |
True |
Map to standard terminologies |
{
"entities": [
{
"text": "type 2 diabetes",
"label": "DISEASE",
"start": 45,
"end": 60,
"assertion": "present",
"cui": "C0011860",
"snomed": "44054006"
}
],
"relations": [
{
"subject": "metformin",
"predicate": "treats",
"object": "type 2 diabetes"
}
],
"sections": {
"chief_complaint": "...",
"history": "...",
"assessment": "..."
}
}python clinical_nlp.py "Patient presents with chest pain and shortness of breath.
No fever. History of MI in 2019. Currently on aspirin 81mg daily." \
--tasks ner negation sections \
--model en_core_med7_lg \
--output-format jsonimport medspacy
from medspacy.ner import TargetRule
from medspacy.visualization import visualize_ent
# Load clinical NLP pipeline
nlp = medspacy.load(enable=["tokenizer", "ner", "context"])
# Add target rules for custom entities
target_rules = [
TargetRule("type 2 diabetes", "CONDITION"),
TargetRule("metformin", "MEDICATION"),
]
nlp.get_pipe("target_matcher").add(target_rules)
# Process clinical text
clinical_note = """
ASSESSMENT: 58-year-old male with poorly controlled type 2 diabetes.
HbA1c 9.2% (elevated from 7.8% six months ago). No hypoglycemic episodes.
History of hypertension, well controlled on lisinopril.
Plan: Increase metformin to 1000mg BID. Add empagliflozin 10mg daily.
"""
doc = nlp(clinical_note)
# Extract entities with assertions
for ent in doc.ents:
print(f"{ent.text} | {ent.label_} | Negated: {ent._.is_negated} | Historical: {ent._.is_historical}")
# Visualize
visualize_ent(doc)from langchain.tools import tool
import medspacy
from medspacy.context import ConTextComponent
@tool
def extract_clinical_entities(
clinical_text: str,
entity_types: list = ["CONDITION", "MEDICATION", "PROCEDURE"]
) -> str:
"""
Extracts medical entities from clinical text with assertion detection.
Identifies diseases, medications, procedures, labs and determines
if findings are present, negated, historical, or hypothetical.
Args:
clinical_text: Clinical note, discharge summary, or report
entity_types: Types of entities to extract
Returns:
JSON with extracted entities, assertions, and relationships
"""
nlp = medspacy.load(enable=["tokenizer", "ner", "context", "sectionizer"])
doc = nlp(clinical_text)
entities = []
for ent in doc.ents:
if ent.label_ in entity_types:
entities.append({
"text": ent.text,
"label": ent.label_,
"is_negated": ent._.is_negated,
"is_historical": ent._.is_historical,
"is_hypothetical": ent._.is_hypothetical,
"is_family": ent._.is_family,
"section": ent._.section_category
})
return json.dumps({"entities": entities, "entity_count": len(entities)})
@tool
def normalize_medical_concepts(
entities: list,
terminology: str = "SNOMED"
) -> str:
"""
Maps extracted entities to standard medical terminologies.
Args:
entities: List of entity texts to normalize
terminology: Target terminology (SNOMED, ICD10, RxNorm)
Returns:
JSON mapping entities to standard codes
"""
from scispacy.linking import EntityLinker
nlp = spacy.load("en_core_sci_lg")
nlp.add_pipe("scispacy_linker", config={"resolve_abbreviations": True})
results = []
for entity in entities:
doc = nlp(entity)
for ent in doc.ents:
if ent._.kb_ents:
cui, score = ent._.kb_ents[0]
results.append({
"text": entity,
"cui": cui,
"confidence": score
})
return json.dumps(results)import anthropic
import medspacy
client = anthropic.Client()
def enhanced_clinical_extraction(clinical_note: str):
"""Combines rule-based NLP with LLM reasoning for clinical extraction."""
# Step 1: medspaCy extraction
nlp = medspacy.load()
doc = nlp(clinical_note)
medspacy_entities = [
{"text": ent.text, "label": ent.label_, "negated": ent._.is_negated}
for ent in doc.ents
]
# Step 2: Claude enhancement
message = client.messages.create(
model="claude-sonnet-4-20250514",
max_tokens=2000,
messages=[
{
"role": "user",
"content": f"""You are a clinical NLP expert. Review this extraction and enhance it.
Clinical Note:
{clinical_note}
Extracted Entities (medspaCy):
{json.dumps(medspacy_entities, indent=2)}
Tasks:
1. Identify any missed entities (medications, conditions, procedures, labs)
2. Correct any mis-classifications
3. Add clinical reasoning for assertion status
4. Extract implicit information (e.g., implied diabetes from "HbA1c monitoring")
5. Structure the findings in a clinical problem list format
Return JSON with enhanced entities and clinical summary."""
}
],
)
return message.content[0].textClinical notes are automatically segmented into standard sections:
| Section | Description | Typical Content |
|---|---|---|
chief_complaint |
Reason for visit | Main symptoms |
hpi |
History of present illness | Symptom timeline |
past_medical_history |
Prior conditions | Chronic diseases |
medications |
Current medications | Drug list |
allergies |
Allergy information | Drug allergies |
family_history |
Family conditions | Hereditary risks |
social_history |
Lifestyle factors | Smoking, alcohol |
physical_exam |
Exam findings | Vital signs, findings |
assessment |
Clinical assessment | Diagnoses |
plan |
Treatment plan | Medications, follow-up |
This implementation follows established clinical NLP methodologies:
Eyre, H. et al. Launching into clinical space with medspaCy: a new clinical text processing toolkit in Python. AMIA 2021. https://github.com/medspacy/medspacy
Alsentzer, E. et al. Publicly Available Clinical BERT Embeddings. NAACL Clinical NLP Workshop 2019.
Key design decisions:
- Hybrid approach: Rule-based patterns for precision, ML for recall
- Context algorithm: ConText for assertion detection (Chapman et al.)
- Section detection: Pattern-based with ML fallback
- Terminology mapping: UMLS as semantic backbone
medspacy>=1.0.0
spacy>=3.4.0
scispacy>=0.5.0
transformers>=4.30.0
en_core_med7_lg (model)
en_core_sci_lg (model)
Install with:
pip install medspacy scispacy transformers
pip install https://s3-us-west-2.amazonaws.com/ai2-s2-scispacy/releases/v0.5.1/en_core_sci_lg-0.5.1.tar.gzPerformance on benchmark datasets:
| Dataset | Task | F1 Score |
|---|---|---|
| i2b2 2010 | NER | 0.89 |
| i2b2 2010 | Assertion | 0.94 |
| i2b2 2012 | Temporal | 0.78 |
| MIMIC-III | Section | 0.92 |
| n2c2 2018 | Relations | 0.85 |
- De-identification: Built-in PHI removal capabilities
- HIPAA compliance: No PHI stored or transmitted
- Audit logging: Track all text processing
- On-premise deployment: Air-gapped operation supported
- Clinical Note Summarization: For SOAP note generation
- EHR/FHIR Integration: For structured data export
- Trial Eligibility Agent: For patient matching
- Medical NER Agent: For deep relation extraction
MD BABU MIA Artificial Intelligence Group Icahn School of Medicine at Mount Sinai md.babu.mia@mssm.edu