MedError is an open-source framework for systematic error analysis of clinical NLP and large language model (LLM) outputs in electronic health record (EHR)-based concept extraction. It provides a structured error taxonomy, an LLM-assisted annotation interface, and visual analytics for multi-site clinical NLP evaluation.
Try the app: https://ohnlp.org/MedError/
The live demo supports Azure OpenAI out of the box. Ollama (local LLM) does not work from the live demo β use the standalone
index.htmlinstead (see Quickstart below).
MedError supports a three-step workflow:
- Configure β upload an annotation guideline (YAML) and select or upload an error taxonomy
- Load β upload model predictions (JSON or CSV) containing gold-standard labels and model outputs
- Analyze β review LLM-generated error categorizations, override as needed, and export results
The error taxonomy covers six dimensions β Annotation, Contextual, Linguistic, Logic, Output/Generation, and Other β with support for both rule-based and transformer/LLM model types.
If you use MedError in your research, please cite:
Liu H, Fu S, Lu Q, Ahn J, Chen F, Yin H, Wen J, Yue Z, Harrison T, Jun J, Ruan X. MedError: A Machine-Assisted Framework for Systematic Error Analysis in Clinical Concept Extraction. Research Square. 2025 Sep 17:rs-3.
Download index.html and open it directly in a browser, or use the live demo. No server or build step required. Configure your Azure OpenAI endpoint in the LLM Config panel.
Download index.html, then serve it over a local HTTP server β do not open it by double-clicking, as browsers block localhost requests from file:// pages:
cd /path/to/MedError
python3 -m http.server 8080
# open http://localhost:8080 in your browserSee LLM Configuration β Ollama for full setup instructions.
Requires Node.js β₯ 18 and pnpm β₯ 8.
cd error-analysis-web-app-source
pnpm install
pnpm dev # development server at http://localhost:5173
pnpm build # production build β dist/index.htmlMedError accepts JSON or CSV files containing one row per model prediction. Each row must include:
| Field | Type | Description |
|---|---|---|
input |
string | The clinical text span being evaluated |
gold_standard |
string or null | The correct label (null = no annotation expected) |
LLM_prediction |
string or null | The model's predicted label |
FP_FN |
"FP" or "FN" |
Whether this is a false positive or false negative |
model_type |
string | Model identifier (e.g., "Rule-based", "GPT-4") |
concept_category |
string | (optional) Concept class for grouping (auto-filled from gold_standard if omitted) |
error_type |
string | (optional) Pre-assigned error label; can be set or overridden in the UI |
Download a ready-to-use example from the app's Upload Errors tab, or from sample_data/error_input_examples.csv.
Upload a YAML file that defines gold-standard annotation rules for your target concept. See sample_data/annotation_guideline_example.yaml for a delirium-domain example.
The app ships with two built-in MedError taxonomies (sub-class and class level). You can also upload a custom YAML taxonomy β see the Concept Extraction Guideline tab for the expected format.
The full taxonomy is defined in Taxonomy/error_taxonomy_v2_1.md.
Six error dimensions are supported:
| Dimension | Description |
|---|---|
| Annotation Error | Human labeling errors in the gold standard |
| Contextual Error | Errors from misinterpreting clinical context (negation, certainty, section, subject, temporality) |
| Linguistic Error | Surface-form errors (morphology, spelling, abbreviation, synonyms, syntax) |
| Logic Error | Rule or pattern misspecification, hallucination, over-extraction |
| Output / Generation Error | LLM-specific failures: verbosity, inconsistency, sycophancy |
| Other Error | Incomplete extraction, dictionary errors, normalization errors |
MedError can call an LLM to automatically suggest an error class and reasoning for each FP/FN case. Configure the provider in the LLM Config sidebar panel before running analysis.
In the LLM Config panel, select Azure OpenAI and fill in:
| Field | Where to find it |
|---|---|
| Endpoint | Azure Portal β your OpenAI resource β Keys and Endpoint |
| Deployment name | Azure AI Studio β Deployments β your model name |
| API key | Azure Portal β your OpenAI resource β Keys and Endpoint |
Ollama runs models locally on your machine and exposes an OpenAI-compatible API. No account or API key is needed.
β οΈ Ollama does not work from the live demo athttps://ohnlp.org/MedError/. You must serve MedError locally (see step 3 below).
1. Install Ollama
Download and install from https://ollama.com/download for macOS, Windows, or Linux.
2. Pull a model
Open a terminal and pull a model. A 7β14B parameter model is sufficient for error classification:
ollama pull llama3.1 # 8B, good balance of speed and accuracy
ollama pull mistral # 7B, fast on CPU
ollama pull qwen2.5:14b # 14B, stronger reasoning3. Start Ollama
ollama serveOllama runs at http://localhost:11434. Leave this terminal open while using the app.
4. Serve MedError locally
Do not open index.html by double-clicking β browsers block localhost requests from file:// pages. Instead, serve it over HTTP:
cd /path/to/MedError
python3 -m http.server 8080Then open http://localhost:8080 in your browser.
5. Configure in MedError
In the LLM Config panel, select Ollama and set:
- Base URL:
http://localhost:11434(default, no change needed) - Model name: the model you pulled (e.g.,
llama3.1,mistral,qwen2.5:14b)
After loading the error file, MedError provides:
- Analysis Summary β total FP/FN counts, per-concept breakdown, and corpus statistics
- Upload Errors β per-case LLM suggestion, reasoning, and manual override controls
- Error Visualization β Sankey diagram and frequency charts across error dimensions
- Multi-site Comparison β side-by-side error distribution across studies or sites
- Export β downloadable CSV/JSON of all categorized errors with metadata
This project is licensed under the MIT License.
Contributions are welcome. Please open an issue to discuss proposed changes before submitting a pull request.
