MedError

MedError is an open-source framework for systematic error analysis of clinical NLP and large language model (LLM) outputs in electronic health record (EHR)-based concept extraction. It provides a structured error taxonomy, an LLM-assisted annotation interface, and visual analytics for multi-site clinical NLP evaluation.

🌐 Live Demo

Try the app: https://ohnlp.org/MedError/

The live demo supports Azure OpenAI out of the box. Ollama (local LLM) does not work from the live demo — use the standalone index.html instead (see Quickstart below).

Overview

MedError supports a three-step workflow:

Configure — upload an annotation guideline (YAML) and select or upload an error taxonomy
Load — upload model predictions (JSON or CSV) containing gold-standard labels and model outputs
Analyze — review LLM-generated error categorizations, override as needed, and export results

The error taxonomy covers six dimensions — Annotation, Contextual, Linguistic, Logic, Output/Generation, and Other — with support for both rule-based and transformer/LLM model types.

Citation

If you use MedError in your research, please cite:

Liu H, Fu S, Lu Q, Ahn J, Chen F, Yin H, Wen J, Yue Z, Harrison T, Jun J, Ruan X. MedError: A Machine-Assisted Framework for Systematic Error Analysis in Clinical Concept Extraction. Research Square. 2025 Sep 17:rs-3.

Quickstart

Option A — Azure OpenAI (no install)

Download index.html and open it directly in a browser, or use the live demo. No server or build step required. Configure your Azure OpenAI endpoint in the LLM Config panel.

Option B — Ollama (local LLM, no API key)

Download index.html, then serve it over a local HTTP server — do not open it by double-clicking, as browsers block localhost requests from file:// pages:

cd /path/to/MedError
python3 -m http.server 8080
# open http://localhost:8080 in your browser

See LLM Configuration → Ollama for full setup instructions.

Option C — Run from source

Requires Node.js ≥ 18 and pnpm ≥ 8.

cd error-analysis-web-app-source
pnpm install
pnpm dev        # development server at http://localhost:5173
pnpm build      # production build → dist/index.html

Input Format

MedError accepts JSON or CSV files containing one row per model prediction. Each row must include:

Field	Type	Description
`input`	string	The clinical text span being evaluated
`gold_standard`	string or null	The correct label (null = no annotation expected)
`LLM_prediction`	string or null	The model's predicted label
`FP_FN`	`"FP"` or `"FN"`	Whether this is a false positive or false negative
`model_type`	string	Model identifier (e.g., `"Rule-based"`, `"GPT-4"`)
`concept_category`	string	(optional) Concept class for grouping (auto-filled from `gold_standard` if omitted)
`error_type`	string	(optional) Pre-assigned error label; can be set or overridden in the UI

Download a ready-to-use example from the app's Upload Errors tab, or from sample_data/error_input_examples.csv.

Annotation Guideline (YAML)

Upload a YAML file that defines gold-standard annotation rules for your target concept. See sample_data/annotation_guideline_example.yaml for a delirium-domain example.

Error Taxonomy (YAML)

The app ships with two built-in MedError taxonomies (sub-class and class level). You can also upload a custom YAML taxonomy — see the Concept Extraction Guideline tab for the expected format.

Error Taxonomy

The full taxonomy is defined in Taxonomy/error_taxonomy_v2_1.md.

Six error dimensions are supported:

Dimension	Description
Annotation Error	Human labeling errors in the gold standard
Contextual Error	Errors from misinterpreting clinical context (negation, certainty, section, subject, temporality)
Linguistic Error	Surface-form errors (morphology, spelling, abbreviation, synonyms, syntax)
Logic Error	Rule or pattern misspecification, hallucination, over-extraction
Output / Generation Error	LLM-specific failures: verbosity, inconsistency, sycophancy
Other Error	Incomplete extraction, dictionary errors, normalization errors

LLM Configuration

MedError can call an LLM to automatically suggest an error class and reasoning for each FP/FN case. Configure the provider in the LLM Config sidebar panel before running analysis.

Option A — Azure OpenAI

In the LLM Config panel, select Azure OpenAI and fill in:

Field	Where to find it
Endpoint	Azure Portal → your OpenAI resource → Keys and Endpoint
Deployment name	Azure AI Studio → Deployments → your model name
API key	Azure Portal → your OpenAI resource → Keys and Endpoint

Option B — Ollama (local, no API key required)

Ollama runs models locally on your machine and exposes an OpenAI-compatible API. No account or API key is needed.

⚠️ Ollama does not work from the live demo at https://ohnlp.org/MedError/. You must serve MedError locally (see step 3 below).

1. Install Ollama

Download and install from https://ollama.com/download for macOS, Windows, or Linux.

2. Pull a model

Open a terminal and pull a model. A 7–14B parameter model is sufficient for error classification:

ollama pull llama3.1        # 8B, good balance of speed and accuracy
ollama pull mistral         # 7B, fast on CPU
ollama pull qwen2.5:14b     # 14B, stronger reasoning

3. Start Ollama

ollama serve

Ollama runs at http://localhost:11434. Leave this terminal open while using the app.

4. Serve MedError locally

Do not open index.html by double-clicking — browsers block localhost requests from file:// pages. Instead, serve it over HTTP:

cd /path/to/MedError
python3 -m http.server 8080

Then open http://localhost:8080 in your browser.

5. Configure in MedError

In the LLM Config panel, select Ollama and set:

Base URL: http://localhost:11434 (default, no change needed)
Model name: the model you pulled (e.g., llama3.1, mistral, qwen2.5:14b)

Expected Output

After loading the error file, MedError provides:

Analysis Summary — total FP/FN counts, per-concept breakdown, and corpus statistics
Upload Errors — per-case LLM suggestion, reasoning, and manual override controls
Error Visualization — Sankey diagram and frequency charts across error dimensions
Multi-site Comparison — side-by-side error distribution across studies or sites
Export — downloadable CSV/JSON of all categorized errors with metadata

License

This project is licensed under the MIT License.

Contributing

Contributions are welcome. Please open an issue to discuss proposed changes before submitting a pull request.

Name		Name	Last commit message	Last commit date
Latest commit History 45 Commits
.github/workflows		.github/workflows
Taxonomy		Taxonomy
assets		assets
docs		docs
error-analysis-web-app-source		error-analysis-web-app-source
sample_data		sample_data
src/mederror		src/mederror
.env		.env
.gitignore		.gitignore
.pre-commit-config.yaml		.pre-commit-config.yaml
LICENSE		LICENSE
README.md		README.md
environment.yml		environment.yml
index.html		index.html
pyproject.toml		pyproject.toml
setup.cfg		setup.cfg

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

MedError

🌐 Live Demo

Overview

Citation

Quickstart

Option A — Azure OpenAI (no install)

Option B — Ollama (local LLM, no API key)

Option C — Run from source

Input Format

Annotation Guideline (YAML)

Error Taxonomy (YAML)

Error Taxonomy

LLM Configuration

Option A — Azure OpenAI

Option B — Ollama (local, no API key required)

Expected Output

License

Contributing

About

Uh oh!

Releases 1

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

MedError

🌐 Live Demo

Overview

Citation

Quickstart

Option A — Azure OpenAI (no install)

Option B — Ollama (local LLM, no API key)

Option C — Run from source

Input Format

Annotation Guideline (YAML)

Error Taxonomy (YAML)

Error Taxonomy

LLM Configuration

Option A — Azure OpenAI

Option B — Ollama (local, no API key required)

Expected Output

License

Contributing

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases 1

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages