Skip to content

OHNLP/MedError

Repository files navigation

MedError

MedError is an open-source framework for systematic error analysis of clinical NLP and large language model (LLM) outputs in electronic health record (EHR)-based concept extraction. It provides a structured error taxonomy, an LLM-assisted annotation interface, and visual analytics for multi-site clinical NLP evaluation.

License: MIT


🌐 Live Demo

Try the app: https://ohnlp.org/MedError/

The live demo supports Azure OpenAI out of the box. Ollama (local LLM) does not work from the live demo β€” use the standalone index.html instead (see Quickstart below).

App Screenshot


Overview

MedError supports a three-step workflow:

  1. Configure β€” upload an annotation guideline (YAML) and select or upload an error taxonomy
  2. Load β€” upload model predictions (JSON or CSV) containing gold-standard labels and model outputs
  3. Analyze β€” review LLM-generated error categorizations, override as needed, and export results

The error taxonomy covers six dimensions β€” Annotation, Contextual, Linguistic, Logic, Output/Generation, and Other β€” with support for both rule-based and transformer/LLM model types.


Citation

If you use MedError in your research, please cite:

Liu H, Fu S, Lu Q, Ahn J, Chen F, Yin H, Wen J, Yue Z, Harrison T, Jun J, Ruan X. MedError: A Machine-Assisted Framework for Systematic Error Analysis in Clinical Concept Extraction. Research Square. 2025 Sep 17:rs-3.


Quickstart

Option A β€” Azure OpenAI (no install)

Download index.html and open it directly in a browser, or use the live demo. No server or build step required. Configure your Azure OpenAI endpoint in the LLM Config panel.

Option B β€” Ollama (local LLM, no API key)

Download index.html, then serve it over a local HTTP server β€” do not open it by double-clicking, as browsers block localhost requests from file:// pages:

cd /path/to/MedError
python3 -m http.server 8080
# open http://localhost:8080 in your browser

See LLM Configuration β†’ Ollama for full setup instructions.

Option C β€” Run from source

Requires Node.js β‰₯ 18 and pnpm β‰₯ 8.

cd error-analysis-web-app-source
pnpm install
pnpm dev        # development server at http://localhost:5173
pnpm build      # production build β†’ dist/index.html

Input Format

MedError accepts JSON or CSV files containing one row per model prediction. Each row must include:

Field Type Description
input string The clinical text span being evaluated
gold_standard string or null The correct label (null = no annotation expected)
LLM_prediction string or null The model's predicted label
FP_FN "FP" or "FN" Whether this is a false positive or false negative
model_type string Model identifier (e.g., "Rule-based", "GPT-4")
concept_category string (optional) Concept class for grouping (auto-filled from gold_standard if omitted)
error_type string (optional) Pre-assigned error label; can be set or overridden in the UI

Download a ready-to-use example from the app's Upload Errors tab, or from sample_data/error_input_examples.csv.

Annotation Guideline (YAML)

Upload a YAML file that defines gold-standard annotation rules for your target concept. See sample_data/annotation_guideline_example.yaml for a delirium-domain example.

Error Taxonomy (YAML)

The app ships with two built-in MedError taxonomies (sub-class and class level). You can also upload a custom YAML taxonomy β€” see the Concept Extraction Guideline tab for the expected format.


Error Taxonomy

The full taxonomy is defined in Taxonomy/error_taxonomy_v2_1.md.

Six error dimensions are supported:

Dimension Description
Annotation Error Human labeling errors in the gold standard
Contextual Error Errors from misinterpreting clinical context (negation, certainty, section, subject, temporality)
Linguistic Error Surface-form errors (morphology, spelling, abbreviation, synonyms, syntax)
Logic Error Rule or pattern misspecification, hallucination, over-extraction
Output / Generation Error LLM-specific failures: verbosity, inconsistency, sycophancy
Other Error Incomplete extraction, dictionary errors, normalization errors

LLM Configuration

MedError can call an LLM to automatically suggest an error class and reasoning for each FP/FN case. Configure the provider in the LLM Config sidebar panel before running analysis.

Option A β€” Azure OpenAI

In the LLM Config panel, select Azure OpenAI and fill in:

Field Where to find it
Endpoint Azure Portal β†’ your OpenAI resource β†’ Keys and Endpoint
Deployment name Azure AI Studio β†’ Deployments β†’ your model name
API key Azure Portal β†’ your OpenAI resource β†’ Keys and Endpoint

Option B β€” Ollama (local, no API key required)

Ollama runs models locally on your machine and exposes an OpenAI-compatible API. No account or API key is needed.

⚠️ Ollama does not work from the live demo at https://ohnlp.org/MedError/. You must serve MedError locally (see step 3 below).

1. Install Ollama

Download and install from https://ollama.com/download for macOS, Windows, or Linux.

2. Pull a model

Open a terminal and pull a model. A 7–14B parameter model is sufficient for error classification:

ollama pull llama3.1        # 8B, good balance of speed and accuracy
ollama pull mistral         # 7B, fast on CPU
ollama pull qwen2.5:14b     # 14B, stronger reasoning

3. Start Ollama

ollama serve

Ollama runs at http://localhost:11434. Leave this terminal open while using the app.

4. Serve MedError locally

Do not open index.html by double-clicking β€” browsers block localhost requests from file:// pages. Instead, serve it over HTTP:

cd /path/to/MedError
python3 -m http.server 8080

Then open http://localhost:8080 in your browser.

5. Configure in MedError

In the LLM Config panel, select Ollama and set:

  • Base URL: http://localhost:11434 (default, no change needed)
  • Model name: the model you pulled (e.g., llama3.1, mistral, qwen2.5:14b)

Expected Output

After loading the error file, MedError provides:

  • Analysis Summary β€” total FP/FN counts, per-concept breakdown, and corpus statistics
  • Upload Errors β€” per-case LLM suggestion, reasoning, and manual override controls
  • Error Visualization β€” Sankey diagram and frequency charts across error dimensions
  • Multi-site Comparison β€” side-by-side error distribution across studies or sites
  • Export β€” downloadable CSV/JSON of all categorized errors with metadata

License

This project is licensed under the MIT License.


Contributing

Contributions are welcome. Please open an issue to discuss proposed changes before submitting a pull request.

About

No description, website, or topics provided.

Resources

License

Stars

Watchers

Forks

Packages

 
 
 

Contributors