EarXplore

EarXplore is an interactive research database for earable interaction studies. It combines a Flask backend with four synchronized exploration views and configurable filtering logic driven by YAML and CSV data.

Live instance: earxplore.teco.edu
Paper: arXiv:2507.20656

What This Repository Contains

earXplore/
├── app.py                                # Flask app (all routes and server-side data prep)
├── configs/
│   └── earXplore_interaction.yaml        # Main runtime configuration
├── datasets/
│   ├── data.csv                          # Core study dataset
│   ├── explanations.csv                  # Column descriptions shown in UI tooltips
│   ├── abstract_similarity/
│   │   ├── data_with_embeddings.csv
│   │   └── normalized_abstract_similarity.csv
│   ├── database_similarity/
│   │   └── normalized_database_similarity.csv
│   ├── interconnections/
│   │   ├── citation_matrix.csv
│   │   └── coauthor_matrix.csv
│   └── usage_logs/                       # Optional study logs exported by participants
├── computation_notebooks/                # One-off Jupyter notebooks for initial matrix computation
├── git_actions_scripts/
│   ├── update_similarity_matrices.py                        # Similarity matrices only (no author connections)
│   └── update_similarity_matrices_and_author_connections.py # Full update — used by CI workflow
├── readme_figures/                       # Images and SVGs embedded in this README
├── static/                               # Frontend JS/CSS/assets
├── templates/                            # Main Flask templates
└── similarity_human_matching/            # Separate rating/annotation mini-app

Quick Start

Recommended Python version: 3.11+ (the project currently runs on a recent Flask/pandas/NumPy stack).

1) Create and activate a virtual environment

Windows (PowerShell):

py -m venv .venv
.\.venv\Scripts\Activate.ps1

macOS/Linux:

python3 -m venv .venv
source .venv/bin/activate

2) Install dependencies

pip install -r requirements.txt

3) Create `.env`

At minimum, set SECRET_KEY. If you use the Add Study / Report Mistake forms or the EarBot chatbot, also configure the SMTP and LLM variables (see Environment Variables (.env) below).

4) Run the app

python app.py

Open: http://localhost:888

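Optional dev mode (Flask's development server, which listens on http://127.0.0.1:5000 by default rather than port 888):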

flask run --debug

Environment Variables (.env)

The app reads environment variables via python-dotenv on startup.

Core / Security

SECRET_KEY="a-long-random-secret"
FLASK_DEBUG=false
BEHIND_PROXY=false

SECRET_KEY: Required for stable CSRF and session behavior.
BEHIND_PROXY=true: Enables Werkzeug's ProxyFix so that rate limiting sees the real client IP when the app runs behind a reverse proxy.
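One way to generate a sufficiently random SECRET_KEY is Python's standard `secrets` module. The 32-byte length below is a common choice, not a project requirement; paste the printed line into `.env`.

```python
import secrets

# 32 random bytes rendered as a 64-character hex string.
key = secrets.token_hex(32)
print(f'SECRET_KEY="{key}"')
```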

Mail (Add Study / Report Mistake forms)

MAIL_SERVER="your-smtp-server.example.com"
MAIL_PORT=587
MAIL_USE_TLS=true
MAIL_DEFAULT_SENDER="default-sender@example.com"
RECIPIENTS="reviewer@example.com"

Note on SMTP authentication: app.py currently configures only MAIL_SERVER, MAIL_PORT, MAIL_USE_TLS, and MAIL_DEFAULT_SENDER. If your SMTP server requires a username and password, add the following two lines to app.py alongside the other app.config assignments:
app.config['MAIL_USERNAME'] = os.getenv("MAIL_USERNAME")
app.config['MAIL_PASSWORD'] = os.getenv("MAIL_PASSWORD")
Then add MAIL_USERNAME and MAIL_PASSWORD to your .env accordingly.

Chatbot (EarBot)

LLM_API_URL="https://your-llm-endpoint/chat/completions"
LLM_API_KEY="your-key"
LLM_MODEL="your-model-name"

The endpoint must accept OpenAI-style chat-completions requests (a POST with an Authorization: Bearer <key> header and a JSON body containing model, messages, max_tokens, etc.). It must return either an OpenAI-compatible choices[0].message.content field or a simpler response/reply field. This makes it compatible with OpenAI, Azure OpenAI, Mistral, Groq, university/KIT API gateways, and other OpenAI-compatible providers.

If these variables are missing, /api/chat returns a configuration error to the client instead of crashing the server.
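To check whether a provider fits this contract, the sketch below builds such a request and parses both accepted response shapes using only the standard library. The function names `build_chat_request` and `extract_reply` and the default `max_tokens=512` are hypothetical illustrations, not the code in app.py.

```python
import json
import urllib.request


def build_chat_request(api_url, api_key, model, messages, max_tokens=512):
    """Build (but do not send) an OpenAI-style chat-completions POST request."""
    body = {"model": model, "messages": messages, "max_tokens": max_tokens}
    return urllib.request.Request(
        api_url,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def extract_reply(payload):
    """Return the assistant text from either accepted response shape."""
    choices = payload.get("choices")
    if choices:
        return choices[0]["message"]["content"]
    # Fallback for simpler gateways that return a flat field.
    for key in ("response", "reply"):
        if key in payload:
            return payload[key]
    raise ValueError("Unrecognized LLM response format")
```

To actually send it: `with urllib.request.urlopen(req) as resp: text = extract_reply(json.load(resp))`.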

Similarity recomputation (embeddings)

GEMINI_API_KEY="your-gemini-api-key"

Used by the update scripts and workflows that generate abstract embeddings/similarity matrices.

Configuration via YAML

Runtime behavior is no longer hardcoded in app.py; it is loaded from configs/earXplore_interaction.yaml.

Important keys:

Key	Purpose
`database-path`	Path to the main CSV dataset used by the views.
`explanations-path`	Path to the CSV with tooltip explanations.
`excluded-sidebar-categories`	Columns hidden from normal sidebar filter generation.
`metadata-sidebar-categories`	Columns grouped into the metadata panel.
`slider-categories`	Numeric columns rendered as range sliders.
`select-deselect-all-categories`	Columns with per-category select/deselect-all controls.
`exclusive-filtering-categories`	Columns that support exclusive matching mode.
`select-deselect-all-panels`	Panels with panel-level bulk selection controls.
`initially-hidden-panels`	Panels collapsed by default.
`parenthical-columns`	Columns using `Value (details)` where filtering should use only `Value`.
`start-category-filters`	Default visible columns/categories in table/chart views.
`performance-metrics-columns`	Performance metric columns merged into one dedicated filter block.
`device-model-column`	Column used by the custom Device Model keyword filter block.
`device-model-options`	Fixed Device Model options shown in UI (e.g., OpenEarable, AirPods, Other, N/A).
`other-threshold-columns`	Columns where rare values are grouped into `Other`.
`token-search-columns`	Columns rendered with token-search UI (opt-in filtering).
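The `parenthical-columns` and `other-threshold-columns` behaviors can be illustrated with a short sketch. It assumes single-valued cells and a minimum-count rule for "rare"; the README does not state the actual threshold, so `min_count=3` is purely hypothetical.

```python
import re
from collections import Counter

# Matches "Value (details)" and captures only "Value".
_PAREN_RE = re.compile(r"^\s*(.*?)\s*\(.*\)\s*$")


def base_value(cell):
    """Strip a trailing parenthetical: 'Accuracy (92%)' -> 'Accuracy'."""
    match = _PAREN_RE.match(cell)
    return match.group(1) if match else cell.strip()


def group_rare_values(values, min_count=3):
    """Map values seen fewer than min_count times to 'Other' (threshold is hypothetical)."""
    counts = Counter(values)
    return [v if counts[v] >= min_count else "Other" for v in values]
```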

Notes:

Performance metric columns are automatically treated as parenthetical and excluded from regular checkbox rendering.
Token-search columns use selected tokens as filters; empty selection means no constraint for that column.
The app computes rare-value sets and token options at startup from the current dataset.
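A minimal sketch of the token-search semantics described above, assuming comma-separated cells, case-insensitive tokens, and "match any selected token" (OR) logic. The README only guarantees that an empty selection imposes no constraint, so the separator and the OR rule are assumptions.

```python
def tokenize(cell, sep=","):
    """Split a multi-valued cell into normalized tokens (comma separator is an assumption)."""
    return {t.strip().lower() for t in cell.split(sep) if t.strip()}


def token_options(cells):
    """Collect all distinct tokens of a column, as computed once at startup."""
    options = set()
    for cell in cells:
        options |= tokenize(cell)
    return sorted(options)


def matches_tokens(cell, selected):
    """Empty selection means no constraint; otherwise require at least one selected token (assumed OR)."""
    if not selected:
        return True
    return bool(tokenize(cell) & {s.lower() for s in selected})
```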

Dataset and File Conventions

Required base data

datasets/data.csv
datasets/explanations.csv with at least Column and Explanation headers
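Before starting the app, you can check that explanations.csv has the required headers. The helper `load_explanations` is a hypothetical illustration using the standard `csv` module; it is not taken from app.py.

```python
import csv
import io

REQUIRED_HEADERS = {"Column", "Explanation"}


def load_explanations(fileobj):
    """Read explanations.csv into {column_name: explanation}, checking required headers."""
    reader = csv.DictReader(fileobj)
    missing = REQUIRED_HEADERS - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"explanations.csv is missing headers: {sorted(missing)}")
    return {row["Column"]: row["Explanation"] for row in reader}


sample = io.StringIO("Column,Explanation\nYear,Publication year\n")
print(load_explanations(sample))  # {'Year': 'Publication year'}
```

For the real file, open it with `open("datasets/explanations.csv", newline="", encoding="utf-8")` (UTF-8 is an assumption about the file's encoding).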

Naming convention for grouping in sidebar panels

Columns containing _PANEL_ are grouped by the prefix before _PANEL_ (for example Interaction_PANEL_... goes to panel Interaction).
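The grouping rule can be sketched as follows. The column names in the usage example (such as `Interaction_PANEL_Gesture`) are hypothetical, and how the app labels the grouped columns in the UI is not covered here.

```python
from collections import defaultdict

PANEL_MARKER = "_PANEL_"


def group_columns_by_panel(columns):
    """Group column names by the prefix before '_PANEL_'; other columns stay ungrouped."""
    panels = defaultdict(list)
    ungrouped = []
    for col in columns:
        if PANEL_MARKER in col:
            prefix = col.split(PANEL_MARKER, 1)[0]
            panels[prefix].append(col)
        else:
            ungrouped.append(col)
    return dict(panels), ungrouped
```

For example, `group_columns_by_panel(["Interaction_PANEL_Gesture", "Year"])` places the first column in panel `Interaction` and leaves `Year` ungrouped.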

Similarity and timeline artifacts used by views

datasets/abstract_similarity/normalized_abstract_similarity.csv
datasets/database_similarity/normalized_database_similarity.csv
datasets/interconnections/citation_matrix.csv
datasets/interconnections/coauthor_matrix.csv

If citation/coauthor matrices are missing, timeline view falls back to zero matrices so the page can still render.
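The fallback behavior can be pictured with the sketch below. It assumes a plain numeric CSV with no header row or index column, which may not match the real matrix files; `load_matrix_or_zeros` is a hypothetical name.

```python
import csv
from pathlib import Path


def load_matrix_or_zeros(path, n_studies):
    """Load a square numeric CSV matrix, or return an n x n zero matrix if the file is missing.

    Assumes a plain CSV of numbers without header row or index column.
    """
    path = Path(path)
    if not path.exists():
        # Build independent rows so mutating one row does not affect the others.
        return [[0.0] * n_studies for _ in range(n_studies)]
    with path.open(newline="", encoding="utf-8") as f:
        return [[float(x) for x in row] for row in csv.reader(f) if row]
```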

Recomputing Similarity and Connection Matrices

There are two ways to (re)compute the derived matrix files, depending on your situation.

Initial computation (first setup or full rebuild)

Use the Jupyter notebooks in computation_notebooks/:

Notebook	Output
`database_similarity.ipynb`	`datasets/database_similarity/normalized_database_similarity.csv`
`abstract_similarity.ipynb`	`datasets/abstract_similarity/data_with_embeddings.csv` + normalized similarity CSV (requires `GEMINI_API_KEY`)
`author_connections_timeline.ipynb`	`datasets/interconnections/coauthor_matrix.csv`
`grobid_citations_metadata.ipynb`	`datasets/interconnections/citation_matrix.csv` (requires a running GROBID Docker instance)

Note: The citation matrix is only produced by the GROBID notebook; the automated scripts do not update it. If no citation matrix is present, the Timeline View falls back to an all-zero matrix.
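As an illustration of what a coauthor matrix can encode, the sketch below counts shared authors between every pair of studies. This definition is an assumption; author_connections_timeline.ipynb may use a binary or normalized variant, and `coauthor_matrix` is a hypothetical name.

```python
def coauthor_matrix(author_lists):
    """Entry [i][j] = number of authors shared by studies i and j (diagonal left at 0).

    Counting shared authors is an assumed definition, not necessarily the notebook's.
    """
    author_sets = [{a.strip().lower() for a in authors} for authors in author_lists]
    n = len(author_sets)
    matrix = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            shared = len(author_sets[i] & author_sets[j])
            matrix[i][j] = matrix[j][i] = shared
    return matrix
```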

Incremental update (new studies added to an existing deployment)

Run from repository root:

python git_actions_scripts/update_similarity_matrices_and_author_connections.py

This script updates:

database similarity (datasets/database_similarity/normalized_database_similarity.csv)
abstract embeddings + similarity (datasets/abstract_similarity/data_with_embeddings.csv, datasets/abstract_similarity/abstract_similarity.csv, datasets/abstract_similarity/normalized_abstract_similarity.csv)
coauthor matrix (datasets/interconnections/coauthor_matrix.csv)
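The README does not specify how the similarity matrices are normalized. Purely as an illustration, the sketch below assumes pairwise cosine similarity over stored embeddings followed by min-max scaling of the off-diagonal entries; the actual scripts may differ. Under this assumption, adding one study means embedding only its abstract, but the global scaling can shift every normalized value, which is why the whole matrix is rewritten.

```python
import math


def cosine(u, v):
    """Cosine similarity of two vectors; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def normalized_similarity(embeddings):
    """Pairwise cosine similarity, min-max scaled to [0, 1] over off-diagonal entries."""
    n = len(embeddings)
    sim = [[cosine(embeddings[i], embeddings[j]) for j in range(n)] for i in range(n)]
    off_diag = [sim[i][j] for i in range(n) for j in range(n) if i != j]
    if not off_diag:
        return [[1.0] * n for _ in range(n)]
    lo, hi = min(off_diag), max(off_diag)
    span = (hi - lo) or 1.0
    return [[1.0 if i == j else (sim[i][j] - lo) / span for j in range(n)] for i in range(n)]
```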

Automated (GitHub Actions)

Workflow: .github/workflows/update-matrices.yml

Triggers when a PR into main is closed and merged.
Runs git_actions_scripts/update_similarity_matrices_and_author_connections.py.
Requires repository secret GEMINI_API_KEY.
Commits generated changes back to the repository.
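The workflow file itself is not reproduced here. The Python check below only illustrates the trigger condition against the event payload that GitHub Actions writes to the file named by GITHUB_EVENT_PATH; `should_update_matrices` is a hypothetical name, not part of the repository.

```python
import json
import os


def should_update_matrices(event):
    """True only for a pull_request 'closed' event that was merged into main."""
    pr = event.get("pull_request", {})
    return (
        event.get("action") == "closed"
        and pr.get("merged") is True
        and pr.get("base", {}).get("ref") == "main"
    )


# Inside a GitHub Actions run, GITHUB_EVENT_PATH points to the triggering event's JSON.
event_path = os.environ.get("GITHUB_EVENT_PATH")
if event_path:
    with open(event_path, encoding="utf-8") as f:
        print(should_update_matrices(json.load(f)))
```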

Views and Main Features

EarXplore offers four synchronized views sharing one filter state (persisted in session storage):

Tabular Overview (/)
Distribution Charts (/bar-chart)
Study Similarity (/similarity)
Study Timeline (/timeline)

Additional features:

Add Study / Report Mistake (/add_study) with CSRF protection, honeypot spam check, and mail dispatch.
EarBot chatbot (/api/chat) with rate limiting (20/day/IP), max input length checks, and sanitized markdown rendering on the frontend.
Usage study logger (frontend-only) that records interactions in sessionStorage and exports JSON logs (no server-side tracking required).
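The README does not say how the chatbot limits are enforced; in production, a library such as Flask-Limiter with shared storage is a common choice. The sketch below only illustrates the policy: an in-memory, single-process sliding-window limiter keyed by client IP, plus an input-length check. `MAX_INPUT_CHARS = 1000` is a hypothetical value, and behind a reverse proxy the IP is only meaningful with BEHIND_PROXY=true.

```python
import time
from collections import defaultdict

DAILY_LIMIT = 20                # from the README: 20 requests per day per IP
MAX_INPUT_CHARS = 1000          # hypothetical; the real limit is set in app.py
WINDOW_SECONDS = 24 * 60 * 60


class DailyRateLimiter:
    """In-memory sliding-window limiter keyed by client IP (single process only)."""

    def __init__(self, limit=DAILY_LIMIT, window=WINDOW_SECONDS, clock=time.time):
        self.limit = limit
        self.window = window
        self.clock = clock
        self._hits = defaultdict(list)

    def allow(self, ip):
        now = self.clock()
        # Keep only request timestamps that are still inside the window.
        recent = [t for t in self._hits[ip] if now - t < self.window]
        allowed = len(recent) < self.limit
        if allowed:
            recent.append(now)
        self._hits[ip] = recent
        return allowed


def validate_message(text):
    """Reject empty or overly long chatbot input before it reaches the LLM."""
    text = text.strip()
    if not text:
        raise ValueError("Empty message")
    if len(text) > MAX_INPUT_CHARS:
        raise ValueError(f"Message exceeds {MAX_INPUT_CHARS} characters")
    return text
```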

Usage Walkthrough

The following media show the main interaction flow on the hosted instance.

View Selection Menu

Navigate between tabular, chart, similarity, timeline, and contribution entry points.

Filter Sidebar

Use panel/category controls, sliders, token filters, and bulk select actions.

Display Customization

Toggle visible columns/charts and color mappings while preserving active filters across views.

Modal Overlays

Open full study details and relation-specific overlays from charts/nodes.

Screenshots of the Four Views

Tabular View

Tabular View — (1) The Tabular View serves as the landing page and can also be selected via the view selection menu. (2) By default, key information on each study (Main Author, Year, Location, Input Body Part, and Gesture) is displayed. (3) The top toggle menu allows users to show or hide columns with additional information. (4) Filters in the sidebar enable users to refine the database by including or excluding specific attribute values. (5) A chatbot can optionally be consulted for assistance in selecting and filtering criteria. (6) Clicking the info icons at the beginning of each row opens a modal overlay that displays all available information for the selected study. (7) The entire dataset or a selected subset can be downloaded as a CSV file.

Graphical View

Graphical View — (1) The Graphical View can be selected via the view selection menu. (2) The top toggle menu allows users to show or hide bar charts with additional information. (3) Filters in the sidebar enable users to refine the database by including or excluding specific attribute values. (4) A chatbot can optionally be consulted for assistance in selecting and filtering criteria. (5) For each selected criterion, a bar chart displays the distribution of answer options. Chart size automatically adapts to the number of bars. (6) Users can adjust the threshold for the maximum number of bars shown per chart. (7) Clicking a bar opens a modal overlay showing key information on all studies represented by that bar. (8) Clicking the info icons at the beginning of each row within the modal overlay reveals the full information modal overlay for the respective study. (9) The entire dataset or a selected subset can be downloaded as a CSV file.

Similarity View

Similarity View — (1) The Similarity View can be selected via the view selection menu. (2) Filters allow the user to refine the database along several criteria. (3) A chatbot can optionally be consulted for assistance in selecting and filtering criteria. (4) The user can choose between Database Similarity and Abstract Similarity. (5) A threshold slider controls which similarity connections are displayed. (6) The nodes representing the studies can be colored and sorted along several criteria. (7) Clicking on a node opens a modal overlay showing key information on all studies that meet the similarity threshold with the selected study. (8) Clicking the info icons at the beginning of each row within the modal overlay reveals the full information modal overlay for the respective study. The full information view can also be displayed via the info icons attached to each study node.

Timeline View

Timeline View — (1) The Timeline View can be selected via the view selection menu. (2) Filters allow the user to refine the database along several criteria. (3) A chatbot can optionally be consulted for assistance in selecting and filtering criteria. (4) The user can display shared author connections as dashed lines. (5) Citation connections, including their direction, can be shown as solid lines. (6) The nodes representing the studies can be colored and sorted along several criteria. (7) Clicking on a node opens a modal overlay displaying key information on all studies connected to the selected study through shared authorship or citations based on the user's selection. (8) Clicking the info icons at the beginning of each row within the modal overlay reveals the full information modal overlay for the respective study.

Contributing

Contributions are welcome in two ways:

Use the in-app form (/add_study) to suggest new papers or report mistakes.
Open pull requests with dataset/config/script improvements.

For dataset updates, keep generated similarity/connection artifacts in sync (locally or via the workflow).

License

This project is licensed under the MIT License.

Contact

GitHub: 98JoHu
E-mail: jonas.hummel@kit.edu
