Skip to content

OpenEarable/earXplore

Repository files navigation

EarXplore

Paper Teaser Figure

EarXplore is an interactive research database for earable interaction studies. It combines a Flask backend with four synchronized exploration views and configurable filtering logic driven by YAML and CSV data.

Live instance: earxplore.teco.edu
Paper: arXiv:2507.20656


Table of Contents

  1. What This Repository Contains
  2. Quick Start
  3. Environment Variables (.env)
  4. Configuration via YAML
  5. Dataset and File Conventions
  6. Recomputing Similarity and Connection Matrices
  7. Views and Main Features
  8. Usage Walkthrough
  9. Contributing
  10. License
  11. Contact

What This Repository Contains

earXplore/
├── app.py                                # Flask app (all routes and server-side data prep)
├── configs/
│   └── earXplore_interaction.yaml        # Main runtime configuration
├── datasets/
│   ├── data.csv                          # Core study dataset
│   ├── explanations.csv                  # Column descriptions shown in UI tooltips
│   ├── abstract_similarity/
│   │   ├── data_with_embeddings.csv
│   │   └── normalized_abstract_similarity.csv
│   ├── database_similarity/
│   │   └── normalized_database_similarity.csv
│   ├── interconnections/
│   │   ├── citation_matrix.csv
│   │   └── coauthor_matrix.csv
│   └── usage_logs/                       # Optional study logs exported by participants
├── computation_notebooks/                # One-off Jupyter notebooks for initial matrix computation
├── git_actions_scripts/
│   ├── update_similarity_matrices.py                        # Similarity matrices only (no author connections)
│   └── update_similarity_matrices_and_author_connections.py # Full update — used by CI workflow
├── readme_figures/                       # Images and SVGs embedded in this README
├── static/                               # Frontend JS/CSS/assets
├── templates/                            # Main Flask templates
└── similarity_human_matching/            # Separate rating/annotation mini-app

Quick Start

Recommended Python version: 3.11+ (project is currently run with modern Flask/pandas/numpy stack).

1) Create and activate a virtual environment

Windows (PowerShell):

py -m venv .venv
.\.venv\Scripts\Activate.ps1

macOS/Linux:

python3 -m venv .venv
source .venv/bin/activate

2) Install dependencies

pip install -r requirements.txt

3) Create .env

At minimum, set SECRET_KEY. If you use forms and chatbot features, configure SMTP and LLM variables too (see full section below).

4) Run the app

python app.py

Open: http://localhost:888

Optional dev mode:

flask run --debug

Environment Variables (.env)

The app reads environment variables via python-dotenv on startup.

Core / Security

SECRET_KEY="a-long-random-secret"
FLASK_DEBUG=false
BEHIND_PROXY=false
  • SECRET_KEY: Required for stable CSRF/session behavior.
  • BEHIND_PROXY=true: Enables ProxyFix so rate limiting uses client IP behind reverse proxies.

Mail (Add Study / Report Mistake forms)

MAIL_SERVER="your-smtp-server.example.com"
MAIL_PORT=587
MAIL_USE_TLS=true
MAIL_DEFAULT_SENDER="default-sender@example.com"
RECIPIENTS="reviewer@example.com"

Note on SMTP authentication: app.py currently configures only MAIL_SERVER, MAIL_PORT, MAIL_USE_TLS, and MAIL_DEFAULT_SENDER. If your SMTP server requires a username and password, add the following two lines to app.py alongside the other app.config assignments:

app.config['MAIL_USERNAME'] = os.getenv("MAIL_USERNAME")
app.config['MAIL_PASSWORD'] = os.getenv("MAIL_PASSWORD")

Then add MAIL_USERNAME and MAIL_PASSWORD to your .env accordingly.

Chatbot (EarBot)

LLM_API_URL="https://your-llm-endpoint/chat/completions"
LLM_API_KEY="your-key"
LLM_MODEL="your-model-name"

The endpoint must accept OpenAI-style chat-completions requests (POST with Authorization: Bearer <key>, a messages array, model, max_tokens, etc.) and return either an OpenAI-compatible choices[0].message.content response or a simpler response/reply field. This is compatible with OpenAI, Azure OpenAI, Mistral, Groq, university/KIT API gateways, and any other OpenAI-compatible provider.

If these variables are missing, /api/chat returns a configuration error to the client instead of crashing the server.

Similarity recomputation (embeddings)

GEMINI_API_KEY="your-gemini-api-key"

Used by the update scripts and workflows that generate abstract embeddings/similarity matrices.


Configuration via YAML

Runtime behavior is no longer hardcoded in app.py; it is loaded from configs/earXplore_interaction.yaml.

Important keys:

Key Purpose
database-path Path to the main CSV dataset used by the views.
explanations-path Path to the CSV with tooltip explanations.
excluded-sidebar-categories Columns hidden from normal sidebar filter generation.
metadata-sidebar-categories Columns grouped into the metadata panel.
slider-categories Numeric columns rendered as range sliders.
select-deselect-all-categories Columns with per-category select/deselect-all controls.
exclusive-filtering-categories Columns that support exclusive matching mode.
select-deselect-all-panels Panels with panel-level bulk selection controls.
initially-hidden-panels Panels collapsed by default.
parenthical-columns Columns using Value (details) where filtering should use only Value.
start-category-filters Default visible columns/categories in table/chart views.
performance-metrics-columns Performance metric columns merged into one dedicated filter block.
device-model-column Column used by the custom Device Model keyword filter block.
device-model-options Fixed Device Model options shown in UI (e.g., OpenEarable, AirPods, Other, N/A).
other-threshold-columns Columns where rare values are grouped into Other.
token-search-columns Columns rendered with token-search UI (opt-in filtering).

Notes:

  • Performance metric columns are automatically treated as parenthetical and excluded from regular checkbox rendering.
  • Token-search columns use selected tokens as filters; empty selection means no constraint for that column.
  • The app computes rare-value sets and token options at startup from the current dataset.

Dataset and File Conventions

Required base data

  • datasets/data.csv
  • datasets/explanations.csv with at least Column and Explanation headers

Naming convention for grouping in sidebar panels

Columns containing _PANEL_ are grouped by the prefix before _PANEL_ (for example Interaction_PANEL_... goes to panel Interaction).

Similarity and timeline artifacts used by views

  • datasets/abstract_similarity/normalized_abstract_similarity.csv
  • datasets/database_similarity/normalized_database_similarity.csv
  • datasets/interconnections/citation_matrix.csv
  • datasets/interconnections/coauthor_matrix.csv

If citation/coauthor matrices are missing, timeline view falls back to zero matrices so the page can still render.


Recomputing Similarity and Connection Matrices

There are two ways to (re)compute the derived matrix files, depending on your situation.

Initial computation (first setup or full rebuild)

Use the Jupyter notebooks in computation_notebooks/:

Notebook Output
database_similarity.ipynb datasets/database_similarity/normalized_database_similarity.csv
abstract_similarity.ipynb datasets/abstract_similarity/data_with_embeddings.csv + normalized similarity CSV (requires GEMINI_API_KEY)
author_connections_timeline.ipynb datasets/interconnections/coauthor_matrix.csv
grobid_citations_metadata.ipynb datasets/interconnections/citation_matrix.csv (requires a running GROBID Docker instance)

Note: The citation matrix is only produced by the GROBID notebook; the automated scripts do not update it. If no citation matrix is present, the Timeline View falls back to an all-zero matrix.

Incremental update (new studies added to an existing deployment)

Run from repository root:

python git_actions_scripts/update_similarity_matrices_and_author_connections.py

This script updates:

  • database similarity (datasets/database_similarity/normalized_database_similarity.csv)
  • abstract embeddings + similarity (datasets/abstract_similarity/data_with_embeddings.csv, datasets/abstract_similarity/abstract_similarity.csv, datasets/abstract_similarity/normalized_abstract_similarity.csv)
  • coauthor matrix (datasets/interconnections/coauthor_matrix.csv)

Automated (GitHub Actions)

Workflow: .github/workflows/update-matrices.yml

  • Triggers when a PR into main is closed and merged.
  • Runs git_actions_scripts/update_similarity_matrices_and_author_connections.py.
  • Requires repository secret GEMINI_API_KEY.
  • Commits generated changes back to the repository.

Views and Main Features

EarXplore offers four synchronized views sharing one filter state (persisted in session storage):

  1. Tabular Overview (/)
  2. Distribution Charts (/bar-chart)
  3. Study Similarity (/similarity)
  4. Study Timeline (/timeline)

Additional features:

  • Add Study / Report Mistake (/add_study) with CSRF protection, honeypot spam check, and mail dispatch.
  • EarBot chatbot (/api/chat) with rate limiting (20/day/IP), max input length checks, and sanitized markdown rendering on frontend.
  • Usage study logger (frontend-only) that records interactions in sessionStorage and exports JSON logs (no server-side tracking required).

Usage Walkthrough

The following media show the main interaction flow on the hosted instance.

View Selection Menu

Navigate between tabular, chart, similarity, timeline, and contribution entry points.

navbar_demonstration gif

Filter Sidebar

Use panel/category controls, sliders, token filters, and bulk select actions.

sidebar_demonstration gif

Display Customization

Toggle visible columns/charts and color mappings while preserving active filters across views.

filter_demonstration gif

Modal Overlays

Open full study details and relation-specific overlays from charts/nodes.

modal_demonstration gif

Screenshots of the Four Views

Tabular View

Tabular View

Tabular View — (1) The Tabular View serves as landing page and can also be selected via the view selection menu. (2) By default, key information on each study (Main Author, Year, Location, Input Body Part, and Gesture) is displayed. (3) The top toggle menu allows users to show or hide columns with additional information. (4) Filters in the sidebar enable users to refine the database by including or excluding specific attribute values. (5) A chatbot can optionally be consulted for assistance in selecting and filtering criteria. (6) Clicking the info icons at the beginning of each row opens a modal overlay that displays all available information for the selected study. (7) The entire dataset or a selected subset can be downloaded as a csv file.

Graphical View

Graphical View

Graphical View — (1) The Graphical View can be selected via the view selection menu. (2) The top toggle menu allows users to show or hide bar charts with additional information. (3) Filters in the sidebar enable users to refine the database by including or excluding specific attribute values. (4) A chatbot can optionally be consulted for assistance in selecting and filtering criteria. (5) For each selected criterion, a bar chart displays the distribution of answer options. Chart size automatically adapts to the number of bars. (6) Users can adjust the threshold for the maximum number of bars shown per chart. (7) Clicking on a bar opens a modal overlay showing key information on all studies represented by that bar. (8) Clicking on the info icons at the beginning of each row within the modal overlay reveals the full information modal overlay for the respective study. (9) The entire dataset or a selected subset can be downloaded as a csv file.

Similarity View

Similarity View

Similarity View — (1) The Similarity View can be selected via the view selection menu. (2) Filters allow the user to refine the database along several criteria. (3) A chatbot can optionally be consulted for assistance in selecting and filtering criteria. (4) The user can choose between Database Similarity and Abstract Similarity. (5) A threshold slider controls which similarity connections are displayed. (6) The nodes representing the studies can be colored and sorted along several criteria. (7) Clicking on a node opens a modal overlay showing key information on all studies that meet the similarity threshold with the selected study. (8) Clicking the info icons at the beginning of each row within the modal overlay reveals the full information modal overlay for the respective study. The full information view can also be displayed via the info icons attached to each study node.

Timeline View

Timeline View

Timeline View — (1) The Timeline View can be selected via the view selection menu. (2) Filters allow the user to refine the database along several criteria. (3) A chatbot can optionally be consulted for assistance in selecting and filtering criteria. (4) The user can display shared author connections as dashed lines. (5) Citation connections, including their direction, can be shown as solid lines. (6) The nodes representing the studies can be colored and sorted along several criteria. (7) Clicking on a node opens a modal overlay displaying key information on all studies connected to the selected study through shared authorship or citations based on the user's selection. (8) Clicking the info icons at the beginning of each row within the modal overlay reveals the full information modal overlay for the respective study.


Contributing

Contributions are welcome in two ways:

  1. Use the in-app form (/add_study) to suggest new papers or report mistakes.
  2. Open pull requests with dataset/config/script improvements.

For dataset updates, keep generated similarity/connection artifacts in sync (locally or via the workflow).


License

This project is licensed under the MIT License.


Contact

About

No description, website, or topics provided.

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors