EarXplore is an interactive research database for earable interaction studies. It combines a Flask backend with four synchronized exploration views and configurable filtering logic driven by YAML and CSV data.
Live instance: earxplore.teco.edu
Paper: arXiv:2507.20656
- What This Repository Contains
- Quick Start
- Environment Variables (.env)
- Configuration via YAML
- Dataset and File Conventions
- Recomputing Similarity and Connection Matrices
- Views and Main Features
- Usage Walkthrough
- Contributing
- License
- Contact
earXplore/
├── app.py # Flask app (all routes and server-side data prep)
├── configs/
│ └── earXplore_interaction.yaml # Main runtime configuration
├── datasets/
│ ├── data.csv # Core study dataset
│ ├── explanations.csv # Column descriptions shown in UI tooltips
│ ├── abstract_similarity/
│ │ ├── data_with_embeddings.csv
│ │ └── normalized_abstract_similarity.csv
│ ├── database_similarity/
│ │ └── normalized_database_similarity.csv
│ ├── interconnections/
│ │ ├── citation_matrix.csv
│ │ └── coauthor_matrix.csv
│ └── usage_logs/ # Optional study logs exported by participants
├── computation_notebooks/ # One-off Jupyter notebooks for initial matrix computation
├── git_actions_scripts/
│ ├── update_similarity_matrices.py # Similarity matrices only (no author connections)
│ └── update_similarity_matrices_and_author_connections.py # Full update — used by CI workflow
├── readme_figures/ # Images and SVGs embedded in this README
├── static/ # Frontend JS/CSS/assets
├── templates/ # Main Flask templates
└── similarity_human_matching/ # Separate rating/annotation mini-app
Recommended Python version: 3.11+ (project is currently run with modern Flask/pandas/numpy stack).
Windows (PowerShell):
py -m venv .venv
.\.venv\Scripts\Activate.ps1macOS/Linux:
python3 -m venv .venv
source .venv/bin/activatepip install -r requirements.txtAt minimum, set SECRET_KEY. If you use forms and chatbot features, configure SMTP and LLM variables too (see full section below).
python app.pyOpen: http://localhost:888
Optional dev mode:
flask run --debugThe app reads environment variables via python-dotenv on startup.
SECRET_KEY="a-long-random-secret"
FLASK_DEBUG=false
BEHIND_PROXY=falseSECRET_KEY: Required for stable CSRF/session behavior.BEHIND_PROXY=true: EnablesProxyFixso rate limiting uses client IP behind reverse proxies.
MAIL_SERVER="your-smtp-server.example.com"
MAIL_PORT=587
MAIL_USE_TLS=true
MAIL_DEFAULT_SENDER="default-sender@example.com"
RECIPIENTS="reviewer@example.com"Note on SMTP authentication:
app.pycurrently configures onlyMAIL_SERVER,MAIL_PORT,MAIL_USE_TLS, andMAIL_DEFAULT_SENDER. If your SMTP server requires a username and password, add the following two lines toapp.pyalongside the otherapp.configassignments:app.config['MAIL_USERNAME'] = os.getenv("MAIL_USERNAME") app.config['MAIL_PASSWORD'] = os.getenv("MAIL_PASSWORD")Then add
MAIL_USERNAMEandMAIL_PASSWORDto your.envaccordingly.
LLM_API_URL="https://your-llm-endpoint/chat/completions"
LLM_API_KEY="your-key"
LLM_MODEL="your-model-name"The endpoint must accept OpenAI-style chat-completions requests (POST with Authorization: Bearer <key>, a messages array, model, max_tokens, etc.) and return either an OpenAI-compatible choices[0].message.content response or a simpler response/reply field. This is compatible with OpenAI, Azure OpenAI, Mistral, Groq, university/KIT API gateways, and any other OpenAI-compatible provider.
If these variables are missing, /api/chat returns a configuration error to the client instead of crashing the server.
GEMINI_API_KEY="your-gemini-api-key"Used by the update scripts and workflows that generate abstract embeddings/similarity matrices.
Runtime behavior is no longer hardcoded in app.py; it is loaded from configs/earXplore_interaction.yaml.
Important keys:
| Key | Purpose |
|---|---|
database-path |
Path to the main CSV dataset used by the views. |
explanations-path |
Path to the CSV with tooltip explanations. |
excluded-sidebar-categories |
Columns hidden from normal sidebar filter generation. |
metadata-sidebar-categories |
Columns grouped into the metadata panel. |
slider-categories |
Numeric columns rendered as range sliders. |
select-deselect-all-categories |
Columns with per-category select/deselect-all controls. |
exclusive-filtering-categories |
Columns that support exclusive matching mode. |
select-deselect-all-panels |
Panels with panel-level bulk selection controls. |
initially-hidden-panels |
Panels collapsed by default. |
parenthical-columns |
Columns using Value (details) where filtering should use only Value. |
start-category-filters |
Default visible columns/categories in table/chart views. |
performance-metrics-columns |
Performance metric columns merged into one dedicated filter block. |
device-model-column |
Column used by the custom Device Model keyword filter block. |
device-model-options |
Fixed Device Model options shown in UI (e.g., OpenEarable, AirPods, Other, N/A). |
other-threshold-columns |
Columns where rare values are grouped into Other. |
token-search-columns |
Columns rendered with token-search UI (opt-in filtering). |
Notes:
- Performance metric columns are automatically treated as parenthetical and excluded from regular checkbox rendering.
- Token-search columns use selected tokens as filters; empty selection means no constraint for that column.
- The app computes rare-value sets and token options at startup from the current dataset.
datasets/data.csvdatasets/explanations.csvwith at leastColumnandExplanationheaders
Columns containing _PANEL_ are grouped by the prefix before _PANEL_ (for example Interaction_PANEL_... goes to panel Interaction).
datasets/abstract_similarity/normalized_abstract_similarity.csvdatasets/database_similarity/normalized_database_similarity.csvdatasets/interconnections/citation_matrix.csvdatasets/interconnections/coauthor_matrix.csv
If citation/coauthor matrices are missing, timeline view falls back to zero matrices so the page can still render.
There are two ways to (re)compute the derived matrix files, depending on your situation.
Use the Jupyter notebooks in computation_notebooks/:
| Notebook | Output |
|---|---|
database_similarity.ipynb |
datasets/database_similarity/normalized_database_similarity.csv |
abstract_similarity.ipynb |
datasets/abstract_similarity/data_with_embeddings.csv + normalized similarity CSV (requires GEMINI_API_KEY) |
author_connections_timeline.ipynb |
datasets/interconnections/coauthor_matrix.csv |
grobid_citations_metadata.ipynb |
datasets/interconnections/citation_matrix.csv (requires a running GROBID Docker instance) |
Note: The citation matrix is only produced by the GROBID notebook; the automated scripts do not update it. If no citation matrix is present, the Timeline View falls back to an all-zero matrix.
Run from repository root:
python git_actions_scripts/update_similarity_matrices_and_author_connections.pyThis script updates:
- database similarity (
datasets/database_similarity/normalized_database_similarity.csv) - abstract embeddings + similarity (
datasets/abstract_similarity/data_with_embeddings.csv,datasets/abstract_similarity/abstract_similarity.csv,datasets/abstract_similarity/normalized_abstract_similarity.csv) - coauthor matrix (
datasets/interconnections/coauthor_matrix.csv)
Workflow: .github/workflows/update-matrices.yml
- Triggers when a PR into
mainis closed and merged. - Runs
git_actions_scripts/update_similarity_matrices_and_author_connections.py. - Requires repository secret
GEMINI_API_KEY. - Commits generated changes back to the repository.
EarXplore offers four synchronized views sharing one filter state (persisted in session storage):
- Tabular Overview (
/) - Distribution Charts (
/bar-chart) - Study Similarity (
/similarity) - Study Timeline (
/timeline)
Additional features:
- Add Study / Report Mistake (
/add_study) with CSRF protection, honeypot spam check, and mail dispatch. - EarBot chatbot (
/api/chat) with rate limiting (20/day/IP), max input length checks, and sanitized markdown rendering on frontend. - Usage study logger (frontend-only) that records interactions in
sessionStorageand exports JSON logs (no server-side tracking required).
The following media show the main interaction flow on the hosted instance.
Navigate between tabular, chart, similarity, timeline, and contribution entry points.
Use panel/category controls, sliders, token filters, and bulk select actions.
Toggle visible columns/charts and color mappings while preserving active filters across views.
Open full study details and relation-specific overlays from charts/nodes.
Tabular View — (1) The Tabular View serves as landing page and can also be selected via the view selection menu. (2) By default, key information on each study (Main Author, Year, Location, Input Body Part, and Gesture) is displayed. (3) The top toggle menu allows users to show or hide columns with additional information. (4) Filters in the sidebar enable users to refine the database by including or excluding specific attribute values. (5) A chatbot can optionally be consulted for assistance in selecting and filtering criteria. (6) Clicking the info icons at the beginning of each row opens a modal overlay that displays all available information for the selected study. (7) The entire dataset or a selected subset can be downloaded as a csv file.
Graphical View — (1) The Graphical View can be selected via the view selection menu. (2) The top toggle menu allows users to show or hide bar charts with additional information. (3) Filters in the sidebar enable users to refine the database by including or excluding specific attribute values. (4) A chatbot can optionally be consulted for assistance in selecting and filtering criteria. (5) For each selected criterion, a bar chart displays the distribution of answer options. Chart size automatically adapts to the number of bars. (6) Users can adjust the threshold for the maximum number of bars shown per chart. (7) Clicking on a bar opens a modal overlay showing key information on all studies represented by that bar. (8) Clicking on the info icons at the beginning of each row within the modal overlay reveals the full information modal overlay for the respective study. (9) The entire dataset or a selected subset can be downloaded as a csv file.
Similarity View — (1) The Similarity View can be selected via the view selection menu. (2) Filters allow the user to refine the database along several criteria. (3) A chatbot can optionally be consulted for assistance in selecting and filtering criteria. (4) The user can choose between Database Similarity and Abstract Similarity. (5) A threshold slider controls which similarity connections are displayed. (6) The nodes representing the studies can be colored and sorted along several criteria. (7) Clicking on a node opens a modal overlay showing key information on all studies that meet the similarity threshold with the selected study. (8) Clicking the info icons at the beginning of each row within the modal overlay reveals the full information modal overlay for the respective study. The full information view can also be displayed via the info icons attached to each study node.
Timeline View — (1) The Timeline View can be selected via the view selection menu. (2) Filters allow the user to refine the database along several criteria. (3) A chatbot can optionally be consulted for assistance in selecting and filtering criteria. (4) The user can display shared author connections as dashed lines. (5) Citation connections, including their direction, can be shown as solid lines. (6) The nodes representing the studies can be colored and sorted along several criteria. (7) Clicking on a node opens a modal overlay displaying key information on all studies connected to the selected study through shared authorship or citations based on the user's selection. (8) Clicking the info icons at the beginning of each row within the modal overlay reveals the full information modal overlay for the respective study.
Contributions are welcome in two ways:
- Use the in-app form (
/add_study) to suggest new papers or report mistakes. - Open pull requests with dataset/config/script improvements.
For dataset updates, keep generated similarity/connection artifacts in sync (locally or via the workflow).
This project is licensed under the MIT License.
- GitHub: 98JoHu
- E-mail: jonas.hummel@kit.edu



