Static USC Data Science / IRDS site for the DSCI 550 Spring 2025 Haunted Places data visualization assignment. The site collects student HW3 DATAVIS work and presents it in a GitHub Pages-ready layout modeled after the IRDS class project sites ufo.usc.edu and phishing.usc.edu.
Published site:
Repository:
cd /Users/mattmann/git/haunted.usc.edu
python3 -m http.server 8008
Then open:
- http://localhost:8008/
- http://localhost:8008/html/d3-examples.html
- http://localhost:8008/teams/team_9/team_9.html
- http://localhost:8008/teams/team_14/team_14.html
The site uses relative paths and vendored JavaScript/CSS, so it works both from a local HTTP server and from http://irds.usc.edu/haunted.usc.edu/.
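As an illustration of why relative paths matter here, the sketch below resolves the same link against the two roots mentioned above. It uses the standard `URL` constructor available in browsers and Node.js; the helper name `resolveFrom` is hypothetical and not part of the site's code.

```javascript
// Resolve a link against a deployment root, the way a browser resolves an href.
// resolveFrom is an illustrative helper, not a function from this repository.
function resolveFrom(base, path) {
  return new URL(path, base).href;
}

const localRoot = "http://localhost:8008/";
const deployedRoot = "http://irds.usc.edu/haunted.usc.edu/";
const relativeLink = "teams/team_9/team_9.html";

// A relative link stays under whichever root serves the page.
console.log(resolveFrom(localRoot, relativeLink));
console.log(resolveFrom(deployedRoot, relativeLink));

// A root-absolute link drops the /haunted.usc.edu/ prefix on the deployed host.
console.log(resolveFrom(deployedRoot, "/" + relativeLink));
```

The last call shows the failure mode that relative paths avoid: a leading `/` resolves against the host root rather than the site's subdirectory.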
The class used the Kaggle Haunted Places dataset:
The original dataset contains reported haunted locations across the United States with fields such as city, state, location name, description, latitude, and longitude. In DSCI 550 Spring 2025, students worked with this data across three assignments:
- HW1: Big Data / feature engineering - add evidence, temporal, witness-count, apparition, event, and external public-dataset features.
- HW2: Extraction / multimodal enrichment - use tools such as Apache Tika, GeoTopicParser, SpaCy, AI image generation, image captioning, and object recognition to add geospatial, entity, image, and caption features.
- HW3: Web data visualization - build D3 visualizations and explore larger investigative/search systems such as MEMEX ImageSpace, ImageCat, GeoParser, Solr, and ElasticSearch.
- index.html - Haunted Places landing page with assignment context and team cards.
- html/d3-examples.html - Spring 2025 Haunted Places team gallery.
- teams/team_9/ - Team 9 report, submitted TSV, generated local D3 preview pages, and derived CSV summaries.
- teams/team_14/ - Team 14 report, readme, submitted subset TSV, generated local D3 preview pages, and derived CSV summaries.
- teams/haunted-viz.js - shared local D3 renderer for bar charts, pie charts, maps, scatterplots, and word-frequency views.
- data/haunted_places.csv - local course copy of the Haunted Places dataset.
- data/Haunted_House_States.xlsx - local course copy of the state-level haunted places data.
- data/us-states-outline.json - local map outline used by the sighting maps.
- images/haunted-places-banner.png - generated hero/banner image for the site.
- js/ and css/ - vendored Bootstrap, jQuery, D3, and site CSS.
Team members:
- Sena London
- Kevin Sy
- Andrew Turangan
- Gideon Nazarian
- Anneliese Wilkins
- Austin Oliver
Team 9 explored the enriched Haunted Places data through language, named entities, geography, apparition categories, and investigative image/location tooling. Their report discusses five intended D3 views:
- Word Cloud - frequent words in sighting descriptions, used to reveal common language and contextual themes in the raw reports.
- Entity Bar Chart - named-entity frequencies from SpaCy output, with temporal entities highlighted as a dominant signal.
- State Aggregated Choropleth Map - state-level comparisons using joined socio-demographic, education, crime, mortality, and apparition-related features.
- Haunted Sightings Spike Map - geographic hot spots and city-level concentration patterns, including dense sighting regions such as Honolulu and Los Angeles.
- Apparition Types Stacked Bar Chart - apparition categories combined with evidence, witness, and time-of-day features.
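The word-frequency idea behind the first view can be sketched in a few lines of plain JavaScript. This is a minimal illustration, not the team's method: the stop-word list, the three-character minimum, and the function name `wordFrequencies` are all assumptions made for the example.

```javascript
// ASSUMPTION: a tiny illustrative stop-word list; a real one would be longer.
const STOP_WORDS = new Set([
  "the", "and", "was", "that", "this", "with", "for", "are", "has", "have",
]);

// Count words across an array of description strings and return the top N.
function wordFrequencies(descriptions, topN = 10) {
  const counts = new Map();
  for (const text of descriptions) {
    // Lowercase and keep runs of letters and apostrophes as words.
    const words = String(text).toLowerCase().match(/[a-z']+/g) || [];
    for (const word of words) {
      if (word.length < 3 || STOP_WORDS.has(word)) continue;
      counts.set(word, (counts.get(word) || 0) + 1);
    }
  }
  // Sort by descending count, breaking ties alphabetically.
  return [...counts.entries()]
    .sort((a, b) => b[1] - a[1] || a[0].localeCompare(b[0]))
    .slice(0, topN)
    .map(([word, count]) => ({ word, count }));
}
```

The returned `{ word, count }` objects are in a shape that a D3 bar chart or word cloud layout could bind to directly.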
Team 9 also experimented with MEMEX ImageSpace for image similarity and MEMEX GeoParser for location extraction. Their submitted zip included reports and v2_final.tsv; because it did not include a standalone deployed HTML/D3 bundle, this repository rebuilds the first published visualization cards from their submitted TSV using local D3.
Published local pages include:
- teams/team_9/state_distribution.html
- teams/team_9/word_frequency.html
- teams/team_9/apparition_types.html
- teams/team_9/entity_counts.html
- teams/team_9/sighting_map.html
- teams/team_9/crime_scatter.html
Team members:
- Avery Fratto
- Hanieh Hosseinzadeh Zorofchi
- Madison
- Rui Chen
- Vartan Pashayan
- Yat Hei Brian Chan
Team 14 enhanced the Haunted Places data with features related to drug/alcohol abuse, moon phase, churches and religiosity, AI-generated captions, GeoParser output, Solr indexing, and ImageSpace exploration. Their report frames the project around several questions:
- How are haunted sightings distributed across the United States?
- Do some states carry a larger share of sightings?
- What is the role of religiosity and proximity to places of worship?
- How do moon phase, moon diameter, or moon distance relate to sighting frequency?
- How do semantic and fuzzy similarity scores correlate with witness count or binge-drinking features?
- What words appear frequently in haunted-sighting descriptions?
Team 14 contributions noted in their submitted readme:
- Rui Chen - MEMEX GeoParser and report writing.
- Avery - moon visualizations, map visualization, scatterplots, GitHub site, and report writing.
- Vartan - word cloud and report writing.
- Hanieh - sighting distribution and report writing.
- Madison - ImageSpace analysis and report writing.
- Brian - TSV/JSON ingestion into Apache Solr, Solr index packaging, and report writing.
Their original submission included an Observable/D3 export, Solr/ImageSpace/GeoParser materials, and a subset TSV. For the deployed site, the primary visualizations are rebuilt as local D3 pages to avoid remote runtime dependencies, CDN imports, mixed-content issues, or CORS problems.
Published local pages include:
- teams/team_14/state_distribution.html
- teams/team_14/word_frequency.html
- teams/team_14/entity_counts.html
- teams/team_14/apparition_types.html
- teams/team_14/sighting_map.html
- teams/team_14/binge_similarity_scatter.html
This repository keeps the browser-facing pages static and self-contained. Aggregated CSV files under teams/team_*/data/ were derived from the teams' submitted TSVs so the D3 pages load quickly from a plain HTTP server.
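A derivation of this kind can be sketched as follows. This is a hedged example rather than the script actually used: it assumes the TSV has a header row and a column named `state`, and the function names `countByColumn` and `countsToCsv` are hypothetical. The split-on-tab parsing does not handle quoted fields that contain tabs or newlines.

```javascript
// Count non-empty values of one column in TSV text.
// ASSUMPTION: the first line is a header; the column name is supplied by the caller.
function countByColumn(tsvText, column) {
  const lines = tsvText.split(/\r?\n/).filter((line) => line.length > 0);
  const counts = new Map();
  if (lines.length === 0) return counts;
  const index = lines[0].split("\t").indexOf(column);
  if (index === -1) throw new Error(`Column not found: ${column}`);
  for (const line of lines.slice(1)) {
    const value = (line.split("\t")[index] || "").trim();
    if (value === "") continue; // skip rows with no value in this column
    counts.set(value, (counts.get(value) || 0) + 1);
  }
  return counts;
}

// Serialize the counts as a two-column CSV, largest count first.
function countsToCsv(counts, keyName) {
  const quote = (s) => (/[",\n]/.test(s) ? `"${s.replace(/"/g, '""')}"` : s);
  const rows = [...counts.entries()].sort(
    (a, b) => b[1] - a[1] || a[0].localeCompare(b[0])
  );
  return [`${keyName},count`, ...rows.map(([k, n]) => `${quote(k)},${n}`)].join("\n");
}
```

Precomputing a small summary like this keeps each page's download to a few kilobytes instead of the full TSV.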
Primary local source files:
- teams/team_9/data/v2_final.tsv
- teams/team_14/data/FINAL_HW2_EXTRACT_DATASET_subset.tsv
- data/haunted_places.csv
- data/Haunted_House_States.xlsx
The map pages use data/us-states-outline.json and d3.geoAlbersUsa() to draw a local U.S. outline before plotting latitude/longitude points.
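The point-plotting step can be sketched with the projection passed in as a function, so the same helper works with `d3.geoAlbersUsa()` in the browser or with a stand-in during testing. The field names `latitude` and `longitude` and the helper name `projectRows` are assumptions for this example, not taken from teams/haunted-viz.js. D3 projections are called with `[longitude, latitude]`, and `d3.geoAlbersUsa()` returns `null` for points outside the area it covers, so both cases are filtered out.

```javascript
// Convert rows with latitude/longitude into pixel positions.
// ASSUMPTION: each row has "latitude" and "longitude" fields (strings or numbers).
// projection is any function taking [lon, lat] and returning [x, y] or null,
// for example the object returned by d3.geoAlbersUsa() in the browser.
function projectRows(rows, projection) {
  const points = [];
  for (const row of rows) {
    const lat = Number.parseFloat(row.latitude);
    const lon = Number.parseFloat(row.longitude);
    // Skip rows with missing or unparseable coordinates.
    if (!Number.isFinite(lat) || !Number.isFinite(lon)) continue;
    const xy = projection([lon, lat]); // note the [longitude, latitude] order
    if (!xy) continue; // outside the projection's coverage
    points.push({ x: xy[0], y: xy[1], row });
  }
  return points;
}
```

In a page that has loaded the vendored D3, a call might look like `projectRows(rows, d3.geoAlbersUsa())`, with the resulting `x` and `y` values bound to circle positions.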
Branches:
- master - source branch
- gh-pages - GitHub Pages branch
No build step is required. The site is plain HTML/CSS/JavaScript and can be served from any static web server.
Dependency policy:
- D3 is vendored locally in js/d3.v6.min.js.
- Bootstrap and jQuery are vendored locally.
- Primary visualization pages do not use remote JavaScript, CSS, fonts, or CDN-hosted D3.
- Relative paths are used so pages work under both / locally and /haunted.usc.edu/ on irds.usc.edu.
- Dr. Chris Mattmann - http://mattmann.ai
- DSCI 550 Spring 2025 student teams
- USC Information Retrieval and Data Science Group (IRDS) - http://irds.usc.edu/
- Tools and libraries: Apache Tika, Tika-Python, GeoTopicParser, SpaCy, D3.js, MEMEX GeoParser, MEMEX ImageSpace/ImageCat, Apache Solr, and ElasticSearch.
Apache License 2.0. See LICENSE.