Discogs Data Parser

This project converts large Discogs XML release dumps into structured JSON files.

Features

  • Extracts essential release data (title, artists, labels, genres, tracks, etc.)
  • Handles malformed or incomplete records gracefully
  • Logs rejected records with reasons
  • Uses multiprocessing to process records in parallel
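
As a rough illustration of the extraction and rejection features, here is a minimal sketch that streams `<release>` elements from a gzipped dump and either returns a record or a rejection reason. It is not the code in prepare9D.py. The function names `parse_release` and `iter_releases`, the rejection rules, and the element layout (`artists/artist/name`, `genres/genre`, `tracklist/track/title`) are assumptions made for the example, and the sample data is invented.

```python
import gzip
import io
import xml.etree.ElementTree as ET


def parse_release(elem):
    """Return (record, None) on success or (None, reason) on rejection.

    Hypothetical helper: the required fields and reasons are assumptions.
    """
    title = (elem.findtext("title") or "").strip()
    if not title:
        return None, "missing title"
    names = [a.findtext("name") for a in elem.findall("artists/artist")]
    artists = [n.strip() for n in names if n and n.strip()]
    if not artists:
        return None, "missing artists"
    record = {
        "id": elem.get("id"),
        "title": title,
        "artists": artists,
        "genres": [g.text for g in elem.findall("genres/genre") if g.text],
        "tracks": [t.findtext("title") or "" for t in elem.findall("tracklist/track")],
    }
    return record, None


def iter_releases(stream):
    """Yield one parse result per <release> without loading the whole file."""
    for _event, elem in ET.iterparse(stream, events=("end",)):
        if elem.tag == "release":
            yield parse_release(elem)
            elem.clear()  # drop the children of the release just handled


# Invented sample data, compressed in memory to mimic a .xml.gz dump.
SAMPLE = b"""<releases>
<release id="1"><artists><artist><name>Artist A</name></artist></artists><title>First</title><genres><genre>Electronic</genre></genres><tracklist><track><title>Intro</title></track></tracklist></release>
<release id="2"><artists><artist><name>Artist B</name></artist></artists><title> </title></release>
</releases>"""

stream = gzip.open(io.BytesIO(gzip.compress(SAMPLE)))
for record, reason in iter_releases(stream):
    print(record if reason is None else f"rejected: {reason}")
```

Returning a reason instead of raising keeps one bad record from stopping the run, and gives the caller something concrete to write to a rejection log.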

Usage

Run the script with:

python prepare9D.py

Before running, make sure the Discogs XML dump (e.g., discogs_YYYYMMDD_releases.xml.gz) is in the same directory as the script.
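
To check that a dump is where the script expects it, something like the following sketch could be used. The helper name `find_dumps` is hypothetical, and the glob pattern is an assumption derived from the example filename above; how prepare9D.py actually locates its input may differ. The dated filenames in the demo are invented.

```python
import tempfile
from pathlib import Path


def find_dumps(directory):
    """Return release dumps in `directory`, sorted by filename.

    Because the date is written as YYYYMMDD, sorting by name also
    sorts by date, oldest first.
    """
    return sorted(Path(directory).glob("discogs_*_releases.xml.gz"))


# Demo in a throwaway directory with invented filenames.
with tempfile.TemporaryDirectory() as tmp:
    for name in (
        "discogs_20240201_releases.xml.gz",
        "discogs_20240101_releases.xml.gz",
        "notes.txt",
    ):
        (Path(tmp) / name).touch()
    print([p.name for p in find_dumps(tmp)])
```

In a real run the argument would be the script directory, for example `Path(__file__).parent`.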

Output

  • Processed JSON files containing essential release data.
  • A log file rejected_log.txt capturing rejection reasons.
  • A rejected_discogs_data folder with samples of rejected records.
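
The sketch below shows one way such outputs could be produced, assuming each parse result is a `(record, reason)` pair where `reason` is `None` for accepted records. The function name `write_outputs`, the output filename `releases.json`, and the one-reason-per-line log format are assumptions for illustration; only `rejected_log.txt` is a name taken from this README. It collects everything in memory, so a real run over a full dump would likely need to write in chunks.

```python
import json
import tempfile
from pathlib import Path


def write_outputs(results, out_dir):
    """Write accepted records to JSON and rejection reasons to a log.

    `results` is an iterable of (record, reason) pairs; reason is None
    when the record was accepted. Returns (accepted_count, rejected_count).
    """
    out_dir = Path(out_dir)
    accepted, reasons = [], []
    for record, reason in results:
        if reason is None:
            accepted.append(record)
        else:
            reasons.append(reason)
    # "releases.json" is a hypothetical name chosen for this example.
    (out_dir / "releases.json").write_text(
        json.dumps(accepted, indent=2), encoding="utf-8"
    )
    (out_dir / "rejected_log.txt").write_text(
        "".join(f"{r}\n" for r in reasons), encoding="utf-8"
    )
    return len(accepted), len(reasons)


# Demo with invented results, written to a throwaway directory.
demo = [({"id": "1", "title": "First"}, None), (None, "missing title")]
with tempfile.TemporaryDirectory() as tmp:
    print(write_outputs(demo, tmp))
    print((Path(tmp) / "rejected_log.txt").read_text(encoding="utf-8"))
```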

License

This project is licensed under the MIT License.