I'm curious what happened with this project. It looks like you did an excellent job on the architecture and documentation.
Update
I got the main Tracks database down to 800 KB (270 KB gzipped) using CSV with the columns:

`ID, Title, Tags, Posted, Length, Bitrate, FileName, FileSize, YouTube`
I think I might do two more like that for Games + Songs and Artists / Composers and still keep the total data under 1 MB, then provide a small snippet of JavaScript that fetches the 3 files and links them by numerical ID on the client side, along with hard-coded numeric IDs for Tags and the list of Mirrors (see the sketch below).
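A minimal sketch of what that client-side join could look like, assuming hypothetical file names (`tracks.csv`, `games.csv`, `artists.csv`) and hypothetical foreign-key columns `GameID` / `ArtistID`; a real version would want a proper CSV parser for quoted fields:

```js
// Fetch a CSV file and turn each row into an object keyed by the header row.
async function fetchCsv(url) {
  const text = await (await fetch(url)).text();
  const [header, ...rows] = text.trim().split("\n");
  const cols = header.split(",");
  return rows.map((row) => {
    const cells = row.split(","); // naive split: breaks on quoted commas
    return Object.fromEntries(cols.map((c, i) => [c, cells[i]]));
  });
}

async function loadDatabase() {
  const [tracks, games, artists] = await Promise.all([
    fetchCsv("tracks.csv"),
    fetchCsv("games.csv"),
    fetchCsv("artists.csv"),
  ]);
  // Index the secondary tables by numeric ID for O(1) joins.
  const gameById = new Map(games.map((g) => [g.ID, g]));
  const artistById = new Map(artists.map((a) => [a.ID, a]));
  return tracks.map((t) => ({
    ...t,
    game: gameById.get(t.GameID),     // GameID / ArtistID are assumed
    artist: artistById.get(t.ArtistID), // column names, not from the repo
  }));
}
```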
Implementation Ideas / Notes
(for people who end up on this repo like I did)
Since there are fewer than 10,000 remixes + albums, and it's highly unlikely there will be 100,000 items within our lifetimes, it would probably be simpler to:

- ship the entire database as a single JSON file for GET / filter / etc., with a per-IP rate limit (to encourage proper caching)
  - could be brute-force precompressed with gzip, zstd, and brotli to find the optimal settings (see the first sketch after this list)
  - could also use single-digit IDs for relationships (like a typical DB rather than a typical API, though I don't like this idea as much)
  - possibly use permanent, browser-level caching for frozen ID ranges, e.g. 1-2000 and 2001-4000, and dynamic caching for newer IDs
  - could also use multiple CSVs rather than a single JSON... maybe
- accept a POST by ID for atomic updates
- have a GET that hands back only the updates since a `last_updated_at` parameter (this and the range caching are sketched after the list)
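A minimal sketch of the precompression step, assuming Node.js and a `db.json` already on disk. Node's built-in `zlib` covers gzip and brotli; zstd would need an external tool or package, so it's only noted in a comment:

```js
const { readFileSync, writeFileSync } = require("node:fs");
const zlib = require("node:zlib");

const raw = readFileSync("db.json");

// Max settings are fine here: this runs once at publish time, not per request.
writeFileSync("db.json.gz", zlib.gzipSync(raw, { level: 9 }));
writeFileSync(
  "db.json.br",
  zlib.brotliCompressSync(raw, {
    params: { [zlib.constants.BROTLI_PARAM_QUALITY]: 11 },
  })
);
// For zstd, shell out to the `zstd` CLI or use a package; it's not in core zlib.
```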
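And a minimal sketch of the two caching ideas together, assuming Express; the route names, chunk file layout, and the shape of the `tracks` records are illustrative assumptions, not anything from this repo:

```js
const express = require("express");
const app = express();

// Frozen ID ranges (e.g. tracks-0001-2000.csv) never change, so let
// browsers cache them effectively forever.
app.use(
  "/chunks",
  express.static("chunks", { immutable: true, maxAge: "365d" })
);

let tracks = []; // loaded elsewhere; assume each record has an updated_at ms timestamp

// Delta GET: hand back only records changed since the client's last sync point.
app.get("/tracks", (req, res) => {
  const since = Number(req.query.last_updated_at ?? 0);
  res.json({
    now: Date.now(), // client stores this as its next last_updated_at
    tracks: tracks.filter((t) => t.updated_at > since),
  });
});

app.listen(3000);
```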
Scraping
I'm going to give this a shot myself with a little help from Grok to save on the tedious HTML parsing.