fix: missing date column, relay attribution, and racer/alternate distinction (breaking)#17
fix: missing date column, relay attribution, and racer/alternate distinction (breaking)#17fsalum wants to merge 6 commits into
Conversation
The date column is omitted by MM2 2.0 exports and by some MM3/MM5 files when the operator/converter drops it. Previously raised ValueError and the wrapper consumer dropped the entire result. Now returns None for blank input; timing fields populate normally.
Mirrors the e2_parser fix for relay results. Same root cause, same shape; relay timing fields now survive a missing date column.
BREAKING CHANGE: Event no longer carries relay_team_id or relay_swim_team_code. They were a workaround for not having entry-level attribution; in practice they captured only the first F1 line's values and silently mislabeled every subsequent relay entry. The correct model is per-entry: events are competition slots, EventEntry objects are individual team relays within them. EventEntry gains relay_team_id and relay_swim_team_code (set by f1_parser). same_swimmer_entry_as is now type-aware: relays match on (event_number, team_id, team_code, seed_time, seed_course); individuals unchanged. Recommend major version bump for this change.
BREAKING CHANGE: EventEntry.swimmers is now a dict keyed by leg number (1..8) rather than a positional list. This faithfully represents the F3 line's (swimmer_id, leg_number) pairs and lets consumers cleanly distinguish racers (legs 1..N) from alternates (legs N+1..8). Previously the list-based form discarded leg numbers, conflated racers and alternates, and produced 8-swimmer 200 yd relay rows downstream when F3 listed 4 racers + 4 alternates. Library does not bake in the racers-vs-alternates heuristic — that's the consumer's domain knowledge.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
egelja
left a comment
There was a problem hiding this comment.
Thanks for the PR, great catches! Just one small comment about tests, but the code itself looks great.
If you don't mind, could you also add a couple of e2e integration tests with different MM versions? I didn't have access to that many MM files when writing this, which is why I missed a few things.
Also, out of curiosity, how are you using this package? I originally built it for myself and just put it out there in case it was useful to others, so it's always nice to hear how people are using it.
| For individuals the identity is the existing (swimmers, event_number, | ||
| seed_time, seed_course, converted_seed_time, converted_seed_time_course). | ||
| """ | ||
| if self.relay or other.relay: |
There was a problem hiding this comment.
Good catch!
I saw you added a test to make sure entries aren't collapsed. Could you also add one that calls this function directly? Just to make sure it doesn't break in the future
…er_entry_as tests Address review feedback on PR SwimComm#17: 1. End-to-end fixtures across MM generations (Bug 1, 2-A, 2-B in real files): - MM2 2.0Fg (2007 Pacific Committee R-W SC Meet) — blank dates + multi-team relay - MM3 3.0Ea (2011 CA CSSC Fall Mile) — blank dates on a different generation - MM5 8.0Fd (2025 WMPSSDL Championships) — F3 relay with alternates (legs 1..5) Fixtures are trimmed-down slices of public meet result files with swimmer PII (names, DOBs, USA-Swimming IDs) and coach contact info redacted. See tests/hy3/fixtures/README.md. 2. Direct unit tests for EventEntry.same_swimmer_entry_as covering relay identity (event + team_code + team_id + seed), individual identity (unchanged), and relay-vs-individual cases. 30 tests pass.
…er_entry_as tests Address review feedback on PR SwimComm#17: 1. End-to-end fixtures across MM generations (Bug 1, 2-A, 2-B in real files): - MM2 2.0Fg (2007 Pacific Committee R-W SC Meet) — blank dates + multi-team relay - MM3 3.0Ea (2011 CA CSSC Fall Mile) — blank dates on a different generation - MM5 8.0Fd (2025 WMPSSDL Championships) — F3 relay with alternates (legs 1..5) Fixtures are trimmed-down slices of public meet result files with swimmer PII (names, DOBs, USA-Swimming IDs) and coach contact info redacted. See tests/hy3/fixtures/README.md. 2. Direct unit tests for EventEntry.same_swimmer_entry_as covering relay identity (event + team_code + team_id + seed), individual identity (unchanged), and relay-vs-individual cases. 30 tests pass.
|
Thanks! Pushed
The fixtures are trimmed-down slices of publicly distributed meet result files. Swimmer PII (names, DOBs, USA-Swimming IDs) and coach contact info on D1/C2/C3 lines are replaced with same-width placeholders so the files remain parseable. Notes in To your question: I'm building a platform for swimmers, parents, and coaches — stats and historical data, meet analysis, and eventually training plans. Your library is what gets the raw Hy-Tek files into a usable shape; the audit that surfaced these bugs came from running it across thousands of SoCal meet files for that platform. |
Summary
Three coupled bugs in the hy3 parser, surfaced by a downstream audit of >5,000 SoCal Hy-Tek files.
Bug 1 — date-column crash (any MM version)
e2_parserandf2_parserparse the per-result swim date withdatetime.strptime(extract(line, N, 8), "%m%d%Y")unconditionally, before assigning timing fields to the entry. When the source file omits the date column, this raisesValueErrorand the entire result is lost (any caller wrapping in try/except silently drops the result; downstream sees null timing). Confirmed across MM2 2.0, MM3 3.0Ea, MM5 6.0Eb, MM5 8.0Dh in the audited corpus.Fix: strip and check before parsing; return
Noneon blank input.Bug 2-A — relay attribution at the wrong level (breaking)
relay_team_idandrelay_swim_team_codelive onEvent, butMeet.get_or_create_eventreturns existing events unchanged on subsequent F1 lines, so the event keeps the first relay's team and letter forever. The model implies "one team per event," which is wrong — championships routinely have A and B teams from multiple clubs in the same event.Fix: move the fields to
EventEntry(where they belong), set them inf1_parser, makesame_swimmer_entry_astype-aware (relay identity isevent_number + team_id + team_code + seed; individual identity unchanged). Drop the Event-level fields.Bug 2-B — racers and alternates conflated in F3 (breaking)
F3 lines list racers (legs 1..N) plus, in some files, alternates (legs N+1..8). The current parser puts all of them into
entry.swimmersas a positional list, losing leg numbers and producing nonsensical 8-swimmer 200 yd relay entries downstream.Fix:
EventEntry.swimmersbecomesdict[int, Swimmer]keyed by leg number, faithfully representing the F3 line. Library doesn't take a position on which legs are racers vs alternates — that's the consumer's domain knowledge.Why a major version bump
Bug 2-A removes
Event.relay_team_idandEvent.relay_swim_team_codeoutright. Keeping them forward as a backwards-compat shim would propagate the bug for any consumer that reads them. Recommend bumping to 2.0.0.Bug 2-B changes
EventEntry.swimmersfromlisttodict— a type change Python won't catch at runtime; consumers must update.Tests
tests/hy3/line_parsers/test_e_event_parsers.py::TestE2BlankDateColumntests/hy3/line_parsers/test_f_relay_parsers.py::TestF2BlankDateColumntests/hy3/line_parsers/test_f_relay_parsers.py::TestF1PerEntryAttribution(includes a same-seed-time collapse case)tests/hy3/line_parsers/test_f_relay_parsers.py::TestF3DictKeyedSwimmersFull suite: 16 passed.
Commits
fix(e2): tolerate blank date column (MM2 2.0 and version-agnostic)fix(f2): tolerate blank date column (relay variant of e2 fix)fix(schemas)!: move relay attribution from Event to EventEntryfix(schemas)!: change EventEntry.swimmers to dict[int, Swimmer]chore: bump version to 2.0.0