Skip to content

fix: missing date column, relay attribution, and racer/alternate distinction (breaking)#17

Open
fsalum wants to merge 6 commits into
SwimComm:masterfrom
fsalum:fix/issue-28
Open

fix: missing date column, relay attribution, and racer/alternate distinction (breaking)#17
fsalum wants to merge 6 commits into
SwimComm:masterfrom
fsalum:fix/issue-28

Conversation

@fsalum
Copy link
Copy Markdown

@fsalum fsalum commented May 10, 2026

Summary

Three coupled bugs in the hy3 parser, surfaced by a downstream audit of >5,000 SoCal Hy-Tek files.

Bug 1 — date-column crash (any MM version)

e2_parser and f2_parser parse the per-result swim date with datetime.strptime(extract(line, N, 8), "%m%d%Y") unconditionally, before assigning timing fields to the entry. When the source file omits the date column, this raises ValueError and the entire result is lost (any caller wrapping in try/except silently drops the result; downstream sees null timing). Confirmed across MM2 2.0, MM3 3.0Ea, MM5 6.0Eb, MM5 8.0Dh in the audited corpus.

Fix: strip and check before parsing; return None on blank input.

Bug 2-A — relay attribution at the wrong level (breaking)

relay_team_id and relay_swim_team_code live on Event, but Meet.get_or_create_event returns existing events unchanged on subsequent F1 lines, so the event keeps the first relay's team and letter forever. The model implies "one team per event," which is wrong — championships routinely have A and B teams from multiple clubs in the same event.

Fix: move the fields to EventEntry (where they belong), set them in f1_parser, make same_swimmer_entry_as type-aware (relay identity is event_number + team_id + team_code + seed; individual identity unchanged). Drop the Event-level fields.

Bug 2-B — racers and alternates conflated in F3 (breaking)

F3 lines list racers (legs 1..N) plus, in some files, alternates (legs N+1..8). The current parser puts all of them into entry.swimmers as a positional list, losing leg numbers and producing nonsensical 8-swimmer 200 yd relay entries downstream.

Fix: EventEntry.swimmers becomes dict[int, Swimmer] keyed by leg number, faithfully representing the F3 line. Library doesn't take a position on which legs are racers vs alternates — that's the consumer's domain knowledge.

Why a major version bump

Bug 2-A removes Event.relay_team_id and Event.relay_swim_team_code outright. Keeping them forward as a backwards-compat shim would propagate the bug for any consumer that reads them. Recommend bumping to 2.0.0.

Bug 2-B changes EventEntry.swimmers from list to dict — a type change Python won't catch at runtime; consumers must update.

Tests

  • tests/hy3/line_parsers/test_e_event_parsers.py::TestE2BlankDateColumn
  • tests/hy3/line_parsers/test_f_relay_parsers.py::TestF2BlankDateColumn
  • tests/hy3/line_parsers/test_f_relay_parsers.py::TestF1PerEntryAttribution (includes a same-seed-time collapse case)
  • tests/hy3/line_parsers/test_f_relay_parsers.py::TestF3DictKeyedSwimmers

Full suite: 16 passed.

Commits

  1. fix(e2): tolerate blank date column (MM2 2.0 and version-agnostic)
  2. fix(f2): tolerate blank date column (relay variant of e2 fix)
  3. fix(schemas)!: move relay attribution from Event to EventEntry
  4. fix(schemas)!: change EventEntry.swimmers to dict[int, Swimmer]
  5. chore: bump version to 2.0.0

fsalum and others added 5 commits May 10, 2026 05:47
The date column is omitted by MM2 2.0 exports and by some MM3/MM5 files
when the operator/converter drops it. Previously raised ValueError and
the wrapper consumer dropped the entire result. Now returns None for
blank input; timing fields populate normally.
Mirrors the e2_parser fix for relay results. Same root cause, same
shape; relay timing fields now survive a missing date column.
BREAKING CHANGE: Event no longer carries relay_team_id or
relay_swim_team_code. They were a workaround for not having entry-level
attribution; in practice they captured only the first F1 line's values
and silently mislabeled every subsequent relay entry. The correct model
is per-entry: events are competition slots, EventEntry objects are
individual team relays within them.

EventEntry gains relay_team_id and relay_swim_team_code (set by
f1_parser). same_swimmer_entry_as is now type-aware: relays match on
(event_number, team_id, team_code, seed_time, seed_course); individuals
unchanged.

Recommend major version bump for this change.
BREAKING CHANGE: EventEntry.swimmers is now a dict keyed by leg number
(1..8) rather than a positional list. This faithfully represents the F3
line's (swimmer_id, leg_number) pairs and lets consumers cleanly
distinguish racers (legs 1..N) from alternates (legs N+1..8).

Previously the list-based form discarded leg numbers, conflated racers
and alternates, and produced 8-swimmer 200 yd relay rows downstream
when F3 listed 4 racers + 4 alternates.

Library does not bake in the racers-vs-alternates heuristic — that's
the consumer's domain knowledge.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@fsalum fsalum marked this pull request as ready for review May 10, 2026 20:00
Copy link
Copy Markdown
Contributor

@egelja egelja left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR, great catches! Just one small comment about tests, but the code itself looks great.

If you don't mind, could you also add a couple of e2e integration tests with different MM versions? I didn't have access to that many MM files when writing this, which is why I missed a few things.

Also, out of curiosity, how are you using this package? I originally built it for myself and just put it out there in case it was useful to others, so it's always nice to hear how people are using it.

For individuals the identity is the existing (swimmers, event_number,
seed_time, seed_course, converted_seed_time, converted_seed_time_course).
"""
if self.relay or other.relay:
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch!

I saw you added a test to make sure entries aren't collapsed. Could you also add one that calls this function directly? Just to make sure it doesn't break in the future

…er_entry_as tests

Address review feedback on PR SwimComm#17:

1. End-to-end fixtures across MM generations (Bug 1, 2-A, 2-B in real files):
   - MM2 2.0Fg  (2007 Pacific Committee R-W SC Meet)  — blank dates + multi-team relay
   - MM3 3.0Ea  (2011 CA CSSC Fall Mile)              — blank dates on a different generation
   - MM5 8.0Fd  (2025 WMPSSDL Championships)          — F3 relay with alternates (legs 1..5)

   Fixtures are trimmed-down slices of public meet result files with swimmer
   PII (names, DOBs, USA-Swimming IDs) and coach contact info redacted.
   See tests/hy3/fixtures/README.md.

2. Direct unit tests for EventEntry.same_swimmer_entry_as covering relay
   identity (event + team_code + team_id + seed), individual identity
   (unchanged), and relay-vs-individual cases.

30 tests pass.
fsalum added a commit to fsalum/hytek-parser that referenced this pull request May 12, 2026
…er_entry_as tests

Address review feedback on PR SwimComm#17:

1. End-to-end fixtures across MM generations (Bug 1, 2-A, 2-B in real files):
   - MM2 2.0Fg  (2007 Pacific Committee R-W SC Meet)  — blank dates + multi-team relay
   - MM3 3.0Ea  (2011 CA CSSC Fall Mile)              — blank dates on a different generation
   - MM5 8.0Fd  (2025 WMPSSDL Championships)          — F3 relay with alternates (legs 1..5)

   Fixtures are trimmed-down slices of public meet result files with swimmer
   PII (names, DOBs, USA-Swimming IDs) and coach contact info redacted.
   See tests/hy3/fixtures/README.md.

2. Direct unit tests for EventEntry.same_swimmer_entry_as covering relay
   identity (event + team_code + team_id + seed), individual identity
   (unchanged), and relay-vs-individual cases.

30 tests pass.
@fsalum
Copy link
Copy Markdown
Author

fsalum commented May 12, 2026

Thanks! Pushed 452db02 addressing both points:

  1. Direct unit tests for EventEntry.same_swimmer_entry_as in tests/hy3/line_parsers/test_schemas.py — covers relay identity (same/different team code, same/different letter, identity ignores swimmers dict so prelim and finals merge correctly), individual identity, and the relay-vs-individual case.

  2. End-to-end integration tests in tests/hy3/test_integration.py parsing real MM fixtures across three generations:

    • MM2 2.0Fg (2007 Pacific Committee R-W SC Meet) — blank E2/F2 date column + multi-team relay (event 21 has 6 entries across BRSC, LAC, ROSE A/B, WEST A/B)
    • MM3 3.0Ea (2011 CA CSSC Fall Mile) — every E2 line has a blank date column; finals times still populated
    • MM5 8.0Fd (2025 WMPSSDL Championships) — F3 alternates: relay entries with 5 swimmers keyed by leg number (legs 1..5)

The fixtures are trimmed-down slices of publicly distributed meet result files. Swimmer PII (names, DOBs, USA-Swimming IDs) and coach contact info on D1/C2/C3 lines are replaced with same-width placeholders so the files remain parseable. Notes in tests/hy3/fixtures/README.md. Combined fixture size is ~128 KB; full suite runs in <0.1s, 30/30 pass.

To your question: I'm building a platform for swimmers, parents, and coaches — stats and historical data, meet analysis, and eventually training plans. Your library is what gets the raw Hy-Tek files into a usable shape; the audit that surfaced these bugs came from running it across thousands of SoCal meet files for that platform.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants