test: add e2e test for the full training pipeline with a tiny dataset (#72)

## Context

There is no automated test that exercises the full pipeline from DwC-A to trained model. The individual CLI commands have some test coverage, but integration issues (column mismatches, missing files between steps, incorrect shard patterns) are only caught by running the pipeline manually.

## Proposed Changes

Add an end-to-end test that runs the full species classifier pipeline with a tiny dataset:

- Use a small DwC-A fixture (or a subset of an existing one) with ~10-20 images across 3-5 species
- Run all pipeline steps: `fetch-images` -> `verify-images` -> `clean-dataset` -> `build_species_list.py` -> `split-dataset` -> `create-webdataset` -> `train-model`
- Train for only 1-3 epochs to keep runtime short
- Use the small-dataset config values: `MIN_INSTANCES=0`, `--val-frac 0.3`, `--test-frac 0.2` (already documented as commented-out alternatives in `scripts/train_species_classifier.sh`)
- Assert that key outputs exist and are valid: `category_map.json` has expected species, split CSVs are non-empty, webdataset tar files are created, model checkpoint is saved
- Add GitHub workflows that run the full e2e test locally and in the Docker SLURM environment
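Here is a minimal sketch of the output checks from the assertion bullet above. It is a plain Python helper that a pytest test could call after the pipeline finishes. The directory layout (`splits/{train,val,test}.csv`, `webdataset/*.tar`, `checkpoints/*.pt` or `*.pth`) and the assumption that `category_map.json` is keyed by species name are hypothetical placeholders. Adjust them to whatever the pipeline actually writes.

```python
import csv
import json
import tarfile
from pathlib import Path


def check_pipeline_outputs(out_dir, expected_species):
    """Return a list of problems found in the pipeline outputs (empty list = all good).

    NOTE: the file layout checked here is a hypothetical assumption,
    not the repository's confirmed output layout.
    """
    out = Path(out_dir)
    problems = []

    # category_map.json: assumed to be a JSON object keyed by species name
    cmap_path = out / "category_map.json"
    if not cmap_path.is_file():
        problems.append("missing category_map.json")
    else:
        cmap = json.loads(cmap_path.read_text())
        missing = set(expected_species) - set(cmap)
        if missing:
            problems.append(f"category_map.json missing species: {sorted(missing)}")

    # Split CSVs: each must contain a header row plus at least one data row
    for split in ("train", "val", "test"):
        csv_path = out / "splits" / f"{split}.csv"
        if not csv_path.is_file():
            problems.append(f"missing {csv_path.name}")
            continue
        with csv_path.open(newline="") as f:
            rows = list(csv.reader(f))
        if len(rows) < 2:
            problems.append(f"{csv_path.name} has no data rows")

    # WebDataset shards: at least one file that tarfile can actually open
    shards = [p for p in (out / "webdataset").glob("*.tar") if tarfile.is_tarfile(p)]
    if not shards:
        problems.append("no readable webdataset .tar shards")

    # Model checkpoint: any .pt/.pth file (hypothetical naming)
    ckpt_dir = out / "checkpoints"
    if not (list(ckpt_dir.glob("*.pt")) + list(ckpt_dir.glob("*.pth"))):
        problems.append("no model checkpoint saved")

    return problems
```

In a test this would be used as `assert check_pipeline_outputs(out_dir, expected_species) == []`. The helper returns a list of problems instead of failing on the first one, so a single run reports every broken artifact.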

This could run in CI (CPU-only; training will be slow but still feasible for 1-3 epochs on a tiny dataset) or as a local smoke test.
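For the local smoke-test case, a small runner that executes the steps in order and stops at the first failure makes it clear which step broke the chain. This is a sketch, not existing project code. `run_pipeline` and `PIPELINE_STEPS` are hypothetical names, and the caller must supply the full argv for each step, because the actual CLI entry point and per-step arguments are not specified in this issue.

```python
import subprocess
import sys
import time

# Step order taken from this issue; the names are only used for ordering checks.
PIPELINE_STEPS = (
    "fetch-images",
    "verify-images",
    "clean-dataset",
    "build_species_list.py",
    "split-dataset",
    "create-webdataset",
    "train-model",
)


def run_pipeline(steps, cwd=None):
    """Run (name, argv) steps in order, stopping at the first non-zero exit.

    Returns a list of (name, returncode, seconds) for the steps that ran.
    Raises ValueError if a step name is unknown or steps are out of order.
    """
    names = [name for name, _ in steps]
    for n in names:
        if n not in PIPELINE_STEPS:
            raise ValueError(f"unknown pipeline step: {n!r}")
    order = [PIPELINE_STEPS.index(n) for n in names]
    if order != sorted(order):
        raise ValueError(f"steps out of pipeline order: {names}")

    results = []
    for name, argv in steps:
        start = time.monotonic()
        proc = subprocess.run(argv, cwd=cwd, capture_output=True, text=True)
        results.append((name, proc.returncode, time.monotonic() - start))
        if proc.returncode != 0:
            # Surface the failing step's stderr so integration errors
            # (e.g. a column or file missing from the previous step) are visible.
            print(f"step {name!r} failed ({proc.returncode}):\n{proc.stderr}", file=sys.stderr)
            break
    return results
```

Stopping at the first failing step matches how the steps depend on each other. Running later steps after an earlier one fails would only produce follow-on errors about missing inputs.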

## Related

- PR #69 (species classifier pipeline)
- `scripts/train_species_classifier.sh` contains the pipeline steps and small-dataset config
