Skip to content

Add weight-coverage walker to converter test suite#527

Open
jlamypoirier wants to merge 1 commit into
mainfrom
jlp_weight_coverage_walker
Open

Add weight-coverage walker to converter test suite#527
jlamypoirier wants to merge 1 commit into
mainfrom
jlp_weight_coverage_walker

Conversation

@jlamypoirier
Copy link
Copy Markdown
Collaborator

Summary

  • Extends tests/models/test_converters.py with a per-fixture weight-coverage check: for each ModelTestingConfig with a checkpoint_format, materialise the Fast-LLM base model and assert every named_parameters() entry is consumed by some leaf WeightConverter emitted by base_model_converter_class.get_converters(config).
  • Tied parameters (e.g. MTP head output weights sharing a single LM-head weight) count as covered when any group member has a converter, matching the export-time behaviour where one shared weight is serialised once.
  • gemma4 is xfailed (strict) against pre-existing coverage gaps in its converter declarations; the xfail reason describes the three specific gaps.

Why

The strict HF-side coverage check landed by #523 catches an HF dict carrying a key with no Fast-LLM consumer. There was no equivalent symmetric check on the Fast-LLM side — a model parameter without a converter is silently skipped by ExternalStateDictCheckpointHandler._convert_state_dict, so its trained value is lost on HF export. This test closes that gap statically.

Implementation notes

  • Uses bm.base_model_class(bm, DistributedConfig()) — CPU-only, ParameterMeta weights, no distributed setup or NCCL. Iterates named_parameters() for the canonical Fast-LLM name set.
  • Tied-group resolution uses object identity (id(parameter)) since ParameterMeta.tensor_name isn't assigned until stage setup, which we deliberately skip.
  • Walker is parametrised over every fixture with a checkpoint format (not just one per format) so format-specific variations get coverage. Cheap — adds ~36 tests, all under 0.7s combined locally.

Test plan

  • Local: pytest -v tests/models/test_converters.py → 36 pass, 1 xfail (gemma4)
  • Cluster: full tests/models/test_converters.py + test_checkpoint.py + test_hf_roundtrip.py sweep — converters tests pass + xfail as designed
  • Pre-existing failures (test_conversion[*] torchscript coverage and test_hf_roundtrip Gemma4 import) verified unrelated to this PR (diff is +71 lines in test_converters.py only)

🤖 Generated with Claude Code

For each test fixture with a checkpoint format, materialise the Fast-LLM
base model (CPU, ParameterMeta — no distributed setup) and assert every
parameter is consumed by some leaf WeightConverter emitted by
base_model_converter_class.get_converters(config). Runtime-tied parameters
count as covered when any group member has a converter, matching export
behaviour. Gemma4 is xfailed against pre-existing coverage gaps in its
declarations.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant