OSS readiness #51
Merged
Phase 1 — Legal & attribution
- Align license: pyproject.toml + README badge → Apache-2.0 (matches LICENSE).
- Add NOTICE summarising bundled third-party data and upstream terms.
- Add License & attribution sections to datasets/README.md and each
datasets/sharegpt_*_v1/README.md (CC BY 4.0, upstream link).
- Add schema/accuracy_subset.README.md documenting the MMLU subset (MIT).
Phase 2 — Contributor experience & validation
- Fix doc drift in DEVELOPMENT.md, README.md, runners/README.md,
suites/README.md, runners/template/runner.py (rename
SUPPORTED_QUANTIZATIONS → SUPPORTED_QUANTIZATION_BACKENDS in
*editable* files only; existing runner.py hashes untouched).
- Add schema/suite.schema.json + runners/validate_suites.py and wire
both into validate_pr.yml / generate_leaderboard.yml.
- Add .github/ISSUE_TEMPLATE/new_suite.md for community suite proposals.
- CONTRIBUTING.md: add local leaderboard preview instructions.
- .gitignore: ignore node_modules/, .cursor/, .aider*, .envrc, .direnv/.
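A minimal sketch of the kind of check runners/validate_suites.py adds to CI. It makes two assumptions that this PR does not confirm. First, suite definitions are JSON files. Second, a hypothetical REQUIRED_FIELDS table stands in for schema/suite.schema.json, whereas the real script presumably validates against the schema itself. The key point is that any error produces a non-zero exit code, and that is what fails the validate_pr.yml / generate_leaderboard.yml job.

```python
import json
import sys
from pathlib import Path
from typing import Dict, List

# Hypothetical subset of the suite contract, for illustration only;
# the authoritative definition is schema/suite.schema.json.
REQUIRED_FIELDS: Dict[str, type] = {"name": str, "version": str, "scenarios": list}


def validate_suite(data: object) -> List[str]:
    """Return human-readable errors for one parsed suite (empty list = valid)."""
    if not isinstance(data, dict):
        return ["top-level value must be a JSON object"]
    errors: List[str] = []
    for field, expected in REQUIRED_FIELDS.items():
        if field not in data:
            errors.append(f"missing required field {field!r}")
        elif not isinstance(data[field], expected):
            errors.append(f"field {field!r} must be {expected.__name__}")
    return errors


def main(paths: List[str]) -> int:
    """Validate every suite file given; return a process exit code for CI."""
    failed = False
    for path in paths:
        try:
            data = json.loads(Path(path).read_text(encoding="utf-8"))
        except (OSError, json.JSONDecodeError) as exc:
            print(f"{path}: {exc}")
            failed = True
            continue
        for err in validate_suite(data):
            print(f"{path}: {err}")
            failed = True
    return 1 if failed else 0


if __name__ == "__main__":
    sys.exit(main(sys.argv[1:]))
```

If the suites are actually YAML, only the loading step changes. The per-file error reporting and the exit code stay the same.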
Phase 3 — Code quality & CI
- runners/benchmark_runner.py:
* Remove dead code (stub format_prompt, dead spec-decoding branch,
redundant acc_result init, duplicated _build_result_json block).
* Extract helpers (_prepare_load_context, _score_accuracy_questions,
_write_accuracy_artifacts) shared between accuracy scenarios.
* Replace inference dispatch if/elif ladder with _SCENARIO_REGISTRY
(ScenarioSpec dataclass: inference_kind, use_async, merge_key…).
* _MERGE_SCENARIO_KEYS now derived from the registry. Net −111 lines.
- leaderboard: split SUITE_META into
leaderboard/site/assets/data/suite-meta.js, data.js re-exports it
(data.js 1010 → 800 lines).
- validate_pr.yml: add python-tests job (serve + openclaw_skill pytest).
- pyproject.toml: setuptools.packages.find now lists loadgen/runners/
serve/openclaw_skill explicitly and excludes tests*.
README hero & citation
- Embed docs/assets/framework-overview.png under nav links and
docs/assets/chip-cloud.png in a new "Currently on the leaderboard"
section.
- Expand BibTeX author list in the Citation section.
Co-authored-by: Cursor <cursoragent@cursor.com>
✅ AccelMark Validation: All submissions valid. See the workflow run for details.
- serve/server.py: import uvicorn lazily inside start_server() so that
importing the module (e.g. from tests, or to expose the ASGI app) does
not require uvicorn to be installed.
- validate_pr.yml: add numpy to the python-tests install list. It is
pulled in transitively by loadgen and is needed once serve.server
imports runners.benchmark_runner during test collection.
Co-authored-by: Cursor <cursoragent@cursor.com>
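The lazy-import change follows a common pattern, sketched here. The ASGI `app` is a hypothetical stand-in for whatever serve/server.py actually exposes, and the default host and port are placeholders rather than the project's real defaults. `uvicorn.run(app, host=..., port=...)` is uvicorn's standard programmatic entry point. Because the import sits inside the function body, importing this module never touches uvicorn.

```python
from typing import Any, Awaitable, Callable, Dict

Scope = Dict[str, Any]
Receive = Callable[[], Awaitable[Dict[str, Any]]]
Send = Callable[[Dict[str, Any]], Awaitable[None]]


async def app(scope: Scope, receive: Receive, send: Send) -> None:
    # Hypothetical minimal ASGI app standing in for the real one.
    if scope["type"] == "http":
        await send({"type": "http.response.start", "status": 200, "headers": []})
        await send({"type": "http.response.body", "body": b"ok"})


def start_server(host: str = "127.0.0.1", port: int = 8000) -> None:
    # Deferred import: test collection and ASGI hosting can import this
    # module without uvicorn installed; only starting the server needs it.
    import uvicorn

    uvicorn.run(app, host=host, port=port)
```

One side effect of this design is that tests can substitute a fake `uvicorn` through `sys.modules` before calling start_server(). The flip side is that a missing uvicorn now surfaces only when the server is started, not at import time.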
… NotImplementedError
Pre-existing breakage in serve/tests/test_server.py, never caught because
python-tests was not wired into CI until this branch.
- test_server.py imports TokenStreamingMockRunner from mock_runner, but
the class did not exist (4 ImportError collection errors).
- test_fallback_when_no_token_stream expects MockRunner to *not*
implement true token streaming so the server's single-chunk fallback
path runs. MockRunner used to yield word-by-word, so the test asserted
len(content_chunks) == 1 but got more (1 AssertionError).
Fix to match the RunnerProtocol contract (runners/protocol.py:67): true
token streaming is optional, and runners signal "not supported" by
raising NotImplementedError:
- MockRunner.inference_fn_token_stream now raises NotImplementedError
(with a trailing unreachable yield so the function stays an async
generator, matching the protocol).
- Add TokenStreamingMockRunner(MockRunner), which overrides the method to
yield word-by-word with a small async delay. The four tests that
exercise the multi-chunk SSE path use it.
Co-authored-by: Cursor <cursoragent@cursor.com>
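The streaming contract described above can be sketched like this. MockRunner, TokenStreamingMockRunner and inference_fn_token_stream come from this PR. The non-streaming method `generate_text` and the server-side helper `collect_chunks` are hypothetical names used only for illustration, not the project's actual API. The sketch shows why the unreachable `yield` matters. It keeps the method an async generator, so calling it succeeds and the NotImplementedError is raised on the first iteration, where the caller can catch it and fall back to a single chunk.

```python
import asyncio
from typing import AsyncIterator, List


class MockRunner:
    async def generate_text(self, prompt: str) -> str:
        # Hypothetical non-streaming method; fixed text for testing.
        return "Hello from token stream."

    async def inference_fn_token_stream(self, prompt: str) -> AsyncIterator[str]:
        raise NotImplementedError("true token streaming not supported")
        yield ""  # unreachable: keeps this function an async generator


class TokenStreamingMockRunner(MockRunner):
    async def inference_fn_token_stream(self, prompt: str) -> AsyncIterator[str]:
        text = await self.generate_text(prompt)
        for i, word in enumerate(text.split(" ")):
            await asyncio.sleep(0.001)  # small delay to mimic real streaming
            yield word if i == 0 else " " + word


async def collect_chunks(runner: MockRunner, prompt: str) -> List[str]:
    """Hypothetical server logic: stream if supported, else send one chunk."""
    chunks: List[str] = []
    try:
        async for piece in runner.inference_fn_token_stream(prompt):
            chunks.append(piece)
    except NotImplementedError:
        chunks = [await runner.generate_text(prompt)]
    return chunks
```

With these definitions, `asyncio.run(collect_chunks(MockRunner(), "hi"))` takes the fallback path and yields a single chunk, while `TokenStreamingMockRunner` yields one chunk per word.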
…not trailing
test_token_stream_reassembles_correctly concatenates every content
delta and expects exact equality with the response_text. Yielding
"word + ' '" tacks an extra trailing space onto the reassembled string,
so the assertion failed:
got: 'Hello from token stream. '
expected: 'Hello from token stream.'
Switch to a leading-space separator (space before every word *after*
the first). Concatenation now round-trips exactly, and the shape
matches how real BPE / SentencePiece tokenizers stream pieces (the
first token has no preceding space; subsequent ones do).
Co-authored-by: Cursor <cursoragent@cursor.com>
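The difference between the two separators shows up with plain lists. The helper names below are illustrative and do not appear in the codebase. A trailing separator always adds one extra space at the end. A leading separator on every word after the first is equivalent to `" ".join(text.split(" "))`, which round-trips to the original string exactly.

```python
from typing import List


def stream_trailing(text: str) -> List[str]:
    # Old behaviour: "word + ' '" for every word, including the last.
    return [word + " " for word in text.split(" ")]


def stream_leading(text: str) -> List[str]:
    # New behaviour: space before every word after the first.
    words = text.split(" ")
    return [words[0]] + [" " + word for word in words[1:]]


text = "Hello from token stream."
print(repr("".join(stream_trailing(text))))  # 'Hello from token stream. '
print("".join(stream_leading(text)) == text)  # True
```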
Summary
Type of change
Testing
# Commands used to verify
Checklist
- result.json files (or I have explained the migration path)
- BenchmarkRunner, produces valid result.json, includes a reference result
- validate_submission.py updated and all existing results still validate
- leaderboard/generate.py produces correct output on existing results
Related issues