Oss readiness by JuhaoLiang1997 · Pull Request #51 · FreedomIntelligence/AccelMark

JuhaoLiang1997 · 2026-05-19T03:23:36Z

Summary

Type of change

Testing

# Commands used to verify

Checklist

I have read CONTRIBUTING.md
My change does not break existing result.json files (or I have explained the migration path)
If adding a new platform: runner inherits from BenchmarkRunner, produces valid result.json, includes a reference result
If changing the schema: validate_submission.py updated and all existing results still validate
If changing the leaderboard generator: leaderboard/generate.py produces correct output on existing results
I have updated relevant documentation

Related issues

Phase 1: Legal & attribution
- Align license: pyproject.toml + README badge → Apache-2.0 (matches LICENSE).
- Add NOTICE summarising bundled third-party data and upstream terms.
- Add License & attribution sections to datasets/README.md and each datasets/sharegpt_*_v1/README.md (CC BY 4.0, upstream link).
- Add schema/accuracy_subset.README.md documenting the MMLU subset (MIT).

Phase 2: Contributor experience & validation
- Fix doc drift in DEVELOPMENT.md, README.md, runners/README.md, suites/README.md, runners/template/runner.py (rename SUPPORTED_QUANTIZATIONS → SUPPORTED_QUANTIZATION_BACKENDS in *editable* files only; existing runner.py hashes untouched).
- Add schema/suite.schema.json + runners/validate_suites.py and wire both into validate_pr.yml / generate_leaderboard.yml.
- Add .github/ISSUE_TEMPLATE/new_suite.md for community suite proposals.
- CONTRIBUTING.md: add local leaderboard preview instructions.
- .gitignore: ignore node_modules/, .cursor/, .aider*, .envrc, .direnv/.

Phase 3: Code quality & CI
- runners/benchmark_runner.py:
  * Remove dead code (stub format_prompt, dead spec-decoding branch, redundant acc_result init, duplicated _build_result_json block).
  * Extract helpers (_prepare_load_context, _score_accuracy_questions, _write_accuracy_artifacts) shared between accuracy scenarios.
  * Replace inference dispatch if/elif ladder with _SCENARIO_REGISTRY (ScenarioSpec dataclass: inference_kind, use_async, merge_key…).
  * _MERGE_SCENARIO_KEYS now derived from the registry.
  Net −111 lines.
- leaderboard: split SUITE_META into leaderboard/site/assets/data/suite-meta.js, data.js re-exports it (data.js 1010 → 800 lines).
- validate_pr.yml: add python-tests job (serve + openclaw_skill pytest).
- pyproject.toml: setuptools.packages.find now lists loadgen/runners/serve/openclaw_skill explicitly and excludes tests*.

README hero & citation
- Embed docs/assets/framework-overview.png under nav links and docs/assets/chip-cloud.png in a new "Currently on the leaderboard" section.
- Expand BibTeX author list in the Citation section.

Co-authored-by: Cursor <cursoragent@cursor.com>


github-actions · 2026-05-19T03:23:53Z

✅ AccelMark Validation: All submissions valid

See the workflow run for details.

- serve/server.py: import uvicorn lazily inside start_server() so that importing the module (e.g. from tests, or to expose the ASGI app) does not require uvicorn to be installed.
- validate_pr.yml: add numpy to the python-tests install list. It is pulled in transitively by loadgen and is needed once serve.server imports runners.benchmark_runner during test collection.

Co-authored-by: Cursor <cursoragent@cursor.com>
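The lazy-import change can be illustrated with a small sketch. Only the name start_server() and the use of uvicorn come from the commit message. The placeholder ASGI app, the parameters and their defaults are assumptions made for illustration.

```python
async def app(scope, receive, send):
    """Placeholder ASGI app (hypothetical); the real app lives in serve/server.py."""
    assert scope["type"] == "http"
    await send({
        "type": "http.response.start",
        "status": 200,
        "headers": [(b"content-type", b"text/plain")],
    })
    await send({"type": "http.response.body", "body": b"ok"})


def start_server(host: str = "127.0.0.1", port: int = 8000) -> None:
    """Run the app under uvicorn, importing uvicorn only when actually serving."""
    # Deferred import: merely importing this module (for tests, or to reuse
    # `app` under another ASGI server) does not require uvicorn to be installed.
    import uvicorn

    uvicorn.run(app, host=host, port=port)
```

The trade-off is that a missing uvicorn surfaces as an ImportError at call time rather than at import time, which is what lets the test suite collect without it.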

… NotImplementedError

Pre-existing breakage in serve/tests/test_server.py, never caught because python-tests was not wired into CI until this branch.
- test_server.py imports TokenStreamingMockRunner from mock_runner, but the class did not exist (4 ImportError collection errors).
- test_fallback_when_no_token_stream expects MockRunner to *not* implement true token streaming so the server's single-chunk fallback path runs. MockRunner used to yield word-by-word, so the test asserted len(content_chunks) == 1 but got more (1 AssertionError).

Fix to match the RunnerProtocol contract (runners/protocol.py:67): true token streaming is optional, and runners signal "not supported" by raising NotImplementedError.
- MockRunner.inference_fn_token_stream now raises NotImplementedError (with a trailing unreachable yield so the function shape stays an async generator, matching the protocol).
- Add TokenStreamingMockRunner(MockRunner) that overrides the method to yield word-by-word with a small async delay, used by the four tests that exercise the multi-chunk SSE path.

Co-authored-by: Cursor <cursoragent@cursor.com>
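A minimal sketch of the contract described above, assuming a simplified method signature. The class names MockRunner and TokenStreamingMockRunner and the method name inference_fn_token_stream are taken from the commit message. The response_text attribute, the prompt parameter and the collect_chunks helper are hypothetical stand-ins for the server's fallback logic.

```python
import asyncio
from typing import AsyncIterator, List


class MockRunner:
    response_text = "Hello from token stream."  # hypothetical canned reply

    async def inference_fn_token_stream(self, prompt: str) -> AsyncIterator[str]:
        # Signal "true token streaming not supported".
        raise NotImplementedError("token streaming not supported")
        yield  # unreachable, but keeps this function an async generator


class TokenStreamingMockRunner(MockRunner):
    async def inference_fn_token_stream(self, prompt: str) -> AsyncIterator[str]:
        for i, word in enumerate(self.response_text.split()):
            await asyncio.sleep(0)  # small async delay between pieces
            yield word if i == 0 else " " + word


async def collect_chunks(runner: MockRunner, prompt: str) -> List[str]:
    """Hypothetical stand-in for the server: stream if possible, else one chunk."""
    chunks: List[str] = []
    try:
        async for piece in runner.inference_fn_token_stream(prompt):
            chunks.append(piece)
    except NotImplementedError:
        # Fallback path: emit the whole response as a single chunk.
        chunks = [runner.response_text]
    return chunks
```

Because the method is an async generator, the NotImplementedError is raised on the first iteration step rather than at call time, so the caller has to wrap the `async for` loop, not just the call.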

…not trailing

test_token_stream_reassembles_correctly concatenates every content delta and expects exact equality with the response_text. Yielding "word + ' '" tacks an extra trailing space onto the reassembled string, so the assertion failed:

got: 'Hello from token stream. '
expected: 'Hello from token stream.'

Switch to a leading-space separator (space before every word *after* the first). Concatenation now round-trips exactly, and the shape matches how real BPE / SentencePiece tokenizers stream pieces (the first token has no preceding space; subsequent ones do).

Co-authored-by: Cursor <cursoragent@cursor.com>
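The difference between the two separators can be reproduced in isolation. The helper names below are hypothetical and exist only to illustrate the round-trip property the test relies on.

```python
from typing import List


def trailing_space_pieces(text: str) -> List[str]:
    """Old behaviour: every word carries a trailing space."""
    return [word + " " for word in text.split()]


def leading_space_pieces(text: str) -> List[str]:
    """New behaviour: every word after the first carries a leading space."""
    return [word if i == 0 else " " + word for i, word in enumerate(text.split())]


response_text = "Hello from token stream."

# Trailing separator leaves one extra space at the end of the reassembled text.
assert "".join(trailing_space_pieces(response_text)) == response_text + " "

# Leading separator round-trips exactly.
assert "".join(leading_space_pieces(response_text)) == response_text
```

Both helpers split on whitespace, so the exact round trip holds only for text whose words are separated by single spaces, as in the mock response.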

JuhaoLiang1997 and others added 2 commits May 19, 2026 11:22

docs(readme): update citation title

7bdd378

Co-authored-by: Cursor <cursoragent@cursor.com>

JuhaoLiang1997 and others added 3 commits May 19, 2026 11:27

JuhaoLiang1997 merged commit 6bd3bf1 into main May 19, 2026
5 checks passed

JuhaoLiang1997 deleted the oss-readiness branch May 19, 2026 04:05

JuhaoLiang1997 merged 5 commits into main from oss-readiness
