QualBench — CI for AI-Generated Code

AI Cost Tracking

This project uses AI-generated code. Total cost: $0.1500 across 1 AI commit.

Generated on 2026-04-04 using openrouter/qwen/qwen3-coder-next

Correct code is not the same as mergeable code. QualBench works like ESLint plus code review, but for AI-generated changes. Add it to your pipeline in 2 minutes.

60 seconds to your first score

pip install qualbench
qualbench quickstart

No config, no API keys. QualBench evaluates your current diff and prints a Quality Score.

Add to CI in 2 minutes

# .github/workflows/qualbench.yml
name: QualBench
on: [pull_request]
jobs:
  quality-check:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: semcod/qualbench-action@v1
        with:
          tool: prollama
          fail_on_score: 70

Every AI-generated PR gets a quality review comment. Set fail_on_score and the pipeline fails if quality is below your threshold.

🧠 QualBench Review

Quality Score: 78/100

  ❌ Complexity increased (+12%)
  ⚠ Security: 1 new medium-severity finding
  ✔ Tests pass, no regressions

Verdict: needs_review

CI/CD Examples

GitHub Action (recommended)

# .github/workflows/qualbench.yml
name: QualBench
on: [pull_request]
jobs:
  quality-check:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: semcod/qualbench-action@v1
        with:
          tool: prollama
          fail_on_score: 70

GitLab CI

# .gitlab-ci.yml
qualbench:
  stage: test
  image: python:3.12-slim
  before_script:
    - pip install qualbench
  script:
    - qualbench run --tool prollama --json --fail-on-score 70
  only:
    - merge_requests

Azure DevOps

# azure-pipelines.yml
steps:
  - task: UsePythonVersion@0
    inputs:
      versionSpec: '3.12'
  - script: |
      pip install qualbench
      qualbench run --tool prollama --json --fail-on-score 70
    displayName: 'QualBench Quality Check'

Jenkins

// Jenkinsfile
stage('Quality Check') {
    steps {
        sh '''
            pip install qualbench
            qualbench run --tool prollama --fail-on-score 70
        '''
    }
}

CircleCI

# .circleci/config.yml
version: 2.1
jobs:
  quality:
    docker:
      - image: python:3.12-slim
    steps:
      - checkout
      - run: pip install qualbench
      - run: qualbench run --tool prollama --fail-on-score 70
workflows:
  quality-check:
    jobs:
      - quality

The problem

AI coding tools resolve 70–80% of benchmark tasks. But most AI-generated PRs are not mergeable without human fixes. Every existing benchmark asks "do tests pass?" — nobody asks "would a senior developer approve this PR?"

Six dimensions of production readiness

| Dimension | What it measures | Weight |
|---|---|---|
| Correctness | All tests pass, no regressions | 25% |
| Mergeability | Would a senior dev merge this? (1–5) | 25% |
| Security | New vulnerabilities introduced | 15% |
| Code quality | Complexity delta, dead code | 15% |
| Iterations | Attempts to reach acceptable output | 10% |
| Cost efficiency | USD per successful patch | 10% |

Verdicts: ready_to_merge (≥85), needs_review (65–84), not_merge_ready (<65).
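To make the weighting concrete, here is a minimal sketch of how six per-dimension scores could combine into a single Quality Score and a verdict. It is an illustration, not QualBench's actual implementation. It assumes every dimension has already been normalized to a 0–100 scale (including the 1–5 mergeability rating) and that the final score is a plain weighted average. The names `WEIGHTS`, `quality_score`, and `verdict` are hypothetical.

```python
# Weights in percent, taken from the dimensions table above (they sum to 100).
WEIGHTS = {
    "correctness": 25,
    "mergeability": 25,
    "security": 15,
    "code_quality": 15,
    "iterations": 10,
    "cost_efficiency": 10,
}


def quality_score(dimension_scores: dict) -> float:
    """Weighted average of per-dimension scores.

    Assumption: each score is already normalized to the range 0..100.
    """
    missing = WEIGHTS.keys() - dimension_scores.keys()
    if missing:
        raise ValueError(f"missing dimensions: {sorted(missing)}")
    total = sum(WEIGHTS[name] * dimension_scores[name] for name in WEIGHTS)
    return total / 100


def verdict(score: float) -> str:
    """Map a score to a verdict using the thresholds listed above."""
    if score >= 85:
        return "ready_to_merge"
    if score >= 65:
        return "needs_review"
    return "not_merge_ready"


# Made-up example values, for illustration only.
example = {
    "correctness": 100,
    "mergeability": 75,
    "security": 70,
    "code_quality": 60,
    "iterations": 80,
    "cost_efficiency": 70,
}
score = quality_score(example)
print(score, verdict(score))
```

In this sketch any fractional score below 85 falls into needs_review, which is one way to read the 65–84 band. The real tool may round or bucket scores differently.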

CLI

qualbench run --tool prollama          # score current diff
qualbench run --tool prollama --json   # portable JSON output
qualbench run --mode cheap             # lowest-cost models
qualbench quickstart                   # first score in 60 seconds
qualbench compare my_tool              # vs leaderboard
qualbench info                         # dataset summary
qualbench doctor                       # check dependencies

One portable format everywhere

CLI, API, GitHub Action — same JSON schema. See docs/schema.md.
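Because the output is plain JSON, any script can consume it. The sketch below shows one way a pipeline step might gate on a report. The field names `quality_score` and `verdict` are assumptions made for illustration; the authoritative field names are in docs/schema.md. The `gate` helper is hypothetical and not part of QualBench.

```python
import json


def gate(report_json: str, threshold: int = 70) -> int:
    """Return a process exit code: 0 if the score meets the threshold, else 1.

    Assumption: the report carries a numeric "quality_score" field and a
    string "verdict" field. Check docs/schema.md for the real names.
    """
    report = json.loads(report_json)
    score = report["quality_score"]
    print(f"score={score} verdict={report.get('verdict', 'unknown')}")
    return 0 if score >= threshold else 1


# Hand-written sample report, not real QualBench output.
sample = '{"quality_score": 78, "verdict": "needs_review"}'
exit_code = gate(sample, threshold=70)
```

A wrapper like this is only needed when the score feeds into custom logic. For a simple pass/fail gate, the documented `--fail-on-score` flag covers it.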

Adding your tool

cp runners/template.py runners/my_tool.py
# Implement run() → return portable schema
qualbench run --tool my_tool
# Submit PR with results

License

Licensed under Apache-2.0.
