@planetf1 commented Jan 23, 2026

Add Code Coverage Tracking

Type of PR

  • Bug Fix
  • New Feature
  • Documentation
  • Other

Description

Adds pytest-cov for code coverage tracking to improve test quality visibility. Coverage runs automatically for all test invocations (local and CI).

Changes

  • Added pytest-cov>=6.0.0 to dev dependencies
  • Configured coverage for mellea and cli packages in pyproject.toml
  • Enabled coverage by default via pytest addopts
  • Generates both terminal and HTML reports automatically

Usage

# Coverage runs automatically
uv run pytest test -v

# View HTML report (open is the macOS command; on Linux use xdg-open)
open htmlcov/index.html

# Disable coverage if needed
uv run pytest test --no-cov

Testing

  • Tests added to the respective file if code was changed (N/A - configuration only)
  • New code has 100% coverage if code was added (N/A - configuration only)
  • Ensure existing tests and GitHub automation pass

Fixes #352

Status

NOTE: Currently testing, reviewing the output, and checking the impact on test timings (saw a slow test run). Will move this out of draft when ready.

@github-actions

The PR description has been updated. Please fill out the template for your PR to be reviewed.

mergify bot commented Jan 23, 2026

Merge Protections

Your pull request matches the following merge protections and will not be merged until they are valid.

🟢 Enforce conventional commit

Wonderful, this rule succeeded.

Make sure that we follow https://www.conventionalcommits.org/en/v1.0.0/

  • title ~= ^(fix|feat|docs|style|refactor|perf|test|build|ci|chore|revert|release)(?:\(.+\))?:

- Add pytest-cov to dev dependencies
- Configure coverage in pyproject.toml
- Enable coverage by default for all test runs
@planetf1 force-pushed the feat/add-code-coverage-clean branch from c2e2a6a to 8be3e41 on January 23, 2026 18:18

@planetf1 (Contributor, Author)

Code Coverage Summary

Generated: 2026-01-23
Overall Coverage: 54.25% (3,481 / 6,417 statements)
Test Results: 218 passed, 59 skipped, 2 xpassed, 109 warnings
Runtime: 22m 8s
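
As a quick sanity check on the headline figure, the overall percentage can be re-derived from the two statement counts quoted above. This is a minimal sketch; the helper name `coverage_percent` is made up for illustration and is not part of pytest-cov.

```python
# Statement counts taken from the summary above.
COVERED_STATEMENTS = 3481
TOTAL_STATEMENTS = 6417


def coverage_percent(covered: int, total: int) -> float:
    """Return statement coverage as a percentage of total statements."""
    if total <= 0:
        raise ValueError("total must be a positive statement count")
    return 100.0 * covered / total


overall = coverage_percent(COVERED_STATEMENTS, TOTAL_STATEMENTS)
missed = TOTAL_STATEMENTS - COVERED_STATEMENTS

print(f"Overall coverage: {overall:.2f}%")  # Overall coverage: 54.25%
print(f"Statements not covered: {missed}")  # Statements not covered: 2936
```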


Executive Summary

Coverage tracking successfully implemented with pytest-cov. Baseline established at 54.25% overall coverage across mellea and cli packages.

Key Findings

Strong Coverage (roughly 80% and above)

  • Core modules: base.py (86.5%), backend.py (90%), requirement.py (100%)
  • Formatters: template_formatter.py (91.7%), chat_formatter.py (100%)
  • Backends: watsonx.py (85.9%), litellm.py (81.2%), ollama.py (79.8%)
  • Components: mify.py (95.2%), intrinsic/rag.py (95.4%)
  • Sampling: majority_voting.py (95.4%), sofai.py (79.5%)

⚠️ Needs Attention (mostly <50%)

  • CLI commands: alora/, decompose/, eval/ (0% - not tested)
  • Backends: vllm.py (20.3%), tools.py (43.1%), huggingface.py (41.1%)
  • Components: simple.py (30.3%), genslot.py (67.4%)
  • Sampling: budget_forcing.py (21.1%), budget_forcing_alg.py (2.6%)
  • Requirements: tool_reqs.py (20.5%), guardian.py (0%)

Coverage by Component

1. Core Framework (86.5% avg)

| Module | Coverage | Missing / Total | Priority |
|---|---|---|---|
| core/base.py | 86.5% | 42/311 | Medium |
| core/backend.py | 90.0% | 4/40 | Low |
| core/requirement.py | 100% | 0/61 | |
| core/sampling.py | 87.2% | 5/39 | Low |
| core/utils.py | 83.1% | 11/65 | Low |

Analysis: Core framework well-tested. Missing coverage mostly edge cases and error paths.
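
The Missing column appears to read as missed statements over total statements. Under that assumption, each Coverage value can be re-derived from its row, as this small sketch shows for the Core Framework table (the function name is hypothetical):

```python
# (module, missed statements, total statements) from the Core Framework table.
CORE_ROWS = [
    ("core/base.py", 42, 311),
    ("core/backend.py", 4, 40),
    ("core/requirement.py", 0, 61),
    ("core/sampling.py", 5, 39),
    ("core/utils.py", 11, 65),
]


def coverage_from_missing(missed: int, total: int) -> float:
    """Convert a 'missed/total' pair into a coverage percentage."""
    return 100.0 * (total - missed) / total


# One-decimal percentages, matching the precision used in the report.
derived = {
    module: f"{coverage_from_missing(missed, total):.1f}"
    for module, missed, total in CORE_ROWS
}

for module, percent in derived.items():
    print(f"{module:<22} {percent:>5}%")
```

The same conversion applies to the other tables in this report, which makes it easy to spot-check a row when the numbers are updated.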

2. Backends (58.7% avg)

| Backend | Coverage | Missing / Total | Priority |
|---|---|---|---|
| watsonx.py | 85.9% | 30/213 | Low |
| litellm.py | 81.2% | 43/229 | Low |
| ollama.py | 79.8% | 51/252 | Medium |
| openai.py | 53.4% | 170/365 | High |
| huggingface.py | 41.1% | 274/465 | High |
| vllm.py | 20.3% | 153/192 | Critical |
| tools.py | 43.1% | 86/151 | High |
| utils.py | 27.0% | 27/37 | High |

Analysis: Production backends (Ollama, LiteLLM, Watsonx) well-tested. HuggingFace and vLLM need significant test expansion.

3. Components (72.8% avg)

| Component | Coverage | Missing / Total | Priority |
|---|---|---|---|
| mify.py | 95.2% | 6/125 | |
| intrinsic/rag.py | 95.4% | 2/43 | |
| richdocument.py | 89.3% | 9/84 | Low |
| mobject.py | 83.1% | 10/59 | Low |
| instruction.py | 81.8% | 12/66 | Low |
| chat.py | 77.1% | 16/70 | Medium |
| genslot.py | 67.4% | 77/236 | High |
| simple.py | 30.3% | 23/33 | High |
| unit_test_eval.py | 0% | 71/71 | Medium |

Analysis: Core components well-tested. genslot.py needs more edge case coverage. simple.py and unit_test_eval.py undertested.

4. Requirements (61.5% avg)

| Module | Coverage | Missing / Total | Priority |
|---|---|---|---|
| python_reqs.py | 92.3% | 5/65 | |
| requirement.py | 64.4% | 16/45 | Medium |
| md.py | 62.2% | 17/45 | Medium |
| tool_reqs.py | 20.5% | 35/44 | High |
| guardian.py | 0% | 155/155 | Critical |

Analysis: Python requirements well-tested. Guardian safety module completely untested (0%).

5. Sampling (60.8% avg)

| Module | Coverage | Missing / Total | Priority |
|---|---|---|---|
| majority_voting.py | 95.4% | 4/86 | |
| base.py | 85.7% | 14/98 | Low |
| sofai.py | 79.5% | 41/200 | Medium |
| budget_forcing.py | 21.1% | 56/71 | High |
| budget_forcing_alg.py | 2.6% | 74/76 | Critical |

Analysis: Majority voting excellent. Budget forcing algorithms critically undertested.

6. CLI Commands (0% avg)

| Module | Coverage | Missing / Total | Priority |
|---|---|---|---|
| alora/* | 0% | 94/94 | Low* |
| decompose/* | 0% | 1,000+/1,000+ | Low* |
| eval/* | 0% | 165/165 | Low* |
| m.py | 0% | 12/12 | Low* |

Analysis: CLI commands not covered by unit tests. *Low priority as these are integration/E2E features tested manually.

Development

Successfully merging this pull request may close these issues.

test: Add code coverage