Conversation
Up to standards ✅🟢 Issues
| Metric | Results |
|---|---|
| Complexity | 0 |
| Duplication | 0 |
gsprochette
left a comment
Thanks for this absolute gold PR. Already super good; I left some comments for small improvements here and there, almost ready.
```diff
   - shell: bash
-    run: uv sync --extra dev --extra lmharness --extra vllm
+    run: uv sync --extra dev --extra lmharness
```
Why are we still installing lmharness? Shouldn't it be isolated as well?
.github/workflows/tests.yaml
Outdated
```diff
   run: |
     echo "Running tests with up to 3 reruns on failure using $PYTEST_WORKERS workers..."
-    uv run pytest -n $PYTEST_WORKERS -m "not (slow or style or high_cpu or cuda or distributed)" --reruns 3 --reruns-delay 10 --maxfail=1
+    uv run pytest -n $PYTEST_WORKERS -m "cpu and not slow and not requires_intel" --reruns 3 --reruns-delay 10 --maxfail=1
```
I think this will run the style tests a second time, since they are already running during the linting job (line 56).
```diff
- @pytest.mark.slow
+ @pytest.mark.requires_awq
```
Is it not slow anymore?
I thought slow was a wrong tag here, since AWQ is not time-intensive like GPTQ for instance, but how do you feel?
tests/algorithms/testers/upscale.py
Outdated
```python
@pytest.mark.cuda
# Takes too long to run on CPU, so we mark it as slow
@pytest.mark.slow
```
You could also do something like

```python
@classmethod
def compatible_devices(cls) -> list[str]:  # type: ignore[override]
    """Get the devices compatible with this algorithm, except CPU (too slow)."""
    return [d for d in super().compatible_devices() if d != "cpu"]
```

if the intent is to not run this test on CPU.
I do agree that the cuda mark is not the correct tool, because it will just create a CPU test either way, and the test will end up with both the cpu and cuda marks...
I don't think we need the slow mark now, do we? Maybe we still want it, or maybe it should be a comment stating why we skip the CPU case... I'll let you decide.
tests/evaluation/test_registry.py
Outdated
```python
from pruna.evaluation.metrics.metric_base import BaseMetric
from pruna.evaluation.metrics.registry import MetricRegistry

pytestmark = pytest.mark.cpu
```
This is redundant with the cpu default in conftest.py and not very explicit.
tests/telemetry/test_metrics.py
Outdated
```python
    set_opentelemetry_log_level,
)

pytestmark = pytest.mark.cpu
```
This is redundant with the cpu default in conftest.py, and not very explicit.
```toml
# Intel extension is tightly coupled with the torch version
intel = [
    "intel-extension-for-pytorch>=2.7.0",
    "torch>=2.7.0,<2.9.0",
```
I get that it's tightly coupled with torch, but if intel-extension already declares its torch dependency we shouldn't need to declare it here, no? Same goes for torchvision.
I see what you mean and I softly disagree. Yes, the version already defines which torch version it's supposed to use, but it doesn't declare it in its dependencies; same for the torchvision version. So someone could install ipex with the absolutely wrong torch stack and get really cryptic errors. I think it's better to be explicit here.
Ah, if the torch version is not in the package dependencies then absolutely, let's pin torch so that the intel install works.
gsprochette
left a comment
Looks great, let's merge it! I just left one comment for you to decide about the "slow" mark on upscale :) 💅
gsprochette
left a comment
There was a problem hiding this comment.
Meant to approve, please refer to previous comment 🙃
Description
Isolated whisper dependencies into a dedicated [whisper] extra. Moved ctranslate2 and whisper-s2t out of the default full install group into [whisper] to prevent side-effects from whisper_s2t imports breaking unrelated tests at collection time. The previously hard-skipped TestWhisperS2T is now gated behind requires_whisper instead.
Pin torch/torchvision bounds on the [intel] extra. intel-extension-for-pytorch is tightly coupled to the torch version; added explicit torch>=2.7.0,<2.9.0 and torchvision>=0.22.0,<0.24.0 constraints so installs don't silently break.
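Put together, the pinned extra could look like the following pyproject.toml fragment; this is a sketch reconstructed from the bounds stated above, and the exact entry in the repository may differ:

```toml
[project.optional-dependencies]
# ipex releases track specific torch releases, so pin torch and
# torchvision explicitly alongside the extension.
intel = [
    "intel-extension-for-pytorch>=2.7.0",
    "torch>=2.7.0,<2.9.0",
    "torchvision>=0.22.0,<0.24.0",
]
```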
Also switched TestIPEXLLM to use the opt_125m model and dropped the latency metric. The algorithm didn't work on a tiny model because the latent size is too small; we also need a specific inference handler for the metrics, which can be done later, so the evaluation part is removed for now.
IPEX is strictly coupled to the torch version and is not actively maintained, so it makes sense to run it in the nightlies but not in CI, as it requires its own setup.
Declared packaging conflicts between intel/awq and gptq/awq. These extras pull in incompatible transitive deps; the new [tool.uv.conflicts] entries surface the error at resolve time instead of runtime.
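For reference, uv expresses such conflicts through a `[tool.uv]` table; a sketch matching the extras named above (the exact entries in the PR may differ):

```toml
[tool.uv]
# Declare mutually incompatible extras so `uv sync` fails at resolve
# time instead of producing a broken environment at runtime.
conflicts = [
    [{ extra = "intel" }, { extra = "awq" }],
    [{ extra = "gptq" }, { extra = "awq" }],
]
```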
Added requires_* markers for optional-extra tests. New markers requires_awq, requires_gptq, requires_stable_fast, requires_vllm, requires_intel, requires_lmharness, and requires_whisper let CI select/deselect tests based on what's installed rather than relying on implicit skips or collection errors.
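One common way to back such markers is to probe for the optional package at collection time. A minimal sketch, assuming the helper name and the marker-to-module mapping below (both hypothetical, not taken from the PR):

```python
import importlib.util


def extra_installed(module_name: str) -> bool:
    """Return True if the top-level module backing an extra can be imported."""
    return importlib.util.find_spec(module_name) is not None


# Hypothetical mapping from requires_* marker to the module it needs;
# real module names may differ per extra.
MARKER_TO_MODULE = {
    "requires_awq": "awq",
    "requires_gptq": "gptq",
    "requires_whisper": "whisper_s2t",
}
```

CI can then deselect a marker whenever its module is missing, instead of letting imports fail during collection.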
Removed the old HIGH_RESOURCE_FIXTURES list and runtime CUDA/multi-GPU detection. Tests without an explicit hardware mark are now auto-tagged cpu. Incompatible device_parametrized variants (e.g. a CUDA-only algorithm's CPU test) are deselected at collection time instead of skipping after expensive fixture setup. Added explicit pytestmark = pytest.mark.cpu to pure-CPU test modules (test_compatibility_symmetry, test_evalharness_metrics, test_registry, test_metrics) so they're correctly selected by -m "cpu".
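The auto-tagging described above can be sketched as a conftest.py hook; the marker names follow the PR description, but the actual implementation may differ:

```python
# Markers treated as explicit hardware tags (names assumed from the PR).
HARDWARE_MARKERS = {"cpu", "cuda", "distributed", "high_cpu"}


def needs_default_cpu_marker(marker_names) -> bool:
    """True when the test declares none of the known hardware markers."""
    return not (set(marker_names) & HARDWARE_MARKERS)


def pytest_collection_modifyitems(config, items):
    # Tag untagged tests with "cpu" so `-m "cpu"` selects them.
    for item in items:
        if needs_default_cpu_marker(m.name for m in item.iter_markers()):
            item.add_marker("cpu")  # add_marker also accepts a plain string
```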
Updated CI workflow. Dropped --extra vllm from the default uv sync in the setup action. Changed the pytest filter from -m "not (slow or style or high_cpu or cuda or distributed)" to the positive-selection form -m "cpu and not slow and not requires_intel".
Related Issue
Fixes #(issue number)
Type of Change
Testing
uv run pytest -m "cpu and not slow"

For full setup and testing instructions, see the Contributing Guide.
Checklist
Thanks for contributing to Pruna! We're excited to review your work.
New to contributing? Check out our Contributing Guide for everything you need to get started.
First Prune (1-year OSS anniversary)
First Prune marks one year of Pruna’s open-source work. During the initiative window, qualifying merged contributions count toward First Prune. You can earn credits for our performance models via our API.
If you’d like your contribution to count toward First Prune, here’s how it works: