[OMNIML-2850] [3/n] Adds sparse attention calibration #538
Conversation
Codecov Report
❌ Patch coverage is …

Additional details and impacted files:

```
@@            Coverage Diff             @@
##             main     #538      +/-   ##
==========================================
- Coverage   73.74%   73.54%   -0.21%
==========================================
  Files         199      205       +6
  Lines       21183    22000     +817
==========================================
+ Hits        15621    16179     +558
- Misses       5562     5821     +259
```
Force-pushed 8c7ee86 → da6f627
Force-pushed 525a119 → c9d7008
Force-pushed 7727793 → 2864629
Force-pushed 2864629 → ca7e24e
@kaix-nv GitHub is showing 7000+ lines of code as part of this PR. Is that accurate?
Force-pushed 3474b6f → 74a29ea
Force-pushed 0553ec6 → a5136e8
Force-pushed 5cd6149 → 4b7efca
Force-pushed 4b7efca → 5b22b85
modelopt/torch/sparsity/attention_sparsity/calibration/calibrate.py (outdated, resolved review thread):
How long does running `pytest tests/gpu/torch/sparsity` take?
tests/unit/torch/sparsity/attention_sparsity/test_sparse_attention_calibration.py (resolved review thread):
```python
import pytest

# Skip this module at collection time if transformers is not installed.
pytest.importorskip("transformers")
```
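Called at module level, `pytest.importorskip("transformers")` skips the whole test file during collection when transformers is unavailable, so the unit suite still runs cleanly in minimal environments.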
tests/unit/torch/sparsity/attention_sparsity/test_threshold_info.py (outdated, resolved review thread)
@kevalmorabia97 All feedback has been addressed. Please take another look. Thanks.
Force-pushed 9a7ae2a → 7529d30
kevalmorabia97 left a comment:
Some minor comments. Otherwise LGTM. Thanks for addressing my comments.
modelopt/torch/sparsity/attention_sparsity/calibration/calibrate.py (three outdated, resolved review threads)
Can we merge this file with ruler_utils.py and name it ruler_dataset.py? Both are specific to the RULER dataset only.
Signed-off-by: Kai Xu <kaix@nvidia.com>
Force-pushed 7529d30 → 2e3059b
Force-pushed 2e3059b → 179e8dd
Edwardf0t1 left a comment:
Add codeowner approval.
@kaix-nv Will the VSA support be your next PR? =)
Yes, the VSA PR will be submitted soon.
What does this PR do?
Type of change: new feature
Overview: Adds sparse attention calibration (part 3/n of OMNIML-2850).
Usage
The calibration method
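The usage description above is truncated in this conversation view. Below is a purely hypothetical sketch: the entry-point name `calibrate` and its arguments are assumptions inferred from the file path `modelopt/torch/sparsity/attention_sparsity/calibration/calibrate.py`, not the PR's actual API.

```python
# Hypothetical usage sketch: the real entry point and signature are not
# visible in this conversation; `calibrate` and its arguments are assumptions.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/Qwen3-30B-A3B-Thinking-2507"  # the model used in the Testing section
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)
tokenizer = AutoTokenizer.from_pretrained(model_id)

# from modelopt.torch.sparsity.attention_sparsity.calibration.calibrate import calibrate
# calibrated_model = calibrate(model, dataloader=calib_dataloader, target_sparsity=0.9)
```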
Calibration Algorithm
Why choose the inverse power model?
The inverse power model better fits the relationship between the sparsity ratio and `threshold_scale_factor`.
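A minimal sketch of what fitting such a model could look like, assuming a parameterization like `scale ≈ a * (1 - sparsity)^(-b)`; the PR's exact functional form and data are not shown in this conversation, so the points below are illustrative only.

```python
# Hedged sketch: fit an inverse power model relating the target sparsity ratio
# to threshold_scale_factor. The parameterization and data points are assumptions.
import numpy as np
from scipy.optimize import curve_fit

def inverse_power(sparsity, a, b):
    # Scale factor grows without bound as the sparsity target approaches 1.0.
    return a * np.power(1.0 - sparsity, -b)

# Illustrative (hypothetical) calibration observations.
sparsity_ratios = np.array([0.5, 0.6, 0.7, 0.8, 0.9])
scale_factors = np.array([1.8, 2.3, 3.1, 4.6, 8.2])

(a, b), _ = curve_fit(inverse_power, sparsity_ratios, scale_factors, p0=(1.0, 1.0))
print(f"threshold_scale_factor ≈ {a:.2f} * (1 - sparsity)^(-{b:.2f})")
```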

Runtime Flexibility
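The body of this section is elided in the conversation view. One plausible reading, stated here as an assumption, is that a fitted model lets the runtime derive a threshold for any requested sparsity target without recalibration:

```python
# Hedged continuation of the fitting sketch above: once (a, b) are calibrated,
# a threshold scale for any runtime sparsity target is a closed-form evaluation.
def scale_for(target_sparsity: float, a: float, b: float) -> float:
    return a * (1.0 - target_sparsity) ** (-b)

print(scale_for(0.75, a=0.94, b=0.94))  # illustrative fitted values
```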
Testing
The calibration results for Qwen/Qwen3-30B-A3B-Thinking-2507 are shown below and are mostly consistent with the ground-truth numbers collected from the kernel side.
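As a hedged illustration of what "mostly consistent" could mean in a test (the values, tolerance, and comparison below are assumptions, not the PR's actual numbers):

```python
# Hedged sketch: compare calibrated scale factors to kernel-side ground truth.
import torch

calibrated = torch.tensor([1.37, 2.05, 3.12])    # hypothetical calibration output
ground_truth = torch.tensor([1.40, 2.00, 3.20])  # hypothetical kernel-side numbers

# Passes when every element agrees within 5% relative tolerance.
torch.testing.assert_close(calibrated, ground_truth, rtol=0.05, atol=0.0)
```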
Before your PR is "Ready for review"
Additional Information