Skip to content

[tests] refactor caching tests.#13235

Merged
sayakpaul merged 14 commits into
mainfrom
refactor-caching-tests
Jun 2, 2026
Merged

[tests] refactor caching tests.#13235
sayakpaul merged 14 commits into
mainfrom
refactor-caching-tests

Conversation

@sayakpaul
Copy link
Copy Markdown
Member

@sayakpaul sayakpaul commented Mar 9, 2026

What does this PR do?

  • Refactor MagCache tests.
  • Include TaylorSeer in our model-level caching test mixin.
  • Consider removing caching-related mixins from test_pipelines_common.py.

@sayakpaul sayakpaul marked this pull request as ready for review March 10, 2026 03:27
@sayakpaul sayakpaul changed the title [wip] [tests] refactor caching tests. [tests] refactor caching tests. Mar 10, 2026
@sayakpaul sayakpaul requested a review from DN6 March 10, 2026 03:27
@sayakpaul
Copy link
Copy Markdown
Member Author

@DN6 LMK your thoughts on removing the caching-related stuff we have in test_pipelines_common.py. IMO, they're not adding much value given we have model-level caching testers for all the caching methods we support.

@github-actions github-actions Bot added tests size/L PR with diff > 200 LOC labels Apr 17, 2026
@github-actions github-actions Bot added size/L PR with diff > 200 LOC and removed size/L PR with diff > 200 LOC labels May 1, 2026
@sayakpaul
Copy link
Copy Markdown
Member Author

@DN6 a gentle ping.

@HuggingFaceDocBuilderDev
Copy link
Copy Markdown

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@github-actions github-actions Bot removed the models label May 20, 2026
@sayakpaul sayakpaul requested a review from dg845 May 27, 2026 02:53
Copy link
Copy Markdown
Collaborator

@dg845 dg845 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR! I think we should consider retaining the original MagCache tests in tests/hooks/test_mag_cache.py in some form, since if I understand correctly those tests are testing if the MagCache implementation is logically correct, while MagCacheTesterMixin is testing whether MagCache works with a given model.

@sayakpaul
Copy link
Copy Markdown
Member Author

@dg845 thanks! I added back the MagCache tests and migrated to pytest for consistency. I initially removed them in the interest of consistency (all caching methods don't have them).

LMK your thoughts on "Consider removing caching-related mixins from test_pipelines_common.py" as well.

@sayakpaul sayakpaul requested a review from dg845 May 31, 2026 07:41
@sayakpaul
Copy link
Copy Markdown
Member Author

I also realized that the hook testing wasn't ever a part of our CI. Opened a PR #13848

"""

@torch.no_grad()
def _test_cache_inference(self):
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like FirstBlockCacheTesterMixin._test_cache_inference and TaylorSeerCacheTesterMixin._test_cache_inference are identical except for the cache context. Would it be possible to define a _test_cache_inference_with_context method in CacheTesterMixin that takes in the context name as an argument? Maybe something like

    @torch.no_grad()
    def _test_cache_inference_with_context(self, context_name: str):
        ...

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Feel free to do that in a follow-up. This PR is reserved for purely migration.

)

@torch.no_grad()
def _test_reset_stateful_cache(self):
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similarly to #13235 (comment), I think we may be able to define a context-aware version in CacheTesterMixin:

    @torch.no_grad()
    def _test_reset_stateful_cache_with_context(self, context_name: str):
        ...
        with model.cache_context(context_name):
            _ = model(**inputs_dict, return_dict=False)[0]
        ...

and then reuse it for both FirstBlockCacheTesterMixin and TaylorSeerCacheTesterMixin.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as above. If the method is fairly general, feel free to open a follow-up PR.

Copy link
Copy Markdown
Collaborator

@dg845 dg845 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! Left some small comments.

@dg845
Copy link
Copy Markdown
Collaborator

dg845 commented Jun 2, 2026

With respect to the caching mixins in test_pipelines_common.py, my current thoughts are that we probably should have pipeline-level cache testing mixins because cache techniques generally maintain state across different model forward passes within a denoising loop, and vary the model behavior on subsequent timesteps based on that state. For example, MagCacheState maintains an accumulated_steps counter that, if it reaches MagCacheConfig.max_skip_steps, will cause the next model forward pass to be fully computed.

So I think ideally we would have testing on three levels, following MagCache:

  1. Hook tests (tests/hooks/test_mag_cache.py): test if the hook is working and logically correct in isolation (kind of analogous to unit tests)
  2. Model tests (tests/models/testing_utils/cache.py::MagCacheTesterMixin): tests compatibility with a given model (integration tests at the model level)
  3. Pipeline tests (tests/pipelines/test_pipelines_common.py::MagCacheTesterMixin): tests compatibility with a given pipeline (integration tests at the pipeline level)

In practice, I'm not sure what the right way to separate model tests and pipeline tests is. I think (but am not sure) that model compatibility does not necessarily imply pipeline compatibility, so pipeline-level tests are not necessarily redundant. For example, I think that testing output quality as done by test_mag_cache_inference in the pipeline-level MagCacheTesterMixin makes sense because this isn't something a model-level test can capture.

@sayakpaul
Copy link
Copy Markdown
Member Author

Fair points esepcially because:

because cache techniques generally maintain state across different model forward passes within a denoising loop, and vary the model behavior on subsequent timesteps based on that state.

In practice, I'm not sure what the right way to separate model tests and pipeline tests is. I think (but am not sure) that model compatibility does not necessarily imply pipeline compatibility, so pipeline-level tests are not necessarily redundant. For example, I think that testing output quality as done by test_mag_cache_inference in the pipeline-level MagCacheTesterMixin makes sense because this isn't something a model-level test can capture.

I think a good boundary to consider when having both or just model-level tests is if the underlying feature depends on the specifics coming from pipelines. I think quantization is one of those cases. It doesn't matter if it comes from loading a pipeline or a model.

@sayakpaul sayakpaul merged commit ed07118 into main Jun 2, 2026
14 checks passed
@sayakpaul sayakpaul deleted the refactor-caching-tests branch June 2, 2026 02:14
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

size/L PR with diff > 200 LOC tests

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants