
NO-SNOW: Fix test_large_series_items and regr_syy #4053

Merged
sfc-gh-joshi merged 3 commits into main from joshi/fix-pandas-test_items
Jan 21, 2026

Conversation

@sfc-gh-joshi
Contributor

@sfc-gh-joshi sfc-gh-joshi commented Jan 20, 2026

  1. Which Jira issue is this PR addressing? Make sure that there is an accompanying issue to your PR.

    Fixes SNOW-NNNNNNN

  2. Fill out the following pre-review checklist:

    • I am adding a new automated test(s) to verify correctness of my new code
      • If this test skips Local Testing mode, I'm requesting review from @snowflakedb/local-testing
    • I am adding new logging messages
    • I am adding a new telemetry message
    • I am adding new credentials
    • I am adding a new dependency
    • If this is a new feature/behavior, I'm adding the Local Testing parity changes.
    • I acknowledge that I have ensured my changes to be thread-safe. Follow the link for more information: Thread-safe Developer Guidelines
    • If adding any arguments to public Snowpark APIs or creating new public Snowpark APIs, I acknowledge that I have ensured my changes include AST support. Follow the link for more information: AST Support Guidelines
  3. Please describe how your code solves the related issue.

This PR fixes 3 issues:

(1) src/snowflake/snowpark/functions.py::snowpark.functions.regr_syy was failing because the doctest did not sort the query output.
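As a rough illustration of why sorting matters in a doctest, here is a minimal sketch that does not use Snowpark at all. `fetch_rows` and `run_docstring` are hypothetical helpers invented for this example; the shuffle stands in for a query result whose row order is not guaranteed.

```python
import doctest
import random


def fetch_rows():
    """Return rows in arbitrary order, like a query with no ORDER BY.

    Sorting inside the doctest makes the expected output deterministic:

    >>> sorted(fetch_rows())
    [('a', 1.0), ('b', 2.5), ('c', 4.0)]
    """
    rows = [("a", 1.0), ("b", 2.5), ("c", 4.0)]
    random.shuffle(rows)  # simulate an unordered result set
    return rows


def run_docstring(func):
    """Run the doctest examples in func's docstring and return the results."""
    test = doctest.DocTestParser().get_doctest(
        func.__doc__, {func.__name__: func}, func.__name__, None, 0
    )
    return doctest.DocTestRunner(verbose=False).run(test)


result = run_docstring(fetch_rows)
```

Without the `sorted(...)` call, the same doctest would pass or fail depending on the shuffle, which is the kind of flakiness described above.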

(2) tests/integ/modin/series/test_items.py::test_items_large_series completes in ~40 seconds when run locally in single-threaded mode, but hangs for a very long time when parallelized with pytest-xdist (as in Jenkins). This occurs because the test creates 8193 SqlCounter instances in a loop, and each instance unconditionally performs an expensive traceback.format_stack() call; some individual calls on my machine took upwards of 9 seconds to complete. This may be amplified by the fact that each SqlCounter instance must acquire an RLock upon creation, but I didn't investigate too deeply. At any rate, lifting the SqlCounter out of the loop resolves the issue locally (passing Jenkins run).
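The shape of the fix can be sketched as follows. `StackCapturingCounter` is a hypothetical stand-in written for this example, not the real SqlCounter; it only mimics the two costs mentioned above (an RLock acquisition and a traceback.format_stack() call on creation), and the expected count of 0 is made up.

```python
import threading
import traceback


class StackCapturingCounter:
    """Hypothetical counter that records its creation stack eagerly."""

    _lock = threading.RLock()
    instances_created = 0

    def __init__(self, expected):
        with StackCapturingCounter._lock:
            StackCapturingCounter.instances_created += 1
        self.expected = expected
        self.actual = 0
        # The expensive part: formatted on every construction.
        self.creation_stack = traceback.format_stack()

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        if exc_type is None:
            assert self.actual == self.expected
        return False


def check_per_iteration(values):
    # Before: one counter (and one stack capture) per element.
    for value in values:
        with StackCapturingCounter(expected=0):
            _ = value


def check_hoisted(values):
    # After: a single counter wraps the whole loop.
    with StackCapturingCounter(expected=0):
        for value in values:
            _ = value


StackCapturingCounter.instances_created = 0
check_per_iteration(range(100))
per_iteration_instances = StackCapturingCounter.instances_created

StackCapturingCounter.instances_created = 0
check_hoisted(range(8193))
hoisted_instances = StackCapturingCounter.instances_created
```

Hoisting trades per-element count checks for one aggregate check over the loop, so the expected count has to cover all iterations.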

(3) Cursor notes that SqlCounter generates a traceback unconditionally, whether a test succeeds or fails. Though this operation typically takes ~0.1ms, the Jenkins runner runs >40k tests for pandas, and other tests with outlier stack traces like this one may also benefit. As such, this PR also adjusts SqlCounter to generate a stack trace only when counts fail.
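One way to defer the stack capture until a mismatch is detected is sketched below. `LazyStackCounter` is a hypothetical class for illustration only; the actual SqlCounter change in this PR may be structured differently.

```python
import traceback


class LazyStackCounter:
    """Hypothetical counter that formats a stack only when counts mismatch."""

    def __init__(self, expected):
        self.expected = expected
        self.actual = 0
        self.failure_stack = None  # stays None on the success path

    def record(self, n=1):
        self.actual += n

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        if exc_type is None and self.actual != self.expected:
            # Pay for the stack trace only on the failure path.
            self.failure_stack = traceback.format_stack()
            raise AssertionError(
                f"expected {self.expected} queries, saw {self.actual}"
            )
        return False


# Passing case: no stack is ever formatted.
with LazyStackCounter(expected=2) as ok:
    ok.record(2)

# Failing case: the stack is captured just before the assertion is raised.
bad = LazyStackCounter(expected=1)
try:
    with bad:
        bad.record(3)
except AssertionError as err:
    message = str(err)
```

A stack captured at exit reflects the point where the check ran rather than where the counter was constructed; for counters used as `with` blocks the calling frames are the same.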

Benchmark of serial pytest tests/integ/modin/series/test_items.py for a very rough estimate of speed improvement:
Always generating stack trace:

Benchmark 1: pytest tests/integ/modin/series/test_items.py
  Time (mean ± σ):     15.416 s ±  0.467 s    [User: 4.481 s, System: 0.689 s]
  Range (min … max):   14.536 s … 16.007 s    10 runs

Skipping stack trace computation:

Benchmark 1: pytest tests/integ/modin/series/test_items.py
  Time (mean ± σ):     15.120 s ±  0.631 s    [User: 4.307 s, System: 0.654 s]
  Range (min … max):   14.100 s … 15.856 s    10 runs

The mean difference of 0.3s is not statistically significant at this sample size, but it's promising all the same, and may let us shave a few minutes off each CI run when considering the size of the pandas codebase.
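The significance claim can be checked roughly from the summary statistics alone. The sketch below computes Welch's t statistic and its approximate degrees of freedom, assuming the reported σ values are sample standard deviations over the 10 runs; `welch_t` is a helper written for this example.

```python
import math


def welch_t(mean_a, sd_a, n_a, mean_b, sd_b, n_b):
    """Return Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    var_a = sd_a ** 2 / n_a  # squared standard error of mean A
    var_b = sd_b ** 2 / n_b  # squared standard error of mean B
    t = (mean_a - mean_b) / math.sqrt(var_a + var_b)
    df = (var_a + var_b) ** 2 / (
        var_a ** 2 / (n_a - 1) + var_b ** 2 / (n_b - 1)
    )
    return t, df


# Always generating the stack trace vs. skipping it (values from the runs above).
t_stat, dof = welch_t(15.416, 0.467, 10, 15.120, 0.631, 10)
print(f"t = {t_stat:.2f}, df = {dof:.1f}")
```

This gives t of about 1.19 with about 16.6 degrees of freedom, below the two-sided 5% critical value of roughly 2.1, which is consistent with the difference not being significant at this sample size.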

@sfc-gh-joshi sfc-gh-joshi added NO-CHANGELOG-UPDATES This pull request does not need to update CHANGELOG.md NO-PANDAS-CHANGEDOC-UPDATES This PR does not update Snowpark pandas docs labels Jan 20, 2026
@github-actions github-actions Bot added snowpark-pandas and removed NO-CHANGELOG-UPDATES This pull request does not need to update CHANGELOG.md NO-PANDAS-CHANGEDOC-UPDATES This PR does not update Snowpark pandas docs labels Jan 20, 2026
@sfc-gh-joshi sfc-gh-joshi changed the title NO-SNOW: Fix test_large_series_items jenkins flake NO-SNOW: Fix test_large_series_items and regr_syy Jan 20, 2026
@sfc-gh-joshi sfc-gh-joshi force-pushed the joshi/fix-pandas-test_items branch from f6ac8b2 to 36f8225 Compare January 20, 2026 22:45
@sfc-gh-joshi sfc-gh-joshi added NO-CHANGELOG-UPDATES This pull request does not need to update CHANGELOG.md NO-PANDAS-CHANGEDOC-UPDATES This PR does not update Snowpark pandas docs labels Jan 20, 2026
@sfc-gh-joshi sfc-gh-joshi marked this pull request as ready for review January 21, 2026 00:22
@sfc-gh-joshi sfc-gh-joshi requested review from a team as code owners January 21, 2026 00:22
@sfc-gh-joshi sfc-gh-joshi merged commit e0d2c65 into main Jan 21, 2026
43 of 53 checks passed
@sfc-gh-joshi sfc-gh-joshi deleted the joshi/fix-pandas-test_items branch January 21, 2026 19:40
@github-actions github-actions Bot locked and limited conversation to collaborators Jan 21, 2026

Labels

NO-CHANGELOG-UPDATES This pull request does not need to update CHANGELOG.md NO-PANDAS-CHANGEDOC-UPDATES This PR does not update Snowpark pandas docs snowpark-pandas


3 participants