
Feature/snowflake s3 stage#20

Open
abhishek-pattern wants to merge 3 commits into main from feature/snowflake-s3-stage-v1


abhishek-pattern commented Mar 2, 2026

Updates:

  • New use_s3_stage parameter in publish_pandas and query_pandas_from_snowflake
  • New BatchInferencePipeline for inference using s3 stage
  • Added automatic warehouse selection: specify the warehouse only as "xl", "med", or "xs", and the full warehouse is derived from tags and current.is_production (backwards compatible)
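
A minimal sketch of how the size-shorthand resolution described above could work. The function name `resolve_warehouse`, the `ds.domain:` tag key, the `SHARED` fallback, and the `<DOMAIN>_<ENV>_<SIZE>_WH` naming scheme are all assumptions for illustration; the actual mapping lives in snowflake_connection.py and may differ.

```python
from typing import Iterable

# Size shorthands named in the PR description.
SIZE_SHORTHANDS = {"xs", "med", "xl"}


def resolve_warehouse(warehouse: str, tags: Iterable[str], is_production: bool) -> str:
    """Resolve a size shorthand to a full warehouse name (hypothetical scheme)."""
    # Backwards compatible: anything that is not a shorthand is returned unchanged.
    if warehouse.lower() not in SIZE_SHORTHANDS:
        return warehouse

    # Assumption: the domain comes from a tag of the form "ds.domain:<name>".
    domain = "SHARED"
    for tag in tags:
        key, _, value = tag.partition(":")
        if key == "ds.domain" and value:
            domain = value.upper()

    env = "PROD" if is_production else "DEV"
    return f"{domain}_{env}_{warehouse.upper()}_WH"
```

In a flow, `is_production` would presumably be read from Metaflow's `current.is_production`, and an explicit full warehouse name keeps working as before because it bypasses the mapping.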

Copilot AI review requested due to automatic review settings March 2, 2026 04:01
wiz-55ccc8b716 bot commented Mar 2, 2026

Wiz Scan Summary

| Scanner | Findings |
| --- | --- |
| Vulnerabilities | - |
| Sensitive Data | - |
| Secrets | - |
| IaC Misconfigurations | - |
| SAST Findings | 1 Medium |
| Software Management Findings | - |
| Total | 1 Medium |



Copilot AI left a comment


Pull request overview

This PR introduces an S3-stage-based path for moving pandas data between Snowflake and Metaflow, centralizes shared SQL helpers (templating, query-tagging, SQL file loading), and adds a BatchInferencePipeline utility to orchestrate Snowflake→S3→inference→Snowflake workflows.

Changes:

  • Add shared sql_utils (query templating, SQL file loading, select.dev query-tag injection) and update Snowflake execution to automatically tag statements.
  • Add S3/Snowflake stage utilities + optional use_s3_stage support for publish_pandas / query_pandas_from_snowflake, plus a new BatchInferencePipeline.
  • Expand/adjust unit + functional tests and publish API docs; update CI to run pytest in parallel.
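
To illustrate the query-tag injection mentioned in the first bullet, here is a rough sketch that prepends a JSON comment to every statement in a SQL batch. The function name and the tag payload are hypothetical, and the naive split on `;` is a simplification that would break on semicolons inside string literals; the PR's sql_utils may handle this differently.

```python
import json


def add_comment_to_each_statement(sql: str, tags: dict) -> str:
    """Prefix each statement in a SQL batch with a JSON tag comment (illustrative only)."""
    comment = f"/* {json.dumps(tags, sort_keys=True)} */"
    # Naive statement split: assumes no ';' appears inside literals or comments.
    statements = [s.strip() for s in sql.split(";") if s.strip()]
    return "\n".join(f"{comment}\n{statement};" for statement in statements)


tagged = add_comment_to_each_statement(
    "SELECT 1; SELECT 2;",
    {"app": "demo-flow", "team": "ds"},  # hypothetical tag values
)
```

Tagging per statement rather than per batch means each query can still be attributed when a multi-statement script is executed.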

Reviewed changes

Copilot reviewed 31 out of 32 changed files in this pull request and generated 11 comments.

| File | Description |
| --- | --- |
| tests/unit_tests/snowflake/test__get_select_dev_query_tags.py | New unit tests for get_select_dev_query_tags behavior and perimeter fallback. |
| tests/unit_tests/snowflake/test__execute_sql.py | Updates _execute_sql tests for new error behavior and multi-statement semantics. |
| tests/unit_tests/snowflake/test__add_comment_to_each_sql_statement.py | Updates tests to validate revised SQL annotation behavior. |
| tests/functional_tests/metaflow/test__warehouse.py | Adds a Metaflow flow validating warehouse selection behavior. |
| tests/functional_tests/metaflow/test__publish.py | Standardizes project/table naming and ensures tags are passed into functional runs. |
| tests/functional_tests/metaflow/test__pandas_utc.py | Renames tables/project and passes tags into flow runs. |
| tests/functional_tests/metaflow/test__pandas_s3.py | New functional flow covering pandas read/write via S3 stage. |
| tests/functional_tests/metaflow/test__pandas.py | Renames tables/project and passes tags into flow runs. |
| tests/functional_tests/metaflow/test__get_select_dev_query_tags.py | Removes older functional tests for query-tag warnings (replaced by unit tests). |
| tests/functional_tests/metaflow/test__batch_inference_pipeline.py | New functional flow validating BatchInferencePipeline. |
| src/ds_platform_utils/sql_utils.py | New shared SQL utilities including select.dev query-tag injection into SQL batches. |
| src/ds_platform_utils/pandas_utils.py | Adds chunk-size estimation helper for parquet upload chunking. |
| src/ds_platform_utils/metaflow/write_audit_publish.py | Refactors to use shared SQL helpers + new Snowflake connection helper; adjusts warehouse parameter type. |
| src/ds_platform_utils/metaflow/snowflake_connection.py | Adds warehouse mapping logic and passes warehouse/timezone/query_tag into Snowflake connection creation. |
| src/ds_platform_utils/metaflow/s3_stage.py | Implements Snowflake↔S3 stage COPY helpers and schema inference for parquet loads. |
| src/ds_platform_utils/metaflow/s3.py | Adds Metaflow S3 helpers to read/write parquet via Polars/Pandas. |
| src/ds_platform_utils/metaflow/pandas.py | Adds use_s3_stage support for publish/query pandas, centralizes schema substitution, and uses estimated chunk sizes. |
| src/ds_platform_utils/metaflow/batch_inference_pipeline.py | Adds a pipeline abstraction for multi-step batch inference using Snowflake + S3. |
| src/ds_platform_utils/metaflow/_consts.py | Renames non-prod schema constant and adds S3 stage configuration constants. |
| src/ds_platform_utils/metaflow/__init__.py | Exposes BatchInferencePipeline from the public Metaflow module. |
| src/ds_platform_utils/_snowflake/write_audit_publish.py | Refactors to use shared SQL helpers and updated schema constant naming. |
| src/ds_platform_utils/_snowflake/run_query.py | Changes _execute_sql to auto-inject select.dev query tags and adds DEBUG_QUERY printing. |
| pyproject.toml | Bumps version and adds dependencies (polars, pytest-xdist). |
| docs/metaflow/*.md | Adds API documentation pages for Metaflow-facing utilities. |
| README.md | Adds links to the new docs pages. |
| .github/workflows/ci-cd-ds-platform-utils.yaml | Ignores docs-only PRs for CI and enables pytest-xdist parallel runs. |
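
As a rough illustration of the chunk-size estimation listed for pandas_utils.py above, the sketch below derives a rows-per-chunk figure from the frame's in-memory size and a target chunk size. The function name, its signature, and the 100 MiB default are assumptions, not the helper added in this PR.

```python
import math


def estimate_chunk_rows(
    n_rows: int,
    total_bytes: int,
    target_chunk_bytes: int = 100 * 1024**2,  # assumed default target of 100 MiB
) -> int:
    """Estimate how many rows fit in one chunk of roughly target_chunk_bytes."""
    if n_rows <= 0 or total_bytes <= 0:
        # Degenerate input: fall back to a single-row minimum.
        return max(n_rows, 1)
    bytes_per_row = total_bytes / n_rows
    rows = math.floor(target_chunk_bytes / bytes_per_row)
    # Never return fewer than 1 row or more rows than the frame has.
    return max(1, min(n_rows, rows))
```

With pandas, `total_bytes` could come from `df.memory_usage(deep=True).sum()` and `n_rows` from `len(df)`; in-memory size only approximates the compressed parquet size, so the target should be treated as a loose bound.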
Comments suppressed due to low confidence (1)

tests/unit_tests/snowflake/test__add_comment_to_each_sql_statement.py:1

  • This test imports via src.ds_platform_utils..., which typically fails when the package is installed/loaded as ds_platform_utils (and is inconsistent with the other tests in this PR). Import from ds_platform_utils.sql_utils instead so tests work in normal packaging/CI environments.


Copilot AI commented Mar 2, 2026

@abhishek-pattern I've opened a new pull request, #21, to work on those changes. Once the pull request is ready, I'll request review from you.

abhishek-pattern force-pushed the feature/snowflake-s3-stage-v1 branch from c6d63f4 to 7e09621 on March 2, 2026 05:27

sandeepk-pattern left a comment

Looks good


4 participants