Pull request overview
This PR introduces an S3-stage-based path for moving pandas data between Snowflake and Metaflow, centralizes shared SQL helpers (templating, query-tagging, SQL file loading), and adds a BatchInferencePipeline utility to orchestrate Snowflake→S3→inference→Snowflake workflows.
Changes:
- Add shared `sql_utils` (query templating, SQL file loading, select.dev query-tag injection) and update Snowflake execution to automatically tag statements.
- Add S3/Snowflake stage utilities + optional `use_s3_stage` support for `publish_pandas`/`query_pandas_from_snowflake`, plus a new `BatchInferencePipeline`.
- Expand/adjust unit + functional tests and publish API docs; update CI to run pytest in parallel.
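For readers unfamiliar with per-statement tagging, here is a minimal sketch of one plausible approach: prefixing every statement in a SQL batch with a JSON comment. The function name `add_query_tag_comment`, the tag keys, and the naive split on `;` are all assumptions made for illustration; this is not the implementation in `sql_utils`, which may differ (for example by using Snowflake's `QUERY_TAG` session parameter instead of comments).

```python
import json


def add_query_tag_comment(sql: str, tags: dict) -> str:
    """Prefix every statement in a SQL batch with a JSON tag comment.

    Hypothetical helper for illustration only. It splits naively on ';',
    so semicolons inside string literals or comments are not handled.
    """
    comment = f"/* {json.dumps(tags, sort_keys=True)} */"
    # Drop empty fragments produced by trailing semicolons or blank lines.
    statements = [s.strip() for s in sql.split(";") if s.strip()]
    return "\n".join(f"{comment}\n{stmt};" for stmt in statements)


batch = "CREATE TABLE t (id INT); INSERT INTO t VALUES (1);"
# The tag keys below are made up for the example.
tagged = add_query_tag_comment(batch, {"app": "example_flow", "team": "ds"})
print(tagged)
```

Tagging each statement, rather than only the first, matters for multi-statement batches because each statement is recorded as a separate query in the warehouse's history.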
Reviewed changes
Copilot reviewed 31 out of 32 changed files in this pull request and generated 11 comments.
| File | Description |
|---|---|
| tests/unit_tests/snowflake/test__get_select_dev_query_tags.py | New unit tests for get_select_dev_query_tags behavior and perimeter fallback. |
| tests/unit_tests/snowflake/test__execute_sql.py | Updates _execute_sql tests for new error behavior and multi-statement semantics. |
| tests/unit_tests/snowflake/test__add_comment_to_each_sql_statement.py | Updates tests to validate revised SQL annotation behavior. |
| tests/functional_tests/metaflow/test__warehouse.py | Adds a Metaflow flow validating warehouse selection behavior. |
| tests/functional_tests/metaflow/test__publish.py | Standardizes project/table naming and ensures tags are passed into functional runs. |
| tests/functional_tests/metaflow/test__pandas_utc.py | Renames tables/project and passes tags into flow runs. |
| tests/functional_tests/metaflow/test__pandas_s3.py | New functional flow covering pandas read/write via S3 stage. |
| tests/functional_tests/metaflow/test__pandas.py | Renames tables/project and passes tags into flow runs. |
| tests/functional_tests/metaflow/test__get_select_dev_query_tags.py | Removes older functional tests for query-tag warnings (replaced by unit tests). |
| tests/functional_tests/metaflow/test__batch_inference_pipeline.py | New functional flow validating BatchInferencePipeline. |
| src/ds_platform_utils/sql_utils.py | New shared SQL utilities including select.dev query-tag injection into SQL batches. |
| src/ds_platform_utils/pandas_utils.py | Adds chunk-size estimation helper for parquet upload chunking. |
| src/ds_platform_utils/metaflow/write_audit_publish.py | Refactors to use shared SQL helpers + new Snowflake connection helper; adjusts warehouse parameter type. |
| src/ds_platform_utils/metaflow/snowflake_connection.py | Adds warehouse mapping logic and passes warehouse/timezone/query_tag into Snowflake connection creation. |
| src/ds_platform_utils/metaflow/s3_stage.py | Implements Snowflake↔S3 stage COPY helpers and schema inference for parquet loads. |
| src/ds_platform_utils/metaflow/s3.py | Adds Metaflow S3 helpers to read/write parquet via Polars/Pandas. |
| src/ds_platform_utils/metaflow/pandas.py | Adds use_s3_stage support for publish/query pandas, centralizes schema substitution, and uses estimated chunk sizes. |
| src/ds_platform_utils/metaflow/batch_inference_pipeline.py | Adds a pipeline abstraction for multi-step batch inference using Snowflake + S3. |
| src/ds_platform_utils/metaflow/_consts.py | Renames non-prod schema constant and adds S3 stage configuration constants. |
| src/ds_platform_utils/metaflow/\_\_init\_\_.py | Exposes BatchInferencePipeline from the public Metaflow module. |
| src/ds_platform_utils/_snowflake/write_audit_publish.py | Refactors to use shared SQL helpers and updated schema constant naming. |
| src/ds_platform_utils/_snowflake/run_query.py | Changes _execute_sql to auto-inject select.dev query tags and adds DEBUG_QUERY printing. |
| pyproject.toml | Bumps version and adds dependencies (polars, pytest-xdist). |
| docs/metaflow/*.md | Adds API documentation pages for Metaflow-facing utilities. |
| README.md | Adds links to the new docs pages. |
| .github/workflows/ci-cd-ds-platform-utils.yaml | Ignores docs-only PRs for CI and enables pytest-xdist parallel runs. |
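The chunk-size estimation mentioned for `pandas_utils.py` can be pictured roughly as follows. This is a sketch under stated assumptions, not the helper added in this PR: the function name `estimate_rows_per_chunk`, its parameters, and the 64 MiB default target are hypothetical, and it works from an in-memory size estimate rather than the compressed parquet size.

```python
import math


def estimate_rows_per_chunk(
    total_bytes: int,
    n_rows: int,
    target_chunk_bytes: int = 64 * 1024 * 1024,  # assumed default, not from the PR
) -> int:
    """Estimate how many rows fit in one chunk of roughly target_chunk_bytes.

    Hypothetical helper: uses the average row size, so heavily skewed
    rows can still produce chunks larger than the target.
    """
    if n_rows <= 0 or total_bytes <= 0:
        return 1
    avg_row_bytes = total_bytes / n_rows
    rows = int(target_chunk_bytes // avg_row_bytes)
    # Always return at least one row and never more than the frame holds.
    return max(1, min(n_rows, rows))


# Example: a 1 GB frame with 10 million rows (about 100 bytes per row).
rows_per_chunk = estimate_rows_per_chunk(1_000_000_000, 10_000_000)
n_chunks = math.ceil(10_000_000 / rows_per_chunk)
print(rows_per_chunk, n_chunks)
```

With a pandas DataFrame, `total_bytes` could be taken from `df.memory_usage(deep=True).sum()` and `n_rows` from `len(df)`.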
Comments suppressed due to low confidence (1)
tests/unit_tests/snowflake/test__add_comment_to_each_sql_statement.py:1
- This test imports via `src.ds_platform_utils...`, which typically fails when the package is installed/loaded as `ds_platform_utils` (and is inconsistent with the other tests in this PR). Import from `ds_platform_utils.sql_utils` instead so tests work in normal packaging/CI environments.
@abhishek-pattern I've opened a new pull request, #21, to work on those changes. Once the pull request is ready, I'll request review from you.
c6d63f4 to 7e09621
Updates: