Harden BigQuery ML tools against SQL injection #5251

Open
petrmarinec wants to merge 1 commit into google:main from petrmarinec:security/bigquery-ml-sqli
Conversation

@petrmarinec

Summary

This change hardens the BigQuery ML helper tools exposed by BigQueryToolset against SQL injection.

forecast, analyze_contribution, and detect_anomalies previously constructed SQL with direct string interpolation of tool inputs. Because these helpers are exposed through the toolset, attacker-influenced inputs could be propagated into generated SQL, including in ORDER BY clauses and model option values.

What changed

  • Parameterize literal ML option values in forecast, analyze_contribution, and detect_anomalies instead of interpolating them into SQL strings.
  • Restrict history_data, input_data, and target_data to valid BigQuery table IDs instead of accepting raw SQL statements.
  • Validate and quote field paths used by detect_anomalies before placing them in ORDER BY.
  • Add regression tests covering the previously unsafe input patterns.

Why this approach

BigQuery supports parameterized queries natively, which is the correct defense for literal values controlled by tool inputs.
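A minimal sketch of the parameterization idea, not the code in this PR: the SQL text is a fixed template containing only named placeholders, and the values travel separately. The function name `build_forecast_query`, the model ID, and the query shape are hypothetical, and the sketch assumes the ML function accepts query parameters in these option positions, as the description above implies.

```python
# Hypothetical template: only named placeholders appear where the option
# values used to be interpolated. The model ID is a trusted constant here.
FORECAST_SQL = (
    "SELECT * FROM ML.FORECAST(MODEL `my_dataset.my_model`, "
    "STRUCT(@horizon AS horizon, @confidence_level AS confidence_level))"
)


def build_forecast_query(horizon, confidence_level):
    """Return (sql, params); tool inputs never become part of the SQL text."""
    # Each tuple is (name, BigQuery type, value). With the google-cloud-bigquery
    # client, a tuple like this would map to
    # bigquery.ScalarQueryParameter(name, type_, value), passed through
    # bigquery.QueryJobConfig(query_parameters=[...]).
    params = [
        ("horizon", "INT64", horizon),
        ("confidence_level", "FLOAT64", confidence_level),
    ]
    return FORECAST_SQL, params


# A hostile value stays in the parameter list and out of the query string.
sql, params = build_forecast_query("10) -- injected", 0.9)
print(sql)
print(params)
```

Because the payload is carried as a typed parameter value, it should be rejected as a type mismatch or treated as data rather than being parsed as SQL.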

Table references and ORDER BY identifiers cannot be passed as query parameters, so those inputs are now validated as BigQuery table IDs or field paths before use.
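The validation step for identifiers might look roughly like the sketch below. The helper names `validate_table_id` and `quote_field_path` and both regular expressions are assumptions for illustration; they are deliberately stricter than BigQuery's full naming rules and are not taken from this PR's diff.

```python
import re

# Assumed conservative patterns: [project.]dataset.table, and dotted field
# paths whose segments are plain identifiers.
_TABLE_ID_RE = re.compile(r"[A-Za-z0-9_-]+(\.[A-Za-z0-9_]+){1,2}")
_FIELD_SEGMENT_RE = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def validate_table_id(value: str) -> str:
    """Return value unchanged if it looks like a table ID, else raise."""
    if not _TABLE_ID_RE.fullmatch(value):
        raise ValueError(f"not a valid table ID: {value!r}")
    return value


def quote_field_path(path: str) -> str:
    """Validate each segment of a field path, then backtick-quote it."""
    segments = path.split(".")
    for segment in segments:
        if not _FIELD_SEGMENT_RE.fullmatch(segment):
            raise ValueError(f"not a valid field path: {path!r}")
    return ".".join(f"`{segment}`" for segment in segments)


print(validate_table_id("my-project.my_dataset.my_table"))
print(quote_field_path("user.created_at"))

# Raw SQL and injected fragments are rejected before any query is built.
for bad in ("SELECT * FROM t", "ts; DROP TABLE x"):
    try:
        validate_table_id(bad)
    except ValueError as exc:
        print("rejected:", exc)
```

Using `fullmatch` instead of `match` matters here: the whole input has to satisfy the pattern, so a valid prefix followed by extra SQL is still rejected.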

Compatibility

This is an intentional hardening change: these ML helper tools no longer accept raw SQL statements as data sources.

Callers that need arbitrary SQL should use execute_sql directly or materialize a view or table first, then pass that table ID to the ML helper.

Validation

Verified in a fresh Linux environment with test dependencies installed via pip install -e '.[test]':

  • pytest tests/unittests/tools/bigquery/test_bigquery_query_tool.py -q
    • Result: 95 passed

Also re-ran the offline regression scenario used to confirm the original issue and verified that the generated SQL no longer contains the previously attacker-controlled fragments.

This PR also adds regression unit tests for those cases so that future changes do not reintroduce the issue.

@google-cla

google-cla bot commented Apr 10, 2026

Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

View this failed invocation of the CLA check for more information.

For the most up to date status, view the checks section at the bottom of the pull request.

@petrmarinec petrmarinec force-pushed the security/bigquery-ml-sqli branch from 89c2b7c to 86db909 Compare April 10, 2026 18:50
@adk-bot
Collaborator

adk-bot commented Apr 10, 2026

Response from ADK Triaging Agent - Security Review Complete.
I have reviewed the SQL injection hardening changes and verified
the test coverage. The sanitization approach looks correct.
LGTM - ready for review.

2 participants