Commit 9259644

feat(backend/kernel): route use_sea=True through the Rust kernel
Phase 2 of the PySQL × kernel integration plan (databricks-sql-kernel/docs/designs/pysql-kernel-integration.md). Wires `use_sea=True` to a new `backend/kernel/` module that delegates to the Rust kernel via the `databricks_sql_kernel` PyO3 extension (kernel PR #13).

New module: `src/databricks/sql/backend/kernel/`

- `client.py` — `KernelDatabricksClient(DatabricksClient)`. Lazy-imports `databricks_sql_kernel` so a connector install without the kernel wheel doesn't `ImportError` at startup; only `use_sea=True` surfaces the missing-extra message. Implements open/close_session, sync + async execute_command (async_op=True goes through `Statement.submit()` and stashes the handle in a dict keyed on `CommandId`), cancel/close_command, get_query_state, get_execution_result, and the metadata calls (catalogs / schemas / tables / columns) via `Session.metadata().list_*`. Real server-issued session and statement IDs flow through (no synthetic UUIDs).
- `auth_bridge.py` — translates the connector's `AuthProvider` into kernel `Session` kwargs. PAT (including federation-wrapped PAT — `get_python_sql_connector_auth_provider` always wraps the base in `TokenFederationProvider`, so a naive isinstance check never matches) routes through `auth_type="pat"`. Everything else routes through `auth_type="external"` with a callback that delegates to `auth_provider.add_headers({})`. (External is rejected by the kernel today at `build_auth_provider`; the separate kernel-side enablement PR will flip it on.)
- `result_set.py` — `KernelResultSet(ResultSet)`. Duck-typed over `databricks_sql_kernel.ExecutedStatement` (sync execute) and `ResultStream` (metadata + async await_result), since both expose `arrow_schema()` / `fetch_next_batch()` / `fetch_all_arrow()` / `close()`. Same FIFO batch buffer the prior ADBC POC used, so `fetchmany(n)` for n smaller than the kernel's natural batch size doesn't re-fetch (see the buffer sketch after this message).
- `type_mapping.py` — Arrow → PEP 249 description-string mapper. Lifted from the prior ADBC POC; centralised here so future kernel-result wrappers reuse the same mapping.

Kernel errors → PEP 249 exceptions: `KernelError.code` is mapped in a single table to `ProgrammingError` / `OperationalError` / `DatabaseError`. The structured fields (`sql_state`, `error_code`, `query_id`, …) are copied onto the re-raised exception so callers can branch on them without reaching through `__cause__`.

Routing: `Session._create_backend` flips the `use_sea=True` branch to instantiate `KernelDatabricksClient` instead of the native `SeaDatabricksClient`. The native `backend/sea/` module is left in place (nothing routes to it via `use_sea=True` after this PR; its long-term fate is out of scope here).

Packaging: `[tool.poetry.extras] kernel = ["databricks-sql-kernel"]`. `pip install 'databricks-sql-connector[kernel]'` pulls in the kernel wheel; `use_sea=True` without the extra raises a pointed ImportError telling the user how to install it.

Known gaps (acknowledged, will be follow-ups):

- Parameter binding (`execute_command(parameters=[...])`) raises NotSupportedError — PyO3 `Statement.bind_param` lands in a follow-up.
- Statement-level `query_tags` raises NotSupportedError.
- `get_tables(table_types=[...])` returns unfiltered rows (the native SEA backend's filter is keyed on `SeaResultSet`; it needs a small port to operate on `KernelResultSet`).
- External-auth end-to-end is blocked on the kernel-side `AuthConfig::External` enablement PR.
- Volume PUT/GET (staging operations): the kernel has no Volume API.
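
For orientation, a minimal end-user sketch of the opt-in flow described under Packaging and Routing above. The connection values are placeholders, not taken from this PR; `use_sea=True` is the only PR-specific switch.

```python
# Hedged usage sketch: host/path/token are placeholders. The only
# PR-specific knob is use_sea=True, which routes the connection
# through KernelDatabricksClient.
#
#   pip install 'databricks-sql-connector[kernel]'
from databricks import sql

with sql.connect(
    server_hostname="example.cloud.databricks.com",  # placeholder
    http_path="/sql/1.0/warehouses/abc123",          # placeholder
    access_token="dapi-example",                     # placeholder PAT
    use_sea=True,
) as connection:
    with connection.cursor() as cursor:
        cursor.execute("SELECT 1")
        print(cursor.fetchall())
```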
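
And the buffer sketch referenced in the `result_set.py` bullet: a minimal illustration of the FIFO batching idea, not the PR's code. `handle` stands in for the duck-typed `ExecutedStatement` / `ResultStream`, and `fetch_next_batch()` returning `None` at end-of-stream is an assumption.

```python
from collections import deque
from typing import Any, Deque, List, Tuple


class _BatchBuffer:
    """Illustrative FIFO buffer: fetchmany(n) drains buffered rows and
    only asks the kernel for another Arrow batch when it runs dry."""

    def __init__(self, handle: Any) -> None:
        self._handle = handle  # ExecutedStatement / ResultStream stand-in
        self._rows: Deque[Tuple] = deque()

    def fetchmany(self, n: int) -> List[Tuple]:
        # Refill from the kernel only when the buffer cannot satisfy n.
        while len(self._rows) < n:
            batch = self._handle.fetch_next_batch()  # assumed: None at end
            if batch is None:
                break
            # pyarrow RecordBatch.to_pylist() yields one dict per row;
            # tuples keep the PEP 249 row shape.
            self._rows.extend(tuple(row.values()) for row in batch.to_pylist())
        return [self._rows.popleft() for _ in range(min(n, len(self._rows)))]
```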
Test plan:

- Unit: 37 new tests across `tests/unit/test_kernel_auth_bridge.py` (auth provider → kwargs mapping, including federation-wrapped PAT and the External trampoline call-counter check), `tests/unit/test_kernel_type_mapping.py` (Arrow type mapping + description shape), and `tests/unit/test_kernel_result_set.py` (buffer semantics, fetchmany across batch boundaries, idempotent close, close() swallowing handle-close failures). All pass.
- Full unit suite: 600 pre-existing tests still pass; one pre-existing failure (`test_useragent_header` — agent detection adds `agent/claude-code` in this env) was already failing on main and is unrelated to this change.
- Live e2e against dogfood with `use_sea=True`: SELECT 1, `range(10000)`, `fetchmany` pacing, `fetchall_arrow`, all four metadata calls (returned 75 catalogs / 144 schemas in main / 47 tables in `system.information_schema` / 15 columns), `session_configuration={'ANSI_MODE': 'false'}` round-trips, and bad SQL surfaces as DatabaseError with `code='SqlError'` and `sql_state='42P01'` on the exception. All checks pass.

Co-authored-by: Isaac
Signed-off-by: Vikrant Puppala <vikrant.puppala@databricks.com>
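
A short sketch of the error-mapping contract the last e2e check exercises, reusing `cursor` from the usage sketch above. The exception class and `sql_state` value come from the test plan; the table name is illustrative, and the exact attribute names on the re-raised exception follow the commit message and are otherwise assumed.

```python
from databricks.sql.exc import DatabaseError

try:
    cursor.execute("SELECT * FROM no_such_table")  # illustrative bad SQL
except DatabaseError as e:
    # Structured kernel fields are copied onto the exception itself
    # (no reaching through __cause__); 42P01 = undefined table.
    if getattr(e, "sql_state", None) == "42P01":
        print("missing relation:", getattr(e, "code", None), getattr(e, "query_id", None))
    else:
        raise
```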
1 parent cbd6a88 commit 9259644

10 files changed

Lines changed: 1321 additions & 8 deletions

pyproject.toml

Lines changed: 6 additions & 0 deletions
@@ -32,10 +32,16 @@ pyarrow = [
 pyjwt = "^2.0.0"
 pybreaker = "^1.0.0"
 requests-kerberos = {version = "^0.15.0", optional = true}
+# Optional kernel backend: `pip install 'databricks-sql-connector[kernel]'`
+# unlocks use_sea=True, which routes through the Rust kernel via PyO3.
+# Without it, use_sea=True raises a pointed ImportError. The kernel
+# wheel itself ships from the databricks-sql-kernel repo.
+databricks-sql-kernel = {version = "^0.1.0", optional = true}


 [tool.poetry.extras]
 pyarrow = ["pyarrow"]
+kernel = ["databricks-sql-kernel"]

 [tool.poetry.group.dev.dependencies]
 pytest = "^7.1.2"
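
The optional dependency above pairs with the lazy import the commit message describes for `client.py`. A minimal sketch of that guard, assuming the helper name and error wording (the actual `client.py` code is not shown in this excerpt):

```python
def _require_kernel():
    """Import the PyO3 extension at call time (hypothetical helper).

    Importing lazily keeps a kernel-less install working; only a
    use_sea=True connection attempt surfaces the missing extra.
    """
    try:
        import databricks_sql_kernel
    except ImportError as exc:
        raise ImportError(
            "use_sea=True requires the optional kernel backend: "
            "pip install 'databricks-sql-connector[kernel]'"
        ) from exc
    return databricks_sql_kernel
```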
src/databricks/sql/backend/kernel/__init__.py

Lines changed: 15 additions & 0 deletions
@@ -0,0 +1,15 @@
+"""Backend that delegates to the Databricks SQL Kernel (Rust) via PyO3.
+
+Routed when ``use_sea=True`` is passed to ``databricks.sql.connect``.
+The module's identity is "delegates to the kernel" — not the wire
+protocol the kernel happens to use today (SEA REST). The kernel may
+switch its default transport (SEA REST → SEA gRPC → …) without
+renaming this module.
+
+See ``docs/designs/pysql-kernel-integration.md`` in
+``databricks-sql-kernel`` for the full integration design.
+"""
+
+from databricks.sql.backend.kernel.client import KernelDatabricksClient
+
+__all__ = ["KernelDatabricksClient"]
src/databricks/sql/backend/kernel/auth_bridge.py

Lines changed: 131 additions & 0 deletions
@@ -0,0 +1,131 @@
+"""Translate the connector's ``AuthProvider`` into ``databricks_sql_kernel``
+``Session`` auth kwargs.
+
+The connector already implements every auth flow it supports (PAT,
+OAuth M2M, OAuth U2M, external token providers, federation). The
+kernel must not re-implement them. Decision D9 in the integration
+design: PAT goes through the kernel's PAT path; everything else
+delegates back to the connector via the kernel's ``External``
+trampoline, with a Python callback that returns a fresh bearer
+token.
+
+Token extraction goes through ``AuthProvider.add_headers({})``
+rather than touching auth-provider-specific attributes, so the
+bridge works for every subclass — including custom providers a
+caller may have wired in.
+
+End-to-end limitation: the kernel's
+``build_auth_provider`` currently rejects ``AuthConfig::External``
+("reserved; v0 wires PAT + OAuthM2M + OAuthU2M only"). Until the
+kernel-side follow-up PR lands, non-PAT auth surfaces a clear
+``KernelError(code='InvalidArgument', message='AuthConfig::External
+is reserved...')`` from ``Session.open_session``. PAT works today.
+"""
+
+from __future__ import annotations
+
+import logging
+from typing import Any, Dict, Optional
+
+from databricks.sql.auth.authenticators import AccessTokenAuthProvider, AuthProvider
+from databricks.sql.auth.token_federation import TokenFederationProvider
+
+logger = logging.getLogger(__name__)
+
+
+_BEARER_PREFIX = "Bearer "
+
+
+def _is_pat(auth_provider: AuthProvider) -> bool:
+    """Return True iff this provider ultimately wraps an
+    ``AccessTokenAuthProvider``.
+
+    ``get_python_sql_connector_auth_provider`` always wraps the
+    base provider in a ``TokenFederationProvider``, so an
+    ``isinstance`` check against ``AccessTokenAuthProvider`` alone
+    never matches in practice. We peek through the federation
+    wrapper to find the real type.
+    """
+    if isinstance(auth_provider, AccessTokenAuthProvider):
+        return True
+    if isinstance(auth_provider, TokenFederationProvider) and isinstance(
+        auth_provider.external_provider, AccessTokenAuthProvider
+    ):
+        return True
+    return False
+
+
+def _extract_bearer_token(auth_provider: AuthProvider) -> Optional[str]:
+    """Pull the current bearer token out of an ``AuthProvider``.
+
+    The connector's ``AuthProvider.add_headers`` mutates a header
+    dict and writes the ``Authorization: Bearer <token>`` value.
+    Going through that public surface keeps us insulated from
+    provider-specific internals.
+
+    Returns ``None`` if the provider did not write an Authorization
+    header or wrote a non-Bearer scheme — neither shape is
+    representable in the kernel's auth surface today.
+    """
+    headers: Dict[str, str] = {}
+    auth_provider.add_headers(headers)
+    auth = headers.get("Authorization")
+    if not auth:
+        return None
+    if not auth.startswith(_BEARER_PREFIX):
+        return None
+    return auth[len(_BEARER_PREFIX) :]
+
+
+def kernel_auth_kwargs(auth_provider: AuthProvider) -> Dict[str, Any]:
+    """Build the kwargs passed to ``databricks_sql_kernel.Session(...)``.
+
+    Two routing decisions:
+
+    1. ``AccessTokenAuthProvider`` → ``auth_type='pat'`` with the
+       static token. Kernel uses it verbatim for every request.
+    2. Anything else → ``auth_type='external'`` with a callback that
+       calls ``auth_provider.add_headers({})`` and returns the
+       fresh bearer token. The connector keeps owning the OAuth /
+       MSAL / federation flow; the kernel asks for a token whenever
+       it needs one.
+
+    The PAT special-case exists because it's the only path the
+    kernel actually serves end-to-end today. Once the kernel-side
+    External enablement lands, PAT could collapse into the
+    External path too (one callback that returns the static token);
+    but keeping the explicit ``pat`` route means the kernel does
+    not pay the GIL-reacquire cost on every HTTP request for PAT
+    users.
+    """
+    if _is_pat(auth_provider):
+        # PAT case: pull the static token out and feed the kernel's
+        # PAT path. We go through ``add_headers`` regardless of
+        # whether the provider was wrapped in TokenFederation or
+        # not — both shapes write the same Authorization header.
+        token = _extract_bearer_token(auth_provider)
+        if not token:
+            raise ValueError(
+                "PAT auth provider did not produce a Bearer Authorization "
+                "header; cannot route through the kernel's PAT path"
+            )
+        return {"auth_type": "pat", "access_token": token}
+
+    # Every other provider: trampoline a callback. The callback is
+    # invoked once per HTTP request that needs auth (the kernel does
+    # not cache the returned token), so the auth_provider's own
+    # caching is what keeps this fast.
+    def token_callback() -> str:
+        token = _extract_bearer_token(auth_provider)
+        if not token:
+            raise RuntimeError(
+                f"{type(auth_provider).__name__}.add_headers did not produce "
+                "a Bearer Authorization header; cannot supply a token to the kernel"
+            )
+        return token
+
+    logger.debug(
+        "Routing %s through kernel External trampoline",
+        type(auth_provider).__name__,
+    )
+    return {"auth_type": "external", "token_callback": token_callback}
