Redis backed Caching of Internal Embeddings#2
Open
ajtiwari07 wants to merge 59 commits into
Open
Conversation
….15.3 (Azure#3488) ## Why make this change? Bump `OpenTelemetry.Exporter.OpenTelemetryProtocol` to pick up patch fixes from 1.15.3. ## What is this change? - Updated `OpenTelemetry.Exporter.OpenTelemetryProtocol` from `1.13.0` → `1.15.3` in `src/Directory.Packages.props` (centrally managed versions) ## How was this tested? - [ ] Integration Tests - [ ] Unit Tests ## Sample Request(s) N/A — dependency version bump only. > [!WARNING] > > <details> > <summary>Firewall rules blocked me from connecting to one or more addresses (expand for details)</summary> > > #### I tried to connect to the following addresses, but was blocked by firewall rules: > > - `www.nuget.org` > - Triggering command: `/home/REDACTED/work/_temp/ghcca-node/node/bin/node /home/REDACTED/work/_temp/ghcca-node/node/bin/node --enable-source-maps /home/REDACTED/work/_temp/copilot-developer-action-main/dist/index.js` (dns block) > > If you need me to access, download, or install something from one of these locations, you can either: > > - Configure [Actions setup steps](https://gh.io/copilot/actions-setup-steps) to set up my environment, which run before the firewall is enabled > - Add the appropriate URLs or hosts to the custom allowlist in this repository's [Copilot coding agent settings](https://github.com/Azure/data-api-builder/settings/copilot/coding_agent) (admins only) > > </details> --------- Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com> Co-authored-by: RubenCerna2079 <32799214+RubenCerna2079@users.noreply.github.com> Co-authored-by: Ruben Cerna <rcernaserna@microsoft.com>
prshri-msft
added a commit
to prshri-msft/data-api-builder
that referenced
this pull request
May 1, 2026
…nale Addresses two PR review comments from JimRoberts-MS on PR ajtiwari07#1: Comment ajtiwari07#2 (ParameterEmbeddingHelper.cs:157): "is there a limitation that only one parameter can be configured for embedding at a time? I think we'd want to do batching here to avoid multiple sequential waits on the api" Comment ajtiwari07#1 (RuntimeConfigValidator.cs:507): "can this limitation about default values be clarified a bit more?" Changes: 1. ParameterEmbeddingHelper.cs — refactor to use TryEmbedBatchAsync (addresses ajtiwari07#2): Restructured the substitution loop into 3 phases: - COLLECT — validate each embed param value, gather (paramName, text) pairs (preserves per-param error specificity for type/null checks) - BATCH — single TryEmbedBatchAsync call instead of N sequential TryEmbedAsync calls (saves ~(N-1) × API_LATENCY on cache miss) - SUBSTITUTE — write each returned vector back into resolvedParams Behavior preserved: - Single-embed-param path is equivalent (batch of 1) - All error status codes unchanged (400 for bad input, 500 for service failure, 503 for missing service) - In-place mutation contract on resolvedParams unchanged - G9 float format preserved Defensive checks added: - Length mismatch between requests and returned embeddings → 500 - Null/empty embedding for any individual param → 500 Type validation extracted into private ExtractTextValue(paramName, value) helper to flatten the nested if/else in the main loop. Verified with 16 manual test cases across REST POST/GET and GraphQL, covering single-embed and multi-embed sprocs, positive and negative inputs. 2. ParameterEmbeddingHelper.cs — fix misleading internal comment (related to ajtiwari07#1's surrounding context): The previous comment claimed "DAB's existing required-param validation handles missing required params later" — but DAB's request validation for sprocs only checks for extra fields (not missing ones). The actual mechanism that catches missing required params is the SQL Server error "expects parameter X, which was not supplied", parsed by MsSqlDbExceptionParser into a 400 DatabaseInputError. Updated the comment to describe what actually happens. 3. RuntimeConfigValidator.cs — expand embed/default rule rationale (addresses ajtiwari07#1): Replaced the one-line rationale for "Rule 3: embed:true with a default value is not supported" with a multi-paragraph explanation that: - Leads with the conceptual UX point: an embed param represents user input (typically a search query); defaulting it would mean the server fabricates and embeds a query the user never typed. That isn't a sensible fallback. - Notes that defaults on non-embed params of the same sproc remain supported (rule only fires for embed: true params). - Briefly documents why even setting aside the UX concern, supporting embed-defaults would be non-trivial (GraphQL schema literal-baking has no VECTOR type; REST/MCP defaults would be re-embedded every request; embedding-at-startup couples startup to provider availability). - Documents the current observed behavior when a client forgets to supply an embed param (verified empirically): explicit null/empty → 400 BadRequest from the helper; field omitted → 400 DatabaseInputError from SQL via MsSqlDbExceptionParser. Both produce a clear, actionable client error. - Notes that the rule can be lifted later if a real use case emerges. No behavior changes beyond the batching itself. Validator rule unchanged; helper logic for missing-value handling unchanged; error message text unchanged. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
## Why make this change? Closes Azure#3274 - MCP Server returns "Method not found: logging/setLevel" error when clients send the standard MCP logging/setLevel request. Closes Azure#3275 - Control output in MCP stdio mode (default to `LogLevel.None`, redirect/suppress console output). ## What is this change? ### MCP `logging/setLevel` Handler - Added handler for `logging/setLevel` JSON-RPC method in `McpStdioServer.cs` - Implemented `DynamicLogLevelProvider` with `ILogLevelController` interface to allow MCP to update log levels dynamically - Added `IsCliOverridden` and `IsConfigOverridden` properties to enforce precedence rules ### Log Level Precedence System **Precedence (highest to lowest):** 1. **CLI `--LogLevel` flag** - cannot be changed by MCP 2. **Config `runtime.telemetry.log-level`** - cannot be changed by MCP 3. **MCP `logging/setLevel`** - only works if neither CLI nor config set a level 4. Default (LogLevel.None for MCP stdio mode) If CLI or config set a level, MCP requests are accepted but silently ignored (no error returned per MCP spec). ### Early Config Reading for MCP Mode - Added `TryGetLogLevelFromConfig()` in `Program.cs` to read config file early (before host build) - This ensures config log level is detected before Console redirect decision - Console redirect for MCP stdio mode now respects config log level ### CLI Log Level Handling - Added `Utils.CliLogLevel` property to track the parsed `--LogLevel` value - CLI's `CustomLoggerProvider` now respects the `--LogLevel` value for its own logging ### Config Helpers - Added `HasExplicitLogLevel()` helper to `RuntimeConfig` to correctly detect when config actually pins a log level - This properly handles null values in telemetry section (null values don't count as explicit override) ## How was this tested? - [x] Unit Tests (`DynamicLogLevelProviderTests` - 5 tests) - [x] Manual Testing ### Manual Test 1: No override (MCP can change level) 1. Start MCP server without `--LogLevel` and without config `log-level` 2. MCP sends `logging/setLevel` with `level: info` 3. Result: Log level changes to info ### Manual Test 2: CLI override (MCP blocked) 1. Start MCP server with `--LogLevel Warning` 2. MCP sends `logging/setLevel` with `level: info` 3. Result: Log level stays at Warning, MCP request accepted silently ### Manual Test 3: Config override (MCP blocked) 1. Add `"telemetry": { "log-level": { "default": "Warning" } }` to config 2. Start MCP server without `--LogLevel` 3. MCP sends `logging/setLevel` with `level: info` 5. Result: Log level stays at Warning, MCP request accepted silently ### Manual Test 4: Config with null values (MCP can change level) 1. Add `"telemetry": { "log-level": { "default": null } }` to config 2. Start MCP server without `--LogLevel` 3. MCP sends `logging/setLevel` with `level: info` 4. Result: Log level changes to info (null values don't count as override) ## Sample Request(s) MCP client sends: ```json { "jsonrpc": "2.0", "id": 1, "method": "logging/setLevel", "params": { "level": "info" } } ``` Server responds with empty result (success per MCP spec) and updates log level if no CLI/config override is active. --------- Co-authored-by: RubenCerna2079 <32799214+RubenCerna2079@users.noreply.github.com>
prshri-msft
added a commit
to prshri-msft/data-api-builder
that referenced
this pull request
May 4, 2026
Adds the first formal automated test coverage for Phase 3 production code. New file: src/Service.Tests/UnitTests/ParameterEmbeddingHelperTests.cs 28 tests covering ParameterEmbeddingHelper.SubstituteEmbedParametersAsync, organized into 7 #region blocks (matching the EmbeddingServiceTests.cs "#region Batch Embedding Tests" pattern): #region No-Op Cases (4 tests) - NullConfigParams_ReturnsImmediately_NoServiceCall - NoEmbedParams_ReturnsImmediately_NoServiceCall - EmbedParamsConfiguredButNoneSupplied_ReturnsAfterCollect_NoServiceCall - EmptyConfigParamsList_ReturnsImmediately_NoServiceCall #region Service Availability (2 tests) - NullService_WithEmbedParams_Throws503 (defense-in-depth) - NullService_WithoutEmbedParams_NoThrow (backward compat) #region Input Type Validation (8 tests) - PlainString_AcceptsAndEmbeds - JsonElementString_AcceptsAndEmbeds - JsonElementNumber_Throws400 - JsonElementBoolean_Throws400 - JsonElementArray_Throws400 - JsonElementObject_Throws400 - JsonElementNull_Throws400AsEmpty - NonStringNonJsonElementValue_Throws400 #region Empty And Whitespace Validation (3 tests) - EmptyString_Throws400 - WhitespaceString_Throws400 - NullValue_Throws400 #region Batching Behavior (5 tests) — covers Jim review comment ajtiwari07#2 - SingleEmbedParam_CallsBatchOnce_NotSequential - MultipleEmbedParams_CallsBatchOnce_NotSequential ← key batching guarantee - MixedEmbedAndNonEmbed_OnlyEmbedTextsBatched - MultipleEmbedParams_OrderPreserved_BatchTextsMatchConfigOrder - PartiallySuppliedEmbedParams_BatchesSubsetOnly #region Batch Result Handling (4 tests) - BatchSuccess_VectorsSubstitutedInResolvedParams - BatchFailure_Throws500_WithAllParamNames - BatchLengthMismatch_Throws500 - IndividualEmbeddingEmpty_Throws500_NamingFailedParam #region Output Format And Cancellation (2 tests) - VectorJson_UsesG9AndInvariantCulture ← validates locale-independent G9 float serialization - CancellationToken_ForwardedToEmbeddingService Implementation notes -------------------- - Mocking pattern: Mock<IEmbeddingService> (Strict). Each test sets up the expected batch call and verifies post-conditions on resolvedParams. No database, no DI container — pure unit tests. - Helper factories at top of class (EmbedParam, NormalParam, JsonElementFrom, SetupBatch, VerifyBatchedExactlyOnce) keep individual test bodies focused on the behavior being tested. - JsonElement construction uses JsonDocument.Parse(...).RootElement.Clone() so the parsed element survives after the source document is disposed. - Float values used in expected-output assertions are powers of 1/2 (0.5, 0.25, 0.125) which are exactly representable in binary float and round-trip through G9 to the same string representation. The "G9 + InvariantCulture" test specifically uses non-exact values (0.1f, -0.2f, 0.0001234567f) and asserts on parsed-back values rather than exact strings to verify precision and locale independence. - `#nullable enable` directive at the top of the file is required because the helper signature uses IDictionary<string, object?> and the test project is set to `<Nullable>disable</Nullable>` globally. Per-file enable is the smallest scoped change that lets us match the helper's signature without introducing project-wide nullable warnings. Test results ------------ - dotnet build src/Service.Tests/Azure.DataApiBuilder.Service.Tests.csproj -c Release → Build succeeded. 0 Warning(s). 0 Error(s). - dotnet test --filter "FullyQualifiedName~ParameterEmbeddingHelperTests" → Total tests: 28. Passed: 28. Failed: 0. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…re#3318) ## Why make this change? - Closes issue Azure#3262 The logger for the Startup class is not initialized properly, since this logger is special due to the nature of the Startup class it needs to be continuously updated as DAB initializes. This causes two problems: - Some logs appear even when LogLevel is set to some value that would impede those logs to appear. - Some logs don't appear at all, even when LogLevel is set to a value that should allow them to be logged. - Closes issue Azure#3256 & Azure#3255 The CLI logger still outputs some logs even when the LogLevel is set to `none`. It is expected that if the LogLevel set is `none` or some other level that shouldn't output the `information` level, the logs will not appear. ## What is this change? Important Note: These changes currently only allow us to change the LogLevel from the CLI with the `default` namespace in the config file. An task was created to solve this issue: Azure#3451 In order to solve issue Azure#3262: - We removed the LogBuffer from the services inside of `Startup.cs`, this is necessary since we wanted each class to have its own LogBuffer so that we are able to tell from which logger the logs are being outputted. - Then, we also correctly initialized the `Startup` logger by changing the method that it was using to initialize the logger, it now uses `CreateLoggerFactoryForHostedAndNonHostedScenario` which checks if there are any LogLevel namespaces from the config file that can be applicable for the specific logger. It is important to note that there are multiple places where the logs are flushed in order to cover for the cases in which an exception is found and causes DAB to end abruptly, and when we there is an IsLateConfigured scenario. - We also changed the logger for the LogBuffer in all the missing places where it creates logs before the logger is able to properly initialize to add those logs to the LogBuffer and only flush them after the loggers are initialized. In order to solve issue Azure#3256 & Azure#3255: - We changed the CLI so that we add all the logs go to a single global LogBuffer that is created inside the `StartOptions.cs` until it is able to deserialize the RuntimeConfig and find which level to set the `LogLevel` in order to flush all the logs. - This is something that we only want to happen when we use the `dab start` command, which is why we only make this change in the `StartOptions.cs` file, on the function `TryStartEngineWithOptions` inside of `ConfigGenerator.cs`, and a few functions from `Utils.cs` and `ConfigMerger.cs` that are used inside the `TryStartEngine` function. ## How was this tested? - [ ] Integration Tests - [x] Unit Tests ## Sample Request(s) - dab start --LogLevel none - dab start --LogLevel error --------- Co-authored-by: Aniruddh Munde <anmunde@microsoft.com>
## Why make this change? Closes Azure#3267 ## What is this change? Alters the validation logic in the following way. Is top-level config with data-source-files? (we call this a `Root` config file) ├── YES │ ├── Has datasource? → ValidateEntityPresence (same rules as non-root) │ ├── No datasource but has entities/autoentities? → ERROR │ └── No datasource, no entities → VALID (children provide everything) │ └── For each child → ValidateNonRootConfig(child, filename) │ └── NO (standalone or child config) ├── No datasource? → ERROR: "data source is required" └── Has datasource → ValidateEntityPresence Note: A top-level config file without any children data-source files is NOT considered a root. And an intermediary config file, ie: is a child, that also has child configs is NOT a root. Only a top-level config with children configs is a Root. #### ValidateEntityPresence Count resolved autoentities from AutoentityResolutionCounts total = manual entities + resolved autoentities total == 0? → ERROR: "No entities found" total > 0 but autoentities discovered nothing? → WARN: "Autoentities configured but none discovered" No double messaging. If total is 0, only the error is recorded, not the warning. ## How was this tested? ### Truth table — top-level config Variables (`1` = present / non-empty, `0` = absent / empty): - **DSF** — `data-source-files` present - **DS** — `data-source` present - **E** — manual `entities` count > 0 - **AE** — `autoentities` count > 0 (presence, *not* resolved count) Path is determined by `IsRootConfig = (DSF == 1) && !IsChildConfig`. | # | DSF | DS | E | AE | AE resolved | Path | Expected | Test | |---|:---:|:--:|:-:|:--:|:-----------:|------|----------|------| | 1 | 0 | 0 | 0 | 0 | — | Non-root | **Error**: "data source is required" | `TestNonRootWithNoDataSourceProducesError` | | 2 | 0 | 0 | 0 | 1 | — | Non-root | **Error**: "data source is required" | _covered by #1 — DS check fires first_ | | 3 | 0 | 0 | 1 | 0 | — | Non-root | **Error**: "data source is required" | _covered by #1_ | | 4 | 0 | 0 | 1 | 1 | — | Non-root | **Error**: "data source is required" | _covered by #1_ | | 5 | 0 | 1 | 0 | 0 | — | Non-root | **Error**: "No entities found" | `TestNonRootWithDataSourceAndNoEntitiesProducesError` | | 6a | 0 | 1 | 0 | 1 | 0 | Non-root | **Error**: "No entities found" | `TestNonRootWithDataSourceAndAutoentitiesResolvingZeroProducesError` | | 6b | 0 | 1 | 0 | 1 | >0 | Non-root | **Valid** | `TestNonRootWithDataSourceAndAutoentitiesResolvingEntitiesIsValid` | | 7 | 0 | 1 | 1 | 0 | — | Non-root | **Valid** | `TestNonRootWithDataSourceAndEntitiesIsValid` | | 8a | 0 | 1 | 1 | 1 | 0 | Non-root | **Valid** + **Warn** | `TestNonRootWithEntitiesAndAutoentitiesResolvingZeroLogsWarning` | | 8b | 0 | 1 | 1 | 1 | >0 | Non-root | **Valid** | _covered by #7 / #6b combined_ | | 9 | 1 | 0 | 0 | 0 | — | Root | **Valid** (children carry the load) | `TestRootWithNoDataSourceAndNoEntitiesIsValid`, `TestRootConfigWithNoDataSourceAndNoEntitiesParses` | | 10 | 1 | 0 | 0 | 1 | — | Root | **Error**: "must not define entities or autoentities" | `TestRootWithNoDataSourceButAutoentitiesProducesError` | | 11 | 1 | 0 | 1 | 0 | — | Root | **Error**: "must not define entities" | `TestRootWithNoDataSourceButEntitiesProducesError` | | 12 | 1 | 0 | 1 | 1 | — | Root | **Error** | _covered by #11_ | | 13 | 1 | 1 | 0 | 0 | — | Root (with own DS) | **Error**: "No entities found" | `TestRootWithDataSourceAndNoEntitiesProducesError` | | 14a | 1 | 1 | 0 | 1 | 0 | Root (with own DS) | **Error**: "No entities found" | `TestRootWithDataSourceAndAutoentitiesResolvingZeroProducesError` | | 14b | 1 | 1 | 0 | 1 | >0 | Root (with own DS) | **Valid** | `TestRootWithDataSourceAndAutoentitiesResolvingEntitiesIsValid` | | 15 | 1 | 1 | 1 | 0 | — | Root (with own DS) | **Valid** | `TestRootWithDataSourceAndEntitiesIsValid` | | 16a | 1 | 1 | 1 | 1 | 0 | Root (with own DS) | **Valid** + **Warn** | `TestRootWithEntitiesAndAutoentitiesResolvingZeroLogsWarning` | | 16b | 1 | 1 | 1 | 1 | >0 | Root (with own DS) | **Valid** | _covered by Azure#15 / #14b combined_ | ### Truth table — child config (validated when iterating `root.ChildConfigs`) Children are always treated as non-root regardless of their own `data-source-files`. | # | DS | E | AE | AE resolved | Expected | Test | |---|:--:|:-:|:--:|:-----------:|----------|------| | C1 | 0 | 0 | 0 | — | **Error** naming the child file: "data source is required" | `TestChildWithNoDataSourceProducesNamedError` | | C2 | 0 | * | * | — | **Error** naming the child file: "data source is required" | _covered by C1_ | | C3 | 1 | 0 | 0 | — | **Error** naming the child file: "No entities found" | `TestChildWithDataSourceAndNoEntitiesProducesNamedError` | | C4a | 1 | 0 | 1 | 0 | **Error** naming the child file: "No entities found" | `TestChildWithDataSourceAndAutoentitiesResolvingZeroProducesNamedError` | | C4b | 1 | 0 | 1 | >0 | **Valid** | _covered by C5 (resolved entities behave the same as manual entities)_ | | C5 | 1 | 1 | 0 | — | **Valid** | _implicitly via `TestRootWithDataSourceAndEntitiesIsValid` setup_ | | C6a | 1 | 1 | 1 | 0 | **Valid** + **Warn** naming the child file | `TestChildWithEntitiesAndAutoentitiesResolvingZeroLogsNamedWarning` | | C6b | 1 | 1 | 1 | >0 | **Valid** | _covered by C5_ | ### Other scenarios | Scenario | Expected | Test | |----------|----------|------| | Connection-string error gates entity validation (no entity error fires when DB unreachable) | `IsConfigValid == false` due to connection error only | `TestValidateNonRootZeroEntitiesWithInvalidConnectionString` | | Config with no entities parses cleanly (constructor no longer throws) and `IsConfigValid` returns false without throwing | parse OK, validate fails | `TestValidateConfigWithNoEntitiesProducesCleanError` _(modified)_ | | Root parses successfully without a data source | parse OK, `IsRootConfig == true` | `TestRootConfigWithNoDataSourceAndNoEntitiesParses` | | Non-root with DS and no entities parses successfully | parse OK, `IsRootConfig == false` | `TestNonRootConfigWithDataSourceAndNoEntitiesParses` | | Autoentities present but resolve to nothing — must not crash, must not double-message with "No entities found" | no crash; only "No entities found" if total = 0 | `ValidateAutoentitiesConfiguration` _(modified to `isValidateOnly: true`)_ | New tests: `TestRootConfigWithNoDataSourceAndNoEntitiesParses` Root config (has data-source-files) without datasource parses OK `TestNonRootConfigWithDataSourceAndNoEntitiesParses` Non-root config with datasource + no entities parses OK (validation catches it later) `TestNonRootWithDataSourceAndNoEntitiesProducesError` Calls ValidateDataSourceAndEntityPresence directly, error recorded `TestNonRootWithNoDataSourceProducesError` No datasource, error with "data source is required" `TestNonRootWithDataSourceAndEntitiesIsValid` Datasource + entities, no errors `TestRootWithNoDataSourceAndNoEntitiesIsValid` Root with child, no own datasource, valid `TestRootWithNoDataSourceButEntitiesProducesError` Root with entities but no datasource, error `TestRootWithDataSourceAndEntitiesIsValid` Root with own datasource + entities, valid `TestChildWithDataSourceAndNoEntitiesProducesNamedError` Child with no entities, error names the child file `TestChildWithNoDataSourceProducesNamedError` Child with no datasource, error names the child file `TestNonRootWithDataSourceAndAutoentitiesResolvingZeroProducesError` Non-root with only autoentities that resolve to 0 `TestNonRootWithDataSourceAndAutoentitiesResolvingEntitiesIsValid` Non-root with only autoentities resolving > 0 entities `TestNonRootWithEntitiesAndAutoentitiesResolvingZeroLogsWarning` Non-root with manual entities + autoentities resolving 0 `TestRootWithNoDataSourceButAutoentitiesProducesError` Root with no datasource but autoentities defined `TestRootWithDataSourceAndNoEntitiesProducesError` Root with own datasource and zero entities/autoentities `TestRootWithDataSourceAndAutoentitiesResolvingZeroProducesError` Root with own datasource and autoentities resolving 0 `TestRootWithDataSourceAndAutoentitiesResolvingEntitiesIsValid` Root with own datasource and autoentities resolving > 0 `TestRootWithEntitiesAndAutoentitiesResolvingZeroLogsWarning` Root with own datasource, manual entities, and autoentities resolving 0 `TestChildWithDataSourceAndAutoentitiesResolvingZeroProducesNamedError` Child with autoentities-only resolving 0 `TestChildWithEntitiesAndAutoentitiesResolvingZeroLogsNamedWarning` Child with manual entities + autoentities resolving 0 Modified tests: `TestValidateConfigWithNoEntitiesProducesCleanError` Replaced main's version (expected parse failure) with ours: parse succeeds, IsConfigValid returns false `ValidateAutoentitiesConfiguration` Changed to isValidateOnly: true, asserts no crashes instead of zero errors --------- Co-authored-by: Anusha Kolan <anushakolan10@gmail.com>
## Why make this change? Enables MCP clients (like MCP Inspector, Claude Desktop, VS Code Copilot) to receive real-time log output via MCP `notifications/message`. Related: Azure#3274 (depends on PR Azure#3419) ## What is this change? When `logging/setLevel` is called with a level other than "none", logs are sent to MCP clients as JSON-RPC notifications: ```json { "jsonrpc": "2.0", "method": "notifications/message", "params": { "level": "info", "logger": "Azure.DataApiBuilder.Service.Startup", "data": "Starting Data API builder..." } } ``` ### New files: - `McpLogNotificationWriter.cs` - Writes logs as MCP notifications to stdout - `McpLogger.cs` / `McpLoggerProvider.cs` - ILogger implementation for .NET logging pipeline - `McpLogNotificationTests.cs` - Unit tests (8 tests) ### Modified files: - `Program.cs` - Registers `McpNotificationWriter` and `McpLoggerProvider` for MCP mode - `McpStdioServer.cs` - Enables notifications when `logging/setLevel` is called ## How was this tested? - Unit tests: 6 tests covering level mapping, enable/disable, JSON format - Manual testing with MCP Inspector: verified notifications appear when `logging/setLevel` is sent ## Note This PR targets `dev/anushakolan/set-log-level` (PR Azure#3419) as it depends on the `logging/setLevel` implementation.
## Why make this change?
Fixes the format of the OData filter in JWT string claims.
## What is this change?
In `AuthorizationResolver` we now escape embedded single quotes in claim
values by doubling them, before we wrap the value in single quotes for
OData substitution. This conforms to the OData 4.01 ABNF rule for string
literals (Section 7: Literal Data Values).
Policy: `@item.col1 eq @claims.userId`
Claim `userId` value: `alice' or 1 eq 1 or '`
| | Resulting OData predicate |
| --- | --- |
| Before | `col1 eq 'alice' or 1 eq 1 or ''` <- injects `or 1 eq 1`,
bypassing row-level auth |
| After | `col1 eq 'alice'' or 1 eq 1 or '''` <- attacker payload
contained inside a single string literal |
## How was this tested?
New parameterized test
`DbPolicy_StringClaim_SingleQuotesEscaped_PreventsODataInjection` in
`src/Service.Tests/Authorization/AuthorizationResolverUnitTests.cs`
covers:
- Active OR-predicate injection attempt is neutralized.
- Legitimate apostrophe-bearing value (e.g. `O'Brien`) is safely
escaped.
- Value composed solely of single quotes is fully escaped.
- Value with no single quotes is unchanged aside from the enclosing
quotes (no regression).
## Sample Request(s)
```json
{
"entities": {
"Note": {
"source": "dbo.Notes",
"permissions": [
{
"role": "authenticated",
"actions": [
{
"action": "read",
"policy": { "database": "@item.ownerId eq @claims.userId" }
}
]
}
]
}
}
}
```
Reproduction - `userId` claim value of `alice' or 1 eq 1 or '`:
```http
GET /api/Note HTTP/1.1
Authorization: Bearer <jwt-with-crafted-userId-claim>
X-MS-API-ROLE: authenticated
```
- Before fix: the engine emitted `WHERE ownerId = 'alice' or 1 eq 1 or
''`, returning rows owned by other users.
- After fix: the engine emits `WHERE ownerId = 'alice'' or 1 eq 1 or
'''`, which compares against the literal string `alice' or 1 eq 1 or '`
and returns no unauthorized rows.
Co-authored-by: Souvik Ghosh <souvikofficial04@gmail.com>
Co-authored-by: Aniruddh Munde <anmunde@microsoft.com>
…ent (Azure#3475) ## Why make this change? `nuget/README.md` (the README shown on the NuGet package page) was minimal and stale — it described only REST and GraphQL, omitted the MCP endpoint entirely and other content present in the root `README.md`. ## What is this change? Rewrote `nuget/README.md` to match the quality and completeness of the root `README.md`, adapted for the NuGet context: - **Badges** — docs link, license - **About** — updated to cover REST, GraphQL, and MCP endpoints - **Features** — added MCP section: DML tools, custom tools via stored procedures, configurable path (`/mcp` default), AI client compatibility (GitHub Copilot, Azure AI Foundry) - **Resources** — aligned with root README; links use absolute GitHub URLs ## How was this tested? - [ ] Integration Tests - [ ] Unit Tests Documentation-only change; no code logic modified. ## Sample Request(s) N/A --------- Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com> Co-authored-by: RubenCerna2079 <32799214+RubenCerna2079@users.noreply.github.com> Co-authored-by: Ruben Cerna <rcernaserna@microsoft.com> Co-authored-by: Aniruddh Munde <anmunde@microsoft.com>
Co-authored-by: Copilot <copilot@github.com>
Co-authored-by: Copilot <copilot@github.com>
|
|
||
| if (string.IsNullOrWhiteSpace(level2Options.ConnectionString)) | ||
| { | ||
| throw new Exception("Embeddings Azure Managed Redis L2 cache requires a valid connection-string."); |
There was a problem hiding this comment.
for log messages and such not sure if it would be less confusing to refer to it as 'Remote' cache or something in case people are running outside of Azure
Owner
Author
There was a problem hiding this comment.
Technically yes, I was hoping it is our value prop going into L2 caching. Will Modify it.
| .WithBackplane(new RedisBackplane(new RedisBackplaneOptions | ||
| { | ||
| ConnectionMultiplexerFactory = async () => await connectionMultiplexerTask | ||
| })); |
There was a problem hiding this comment.
for this particular api do we not need to .Apply() or something to signal that we're done configuring?
Owner
Author
There was a problem hiding this comment.
Nope, FusionCacheBuilder is itself registered in the DI container.
JimRoberts-MS
approved these changes
May 11, 2026
**Summary** This PR adds configurable text chunking capabilities to the embeddings API, enabling automatic text segmentation before embedding generation. This feature supports both single-text and multi-document batch processing with runtime configuration and query parameter overrides. **Changes** Configuration Added EmbeddingsChunkingOptions.cs - Configuration model for chunking behavior Enabled (bool) - Enable/disable chunking SizeChars (int) - Chunk size in characters (default: 800) OverlapChars (int) - Overlap between chunks (default: 100) EffectiveSizeChars property ensures minimum valid chunk size Modified EmbeddingsOptions.cs - Added Chunking property and IsChunkingEnabled helper Removed EmbeddingsCacheOptions.cs - Simplified configuration by removing unused cache feature **API Enhancements** Modified Controllers/EmbeddingController.cs Auto-detects request type (single text vs. document array) Implements overlapping text chunking algorithm Supports query parameter overrides: $chunking.enabled, $chunking.size-chars, $chunking.overlap-chars Returns multiple embeddings per document when chunking is enabled Added Models/EmbedDocumentRequest.cs - Request model for document arrays Added Models/EmbedDocumentResponse.cs - Response model with chunked embeddings Schema ([schemas](vscode-file://vscode-app/c:/Users/ajtiwari/AppData/Local/Programs/Microsoft%20VS%20Code/07ff9d6178/resources/app/out/vs/code/electron-browser/workbench/workbench.html)) Modified dab.draft.schema.json - Added chunking configuration schema with validation rules Testing ([UnitTests](vscode-file://vscode-app/c:/Users/ajtiwari/AppData/Local/Programs/Microsoft%20VS%20Code/07ff9d6178/resources/app/out/vs/code/electron-browser/workbench/workbench.html)) Added EmbeddingsChunkingOptionsTests.cs (13 tests) - Configuration validation Added ChunkTextTests.cs (21 tests) - Chunking algorithm validation including edge cases Modified EmbeddingControllerTests.cs (+18 tests) - API endpoint tests for chunking and document arrays Total Test Coverage: 72 tests (48 existing + 24 new) - All passing## Why make this change? **Testing** All 72 unit tests passing Edge cases covered: empty text, very small chunks, overlap larger than chunk size, Unicode text Query parameter parsing validated Backward compatibility verified **Breaking Changes** None - This is a backward-compatible addition. Existing single-text requests continue to work without modification. --------- Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com> Co-authored-by: JerryNixon <1749983+JerryNixon@users.noreply.github.com> Co-authored-by: roberto.perez <robertopc@gmail.com> Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> Co-authored-by: Souvik Ghosh <souvikofficial04@gmail.com> Co-authored-by: Aniruddh Munde <anmunde@microsoft.com> Co-authored-by: sayalikudale <68876274+sayalikudale@users.noreply.github.com> Co-authored-by: RubenCerna2079 <32799214+RubenCerna2079@users.noreply.github.com> Co-authored-by: aaronburtle <93220300+aaronburtle@users.noreply.github.com> Co-authored-by: Anusha Kolan <anushakolan10@gmail.com> Co-authored-by: Sayali Kudale <sayalikudale@microsoft.com> Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> Co-authored-by: Jim Roberts <JimRoberts-MS@users.noreply.github.com>
Co-authored-by: Copilot <copilot@github.com>
Co-authored-by: Copilot <copilot@github.com>
Co-authored-by: Copilot <copilot@github.com>
…-api-builder into embedding-phase2
…-api-builder into embedding-phase2
## Summary - Closes Azure#3398 Internally, all tools respects the enabled/disabled config flag. However, discovery was still made. Disabled tools were still returned in `tools/list` MCP responses. This fix filters them out at the listing level. ## Changes - Added `IsEnabled(RuntimeConfig)` to `IMcpTool` interface - Implemented per-tool config checks in all 7 built-in tools + `DynamicCustomTool` (always enabled) - Added `GetEnabledTools(RuntimeConfig)` to `McpToolRegistry` - Updated HTTP and stdio `tools/list` handlers to use `GetEnabledTools` with fail-fast (no silent fallback to unfiltered list) - Added unit tests for filtering logic and per-tool `IsEnabled` behavior ### Behavior **Before:** `tools/list` returned all registered tools regardless of `dml-tools` config flags. Setting `"delete-record": false` or `"create-record": false` had zero effect — disabled tools still appeared in the tool list and were callable. **After:** `tools/list` only returns tools where `IsEnabled(RuntimeConfig)` is true. Disabled DML tools no longer appear in the response. Both HTTP (Streamable) and stdio transports are covered.
## Why make this change? - Solves issue Azure#3479 ## What is this change? - Changes `Cli.proj` and `Core.proj` files to retrieve their unique NOTICE files from their respective new path. - Changes `notice-generation.ps1` script to use the NOTICE file path from the pipeline as an argument instead of having it hard coded. - Changes `create-manifest-file.ps1` script to add the new nuget Core package that will be created in the pipeline. ## How was this tested? This is tested alongside the changes from the [pipeline](https://msdata.visualstudio.com/CosmosDB/_git/data-api-builder/pullrequest/2097650) ## Sample Request(s) N/A --------- Co-authored-by: Aniruddh Munde <anmunde@microsoft.com>
prshri-msft
added a commit
to prshri-msft/data-api-builder
that referenced
this pull request
May 19, 2026
…nale Addresses two PR review comments from JimRoberts-MS on PR ajtiwari07#1: Comment ajtiwari07#2 (ParameterEmbeddingHelper.cs:157): "is there a limitation that only one parameter can be configured for embedding at a time? I think we'd want to do batching here to avoid multiple sequential waits on the api" Comment ajtiwari07#1 (RuntimeConfigValidator.cs:507): "can this limitation about default values be clarified a bit more?" Changes: 1. ParameterEmbeddingHelper.cs — refactor to use TryEmbedBatchAsync (addresses ajtiwari07#2): Restructured the substitution loop into 3 phases: - COLLECT — validate each embed param value, gather (paramName, text) pairs (preserves per-param error specificity for type/null checks) - BATCH — single TryEmbedBatchAsync call instead of N sequential TryEmbedAsync calls (saves ~(N-1) × API_LATENCY on cache miss) - SUBSTITUTE — write each returned vector back into resolvedParams Behavior preserved: - Single-embed-param path is equivalent (batch of 1) - All error status codes unchanged (400 for bad input, 500 for service failure, 503 for missing service) - In-place mutation contract on resolvedParams unchanged - G9 float format preserved Defensive checks added: - Length mismatch between requests and returned embeddings → 500 - Null/empty embedding for any individual param → 500 Type validation extracted into private ExtractTextValue(paramName, value) helper to flatten the nested if/else in the main loop. Verified with 16 manual test cases across REST POST/GET and GraphQL, covering single-embed and multi-embed sprocs, positive and negative inputs. 2. ParameterEmbeddingHelper.cs — fix misleading internal comment (related to ajtiwari07#1's surrounding context): The previous comment claimed "DAB's existing required-param validation handles missing required params later" — but DAB's request validation for sprocs only checks for extra fields (not missing ones). The actual mechanism that catches missing required params is the SQL Server error "expects parameter X, which was not supplied", parsed by MsSqlDbExceptionParser into a 400 DatabaseInputError. Updated the comment to describe what actually happens. 3. RuntimeConfigValidator.cs — expand embed/default rule rationale (addresses ajtiwari07#1): Replaced the one-line rationale for "Rule 3: embed:true with a default value is not supported" with a multi-paragraph explanation that: - Leads with the conceptual UX point: an embed param represents user input (typically a search query); defaulting it would mean the server fabricates and embeds a query the user never typed. That isn't a sensible fallback. - Notes that defaults on non-embed params of the same sproc remain supported (rule only fires for embed: true params). - Briefly documents why even setting aside the UX concern, supporting embed-defaults would be non-trivial (GraphQL schema literal-baking has no VECTOR type; REST/MCP defaults would be re-embedded every request; embedding-at-startup couples startup to provider availability). - Documents the current observed behavior when a client forgets to supply an embed param (verified empirically): explicit null/empty → 400 BadRequest from the helper; field omitted → 400 DatabaseInputError from SQL via MsSqlDbExceptionParser. Both produce a clear, actionable client error. - Notes that the rule can be lifted later if a real use case emerges. No behavior changes beyond the batching itself. Validator rule unchanged; helper logic for missing-value handling unchanged; error message text unchanged. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
prshri-msft
added a commit
to prshri-msft/data-api-builder
that referenced
this pull request
May 19, 2026
Adds the first formal automated test coverage for Phase 3 production code. New file: src/Service.Tests/UnitTests/ParameterEmbeddingHelperTests.cs 28 tests covering ParameterEmbeddingHelper.SubstituteEmbedParametersAsync, organized into 7 #region blocks (matching the EmbeddingServiceTests.cs "#region Batch Embedding Tests" pattern): #region No-Op Cases (4 tests) - NullConfigParams_ReturnsImmediately_NoServiceCall - NoEmbedParams_ReturnsImmediately_NoServiceCall - EmbedParamsConfiguredButNoneSupplied_ReturnsAfterCollect_NoServiceCall - EmptyConfigParamsList_ReturnsImmediately_NoServiceCall #region Service Availability (2 tests) - NullService_WithEmbedParams_Throws503 (defense-in-depth) - NullService_WithoutEmbedParams_NoThrow (backward compat) #region Input Type Validation (8 tests) - PlainString_AcceptsAndEmbeds - JsonElementString_AcceptsAndEmbeds - JsonElementNumber_Throws400 - JsonElementBoolean_Throws400 - JsonElementArray_Throws400 - JsonElementObject_Throws400 - JsonElementNull_Throws400AsEmpty - NonStringNonJsonElementValue_Throws400 #region Empty And Whitespace Validation (3 tests) - EmptyString_Throws400 - WhitespaceString_Throws400 - NullValue_Throws400 #region Batching Behavior (5 tests) — covers Jim review comment ajtiwari07#2 - SingleEmbedParam_CallsBatchOnce_NotSequential - MultipleEmbedParams_CallsBatchOnce_NotSequential ← key batching guarantee - MixedEmbedAndNonEmbed_OnlyEmbedTextsBatched - MultipleEmbedParams_OrderPreserved_BatchTextsMatchConfigOrder - PartiallySuppliedEmbedParams_BatchesSubsetOnly #region Batch Result Handling (4 tests) - BatchSuccess_VectorsSubstitutedInResolvedParams - BatchFailure_Throws500_WithAllParamNames - BatchLengthMismatch_Throws500 - IndividualEmbeddingEmpty_Throws500_NamingFailedParam #region Output Format And Cancellation (2 tests) - VectorJson_UsesG9AndInvariantCulture ← validates locale-independent G9 float serialization - CancellationToken_ForwardedToEmbeddingService Implementation notes -------------------- - Mocking pattern: Mock<IEmbeddingService> (Strict). Each test sets up the expected batch call and verifies post-conditions on resolvedParams. No database, no DI container — pure unit tests. - Helper factories at top of class (EmbedParam, NormalParam, JsonElementFrom, SetupBatch, VerifyBatchedExactlyOnce) keep individual test bodies focused on the behavior being tested. - JsonElement construction uses JsonDocument.Parse(...).RootElement.Clone() so the parsed element survives after the source document is disposed. - Float values used in expected-output assertions are powers of 1/2 (0.5, 0.25, 0.125) which are exactly representable in binary float and round-trip through G9 to the same string representation. The "G9 + InvariantCulture" test specifically uses non-exact values (0.1f, -0.2f, 0.0001234567f) and asserts on parsed-back values rather than exact strings to verify precision and locale independence. - `#nullable enable` directive at the top of the file is required because the helper signature uses IDictionary<string, object?> and the test project is set to `<Nullable>disable</Nullable>` globally. Per-file enable is the smallest scoped change that lets us match the helper's signature without introducing project-wide nullable warnings. Test results ------------ - dotnet build src/Service.Tests/Azure.DataApiBuilder.Service.Tests.csproj -c Release → Build succeeded. 0 Warning(s). 0 Error(s). - dotnet test --filter "FullyQualifiedName~ParameterEmbeddingHelperTests" → Total tests: 28. Passed: 28. Failed: 0. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
## Why make this change? - Closes Azure#3593 - Closes Azure#3486 - Azure#3168 attempted to add a simplified dab config to the docker image build. However, the official image that is built internally picks up the dab-config.json from the binaries built folder. - That is why the generic dab-config.json should move to `src`/`Service` and replace the unusable previous dab-config.json that was specific to Aspire project. This way the generic dab-config gets copied into the publish folder and lands in the official docker image ## What is this change? **Configuration and Environment Variable Changes:** - Changed environment variable for database connection strings from `ConnectionStrings__Database` to `DAB_CONNSTRING` throughout the codebase, ensuring consistency and aligning with Data API Builder conventions (`src/Aspire.AppHost/AppHost.cs`, `src/Service/dab-config.json`). [[1]](diffhunk://#diff-449ec835acbdec5d143b744fabbfe570c55adfb1f7c8aeede080f9da7a8051aeL42-R46) [[2]](diffhunk://#diff-449ec835acbdec5d143b744fabbfe570c55adfb1f7c8aeede080f9da7a8051aeL82-R86) [[3]](diffhunk://#diff-1419093a1971fc58ad92a1d189020545ab9ac30f7a2143c5c74acf6b9bde266dR2-R5) - Set the database type explicitly to `mssql` in `dab-config.json` instead of using an environment variable, simplifying configuration for local development (`src/Service/dab-config.json`). **Entity Management Simplification:** - Removed explicit entity definitions and relationships in favor of `autoentities` with a default pattern that includes all schema objects, drastically reducing manual configuration and making the setup more maintainable (`dab-config.json`, `src/Service/dab-config.json`). [[1]](diffhunk://#diff-9171a2da7cd405d440d610f1d8e2bbdf02d2f051318901ff7fe7d4ad3711a7c7L36-R41) [[2]](diffhunk://#diff-1419093a1971fc58ad92a1d189020545ab9ac30f7a2143c5c74acf6b9bde266dR2-R5) - Added a pattern in `autoentities` to automatically include all objects and generate entity names in the format `{schema}_{object}` (`dab-config.json`, `src/Service/dab-config.json`). [[1]](diffhunk://#diff-9171a2da7cd405d440d610f1d8e2bbdf02d2f051318901ff7fe7d4ad3711a7c7L36-R41) [[2]](diffhunk://#diff-1419093a1971fc58ad92a1d189020545ab9ac30f7a2143c5c74acf6b9bde266dR2-R5) **Access and Security Relaxations for Development:** - Updated CORS settings to allow all origins by default, making APIs accessible from any client during development (`dab-config.json`, `src/Service/dab-config.json`). [[1]](diffhunk://#diff-9171a2da7cd405d440d610f1d8e2bbdf02d2f051318901ff7fe7d4ad3711a7c7L27-R27) [[2]](diffhunk://#diff-1419093a1971fc58ad92a1d189020545ab9ac30f7a2143c5c74acf6b9bde266dR2-R5) - Changed authentication provider to `Unauthenticated` and allowed all actions for the `anonymous` role, making the API open by default for easier testing (`dab-config.json`, `src/Service/dab-config.json`). [[1]](diffhunk://#diff-9171a2da7cd405d440d610f1d8e2bbdf02d2f051318901ff7fe7d4ad3711a7c7L52-R55) [[2]](diffhunk://#diff-1419093a1971fc58ad92a1d189020545ab9ac30f7a2143c5c74acf6b9bde266dR2-R5) **Minor Improvements:** - Improved error messaging for GraphQL name violations by including the entity name in the exception message (`src/Core/Services/MetadataProviders/SqlMetadataProvider.cs`). - Disabled strict request body validation for REST endpoints to allow more flexible payloads during development (`src/Service/dab-config.json`). ## How was this tested? - Used the default config to start dab on a local SQL Server and ensured all endpoints were working. Once merged, will build a test docker image from internal buddy pipeline to verify inclusion.
…`entity.description` covers MCP (Azure#3552) ## Why make this change? `entities.*.source.description` existed in the JSON schema with documentation claiming it drove MCP tool discovery — but it was never wired to any C# model (`EntitySource` has no `Description` field) and was silently ignored at runtime. Meanwhile, `entities.*.description` — the field that *actually* surfaces in MCP — made no mention of MCP. Users following the schema would configure the wrong field and get no MCP output with no error. ## What is this change? - **Removed** `entities.*.source.description` from `schemas/dab.draft.schema.json`. The property never had a backing field in `EntitySource` and was dead schema surface area. - **Updated** the description of `entities.*.description` to explicitly mention MCP: ```diff - "Optional description for the entity. Will be surfaced in generated API documentation and GraphQL schema as comments." + "Optional description for the entity. Will be surfaced in MCP tool discovery, generated API documentation, and GraphQL schema as comments." ``` The correct config pattern for MCP-visible descriptions is: ```json "Todo": { "description": "Manages to-do items — surfaced in MCP, REST docs, and GraphQL schema.", "source": { "object": "dbo.Todos", "type": "table" } } ``` ## How was this tested? - [ ] Integration Tests - [ ] Unit Tests Schema-only change; no runtime behavior altered. The removed property had no backing model field and was never deserialized. ## Sample Request(s) N/A — schema documentation fix only. --------- Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com> Co-authored-by: Aniruddh25 <3513779+Aniruddh25@users.noreply.github.com> Co-authored-by: Aniruddh Munde <anmunde@microsoft.com>
Co-authored-by: Copilot <copilot@github.com>
## Why make this change? - Closes Azure#3588 `ExecuteEntityTool` validated parameters against config-side `Source.Parameters` only, rejecting DB-discovered parameters that had no config entry. Defaults were applied in an all-or-nothing fashion (only when the user supplied zero parameters). ## What is this change? - Parameter validation now uses `StoredProcedureDefinition.Parameters` (the merged DB metadata) as source of truth, accepting any parameter the DB declares. - Default application iterates `ParameterDefinition.HasConfigDefault`/`ConfigDefaultValue` per-missing-parameter, so partial user input gets remaining defaults filled in. This is an intentional behavioral change from the previous all-or-nothing guard. - Safe-cast `dbObject` to `DatabaseStoredProcedure` with graceful error instead of unconditional cast. ## How was this tested? - [x] Unit Tests | Test | Scenario | |------|----------| | `AcceptsDbDiscoveredParam` | Param in DB metadata but not config - accepted | | `RejectsInvalidParamName` (x2) | Param not in DB metadata - error | | `AcceptsMultipleValidParams` | Multiple valid params pass validation | | `RejectsRequest_WhenAnyParamInvalid` | Mix of valid + invalid params - rejected | | `AppliesConfigDefaults` | Missing params filled from `HasConfigDefault`/`ConfigDefaultValue` | | `UserParamsOverride` | User-supplied values take precedence over config defaults | | `DoesNotInjectWithoutDefault` | Param without `HasConfigDefault` is not injected | | `ZeroParamSP` | SP with no parameters succeeds with no input | | `RejectsNonStoredProcedureEntity` | Non-SP entity types return error | | `ReturnsError_WhenEntityNotFound` | Non-existent entity returns EntityNotFound | - [x] Integration Tests | Test | Scenario | |------|----------| | Success (x4 DataRows) | GetBook(id), InsertBook(null), InsertBook(override), GetBooks(zero-param) | | `GetBookById_ReturnsRecord` | Validates returned data matches expected | | `InsertBookWithDefaults_ExecutesSuccessfully` | Config defaults produce valid insert | | Rejection (x2 DataRows) | Invalid param names rejected with clear error | - [x] Manual Testing | Category | Result | |----------|--------| | Valid DB-discovered params | Pass | | Invalid/unknown/extra params rejected | Pass | | Config defaults / partial params / override | Pass | | Non-SP entity type gating | Pass | | Successful SP execution (zero/single/multi-param) | Pass | | Non-existent entity | Pass | | SQL injection (parameterized) | Pass | | Type mismatch | Pass | | Permission denied | Pass | | No results for valid param | Pass | ## Sample Request(s) NA
## Why make this change? - Closes Azure#3589 DynamicCustomTool (per-entity custom MCP tools) used `entityConfig.Source.Parameters` / `ParameterMetadata` for parameter validation and defaults. This meant: - Parameters discovered only via DB metadata (not listed in config) were rejected. - Defaults were applied all-or-nothing instead of per-missing-parameter. ## What is this change? - **Validate against DB metadata**: After authorization, safe-cast `dbObject` to `DatabaseStoredProcedure` and validate user-supplied params against `spDefinition.Parameters` (source of truth populated by `FillSchemaForStoredProcedureAsync`). - **Per-param config defaults**: Iterate `spDefinition.Parameters`; for each missing param with `HasConfigDefault == true`, inject `ConfigDefaultValue`. User-supplied params always take precedence. - **Removed old logic**: Replaced `entityConfig.Source.Parameters` / `ParameterMetadata` loop. ## How was this tested? - [x] Unit Tests | Test | Scenario | |------|----------| | `AcceptsDbDiscoveredParam` | Param in DB metadata but not config - accepted | | `RejectsInvalidParamName` (x2) | Param not in DB metadata - error | | `AppliesConfigDefaults` | Missing params filled from `HasConfigDefault`/`ConfigDefaultValue` | | `UserParamsOverride` | User-supplied values take precedence over config defaults | | `DoesNotInjectWithoutDefault` | Param without `HasConfigDefault` is not injected | | `ZeroParamSP` | SP with no parameters succeeds with no input | - [x] Integration Tests | Test | Scenario | |------|----------| | Success (x4 DataRows) | GetBook(id), InsertBook(null), InsertBook(override), GetBooks(zero-param) | | `GetBookById_ReturnsRecord` | Validates returned data matches expected | | Rejection (x2 DataRows) | Invalid param names rejected with clear error | - [x] Manual Testing | Category | Result | |----------|--------| | Valid params / no results | Pass | | Type mismatch - error | Pass | | Zero-param SPs | Pass | | Config defaults / partial params | Pass | | Multi-param update | Pass | | SQL injection (parameterized) | Pass | | Boundary values (-1, 0, MAX_INT) | Pass | | Empty/long/unicode strings | Pass | | Read-after-write, FK errors, cross-entity | Pass | ## Sample Request(s) NA --------- Co-authored-by: Anusha Kolan <anushakolan10@gmail.com>
## Why make this change? - Solves issue Azure#3607 Currently the notice file script has a bug that causes it to not replace some of the strings correctly as the values in the `content` variable that are replaced are overwritten by other changes, so they are not saved. ## What is this change? Changed the way the values are changed inside the `content` value by using regex. Currently the only values that need to be deleted are for the years 2023 and 2024. This might change in the future. ## How was this tested? - [ ] Integration Tests - [ ] Unit Tests - [x] Manual Testing Tested this manually by running the script locally. ## Sample Request(s) N/A --------- Co-authored-by: aaronburtle <93220300+aaronburtle@users.noreply.github.com>
prshri-msft
added a commit
to prshri-msft/data-api-builder
that referenced
this pull request
May 21, 2026
…nale Addresses two PR review comments from JimRoberts-MS on PR ajtiwari07#1: Comment ajtiwari07#2 (ParameterEmbeddingHelper.cs:157): "is there a limitation that only one parameter can be configured for embedding at a time? I think we'd want to do batching here to avoid multiple sequential waits on the api" Comment ajtiwari07#1 (RuntimeConfigValidator.cs:507): "can this limitation about default values be clarified a bit more?" Changes: 1. ParameterEmbeddingHelper.cs — refactor to use TryEmbedBatchAsync (addresses ajtiwari07#2): Restructured the substitution loop into 3 phases: - COLLECT — validate each embed param value, gather (paramName, text) pairs (preserves per-param error specificity for type/null checks) - BATCH — single TryEmbedBatchAsync call instead of N sequential TryEmbedAsync calls (saves ~(N-1) × API_LATENCY on cache miss) - SUBSTITUTE — write each returned vector back into resolvedParams Behavior preserved: - Single-embed-param path is equivalent (batch of 1) - All error status codes unchanged (400 for bad input, 500 for service failure, 503 for missing service) - In-place mutation contract on resolvedParams unchanged - G9 float format preserved Defensive checks added: - Length mismatch between requests and returned embeddings → 500 - Null/empty embedding for any individual param → 500 Type validation extracted into private ExtractTextValue(paramName, value) helper to flatten the nested if/else in the main loop. Verified with 16 manual test cases across REST POST/GET and GraphQL, covering single-embed and multi-embed sprocs, positive and negative inputs. 2. ParameterEmbeddingHelper.cs — fix misleading internal comment (related to ajtiwari07#1's surrounding context): The previous comment claimed "DAB's existing required-param validation handles missing required params later" — but DAB's request validation for sprocs only checks for extra fields (not missing ones). The actual mechanism that catches missing required params is the SQL Server error "expects parameter X, which was not supplied", parsed by MsSqlDbExceptionParser into a 400 DatabaseInputError. Updated the comment to describe what actually happens. 3. RuntimeConfigValidator.cs — expand embed/default rule rationale (addresses ajtiwari07#1): Replaced the one-line rationale for "Rule 3: embed:true with a default value is not supported" with a multi-paragraph explanation that: - Leads with the conceptual UX point: an embed param represents user input (typically a search query); defaulting it would mean the server fabricates and embeds a query the user never typed. That isn't a sensible fallback. - Notes that defaults on non-embed params of the same sproc remain supported (rule only fires for embed: true params). - Briefly documents why even setting aside the UX concern, supporting embed-defaults would be non-trivial (GraphQL schema literal-baking has no VECTOR type; REST/MCP defaults would be re-embedded every request; embedding-at-startup couples startup to provider availability). - Documents the current observed behavior when a client forgets to supply an embed param (verified empirically): explicit null/empty → 400 BadRequest from the helper; field omitted → 400 DatabaseInputError from SQL via MsSqlDbExceptionParser. Both produce a clear, actionable client error. - Notes that the rule can be lifted later if a real use case emerges. No behavior changes beyond the batching itself. Validator rule unchanged; helper logic for missing-value handling unchanged; error message text unchanged. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
prshri-msft
added a commit
to prshri-msft/data-api-builder
that referenced
this pull request
May 21, 2026
Adds the first formal automated test coverage for Phase 3 production code. New file: src/Service.Tests/UnitTests/ParameterEmbeddingHelperTests.cs 28 tests covering ParameterEmbeddingHelper.SubstituteEmbedParametersAsync, organized into 7 #region blocks (matching the EmbeddingServiceTests.cs "#region Batch Embedding Tests" pattern): #region No-Op Cases (4 tests) - NullConfigParams_ReturnsImmediately_NoServiceCall - NoEmbedParams_ReturnsImmediately_NoServiceCall - EmbedParamsConfiguredButNoneSupplied_ReturnsAfterCollect_NoServiceCall - EmptyConfigParamsList_ReturnsImmediately_NoServiceCall #region Service Availability (2 tests) - NullService_WithEmbedParams_Throws503 (defense-in-depth) - NullService_WithoutEmbedParams_NoThrow (backward compat) #region Input Type Validation (8 tests) - PlainString_AcceptsAndEmbeds - JsonElementString_AcceptsAndEmbeds - JsonElementNumber_Throws400 - JsonElementBoolean_Throws400 - JsonElementArray_Throws400 - JsonElementObject_Throws400 - JsonElementNull_Throws400AsEmpty - NonStringNonJsonElementValue_Throws400 #region Empty And Whitespace Validation (3 tests) - EmptyString_Throws400 - WhitespaceString_Throws400 - NullValue_Throws400 #region Batching Behavior (5 tests) — covers Jim review comment ajtiwari07#2 - SingleEmbedParam_CallsBatchOnce_NotSequential - MultipleEmbedParams_CallsBatchOnce_NotSequential ← key batching guarantee - MixedEmbedAndNonEmbed_OnlyEmbedTextsBatched - MultipleEmbedParams_OrderPreserved_BatchTextsMatchConfigOrder - PartiallySuppliedEmbedParams_BatchesSubsetOnly #region Batch Result Handling (4 tests) - BatchSuccess_VectorsSubstitutedInResolvedParams - BatchFailure_Throws500_WithAllParamNames - BatchLengthMismatch_Throws500 - IndividualEmbeddingEmpty_Throws500_NamingFailedParam #region Output Format And Cancellation (2 tests) - VectorJson_UsesG9AndInvariantCulture ← validates locale-independent G9 float serialization - CancellationToken_ForwardedToEmbeddingService Implementation notes -------------------- - Mocking pattern: Mock<IEmbeddingService> (Strict). Each test sets up the expected batch call and verifies post-conditions on resolvedParams. No database, no DI container — pure unit tests. - Helper factories at top of class (EmbedParam, NormalParam, JsonElementFrom, SetupBatch, VerifyBatchedExactlyOnce) keep individual test bodies focused on the behavior being tested. - JsonElement construction uses JsonDocument.Parse(...).RootElement.Clone() so the parsed element survives after the source document is disposed. - Float values used in expected-output assertions are powers of 1/2 (0.5, 0.25, 0.125) which are exactly representable in binary float and round-trip through G9 to the same string representation. The "G9 + InvariantCulture" test specifically uses non-exact values (0.1f, -0.2f, 0.0001234567f) and asserts on parsed-back values rather than exact strings to verify precision and locale independence. - `#nullable enable` directive at the top of the file is required because the helper signature uses IDictionary<string, object?> and the test project is set to `<Nullable>disable</Nullable>` globally. Per-file enable is the smallest scoped change that lets us match the helper's signature without introducing project-wide nullable warnings. Test results ------------ - dotnet build src/Service.Tests/Azure.DataApiBuilder.Service.Tests.csproj -c Release → Build succeeded. 0 Warning(s). 0 Error(s). - dotnet test --filter "FullyQualifiedName~ParameterEmbeddingHelperTests" → Total tests: 28. Passed: 28. Failed: 0. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Azure#3619) ## Why make this change? - Closes Azure#3615 Custom MCP tools used a permissive multi-type array for all parameters in the input schema. MCP clients had no type information to guide users or validate inputs before execution. ## What is this change? - **InitializeMetadata**: Resolves StoredProcedureDefinition.Parameters from the metadata provider and builds a typed JSON Schema (integer/number/boolean/string) for each DB-discovered parameter. - **Fallback**: Falls back to config-based permissive schema when DB metadata is unavailable. - **Both modes**: Applied in HTTP mode (McpToolRegistryInitializer) and stdio mode (McpStdioHelper) via shared `McpToolRegistry.InitializeAndRegisterTools` helper. - **Default descriptions**: Surfaces `(default: value)` in parameter descriptions when HasConfigDefault is set. ## How was this tested? - [x] Unit Tests | Test | Scenario | |------|----------| | UsesDbMetadata_WhenInitialized | Schema uses typed params after InitializeMetadata | | FallsBackToConfig_WhenDbMetadataUnavailable | Permissive array when no metadata factory | | UsesConfigSchema_WhenNotInitialized | Config schema used without InitializeMetadata call | | IncludesAllDbDiscoveredParams | Params not in config but in DB appear in schema | | MapsSystemTypesToJsonSchemaTypes (x12) | int/long/short/byte to integer, float/double/decimal to number, bool to boolean, string/Guid/DateTime to string | | UsesPermissiveType_WhenSystemTypeIsNull | Null SystemType falls back to multi-type array | | IncludesDefaultInDescription | (default: value) in description | | ZeroParamSP_ReturnsEmptyProperties | Empty properties for zero-param SP | - [x] Integration Tests | Test | Scenario | |------|----------| | InitializeMetadata_SchemaReflectsDbParameterTypes (x3) | int->integer, varchar->string mapping against live DB | | InitializeMetadata_ZeroParamSP_HasEmptyProperties | Zero-param SP has empty properties from DB metadata | | InitializeMetadata_DescriptionIncludesConfigDefaults (x2) | Config defaults (randomX, 1234) in descriptions | - [x] Manual Testing | Tool | Before | After | |------|--------|-------| | get_book(id) | type: [...] | type: integer | | insert_book(title, publisher_id) | permissive | string, integer + defaults in desc | | update_book_title(id, title) | permissive | integer, string + defaults in desc | | get_books (zero-param) | empty | empty | | End-to-end execution get_book(id=1) | N/A | returns book record | ## Sample Request(s) tools/list response (get_book schema after fix): ```json { "name": "get_book", "inputSchema": { "type": "object", "properties": { "id": { "type": "integer", "description": "Parameter id" } } } } ```
…and validate wire output (Azure#3616) ## Why make this change? Strict MCP clients reject DAB stdio tool responses when optional content-block metadata is emitted as explicit `null` (for example, `annotations: null` and `_meta: null`). This causes otherwise successful tool calls to fail schema validation on strict clients. ## What is this change? - **MCP stdio result serialization update** - Updated `McpStdioServer.WriteResult` to serialize JSON-RPC result payloads with `JsonIgnoreCondition.WhenWritingNull`. - This omits null-valued properties in emitted result responses, including optional MCP content-block metadata fields when unset. - Existing request handling and content coercion behavior in `CoerceToMcpContentBlocks` remain unchanged. - **Wire-output regression coverage** - Added `McpStdioServerContentBlockTests` coverage that captures actual stdio JSON-RPC output from `WriteResult` and verifies `TextContentBlock` output omits `annotations` and `_meta` when unset. - This ensures behavior is validated against emitted wire format, not just local serialization assumptions. - **Illustrative behavior** ```csharp // Before (strict clients reject): // { "type": "text", "text": "hello", "annotations": null, "_meta": null } // After: // { "type": "text", "text": "hello" } ``` ## How was this tested? - [x] Integration Tests - [x] Unit Tests ## Sample Request(s) - Example CLI usage: ```bash dotnet dab start --mcp-stdio role:anonymous --LogLevel Error --config dab.config.json ``` - Example MCP request/response shape: ```json {"jsonrpc":"2.0","id":1,"method":"tools/call","params":{"name":"describe_entities","arguments":{}}} ``` ```json { "jsonrpc":"2.0", "id":1, "result":{ "content":[ { "type":"text", "text":"..." } ] } } ``` --------- Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com> Co-authored-by: Aniruddh25 <3513779+Aniruddh25@users.noreply.github.com> Co-authored-by: Aniruddh Munde <anmunde@microsoft.com> Co-authored-by: aaronburtle <93220300+aaronburtle@users.noreply.github.com>
…25, return description in serverInfo, and add initialize wire-shape tests (Azure#3622) ## Why make this change? DAB MCP stdio always returned `2025-06-18` in `initialize`, even when clients requested newer protocol versions, causing modern MCP clients (e.g., Claude Code 2.x) to disconnect. This update aligns initialize negotiation behavior with MCP version selection expectations, updates the default supported version to `2025-11-25`, aligns initialize metadata shape for newer protocol versions, and adds focused wiring tests to lock down the initialize response wire format end-to-end. ## What is this change? - **Protocol default** - Bumped MCP default protocol from `2025-06-18` to `2025-11-25` in `McpProtocolDefaults`. - **Initialize version negotiation** - `initialize` now reads `params.protocolVersion` from the client request. - Server responds with the greatest supported version `<=` client requested (never higher than the client). - **Initialize metadata field shape by negotiated protocol** - For negotiated protocol versions `>= 2025-11-25`, runtime MCP description is returned as `serverInfo.description`. - For older negotiated protocol versions, description remains top-level `instructions` for backward compatibility. - **Negotiation utility + parsing behavior** - Added centralized negotiation helper in `McpProtocolDefaults`. - Compares date-formatted versions (`yyyy-MM-dd`) with ordinal string fallback for non-date values. - Added protocol-threshold helper for choosing `serverInfo.description` vs `instructions`. - **Focused test coverage** - Added helper unit tests for: - new default protocol value, - client newer than server (server version returned), - client older than server (client version returned), - missing client version, - non-date fallback comparison path, - protocol threshold behavior for `serverInfo.description`. - Added focused initialize wiring tests for `HandleInitialize` that validate: - JSON-RPC envelope shape (`jsonrpc`, `id`), - negotiated `result.protocolVersion`, - capabilities shape, - field placement across protocol/description scenarios (`serverInfo.description` vs `instructions` vs neither). ```csharp string negotiatedProtocolVersion = McpProtocolDefaults.ResolveInitializeResponseProtocolVersion( _protocolVersion, clientRequestedProtocolVersion); ``` ## How was this tested? - [ ] Integration Tests - [x] Unit Tests ## Sample Request(s) - Example CLI usage: - `dab start --mcp-stdio` - Example initialize request: ```json {"jsonrpc":"2.0","method":"initialize","params":{"protocolVersion":"2025-11-25","capabilities":{},"clientInfo":{"name":"claude-code","version":"2.1.98"}},"id":1} ``` - Example initialize response behavior for negotiated `2025-11-25`: ```json {"jsonrpc":"2.0","id":1,"result":{"protocolVersion":"2025-11-25","capabilities":{"tools":{"listChanged":true},"logging":{}},"serverInfo":{"name":"SQL MCP Server","version":"<dab-version>","description":"<mcp-runtime-description>"}}} ``` - Example initialize response behavior for negotiated older versions: ```json {"jsonrpc":"2.0","id":1,"result":{"protocolVersion":"2025-06-18","capabilities":{"tools":{"listChanged":true},"logging":{}},"serverInfo":{"name":"SQL MCP Server","version":"<dab-version>"},"instructions":"<mcp-runtime-description>"}} ``` --------- Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com> Co-authored-by: Aniruddh25 <3513779+Aniruddh25@users.noreply.github.com> Co-authored-by: Aniruddh Munde <anmunde@microsoft.com> Co-authored-by: aaronburtle <93220300+aaronburtle@users.noreply.github.com>
prshri-msft
added a commit
to prshri-msft/data-api-builder
that referenced
this pull request
May 22, 2026
…nale Addresses two PR review comments from JimRoberts-MS on PR ajtiwari07#1: Comment ajtiwari07#2 (ParameterEmbeddingHelper.cs:157): "is there a limitation that only one parameter can be configured for embedding at a time? I think we'd want to do batching here to avoid multiple sequential waits on the api" Comment ajtiwari07#1 (RuntimeConfigValidator.cs:507): "can this limitation about default values be clarified a bit more?" Changes: 1. ParameterEmbeddingHelper.cs — refactor to use TryEmbedBatchAsync (addresses ajtiwari07#2): Restructured the substitution loop into 3 phases: - COLLECT — validate each embed param value, gather (paramName, text) pairs (preserves per-param error specificity for type/null checks) - BATCH — single TryEmbedBatchAsync call instead of N sequential TryEmbedAsync calls (saves ~(N-1) × API_LATENCY on cache miss) - SUBSTITUTE — write each returned vector back into resolvedParams Behavior preserved: - Single-embed-param path is equivalent (batch of 1) - All error status codes unchanged (400 for bad input, 500 for service failure, 503 for missing service) - In-place mutation contract on resolvedParams unchanged - G9 float format preserved Defensive checks added: - Length mismatch between requests and returned embeddings → 500 - Null/empty embedding for any individual param → 500 Type validation extracted into private ExtractTextValue(paramName, value) helper to flatten the nested if/else in the main loop. Verified with 16 manual test cases across REST POST/GET and GraphQL, covering single-embed and multi-embed sprocs, positive and negative inputs. 2. ParameterEmbeddingHelper.cs — fix misleading internal comment (related to ajtiwari07#1's surrounding context): The previous comment claimed "DAB's existing required-param validation handles missing required params later" — but DAB's request validation for sprocs only checks for extra fields (not missing ones). The actual mechanism that catches missing required params is the SQL Server error "expects parameter X, which was not supplied", parsed by MsSqlDbExceptionParser into a 400 DatabaseInputError. Updated the comment to describe what actually happens. 3. RuntimeConfigValidator.cs — expand embed/default rule rationale (addresses ajtiwari07#1): Replaced the one-line rationale for "Rule 3: embed:true with a default value is not supported" with a multi-paragraph explanation that: - Leads with the conceptual UX point: an embed param represents user input (typically a search query); defaulting it would mean the server fabricates and embeds a query the user never typed. That isn't a sensible fallback. - Notes that defaults on non-embed params of the same sproc remain supported (rule only fires for embed: true params). - Briefly documents why even setting aside the UX concern, supporting embed-defaults would be non-trivial (GraphQL schema literal-baking has no VECTOR type; REST/MCP defaults would be re-embedded every request; embedding-at-startup couples startup to provider availability). - Documents the current observed behavior when a client forgets to supply an embed param (verified empirically): explicit null/empty → 400 BadRequest from the helper; field omitted → 400 DatabaseInputError from SQL via MsSqlDbExceptionParser. Both produce a clear, actionable client error. - Notes that the rule can be lifted later if a real use case emerges. No behavior changes beyond the batching itself. Validator rule unchanged; helper logic for missing-value handling unchanged; error message text unchanged. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
prshri-msft
added a commit
to prshri-msft/data-api-builder
that referenced
this pull request
May 22, 2026
Adds the first formal automated test coverage for Phase 3 production code. New file: src/Service.Tests/UnitTests/ParameterEmbeddingHelperTests.cs 28 tests covering ParameterEmbeddingHelper.SubstituteEmbedParametersAsync, organized into 7 #region blocks (matching the EmbeddingServiceTests.cs "#region Batch Embedding Tests" pattern): #region No-Op Cases (4 tests) - NullConfigParams_ReturnsImmediately_NoServiceCall - NoEmbedParams_ReturnsImmediately_NoServiceCall - EmbedParamsConfiguredButNoneSupplied_ReturnsAfterCollect_NoServiceCall - EmptyConfigParamsList_ReturnsImmediately_NoServiceCall #region Service Availability (2 tests) - NullService_WithEmbedParams_Throws503 (defense-in-depth) - NullService_WithoutEmbedParams_NoThrow (backward compat) #region Input Type Validation (8 tests) - PlainString_AcceptsAndEmbeds - JsonElementString_AcceptsAndEmbeds - JsonElementNumber_Throws400 - JsonElementBoolean_Throws400 - JsonElementArray_Throws400 - JsonElementObject_Throws400 - JsonElementNull_Throws400AsEmpty - NonStringNonJsonElementValue_Throws400 #region Empty And Whitespace Validation (3 tests) - EmptyString_Throws400 - WhitespaceString_Throws400 - NullValue_Throws400 #region Batching Behavior (5 tests) — covers Jim review comment ajtiwari07#2 - SingleEmbedParam_CallsBatchOnce_NotSequential - MultipleEmbedParams_CallsBatchOnce_NotSequential ← key batching guarantee - MixedEmbedAndNonEmbed_OnlyEmbedTextsBatched - MultipleEmbedParams_OrderPreserved_BatchTextsMatchConfigOrder - PartiallySuppliedEmbedParams_BatchesSubsetOnly #region Batch Result Handling (4 tests) - BatchSuccess_VectorsSubstitutedInResolvedParams - BatchFailure_Throws500_WithAllParamNames - BatchLengthMismatch_Throws500 - IndividualEmbeddingEmpty_Throws500_NamingFailedParam #region Output Format And Cancellation (2 tests) - VectorJson_UsesG9AndInvariantCulture ← validates locale-independent G9 float serialization - CancellationToken_ForwardedToEmbeddingService Implementation notes -------------------- - Mocking pattern: Mock<IEmbeddingService> (Strict). Each test sets up the expected batch call and verifies post-conditions on resolvedParams. No database, no DI container — pure unit tests. - Helper factories at top of class (EmbedParam, NormalParam, JsonElementFrom, SetupBatch, VerifyBatchedExactlyOnce) keep individual test bodies focused on the behavior being tested. - JsonElement construction uses JsonDocument.Parse(...).RootElement.Clone() so the parsed element survives after the source document is disposed. - Float values used in expected-output assertions are powers of 1/2 (0.5, 0.25, 0.125) which are exactly representable in binary float and round-trip through G9 to the same string representation. The "G9 + InvariantCulture" test specifically uses non-exact values (0.1f, -0.2f, 0.0001234567f) and asserts on parsed-back values rather than exact strings to verify precision and locale independence. - `#nullable enable` directive at the top of the file is required because the helper signature uses IDictionary<string, object?> and the test project is set to `<Nullable>disable</Nullable>` globally. Per-file enable is the smallest scoped change that lets us match the helper's signature without introducing project-wide nullable warnings. Test results ------------ - dotnet build src/Service.Tests/Azure.DataApiBuilder.Service.Tests.csproj -c Release → Build succeeded. 0 Warning(s). 0 Error(s). - dotnet test --filter "FullyQualifiedName~ParameterEmbeddingHelperTests" → Total tests: 28. Passed: 28. Failed: 0. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
## Why make this change? - Solves issue Azure#3618 ## What is this change? This adds `TargetsForTfmSpecificContentInPackage` and `IncludeInternalDependenciesInPackage` to create the dependencies from the Core project as dlls. And uses the `PrivateAssets` to ensure that those dependencies are not added to the nuspec file. ## How was this tested? - [ ] Integration Tests - [ ] Unit Tests - [X] Manual Tests Tested by using command `dotnet pack .\Azure.DataApiBuilder.Core.csproj -c Release` to build nuget package and ensure that all the dependencies are created as expected. <img width="828" height="355" alt="image" src="https://github.com/user-attachments/assets/cb3a72c1-1984-4869-aece-4f1d17bc7519" /> ## Sample Request(s) N/A --------- Co-authored-by: Aniruddh Munde <anmunde@microsoft.com> Co-authored-by: aaronburtle <93220300+aaronburtle@users.noreply.github.com>
Azure#3625) ## Summary Fixes a bug in the MCP stdio transport where `isError` was silently dropped from every `tools/call` response. ### Root Cause `HandleCallTool` in `McpStdioServer` used `CoerceToMcpContentBlocks` (reflection-based) to extract only the `Content` list from the `CallToolResult`, then constructed the JSON-RPC result with `new { content }`. The `IsError` property was never read, so all tool responses — successes and errors alike — looked identical on the wire. The HTTP transport was not affected; the MCP SDK serializes the full `CallToolResult` returned by `WithCallToolHandler`, correctly emitting `"isError": true` when set. ### Fix After `CoerceToMcpContentBlocks`, read `callResult.IsError` and include `isError` in the result object only when `== true`. The existing `WhenWritingNull` serializer option on `_jsonOptions` ensures a null `isError` (success path) is omitted from the wire — matching the MCP spec. ```csharp bool? isError = callResult.IsError; if (isError == true) { WriteResult(id, new { content, isError }); } else { WriteResult(id, new { content }); } ``` ### Tests Added (`McpStdioServerContentBlockTests`) - **`HandleCallTool_ErrorResult_EmitsIsErrorTrueOnWire`** — creates a real `CallToolResult { IsError = true }`, replicates the exact `HandleCallTool` conditional logic, and asserts `"isError": true` is present in the JSON-RPC wire output. - **`HandleCallTool_SuccessResult_OmitsIsErrorFromWire`** — verifies that a success result produces no `isError` field at all on the wire. --------- Co-authored-by: aaronburtle <93220300+aaronburtle@users.noreply.github.com> Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com> Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
prshri-msft
added a commit
to prshri-msft/data-api-builder
that referenced
this pull request
May 26, 2026
…nale Addresses two PR review comments from JimRoberts-MS on PR ajtiwari07#1: Comment ajtiwari07#2 (ParameterEmbeddingHelper.cs:157): "is there a limitation that only one parameter can be configured for embedding at a time? I think we'd want to do batching here to avoid multiple sequential waits on the api" Comment ajtiwari07#1 (RuntimeConfigValidator.cs:507): "can this limitation about default values be clarified a bit more?" Changes: 1. ParameterEmbeddingHelper.cs — refactor to use TryEmbedBatchAsync (addresses ajtiwari07#2): Restructured the substitution loop into 3 phases: - COLLECT — validate each embed param value, gather (paramName, text) pairs (preserves per-param error specificity for type/null checks) - BATCH — single TryEmbedBatchAsync call instead of N sequential TryEmbedAsync calls (saves ~(N-1) × API_LATENCY on cache miss) - SUBSTITUTE — write each returned vector back into resolvedParams Behavior preserved: - Single-embed-param path is equivalent (batch of 1) - All error status codes unchanged (400 for bad input, 500 for service failure, 503 for missing service) - In-place mutation contract on resolvedParams unchanged - G9 float format preserved Defensive checks added: - Length mismatch between requests and returned embeddings → 500 - Null/empty embedding for any individual param → 500 Type validation extracted into private ExtractTextValue(paramName, value) helper to flatten the nested if/else in the main loop. Verified with 16 manual test cases across REST POST/GET and GraphQL, covering single-embed and multi-embed sprocs, positive and negative inputs. 2. ParameterEmbeddingHelper.cs — fix misleading internal comment (related to ajtiwari07#1's surrounding context): The previous comment claimed "DAB's existing required-param validation handles missing required params later" — but DAB's request validation for sprocs only checks for extra fields (not missing ones). The actual mechanism that catches missing required params is the SQL Server error "expects parameter X, which was not supplied", parsed by MsSqlDbExceptionParser into a 400 DatabaseInputError. Updated the comment to describe what actually happens. 3. RuntimeConfigValidator.cs — expand embed/default rule rationale (addresses ajtiwari07#1): Replaced the one-line rationale for "Rule 3: embed:true with a default value is not supported" with a multi-paragraph explanation that: - Leads with the conceptual UX point: an embed param represents user input (typically a search query); defaulting it would mean the server fabricates and embeds a query the user never typed. That isn't a sensible fallback. - Notes that defaults on non-embed params of the same sproc remain supported (rule only fires for embed: true params). - Briefly documents why even setting aside the UX concern, supporting embed-defaults would be non-trivial (GraphQL schema literal-baking has no VECTOR type; REST/MCP defaults would be re-embedded every request; embedding-at-startup couples startup to provider availability). - Documents the current observed behavior when a client forgets to supply an embed param (verified empirically): explicit null/empty → 400 BadRequest from the helper; field omitted → 400 DatabaseInputError from SQL via MsSqlDbExceptionParser. Both produce a clear, actionable client error. - Notes that the rule can be lifted later if a real use case emerges. No behavior changes beyond the batching itself. Validator rule unchanged; helper logic for missing-value handling unchanged; error message text unchanged. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
prshri-msft
added a commit
to prshri-msft/data-api-builder
that referenced
this pull request
May 26, 2026
Adds the first formal automated test coverage for Phase 3 production code. New file: src/Service.Tests/UnitTests/ParameterEmbeddingHelperTests.cs 28 tests covering ParameterEmbeddingHelper.SubstituteEmbedParametersAsync, organized into 7 #region blocks (matching the EmbeddingServiceTests.cs "#region Batch Embedding Tests" pattern): #region No-Op Cases (4 tests) - NullConfigParams_ReturnsImmediately_NoServiceCall - NoEmbedParams_ReturnsImmediately_NoServiceCall - EmbedParamsConfiguredButNoneSupplied_ReturnsAfterCollect_NoServiceCall - EmptyConfigParamsList_ReturnsImmediately_NoServiceCall #region Service Availability (2 tests) - NullService_WithEmbedParams_Throws503 (defense-in-depth) - NullService_WithoutEmbedParams_NoThrow (backward compat) #region Input Type Validation (8 tests) - PlainString_AcceptsAndEmbeds - JsonElementString_AcceptsAndEmbeds - JsonElementNumber_Throws400 - JsonElementBoolean_Throws400 - JsonElementArray_Throws400 - JsonElementObject_Throws400 - JsonElementNull_Throws400AsEmpty - NonStringNonJsonElementValue_Throws400 #region Empty And Whitespace Validation (3 tests) - EmptyString_Throws400 - WhitespaceString_Throws400 - NullValue_Throws400 #region Batching Behavior (5 tests) — covers Jim review comment ajtiwari07#2 - SingleEmbedParam_CallsBatchOnce_NotSequential - MultipleEmbedParams_CallsBatchOnce_NotSequential ← key batching guarantee - MixedEmbedAndNonEmbed_OnlyEmbedTextsBatched - MultipleEmbedParams_OrderPreserved_BatchTextsMatchConfigOrder - PartiallySuppliedEmbedParams_BatchesSubsetOnly #region Batch Result Handling (4 tests) - BatchSuccess_VectorsSubstitutedInResolvedParams - BatchFailure_Throws500_WithAllParamNames - BatchLengthMismatch_Throws500 - IndividualEmbeddingEmpty_Throws500_NamingFailedParam #region Output Format And Cancellation (2 tests) - VectorJson_UsesG9AndInvariantCulture ← validates locale-independent G9 float serialization - CancellationToken_ForwardedToEmbeddingService Implementation notes -------------------- - Mocking pattern: Mock<IEmbeddingService> (Strict). Each test sets up the expected batch call and verifies post-conditions on resolvedParams. No database, no DI container — pure unit tests. - Helper factories at top of class (EmbedParam, NormalParam, JsonElementFrom, SetupBatch, VerifyBatchedExactlyOnce) keep individual test bodies focused on the behavior being tested. - JsonElement construction uses JsonDocument.Parse(...).RootElement.Clone() so the parsed element survives after the source document is disposed. - Float values used in expected-output assertions are powers of 1/2 (0.5, 0.25, 0.125) which are exactly representable in binary float and round-trip through G9 to the same string representation. The "G9 + InvariantCulture" test specifically uses non-exact values (0.1f, -0.2f, 0.0001234567f) and asserts on parsed-back values rather than exact strings to verify precision and locale independence. - `#nullable enable` directive at the top of the file is required because the helper signature uses IDictionary<string, object?> and the test project is set to `<Nullable>disable</Nullable>` globally. Per-file enable is the smallest scoped change that lets us match the helper's signature without introducing project-wide nullable warnings. Test results ------------ - dotnet build src/Service.Tests/Azure.DataApiBuilder.Service.Tests.csproj -c Release → Build succeeded. 0 Warning(s). 0 Error(s). - dotnet test --filter "FullyQualifiedName~ParameterEmbeddingHelperTests" → Total tests: 28. Passed: 28. Failed: 0. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
1. Embeddings-Specific Cache Configuration
EmbeddingsCacheOptionsclass for embeddings-specific cache settings2. Level 2 (L2) Distributed Redis Cache Support
EmbeddingsCacheLevel2Optionsclass for Redis-specific configuration3. Enhanced Configuration Schema
cachesection underembeddings4. Dedicated FusionCache Instance for Embeddings
5. Comprehensive Test Coverage
EmbeddingsCacheOptionsclass (205 lines)EmbeddingsCacheLevel2Optionsclass (64 lines)