Skip to content

Surface execution-history lookup failures in df.instance_executions (#168)#225

Merged
pinodeca merged 1 commit into
microsoft:mainfrom
crprashant:crprashant/fix-instance-executions-empty-rows
Jun 12, 2026
Merged

Surface execution-history lookup failures in df.instance_executions (#168)#225
pinodeca merged 1 commit into
microsoft:mainfrom
crprashant:crprashant/fix-instance-executions-empty-rows

Conversation

@crprashant

Copy link
Copy Markdown
Contributor

Summary

Fixes #168.

df.instance_executions() could return no rows for a completed instance, making "this instance genuinely has no execution history" indistinguishable from "the execution-history lookup silently failed."

Root cause

The function swallowed every failure path into an empty rowset:

  • building the async runtime
  • connecting to the duroxide store (new_backend_provider)
  • list_executions
  • get_execution_info

Each was Err(_) => return vec![]. But duroxide-pg writes the instances row and the first executions row in the same transaction, so a completed instance always has at least one execution row. An empty result could therefore only mean a silently-swallowed lookup failure.

Fix

  • Propagate the runtime / provider / list_executions / get_execution_info failures and raise an explicit error (pgrx::error!) instead of returning an empty rowset. The error is raised outside rt.block_on(...) to avoid longjmp-through-tokio, matching the existing pattern in dsl.rs / client.rs.
  • The legitimately-empty case — a non-existent or non-owned instance, gated by the existing RLS existence check — still returns an empty rowset, which is the correct response.
  • Validate limit_count >= 1 and cap it at 10000, matching df.list_instances.

Issue #168 explicitly accepts "return an explicit error" as a valid resolution.

Testing

  • cargo fmt -p pg_durable -- --check — clean
  • cargo build --features pg17 — clean (only a pre-existing, feature-gated extract_host dead-code warning that is unrelated to this change and absent under CI's feature combo)
  • cargo clippy --no-default-features --features pg17,http-allow-test-domains -- -D warnings — clean (0 warnings)
  • cargo pgrx test pg17 (./scripts/test-unit.sh) — 145 passed, 0 failed, 16 ignored
  • ./scripts/test-e2e-local.sh27 passed, 0 failed, including the new regression test tests/e2e/sql/05_instance_executions.sql

…icrosoft#168)

df.instance_executions() previously swallowed every failure (runtime build, provider connect, list_executions, get_execution_info) into an empty rowset. Because a completed instance always has at least one execution row, an empty result was indistinguishable from a silently failed lookup, masking real errors (issue microsoft#168).

These failures now raise an explicit error, while the genuinely-empty case (a non-existent or non-owned instance, gated by the RLS existence check) still returns an empty rowset. Also validate limit_count >= 1 and cap it at 10000, matching df.list_instances.

Add e2e regression test tests/e2e/sql/05_instance_executions.sql.

@pinodeca pinodeca left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM; reviewed with Opus 4.8

Minor / non-blocking

  • Test file naming: tests/e2e/sql/05_instance_executions.sql duplicates the 05_ prefix of the existing 05_monitoring_and_explain.sql, which already tests the monitoring functions. Folding it in there (or using a fresh number) would be tidier. The repo tolerates duplicate numbers (three 22_ files) and the runner auto-discovers via *.sql glob into the "standard" phase, so it runs fine — purely cosmetic.
  • The sibling metrics() function just below still uses the same Err(_) => vec![] swallow pattern. Out of scope for #168, but it carries the identical smell if you want a follow-up.
  • The test's deliberate choice not to assert status = 'completed' (with the documented rationale about df.instances vs duroxide per-execution status diverging under BGW load) is well-reasoned and avoids a flaky race.

Verdict

Approve. Solid, well-documented, minimal fix that matches issue #168's accepted resolution, with no security violations.

@pinodeca pinodeca merged commit a9f86bc into microsoft:main Jun 12, 2026
5 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

df.instance_executions() can return no rows for completed instances

2 participants