21 commits
a3b0703 implement canonical metrics, feature flagged to preserve legacy where… (chrishagglund-ship-it, Apr 29, 2026)
083f173 standardize on what truthiness means for env var (chrishagglund-ship-it, Apr 29, 2026)
be00e08 add some tests for new stuff (chrishagglund-ship-it, May 5, 2026)
2384d2d delint (chrishagglund-ship-it, May 5, 2026)
0b082d6 add retry for 502,503,504 error codes (chrishagglund-ship-it, May 5, 2026)
396fcc5 experiments with flaky test runs ... (chrishagglund-ship-it, May 5, 2026)
bf71a1e remove editorialization (chrishagglund-ship-it, May 5, 2026)
c8c4e9f cleaner reporting on which metrics implementation is used in the test… (chrishagglund-ship-it, May 6, 2026)
c3da927 updates for metrics related documentation (chrishagglund-ship-it, May 6, 2026)
19773f8 add or update a changelog (chrishagglund-ship-it, May 7, 2026)
6cd069f adjustments to internal mechanics (chrishagglund-ship-it, May 15, 2026)
db1fd4c harness improvements for generating traffic with uris in metrics (chrishagglund-ship-it, May 15, 2026)
cb97171 harness improvements for generating traffic with uris in metrics .. c… (chrishagglund-ship-it, May 15, 2026)
24b162c revisions to internal mechanics for http request metrics (chrishagglund-ship-it, May 18, 2026)
c8e97af update to changelog and metrics docs (chrishagglund-ship-it, May 18, 2026)
5b0becd address self-review concerns regarding backward compatibility (chrishagglund-ship-it, May 18, 2026)
7a4c991 address backward compat concerned from self review (chrishagglund-ship-it, May 18, 2026)
e4d8e67 trying to settle on a better implementation of retry fetch wrapper wi… (chrishagglund-ship-it, May 18, 2026)
a20cc3f fix for label defect in canonical metrics (chrishagglund-ship-it, May 18, 2026)
f42bec9 give 1 flaky integration test more time and log it's progress (chrishagglund-ship-it, May 19, 2026)
01ca97f test out pre-existing prom-client renderer (chrishagglund-ship-it, May 19, 2026)
7 changes: 5 additions & 2 deletions .github/workflows/pull_request.yml
@@ -79,8 +79,9 @@ jobs:
name: codecov-unit-node-${{ matrix.node-version }}
fail_ci_if_error: false

- # Integration tests (v5): one job at a time (max-parallel: 1) to avoid 502/503 on shared Conductor.
- # Sharding (--shard i/N) splits the suite so each job runs ~1/N of tests; keeps per-job under timeout.
+ # Integration tests (v5): lower max-parallel reduces 502/503 from the shared Conductor server
+ # but makes CI slower without eliminating flakes entirely; feel free to experiment.
+ # Sharding (--shard i/N) splits the suite so each job runs ~1/N of tests.
integration-tests:
runs-on: ubuntu-latest
timeout-minutes: 25
@@ -118,6 +119,7 @@ jobs:
CONDUCTOR_AUTH_KEY: ${{ secrets.AUTH_KEY }}
CONDUCTOR_AUTH_SECRET: ${{ secrets.AUTH_SECRET }}
CONDUCTOR_REQUEST_TIMEOUT_MS: "120000"
+ CONDUCTOR_RETRY_SERVER_ERRORS: "true"
JEST_JUNIT_OUTPUT_NAME: integration-v5-node-${{ matrix.node-version }}-shard-${{ matrix.shard }}-test-results.xml
- name: Publish Test Results
uses: dorny/test-reporter@v2
@@ -175,6 +177,7 @@ jobs:
CONDUCTOR_AUTH_KEY: ${{ secrets.AUTH_KEY_V4 }}
CONDUCTOR_AUTH_SECRET: ${{ secrets.AUTH_SECRET_V4 }}
CONDUCTOR_REQUEST_TIMEOUT_MS: "120000"
+ CONDUCTOR_RETRY_SERVER_ERRORS: "true"
JEST_JUNIT_OUTPUT_NAME: integration-v4-node-${{ matrix.node-version }}-shard-${{ matrix.shard }}-test-results.xml
- name: Publish Test Results
uses: dorny/test-reporter@v2
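The workflow now sets `CONDUCTOR_RETRY_SERVER_ERRORS: "true"` for both integration jobs. Per the changelog entry in this PR, that opts in to retrying HTTP 502/503/504 for idempotent methods only. The sketch below illustrates that policy as a generic fetch wrapper under stated assumptions: `withServerErrorRetry`, `maxAttempts`, `baseDelayMs`, and the exponential backoff are hypothetical, and this is not the SDK's `wrapFetchWithRetry`.

```typescript
// Methods the changelog lists as eligible for 502/503/504 retries.
const IDEMPOTENT_METHODS = new Set(["GET", "HEAD", "OPTIONS", "PUT", "DELETE"]);
const RETRYABLE_STATUSES = new Set([502, 503, 504]);

// Minimal fetch-like signature so the sketch can be exercised without a network.
type FetchLike = (url: string, init?: { method?: string }) => Promise<{ status: number }>;

interface RetryOptions {
  retryServerErrors: boolean; // opt-in flag, analogous to CONDUCTOR_RETRY_SERVER_ERRORS
  maxAttempts?: number;       // hypothetical: total attempts, including the first
  baseDelayMs?: number;       // hypothetical: backoff base, doubled on each retry
}

function isRetryable(method: string, status: number): boolean {
  return IDEMPOTENT_METHODS.has(method.toUpperCase()) && RETRYABLE_STATUSES.has(status);
}

const sleep = (ms: number): Promise<void> => new Promise((resolve) => setTimeout(resolve, ms));

function withServerErrorRetry(fetchFn: FetchLike, opts: RetryOptions): FetchLike {
  const maxAttempts = opts.maxAttempts ?? 3;
  const baseDelayMs = opts.baseDelayMs ?? 200;
  return async (url, init) => {
    const method = init?.method ?? "GET";
    let response = await fetchFn(url, init);
    for (let attempt = 1; attempt < maxAttempts; attempt++) {
      // Stop as soon as retries are disabled or the response is not retryable.
      if (!opts.retryServerErrors || !isRetryable(method, response.status)) break;
      // Exponential backoff (an assumption): base, 2x base, 4x base, ...
      await sleep(baseDelayMs * 2 ** (attempt - 1));
      response = await fetchFn(url, init);
    }
    // The final response is returned as-is, even if it is still a 502/503/504.
    return response;
  };
}
```

A `POST` that receives a 503 is returned after the first attempt in this sketch, because replaying a non-idempotent request could duplicate side effects on the server. How the SDK interprets values of `CONDUCTOR_RETRY_SERVER_ERRORS` other than `"true"` is not shown here.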
10 changes: 5 additions & 5 deletions AGENTS.md
@@ -32,7 +32,7 @@ src/sdk/ # Main SDK source
decorators/worker.ts # @worker decorator + dual-mode support
decorators/registry.ts # Global registry (register/get/clear)
context/TaskContext.ts # AsyncLocalStorage per-task context
- metrics/ # MetricsCollector, MetricsServer, PrometheusRegistry
+ metrics/ # LegacyMetricsCollector, CanonicalMetricsCollector, metricsFactory, MetricsServer, PrometheusRegistry, CanonicalPrometheusRegistry, accumulators, httpObserver
schema/ # jsonSchema, schemaField decorators
generators/ # Legacy generators (pre-v3, still exported for compat)
src/open-api/ # OpenAPI layer
@@ -211,10 +211,10 @@ public async someMethod(args): Promise<T> {

### Metrics Documentation (METRICS.md)

- When adding, removing, or renaming metrics in `src/sdk/worker/metrics/MetricsCollector.ts`:
- 1. Update `METRICS.md` to reflect the change (name, type, labels, description)
- 2. Ensure both `MetricsCollector.toPrometheusText()` and `PrometheusRegistry.createMetrics()` are updated in sync; missing a summary/counter in either causes silent data loss
- 3. Update the metric count in the METRICS.md overview section
+ When adding, removing, or renaming metrics in `src/sdk/worker/metrics/`:
+ 1. Update both `LegacyMetricsCollector.ts` and `CanonicalMetricsCollector.ts` (or add a no-op stub in the collector that does not emit the metric)
+ 2. Ensure `toPrometheusText()` and the corresponding `PrometheusRegistry` / `CanonicalPrometheusRegistry` are updated in sync; missing a metric in either causes silent data loss
+ 3. Update `METRICS.md` to reflect the change in both the legacy and canonical catalog tables
4. Add or update the corresponding direct recording method documentation if applicable

### SDK_NEW_LANGUAGE_GUIDE.md
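The AGENTS.md rule above (every metric change lands in both collectors, with a no-op stub where one surface does not emit the metric) can be illustrated with the `task_paused_total` case from the changelog: canonical mode records it, legacy mode does not. The interface, class names, `recordTaskPaused` method, and `taskType` label below are hypothetical simplifications, not the SDK's actual `LegacyMetricsCollector` / `CanonicalMetricsCollector`.

```typescript
// Hypothetical shared surface; the real collectors expose many more methods.
interface MetricsCollectorLike {
  recordTaskPaused(taskType: string): void;
  toPrometheusText(): string;
}

class CanonicalCollectorSketch implements MetricsCollectorLike {
  private paused = new Map<string, number>();

  recordTaskPaused(taskType: string): void {
    this.paused.set(taskType, (this.paused.get(taskType) ?? 0) + 1);
  }

  // Renders the counter in Prometheus text exposition format.
  toPrometheusText(): string {
    const lines = ["# TYPE task_paused_total counter"];
    this.paused.forEach((count, taskType) => {
      lines.push(`task_paused_total{taskType="${taskType}"} ${count}`);
    });
    return lines.join("\n") + "\n";
  }
}

class LegacyCollectorSketch implements MetricsCollectorLike {
  // No-op stub: legacy mode intentionally does not emit task_paused_total,
  // but the method exists so callers need not branch on the collector type.
  recordTaskPaused(_taskType: string): void {}

  toPrometheusText(): string {
    return "";
  }
}
```

Keeping the method on both classes means a metric added to one collector without a matching stub in the other fails to compile against the shared interface, which is the kind of drift the checklist is meant to catch.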
29 changes: 29 additions & 0 deletions CHANGELOG.md
@@ -0,0 +1,29 @@
# Changelog

All notable changes to this project will be documented in this file.

The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

## [Unreleased]

### Added

- **Canonical metrics** -- opt-in harmonized metric surface via `WORKER_CANONICAL_METRICS=true`. See [METRICS.md](METRICS.md) for the full catalog, configuration, and migration guide.
- Bounded `uri` label on `http_api_client_request_seconds`: canonical mode uses path templates (e.g. `/workflow/{workflowId}`) instead of fully resolved paths, preventing metric cardinality explosion from dynamic IDs.
- `TaskPaused` event type and `PollerOptions.onPaused` callback: emitted when a poll cycle is skipped because the worker is paused. Canonical mode records `task_paused_total`; legacy mode does not (see Implementation Notes in METRICS.md).
- `measurePayloadSize` option in `MetricsCollectorConfig`: controls whether `workflow_input_size_bytes` is recorded via `JSON.stringify` on each `startWorkflow` call. Defaults to `true` for canonical, `false` for legacy.
- `retryServerErrors` option in `OrkesApiConfig` / `RetryFetchOptions` and `CONDUCTOR_RETRY_SERVER_ERRORS` env var: opt-in retry of HTTP 502/503/504 for idempotent methods (GET, HEAD, OPTIONS, PUT, DELETE). Default `false`; set to `true` to enable.
- `WorkflowStatusProbe` in harness: opt-in probe (via `HARNESS_PROBE_RATE_PER_SEC`) that exercises UUID-bearing endpoints to validate template URI metrics.
- `WORKER_LEGACY_METRICS` is reserved for future use. Once canonical metrics become the default, setting `WORKER_LEGACY_METRICS=true` will re-activate the legacy surface. It is not read by the current implementation.
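The bounded `uri` entry above is easiest to see as a label-cardinality count. This sketch tallies observations per `(method, uri)` label pair in a plain `Map`, standing in for the series a Prometheus histogram would create. The `/workflow/{workflowId}` template comes from the changelog; `countLabelSets`, the `wf-<n>` IDs, and the request shape are illustrative assumptions.

```typescript
// Hypothetical shape of one observed request.
interface ObservedRequest {
  method: string;
  resolvedPath: string; // e.g. /workflow/wf-42
  template: string;     // e.g. /workflow/{workflowId}
}

// Counts observations per (method, uri) label pair; each distinct key would be
// a separate time series for http_api_client_request_seconds.
function countLabelSets(requests: ObservedRequest[], useTemplate: boolean): Map<string, number> {
  const series = new Map<string, number>();
  for (const r of requests) {
    const uri = useTemplate ? r.template : r.resolvedPath;
    const key = `${r.method} ${uri}`;
    series.set(key, (series.get(key) ?? 0) + 1);
  }
  return series;
}

// 1,000 status lookups, each for a distinct (synthetic) workflow ID.
const requests: ObservedRequest[] = [];
for (let i = 0; i < 1000; i++) {
  requests.push({ method: "GET", resolvedPath: `/workflow/wf-${i}`, template: "/workflow/{workflowId}" });
}

console.log(countLabelSets(requests, false).size); // 1000: one series per workflow ID
console.log(countLabelSets(requests, true).size);  // 1: all requests share the template
```

With resolved paths the series count grows with every new workflow ID; with templates it is bounded by the number of API routes.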

### Changed

- Legacy metrics are emitted unchanged when `LegacyMetricsCollector` is constructed directly (the pre-existing pattern). Using `createMetricsCollector()` additionally enables automatic HTTP request timing via OpenAPI interceptors for both legacy and canonical modes; no other action is required for existing deployments.
- `MetricsCollector.ts` renamed to `LegacyMetricsCollector.ts`; the public symbol is preserved via re-export so existing imports keep working.
- `http_api_client_request` timing is now recorded automatically by `wrapFetchWithRetry` when a metrics collector is active (via `createMetricsCollector()` or `setHttpMetricsObserver`), covering both successful responses and network-error fallback paths. A lightweight request interceptor captures OpenAPI path templates so the canonical `uri` label uses bounded-cardinality templates in all cases. Previously, `recordApiRequestTime` existed but was not wired into the HTTP pipeline -- [details](METRICS.md#implementation-notes).
- Added an optional `durationMs` field to the `TaskUpdateFailure` event, recording the duration of the last update attempt. It is declared optional so existing event listener implementations are unaffected.
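The `http_api_client_request` entry above says timing is recorded on both successful responses and the network-error fallback path. A minimal sketch of that pattern around a generic fetch function follows. `withTiming`, the `HttpObserver` callback shape, and the `"error"` status label for thrown requests are hypothetical; this is not the signature of `wrapFetchWithRetry` or `setHttpMetricsObserver`.

```typescript
// Hypothetical observer signature; not the SDK's httpObserver interface.
type HttpObserver = (method: string, uriTemplate: string, status: string, seconds: number) => void;

type FetchLike = (url: string, init?: { method?: string }) => Promise<{ status: number }>;

function withTiming(fetchFn: FetchLike, observe: HttpObserver, uriTemplate: string): FetchLike {
  return async (url, init) => {
    const method = init?.method ?? "GET";
    const start = Date.now();
    try {
      const response = await fetchFn(url, init);
      observe(method, uriTemplate, String(response.status), (Date.now() - start) / 1000);
      return response;
    } catch (err) {
      // Network failures produce no status code, so record them under a
      // sentinel label (an assumption) and rethrow so callers still see the error.
      observe(method, uriTemplate, "error", (Date.now() - start) / 1000);
      throw err;
    }
  };
}
```

Recording before rethrowing means slow connection failures still appear in the latency distribution instead of silently dropping out of it. The template is passed in by the caller here; per the changelog, the SDK captures it with a request interceptor instead.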

### Deprecated

- Legacy metric names remain the default during the transition period. Migration guidance is in [METRICS.md](METRICS.md#migrating-from-legacy-to-canonical).
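During the transition, the active surface is selected by `WORKER_CANONICAL_METRICS`, and `measurePayloadSize` defaults differ by mode. The sketch below shows one way such a selection could be expressed; it is not the SDK's `createMetricsCollector()`. Treating only `true` (case-insensitive, trimmed) as enabled is an assumption, since the SDK's own truthiness rule is not spelled out in this changelog. The environment is passed in as a plain record so the function can be tested without touching `process.env`.

```typescript
type MetricsMode = "legacy" | "canonical";

interface SelectedMetricsConfig {
  mode: MetricsMode;
  measurePayloadSize: boolean; // changelog default: true for canonical, false for legacy
}

// Assumed truthiness rule; check the SDK's env parsing before relying on other spellings.
function isTruthyEnv(value: string | undefined): boolean {
  return value !== undefined && value.trim().toLowerCase() === "true";
}

// Hypothetical selector: canonical only when explicitly opted in.
// WORKER_LEGACY_METRICS is deliberately ignored; the changelog marks it as reserved.
function selectMetricsConfig(env: Record<string, string | undefined>): SelectedMetricsConfig {
  const mode: MetricsMode = isTruthyEnv(env.WORKER_CANONICAL_METRICS) ? "canonical" : "legacy";
  return { mode, measurePayloadSize: mode === "canonical" };
}
```

Defaulting to legacy when the variable is unset or unrecognized matches the changelog's statement that legacy names remain the default during the transition.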