feat(client): autonomous EdgeX 4.x edge collection container (MQTT bus)#9
tomgrv wants to merge 17 commits into
…ntainer
Pivot the EdgeKit client from a bespoke Node.js metrics agent to an autonomous EdgeX Foundry 4.x edge collection stack, per the "Container autonome de collecte EdgeX (v4, MQTT)" epic.
The edge node:
- uses an internal MQTT message bus (Mosquitto, loopback) instead of Redis
- collects via device-modbus (Modbus TCP) and device-rest (REST/JSON)
- runs app-service-configurable to filter/tag/compress and export Events to the central MQTT broker, with native Store & Forward for offline buffering
- persists locally in embedded PostgreSQL (EdgeX 4.x default DB)
- optionally enables core-data for local generic persistence
- is fully statically configured (baked res/ files + env overrides), no Consul
Implementation:
- client/Dockerfile: all-in-one image lifting EdgeX 4.x binaries, orchestrated by supervisord; entrypoint bootstraps embedded Postgres
- client/res/: static config (device profiles/devices, app-service pipeline, internal broker), client/env/edge.env: per-site overrides
- Helm: client converted to a StatefulSet with per-replica persistence + a headless service; values reworked for the new model
- docker-compose, scripts, CI updated for the new client
- docs: architecture rewrite, ADR 0001, edge-operations guide
Note: the stack has not been booted end-to-end; pinned EdgeX paths, env-override keys and the Postgres bootstrap need validation against the target release (see ADR 0001 follow-ups).
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01K7KSPTR8ntNPbb9DCZmKjm
Code Review
This pull request replaces the bespoke Node.js metrics agent with an autonomous EdgeX Foundry 4.x edge stack packaged as a single Docker image (edgekit-client), featuring local Modbus/REST collection, an internal MQTT bus, and Store & Forward capabilities. The review identified several critical issues: potential container failures in entrypoint.sh due to unsafe directory ownership changes and unquoted heredocs, a hardcoded database path in supervisord.conf that ignores environment overrides, and YAML type mismatches in device definitions where numeric values must be strings. Additionally, the AddTags pipeline function incorrectly uses = instead of : as a separator across multiple configuration files, and the Helm StatefulSet needs to utilize the Kubernetes downward API to ensure unique site identities are generated for each replica.
- entrypoint: chown only $PGDATA (not its parent) to avoid chowning / when PGDATA is a root-level path; harden the Postgres bootstrap with a quoted heredoc + psql -v parameters and quote_ident
- supervisord: use %(ENV_PGDATA)s so an overridden PGDATA is respected
- device-modbus/device-rest: quote numeric ProtocolProperties values (EdgeX ProtocolProperties is map[string]string)
- AddTags pipeline: use key:value separator (colon, not '=') in app-service config, edge.env and docker-compose
- helm: inject POD_NAME via the downward API and build a unique per-replica site tag (site:<siteId>-$(POD_NAME))
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01K7KSPTR8ntNPbb9DCZmKjm
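The separator fix and the per-replica site tag above can be illustrated with a small sketch. It is only a model of the key:value convention described in this commit, not the app-functions-sdk-go implementation; `parse_tags`, `site_tag`, the site id `plant-a` and the pod name `edgekit-client-0` are hypothetical names chosen for the example.

```python
def parse_tags(spec: str) -> dict[str, str]:
    """Parse a 'key:value, key:value' tag list; reject entries without a colon."""
    tags: dict[str, str] = {}
    for entry in spec.split(","):
        entry = entry.strip()
        if not entry:
            continue  # tolerate a trailing comma
        key, sep, value = entry.partition(":")
        if not sep or not key.strip() or not value.strip():
            # An entry such as 'site=plant-a' has no colon and ends up here.
            raise ValueError(f"tag {entry!r} is not in key:value form")
        tags[key.strip()] = value.strip()
    return tags


def site_tag(site_id: str, pod_name: str) -> str:
    """Build the per-replica tag in the site:<siteId>-<podName> shape."""
    return f"site:{site_id}-{pod_name}"


if __name__ == "__main__":
    tag = site_tag("plant-a", "edgekit-client-0")
    print(parse_tags(f"{tag}, env:prod"))
```

Because the pod name differs for each StatefulSet replica, two replicas sharing one site id still produce distinct tags.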
The release workflow now builds the EdgeX client with the same pinned EDGEX_VERSION used in client/Dockerfile and ci.yml, so released images are built against a known EdgeX release rather than the Dockerfile default.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01K7KSPTR8ntNPbb9DCZmKjm
Edge nodes are commonly arm64 (gateways, SBCs), so both the CI validation builds and the released images now target amd64 and arm64. Adds QEMU setup and a shared PLATFORMS env; release images are published as multi-arch manifest lists.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01K7KSPTR8ntNPbb9DCZmKjm
Switch the edge node from file-only configuration (-cp=false -r=false) to the EdgeX 4.0 configuration & registry provider, per https://docs.edgexfoundry.org/4.0/microservices/configuration/ConfigurationAndRegistry/
- add core-keeper (provider, port 59890) and core-common-config-bootstrapper to the all-in-one image and supervisord orchestration
- add a shared common configuration (res/common/configuration.yaml) holding MessageBus, Database, Registry and the metadata client; the bootstrapper seeds it into Keeper on first start
- slim each service's private config to service-specific settings only; the res/ files now seed Keeper rather than being the sole runtime source
- run every service with -cp=keeper.http://localhost:59890 -r=true (Keeper itself runs without those flags); drop EDGEX_USE_REGISTRY=false
- device profiles/devices remain file-based (loaded into metadata/Postgres)
- docs: ADR 0002 (amends the no-registry decision in ADR 0001), architecture, edge-operations (sources + upgrade/re-seed procedure) and READMEs updated
This deliberately reverses Story #4's "no registry" goal in favour of central, rebuild-free runtime config and a real service registry (see ADR 0002).
Not booted end-to-end; Keeper paths/flags/seeding need validation against the pinned EDGEX_VERSION.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01K7KSPTR8ntNPbb9DCZmKjm
…henticate to Postgres
Core-keeper (and every other service) reads DB credentials from
Writable.InsecureSecrets.DB.SecretData, not DATABASE_USERNAME/DATABASE_PASSWORD
(those env names don't exist in EdgeX). The embedded Postgres bootstrap set the
"postgres" role's real password to POSTGRES_PASSWORD while every service kept
reading the upstream default ("postgres"/"postgres"), causing
"password authentication failed for user postgres" at startup. Override the
actual InsecureSecrets env keys instead so they match.
EdgeX's Postgres client defaults to database "edgex_db" (a Go constant, overridable only via the EDGEX_DBNAME env var - not a configuration.yaml key). We were creating a database named "edgex" and never set EDGEX_DBNAME, so every service failed with "database \"edgex_db\" does not exist" right after the Postgres credential fix. Default POSTGRES_DB to edgex_db and export EDGEX_DBNAME from it so the created database always matches what's queried.
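A minimal sketch of the rule this commit describes: the database name is defaulted once and the same value is exported under both variable names, so the created database and the queried database cannot drift apart. The helper `resolve_db_env` is hypothetical; the real logic lives in the shell entrypoint, which is not shown here.

```python
def resolve_db_env(env: dict[str, str]) -> dict[str, str]:
    """Return a copy of env where POSTGRES_DB is defaulted and EDGEX_DBNAME mirrors it."""
    database = env.get("POSTGRES_DB") or "edgex_db"  # default described in the commit
    resolved = dict(env)
    resolved["POSTGRES_DB"] = database
    resolved["EDGEX_DBNAME"] = database  # always derived, never set independently
    return resolved


if __name__ == "__main__":
    print(resolve_db_env({}))
    print(resolve_db_env({"POSTGRES_DB": "site_db"}))
```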
…st init
The user/password/database setup only ran inside the one-time initdb guard, so an existing data directory from an earlier run (created when POSTGRES_DB defaulted to "edgex") never picked up the password fix or the edgex_db database - every service kept failing with "database edgex_db does not exist". Run the ALTER USER / CREATE DATABASE IF NOT EXISTS step on every start so it self-heals regardless of what an existing volume was initialised with.
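The every-start bootstrap can be sketched as a function that decides which statements to issue. PostgreSQL has no IF NOT EXISTS clause for CREATE DATABASE, so this sketch assumes the existence check happens first (for example against pg_database) and the create is emitted only when needed. The function names and the way existing databases are passed in are illustrative assumptions, not the code in entrypoint.sh.

```python
def quote_ident(name: str) -> str:
    """Quote an SQL identifier by doubling embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def quote_literal(value: str) -> str:
    """Quote an SQL string literal (assumes standard_conforming_strings=on)."""
    return "'" + value.replace("'", "''") + "'"


def bootstrap_statements(existing_databases: set[str], user: str,
                         password: str, database: str) -> list[str]:
    """Statements that are safe to run on every container start."""
    # Resetting the password is harmless when it is already correct.
    statements = [
        f"ALTER USER {quote_ident(user)} WITH PASSWORD {quote_literal(password)};"
    ]
    # Emulate "create if missing" with an explicit existence check.
    if database not in existing_databases:
        statements.append(f"CREATE DATABASE {quote_ident(database)};")
    return statements


if __name__ == "__main__":
    # A volume initialised by an older image that only knew "edgex".
    for stmt in bootstrap_statements({"postgres", "edgex"}, "postgres", "secret", "edgex_db"):
        print(stmt)
```

Running the same function against a volume that already contains the target database yields only the ALTER USER statement, which is what makes the step repeatable.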
core-keeper's idempotent SQL bootstrap runs CREATE EXTENSION "uuid-ossp", but that extension ships in postgresql16-contrib, which wasn't installed - core-keeper kept failing to set up its schema. Requires an image rebuild to take effect.
core-keeper's own private res/configuration.yaml carries InsecureSecrets.DB by default, so it could connect to Postgres - but core-metadata, device-modbus, device-rest and app-service get their DB credentials from the *common* config pushed by core-common-config-bootstrapper, which had no InsecureSecrets section at all. Those services were logging "InsecureSecrets missing from configuration" and never reached the database. Add the same Writable.InsecureSecrets.DB block (matching upstream's default) to the common config so every non-core-keeper service picks it up, with the WRITABLE_INSECURESECRETS_DB_SECRETDATA_PASSWORD override from entrypoint.sh applying uniformly.
- Common config's device-services/app-services groups were missing the
Writable section upstream provides; device-modbus/device-rest unconditionally
list common config keys under Writable and got a 404 with it entirely
absent.
- app-service's private configuration.yaml never set Service.Host/Port, so it
registered with Keeper without service info ("Service information not set")
and other services couldn't reach it.
…volume
Core Keeper's stored config and the embedded Postgres data share one volume, and Keeper won't overwrite existing config without -o/--overwrite. After a client res/ config change, the only way to see it locally is a clean volume. Add a script to stop and remove just edge-pgdata (leaving the broker's mosquitto-data alone) so this doesn't require remembering the volume name.
core-common-config-bootstrapper builds override env var names from the full map path including the top-level group key (all-services/...), so the plain WRITABLE_INSECURESECRETS_DB_SECRETDATA_PASSWORD override only ever patched core-keeper's own private config. The common config pushed to Keeper (read by core-metadata, device-modbus, device-rest, and app-service) kept the literal "postgres" default, causing SQLSTATE 28P01 auth failures once the Postgres role's real password diverged from that default.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01K7KSPTR8ntNPbb9DCZmKjm
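To make the naming issue concrete, here is a sketch of how an override name could be derived from a configuration path. The normalisation rule used (path separators, dots and hyphens become underscores, then the result is upper-cased) is an assumption inferred from the variable name quoted above, and the lower-case `password` leaf is likewise assumed; the exact name the bootstrapper expects should be checked against the pinned EDGEX_VERSION.

```python
import re


def override_env_name(config_path: str) -> str:
    """Derive an env override name from a '/'-separated config path (assumed rule)."""
    return re.sub(r"[/.\-]", "_", config_path).upper()


if __name__ == "__main__":
    private = "Writable/InsecureSecrets/DB/SecretData/password"
    common = "all-services/" + private  # common config nests under the group key
    print(override_env_name(private))
    print(override_env_name(common))
```

Under that assumption the two paths map to different variable names, which is why an override written for the private config never reached the common one.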
- app-service: FilterByDeviceName's parameter is "DeviceNames" (per
app-functions-sdk-go), not "FilterValues" - the latter caused the
pipeline to fail loading entirely. Empty DeviceNames + FilterOut=false
still passes all events through, preserving the original no-op intent.
- device-modbus: the temperature profile's "scale" attribute was quoted
as a YAML string ("0.1") but core-metadata requires a float64, causing
the profile (and its dependent device modbus-th-01) to fail loading.
- device-rest/device-modbus: device-sdk-go's autodiscovery loop calls
time.ParseDuration on Device.Discovery.Interval unconditionally, even
when disabled; an absent/empty value logged a spurious error. Added
the standard Discovery block (Enabled: false, Interval: "30s").
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01K7KSPTR8ntNPbb9DCZmKjm
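The pass-through behaviour claimed for FilterByDeviceName (empty DeviceNames with FilterOut=false lets every event through) can be modelled in a few lines. This is a sketch of the semantics as described in the commit, not the app-functions-sdk-go source; `passes_filter` and the device name `rest-01` are hypothetical.

```python
def passes_filter(device_name: str, device_names: list[str],
                  filter_out: bool = False) -> bool:
    """Return True when an event from device_name should continue down the pipeline."""
    if not device_names:
        return True  # nothing configured: the filter is a no-op
    listed = device_name in device_names
    # FilterOut=false keeps listed devices; FilterOut=true drops them.
    return not listed if filter_out else listed


if __name__ == "__main__":
    print(passes_filter("modbus-th-01", []))
    print(passes_filter("modbus-th-01", ["rest-01"]))
    print(passes_filter("modbus-th-01", ["modbus-th-01"], filter_out=True))
```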
device-sdk-go rejects any deviceCommand whose resourceOperations count exceeds Device.MaxCmdOps, which defaulted to 0 because our slimmed device-services common config dropped the upstream Device block entirely. This caused every AutoEvent read to fail with "exceed MaxCmdOps (0)". Restored the standard shared device-service defaults (MaxCmdOps: 128, MaxCmdValueLen: 256, EnableAsyncReadings, AsyncBufferSize, DataTransform).
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01K7KSPTR8ntNPbb9DCZmKjm
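The failure mode is easy to see in a sketch of the limit check: with the limit left at its zero value, even a single-operation command is over the limit. `check_cmd_ops` and the resource name are hypothetical stand-ins for the SDK's internal validation, and the error text only mirrors the fragment quoted above.

```python
def check_cmd_ops(resource_operations: list[str], max_cmd_ops: int) -> None:
    """Raise if a device command carries more operations than the configured limit."""
    if len(resource_operations) > max_cmd_ops:
        raise ValueError(
            f"{len(resource_operations)} resource operations exceed MaxCmdOps ({max_cmd_ops})"
        )


if __name__ == "__main__":
    ops = ["Temperature"]  # hypothetical single-resource command
    check_cmd_ops(ops, 128)  # restored default: accepted
    try:
        check_cmd_ops(ops, 0)  # missing Device block: limit falls back to 0
    except ValueError as err:
        print(err)
```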
device-rest-go looks up the device's protocol parameters under the key "REST" (the RESTProtocol constant). The lowercase "rest" key meant the driver found no protocol block and failed every AutoEvent read with "no end device parameters defined in the protocol list". The field names (Host/Port/Path) were already correct.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01K7KSPTR8ntNPbb9DCZmKjm
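A case-sensitive map lookup is all it takes to reproduce this. The sketch below models the device's protocols as a plain dictionary; `end_device_params` and the Host/Port/Path values are hypothetical, and only the "REST" key and the error text come from the description above.

```python
REST_PROTOCOL = "REST"  # the key the driver looks up, per the commit message


def end_device_params(protocols: dict[str, dict[str, str]]) -> dict[str, str]:
    """Return the REST protocol block, failing the way the driver is described to."""
    params = protocols.get(REST_PROTOCOL)  # dict keys are case-sensitive
    if not params:
        raise KeyError("no end device parameters defined in the protocol list")
    return params


if __name__ == "__main__":
    block = {"Host": "127.0.0.1", "Port": "8080", "Path": "/api/readings"}
    print(end_device_params({"REST": block}))
    try:
        end_device_params({"rest": block})  # lowercase key is never found
    except KeyError as err:
        print(err)
```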
Summary
Implements the "Container autonome de collecte EdgeX (v4, MQTT)" epic: the EdgeKit client is pivoted from a bespoke Node.js metrics agent to an autonomous EdgeX Foundry 4.x edge collection stack. Delivered as requested: design docs + client Dockerfile + static config files, replacing the old client.
The edge node:
- … (EDGEKIT_ENABLE_CORE_DATA)
- … (baked res/ files + env overrides), no Consul/registry
How user stories are addressed
- client/Dockerfile (all-in-one, supervisord), docker-compose.yml
- client/res/app-service/configuration.yaml (MQTTExport, AutoReconnect)
- Writable.StoreAndForward + PersistOnError (Postgres-backed)
- client/res/, client/env/edge.env, docs/edge-operations.md
- EDGEKIT_ENABLE_CORE_DATA, core-data-wrapper.sh
Key changes
- Dockerfile (lifts EdgeX 4.x binaries, orchestrated by supervisord), entrypoint.sh (Postgres bootstrap), static res/ config and env/edge.env
- values.yaml reworked for the new model
- architecture.md rewrite, adr/0001-edgex-4x-mqtt-edge.md, edge-operations.md
This stack has not been booted end-to-end. It is configuration/scaffolding, and the following must be validated against the pinned EDGEX_VERSION before production use (tracked in ADR 0001 follow-ups):
- res/ paths used by the multi-stage COPY --from
- app-service-configurable profile selection
Adding a smoke test to CI and an optional one-process-per-container docker-compose.edge.yml are noted as follow-ups.
🤖 Generated with Claude Code