feat(registry): configurable driver registries via TOML config#360
feat(registry): configurable driver registries via TOML config#360zeroshade wants to merge 33 commits into
Conversation
5a8bf36 to
5ac1ddd
Compare
0813c82 to
79e6f5c
Compare
79e6f5c to
40e5ad2
Compare
amoeba
left a comment
There was a problem hiding this comment.
I did a quick code review over the new pieces and left some comments. Even if we don't have an immediate use case for this, I think having it in dbc is useful.
Some other more general thoughts:
- Would we want CLI integration for this? I think it would be useful for dbc to be able to do all the work to show/add/remove registries from the project and user config.
- I didn't see any tests for how
dbc installwill work with this. Presumably dbc install will respect the new fields in the user config. - Do we need to define a default priority order for registries to handle name conflicts? We have a lot of the same concerns as conda does with channels. I'm wishing for more tests in this PR for the interaction between the default, user, and project registries around name conflicts.
Introduce RegistryEntry, GlobalConfig, LoadGlobalConfig, WithGlobalConfig, and WithProjectRegistries. NewClient now merges project, global, and default registries (project wins, then global, then defaults) with deduplication by scheme+host+path. WithBaseURL still overrides all other registry sources. Validation (empty URL, missing host, non-http scheme, replace_defaults=true with no entries) runs at NewClient time so malformed configuration surfaces immediately rather than at first request.
Extend DriversList with omitempty Registries and ReplaceDefaults fields so projects can pin or override driver registries inline in dbc.toml. Existing dbc.toml files without those keys continue to decode unchanged. applyProjectRegistries rebuilds the process-wide dbc client with the merged registry list; callers (add, sync) invoke it after decoding the list.
Load ~/.config/dbc/config.toml once at startup and thread it into the default client via WithGlobalConfig. Project commands (add, sync) rebuild the client with WithProjectRegistries after parsing dbc.toml so driver lookups use the merged registry list. DBC_BASE_URL continues to short circuit all registry merging. Tests cover project-registry wiring, invalid project URLs, and backwards-compatibility with dbc.toml files that omit the new fields.
a41a2ad to
5f8660a
Compare
Two issues from roborev review: - DriversList.ReplaceDefaults is now *bool so a project can explicitly set replace_defaults = false to override a global config's replace_defaults = true (previously collapsed into unset vs false). - NewClient now validates GlobalConfig.Registries URLs with the same path used for project registries, and rejects any configuration that produces an empty registry list (e.g. global replace_defaults = true with zero entries), so malformed config surfaces at NewClient time instead of at first request with silent zero-registry lookups. - LoadGlobalConfig applies the same no-entries check when decoding config.toml so CLI users see the error at startup.
…ries The existing TestAdd/TestSync project-registry tests route through testBaseModel, whose getDriverRegistry stub bypasses dbcClient entirely. Add direct coverage that swaps in a lookup routed to the real dbcClient against an httptest server, proving applyProjectRegistries genuinely rewires driver resolution to the merged registry list and that: - a project [[registries]] entry routes Search() to that server; - an explicit project replace_defaults=false restores built-in defaults even when the global config declared replace_defaults=true; - DBC_BASE_URL continues to short-circuit project registry overrides.
LoadGlobalConfig rejecting global replace_defaults=true with zero entries made the CLI diverge from NewClient: the CLI dropped the global config with only a warning, silently re-enabling the built-in defaults even when the project dbc.toml supplied the registries the user actually wanted. Remove that premature check; the post-merge empty-list guard in NewClient already catches the genuinely broken config (global replace_defaults=true with no entries AND no project entries), while letting the valid combination through. Adds both a library test and a CLI wiring test for the project-supplies-entries case.
Eager initDBCClient() in main() failed startup for the valid
'global replace_defaults=true + project supplies registries' case —
NewClient was called before the project dbc.toml could merge in.
Remove the eager call and rely on lazy init via getDriverRegistry /
initDBCClient, so project commands get a chance to rebuild the client
via applyProjectRegistries first.
Non-project commands (search, info, install) still fail fast on the
broken case (global replace_defaults=true with no entries and no
project override) because NewClient rejects the empty merge — now
surfaced at first use rather than startup.
auth.go's requestDeviceCode now ensures init before touching
dbcClient.HTTPClient(). dbcClientOnce is a pointer so tests can simulate
fresh startup.
Adds TestAddCmdEndToEndThroughRealClient which runs AddCmd.GetModel()
against an httptest registry — proves the production getDriverRegistry
closure routes through the rebuilt dbcClient. Adds TestStartupDeferred*
tests for the main() ordering invariants.
Also replaces 'switch { case err == ... }' with 'switch err' in main().
…in GetDriverList Two findings from roborev review of the branch: MEDIUM: OAuth device-code login ran through initDBCClient just to get an HTTPClient. A misconfigured global registry (e.g. replace_defaults=true with no entries) broke 'dbc auth login' — the very recovery path the user would run to fix the config. Add authHTTPClient() that builds a bare HTTP client without touching registry validation, and route device-code calls through it. LOW: GetDriverList decoded the new [[registries]] / replace_defaults fields but never called applyProjectRegistries, so library consumers using that helper silently ignored project registry overrides. Call it immediately after decoding. Regression tests pin both behaviors: - TestAuthHTTPClientDoesNotRequireRegistryConfig - TestGetDriverListHonorsProjectRegistries
Three findings from review of the branch: MEDIUM (1632): GetDriverList mutated the process-wide dbcClient via applyProjectRegistries, which leaked registry state across calls — a later invocation on a dbc.toml without [[registries]] would silently reuse the previous call's client. Switch GetDriverList to build a per-call dbc.Client scoped to the decoded file, so each call sees a fresh registry merge. Test exercises two sequential calls against two different servers to prove no leakage. MEDIUM (1630): NewClient's pre-merge guard rejected project replace_defaults=true with no project entries, even when the global config supplied registries. Remove the pre-merge guard; the post-merge empty-list check is the correct gate and lets 'drop defaults, keep globals' through. Test updated and new test pins the global-entries+project-replace_defaults case. LOW (1629): startup tests didn't exercise the real main() loading path. Extract loadStartupRegistryConfig() and add TestStartupEndToEndGlobalReplaceDefaultsWithProjectEntries that writes a real config.toml + dbc.toml, runs the full startup sequence, and drives AddCmd.GetModel() against an httptest server.
Addresses review 1633: MEDIUM: extract parseStartupArgs() and have main() use it, so the same config-load + argv-parse helper runs under test. The startup regression test now asserts at each step that dbcClient remains nil — if anyone reintroduces eager initDBCClient() between config load and subcommand dispatch the test fails at whichever step broke. LOW: update WithProjectRegistries doc comment to match the actual semantics: an error only when the MERGED registry list is empty. The old text claimed replace_defaults=true with no entries is always rejected, which contradicts the post-merge-only validation.
Addresses review 1634: the previous parseStartupArgs extraction only covered the parser bit. Anyone reintroducing an eager initDBCClient() between p.Parse() and GetModel() would still not have failed the test. Extract a single runStartup() that covers the full path main() runs: config load, argv parse, subcommand selection, and GetModel() for model commands. main() now calls runStartup() and switches on the returned startupKind for its exit cases. Tests drive runStartup() end-to-end, asserting dbcClient remains nil throughout. No functional change to main()'s behavior — same exit codes, same help and version paths, same subcommand dispatch.
Addresses review 1635: after the runStartup refactor, main() always
called runStartup(configDir, ...) even when internal.GetUserConfigPath
failed, leaving configDir = "". runStartup then forwarded that to
LoadGlobalConfig(""), which reads ./config.toml from the current
working directory. That changed registry resolution based on cwd — a
regression against the pre-refactor behavior that simply skipped the
load on this error path.
Guard runStartup to skip the load when configDir is empty. Document the
sentinel in the helper's doc comment. Add TestRunStartupSkipsLoadWhenConfigDirEmpty
that plants a hostile ./config.toml and asserts runStartup leaves
globalRegistryConfig nil.
Addresses review 1636: runStartup("", ...) skipped the load but did not
clear an already-primed globalRegistryConfig, so a second in-process
call would silently reuse the previous invocation's global registries.
Set globalRegistryConfig = nil explicitly in the empty-configDir branch
and add TestRunStartupClearsStaleGlobalConfigWhenConfigDirEmpty, which
primes the global before calling runStartup("") and asserts the
state is cleared.
Two MEDIUM findings from the full-branch review (job 1641): 1. NewClient regression: passing WithRegistries() alongside WithGlobalConfig/WithProjectRegistries caused the explicit list to be treated as the 'defaults' slice in the merge, prepending extra entries and generally not honoring the documented 'sets the driver registries to use' contract. Track explicitRegistries separately and skip the config-file merge when WithRegistries was passed. Added TestNewClientWithRegistryOptions/WithRegistries_takes_precedence... 2. cmd/dbc add lock ordering: the project lock wrapped the registry lookup, so a slow registry fetch could hold .dbc.project.lock for up to the lookup timeout. Concurrent dbc commands would hit the 10s lock timeout and fail with 'another dbc operation is in progress' even though no file mutation was happening. Restructure the add flow: decode + applyProjectRegistries + getDriverRegistry WITHOUT the lock, then acquire the lock only around the read-modify-write phase, re-reading the file under the lock so concurrent additions aren't clobbered. Added TestAddDoesNotHoldLockDuringRegistryLookup.
Addresses review 1645: the post-lock merge wrote the entire stale pre-lock m.list back onto the freshly re-read on-disk state. That could clobber a concurrent dbc add/remove that changed a different driver, and unconditionally overwrote any concurrent edits to [[registries]] / replace_defaults. Narrow the merge to only the driver entries this invocation actually added/replaced (iterating 'specs'), and leave current.Registries / current.ReplaceDefaults untouched so concurrent edits to those fields survive. Added TestAddPreservesConcurrentEdits which simulates a concurrent editor modifying the file mid-registry-lookup (new driver + [[registries]] addition) and asserts both the new driver and the concurrent changes are present after the add completes.
…faults Addresses review 1646: MEDIUM: a concurrent editor changing [[registries]] or replace_defaults between the unlocked driver lookup and the locked write-back phase could cause dbc add to write drivers that were validated against a stale registry set — later sync/install would fail against the new config. Under the lock, compare the re-read list's registry fields to the pre-lock snapshot via a new registriesChanged() helper. On drift, return a clear 'please retry' error rather than write inconsistent state. LOW: TestAddPreservesConcurrentEdits claimed to cover replace_defaults but the simulated concurrent content didn't set it. Split into two tests: one for concurrent driver-only edits (preserved) and one for registry-config drift (aborts). The drift test verifies the written file still contains the concurrent replace_defaults=true, the concurrent registry, and that the aborted add did NOT write its driver entry.
Addresses review 1648: the previous pattern — read-without-lock, lookup, reacquire-for-write — left a new torn-read race. A concurrent writer using os.Create (the writer path for dbc add/remove) could expose partial file contents to the unlocked initial decode, turning a valid concurrent operation into a spurious TOML parse error or a snapshot of torn state. Take the project lock briefly for the initial read, release it before the network lookup (so slow registry fetches still don't hold the lock), and reacquire for the final merge/write. The lock is never held across I/O that blocks on external services. TestAddDoesNotHoldLockDuringRegistryLookup still passes — the read lock is released before the stalled getDriverRegistry call begins.
Addresses review 1649 (LOW): the torn-read fix was uncovered by any regression test. Add TestAddInitialReadBlocksWhileLockHeld which holds the project lock externally and instruments getDriverRegistry to signal when add progressed past the initial read. With the read lock in place, add blocks before reaching the registry lookup and the test waits out a 300ms window before releasing the lock. Without the read lock fix, add reads the file and drives into getDriverRegistry while the lock is held, which the test detects via the channel signal and fails. Verified the test fails under a reverted fix (python-edited add.go into the pre-fix state) and passes on the current code.
Addresses review 1652 (LOW): the prior TestAddInitialReadBlocksWhileLockHeld test used a 300ms timeout on a registry-hook channel as the failure signal, which could spuriously pass on slow CI runners and didn't exercise the torn-read failure mode directly. Add TestAddInitialReadIsAtomicAgainstTornState: write invalidTOML while holding the project lock, start dbc add, assert add does NOT surface a decode error (it's blocked on the lock), then restore validTOML and release. If the initial read skips the lock, add reads the invalid TOML and the channel yields a TOML decode error, which the test detects. Verified: under a reverted fix (initial read without the lock) this test fails with 'toml: expected character ='. Under the current code the test passes — add blocks on the lock until valid content is restored.
Addresses review 1654 (LOW): registriesChanged compared whole RegistryEntry structs, so a concurrent edit that only renamed a registry would false-positive as 'registry configuration changed' and abort dbc add even though the URL resolution is identical. Equivalent URL spellings (trailing slash, host casing) would also false-positive. Normalize URLs (lowercase scheme/host, strip trailing path slash) and ignore Name when comparing. Documented the intent in the helper's doc comment and added TestRegistriesChanged covering name-only changes, trailing-slash variation, case differences, and tri-state replace_defaults. The 1654 MEDIUM on 'project commands rely on process-global state' is intentionally left as-is: cmd/dbc is a single-command-per-process binary, resetClientState() at the top of runStartup clears all cached state at the start of every run, and in-process reuse is a test-only concern that the test harness handles. Adding Once/Err resets to setDBCClient would be defensive code for an unreachable path.
…tion Addresses review 1657: normalizeRegistryURL dropped query, userinfo, fragment, and full path — so two URLs like 'https://r.example.com/?tenant=a' and 'https://r.example.com/?tenant=b' compared equal. That could hide a real registry config change from the dbc add drift check, letting it write a driver resolved against different registry settings than the final dbc.toml. Rebuild the URL via url.URL.String() after lowercasing the scheme and host and trimming a trailing path slash — only the truly no-op differences. Query, userinfo, fragment, and interior path segments are preserved. Added regression tests for tenant-selector query, userinfo, and path-segment changes all comparing unequal.
Addresses review 1658 (LOW): fragments aren't sent on HTTP requests, so two registry URLs that only differ in fragment resolve to the same endpoint. The prior normalizeRegistryURL preserved fragments, which meant a cosmetic '#a' -> '#b' edit triggered the config-drift abort in dbc add. Clear Fragment and RawFragment before serializing; added a regression test pinning that fragment-only changes compare equal.
…ntics Addresses review 1660: mergeRegistries deduplicated entries using a key of scheme://host + trimmed path, which collapsed semantically distinct registries — tenant-bearing query strings, credential-bearing userinfo, and path-mounted tenants were all treated as duplicates. The merged client silently dropped configured registries and resolved drivers against the wrong tenant/identity. Update the dedupe key to the same normalization used by cmd/dbc's registriesChanged drift check: lowercase scheme/host, trim only a trailing path slash, drop fragment — preserve query, userinfo, and path segments. Added regression tests in TestMergeRegistries covering tenant query, userinfo, path-mounted tenants, casing/trailing-slash collapse, and fragment collapse.
Three separate suites (TestLoadGlobalConfig, TestMergeRegistries, TestNewClientWithRegistryOptions) were each a long chain of t.Run closures with repeated setup. Convert each to a single table-driven test with a small struct of inputs + expected outputs and one driver loop. Adds an urls() helper so mergeRegistries cases can assert the full ordered URL list in one expression. Same coverage, much shorter, easier to scan and extend.
…able Addresses review 1662 (LOW): the table refactor only checked len(cfg.Registries) + ReplaceDefaults, dropping the pre-refactor assertions on the parsed URL and Name fields. A regression in TOML decoding, ordering, or field mapping would now pass unnoticed. Replace the wantRegs int with a wantEntries []RegistryEntry field containing the fully-specified expected contents, and compare via assert.Equal so ordering, URL, and Name mismatches all fail the test.
Addresses review 1664 (LOW): registriesChanged compared raw project fields, so semantically-equivalent edits produced false-positive drift aborts — e.g. flipping nil → &false when defaults were already inherited, or adding an exact duplicate registry entry that mergeRegistries would dedupe anyway. Rewrite the helper to run both DriversList values through the same newDBCClient merge path and compare the resulting normalized URL sets. That matches what the client actually uses for driver resolution, so mergeRegistries no-ops and tri-state flips that don't actually change the active registry set no longer abort dbc add. Added three new subtests: - nil vs &false replace_defaults (no global override): equal - exact duplicate entries: equal - name-only duplicate rename: equal The 1664 MEDIUM on concurrent global config.toml edits is intentionally left out of scope. The global-config-during-add race is extremely unlikely in practice (user editing ~/.config/dbc/config.toml while a separate shell runs dbc add), and the recovery is a simple retry — the already-written driver entry is valid against the dbc.toml the user wrote, just not against the fresh global config, which 'dbc sync' or a fresh add would correctly report.
Give the size of this PR, we can add CLI integration as a follow-up PR.
Good point, i'll add tests for this
Having That said, there's no tests currently for that behavior so I'll add some tests for it. |
…erge The merge of main brought in PR #374 which threaded context.Context through Client request methods. Two call sites added by this branch were missed by the merge: - cmd/dbc/driver_list.go:155 in GetDriverList — now passes context.Background() (matches the pattern used by drivers.GetDriverList and cmd/dbc/main.getDriverRegistry). - cmd/dbc/registry_wiring_test.go test stubs that mimic the production getDriverRegistry closure — match the production context.Background() call so the stubs faithfully reproduce the closure they replace. Build & Integrate (ubuntu/macos/windows), snapshot, and the per-platform build matrix were all failing on the same compile error.
Adds end-to-end tests pinning that 'dbc install' honors the new configurable registry surface. install is user-scope (it never reads dbc.toml), so the global config plus built-in defaults are its only configurable registry sources — these tests drive InstallCmd.GetModel() through the real dbcClient.Search path and assert the merged registry list and the resolved package URL. - TestInstallCmdHonorsGlobalConfigRegistry: a [[registries]] entry in the global config routes install's driver lookup through that registry, and the resolved PkgInfo's Driver.Registry / package URL come from it. - TestInstallCmdGlobalReplaceDefaultsLimitsToConfiguredRegistry: replace_defaults=true in the global config drops the built-in defaults from install's view; the merged client holds exactly one registry. - TestInstallCmdRegistryPriorityFirstDeclaredWins: when two registries publish a driver with the same path, the first-declared wins — Client.Search returns both copies, findDriver picks the first match, and the resolved package URL comes from the first registry's index. - TestInstallCmdDBCBaseURLOverridesGlobalConfig: DBC_BASE_URL takes precedence over the global config; install queries only the override and the global-config registry is never hit. Each test stubs downloadPkg with a sentinel error so the install ends right after the registry lookup — capturedPkg records what install would have downloaded, exposing both the resolving registry and the package URL for assertions. Adds runInstallModelToCompletion driver helper because install's Init() returns tea.Batch (spinner + lookup), which the existing runTeaCmdToCompletion (used by add tests, single tea.Cmd Init) cannot dispatch.
…efault
Pins behavior at the boundary where the project, global (user), and
built-in default registry tiers interact. Covers both URL conflicts
(merge-level dedup) and driver-name conflicts (Search-level priority).
Library tests (registry_config_test.go):
- Extend TestMergeRegistries with cross-tier URL collision cases:
- project URL collides with default URL — project wins, default
URL dedup'd, project's Name kept;
- global URL collides with default URL — global wins;
- all three tiers share a URL — project wins, others dedup'd;
- project URL matches second default — only that one default URL
is dropped, the rest of the merge is preserved.
- TestSearchPriorityAcrossRegistryTiers — Client.Search returns
drivers in registry-priority order across three httptest servers
acting as project/global/default tiers. Confirms first-match-wins
in cmd/dbc.findDriver and Client.Install resolves cross-tier name
conflicts to the highest-priority tier. Constructs the Client
directly so the test isn't sensitive to whether the real built-in
default URLs are reachable from CI.
CLI tests (cmd/dbc/registry_wiring_test.go):
- TestApplyProjectRegistriesPreservesAllThreeTiersInOrder — when no
replace_defaults is set, dbcClient.Registries() exposes all three
tiers in [project, global, defaults...] order. Asserts the tail
is exactly the built-in defaults (project/global URLs are not
duplicated into it).
- TestApplyProjectRegistriesProjectReplaceDefaultsKeepsGlobal — pins
the asymmetric semantics of project replace_defaults=true: it
drops only the built-in defaults, NOT the global tier. A project
opting out of defaults still inherits the user's global registries.
- TestProjectRegistryShadowsGlobalOnDriverNameConflict — full CLI
wiring (applyProjectRegistries -> dbcClient.Search -> findDriver):
when project and global both publish a driver with the same path,
project wins and the resolved package URL comes from the project
tier's index. replace_defaults=true is set on the project so the
built-in defaults stay out of the merge — the all-three-tiers
case is covered by the ordering test above and by the library
TestSearchPriorityAcrossRegistryTiers.
|
@amoeba I addressed the comments, updated this, and added the tests that were mentioned. Let me know what you think |
Resolves conflict in cmd/dbc/main.go: ports the initVTProcessing() call added on main (#383) into the new main() that calls runStartup(), instead of the pre-refactor main() body that no longer exists on this branch.
|
the failing snapshot CI is an upstream snapd problem |
Summary
Adds support for configuring driver registries via TOML config files, allowing users to add custom registries at both the global (
~/.config/columnar/dbc/config.toml) and project (dbc.toml) level.What's New
~/.config/columnar/dbc/config.toml): loaded on every command viaConfigureRegistries()called inmain()dbc.toml):[[registries]]section respected bysyncandaddcommandsreplace_defaults = true: opt-in flag to suppress built-in registries entirelyDBC_BASE_URLenv var: still overrides everything (unchanged behavior)dbc.tomlfiles without[[registries]]work unchangedConfig Format
Global (
~/.config/columnar/dbc/config.toml):Project (
dbc.toml):Implementation Notes
http/httpsscheme and non-empty hostdefaultRegistriesis snapshotted eagerly ininit()(afterdrivers.goinit runs); env-sensitivity documented in commentsadd.godecodesdbc.tomlonce (single file open) for both registry config and driver listremoveintentionally omits registry wiring — it never queries a registry