Skip to content

Amp-powered subgraphs: Move Amp configs from ENV to TOML#6383

Open
isum wants to merge 15 commits intomasterfrom
ion/add-amp-graph-node-config
Open

Amp-powered subgraphs: Move Amp configs from ENV to TOML#6383
isum wants to merge 15 commits intomasterfrom
ion/add-amp-graph-node-config

Conversation

@isum
Copy link
Member

@isum isum commented Feb 18, 2026

Summary

Amp Flight service configuration moves from a single global endpoint (via GRAPH_AMP_FLIGHT_SERVICE_ADDRESS env var or --amp-flight-service-address CLI flag) to per-chain TOML configuration under [chains.<name>.amp]. This allows each chain to connect to its own Amp Flight service with independent credentials and context tables, which is necessary for multi-chain deployments where different chains are served by different Amp instances.

The per-chain config also makes the context table (used to join block hashes and timestamps into query results) explicit rather than discovered at runtime by probing source tables. Previously, manifest resolution would iterate over source.tables and try each one as a potential context source, silently skipping failures. Now the operator declares context_dataset and context_table in the TOML config, and resolution fails clearly if the context table is missing or lacks required columns.

A secondary change adds Amp-based start block resolution: when an Amp subgraph has start_block > 0, the registrar queries the Amp Flight service for the block hash at start_block - 1 instead of making an RPC call to the chain. This falls back to RPC on failure.

Breaking changes

  • Removed: GRAPH_AMP_FLIGHT_SERVICE_ADDRESS env var and --amp-flight-service-address CLI flag
  • Removed: GRAPH_AMP_FLIGHT_SERVICE_TOKEN env var (now token field in TOML)
  • Changed: The amp field on a chain section changes from a plain string (network alias) to a TOML table with address, context_dataset, context_table, and optional token and network fields

New config format

[chains.mainnet.amp]
address = "http://amp-flight.example.com:50051"
token = "my-secret-token"
context_dataset = "ethereum"
context_table = "blocks"
network = "ethereum-mainnet"  # optional, defaults to chain name

isum added 14 commits February 18, 2026 14:48
Add AmpChainConfig runtime struct in graph/src/components/network_provider/mod.rs
with address as parsed Uri, plus token, context_dataset, context_table, and network
fields. Add Config::amp_chain_configs() method that produces a
HashMap<String, AmpChainConfig> for all chains with an [amp] TOML section. URI parsing
is defensive (returns error, not panic) even though ChainSection::validate already
rejects invalid URIs. Three unit tests cover constructability, mixed configs,
and error handling for invalid addresses.
Replace the single global Amp Flight client with per-chain instances
stored in a new AmpClients<AC> wrapper type. Each chain with an [amp]
section in the TOML config now gets its own FlightClient, looked up by
chain name at the point of use.

Key changes:
- Add AmpClients<AC> in graph/src/components/network_provider/mod.rs
  with new(), get(), is_empty() methods
- launcher.rs and run.rs: iterate amp_chain_configs() to create
  per-chain clients; log errors per-chain and continue startup
- registrar.rs, instance_manager.rs, amp_subgraph/manager.rs: accept
  AmpClients instead of Option<Arc<AC>>; look up client by network
  name from the raw manifest
- server/index-node: thread AmpClients through server, service,
  resolver
- Add 3 unit tests for AmpClients wrapper
Improved Amp status logging in launcher.rs and run.rs to provide
three distinct log levels based on config state: info when no chains
have [amp] configuration (skip connection loop entirely), info with
chain names when connections succeed, and warn when all configured
chains fail to connect. Added two unit tests verifying AmpClients
is_empty() drives Amp manager registration decisions.
Add resolve_amp_start_block function that queries the Amp Flight service
for a block hash at a given block number, using auto_block_hash_decoder
to detect the block hash column from the returned RecordBatch. In
create_subgraph_version, detect Amp subgraphs and use this new path
instead of RPC-based block_pointer_from_number when start_block > 0,
falling back to RPC on failure.
Remove the GRAPH_AMP_FLIGHT_SERVICE_ADDRESS CLI flag from Opt structs
in opt.rs and manager.rs, remove GRAPH_AMP_FLIGHT_SERVICE_TOKEN env var
from AmpEnv and Inner envconfig struct. These are now fully superseded
by per-chain TOML config. Add unit test verifying AmpEnv constructs
correctly without the removed fields.
Verify that full_config.toml parses successfully with the [chains.mainnet.amp]
table form and that all five Amp field values (address, token, context_dataset,
context_table, network) are correctly deserialized.
…nction

Consolidates the duplicated YAML traversal pattern (dataSources[0].network)
into a single pub fn in graph::data::subgraph, replacing four identical
copies across registrar, instance_manager, amp_subgraph/manager, and
index-node resolver.
@isum isum self-assigned this Feb 18, 2026
@isum isum changed the title Amp-powered subgraphs: Move Amp configs to from ENV to TOML Amp-powered subgraphs: Move Amp configs from ENV to TOML Feb 18, 2026
@isum isum requested a review from lutter February 18, 2026 14:56
@lutter
Copy link
Collaborator

lutter commented Feb 18, 2026

I feel this would be more logical to configure as a new kind of provider so that in the graph-node.toml you would say

[chains.mainnet]
shard = "blocks"
provider = [
  { label = "mainnet-amp", details = { type = "amp", url = "https://..", token = "..", pointers = "ethereum.blocks", name = "ethereum-mainnet" },
  { label = "mainnet-rpc", details = { type = "web3", url = "http://..", features = [..] },
 ..
]

For now, we would only allow one amp provider, but it leaves the door open to allowing multiple if that would ever make sense. I also renamed the context_* settings to pointers - I believe that's what we expect from that table, right? That it contains columns for block number, hash, parent hash, and timestamp? I am open to other names, but context doesn't seem very clear to me

/// (e.g., AMP uses `"ethereum-mainnet"` while graph-node uses `"mainnet"`).
/// This struct is the *runtime* counterpart of the TOML-level `AmpConfig`
/// (which stores the address as a plain `String`). The `Config::amp_chain_configs()`
/// method bridges the two by parsing each address string into a `Uri`.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You could avoid this almost identical struct by deserializing the Url in the config (have a look at annotations like #[serde(with = ..)] and #[serde(deserialize_with = ..)] in config.rs

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is a helper!

/// This wrapper is used to pass per-chain Amp clients through the system
/// instead of a single global `Option<Arc<AC>>`. Use `get(chain_name)` to
/// retrieve the client for a specific chain.
pub struct AmpClients<AC> {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't this pretty much a HashMap? Just a curried type like type AmpClients<V> = HashMap<String, V>;

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe this is the point where the different config syntax we discussed doesn't matter - Config accepts a different format, but amp_chain_configs can return the same structs as it does now

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since the new config syntax has a label, we should only log the label anywhere where we need to identify the amp server we are talking to. That completely obscures any operational details, like, e.g., host addresses or names

raw: serde_yaml::Mapping,
resolver: &Arc<dyn LinkResolver>,
amp_client: Option<Arc<AC>>,
amp_context: Option<(String, String)>,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since we seem to be passing amp_context around with amp_client a lot, could we carry the context in the client? It would mostly require an additional method on the Client trait.


/// Resolves a block pointer for an Amp subgraph by querying the Amp Flight
/// service for the block hash at the given `block_number`.
async fn resolve_amp_start_block<AC: amp::Client>(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This would be a good candidate for a method on amp::Client

// Non-Amp subgraph — use RPC.
_ => resolve_start_block(&manifest, &*chain, &logger).await?,
}
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The overall create_subgraph_version function is already not great, but it would be great if these 30 lines of fairly straightforward code lived somewhere in a helper. I am always worried about the readability of code that mixes control logic with basic "get me this thing" code.

context_dataset: amp.context_dataset.clone(),
context_table: amp.context_table.clone(),
context_dataset: normalize_sql_ident(&amp.context_dataset),
context_table: normalize_sql_ident(&amp.context_table),
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's definitely good to do the normalization early - I am just not sure if we should do this, since SQL identifiers are case-sensitive, it's just that dealing with identifiers that are not all-lowercase is highly annoying since, at least in Postgres, you need to write select * from "MyTable" since it would interpret select * from MyTable as the same as select * from mytable. An alternative would be to use what the user gave us verbatim but enclose each component in double quotes (if it is not already in double-quotes, it gets complicated)

Either way, fine to leave this as-is for now, just something to think about.

(0, _, _) => None,
// Amp subgraph with start_block > 0 — try Amp-based resolution.
(min, Some(client), Some((dataset, table))) => {
(min, Some(client), Some((dataset, table))) if is_amp_subgraph => {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't there some bigger issue if is_amp_subgraph is true and client or context are None (and vice versa)? I think we should produce some sort of internal error for those cases rather than silently do something else (like talk to RPC)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants

Comments