Clean up NodeDefinition typing, fix empty-input handling, expose ClusterSummaryFeatures#890
Clean up NodeDefinition typing, fix empty-input handling, expose ClusterSummaryFeatures#890sevmag wants to merge 5 commits into
Conversation
Replace lowercase `torch.tensor` annotations with `torch.Tensor`, fix return-type docstrings on `NodeDefinition.forward` and `_construct_nodes` (they return a tensor, not a graph), correct the `ClusterSummaryFeatures._construct_nodes` return annotation from `Data` to `torch.Tensor`, and fix `_verify_standardization`'s return annotation (it returns None, not a tensor). Reorder imports to match the project's standard grouping. Also clarify the `ClusterSummaryFeatures` docstring about which features come from Glauch's thesis and which (counts) do not.
When the input pulse tensor was empty, `_construct_nodes` returned a `torch_geometric.data.Data` object built from `np.column_stack([x, []])` instead of a `torch.Tensor` shaped `[0, num_features]`. Downstream code expects a tensor, so this could crash or silently produce a malformed graph. Return a correctly-shaped empty tensor instead, and drop the now-unused `Data` import.
The class was already importable through `graphnet.models.data_representation.graphs` but missing from the top-level `data_representation` package's public re-exports.
84154ba to
5576fdd
Compare
Aske-Rosted
left a comment
There was a problem hiding this comment.
This looks good to me as long as there is a reason for why you might want to allow the passing of empty data objects. I would assume that we in that case might just want to catch that instance and raise an error?
Good point! I agree that an empty graph is probably not what we want. I will adjust it to throw an error. |
Just realised one of the tests is now failing. I guess we have been doing a dummy test with an empty data object.
|
https://github.com/graphnet-team/graphnet/blob/main/src/graphnet/datasets/test_dataset.py#L10 It seems like event 5 in the ERDA hosted dataset is empty. So we actually have to consider how we handle that in a way where this test still passes. I think it's probably appropriate to raise a warning instead of an error and then leave it at that. That will at least inform the user if they have empty events in their input. |
|
I agree! |
Summary
First in a series of small PRs being carved out of #813 to make review easier (see the comment on #813 for the full split plan). This one is all the touch-ups around
NodeDefinitionthat landed in that branch and aren't tied to the CNN work — pulling them out so they can be reviewed and merged on their own.Three independent commits:
NodeDefinition— replace lowercasetorch.tensorannotations withtorch.Tensor; fix return-type docstrings onforwardand_construct_nodes(they return a tensor, not a graph); correctClusterSummaryFeatures._construct_nodesreturn annotation fromDatatotorch.Tensor; fix_verify_standardizationannotation (returnsNone, nottorch.Tensor); reorder imports to project convention; clarify theClusterSummaryFeaturesdocstring about which features come from Glauch's thesis and which (counts) don't.NodeAsDOMTimeSeriesempty-input handling — when the input pulse tensor is empty,_construct_nodespreviously returned atorch_geometric.data.Databuilt fromnp.column_stack([x, []])instead of a tensor of shape[0, num_features]. Downstream callers expect a tensor; this could crash or silently produce a malformed graph. Returns a correctly-shaped empty tensor and drops the now-unusedDataimport.ClusterSummaryFeaturesfrom the top-leveldata_representationpackage (was already reachable via.graphs).No functional changes outside of (2), which is a bug fix.
Test plan
pre-commit run --files src/graphnet/models/data_representation/graphs/nodes/nodes.py src/graphnet/models/data_representation/__init__.py— already verified locallySplit from #813.
🤖 Generated with Claude Code