Add log_train_loss_on_step toggle to EasySyntax by sevmag · Pull Request #886 · graphnet-team/graphnet

sevmag · 2026-05-04T22:14:26Z

Summary

Adds an opt-in log_train_loss_on_step flag to EasySyntax that logs a per-step train_loss_step metric in addition to the existing epoch-aggregated train_loss.
Default is False, so existing logging behavior is unchanged.

addresses #882

Adds an opt-in `log_train_loss_on_step` constructor argument that, when enabled, logs the per-batch training loss under `train_loss_step` in addition to the epoch-aggregated `train_loss`. Default is False so existing behavior is unchanged.
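
For readers who want to see the described behavior in isolation, here is a minimal sketch. It is not the actual EasySyntax code: `RecordingModule` and its `log` method are hypothetical stand-ins that only record calls, so the snippet runs without PyTorch Lightning installed. The keyword arguments mirror the ones passed to `self.log` in the diff.

```python
from typing import Any, Dict, List, Tuple


class RecordingModule:
    """Hypothetical stand-in for the model class; `log` only records calls."""

    def __init__(self, log_train_loss_on_step: bool = False) -> None:
        self._log_train_loss_on_step = log_train_loss_on_step
        self.logged: List[Tuple[str, float, Dict[str, Any]]] = []

    def log(self, name: str, value: float, **kwargs: Any) -> None:
        # Record the call instead of forwarding it to a real logger.
        self.logged.append((name, value, kwargs))

    def training_step(self, loss: float, batch_size: int) -> float:
        # Existing behavior: epoch-aggregated loss, always logged.
        self.log("train_loss", loss, batch_size=batch_size,
                 on_epoch=True, on_step=False)
        # Opt-in behavior: additional per-batch metric under its own key.
        if self._log_train_loss_on_step:
            self.log("train_loss_step", loss, batch_size=batch_size,
                     on_epoch=False, on_step=True)
        return loss


default = RecordingModule()
default.training_step(0.5, batch_size=32)

opted_in = RecordingModule(log_train_loss_on_step=True)
opted_in.training_step(0.5, batch_size=32)

print([name for name, _, _ in default.logged])
print([name for name, _, _ in opted_in.logged])
```

With the default, only `train_loss` is recorded; with the flag enabled, `train_loss_step` is recorded as well.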

christianlocatelli · 2026-05-12T11:31:08Z

I will look at this.

christianlocatelli · 2026-05-26T08:44:31Z

        scheduler_class: Optional[type] = None,
        scheduler_kwargs: Optional[Dict] = None,
        scheduler_config: Optional[Dict] = None,
+        log_train_loss_on_step: bool = False,


The parameter could be renamed to also_log_train_loss_per_step. That would make it immediately clear that it is an additional option for logging the per-batch loss under a different key.

Suggested change

-        log_train_loss_on_step: bool = False,
+        also_log_train_loss_per_step: bool = False,

It could also be useful to add a docstring explaining the arguments of __init__(), especially also_log_train_loss_per_step.

        """
        Args:
            also_log_train_loss_per_step: If `True`, logs an additional
                per-batch metric (`train_loss_step`) alongside the existing
                per-epoch metric (`train_loss`). This can be useful for
                debugging training instabilities or monitoring convergence
                within long epochs.
        """

christianlocatelli · 2026-05-26T08:47:58Z

        self._scheduler_class = scheduler_class
        self._scheduler_kwargs = scheduler_kwargs or dict()
        self._scheduler_config = scheduler_config or dict()
+        self._log_train_loss_on_step = log_train_loss_on_step


Suggested change

-        self._log_train_loss_on_step = log_train_loss_on_step
+        self._also_log_train_loss_per_step = also_log_train_loss_per_step

christianlocatelli · 2026-05-26T08:54:52Z

+        if self._log_train_loss_on_step:
+            self.log(
+                "train_loss_step",
+                loss,
+                batch_size=batch_size,
+                prog_bar=False,
+                on_epoch=False,
+                on_step=True,
+                sync_dist=True,


With sync_dist=True this might be computationally expensive: the metric would be synced across GPUs on every batch, which quickly adds up when an epoch has many batches. The docstring at the top should probably mention that training might be slowed down. Alternatively, sync_dist could be set to False for this call.
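
To give a rough sense of scale, the sketch below counts the extra cross-device reductions under a deliberately simple assumption: each per-step `self.log` call with `sync_dist=True` triggers one reduction per batch. The function name and the numbers are illustrative only; the real overhead depends on the Lightning version, the communication backend, and the hardware.

```python
def extra_reductions(num_batches_per_epoch: int, num_epochs: int,
                     sync_dist: bool) -> int:
    """Extra reductions from per-step logging, assuming one per logged batch."""
    if not sync_dist:
        # Without sync_dist, each rank logs its local value; nothing is reduced.
        return 0
    return num_batches_per_epoch * num_epochs


# Illustrative numbers: 10,000 batches per epoch, 50 epochs.
with_sync = extra_reductions(10_000, 50, sync_dist=True)
without_sync = extra_reductions(10_000, 50, sync_dist=False)
print(with_sync, without_sync)
```

Under this assumption the count grows linearly with the number of batches, which is the concern raised above.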

Suggested change

-        if self._log_train_loss_on_step:
-            self.log(
-                "train_loss_step",
-                loss,
-                batch_size=batch_size,
-                prog_bar=False,
-                on_epoch=False,
-                on_step=True,
-                sync_dist=True,
+        if self._also_log_train_loss_on_step:
+            self.log(
+                "train_loss_step",
+                loss,
+                batch_size=batch_size,
+                prog_bar=False,
+                on_epoch=False,
+                on_step=True,
+                sync_dist=True,

Good point! Let's set sync_dist to False as the default.

Co-authored-by: Christian Locatelli <97306084+christianlocatelli@users.noreply.github.com>

christianlocatelli

I left some comments about optional naming and doc improvements.

The previous commit ("Apply suggestions from code review") was created via GitHub's batch-suggestion apply, which mangled the indentation and left a name mismatch, so the module no longer imported:

- under-indented `also_log_train_loss_per_step` parameter and attribute
- top-level `if self._also_log_train_loss_on_step:` referencing an attribute that is never set (`_on_step` vs `_per_step`)

Re-apply the reviewer's intent cleanly: rename to `also_log_train_loss_per_step`, log the per-step metric with `sync_dist=False`, and document all `__init__` arguments.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

sevmag · 2026-05-27T01:59:21Z

Hey @christianlocatelli, I implemented your suggestions. Sorry for the commit name, that was claude 😅

christianlocatelli

This looks good to me, thanks for editing the code 👍

sevmag · 2026-05-27T19:59:26Z

@Aske-Rosted tagging you here for completeness (and a potential approval 😅 )

Aske-Rosted · 2026-05-29T06:42:03Z

        scheduler_class: Optional[type] = None,
        scheduler_kwargs: Optional[Dict] = None,
        scheduler_config: Optional[Dict] = None,
+        also_log_train_loss_per_step: bool = False,


I am sorry for being a little bit annoying here, but I think it is better to just expose on_epoch and on_step from the Lightning library. The user can then decide for themselves whether to log on step only, on epoch only, on both, or on neither. The default values should follow the current behavior.

Suggested change

-        also_log_train_loss_per_step: bool = False,
+        log_on_epoch: bool = True,
+        log_on_step: bool = False,

Aske-Rosted · 2026-05-29T06:47:12Z

            on_epoch=True,
            on_step=False,


And then expose the logging settings on the class. This removes the duplicated code, and we no longer have to define the batch_size.

Suggested change

-            on_epoch=True,
-            on_step=False,
+            on_epoch=log_on_epoch,
+            on_step=log_on_step,

Makes sense, but how do we want to handle the logging of the val loss? To me, log_on_epoch and log_on_step sound like both val and train are logged on epoch and/or step, which I think could also be valid (personally I log train on both epoch and step, and val only on epoch). As long as we agree on something together, I think either way is fine.

I think it is fine to have the logging of the validation and train loss behave in the same way. In principle we could separate the arguments for validation and training, but I think that would be a few too many arguments, and it moves towards cases where people should just create their own torch-lightning callbacks.
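
The design agreed on here can be sketched roughly as follows. This is an illustration under stated assumptions, not the merged code: `RecordingModule`, `_log_loss`, and the `val_loss` key are hypothetical, and `log` only records calls so the snippet runs without PyTorch Lightning. The point is that one pair of flags is forwarded unchanged to both the train and the validation loss.

```python
from typing import Any, Dict, List, Tuple


class RecordingModule:
    """Hypothetical stand-in for the model class; `log` only records calls."""

    def __init__(self, log_on_epoch: bool = True,
                 log_on_step: bool = False) -> None:
        # Defaults reproduce the current behavior: epoch-level logging only.
        self._log_on_epoch = log_on_epoch
        self._log_on_step = log_on_step
        self.logged: List[Tuple[str, float, Dict[str, Any]]] = []

    def log(self, name: str, value: float, **kwargs: Any) -> None:
        self.logged.append((name, value, kwargs))

    def _log_loss(self, name: str, loss: float) -> None:
        # Single code path: the same flags apply to train and validation.
        self.log(name, loss,
                 on_epoch=self._log_on_epoch, on_step=self._log_on_step)

    def training_step(self, loss: float) -> float:
        self._log_loss("train_loss", loss)
        return loss

    def validation_step(self, loss: float) -> float:
        self._log_loss("val_loss", loss)  # key name assumed for illustration
        return loss


module = RecordingModule(log_on_step=True)
module.training_step(0.4)
module.validation_step(0.6)
for name, _, kwargs in module.logged:
    print(name, kwargs)
```

In Lightning itself, logging a metric with both on_step and on_epoch enabled reports it under suffixed keys such as `train_loss_step` and `train_loss_epoch`; the stand-in above does not emulate that.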

christianlocatelli self-requested a review May 12, 2026 18:33

christianlocatelli reviewed May 26, 2026


Apply suggestions from code review

d4e3508

Co-authored-by: Christian Locatelli <97306084+christianlocatelli@users.noreply.github.com>

christianlocatelli reviewed May 26, 2026


sevmag requested a review from christianlocatelli May 27, 2026 01:59

christianlocatelli reviewed May 27, 2026


Aske-Rosted reviewed May 29, 2026


sevmag added 2 commits May 29, 2026 12:12

using canonical on_step and on_epoch kwargs

4dd59b7

setting the correct attributes

07b2c72

