add tracker to support swanlab by tpx818 · Pull Request #213 · modelscope/twinkle

tpx818 · 2026-06-02T12:55:01Z

PR type

Bug Fix
New Feature
Document Updates
More Models or Datasets Support

PR information

add tracker submodule.

Experiment results

Paste your experiment result here(if needed).

Cover SwanLabTracker behavioral tests (23 cases) and dispatch system tests (38 cases): register_tracker, dispatch, dispatch_hyperparams, clear_trackers, set_rank, auto_init_from_env. Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent) Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>

gemini-code-assist

Code Review

This pull request introduces a unified experiment tracking dispatch system for Twinkle training metrics, adding support for SwanLab and Weights & Biases (W&B) trackers along with environment-variable auto-initialization. The review feedback highlights several critical issues and improvement opportunities: a potential TypeError in WandbTracker when custom settings are passed, distributed training issues caused by initializing trackers at import time rather than lazily, a lack of idempotency in dispatch_hyperparams when adapter_name is None (which would flood tracking servers), and a silent failure in SwanLabTracker if the output directory does not exist. Additionally, suggestions were made to fully support lazy initialization in list_trackers and dispatch, and to correct the type hint for raw metrics.

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.

gemini-code-assist · 2026-06-02T12:57:02Z

+        settings = None
+        proxy = kwargs.pop("wandb_proxy", None) or os.environ.get("WANDB_PROXY")
+        if proxy:
+            settings = wandb.Settings(https_proxy=proxy)


If settings is passed in kwargs when instantiating WandbTracker, it will cause a TypeError because settings is also explicitly passed as a keyword argument to wandb.init (line 43) without being popped from kwargs first. Popping settings from kwargs beforehand resolves this issue and allows users to safely pass custom configurations.

Suggested change

settings = None

proxy = kwargs.pop("wandb_proxy", None) or os.environ.get("WANDB_PROXY")

if proxy:

settings = wandb.Settings(https_proxy=proxy)

settings = kwargs.pop("settings", None)

proxy = kwargs.pop("wandb_proxy", None) or os.environ.get("WANDB_PROXY")

if proxy:

settings = wandb.Settings(https_proxy=proxy)

gemini-code-assist · 2026-06-02T12:57:03Z

+# Run auto-init once at import time (before user code or atexit runs)
+_auto_init_from_env()


Initializing trackers at import time is problematic for distributed training. When multiple processes (e.g., in DDP or Ray) import twinkle.tracker, they will all trigger _auto_init_from_env() simultaneously. Since set_rank has not been called yet, _rank is 0 on all processes, causing every rank to initialize its own tracking run. Removing the import-time call and running auto-initialization lazily ensures it only runs on rank 0 after the distributed environment is set up.

Suggested change

# Run auto-init once at import time (before user code or atexit runs)

_auto_init_from_env()

# Auto-init is run lazily on first tracker registration or dispatch

gemini-code-assist · 2026-06-02T12:57:03Z

+def dispatch_hyperparams(params: Dict[str, Any], adapter_name: Optional[str] = None) -> None:
+    """Send hyperparameters to all registered trackers (call once at training start).
+
+    Idempotent per ``(adapter_name,)`` — repeated calls with the same
+    *adapter_name* are silently ignored so that this can safely be called
+    from ``calculate_metrics`` on its first invocation without
+    flooding trackers with redundant config updates.
+
+    Args:
+        params: Flat or nested dict of hyperparameters (e.g. model config,
+            training args, LoRA config).
+        adapter_name: Optional adapter identifier.  If omitted, the params
+            are dispatched unconditionally on every call.
+    """
+    if not _trackers or _rank != 0:
+        return
+
+    # Idempotency guard: only dispatch once per adapter
+    if adapter_name is not None:
+        if adapter_name in _hparams_dispatched:
+            return
+        _hparams_dispatched.add(adapter_name)


This function has two issues:

It needs to call _auto_init_from_env() to support lazy initialization.

If adapter_name is None (which is the default for full fine-tuning without LoRA), the idempotency guard is skipped entirely. This causes dispatch_hyperparams to flood tracking servers with redundant config updates on every single training step where calculate_metrics is called. Using a default key like "_default_" when adapter_name is None prevents this flooding.

def dispatch_hyperparams(params: Dict[str, Any], adapter_name: Optional[str] = None) -> None: """Send hyperparameters to all registered trackers (call once at training start). Idempotent per ``(adapter_name,)`` — repeated calls with the same *adapter_name* are silently ignored so that this can safely be called from ``calculate_metrics`` on its first invocation without flooding trackers with redundant config updates. Args: params: Flat or nested dict of hyperparameters (e.g. model config, training args, LoRA config). adapter_name: Optional adapter identifier. If omitted, the params are dispatched unconditionally on every call. """ _auto_init_from_env() if not _trackers or _rank != 0: return # Idempotency guard: only dispatch once per adapter (or once globally if None) guard_key = adapter_name if adapter_name is not None else "_default_" if guard_key in _hparams_dispatched: return _hparams_dispatched.add(guard_key)

gemini-code-assist · 2026-06-02T12:57:03Z

+    def _save_experiment_info(self, output_dir: str) -> None:
+        try:
+            info = {"swanlab_experiment_url": self._run.get_run().url}
+            out = Path(output_dir) / "swanlab_config.json"
+            out.write_text(json.dumps(info, indent=2))
+        except Exception:
+            pass


If the specified output_dir does not exist, out.write_text will raise a FileNotFoundError because Python does not automatically create parent directories. Although this is wrapped in a try...except block, it results in a silent failure where the experiment info is not saved. Creating the parent directories first ensures robustness.

Suggested change

def _save_experiment_info(self, output_dir: str) -> None:

try:

info = {"swanlab_experiment_url": self._run.get_run().url}

out = Path(output_dir) / "swanlab_config.json"

out.write_text(json.dumps(info, indent=2))

except Exception:

pass

def _save_experiment_info(self, output_dir: str) -> None:

try:

info = {"swanlab_experiment_url": self._run.get_run().url}

out = Path(output_dir) / "swanlab_config.json"

out.parent.mkdir(parents=True, exist_ok=True)

out.write_text(json.dumps(info, indent=2))

except Exception:

pass

gemini-code-assist · 2026-06-02T12:57:03Z

+def list_trackers() -> List[ExperimentTracker]:
+    """Return a snapshot of currently registered trackers."""
+    return list(_trackers)


To support lazy initialization of trackers from environment variables, _auto_init_from_env() should be called when listing trackers.

Suggested change

def list_trackers() -> List[ExperimentTracker]:

"""Return a snapshot of currently registered trackers."""

return list(_trackers)

def list_trackers() -> List[ExperimentTracker]:

"""Return a snapshot of currently registered trackers."""

_auto_init_from_env()

return list(_trackers)

gemini-code-assist · 2026-06-02T12:57:03Z

+def dispatch(data: Dict[str, float], step: int) -> None:
+    """Send computed metrics to all registered trackers.
+
+    Metric values are normalized to ``float`` via :func:`clean_metrics`
+    before dispatching.  Only the rank-0 process performs the dispatch;
+    all other ranks return immediately with no overhead.
+
+    Args:
+        data: Raw metric dict (may contain strings, ints, floats).
+        step: Current training step (``cur_step`` from optimizer group).
+    """
+    if not _trackers:
+        return


To support lazy initialization of trackers, _auto_init_from_env() should be called inside dispatch. Additionally, the type hint for data should be Dict[str, Any] since raw metrics can contain strings, ints, or floats before being normalized by clean_metrics.

Suggested change

def dispatch(data: Dict[str, float], step: int) -> None:

"""Send computed metrics to all registered trackers.

Metric values are normalized to ``float`` via :func:`clean_metrics`

before dispatching. Only the rank-0 process performs the dispatch;

all other ranks return immediately with no overhead.

Args:

data: Raw metric dict (may contain strings, ints, floats).

step: Current training step (``cur_step`` from optimizer group).

"""

if not _trackers:

return

def dispatch(data: Dict[str, Any], step: int) -> None:

"""Send computed metrics to all registered trackers.

Metric values are normalized to ``float`` via :func:`clean_metrics`

before dispatching. Only the rank-0 process performs the dispatch;

all other ranks return immediately with no overhead.

Args:

data: Raw metric dict (may contain strings, ints, floats).

step: Current training step (``cur_step`` from optimizer group).

"""

_auto_init_from_env()

if not _trackers:

return

Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent) Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>

Ultraworked with Sisyphus Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>

…otency, settings kwarg, mkdir Ultraworked with Sisyphus Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>

Ultraworked with Sisyphus Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>

tpx818 and others added 3 commits May 12, 2026 14:27

init tracker

8723c57

fix swanlab

3b31340

tpx818 changed the title ad add tracker to support swanlab Jun 2, 2026

gemini-code-assist Bot reviewed Jun 2, 2026

View reviewed changes

tpx and others added 2 commits June 2, 2026 21:07

apply lint auto-fixes to tracker module

a623576

Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent) Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>

add debug_tracker.py for tracker module debugging

31a42b3

Ultraworked with Sisyphus Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>

tpx818 marked this pull request as draft June 2, 2026 14:06

add per-adapter routing to tracker dispatch

5ce53ee

Ultraworked with Sisyphus Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>

supertpx force-pushed the tpx_tracker branch from caf07aa to 5ce53ee Compare June 2, 2026 14:11

tpx and others added 2 commits June 2, 2026 22:14

apply PR#213 review fixes: lazy auto-init, dispatch_hyperparams idemp…

e1b6a46

…otency, settings kwarg, mkdir Ultraworked with Sisyphus Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>

fix lint issues: E731/E226 in debug_tracker.py, E402 in tests, isort

d46462e

Ultraworked with Sisyphus Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>

tpx818 closed this Jun 2, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

add tracker to support swanlab#213

add tracker to support swanlab#213
tpx818 wants to merge 8 commits into
modelscope:mainfrom
tpx818:tpx_tracker

tpx818 commented Jun 2, 2026 •

edited

Loading

Uh oh!

gemini-code-assist Bot left a comment

Uh oh!

gemini-code-assist Bot Jun 2, 2026

Uh oh!

gemini-code-assist Bot Jun 2, 2026

Uh oh!

gemini-code-assist Bot Jun 2, 2026

Uh oh!

gemini-code-assist Bot Jun 2, 2026

Uh oh!

gemini-code-assist Bot Jun 2, 2026

Uh oh!

gemini-code-assist Bot Jun 2, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

		# Run auto-init once at import time (before user code or atexit runs)
		_auto_init_from_env()

	# Run auto-init once at import time (before user code or atexit runs)
	_auto_init_from_env()
	# Auto-init is run lazily on first tracker registration or dispatch

Conversation

tpx818 commented Jun 2, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

PR type

PR information

Experiment results

Uh oh!

gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

gemini-code-assist Bot Jun 2, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist Bot Jun 2, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist Bot Jun 2, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist Bot Jun 2, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist Bot Jun 2, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist Bot Jun 2, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

tpx818 commented Jun 2, 2026 •

edited

Loading