Skip to content

add tracker to support swanlab#213

Closed
tpx818 wants to merge 8 commits into
modelscope:mainfrom
tpx818:tpx_tracker
Closed

add tracker to support swanlab#213
tpx818 wants to merge 8 commits into
modelscope:mainfrom
tpx818:tpx_tracker

Conversation

@tpx818
Copy link
Copy Markdown
Collaborator

@tpx818 tpx818 commented Jun 2, 2026

PR type

  • Bug Fix
  • New Feature
  • Document Updates
  • More Models or Datasets Support

PR information

add tracker submodule.

Experiment results

Paste your experiment result here(if needed).

tpx818 and others added 3 commits May 12, 2026 14:27
Cover SwanLabTracker behavioral tests (23 cases) and dispatch system tests (38 cases): register_tracker, dispatch, dispatch_hyperparams, clear_trackers, set_rank, auto_init_from_env.

Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent)

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
@tpx818 tpx818 changed the title ad add tracker to support swanlab Jun 2, 2026
Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces a unified experiment tracking dispatch system for Twinkle training metrics, adding support for SwanLab and Weights & Biases (W&B) trackers along with environment-variable auto-initialization. The review feedback highlights several critical issues and improvement opportunities: a potential TypeError in WandbTracker when custom settings are passed, distributed training issues caused by initializing trackers at import time rather than lazily, a lack of idempotency in dispatch_hyperparams when adapter_name is None (which would flood tracking servers), and a silent failure in SwanLabTracker if the output directory does not exist. Additionally, suggestions were made to fully support lazy initialization in list_trackers and dispatch, and to correct the type hint for raw metrics.

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.

Comment thread src/twinkle/tracker/wandb.py Outdated
Comment on lines +33 to +36
settings = None
proxy = kwargs.pop("wandb_proxy", None) or os.environ.get("WANDB_PROXY")
if proxy:
settings = wandb.Settings(https_proxy=proxy)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

If settings is passed in kwargs when instantiating WandbTracker, it will cause a TypeError because settings is also explicitly passed as a keyword argument to wandb.init (line 43) without being popped from kwargs first. Popping settings from kwargs beforehand resolves this issue and allows users to safely pass custom configurations.

Suggested change
settings = None
proxy = kwargs.pop("wandb_proxy", None) or os.environ.get("WANDB_PROXY")
if proxy:
settings = wandb.Settings(https_proxy=proxy)
settings = kwargs.pop("settings", None)
proxy = kwargs.pop("wandb_proxy", None) or os.environ.get("WANDB_PROXY")
if proxy:
settings = wandb.Settings(https_proxy=proxy)

Comment thread src/twinkle/tracker/__init__.py Outdated
Comment on lines +194 to +195
# Run auto-init once at import time (before user code or atexit runs)
_auto_init_from_env()
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

Initializing trackers at import time is problematic for distributed training. When multiple processes (e.g., in DDP or Ray) import twinkle.tracker, they will all trigger _auto_init_from_env() simultaneously. Since set_rank has not been called yet, _rank is 0 on all processes, causing every rank to initialize its own tracking run. Removing the import-time call and running auto-initialization lazily ensures it only runs on rank 0 after the distributed environment is set up.

Suggested change
# Run auto-init once at import time (before user code or atexit runs)
_auto_init_from_env()
# Auto-init is run lazily on first tracker registration or dispatch

Comment thread src/twinkle/tracker/__init__.py Outdated
Comment on lines +112 to +133
def dispatch_hyperparams(params: Dict[str, Any], adapter_name: Optional[str] = None) -> None:
"""Send hyperparameters to all registered trackers (call once at training start).

Idempotent per ``(adapter_name,)`` — repeated calls with the same
*adapter_name* are silently ignored so that this can safely be called
from ``calculate_metrics`` on its first invocation without
flooding trackers with redundant config updates.

Args:
params: Flat or nested dict of hyperparameters (e.g. model config,
training args, LoRA config).
adapter_name: Optional adapter identifier. If omitted, the params
are dispatched unconditionally on every call.
"""
if not _trackers or _rank != 0:
return

# Idempotency guard: only dispatch once per adapter
if adapter_name is not None:
if adapter_name in _hparams_dispatched:
return
_hparams_dispatched.add(adapter_name)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

This function has two issues:

  1. It needs to call _auto_init_from_env() to support lazy initialization.
  2. If adapter_name is None (which is the default for full fine-tuning without LoRA), the idempotency guard is skipped entirely. This causes dispatch_hyperparams to flood tracking servers with redundant config updates on every single training step where calculate_metrics is called. Using a default key like "_default_" when adapter_name is None prevents this flooding.
def dispatch_hyperparams(params: Dict[str, Any], adapter_name: Optional[str] = None) -> None:
    """Send hyperparameters to all registered trackers (call once at training start).

    Idempotent per ``(adapter_name,)`` — repeated calls with the same
    *adapter_name* are silently ignored so that this can safely be called
    from ``calculate_metrics`` on its first invocation without
    flooding trackers with redundant config updates.

    Args:
        params: Flat or nested dict of hyperparameters (e.g. model config,
            training args, LoRA config).
        adapter_name: Optional adapter identifier.  If omitted, the params
            are dispatched unconditionally on every call.
    """
    _auto_init_from_env()
    if not _trackers or _rank != 0:
        return

    # Idempotency guard: only dispatch once per adapter (or once globally if None)
    guard_key = adapter_name if adapter_name is not None else "_default_"
    if guard_key in _hparams_dispatched:
        return
    _hparams_dispatched.add(guard_key)

Comment on lines +69 to +75
def _save_experiment_info(self, output_dir: str) -> None:
try:
info = {"swanlab_experiment_url": self._run.get_run().url}
out = Path(output_dir) / "swanlab_config.json"
out.write_text(json.dumps(info, indent=2))
except Exception:
pass
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

If the specified output_dir does not exist, out.write_text will raise a FileNotFoundError because Python does not automatically create parent directories. Although this is wrapped in a try...except block, it results in a silent failure where the experiment info is not saved. Creating the parent directories first ensures robustness.

Suggested change
def _save_experiment_info(self, output_dir: str) -> None:
try:
info = {"swanlab_experiment_url": self._run.get_run().url}
out = Path(output_dir) / "swanlab_config.json"
out.write_text(json.dumps(info, indent=2))
except Exception:
pass
def _save_experiment_info(self, output_dir: str) -> None:
try:
info = {"swanlab_experiment_url": self._run.get_run().url}
out = Path(output_dir) / "swanlab_config.json"
out.parent.mkdir(parents=True, exist_ok=True)
out.write_text(json.dumps(info, indent=2))
except Exception:
pass

Comment thread src/twinkle/tracker/__init__.py Outdated
Comment on lines +62 to +64
def list_trackers() -> List[ExperimentTracker]:
"""Return a snapshot of currently registered trackers."""
return list(_trackers)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

To support lazy initialization of trackers from environment variables, _auto_init_from_env() should be called when listing trackers.

Suggested change
def list_trackers() -> List[ExperimentTracker]:
"""Return a snapshot of currently registered trackers."""
return list(_trackers)
def list_trackers() -> List[ExperimentTracker]:
"""Return a snapshot of currently registered trackers."""
_auto_init_from_env()
return list(_trackers)

Comment thread src/twinkle/tracker/__init__.py Outdated
Comment on lines +85 to +97
def dispatch(data: Dict[str, float], step: int) -> None:
"""Send computed metrics to all registered trackers.

Metric values are normalized to ``float`` via :func:`clean_metrics`
before dispatching. Only the rank-0 process performs the dispatch;
all other ranks return immediately with no overhead.

Args:
data: Raw metric dict (may contain strings, ints, floats).
step: Current training step (``cur_step`` from optimizer group).
"""
if not _trackers:
return
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

To support lazy initialization of trackers, _auto_init_from_env() should be called inside dispatch. Additionally, the type hint for data should be Dict[str, Any] since raw metrics can contain strings, ints, or floats before being normalized by clean_metrics.

Suggested change
def dispatch(data: Dict[str, float], step: int) -> None:
"""Send computed metrics to all registered trackers.
Metric values are normalized to ``float`` via :func:`clean_metrics`
before dispatching. Only the rank-0 process performs the dispatch;
all other ranks return immediately with no overhead.
Args:
data: Raw metric dict (may contain strings, ints, floats).
step: Current training step (``cur_step`` from optimizer group).
"""
if not _trackers:
return
def dispatch(data: Dict[str, Any], step: int) -> None:
"""Send computed metrics to all registered trackers.
Metric values are normalized to ``float`` via :func:`clean_metrics`
before dispatching. Only the rank-0 process performs the dispatch;
all other ranks return immediately with no overhead.
Args:
data: Raw metric dict (may contain strings, ints, floats).
step: Current training step (``cur_step`` from optimizer group).
"""
_auto_init_from_env()
if not _trackers:
return

tpx and others added 2 commits June 2, 2026 21:07
Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent)

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
Ultraworked with Sisyphus

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
@tpx818 tpx818 marked this pull request as draft June 2, 2026 14:06
Ultraworked with Sisyphus

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
tpx and others added 2 commits June 2, 2026 22:14
…otency, settings kwarg, mkdir

Ultraworked with Sisyphus

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
Ultraworked with Sisyphus

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
@tpx818 tpx818 closed this Jun 2, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant