
fix: resolve evaluation metrics bugs in Classification_Transformers#229

Open
rixav77 wants to merge 1 commit into ML4SCI:main from rixav77:fix/classification-transformers-eval-metrics

Conversation


@rixav77 rixav77 commented May 7, 2026

Summary

Fixes #192

Three bugs in the evaluation pipeline of DeepLense_Classification_Transformers_Archil_Srivastava produce incorrect metrics:

Bug 1: micro_auroc always NaN

The micro_auroc list is initialized but never populated — np.mean([]) silently returns nan. Added the missing auroc_fn(..., average="micro") call.
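A minimal sketch of the failure mode and the shape of the fix. The variable names and the use of scikit-learn's `roc_auc_score` here are illustrative assumptions; the repository computes AUROC via its own `auroc_fn`, which this sketch only approximates.

```python
import warnings

import numpy as np
from sklearn.metrics import roc_auc_score

# The symptom: averaging an empty list yields NaN, with only a RuntimeWarning.
with warnings.catch_warnings():
    warnings.simplefilter("ignore", RuntimeWarning)
    assert np.isnan(np.mean([]))

# The fix, in spirit: actually populate the list before averaging.
# Toy one-hot labels and predicted probabilities (illustrative data).
y_true = np.array([[1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 0, 0]])
y_prob = np.array([[0.7, 0.2, 0.1],
                   [0.1, 0.8, 0.1],
                   [0.2, 0.2, 0.6],
                   [0.6, 0.3, 0.1]])

micro_auroc = []
micro_auroc.append(roc_auc_score(y_true, y_prob, average="micro"))
print(np.mean(micro_auroc))  # a real number now, not NaN
```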

Bug 2: Missing softmax dimension

torch.nn.functional.softmax(metrics["logits"]) on line 172 omits the dim argument, triggering a deprecation warning and potentially incorrect behavior. Line 158 already uses dim=-1 correctly — applied the same fix for consistency.
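The fix is a one-argument change; a self-contained sketch (the tensor values are made up for illustration):

```python
import torch
import torch.nn.functional as F

logits = torch.tensor([[2.0, 0.5, -1.0],
                       [0.1, 0.2, 0.3]])

# Calling F.softmax(logits) without dim= triggers a deprecation warning and
# leaves PyTorch to guess the reduction dimension. Making the class
# dimension explicit removes the ambiguity.
probs = F.softmax(logits, dim=-1)

print(probs.sum(dim=-1))  # each row now sums to 1.0
```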

Bug 3: Hardcoded W&B entity

entity="_archil" is hardcoded in both train.py and eval.py, causing authentication errors for other contributors. Replaced with a configurable --entity CLI argument that defaults to the WANDB_ENTITY environment variable (or None if unset, which lets W&B use the logged-in user's default entity).

Changes

  • eval.py: Add micro_auroc computation, add dim=-1 to softmax, add --entity arg
  • train.py: Add --entity arg, replace hardcoded entity

Test plan

  • Verify micro_auroc is no longer NaN after evaluation
  • Verify no deprecation warning from softmax call
  • Verify training works without --entity flag (defaults to logged-in W&B user)
  • Verify --entity my_team overrides correctly

Fixes ML4SCI#192

- Add missing micro_auroc computation (was initialized but never
  populated, causing np.mean([]) to silently return NaN)
- Add explicit dim=-1 to softmax call in ROC curve plotting to
  match the correct usage elsewhere and suppress deprecation warning
- Replace hardcoded W&B entity "_archil" with configurable --entity
  CLI arg that falls back to WANDB_ENTITY env var, allowing other
  contributors to use their own W&B accounts
Copilot AI review requested due to automatic review settings May 7, 2026 08:31

Copilot AI left a comment


Pull request overview

Fixes incorrect and unusable evaluation logging in DeepLense_Classification_Transformers_Archil_Srivastava by addressing a missing metric computation, a PyTorch softmax API misuse, and a hardcoded W&B configuration.

Changes:

  • Compute and log micro_auroc during evaluation (previously always NaN due to an empty list).
  • Specify dim=-1 in the ROC softmax call to avoid deprecated/ambiguous behavior.
  • Replace hardcoded W&B entity with a configurable --entity CLI argument (defaulting to WANDB_ENTITY).

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

| File | Description |
| --- | --- |
| DeepLense_Classification_Transformers_Archil_Srivastava/eval.py | Adds the missing micro_auroc computation, fixes softmax(..., dim=-1), and makes the W&B entity configurable for evaluation runs. |
| DeepLense_Classification_Transformers_Archil_Srivastava/train.py | Makes the W&B entity configurable for training runs (removes the hardcoded entity). |


```diff
@@ -84,6 +85,7 @@ def evaluate(model, data_loader, loss_fn, device):
     # Wandb-specific params
     parser.add_argument("--runid", type=str, help="ID of train run")
```


Development

Successfully merging this pull request may close these issues.

[Bug] DeepLense_Classification_Transformers: Evaluation metrics bugs — micro_auroc always NaN, missing softmax dim, hardcoded W&B entity

2 participants