feat(evaluator): add metric model resolvers #38

Merged
SandyChapman merged 1 commit into main from 4050-evaluator-metric-resolvers/schapman
May 29, 2026

Conversation

@SandyChapman
Contributor

@SandyChapman SandyChapman commented May 25, 2026

Summary

  • Add SDK resolver protocols plus local secret/model resolvers for evaluator metric reference hydration.
  • Add model-ref discovery/resolution helpers and wire LLM judge/RAGAS metrics through backend/plugin resolution.
  • Add platform model resolution for plugin execution and preserve service prompt-default compatibility while deferring SDK defaults until runtime prep.
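A minimal sketch of what a resolver protocol with a local, in-process implementation might look like. Every name here (`ModelRef`, `ResolvedModel`, `ModelResolver`, `LocalModelResolver`, `hydrate`) is a hypothetical stand-in chosen for illustration, not the SDK's actual API, and the real resolvers may well be asynchronous.

```python
from dataclasses import dataclass
from typing import Protocol, Union


@dataclass(frozen=True)
class ModelRef:
    """Hypothetical stand-in: a 'workspace/name' pointer to a model."""
    ref: str


@dataclass(frozen=True)
class ResolvedModel:
    """Hypothetical stand-in for a concrete, callable model endpoint."""
    url: str
    name: str


class ModelResolver(Protocol):
    """Anything that can turn a ModelRef into a concrete model."""

    def resolve_model(self, ref: ModelRef) -> ResolvedModel: ...


class LocalModelResolver:
    """In-process resolver backed by a plain dict (illustrative only)."""

    def __init__(self, models: dict) -> None:
        self._models = dict(models)

    def resolve_model(self, ref: ModelRef) -> ResolvedModel:
        try:
            return self._models[ref.ref]
        except KeyError:
            raise LookupError(f"No local model registered for {ref.ref!r}") from None


def hydrate(model: Union[ModelRef, ResolvedModel], resolver: ModelResolver) -> ResolvedModel:
    """Replace a reference with its resolved value; pass concrete models through."""
    if isinstance(model, ModelRef):
        return resolver.resolve_model(model)
    return model
```

Because `ModelResolver` is a structural protocol, a platform-backed resolver could be swapped in for `LocalModelResolver` without the metric code changing, which appears to be the point of splitting protocols from implementations.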

Verification

  • uv run --frozen ruff check packages/nemo_evaluator_sdk/src/nemo_evaluator_sdk/metrics/resolution.py packages/nemo_evaluator_sdk/src/nemo_evaluator_sdk/metrics/llm_judge_defaults.py packages/nemo_evaluator_sdk/src/nemo_evaluator_sdk/values/metrics.py packages/nemo_evaluator_sdk/src/nemo_evaluator_sdk/metrics/llm_judge.py
  • uv run --frozen ty check packages/nemo_evaluator_sdk/src/nemo_evaluator_sdk/metrics/resolution.py packages/nemo_evaluator_sdk/src/nemo_evaluator_sdk/metrics/llm_judge_defaults.py packages/nemo_evaluator_sdk/src/nemo_evaluator_sdk/values/metrics.py packages/nemo_evaluator_sdk/src/nemo_evaluator_sdk/metrics/llm_judge.py plugins/nemo-evaluator/src/nemo_evaluator/jobs/evaluate.py services/evaluator/src/nmp/evaluator/app/metrics/metric.py services/evaluator/src/nmp/evaluator/api/v2/metrics/schemas/metrics.py services/evaluator/src/nmp/evaluator/entities/metrics.py
  • uv run --frozen pytest packages/nemo_evaluator_sdk/tests plugins/nemo-evaluator/tests services/evaluator/tests -q

Draft until review.

Summary by CodeRabbit

  • New Features

    • Metrics can reference validated ModelRef values; local and platform resolvers resolve models and secrets.
    • Local in-process resolvers and local backend support resolver-driven execution; persistent prompt_template exposed in APIs.
    • Default, model-aware LLM judge prompt templates supplied.
  • Refactor

    • Unified metric preparation pipeline: applies job params, resolves models/secrets, runs preflight, and isolates metrics before execution.
    • LLM-judge/RAGAS defer model/secret-dependent setup until resolution.
  • Tests

    • Expanded tests for resolver, resolution, and job-time validation flows.
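The preparation pipeline summarized above can be sketched roughly as follows. This is an illustration under stated assumptions, not the PR's code: `ToyMetric`, `prepare_metric`, the `ref:` prefix convention, and the reading of "isolates" as "deep-copies so runs share no mutable state" are all hypothetical.

```python
import copy
from dataclasses import dataclass, field, replace
from typing import Callable


@dataclass
class ToyMetric:
    """Hypothetical metric whose model and api_key start out as 'ref:...' strings."""
    model: str
    api_key: str
    params: dict = field(default_factory=dict)

    def preflight(self) -> None:
        # Fail before execution if anything is still an unresolved reference.
        for name in ("model", "api_key"):
            if getattr(self, name).startswith("ref:"):
                raise ValueError(f"{name} is still unresolved")


def prepare_metric(
    metric: ToyMetric,
    job_params: dict,
    resolve_model: Callable[[str], str],
    resolve_secret: Callable[[str], str],
) -> ToyMetric:
    # 1. Apply job params (job-level values override the metric's own).
    staged = replace(metric, params={**metric.params, **job_params})
    # 2. Resolve model and secret references into concrete values.
    staged = replace(
        staged,
        model=resolve_model(staged.model),
        api_key=resolve_secret(staged.api_key),
    )
    # 3. Run preflight validation on the fully resolved metric.
    staged.preflight()
    # 4. Isolate: hand back a deep copy so execution cannot touch the caller's object.
    return copy.deepcopy(staged)
```

The caller's metric is left unchanged at every step, so the same definition can be prepared again for another job with different parameters.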

Review Change Stack

Comment thread packages/nemo_evaluator_sdk/src/nemo_evaluator_sdk/values/models.py Outdated
@github-actions
Contributor

github-actions Bot commented May 25, 2026

| Suite | Lines Covered | Line Rate | Branch Rate |
| --- | --- | --- | --- |
| Unit Tests | 18416/24382 | 75.5% | 61.9% |
| Integration Tests | 11764/23159 | 50.8% | 25.9% |

@SandyChapman SandyChapman force-pushed the 4050-evaluator-metric-resolvers/schapman branch from 81c2109 to 4910652 on May 26, 2026 12:19
@SandyChapman SandyChapman changed the title from "Add evaluator metric model resolvers" to "feat(evaluator): add metric model resolvers" on May 26, 2026
@SandyChapman SandyChapman marked this pull request as ready for review on May 26, 2026 12:23
@SandyChapman SandyChapman requested review from a team as code owners on May 26, 2026 12:23
@SandyChapman SandyChapman force-pushed the 4050-evaluator-metric-resolvers/schapman branch 2 times, most recently from 2e18d07 to 2e715c5, on May 26, 2026 12:53
@SandyChapman SandyChapman force-pushed the 4050-evaluator-metric-resolvers/schapman branch from 2e715c5 to b34abef on May 28, 2026 14:13
@coderabbitai

coderabbitai Bot commented May 28, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 47.71% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | Title accurately summarizes the main change: adding metric model resolvers to support evaluator metric reference hydration. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |


@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@packages/nemo_evaluator_sdk/src/nemo_evaluator_sdk/values/metrics.py`:
- Around line 265-270: input_schema() currently calls
default_judge_prompt_template_for_model(self.model) without the job_type,
causing offline/online differences; update the call in input_schema (in
class/method input_schema) to pass the execution mode by calling
default_judge_prompt_template_for_model(self.model, self.job_type) (or the
appropriate attribute/enum for job type on the instance) so the default
prompt-template schema inference and subsequent validation use the correct
job_type-aware defaults.

In `@plugins/nemo-evaluator/src/nemo_evaluator/jobs/evaluate.py`:
- Around line 201-205: The code silently skips model resolution when both
async_sdk and sdk are None, causing a later misleading LocalBackend error; add a
fast-fail before calling resolve_run_dataset: if platform_sdk is None and
spec.metric indicates a ModelRef (check spec.metric or its type used by
self._resolve_metric_models/PlatformModelResolver), raise a clear exception
(e.g., ValueError or a domain-specific error) explaining that a platform SDK is
required to resolve ModelRef metrics; otherwise keep the existing
run_sync(self._resolve_metric_models(...)) path when platform_sdk is present.

In `@services/evaluator/src/nmp/evaluator/api/v2/metrics/manager.py`:
- Around line 196-203: The create() path currently only gathers model-related
secrets via _append_model_secret and misses validating entities.RemoteMetric /
entities.NemoAgentToolkitRemoteMetric api_key_secret, so update create() in
manager.py to also collect and validate metric.api_key_secret (use the same
SecretRef/ApiSecretRef handling as for models), e.g. call the existing
secret-append helper (or add a small _append_api_key_secret) for
entities.RemoteMetric and entities.NemoAgentToolkitRemoteMetric before the final
secrets loop so invalid api_key_secret refs cannot be persisted.
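As one illustration of the findings above, the fast-fail suggested for `evaluate.py` could look roughly like the sketch below. `ModelRef`, `ToyMetric`, and `ensure_model_refs_resolvable` are hypothetical names, and the real check would inspect `spec.metric` in whatever way the plugin actually models it.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass(frozen=True)
class ModelRef:
    """Hypothetical stand-in for a platform model reference."""
    ref: str


@dataclass
class ToyMetric:
    """Hypothetical metric: 'model' is either a ModelRef or an inline config."""
    model: Any


def ensure_model_refs_resolvable(metric: ToyMetric, platform_sdk: Optional[object]) -> None:
    """Raise early if the metric needs platform resolution but no SDK is available."""
    if platform_sdk is None and isinstance(metric.model, ModelRef):
        raise ValueError(
            f"Metric references model {metric.model.ref!r}, but no platform SDK "
            "is available to resolve it; pass an SDK or use an inline model."
        )
```

Calling this before dataset resolution surfaces the actual cause, rather than a later and less direct error from the local backend.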

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: d3afb3e8-5267-4050-8582-8e9300640c4e

📥 Commits

Reviewing files that changed from the base of the PR and between 0157981 and b34abef.

⛔ Files ignored due to path filters (15)
  • sdk/python/nemo-platform/src/nemo_platform/beta/evaluator/__init__.py is excluded by !sdk/**
  • sdk/python/nemo-platform/src/nemo_platform/beta/evaluator/execution/backends/local/backend.py is excluded by !sdk/**
  • sdk/python/nemo-platform/src/nemo_platform/beta/evaluator/execution/metric_execution.py is excluded by !sdk/**
  • sdk/python/nemo-platform/src/nemo_platform/beta/evaluator/execution/utils.py is excluded by !sdk/**
  • sdk/python/nemo-platform/src/nemo_platform/beta/evaluator/metrics/llm_judge.py is excluded by !sdk/**
  • sdk/python/nemo-platform/src/nemo_platform/beta/evaluator/metrics/llm_judge_defaults.py is excluded by !sdk/**
  • sdk/python/nemo-platform/src/nemo_platform/beta/evaluator/metrics/protocol.py is excluded by !sdk/**
  • sdk/python/nemo-platform/src/nemo_platform/beta/evaluator/metrics/ragas/base.py is excluded by !sdk/**
  • sdk/python/nemo-platform/src/nemo_platform/beta/evaluator/metrics/remote.py is excluded by !sdk/**
  • sdk/python/nemo-platform/src/nemo_platform/beta/evaluator/metrics/resolution.py is excluded by !sdk/**
  • sdk/python/nemo-platform/src/nemo_platform/beta/evaluator/resolver_protocols.py is excluded by !sdk/**
  • sdk/python/nemo-platform/src/nemo_platform/beta/evaluator/resolvers.py is excluded by !sdk/**
  • sdk/python/nemo-platform/src/nemo_platform/beta/evaluator/values/__init__.py is excluded by !sdk/**
  • sdk/python/nemo-platform/src/nemo_platform/beta/evaluator/values/metrics.py is excluded by !sdk/**
  • sdk/python/nemo-platform/src/nemo_platform/beta/evaluator/values/models.py is excluded by !sdk/**
📒 Files selected for processing (28)
  • packages/nemo_evaluator_sdk/src/nemo_evaluator_sdk/__init__.py
  • packages/nemo_evaluator_sdk/src/nemo_evaluator_sdk/execution/backends/local/backend.py
  • packages/nemo_evaluator_sdk/src/nemo_evaluator_sdk/execution/metric_execution.py
  • packages/nemo_evaluator_sdk/src/nemo_evaluator_sdk/execution/utils.py
  • packages/nemo_evaluator_sdk/src/nemo_evaluator_sdk/metrics/llm_judge.py
  • packages/nemo_evaluator_sdk/src/nemo_evaluator_sdk/metrics/llm_judge_defaults.py
  • packages/nemo_evaluator_sdk/src/nemo_evaluator_sdk/metrics/protocol.py
  • packages/nemo_evaluator_sdk/src/nemo_evaluator_sdk/metrics/ragas/base.py
  • packages/nemo_evaluator_sdk/src/nemo_evaluator_sdk/metrics/remote.py
  • packages/nemo_evaluator_sdk/src/nemo_evaluator_sdk/metrics/resolution.py
  • packages/nemo_evaluator_sdk/src/nemo_evaluator_sdk/resolver_protocols.py
  • packages/nemo_evaluator_sdk/src/nemo_evaluator_sdk/resolvers.py
  • packages/nemo_evaluator_sdk/src/nemo_evaluator_sdk/values/__init__.py
  • packages/nemo_evaluator_sdk/src/nemo_evaluator_sdk/values/metrics.py
  • packages/nemo_evaluator_sdk/src/nemo_evaluator_sdk/values/models.py
  • packages/nemo_evaluator_sdk/tests/execution/test_metric_execution.py
  • packages/nemo_evaluator_sdk/tests/execution/test_resolvers.py
  • packages/nemo_evaluator_sdk/tests/metrics/ragas/test_ragas.py
  • packages/nemo_evaluator_sdk/tests/metrics/test_llm_judge.py
  • packages/nemo_evaluator_sdk/tests/metrics/test_remote.py
  • plugins/nemo-evaluator/src/nemo_evaluator/jobs/evaluate.py
  • plugins/nemo-evaluator/src/nemo_evaluator/resolvers.py
  • plugins/nemo-evaluator/tests/test_evaluate_job.py
  • services/evaluator/src/nmp/evaluator/api/v2/metrics/manager.py
  • services/evaluator/src/nmp/evaluator/api/v2/metrics/schemas/metrics.py
  • services/evaluator/src/nmp/evaluator/app/metrics/metric.py
  • services/evaluator/src/nmp/evaluator/entities/metrics.py
  • services/evaluator/tests/app/metrics/test_metric_factory.py

Comment thread plugins/nemo-evaluator/src/nemo_evaluator/jobs/evaluate.py
Comment thread services/evaluator/src/nmp/evaluator/api/v2/metrics/manager.py
@SandyChapman SandyChapman force-pushed the 4050-evaluator-metric-resolvers/schapman branch 2 times, most recently from ea313db to 210323a, on May 28, 2026 15:09
@SandyChapman SandyChapman force-pushed the 4050-evaluator-metric-resolvers/schapman branch from 210323a to 2315857 on May 28, 2026 15:22
@coderabbitai

coderabbitai Bot commented May 28, 2026

Actionable comments posted: 0

@SandyChapman SandyChapman force-pushed the 4050-evaluator-metric-resolvers/schapman branch from 2315857 to d851e48 on May 28, 2026 15:42
@coderabbitai

coderabbitai Bot commented May 28, 2026

Actionable comments posted: 0

Comment thread packages/nemo_evaluator_sdk/src/nemo_evaluator_sdk/metrics/llm_judge.py Outdated
Comment thread packages/nemo_evaluator_sdk/src/nemo_evaluator_sdk/metrics/protocol.py Outdated
Comment thread plugins/nemo-evaluator/src/nemo_evaluator/resolvers.py Outdated
Comment thread plugins/nemo-evaluator/src/nemo_evaluator/resolvers.py
Comment thread plugins/nemo-evaluator/src/nemo_evaluator/resolvers.py Outdated
Comment thread plugins/nemo-evaluator/src/nemo_evaluator/jobs/evaluate.py
@SandyChapman SandyChapman force-pushed the 4050-evaluator-metric-resolvers/schapman branch from d851e48 to b2d0f26 on May 28, 2026 18:25
Comment thread plugins/nemo-evaluator/src/nemo_evaluator/resolvers.py Dismissed

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@packages/nemo_evaluator_sdk/src/nemo_evaluator_sdk/metrics/llm_judge.py`:
- Around line 191-210: resolve_models() updates the metric's model but doesn't
clear cached authentication/client, so old _client and _api_key can be reused;
after calling resolve_model_refs(self, model_resolver) (or right before
returning from resolve_models) set self._client = None and self._api_key = None
to force reinitialization via resolve_secrets(), and also ensure __deepcopy__
does not preserve the live _client/_api_key (either omit them from the copied
state or set them to None) so copies don't retain stale connections.
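The stale-cache concern above can be illustrated with a toy class. This is a sketch only: `ToyJudgeMetric`, its dict-based resolvers, and the tuple standing in for a client are hypothetical, and only the attribute names `_client` and `_api_key` come from the review comment.

```python
import copy


class ToyJudgeMetric:
    """Hypothetical metric that caches a client built from a resolved secret."""

    def __init__(self, model: str) -> None:
        self.model = model
        self._client = None
        self._api_key = None

    def resolve_secrets(self, secrets: dict) -> None:
        # Build (and cache) credentials for the current model.
        self._api_key = secrets[self.model]
        self._client = ("client-for", self.model, self._api_key)

    def resolve_models(self, resolver: dict) -> None:
        self.model = resolver.get(self.model, self.model)
        # The model may have changed, so drop anything derived from the old one.
        self._client = None
        self._api_key = None

    def __deepcopy__(self, memo: dict) -> "ToyJudgeMetric":
        clone = self.__class__.__new__(self.__class__)
        memo[id(self)] = clone
        for key, value in self.__dict__.items():
            if key in ("_client", "_api_key"):
                # Never carry live connections or credentials into a copy.
                setattr(clone, key, None)
            else:
                setattr(clone, key, copy.deepcopy(value, memo))
        return clone
```

With both pieces in place, a copy or a re-resolved metric has to go back through `resolve_secrets()` before it can run, so it cannot silently reuse credentials that belonged to a different model.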

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 69946b1a-05c4-4563-a8ac-7a536fd7cc55

📥 Commits

Reviewing files that changed from the base of the PR and between d851e48 and b2d0f26.

📒 Files selected for processing (29)
  • packages/nemo_evaluator_sdk/src/nemo_evaluator_sdk/__init__.py
  • packages/nemo_evaluator_sdk/src/nemo_evaluator_sdk/execution/backends/local/backend.py
  • packages/nemo_evaluator_sdk/src/nemo_evaluator_sdk/execution/metric_execution.py
  • packages/nemo_evaluator_sdk/src/nemo_evaluator_sdk/execution/utils.py
  • packages/nemo_evaluator_sdk/src/nemo_evaluator_sdk/metrics/llm_judge.py
  • packages/nemo_evaluator_sdk/src/nemo_evaluator_sdk/metrics/llm_judge_defaults.py
  • packages/nemo_evaluator_sdk/src/nemo_evaluator_sdk/metrics/protocol.py
  • packages/nemo_evaluator_sdk/src/nemo_evaluator_sdk/metrics/ragas/base.py
  • packages/nemo_evaluator_sdk/src/nemo_evaluator_sdk/metrics/remote.py
  • packages/nemo_evaluator_sdk/src/nemo_evaluator_sdk/metrics/resolution.py
  • packages/nemo_evaluator_sdk/src/nemo_evaluator_sdk/resolver_protocols.py
  • packages/nemo_evaluator_sdk/src/nemo_evaluator_sdk/resolvers.py
  • packages/nemo_evaluator_sdk/src/nemo_evaluator_sdk/values/__init__.py
  • packages/nemo_evaluator_sdk/src/nemo_evaluator_sdk/values/metrics.py
  • packages/nemo_evaluator_sdk/src/nemo_evaluator_sdk/values/models.py
  • packages/nemo_evaluator_sdk/tests/execution/test_metric_execution.py
  • packages/nemo_evaluator_sdk/tests/execution/test_resolvers.py
  • packages/nemo_evaluator_sdk/tests/metrics/ragas/test_ragas.py
  • packages/nemo_evaluator_sdk/tests/metrics/test_llm_judge.py
  • packages/nemo_evaluator_sdk/tests/metrics/test_remote.py
  • plugins/nemo-evaluator/src/nemo_evaluator/jobs/evaluate.py
  • plugins/nemo-evaluator/src/nemo_evaluator/resolvers.py
  • plugins/nemo-evaluator/tests/test_evaluate_job.py
  • services/evaluator/src/nmp/evaluator/api/v2/metrics/manager.py
  • services/evaluator/src/nmp/evaluator/api/v2/metrics/schemas/metrics.py
  • services/evaluator/src/nmp/evaluator/app/metrics/metric.py
  • services/evaluator/src/nmp/evaluator/entities/metrics.py
  • services/evaluator/tests/app/metrics/test_metric_factory.py
  • services/evaluator/tests/nmp/evaluator/api/v2/metrics/test_service.py
🚧 Files skipped from review as they are similar to previous changes (25)
  • packages/nemo_evaluator_sdk/src/nemo_evaluator_sdk/values/__init__.py
  • services/evaluator/src/nmp/evaluator/entities/metrics.py
  • services/evaluator/src/nmp/evaluator/api/v2/metrics/schemas/metrics.py
  • packages/nemo_evaluator_sdk/src/nemo_evaluator_sdk/metrics/remote.py
  • services/evaluator/src/nmp/evaluator/app/metrics/metric.py
  • packages/nemo_evaluator_sdk/src/nemo_evaluator_sdk/execution/backends/local/backend.py
  • packages/nemo_evaluator_sdk/src/nemo_evaluator_sdk/metrics/protocol.py
  • packages/nemo_evaluator_sdk/tests/metrics/ragas/test_ragas.py
  • packages/nemo_evaluator_sdk/src/nemo_evaluator_sdk/values/models.py
  • packages/nemo_evaluator_sdk/src/nemo_evaluator_sdk/resolver_protocols.py
  • packages/nemo_evaluator_sdk/tests/metrics/test_remote.py
  • services/evaluator/tests/nmp/evaluator/api/v2/metrics/test_service.py
  • packages/nemo_evaluator_sdk/tests/metrics/test_llm_judge.py
  • packages/nemo_evaluator_sdk/src/nemo_evaluator_sdk/__init__.py
  • packages/nemo_evaluator_sdk/src/nemo_evaluator_sdk/resolvers.py
  • packages/nemo_evaluator_sdk/src/nemo_evaluator_sdk/execution/metric_execution.py
  • packages/nemo_evaluator_sdk/src/nemo_evaluator_sdk/metrics/llm_judge_defaults.py
  • plugins/nemo-evaluator/src/nemo_evaluator/resolvers.py
  • packages/nemo_evaluator_sdk/src/nemo_evaluator_sdk/values/metrics.py
  • packages/nemo_evaluator_sdk/src/nemo_evaluator_sdk/metrics/ragas/base.py
  • packages/nemo_evaluator_sdk/src/nemo_evaluator_sdk/metrics/resolution.py
  • plugins/nemo-evaluator/tests/test_evaluate_job.py
  • packages/nemo_evaluator_sdk/src/nemo_evaluator_sdk/execution/utils.py
  • plugins/nemo-evaluator/src/nemo_evaluator/jobs/evaluate.py
  • packages/nemo_evaluator_sdk/tests/execution/test_metric_execution.py

@SandyChapman SandyChapman force-pushed the 4050-evaluator-metric-resolvers/schapman branch from b2d0f26 to 54c0193 on May 28, 2026 19:26

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@plugins/nemo-evaluator/src/nemo_evaluator/resolvers.py`:
- Around line 23-27: The function _parse_required_workspace_name currently
allows refs like "workspace/name/extra"; enforce exactly one path separator by
validating that ref contains exactly one '/' (e.g., use ref.count("/") == 1 or
ensure no additional '/' in the name after partition), and raise the same
ValueError(f"{label} must be in format '{expected_format}'") when the check
fails; update the condition that currently checks separator, workspace, and name
to also reject any extra '/' so only "workspace/name" is accepted.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: b56f21bd-8317-41f7-b0de-b858022ab0ac

📥 Commits

Reviewing files that changed from the base of the PR and between b2d0f26 and 54c0193.

🚧 Files skipped from review as they are similar to previous changes (25)
  • services/evaluator/tests/app/metrics/test_metric_factory.py
  • packages/nemo_evaluator_sdk/src/nemo_evaluator_sdk/values/__init__.py
  • packages/nemo_evaluator_sdk/src/nemo_evaluator_sdk/resolver_protocols.py
  • services/evaluator/src/nmp/evaluator/entities/metrics.py
  • services/evaluator/src/nmp/evaluator/app/metrics/metric.py
  • packages/nemo_evaluator_sdk/tests/metrics/test_llm_judge.py
  • packages/nemo_evaluator_sdk/src/nemo_evaluator_sdk/metrics/resolution.py
  • packages/nemo_evaluator_sdk/src/nemo_evaluator_sdk/resolvers.py
  • services/evaluator/src/nmp/evaluator/api/v2/metrics/schemas/metrics.py
  • packages/nemo_evaluator_sdk/src/nemo_evaluator_sdk/execution/metric_execution.py
  • packages/nemo_evaluator_sdk/tests/metrics/ragas/test_ragas.py
  • packages/nemo_evaluator_sdk/src/nemo_evaluator_sdk/execution/backends/local/backend.py
  • packages/nemo_evaluator_sdk/src/nemo_evaluator_sdk/metrics/protocol.py
  • packages/nemo_evaluator_sdk/src/nemo_evaluator_sdk/__init__.py
  • services/evaluator/tests/nmp/evaluator/api/v2/metrics/test_service.py
  • plugins/nemo-evaluator/src/nemo_evaluator/jobs/evaluate.py
  • packages/nemo_evaluator_sdk/src/nemo_evaluator_sdk/values/models.py
  • services/evaluator/src/nmp/evaluator/api/v2/metrics/manager.py
  • packages/nemo_evaluator_sdk/tests/metrics/test_remote.py
  • packages/nemo_evaluator_sdk/src/nemo_evaluator_sdk/metrics/llm_judge_defaults.py
  • packages/nemo_evaluator_sdk/tests/execution/test_resolvers.py
  • packages/nemo_evaluator_sdk/src/nemo_evaluator_sdk/metrics/ragas/base.py
  • packages/nemo_evaluator_sdk/src/nemo_evaluator_sdk/values/metrics.py
  • packages/nemo_evaluator_sdk/src/nemo_evaluator_sdk/execution/utils.py
  • plugins/nemo-evaluator/tests/test_evaluate_job.py

Comment thread plugins/nemo-evaluator/src/nemo_evaluator/resolvers.py
Comment thread packages/nemo_evaluator_sdk/tests/metrics/test_llm_judge.py Dismissed
Comment thread packages/nemo_evaluator_sdk/tests/metrics/test_llm_judge.py Dismissed
Comment thread packages/nemo_evaluator_sdk/src/nemo_evaluator_sdk/values/metrics.py Outdated
@SandyChapman SandyChapman force-pushed the 4050-evaluator-metric-resolvers/schapman branch from 1845fa9 to 5145619 on May 29, 2026 12:17
Signed-off-by: Sandy Chapman <schapman@nvidia.com>
@SandyChapman SandyChapman force-pushed the 4050-evaluator-metric-resolvers/schapman branch from 5145619 to 9dcbc66 on May 29, 2026 12:22
@SandyChapman SandyChapman added this pull request to the merge queue May 29, 2026
Merged via the queue into main with commit a018dee May 29, 2026
22 checks passed
4 participants