feat(evaluator): add metric model resolvers#38
Conversation
|
81c2109 to
4910652
Compare
2e18d07 to
2e715c5
Compare
2e715c5 to
b34abef
Compare
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
📝 Walkthrough<review_stack_artifact> </review_stack_artifact> 🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
Actionable comments posted: 3
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@packages/nemo_evaluator_sdk/src/nemo_evaluator_sdk/values/metrics.py`:
- Around line 265-270: input_schema() currently calls
default_judge_prompt_template_for_model(self.model) without the job_type,
causing offline/online differences; update the call in input_schema (in
class/method input_schema) to pass the execution mode by calling
default_judge_prompt_template_for_model(self.model, self.job_type) (or the
appropriate attribute/enum for job type on the instance) so the default
prompt-template schema inference and subsequent validation use the correct
job_type-aware defaults.
In `@plugins/nemo-evaluator/src/nemo_evaluator/jobs/evaluate.py`:
- Around line 201-205: The code silently skips model resolution when both
async_sdk and sdk are None, causing a later misleading LocalBackend error; add a
fast-fail before calling resolve_run_dataset: if platform_sdk is None and
spec.metric indicates a ModelRef (check spec.metric or its type used by
self._resolve_metric_models/PlatformModelResolver), raise a clear exception
(e.g., ValueError or a domain-specific error) explaining that a platform SDK is
required to resolve ModelRef metrics; otherwise keep the existing
run_sync(self._resolve_metric_models(...)) path when platform_sdk is present.
In `@services/evaluator/src/nmp/evaluator/api/v2/metrics/manager.py`:
- Around line 196-203: The create() path currently only gathers model-related
secrets via _append_model_secret and misses validating entities.RemoteMetric /
entities.NemoAgentToolkitRemoteMetric api_key_secret, so update create() in
manager.py to also collect and validate metric.api_key_secret (use the same
SecretRef/ApiSecretRef handling as for models), e.g. call the existing
secret-append helper (or add a small _append_api_key_secret) for
entities.RemoteMetric and entities.NemoAgentToolkitRemoteMetric before the final
secrets loop so invalid api_key_secret refs cannot be persisted.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: d3afb3e8-5267-4050-8582-8e9300640c4e
⛔ Files ignored due to path filters (15)
sdk/python/nemo-platform/src/nemo_platform/beta/evaluator/__init__.pyis excluded by!sdk/**sdk/python/nemo-platform/src/nemo_platform/beta/evaluator/execution/backends/local/backend.pyis excluded by!sdk/**sdk/python/nemo-platform/src/nemo_platform/beta/evaluator/execution/metric_execution.pyis excluded by!sdk/**sdk/python/nemo-platform/src/nemo_platform/beta/evaluator/execution/utils.pyis excluded by!sdk/**sdk/python/nemo-platform/src/nemo_platform/beta/evaluator/metrics/llm_judge.pyis excluded by!sdk/**sdk/python/nemo-platform/src/nemo_platform/beta/evaluator/metrics/llm_judge_defaults.pyis excluded by!sdk/**sdk/python/nemo-platform/src/nemo_platform/beta/evaluator/metrics/protocol.pyis excluded by!sdk/**sdk/python/nemo-platform/src/nemo_platform/beta/evaluator/metrics/ragas/base.pyis excluded by!sdk/**sdk/python/nemo-platform/src/nemo_platform/beta/evaluator/metrics/remote.pyis excluded by!sdk/**sdk/python/nemo-platform/src/nemo_platform/beta/evaluator/metrics/resolution.pyis excluded by!sdk/**sdk/python/nemo-platform/src/nemo_platform/beta/evaluator/resolver_protocols.pyis excluded by!sdk/**sdk/python/nemo-platform/src/nemo_platform/beta/evaluator/resolvers.pyis excluded by!sdk/**sdk/python/nemo-platform/src/nemo_platform/beta/evaluator/values/__init__.pyis excluded by!sdk/**sdk/python/nemo-platform/src/nemo_platform/beta/evaluator/values/metrics.pyis excluded by!sdk/**sdk/python/nemo-platform/src/nemo_platform/beta/evaluator/values/models.pyis excluded by!sdk/**
📒 Files selected for processing (28)
packages/nemo_evaluator_sdk/src/nemo_evaluator_sdk/__init__.pypackages/nemo_evaluator_sdk/src/nemo_evaluator_sdk/execution/backends/local/backend.pypackages/nemo_evaluator_sdk/src/nemo_evaluator_sdk/execution/metric_execution.pypackages/nemo_evaluator_sdk/src/nemo_evaluator_sdk/execution/utils.pypackages/nemo_evaluator_sdk/src/nemo_evaluator_sdk/metrics/llm_judge.pypackages/nemo_evaluator_sdk/src/nemo_evaluator_sdk/metrics/llm_judge_defaults.pypackages/nemo_evaluator_sdk/src/nemo_evaluator_sdk/metrics/protocol.pypackages/nemo_evaluator_sdk/src/nemo_evaluator_sdk/metrics/ragas/base.pypackages/nemo_evaluator_sdk/src/nemo_evaluator_sdk/metrics/remote.pypackages/nemo_evaluator_sdk/src/nemo_evaluator_sdk/metrics/resolution.pypackages/nemo_evaluator_sdk/src/nemo_evaluator_sdk/resolver_protocols.pypackages/nemo_evaluator_sdk/src/nemo_evaluator_sdk/resolvers.pypackages/nemo_evaluator_sdk/src/nemo_evaluator_sdk/values/__init__.pypackages/nemo_evaluator_sdk/src/nemo_evaluator_sdk/values/metrics.pypackages/nemo_evaluator_sdk/src/nemo_evaluator_sdk/values/models.pypackages/nemo_evaluator_sdk/tests/execution/test_metric_execution.pypackages/nemo_evaluator_sdk/tests/execution/test_resolvers.pypackages/nemo_evaluator_sdk/tests/metrics/ragas/test_ragas.pypackages/nemo_evaluator_sdk/tests/metrics/test_llm_judge.pypackages/nemo_evaluator_sdk/tests/metrics/test_remote.pyplugins/nemo-evaluator/src/nemo_evaluator/jobs/evaluate.pyplugins/nemo-evaluator/src/nemo_evaluator/resolvers.pyplugins/nemo-evaluator/tests/test_evaluate_job.pyservices/evaluator/src/nmp/evaluator/api/v2/metrics/manager.pyservices/evaluator/src/nmp/evaluator/api/v2/metrics/schemas/metrics.pyservices/evaluator/src/nmp/evaluator/app/metrics/metric.pyservices/evaluator/src/nmp/evaluator/entities/metrics.pyservices/evaluator/tests/app/metrics/test_metric_factory.py
ea313db to
210323a
Compare
210323a to
2315857
Compare
|
Actionable comments posted: 0 |
2315857 to
d851e48
Compare
|
Actionable comments posted: 0 |
d851e48 to
b2d0f26
Compare
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@packages/nemo_evaluator_sdk/src/nemo_evaluator_sdk/metrics/llm_judge.py`:
- Around line 191-210: resolve_models() updates the metric's model but doesn't
clear cached authentication/client, so old _client and _api_key can be reused;
after calling resolve_model_refs(self, model_resolver) (or right before
returning from resolve_models) set self._client = None and self._api_key = None
to force reinitialization via resolve_secrets(), and also ensure __deepcopy__
does not preserve the live _client/_api_key (either omit them from the copied
state or set them to None) so copies don't retain stale connections.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: 69946b1a-05c4-4563-a8ac-7a536fd7cc55
⛔ Files ignored due to path filters (15)
sdk/python/nemo-platform/src/nemo_platform/beta/evaluator/__init__.pyis excluded by!sdk/**sdk/python/nemo-platform/src/nemo_platform/beta/evaluator/execution/backends/local/backend.pyis excluded by!sdk/**sdk/python/nemo-platform/src/nemo_platform/beta/evaluator/execution/metric_execution.pyis excluded by!sdk/**sdk/python/nemo-platform/src/nemo_platform/beta/evaluator/execution/utils.pyis excluded by!sdk/**sdk/python/nemo-platform/src/nemo_platform/beta/evaluator/metrics/llm_judge.pyis excluded by!sdk/**sdk/python/nemo-platform/src/nemo_platform/beta/evaluator/metrics/llm_judge_defaults.pyis excluded by!sdk/**sdk/python/nemo-platform/src/nemo_platform/beta/evaluator/metrics/protocol.pyis excluded by!sdk/**sdk/python/nemo-platform/src/nemo_platform/beta/evaluator/metrics/ragas/base.pyis excluded by!sdk/**sdk/python/nemo-platform/src/nemo_platform/beta/evaluator/metrics/remote.pyis excluded by!sdk/**sdk/python/nemo-platform/src/nemo_platform/beta/evaluator/metrics/resolution.pyis excluded by!sdk/**sdk/python/nemo-platform/src/nemo_platform/beta/evaluator/resolver_protocols.pyis excluded by!sdk/**sdk/python/nemo-platform/src/nemo_platform/beta/evaluator/resolvers.pyis excluded by!sdk/**sdk/python/nemo-platform/src/nemo_platform/beta/evaluator/values/__init__.pyis excluded by!sdk/**sdk/python/nemo-platform/src/nemo_platform/beta/evaluator/values/metrics.pyis excluded by!sdk/**sdk/python/nemo-platform/src/nemo_platform/beta/evaluator/values/models.pyis excluded by!sdk/**
📒 Files selected for processing (29)
packages/nemo_evaluator_sdk/src/nemo_evaluator_sdk/__init__.pypackages/nemo_evaluator_sdk/src/nemo_evaluator_sdk/execution/backends/local/backend.pypackages/nemo_evaluator_sdk/src/nemo_evaluator_sdk/execution/metric_execution.pypackages/nemo_evaluator_sdk/src/nemo_evaluator_sdk/execution/utils.pypackages/nemo_evaluator_sdk/src/nemo_evaluator_sdk/metrics/llm_judge.pypackages/nemo_evaluator_sdk/src/nemo_evaluator_sdk/metrics/llm_judge_defaults.pypackages/nemo_evaluator_sdk/src/nemo_evaluator_sdk/metrics/protocol.pypackages/nemo_evaluator_sdk/src/nemo_evaluator_sdk/metrics/ragas/base.pypackages/nemo_evaluator_sdk/src/nemo_evaluator_sdk/metrics/remote.pypackages/nemo_evaluator_sdk/src/nemo_evaluator_sdk/metrics/resolution.pypackages/nemo_evaluator_sdk/src/nemo_evaluator_sdk/resolver_protocols.pypackages/nemo_evaluator_sdk/src/nemo_evaluator_sdk/resolvers.pypackages/nemo_evaluator_sdk/src/nemo_evaluator_sdk/values/__init__.pypackages/nemo_evaluator_sdk/src/nemo_evaluator_sdk/values/metrics.pypackages/nemo_evaluator_sdk/src/nemo_evaluator_sdk/values/models.pypackages/nemo_evaluator_sdk/tests/execution/test_metric_execution.pypackages/nemo_evaluator_sdk/tests/execution/test_resolvers.pypackages/nemo_evaluator_sdk/tests/metrics/ragas/test_ragas.pypackages/nemo_evaluator_sdk/tests/metrics/test_llm_judge.pypackages/nemo_evaluator_sdk/tests/metrics/test_remote.pyplugins/nemo-evaluator/src/nemo_evaluator/jobs/evaluate.pyplugins/nemo-evaluator/src/nemo_evaluator/resolvers.pyplugins/nemo-evaluator/tests/test_evaluate_job.pyservices/evaluator/src/nmp/evaluator/api/v2/metrics/manager.pyservices/evaluator/src/nmp/evaluator/api/v2/metrics/schemas/metrics.pyservices/evaluator/src/nmp/evaluator/app/metrics/metric.pyservices/evaluator/src/nmp/evaluator/entities/metrics.pyservices/evaluator/tests/app/metrics/test_metric_factory.pyservices/evaluator/tests/nmp/evaluator/api/v2/metrics/test_service.py
🚧 Files skipped from review as they are similar to previous changes (25)
- packages/nemo_evaluator_sdk/src/nemo_evaluator_sdk/values/init.py
- services/evaluator/src/nmp/evaluator/entities/metrics.py
- services/evaluator/src/nmp/evaluator/api/v2/metrics/schemas/metrics.py
- packages/nemo_evaluator_sdk/src/nemo_evaluator_sdk/metrics/remote.py
- services/evaluator/src/nmp/evaluator/app/metrics/metric.py
- packages/nemo_evaluator_sdk/src/nemo_evaluator_sdk/execution/backends/local/backend.py
- packages/nemo_evaluator_sdk/src/nemo_evaluator_sdk/metrics/protocol.py
- packages/nemo_evaluator_sdk/tests/metrics/ragas/test_ragas.py
- packages/nemo_evaluator_sdk/src/nemo_evaluator_sdk/values/models.py
- packages/nemo_evaluator_sdk/src/nemo_evaluator_sdk/resolver_protocols.py
- packages/nemo_evaluator_sdk/tests/metrics/test_remote.py
- services/evaluator/tests/nmp/evaluator/api/v2/metrics/test_service.py
- packages/nemo_evaluator_sdk/tests/metrics/test_llm_judge.py
- packages/nemo_evaluator_sdk/src/nemo_evaluator_sdk/init.py
- packages/nemo_evaluator_sdk/src/nemo_evaluator_sdk/resolvers.py
- packages/nemo_evaluator_sdk/src/nemo_evaluator_sdk/execution/metric_execution.py
- packages/nemo_evaluator_sdk/src/nemo_evaluator_sdk/metrics/llm_judge_defaults.py
- plugins/nemo-evaluator/src/nemo_evaluator/resolvers.py
- packages/nemo_evaluator_sdk/src/nemo_evaluator_sdk/values/metrics.py
- packages/nemo_evaluator_sdk/src/nemo_evaluator_sdk/metrics/ragas/base.py
- packages/nemo_evaluator_sdk/src/nemo_evaluator_sdk/metrics/resolution.py
- plugins/nemo-evaluator/tests/test_evaluate_job.py
- packages/nemo_evaluator_sdk/src/nemo_evaluator_sdk/execution/utils.py
- plugins/nemo-evaluator/src/nemo_evaluator/jobs/evaluate.py
- packages/nemo_evaluator_sdk/tests/execution/test_metric_execution.py
b2d0f26 to
54c0193
Compare
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@plugins/nemo-evaluator/src/nemo_evaluator/resolvers.py`:
- Around line 23-27: The function _parse_required_workspace_name currently
allows refs like "workspace/name/extra"; enforce exactly one path separator by
validating that ref contains exactly one '/' (e.g., use ref.count("/") == 1 or
ensure no additional '/' in the name after partition), and raise the same
ValueError(f"{label} must be in format '{expected_format}'") when the check
fails; update the condition that currently checks separator, workspace, and name
to also reject any extra '/' so only "workspace/name" is accepted.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: b56f21bd-8317-41f7-b0de-b858022ab0ac
⛔ Files ignored due to path filters (15)
sdk/python/nemo-platform/src/nemo_platform/beta/evaluator/__init__.pyis excluded by!sdk/**sdk/python/nemo-platform/src/nemo_platform/beta/evaluator/execution/backends/local/backend.pyis excluded by!sdk/**sdk/python/nemo-platform/src/nemo_platform/beta/evaluator/execution/metric_execution.pyis excluded by!sdk/**sdk/python/nemo-platform/src/nemo_platform/beta/evaluator/execution/utils.pyis excluded by!sdk/**sdk/python/nemo-platform/src/nemo_platform/beta/evaluator/metrics/llm_judge.pyis excluded by!sdk/**sdk/python/nemo-platform/src/nemo_platform/beta/evaluator/metrics/llm_judge_defaults.pyis excluded by!sdk/**sdk/python/nemo-platform/src/nemo_platform/beta/evaluator/metrics/protocol.pyis excluded by!sdk/**sdk/python/nemo-platform/src/nemo_platform/beta/evaluator/metrics/ragas/base.pyis excluded by!sdk/**sdk/python/nemo-platform/src/nemo_platform/beta/evaluator/metrics/remote.pyis excluded by!sdk/**sdk/python/nemo-platform/src/nemo_platform/beta/evaluator/metrics/resolution.pyis excluded by!sdk/**sdk/python/nemo-platform/src/nemo_platform/beta/evaluator/resolver_protocols.pyis excluded by!sdk/**sdk/python/nemo-platform/src/nemo_platform/beta/evaluator/resolvers.pyis excluded by!sdk/**sdk/python/nemo-platform/src/nemo_platform/beta/evaluator/values/__init__.pyis excluded by!sdk/**sdk/python/nemo-platform/src/nemo_platform/beta/evaluator/values/metrics.pyis excluded by!sdk/**sdk/python/nemo-platform/src/nemo_platform/beta/evaluator/values/models.pyis excluded by!sdk/**
📒 Files selected for processing (29)
packages/nemo_evaluator_sdk/src/nemo_evaluator_sdk/__init__.pypackages/nemo_evaluator_sdk/src/nemo_evaluator_sdk/execution/backends/local/backend.pypackages/nemo_evaluator_sdk/src/nemo_evaluator_sdk/execution/metric_execution.pypackages/nemo_evaluator_sdk/src/nemo_evaluator_sdk/execution/utils.pypackages/nemo_evaluator_sdk/src/nemo_evaluator_sdk/metrics/llm_judge.pypackages/nemo_evaluator_sdk/src/nemo_evaluator_sdk/metrics/llm_judge_defaults.pypackages/nemo_evaluator_sdk/src/nemo_evaluator_sdk/metrics/protocol.pypackages/nemo_evaluator_sdk/src/nemo_evaluator_sdk/metrics/ragas/base.pypackages/nemo_evaluator_sdk/src/nemo_evaluator_sdk/metrics/remote.pypackages/nemo_evaluator_sdk/src/nemo_evaluator_sdk/metrics/resolution.pypackages/nemo_evaluator_sdk/src/nemo_evaluator_sdk/resolver_protocols.pypackages/nemo_evaluator_sdk/src/nemo_evaluator_sdk/resolvers.pypackages/nemo_evaluator_sdk/src/nemo_evaluator_sdk/values/__init__.pypackages/nemo_evaluator_sdk/src/nemo_evaluator_sdk/values/metrics.pypackages/nemo_evaluator_sdk/src/nemo_evaluator_sdk/values/models.pypackages/nemo_evaluator_sdk/tests/execution/test_metric_execution.pypackages/nemo_evaluator_sdk/tests/execution/test_resolvers.pypackages/nemo_evaluator_sdk/tests/metrics/ragas/test_ragas.pypackages/nemo_evaluator_sdk/tests/metrics/test_llm_judge.pypackages/nemo_evaluator_sdk/tests/metrics/test_remote.pyplugins/nemo-evaluator/src/nemo_evaluator/jobs/evaluate.pyplugins/nemo-evaluator/src/nemo_evaluator/resolvers.pyplugins/nemo-evaluator/tests/test_evaluate_job.pyservices/evaluator/src/nmp/evaluator/api/v2/metrics/manager.pyservices/evaluator/src/nmp/evaluator/api/v2/metrics/schemas/metrics.pyservices/evaluator/src/nmp/evaluator/app/metrics/metric.pyservices/evaluator/src/nmp/evaluator/entities/metrics.pyservices/evaluator/tests/app/metrics/test_metric_factory.pyservices/evaluator/tests/nmp/evaluator/api/v2/metrics/test_service.py
🚧 Files skipped from review as they are similar to previous changes (25)
- services/evaluator/tests/app/metrics/test_metric_factory.py
- packages/nemo_evaluator_sdk/src/nemo_evaluator_sdk/values/init.py
- packages/nemo_evaluator_sdk/src/nemo_evaluator_sdk/resolver_protocols.py
- services/evaluator/src/nmp/evaluator/entities/metrics.py
- services/evaluator/src/nmp/evaluator/app/metrics/metric.py
- packages/nemo_evaluator_sdk/tests/metrics/test_llm_judge.py
- packages/nemo_evaluator_sdk/src/nemo_evaluator_sdk/metrics/resolution.py
- packages/nemo_evaluator_sdk/src/nemo_evaluator_sdk/resolvers.py
- services/evaluator/src/nmp/evaluator/api/v2/metrics/schemas/metrics.py
- packages/nemo_evaluator_sdk/src/nemo_evaluator_sdk/execution/metric_execution.py
- packages/nemo_evaluator_sdk/tests/metrics/ragas/test_ragas.py
- packages/nemo_evaluator_sdk/src/nemo_evaluator_sdk/execution/backends/local/backend.py
- packages/nemo_evaluator_sdk/src/nemo_evaluator_sdk/metrics/protocol.py
- packages/nemo_evaluator_sdk/src/nemo_evaluator_sdk/init.py
- services/evaluator/tests/nmp/evaluator/api/v2/metrics/test_service.py
- plugins/nemo-evaluator/src/nemo_evaluator/jobs/evaluate.py
- packages/nemo_evaluator_sdk/src/nemo_evaluator_sdk/values/models.py
- services/evaluator/src/nmp/evaluator/api/v2/metrics/manager.py
- packages/nemo_evaluator_sdk/tests/metrics/test_remote.py
- packages/nemo_evaluator_sdk/src/nemo_evaluator_sdk/metrics/llm_judge_defaults.py
- packages/nemo_evaluator_sdk/tests/execution/test_resolvers.py
- packages/nemo_evaluator_sdk/src/nemo_evaluator_sdk/metrics/ragas/base.py
- packages/nemo_evaluator_sdk/src/nemo_evaluator_sdk/values/metrics.py
- packages/nemo_evaluator_sdk/src/nemo_evaluator_sdk/execution/utils.py
- plugins/nemo-evaluator/tests/test_evaluate_job.py
1845fa9 to
5145619
Compare
Signed-off-by: Sandy Chapman <schapman@nvidia.com>
5145619 to
9dcbc66
Compare
Summary
Verification
Draft until review.
Summary by CodeRabbit
New Features
Refactor
Tests