Skip to content

feat(eval): expose user_simulator_config in generate_responses#5733

Open
primenko-v wants to merge 14 commits into
google:mainfrom
primenko-v:propagate-user-simulator-config
Open

feat(eval): expose user_simulator_config in generate_responses#5733
primenko-v wants to merge 14 commits into
google:mainfrom
primenko-v:propagate-user-simulator-config

Conversation

@primenko-v
Copy link
Copy Markdown

Link to Issue or Description of Change

1. Link to an existing issue (if applicable):

N/A

2. Or, if no issue exists, describe the change:

Problem:
EvaluationGenerator.generate_responses constructs a UserSimulatorProvider() with no arguments, so the LLM-backed path always runs with the default BaseUserSimulatorConfig. There is no way for a caller to override the user-simulation model, max-allowed invocations, or custom instructions when driving multi-turn conversations through LlmBackedUserSimulator.

Solution:
Add an optional user_simulator_config parameter to generate_responses and forward it to UserSimulatorProvider(...). Callers can now pass an LlmBackedUserSimulatorConfig to customize the LLM-backed simulator.

The behavior is backward compatible:

  • When the argument is omitted, UserSimulatorProvider falls back to BaseUserSimulatorConfig() exactly as before.
  • Static eval cases are unaffected: the config is ignored by StaticUserSimulator.

Testing Plan

Unit Tests:

  • I have added or updated unit tests for my change.
  • All unit tests pass locally.

A unit test for the proposed change was added to tests/unittests/evaluation/test_evaluation_generator.py: TestGenerateResponses::test_generate_responses_forwards_llm_backed_user_simulator_config

All tests pass:

> uv run pytest tests/unittests/ -rs
...
================================== short test summary info ================================== 
SKIPPED [1] tests/unittests/integrations/crewai/test_crewai_tool.py:20: Requires Python 3.10+
================  5770 passed, 1 skipped, 2358 warnings in 129.40s (0:02:09) ================

The skipped test is not related to this change — it skips on main as well.

Manual End-to-End (E2E) Tests:

A reference setup lives at https://github.com/primenko-v/adk-x-mlflow (tag pr-demo/user-simulator-config).

It loads an LlmBackedUserSimulatorConfig from YAML and forwards it to EvaluationGenerator.generate_responses via the new user_simulator_config parameter — see src/mlflow_adk/simulate.py.

To reproduce (requires GOOGLE_CLOUD_PROJECT and ADC via gcloud auth application-default login):

git clone --recurse-submodules --branch pr-demo/user-simulator-config \
    https://github.com/primenko-v/adk-x-mlflow.git
cd adk-x-mlflow
cp .env.example .env  # fill in GOOGLE_CLOUD_PROJECT
uv sync
uv run python -m mlflow_adk.simulate --no-mlflow --output-traces traces.jsonl

Checklist

  • I have read the CONTRIBUTING.md document.
  • I have performed a self-review of my own code.
  • I have commented my code, particularly in hard-to-understand areas.
  • I have added tests that prove my fix is effective or that my feature works.
  • New and existing unit tests pass locally with my changes.
  • I have manually tested my changes end-to-end.
  • Any dependent changes have been merged and published in downstream modules.

@rohityan rohityan self-assigned this May 18, 2026
@rohityan rohityan added eval [Component] This issue is related to evaluation request clarification [Status] The maintainer need clarification or more information from the author labels May 20, 2026
@rohityan
Copy link
Copy Markdown
Collaborator

Hi @primenko-v , Thank you for your contribution! We appreciate you taking the time to submit this pull request. Please fix formatting errors by running autoformat.sh.

@primenko-v
Copy link
Copy Markdown
Author

Hi @primenko-v , Thank you for your contribution! We appreciate you taking the time to submit this pull request. Please fix formatting errors by running autoformat.sh.

Thank you for the review @rohityan ! I installed and ran the pre-commit hook, as the autoformat.sh doesn't seem to exist anymore.

@primenko-v
Copy link
Copy Markdown
Author

@rohityan let me know if there's anything else needed!

@rohityan rohityan added needs review [Status] The PR/issue is awaiting review from the maintainer and removed request clarification [Status] The maintainer need clarification or more information from the author labels May 29, 2026
@rohityan rohityan requested a review from ankursharmas May 29, 2026 18:53
@rohityan
Copy link
Copy Markdown
Collaborator

Hi @ankursharmas , can you please review this.

Comment on lines +601 to +602
class TestGenerateResponses:
"""Test cases for EvaluationGenerator.generate_responses method."""
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't need this new class. The newly added test case can be a part of the existing class TestGenerateInferencesFromRootAgent

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the feedback @ankursharmas !

The newly added test exercises generate_responses rather than _generate_inferences_from_root_agent. It seems to me that test_evaluation_generator.py mostly follows a one-class-per-method convention, where each test class targets a single method. For example:

Test class Method under test
TestConvertEventsToEvalInvocation convert_events_to_eval_invocations
TestGetAppDetailsByInvocationId _get_app_details_by_invocation_id
TestGenerateInferencesForSingleUserInvocation _generate_inferences_for_single_user_invocation
TestGenerateInferencesForSingleUserInvocationLive _generate_inferences_for_single_user_invocation_live

Following that, wouldn't a separate TestGenerateResponses class for generate_responses fit better here?

While looking at this I also noticed that test_generates_inferences_with_user_simulator_live lives in TestGenerateInferencesFromRootAgent even though it tests _generate_inferences_from_root_agent_live, so the convention is already a bit mixed there.

On a separate note: there was a careless merge on my side which I have now fixed.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah makes sense! Thank you for pointing that out.

@ankursharmas ankursharmas requested a review from GWeale June 3, 2026 20:13
@ankursharmas ankursharmas self-assigned this Jun 4, 2026
@primenko-v
Copy link
Copy Markdown
Author

primenko-v commented Jun 4, 2026

I can see that the pre-commit fails on the formatting now, and surprisingly not on the files changed by this PR:

uv run pre-commit run --all-files
check yaml...............................................................Passed
fix end of files.........................................................Failed
- hook id: end-of-file-fixer
- exit code: 1
- files were modified by this hook

Fixing tests/unittests/flows/llm_flows/test_base_llm_flow.py

[Manually removed trim trailing whitespace errors on src/google/adk/cli/browser/*.js files from the output]

pyproject-fmt............................................................Passed
isort....................................................................Failed
- hook id: isort
- files were modified by this hook

Fixing /home/prmnk/proj/adk-python-main/src/google/adk/flows/llm_flows/base_llm_flow.py
Fixing /home/prmnk/proj/adk-python-main/tests/unittests/flows/llm_flows/test_base_llm_flow.py

pyink....................................................................Failed
- hook id: pyink
- files were modified by this hook

reformatted src/google/adk/flows/llm_flows/basic.py
reformatted src/google/adk/models/gemini_llm_connection.py
reformatted tests/unittests/flows/llm_flows/test_base_llm_flow.py
reformatted tests/unittests/models/test_gemini_llm_connection.py

All done! ✨ 🍰 ✨
4 files reformatted, 1510 files left unchanged.

I have tried syncing my branch with the latest main, but the formatting issue is still there.

It looks like the reformatted files were brought by this commit that was pushed to main directly. The thing is, pre-commit/action runs --all-files, so any PR that touches .py files now inherits this failure.

I have opened #5962 to fix the formatting introduced by that commit, and #5963 to try to prevent these formatting issues.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

eval [Component] This issue is related to evaluation needs review [Status] The PR/issue is awaiting review from the maintainer

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants