feat(eval): expose user_simulator_config in generate_responses#5733
feat(eval): expose user_simulator_config in generate_responses#5733primenko-v wants to merge 14 commits into
Conversation
|
Hi @primenko-v , Thank you for your contribution! We appreciate you taking the time to submit this pull request. Please fix formatting errors by running autoformat.sh. |
Thank you for the review @rohityan ! I installed and ran the pre-commit hook, as the autoformat.sh doesn't seem to exist anymore. |
|
@rohityan let me know if there's anything else needed! |
|
Hi @ankursharmas , can you please review this. |
| class TestGenerateResponses: | ||
| """Test cases for EvaluationGenerator.generate_responses method.""" |
There was a problem hiding this comment.
We don't need this new class. The newly added test case can be a part of the existing class TestGenerateInferencesFromRootAgent
There was a problem hiding this comment.
Thank you for the feedback @ankursharmas !
The newly added test exercises generate_responses rather than _generate_inferences_from_root_agent. It seems to me that test_evaluation_generator.py mostly follows a one-class-per-method convention, where each test class targets a single method. For example:
| Test class | Method under test |
|---|---|
TestConvertEventsToEvalInvocation |
convert_events_to_eval_invocations |
TestGetAppDetailsByInvocationId |
_get_app_details_by_invocation_id |
TestGenerateInferencesForSingleUserInvocation |
_generate_inferences_for_single_user_invocation |
TestGenerateInferencesForSingleUserInvocationLive |
_generate_inferences_for_single_user_invocation_live |
Following that, wouldn't a separate TestGenerateResponses class for generate_responses fit better here?
While looking at this I also noticed that test_generates_inferences_with_user_simulator_live lives in TestGenerateInferencesFromRootAgent even though it tests _generate_inferences_from_root_agent_live, so the convention is already a bit mixed there.
On a separate note: there was a careless merge on my side which I have now fixed.
There was a problem hiding this comment.
Ah makes sense! Thank you for pointing that out.
|
I can see that the I have tried syncing my branch with the latest It looks like the reformatted files were brought by this commit that was pushed to I have opened #5962 to fix the formatting introduced by that commit, and #5963 to try to prevent these formatting issues. |
Link to Issue or Description of Change
1. Link to an existing issue (if applicable):
N/A
2. Or, if no issue exists, describe the change:
Problem:
EvaluationGenerator.generate_responsesconstructs aUserSimulatorProvider()with no arguments, so the LLM-backed path always runs with the defaultBaseUserSimulatorConfig. There is no way for a caller to override the user-simulation model, max-allowed invocations, or custom instructions when driving multi-turn conversations throughLlmBackedUserSimulator.Solution:
Add an optional
user_simulator_configparameter togenerate_responsesand forward it toUserSimulatorProvider(...). Callers can now pass anLlmBackedUserSimulatorConfigto customize the LLM-backed simulator.The behavior is backward compatible:
UserSimulatorProviderfalls back toBaseUserSimulatorConfig()exactly as before.StaticUserSimulator.Testing Plan
Unit Tests:
A unit test for the proposed change was added to
tests/unittests/evaluation/test_evaluation_generator.py:TestGenerateResponses::test_generate_responses_forwards_llm_backed_user_simulator_configAll tests pass:
The skipped test is not related to this change — it skips on
mainas well.Manual End-to-End (E2E) Tests:
A reference setup lives at https://github.com/primenko-v/adk-x-mlflow (tag
pr-demo/user-simulator-config).It loads an
LlmBackedUserSimulatorConfigfrom YAML and forwards it toEvaluationGenerator.generate_responsesvia the newuser_simulator_configparameter — seesrc/mlflow_adk/simulate.py.To reproduce (requires GOOGLE_CLOUD_PROJECT and ADC via
gcloud auth application-default login):git clone --recurse-submodules --branch pr-demo/user-simulator-config \ https://github.com/primenko-v/adk-x-mlflow.git cd adk-x-mlflow cp .env.example .env # fill in GOOGLE_CLOUD_PROJECT uv sync uv run python -m mlflow_adk.simulate --no-mlflow --output-traces traces.jsonlChecklist