Add tinker backend. #448

chenyushuo · 2025-12-24T10:16:10Z

Description

Add tinker training backend.
Refactor the Experiences field in the return value of SampleStrategy to List[Experience], and update all related interfaces accordingly.
Add tinker examples.
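
A minimal sketch of what the `List[Experience]` return shape could look like from a caller's point of view. The `Experience` fields, the `ToySampleStrategy` class, the `sample` signature, and the metrics dictionary are all hypothetical stand-ins for illustration, not the project's actual definitions.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Experience:
    """Hypothetical stand-in for the project's Experience class."""
    tokens: List[int]
    reward: float = 0.0


class ToySampleStrategy:
    """Illustrative strategy only; not the project's SampleStrategy."""

    def __init__(self, buffer: List[Experience]) -> None:
        self._buffer = buffer

    def sample(self, batch_size: int) -> Tuple[List[Experience], Dict[str, float]]:
        # Return plain Experience objects instead of a pre-batched container,
        # leaving batching and tensor conversion to each training backend.
        batch = self._buffer[:batch_size]
        metrics = {"sample/count": float(len(batch))}  # assumed metric name
        return batch, metrics


buffer = [
    Experience(tokens=[1, 2, 3], reward=1.0),
    Experience(tokens=[4, 5], reward=0.0),
]
experiences, metrics = ToySampleStrategy(buffer).sample(batch_size=2)
```

Returning a plain list keeps the strategy independent of any one backend's batch format, which is presumably why the related interfaces had to change with it.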

Checklist

Please check the following items before code is ready to be reviewed.

Code has passed all tests
Docstrings have been added/updated in Google Style
Documentation has been updated
Code is ready for review

gemini-code-assist · 2025-12-24T10:16:35Z

Summary of Changes

Hello @chenyushuo, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request introduces a comprehensive integration of a new 'tinker' backend into the system. It extends the framework's capabilities to support Tinker models for both inference and training, complete with dedicated configuration, model handling, and training wrappers. The changes also involve significant refactoring to enable asynchronous operations and ensure proper synchronization and checkpointing mechanisms are in place for the new backend.
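
As a rough illustration of the asynchronous refactoring mentioned above, the sketch below shows a training engine whose `prepare` and `train_step` methods are coroutines. `AsyncTrainEngine`, `ToyEngine`, and `run` are hypothetical names, and the "loss" is a placeholder mean; none of this is taken from the PR's code.

```python
import asyncio
from abc import ABC, abstractmethod
from typing import Dict, List


class AsyncTrainEngine(ABC):
    """Hypothetical interface with coroutine-based prepare/train_step."""

    @abstractmethod
    async def prepare(self) -> None:
        ...

    @abstractmethod
    async def train_step(self, batch: List[float]) -> Dict[str, float]:
        ...


class ToyEngine(AsyncTrainEngine):
    async def prepare(self) -> None:
        self.step = 0

    async def train_step(self, batch: List[float]) -> Dict[str, float]:
        # asyncio.sleep(0) stands in for awaiting a remote training service.
        await asyncio.sleep(0)
        self.step += 1
        return {"loss": sum(batch) / len(batch), "step": float(self.step)}


async def run(engine: AsyncTrainEngine, batches: List[List[float]]) -> List[Dict[str, float]]:
    await engine.prepare()
    # Steps are awaited one after another so each sees the previous update.
    return [await engine.train_step(batch) for batch in batches]


all_metrics = asyncio.run(run(ToyEngine(), [[1.0, 3.0], [2.0]]))
```

Making both methods awaitable lets a backend that talks to a remote service yield control while it waits, without changing how a local backend is driven.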

Highlights

Tinker Backend Integration: Introduced a new 'tinker' backend for model training and inference, enabling the use of Tinker-specific configurations and services.
New Configuration Options: Added a TinkerConfig dataclass to trinity/common/config.py for managing Tinker-specific settings such as enable, base_model, rank, seed, and training parameters for MLP, attention, and unembedding layers. This TinkerConfig is now part of the main ModelConfig.
Tinker-Specific Model and Trainer Implementations: Created TinkerModel in trinity/common/models/tinker_model.py for Tinker-based inference and experience generation, and TinkerTrainerWrapper in trinity/trainer/tinker_trainer.py to handle the training loop, loss calculation, and checkpointing for the Tinker backend.
Asynchronous Operations and Refactoring: Updated Trainer and TrainEngineWrapper to support asynchronous prepare and train_step methods, and refactored configuration validation and model synchronization logic to accommodate the new Tinker backend.
Synchronization and Checkpointing for Tinker: Modified the Synchronizer to specifically handle state dictionary finding and updates for Tinker-based checkpoints and remote sampler paths, including warnings for unsupported NCCL synchronization with Tinker or LoRA.
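
For orientation, a config dataclass along the lines described in the highlights might look like the sketch below. Only `enable`, `base_model`, `rank`, and `seed` are named in the summary; the three `train_*` field names, every default value, and the `model_path` field are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class TinkerConfig:
    """Hypothetical sketch; not the definition in trinity/common/config.py."""
    enable: bool = False
    base_model: Optional[str] = None
    rank: int = 32                # default value is an assumption
    seed: Optional[int] = None
    train_mlp: bool = True        # assumed name for the MLP-layer switch
    train_attn: bool = True       # assumed name for the attention-layer switch
    train_unembed: bool = True    # assumed name for the unembedding-layer switch


@dataclass
class ModelConfig:
    """Hypothetical parent config that nests the Tinker settings."""
    model_path: str = ""
    # default_factory gives every ModelConfig its own TinkerConfig instance.
    tinker: TinkerConfig = field(default_factory=TinkerConfig)


cfg = ModelConfig(
    model_path="org/some-base-model",
    tinker=TinkerConfig(enable=True, base_model="org/some-base-model", rank=16),
)
```

Nesting the backend settings under the model config keeps the backend disabled by default, so existing configurations would be unaffected unless `enable` is set.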

¹ Review the Privacy Notices, Generative AI Prohibited Use Policy, and Terms of Service, and learn how to configure Gemini Code Assist in GitHub. Gemini can make mistakes, so double-check its output and use code with caution.

gemini-code-assist

Code Review

This pull request introduces two major changes. First, it refactors the module registration mechanism, moving from a decorator-based approach to a centralized default_mapping in __init__.py files. This is a significant improvement for code clarity and maintainability. Second, it adds a new "tinker" backend, which is a substantial piece of work but appears to be in a work-in-progress state. My review focuses on the new tinker backend implementation and some general observations. I've identified a critical issue in the new tinker trainer that will cause a crash, along with a few other areas for improvement.
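
To make the registration change concrete, here is a minimal sketch of a centralized name-to-class mapping that is resolved on demand. The mapping contents use standard-library classes so the example runs as-is; the `resolve` helper and the dotted-string format are assumptions, and the project's actual `default_mapping` may be structured differently.

```python
import importlib
from typing import Dict

# Hypothetical central mapping, as it might appear in a package __init__.py:
# registry name -> "module.path.ClassName".
default_mapping: Dict[str, str] = {
    "ordered_dict": "collections.OrderedDict",
    "counter": "collections.Counter",
}


def resolve(name: str) -> type:
    """Import and return the class registered under `name`."""
    if name not in default_mapping:
        raise KeyError(f"Unknown module name: {name!r}")
    module_path, _, class_name = default_mapping[name].rpartition(".")
    # The module is imported only when the name is first requested.
    module = importlib.import_module(module_path)
    return getattr(module, class_name)


counter_cls = resolve("counter")
counts = counter_cls("aab")
```

Compared with decorator-based registration, a single mapping makes the full set of available names visible in one place and does not depend on every module being imported for its decorator to run.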

trinity/trainer/tinker_trainer.py

trinity/common/models/tinker_model.py

trinity/common/config.py

2. fix dyn sync in trainer and explorer
3. fix entropy in tinker trainer
4. add `tinker_base_model` to `InferenceModelConfig`

…_tinker_backend

chenyushuo · 2025-12-25T07:05:26Z

/unittest-all

…_tinker_backend

chenyushuo · 2025-12-25T07:49:17Z

/unittest-all

github-actions · 2025-12-25T07:53:55Z

Summary

Tests 📝	Passed ✅	Failed ❌	Skipped ⏭️	Other ❓	Flaky 🍂	Duration ⏱️
230	115	111	4	0	0	2m 51s

Failed Tests

Failed Tests ❌	Fail Message
❌ tests/buffer/experience_pipeline_test.py::TestExperiencePipeline::test_experience_pipeline	The test failed in the call phase
❌ tests/buffer/experience_pipeline_test.py::TestExperiencePipeline::test_pass_rate_calculation	The test failed in the call phase
❌ tests/buffer/file_test.py::TestFileBuffer::test_file_reader	The test failed in the call phase
❌ tests/buffer/file_test.py::TestFileBuffer::test_file_writer	The test failed in the call phase
❌ tests/buffer/sample_strategy_test.py::ExperienceStorageTest_0::test_default_queue_default_sample_strategy	The test failed in the call phase
❌ tests/buffer/sample_strategy_test.py::ExperienceStorageTest_0::test_default_queue_staleness_control_sample_strategy	The test failed in the call phase
❌ tests/buffer/sample_strategy_test.py::ExperienceStorageTest_0::test_priority_queue_default_sample_strategy	The test failed in the call phase
❌ tests/buffer/sample_strategy_test.py::ExperienceStorageTest_0::test_priority_queue_staleness_control_sample_strategy	The test failed in the call phase
❌ tests/buffer/sample_strategy_test.py::ExperienceStorageTest_0::test_sql_staleness_control_sample_strategy	The test failed in the call phase
❌ tests/buffer/sample_strategy_test.py::ExperienceStorageTest_1::test_default_queue_default_sample_strategy	The test failed in the call phase
❌ tests/buffer/sample_strategy_test.py::ExperienceStorageTest_1::test_default_queue_staleness_control_sample_strategy	The test failed in the call phase
❌ tests/buffer/sample_strategy_test.py::ExperienceStorageTest_1::test_priority_queue_default_sample_strategy	The test failed in the call phase
❌ tests/buffer/sample_strategy_test.py::ExperienceStorageTest_1::test_priority_queue_staleness_control_sample_strategy	The test failed in the call phase
❌ tests/buffer/sample_strategy_test.py::ExperienceStorageTest_1::test_sql_staleness_control_sample_strategy	The test failed in the call phase
❌ tests/buffer/task_scheduler_test.py::TestTaskScheduler::test_task_scheduler_0	The test failed in the call phase
❌ tests/buffer/task_scheduler_test.py::TestTaskScheduler::test_task_scheduler_1	The test failed in the call phase
❌ tests/buffer/task_scheduler_test.py::TestTaskScheduler::test_task_scheduler_2	The test failed in the call phase
❌ tests/buffer/task_scheduler_test.py::TestTaskScheduler::test_task_scheduler_3	The test failed in the call phase
❌ tests/buffer/task_scheduler_test.py::TestTaskScheduler::test_task_scheduler_4	The test failed in the call phase
❌ tests/buffer/task_scheduler_test.py::TestTaskScheduler::test_task_scheduler_5	The test failed in the call phase
❌ tests/buffer/task_scheduler_test.py::TestTaskScheduler::test_task_scheduler_6	The test failed in the call phase
❌ tests/buffer/task_scheduler_test.py::TestTaskScheduler::test_task_scheduler_simple	The test failed in the call phase
❌ tests/cli/launcher_test.py::TestLauncherMain::test_debug_mode	The test failed in the call phase
❌ tests/cli/launcher_test.py::TestLauncherMain::test_main_run_command	The test failed in the call phase
❌ tests/cli/launcher_test.py::TestLauncherMain::test_main_run_in_dlc	The test failed in the call phase
❌ tests/cli/launcher_test.py::TestLauncherMain::test_main_studio_command	The test failed in the call phase
❌ tests/cli/launcher_test.py::TestLauncherMain::test_multi_stage_run	The test failed in the call phase
❌ tests/common/config_test.py::TestConfig::test_all_examples_are_valid	The test failed in the call phase due to an exception
❌ tests/common/config_test.py::TestConfig::test_chat_template_path	The test failed in the call phase
❌ tests/common/config_test.py::TestConfig::test_continue_from_checkpoint_is_valid	The test failed in the call phase
❌ tests/common/config_test.py::TestConfig::test_default_workflow	The test failed in the call phase
❌ tests/common/config_test.py::TestConfig::test_load_default_config	The test failed in the call phase
❌ tests/common/config_test.py::TestConfig::test_max_token_len_per_gpu_set_correctly	The test failed in the call phase
❌ tests/common/config_test.py::TestConfig::test_optimizer_config_propagation	The test failed in the call phase
❌ tests/common/vllm_test.py::ModelWrapperTest_0::test_generate	The test failed in the call phase
❌ tests/common/vllm_test.py::ModelWrapperTest_1::test_generate	The test failed in the call phase
❌ tests/common/vllm_test.py::ModelWrapperTest_2::test_generate	The test failed in the call phase
❌ tests/common/vllm_test.py::TestModelLen_0::test_model_len	The test failed in the call phase
❌ tests/common/vllm_test.py::TestModelLen_1::test_model_len	The test failed in the call phase
❌ tests/common/vllm_test.py::TestModelLen_2::test_model_len	The test failed in the call phase
❌ tests/common/vllm_test.py::TestModelLenWithoutPromptTruncation::test_model_len	The test failed in the call phase
❌ tests/common/vllm_test.py::TestAPIServer::test_api	The test failed in the call phase
❌ tests/common/vllm_test.py::TestLogprobs::test_logprobs_api	The test failed in the call phase
❌ tests/common/vllm_test.py::TestAsyncAPIServer::test_api_async	The test failed in the call phase
❌ tests/common/vllm_test.py::TestAPIServerToolCall_0_deepseek_r1::test_api_tool_calls	The test failed in the call phase
❌ tests/common/vllm_test.py::TestAPIServerToolCall_1::test_api_tool_calls	The test failed in the call phase
❌ tests/common/vllm_test.py::TestSuperLongGeneration::test_generate	The test failed in the call phase
❌ tests/explorer/explorer_test.py::TestExplorerCountdownEval::test_explorer	The test failed in the call phase
❌ tests/explorer/explorer_test.py::TestExplorerGSM8KRULERNoEval::test_explorer	The test failed in the call phase
❌ tests/explorer/explorer_test.py::TestExplorerGSM8k::test_explorer	The test failed in the call phase
❌ tests/explorer/explorer_test.py::ServeTest::test_serve	The test failed in the call phase
❌ tests/explorer/scheduler_test.py::SchedulerTest::test_async_workflow	The test failed in the call phase
❌ tests/explorer/scheduler_test.py::SchedulerTest::test_concurrent_operations	The test failed in the call phase
❌ tests/explorer/scheduler_test.py::SchedulerTest::test_dynamic_timeout	The test failed in the call phase
❌ tests/explorer/scheduler_test.py::SchedulerTest::test_get_results	The test failed in the call phase
❌ tests/explorer/scheduler_test.py::SchedulerTest::test_metric_calculation_with_non_repeatable_workflow_0	The test failed in the call phase
❌ tests/explorer/scheduler_test.py::SchedulerTest::test_metric_calculation_with_non_repeatable_workflow_1	The test failed in the call phase
❌ tests/explorer/scheduler_test.py::SchedulerTest::test_metric_calculation_with_repeatable_workflow_0	The test failed in the call phase
❌ tests/explorer/scheduler_test.py::SchedulerTest::test_metric_calculation_with_repeatable_workflow_1	The test failed in the call phase
❌ tests/explorer/scheduler_test.py::SchedulerTest::test_multi_step_execution	The test failed in the call phase
❌ tests/explorer/scheduler_test.py::SchedulerTest::test_non_repeatable_workflow	The test failed in the call phase
❌ tests/explorer/scheduler_test.py::SchedulerTest::test_over_rollout_min_wait	The test failed in the call phase
❌ tests/explorer/scheduler_test.py::SchedulerTest::test_scheduler_all_methods	The test failed in the call phase
❌ tests/explorer/scheduler_test.py::SchedulerTest::test_scheduler_restart_after_stop	The test failed in the call phase
❌ tests/explorer/scheduler_test.py::SchedulerTest::test_split_tasks	The test failed in the call phase
❌ tests/explorer/scheduler_test.py::SchedulerTest::test_stepwise_experience_eid	The test failed in the call phase
❌ tests/explorer/scheduler_test.py::SchedulerTest::test_wait_all	The test failed in the call phase
❌ tests/explorer/scheduler_test.py::SchedulerTest::test_wait_all_timeout_with_multi_batch	The test failed in the call phase
❌ tests/explorer/scheduler_test.py::TestRunnerStateCollection::test_runner_state_collection	The test failed in the call phase
❌ tests/explorer/workflow_test.py::MultiTurnWorkflowTest_0::test_multi_turn_workflow	The test failed in the call phase
❌ tests/explorer/workflow_test.py::MultiTurnWorkflowTest_1::test_multi_turn_workflow	The test failed in the call phase
❌ tests/explorer/workflow_test.py::TestWorkflowRunner::test_workflow_with_openai	The test failed in the call phase
❌ tests/manager/synchronizer_test.py::TestSynchronizerExit::test_synchronizer	The test failed in the call phase
❌ tests/manager/synchronizer_test.py::TestStateDictBasedSynchronizer_0::test_synchronizer	The test failed in the call phase
❌ tests/manager/synchronizer_test.py::TestStateDictBasedSynchronizer_1::test_synchronizer	The test failed in the call phase
❌ tests/manager/synchronizer_test.py::TestStateDictBasedSynchronizer_2::test_synchronizer	The test failed in the call phase
❌ tests/manager/synchronizer_test.py::TestStateDictBasedSynchronizer_3::test_synchronizer	The test failed in the call phase
❌ tests/manager/synchronizer_test.py::TestNCCLBasedSynchronizer_0::test_synchronizer	The test failed in the call phase
❌ tests/manager/synchronizer_test.py::TestNCCLBasedSynchronizer_1::test_synchronizer	The test failed in the call phase
❌ tests/service/data_juicer_test.py::TestDataJuicerExperiencePipeline::test_data_juicer_operators	The test failed in the call phase
❌ tests/service/data_juicer_test.py::TestDataJuicerTaskPipeline::test_data_juicer_task_pipeline	The test failed in the call phase
❌ tests/trainer/trainer_test.py::TestTrainerCountdown_0_fsdp::test_trainer	The test failed in the call phase
❌ tests/trainer/trainer_test.py::TestTrainerCountdown_1_megatron::test_trainer	The test failed in the call phase
❌ tests/trainer/trainer_test.py::TestStepAheadAsyncRL::test_trainer	The test failed in the call phase
❌ tests/trainer/trainer_test.py::TestTrainerGSM8K_0_fsdp::test_trainer	The test failed in the call phase
❌ tests/trainer/trainer_test.py::TestTrainerGSM8K_1_fsdp2::test_trainer	The test failed in the call phase
❌ tests/trainer/trainer_test.py::TestTrainerGSM8K_2_fsdp::test_trainer	The test failed in the call phase
❌ tests/trainer/trainer_test.py::TestTrainerGSM8K_3_fsdp2::test_trainer	The test failed in the call phase
❌ tests/trainer/trainer_test.py::TestTrainerSFTWarmupGSM8K::test_trainer	The test failed in the call phase
❌ tests/trainer/trainer_test.py::TestTrainerDPO::test_trainer	The test failed in the call phase
❌ tests/trainer/trainer_test.py::TestTrainerSFT::test_trainer	The test failed in the call phase
❌ tests/trainer/trainer_test.py::TestTrainerToolsSFT::test_trainer_tools	The test failed in the call phase
❌ tests/trainer/trainer_test.py::TestFullyAsyncMode_0_fsdp::test_fully_async_mode	The test failed in the call phase
❌ tests/trainer/trainer_test.py::TestFullyAsyncMode_1_fsdp::test_fully_async_mode	The test failed in the call phase
❌ tests/trainer/trainer_test.py::TestFullyAsyncMode_2_megatron::test_fully_async_mode	The test failed in the call phase
❌ tests/trainer/trainer_test.py::TestTrainerCheckpointSave_0_fsdp::test_trainer	The test failed in the call phase
❌ tests/trainer/trainer_test.py::TestTrainerCheckpointSave_1_megatron::test_trainer	The test failed in the call phase
❌ tests/trainer/trainer_test.py::TestTrainerMIX::test_trainer	The test failed in the call phase
❌ tests/trainer/trainer_test.py::TestServeWithTrainer::test_serve_with_trainer	The test failed in the call phase
❌ tests/trainer/trainer_test.py::TestTrainerLoRA::test_trainer	The test failed in the call phase
❌ tests/trainer/trainer_test.py::TestOverRollout::test_trainer	The test failed in the call phase
❌ tests/trainer/trainer_test.py::TestTrainerPromptTruncation::test_trainer	The test failed in the call phase
❌ tests/utils/log_test.py::LogTest::test_actor_log	The test failed in the call phase
❌ tests/utils/log_test.py::LogTest::test_group_by_node	The test failed in the call phase
❌ tests/utils/log_test.py::LogTest::test_no_actor_log	The test failed in the call phase
❌ tests/utils/plugin_test.py::TestPluginLoader::test_load_plugins_local_0__workspace_tests_utils_plugins	The test failed in the call phase
❌ tests/utils/plugin_test.py::TestPluginLoader::test_load_plugins_local_1_tests_utils_plugins	The test failed in the call phase
❌ tests/utils/plugin_test.py::TestPluginLoader::test_load_plugins_remote_0__workspace_tests_utils_plugins	The test failed in the call phase
❌ tests/utils/plugin_test.py::TestPluginLoader::test_load_plugins_remote_1_tests_utils_plugins	The test failed in the call phase
❌ tests/utils/plugin_test.py::TestPluginLoader::test_passing_custom_class_0__workspace_tests_utils_plugins	The test failed in the call phase
❌ tests/utils/plugin_test.py::TestPluginLoader::test_passing_custom_class_1_tests_utils_plugins	The test failed in the call phase

Skipped

Tests	Status
tests/trainer/trainer_test.py::TestMultiModalGRPO::test_trainer	skipped ⏭️
tests/trainer/trainer_test.py::TestMultiModalSFT::test_trainer	skipped ⏭️
tests/trainer/trainer_test.py::TestTinkerTrainer::test_trainer	skipped ⏭️
tests/utils/swanlab_test.py::TestSwanlabMonitor::test_swanlab_monitor_smoke	skipped ⏭️

Tests

Test Name	Status	Duration
tests/algorithm/advantage_fn_test.py::TestGroupedAdvantageFn::test_batch_level_std_grpo	✅	42ms
tests/algorithm/advantage_fn_test.py::TestGroupedAdvantageFn::test_batch_level_step_wise_grpo_advantage	✅	3ms
tests/algorithm/advantage_fn_test.py::TestGroupedAdvantageFn::test_duplicate_grpo	✅	5ms
tests/algorithm/advantage_fn_test.py::TestGroupedAdvantageFn::test_grpo_advantage	✅	3ms
tests/algorithm/advantage_fn_test.py::TestGroupedAdvantageFn::test_grpo_correct_bias	✅	2ms
tests/algorithm/advantage_fn_test.py::TestGroupedAdvantageFn::test_grpo_reward_std	✅	1ms
tests/algorithm/advantage_fn_test.py::TestGroupedAdvantageFn::test_step_wise_grpo_advantage	✅	2ms
tests/algorithm/advantage_fn_test.py::TestGroupedAdvantageFn::test_step_wise_grpo_with_std_threshold	✅	2ms
tests/algorithm/kl_fn_test.py::KLFnTest::test_abs_kl_fn	✅	1ms
tests/algorithm/kl_fn_test.py::KLFnTest::test_corrected_k3_fallback	✅	1ms
tests/algorithm/kl_fn_test.py::KLFnTest::test_corrected_k3_loss	✅	1ms
tests/algorithm/kl_fn_test.py::KLFnTest::test_corrected_k3_same_policy	✅	1ms
tests/algorithm/kl_fn_test.py::KLFnTest::test_corrected_k3_with_old_logprob	✅	1ms
tests/algorithm/kl_fn_test.py::KLFnTest::test_dummy_kl_fn	✅	1ms
tests/algorithm/kl_fn_test.py::KLFnTest::test_k1_kl_fn	✅	1ms
tests/algorithm/kl_fn_test.py::KLFnTest::test_k2_kl_fn	✅	1ms
tests/algorithm/kl_fn_test.py::KLFnTest::test_k3_kl_fn	✅	1ms
tests/algorithm/kl_fn_test.py::KLFnTest::test_kl_loss_aggregation_modes	✅	1ms
tests/algorithm/kl_fn_test.py::KLFnTest::test_low_var_kl_fn	✅	1ms
tests/algorithm/policy_loss_test.py::VerlPolicyLossTest::test_dpo_policy_loss	✅	1ms
tests/algorithm/policy_loss_test.py::VerlPolicyLossTest::test_gspo_policy_loss	✅	1ms
tests/algorithm/policy_loss_test.py::VerlPolicyLossTest::test_mix_policy_loss	✅	3ms
tests/algorithm/policy_loss_test.py::VerlPolicyLossTest::test_opmd_policy_loss	✅	1ms
tests/algorithm/policy_loss_test.py::VerlPolicyLossTest::test_ppo_policy_loss	✅	1ms
tests/algorithm/policy_loss_test.py::VerlPolicyLossTest::test_ppo_policy_loss_with_sequence_masking	✅	1ms
tests/algorithm/policy_loss_test.py::VerlPolicyLossTest::test_sapo_policy_loss	✅	1ms
tests/algorithm/policy_loss_test.py::VerlPolicyLossTest::test_sft_policy_loss	✅	1ms
tests/buffer/experience_pipeline_test.py::TestExperiencePipeline::test_experience_pipeline	❌	2.2s
tests/buffer/experience_pipeline_test.py::TestExperiencePipeline::test_pass_rate_calculation	❌	732ms
tests/buffer/experience_storage_test.py::ExperienceStorageTest::test_sql_experience_buffer	✅	6.1s
tests/buffer/experience_storage_test.py::ExperienceStorageTest::test_sql_storage_0_sft	✅	5.0s
tests/buffer/experience_storage_test.py::ExperienceStorageTest::test_sql_storage_1_dpo	✅	5.5s
tests/buffer/file_test.py::TestFileBuffer::test_file_reader	❌	46ms
tests/buffer/file_test.py::TestFileBuffer::test_file_writer	❌	43ms
tests/buffer/formatter_test.py::TestFormatter::test_dpo_messages_formatter	✅	540ms
tests/buffer/formatter_test.py::TestFormatter::test_dpo_plaintext_formatter	✅	465ms
tests/buffer/formatter_test.py::TestFormatter::test_multi_modal_sft_formatter	✅	1.3s
tests/buffer/formatter_test.py::TestFormatter::test_sft_messages_formatter	✅	998ms
tests/buffer/formatter_test.py::TestFormatter::test_sft_plaintext_formatter	✅	761ms
tests/buffer/formatter_test.py::TestFormatter::test_task_formatter	✅	234ms
tests/buffer/queue_test.py::TestQueueBuffer::test_priority_queue_buffer_reuse	✅	6.4s
tests/buffer/queue_test.py::TestQueueBuffer::test_priority_queue_capacity	✅	2.4s
tests/buffer/queue_test.py::TestQueueBuffer::test_priority_queue_reuse_count_control	✅	4.5s
tests/buffer/queue_test.py::TestQueueBuffer::test_queue_buffer_0_queue	✅	3.4s
tests/buffer/queue_test.py::TestQueueBuffer::test_queue_buffer_1_priority_queue	✅	3.6s
tests/buffer/queue_test.py::TestQueueBuffer::test_queue_buffer_capacity	✅	4.0s
tests/buffer/reader_test.py::TestBufferReader::test_buffer_reader_registration	✅	613ms
tests/buffer/reward_shaping_mapper_test.py::TestRewardShapingMapper::test_basic_usage	✅	7ms
tests/buffer/sample_strategy_test.py::ExperienceStorageTest_0::test_default_queue_default_sample_strategy	❌	67ms
tests/buffer/sample_strategy_test.py::ExperienceStorageTest_0::test_default_queue_staleness_control_sample_strategy	❌	44ms
tests/buffer/sample_strategy_test.py::ExperienceStorageTest_0::test_priority_queue_default_sample_strategy	❌	44ms
tests/buffer/sample_strategy_test.py::ExperienceStorageTest_0::test_priority_queue_staleness_control_sample_strategy	❌	44ms
tests/buffer/sample_strategy_test.py::ExperienceStorageTest_0::test_sql_staleness_control_sample_strategy	❌	590ms
tests/buffer/sample_strategy_test.py::ExperienceStorageTest_1::test_default_queue_default_sample_strategy	❌	66ms
tests/buffer/sample_strategy_test.py::ExperienceStorageTest_1::test_default_queue_staleness_control_sample_strategy	❌	44ms
tests/buffer/sample_strategy_test.py::ExperienceStorageTest_1::test_priority_queue_default_sample_strategy	❌	43ms
tests/buffer/sample_strategy_test.py::ExperienceStorageTest_1::test_priority_queue_staleness_control_sample_strategy	❌	334ms
tests/buffer/sample_strategy_test.py::ExperienceStorageTest_1::test_sql_staleness_control_sample_strategy	❌	604ms
tests/buffer/sql_test.py::TestSQLBuffer::test_sql_exp_buffer_read_write_0	✅	5.8s
tests/buffer/sql_test.py::TestSQLBuffer::test_sql_exp_buffer_read_write_1	✅	2.8s
tests/buffer/sql_test.py::TestSQLBuffer::test_sql_task_buffer_read_write	✅	3.8s
tests/buffer/task_scheduler_test.py::TestTaskScheduler::test_task_scheduler_0	❌	46ms
tests/buffer/task_scheduler_test.py::TestTaskScheduler::test_task_scheduler_1	❌	44ms
tests/buffer/task_scheduler_test.py::TestTaskScheduler::test_task_scheduler_2	❌	44ms
tests/buffer/task_scheduler_test.py::TestTaskScheduler::test_task_scheduler_3	❌	44ms
tests/buffer/task_scheduler_test.py::TestTaskScheduler::test_task_scheduler_4	❌	44ms
tests/buffer/task_scheduler_test.py::TestTaskScheduler::test_task_scheduler_5	❌	44ms
tests/buffer/task_scheduler_test.py::TestTaskScheduler::test_task_scheduler_6	❌	44ms
tests/buffer/task_scheduler_test.py::TestTaskScheduler::test_task_scheduler_simple	❌	44ms
tests/buffer/task_storage_test.py::TaskStorageTest::test_read_task_0_file	✅	74ms
tests/buffer/task_storage_test.py::TaskStorageTest::test_read_task_1_sql	✅	3.0s
tests/buffer/task_storage_test.py::TaskStorageTest::test_read_task_2_file	✅	54ms
tests/buffer/task_storage_test.py::TaskStorageTest::test_read_task_3_sql	✅	3.4s
tests/buffer/task_storage_test.py::TaskStorageTest::test_read_task_4_file	✅	53ms
tests/buffer/task_storage_test.py::TaskStorageTest::test_read_task_5_sql	✅	3.7s
tests/cli/launcher_test.py::TestLauncherMain::test_debug_mode	❌	46ms
tests/cli/launcher_test.py::TestLauncherMain::test_main_run_command	❌	42ms
tests/cli/launcher_test.py::TestLauncherMain::test_main_run_in_dlc	❌	43ms
tests/cli/launcher_test.py::TestLauncherMain::test_main_studio_command	❌	42ms
tests/cli/launcher_test.py::TestLauncherMain::test_multi_stage_run	❌	42ms
tests/common/config_test.py::TestConfig::test_all_examples_are_valid	❌	49ms
tests/common/config_test.py::TestConfig::test_chat_template_path	❌	43ms
tests/common/config_test.py::TestConfig::test_config_flatten	✅	42ms
tests/common/config_test.py::TestConfig::test_continue_from_checkpoint_is_valid	❌	43ms
tests/common/config_test.py::TestConfig::test_default_workflow	❌	43ms
tests/common/config_test.py::TestConfig::test_load_default_config	❌	42ms
tests/common/config_test.py::TestConfig::test_max_token_len_per_gpu_set_correctly	❌	43ms
tests/common/config_test.py::TestConfig::test_optimizer_config_propagation	❌	42ms
tests/common/config_test.py::TestConfig::test_update_config_from_ray_cluster	✅	356ms
tests/common/experience_test.py::TestEID::test_eid_properties	✅	1ms
tests/common/experience_test.py::TestExperience::test_action_mask_and_logprobs_type	✅	1ms
tests/common/experience_test.py::TestExperience::test_assertions	✅	1ms
tests/common/experience_test.py::TestExperience::test_dpo_experience	✅	1ms
tests/common/experience_test.py::TestExperience::test_gather	✅	1ms
tests/common/experience_test.py::TestExperience::test_gather_with_token_level_reward	✅	1ms
tests/common/experience_test.py::TestExperience::test_hf_datasets_conversion	✅	15ms
tests/common/experience_test.py::TestExperience::test_multi_turn_experience	✅	1ms
tests/common/experience_test.py::TestExperience::test_serialize_deserialize	✅	1ms
tests/common/experience_test.py::TestExperience::test_single_turn_experience	✅	1ms
tests/common/experience_test.py::TestExperience::test_to_dict	✅	1ms
tests/common/experience_test.py::TestExperienceConversion::test_batch_conversion	✅	1ms
tests/common/experience_test.py::TestExperienceConversion::test_dpo_experience_batch_conversion	✅	1ms
tests/common/experience_test.py::TestExperienceConversion::test_experience_model_experience_conversion	✅	1ms
tests/common/experience_test.py::TestExperienceConversion::test_gather_experiences_with_custom_fields	✅	1ms
tests/common/experience_test.py::TestExperienceConversion::test_multiturn_experience_batch_converstion	✅	1ms
tests/common/vllm_test.py::ModelWrapperTest_0::test_generate	❌	584ms
tests/common/vllm_test.py::ModelWrapperTest_1::test_generate	❌	885ms
tests/common/vllm_test.py::ModelWrapperTest_2::test_generate	❌	885ms
tests/common/vllm_test.py::TestModelLen_0::test_model_len	❌	585ms
tests/common/vllm_test.py::TestModelLen_1::test_model_len	❌	585ms
tests/common/vllm_test.py::TestModelLen_2::test_model_len	❌	892ms
tests/common/vllm_test.py::TestModelLenWithoutPromptTruncation::test_model_len	❌	884ms
tests/common/vllm_test.py::TestAPIServer::test_api	❌	584ms
tests/common/vllm_test.py::TestLogprobs::test_logprobs_api	❌	885ms
tests/common/vllm_test.py::TestAsyncAPIServer::test_api_async	❌	585ms
tests/common/vllm_test.py::TestTokenizer::test_action_mask	✅	250ms
tests/common/vllm_test.py::TestTokenizer::test_action_mask_with_tools	✅	240ms
tests/common/vllm_test.py::TestAPIServerToolCall_0_deepseek_r1::test_api_tool_calls	❌	584ms
tests/common/vllm_test.py::TestAPIServerToolCall_1::test_api_tool_calls	❌	586ms
tests/common/vllm_test.py::TestSuperLongGeneration::test_generate	❌	586ms
tests/explorer/explorer_test.py::TestExplorerCountdownEval::test_explorer	❌	591ms
tests/explorer/explorer_test.py::TestExplorerGSM8KRULERNoEval::test_explorer	❌	592ms
tests/explorer/explorer_test.py::TestExplorerGSM8k::test_explorer	❌	593ms
tests/explorer/explorer_test.py::ServeTest::test_serve	❌	993ms
tests/explorer/proxy_test.py::RecorderTest::test_recorder	✅	52ms
tests/explorer/scheduler_test.py::SchedulerTest::test_async_workflow	❌	67ms
tests/explorer/scheduler_test.py::SchedulerTest::test_concurrent_operations	❌	50ms
tests/explorer/scheduler_test.py::SchedulerTest::test_dynamic_timeout	❌	51ms
tests/explorer/scheduler_test.py::SchedulerTest::test_get_results	❌	49ms
tests/explorer/scheduler_test.py::SchedulerTest::test_metric_calculation_with_non_repeatable_workflow_0	❌	49ms
tests/explorer/scheduler_test.py::SchedulerTest::test_metric_calculation_with_non_repeatable_workflow_1	❌	49ms
tests/explorer/scheduler_test.py::SchedulerTest::test_metric_calculation_with_repeatable_workflow_0	❌	48ms
tests/explorer/scheduler_test.py::SchedulerTest::test_metric_calculation_with_repeatable_workflow_1	❌	51ms
tests/explorer/scheduler_test.py::SchedulerTest::test_multi_step_execution	❌	50ms
tests/explorer/scheduler_test.py::SchedulerTest::test_non_repeatable_workflow	❌	49ms
tests/explorer/scheduler_test.py::SchedulerTest::test_over_rollout_min_wait	❌	52ms
tests/explorer/scheduler_test.py::SchedulerTest::test_scheduler_all_methods	❌	50ms
tests/explorer/scheduler_test.py::SchedulerTest::test_scheduler_restart_after_stop	❌	49ms
tests/explorer/scheduler_test.py::SchedulerTest::test_split_tasks	❌	48ms
tests/explorer/scheduler_test.py::SchedulerTest::test_stepwise_experience_eid	❌	49ms
tests/explorer/scheduler_test.py::SchedulerTest::test_wait_all	❌	48ms
tests/explorer/scheduler_test.py::SchedulerTest::test_wait_all_timeout_with_multi_batch	❌	49ms
tests/explorer/scheduler_test.py::TestRunnerStateCollection::test_runner_state_collection	❌	50ms
tests/explorer/step_wise_workflow_test.py::WorkflowTest::test_reward_propagation_workflow_0	✅	1ms
tests/explorer/step_wise_workflow_test.py::WorkflowTest::test_reward_propagation_workflow_1	✅	602ms
tests/explorer/step_wise_workflow_test.py::WorkflowTest::test_step_wise_reward_workflow_0	✅	1ms
tests/explorer/step_wise_workflow_test.py::WorkflowTest::test_step_wise_reward_workflow_1	✅	1.0s
tests/explorer/step_wise_workflow_test.py::WorkflowTest::test_workflows_raise_error	✅	1ms
tests/explorer/step_wise_workflow_test.py::WorkflowTest::test_workflows_stop_at_max_env_steps	✅	1.0s
tests/explorer/workflow_test.py::WorkflowTest::test_gsm8k_workflow	✅	35ms
tests/explorer/workflow_test.py::WorkflowTest::test_math_boxed_workflow	✅	24ms
tests/explorer/workflow_test.py::WorkflowTest::test_math_complex_workflow	✅	224ms
tests/explorer/workflow_test.py::WorkflowTest::test_math_eval_workflow	✅	4ms
tests/explorer/workflow_test.py::WorkflowTest::test_math_fraction_workflow	✅	13ms
tests/explorer/workflow_test.py::WorkflowTest::test_math_workflow	✅	9ms
tests/explorer/workflow_test.py::WorkflowTest::test_rm_gallery_workflow	✅	109ms
tests/explorer/workflow_test.py::WorkflowTest::test_workflow_repeatable_0	✅	1ms
tests/explorer/workflow_test.py::WorkflowTest::test_workflow_repeatable_1	✅	101ms
tests/explorer/workflow_test.py::WorkflowTest::test_workflow_resettable_0	✅	1ms
tests/explorer/workflow_test.py::WorkflowTest::test_workflow_resettable_1	✅	201ms
tests/explorer/workflow_test.py::MultiTurnWorkflowTest_0::test_multi_turn_workflow	❌	45ms
tests/explorer/workflow_test.py::MultiTurnWorkflowTest_1::test_multi_turn_workflow	❌	44ms
tests/explorer/workflow_test.py::TestWorkflowStateRecording::test_workflow_state_recording	✅	4.0s
tests/explorer/workflow_test.py::TestAgentScopeWorkflowAdapter::test_adapter	✅	527ms
tests/explorer/workflow_test.py::TestWorkflowRunner::test_workflow_runner	✅	823ms
tests/explorer/workflow_test.py::TestWorkflowRunner::test_workflow_runner_get_state	✅	8.1s
tests/explorer/workflow_test.py::TestWorkflowRunner::test_workflow_with_openai	❌	48ms
tests/manager/synchronizer_test.py::TestSynchronizerExit::test_synchronizer	❌	1.4s
tests/manager/synchronizer_test.py::TestStateDictBasedSynchronizer_0::test_synchronizer	❌	45ms
tests/manager/synchronizer_test.py::TestStateDictBasedSynchronizer_1::test_synchronizer	❌	43ms
tests/manager/synchronizer_test.py::TestStateDictBasedSynchronizer_2::test_synchronizer	❌	42ms
tests/manager/synchronizer_test.py::TestStateDictBasedSynchronizer_3::test_synchronizer	❌	43ms
tests/manager/synchronizer_test.py::TestNCCLBasedSynchronizer_0::test_synchronizer	❌	42ms
tests/manager/synchronizer_test.py::TestNCCLBasedSynchronizer_1::test_synchronizer	❌	44ms
tests/service/data_juicer_test.py::TestDataJuicer::test_config	✅	2.1s
tests/service/data_juicer_test.py::TestDataJuicer::test_server_start	✅	21.9s
tests/service/data_juicer_test.py::TestDataJuicerExperiencePipeline::test_data_juicer_operators	❌	50ms
tests/service/data_juicer_test.py::TestDataJuicerTaskPipeline::test_data_juicer_task_pipeline	❌	897ms
tests/trainer/trainer_test.py::TestTrainerCountdown_0_fsdp::test_trainer	❌	584ms
tests/trainer/trainer_test.py::TestTrainerCountdown_1_megatron::test_trainer	❌	583ms
tests/trainer/trainer_test.py::TestStepAheadAsyncRL::test_trainer	❌	903ms
tests/trainer/trainer_test.py::TestTrainerGSM8K_0_fsdp::test_trainer	❌	585ms
tests/trainer/trainer_test.py::TestTrainerGSM8K_1_fsdp2::test_trainer	❌	583ms
tests/trainer/trainer_test.py::TestTrainerGSM8K_2_fsdp::test_trainer	❌	883ms
tests/trainer/trainer_test.py::TestTrainerGSM8K_3_fsdp2::test_trainer	❌	883ms
tests/trainer/trainer_test.py::TestTrainerSFTWarmupGSM8K::test_trainer	❌	583ms
tests/trainer/trainer_test.py::TestTrainerDPO::test_trainer	❌	583ms
tests/trainer/trainer_test.py::TestTrainerSFT::test_trainer	❌	583ms
tests/trainer/trainer_test.py::TestTrainerToolsSFT::test_trainer_tools	❌	883ms
tests/trainer/trainer_test.py::TestFullyAsyncMode_0_fsdp::test_fully_async_mode	❌	43ms
tests/trainer/trainer_test.py::TestFullyAsyncMode_1_fsdp::test_fully_async_mode	❌	43ms
tests/trainer/trainer_test.py::TestFullyAsyncMode_2_megatron::test_fully_async_mode	❌	42ms
tests/trainer/trainer_test.py::TestTrainerCheckpointSave_0_fsdp::test_trainer	❌	43ms
tests/trainer/trainer_test.py::TestTrainerCheckpointSave_1_megatron::test_trainer	❌	42ms
tests/trainer/trainer_test.py::TestTrainerMIX::test_trainer	❌	586ms
tests/trainer/trainer_test.py::TestServeWithTrainer::test_serve_with_trainer	❌	983ms
tests/trainer/trainer_test.py::TestMultiModalGRPO::test_trainer	⏭️	611ms
tests/trainer/trainer_test.py::TestMultiModalSFT::test_trainer	⏭️	613ms
tests/trainer/trainer_test.py::TestTrainerLoRA::test_trainer	❌	586ms
tests/trainer/trainer_test.py::TestOverRollout::test_trainer	❌	884ms
tests/trainer/trainer_test.py::TestTrainerPromptTruncation::test_trainer	❌	583ms
tests/trainer/trainer_test.py::TestTinkerTrainer::test_trainer	⏭️	611ms
tests/utils/eval_utils_test.py::TestComputeScore::test_both_boxed_and_equivalent	✅	14ms
tests/utils/eval_utils_test.py::TestComputeScore::test_both_boxed_and_not_equivalent	✅	1ms
tests/utils/eval_utils_test.py::TestComputeScore::test_empty_ground_truth	✅	2ms
tests/utils/eval_utils_test.py::TestComputeScore::test_empty_solution_string	✅	1ms
tests/utils/eval_utils_test.py::TestComputeScore::test_multiple_boxed_answers_in_solution	✅	2ms
tests/utils/eval_utils_test.py::TestComputeScore::test_solution_boxed_truth_raw_and_equivalent	✅	1ms
tests/utils/eval_utils_test.py::TestComputeScore::test_solution_boxed_truth_raw_and_not_equivalent	✅	1ms
tests/utils/eval_utils_test.py::TestComputeScore::test_solution_not_boxed	✅	1ms
tests/utils/eval_utils_test.py::TestComputeScore::test_solution_raw_and_ground_truth_boxed_equivalent	✅	1ms
tests/utils/eval_utils_test.py::TestMathEvalUtils::test_extract_answer	✅	4ms
tests/utils/eval_utils_test.py::TestMathEvalUtils::test_verify_math_answer	✅	83ms
tests/utils/eval_utils_test.py::TestEvalUtils::test_is_equiv	✅	6ms
tests/utils/log_test.py::LogTest::test_actor_log	❌	43ms
tests/utils/log_test.py::LogTest::test_group_by_node	❌	42ms
tests/utils/log_test.py::LogTest::test_no_actor_log	❌	42ms
tests/utils/plugin_test.py::TestPluginLoader::test_load_plugins_local_0__workspace_tests_utils_plugins	❌	42ms
tests/utils/plugin_test.py::TestPluginLoader::test_load_plugins_local_1_tests_utils_plugins	❌	41ms
tests/utils/plugin_test.py::TestPluginLoader::test_load_plugins_remote_0__workspace_tests_utils_plugins	❌	42ms
tests/utils/plugin_test.py::TestPluginLoader::test_load_plugins_remote_1_tests_utils_plugins	❌	44ms
tests/utils/plugin_test.py::TestPluginLoader::test_passing_custom_class_0__workspace_tests_utils_plugins	❌	42ms
tests/utils/plugin_test.py::TestPluginLoader::test_passing_custom_class_1_tests_utils_plugins	❌	42ms
tests/utils/registry_test.py::TestRegistryWithRay::test_dynamic_import	✅	5.9s
tests/utils/registry_test.py::TestRegistry::test_algorithm_registry_mapping	✅	9ms
tests/utils/registry_test.py::TestRegistry::test_buffer_module_registry_mapping	✅	2ms
tests/utils/registry_test.py::TestRegistry::test_common_module_registry_mapping	✅	41ms
tests/utils/registry_test.py::TestRegistry::test_register_module	✅	1ms
tests/utils/registry_test.py::TestRegistry::test_utils_module_registry_mapping	✅	1ms
tests/utils/swanlab_test.py::TestSwanlabMonitor::test_swanlab_monitor_smoke	⏭️	1ms

Github Test Reporter by CTRF 💚

chenyushuo · 2025-12-25T07:56:49Z

/unittest-all

chenyushuo · 2025-12-25T11:12:36Z

/unittest-explorer

github-actions · 2025-12-25T12:25:05Z

Summary

Tests 📝	Passed ✅	Failed ❌	Skipped ⏭️	Other ❓	Flaky 🍂	Duration ⏱️
230	216	11	3	0	0	1h 10m

Failed Tests

Failed Tests ❌	Fail Message
❌ tests/buffer/sample_strategy_test.py::ExperienceStorageTest_0::test_default_queue_default_sample_strategy	The test failed in the call phase
❌ tests/buffer/sample_strategy_test.py::ExperienceStorageTest_0::test_default_queue_staleness_control_sample_strategy	The test failed in the call phase
❌ tests/buffer/sample_strategy_test.py::ExperienceStorageTest_0::test_priority_queue_default_sample_strategy	The test failed in the call phase
❌ tests/buffer/sample_strategy_test.py::ExperienceStorageTest_0::test_priority_queue_staleness_control_sample_strategy	The test failed in the call phase
❌ tests/buffer/sample_strategy_test.py::ExperienceStorageTest_1::test_default_queue_default_sample_strategy	The test failed in the call phase
❌ tests/buffer/sample_strategy_test.py::ExperienceStorageTest_1::test_default_queue_staleness_control_sample_strategy	The test failed in the call phase
❌ tests/buffer/sample_strategy_test.py::ExperienceStorageTest_1::test_priority_queue_default_sample_strategy	The test failed in the call phase
❌ tests/buffer/sample_strategy_test.py::ExperienceStorageTest_1::test_priority_queue_staleness_control_sample_strategy	The test failed in the call phase
❌ tests/trainer/trainer_test.py::TestTrainerCountdown_0_fsdp::test_trainer	The test failed in the call phase due to an assertion error
❌ tests/trainer/trainer_test.py::TestStepAheadAsyncRL::test_trainer	The test failed in the call phase due to an assertion error
❌ tests/trainer/trainer_test.py::TestTinkerTrainer::test_trainer	The test failed in the call phase

Skipped

Tests	Status
tests/trainer/trainer_test.py::TestMultiModalGRPO::test_trainer	skipped ⏭️
tests/trainer/trainer_test.py::TestMultiModalSFT::test_trainer	skipped ⏭️
tests/utils/swanlab_test.py::TestSwanlabMonitor::test_swanlab_monitor_smoke	skipped ⏭️

Tests

Test Name	Status	Duration
tests/algorithm/advantage_fn_test.py::TestGroupedAdvantageFn::test_batch_level_std_grpo	✅	39ms
tests/algorithm/advantage_fn_test.py::TestGroupedAdvantageFn::test_batch_level_step_wise_grpo_advantage	✅	3ms
tests/algorithm/advantage_fn_test.py::TestGroupedAdvantageFn::test_duplicate_grpo	✅	5ms
tests/algorithm/advantage_fn_test.py::TestGroupedAdvantageFn::test_grpo_advantage	✅	3ms
tests/algorithm/advantage_fn_test.py::TestGroupedAdvantageFn::test_grpo_correct_bias	✅	1ms
tests/algorithm/advantage_fn_test.py::TestGroupedAdvantageFn::test_grpo_reward_std	✅	1ms
tests/algorithm/advantage_fn_test.py::TestGroupedAdvantageFn::test_step_wise_grpo_advantage	✅	1ms
tests/algorithm/advantage_fn_test.py::TestGroupedAdvantageFn::test_step_wise_grpo_with_std_threshold	✅	2ms
tests/algorithm/kl_fn_test.py::KLFnTest::test_abs_kl_fn	✅	1ms
tests/algorithm/kl_fn_test.py::KLFnTest::test_corrected_k3_fallback	✅	1ms
tests/algorithm/kl_fn_test.py::KLFnTest::test_corrected_k3_loss	✅	1ms
tests/algorithm/kl_fn_test.py::KLFnTest::test_corrected_k3_same_policy	✅	1ms
tests/algorithm/kl_fn_test.py::KLFnTest::test_corrected_k3_with_old_logprob	✅	1ms
tests/algorithm/kl_fn_test.py::KLFnTest::test_dummy_kl_fn	✅	1ms
tests/algorithm/kl_fn_test.py::KLFnTest::test_k1_kl_fn	✅	1ms
tests/algorithm/kl_fn_test.py::KLFnTest::test_k2_kl_fn	✅	1ms
tests/algorithm/kl_fn_test.py::KLFnTest::test_k3_kl_fn	✅	1ms
tests/algorithm/kl_fn_test.py::KLFnTest::test_kl_loss_aggregation_modes	✅	1ms
tests/algorithm/kl_fn_test.py::KLFnTest::test_low_var_kl_fn	✅	1ms
tests/algorithm/policy_loss_test.py::VerlPolicyLossTest::test_dpo_policy_loss	✅	1ms
tests/algorithm/policy_loss_test.py::VerlPolicyLossTest::test_gspo_policy_loss	✅	1ms
tests/algorithm/policy_loss_test.py::VerlPolicyLossTest::test_mix_policy_loss	✅	3ms
tests/algorithm/policy_loss_test.py::VerlPolicyLossTest::test_opmd_policy_loss	✅	1ms
tests/algorithm/policy_loss_test.py::VerlPolicyLossTest::test_ppo_policy_loss	✅	1ms
tests/algorithm/policy_loss_test.py::VerlPolicyLossTest::test_ppo_policy_loss_with_sequence_masking	✅	1ms
tests/algorithm/policy_loss_test.py::VerlPolicyLossTest::test_sapo_policy_loss	✅	1ms
tests/algorithm/policy_loss_test.py::VerlPolicyLossTest::test_sft_policy_loss	✅	1ms
tests/buffer/experience_pipeline_test.py::TestExperiencePipeline::test_experience_pipeline	✅	9.6s
tests/buffer/experience_pipeline_test.py::TestExperiencePipeline::test_pass_rate_calculation	✅	6.3s
tests/buffer/experience_storage_test.py::ExperienceStorageTest::test_sql_experience_buffer	✅	2.9s
tests/buffer/experience_storage_test.py::ExperienceStorageTest::test_sql_storage_0_sft	✅	5.7s
tests/buffer/experience_storage_test.py::ExperienceStorageTest::test_sql_storage_1_dpo	✅	5.5s
tests/buffer/file_test.py::TestFileBuffer::test_file_reader	✅	144ms
tests/buffer/file_test.py::TestFileBuffer::test_file_writer	✅	1.9s
tests/buffer/formatter_test.py::TestFormatter::test_dpo_messages_formatter	✅	553ms
tests/buffer/formatter_test.py::TestFormatter::test_dpo_plaintext_formatter	✅	456ms
tests/buffer/formatter_test.py::TestFormatter::test_multi_modal_sft_formatter	✅	1.4s
tests/buffer/formatter_test.py::TestFormatter::test_sft_messages_formatter	✅	982ms
tests/buffer/formatter_test.py::TestFormatter::test_sft_plaintext_formatter	✅	725ms
tests/buffer/formatter_test.py::TestFormatter::test_task_formatter	✅	233ms
tests/buffer/queue_test.py::TestQueueBuffer::test_priority_queue_buffer_reuse	✅	6.7s
tests/buffer/queue_test.py::TestQueueBuffer::test_priority_queue_capacity	✅	2.7s
tests/buffer/queue_test.py::TestQueueBuffer::test_priority_queue_reuse_count_control	✅	4.6s
tests/buffer/queue_test.py::TestQueueBuffer::test_queue_buffer_0_queue	✅	3.5s
tests/buffer/queue_test.py::TestQueueBuffer::test_queue_buffer_1_priority_queue	✅	3.5s
tests/buffer/queue_test.py::TestQueueBuffer::test_queue_buffer_capacity	✅	4.2s
tests/buffer/reader_test.py::TestBufferReader::test_buffer_reader_registration	✅	586ms
tests/buffer/reward_shaping_mapper_test.py::TestRewardShapingMapper::test_basic_usage	✅	6ms
tests/buffer/sample_strategy_test.py::ExperienceStorageTest_0::test_default_queue_default_sample_strategy	❌	1.6s
tests/buffer/sample_strategy_test.py::ExperienceStorageTest_0::test_default_queue_staleness_control_sample_strategy	❌	1.8s
tests/buffer/sample_strategy_test.py::ExperienceStorageTest_0::test_priority_queue_default_sample_strategy	❌	1.6s
tests/buffer/sample_strategy_test.py::ExperienceStorageTest_0::test_priority_queue_staleness_control_sample_strategy	❌	1.8s
tests/buffer/sample_strategy_test.py::ExperienceStorageTest_0::test_sql_staleness_control_sample_strategy	✅	4.9s
tests/buffer/sample_strategy_test.py::ExperienceStorageTest_1::test_default_queue_default_sample_strategy	❌	1.7s
tests/buffer/sample_strategy_test.py::ExperienceStorageTest_1::test_default_queue_staleness_control_sample_strategy	❌	1.8s
tests/buffer/sample_strategy_test.py::ExperienceStorageTest_1::test_priority_queue_default_sample_strategy	❌	1.8s
tests/buffer/sample_strategy_test.py::ExperienceStorageTest_1::test_priority_queue_staleness_control_sample_strategy	❌	1.8s
tests/buffer/sample_strategy_test.py::ExperienceStorageTest_1::test_sql_staleness_control_sample_strategy	✅	4.5s
tests/buffer/sql_test.py::TestSQLBuffer::test_sql_exp_buffer_read_write_0	✅	6.0s
tests/buffer/sql_test.py::TestSQLBuffer::test_sql_exp_buffer_read_write_1	✅	3.0s
tests/buffer/sql_test.py::TestSQLBuffer::test_sql_task_buffer_read_write	✅	3.4s
tests/buffer/task_scheduler_test.py::TestTaskScheduler::test_task_scheduler_0	✅	71ms
tests/buffer/task_scheduler_test.py::TestTaskScheduler::test_task_scheduler_1	✅	55ms
tests/buffer/task_scheduler_test.py::TestTaskScheduler::test_task_scheduler_2	✅	88ms
tests/buffer/task_scheduler_test.py::TestTaskScheduler::test_task_scheduler_3	✅	88ms
tests/buffer/task_scheduler_test.py::TestTaskScheduler::test_task_scheduler_4	✅	89ms
tests/buffer/task_scheduler_test.py::TestTaskScheduler::test_task_scheduler_5	✅	92ms
tests/buffer/task_scheduler_test.py::TestTaskScheduler::test_task_scheduler_6	✅	106ms
tests/buffer/task_scheduler_test.py::TestTaskScheduler::test_task_scheduler_simple	✅	45ms
tests/buffer/task_storage_test.py::TaskStorageTest::test_read_task_0_file	✅	61ms
tests/buffer/task_storage_test.py::TaskStorageTest::test_read_task_1_sql	✅	3.1s
tests/buffer/task_storage_test.py::TaskStorageTest::test_read_task_2_file	✅	40ms
tests/buffer/task_storage_test.py::TaskStorageTest::test_read_task_3_sql	✅	3.1s
tests/buffer/task_storage_test.py::TaskStorageTest::test_read_task_4_file	✅	40ms
tests/buffer/task_storage_test.py::TaskStorageTest::test_read_task_5_sql	✅	3.6s
tests/cli/launcher_test.py::TestLauncherMain::test_debug_mode	✅	1m 15s
tests/cli/launcher_test.py::TestLauncherMain::test_main_run_command	✅	6.1s
tests/cli/launcher_test.py::TestLauncherMain::test_main_run_in_dlc	✅	1.4s
tests/cli/launcher_test.py::TestLauncherMain::test_main_studio_command	✅	295ms
tests/cli/launcher_test.py::TestLauncherMain::test_multi_stage_run	✅	1.7s
tests/common/config_test.py::TestConfig::test_all_examples_are_valid	✅	33.8s
tests/common/config_test.py::TestConfig::test_chat_template_path	✅	68ms
tests/common/config_test.py::TestConfig::test_config_flatten	✅	30ms
tests/common/config_test.py::TestConfig::test_continue_from_checkpoint_is_valid	✅	136ms
tests/common/config_test.py::TestConfig::test_default_workflow	✅	66ms
tests/common/config_test.py::TestConfig::test_load_default_config	✅	28.7s
tests/common/config_test.py::TestConfig::test_max_token_len_per_gpu_set_correctly	✅	70ms
tests/common/config_test.py::TestConfig::test_optimizer_config_propagation	✅	69ms
tests/common/config_test.py::TestConfig::test_update_config_from_ray_cluster	✅	128ms
tests/common/experience_test.py::TestEID::test_eid_properties	✅	1ms
tests/common/experience_test.py::TestExperience::test_action_mask_and_logprobs_type	✅	1ms
tests/common/experience_test.py::TestExperience::test_assertions	✅	1ms
tests/common/experience_test.py::TestExperience::test_dpo_experience	✅	1ms
tests/common/experience_test.py::TestExperience::test_gather	✅	1ms
tests/common/experience_test.py::TestExperience::test_gather_with_token_level_reward	✅	1ms
tests/common/experience_test.py::TestExperience::test_hf_datasets_conversion	✅	14ms
tests/common/experience_test.py::TestExperience::test_multi_turn_experience	✅	1ms
tests/common/experience_test.py::TestExperience::test_serialize_deserialize	✅	1ms
tests/common/experience_test.py::TestExperience::test_single_turn_experience	✅	1ms
tests/common/experience_test.py::TestExperience::test_to_dict	✅	1ms
tests/common/experience_test.py::TestExperienceConversion::test_batch_conversion	✅	1ms
tests/common/experience_test.py::TestExperienceConversion::test_dpo_experience_batch_conversion	✅	1ms
tests/common/experience_test.py::TestExperienceConversion::test_experience_model_experience_conversion	✅	1ms
tests/common/experience_test.py::TestExperienceConversion::test_gather_experiences_with_custom_fields	✅	1ms
tests/common/experience_test.py::TestExperienceConversion::test_multiturn_experience_batch_converstion	✅	1ms
tests/common/vllm_test.py::ModelWrapperTest_0::test_generate	✅	1m 13s
tests/common/vllm_test.py::ModelWrapperTest_1::test_generate	✅	1m 5s
tests/common/vllm_test.py::ModelWrapperTest_2::test_generate	✅	1m 11s
tests/common/vllm_test.py::TestModelLen_0::test_model_len	✅	57.4s
tests/common/vllm_test.py::TestModelLen_1::test_model_len	✅	21.6s
tests/common/vllm_test.py::TestModelLen_2::test_model_len	✅	50.9s
tests/common/vllm_test.py::TestModelLenWithoutPromptTruncation::test_model_len	✅	51.4s
tests/common/vllm_test.py::TestAPIServer::test_api	✅	24.1s
tests/common/vllm_test.py::TestLogprobs::test_logprobs_api	✅	21.4s
tests/common/vllm_test.py::TestAsyncAPIServer::test_api_async	✅	23.5s
tests/common/vllm_test.py::TestTokenizer::test_action_mask	✅	277ms
tests/common/vllm_test.py::TestTokenizer::test_action_mask_with_tools	✅	256ms
tests/common/vllm_test.py::TestAPIServerToolCall_0_deepseek_r1::test_api_tool_calls	✅	52.7s
tests/common/vllm_test.py::TestAPIServerToolCall_1::test_api_tool_calls	✅	52.0s
tests/common/vllm_test.py::TestSuperLongGeneration::test_generate	✅	1m 53s
tests/explorer/explorer_test.py::TestExplorerCountdownEval::test_explorer	✅	48.4s
tests/explorer/explorer_test.py::TestExplorerGSM8KRULERNoEval::test_explorer	✅	1m 17s
tests/explorer/explorer_test.py::TestExplorerGSM8k::test_explorer	✅	3m 31s
tests/explorer/explorer_test.py::ServeTest::test_serve	✅	1m 29s
tests/explorer/proxy_test.py::RecorderTest::test_recorder	✅	91ms
tests/explorer/scheduler_test.py::SchedulerTest::test_async_workflow	✅	4.8s
tests/explorer/scheduler_test.py::SchedulerTest::test_concurrent_operations	✅	4.3s
tests/explorer/scheduler_test.py::SchedulerTest::test_dynamic_timeout	✅	12.1s
tests/explorer/scheduler_test.py::SchedulerTest::test_get_results	✅	19.6s
tests/explorer/scheduler_test.py::SchedulerTest::test_metric_calculation_with_non_repeatable_workflow_0	✅	4.1s
tests/explorer/scheduler_test.py::SchedulerTest::test_metric_calculation_with_non_repeatable_workflow_1	✅	4.0s
tests/explorer/scheduler_test.py::SchedulerTest::test_metric_calculation_with_repeatable_workflow_0	✅	4.0s
tests/explorer/scheduler_test.py::SchedulerTest::test_metric_calculation_with_repeatable_workflow_1	✅	4.1s
tests/explorer/scheduler_test.py::SchedulerTest::test_multi_step_execution	✅	4.7s
tests/explorer/scheduler_test.py::SchedulerTest::test_non_repeatable_workflow	✅	4.0s
tests/explorer/scheduler_test.py::SchedulerTest::test_over_rollout_min_wait	✅	8.0s
tests/explorer/scheduler_test.py::SchedulerTest::test_scheduler_all_methods	✅	14.1s
tests/explorer/scheduler_test.py::SchedulerTest::test_scheduler_restart_after_stop	✅	7.8s
tests/explorer/scheduler_test.py::SchedulerTest::test_split_tasks	✅	7.3s
tests/explorer/scheduler_test.py::SchedulerTest::test_stepwise_experience_eid	✅	24.3s
tests/explorer/scheduler_test.py::SchedulerTest::test_wait_all	✅	7.0s
tests/explorer/scheduler_test.py::SchedulerTest::test_wait_all_timeout_with_multi_batch	✅	12.7s
tests/explorer/scheduler_test.py::TestRunnerStateCollection::test_runner_state_collection	✅	9.2s
tests/explorer/step_wise_workflow_test.py::WorkflowTest::test_reward_propagation_workflow_0	✅	1ms
tests/explorer/step_wise_workflow_test.py::WorkflowTest::test_reward_propagation_workflow_1	✅	601ms
tests/explorer/step_wise_workflow_test.py::WorkflowTest::test_step_wise_reward_workflow_0	✅	1ms
tests/explorer/step_wise_workflow_test.py::WorkflowTest::test_step_wise_reward_workflow_1	✅	1.0s
tests/explorer/step_wise_workflow_test.py::WorkflowTest::test_workflows_raise_error	✅	1ms
tests/explorer/step_wise_workflow_test.py::WorkflowTest::test_workflows_stop_at_max_env_steps	✅	1.0s
tests/explorer/workflow_test.py::WorkflowTest::test_gsm8k_workflow	✅	12ms
tests/explorer/workflow_test.py::WorkflowTest::test_math_boxed_workflow	✅	18ms
tests/explorer/workflow_test.py::WorkflowTest::test_math_complex_workflow	✅	115ms
tests/explorer/workflow_test.py::WorkflowTest::test_math_eval_workflow	✅	3ms
tests/explorer/workflow_test.py::WorkflowTest::test_math_fraction_workflow	✅	9ms
tests/explorer/workflow_test.py::WorkflowTest::test_math_workflow	✅	7ms
tests/explorer/workflow_test.py::WorkflowTest::test_rm_gallery_workflow	✅	116ms
tests/explorer/workflow_test.py::WorkflowTest::test_workflow_repeatable_0	✅	1ms
tests/explorer/workflow_test.py::WorkflowTest::test_workflow_repeatable_1	✅	100ms
tests/explorer/workflow_test.py::WorkflowTest::test_workflow_resettable_0	✅	1ms
tests/explorer/workflow_test.py::WorkflowTest::test_workflow_resettable_1	✅	201ms
tests/explorer/workflow_test.py::MultiTurnWorkflowTest_0::test_multi_turn_workflow	✅	20.7s
tests/explorer/workflow_test.py::MultiTurnWorkflowTest_1::test_multi_turn_workflow	✅	20.3s
tests/explorer/workflow_test.py::TestWorkflowStateRecording::test_workflow_state_recording	✅	4.0s
tests/explorer/workflow_test.py::TestAgentScopeWorkflowAdapter::test_adapter	✅	1.3s
tests/explorer/workflow_test.py::TestWorkflowRunner::test_workflow_runner	✅	159ms
tests/explorer/workflow_test.py::TestWorkflowRunner::test_workflow_runner_get_state	✅	8.0s
tests/explorer/workflow_test.py::TestWorkflowRunner::test_workflow_with_openai	✅	21.7s
tests/manager/synchronizer_test.py::TestSynchronizerExit::test_synchronizer	✅	29.1s
tests/manager/synchronizer_test.py::TestStateDictBasedSynchronizer_0::test_synchronizer	✅	1m 23s
tests/manager/synchronizer_test.py::TestStateDictBasedSynchronizer_1::test_synchronizer	✅	1m 29s
tests/manager/synchronizer_test.py::TestStateDictBasedSynchronizer_2::test_synchronizer	✅	2m 3s
tests/manager/synchronizer_test.py::TestStateDictBasedSynchronizer_3::test_synchronizer	✅	1m 29s
tests/manager/synchronizer_test.py::TestNCCLBasedSynchronizer_0::test_synchronizer	✅	1m 7s
tests/manager/synchronizer_test.py::TestNCCLBasedSynchronizer_1::test_synchronizer	✅	1m 7s
tests/service/data_juicer_test.py::TestDataJuicer::test_config	✅	1.2s
tests/service/data_juicer_test.py::TestDataJuicer::test_server_start	✅	21.6s
tests/service/data_juicer_test.py::TestDataJuicerExperiencePipeline::test_data_juicer_operators	✅	20.2s
tests/service/data_juicer_test.py::TestDataJuicerTaskPipeline::test_data_juicer_task_pipeline	✅	14.3s
tests/trainer/trainer_test.py::TestTrainerCountdown_0_fsdp::test_trainer	❌	2m 58s
tests/trainer/trainer_test.py::TestTrainerCountdown_1_megatron::test_trainer	✅	3m 41s
tests/trainer/trainer_test.py::TestStepAheadAsyncRL::test_trainer	❌	1m 3s
tests/trainer/trainer_test.py::TestTrainerGSM8K_0_fsdp::test_trainer	✅	49.2s
tests/trainer/trainer_test.py::TestTrainerGSM8K_1_fsdp2::test_trainer	✅	49.8s
tests/trainer/trainer_test.py::TestTrainerGSM8K_2_fsdp::test_trainer	✅	53.6s
tests/trainer/trainer_test.py::TestTrainerGSM8K_3_fsdp2::test_trainer	✅	1m 1s
tests/trainer/trainer_test.py::TestTrainerSFTWarmupGSM8K::test_trainer	✅	1m 37s
tests/trainer/trainer_test.py::TestTrainerDPO::test_trainer	✅	37.9s
tests/trainer/trainer_test.py::TestTrainerSFT::test_trainer	✅	34.9s
tests/trainer/trainer_test.py::TestTrainerToolsSFT::test_trainer_tools	✅	36.1s
tests/trainer/trainer_test.py::TestFullyAsyncMode_0_fsdp::test_fully_async_mode	✅	1m 34s
tests/trainer/trainer_test.py::TestFullyAsyncMode_1_fsdp::test_fully_async_mode	✅	1m 34s
tests/trainer/trainer_test.py::TestFullyAsyncMode_2_megatron::test_fully_async_mode	✅	2m 20s
tests/trainer/trainer_test.py::TestTrainerCheckpointSave_0_fsdp::test_trainer	✅	1m 36s
tests/trainer/trainer_test.py::TestTrainerCheckpointSave_1_megatron::test_trainer	✅	3m 26s
tests/trainer/trainer_test.py::TestTrainerMIX::test_trainer	✅	1m 33s
tests/trainer/trainer_test.py::TestServeWithTrainer::test_serve_with_trainer	✅	1m 24s
tests/trainer/trainer_test.py::TestMultiModalGRPO::test_trainer	⏭️	1.9s
tests/trainer/trainer_test.py::TestMultiModalSFT::test_trainer	⏭️	551ms
tests/trainer/trainer_test.py::TestTrainerLoRA::test_trainer	✅	3m 32s
tests/trainer/trainer_test.py::TestOverRollout::test_trainer	✅	47.3s
tests/trainer/trainer_test.py::TestTrainerPromptTruncation::test_trainer	✅	43.2s
tests/trainer/trainer_test.py::TestTinkerTrainer::test_trainer	❌	949ms
tests/utils/eval_utils_test.py::TestComputeScore::test_both_boxed_and_equivalent	✅	11ms
tests/utils/eval_utils_test.py::TestComputeScore::test_both_boxed_and_not_equivalent	✅	1ms
tests/utils/eval_utils_test.py::TestComputeScore::test_empty_ground_truth	✅	1ms
tests/utils/eval_utils_test.py::TestComputeScore::test_empty_solution_string	✅	1ms
tests/utils/eval_utils_test.py::TestComputeScore::test_multiple_boxed_answers_in_solution	✅	1ms
tests/utils/eval_utils_test.py::TestComputeScore::test_solution_boxed_truth_raw_and_equivalent	✅	1ms
tests/utils/eval_utils_test.py::TestComputeScore::test_solution_boxed_truth_raw_and_not_equivalent	✅	1ms
tests/utils/eval_utils_test.py::TestComputeScore::test_solution_not_boxed	✅	1ms
tests/utils/eval_utils_test.py::TestComputeScore::test_solution_raw_and_ground_truth_boxed_equivalent	✅	1ms
tests/utils/eval_utils_test.py::TestMathEvalUtils::test_extract_answer	✅	2ms
tests/utils/eval_utils_test.py::TestMathEvalUtils::test_verify_math_answer	✅	58ms
tests/utils/eval_utils_test.py::TestEvalUtils::test_is_equiv	✅	4ms
tests/utils/log_test.py::LogTest::test_actor_log	✅	2.2s
tests/utils/log_test.py::LogTest::test_group_by_node	✅	2.1s
tests/utils/log_test.py::LogTest::test_no_actor_log	✅	625ms
tests/utils/plugin_test.py::TestPluginLoader::test_load_plugins_local_0__workspace_tests_utils_plugins	✅	71ms
tests/utils/plugin_test.py::TestPluginLoader::test_load_plugins_local_1_tests_utils_plugins	✅	67ms
tests/utils/plugin_test.py::TestPluginLoader::test_load_plugins_remote_0__workspace_tests_utils_plugins	✅	5.5s
tests/utils/plugin_test.py::TestPluginLoader::test_load_plugins_remote_1_tests_utils_plugins	✅	5.6s
tests/utils/plugin_test.py::TestPluginLoader::test_passing_custom_class_0__workspace_tests_utils_plugins	✅	3.3s
tests/utils/plugin_test.py::TestPluginLoader::test_passing_custom_class_1_tests_utils_plugins	✅	3.2s
tests/utils/registry_test.py::TestRegistryWithRay::test_dynamic_import	✅	2.8s
tests/utils/registry_test.py::TestRegistry::test_algorithm_registry_mapping	✅	3ms
tests/utils/registry_test.py::TestRegistry::test_buffer_module_registry_mapping	✅	1ms
tests/utils/registry_test.py::TestRegistry::test_common_module_registry_mapping	✅	49ms
tests/utils/registry_test.py::TestRegistry::test_register_module	✅	1ms
tests/utils/registry_test.py::TestRegistry::test_utils_module_registry_mapping	✅	1ms
tests/utils/swanlab_test.py::TestSwanlabMonitor::test_swanlab_monitor_smoke	⏭️	1ms

chenyushuo · 2025-12-25T12:31:15Z

/unittest-module-buffer

chenyushuo · 2025-12-25T12:34:32Z

/unittest-module-trainer

github-actions · 2025-12-25T12:35:07Z

Summary

Tests 📝	Passed ✅	Failed ❌	Skipped ⏭️	Other ❓	Flaky 🍂	Duration ⏱️
48	48	0	0	0	0	2m 1s

Tests

Test Name	Status	Duration
tests/buffer/experience_pipeline_test.py::TestExperiencePipeline::test_experience_pipeline	✅	12.6s
tests/buffer/experience_pipeline_test.py::TestExperiencePipeline::test_pass_rate_calculation	✅	4.6s
tests/buffer/experience_storage_test.py::ExperienceStorageTest::test_sql_experience_buffer	✅	2.8s
tests/buffer/experience_storage_test.py::ExperienceStorageTest::test_sql_storage_0_sft	✅	4.8s
tests/buffer/experience_storage_test.py::ExperienceStorageTest::test_sql_storage_1_dpo	✅	5.3s
tests/buffer/file_test.py::TestFileBuffer::test_file_reader	✅	439ms
tests/buffer/file_test.py::TestFileBuffer::test_file_writer	✅	1.9s
tests/buffer/formatter_test.py::TestFormatter::test_dpo_messages_formatter	✅	568ms
tests/buffer/formatter_test.py::TestFormatter::test_dpo_plaintext_formatter	✅	482ms
tests/buffer/formatter_test.py::TestFormatter::test_multi_modal_sft_formatter	✅	1.4s
tests/buffer/formatter_test.py::TestFormatter::test_sft_messages_formatter	✅	1.0s
tests/buffer/formatter_test.py::TestFormatter::test_sft_plaintext_formatter	✅	750ms
tests/buffer/formatter_test.py::TestFormatter::test_task_formatter	✅	245ms
tests/buffer/queue_test.py::TestQueueBuffer::test_priority_queue_buffer_reuse	✅	6.6s
tests/buffer/queue_test.py::TestQueueBuffer::test_priority_queue_capacity	✅	2.5s
tests/buffer/queue_test.py::TestQueueBuffer::test_priority_queue_reuse_count_control	✅	4.7s
tests/buffer/queue_test.py::TestQueueBuffer::test_queue_buffer_0_queue	✅	3.7s
tests/buffer/queue_test.py::TestQueueBuffer::test_queue_buffer_1_priority_queue	✅	3.5s
tests/buffer/queue_test.py::TestQueueBuffer::test_queue_buffer_capacity	✅	4.1s
tests/buffer/reader_test.py::TestBufferReader::test_buffer_reader_registration	✅	547ms
tests/buffer/reward_shaping_mapper_test.py::TestRewardShapingMapper::test_basic_usage	✅	7ms
tests/buffer/sample_strategy_test.py::ExperienceStorageTest_0::test_default_queue_default_sample_strategy	✅	2.5s
tests/buffer/sample_strategy_test.py::ExperienceStorageTest_0::test_default_queue_staleness_control_sample_strategy	✅	2.2s
tests/buffer/sample_strategy_test.py::ExperienceStorageTest_0::test_priority_queue_default_sample_strategy	✅	2.1s
tests/buffer/sample_strategy_test.py::ExperienceStorageTest_0::test_priority_queue_staleness_control_sample_strategy	✅	2.2s
tests/buffer/sample_strategy_test.py::ExperienceStorageTest_0::test_sql_staleness_control_sample_strategy	✅	5.7s
tests/buffer/sample_strategy_test.py::ExperienceStorageTest_1::test_default_queue_default_sample_strategy	✅	2.2s
tests/buffer/sample_strategy_test.py::ExperienceStorageTest_1::test_default_queue_staleness_control_sample_strategy	✅	2.2s
tests/buffer/sample_strategy_test.py::ExperienceStorageTest_1::test_priority_queue_default_sample_strategy	✅	2.1s
tests/buffer/sample_strategy_test.py::ExperienceStorageTest_1::test_priority_queue_staleness_control_sample_strategy	✅	2.2s
tests/buffer/sample_strategy_test.py::ExperienceStorageTest_1::test_sql_staleness_control_sample_strategy	✅	3.9s
tests/buffer/sql_test.py::TestSQLBuffer::test_sql_exp_buffer_read_write_0	✅	5.9s
tests/buffer/sql_test.py::TestSQLBuffer::test_sql_exp_buffer_read_write_1	✅	3.1s
tests/buffer/sql_test.py::TestSQLBuffer::test_sql_task_buffer_read_write	✅	3.5s
tests/buffer/task_scheduler_test.py::TestTaskScheduler::test_task_scheduler_0	✅	72ms
tests/buffer/task_scheduler_test.py::TestTaskScheduler::test_task_scheduler_1	✅	56ms
tests/buffer/task_scheduler_test.py::TestTaskScheduler::test_task_scheduler_2	✅	89ms
tests/buffer/task_scheduler_test.py::TestTaskScheduler::test_task_scheduler_3	✅	89ms
tests/buffer/task_scheduler_test.py::TestTaskScheduler::test_task_scheduler_4	✅	89ms
tests/buffer/task_scheduler_test.py::TestTaskScheduler::test_task_scheduler_5	✅	92ms
tests/buffer/task_scheduler_test.py::TestTaskScheduler::test_task_scheduler_6	✅	107ms
tests/buffer/task_scheduler_test.py::TestTaskScheduler::test_task_scheduler_simple	✅	44ms
tests/buffer/task_storage_test.py::TaskStorageTest::test_read_task_0_file	✅	61ms
tests/buffer/task_storage_test.py::TaskStorageTest::test_read_task_1_sql	✅	3.1s
tests/buffer/task_storage_test.py::TaskStorageTest::test_read_task_2_file	✅	62ms
tests/buffer/task_storage_test.py::TaskStorageTest::test_read_task_3_sql	✅	3.6s
tests/buffer/task_storage_test.py::TaskStorageTest::test_read_task_4_file	✅	38ms
tests/buffer/task_storage_test.py::TaskStorageTest::test_read_task_5_sql	✅	3.7s

Github Test Reporter by CTRF 💚

github-actions · 2025-12-25T13:12:07Z

Summary

Tests 📝	Passed ✅	Failed ❌	Skipped ⏭️	Other ❓	Flaky 🍂	Duration ⏱️
24	21	0	3	0	0	35m 2s

Skipped

Tests	Status
tests/trainer/trainer_test.py::TestMultiModalGRPO::test_trainer	skipped ⏭️
tests/trainer/trainer_test.py::TestMultiModalSFT::test_trainer	skipped ⏭️
tests/trainer/trainer_test.py::TestTinkerTrainer::test_trainer	skipped ⏭️

Tests

Test Name	Status	Duration
tests/trainer/trainer_test.py::TestTrainerCountdown_0_fsdp::test_trainer	✅	3m 42s
tests/trainer/trainer_test.py::TestTrainerCountdown_1_megatron::test_trainer	✅	3m 39s
tests/trainer/trainer_test.py::TestStepAheadAsyncRL::test_trainer	✅	1m 46s
tests/trainer/trainer_test.py::TestTrainerGSM8K_0_fsdp::test_trainer	✅	1m 19s
tests/trainer/trainer_test.py::TestTrainerGSM8K_1_fsdp2::test_trainer	✅	49.0s
tests/trainer/trainer_test.py::TestTrainerGSM8K_2_fsdp::test_trainer	✅	53.4s
tests/trainer/trainer_test.py::TestTrainerGSM8K_3_fsdp2::test_trainer	✅	1m
tests/trainer/trainer_test.py::TestTrainerSFTWarmupGSM8K::test_trainer	✅	1m 37s
tests/trainer/trainer_test.py::TestTrainerDPO::test_trainer	✅	37.6s
tests/trainer/trainer_test.py::TestTrainerSFT::test_trainer	✅	34.3s
tests/trainer/trainer_test.py::TestTrainerToolsSFT::test_trainer_tools	✅	34.4s
tests/trainer/trainer_test.py::TestFullyAsyncMode_0_fsdp::test_fully_async_mode	✅	1m 34s
tests/trainer/trainer_test.py::TestFullyAsyncMode_1_fsdp::test_fully_async_mode	✅	1m 35s
tests/trainer/trainer_test.py::TestFullyAsyncMode_2_megatron::test_fully_async_mode	✅	2m 18s
tests/trainer/trainer_test.py::TestTrainerCheckpointSave_0_fsdp::test_trainer	✅	1m 36s
tests/trainer/trainer_test.py::TestTrainerCheckpointSave_1_megatron::test_trainer	✅	3m 30s
tests/trainer/trainer_test.py::TestTrainerMIX::test_trainer	✅	1m 33s
tests/trainer/trainer_test.py::TestServeWithTrainer::test_serve_with_trainer	✅	1m 22s
tests/trainer/trainer_test.py::TestMultiModalGRPO::test_trainer	⏭️	535ms
tests/trainer/trainer_test.py::TestMultiModalSFT::test_trainer	⏭️	533ms
tests/trainer/trainer_test.py::TestTrainerLoRA::test_trainer	✅	2m 47s
tests/trainer/trainer_test.py::TestOverRollout::test_trainer	✅	45.9s
tests/trainer/trainer_test.py::TestTrainerPromptTruncation::test_trainer	✅	1m 13s
tests/trainer/trainer_test.py::TestTinkerTrainer::test_trainer	⏭️	540ms

trinity/trainer/verl/utils.py

trinity/common/config.py

trinity/manager/synchronizer.py

trinity/trainer/tinker/utils.py

trinity/trainer/tinker_trainer.py

chenyushuo · 2025-12-26T06:58:33Z

/unittest-all

pan-x-c · 2025-12-26T07:20:12Z

/gemini review

gemini-code-assist

Code Review

This pull request introduces a new tinker backend, which enables model training on devices without GPUs. This is a significant addition and involves substantial refactoring of the data pipeline, particularly around how experiences are handled (List[Experience] instead of the Experiences batch class). The changes include new configuration options, a TinkerModel for inference, a TinkerTrainerWrapper for training, and updates to documentation and examples. The implementation looks solid, but I've found a critical bug in file handling within the new trainer and a couple of medium-severity issues related to configuration clarity and code maintainability. Overall, great work on adding this new capability.

trinity/trainer/tinker_trainer.py

examples/tinker/README.md

examples/tinker/tinker.yaml

trinity/common/models/tinker_model.py

examples/tinker/README.md

chenyushuo · 2025-12-26T08:17:48Z

/unittest-module-explorer

chenyushuo · 2025-12-26T08:21:29Z

/unittest-module-trainer

chenyushuo · 2025-12-26T08:22:03Z

/unittest-module-algorithm

chenyushuo · 2025-12-26T08:22:09Z

/unittest-module-buffer

chenyushuo · 2025-12-26T08:22:18Z

/unittest-module-cli

chenyushuo · 2025-12-26T08:22:36Z

/unittest-module-common

github-actions · 2025-12-26T08:32:17Z

Summary

Tests 📝	Passed ✅	Failed ❌	Skipped ⏭️	Other ❓	Flaky 🍂	Duration ⏱️
47	47	0	0	0	0	12m 32s

Tests

Test Name	Status	Duration
tests/explorer/explorer_test.py::TestExplorerCountdownEval::test_explorer	✅	1m 35s
tests/explorer/explorer_test.py::TestExplorerGSM8KRULERNoEval::test_explorer	✅	1m 21s
tests/explorer/explorer_test.py::TestExplorerGSM8k::test_explorer	✅	3m 38s
tests/explorer/explorer_test.py::ServeTest::test_serve	✅	1m 23s
tests/explorer/proxy_test.py::RecorderTest::test_recorder	✅	66ms
tests/explorer/scheduler_test.py::SchedulerTest::test_async_workflow	✅	4.2s
tests/explorer/scheduler_test.py::SchedulerTest::test_concurrent_operations	✅	4.3s
tests/explorer/scheduler_test.py::SchedulerTest::test_dynamic_timeout	✅	12.1s
tests/explorer/scheduler_test.py::SchedulerTest::test_get_results	✅	19.8s
tests/explorer/scheduler_test.py::SchedulerTest::test_metric_calculation_with_non_repeatable_workflow_0	✅	4.0s
tests/explorer/scheduler_test.py::SchedulerTest::test_metric_calculation_with_non_repeatable_workflow_1	✅	4.0s
tests/explorer/scheduler_test.py::SchedulerTest::test_metric_calculation_with_repeatable_workflow_0	✅	4.0s
tests/explorer/scheduler_test.py::SchedulerTest::test_metric_calculation_with_repeatable_workflow_1	✅	3.9s
tests/explorer/scheduler_test.py::SchedulerTest::test_multi_step_execution	✅	4.4s
tests/explorer/scheduler_test.py::SchedulerTest::test_non_repeatable_workflow	✅	4.4s
tests/explorer/scheduler_test.py::SchedulerTest::test_over_rollout_min_wait	✅	8.0s
tests/explorer/scheduler_test.py::SchedulerTest::test_scheduler_all_methods	✅	14.0s
tests/explorer/scheduler_test.py::SchedulerTest::test_scheduler_restart_after_stop	✅	7.8s
tests/explorer/scheduler_test.py::SchedulerTest::test_split_tasks	✅	7.3s
tests/explorer/scheduler_test.py::SchedulerTest::test_stepwise_experience_eid	✅	24.1s
tests/explorer/scheduler_test.py::SchedulerTest::test_wait_all	✅	7.0s
tests/explorer/scheduler_test.py::SchedulerTest::test_wait_all_timeout_with_multi_batch	✅	13.4s
tests/explorer/scheduler_test.py::TestRunnerStateCollection::test_runner_state_collection	✅	9.1s
tests/explorer/step_wise_workflow_test.py::WorkflowTest::test_reward_propagation_workflow_0	✅	1ms
tests/explorer/step_wise_workflow_test.py::WorkflowTest::test_reward_propagation_workflow_1	✅	601ms
tests/explorer/step_wise_workflow_test.py::WorkflowTest::test_step_wise_reward_workflow_0	✅	1ms
tests/explorer/step_wise_workflow_test.py::WorkflowTest::test_step_wise_reward_workflow_1	✅	1.0s
tests/explorer/step_wise_workflow_test.py::WorkflowTest::test_workflows_raise_error	✅	1ms
tests/explorer/step_wise_workflow_test.py::WorkflowTest::test_workflows_stop_at_max_env_steps	✅	1.0s
tests/explorer/workflow_test.py::WorkflowTest::test_gsm8k_workflow	✅	25ms
tests/explorer/workflow_test.py::WorkflowTest::test_math_boxed_workflow	✅	15ms
tests/explorer/workflow_test.py::WorkflowTest::test_math_complex_workflow	✅	127ms
tests/explorer/workflow_test.py::WorkflowTest::test_math_eval_workflow	✅	2ms
tests/explorer/workflow_test.py::WorkflowTest::test_math_fraction_workflow	✅	10ms
tests/explorer/workflow_test.py::WorkflowTest::test_math_workflow	✅	7ms
tests/explorer/workflow_test.py::WorkflowTest::test_rm_gallery_workflow	✅	131ms
tests/explorer/workflow_test.py::WorkflowTest::test_workflow_repeatable_0	✅	1ms
tests/explorer/workflow_test.py::WorkflowTest::test_workflow_repeatable_1	✅	100ms
tests/explorer/workflow_test.py::WorkflowTest::test_workflow_resettable_0	✅	1ms
tests/explorer/workflow_test.py::WorkflowTest::test_workflow_resettable_1	✅	201ms
tests/explorer/workflow_test.py::MultiTurnWorkflowTest_0::test_multi_turn_workflow	✅	49.9s
tests/explorer/workflow_test.py::MultiTurnWorkflowTest_1::test_multi_turn_workflow	✅	20.9s
tests/explorer/workflow_test.py::TestWorkflowStateRecording::test_workflow_state_recording	✅	4.0s
tests/explorer/workflow_test.py::TestAgentScopeWorkflowAdapter::test_adapter	✅	710ms
tests/explorer/workflow_test.py::TestWorkflowRunner::test_workflow_runner	✅	163ms
tests/explorer/workflow_test.py::TestWorkflowRunner::test_workflow_runner_get_state	✅	8.0s
tests/explorer/workflow_test.py::TestWorkflowRunner::test_workflow_with_openai	✅	22.5s

github-actions · 2025-12-26T09:12:49Z

Summary

Tests 📝	Passed ✅	Failed ❌	Skipped ⏭️	Other ❓	Flaky 🍂	Duration ⏱️
24	19	2	3	0	0	34m 53s

Failed Tests

Failed Tests ❌	Fail Message
❌ tests/trainer/trainer_test.py::TestStepAheadAsyncRL::test_trainer	The test failed in the call phase due to an assertion error
❌ tests/trainer/trainer_test.py::TestServeWithTrainer::test_serve_with_trainer	The test failed in the call phase due to an assertion error

Skipped

Tests	Status
tests/trainer/trainer_test.py::TestMultiModalGRPO::test_trainer	skipped ⏭️
tests/trainer/trainer_test.py::TestMultiModalSFT::test_trainer	skipped ⏭️
tests/trainer/trainer_test.py::TestTinkerTrainer::test_trainer	skipped ⏭️

Tests

Test Name	Status	Duration
tests/trainer/trainer_test.py::TestTrainerCountdown_0_fsdp::test_trainer	✅	2m 56s
tests/trainer/trainer_test.py::TestTrainerCountdown_1_megatron::test_trainer	✅	4m 22s
tests/trainer/trainer_test.py::TestStepAheadAsyncRL::test_trainer	❌	1m 41s
tests/trainer/trainer_test.py::TestTrainerGSM8K_0_fsdp::test_trainer	✅	1m 19s
tests/trainer/trainer_test.py::TestTrainerGSM8K_1_fsdp2::test_trainer	✅	48.8s
tests/trainer/trainer_test.py::TestTrainerGSM8K_2_fsdp::test_trainer	✅	53.4s
tests/trainer/trainer_test.py::TestTrainerGSM8K_3_fsdp2::test_trainer	✅	1m
tests/trainer/trainer_test.py::TestTrainerSFTWarmupGSM8K::test_trainer	✅	1m 40s
tests/trainer/trainer_test.py::TestTrainerDPO::test_trainer	✅	37.7s
tests/trainer/trainer_test.py::TestTrainerSFT::test_trainer	✅	34.5s
tests/trainer/trainer_test.py::TestTrainerToolsSFT::test_trainer_tools	✅	34.6s
tests/trainer/trainer_test.py::TestFullyAsyncMode_0_fsdp::test_fully_async_mode	✅	1m 33s
tests/trainer/trainer_test.py::TestFullyAsyncMode_1_fsdp::test_fully_async_mode	✅	1m 33s
tests/trainer/trainer_test.py::TestFullyAsyncMode_2_megatron::test_fully_async_mode	✅	2m 16s
tests/trainer/trainer_test.py::TestTrainerCheckpointSave_0_fsdp::test_trainer	✅	1m 37s
tests/trainer/trainer_test.py::TestTrainerCheckpointSave_1_megatron::test_trainer	✅	3m 32s
tests/trainer/trainer_test.py::TestTrainerMIX::test_trainer	✅	1m 30s
tests/trainer/trainer_test.py::TestServeWithTrainer::test_serve_with_trainer	❌	1m 24s
tests/trainer/trainer_test.py::TestMultiModalGRPO::test_trainer	⏭️	535ms
tests/trainer/trainer_test.py::TestMultiModalSFT::test_trainer	⏭️	532ms
tests/trainer/trainer_test.py::TestTrainerLoRA::test_trainer	✅	2m 47s
tests/trainer/trainer_test.py::TestOverRollout::test_trainer	✅	46.5s
tests/trainer/trainer_test.py::TestTrainerPromptTruncation::test_trainer	✅	1m 13s
tests/trainer/trainer_test.py::TestTinkerTrainer::test_trainer	⏭️	538ms

chenyushuo · 2025-12-26T10:30:50Z

/unittest-module-trainer

Add tinker backend.

8eda2af

chenyushuo changed the base branch from main to dev December 24, 2025 10:16

chenyushuo changed the base branch from dev to main December 24, 2025 10:16

gemini-code-assist bot reviewed Dec 24, 2025

trinity/trainer/tinker_trainer.py (outdated, resolved)

trinity/common/models/tinker_model.py (resolved)

trinity/common/config.py (outdated, resolved)

chenyushuo added 3 commits December 25, 2025 14:19

1. add tinker unittest
2. fix dyn sync in trainer and explorer
3. fix entropy in tinker trainer
4. add `tinker_base_model` to `InferenceModelConfig`

11cf729

Merge branch 'main' of github.com:modelscope/Trinity-RFT into dev/add_tinker_backend

0e8a2b4

bug fix

c6000f2

chenyushuo added 2 commits December 25, 2025 15:44

fix import

3b6f8a0

Merge branch 'main' of github.com:modelscope/Trinity-RFT into dev/add_tinker_backend

7edde21

fix unittest

d1d450c

chenyushuo added 3 commits December 25, 2025 19:06

Refactor Experiences to List[Experience]

5565309

Merge branch 'main' of github.com:modelscope/Trinity-RFT into dev/add_tinker_backend

548f937

fix pre-commit

0bc0398

fix unittest

10dcadf

pan-x-c reviewed Dec 25, 2025

chenyushuo changed the title ~~[WIP] Add tinker backend.~~ Add tinker backend. Dec 26, 2025

chenyushuo added 2 commits December 26, 2025 11:48

Add document and apply reviews

9a4927f

update document

f029cbd

gemini-code-assist bot reviewed Dec 26, 2025

trinity/trainer/tinker_trainer.py (outdated, resolved)

examples/tinker/README.md (resolved)

examples/tinker/tinker.yaml (resolved)

trinity/common/models/tinker_model.py (outdated, resolved)

yanxi-chen reviewed Dec 26, 2025

examples/tinker/README.md (outdated, resolved)

yanxi-chen reviewed Dec 26, 2025

examples/tinker/README.md (outdated, resolved)

examples/tinker/README.md (outdated, resolved)

update docs and apply reviews

ba4b34e

fix unittest and separate verl dependency

7160a35
