
[Feature] Add max-inflight guard for remote policy clients #3897

Draft
vmoens wants to merge 4 commits into gh/vmoens/292/base from gh/vmoens/292/head



[ghstack-poisoned]

pytorch-bot Bot commented Jun 21, 2026


🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/rl/3897

Note: Links to docs will display an error until the docs builds have been completed.

❗ 1 Active SEVs

There is 1 currently active SEV. If your PR is affected, please view it below:

❌ 7 New Failures, 1 Cancelled Job

As of commit 7653b7d with merge base b660f05 (image):

NEW FAILURES - The following jobs have failed:

CANCELLED JOB - The following job was cancelled. Please retry:

This comment was automatically generated by Dr. CI and updates every 15 minutes.


github-actions Bot commented Jun 21, 2026

Contributor

Benchmark Results: PR 7653b7dc vs main 8396c61f

Benchmark run: https://github.com/pytorch/rl/actions/runs/28074360335

Higher ops/sec is better. Tables are sorted by largest absolute change.
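
As a reading aid, the Change column appears to be the relative difference of the PR against main, and the "over 5%" counts appear to use a 5% threshold on that value. The sketch below illustrates this under those assumptions; the function names `percent_change` and `classify` are hypothetical and are not taken from the benchmark workflow. Because the table shows rounded ops/sec, recomputing a row from the displayed numbers can differ from the reported change in the last digit.

```python
def percent_change(main_ops: float, pr_ops: float) -> float:
    """Relative change of the PR against main, in percent (assumed formula)."""
    return (pr_ops - main_ops) / main_ops * 100.0


def classify(change_pct: float, threshold_pct: float = 5.0) -> str:
    """Label a change using the 5% threshold mentioned in the summary lines."""
    if change_pct < -threshold_pct:
        return "regression"  # fewer ops/sec than main, beyond the threshold
    if change_pct > threshold_pct:
        return "improvement"  # more ops/sec than main, beyond the threshold
    return "unchanged"


# First CPU row: test_rb_populate[...ListStorage-RandomSampler-400]
change = percent_change(192.35, 37.14)
print(f"{change:+.2f}% -> {classify(change)}")  # -80.69% -> regression
```

Since higher ops/sec is better, a negative change is a slowdown and a positive change is a speedup.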

CPU

Compared 216 benchmarks. Regressions over 5%: 43. Improvements over 5%: 12.

Benchmark main ops PR ops Change
benchmarks/test_replaybuffer_benchmark.py::test_rb_populate[TensorDictReplayBuffer-ListStorage-RandomSampler-400] 192.35 37.14 -80.69%
benchmarks/test_objectives_benchmarks.py::test_sac_speed[False-backward] 54.05 88.25 +63.28%
benchmarks/test_objectives_benchmarks.py::test_td3_speed[True-backward] 225.51 281.80 +24.96%
benchmarks/test_envs_benchmark.py::test_cat_frames_functional[4-same] 24.26 28.41 +17.10%
benchmarks/test_replaybuffer_benchmark.py::test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-10000] 3,689 3,083 -16.42%
benchmarks/test_replaybuffer_benchmark.py::test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-10000] 3,457 2,915 -15.67%
benchmarks/test_replaybuffer_benchmark.py::test_rb_sample[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-10000] 2,333 2,007 -13.97%
benchmarks/test_replaybuffer_benchmark.py::test_rb_iterate[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-10000] 3,636 3,172 -12.77%
benchmarks/test_envs_benchmark.py::test_cat_frames_functional[16-same] 22.53 19.70 -12.57%
benchmarks/test_objectives_benchmarks.py::test_dqn_speed[True-backward] 873.31 977.01 +11.87%
benchmarks/test_replaybuffer_benchmark.py::test_rb_populate[TensorDictPrioritizedReplayBuffer-ListStorage-None-400] 49.85 55.69 +11.70%
benchmarks/test_replaybuffer_benchmark.py::test_rb_populate[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-400] 774.00 864.58 +11.70%
benchmarks/test_replaybuffer_benchmark.py::test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-10000] 3,228 3,586 +11.08%
benchmarks/test_envs_benchmark.py::test_step_mdp_speed[True-True-True-False-False] 77,433 69,013 -10.87%
benchmarks/test_replaybuffer_benchmark.py::test_rb_sample[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-10000] 2,276 2,038 -10.46%
benchmarks/test_envs_benchmark.py::test_step_mdp_speed[True-False-True-False-True] 38,304 34,326 -10.39%
benchmarks/test_envs_benchmark.py::test_step_mdp_speed[True-True-False-False-True] 37,621 33,731 -10.34%
benchmarks/test_objectives_benchmarks.py::test_gae_speed[vec_generalized_advantage_estimate-True-32-512] 32.36 29.25 -9.62%
benchmarks/test_envs_benchmark.py::test_step_mdp_speed[False-False-False-False-True] 28,964 26,230 -9.44%
benchmarks/test_envs_benchmark.py::test_step_mdp_speed[True-False-True-True-False] 34,456 31,263 -9.26%
benchmarks/test_envs_benchmark.py::test_step_mdp_speed[True-False-False-False-False] 55,378 50,268 -9.23%
benchmarks/test_envs_benchmark.py::test_step_mdp_speed[True-False-True-False-False] 64,657 58,692 -9.23%
benchmarks/test_vla_preprocessing_benchmark.py::test_openvla_preprocessing_throughput[torchvision-480-640-64] 7.2281 6.5618 -9.22%
benchmarks/test_envs_benchmark.py::test_step_mdp_speed[True-True-True-True-False] 43,196 39,342 -8.92%
benchmarks/test_envs_benchmark.py::test_step_mdp_speed[True-True-False-True-True] 22,038 20,077 -8.89%
benchmarks/test_envs_benchmark.py::test_step_mdp_speed[True-True-False-True-False] 38,267 34,950 -8.67%
benchmarks/test_envs_benchmark.py::test_step_mdp_speed[True-True-False-False-False] 64,377 58,818 -8.64%
benchmarks/test_envs_benchmark.py::test_step_mdp_speed[False-False-True-False-False] 50,286 45,945 -8.63%
benchmarks/test_envs_benchmark.py::test_step_mdp_speed[True-False-False-False-True] 34,883 31,879 -8.61%
benchmarks/test_envs_benchmark.py::test_step_mdp_speed[False-True-True-True-True] 20,608 18,847 -8.55%
benchmarks/test_envs_benchmark.py::test_step_mdp_speed[True-False-False-True-True] 19,663 17,993 -8.49%
benchmarks/test_envs_benchmark.py::test_step_mdp_speed[False-False-False-True-False] 27,595 25,261 -8.46%
benchmarks/test_envs_benchmark.py::test_step_mdp_speed[False-False-False-False-False] 45,021 41,410 -8.02%
benchmarks/test_envs_benchmark.py::test_step_mdp_speed[False-False-True-True-False] 29,256 26,912 -8.01%
benchmarks/test_envs_benchmark.py::test_step_mdp_speed[False-False-False-True-True] 18,072 16,635 -7.95%
benchmarks/test_envs_benchmark.py::test_step_mdp_speed[True-False-True-True-True] 21,009 19,342 -7.94%
benchmarks/test_storage_write_benchmark.py::TestStorageWriteBenchmark::test_collector_stack_then_write[200-img_shape3-large_batch] 133.00 143.44 +7.85%
benchmarks/test_envs_benchmark.py::test_step_mdp_speed[False-True-False-True-True] 19,872 18,316 -7.83%
benchmarks/test_replaybuffer_benchmark.py::test_rb_populate[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-400] 500.01 538.96 +7.79%
benchmarks/test_envs_benchmark.py::test_step_mdp_speed[True-False-False-True-False] 31,513 29,108 -7.63%
benchmarks/test_envs_benchmark.py::test_step_mdp_speed[False-True-False-False-False] 49,737 45,948 -7.62%
benchmarks/test_replaybuffer_benchmark.py::test_rb_populate[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-400] 492.78 530.22 +7.60%
benchmarks/test_envs_benchmark.py::test_step_mdp_speed[False-False-True-True-True] 18,589 17,184 -7.56%
benchmarks/test_envs_benchmark.py::test_step_mdp_speed[False-True-False-False-True] 30,585 28,375 -7.23%
benchmarks/test_envs_benchmark.py::test_step_mdp_speed[False-False-True-False-True] 30,182 28,221 -6.50%
benchmarks/test_storage_write_benchmark.py::TestStorageWriteBenchmark::test_storage_write_lazystack[50-img_shape0-small] 4,185 4,453 +6.40%
benchmarks/test_envs_benchmark.py::test_step_mdp_speed[False-True-False-True-False] 31,694 29,681 -6.35%
benchmarks/test_envs_benchmark.py::test_step_mdp_speed[False-True-True-False-True] 32,766 30,686 -6.35%
benchmarks/test_envs_benchmark.py::test_parallel 0.9748 0.9143 -6.21%
benchmarks/test_replaybuffer_benchmark.py::test_rb_populate[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-400] 515.90 484.65 -6.06%
benchmarks/test_envs_benchmark.py::test_step_mdp_speed[False-True-True-False-False] 56,522 53,281 -5.73%
benchmarks/test_non_tensor_env_benchmark.py::test_non_tensor_env_rollout_speed[1000-single-True] 1.3708 1.2928 -5.69%
benchmarks/test_envs_benchmark.py::test_step_mdp_speed[True-True-True-True-True] 23,442 22,163 -5.46%
benchmarks/test_objectives_benchmarks.py::test_dqn_speed[reduce-overhead-None] 1,736 1,824 +5.06%
benchmarks/test_envs_benchmark.py::test_step_mdp_speed[True-True-True-False-True] 41,601 39,510 -5.02%
benchmarks/test_objectives_benchmarks.py::test_redq_deprec_speed[True-None] 270.83 283.57 +4.70%
benchmarks/test_storage_write_benchmark.py::TestStorageWriteBenchmark::test_storage_write_lazystack[100-img_shape1-atari] 687.29 719.18 +4.64%
benchmarks/test_objectives_benchmarks.py::test_ppo_speed[True-backward] 109.75 114.68 +4.49%
benchmarks/test_objectives_benchmarks.py::test_ddpg_speed[True-backward] 396.17 413.26 +4.31%
benchmarks/test_replaybuffer_benchmark.py::test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-10000] 3,034 3,163 +4.24%
benchmarks/test_objectives_benchmarks.py::test_redq_speed[reduce-overhead-None] 220.80 230.14 +4.23%
benchmarks/test_rnn_reset_backends_benchmark.py::test_rnn_rollout_with_intermediate_resets[b256-t128-i32-h512-cudnn-True-0-gru] 1.4285 1.4884 +4.19%
benchmarks/test_objectives_benchmarks.py::test_values[td0_return_estimate-False-False] 8,071 7,743 -4.06%
benchmarks/test_replaybuffer_benchmark.py::test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-1000000-10000-100-True] 22.85 23.77 +4.01%
benchmarks/test_vla_preprocessing_benchmark.py::test_openvla_preprocessing_throughput[torchvision-256-256-4] 167.41 161.05 -3.80%
benchmarks/test_compressed_storage_benchmark.py::TestCompressedStorageBenchmark::test_tensor_to_bytestream_speed[untyped_storage] 7.9421 8.2143 +3.43%
benchmarks/test_envs_benchmark.py::test_cat_frames_functional[4-constant] 4,397 4,247 -3.42%
benchmarks/test_objectives_benchmarks.py::test_gae_speed[vec_generalized_advantage_estimate-True-1-512] 685.29 662.73 -3.29%
benchmarks/test_storage_write_benchmark.py::TestStorageWriteBenchmark::test_storage_write_lazystack[200-img_shape3-large_batch] 330.34 340.95 +3.21%
benchmarks/test_replaybuffer_benchmark.py::test_rb_populate[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-400] 1,080 1,047 -3.06%
benchmarks/test_rnn_reset_backends_benchmark.py::test_rnn_rollout_with_intermediate_resets[b256-t128-i32-h512-cudnn-True-0-lstm] 0.9361 0.9647 +3.06%
benchmarks/test_storage_write_benchmark.py::TestCollectorIntegrationBenchmark::test_collector_without_rb[100-img_shape0-atari] 29.61 28.74 -2.94%
benchmarks/test_storage_write_benchmark.py::TestStorageWriteBenchmark::test_storage_write_contiguous[50-img_shape0-small] 6,987 7,188 +2.87%
benchmarks/test_replaybuffer_benchmark.py::TestPrioritizedReplayBufferBenchmark::test_sampler_sample_scale[1000000-cpu] 96.55 99.31 +2.86%
benchmarks/test_storage_write_benchmark.py::TestStorageWriteBenchmark::test_storage_write_contiguous[100-img_shape2-large_img] 574.49 558.55 -2.77%
benchmarks/test_rnn_reset_backends_benchmark.py::test_rnn_rollout_with_intermediate_resets[b256-t128-i32-h512-scan-True-0-gru] 4.1604 4.2736 +2.72%
benchmarks/test_storage_write_benchmark.py::TestStorageWriteBenchmark::test_collector_lazystack_then_write[100-img_shape1-atari] 637.93 655.03 +2.68%
benchmarks/test_objectives_benchmarks.py::test_cql_speed[True-backward] 60.05 58.47 -2.64%
benchmarks/test_replaybuffer_benchmark.py::test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-sampler7-10000] 829.61 807.99 -2.61%
benchmarks/test_objectives_benchmarks.py::test_gae_speed[vec_generalized_advantage_estimate-False-1-512] 2,184 2,240 +2.56%
benchmarks/test_vla_preprocessing_benchmark.py::test_openvla_preprocessing_throughput[torchvision-256-256-16] 43.13 44.19 +2.46%
benchmarks/test_replaybuffer_benchmark.py::test_rb_iterate[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-10000] 3,410 3,330 -2.37%
benchmarks/test_objectives_benchmarks.py::test_redq_deprec_speed[False-backward] 63.70 62.25 -2.28%
benchmarks/test_rnn_reset_backends_benchmark.py::test_rnn_rollout_with_intermediate_resets[b256-t128-i32-h512-scan-False-0-gru] 2.9854 3.0529 +2.26%
benchmarks/test_vla_preprocessing_benchmark.py::test_openvla_preprocessing_throughput[pil-256-256-1] 188.24 192.21 +2.11%
benchmarks/test_objectives_benchmarks.py::test_td3_speed[True-None] 543.54 554.60 +2.03%
benchmarks/test_non_tensor_env_benchmark.py::test_non_tensor_env_rollout_speed[1000-single-False] 1.6047 1.5731 -1.97%
benchmarks/test_objectives_benchmarks.py::test_iql_speed[True-backward] 60.58 59.40 -1.96%
benchmarks/test_storage_write_benchmark.py::TestCollectorIntegrationBenchmark::test_collector_with_rb[100-img_shape0-atari] 25.63 25.13 -1.95%
benchmarks/test_collectors_benchmark.py::test_single 8.9667 8.7966 -1.90%
benchmarks/test_storage_write_benchmark.py::TestStorageWriteBenchmark::test_collector_lazystack_then_write[200-img_shape3-large_batch] 309.40 315.17 +1.86%
benchmarks/test_collectors_benchmark.py::test_single_with_rb 8.7234 8.5675 -1.79%
benchmarks/test_objectives_benchmarks.py::test_a2c_speed[True-None] 288.26 293.15 +1.70%
benchmarks/test_rnn_reset_backends_benchmark.py::test_rnn_rollout_with_intermediate_resets[b256-t128-i32-h512-cudnn-False-0-lstm] 0.8898 0.8749 -1.68%
benchmarks/test_storage_write_benchmark.py::TestCollectorIntegrationBenchmark::test_collector_without_rb[200-img_shape1-large_batch] 14.82 14.58 -1.61%
benchmarks/test_storage_write_benchmark.py::TestStorageWriteBenchmark::test_collector_lazystack_then_write[100-img_shape2-large_img] 402.59 408.84 +1.55%
benchmarks/test_objectives_benchmarks.py::test_cql_speed[True-None] 85.82 84.50 -1.54%
benchmarks/test_objectives_benchmarks.py::test_ppo_speed[False-backward] 78.03 79.19 +1.49%
benchmarks/test_non_tensor_env_benchmark.py::test_non_tensor_env_rollout_speed[1000-serial-buffers-True] 0.5175 0.5099 -1.47%
benchmarks/test_objectives_benchmarks.py::test_ddpg_speed[False-backward] 240.29 243.81 +1.47%
benchmarks/test_storage_write_benchmark.py::TestStorageWriteBenchmark::test_collector_lazystack_then_write[50-img_shape0-small] 3,483 3,432 -1.46%
benchmarks/test_replaybuffer_benchmark.py::test_rb_iterate[TensorDictReplayBuffer-ListStorage-RandomSampler-4000] 167.21 164.80 -1.44%
benchmarks/test_replaybuffer_benchmark.py::TestPrioritizedReplayBufferBenchmark::test_sampler_sample_scale[10000000-cpu] 51.09 51.81 +1.41%
benchmarks/test_non_tensor_env_benchmark.py::test_non_tensor_env_rollout_speed[1000-serial-buffers-False] 0.5959 0.5876 -1.40%
benchmarks/test_envs_benchmark.py::test_transformed 0.8895 0.9018 +1.38%
benchmarks/test_non_tensor_env_benchmark.py::test_non_tensor_env_rollout_speed[1000-serial-no-buffers-True] 0.5986 0.5904 -1.37%
benchmarks/test_objectives_benchmarks.py::test_ppo_speed[reduce-overhead-None] 268.04 264.38 -1.37%
benchmarks/test_storage_write_benchmark.py::TestStorageWriteBenchmark::test_storage_write_contiguous[200-img_shape3-large_batch] 768.98 758.51 -1.36%
benchmarks/test_objectives_benchmarks.py::test_cql_speed[False-backward] 28.04 27.66 -1.35%
benchmarks/test_objectives_benchmarks.py::test_cql_speed[False-None] 38.33 37.82 -1.35%
benchmarks/test_vla_preprocessing_benchmark.py::test_openvla_preprocessing_throughput[torchvision-256-256-64] 10.92 10.78 -1.32%
benchmarks/test_replaybuffer_benchmark.py::test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-10000-10000-100-False] 53.77 53.06 -1.32%
benchmarks/test_rnn_reset_backends_benchmark.py::test_rnn_rollout_with_intermediate_resets[b256-t128-i32-h512-scan-False-0-lstm] 2.0500 2.0233 -1.31%
benchmarks/test_objectives_benchmarks.py::test_redq_deprec_speed[False-None] 89.83 88.68 -1.28%
benchmarks/test_objectives_benchmarks.py::test_ppo_speed[True-None] 265.96 262.64 -1.25%
benchmarks/test_vla_preprocessing_benchmark.py::test_openvla_preprocessing_throughput[pil-224-224-16] 18.04 17.81 -1.23%
benchmarks/test_storage_write_benchmark.py::TestStorageWriteBenchmark::test_storage_write_contiguous[100-img_shape1-atari] 5,287 5,351 +1.20%
benchmarks/test_storage_write_benchmark.py::TestStorageWriteBenchmark::test_collector_stack_then_write[100-img_shape1-atari] 275.32 278.62 +1.20%
benchmarks/test_replaybuffer_benchmark.py::test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-sampler6-10000] 777.62 768.54 -1.17%
benchmarks/test_objectives_benchmarks.py::test_redq_speed[True-None] 229.45 226.80 -1.15%
... ... ... Showing 120 of 216 comparisons, sorted by absolute change.

GPU

Compared 226 benchmarks. Regressions over 5%: 8. Improvements over 5%: 17.

Benchmark main ops PR ops Change
benchmarks/test_replaybuffer_benchmark.py::test_rb_populate[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-400] 50.36 482.35 +857.74%
benchmarks/test_replaybuffer_benchmark.py::test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-100000-10000-100-False] 28.99 52.86 +82.34%
benchmarks/test_replaybuffer_benchmark.py::test_rb_populate[TensorDictReplayBuffer-ListStorage-RandomSampler-400] 199.21 40.08 -79.88%
benchmarks/test_replaybuffer_benchmark.py::test_rb_populate[TensorDictPrioritizedReplayBuffer-ListStorage-None-400] 189.96 51.37 -72.96%
benchmarks/test_replaybuffer_benchmark.py::test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-10000] 2,816 3,662 +30.06%
benchmarks/test_replaybuffer_benchmark.py::test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-10000] 2,667 3,373 +26.46%
benchmarks/test_replaybuffer_benchmark.py::test_rb_iterate[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-10000] 3,625 2,756 -23.97%
benchmarks/test_objectives_benchmarks.py::test_redq_deprec_speed[reduce-overhead-None] 107.23 86.12 -19.69%
benchmarks/test_replaybuffer_benchmark.py::test_rb_populate[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-400] 733.46 855.62 +16.66%
benchmarks/test_replaybuffer_benchmark.py::test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-SamplerWithoutReplacement-10000] 3,227 3,679 +14.03%
benchmarks/test_vla_preprocessing_benchmark.py::test_openvla_preprocessing_throughput[torchvision-480-640-64] 6.5096 7.2334 +11.12%
benchmarks/test_replaybuffer_benchmark.py::test_rb_iterate[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-10000] 2,721 2,478 -8.92%
benchmarks/test_replaybuffer_benchmark.py::test_rb_sample[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-10000] 1,986 2,142 +7.87%
benchmarks/test_objectives_benchmarks.py::test_td3_speed[True-backward] 369.30 396.64 +7.40%
benchmarks/test_non_tensor_env_benchmark.py::test_non_tensor_env_rollout_speed[1000-parallel-buffers-False] 0.5858 0.6219 +6.16%
benchmarks/test_objectives_benchmarks.py::test_td3_speed[False-None] 107.27 113.69 +5.99%
benchmarks/test_objectives_benchmarks.py::test_ppo_speed[True-backward] 358.53 337.57 -5.85%
benchmarks/test_objectives_benchmarks.py::test_values[vec_generalized_advantage_estimate-True-True] 294.01 310.84 +5.72%
benchmarks/test_envs_benchmark.py::test_step_mdp_speed[True-True-True-True-True] 24,911 23,533 -5.53%
benchmarks/test_compressed_storage_benchmark.py::TestCompressedStorageBenchmark::test_tensor_to_bytestream_speed[pickle] 11,895 12,540 +5.42%
benchmarks/test_objectives_benchmarks.py::test_a2c_speed[True-backward] 366.24 346.42 -5.41%
benchmarks/test_envs_benchmark.py::test_step_mdp_speed[False-False-False-False-True] 28,619 30,078 +5.10%
benchmarks/test_replaybuffer_benchmark.py::test_rb_populate[TensorDictReplayBuffer-LazyTensorStorage-RandomSampler-400] 940.48 988.36 +5.09%
benchmarks/test_envs_benchmark.py::test_cat_frames_functional[4-constant] 4,527 4,758 +5.09%
benchmarks/test_objectives_benchmarks.py::test_redq_deprec_speed[True-backward] 258.12 271.12 +5.03%
benchmarks/test_storage_write_benchmark.py::TestStorageWriteBenchmark::test_storage_write_contiguous[200-img_shape3-large_batch] 760.05 722.44 -4.95%
benchmarks/test_replaybuffer_benchmark.py::test_rb_iterate[TensorDictPrioritizedReplayBuffer-LazyTensorStorage-None-10000] 2,064 1,971 -4.48%
benchmarks/test_objectives_benchmarks.py::test_dqn_speed[True-None] 1,848 1,930 +4.43%
benchmarks/test_objectives_benchmarks.py::test_sac_speed[True-backward] 330.22 315.87 -4.35%
benchmarks/test_envs_benchmark.py::test_simple 1.2657 1.2121 -4.23%
benchmarks/test_replaybuffer_benchmark.py::test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-10000] 2,861 2,740 -4.23%
benchmarks/test_objectives_benchmarks.py::test_ddpg_speed[True-backward] 472.93 453.51 -4.10%
benchmarks/test_vla_preprocessing_benchmark.py::test_openvla_preprocessing_throughput[torchvision-256-256-64] 10.49 10.91 +4.04%
benchmarks/test_objectives_benchmarks.py::test_reinforce_speed[False-backward] 281.23 270.19 -3.92%
benchmarks/test_envs_benchmark.py::test_cat_frames_functional[16-constant] 4,603 4,783 +3.91%
benchmarks/test_replaybuffer_benchmark.py::test_rb_sample[TensorDictReplayBuffer-LazyMemmapStorage-sampler6-10000] 721.36 694.18 -3.77%
benchmarks/test_storage_write_benchmark.py::TestStorageWriteBenchmark::test_storage_write_contiguous[50-img_shape0-small] 6,248 6,022 -3.61%
benchmarks/test_vla_preprocessing_benchmark.py::test_openvla_preprocessing_throughput[torchvision-480-640-1] 479.90 462.93 -3.54%
benchmarks/test_objectives_benchmarks.py::test_a2c_speed[False-backward] 150.72 145.52 -3.45%
benchmarks/test_envs_benchmark.py::test_step_mdp_speed[False-True-False-False-True] 30,457 31,473 +3.33%
benchmarks/test_objectives_benchmarks.py::test_values[td_lambda_return_estimate-True-False] 11.63 12.01 +3.31%
benchmarks/test_replaybuffer_benchmark.py::test_rb_sample[TensorDictReplayBuffer-LazyTensorStorage-sampler7-10000] 768.54 743.22 -3.29%
benchmarks/test_objectives_benchmarks.py::test_gae_speed[vec_generalized_advantage_estimate-False-32-512] 1,311 1,269 -3.14%
benchmarks/test_objectives_benchmarks.py::test_iql_speed[True-backward] 248.99 241.31 -3.09%
benchmarks/test_storage_write_benchmark.py::TestStorageWriteBenchmark::test_collector_stack_then_write[200-img_shape3-large_batch] 141.99 137.71 -3.01%
benchmarks/test_envs_benchmark.py::test_serial 0.4199 0.4325 +3.00%
benchmarks/test_envs_benchmark.py::test_step_mdp_speed[True-True-False-False-True] 39,520 38,339 -2.99%
benchmarks/test_objectives_benchmarks.py::test_a2c_speed[True-None] 729.65 750.46 +2.85%
benchmarks/test_replaybuffer_benchmark.py::test_rb_iterate[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-10000] 1,963 1,908 -2.79%
benchmarks/test_replaybuffer_benchmark.py::test_rb_populate[TensorDictPrioritizedReplayBuffer-LazyMemmapStorage-None-400] 487.33 500.64 +2.73%
benchmarks/test_objectives_benchmarks.py::test_ppo_speed[False-backward] 131.13 127.57 -2.71%
benchmarks/test_objectives_benchmarks.py::test_dqn_speed[reduce-overhead-None] 1,926 1,875 -2.62%
benchmarks/test_envs_benchmark.py::test_step_mdp_speed[True-True-False-True-False] 38,636 39,629 +2.57%
benchmarks/test_storage_write_benchmark.py::TestStorageWriteBenchmark::test_storage_write_lazystack[50-img_shape0-small] 4,374 4,485 +2.53%
benchmarks/test_rnn_reset_backends_benchmark.py::test_rnn_rollout_with_intermediate_resets[b256-t128-i32-h512-scan-False-0-gru] 22.20 22.76 +2.52%
benchmarks/test_objectives_benchmarks.py::test_cql_speed[True-backward] 224.92 219.35 -2.48%
benchmarks/test_replaybuffer_benchmark.py::test_rb_populate[TensorDictReplayBuffer-LazyMemmapStorage-SamplerWithoutReplacement-400] 469.30 480.22 +2.33%
benchmarks/test_objectives_benchmarks.py::test_ddpg_speed[False-None] 349.68 341.55 -2.33%
benchmarks/test_objectives_benchmarks.py::test_redq_deprec_speed[False-backward] 72.27 70.59 -2.32%
benchmarks/test_envs_benchmark.py::test_step_mdp_speed[True-True-False-True-True] 22,823 22,302 -2.28%
benchmarks/test_envs_benchmark.py::test_step_mdp_speed[False-True-True-False-False] 58,580 59,896 +2.25%
benchmarks/test_rnn_reset_backends_benchmark.py::test_rnn_rollout_with_intermediate_resets[b256-t128-i32-h512-scan-True-0-gru] 49.17 48.07 -2.24%
benchmarks/test_envs_benchmark.py::test_step_mdp_speed[True-False-True-True-True] 21,533 21,063 -2.19%
benchmarks/test_objectives_benchmarks.py::test_gae_speed[vec_generalized_advantage_estimate-False-1-512] 1,342 1,313 -2.17%
benchmarks/test_envs_benchmark.py::test_transformed 0.7248 0.7093 -2.14%
benchmarks/test_replaybuffer_benchmark.py::TestPrioritizedReplayBufferBenchmark::test_sampler_sample_scale[1000000-cuda] 2,159 2,205 +2.12%
benchmarks/test_objectives_benchmarks.py::test_ppo_speed[True-None] 689.63 704.25 +2.12%
benchmarks/test_storage_write_benchmark.py::TestStorageWriteBenchmark::test_collector_lazystack_then_write[100-img_shape2-large_img] 398.45 406.71 +2.07%
benchmarks/test_envs_benchmark.py::test_step_mdp_speed[False-False-True-False-False] 50,665 51,682 +2.01%
benchmarks/test_storage_write_benchmark.py::TestStorageWriteBenchmark::test_storage_write_contiguous[100-img_shape1-atari] 4,281 4,366 +1.98%
benchmarks/test_envs_benchmark.py::test_step_mdp_speed[False-False-False-True-False] 28,143 28,683 +1.92%
benchmarks/test_vla_preprocessing_benchmark.py::test_openvla_preprocessing_throughput[torchvision-224-224-16] 49.05 49.97 +1.88%
benchmarks/test_compressed_storage_benchmark.py::TestCompressedStorageBenchmark::test_tensor_to_bytestream_speed[torch.save] 7,244 7,379 +1.87%
benchmarks/test_envs_benchmark.py::test_step_mdp_speed[True-True-False-False-False] 65,640 66,864 +1.86%
benchmarks/test_storage_write_benchmark.py::TestCollectorIntegrationBenchmark::test_collector_without_rb_cuda[200-img_shape1-large_batch] 8.7112 8.5509 -1.84%
benchmarks/test_storage_write_benchmark.py::TestCollectorIntegrationBenchmark::test_collector_without_rb_cuda[100-img_shape0-atari] 17.44 17.12 -1.84%
benchmarks/test_objectives_benchmarks.py::test_td3_speed[reduce-overhead-None] 44.13 44.93 +1.82%
benchmarks/test_objectives_benchmarks.py::test_gae_speed[vec_generalized_advantage_estimate-True-1-512] 1,271 1,248 -1.82%
benchmarks/test_objectives_benchmarks.py::test_ppo_speed[reduce-overhead-None] 809.78 824.44 +1.81%
benchmarks/test_replaybuffer_benchmark.py::test_rb_extend_sample[ReplayBuffer-LazyTensorStorage-RandomSampler-10000-10000-100-False] 54.79 53.83 -1.75%
benchmarks/test_envs_benchmark.py::test_step_mdp_speed[False-True-True-True-False] 34,895 35,503 +1.74%
benchmarks/test_objectives_benchmarks.py::test_reinforce_speed[True-backward] 352.09 358.17 +1.73%
benchmarks/test_vla_preprocessing_benchmark.py::test_openvla_preprocessing_throughput[torchvision-224-224-4] 187.06 183.85 -1.72%
benchmarks/test_objectives_benchmarks.py::test_sac_speed[False-backward] 79.96 78.62 -1.68%
benchmarks/test_storage_write_benchmark.py::TestStorageWriteBenchmark::test_collector_stack_then_write[100-img_shape2-large_img] 174.19 177.10 +1.67%
benchmarks/test_envs_benchmark.py::test_step_mdp_speed[True-False-False-True-True] 20,266 19,928 -1.67%
benchmarks/test_envs_benchmark.py::test_step_mdp_speed[True-True-True-False-True] 44,130 43,399 -1.66%
benchmarks/test_vla_preprocessing_benchmark.py::test_openvla_preprocessing_throughput[torchvision-224-224-1] 631.10 641.25 +1.61%
benchmarks/test_objectives_benchmarks.py::test_reinforce_speed[True-None] 781.84 769.27 -1.61%
benchmarks/test_collectors_benchmark.py::test_single 6.6321 6.7384 +1.60%
benchmarks/test_vla_preprocessing_benchmark.py::test_openvla_preprocessing_throughput[pil-224-224-64] 4.4880 4.5593 +1.59%
benchmarks/test_envs_benchmark.py::test_step_mdp_speed[True-True-True-False-False] 78,161 79,396 +1.58%
benchmarks/test_collectors_benchmark.py::test_async_pixels 10.95 10.78 -1.55%
benchmarks/test_compressed_storage_benchmark.py::TestCompressedStorageBenchmark::test_tensor_to_bytestream_speed[safetensors] 24,815 24,432 -1.54%
benchmarks/test_replaybuffer_benchmark.py::test_rb_iterate[TensorDictReplayBuffer-LazyMemmapStorage-RandomSampler-10000] 2,658 2,618 -1.51%
benchmarks/test_vla_preprocessing_benchmark.py::test_openvla_preprocessing_throughput[pil-224-224-1] 281.89 286.15 +1.51%
benchmarks/test_non_tensor_env_benchmark.py::test_non_tensor_env_rollout_speed[1000-parallel-buffers-True] 0.5372 0.5290 -1.51%
benchmarks/test_storage_write_benchmark.py::TestCollectorIntegrationBenchmark::test_collector_with_rb_cuda[200-img_shape1-large_batch] 8.4360 8.3104 -1.49%
benchmarks/test_objectives_benchmarks.py::test_ddpg_speed[True-None] 825.65 837.59 +1.45%
benchmarks/test_storage_write_benchmark.py::TestStorageWriteBenchmark::test_storage_write_contiguous[100-img_shape2-large_img] 555.19 547.43 -1.40%
benchmarks/test_envs_benchmark.py::test_step_mdp_speed[False-True-False-False-False] 51,925 51,217 -1.36%
benchmarks/test_objectives_benchmarks.py::test_iql_speed[False-backward] 70.18 69.25 -1.33%
benchmarks/test_objectives_benchmarks.py::test_iql_speed[reduce-overhead-None] 106.88 108.28 +1.32%
benchmarks/test_objectives_benchmarks.py::test_iql_speed[True-None] 516.04 522.71 +1.29%
benchmarks/test_storage_write_benchmark.py::TestCollectorIntegrationBenchmark::test_collector_with_rb[200-img_shape1-large_batch] 13.66 13.48 -1.29%
benchmarks/test_envs_benchmark.py::test_step_mdp_speed[True-False-True-True-False] 35,090 35,515 +1.21%
benchmarks/test_envs_benchmark.py::test_step_mdp_speed[False-True-False-True-False] 33,032 32,638 -1.19%
benchmarks/test_objectives_benchmarks.py::test_sac_speed[reduce-overhead-None] 104.74 105.96 +1.17%
benchmarks/test_storage_write_benchmark.py::TestCollectorIntegrationBenchmark::test_collector_without_rb[100-img_shape0-atari] 30.94 30.58 -1.16%
benchmarks/test_envs_benchmark.py::test_step_mdp_speed[False-False-True-True-False] 30,189 30,539 +1.16%
benchmarks/test_rnn_reset_backends_benchmark.py::test_rnn_rollout_with_intermediate_resets[b256-t128-i32-h512-scan-False-0-lstm] 21.65 21.41 -1.15%
benchmarks/test_objectives_benchmarks.py::test_ppo_speed[False-None] 231.84 229.21 -1.13%
benchmarks/test_objectives_benchmarks.py::test_reinforce_speed[False-None] 402.51 397.95 -1.13%
benchmarks/test_objectives_benchmarks.py::test_ddpg_speed[reduce-overhead-None] 827.46 836.82 +1.13%
benchmarks/test_replaybuffer_benchmark.py::test_rb_sample[TensorDictReplayBuffer-ListStorage-RandomSampler-4000] 166.47 168.33 +1.12%
benchmarks/test_vla_preprocessing_benchmark.py::test_openvla_preprocessing_throughput[pil-256-256-64] 3.0228 3.0556 +1.09%
benchmarks/test_envs_benchmark.py::test_step_mdp_speed[False-True-True-True-True] 20,728 20,952 +1.08%
benchmarks/test_vla_preprocessing_benchmark.py::test_openvla_preprocessing_throughput[pil-224-224-4] 71.09 71.84 +1.05%
benchmarks/test_replaybuffer_benchmark.py::test_rb_populate[TensorDictReplayBuffer-ListStorage-SamplerWithoutReplacement-400] 195.64 197.63 +1.01%
benchmarks/test_non_tensor_env_benchmark.py::test_non_tensor_env_rollout_speed[1000-single-False] 1.6249 1.6412 +1.00%
... ... ... Showing 120 of 226 comparisons, sorted by absolute change.

[ghstack-poisoned]
vmoens added a commit that referenced this pull request Jun 22, 2026
[ghstack-poisoned]
vmoens added a commit that referenced this pull request Jun 24, 2026
[ghstack-poisoned]
vmoens added a commit that referenced this pull request Jun 24, 2026

Labels

CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. Collectors Feature New feature Integrations/torch_geometric Integrations Modules
