Currently we only support RL on Terminus2.
We'd want to do RL on other agent harnesses via Harbor as well.
This might include plumbing through Harbor and making changes in SkyRL internals where needed.
We need to be especially careful about whether OpenHands does off-policy things that make the chat history non-strictly-appending (e.g. summarization, which rewrites earlier turns).
We should first support RL on agents whose chat history is strictly appending, and then add step-wise training for all agents.
Final deliverable: a training curve, perhaps on CodeContests, compared against Terminus2.
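A minimal sketch of the "strictly appending" check described above: each step's chat history must extend the previous step's history without editing earlier messages. The data shape and function name here are illustrative assumptions, not SkyRL or Harbor APIs.

```python
# Sketch: detect whether a trajectory's chat history is strictly appending.
# `histories` is assumed to be the full message list the model saw at each
# step; this shape is a hypothetical stand-in, not a SkyRL/Harbor type.

def is_strictly_appending(histories: list[list[dict]]) -> bool:
    """True iff every step's history is a prefix-extension of the previous
    step's history (no summarization, truncation, or rewriting)."""
    for prev, curr in zip(histories, histories[1:]):
        if len(curr) < len(prev) or curr[: len(prev)] != prev:
            return False
    return True


ok = [
    [{"role": "user", "content": "fix the bug"}],
    [{"role": "user", "content": "fix the bug"},
     {"role": "assistant", "content": "patching..."}],
]
summarized = [
    ok[1],
    # History rewritten by summarization -> no longer strictly appending.
    [{"role": "user", "content": "summary: bug fixed"}],
]
print(is_strictly_appending(ok))          # True
print(is_strictly_appending(summarized))  # False
```

An agent that passes this check can reuse the simple token-concatenation training path; one that fails it (e.g. via summarization) would need the step-wise treatment mentioned above.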
Hardware needed:
- 1xA100 for development (final curve can be run by SkyRL maintainers)
- Modal/Daytona (8 sandbox concurrency should be enough)
(Current Terminus2 config: SkyRL/examples/train_integrations/harbor/harbor_trial_config/default.yaml, line 31 at commit 168b20f.)