Currently we only support RL on Terminus2.
We'd want to do RL on other agent harnesses via Harbor as well.
This might include plumbing through Harbor and making changes in SkyRL internals where needed.
We need to be especially careful about whether OpenHands does off-policy things that make the chat history non-strictly-appending (e.g. summarization, which rewrites earlier turns).
We should first support RL on agents whose chat history is strictly appending, and then add step-wise training for all agents.
Final deliverable: a training curve, perhaps on CodeContests, compared against Terminus2.
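A minimal sketch of the "strictly appending" check described above: each step's chat history must extend the previous step's history without editing earlier messages. The data shape and function name here are illustrative assumptions, not SkyRL or Harbor APIs.

```python
# Sketch: detect whether a trajectory's chat history is strictly appending.
# `histories` is assumed to be the full message list the model saw at each
# step; this shape is a hypothetical stand-in, not a SkyRL/Harbor type.

def is_strictly_appending(histories: list[list[dict]]) -> bool:
    """True iff every step's history is a prefix-extension of the previous
    step's history (no summarization, truncation, or rewriting)."""
    for prev, curr in zip(histories, histories[1:]):
        if len(curr) < len(prev) or curr[: len(prev)] != prev:
            return False
    return True


ok = [
    [{"role": "user", "content": "fix the bug"}],
    [{"role": "user", "content": "fix the bug"},
     {"role": "assistant", "content": "patching..."}],
]
summarized = [
    ok[1],
    # History rewritten by summarization -> no longer strictly appending.
    [{"role": "user", "content": "summary: bug fixed"}],
]
print(is_strictly_appending(ok))          # True
print(is_strictly_appending(summarized))  # False
```

An agent that passes this check can reuse the simple token-concatenation training path; one that fails it (e.g. via summarization) would need the step-wise treatment mentioned above.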
Hardware needed:
- 1xA100 for development (final curve can be run by SkyRL maintainers)
- Modal/Daytona (8 sandbox concurrency should be enough)
(Current Terminus2 config: SkyRL/examples/train_integrations/harbor/harbor_trial_config/default.yaml, line 31 at commit 168b20f.)