Budget Controls
Prevent runaway LLM loops with turn, tool-call, wall-clock, timeout, token, and repeated-tool budgets.
The agentic loop works like this: the LLM reasons, calls tools, receives results, reasons again, calls more tools, and so on until it produces a final text answer. But what if it never produces that final answer?
Without a budget, an agent can:
- Loop indefinitely, calling the same tool with slightly different arguments.
- Burn through API credits or local compute on a single request.
- Block a thread forever in a synchronous execution model.
- Amplify errors: each failed tool call leads to another attempt, which fails again.
Budgets are the guardrail. They set hard upper bounds on how much one agentic invocation can spend before the framework stops it with a typed reason.
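To make the idea concrete, here is a minimal, self-contained sketch of a loop that stops with a typed reason. Every name in it (StopReason, BudgetStop, Reply, runLoop) is hypothetical and chosen for illustration; it is not the framework's implementation, only one way a turn cap and a tool-call cap could bound a loop.

```kotlin
// Hypothetical names throughout; this is an illustration, not the framework's code.
enum class StopReason { TURNS, TOOL_CALLS }

class BudgetStop(val reason: StopReason) : RuntimeException("budget exceeded: $reason")

sealed interface Reply
data class CallTools(val names: List<String>) : Reply
data class Final(val text: String) : Reply

fun runLoop(maxTurns: Int, maxToolCalls: Int, llm: (turn: Int) -> Reply): String {
    var toolCalls = 0
    for (turn in 1..maxTurns) {
        when (val reply = llm(turn)) {
            is Final -> return reply.text
            is CallTools -> {
                for (name in reply.names) {
                    // Check the cap before each tool call, so finished work stays intact.
                    if (toolCalls >= maxToolCalls) throw BudgetStop(StopReason.TOOL_CALLS)
                    toolCalls++ // a real loop would execute the tool named `name` here
                }
            }
        }
    }
    // The loop would need one more LLM call than the turn cap allows.
    throw BudgetStop(StopReason.TURNS)
}

fun main() {
    // A model that never finishes: it asks for the same tool on every turn.
    val stuck: (Int) -> Reply = { CallTools(listOf("search")) }
    val reason = try {
        runLoop(maxTurns = 3, maxToolCalls = 10, llm = stuck)
        null
    } catch (e: BudgetStop) {
        e.reason
    }
    println(reason) // prints TURNS
}
```

Without the two caps, the stuck model in main would keep the loop running forever; with them, the caller gets a reason it can branch on.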
Set a budget with the budget {} DSL block inside your agent definition:
import kotlin.time.Duration.Companion.minutes
import kotlin.time.Duration.Companion.seconds
val agent = agent<String, String>("researcher") {
model { ollama("qwen2.5:7b") }
budget {
maxTurns = 10
maxToolCalls = 24
maxDuration = 2.minutes
perToolTimeout = 15.seconds
maxTokens = 12_000
maxConsecutiveSameTool = 3
}
lateinit var search: Tool<Map<String, Any?>, Any?>
lateinit var summarize: Tool<Map<String, Any?>, Any?>
tools {
search = tool("search", "Search the web") { args -> /* ... */ }
summarize = tool("summarize", "Summarize text") { args -> /* ... */ }
}
skills {
skill<String, String>("research", "Research a topic using tools") {
tools(search, summarize)
}
}
}
The DSL produces a BudgetConfig data class:
import kotlin.time.Duration
import kotlin.time.Duration.Companion.minutes
data class BudgetConfig(
val maxTurns: Int = 8,
val maxToolCalls: Int = 32,
val maxDuration: Duration = 5.minutes,
val perToolTimeout: Duration? = null,
val maxTokens: Int? = null,
val maxConsecutiveSameTool: Int? = null,
)
Defaults are production-friendly: tight enough to bound runaway cost and wall time, generous enough for well-designed loops. Override individual fields when a workflow legitimately needs more headroom.
| Field | Default | What It Caps |
|---|---|---|
| maxTurns | 8 | LLM request-response cycles in one invocation |
| maxToolCalls | 32 | Total tool invocations across the loop |
| maxDuration | 5.minutes | Wall-clock time from invocation start |
| perToolTimeout | null | One tool execution; null means no per-tool timeout |
| maxTokens | null | Cumulative provider-reported prompt + completion tokens |
| maxConsecutiveSameTool | null | Immediate repeats of the same tool without another tool in between |
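The repeated-tool cap is the least obvious row, so here is a small sketch of how such a streak could be tracked. RepeatGuard is a hypothetical helper, not part of the framework's API, and it assumes that a cap of 3 allows three calls in a row and rejects the fourth; the framework's exact boundary may differ.

```kotlin
// Hypothetical helper for illustration only.
class RepeatGuard(private val maxConsecutiveSameTool: Int) {
    private var lastTool: String? = null
    private var streak = 0

    /** Returns true if the call stays within the cap, false if it exceeds it. */
    fun allow(tool: String): Boolean {
        // A different tool resets the streak; the same tool extends it.
        streak = if (tool == lastTool) streak + 1 else 1
        lastTool = tool
        return streak <= maxConsecutiveSameTool
    }
}

fun main() {
    val guard = RepeatGuard(maxConsecutiveSameTool = 3)
    val calls = listOf("search", "search", "summarize", "search", "search", "search", "search")
    println(calls.map { guard.allow(it) })
    // prints [true, true, true, true, true, true, false]
}
```

The summarize call in the middle resets the count, so only the fourth search in a row is rejected.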
When the agent reaches any cap, the framework throws BudgetExceededException with a BudgetReason:
import agents_engine.model.BudgetExceededException
try {
val result = agent("Analyze all 10,000 files in the repository")
} catch (e: BudgetExceededException) {
println("Agent ran out of budget (${e.reason}): ${e.message}")
// Handle gracefully: return partial result, notify user, etc.
}
For turn, duration, token, and tool-call caps, the exception is thrown before the next bounded operation would happen. This means:
- All previous tool calls have completed.
- All previous LLM responses are intact.
- The agent's message history is available up to the point of termination.
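The check-before-the-next-operation behavior is what makes partial results usable. The sketch below illustrates that property with hypothetical names (OutOfBudget, research) and a caller-owned history list; how the framework actually exposes message history is not shown here and should be looked up in its API.

```kotlin
// Hypothetical names; a sketch of check-before-operation semantics.
class OutOfBudget(val reason: String) : RuntimeException("budget exceeded: $reason")

fun research(steps: List<String>, maxToolCalls: Int, history: MutableList<String>): String {
    for ((done, step) in steps.withIndex()) {
        // Check before starting the next tool call, never in the middle of one.
        if (done >= maxToolCalls) throw OutOfBudget("TOOL_CALLS")
        history += "result of $step"
    }
    return history.joinToString("; ")
}

fun main() {
    val history = mutableListOf<String>()
    val answer = try {
        research(listOf("a", "b", "c", "d"), maxToolCalls = 2, history = history)
    } catch (e: OutOfBudget) {
        // Everything recorded before the cap was hit is still complete.
        "Partial (${e.reason}): " + history.joinToString("; ")
    }
    println(answer) // prints Partial (TOOL_CALLS): result of a; result of b
}
```

Because the cap is checked between operations, the two completed results are whole and can be returned to the user as a partial answer.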
In a then pipeline, BudgetExceededException propagates like any other exception:
val pipeline = parse then analyze then summarize
try {
pipeline(input)
} catch (e: BudgetExceededException) {
// Which agent exceeded its budget? Check the message.
println(e.reason) // TURNS, TOOL_CALLS, DURATION, TOKENS, ...
println(e.message)
}
A turn is one LLM request-response cycle. Here is how turns map to the agentic loop:
Turn 1: LLM receives [system, user] -> returns ToolCalls([search("kotlin agents")])
Framework executes search, appends tool result
Turn 2: LLM receives [system, user, assistant(toolcalls), tool(result)] -> returns ToolCalls([summarize(...)])
Framework executes summarize, appends tool result
Turn 3: LLM receives [system, user, assistant, tool, assistant, tool] -> returns Text("Here is the summary...")
Done. 3 turns used.
Key points:
- Each call to ModelClient.chat() is one turn.
- Multiple tool calls in a single LLM response count as one turn (the LLM made one request that happened to include multiple tool calls).
- The final Text response also counts as a turn.
- Tool execution itself does not count -- only the LLM call does.
val agent = agent<String, String>("counter-demo") {
model { ollama("qwen2.5:7b") }
budget { maxTurns = 3 }
lateinit var stepA: Tool<Map<String, Any?>, Any?>
lateinit var stepB: Tool<Map<String, Any?>, Any?>
tools {
stepA = tool("step_a", "First step") { args -> "result_a" }
stepB = tool("step_b", "Second step") { args -> "result_b" }
}
skills {
skill<String, String>("work", "Do work") {
tools(stepA, stepB)
}
}
}
If the LLM's behavior is:
- Turn 1: calls step_a and step_b together -> 1 turn
- Turn 2: calls step_a again -> 1 turn
- Turn 3: returns text "Done" -> 1 turn
Total: 3 turns, exactly at the limit. If the loop needed a 4th LLM call, the framework would throw BudgetExceededException.
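The counting rules for this trace can be checked with a small tally. The types below (LlmReply, ToolRequests, Answer, Usage) are hypothetical stand-ins rather than the framework's response classes; the sketch only shows that turns and tool calls are counted separately.

```kotlin
// Hypothetical stand-in types; the framework's own response classes are not used here.
sealed interface LlmReply
data class ToolRequests(val tools: List<String>) : LlmReply
data class Answer(val text: String) : LlmReply

data class Usage(val turns: Int, val toolCalls: Int)

fun tally(replies: List<LlmReply>): Usage {
    var toolCalls = 0
    for (reply in replies) {
        // Each requested tool counts toward tool calls, not toward turns.
        if (reply is ToolRequests) toolCalls += reply.tools.size
    }
    // Every reply, including the final text answer, is exactly one turn.
    return Usage(turns = replies.size, toolCalls = toolCalls)
}

fun main() {
    val trace: List<LlmReply> = listOf(
        ToolRequests(listOf("step_a", "step_b")), // turn 1: two tools, one turn
        ToolRequests(listOf("step_a")),           // turn 2
        Answer("Done"),                           // turn 3: the final text also counts
    )
    println(tally(trace)) // prints Usage(turns=3, toolCalls=3)
}
```

The same trace therefore consumes 3 of maxTurns and 3 of maxToolCalls, which is why the two caps need to be sized independently.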
// Defaults are bounded, but may not match this workflow.
val agent = agent<String, String>("risky") {
model { ollama("qwen2.5:7b") }
// Uses defaults: 8 turns, 32 tool calls, 5 minutes.
// ...
}
// Be explicit for production workflows.
val agent = agent<String, String>("safe") {
model { ollama("qwen2.5:7b") }
budget {
maxTurns = 15
maxToolCalls = 40
maxDuration = 3.minutes
maxConsecutiveSameTool = 3
}
// ...
}
Match maxTurns to the number of turns the task typically needs:
| Task Type | Typical Turns | Suggested Budget |
|---|---|---|
| Single tool call + answer | 2 | 3-5 |
| Multi-step analysis (3-5 tools) | 4-6 | 8-10 |
| Complex research (many tools, iteration) | 8-15 | 15-20 |
| Open-ended exploration | 10-30 | 25-30 |
Leave headroom above the expected turns. The LLM might need an extra turn to correct a mistake or rephrase its answer.
When agents are composed via structure {}, each has its own budget. A parent agent with maxTurns = 10 does not share that budget with its children:
val researcher = agent<String, String>("researcher") {
model { ollama("qwen2.5:7b") }
budget { maxTurns = 20 } // generous budget for deep research
// ...
}
val summarizer = agent<String, String>("summarizer") {
model { ollama("qwen2.5:7b") }
budget { maxTurns = 3 } // tight budget: should be quick
// ...
}
val pipeline = researcher then summarizer
// researcher gets 20 turns, summarizer gets 3 -- independent
Tool Error Recovery repair agents should have tight budgets. A repair agent that loops is worse than the original error:
val jsonFixer = agent<String, String>("json-fixer") {
model { ollama("qwen2.5:7b") }
budget { maxTurns = 1 } // single-shot: one LLM call, no tools
// ...
}
Write tests that verify your agent completes within its budget:
@Test
fun `agent completes within budget`() {
var turnCount = 0
val mockClient = ModelClient { messages ->
turnCount++
if (turnCount < 3) {
LlmResponse.ToolCalls(listOf(ToolCall("step", emptyMap())))
} else {
LlmResponse.Text("done")
}
}
val agent = agent<String, String>("test") {
model { ollama("unused"); client = mockClient }
budget { maxTurns = 5 }
lateinit var step: Tool<Map<String, Any?>, Any?>
tools { step = tool("step", "A step") { "ok" } }
skills {
skill<String, String>("work", "Work") {
tools(step)
}
}
}
val result = agent("go")
assertEquals("done", result)
assertEquals(3, turnCount) // completed in 3 turns, well within budget of 5
}
@Test
fun `agent throws when budget exceeded`() {
val mockClient = ModelClient { _ ->
// Never returns Text -- always calls tools
LlmResponse.ToolCalls(listOf(ToolCall("step", emptyMap())))
}
val agent = agent<String, String>("test") {
model { ollama("unused"); client = mockClient }
budget { maxTurns = 3 }
lateinit var step: Tool<Map<String, Any?>, Any?>
tools { step = tool("step", "A step") { "ok" } }
skills {
skill<String, String>("work", "Work") {
tools(step)
}
}
}
assertThrows<BudgetExceededException> {
agent("go")
}
}
Use onBudgetThreshold when you want to see a warning before a cap throws. It fires once per BudgetReason when cumulative usage crosses the threshold:
val agent = agent<String, String>("monitored") {
model { ollama("qwen2.5:7b") }
budget {
maxTurns = 10
maxToolCalls = 20
maxTokens = 8_000
}
onBudgetThreshold(0.8) { reason, usedPercent ->
println("Budget warning: $reason is at ${(usedPercent * 100).toInt()}%")
}
// ...
}
Combine this with Observability Hooks to tune budgets over time. If an agent consistently uses 3 turns, a budget of 20 is wasteful -- tighten it to catch regressions early.
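The "fires once per BudgetReason" behavior of the threshold hook can be pictured with a small tracker. ThresholdNotifier and its record method are hypothetical, and reasons are plain strings here; this is a sketch of the once-per-reason idea under those assumptions, not how onBudgetThreshold is implemented.

```kotlin
// Hypothetical tracker; names are illustrative, not the framework's API.
class ThresholdNotifier(
    private val threshold: Double,
    private val onCross: (reason: String, usedFraction: Double) -> Unit,
) {
    private val alreadyFired = mutableSetOf<String>()

    fun record(reason: String, used: Int, cap: Int) {
        val fraction = used.toDouble() / cap
        // Set.add returns false when the reason is already present,
        // which limits the callback to one firing per reason.
        if (fraction >= threshold && alreadyFired.add(reason)) {
            onCross(reason, fraction)
        }
    }
}

fun main() {
    val warnings = mutableListOf<String>()
    val notifier = ThresholdNotifier(0.8) { reason, fraction ->
        warnings += "$reason at ${(fraction * 100).toInt()}%"
    }
    // Usage keeps growing past the threshold, but the warning fires only once.
    for (turn in 1..10) notifier.record("TURNS", used = turn, cap = 10)
    println(warnings) // prints [TURNS at 80%]
}
```

Firing once keeps logs readable: turns 9 and 10 are also above 80%, but they add no new information.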
- Model & Tool Calling -- understand the loop that budgets constrain
- Tool Error Recovery -- error recovery interacts with budgets (retries consume turns)
- Observability Hooks -- monitor budget usage in real time