Budget Controls

Prevent runaway LLM loops with turn, tool-call, wall-clock, timeout, token, and repeated-tool budgets.

Why Budgets Exist

The agentic loop works like this: the LLM reasons, calls tools, receives results, reasons again, calls more tools, and so on until it produces a final text answer. But what if it never produces that final answer?

Without a budget, an agent can:

Loop indefinitely, calling the same tool with slightly different arguments.
Burn through API credits or local compute on a single request.
Block a thread forever in a synchronous execution model.
Amplify errors: each failed tool call leads to another attempt, which fails again.

Budgets are the guardrail. They set hard upper bounds on how much one agentic invocation can spend before the framework stops it with a typed reason.

Configuration

Set a budget with the budget {} DSL block inside your agent definition:

import kotlin.time.Duration.Companion.minutes
import kotlin.time.Duration.Companion.seconds

val agent = agent<String, String>("researcher") {
    model { ollama("qwen2.5:7b") }

    budget {
        maxTurns = 10
        maxToolCalls = 24
        maxDuration = 2.minutes
        perToolTimeout = 15.seconds
        maxTokens = 12_000
        maxConsecutiveSameTool = 3
    }

    lateinit var search: Tool<Map<String, Any?>, Any?>
    lateinit var summarize: Tool<Map<String, Any?>, Any?>
    tools {
        search    = tool("search",    "Search the web") { args -> /* ... */ }
        summarize = tool("summarize", "Summarize text") { args -> /* ... */ }
    }
    skills {
        skill<String, String>("research", "Research a topic using tools") {
            tools(search, summarize)
        }
    }
}

BudgetConfig

The DSL produces a BudgetConfig data class:

import kotlin.time.Duration
import kotlin.time.Duration.Companion.minutes

data class BudgetConfig(
    val maxTurns: Int = 8,
    val maxToolCalls: Int = 32,
    val maxDuration: Duration = 5.minutes,
    val perToolTimeout: Duration? = null,
    val maxTokens: Int? = null,
    val maxConsecutiveSameTool: Int? = null,
)

Defaults are production-friendly: tight enough to bound runaway cost and wall time, generous enough for well-designed loops. Override individual fields when a workflow legitimately needs more headroom.

Field	Default	What It Caps
`maxTurns`	`8`	LLM request-response cycles in one invocation
`maxToolCalls`	`32`	Total tool invocations across the loop
`maxDuration`	`5.minutes`	Wall-clock time from invocation start
`perToolTimeout`	`null`	One tool execution; `null` means no per-tool timeout
`maxTokens`	`null`	Cumulative provider-reported prompt + completion tokens
`maxConsecutiveSameTool`	`null`	Immediate repeats of the same tool without another tool in between
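
The maxConsecutiveSameTool row is the least obvious of the caps. The sketch below shows one possible reading of it and is not the framework's implementation. It assumes that the run length includes the first call and that the cap trips once a run grows past the limit. longestSameToolRun and exceedsSameToolCap are hypothetical helper names.

```kotlin
// Hypothetical helpers for illustration; not part of Agents.KT.

/** Length of the longest run of identical adjacent tool names. */
fun longestSameToolRun(toolNames: List<String>): Int {
    var longest = 0
    var current = 0
    var previous: String? = null
    for (name in toolNames) {
        // Extend the run on a repeat, otherwise start a new run of length 1.
        current = if (name == previous) current + 1 else 1
        if (current > longest) longest = current
        previous = name
    }
    return longest
}

/** Assumption: a null cap disables the check; a run longer than the cap trips it. */
fun exceedsSameToolCap(toolNames: List<String>, cap: Int?): Boolean =
    cap != null && longestSameToolRun(toolNames) > cap
```

Under these assumptions, search, search, summarize, search has a longest run of 2, because the summarize call in between resets the count.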

BudgetExceededException

When an invocation would go past any cap, the framework stops it by throwing BudgetExceededException, which carries a BudgetReason:

import agents_engine.model.BudgetExceededException

try {
    val result = agent("Analyze all 10,000 files in the repository")
} catch (e: BudgetExceededException) {
    println("Agent ran out of budget (${e.reason}): ${e.message}")
    // Handle gracefully: return partial result, notify user, etc.
}

For turn, duration, token, and tool-call caps, the exception is thrown before the next bounded operation would happen. This means:

All previous tool calls have completed.
All previous LLM responses are intact.
The agent's message history is available up to the point of termination.
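
A minimal sketch of this check-before-operation pattern, assuming a plain list as the message history. TurnGuard and SketchBudgetExceeded are hypothetical stand-ins, not the framework's classes. Because the guard runs before each model call, everything appended earlier is still in the history when the exception surfaces.

```kotlin
// Hypothetical stand-ins for illustration; the real framework types differ.
class SketchBudgetExceeded(val reason: String, message: String) : RuntimeException(message)

class TurnGuard(private val maxTurns: Int) {
    var used = 0
        private set

    /** Call before each model request; throws instead of starting turn maxTurns + 1. */
    fun beforeTurn() {
        if (used >= maxTurns) {
            throw SketchBudgetExceeded("TURNS", "turn budget of $maxTurns exhausted")
        }
        used++
    }
}

/** Runs a scripted model that never finishes and returns the history left behind. */
fun runUntilCapped(maxTurns: Int): List<String> {
    val history = mutableListOf("user: go")
    val guard = TurnGuard(maxTurns)
    try {
        while (true) {
            guard.beforeTurn()                  // checked before the call, not after
            history += "assistant: call step"   // scripted model response
            history += "tool: ok"               // scripted tool result
        }
    } catch (e: SketchBudgetExceeded) {
        // Nothing is rolled back: completed turns remain in the history.
    }
    return history
}
```

With maxTurns = 3 the returned history holds the user message plus three complete assistant/tool pairs, seven entries in total.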

Catching in Pipelines

In a then pipeline, BudgetExceededException propagates like any other exception:

val pipeline = parse then analyze then summarize

try {
    pipeline(input)
} catch (e: BudgetExceededException) {
    // Which agent exceeded its budget? Check the message.
    println(e.reason)   // TURNS, TOOL_CALLS, DURATION, TOKENS, ...
    println(e.message)
}
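
If the message alone is not enough to tell which stage ran out of budget, one option is to tag each stage before composing. The sketch below uses plain functions as stand-ins for agents. tagged, chain, and StageFailure are hypothetical names, and whether a wrapper like this can be combined with then is not confirmed by this page.

```kotlin
// Hypothetical wrapper; plain functions stand in for agents here.
class StageFailure(val stage: String, cause: Exception) :
    RuntimeException("stage '$stage' failed: ${cause.message}", cause)

/** Wraps a stage so any exception it throws is tagged with the stage name. */
fun <A, B> tagged(stage: String, run: (A) -> B): (A) -> B = { input ->
    try {
        run(input)
    } catch (e: Exception) {
        // Keep the original exception as the cause so its details stay reachable.
        throw StageFailure(stage, e)
    }
}

/** Chains two stages left to right, like a two-step pipeline. */
fun <A, B, C> chain(first: (A) -> B, second: (B) -> C): (A) -> C = { second(first(it)) }
```

The caller can then read the stage name from the wrapper and still inspect the original exception through cause.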

Counting Turns

A turn is one LLM request-response cycle. Here is how turns map to the agentic loop:

Turn 1: LLM receives [system, user] -> returns ToolCalls([search("kotlin agents")])
         Framework executes search, appends tool result

Turn 2: LLM receives [system, user, assistant(toolcalls), tool(result)] -> returns ToolCalls([summarize(...)])
         Framework executes summarize, appends tool result

Turn 3: LLM receives [system, user, assistant, tool, assistant, tool] -> returns Text("Here is the summary...")
         Done. 3 turns used.

Key points:

Each call to ModelClient.chat() is one turn.
Multiple tool calls in a single LLM response count as one turn (one response that happens to request several tools).
The final Text response also counts as a turn.
Tool execution itself does not count -- only the LLM call does.
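
These counting rules can be sketched against a scripted list of responses. The types below are hypothetical and much simpler than the framework's own response model. They only show that turns and tool calls are counted separately.

```kotlin
// Hypothetical response model for illustration; the framework's LlmResponse differs.
sealed interface ScriptedResponse
data class ToolCallsResponse(val toolNames: List<String>) : ScriptedResponse
data class TextResponse(val text: String) : ScriptedResponse

data class Usage(val turns: Int, val toolCalls: Int)

/** Counts one turn per model response and one tool call per requested tool. */
fun countUsage(script: List<ScriptedResponse>): Usage {
    var turns = 0
    var toolCalls = 0
    for (response in script) {
        turns++ // every response is a turn, including the final text
        when (response) {
            is ToolCallsResponse -> toolCalls += response.toolNames.size
            is TextResponse -> return Usage(turns, toolCalls) // final answer ends the loop
        }
    }
    return Usage(turns, toolCalls)
}
```

For the three-turn trace above (search, then summarize, then the final text), countUsage returns Usage(turns=3, toolCalls=2).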

Example: Turn Counting

val agent = agent<String, String>("counter-demo") {
    model { ollama("qwen2.5:7b") }
    budget { maxTurns = 3 }

    lateinit var stepA: Tool<Map<String, Any?>, Any?>
    lateinit var stepB: Tool<Map<String, Any?>, Any?>
    tools {
        stepA = tool("step_a", "First step")  { args -> "result_a" }
        stepB = tool("step_b", "Second step") { args -> "result_b" }
    }
    skills {
        skill<String, String>("work", "Do work") {
            tools(stepA, stepB)
        }
    }
}

If the LLM's behavior is:

Turn 1: calls step_a and step_b together -> 1 turn
Turn 2: calls step_a again -> 1 turn
Turn 3: returns text "Done" -> 1 turn

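Total: 3 turns, exactly at the limit. If the model had requested more tools on turn 3 instead of answering, the framework would throw BudgetExceededException before making a 4th LLM call.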

Best Practices

1. Tune Budgets in Production

// Defaults are bounded, but may not match this workflow.
val risky = agent<String, String>("risky") {
    model { ollama("qwen2.5:7b") }
    // Uses defaults: 8 turns, 32 tool calls, 5 minutes.
    // ...
}

// Be explicit for production workflows.
val safe = agent<String, String>("safe") {
    model { ollama("qwen2.5:7b") }
    budget {
        maxTurns = 15
        maxToolCalls = 40
        maxDuration = 3.minutes
        maxConsecutiveSameTool = 3
    }
    // ...
}

2. Budget by Task Complexity

Match maxTurns to the number of turns the task is expected to take:

Task Type	Typical Turns	Suggested `maxTurns`
Single tool call + answer	2	3-5
Multi-step analysis (3-5 tools)	4-6	8-10
Complex research (many tools, iteration)	8-15	15-20
Open-ended exploration	10-30	25-30

Leave headroom above the expected turns. The LLM might need an extra turn to correct a mistake or rephrase its answer.

3. Separate Budgets for Nested Agents

When agents are composed via structure {}, each has its own budget. A parent agent with maxTurns = 10 does not share that budget with its children. The same independence applies to agents chained with then:

val researcher = agent<String, String>("researcher") {
    model { ollama("qwen2.5:7b") }
    budget { maxTurns = 20 }   // generous budget for deep research
    // ...
}

val summarizer = agent<String, String>("summarizer") {
    model { ollama("qwen2.5:7b") }
    budget { maxTurns = 3 }    // tight budget: should be quick
    // ...
}

val pipeline = researcher then summarizer
// researcher gets 20 turns, summarizer gets 3 -- independent

4. Use Low Budgets for Repair Agents

Repair agents used for Tool Error Recovery should have tight budgets. A repair agent that loops is worse than the original error:

val jsonFixer = agent<String, String>("json-fixer") {
    model { ollama("qwen2.5:7b") }
    budget { maxTurns = 1 }    // single-shot: one LLM call, no tools
    // ...
}

5. Test Budget Boundaries

Write tests that verify your agent completes within its budget:

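// Assumes kotlin.test and JUnit 5 (for assertThrows) on the test classpath.
import kotlin.test.Test
import kotlin.test.assertEquals
import org.junit.jupiter.api.assertThrows

@Test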
fun `agent completes within budget`() {
    var turnCount = 0
    val mockClient = ModelClient { messages ->
        turnCount++
        if (turnCount < 3) {
            LlmResponse.ToolCalls(listOf(ToolCall("step", emptyMap())))
        } else {
            LlmResponse.Text("done")
        }
    }

    val agent = agent<String, String>("test") {
        model { ollama("unused"); client = mockClient }
        budget { maxTurns = 5 }

        lateinit var step: Tool<Map<String, Any?>, Any?>
        tools { step = tool("step", "A step") { "ok" } }
        skills {
            skill<String, String>("work", "Work") {
                tools(step)
            }
        }
    }

    val result = agent("go")
    assertEquals("done", result)
    assertEquals(3, turnCount)  // completed in 3 turns, well within budget of 5
}

@Test
fun `agent throws when budget exceeded`() {
    val mockClient = ModelClient { _ ->
        // Never returns Text -- always calls tools
        LlmResponse.ToolCalls(listOf(ToolCall("step", emptyMap())))
    }

    val agent = agent<String, String>("test") {
        model { ollama("unused"); client = mockClient }
        budget { maxTurns = 3 }

        lateinit var step: Tool<Map<String, Any?>, Any?>
        tools { step = tool("step", "A step") { "ok" } }
        skills {
            skill<String, String>("work", "Work") {
                tools(step)
            }
        }
    }

    assertThrows<BudgetExceededException> {
        agent("go")
    }
}

6. Add a Pre-Cap Warning

Use onBudgetThreshold to get a warning before a cap is hit. The callback fires once per BudgetReason, when cumulative usage for that reason crosses the threshold:

val agent = agent<String, String>("monitored") {
    model { ollama("qwen2.5:7b") }
    budget {
        maxTurns = 10
        maxToolCalls = 20
        maxTokens = 8_000
    }

    onBudgetThreshold(0.8) { reason, usedPercent ->
        println("Budget warning: $reason is at ${(usedPercent * 100).toInt()}%")
    }

    // ...
}

Combine this with Observability Hooks to tune budgets over time. If an agent consistently uses 3 turns, a budget of 20 is wasteful -- tighten it to catch regressions early.
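
The fire-once behavior of the threshold callback can be pictured with a small standalone sketch. ThresholdNotifier is a hypothetical class, not the framework's implementation. It assumes that crossing means used / limit is at or above the threshold, and it uses plain strings as reason names.

```kotlin
// Hypothetical notifier; it only illustrates the "fires once per reason" behavior.
class ThresholdNotifier(
    private val threshold: Double,
    private val onCrossed: (reason: String, usedFraction: Double) -> Unit,
) {
    private val alreadyFired = mutableSetOf<String>()

    /** Reports cumulative usage for one reason; notifies at most once per reason. */
    fun report(reason: String, used: Int, limit: Int) {
        if (limit <= 0) return // nothing meaningful to compare against
        val fraction = used.toDouble() / limit
        // Set.add returns false when the reason has already fired.
        if (fraction >= threshold && alreadyFired.add(reason)) {
            onCrossed(reason, fraction)
        }
    }
}
```

With a threshold of 0.8 and a turn limit of 10, reporting 7, 8, and 9 used turns triggers the callback only at 8. A later report for a different reason still fires once for that reason.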

Next Steps

Model & Tool Calling -- understand the loop that budgets constrain
Tool Error Recovery -- error recovery interacts with budgets (retries consume turns)
Observability Hooks -- monitor budget usage in real time
