Commit 34f1eb0

Merge pull request #190 from Predicate-Labs/agent_auto
Predicate agent
2 parents 18d5c96 + c5541b0 commit 34f1eb0

10 files changed: +1129 -2 lines changed

CHANGELOG.md

Lines changed: 109 additions & 0 deletions
@@ -4,6 +4,115 @@ All notable changes to `@predicatelabs/sdk` will be documented in this file.

## Unreleased

### 2026-02-15

#### PredicateBrowserAgent (snapshot-first, verification-first)

`PredicateBrowserAgent` is a new high-level agent wrapper that gives you a **browser-use-like** `step()` / `run()` surface, but keeps Predicate’s core philosophy:

- **Snapshot-first perception** (structured DOM snapshot is the default)
- **Verification-first control plane** (you can gate progress with deterministic checks)
- Optional **vision fallback** (bounded) when snapshots aren’t sufficient

It’s built on top of `AgentRuntime` + `RuntimeAgent`.

##### Quickstart (single step)

```ts
import {
  AgentRuntime,
  PredicateBrowserAgent,
  type RuntimeStep,
  LocalLLMProvider, // or OpenAIProvider / AnthropicProvider / DeepInfraProvider
} from '@predicatelabs/sdk';

const runtime = new AgentRuntime(browserLike, page, tracer);
const llm = new LocalLLMProvider({ model: 'qwen2.5:7b', baseUrl: 'http://localhost:11434/v1' });

const agent = new PredicateBrowserAgent({
  runtime,
  executor: llm,
  config: {
    // Token control: include last N step summaries in the prompt (0 disables history).
    historyLastN: 2,
  },
});

const ok = await agent.step({
  taskGoal: 'Find pricing and verify checkout button exists',
  step: { goal: 'Open pricing page' } satisfies RuntimeStep,
});
```
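
The `historyLastN` comment above describes a token-control knob: only the most recent N step summaries reach the prompt, and 0 turns history off. A minimal sketch of that idea, assuming summaries are plain strings; `lastNSummaries` is a hypothetical helper for illustration and not an SDK API:

```ts
// Hypothetical helper (not part of @predicatelabs/sdk): keep only the
// last `n` step summaries, joined one per line.
function lastNSummaries(summaries: readonly string[], n: number): string {
  // n <= 0 disables history entirely, mirroring "0 disables history".
  if (n <= 0) return '';
  return summaries.slice(-n).join('\n');
}

const stepSummaries = ['opened home page', 'clicked Pricing', 'scrolled to plans'];
console.log(lastNSummaries(stepSummaries, 2));
```

Keeping the window small bounds prompt growth per step, at the cost of the model forgetting older steps.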

##### Customize the compact prompt (advanced)

```ts
const agent = new PredicateBrowserAgent({
  runtime,
  executor: llm,
  config: {
    compactPromptBuilder: (_taskGoal, _stepGoal, domContext, _snap, historySummary) => ({
      systemPrompt:
        'You are a web automation agent. Return ONLY one action: CLICK(id) | TYPE(id,"text") | PRESS("key") | FINISH()',
      userPrompt: `RECENT:\n${historySummary}\n\nELEMENTS:\n${domContext}\n\nReturn the single best action:`,
    }),
  },
});
```
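
The system prompt above restricts the model to a four-action grammar. A rough sketch of how a reply in that grammar could be validated before execution; `Action` and `parseAction` are hypothetical names, numeric element ids are an assumption, and the SDK's own parser is not shown in this changelog:

```ts
// Hypothetical types and parser for illustration only (not SDK APIs).
type Action =
  | { kind: 'click'; id: number }
  | { kind: 'type'; id: number; text: string }
  | { kind: 'press'; key: string }
  | { kind: 'finish' };

function parseAction(raw: string): Action | null {
  const s = raw.trim();

  // CLICK(id): assumes ids are non-negative integers.
  const click = /^CLICK\((\d+)\)$/.exec(s);
  if (click) return { kind: 'click', id: Number(click[1]) };

  // TYPE(id,"text"): text may be empty but may not contain a double quote.
  const type = /^TYPE\((\d+),\s*"([^"]*)"\)$/.exec(s);
  if (type) return { kind: 'type', id: Number(type[1]), text: type[2] };

  // PRESS("key")
  const press = /^PRESS\("([^"]+)"\)$/.exec(s);
  if (press) return { kind: 'press', key: press[1] };

  if (s === 'FINISH()') return { kind: 'finish' };

  // Anything else (prose, several actions, unknown verbs) is rejected.
  return null;
}

console.log(parseAction('TYPE(7,"hello")'));
```

Rejecting anything outside the grammar keeps a chatty model reply from being executed as a partial or ambiguous action.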

##### CAPTCHA handling (interface-only; no solver shipped)

If you set `captcha.policy="callback"`, you must provide a handler. The SDK does **not** include a public CAPTCHA solver.

```ts
import { HumanHandoffSolver } from '@predicatelabs/sdk';

const agent = new PredicateBrowserAgent({
  runtime,
  executor: llm,
  config: {
    captcha: {
      policy: 'callback',
      // Manual solve in the live session; SDK waits until it clears:
      handler: HumanHandoffSolver({ timeoutMs: 10 * 60_000, pollMs: 1_000 }),
    },
  },
});
```
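
The `timeoutMs` / `pollMs` options above suggest a poll-until-cleared loop with a deadline. A sketch of that pattern under stated assumptions: `waitUntilCleared` and the `isCleared` callback are hypothetical names, and this is not the implementation of `HumanHandoffSolver`:

```ts
// Hypothetical sketch of a poll-with-deadline loop (not an SDK API).
async function waitUntilCleared(
  isCleared: () => Promise<boolean>,
  opts: { timeoutMs: number; pollMs: number }
): Promise<boolean> {
  const deadline = Date.now() + opts.timeoutMs;
  while (Date.now() < deadline) {
    if (await isCleared()) return true;
    // Sleep for one poll interval before checking again.
    await new Promise<void>((resolve) => setTimeout(resolve, opts.pollMs));
  }
  // One last check at the deadline so a late clear is not missed.
  return isCleared();
}

// Usage with a stand-in check that reports "cleared" on the third poll.
let polls = 0;
waitUntilCleared(async () => ++polls >= 3, { timeoutMs: 1_000, pollMs: 10 }).then((cleared) =>
  console.log(`cleared=${cleared} after ${polls} polls`)
);
```

A short poll interval reacts quickly once the human finishes; the timeout bounds how long an unattended run can stall.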

#### RuntimeAgent: structured prompt override hooks

`RuntimeAgent` now supports optional hooks used by `PredicateBrowserAgent`:

- `structuredPromptBuilder(...)`
- `domContextPostprocessor(...)`
- `historySummaryProvider(...)`

#### PredicateBrowserAgent: opt-in token usage accounting (best-effort)

If you want to measure token spend, you can enable best-effort accounting (depends on provider reporting token counts):

```ts
const agent = new PredicateBrowserAgent({
  runtime,
  executor: llm,
  config: {
    tokenUsageEnabled: true,
  },
});

const usage = agent.getTokenUsage();
agent.resetTokenUsage();
```
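
"Best-effort" here means totals are only as complete as what the provider reports. A small sketch of what such accounting could look like; `UsageReport`, `TokenUsageAccumulator`, and their field names are assumptions for illustration, not the shape returned by `getTokenUsage()`:

```ts
// Hypothetical shapes (not SDK types): a provider may omit either count.
interface UsageReport {
  promptTokens?: number;
  completionTokens?: number;
}

class TokenUsageAccumulator {
  private prompt = 0;
  private completion = 0;
  private unreported = 0;

  record(report: UsageReport): void {
    if (report.promptTokens === undefined && report.completionTokens === undefined) {
      // No counts at all: remember the gap so totals read as a lower bound.
      this.unreported += 1;
      return;
    }
    this.prompt += report.promptTokens ?? 0;
    this.completion += report.completionTokens ?? 0;
  }

  totals() {
    return {
      promptTokens: this.prompt,
      completionTokens: this.completion,
      totalTokens: this.prompt + this.completion,
      unreportedCalls: this.unreported,
    };
  }

  reset(): void {
    this.prompt = 0;
    this.completion = 0;
    this.unreported = 0;
  }
}

const acc = new TokenUsageAccumulator();
acc.record({ promptTokens: 120, completionTokens: 8 });
acc.record({}); // a provider that reports nothing
console.log(acc.totals());
```

Tracking unreported calls separately makes it visible when a total understates real spend.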

#### RuntimeAgent: actOnce without step lifecycle (orchestrators)

`RuntimeAgent` now exposes `actOnce(...)` helpers that execute exactly one action **without** calling `runtime.beginStep()` / `runtime.emitStepEnd()`. This is intended for external orchestrators (e.g. WebBench) that already own step lifecycle and just want the SDK’s snapshot-first propose+execute block.

- `await agent.actOnce(...) -> string`
- `await agent.actOnceWithSnapshot(...) -> { action, snap }`
- `await agent.actOnceResult(...) -> { action, snap, usedVision }`
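
To illustrate the division of labor, here is a sketch of an orchestrator that records its own step begin/end events and delegates only the single action. Only the method name `actOnce` and its string result come from the entry above; the `OneShotActor` interface, the `stepGoal` parameter, and `runOrchestrated` are hypothetical:

```ts
// Hypothetical interface: the real actOnce(...) parameters are not shown here.
interface OneShotActor {
  actOnce(stepGoal: string): Promise<string>;
}

// The orchestrator owns the step lifecycle; the actor only proposes and
// executes one action per call.
async function runOrchestrated(actor: OneShotActor, goals: string[]): Promise<string[]> {
  const log: string[] = [];
  for (let i = 0; i < goals.length; i++) {
    log.push(`begin step ${i + 1}: ${goals[i]}`);
    const action = await actor.actOnce(goals[i]);
    log.push(`end step ${i + 1}: ${action}`);
    if (action === 'FINISH()') break; // stop once the actor signals completion
  }
  return log;
}

// Stand-in actor for the sketch; a real one would wrap a RuntimeAgent.
const stubActor: OneShotActor = {
  actOnce: async (goal) => (goal === 'finish' ? 'FINISH()' : 'CLICK(1)'),
};

runOrchestrated(stubActor, ['open pricing', 'finish']).then((lines) =>
  console.log(lines.join('\n'))
);
```

Because the actor never emits lifecycle events itself, the orchestrator's trace cannot end up with duplicated or interleaved step boundaries.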

### 2026-02-13

#### Expanded deterministic verifications (adaptive resnapshotting)

examples/agent/README.md

Lines changed: 6 additions & 0 deletions
@@ -0,0 +1,6 @@
Predicate agent examples.

- `predicate-browser-agent-minimal.ts`: minimal `PredicateBrowserAgent` usage.
- `predicate-browser-agent-custom-prompt.ts`: customize the compact prompt builder.
- `predicate-browser-agent-video-recording-playwright.ts`: enable Playwright video recording via context options (recommended).

examples/agent/predicate-browser-agent-custom-prompt.ts

Lines changed: 114 additions & 0 deletions
@@ -0,0 +1,114 @@
/**
 * Example: PredicateBrowserAgent with compact prompt customization.
 *
 * Usage:
 *   ts-node examples/agent/predicate-browser-agent-custom-prompt.ts
 */

import { Page } from 'playwright';
import {
  AgentRuntime,
  PredicateBrowserAgent,
  type PredicateBrowserAgentConfig,
  RuntimeStep,
  SentienceBrowser,
} from '../../src';
import { createTracer } from '../../src/tracing/tracer-factory';
import { LLMProvider, type LLMResponse } from '../../src/llm-provider';
import type { Snapshot } from '../../src/types';

function createBrowserAdapter(browser: SentienceBrowser) {
  return {
    snapshot: async (_page: Page, options?: Record<string, any>): Promise<Snapshot> => {
      return await browser.snapshot(options);
    },
  };
}

class RecordingProvider extends LLMProvider {
  public lastSystem: string | null = null;
  public lastUser: string | null = null;

  constructor(private action: string = 'FINISH()') {
    super();
  }

  get modelName(): string {
    return 'recording-provider';
  }
  supportsJsonMode(): boolean {
    return false;
  }
  async generate(
    systemPrompt: string,
    userPrompt: string,
    _options: Record<string, any> = {}
  ): Promise<LLMResponse> {
    this.lastSystem = systemPrompt;
    this.lastUser = userPrompt;
    return { content: this.action, modelName: this.modelName };
  }
}

const config: PredicateBrowserAgentConfig = {
  historyLastN: 2,
  compactPromptBuilder: (
    taskGoal: string,
    stepGoal: string,
    domContext: string,
    _snap: Snapshot,
    historySummary: string
  ) => {
    const systemPrompt =
      'You are a web automation executor. Return ONLY ONE action: CLICK(id) | TYPE(id,"text") | PRESS("key") | FINISH(). No prose.';
    const userPrompt =
      `TASK GOAL:\n${taskGoal}\n\n` +
      (historySummary ? `RECENT STEPS:\n${historySummary}\n\n` : '') +
      `STEP GOAL:\n${stepGoal}\n\n` +
      `DOM CONTEXT:\n${domContext.slice(0, 4000)}\n`;
    return { systemPrompt, userPrompt };
  },
};

async function main() {
  const apiKey = (process.env.PREDICATE_API_KEY ||
    process.env.SENTIENCE_API_KEY) as string | undefined;
  if (!apiKey) {
    console.error('Error: PREDICATE_API_KEY or SENTIENCE_API_KEY not set');
    process.exit(1);
  }

  const runId = 'predicate-browser-agent-custom-prompt';
  const tracer = await createTracer({ apiKey, runId, uploadTrace: false });

  const browser = new SentienceBrowser(apiKey, undefined, false);
  await browser.start();
  const page = browser.getPage();

  try {
    await page.goto('https://example.com');
    await page.waitForLoadState('networkidle');

    const runtime = new AgentRuntime(createBrowserAdapter(browser), page, tracer);
    const executor = new RecordingProvider('FINISH()');

    const agent = new PredicateBrowserAgent({ runtime, executor, config });

    const out = await agent.step({
      taskGoal: 'Open example.com',
      step: { goal: 'Take no action; just finish' } satisfies RuntimeStep,
    });

    console.log(`step ok: ${out.ok}`);
    console.log('--- prompt preview (system) ---');
    console.log((executor.lastSystem || '').slice(0, 300));
    console.log('--- prompt preview (user) ---');
    console.log((executor.lastUser || '').slice(0, 300));
  } finally {
    await tracer.close(true);
    await browser.close();
  }
}

main().catch(console.error);

examples/agent/predicate-browser-agent-minimal.ts

Lines changed: 108 additions & 0 deletions
@@ -0,0 +1,108 @@
/**
 * Example: PredicateBrowserAgent minimal demo.
 *
 * Usage:
 *   ts-node examples/agent/predicate-browser-agent-minimal.ts
 *
 * Requires:
 * - PREDICATE_API_KEY or SENTIENCE_API_KEY (SentienceBrowser API key)
 */

import { Page } from 'playwright';
import {
  AgentRuntime,
  PredicateBrowserAgent,
  type PredicateBrowserAgentConfig,
  RuntimeStep,
  StepVerification,
  SentienceBrowser,
  exists,
  urlContains,
} from '../../src';
import { createTracer } from '../../src/tracing/tracer-factory';
import { LLMProvider, type LLMResponse } from '../../src/llm-provider';
import type { Snapshot } from '../../src/types';

function createBrowserAdapter(browser: SentienceBrowser) {
  return {
    snapshot: async (_page: Page, options?: Record<string, any>): Promise<Snapshot> => {
      return await browser.snapshot(options);
    },
  };
}

class FixedActionProvider extends LLMProvider {
  constructor(private action: string) {
    super();
  }
  get modelName(): string {
    return 'fixed-action';
  }
  supportsJsonMode(): boolean {
    return false;
  }
  async generate(
    _systemPrompt: string,
    _userPrompt: string,
    _options: Record<string, any> = {}
  ): Promise<LLMResponse> {
    return { content: this.action, modelName: this.modelName };
  }
}

async function main() {
  const apiKey = (process.env.PREDICATE_API_KEY ||
    process.env.SENTIENCE_API_KEY) as string | undefined;
  if (!apiKey) {
    console.error('Error: PREDICATE_API_KEY or SENTIENCE_API_KEY not set');
    process.exit(1);
  }

  const runId = 'predicate-browser-agent-minimal';
  const tracer = await createTracer({ apiKey, runId, uploadTrace: false });

  const browser = new SentienceBrowser(apiKey, undefined, false);
  await browser.start();
  const page = browser.getPage();

  try {
    await page.goto('https://example.com');
    await page.waitForLoadState('networkidle');

    const runtime = new AgentRuntime(createBrowserAdapter(browser), page, tracer);

    const executor = new FixedActionProvider('FINISH()');
    const config: PredicateBrowserAgentConfig = { historyLastN: 2 };

    const agent = new PredicateBrowserAgent({ runtime, executor, config });

    const steps: RuntimeStep[] = [
      {
        goal: 'Verify Example Domain is loaded',
        verifications: [
          {
            predicate: urlContains('example.com'),
            label: 'url_contains_example',
            required: true,
          } satisfies StepVerification,
          {
            predicate: exists('role=heading'),
            label: 'has_heading',
            required: true,
          } satisfies StepVerification,
        ],
        maxSnapshotAttempts: 2,
        snapshotLimitBase: 60,
      },
    ];

    const ok = await agent.run({ taskGoal: 'Open example.com and verify', steps });
    console.log(`run ok: ${ok}`);
  } finally {
    await tracer.close(true);
    await browser.close();
  }
}

main().catch(console.error);
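
The `verifications` entries in the minimal example mark each check as `required`. A simplified sketch of how required versus optional checks could gate a step, assuming a toy page state in place of a real snapshot; `PageState`, `urlContainsSketch`, `hasRoleSketch`, and `evaluateVerifications` are hypothetical stand-ins, not the SDK's `urlContains` / `exists` or its evaluation logic:

```ts
// Hypothetical, simplified model: real SDK predicates run against snapshots.
interface PageState {
  url: string;
  roles: string[];
}
type Predicate = (state: PageState) => boolean;
interface Verification {
  predicate: Predicate;
  label: string;
  required: boolean;
}

const urlContainsSketch = (needle: string): Predicate => (s) => s.url.indexOf(needle) !== -1;
const hasRoleSketch = (role: string): Predicate => (s) => s.roles.indexOf(role) !== -1;

// A step passes when every required check holds; failed optional checks are
// reported but do not block progress.
function evaluateVerifications(
  state: PageState,
  verifications: Verification[]
): { ok: boolean; failed: string[] } {
  const failed: string[] = [];
  let ok = true;
  for (const v of verifications) {
    if (!v.predicate(state)) {
      failed.push(v.label);
      if (v.required) ok = false;
    }
  }
  return { ok, failed };
}

const demoState: PageState = { url: 'https://example.com/', roles: ['heading', 'link'] };
const demoResult = evaluateVerifications(demoState, [
  { predicate: urlContainsSketch('example.com'), label: 'url_contains_example', required: true },
  { predicate: hasRoleSketch('heading'), label: 'has_heading', required: true },
]);
console.log(demoResult);
```

Since each check is a pure function of observed state, the same inputs always give the same verdict, which is what makes this kind of gating deterministic.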
