diff --git a/.claude/skills/audit-langchain-docs/SKILL.md b/.claude/skills/audit-langchain-docs/SKILL.md
new file mode 100644
index 0000000..13441b2
--- /dev/null
+++ b/.claude/skills/audit-langchain-docs/SKILL.md
@@ -0,0 +1,169 @@
+---
+name: audit-langchain-docs
+description: "Audit all Diffbot documentation in langchain-ai/docs for drift against this repo's public API. Runs Vale and other docs-repo validators, then applies any prose fixes found back to README.md. Use when the user says: audit docs, check docs are up to date, review langchain docs, docs drift."
+allowed-tools: Bash(python3:*), Bash(grep:*), Bash(find:*), Bash(make:*), Bash(uv:*), Read, Edit
+---
+
+# Audit Diffbot docs in langchain-ai/docs
+
+This skill checks every Diffbot documentation file on the LangChain docs site against this repo's public API, runs the docs repo's own validators (Vale, etc.), and propagates any findings back into this repo — `README.md` and Python source files (docstrings, comments). The docs repo's validation standards apply repo-wide here.
+
+## Source of truth (this repo)
+
+| Artifact | What it defines |
+|----------|----------------|
+| `langchain_diffbot/__init__.py` (`__all__`) | The complete public API — every class that should be documented |
+| `langchain_diffbot/tools.py` | Tool class signatures, input schemas, return types |
+| `langchain_diffbot/retrievers.py` | Retriever constructor parameters, return types |
+| `langchain_diffbot/chat_models.py` | Chat model constructor parameters |
+| `langchain_diffbot/document_loaders.py` | Loader constructor parameters |
+| `README.md` | Canonical prose, API table, auth model, examples |
+
+## Documentation files to audit (langchain-ai/docs)
+
+A local `langchain-ai/docs` checkout is expected at the sibling **`../langchain-docs`** by default (override with `$LANGCHAIN_DOCS_REPO`).
+
+| File | What it should reflect |
+|------|----------------------|
+| `src/oss/python/integrations/providers/diffbot.mdx` | Overview hub: API table, install, auth, component table, links to tools/retrievers pages |
+| `src/oss/python/integrations/tools/diffbot.mdx` | All 7 tools: `DiffbotExtractTool`, `DiffbotWebSearchTool`, `DiffbotKnowledgeGraphTool`, `DiffbotEntitiesTool`, `DiffbotAskTool`, `DiffbotOntologyTool`, `DiffbotDQLProbeTool` |
+| `src/oss/python/integrations/retrievers/diffbot.mdx` | Both retrievers: `DiffbotKnowledgeGraphRetriever`, `DiffbotWebSearchRetriever` |
+| `src/oss/python/integrations/providers/all_providers.mdx` | Card entry for Diffbot |
+| `src/oss/python/integrations/tools/index.mdx` | Row in Search table + card in All tools and toolkits |
+| `src/oss/python/integrations/retrievers/index.mdx` | Rows in External index table + card in All retrievers |
+
+## Steps
+
+
+
+### Locate the docs repo
+
+```bash
+DOCS_REPO="${LANGCHAIN_DOCS_REPO:-../langchain-docs}"
+if [ ! -d "$DOCS_REPO/src" ]; then
+  echo "ERROR: docs repo not found at $DOCS_REPO"
+  exit 1
+fi
+echo "Docs repo: $DOCS_REPO"
+```
+
+### Read the public API surface
+
+Read `langchain_diffbot/__init__.py` and extract every name from `__all__`. This is the canonical list of classes that must appear in the docs.
+
+Also read the source files to extract key constructor parameters and return types for each class:
+- Tools: what input schema does each tool accept? What does it return?
+- Retrievers: what constructor parameters does each take (`client`, `k`, `fields`, `content_fields`, `document_mapper`)?
+- Chat model: what parameters does `ChatDiffbot` take?
+- Loaders: what parameters do `DiffbotExtractLoader` and `DiffbotCrawlLoader` take?
+
+### Read every documentation file
+
+Read all 6 files listed in the table above. The paths in that table are relative to `$DOCS_REPO` (they already start with `src/`).
+
+### Run Vale on the three Diffbot content pages
+
+Run the docs repo's own Vale validation against the three Diffbot content pages. The docs repo's `make lint_prose` accepts a space-separated `FILES=` argument:
+
+```bash
+make -C "${LANGCHAIN_DOCS_REPO:-../langchain-docs}" lint_prose \
+ FILES="src/oss/python/integrations/providers/diffbot.mdx \
+ src/oss/python/integrations/tools/diffbot.mdx \
+ src/oss/python/integrations/retrievers/diffbot.mdx"
+```
+
+Capture the full output. Any errors or warnings Vale reports are **definitive** — they are exactly what would fail the docs repo's CI. Include every Vale finding verbatim in the report under a **Vale violations** category. If Vale reports clean, note that explicitly.
+
+### Compare and report
+
+For each issue found, report: the file path, the line or section, and the specific discrepancy. Organize the report into these categories:
+
+**Missing coverage** — Classes in `__all__` not mentioned anywhere in the docs:
+- Check `tools/diffbot.mdx` covers all 7 tool classes
+- Check `retrievers/diffbot.mdx` covers both retriever classes
+- Check `providers/diffbot.mdx` component table has all 12 classes
+
+**Stale class names** — Class names in docs that no longer exist in `__all__`:
+- Search docs files for any class name starting with `Diffbot` or `ChatDiffbot` and verify each is still in `__all__`
+
+**Missing constructor parameters** — Key parameters documented in docs but removed from source, or present in source but undocumented:
+- Retrievers: `k`, `fields`, `content_fields`, `document_mapper` — verify docs mention all four
+- Tools: verify each tool's documented input fields match the actual input schema
+
+**Stale prose** — Description in docs contradicts current behavior:
+- The authentication model (client-based, not token-based env var pattern)
+- Error handling behavior of `DiffbotExtractTool` (returns `{"error": ..., "errorCode": ...}` dict on failure, does not raise)
+- `DiffbotCrawlLoader` page_content behavior (URL only, not page content)
+
+**Broken cross-links** — Internal links in docs files that point to non-existent pages:
+- `providers/diffbot.mdx` links to `/oss/integrations/tools/diffbot` and `/oss/integrations/retrievers/diffbot` — verify both pages exist
+- `tools/diffbot.mdx` links to `/oss/integrations/providers/diffbot` — verify it exists
+- `retrievers/diffbot.mdx` links to `/oss/integrations/providers/diffbot` — verify it exists
+
+**Missing index entries** — Diffbot entries absent from listing pages:
+- `all_providers.mdx` has a card for Diffbot
+- `tools/index.mdx` has a row in the Search table and a card in All tools and toolkits
+- `retrievers/index.mdx` has rows in the External index table and a card in All retrievers
+
+**Vale violations** — Prose issues caught by the docs repo's Vale CI (terminology, dash spacing, etc.). These are definitive: any error here would block a PR merge. Include the raw Vale output line for each finding.
+
+**README/hub drift** — The provider hub page (`providers/diffbot.mdx`) is kept in sync with README.md. Check:
+- The API table in the hub matches the API table in README.md
+- The component reference table in the hub matches README.md and `__all__`
+- The install instructions are consistent
+
+### Output a summary
+
+Produce a concise report:
+
+```
+DIFFBOT DOCS AUDIT —
+
+VALE —
+
+
+✅ PASSING
+ - All 12 classes in __all__ appear in docs
+ - ...
+
+⚠️ ISSUES ()
+ 1. [VALE] providers/diffbot.mdx:39 — Remove whitespace around ' —'. (LangChain.DashesSpaces)
+ 2. [MISSING COVERAGE] tools/diffbot.mdx: DiffbotFooTool added to __all__ but not documented
+ 3. [STALE IMPORTS] retrievers/diffbot.mdx:55 — legacy langchain.schema.* import path
+ 4. [MISSING INDEX ENTRY] tools/index.mdx: Diffbot missing from Search table
+ ...
+
+FIXED IN THIS REPO
+ - README.md:48 — "pre-built" → "prebuilt"
+ - langchain_diffbot/tools.py:43 — docstring: "pre-built" → "prebuilt"
+
+STILL NEEDS ATTENTION IN langchain-docs
+ - retrievers/diffbot.mdx:55 — legacy langchain.schema.* import (edit directly in langchain-docs)
+```
+
+### Propagate findings back into this repo
+
+For every issue found — Vale violations, stale prose, outdated import patterns — apply the same fix throughout this repo:
+
+1. **README.md** — fix any matching prose, terminology, or example code.
+2. **Python source files** (`langchain_diffbot/*.py`) — fix matching issues in docstrings and inline comments. Do not change logic or signatures.
+
+Use `grep` to find occurrences before editing:
+
+```bash
+grep -rn "pre-built\| — " langchain_diffbot/ README.md
+```
+
+After applying fixes, run the parity tests to confirm nothing broke:
+
+```bash
+uv run pytest tests/unit_tests/test_readme_parity.py tests/unit_tests/test_readme_examples.py -q
+```
+
+
+
+## Notes
+
+- Fixes to `README.md` and Python source are applied directly by this skill. Fixes to `langchain-docs` files must be made there separately (this repo has no write access to the remote).
+- To sync the provider hub after fixing README.md: run `/sync-langchain-docs`.
+- To fix tools/retrievers deep-dive pages: edit them directly in `langchain-ai/docs`.
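The "Missing coverage" and "Stale class names" checks in the audit skill above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `public_names` and `coverage_report` are hypothetical helper names that do not exist in this PR, and the regex is a rough stand-in for the skill's "starts with `Diffbot` or `ChatDiffbot`" heuristic.

```python
import ast
import re


def public_names(init_source: str) -> list[str]:
    """Return the names listed in a module's top-level ``__all__`` assignment."""
    tree = ast.parse(init_source)
    for node in tree.body:
        if isinstance(node, ast.Assign) and any(
            isinstance(t, ast.Name) and t.id == "__all__" for t in node.targets
        ):
            return list(ast.literal_eval(node.value))
    return []


def coverage_report(init_source: str, docs_text: str) -> dict[str, list[str]]:
    """Compare ``__all__`` against Diffbot class names mentioned in the docs."""
    exported = set(public_names(init_source))
    # Hypothetical heuristic: ChatDiffbot, or Diffbot followed by a CamelCase suffix.
    mentioned = set(re.findall(r"\b(?:ChatDiffbot|Diffbot[A-Z]\w*)\b", docs_text))
    return {
        "missing": sorted(exported - mentioned),  # exported but undocumented
        "stale": sorted(mentioned - exported),    # documented but not exported
    }
```

A real audit would likely need an allowlist for SDK names such as `DiffbotAsync`, which match the pattern but are not part of this package's `__all__`.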
diff --git a/.claude/skills/sync-langchain-docs/SKILL.md b/.claude/skills/sync-langchain-docs/SKILL.md
new file mode 100644
index 0000000..eda4c30
--- /dev/null
+++ b/.claude/skills/sync-langchain-docs/SKILL.md
@@ -0,0 +1,114 @@
+---
+name: sync-langchain-docs
+description: "Keep all Diffbot docs in sync: README.md in this repo and the three LangChain docs pages (providers, tools, retrievers). Use when code changes, when any doc page drifts, or when the user says: sync langchain docs, update the diffbot docs, sync the integration pages."
+allowed-tools: Bash(python3:*), Bash(git:*), Bash(gh:*), Bash(make:*), Bash(uv:*), Read, Edit, Write
+---
+
+# Sync Diffbot docs across all pages
+
+## What this skill does
+
+Four documents describe the `langchain-diffbot` package. Keep them all accurate and consistent — update whichever ones need it, not just one.
+
+| File | Audience | Owns |
+|------|----------|------|
+| `README.md` (this repo) | GitHub / PyPI readers | Complete package reference — install, auth, all classes, examples |
+| `providers/diffbot.mdx` (langchain-docs) | Docs site visitors landing on Diffbot | Overview only: install, auth, components table, links to detail pages |
+| `tools/diffbot.mdx` (langchain-docs) | Docs site visitors looking for tools | Full tool documentation with examples |
+| `retrievers/diffbot.mdx` (langchain-docs) | Docs site visitors looking for retrievers | Full retriever documentation with examples |
+
+**Link, don't duplicate within langchain-docs.** The provider hub names every class and links to the tools/retrievers pages; the detail pages don't repeat install/auth. The README is for a different audience and channel (GitHub/PyPI) — it can be complete without violating this rule.
+
+## Where the docs repo lives
+
+A local `langchain-ai/docs` checkout is expected at the sibling **`../langchain-docs`** by default (override with `$LANGCHAIN_DOCS_REPO`). Verify it:
+
+```bash
+python3 .claude/skills/sync-langchain-docs/sync.py --path
+```
+
+If that errors, ask the user for its path or to clone `git@github.com:langchain-ai/docs.git` next to this repo.
+
+## Steps
+
+
+
+### Read all four files
+
+Read every file before touching any of them:
+
+- `README.md`
+- `$(python3 .claude/skills/sync-langchain-docs/sync.py --repo)/src/oss/python/integrations/providers/diffbot.mdx`
+- `$(python3 .claude/skills/sync-langchain-docs/sync.py --repo)/src/oss/python/integrations/tools/diffbot.mdx`
+- `$(python3 .claude/skills/sync-langchain-docs/sync.py --repo)/src/oss/python/integrations/retrievers/diffbot.mdx`
+
+Also check what triggered the sync — git diff, the user's description, or a specific change — so you know what actually changed and can limit edits to what's necessary.
+
+### Identify what needs updating
+
+For each of the four files, decide independently whether it needs a change. Common triggers:
+
+- **New or renamed class** → update the components table in README + provider hub; add documentation to the appropriate detail page (tools or retrievers); update any import examples.
+- **Behavior change to a tool or retriever** → update README + the matching detail page.
+- **Auth model change** → update README + provider hub (both cover auth); check if detail pages reference auth.
+- **Install instructions change** → update README + provider hub.
+- **Example improvement** → update README; mirror to the matching detail page if it's more illustrative.
+- **Detail page drifted from the code** → update just that page.
+
+If only one file needs a change, only edit that file.
+
+### Apply the updates
+
+Edit each file that needs it. Rules per file:
+
+**README.md** — complete reference, no format restrictions. Run the parity guard after any change to ensure the components table and examples stay in sync with the package:
+
+```bash
+uv run pytest tests/unit_tests/test_readme_parity.py tests/unit_tests/test_readme_examples.py -q
+```
+
+**providers/diffbot.mdx** — hub only. Structure:
+1. Frontmatter (`title`, `description`)
+2. Sync comment (keep it — see current file for wording)
+3. Short intro + API → class mapping table
+4. `## Installation` as `` with `pip` and `uv` tabs
+5. `## Authentication` — prose + `db = Diffbot(...)` snippet only; no usage examples
+6. One short section per class group (Retrievers, Tools, Chat model, Document loaders) — one sentence + import snippet + link to the detail page; no examples
+7. `## Components reference` table
+
+**tools/diffbot.mdx** — full tool docs. Include every tool class, usage examples, and any agent patterns. Link back to the provider hub for install/auth. Do not repeat retriever content.
+
+**retrievers/diffbot.mdx** — full retriever docs. Include both retriever classes, output shaping, LCEL chain usage. Link back to the provider hub for install/auth. Do not repeat tool content.
+
+MDX formatting rules (Vale enforces these — violations block the docs CI):
+- Em dashes: no surrounding spaces (`word—word`, not `word — word`)
+- `prebuilt` not `pre-built`
+- Install blocks: `` with `pip` and `uv` tabs
+- Relative links to this repo become absolute `https://github.com/diffbot/langchain-diffbot/...` URLs
+
+### Lint every changed MDX file
+
+The docs repo has its own Vale setup. Run it against each changed MDX:
+
+```bash
+make -C "$(python3 .claude/skills/sync-langchain-docs/sync.py --repo)" \
+ lint_prose FILES="src/oss/python/integrations/providers/diffbot.mdx"
+```
+
+**If Vale or any other docs-repo validation catches a prose issue, fix it in `README.md` too** — the docs repo leads on prose quality and the README should follow. Fix violations and re-run until clean.
+
+### Commit and push
+
+Work from inside the docs repo. Reuse the existing `integration/diffbot` branch if it exists; otherwise create `docs/sync-diffbot-`.
+
+```bash
+DOCS="${LANGCHAIN_DOCS_REPO:-../langchain-docs}"
+cd "$DOCS"
+git add
+git commit -m "docs: "
+git push
+```
+
+Stage only the files you changed. If the branch already has an open PR, the push updates it automatically — no need to open a new one unless the user asks.
+
+
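The two prose rules named in the sync skill above (no spaces around em dashes, `prebuilt` rather than `pre-built`) can be pre-checked locally before running Vale. The sketch below is an assumption-laden illustration, not part of this PR: `RULES` and `find_prose_issues` are hypothetical names, and Vale in the docs repo remains the authority on what actually fails CI.

```python
import re

# Patterns mirror the skill's formatting rules; they are an approximation of Vale.
RULES = {
    "spaced em dash": re.compile(r"\s\u2014\s"),
    "pre-built": re.compile(r"\bpre-built\b", re.IGNORECASE),
}


def find_prose_issues(text: str) -> list[tuple[int, str]]:
    """Return (line_number, rule_name) pairs for every line that breaks a rule."""
    issues = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for name, pattern in RULES.items():
            if pattern.search(line):
                issues.append((lineno, name))
    return issues
```

Running this over `README.md` and the three `.mdx` pages gives roughly the same hits as the `grep` call in the audit skill, with line numbers that can go straight into the report.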
diff --git a/.claude/skills/sync-langchain-docs/sync.py b/.claude/skills/sync-langchain-docs/sync.py
new file mode 100644
index 0000000..e9e8634
--- /dev/null
+++ b/.claude/skills/sync-langchain-docs/sync.py
@@ -0,0 +1,71 @@
+#!/usr/bin/env python3
+"""Resolve the Diffbot provider page inside a local langchain-ai/docs checkout.
+
+README.md in this repo is the single source of truth for the Diffbot provider
+page on the LangChain docs site; the `sync-langchain-docs` skill *generates* the
+page (`.mdx`) from it. The generation itself is agent-driven (prose → house
+style), so this script does not copy anything — it just resolves where the page
+lives, so the skill never hardcodes the path:
+
+ # Resolve the docs repo from --docs-repo, $LANGCHAIN_DOCS_REPO, or the
+ # sibling ../langchain-docs, then print:
+ python3 sync.py --path # absolute path of the target .mdx page
+ python3 sync.py --repo # absolute path of the docs repo root
+
+The target path inside the docs repo is fixed:
+ src/oss/python/integrations/providers/diffbot.mdx
+"""
+
+from __future__ import annotations
+
+import argparse
+import os
+from pathlib import Path
+
+# This file lives at /.claude/skills/sync-langchain-docs/sync.py.
+_REPO_ROOT = Path(__file__).resolve().parents[3]
+TARGET_RELPATH = "src/oss/python/integrations/providers/diffbot.mdx"
+
+
+def resolve_docs_repo(arg: str | None) -> Path:
+    """Find the langchain-ai/docs checkout from the arg, env, or sibling dir."""
+    candidate = arg or os.environ.get("LANGCHAIN_DOCS_REPO")
+    if candidate:
+        path = Path(candidate).expanduser().resolve()
+    else:
+        path = (_REPO_ROOT.parent / "langchain-docs").resolve()
+    if not (path / TARGET_RELPATH).exists():
+        msg = (
+            f"langchain-ai/docs checkout not found at {path} "
+            f"(no {TARGET_RELPATH}). Pass --docs-repo or set LANGCHAIN_DOCS_REPO."
+        )
+        raise SystemExit(msg)
+    return path
+
+
+def main() -> int:
+    """Print the resolved docs repo root or target page path."""
+    parser = argparse.ArgumentParser(description=__doc__)
+    parser.add_argument(
+        "--repo",
+        action="store_true",
+        help="Print the docs repo root instead of the target page path.",
+    )
+    parser.add_argument(
+        "--path",
+        action="store_true",
+        help="Print the absolute path of the target page (default).",
+    )
+    parser.add_argument(
+        "--docs-repo", default=None, help="Path to a local langchain-ai/docs checkout."
+    )
+    args = parser.parse_args()
+
+    repo = resolve_docs_repo(args.docs_repo)
+    # Default to the page path so a bare invocation is useful too.
+    print(repo if args.repo else repo / TARGET_RELPATH)
+    return 0
+
+
+if __name__ == "__main__":
+    raise SystemExit(main())
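The resolution order in `sync.py` above (explicit `--docs-repo`, then `$LANGCHAIN_DOCS_REPO`, then the sibling `../langchain-docs`) can be exercised against throwaway directories. The sketch below restates the resolver inline so it is self-contained; `resolve` and `make_fake_checkout` are hypothetical names for illustration, and the environment is passed as a plain dict rather than read from `os.environ`.

```python
import tempfile
from pathlib import Path
from typing import Optional

TARGET_RELPATH = "src/oss/python/integrations/providers/diffbot.mdx"


def resolve(arg: Optional[str], env: dict[str, str], sibling: Path) -> Path:
    """Same precedence as sync.py: explicit arg, then env var, then sibling dir."""
    candidate = arg or env.get("LANGCHAIN_DOCS_REPO")
    path = Path(candidate).expanduser().resolve() if candidate else sibling.resolve()
    if not (path / TARGET_RELPATH).exists():
        raise SystemExit(f"langchain-ai/docs checkout not found at {path}")
    return path


def make_fake_checkout(root: Path) -> Path:
    """Create the one file the resolver looks for (hypothetical test helper)."""
    page = root / TARGET_RELPATH
    page.parent.mkdir(parents=True)
    page.write_text("---\ntitle: Diffbot\n---\n")
    return root


with tempfile.TemporaryDirectory() as tmp:
    from_arg = make_fake_checkout(Path(tmp) / "from_arg")
    from_env = make_fake_checkout(Path(tmp) / "from_env")
    env = {"LANGCHAIN_DOCS_REPO": str(from_env)}
    # The explicit argument wins over the environment variable.
    print(resolve(str(from_arg), env, from_env).name)
```

Because the check is "does the provider page exist", a directory that is a git checkout but lacks `providers/diffbot.mdx` is rejected just like a missing directory.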
diff --git a/.gitignore b/.gitignore
index dc4bc87..134ab8b 100644
--- a/.gitignore
+++ b/.gitignore
@@ -14,3 +14,6 @@ build/
# Claude Code harness runtime artifacts
**/.claude/scheduled_tasks.lock
+
+# Claude Code personal/local settings
+.claude/settings.local.json
diff --git a/CLAUDE.md b/CLAUDE.md
index 25d2e8b..7f16a85 100644
--- a/CLAUDE.md
+++ b/CLAUDE.md
@@ -18,11 +18,11 @@ Every public class calls `diffbot.Diffbot` / `diffbot.DiffbotAsync` methods dire
Each class accepts the SDK's kwargs by name. The only renames are LangChain-convention: `size` → `k` on the KG retriever, `num_results` → `k` on the web-search retriever. Everything else (`from_`, `filter`, `format`, `exportspec`, `extra`, `max_tokens`, `api`, `fmt`, `lang`) keeps its SDK name.
-### Bring-your-own-client
+### Client-only: the SDK client is the single configuration surface
-`_BaseDiffbotComponent` (in `_base.py`) gives every class four optional fields: `diffbot_api_token`, `timeout`, `client`, `async_client`. The `client` / `async_client` fields take a pre-built `diffbot.Diffbot` / `diffbot.DiffbotAsync`. When supplied, we use it as-is and **do not close it** — the user owns the lifecycle. When not supplied, we construct a fresh SDK client per call (matching the legacy per-call lifecycle).
+`_BaseDiffbotComponent` (in `_base.py`) gives every class exactly two fields: `client` (a `diffbot.Diffbot`) and `async_client` (a `diffbot.DiffbotAsync`). The caller builds the SDK client and passes it in; `_sync_db` / `_async_db` yield it as-is and **never close it** (the caller owns the lifecycle), and raise a clear error if the matching client is missing. There is no `diffbot_api_token` / `timeout` on components and no per-call client construction — the component never builds a client itself.
-This is the escape hatch for anything the SDK supports that we don't re-expose: custom URLs (`analyze_url`, `web_search_url`, …), `transport=httpx.MockTransport(...)` for tests, shared connection pools, custom headers. We don't have to mirror each knob — the user just passes a configured client.
+This is deliberate: there is one way to give a component HTTP access (hand it a client), and one place to configure the SDK (the client you build). Everything the SDK supports — token, `timeout`, custom URLs (`analyze_url`, `web_search_url`, …), `transport=` (logging/retries/headers, or `httpx.MockTransport(...)` in tests) — is set on that client, not mirrored as component fields. Share one client across many components to reuse a single connection pool. Pick the execution mode by which client you build: `Diffbot` for the sync surface, `DiffbotAsync` for the async surface, both if a component is used both ways. Unit tests still mock at the httpx layer with `respx`, so they construct a `Diffbot(token="t")` and pass `client=`/`async_client=` without needing a real transport.
### Both sync and async are implemented natively
@@ -51,6 +51,23 @@ The intended agent loop is **introspect (ontology) → probe → run (`DiffbotKn
Methods like `crawl_list_jobs`, `crawl_get_job`, `crawl_delete_job`, and `dql_refresh_ontology` don't fit cleanly as LangChain primitives. We don't wrap them, but since every component exposes `.client` / `.async_client`, users can reach them directly.
+## Documentation
+
+Four documents describe this package; keep them all accurate when things change:
+
+| File | Audience | Owns |
+|------|----------|------|
+| `README.md` | GitHub / PyPI readers | Complete reference — install, auth, all classes, examples |
+| `providers/diffbot.mdx` (langchain-docs) | Docs site — Diffbot landing page | Overview only: install, auth, components table, links to detail pages |
+| `tools/diffbot.mdx` (langchain-docs) | Docs site — tools reference | Full tool documentation with examples |
+| `retrievers/diffbot.mdx` (langchain-docs) | Docs site — retrievers reference | Full retriever documentation with examples |
+
+The langchain-docs pages link to each other rather than duplicate content. Use the `sync-langchain-docs` skill to update whichever pages need it. The local docs checkout is expected at sibling `../langchain-docs` (overridable via `$LANGCHAIN_DOCS_REPO`).
+
+**The docs repo leads on prose quality.** If validation fixes are made to the docs pages (Vale, CI, editorial review), propagate those improvements back to `README.md` — don't let the README drift to a lower standard.
+
+Two unit suites keep the README in lockstep with the package: `tests/unit_tests/test_readme_parity.py` (the `## Components reference` table matches `__all__`, every class is documented, every example builds a client) and `tests/unit_tests/test_readme_examples.py` (every executable example runs under `respx` mocks; the crawl example is documented but not executed — flagged via `tests/readme.py`'s `is_executable`). `tests/integration_tests/test_readme_examples.py` runs the same blocks live.
+
## Commands
```
@@ -93,11 +110,11 @@ make verify-release # install from PyPI in a throwaway venv
```
langchain_diffbot/
├── __init__.py # public re-exports — every user-facing class listed in __all__
-├── _base.py # _BaseDiffbotComponent — token / timeout / client lifecycle helper
+├── _base.py # _BaseDiffbotComponent — holds client / async_client; yields them per call (never closes)
├── chat_models.py # ChatDiffbot (wraps `ask`)
├── document_loaders.py # DiffbotExtractLoader, DiffbotCrawlLoader
├── retrievers.py # DiffbotKnowledgeGraphRetriever, DiffbotWebSearchRetriever
-├── tools.py # DiffbotExtractTool, DiffbotWebSearchTool, DiffbotEntitiesTool, DiffbotKnowledgeGraphTool, DiffbotOntologyTool, DiffbotDQLProbeTool
+├── tools.py # DiffbotExtractTool, DiffbotWebSearchTool, DiffbotEntitiesTool, DiffbotKnowledgeGraphTool, DiffbotAskTool, DiffbotOntologyTool, DiffbotDQLProbeTool
└── py.typed # PEP 561 marker
tests/
├── unit_tests/ # no network — use respx to mock the SDK's httpx calls
diff --git a/README.md b/README.md
index 68eb1dd..1cacd9c 100644
--- a/README.md
+++ b/README.md
@@ -11,7 +11,7 @@ A thin LangChain integration over the official [`diffbot-python`](https://github
| Extract (Analyze) | `DiffbotExtractTool`, `DiffbotExtractLoader` |
| NLP entities | `DiffbotEntitiesTool` |
| Crawl | `DiffbotCrawlLoader` |
-| LLM RAG (`ask`) | `ChatDiffbot` (with native streaming) |
+| LLM RAG (`ask`) | `ChatDiffbot` (with native streaming), `DiffbotAskTool` |
## Installation
@@ -19,22 +19,30 @@ A thin LangChain integration over the official [`diffbot-python`](https://github
pip install langchain-diffbot
```
-## Authentication
+## Authentication & clients
-Get an API token at https://app.diffbot.com/get-started/ and export it:
+Get an API token at https://app.diffbot.com/get-started/.
-```bash
-export DIFFBOT_API_TOKEN="..."
+Every component takes a prebuilt SDK client: you build a `diffbot.Diffbot` (sync) and/or `diffbot.DiffbotAsync` (async) and pass it via `client=` / `async_client=`. That's the only way to give a component HTTP access, and it keeps configuration in one place: customize the client (token, `timeout`, `transport=`, custom URLs) however the SDK allows, and share one client across many components to reuse a single connection pool. The component uses the client as-is and never closes it, so you own its lifecycle.
+
+```python
+import os
+from diffbot import Diffbot
+
+db = Diffbot(token=os.environ["DIFFBOT_API_TOKEN"])
```
-Every class also accepts `diffbot_api_token=...` directly, or a pre-built `diffbot.Diffbot` client via `client=...` (see [Bring-your-own-client](#bring-your-own-client) below).
+Pick your execution mode by which client you build: `Diffbot` for the sync surface (`invoke`, `stream`, `load`), `DiffbotAsync` for the async surface (`ainvoke`, `astream`, `alazy_load`). Pass both if a component is used both ways.
## Quickstart — Knowledge Graph retriever
```python
+import os
+from diffbot import Diffbot
from langchain_diffbot import DiffbotKnowledgeGraphRetriever
-retriever = DiffbotKnowledgeGraphRetriever(k=5)
+db = Diffbot(token=os.environ["DIFFBOT_API_TOKEN"])
+retriever = DiffbotKnowledgeGraphRetriever(client=db, k=5)
docs = retriever.invoke("type:Organization industries:\"Artificial Intelligence\" location.city.name:\"Boston\"")
for d in docs:
print(d.metadata["name"], "—", d.page_content[:120])
@@ -47,18 +55,24 @@ The query string is a [DQL (Diffbot Query Language)](https://docs.diffbot.com/re
Diffbot KG entities and web-search results are large. Dumping them straight into an LLM prompt can blow past per-minute input-token limits in a single call. Both retrievers expose three shaping knobs:
```python
+import os
+from diffbot import Diffbot
from langchain_core.documents import Document
from langchain_diffbot import DiffbotKnowledgeGraphRetriever
+db = Diffbot(token=os.environ["DIFFBOT_API_TOKEN"])
+
# 1. Project only the top-level fields you care about. Drops everything else
# from `metadata`. Recommended for agent / tool-use scenarios.
retriever = DiffbotKnowledgeGraphRetriever(
+ client=db,
k=5,
fields=["id", "type", "name", "homepageUri", "nbEmployees"],
)
# 2. Choose which field becomes `page_content`. First non-empty value wins.
retriever = DiffbotKnowledgeGraphRetriever(
+ client=db,
content_fields=["summary", "description", "name"],
)
@@ -70,7 +84,7 @@ def mapper(entity: dict) -> Document:
metadata={"id": entity["id"], "name": entity["name"]},
)
-retriever = DiffbotKnowledgeGraphRetriever(document_mapper=mapper)
+retriever = DiffbotKnowledgeGraphRetriever(client=db, document_mapper=mapper)
```
`fields` and `content_fields` are ignored when `document_mapper` is set. The same knobs work on `DiffbotWebSearchRetriever`.
@@ -78,36 +92,60 @@ retriever = DiffbotKnowledgeGraphRetriever(document_mapper=mapper)
## Web search
```python
+import os
+from diffbot import Diffbot
from langchain_diffbot import DiffbotWebSearchRetriever
-web = DiffbotWebSearchRetriever(k=5, fields=["title", "pageUrl", "score"])
+db = Diffbot(token=os.environ["DIFFBOT_API_TOKEN"])
+web = DiffbotWebSearchRetriever(client=db, k=5, fields=["title", "pageUrl", "score"])
docs = web.invoke("diffbot knowledge graph llm grounding")
```
## Extract a URL
```python
+import os
+from diffbot import Diffbot
from langchain_diffbot import DiffbotExtractTool, DiffbotExtractLoader
+db = Diffbot(token=os.environ["DIFFBOT_API_TOKEN"])
+
# Single URL
-tool = DiffbotExtractTool()
+tool = DiffbotExtractTool(client=db)
page = tool.invoke({"url": "https://www.diffbot.com/products/extract/"})
-# Batch — yields one Document per URL, sync or async
-loader = DiffbotExtractLoader(urls=["https://example.com", "https://diffbot.com"])
+# Batch — yields one Document per URL (the same client is reused)
+loader = DiffbotExtractLoader(client=db, urls=["https://example.com", "https://diffbot.com"])
for doc in loader.lazy_load():
print(doc.metadata["title"], doc.page_content[:200])
```
`DiffbotExtractTool` returns a structured `{"error": ..., "errorCode": ...}` dict when Diffbot reports an extraction failure (200 with `errorCode`), so agents can react and try another URL instead of catching an exception. Auth / rate-limit errors propagate as `diffbot.errors.AuthError` / `RateLimitError`.
+## Crawl a site
+
+`DiffbotCrawlLoader` drives a Diffbot crawl job and yields one `Document` per crawled URL. The `page_content` is the URL itself (the crawl API surfaces URLs, not page contents) — chain it with `DiffbotExtractLoader` to fetch the content of each URL.
+
+```python
+import os
+from diffbot import Diffbot
+from langchain_diffbot import DiffbotCrawlLoader
+
+db = Diffbot(token=os.environ["DIFFBOT_API_TOKEN"])
+loader = DiffbotCrawlLoader(client=db, site="https://www.diffbot.com")
+for doc in loader.lazy_load():
+ print(doc.metadata["url"], doc.metadata["status"])
+```
+
## ChatDiffbot
```python
-from langchain_core.messages import HumanMessage
+import os
+from diffbot import Diffbot
+from langchain.messages import HumanMessage
from langchain_diffbot import ChatDiffbot
-llm = ChatDiffbot()
+llm = ChatDiffbot(client=Diffbot(token=os.environ["DIFFBOT_API_TOKEN"]))
for chunk in llm.stream([HumanMessage(content="What is the Diffbot Knowledge Graph?")]):
print(chunk.content, end="", flush=True)
@@ -115,11 +153,93 @@ for chunk in llm.stream([HumanMessage(content="What is the Diffbot Knowledge Gra
`_stream` / `_astream` are native — no thread-pool fallback. `.invoke()` aggregates the stream into a single message.
+To let a tool-calling agent *consult* Diffbot's LLM (rather than use it as the primary model), hand it `DiffbotAskTool` instead — it answers a natural-language question from the Knowledge Graph + live web and returns a synthesized string:
+
+```python
+import os
+from diffbot import Diffbot
+from langchain_diffbot import DiffbotAskTool
+
+ask = DiffbotAskTool(client=Diffbot(token=os.environ["DIFFBOT_API_TOKEN"]))
+print(ask.invoke({"question": "Who founded Diffbot, and when?"}))
+```
+
+## Agent tools
+
+Every Diffbot API is also exposed as an agent-callable `BaseTool`. Hand a tool-calling agent only the tools you want — they all share whatever client you pass. `DiffbotExtractTool` and `DiffbotAskTool` are shown above; the rest:
+
+### DiffbotWebSearchTool
+
+Runs a Diffbot web search and returns the result list — each item with `title`, `pageUrl`, `score`, and `content`. (Use `DiffbotWebSearchRetriever` when you want `Document` output instead.)
+
+```python
+import os
+from diffbot import Diffbot
+from langchain_diffbot import DiffbotWebSearchTool
+
+tool = DiffbotWebSearchTool(client=Diffbot(token=os.environ["DIFFBOT_API_TOKEN"]))
+results = tool.invoke({"text": "diffbot knowledge graph"})
+```
+
+### DiffbotKnowledgeGraphTool
+
+Runs a DQL query against the Knowledge Graph from within an agent and returns the raw response dict. (Use `DiffbotKnowledgeGraphRetriever` when you want `Document` output instead.)
+
+```python
+import os
+from diffbot import Diffbot
+from langchain_diffbot import DiffbotKnowledgeGraphTool
+
+tool = DiffbotKnowledgeGraphTool(client=Diffbot(token=os.environ["DIFFBOT_API_TOKEN"]))
+body = tool.invoke({"query": 'type:Organization name:"Diffbot"'})
+```
+
+### DiffbotEntitiesTool
+
+Identifies named entities and sentiment in text via Diffbot's NLP API. The returned entity IDs can be looked up in the Knowledge Graph (e.g. `id:or("E1","E2")`).
+
+```python
+import os
+from diffbot import Diffbot
+from langchain_diffbot import DiffbotEntitiesTool
+
+tool = DiffbotEntitiesTool(client=Diffbot(token=os.environ["DIFFBOT_API_TOKEN"]))
+result = tool.invoke({"text": "Diffbot was founded in Menlo Park."})
+```
+
+### Authoring DQL on the fly: DiffbotOntologyTool + DiffbotDQLProbeTool
+
+Two tools wrap Diffbot's DQL-authoring helpers so that an agent can build valid DQL instead of guessing field names. The intended loop is **introspect (ontology) → probe → run (`DiffbotKnowledgeGraphTool`) → refine**.
+
+`DiffbotOntologyTool` navigates the KG ontology so an agent can discover real entity types, field paths, taxonomy, and enum values before querying. The ontology is fetched once over HTTP and cached on the tool instance for the rest of its lifetime.
+
+```python
+import os
+from diffbot import Diffbot
+from langchain_diffbot import DiffbotOntologyTool
+
+tool = DiffbotOntologyTool(client=Diffbot(token=os.environ["DIFFBOT_API_TOKEN"]))
+types = tool.invoke({"op": "types"})
+```
+
+`DiffbotDQLProbeTool` probes query variants at `size=0` (hit counts only), so an agent can check selectivity — not zero, not millions — before committing to a full query.
+
+```python
+import os
+from diffbot import Diffbot
+from langchain_diffbot import DiffbotDQLProbeTool
+
+tool = DiffbotDQLProbeTool(client=Diffbot(token=os.environ["DIFFBOT_API_TOKEN"]))
+counts = tool.invoke({"queries": ['type:Organization name:"Diffbot"', "type:Person"]})
+```
+
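One way an agent loop might act on probe results is sketched below. It assumes the hit counts have already been reduced to a plain `{query: count}` mapping; the actual return shape of `DiffbotDQLProbeTool` is not shown here, and `pick_selective`, `min_hits`, and `max_hits` are hypothetical names. The counts are made up for illustration.

```python
from __future__ import annotations


def pick_selective(
    counts: dict[str, int], min_hits: int = 1, max_hits: int = 10_000
) -> str | None:
    """Return the query with the fewest hits inside [min_hits, max_hits], or None."""
    # Drop variants that match nothing or far too much.
    in_range = {q: n for q, n in counts.items() if min_hits <= n <= max_hits}
    if not in_range:
        return None
    # Prefer the tightest variant that still returns something.
    return min(in_range, key=in_range.get)


# Illustrative, made-up hit counts (assumed shape: query -> count).
counts = {
    'type:Organization name:"Diffbot"': 3,
    "type:Person": 250_000_000,
    'type:Organization name:"Dffbot"': 0,
}
best = pick_selective(counts)
print(best)
```

The bounds are a judgment call: a low `max_hits` favors precise queries, while raising it lets broader variants through to the full run.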
## Using a retriever in a chain
The retrievers are standard `BaseRetriever`s, so they slot into LCEL like any other:
```python
+import os
+from diffbot import Diffbot
from langchain_anthropic import ChatAnthropic
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
@@ -127,6 +247,7 @@ from langchain_core.runnables import RunnablePassthrough
from langchain_diffbot import DiffbotKnowledgeGraphRetriever
retriever = DiffbotKnowledgeGraphRetriever(
+ client=Diffbot(token=os.environ["DIFFBOT_API_TOKEN"]),
k=5,
fields=["id", "name", "homepageUri", "nbEmployees", "industries"],
)
@@ -153,19 +274,49 @@ chain = (
chain.invoke('type:Organization location.city.name:"Boston" industries:"Biotech"')
```
-## Bring-your-own-client
+## Sharing a client across components
-Every class accepts a pre-built `diffbot.Diffbot` (or `diffbot.DiffbotAsync`) via `client` / `async_client`. The package uses it as-is and **does not close it** — you own the lifecycle. This is the escape hatch for anything the SDK supports that's not re-exposed as a field (custom URLs, `transport=`, shared connection pools, custom headers).
+Because every component takes a client, you configure the SDK once and hand the same client to as many components as you like — they share its connection pool, and there's no per-call pool churn. Build the tools/retrievers you actually want and add only those to your agent; the client is the shared resource, not a bundle.
```python
+import os
from diffbot import Diffbot
-from langchain_diffbot import DiffbotKnowledgeGraphRetriever
+from langchain_diffbot import (
+ DiffbotKnowledgeGraphTool,
+ DiffbotAskTool,
+ DiffbotWebSearchRetriever,
+)
-# One client shared across many retriever calls (no per-call httpx pool churn)
-shared = Diffbot(token="...", timeout=60.0)
-retriever = DiffbotKnowledgeGraphRetriever(client=shared, k=5)
+# One client, configured once (timeout, transport, custom URLs, ...),
+# shared across every component.
+db = Diffbot(token=os.environ["DIFFBOT_API_TOKEN"], timeout=60.0)
+
+kg = DiffbotKnowledgeGraphTool(client=db)
+ask = DiffbotAskTool(client=db)
+web = DiffbotWebSearchRetriever(client=db, k=5)
+
+# `db.close()` when you're done — the components never close it for you.
```
+Anything the SDK supports (custom URLs, `transport=`, headers via a custom transport) is configured on the client you build; there's no second configuration surface to learn. For async, build a `diffbot.DiffbotAsync` and pass `async_client=` instead (or pass both, if a component is called both synchronously and asynchronously).
+
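Since the caller owns the client, one way to make sure it gets closed is `contextlib.closing`. This is a sketch only: `StubClient` and `StubComponent` are hypothetical stand-ins so the snippet runs without a token. With the real SDK you would wrap the `Diffbot(...)` instance instead, assuming its `close()` behaves as the `db.close()` note in the example above describes.

```python
from contextlib import closing


class StubClient:
    """Hypothetical stand-in for a client that exposes close()."""

    def __init__(self) -> None:
        self.closed = False

    def close(self) -> None:
        self.closed = True


class StubComponent:
    """Hypothetical stand-in for a tool or retriever that borrows a client."""

    def __init__(self, client: StubClient) -> None:
        self.client = client


db = StubClient()
with closing(db):
    # Both components borrow the same client; neither closes it.
    kg = StubComponent(client=db)
    ask = StubComponent(client=db)
    shares_client = kg.client is ask.client
    open_inside = not db.closed

# closing() called db.close() when the block exited.
print(shares_client, open_inside, db.closed)
```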
+## Components reference
+
+| Class | Abstraction | Import path |
+|-------|-------------|-------------|
+| `ChatDiffbot` | Chat model | `from langchain_diffbot import ChatDiffbot` |
+| `DiffbotKnowledgeGraphRetriever` | Retriever | `from langchain_diffbot import DiffbotKnowledgeGraphRetriever` |
+| `DiffbotWebSearchRetriever` | Retriever | `from langchain_diffbot import DiffbotWebSearchRetriever` |
+| `DiffbotExtractLoader` | Document loader | `from langchain_diffbot import DiffbotExtractLoader` |
+| `DiffbotCrawlLoader` | Document loader | `from langchain_diffbot import DiffbotCrawlLoader` |
+| `DiffbotExtractTool` | Tool | `from langchain_diffbot import DiffbotExtractTool` |
+| `DiffbotWebSearchTool` | Tool | `from langchain_diffbot import DiffbotWebSearchTool` |
+| `DiffbotKnowledgeGraphTool` | Tool | `from langchain_diffbot import DiffbotKnowledgeGraphTool` |
+| `DiffbotEntitiesTool` | Tool | `from langchain_diffbot import DiffbotEntitiesTool` |
+| `DiffbotAskTool` | Tool | `from langchain_diffbot import DiffbotAskTool` |
+| `DiffbotOntologyTool` | Tool | `from langchain_diffbot import DiffbotOntologyTool` |
+| `DiffbotDQLProbeTool` | Tool | `from langchain_diffbot import DiffbotDQLProbeTool` |
+
## Examples
The [`examples/`](./examples) folder has runnable demos:
@@ -191,3 +342,4 @@ Integration tests hit the live Diffbot API and require `DIFFBOT_API_TOKEN`:
```bash
uv run pytest tests/integration_tests
```
+
diff --git a/examples/README.md b/examples/README.md
index 06815f8..dbd7884 100644
--- a/examples/README.md
+++ b/examples/README.md
@@ -1,9 +1,17 @@
# `langchain-diffbot` examples
-A few ways to see the package in action, from a guided notebook to a CLI and a
-browser app. Each talks to the live Diffbot APIs (and the agent-based examples
-also use an Anthropic model). Every example below has its own README covering
-setup and how to run it.
+A few ways to see the package in action, from a browser app to a guided
+notebook and a CLI. Each talks to the live Diffbot APIs (and the agent-based
+examples also use an Anthropic model). Every example below has its own README
+covering setup and how to run it.
+
+## Web app
+
+[`dql_explorer/`](./dql_explorer) is a browser UI for the DQL-authoring loop:
+type a question in plain English, and an agent inspects the ontology, probes
+query variants, and writes the DQL; the results come back as a table. It also
+has an M&A / IPO dashboard. It's a FastAPI backend serving a React + TypeScript
+frontend, with optional LangSmith tracing.
## Notebook
@@ -24,11 +32,3 @@ setup and how to run it.
one-shot command-line tool: ask a company-research question in plain English and
the agent searches the Knowledge Graph and the live web, then cites the entity
IDs / URLs it used. Useful for shell scripting or quick spot checks.
-
-## Web app
-
-[`dql_explorer/`](./dql_explorer) is a browser UI for the DQL-authoring loop:
-type a question in plain English, and an agent inspects the ontology, probes
-query variants, writes the DQL, and the results come back as a table. It also
-has an M&A / IPO dashboard. It's a FastAPI backend serving a React + TypeScript
-frontend, with optional LangSmith tracing.
diff --git a/examples/company_research/cli.py b/examples/company_research/cli.py
index 6980e6b..6ca1c45 100644
--- a/examples/company_research/cli.py
+++ b/examples/company_research/cli.py
@@ -6,7 +6,7 @@
import sys
from dotenv import load_dotenv
-from langchain_core.messages import AIMessage, HumanMessage, ToolMessage
+from langchain.messages import AIMessage, HumanMessage, ToolMessage
from company_research.agent import build_agent
diff --git a/examples/company_research/tools.py b/examples/company_research/tools.py
index c0059c5..8630209 100644
--- a/examples/company_research/tools.py
+++ b/examples/company_research/tools.py
@@ -19,11 +19,13 @@
from __future__ import annotations
+import os
from functools import lru_cache
from typing import Any
+from diffbot import Diffbot
from diffbot.errors import APIError
-from langchain_core.tools import tool
+from langchain.tools import tool
from langchain_diffbot import (
DiffbotDQLProbeTool,
@@ -52,15 +54,24 @@
_EXTRACT_CONTENT_CHARS = 4000
+@lru_cache(maxsize=1)
+def _db() -> Diffbot:
+ # One client shared across every Diffbot-backed tool below, so the whole
+ # agent run reuses a single connection pool. Lazy so importing this module
+ # doesn't require DIFFBOT_API_TOKEN. This agent is sync (the CLI calls
+ # `agent.invoke`), so a sync `Diffbot` is all we need.
+ return Diffbot(token=os.environ["DIFFBOT_API_TOKEN"])
+
+
@lru_cache(maxsize=1)
def _kg_retriever() -> DiffbotKnowledgeGraphRetriever:
- # Lazy so importing this module doesn't require DIFFBOT_API_TOKEN.
- return DiffbotKnowledgeGraphRetriever(k=5, fields=_KG_FIELDS)
+ return DiffbotKnowledgeGraphRetriever(client=_db(), k=5, fields=_KG_FIELDS)
@lru_cache(maxsize=1)
def _web_retriever() -> DiffbotWebSearchRetriever:
return DiffbotWebSearchRetriever(
+ client=_db(),
k=_WEB_SEARCH_K,
fields=["title", "pageUrl", "score"],
)
@@ -68,18 +79,18 @@ def _web_retriever() -> DiffbotWebSearchRetriever:
@lru_cache(maxsize=1)
def _extract_tool() -> DiffbotExtractTool:
- return DiffbotExtractTool()
+ return DiffbotExtractTool(client=_db())
@lru_cache(maxsize=1)
def _ontology_tool() -> DiffbotOntologyTool:
# Cached so the fetched ontology is reused across the whole agent run.
- return DiffbotOntologyTool()
+ return DiffbotOntologyTool(client=_db())
@lru_cache(maxsize=1)
def _probe_tool() -> DiffbotDQLProbeTool:
- return DiffbotDQLProbeTool()
+ return DiffbotDQLProbeTool(client=_db())
@tool
diff --git a/examples/dql_explorer/README.md b/examples/dql_explorer/README.md
index 7694e95..1982726 100644
--- a/examples/dql_explorer/README.md
+++ b/examples/dql_explorer/README.md
@@ -1,12 +1,14 @@
# DQL Explorer
-A small web app with two tabs over the Diffbot Knowledge Graph:
+A small web app with three tabs over the Diffbot Knowledge Graph:
- **M&A / IPO Dashboard** — a parameterized view of recent acquisitions and IPOs,
broken down by industry, geography, and time with donut/bar charts.
- **DQL Builder** — type a question in plain English, and a LangChain agent turns
it into a valid [DQL](https://docs.diffbot.com/reference/dql-quickstart) query,
runs it, and shows the results as a table.
+- **Ask Diffbot** — type a question and Diffbot's own RAG LLM answers it directly,
+ streaming the response token by token.
## DQL Builder
@@ -34,6 +36,17 @@ queries are shown in the UI ("DQL behind this dashboard"). Tune the size floor
and date range, then **Refresh** to re-query. The charts are dependency-free SVG
(`charts.tsx`), so the example pulls in no charting library.
+## Ask Diffbot
+
+This tab showcases the package's `ChatDiffbot`, a LangChain `BaseChatModel` that wraps
+Diffbot's own LLM RAG endpoint. Where the DQL Builder *authors* a precise query (and so needs a
+tool-calling model like Claude), this tab just asks Diffbot's LLM, which is
+grounded in the Knowledge Graph and the live web. `ChatDiffbot.astream` streams
+tokens natively, so the backend (`POST /api/ask`) forwards each chunk to the
+browser as a [Server-Sent Event](https://developer.mozilla.org/en-US/docs/Web/API/Server-sent_events)
+and the answer renders as it arrives — no buffering, no extra LLM provider, no API
+key beyond `DIFFBOT_API_TOKEN`.
+
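The framing is simple enough to sketch. `format_sse` below follows the `event:` / `data:` frame layout that the `/api/ask` handler emits, with its `token` and `done` event names. `parse_sse` is a hypothetical minimal parser for illustration, not the frontend's actual code, and it handles only single-line `data:` fields.

```python
import json
from typing import Any


def format_sse(event: str, data: dict[str, Any]) -> str:
    """Format one Server-Sent Event frame: event name, JSON payload, blank line."""
    return f"event: {event}\ndata: {json.dumps(data)}\n\n"


def parse_sse(stream: str) -> list[tuple[str, dict[str, Any]]]:
    """Split a fully buffered SSE stream into (event, payload) pairs."""
    events: list[tuple[str, dict[str, Any]]] = []
    for frame in stream.split("\n\n"):  # a blank line terminates each frame
        if not frame.strip():
            continue
        name, payload = "message", {}  # "message" is the SSE default event name
        for line in frame.split("\n"):
            field, _, value = line.partition(": ")
            if field == "event":
                name = value
            elif field == "data":
                payload = json.loads(value)
        events.append((name, payload))
    return events


stream = (
    format_sse("token", {"text": "Diff"})
    + format_sse("token", {"text": "bot"})
    + format_sse("done", {})
)
events = parse_sse(stream)
answer = "".join(p["text"] for name, p in events if name == "token")
print(answer)
```

A real client would parse incrementally as chunks arrive rather than from one buffered string; the buffered form just keeps the sketch short.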
## Prerequisites
- `DIFFBOT_API_TOKEN` and `ANTHROPIC_API_KEY` in the environment. Copy
@@ -92,14 +105,15 @@ When unset, the app works exactly the same — just no trace link.
dql_explorer/
├── agent.py # create_agent + DQLPlan structured output + ontology/probe tools
├── dashboard.py # M&A/IPO DQL templates + Python roll-up into chart breakdowns
-├── server.py # FastAPI: POST /api/query, POST /api/dashboard, serves the SPA
+├── server.py # FastAPI: POST /api/query, /api/dashboard, /api/ask; serves the SPA
├── projection.py # dot-path projection of KG entities into table rows
├── __main__.py # `python -m dql_explorer` → uvicorn
├── dev.sh # one-command live-reload dev (backend + Vite together)
└── web/ # React + TypeScript + Vite frontend (pnpm)
└── src/
- ├── App.tsx # tab shell (Dashboard / DQL Builder)
+ ├── App.tsx # tab shell (Dashboard / DQL Builder / Ask Diffbot)
├── Explorer.tsx # DQL Builder: plain-English → DQL table
+ ├── Ask.tsx # Ask Diffbot: ChatDiffbot answer, streamed via SSE
├── Dashboard.tsx # M&A/IPO controls + charts
└── charts.tsx # dependency-free SVG donut / bar charts
```
diff --git a/examples/dql_explorer/agent.py b/examples/dql_explorer/agent.py
index 5a4a05b..cb1ad4c 100644
--- a/examples/dql_explorer/agent.py
+++ b/examples/dql_explorer/agent.py
@@ -12,10 +12,11 @@
import os
from functools import lru_cache
+from diffbot import Diffbot
from diffbot.errors import APIError
from langchain.agents import create_agent
from langchain.chat_models import init_chat_model
-from langchain_core.tools import tool
+from langchain.tools import tool
from pydantic import BaseModel, Field
from langchain_diffbot import DiffbotDQLProbeTool, DiffbotOntologyTool
@@ -63,15 +64,24 @@ class DQLPlan(BaseModel):
_ONTOLOGY_MAX_ITEMS = 80
+@lru_cache(maxsize=1)
+def _db() -> Diffbot:
+ # Shared sync client for the authoring tools. The agent's @tool functions
+ # call `.invoke()` (sync) — even under the server's `ainvoke`, LangChain runs
+ # a sync tool in a thread — so these need a sync `Diffbot`. The KG query and
+ # the Ask tab run async and build a `DiffbotAsync` in server.py instead.
+ return Diffbot(token=os.environ["DIFFBOT_API_TOKEN"])
+
+
@lru_cache(maxsize=1)
def _ontology_tool() -> DiffbotOntologyTool:
# Cached so the fetched ontology is reused across the whole process.
- return DiffbotOntologyTool()
+ return DiffbotOntologyTool(client=_db())
@lru_cache(maxsize=1)
def _probe_tool() -> DiffbotDQLProbeTool:
- return DiffbotDQLProbeTool()
+ return DiffbotDQLProbeTool(client=_db())
@tool
diff --git a/examples/dql_explorer/dashboard.py b/examples/dql_explorer/dashboard.py
index 8bdb133..14e7093 100644
--- a/examples/dql_explorer/dashboard.py
+++ b/examples/dql_explorer/dashboard.py
@@ -173,7 +173,9 @@ def _month_series(events: list[dict[str, Any]]) -> list[dict[str, Any]]:
return rows
-async def _build(min_employees: int, date_from: str, date_to: str) -> dict[str, Any]:
+async def _build(
+ kg: DiffbotKnowledgeGraphTool, min_employees: int, date_from: str, date_to: str
+) -> dict[str, Any]:
"""Run both queries (concurrently), aggregate, and shape the response."""
# Omit the headcount clause entirely at 0 — DQL has no "any value" wildcard.
emp = f" nbEmployees>{min_employees}" if min_employees > 0 else ""
@@ -183,7 +185,6 @@ async def _build(min_employees: int, date_from: str, date_to: str) -> dict[str,
)
ipo_query = f'type:Organization ipo.date>="{date_from}" ipo.date<="{date_to}"{emp}'
- kg = DiffbotKnowledgeGraphTool()
# The dashboard is deterministic (no model), and each KG call returns up to
# MAX_FETCH full entities — tracing those to LangSmith serializes tens of MB
# per call (and trips its 26 MB ingest cap with a 422). There's nothing worth
@@ -240,11 +241,15 @@ async def _build(min_employees: int, date_from: str, date_to: str) -> dict[str,
async def build_dashboard(
- min_employees: int, date_from: str, date_to: str
+ kg: DiffbotKnowledgeGraphTool, min_employees: int, date_from: str, date_to: str
) -> dict[str, Any]:
- """Build the dashboard payload, surfacing any failure as an `error` field."""
+ """Build the dashboard payload, surfacing any failure as an `error` field.
+
+ `kg` must carry an `async_client` — the two KG queries run on the event loop.
+ The caller owns the client so the dashboard reuses the app's connection pool.
+ """
try:
- return await _build(min_employees, date_from, date_to)
+ return await _build(kg, min_employees, date_from, date_to)
except Exception as exc: # surface to the UI instead of a 500
return {
"min_employees": min_employees,
diff --git a/examples/dql_explorer/server.py b/examples/dql_explorer/server.py
index 964500e..a13f477 100644
--- a/examples/dql_explorer/server.py
+++ b/examples/dql_explorer/server.py
@@ -8,24 +8,27 @@
from __future__ import annotations
+import json
import os
+from collections.abc import AsyncIterator
from functools import lru_cache
from pathlib import Path
from typing import Any
+from diffbot import DiffbotAsync
from diffbot.errors import APIError
from dotenv import load_dotenv
from fastapi import FastAPI
-from fastapi.responses import HTMLResponse
+from fastapi.responses import HTMLResponse, StreamingResponse
from fastapi.staticfiles import StaticFiles
-from langchain_core.messages import AIMessage, HumanMessage, ToolMessage
+from langchain.messages import AIMessage, HumanMessage, ToolMessage
from langchain_core.tracers.context import collect_runs
from pydantic import BaseModel, Field
from dql_explorer.agent import DQLPlan, build_dql_agent
from dql_explorer.dashboard import build_dashboard, default_range
from dql_explorer.projection import build_rows
-from langchain_diffbot import DiffbotKnowledgeGraphTool
+from langchain_diffbot import ChatDiffbot, DiffbotKnowledgeGraphTool
# Read examples/.env (DIFFBOT_API_TOKEN, ANTHROPIC_API_KEY, optional LANGSMITH_*).
load_dotenv()
@@ -45,9 +48,24 @@ def _agent():
return build_dql_agent()
+@lru_cache(maxsize=1)
+def _adb() -> DiffbotAsync:
+ # Shared async client for the endpoints that run on the event loop: the KG
+ # query (`_kg_tool().ainvoke`) and the Ask tab (`_chat().astream`). One pool
+ # serves both. The authoring tools run sync and build a `Diffbot` in agent.py.
+ return DiffbotAsync(token=os.environ["DIFFBOT_API_TOKEN"])
+
+
@lru_cache(maxsize=1)
def _kg_tool() -> DiffbotKnowledgeGraphTool:
- return DiffbotKnowledgeGraphTool()
+ return DiffbotKnowledgeGraphTool(async_client=_adb())
+
+
+@lru_cache(maxsize=1)
+def _chat() -> ChatDiffbot:
+ # Diffbot's own RAG LLM. Unlike the DQL Builder's Anthropic agent, this needs
+ # no API key beyond DIFFBOT_API_TOKEN — the graph is the model's knowledge.
+ return ChatDiffbot(async_client=_adb())
class QueryRequest(BaseModel):
@@ -57,6 +75,12 @@ class QueryRequest(BaseModel):
k: int = Field(default=DEFAULT_K, ge=1, le=100, description="Max rows to fetch.")
+class AskRequest(BaseModel):
+ """Body for `POST /api/ask`."""
+
+ question: str = Field(description="Plain-English question for Diffbot's RAG LLM.")
+
+
class DashboardRequest(BaseModel):
"""Body for `POST /api/dashboard`. Omitted fields fall back to defaults."""
@@ -170,12 +194,61 @@ async def dashboard(req: DashboardRequest) -> dict[str, Any]:
"""Build the M&A / IPO dashboard for a headcount floor and date window."""
default_from, default_to = default_range()
return await build_dashboard(
+ _kg_tool(),
min_employees=req.min_employees,
date_from=req.date_from or default_from,
date_to=req.date_to or default_to,
)
+def _sse(event: str, data: dict[str, Any]) -> str:
+ """Format one Server-Sent Event frame."""
+ return f"event: {event}\ndata: {json.dumps(data)}\n\n"
+
+
+@app.post("/api/ask")
+async def ask(req: AskRequest) -> StreamingResponse:
+ """Answer `question` with Diffbot's RAG LLM, streaming tokens as SSE.
+
+ This is the showcase for `ChatDiffbot`: where the DQL Builder authors a precise
+ query, this just asks Diffbot's own LLM, which is grounded in the Knowledge
+ Graph and the live web. `ChatDiffbot.astream` streams tokens natively, so we
+ forward each chunk to the browser as it arrives instead of buffering the answer.
+ """
+
+ async def gen() -> AsyncIterator[str]:
+ try:
+ async for chunk in _chat().astream([HumanMessage(content=req.question)]):
+ text = (
+ chunk.content
+ if isinstance(chunk.content, str)
+ else str(chunk.content)
+ )
+ if text:
+ yield _sse("token", {"text": text})
+ except APIError as exc:
+ yield _sse(
+ "error",
+ {
+ "message": (
+ f"Diffbot rejected the request ({exc.status_code}): "
+ f"{exc.message or 'see body'}."
+ )
+ },
+ )
+ except Exception as exc: # surface any failure to the UI instead of hanging
+ yield _sse("error", {"message": str(exc)})
+ else:
+ yield _sse("done", {})
+
+ # Disable proxy/nginx buffering so tokens reach the browser as they stream.
+ return StreamingResponse(
+ gen(),
+ media_type="text/event-stream",
+ headers={"Cache-Control": "no-cache", "X-Accel-Buffering": "no"},
+ )
+
+
# In dev (`./dev.sh` sets DQL_EXPLORER_RELOAD=1) the live UI is the Vite server
# on :5173 — the `dist/` this backend would serve is the last `pnpm build` and is
# stale the moment you edit any frontend file. Bounce :8000 over to Vite so an
diff --git a/examples/dql_explorer/web/package.json b/examples/dql_explorer/web/package.json
index 95d7c3f..f43c5e3 100644
--- a/examples/dql_explorer/web/package.json
+++ b/examples/dql_explorer/web/package.json
@@ -10,9 +10,12 @@
},
"dependencies": {
"react": "^19.0.0",
- "react-dom": "^19.0.0"
+ "react-dom": "^19.0.0",
+ "react-markdown": "^10.1.0",
+ "remark-gfm": "^4.0.1"
},
"devDependencies": {
+ "@tailwindcss/typography": "^0.5.20",
"@tailwindcss/vite": "^4.0.0",
"@types/react": "^19.0.0",
"@types/react-dom": "^19.0.0",
diff --git a/examples/dql_explorer/web/pnpm-lock.yaml b/examples/dql_explorer/web/pnpm-lock.yaml
index 07f7ac1..7ced975 100644
--- a/examples/dql_explorer/web/pnpm-lock.yaml
+++ b/examples/dql_explorer/web/pnpm-lock.yaml
@@ -14,7 +14,16 @@ importers:
react-dom:
specifier: ^19.0.0
version: 19.2.7(react@19.2.7)
+ react-markdown:
+ specifier: ^10.1.0
+ version: 10.1.0(@types/react@19.2.16)(react@19.2.7)
+ remark-gfm:
+ specifier: ^4.0.1
+ version: 4.0.1
devDependencies:
+ '@tailwindcss/typography':
+ specifier: ^0.5.20
+ version: 0.5.20(tailwindcss@4.3.0)
'@tailwindcss/vite':
specifier: ^4.0.0
version: 4.3.0(vite@6.4.3(jiti@2.7.0)(lightningcss@1.32.0))
@@ -524,6 +533,11 @@ packages:
resolution: {integrity: sha512-F7HZGBeN9I0/AuuJS5PwcD8xayx5ri5GhjYUDBEVYUkexyA/giwbDNjRVrxSezE3T250OU2K/wp/ltWx3UOefg==}
engines: {node: '>= 20'}
+ '@tailwindcss/typography@0.5.20':
+ resolution: {integrity: sha512-hwbzQuNUfcPvbegQFatVPl/MY/tcM9KLl963hQ5laJKPh81TEZ1+dNG9PirGvcaDBkp+BCshExAyKVPW91dozw==}
+ peerDependencies:
+ tailwindcss: '>=3.0.0 || >=4.0.0 || insiders'
+
'@tailwindcss/vite@4.3.0':
resolution: {integrity: sha512-t6J3OrB5Fc0ExuhohouH0fWUGMYL6PTLhW+E7zIk/pdbnJARZDCwjBznFnkh5ynRnIRSI4YjtTH0t6USjJISrw==}
peerDependencies:
@@ -541,9 +555,24 @@ packages:
'@types/babel__traverse@7.28.0':
resolution: {integrity: sha512-8PvcXf70gTDZBgt9ptxJ8elBeBjcLOAcOtoO/mPJjtji1+CdGbHgm77om1GrsPxsiE+uXIpNSK64UYaIwQXd4Q==}
+ '@types/debug@4.1.13':
+ resolution: {integrity: sha512-KSVgmQmzMwPlmtljOomayoR89W4FynCAi3E8PPs7vmDVPe84hT+vGPKkJfThkmXs0x0jAaa9U8uW8bbfyS2fWw==}
+
+ '@types/estree-jsx@1.0.5':
+ resolution: {integrity: sha512-52CcUVNFyfb1A2ALocQw/Dd1BQFNmSdkuC3BkZ6iqhdMfQz7JWOFRuJFloOzjk+6WijU56m9oKXFAXc7o3Towg==}
+
'@types/estree@1.0.9':
resolution: {integrity: sha512-GhdPgy1el4/ImP05X05Uw4cw2/M93BCUmnEvWZNStlCzEKME4Fkk+YpoA5OiHNQmoS7Cafb8Xa3Pya8m1Qrzeg==}
+ '@types/hast@3.0.4':
+ resolution: {integrity: sha512-WPs+bbQw5aCj+x6laNGWLH3wviHtoCv/P3+otBhbOhJgG8qtpdAMlTCxLtsTWA7LH1Oh/bFCHsBn0TPS5m30EQ==}
+
+ '@types/mdast@4.0.4':
+ resolution: {integrity: sha512-kGaNbPh1k7AFzgpud/gMdvIm5xuECykRR+JnWKQno9TAXVa6WIVCGTPvYGekIDL4uwCZQSYbUxNBSb1aUo79oA==}
+
+ '@types/ms@2.1.0':
+ resolution: {integrity: sha512-GsCCIZDE/p3i96vtEqx+7dBUGXrc7zeSK3wwPHIaRThS+9OhWIXRqzs4d6k1SVU8g91DrNRWxWUGhp5KXQb2VA==}
+
'@types/react-dom@19.2.3':
resolution: {integrity: sha512-jp2L/eY6fn+KgVVQAOqYItbF0VY/YApe5Mz2F0aykSO8gx31bYCZyvSeYxCHKvzHG5eZjc+zyaS5BrBWya2+kQ==}
peerDependencies:
@@ -552,12 +581,24 @@ packages:
'@types/react@19.2.16':
resolution: {integrity: sha512-esJiCAnl0kfpNdE69f3So4WJUXy95dLZydX0KwK46riIHDzHM7O9Vtf9xCHW0PXIqvgqNrswl522kA/5yx+F4w==}
+ '@types/unist@2.0.11':
+ resolution: {integrity: sha512-CmBKiL6NNo/OqgmMn95Fk9Whlp2mtvIv+KNpQKN2F4SjvrEesubTRWGYSg+BnWZOnlCaSTU1sMpsBOzgbYhnsA==}
+
+ '@types/unist@3.0.3':
+ resolution: {integrity: sha512-ko/gIFJRv177XgZsZcBwnqJN5x/Gien8qNOn0D5bQU/zAzVf9Zt3BlcUiLqhV9y4ARk0GbT3tnUiPNgnTXzc/Q==}
+
+ '@ungap/structured-clone@1.3.1':
+ resolution: {integrity: sha512-mUFwbeTqrVgDQxFveS+df2yfap6iuP20NAKAsBt5jDEoOTDew+zwLAOilHCeQJOVSvmgCX4ogqIrA0mnyr08yQ==}
+
'@vitejs/plugin-react@4.7.0':
resolution: {integrity: sha512-gUu9hwfWvvEDBBmgtAowQCojwZmJ5mcLn3aufeCsitijs3+f2NsrPtlAWIR6OPiqljl96GVCUbLe0HyqIpVaoA==}
engines: {node: ^14.18.0 || >=16.0.0}
peerDependencies:
vite: ^4.2.0 || ^5.0.0 || ^6.0.0 || ^7.0.0
+ bail@2.0.2:
+ resolution: {integrity: sha512-0xO6mYd7JB2YesxDKplafRpsiOzPt9V02ddPCLbY1xYGPOX24NTyN50qnUxgCPcSoYMhKpAuBTjQoRZCAkUDRw==}
+
baseline-browser-mapping@2.10.33:
resolution: {integrity: sha512-bA6+tcSLpz2tIEdDXZPpPTIuxBcC4+w6SieaYyfigIa4h8GlFxbA17v22Vx3JUtuZQj9SgOsnbK+aTBzyDyEuw==}
engines: {node: '>=6.0.0'}
@@ -571,9 +612,32 @@ packages:
caniuse-lite@1.0.30001793:
resolution: {integrity: sha512-iwSsYWaCOoh26cV8NwNRViHlrfUvYsHDfRVcbtmw0Kg6PJIZZXwMkj1442FYLBGkeUf1juAsU3DTfxW579mrPA==}
+ ccount@2.0.1:
+ resolution: {integrity: sha512-eyrF0jiFpY+3drT6383f1qhkbGsLSifNAjA61IUjZjmLCWjItY6LB9ft9YhoDgwfmclB2zhu51Lc7+95b8NRAg==}
+
+ character-entities-html4@2.1.0:
+ resolution: {integrity: sha512-1v7fgQRj6hnSwFpq1Eu0ynr/CDEw0rXo2B61qXrLNdHZmPKgb7fqS1a2JwF0rISo9q77jDI8VMEHoApn8qDoZA==}
+
+ character-entities-legacy@3.0.0:
+ resolution: {integrity: sha512-RpPp0asT/6ufRm//AJVwpViZbGM/MkjQFxJccQRHmISF/22NBtsHqAWmL+/pmkPWoIUJdWyeVleTl1wydHATVQ==}
+
+ character-entities@2.0.2:
+ resolution: {integrity: sha512-shx7oQ0Awen/BRIdkjkvz54PnEEI/EjwXDSIZp86/KKdbafHh1Df/RYGBhn4hbe2+uKC9FnT5UCEdyPz3ai9hQ==}
+
+ character-reference-invalid@2.0.1:
+ resolution: {integrity: sha512-iBZ4F4wRbyORVsu0jPV7gXkOsGYjGHPmAyv+HiHG8gi5PtC9KI2j1+v8/tlibRvjoWX027ypmG/n0HtO5t7unw==}
+
+ comma-separated-tokens@2.0.3:
+ resolution: {integrity: sha512-Fu4hJdvzeylCfQPp9SGWidpzrMs7tTrlu6Vb8XGaRGck8QSNZJJp538Wrb60Lax4fPwR64ViY468OIUTbRlGZg==}
+
convert-source-map@2.0.0:
resolution: {integrity: sha512-Kvp459HrV2FEJ1CAsi1Ku+MY3kasH19TFykTz2xWmMeq6bk2NU3XXvfJ+Q61m0xktWwt+1HSYf3JZsTms3aRJg==}
+ cssesc@3.0.0:
+ resolution: {integrity: sha512-/Tb/JcjK111nNScGob5MNtsntNM1aCNUDipB/TkwZFhyDrrE47SOx/18wF2bbjgc3ZzCSKW1T5nt5EbFoAz/Vg==}
+ engines: {node: '>=4'}
+ hasBin: true
+
csstype@3.2.3:
resolution: {integrity: sha512-z1HGKcYy2xA8AGQfwrn0PAy+PB7X/GSj3UVJW9qKyn43xWa+gl5nXmU4qqLMRzWVLFC8KusUX8T/0kCiOYpAIQ==}
@@ -586,10 +650,20 @@ packages:
supports-color:
optional: true
+ decode-named-character-reference@1.3.0:
+ resolution: {integrity: sha512-GtpQYB283KrPp6nRw50q3U9/VfOutZOe103qlN7BPP6Ad27xYnOIWv4lPzo8HCAL+mMZofJ9KEy30fq6MfaK6Q==}
+
+ dequal@2.0.3:
+ resolution: {integrity: sha512-0je+qPKHEMohvfRTCEo3CrPG6cAzAYgmzKyxRiYSSDkS6eGJdyVJm7WaYA5ECaAD9wLB2T4EEeymA5aFVcYXCA==}
+ engines: {node: '>=6'}
+
detect-libc@2.1.2:
resolution: {integrity: sha512-Btj2BOOO83o3WyH59e8MgXsxEQVcarkUOpEYrubB0urwnN10yQ364rsiByU11nZlqWYZm05i/of7io4mzihBtQ==}
engines: {node: '>=8'}
+ devlop@1.1.0:
+ resolution: {integrity: sha512-RWmIqhcFf1lRYBvNmr7qTNuyCt/7/ns2jbpp1+PalgE/rDQcBT0fioSMUpJ93irlUhC5hrg4cYqe6U+0ImW0rA==}
+
electron-to-chromium@1.5.364:
resolution: {integrity: sha512-G/dYE3+AYhyHwzTwg8UbnXf7zqMERYh7l2jJ3QujhFsH8agSYwtnGAR2aZ7f0AakIKJXd5En/Hre4igIUrdlYw==}
@@ -606,6 +680,16 @@ packages:
resolution: {integrity: sha512-WUj2qlxaQtO4g6Pq5c29GTcWGDyd8itL8zTlipgECz3JesAiiOKotd8JU6otB3PACgG6xkJUyVhboMS+bje/jA==}
engines: {node: '>=6'}
+ escape-string-regexp@5.0.0:
+ resolution: {integrity: sha512-/veY75JbMK4j1yjvuUxuVsiS/hr/4iHs9FTT6cgTexxdE0Ly/glccBAkloH/DofkjRbZU3bnoj38mOmhkZ0lHw==}
+ engines: {node: '>=12'}
+
+ estree-util-is-identifier-name@3.0.0:
+ resolution: {integrity: sha512-hFtqIDZTIUZ9BXLb8y4pYGyk6+wekIivNVTcmvk8NoOh+VeRn5y6cEHzbURrWbfp1fIqdVipilzj+lfaadNZmg==}
+
+ extend@3.0.2:
+ resolution: {integrity: sha512-fjquC59cD7CyW6urNXK0FBufkZcoiGG80wTuPujX590cB5Ttln20E2UB4S/WARVqhXffZl2LNgS+gQdPIIim/g==}
+
fdir@6.5.0:
resolution: {integrity: sha512-tIbYtZbucOs0BRGqPJkshJUYdL+SDH7dVM8gjy+ERp3WAUjLEFJE+02kanyHtwjWOnwrKYBiwAmM0p4kLJAnXg==}
engines: {node: '>=12.0.0'}
@@ -627,6 +711,34 @@ packages:
graceful-fs@4.2.11:
resolution: {integrity: sha512-RbJ5/jmFcNNCcDV5o9eTnBLJ/HszWV0P73bc+Ff4nS/rJj+YaS6IGyiOL0VoBYX+l1Wrl3k63h/KrH+nhJ0XvQ==}
+ hast-util-to-jsx-runtime@2.3.6:
+ resolution: {integrity: sha512-zl6s8LwNyo1P9uw+XJGvZtdFF1GdAkOg8ujOw+4Pyb76874fLps4ueHXDhXWdk6YHQ6OgUtinliG7RsYvCbbBg==}
+
+ hast-util-whitespace@3.0.0:
+ resolution: {integrity: sha512-88JUN06ipLwsnv+dVn+OIYOvAuvBMy/Qoi6O7mQHxdPXpjy+Cd6xRkWwux7DKO+4sYILtLBRIKgsdpS2gQc7qw==}
+
+ html-url-attributes@3.0.1:
+ resolution: {integrity: sha512-ol6UPyBWqsrO6EJySPz2O7ZSr856WDrEzM5zMqp+FJJLGMW35cLYmmZnl0vztAZxRUoNZJFTCohfjuIJ8I4QBQ==}
+
+ inline-style-parser@0.2.7:
+ resolution: {integrity: sha512-Nb2ctOyNR8DqQoR0OwRG95uNWIC0C1lCgf5Naz5H6Ji72KZ8OcFZLz2P5sNgwlyoJ8Yif11oMuYs5pBQa86csA==}
+
+ is-alphabetical@2.0.1:
+ resolution: {integrity: sha512-FWyyY60MeTNyeSRpkM2Iry0G9hpr7/9kD40mD/cGQEuilcZYS4okz8SN2Q6rLCJ8gbCt6fN+rC+6tMGS99LaxQ==}
+
+ is-alphanumerical@2.0.1:
+ resolution: {integrity: sha512-hmbYhX/9MUMF5uh7tOXyK/n0ZvWpad5caBA17GsC6vyuCqaWliRG5K1qS9inmUhEMaOBIW7/whAnSwveW/LtZw==}
+
+ is-decimal@2.0.1:
+ resolution: {integrity: sha512-AAB9hiomQs5DXWcRB1rqsxGUstbRroFOPPVAomNk/3XHR5JyEZChOyTWe2oayKnsSsr/kcGqF+z6yuH6HHpN0A==}
+
+ is-hexadecimal@2.0.1:
+ resolution: {integrity: sha512-DgZQp241c8oO6cA1SbTEWiXeoxV42vlcJxgH+B3hi1AiqqKruZR3ZGF8In3fj4+/y/7rHvlOZLZtgJ/4ttYGZg==}
+
+ is-plain-obj@4.1.0:
+ resolution: {integrity: sha512-+Pgi+vMuUNkJyExiMBt5IlFoMyKnr5zhJ4Uspz58WOhBF5QoIZkFyNHIbBAtHwzVAgk5RtndVNsDRN61/mmDqg==}
+ engines: {node: '>=12'}
+
jiti@2.7.0:
resolution: {integrity: sha512-AC/7JofJvZGrrneWNaEnJeOLUx+JlGt7tNa0wZiRPT4MY1wmfKjt2+6O2p2uz2+skll8OZZmJMNqeke7kKbNgQ==}
hasBin: true
@@ -718,12 +830,147 @@ packages:
resolution: {integrity: sha512-NXYBzinNrblfraPGyrbPoD19C1h9lfI/1mzgWYvXUTe414Gz/X1FD2XBZSZM7rRTrMA8JL3OtAaGifrIKhQ5yQ==}
engines: {node: '>= 12.0.0'}
+ longest-streak@3.1.0:
+ resolution: {integrity: sha512-9Ri+o0JYgehTaVBBDoMqIl8GXtbWg711O3srftcHhZ0dqnETqLaoIK0x17fUw9rFSlK/0NlsKe0Ahhyl5pXE2g==}
+
lru-cache@5.1.1:
resolution: {integrity: sha512-KpNARQA3Iwv+jTA0utUVVbrh+Jlrr1Fv0e56GGzAFOXN7dk/FviaDW8LHmK52DlcH4WP2n6gI8vN1aesBFgo9w==}
magic-string@0.30.21:
resolution: {integrity: sha512-vd2F4YUyEXKGcLHoq+TEyCjxueSeHnFxyyjNp80yg0XV4vUhnDer/lvvlqM/arB5bXQN5K2/3oinyCRyx8T2CQ==}
+ markdown-table@3.0.4:
+ resolution: {integrity: sha512-wiYz4+JrLyb/DqW2hkFJxP7Vd7JuTDm77fvbM8VfEQdmSMqcImWeeRbHwZjBjIFki/VaMK2BhFi7oUUZeM5bqw==}
+
+ mdast-util-find-and-replace@3.0.2:
+ resolution: {integrity: sha512-Tmd1Vg/m3Xz43afeNxDIhWRtFZgM2VLyaf4vSTYwudTyeuTneoL3qtWMA5jeLyz/O1vDJmmV4QuScFCA2tBPwg==}
+
+ mdast-util-from-markdown@2.0.3:
+ resolution: {integrity: sha512-W4mAWTvSlKvf8L6J+VN9yLSqQ9AOAAvHuoDAmPkz4dHf553m5gVj2ejadHJhoJmcmxEnOv6Pa8XJhpxE93kb8Q==}
+
+ mdast-util-gfm-autolink-literal@2.0.1:
+ resolution: {integrity: sha512-5HVP2MKaP6L+G6YaxPNjuL0BPrq9orG3TsrZ9YXbA3vDw/ACI4MEsnoDpn6ZNm7GnZgtAcONJyPhOP8tNJQavQ==}
+
+ mdast-util-gfm-footnote@2.1.0:
+ resolution: {integrity: sha512-sqpDWlsHn7Ac9GNZQMeUzPQSMzR6Wv0WKRNvQRg0KqHh02fpTz69Qc1QSseNX29bhz1ROIyNyxExfawVKTm1GQ==}
+
+ mdast-util-gfm-strikethrough@2.0.0:
+ resolution: {integrity: sha512-mKKb915TF+OC5ptj5bJ7WFRPdYtuHv0yTRxK2tJvi+BDqbkiG7h7u/9SI89nRAYcmap2xHQL9D+QG/6wSrTtXg==}
+
+ mdast-util-gfm-table@2.0.0:
+ resolution: {integrity: sha512-78UEvebzz/rJIxLvE7ZtDd/vIQ0RHv+3Mh5DR96p7cS7HsBhYIICDBCu8csTNWNO6tBWfqXPWekRuj2FNOGOZg==}
+
+ mdast-util-gfm-task-list-item@2.0.0:
+ resolution: {integrity: sha512-IrtvNvjxC1o06taBAVJznEnkiHxLFTzgonUdy8hzFVeDun0uTjxxrRGVaNFqkU1wJR3RBPEfsxmU6jDWPofrTQ==}
+
+ mdast-util-gfm@3.1.0:
+ resolution: {integrity: sha512-0ulfdQOM3ysHhCJ1p06l0b0VKlhU0wuQs3thxZQagjcjPrlFRqY215uZGHHJan9GEAXd9MbfPjFJz+qMkVR6zQ==}
+
+ mdast-util-mdx-expression@2.0.1:
+ resolution: {integrity: sha512-J6f+9hUp+ldTZqKRSg7Vw5V6MqjATc+3E4gf3CFNcuZNWD8XdyI6zQ8GqH7f8169MM6P7hMBRDVGnn7oHB9kXQ==}
+
+ mdast-util-mdx-jsx@3.2.0:
+ resolution: {integrity: sha512-lj/z8v0r6ZtsN/cGNNtemmmfoLAFZnjMbNyLzBafjzikOM+glrjNHPlf6lQDOTccj9n5b0PPihEBbhneMyGs1Q==}
+
+ mdast-util-mdxjs-esm@2.0.1:
+ resolution: {integrity: sha512-EcmOpxsZ96CvlP03NghtH1EsLtr0n9Tm4lPUJUBccV9RwUOneqSycg19n5HGzCf+10LozMRSObtVr3ee1WoHtg==}
+
+ mdast-util-phrasing@4.1.0:
+ resolution: {integrity: sha512-TqICwyvJJpBwvGAMZjj4J2n0X8QWp21b9l0o7eXyVJ25YNWYbJDVIyD1bZXE6WtV6RmKJVYmQAKWa0zWOABz2w==}
+
+ mdast-util-to-hast@13.2.1:
+ resolution: {integrity: sha512-cctsq2wp5vTsLIcaymblUriiTcZd0CwWtCbLvrOzYCDZoWyMNV8sZ7krj09FSnsiJi3WVsHLM4k6Dq/yaPyCXA==}
+
+ mdast-util-to-markdown@2.1.2:
+ resolution: {integrity: sha512-xj68wMTvGXVOKonmog6LwyJKrYXZPvlwabaryTjLh9LuvovB/KAH+kvi8Gjj+7rJjsFi23nkUxRQv1KqSroMqA==}
+
+ mdast-util-to-string@4.0.0:
+ resolution: {integrity: sha512-0H44vDimn51F0YwvxSJSm0eCDOJTRlmN0R1yBh4HLj9wiV1Dn0QoXGbvFAWj2hSItVTlCmBF1hqKlIyUBVFLPg==}
+
+ micromark-core-commonmark@2.0.3:
+ resolution: {integrity: sha512-RDBrHEMSxVFLg6xvnXmb1Ayr2WzLAWjeSATAoxwKYJV94TeNavgoIdA0a9ytzDSVzBy2YKFK+emCPOEibLeCrg==}
+
+ micromark-extension-gfm-autolink-literal@2.1.0:
+ resolution: {integrity: sha512-oOg7knzhicgQ3t4QCjCWgTmfNhvQbDDnJeVu9v81r7NltNCVmhPy1fJRX27pISafdjL+SVc4d3l48Gb6pbRypw==}
+
+ micromark-extension-gfm-footnote@2.1.0:
+ resolution: {integrity: sha512-/yPhxI1ntnDNsiHtzLKYnE3vf9JZ6cAisqVDauhp4CEHxlb4uoOTxOCJ+9s51bIB8U1N1FJ1RXOKTIlD5B/gqw==}
+
+ micromark-extension-gfm-strikethrough@2.1.0:
+ resolution: {integrity: sha512-ADVjpOOkjz1hhkZLlBiYA9cR2Anf8F4HqZUO6e5eDcPQd0Txw5fxLzzxnEkSkfnD0wziSGiv7sYhk/ktvbf1uw==}
+
+ micromark-extension-gfm-table@2.1.1:
+ resolution: {integrity: sha512-t2OU/dXXioARrC6yWfJ4hqB7rct14e8f7m0cbI5hUmDyyIlwv5vEtooptH8INkbLzOatzKuVbQmAYcbWoyz6Dg==}
+
+ micromark-extension-gfm-tagfilter@2.0.0:
+ resolution: {integrity: sha512-xHlTOmuCSotIA8TW1mDIM6X2O1SiX5P9IuDtqGonFhEK0qgRI4yeC6vMxEV2dgyr2TiD+2PQ10o+cOhdVAcwfg==}
+
+ micromark-extension-gfm-task-list-item@2.1.0:
+ resolution: {integrity: sha512-qIBZhqxqI6fjLDYFTBIa4eivDMnP+OZqsNwmQ3xNLE4Cxwc+zfQEfbs6tzAo2Hjq+bh6q5F+Z8/cksrLFYWQQw==}
+
+ micromark-extension-gfm@3.0.0:
+ resolution: {integrity: sha512-vsKArQsicm7t0z2GugkCKtZehqUm31oeGBV/KVSorWSy8ZlNAv7ytjFhvaryUiCUJYqs+NoE6AFhpQvBTM6Q4w==}
+
+ micromark-factory-destination@2.0.1:
+ resolution: {integrity: sha512-Xe6rDdJlkmbFRExpTOmRj9N3MaWmbAgdpSrBQvCFqhezUn4AHqJHbaEnfbVYYiexVSs//tqOdY/DxhjdCiJnIA==}
+
+ micromark-factory-label@2.0.1:
+ resolution: {integrity: sha512-VFMekyQExqIW7xIChcXn4ok29YE3rnuyveW3wZQWWqF4Nv9Wk5rgJ99KzPvHjkmPXF93FXIbBp6YdW3t71/7Vg==}
+
+ micromark-factory-space@2.0.1:
+ resolution: {integrity: sha512-zRkxjtBxxLd2Sc0d+fbnEunsTj46SWXgXciZmHq0kDYGnck/ZSGj9/wULTV95uoeYiK5hRXP2mJ98Uo4cq/LQg==}
+
+ micromark-factory-title@2.0.1:
+ resolution: {integrity: sha512-5bZ+3CjhAd9eChYTHsjy6TGxpOFSKgKKJPJxr293jTbfry2KDoWkhBb6TcPVB4NmzaPhMs1Frm9AZH7OD4Cjzw==}
+
+ micromark-factory-whitespace@2.0.1:
+ resolution: {integrity: sha512-Ob0nuZ3PKt/n0hORHyvoD9uZhr+Za8sFoP+OnMcnWK5lngSzALgQYKMr9RJVOWLqQYuyn6ulqGWSXdwf6F80lQ==}
+
+ micromark-util-character@2.1.1:
+ resolution: {integrity: sha512-wv8tdUTJ3thSFFFJKtpYKOYiGP2+v96Hvk4Tu8KpCAsTMs6yi+nVmGh1syvSCsaxz45J6Jbw+9DD6g97+NV67Q==}
+
+ micromark-util-chunked@2.0.1:
+ resolution: {integrity: sha512-QUNFEOPELfmvv+4xiNg2sRYeS/P84pTW0TCgP5zc9FpXetHY0ab7SxKyAQCNCc1eK0459uoLI1y5oO5Vc1dbhA==}
+
+ micromark-util-classify-character@2.0.1:
+ resolution: {integrity: sha512-K0kHzM6afW/MbeWYWLjoHQv1sgg2Q9EccHEDzSkxiP/EaagNzCm7T/WMKZ3rjMbvIpvBiZgwR3dKMygtA4mG1Q==}
+
+ micromark-util-combine-extensions@2.0.1:
+ resolution: {integrity: sha512-OnAnH8Ujmy59JcyZw8JSbK9cGpdVY44NKgSM7E9Eh7DiLS2E9RNQf0dONaGDzEG9yjEl5hcqeIsj4hfRkLH/Bg==}
+
+ micromark-util-decode-numeric-character-reference@2.0.2:
+ resolution: {integrity: sha512-ccUbYk6CwVdkmCQMyr64dXz42EfHGkPQlBj5p7YVGzq8I7CtjXZJrubAYezf7Rp+bjPseiROqe7G6foFd+lEuw==}
+
+ micromark-util-decode-string@2.0.1:
+ resolution: {integrity: sha512-nDV/77Fj6eH1ynwscYTOsbK7rR//Uj0bZXBwJZRfaLEJ1iGBR6kIfNmlNqaqJf649EP0F3NWNdeJi03elllNUQ==}
+
+ micromark-util-encode@2.0.1:
+ resolution: {integrity: sha512-c3cVx2y4KqUnwopcO9b/SCdo2O67LwJJ/UyqGfbigahfegL9myoEFoDYZgkT7f36T0bLrM9hZTAaAyH+PCAXjw==}
+
+ micromark-util-html-tag-name@2.0.1:
+ resolution: {integrity: sha512-2cNEiYDhCWKI+Gs9T0Tiysk136SnR13hhO8yW6BGNyhOC4qYFnwF1nKfD3HFAIXA5c45RrIG1ub11GiXeYd1xA==}
+
+ micromark-util-normalize-identifier@2.0.1:
+ resolution: {integrity: sha512-sxPqmo70LyARJs0w2UclACPUUEqltCkJ6PhKdMIDuJ3gSf/Q+/GIe3WKl0Ijb/GyH9lOpUkRAO2wp0GVkLvS9Q==}
+
+ micromark-util-resolve-all@2.0.1:
+ resolution: {integrity: sha512-VdQyxFWFT2/FGJgwQnJYbe1jjQoNTS4RjglmSjTUlpUMa95Htx9NHeYW4rGDJzbjvCsl9eLjMQwGeElsqmzcHg==}
+
+ micromark-util-sanitize-uri@2.0.1:
+ resolution: {integrity: sha512-9N9IomZ/YuGGZZmQec1MbgxtlgougxTodVwDzzEouPKo3qFWvymFHWcnDi2vzV1ff6kas9ucW+o3yzJK9YB1AQ==}
+
+ micromark-util-subtokenize@2.1.0:
+ resolution: {integrity: sha512-XQLu552iSctvnEcgXw6+Sx75GflAPNED1qx7eBJ+wydBb2KCbRZe+NwvIEEMM83uml1+2WSXpBAcp9IUCgCYWA==}
+
+ micromark-util-symbol@2.0.1:
+ resolution: {integrity: sha512-vs5t8Apaud9N28kgCrRUdEed4UJ+wWNvicHLPxCa9ENlYuAY31M0ETy5y1vA33YoNPDFTghEbnh6efaE8h4x0Q==}
+
+ micromark-util-types@2.0.2:
+ resolution: {integrity: sha512-Yw0ECSpJoViF1qTU4DC6NwtC4aWGt1EkzaQB8KPPyCRR8z9TWeV0HbEFGTO+ZY1wB22zmxnJqhPyTpOVCpeHTA==}
+
+ micromark@4.0.2:
+ resolution: {integrity: sha512-zpe98Q6kvavpCr1NPVSCMebCKfD7CA2NqZ+rykeNhONIJBpc1tFKt9hucLGwha3jNTNI8lHpctWJWoimVF4PfA==}
+
ms@2.1.3:
resolution: {integrity: sha512-6FlzubTLZG3J2a/NVCAleEhjzq5oxgHyaCU9yYXvcLsvoVaHJq/s5xXI6/XXP6tz7R9xAOtHnSO/tXtF3WRTlA==}
@@ -736,6 +983,9 @@ packages:
resolution: {integrity: sha512-GYVXHE2KnrzAfsAjl4uP++evGFCrAU1jta4ubEjIG7YWt/64Gqv66a30yKwWczVjA6j3bM4nBwH7Pk1JmDHaxQ==}
engines: {node: '>=18'}
+ parse-entities@4.0.2:
+ resolution: {integrity: sha512-GG2AQYWoLgL877gQIKeRPGO1xF9+eG1ujIb5soS5gPvLQ1y2o8FL90w2QWNdf9I361Mpp7726c+lj3U0qK1uGw==}
+
picocolors@1.1.1:
resolution: {integrity: sha512-xceH2snhtb5M9liqDsmEw56le376mTZkEX/jEb/RxNFyegNul7eNslCXP9FDj/Lcu0X8KEyMceP2ntpaHrDEVA==}
@@ -743,15 +993,28 @@ packages:
resolution: {integrity: sha512-QP88BAKvMam/3NxH6vj2o21R6MjxZUAd6nlwAS/pnGvN9IVLocLHxGYIzFhg6fUQ+5th6P4dv4eW9jX3DSIj7A==}
engines: {node: '>=12'}
+ postcss-selector-parser@6.0.10:
+ resolution: {integrity: sha512-IQ7TZdoaqbT+LCpShg46jnZVlhWD2w6iQYAcYXfHARZ7X1t/UGhhceQDs5X0cGqKvYlHNOuv7Oa1xmb0oQuA3w==}
+ engines: {node: '>=4'}
+
postcss@8.5.15:
resolution: {integrity: sha512-FfR8sjd4em2T6fb3I2MwAJU7HWVMr9zba+enmQeeWFfCbm+UOC/0X4DS8XtpUTMwWMGbjKYP7xjfNekzyGmB3A==}
engines: {node: ^10 || ^12 || >=14}
+ property-information@7.2.0:
+ resolution: {integrity: sha512-IAtzIB6sUiWaJYrX9smp3V46pBGbBeLFRGdh25kg1334VcBlD8HzhPeNIWQH9zhGmo2itIe25EHt9dQP7G5hmg==}
+
react-dom@19.2.7:
resolution: {integrity: sha512-t0BRVXvbiE/o20Hfw669rLbMCDWtYZLvmJigy2f0MxsXF+71pxhR3xOkspmsO8h3ZlNzyibAmtCa3l4lYKk6gQ==}
peerDependencies:
react: ^19.2.7
+ react-markdown@10.1.0:
+ resolution: {integrity: sha512-qKxVopLT/TyA6BX3Ue5NwabOsAzm0Q7kAPwq6L+wWDwisYs7R8vZ0nRXqq6rkueboxpkjvLGU9fWifiX/ZZFxQ==}
+ peerDependencies:
+ '@types/react': '>=18'
+ react: '>=18'
+
react-refresh@0.17.0:
resolution: {integrity: sha512-z6F7K9bV85EfseRCp2bzrpyQ0Gkw1uLoCel9XBVWPg/TjRj94SkJzUTGfOa4bs7iJvBWtQG0Wq7wnI0syw3EBQ==}
engines: {node: '>=0.10.0'}
@@ -760,6 +1023,18 @@ packages:
resolution: {integrity: sha512-HNe9WslTbXmFK8o8cmwgAeJFSBvt1bPdHCVKtaaV+WlAN36mpT4hcRpwbf3fY56ar2oIXzsBpOAiIRHAdY0OlQ==}
engines: {node: '>=0.10.0'}
+ remark-gfm@4.0.1:
+ resolution: {integrity: sha512-1quofZ2RQ9EWdeN34S79+KExV1764+wCUGop5CPL1WGdD0ocPpu91lzPGbwWMECpEpd42kJGQwzRfyov9j4yNg==}
+
+ remark-parse@11.0.0:
+ resolution: {integrity: sha512-FCxlKLNGknS5ba/1lmpYijMUzX2esxW5xQqjWxw2eHFfS2MSdaHVINFmhjo+qN1WhZhNimq0dZATN9pH0IDrpA==}
+
+ remark-rehype@11.1.2:
+ resolution: {integrity: sha512-Dh7l57ianaEoIpzbp0PC9UKAdCSVklD8E5Rpw7ETfbTl3FqcOOgq5q2LVDhgGCkaBv7p24JXikPdvhhmHvKMsw==}
+
+ remark-stringify@11.0.0:
+ resolution: {integrity: sha512-1OSmLd3awB/t8qdoEOMazZkNsfVTeY4fTsgzcQFdXNq8ToTN4ZGwrMnlda4K6smTFKD+GRV6O48i6Z4iKgPPpw==}
+
rollup@4.61.0:
resolution: {integrity: sha512-T9mWdbWfQtp0B5lv/HX+wrhYsmXRlcWnXXmJbXqKJhlRaoS6KMhq0gpyzW4UJfclcxrEdLnTgjT2NjruLONu0g==}
engines: {node: '>=18.0.0', npm: '>=8.0.0'}
@@ -776,6 +1051,18 @@ packages:
resolution: {integrity: sha512-UXWMKhLOwVKb728IUtQPXxfYU+usdybtUrK/8uGE8CQMvrhOpwvzDBwj0QhSL7MQc7vIsISBG8VQ8+IDQxpfQA==}
engines: {node: '>=0.10.0'}
+ space-separated-tokens@2.0.2:
+ resolution: {integrity: sha512-PEGlAwrG8yXGXRjW32fGbg66JAlOAwbObuqVoJpv/mRgoWDQfgH1wDPvtzWyUSNAXBGSk8h755YDbbcEy3SH2Q==}
+
+ stringify-entities@4.0.4:
+ resolution: {integrity: sha512-IwfBptatlO+QCJUo19AqvrPNqlVMpW9YEL2LIVY+Rpv2qsjCGxaDLNRgeGsQWJhfItebuJhsGSLjaBbNSQ+ieg==}
+
+ style-to-js@1.1.21:
+ resolution: {integrity: sha512-RjQetxJrrUJLQPHbLku6U/ocGtzyjbJMP9lCNK7Ag0CNh690nSH8woqWH9u16nMjYBAok+i7JO1NP2pOy8IsPQ==}
+
+ style-to-object@1.0.14:
+ resolution: {integrity: sha512-LIN7rULI0jBscWQYaSswptyderlarFkjQ+t79nzty8tcIAceVomEVlLzH5VP4Cmsv6MtKhs7qaAiwlcp+Mgaxw==}
+
tailwindcss@4.3.0:
resolution: {integrity: sha512-y6nxMGB1nMW9R6k96e5gdIFzcfL/gTJRNaqGes1YvkLnPVXzWgbqFF2yLC0T8G774n24cx3Pe8XrKoniCOAH+Q==}
@@ -787,17 +1074,50 @@ packages:
resolution: {integrity: sha512-wXR/dYpcqKmfWpEdZjiKJOwCNFndD0DMnrW/cYjVGttEkBfVgcLFHoNrlj47mjOVic9yyNu65alsgF4NQyTa2g==}
engines: {node: '>=12.0.0'}
+ trim-lines@3.0.1:
+ resolution: {integrity: sha512-kRj8B+YHZCc9kQYdWfJB2/oUl9rA99qbowYYBtr4ui4mZyAQ2JpvVBd/6U2YloATfqBhBTSMhTpgBHtU0Mf3Rg==}
+
+ trough@2.2.0:
+ resolution: {integrity: sha512-tmMpK00BjZiUyVyvrBK7knerNgmgvcV/KLVyuma/SC+TQN167GrMRciANTz09+k3zW8L8t60jWO1GpfkZdjTaw==}
+
typescript@5.9.3:
resolution: {integrity: sha512-jl1vZzPDinLr9eUt3J/t7V6FgNEw9QjvBPdysz9KfQDD41fQrC2Y4vKQdiaUpFT4bXlb1RHhLpp8wtm6M5TgSw==}
engines: {node: '>=14.17'}
hasBin: true
+ unified@11.0.5:
+ resolution: {integrity: sha512-xKvGhPWw3k84Qjh8bI3ZeJjqnyadK+GEFtazSfZv/rKeTkTjOJho6mFqh2SM96iIcZokxiOpg78GazTSg8+KHA==}
+
+ unist-util-is@6.0.1:
+ resolution: {integrity: sha512-LsiILbtBETkDz8I9p1dQ0uyRUWuaQzd/cuEeS1hoRSyW5E5XGmTzlwY1OrNzzakGowI9Dr/I8HVaw4hTtnxy8g==}
+
+ unist-util-position@5.0.0:
+ resolution: {integrity: sha512-fucsC7HjXvkB5R3kTCO7kUjRdrS0BJt3M/FPxmHMBOm8JQi2BsHAHFsy27E0EolP8rp0NzXsJ+jNPyDWvOJZPA==}
+
+ unist-util-stringify-position@4.0.0:
+ resolution: {integrity: sha512-0ASV06AAoKCDkS2+xw5RXJywruurpbC4JZSm7nr7MOt1ojAzvyyaO+UxZf18j8FCF6kmzCZKcAgN/yu2gm2XgQ==}
+
+ unist-util-visit-parents@6.0.2:
+ resolution: {integrity: sha512-goh1s1TBrqSqukSc8wrjwWhL0hiJxgA8m4kFxGlQ+8FYQ3C/m11FcTs4YYem7V664AhHVvgoQLk890Ssdsr2IQ==}
+
+ unist-util-visit@5.1.0:
+ resolution: {integrity: sha512-m+vIdyeCOpdr/QeQCu2EzxX/ohgS8KbnPDgFni4dQsfSCtpz8UqDyY5GjRru8PDKuYn7Fq19j1CQ+nJSsGKOzg==}
+
update-browserslist-db@1.2.3:
resolution: {integrity: sha512-Js0m9cx+qOgDxo0eMiFGEueWztz+d4+M3rGlmKPT+T4IS/jP4ylw3Nwpu6cpTTP8R1MAC1kF4VbdLt3ARf209w==}
hasBin: true
peerDependencies:
browserslist: '>= 4.21.0'
+ util-deprecate@1.0.2:
+ resolution: {integrity: sha512-EPD5q1uXyFxJpCrLnCc1nHnq3gOa6DZBocAIiI2TaSCA7VCJ1UJDMagCzIkXNsUYfD1daK//LTEQ8xiIbrHtcw==}
+
+ vfile-message@4.0.3:
+ resolution: {integrity: sha512-QTHzsGd1EhbZs4AsQ20JX1rC3cOlt/IWJruk893DfLRr57lcnOeMaWG4K0JrRta4mIJZKth2Au3mM3u03/JWKw==}
+
+ vfile@6.0.3:
+ resolution: {integrity: sha512-KzIbH/9tXat2u30jf+smMwFCsno4wHVdNmzFyL+T/L3UGqqk6JKfVqOFOZEpZSHADH1k40ab6NUIXZq422ov3Q==}
+
vite@6.4.3:
resolution: {integrity: sha512-NTKlcQjlAK7MlQoyb6LgaqHc8sso/pVyUJYWMws3jg21uTJw/LddqIFPcPqP6PzpgbIcZyKI85sFE4HBrQDA8A==}
engines: {node: ^18.0.0 || ^20.0.0 || >=22.0.0}
@@ -841,6 +1161,9 @@ packages:
yallist@3.1.1:
resolution: {integrity: sha512-a4UGQaWPH59mOXUYnAG2ewncQS4i4F43Tv3JoAM+s2VDAmS9NsK8GpDMLrCHPksFT7h3K6TOoUNn2pb7RoXx4g==}
+ zwitch@2.0.4:
+ resolution: {integrity: sha512-bXE4cR/kVZhKZX/RjPEflHaKVhUVl85noU3v6b8apfQEc1x4A+zBxjZ4lN8LqGd6WZ3dl98pY4o717VFmoPp+A==}
+
snapshots:
'@babel/code-frame@7.29.7':
@@ -1190,6 +1513,11 @@ snapshots:
'@tailwindcss/oxide-win32-arm64-msvc': 4.3.0
'@tailwindcss/oxide-win32-x64-msvc': 4.3.0
+ '@tailwindcss/typography@0.5.20(tailwindcss@4.3.0)':
+ dependencies:
+ postcss-selector-parser: 6.0.10
+ tailwindcss: 4.3.0
+
'@tailwindcss/vite@4.3.0(vite@6.4.3(jiti@2.7.0)(lightningcss@1.32.0))':
dependencies:
'@tailwindcss/node': 4.3.0
@@ -1218,8 +1546,26 @@ snapshots:
dependencies:
'@babel/types': 7.29.7
+ '@types/debug@4.1.13':
+ dependencies:
+ '@types/ms': 2.1.0
+
+ '@types/estree-jsx@1.0.5':
+ dependencies:
+ '@types/estree': 1.0.9
+
'@types/estree@1.0.9': {}
+ '@types/hast@3.0.4':
+ dependencies:
+ '@types/unist': 3.0.3
+
+ '@types/mdast@4.0.4':
+ dependencies:
+ '@types/unist': 3.0.3
+
+ '@types/ms@2.1.0': {}
+
'@types/react-dom@19.2.3(@types/react@19.2.16)':
dependencies:
'@types/react': 19.2.16
@@ -1228,6 +1574,12 @@ snapshots:
dependencies:
csstype: 3.2.3
+ '@types/unist@2.0.11': {}
+
+ '@types/unist@3.0.3': {}
+
+ '@ungap/structured-clone@1.3.1': {}
+
'@vitejs/plugin-react@4.7.0(vite@6.4.3(jiti@2.7.0)(lightningcss@1.32.0))':
dependencies:
'@babel/core': 7.29.7
@@ -1240,6 +1592,8 @@ snapshots:
transitivePeerDependencies:
- supports-color
+ bail@2.0.2: {}
+
baseline-browser-mapping@2.10.33: {}
browserslist@4.28.2:
@@ -1252,16 +1606,40 @@ snapshots:
caniuse-lite@1.0.30001793: {}
+ ccount@2.0.1: {}
+
+ character-entities-html4@2.1.0: {}
+
+ character-entities-legacy@3.0.0: {}
+
+ character-entities@2.0.2: {}
+
+ character-reference-invalid@2.0.1: {}
+
+ comma-separated-tokens@2.0.3: {}
+
convert-source-map@2.0.0: {}
+ cssesc@3.0.0: {}
+
csstype@3.2.3: {}
debug@4.4.3:
dependencies:
ms: 2.1.3
+ decode-named-character-reference@1.3.0:
+ dependencies:
+ character-entities: 2.0.2
+
+ dequal@2.0.3: {}
+
detect-libc@2.1.2: {}
+ devlop@1.1.0:
+ dependencies:
+ dequal: 2.0.3
+
electron-to-chromium@1.5.364: {}
enhanced-resolve@5.22.1:
@@ -1300,6 +1678,12 @@ snapshots:
escalade@3.2.0: {}
+ escape-string-regexp@5.0.0: {}
+
+ estree-util-is-identifier-name@3.0.0: {}
+
+ extend@3.0.2: {}
+
fdir@6.5.0(picomatch@4.0.4):
optionalDependencies:
picomatch: 4.0.4
@@ -1311,6 +1695,47 @@ snapshots:
graceful-fs@4.2.11: {}
+ hast-util-to-jsx-runtime@2.3.6:
+ dependencies:
+ '@types/estree': 1.0.9
+ '@types/hast': 3.0.4
+ '@types/unist': 3.0.3
+ comma-separated-tokens: 2.0.3
+ devlop: 1.1.0
+ estree-util-is-identifier-name: 3.0.0
+ hast-util-whitespace: 3.0.0
+ mdast-util-mdx-expression: 2.0.1
+ mdast-util-mdx-jsx: 3.2.0
+ mdast-util-mdxjs-esm: 2.0.1
+ property-information: 7.2.0
+ space-separated-tokens: 2.0.2
+ style-to-js: 1.1.21
+ unist-util-position: 5.0.0
+ vfile-message: 4.0.3
+ transitivePeerDependencies:
+ - supports-color
+
+ hast-util-whitespace@3.0.0:
+ dependencies:
+ '@types/hast': 3.0.4
+
+ html-url-attributes@3.0.1: {}
+
+ inline-style-parser@0.2.7: {}
+
+ is-alphabetical@2.0.1: {}
+
+ is-alphanumerical@2.0.1:
+ dependencies:
+ is-alphabetical: 2.0.1
+ is-decimal: 2.0.1
+
+ is-decimal@2.0.1: {}
+
+ is-hexadecimal@2.0.1: {}
+
+ is-plain-obj@4.1.0: {}
+
jiti@2.7.0: {}
js-tokens@4.0.0: {}
@@ -1368,6 +1793,8 @@ snapshots:
lightningcss-win32-arm64-msvc: 1.32.0
lightningcss-win32-x64-msvc: 1.32.0
+ longest-streak@3.1.0: {}
+
lru-cache@5.1.1:
dependencies:
yallist: 3.1.1
@@ -1376,31 +1803,446 @@ snapshots:
dependencies:
'@jridgewell/sourcemap-codec': 1.5.5
+ markdown-table@3.0.4: {}
+
+ mdast-util-find-and-replace@3.0.2:
+ dependencies:
+ '@types/mdast': 4.0.4
+ escape-string-regexp: 5.0.0
+ unist-util-is: 6.0.1
+ unist-util-visit-parents: 6.0.2
+
+ mdast-util-from-markdown@2.0.3:
+ dependencies:
+ '@types/mdast': 4.0.4
+ '@types/unist': 3.0.3
+ decode-named-character-reference: 1.3.0
+ devlop: 1.1.0
+ mdast-util-to-string: 4.0.0
+ micromark: 4.0.2
+ micromark-util-decode-numeric-character-reference: 2.0.2
+ micromark-util-decode-string: 2.0.1
+ micromark-util-normalize-identifier: 2.0.1
+ micromark-util-symbol: 2.0.1
+ micromark-util-types: 2.0.2
+ unist-util-stringify-position: 4.0.0
+ transitivePeerDependencies:
+ - supports-color
+
+ mdast-util-gfm-autolink-literal@2.0.1:
+ dependencies:
+ '@types/mdast': 4.0.4
+ ccount: 2.0.1
+ devlop: 1.1.0
+ mdast-util-find-and-replace: 3.0.2
+ micromark-util-character: 2.1.1
+
+ mdast-util-gfm-footnote@2.1.0:
+ dependencies:
+ '@types/mdast': 4.0.4
+ devlop: 1.1.0
+ mdast-util-from-markdown: 2.0.3
+ mdast-util-to-markdown: 2.1.2
+ micromark-util-normalize-identifier: 2.0.1
+ transitivePeerDependencies:
+ - supports-color
+
+ mdast-util-gfm-strikethrough@2.0.0:
+ dependencies:
+ '@types/mdast': 4.0.4
+ mdast-util-from-markdown: 2.0.3
+ mdast-util-to-markdown: 2.1.2
+ transitivePeerDependencies:
+ - supports-color
+
+ mdast-util-gfm-table@2.0.0:
+ dependencies:
+ '@types/mdast': 4.0.4
+ devlop: 1.1.0
+ markdown-table: 3.0.4
+ mdast-util-from-markdown: 2.0.3
+ mdast-util-to-markdown: 2.1.2
+ transitivePeerDependencies:
+ - supports-color
+
+ mdast-util-gfm-task-list-item@2.0.0:
+ dependencies:
+ '@types/mdast': 4.0.4
+ devlop: 1.1.0
+ mdast-util-from-markdown: 2.0.3
+ mdast-util-to-markdown: 2.1.2
+ transitivePeerDependencies:
+ - supports-color
+
+ mdast-util-gfm@3.1.0:
+ dependencies:
+ mdast-util-from-markdown: 2.0.3
+ mdast-util-gfm-autolink-literal: 2.0.1
+ mdast-util-gfm-footnote: 2.1.0
+ mdast-util-gfm-strikethrough: 2.0.0
+ mdast-util-gfm-table: 2.0.0
+ mdast-util-gfm-task-list-item: 2.0.0
+ mdast-util-to-markdown: 2.1.2
+ transitivePeerDependencies:
+ - supports-color
+
+ mdast-util-mdx-expression@2.0.1:
+ dependencies:
+ '@types/estree-jsx': 1.0.5
+ '@types/hast': 3.0.4
+ '@types/mdast': 4.0.4
+ devlop: 1.1.0
+ mdast-util-from-markdown: 2.0.3
+ mdast-util-to-markdown: 2.1.2
+ transitivePeerDependencies:
+ - supports-color
+
+ mdast-util-mdx-jsx@3.2.0:
+ dependencies:
+ '@types/estree-jsx': 1.0.5
+ '@types/hast': 3.0.4
+ '@types/mdast': 4.0.4
+ '@types/unist': 3.0.3
+ ccount: 2.0.1
+ devlop: 1.1.0
+ mdast-util-from-markdown: 2.0.3
+ mdast-util-to-markdown: 2.1.2
+ parse-entities: 4.0.2
+ stringify-entities: 4.0.4
+ unist-util-stringify-position: 4.0.0
+ vfile-message: 4.0.3
+ transitivePeerDependencies:
+ - supports-color
+
+ mdast-util-mdxjs-esm@2.0.1:
+ dependencies:
+ '@types/estree-jsx': 1.0.5
+ '@types/hast': 3.0.4
+ '@types/mdast': 4.0.4
+ devlop: 1.1.0
+ mdast-util-from-markdown: 2.0.3
+ mdast-util-to-markdown: 2.1.2
+ transitivePeerDependencies:
+ - supports-color
+
+ mdast-util-phrasing@4.1.0:
+ dependencies:
+ '@types/mdast': 4.0.4
+ unist-util-is: 6.0.1
+
+ mdast-util-to-hast@13.2.1:
+ dependencies:
+ '@types/hast': 3.0.4
+ '@types/mdast': 4.0.4
+ '@ungap/structured-clone': 1.3.1
+ devlop: 1.1.0
+ micromark-util-sanitize-uri: 2.0.1
+ trim-lines: 3.0.1
+ unist-util-position: 5.0.0
+ unist-util-visit: 5.1.0
+ vfile: 6.0.3
+
+ mdast-util-to-markdown@2.1.2:
+ dependencies:
+ '@types/mdast': 4.0.4
+ '@types/unist': 3.0.3
+ longest-streak: 3.1.0
+ mdast-util-phrasing: 4.1.0
+ mdast-util-to-string: 4.0.0
+ micromark-util-classify-character: 2.0.1
+ micromark-util-decode-string: 2.0.1
+ unist-util-visit: 5.1.0
+ zwitch: 2.0.4
+
+ mdast-util-to-string@4.0.0:
+ dependencies:
+ '@types/mdast': 4.0.4
+
+ micromark-core-commonmark@2.0.3:
+ dependencies:
+ decode-named-character-reference: 1.3.0
+ devlop: 1.1.0
+ micromark-factory-destination: 2.0.1
+ micromark-factory-label: 2.0.1
+ micromark-factory-space: 2.0.1
+ micromark-factory-title: 2.0.1
+ micromark-factory-whitespace: 2.0.1
+ micromark-util-character: 2.1.1
+ micromark-util-chunked: 2.0.1
+ micromark-util-classify-character: 2.0.1
+ micromark-util-html-tag-name: 2.0.1
+ micromark-util-normalize-identifier: 2.0.1
+ micromark-util-resolve-all: 2.0.1
+ micromark-util-subtokenize: 2.1.0
+ micromark-util-symbol: 2.0.1
+ micromark-util-types: 2.0.2
+
+ micromark-extension-gfm-autolink-literal@2.1.0:
+ dependencies:
+ micromark-util-character: 2.1.1
+ micromark-util-sanitize-uri: 2.0.1
+ micromark-util-symbol: 2.0.1
+ micromark-util-types: 2.0.2
+
+ micromark-extension-gfm-footnote@2.1.0:
+ dependencies:
+ devlop: 1.1.0
+ micromark-core-commonmark: 2.0.3
+ micromark-factory-space: 2.0.1
+ micromark-util-character: 2.1.1
+ micromark-util-normalize-identifier: 2.0.1
+ micromark-util-sanitize-uri: 2.0.1
+ micromark-util-symbol: 2.0.1
+ micromark-util-types: 2.0.2
+
+ micromark-extension-gfm-strikethrough@2.1.0:
+ dependencies:
+ devlop: 1.1.0
+ micromark-util-chunked: 2.0.1
+ micromark-util-classify-character: 2.0.1
+ micromark-util-resolve-all: 2.0.1
+ micromark-util-symbol: 2.0.1
+ micromark-util-types: 2.0.2
+
+ micromark-extension-gfm-table@2.1.1:
+ dependencies:
+ devlop: 1.1.0
+ micromark-factory-space: 2.0.1
+ micromark-util-character: 2.1.1
+ micromark-util-symbol: 2.0.1
+ micromark-util-types: 2.0.2
+
+ micromark-extension-gfm-tagfilter@2.0.0:
+ dependencies:
+ micromark-util-types: 2.0.2
+
+ micromark-extension-gfm-task-list-item@2.1.0:
+ dependencies:
+ devlop: 1.1.0
+ micromark-factory-space: 2.0.1
+ micromark-util-character: 2.1.1
+ micromark-util-symbol: 2.0.1
+ micromark-util-types: 2.0.2
+
+ micromark-extension-gfm@3.0.0:
+ dependencies:
+ micromark-extension-gfm-autolink-literal: 2.1.0
+ micromark-extension-gfm-footnote: 2.1.0
+ micromark-extension-gfm-strikethrough: 2.1.0
+ micromark-extension-gfm-table: 2.1.1
+ micromark-extension-gfm-tagfilter: 2.0.0
+ micromark-extension-gfm-task-list-item: 2.1.0
+ micromark-util-combine-extensions: 2.0.1
+ micromark-util-types: 2.0.2
+
+ micromark-factory-destination@2.0.1:
+ dependencies:
+ micromark-util-character: 2.1.1
+ micromark-util-symbol: 2.0.1
+ micromark-util-types: 2.0.2
+
+ micromark-factory-label@2.0.1:
+ dependencies:
+ devlop: 1.1.0
+ micromark-util-character: 2.1.1
+ micromark-util-symbol: 2.0.1
+ micromark-util-types: 2.0.2
+
+ micromark-factory-space@2.0.1:
+ dependencies:
+ micromark-util-character: 2.1.1
+ micromark-util-types: 2.0.2
+
+ micromark-factory-title@2.0.1:
+ dependencies:
+ micromark-factory-space: 2.0.1
+ micromark-util-character: 2.1.1
+ micromark-util-symbol: 2.0.1
+ micromark-util-types: 2.0.2
+
+ micromark-factory-whitespace@2.0.1:
+ dependencies:
+ micromark-factory-space: 2.0.1
+ micromark-util-character: 2.1.1
+ micromark-util-symbol: 2.0.1
+ micromark-util-types: 2.0.2
+
+ micromark-util-character@2.1.1:
+ dependencies:
+ micromark-util-symbol: 2.0.1
+ micromark-util-types: 2.0.2
+
+ micromark-util-chunked@2.0.1:
+ dependencies:
+ micromark-util-symbol: 2.0.1
+
+ micromark-util-classify-character@2.0.1:
+ dependencies:
+ micromark-util-character: 2.1.1
+ micromark-util-symbol: 2.0.1
+ micromark-util-types: 2.0.2
+
+ micromark-util-combine-extensions@2.0.1:
+ dependencies:
+ micromark-util-chunked: 2.0.1
+ micromark-util-types: 2.0.2
+
+ micromark-util-decode-numeric-character-reference@2.0.2:
+ dependencies:
+ micromark-util-symbol: 2.0.1
+
+ micromark-util-decode-string@2.0.1:
+ dependencies:
+ decode-named-character-reference: 1.3.0
+ micromark-util-character: 2.1.1
+ micromark-util-decode-numeric-character-reference: 2.0.2
+ micromark-util-symbol: 2.0.1
+
+ micromark-util-encode@2.0.1: {}
+
+ micromark-util-html-tag-name@2.0.1: {}
+
+ micromark-util-normalize-identifier@2.0.1:
+ dependencies:
+ micromark-util-symbol: 2.0.1
+
+ micromark-util-resolve-all@2.0.1:
+ dependencies:
+ micromark-util-types: 2.0.2
+
+ micromark-util-sanitize-uri@2.0.1:
+ dependencies:
+ micromark-util-character: 2.1.1
+ micromark-util-encode: 2.0.1
+ micromark-util-symbol: 2.0.1
+
+ micromark-util-subtokenize@2.1.0:
+ dependencies:
+ devlop: 1.1.0
+ micromark-util-chunked: 2.0.1
+ micromark-util-symbol: 2.0.1
+ micromark-util-types: 2.0.2
+
+ micromark-util-symbol@2.0.1: {}
+
+ micromark-util-types@2.0.2: {}
+
+ micromark@4.0.2:
+ dependencies:
+ '@types/debug': 4.1.13
+ debug: 4.4.3
+ decode-named-character-reference: 1.3.0
+ devlop: 1.1.0
+ micromark-core-commonmark: 2.0.3
+ micromark-factory-space: 2.0.1
+ micromark-util-character: 2.1.1
+ micromark-util-chunked: 2.0.1
+ micromark-util-combine-extensions: 2.0.1
+ micromark-util-decode-numeric-character-reference: 2.0.2
+ micromark-util-encode: 2.0.1
+ micromark-util-normalize-identifier: 2.0.1
+ micromark-util-resolve-all: 2.0.1
+ micromark-util-sanitize-uri: 2.0.1
+ micromark-util-subtokenize: 2.1.0
+ micromark-util-symbol: 2.0.1
+ micromark-util-types: 2.0.2
+ transitivePeerDependencies:
+ - supports-color
+
ms@2.1.3: {}
nanoid@3.3.12: {}
node-releases@2.0.46: {}
+ parse-entities@4.0.2:
+ dependencies:
+ '@types/unist': 2.0.11
+ character-entities-legacy: 3.0.0
+ character-reference-invalid: 2.0.1
+ decode-named-character-reference: 1.3.0
+ is-alphanumerical: 2.0.1
+ is-decimal: 2.0.1
+ is-hexadecimal: 2.0.1
+
picocolors@1.1.1: {}
picomatch@4.0.4: {}
+ postcss-selector-parser@6.0.10:
+ dependencies:
+ cssesc: 3.0.0
+ util-deprecate: 1.0.2
+
postcss@8.5.15:
dependencies:
nanoid: 3.3.12
picocolors: 1.1.1
source-map-js: 1.2.1
+ property-information@7.2.0: {}
+
react-dom@19.2.7(react@19.2.7):
dependencies:
react: 19.2.7
scheduler: 0.27.0
+ react-markdown@10.1.0(@types/react@19.2.16)(react@19.2.7):
+ dependencies:
+ '@types/hast': 3.0.4
+ '@types/mdast': 4.0.4
+ '@types/react': 19.2.16
+ devlop: 1.1.0
+ hast-util-to-jsx-runtime: 2.3.6
+ html-url-attributes: 3.0.1
+ mdast-util-to-hast: 13.2.1
+ react: 19.2.7
+ remark-parse: 11.0.0
+ remark-rehype: 11.1.2
+ unified: 11.0.5
+ unist-util-visit: 5.1.0
+ vfile: 6.0.3
+ transitivePeerDependencies:
+ - supports-color
+
react-refresh@0.17.0: {}
react@19.2.7: {}
+ remark-gfm@4.0.1:
+ dependencies:
+ '@types/mdast': 4.0.4
+ mdast-util-gfm: 3.1.0
+ micromark-extension-gfm: 3.0.0
+ remark-parse: 11.0.0
+ remark-stringify: 11.0.0
+ unified: 11.0.5
+ transitivePeerDependencies:
+ - supports-color
+
+ remark-parse@11.0.0:
+ dependencies:
+ '@types/mdast': 4.0.4
+ mdast-util-from-markdown: 2.0.3
+ micromark-util-types: 2.0.2
+ unified: 11.0.5
+ transitivePeerDependencies:
+ - supports-color
+
+ remark-rehype@11.1.2:
+ dependencies:
+ '@types/hast': 3.0.4
+ '@types/mdast': 4.0.4
+ mdast-util-to-hast: 13.2.1
+ unified: 11.0.5
+ vfile: 6.0.3
+
+ remark-stringify@11.0.0:
+ dependencies:
+ '@types/mdast': 4.0.4
+ mdast-util-to-markdown: 2.1.2
+ unified: 11.0.5
+
rollup@4.61.0:
dependencies:
'@types/estree': 1.0.9
@@ -1438,6 +2280,21 @@ snapshots:
source-map-js@1.2.1: {}
+ space-separated-tokens@2.0.2: {}
+
+ stringify-entities@4.0.4:
+ dependencies:
+ character-entities-html4: 2.1.0
+ character-entities-legacy: 3.0.0
+
+ style-to-js@1.1.21:
+ dependencies:
+ style-to-object: 1.0.14
+
+ style-to-object@1.0.14:
+ dependencies:
+ inline-style-parser: 0.2.7
+
tailwindcss@4.3.0: {}
tapable@2.3.3: {}
@@ -1447,14 +2304,63 @@ snapshots:
fdir: 6.5.0(picomatch@4.0.4)
picomatch: 4.0.4
+ trim-lines@3.0.1: {}
+
+ trough@2.2.0: {}
+
typescript@5.9.3: {}
+ unified@11.0.5:
+ dependencies:
+ '@types/unist': 3.0.3
+ bail: 2.0.2
+ devlop: 1.1.0
+ extend: 3.0.2
+ is-plain-obj: 4.1.0
+ trough: 2.2.0
+ vfile: 6.0.3
+
+ unist-util-is@6.0.1:
+ dependencies:
+ '@types/unist': 3.0.3
+
+ unist-util-position@5.0.0:
+ dependencies:
+ '@types/unist': 3.0.3
+
+ unist-util-stringify-position@4.0.0:
+ dependencies:
+ '@types/unist': 3.0.3
+
+ unist-util-visit-parents@6.0.2:
+ dependencies:
+ '@types/unist': 3.0.3
+ unist-util-is: 6.0.1
+
+ unist-util-visit@5.1.0:
+ dependencies:
+ '@types/unist': 3.0.3
+ unist-util-is: 6.0.1
+ unist-util-visit-parents: 6.0.2
+
update-browserslist-db@1.2.3(browserslist@4.28.2):
dependencies:
browserslist: 4.28.2
escalade: 3.2.0
picocolors: 1.1.1
+ util-deprecate@1.0.2: {}
+
+ vfile-message@4.0.3:
+ dependencies:
+ '@types/unist': 3.0.3
+ unist-util-stringify-position: 4.0.0
+
+ vfile@6.0.3:
+ dependencies:
+ '@types/unist': 3.0.3
+ vfile-message: 4.0.3
+
vite@6.4.3(jiti@2.7.0)(lightningcss@1.32.0):
dependencies:
esbuild: 0.25.12
@@ -1469,3 +2375,5 @@ snapshots:
lightningcss: 1.32.0
yallist@3.1.1: {}
+
+ zwitch@2.0.4: {}
diff --git a/examples/dql_explorer/web/src/App.tsx b/examples/dql_explorer/web/src/App.tsx
index 770330a..3ce2168 100644
--- a/examples/dql_explorer/web/src/App.tsx
+++ b/examples/dql_explorer/web/src/App.tsx
@@ -1,12 +1,14 @@
import { useState } from "react";
+import { Ask } from "./Ask";
import { Dashboard } from "./Dashboard";
import { Explorer } from "./Explorer";
-type Tab = "dashboard" | "explorer";
+type Tab = "dashboard" | "explorer" | "ask";
const TABS: { id: Tab; label: string }[] = [
{ id: "dashboard", label: "M&A / IPO Dashboard" },
{ id: "explorer", label: "DQL Builder" },
+ { id: "ask", label: "Ask Diffbot" },
];
export function App() {
@@ -39,10 +41,13 @@ export function App() {
))}
- {/* Keep both mounted so switching tabs preserves each one's state. */}
+ {/* Keep each mounted so switching tabs preserves its state. */}
+
+
+
{dashboardSeen && (
diff --git a/examples/dql_explorer/web/src/Ask.tsx b/examples/dql_explorer/web/src/Ask.tsx
new file mode 100644
index 0000000..843fc55
--- /dev/null
+++ b/examples/dql_explorer/web/src/Ask.tsx
@@ -0,0 +1,126 @@
+import { useRef, useState } from "react";
+import ReactMarkdown from "react-markdown";
+import remarkGfm from "remark-gfm";
+import { streamAsk } from "./api";
+
+const EXAMPLES = [
+ "Who founded Diffbot, and when?",
+ "What does Diffbot's Knowledge Graph contain?",
+ "Summarize OpenAI's funding history.",
+ "Which companies acquired robotics startups in the last year?",
+];
+
+export function Ask() {
+ const [question, setQuestion] = useState("");
+ const [answer, setAnswer] = useState("");
+ const [streaming, setStreaming] = useState(false);
+ const [error, setError] = useState<string | null>(null);
+ // Track the active stream so a new question (or unmount) cancels the old one.
+ const abortRef = useRef<AbortController | null>(null);
+
+ async function run(q: string) {
+ const trimmed = q.trim();
+ if (!trimmed || streaming) return;
+
+ abortRef.current?.abort();
+ const ctrl = new AbortController();
+ abortRef.current = ctrl;
+
+ setStreaming(true);
+ setError(null);
+ setAnswer("");
+ try {
+ await streamAsk(trimmed, (text) => setAnswer((a) => a + text), ctrl.signal);
+ } catch (e) {
+ if (!ctrl.signal.aborted) {
+ setError(e instanceof Error ? e.message : String(e));
+ }
+ } finally {
+ if (abortRef.current === ctrl) abortRef.current = null;
+ setStreaming(false);
+ }
+ }
+
+ return (
+
+
+
Ask Diffbot
+
+ Where the DQL Builder authors a precise query, this just asks Diffbot's
+ own LLM — grounded in the Knowledge Graph and the live web. Powered by{" "}
+ ChatDiffbot, which
+ streams tokens natively.
+
+
+
+
+
+
+ {EXAMPLES.map((ex) => (
+
+ ))}
+
+
+ {error && (
+
+ {error}
+
+ )}
+
+ {(answer || streaming) && (
+
+
+ {/*
+ Diffbot's RAG LLM emits markdown (headings, lists, links, tables),
+ so render it as such — re-parsing the full string each token is
+ cheap for these short answers. `prose` (Tailwind typography) styles
+ the elements; `dark:prose-invert` flips it for dark mode.
+ */}
+
+ {answer}
+
+ {streaming && (
+
+ ▌
+
+ )}
+
+
+ )}
+
+ );
+}
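
Review note: the cancel-the-previous-stream bookkeeping in `run` can be read in isolation as the sketch below. `createLatestOnly` and `Task` are hypothetical names introduced only for illustration; they are not part of this diff, and the component's `streaming` guard and React state are deliberately left out.

```typescript
// Hypothetical helper, not part of the diff: run async tasks so that
// starting a new one aborts whichever task is still in flight.
type Task<T> = (signal: AbortSignal) => Promise<T>;

function createLatestOnly() {
  let active: AbortController | null = null;

  return async function run<T>(task: Task<T>): Promise<T | undefined> {
    // Cancel the previous task, if any, before starting the new one.
    active?.abort();
    const ctrl = new AbortController();
    active = ctrl;
    try {
      return await task(ctrl.signal);
    } catch (e) {
      // A superseded task is expected to fail; swallow only that case.
      if (ctrl.signal.aborted) return undefined;
      throw e;
    } finally {
      // Only the newest run may clear the slot, mirroring the
      // `abortRef.current === ctrl` check in Ask.tsx.
      if (active === ctrl) active = null;
    }
  };
}
```

The identity check in `finally` matters: without it, a late-finishing old task could clear the controller that belongs to a newer task.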
diff --git a/examples/dql_explorer/web/src/api.ts b/examples/dql_explorer/web/src/api.ts
index a69a3a9..3d1f8b9 100644
--- a/examples/dql_explorer/web/src/api.ts
+++ b/examples/dql_explorer/web/src/api.ts
@@ -32,3 +32,63 @@ export function postDashboard(
): Promise {
return postJSON("/api/dashboard", req);
}
+
+/*
+ Stream an answer from ChatDiffbot (`POST /api/ask`). The backend emits
+ Server-Sent Events — `token` frames carry text, an `error` frame carries a
+ failure message, and a `done` frame ends the stream. `onToken` is called with
+ each piece of text as it arrives; the promise resolves when the stream ends and
+ rejects on an error frame (or a transport failure). Pass `signal` to abort.
+*/
+export async function streamAsk(
+ question: string,
+ onToken: (text: string) => void,
+ signal?: AbortSignal,
+): Promise<void> {
+ const res = await fetch("/api/ask", {
+ method: "POST",
+ headers: { "Content-Type": "application/json" },
+ body: JSON.stringify({ question }),
+ signal,
+ });
+ if (!res.ok || !res.body) {
+ throw new Error(`Request failed (${res.status}): ${res.statusText}`);
+ }
+
+ const reader = res.body.getReader();
+ const decoder = new TextDecoder();
+ let buffer = "";
+
+ for (;;) {
+ const { done, value } = await reader.read();
+ if (done) break;
+ buffer += decoder.decode(value, { stream: true });
+
+ /*
+ SSE frames are separated by a blank line. Process every complete frame in
+ the buffer and keep the trailing partial frame for the next read.
+ */
+ let sep: number;
+ while ((sep = buffer.indexOf("\n\n")) !== -1) {
+ const frame = buffer.slice(0, sep);
+ buffer = buffer.slice(sep + 2);
+
+ let event = "message";
+ const dataLines: string[] = [];
+ for (const line of frame.split("\n")) {
+ if (line.startsWith("event:")) event = line.slice(6).trim();
+ else if (line.startsWith("data:")) dataLines.push(line.slice(5).trim());
+ }
+ const payload = dataLines.join("\n");
+
+ if (event === "done") return;
+ if (event === "error") {
+ const message = payload ? JSON.parse(payload).message : "stream error";
+ throw new Error(message);
+ }
+ if (event === "token" && payload) {
+ onToken(JSON.parse(payload).text);
+ }
+ }
+ }
+}
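
Review note: the framing loop in `streamAsk` above can be restated as a pure function, which makes the buffering behavior easy to unit-test without a network. The sketch below is in Python to match the rest of the package. The name `parse_sse_frames` and the sample payloads are hypothetical, and it assumes, as the TypeScript does, that frames are separated by `\n\n` (it does not handle `\r\n` line endings).

```python
from __future__ import annotations

import json


def parse_sse_frames(buffer: str) -> tuple[list[tuple[str, str]], str]:
    """Split `buffer` into complete (event, data) frames plus the trailing partial frame."""
    frames: list[tuple[str, str]] = []
    while (sep := buffer.find("\n\n")) != -1:
        frame, buffer = buffer[:sep], buffer[sep + 2:]
        event = "message"  # default event name when no `event:` line is present
        data_lines: list[str] = []
        for line in frame.split("\n"):
            if line.startswith("event:"):
                event = line[6:].strip()
            elif line.startswith("data:"):
                data_lines.append(line[5:].strip())
        frames.append((event, "\n".join(data_lines)))
    # Whatever is left has no terminating blank line yet; keep it for the next read.
    return frames, buffer


# One complete `token` frame followed by a frame that is cut off mid-payload.
chunk = 'event: token\ndata: {"text": "Hel"}\n\nevent: token\ndata: {"text"'
frames, rest = parse_sse_frames(chunk)
tokens = [json.loads(data)["text"] for event, data in frames if event == "token"]
```

Feeding `rest` plus the next network chunk back into the function completes the cut-off frame, which is the same carry-over that `buffer` performs in `streamAsk`.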
diff --git a/examples/dql_explorer/web/src/index.css b/examples/dql_explorer/web/src/index.css
index 6cc4a62..dbb26fd 100644
--- a/examples/dql_explorer/web/src/index.css
+++ b/examples/dql_explorer/web/src/index.css
@@ -1,4 +1,5 @@
@import "tailwindcss";
+@plugin "@tailwindcss/typography";
@layer base {
/* Let the browser theme native controls (scrollbars, inputs) to match. */
diff --git a/examples/quickstart/_build_notebook.py b/examples/quickstart/_build_notebook.py
index 75a3ac4..ee90771 100644
--- a/examples/quickstart/_build_notebook.py
+++ b/examples/quickstart/_build_notebook.py
@@ -37,8 +37,8 @@ def code(src: str) -> None:
5. **Web Search retriever** — natural-language web search backed by Diffbot
6. **Extract tool + loader** — fetch and read individual URLs
7. **Entities tool** — NLP entity / sentiment extraction
-8. **ChatDiffbot** — Diffbot's own LLM RAG endpoint, with native streaming
-9. **Bring-your-own-client** — pre-built SDK clients for full transport control
+8. **ChatDiffbot + DiffbotAskTool** — Diffbot's own LLM RAG endpoint, with native streaming
+9. **Configuring the client** — timeout, transport, custom URLs all live on the client
10. A **multi-tool research agent** that combines KG + web search + extract
You'll need:
@@ -55,13 +55,16 @@ def code(src: str) -> None:
langchain-diffbot \\
langchain langchain-anthropic python-dotenv""")
-md("""## 2. Authenticate
+md("""## 2. Authenticate and build a client
-Put your keys in a `.env` next to this notebook, or paste them inline below. `getpass` keeps them out of the notebook output.""")
+Put your keys in a `.env` next to this notebook, or paste them inline below. `getpass` keeps them out of the notebook output.
+
+Every component in this package takes a pre-built SDK client. Build one `Diffbot` (sync) and one `DiffbotAsync` (async) here and share them across every section below — one connection pool each, configured in one place. The components use these as-is and never close them; you own the lifecycle (we close them at the end).""")
code("""import getpass
import os
+from diffbot import Diffbot, DiffbotAsync
from dotenv import load_dotenv
load_dotenv()
@@ -70,7 +73,12 @@ def code(src: str) -> None:
os.environ["DIFFBOT_API_TOKEN"] = getpass.getpass("DIFFBOT_API_TOKEN: ")
if not os.getenv("ANTHROPIC_API_KEY"):
- os.environ["ANTHROPIC_API_KEY"] = getpass.getpass("ANTHROPIC_API_KEY: ")""")
+ os.environ["ANTHROPIC_API_KEY"] = getpass.getpass("ANTHROPIC_API_KEY: ")
+
+# Shared clients. `db` drives the sync surface (invoke / stream / load); `adb`
+# drives the async surface (ainvoke / astream) used in section 5.
+db = Diffbot(token=os.environ["DIFFBOT_API_TOKEN"])
+adb = DiffbotAsync(token=os.environ["DIFFBOT_API_TOKEN"])""")
md("""## 3. Knowledge Graph retrieval
@@ -87,7 +95,7 @@ def code(src: str) -> None:
code("""from langchain_diffbot import DiffbotKnowledgeGraphRetriever
-retriever = DiffbotKnowledgeGraphRetriever(k=5)
+retriever = DiffbotKnowledgeGraphRetriever(client=db, k=5)
docs = retriever.invoke(
'type:Organization industries:"Artificial Intelligence" location.city.name:"Boston"'
@@ -109,6 +117,7 @@ def code(src: str) -> None:
### 4a. Field projection (recommended for agent / tool-use)""")
code("""retriever = DiffbotKnowledgeGraphRetriever(
+ client=db,
k=3,
fields=["id", "type", "name", "homepageUri", "nbEmployees", "industries"],
)
@@ -122,6 +131,7 @@ def code(src: str) -> None:
md("""### 4b. Pick which field becomes `page_content`""")
code("""retriever = DiffbotKnowledgeGraphRetriever(
+ client=db,
k=3,
fields=["id", "name"],
content_fields=["summary", "description", "name"],
@@ -150,7 +160,7 @@ def mapper(entity: dict) -> Document:
)
-retriever = DiffbotKnowledgeGraphRetriever(k=3, document_mapper=mapper)
+retriever = DiffbotKnowledgeGraphRetriever(client=db, k=3, document_mapper=mapper)
for d in retriever.invoke('type:Organization industries:"Biotechnology" revSortBy:nbEmployees'):
print(d.metadata, "—", d.page_content[:120])""")
@@ -161,7 +171,10 @@ def mapper(entity: dict) -> Document:
code("""import asyncio
-retriever = DiffbotKnowledgeGraphRetriever(k=3, fields=["id", "name", "industries"])
+# Async surface → pass the `DiffbotAsync` client. `ainvoke` runs on its pool.
+retriever = DiffbotKnowledgeGraphRetriever(
+ async_client=adb, k=3, fields=["id", "name", "industries"]
+)
queries = [
'type:Organization location.city.name:"Austin" industries:"Robotics"',
@@ -182,7 +195,7 @@ def mapper(entity: dict) -> Document:
code("""from langchain_diffbot import DiffbotWebSearchRetriever
-web = DiffbotWebSearchRetriever(k=3, fields=["title", "pageUrl", "score"])
+web = DiffbotWebSearchRetriever(client=db, k=3, fields=["title", "pageUrl", "score"])
for d in web.invoke("diffbot knowledge graph llm grounding"):
print(d.metadata)
@@ -199,7 +212,7 @@ def mapper(entity: dict) -> Document:
code("""from langchain_diffbot import DiffbotExtractTool
-extract = DiffbotExtractTool()
+extract = DiffbotExtractTool(client=db)
result = extract.invoke({"url": "https://www.diffbot.com/products/extract/"})
print("title:", result["title"])
@@ -213,6 +226,7 @@ def mapper(entity: dict) -> Document:
code("""from langchain_diffbot import DiffbotExtractLoader
loader = DiffbotExtractLoader(
+ client=db,
urls=[
"https://www.diffbot.com/products/extract/",
"https://www.diffbot.com/products/kg/",
@@ -236,7 +250,7 @@ def mapper(entity: dict) -> Document:
code("""from langchain_diffbot import DiffbotEntitiesTool
-entities = DiffbotEntitiesTool()
+entities = DiffbotEntitiesTool(client=db)
result = entities.invoke({
"text": "Anthropic, founded by Dario Amodei and Daniela Amodei in 2021, released Claude in 2023."
})
@@ -250,9 +264,9 @@ def mapper(entity: dict) -> Document:
`ChatDiffbot` wraps Diffbot's own LLM RAG endpoint as a LangChain `BaseChatModel`. It streams tokens natively, so both `.stream()` and `.astream()` work out of the box — and `.invoke()` aggregates the stream for you.""")
code("""from langchain_diffbot import ChatDiffbot
-from langchain_core.messages import HumanMessage
+from langchain.messages import HumanMessage
-llm = ChatDiffbot()
+llm = ChatDiffbot(client=db)
# Streaming
print("streaming: ", end="", flush=True)
@@ -264,20 +278,28 @@ def mapper(entity: dict) -> Document:
msg = llm.invoke([HumanMessage(content="Who founded Anthropic?")])
print(msg.content)""")
-md("""## 10. Bring-your-own-client
+md(
+ """`ChatDiffbot` uses Diffbot's LLM as your *primary* model. To instead let a tool-calling agent (driven by, say, Claude) *consult* Diffbot, give it `DiffbotAskTool` — it answers a natural-language question from the KG + live web and returns a string."""
+)
+
+code("""from langchain_diffbot import DiffbotAskTool
+
+ask = DiffbotAskTool(client=db)
+print(ask.invoke({"question": "Who founded Diffbot, and when?"}))""")
+
+md("""## 10. Configuring the client
-Every class in the package accepts a pre-built `diffbot.Diffbot` (or `diffbot.DiffbotAsync`) via the `client` / `async_client` fields. When you supply one, the package uses it as-is and **does not close it** — you own the lifecycle. This is the escape hatch for anything the SDK supports that we don't re-expose: custom URLs, `transport=`, shared connection pools, custom headers.""")
+There's no separate configuration surface on the components — everything the SDK supports is set on the client you build: `timeout`, a custom `transport=` (for logging / retries / mock transports in tests), or custom endpoint URLs. Build the client however you need and pass it in; share one across components to reuse its connection pool.""")
code("""from diffbot import Diffbot
-# Share one Diffbot client across many retriever calls instead of opening a
-# fresh httpx pool per call. Useful in long-running services.
-shared = Diffbot(token=os.environ["DIFFBOT_API_TOKEN"], timeout=60.0)
+# e.g. a longer timeout for big exports. Customize the client, not the component.
+custom = Diffbot(token=os.environ["DIFFBOT_API_TOKEN"], timeout=60.0)
-retriever = DiffbotKnowledgeGraphRetriever(client=shared, k=3, fields=["id", "name"])
+retriever = DiffbotKnowledgeGraphRetriever(client=custom, k=3, fields=["id", "name"])
print(retriever.invoke('type:Organization name:"Diffbot"'))
-shared.close()""")
+custom.close() # you own the lifecycle of clients you build""")
md("""## 11. Multi-tool research agent
@@ -288,7 +310,7 @@ def mapper(entity: dict) -> Document:
from diffbot.errors import APIError
from langchain.agents import create_agent
-from langchain_core.tools import tool
+from langchain.tools import tool
from langchain_diffbot import (
DiffbotExtractTool,
@@ -304,17 +326,17 @@ def mapper(entity: dict) -> Document:
@lru_cache(maxsize=1)
def _kg() -> DiffbotKnowledgeGraphRetriever:
- return DiffbotKnowledgeGraphRetriever(k=5, fields=_KG_FIELDS)
+ return DiffbotKnowledgeGraphRetriever(client=db, k=5, fields=_KG_FIELDS)
@lru_cache(maxsize=1)
def _web() -> DiffbotWebSearchRetriever:
- return DiffbotWebSearchRetriever(k=5, fields=["title", "pageUrl", "score"])
+ return DiffbotWebSearchRetriever(client=db, k=5, fields=["title", "pageUrl", "score"])
@lru_cache(maxsize=1)
def _extract() -> DiffbotExtractTool:
- return DiffbotExtractTool()
+ return DiffbotExtractTool(client=db)
@tool
diff --git a/examples/quickstart/quickstart.ipynb b/examples/quickstart/quickstart.ipynb
index 8effa26..e7ce8ba 100644
--- a/examples/quickstart/quickstart.ipynb
+++ b/examples/quickstart/quickstart.ipynb
@@ -16,8 +16,8 @@
"5. **Web Search retriever** — natural-language web search backed by Diffbot\n",
"6. **Extract tool + loader** — fetch and read individual URLs\n",
"7. **Entities tool** — NLP entity / sentiment extraction\n",
- "8. **ChatDiffbot** — Diffbot's own LLM RAG endpoint, with native streaming\n",
- "9. **Bring-your-own-client** — pre-built SDK clients for full transport control\n",
+ "8. **ChatDiffbot + DiffbotAskTool** — Diffbot's own LLM RAG endpoint, with native streaming\n",
+ "9. **Configuring the client** — timeout, transport, custom URLs all live on the client\n",
"10. A **multi-tool research agent** that combines KG + web search + extract\n",
"\n",
"You'll need:\n",
@@ -53,9 +53,11 @@
"id": "cell-03",
"metadata": {},
"source": [
- "## 2. Authenticate\n",
+ "## 2. Authenticate and build a client\n",
"\n",
- "Put your keys in a `.env` next to this notebook, or paste them inline below. `getpass` keeps them out of the notebook output."
+ "Put your keys in a `.env` next to this notebook, or paste them inline below. `getpass` keeps them out of the notebook output.\n",
+ "\n",
+ "Every component in this package takes a pre-built SDK client. Build one `Diffbot` (sync) and one `DiffbotAsync` (async) here and share them across every section below — one connection pool each, configured in one place. The components use these as-is and never close them; you own the lifecycle (we close them at the end)."
]
},
{
@@ -68,6 +70,7 @@
"import getpass\n",
"import os\n",
"\n",
+ "from diffbot import Diffbot, DiffbotAsync\n",
"from dotenv import load_dotenv\n",
"\n",
"load_dotenv()\n",
@@ -76,7 +79,12 @@
" os.environ[\"DIFFBOT_API_TOKEN\"] = getpass.getpass(\"DIFFBOT_API_TOKEN: \")\n",
"\n",
"if not os.getenv(\"ANTHROPIC_API_KEY\"):\n",
- " os.environ[\"ANTHROPIC_API_KEY\"] = getpass.getpass(\"ANTHROPIC_API_KEY: \")"
+ " os.environ[\"ANTHROPIC_API_KEY\"] = getpass.getpass(\"ANTHROPIC_API_KEY: \")\n",
+ "\n",
+ "# Shared clients. `db` drives the sync surface (invoke / stream / load); `adb`\n",
+ "# drives the async surface (ainvoke / astream) used in section 5.\n",
+ "db = Diffbot(token=os.environ[\"DIFFBOT_API_TOKEN\"])\n",
+ "adb = DiffbotAsync(token=os.environ[\"DIFFBOT_API_TOKEN\"])"
]
},
{
@@ -107,7 +115,7 @@
"source": [
"from langchain_diffbot import DiffbotKnowledgeGraphRetriever\n",
"\n",
- "retriever = DiffbotKnowledgeGraphRetriever(k=5)\n",
+ "retriever = DiffbotKnowledgeGraphRetriever(client=db, k=5)\n",
"\n",
"docs = retriever.invoke(\n",
" 'type:Organization industries:\"Artificial Intelligence\" location.city.name:\"Boston\"'\n",
@@ -143,6 +151,7 @@
"outputs": [],
"source": [
"retriever = DiffbotKnowledgeGraphRetriever(\n",
+ " client=db,\n",
" k=3,\n",
" fields=[\"id\", \"type\", \"name\", \"homepageUri\", \"nbEmployees\", \"industries\"],\n",
")\n",
@@ -172,6 +181,7 @@
"outputs": [],
"source": [
"retriever = DiffbotKnowledgeGraphRetriever(\n",
+ " client=db,\n",
" k=3,\n",
" fields=[\"id\", \"name\"],\n",
" content_fields=[\"summary\", \"description\", \"name\"],\n",
@@ -214,7 +224,7 @@
" )\n",
"\n",
"\n",
- "retriever = DiffbotKnowledgeGraphRetriever(k=3, document_mapper=mapper)\n",
+ "retriever = DiffbotKnowledgeGraphRetriever(client=db, k=3, document_mapper=mapper)\n",
"\n",
"for d in retriever.invoke(\n",
" 'type:Organization industries:\"Biotechnology\" revSortBy:nbEmployees'\n",
@@ -241,7 +251,10 @@
"source": [
"import asyncio\n",
"\n",
- "retriever = DiffbotKnowledgeGraphRetriever(k=3, fields=[\"id\", \"name\", \"industries\"])\n",
+ "# Async surface → pass the `DiffbotAsync` client. `ainvoke` runs on its pool.\n",
+ "retriever = DiffbotKnowledgeGraphRetriever(\n",
+ " async_client=adb, k=3, fields=[\"id\", \"name\", \"industries\"]\n",
+ ")\n",
"\n",
"queries = [\n",
" 'type:Organization location.city.name:\"Austin\" industries:\"Robotics\"',\n",
@@ -276,7 +289,7 @@
"source": [
"from langchain_diffbot import DiffbotWebSearchRetriever\n",
"\n",
- "web = DiffbotWebSearchRetriever(k=3, fields=[\"title\", \"pageUrl\", \"score\"])\n",
+ "web = DiffbotWebSearchRetriever(client=db, k=3, fields=[\"title\", \"pageUrl\", \"score\"])\n",
"\n",
"for d in web.invoke(\"diffbot knowledge graph llm grounding\"):\n",
" print(d.metadata)\n",
@@ -307,7 +320,7 @@
"source": [
"from langchain_diffbot import DiffbotExtractTool\n",
"\n",
- "extract = DiffbotExtractTool()\n",
+ "extract = DiffbotExtractTool(client=db)\n",
"result = extract.invoke({\"url\": \"https://www.diffbot.com/products/extract/\"})\n",
"\n",
"print(\"title:\", result[\"title\"])\n",
@@ -335,6 +348,7 @@
"from langchain_diffbot import DiffbotExtractLoader\n",
"\n",
"loader = DiffbotExtractLoader(\n",
+ " client=db,\n",
" urls=[\n",
" \"https://www.diffbot.com/products/extract/\",\n",
" \"https://www.diffbot.com/products/kg/\",\n",
@@ -386,7 +400,7 @@
"source": [
"from langchain_diffbot import DiffbotEntitiesTool\n",
"\n",
- "entities = DiffbotEntitiesTool()\n",
+ "entities = DiffbotEntitiesTool(client=db)\n",
"result = entities.invoke(\n",
" {\n",
" \"text\": \"Anthropic, founded by Dario Amodei and Daniela Amodei in 2021, released Claude in 2023.\"\n",
@@ -414,20 +428,7 @@
"id": "cell-26",
"metadata": {},
"outputs": [],
- "source": [
- "from langchain_diffbot import ChatDiffbot\n",
- "from langchain_core.messages import HumanMessage\n",
- "\n",
- "llm = ChatDiffbot()\n",
- "\n",
- "# Streaming\n",
- "print(\"streaming: \", end=\"\", flush=True)\n",
- "for chunk in llm.stream(\n",
- " [HumanMessage(content=\"In one sentence, what is the Diffbot Knowledge Graph?\")]\n",
- "):\n",
- " print(chunk.content, end=\"\", flush=True)\n",
- "print()"
- ]
+ "source": "from langchain_diffbot import ChatDiffbot\nfrom langchain.messages import HumanMessage\n\nllm = ChatDiffbot(client=db)\n\n# Streaming\nprint(\"streaming: \", end=\"\", flush=True)\nfor chunk in llm.stream(\n [HumanMessage(content=\"In one sentence, what is the Diffbot Knowledge Graph?\")]\n):\n print(chunk.content, end=\"\", flush=True)\nprint()"
},
{
"cell_type": "code",
@@ -446,9 +447,7 @@
"id": "cell-28",
"metadata": {},
"source": [
- "## 10. Bring-your-own-client\n",
- "\n",
- "Every class in the package accepts a pre-built `diffbot.Diffbot` (or `diffbot.DiffbotAsync`) via the `client` / `async_client` fields. When you supply one, the package uses it as-is and **does not close it** — you own the lifecycle. This is the escape hatch for anything the SDK supports that we don't re-expose: custom URLs, `transport=`, shared connection pools, custom headers."
+ "`ChatDiffbot` uses Diffbot's LLM as your *primary* model. To instead let a tool-calling agent (driven by, say, Claude) *consult* Diffbot, give it `DiffbotAskTool` — it answers a natural-language question from the KG + live web and returns a string."
]
},
{
@@ -458,16 +457,10 @@
"metadata": {},
"outputs": [],
"source": [
- "from diffbot import Diffbot\n",
- "\n",
- "# Share one Diffbot client across many retriever calls instead of opening a\n",
- "# fresh httpx pool per call. Useful in long-running services.\n",
- "shared = Diffbot(token=os.environ[\"DIFFBOT_API_TOKEN\"], timeout=60.0)\n",
+ "from langchain_diffbot import DiffbotAskTool\n",
"\n",
- "retriever = DiffbotKnowledgeGraphRetriever(client=shared, k=3, fields=[\"id\", \"name\"])\n",
- "print(retriever.invoke('type:Organization name:\"Diffbot\"'))\n",
- "\n",
- "shared.close()"
+ "ask = DiffbotAskTool(client=db)\n",
+ "print(ask.invoke({\"question\": \"Who founded Diffbot, and when?\"}))"
]
},
{
@@ -475,9 +468,9 @@
"id": "cell-30",
"metadata": {},
"source": [
- "## 11. Multi-tool research agent\n",
+ "## 10. Configuring the client\n",
"\n",
- "A more realistic agent setup: hand it three tools — KG search, web search, and URL extract — and let it pick its own approach. The agent below mirrors the `examples/company_research/` CLI in this repo, which uses the same shaping pattern in all three tools to keep responses agent-sized."
+ "There's no separate configuration surface on the components — everything the SDK supports is set on the client you build: `timeout`, a custom `transport=` (for logging / retries / mock transports in tests), or custom endpoint URLs. Build the client however you need and pass it in; share one across components to reuse its connection pool."
]
},
{
@@ -487,101 +480,15 @@
"metadata": {},
"outputs": [],
"source": [
- "from functools import lru_cache\n",
- "from typing import Any\n",
- "\n",
- "from diffbot.errors import APIError\n",
- "from langchain.agents import create_agent\n",
- "from langchain_core.tools import tool\n",
- "\n",
- "from langchain_diffbot import (\n",
- " DiffbotExtractTool,\n",
- " DiffbotKnowledgeGraphRetriever,\n",
- " DiffbotWebSearchRetriever,\n",
- ")\n",
- "\n",
- "_KG_FIELDS = [\n",
- " \"id\",\n",
- " \"type\",\n",
- " \"name\",\n",
- " \"homepageUri\",\n",
- " \"nbEmployees\",\n",
- " \"industries\",\n",
- " \"location\",\n",
- " \"employments\",\n",
- " \"date\",\n",
- "]\n",
- "\n",
- "\n",
- "@lru_cache(maxsize=1)\n",
- "def _kg() -> DiffbotKnowledgeGraphRetriever:\n",
- " return DiffbotKnowledgeGraphRetriever(k=5, fields=_KG_FIELDS)\n",
- "\n",
- "\n",
- "@lru_cache(maxsize=1)\n",
- "def _web() -> DiffbotWebSearchRetriever:\n",
- " return DiffbotWebSearchRetriever(k=5, fields=[\"title\", \"pageUrl\", \"score\"])\n",
- "\n",
- "\n",
- "@lru_cache(maxsize=1)\n",
- "def _extract() -> DiffbotExtractTool:\n",
- " return DiffbotExtractTool()\n",
- "\n",
- "\n",
- "@tool\n",
- "def search_kg(dql_query: str) -> list[dict]:\n",
- " \"\"\"Search the Diffbot Knowledge Graph with a DQL query.\n",
- "\n",
- " DQL: `type:Organization`, `name:\"Diffbot\"`, `location.city.name:\"Austin\"`,\n",
- " `revSortBy:nbEmployees` (descending; use `sortBy:` for ascending). AND with\n",
- " spaces. Combine for filtered lookup.\n",
- " \"\"\"\n",
- " try:\n",
- " docs = _kg().invoke(dql_query)\n",
- " except APIError as exc:\n",
- " return [\n",
- " {\n",
- " \"error\": f\"Diffbot rejected the query ({exc.status_code}): {exc.message}. Refine and retry.\"\n",
- " }\n",
- " ]\n",
- " return [{\"summary\": d.page_content, **d.metadata} for d in docs]\n",
- "\n",
- "\n",
- "@tool\n",
- "def web_search(query: str) -> list[dict]:\n",
- " \"\"\"Search the web via Diffbot. Use when the KG comes up short.\"\"\"\n",
- " docs = _web().invoke(query)\n",
- " return [{**d.metadata, \"content\": d.page_content[:800]} for d in docs]\n",
- "\n",
- "\n",
- "@tool\n",
- "def extract_url(url: str) -> dict[str, Any]:\n",
- " \"\"\"Fetch and read a single web page (markdown + title + type).\"\"\"\n",
- " raw = _extract().invoke({\"url\": url})\n",
- " if \"error\" in raw:\n",
- " return raw\n",
- " return {**raw, \"content\": (raw.get(\"content\") or \"\")[:4000]}\n",
- "\n",
- "\n",
- "SYSTEM_PROMPT = \"\"\"\\\n",
- "You are a research assistant with three Diffbot-backed tools:\n",
+ "from diffbot import Diffbot\n",
"\n",
- "- `search_kg(dql_query)` — Knowledge Graph search via DQL. Prefer for known\n",
- " entities and filtered queries.\n",
- "- `web_search(query)` — natural-language web search. Use when the KG is\n",
- " empty or you need current info.\n",
- "- `extract_url(url)` — read a single web page in full.\n",
+ "# e.g. a longer timeout for big exports. Customize the client, not the component.\n",
+ "custom = Diffbot(token=os.environ[\"DIFFBOT_API_TOKEN\"], timeout=60.0)\n",
"\n",
- "Iterate: if KG is empty, web-search; if a web result looks promising, extract it.\n",
- "Cite the entity IDs or URLs you used in your answer.\"\"\"\n",
+ "retriever = DiffbotKnowledgeGraphRetriever(client=custom, k=3, fields=[\"id\", \"name\"])\n",
+ "print(retriever.invoke('type:Organization name:\"Diffbot\"'))\n",
"\n",
- "# Default to Haiku — a multi-step trace on a fresh Anthropic account can\n",
- "# blow past Sonnet's 30k input-tokens-per-minute Tier 1 limit.\n",
- "agent = create_agent(\n",
- " model=\"anthropic:claude-haiku-4-5\",\n",
- " tools=[search_kg, web_search, extract_url],\n",
- " system_prompt=SYSTEM_PROMPT,\n",
- ")"
+ "custom.close() # you own the lifecycle of clients you build"
]
},
{
@@ -589,7 +496,9 @@
"id": "cell-32",
"metadata": {},
"source": [
- "Ask it a research question. The agent will pick its own tools, may iterate, and cites its sources."
+ "## 11. Multi-tool research agent\n",
+ "\n",
+ "A more realistic agent setup: hand it three tools — KG search, web search, and URL extract — and let it pick its own approach. The agent below mirrors the `examples/company_research/` CLI in this repo, which uses the same shaping pattern in all three tools to keep responses agent-sized."
]
},
{
@@ -598,6 +507,22 @@
"id": "cell-33",
"metadata": {},
"outputs": [],
+ "source": "from functools import lru_cache\nfrom typing import Any\n\nfrom diffbot.errors import APIError\nfrom langchain.agents import create_agent\nfrom langchain.tools import tool\n\nfrom langchain_diffbot import (\n DiffbotExtractTool,\n DiffbotKnowledgeGraphRetriever,\n DiffbotWebSearchRetriever,\n)\n\n_KG_FIELDS = [\n \"id\",\n \"type\",\n \"name\",\n \"homepageUri\",\n \"nbEmployees\",\n \"industries\",\n \"location\",\n \"employments\",\n \"date\",\n]\n\n\n@lru_cache(maxsize=1)\ndef _kg() -> DiffbotKnowledgeGraphRetriever:\n return DiffbotKnowledgeGraphRetriever(client=db, k=5, fields=_KG_FIELDS)\n\n\n@lru_cache(maxsize=1)\ndef _web() -> DiffbotWebSearchRetriever:\n return DiffbotWebSearchRetriever(\n client=db, k=5, fields=[\"title\", \"pageUrl\", \"score\"]\n )\n\n\n@lru_cache(maxsize=1)\ndef _extract() -> DiffbotExtractTool:\n return DiffbotExtractTool(client=db)\n\n\n@tool\ndef search_kg(dql_query: str) -> list[dict]:\n \"\"\"Search the Diffbot Knowledge Graph with a DQL query.\n\n DQL: `type:Organization`, `name:\"Diffbot\"`, `location.city.name:\"Austin\"`,\n `revSortBy:nbEmployees` (descending; use `sortBy:` for ascending). AND with\n spaces. Combine for filtered lookup.\n \"\"\"\n try:\n docs = _kg().invoke(dql_query)\n except APIError as exc:\n return [\n {\n \"error\": f\"Diffbot rejected the query ({exc.status_code}): {exc.message}. Refine and retry.\"\n }\n ]\n return [{\"summary\": d.page_content, **d.metadata} for d in docs]\n\n\n@tool\ndef web_search(query: str) -> list[dict]:\n \"\"\"Search the web via Diffbot. Use when the KG comes up short.\"\"\"\n docs = _web().invoke(query)\n return [{**d.metadata, \"content\": d.page_content[:800]} for d in docs]\n\n\n@tool\ndef extract_url(url: str) -> dict[str, Any]:\n \"\"\"Fetch and read a single web page (markdown + title + type).\"\"\"\n raw = _extract().invoke({\"url\": url})\n if \"error\" in raw:\n return raw\n return {**raw, \"content\": (raw.get(\"content\") or \"\")[:4000]}\n\n\nSYSTEM_PROMPT = \"\"\"\\\nYou are a research assistant with three Diffbot-backed tools:\n\n- `search_kg(dql_query)` — Knowledge Graph search via DQL. Prefer for known\n entities and filtered queries.\n- `web_search(query)` — natural-language web search. Use when the KG is\n empty or you need current info.\n- `extract_url(url)` — read a single web page in full.\n\nIterate: if KG is empty, web-search; if a web result looks promising, extract it.\nCite the entity IDs or URLs you used in your answer.\"\"\"\n\n# Default to Haiku — a multi-step trace on a fresh Anthropic account can\n# blow past Sonnet's 30k input-tokens-per-minute Tier 1 limit.\nagent = create_agent(\n model=\"anthropic:claude-haiku-4-5\",\n tools=[search_kg, web_search, extract_url],\n system_prompt=SYSTEM_PROMPT,\n)"
+ },
+ {
+ "cell_type": "markdown",
+ "id": "cell-34",
+ "metadata": {},
+ "source": [
+ "Ask it a research question. The agent will pick its own tools, may iterate, and cites its sources."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "cell-35",
+ "metadata": {},
+ "outputs": [],
"source": [
"result = agent.invoke(\n",
" {\n",
@@ -611,7 +536,7 @@
},
{
"cell_type": "markdown",
- "id": "cell-34",
+ "id": "cell-36",
"metadata": {},
"source": [
"Inspect the trace to see which tools the agent reached for:"
@@ -620,7 +545,7 @@
{
"cell_type": "code",
"execution_count": null,
- "id": "cell-35",
+ "id": "cell-37",
"metadata": {},
"outputs": [],
"source": [
@@ -630,7 +555,7 @@
},
{
"cell_type": "markdown",
- "id": "cell-36",
+ "id": "cell-38",
"metadata": {},
"source": [
"## Where to go next\n",
@@ -655,4 +580,4 @@
},
"nbformat": 4,
"nbformat_minor": 5
-}
+}
\ No newline at end of file
diff --git a/langchain_diffbot/__init__.py b/langchain_diffbot/__init__.py
index 96f5e39..3e2b37a 100644
--- a/langchain_diffbot/__init__.py
+++ b/langchain_diffbot/__init__.py
@@ -1,9 +1,10 @@
"""LangChain integration for Diffbot.
-Thin layer over the official `diffbot-python` SDK. Every public class accepts
-either a `diffbot_api_token` (or `DIFFBOT_API_TOKEN` env var) or a pre-built
-`diffbot.Diffbot` / `diffbot.DiffbotAsync` client via the `client` /
-`async_client` fields — anything the SDK can do, you can do via these classes.
+Thin layer over the official `diffbot-python` SDK. Every public class takes a
+pre-built `diffbot.Diffbot` (sync) and/or `diffbot.DiffbotAsync` (async) client
+via the `client` / `async_client` fields — build the client yourself (token,
+`timeout`, `transport=`, custom URLs), pass it in, and share one across many
+components. Anything the SDK can do, you configure on the client.
"""
from langchain_diffbot.chat_models import ChatDiffbot
@@ -16,6 +17,7 @@
DiffbotWebSearchRetriever,
)
from langchain_diffbot.tools import (
+ DiffbotAskTool,
DiffbotDQLProbeTool,
DiffbotEntitiesTool,
DiffbotExtractTool,
@@ -26,6 +28,7 @@
__all__ = [
"ChatDiffbot",
+ "DiffbotAskTool",
"DiffbotCrawlLoader",
"DiffbotDQLProbeTool",
"DiffbotEntitiesTool",
diff --git a/langchain_diffbot/_base.py b/langchain_diffbot/_base.py
index 932386a..2f8dcdb 100644
--- a/langchain_diffbot/_base.py
+++ b/langchain_diffbot/_base.py
@@ -1,29 +1,30 @@
"""Shared base for langchain-diffbot components.
Every public class in this package inherits from `_BaseDiffbotComponent`. The
-mixin holds the token / timeout / optional pre-built SDK clients and exposes
-two context managers (`_sync_db`, `_async_db`) that the components use to
-acquire a `diffbot.Diffbot` / `diffbot.DiffbotAsync` for a single call.
-
-Bring-your-own-client: if the user supplies `client=...` or
-`async_client=...`, we use it as-is and **do not close it** — the user owns
-the lifecycle. Otherwise we construct a fresh SDK client per call and close
-it on exit (same per-call lifecycle as the previous hand-rolled httpx
-wrapper).
+mixin holds the pre-built SDK clients and exposes two context managers
+(`_sync_db`, `_async_db`) that the components use to acquire a
+`diffbot.Diffbot` / `diffbot.DiffbotAsync` for a single call.
+
+Client-only by design: there is exactly one way to give a component HTTP
+access — hand it a client you built. Customize the client however the SDK
+allows (token, `timeout`, `transport=`, custom URLs); share a connection pool
+by passing the same client to several components; pick your execution mode by
+passing the matching class (`Diffbot` for sync, `DiffbotAsync` for async). The
+component never closes a client — you own its lifecycle (use `with`/`async
+with`, or call `.close()` / `.aclose()`).
"""
from __future__ import annotations
-import os
from collections.abc import AsyncIterator, Iterator
from contextlib import asynccontextmanager, contextmanager
from diffbot import Diffbot, DiffbotAsync
-from pydantic import BaseModel, ConfigDict, Field, SecretStr, model_validator
+from pydantic import BaseModel, ConfigDict, Field
class _BaseDiffbotComponent(BaseModel):
- """Mixin holding token, timeout, and optional pre-built SDK clients.
+ """Mixin holding the pre-built SDK clients.
Concrete classes inherit from this *and* a LangChain base
(`BaseRetriever`, `BaseTool`, `BaseDocumentLoader`, `BaseChatModel`).
@@ -32,79 +33,39 @@ class _BaseDiffbotComponent(BaseModel):
model_config = ConfigDict(arbitrary_types_allowed=True)
- diffbot_api_token: SecretStr | None = Field(default=None)
- """Diffbot API token. Falls back to `DIFFBOT_API_TOKEN`.
-
- Not required when both `client` and `async_client` are supplied.
- """
-
- timeout: float = 30.0
- """HTTP timeout (seconds) for SDK clients we construct ourselves.
-
- Ignored when `client` / `async_client` are supplied.
- """
-
client: Diffbot | None = Field(default=None, exclude=True, repr=False)
- """Optional pre-built sync SDK client.
+ """Pre-built sync SDK client. Required for the sync surface.
- If set, we use it as-is and do not close it.
+ Build it yourself — `Diffbot(token=..., timeout=..., transport=...)` — and
+ pass it here. Used as-is and never closed; you own its lifecycle.
"""
async_client: DiffbotAsync | None = Field(default=None, exclude=True, repr=False)
- """Optional pre-built async SDK client.
+ """Pre-built async SDK client. Required for the async surface.
- If set, we use it as-is and do not close it.
+ Build it yourself — `DiffbotAsync(token=...)` — and pass it here. Used
+ as-is and never closed; you own its lifecycle.
"""
- @model_validator(mode="after")
- def _resolve_token(self) -> _BaseDiffbotComponent:
- # If the user gave us a client (for either side), we can't be sure
- # they'll use the other side — but token resolution shouldn't block
- # construction in that case. Defer the missing-token error to call time.
- if self.client is not None or self.async_client is not None:
- return self
- if (
- self.diffbot_api_token is None
- or not self.diffbot_api_token.get_secret_value()
- ):
- env_token = os.environ.get("DIFFBOT_API_TOKEN", "")
- if not env_token:
- msg = (
- "A Diffbot API token is required. Pass `diffbot_api_token=...`, "
- "set the `DIFFBOT_API_TOKEN` environment variable, or supply a "
- "pre-built `client` / `async_client`."
- )
- raise ValueError(msg)
- self.diffbot_api_token = SecretStr(env_token)
- return self
-
- def _token(self) -> str:
- if (
- self.diffbot_api_token is None
- or not self.diffbot_api_token.get_secret_value()
- ):
+ @contextmanager
+ def _sync_db(self) -> Iterator[Diffbot]:
+ """Yield the sync client. Raises if none was supplied. Never closes it."""
+ if self.client is None:
msg = (
- "A Diffbot API token is required for this call. Pass "
- "`diffbot_api_token=...`, set `DIFFBOT_API_TOKEN`, or supply a "
- "pre-built client."
+ "This component has no sync client. Pass "
+ "`client=Diffbot(token=...)` (build it from the `diffbot` SDK)."
)
raise ValueError(msg)
- return self.diffbot_api_token.get_secret_value()
-
- @contextmanager
- def _sync_db(self) -> Iterator[Diffbot]:
- """Yield a `Diffbot` for one call. Closes only clients we constructed."""
- if self.client is not None:
- yield self.client
- return
- with Diffbot(token=self._token(), timeout=self.timeout) as db:
- yield db
+ yield self.client
@asynccontextmanager
async def _async_db(self) -> AsyncIterator[DiffbotAsync]:
- """Yield a `DiffbotAsync` for one call. Closes only clients we constructed."""
- if self.async_client is not None:
- yield self.async_client
- return
- async with DiffbotAsync(token=self._token(), timeout=self.timeout) as db:
- yield db
+ """Yield the async client. Raises if none was supplied. Never closes it."""
+ if self.async_client is None:
+ msg = (
+ "This component has no async client. Pass "
+ "`async_client=DiffbotAsync(token=...)` (build it from the "
+ "`diffbot` SDK)."
+ )
+ raise ValueError(msg)
+ yield self.async_client
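
The borrowed-client pattern in this hunk can be sketched without the SDK. This is a minimal illustration, not the package's code: `FakeClient` and `Component` are hypothetical stand-ins for `diffbot.Diffbot` and `_BaseDiffbotComponent`, and the error text is abbreviated. It shows two behaviours the new docstrings promise: the missing-client error surfaces at call time, and the component never closes a client it was handed.

```python
from __future__ import annotations

from contextlib import contextmanager
from typing import Iterator


class FakeClient:
    """Hypothetical stand-in for a `diffbot.Diffbot` client."""

    def __init__(self) -> None:
        self.closed = False

    def close(self) -> None:
        self.closed = True


class Component:
    """Illustrative component that borrows a client and never owns it."""

    def __init__(self, client: FakeClient | None = None) -> None:
        self.client = client

    @contextmanager
    def _sync_db(self) -> Iterator[FakeClient]:
        # Raise at call time, not construction time, when no client was given.
        if self.client is None:
            raise ValueError("This component has no sync client.")
        # Yield the caller's client as-is; leaving the block does not close it.
        yield self.client


# One client shared by two components; the caller owns its lifecycle.
shared = FakeClient()
first, second = Component(client=shared), Component(client=shared)
with first._sync_db() as db:
    assert db is shared
with second._sync_db() as db:
    assert db is shared
```

Because the context manager only yields, the caller remains responsible for calling `close()` (or using the client as a context manager) once every component is done with it.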
diff --git a/langchain_diffbot/chat_models.py b/langchain_diffbot/chat_models.py
index 7b84ca1..c171746 100644
--- a/langchain_diffbot/chat_models.py
+++ b/langchain_diffbot/chat_models.py
@@ -48,9 +48,10 @@ class ChatDiffbot(_BaseDiffbotComponent, BaseChatModel):
Example:
```python
+ from diffbot import Diffbot
from langchain_diffbot import ChatDiffbot
- llm = ChatDiffbot()
+ llm = ChatDiffbot(client=Diffbot(token=...))
llm.invoke("What's the capital of France?")
```
"""
diff --git a/langchain_diffbot/document_loaders.py b/langchain_diffbot/document_loaders.py
index 4c06165..54a5b36 100644
--- a/langchain_diffbot/document_loaders.py
+++ b/langchain_diffbot/document_loaders.py
@@ -55,9 +55,11 @@ class DiffbotExtractLoader(_BaseDiffbotComponent, BaseLoader):
Example:
```python
+ from diffbot import Diffbot
from langchain_diffbot import DiffbotExtractLoader
docs = DiffbotExtractLoader(
+ client=Diffbot(token=...),
urls=["https://example.com", "https://diffbot.com"],
).load()
```
diff --git a/langchain_diffbot/retrievers.py b/langchain_diffbot/retrievers.py
index 81332e8..8a80a36 100644
--- a/langchain_diffbot/retrievers.py
+++ b/langchain_diffbot/retrievers.py
@@ -69,11 +69,16 @@ class DiffbotKnowledgeGraphRetriever(_BaseDiffbotComponent, BaseRetriever):
[DQL](https://docs.diffbot.com/reference/dql-quickstart) expression
(e.g. `type:Organization industries:"Artificial Intelligence"`).
+ Build a `diffbot.Diffbot` (sync) and/or `diffbot.DiffbotAsync` (async) and
+ pass it in; the same client can be shared across many components.
+
Example:
```python
+ from diffbot import Diffbot
from langchain_diffbot import DiffbotKnowledgeGraphRetriever
- retriever = DiffbotKnowledgeGraphRetriever(k=5)
+ db = Diffbot(token=..., timeout=60.0)
+ retriever = DiffbotKnowledgeGraphRetriever(client=db, k=5)
retriever.invoke('type:Organization location.city.name:"Boston"')
```
@@ -82,6 +87,7 @@ class DiffbotKnowledgeGraphRetriever(_BaseDiffbotComponent, BaseRetriever):
```python
retriever = DiffbotKnowledgeGraphRetriever(
+ client=db,
k=5,
fields=["id", "type", "name", "homepageUri", "nbEmployees"],
)
@@ -96,16 +102,7 @@ def mapper(entity):
metadata={"id": entity["id"], "name": entity["name"]},
)
- retriever = DiffbotKnowledgeGraphRetriever(document_mapper=mapper)
- ```
-
- For full SDK control, supply a pre-built client:
-
- ```python
- from diffbot import Diffbot
- retriever = DiffbotKnowledgeGraphRetriever(
- client=Diffbot(token=..., timeout=60.0),
- )
+ retriever = DiffbotKnowledgeGraphRetriever(client=db, document_mapper=mapper)
```
"""
@@ -232,9 +229,10 @@ class DiffbotWebSearchRetriever(_BaseDiffbotComponent, BaseRetriever):
Example:
```python
+ from diffbot import Diffbot
from langchain_diffbot import DiffbotWebSearchRetriever
- retriever = DiffbotWebSearchRetriever(k=5)
+ retriever = DiffbotWebSearchRetriever(client=Diffbot(token=...), k=5)
retriever.invoke("diffbot knowledge graph")
```
"""
diff --git a/langchain_diffbot/tools.py b/langchain_diffbot/tools.py
index 4b31ac2..c53017d 100644
--- a/langchain_diffbot/tools.py
+++ b/langchain_diffbot/tools.py
@@ -376,6 +376,55 @@ async def _arun(
return _ontology_lookup(self._ontology, op, name, search)
+class _DiffbotAskInput(BaseModel):
+ question: str = Field(
+ description="Natural-language question for Diffbot's RAG LLM."
+ )
+
+
+class DiffbotAskTool(_BaseDiffbotComponent, BaseTool):
+ """Tool that asks Diffbot's LLM RAG (`ask`) endpoint a natural-language question.
+
+ Where `DiffbotKnowledgeGraphTool` runs a precise DQL query, this delegates a
+ fuzzy question to Diffbot's own LLM — grounded in the Knowledge Graph and the
+ live web — and returns a synthesized answer. Drop it into any tool-calling
+ agent so the agent can *consult* Diffbot for things it can't express in DQL.
+
+ The SDK streams the answer; this tool aggregates the stream into a single
+ string (use `ChatDiffbot` when you want the chat-model surface with streaming).
+ """
+
+ name: str = "diffbot_ask"
+ description: str = (
+ "Ask Diffbot's LLM a natural-language question. It answers using "
+ "Diffbot's Knowledge Graph and a live web search, returning a synthesized "
+ "answer with sources. Use for open-ended questions you can't express as a "
+ "precise Knowledge Graph (DQL) query."
+ )
+ args_schema: type[BaseModel] = _DiffbotAskInput
+
+ def _run(
+ self,
+ question: str,
+ run_manager: CallbackManagerForToolRun | None = None,
+ ) -> str:
+ messages = [{"role": "user", "content": question}]
+ with self._sync_db() as db:
+ return "".join(db.ask(messages))
+
+ async def _arun(
+ self,
+ question: str,
+ run_manager: AsyncCallbackManagerForToolRun | None = None,
+ ) -> str:
+ messages = [{"role": "user", "content": question}]
+ parts: list[str] = []
+ async with self._async_db() as db:
+ async for chunk in db.ask(messages):
+ parts.append(chunk)
+ return "".join(parts)
+
+
class _DiffbotDQLProbeInput(BaseModel):
queries: list[str] = Field(
description=(
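
The stream aggregation in `DiffbotAskTool._run` and `_arun` can be illustrated as follows. This is a sketch under assumptions: `fake_ask` and `fake_ask_async` are hypothetical stand-ins for the SDK's sync and async `ask` streams, and are assumed to yield plain `str` chunks as the tool's `"".join(...)` implies.

```python
import asyncio
from typing import AsyncIterator, Iterator


def fake_ask() -> Iterator[str]:
    """Hypothetical stand-in for the sync `db.ask(messages)` stream."""
    yield "Hello"
    yield ", world"


async def fake_ask_async() -> AsyncIterator[str]:
    """Hypothetical stand-in for the async `db.ask(messages)` stream."""
    yield "Hello"
    yield ", world"


def collect(chunks: Iterator[str]) -> str:
    # A sync iterator can be joined directly.
    return "".join(chunks)


async def collect_async(chunks: AsyncIterator[str]) -> str:
    # `str.join` cannot consume an async iterator, so gather the parts first.
    parts = []
    async for chunk in chunks:
        parts.append(chunk)
    return "".join(parts)


sync_answer = collect(fake_ask())
async_answer = asyncio.run(collect_async(fake_ask_async()))
```

Both paths return the whole answer at once, which is why the docstring points to `ChatDiffbot` when incremental streaming is wanted.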
diff --git a/pyproject.toml b/pyproject.toml
index da31321..ccda8f8 100644
--- a/pyproject.toml
+++ b/pyproject.toml
@@ -49,6 +49,7 @@ test = [
"pytest-asyncio>=0.23,<2.0",
"pytest-mock>=3.10,<4.0",
"respx>=0.21,<1.0",
+ "langchain>=1.3,<2.0",
"langchain-tests>=1.0,<2.0",
]
lint = ["ruff>=0.6,<1.0"]
diff --git a/tests/integration_tests/test_readme_examples.py b/tests/integration_tests/test_readme_examples.py
new file mode 100644
index 0000000..969a21c
--- /dev/null
+++ b/tests/integration_tests/test_readme_examples.py
@@ -0,0 +1,68 @@
+"""Execute the README's python code blocks against the live Diffbot API.
+
+The mocked unit suite (`unit_tests/test_readme_examples.py`) proves the examples
+*parse and run*; this proves they still work against the real service — catching
+drift in query semantics or response shapes that a canned mock can't.
+
+Credential gating, so CI never spends non-Diffbot quota:
+ - The whole module is skipped without `DIFFBOT_API_TOKEN`.
+ - Diffbot-only blocks run live (CI has the token).
+ - The LCEL example imports `langchain_anthropic`; it runs only when
+ `ANTHROPIC_API_KEY` is set (i.e. locally). CI does not provide that key — and
+ doesn't install `langchain_anthropic` — so the block is skipped there and no
+ Anthropic tokens are consumed.
+"""
+
+from __future__ import annotations
+
+import contextlib
+import importlib.util
+import io
+import os
+
+import pytest
+
+from tests.readme import (
+ DIFFBOT_ONLY_IMPORTS,
+ extract_blocks,
+ import_roots,
+ is_executable,
+)
+
+pytestmark = pytest.mark.skipif(
+ not os.environ.get("DIFFBOT_API_TOKEN"),
+ reason="DIFFBOT_API_TOKEN not set",
+)
+
+_BLOCKS = extract_blocks()
+
+
+def _skip_reason(source: str) -> str | None:
+ """Return why a block can't run live, or None if it's runnable."""
+ if not is_executable(source):
+ # The crawl example drives a real, slow crawl job; documented, not run.
+ return "illustrative block (e.g. crawl) — not executed by the suite"
+ extra = import_roots(source) - DIFFBOT_ONLY_IMPORTS
+ if not extra:
+ return None
+ # The only non-Diffbot example we support running is the Anthropic LCEL one.
+ if extra == {"langchain_anthropic"}:
+ if not os.environ.get("ANTHROPIC_API_KEY"):
+ return "ANTHROPIC_API_KEY not set (CI never runs the Anthropic example)"
+ if importlib.util.find_spec("langchain_anthropic") is None:
+ return "langchain_anthropic not installed"
+ return None
+ return f"needs un-runnable imports {sorted(extra)}"
+
+
+@pytest.mark.parametrize("source", [b for _, b in _BLOCKS], ids=[i for i, _ in _BLOCKS])
+def test_readme_example_runs_live(source: str) -> None:
+ reason = _skip_reason(source)
+ if reason:
+ pytest.skip(reason)
+
+ # Examples print for illustration; swallow it to keep test output clean.
+ with contextlib.redirect_stdout(io.StringIO()):
+ exec(
+ compile(source, f"README.md:{id(source)}", "exec"), {"__name__": "__main__"}
+ )
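
The compile-and-exec step used by the README suites can be sketched in isolation. One deliberate difference, labeled here as an assumption rather than what the diff does: the hypothetical `run_block` helper names the compiled code after the block's README line id (for example `L10`, as produced by `extract_blocks`) instead of `id(source)`, so a failure would point back at a stable location.

```python
import contextlib
import io


def run_block(source: str, block_id: str) -> str:
    """Compile and exec one README block, returning whatever it printed."""
    # The filename only labels tracebacks and syntax errors; no file is read.
    code = compile(source, f"README.md:{block_id}", "exec")
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        # A fresh globals dict keeps blocks from leaking names into each other.
        exec(code, {"__name__": "__main__"})
    return buffer.getvalue()


output = run_block("print('hello from the README')", "L10")
```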
diff --git a/tests/integration_tests/test_retriever.py b/tests/integration_tests/test_retriever.py
index eb620a9..610300b 100644
--- a/tests/integration_tests/test_retriever.py
+++ b/tests/integration_tests/test_retriever.py
@@ -5,6 +5,7 @@
import os
import pytest
+from diffbot import Diffbot, DiffbotAsync
from langchain_core.documents import Document
from langchain_diffbot import DiffbotKnowledgeGraphRetriever
@@ -16,15 +17,18 @@
def test_live_basic_query() -> None:
- retriever = DiffbotKnowledgeGraphRetriever(k=3)
+ client = Diffbot(token=os.environ["DIFFBOT_API_TOKEN"])
+ retriever = DiffbotKnowledgeGraphRetriever(client=client, k=3)
docs = retriever.invoke('type:Organization name:"Diffbot"')
assert isinstance(docs, list)
assert all(isinstance(d, Document) for d in docs)
assert len(docs) <= 3
+ client.close()
async def test_live_async_query() -> None:
- retriever = DiffbotKnowledgeGraphRetriever(k=2)
- docs = await retriever.ainvoke('type:Organization name:"Diffbot"')
+ async with DiffbotAsync(token=os.environ["DIFFBOT_API_TOKEN"]) as client:
+ retriever = DiffbotKnowledgeGraphRetriever(async_client=client, k=2)
+ docs = await retriever.ainvoke('type:Organization name:"Diffbot"')
assert isinstance(docs, list)
assert all(isinstance(d, Document) for d in docs)
diff --git a/tests/integration_tests/test_standard.py b/tests/integration_tests/test_standard.py
index 3fedae2..ed8b6fa 100644
--- a/tests/integration_tests/test_standard.py
+++ b/tests/integration_tests/test_standard.py
@@ -6,6 +6,7 @@
from typing import Any
import pytest
+from diffbot import Diffbot, DiffbotAsync
from langchain_core.retrievers import BaseRetriever
from langchain_tests.integration_tests import RetrieversIntegrationTests
@@ -24,7 +25,13 @@ def retriever_constructor(self) -> type[BaseRetriever]:
@property
def retriever_constructor_params(self) -> dict[str, Any]:
- return {}
+ # The standard suite exercises both `invoke` and `ainvoke`, so supply
+ # both a sync and an async client.
+ token = os.environ["DIFFBOT_API_TOKEN"]
+ return {
+ "client": Diffbot(token=token),
+ "async_client": DiffbotAsync(token=token),
+ }
@property
def retriever_query_example(self) -> str:
diff --git a/tests/readme.py b/tests/readme.py
new file mode 100644
index 0000000..831c24f
--- /dev/null
+++ b/tests/readme.py
@@ -0,0 +1,85 @@
+"""Shared helpers for testing the python code blocks in README.md.
+
+README.md is the single source of truth for this package's docs: the LangChain
+provider page is *generated from it* by the `sync-langchain-docs` skill. Three
+suites consume these helpers: `unit_tests/test_readme_examples.py` runs the
+blocks against respx mocks (deterministic, no token),
+`integration_tests/test_readme_examples.py` runs them against the live Diffbot
+API, and `unit_tests/test_readme_parity.py` checks the README stays in lockstep
+with the package surface (`__all__`). The execution suites decide which blocks
+to run by inspecting each block's imports and whether it is executable.
+"""
+
+from __future__ import annotations
+
+import ast
+import re
+from pathlib import Path
+
+README = Path(__file__).parents[1] / "README.md"
+
+# A block whose top-level imports all fall in this set needs only Diffbot to run:
+# it is mockable with respx in the unit suite, and safe to execute live in CI
+# with just a DIFFBOT_API_TOKEN — no Anthropic or other third-party calls (so CI
+# never consumes non-Diffbot quota). `os` is stdlib (blocks read the token from
+# the environment). The LCEL example imports `langchain_anthropic`, so the unit
+# suite skips it and the live suite runs it only with `ANTHROPIC_API_KEY` set.
+DIFFBOT_ONLY_IMPORTS = {"langchain_diffbot", "langchain_core", "langchain", "diffbot", "os"}
+
+# Blocks that construct one of these are documented but not executed by the
+# README suites. `DiffbotCrawlLoader` drives a real crawl job, which is slow and
+# costly live and isn't mockable with the per-call respx pattern the rest of the
+# suite uses (the crawl SDK path is integration-tested upstream — see
+# `unit_tests/test_document_loaders.py`).
+_NON_EXECUTABLE_CONSTRUCTORS = ("DiffbotCrawlLoader",)
+
+_PYTHON_BLOCK = re.compile(r"```python\n(.*?)```", re.DOTALL)
+_TABLE_ROW = re.compile(r"^\|\s*`([A-Za-z_][A-Za-z0-9_]*)`\s*\|", re.MULTILINE)
+
+
+def read() -> str:
+ """Return the raw text of README.md."""
+ return README.read_text()
+
+
+def extract_blocks() -> list[tuple[str, str]]:
+ """Return (id, source) for each ```python block, id'd by README line number."""
+ text = read()
+ blocks = []
+ for match in _PYTHON_BLOCK.finditer(text):
+ line = text.count("\n", 0, match.start()) + 1
+ blocks.append((f"L{line}", match.group(1)))
+ return blocks
+
+
+def is_executable(source: str) -> bool:
+ """Whether the README suites should `exec` this block (vs. illustrate only)."""
+ return not any(
+ re.search(rf"\b{name}\s*\(", source) for name in _NON_EXECUTABLE_CONSTRUCTORS
+ )
+
+
+def components_table_classes() -> list[str]:
+ """Return the class names listed in the `## Components reference` table.
+
+ Reads the first backtick-wrapped cell of each row under that heading.
+ """
+ text = read()
+ start = text.index("## Components reference")
+ section = text[start:]
+ # Stop at the next H2 if any, so we don't pick up later backticked cells.
+ next_h2 = re.search(r"\n## ", section[3:])
+ if next_h2:
+ section = section[: next_h2.start() + 3]
+ return _TABLE_ROW.findall(section)
+
+
+def import_roots(source: str) -> set[str]:
+ """Top-level module names imported by a code block (e.g. ``langchain_core``)."""
+ roots: set[str] = set()
+ for node in ast.walk(ast.parse(source)):
+ if isinstance(node, ast.Import):
+ roots.update(alias.name.split(".")[0] for alias in node.names)
+ elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
+ roots.add(node.module.split(".")[0])
+ return roots
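
The import-based gating that both execution suites apply can be tried on inline snippets. The sketch below repeats `import_roots` and `DIFFBOT_ONLY_IMPORTS` from this file so it is self-contained; the two sample blocks are hypothetical and are only parsed, never executed.

```python
from __future__ import annotations

import ast

DIFFBOT_ONLY_IMPORTS = {"langchain_diffbot", "langchain_core", "langchain", "diffbot", "os"}


def import_roots(source: str) -> set[str]:
    """Top-level module names imported by a code block."""
    roots: set[str] = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            roots.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            roots.add(node.module.split(".")[0])
    return roots


# Hypothetical README blocks: one Diffbot-only, one that needs Anthropic.
diffbot_block = (
    "import os\n"
    "from diffbot import Diffbot\n"
    "from langchain_diffbot import ChatDiffbot\n"
)
lcel_block = (
    "from langchain_anthropic import ChatAnthropic\n"
    "from langchain_core.prompts import ChatPromptTemplate\n"
)

# An empty difference means the block needs nothing beyond Diffbot to run.
diffbot_extra = import_roots(diffbot_block) - DIFFBOT_ONLY_IMPORTS
lcel_extra = import_roots(lcel_block) - DIFFBOT_ONLY_IMPORTS
```

Working from the parsed AST rather than a regex means imports nested inside functions are found too, while relative imports (`level > 0`) are ignored.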
diff --git a/tests/unit_tests/test_chat_models.py b/tests/unit_tests/test_chat_models.py
index d4055ac..c58f917 100644
--- a/tests/unit_tests/test_chat_models.py
+++ b/tests/unit_tests/test_chat_models.py
@@ -4,7 +4,8 @@
import httpx
import respx
-from langchain_core.messages import AIMessageChunk, HumanMessage, SystemMessage
+from diffbot import Diffbot, DiffbotAsync
+from langchain.messages import AIMessageChunk, HumanMessage, SystemMessage
from langchain_diffbot import ChatDiffbot
@@ -21,7 +22,7 @@
@respx.mock
def test_stream_yields_chunks() -> None:
respx.post(ASK_URL).mock(return_value=httpx.Response(200, content=SSE_BODY))
- llm = ChatDiffbot(diffbot_api_token="t")
+ llm = ChatDiffbot(client=Diffbot(token="t"))
chunks = list(llm.stream([HumanMessage(content="hi")]))
contents = [c.content for c in chunks if isinstance(c, AIMessageChunk)]
assert "".join(contents) == "Hello, world"
@@ -30,7 +31,7 @@ def test_stream_yields_chunks() -> None:
@respx.mock
def test_invoke_aggregates_stream() -> None:
respx.post(ASK_URL).mock(return_value=httpx.Response(200, content=SSE_BODY))
- llm = ChatDiffbot(diffbot_api_token="t")
+ llm = ChatDiffbot(client=Diffbot(token="t"))
result = llm.invoke([HumanMessage(content="hi")])
assert result.content == "Hello, world"
@@ -38,7 +39,7 @@ def test_invoke_aggregates_stream() -> None:
@respx.mock
def test_message_role_mapping() -> None:
route = respx.post(ASK_URL).mock(return_value=httpx.Response(200, content=SSE_BODY))
- llm = ChatDiffbot(diffbot_api_token="t")
+ llm = ChatDiffbot(client=Diffbot(token="t"))
llm.invoke([SystemMessage(content="you are a bot"), HumanMessage(content="hi")])
import json
@@ -50,7 +51,7 @@ def test_message_role_mapping() -> None:
@respx.mock
async def test_astream_yields_chunks() -> None:
respx.post(ASK_URL).mock(return_value=httpx.Response(200, content=SSE_BODY))
- llm = ChatDiffbot(diffbot_api_token="t")
+ llm = ChatDiffbot(async_client=DiffbotAsync(token="t"))
parts: list[str] = []
async for chunk in llm.astream([HumanMessage(content="hi")]):
if isinstance(chunk, AIMessageChunk) and isinstance(chunk.content, str):
diff --git a/tests/unit_tests/test_document_loaders.py b/tests/unit_tests/test_document_loaders.py
index 8ff680a..f0e574c 100644
--- a/tests/unit_tests/test_document_loaders.py
+++ b/tests/unit_tests/test_document_loaders.py
@@ -4,6 +4,7 @@
import httpx
import respx
+from diffbot import Diffbot, DiffbotAsync
from diffbot.crawl import CrawlEvent, CrawlEventType
from langchain_diffbot import DiffbotCrawlLoader, DiffbotExtractLoader
@@ -29,7 +30,7 @@ def test_extract_loader_yields_one_document_per_url() -> None:
)
)
loader = DiffbotExtractLoader(
- diffbot_api_token="t",
+ client=Diffbot(token="t"),
urls=["https://example.com", "https://example.com/2"],
)
docs = list(loader.lazy_load())
@@ -46,7 +47,9 @@ async def test_extract_loader_alazy_load() -> None:
200, json={"objects": [{"text": "x", "title": "t"}]}
)
)
- loader = DiffbotExtractLoader(diffbot_api_token="t", urls=["https://example.com"])
+ loader = DiffbotExtractLoader(
+ async_client=DiffbotAsync(token="t"), urls=["https://example.com"]
+ )
docs = [d async for d in loader.alazy_load()]
assert len(docs) == 1
assert docs[0].page_content == "x"
@@ -55,7 +58,7 @@ async def test_extract_loader_alazy_load() -> None:
def test_crawl_loader_default_mapper_filters_to_url_events() -> None:
# Bypass the SDK by stubbing `_kwargs` and feeding events through the mapper
# directly — the crawl SDK path is integration-tested upstream.
- loader = DiffbotCrawlLoader(diffbot_api_token="t", site="https://example.com")
+ loader = DiffbotCrawlLoader(client=Diffbot(token="t"), site="https://example.com")
job_event = CrawlEvent(
event_type=CrawlEventType.JOB_CREATED,
timestamp="now",
@@ -88,7 +91,7 @@ def mapper(event: CrawlEvent):
)
loader = DiffbotCrawlLoader(
- diffbot_api_token="t", site="https://example.com", event_mapper=mapper
+ client=Diffbot(token="t"), site="https://example.com", event_mapper=mapper
)
job_event = CrawlEvent(
event_type=CrawlEventType.JOB_CREATED,
@@ -102,13 +105,13 @@ def mapper(event: CrawlEvent):
def test_crawl_loader_defaults_watch_true() -> None:
- loader = DiffbotCrawlLoader(diffbot_api_token="t", site="https://example.com")
+ loader = DiffbotCrawlLoader(client=Diffbot(token="t"), site="https://example.com")
assert loader._kwargs() == {"watch": True}
def test_crawl_loader_user_can_override_watch() -> None:
loader = DiffbotCrawlLoader(
- diffbot_api_token="t",
+ client=Diffbot(token="t"),
site="https://example.com",
crawl_kwargs={"watch": False, "hops": 3},
)
diff --git a/tests/unit_tests/test_imports.py b/tests/unit_tests/test_imports.py
index 71706e3..02afd5e 100644
--- a/tests/unit_tests/test_imports.py
+++ b/tests/unit_tests/test_imports.py
@@ -2,6 +2,7 @@
EXPECTED = [
"ChatDiffbot",
+ "DiffbotAskTool",
"DiffbotCrawlLoader",
"DiffbotDQLProbeTool",
"DiffbotEntitiesTool",
diff --git a/tests/unit_tests/test_readme_examples.py b/tests/unit_tests/test_readme_examples.py
new file mode 100644
index 0000000..ecee054
--- /dev/null
+++ b/tests/unit_tests/test_readme_examples.py
@@ -0,0 +1,149 @@
+"""Execute the Python code blocks in README.md against mocked Diffbot endpoints.
+
+README.md is the single source of truth for this package's docs: the
+docs.langchain.com provider page is *generated from it* by the
+`sync-langchain-docs` skill. This guards against the README (and therefore that
+page) drifting from the package: a renamed class, a changed kwarg, or a wrong
+import surfaces here as a failing exec rather than as a user copy-pasting a
+broken snippet.
+
+The blocks are run under `respx` mocks of every Diffbot endpoint — the same
+canned-response style the rest of `unit_tests/` uses — so the suite stays
+deterministic and needs no token or network. Blocks importing anything outside
+`DIFFBOT_ONLY_IMPORTS` (e.g. the LCEL example's `langchain_anthropic`, which
+would hit a real third-party API) are skipped, as are non-executable blocks
+(see `tests.readme.is_executable`; the crawl example drives a real job). The
+live integration suite covers the third-party-import blocks. See
+`tests/readme.py` for the shared extraction helpers, and
+`tests/integration_tests/test_readme_examples.py` for the live counterpart.
+"""
+
+from __future__ import annotations
+
+import contextlib
+import io
+
+import httpx
+import pytest
+import respx
+
+from tests.readme import (
+ DIFFBOT_ONLY_IMPORTS,
+ extract_blocks,
+ import_roots,
+ is_executable,
+)
+
+# Endpoint fixtures, mirroring the per-component unit tests.
+DQL_URL = "https://kg.diffbot.com/kg/v3/dql"
+WEB_SEARCH_URL = "https://llm.diffbot.com/api/v1/web_search"
+ANALYZE_URL = "https://api.diffbot.com/v3/analyze"
+ASK_URL = "https://llm.diffbot.com/rag/v1/chat/completions"
+NLP_URL = "https://nl.diffbot.com/v1/"
+ONTOLOGY_URL = "https://kg.diffbot.com/kg/ontology"
+
+DQL_BODY = {
+ # `hits` lets the DQL-probe example (size=0) shape a real count.
+ "hits": 42,
+ "data": [
+ {
+ "score": 1000.0,
+ "entity": {
+ "id": "E1",
+ "type": "Organization",
+ "name": "Acme AI",
+ "description": "Boston-based AI company.",
+ "homepageUri": "https://acme.example",
+ "nbEmployees": 42,
+ "industries": ["Artificial Intelligence"],
+ },
+ }
+ ],
+}
+
+NLP_BODY = [
+ {
+ "entities": [{"name": "Diffbot", "type": "Organization", "id": "E1"}],
+ "sentiment": 0.4,
+ }
+]
+
+ONTOLOGY_BODY = {
+ "types": {
+ "Organization": {"fields": {"name": {"type": "String"}}},
+ "Person": {"fields": {"name": {"type": "String"}}},
+ },
+ "composites": {},
+ "enums": {},
+ "taxonomies": {},
+}
+
+WEB_SEARCH_BODY = {
+ "search_results": [
+ {
+ "score": 0.91,
+ "title": "Diffbot Knowledge Graph",
+ "pageUrl": "https://www.diffbot.com/kg/",
+ "content": "Diffbot KG is the largest commercial knowledge graph...",
+ }
+ ]
+}
+
+ANALYZE_BODY = {
+ "objects": [
+ {
+ "text": "Hello world",
+ "title": "Example",
+ "type": "article",
+ "pageUrl": "https://example.com",
+ }
+ ]
+}
+
+SSE_BODY = (
+ b'data: {"choices": [{"delta": {"content": "Hello"}}]}\n'
+ b'data: {"choices": [{"delta": {"content": ", world"}}]}\n'
+ b"data: [DONE]\n"
+)
+
+_BLOCKS = extract_blocks()
+
+
+def test_readme_has_python_examples() -> None:
+ # Guard against the extraction silently matching nothing (e.g. fence style
+ # changes), which would make every parametrized case vanish.
+ assert len(_BLOCKS) >= 12
+
+
+@pytest.mark.parametrize("source", [b for _, b in _BLOCKS], ids=[i for i, _ in _BLOCKS])
+def test_readme_example_runs(
+ source: str,
+ respx_mock: respx.MockRouter,
+ monkeypatch: pytest.MonkeyPatch,
+) -> None:
+ if not is_executable(source):
+ pytest.skip("illustrative block (e.g. crawl) — not executed by the suite")
+
+ unmockable = import_roots(source) - DIFFBOT_ONLY_IMPORTS
+ if unmockable:
+ pytest.skip(f"imports {sorted(unmockable)} — covered by the live suite")
+
+ monkeypatch.setenv("DIFFBOT_API_TOKEN", "test-token")
+ respx_mock.get(DQL_URL).mock(return_value=httpx.Response(200, json=DQL_BODY))
+ respx_mock.get(WEB_SEARCH_URL).mock(
+ return_value=httpx.Response(200, json=WEB_SEARCH_BODY)
+ )
+ respx_mock.get(ANALYZE_URL).mock(
+ return_value=httpx.Response(200, json=ANALYZE_BODY)
+ )
+ respx_mock.post(ASK_URL).mock(return_value=httpx.Response(200, content=SSE_BODY))
+ respx_mock.post(NLP_URL).mock(return_value=httpx.Response(200, json=NLP_BODY))
+ respx_mock.get(ONTOLOGY_URL).mock(
+ return_value=httpx.Response(200, json=ONTOLOGY_BODY)
+ )
+
+ # Examples print for illustration; swallow it to keep test output clean.
+ with contextlib.redirect_stdout(io.StringIO()):
+ exec(
+ compile(source, f"README.md:{id(source)}", "exec"), {"__name__": "__main__"}
+ )
diff --git a/tests/unit_tests/test_readme_parity.py b/tests/unit_tests/test_readme_parity.py
new file mode 100644
index 0000000..98cebec
--- /dev/null
+++ b/tests/unit_tests/test_readme_parity.py
@@ -0,0 +1,45 @@
+"""Keep README.md in lockstep with the package surface.
+
+README.md is the single source of truth for this package's docs; the LangChain
+provider page is generated from it by the `sync-langchain-docs` skill. These
+tests fail if the README drifts from `__all__`, so a class added, renamed, or
+removed cannot silently miss the docs (and therefore the published page).
+"""
+
+from __future__ import annotations
+
+import re
+
+from langchain_diffbot import __all__
+from tests.readme import components_table_classes, extract_blocks
+
+# Exported classes that take a `client` (every public component does today).
+_COMPONENTS = set(__all__)
+
+
+def test_components_table_matches_all() -> None:
+ """The Components reference table lists exactly the exported classes."""
+ assert sorted(components_table_classes()) == sorted(__all__)
+
+
+def test_every_class_is_documented() -> None:
+    """Every exported class appears in at least one README example."""
+ text = "\n".join(src for _, src in extract_blocks())
+ for name in __all__:
+ assert re.search(rf"\b{name}\b", text), f"{name} is not used in any example"
+
+
+def test_examples_build_a_client() -> None:
+ """No component is constructed without a client (the package is client-only).
+
+ Catches the most damaging regression: an example that omits `client=` and
+ therefore raises `ValueError` at call time.
+ """
+ for block_id, src in extract_blocks():
+ constructs_component = any(
+ re.search(rf"\b{name}\s*\(", src) for name in _COMPONENTS
+ )
+ if constructs_component:
+ assert "client=" in src or "async_client=" in src, (
+ f"example {block_id} constructs a component without a client"
+ )
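
The two parity checks can be exercised on inline text. In this sketch `EXPORTS`, the sample table, and the sample snippets are hypothetical; the regexes mirror `_TABLE_ROW` from `tests/readme.py` and the constructor search in the test above. Note that the substring `client=` also matches `async_client=`, so a single membership test covers both keyword arguments.

```python
import re

# Hypothetical subset of the package's `__all__`.
EXPORTS = ["ChatDiffbot", "DiffbotAskTool"]

_TABLE_ROW = re.compile(r"^\|\s*`([A-Za-z_][A-Za-z0-9_]*)`\s*\|", re.MULTILINE)

table = (
    "| Class | Purpose |\n"
    "|-------|---------|\n"
    "| `ChatDiffbot` | Chat model |\n"
    "| `DiffbotAskTool` | Ask tool |\n"
)
# Only rows whose first cell is a backticked identifier match.
listed = _TABLE_ROW.findall(table)


def constructs_without_client(source: str, names: list[str]) -> bool:
    """True when a block calls a component constructor but passes no client."""
    constructs = any(re.search(rf"\b{name}\s*\(", source) for name in names)
    return constructs and "client=" not in source


good = 'llm = ChatDiffbot(client=Diffbot(token="t"))'
async_ok = "tool = DiffbotAskTool(async_client=db)"
bad = "llm = ChatDiffbot()"
import_only = "from langchain_diffbot import ChatDiffbot"
```

The check is textual, so it cannot tell whether the `client=` belongs to the component call or to some other call in the same block; it trades precision for simplicity.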
diff --git a/tests/unit_tests/test_retriever.py b/tests/unit_tests/test_retriever.py
index d02f23c..99729b4 100644
--- a/tests/unit_tests/test_retriever.py
+++ b/tests/unit_tests/test_retriever.py
@@ -5,7 +5,7 @@
import httpx
import pytest
import respx
-from diffbot import Diffbot
+from diffbot import Diffbot, DiffbotAsync
from langchain_core.documents import Document
from langchain_diffbot import DiffbotKnowledgeGraphRetriever
@@ -40,22 +40,24 @@
}
-def test_requires_token(monkeypatch: pytest.MonkeyPatch) -> None:
- monkeypatch.delenv("DIFFBOT_API_TOKEN", raising=False)
- with pytest.raises(ValueError, match="Diffbot API token"):
- DiffbotKnowledgeGraphRetriever()
+def test_sync_call_requires_client() -> None:
+ # No client supplied: construction succeeds, but the sync surface raises a
+ # clear error when invoked (the component never builds a client itself).
+ r = DiffbotKnowledgeGraphRetriever()
+ with pytest.raises(ValueError, match="no sync client"):
+ r.invoke("type:Organization")
-def test_reads_token_from_env(monkeypatch: pytest.MonkeyPatch) -> None:
- monkeypatch.setenv("DIFFBOT_API_TOKEN", "env-token")
+async def test_async_call_requires_async_client() -> None:
r = DiffbotKnowledgeGraphRetriever()
- assert r.diffbot_api_token.get_secret_value() == "env-token"
+ with pytest.raises(ValueError, match="no async client"):
+ await r.ainvoke("type:Organization")
@respx.mock
def test_invoke_maps_entities_to_documents() -> None:
respx.get(f"{DQL_URL}").mock(return_value=httpx.Response(200, json=SAMPLE_BODY))
- r = DiffbotKnowledgeGraphRetriever(diffbot_api_token="t", k=2)
+ r = DiffbotKnowledgeGraphRetriever(client=Diffbot(token="t"), k=2)
docs = r.invoke("type:Organization")
assert len(docs) == 2
assert all(isinstance(d, Document) for d in docs)
@@ -80,7 +82,7 @@ def test_invoke_handles_flat_entity_shape() -> None:
]
}
respx.get(f"{DQL_URL}").mock(return_value=httpx.Response(200, json=flat_body))
- r = DiffbotKnowledgeGraphRetriever(diffbot_api_token="t")
+ r = DiffbotKnowledgeGraphRetriever(client=Diffbot(token="t"))
[doc] = r.invoke("type:Organization")
assert doc.page_content == "x"
assert doc.metadata["name"] == "Flat Co"
@@ -91,7 +93,7 @@ def test_invoke_k_kwarg_overrides_default() -> None:
route = respx.get(f"{DQL_URL}").mock(
return_value=httpx.Response(200, json=SAMPLE_BODY)
)
- r = DiffbotKnowledgeGraphRetriever(diffbot_api_token="t", k=10)
+ r = DiffbotKnowledgeGraphRetriever(client=Diffbot(token="t"), k=10)
r.invoke("type:Organization", k=3)
assert route.calls.last.request.url.params["size"] == "3"
@@ -99,7 +101,7 @@ def test_invoke_k_kwarg_overrides_default() -> None:
@respx.mock
def test_invoke_rejects_invalid_k() -> None:
respx.get(f"{DQL_URL}").mock(return_value=httpx.Response(200, json=SAMPLE_BODY))
- r = DiffbotKnowledgeGraphRetriever(diffbot_api_token="t")
+ r = DiffbotKnowledgeGraphRetriever(client=Diffbot(token="t"))
with pytest.raises(ValueError, match="positive integer"):
r.invoke("type:Organization", k=0)
@@ -107,7 +109,7 @@ def test_invoke_rejects_invalid_k() -> None:
@respx.mock
async def test_ainvoke_maps_entities_to_documents() -> None:
respx.get(f"{DQL_URL}").mock(return_value=httpx.Response(200, json=SAMPLE_BODY))
- r = DiffbotKnowledgeGraphRetriever(diffbot_api_token="t", k=2)
+ r = DiffbotKnowledgeGraphRetriever(async_client=DiffbotAsync(token="t"), k=2)
docs = await r.ainvoke("type:Organization")
assert [d.metadata["id"] for d in docs] == ["E1", "E2"]
@@ -137,7 +139,7 @@ async def test_ainvoke_maps_entities_to_documents() -> None:
def test_fields_projection_narrows_metadata() -> None:
respx.get(f"{DQL_URL}").mock(return_value=httpx.Response(200, json=FAT_BODY))
r = DiffbotKnowledgeGraphRetriever(
- diffbot_api_token="t",
+ client=Diffbot(token="t"),
fields=["id", "name", "homepageUri", "nbEmployees"],
)
[doc] = r.invoke("type:Organization")
@@ -156,7 +158,7 @@ def test_fields_excludes_chosen_content_field() -> None:
# because it was promoted to `page_content`.
respx.get(f"{DQL_URL}").mock(return_value=httpx.Response(200, json=FAT_BODY))
r = DiffbotKnowledgeGraphRetriever(
- diffbot_api_token="t",
+ client=Diffbot(token="t"),
fields=["id", "name", "description"],
)
[doc] = r.invoke("type:Organization")
@@ -178,7 +180,7 @@ def test_content_fields_priority_is_configurable() -> None:
}
respx.get(f"{DQL_URL}").mock(return_value=httpx.Response(200, json=body))
r = DiffbotKnowledgeGraphRetriever(
- diffbot_api_token="t",
+ client=Diffbot(token="t"),
content_fields=["summary", "description", "name"],
)
[doc] = r.invoke("type:Organization")
@@ -199,7 +201,7 @@ def mapper(entity: dict) -> Document:
)
r = DiffbotKnowledgeGraphRetriever(
- diffbot_api_token="t",
+ client=Diffbot(token="t"),
# These should be ignored when document_mapper is set.
fields=["nbEmployees"],
content_fields=["description"],
@@ -217,7 +219,7 @@ def test_sdk_kwargs_pass_through() -> None:
# job to test), but the params should at least show up.
route = respx.get(DQL_URL).mock(return_value=httpx.Response(200, json=SAMPLE_BODY))
r = DiffbotKnowledgeGraphRetriever(
- diffbot_api_token="t",
+ client=Diffbot(token="t"),
k=5,
from_=10,
filter="type:Organization",
@@ -232,10 +234,10 @@ def test_sdk_kwargs_pass_through() -> None:
@respx.mock
-def test_client_passthrough_uses_provided_diffbot() -> None:
- # If the user supplies a pre-built client, we use it as-is and don't close
- # it. After two retrieve calls the client should still be usable, proving
- # it wasn't closed between calls.
+def test_client_is_reused_and_not_closed() -> None:
+ # The component uses the supplied client as-is and never closes it. After two
+ # retrieve calls the client should still be usable, and the same client can be
+ # shared across components (one connection pool).
respx.get(DQL_URL).mock(return_value=httpx.Response(200, json=SAMPLE_BODY))
db = Diffbot(token="t", timeout=5.0)
r = DiffbotKnowledgeGraphRetriever(client=db, k=1)
@@ -244,15 +246,3 @@ def test_client_passthrough_uses_provided_diffbot() -> None:
# Direct SDK call still works — confirms we didn't close the user's client.
assert isinstance(db.dql("name:Charlie", size=1), dict)
db.close()
-
-
-def test_client_passthrough_skips_token_requirement(
- monkeypatch: pytest.MonkeyPatch,
-) -> None:
- # When a client is supplied, no token is needed at construction time.
- monkeypatch.delenv("DIFFBOT_API_TOKEN", raising=False)
- db = Diffbot(token="t")
- r = DiffbotKnowledgeGraphRetriever(client=db)
- # No exception even though there's no token field set.
- assert r.diffbot_api_token is None
- db.close()
diff --git a/tests/unit_tests/test_tools.py b/tests/unit_tests/test_tools.py
index 90c68e2..5d11fae 100644
--- a/tests/unit_tests/test_tools.py
+++ b/tests/unit_tests/test_tools.py
@@ -4,8 +4,10 @@
import httpx
import respx
+from diffbot import Diffbot
from langchain_diffbot import (
+ DiffbotAskTool,
DiffbotDQLProbeTool,
DiffbotEntitiesTool,
DiffbotExtractTool,
@@ -19,6 +21,7 @@
NLP_URL = "https://nl.diffbot.com/v1/"
DQL_URL = "https://kg.diffbot.com/kg/v3/dql"
ONTOLOGY_URL = "https://kg.diffbot.com/kg/ontology"
+ASK_URL = "https://llm.diffbot.com/rag/v1/chat/completions"
_FIXTURE_ONTOLOGY = {
"types": {
@@ -67,7 +70,7 @@ def test_extract_tool_shapes_response() -> None:
]
}
respx.get(ANALYZE_URL).mock(return_value=httpx.Response(200, json=raw))
- tool = DiffbotExtractTool(diffbot_api_token="t")
+ tool = DiffbotExtractTool(client=Diffbot(token="t"))
out = tool.invoke({"url": "https://example.com"})
assert out["content"] == "Hello world"
assert out["title"] == "Example"
@@ -84,7 +87,7 @@ def test_extract_tool_returns_structured_error_on_extraction_failure() -> None:
200, json={"errorCode": 500, "error": "could not fetch"}
)
)
- tool = DiffbotExtractTool(diffbot_api_token="t")
+ tool = DiffbotExtractTool(client=Diffbot(token="t"))
out = tool.invoke({"url": "https://example.com"})
assert out["errorCode"] == 500
assert "could not fetch" in out["error"]
@@ -95,7 +98,7 @@ def test_extract_tool_propagates_auth_error() -> None:
respx.get(ANALYZE_URL).mock(
return_value=httpx.Response(401, text='{"message": "bad token"}')
)
- tool = DiffbotExtractTool(diffbot_api_token="t")
+ tool = DiffbotExtractTool(client=Diffbot(token="t"))
try:
tool.invoke({"url": "https://example.com"})
except Exception as e: # diffbot.errors.AuthError
@@ -117,7 +120,7 @@ def test_web_search_tool_returns_results_list() -> None:
]
}
respx.get(WEB_SEARCH_URL).mock(return_value=httpx.Response(200, json=body))
- tool = DiffbotWebSearchTool(diffbot_api_token="t")
+ tool = DiffbotWebSearchTool(client=Diffbot(token="t"))
out = tool.invoke({"text": "anything", "num_results": 1})
assert isinstance(out, list)
assert out[0]["title"] == "A"
@@ -132,7 +135,7 @@ def test_entities_tool_returns_response_dict() -> None:
}
]
respx.post(NLP_URL).mock(return_value=httpx.Response(200, json=body))
- tool = DiffbotEntitiesTool(diffbot_api_token="t")
+ tool = DiffbotEntitiesTool(client=Diffbot(token="t"))
out = tool.invoke({"text": "Apple CEO ..."})
assert out["entities"][0]["id"] == "E1"
assert out["sentiment"] == 0.4
@@ -142,7 +145,7 @@ def test_entities_tool_returns_response_dict() -> None:
def test_kg_tool_returns_raw_body() -> None:
body = {"data": [{"score": 1.0, "entity": {"id": "E1", "name": "Acme"}}]}
respx.get(DQL_URL).mock(return_value=httpx.Response(200, json=body))
- tool = DiffbotKnowledgeGraphTool(diffbot_api_token="t")
+ tool = DiffbotKnowledgeGraphTool(client=Diffbot(token="t"))
out = tool.invoke({"query": "type:Organization", "size": 1})
assert out["data"][0]["entity"]["id"] == "E1"
@@ -150,7 +153,7 @@ def test_kg_tool_returns_raw_body() -> None:
@respx.mock
def test_ontology_tool_lists_and_caches() -> None:
route = _mock_ontology()
- tool = DiffbotOntologyTool(diffbot_api_token="t")
+ tool = DiffbotOntologyTool(client=Diffbot(token="t"))
assert tool.invoke({"op": "types"}) == ["Organization", "Person"]
# Second call is served from the in-memory cache — no second HTTP fetch.
assert tool.invoke({"op": "enums"}) == ["Language"]
@@ -160,7 +163,7 @@ def test_ontology_tool_lists_and_caches() -> None:
@respx.mock
def test_ontology_tool_fields_and_taxonomy() -> None:
_mock_ontology()
- tool = DiffbotOntologyTool(diffbot_api_token="t")
+ tool = DiffbotOntologyTool(client=Diffbot(token="t"))
fields = tool.invoke({"op": "fields", "name": "Organization"})
assert "location: [Location] [isComposite]" in fields
tax = tool.invoke(
@@ -172,7 +175,7 @@ def test_ontology_tool_fields_and_taxonomy() -> None:
@respx.mock
def test_ontology_tool_returns_error_dict_on_unknown_type() -> None:
_mock_ontology()
- tool = DiffbotOntologyTool(diffbot_api_token="t")
+ tool = DiffbotOntologyTool(client=Diffbot(token="t"))
out = tool.invoke({"op": "fields", "name": "Nope"})
assert isinstance(out, dict)
assert "error" in out
@@ -188,7 +191,7 @@ def handler(request: httpx.Request) -> httpx.Response:
return httpx.Response(200, json={"hits": hits, "results": 0})
respx.get(DQL_URL).mock(side_effect=handler)
- tool = DiffbotDQLProbeTool(diffbot_api_token="t")
+ tool = DiffbotDQLProbeTool(client=Diffbot(token="t"))
out = tool.invoke(
{"queries": ['type:Organization name:"Diffbot"', "type:Organization"]}
)
@@ -196,3 +199,18 @@ def handler(request: httpx.Request) -> httpx.Response:
{"query": 'type:Organization name:"Diffbot"', "hits": 5},
{"query": "type:Organization", "hits": 100},
]
+
+
+_ASK_SSE = (
+ b'data: {"choices": [{"delta": {"content": "Diffbot was "}}]}\n'
+ b'data: {"choices": [{"delta": {"content": "founded in 2008."}}]}\n'
+ b"data: [DONE]\n"
+)
+
+
+@respx.mock
+def test_ask_tool_aggregates_streamed_answer() -> None:
+ respx.post(ASK_URL).mock(return_value=httpx.Response(200, content=_ASK_SSE))
+ tool = DiffbotAskTool(client=Diffbot(token="t"))
+ out = tool.invoke({"question": "When was Diffbot founded?"})
+ assert out == "Diffbot was founded in 2008."
diff --git a/tests/unit_tests/test_web_search_retriever.py b/tests/unit_tests/test_web_search_retriever.py
index d213aff..e5dd14d 100644
--- a/tests/unit_tests/test_web_search_retriever.py
+++ b/tests/unit_tests/test_web_search_retriever.py
@@ -4,6 +4,7 @@
import httpx
import respx
+from diffbot import Diffbot, DiffbotAsync
from langchain_core.documents import Document
from langchain_diffbot import DiffbotWebSearchRetriever
@@ -31,7 +32,7 @@
@respx.mock
def test_invoke_maps_results_to_documents() -> None:
respx.get(WEB_SEARCH_URL).mock(return_value=httpx.Response(200, json=SAMPLE_BODY))
- r = DiffbotWebSearchRetriever(diffbot_api_token="t", k=2)
+ r = DiffbotWebSearchRetriever(client=Diffbot(token="t"), k=2)
docs = r.invoke("diffbot knowledge graph")
assert len(docs) == 2
assert all(isinstance(d, Document) for d in docs)
@@ -50,7 +51,7 @@ def test_num_results_and_max_tokens_pass_through() -> None:
route = respx.get(WEB_SEARCH_URL).mock(
return_value=httpx.Response(200, json=SAMPLE_BODY)
)
- r = DiffbotWebSearchRetriever(diffbot_api_token="t", k=5, max_tokens=2000)
+ r = DiffbotWebSearchRetriever(client=Diffbot(token="t"), k=5, max_tokens=2000)
r.invoke("diffbot")
params = route.calls.last.request.url.params
# diffbot-python sends the `num_results` kwarg as the `size` wire param.
@@ -62,7 +63,7 @@ def test_num_results_and_max_tokens_pass_through() -> None:
def test_fields_allowlist() -> None:
respx.get(WEB_SEARCH_URL).mock(return_value=httpx.Response(200, json=SAMPLE_BODY))
r = DiffbotWebSearchRetriever(
- diffbot_api_token="t", k=1, fields=["title", "pageUrl"]
+ client=Diffbot(token="t"), k=1, fields=["title", "pageUrl"]
)
[doc] = r.invoke("diffbot")
assert set(doc.metadata) == {"title", "pageUrl"}
@@ -75,7 +76,9 @@ def test_document_mapper_overrides_default() -> None:
def mapper(hit: dict) -> Document:
return Document(page_content=hit["title"], metadata={"url": hit["pageUrl"]})
- r = DiffbotWebSearchRetriever(diffbot_api_token="t", k=1, document_mapper=mapper)
+ r = DiffbotWebSearchRetriever(
+ client=Diffbot(token="t"), k=1, document_mapper=mapper
+ )
[doc] = r.invoke("diffbot")
assert doc.page_content == "Diffbot Knowledge Graph"
assert doc.metadata == {"url": "https://www.diffbot.com/kg/"}
@@ -84,7 +87,7 @@ def mapper(hit: dict) -> Document:
@respx.mock
async def test_ainvoke_works() -> None:
respx.get(WEB_SEARCH_URL).mock(return_value=httpx.Response(200, json=SAMPLE_BODY))
- r = DiffbotWebSearchRetriever(diffbot_api_token="t", k=2)
+ r = DiffbotWebSearchRetriever(async_client=DiffbotAsync(token="t"), k=2)
docs = await r.ainvoke("diffbot")
assert [d.metadata["title"] for d in docs] == [
"Diffbot Knowledge Graph",
diff --git a/uv.lock b/uv.lock
index d206936..9be1627 100644
--- a/uv.lock
+++ b/uv.lock
@@ -607,6 +607,7 @@ lint = [
{ name = "ruff" },
]
test = [
+ { name = "langchain" },
{ name = "langchain-tests" },
{ name = "pytest" },
{ name = "pytest-asyncio" },
@@ -634,6 +635,7 @@ provides-extras = ["examples"]
[package.metadata.requires-dev]
lint = [{ name = "ruff", specifier = ">=0.6,<1.0" }]
test = [
+ { name = "langchain", specifier = ">=1.3,<2.0" },
{ name = "langchain-tests", editable = "../langchain/libs/standard-tests" },
{ name = "pytest", specifier = ">=8.0,<10.0" },
{ name = "pytest-asyncio", specifier = ">=0.23,<2.0" },