3 changes: 2 additions & 1 deletion README.md
@@ -123,7 +123,7 @@ export KAGI_API_TOKEN='...'
| credential | what it unlocks |
| --- | --- |
| `KAGI_SESSION_TOKEN` | base search fallback, `search --lens`, filtered search, `quick`, `ask-page`, `assistant`, `translate`, `summarize --subscriber` |
-| `KAGI_API_TOKEN` | public `summarize`, `fastgpt`, `enrich web`, `enrich news` |
+| `KAGI_API_TOKEN` | public `summarize`, `extract`, `fastgpt`, `enrich web`, `enrich news` |
| none | `news`, `smallweb`, `auth status`, `--help` |

example config:
@@ -165,6 +165,7 @@ for the full command-to-token matrix, use the [`auth-matrix`](https://kagi.micr.
| `kagi batch` | run multiple searches in parallel with JSON, TOON, compact, pretty, markdown, or csv output and shared filters |
| `kagi auth` | launch the auth wizard, or inspect, validate, and save credentials |
| `kagi summarize` | use the paid public summarizer API or the subscriber summarizer with `--subscriber` |
| `kagi extract` | extract a page's full content as markdown through the paid API |
| `kagi watch` | rerun a search on an interval and emit added/removed result URLs |
| `kagi notify` | send search or news output to a webhook |
| `kagi history` | inspect local command history and aggregate query stats |
77 changes: 77 additions & 0 deletions docs/commands/extract.mdx
@@ -0,0 +1,77 @@
---
title: "extract"
description: "Complete reference for the kagi extract command - fetch a page's full content as markdown using Kagi's Extract API."
---

# `kagi extract`

Extract the readable content of a web page as markdown using Kagi's Extract API.

## Synopsis

```bash
kagi extract <URL>
```

## Description

The `kagi extract` command sends one HTTPS URL to Kagi's v1 Extract API and prints the extracted page markdown to stdout. It is useful when a search result, article, or documentation page needs full-page text instead of a search snippet.

The command requests JSON output from the API internally because JSON is the stable envelope the API returns, then prints the first page's `markdown` field. If Kagi returns no markdown for the page, or the field is empty, the CLI reports the Extract API error details and the trace id when available.
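
For readers curious about the request behind the command, here is a minimal sketch based on `src/api.rs` in this change: a POST to `/api/v1/extract` with an `Authorization: Bot <token>` header and a one-page JSON body that asks for the `json` format. The base URL `https://kagi.com`, the `KAGI_BASE_URL` variable, and the helper names `build_extract_body` and `extract_raw` are assumptions made for illustration; they are not part of the CLI.

```bash
#!/usr/bin/env bash
# Sketch only: mirrors the request shape used by `kagi extract`.
# KAGI_BASE_URL and both function names are hypothetical.

# Build the JSON body for a single page, escaping backslashes and quotes.
build_extract_body() {
  local escaped
  escaped=$(printf '%s' "$1" | sed 's/\\/\\\\/g; s/"/\\"/g')
  printf '{"pages":[{"url":"%s"}],"format":"json"}\n' "$escaped"
}

# Send the request and print the raw JSON envelope (needs curl and a token).
# Defined here but not called, so loading this sketch makes no network request.
extract_raw() {
  curl -sS -X POST "${KAGI_BASE_URL:-https://kagi.com}/api/v1/extract" \
    -H "Authorization: Bot ${KAGI_API_TOKEN}" \
    -H "Content-Type: application/json" \
    --data "$(build_extract_body "$1")"
}

# Print the body that would be sent for one URL.
build_extract_body "https://example.com/article"
```

In the JSON envelope that comes back, the CLI reads `data[0].markdown` and falls back to `errors` and `meta.trace` when that field is missing or empty.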

## Authentication

**Required:** `KAGI_API_TOKEN`

Extract is part of Kagi's paid API surface and consumes API credit per request.

## Arguments

### `<URL>` (Required)

The HTTPS URL of the page to extract.

```bash
kagi extract "https://example.com/article"
```

Only `https://` URLs with a valid host are accepted.
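
When URLs come from another tool, a script can screen them before spending a request. The sketch below approximates the rule above with plain shell pattern matching. `is_extractable_url` is a hypothetical helper, not something the CLI ships, and because the CLI itself trims whitespace and uses a real URL parser, edge cases such as an uppercase scheme or extra slashes may be judged differently.

```bash
#!/usr/bin/env bash
# Sketch only: a rough pre-flight check, not the CLI's own URL parser.
# is_extractable_url is a hypothetical helper name.
is_extractable_url() {
  local url=$1 rest host
  case $url in
    https://*) ;;          # scheme must be https
    *) return 1 ;;
  esac
  rest=${url#https://}     # drop the scheme
  host=${rest%%[/?#]*}     # keep everything before the path, query, or fragment
  [ -n "$host" ]           # succeed only when a host remains
}

# Classify a few sample inputs.
for candidate in "https://example.com/article" "http://example.com/article" "https://"; do
  if is_extractable_url "$candidate"; then
    echo "ok:       $candidate"
  else
    echo "rejected: $candidate"
  fi
done
```

Only the first sample passes; the other two fail on the scheme and on the empty host respectively.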

## Output

The command prints markdown directly:

```markdown
# Article title

Extracted page content...
```

## Examples

### Save an Article

```bash
kagi extract "https://example.com/article" > article.md
```

### Pipe into Another Tool

```bash
kagi extract "https://example.com/article" | sed -n '1,80p'
```

## Exit Codes

| Code | Meaning |
|------|---------|
| 0 | Success - markdown extracted |
| 1 | Error - see stderr |

Common errors:

- Missing API token
- Non-HTTPS or invalid URL
- Insufficient API credit
- Kagi returns no extractable content
- Network error
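
Because every failure maps to exit code 1 with details on stderr, a script can branch on the exit status alone. The wrapper below is a minimal sketch of that pattern: it writes to a temporary file and only moves it into place on success, so a failed run never leaves a partial `article.md` behind. `save_extract` and the `KAGI_BIN` override are hypothetical names introduced here; `KAGI_BIN` is not an option of the CLI.

```bash
#!/usr/bin/env bash
# Sketch only: save_extract and KAGI_BIN are hypothetical, not part of the CLI.
# KAGI_BIN lets a different binary (or a test stub) stand in for `kagi`.
save_extract() {
  local url=$1 out=$2 tmp status
  tmp=$(mktemp) || return 1
  if "${KAGI_BIN:-kagi}" extract "$url" > "$tmp"; then
    mv "$tmp" "$out"          # exit code 0: keep the markdown
  else
    status=$?                 # non-zero: details were printed to stderr
    rm -f "$tmp"
    echo "extract failed for $url (exit $status)" >&2
    return "$status"
  fi
}

# Usage (makes a real API request, so it is left commented out):
# save_extract "https://example.com/article" article.md
```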
1 change: 1 addition & 0 deletions docs/docs.json
@@ -73,6 +73,7 @@
"commands/batch",
"commands/auth",
"commands/summarize",
"commands/extract",
"commands/watch",
"commands/notify",
"commands/history",
2 changes: 2 additions & 0 deletions docs/reference/auth-matrix.mdx
@@ -20,6 +20,7 @@ This reference provides a complete mapping of which commands require which authe
| `auth set` | None | None | Saves credentials |
| `summarize` | `KAGI_API_TOKEN` | None | Paid public API |
| `summarize --subscriber` | `KAGI_SESSION_TOKEN` | None | Subscriber web product |
| `extract` | `KAGI_API_TOKEN` | None | Paid public API |
| `news` | None | None | Public endpoint |
| `quick` | `KAGI_SESSION_TOKEN` | None | Quick Answer web product |
| `ask-page` | `KAGI_SESSION_TOKEN` | None | Subscriber feature |
@@ -147,6 +148,7 @@ flowchart TD
| `assistant custom` | `KAGI_SESSION_TOKEN` | Create and manage saved assistants |
| `translate` | `KAGI_SESSION_TOKEN` | Kagi Translate text mode |
| `fastgpt` | `KAGI_API_TOKEN` | Quick factual answers |
| `extract` | `KAGI_API_TOKEN` | Full-page markdown extraction |

#### Settings Commands

2 changes: 2 additions & 0 deletions docs/reference/coverage.mdx
@@ -17,6 +17,7 @@ These are official, documented API endpoints:
|----------|---------|--------|
| Search API | `kagi search` | ✅ Implemented for base search |
| Universal Summarizer | `kagi summarize` | ✅ Implemented |
| Extract API | `kagi extract` | ✅ Implemented |
| FastGPT | `kagi fastgpt` | ✅ Implemented |
| Web Enrichment (Teclis) | `kagi enrich web` | ✅ Implemented |
| News Enrichment (TinyGem) | `kagi enrich news` | ✅ Implemented |
@@ -72,6 +73,7 @@ These require no authentication:
| `summarize` | Public API summarizer | API | ✅ |
| `summarize --subscriber` | Web summarizer | Session | ✅ |
| `summarize --filter` | Summarize stdin items as URLs or text | API or Session | ✅ |
| `extract` | Extract page content as markdown | API | ✅ |
| `watch` | Search diff monitoring | API or Session | ✅ |
| `notify` | Webhook notifications for search/news | API, Session, or None | ✅ |
| `history` | Local history and stats | None | ✅ |
109 changes: 101 additions & 8 deletions src/api.rs
@@ -33,21 +33,23 @@ use crate::types::{
AssistantPromptResponse, AssistantThread, AssistantThreadDeleteResponse,
AssistantThreadExportResponse, AssistantThreadListResponse, AssistantThreadOpenResponse,
AssistantThreadPagination, CustomBangCreateRequest, CustomBangDetails, CustomBangSummary,
-CustomBangUpdateRequest, DeletedResourceResponse, EnrichResponse, FastGptRequest,
-FastGptResponse, LensCreateRequest, LensDetails, LensSummary, LensUpdateRequest,
-NewsBatchCategories, NewsBatchCategory, NewsCategoriesResponse, NewsCategoryMetadata,
-NewsCategoryMetadataList, NewsChaos, NewsChaosResponse, NewsContentFilterSummary,
-NewsFilterPresetListEntry, NewsFilterPresetListResponse, NewsLatestBatch, NewsResolvedCategory,
-NewsStoriesPayload, NewsStoriesResponse, NewsStoryContentFilterSummary,
-RedirectRuleCreateRequest, RedirectRuleDetails, RedirectRuleSummary, RedirectRuleUpdateRequest,
-SmallWebFeed, SubscriberSummarization, SubscriberSummarizeMeta, SubscriberSummarizeRequest,
+CustomBangUpdateRequest, DeletedResourceResponse, EnrichResponse, ExtractPageInput,
+ExtractRequest, ExtractResponse, FastGptRequest, FastGptResponse, LensCreateRequest,
+LensDetails, LensSummary, LensUpdateRequest, NewsBatchCategories, NewsBatchCategory,
+NewsCategoriesResponse, NewsCategoryMetadata, NewsCategoryMetadataList, NewsChaos,
+NewsChaosResponse, NewsContentFilterSummary, NewsFilterPresetListEntry,
+NewsFilterPresetListResponse, NewsLatestBatch, NewsResolvedCategory, NewsStoriesPayload,
+NewsStoriesResponse, NewsStoryContentFilterSummary, RedirectRuleCreateRequest,
+RedirectRuleDetails, RedirectRuleSummary, RedirectRuleUpdateRequest, SmallWebFeed,
+SubscriberSummarization, SubscriberSummarizeMeta, SubscriberSummarizeRequest,
SubscriberSummarizeResponse, SummarizeRequest, SummarizeResponse, TextAlignmentsResponse,
ToggleResourceResponse, TranslateBootstrapMetadata, TranslateCommandRequest,
TranslateDetectedLanguage, TranslateOptionState, TranslateResponse, TranslateTextResponse,
TranslateWarning, TranslationSuggestionsResponse, WordInsightsResponse,
};

const KAGI_SUMMARIZE_PATH: &str = "/api/v0/summarize";
const KAGI_EXTRACT_PATH: &str = "/api/v1/extract";
const KAGI_SUBSCRIBER_SUMMARIZE_PATH: &str = "/mother/summary_labs";
const KAGI_NEWS_LATEST_PATH: &str = "/api/batches/latest";
const KAGI_NEWS_CATEGORIES_METADATA_PATH: &str = "/api/categories/metadata";
@@ -145,6 +147,45 @@ pub async fn execute_summarize(
decode_kagi_json(response, "summarizer").await
}

/// Extracts a web page as markdown using Kagi's v1 Extract API with API-token auth.
///
/// # Arguments
/// * `url` - The HTTPS URL to extract.
/// * `token` - The Kagi API token.
///
/// # Returns
/// Extracted page markdown.
///
/// # Errors
/// Returns `KagiError::Auth` if the token is missing, `KagiError::Config` if the
/// URL does not satisfy the Extract API contract, and network/parse errors on failure.
pub async fn execute_extract(url: &str, token: &str) -> Result<String, KagiError> {
    if token.trim().is_empty() {
        return Err(KagiError::Auth(
            "missing Kagi API token (expected KAGI_API_TOKEN)".to_string(),
        ));
    }

    let url = normalize_extract_url(url)?;
    let request = ExtractRequest {
        pages: vec![ExtractPageInput { url }],
        format: "json".to_string(),
    };

    let client = build_client()?;
    let response = client
        .post(http::kagi_url(KAGI_EXTRACT_PATH))
        .header(header::AUTHORIZATION, format!("Bot {token}"))
        .header(header::CONTENT_TYPE, "application/json")
        .json(&request)
        .send()
        .await
        .map_err(map_transport_error)?;

    let response: ExtractResponse = decode_kagi_json(response, "Extract").await?;
    extract_first_markdown(response)
}

/// Summarizes a URL or text using the subscriber web Summarizer with session-token auth.
///
/// # Arguments
@@ -232,6 +273,58 @@ pub async fn execute_subscriber_summarize(
}
}

fn normalize_extract_url(url: &str) -> Result<String, KagiError> {
    let trimmed = url.trim();
    if trimmed.is_empty() {
        return Err(KagiError::Config("extract requires a URL".to_string()));
    }

    let parsed = Url::parse(trimmed)
        .map_err(|error| KagiError::Config(format!("extract URL is invalid: {error}")))?;
    if parsed.scheme() != "https" {
        return Err(KagiError::Config(
            "extract URL must use the https scheme".to_string(),
        ));
    }
    if parsed.host_str().is_none() {
        return Err(KagiError::Config(
            "extract URL must include a valid host".to_string(),
        ));
    }

    Ok(trimmed.to_string())
}

fn extract_first_markdown(response: ExtractResponse) -> Result<String, KagiError> {
    if let Some(markdown) = response
        .data
        .first()
        .and_then(|page| page.markdown.as_deref())
        .filter(|markdown| !markdown.is_empty())
    {
        return Ok(markdown.to_string());
    }

    let suffix = response
        .meta
        .trace
        .as_deref()
        .map(|trace| format!(" (trace id: {trace})"))
        .unwrap_or_default();

    if let Some(errors) = response.errors.filter(|errors| !errors.is_empty()) {
        return Err(KagiError::Network(format!(
            "Kagi Extract API error: {}{}",
            serde_json::to_string(&errors).unwrap_or_else(|_| format!("{errors:?}")),
            suffix
        )));
    }

    Err(KagiError::Parse(format!(
        "Kagi Extract API returned no content{suffix}"
    )))
}

/// Fetches Kagi News stories for a given category with optional content filtering.
///
/// # Arguments
10 changes: 10 additions & 0 deletions src/cli.rs
@@ -223,6 +223,8 @@ pub enum Commands {
Auth(AuthCommand),
/// Summarize a URL or text with Kagi's public API or subscriber web Summarizer
Summarize(SummarizeArgs),
/// Extract a page's full content as markdown through Kagi's Extract API
Extract(ExtractArgs),
/// Read Kagi News from the live public JSON endpoints
News(NewsArgs),
/// Prompt Kagi Assistant and manage Assistant threads
@@ -604,6 +606,14 @@ impl SummarizeArgs {
}
}

#[derive(Debug, Args)]
/// Arguments for the `extract` subcommand.
pub struct ExtractArgs {
    /// HTTPS URL of the page to extract as markdown
    #[arg(value_name = "URL")]
    pub url: String,
}

#[derive(Debug, Args)]
/// Arguments for the `fastgpt` subcommand.
pub struct FastGptArgs {
16 changes: 14 additions & 2 deletions src/main.rs
@@ -22,8 +22,8 @@ use crate::api::{
execute_custom_assistant_create, execute_custom_assistant_delete, execute_custom_assistant_get,
execute_custom_assistant_list, execute_custom_assistant_update, execute_custom_bang_create,
execute_custom_bang_delete, execute_custom_bang_get, execute_custom_bang_list,
-execute_custom_bang_update, execute_enrich_news, execute_enrich_web, execute_fastgpt,
-execute_lens_create, execute_lens_delete, execute_lens_get, execute_lens_list,
+execute_custom_bang_update, execute_enrich_news, execute_enrich_web, execute_extract,
+execute_fastgpt, execute_lens_create, execute_lens_delete, execute_lens_get, execute_lens_list,
execute_lens_set_enabled, execute_lens_update, execute_news, execute_news_categories,
execute_news_chaos, execute_news_filter_presets, execute_redirect_create,
execute_redirect_delete, execute_redirect_get, execute_redirect_list,
@@ -252,6 +252,12 @@ async fn run() -> Result<(), KagiError> {
print_json(&response)
}
}
Commands::Extract(args) => {
let token = resolve_api_token(profile.as_deref())?;
let markdown = execute_extract(&args.url, &token).await?;
println!("{markdown}");
Ok(())
}
Commands::News(args) => {
args.validate().map_err(KagiError::Config)?;

@@ -2070,6 +2076,7 @@ async fn run_mcp(args: McpArgs, profile: Option<&str>) -> Result<(), KagiError>
"tools": [
{"name": "kagi_search", "description": "Search Kagi", "inputSchema": {"type": "object"}},
{"name": "kagi_summarize", "description": "Summarize a URL or text", "inputSchema": {"type": "object"}},
{"name": "kagi_extract", "description": "Extract a page's full content as markdown", "inputSchema": {"type": "object"}},
{"name": "kagi_quick", "description": "Get a Kagi Quick Answer", "inputSchema": {"type": "object"}},
{"name": "kagi_news", "description": "Fetch Kagi News stories for a category", "inputSchema": {"type": "object"}},
{"name": "kagi_news_search", "description": "Search the News tab of kagi.com (clusters of articles)", "inputSchema": {"type": "object"}}
@@ -2136,6 +2143,11 @@ async fn run_mcp_tool_call(request: &Value, profile: Option<&str>) -> Result<Val
};
serde_json::to_string_pretty(&execute_summarize(&request, &token).await?)?
}
"kagi_extract" => {
let token = resolve_api_token(profile)?;
let url = arguments.get("url").and_then(Value::as_str).unwrap_or("");
execute_extract(url, &token).await?
}
"kagi_quick" => {
let token = resolve_session_token(profile)?;
let query = arguments.get("query").and_then(Value::as_str).unwrap_or("");
41 changes: 41 additions & 0 deletions src/types.rs
@@ -106,6 +106,47 @@ pub struct SummarizeResponse {
pub data: Summarization,
}

#[derive(Debug, Clone, Serialize, Deserialize, PartialEq)]
/// Request body for Kagi's v1 content extraction endpoint.
pub struct ExtractRequest {
    pub pages: Vec<ExtractPageInput>,
    pub format: String,
}

#[derive(Debug, Clone, Serialize, Deserialize, PartialEq, Eq)]
/// A single page input for content extraction.
pub struct ExtractPageInput {
    pub url: String,
}

#[derive(Debug, Clone, Default, Serialize, Deserialize, PartialEq, Eq)]
/// Metadata returned by the v1 extraction endpoint.
pub struct ExtractMeta {
    #[serde(default, skip_serializing_if = "Option::is_none")]
    pub trace: Option<String>,
    #[serde(default, skip_serializing_if = "Option::is_none")]
    pub node: Option<String>,
    #[serde(default, skip_serializing_if = "Option::is_none")]
    pub ms: Option<u64>,
}

#[derive(Debug, Clone, Serialize, Deserialize, PartialEq, Eq)]
/// Extracted content for one page.
pub struct ExtractPageOutput {
    pub url: String,
    #[serde(default, skip_serializing_if = "Option::is_none")]
    pub markdown: Option<String>,
}

#[derive(Debug, Clone, Serialize, Deserialize, PartialEq)]
/// Response from Kagi's v1 content extraction endpoint.
pub struct ExtractResponse {
    pub meta: ExtractMeta,
    pub data: Vec<ExtractPageOutput>,
    #[serde(default, skip_serializing_if = "Option::is_none")]
    pub errors: Option<Vec<Value>>,
}

#[derive(Debug, Clone, Default, Serialize, Deserialize, PartialEq, Eq)]
/// Metadata for the subscriber-mode summarization endpoint.
pub struct SubscriberSummarizeMeta {