2 changes: 1 addition & 1 deletion .claude-plugin/plugin.json
@@ -1,7 +1,7 @@
{
"name": "codealive",
"description": "CodeAlive context engine for semantic code search and AI-powered codebase Q&A. Enables AI coding agents to understand entire codebases beyond just open files — search across all indexed repositories, trace cross-service dependencies, discover usage patterns, and get synthesized answers to architectural questions. Includes a lightweight code exploration subagent, authentication hooks, and multiple search modes (fast lexical, semantic, and deep cross-cutting). Works standalone or alongside the CodeAlive MCP server for direct tool access via the Model Context Protocol.",
-"version": "2.0.9",
+"version": "2.1.0",
"author": {
"name": "CodeAlive AI",
"email": "hello@codealive.ai"
11 changes: 7 additions & 4 deletions agents/codealive-context-explorer.md
@@ -16,7 +16,7 @@ You are a code exploration specialist. **Your default tool is CodeAlive — not
Unless the request is unambiguously a local-only file lookup ("read line 42 of foo.ts", "is bar.py in this repo"), your first turn MUST include both of these calls before any answer:

```bash
-python scripts/datasources.py
+python scripts/datasources.py --query "<the user's question or task>"
python scripts/search.py "<question paraphrased as a concept query>" <data_source>
```

@@ -28,9 +28,12 @@ The scripts directory is relative to the skill location. If a path fails, fall b

### 1. List data sources — run FIRST every session
```bash
-python scripts/datasources.py
+python scripts/datasources.py --query "<the user's question or task>"
```
-Without this you do not know what to search against. Instant, free, cheap.
+Without this you do not know what to search against. Pass the user's question as `--query` so
+the backend returns only the relevant sources, each with a `relevanceReason`. The output tells
+you when sources were omitted, and when filtering was unavailable (the full list is returned
+instead — fail-open). Omit `--query` only when the user asks for the complete inventory.
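
The query-first listing with a fallback to the full inventory might be sketched as below. This is an illustration under stated assumptions, not code from the plugin: `choose_sources` and the `list_sources` callable are hypothetical names, `list_sources(query)` is assumed to wrap a `datasources.py --json` call and return a list of source dicts, and each dict is assumed to carry a `name` key.

```python
from typing import Callable, Iterable, List, Optional

# Hypothetical type: lists data sources, optionally filtered by a task description.
ListFn = Callable[[Optional[str]], List[dict]]


def choose_sources(list_sources: ListFn, task: str,
                   expected: Iterable[str] = ()) -> List[dict]:
    """Return the relevance-filtered shortlist, or the full list if it lacks `expected`."""
    shortlist = list_sources(task)
    found = {src.get("name") for src in shortlist}
    missing = [name for name in expected if name not in found]
    if missing:
        # A source we expected was filtered out: rerun without a query.
        return list_sources(None)
    return shortlist


def fake_list_sources(query: Optional[str]) -> List[dict]:
    """Stand-in for the real script, used only to exercise the sketch."""
    full = [{"name": "checkout-api"}, {"name": "billing-lib"}]
    return full if query is None else full[:1]
```

Injecting the listing function keeps the decision logic testable without network access or an API key.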

### 2. Semantic search — your default discovery tool
```bash
@@ -64,7 +67,7 @@ Use after `search.py` or `fetch.py` to expand a call graph, inheritance, or symb

Standard loop, in order:

-1. **`datasources.py`** — every session, no exceptions.
+1. **`datasources.py --query "<user's task>"`** — every session, no exceptions. The relevance-filtered shortlist tells you what to search against; if a source you expected is missing, rerun without `--query` to see the full list.
2. **`search.py`** with the main concept — every session, no exceptions. Run it even when you have a guess; the search confirms or refutes it with real evidence.
3. **`grep.py`** for specific identifiers, error messages, or config keys surfaced in step 2.
4. **`fetch.py`** on the most relevant identifiers (descriptions are triage pointers only — never reason from them).
20 changes: 17 additions & 3 deletions skills/codealive-context-engine/SKILL.md
@@ -37,7 +37,7 @@ Do NOT retry the failed script until setup completes successfully.

| Tool | Script | Speed | Cost | Best For |
|------|--------|-------|------|----------|
-| **List Data Sources** | `datasources.py` | Instant | Free | Discovering indexed repos and workspaces |
+| **List Data Sources** | `datasources.py` | Instant | Free | Discovering indexed repos and workspaces. With `--query "task"`, runs an AI relevance filter (low cost, not instant) returning only the relevant sources |
| **Semantic Search** | `search.py` | Fast | Low | Default discovery — finds code by meaning (concepts, behavior, architecture) |
| **Grep Search** | `grep.py` | Fast | Low | Finds code containing a specific string or regex (identifiers, literals, patterns) |
| **Fetch Artifacts** | `fetch.py` | Fast | Low | Retrieving full content; function-like artifacts also include up to 3 outgoing/incoming calls as a preview |
@@ -106,9 +106,13 @@ logic.
### 1. Discover what's indexed

```bash
-python scripts/datasources.py
+python scripts/datasources.py --query "the user's task in natural language"
```

+Recommended: pass the user's task as `--query` so the backend returns only the relevant
+data sources, each with a `relevanceReason`. Omit `--query` to list everything (instant,
+no AI filtering).
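
When a tool or agent passes the user's task programmatically, building the command as an argument list avoids shell-quoting problems with quotes or metacharacters in the task text. The helper below is a hypothetical sketch, not part of the plugin. It uses only the `--query`, `--all`, and `--json` flags documented here; the function name and the default script path are assumptions.

```python
import sys
from typing import List, Optional


def datasources_argv(query: Optional[str] = None,
                     include_processing: bool = False,
                     as_json: bool = False,
                     script: str = "scripts/datasources.py") -> List[str]:
    """Build an argv list for datasources.py (script path is an assumed default)."""
    argv = [sys.executable, script]
    if query:
        # The task text travels as a single argv element, so no shell escaping is needed.
        argv += ["--query", query]
    if include_processing:
        argv.append("--all")
    if as_json:
        argv.append("--json")
    return argv
```

The resulting list can be handed to `subprocess.run(argv, capture_output=True, text=True)` without `shell=True`.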

### 2. Search for code (fast, cheap)

```bash
@@ -151,11 +155,21 @@ python scripts/chat.py "What about security considerations?" --continue CONV_ID
### `datasources.py` — List Data Sources

```bash
-python scripts/datasources.py # Ready-to-use sources
+python scripts/datasources.py --query "add OAuth to checkout" # Only sources relevant to a task (recommended)
+python scripts/datasources.py # Ready-to-use sources (full list)
python scripts/datasources.py --all # All (including processing)
python scripts/datasources.py --json # JSON output
```

+| Option | Description |
+|--------|-------------|
+| `--query "TASK"` | The user's task/intent in natural language. The backend runs an AI relevance filter and returns only the relevant sources, each with a `relevanceReason`. Recommended whenever you know what the user is trying to accomplish |
+| `--all` | Include sources still processing |
+| `--json` | Raw JSON output (with `--query`: `{"dataSources": [...], "message": "..."}`) |
+
+**Fail-open:** if relevance filtering is unavailable, the FULL list is returned and the
+output says so — check the message before treating the result as a relevant shortlist.
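
A consumer of `--json` output has to handle two shapes: a bare list, or an object with `dataSources` and `message`. The sketch below normalizes both. `parse_listing` is a hypothetical helper, not part of the plugin. The exact wording of `message` is not specified here, so the sketch only surfaces it for the caller to inspect.

```python
import json
from typing import List, Tuple


def parse_listing(raw: str) -> Tuple[List[dict], str]:
    """Return (data_sources, message) from either JSON shape; message may be empty."""
    payload = json.loads(raw)
    if isinstance(payload, dict):
        # Object shape: {"dataSources": [...], "message": "..."}
        return payload.get("dataSources", []), payload.get("message", "")
    # Bare-list shape: no message accompanies the listing.
    return payload, ""
```

An empty message does not prove the list was filtered, so callers that need a true shortlist should read the message text before relying on the result.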

### `search.py` — Semantic Code Search (default discovery tool)

The default starting point. Finds code by WHAT it does — concepts, behavior,
10 changes: 7 additions & 3 deletions skills/codealive-context-engine/references/workflows.md
@@ -20,9 +20,13 @@ Complete workflows for common code exploration scenarios using CodeAlive.

### Step 1: Discover Available Code
```bash
-python datasources.py
+python datasources.py --query "your task in natural language"
```

+Pass your task as `--query` to get only the relevant data sources, each with a
+`relevanceReason` (recommended when you know the goal). Run plain `python datasources.py`
+for the complete inventory.

Review output to understand:
- What repositories are indexed
- What workspaces group related repos
@@ -287,8 +291,8 @@ python grep.py "useMemo|useCallback|React.memo" workspace:all-frontend --regex
### Day 1: Get Overview

```bash
-# Discover what's indexed
-python datasources.py
+# Discover what's indexed (relevance-filtered to the onboarding goal)
+python datasources.py --query "onboard to the new-service codebase"

# Find entry points and main features
python search.py "main application entry point, startup initialization" new-service
57 changes: 49 additions & 8 deletions skills/codealive-context-engine/scripts/datasources.py
@@ -6,11 +6,16 @@
 Includes current project repos, dependencies, libraries, and organizational codebases.

 Usage:
-    python datasources.py          # Show ready-to-use data sources
-    python datasources.py --all    # Show all data sources (including processing)
-    python datasources.py --json   # Output as JSON
+    python datasources.py                  # Show ready-to-use data sources
+    python datasources.py --query "TASK"   # Show only sources relevant to a task (recommended)
+    python datasources.py --all            # Show all data sources (including processing)
+    python datasources.py --json           # Output as JSON

 Examples:
+    # RECOMMENDED when you know the task: only sources relevant to it, each with a
+    # relevanceReason explaining the match
+    python datasources.py --query "add OAuth to the checkout flow"
+
     # List ready data sources
     python datasources.py

@@ -19,6 +24,10 @@

     # Get JSON output for parsing
     python datasources.py --json
+
+Note:
+    --query runs an AI relevance filter on the backend. It fails open: if filtering is
+    unavailable, the FULL list is returned and the output says so.
 """

 import sys
@@ -31,17 +40,27 @@
 from api_client import CodeAliveClient


-def format_datasources(datasources: list, as_json: bool = False) -> str:
-    """Format data sources for display."""
+def format_datasources(datasources: list, as_json: bool = False, message: str = "") -> str:
+    """Format data sources for display.
+
+    `message` is the relevance hint accompanying a --query'd listing: how many sources
+    were omitted as non-relevant, or that filtering was unavailable and the list is full.
+    """
     if as_json:
+        if message:
+            return json.dumps({"dataSources": datasources, "message": message}, indent=2)
         return json.dumps(datasources, indent=2)

     if not datasources:
+        if message:
+            return f"No data sources matched.\nℹ️ {message}"
         return "No data sources found.\nAdd repositories at https://app.codealive.ai"

     output = []
     output.append(f"\n📚 Available Data Sources ({len(datasources)} total)\n")
     output.append("="*80)
+    if message:
+        output.append(f"\nℹ️ {message}")

     # Group by type
     repos = [ds for ds in datasources if ds.get("type") == "Repository"]
@@ -58,6 +77,8 @@ def format_datasources(datasources: list, as_json: bool = False) -> str:
             status = f" [{state}]" if state and state != "Alive" else ""
             output.append(f"\n 📁 {name}{status}")
             output.append(f" {desc}")
+            if ws.get("relevanceReason"):
+                output.append(f" 🎯 {ws['relevanceReason']}")

     if repos:
         output.append("\n\n📦 REPOSITORIES")
@@ -71,6 +92,8 @@ def format_datasources(datasources: list, as_json: bool = False) -> str:
             status = f" [{state}]" if state and state != "Alive" else ""
             output.append(f"\n 📄 {name}{status}")
             output.append(f" {desc}")
+            if repo.get("relevanceReason"):
+                output.append(f" 🎯 {repo['relevanceReason']}")
             if url:
                 output.append(f" 🔗 {url}")

@@ -79,6 +102,7 @@ def format_datasources(datasources: list, as_json: bool = False) -> str:
     output.append(" • Use names with search.py, grep.py, and fetch.py")
     output.append(" • Workspaces search ALL repos in the workspace")
     output.append(" • Combine multiple data sources for broader search")
+    output.append(" • Pass --query 'your task' to list only the relevant sources")
     output.append("\n📖 Examples:")
     output.append(" python search.py 'auth logic' my-backend")
     output.append(" python grep.py 'AuthService' my-backend")
@@ -90,20 +114,37 @@ def main():
     """CLI interface for listing data sources."""
     alive_only = True
     as_json = False
+    query = None

-    for arg in sys.argv[1:]:
+    args = sys.argv[1:]
+    i = 0
+    while i < len(args):
+        arg = args[i]
         if arg == "--all":
             alive_only = False
         elif arg == "--json":
             as_json = True
+        elif arg == "--query":
+            if i + 1 >= len(args):
+                print("❌ Error: --query requires a value", file=sys.stderr)
+                sys.exit(1)
+            query = args[i + 1]
+            i += 1
         elif arg == "--help":
             print(__doc__)
             sys.exit(0)
+        i += 1

     try:
         client = CodeAliveClient()
-        datasources = client.get_datasources(alive_only=alive_only)
-        print(format_datasources(datasources, as_json))
+        result = client.get_datasources(alive_only=alive_only, query=query)
+        if isinstance(result, dict):
+            datasources = result.get("dataSources", [])
+            message = result.get("message", "")
+        else:
+            datasources = result
+            message = ""
+        print(format_datasources(datasources, as_json, message))

     except Exception as e:
         print(f"❌ Error: {e}", file=sys.stderr)