CodeAlive-AI · sciapanCA · Jun 9, 2026
diff --git a/.claude-plugin/plugin.json b/.claude-plugin/plugin.json
@@ -1,7 +1,7 @@
 {
   "name": "codealive",
   "description": "CodeAlive context engine for semantic code search and AI-powered codebase Q&A. Enables AI coding agents to understand entire codebases beyond just open files — search across all indexed repositories, trace cross-service dependencies, discover usage patterns, and get synthesized answers to architectural questions. Includes a lightweight code exploration subagent, authentication hooks, and multiple search modes (fast lexical, semantic, and deep cross-cutting). Works standalone or alongside the CodeAlive MCP server for direct tool access via the Model Context Protocol.",
-  "version": "2.0.9",
+  "version": "2.1.0",
   "author": {
     "name": "CodeAlive AI",
     "email": "hello@codealive.ai"

diff --git a/agents/codealive-context-explorer.md b/agents/codealive-context-explorer.md
@@ -16,7 +16,7 @@ You are a code exploration specialist. **Your default tool is CodeAlive — not
 Unless the request is unambiguously a local-only file lookup ("read line 42 of foo.ts", "is bar.py in this repo"), your first turn MUST include both of these calls before any answer:
 
 ```bash
-python scripts/datasources.py
+python scripts/datasources.py --query "<the user's question or task>"
 python scripts/search.py "<question paraphrased as a concept query>" <data_source>
 ```
 
@@ -28,9 +28,12 @@ The scripts directory is relative to the skill location. If a path fails, fall b
 
 ### 1. List data sources — run FIRST every session
 ```bash
-python scripts/datasources.py
+python scripts/datasources.py --query "<the user's question or task>"
 ```
-Without this you do not know what to search against. Instant, free, cheap.
+Without this you do not know what to search against. Pass the user's question as `--query` so
+the backend returns only the relevant sources, each with a `relevanceReason`. The output tells
+you when sources were omitted, and when filtering was unavailable (the full list is returned
+instead — fail-open). Omit `--query` only when the user asks for the complete inventory.
 
 ### 2. Semantic search — your default discovery tool
 ```bash
@@ -64,7 +67,7 @@ Use after `search.py` or `fetch.py` to expand a call graph, inheritance, or symb
 
 Standard loop, in order:
 
-1. **`datasources.py`** — every session, no exceptions.
+1. **`datasources.py --query "<user's task>"`** — every session, no exceptions. The relevance-filtered shortlist tells you what to search against; if a source you expected is missing, rerun without `--query` to see the full list.
 2. **`search.py`** with the main concept — every session, no exceptions. Run it even when you have a guess; the search confirms or refutes it with real evidence.
 3. **`grep.py`** for specific identifiers, error messages, or config keys surfaced in step 2.
 4. **`fetch.py`** on the most relevant identifiers (descriptions are triage pointers only — never reason from them).

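The discovery step of this loop (relevance-filtered shortlist first, full list when an expected source is missing) can be sketched as a small Python helper. This is a hypothetical illustration, not plugin code: `discover_sources` and the `list_names` callable are assumed names, and `list_names(task)` stands in for running `datasources.py --query "<task>"` (or the plain listing when `task` is `None`) and collecting the source names.

```python
from typing import Callable, Optional

# Hypothetical callable: given a task (or None for the full inventory), return
# the data-source names that datasources.py lists for it.
ListNames = Callable[[Optional[str]], list[str]]


def discover_sources(list_names: ListNames, task: str,
                     expected: list[str]) -> tuple[list[str], list[str]]:
    """Relevance-filtered shortlist first; full list if an expected source is missing.

    Returns (names to search against, expected names the shortlist omitted).
    """
    shortlist = list_names(task)  # stands in for: datasources.py --query "<task>"
    missing = [name for name in expected if name not in shortlist]
    if missing:
        # A source we expected was filtered out, so rerun without --query.
        return list_names(None), missing
    return shortlist, []
```

Injecting `list_names` keeps the fallback rule separate from how the script is invoked, so it can be exercised with a stub.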
diff --git a/skills/codealive-context-engine/SKILL.md b/skills/codealive-context-engine/SKILL.md
@@ -37,7 +37,7 @@ Do NOT retry the failed script until setup completes successfully.
 
 | Tool | Script | Speed | Cost | Best For |
 |------|--------|-------|------|----------|
-| **List Data Sources** | `datasources.py` | Instant | Free | Discovering indexed repos and workspaces |
+| **List Data Sources** | `datasources.py` | Instant (slower with `--query`) | Free (low cost with `--query`) | Discovering indexed repos and workspaces. With `--query "task"`, an AI relevance filter returns only the relevant sources |
 | **Semantic Search** | `search.py` | Fast | Low | Default discovery — finds code by meaning (concepts, behavior, architecture) |
 | **Grep Search** | `grep.py` | Fast | Low | Finds code containing a specific string or regex (identifiers, literals, patterns) |
 | **Fetch Artifacts** | `fetch.py` | Fast | Low | Retrieving full content; function-like artifacts also include up to 3 outgoing/incoming calls as a preview |
@@ -106,9 +106,13 @@ logic.
 ### 1. Discover what's indexed
 
 ```bash
-python scripts/datasources.py
+python scripts/datasources.py --query "the user's task in natural language"
 ```
 
+Recommended: pass the user's task as `--query` so the backend returns only the relevant
+data sources, each with a `relevanceReason`. Omit `--query` to list everything (instant,
+no AI filtering).
+
 ### 2. Search for code (fast, cheap)
 
 ```bash
@@ -151,11 +155,21 @@ python scripts/chat.py "What about security considerations?" --continue CONV_ID
 ### `datasources.py` — List Data Sources
 
 ```bash
-python scripts/datasources.py              # Ready-to-use sources
+python scripts/datasources.py --query "add OAuth to checkout"  # Only sources relevant to a task (recommended)
+python scripts/datasources.py              # Ready-to-use sources (full list)
 python scripts/datasources.py --all        # All (including processing)
 python scripts/datasources.py --json       # JSON output
 ```
 
+| Option | Description |
+|--------|-------------|
+| `--query "TASK"` | The user's task/intent in natural language. The backend runs an AI relevance filter and returns only the relevant sources, each with a `relevanceReason`. Recommended whenever you know what the user is trying to accomplish |
+| `--all` | Include sources still processing |
+| `--json` | Raw JSON output: `{"dataSources": [...], "message": "..."}` when the backend returns a relevance message (typically with `--query`), otherwise a plain array |
+
+**Fail-open:** if relevance filtering is unavailable, the FULL list is returned and the
+output says so — check the message before treating the result as a relevant shortlist.
+
 ### `search.py` — Semantic Code Search (default discovery tool)
 
 The default starting point. Finds code by WHAT it does — concepts, behavior,

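A caller that parses `--json` output has to accept both shapes `format_datasources` can print: the `{"dataSources": [...], "message": "..."}` wrapper when the backend supplied a message, and a bare array otherwise. A minimal Python sketch follows; `load_datasources` and `summarize` are hypothetical helpers, the sample payload is invented, and the `name` key on each source is an assumption (only `type` and `relevanceReason` are read explicitly in the diff).

```python
import json


def load_datasources(raw: str) -> tuple[list[dict], str]:
    """Normalize `datasources.py --json` output to (sources, message).

    The script prints {"dataSources": [...], "message": "..."} when the backend
    supplied a relevance message, and a plain JSON array otherwise.
    """
    data = json.loads(raw)
    if isinstance(data, dict):
        return data.get("dataSources", []), data.get("message", "")
    return data, ""


def summarize(raw: str) -> list[str]:
    """One line per source, message first. `name` is an assumed key."""
    sources, message = load_datasources(raw)
    lines = [f"note: {message}"] if message else []
    for ds in sources:
        reason = ds.get("relevanceReason")  # present only on relevance-filtered results
        name = ds.get("name", "?")
        lines.append(f"{name}: {reason}" if reason else name)
    return lines


if __name__ == "__main__":
    # Hypothetical payload shaped like `datasources.py --query ... --json` output.
    sample = ('{"dataSources": [{"name": "billing-api", "relevanceReason": "Owns checkout"}],'
              ' "message": "example message"}')
    print("\n".join(summarize(sample)))
```

Surfacing `message` first matters because of fail-open: when filtering was unavailable, the array is the full list, and only the message says so.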
diff --git a/skills/codealive-context-engine/references/workflows.md b/skills/codealive-context-engine/references/workflows.md
@@ -20,9 +20,13 @@ Complete workflows for common code exploration scenarios using CodeAlive.
 
 ### Step 1: Discover Available Code
 ```bash
-python datasources.py
+python datasources.py --query "your task in natural language"
 ```
 
+Pass your task as `--query` to get only the relevant data sources, each with a
+`relevanceReason` (recommended when you know the goal). Run plain `python datasources.py`
+for the complete inventory.
+
 Review output to understand:
 - What repositories are indexed
 - What workspaces group related repos
@@ -287,8 +291,8 @@ python grep.py "useMemo|useCallback|React.memo" workspace:all-frontend --regex
 ### Day 1: Get Overview
 
 ```bash
-# Discover what's indexed
-python datasources.py
+# Discover what's indexed (relevance-filtered to the onboarding goal)
+python datasources.py --query "onboard to the new-service codebase"
 
 # Find entry points and main features
 python search.py "main application entry point, startup initialization" new-service

diff --git a/skills/codealive-context-engine/scripts/datasources.py b/skills/codealive-context-engine/scripts/datasources.py
@@ -6,11 +6,16 @@
 Includes current project repos, dependencies, libraries, and organizational codebases.
 
 Usage:
-    python datasources.py              # Show ready-to-use data sources
-    python datasources.py --all        # Show all data sources (including processing)
-    python datasources.py --json       # Output as JSON
+    python datasources.py                  # Show ready-to-use data sources
+    python datasources.py --query "TASK"   # Show only sources relevant to a task (recommended)
+    python datasources.py --all            # Show all data sources (including processing)
+    python datasources.py --json           # Output as JSON
 
 Examples:
+    # RECOMMENDED when you know the task: only sources relevant to it, each with a
+    # relevanceReason explaining the match
+    python datasources.py --query "add OAuth to the checkout flow"
+
     # List ready data sources
     python datasources.py
 
@@ -19,6 +24,10 @@
 
     # Get JSON output for parsing
     python datasources.py --json
+
+Note:
+    --query runs an AI relevance filter on the backend. It fails open: if filtering is
+    unavailable, the FULL list is returned and the output says so.
 """
 
 import sys
@@ -31,17 +40,27 @@
 from api_client import CodeAliveClient
 
 
-def format_datasources(datasources: list, as_json: bool = False) -> str:
-    """Format data sources for display."""
+def format_datasources(datasources: list, as_json: bool = False, message: str = "") -> str:
+    """Format data sources for display.
+
+    `message` is the relevance hint accompanying a --query'd listing: how many sources
+    were omitted as non-relevant, or that filtering was unavailable and the list is full.
+    """
     if as_json:
+        if message:
+            return json.dumps({"dataSources": datasources, "message": message}, indent=2)
         return json.dumps(datasources, indent=2)
 
     if not datasources:
+        if message:
+            return f"No data sources matched.\nℹ️  {message}"
         return "No data sources found.\nAdd repositories at https://app.codealive.ai"
 
     output = []
     output.append(f"\n📚 Available Data Sources ({len(datasources)} total)\n")
     output.append("="*80)
+    if message:
+        output.append(f"\nℹ️  {message}")
 
     # Group by type
     repos = [ds for ds in datasources if ds.get("type") == "Repository"]
@@ -58,6 +77,8 @@ def format_datasources(datasources: list, as_json: bool = False) -> str:
             status = f" [{state}]" if state and state != "Alive" else ""
             output.append(f"\n  📁 {name}{status}")
             output.append(f"     {desc}")
+            if ws.get("relevanceReason"):
+                output.append(f"     🎯 {ws['relevanceReason']}")
 
     if repos:
         output.append("\n\n📦 REPOSITORIES")
@@ -71,6 +92,8 @@ def format_datasources(datasources: list, as_json: bool = False) -> str:
             status = f" [{state}]" if state and state != "Alive" else ""
             output.append(f"\n  📄 {name}{status}")
             output.append(f"     {desc}")
+            if repo.get("relevanceReason"):
+                output.append(f"     🎯 {repo['relevanceReason']}")
             if url:
                 output.append(f"     🔗 {url}")
 
@@ -79,6 +102,7 @@ def format_datasources(datasources: list, as_json: bool = False) -> str:
     output.append("   • Use names with search.py, grep.py, and fetch.py")
     output.append("   • Workspaces search ALL repos in the workspace")
     output.append("   • Combine multiple data sources for broader search")
+    output.append("   • Pass --query 'your task' to list only the relevant sources")
     output.append("\n📖 Examples:")
     output.append("   python search.py 'auth logic' my-backend")
     output.append("   python grep.py 'AuthService' my-backend")
@@ -90,20 +114,37 @@ def main():
     """CLI interface for listing data sources."""
     alive_only = True
     as_json = False
+    query = None
 
-    for arg in sys.argv[1:]:
+    args = sys.argv[1:]
+    i = 0
+    while i < len(args):
+        arg = args[i]
         if arg == "--all":
             alive_only = False
         elif arg == "--json":
             as_json = True
+        elif arg == "--query":
+            if i + 1 >= len(args) or args[i + 1].startswith("--"):
+                print("❌ Error: --query requires a value", file=sys.stderr)
+                sys.exit(1)
+            query = args[i + 1]
+            i += 1
         elif arg == "--help":
             print(__doc__)
             sys.exit(0)
+        i += 1
 
     try:
         client = CodeAliveClient()
-        datasources = client.get_datasources(alive_only=alive_only)
-        print(format_datasources(datasources, as_json))
+        result = client.get_datasources(alive_only=alive_only, query=query)
+        if isinstance(result, dict):
+            datasources = result.get("dataSources", [])
+            message = result.get("message", "")
+        else:
+            datasources = result
+            message = ""
+        print(format_datasources(datasources, as_json, message))
 
     except Exception as e:
         print(f"❌ Error: {e}", file=sys.stderr)
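The hand-rolled loop in `main()` silently ignores unknown arguments and does not accept the `--query=TASK` form. One possible alternative is the standard-library `argparse` module, which handles both and rejects `--query --json` with a usage error instead of taking `--json` as the task. The sketch below is a hypothetical replacement for the parsing step only, not the script's code; `parse_args` is an assumed name.

```python
import argparse
from typing import Optional


def parse_args(argv: Optional[list[str]] = None) -> argparse.Namespace:
    """Hypothetical argparse equivalent of the manual flag loop in main()."""
    parser = argparse.ArgumentParser(
        prog="datasources.py",
        description="List CodeAlive data sources.",
    )
    parser.add_argument("--all", action="store_true",
                        help="include sources still processing")
    parser.add_argument("--json", action="store_true",
                        help="raw JSON output")
    parser.add_argument("--query", metavar="TASK",
                        help="only sources relevant to TASK (AI relevance filter)")
    # Exits with status 2 and a usage message on unknown flags or a missing value.
    return parser.parse_args(argv)
```

`main()` would then read `alive_only = not args.all`, `as_json = args.json`, and `query = args.query`. One trade-off: `--help` prints argparse's generated help rather than the module docstring; passing `description=__doc__` with `formatter_class=argparse.RawDescriptionHelpFormatter` would keep the docstring text.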