48 changes: 39 additions & 9 deletions README.md
@@ -63,6 +63,13 @@ semble search "save_pretrained" ./my-project
semble search "save model to disk" ./my-project --top-k 10
​```

+Use `--content docs` to search documentation and prose (markdown, rst, etc.) instead of code, or `--content all` to search everything:
+
+​```bash
+semble search "deployment guide" ./my-project --content docs
+semble search "authentication" ./my-project --content all
+​```
+
Use `semble find-related` to discover code similar to a known location (pass `file_path` and `line` from a prior search result):

​```bash
@@ -76,9 +83,10 @@ If `semble` is not on `$PATH`, use `uvx --from "semble[mcp]" semble` in its plac
### Workflow

1. Start with `semble search` to find relevant chunks.
-2. Inspect full files only when the returned chunk is not enough context.
-3. Optionally use `semble find-related` with a promising result's `file_path` and `line` to discover related implementations.
-4. Use grep only when you need exhaustive literal matches or quick confirmation of an exact string.
+2. Use `--content docs` when looking for documentation, READMEs, or prose files.
+3. Inspect full files only when the returned chunk is not enough context.
+4. Optionally use `semble find-related` with a promising result's `file_path` and `line` to discover related implementations.
+5. Use grep only when you need exhaustive literal matches or quick confirmation of an exact string.
```

</details>
@@ -287,6 +295,8 @@ Add to `~/.config/zed/settings.json` (or `.zed/settings.json` in your project):
| `search` | Search a codebase with a natural-language or code query. Pass `repo` as a local directory path or an https:// git URL. |
| `find_related` | Given a file path and line number, return chunks semantically similar to the code at that location. |

+By default the MCP server indexes only code files. To also index documentation and prose, append `--content all` (or `--content docs`) to the server command. For example, in Claude Code: `claude mcp add semble -s user -- uvx --from "semble[mcp]" semble --content all`.
+

<a id="bash-agentsmd"></a>

@@ -307,6 +317,13 @@ semble search "save_pretrained" ./my-project
semble search "save model to disk" ./my-project --top-k 10
​```

+Use `--content docs` to search documentation and prose (markdown, rst, etc.) instead of code, or `--content all` to search everything:
+
+​```bash
+semble search "deployment guide" ./my-project --content docs
+semble search "authentication" ./my-project --content all
+​```
+
Use `semble find-related` to discover code similar to a known location (pass `file_path` and `line` from a prior search result):

​```bash
@@ -320,9 +337,10 @@ If `semble` is not on `$PATH`, use `uvx --from "semble[mcp]" semble` in its plac
## Workflow

1. Start with `semble search` to find relevant chunks.
-2. Inspect full files only when the returned chunk is not enough context.
-3. Optionally use `semble find-related` with a promising result's `file_path` and `line` to discover related implementations.
-4. Use grep only when you need exhaustive literal matches or quick confirmation of an exact string.
+2. Use `--content docs` when looking for documentation, READMEs, or prose files.
+3. Inspect full files only when the returned chunk is not enough context.
+4. Optionally use `semble find-related` with a promising result's `file_path` and `line` to discover related implementations.
+5. Use grep only when you need exhaustive literal matches or quick confirmation of an exact string.
```

### Sub-agent setup
@@ -357,11 +375,17 @@ semble search "save model to disk" https://github.com/MinishLab/model2vec
# Limit results
semble search "save model to disk" ./my-project --top-k 10

+# Search docs and prose (markdown, rst, etc.) instead of code
+semble search "deployment guide" ./my-project --content docs
+
+# Search everything (code and docs)
+semble search "authentication" ./my-project --content all
+
# Find code similar to a known location
semble find-related src/auth.py 42 ./my-project
```

-`path` defaults to the current directory when omitted; git URLs are accepted. If `semble` is not on `$PATH`, use `uvx --from "semble[mcp]" semble` in its place.
+`--content` accepts `code` (default), `docs`, or `all`. `path` defaults to the current directory when omitted; git URLs are accepted. If `semble` is not on `$PATH`, use `uvx --from "semble[mcp]" semble` in its place.
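
A minimal sketch of how a three-valued `--content` flag like this can be parsed and validated with `argparse`, assuming a `ContentType` enum whose values are `code`, `docs`, and `all`. The enum below is a stand-in written for this example (the real one lives in `semble.types` and is not shown in this diff), and `build_parser` and `parse_content` are hypothetical helper names:

```python
import argparse
from enum import Enum


class ContentType(str, Enum):
    """Stand-in for semble.types.ContentType (assumed shape, not the real class)."""

    CODE = "code"
    DOCS = "docs"
    ALL = "all"


def build_parser() -> argparse.ArgumentParser:
    """Build a toy parser that mirrors the query/path/--content arguments."""
    parser = argparse.ArgumentParser(prog="demo-search")
    parser.add_argument("query")
    parser.add_argument("path", nargs="?", default=".")
    parser.add_argument(
        "--content",
        default=ContentType.CODE.value,
        # Deriving the choices from the enum keeps the CLI and the enum in sync.
        choices=[ct.value for ct in ContentType],
    )
    return parser


def parse_content(argv: list[str]) -> ContentType:
    """Parse argv and convert the validated string back into the enum."""
    args = build_parser().parse_args(argv)
    return ContentType(args.content)


if __name__ == "__main__":
    print(parse_content(["deployment guide", "./my-project", "--content", "docs"]))
```

Any value outside the enum is rejected by `argparse` before the conversion runs, so `ContentType(args.content)` should not raise in this sketch.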

<details>
<summary>Savings</summary>
@@ -395,11 +419,17 @@ Stats are stored in `~/.semble/savings.jsonl`.
Semble can also be used as a Python library for programmatic access, useful when building custom tooling or integrating search directly into your own code.

```python
-from semble import SembleIndex
+from semble import ContentType, SembleIndex

-# Index a local directory
+# Index a local directory (code only, the default)
index = SembleIndex.from_path("./my-project")

+# Index docs and prose (markdown, rst, etc.)
+index = SembleIndex.from_path("./my-project", content=ContentType.DOCS)
+
+# Index everything — code and docs
+index = SembleIndex.from_path("./my-project", content=ContentType.ALL)
+
# Index a remote git repository
index = SembleIndex.from_git("https://github.com/MinishLab/model2vec")

3 changes: 2 additions & 1 deletion src/semble/__init__.py
@@ -1,9 +1,10 @@
 from semble.index import SembleIndex
-from semble.types import Chunk, EmbeddingMatrix, Encoder, IndexStats, SearchResult
+from semble.types import Chunk, ContentType, EmbeddingMatrix, Encoder, IndexStats, SearchResult
 from semble.version import __version__

 __all__ = [
     "Chunk",
+    "ContentType",
     "EmbeddingMatrix",
     "Encoder",
     "IndexStats",
14 changes: 11 additions & 3 deletions src/semble/agents/claude.md
@@ -12,6 +12,13 @@ semble search "save_pretrained" ./my-project
semble search "save model to disk" ./my-project --top-k 10
```

+Use `--content docs` to search documentation and prose (markdown, rst, etc.) instead of code, or `--content all` to search everything:
+
+```bash
+semble search "deployment guide" ./my-project --content docs
+semble search "authentication" ./my-project --content all
+```
+
Use `semble find-related` to discover code similar to a known location (pass `file_path` and `line` from a prior search result):

```bash
@@ -25,6 +32,7 @@ If `semble` is not on `$PATH`, use `uvx --from "semble[mcp]" semble` in its plac
## Workflow

1. Start with `semble search` to find relevant chunks.
-2. Inspect full files only when the returned chunk is not enough context.
-3. Optionally use `semble find-related` with a promising result's `file_path` and `line` to discover related implementations.
-4. Use grep only when you need exhaustive literal matches or quick confirmation of an exact string.
+2. Use `--content docs` when looking for documentation, READMEs, or prose files.
+3. Inspect full files only when the returned chunk is not enough context.
+4. Optionally use `semble find-related` with a promising result's `file_path` and `line` to discover related implementations.
+5. Use grep only when you need exhaustive literal matches or quick confirmation of an exact string.
14 changes: 11 additions & 3 deletions src/semble/agents/copilot.md
@@ -12,6 +12,13 @@ semble search "save_pretrained" ./my-project
semble search "save model to disk" ./my-project --top-k 10
```

+Use `--content docs` to search documentation and prose (markdown, rst, etc.) instead of code, or `--content all` to search everything:
+
+```bash
+semble search "deployment guide" ./my-project --content docs
+semble search "authentication" ./my-project --content all
+```
+
Use `semble find-related` to discover code similar to a known location (pass `file_path` and `line` from a prior search result):

```bash
@@ -25,6 +32,7 @@ If `semble` is not on `$PATH`, use `uvx --from "semble[mcp]" semble` in its plac
## Workflow

1. Start with `semble search` to find relevant chunks.
-2. Inspect full files only when the returned chunk is not enough context.
-3. Optionally use `semble find-related` with a promising result's `file_path` and `line` to discover related implementations.
-4. Use grep only when you need exhaustive literal matches or quick confirmation of an exact string.
+2. Use `--content docs` when looking for documentation, READMEs, or prose files.
+3. Inspect full files only when the returned chunk is not enough context.
+4. Optionally use `semble find-related` with a promising result's `file_path` and `line` to discover related implementations.
+5. Use grep only when you need exhaustive literal matches or quick confirmation of an exact string.
14 changes: 11 additions & 3 deletions src/semble/agents/cursor.md
@@ -11,6 +11,13 @@ semble search "save_pretrained" ./my-project
semble search "save model to disk" ./my-project --top-k 10
```

+Use `--content docs` to search documentation and prose (markdown, rst, etc.) instead of code, or `--content all` to search everything:
+
+```bash
+semble search "deployment guide" ./my-project --content docs
+semble search "authentication" ./my-project --content all
+```
+
Use `semble find-related` to discover code similar to a known location (pass `file_path` and `line` from a prior search result):

```bash
@@ -24,6 +31,7 @@ If `semble` is not on `$PATH`, use `uvx --from "semble[mcp]" semble` in its plac
## Workflow

1. Start with `semble search` to find relevant chunks.
-2. Inspect full files only when the returned chunk is not enough context.
-3. Optionally use `semble find-related` with a promising result's `file_path` and `line` to discover related implementations.
-4. Use grep only when you need exhaustive literal matches or quick confirmation of an exact string.
+2. Use `--content docs` when looking for documentation, READMEs, or prose files.
+3. Inspect full files only when the returned chunk is not enough context.
+4. Optionally use `semble find-related` with a promising result's `file_path` and `line` to discover related implementations.
+5. Use grep only when you need exhaustive literal matches or quick confirmation of an exact string.
14 changes: 11 additions & 3 deletions src/semble/agents/gemini.md
@@ -14,6 +14,13 @@ semble search "save_pretrained" ./my-project
semble search "save model to disk" ./my-project --top-k 10
```

+Use `--content docs` to search documentation and prose (markdown, rst, etc.) instead of code, or `--content all` to search everything:
+
+```bash
+semble search "deployment guide" ./my-project --content docs
+semble search "authentication" ./my-project --content all
+```
+
Use `semble find-related` to discover code similar to a known location (pass `file_path` and `line` from a prior search result):

```bash
@@ -27,6 +34,7 @@ If `semble` is not on `$PATH`, use `uvx --from "semble[mcp]" semble` in its plac
## Workflow

1. Start with `semble search` to find relevant chunks.
-2. Inspect full files only when the returned chunk is not enough context.
-3. Optionally use `semble find-related` with a promising result's `file_path` and `line` to discover related implementations.
-4. Use grep only when you need exhaustive literal matches or quick confirmation of an exact string.
+2. Use `--content docs` when looking for documentation, READMEs, or prose files.
+3. Inspect full files only when the returned chunk is not enough context.
+4. Optionally use `semble find-related` with a promising result's `file_path` and `line` to discover related implementations.
+5. Use grep only when you need exhaustive literal matches or quick confirmation of an exact string.
14 changes: 11 additions & 3 deletions src/semble/agents/kiro.md
@@ -14,6 +14,13 @@ semble search "save_pretrained" ./my-project
semble search "save model to disk" ./my-project --top-k 10
```

+Use `--content docs` to search documentation and prose (markdown, rst, etc.) instead of code, or `--content all` to search everything:
+
+```bash
+semble search "deployment guide" ./my-project --content docs
+semble search "authentication" ./my-project --content all
+```
+
Use `semble find-related` to discover code similar to a known location (pass `file_path` and `line` from a prior search result):

```bash
@@ -27,6 +34,7 @@ If `semble` is not on `$PATH`, use `uvx --from "semble[mcp]" semble` in its plac
## Workflow

1. Start with `semble search` to find relevant chunks.
-2. Inspect full files only when the returned chunk is not enough context.
-3. Optionally use `semble find-related` with a promising result's `file_path` and `line` to discover related implementations.
-4. Use grep only when you need exhaustive literal matches or quick confirmation of an exact string.
+2. Use `--content docs` when looking for documentation, READMEs, or prose files.
+3. Inspect full files only when the returned chunk is not enough context.
+4. Optionally use `semble find-related` with a promising result's `file_path` and `line` to discover related implementations.
+5. Use grep only when you need exhaustive literal matches or quick confirmation of an exact string.
14 changes: 11 additions & 3 deletions src/semble/agents/opencode.md
@@ -15,6 +15,13 @@ semble search "save_pretrained" ./my-project
semble search "save model to disk" ./my-project --top-k 10
```

+Use `--content docs` to search documentation and prose (markdown, rst, etc.) instead of code, or `--content all` to search everything:
+
+```bash
+semble search "deployment guide" ./my-project --content docs
+semble search "authentication" ./my-project --content all
+```
+
Use `semble find-related` to discover code similar to a known location (pass `file_path` and `line` from a prior search result):

```bash
@@ -28,6 +35,7 @@ If `semble` is not on `$PATH`, use `uvx --from "semble[mcp]" semble` in its plac
## Workflow

1. Start with `semble search` to find relevant chunks.
-2. Inspect full files only when the returned chunk is not enough context.
-3. Optionally use `semble find-related` with a promising result's `file_path` and `line` to discover related implementations.
-4. Use grep only when you need exhaustive literal matches or quick confirmation of an exact string.
+2. Use `--content docs` when looking for documentation, READMEs, or prose files.
+3. Inspect full files only when the returned chunk is not enough context.
+4. Optionally use `semble find-related` with a promising result's `file_path` and `line` to discover related implementations.
+5. Use grep only when you need exhaustive literal matches or quick confirmation of an exact string.
58 changes: 39 additions & 19 deletions src/semble/cli.py
@@ -1,6 +1,7 @@
 import argparse
 import asyncio
 import sys
+import warnings
 from enum import Enum
 from importlib.resources import files
 from importlib.util import find_spec
@@ -10,8 +11,11 @@

 from semble.index import SembleIndex
 from semble.stats import format_savings_report
+from semble.types import ContentType
 from semble.utils import _format_results, _is_git_url, _resolve_chunk

+_CONTENT_CHOICES = [ct.value for ct in ContentType]
+

 class Agent(str, Enum):
     CLAUDE = "claude"
@@ -32,6 +36,21 @@ def _agent_path(agent: Agent) -> Path:
     return Path(base_dir) / "agents" / "semble-search.md"


+def _add_content_args(p: argparse.ArgumentParser) -> None:
+    """Add --content and deprecated --include-text-files to a subparser."""
+    p.add_argument(
+        "--content",
+        default=ContentType.CODE.value,
+        choices=_CONTENT_CHOICES,
+        help="Content type to index: 'code' (default), 'docs', or 'all'.",
+    )
+    p.add_argument(
+        "--include-text-files",
+        action="store_true",
+        help="Deprecated. Use --content all instead.",
+    )
+
+
 def main() -> None:
     """Entry point for the semble command-line tool."""
     if len(sys.argv) > 1 and sys.argv[1] in _CLI_DISPATCH_ARGS:
@@ -52,18 +71,15 @@ def _mcp_main() -> None:
         help="Local directory or git URL to pre-index at startup (optional).",
     )
     parser.add_argument("--ref", default=None, help="Branch or tag to check out (git URLs only).")
-    parser.add_argument(
-        "--include-text-files",
-        action="store_true",
-        help="Also index non-code text files (.md, .yaml, .json, etc.).",
-    )
+    _add_content_args(parser)
     args = parser.parse_args()
     if any(find_spec(dep) is None for dep in get_package_extras("semble", "mcp")):
         print("MCP dependencies are not installed. Run: pip install 'semble[mcp]'", file=sys.stderr)
         raise SystemExit(1)
     from semble.mcp import serve

-    asyncio.run(serve(args.path, ref=args.ref, include_text_files=args.include_text_files))
+    content = _resolve_content(args.content, args.include_text_files)
+    asyncio.run(serve(args.path, ref=args.ref, content=content))


 def _run_init(*, agent: Agent = _DEFAULT_AGENT, force: bool = False) -> None:
@@ -78,6 +94,18 @@ def _run_init(*, agent: Agent = _DEFAULT_AGENT, force: bool = False) -> None:
     print(f"Created {dest}")


+def _resolve_content(content_arg: str, include_text_files: bool) -> ContentType:
+    """Resolve --content and the deprecated --include-text-files into a ContentType."""
+    if include_text_files:
+        warnings.warn(
+            "--include-text-files is deprecated and will be removed in a future version. Use --content all instead.",
+            DeprecationWarning,
+            stacklevel=2,
+        )
+        return ContentType.ALL
+    return ContentType(content_arg)
+
+
 def _cli_main() -> None:
     parser = argparse.ArgumentParser(prog="semble")
     sub = parser.add_subparsers(dest="command")
@@ -86,22 +114,14 @@ def _cli_main() -> None:
     search_p.add_argument("query", help="Natural language or code query.")
     search_p.add_argument("path", nargs="?", default=".", help="Local path or git URL (default: current directory).")
     search_p.add_argument("-k", "--top-k", type=int, default=5, help="Number of results (default: 5).")
-    search_p.add_argument(
-        "--include-text-files",
-        action="store_true",
-        help="Also index non-code text files (.md, .yaml, .json, etc.).",
-    )
+    _add_content_args(search_p)

     related_p = sub.add_parser("find-related", help="Find code similar to a specific location.")
     related_p.add_argument("file_path", help="File path as shown in search results.")
     related_p.add_argument("line", type=int, help="Line number (1-indexed).")
     related_p.add_argument("path", nargs="?", default=".", help="Local path or git URL (default: current directory).")
     related_p.add_argument("-k", "--top-k", type=int, default=5, help="Number of results (default: 5).")
-    related_p.add_argument(
-        "--include-text-files",
-        action="store_true",
-        help="Also index non-code text files (.md, .yaml, .json, etc.).",
-    )
+    _add_content_args(related_p)

     init_p = sub.add_parser("init", help="Write a semble sub-agent file for your coding agent.")
     init_p.add_argument(
@@ -126,11 +146,11 @@ def _cli_main() -> None:
         print(format_savings_report(verbose=args.verbose), end="")
         return

-    include_text = args.include_text_files
+    content = _resolve_content(args.content, args.include_text_files)
     index = (
-        SembleIndex.from_git(args.path, include_text_files=include_text)
+        SembleIndex.from_git(args.path, content=content)
         if _is_git_url(args.path)
-        else SembleIndex.from_path(args.path, include_text_files=include_text)
+        else SembleIndex.from_path(args.path, content=content)
     )

     if args.command == "search":
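
The precedence between `--content` and the deprecated `--include-text-files` can be checked in isolation. The sketch below re-creates the logic of `_resolve_content` from this diff against a stand-in `ContentType` (its shape is an assumption, since `semble.types` is not shown here). `resolve_content` and `demo` are names made up for this example:

```python
import warnings
from enum import Enum


class ContentType(str, Enum):
    """Stand-in for semble.types.ContentType (assumed shape, not the real class)."""

    CODE = "code"
    DOCS = "docs"
    ALL = "all"


def resolve_content(content_arg: str, include_text_files: bool) -> ContentType:
    """Mirror of the diff's _resolve_content: the deprecated flag takes precedence."""
    if include_text_files:
        warnings.warn(
            "--include-text-files is deprecated. Use --content all instead.",
            DeprecationWarning,
            stacklevel=2,
        )
        return ContentType.ALL
    return ContentType(content_arg)


def demo() -> tuple[ContentType, list[type]]:
    """Resolve a conflicting pair of flags and record which warnings fired."""
    with warnings.catch_warnings(record=True) as caught:
        # Force every warning through so the deprecation is observable here.
        warnings.simplefilter("always")
        result = resolve_content("docs", include_text_files=True)
    return result, [w.category for w in caught]


if __name__ == "__main__":
    print(demo())
```

In this sketch the deprecated flag wins even when `--content docs` is passed explicitly, which matches the order of the checks in the diff. Because the warning is a `DeprecationWarning` attributed through `stacklevel=2` to a caller that is usually not `__main__`, Python's default warning filters may hide it from CLI users unless warnings are enabled, for example with `-W default`.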