
Codebase Search

Build a persistent semantic vector index over a Python codebase and search it with natural language. Use when an agent needs to find relevant classes and functions.
ryno2390 · Source
Uncategorized · clawhub · v1.0.0 · 1 version · 100000 · Key: not required
★ 0 Stars · 📥 319 Downloads · 💾 0 Installs · 1 Version
#agentic-coding#chromadb#latest#python#semantic-search

Overview


Builds a persistent ChromaDB vector index over Python source files and enables semantic search with natural language queries.

Quick Start

1. Install the scripts

Copy scripts/code_chunker.py and scripts/code_index.py into your project. They have no dependencies beyond chromadb (install with pip install chromadb).

2. Build the index

import asyncio
from code_index import CodebaseIndex

index = CodebaseIndex(repo_root="/path/to/repo")
count = asyncio.run(index.build())
print(f"Indexed {count} symbols")

The index persists to {repo_root}/.codebase_index/ and survives restarts. Subsequent calls to build() are fast — only new/changed files are indexed.
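The incremental behaviour described above can be approximated with a manifest of per-file content hashes. The sketch below shows one possible approach and is not code from code_index.py: `changed_files`, `file_digest`, and the `manifest.json` filename are all hypothetical.

```python
import hashlib
import json
from pathlib import Path


def file_digest(path: Path) -> str:
    """SHA-256 of a file's bytes, used to detect content changes."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def changed_files(repo_root: Path, index_dir: Path) -> list[Path]:
    """Return .py files that are new or changed since the last recorded manifest.

    Hypothetical sketch, not the actual logic in code_index.py: the manifest lives
    at index_dir / "manifest.json" and maps repo-relative paths to content hashes.
    It is rewritten on every call.
    """
    manifest_path = index_dir / "manifest.json"
    old = json.loads(manifest_path.read_text()) if manifest_path.exists() else {}
    new: dict[str, str] = {}
    changed: list[Path] = []
    for path in sorted(repo_root.rglob("*.py")):
        if index_dir in path.parents:
            continue  # never treat files inside the index directory as source
        rel = path.relative_to(repo_root).as_posix()
        new[rel] = file_digest(path)
        if old.get(rel) != new[rel]:
            changed.append(path)  # new file, or content differs from last run
    index_dir.mkdir(parents=True, exist_ok=True)
    manifest_path.write_text(json.dumps(new, indent=2))
    return changed
```

Calling it twice in a row returns an empty list the second time. A fuller version would also need to drop index entries for deleted files, which this sketch ignores.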

3. Search

results = asyncio.run(index.search("token payment handling", top_k=5))
for r in results:
    print(f"[{r.score:.2f}] {r.symbol_name} ({r.symbol_type}) — {r.filepath}:{r.start_line}")
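For intuition about the `score` column, here is a toy sketch of cosine-similarity ranking. It is hypothetical: hand-written 2-D vectors stand in for real embeddings, and `SearchResult` and `rank` only reuse the attribute names printed in the example above. In the actual index, ChromaDB embeds the text and runs the similarity query itself, and how its distances map to `score` is not specified here.

```python
import math
from dataclasses import dataclass


@dataclass
class SearchResult:
    # Hypothetical stand-in; field names mirror the attributes printed above.
    score: float
    symbol_name: str
    symbol_type: str
    filepath: str
    start_line: int


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity: dot(a, b) / (|a| * |b|), or 0.0 for a zero vector."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rank(query_vec, symbols, top_k=5):
    """Score each (vector, name, type, filepath, line) tuple against the query, best first."""
    results = [
        SearchResult(cosine(query_vec, vec), name, kind, path, line)
        for vec, name, kind, path, line in symbols
    ]
    results.sort(key=lambda r: r.score, reverse=True)
    return results[:top_k]
```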

Convenience API (when integrated into a project)

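If code_chunker.py and code_index.py are packaged as a module inside your project, use the singleton helper. The import path below is specific to the project it was taken from; adjust it to your own package layout: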

from prsm.compute.nwtn.corpus import search_codebase

# Must be awaited inside an async function (or an async-aware REPL/notebook)
results = await search_codebase("circuit breaker", top_k=3)

Key Options

| Parameter | Default | Description |
|---|---|---|
| top_k | 5 | Number of results to return |
| symbol_type | None | Filter to "class" or "function" |
| force_rebuild | False | Wipe and rebuild the entire index |
| exclude_patterns | see below | Directories to skip |

Default excludes: __pycache__, .venv, migrations, tests, scripts, .git, node_modules, .codebase_index
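A minimal sketch of how directory-name excludes like these might be applied while walking the repo. It assumes each pattern matches a whole directory component (so `tests` skips `pkg/tests/test_core.py` but not `pkg/contests.py`); whether code_index.py matches patterns the same way is not stated, and `iter_python_files` is a hypothetical name.

```python
from pathlib import Path

# Mirrors the default exclude list above.
DEFAULT_EXCLUDES = (
    "__pycache__", ".venv", "migrations", "tests", "scripts",
    ".git", "node_modules", ".codebase_index",
)


def iter_python_files(repo_root, exclude_patterns=DEFAULT_EXCLUDES):
    """Yield .py files under repo_root, skipping any file inside an excluded directory."""
    root = Path(repo_root)
    excluded = set(exclude_patterns)
    for path in sorted(root.rglob("*.py")):
        # Compare directory components only (relative to the root), not the file name.
        dirs = path.relative_to(root).parts[:-1]
        if not excluded.intersection(dirs):
            yield path
```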

How It Works

  1. Chunking: CodeChunker uses Python's ast module to extract every top-level class and function from each .py file, capturing its name, type, docstring, line numbers, and source.
  2. Indexing: ChromaDB stores each chunk as a document of the form "{symbol_name}: {docstring or first 300 chars of source}", embedded with ChromaDB's default embedding function (no API key needed).
  3. Search: a cosine-similarity query returns ranked SearchResult objects carrying the filepath, symbol name, line numbers, docstring, and relevance score.
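The chunking and document-construction steps can be sketched with the standard library alone. This is an illustrative approximation, assuming "top-level" means direct children of the module (so methods stay inside their class chunk) and that async functions count as functions; `Chunk` and `chunk_source` are hypothetical names, not the CodeChunker API.

```python
import ast
from dataclasses import dataclass
from typing import Optional


@dataclass
class Chunk:
    # Hypothetical record; field names are illustrative, not CodeChunker's actual API.
    symbol_name: str
    symbol_type: str  # "class" or "function"
    docstring: Optional[str]
    start_line: int
    end_line: int
    source: str

    def document(self) -> str:
        """Text to embed: "{symbol_name}: {docstring or first 300 chars of source}"."""
        return f"{self.symbol_name}: {self.docstring or self.source[:300]}"


def chunk_source(source: str) -> list[Chunk]:
    """Extract every top-level class and function (sync or async) from Python source."""
    tree = ast.parse(source)
    chunks = []
    for node in tree.body:  # tree.body holds only module-level statements
        if isinstance(node, ast.ClassDef):
            kind = "class"
        elif isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            kind = "function"
        else:
            continue  # skip assignments, imports, etc.
        chunks.append(Chunk(
            symbol_name=node.name,
            symbol_type=kind,
            docstring=ast.get_docstring(node),
            start_line=node.lineno,
            end_line=node.end_lineno,
            source=ast.get_source_segment(source, node) or "",
        ))
    return chunks
```

Note that `ast.get_source_segment` returns the node's own text, so decorators are not included in `source` here.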

.gitignore

Always add .codebase_index/ to .gitignore — it's a local artifact, not source code.
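If you want to automate that step, a small idempotent helper is enough. `ensure_gitignored` is a hypothetical name, and it only recognises an exact `.codebase_index/` line, so an equivalent pattern written differently (for example without the trailing slash) is not detected.

```python
from pathlib import Path


def ensure_gitignored(repo_root, entry: str = ".codebase_index/") -> bool:
    """Append entry to repo_root/.gitignore unless it is already listed.

    Returns True if the entry was added, False if it was already present.
    """
    gitignore = Path(repo_root) / ".gitignore"
    text = gitignore.read_text() if gitignore.exists() else ""
    if entry in (line.strip() for line in text.splitlines()):
        return False
    # Keep the file newline-terminated before appending the new line.
    if text and not text.endswith("\n"):
        text += "\n"
    gitignore.write_text(text + entry + "\n")
    return True
```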

Reference

See references/integration.md for integration patterns, including how to wire semantic search into sub-agent delegation prompts.

Version History

1 version in total

  • v1.0.0 (current)
    2026-05-07 06:14 · Safe

Security Scans

Tencent Cloud Security (Keen)

Safe, no risk

Tencent Cloud Security (Sanbu)

Safe, no risk

🔗 Related

dev-programming

Mcporter

steipete
Use the mcporter CLI to list, configure, authenticate, and call MCP servers/tools directly (HTTP or stdio), including ad-hoc servers, config editing, and CLI/type generation.
★ 195 📥 67,825
dev-programming

Github

steipete
Interact with GitHub using the `gh` CLI: manage issues, PRs, CI runs, and advanced queries via `gh issue`, `gh pr`, `gh run`, and `gh api`.
★ 680 📥 328,565
dev-programming

CodeConductor.ai

larsonreever
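AI-powered platform for building scalable products, offering rapid full-stack development, agents, workflow automation, and low-code AI integration.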
★ 73 📥 182,193