
Search Engine

Design and build any search engine with robust indexing, retrieval logic, relevance controls, and evaluation workflows for production systems.
ivangdavila
Content Creation · clawhub · v1.0.0 · 1 version · 100000 · Key: not required
★ 1 star · 📥 777 downloads · 💾 28 installs · 1 version · #latest

Overview

Setup

On first use, read setup.md and establish activation behavior, system scope, and data constraints before proposing implementation steps.

When to Use

The user needs to create, redesign, or scale a search engine for applications, documentation, products, or internal knowledge bases. The agent handles architecture planning, indexing strategy, retrieval design, relevance controls, evaluation loops, and rollout safety.

Architecture

Memory lives in ~/search-engine/. See memory-template.md for baseline structure and status values.

~/search-engine/
|-- memory.md              # Persistent context, constraints, and active priorities
|-- requirements.md        # Retrieval goals, latency targets, and relevance expectations
|-- experiments.md         # Offline experiments and tuning decisions
`-- incidents.md           # Production issues, root cause, and remediation notes
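A minimal Python sketch of bootstrapping this folder on first use, assuming the four file names shown in the tree. The function name init_memory and the one-line file headers are hypothetical placeholders, not the contents of memory-template.md. Existing files are never overwritten, so running it twice is safe.

```python
from pathlib import Path

# File names come from the tree above; the header text is a hypothetical placeholder.
MEMORY_FILES = {
    "memory.md": "# Memory\n\nPersistent context, constraints, and active priorities.\n",
    "requirements.md": "# Requirements\n\nRetrieval goals, latency targets, and relevance expectations.\n",
    "experiments.md": "# Experiments\n\nOffline experiments and tuning decisions.\n",
    "incidents.md": "# Incidents\n\nProduction issues, root cause, and remediation notes.\n",
}


def init_memory(base: Path = Path.home() / "search-engine") -> list[Path]:
    """Create the memory folder and any missing files; never overwrite existing notes."""
    base.mkdir(parents=True, exist_ok=True)
    created = []
    for name, header in MEMORY_FILES.items():
        path = base / name
        if not path.exists():
            path.write_text(header, encoding="utf-8")
            created.append(path)
    return created
```

The first call returns the four created paths; later calls return an empty list and leave edited notes untouched.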

Quick Reference

Use the smallest relevant file for the task.

Topic | File
----- | ----
Setup and activation behavior | setup.md
Memory template and status model | memory-template.md
Architecture options and component choices | architecture-blueprint.md
Retrieval and ranking strategy patterns | retrieval-patterns.md
Quality measurement and evaluation loops | evaluation-metrics.md
Delivery and rollout gates | implementation-checklist.md

Data Storage

Local notes stay in ~/search-engine/:

  • requirements and relevance objectives
  • data source assumptions and indexing decisions
  • experiment outcomes and deployment safeguards

Core Rules

1. Start with a Retrieval Contract, Not with Tools

Before selecting engines, define the contract:

  • query types to support (keyword, phrase, semantic, hybrid)
  • response format, latency budget, and freshness target
  • error tolerance and fallback behavior

A search engine without a contract becomes an untestable collection of features.
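One way to make the contract testable is to write it down as data. This is a sketch under stated assumptions: the RetrievalContract class, its field names, the p95 latency framing, and the "lexical_only" fallback value are hypothetical, not a schema defined by this skill.

```python
from dataclasses import dataclass
from enum import Enum


class QueryType(Enum):
    KEYWORD = "keyword"
    PHRASE = "phrase"
    SEMANTIC = "semantic"
    HYBRID = "hybrid"


@dataclass(frozen=True)
class RetrievalContract:
    """Hypothetical contract record; field names are illustrative, not a standard schema."""
    query_types: frozenset[QueryType]
    response_fields: tuple[str, ...]   # response format: fields every hit must carry
    p95_latency_ms: int                # latency budget at the 95th percentile
    max_staleness_s: int               # freshness target: max age of indexed data
    max_error_rate: float              # error tolerance: fraction of failed queries
    fallback: str = "lexical_only"     # hypothetical behavior when a stage fails or times out

    def violations(self, p95_ms: float, staleness_s: float, error_rate: float) -> list[str]:
        """List the clauses an observed run breaks; an empty list means compliant."""
        broken = []
        if p95_ms > self.p95_latency_ms:
            broken.append(f"p95 latency {p95_ms} ms > {self.p95_latency_ms} ms")
        if staleness_s > self.max_staleness_s:
            broken.append(f"staleness {staleness_s} s > {self.max_staleness_s} s")
        if error_rate > self.max_error_rate:
            broken.append(f"error rate {error_rate} > {self.max_error_rate}")
        return broken
```

Each broken clause maps to a failing check or alert, which is what turns the feature list into something testable.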

2. Design Ingestion and Indexing as a Deterministic Pipeline

Every document should pass explicit stages:

  • source validation and deduplication at ingestion
  • normalization and field extraction
  • chunking policy with stable identifiers
  • indexing with repeatable transforms

Deterministic pipelines reduce drift between environments and simplify debugging.
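A minimal deterministic pipeline sketch in Python. Assumptions: raw records carry id and body fields, normalization means Unicode NFKC plus case folding plus whitespace collapsing, deduplication uses a SHA-256 hash of the normalized body, and chunks are fixed word windows (50 words, 10 overlap) whose IDs derive only from the document ID and chunk position. All of these are illustrative choices, not the skill's prescribed transforms.

```python
import hashlib
import unicodedata


def normalize(text: str) -> str:
    """Unicode NFKC, case folding, and whitespace collapsing: same input, same output."""
    return " ".join(unicodedata.normalize("NFKC", text).casefold().split())


def content_hash(text: str) -> str:
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def chunk(doc_id: str, text: str, size: int = 50, overlap: int = 10) -> list[dict]:
    """Fixed word windows; chunk ids depend only on doc_id and position."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be in [0, size)")
    words = text.split()
    chunks = []
    for n, start in enumerate(range(0, len(words), size - overlap)):
        piece = " ".join(words[start:start + size])
        chunks.append({"id": f"{doc_id}#{n}", "text": piece, "hash": content_hash(piece)})
        if start + size >= len(words):
            break  # this window already reaches the end of the text
    return chunks


def run_pipeline(raw_docs: list[dict]) -> list[dict]:
    """Validate, normalize, deduplicate, and chunk records in a fixed order."""
    seen: set[str] = set()
    out: list[dict] = []
    for doc in sorted(raw_docs, key=lambda d: str(d.get("id", ""))):
        body = normalize(doc.get("body") or "")
        if not doc.get("id") or not body:
            continue  # validation: drop records missing an id or any text
        digest = content_hash(body)
        if digest in seen:
            continue  # deduplication: identical normalized text is indexed once
        seen.add(digest)
        out.extend(chunk(str(doc["id"]), body))
    return out
```

Because the input is sorted and every transform is a pure function, running the pipeline twice, or in two environments, yields identical chunks and IDs.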

3. Separate Recall Layers from Precision Layers

Treat retrieval as a staged system:

  • broad candidate retrieval first (lexical, vector, or hybrid)
  • reranking and business rules second
  • formatting and explanation last

Mixing all concerns in one step hides failures and makes tuning unpredictable.
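The staged layout can be sketched in plain Python with toy data. Assumptions: term-overlap counting stands in for a real lexical scorer such as BM25, brute-force cosine similarity over precomputed vectors stands in for a vector index, reciprocal rank fusion (with the commonly used default k = 60) merges the two candidate lists, and the in_stock filter is a hypothetical business rule. Each stage is a separate function, so a missing result can be traced to recall or to reranking.

```python
import math


def lexical_recall(query: str, docs: list[dict], k: int) -> list[str]:
    """Recall layer (lexical): count query-term occurrences; a crude stand-in for BM25."""
    terms = set(query.lower().split())
    scored = [(sum(t in terms for t in d["text"].lower().split()), d["id"]) for d in docs]
    hits = [(s, doc_id) for s, doc_id in scored if s > 0]
    return [doc_id for _, doc_id in sorted(hits, key=lambda x: (-x[0], x[1]))][:k]


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def vector_recall(query_vec: list[float], docs: list[dict], k: int) -> list[str]:
    """Recall layer (vector): brute-force nearest neighbours by cosine similarity."""
    ranked = sorted(docs, key=lambda d: (-cosine(query_vec, d["vec"]), d["id"]))
    return [d["id"] for d in ranked[:k]]


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Hybrid merge: score(d) = sum over lists of 1 / (k + rank of d)."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=lambda d: (-scores[d], d))


def rerank(candidates: list[str], docs_by_id: dict[str, dict], limit: int) -> list[str]:
    """Precision layer: apply business rules (hypothetical: hide out-of-stock items)."""
    return [c for c in candidates if docs_by_id[c].get("in_stock", True)][:limit]


def search(query: str, query_vec: list[float], docs: list[dict],
           recall_k: int = 100, limit: int = 10) -> list[dict]:
    docs_by_id = {d["id"]: d for d in docs}
    fused = reciprocal_rank_fusion([lexical_recall(query, docs, recall_k),
                                    vector_recall(query_vec, docs, recall_k)])
    ranked = rerank(fused, docs_by_id, limit)
    # Formatting layer: shape the response; explanations would be attached here.
    return [{"id": i, "snippet": docs_by_id[i]["text"][:40], "rank": n}
            for n, i in enumerate(ranked, start=1)]
```

In this toy run, a document with zero vector similarity still reaches the candidate list because the vector stage has no similarity cutoff; a production recall layer would usually add one.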

4. Define Relevance Features as Versioned Policy

Relevance changes must be tracked as policy versions:

  • feature weights and boosts
  • typo tolerance and synonym policy
  • filtering, faceting, and tie-break rules

Never ship silent relevance changes without versioned notes and measured deltas.
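A sketch of relevance settings held as a versioned, content-hashed record. RelevancePolicy, PolicyRegistry, and all field names are hypothetical; the idea is that changing any weight, boost, typo, synonym, filter, or tie-break setting under an existing version label, or registering a version without change notes, is rejected.

```python
import hashlib
import json
from dataclasses import asdict, dataclass, field


@dataclass(frozen=True)
class RelevancePolicy:
    """Hypothetical record holding every relevance knob under one version label."""
    version: str
    field_weights: dict = field(default_factory=dict)  # e.g. {"title": 3.0, "body": 1.0}
    boosts: dict = field(default_factory=dict)         # e.g. {"is_official": 1.2}
    max_typos: int = 1                                 # typo tolerance per term
    synonyms: dict = field(default_factory=dict)       # e.g. {"tv": ["television"]}
    filters: tuple = ()                                # e.g. ("lang=en",)
    tie_break: tuple = ("updated_at", "id")
    notes: str = ""                                    # why this version exists, measured deltas

    def fingerprint(self) -> str:
        """Hash of the settings only (version label and notes excluded)."""
        settings = asdict(self)
        del settings["version"], settings["notes"]
        blob = json.dumps(settings, sort_keys=True)
        return hashlib.sha256(blob.encode("utf-8")).hexdigest()


class PolicyRegistry:
    """Remembers which settings each version label was first registered with."""

    def __init__(self) -> None:
        self._fingerprints: dict[str, str] = {}

    def register(self, policy: RelevancePolicy) -> None:
        if not policy.notes.strip():
            raise ValueError(f"policy {policy.version!r} has no change notes")
        fp = policy.fingerprint()
        known = self._fingerprints.get(policy.version)
        if known is not None and known != fp:
            raise ValueError(f"policy {policy.version!r} changed without a version bump")
        self._fingerprints[policy.version] = fp
```

Re-registering identical settings is a no-op, so the registry can run on every deploy.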

5. Evaluate Offline Before Production Writes

For each relevance or indexing change:

  • run benchmark queries with labeled expectations
  • measure hit quality, ordering quality, and coverage
  • compare against current baseline and note regressions

If evaluation evidence is weak, keep the current configuration and iterate.
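A minimal offline evaluation harness, assuming graded relevance labels per query (0 means irrelevant). It uses precision@k for hit quality, NDCG@k (exponential-gain variant) for ordering quality, and the share of queries with at least one relevant result in the top k for coverage. These metric choices, the 0.01 regression tolerance, the 0.005 minimum NDCG gain, and the function names are assumptions, not the contents of evaluation-metrics.md.

```python
import math

Judgments = dict[str, dict[str, int]]  # query -> {doc_id: graded relevance, 0 = irrelevant}
Run = dict[str, list[str]]             # query -> ranked doc ids


def precision_at_k(ranked: list[str], labels: dict[str, int], k: int) -> float:
    return sum(1 for d in ranked[:k] if labels.get(d, 0) > 0) / k


def ndcg_at_k(ranked: list[str], labels: dict[str, int], k: int) -> float:
    dcg = sum((2 ** labels.get(d, 0) - 1) / math.log2(i + 2) for i, d in enumerate(ranked[:k]))
    ideal = sorted(labels.values(), reverse=True)[:k]
    idcg = sum((2 ** g - 1) / math.log2(i + 2) for i, g in enumerate(ideal))
    return dcg / idcg if idcg > 0 else 0.0


def evaluate(run: Run, judgments: Judgments, k: int = 10) -> dict[str, float]:
    queries = sorted(judgments)
    n = len(queries)
    return {
        "precision": sum(precision_at_k(run.get(q, []), judgments[q], k) for q in queries) / n,
        "ndcg": sum(ndcg_at_k(run.get(q, []), judgments[q], k) for q in queries) / n,
        "coverage": sum(any(judgments[q].get(d, 0) > 0 for d in run.get(q, [])[:k])
                        for q in queries) / n,
    }


def regressions(candidate: dict, baseline: dict, tolerance: float = 0.01) -> dict:
    """Metrics where the candidate is worse than the baseline by more than the tolerance."""
    return {m: (baseline[m], candidate[m]) for m in baseline
            if candidate[m] < baseline[m] - tolerance}


def should_ship(candidate: dict, baseline: dict, min_ndcg_gain: float = 0.005) -> bool:
    """Ship only with no regressions and a measurable ordering gain; otherwise keep baseline."""
    return not regressions(candidate, baseline) and candidate["ndcg"] - baseline["ndcg"] >= min_ndcg_gain
```

Comparing a candidate run against itself returns False from should_ship, which matches the rule above: without a measured improvement, the current configuration stays.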

6. Build Idempotent Index Operations and Safe Rollback

Index updates must be replay-safe:

  • stable document IDs and version checks
  • resumable batch jobs with checkpoints
  • alias-based or dual-index rollback plan

Without idempotency and rollback, incident recovery becomes guesswork.
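A sketch of replay-safe indexing against an in-memory stand-in, not a real search client. Assumptions: every document carries an id and a monotonically increasing version, the input list is in a stable order so a position checkpoint is meaningful, and the IndexStore class with its alias map is hypothetical. It mirrors the alias-swap and external-version patterns that engines such as Elasticsearch offer, without using their APIs.

```python
import json
from pathlib import Path
from typing import Optional


class IndexStore:
    """In-memory stand-in for a search backend: named indexes plus an alias map."""

    def __init__(self) -> None:
        self.indexes: dict[str, dict[str, dict]] = {}
        self.aliases: dict[str, str] = {}

    def upsert(self, index: str, doc: dict) -> bool:
        """Apply a write only if its version is newer; replays and stale writes are no-ops."""
        docs = self.indexes.setdefault(index, {})
        current = docs.get(doc["id"])
        if current is not None and current["version"] >= doc["version"]:
            return False
        docs[doc["id"]] = doc
        return True

    def swap_alias(self, alias: str, index: str) -> Optional[str]:
        """Point the alias at an index and return the previous target for rollback."""
        previous = self.aliases.get(alias)
        self.aliases[alias] = index
        return previous


def reindex(store: IndexStore, docs: list[dict], target: str,
            checkpoint: Path, batch_size: int = 100) -> None:
    """Resumable batch job: the checkpoint stores how many docs are already committed."""
    done = json.loads(checkpoint.read_text())["done"] if checkpoint.exists() else 0
    for start in range(done, len(docs), batch_size):
        for doc in docs[start:start + batch_size]:
            store.upsert(target, doc)
        # Persist progress only after the whole batch is written.
        checkpoint.write_text(json.dumps({"done": min(start + batch_size, len(docs))}))
```

Rollback is then a single alias swap back to the previous index, which is why the old index is worth keeping until the new one has been verified.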

7. Match Complexity to Workload Reality

Use the minimum architecture that meets requirements:

  • avoid distributed complexity for small datasets
  • avoid simplistic models for multilingual or high-noise corpora
  • revisit design as scale and usage patterns change

Over-engineering and under-engineering both create expensive rework.

Common Traps

  • Starting with vendor selection before defining retrieval requirements -> architecture lock-in with unclear success criteria
  • Indexing raw data without field-level normalization -> poor filters, weak facets, and noisy matching
  • Tuning relevance on one happy-path query set -> brittle results in real user traffic
  • Applying business boosts without guardrails -> top results become commercially biased and less useful
  • Shipping retrieval changes without offline baseline comparison -> regressions discovered only by users
  • Running full reindex jobs without resumability -> long outages and partial data corruption
  • Ignoring multilingual tokenization differences -> severe precision drop for non-English users

Security & Privacy

Data that leaves your machine:

  • none by default in this instruction set
  • only user-approved integration traffic when the user explicitly connects external services

Data that stays local:

  • planning notes and experiment logs under ~/search-engine/
  • constraints, relevance decisions, and rollback records

This skill does NOT:

  • collect unrelated files or credentials
  • require hidden network calls
  • bypass user-confirmed environment boundaries

Related Skills

Install with clawhub install if the user confirms:

  • api - Define stable APIs for indexing, querying, and retrieval orchestration
  • elasticsearch - Implement production indexing and query execution on Elasticsearch
  • meilisearch - Ship lightweight retrieval stacks with fast iteration cycles
  • engineering - Structure implementation workstreams and technical decision logs
  • software-engineer - Improve delivery quality with testable architecture and rollout discipline

Feedback

  • If useful: clawhub star search-engine
  • Stay updated: clawhub sync

Version History

1 version in total

  • v1.0.0 (current)
    2026-03-30 04:48 · Safe

Security Scans

Tencent Cloud Security (Keen)

Safe, no risk

Tencent Cloud Security (Sanbu)

Safe, no risk

🔗 Related Recommendations

content-creation

AdMapix

fly0pants
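Ad intelligence and app data analysis assistant: searches ad creatives and analyzes app rankings, downloads, revenue, and market insights for creative and competitor analysis.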
★ 294 📥 136,399
ai-intelligence

Self-Improving + Proactive Agent

ivangdavila
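Self-reflection + self-criticism + self-learning + self-organizing memory. The agent evaluates its own work, finds mistakes, and continuously improves.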
★ 1,349 📥 317,697
content-creation

Humanizer

biostartechnology
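Removes traces of AI writing to make text more natural and authentic. Based on Wikipedia's "Signs of AI writing" guide, it identifies and fixes patterns such as inflated symbolism, promotional language, superficial -ing analyses, vague attributions, overuse of em dashes, rule-of-three lists, AI vocabulary, negative parallelisms, and excessive conjunctive phrases.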
★ 857 📥 199,251