Expert book recommendation engine that finds high-quality books via web search.
Input: a topic, plus used_models, the list of books to exclude (e.g. ["《精益创业》", "《从0到1》"]).

Output: a JSON object with the highest-scoring book:
{
"book_title": "Book title",
"author": "Author",
"author_nationality": "Nationality, or '未知' if unknown",
"publish_date": "YYYY-MM or YYYY",
"rating": 8.9,
"review_count": 15000,
"score": 112.08,
"summary": "Core synopsis in about 100 Chinese characters",
"reasoning": "Recommendation rationale"
}
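The output object can be modeled as a Python dataclass, so a generator can check field names before serializing. This is an illustration only. The class name BookRecommendation and its to_json method are hypothetical. Only the field names come from the template above.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class BookRecommendation:
    """Hypothetical typed view of the output object; field names follow the template."""

    book_title: str
    author: str
    author_nationality: str  # "未知" when unknown; never guessed
    publish_date: str  # "YYYY-MM" or "YYYY"
    rating: float
    review_count: int
    score: float
    summary: str
    reasoning: str

    def to_json(self) -> str:
        # ensure_ascii=False keeps Chinese text readable instead of \uXXXX escapes
        return json.dumps(asdict(self), ensure_ascii=False, indent=2)
```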
Goal: Get a list of 5-8 candidate book names. Do NOT try to get ratings here.
Search Queries (execute 2-3 queries in parallel):
| Query Type | Template | Example |
|---|---|---|
| Chinese book lists | "{topic} 经典书籍推荐 书单" | "用户增长 经典书籍推荐 书单" |
| English book lists | "{topic_en} best books goodreads" | "user growth best books goodreads" |
| Community picks | "{topic} 必读书 知乎推荐" | "用户增长 必读书 知乎推荐" |
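The query step can be sketched as template filling plus a thread pool. This sketch assumes the agent's web_search tool can be wrapped as a plain Python callable that takes a query string. The names build_queries and run_parallel and the template keys are hypothetical. The template strings are copied from the table.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable

# Template strings from the table; the dict keys are illustrative names only
QUERY_TEMPLATES = {
    "chinese_lists": "{topic} 经典书籍推荐 书单",
    "english_lists": "{topic_en} best books goodreads",
    "community_picks": "{topic} 必读书 知乎推荐",
}


def build_queries(topic: str, topic_en: str) -> dict[str, str]:
    """Fill every template with the Chinese topic and its English rendering."""
    return {
        name: template.format(topic=topic, topic_en=topic_en)
        for name, template in QUERY_TEMPLATES.items()
    }


def run_parallel(queries: dict[str, str], search: Callable[[str], list]) -> dict[str, list]:
    """Run all queries concurrently; `search` stands in for the web_search tool."""
    with ThreadPoolExecutor(max_workers=max(1, len(queries))) as pool:
        futures = {name: pool.submit(search, q) for name, q in queries.items()}
        return {name: future.result() for name, future in futures.items()}
```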
Extract: Collect book titles + authors from search results. Ignore ratings at this stage.
Deduplicate immediately: Compare against used_models — remove any matches.
Minimum: Need at least 3 candidate books after dedup. If fewer, broaden the topic and search again.
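Deduplication works best on normalized titles, because search results may or may not include the 《》 title marks used in used_models. The sketch below assumes each candidate is a dict with a book_title key. The helpers normalize_title, dedupe, and needs_broader_search are hypothetical, and the normalization rule is an assumption.

```python
import re

MIN_CANDIDATES = 3


def normalize_title(title: str) -> str:
    # Assumed rule: ignore 《》 title marks, whitespace, and letter case
    return re.sub(r"[《》\s]", "", title).casefold()


def dedupe(candidates: list[dict], used_models: list[str]) -> list[dict]:
    """Drop candidates already in used_models, plus repeats within the candidate list."""
    seen = {normalize_title(title) for title in used_models}
    kept = []
    for book in candidates:
        key = normalize_title(book["book_title"])
        if key not in seen:
            seen.add(key)
            kept.append(book)
    return kept


def needs_broader_search(candidates: list[dict]) -> bool:
    """True when fewer than the minimum number of candidates survived dedup."""
    return len(candidates) < MIN_CANDIDATES
```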
Goal: Get accurate rating + review_count for each candidate.
Strategy (try in order, stop at first success):
Method A (Douban): for each candidate book, search for its Douban page, then fetch it:
1. web_search: "{book_title}" site:book.douban.com
2. If a book.douban.com/subject/ URL is found → web_fetch that URL

Why this works: Douban book pages have structured rating data that web_fetch can reliably parse.
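Detecting a usable Douban result comes down to matching a book.douban.com/subject/ URL in the search results. This sketch assumes the results are available as a list of URL strings and that subject pages use a numeric ID. The helper find_douban_subject_url is hypothetical. It returns None to signal that the fallback steps should run.

```python
import re
from typing import Optional

# Douban book pages look like https://book.douban.com/subject/<numeric id>/
DOUBAN_SUBJECT_RE = re.compile(r"https?://book\.douban\.com/subject/(\d+)")


def find_douban_subject_url(result_urls: list[str]) -> Optional[str]:
    """Return the first Douban subject URL in canonical form, or None to trigger the fallback."""
    for url in result_urls:
        match = DOUBAN_SUBJECT_RE.match(url)
        if match:
            # Drop query strings and fragments so web_fetch gets a clean URL
            return f"https://book.douban.com/subject/{match.group(1)}/"
    return None
```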
If Method A fails (no Douban URL found, or web_fetch blocked), fall back in order:
1. web_search: "{book_title}" "{author}" 豆瓣评分 评价人数
2. web_search: "{book_title}" "{author}" site:goodreads.com
3. web_fetch the Goodreads page

Important Rules:
After Phase 2, some books may still lack ratings. Apply these rules:
| Missing Field | Action |
|---|---|
| rating missing after 2 attempts | Use LLM estimate from search context (mark as "rating_source": "estimated"). If no context at all, drop the book. |
| review_count missing | Default to 500 (a neutral middle value: above the 100-review penalty threshold, with a moderate bonus of log₁₀(500) × 2 ≈ 5.4) |
| publish_date missing | Default to 2020 |
| author_nationality missing | Output "未知" (NEVER fabricate) |
LLM Estimation Rule: If multiple search results consistently describe a book as "高分" / "经典" / "highly rated" but no exact number is found, estimate conservatively: 7.5-8.0 on Douban's 10-point scale for Chinese books, or 3.8-4.0 on Goodreads' 5-point scale for English books. Always mark estimated ratings with "rating_source": "estimated".
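The missing-field rules can be applied in one pass over each candidate. This sketch makes three assumptions. The two lookup attempts have already happened. Each book is a dict. estimated_rating is the conservative estimate derived from search context, or None when there is no context. The name apply_missing_field_rules is hypothetical.

```python
from typing import Optional

DEFAULT_REVIEW_COUNT = 500
DEFAULT_PUBLISH_DATE = "2020"
UNKNOWN_NATIONALITY = "未知"


def apply_missing_field_rules(book: dict, estimated_rating: Optional[float]) -> Optional[dict]:
    """Return a filled-in copy of `book`, or None if the book must be dropped."""
    book = dict(book)  # do not mutate the caller's dict
    if book.get("rating") is None:
        if estimated_rating is None:
            return None  # no rating and no context at all: drop the book
        book["rating"] = estimated_rating
        book["rating_source"] = "estimated"
    if book.get("review_count") is None:
        book["review_count"] = DEFAULT_REVIEW_COUNT
    if not book.get("publish_date"):
        book["publish_date"] = DEFAULT_PUBLISH_DATE
    if not book.get("author_nationality"):
        book["author_nationality"] = UNKNOWN_NATIONALITY  # never fabricate
    return book
```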
Action: Collect ALL surviving candidate books into a single JSON array. Pass this entire array to scripts/score_books.py via stdin for batch scoring. The script returns sorted results.
(If the script is unavailable, calculate scores manually using the formula below.)
Formula:
Total Score = (Base Quality + Popularity Bonus) × Recency Multiplier
A. Base Quality:
Base = rating × 10
If review_count < 100: Base = Base × 0.8 (small sample penalty)
B. Popularity Bonus:
Bonus = log₁₀(review_count) × 2
C. Recency Multiplier (based on publish_date):
Published within the last 2 years (2024-now): × 1.2
Published 2021-2023: × 1.0
Published 2020 or earlier: × 0.8
Example:
《增长黑客》: rating=8.5, review_count=10000, publish=2015
Base = 8.5 × 10 = 85
Bonus = log₁₀(10000) × 2 = 8
Recency = 0.8
Total = (85 + 8) × 0.8 = 74.4
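The sketch below shows what a batch scorer such as scripts/score_books.py might look like. It implements the formula above and reproduces the worked example. The real script is not shown in this document, so the function names and the main() entry point are assumptions. Like the formula, it takes rating on the 10-point scale used in the example and does not convert 5-point Goodreads ratings.

```python
import json
import math
import sys
from typing import TextIO


def recency_multiplier(publish_date: str) -> float:
    """Map the publish year to the multiplier (2024+: 1.2, 2021-2023: 1.0, <=2020: 0.8)."""
    year = int(str(publish_date)[:4])  # handles both "YYYY" and "YYYY-MM"
    if year >= 2024:
        return 1.2
    if year >= 2021:
        return 1.0
    return 0.8


def score_book(rating: float, review_count: int, publish_date: str) -> float:
    base = rating * 10
    if review_count < 100:
        base *= 0.8  # small-sample penalty
    # Assumed guard: log10 is undefined at 0, so treat 0 reviews as 1 (bonus 0)
    bonus = math.log10(max(review_count, 1)) * 2
    return round((base + bonus) * recency_multiplier(publish_date), 2)


def score_all(books: list[dict]) -> list[dict]:
    """Attach a score to each book and sort highest first."""
    scored = [
        dict(b, score=score_book(b["rating"], b["review_count"], b["publish_date"]))
        for b in books
    ]
    return sorted(scored, key=lambda b: b["score"], reverse=True)


def main(stdin: TextIO = sys.stdin, stdout: TextIO = sys.stdout) -> None:
    # Contract described in the text: JSON array in on stdin, sorted results out
    books = json.load(stdin)
    json.dump(score_all(books), stdout, ensure_ascii=False, indent=2)
```

To run it as a standalone script, add `if __name__ == "__main__": main()` and pipe the candidate array in on stdin.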
Return the highest-scoring book in the structured JSON format.
Reasoning field must include: score justification, recency consideration, author background (if known).
If rating_source is "estimated", add the note "注意:评分为根据多源信息估算,非精确数据" ("Note: the rating is estimated from multiple sources and is not exact data").
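The final answer could be assembled from the sorted results as follows. This sketch assumes each scored entry already carries summary and reasoning strings written during analysis. It also assumes the estimate note is appended to reasoning. The helper build_output is hypothetical.

```python
ESTIMATE_NOTE = "注意:评分为根据多源信息估算,非精确数据"
OUTPUT_FIELDS = (
    "book_title", "author", "author_nationality", "publish_date",
    "rating", "review_count", "score", "summary", "reasoning",
)


def build_output(ranked: list[dict]) -> dict:
    """Shape the top entry of score-sorted results like the output template."""
    if not ranked:
        raise ValueError("no scored books; return the error object instead")
    top = ranked[0]
    out = {field: top[field] for field in OUTPUT_FIELDS}  # drops internal keys like rating_source
    if top.get("rating_source") == "estimated":
        # Assumed placement: the estimate note is appended to the reasoning text
        out["reasoning"] = f"{out['reasoning']} {ESTIMATE_NOTE}"
    return out
```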
Minimum Standards:
Exclusions:

If the network connection times out 3 times in a row:
{
"error": "网络连接连续 3 次超时,无法获取最新书单数据,请稍后重试。"
}
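One way to enforce the three-timeout limit is a retry loop that gives up after three consecutive TimeoutErrors on the same call. The assumption that the search wrapper raises Python's built-in TimeoutError is hypothetical. So is the helper name search_with_timeout_guard.

```python
from typing import Callable, Optional

MAX_CONSECUTIVE_TIMEOUTS = 3
TIMEOUT_ERROR = {"error": "网络连接连续 3 次超时,无法获取最新书单数据,请稍后重试。"}


def search_with_timeout_guard(
    search: Callable[[str], list], query: str
) -> tuple[Optional[list], Optional[dict]]:
    """Return (results, None) on success, or (None, TIMEOUT_ERROR) after 3 timeouts in a row."""
    for _ in range(MAX_CONSECUTIVE_TIMEOUTS):
        try:
            return search(query), None
        except TimeoutError:
            continue  # retry; any other exception propagates unchanged
    return None, TIMEOUT_ERROR
```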
If the broadened search also fails (no books with enough review data, even on a wider topic):
{
"error": "该主题下未找到具备足够评价数据的经典书籍,请尝试更换更宽泛的主题或行业大词。"
}
If after Phase 2.5 no books survive:
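Tools by phase:

| Phase | Tool |
|---|---|
| Phase 1 (candidate discovery) | web_search, focus on book list articles |
| Phase 2 (rating lookup) | web_search + web_fetch combo, target Douban/Goodreads pages |
| Phase 3 (scoring) | scripts/score_books.py (deterministic) |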