
AI Product Comparison Skill

Extract structured product data from e-commerce URLs using the Zyte API and generate side-by-side comparison tables with intelligent purchase recommendations...
Source: apscrapes · clawhub v1.0.0 (1 version) · Key: required

Overview

Zyte E-Commerce Products Compare Skill

Compare products from any e-commerce site by extracting structured data via the Zyte API, building a normalized comparison table, and recommending the best option.

What it does

  • Searches products across multiple e-commerce sources
  • Extracts price, features, and availability
  • Compares products side-by-side
  • Recommends the best option

Input

A natural language product query.

Skill structure

zyte-ecommerce-products-compare-skill/
├── SKILL.md                         ← Workflow and instructions (you are here)
├── scripts/
│   ├── fetch_products.py            ← Parallel fetcher (2–20+ URLs, rate-limit aware)
│   └── parse_product.py             ← Response parser (handles edge cases in Zyte output)
└── references/
    └── zyte-api-notes.md            ← API reference notes and known gotchas

When to read what:

  • scripts/fetch_products.py: always. This is the primary data-fetching tool.
  • scripts/parse_product.py: always. Run it on each fetched response file.
  • references/zyte-api-notes.md: when you hit unexpected errors or need to understand a parsing edge case.

Prerequisites

  • python3 (3.8+, stdlib only — no pip installs required)
  • ZYTE_API_KEY set in the environment

Input

Gather from the user:

| Field   | Required | Description                                                    |
|---------|----------|----------------------------------------------------------------|
| urls    | Yes      | List of product page URLs (at least 1, ideally 2+)             |
| intent  | No       | What the user cares about (e.g. "best value", "most durable")  |
| api_key | Yes      | Zyte API key (prefer $ZYTE_API_KEY from env)                   |

Workflow

Step 1 — Validate inputs

  1. Confirm at least one URL is provided. If only one URL is given, extract and present its data, but note that comparison requires 2+.
  2. Each URL must start with http:// or https://.
  3. Verify ZYTE_API_KEY is set:

```bash
echo "$ZYTE_API_KEY" | head -c 4; echo "..."
```

If empty, ask the user to export it.

  4. If URLs span very different product categories (e.g. footwear and electronics), warn the user and ask for confirmation before proceeding.
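The URL checks in Step 1 could be scripted before any API call is made. The following is a minimal sketch, not part of the bundled scripts; the function name `validate_urls` is hypothetical, and it additionally assumes a URL without a host should be rejected.

```python
from urllib.parse import urlparse


def validate_urls(urls):
    """Split candidate URLs into (valid, invalid) lists.

    A URL is accepted when its scheme is http or https and it has a host.
    The host check is an assumption beyond the stated scheme rule.
    """
    valid, invalid = [], []
    for raw in urls:
        url = raw.strip()
        parsed = urlparse(url)
        if parsed.scheme in ("http", "https") and parsed.netloc:
            valid.append(url)
        else:
            invalid.append(raw)
    return valid, invalid


valid, invalid = validate_urls([
    "https://example.com/products/item-a",
    "ftp://example.com/item-b",
    "example.com/item-c",
])
if len(valid) == 1:
    print("Only one valid URL: data will be shown, but comparison requires 2+.")
```

Rejected entries are returned rather than raised so they can be reported back to the user alongside the valid ones.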

Step 2 — Fetch product data (parallel)

Use the bundled fetch script to call the Zyte API for all URLs in parallel:

python3 scripts/fetch_products.py "$ZYTE_API_KEY" \
  "https://example.com/products/item-a" \
  "https://example.com/products/item-b" \
  "https://example.com/products/item-c"

The script handles everything:

  • Fetches all URLs concurrently (up to 5 workers by default).
  • Writes each response to /tmp/product_1_raw.json, /tmp/product_2_raw.json, etc.
  • Retries HTTP 429 (rate limit) with exponential backoff, up to 3 times per URL.
  • Reports per-URL errors (401, 422, 520, network failures) without aborting others.
  • Decompresses gzip responses automatically.
  • Prints progress to stderr and a JSON summary to stdout.

Performance: Parallel fetching cuts wall-clock time significantly. For 3 URLs, expect ~35s instead of ~90s sequential (roughly 60% faster). For 10+ URLs the savings are even greater since most calls run concurrently.
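The internals of scripts/fetch_products.py are not shown here, but the behavior described above (a worker pool, retry on rate limiting with exponential backoff, per-URL error reporting) can be sketched as follows. Everything named in this block is hypothetical: `RateLimited`, `fetch_with_retry`, `fetch_all`, the injected `fetch_one` callable, and the status strings.

```python
import time
from concurrent.futures import ThreadPoolExecutor


class RateLimited(Exception):
    """Hypothetical marker raised by fetch_one on an HTTP 429 response."""


def fetch_with_retry(fetch_one, url, retries=3, base_delay=1.0, sleep=time.sleep):
    """Call fetch_one(url); on RateLimited wait base_delay * 2**attempt, then retry."""
    for attempt in range(retries + 1):
        try:
            return {"url": url, "status": "ok", "body": fetch_one(url)}
        except RateLimited:
            if attempt == retries:
                return {"url": url, "status": "rate_limited"}
            sleep(base_delay * 2 ** attempt)
        except Exception as exc:
            # Any other failure is reported for this URL without aborting the rest.
            return {"url": url, "status": "error", "detail": str(exc)}


def fetch_all(fetch_one, urls, max_workers=5, **retry_kwargs):
    """Fetch all URLs concurrently; results come back in input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(
            lambda u: fetch_with_retry(fetch_one, u, **retry_kwargs), urls))
```

Passing `fetch_one` and `sleep` as parameters keeps the sketch independent of any particular HTTP client and lets the backoff schedule be checked without real waiting.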

Read the summary output to check which URLs succeeded:

{
  "total": 3,
  "success": 3,
  "failed": 0,
  "total_elapsed": 35.0,
  "results": [
    {"index": 1, "url": "...", "status": "ok", "file_path": "/tmp/product_1_raw.json", "elapsed": 18.2},
    {"index": 2, "url": "...", "status": "ok", "file_path": "/tmp/product_2_raw.json", "elapsed": 34.9},
    {"index": 3, "url": "...", "status": "ok", "file_path": "/tmp/product_3_raw.json", "elapsed": 21.4}
  ]
}

Exit codes: 0 = all succeeded, 1 = partial success (some failed), 2 = all failed.
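As an illustration of reading that summary, the sketch below separates successful files from failures. It assumes the summary shape shown above (`results`, `status`, `file_path`, `url`); the helper name `split_results` is hypothetical.

```python
import json

# Meaning of the fetch script's exit codes, as documented above.
EXIT_MEANING = {0: "all succeeded", 1: "partial success", 2: "all failed"}


def split_results(summary_text):
    """Return (file paths to parse, [(url, status)] for failed fetches)."""
    summary = json.loads(summary_text)
    ok_files = [r["file_path"] for r in summary["results"] if r["status"] == "ok"]
    failed = [(r["url"], r["status"]) for r in summary["results"] if r["status"] != "ok"]
    return ok_files, failed
```

The failed list can be carried forward unchanged into the Data Notes section of the final output.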

Step 3 — Parse responses

For each successful result from Step 2, run the parse script:

python3 scripts/parse_product.py /tmp/product_1_raw.json
python3 scripts/parse_product.py /tmp/product_2_raw.json
python3 scripts/parse_product.py /tmp/product_3_raw.json

Skip any index where the fetch status was not "ok".

The script outputs normalized JSON to stdout with: name, price, currency, currencyRaw, brand, sku, availability, rating, reviewCount, bestRating, description, features, additionalProperties, breadcrumbs, mainImage, url, regularPrice.

Exit codes: 0 = success, 1 = no product data in response, 2 = file/JSON error.

Step 4 — Normalize data

Make the extracted data comparable:

  1. Prices: parse string values (e.g. "2999.0") to floats. Note each product's currency. If currencies differ, flag it; don't auto-convert.
  2. Ratings: normalize to a 0–5 scale if bestRating differs across products. Formula: normalized = (ratingValue / bestRating) * 5. If a product has no rating, show "—" and don't penalize it in ranking.
  3. Availability: map Zyte values to readable labels: InStock → "In Stock", OutOfStock → "Out of Stock", PreOrder → "Pre-Order".
  4. Specs: merge features and additionalProperties into one key-value map. Filter out junk entries (seller addresses, numeric-only keys, metadata like "net quantity" or "item count"). See references/zyte-api-notes.md for known junk patterns.
  5. Common fields: identify fields present across all products for the table columns. Product-specific fields go in a "Unique Features" section.
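The price, rating, and availability rules above might look like this in code. It is a sketch operating on one parsed product dict with the field names listed in Step 3; the helper names are hypothetical, and treating a missing bestRating as 5 is an assumption.

```python
AVAILABILITY_LABELS = {
    "InStock": "In Stock",
    "OutOfStock": "Out of Stock",
    "PreOrder": "Pre-Order",
}


def normalize_product(p):
    """Return a comparable view of one parsed product."""
    price = p.get("price")
    rating = p.get("rating")
    best = p.get("bestRating") or 5  # assumption: default scale is 0-5
    availability = p.get("availability")
    return {
        "name": p.get("name"),
        "price": float(price) if price not in (None, "") else None,
        "currency": p.get("currency"),
        # Rescale to 0-5; a missing rating stays None so it is not penalized.
        "rating": round(float(rating) / float(best) * 5, 2) if rating is not None else None,
        "availability": AVAILABILITY_LABELS.get(availability, availability),
    }


def mixed_currencies(products):
    """True when the set contains more than one currency (flag it, don't convert)."""
    return len({p["currency"] for p in products if p.get("currency")}) > 1
```

Unknown availability values pass through unchanged rather than being guessed at.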

Step 5 — Build comparison table

Generate a markdown table adapted to the product category:

| Attribute      | Product A          | Product B          |
|----------------|--------------------|--------------------|
| Name           | ...                | ...                |
| Price          | $29.99             | $34.99             |
| Regular Price  | $39.99             | —                  |
| Brand          | Brand X            | Brand Y            |
| Rating         | 4.5/5 (120 reviews)| —                  |
| Availability   | In Stock           | In Stock           |
| Key Features   | feature1, feature2 | feature3, feature4 |

Rules:

  • Use "—" for missing values; never leave cells blank.
  • Show discounts: $29.99 (was $39.99).
  • Cap "Key Features" at 5 items per product.
  • For 4+ products, consider vertical layout if the table gets too wide.
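A few of these rules (dash placeholder, discount display, five-feature cap) can be illustrated with a small table builder. This is a sketch over already-normalized product dicts; `price_cell`, `build_table`, and the `symbol` parameter are hypothetical, and only three rows are generated to keep it short.

```python
MISSING = "\u2014"  # the dash placeholder used in the example table


def price_cell(product, symbol="$"):
    """Format price, appending '(was ...)' when a higher regular price exists."""
    price, regular = product.get("price"), product.get("regularPrice")
    if price is None:
        return MISSING
    cell = f"{symbol}{price:.2f}"
    if regular is not None and regular > price:
        cell += f" (was {symbol}{regular:.2f})"
    return cell


def build_table(products, symbol="$"):
    """Render a markdown comparison table with one column per product."""
    header = ["Attribute"] + [p.get("name") or MISSING for p in products]
    rows = [
        ["Price"] + [price_cell(p, symbol) for p in products],
        ["Brand"] + [p.get("brand") or MISSING for p in products],
        # Cap Key Features at 5 items per product.
        ["Key Features"] + [", ".join((p.get("features") or [])[:5]) or MISSING
                            for p in products],
    ]
    lines = ["| " + " | ".join(header) + " |", "|" + "---|" * len(header)]
    lines += ["| " + " | ".join(row) + " |" for row in rows]
    return "\n".join(lines)
```

Column padding is omitted; markdown renderers align the cells regardless.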

Step 6 — Key differences

List 3–5 bullet points focused on what would influence a purchase decision:

- Product A is 70% cheaper
- Only Product B has customer ratings
- Product C is the only one with detailed material specs
- Product A has the steepest discount (40% off)
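Percentages like those in the bullets should be computed from the extracted prices rather than estimated. A minimal sketch, with hypothetical helper names, assuming both prices are in the same currency:

```python
def percent_off(price, regular_price):
    """Discount relative to the regular price; 0.0 when there is no discount."""
    if not regular_price or regular_price <= price:
        return 0.0
    return round((regular_price - price) / regular_price * 100, 1)


def percent_cheaper(price, other_price):
    """How much cheaper `price` is than `other_price`, as a percentage."""
    return round((other_price - price) / other_price * 100, 1)


print(percent_off(29.99, 39.99))     # discount for the example table's Product A
print(percent_cheaper(30.0, 100.0))
```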

Step 7 — Recommendation

With user intent — map intent keywords to relevant attributes:

| Keywords                    | Prioritize                                     |
|-----------------------------|------------------------------------------------|
| budget / cheap / value      | lowest price, price-to-rating ratio            |
| best / premium / top        | highest rating, most reviews, brand reputation |
| comfort / walking / running | cushioning, weight, sole tech, material        |
| sport / court / outdoor     | support, traction, durability, construction    |
| durability / lasting        | material quality, warranty, build              |
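One way to apply the keyword table is a simple lookup over the words in the user's intent string. This is a sketch only: `INTENT_PRIORITIES` mirrors the table, `priorities_for` is a hypothetical name, and exact word matching is an assumption (so "durable" would not match "durability").

```python
import re

INTENT_PRIORITIES = {
    ("budget", "cheap", "value"): ["lowest price", "price-to-rating ratio"],
    ("best", "premium", "top"): ["highest rating", "most reviews", "brand reputation"],
    ("comfort", "walking", "running"): ["cushioning", "weight", "sole tech", "material"],
    ("sport", "court", "outdoor"): ["support", "traction", "durability", "construction"],
    ("durability", "lasting"): ["material quality", "warranty", "build"],
}


def priorities_for(intent):
    """Return the attributes to prioritize, in table order, without duplicates."""
    words = set(re.findall(r"[a-z]+", intent.lower()))
    attrs = []
    for keywords, row_attrs in INTENT_PRIORITIES.items():
        if words & set(keywords):
            attrs.extend(a for a in row_attrs if a not in attrs)
    return attrs
```

An empty result means no row matched, in which case the intent-free ranking applies.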

Produce up to 3 recommendations:

🏆 **Best Overall:** [Name] — [1-sentence reason]
💰 **Best Value:** [Name] — [1-sentence reason]
⭐ **Best Premium:** [Name] — [1-sentence reason]

Only include categories that make sense for the product set.

Be honest about product-intent mismatch. If none of the products actually match the user's stated need (e.g. user wants running shoes but all products are casual sneakers), say so clearly and suggest what to look for instead.

Without intent — rank by value score:

value_score = (rating / 5) * 0.6 + (1 - normalized_price) * 0.4

Where normalized_price = (price - min) / (max - min) across the set. If a product has no rating, use the average of the other products as a stand-in.
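The value score can be worked through in a few lines. The sketch below follows the formula as written and adds two assumptions the text does not cover: when all prices are equal, normalized_price is taken as 0, and when no product has a rating, the stand-in rating is 0. The function name `value_scores` is hypothetical.

```python
def value_scores(products):
    """Map product name -> value score using ratings on a 0-5 scale."""
    prices = [p["price"] for p in products]
    lo, hi = min(prices), max(prices)
    rated = [p["rating"] for p in products if p.get("rating") is not None]
    fallback = sum(rated) / len(rated) if rated else 0.0  # assumption when nothing is rated
    scores = {}
    for p in products:
        # Assumption: identical prices give every product normalized_price = 0.
        norm_price = (p["price"] - lo) / (hi - lo) if hi > lo else 0.0
        rating = p["rating"] if p.get("rating") is not None else fallback
        scores[p["name"]] = round((rating / 5) * 0.6 + (1 - norm_price) * 0.4, 3)
    return scores


scores = value_scores([
    {"name": "A", "price": 20.0, "rating": 4.0},
    {"name": "B", "price": 40.0, "rating": 5.0},
    {"name": "C", "price": 30.0, "rating": None},  # uses the average of A and B (4.5)
])
ranking = sorted(scores, key=scores.get, reverse=True)
```

In this example the cheapest product ranks first even though it has the lowest rating, which shows how much weight the 0.4 price term carries when the price spread is wide.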

Step 8 — Final output

Structure the response as:

## Product Comparison
[Table from Step 5]

### Key Differences
[Bullets from Step 6]

### Recommendation
[From Step 7]

### Data Notes
- Source: Zyte API automatic product extraction
- [List any failed URLs with reasons]
- [Note if currencies differ across products]
- [Note if any data was incomplete]
- [Total fetch time and number of parallel workers used]

Error handling

Most errors are handled automatically by scripts/fetch_products.py. Check the JSON summary output to see per-URL status.

| Error                     | Handled by                                             |
|---------------------------|--------------------------------------------------------|
| Missing ZYTE_API_KEY      | You (Step 1): stop and ask user to export it.          |
| Invalid URL format        | fetch_products.py: skipped, reported in summary.       |
| HTTP 401                  | fetch_products.py: reported as auth_error.             |
| HTTP 422                  | fetch_products.py: reported as payload_error.          |
| HTTP 429 (rate limit)     | fetch_products.py: auto-retries 3× with backoff.       |
| HTTP 520/521              | fetch_products.py: reported as http_error.             |
| No .product in response   | fetch_products.py: reported as no_product_data.        |
| JSON control characters   | parse_product.py: handled via strict=False.            |
| Missing individual fields | You (Step 5): show "—" in table, never crash.          |
| All URLs failed           | Report errors from summary, suggest manual URL check.  |
| Mixed currencies          | You (Step 4): show both, don't convert, flag it.       |
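The strict=False entry refers to a standard option of Python's json module: by default, raw control characters (such as a literal newline) inside a JSON string are rejected, and strict=False allows them. The snippet below demonstrates the difference; how parse_product.py applies it internally is not shown in this document, and `load_response` is a hypothetical name.

```python
import json

# The \n here becomes a raw newline inside the JSON string value,
# which strict parsing treats as an invalid control character.
raw = '{"product": {"name": "Item A", "description": "line one\nline two"}}'


def load_response(text):
    """Parse a response body, tolerating control characters inside strings."""
    return json.loads(text, strict=False)


try:
    json.loads(raw)
    strict_ok = True
except json.JSONDecodeError:
    strict_ok = False

product = load_response(raw)["product"]
```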

DNS note

If network calls fail with DNS resolution errors in sandboxed environments, force a public DNS resolver before running (this overwrites /etc/resolv.conf and requires root):

echo "nameserver 8.8.8.8" > /etc/resolv.conf

Version history

1 version in total

  • v1.0.0 (current)
    2026-05-07 10:36, safe
