
Apify Runner

Run any Apify Actor to scrape web data (Instagram, TikTok, Reddit, Twitter, etc.). Handles Actor discovery, quality filtering, probe testing, and batched execution.
Author: duxj4520
Category: Developer tools · Source: clawhub · Version: v1.0.0 (latest, 1 version) · API key: required

Overview

Apify Skill

Run any Apify Actor through a standardized workflow: search → validate → execute → collect results.

Prerequisites

  • APIFY_TOKEN env var, or a config.json with tokens (copy config.json.example)
  • Python 3 with requests installed

Workflow

Step 1: Parse User Intent

Extract from the user's request:

  • Platform/target (Instagram, TikTok, Reddit, etc.)
  • What to scrape (posts, profiles, hashtags, comments, etc.)
  • Targets (URLs, usernames, keywords)
  • Quantity/filters (how many, time range, min likes, etc.)

Step 2: Select Token

If the user specifies a token name, or the task maps to a specific account, use that token. Otherwise use the default.

Token can be provided via:

  1. --token flag (highest priority)
  2. config.json tokens map (by --token-name)
  3. APIFY_TOKEN env var (fallback)
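The priority order above can be sketched as a small resolver. This is only an illustration, not the skill's actual code: the function name `resolve_token` and the assumed config layout `{"tokens": {name: token}}` are hypothetical, inferred from the "tokens map" wording.

```python
import os
from typing import Any, Mapping, Optional


def resolve_token(
    cli_token: Optional[str] = None,
    config: Optional[Mapping[str, Any]] = None,
    token_name: str = "default",
    env: Optional[Mapping[str, str]] = None,
) -> str:
    """Pick a token: explicit flag first, then config map, then APIFY_TOKEN."""
    env = os.environ if env is None else env

    # 1. A value passed via --token wins outright.
    if cli_token:
        return cli_token

    # 2. Named lookup in the parsed config.json (layout is an assumption).
    if config is not None:
        token = config.get("tokens", {}).get(token_name)
        if token:
            return token

    # 3. Fall back to the environment variable.
    token = env.get("APIFY_TOKEN")
    if token:
        return token

    raise RuntimeError("No Apify token found in flag, config, or APIFY_TOKEN")
```

Passing the parsed config and the environment in as arguments keeps the lookup order easy to test without touching real credentials.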

Step 3: Search & Select Actor

Run the search script:

python3 scripts/search_actor.py "instagram scraper" --top 3

Output: ranked candidates with score, success rate, rating, pricing model.

Quality filters (built into script):

  • notice = NONE (not deprecated)
  • 30-day success rate ≥ 95%
  • 30-day runs ≥ 1,000
  • User rating ≥ 4.0
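As a rough illustration of these thresholds, the check below applies them to a candidate record. The dictionary keys (`notice`, `success_rate_30d`, `runs_30d`, `rating`) are hypothetical names chosen for this sketch; the real script's field names and its ranking score are not shown in this document.

```python
from typing import Any, Dict, List


def passes_quality_filters(actor: Dict[str, Any]) -> bool:
    """Return True only if the candidate meets all four thresholds."""
    return (
        actor.get("notice") == "NONE"                   # not deprecated
        and actor.get("success_rate_30d", 0.0) >= 0.95  # 30-day success rate
        and actor.get("runs_30d", 0) >= 1000            # 30-day run count
        and actor.get("rating", 0.0) >= 4.0             # user rating
    )


def filter_candidates(actors: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Keep only candidates that pass, preserving the input order."""
    return [actor for actor in actors if passes_quality_filters(actor)]
```

Missing fields default to failing values here, so an Actor with incomplete stats is excluded rather than passed by accident.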

Pick the top-ranked candidate. If the user prefers, or has prior experience with, a specific Actor, skip the search and use that Actor.

Step 4: Get Actor Schema & Build run_input

Fetch the Actor's documentation:

web_fetch https://apify.com/{actor_id}.md

Read the input schema section. Construct run_input JSON based on:

  • The Actor's required/optional fields
  • The user's targets and filters
  • Sensible defaults from the documentation

Do NOT ask the user to write JSON. Build it from their natural language request.
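For example, a request such as "get the latest 3 posts from @user" could be turned into run_input along these lines. The helper name is hypothetical, and the field names follow the Instagram row under Common Actor Patterns; they should still be checked against the Actor's current schema.

```python
import json
from typing import Dict, List


def build_instagram_posts_input(usernames: List[str], results_limit: int) -> Dict[str, object]:
    """Build run_input for scraping posts from profiles (field names assumed)."""
    return {
        # Turn "@user" or "user" into a profile URL.
        "directUrls": [f"https://instagram.com/{name.lstrip('@')}/" for name in usernames],
        "resultsType": "posts",
        "resultsLimit": results_limit,
    }


run_input = build_instagram_posts_input(["@user"], results_limit=3)
print(json.dumps(run_input))
```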

Step 5: Probe Test (Top 1 → Top 2 → Top 3 fallback)

Test with minimal input before committing to a full run:

python3 scripts/apify_runner.py {actor_id} \
  --input '{...}' \
  --token {token} \
  --probe-only \
  --list-key {key}

The probe automatically uses the first 2 items from the list field.
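The trimming step might look like the following sketch. `make_probe_input` is a hypothetical helper, not the runner's actual function; it only shows the idea of copying run_input and keeping the first 2 items of the list field.

```python
import copy
from typing import Any, Dict


def make_probe_input(run_input: Dict[str, Any], list_key: str, probe_size: int = 2) -> Dict[str, Any]:
    """Return a copy of run_input whose list field holds only the first items."""
    probe_input = copy.deepcopy(run_input)  # leave the caller's input untouched
    probe_input[list_key] = probe_input[list_key][:probe_size]
    return probe_input
```

Copying first means the same run_input can be reused unchanged for the full execution after the probe succeeds.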

Checks:

  • Run starts successfully (no permission/billing errors)
  • Run completes (no timeout/crash)
  • Returns non-empty data

If the probe fails, try the next candidate Actor. If all 3 fail, report to the user with the Actor URLs so they can activate an Actor manually.
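The fallback order can be sketched as a loop over the ranked candidates. Here `probe` is a hypothetical callable standing in for a `--probe-only` run; it is assumed to return the scraped items or raise on a start, billing, or timeout error.

```python
from typing import Any, Callable, List, Optional, Sequence


def first_working_actor(
    candidates: Sequence[str],
    probe: Callable[[str], List[Any]],
    max_candidates: int = 3,
) -> Optional[str]:
    """Return the first Actor whose probe yields non-empty data, else None."""
    for actor_id in candidates[:max_candidates]:
        try:
            items = probe(actor_id)
        except Exception:
            # The run failed to start or complete: move on to the next candidate.
            continue
        if items:  # non-empty data counts as a pass
            return actor_id
    return None
```

A `None` result corresponds to the "all 3 fail" case, where the caller reports the Actor URLs to the user.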

Step 6: Full Execution

python3 scripts/apify_runner.py {actor_id} \
  --input '{...}' \
  --token {token} \
  --output /path/to/results.json \
  --list-key {key} \
  --batch-size 50 \
  --probe

Key flags:

| Flag | Purpose | Default |
| --- | --- | --- |
| --list-key | Field in run_input containing the list to batch | None (no batching) |
| --batch-size | Items per batch | 50 |
| --timeout | Per-batch timeout (seconds) | 600 |
| --probe | Run probe before full execution | Off |
| --output | Save results to JSON file | Stdout |
| --config | Path to config.json for token lookup | None |
| --token-name | Which token to use from config | "default" |

Batching rules:

  • ≤ batch-size items → single run
  • > batch-size items → auto-split, 3s pause between batches
  • Each batch has an independent timeout (default 600 s, i.e. 10 min)
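A minimal sketch of these batching rules follows. The `run_batch` callable is a hypothetical stand-in for one Actor run on a slice of the list; the per-batch timeout is assumed to be enforced inside it and is not modelled here.

```python
import time
from typing import Any, Callable, List, Sequence


def split_batches(items: Sequence[Any], batch_size: int = 50) -> List[List[Any]]:
    """Split items into consecutive chunks of at most batch_size."""
    return [list(items[i:i + batch_size]) for i in range(0, len(items), batch_size)]


def run_in_batches(
    items: Sequence[Any],
    run_batch: Callable[[List[Any]], List[Any]],
    batch_size: int = 50,
    pause_seconds: float = 3.0,
) -> List[Any]:
    """Run each batch in turn, pausing between batches but not after the last."""
    batches = split_batches(items, batch_size)
    results: List[Any] = []
    for index, batch in enumerate(batches):
        results.extend(run_batch(batch))
        if index < len(batches) - 1:
            time.sleep(pause_seconds)
    return results
```

With 120 items and the default size this produces batches of 50, 50, and 20; a list of 50 or fewer items yields a single batch and therefore no pause.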

Step 7: Return Results

  • Report total items collected
  • Save raw JSON to specified output path
  • Summarize key stats (items count, batches, any failures)
  • Let the caller handle filtering/reporting/delivery
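The summary stats could be computed along these lines. The convention that a failed batch is recorded as `None` is an assumption made for this sketch, as is the function name `summarize_run`.

```python
from typing import Any, Dict, List, Optional


def summarize_run(batch_results: List[Optional[List[Dict[str, Any]]]]) -> Dict[str, int]:
    """Count collected items, batches, and failed batches (None = failed)."""
    items = [item for batch in batch_results if batch is not None for item in batch]
    return {
        "items": len(items),
        "batches": len(batch_results),
        "failed_batches": sum(1 for batch in batch_results if batch is None),
    }
```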

Common Actor Patterns

| Platform | Typical Actor | list_key | Example input |
| --- | --- | --- | --- |
| Instagram | apify/instagram-scraper | directUrls | {"directUrls": ["https://instagram.com/user/"], "resultsType": "posts", "resultsLimit": 3} |
| TikTok | clockworks/tiktok-scraper | hashtags | {"hashtags": ["cooking"], "resultsPerPage": 50} |
| Reddit | trudax/reddit-scraper-lite | startUrls | {"startUrls": [{"url": "https://reddit.com/r/cooking/top/?t=month"}], "maxItems": 30} |
| Twitter | apidojo/tweet-scraper | | Check .md for current schema |

These are starting points. Always verify the current schema on the Actor's .md page.

Version history

1 version in total

  • v1.0.0 (current)
    2026-03-30 16:21 · Safe · Safe

Security checks

Tencent Cloud Security (Keen): safe, no risk

Tencent Cloud Security (Sanbu): safe, no risk
