
dataify-reddit-comment-by-url

Submit Dataify Reddit Post Comment by URL Builder tasks. Use when the user wants the Reddit post comment collection tool or needs to collect and scrape Reddit post comments.
Source: dataify-server
Uncategorized · clawhub · v1.1.0 · 2 versions · 100000 · Key: required
★ 0 Stars · 📥 166 Downloads · 💾 0 Installs · 2 Versions
#latest

Overview

Dataify Reddit Comment By URL

Submit Reddit post comment collection jobs through Dataify Builder by Reddit URL. After a successful submission, give the user the task_id, the returned or inferred status, and tell them to visit Dataify to view results.

API TOKEN Handling

Use DATAIFY_API_TOKEN as the long-term saved token name.

  • If the user provides a token in the request, use it for this run.
  • If no token is provided, first check whether DATAIFY_API_TOKEN is already saved locally in the environment.
  • If DATAIFY_API_TOKEN is saved locally, use it without asking the user to re-enter the token.
  • If no token is available locally, tell the user they need to provide a Dataify API TOKEN.
  • If the user does not have an API TOKEN, tell them they can register or log in at Dataify to get one.
  • If the user already has an API TOKEN, tell them it is available in the top-right area of Dataify.
  • After the user provides an API TOKEN and no local DATAIFY_API_TOKEN is saved, ask whether they want to save it locally as DATAIFY_API_TOKEN for future use.
  • If the user wants to save it, give the appropriate command for their shell and ask them to run it; do not silently persist tokens without confirmation.
  • Do not call the Builder endpoint without a token.
  • Always call it API TOKEN in user-facing instructions. Prefer the environment variable name DATAIFY_API_TOKEN for saved local use.
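The token precedence above (a token given in the request first, then the saved DATAIFY_API_TOKEN, otherwise stop and ask the user) can be sketched in Python as follows. The function name resolve_api_token is hypothetical and is not taken from the bundled script.

```python
import os
from typing import Optional


def resolve_api_token(explicit_token: Optional[str] = None) -> Optional[str]:
    """Hypothetical helper: pick the API TOKEN for this run, or return None."""
    # A token supplied in the request takes precedence for this run only.
    if explicit_token and explicit_token.strip():
        return explicit_token.strip()
    # Otherwise reuse the locally saved DATAIFY_API_TOKEN without re-prompting.
    saved = os.environ.get("DATAIFY_API_TOKEN", "").strip()
    if saved:
        return saved
    # No token available: the caller must ask the user and must not call Builder.
    return None
```

A None result is the point at which the user is asked for an API TOKEN and, once provided, asked whether to save it; the sketch deliberately never writes the token anywhere itself.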

PowerShell example for saving the token for the current session:

$env:DATAIFY_API_TOKEN = "YOUR_DATAIFY_API_TOKEN"

For a persistent user-level variable on Windows (it takes effect in newly started sessions, not the current one):

[Environment]::SetEnvironmentVariable("DATAIFY_API_TOKEN", "YOUR_DATAIFY_API_TOKEN", "User")

Core Workflow

  1. Before submitting, show the user the required values, optional values, and defaults listed in the Parameter Checklist.
  2. Ask whether the user wants to change any value before running the task.
  3. Ask whether the user wants to collect multiple Reddit comment groups. If yes, ask for multiple groups with url, days_back, and comment_limit.
  4. Normalize the final values into a list of spider_parameters objects.
  5. Resolve the Dataify token from explicit input or saved DATAIFY_API_TOKEN.
  6. If no token is available, ask the user to enter their API TOKEN and ask whether to save it as DATAIFY_API_TOKEN.
  7. Validate the Reddit URL, numeric values, and file name.
  8. Submit the Builder request with spider_id=reddit_comment_by-url.
  9. Read data.task_id from the Builder response, and read data.status (or a top-level status) when present.
  10. Stop after Builder succeeds.
  11. Tell the user to visit Dataify to view or manage results.
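Steps 9 and 10 only need two fields from the Builder response. The sketch below assumes the response body is JSON; the helper name read_task_result and the sample bodies in the usage note are hypothetical, not real Dataify responses.

```python
import json
from typing import Any, Optional, Tuple


def read_task_result(body: str) -> Tuple[Optional[Any], Optional[Any]]:
    """Hypothetical helper: return (task_id, status) from a Builder JSON response."""
    payload = json.loads(body)
    if not isinstance(payload, dict):
        return None, None
    data = payload.get("data")
    if not isinstance(data, dict):
        data = {}
    # Step 9: the task id is read from data.task_id.
    task_id = data.get("task_id")
    # Prefer data.status; fall back to a top-level status when present.
    status = data.get("status")
    if status is None:
        status = payload.get("status")
    return task_id, status
```

For example, an assumed body such as {"data": {"task_id": "abc123", "status": "queued"}} yields ("abc123", "queued"). A task_id of None corresponds to the "Missing task_id" case under Troubleshooting; nothing is polled after a successful submission.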

Parameter Checklist

When the user invokes this skill, first show them the values that will be used. Always display the submitted parameters as a Markdown table; do not use a plain sentence or bullet list for the parameter confirmation.

| Field | Required | Default | Location | Notes |
| --- | --- | --- | --- | --- |
| url | Yes | https://www.reddit.com/r/datascience/comments/1cmnf0m/comment/l32204i/?utm_source=share&utm_medium=web3x&utm_name=web3xcss&utm_term=1&utm_content=share_button | spider_parameters | Reddit URL. |
| days_back | No | 10 | spider_parameters | Number of days back to collect comments from. Must be an integer greater than or equal to 0. |
| comment_limit | No | 5 | spider_parameters | Reply comment limit. Must be an integer greater than or equal to 0. |
| file_name | No | {{TasksID}} | Builder form field | Use the default when the user does not change it. |

Then ask: "Do you want to change any of these values before I submit the task?"

Also ask: "Do you want to collect multiple Reddit comment groups? If yes, provide multiple groups with url, days_back, and comment_limit."

If the user has already provided some values, show those values in place of the defaults and ask only whether the remaining default values should be changed.
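The confirmation table can be generated rather than typed by hand. This is a minimal sketch: the helper name render_parameter_table and the "Value" column (user value if given, otherwise the default) are illustrative choices, not part of the skill's script.

```python
from typing import Dict

DEFAULT_URL = ("https://www.reddit.com/r/datascience/comments/1cmnf0m/comment/l32204i/"
               "?utm_source=share&utm_medium=web3x&utm_name=web3xcss&utm_term=1"
               "&utm_content=share_button")
DEFAULTS = {"url": DEFAULT_URL, "days_back": "10", "comment_limit": "5",
            "file_name": "{{TasksID}}"}

# (field, required, location) in the order of the Parameter Checklist.
ROWS = [
    ("url", "Yes", "spider_parameters"),
    ("days_back", "No", "spider_parameters"),
    ("comment_limit", "No", "spider_parameters"),
    ("file_name", "No", "Builder form field"),
]


def render_parameter_table(user_values: Dict[str, str]) -> str:
    """Hypothetical helper: Markdown table of the values that will be submitted."""
    lines = ["| Field | Required | Value | Location |", "| --- | --- | --- | --- |"]
    for field, required, location in ROWS:
        # Values the user already gave replace the defaults.
        value = str(user_values.get(field, DEFAULTS[field]))
        # Escape pipes so a value cannot break the table layout.
        value = value.replace("|", "\\|")
        lines.append("| {} | {} | {} | {} |".format(field, required, value, location))
    return "\n".join(lines)
```

Calling render_parameter_table({"days_back": "3"}) shows 3 for days_back and the defaults for every other field, which is the layout the two confirmation questions then refer to.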

Parameter Handling

  • url is required. If the user does not provide it, use the default https://www.reddit.com/r/datascience/comments/1cmnf0m/comment/l32204i/?utm_source=share&utm_medium=web3x&utm_name=web3xcss&utm_term=1&utm_content=share_button only after showing it in the parameter confirmation table.
  • Trim leading and trailing whitespace from url.
  • url cannot be empty.
  • url must start with https://www.reddit.com/.
  • days_back must be an integer greater than or equal to 0.
  • comment_limit must be an integer greater than or equal to 0.
  • For multiple collection groups, add one object with url, days_back, and comment_limit per group to spider_parameters.
  • file_name defaults to {{TasksID}}. If the user changes it, submit the user-provided value.
  • file_name cannot be empty.
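The rules above map directly onto a small validator. In this sketch the helper names are hypothetical, the error strings are copied from the Troubleshooting messages, and the default url is assumed to have been filled in earlier, during the confirmation step, so a missing url is treated as empty here.

```python
from typing import Any, Dict

REDDIT_PREFIX = "https://www.reddit.com/"


def _non_negative_int(value: Any, name: str) -> int:
    """Parse value as an integer >= 0 or raise the documented error message."""
    message = name + " must be an integer greater than or equal to 0"
    try:
        number = int(str(value).strip())
    except ValueError:
        raise ValueError(message) from None
    if number < 0:
        raise ValueError(message)
    return number


def validate_group(group: Dict[str, Any]) -> Dict[str, str]:
    """Hypothetical helper: validate one url/days_back/comment_limit group."""
    url = str(group.get("url") or "").strip()  # trim leading/trailing whitespace
    if not url:
        raise ValueError("url cannot be empty")
    if not url.startswith(REDDIT_PREFIX):
        raise ValueError("url must start with " + REDDIT_PREFIX)
    days_back = _non_negative_int(group.get("days_back", 10), "days_back")
    comment_limit = _non_negative_int(group.get("comment_limit", 5), "comment_limit")
    # Values are kept as strings to match the spider_parameters examples.
    return {"url": url, "days_back": str(days_back), "comment_limit": str(comment_limit)}


def validate_file_name(file_name: str = "{{TasksID}}") -> str:
    """file_name defaults to {{TasksID}} and cannot be empty."""
    if not file_name or not file_name.strip():
        raise ValueError("File name cannot be empty")
    return file_name.strip()
```

Returning the numbers as strings is a design choice made only to match the quoted "10" and "5" in the examples; whether Builder also accepts bare integers is not stated in this document.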

Single-group example:

spider_parameters=[{"url":"https://www.reddit.com/r/datascience/comments/1cmnf0m/comment/l32204i/?utm_source=share&utm_medium=web3x&utm_name=web3xcss&utm_term=1&utm_content=share_button","days_back":"10","comment_limit":"5"}]

Multi-group example:

spider_parameters=[{"url":"https://www.reddit.com/r/datascience/comments/1cmnf0m/comment/l32204i/?utm_source=share&utm_medium=web3x&utm_name=web3xcss&utm_term=1&utm_content=share_button","days_back":"10","comment_limit":"5"},{"url":"https://www.reddit.com/r/datascience/comments/1cmnf0m/comment/l32204i/?utm_source=share&utm_medium=web3x&utm_name=web3xcss&utm_term=1&utm_content=share_button","days_back":"10","comment_limit":"5"}]
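Both examples are a JSON array serialized into a single string. A minimal sketch of producing that string from already-validated groups follows; the helper name build_spider_parameters and its empty-list error message are hypothetical.

```python
import json
from typing import Dict, List


def build_spider_parameters(groups: List[Dict[str, str]]) -> str:
    """Hypothetical helper: serialize validated groups into the spider_parameters string."""
    if not groups:
        # Hypothetical message; the document does not define one for this case.
        raise ValueError("at least one Reddit comment group is required")
    # Keep only the three documented keys, so file_name can never leak in.
    items = [
        {"url": g["url"], "days_back": g["days_back"], "comment_limit": g["comment_limit"]}
        for g in groups
    ]
    # Compact separators reproduce the layout of the examples.
    return json.dumps(items, separators=(",", ":"))
```

Copying only url, days_back, and comment_limit also enforces the guardrail that file_name stays out of spider_parameters.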

Dataify Builder Request

Use form fields rather than hand-built URL-encoded strings.

  • URL: https://scraperapi.dataify.com/builder?platform=1
  • Method: POST
  • Authorization header: Bearer <API TOKEN>, where the API TOKEN is the explicit token or the saved DATAIFY_API_TOKEN value
  • Content type: application/x-www-form-urlencoded
  • Fixed fields:
    • spider_name=reddit.com
    • spider_id=reddit_comment_by-url
    • spider_errors=true
  • Default field:
    • file_name={{TasksID}}
  • Dynamic field:
    • spider_parameters must be a JSON-encoded string containing an array of Reddit comment objects.
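The request described above can be prepared with the standard library alone, letting urllib.parse.urlencode do the form encoding instead of building the string by hand. This sketch only constructs the request; the helper name build_builder_request is hypothetical, and it is not a replacement for the bundled script.

```python
import urllib.parse
import urllib.request

BUILDER_URL = "https://scraperapi.dataify.com/builder?platform=1"


def build_builder_request(api_token: str, spider_parameters: str,
                          file_name: str = "{{TasksID}}") -> urllib.request.Request:
    """Hypothetical helper: prepare (but do not send) the Builder POST request."""
    if not api_token:
        # Never call the Builder endpoint without a token.
        raise ValueError("Missing Dataify API TOKEN")
    form = {
        "spider_name": "reddit.com",
        "spider_id": "reddit_comment_by-url",
        "spider_errors": "true",
        "file_name": file_name,
        "spider_parameters": spider_parameters,  # JSON string sent as one form field
    }
    # urlencode escapes every field, so no URL-encoded string is built by hand.
    body = urllib.parse.urlencode(form).encode("utf-8")
    return urllib.request.Request(
        BUILDER_URL,
        data=body,
        method="POST",
        headers={
            "Authorization": "Bearer " + api_token,
            "Content-Type": "application/x-www-form-urlencoded",
        },
    )
```

Sending it would be a call such as urllib.request.urlopen(req, timeout=30), after which the JSON body is read for data.task_id as in step 9 of Core Workflow. The timeout value is an arbitrary example.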

Script

For reliable execution, prefer the bundled scripts/submit_dataify_reddit_comment_by_url.py (Python 3.6 or newer) over rewriting the Builder flow.

python3 ".\scripts\submit_dataify_reddit_comment_by_url.py" --url "https://www.reddit.com/r/datascience/comments/1cmnf0m/comment/l32204i/?utm_source=share&utm_medium=web3x&utm_name=web3xcss&utm_term=1&utm_content=share_button" --days-back "10" --comment-limit "5"

To override the saved environment token or file name:

python3 ".\scripts\submit_dataify_reddit_comment_by_url.py" --api-token "YOUR_DATAIFY_API_TOKEN" --url "https://www.reddit.com/r/datascience/comments/1cmnf0m/comment/l32204i/?utm_source=share&utm_medium=web3x&utm_name=web3xcss&utm_term=1&utm_content=share_button" --file-name "{{TasksID}}"

To submit multiple Reddit comment groups:

python3 ".\scripts\submit_dataify_reddit_comment_by_url.py" --params-json '[{"url":"https://www.reddit.com/r/datascience/comments/1cmnf0m/comment/l32204i/?utm_source=share&utm_medium=web3x&utm_name=web3xcss&utm_term=1&utm_content=share_button","days_back":"10","comment_limit":"5"},{"url":"https://www.reddit.com/r/datascience/comments/1cmnf0m/comment/l32204i/?utm_source=share&utm_medium=web3x&utm_name=web3xcss&utm_term=1&utm_content=share_button","days_back":"10","comment_limit":"5"}]'

The script prints a JSON summary with spider_id, task_id, status, parameters, file_name, dashboard_url, and message.
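Quoting the --params-json argument differs between PowerShell, cmd, and POSIX shells. One way to sidestep this is to build the argument list in Python and run the script without a shell. This is a sketch under assumptions: the helper name build_script_argv is hypothetical, the script path is assumed to be relative to the skill's root directory, and combining --params-json with --file-name or --api-token is assumed to be accepted even though the examples above show those flags separately.

```python
import json
import os
import sys
from typing import Dict, List, Optional

# Assumed to be relative to the skill's root directory.
SCRIPT_PATH = os.path.join("scripts", "submit_dataify_reddit_comment_by_url.py")


def build_script_argv(groups: List[Dict[str, str]], file_name: Optional[str] = None,
                      api_token: Optional[str] = None) -> List[str]:
    """Hypothetical helper: argument list for running the bundled script."""
    # sys.executable reuses the current interpreter instead of a hard-coded path.
    argv = [sys.executable, SCRIPT_PATH,
            "--params-json", json.dumps(groups, separators=(",", ":"))]
    if file_name is not None:
        argv += ["--file-name", file_name]
    if api_token is not None:
        argv += ["--api-token", api_token]
    return argv
```

The list can be passed to subprocess.run(argv, stdout=subprocess.PIPE, universal_newlines=True), which works on Python 3.6, and the printed JSON summary parsed with json.loads.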

Troubleshooting

Missing Dataify API TOKEN means no explicit token was passed and DATAIFY_API_TOKEN is not saved locally. Tell the user they need to provide their Dataify API TOKEN, ask whether they want to save it as DATAIFY_API_TOKEN, or tell them they can register or log in at Dataify to get one. If they already have a token, tell them it is in the top-right area of Dataify.

url cannot be empty means the Reddit URL is missing.

url must start with https://www.reddit.com/ means the URL is outside the allowed Reddit domain.

days_back must be an integer greater than or equal to 0 means the day limit is invalid.

comment_limit must be an integer greater than or equal to 0 means the reply comment limit is invalid.

File name cannot be empty means no usable file_name was provided.

Necessary parameters is empty! usually means the Builder request was not submitted as form fields, spider_parameters was not a JSON string array, or one spider_parameters object is missing required fields.

Missing task_id usually means the authorization header, token, spider_name, spider_id, or spider_parameters is wrong.

Guardrails

  • Do not put file_name inside spider_parameters.
  • Do not use a Reddit URL from outside https://www.reddit.com/.
  • Use only API TOKEN and DATAIFY_API_TOKEN when referring to authentication.
  • Do not hard-code local Python paths.
  • Do not invent result fields.
  • Always direct the user to Dataify after successful task creation.

Version History

2 versions in total

  • v1.1.0 (current)
    2026-06-09 18:49 · Safe · Safe
  • v1.0.0
    2026-06-01 21:29 · Safe · Safe

Security Scan

Tencent Cloud Security (Keen)

Safe, no risk

Tencent Cloud Security (Sanbu)

Safe, no risk

🔗 Related Recommendations

data-analysis

Tavily Search

jacky1n7
Web search via the Tavily API (an alternative to Brave). Use when the user asks to search the web or find sources or links and Brave web search is unavailable.
★ 274 📥 101,076
data-analysis

Stock Analysis

udiedrichsen
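In-depth analysis of stocks and cryptocurrencies using Yahoo Finance data. Supports portfolio management, watchlists and alerts, dividend analysis, 8-dimension stock scoring, trending scans (hot scanner), and rumor/early-signal detection. Suited to stock analysis, portfolio tracking, earnings reactions, crypto monitoring, trending stock discovery, and in mainstream…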
★ 280 📥 57,972
data-analysis

Data Analysis

ivangdavila
Data analysis and visualization. Query databases, generate reports, automate spreadsheets, and turn raw data into clear, actionable insights. Use for: (1) you…
★ 211 📥 70,331