Pinterest Scraper

Scrape Pinterest search results and collect pins with image URLs, descriptions, and direct links using infinite scroll. Use when you need to collect visual inspiration.
Author: phy041
Category: Data Analysis · clawhub · v1.0.0 · 1 version · API key: not required
Stars: 0 · Downloads: 379 · Installs: 16
Tags: #latest #pinterest #scraping

Overview

Pinterest Search Scraper

Scrapes pins from Pinterest search results using Crawlee's PlaywrightCrawler with infinite scroll support. Collects pin URLs, image URLs, and descriptions without requiring login.

Requirements

pip install 'crawlee[playwright]'
playwright install chromium

Usage

python pinterest_scraper.py "search query"
python pinterest_scraper.py "minimalist home decor" 30
python pinterest_scraper.py "brutalist architecture" 100

Arguments:

  • query (required): Search term (use quotes for multi-word queries)
  • max_pins (optional): Maximum pins to collect, default 30
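
The script's own argument handling is not shown here. As a rough sketch, assuming `argparse` is used, the two positional arguments described above could be parsed like this (`parse_cli` is a hypothetical name, not part of the scraper):

```python
import argparse


def parse_cli(argv: list[str]) -> tuple[str, int]:
    """Parse `query` (required) and `max_pins` (optional, default 30)."""
    parser = argparse.ArgumentParser(prog="pinterest_scraper.py")
    parser.add_argument("query", help="search term; quote multi-word queries")
    parser.add_argument(
        "max_pins",
        nargs="?",  # makes the second positional argument optional
        type=int,
        default=30,
        help="maximum pins to collect (default: 30)",
    )
    args = parser.parse_args(argv)
    if args.max_pins < 1:
        parser.error("max_pins must be a positive integer")
    return args.query, args.max_pins
```

Calling `parse_cli(sys.argv[1:])` would yield the query string and the pin limit as a tuple.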

What It Returns

Each pin object contains:

| Field | Description |
|-------|-------------|
| id | Pinterest pin ID |
| url | Full Pinterest pin URL (https://www.pinterest.com/pin/<id>/) |
| image_url | Highest-resolution image URL from srcset |
| description | Image alt text / pin description |
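
To make the record shape concrete, here is a minimal sketch of a typed pin record and a helper that builds one from a link. `Pin` and `make_pin` are hypothetical names, and the sketch assumes pin links have the form `/pin/<digits>/`, as in the example URL; the scraper itself may derive IDs differently.

```python
import re
from typing import Optional, TypedDict


class Pin(TypedDict):
    id: str
    url: str
    image_url: str
    description: str


# Assumption: pin links look like /pin/<digits>/ as in the example URL.
_PIN_HREF = re.compile(r"/pin/(\d+)")


def make_pin(href: str, image_url: str, description: str) -> Optional[Pin]:
    """Build a pin record from a link, or return None if no pin ID is found."""
    match = _PIN_HREF.search(href)
    if match is None:
        return None
    pin_id = match.group(1)
    return Pin(
        id=pin_id,
        url=f"https://www.pinterest.com/pin/{pin_id}/",
        image_url=image_url,
        description=description.strip(),
    )
```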

Output

Saved to ./storage/pinterest/.json:

[
  {
    "id": "123456789",
    "url": "https://www.pinterest.com/pin/123456789/",
    "image_url": "https://i.pinimg.com/736x/ab/cd/ef/...",
    "description": "Minimalist living room with white walls"
  },
  ...
]

How It Works

The scraper uses Crawlee's PlaywrightCrawler to:

  1. Navigate to the Pinterest search URL: https://www.pinterest.com/search/pins/?q=<query>
  2. Wait for pin elements to appear ([data-test-id="pin"] or a[href*="/pin/"])
  3. Iteratively scroll to the bottom to trigger infinite load
  4. Extract pins after each scroll via page.evaluate() JavaScript injection
  5. Deduplicate by pin ID and collect until max_pins is reached or scrolling stalls
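
The control flow of the steps above can be sketched without a browser. This is an illustration under stated assumptions, not the scraper's actual code: `build_search_url` and `collect_pins` are hypothetical names, and `scroll_and_extract` stands in for "scroll once, then run the in-page extractor". The defaults mirror the documented configuration values.

```python
import asyncio
from typing import Awaitable, Callable
from urllib.parse import quote_plus


def build_search_url(query: str) -> str:
    """URL-encode the query and append it to the search endpoint."""
    return "https://www.pinterest.com/search/pins/?q=" + quote_plus(query)


async def collect_pins(
    scroll_and_extract: Callable[[], Awaitable[list[dict]]],
    max_pins: int = 30,
    max_scroll_attempts: int = 10,
    scroll_delay: float = 1.5,
) -> list[dict]:
    """Scroll until max_pins is reached or no new pins appear N times in a row."""
    seen: dict[str, dict] = {}  # pin ID -> pin, in first-seen order
    stalled = 0
    while len(seen) < max_pins and stalled < max_scroll_attempts:
        batch = await scroll_and_extract()
        before = len(seen)
        for pin in batch:
            seen.setdefault(pin["id"], pin)  # keep the first occurrence only
        # Count consecutive scrolls that produced nothing new.
        stalled = stalled + 1 if len(seen) == before else 0
        await asyncio.sleep(scroll_delay)
    return list(seen.values())[:max_pins]
```

Keying the dictionary on the pin ID handles deduplication and preserves discovery order, so the final slice returns the earliest pins found.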

The JavaScript extractor resolves srcset attributes to select the highest-resolution image available:

async def _extract_pins(self, page) -> None:
    """Extract pin data from the current page state."""
    # Runs JS in the browser context to walk the DOM and extract pin data
    # Handles multiple Pinterest DOM structures (data-test-id variants)
    # Resolves srcset to get highest resolution image
    ...
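
The extractor body is elided above and runs as JavaScript in the page. As an illustration of the srcset step only, here is a Python sketch of the same idea: pick the candidate with the largest width (`w`) or density (`x`) descriptor. `best_from_srcset` is a hypothetical name, and the simple comma split assumes image URLs contain no commas.

```python
from typing import Optional


def best_from_srcset(srcset: str) -> Optional[str]:
    """Return the URL with the largest width (w) or density (x) descriptor."""
    best_url: Optional[str] = None
    best_size = -1.0
    for candidate in srcset.split(","):
        parts = candidate.split()
        if not parts:
            continue  # empty segment, e.g. from a trailing comma
        url = parts[0]
        size = 1.0  # a candidate without a descriptor counts as 1x
        if len(parts) > 1 and parts[1][-1] in "wx":
            try:
                size = float(parts[1][:-1])
            except ValueError:
                continue  # skip malformed descriptors
        if size > best_size:
            best_url, best_size = url, size
    return best_url
```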

Configuration

| Parameter | Default | Description |
|-----------|---------|-------------|
| max_pins | 30 | Maximum pins to collect |
| headless | True | Run browser headlessly (set False for debugging) |
| max_scroll_attempts | 10 | Stop after N consecutive scrolls with no new pins |
| scroll_delay | 1.5s | Wait between scrolls for content to load |

To run in headed mode for debugging:

scraper = PinterestScraper(max_pins=10, headless=False)

Integrating Into a Pipeline

import asyncio
from pinterest_scraper import PinterestScraper

async def collect_inspiration(query: str, limit: int = 50) -> list[dict]:
    scraper = PinterestScraper(max_pins=limit, headless=True)
    pins = await scraper.scrape_search(query)
    return pins

pins = asyncio.run(collect_inspiration("editorial fashion photography", 50))

Troubleshooting

No pins found: Pinterest occasionally changes its DOM structure. Try setting headless=False to inspect visually. The scraper attempts two selector strategies ([data-test-id="pin"] and a[href*="/pin/"]).
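
When debugging selector problems, it can help to check which of the two strategies still matches. The sketch below is one possible way to do that, assuming a Playwright `page` object whose `wait_for_selector` raises on timeout; `first_working_selector` is a hypothetical helper, not part of the scraper.

```python
from typing import Any, Optional, Sequence

PIN_SELECTORS = ('[data-test-id="pin"]', 'a[href*="/pin/"]')


async def first_working_selector(
    page: Any,
    selectors: Sequence[str] = PIN_SELECTORS,
    timeout_ms: float = 10_000,
) -> Optional[str]:
    """Return the first selector that matches within the timeout, else None."""
    for selector in selectors:
        try:
            # Playwright's Page.wait_for_selector raises if nothing matches in time.
            await page.wait_for_selector(selector, timeout=timeout_ms)
            return selector
        except Exception:  # broad on purpose: keeps the sketch free of Playwright imports
            continue
    return None
```

A return value of `None` suggests that both selectors are out of date and the DOM needs to be inspected by hand.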

Fewer pins than expected: Pinterest's infinite scroll depends on scroll velocity and network speed. Increase max_scroll_attempts in scrape_search() or add a longer asyncio.sleep() after each scroll.

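Playwright install error: Run playwright install chromium to download the browser binary. If behind a corporate proxy, set HTTPS_PROXY before running the install. If the default browser cache directory is not writable, set PLAYWRIGHT_BROWSERS_PATH to a writable directory.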

Rate limiting / CAPTCHA: Pinterest may show a CAPTCHA after many rapid requests. Add delays between scraper runs or rotate residential IPs.

Rate Limiting Guidelines

  • Wait 5+ seconds between search queries when running multiple
  • Avoid scraping more than 300-500 pins per hour from a single IP
  • Pinterest does not require login for search, but aggressive scraping triggers bot detection
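
These guidelines can be applied when running several queries in one batch. The following is a minimal sketch, not part of the scraper: `run_queries` is a hypothetical helper, `scrape` is any coroutine that returns a list of pins for a query, and the hourly limit is simplified to a per-run cap on the assumption that a run finishes within an hour.

```python
import asyncio
from typing import Awaitable, Callable


async def run_queries(
    queries: list[str],
    scrape: Callable[[str], Awaitable[list[dict]]],
    delay_seconds: float = 5.0,
    hourly_pin_budget: int = 300,
    sleep: Callable[[float], Awaitable[None]] = asyncio.sleep,
) -> dict[str, list[dict]]:
    """Run queries one at a time, pausing between them and stopping at the budget."""
    results: dict[str, list[dict]] = {}
    collected = 0
    for index, query in enumerate(queries):
        if collected >= hourly_pin_budget:
            break  # leave the remaining queries for a later run
        if index > 0:
            await sleep(delay_seconds)  # spacing between search queries
        pins = await scrape(query)
        results[query] = pins
        collected += len(pins)
    return results
```

With the scraper from this document, `scrape` could be something like `lambda q: PinterestScraper(max_pins=50).scrape_search(q)`. The `sleep` parameter exists only so the delay can be replaced in tests.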

Use Cases

  • Visual trend research: Collect images around a topic to identify visual patterns
  • Dataset creation: Build image datasets for computer vision or aesthetic scoring models
  • Content planning: Find top-performing visuals in a niche to guide creative direction
  • Competitive research: Scrape brand-related queries to see what imagery dominates a category
  • Mood board generation: Automate collection of reference images for design projects

Version History

1 version in total

  • v1.0.0 (current)
    2026-03-30 22:24 · Security: safe, safe

Security Checks

Tencent Cloud Security (Keen): safe, no risk
Tencent Cloud Security (Sanbu): safe, no risk
