Search Indeed for job listings and company info via Bright Data's Web Scraper API. Designed for recruiting workflows on messaging platforms (Telegram, Signal) with smart defaults.
- `BRIGHTDATA_API_KEY` environment variable must be set
- `curl` and `jq` must be available

User wants job info?
├── Has a specific Indeed URL?
│ ├── Job URL (/viewjob?) → indeed_jobs_by_url.sh [SYNC — seconds]
│ ├── Company jobs URL (/cmp/*/jobs) → indeed_jobs_by_company.sh [ASYNC — minutes]
│ └── Company page URL (/cmp/*) → indeed_company_by_url.sh [SYNC — seconds]
├── Wants to search by keyword/location?
│ └── indeed_smart_search.sh [ASYNC — 3-8 min]
│ Agent says: "Searching now, this takes a few minutes."
│ If results < 5: auto-expands date range, do NOT ask user
│ Always pipe output through: indeed_format_results.sh --top 5
├── Wants company info?
│ ├── Has Indeed company URL → indeed_company_by_url.sh [SYNC — seconds]
│ ├── Has keyword → indeed_company_by_keyword.sh [ASYNC — minutes]
│ └── Has industry + state → indeed_company_by_industry.sh [ASYNC — minutes]
└── Check pending results? → indeed_check_pending.sh (run on heartbeat)
Always prefer sync (URL-based) scripts when the user provides a URL — they return in seconds.
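The URL branch of the decision tree above can be sketched as a small routing function. This is only an illustration: `route_indeed_url` is a hypothetical helper, not one of the shipped scripts, and the patterns assume the URL shapes listed in the tree (`/viewjob?`, `/cmp/*/jobs`, `/cmp/*`).

```shell
# Hypothetical helper: map an Indeed URL to the script that handles it.
# Prints the script name, or returns 1 if the URL matches no known shape.
route_indeed_url() {
  case "$1" in
    *"/viewjob?"*)
      echo "indeed_jobs_by_url.sh" ;;        # SYNC, seconds
    */cmp/*/jobs|*/cmp/*/jobs[?/]*)
      echo "indeed_jobs_by_company.sh" ;;    # ASYNC, minutes
    */cmp/*)
      echo "indeed_company_by_url.sh" ;;     # SYNC, seconds
    *)
      echo "unrecognized Indeed URL: $1" >&2
      return 1 ;;
  esac
}
```

The company-jobs pattern is checked before the plain company pattern because every `/cmp/*/jobs` URL also matches `/cmp/*`.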
| Script | Purpose | Mode |
|---|---|---|
| `indeed_smart_search.sh` | Primary job search: keyword expansion, parallel queries, dedup, caching | ASYNC |
| `indeed_jobs_by_url.sh` | Collect job details by URL(s) | SYNC |
| `indeed_jobs_by_keyword.sh` | Low-level single-keyword job search (used by smart search internally) | ASYNC |
| `indeed_jobs_by_company.sh` | Discover jobs from company page | ASYNC |
| `indeed_company_by_url.sh` | Collect company info by URL | SYNC |
| `indeed_company_by_keyword.sh` | Discover companies by keyword | ASYNC |
| `indeed_company_by_industry.sh` | Discover companies by industry/state | ASYNC |
| `indeed_format_results.sh` | Format JSON results into summary, full, or CSV | Local |
| `indeed_check_pending.sh` | Check/fetch completed pending searches + auto-cleanup | Local/API |
| `indeed_poll_and_fetch.sh` | Poll async job and fetch results (internal) | API |
| `indeed_list_datasets.sh` | List available Indeed dataset IDs | API |
User says: "Find me cybersecurity jobs in New York"
scripts/indeed_smart_search.sh "cybersecurity" US "New York, NY" \
| scripts/indeed_format_results.sh --type jobs --top 5
User says: "Get details on this job: https://www.indeed.com/viewjob?jk=abc123"
scripts/indeed_jobs_by_url.sh "https://www.indeed.com/viewjob?jk=abc123"
- Always format results with `indeed_format_results.sh`.
- Run `indeed_check_pending.sh` first before starting a new search.
- Use the `---SPLIT---` markers from `indeed_format_results.sh` to break output across messages.

# Basic search (expands keywords, deduplicates, defaults to last 7 days)
scripts/indeed_smart_search.sh "cybersecurity" US "Remote"
# All-time search
scripts/indeed_smart_search.sh "nursing" US "Texas" --all-time
# Skip keyword expansion
scripts/indeed_smart_search.sh "registered nurse" US "Ohio" --no-expand
# Bypass 6-hour cache
scripts/indeed_smart_search.sh "data science" US "New York" --force
Output is {"meta": {...}, "results": [...]} with metadata including query params, keywords used, and result counts.
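Because the output is a single JSON object with `meta` and `results` keys, `jq` can inspect it before formatting. The sketch below touches only those two top-level keys; the helper names `count_results` and `has_min_results` are hypothetical, and the threshold of 5 simply mirrors the "results < 5" rule in the decision tree.

```shell
# Hypothetical helpers for inspecting a saved smart-search result file.

# Print how many entries the "results" array holds.
count_results() {
  jq '.results | length' "$1"
}

# Succeed (exit 0) only if the file has at least $2 results.
# jq -e maps a final "false" output to a non-zero exit status.
has_min_results() {
  jq -e --argjson min "$2" '(.results | length) >= $min' "$1" >/dev/null
}

# Example usage, assuming results.json came from indeed_smart_search.sh:
#   count_results results.json
#   has_min_results results.json 5 || echo "fewer than 5 results"
#   jq '.meta' results.json        # show the metadata block as-is
```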
# Telegram-friendly summary (default)
scripts/indeed_format_results.sh --type jobs --top 5 results.json
# CSV export
scripts/indeed_format_results.sh --type jobs --format csv results.json
# Companies
scripts/indeed_format_results.sh --type companies --top 5 companies.json
# Pipe from smart search
scripts/indeed_smart_search.sh "nurse" US "Ohio" | scripts/indeed_format_results.sh --top 5
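When formatted output is too long for one chat message, the `---SPLIT---` markers emitted by `indeed_format_results.sh` can drive the chunking. The following is a minimal sketch, assuming each marker sits alone on its own line; `split_on_marker` and `send_message` are hypothetical names, and `send_message` is a stand-in for whatever the messaging platform actually provides.

```shell
# Hypothetical stand-in for the real "send to Telegram/Signal" call.
send_message() {
  printf '>>> new message\n%s' "$1"
}

# Read formatted output on stdin and call send_message once per chunk.
# Assumption: the marker appears alone on a line.
split_on_marker() {
  local chunk line
  chunk=""
  while IFS= read -r line || [ -n "$line" ]; do
    if [ "$line" = "---SPLIT---" ]; then
      if [ -n "$chunk" ]; then send_message "$chunk"; fi
      chunk=""
    else
      # Append the line plus a newline to the current chunk.
      chunk="${chunk}${line}
"
    fi
  done
  # Flush whatever follows the last marker.
  if [ -n "$chunk" ]; then send_message "$chunk"; fi
}

# Example usage:
#   scripts/indeed_format_results.sh --type jobs --top 5 results.json | split_on_marker
```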
scripts/indeed_check_pending.sh
# Output: {"completed":[...],"still_pending":[...],"failed":[...]}
Run this periodically. If ~/.config/indeed-brightdata/pending.json exists and is non-empty, check for completed results. Format completed results with indeed_format_results.sh and send to the user.
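The heartbeat check described above might be guarded like this. It is a sketch under assumptions: `has_pending` is a hypothetical helper, and since the exact layout of `pending.json` is not specified here, "non-empty" is read as "the file has content other than an empty JSON array or object".

```shell
# Hypothetical helper: succeed only if the pending file holds something to check.
has_pending() {
  local file content
  file="${1:-$HOME/.config/indeed-brightdata/pending.json}"
  # Missing or zero-byte file: nothing pending.
  [ -s "$file" ] || return 1
  # Assumption: an empty JSON array/object also means nothing pending.
  content="$(tr -d '[:space:]' < "$file")"
  case "$content" in
    ''|'[]'|'{}') return 1 ;;
    *) return 0 ;;
  esac
}

# Example heartbeat usage:
#   if has_pending; then scripts/indeed_check_pending.sh; fi
```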
| Code | Meaning | Agent should... |
|---|---|---|
| 0 | Success — results on stdout | Format and present results |
| 1 | Error — something failed | Report the error |
| 2 | Deferred — still processing, saved to pending | Tell user "results are still processing, I'll follow up" |
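A caller can branch on these exit codes with a small wrapper. The sketch below is illustrative only: `run_and_report` is a hypothetical name, and the messages are placeholders for whatever the agent actually says.

```shell
# Hypothetical wrapper: run a command and react to the 0 / 1 / 2 exit codes.
run_and_report() {
  local output status
  status=0
  # Capture stdout; record the exit status without aborting under `set -e`.
  output="$("$@")" || status=$?
  case "$status" in
    0) printf '%s\n' "$output" ;;                                    # success: pass results on
    2) printf '%s\n' "Results are still processing, I'll follow up." ;;  # deferred
    *) printf 'Search failed (exit code %s).\n' "$status" >&2        # error
       return 1 ;;
  esac
}

# Example usage:
#   run_and_report scripts/indeed_smart_search.sh "nurse" US "Ohio" \
#     | scripts/indeed_format_results.sh --top 5
```

In the deferred case the formatter would receive the status sentence rather than JSON, so a real agent would more likely branch before piping.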
Smart search caches results for 6 hours. Identical searches (same keyword + location + country) return cached results without API calls. Use --force to bypass. Old results (>7 days) are auto-cleaned by indeed_check_pending.sh.
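To make the cache rule concrete, here is one way a 6-hour lookup could work. This is not the actual implementation: `cache_key` and `is_cache_fresh` are hypothetical, and lowercasing the key is an assumption about what counts as an "identical" search.

```shell
CACHE_TTL_SECONDS=21600   # 6 hours = 6 * 60 * 60

# Hypothetical: build a cache key from keyword, location, and country.
# Assumption: searches differing only in letter case count as identical.
cache_key() {
  printf '%s|%s|%s' "$1" "$2" "$3" | tr '[:upper:]' '[:lower:]'
}

# Succeed if an entry saved at epoch $1 is still fresh at epoch $2 (default: now).
is_cache_fresh() {
  local saved now
  saved="$1"
  now="${2:-$(date +%s)}"
  [ $((now - saved)) -lt "$CACHE_TTL_SECONDS" ]
}

# Example usage:
#   key="$(cache_key "cybersecurity" "New York, NY" "US")"
#   if is_cache_fresh "$saved_at"; then echo "serve cached results for $key"; fi
```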
All persistent data is stored under ~/.config/indeed-brightdata/:
| File | Purpose | Lifecycle |
|---|---|---|
| `datasets.json` | Bright Data dataset IDs | Created on first `indeed_list_datasets.sh --save`, rarely changes |
| `pending.json` | In-flight async snapshots | Entries added on poll timeout (exit 2) or fire-and-forget (`--no-wait`), removed when fetched or after 24h |
| `history.json` | Search cache index | Entries added per search, auto-cleaned after 7 days |
| `results/*.json` | Fetched result data | Written when snapshots complete, auto-cleaned after 7 days |
Auto-cleanup runs at the start of indeed_check_pending.sh. No data is sent anywhere other than the Bright Data API.
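To preview which result files the 7-day rule would cover, a read-only listing like the one below can help. It is a sketch, not the cleanup code itself: `list_stale_results` is a hypothetical helper, and it assumes file modification time is what determines age.

```shell
# Hypothetical helper: list result files older than 7 days without deleting them.
list_stale_results() {
  local dir
  dir="${1:-$HOME/.config/indeed-brightdata/results}"
  # Nothing to list if the directory does not exist yet.
  [ -d "$dir" ] || return 0
  # -mtime +7 matches files whose age, rounded down to whole days, exceeds 7.
  find "$dir" -type f -name '*.json' -mtime +7
}

# Example usage:
#   list_stale_results
```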
All scripts source scripts/_lib.sh for shared HTTP and persistence functions. The library:
- Calls `https://api.brightdata.com/datasets/v3`
- Uses `BRIGHTDATA_API_KEY` (sent via the `Authorization: Bearer` header)
- Reads and writes `~/.config/indeed-brightdata/` (see the data storage table above)

See references/api-reference.md for complete endpoint documentation, response schemas, and country/domain mappings.
See references/keyword-expansions.json for the lookup table of keyword-to-job-title mappings.
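For orientation, the authentication handling described for the shared library could look roughly like this. It is a guess at the shape, not the contents of `scripts/_lib.sh`: `bd_auth_header` and `bd_get` are hypothetical names, and the request path is left to the caller because endpoint paths are documented in references/api-reference.md rather than here.

```shell
BD_API_BASE="https://api.brightdata.com/datasets/v3"

# Hypothetical: print the Authorization header, or fail if the key is missing.
bd_auth_header() {
  if [ -z "${BRIGHTDATA_API_KEY:-}" ]; then
    echo "BRIGHTDATA_API_KEY is not set" >&2
    return 1
  fi
  printf 'Authorization: Bearer %s' "$BRIGHTDATA_API_KEY"
}

# Hypothetical: GET a caller-supplied path under the API base URL.
bd_get() {
  local header
  header="$(bd_auth_header)" || return 1
  # --fail turns HTTP error responses into a non-zero exit status.
  curl --silent --show-error --fail -H "$header" "${BD_API_BASE}/$1"
}
```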