Overview

Yula Web Search Skill

Custom web search skill by Yula — NO API KEY REQUIRED.

Uses public search engines that need no API key or account. It works through direct curl requests from the local network:

Parse Bing search results to get the top results (title + URL)
Extract full content from the top 2-3 most relevant URLs
Summarize all information into a comprehensive answer
If one method fails, automatically try the next fallback

Just works, no configuration needed.

When to Use

✅ USE this skill when:

Search for latest news, information, or products
Look up current prices, availability, or status
Find answers to questions that require up-to-date data
Extract content from a specific URL
Research topics that need current web information
Chinese language searches (optimized for China region)

❌ DON'T use when:

Weather forecast → use weather skill
Local file search → use file tools
Already have the information in context

Search Workflow

Complete Workflow (with content extraction)

Search Bing → get top 5-8 results with title + URL
Filter results → select top 2-3 most relevant URLs based on title matching query
Extract content from each selected URL using curl and a Python HTML-to-text step
Combine all extracted content
Summarize into a coherent answer for the user

Search Methods (Tried in Order)

Method 1: Direct Bing Search (cn.bing.com, Primary)

Direct request to Chinese Bing, parse HTML to extract result titles and URLs:

QUERY="your search query"
QUERY_ENCODED=$(python3 -c "import sys, urllib.parse; print(urllib.parse.quote(sys.argv[1]))" "$QUERY")
curl -s -m 20 -L -A "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36" "https://cn.bing.com/search?q=$QUERY_ENCODED" | python3 -c "
import sys
from html.parser import HTMLParser

class BingParser(HTMLParser):
    def __init__(self):
        super().__init__()
        self.results = []
        self.in_h2 = False
        self.current_url = None
        self.current_title = []
    def handle_starttag(self, tag, attrs):
        attrs_dict = dict(attrs)
        if tag == 'h2':
            self.in_h2 = True
            self.current_title = []
        if tag == 'a' and self.in_h2 and 'href' in attrs_dict:
            url = attrs_dict['href']
            if 'bing.com' not in url and url.startswith('http'):
                self.current_url = url
    def handle_endtag(self, tag):
        if tag == 'h2':
            if self.current_url:
                title = ''.join(self.current_title).strip()
                self.results.append((title, self.current_url))
                self.current_url = None
            self.in_h2 = False
    def handle_data(self, data):
        if self.in_h2 and self.current_url:
            self.current_title.append(data)

parser = BingParser()
parser.feed(sys.stdin.read())
for i, (title, url) in enumerate(parser.results[:8]):
    print(f'{i+1}\\t{title}\\t{url}')
"
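The parser above prints one result per line as rank, title, and URL separated by tabs. A minimal sketch of reading that output back into structured records follows; `SearchResult` and `parse_result_lines` are hypothetical names introduced here for illustration, not part of the skill.

```python
from typing import List, NamedTuple


class SearchResult(NamedTuple):
    """One search hit (hypothetical container, not defined by the skill)."""
    rank: int
    title: str
    url: str


def parse_result_lines(output: str) -> List[SearchResult]:
    """Parse 'rank<TAB>title<TAB>url' lines and skip anything malformed."""
    results = []
    for line in output.splitlines():
        parts = line.split("\t")
        # Expect exactly three fields with a numeric rank in the first one
        if len(parts) != 3 or not parts[0].isdecimal():
            continue
        rank, title, url = parts
        results.append(SearchResult(int(rank), title.strip(), url.strip()))
    return results
```

Lines that do not have exactly three tab-separated fields are dropped rather than repaired, so a title that itself contains a tab is skipped.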

Method 2: Direct Google Search (google.com, Fallback 1)

If Bing fails, try Google:

QUERY="your search query"
QUERY_ENCODED=$(python3 -c "import sys, urllib.parse; print(urllib.parse.quote(sys.argv[1]))" "$QUERY")
curl -s -m 20 -L -A "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36" "https://www.google.com/search?q=$QUERY_ENCODED" | python3 -c "
import re, sys
html = sys.stdin.read()
# Class name comes from this skill's original pattern and may not match current Google markup
pattern = r'<h3 class=\"zBAuLc\"><a href=\"([^\"]+)\"'
matches = re.findall(pattern, html)
# Simplified extraction: print the first 8 URLs (titles would need separate parsing)
for i, url in enumerate(matches[:8]):
    print(f'{i+1}\\t{url}')
"
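The "tried in order" behavior can be sketched as a small loop over search functions. This is an illustration under assumptions, not the skill's actual code: `search_with_fallback` is a hypothetical helper, and each search function is assumed to take a query string and return a list of (title, url) pairs.

```python
from typing import Callable, List, Sequence, Tuple

Result = Tuple[str, str]  # (title, url)
SearchFn = Callable[[str], List[Result]]


def search_with_fallback(
    query: str, methods: Sequence[Tuple[str, SearchFn]]
) -> Tuple[str, List[Result]]:
    """Try each (name, function) pair in order; return the first non-empty result list."""
    for name, search in methods:
        try:
            results = search(query)
        except Exception:
            # Timeouts, blocks, and parse errors are all treated as "try the next method"
            continue
        if results:
            return name, results
    # Every method failed or returned nothing
    return "", []
```

Returning the method name alongside the results lets the caller report which engine actually answered.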

Extract Content from URL (after search)

After getting search results, extract text content from top relevant URLs:

def extract_url_content(url):
    """Outline only: fetch the page and return cleaned text.

    1. Use curl to get the HTML
    2. Use Python to extract the text, dropping script/style content
    3. Return the cleaned text, limited to ~2000 chars per URL
    """

Example full workflow:

# After getting search results, put the top 2-3 relevant URLs in the SELECTED_URLS array
USER_AGENT="Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36"
for url in "${SELECTED_URLS[@]}"; do
    curl -s -m 20 -L -A "$USER_AGENT" "$url" | python3 -c "
import sys
from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    # Tags whose content is never shown to the reader
    SKIP = ('script', 'style', 'noscript')
    def __init__(self):
        super().__init__()
        self.text = []
        self.skip_depth = 0
    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self.skip_depth += 1
    def handle_endtag(self, tag):
        if tag in self.SKIP and self.skip_depth:
            self.skip_depth -= 1
    def handle_data(self, data):
        if not self.skip_depth:
            words = data.strip()
            if words:
                self.text.append(words)

parser = TextExtractor()
parser.feed(sys.stdin.read())
# Collapse whitespace and limit length
content = ' '.join(' '.join(parser.text).split())[:2000]
print(content)
"
done

User-Agent

Always use a modern browser User-Agent to avoid being blocked immediately:

"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36"

Complete Example Workflow

> "Search 2026腾势Z9GT介绍"

Step 1: Search Bing → get top 8 results

# Output gives title + url:
1.  2026款腾势Z9GT-腾势官网
    URL: https://www.tengshiauto.com/product-detail/26-z9gt.html
2.  【腾势Z9GT】腾势_腾势Z9GT报价_腾势Z9GT图片_汽车之家
    URL: https://www.autohome.com.cn/7659
...

Step 2: Select most relevant results

Pick 2-3 results that best match the query

Official website
Autohome article

Step 3: Extract content from each URL

Get cleaned text from each page
Limit to 2000 characters per page

Step 4: Combine and summarize

Read all extracted content
Summarize into a coherent answer with key information: price, specs, release date

Best Practices

Content extraction → Always extract top 2-3 results, not more (avoids too much text)
Relevance filtering → Prefer results whose titles contain more of the query keywords
Character limit → Limit extracted text to ~5000 chars in total to avoid token overflow
Timeout → 20 seconds max per request to avoid hanging
Fallback → if one search engine fails, automatically try next
Chinese optimized → prefer Chinese keywords, Chinese websites
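The two length limits (about 2000 characters per page and about 5000 in total) can be applied together when combining the extracted text. The following is a minimal sketch; `combine_with_budget` is a hypothetical helper, and counting only text characters (not the separators between pages) is an assumption made here.

```python
from typing import List


def combine_with_budget(pages: List[str], per_page: int = 2000, total: int = 5000) -> str:
    """Join page texts, capping each page and the overall text length."""
    chunks = []
    remaining = total
    for text in pages:
        if remaining <= 0:
            break
        # Collapse whitespace first so the budget is spent on real content
        chunk = " ".join(text.split())[: min(per_page, remaining)]
        if chunk:
            chunks.append(chunk)
            remaining -= len(chunk)
    # Blank line between pages; separators are not counted against the budget
    return "\n\n".join(chunks)
```

With three long pages, the first two get their full 2000 characters and the third is cut to the remaining 1000.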

How to select relevant results

Count how many query keywords appear in the title
Sort results by relevance
Pick top N (2-3) for content extraction
Official sites and major portal sites have higher priority
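The selection rules above can be sketched as a scoring function. This is one possible reading, not the skill's own code: `select_relevant` and `priority_domains` are hypothetical names, the one-point boost for priority sites is an assumed weighting, and the caller is assumed to supply the keywords already split (Chinese queries have no spaces to split on).

```python
from typing import List, Sequence, Tuple
from urllib.parse import urlparse

Result = Tuple[str, str]  # (title, url)


def select_relevant(
    results: Sequence[Result],
    keywords: Sequence[str],
    priority_domains: Sequence[str] = (),
    top_n: int = 3,
) -> List[Result]:
    """Rank results by keyword hits in the title, boosting priority domains."""

    def score(item: Result) -> int:
        title, url = item
        lowered = title.lower()
        hits = sum(1 for kw in keywords if kw.lower() in lowered)
        host = urlparse(url).netloc.lower()
        # Assumed weighting: one extra point for an official or portal domain
        boosted = any(host == d or host.endswith("." + d) for d in priority_domains)
        return hits + (1 if boosted else 0)

    # sorted() is stable, so ties keep the search engine's original order
    return sorted(results, key=score, reverse=True)[:top_n]
```

Because ties preserve the original ranking, the search engine's own ordering still acts as the final tiebreaker.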

Notes

NO API KEY REQUIRED — works out of the box
Multiple fallback methods → if one gets blocked, try the next
Direct curl via local network → uses your existing network/proxy
Python HTML parsing → extracts title/url reliably
Automatic content extraction and summarization → gives complete answer without user having to click links
Modern browser User-Agent → reduces chance of being blocked
Free for non-commercial use
Rate limits: be respectful, don't spam too many requests quickly
If all methods fail, fall back to general knowledge
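One way to follow the rate-limit note is to enforce a minimum gap between requests. The sketch below is an assumption-laden illustration: `Throttle` is a hypothetical class, and the 2-second default is an arbitrary choice, not a limit documented by any search engine.

```python
import time
from typing import Callable, Optional


class Throttle:
    """Enforce a minimum gap between consecutive requests."""

    def __init__(
        self,
        min_interval: float = 2.0,  # assumed default, tune as needed
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        self.min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last: Optional[float] = None

    def wait(self) -> float:
        """Sleep if the previous request was too recent; return the time slept."""
        now = self._clock()
        delay = 0.0
        if self._last is not None:
            delay = max(0.0, self.min_interval - (now - self._last))
            if delay:
                self._sleep(delay)
        # Record when the upcoming request is expected to start
        self._last = now + delay
        return delay
```

Calling `wait()` before each curl request keeps requests spaced out; the injectable `clock` and `sleep` arguments exist only to make the class easy to test.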

Full Workflow Summary

1. Search Bing → get (title + url)
   ↓
2. Filter by relevance → pick top 2-3
   ↓
3. Extract content from each URL
   ↓
4. Combine all text
   ↓
5. Summarize into final answer
   ↓
6. Present to user with sources

Author

Created by Yula

GitHub: https://github.com/wjzhb/yula-web-search

License

Licensed under the MIT License

If you find this skill useful, please ⭐ star it on GitHub!

Version History

1 version in total

v1.0.1 (current): Created by Yula. GitHub: https://github.com/wjzhb/yula-web-search

2026-04-08 11:18 (marked safe)
