You are building a structured knowledge base that gives AI agents everything they need to understand a person, their business, their voice, and their boundaries. This is the foundation for all future AI work. The better the KB, the better every output from day one.
This is the OpenClaw version of the init-kb skill. It uses the Firecrawl REST API (via curl) instead of the Firecrawl CLI.
IMPORTANT: Knowledge bases must load ON-DEMAND, not at boot. No agent session should preload all KB files. Preloading causes massive context bloat and kills productivity.
The pattern:
This is critical for multi-project workspaces. Enforcing this throughout Phase 7 (Integration) prevents context waste.
9 files in KNOWLEDGE BASE/:
| File | What It Captures |
|---|---|
| PERSONA.md | Agent identity, core behavioral rules, boundaries, vibe |
| CONTEXT.md | Business context, goals, market position, competitors, non-negotiables |
| USER.md | The person/founder: background, origin story, personality, differentiators |
| VOICE.md | Writing style: tone, vocabulary, banned words/phrases, quality test |
| GUARDRAILS.md | Brand rules, things to never say, sensitive topics, approval gates |
| SITEMAP.md | Complete site structure (only if 20+ pages; otherwise folded into CONTEXT.md) |
| BUSINESS-INTEL.md | Products, pricing, business model, audience, positioning, tech stack, team |
| OPPORTUNITIES.md | Gaps, thin content, broken journeys, growth signals |
| CORRECTIONS.md | Self-improving log: every correction the user makes updates the source KB file and gets logged here |
Plus:
- site-content/ directory with every scraped page as its own markdown file

Init triggers: "init kb", "build kb", "create kb for X", "set up kb", "new kb"
Update triggers: "update kb", "refresh kb", "re-scrape kb", "kb update"
When an update trigger fires, skip to the Update Flow section.
The Firecrawl API key must be available. Check for it in this order:
1. FIRECRAWL_API_KEY environment variable
2. .firecrawl/api-key.txt in the workspace root

If the user provides a key, save it to .firecrawl/api-key.txt (one line, just the key). Read from this file on future runs.
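The lookup order above can be sketched as two small shell functions. This is a minimal sketch, not part of Firecrawl or OpenClaw: the names `resolve_firecrawl_key` and `save_firecrawl_key` are hypothetical, and the `chmod 600` step is an added assumption about wanting the key file readable only by the current user.

```shell
# Hypothetical helper: print the API key, checking the environment first
# and the saved key file second. Returns non-zero if neither has a key.
resolve_firecrawl_key() {
  local key_file="${1:-.firecrawl/api-key.txt}"
  if [ -n "${FIRECRAWL_API_KEY:-}" ]; then
    printf '%s\n' "$FIRECRAWL_API_KEY"
  elif [ -s "$key_file" ]; then
    # The file holds one line containing only the key.
    head -n 1 "$key_file" | tr -d '[:space:]'
    printf '\n'
  else
    return 1
  fi
}

# Hypothetical helper: save a user-supplied key so later runs can read it back.
save_firecrawl_key() {
  local key="$1" key_file="${2:-.firecrawl/api-key.txt}"
  mkdir -p "$(dirname "$key_file")"
  printf '%s\n' "$key" > "$key_file"
  chmod 600 "$key_file"   # assumption: restrict read access to the current user
}
```

A caller might run `FIRECRAWL_API_KEY="$(resolve_firecrawl_key)" || echo "No key found"` before issuing any request.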
in the output and move on. Never pressure.

When the skill triggers (init), open with this welcome message before asking anything:
Welcome to init-kb. I'm going to build a complete knowledge base that gives your AI agents full context on your business — who you are, what you sell, how you write, and what the rules are.
Here's what we're building:
- 9 structured files covering your business, person, voice, and boundaries
- Your full website scraped and analyzed (if you have one)
- A living KB that gets smarter over time
Estimated time: 10-20 minutes (faster if you have a website to scrape)
Do you have a Firecrawl API key? That's what I use to scrape your website and social profiles. If not, grab one free here: https://firecrawl.link/operator — come back when you have it and we'll continue.
If they say they have a key (or one is already saved), move to API key setup guidance below.
If they don't have one yet, wait for them to confirm before proceeding.
API key setup — ask first: "Are you running OpenClaw locally on your machine (Mac, PC) or on a server/VPS?"
If local:
To save your key permanently, run this in your terminal:
echo 'export FIRECRAWL_API_KEY=your-key-here' >> ~/.zshrc && source ~/.zshrc
Or if you use bash: replace .zshrc with .bashrc
Then paste your key here and I'll also save it to .firecrawl/api-key.txt as a backup.
If VPS/server:
Three ways to add it:
Option 1 — Hostinger (GUI):
Log into Hostinger, go to Catalogue, click Manage on your VPS, scroll down to Environment Variables, and add:
Key: FIRECRAWL_API_KEY
Value: your-key-here
Option 2 — Any VPS via terminal:
echo 'export FIRECRAWL_API_KEY=your-key-here' >> ~/.bashrc && source ~/.bashrc
(Replace .bashrc with .zshrc if you use zsh.)
Option 3 — Just paste it here:
Paste your key directly in this chat and I'll save it to .firecrawl/api-key.txt. Only do this if you're the only one with access to your server and Discord channel. Never paste API keys in shared or public channels.
After they paste the key, save it to .firecrawl/api-key.txt and confirm: "Got it. Key saved."
Then proceed:
Question 1: "What's the project or business name? This becomes the folder name."
Question 2: "Got a website URL? Any social profiles (LinkedIn, X, YouTube, Instagram)? Any other important links (docs, portfolio, press pages, Skool community)? Drop them all here. If you don't have any yet, just say 'none' and we'll skip the scraping."
After Phase 0:
- If KNOWLEDGE BASE/<project-name>/ already exists:
- If .firecrawl/<project-slug>/crawl-raw.json exists and is less than 7 days old: "I already scraped this site on [date]. Want to use the cached data or re-scrape?" If the cache is good, skip to Stage 4.

curl -s -X POST "https://api.firecrawl.dev/v1/map" \
-H "Authorization: Bearer $FIRECRAWL_API_KEY" \
-H "Content-Type: application/json" \
-d '{"url": "<website-url>", "limit": 500}' \
-o .firecrawl/<project-slug>/map-result.json
Parse the response to get the URL list and count. Present to the user: "I found X pages on your site. Crawling all of them will use approximately X Firecrawl credits. Want me to proceed, or should I limit it?"
If the user wants to limit, ask for a number or suggest core pages only.
curl -s -X POST "https://api.firecrawl.dev/v1/crawl" \
-H "Authorization: Bearer $FIRECRAWL_API_KEY" \
-H "Content-Type: application/json" \
-d '{"url": "<website-url>", "limit": <N>, "scrapeOptions": {"formats": ["markdown"]}}' \
-o .firecrawl/<project-slug>/crawl-job.json
This returns a job ID. Poll for completion:
curl -s -X GET "https://api.firecrawl.dev/v1/crawl/<job-id>" \
-H "Authorization: Bearer $FIRECRAWL_API_KEY" \
-o .firecrawl/<project-slug>/crawl-raw.json
Poll every 10 seconds until status is "completed". Tell the user "Crawling... this might take a minute" while waiting.
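The polling step can be sketched as a loop like the one below. It is a minimal sketch under stated assumptions: `fetch_crawl_status`, `extract_status`, and `poll_crawl` are hypothetical helper names, `POLL_INTERVAL` and `MAX_POLLS` are illustrative knobs rather than Firecrawl settings, and treating a `"failed"` status as terminal is an assumption about the response. The status is pulled out with grep and sed to avoid extra dependencies; a real JSON parser would be more robust.

```shell
# Hypothetical wrapper: print the raw JSON status response for a crawl job.
fetch_crawl_status() {
  curl -s -X GET "https://api.firecrawl.dev/v1/crawl/$1" \
    -H "Authorization: Bearer $FIRECRAWL_API_KEY"
}

# Pull the first "status" value out of a JSON string (simple text match).
extract_status() {
  printf '%s' "$1" | grep -o '"status"[[:space:]]*:[[:space:]]*"[^"]*"' \
    | head -n 1 | sed 's/.*"\([^"]*\)"$/\1/'
}

# Poll until the job reports "completed". The last response is always written
# to the output file, so partial results survive a timeout or failure.
poll_crawl() {
  local job_id="$1" out_file="$2"
  local interval="${POLL_INTERVAL:-10}" max_polls="${MAX_POLLS:-60}"
  local attempt=0 body="" status=""
  while [ "$attempt" -lt "$max_polls" ]; do
    body="$(fetch_crawl_status "$job_id")"
    status="$(extract_status "$body")"
    case "$status" in
      completed) printf '%s\n' "$body" > "$out_file"; return 0 ;;
      failed)    break ;;   # assumption: "failed" means the job will not finish
    esac
    attempt=$((attempt + 1))
    sleep "$interval"
  done
  printf '%s\n' "$body" > "$out_file"
  return 1
}
```

A non-zero return signals that the caller should continue with whatever the output file contains.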
After completion, parse the JSON and save each page as an individual markdown file in KNOWLEDGE BASE/<project-name>/site-content/ using URL-slug naming (e.g., homepage.md, about.md, products-widget-x.md).
If the crawl times out or fails: save whatever partial results were collected and continue with what you have.
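One way to derive the URL-slug file names is sketched below. The function name `url_to_slug` is hypothetical, and the exact rules (lowercasing, replacing every character outside `a-z`, `0-9`, and `-` with a hyphen, mapping the site root to `homepage`) are assumptions chosen to reproduce the example names above.

```shell
# Hypothetical helper: turn a page URL into a file-name slug.
url_to_slug() {
  local path
  # Drop the scheme and host, then any query string or fragment.
  path="$(printf '%s' "$1" | sed -E 's|^[a-zA-Z]+://[^/]*||; s|[?#].*$||')"
  # Trim leading and trailing slashes, join segments with hyphens, lowercase.
  path="$(printf '%s' "$path" | sed -E 's|^/+||; s|/+$||; s|/+|-|g' \
    | tr '[:upper:]' '[:lower:]')"
  # Replace anything outside [a-z0-9-] so the name is safe on disk.
  path="$(printf '%s' "$path" | sed -E 's|[^a-z0-9-]+|-|g; s|^-+||; s|-+$||')"
  # Assumption: an empty path means the site root, saved as "homepage".
  printf '%s\n' "${path:-homepage}"
}
```

Each page's markdown would then be written to a file named `$(url_to_slug "$url").md` inside the site-content/ directory.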
For each social profile and important link, scrape individually:
curl -s -X POST "https://api.firecrawl.dev/v1/scrape" \
-H "Authorization: Bearer $FIRECRAWL_API_KEY" \
-H "Content-Type: application/json" \
-d '{"url": "<url>", "formats": ["markdown"]}' \
-o .firecrawl/<project-slug>/social/<platform>.json
Parse the markdown content from each response and save as .md files in .firecrawl/<project-slug>/social/ and .firecrawl/<project-slug>/links/.
If a social scrape fails (anti-bot, login wall): note "limited extraction" and continue.
Read through every scraped page and social profile. Build a complete mental model of the business. Do not skim. Do not sample. Read it all.
Extract into BUSINESS-INTEL.md:
1. Products and Services
2. Business Model
3. Target Audience
4. Brand Positioning
5. Content Strategy
6. Tech Stack and Tools (if detectable)
7. Team and People
8. Legal and Compliance
9. Voice and Messaging Patterns
Size check: If BUSINESS-INTEL.md would exceed roughly 2000 words, split it. Core facts (products, pricing, model, audience, positioning, team, tech) stay in BUSINESS-INTEL.md. Analysis and opportunities (gaps, content signals, growth patterns) go to OPPORTUNITIES.md.
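The size check can be approximated with a word count. This is a minimal sketch: `needs_split` is a hypothetical helper name, and since the threshold is "roughly 2000 words", the limit is a parameter rather than a hard rule.

```shell
# Hypothetical helper: succeed (exit 0) when a file is over the word limit.
needs_split() {
  local file="$1" limit="${2:-2000}" words
  # Arithmetic expansion strips any padding some wc implementations add.
  words=$(( $(wc -w < "$file") ))
  [ "$words" -gt "$limit" ]
}
```

For example, `needs_split BUSINESS-INTEL.md && echo "Move analysis to OPPORTUNITIES.md"` flags a draft that has grown past the limit.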
Extract into OPPORTUNITIES.md:
Gaps and signals found during analysis:
Mark each with: [ ] Not started, [~] In progress, [x] Done
Site structure handling:
Pre-fill answers from scraped content before asking questions:
Present a summary to the user:
"I crawled X pages and scraped Y social profiles. Here's what I know about your business:
What you sell: [products/services with pricing]
Who you sell to: [target audience in their own language]
How you position yourself: [key differentiators]
Your content strategy: [what you publish, how often, what topics]
Trust signals I found: [testimonials, stats, logos]
Things I noticed: [gaps, opportunities, interesting patterns]
I've written all of this into BUSINESS-INTEL.md. Now let me confirm a few things I couldn't find on the site."
If the website was scraped, attempt to auto-detect the business type. Present as confirmation: "From your site, this looks like a [Creator / Personal Brand]. Is that right, or is it something else?"
If no website was scraped, ask directly:
Question 3: "What type of business is this?"
Progress update: "Got it. 1 of 4 sections done. Next: tell me about yourself."
Ask one at a time. If the scrape found an About page, LinkedIn profile, or bio, show what was extracted first: "From your website, I got this: [extracted bio]. Anything to add or correct?" Then skip to what the scrape missed.
Question 4: "Tell me about yourself in a few sentences. Background, what you're known for, what makes you different."
(Skip if About page or LinkedIn bio was scraped and user confirms.)
Question 5: "What's your origin story? The short version. How did you end up doing what you do?"
(Skip if About page covered this and user confirms.)
Question 6: "What do you disagree with in your industry? What do most people in your space get wrong?"
Question 7: "Drop 2-3 examples of content you've written or posts you're proud of. Paste the text or links. I'll analyze your voice from these."
(If blog posts or social posts were scraped, use those automatically. Only ask for additional samples if fewer than 3 were found.)
If the user provides links, scrape them via the Scrape API. If they paste text, analyze directly. Extract sentence length patterns, vocabulary habits, tone, structural patterns, and recurring phrases.
Question 8: "Any words or phrases you hate? Things that make you cringe when you see them in content? These go straight into your banned list."
(Always ask. Cannot be scraped.)
Progress update: "Personal section done. 2 of 4 sections complete. Next: your business."
If website was scraped, show what was extracted: "From your homepage, it looks like you do [X] for [Y]. Sound right?" Then ask only what the scrape missed.
Question 9: "What does your business actually do? Who's it for?"
(Skip if homepage/about was scraped and user confirms.)
Question 10: "What are you optimizing for right now? Revenue? Growth? Awareness? Building an audience?"
(Always ask. Cannot be scraped.)
Question 11: "Who are your competitors or the people in your space? What makes you different?"
(Skip if scraped content made this clear and user confirms.)
Question 12: "Any non-negotiables? Things that must always be true about how your business shows up?"
(Always ask. Cannot be scraped.)
Adaptive bonus questions by business type:
If SaaS: "Main features? Pricing model? Ideal customer profile?"
If Agency: "Services? Client types? Standout case studies?"
If Niche Site / Content Site: "Niche? Monetization? Content pillars?"
If Creator / Personal Brand: "Platforms? Content formats? Monetization? Audience?"
If E-commerce: "What do you sell? Channels? Brand story? Typical customer?"
Progress update: "Business section done. 3 of 4 sections complete. Last one: AI agent rules."
Question 13: "What should AI agents built from this KB be able to do? Write content? Customer support? Research? SEO? Be specific."
Question 14: "What should the AI never do? Hard boundaries?"
Question 15: "Anything legally sensitive, topics to avoid, or things that need human approval?"
Progress update: "All questions done. Let me show you what I've got before generating the files."
Present a summary organized by output file (key points, not full files):
Here's what I captured:
**PERSONA.md** (Agent Identity)
- Role: [what the agent does]
- Core rules: [2-3 key rules]
- Boundaries: [key restrictions]
**CONTEXT.md** (Business)
- Business: [what it does, who it's for]
- Goals: [top 3 priorities]
- Differentiator: [what makes them different]
**USER.md** (The Person)
- Background: [key points]
- Origin: [short version]
- Values: [what they care about]
**VOICE.md** (Writing Style)
- Tone: [analysis from samples]
- Banned: [key items]
- Style: [key patterns]
**GUARDRAILS.md** (Boundaries)
- Never: [key restrictions]
- Sensitive: [topics requiring care]
- Approval required: [what needs sign-off]
**SITEMAP.md** (Site Structure) [only if 20+ pages]
- Total pages: [count]
- Categories: [breakdown]
**BUSINESS-INTEL.md** (Deep Analysis)
- Products/services: [list with pricing]
- Business model: [how they make money]
- Target audience: [who, in their language]
**OPPORTUNITIES.md** (Gaps and Growth Signals)
- [key gaps and opportunities found]
**CORRECTIONS.md** (Self-Improving Log)
- Empty on first run. Gets populated as the user corrects outputs over time.
Ask: "Anything I missed or got wrong?"
After user confirms, generate all files.
Generate all 9 KB files. See templates below.
PERSONA.md template:
# Agent Persona — <project-name>
## Role
[What this agent does. One sentence. Specific.]
## Personality
[Tone, vibe, how it comes across. Not "professional" — specific.]
## Core Rules
- [Rule 1 — specific and actionable]
- [Rule 2]
- [Rule 3]
## Boundaries
- Never: [hard nos]
- Always ask before: [things needing approval]
- Sensitive topics: [list]
## Voice
See VOICE.md — read it before writing anything.
CORRECTIONS.md template (initial — empty, ready for use):
# Corrections Log
This file tracks every time the user corrected an agent output. Each correction updates the source KB file directly, then gets logged here so the pattern is visible over time.
## How it works
When you correct an output, identify which KB file influenced the mistake, update that file with the correct rule or information, then log the correction below.
---
<!-- Corrections will appear here as you use the KB -->
CORRECTIONS.md — how it gets used (ongoing):
Whenever the user says something like "that's wrong", "I wouldn't say it that way", "don't do that", or corrects a specific output:
## [DATE] — [brief description of what was corrected]
- **Output type:** [content / decision / recommendation]
- **What was wrong:** [brief description]
- **Source file updated:** [e.g., VOICE.md]
- **What changed:** [old assumption or rule] replaced with [corrected rule]
Tell the user during the onboarding wizard: "One more thing: whenever I get something wrong and you correct me, I'll update the relevant KB file automatically. The KB gets smarter every time you correct an output."
Include SITEMAP.md in generated files only if it was generated as standalone.
After generating all files, do this automatically:
## Knowledge Base: <project-name>
**When working on <project-name> content**, read these files in order:
1. KNOWLEDGE BASE/<project-name>/PERSONA.md
2. KNOWLEDGE BASE/<project-name>/CONTEXT.md
3. KNOWLEDGE BASE/<project-name>/VOICE.md
4. KNOWLEDGE BASE/<project-name>/GUARDRAILS.md
5. KNOWLEDGE BASE/<project-name>/BUSINESS-INTEL.md
Read USER.md, SITEMAP.md, and OPPORTUNITIES.md only on demand when needed. **Do not load on every session** — context bloat kills productivity.
## Knowledge Base: <project-name>
**Before writing content, building agents, or making decisions about this project:**
Load files in this order:
1. KNOWLEDGE BASE/<project-name>/PERSONA.md — agent rules and behavior
2. KNOWLEDGE BASE/<project-name>/CONTEXT.md — business fundamentals
3. KNOWLEDGE BASE/<project-name>/VOICE.md — writing style and tone
4. KNOWLEDGE BASE/<project-name>/GUARDRAILS.md — boundaries and sensitive topics
5. KNOWLEDGE BASE/<project-name>/BUSINESS-INTEL.md — deep business analysis
**On-demand references:**
- USER.md — personal background (load when needed)
- SITEMAP.md — site structure (load for navigation questions)
- OPPORTUNITIES.md — gaps and growth signals (load when brainstorming)
Full page content is available in `site-content/` for deep analysis when you need to reference specific pages or check existing positioning.
**Important:** Do not load the KB at boot for unrelated work. Only load when actively working on <project-name> projects.
When an update trigger fires ("update kb", "refresh kb", "re-scrape kb", "kb update"):
KNOWLEDGE BASE/. If multiple exist, ask: "Which project do you want to update?" and wait for confirmation.

site-content/ files

When the user provides writing samples (or when blog posts are scraped), analyze for:
Use this to populate VOICE.md with specific, actionable observations. Not "conversational tone" but "writes in lowercase, uses fragments, averages 8 words per sentence, opens with a bold claim."
All scraping uses the Firecrawl REST API (https://api.firecrawl.dev/v1/). The API key is passed via the Authorization: Bearer header.
| Endpoint | Method | What it does | Cost |
|---|---|---|---|
| `/v1/map` | POST | Discover all URLs on a site | Free/near-free |
| `/v1/crawl` | POST | Start a full site crawl (async, returns job ID) | ~1 credit/page |
| `/v1/crawl/<job-id>` | GET | Check crawl status / get results | Free |
| `/v1/scrape` | POST | Scrape a single URL to markdown | 1 credit |
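Map request body: {"url": "<website-url>", "limit": 500}
Crawl request body: {"url": "<website-url>", "limit": <N>, "scrapeOptions": {"formats": ["markdown"]}}
Scrape request body: {"url": "<url>", "formats": ["markdown"]}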
Crawl polling: The crawl endpoint returns {"id": "..."}. Poll GET /v1/crawl/<job-id> every 10 seconds until status is "completed".
- .firecrawl/<project-slug>/
- If .firecrawl/<project-slug>/crawl-raw.json exists and is less than 7 days old, offer to reuse it
- The site-content/ directory in the KB is the processed output, not the cache
- .firecrawl/<project-slug>/social/
- .firecrawl/<project-slug>/links/
- .firecrawl/api-key.txt

| Scenario | What to do |
|---|---|
| No API key | Walk through Phase 0 API key setup. Save to .firecrawl/api-key.txt. |
| API returns 401 | Key is invalid or expired. Ask user for a new key. |
| Crawl times out | Save partial results. Note which pages were missed. Continue with what you have. |
| Social scrape fails (anti-bot) | Note "limited extraction" for that profile. Continue with other sources. |
| Rate limited (429) | Wait 30 seconds and retry. If it happens 3 times, stop and continue with what you have. |
| No website URL provided | Skip all scraping. All questions become mandatory. Still produces all 9 KB files. |
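The rate-limit row in the table above can be sketched as a small retry wrapper. This is a sketch under stated assumptions: `request_with_retry` and `RETRY_WAIT` are hypothetical names, and the wrapped command is expected to print only the HTTP status code (with curl, that means sending the body to a file with `-o` and adding `-w '%{http_code}'`).

```shell
# Hypothetical wrapper: run a command that prints an HTTP status code.
# On 429, wait and retry; after the third 429, give up with a non-zero status.
request_with_retry() {
  local wait_seconds="${RETRY_WAIT:-30}" max_rate_limits=3
  local hits=0 code
  while :; do
    code="$("$@")"
    if [ "$code" != "429" ]; then
      # Not rate limited: hand the code back so the caller can check for 401 etc.
      printf '%s\n' "$code"
      return 0
    fi
    hits=$((hits + 1))
    if [ "$hits" -ge "$max_rate_limits" ]; then
      printf '429\n'
      return 1
    fi
    sleep "$wait_seconds"
  done
}
```

A caller might wrap a scrape as `code="$(request_with_retry curl -s -o out.json -w '%{http_code}' ...)"` and continue with existing data when the wrapper returns non-zero.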
KNOWLEDGE BASE/<project-name>/
  PERSONA.md
  CONTEXT.md
  USER.md
  VOICE.md
  GUARDRAILS.md
  SITEMAP.md (only if 20+ pages)
  BUSINESS-INTEL.md
  OPPORTUNITIES.md
  CORRECTIONS.md
  site-content/
    homepage.md
    about.md
    pricing.md
    blog-post-slug.md
    ...
.firecrawl/
  api-key.txt
  <project-slug>/
    map-result.json
    crawl-job.json
    crawl-raw.json
    social/
      linkedin.md
      twitter.md
      youtube.md
      instagram.md
    links/
      docs.md
      portfolio.md
      ...