Episode to Instagram
End-to-end pipeline: video episode or YouTube/video webpage URL → local media acquisition → transcript → content extraction → carousel/image generation → Instagram posting via browser automation.
Prerequisites
- ffmpeg (installed via Homebrew)
- OpenAI API key (for Whisper transcription)
- Replicate API key (for visual generation — optional, can use built-in image generation)
- openclaw browser for Instagram posting
- Instagram account logged in via the OpenClaw browser profile (optional account/series branding can be stored in brand-config.json, e.g. @yourhandle)
Pipeline Steps
Step 0: Acquire Source Media
If the user provides a YouTube URL or a webpage that embeds a podcast/video episode:
- Resolve the actual video URL from the page if needed
- Download the best practical local source copy before transcription/frame extraction
- Save the original source URL alongside the working files for traceability
- Prefer a stable MP4/MOV source when possible; if only YouTube is available, download the combined video/audio asset first
Expected outputs in the episode working directory:
- source-url.txt
- source-video.mp4 (or equivalent local video file)
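A minimal sketch of how the Step 0 outputs might be planned before any download happens. The helper names (`isYouTubeUrl`, `acquisitionPlan`) and the `needsPageResolution` flag are illustrative assumptions, not part of the skill's scripts, and the download tool itself is deliberately left open.

```javascript
const path = require("node:path");

// Hypothetical helper: true when the URL is hosted on YouTube.
function isYouTubeUrl(rawUrl) {
  const host = new URL(rawUrl).hostname.toLowerCase();
  return host === "youtu.be" || host === "youtube.com" || host.endsWith(".youtube.com");
}

// Hypothetical helper: describes where the Step 0 outputs should land.
function acquisitionPlan(rawUrl, workDir) {
  const isDirectVideo = /\.(mp4|mov)$/i.test(new URL(rawUrl).pathname);
  return {
    sourceUrlFile: path.join(workDir, "source-url.txt"),
    sourceVideoFile: path.join(workDir, "source-video.mp4"),
    // A generic webpage needs its embedded video URL resolved first.
    needsPageResolution: !isYouTubeUrl(rawUrl) && !isDirectVideo,
  };
}
```

Writing the original URL with `fs.writeFileSync(plan.sourceUrlFile, rawUrl)` before downloading keeps the traceability record even if the download later fails.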
Step 1: Transcribe
Script: scripts/transcribe.sh
- Extract audio from the video file using ffmpeg
- Split into chunks if >25MB (Whisper API limit)
- Send to OpenAI Whisper API for transcription
- Output: timestamped transcript as JSON + plain text
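The chunking rule above can be sketched as a small planner. This is an illustration only, not the contents of scripts/transcribe.sh: the function names, the 90% headroom factor, and the constant-bitrate assumption are all hypothetical choices.

```javascript
// Whisper API upload ceiling named in the step above (25 MB).
const WHISPER_LIMIT_BYTES = 25 * 1024 * 1024;

// Hypothetical planner: splits an audio file into equal-length chunks.
// Assumes a roughly constant bitrate, so file size scales with duration.
function planChunks(durationSec, fileBytes, limitBytes = WHISPER_LIMIT_BYTES) {
  // Assumption: target 90% of the limit to leave headroom for container overhead.
  const targetBytes = limitBytes * 0.9;
  const count = fileBytes <= limitBytes ? 1 : Math.ceil(fileBytes / targetBytes);
  const chunkSec = durationSec / count;
  return Array.from({ length: count }, (_, i) => ({
    index: i,
    startSec: i * chunkSec,
    durationSec: chunkSec,
  }));
}

// Shift chunk-relative segment times back onto the episode timeline.
function offsetSegments(segments, chunkStartSec) {
  return segments.map((s) => ({
    ...s,
    start: s.start + chunkStartSec,
    end: s.end + chunkStartSec,
  }));
}
```

Applying `offsetSegments` to each chunk's transcript before merging keeps the final timestamps aligned with the source video, which Steps 2 and 3 depend on.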
Step 2: Extract Content
This step is model-driven (not scripted). The agent should:
- Read the full transcript
- Identify the 8-12 best moments for Instagram content:
  - 3-4 short quotable moments (for carousel text overlays)
  - 2-3 key insight passages (for carousel takeaway slides)
  - 3-4 high-energy or visually interesting segments (for Reel clip timestamps)
- For each moment, record:
  - Exact timestamp (start + end)
  - The quote or passage text
  - Content type: quote_card | takeaway | reel_clip
  - Suggested carousel slide text (cleaned up for Instagram)
- Output a structured content plan as JSON
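One possible shape for the content plan, with a validator the agent could run before moving on. Only the three content type values come from the text above; the field names (`start`, `end`, `text`, `type`, `slideText`, `episodeId`) and the example values are assumptions made for illustration.

```javascript
const CONTENT_TYPES = ["quote_card", "takeaway", "reel_clip"];

// Returns a list of problems for one content-plan entry (empty when valid).
// Field names are illustrative assumptions, not a fixed schema.
function validateMoment(m) {
  const errors = [];
  const timesOk =
    Number.isFinite(m.start) && Number.isFinite(m.end) && m.start >= 0 && m.end > m.start;
  if (!timesOk) errors.push("start/end must be seconds with end > start");
  if (typeof m.text !== "string" || m.text.trim() === "") errors.push("text is required");
  if (!CONTENT_TYPES.includes(m.type)) {
    errors.push(`type must be one of ${CONTENT_TYPES.join(" | ")}`);
  }
  // Reel clips only need timestamps; carousel moments also need slide text.
  if (m.type !== "reel_clip" && !m.slideText) {
    errors.push("slideText is required for carousel moments");
  }
  return errors;
}

// Placeholder plan showing the assumed structure.
const examplePlan = {
  episodeId: "example-episode",
  moments: [
    {
      start: 312.4,
      end: 318.9,
      type: "quote_card",
      text: "Original words from the transcript.",
      slideText: "Cleaned-up slide text.",
    },
    { start: 1205.0, end: 1270.0, type: "reel_clip", text: "High-energy segment." },
  ],
};
```

Serializing the plan with `JSON.stringify(examplePlan, null, 2)` produces a file that can be saved as content-plan.json.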
Step 3: Extract Video Frames
Script: scripts/extract-frames.sh
- For each quote/takeaway moment, extract a frame at the timestamp using ffmpeg
- Extract multiple frames around each timestamp (±2 seconds) and pick the best one
- Output: PNG frames in a working directory
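The ±2 second sampling can be sketched as follows. This is not the contents of scripts/extract-frames.sh; the function names and the one-second step are assumptions, and the criterion for picking the "best" frame is left to the agent because the text does not define it.

```javascript
// Candidate timestamps around a moment: t - window .. t + window, clamped at 0.
function candidateTimestamps(t, windowSec = 2, stepSec = 1) {
  const times = [];
  for (let d = -windowSec; d <= windowSec; d += stepSec) {
    times.push(Math.max(0, t + d));
  }
  // Clamping can create duplicates near the start of the video; drop them.
  return [...new Set(times)];
}

// Argument list for extracting a single PNG frame with ffmpeg.
// -ss before -i seeks the input; -frames:v 1 writes exactly one video frame.
function frameArgs(inputPath, t, outPath) {
  return ["-y", "-ss", String(t), "-i", inputPath, "-frames:v", "1", outPath];
}
```

Each argument list can be passed to `child_process.execFileSync("ffmpeg", args)`, which avoids shell quoting problems with paths that contain spaces.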
Step 4: Attach Frames to the Content Plan
Before rendering any carousel slides, the agent must update the content plan so each slide that references a selected moment has a concrete framePath pointing at an extracted frame file. Do not render slides until this mapping is present and verified.
Validation before rendering:
- Every non-hook/non-CTA slide should either have a valid framePath or be intentionally text-only
- If framePath values are missing, stop and repair the plan before running the generator
- Spot-check at least one rendered slide to confirm the background image is visible and not falling back to a plain background unexpectedly
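A pre-render check along these lines could enforce the first two validation rules. Only `framePath` is named in the text; the `kind`, `textOnly`, and `id` fields and the function name are assumptions for the sake of the sketch.

```javascript
const fs = require("node:fs");

// Returns the slides that should block rendering.
// `fileExists` is injectable so the check can be tested without real files.
function slidesMissingFrames(slides, fileExists = fs.existsSync) {
  return slides.filter((slide) => {
    if (slide.kind === "hook" || slide.kind === "cta") return false; // exempt
    if (slide.textOnly === true) return false; // intentionally text-only
    const hasFrame = typeof slide.framePath === "string" && fileExists(slide.framePath);
    return !hasFrame;
  });
}
```

If the returned array is non-empty, the agent should stop and repair the plan rather than run the generator. The visual spot-check still has to be done by looking at a rendered slide, since this function only confirms that the files exist.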
Step 5: Generate Carousel Slides
Script: scripts/generate-carousel.js
For each carousel post (5-7 slides):
- Slide 1: Hook slide — bold text + video frame background
- Middle slides (slide 2 through the second-to-last): Key quotes/takeaways with text overlay on video frames
- Final slide: CTA ("Follow for more" / episode link)
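The slide layout can be expressed as a small function. It assumes that every slide between the hook and the CTA is a content slide, which is one reading of the list above; the function name and role labels are hypothetical.

```javascript
// Roles for an n-slide carousel: hook first, CTA last, content in between.
function slideRoles(n) {
  if (!Number.isInteger(n) || n < 5 || n > 7) {
    throw new RangeError("a carousel should have 5-7 slides");
  }
  return ["hook", ...Array(n - 2).fill("content"), "cta"];
}
```

For example, `slideRoles(6)` yields one hook, four content slides, and one CTA.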
Carousel image specs:
- 1080x1080px (square) or 1080x1350px (portrait, 4:5 — higher engagement)
- Consistent brand aesthetic (colors, fonts, overlay style)
- Text must be readable over the video frame (use semi-transparent overlay)
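A sketch of two helpers a canvas-based generator might use for these specs: one parses the dimension string, the other builds the semi-transparent overlay color. Both function names are assumptions; scripts/generate-carousel.js may do this differently.

```javascript
// Parse a "WIDTHxHEIGHT" string such as "1080x1350".
function parseFormat(format) {
  const match = /^(\d+)x(\d+)$/.exec(format);
  if (!match) throw new Error(`unrecognized format: ${format}`);
  return { width: Number(match[1]), height: Number(match[2]) };
}

// Convert "#RRGGBB" plus an opacity into a CSS rgba() string for the overlay.
function overlayRgba(hex, opacity) {
  const match = /^#([0-9a-f]{2})([0-9a-f]{2})([0-9a-f]{2})$/i.exec(hex);
  if (!match) throw new Error(`expected #RRGGBB, got: ${hex}`);
  const [r, g, b] = match.slice(1).map((h) => parseInt(h, 16));
  return `rgba(${r}, ${g}, ${b}, ${opacity})`;
}
```

With a Canvas 2D context, drawing the frame first and then setting `ctx.fillStyle = overlayRgba("#000000", 0.6)` followed by `ctx.fillRect(0, 0, width, height)` darkens the frame so that text drawn afterwards stays readable.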
For visual enhancement, optionally use Replicate models:
- Background enhancement/style transfer on frames
- Generate complementary visuals from episode themes
Step 6: Preview & Approve
Before posting, the agent must:
- Send all generated carousel slides to the user via Slack
- Send the full proposed caption text verbatim so the user can reply with direct edits
- Send the content plan summary
- Wait for explicit approval
- Accept edits/feedback and regenerate as needed
Step 7: Post to Instagram
Script: scripts/post-to-instagram.js
Uses openclaw browser to:
- Open Instagram in the browser
- Use the main sidebar/header + / Create entrypoint
- Explicitly choose Post or Reel based on the asset being uploaded (do not assume every + opens post creation; profile-page New may open Highlight instead)
- Upload the staged media from OpenClaw's upload temp root
- For videos/Reels, use the CDP-backed draft helper instead of openclaw browser upload so the flow avoids the flaky browser upload bridge
- Preserve the original image/video aspect ratio via the bottom-left Select crop control before moving past the crop step; do not leave it on Instagram's default crop if that changes the intended composition
- Enter the caption text
- Screenshot the preview for final confirmation
- Only post after explicit approval
Crop rule:
- After upload, explicitly check the crop step and switch to the original aspect ratio/original framing when needed.
- Do not accept Instagram's default square crop if it trims important composition or changes the intended look of the source image.
Practical note:
- If Instagram opens New Highlight or any non-post flow, back out and retry from the main Create/sidebar + entrypoint, then select Post before uploading media.
- Keep a user-visible working copy in a Desktop folder named instagram when helpful (for example: ~/Desktop/instagram/), but before calling openclaw browser upload, copy the final asset(s) into /tmp/openclaw/uploads/... because the browser CLI only accepts uploads from that temp root.
- For single-image posts sourced from chat, first copy the inbound image into the Desktop instagram folder if you want a visible local working copy, then stage that Desktop copy into /tmp/openclaw/uploads/... for browser upload.
- For .mp4 or .mov uploads, prefer the Reel flow unless the user explicitly wants a feed video post.
- For video uploads derived from X posts, prefer an IG-safe working copy that is at most about 89 seconds and at most 1080p. Preserve quality as much as possible while keeping the file under the working upload ceiling.
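The staging and flow-selection notes above could be captured in helpers like these. All three function names are hypothetical, and `uploadRoot` is passed in as a parameter because the text only gives the prefix of the temp root, not the full directory.

```javascript
const path = require("node:path");

// .mp4/.mov default to the Reel flow; everything else is a feed post.
function chooseFlow(filePath, wantsFeedVideo = false) {
  const isVideo = [".mp4", ".mov"].includes(path.extname(filePath).toLowerCase());
  return isVideo && !wantsFeedVideo ? "reel" : "post";
}

// Destination for a staged copy of a working file.
function stagedPath(workingCopy, uploadRoot) {
  return path.join(uploadRoot, path.basename(workingCopy));
}

// True when `filePath` sits inside `uploadRoot`.
function isStaged(filePath, uploadRoot) {
  const rel = path.relative(path.resolve(uploadRoot), path.resolve(filePath));
  return rel !== "" && rel.split(path.sep)[0] !== ".." && !path.isAbsolute(rel);
}
```

A caller would copy the Desktop working file with `fs.copyFileSync(workingCopy, stagedPath(workingCopy, uploadRoot))` and confirm `isStaged` returns true before starting the upload.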
Content Rules
Quote Cards
- Keep text under 150 characters for readability
- Use the speaker's actual words when possible
- Add attribution (guest name / episode title)
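The two mechanical quote-card rules can be checked automatically, as in this sketch. The `text` and `attribution` field names and the function name are assumptions; whether the words are the speaker's own still needs a comparison against the transcript.

```javascript
const MAX_QUOTE_CHARS = 150;

// Returns rule violations for a quote card (empty array when it passes).
function checkQuoteCard(card) {
  const problems = [];
  // Count code points rather than UTF-16 units so most emoji count once.
  const length = [...card.text].length;
  if (length >= MAX_QUOTE_CHARS) {
    problems.push(`text is ${length} characters; keep it under ${MAX_QUOTE_CHARS}`);
  }
  if (!card.attribution || card.attribution.trim() === "") {
    problems.push("missing attribution");
  }
  return problems;
}
```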
Takeaway Carousels
- 5-7 slides per carousel
- Strong hook on slide 1 (question or bold statement)
- One idea per slide
- Clean, readable typography
- CTA on last slide
Reel Clips
- 60-90 seconds max
- Include captions/subtitles
- Strong hook in first 3 seconds
- Extracted via ffmpeg from source video
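A sketch of how a clip's ffmpeg arguments might be built while enforcing the length limit. The function name is hypothetical, and re-encoding with libx264 and AAC is an assumed choice made so that cuts do not have to land on keyframes; captions and subtitles are not handled here.

```javascript
const MAX_REEL_SEC = 90;

// Builds ffmpeg arguments that cut startSec..endSec from the source video.
function reelClipArgs(inputPath, startSec, endSec, outPath) {
  const duration = endSec - startSec;
  if (!(duration > 0)) throw new RangeError("endSec must be greater than startSec");
  if (duration > MAX_REEL_SEC) {
    throw new RangeError(`clip is ${duration}s; limit is ${MAX_REEL_SEC}s`);
  }
  return [
    "-y",
    "-ss", String(startSec), // seek to the clip start
    "-i", inputPath,
    "-t", String(duration), // keep only the clip length
    "-c:v", "libx264",
    "-c:a", "aac",
    outPath,
  ];
}
```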
Brand Aesthetic
To be configured per user. Store in brand-config.json:
{
  "primaryColor": "#000000",
  "secondaryColor": "#FFFFFF",
  "accentColor": "#8B5CF6",
  "fontStyle": "clean-modern",
  "overlayOpacity": 0.6,
  "format": "1080x1350",
  "accountHandle": "",
  "seriesName": "",
  "hashtagSets": []
}
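One way a script might load this file is to merge it over the neutral defaults and reject obviously bad values early. The function name and the specific validation rules are assumptions; the default values mirror the config shown above.

```javascript
const DEFAULT_BRAND = {
  primaryColor: "#000000",
  secondaryColor: "#FFFFFF",
  accentColor: "#8B5CF6",
  fontStyle: "clean-modern",
  overlayOpacity: 0.6,
  format: "1080x1350",
  accountHandle: "",
  seriesName: "",
  hashtagSets: [],
};

// Merge user settings over the defaults and validate the result.
function resolveBrandConfig(userConfig = {}) {
  const config = { ...DEFAULT_BRAND, ...userConfig };
  for (const key of ["primaryColor", "secondaryColor", "accentColor"]) {
    if (!/^#[0-9a-f]{6}$/i.test(config[key])) {
      throw new Error(`${key} must be a #RRGGBB color`);
    }
  }
  if (!(config.overlayOpacity >= 0 && config.overlayOpacity <= 1)) {
    throw new Error("overlayOpacity must be between 0 and 1");
  }
  if (!["1080x1080", "1080x1350"].includes(config.format)) {
    throw new Error("format must be 1080x1080 or 1080x1350");
  }
  return config;
}
```

A caller could pass in `JSON.parse(fs.readFileSync("brand-config.json", "utf8"))` so that keys missing from the file fall back to the defaults.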
Approval Rules (Strict)
- Never post without explicit user approval
- Always show preview screenshots before posting
- In Slack-thread runs, include the full editable caption text in the same thread reply rather than only summarizing it
- If user requests edits, regenerate and re-preview
- Carousel order must be confirmed before posting
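These rules could be enforced in code with a gate that the posting script calls first. The flag names and both function names are hypothetical; the sketch only illustrates the idea that posting is refused unless every approval has been recorded.

```javascript
// Assumed approval flags; each must be explicitly true before posting.
const REQUIRED_APPROVALS = [
  "previewShown",
  "captionApproved",
  "carouselOrderConfirmed",
  "postApproved",
];

// Lists the approvals that have not been recorded yet.
function missingApprovals(state) {
  return REQUIRED_APPROVALS.filter((key) => state[key] !== true);
}

// Throws unless every approval is present.
function assertReadyToPost(state) {
  const missing = missingApprovals(state);
  if (missing.length > 0) {
    throw new Error(`not approved to post; missing: ${missing.join(", ")}`);
  }
}
```

When the user requests edits, resetting every flag to false before regenerating forces a fresh preview and approval cycle.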
File Structure
episode-to-instagram/
├── SKILL.md
├── brand-config.json
├── scripts/
│   ├── transcribe.sh          # Audio extraction + Whisper API
│   ├── extract-frames.sh      # ffmpeg frame extraction
│   ├── generate-carousel.js   # Canvas-based slide generation
│   └── post-to-instagram.js   # Browser automation for IG posting
└── output/                    # Working directory for generated content
    └── {episode-id}/
        ├── transcript.json
        ├── transcript.txt
        ├── content-plan.json
        ├── frames/
        ├── slides/
        └── reels/
Public Repo Notes
- This published version ships with a neutral brand-config.json; customize it for your own account before generating slides.
- The repository intentionally does not include any local auth-profile helpers or generated output files.
- OPENAI_API_KEY should be provided by your shell, secret manager, or OpenClaw runtime before using scripts/transcribe.sh.