概述

Episode to Instagram

End-to-end pipeline: video episode or YouTube/video webpage URL → local media acquisition → transcript → content extraction → carousel/image generation → Instagram posting via browser automation.

Prerequisites

ffmpeg (installed via Homebrew)
OpenAI API key (for Whisper transcription)
Replicate API key (for visual generation — optional, can use built-in image generation)
openclaw browser for Instagram posting
Instagram account logged in via the OpenClaw browser profile (optional account/series branding can be stored in brand-config.json, e.g. @yourhandle)

Pipeline Steps

Step 0: Acquire Source Media

If the user provides a YouTube URL or a webpage that embeds a podcast/video episode:

Resolve the actual video URL from the page if needed
Download the best practical local source copy before transcription/frame extraction
Save the original source URL alongside the working files for traceability
Prefer a stable MP4/MOV source when possible; if only YouTube is available, download the combined video/audio asset first

Expected outputs in the episode working directory:

source-url.txt
source-video.mp4 (or equivalent local video file)

Step 1: Transcribe

Script: scripts/transcribe.sh

Extract audio from the video file using ffmpeg
Split into chunks if >25MB (Whisper API limit)
Send to OpenAI Whisper API for transcription
Output: timestamped transcript as JSON + plain text

Step 2: Extract Content

This step is model-driven (not scripted). The agent should:

Read the full transcript
Identify the 8-12 best moments for Instagram content:

3-4 short quotable moments (for carousel text overlays)
2-3 key insight passages (for carousel takeaway slides)
3-4 high-energy or visually interesting segments (for Reel clip timestamps)

For each moment, record:

Exact timestamp (start + end)
The quote or passage text
Content type: quote_card | takeaway | reel_clip
Suggested carousel slide text (cleaned up for Instagram)

Output a structured content plan as JSON

Step 3: Extract Video Frames

Script: scripts/extract-frames.sh

For each quote/takeaway moment, extract a frame at the timestamp using ffmpeg
Extract multiple frames around each timestamp (±2 seconds) and pick the best one
Output: PNG frames in a working directory

Step 4: Attach Frames to the Content Plan

Before rendering any carousel slides, the agent must update the content plan so each slide that references a selected moment has a concrete framePath pointing at an extracted frame file. Do not render slides until this mapping is present and verified.

Validation before rendering:

Every non-hook/non-CTA slide should either have a valid framePath or be intentionally text-only
If framePath values are missing, stop and repair the plan before running the generator
Spot-check at least one rendered slide to confirm the background image is visible and not falling back to a plain background unexpectedly

Step 5: Generate Carousel Slides

Script: scripts/generate-carousel.js

For each carousel post (5-7 slides):

Slide 1: Hook slide — bold text + video frame background
Slides 2-5: Key quotes/takeaways with text overlay on video frames
Final slide: CTA ("Follow for more" / episode link)

Carousel image specs:

1080x1080px (square) or 1080x1350px (portrait, 4:5 — higher engagement)
Consistent brand aesthetic (colors, fonts, overlay style)
Text must be readable over the video frame (use semi-transparent overlay)

For visual enhancement, optionally use Replicate models:

Background enhancement/style transfer on frames
Generate complementary visuals from episode themes

Step 6: Preview & Approve

Before posting, the agent must:

Send all generated carousel slides to the user via Slack
Send the full proposed caption text verbatim so the user can reply with direct edits
Send the content plan summary
Wait for explicit approval
Accept edits/feedback and regenerate as needed

Step 7: Post to Instagram

Script: scripts/post-to-instagram.js

Uses openclaw browser to:

Open Instagram in the browser
Use the main sidebar/header + / Create entrypoint
Explicitly choose Post or Reel based on the asset being uploaded (do not assume every + opens post creation; profile-page New may open Highlight instead)
Upload the staged media from OpenClaw's upload temp root
For videos/Reels, use the CDP-backed draft helper instead of openclaw browser upload so the flow avoids the flaky browser upload bridge
Preserve the original image/video aspect ratio via the bottom-left Select crop control before moving past the crop step; do not leave it on Instagram's default crop if that changes the intended composition
Enter the caption text
Screenshot the preview for final confirmation
Only post after explicit approval

Crop rule:

After upload, explicitly check the crop step and switch to the original aspect ratio/original framing when needed.
Do not accept Instagram's default square crop if it trims important composition or changes the intended look of the source image.

Practical note:

If Instagram opens New Highlight or any non-post flow, back out and retry from the main Create/sidebar + entrypoint, then select Post before uploading media.
Keep a user-visible working copy in a Desktop folder named instagram when helpful (for example: ~/Desktop/instagram/), but before calling openclaw browser upload, copy the final asset(s) into /tmp/openclaw/uploads/... because the browser CLI only accepts uploads from that temp root.
For single-image posts sourced from chat, first copy the inbound image into the Desktop instagram folder if you want a visible local working copy, then stage that Desktop copy into /tmp/openclaw/uploads/... for browser upload.
For .mp4 or .mov uploads, prefer the Reel flow unless the user explicitly wants a feed video post.
For video uploads derived from X posts, prefer an IG-safe working copy that is at most about 89 seconds and at most 1080p. Preserve quality as much as possible while keeping the file under the working upload ceiling.

Content Rules

Quote Cards

Keep text under 150 characters for readability
Use the speaker's actual words when possible
Add attribution (guest name / episode title)

Takeaway Carousels

5-7 slides per carousel
Strong hook on slide 1 (question or bold statement)
One idea per slide
Clean, readable typography
CTA on last slide

Reel Clips

60-90 seconds max
Include captions/subtitles
Strong hook in first 3 seconds
Extracted via ffmpeg from source video

Brand Aesthetic

To be configured per user. Store in brand-config.json:

{
  "primaryColor": "#000000",
  "secondaryColor": "#FFFFFF",
  "accentColor": "#8B5CF6",
  "fontStyle": "clean-modern",
  "overlayOpacity": 0.6,
  "format": "1080x1350",
  "accountHandle": "",
  "seriesName": "",
  "hashtagSets": []
}

Approval Rules (Strict)

Never post without explicit user approval
Always show preview screenshots before posting
In Slack-thread runs, include the full editable caption text in the same thread reply rather than only summarizing it
If user requests edits, regenerate and re-preview
Carousel order must be confirmed before posting

File Structure

episode-to-instagram/
├── SKILL.md
├── brand-config.json
├── scripts/
│   ├── transcribe.sh          # Audio extraction + Whisper API
│   ├── extract-frames.sh      # ffmpeg frame extraction
│   ├── generate-carousel.js   # Canvas-based slide generation
│   └── post-to-instagram.js   # Browser automation for IG posting
└── output/                    # Working directory for generated content
    └── {episode-id}/
        ├── transcript.json
        ├── transcript.txt
        ├── content-plan.json
        ├── frames/
        ├── slides/
        └── reels/

Public Repo Notes

This published version ships with a neutral brand-config.json; customize it for your own account before generating slides.
The repository intentionally does not include any local auth-profile helpers or generated output files.
OPENAI_API_KEY should be provided by your shell, secret manager, or OpenClaw runtime before using scripts/transcribe.sh.

版本历史

共 1 个版本

v1.0.1 当前

2026-05-07 10:34 安全安全

安全检测

腾讯云安全 (Keen)

安全，无风险

查看报告

腾讯云安全 (Sanbu)