
YouTube Cover Nano Banana

Overview

Analyze the user's text first. Then turn it into a thumbnail concept that is visually simple, emotionally obvious, and readable at small sizes.

Generate English image prompts for Nano Banana unless the user explicitly asks for another language. Ground every decision in what makes YouTube thumbnails perform, not in generic poster design.

Use scripts/create_thumbnail.py for the full workflow when local script execution is available. It first calls Gemini text generation to turn the source copy into a thumbnail plan, then optionally calls the official Gemini Nano Banana image endpoint. Both scripts expect an API key in the GEMINI_API_KEY or GOOGLE_API_KEY environment variable.
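The key lookup can be pictured with a small sketch. The function name resolve_api_key and the precedence (GEMINI_API_KEY first) are assumptions for illustration; the scripts' actual lookup order is not documented here.

```python
import os
from typing import Mapping, Optional


def resolve_api_key(env: Optional[Mapping[str, str]] = None) -> str:
    """Return the first non-empty key among GEMINI_API_KEY and GOOGLE_API_KEY.

    Assumption: GEMINI_API_KEY wins when both are set.
    """
    env = os.environ if env is None else env
    for name in ("GEMINI_API_KEY", "GOOGLE_API_KEY"):
        value = env.get(name, "").strip()
        if value:
            return value
    raise RuntimeError("Set GEMINI_API_KEY or GOOGLE_API_KEY before running the scripts.")
```

Failing early with a clear message is usually friendlier than letting the first API call return an authentication error.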

Workflow

1. Extract the message

Pull out:

Core topic
Intended audience
Main promise or tension
Emotional direction such as urgency, surprise, authority, fear, curiosity, or excitement
Best visual subject
Any non-negotiable details such as product, person, brand color, or forbidden elements

If the user only gives raw copy, infer the thumbnail angle from the strongest claim instead of mirroring the entire text.
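One way to keep the extracted message in a single place is a small record type. This is a sketch only: ThumbnailBrief and its field names are hypothetical and are not part of the bundled scripts, and the example values are illustrative inferences from the sample copy "Man fights tiger".

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ThumbnailBrief:
    """Hypothetical container for the items pulled out in step 1."""
    core_topic: str
    audience: str
    promise_or_tension: str
    emotion: str
    visual_subject: str
    constraints: List[str] = field(default_factory=list)  # non-negotiable details

    def missing_fields(self) -> List[str]:
        """List required fields that are still blank."""
        required = {
            "core_topic": self.core_topic,
            "audience": self.audience,
            "promise_or_tension": self.promise_or_tension,
            "emotion": self.emotion,
            "visual_subject": self.visual_subject,
        }
        return [name for name, value in required.items() if not value.strip()]


brief = ThumbnailBrief(
    core_topic="man fights tiger",
    audience="general entertainment viewers",
    promise_or_tension="will he survive the encounter",
    emotion="fear",
    visual_subject="man facing a tiger at close range",
)
print(brief.missing_fields())
```

A blank field is a hint to infer the value from the strongest claim in the copy before moving on to step 2.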

2. Choose the thumbnail strategy

Prefer one dominant idea. Use one of these visual strategies:

Expressive face plus short text
Single object or product in dramatic close-up
Before versus after contrast
Threat, mistake, or warning framing
Curiosity gap with an incomplete reveal
Authority or proof framing with a clear focal object

Reject cluttered multi-idea compositions unless the user explicitly wants a collage.
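The "one dominant idea" rule can be sketched as a tiny selection helper. The Strategy enum mirrors the six strategies listed above; the helper settle_on_strategy and its wants_collage flag are hypothetical names introduced for illustration.

```python
from enum import Enum
from typing import List, Sequence


class Strategy(Enum):
    EXPRESSIVE_FACE = "expressive face plus short text"
    DRAMATIC_CLOSE_UP = "single object or product in dramatic close-up"
    BEFORE_AFTER = "before versus after contrast"
    WARNING = "threat, mistake, or warning framing"
    CURIOSITY_GAP = "curiosity gap with an incomplete reveal"
    AUTHORITY_PROOF = "authority or proof framing with a clear focal object"


def settle_on_strategy(ranked: Sequence[Strategy], wants_collage: bool = False) -> List[Strategy]:
    """Keep only the top-ranked strategy unless the user explicitly wants a collage."""
    if not ranked:
        raise ValueError("at least one candidate strategy is required")
    return list(ranked) if wants_collage else [ranked[0]]
```

Ranking the candidates is still a judgment call; the helper only enforces that a single idea survives by default.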

3. Compress the on-image text

Write overlay text that is:

2 to 6 words when possible
Readable in one second
Stronger than the original copy
Different from the full video title if needed

Do not place paragraphs, subtitles, or detailed bullet points inside the image prompt.
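The length and single-line rules are easy to check mechanically. The following is a minimal sketch; overlay_text_issues is a hypothetical helper, and it counts whitespace-separated words, so it only suits overlay text written in a space-delimited language such as English.

```python
from typing import List


def overlay_text_issues(text: str, min_words: int = 2, max_words: int = 6) -> List[str]:
    """Return the reasons the overlay text breaks the 2-to-6-word, one-line guideline."""
    issues = []
    if "\n" in text.strip():
        issues.append("multiple lines")
    word_count = len(text.split())
    if word_count < min_words:
        issues.append(f"fewer than {min_words} words")
    if word_count > max_words:
        issues.append(f"more than {max_words} words")
    return issues
```

An empty list means the text passes these two checks. Whether it is stronger than the original copy, or readable in one second, still needs a human or model judgment.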

4. Build the Nano Banana prompt

Produce a prompt with these properties:

English language
16:9 YouTube thumbnail composition
One dominant subject
Bold focal point
High contrast lighting and color separation
Clean background with supporting elements only
Space reserved for large headline text
Photorealistic or stylized only if the user requests it

Explicitly describe:

Subject appearance and pose
Camera framing
Emotion
Background environment
Color palette
Lighting
Text placement area
Thumbnail style cues such as cinematic, glossy, creator-economy, tech, finance, fitness, education

Use the template and examples in references/youtube-thumbnail-patterns.md when you need help selecting the structure.
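The properties and descriptions listed in this step can be assembled into one prompt string. This sketch is not the template from the reference file; PromptParts, build_prompt, and the sentence wording are assumptions chosen to show how each required element ends up in the prompt.

```python
from dataclasses import dataclass


@dataclass
class PromptParts:
    """Hypothetical container for the elements step 4 says to describe explicitly."""
    subject: str
    framing: str
    emotion: str
    background: str
    palette: str
    lighting: str
    text_area: str
    style_cues: str


def build_prompt(parts: PromptParts) -> str:
    """Join fixed thumbnail requirements with the per-concept descriptions."""
    sentences = [
        "16:9 YouTube thumbnail composition with one dominant subject and a bold focal point.",
        f"Subject: {parts.subject}.",
        f"Camera framing: {parts.framing}.",
        f"Emotion: {parts.emotion}.",
        f"Background: {parts.background}, clean, with supporting elements only.",
        f"Color palette: {parts.palette}, with strong color separation.",
        f"Lighting: {parts.lighting}, high contrast.",
        f"Reserve empty space for large headline text: {parts.text_area}.",
        f"Style: {parts.style_cues}.",
    ]
    return " ".join(sentences)
```

Keeping the fixed requirements in the builder means a concept can change freely without the 16:9 framing or the reserved text area being dropped by accident.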

5. Generate the image

Call Nano Banana with the final prompt once the concept is coherent.

For the full automated workflow, run:

python3 scripts/create_thumbnail.py \
  --copy "Man fights tiger" \
  --generate-image \
  --output-json "outputs/thumbnail-plan.json" \
  --image-output "outputs/generated-thumbnail.png"

This script:

Analyzes the source copy
Returns structured JSON with angle, overlay_text, prompt, and generation_notes
Optionally renders the image with Nano Banana
Writes a stable result envelope for integration use
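A caller that reads the plan back might validate it along these lines. The four key names come from the list above; the assumption, flagged here and in the code, is that they sit at the top level of the JSON. The real envelope may nest them differently, so check publishing-contract.md before relying on this shape. load_plan is a hypothetical helper.

```python
import json
from typing import Dict

REQUIRED_KEYS = ("angle", "overlay_text", "prompt", "generation_notes")


def load_plan(raw: str) -> Dict[str, str]:
    """Parse a thumbnail plan and confirm the four documented fields exist.

    Assumption: the fields are top-level keys of a JSON object.
    """
    data = json.loads(raw)
    if not isinstance(data, dict):
        raise ValueError("plan must be a JSON object")
    missing = [key for key in REQUIRED_KEYS if key not in data]
    if missing:
        raise ValueError("plan is missing keys: " + ", ".join(missing))
    return {key: str(data[key]) for key in REQUIRED_KEYS}
```

In practice the raw string would come from the file passed to --output-json, for example via Path("outputs/thumbnail-plan.json").read_text().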

To render an already finalized prompt directly, without re-running the analysis step, and if local script execution is available, run:

python3 scripts/generate_image.py \
  --prompt "<final english prompt>" \
  --angle "<angle>" \
  --overlay-text "<overlay text>" \
  --output "outputs/generated-thumbnail.png"

The script calls Gemini's official gemini-2.5-flash-image endpoint and saves:

The generated PNG
A sidecar JSON file with prompt, model, overlay text, and any text returned by the API
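The sidecar idea can be sketched as follows. This is not the script's actual code: write_sidecar, the JSON key names, and the convention of reusing the image filename with a .json suffix are all assumptions made for illustration.

```python
import json
from pathlib import Path
from typing import Optional


def write_sidecar(image_path: str, prompt: str, model: str,
                  overlay_text: str, api_text: Optional[str] = None) -> Path:
    """Write generation metadata next to the image (assumed naming: same stem, .json)."""
    sidecar = Path(image_path).with_suffix(".json")
    payload = {
        "prompt": prompt,
        "model": model,
        "overlay_text": overlay_text,
        "api_text": api_text,  # any text the API returned alongside the image
    }
    sidecar.parent.mkdir(parents=True, exist_ok=True)
    sidecar.write_text(json.dumps(payload, ensure_ascii=False, indent=2), encoding="utf-8")
    return sidecar
```

Storing the prompt and model beside the PNG makes it possible to regenerate or audit a thumbnail later without digging through logs.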

If tool calling or script execution is not available, still return the exact prompt plus a short note on what to generate.
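When generate_image.py is driven from another Python program, building the argument list explicitly avoids shell-quoting problems with prompts that contain quotes or commas. The flags below are the ones shown in the command above; the wrapper generate_image_command is a hypothetical helper, and the sketch only builds the command rather than executing it.

```python
import shlex
from typing import List


def generate_image_command(prompt: str, angle: str, overlay_text: str,
                           output: str = "outputs/generated-thumbnail.png") -> List[str]:
    """Build the argv list for scripts/generate_image.py."""
    return [
        "python3", "scripts/generate_image.py",
        "--prompt", prompt,
        "--angle", angle,
        "--overlay-text", overlay_text,
        "--output", output,
    ]


cmd = generate_image_command(
    prompt="16:9 YouTube thumbnail, man facing a tiger, high contrast",
    angle="Survival showdown",
    overlay_text="MAN VS TIGER",
)
print(shlex.join(cmd))  # copy-pasteable form for logs or for the fallback note
```

The list can be handed to subprocess.run(cmd, check=True) where execution is available; where it is not, the printed form doubles as the exact command to return to the user.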

6. Self-critique once

Before finalizing, check for the common failure modes:

Too many subjects
Text area too small
Weak contrast
Busy background
Vague emotion
No obvious click-driving angle
Prompt accidentally describes a poster instead of a thumbnail
Prompt includes tiny text or multiple lines of copy that image models render poorly

If a failure mode is present, revise the prompt once before returning it.
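A few of these failure modes can be caught with plain string checks before the judgment-based review. The sketch below is deliberately narrow: critique_prompt is a hypothetical helper covering only the poster wording and the overlay length checks, while items such as weak contrast or vague emotion still need a read-through.

```python
from typing import List


def critique_prompt(prompt: str, overlay_text: str, max_words: int = 6) -> List[str]:
    """Flag the mechanically detectable failure modes from the checklist."""
    findings = []
    if "poster" in prompt.lower():
        findings.append("prompt describes a poster instead of a thumbnail")
    if "\n" in overlay_text.strip():
        findings.append("overlay text spans multiple lines")
    if len(overlay_text.split()) > max_words:
        findings.append("overlay text is too long to render reliably")
    return findings
```

Any finding should trigger the single revision pass; an empty list only means the mechanical checks passed, not that the thumbnail concept is strong.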

Output Format

Return four blocks in this order:

Angle: one sentence describing the thumbnail idea
Overlay Text: short text for the cover
Nano Banana Prompt: the exact English prompt
Generation Notes: one short sentence with any critical instruction or fallback
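For callers that assemble the response programmatically, the four blocks can be rendered in their fixed order as in this sketch. format_response is a hypothetical helper, and separating blocks with a blank line is an assumption about layout.

```python
def format_response(angle: str, overlay_text: str, prompt: str, generation_notes: str) -> str:
    """Render the four output blocks in the required order."""
    blocks = [
        ("Angle", angle),
        ("Overlay Text", overlay_text),
        ("Nano Banana Prompt", prompt),
        ("Generation Notes", generation_notes),
    ]
    return "\n\n".join(f"{label}: {value.strip()}" for label, value in blocks)
```

Fixing the order in one place keeps the output stable for anything that parses it downstream.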

Constraints

Optimize for clickability and legibility on mobile.
Favor one subject over many.
Favor strong emotion over neutral expression.
Favor simple composition over descriptive completeness.
Do not invent celebrity likenesses, trademarks, or brand assets the user did not request.
Do not promise exact text rendering quality from the image model.
If the user supplies Chinese copy, analyze in Chinese if helpful but output the final image prompt in English.
If the user gives no style direction, default to modern YouTube thumbnail aesthetics with bold contrast and clean hierarchy.
If the user gives a niche that implies a visual style, reflect it in the prompt. Example: finance should feel sharp and credible; gaming can be more exaggerated; education should feel clear and authoritative.

Missing Information

Ask a brief follow-up only when a missing detail would materially change the output, such as:

Whether a specific person must appear
Whether a real product image is required
Whether the thumbnail should use photorealistic, 3D, illustrated, or anime style
Whether there are strict brand colors or banned visual elements

Otherwise, make reasonable assumptions and proceed.

Resources

scripts/

Use create_thumbnail.py for end-to-end copy-to-thumbnail generation.

Use generate_image.py to call Nano Banana directly and save output files.

references/

Use youtube-thumbnail-patterns.md for prompt scaffolds, angle selection rules, and example transformations from raw copy to thumbnail prompt.

Use publishing-contract.md as the integration contract for callers that need stable command behavior, output JSON, and exit codes.

Version History

1 version in total

v0.1.1 (current)

2026-03-29 19:46
