Img Prompt Generator

Overview

Use this skill to turn a user's image idea into a stable six-field JSON prompt for text-to-image generation.

Default to Chinese field values unless the user asks for English or bilingual output. The final JSON must contain exactly these top-level string fields: subject, action, scene, camera, style, and constraints.

Interaction Mode

Use a mixed mode:

Direct generation mode: If the latest user request explicitly asks to skip questions or let Codex decide, produce the final JSON in one turn. Trigger phrases include 直接生成, 直接输出, 直接出 JSON, 你来发挥, 自由发挥, 不用问, 不需要问我, 别问我, 按默认生成, 先给我一个版本, just generate, no questions, or close equivalents.
Guided recommendation mode: In all other cases, do not output final JSON immediately. First inspect the user's prompt across the six fields, recommend missing choices, and ask option-based questions. Continue gathering answers until the prompt has enough information for all six fields, then output the final JSON.
If the user answers by option letters/numbers, treat the selected options as authoritative. If the user writes a custom description, merge it with prior details and continue only if important fields are still missing or problematic.
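The mode switch above can be sketched as a small phrase matcher. This is only an illustration: the function name `detect_mode` and the case-insensitive substring test are assumptions, and the "close equivalents" the skill also accepts would still need model judgment rather than a fixed list.

```python
# Trigger phrases copied from the Interaction Mode section.
# Substring matching is an assumption of this sketch, not a stated rule.
DIRECT_TRIGGERS = (
    "直接生成", "直接输出", "直接出 JSON", "你来发挥", "自由发挥",
    "不用问", "不需要问我", "别问我", "按默认生成", "先给我一个版本",
    "just generate", "no questions",
)


def detect_mode(latest_request: str) -> str:
    """Return "direct" if the latest request contains a trigger phrase, else "guided"."""
    text = latest_request.casefold()
    for phrase in DIRECT_TRIGGERS:
        if phrase.casefold() in text:
            return "direct"
    return "guided"


print(detect_mode("画一只猫，直接生成"))        # direct
print(detect_mode("帮我画一个19岁女生的近景"))  # guided
```

Only the latest user request is passed in, which mirrors the rule that the mode is decided from the most recent message rather than from earlier turns.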

Required Reference

Before producing a final prompt JSON or option recommendations, read references/six-field-manual.md unless the current conversation already includes the full rule manual. Use that file for category templates, option libraries, default style/constraint sentences, vocabulary, and field quality checks.

Workflow

Identify the user's intent and image category: portrait, poster/cover, character illustration, scene/building, UI/product concept, meme/doodle/simplified style, or universal.
Detect direct generation mode or guided recommendation mode from the latest user request.
Extract explicit details into the six fields. Do not discard user-provided subject traits, actions, scene requirements, style preferences, composition, ratio, lens choices, or prohibitions.
For each field, mark it as complete, missing, or problematic. A field is problematic when it is vague, visually hard to render, internally conflicting, mismatched with the category, or likely to produce unwanted results.
In direct generation mode, fill missing routine details with compatible creative choices, select the closest category template and defaults from the reference manual, and output valid JSON only.
In guided recommendation mode, ask option-based questions for missing or problematic fields before producing JSON. Ask at most three question groups per turn, but cover all six fields across turns if needed.
After the user has answered enough questions for all six fields, select the closest category template and defaults from the reference manual, then output valid JSON only unless the user explicitly asks for explanation.
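As a rough sketch of the branching in this workflow, the per-field status marks and the "at most three question groups per turn" limit could be modelled as follows. The names `Status` and `next_step` are hypothetical, and treating one question group as one field is an assumption made for illustration.

```python
from enum import Enum
from typing import Dict, List, Tuple

FIELDS = ("subject", "action", "scene", "camera", "style", "constraints")


class Status(Enum):
    COMPLETE = "complete"
    MISSING = "missing"
    PROBLEMATIC = "problematic"


def next_step(mode: str, statuses: Dict[str, Status],
              max_groups: int = 3) -> Tuple[str, List[str]]:
    """Return ("output", []) or ("ask", fields to ask about this turn)."""
    # A field with no recorded status is treated as missing.
    open_fields = [f for f in FIELDS
                   if statuses.get(f, Status.MISSING) is not Status.COMPLETE]
    if mode == "direct" or not open_fields:
        # Direct mode fills routine gaps itself, so it never asks.
        return ("output", [])
    # Guided mode: one question group per open field (assumption), capped per turn.
    return ("ask", open_fields[:max_groups])


statuses = {"subject": Status.COMPLETE, "camera": Status.PROBLEMATIC}
print(next_step("guided", statuses))  # ('ask', ['action', 'scene', 'camera'])
print(next_step("direct", statuses))  # ('output', [])
```

Fields left over after the cap are picked up on the next turn, which is how all six fields get covered across turns.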

Guided Recommendation Rules

In guided recommendation mode, ask before final output when any of these are true:

The primary subject is missing, ambiguous, or referred to only as "it", "he", "she", or "this" without enough context.
The request mixes incompatible categories or styles, such as UI product shot plus realistic emotional portrait, and no priority is clear.
The user specifies a real person, private person, brand, logo, readable text, or exact IP-like character but does not say whether likeness/text/logo fidelity is required.
The requested composition depends on an unknown target use, such as cover, poster, wallpaper, avatar, product page, or banner.
The user gives strong constraints that conflict with each other.
The prompt is sparse: it only gives a subject, object, or rough composition and lacks important choices for style, scene, action/state, camera, or constraints.
Two or more of the six fields are missing or problematic.
For portrait or realistic photo requests, the prompt does not specify the intended mood, scene/light, styling, or camera language clearly enough to avoid arbitrary creative choices.

Do not treat sparse input as permission to invent major details. Only infer ordinary gaps after the user has either selected recommended options or explicitly asked for direct generation.

For each missing field, recommend choices based on the user's existing prompt. For each problematic field, explain the issue briefly and offer optimized alternatives. For style-like fields, provide enumerated style options. For fuzzy scene details such as weather, time, background, or atmosphere, recommend plausible options that fit the existing subject and category.

All recommendation questions must use selectable options. The user may reply with option letters/numbers or a custom description. Keep option labels short, but include enough detail to show the visual consequence.
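One possible shape for an option-based question, and for reading the reply back, is sketched below. The helper names `format_question` and `resolve_reply` and the sample scene options are hypothetical; the real options should come from the reference manual.

```python
import string


def format_question(field: str, issue: str, options: list) -> str:
    """Render one question group with lettered, selectable options."""
    lines = [f"[{field}] {issue}"]
    for letter, option in zip(string.ascii_uppercase, options):
        lines.append(f"{letter}. {option}")
    lines.append("Reply with a letter, a number, or your own description.")
    return "\n".join(lines)


def resolve_reply(reply: str, options: list) -> str:
    """Map a letter or number to its option; otherwise keep the custom text."""
    token = reply.strip()
    letters = string.ascii_uppercase[:len(options)]
    if len(token) == 1 and token.upper() in letters:
        return options[letters.index(token.upper())]
    if token.isdecimal() and 1 <= int(token) <= len(options):
        return options[int(token) - 1]
    return token  # custom description, to be merged with prior details


# Illustrative options only; not taken from the reference manual.
scene_options = [
    "Cafe window seat, soft afternoon side light",
    "Rainy street at night, neon reflections",
    "Plain studio backdrop, even diffused light",
]
print(format_question("scene", "No location or lighting given.", scene_options))
print(resolve_reply("b", scene_options))  # Rainy street at night, neon reflections
```

A selected option is returned verbatim so it can be treated as authoritative, while free text is passed through unchanged for merging.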

Six-Field Intake

In guided recommendation mode, inspect these six fields before asking:

subject: identity, age/type, appearance, clothing/material/structure, recognition points.
action: action/state, pose, expression, dynamics, emotion or relationship to camera/others.
scene: location, time/weather/environment, background elements, light source, atmosphere.
camera: viewpoint, shot size, composition, subject position, visual emphasis, aspect ratio, and lens when applicable.
style: style family, color tendency, texture, detail density, final visual effect.
constraints: must-preserve items, avoid items, do-not-add items, special quality concerns.

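When asking questions, explicitly connect recommendations to the user's existing prompt, for example: 你已经指定了人物近景和19岁亚洲女性，目前缺少场景、情绪和镜头选择。 ("You have already specified a close-up portrait of a 19-year-old Asian woman; the scene, mood, and lens choice are still missing.")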

Lens Rule for Realistic Photos

When the category is realistic photo, portrait photo, product photo, street photo, documentary photo, editorial photo, or any request that should look camera-shot, ask for a lens choice in guided recommendation mode unless the user already gave one.

Lens options must include focal length and aperture, plus what that setup is suitable for. Prefer options from the reference manual, such as:

35mm f/1.8: environmental portrait, street/documentary scenes, more background context.
50mm f/1.4: natural close portrait, daily-life atmosphere, balanced subject and background.
85mm f/1.8: flattering close portrait, shallow depth of field, clean background separation.
70-200mm f/2.8: compressed perspective, editorial or cinematic portrait, strong background blur.
Macro 100mm f/2.8: product, beauty, texture, detail close-up.
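The lens rule could be expressed as a lookup plus a guard, as in the sketch below. The lens entries are copied from the list above; the category strings and the function name `lens_question` are assumptions for illustration, and "any request that should look camera-shot" would still need judgment beyond this fixed set.

```python
# (lens, suitable use) pairs copied from the list above.
LENS_OPTIONS = [
    ("35mm f/1.8", "environmental portrait, street/documentary scenes, more background context"),
    ("50mm f/1.4", "natural close portrait, daily-life atmosphere, balanced subject and background"),
    ("85mm f/1.8", "flattering close portrait, shallow depth of field, clean background separation"),
    ("70-200mm f/2.8", "compressed perspective, editorial or cinematic portrait, strong background blur"),
    ("Macro 100mm f/2.8", "product, beauty, texture, detail close-up"),
]

# Category labels are an assumption; the skill names them only in prose.
PHOTO_CATEGORIES = {
    "realistic photo", "portrait photo", "product photo",
    "street photo", "documentary photo", "editorial photo",
}


def lens_question(category, user_lens=None):
    """Return lettered lens options, or None when no lens question is needed."""
    if category not in PHOTO_CATEGORIES or user_lens:
        return None
    return [f"{chr(ord('A') + i)}. {lens}: {use}"
            for i, (lens, use) in enumerate(LENS_OPTIONS)]


for line in lens_question("portrait photo"):
    print(line)
```

Each option carries focal length, aperture, and intended use, so the user can see the visual consequence before choosing.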

Expansion Rules

Only creatively expand sparse descriptions in direct generation mode or after guided recommendation has collected enough user choices. When expanding, stay anchored to the user's core idea:

Add a coherent identity, visual features, mood, scene, lighting, camera, style, and constraints.
Keep the expansion plausible within one image concept.
Do not add extra people, text, logos, large props, or story elements that compete with the user's subject unless the user asked for them.

For detailed descriptions, classify and refine:

Preserve the user's explicit information first.
Move each detail into the most appropriate field.
Add only missing connective details that improve renderability, such as lighting direction, composition, material detail, or quality guardrails.
Do not overwrite the user's chosen style, color, ratio, or exclusions.

Output Contract

When ready, output one valid JSON object:

{
  "subject": "主体是...，外观特征包括...，穿着/材质/结构是...，整体气质或识别特征是...。",
  "action": "主体正在...，姿态/状态是...，表情或动态表现为...，情绪或关系上呈现出...。",
  "scene": "场景设定在...，时间/天气/环境条件是...，周围包含...，光线来自...，整体氛围是...。",
  "camera": "画面采用...视角，使用...景别与构图，主体位于...，重点强调...，画幅比例为...。",
  "style": "整体采用...风格，色彩倾向...，画面质感...，细节密度...，最终呈现出...的视觉效果。",
  "constraints": "必须保留...，避免出现...，不要...，特别注意...。"
}

Requirements:

Use strings, not arrays or nested objects.
Do not leave blanks, placeholders, TODO, or uncertain markers.
Keep every field specific enough for an image model to act on.
Put negative requirements only in constraints.
Do not wrap the final JSON in Markdown fences unless the user requests a code block.
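A minimal validator for this contract might look like the following. It is a sketch under stated assumptions: the function name `validate_prompt` and the placeholder marker list are hypothetical, and it checks structure only, not whether each field is specific enough for an image model.

```python
import json

REQUIRED = ("subject", "action", "scene", "camera", "style", "constraints")
# Marker list is an assumption; extend it to match local conventions.
PLACEHOLDER_MARKERS = ("...", "…", "TODO")


def validate_prompt(raw: str) -> list:
    """Return a list of contract violations; an empty list means the JSON passes."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        # Markdown fences around the JSON also end up here.
        return [f"not valid JSON: {exc.msg}"]
    if not isinstance(data, dict):
        return ["top level must be a JSON object"]

    problems = []
    missing = sorted(set(REQUIRED) - set(data))
    extra = sorted(set(data) - set(REQUIRED))
    if missing or extra:
        problems.append(f"field mismatch: missing={missing}, extra={extra}")
    for key in REQUIRED:
        if key not in data:
            continue
        value = data[key]
        if not isinstance(value, str):
            problems.append(f"{key}: must be a string")
        elif not value.strip():
            problems.append(f"{key}: is empty")
        elif any(marker in value for marker in PLACEHOLDER_MARKERS):
            problems.append(f"{key}: contains a placeholder marker")
    return problems


template_like = json.dumps({key: "主体是..." for key in REQUIRED}, ensure_ascii=False)
print(validate_prompt(template_like))  # six "contains a placeholder marker" entries
```

The unfilled template above fails on purpose, since its `...` gaps are exactly the placeholders the requirements forbid.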

Default Aspect Ratios

Use the user's requested ratio when supplied. Otherwise choose by category:

Portrait/photo: 3:4 or 4:5
Social cover/poster: 4:5, 3:4, or 9:16 depending on platform cues
Character illustration: 3:4
Scene/building/landscape: 16:9
UI/product concept: 16:9
Avatar/meme/doodle: 1:1
Universal fallback: choose the ratio that best supports the subject and state it in camera
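The defaults above map naturally onto a lookup table. In this sketch the category keys and the function name `pick_ratio` are hypothetical, and taking the first listed ratio is an assumption: for posters and covers the skill expects platform cues to decide among the candidates.

```python
# Ratios copied from the list above; keys are illustrative labels.
DEFAULT_RATIOS = {
    "portrait": ["3:4", "4:5"],
    "poster": ["4:5", "3:4", "9:16"],
    "character": ["3:4"],
    "scene": ["16:9"],
    "ui": ["16:9"],
    "avatar": ["1:1"],
}


def pick_ratio(category, user_ratio=None):
    """Prefer the user's ratio; otherwise return the first default for the category."""
    if user_ratio:
        return user_ratio
    candidates = DEFAULT_RATIOS.get(category)
    # None signals the universal fallback, which needs a judgment call.
    return candidates[0] if candidates else None


print(pick_ratio("scene"))            # 16:9
print(pick_ratio("poster", "9:16"))   # 9:16
print(pick_ratio("universal"))        # None
```

Whatever ratio is chosen still has to be written into the camera field of the final JSON.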

Final Quality Check

Before sending final JSON, verify:

subject answers who/what, appearance, material/clothing/structure, and recognition points.
action answers current action/state, pose/dynamics, expression if relevant, and emotion/relationship.
scene answers location, time/weather/environment, surrounding elements, light source, and atmosphere.
camera answers viewpoint, shot size, composition, subject position, visual emphasis, aspect ratio, and lens when applicable.
style answers style type, color tendency, texture, detail density, and final visual effect.
constraints includes must-preserve, avoid, do-not-add, and special attention clauses.
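Most of this checklist needs a reading of the content, but the last item can be partly approximated with a keyword heuristic. The sketch below assumes the default Chinese output and the clause openers used in the Output Contract template; the function name `missing_constraint_clauses` is hypothetical, and a pass here does not prove the clauses are meaningful.

```python
# Clause openers taken from the constraints line of the Output Contract template.
# Assumes Chinese field values; English or bilingual output needs other markers.
CLAUSE_OPENERS = {
    "must-preserve": "必须保留",
    "avoid": "避免",
    "do-not-add": "不要",
    "special attention": "特别注意",
}


def missing_constraint_clauses(constraints: str) -> list:
    """List the clause types whose opener does not appear in the constraints string."""
    return [name for name, opener in CLAUSE_OPENERS.items()
            if opener not in constraints]


sample = "必须保留人物的短发特征，避免出现多余人物，不要添加文字或水印，特别注意手部结构自然。"
print(missing_constraint_clauses(sample))            # []
print(missing_constraint_clauses("避免出现文字。"))  # ['must-preserve', 'do-not-add', 'special attention']
```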

Version History

2 versions in total

v1.0.1 Initial release (current)
2026-05-13 09:52, security status: safe

v1.0.0 Initial release
2026-05-12 17:31, security status: safe

Security Check

Tencent Cloud Security (Keen): safe, no risk

Tencent Cloud Security (Sanbu)