Use this skill to turn a user's image idea into a stable six-field JSON prompt for text-to-image generation.
Default to Chinese field values unless the user asks for English or bilingual output. The final JSON must contain exactly these top-level string fields: subject, action, scene, camera, style, and constraints.
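The six-field contract above can be checked mechanically. The following is a minimal Python sketch of such a check; the name `validate_prompt` and the wording of the problem messages are illustrative assumptions, not part of this skill.

```python
import json

# The six top-level fields named by the skill.
REQUIRED_FIELDS = ("subject", "action", "scene", "camera", "style", "constraints")


def validate_prompt(raw: str) -> list[str]:
    """Return a list of problems; an empty list means the JSON passes."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc.msg}"]
    if not isinstance(data, dict):
        return ["top level must be a JSON object"]

    problems = []
    # Every required field must be present and must be a string.
    for name in REQUIRED_FIELDS:
        if name not in data:
            problems.append(f"missing field: {name}")
        elif not isinstance(data[name], str):
            problems.append(f"field is not a string: {name}")
    # "Exactly these fields" also rules out extras.
    for name in data:
        if name not in REQUIRED_FIELDS:
            problems.append(f"unexpected field: {name}")
    return problems
```

A check like this only covers structure; whether each value is visually specific enough still needs judgment.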
Use a mixed mode:
- Direct generation mode: the user says 直接生成, 直接输出, 直接出 JSON, 你来发挥, 自由发挥, 不用问, 不需要问我, 别问我, 按默认生成, 先给我一个版本, just generate, no questions, or close equivalents.
Before producing a final prompt JSON or option recommendations, read references/six-field-manual.md unless the current conversation already includes the full rule manual. Use that file for category templates, option libraries, default style/constraint sentences, vocabulary, and field quality checks.
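As a rough illustration, the trigger phrases listed above could be matched with a plain substring test. This is only a sketch: the name `wants_direct_generation` is hypothetical, and a literal match cannot recognize the "close equivalents" the skill also allows, so it should be treated as a first-pass hint rather than the deciding rule.

```python
# Trigger phrases copied from the skill text.
DIRECT_TRIGGERS = (
    "直接生成", "直接输出", "直接出 JSON", "你来发挥", "自由发挥",
    "不用问", "不需要问我", "别问我", "按默认生成", "先给我一个版本",
    "just generate", "no questions",
)


def wants_direct_generation(message: str) -> bool:
    """Return True when the message contains one of the listed phrases."""
    # Lowercase and collapse whitespace runs so "Just  generate" still matches.
    normalized = " ".join(message.lower().split())
    return any(trigger.lower() in normalized for trigger in DIRECT_TRIGGERS)
```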
Classify each field as complete, missing, or problematic. A field is problematic when it is vague, visually hard to render, internally conflicting, mismatched with the category, or likely to produce unwanted results.
In guided recommendation mode, ask before final output when any of these are true:
Do not treat sparse input as permission to invent major details. Only infer ordinary gaps after the user has either selected recommended options or explicitly asked for direct generation.
For each missing field, recommend choices based on the user's existing prompt. For each problematic field, explain the issue briefly and offer optimized alternatives. For style-like fields, provide enumerated style options. For fuzzy scene details such as weather, time, background, or atmosphere, recommend plausible options that fit the existing subject and category.
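One way to picture the per-field handling described above is a small status table that maps each field to the kind of follow-up it needs. The sketch below assumes the status of each field has already been judged; `FieldStatus` and `plan_questions` are hypothetical names, and treating an unlisted field as missing is an assumption of this example.

```python
from enum import Enum


class FieldStatus(Enum):
    COMPLETE = "complete"
    MISSING = "missing"
    PROBLEMATIC = "problematic"


FIELDS = ("subject", "action", "scene", "camera", "style", "constraints")


def plan_questions(statuses: dict[str, FieldStatus]) -> list[tuple[str, str]]:
    """Return (field, follow-up) pairs for every field that is not complete."""
    plan = []
    for name in FIELDS:
        # Assumption: a field with no recorded status counts as missing.
        status = statuses.get(name, FieldStatus.MISSING)
        if status is FieldStatus.MISSING:
            plan.append((name, "recommend choices based on the existing prompt"))
        elif status is FieldStatus.PROBLEMATIC:
            plan.append((name, "explain the issue briefly and offer optimized alternatives"))
    return plan
```

Complete fields produce no entry, so an empty plan means there is nothing left to ask.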
All recommendation questions must use selectable options. The user may reply with option letters/numbers or a custom description. Keep option labels short, but include enough detail to show the visual consequence.
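A selectable-option question of this kind might be rendered as follows. This is a sketch only: `format_question`, the indentation, and the closing line are illustrative choices, and any option text passed in is the caller's own.

```python
from string import ascii_uppercase


def format_question(field: str, reason: str, options: list[tuple[str, str]]) -> str:
    """Render one question with lettered options.

    Each option is a (short label, visual consequence) pair.
    """
    lines = [f"{field}: {reason}"]
    # zip stops at 26 options, which is far more than a question should offer.
    for letter, (label, consequence) in zip(ascii_uppercase, options):
        lines.append(f"  {letter}. {label}: {consequence}")
    # The user may always answer with a custom description instead.
    lines.append("  Or reply with your own description.")
    return "\n".join(lines)
```

Keeping the label and the consequence as separate values makes it easier to keep labels short while still showing what each choice changes in the image.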
In guided recommendation mode, inspect these six fields before asking:
- subject: identity, age/type, appearance, clothing/material/structure, recognition points.
- action: action/state, pose, expression, dynamics, emotion or relationship to camera/others.
- scene: location, time/weather/environment, background elements, light source, atmosphere.
- camera: viewpoint, shot size, composition, subject position, visual emphasis, aspect ratio, and lens when applicable.
- style: style family, color tendency, texture, detail density, final visual effect.
- constraints: must-preserve items, avoid items, do-not-add items, special quality concerns.
When asking questions, explicitly connect recommendations to the user's existing prompt, for example: 你已经指定了人物近景和19岁亚洲女性,目前缺少场景、情绪和镜头选择。 (You have already specified a close-up portrait and a 19-year-old Asian woman; the scene, emotion, and lens choice are still missing.)
When the category is realistic photo, portrait photo, product photo, street photo, documentary photo, editorial photo, or any request that should look camera-shot, ask for a lens choice in guided recommendation mode unless the user already gave one.
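The lens-question rule above can be read as a simple predicate. The sketch below covers only the categories named explicitly; deciding whether some other request "should look camera-shot" still requires judgment, and the name `needs_lens_question` is hypothetical.

```python
# Categories named in the rule above.
CAMERA_SHOT_CATEGORIES = frozenset({
    "realistic photo", "portrait photo", "product photo",
    "street photo", "documentary photo", "editorial photo",
})


def needs_lens_question(category: str, user_gave_lens: bool, guided_mode: bool = True) -> bool:
    """Return True when a lens choice should be asked for."""
    if not guided_mode or user_gave_lens:
        return False
    return category.strip().lower() in CAMERA_SHOT_CATEGORIES
```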
Lens options must include focal length and aperture, plus what that setup is suitable for. Prefer options from the reference manual, such as:
Only creatively expand sparse descriptions in direct generation mode or after guided recommendation has collected enough user choices. When expanding, stay anchored to the user's core idea:
For detailed descriptions, classify and refine:
When ready, output one valid JSON object:
{
"subject": "主体是...,外观特征包括...,穿着/材质/结构是...,整体气质或识别特征是...。",
"action": "主体正在...,姿态/状态是...,表情或动态表现为...,情绪或关系上呈现出...。",
"scene": "场景设定在...,时间/天气/环境条件是...,周围包含...,光线来自...,整体氛围是...。",
"camera": "画面采用...视角,使用...景别与构图,主体位于...,重点强调...,画幅比例为...。",
"style": "整体采用...风格,色彩倾向...,画面质感...,细节密度...,最终呈现出...的视觉效果。",
"constraints": "必须保留...,避免出现...,不要...,特别注意...。"
}
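If the object is assembled in code rather than written by hand, a sketch like the one below keeps the field order of the template and leaves Chinese text readable. `render_prompt` is a hypothetical helper; `json.dumps` with `ensure_ascii=False` is standard-library behavior.

```python
import json

FIELD_ORDER = ("subject", "action", "scene", "camera", "style", "constraints")


def render_prompt(fields: dict[str, str]) -> str:
    """Serialize the six fields as one JSON object in template order."""
    missing = [name for name in FIELD_ORDER if not fields.get(name, "").strip()]
    if missing:
        raise ValueError(f"empty or missing fields: {', '.join(missing)}")
    ordered = {name: fields[name].strip() for name in FIELD_ORDER}
    # ensure_ascii=False writes Chinese characters directly instead of \uXXXX escapes.
    return json.dumps(ordered, ensure_ascii=False, indent=2)
```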
Requirements:
- ... TODO, or uncertain markers.
- ... constraints.
Use the user's requested ratio when supplied. Otherwise choose by category:
- ...: 3:4 or 4:5
- ...: 4:5, 3:4, or 9:16 depending on platform cues
- ...: 3:4
- ...: 16:9
- ...: 16:9
- ...: 1:1
Write the chosen ratio into camera.
Before sending final JSON, verify:
- subject answers who/what, appearance, material/clothing/structure, and recognition points.
- action answers current action/state, pose/dynamics, expression if relevant, and emotion/relationship.
- scene answers location, time/weather/environment, surrounding elements, light source, and atmosphere.
- camera answers viewpoint, shot size, composition, subject position, visual emphasis, and aspect ratio.
- style answers style type, color tendency, texture, detail density, and final visual effect.
- constraints includes must-preserve, avoid, do-not-add, and special attention clauses.
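The checklist above is a matter of judgment, but one narrow part of a pre-send check can be automated: scanning the values for text left over from the template. The sketch below is an assumption-laden example; `find_leftovers` is a hypothetical name, and the marker list (ASCII ellipsis, the single-character ellipsis, and `TODO`) is this example's guess at what counts as a leftover.

```python
# Assumed leftover markers: template ellipses in two spellings, plus TODO.
LEFTOVER_MARKERS = ("...", "…", "TODO")


def find_leftovers(prompt: dict[str, str]) -> dict[str, list[str]]:
    """Map each field to the leftover markers found in its value."""
    hits = {}
    for field, value in prompt.items():
        found = [marker for marker in LEFTOVER_MARKERS if marker in value]
        if found:
            hits[field] = found
    return hits
```

An empty result says only that no marker text remains; it does not show that the six fields answer the questions listed above.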