Convert a user's natural-language description or reference image into a structured JSON prompt in the Ideogram 4.0 standard format.
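| Capability | Description |
|---|---|
| Color palette control | Up to 16 hexadecimal colors per image, directly controlling the dominant tones |
| Bounding-box layout | Any element can be positioned with [y_min, x_min, y_max, x_max] (normalized coordinates, 0–1000) |
| Text element rendering | Elements of type text support multi-line, multi-font, styled in-image text |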
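The normalized bounding-box convention can be illustrated with a small conversion to pixel coordinates. This is a minimal sketch: the helper name `bbox_to_pixels` and the rounding choice are assumptions for illustration, not part of the Ideogram format.

```python
from typing import List, Tuple


def bbox_to_pixels(bbox: List[int], width: int, height: int) -> Tuple[int, int, int, int]:
    """Map a normalized [y_min, x_min, y_max, x_max] box (0-1000, origin at
    the top-left) to pixel coordinates (left, top, right, bottom).

    Hypothetical helper: rounding to the nearest pixel is an assumption.
    """
    y_min, x_min, y_max, x_max = bbox
    left = round(x_min * width / 1000)
    top = round(y_min * height / 1000)
    right = round(x_max * width / 1000)
    bottom = round(y_max * height / 1000)
    return left, top, right, bottom


# A box spanning x 200-800 and y 80-420 on a 1000 x 1000 canvas.
print(bbox_to_pixels([80, 200, 420, 800], width=1000, height=1000))
# -> (200, 80, 800, 420)
```

Note that the bbox lists y before x, so the first value is a vertical position.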
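Input A (text only): The user describes the desired image directly in natural language.
Input B (reference image): The user provides an image as a reference, possibly with a short note. In this case a vision-language model (VLM) must be used to analyze the image and extract its visual information before the JSON is generated.
Input C (text + reference image): The user provides both a description and a reference image. Use the visual information from the image as the base, then enhance and adjust it according to the user's text.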
Based on the input, build the output according to the following JSON schema:
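{
"high_level_description": "One-sentence summary of the whole image, covering subject, mood, and overall style",
"style_description": {
"aesthetics": "Aesthetic style keywords",
"lighting": "Lighting description",
"photo": "Camera parameters / film style (photographs only)",
"medium": "Medium type: Photograph / Illustration / Digital art, etc.",
"art_style": "Specific art style",
"color_palette": ["#HEX1", "#HEX2"]
},
"compositional_deconstruction": {
"background": "Detailed description of the background, including environment, atmosphere, and light and shadow",
"elements": [
{
"type": "obj | text",
"bbox": [y_min, x_min, y_max, x_max],
"desc": "Detailed visual description of the object/element",
"text": "Only when type=text: the text content to render"
}
]
}
}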
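The constraints this document states for the schema (at most 16 palette colors, bbox values in 0–1000, type limited to "obj" or "text", and text elements carrying both text and desc) can be checked mechanically. The following is a minimal sketch, not an official validator: the function name `validate_prompt`, the 6-digit hex pattern, and the requirement that each bbox minimum be smaller than its maximum are assumptions.

```python
import re
from typing import Any, Dict, List

# Assumption: palette entries are 6-digit hex codes such as "#1B3A5C".
HEX_COLOR = re.compile(r"^#[0-9A-Fa-f]{6}$")
MAX_PALETTE = 16  # per-image limit stated in this document


def validate_prompt(prompt: Dict[str, Any]) -> List[str]:
    """Return a list of problems; an empty list means none were found."""
    errors: List[str] = []

    palette = prompt.get("style_description", {}).get("color_palette", [])
    if len(palette) > MAX_PALETTE:
        errors.append(f"color_palette has {len(palette)} colors (max {MAX_PALETTE})")
    for color in palette:
        if not isinstance(color, str) or not HEX_COLOR.match(color):
            errors.append(f"invalid hex color: {color!r}")

    elements = prompt.get("compositional_deconstruction", {}).get("elements", [])
    for i, el in enumerate(elements):
        kind = el.get("type")
        if kind not in ("obj", "text"):
            errors.append(f"elements[{i}]: type must be 'obj' or 'text', got {kind!r}")
        # Text elements must carry both the literal text and a style description.
        if kind == "text" and not (el.get("text") and el.get("desc")):
            errors.append(f"elements[{i}]: text element needs both 'text' and 'desc'")
        bbox = el.get("bbox")
        if bbox is not None:  # bbox is optional
            ok = (
                isinstance(bbox, list)
                and len(bbox) == 4
                and all(isinstance(v, int) and 0 <= v <= 1000 for v in bbox)
                and bbox[0] < bbox[2]  # y_min < y_max (assumed)
                and bbox[1] < bbox[3]  # x_min < x_max (assumed)
            )
            if not ok:
                errors.append(
                    f"elements[{i}]: bbox must be [y_min, x_min, y_max, x_max] within 0-1000"
                )
    return errors
```

Running such a check before sending a prompt catches mechanical mistakes early; it says nothing about the quality of the descriptions themselves.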
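- color_palette is optional, but recommended when precise color control is needed (at most 16 colors per image).
- bbox is optional, but recommended for precise placement of key elements.
- type: use "obj" for ordinary objects and "text" for text that must be rendered in the image.
- A text element must include both the text field (the literal text) and the desc field (the style description).
- bbox coordinates are [y_min, x_min, y_max, x_max], with the origin at the top-left corner and values in the range 0–1000.

high_level_description is a one-sentence overview. It should cover the subject, mood, and overall style.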
Good example:
> "A cinematic 35mm film photograph of a lone wooden sailboat on a glassy lake at sunset, the boat on a right-third vertical with the horizon at the lower third, in a cool muted blue palette."
Bad example:
> "A boat on a lake" (too vague; it lacks style, composition, and mood)
Not every subfield is required; fill in only those the image actually needs:
| Subfield | When to fill in | Example |
|---|---|---|
| aesthetics | Always recommended | "Cinematic, minimal, serene" |
| lighting | When there is a clear lighting intent | "Cool overcast dusk light with a small warm sun low at the horizon" |
| photo | For photographic images | "35mm motion-picture film still, 16:9 framing, subtle grain" |
| medium | Always recommended | "Photograph" / "Digital vector graphic" / "Ink and watercolor" |
| art_style | For non-photorealistic styles | "hand-drawn editorial illustration, flat color fills with subtle ink linework" |
| color_palette | When precise color control is needed | ["#1B3A5C", "#5B8FB9"] |
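Because the subfields are optional, one way to assemble style_description is to pass only what applies and drop the rest. This is a minimal sketch; the helper name `build_style_description` is hypothetical.

```python
from typing import Any, Dict, List, Optional


def build_style_description(
    aesthetics: Optional[str] = None,
    lighting: Optional[str] = None,
    photo: Optional[str] = None,
    medium: Optional[str] = None,
    art_style: Optional[str] = None,
    color_palette: Optional[List[str]] = None,
) -> Dict[str, Any]:
    """Build a style_description dict, omitting unset or empty subfields.

    Hypothetical helper for illustration only.
    """
    candidates = {
        "aesthetics": aesthetics,
        "lighting": lighting,
        "photo": photo,
        "medium": medium,
        "art_style": art_style,
        "color_palette": color_palette,
    }
    # Keep insertion order; skip None, empty strings, and empty lists.
    return {key: value for key, value in candidates.items() if value}


# An illustration has no photo subfield, so it is simply left out.
style = build_style_description(
    aesthetics="Cinematic, minimal, serene",
    medium="Ink and watercolor",
    art_style="hand-drawn editorial illustration",
)
print(sorted(style))  # -> ['aesthetics', 'art_style', 'medium']
```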
The background description should cover the environment, atmosphere, and light and shadow.
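Each element follows these principles. For a text element:
- text field: the literal text to display, with \n marking a line break.
- desc field: a visual description of the typeface, size, color, and layout.
- bbox: the bounding box that positions the text region.

For text-only input, break the user's description down directly into the JSON fields. If the description is vague, infer and fill in reasonable details. For example: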
User input: "A cat sitting on a windowsill reading a book"
Inferred additions:
For a reference image, analyze the image with a VLM and extract:
Then reorganize the above information into the standard JSON format.
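For text plus a reference image, use the image analysis as the base skeleton and treat the user's text as modification instructions that override or enhance the corresponding parts. Priority: explicit user instructions > image analysis > reasonable inference.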
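The priority order can be sketched as a layered merge in which higher-priority sources overwrite lower ones. This is a minimal sketch under stated assumptions: the names `deep_merge` and `combine_sources` are hypothetical, and lists such as elements are replaced wholesale rather than merged item by item.

```python
from typing import Any, Dict


def deep_merge(base: Dict[str, Any], override: Dict[str, Any]) -> Dict[str, Any]:
    """Return a new dict in which values from override win.

    Nested dicts are merged recursively; any other value (including a
    list) is replaced as a whole. Hypothetical helper.
    """
    merged = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value
    return merged


def combine_sources(
    inferred: Dict[str, Any],
    image_analysis: Dict[str, Any],
    user_specified: Dict[str, Any],
) -> Dict[str, Any]:
    """Apply the priority: user specified > image analysis > inference."""
    return deep_merge(deep_merge(inferred, image_analysis), user_specified)


result = combine_sources(
    inferred={"style_description": {"lighting": "soft daylight", "medium": "Photograph"}},
    image_analysis={"style_description": {"lighting": "overcast"}},
    user_specified={"style_description": {"lighting": "golden hour"}},
)
print(result)
# -> {'style_description': {'lighting': 'golden hour', 'medium': 'Photograph'}}
```

Fields the user does not mention keep the value from the image analysis, and fields neither source covers fall back to the inferred default.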
Input: "A seaside lighthouse at dusk, warm tones, cinematic feel"
{
"high_level_description": "A cinematic wide-angle photograph of a solitary lighthouse standing on a rocky coastline at golden hour, warm amber light bathing the stone tower, shot on Kodak Portra 400 with deep foreground rocks and a misty sea stretching to the horizon.",
"style_description": {
"aesthetics": "Cinematic, epic, solitary, warm nostalgia.",
"lighting": "Golden hour directional light from the right, long shadows across the rocks, warm amber glow on the lighthouse face, cool blue shadows in the crevices.",
"photo": "Medium format film photograph, 3:2 aspect ratio, shallow depth of field on the lighthouse, subtle grain, slightly desaturated except for warm highlights.",
"medium": "Photograph.",
"color_palette": ["#D4A574", "#2C3E50", "#E8B86D", "#1A252F"]
},
"compositional_deconstruction": {
"background": "A rugged rocky coastline at golden hour, the sea stretching to a hazy horizon in the lower two-thirds of the frame. Warm directional light rakes across the rock faces from the right, casting long deep-blue shadows. Distant sea mist softens the horizon line. Cool-toned water with golden reflections near the shore.",
"elements": [
{
"type": "obj",
"bbox": [120, 380, 620, 620],
"desc": "Tall cylindrical stone lighthouse centered horizontally in the upper-middle of the frame, weathered white-gray masonry with horizontal banding, a dark lantern room capped with a domed glass roof at the top. Warm golden light catching the right-facing side, deep shadow on the left. A narrow gallery railing circles the lantern room."
},
{
"type": "obj",
"desc": "Beam of light cutting diagonally from the lantern room toward the upper-left, visible as a soft volumetric ray through the sea mist, fading gradually into the atmosphere."
},
{
"type": "obj",
"desc": "Rough granite boulders filling the foreground and midground, textured with cracks and patches of orange-brown lichen. Wet surfaces reflecting the golden sky. Sharp focus in the immediate foreground, gently softening toward the lighthouse base."
},
{
"type": "obj",
"desc": "Choppy sea in the lower third, deep teal-blue with scattered golden highlights from the sky reflection. White foam crests on distant waves. Soft motion blur suggesting a 1/4 second exposure."
},
{
"type": "obj",
"desc": "Pale amber-gold sky occupying the upper third, gradient from warm cream near the horizon to soft peach and faint violet at the top edge. No clouds, just a smooth atmospheric gradient with a hint of sea haze."
}
]
}
}
Input: "Make a coffee shop grand-opening poster, dark green with gold text"
{
"high_level_description": "A sophisticated square 1:1 poster for a specialty coffee shop grand opening, featuring elegant serif typography in gold on a deep forest-green background with botanical coffee-plant illustrations and minimalist layout.",
"style_description": {
"aesthetics": "Sophisticated, artisanal, botanical elegance, minimal luxury.",
"lighting": "Even studio lighting with subtle warm accent on gold elements, no strong directional source.",
"medium": "Digital graphic design, square poster format.",
"art_style": "Clean modernist design with botanical illustration accents, ample negative space.",
"color_palette": ["#1A2E1A", "#D4AF37", "#F5F5DC", "#2D4A2D"]
},
"compositional_deconstruction": {
"background": "Deep forest-green (#1A2E1A) solid background filling the entire frame edge-to-edge. Subtle texture of fine linen paper grain across the surface. Delicate faint botanical line-art pattern of coffee leaves and branches watermarking the background at very low opacity, especially concentrated near the edges.",
"elements": [
{
"type": "text",
"bbox": [80, 200, 420, 800],
"text": "ARTISAN\nCOFFEE\nHOUSE",
"desc": "Large display serif headline in metallic antique gold (#D4AF37), three lines stacked vertically, generous letter-spacing, slight embossed effect. Dominating the upper-center area."
},
{
"type": "text",
"bbox": [480, 280, 540, 720],
"text": "GRAND OPENING\nJUNE 15 · 2026",
"desc": "Medium-sized serif text in cream (#F5F5DC), two lines, centered horizontally below the main headline. Elegant but restrained proportions."
},
{
"type": "text",
"bbox": [580, 350, 650, 650],
"text": "SINGLE ORIGIN · HAND ROASTED\nDAILY FROM 7 AM",
"desc": "Small tracked sans-serif caps in muted cream, two lines, centered. Supporting tagline beneath the date block."
},
{
"type": "obj",
"desc": "Delicate botanical illustration of a coffee plant branch with leaves and unripe berries, rendered in thin gold line-art, positioned as a decorative element wrapping around the right edge of the headline text area."
},
{
"type": "obj",
"desc": "Minimal line-art icon of a coffee cup with steam rising, small scale, in gold, positioned at the bottom center between the tagline and the address line as a visual anchor."
},
{
"type": "text",
"bbox": [900, 220, 965, 780],
"text": "123 ARTIST STREET · DOWNTOWN",
"desc": "Tiny sans-serif address in pale cream at the very bottom, centered, the smallest type on the poster."
}
]
}
}
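- Fill in the photo field only for photographic images.
- Keep high_level_description, background, and each element's desc consistent in style, color tone, and lighting.
- Provide color_palette only when precise color control is needed, so the model keeps reasonable creative room.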
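Part of the consistency requirement can be checked mechanically: any hex color written inside background or an element's desc should also appear in color_palette. This is a minimal sketch that covers only colors, not style or lighting; the function name `colors_outside_palette` and the 6-digit hex pattern are assumptions.

```python
import re
from typing import Any, Dict, List

# Assumption: colors inside descriptions are written as 6-digit hex codes.
HEX_IN_TEXT = re.compile(r"#[0-9A-Fa-f]{6}\b")


def colors_outside_palette(prompt: Dict[str, Any]) -> List[str]:
    """List hex colors mentioned in descriptions but absent from color_palette.

    Returns an empty list when no palette is given, because there is
    nothing to compare against. Hypothetical helper.
    """
    palette = {
        color.upper()
        for color in prompt.get("style_description", {}).get("color_palette", [])
    }
    if not palette:
        return []

    comp = prompt.get("compositional_deconstruction", {})
    texts = [comp.get("background", "")]
    texts += [el.get("desc", "") for el in comp.get("elements", [])]

    stray: List[str] = []
    for text in texts:
        for color in HEX_IN_TEXT.findall(text):
            normalized = color.upper()
            if normalized not in palette and normalized not in stray:
                stray.append(normalized)
    return stray


example = {
    "style_description": {"color_palette": ["#1A2E1A", "#D4AF37"]},
    "compositional_deconstruction": {
        "background": "Deep green (#1A2E2A) solid background.",
        "elements": [{"type": "text", "text": "HI", "desc": "Gold (#D4AF37) serif."}],
    },
}
print(colors_outside_palette(example))  # -> ['#1A2E2A']
```

A one-character slip in a hex code is easy to miss by eye, which is the kind of mismatch this check surfaces.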