Overview

Product Main Image Prompt Extractor

This skill guides you on how to extract visual features and prompts from product main images using multimodal AI, helping e-commerce sellers turn unstructured image data into structured, actionable insights.

Core Concepts

This tool performs deep visual analysis on product main images (and optionally additional images) from a product list. It uses a multimodal AI model to identify specific visual dimensions based on a natural language instruction, such as color, shape, style, material, or specific selling-point elements.

How it works: You provide a list of products (with image URLs) and a natural language prompt describing what to extract. The tool automatically iterates over all products, analyzes each image, and returns structured attribute data (attributeName + attributeValue) appended to each product record.

Row expansion: When extracting multiple dimensions in a single request (e.g., both color and shape), each original product row is duplicated per dimension, resulting in one row per product per attribute.
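The row expansion described above can be pictured with a minimal sketch. It is not the tool's implementation: the `asin` key and the shape of the `extracted` mapping are assumptions made for illustration, while `attributeName` and `attributeValue` are the field names this document uses.

```python
from typing import Any


def expand_rows(
    products: list[dict[str, Any]],
    extracted: dict[str, dict[str, str]],
) -> list[dict[str, Any]]:
    """Duplicate each product row once per extracted dimension."""
    rows = []
    for product in products:
        # `extracted` maps a product id to {dimension: value} (hypothetical shape).
        for name, value in extracted.get(product["asin"], {}).items():
            rows.append({**product, "attributeName": name, "attributeValue": value})
    return rows


products = [{"asin": "A1"}, {"asin": "A2"}]
extracted = {
    "A1": {"color": "red", "shape": "round"},
    "A2": {"color": "blue", "shape": "heart"},
}
rows = expand_rows(products, extracted)
print(len(rows))  # 2 products x 2 dimensions -> 4
```

A product with no extracted value for a dimension simply contributes fewer rows, which is why the output holds at most one row per product per dimension.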

Parameter Guide

| Parameter | Required | Description |
|-----------|----------|-------------|
| productImageAnalysisPrompt | Yes | Natural language instruction describing what visual information to extract from the images. Be specific about the dimensions you want (color, material, shape, style, pendant type, etc.). |
| analyzeAdditionalImages | No | Whether to also analyze additional product images beyond the main image. Defaults to false. |
| refResultData | No | Reference data from a previous step, containing the product list to analyze. Must be a JSON string with a products array. |
| userInput | No | Supplementary user input for additional context. |
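As a hedged illustration of how these parameters fit together, the sketch below assembles a request body. The four top-level keys come from the table above. The helper name `build_request` and the product fields `asin` and `imageUrl` are assumptions, since the exact product schema is defined in references/api.md rather than here.

```python
import json
from typing import Any, Optional


def build_request(
    products: list[dict[str, Any]],
    prompt: str,
    analyze_additional_images: bool = False,
    user_input: Optional[str] = None,
) -> dict[str, Any]:
    """Assemble the parameters described in the table (hypothetical helper)."""
    if not prompt.strip():
        raise ValueError("productImageAnalysisPrompt is required")
    payload: dict[str, Any] = {
        "productImageAnalysisPrompt": prompt,
        "analyzeAdditionalImages": analyze_additional_images,
        # refResultData must be a JSON *string* wrapping a products array,
        # not a nested object.
        "refResultData": json.dumps({"products": products}, ensure_ascii=False),
    }
    if user_input:
        payload["userInput"] = user_input
    return payload


payload = build_request(
    # Placeholder product record; the field names are assumptions.
    [{"asin": "B000000001", "imageUrl": "https://example.com/main.jpg"}],
    "Analyze each product's main image and extract the primary color of the product",
)
print(json.dumps(payload, indent=2))
```

Serializing the product list separately keeps `refResultData` a string, which is the one requirement the table states explicitly.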

Writing Effective Prompts

Be dimension-specific: Clearly state what visual attribute(s) to extract. "Extract the dominant color of each product" is better than "Analyze the images."
One or few dimensions per call: For cleaner results, focus on one or two dimensions at a time.
Use concrete terms: "Identify the pendant/charm shape on the product" is clearer than "Look at the decorations."
No need to specify individual products: The tool automatically iterates over all products in the input list.
Data flow dependency: The tool requires upstream product data. It cannot reference "products from the previous conversation round" -- the data must be explicitly provided via the current step's input or resource references.

Prompt Examples

| Goal | Example Prompt |
|------|---------------|
| Extract dominant color | "Analyze each product's main image and extract the primary color of the product" |
| Identify material | "From each product's main image, identify the apparent material (plastic, metal, wood, fabric, etc.)" |
| Classify pendant shape | "Analyze each product's main image and identify the shape of the pendant/charm (round, heart, star, etc.)" |
| Detect style | "Extract the overall style of each product from its main image (minimalist, vintage, bohemian, industrial, etc.)" |
| Reverse-engineer image prompt | "Based on the main image, infer the likely AI-generation prompt or visual description that could reproduce this image" |
| Multi-dimension extraction | "From each main image, extract both the dominant color and the overall product shape" |

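Invocation

API endpoint: POST /multimodal/extractPromptsFromMainImage (see references/api.md for the full parameters, response, and error codes)
Python script: python scripts/multimodal_extract_attributes.py '' [--inline]
Cost constraint: This tool consumes credits. By default, the same parameter combination is called only once per session, and the script keeps a 24-hour local cache. On failure or an empty result, do not automatically retry with different keywords, other pages, or a different zip code; if further queries are needed, first tell the user that they will incur additional cost.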
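The "one call per parameter combination" rule can be sketched as a small cache keyed by a hash of the parameters. This is only an illustration under stated assumptions: the script's real cache is local and its format is not documented here, whereas this version lives in memory, and `CallCache`, `cache_key`, and `fake_call` are hypothetical names.

```python
import hashlib
import json
import time

CACHE_TTL_SECONDS = 24 * 60 * 60  # 24 hours, matching the documented cache lifetime


def cache_key(params: dict) -> str:
    """Stable key: the same parameters in any order hash to the same value."""
    canonical = json.dumps(params, sort_keys=True, ensure_ascii=False)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


class CallCache:
    """In-memory stand-in for the script's local cache (assumption)."""

    def __init__(self, now=time.time):
        self._entries = {}
        self._now = now

    def get_or_call(self, params, call):
        key = cache_key(params)
        hit = self._entries.get(key)
        if hit is not None and self._now() - hit[0] < CACHE_TTL_SECONDS:
            return hit[1]  # reuse the stored result, even an empty one
        result = call(params)
        self._entries[key] = (self._now(), result)
        return result


calls = []


def fake_call(params):
    # Stands in for the credit-consuming API call.
    calls.append(params)
    return {"products": []}


cache = CallCache()
params = {
    "productImageAnalysisPrompt": "Extract the dominant color",
    "analyzeAdditionalImages": False,
}
cache.get_or_call(params, fake_call)
# Same parameters in a different key order: served from the cache.
cache.get_or_call(dict(reversed(list(params.items()))), fake_call)
print(len(calls))  # 1
```

Empty results are cached like any other, which mirrors the rule that an empty result must not trigger automatic retries.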

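Output policy (default script behavior):

Always write the full response to /linkfox///data/linkfox-multimodal-extract-attributes-.json (the base directory is the working directory in which the script runs, which in Claude Code is the current project directory; one path component is taken from the SESSION_ID environment variable, so output is grouped automatically by user task; writing to /tmp is forbidden, and the script reports an error if the current directory is not writable)
Response body ≤ 8 KB: after writing to disk, print the full JSON to stdout
Response body > 8 KB: after writing to disk, print only a summary to stdout (top-level fields, common counts such as total/costToken, and the length of the largest list field plus its first 3 samples)
Pass --inline to force full output to stdout (the file is still written)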

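Reading the data: check the summary first to decide whether it is sufficient. When specific fields are needed, prefer jq or ConvertFrom-Json to extract them on demand from the saved JSON file, so that the whole JSON does not enter the context.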
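The size-based stdout rule can be approximated as follows. This is a sketch and not the script's code: it omits the write to disk, takes 8 KB to mean 8 × 1024 bytes of UTF-8 encoded JSON, and uses hypothetical names (`summarize`, `render_stdout`, and the summary keys).

```python
import json

INLINE_LIMIT_BYTES = 8 * 1024  # assumption: "8 KB" means 8192 bytes


def summarize(response: dict) -> dict:
    """Top-level fields, common counts, and the largest list with 3 samples."""
    summary = {"topLevelFields": sorted(response)}
    for key in ("total", "costToken"):
        if key in response:
            summary[key] = response[key]
    lists = {k: v for k, v in response.items() if isinstance(v, list)}
    if lists:
        largest = max(lists, key=lambda k: len(lists[k]))
        summary["largestList"] = {
            "field": largest,
            "length": len(lists[largest]),
            "samples": lists[largest][:3],
        }
    return summary


def render_stdout(response: dict, inline: bool = False) -> str:
    """Return what would be printed: full JSON when small or forced, else a summary."""
    body = json.dumps(response, ensure_ascii=False)
    if inline or len(body.encode("utf-8")) <= INLINE_LIMIT_BYTES:
        return body
    return json.dumps(summarize(response), ensure_ascii=False)


small = {"products": [{"asin": "A1"}], "costToken": 12}
big = {
    "products": [{"asin": f"A{i}", "title": "x" * 100} for i in range(200)],
    "costToken": 999,
}
print(render_stdout(small))                                  # full JSON
print(json.loads(render_stdout(big))["largestList"]["length"])  # 200
```

Once a response falls into the summary branch, individual fields can still be pulled from the saved file on demand, which is the pattern the reading advice above recommends.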

Response Structure

The response enriches the original product list with extracted attributes:

products: An array of product records, each augmented with attributeName (the dimension extracted, e.g., "color") and attributeValue (the extracted value, e.g., "red"). One record per product per attribute dimension.
attributeGroups: Products grouped by attribute name and value for easy comparison. Each group includes the attribute value, the count of products, and the list of ASINs.
columns: Column definitions for rendering the result table.
costToken: Total tokens consumed by the multimodal AI model.
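To make the relationship between products and attributeGroups concrete, here is a rough sketch that rebuilds groups from expanded product rows. The key names inside each group (`count`, `asins`) and the `asin` field on a row are assumptions; the authoritative schema is in references/api.md.

```python
from collections import defaultdict


def group_by_attribute(rows: list[dict]) -> list[dict]:
    """Group expanded product rows by (attributeName, attributeValue)."""
    buckets = defaultdict(list)
    for row in rows:
        buckets[(row["attributeName"], row["attributeValue"])].append(row["asin"])
    return [
        # Group field names below are illustrative, not the documented schema.
        {"attributeName": name, "attributeValue": value, "count": len(asins), "asins": asins}
        for (name, value), asins in buckets.items()
    ]


rows = [
    {"asin": "A1", "attributeName": "color", "attributeValue": "red"},
    {"asin": "A2", "attributeName": "color", "attributeValue": "red"},
    {"asin": "A3", "attributeName": "color", "attributeValue": "blue"},
]
groups = group_by_attribute(rows)
print(groups[0]["count"], groups[0]["asins"])  # 2 ['A1', 'A2']
```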

Display Rules

Present data in tables: Show extracted attributes in clear, well-formatted tables with product identifiers (ASIN, title) alongside the extracted attribute values.
Highlight distribution: When attribute groups are returned, summarize the distribution (e.g., "60% of products are red, 25% blue, 15% green") to give the user a quick overview.
Row expansion notice: If multiple dimensions were extracted, inform the user that each product appears once per dimension in the results.
Error handling: When analysis fails, explain the reason based on the response message and suggest adjustments (e.g., ensuring the product list contains valid image URLs).
Data dependency reminder: If the user tries to reference products from a previous conversation round without explicit data flow, remind them that the product data must come from an upstream step in the current pipeline.
No subjective advice: Present the extracted visual features factually. Let the user draw their own business conclusions.
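The distribution summary mentioned in these rules could be produced along the following lines. This is a minimal sketch that assumes each group carries `attributeName`, `attributeValue`, and `count` keys; the real group field names may differ.

```python
def distribution(groups: list[dict], attribute_name: str) -> str:
    """Render a one-line percentage breakdown for one attribute dimension."""
    selected = [g for g in groups if g["attributeName"] == attribute_name]
    total = sum(g["count"] for g in selected)
    if total == 0:
        return "No values extracted for " + attribute_name
    ranked = sorted(selected, key=lambda g: g["count"], reverse=True)
    # Percentages are rounded independently, so they may not sum to exactly 100.
    return ", ".join(
        f"{round(100 * g['count'] / total)}% {g['attributeValue']}" for g in ranked
    )


sample_groups = [
    {"attributeName": "color", "attributeValue": "blue", "count": 5},
    {"attributeName": "color", "attributeValue": "red", "count": 12},
    {"attributeName": "color", "attributeValue": "green", "count": 3},
]
print(distribution(sample_groups, "color"))  # 60% red, 25% blue, 15% green
```

The output states only counts turned into shares, which keeps the summary factual and leaves interpretation to the user.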

Important Limitations

Requires product data input: The tool cannot operate without a products array containing image URLs. It depends on upstream data from a prior step.
No fuzzy references: Cannot analyze "products from the last conversation" -- data must be explicitly piped in via refResultData or resource references.
Row multiplication: Extracting N dimensions from M products produces up to M x N rows in the output.
Image accessibility: Product image URLs must be publicly accessible for the analysis to succeed.
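Because a failed call still costs a round trip, it can help to screen the product list before sending it. The sketch below is an optional pre-flight check, not part of the tool: it only verifies that each record has a well-formed http(s) URL and cannot confirm that the image is publicly reachable. The `imageUrl` field name is an assumption.

```python
from urllib.parse import urlparse


def split_by_image_url(products: list[dict], url_field: str = "imageUrl"):
    """Separate records with a plausible http(s) image URL from the rest."""
    usable, rejected = [], []
    for product in products:
        parsed = urlparse(str(product.get(url_field) or ""))
        if parsed.scheme in ("http", "https") and parsed.netloc:
            usable.append(product)
        else:
            rejected.append(product)
    return usable, rejected


candidates = [
    {"asin": "A1", "imageUrl": "https://example.com/a.jpg"},
    {"asin": "A2", "imageUrl": ""},
    {"asin": "A3", "imageUrl": "file:///tmp/a.jpg"},
    {"asin": "A4"},
]
usable, rejected = split_by_image_url(candidates)
print([p["asin"] for p in usable])    # ['A1']
print([p["asin"] for p in rejected])  # ['A2', 'A3', 'A4']
```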

User Expression & Scenario Quick Reference

Applicable -- Visual feature extraction and image analysis for product listings:

| User Says | Scenario |
|-----------|----------|
| "What colors are these products" | Dominant color extraction |
| "Analyze the product images", "Look at the main photos" | General visual feature extraction |
| "What material does it look like" | Material identification |
| "What shapes/styles are popular" | Shape or style classification |
| "Reverse the image prompt", "What prompt made this image" | Image prompt reverse-engineering |
| "Group products by visual appearance" | Visual attribute grouping & statistics |
| "Extract features from the product photos" | Structured attribute extraction |

Not applicable -- Needs beyond image-based visual analysis:

Text-based product data queries (use appropriate data query tools)
Listing copywriting or review analysis
Price or sales data analysis
Tasks that do not involve product images

Feedback:

Auto-detect and report feedback via the Feedback API when any of the following apply:

The functionality or purpose described in this skill does not match actual behavior
The skill's results do not match the user's intent
The user expresses dissatisfaction or praise about this skill
Anything you believe could be improved

Call the feedback API as specified in references/api.md. Do not interrupt the user's flow.

For more high-quality, professional cross-border e-commerce skills, see LinkFox Skills.

Version History

2 versions in total

v1.0.2 (current): 2026-07-03 21:12
v1.0.0: 2026-05-07 18:56
