Generate TikTok / Instagram Reels style activewear UGC product videos from a single outfit image.
Requires: imastudio-cli npm package (ima command) and IMA_API_KEY.
Get your API key at: https://imastudio.com
When the user provides an outfit image (local file or URL), execute these steps in order:
Use your vision capability to extract from the image:
Every generated video MUST maintain across all shots:
Tell the model explicitly in every prompt: "same model, same outfit, same styling throughout."
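One way to make sure the consistency sentence is never forgotten is to append it in code rather than by hand. The sketch below is a hypothetical helper (the name `with_consistency` is not part of the skill or the CLI) that adds the sentence to any prompt that does not already contain it.

```python
# Hypothetical helper: guarantees every prompt ends with the consistency sentence.
CONSISTENCY_CLAUSE = "Same model, same outfit, same styling throughout."


def with_consistency(prompt: str) -> str:
    """Return the prompt with the consistency clause appended exactly once."""
    text = prompt.strip()
    # Skip the append if the clause is already present (case-insensitive).
    if CONSISTENCY_CLAUSE.lower() in text.lower():
        return text
    # Close the last sentence so the clause reads as its own sentence.
    if text and not text.endswith((".", "!", "?")):
        text += "."
    return f"{text} {CONSISTENCY_CLAUSE}".strip()
```

Running every assembled prompt through this function keeps the talking head and voiceover prompts aligned on the same closing instruction.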
Build two 15-second video prompts:
A) Talking Head (influencer speaks to camera):
B) Voiceover (aesthetic b-roll + narration):
Visual rules for both: handheld but polished, punch-in zooms, natural daylight, clean transitions, strong silhouette emphasis, premium social-native look.
ima upload <outfit-image> --json
Use the returned url as input for video generation.
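The upload step can be wrapped as in the sketch below. It assumes the `--json` output is a JSON object with a top-level `url` key. The text above only says a url is returned, so the exact shape is an assumption and should be checked against real CLI output. The function names are illustrative.

```python
import json


def upload_command(image_path: str) -> list[str]:
    """Argument list for the upload step, mirroring the documented command."""
    return ["ima", "upload", image_path, "--json"]


def extract_url(stdout: str) -> str:
    """Pull the hosted image URL out of the CLI's JSON output.

    Assumption: the output is a JSON object with a top-level "url" key.
    """
    payload = json.loads(stdout)
    url = payload.get("url") if isinstance(payload, dict) else None
    if not isinstance(url, str) or not url:
        raise ValueError("upload output did not contain a 'url' field")
    return url
```

In practice the argument list would be passed to `subprocess.run(..., capture_output=True, text=True, check=True)` and the captured `stdout` handed to `extract_url`.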
For each prompt (talking head + voiceover), run:
ima create-task \
--task-type image_to_video \
--model wan2.6-i2v \
--param prompt="<assembled prompt>" \
--param input_images="<image_url>" \
--param duration=10 \
--param aspect_ratio=9:16 \
--wait --json
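Because the same command runs once per prompt, it may help to build the argument list programmatically. The sketch below uses only the flags shown in the command above. The function names and the `duration` default of 10 (taken from that command) are illustrative choices, not part of the CLI.

```python
def video_task_command(prompt: str, image_url: str,
                       model: str = "wan2.6-i2v", duration: int = 10) -> list[str]:
    """Build the create-task argument list for one image-to-video job."""
    return [
        "ima", "create-task",
        "--task-type", "image_to_video",
        "--model", model,
        "--param", f"prompt={prompt}",
        "--param", f"input_images={image_url}",
        "--param", f"duration={duration}",
        "--param", "aspect_ratio=9:16",
        "--wait", "--json",
    ]


def commands_for(prompts: dict[str, str], image_url: str) -> dict[str, list[str]]:
    """Map each prompt name (e.g. talking_head, voiceover) to its command."""
    return {name: video_task_command(text, image_url) for name, text in prompts.items()}
```

Passing a list to `subprocess.run` instead of a shell string avoids quoting problems when the prompt itself contains quotes or punctuation.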
Model selection:
| Priority | Model | model_id | Best for |
|---|---|---|---|
| Default | Wan 2.6 | wan2.6-i2v | Balanced quality + speed |
| Premium | Kling O1 | kling-video-o1 | Best consistency |
| Fast | Seedance 2.0 Fast | ima-pro-fast | Quick iteration |
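The table can be expressed as a small lookup so the chosen priority always resolves to a valid `model_id`. This is a minimal sketch: the model ids come from the table, while the dictionary keys and the `pick_model` name are hypothetical.

```python
# Model ids copied from the table; the keys are illustrative labels.
MODEL_IDS = {
    "default": "wan2.6-i2v",      # balanced quality + speed
    "premium": "kling-video-o1",  # best consistency
    "fast": "ima-pro-fast",       # quick iteration
}


def pick_model(priority: str = "default") -> str:
    """Return the model_id for a priority label, rejecting unknown labels."""
    key = priority.strip().lower()
    if key not in MODEL_IDS:
        raise ValueError(f"unknown priority {priority!r}; expected one of {sorted(MODEL_IDS)}")
    return MODEL_IDS[key]
```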
Use 9:16 aspect ratio (vertical/portrait) for TikTok and Reels.
For the voiceover video, generate a spoken script:
ima create-task \
--task-type text_to_speech \
--model seed-tts-1.1 \
--param prompt="<voiceover script>" \
--wait --json
Script tone: confident, aspirational, social-native. Not salesy — like a friend recommending a find.
Send each video to the user with:
When building the video generation prompt, include ALL of these elements:
Example prompt:
> A confident young woman in a modern minimalist apartment, natural daylight. She wears a matching sage-green ribbed sports bra and high-waisted leggings set with subtle logo on waistband. She looks at camera with a warm smile, then the camera punches in on the fabric texture and waistband detail. She turns showing the silhouette from the side, walks toward the window. Final full-body hero shot, hands on hips. Handheld camera, premium social-media look. Same model, same outfit, same styling throughout.
Talking head hook lines:
Voiceover narration example:
> "When I say this set hits different — I mean it. The ribbing, the compression, the way it moves with you. From the gym to coffee runs, this is the one."
| Parameter | Required | Default | Description |
|---|---|---|---|
| image | Yes | — | Outfit photo (local path or URL) |
| mode | No | both | talking_head, voiceover, or both |
| scene_type | No | auto-detected | gym, street, studio, café, rooftop |
| brand | No | — | Brand name for script mentions |
| outfit_description | No | auto-analyzed | Override auto-analysis |
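The defaults in the table could be applied with a small resolver like the one below. It is a sketch under stated assumptions: the function names are hypothetical, and `None` is used here to stand for "auto-detected" and "auto-analyzed", which is one possible encoding rather than anything the skill prescribes.

```python
VALID_MODES = ("talking_head", "voiceover", "both")


def resolve_params(params: dict) -> dict:
    """Validate user input and fill in the defaults from the parameter table."""
    if not params.get("image"):
        raise ValueError("'image' is required (local path or URL)")
    mode = params.get("mode", "both")
    if mode not in VALID_MODES:
        raise ValueError(f"mode must be one of {VALID_MODES}, got {mode!r}")
    return {
        "image": params["image"],
        "mode": mode,
        "scene_type": params.get("scene_type"),                  # None: auto-detect
        "brand": params.get("brand"),                            # None: no brand mention
        "outfit_description": params.get("outfit_description"),  # None: auto-analyze
    }


def modes_to_generate(mode: str) -> list[str]:
    """Expand 'both' into the two concrete video modes."""
    return ["talking_head", "voiceover"] if mode == "both" else [mode]
```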
Use image_to_video (not text_to_video) to maintain outfit consistency from the source image.
The kling-video-o1 model has stronger reference adherence.