← 返回
未分类 Key 中文

OpenClaw VLN Planner

Plan the next high-level navigation step for a robot from a user navigation instruction, one current image, and a sequence of historical images. Use when the...
根据用户导航指令、当前图像及历史图像序列,为机器人规划下一步高层导航动作。适用于...
tiktokdad tiktokdad 来源
未分类 clawhub v1.0.0 1 版本 100000 Key: 需要
★ 0
Stars
📥 345
下载
💾 1
安装
1
版本
#latest

概述

OpenClaw VLN Planner

Use this skill when the user wants a robot to follow a natural-language navigation instruction from visual observations.

This skill is a high-level navigation planner. It does not produce motor, joint, torque, or trajectory control. It only produces one structured mid-level navigation action at a time.

When this skill triggers

Trigger this skill when the task includes one or more of the following:

  • Vision-language navigation (VLN)
  • Robot next-step planning from camera images
  • Closed-loop navigation with replanning after each observation
  • Converting a current frame plus historical frames into a single next navigation action
  • Sending current + history images to an OpenAI-compatible multimodal gateway for action prediction

Required inputs

The planner expects:

  • user_instruction: natural-language navigation instruction
  • current_frame: exactly one current image
  • history_frames: zero or more previous images in temporal order

Optional inputs:

  • robot_state: heading, speed, pose estimate, execution feedback, etc.
  • safety_flags: blocked, collision_risk, lost, target_reached, low_visibility, etc.
  • config_path: path to the runtime config file

Output contract

Output must be pure JSON only. Do not prepend or append prose.

Allowed action types only:

  • MOVE_FORWARD
  • TURN_LEFT
  • TURN_RIGHT
  • STOP

Expected JSON shape:

{
  "next_action": {
    "type": "MOVE_FORWARD",
    "value": 75,
    "unit": "cm"
  },
  "task_status": "in_progress",
  "confidence": 0.87,
  "notes": "continue along the hallway"
}

Completion example:

{
  "next_action": {
    "type": "STOP"
  },
  "task_status": "completed",
  "confidence": 0.93,
  "notes": "goal reached"
}

Core rules

  1. Plan only the next action.
  2. Never output a full route.
  3. Replan after each execution step.
  4. If uncertain, unsafe, blocked, unable to parse, or visually ambiguous, output STOP.
  5. Enforce action bounds:
    • MOVE_FORWARD: 10-150 cm
    • TURN_LEFT: 5-90 deg
    • TURN_RIGHT: 5-90 deg
    • STOP: no value/unit required
  6. If safety_flags.target_reached == true, output STOP with task_status = completed.
  7. If blocked, collision_risk, lost, or severe uncertainty is present, prefer STOP.

Runtime configuration

Before running, load a YAML config file such as config/vln-config.yaml.

The config should define:

  • subscribed or logical input topics / channels for current frame and history frame collection
  • optional robot state and safety flag sources
  • OpenAI-compatible multimodal gateway settings: base_url, api_key, model_id
  • planner behavior such as confidence threshold and safety fallback
  • executor bridge mode (default: Python function bridge)

Read references/navigation-schema.md for the expected config structure.

Internal module design

1) context builder

Build a model input payload from:

  • user instruction
  • historical observations
  • current observation
  • optional robot state
  • optional safety flags

The prompt must explicitly separate:

  • historical observations
  • current observation
  • user instruction

2) action planner

Call an OpenAI-compatible multimodal gateway with:

  • one current image
  • historical images
  • planner prompt
  • optional structured context

The model should be asked to return pure JSON for exactly one next action.

3) action parser

Parse the model result as JSON.

If parsing fails:

  • try safe extraction of the first JSON object
  • if still invalid, fall back to STOP

4) action validator

Validate:

  • action type is one of the four allowed values
  • distance and angle ranges are legal
  • unit matches action type
  • confidence is numeric if present
  • task_status is one of in_progress, completed, failed

Any invalid output falls back to STOP.

5) executor bridge

Forward the validated mid-level action to a separate execution layer.

Reserved Python bridge interface:

  • execute_move_forward(distance_cm)
  • execute_turn_left(angle_deg)
  • execute_turn_right(angle_deg)
  • execute_stop()
  • get_robot_state()
  • get_safety_flags()

Do not hardcode a robot SDK into the planner logic.

6) replanning loop

Use the planner in a closed loop:

  1. gather current frame + history frames
  2. gather optional robot state / safety flags
  3. call multimodal planner
  4. parse and validate JSON action
  5. execute through bridge
  6. observe again
  7. repeat until task_status = completed or forced stop

7) safety fallback

Always stop on:

  • parse failure
  • invalid action
  • confidence below threshold
  • blocked / collision risk / lost / target reached
  • missing visual evidence for safe motion

Prompt template

Use this prompt pattern:

You are a robot navigation planner.
You will receive:
1. historical observations
2. current observation
3. a user instruction
4. optional robot state and safety flags

Your job is to decide the robot's next single mid-level navigation action.
You may output only one of these actions:
- MOVE_FORWARD with distance in cm
- TURN_LEFT with angle in deg
- TURN_RIGHT with angle in deg
- STOP

Rules:
- Plan only the next step, not the whole route.
- If the goal has been reached, output STOP.
- If you are uncertain, the scene is unclear, or there is any safety risk, output STOP.
- MOVE_FORWARD must be 10-150 cm.
- TURN_LEFT and TURN_RIGHT must be 5-90 deg.
- Output pure JSON only, with no extra explanation.

Example user requests

  • "Go down the hallway and stop at the blue door."
  • "Move to the kitchen entrance."
  • "Find the end of the corridor and stop."
  • "Turn right at the next intersection and continue."

Failure handling

If anything is wrong with the output, return:

{
  "next_action": {
    "type": "STOP"
  },
  "task_status": "failed",
  "confidence": 0.0,
  "notes": "fallback_stop"
}

Bundled resources

  • references/navigation-schema.md: schema, bounds, safety fallback, examples, config contract
  • scripts/vln_bridge.py: example OpenAI-compatible multimodal planner + Python executor bridge
  • scripts/requirements.txt: Python dependencies
  • config/vln-config.yaml: runtime config template

版本历史

共 1 个版本

  • v1.0.0 当前
    2026-03-31 17:19 安全 安全

安全检测

腾讯云安全 (Keen)

安全,无风险
查看报告

腾讯云安全 (Sanbu)

安全,无风险
查看报告

🔗 相关推荐

ai-agent

Find Skills

root
帮助用户发现和安装智能体技能,当用户询问如「如何做X」、「找X的技能」、「有能做...的吗」等问题时
★ 1,517 📥 572,702
ai-agent

Self-Improving + Proactive Agent

ivangdavila
自我反思+自我批评+自我学习+自组织记忆。智能体评估自身工作、发现错误并持续改进。
★ 1,441 📥 328,199
ai-agent

Agent Browser

rez0
用于 AI 代理的浏览器自动化 CLI。当用户需要与网站交互(包括浏览页面、填写表单、点击按钮、截图等)时使用。
★ 865 📥 343,497