
Lance Store

Persist and retrieve structured data using the Lance columnar format. Use when you need to store, query, or analyze data across sessions — such as saving ski...
Source: vitorhugoze
Uncategorized clawhub v1.0.12 1 version 100000 Key: not required

Overview

Installation

python3 -m pip install -r requirements.txt

A persistent data store using the Lance columnar format for fast ML data access.

Quick Start

# List all datasets and their metadata
python3 scripts/command.py list-datasets-info

# Create a dataset
python3 scripts/command.py create-dataset <name> <field1> <field2> ...

# Append data
python3 scripts/command.py append-to-dataset <name> <value1> <value2> ...

# Read all records from a dataset
python3 scripts/command.py read-dataset <name>

Note: list-datasets-info shows dataset metadata (schema, field types, record count) — it does not return the actual data rows. Use read-dataset to retrieve records.
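The Quick Start commands can also be driven from Python. The following is a minimal sketch, not part of the skill itself: build_command and run_command are hypothetical helper names, and the sketch assumes the script prints exactly one JSON document to stdout, in line with the statement that all commands return JSON.

```python
import json
import subprocess
import sys

SCRIPT = "scripts/command.py"  # path taken from the Quick Start commands


def build_command(operation, *args):
    """Return the argv list for one command.py invocation."""
    # Every argument is converted to str because argv entries are always strings.
    return [sys.executable, SCRIPT, operation, *[str(a) for a in args]]


def run_command(operation, *args):
    """Run one command and parse its stdout as JSON.

    Assumption: stdout holds a single JSON document and nothing else.
    """
    completed = subprocess.run(
        build_command(operation, *args),
        capture_output=True,
        text=True,
        check=False,
    )
    return json.loads(completed.stdout)
```

Calling run_command("list-datasets-info") from the skill's root directory would then return the parsed response as a dict, provided the assumption about stdout holds.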

Storage Location

Datasets are created and stored in the current working directory ('.').

Critical Behavior: Data Type Strictness

⚠️ Lance is strict about data types — they CANNOT change after the first record

When you append the first record to a dataset, Lance infers the data type for each field. All subsequent records MUST use the same types.

Example — this FAILS:

# First record: age as STRING
append-to-dataset users "John" "25" "john@test.com"

# Second record: age as INTEGER (will FAIL!)
append-to-dataset users "Jane" 30 "jane@test.com"
# Error: `age` should have type large_string but type was int64

Correct approach — maintain consistent types:

# First record: age as STRING
append-to-dataset users "John" "25" "john@test.com"

# Second record: age as STRING
append-to-dataset users "Jane" "30" "jane@test.com"

Why This Matters

Unlike traditional databases that may coerce types, Lance rejects type mismatches. If you store numbers as strings initially, you must always pass strings. Plan your schema carefully.

Initialization Workflow

When starting a session, always initialize by listing existing datasets first:

# This command returns ALL datasets with their structure
python3 scripts/command.py list-datasets-info

Example output:

{
    "skill": "lance",
    "operation": "list_datasets_info",
    "status": "success",
    "data": [
        {
            "dataset_name": "users",
            "path": "/data/users",
            "fields": ["name", "age", "email"],
            "field_types": {
                "_id": "large_string",
                "_updated_at": "timestamp[us]",
                "name": "large_string",
                "age": "large_string",
                "email": "large_string"
            },
            "record_count": 2,
            "columns": ["_id", "_updated_at", "name", "age", "email"],
            "last_updated": "2026-03-21T17:57:44.595628"
        }
    ],
    "error": null
}

Understanding field_types

State        Meaning
-------------------------------------------------------------------------
{} (empty)   Dataset exists but has no records yet; types are not yet defined
populated    Types are locked; appends must match

Important: If field_types is empty, the first append will define types. Be deliberate about the first record's types.
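One way to act on the two field_types states at session start is to split the list-datasets-info entries into datasets whose types are still open and datasets whose types are locked. This is a sketch only: split_by_type_state is a hypothetical helper, and treating a missing field_types key the same as an empty one is an assumption.

```python
def split_by_type_state(datasets):
    """Split list-datasets-info entries into (untyped, locked) dataset names."""
    untyped, locked = [], []
    for entry in datasets:
        # An empty mapping means no record has been appended yet.
        # Assumption: a missing key is treated like an empty mapping.
        if entry.get("field_types"):
            locked.append(entry["dataset_name"])
        else:
            untyped.append(entry["dataset_name"])
    return untyped, locked
```

Datasets in the untyped list are the ones where the next append decides the types, so they deserve extra care.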

Commands Reference

Create Dataset

python3 scripts/command.py create-dataset <name> <field1> <field2> ...

Creates a metadata entry only. Fields have no types until the first append.

Append Record

python3 scripts/command.py append-to-dataset <name> <value1> <value2> ...

Appends one record. If it is the dataset's first record, its values determine the field types.

Batch Append

python3 scripts/command.py batch-append-to-dataset <name> '<json-array>'

Example: batch-append-to-dataset users '[["Alice", "22", "alice@test.com"], ["Bob", "35", "bob@test.com"]]'
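The JSON array argument can be generated rather than typed by hand. The sketch below serializes rows with json.dumps after two basic checks: every row has the expected number of values, and no column mixes Python types. batch_payload is a hypothetical helper, and the checks are a precaution added here, not something the skill is known to perform.

```python
import json


def batch_payload(rows, field_count):
    """Serialize rows for batch-append-to-dataset after basic shape checks."""
    for index, row in enumerate(rows):
        if len(row) != field_count:
            raise ValueError(
                f"row {index} has {len(row)} values, expected {field_count}"
            )
    # Compare Python types per column so mixed columns are caught before the append.
    for column in range(field_count):
        kinds = {type(row[column]) for row in rows}
        if len(kinds) > 1:
            names = sorted(kind.__name__ for kind in kinds)
            raise ValueError(f"column {column} mixes types: {names}")
    return json.dumps(rows)
```

The returned string is what would be passed as the '<json-array>' argument.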

Update Record

python3 scripts/command.py update-dataset-record <name> <record_id> <value1> <value2> ...

Updates fields for a specific record by ID.

Delete Record

python3 scripts/command.py delete-dataset-record <name> <record_id>

List All Datasets

python3 scripts/command.py list-datasets

Get Dataset Info

python3 scripts/command.py get-dataset-info <name>

Returns schema, field types (if data exists), and record count.

List All Datasets with Full Info

python3 scripts/command.py list-datasets-info

Recommended for initialization. Returns all datasets with complete metadata.

Get Dataset Path

python3 scripts/command.py get-dataset-path-info <name>

Backup Dataset

python3 scripts/command.py backup-dataset <name> <backup_path>

Count Records

python3 scripts/command.py count-records <name>

Read All Records

Returns all records from the dataset as a list of objects.

python3 scripts/command.py read-dataset <name>

Drop Dataset

Deletes the entire dataset and its metadata. Requires confirmation if you have not created a backup beforehand.

python3 scripts/command.py drop-dataset <name>
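Because dropping is destructive, a cautious workflow backs the dataset up first and drops it only if the backup reported success. The sketch below shows that ordering. backup_then_drop and drop_with_backup are hypothetical helpers, and execute is a caller-supplied callable that runs an argv list and returns the parsed JSON response; how the skill's own confirmation step behaves is not covered here.

```python
def backup_then_drop(name, backup_path, script="scripts/command.py"):
    """Return the two argv lists in order: backup first, then drop."""
    base = ["python3", script]
    return [
        base + ["backup-dataset", name, backup_path],
        base + ["drop-dataset", name],
    ]


def drop_with_backup(name, backup_path, execute):
    """Back up the dataset, then drop it only if the backup succeeded.

    `execute` is supplied by the caller: argv list -> parsed JSON response.
    """
    backup_cmd, drop_cmd = backup_then_drop(name, backup_path)
    backup = execute(backup_cmd)
    if backup.get("status") != "success":
        # Stop before the destructive step when the backup did not succeed.
        raise RuntimeError(f"backup failed: {backup.get('error')}")
    return execute(drop_cmd)
```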

Internal fields available in every dataset:

Field         Type        Description
------------------------------------------------------------------
_id           string      UUID, unique record identifier
_updated_at   timestamp   When the record was last inserted or updated

List Records (Paginated)

python3 scripts/command.py list-records <name> --limit 10 --offset 0

Returns records with optional pagination.
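Paging through a large dataset with --limit and --offset follows a standard loop: request a page, advance the offset, and stop at the first short page. The sketch below keeps the transport out of the picture. iter_records is a hypothetical helper, and fetch_page is a caller-supplied callable that is assumed to return the list of records for one list-records call.

```python
def iter_records(fetch_page, page_size=10):
    """Yield records page by page until a short or empty page is returned."""
    offset = 0
    while True:
        # fetch_page(limit=..., offset=...) is assumed to return a list of records.
        page = fetch_page(limit=page_size, offset=offset)
        yield from page
        if len(page) < page_size:
            break
        offset += page_size
```

When the record count is an exact multiple of the page size, the loop makes one extra request that returns an empty page and then stops.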

Get Single Record

python3 scripts/command.py get-record <name> <record_id>

Retrieves a specific record by its UUID.

Response Format

All commands return JSON:

{
  "skill": "lance",
  "operation": "<operation_name>",
  "status": "success|error",
  "data": <result_data_or_null>,
  "error": <error_message_or_null>
}
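Since every response shares this envelope, callers can check status in one place. A minimal sketch, where LanceStoreError and unwrap are hypothetical names introduced for illustration:

```python
import json


class LanceStoreError(Exception):
    """Raised when a response reports a status other than "success"."""


def unwrap(response_text):
    """Parse a response envelope and return its data, or raise on error."""
    response = json.loads(response_text)
    if response.get("status") != "success":
        raise LanceStoreError(response.get("error") or "unknown error")
    return response.get("data")
```

This keeps error handling out of the calling code, which only ever sees the data payload or an exception.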

Internal Fields

Every dataset automatically includes:

  • _id — UUID for each record
  • _updated_at — timestamp of last insert/update

These are managed automatically — when appending, only provide your defined fields.
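When a record that was read back is reused for an append or update, the internal fields have to be left out. The sketch below assumes that read records are dicts keyed by field name, which matches the description of read-dataset returning a list of objects. user_values is a hypothetical helper.

```python
INTERNAL_FIELDS = ("_id", "_updated_at")


def user_values(record, fields):
    """Return the record's values for user-defined fields, in schema order."""
    # Internal fields are skipped even if they appear in the field list.
    return [record[name] for name in fields if name not in INTERNAL_FIELDS]
```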

Data Type Inference

Lance infers types from the first record:

Python Type     Lance Type
----------------------------
"string"        large_string
25 (int)        int64
25.5 (float)    float64
True/False      bool

CLI caveat: Values passed on the command line arrive as strings. To get integer types, create the first record with actual integers from a script rather than from the CLI.
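The mapping in the table can be mirrored in plain Python to check a record against a dataset's locked field_types before appending. This sketch reproduces only the four rows of the table and does not call Lance, so it is an approximation of the inference rather than the library's own logic. lance_type_name and mismatches are hypothetical helpers.

```python
def lance_type_name(value):
    """Map a Python value to the Lance type name listed in the table."""
    # bool is tested before int because bool is a subclass of int in Python.
    if isinstance(value, bool):
        return "bool"
    if isinstance(value, int):
        return "int64"
    if isinstance(value, float):
        return "float64"
    if isinstance(value, str):
        return "large_string"
    raise TypeError(f"no mapping listed for {type(value).__name__}")


def mismatches(field_types, fields, values):
    """Return (field, expected, actual) tuples that would break locked types."""
    problems = []
    for name, value in zip(fields, values):
        expected = field_types.get(name)
        actual = lance_type_name(value)
        # Fields without a recorded type are not locked yet, so they are skipped.
        if expected is not None and expected != actual:
            problems.append((name, expected, actual))
    return problems
```

An empty result means the record is consistent with the recorded types. With an empty field_types mapping nothing is flagged, because the first append is the one that defines the types.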

Tips

  1. Initialize at session start: Run list-datasets-info to understand what data already exists
  2. Plan your schema: First record determines types for the entire dataset
  3. Use batch append when adding multiple records: More efficient than individual appends

Requirements

Dependencies are declared in the frontmatter (metadata.openclaw.install) and handled by the OpenClaw install system via uv. The required Python packages are:

  • pylance: the Lance library, imported as lance
  • pandas: data manipulation

⚠️ Naming note: Although the PyPI package is named pylance, the library is imported with import lance in Python code. This is the official Lance project naming convention; it is NOT the VS Code "pylance" language server. See lance.org for details.

Version History

1 version in total

  • v1.0.12 (current)
    2026-05-07 05:52, security status: safe

Security Scan

Tencent Cloud Security (Keen)

Safe, no risk

Tencent Cloud Security (Sanbu)

Safe, no risk
