uv pip install pylance pandas
A persistent data store using the Lance columnar format for fast ML data access.
# List all datasets and their metadata
python3 scripts/command.py list-datasets-info
# Create a dataset
python3 scripts/command.py create-dataset <name> <field1> <field2> ...
# Append data
python3 scripts/command.py append-to-dataset <name> <value1> <value2> ...
# Read all records from a dataset
python3 scripts/command.py read-dataset <name>
Note: list-datasets-info shows dataset metadata (schema, field types, record count) — it does not return the actual data rows. Use read-dataset to retrieve records.
DataSets are created and stored on the current path '.'
⚠️ Lance is strict about data types — they CANNOT change after the first record
When you append the first record to a dataset, Lance infers the data type for each field. All subsequent records MUST use the same types.
Example — this FAILS:
# First record: age as STRING
append-to-dataset users "John" "25" "john@test.com"
# Second record: age as INTEGER (will FAIL!)
append-to-dataset users "Jane" 30 "jane@test.com"
# Error: `age` should have type large_string but type was int64
Correct approach — maintain consistent types:
# First record: age as STRING
append-to-dataset users "John" "25" "john@test.com"
# Second record: age as STRING
append-to-dataset users "Jane" "30" "jane@test.com"
Unlike traditional databases that may coerce types, Lance rejects type mismatches. If you store numbers as strings initially, you must always pass strings. Plan your schema carefully.
When starting a session, always initialize by listing existing datasets first:
# This command returns ALL datasets with their structure
python3 scripts/command.py list-datasets-info
Example output:
{
"skill": "data-vault",
"operation": "list_datasets_info",
"status": "success",
"data": [
{
"dataset_name": "users",
"path": "/data/users",
"fields": ["name", "age", "email"],
"field_types": {
"_id": "large_string",
"_updated_at": "timestamp[us]",
"name": "large_string",
"age": "large_string",
"email": "large_string"
},
"record_count": 2,
"columns": ["id", "_updated_at", "name", "age", "email"],
"last_updated": "2026-03-21T17:57:44.595628"
}
],
"error": null
}
field_types| State | Meaning |
|---|---|
| ------------ | ------------------------------------------------------------- |
{} (empty) | Dataset exists but no records yet — types not yet defined |
| populated | Types are locked — appends must match |
Important: If field_types is empty, the first append will define types. Be deliberate about the first record's types.
python3 scripts/command.py create-dataset <name> <field1> <field2> ...
Creates a metadata entry. Fields have no types until first append.
python3 scripts/command.py append-to-dataset <name> <value1> <value2> ...
Appends one record. Types are inferred from first record.
python3 scripts/command.py batch-append-to-dataset <name> '<json-array>'
Example: batch-append-to-dataset users '[["Alice", "22", "alice@test.com"], ["Bob", "35", "bob@test.com"]]'
python3 scripts/command.py update-dataset-record <name> <record_id> <value1> <value2> ...
Updates fields for a specific record by ID.
python3 scripts/command.py delete-dataset-record <name> <record_id>
python3 scripts/command.py list-datasets
python3 scripts/command.py get-dataset-info <name>
Returns schema, field types (if data exists), and record count.
python3 scripts/command.py list-datasets-info
Recommended for initialization. Returns all datasets with complete metadata.
python3 scripts/command.py get-dataset-path-info <name>
python3 scripts/command.py backup-dataset <name> <backup_path>
python3 scripts/command.py count-records <name>
Returns all records from the dataset as a list of objects.
python3 scripts/command.py read-dataset <name>
Delete the entire dataset and its metadata.
python3 scripts/command.py drop-dataset <name>
Internal fields available in every dataset:
| Field | Type | Description |
|---|---|---|
| ------------- | --------- | -------------------------------------------- |
_id | string | UUID — unique record identifier |
_updated_at | timestamp | When the record was last inserted or updated |
python3 scripts/command.py list-records <name> --limit 10 --offset 0
Returns records with optional pagination.
python3 scripts/command.py get-record <name> <record_id>
Retrieves a specific record by its UUID.
python3 scripts/command.py get-dataset-info <name>
Returns schema, field types (if data exists), and record count.
All commands return JSON:
{
"skill": "data-vault",
"operation": "<operation_name>",
"status": "success|error",
"data": <result_data_or_null>,
"error": <error_message_or_null>
}
Every dataset automatically includes:
_id — UUID for each record_updated_at — timestamp of last insert/updateThese are managed automatically — when appending, only provide your defined fields.
Lance infers types from the first record:
| Python Type | Lance Type |
|---|---|
| -------------- | -------------- |
"string" | large_string |
25 (int) | int64 |
25.5 (float) | float64 |
True/False | bool |
CLI caveat: When passing via command line, all values are strings. To ensure integer types, initialize with actual integers in a script rather than CLI.
list-datasets-info to understand what data already existsDependencies are declared in frontmatter (metadata.openclaw.install) and handled by the OpenClaw install system via uv. The Python packages required are:
pylance — The Lance columnar format library. ⚠️ Naming note: Despite the PyPI package being named pylance, the library is imported as import lance in Python code. This is the official Lance project naming convention — it is NOT the VS Code "pylance" language server. See lance.org for details.
pandas — Data manipulation共 1 个版本