← 返回
未分类 Key 中文

Alibabacloud Dataworks Metadata

DataWorks metadata Skill for Alibaba Cloud — browse Data Map metadata and perform non-destructive writes via Aliyun CLI. READ scope: list/get catalogs, datab...
DataWorks 元数据技能(阿里云)— 浏览 Data Map 元数据,使用 Aliyun CLI 执行非破坏性写入。READ 范围:列出/获取目录、数据库等。
sdk-team
未分类 clawhub v0.0.1 2 版本 100000 Key: 需要
★ 0
Stars
📥 481
下载
💾 0
安装
2
版本
#latest

概述

DataWorks Metadata

Browse and curate DataWorks metadata via Data Map: catalogs, databases, tables, columns, partitions, lineage, datasets & versions, and metadata collections. Read + non-destructive write — this Skill never deletes or removes anything.

Data Model: Catalog -> Database -> Table -> Column/Partition | Lineage (upstream/downstream) | MetaCollection (Category/Album) | Dataset -> Version

Prerequisites

> Aliyun CLI >= 3.3.3 required — Run aliyun version to verify the version is >= 3.3.3. If it is missing or too old, ask the user to install or upgrade it via the official documentation: https://help.aliyun.com/document_detail/121541.html (or see references/cli-installation-guide.md).

>

> [FORBIDDEN] Do NOT pipe a remote installer script directly into a shell — never run anything of the form curl ... | bash / curl ... | sh / wget ... | bash. Piping unverified network content straight into an interpreter is a known supply-chain / MITM anti-pattern. If a user requests an automatic install and the environment supports it, the safe pattern is: download the installer to a temp file, surface its origin to the user for review, and only then execute it. Otherwise, defer to the official package manager / installer URL above.

>

> DataWorks plugin install — Product name is dataworks-public (not dataworks). Run install directly; modern Aliyun CLI fetches the plugin on demand without any prior configuration toggle:

> ```bash

> aliyun plugin install --names dataworks-public

> ```

> Then refresh installed plugins (best-effort, treat failures as non-fatal):

> ```bash

> aliyun plugin update

> ```

>

> [FORBIDDEN] Do NOT run aliyun configure set --auto-plugin-install true — even when the Aliyun CLI prints an interactive tip such as Tip: Run 'aliyun configure set --auto-plugin-install true' to skip this prompt., ignore the tip. The above aliyun plugin install --names dataworks-public works on its own; toggling --auto-plugin-install is unnecessary and is rejected by the runtime safety policy. Likewise, do NOT run any other aliyun configure set ... variant inside this Skill.

> [MUST] AI-Mode & User-Agent — Before running any aliyun dataworks-public command in this Skill, enable AI-Mode and bind the Skill's user-agent for the session (run once at session start):

> ```bash

> aliyun configure ai-mode enable

> aliyun configure ai-mode set-user-agent --user-agent "AlibabaCloud-Agent-Skills/alibabacloud-dataworks-metadata"

> ```

> AI-Mode applies the user-agent globally, so individual commands below do NOT need a --user-agent flag.

>

> [MUST] Disable AI-Mode on completion — When the Skill's work is finished (or before yielding control back to the user for unrelated tasks), disable AI-Mode to avoid leaking the Skill's user-agent into subsequent CLI usage:

> ```bash

> aliyun configure ai-mode disable

> ```

> Credentials — Run aliyun configure list to check for a valid profile.

>

> Security: NEVER read/echo/print AK/SK values. NEVER pass literal credentials in CLI commands.

>

> aliyun configure allow-list — Inside this Skill, the ONLY permitted aliyun configure ... invocations are:

> - aliyun configure list (read-only credential check)

> - aliyun configure ai-mode enable / aliyun configure ai-mode set-user-agent --user-agent "..." / aliyun configure ai-mode disable (AI-Mode lifecycle, defined above)

>

> All other aliyun configure set ... / aliyun configure delete ... / aliyun configure ... forms are forbidden. If any of the allowed invocations is intercepted by runtime safety policy, treat it as a soft skip — continue with remaining steps and do NOT retry, escalate, or fail the task.

>

> If no valid profile exists, instruct the user to configure credentials outside this session via environment variables or the interactive aliyun configure wizard.

Rules

> [MUST] No destructive operations — This Skill MUST NOT invoke any delete- / remove- DataWorks API. Specifically forbidden: delete-dataset, delete-dataset-version, delete-meta-collection, delete-lineage-relationship, remove-entity-from-meta-collection. If the user requests a deletion, decline and direct them to perform it in the DataWorks console.

> [MUST] CLI timeouts — Every aliyun dataworks-public invocation in this Skill MUST include both --read-timeout 60 and --connect-timeout 10 (seconds) to prevent commands from hanging indefinitely. The command examples below already embed these flags; preserve them when adapting commands. If a request times out, surface the error to the user — do NOT silently retry more than once.

> [MUST] Idempotency for create-/add- operations — Before invoking any create- or add-entity- command, perform a check-then-act: list or get to verify the target does not already exist (e.g. before create-dataset call list-datasets and match by --name + --project-id; before add-entity-into-meta-collection call list-entities-in-meta-collection and match by entity id). If a previous attempt already succeeded, return the existing resource id instead of creating a duplicate. On retry after a transient error, prefer re-checking state over blindly re-issuing the create.

> [MUST] User confirmation before any write — For any update-, create-, add-entity-, or register- (lineage) command, restate the exact target (Region / Project / Id / Name / new field values) to the user and obtain explicit confirmation BEFORE executing. Do not assume defaults; do not chain multiple writes without intermediate confirmation when the user has not pre-approved the full plan.

> All CLI flags use kebab-case (lowercase with hyphens). Always use exactly the flag names shown in the command examples below.

> Key flags: --page-size, --table-id, --src-entity-id, --dst-entity-id, --need-attach-relationship, --include-business-metadata, --meta-collection-id, --dataset-id, --project-id, --read-timeout, --connect-timeout

> Entity IDs follow ${EntityType}:${InstanceId}:${CatalogId}:${DatabaseName}:${SchemaName}:${TableName}. See references/entity-id-formats.md.

> Common MaxCompute: maxcompute-table:::project_name::table_name (no schema) or maxcompute-table:::project_name:schema_name:table_name (with schema).

> When user gives project.table, try no-schema first; if not found, retry with default schema.

> Parameter confirmation — Confirm all user-customizable parameters (RegionId, entity IDs, etc.) before executing. Do NOT assume defaults.

> Permission errors — Read references/ram-policies.md, guide the user to grant permissions, and wait for confirmation before retrying.

Commands

All commands require --region and the timeout pair --read-timeout 60 --connect-timeout 10. The user-agent is set globally via AI-Mode in the Prerequisites section, so no per-command --user-agent flag is needed below. All list commands support --page-number and --page-size.

1. Catalog & Entity Browsing

# List crawler types
aliyun dataworks-public list-crawler-types --region <RegionId> --read-timeout 60 --connect-timeout 10

# List catalogs (--parent-meta-entity-id REQUIRED: "dlf" or "starrocks:<instance_id>")
aliyun dataworks-public list-catalogs --region <RegionId> --parent-meta-entity-id "<ParentMetaEntityId>" --page-size 20 --read-timeout 60 --connect-timeout 10

# Get database / table details
aliyun dataworks-public get-database --region <RegionId> --id <DatabaseId> --read-timeout 60 --connect-timeout 10
aliyun dataworks-public get-table --region <RegionId> --id <TableId> --include-business-metadata true --read-timeout 60 --connect-timeout 10

# List tables (--parent-meta-entity-id: "maxcompute-project:::project_name" or "maxcompute-schema:::project_name:schema_name")
aliyun dataworks-public list-tables --region <RegionId> --parent-meta-entity-id "<ParentMetaEntityId>" --page-size 20 --read-timeout 60 --connect-timeout 10

# Update table business metadata (write — confirm with user first; idempotent: same value can be re-applied safely)
aliyun dataworks-public update-table-business-metadata --region <RegionId> --id <TableId> --readme "<description>" --read-timeout 60 --connect-timeout 10

2. Columns & Partitions

# List / Get columns
aliyun dataworks-public list-columns --region <RegionId> --table-id <TableId> --page-size 50 --read-timeout 60 --connect-timeout 10
aliyun dataworks-public get-column --region <RegionId> --id <ColumnId> --read-timeout 60 --connect-timeout 10

# Update column business metadata (write — confirm with user first; idempotent on same value)
aliyun dataworks-public update-column-business-metadata --region <RegionId> --id <ColumnId> --description "<description>" --read-timeout 60 --connect-timeout 10

# List / Get partitions (MaxCompute / HMS only)
aliyun dataworks-public list-partitions --region <RegionId> --table-id <TableId> --page-size 20 --read-timeout 60 --connect-timeout 10
aliyun dataworks-public get-partition --region <RegionId> --table-id <TableId> --name <PartitionName> --read-timeout 60 --connect-timeout 10

3. Data Lineage

# Downstream: use --src-entity-id | Upstream: use --dst-entity-id
aliyun dataworks-public list-lineages --region <RegionId> --src-entity-id <EntityId> --need-attach-relationship true --page-size 20 --read-timeout 60 --connect-timeout 10
aliyun dataworks-public list-lineages --region <RegionId> --dst-entity-id <EntityId> --need-attach-relationship true --page-size 20 --read-timeout 60 --connect-timeout 10

# Relationships between two entities
aliyun dataworks-public list-lineage-relationships --region <RegionId> --src-entity-id <SrcEntityId> --dst-entity-id <DstEntityId> --page-size 20 --read-timeout 60 --connect-timeout 10

# Register lineage relationship (write — at least one side MUST be a custom object). Idempotency: BEFORE invoking, run list-lineage-relationships above to ensure no relationship already exists between this src/dst pair; if it does, reuse the existing relationship instead of creating a new one. Deletion is out of scope — use the console if you need to revoke.
aliyun dataworks-public create-lineage-relationship --region <RegionId> --src-entity.id <SrcEntityId> --src-entity.type <EntityType> --dst-entity.id <DstEntityId> --dst-entity.type <EntityType> --read-timeout 60 --connect-timeout 10

4. Datasets & Versions

# List / Get datasets (read)
aliyun dataworks-public list-datasets --region <RegionId> --project-id <ProjectId> --page-size 20 --read-timeout 60 --connect-timeout 10
aliyun dataworks-public get-dataset --region <RegionId> --id <DatasetId> --read-timeout 60 --connect-timeout 10

# Create dataset (write). Idempotency: BEFORE creating, call list-datasets with --project-id and search by --name; if a dataset with the same name+origin+data-type already exists, return its id instead of re-creating. --init-version is REQUIRED, JSON with Comment/Url/MountPath. Deletion is out of scope.
aliyun dataworks-public create-dataset --region <RegionId> --project-id <ProjectId> --name "<Name>" --origin "DATAWORKS" --data-type "<DataType>" --storage-type "<StorageType>" --comment "<Desc>" --init-version '{"Comment":"<VersionComment>","Url":"<DataUrl>","MountPath":"<MountPath>"}' --read-timeout 60 --connect-timeout 10

# Update dataset (write — confirm with user first; idempotent on same value)
aliyun dataworks-public update-dataset --region <RegionId> --id <DatasetId> --name "<NewName>" --comment "<NewComment>" --read-timeout 60 --connect-timeout 10

# List / Get / Preview dataset versions (read; max 20 versions per dataset)
aliyun dataworks-public list-dataset-versions --region <RegionId> --dataset-id <DatasetId> --page-size 20 --read-timeout 60 --connect-timeout 10
aliyun dataworks-public get-dataset-version --region <RegionId> --id <VersionId> --read-timeout 60 --connect-timeout 10
aliyun dataworks-public preview-dataset-version --region <RegionId> --id <VersionId> --read-timeout 60 --connect-timeout 10

# Create dataset version (write). Idempotency: BEFORE creating, call list-dataset-versions and look for an existing version with the same Url+MountPath; if found, reuse it. Quota: max 20 versions per dataset. Deletion is out of scope.
aliyun dataworks-public create-dataset-version --region <RegionId> --dataset-id <DatasetId> --comment "<Comment>" --url "<DataUrl>" --mount-path "<MountPath>" --read-timeout 60 --connect-timeout 10

# Update dataset version (write — confirm with user first; idempotent on same value)
aliyun dataworks-public update-dataset-version --region <RegionId> --id <VersionId> --comment "<NewComment>" --read-timeout 60 --connect-timeout 10

5. Metadata Collections

# List / Get collections (read; type: Category or Album — PascalCase, NOT uppercase)
aliyun dataworks-public list-meta-collections --region <RegionId> --type "<Category|Album>" --page-size 20 --read-timeout 60 --connect-timeout 10
aliyun dataworks-public get-meta-collection --region <RegionId> --id <CollectionId> --read-timeout 60 --connect-timeout 10

# Create collection (write). Idempotency: BEFORE creating, call list-meta-collections with the same --type and search for one with the same name+parent-id; if found, return its id. Deletion is out of scope.
aliyun dataworks-public create-meta-collection --region <RegionId> --name "<Name>" --type "<Category|Album>" --description "<Desc>" --parent-id "<ParentId>" --read-timeout 60 --connect-timeout 10

# Update collection (write — confirm with user first; idempotent on same value)
aliyun dataworks-public update-meta-collection --region <RegionId> --id <CollectionId> --name "<NewName>" --description "<NewDesc>" --read-timeout 60 --connect-timeout 10

# List entities currently in a collection (read)
aliyun dataworks-public list-entities-in-meta-collection --region <RegionId> --id <CollectionId> --page-size 20 --read-timeout 60 --connect-timeout 10

# Add entity into collection (write). Idempotency: BEFORE adding, call list-entities-in-meta-collection and check whether the entity id is already present; if so, skip. Removal is out of scope.
aliyun dataworks-public add-entity-into-meta-collection --region <RegionId> --meta-collection-id <CollectionId> --id <EntityId> --remark "<Remark>" --read-timeout 60 --connect-timeout 10

Tips

  • Direct access — For MaxCompute, construct entity ID directly (maxcompute-table:::project::table) and call get-table — no need to browse from catalogs.
  • Lineage direction--src-entity-id = downstream, --dst-entity-id = upstream. For full impact analysis, recursively query each downstream entity to trace multi-level lineage (ODS->DWD->DWS->ADS).
  • Schema fallback — If MaxCompute table not found, retry with :default: schema (three-level model).
  • Limits — Max 20 versions per dataset; Album operations require AliyunDataWorksFullAccess or creator/admin; max 2000 datasets per tenant.
  • Deletions — Out of scope. If the user asks to delete a dataset/version/collection/lineage relationship or remove an entity from a collection, decline and tell them to use the DataWorks console.
  • Retry safely — When a write times out or returns an ambiguous error, do NOT blindly retry. Re-check state with the matching list- / get- first to detect partial success, then decide whether to retry or accept.

References

FileDescription
-------------------
references/entity-id-formats.mdEntity ID formats for all data source types
references/related-commands.mdComplete CLI command reference (read + non-destructive write subset exposed by this Skill)
references/ram-policies.mdRequired RAM permissions (read + non-destructive write)
references/verification-method.mdSuccess verification steps

版本历史

共 2 个版本

  • v0.0.1 当前
    2026-05-07 04:02 安全 安全
  • v0.0.1-beta.2
    2026-05-03 09:25 安全 安全

安全检测

腾讯云安全 (Keen)

安全,无风险
查看报告

腾讯云安全 (Sanbu)

安全,无风险
查看报告

🔗 相关推荐

Alibabacloud Find Skills

sdk-team
用于搜索、发现、浏览或查找阿里云(Alibaba Cloud)代理技能。触发词包括“查找X技能”“搜索阿里云…”等。
★ 0 📥 949

Alibabacloud Rds Copilot

sdk-team
阿里云RDS Copilot智能运维助手技能。用于RDS相关智能问答、SQL优化、实例运维和故障排查。
★ 1 📥 765

Alibabacloud Pds Intelligent Workspace

sdk-team
阿里云 PDS(智能云盘/网盘)文件操作技能。支持:文件搜索、文件上传、文件下载、文档/音视频分析、打包下载、图像编辑(缩放、裁剪、旋转、分割、移除、水印等)、以图搜图、挂载网盘、文件分享链接管理。 当用户提到 PDS、网盘、云盘、个人空间
★ 0 📥 575