← 返回
未分类 Key 中文

Alibabacloud Dataworks Data Governance

Alibaba Cloud DataWorks Data Governance Tag Management Skill. Use for managing data asset tags: creating, updating, querying tag keys/values, binding/unbindi...
阿里云 DataWorks 数据治理标签管理技能。用于管理数据资产标签:创建、更新、查询标签键/值,绑定/解绑。
sdk-team
未分类 clawhub v0.0.1-beta.1 1 版本 100000 Key: 需要
★ 0
Stars
📥 347
下载
💾 0
安装
1
版本
#latest

概述

DataWorks Data Governance Tag Management

Manage data asset tags via the DataWorks Data Governance API, including creating, updating, and querying tag keys/values, as well as batch binding and unbinding tags on data assets.

> FORBIDDEN API — DeleteDataAssetTag must NOT be used in this skill.

> Calling DeleteDataAssetTag is strictly prohibited. Do not implement, suggest, or invoke this API under any circumstances. If a user requests deletion of a tag key or tag value, inform them that this operation is outside the scope of this skill and must be handled through other authorized channels.

Architecture: DataWorks Data Governance Tag API (ROA style) + Alibaba Cloud Python Common SDK


> Pre-check: Aliyun CLI >= 3.3.1 required

> Run aliyun version to verify >= 3.3.1. If not installed or version too low,

> see references/cli-installation-guide.md for installation instructions.

> Then [MUST] run aliyun configure set --auto-plugin-install true to enable automatic plugin installation.


> Pre-check: Alibaba Cloud Credentials Required

>

> Security Rules:

> - NEVER read, echo, or print AK/SK values (e.g., echo $ALIBABA_CLOUD_ACCESS_KEY_ID is FORBIDDEN)

> - NEVER ask the user to input AK/SK directly in the conversation or command line

> - NEVER use aliyun configure set with literal credential values

> - ONLY use aliyun configure list to check credential status

>

> ```bash

> aliyun configure list

> ```

> Check the output for a valid profile (AK, STS, or OAuth identity).

>

> If no valid profile exists, STOP here.

> 1. Obtain credentials from Alibaba Cloud Console

> 2. Configure credentials outside of this session (via aliyun configure in terminal or environment variables in shell profile)

> 3. Return and re-run after aliyun configure list shows a valid profile


Installation

pip3 install alibabacloud_tea_openapi==0.4.4 alibabacloud_credentials==1.0.8 alibabacloud_tea_util==0.3.14 alibabacloud_openapi_util==0.2.4

RAM Permissions

ProductRAM ActionResource ScopeDescription
-------------------------------------------------
DataWorksdataworks:CreateDataAssetTag*Create a data asset tag key
DataWorksdataworks:UpdateDataAssetTag*Update a data asset tag key
DataWorksdataworks:ListDataAssetTags*Query the data asset tag list
DataWorksdataworks:TagDataAssets*Bind tags to data assets
DataWorksdataworks:UnTagDataAssets*Unbind tags from data assets
DataWorksdataworks:ListDataAssets*Query the data asset list

> [MUST] Permission Failure Handling: When any command or API call fails due to permission errors at any point during execution, follow this process:

> 1. Read references/ram-policies.md to get the full list of permissions required by this SKILL

> 2. Use ram-permission-diagnose skill to guide the user through requesting the necessary permissions

> 3. Pause and wait until the user confirms that the required permissions have been granted


Parameter Confirmation

> IMPORTANT: Parameter Confirmation — Before executing any command or API call,

> ALL user-customizable parameters (e.g., RegionId, instance names, CIDR blocks,

> passwords, domain names, resource specifications, etc.) MUST be confirmed with

> the user. Do NOT assume or use default values without explicit user approval.

ParameterRequired/OptionalDescriptionDefault
---------------------------------------------------
region_idRequiredAlibaba Cloud Region ID, e.g. cn-hangzhouNone — confirm with user
tag_keyRequired (create/update)Tag key nameNone — confirm with user
tag_valuesOptionalList of tag values[]
descriptionOptionalDescription of the tag keyNone
categoryOptionalTag category: Normal or CUSTOMNormal
managersOptionalList of tag manager user IDs[]
data_asset_idsRequired (bind/unbind)List of data asset identifiers, max 100None — confirm with user
data_asset_typeRequired (bind/unbind)Asset type, e.g. ACS::DataWorks::TableNone — confirm with user
project_idConditionally requiredProject space ID (required when DataAssetType is ACS::DataWorks::Task)None
env_typeConditionally requiredProject environment — required when project_id is set: Prod / DevNone
asset_typeRequired (ListDataAssets)Asset object type: Table, Task, Node, WorkFlow, DataServiceApi, DataQualityRuleNone — confirm with user
asset_tagsOptional (ListDataAssets)Filter by tag key-value list, max 20, e.g. [{"Key":"k1","Value":"v1"}]None
asset_nameOptional (ListDataAssets)Asset name keyword for fuzzy searchNone
asset_idsOptional (ListDataAssets)List of asset module IDsNone
asset_ownerOptional (ListDataAssets)Asset owner user IDNone
sort_byOptional (ListDataAssets)Sort field and direction, e.g. CreateTime DescCreateTime Desc

Core Workflow

Initialize SDK Client

import re
from alibabacloud_tea_openapi.client import Client as OpenApiClient
from alibabacloud_credentials.client import Client as CredentialClient
from alibabacloud_tea_openapi import models as open_api_models
from alibabacloud_tea_util import models as util_models
from alibabacloud_openapi_util.client import Client as OpenApiUtilClient
import json

_VALID_DATA_ASSET_TYPES = {
    'ACS::DataWorks::Table', 'ACS::DataWorks::Task',
    'ACS::DataWorks::Node', 'ACS::DataWorks::WorkFlow',
    'ACS::DataWorks::DataServiceApi', 'ACS::DataWorks::DataQualityRule'
}
_VALID_ASSET_TYPES = {'Table', 'Task', 'Node', 'WorkFlow', 'DataServiceApi', 'DataQualityRule'}
_VALID_CATEGORIES  = {'Normal', 'CUSTOM'}
_VALID_ENV_TYPES   = {'Prod', 'Dev'}

def _validate_region_id(region_id):
    if not isinstance(region_id, str) or not region_id:
        raise ValueError('region_id must be a non-empty string')
    if not re.match(r'^[a-z]{2}-[a-z0-9-]+$', region_id):
        raise ValueError(f'Invalid region_id format: {region_id!r}')

def _validate_tags(tags, max_len=20):
    if not isinstance(tags, list) or len(tags) == 0:
        raise ValueError('tags must be a non-empty list')
    if len(tags) > max_len:
        raise ValueError(f'tags exceeds max length {max_len}, got {len(tags)}')
    for t in tags:
        if not isinstance(t, dict) or 'Key' not in t or 'Value' not in t:
            raise ValueError('Each tag must be a dict with "Key" and "Value"')

def _validate_asset_ids(ids, max_len=100):
    if not isinstance(ids, list) or len(ids) == 0:
        raise ValueError('data_asset_ids must be a non-empty list')
    if len(ids) > max_len:
        raise ValueError(f'data_asset_ids exceeds max length {max_len}, got {len(ids)}')

def _validate_page(page_number, page_size):
    if not isinstance(page_number, int) or page_number < 1:
        raise ValueError(f'page_number must be an integer >= 1, got {page_number!r}')
    if not isinstance(page_size, int) or not (1 <= page_size <= 100):
        raise ValueError(f'page_size must be an integer between 1 and 100, got {page_size!r}')

def create_client(region_id: str) -> OpenApiClient:
    _validate_region_id(region_id)
    credential = CredentialClient()
    config = open_api_models.Config(credential=credential)
    config.endpoint = f'dataworks.{region_id}.aliyuncs.com'
    config.user_agent = 'AlibabaCloud-Agent-Skills'
    return OpenApiClient(config)

1. Create Tag Key (CreateDataAssetTag)

def create_data_asset_tag(client, key: str, values=None, description=None,
                           value_policy=None, category='Normal', managers=None):
    if not isinstance(key, str) or not key.strip():
        raise ValueError('key must be a non-empty string')
    if category not in _VALID_CATEGORIES:
        raise ValueError(f'category must be one of {_VALID_CATEGORIES}, got {category!r}')
    if values is not None:
        if not isinstance(values, list):
            raise ValueError('values must be a list')
    if managers is not None:
        if not isinstance(managers, list):
            raise ValueError('managers must be a list')

    params = open_api_models.Params(
        action='CreateDataAssetTag',
        version='2024-05-18',
        protocol='HTTPS',
        method='POST',
        auth_type='AK',
        style='ROA',
        pathname='/api/v1/data-governance/tags',
        req_body_type='json',
        body_type='json'
    )
    body = {'Key': key}
    if values is not None:
        body['Values'] = values
    if description is not None:
        body['Description'] = description
    if value_policy is not None:
        body['ValuePolicy'] = value_policy
    if category:
        body['Category'] = category
    if managers is not None:
        body['Managers'] = managers

    request = open_api_models.OpenApiRequest(body=body)
    runtime = util_models.RuntimeOptions()
    runtime.connect_timeout = 5000   # ms
    runtime.read_timeout = 30000     # ms
    return client.call_api(params, request, runtime)

2. Update Tag Key (UpdateDataAssetTag)

def update_data_asset_tag(client, key: str, values=None, description=None, managers=None):
    if not isinstance(key, str) or not key.strip():
        raise ValueError('key must be a non-empty string')
    if values is not None:
        if not isinstance(values, list):
            raise ValueError('values must be a list')
    if managers is not None:
        if not isinstance(managers, list):
            raise ValueError('managers must be a list')

    params = open_api_models.Params(
        action='UpdateDataAssetTag',
        version='2024-05-18',
        protocol='HTTPS',
        method='PUT',
        auth_type='AK',
        style='ROA',
        pathname='/api/v1/data-governance/tags',
        req_body_type='json',
        body_type='json'
    )
    body = {'Key': key}
    if values is not None:
        body['Values'] = values
    if description is not None:
        body['Description'] = description
    if managers is not None:
        body['Managers'] = managers

    request = open_api_models.OpenApiRequest(body=body)
    runtime = util_models.RuntimeOptions()
    runtime.connect_timeout = 5000   # ms
    runtime.read_timeout = 30000     # ms
    return client.call_api(params, request, runtime)

3. List Tag Keys (ListDataAssetTags)

def list_data_asset_tags(client, key=None, category=None, page_number=1, page_size=10):
    _validate_page(page_number, page_size)
    if category is not None and category not in _VALID_CATEGORIES:
        raise ValueError(f'category must be one of {_VALID_CATEGORIES}, got {category!r}')

    params = open_api_models.Params(
        action='ListDataAssetTags',
        version='2024-05-18',
        protocol='HTTPS',
        method='GET',
        auth_type='AK',
        style='ROA',
        pathname='/api/v1/data-governance/tags',
        req_body_type='json',
        body_type='json'
    )
    queries = {
        'PageNumber': page_number,
        'PageSize': page_size
    }
    if key:
        queries['Key'] = key
    if category:
        queries['Category'] = category

    request = open_api_models.OpenApiRequest(
        query=OpenApiUtilClient.query(queries)
    )
    runtime = util_models.RuntimeOptions()
    runtime.connect_timeout = 5000   # ms
    runtime.read_timeout = 30000     # ms
    return client.call_api(params, request, runtime)

4. Bind Tags to Data Assets (TagDataAssets)

def tag_data_assets(client, tags: list, data_asset_ids: list, data_asset_type: str,
                    project_id=None, env_type=None, auto_trace_enable=False):
    _validate_tags(tags, max_len=20)
    _validate_asset_ids(data_asset_ids, max_len=100)
    if data_asset_type not in _VALID_DATA_ASSET_TYPES:
        raise ValueError(f'data_asset_type must be one of {_VALID_DATA_ASSET_TYPES}, got {data_asset_type!r}')
    if data_asset_type == 'ACS::DataWorks::Task':
        if project_id is None or env_type is None:
            raise ValueError('project_id and env_type are required when data_asset_type is ACS::DataWorks::Task')
    if env_type is not None and env_type not in _VALID_ENV_TYPES:
        raise ValueError(f'env_type must be one of {_VALID_ENV_TYPES}, got {env_type!r}')

    params = open_api_models.Params(
        action='TagDataAssets',
        version='2024-05-18',
        protocol='HTTPS',
        method='POST',
        auth_type='AK',
        style='ROA',
        pathname='/api/v1/data-governance/tags/bind',
        req_body_type='json',
        body_type='json'
    )
    body = {
        'Tags': tags,
        'DataAssetIds': data_asset_ids,
        'DataAssetType': data_asset_type,
        'AutoTraceEnable': auto_trace_enable
    }
    if project_id is not None:
        body['ProjectId'] = project_id
    if env_type is not None:
        body['EnvType'] = env_type

    request = open_api_models.OpenApiRequest(body=body)
    runtime = util_models.RuntimeOptions()
    runtime.connect_timeout = 5000   # ms
    runtime.read_timeout = 30000     # ms
    return client.call_api(params, request, runtime)

5. List Data Assets (ListDataAssets)

def list_data_assets(client, asset_type: str, tags=None, name=None, ids=None,
                     owner=None, project_id=None, sort_by=None,
                     page_number=1, page_size=10):
    if asset_type not in _VALID_ASSET_TYPES:
        raise ValueError(f'asset_type must be one of {_VALID_ASSET_TYPES}, got {asset_type!r}')
    _validate_page(page_number, page_size)
    if tags is not None:
        _validate_tags(tags, max_len=20)

    params = open_api_models.Params(
        action='ListDataAssets',
        version='2024-05-18',
        protocol='HTTPS',
        method='GET',
        auth_type='AK',
        style='ROA',
        pathname='/api/v1/data-governance/assets',
        req_body_type='json',
        body_type='json'
    )
    queries = {
        'Type': asset_type,
        'PageNumber': page_number,
        'PageSize': page_size
    }
    if tags is not None:
        queries['Tags'] = json.dumps(tags)
    if name is not None:
        queries['Name'] = name
    if ids is not None:
        queries['Ids'] = json.dumps(ids)
    if owner is not None:
        queries['Owner'] = owner
    if project_id is not None:
        queries['ProjectId'] = project_id
    if sort_by is not None:
        queries['SortBy'] = sort_by

    request = open_api_models.OpenApiRequest(
        query=OpenApiUtilClient.query(queries)
    )
    runtime = util_models.RuntimeOptions()
    runtime.connect_timeout = 5000   # ms
    runtime.read_timeout = 30000     # ms
    return client.call_api(params, request, runtime)

6. Unbind Tags from Data Assets (UnTagDataAssets)

def untag_data_assets(client, tags: list, data_asset_ids: list, data_asset_type: str,
                      project_id=None, env_type=None):
    _validate_tags(tags, max_len=20)
    _validate_asset_ids(data_asset_ids, max_len=100)
    if data_asset_type not in _VALID_DATA_ASSET_TYPES:
        raise ValueError(f'data_asset_type must be one of {_VALID_DATA_ASSET_TYPES}, got {data_asset_type!r}')
    if data_asset_type == 'ACS::DataWorks::Task':
        if project_id is None or env_type is None:
            raise ValueError('project_id and env_type are required when data_asset_type is ACS::DataWorks::Task')
    if env_type is not None and env_type not in _VALID_ENV_TYPES:
        raise ValueError(f'env_type must be one of {_VALID_ENV_TYPES}, got {env_type!r}')

    params = open_api_models.Params(
        action='UnTagDataAssets',
        version='2024-05-18',
        protocol='HTTPS',
        method='POST',
        auth_type='AK',
        style='ROA',
        pathname='/api/v1/data-governance/tags/unbind',
        req_body_type='json',
        body_type='json'
    )
    body = {
        'Tags': tags,
        'DataAssetIds': data_asset_ids,
        'DataAssetType': data_asset_type
    }
    if project_id is not None:
        body['ProjectId'] = project_id
    if env_type is not None:
        body['EnvType'] = env_type

    request = open_api_models.OpenApiRequest(body=body)
    runtime = util_models.RuntimeOptions()
    runtime.connect_timeout = 5000   # ms
    runtime.read_timeout = 30000     # ms
    return client.call_api(params, request, runtime)

Response Handling

def handle_response(response):
    status_code = response.get('statusCode')
    body = response.get('body', {})
    if status_code == 200 and body.get('Data'):
        print(f"[SUCCESS] RequestId: {body.get('RequestId')}")
        return body.get('Data')
    else:
        code = body.get('Code')
        message = body.get('Message')
        raise Exception(f"[FAIL] Code={code}, Message={message}, RequestId={body.get('RequestId')}")

Success Verification

Operation success criteria:

  • HTTP status code 200
  • Response body Data field is true (create/update/bind/unbind operations)
  • Response body Data is an object (ListDataAssetTags, ListDataAssets)

For detailed step-by-step verification commands, see references/verification-method.md.


Cleanup

client = create_client('cn-hangzhou')  # Replace with actual Region

# Unbind tags from data assets
untag_data_assets(client,
    tags=[{"Key": "k1", "Value": "v1"}],
    data_asset_ids=["maxcompute-table.project.tableName"],
    data_asset_type="ACS::DataWorks::Table"
)

> Note: Deletion of tag keys or tag values via DeleteDataAssetTag is not supported in this skill. Do not attempt to call this API.


Best Practices

  1. Batch limits: Tags max 20, DataAssetIds max 100 per request — split into batches if needed
  2. Task-type assets: When DataAssetType is ACS::DataWorks::Task, both ProjectId and EnvType are required
  3. Value policy: Use ValuePolicy (e.g. regex ^L[1-7]$) when creating a tag key to enforce valid value formats
  4. Least privilege: Create a dedicated RAM role for tag management with only the required permissions
  5. Throttling retry: Use exponential backoff when encountering Throttling errors
  6. Pagination: ListDataAssetTags and ListDataAssets default to 10 per page, max 100 — use TotalCount to calculate total pages
  7. ListDataAssets serialization: Tags and Ids query parameters must be JSON-serialized strings (use json.dumps())
  8. ListDataAssets sort: Default sort is CreateTime Desc; also supports ModifyTime and HealthScore

Reference Documents

DocumentDescription
-----------------------
cli-installation-guide.mdAliyun CLI installation and configuration guide
ram-policies.mdRAM permission policies required by this Skill
related-commands.mdAPI command reference
verification-method.mdStep-by-step success verification
acceptance-criteria.mdAcceptance criteria with correct/incorrect examples

版本历史

共 1 个版本

  • v0.0.1-beta.1 当前
    2026-05-07 12:59 安全 安全

安全检测

腾讯云安全 (Keen)

安全,无风险
查看报告

腾讯云安全 (Sanbu)

安全,无风险
查看报告

🔗 相关推荐

data-analysis

AdMapix

fly0pants
AdMapix 原始数据层,提供广告创意、应用、排名、下载/收入及市场元数据。返回 AdMapix API 的结构化 JSON;调用方...
★ 297 📥 141,578
data-analysis

Data Analysis

ivangdavila
{"answer":"数据分析与可视化。查询数据库、生成报告、自动化电子表格,将原始数据转化为清晰可行的见解。适用于:(1) 您……"}
★ 211 📥 69,565
data-analysis

Tavily 搜索

jacky1n7
通过 Tavily API 进行网页搜索(Brave 替代方案)。当用户要求搜索网页、查找来源或链接,且 Brave 网页搜索不可用时使用。
★ 273 📥 100,661