
dataworks-diagnoser

Fetch and analyze Alibaba Cloud DataWorks task instance logs to diagnose failures and get actionable recommendations using your instance ID and credentials.
Author: ljw-git-dw | Source: clawhub | Version: v1.0.0 | Key: required

DataWorks Task Instance Diagnostician

Fetches task instance logs from the Alibaba Cloud DataWorks API, matches them against known error patterns, and returns diagnostic recommendations.

Quick Start

Diagnose a failed task:

python3 scripts/dataworks_diagnose.py <instance_id>

Example:

python3 scripts/dataworks_diagnose.py 123456789

When to Use

USE this skill when:

  • A DataWorks task instance failed and you need to know why
  • You have an instance ID and need to fetch its error logs
  • You want automated diagnosis and suggested solutions for task failures
  • You are troubleshooting ODPS SQL, Data Integration, Shell, or Python nodes
  • You need to analyze error patterns across multiple failures
  • You are preparing incident reports for failed tasks

When NOT to Use

DON'T use this skill when:

  • You need real-time task monitoring (use DataWorks console)
  • You want to modify task configurations (use console or API directly)
  • You need historical analytics across many tasks (use DataWorks reports)
  • The task is still running (wait for completion first)
  • You don't have Alibaba Cloud credentials (an AccessKey ID and secret are required)

Prerequisites

1. Alibaba Cloud Credentials

One of the following is required:

Option A: Environment Variables (Recommended)

export ALIBABA_CLOUD_ACCESS_KEY_ID=your_access_key
export ALIBABA_CLOUD_ACCESS_KEY_SECRET=your_access_secret

Option B: Config File

Create ~/.alibabacloud/credentials:

{
  "access_key_id": "your_access_key",
  "access_key_secret": "your_access_secret"
}

Option C: Aliyun CLI Config

If you have Aliyun CLI configured, credentials will be loaded automatically.
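
As a minimal sketch of how Options A and B could be resolved in code, the helper below checks the two environment variables first and then falls back to the JSON config file. The function name `load_credentials`, its parameters, and the lookup order are assumptions for illustration; the skill's scripts may resolve credentials differently, and Option C (Aliyun CLI config) is not covered here.

```python
import json
import os
from pathlib import Path
from typing import Mapping, Optional, Tuple


def load_credentials(
    env: Optional[Mapping[str, str]] = None,
    config_path: Optional[Path] = None,
) -> Tuple[str, str]:
    """Return (access_key_id, access_key_secret) or raise RuntimeError.

    Hypothetical helper: the lookup order (environment first, then the
    JSON config file) is an assumption, not the skill's confirmed behavior.
    """
    env = os.environ if env is None else env
    key_id = env.get("ALIBABA_CLOUD_ACCESS_KEY_ID")
    secret = env.get("ALIBABA_CLOUD_ACCESS_KEY_SECRET")
    if key_id and secret:
        return key_id, secret

    # Fall back to the JSON file described in Option B.
    path = config_path or Path.home() / ".alibabacloud" / "credentials"
    if path.is_file():
        data = json.loads(path.read_text(encoding="utf-8"))
        key_id = data.get("access_key_id")
        secret = data.get("access_key_secret")
        if key_id and secret:
            return key_id, secret

    raise RuntimeError("Credentials not found")
```

Passing `env` and `config_path` explicitly keeps the helper testable without touching real credentials.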

2. Required Permissions

The AccessKey needs these permissions:

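  • dataworks:GetInstanceLog - Fetch task instance logs (the API Reference section names the operation GetTaskInstanceLog; confirm the exact action name in the RAM console)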
  • dataworks:QueryTask - Query task information

3. Network Access

  • Access to Alibaba Cloud API endpoints
  • If using VPC, ensure proper network configuration

Core Workflows

1. Quick Diagnosis (Recommended)

Fetch log and get diagnosis in one command:

python3 scripts/dataworks_diagnose.py <instance_id>

Example:

python3 scripts/dataworks_diagnose.py 123456789

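Output (translated here; the script prints these messages in Chinese):

🔍 Starting diagnosis of DataWorks task instance: 123456789
📍 Region: cn-hangzhou
------------------------------------------------------------

📥 Step 1/2: Fetching task log...
✅ Log fetched successfully

🔬 Step 2/2: Analyzing...
✅ Diagnosis complete

============================================================
📋 Diagnostic Report
============================================================
🔍 DataWorks Task Instance Diagnostic Report
============================================================
Instance ID: 123456789
Issues found: 2

----------------------------------------------------------------------
🔴 Issue 1: Insufficient resource quota
   Type: resource_quota
   Severity: HIGH
   
   Related log lines:
     > ERROR: quota exceeded for resource group 'default'
     > No available slots in queue
   
   Suggested solutions:
     1. Check current usage of the resource group and release idle resources
     2. Ask an administrator to raise the resource quota
     3. Tune the task configuration to reduce resource consumption
     4. Consider off-peak scheduling to avoid resource usage peaks
   
   Reference: https://help.aliyun.com/.../resource-group.html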
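
The report above groups each issue into a title, a type, a severity, the related log lines, and a numbered list of solutions. One way to model that structure is sketched below. The `Finding` dataclass and `render` function are hypothetical names, and the English labels are illustrative rather than the script's real output.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Finding:
    """One diagnosed issue (hypothetical structure, not the skill's own)."""
    title: str
    error_type: str
    severity: str
    log_lines: List[str] = field(default_factory=list)
    solutions: List[str] = field(default_factory=list)


def render(instance_id: str, findings: List[Finding]) -> str:
    """Format findings in a layout similar to the sample report."""
    out = [f"Instance ID: {instance_id}", f"Issues found: {len(findings)}"]
    for number, item in enumerate(findings, start=1):
        out.append(f"Issue {number}: {item.title}")
        out.append(f"   Type: {item.error_type}")
        out.append(f"   Severity: {item.severity}")
        # Quote matched log lines, then number the suggested solutions.
        out.extend(f"     > {line}" for line in item.log_lines)
        out.extend(f"     {i}. {s}" for i, s in enumerate(item.solutions, start=1))
    return "\n".join(out)
```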

2. Fetch Log Only

python3 scripts/fetch_instance_log.py <instance_id> [options]

Options:

# Specify region
python3 scripts/fetch_instance_log.py 123456789 --region cn-shanghai

# Output as JSON
python3 scripts/fetch_instance_log.py 123456789 --json

# Show full log (default: last 50 lines)
python3 scripts/fetch_instance_log.py 123456789 --verbose

# Save to file
python3 scripts/fetch_instance_log.py 123456789 > log.txt
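
The `--verbose` comment above mentions that only the last 50 lines are shown by default. A minimal sketch of that kind of truncation follows. `tail_lines` is a hypothetical helper, and treating a non-positive limit as "show everything" is an assumption.

```python
def tail_lines(log_text: str, limit: int = 50) -> str:
    """Return the last `limit` lines of log_text; limit <= 0 keeps everything."""
    lines = log_text.splitlines()
    if limit <= 0 or len(lines) <= limit:
        return "\n".join(lines)
    return "\n".join(lines[-limit:])
```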

3. Diagnose Existing Log

python3 scripts/diagnose_log.py <log_file>

Examples:

# From file
python3 scripts/diagnose_log.py error.log

# From stdin
cat log.txt | python3 scripts/diagnose_log.py

# With instance ID
python3 scripts/diagnose_log.py error.log --instance-id 123456789

# JSON output
python3 scripts/diagnose_log.py error.log --json

# Summary only
python3 scripts/diagnose_log.py error.log --summary

Scripts

This skill includes three scripts:

dataworks_diagnose.py - All-in-One Tool

Fetches log and provides diagnosis automatically.

Usage:

python3 scripts/dataworks_diagnose.py <instance_id> [options]

Options:

  • --region, -r - Alibaba Cloud region (default: cn-hangzhou)
  • --json, -j - Output as JSON
  • --verbose, -v - Show full log
  • --save-log FILE - Save raw log to file
  • --save-report FILE - Save diagnostic report to file
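
For readers wrapping or reimplementing the tool, the options listed above map naturally onto `argparse`. This is a sketch of how such a command line could be declared, using only the flags and default documented here. It is not the script's actual source, and the help strings are paraphrased.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Declare the documented dataworks_diagnose.py options (illustrative)."""
    parser = argparse.ArgumentParser(
        prog="dataworks_diagnose.py",
        description="Fetch a DataWorks task instance log and diagnose it.",
    )
    parser.add_argument("instance_id", help="Task instance ID")
    parser.add_argument("--region", "-r", default="cn-hangzhou",
                        help="Alibaba Cloud region")
    parser.add_argument("--json", "-j", action="store_true",
                        help="Output as JSON")
    parser.add_argument("--verbose", "-v", action="store_true",
                        help="Show full log")
    parser.add_argument("--save-log", metavar="FILE",
                        help="Save raw log to file")
    parser.add_argument("--save-report", metavar="FILE",
                        help="Save diagnostic report to file")
    return parser
```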

fetch_instance_log.py - Log Fetcher

Fetches task instance log from DataWorks API.

Usage:

python3 scripts/fetch_instance_log.py <instance_id> [options]

Options:

  • --region, -r - Region (default: cn-hangzhou)
  • --access-key - Access Key ID
  • --access-secret - Access Key Secret
  • --json, -j - JSON output
  • --verbose, -v - Full log

diagnose_log.py - Log Analyzer

Analyzes log content and provides diagnostic recommendations.

Usage:

python3 scripts/diagnose_log.py <log_file_or_stdin> [options]

Options:

  • --instance-id - Task instance ID
  • --json, -j - JSON output
  • --summary, -s - Summary only

Detected Error Patterns

The diagnostician recognizes these error types:

| Error Type | Severity | Examples |
|------------|----------|----------|
| 🔴 resource_quota | High | "quota exceeded", "资源不足" (insufficient resources) |
| 🔴 resource_expired | High | "expired", "独享资源组已过期" (exclusive resource group expired), "bill exception" |
| 🔴 connection_timeout | High | "connection timeout", "network unreachable" |
| 🔴 permission_denied | High | "permission denied", "access denied" |
| 🟡 syntax_error | Medium | "syntax error", "parse error" |
| 🟡 table_not_found | Medium | "table not found", "doesn't exist" |
| 🟡 data_quality | Medium | "quality check failed" |
| 🔴 memory_overflow | High | "out of memory", "heap space" |
| 🔴 disk_full | High | "disk full", "no space left" |
| 🟡 dependency_failed | Medium | "dependency failed", "upstream failed" |
| 🟡 api_rate_limit | Medium | "rate limit exceeded" |

See references/error_codes.md for detailed error patterns and solutions.
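
To make the pattern table concrete, here is a minimal sketch of substring-based classification using the example phrases listed above. The names `PATTERNS` and `classify`, the case-insensitive matching, and the HIGH-before-MEDIUM ordering are assumptions; the real `diagnose_log.py` may use different patterns or regular expressions.

```python
from typing import Dict, List, Tuple

# error type -> (severity, example substrings from the table above)
PATTERNS: Dict[str, Tuple[str, Tuple[str, ...]]] = {
    "resource_quota": ("HIGH", ("quota exceeded", "资源不足")),
    "resource_expired": ("HIGH", ("expired", "独享资源组已过期", "bill exception")),
    "connection_timeout": ("HIGH", ("connection timeout", "network unreachable")),
    "permission_denied": ("HIGH", ("permission denied", "access denied")),
    "syntax_error": ("MEDIUM", ("syntax error", "parse error")),
    "table_not_found": ("MEDIUM", ("table not found", "doesn't exist")),
    "data_quality": ("MEDIUM", ("quality check failed",)),
    "memory_overflow": ("HIGH", ("out of memory", "heap space")),
    "disk_full": ("HIGH", ("disk full", "no space left")),
    "dependency_failed": ("MEDIUM", ("dependency failed", "upstream failed")),
    "api_rate_limit": ("MEDIUM", ("rate limit exceeded",)),
}


def classify(log_text: str) -> List[Tuple[str, str, List[str]]]:
    """Return (error_type, severity, matching_lines) for each type found."""
    findings = []
    lines = log_text.splitlines()
    for error_type, (severity, needles) in PATTERNS.items():
        hits = [ln.strip() for ln in lines
                if any(n.lower() in ln.lower() for n in needles)]
        if hits:
            findings.append((error_type, severity, hits))
    # Stable sort: HIGH findings first, table order preserved within a level.
    findings.sort(key=lambda f: 0 if f[1] == "HIGH" else 1)
    return findings
```

Plain substring matching is easy to audit but can over-match (for example, "expired" inside an unrelated message), so treat the results as hints rather than verdicts.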

Common Regions

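| Region | Code |
|--------|------|
| China East 1 (Hangzhou) | cn-hangzhou |
| China East 2 (Shanghai) | cn-shanghai |
| China North 1 (Qingdao) | cn-qingdao |
| China North 2 (Beijing) | cn-beijing |
| China South 1 (Shenzhen) | cn-shenzhen |
| Hong Kong | cn-hongkong |
| Singapore | ap-southeast-1 |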

API Reference

API: GetTaskInstanceLog

Version: 2024-05-18

Endpoint: https://dataworks-public.{region}.aliyuncs.com/
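
The endpoint is a template with the region code substituted in. A small sketch of that substitution follows. `endpoint_for` and its sanity-check regex are hypothetical, and the regex only checks the general shape of a region code, not that the region exists.

```python
import re

ENDPOINT_TEMPLATE = "https://dataworks-public.{region}.aliyuncs.com/"


def endpoint_for(region: str = "cn-hangzhou") -> str:
    """Fill a region code into the endpoint template quoted above."""
    # Assumed shape check: codes look like "cn-hangzhou" or "ap-southeast-1".
    if not re.fullmatch(r"[a-z]{2}(-[a-z0-9]+)+", region):
        raise ValueError(f"unexpected region code: {region!r}")
    return ENDPOINT_TEMPLATE.format(region=region)
```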

Request Parameters:

  • InstanceId (required) - Task instance ID
  • RegionId (required) - Region ID

Response:

{
  "Data": {
    "LogContent": "...",
    "InstanceStatus": "FAILED",
    "CycleTime": "2024-01-15 10:30:00"
  },
  "Code": "200"
}
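
Assuming the response has the shape shown above, the fields of interest can be pulled out as sketched below. `parse_log_response` is a hypothetical helper, and the field names are taken from this sample only, so verify them against the live API before relying on them.

```python
import json
from typing import Dict


def parse_log_response(body: str) -> Dict[str, str]:
    """Extract log text, status, and cycle time from a response body.

    Field names follow the sample response above; treat them as an
    assumption and check them against the actual API output.
    """
    payload = json.loads(body)
    if str(payload.get("Code")) != "200":
        raise RuntimeError(f"API returned code {payload.get('Code')!r}")
    data = payload.get("Data") or {}
    return {
        "log": data.get("LogContent", ""),
        "status": data.get("InstanceStatus", "UNKNOWN"),
        "cycle_time": data.get("CycleTime", ""),
    }
```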

Documentation:

https://api.aliyun.com/api/dataworks-public/2024-05-18/GetTaskInstanceLog

Examples

Example 1: Quick Diagnosis

python3 scripts/dataworks_diagnose.py 123456789

Example 2: Save Report

python3 scripts/dataworks_diagnose.py 123456789 --save-report diagnosis.txt

Example 3: Different Region

python3 scripts/dataworks_diagnose.py 123456789 --region cn-shanghai

Example 4: Analyze Saved Log

python3 scripts/diagnose_log.py saved_log.txt --instance-id 123456789

Example 5: Batch Analysis

for id in 123 456 789; do
  python3 scripts/diagnose_log.py --instance-id "$id" < "log_${id}.txt"
done
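
The same batch loop can be driven from Python when the results need further processing. The sketch below only plans the commands, so it can be inspected without the scripts present. `plan_batch` is a hypothetical helper, and the `log_<id>.txt` naming simply mirrors the shell example.

```python
import sys
from pathlib import Path
from typing import Iterable, List, Tuple


def plan_batch(instance_ids: Iterable[str],
               log_dir: Path = Path(".")) -> List[Tuple[List[str], Path]]:
    """Pair each diagnose_log.py command with the log file to feed on stdin."""
    jobs = []
    for instance_id in instance_ids:
        log_file = log_dir / f"log_{instance_id}.txt"
        cmd = [sys.executable, "scripts/diagnose_log.py",
               "--instance-id", str(instance_id)]
        jobs.append((cmd, log_file))
    return jobs


# Usage sketch (needs the skill's scripts and the log files on disk):
#   import subprocess
#   for cmd, log_file in plan_batch(["123", "456", "789"]):
#       with log_file.open("rb") as fh:
#           subprocess.run(cmd, stdin=fh, check=False)
```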

Troubleshooting

"Credentials not found"

# Set environment variables
export ALIBABA_CLOUD_ACCESS_KEY_ID=your_key
export ALIBABA_CLOUD_ACCESS_KEY_SECRET=your_secret

"Instance not found"

  • Verify the instance ID is correct
  • Check if the instance exists in DataWorks console
  • Ensure you're using the correct region

"Permission denied"

  • Verify AccessKey has required permissions
  • Check RAM role configuration
  • Contact administrator for access

"Request timeout"

  • Check network connectivity
  • Try increasing timeout in script
  • Verify API endpoint is accessible
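
Besides raising the timeout, transient timeouts can be handled by retrying with exponential backoff. This is an addition for illustration, not something the scripts are documented to do. `retry_with_backoff` and its parameters are hypothetical, and `call` stands in for whatever function performs the API request.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    call: Callable[[], T],
    attempts: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run call(); on TimeoutError wait base_delay * 2**n, then retry."""
    for attempt in range(attempts):
        try:
            return call()
        except TimeoutError:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last timeout
            sleep(base_delay * (2 ** attempt))
    raise ValueError("attempts must be at least 1")
```

The `sleep` parameter is injectable so the delay schedule can be checked without actually waiting.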

Tips

💡 Pro tips:

  1. Save logs for failed tasks - Use --save-log to keep records
  2. Generate reports - Use --save-report for documentation
  3. Batch processing - Loop over instance IDs in the shell to diagnose several instances (see Example 5)
  4. JSON output - Use --json for programmatic processing
  5. Region matters - Always use the correct region for your workspace

Security

⚠️ Important:

  • Never commit AccessKeys to version control
  • Use RAM user or role credentials instead of root account AccessKeys
  • Rotate keys regularly
  • Use environment variables or secure config files
  • Restrict key permissions to minimum required
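
When saved logs or reports might echo credentials, masking them before writing is one extra precaution in the same spirit as the list above. The helper below is a hypothetical sketch; the skill is not documented to mask keys itself.

```python
def mask_secret(value: str, visible: int = 4) -> str:
    """Keep the first `visible` characters and replace the rest with '*'."""
    if len(value) <= visible:
        return "*" * len(value)
    return value[:visible] + "*" * (len(value) - visible)
```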

Version History

1 version in total

  • v1.0.0 (current)
    2026-05-07 23:01
