
AWS EMR Skill

AWS EMR interaction skill for managing EMR Serverless, EMR on EC2, and EMR on EKS. Submit and manage Spark, Hive, and PySpark jobs across all three deployment modes.
Author: yhyyz · Source: clawhub · Category: Uncategorized · Version: v2.0.0 (1 version) · API key: not required

Overview

AWS EMR Skills

A Python skill for interacting with AWS EMR across three deployment modes: EMR Serverless, EMR on EC2, and EMR on EKS. Submit Spark and Hive jobs, manage clusters and applications, monitor job status, and retrieve logs.

When to Use (Trigger Phrases)

Invoke this skill when the user mentions:

"Submit a Spark job on EMR"
"List EMR Serverless applications"
"Add a step to my EMR cluster"
"Get EMR job logs"
"Check EMR job status"
"Cancel running EMR job"
"List EMR clusters"
"Create an EMR on EKS virtual cluster"
"Submit PySpark to EMR Serverless"
"Get step logs from EMR cluster"

Any request involving EMR Serverless applications/jobs, EMR on EC2 clusters/steps, or EMR on EKS virtual clusters/job runs.

Feature List

EMR Serverless

  • Applications: List, describe, start, stop EMR Serverless applications
  • Job Submission: Submit Spark SQL, Spark JAR, PySpark, and Hive jobs (sync/async)
  • Job Lifecycle: Get status, cancel, list job runs
  • Results: Retrieve SQL query results from S3
  • Logs: Get driver stdout/stderr logs with secret masking
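
The sync/async distinction above usually comes down to whether the caller polls until the job run reaches a terminal state. The following is a minimal sketch of such a poller, not the skill's actual implementation: `wait_for_job` and `serverless_state_getter` are hypothetical names, and only the boto3 `get_job_run` call and the terminal state names come from the EMR Serverless API.

```python
import time
from typing import Callable

# Terminal states reported by EMR Serverless job runs.
TERMINAL_STATES = {"SUCCESS", "FAILED", "CANCELLED"}


def wait_for_job(
    get_state: Callable[[], str],
    poll_seconds: float = 10.0,
    timeout_seconds: float = 3600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Call get_state() repeatedly until it returns a terminal state."""
    deadline = time.monotonic() + timeout_seconds
    while True:
        state = get_state()
        if state in TERMINAL_STATES:
            return state
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job still {state} after {timeout_seconds}s")
        sleep(poll_seconds)


def serverless_state_getter(application_id: str, job_run_id: str, region=None):
    """Hypothetical helper: wrap boto3 get_job_run as a zero-argument callable."""
    import boto3  # imported lazily so the poller itself has no AWS dependency

    client = boto3.client("emr-serverless", region_name=region)

    def get_state() -> str:
        response = client.get_job_run(
            applicationId=application_id, jobRunId=job_run_id
        )
        return response["jobRun"]["state"]

    return get_state
```

An asynchronous submission would simply return the job run ID and skip the call to `wait_for_job`.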

EMR on EC2

  • Clusters: List, describe, terminate EMR clusters
  • Step Submission: Add Spark, PySpark, and Hive steps via command-runner.jar
  • Step Lifecycle: List, describe, cancel steps
  • Logs: Get step logs (stderr, stdout, controller, syslog) from S3
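
As an illustration of step log retrieval, the sketch below assumes the default EMR on EC2 layout, where step logs are written as gzip files under `<log-uri>/<cluster-id>/steps/<step-id>/`. The function names are hypothetical and the skill's own lookup logic may differ.

```python
import gzip
from urllib.parse import urlparse


def step_log_location(log_uri: str, cluster_id: str, step_id: str,
                      log_name: str = "stderr"):
    """Return (bucket, key) for one step log, assuming the default EMR layout."""
    parsed = urlparse(log_uri)
    prefix = parsed.path.strip("/")
    parts = [p for p in (prefix, cluster_id, "steps", step_id, f"{log_name}.gz") if p]
    return parsed.netloc, "/".join(parts)


def decode_step_log(raw: bytes) -> str:
    """Decompress a gzip-compressed log object into text."""
    return gzip.decompress(raw).decode("utf-8", errors="replace")


def fetch_step_log(log_uri: str, cluster_id: str, step_id: str,
                   log_name: str = "stderr") -> str:
    """Download and decode one step log from S3 (requires AWS credentials)."""
    import boto3  # imported lazily; the helpers above need no AWS access

    bucket, key = step_log_location(log_uri, cluster_id, step_id, log_name)
    body = boto3.client("s3").get_object(Bucket=bucket, Key=key)["Body"].read()
    return decode_step_log(body)
```

Logs are pushed to S3 periodically, so a log object may not exist yet shortly after a step finishes.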

EMR on EKS

  • Virtual Clusters: List, describe, create, delete virtual clusters
  • Job Submission: Submit Spark and Spark SQL jobs to EKS
  • Job Lifecycle: Describe, list, cancel job runs
  • Logs: Get job logs from S3

Initial Setup

  1. Python 3.8+ with boto3>=1.26.0:

```bash
# Quote the requirement so the shell does not treat ">=" as a redirect
pip install "boto3>=1.26.0"
```

  2. AWS credentials via boto3 default chain (env vars, config files, IAM roles).
  3. Environment variables (all optional, validated at point of use):

```bash
export AWS_REGION="us-east-1"

# EMR Serverless
export EMR_SERVERLESS_APP_ID="00abcdef12345678"
export EMR_SERVERLESS_EXEC_ROLE_ARN="arn:aws:iam::123456789012:role/emr-role"
export EMR_SERVERLESS_S3_LOG_URI="s3://my-bucket/emr-logs/"

# EMR on EC2
export EMR_CLUSTER_ID="j-XXXXXXXXXXXXX"

# EMR on EKS
export EMR_EKS_VIRTUAL_CLUSTER_ID="abc123def456"
export EMR_EKS_EXEC_ROLE_ARN="arn:aws:iam::123456789012:role/emr-eks-role"
```
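
"Validated at point of use" could look roughly like the sketch below: nothing is checked at import time, and a variable is only required when an operation actually needs it. `ConfigError`, `require_env`, and `serverless_settings` are hypothetical names for illustration and are not taken from `scripts/config/emr_config.py`.

```python
import os


class ConfigError(ValueError):
    """Raised when a setting is needed but not configured."""


def require_env(name: str, env=None) -> str:
    """Return the value of an environment variable, or fail with a clear message."""
    env = os.environ if env is None else env
    value = env.get(name, "").strip()
    if not value:
        raise ConfigError(f"{name} is not set; export it or pass the value explicitly")
    return value


def serverless_settings(env=None) -> dict:
    """Collect only the settings an EMR Serverless job submission needs."""
    return {
        "application_id": require_env("EMR_SERVERLESS_APP_ID", env),
        "execution_role_arn": require_env("EMR_SERVERLESS_EXEC_ROLE_ARN", env),
    }
```

With this pattern, a user who only works with EMR on EC2 never has to set the Serverless or EKS variables.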

How to Manage EMR

1. EMR Serverless

Fully managed serverless Spark/Hive execution. No infrastructure to manage.

  • Application management: scripts/on_serverless/emr_serverless_cli.py — 14 @tool functions
  • Detailed guide: references/emr_serverless/application_guide.md — Application lifecycle
  • Detailed guide: references/emr_serverless/job_guide.md — Job submission, results, logs
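
For orientation, a PySpark submission to EMR Serverless through boto3 might look like the sketch below. It calls the boto3 `emr-serverless` client directly rather than the skill's @tool functions, whose signatures are not shown here. `build_pyspark_job_request` and `submit_serverless_job` are hypothetical helper names.

```python
def build_pyspark_job_request(application_id: str, execution_role_arn: str,
                              script_s3_uri: str, args=(), log_uri=None,
                              name: str = "pyspark-job") -> dict:
    """Build keyword arguments for the emr-serverless start_job_run call."""
    request = {
        "applicationId": application_id,
        "executionRoleArn": execution_role_arn,
        "name": name,
        "jobDriver": {
            "sparkSubmit": {
                "entryPoint": script_s3_uri,
                "entryPointArguments": list(args),
            }
        },
    }
    if log_uri:
        # Ship driver and executor logs to S3 so they can be retrieved later.
        request["configurationOverrides"] = {
            "monitoringConfiguration": {
                "s3MonitoringConfiguration": {"logUri": log_uri}
            }
        }
    return request


def submit_serverless_job(request: dict, region=None) -> str:
    """Submit the job run and return its ID (requires AWS credentials)."""
    import boto3  # imported lazily so the request builder can be used offline

    client = boto3.client("emr-serverless", region_name=region)
    return client.start_job_run(**request)["jobRunId"]
```

Keeping the request builder separate from the API call makes the payload easy to inspect or log before anything is submitted.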

2. EMR on EC2

Traditional EMR clusters on EC2 instances. Submit work as Steps.

  • Cluster & step management: scripts/on_ec2/emr_on_ec2_cli.py — 10 @tool functions
  • Detailed guide: references/emr_on_ec2/cluster_guide.md — Cluster lifecycle
  • Detailed guide: references/emr_on_ec2/step_guide.md — Step submission, logs
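
A step on an EMR on EC2 cluster is a `HadoopJarStep` definition passed to `add_job_flow_steps`. The sketch below shows one way to wrap `spark-submit` in `command-runner.jar` using boto3 directly. `build_spark_step` and `add_step` are hypothetical names, and the defaults chosen here (cluster deploy mode, `CONTINUE` on failure) are assumptions rather than the skill's settings.

```python
def build_spark_step(name: str, script_s3_uri: str, args=(),
                     deploy_mode: str = "cluster",
                     action_on_failure: str = "CONTINUE") -> dict:
    """Build a step definition that runs spark-submit via command-runner.jar."""
    return {
        "Name": name,
        "ActionOnFailure": action_on_failure,
        "HadoopJarStep": {
            "Jar": "command-runner.jar",
            "Args": ["spark-submit", "--deploy-mode", deploy_mode,
                     script_s3_uri, *args],
        },
    }


def add_step(cluster_id: str, step: dict, region=None) -> str:
    """Add one step to a running cluster and return its step ID."""
    import boto3  # imported lazily so the step builder can be used offline

    client = boto3.client("emr", region_name=region)
    response = client.add_job_flow_steps(JobFlowId=cluster_id, Steps=[step])
    return response["StepIds"][0]
```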

3. EMR on EKS

Spark workloads on Amazon EKS via the emr-containers API.

  • Virtual cluster & job management: scripts/on_eks/emr_on_eks_cli.py — 10 @tool functions
  • Detailed guide: references/emr_on_eks/virtual_cluster_guide.md — Virtual cluster lifecycle
  • Detailed guide: references/emr_on_eks/job_run_guide.md — Job submission, logs
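
On EMR on EKS, job runs go through the `emr-containers` client and additionally require a release label. The following is a sketch using boto3 directly. `build_eks_job_request` and `submit_eks_job` are hypothetical names, and the release label must be supplied by the caller to match a release available in their account and region.

```python
def build_eks_job_request(virtual_cluster_id: str, execution_role_arn: str,
                          release_label: str, script_s3_uri: str, args=(),
                          spark_submit_parameters=None,
                          name: str = "spark-job") -> dict:
    """Build keyword arguments for the emr-containers start_job_run call."""
    driver = {
        "entryPoint": script_s3_uri,
        "entryPointArguments": list(args),
    }
    if spark_submit_parameters:
        # For example "--conf spark.executor.instances=2"
        driver["sparkSubmitParameters"] = spark_submit_parameters
    return {
        "virtualClusterId": virtual_cluster_id,
        "name": name,
        "executionRoleArn": execution_role_arn,
        "releaseLabel": release_label,
        "jobDriver": {"sparkSubmitJobDriver": driver},
    }


def submit_eks_job(request: dict, region=None) -> str:
    """Start the job run and return its ID (requires AWS credentials)."""
    import boto3  # imported lazily so the request builder can be used offline

    client = boto3.client("emr-containers", region_name=region)
    return client.start_job_run(**request)["id"]
```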

Available Scripts

| Script | Description |
| --- | --- |
| scripts/on_serverless/emr_serverless_cli.py | EMR Serverless @tool functions (14 tools) |
| scripts/on_ec2/emr_on_ec2_cli.py | EMR on EC2 @tool functions (10 tools) |
| scripts/on_eks/emr_on_eks_cli.py | EMR on EKS @tool functions (10 tools) |
| scripts/config/emr_config.py | Unified configuration management |
| scripts/client/boto_client.py | boto3 client factory |

References

| Document | Description |
| --- | --- |
| references/emr_serverless/application_guide.md | EMR Serverless application management guide |
| references/emr_serverless/job_guide.md | EMR Serverless job submission and management guide |
| references/emr_on_ec2/cluster_guide.md | EMR on EC2 cluster management guide |
| references/emr_on_ec2/step_guide.md | EMR on EC2 step submission and management guide |
| references/emr_on_eks/virtual_cluster_guide.md | EMR on EKS virtual cluster management guide |
| references/emr_on_eks/job_run_guide.md | EMR on EKS job run management guide |

Requirements

  • When writing temporary files (scripts, notes, etc.), place them in the ./tmp folder.
  • When importing scripts packages, add the skill root to path: sys.path.append(${emr_skill_root})
  • AWS credentials are handled by boto3's default credential chain — never pass access keys directly.
  • All configuration environment variables are optional and validated at the point of use.
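
The first two requirements can be met with a few lines of setup code. This is a sketch only: `add_skill_root` and `tmp_file` are hypothetical helper names, and the skill root path is whatever `${emr_skill_root}` resolves to in your environment.

```python
import sys
from pathlib import Path


def add_skill_root(skill_root: str) -> str:
    """Append the skill root to sys.path once so `scripts.*` can be imported."""
    root = str(Path(skill_root).resolve())
    if root not in sys.path:
        sys.path.append(root)
    return root


def tmp_file(name: str, base: str = ".") -> Path:
    """Return a path inside <base>/tmp, creating the folder if needed."""
    tmp_dir = Path(base) / "tmp"
    tmp_dir.mkdir(parents=True, exist_ok=True)
    return tmp_dir / name
```

Resolving the path before appending avoids duplicate entries when the same root is given in different relative forms.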

Data Privacy & Trust

  • No credential storage: AWS credentials are resolved via boto3 default chain. No keys are stored or logged.
  • Secret masking: Log retrieval functions automatically mask potential AWS credentials in output.
  • Read-only by default: Most operations are read-only queries. Write operations (job submission, cluster termination) require explicit user action.
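
To make the secret-masking point concrete, the sketch below shows one plausible approach based on regular expressions. The patterns are assumptions chosen for illustration (access key IDs starting with `AKIA` or `ASIA`, and values assigned to `aws_secret_access_key` or `aws_session_token`), and the skill's real masking rules may be broader or narrower.

```python
import re

# Access key IDs: a 4-letter prefix followed by 16 uppercase letters or digits.
ACCESS_KEY_ID = re.compile(r"\b(?:AKIA|ASIA)[0-9A-Z]{16}\b")

# Assignments such as "aws_secret_access_key = <value>" or "aws_session_token: <value>".
SECRET_ASSIGNMENT = re.compile(
    r"(?i)(aws_secret_access_key|aws_session_token)(\s*[=:]\s*)(\S+)"
)


def mask_secrets(text: str) -> str:
    """Replace likely AWS credentials in log text with a fixed placeholder."""
    text = ACCESS_KEY_ID.sub("****", text)
    return SECRET_ASSIGNMENT.sub(lambda m: f"{m.group(1)}{m.group(2)}****", text)
```

Pattern-based masking is best-effort: it reduces accidental exposure in retrieved logs but cannot guarantee that every secret format is caught.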

External Endpoints

This skill connects to:

  • AWS EMR Serverless API (emr-serverless.{region}.amazonaws.com)
  • AWS EMR API (elasticmapreduce.{region}.amazonaws.com)
  • AWS EMR Containers API (emr-containers.{region}.amazonaws.com)
  • AWS S3 API (s3.{region}.amazonaws.com) — for log and result retrieval

Version History

1 version in total

  • v2.0.0 (current)
    2026-03-30 19:20, security status: safe

Security Scan

Tencent Cloud Security (Keen): safe, no risk

Tencent Cloud Security (Sanbu): safe, no risk
