How to perform a spark-submit to Amazon EKS cluster with IRSA

In a previous article, I introduced how to submit a Spark job to an EKS cluster. As soon as our pipelines interact with other AWS services such as S3 or DynamoDB, we need to assign IAM policies to the Spark driver and executor pods. In this tutorial, I will show you how to submit a Spark 2.4.4 job with IRSA (IAM Roles for Service Accounts).

What is IRSA?

According to the official AWS documentation and blog:

Our approach, IAM Roles for Service Accounts (IRSA), however, is different: we made pods first class citizens in IAM. Rather than intercepting the requests to the EC2 metadata API to perform a call to the STS API to retrieve temporary credentials, we made changes in the AWS identity APIs to recognize Kubernetes pods. By combining an OpenID Connect (OIDC) identity provider and Kubernetes service account annotations, you can now use IAM roles at the pod level.

With IAM roles for service accounts on Amazon EKS clusters, you can associate an IAM role with a Kubernetes service account. This service account can then provide AWS permissions to the containers in any pod that uses that service account. With this feature, you no longer need to provide extended permissions to the node IAM role so that pods on that node can call AWS APIs.

AWS also lists the following benefits of IRSA over community tools such as kiam or kube2iam:

Least privilege — By using the IAM roles for service accounts feature, you no longer need to provide extended permissions to the node IAM role so that pods on that node can call AWS APIs. You can scope IAM permissions to a service account, and only pods that use that service account have access to those permissions. This feature also eliminates the need for third-party solutions such as kiam or kube2iam.
Credential isolation — A container can only retrieve credentials for the IAM role that is associated with the service account to which it belongs. A container never has access to credentials that are intended for another container that belongs to another pod.
Auditability — Access and event logging is available through CloudTrail to help ensure retrospective auditing.

Why spark-submit with IRSA?

We currently run our Spark jobs with kube2iam annotations. We get a lot of random Spark job failures because API calls to the EC2 metadata service are throttled:

Exception in thread "main" org.apache.hadoop.fs.s3a.AWSClientIOException: doesBucketExist on sample-s3-bucket: com.amazonaws.AmazonClientException: No AWS Credentials provided by BasicAWSCredentialsProvider EnvironmentVariableCredentialsProvider InstanceProfileCredentialsProvider : com.amazonaws.SdkClientException: The requested metadata is not found at http://169.254.169.254/latest/meta-data/iam/security-credentials/: No AWS Credentials provided by BasicAWSCredentialsProvider EnvironmentVariableCredentialsProvider InstanceProfileCredentialsProvider : com.amazonaws.SdkClientException: The requested metadata is not found at http://169.254.169.254/latest/meta-data/iam/security-credentials/
 at org.apache.hadoop.fs.s3a.S3AUtils.translateException(S3AUtils.java:144)
 at org.apache.hadoop.fs.s3a.S3AFileSystem.verifyBucketExists(S3AFileSystem.java:328)
 at org.apache.hadoop.fs.s3a.S3AFileSystem.initialize(S3AFileSystem.java:270)
 at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:3242)
 at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:121)
 ... 35 more
Caused by: com.amazonaws.AmazonClientException: No AWS Credentials provided by BasicAWSCredentialsProvider EnvironmentVariableCredentialsProvider InstanceProfileCredentialsProvider : com.amazonaws.SdkClientException: The requested metadata is not found at http://169.254.169.254/latest/meta-data/iam/security-credentials/
 at org.apache.hadoop.fs.s3a.AWSCredentialProviderList.getCredentials(AWSCredentialProviderList.java:151)
 at com.amazonaws.http.AmazonHttpClient$RequestExecutor.getCredentialsFromContext(AmazonHttpClient.java:1166)
 ... 27 more
Caused by: com.amazonaws.SdkClientException: The requested metadata is not found at http://169.254.169.254/latest/meta-data/iam/security-credentials/
 at com.amazonaws.internal.EC2CredentialsUtils.readResource(EC2CredentialsUtils.java:125)
 at com.amazonaws.internal.EC2CredentialsUtils.readResource(EC2CredentialsUtils.java:87)
 at com.amazonaws.auth.InstanceProfileCredentialsProvider$InstanceMetadataCredentialsEndpointProvider.getCredentialsEndpoint(InstanceProfileCredentialsProvider.java:189)
 at com.amazonaws.auth.EC2CredentialsFetcher.fetchCredentials(EC2CredentialsFetcher.java:122)
 at com.amazonaws.auth.EC2CredentialsFetcher.getCredentials(EC2CredentialsFetcher.java:82)
 at com.amazonaws.auth.InstanceProfileCredentialsProvider.getCredentials(InstanceProfileCredentialsProvider.java:164)
 at org.apache.hadoop.fs.s3a.AWSCredentialProviderList.getCredentials(AWSCredentialProviderList.java:129)
 ... 43 more

Besides that, we want to explore another way to isolate IAM permissions for different Spark jobs at the K8s pod and serviceAccount/namespace level. We also want to remove one component (kube2iam) from our K8s setup.

Requirements:

EKS cluster version 1.14 or above.
Administrative permission on a running AWS EKS cluster.
Administrative permission to create or update IAM roles/policies in AWS account.
awscli version 1.18.110 or later (v1), or 2.0.36 or later (v2).
kubectl to manage the K8s cluster.
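You can check the awscli minimum versions above before you start. The following is a minimal bash sketch. It assumes GNU `sort -V` is available, which is true on most Linux distributions but not on stock macOS. The function names are hypothetical helpers and are not part of any AWS tool.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if version $1 >= version $2 (relies on GNU sort -V).
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Hypothetical helper: extract "2.0.36" from output such as
#   "aws-cli/2.0.36 Python/3.7.4 Linux/4.14.186 botocore/2.0.0dev40"
parse_aws_cli_version() {
  printf '%s\n' "$1" | sed -n 's|^aws-cli/\([0-9.]*\).*|\1|p'
}

# Succeed if the version meets the minimum from the requirements list:
# 1.18.110 for the v1 line, 2.0.36 for the v2 line. Other majors are treated as unsupported.
aws_cli_meets_minimum() {
  case "$1" in
    1.*) version_ge "$1" 1.18.110 ;;
    2.*) version_ge "$1" 2.0.36 ;;
    *)   return 1 ;;
  esac
}

# Example (needs awscli on PATH; older v1 releases print the version on stderr):
#   v=$(parse_aws_cli_version "$(aws --version 2>&1)")
#   aws_cli_meets_minimum "$v" && echo "awscli $v OK" || echo "awscli $v is too old"
```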

Below is some general information about my EKS cluster (the IP addresses are fake):

➜ kubectl cluster-info
Kubernetes master is running at https://4A5545E6.sk1.ap-southeast-1.eks.amazonaws.com
CoreDNS is running at https://4A5545E6.sk1.ap-southeast-1.eks.amazonaws.com/api/v1/namespaces/kube-system/services/kube-dns:dns/proxy

➜ kubectl get nodes
NAME                                          STATUS   ROLES    AGE    VERSION
ip-10-0-0-3.ap-southeast-1.compute.internal   Ready    <none>   134d   v1.15.11
ip-10-0-0-5.ap-southeast-1.compute.internal   Ready    <none>   91d    v1.15.11
ip-10-0-0-7.ap-southeast-1.compute.internal   Ready    <none>   134d   v1.15.11

Create IAM role with OIDC enabled

Create IAM OIDC provider

Get the OIDC issuer URL of your EKS cluster:

➜ export OIDC_ISSUER_URL=$(aws eks describe-cluster --name  --query "cluster.identity.oidc.issuer" --output text)

Get the thumbprint of the top certificate in the chain served by the EKS OIDC endpoint in your AWS region (ap-southeast-1 in this example):

➜ export THUMBPRINT=$(echo | openssl s_client -servername oidc.eks.ap-southeast-1.amazonaws.com -showcerts -connect oidc.eks.ap-southeast-1.amazonaws.com:443 2>&- | tac | sed -n '/-----END CERTIFICATE-----/,/-----BEGIN CERTIFICATE-----/p; /-----BEGIN CERTIFICATE-----/q' | tac | openssl x509 -fingerprint -noout | sed 's/://g' | awk -F= '{print tolower($2)}')
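The end of that pipeline turns openssl's fingerprint line into the 40-character hex string that `--thumbprint-list` expects. The sketch below wraps just that step in a function and adds a sanity check on the result. Both function names are hypothetical. The pipeline itself relies on `tac`, which is part of GNU coreutils and is missing on stock macOS. Adding `-sha1` to the `openssl x509` call makes the digest explicit.

```shell
#!/usr/bin/env bash
# Hypothetical helper: convert a line such as
#   "SHA1 Fingerprint=9E:99:A4:8A:..."
# (as printed by `openssl x509 -fingerprint -noout`) into the lowercase,
# colon-free form passed to --thumbprint-list. Same sed/awk steps as the pipeline above.
to_thumbprint() {
  printf '%s\n' "$1" | sed 's/://g' | awk -F= '{print tolower($2)}'
}

# A SHA-1 thumbprint is exactly 40 hex characters; reject anything else
# (for example an empty string when the openssl connection failed).
is_valid_thumbprint() {
  printf '%s\n' "$1" | grep -Eq '^[0-9a-f]{40}$'
}

# Usage:
#   is_valid_thumbprint "$THUMBPRINT" || echo "unexpected thumbprint: '$THUMBPRINT'" >&2
```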

Create IAM OIDC provider:

➜  aws iam create-open-id-connect-provider --url $OIDC_ISSUER_URL --client-id-list "sts.amazonaws.com" --thumbprint-list $THUMBPRINT

The IAM OIDC provider will be created with an ARN that looks like: arn:aws:iam:::oidc-provider/oidc.eks.ap-southeast-1.amazonaws.com/id/4A5545E6
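The provider ARN is the account ID plus the issuer URL without its `https://` scheme. The trust policy in the next step depends on that derivation. A minimal sketch follows, with hypothetical function names. You can compare the result with the output of `aws iam list-open-id-connect-providers` to see whether a provider already exists for the cluster.

```shell
#!/usr/bin/env bash
# Hypothetical helper: strip the scheme from an OIDC issuer URL, e.g.
#   https://oidc.eks.ap-southeast-1.amazonaws.com/id/4A5545E6
#   -> oidc.eks.ap-southeast-1.amazonaws.com/id/4A5545E6
oidc_provider_from_issuer() {
  printf '%s\n' "$1" | sed -e 's|^https://||'
}

# Hypothetical helper: build the IAM OIDC provider ARN from an account ID and an issuer URL.
oidc_provider_arn() {
  local account_id="$1" issuer_url="$2"
  printf 'arn:aws:iam::%s:oidc-provider/%s\n' \
    "$account_id" "$(oidc_provider_from_issuer "$issuer_url")"
}
```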

Create IAM role and policy

Generate the trust relationship policy for your IAM role:

➜ export AWS_ACCOUNT_ID=$(aws sts get-caller-identity --query "Account" --output text)
➜ export OIDC_PROVIDER=$(aws eks describe-cluster --name  --query "cluster.identity.oidc.issuer" --output text | sed -e "s/^https:\/\///")
➜ export NAMESPACE="spark-pi"
➜ export SERVICE_ACCOUNT="spark-pi"
➜ cat <<EOF > ./trust.json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Federated": "arn:aws:iam::${AWS_ACCOUNT_ID}:oidc-provider/${OIDC_PROVIDER}"
      },
      "Action": "sts:AssumeRoleWithWebIdentity",
      "Condition": {
        "StringEquals": {
          "${OIDC_PROVIDER}:sub": "system:serviceaccount:${NAMESPACE}:${SERVICE_ACCOUNT}"
        }
      }
    }
  ]
}
EOF
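The trust policy above only pins the `sub` claim of the service account token. AWS's current IRSA documentation also adds a condition on the `aud` claim, set to `sts.amazonaws.com` to match the `--client-id-list` used when the provider was created. The sketch below renders such a policy. `render_trust_policy` is a hypothetical name, and the account ID in the usage line is a placeholder.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print an IRSA trust policy that pins both the
# service account (sub) and the audience (aud) claims of the web identity token.
# Args: account_id oidc_provider namespace service_account
render_trust_policy() {
  local account_id="$1" provider="$2" namespace="$3" sa="$4"
  cat <<EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Federated": "arn:aws:iam::${account_id}:oidc-provider/${provider}"
      },
      "Action": "sts:AssumeRoleWithWebIdentity",
      "Condition": {
        "StringEquals": {
          "${provider}:sub": "system:serviceaccount:${namespace}:${sa}",
          "${provider}:aud": "sts.amazonaws.com"
        }
      }
    }
  ]
}
EOF
}

# Usage (placeholder account ID):
#   render_trust_policy 111122223333 "$OIDC_PROVIDER" spark-pi spark-pi > ./trust.json
```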

Create the IAM role and attach the relevant policies:

➜ aws iam create-role --role-name spark-irsa-test-role --assume-role-policy-document file://trust.json --description "IAM role to test spark-submit with IRSA"
➜ aws iam attach-role-policy --role-name spark-irsa-test-role --policy-arn=
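The policy ARN is left open above because it depends on what your jobs need. As an illustration only, here is a sketch of a customer-managed policy that gives read access to a single bucket: `s3:ListBucket` on the bucket and `s3:GetObject` on its objects. The bucket name `sample-s3-bucket` comes from the stack trace earlier. The helper and policy names are hypothetical, and real jobs may need more actions, for example writes.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print an IAM policy granting read-only access to one S3 bucket.
s3_read_policy() {
  local bucket="$1"
  cat <<EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["s3:ListBucket"],
      "Resource": "arn:aws:s3:::${bucket}"
    },
    {
      "Effect": "Allow",
      "Action": ["s3:GetObject"],
      "Resource": "arn:aws:s3:::${bucket}/*"
    }
  ]
}
EOF
}

# Usage: create the policy, then attach it to the role created above.
#   s3_read_policy sample-s3-bucket > ./s3-read.json
#   POLICY_ARN=$(aws iam create-policy --policy-name spark-irsa-s3-read \
#     --policy-document file://s3-read.json --query Policy.Arn --output text)
#   aws iam attach-role-policy --role-name spark-irsa-test-role --policy-arn "$POLICY_ARN"
```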

Create spark-pi RBAC

Create a file named spark_role.yml as below. Remember to update the serviceAccount annotation with the IAM role ARN created in the previous step:

---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: spark-pi
  namespace: spark-pi
  annotations:
    eks.amazonaws.com/role-arn: arn:aws:iam:::role/spark-irsa-test-role
automountServiceAccountToken: true
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: spark-pi-role
  namespace: spark-pi
rules:
- apiGroups: [""]
  resources: ["pods", "services", "configmaps"]
  verbs: ["get", "list", "watch", "create", "delete", "update", "patch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: spark-pi-role-binding
  namespace: spark-pi
subjects:
- kind: ServiceAccount
  name: spark-pi
  namespace: spark-pi
roleRef:
  kind: Role
  name: spark-pi-role
  apiGroup: rbac.authorization.k8s.io

Run kubectl to create the namespace, service account, role and role binding:

➜  kubectl create namespace spark-pi
namespace/spark-pi created
➜  kubectl apply -f ~/lab/k8s/spark_role.yml
serviceaccount/spark-pi created
role.rbac.authorization.k8s.io/spark-pi-role created
rolebinding.rbac.authorization.k8s.io/spark-pi-role-binding created
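IRSA only works if the annotation on the service account holds the exact role ARN, so it is worth reading it back. The sketch below does that. The function names are hypothetical. The JSONPath escapes the dots inside the annotation key, which kubectl requires for keys such as `eks.amazonaws.com/role-arn`.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the IAM role ARN annotated on a service account.
sa_role_arn() {
  local namespace="$1" sa="$2"
  kubectl get serviceaccount "$sa" -n "$namespace" \
    -o jsonpath='{.metadata.annotations.eks\.amazonaws\.com/role-arn}'
}

# Succeed only if the annotation matches the expected role ARN.
# Args: namespace service_account expected_role_arn
verify_sa_role() {
  local namespace="$1" sa="$2" expected="$3" actual
  actual=$(sa_role_arn "$namespace" "$sa")
  if [ "$actual" = "$expected" ]; then
    echo "OK: ${namespace}/${sa} -> ${actual}"
  else
    echo "MISMATCH: ${namespace}/${sa} has '${actual}', expected '${expected}'" >&2
    return 1
  fi
}

# Usage (<account-id> is a placeholder):
#   verify_sa_role spark-pi spark-pi "arn:aws:iam::<account-id>:role/spark-irsa-test-role"
```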

Verify that the new service account has permission to create/delete pods:

➜  kubectl auth can-i create pod --as=system:serviceaccount:spark-pi:spark-pi -n spark-pi
yes
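The command above only checks `create pod`. The sketch below loops over every verb/resource pair granted by `spark-pi-role` in spark_role.yml and impersonates the service account for each one. The function name is hypothetical. It uses `kubectl auth can-i --quiet`, which suppresses the yes/no output and reports the answer through the exit code.

```shell
#!/usr/bin/env bash
# Hypothetical helper: check each verb/resource pair from spark-pi-role,
# impersonating the service account. Prints every missing pair; returns 1 if any.
check_sa_permissions() {
  local namespace="$1" sa="$2" missing=0 resource verb
  for resource in pods services configmaps; do
    for verb in get list watch create delete update patch; do
      if ! kubectl auth can-i "$verb" "$resource" \
          --as="system:serviceaccount:${namespace}:${sa}" -n "$namespace" --quiet; then
        echo "missing: ${verb} ${resource}"
        missing=1
      fi
    done
  done
  return "$missing"
}

# Usage:
#   check_sa_permissions spark-pi spark-pi && echo "all permissions granted"
```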

Build spark-2.4.4 Docker image with IRSA patches

According to ticket SPARK-27872, spark-submit in Spark 2.4.4 supports only one parameter, spark.kubernetes.authenticate.driver.serviceAccountName=, to assign a serviceAccount to the Spark driver pod. The Spark executor pods use the default serviceAccount of the namespace, so the executors do not have the permissions they need to access AWS resources.

There is a pull request that adds another parameter, spark.kubernetes.authenticate.executor.serviceAccountName, but the change only applies to Spark 3.x.

Most of our pipelines still use Spark 2.4.4, and upgrading all of them to Spark 3.x is not easy. So I decided to tweak the code and back-port the change to Spark 2.4.4.

Download my patch file and save it as spark-2.4.4-irsa.patch:

I tweaked the script from my previous post to apply the patch to the Spark Docker image:

I'm using the docker:dind image to run the build script and build the Docker image inside a Docker container:

➜  docker container run \
    --privileged -it \
    --name spark-build \
    -v /var/run/docker.sock:/var/run/docker.sock \
    -v ${PWD}:/tmp \
    -e USER= \
    -e PASSWORD= \
    -w /opt \
    docker:dind \
    sh /tmp/spark_docker_build_with_irsa_patch.sh

You can also use the Docker image I built with the script above, tagged vitamingaugau/spark:spark-2.4.4-irsa.

Perform sample spark-submit

Create a jump-pod to run spark-submit:

➜  kubectl run tmp --rm -i --tty --serviceaccount spark-pi -n spark-pi --image vitamingaugau/spark:spark-2.4.4-irsa --image-pull-policy='Always' bash

Run the spark-submit command inside the jump pod:

/opt/spark/bin/spark-submit \
    --master=k8s://https://4A5545E6.sk1.ap-southeast-1.eks.amazonaws.com \
    --deploy-mode cluster \
    --name spark-pi \
    --class org.apache.spark.examples.SparkPi \
    --conf spark.kubernetes.driver.pod.name=spark-pi-driver \
    --conf spark.kubernetes.container.image=vitamingaugau/spark:spark-2.4.4-irsa \
    --conf spark.kubernetes.namespace=spark-pi \
    --conf spark.kubernetes.authenticate.driver.serviceAccountName=spark-pi \
    --conf spark.kubernetes.authenticate.executor.serviceAccountName=spark-pi \
    --conf spark.hadoop.fs.s3a.aws.credentials.provider=com.amazonaws.auth.WebIdentityTokenCredentialsProvider \
    --conf spark.kubernetes.authenticate.submission.caCertFile=/var/run/secrets/kubernetes.io/serviceaccount/ca.crt \
    --conf spark.kubernetes.authenticate.submission.oauthTokenFile=/var/run/secrets/kubernetes.io/serviceaccount/token \
    local:///opt/spark/examples/target/scala-2.11/jars/spark-examples_2.11-2.4.4.jar 20000

Note: as you can see in the command above, IRSA needs the following parameters to work:

--conf spark.kubernetes.namespace=spark-pi
--conf spark.kubernetes.authenticate.driver.serviceAccountName=spark-pi
--conf spark.kubernetes.authenticate.executor.serviceAccountName=spark-pi
--conf spark.hadoop.fs.s3a.aws.credentials.provider=com.amazonaws.auth.WebIdentityTokenCredentialsProvider
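If you submit many jobs with the same identity, you can put these four settings in `spark-defaults.conf` instead. spark-submit reads that file from `$SPARK_HOME/conf`, or from `$SPARK_CONF_DIR` when set, as whitespace-separated key/value pairs. The executor serviceAccount setting is only honored by Spark 3.x or the patched 2.4.4 image. Below is a minimal sketch. `write_irsa_defaults` is a hypothetical helper, and the target path depends on your image.

```shell
#!/usr/bin/env bash
# Hypothetical helper: append the IRSA-related settings from the command above
# to a spark-defaults.conf file, one "key value" pair per line.
# Args: conf_file namespace service_account
write_irsa_defaults() {
  local conf_file="$1" namespace="$2" sa="$3"
  cat >>"$conf_file" <<EOF
spark.kubernetes.namespace ${namespace}
spark.kubernetes.authenticate.driver.serviceAccountName ${sa}
spark.kubernetes.authenticate.executor.serviceAccountName ${sa}
spark.hadoop.fs.s3a.aws.credentials.provider com.amazonaws.auth.WebIdentityTokenCredentialsProvider
EOF
}

# Usage inside the jump pod (Spark lives under /opt/spark in this image):
#   write_irsa_defaults /opt/spark/conf/spark-defaults.conf spark-pi spark-pi
```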

After running the spark-submit command, verify that the Spark driver and executor pods have the correct serviceAccount defined:

➜  SPARK_APP=$(kubectl get pods -o jsonpath='{.metadata.labels.spark-app-selector}' -n spark-pi spark-pi-driver)
➜  kubectl get pods -l spark-app-selector=$SPARK_APP -n spark-pi
NAME                            READY   STATUS    RESTARTS   AGE
spark-pi-1597431578845-exec-1   1/1     Running   0          11s
spark-pi-1597431578845-exec-2   1/1     Running   0          11s
spark-pi-driver                 1/1     Running   0          17s
➜  kubectl get pods -o jsonpath='{.spec.serviceAccount}' spark-pi-driver -n spark-pi
spark-pi
➜  kubectl get pods -o jsonpath='{.spec.serviceAccount}' spark-pi-1597431578845-exec-1 -n spark-pi
spark-pi
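The commands above check pods one at a time. A single JSONPath `range` query can list every pod of the application with its service account. The sketch below does that and flags any pod that is not using the expected account. The function name is hypothetical. It reads the `.spec.serviceAccountName` field, which is the non-deprecated equivalent of `.spec.serviceAccount`.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print "<pod> <serviceAccount>" for every pod of a Spark
# application and fail if any pod uses a different service account (or none exist).
# Args: namespace spark_app_selector expected_service_account
check_app_service_accounts() {
  local namespace="$1" app="$2" expected="$3"
  kubectl get pods -n "$namespace" -l "spark-app-selector=${app}" \
    -o jsonpath='{range .items[*]}{.metadata.name}{" "}{.spec.serviceAccountName}{"\n"}{end}' |
    awk -v want="$expected" '
      { print }
      $2 != want { bad++ }
      END {
        if (NR == 0) { print "no pods found"; exit 1 }
        exit (bad > 0)
      }'
}

# Usage:
#   check_app_service_accounts spark-pi "$SPARK_APP" spark-pi
```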

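If you also check the environment of those pods, you will see the environment variable AWS_ROLE_ARN with the value arn:aws:iam:::role/spark-irsa-test-role, alongside AWS_WEB_IDENTITY_TOKEN_FILE. The EKS pod identity webhook injected both variables because of the spark-pi serviceAccount's annotation: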

➜  kubectl get pods -o jsonpath='{.spec.containers[0].env}' spark-pi-driver -n spark-pi
[map[name:SPARK_DRIVER_BIND_ADDRESS valueFrom:map[fieldRef:map[apiVersion:v1 fieldPath:status.podIP]]] map[name:SPARK_LOCAL_DIRS value:/var/data/spark-d90a2560-dbe3-4cb5-bc1f-8a0bba614b55] map[name:SPARK_CONF_DIR value:/opt/spark/conf] map[name:AWS_ROLE_ARN value:arn:aws:iam:::role/spark-irsa-test-role] map[name:AWS_WEB_IDENTITY_TOKEN_FILE value:/var/run/secrets/eks.amazonaws.com/serviceaccount/token]]
➜  kubectl get pods -o jsonpath='{.spec.containers[0].env}' spark-pi-1597431578845-exec-1 -n spark-pi
[map[name:SPARK_DRIVER_URL value:spark://CoarseGrainedScheduler@spark-pi-1597431578845-driver-svc.spark-pi.svc:7078] map[name:SPARK_EXECUTOR_CORES value:1] map[name:SPARK_EXECUTOR_MEMORY value:1g] map[name:SPARK_APPLICATION_ID value:spark-439e399c63e242dcb995c4ec0384ab36] map[name:SPARK_CONF_DIR value:/opt/spark/conf] map[name:SPARK_EXECUTOR_ID value:1] map[name:SPARK_EXECUTOR_POD_IP valueFrom:map[fieldRef:map[apiVersion:v1 fieldPath:status.podIP]]] map[name:SPARK_LOCAL_DIRS value:/var/data/spark-d90a2560-dbe3-4cb5-bc1f-8a0bba614b55] map[name:AWS_ROLE_ARN value:arn:aws:iam:::role/spark-irsa-test-role] map[name:AWS_WEB_IDENTITY_TOKEN_FILE value:/var/run/secrets/eks.amazonaws.com/serviceaccount/token]]
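Dumping the whole env list is hard to read. A JSONPath filter expression can pick out a single variable by name instead. The sketch below uses that to check both IRSA variables on a pod. The function names are hypothetical, and the sketch only inspects the first container, as the commands above do.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read one environment variable from a pod's first container
# using a JSONPath filter.
# Args: namespace pod env_var_name
pod_env_value() {
  local namespace="$1" pod="$2" name="$3"
  kubectl get pod "$pod" -n "$namespace" \
    -o jsonpath="{.spec.containers[0].env[?(@.name==\"${name}\")].value}"
}

# Succeed if the pod has the expected AWS_ROLE_ARN and a non-empty
# AWS_WEB_IDENTITY_TOKEN_FILE.
# Args: namespace pod expected_role_arn
verify_pod_irsa_env() {
  local namespace="$1" pod="$2" expected="$3" role token
  role=$(pod_env_value "$namespace" "$pod" AWS_ROLE_ARN)
  token=$(pod_env_value "$namespace" "$pod" AWS_WEB_IDENTITY_TOKEN_FILE)
  [ "$role" = "$expected" ] && [ -n "$token" ]
}

# Usage (<account-id> is a placeholder):
#   verify_pod_irsa_env spark-pi spark-pi-driver "arn:aws:iam::<account-id>:role/spark-irsa-test-role"
```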

Now you can configure your Spark jobs to interact with AWS resources such as S3 or DynamoDB using properly scoped IAM policies. With OIDC and IRSA, you can also restrict an IAM role to a specific serviceAccount in a specific namespace :D

Feel free to let me know if you run into any issues while following this tutorial.

Originally published at: https://medium.com/@tunguyen9889/how-to-perform-a-spark-submit-to-amazon-eks-cluster-with-irsa-50af9b26cae