AWS IRSA (IAM Roles for Service Accounts) - Complete Enterprise Guide

Master secure AWS service access in EKS with advanced IRSA implementation patterns

Overview

AWS IAM Roles for Service Accounts (IRSA) revolutionizes how Kubernetes workloads securely access AWS services.

By leveraging OpenID Connect (OIDC) and AWS Security Token Service (STS), IRSA eliminates the need for long-lived credentials while providing fine-grained, pod-level access control.


Why IRSA Matters
  • Zero Trust Security: No long-lived credentials stored in clusters
  • Granular Permissions: Pod-level access control with least privilege
  • Kubernetes Native: Seamless integration with ServiceAccounts
  • Audit Trail: Complete visibility into AWS API calls
  • Operational Simplicity: Automatic credential rotation and management



The Problem with Traditional Approaches


1. EC2 Instance Profiles - The Overprivileged Problem

Before IRSA, the most common approach was to use EC2 instance profiles, which grant the same permissions to every pod on a node.

# All pods inherit the same permissions
aws sts get-caller-identity
# Returns the same role for every pod on the node

Critical Issues:


2. Embedded Credentials - The Security Nightmare

# Anti-pattern: Never do this
env:
- name: AWS_ACCESS_KEY_ID
  value: "AKIAIOSFODNN7EXAMPLE"
- name: AWS_SECRET_ACCESS_KEY
  value: "wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY"

Security Risks:


3. External Secret Management - The Complexity Trap

# Complex initialization process
initContainers:
- name: secret-fetcher
  image: vault:latest
  command: ['sh', '-c', 'vault auth && vault kv get -field=aws_key secret/myapp']


Operational Overhead:


IRSA: The Elegant Solution

How IRSA Solves These Problems
  • Automatic Credential Provisioning: AWS SDK automatically discovers and uses IRSA credentials
  • Temporary Credentials: Short-lived tokens that auto-rotate
  • Pod-Level Isolation: Each ServiceAccount maps to specific IAM role
  • Zero Configuration: No credential management in application code
  • Full Auditability: CloudTrail records each AssumeRoleWithWebIdentity call with the ServiceAccount subject and attributes later API calls to the assumed role session


What is AWS IRSA?

AWS IRSA is a sophisticated identity federation mechanism that bridges Kubernetes ServiceAccounts with AWS IAM roles. It uses industry-standard protocols (OIDC) to establish trust between your EKS cluster and AWS IAM, enabling secure, temporary credential exchange without storing any long-lived secrets.



Core Components Deep Dive

IRSA Architecture Components
  • Kubernetes ServiceAccounts: Identity primitive for pods
  • AWS IAM Roles: Permission container with trust policies
  • OpenID Connect (OIDC) Provider: Identity federation bridge
  • Security Token Service (STS): Temporary credential issuer
  • EKS Pod Identity Webhook: Automatic credential injection
  • AWS SDK: Automatic credential discovery and rotation


OpenID Connect (OIDC) - The Trust Foundation

OpenID Connect serves as the cornerstone of IRSA’s security model, establishing cryptographically verifiable trust between EKS and AWS IAM.

Technical Deep Dive

{
  "iss": "https://oidc.eks.us-west-2.amazonaws.com/id/EXAMPLED539D4633E53DE1B716D3041E",
  "aud": ["sts.amazonaws.com"],
  "sub": "system:serviceaccount:production:my-service-account",
  "iat": 1516239022,
  "exp": 1516242622,
  "kubernetes.io": {
    "namespace": "production",
    "serviceaccount": {
      "name": "my-service-account",
      "uid": "12345678-1234-1234-1234-123456789012"
    },
    "pod": {
      "name": "my-pod-12345",
      "uid": "87654321-4321-4321-4321-210987654321"
    }
  }
}
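
To see which claims a projected token actually carries, its payload segment can be decoded locally. The following is a minimal sketch using only the Python standard library; it does not verify the signature, and the sample token is a hypothetical, unsigned one built purely for illustration.

```python
import base64
import json


def decode_jwt_claims(token: str) -> dict:
    """Return the payload claims of a JWT without verifying its signature."""
    parts = token.split(".")
    if len(parts) != 3:
        raise ValueError("expected a JWT with three dot-separated segments")
    payload = parts[1]
    # JWT segments are base64url-encoded with padding stripped; restore it.
    payload += "=" * (-len(payload) % 4)
    return json.loads(base64.urlsafe_b64decode(payload))


def _b64url(data: dict) -> str:
    """Encode a dict the way JWT segments are encoded (demo helper)."""
    raw = json.dumps(data).encode()
    return base64.urlsafe_b64encode(raw).decode().rstrip("=")


# Hypothetical unsigned token, assembled locally for demonstration only.
sample_token = ".".join([
    _b64url({"alg": "RS256", "typ": "JWT"}),
    _b64url({
        "aud": ["sts.amazonaws.com"],
        "sub": "system:serviceaccount:production:my-service-account",
    }),
    "signature-placeholder",
])

claims = decode_jwt_claims(sample_token)
print(claims["sub"])
```

Inside a pod, the same function could be pointed at the contents of the projected token file to check the `sub` and `aud` values against the role's trust policy.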

Key OIDC Features for IRSA:

OIDC Provider Setup and Verification


Kubernetes ServiceAccounts - Identity Primitives

ServiceAccounts in Kubernetes serve as the identity foundation for IRSA, providing a secure mechanism to associate AWS IAM roles with specific pods.

Advanced ServiceAccount Configuration

apiVersion: v1
kind: ServiceAccount
metadata:
  name: advanced-service-account
  namespace: production
  annotations:
    # Primary IRSA annotation
    eks.amazonaws.com/role-arn: arn:aws:iam::123456789012:role/MyAppRole
    
    # Optional: Specify audience (default: sts.amazonaws.com)
    eks.amazonaws.com/audience: sts.amazonaws.com
    
    # Optional: Custom token expiration (default: 86400 seconds)
    eks.amazonaws.com/token-expiration: "3600"
    
    # Optional: STS regional endpoints
    eks.amazonaws.com/sts-regional-endpoints: "true"
  labels:
    app: my-application
    tier: backend
    security.policy: irsa-enabled
---
# Role binding for Kubernetes RBAC
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  namespace: production
  name: serviceaccount-reader
rules:
- apiGroups: [""]
  resources: ["serviceaccounts"]
  verbs: ["get", "list"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: read-serviceaccounts
  namespace: production
subjects:
- kind: ServiceAccount
  name: advanced-service-account
  namespace: production
roleRef:
  kind: Role
  name: serviceaccount-reader
  apiGroup: rbac.authorization.k8s.io

ServiceAccount Best Practices

ServiceAccount Security Guidelines
  • Namespace Isolation: Use dedicated namespaces for different environments
  • Naming Convention: Include app name and environment in ServiceAccount names
  • Automount Control: Disable automounting when not needed
  • RBAC Integration: Combine with Kubernetes RBAC for defense in depth
# Example: Disable automatic token mounting
apiVersion: v1
kind: ServiceAccount
metadata:
  name: no-auto-mount-sa
  namespace: production
automountServiceAccountToken: false  # Disable if not needed


AWS Security Token Service (STS) - Credential Engine

STS acts as the credential engine in IRSA, issuing temporary, automatically rotating credentials based on OIDC token validation.

STS Token Exchange Flow

Advanced STS Configuration

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Federated": "arn:aws:iam::123456789012:oidc-provider/oidc.eks.us-west-2.amazonaws.com/id/EXAMPLED539D4633E53DE1B716D3041E"
      },
      "Action": "sts:AssumeRoleWithWebIdentity",
      "Condition": {
        "StringEquals": {
          "oidc.eks.us-west-2.amazonaws.com/id/EXAMPLED539D4633E53DE1B716D3041E:sub": "system:serviceaccount:production:my-service-account",
          "oidc.eks.us-west-2.amazonaws.com/id/EXAMPLED539D4633E53DE1B716D3041E:aud": "sts.amazonaws.com"
        }
      }
    }
  ]
}
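
Because the issuer string appears in both the principal ARN and the condition keys, trust policies are easy to mistype by hand. A small generator can help; this is a sketch only, and the function name and arguments are illustrative assumptions rather than part of any AWS tooling. It emits just the `sub` and `aud` conditions, which are the claims an EKS-issued token carries.

```python
import json


def build_trust_policy(account_id, region, oidc_id, namespace, service_account):
    """Build an IRSA trust policy limited to a single ServiceAccount."""
    issuer = f"oidc.eks.{region}.amazonaws.com/id/{oidc_id}"
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Principal": {
                "Federated": f"arn:aws:iam::{account_id}:oidc-provider/{issuer}"
            },
            "Action": "sts:AssumeRoleWithWebIdentity",
            "Condition": {
                "StringEquals": {
                    # Pin the role to exactly one namespace/ServiceAccount pair.
                    f"{issuer}:sub": f"system:serviceaccount:{namespace}:{service_account}",
                    f"{issuer}:aud": "sts.amazonaws.com",
                }
            },
        }],
    }


trust_policy = build_trust_policy(
    account_id="123456789012",
    region="us-west-2",
    oidc_id="EXAMPLED539D4633E53DE1B716D3041E",
    namespace="production",
    service_account="my-service-account",
)
print(json.dumps(trust_policy, indent=2))
```

The printed JSON can be saved to a file and passed to the IAM role creation step.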

Advanced Trust Policy Features:



IRSA Workflow and Architecture


Complete End-to-End Workflow

The IRSA workflow involves multiple components working together to provide seamless, secure access to AWS services. Here’s the detailed step-by-step process:

IRSA Token Exchange Flow
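  1. Pod Startup: EKS Pod Identity Webhook mutates the pod spec, adding the AWS_ROLE_ARN and AWS_WEB_IDENTITY_TOKEN_FILE environment variables and a projected token volume
  2. Token Projection: The kubelet requests an OIDC token for the ServiceAccount and mounts it into the pod
  3. AWS SDK Discovery: AWS SDK automatically discovers IRSA credentials
  4. STS Token Exchange: SDK calls AssumeRoleWithWebIdentity
  5. OIDC Validation: STS verifies the JWT signature against the public keys published by the cluster's OIDC provider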
  6. Trust Policy Check: IAM evaluates trust relationships
  7. Temporary Credentials: STS issues short-lived credentials
  8. AWS API Calls: Pod uses temporary credentials for AWS services
  9. Automatic Renewal: SDK handles credential refresh transparently
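
Steps 3 and 4 can be reproduced by hand, which is useful when debugging. The sketch below assumes the two environment variables the webhook injects (`AWS_ROLE_ARN` and `AWS_WEB_IDENTITY_TOKEN_FILE`); the helper names and the session name are hypothetical. The SDKs do all of this internally, so application code normally never needs it.

```python
import os
import tempfile


def build_web_identity_request(environ, session_name="irsa-demo"):
    """Assemble the AssumeRoleWithWebIdentity parameters from the pod environment."""
    role_arn = environ["AWS_ROLE_ARN"]
    token_file = environ["AWS_WEB_IDENTITY_TOKEN_FILE"]
    with open(token_file, encoding="utf-8") as handle:
        token = handle.read().strip()
    return {
        "RoleArn": role_arn,
        "RoleSessionName": session_name,
        "WebIdentityToken": token,
    }


def exchange_for_credentials(request):
    """Call STS with the request; needs boto3 and network access to STS."""
    import boto3  # imported lazily so the helper above works without boto3

    sts = boto3.client("sts")
    return sts.assume_role_with_web_identity(**request)["Credentials"]


# Local demonstration with a placeholder token file instead of a real pod.
with tempfile.TemporaryDirectory() as tmp:
    token_path = os.path.join(tmp, "token")
    with open(token_path, "w", encoding="utf-8") as handle:
        handle.write("header.payload.signature\n")
    demo_request = build_web_identity_request({
        "AWS_ROLE_ARN": "arn:aws:iam::123456789012:role/S3ProcessorRole",
        "AWS_WEB_IDENTITY_TOKEN_FILE": token_path,
    })

print(sorted(demo_request))
```

In a real pod, `exchange_for_credentials(build_web_identity_request(os.environ))` would return the temporary access key, secret key, session token and expiration.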


Detailed Architecture Diagram Analysis

┌─────────────────┐    ┌──────────────────┐    ┌─────────────────┐
│   EKS Cluster   │    │   AWS IAM/STS    │    │  AWS Services   │
│                 │    │                  │    │                 │
│  ┌───────────┐  │    │  ┌─────────────┐ │    │  ┌───────────┐  │
│  │    Pod    │  │    │  │     STS     │ │    │  │    S3     │  │
│  │           │  │    │  │             │ │    │  │           │  │
│  │  AWS SDK  │  │◄──►│  │AssumeRole   │ │    │  │  Buckets  │  │
│  │           │  │    │  │WithWebID    │ │    │  │           │  │
│  └─────┬─────┘  │    │  └─────────────┘ │    │  └───────────┘  │
│        │        │    │         │        │    │         │       │
│  ┌─────▼─────┐  │    │  ┌──────▼──────┐ │    │  ┌──────▼────┐  │
│  │ServiceAcc │  │    │  │ OIDC Provider│ │    │  │ DynamoDB  │  │
│  │ + Role    │  │    │  │ Validation   │ │    │  │   Tables  │  │
│  │Annotation │  │    │  └─────────────┘ │    │  └───────────┘  │
│  └───────────┘  │    │         │        │    │         │       │
│        │        │    │  ┌──────▼──────┐ │    │  ┌──────▼────┐  │
│  ┌─────▼─────┐  │    │  │  IAM Role   │ │    │  │    SQS    │  │
│  │ OIDC JWT  │  │    │  │Trust Policy │ │    │  │   Queues  │  │
│  │  Token    │  │    │  └─────────────┘ │    │  └───────────┘  │
│  └───────────┘  │    └──────────────────┘    └─────────────────┘
└─────────────────┘


Real-World Example: S3 Access Workflow

Let’s walk through a practical example where a pod needs to access S3:

# 1. ServiceAccount with IRSA configuration
apiVersion: v1
kind: ServiceAccount
metadata:
  name: s3-processor
  namespace: data-pipeline
  annotations:
    eks.amazonaws.com/role-arn: arn:aws:iam::123456789012:role/S3ProcessorRole

# 2. Application code (Python with boto3)
import os

import boto3
from botocore.exceptions import ClientError

BUCKET = 'my-data-bucket'

def process_s3_files():
    # AWS SDK automatically discovers IRSA credentials
    # No explicit credential configuration needed!
    s3_client = boto3.client('s3')

    try:
        # List objects in bucket (first page only, up to 1,000 keys)
        response = s3_client.list_objects_v2(Bucket=BUCKET)

        for obj in response.get('Contents', []):
            key = obj['Key']
            if key.endswith('/'):
                continue  # skip "folder" placeholder objects
            print(f"Processing file: {key}")

            # Keys may contain "/", so create the local directory first
            local_path = os.path.join('/tmp', key)
            os.makedirs(os.path.dirname(local_path), exist_ok=True)

            # Download and process file
            s3_client.download_file(BUCKET, key, local_path)

    except ClientError as e:
        print(f"Error accessing S3: {e}")

if __name__ == "__main__":
    process_s3_files()

# 3. Pod deployment
apiVersion: apps/v1
kind: Deployment
metadata:
  name: s3-processor
  namespace: data-pipeline
spec:
  replicas: 3
  selector:
    matchLabels:
      app: s3-processor
  template:
    metadata:
      labels:
        app: s3-processor
    spec:
      serviceAccountName: s3-processor  # Links to IRSA-enabled ServiceAccount
      containers:
      - name: processor
        image: my-company/s3-processor:v1.0.0
        env:
        - name: AWS_REGION
          value: us-west-2
        # No AWS credentials needed - IRSA handles automatically!
        resources:
          requests:
            memory: "256Mi"
            cpu: "250m"
          limits:
            memory: "512Mi"
            cpu: "500m"


Behind the Scenes: Credential Discovery

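# Inside the pod, the AWS SDK walks its credential provider chain
# (boto3 order shown; other SDKs differ slightly):
# 1. Environment variables (AWS_ACCESS_KEY_ID, etc.) - Empty
# 2. Web Identity Token (AWS_ROLE_ARN + AWS_WEB_IDENTITY_TOKEN_FILE, set by the webhook) - Found and used!
# 3. AWS credentials file (~/.aws/credentials) - Not reached
# 4. IAM instance profile (IMDS) - Not reached, but still reachable from pods unless IMDS access is restricted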

# The projected token is mounted at:
ls -la /var/run/secrets/eks.amazonaws.com/serviceaccount/
# token  # JWT token for OIDC authentication

# AWS SDK reads this token and makes STS call:
cat /var/run/secrets/eks.amazonaws.com/serviceaccount/token
# eyJhbGciOiJSUzI1NiIsImtpZCI6... (JWT token)
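
When a pod unexpectedly falls back to the node role, the cause is usually visible in its environment. The following is a small diagnostic sketch; the function name is hypothetical, and the precedence note reflects the behaviour of most SDKs, where static key variables are consulted before the web identity token.

```python
import os

# Variables injected by the EKS Pod Identity Webhook.
REQUIRED_VARS = ("AWS_ROLE_ARN", "AWS_WEB_IDENTITY_TOKEN_FILE")


def irsa_problems(environ, file_exists=os.path.isfile):
    """Return a list of human-readable reasons why IRSA may not be in effect."""
    problems = [f"{name} is not set" for name in REQUIRED_VARS if not environ.get(name)]

    token_file = environ.get("AWS_WEB_IDENTITY_TOKEN_FILE")
    if token_file and not file_exists(token_file):
        problems.append(f"token file {token_file} does not exist")

    if environ.get("AWS_ACCESS_KEY_ID"):
        # Static keys in the environment win over the web identity provider.
        problems.append("AWS_ACCESS_KEY_ID is set and takes precedence over the web identity token")
    return problems


if __name__ == "__main__":
    for problem in irsa_problems(os.environ) or ["no obvious IRSA misconfiguration found"]:
        print(problem)
```

An empty result only means the wiring looks right; the role's trust policy still has to match the token's `sub` and `aud` claims.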


Credential Lifecycle Management

Automatic Credential Lifecycle
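  • Token Issuance: Kubernetes issues a JWT valid for 24 hours by default (86400 seconds, adjustable with the token-expiration annotation); the kubelet rotates it before it expires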
  • STS Exchange: AWS SDK exchanges JWT for temporary credentials
  • Credential Caching: SDK caches credentials until near expiration
  • Automatic Refresh: SDK automatically refreshes before expiration
  • Graceful Fallback: Retry logic handles temporary failures
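
The caching and refresh behaviour boils down to comparing the credential expiration with the current time, minus a safety margin. The following is a simplified sketch; the 15-minute margin is an illustrative assumption, and each SDK chooses its own thresholds.

```python
from datetime import datetime, timedelta, timezone


def needs_refresh(expiration, now=None, margin=timedelta(minutes=15)):
    """Return True when cached credentials are within `margin` of expiring."""
    now = now or datetime.now(timezone.utc)
    return expiration - now <= margin


expiration = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)

# 60 minutes left: keep using the cached credentials.
early = needs_refresh(expiration, now=datetime(2024, 1, 1, 11, 0, tzinfo=timezone.utc))
# 10 minutes left: exchange the projected token for new credentials.
late = needs_refresh(expiration, now=datetime(2024, 1, 1, 11, 50, tzinfo=timezone.utc))

print(early, late)
```

Refreshing ahead of expiry means an in-flight request never carries credentials that expire mid-call.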


Security Validation Points

The IRSA workflow includes multiple security validation checkpoints:

  1. Kubernetes API Server Validation:
    • Validates ServiceAccount exists
    • Checks RBAC permissions
    • Verifies pod is authorized to use ServiceAccount
  2. OIDC Token Validation:
    • Cryptographic signature verification
    • Audience claim validation (sts.amazonaws.com)
    • Subject claim validation (system:serviceaccount:<namespace>:<name>)
    • Token expiration check
  3. IAM Trust Policy Evaluation:
    • Federated identity verification
    • Condition evaluation (StringEquals, StringLike, etc.)
    • Principal validation
  4. STS Token Issuance:
    • Temporary credential generation
    • Session policy application
    • Permission boundary enforcement
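
The claim checks in points 2 and 3 can be mirrored locally to predict whether a token would satisfy a trust policy. This is an illustrative sketch only: it does not verify the signature, it is not how STS is implemented, and the function name is hypothetical.

```python
import time


def claim_failures(claims, expected_sub, expected_aud="sts.amazonaws.com", now=None):
    """Return the names of the claims that would fail validation."""
    now = time.time() if now is None else now
    failures = []

    # The audience may be a single string or a list of strings.
    audiences = claims.get("aud", [])
    if isinstance(audiences, str):
        audiences = [audiences]
    if expected_aud not in audiences:
        failures.append("aud")

    if claims.get("sub") != expected_sub:
        failures.append("sub")

    if claims.get("exp", 0) <= now:
        failures.append("exp")
    return failures


token_claims = {
    "aud": ["sts.amazonaws.com"],
    "sub": "system:serviceaccount:production:my-service-account",
    "exp": 1516242622,
}
subject = "system:serviceaccount:production:my-service-account"

valid = claim_failures(token_claims, subject, now=1516239022)
expired = claim_failures(token_claims, subject, now=1516242623)
wrong_sa = claim_failures(token_claims, "system:serviceaccount:staging:other", now=1516239022)
print(valid, expired, wrong_sa)
```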



Implementation Guide


Prerequisites and Environment Setup

Before implementing IRSA, ensure your environment meets these requirements:

Prerequisites Checklist
  • EKS Version: 1.13+ (recommended: 1.21+)
  • AWS CLI: Version 2.0+ with appropriate permissions
  • kubectl: Compatible with your EKS version
  • eksctl: Latest version (optional but recommended)
  • IAM Permissions: iam:CreateRole, iam:CreateOpenIDConnectProvider


Step 1: OIDC Provider Setup (Advanced)



Step 2: Advanced IAM Role Creation



Step 3: Advanced ServiceAccount Configuration

# advanced-serviceaccount.yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: s3-processor
  namespace: data-pipeline
  annotations:
    # Required: IAM role ARN
    eks.amazonaws.com/role-arn: arn:aws:iam::123456789012:role/S3ProcessorRole
    
    # Optional: Specify audience (default: sts.amazonaws.com)
    eks.amazonaws.com/audience: sts.amazonaws.com
    
    # Optional: Token expiration (default: 86400 seconds)
    eks.amazonaws.com/token-expiration: "3600"
    
    # Optional: STS regional endpoints for better performance
    eks.amazonaws.com/sts-regional-endpoints: "true"
    
    # Metadata for operational purposes
    description: "ServiceAccount for S3 data processing workloads"
    owner: "data-platform-team"
    cost-center: "engineering"
  labels:
    app.kubernetes.io/name: s3-processor
    app.kubernetes.io/component: data-pipeline
    app.kubernetes.io/part-of: analytics-platform
    security.policy/irsa: "enabled"
    compliance.level: "high"
automountServiceAccountToken: true  # Controls only the default Kubernetes API token; the IRSA token is projected separately by the webhook
---
# Create namespace if it doesn't exist
apiVersion: v1
kind: Namespace
metadata:
  name: data-pipeline
  labels:
    pod-security.kubernetes.io/enforce: restricted
    pod-security.kubernetes.io/audit: restricted
    pod-security.kubernetes.io/warn: restricted
---
# RBAC configuration
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  namespace: data-pipeline
  name: s3-processor-role
rules:
- apiGroups: [""]
  resources: ["secrets", "configmaps"]
  verbs: ["get", "list"]
- apiGroups: [""]
  resources: ["events"]
  verbs: ["create"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: s3-processor-binding
  namespace: data-pipeline
subjects:
- kind: ServiceAccount
  name: s3-processor
  namespace: data-pipeline
roleRef:
  kind: Role
  name: s3-processor-role
  apiGroup: rbac.authorization.k8s.io


Step 4: Production-Ready Deployment

# production-deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: s3-processor
  namespace: data-pipeline
  labels:
    app: s3-processor
    version: v1.0.0
spec:
  replicas: 3
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 1
      maxSurge: 1
  selector:
    matchLabels:
      app: s3-processor
  template:
    metadata:
      labels:
        app: s3-processor
        version: v1.0.0
      annotations:
        prometheus.io/scrape: "true"
        prometheus.io/port: "8080"
        prometheus.io/path: "/metrics"
    spec:
      serviceAccountName: s3-processor
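      securityContext:
        runAsNonRoot: true
        runAsUser: 65534
        fsGroup: 65534
        seccompProfile:
          type: RuntimeDefault  # Required by the "restricted" Pod Security level enforced on this namespace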
      containers:
      - name: processor
        image: my-company/s3-processor:v1.0.0
        imagePullPolicy: Always
        ports:
        - containerPort: 8080
          name: metrics
        env:
        - name: AWS_REGION
          value: us-west-2
        - name: AWS_DEFAULT_REGION
          value: us-west-2
        # Enable STS regional endpoints for better performance
        - name: AWS_STS_REGIONAL_ENDPOINTS
          value: regional
        # Optional: make older SDKs (Go v1, JavaScript v2) also read the shared AWS config file
        - name: AWS_SDK_LOAD_CONFIG
          value: "1"
        # Application-specific configuration
        - name: S3_BUCKET_NAME
          value: my-data-bucket
        - name: PROCESSING_CONCURRENCY
          value: "10"
        resources:
          requests:
            memory: "512Mi"
            cpu: "250m"
          limits:
            memory: "1Gi"
            cpu: "500m"
        livenessProbe:
          httpGet:
            path: /health
            port: 8080
          initialDelaySeconds: 30
          periodSeconds: 10
        readinessProbe:
          httpGet:
            path: /ready
            port: 8080
          initialDelaySeconds: 5
          periodSeconds: 5
        securityContext:
          allowPrivilegeEscalation: false
          readOnlyRootFilesystem: true
          capabilities:
            drop:
            - ALL
        volumeMounts:
        - name: tmp
          mountPath: /tmp
        - name: cache
          mountPath: /app/cache
      volumes:
      - name: tmp
        emptyDir: {}
      - name: cache
        emptyDir: {}
      affinity:
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
          - weight: 100
            podAffinityTerm:
              labelSelector:
                matchExpressions:
                - key: app
                  operator: In
                  values:
                  - s3-processor
              topologyKey: kubernetes.io/hostname


Step 5: Verification and Testing



Advanced Configuration Patterns


Multi-Tenant IRSA Setup

For organizations running multiple teams or applications in shared EKS clusters:

# Namespace-based isolation
apiVersion: v1
kind: Namespace
metadata:
  name: team-alpha
  labels:
    team: alpha
    environment: production
---
apiVersion: v1
kind: Namespace
metadata:
  name: team-beta
  labels:
    team: beta
    environment: production
---
# Team Alpha ServiceAccount
apiVersion: v1
kind: ServiceAccount
metadata:
  name: app-service-account
  namespace: team-alpha
  annotations:
    eks.amazonaws.com/role-arn: arn:aws:iam::123456789012:role/TeamAlphaAppRole
---
# Team Beta ServiceAccount
apiVersion: v1
kind: ServiceAccount
metadata:
  name: app-service-account
  namespace: team-beta
  annotations:
    eks.amazonaws.com/role-arn: arn:aws:iam::123456789012:role/TeamBetaAppRole
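
In a shared cluster it is worth checking that no IAM role is annotated on ServiceAccounts in more than one tenant namespace. The sketch below works on ServiceAccount manifests already loaded as dictionaries (for example from `kubectl get serviceaccounts -A -o json`); the function name is a hypothetical helper.

```python
from collections import defaultdict

ROLE_ANNOTATION = "eks.amazonaws.com/role-arn"


def roles_shared_across_namespaces(service_accounts):
    """Map each role ARN used in more than one namespace to those namespaces."""
    namespaces_by_role = defaultdict(set)
    for sa in service_accounts:
        metadata = sa.get("metadata", {})
        role = (metadata.get("annotations") or {}).get(ROLE_ANNOTATION)
        if role:
            namespaces_by_role[role].add(metadata.get("namespace", "default"))
    return {
        role: sorted(namespaces)
        for role, namespaces in namespaces_by_role.items()
        if len(namespaces) > 1
    }


def _sa(namespace, role):
    """Build a minimal ServiceAccount manifest for the demonstration."""
    return {"metadata": {"name": "app-service-account", "namespace": namespace,
                         "annotations": {ROLE_ANNOTATION: role}}}


isolated = [
    _sa("team-alpha", "arn:aws:iam::123456789012:role/TeamAlphaAppRole"),
    _sa("team-beta", "arn:aws:iam::123456789012:role/TeamBetaAppRole"),
]
leaky = isolated + [_sa("team-beta", "arn:aws:iam::123456789012:role/TeamAlphaAppRole")]

print(roles_shared_across_namespaces(isolated))
print(roles_shared_across_namespaces(leaky))
```

The annotation alone grants nothing unless the role's trust policy also accepts the second namespace, but a shared annotation is usually a sign that tenant boundaries have drifted.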

Cross-Account IRSA Configuration

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Federated": "arn:aws:iam::ACCOUNT-A:oidc-provider/oidc.eks.us-west-2.amazonaws.com/id/OIDC-ID"
      },
      "Action": "sts:AssumeRoleWithWebIdentity",
      "Condition": {
        "StringEquals": {
          "oidc.eks.us-west-2.amazonaws.com/id/OIDC-ID:sub": "system:serviceaccount:production:cross-account-service"
        }
      }
    },
    {
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::ACCOUNT-A:role/CrossAccountRole"
      },
      "Action": "sts:AssumeRole",
      "Condition": {
        "StringEquals": {
          "sts:ExternalId": "unique-external-id-12345"
        }
      }
    }
  ]
}


Environment-Specific Configurations

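# Environment-specific role naming convention
ENVIRONMENT="production"  # or staging, development
APPLICATION="data-processor"
ROLE_NAME="${ENVIRONMENT}-${APPLICATION}-role"

# Values used by the trust policy template
# (CLUSTER_NAME and REGION are placeholders; adjust them to your cluster)
CLUSTER_NAME="my-cluster"
REGION="us-west-2"
ACCOUNT_ID="$(aws sts get-caller-identity --query Account --output text)"
OIDC_ID="$(aws eks describe-cluster --name "$CLUSTER_NAME" --region "$REGION" \
  --query "cluster.identity.oidc.issuer" --output text | cut -d '/' -f 5)"

# Create environment-specific trust policies
cat > "${ENVIRONMENT}-trust-policy.json" << EOF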
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Federated": "arn:aws:iam::${ACCOUNT_ID}:oidc-provider/oidc.eks.${REGION}.amazonaws.com/id/${OIDC_ID}"
      },
      "Action": "sts:AssumeRoleWithWebIdentity",
      "Condition": {
        "StringEquals": {
          "oidc.eks.${REGION}.amazonaws.com/id/${OIDC_ID}:sub": "system:serviceaccount:${ENVIRONMENT}:${APPLICATION}",
          "oidc.eks.${REGION}.amazonaws.com/id/${OIDC_ID}:aud": "sts.amazonaws.com"
        }
      }
    }
  ]
}
EOF



Enterprise Use Cases


1. Data Pipeline Architecture

# Real-world data processing pipeline with IRSA
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  name: data-pipeline
  namespace: data-platform
spec:
  serviceAccountName: data-pipeline-sa
  entrypoint: process-data
  templates:
  - name: process-data
    dag:
      tasks:
      - name: extract
        template: s3-extractor
      - name: transform
        template: data-transformer
        dependencies: [extract]
      - name: load
        template: rds-loader
        dependencies: [transform]
  
  - name: s3-extractor
    container:
      image: data-platform/extractor:v2.1.0
      env:
      - name: AWS_REGION
        value: us-west-2
      - name: SOURCE_BUCKET
        value: raw-data-bucket
      resources:
        requests:
          memory: 1Gi
          cpu: 500m
        limits:
          memory: 2Gi
          cpu: 1000m


2. Microservices with Different AWS Permissions

Microservices IRSA Pattern
  • User Service: DynamoDB read/write, Cognito access
  • Order Service: SQS/SNS, RDS access, S3 receipts
  • Payment Service: Secrets Manager, KMS encryption
  • Notification Service: SES, SNS publishing
  • Analytics Service: Kinesis, S3 data lake, Athena

# User Service - DynamoDB focused
apiVersion: v1
kind: ServiceAccount
metadata:
  name: user-service-sa
  namespace: microservices
  annotations:
    eks.amazonaws.com/role-arn: arn:aws:iam::123456789012:role/UserServiceRole
---
# Order Service - Multiple services
apiVersion: v1
kind: ServiceAccount
metadata:
  name: order-service-sa
  namespace: microservices
  annotations:
    eks.amazonaws.com/role-arn: arn:aws:iam::123456789012:role/OrderServiceRole
---
# Payment Service - High security requirements
apiVersion: v1
kind: ServiceAccount
metadata:
  name: payment-service-sa
  namespace: microservices
  annotations:
    eks.amazonaws.com/role-arn: arn:aws:iam::123456789012:role/PaymentServiceRole
    # Additional security constraints
    eks.amazonaws.com/token-expiration: "900"  # 15 minutes


3. CI/CD Pipeline Integration

# GitLab CI/CD with IRSA for deployment
apiVersion: v1
kind: ServiceAccount
metadata:
  name: gitlab-runner-sa
  namespace: gitlab-system
  annotations:
    eks.amazonaws.com/role-arn: arn:aws:iam::123456789012:role/GitLabRunnerRole
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: gitlab-runner
  namespace: gitlab-system
spec:
  replicas: 3
  selector:
    matchLabels:
      app: gitlab-runner
  template:
    metadata:
      labels:
        app: gitlab-runner
    spec:
      serviceAccountName: gitlab-runner-sa
      containers:
      - name: gitlab-runner
        image: gitlab/gitlab-runner:latest
        env:
        - name: AWS_REGION
          value: us-west-2
        # Can now push to ECR, deploy to EKS, update CloudFormation
        volumeMounts:
        - name: docker-sock
          mountPath: /var/run/docker.sock
      volumes:
      - name: docker-sock
        hostPath:
          path: /var/run/docker.sock


4. Machine Learning Workloads

# ML training job with IRSA
import boto3
import sagemaker
from sagemaker.pytorch import PyTorch

# AWS SDK automatically uses IRSA credentials
sagemaker_session = sagemaker.Session()
# Resolves the ARN of the role behind the current (IRSA) credentials.
# SageMaker assumes this role itself, so it must also trust sagemaker.amazonaws.com.
role = sagemaker.get_execution_role()

# Define training job
estimator = PyTorch(
    entry_point='train.py',
    source_dir='src',
    role=role,
    sagemaker_session=sagemaker_session,
    instance_type='ml.p3.2xlarge',
    instance_count=1,
    framework_version='1.8.0',
    py_version='py38',
    hyperparameters={
        'epochs': 10,
        'batch-size': 32,
        'learning-rate': 0.001
    },
    # Model output to S3
    output_path='s3://ml-models-bucket/experiments/',

    # Use Spot instances with IRSA for cost optimization
    use_spot_instances=True,
    max_wait=7200,
    max_run=3600
)

# Start training. Input channels are passed to fit(), not to the constructor.
# The data is stored in S3 and read by SageMaker through the execution role.
estimator.fit({
    'training': 's3://ml-data-bucket/train/',
    'validation': 's3://ml-data-bucket/val/'
})


5. Monitoring and Observability Stack

# Prometheus with IRSA for Amazon Managed Service for Prometheus (remote write)
apiVersion: v1
kind: ServiceAccount
metadata:
  name: prometheus-sa
  namespace: monitoring
  annotations:
    eks.amazonaws.com/role-arn: arn:aws:iam::123456789012:role/PrometheusRole
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: prometheus-config
  namespace: monitoring
data:
  prometheus.yml: |
    global:
      scrape_interval: 15s
      evaluation_interval: 15s
    
    remote_write:
    - url: https://aps-workspaces.us-west-2.amazonaws.com/workspaces/ws-12345/api/v1/remote_write
      # Sign requests with SigV4 using the IRSA credentials
      sigv4:
        region: us-west-2
      queue_config:
        max_samples_per_send: 1000
        max_shards: 200
        capacity: 2500
        
    scrape_configs:
    - job_name: 'kubernetes-pods'
      kubernetes_sd_configs:
      - role: pod
      relabel_configs:
      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
        action: keep
        regex: true


6. Event-Driven Architecture

# SQS Consumer with IRSA
apiVersion: apps/v1
kind: Deployment
metadata:
  name: event-processor
  namespace: events
spec:
  replicas: 5
  selector:
    matchLabels:
      app: event-processor
  template:
    metadata:
      labels:
        app: event-processor
    spec:
      serviceAccountName: event-processor-sa
      containers:
      - name: processor
        image: company/event-processor:v1.2.0
        env:
        - name: AWS_REGION
          value: us-west-2
        - name: SQS_QUEUE_URL
          value: https://sqs.us-west-2.amazonaws.com/123456789012/event-queue
        - name: DLQ_URL
          value: https://sqs.us-west-2.amazonaws.com/123456789012/event-dlq
        - name: MAX_MESSAGES
          value: "10"
        - name: VISIBILITY_TIMEOUT
          value: "300"
        resources:
          requests:
            memory: 256Mi
            cpu: 250m
          limits:
            memory: 512Mi
            cpu: 500m
        livenessProbe:
          httpGet:
            path: /health
            port: 8080
          periodSeconds: 30


7. Cost Optimization Patterns

IRSA Cost Optimization Strategies
  • Regional STS Endpoints: Reduce latency and data transfer costs
  • Credential Caching: Minimize STS API calls
  • Spot Instances: Use IRSA with Spot instances for batch workloads
  • Resource Tagging: Track costs by application and team using IRSA

# Cost-optimized IRSA configuration
export AWS_STS_REGIONAL_ENDPOINTS=regional
export AWS_SDK_LOAD_CONFIG=1

# Credential caching needs no extra setting: the SDKs keep temporary
# credentials in memory and refresh them shortly before they expire



Security Best Practices


1. Principle of Least Privilege Implementation

Zero Trust Security Model
  • Granular Permissions: Each ServiceAccount gets only required permissions
  • Resource-Level Access: Restrict access to specific S3 buckets, DynamoDB tables
  • Condition-Based Policies: Add time, IP, and encryption requirements
  • Regular Auditing: Continuously review and remove unused permissions
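
Part of the regular audit can be automated by scanning policy documents for wildcards. The sketch below is a deliberately simple check, not a replacement for IAM Access Analyzer; the function name and the rule set (bare `*` and service-wide `service:*` actions) are illustrative assumptions.

```python
def wildcard_findings(policy):
    """List Allow statements whose Action or Resource is overly broad."""
    findings = []
    statements = policy.get("Statement", [])
    if isinstance(statements, dict):  # a single statement may be a bare object
        statements = [statements]

    for index, statement in enumerate(statements):
        if statement.get("Effect") != "Allow":
            continue
        label = statement.get("Sid", f"statement {index}")
        for field in ("Action", "Resource"):
            values = statement.get(field, [])
            if isinstance(values, str):
                values = [values]
            for value in values:
                if value == "*" or (field == "Action" and value.endswith(":*")):
                    findings.append(f"{label}: {field} allows {value}")
    return findings


broad_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {"Sid": "TooBroad", "Effect": "Allow", "Action": "s3:*", "Resource": "*"},
        {"Sid": "Scoped", "Effect": "Allow", "Action": ["s3:GetObject"],
         "Resource": "arn:aws:s3:::my-data-bucket/*"},
    ],
}

for finding in wildcard_findings(broad_policy):
    print(finding)
```

Running such a check in CI against every IRSA role policy keeps wildcard grants from slipping in unnoticed.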

Advanced IAM Policy Examples

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "RestrictedS3Read",
      "Effect": "Allow",
      "Action": "s3:GetObject",
      "Resource": "arn:aws:s3:::company-data-${aws:PrincipalTag/Environment}/*",
      "Condition": {
        "DateGreaterThan": {
          "aws:CurrentTime": "2024-01-01T00:00:00Z"
        },
        "DateLessThan": {
          "aws:CurrentTime": "2025-12-31T23:59:59Z"
        }
      }
    },
    {
      "Sid": "RestrictedS3Write",
      "Effect": "Allow",
      "Action": "s3:PutObject",
      "Resource": "arn:aws:s3:::company-data-${aws:PrincipalTag/Environment}/*",
      "Condition": {
        "StringEquals": {
          "s3:x-amz-server-side-encryption": "aws:kms",
          "s3:x-amz-server-side-encryption-aws-kms-key-id": "arn:aws:kms:us-west-2:123456789012:key/12345678-1234-1234-1234-123456789012"
        },
        "DateGreaterThan": {
          "aws:CurrentTime": "2024-01-01T00:00:00Z"
        },
        "DateLessThan": {
          "aws:CurrentTime": "2025-12-31T23:59:59Z"
        }
      }
    },
    {
      "Sid": "DynamoDBTableAccess",
      "Effect": "Allow",
      "Action": [
        "dynamodb:GetItem",
        "dynamodb:PutItem",
        "dynamodb:UpdateItem",
        "dynamodb:DeleteItem",
        "dynamodb:Query"
      ],
      "Resource": [
        "arn:aws:dynamodb:${aws:RequestedRegion}:${aws:PrincipalAccount}:table/${aws:PrincipalTag/Application}-*"
      ],
      "Condition": {
        "ForAllValues:StringEquals": {
          "dynamodb:Attributes": ["id", "timestamp", "data", "metadata"]
        },
        "StringEqualsIfExists": {
          "dynamodb:Select": "SPECIFIC_ATTRIBUTES"
        }
      }
    }
  ]
}


2. Advanced Trust Policy Security

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Federated": "arn:aws:iam::123456789012:oidc-provider/oidc.eks.us-west-2.amazonaws.com/id/OIDC-ID"
      },
      "Action": "sts:AssumeRoleWithWebIdentity",
      "Condition": {
        "StringEquals": {
          "oidc.eks.us-west-2.amazonaws.com/id/OIDC-ID:sub": "system:serviceaccount:production:secure-app",
          "oidc.eks.us-west-2.amazonaws.com/id/OIDC-ID:aud": "sts.amazonaws.com",
          "aws:RequestedRegion": ["us-west-2", "us-east-1"]
        },
        "IpAddress": {
          "aws:SourceIp": [
            "10.0.0.0/8",
            "172.16.0.0/12",
            "192.168.0.0/16"
          ]
        },
        "Bool": {
          "aws:SecureTransport": "true"
        }
      }
    }
  ]
}


3. Namespace and Environment Isolation

# Production namespace with strict security
apiVersion: v1
kind: Namespace
metadata:
  name: production
  labels:
    environment: production
    pod-security.kubernetes.io/enforce: restricted
    pod-security.kubernetes.io/audit: restricted
    pod-security.kubernetes.io/warn: restricted
  annotations:
    scheduler.alpha.kubernetes.io/node-selector: environment=production
---
# Network policy for namespace isolation
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: production-isolation
  namespace: production
spec:
  podSelector: {}
  policyTypes:
  - Ingress
  - Egress
  ingress:
  - from:
    - namespaceSelector:
        matchLabels:
          environment: production
  egress:
  - to:
    - namespaceSelector:
        matchLabels:
          environment: production
  - to: []
    ports:
    - protocol: TCP
      port: 443  # HTTPS to AWS APIs
    - protocol: TCP
      port: 53   # DNS
    - protocol: UDP
      port: 53   # DNS


4. Monitoring and Alerting

# CloudWatch monitoring for IRSA
apiVersion: v1
kind: ConfigMap
metadata:
  name: irsa-monitoring-config
  namespace: monitoring
data:
  cloudwatch-config.json: |
    {
      "agent": {
        "metrics_collection_interval": 60,
        "run_as_user": "cwagent"
      },
      "metrics": {
        "namespace": "EKS/IRSA",
        "metrics_collected": {
          "cpu": {
            "measurement": ["cpu_usage_idle", "cpu_usage_iowait"],
            "metrics_collection_interval": 60
          },
          "disk": {
            "measurement": ["used_percent"],
            "metrics_collection_interval": 60,
            "resources": ["*"]
          },
          "mem": {
            "measurement": ["mem_used_percent"],
            "metrics_collection_interval": 60
          }
        }
      },
      "logs": {
        "logs_collected": {
          "files": {
            "collect_list": [
              {
                "file_path": "/var/log/containers/*irsa*.log",
                "log_group_name": "/eks/irsa/applications",
                "log_stream_name": "{instance_id}-{hostname}",
                "timezone": "UTC"
              }
            ]
          }
        }
      }
    }


5. Security Scanning and Compliance


6. Incident Response for IRSA

IRSA Security Incident Response
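  • Immediate Actions: Revoke the role's active sessions and remove the compromised ServiceAccount from the role's trust policy (deleting the ServiceAccount alone does not invalidate credentials that were already issued)
  • Investigation: Analyze CloudTrail logs for unauthorized AssumeRoleWithWebIdentity calls
  • Containment: Tighten the trust policy's sub and aud conditions, and add IP allowlists (aws:SourceIp conditions) where the traffic path allows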
  • Recovery: Create new IRSA roles, update applications, verify security controls
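
One concrete immediate action is to invalidate credentials the role has already issued. The sketch below builds a deny policy keyed on aws:TokenIssueTime, which is the same shape as the inline policy the IAM console attaches when you revoke a role's active sessions. The function name is hypothetical, and attaching the result to the role (for example with aws iam put-role-policy) is left to the caller.

```python
from datetime import datetime, timezone


def build_revoke_sessions_policy(cutoff: datetime) -> dict:
    """Return an IAM policy document that denies every action to
    sessions whose credentials were issued before `cutoff`."""
    # IAM date condition operators expect an ISO 8601 timestamp in UTC.
    cutoff_utc = cutoff.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Deny",
                "Action": "*",
                "Resource": "*",
                "Condition": {
                    "DateLessThan": {"aws:TokenIssueTime": cutoff_utc}
                },
            }
        ],
    }
```

Pods that obtain fresh credentials after the cutoff are unaffected, so this only helps once the trust policy no longer admits the compromised ServiceAccount.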


7. Compliance and Governance

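# OPA Gatekeeper policy for IRSA governance
# Requires Gatekeeper data sync for ServiceAccounts so that data.inventory is populated;
# without it, has_irsa_annotation is never true and every matched pod is rejected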
apiVersion: templates.gatekeeper.sh/v1beta1
kind: ConstraintTemplate
metadata:
  name: irsarequired
spec:
  crd:
    spec:
      names:
        kind: IRSARequired
      validation:
        openAPIV3Schema:
          type: object
          properties:
            exemptNamespaces:
              type: array
              items:
                type: string
  targets:
    - target: admission.k8s.gatekeeper.sh
      rego: |
        package irsarequired
        
        violation[{"msg": msg}] {
          input.review.kind.kind == "Pod"
          input.review.object.spec.serviceAccountName
          not exempt_namespace
          not has_irsa_annotation
          msg := "Pod must use ServiceAccount with IRSA annotation"
        }
        
        has_irsa_annotation {
          sa_name := input.review.object.spec.serviceAccountName
          sa := data.inventory.namespace[input.review.object.metadata.namespace]["v1"]["ServiceAccount"][sa_name]
          sa.metadata.annotations["eks.amazonaws.com/role-arn"]
        }
        
        exempt_namespace {
          input.review.object.metadata.namespace == input.parameters.exemptNamespaces[_]
        }
---
apiVersion: constraints.gatekeeper.sh/v1beta1
kind: IRSARequired
metadata:
  name: irsa-required-production
spec:
  match:
    kinds:
      - apiGroups: [""]
        kinds: ["Pod"]
    namespaces: ["production", "staging"]
  parameters:
    exemptNamespaces: ["kube-system", "kube-public"]
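
To make the decision logic of the Rego rule easier to follow, here is a rough Python equivalent. It is only an illustration: the function name and the shape of the service_accounts mapping (a stand-in for Gatekeeper's synced inventory) are assumptions, not part of Gatekeeper.

```python
ROLE_ANNOTATION = "eks.amazonaws.com/role-arn"


def irsa_violation(pod, service_accounts, exempt_namespaces):
    """Return a violation message, or None if the pod is compliant.

    `service_accounts` maps (namespace, name) to that ServiceAccount's
    annotations dict.
    """
    namespace = pod["metadata"]["namespace"]
    sa_name = pod["spec"].get("serviceAccountName")
    if not sa_name:
        # Mirrors the Rego rule, which only fires when the field is set.
        return None
    if namespace in exempt_namespaces:
        return None
    annotations = service_accounts.get((namespace, sa_name), {})
    if annotations.get(ROLE_ANNOTATION):
        return None
    return "Pod must use ServiceAccount with IRSA annotation"
```

A ServiceAccount that is missing from the mapping is treated the same as one without the annotation, which matches how the Rego rule behaves when the inventory has no entry.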



Monitoring and Operations


1. Comprehensive CloudTrail Monitoring

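{
  "eventVersion": "1.05",
  "userIdentity": {
    "type": "AssumedRole",
    "principalId": "AROAEXAMPLEROLEID:eks-cluster-system-serviceaccount-production-my-app",
    "arn": "arn:aws:sts::123456789012:assumed-role/MyAppRole/eks-cluster-system-serviceaccount-production-my-app",
    "accountId": "123456789012",
    "sessionContext": {
      "sessionIssuer": {
        "type": "Role",
        "arn": "arn:aws:iam::123456789012:role/MyAppRole",
        "userName": "MyAppRole"
      },
      "webIdFederationData": {
        "federatedProvider": "arn:aws:iam::123456789012:oidc-provider/oidc.eks.us-west-2.amazonaws.com/id/EXAMPLE-OIDC-ID"
      }
    }
  },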
  "eventTime": "2024-01-15T10:30:00Z",
  "eventSource": "s3.amazonaws.com",
  "eventName": "GetObject",
  "awsRegion": "us-west-2",
  "sourceIPAddress": "10.0.1.100",
  "resources": [{
    "accountId": "123456789012",
    "type": "AWS::S3::Object",
    "ARN": "arn:aws:s3:::my-data-bucket/data/file.json"
  }]
}

CloudTrail Query Examples
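
A minimal sketch of one such query, run over CloudTrail records that have already been exported and parsed into dictionaries. It counts AssumeRoleWithWebIdentity events per ServiceAccount and outcome. The function name is hypothetical, and it assumes that userIdentity.userName carries the token subject (system:serviceaccount:namespace:name) and that failed calls include an errorCode field; verify both against your own trail before relying on it.

```python
from collections import Counter


def web_identity_assumptions(records):
    """Count AssumeRoleWithWebIdentity events per (subject, outcome)."""
    counts = Counter()
    for record in records:
        if record.get("eventSource") != "sts.amazonaws.com":
            continue
        if record.get("eventName") != "AssumeRoleWithWebIdentity":
            continue
        # Assumption: userName holds the ServiceAccount subject of the token.
        subject = record.get("userIdentity", {}).get("userName", "unknown")
        outcome = "denied" if record.get("errorCode") else "ok"
        counts[(subject, outcome)] += 1
    return counts
```

A sudden rise in the "denied" count for a single subject is a reasonable starting point for an investigation.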


2. Prometheus Metrics for IRSA

# ServiceMonitor for IRSA metrics
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: irsa-metrics
  namespace: monitoring
spec:
  selector:
    matchLabels:
      app: irsa-exporter
  endpoints:
  - port: metrics
    interval: 30s
    path: /metrics
---
# Custom IRSA exporter deployment
apiVersion: apps/v1
kind: Deployment
metadata:
  name: irsa-exporter
  namespace: monitoring
spec:
  replicas: 1
  selector:
    matchLabels:
      app: irsa-exporter
  template:
    metadata:
      labels:
        app: irsa-exporter
    spec:
      serviceAccountName: irsa-monitoring-sa
      containers:
      - name: exporter
        image: company/irsa-exporter:v1.0.0
        ports:
        - containerPort: 8080
          name: metrics
        env:
        - name: AWS_REGION
          value: us-west-2
        - name: CLUSTER_NAME
          value: production-eks

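Key IRSA Metrics to Monitor

The metric names below are illustrative and depend on what the custom exporter above exposes; neither the AWS SDKs nor EKS publish them by default.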

# Token refresh rate
rate(aws_sdk_token_refresh_total[5m])

# STS API call success rate
rate(aws_sts_assume_role_success_total[5m]) / rate(aws_sts_assume_role_total[5m])

# Token expiration warnings
aws_sdk_token_expiry_seconds < 300

# Failed authentication attempts
rate(aws_sts_assume_role_failed_total[5m])

# Regional endpoint usage
aws_sts_regional_endpoint_usage_total


3. Alerting Rules


4. Operational Dashboards

{
  "dashboard": {
    "title": "IRSA Operations Dashboard",
    "panels": [
      {
        "title": "IRSA Token Refresh Rate",
        "type": "graph",
        "targets": [
          {
            "expr": "rate(aws_sdk_token_refresh_total[5m])",
            "legendFormat": "/"
          }
        ]
      },
      {
        "title": "STS API Success Rate",
        "type": "stat",
        "targets": [
          {
            "expr": "rate(aws_sts_assume_role_success_total[5m]) / rate(aws_sts_assume_role_total[5m]) * 100"
          }
        ]
      },
      {
        "title": "AWS API Latency by Service",
        "type": "heatmap",
        "targets": [
          {
            "expr": "histogram_quantile(0.95, rate(aws_api_duration_seconds_bucket[5m]))"
          }
        ]
      }
    ]
  }
}


5. Automated Health Checks
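
A health check can start with the two things the EKS pod identity webhook injects into an IRSA-enabled pod: the AWS_ROLE_ARN and AWS_WEB_IDENTITY_TOKEN_FILE environment variables, and the token file itself. The sketch below checks only those; the function name is hypothetical, and it does not call STS, so it cannot detect trust policy problems.

```python
import os


def check_irsa_environment(environ=None):
    """Return a list of problems found; an empty list means healthy."""
    environ = os.environ if environ is None else environ
    problems = []
    role_arn = environ.get("AWS_ROLE_ARN")
    token_file = environ.get("AWS_WEB_IDENTITY_TOKEN_FILE")
    if not role_arn:
        problems.append("AWS_ROLE_ARN is not set")
    if not token_file:
        problems.append("AWS_WEB_IDENTITY_TOKEN_FILE is not set")
    elif not os.path.isfile(token_file):
        problems.append(f"token file {token_file} does not exist")
    elif os.path.getsize(token_file) == 0:
        problems.append(f"token file {token_file} is empty")
    return problems
```

Run it from a readiness probe or a periodic job and treat any non-empty result as a failure.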



Troubleshooting Guide


Common Issues and Solutions

Top IRSA Issues
  • Token Projection Failures: Missing webhook or incorrect ServiceAccount
  • Trust Policy Mismatches: Incorrect namespace/ServiceAccount in conditions
  • OIDC Provider Issues: Missing or misconfigured OIDC provider
  • Network Connectivity: Blocked access to AWS STS endpoints
  • Clock Skew: Time synchronization issues affecting token validation


1. Authentication Failures

Issue: “An error occurred (AccessDenied) when calling the AssumeRoleWithWebIdentity operation”

Solution: Fix Trust Policy

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Federated": "arn:aws:iam::123456789012:oidc-provider/oidc.eks.us-west-2.amazonaws.com/id/CORRECT-OIDC-ID"
      },
      "Action": "sts:AssumeRoleWithWebIdentity",
      "Condition": {
        "StringEquals": {
          "oidc.eks.us-west-2.amazonaws.com/id/CORRECT-OIDC-ID:sub": "system:serviceaccount:CORRECT-NAMESPACE:CORRECT-SERVICE-ACCOUNT",
          "oidc.eks.us-west-2.amazonaws.com/id/CORRECT-OIDC-ID:aud": "sts.amazonaws.com"
        }
      }
    }
  ]
}
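
Because most AccessDenied errors come from a sub condition that does not match the pod's ServiceAccount, it can help to test the match offline. The sketch below is a simplified check with a hypothetical function name: it looks only at the sub condition under StringEquals and StringLike, approximates IAM wildcards with fnmatch, and ignores the Principal and aud checks that STS also performs.

```python
from fnmatch import fnmatchcase


def trust_policy_allows(policy, issuer, namespace, service_account):
    """Return True if a statement's `sub` condition matches the ServiceAccount.

    `issuer` is the OIDC issuer host and path without the https:// prefix.
    """
    subject = f"system:serviceaccount:{namespace}:{service_account}"
    sub_key = f"{issuer}:sub"
    for statement in policy.get("Statement", []):
        if statement.get("Effect") != "Allow":
            continue
        actions = statement.get("Action", [])
        if isinstance(actions, str):
            actions = [actions]
        if "sts:AssumeRoleWithWebIdentity" not in actions:
            continue
        conditions = statement.get("Condition", {})
        for operator in ("StringEquals", "StringLike"):
            expected = conditions.get(operator, {}).get(sub_key)
            if expected is None:
                continue
            patterns = [expected] if isinstance(expected, str) else expected
            for pattern in patterns:
                if operator == "StringEquals" and pattern == subject:
                    return True
                if operator == "StringLike" and fnmatchcase(subject, pattern):
                    return True
    return False
```

Feed it the output of aws iam get-role (the AssumeRolePolicyDocument field) together with the namespace and ServiceAccount name the pod actually uses.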


2. Token Projection Issues

Issue: “Unable to load AWS credentials”

# Debug token projection: check that the webhook mounted the token
kubectl exec my-pod -n production -- ls -la /var/run/secrets/eks.amazonaws.com/serviceaccount/

# Check that the webhook injected the IRSA environment variables
kubectl exec my-pod -n production -- env | grep -E 'AWS_ROLE_ARN|AWS_WEB_IDENTITY_TOKEN_FILE'

# If either is missing, check the ServiceAccount's role-arn annotation
kubectl get serviceaccount my-app-sa -n production -o yaml

# Check that the pod references the expected ServiceAccount
kubectl get pod my-pod -n production -o yaml | grep -A 10 -B 10 serviceAccount

Solution: Enable Token Projection

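# Fix ServiceAccount: the role-arn annotation is what triggers token injection
apiVersion: v1
kind: ServiceAccount
metadata:
  name: my-app-sa
  namespace: production
  annotations:
    eks.amazonaws.com/role-arn: arn:aws:iam::123456789012:role/MyAppRole
# The IRSA token is a separate projected volume added by the EKS pod identity
# webhook, so automountServiceAccountToken does not have to be true for IRSA.
# Recreate the pods after annotating: the webhook only mutates pods at creation.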


3. Network Connectivity Issues

Issue: “Unable to retrieve credentials from STS”

Solution: Configure Network Access

# Ensure network policies allow HTTPS egress
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-aws-apis
  namespace: production
spec:
  podSelector: {}
  policyTypes:
  - Egress
  egress:
  - to: []
    ports:
    - protocol: TCP
      port: 443
    - protocol: TCP
      port: 53
    - protocol: UDP
      port: 53


4. Clock Skew Issues

Issue: “Token has expired”

Solution: Fix Time Synchronization

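# Containers share the node's clock, so fix time sync (chrony/NTP) on the node first.
# Last-resort init container that steps the clock before the app starts.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app
spec:
  template:
    spec:
      initContainers:
      - name: time-sync
        image: ubuntu:20.04
        command: ['sh', '-c', 'apt-get update && apt-get install -y ntpdate && ntpdate -s time.nist.gov']
        securityContext:
          capabilities:
            add: ["SYS_TIME"]  # Narrower than privileged: true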
      containers:
      - name: app
        image: my-app:latest
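
Before changing anything, it is worth measuring the skew. One approach is to compare the local clock with the Date header of any HTTPS response, such as one from the STS endpoint. The helper below does only the comparison; its name is hypothetical, and fetching the header and choosing an alert threshold are left to the caller.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def clock_skew_seconds(date_header: str, local_now: datetime) -> float:
    """Return local time minus server time, in seconds.

    `date_header` is an HTTP Date header value and `local_now` must be
    timezone-aware. A positive result means the local clock is ahead.
    """
    server_time = parsedate_to_datetime(date_header)
    return (local_now - server_time).total_seconds()
```

For example, pass datetime.now(timezone.utc) as local_now. Note that the Date header has one-second resolution, so small values are not meaningful.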


5. Advanced Debugging Tools
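
A useful debugging step is to read the claims inside the projected token and compare iss, aud, and sub with the role's trust policy. The sketch below decodes the JWT payload with the standard library only. It does not verify the signature, so treat the output as diagnostic information, not as proof of identity. The function name is hypothetical.

```python
import base64
import json


def decode_jwt_claims(token: str) -> dict:
    """Decode the payload of a JWT without verifying its signature."""
    parts = token.strip().split(".")
    if len(parts) != 3:
        raise ValueError("expected a JWT with three dot-separated parts")
    payload = parts[1]
    # JWTs use unpadded base64url; restore the padding before decoding.
    payload += "=" * (-len(payload) % 4)
    return json.loads(base64.urlsafe_b64decode(payload))
```

Inside a pod, read the file named by AWS_WEB_IDENTITY_TOKEN_FILE and pass its contents to this function.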


6. Performance Troubleshooting



Migration Strategies


1. From EC2 Instance Profiles to IRSA

Migration Phases
  1. Assessment: Audit current permissions and applications
  2. Planning: Design IRSA roles with least privilege
  3. Parallel Deployment: Run both systems temporarily
  4. Gradual Migration: Move applications one by one
  5. Verification: Ensure functionality and security
  6. Cleanup: Remove old instance profile permissions
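
For the assessment phase, a summary of which API calls the node role actually makes gives a starting point for each IRSA role. The sketch below groups exported CloudTrail records by service for one role. The function name is hypothetical, and CloudTrail event names do not always equal IAM action names, so treat the output as a list to review and not as a ready-made policy.

```python
from collections import defaultdict


def actions_by_service(records, role_name):
    """Map each AWS service to the sorted API calls made under `role_name`."""
    seen = defaultdict(set)
    marker = f":assumed-role/{role_name}/"
    for record in records:
        arn = record.get("userIdentity", {}).get("arn", "")
        if marker not in arn:
            continue
        # eventSource looks like "s3.amazonaws.com"; keep the service prefix.
        service = record["eventSource"].split(".")[0]
        seen[service].add(record["eventName"])
    return {service: sorted(names) for service, names in seen.items()}
```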

Migration Plan Template

# Phase 1: Create IRSA roles with same permissions
apiVersion: v1
kind: ConfigMap
metadata:
  name: migration-plan
  namespace: migration
data:
  phase1.sh: |
    #!/bin/bash
    set -euo pipefail
    # Create IRSA role with current instance profile permissions
    # (a starting point only; phase 5 narrows this to least privilege)
    CURRENT_POLICIES=$(aws iam list-attached-role-policies --role-name NodeInstanceRole --query 'AttachedPolicies[].PolicyArn' --output text)
    
    for policy in $CURRENT_POLICIES; do
      aws iam attach-role-policy --role-name NewIRSARole --policy-arn "$policy"
    done

  phase2.sh: |
    #!/bin/bash
    # Deploy applications with IRSA (parallel to existing)
    kubectl apply -f irsa-serviceaccount.yaml
    kubectl apply -f irsa-deployment.yaml

  phase3.sh: |
    #!/bin/bash
    # Switch traffic to IRSA-enabled applications
    kubectl patch service my-app -p '{"spec":{"selector":{"version":"irsa"}}}'

  phase4.sh: |
    #!/bin/bash
    # Remove old deployments
    kubectl delete deployment my-app-legacy
    
  phase5.sh: |
    #!/bin/bash
    # Optimize IRSA role permissions (least privilege)
    aws iam detach-role-policy --role-name NewIRSARole --policy-arn arn:aws:iam::aws:policy/AmazonS3FullAccess
    aws iam attach-role-policy --role-name NewIRSARole --policy-arn arn:aws:iam::123456789012:policy/S3SpecificBucketAccess


2. From Hardcoded Credentials to IRSA

# Before: Static credentials read from a Kubernetes Secret (Anti-pattern)
apiVersion: apps/v1
kind: Deployment
metadata:
  name: legacy-app
spec:
  template:
    spec:
      containers:
      - name: app
        image: my-app:legacy
        env:
        - name: AWS_ACCESS_KEY_ID
          valueFrom:
            secretKeyRef:
              name: aws-credentials
              key: access-key-id
        - name: AWS_SECRET_ACCESS_KEY
          valueFrom:
            secretKeyRef:
              name: aws-credentials
              key: secret-access-key

---
# After: IRSA (Recommended)
apiVersion: apps/v1
kind: Deployment
metadata:
  name: modern-app
spec:
  template:
    spec:
      serviceAccountName: app-service-account
      containers:
      - name: app
        image: my-app:modern
        env:
        - name: AWS_REGION
          value: us-west-2
        # No AWS credentials needed!


3. Blue-Green IRSA Migration



Performance Optimization


1. Regional STS Endpoints

# Configure regional STS endpoints for better performance
apiVersion: v1
kind: ConfigMap
metadata:
  name: aws-config
  namespace: production
data:
  config: |
    [default]
    region = us-west-2
    sts_regional_endpoints = regional
    max_attempts = 3
    retry_mode = adaptive
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: optimized-app
spec:
  template:
    spec:
      containers:
      - name: app
        image: my-app:optimized
        env:
        - name: AWS_CONFIG_FILE
          value: /aws-config/config
        - name: AWS_STS_REGIONAL_ENDPOINTS
          value: regional
        - name: AWS_RETRY_MODE
          value: adaptive
        - name: AWS_MAX_ATTEMPTS
          value: "3"
        volumeMounts:
        - name: aws-config
          mountPath: /aws-config
      volumes:
      - name: aws-config
        configMap:
          name: aws-config
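
To see what the regional setting changes, the sketch below derives the STS endpoint URL a client would talk to. It is a simplified illustration with a hypothetical function name: it covers the standard partition and the China partition's DNS suffix only, and the SDKs' own endpoint resolution remains the authority.

```python
def sts_endpoint(region: str, regional: bool = True) -> str:
    """Return the STS endpoint URL for `region`.

    With regional=False this returns the legacy global endpoint.
    """
    if not regional:
        return "https://sts.amazonaws.com"
    # Regions in the China partition use a different DNS suffix.
    suffix = "amazonaws.com.cn" if region.startswith("cn-") else "amazonaws.com"
    return f"https://sts.{region}.{suffix}"
```

With the regional setting, a pod in us-west-2 exchanges its token against sts.us-west-2.amazonaws.com, which keeps the call inside the region and allows it to go through an STS VPC endpoint if one exists.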


2. Credential Caching Optimization

# Python application that reuses clients so IRSA credentials stay cached
import boto3
from botocore.config import Config

class OptimizedCredentialProvider:
    """Create each AWS client once and reuse it.

    botocore's default credential chain already supports IRSA: it reads
    AWS_ROLE_ARN and AWS_WEB_IDENTITY_TOKEN_FILE, calls
    AssumeRoleWithWebIdentity, caches the result in memory, and refreshes
    it before expiry. Sharing one session avoids repeating that exchange.
    """
    def __init__(self, region_name='us-west-2'):
        # One session holds one set of cached credentials for all clients
        self.session = boto3.session.Session(region_name=region_name)
        self.config = Config(
            retries={'max_attempts': 3, 'mode': 'adaptive'},
            max_pool_connections=50
        )
        self._clients = {}

    def get_client(self, service_name):
        # Reuse one client per service so connections are pooled as well.
        # Set AWS_STS_REGIONAL_ENDPOINTS=regional so the token exchange
        # itself uses the regional STS endpoint.
        if service_name not in self._clients:
            self._clients[service_name] = self.session.client(
                service_name, config=self.config
            )
        return self._clients[service_name]

# Usage
provider = OptimizedCredentialProvider()
s3_client = provider.get_client('s3')
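
The idea of refreshing credentials a few minutes before they expire can be shown in isolation. The class below is a simplified illustration, not botocore's implementation: the class name, the fetch callable, and the injectable clock are all assumptions made so the behavior is easy to test.

```python
import time


class CachedCredentials:
    """Cache credentials and refetch them shortly before they expire."""

    def __init__(self, fetch, refresh_window=300, clock=time.time):
        # `fetch` returns a (credentials, expiry_epoch_seconds) tuple.
        self._fetch = fetch
        self._refresh_window = refresh_window
        self._clock = clock
        self._credentials = None
        self._expiry = 0.0

    def get(self):
        # Refresh on first use, or once we are inside the refresh window.
        inside_window = self._clock() >= self._expiry - self._refresh_window
        if self._credentials is None or inside_window:
            self._credentials, self._expiry = self._fetch()
        return self._credentials
```

With a one-hour token and a 300-second window, fetch runs roughly once every 55 minutes no matter how often get() is called.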


3. Connection Pooling and Retries

// Go application with optimized AWS client configuration
package main

import (
    "context"
    "log"
    "net/http"
    "time"

    "github.com/aws/aws-sdk-go-v2/aws"
    "github.com/aws/aws-sdk-go-v2/config"
    "github.com/aws/aws-sdk-go-v2/service/s3"
)

func main() {
    // Load config with optimizations; the default credential chain
    // picks up the IRSA web identity token automatically
    cfg, err := config.LoadDefaultConfig(context.TODO(),
        config.WithRegion("us-west-2"),
        config.WithRetryMaxAttempts(3),
        config.WithRetryMode(aws.RetryModeAdaptive),
        config.WithHTTPClient(&http.Client{
            Timeout: 30 * time.Second,
            Transport: &http.Transport{
                MaxIdleConns:        100,
                MaxIdleConnsPerHost: 10,
                IdleConnTimeout:     30 * time.Second,
            },
        }),
    )

    if err != nil {
        log.Fatal(err)
    }

    // Create S3 client with config
    s3Client := s3.NewFromConfig(cfg)

    // Use client...
    _ = s3Client // Placeholder so the example compiles until the client is used
}


4. Monitoring Performance Metrics

# Performance monitoring dashboard
apiVersion: v1
kind: ConfigMap
metadata:
  name: performance-dashboard
data:
  dashboard.json: |
    {
      "dashboard": {
        "title": "IRSA Performance Metrics",
        "panels": [
          {
            "title": "STS Token Exchange Latency",
            "targets": [
              {
                "expr": "histogram_quantile(0.95, rate(aws_sts_assume_role_duration_seconds_bucket[5m]))",
                "legendFormat": "95th percentile"
              },
              {
                "expr": "histogram_quantile(0.50, rate(aws_sts_assume_role_duration_seconds_bucket[5m]))",
                "legendFormat": "50th percentile"
              }
            ]
          },
          {
            "title": "Credential Cache Hit Rate",
            "targets": [
              {
                "expr": "rate(aws_credential_cache_hits_total[5m]) / rate(aws_credential_requests_total[5m]) * 100"
              }
            ]
          },
          {
            "title": "Regional vs Global STS Usage",
            "targets": [
              {
                "expr": "rate(aws_sts_regional_calls_total[5m])",
                "legendFormat": "Regional"
              },
              {
                "expr": "rate(aws_sts_global_calls_total[5m])",
                "legendFormat": "Global"
              }
            ]
          }
        ]
      }
    }


5. Load Testing IRSA

#!/bin/bash
# IRSA load testing script

NAMESPACE="load-test"
REPLICAS=100
DURATION="300s"

echo "Starting IRSA load test with $REPLICAS replicas for $DURATION..."

# Create load test namespace
kubectl create namespace $NAMESPACE --dry-run=client -o yaml | kubectl apply -f -

# Deploy load test ServiceAccount
kubectl apply -f - <<EOF
apiVersion: v1
kind: ServiceAccount
metadata:
  name: load-test-sa
  namespace: $NAMESPACE
  annotations:
    eks.amazonaws.com/role-arn: arn:aws:iam::123456789012:role/LoadTestRole
EOF

# Deploy load test application
kubectl apply -f - <<EOF
apiVersion: apps/v1
kind: Deployment
metadata:
  name: irsa-load-test
  namespace: $NAMESPACE
spec:
  replicas: $REPLICAS
  selector:
    matchLabels:
      app: load-test
  template:
    metadata:
      labels:
        app: load-test
    spec:
      serviceAccountName: load-test-sa
      containers:
      - name: load-generator
        image: amazon/aws-cli:latest
        command: ['sh', '-c']
        args:
        - |
          while true; do
            aws sts get-caller-identity > /dev/null
            aws s3 ls > /dev/null
            sleep 1
          done
        resources:
          requests:
            memory: 64Mi
            cpu: 50m
          limits:
            memory: 128Mi
            cpu: 100m
EOF

# Wait for deployment
kubectl wait --for=condition=Available deployment/irsa-load-test -n $NAMESPACE --timeout=300s

echo "Load test running for $DURATION..."
sleep "${DURATION%s}"  # Strip the trailing "s" so sleep receives plain seconds

# Collect metrics
echo "Collecting performance metrics..."
kubectl top pods -n $NAMESPACE

# Cleanup
echo "Cleaning up load test..."
kubectl delete namespace $NAMESPACE

echo "Load test complete!"


