GCP Shared VPC and GKE Cluster Setup Guide

Configure IAM permissions for GKE clusters using Shared VPC architecture


Overview

Google Cloud Platform’s Shared VPC is a powerful feature that enables centralized network management while maintaining project-level resource isolation.

This comprehensive guide walks through the process of setting up GKE clusters across different service projects using Shared VPC architecture, with detailed IAM configuration steps.

Shared VPC allows organizations to centrally manage network resources while enabling independent resource operations across service projects.

This approach ensures consistent security policies, systematic network separation across different environments (dev, staging, production), and efficient collaboration across teams.


Why Shared VPC Matters

Shared VPC is essential for enterprise-grade GCP deployments where network security, cost optimization, and operational efficiency are critical.

It provides a foundation for multi-environment architectures while maintaining security boundaries and enabling centralized network governance.

For organizations running multiple projects and environments, Shared VPC eliminates the complexity of VPC peering while providing superior network control and security posture management.



What is Shared VPC?

Shared VPC allows multiple projects to connect their resources to a common VPC network, enabling secure and efficient communication using internal IP addresses.

This architecture separates network administration from project administration, providing several key benefits:

graph TB subgraph "Organization" subgraph "Host Project" A[VPC Network] B[Subnets] C[Firewall Rules] D[Routes] end subgraph "Service Project 1" E[GKE Cluster] F[Compute Instances] G[Cloud SQL] end subgraph "Service Project 2" H[GKE Cluster] I[App Engine] J[Cloud Functions] end A --> E A --> F A --> G A --> H A --> I A --> J B --> E B --> H end style A fill:#4285f4,color:#fff style E fill:#34a853,color:#fff style H fill:#34a853,color:#fff


Key Benefits of Shared VPC

| Benefit | Description | Business Impact |
|---------|-------------|-----------------|
| Centralized Network Management | Single point of control for network policies, firewall rules, and routing | Reduced operational overhead and consistent security policies |
| Project Isolation | Resources separated by project boundaries while sharing network | Clear cost allocation and security boundaries |
| Simplified Communication | Internal IP communication without VPC peering complexity | Lower latency and reduced network management complexity |
| Environment Separation | Logical separation of dev, staging, and production environments | Improved security and compliance posture |
| Cost Optimization | Shared network resources and reduced data transfer costs | Lower overall infrastructure costs |



Shared VPC Architecture Patterns


Single Host Project Architecture

The most common Shared VPC pattern uses one host project providing network services to multiple service projects:

graph TB subgraph "Single Host Project Architecture" subgraph "Host Project: somaz-network" HP[VPC Network] HS1[Subnet: dev-subnet] HS2[Subnet: prod-subnet] HF[Firewall Rules] HR[Routes & NAT] end subgraph "Service Project: somaz-sp-dev" SP1[GKE Dev Cluster] SP1VM[Dev VMs] end subgraph "Service Project: somaz-sp-prod" SP2[GKE Prod Cluster] SP2VM[Prod VMs] end HS1 --> SP1 HS1 --> SP1VM HS2 --> SP2 HS2 --> SP2VM HF --> SP1 HF --> SP2 end style HP fill:#4285f4,color:#fff style SP1 fill:#34a853,color:#fff style SP2 fill:#ea4335,color:#fff


Multiple Host Projects Architecture

For organizations requiring environment isolation at the network level:

graph TB subgraph "Multiple Host Projects Architecture" subgraph "Dev Host Project" DHP[Dev VPC Network] DHS[Dev Subnets] end subgraph "Prod Host Project" PHP[Prod VPC Network] PHS[Prod Subnets] end subgraph "Dev Service Projects" DSP1[Frontend Dev] DSP2[Backend Dev] DSP3[Data Dev] end subgraph "Prod Service Projects" PSP1[Frontend Prod] PSP2[Backend Prod] PSP3[Data Prod] end DHS --> DSP1 DHS --> DSP2 DHS --> DSP3 PHS --> PSP1 PHS --> PSP2 PHS --> PSP3 end style DHP fill:#34a853,color:#fff style PHP fill:#ea4335,color:#fff



Prerequisites and Project Setup


Example Project Structure

For this guide, we’ll use the following project structure:

| Project Type | Project ID | Purpose | Resources |
|--------------|------------|---------|-----------|
| Host Project | somaz-hp | Network management and shared resources | VPC, Subnets, Firewall Rules, Artifact Registry |
| Service Project | somaz-sp-dev | Development environment | GKE Cluster, Development workloads |
| Service Project | somaz-sp-prod | Production environment | GKE Cluster, Production workloads |


Required APIs and Service Accounts

Before starting the IAM configuration, ensure the following APIs are enabled and understand the service accounts involved:

APIs to Enable:

# Enable required APIs in all projects
gcloud services enable container.googleapis.com
gcloud services enable compute.googleapis.com
gcloud services enable artifactregistry.googleapis.com
gcloud services enable cloudresourcemanager.googleapis.com

# Verify enabled APIs
gcloud services list --enabled --filter="name:container.googleapis.com OR name:compute.googleapis.com"

Key Service Accounts:

| Service Account | Format | Purpose |
|-----------------|--------|---------|
| Google APIs Service Account | <project-number>@cloudservices.gserviceaccount.com | Default service account for Google Cloud services |
| GKE Service Agent | service-<project-number>@container-engine-robot.iam.gserviceaccount.com | GKE cluster operations and management |
| Terraform GKE Service Account | tf-gke-<cluster-name>-<random>@<project>.iam.gserviceaccount.com | Terraform-managed GKE cluster service account |



IAM Configuration Step by Step

This section provides a comprehensive walkthrough of IAM permissions required for GKE clusters in Shared VPC environments.


Step 1: Set Environment Variables

First, establish environment variables for the service accounts and project information:
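A minimal sketch of the variables used in the remaining steps. The project IDs match the example structure above; the service-account addresses are derived from each project's number, following the formats in the table in the previous section:

# Example project IDs from this guide
export HOST_PROJECT="somaz-hp"
export SERVICE_PROJECT="somaz-sp-dev"

# Look up project numbers to construct service account addresses
export HOST_PROJECT_NUM=$(gcloud projects describe $HOST_PROJECT --format="value(projectNumber)")
export SERVICE_PROJECT_NUM=$(gcloud projects describe $SERVICE_PROJECT --format="value(projectNumber)")

# Google APIs service account of the service project
export SERVICE_GOOGLE_API_SA="${SERVICE_PROJECT_NUM}@cloudservices.gserviceaccount.com"

# GKE service agents of both projects
export SERVICE_GKE_SA="service-${SERVICE_PROJECT_NUM}@container-engine-robot.iam.gserviceaccount.com"
export HOST_GKE_SA="service-${HOST_PROJECT_NUM}@container-engine-robot.iam.gserviceaccount.com"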


Step 2: Enable Required APIs

Ensure all necessary APIs are enabled in both host and service projects:

# Enable APIs in service project
gcloud services enable container.googleapis.com --project=$SERVICE_PROJECT
gcloud services enable compute.googleapis.com --project=$SERVICE_PROJECT

# Enable APIs in host project
gcloud services enable container.googleapis.com --project=$HOST_PROJECT
gcloud services enable compute.googleapis.com --project=$HOST_PROJECT

# Verify API enablement
gcloud services list --enabled --project=$SERVICE_PROJECT --filter="name:container.googleapis.com"
gcloud services list --enabled --project=$HOST_PROJECT --filter="name:compute.googleapis.com"


Step 3: Grant Kubernetes Engine Service Agent Role

Grant the roles/container.serviceAgent role to both host and service project service accounts:
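A sketch using the variables from Step 1; both projects' GKE service agents receive the role on the host project:

# Service project's GKE service agent, granted on the host project
gcloud projects add-iam-policy-binding $HOST_PROJECT \
  --member="serviceAccount:$SERVICE_GKE_SA" \
  --role="roles/container.serviceAgent"

# Host project's own GKE service agent
gcloud projects add-iam-policy-binding $HOST_PROJECT \
  --member="serviceAccount:$HOST_GKE_SA" \
  --role="roles/container.serviceAgent"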


Step 4: Grant Editor Role to Service Project

Grant editor permissions within the service project:
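A sketch granting the editor role to the service project's Google APIs service account; narrower predefined roles can be substituted per the best practices section below:

gcloud projects add-iam-policy-binding $SERVICE_PROJECT \
  --member="serviceAccount:$SERVICE_GOOGLE_API_SA" \
  --role="roles/editor"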


Step 5: Grant Network User Role

Grant network access permissions on the host project:
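A project-wide sketch; for tighter control, the same role can instead be granted on individual subnets (the subnet name dev-subnet and region us-central1 below are assumptions):

gcloud projects add-iam-policy-binding $HOST_PROJECT \
  --member="serviceAccount:$SERVICE_GOOGLE_API_SA" \
  --role="roles/compute.networkUser"

gcloud projects add-iam-policy-binding $HOST_PROJECT \
  --member="serviceAccount:$SERVICE_GKE_SA" \
  --role="roles/compute.networkUser"

# Subnet-scoped alternative (assumed subnet name and region)
gcloud compute networks subnets add-iam-policy-binding dev-subnet \
  --project=$HOST_PROJECT \
  --region=us-central1 \
  --member="serviceAccount:$SERVICE_GKE_SA" \
  --role="roles/compute.networkUser"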


Step 6: Grant Additional Required Roles

Grant additional roles needed for full GKE functionality:
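One role GKE requires for Shared VPC is Host Service Agent User, which lets the service project's GKE service agent manage firewall rules and other network resources in the host project; a sketch:

gcloud projects add-iam-policy-binding $HOST_PROJECT \
  --member="serviceAccount:$SERVICE_GKE_SA" \
  --role="roles/container.hostServiceAgentUser"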


Step 7: Organization-Level Permissions for Terraform

For Terraform automation, grant organization-level permissions:
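A sketch; the Terraform service account name and organization ID below are placeholders. roles/compute.xpnAdmin (Shared VPC Admin) is the organization-level role for managing host project attachments:

# Placeholders: replace with your Terraform SA and organization ID
export TF_SA="terraform@${SERVICE_PROJECT}.iam.gserviceaccount.com"
export ORG_ID="123456789012"

gcloud organizations add-iam-policy-binding $ORG_ID \
  --member="serviceAccount:$TF_SA" \
  --role="roles/compute.xpnAdmin"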


Step 8: Artifact Registry Permissions

After creating the GKE cluster with Terraform, grant Artifact Registry access:
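A sketch assuming the node service account created by the Terraform configuration below; the exact account name depends on your cluster name and Terraform setup:

# Node service account created by Terraform (name is an assumption)
export NODE_SA="tf-gke-dev-cluster-nodes@${SERVICE_PROJECT}.iam.gserviceaccount.com"

gcloud projects add-iam-policy-binding $HOST_PROJECT \
  --member="serviceAccount:$NODE_SA" \
  --role="roles/artifactregistry.reader"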



Terraform Integration


Terraform Configuration Example

Here’s a comprehensive Terraform configuration for creating GKE clusters in Shared VPC:

# terraform/main.tf
terraform {
  required_providers {
    google = {
      source  = "hashicorp/google"
      version = "~> 4.0"
    }
  }
}

provider "google" {
  project = var.service_project_id
  region  = var.region
}

# Data sources for existing network resources
data "google_compute_network" "shared_vpc" {
  name    = var.network_name
  project = var.host_project_id
}

data "google_compute_subnetwork" "gke_subnet" {
  name    = var.subnet_name
  project = var.host_project_id
  region  = var.region
}

# GKE Cluster in Shared VPC
resource "google_container_cluster" "shared_vpc_cluster" {
  name     = var.cluster_name
  location = var.region
  project  = var.service_project_id

  # Network configuration for Shared VPC
  network    = data.google_compute_network.shared_vpc.self_link
  subnetwork = data.google_compute_subnetwork.gke_subnet.self_link

  # IP allocation policy for VPC-native cluster
  ip_allocation_policy {
    cluster_secondary_range_name  = var.cluster_secondary_range_name
    services_secondary_range_name = var.services_secondary_range_name
  }

  # Remove default node pool
  remove_default_node_pool = true
  initial_node_count       = 1

  # Network policy configuration
  network_policy {
    enabled = true
  }

  # Workload Identity
  workload_identity_config {
    workload_pool = "${var.service_project_id}.svc.id.goog"
  }

  # Private cluster configuration
  private_cluster_config {
    enable_private_nodes    = true
    enable_private_endpoint = false
    master_ipv4_cidr_block  = var.master_ipv4_cidr_block
  }

  # Master authorized networks
  master_authorized_networks_config {
    dynamic "cidr_blocks" {
      for_each = var.authorized_networks
      content {
        cidr_block   = cidr_blocks.value.cidr_block
        display_name = cidr_blocks.value.display_name
      }
    }
  }
}

# Node pool
resource "google_container_node_pool" "primary_nodes" {
  name       = "${var.cluster_name}-node-pool"
  location   = var.region
  cluster    = google_container_cluster.shared_vpc_cluster.name
  project    = var.service_project_id
  node_count = var.node_count

  node_config {
    preemptible  = var.preemptible
    machine_type = var.machine_type

    # Service account for nodes
    service_account = google_service_account.gke_nodes.email
    oauth_scopes = [
      "https://www.googleapis.com/auth/cloud-platform"
    ]

    # Workload Identity
    workload_metadata_config {
      mode = "GKE_METADATA"
    }
  }

  # Auto-scaling
  autoscaling {
    min_node_count = var.min_node_count
    max_node_count = var.max_node_count
  }

  # Node management
  management {
    auto_repair  = true
    auto_upgrade = true
  }
}

# Service account for GKE nodes
resource "google_service_account" "gke_nodes" {
  account_id   = "tf-gke-${var.cluster_name}-nodes"
  display_name = "GKE Node Service Account for ${var.cluster_name}"
  project      = var.service_project_id
}

# IAM grant for the node service account (google_project_iam_member is
# additive, so it won't remove other members who already hold this role
# on the host project, unlike the authoritative google_project_iam_binding)
resource "google_project_iam_member" "gke_nodes_artifact_registry" {
  project = var.host_project_id
  role    = "roles/artifactregistry.reader"
  member  = "serviceAccount:${google_service_account.gke_nodes.email}"
}


Variables Configuration

# terraform/variables.tf
variable "service_project_id" {
  description = "The service project ID where GKE cluster will be created"
  type        = string
}

variable "host_project_id" {
  description = "The host project ID containing the Shared VPC"
  type        = string
}

variable "region" {
  description = "The region for the GKE cluster"
  type        = string
  default     = "us-central1"
}

variable "network_name" {
  description = "The name of the Shared VPC network"
  type        = string
}

variable "subnet_name" {
  description = "The name of the subnet for GKE cluster"
  type        = string
}

variable "cluster_name" {
  description = "The name of the GKE cluster"
  type        = string
}

variable "cluster_secondary_range_name" {
  description = "The name of the secondary range for cluster IPs"
  type        = string
}

variable "services_secondary_range_name" {
  description = "The name of the secondary range for services"
  type        = string
}

variable "master_ipv4_cidr_block" {
  description = "The IP range for the GKE master"
  type        = string
  default     = "172.16.0.0/28"
}

variable "authorized_networks" {
  description = "List of authorized networks for GKE master"
  type = list(object({
    cidr_block   = string
    display_name = string
  }))
  default = []
}

variable "node_count" {
  description = "Number of nodes in the node pool"
  type        = number
  default     = 3
}

variable "min_node_count" {
  description = "Minimum number of nodes in the node pool"
  type        = number
  default     = 1
}

variable "max_node_count" {
  description = "Maximum number of nodes in the node pool"
  type        = number
  default     = 10
}

variable "machine_type" {
  description = "Machine type for GKE nodes"
  type        = string
  default     = "e2-medium"
}

variable "preemptible" {
  description = "Whether to use preemptible nodes"
  type        = bool
  default     = false
}


Terraform Execution

# Initialize Terraform
terraform init

# Plan the deployment
terraform plan -var-file="environments/dev.tfvars"

# Apply the configuration
terraform apply -var-file="environments/dev.tfvars"

# Verify cluster creation
gcloud container clusters get-credentials cluster-name --region=region --project=service-project-id
kubectl get nodes



Best Practices and Security Guidelines


IAM Security Best Practices

| Practice | Description | Implementation |
|----------|-------------|----------------|
| Principle of Least Privilege | Grant only the minimum required permissions | Use specific predefined roles instead of primitive roles |
| Service Account Segregation | Create dedicated service accounts for different purposes | Separate accounts for GKE nodes, applications, and CI/CD |
| Regular Access Review | Periodically review and audit IAM permissions | Use Cloud Asset Inventory and IAM Recommender |
| Conditional Access | Use IAM conditions for time-based or resource-specific access | Implement conditions for temporary access or specific resources (sketched below) |
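As an illustration of conditional access, a time-bound grant might look like the following; the member, role, and expiry timestamp are assumptions:

gcloud projects add-iam-policy-binding $HOST_PROJECT \
  --member="user:dev@example.com" \
  --role="roles/compute.networkUser" \
  --condition='expression=request.time < timestamp("2026-01-01T00:00:00Z"),title=temporary-network-access'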


Network Security Configuration
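A baseline sketch of firewall rules on the host project; the network name, rule names, and source ranges are assumptions to adapt to your subnets (the 172.16.0.0/28 range matches the default master CIDR in the Terraform variables above):

# Allow internal traffic between Shared VPC workloads (assumed range)
gcloud compute firewall-rules create allow-internal-gke \
  --project=$HOST_PROJECT \
  --network=shared-vpc \
  --allow=tcp,udp,icmp \
  --source-ranges=10.0.0.0/8

# Allow the GKE control plane to reach nodes (webhooks, kubelet)
gcloud compute firewall-rules create allow-gke-master \
  --project=$HOST_PROJECT \
  --network=shared-vpc \
  --allow=tcp:443,tcp:10250 \
  --source-ranges=172.16.0.0/28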


Monitoring and Logging
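A sketch for inspecting GKE and IAM activity with Cloud Logging; the filters are illustrative starting points:

# Recent GKE API activity in the service project
gcloud logging read 'protoPayload.serviceName="container.googleapis.com"' \
  --project=$SERVICE_PROJECT \
  --limit=10

# Recent cluster-level logs
gcloud logging read 'resource.type="k8s_cluster"' \
  --project=$SERVICE_PROJECT \
  --limit=10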



Troubleshooting Common Issues


Permission Denied Errors

Common GKE Shared VPC Issues
  • Cluster creation fails: Verify all service accounts have required roles on host project
  • Node pool creation fails: Check compute.networkUser role for GKE service account
  • Pod networking issues: Verify secondary IP ranges are properly configured
  • Image pull errors: Ensure Artifact Registry reader permissions are granted

Diagnostic Commands
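A few commands that help confirm the Shared VPC attachment and effective IAM bindings, using the variables from Step 1:

# Confirm the service project is attached to the expected host project
gcloud compute shared-vpc get-host-project $SERVICE_PROJECT
gcloud compute shared-vpc list-associated-resources $HOST_PROJECT

# List the roles a service account holds on the host project
gcloud projects get-iam-policy $HOST_PROJECT \
  --flatten="bindings[].members" \
  --filter="bindings.members:$SERVICE_GKE_SA" \
  --format="table(bindings.role)"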


Network Configuration Issues
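When pods fail to get IPs or cluster creation cannot find its subnet, verify the subnet and its secondary ranges from the service project's perspective; a sketch (subnet name and region are assumptions):

# Inspect the subnet's primary and secondary ranges
gcloud compute networks subnets describe dev-subnet \
  --project=$HOST_PROJECT \
  --region=us-central1 \
  --format="yaml(ipCidrRange,secondaryIpRanges)"

# List subnets the service project can actually use for GKE
gcloud container subnets list-usable \
  --project=$SERVICE_PROJECT \
  --network-project=$HOST_PROJECT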



What’s Next?

After successfully setting up GKE clusters with Shared VPC, consider these advanced topics:

  1. Multi-cluster Service Mesh: Implement Istio across multiple GKE clusters
  2. GitOps with ArgoCD: Set up continuous deployment pipelines
  3. Workload Identity: Secure pod-to-GCP service authentication
  4. Network Policies: Implement fine-grained network security
  5. Cross-project Monitoring: Set up centralized observability


Advanced Configurations

graph LR
    A[Basic Shared VPC + GKE] --> B[Workload Identity]
    A --> C[Network Policies]
    B --> D[Multi-cluster Service Mesh]
    C --> E[Zero-trust Networking]
    D --> F[Advanced Observability]
    E --> F


A suggested path for building on this foundation:

  1. Master the basics: Ensure solid understanding of Shared VPC and GKE fundamentals
  2. Implement automation: Use Terraform for all infrastructure provisioning
  3. Security hardening: Apply network policies and Workload Identity
  4. Observability: Set up comprehensive monitoring and logging
  5. Advanced networking: Explore service mesh and multi-cluster architectures


