Installing Prometheus and Thanos with Helm

A step-by-step guide to setting up a Prometheus and Thanos monitoring stack


Installation Guide for Prometheus and Thanos

This guide provides detailed, step-by-step instructions for installing and configuring Prometheus and Thanos using Helm charts in Kubernetes environments. Learn how to set up a scalable monitoring infrastructure for both single-cluster and multi-cluster deployments with long-term metrics storage and unified querying capabilities.

Introduction and Prerequisites

Before You Begin

This guide follows our previous post about Prometheus and Thanos, focusing on the practical implementation aspects. Before proceeding with the installation, ensure you have the following prerequisites in place:

  • Kubernetes Cluster: Running cluster with proper networking configuration
  • Storage: Storage provisioner configured for persistent volumes
  • Object Storage: Access to S3, GCS, MinIO, or other compatible object storage for Thanos
  • Helm: Helm 3.x installed and configured
  • kubectl: Properly configured with access to your cluster
  • Namespace: A namespace for your monitoring stack (we'll use "monitoring" in this guide)

If you're setting up a multi-cluster configuration, ensure you have network connectivity between clusters and the proper DNS configuration in place.
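As a quick pre-flight check, the snippet below verifies the tooling and creates the namespace. It assumes the "monitoring" namespace from this guide and prints a hint instead of aborting when a tool is missing:

```shell
# Namespace used throughout this guide (change as needed).
NAMESPACE=monitoring

# Confirm Helm 3.x and kubectl are installed and pointed at the right cluster;
# each check prints a hint instead of aborting when something is missing.
helm version --short || echo "helm not found - install Helm 3.x first"
kubectl config current-context || echo "kubectl is not configured for a cluster"

# Create the namespace only if it does not already exist.
kubectl get namespace "$NAMESPACE" >/dev/null 2>&1 \
  || kubectl create namespace "$NAMESPACE" \
  || echo "could not create namespace $NAMESPACE"
```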



Installing Prometheus

Prometheus can be installed using different Helm charts depending on your requirements. This section covers two popular approaches: using the prometheus-community chart for basic installations and kube-prometheus-stack for a more comprehensive monitoring solution with additional components.

Available Installation Methods

graph TD
  A[Prometheus Installation Methods] --> B[prometheus-community Chart]
  A --> C[kube-prometheus-stack Chart]
  B --> D[Basic Prometheus installation]
  B --> E[Customizable components]
  B --> F[Lighter weight]
  C --> G[Complete monitoring solution]
  C --> H[Includes Grafana, Alertmanager]
  C --> I[Pre-configured alerts and dashboards]
  C --> J[Uses Prometheus Operator CRDs]
Chart Description
prometheus-community
  • Basic Prometheus installation with core components
  • More granular control over individual components
  • Good for specific use cases with custom requirements
  • Lighter weight with less resource consumption
kube-prometheus-stack
  • Comprehensive monitoring solution using the Prometheus Operator
  • Includes Grafana, Alertmanager, and various exporters
  • Pre-configured alerts and dashboards for Kubernetes monitoring
  • Better integration with Thanos for scalable monitoring
  • Uses Custom Resource Definitions for declarative configuration

Installing Prometheus CRDs

Before installing Prometheus with the operator-based approach, you need to install the required Custom Resource Definitions (CRDs):

Installing Required CRDs

Execute these commands to install the Prometheus Operator CRDs:

# Add the Prometheus community repository
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update

# Install CRDs in the monitoring namespace
helm install prometheus-operator-crds -n monitoring prometheus-community/prometheus-operator-crds

# Verify the installation
kubectl get crd | grep monitoring
Why CRDs First?

Installing CRDs separately before the main chart ensures:

  • Clean upgrades without CRD-related conflicts
  • Better management of CRD lifecycle
  • Prevention of accidental CRD deletion during chart uninstallation

Option 1: Installing Prometheus Community Chart

Configuring Prometheus Community Chart

Create a values file (values/somaz.yaml) with the following configuration:

server:
  name: server
  image:
    repository: quay.io/prometheus/prometheus
  persistentVolume:
    enabled: true
    accessModes:
      - ReadWriteOnce
    storageClass: "default"
  replicaCount: 1
  statefulSet:
    enabled: false
  service:
    enabled: true
    type: NodePort

alertmanager:
  enabled: true
  persistence:
    size: 2Gi

kube-state-metrics:
  enabled: true

prometheus-node-exporter:
  enabled: true

prometheus-pushgateway:
  enabled: true
  serviceAnnotations:
    prometheus.io/probe: pushgateway
Installation Commands

Install Prometheus using these commands:

# Install Prometheus
helm install prometheus-community prometheus-community/prometheus -n monitoring -f ./values/somaz.yaml --create-namespace

# Upgrade if needed
helm upgrade prometheus-community prometheus-community/prometheus -n monitoring -f ./values/somaz.yaml
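After the release is installed, you can confirm the pods and the server Service came up. The names below are the chart defaults for a release named "prometheus-community" and may differ if you renamed anything:

```shell
# Release name from the install command above (an assumption if you renamed it).
RELEASE=prometheus-community

# Pods belonging to this release.
kubectl get pods -n monitoring -l "app.kubernetes.io/instance=$RELEASE" \
  || echo "kubectl not available or no pods found yet"

# The server Service is exposed as NodePort per the values file above.
kubectl get svc -n monitoring "$RELEASE-server" \
  || echo "server service not found - check the release name"
```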

Option 2: Installing Kube-Prometheus-Stack

Configuring Kube-Prometheus-Stack

Create a values file (values/somaz.yaml) with this configuration for Thanos integration:

alertmanager:
  enabled: true
  config:
    global:
      resolve_timeout: 5m
  service:
    type: NodePort

grafana:
  enabled: false

prometheus:
  enabled: true
  thanosService:
    enabled: true
  thanosServiceMonitor:
    enabled: true
  thanosServiceExternal:
    enabled: true
    type: NodePort
  ingress:
    enabled: true
    ingressClassName: nginx
    hosts:
      - prometheus.somaz.link
  prometheusSpec:
    replicas: 1
    thanos:
      baseImage: quay.io/thanos/thanos
      version: v0.36.1
      objectStorageConfig:
        existingSecret:
          name: thanos-objstore
          key: objstore.yml
    storageSpec: 
      volumeClaimTemplate:
        spec:
          storageClassName: default
          accessModes: ["ReadWriteOnce"]
          resources:
            requests:
              storage: 100Gi
    externalLabels:
      provider: somaz
      region: seoul
      cluster: mgmt
Key Configuration Points

This configuration:

  • Enables the Thanos sidecar for long-term storage integration
  • References a secret (thanos-objstore) containing object storage configuration
  • Adds external labels to identify metrics from this cluster
  • Configures ingress for accessing Prometheus UI
  • Sets up persistent storage for metrics data
Installation Commands

Install the Kube-Prometheus-Stack using these commands:

# Render the chart with your values to catch configuration errors
# (helm lint only works on local chart paths, so render instead)
helm template kube-prometheus-stack prometheus-community/kube-prometheus-stack -n monitoring -f ./values/somaz.yaml > /dev/null

# Install the stack
helm install kube-prometheus-stack prometheus-community/kube-prometheus-stack -n monitoring -f ./values/somaz.yaml --create-namespace

# Upgrade if needed
helm upgrade kube-prometheus-stack prometheus-community/kube-prometheus-stack -n monitoring -f ./values/somaz.yaml
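When `prometheusSpec.thanos` is set, the operator injects a "thanos-sidecar" container into the Prometheus pod. Listing the pod's containers is a quick way to confirm it; the label and container names below are kube-prometheus-stack defaults:

```shell
# Container name the operator uses for the injected sidecar.
EXPECTED=thanos-sidecar

# Print every container name in the Prometheus pods; "thanos-sidecar"
# should appear alongside "prometheus" and "config-reloader".
kubectl get pods -n monitoring -l app.kubernetes.io/name=prometheus \
  -o jsonpath='{.items[*].spec.containers[*].name}' \
  || echo "kubectl not available or Prometheus pods not running yet"
echo
```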
sequenceDiagram
  participant User as You
  participant Helm as Helm
  participant K8s as Kubernetes API
  participant Operator as Prometheus Operator
  participant Prometheus as Prometheus Pods
  User->>Helm: helm install kube-prometheus-stack
  Helm->>K8s: Apply CRDs and resources
  K8s->>Operator: Create Operator Pod
  Operator->>K8s: Prometheus CR created
  Operator->>Prometheus: Create and configure Prometheus
  Prometheus->>User: Ready for monitoring



Thanos Object Storage Configuration

Thanos requires object storage for long-term metrics retention. This section explains how to configure the object storage secret that both Prometheus and Thanos components will use.

Creating the Object Storage Secret

Object Storage Configuration

Create a file named objstore.yml with your storage provider configuration:

type: s3
config:
  bucket: thanos
  endpoint: minio.storage.svc.cluster.local:9000
  access_key: minioadmin
  secret_key: minioadmin
  insecure: true
Storage Provider Options

Thanos supports various object storage providers:

  • AWS S3: For production environments in AWS
  • Google Cloud Storage: For GCP deployments
  • Azure Blob Storage: For Azure environments
  • MinIO: For on-premises or testing (as shown in the example above)
  • Others: Alibaba OSS, Tencent COS, OpenStack Swift, etc.

Adjust the configuration according to your chosen provider.
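For example, a GCS configuration uses a `service_account` key instead of access keys; the bucket name and inline service-account JSON below are placeholders for your own values:

```yaml
type: GCS
config:
  bucket: thanos
  service_account: |
    {
      "type": "service_account",
      "project_id": "your-project"
    }
```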

Creating the Kubernetes Secret

Create a Kubernetes secret with your object storage configuration:

# Create the secret in the monitoring namespace
kubectl create secret generic thanos-objstore -n monitoring --from-file=objstore.yml
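To confirm the secret was created with the expected key, decode it back out; the secret and namespace names match this guide:

```shell
# Secret name referenced by the Prometheus and Thanos values files.
SECRET=thanos-objstore

# Decode the stored config to verify the objstore.yml key exists and is intact.
kubectl get secret "$SECRET" -n monitoring \
  -o jsonpath='{.data.objstore\.yml}' | base64 -d \
  || echo "secret $SECRET not found - create it before installing Thanos"
echo
```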



Installing Thanos

Thanos extends Prometheus with long-term storage, high availability, and global query capabilities. This section covers deploying Thanos components using the Bitnami Helm chart for both single-cluster and multi-cluster scenarios.

Single-Cluster Thanos Configuration

Internal Cluster Setup

Create a values file (values/somaz.yaml) for a single-cluster Thanos deployment:

global:
  defaultStorageClass: "default"

fullnameOverride: "thanos"
clusterDomain: somaz-cluster.local

existingObjstoreSecret: "thanos-objstore"

query:
  enabled: true
  logLevel: debug
  replicaLabel:
    - prometheus_replica
  stores:
    - dnssrv+_grpc._tcp.kube-prometheus-stack-thanos-discovery.monitoring.svc.somaz-cluster.local
  ingress:
    enabled: true
    hostname: thanos-query.somaz.link
    ingressClassName: nginx

queryFrontend:
  enabled: true
  ingress:
    enabled: true
    hostname: thanos.somaz.link

compactor:
  enabled: true
  retentionResolutionRaw: 60d
  retentionResolution5m: 60d
  retentionResolution1h: 1y
  persistence:
    enabled: true
    size: 10Gi

storegateway:
  enabled: true
  persistence:
    enabled: true
    size: 10Gi
Component Explanations

This configuration sets up the following Thanos components:

  • Query: The frontend for querying metrics from both local Prometheus and object storage
  • Query Frontend: Provides advanced query optimization and caching
  • Compactor: Handles data compaction and downsampling in object storage
  • Store Gateway: Accesses historical metrics in object storage

The stores configuration connects to your Prometheus Thanos sidecar using DNS service discovery.
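The `dnssrv+` scheme tells Thanos to resolve SRV records, which carry both host and port, so no port needs to be hard-coded. The address is assembled from the discovery Service name, namespace, and cluster domain used above (all assumptions for your own cluster):

```shell
# Pieces of the store address; these match the values file above.
SERVICE=kube-prometheus-stack-thanos-discovery
NAMESPACE=monitoring
CLUSTER_DOMAIN=somaz-cluster.local

# dnssrv+ makes Thanos look up SRV records (host + port) at query time.
STORE="dnssrv+_grpc._tcp.${SERVICE}.${NAMESPACE}.svc.${CLUSTER_DOMAIN}"
echo "$STORE"
```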

graph TD
  A[Prometheus + Thanos Sidecar] -->|Upload metrics| B[Object Storage]
  A -->|Serve real-time metrics| C[Thanos Query]
  D[Thanos Store Gateway] -->|Fetch historical data| B
  D -->|Serve historical metrics| C
  E[Thanos Compactor] -->|Compact & downsample| B
  C -->|Serve metrics API| F[Thanos Query Frontend]
  F -->|Expose UI & API| G[Users/Grafana]

Multi-Cluster Thanos Configuration

External Cluster Setup

For external clusters, create a values file (values/somaz-externalcluster.yaml):

query:
  enabled: true
  stores:
    - dnssrv+_grpc._tcp.kube-prometheus-stack-thanos-discovery.monitoring.svc.external-cluster.local
  ingress:
    grpc:
      enabled: true
      hostname: external-cluster-thanos-query.somaz.link
      ingressClassName: "nginx"
      annotations:
        nginx.ingress.kubernetes.io/backend-protocol: "GRPC"
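If you have grpcurl installed, you can probe the gRPC ingress from outside the cluster, since Thanos components register the standard gRPC health service. The hostname is the example ingress host from the values file above:

```shell
# Example gRPC ingress host from the values file above (an assumption).
GRPC_HOST=external-cluster-thanos-query.somaz.link

# grpcurl (assumed installed) calls the standard gRPC health endpoint;
# -insecure skips TLS verification for self-signed ingress certificates.
grpcurl -insecure "$GRPC_HOST:443" grpc.health.v1.Health/Check \
  || echo "grpcurl not available or endpoint unreachable"
```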
Primary Cluster with Multi-Cluster Configuration

Update the main Thanos Query configuration to include the external cluster:

query:
  stores:
    - dnssrv+_grpc._tcp.kube-prometheus-stack-thanos-discovery.monitoring.svc.somaz-cluster.local
    - dnssrv+_grpc._tcp.thanos-multicluster-query-grpc.monitoring.svc.somaz-cluster.local
Multi-Cluster Architecture

In this setup:

  • Each cluster runs Prometheus with Thanos sidecar uploading to shared object storage
  • External clusters expose their Thanos Query endpoints via gRPC
  • The primary cluster's Thanos Query connects to both local and external endpoints
  • All metrics are deduplicated and unified in the primary Thanos Query Frontend
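Once both endpoints are configured, the querier's `/api/v1/stores` endpoint lists every store it knows about along with its health, which is a quick way to confirm the external cluster registered. The hostname is the example ingress host from this guide:

```shell
# Example ingress host for Thanos Query from this guide (an assumption).
QUERY_URL=http://thanos-query.somaz.link

# Lists all registered StoreAPI endpoints with their labels and health.
curl -s --max-time 5 "$QUERY_URL/api/v1/stores" \
  || echo "query endpoint not reachable"
echo
```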
graph TD
  subgraph "Cluster 1 (Primary)"
    A1[Prometheus] -->|Upload| B[Object Storage]
    A1 -->|Serve| C1[Thanos Query 1]
    D1[Store Gateway] -->|Fetch| B
    D1 -->|Serve| C1
    C1 -->|Query| F[Query Frontend]
  end
  subgraph "Cluster 2 (External)"
    A2[Prometheus] -->|Upload| B
    A2 -->|Serve| C2[Thanos Query 2]
  end
  C1 -.->|Connect| C2
  F -->|Serve| G[Users/Grafana]

Installing Thanos Components

Installation Commands

Install Thanos using the Bitnami Helm chart:

# Add the Bitnami repository
helm repo add bitnami https://charts.bitnami.com/bitnami
helm repo update

# Render the chart with your values to catch configuration errors
# (helm lint only works on local chart paths, so render instead)
helm template thanos bitnami/thanos -n monitoring -f ./values/somaz.yaml > /dev/null

# Install Thanos
helm install thanos bitnami/thanos -n monitoring -f ./values/somaz.yaml --create-namespace

# Upgrade if needed
helm upgrade thanos bitnami/thanos -n monitoring -f ./values/somaz.yaml



Verification and Troubleshooting

After installation, verify that all components are working correctly and troubleshoot any issues that may arise. This section provides guidance on common verification steps and troubleshooting techniques.

Verifying the Installation

Check Components Status

Verify that all pods are running correctly:

# Check Prometheus and related components
kubectl get pods -n monitoring -l app.kubernetes.io/name=prometheus

# Check Thanos components
kubectl get pods -n monitoring -l app.kubernetes.io/name=thanos

# Check service endpoints
kubectl get svc -n monitoring

DNS and Connectivity Troubleshooting

Testing DNS Resolution

Verify DNS resolution for service discovery:

# Check DNS resolution for Thanos discovery endpoints
kubectl run -it --rm --image=nicolaka/netshoot dns-test --restart=Never -- dig SRV _grpc._tcp.kube-prometheus-stack-thanos-discovery.monitoring.svc.somaz-cluster.local

# Verify endpoints are registered
kubectl get ep -n monitoring | grep thanos-discovery
Common Issues and Solutions
  • DNS Resolution Failures: Ensure CoreDNS is working correctly and service names are accurate
  • Thanos Query Cannot Find Stores: Verify store endpoints and network connectivity
  • Object Storage Access Issues: Check credentials and endpoint configuration
  • No Metrics in Thanos Query: Verify Prometheus external labels and proper sidecar configuration
  • Ingress Connectivity Problems: Check ingress controller logs and annotations

Accessing User Interfaces

Component UIs

Access the following UIs to verify your installation:

  • Prometheus UI: http://prometheus.somaz.link
  • Thanos Query UI: http://thanos-query.somaz.link
  • Thanos Frontend UI: http://thanos.somaz.link
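Prometheus and the Thanos components also expose a `/-/healthy` endpoint, so reachability can be checked from a terminal; the hostnames are the example ingress hosts listed above:

```shell
# Example ingress hosts from this guide; replace with your own.
for host in prometheus.somaz.link thanos-query.somaz.link thanos.somaz.link; do
  printf '%s: ' "$host"
  curl -s --max-time 5 "http://$host/-/healthy" || echo "unreachable"
  echo
done
```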



Next Steps and Advanced Topics

Once your Prometheus and Thanos installation is complete and verified, you can expand your monitoring infrastructure with additional components and integrations.

Upcoming Content

In Our Next Post

We'll explore additional monitoring capabilities including:

  • Loki with Promtail: Logging solution that integrates with your monitoring stack
  • Node Feature Discovery: Enhancing Kubernetes node capabilities detection
  • Grafana Dashboards: Creating comprehensive visualization dashboards

Advanced Configuration Options

Fine-tuning Your Installation

Consider these advanced configurations for production environments:

  • Setting up high availability for Prometheus and Thanos components
  • Configuring detailed alert rules and notification channels
  • Implementing retention policies and downsampling strategies
  • Adding custom exporters for application-specific metrics
  • Integrating with existing notification systems (PagerDuty, Slack, etc.)



Key Points

💡 Installation Summary
  • Installation Approach
    - kube-prometheus-stack for comprehensive monitoring with Operator
    - prometheus-community chart for lightweight, customizable setups
    - Bitnami Thanos chart for long-term storage and multi-cluster capabilities
  • Key Components
    - Prometheus with Thanos sidecar for metrics collection
    - Object storage backend for long-term metrics retention
    - Thanos Query for unified metric access
    - Thanos Store Gateway for historical data access
  • Multi-Cluster Setup
    - Shared object storage between clusters
    - Federated query across cluster boundaries
    - Centralized view of all metrics
    - Seamless scaling as your infrastructure grows


