Installing Modern Monitoring Stack into Kubernetes

Overview

This guide installs Prometheus, Grafana, Loki, and Alloy into a Kubernetes cluster. Prometheus collects metrics and stores them in its time-series database (TSDB), Loki ingests and stores logs, Alloy collects the logs and reshapes their labels before shipping them, and Grafana displays the graphs and dashboards we are looking for.

Grafana Admin Credentials

We will supply the default admin credentials to Grafana from a secret stored in our HashiCorp Vault cluster. To make it available to Kubernetes, we will create an ExternalSecret using the External Secrets Operator (ESO). ESO creates a Kubernetes secret from the Vault entry and keeps it up to date should we change the credentials in the future. This gives us a backup login method if our OAuth doesn’t work, or if we want to provide API access to things like Homepage.

We won’t go into creating the credentials inside Vault, but suffice it to say we have a key called grafana-creds created with a username and password field in our Vault. We will create a file called grafana-creds.yaml which we will apply to our cluster to form the ESO secret.

grafana-creds.yaml

---
apiVersion: external-secrets.io/v1
kind: ExternalSecret
metadata:
  name: weepynet-grafana-creds
spec:
  secretStoreRef:
    name: vault-backend
  target:
    name: weepynet-grafana-creds 
  data:
    - secretKey: username
      remoteRef:
        key: weepynet/data/grafana-creds
        property: username
    - secretKey: password
      remoteRef:
        key: weepynet/data/grafana-creds
        property: password

We will then apply this secret:

kubectl apply -f grafana-creds.yaml

We can then check that the secret has been created correctly. Running the command below should print the full secret; note that the values under data are base64-encoded:

kubectl get secret weepynet-grafana-creds -o yaml
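To read a single field in plain text, the base64 value has to be decoded. A minimal sketch, assuming kubectl and a base64 that accepts -d are on the PATH; secret_field is a hypothetical helper name, not part of kubectl:

```shell
# secret_field is a hypothetical helper, not a kubectl feature.
# $1: secret name, $2: key inside the secret's data map.
# Prints the decoded value of that key.
secret_field() {
  kubectl get secret "$1" -o "jsonpath={.data.$2}" | base64 -d
}

# Example call (needs access to the cluster):
#   secret_field weepynet-grafana-creds username
```

Avoid running this on shared terminals, since the decoded password is printed to the screen.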

kube-prometheus-stack Installation

We will create a values file for this Helm chart so that we can tweak some of the default settings. For our setup, we will assign a 10Gi local persistent volume, enable the image renderer, and provide the default admin credentials from the external secret backed by our Vault. We will also enable OAuth against our Authentik instance for SSO.

values.yaml

grafana:
  assertNoLeakedSecrets: false
  persistence:
    type: pvc
    enabled: true
    accessModes:
      - ReadWriteMany
    size: 10Gi

  admin:
    existingSecret: "weepynet-grafana-creds"
    userKey: username
    passwordKey: password

  imageRenderer:
    deploymentStrategy: {}
    enabled: true
    replicas: 1

  grafana.ini:
    auth:
      signout_redirect_url: "https://auth.weepynet.com/application/o/grafana/end-session/"
      oauth_auto_login: true
    auth.basic_auth:
      enabled: true
    auth.generic_oauth:
      name: authentik
      enabled: true
      client_id: "redacted"
      client_secret: "redacted"
      scopes: "openid profile email"
      auth_url: "https://auth.weepynet.com/application/o/authorize/"
      token_url: "https://auth.weepynet.com/application/o/token/"
      api_url: "https://auth.weepynet.com/application/o/userinfo/"
      role_attribute_path: contains(groups, 'Grafana Admins') && 'Admin' || contains(groups, 'Grafana Editors') && 'Editor' || 'Viewer'
    server:
      root_url: "https://grafana.weepynet.com"

  additionalDataSources:
    - name: Loki
      type: loki
      access: proxy
      url: http://weepy-loki-gateway.default.svc.cluster.local/
      jsonData:
        timeout: 60
        maxLines: 1000

Setting up OAuth in Authentik is beyond the scope of this document. The relevant part here is the additionalDataSources section, which prepopulates Grafana with the Loki datasource automatically.

Install the helm chart

helm install weepy-prometheus oci://ghcr.io/prometheus-community/charts/kube-prometheus-stack -f values.yaml

Next, we will need to create a Traefik IngressRoute for our Grafana instance.

grafana-ingress.yaml

---
apiVersion: traefik.io/v1alpha1
kind: IngressRoute
metadata:
  name: grafana
spec:
  entryPoints:
    - web
    - websecure
  routes:
    - match: Host(`grafana.weepynet.com`)
      kind: Rule
      services:
        - name: weepy-prometheus-grafana
          port: 80
      middlewares:
        - name: redirect-https

We can then apply it to the cluster

kubectl apply -f grafana-ingress.yaml
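Once the route is reachable, one way to confirm that the Loki datasource from values.yaml was provisioned is to ask Grafana's HTTP API for it by name, using the basic-auth admin credentials. This is only a sketch: grafana_datasource is a hypothetical helper name, and it assumes TLS is already working for grafana.weepynet.com.

```shell
# grafana_datasource is a hypothetical helper for illustration.
# $1: admin username, $2: admin password, $3: datasource name.
# Calls Grafana's "get datasource by name" API endpoint with basic auth.
grafana_datasource() {
  curl -s -u "$1:$2" "https://grafana.weepynet.com/api/datasources/name/$3"
}

# Example call (needs network access to Grafana):
#   grafana_datasource "<username>" "<password>" Loki
```

If the datasource exists, the response should be a JSON description of it. Passing the password as an argument exposes it in shell history, so this is best kept to quick manual checks.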

Loki Installation

First we will need to create a Loki values file:

loki-values.yaml

loki:
  auth_enabled: false
  commonConfig:
    replication_factor: 1
  schemaConfig:
    configs:
      - from: "2024-04-01"
        store: tsdb
        object_store: s3
        schema: v13
        index:
          prefix: loki_index_
          period: 24h
  pattern_ingester:
    enabled: true
  limits_config:
    allow_structured_metadata: true
    volume_enabled: true
  ruler:
    enable_api: true

minio:
  enabled: true

deploymentMode: SingleBinary

singleBinary:
  replicas: 1

# Zero out replica counts of other deployment modes
backend:
  replicas: 0
read:
  replicas: 0
write:
  replicas: 0

ingester:
  replicas: 0
querier:
  replicas: 0
queryFrontend:
  replicas: 0
queryScheduler:
  replicas: 0
distributor:
  replicas: 0
compactor:
  replicas: 0
indexGateway:
  replicas: 0
bloomCompactor:
  replicas: 0
bloomGateway:
  replicas: 0

This is a standard, singleton instance of Loki. There are more complex deployment scenarios available, but this suits what we’re doing well.

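Then we will install it with Helm. This assumes the grafana chart repository (https://grafana.github.io/helm-charts) has already been added to Helm under the name grafana: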

helm install weepy-loki grafana/loki -f loki-values.yaml

We can test sending data to Loki using curl. Exec into any pod in the cluster that has curl available and POST a test log line to the Loki gateway's push endpoint, as shown below:

curl -H "Content-Type: application/json" -XPOST -s "http://weepy-loki-gateway.default.svc.cluster.local/loki/api/v1/push"  \
--data-raw "{\"streams\": [{\"stream\": {\"job\": \"test\"}, \"values\": [[\"$(date +%s)000000000\", \"fizzbuzz\"]]}]}"

Then verify that Loki did receive the data using the following command:

curl "http://weepy-loki-gateway.default.svc.cluster.local/loki/api/v1/query_range" --data-urlencode 'query={job="test"}' | jq .data.result

Internal cluster Loki endpoint (push logs to this address):

http://weepy-loki-gateway.default.svc.cluster.local/loki/api/v1/push
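The escaped JSON in the push test above is easy to get wrong by hand. As a minimal sketch, the payload can be built by a small function instead; loki_payload is a hypothetical helper name, and it assumes the job name and log line contain no characters that need JSON escaping.

```shell
# loki_payload is a hypothetical helper for illustration.
# $1: value for the "job" label, $2: log line,
# $3: optional timestamp in nanoseconds (defaults to the current time).
# Prints a push payload with one stream holding one entry.
loki_payload() {
  lp_ts="${3:-$(date +%s)000000000}"
  printf '{"streams": [{"stream": {"job": "%s"}, "values": [["%s", "%s"]]}]}\n' \
    "$1" "$lp_ts" "$2"
}

# Example call (from a pod inside the cluster):
#   curl -H "Content-Type: application/json" -XPOST -s \
#     "http://weepy-loki-gateway.default.svc.cluster.local/loki/api/v1/push" \
#     --data-raw "$(loki_payload test fizzbuzz)"
```

The timestamp is a string of nanoseconds since the epoch, which is why nine zeros are appended to the output of date +%s.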

Grafana Alloy Installation

We will create the values file. We are going to disable the chart's default config, because we will supply our own manually via a ConfigMap. We will also enable mounting the host's /var/log into the container so we can read the node and pod logs natively.

alloy-values.yaml

alloy:
  configMap:
    create: false

  mounts:
    varlog: true
    dockercontainers: true

Helm install

helm install weepy-alloy grafana/alloy -f alloy-values.yaml

Next we need to create the config.alloy file, which tells Alloy where to send the logs.

This file defines a Loki writer that all logs are forwarded to. It then reads log files from the node, rewrites some labels, and sends the entries to Loki. It also discovers pods through the Kubernetes API and tails their logs, so you have native access to logs from pods, which is really convenient.

config.alloy

// Loki Server
loki.write "weepyloki" {
  endpoint {
    url = "http://weepy-loki-gateway.default.svc.cluster.local/loki/api/v1/push"
    // basic_auth {
      // username = "<USERNAME>"
      // password = "<PASSWORD>"
    // }
  }
}

// Local Node Log Ingestion

// local.file_match discovers files on the local filesystem using glob patterns and the doublestar library. It returns an array of file paths.
local.file_match "node_logs" {
  path_targets = [{
      // Monitor syslog to scrape node-logs
      __path__  = "/var/log/syslog",
      job       = "node/syslog",
      node_name = sys.env("HOSTNAME"),
      cluster   = "weepy-cluster",
  }]
}

// loki.source.file reads log entries from files and forwards them to other loki.* components.
// You can specify multiple loki.source.file components by giving them different labels.
loki.source.file "node_logs" {
  targets    = local.file_match.node_logs.targets
  forward_to = [loki.write.weepyloki.receiver]
}

// Kubernetes Pod Scraping

// discovery.kubernetes allows you to find scrape targets from Kubernetes resources.
// It watches cluster state and ensures targets are continually synced with what is currently running in your cluster.
discovery.kubernetes "pod" {
  role = "pod"
}

// discovery.relabel rewrites the label set of the input targets by applying one or more relabeling rules.
// If no rules are defined, then the input targets are exported as-is.
discovery.relabel "pod_logs" {
  targets = discovery.kubernetes.pod.targets

  // Label creation - "namespace" field from "__meta_kubernetes_namespace"
  rule {
    source_labels = ["__meta_kubernetes_namespace"]
    action = "replace"
    target_label = "namespace"
  }

  // Label creation - "pod" field from "__meta_kubernetes_pod_name"
  rule {
    source_labels = ["__meta_kubernetes_pod_name"]
    action = "replace"
    target_label = "pod"
  }

  // Label creation - "container" field from "__meta_kubernetes_pod_container_name"
  rule {
    source_labels = ["__meta_kubernetes_pod_container_name"]
    action = "replace"
    target_label = "container"
  }

  // Label creation -  "app" field from "__meta_kubernetes_pod_label_app_kubernetes_io_name"
  rule {
    source_labels = ["__meta_kubernetes_pod_label_app_kubernetes_io_name"]
    action = "replace"
    target_label = "app"
  }

  // Label creation -  "job" field from "__meta_kubernetes_namespace" and "__meta_kubernetes_pod_container_name"
  // Concatenate values __meta_kubernetes_namespace/__meta_kubernetes_pod_container_name
  rule {
    source_labels = ["__meta_kubernetes_namespace", "__meta_kubernetes_pod_container_name"]
    action = "replace"
    target_label = "job"
    separator = "/"
    replacement = "$1"
  }

  // Label creation - "__path__" field from "__meta_kubernetes_pod_uid" and "__meta_kubernetes_pod_container_name"
  // Concatenate values __meta_kubernetes_pod_uid/__meta_kubernetes_pod_container_name.log
  rule {
    source_labels = ["__meta_kubernetes_pod_uid", "__meta_kubernetes_pod_container_name"]
    action = "replace"
    target_label = "__path__"
    separator = "/"
    replacement = "/var/log/pods/*$1/*.log"
  }

  // Label creation -  "container_runtime" field from "__meta_kubernetes_pod_container_id"
  rule {
    source_labels = ["__meta_kubernetes_pod_container_id"]
    action = "replace"
    target_label = "container_runtime"
    regex = "^(\\S+):\\/\\/.+$"
    replacement = "$1"
  }
}

// loki.source.kubernetes tails logs from Kubernetes containers using the Kubernetes API.
loki.source.kubernetes "pod_logs" {
  targets    = discovery.relabel.pod_logs.output
  forward_to = [loki.process.pod_logs.receiver]
}

// loki.process receives log entries from other Loki components, applies one or more processing stages,
// and forwards the results to the list of receivers in the component's arguments.
loki.process "pod_logs" {
  stage.static_labels {
      values = {
        cluster = "weepy-cluster",
      }
  }

  forward_to = [loki.write.weepyloki.receiver]
}

// Kubernetes Cluster Events

// loki.source.kubernetes_events tails events from the Kubernetes API and converts them
// into log lines to forward to other Loki components.
loki.source.kubernetes_events "cluster_events" {
  job_name   = "integrations/kubernetes/eventhandler"
  log_format = "logfmt"
  forward_to = [
    loki.process.cluster_events.receiver,
  ]
}

// loki.process receives log entries from other loki components, applies one or more processing stages,
// and forwards the results to the list of receivers in the component's arguments.
loki.process "cluster_events" {
  forward_to = [loki.write.weepyloki.receiver]

  stage.static_labels {
    values = {
      cluster = "weepy-cluster",
    }
  }

  stage.labels {
    values = {
      kubernetes_cluster_events = "job",
    }
  }
}

We will now create the ConfigMap from this file. Because we disabled the chart's own ConfigMap, the name of our ConfigMap needs to match the Helm release name. In our example we installed the release as weepy-alloy, so the ConfigMap must also be called weepy-alloy.

kubectl create configmap weepy-alloy "--from-file=config.alloy=./config.alloy"
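kubectl create fails if the ConfigMap already exists, which gets in the way when the Alloy config is edited later. A common workaround is to render the ConfigMap with a client-side dry run and pipe it to kubectl apply. A minimal sketch, where apply_alloy_config is a hypothetical helper name:

```shell
# apply_alloy_config is a hypothetical helper for illustration.
# $1: ConfigMap name (must match the Helm release), $2: path to the config file.
# Renders the ConfigMap locally, then creates or updates it in the cluster.
apply_alloy_config() {
  kubectl create configmap "$1" "--from-file=config.alloy=$2" \
    --dry-run=client -o yaml | kubectl apply -f -
}

# Example call (needs access to the cluster):
#   apply_alloy_config weepy-alloy ./config.alloy
```

The same call works for both the first creation and later updates.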

At this point we can log in to our Grafana instance, and we should be able to see all the metrics and logs coming from our cluster!
