Installing Modern Monitoring Stack into Kubernetes
Overview
This installs Prometheus, Grafana, Loki and Alloy into the Kubernetes cluster. Prometheus will handle the TSDB metrics gathering, Loki will handle log ingestion, Alloy will handle log collection and data manipulation, and Grafana will display the nice graphs and dashboards we are looking for.
Grafana Admin Credentials
We will supply the default credentials to Grafana using a secret we have created inside our HashiCorp Vault cluster. To provide this to Kubernetes, we will create a secret using the External Secrets Operator (ESO). This creates a Kubernetes secret from Vault and keeps it up to date should we wish to change these credentials in the future. This is a backup login method if our OAuth doesn’t work, or if we want to provide API access to things like Homepage.
We won’t go into creating the credentials inside Vault, but suffice it to say we have a key called grafana-creds created with a username and password field in our Vault. We will create a file called grafana-creds.yaml which we will apply to our cluster to form the ESO secret.
grafana-creds.yaml
---
apiVersion: external-secrets.io/v1
kind: ExternalSecret
metadata:
  name: weepynet-grafana-creds
spec:
  secretStoreRef:
    name: vault-backend
  target:
    name: weepynet-grafana-creds
  data:
    - secretKey: username
      remoteRef:
        key: weepynet/data/grafana-creds
        property: username
    - secretKey: password
      remoteRef:
        key: weepynet/data/grafana-creds
        property: password
We will then apply this secret:
kubectl apply -f grafana-creds.yaml
We can then check that the secret has been created correctly. Running the command below should show all the details of the secret:
kubectl get secret weepynet-grafana-creds -o yaml
kube-prometheus-stack Installation
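Before installing the chart, it can be worth confirming that the secret really holds the fields Grafana will read from it. The sketch below is only an illustration: `secret_field` is a hypothetical helper name, and it assumes `kubectl` is pointed at the cluster and namespace where the ExternalSecret was applied.

```shell
# Print one decoded field of a Kubernetes secret.
# Values under .data are base64-encoded, so the raw value is piped
# through base64 to recover the plain text.
# secret_field is a hypothetical helper name, not part of kubectl.
secret_field() {
  kubectl get secret "$1" -o "jsonpath={.data.$2}" | base64 --decode
}

# Example (run manually):
#   secret_field weepynet-grafana-creds username
```

If the username comes back as expected, ESO has synced the Vault entry and the secret is ready to be referenced from the Helm values.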
We will create a values file for this Helm install so that we can tweak some of the default settings. For our setup, we are going to assign a 10Gi local persistent volume, enable the image renderer, and provide default admin credentials from the external secret backed by our Vault. We will also enable OAuth against our Authentik instance for SSO.
values.yaml
grafana:
  assertNoLeakedSecrets: false
  persistence:
    type: pvc
    enabled: true
    accessModes:
      - ReadWriteMany
    size: 10Gi
  admin:
    existingSecret: "weepynet-grafana-creds"
    userKey: username
    passwordKey: password
  imageRenderer:
    deploymentStrategy: {}
    enabled: true
    replicas: 1
  grafana.ini:
    auth:
      signout_redirect_url: "https://auth.weepynet.com/application/o/grafana/end-session/"
      oauth_auto_login: true
    auth.basic_auth:
      enabled: true
    auth.generic_oauth:
      name: authentik
      enabled: true
      client_id: "redacted"
      client_secret: "redacted"
      scopes: "openid profile email"
      auth_url: "https://auth.weepynet.com/application/o/authorize/"
      token_url: "https://auth.weepynet.com/application/o/token/"
      api_url: "https://auth.weepynet.com/application/o/userinfo/"
      role_attribute_path: contains(groups, 'Grafana Admins') && 'Admin' || contains(groups, 'Grafana Editors') && 'Editor' || 'Viewer'
    server:
      root_url: "https://grafana.weepynet.com"
  additionalDataSources:
    - name: Loki
      type: loki
      access: proxy
      url: http://weepy-loki-gateway.default.svc.cluster.local/
      jsonData:
        timeout: 60
        maxLines: 1000
Setting up OAuth for Authentik is beyond the scope of this document. What is relevant is that we provide the additionalDataSources section, which prepopulates Grafana with the Loki data source automatically.
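Since YAML nesting mistakes in a values file are easy to make, one option is to render the chart locally before installing and check that the Loki data source made it into the output. This is a rough sketch, not part of the original procedure: `loki_datasource_lines` is a hypothetical helper name, and it assumes the rendered data source provisioning contains the literal text `type: loki`.

```shell
# Render the chart locally (nothing is applied to the cluster) and count
# the rendered lines that mention a Loki-typed data source.
# Assumption: the chart renders additionalDataSources as YAML that
# includes "type: loki". loki_datasource_lines is a hypothetical name.
loki_datasource_lines() {
  helm template weepy-prometheus \
    oci://ghcr.io/prometheus-community/charts/kube-prometheus-stack \
    -f "${1:-values.yaml}" | grep -c 'type: loki'
}

# Example (run manually):
#   loki_datasource_lines values.yaml
```

A count of zero would suggest that additionalDataSources is not nested where the chart expects it.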
Install the Helm chart:
helm install weepy-prometheus oci://ghcr.io/prometheus-community/charts/kube-prometheus-stack -f values.yaml
Next, we will need to create an IngressRoute for our Grafana instance.
grafana-ingress.yaml
---
apiVersion: traefik.io/v1alpha1
kind: IngressRoute
metadata:
  name: grafana
spec:
  entryPoints:
    - web
    - websecure
  routes:
    - match: Host(`grafana.weepynet.com`)
      kind: Rule
      services:
        - name: weepy-prometheus-grafana
          port: 80
      middlewares:
        - name: redirect-https
We can then apply it to the cluster:
kubectl apply -f grafana-ingress.yaml
Loki Installation
First, we will need to create a Loki values file.
loki-values.yaml
loki:
  auth_enabled: false
  commonConfig:
    replication_factor: 1
  schemaConfig:
    configs:
      - from: "2024-04-01"
        store: tsdb
        object_store: s3
        schema: v13
        index:
          prefix: loki_index_
          period: 24h
  pattern_ingester:
    enabled: true
  limits_config:
    allow_structured_metadata: true
    volume_enabled: true
  ruler:
    enable_api: true
minio:
  enabled: true
deploymentMode: SingleBinary
singleBinary:
  replicas: 1
# Zero out replica counts of other deployment modes
backend:
  replicas: 0
read:
  replicas: 0
write:
  replicas: 0
ingester:
  replicas: 0
querier:
  replicas: 0
queryFrontend:
  replicas: 0
queryScheduler:
  replicas: 0
distributor:
  replicas: 0
compactor:
  replicas: 0
indexGateway:
  replicas: 0
bloomCompactor:
  replicas: 0
bloomGateway:
  replicas: 0
This is a standard, singleton instance of Loki. There are more complex deployment scenarios available, but this suits what we’re doing well.
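The Loki and Alloy installs in this guide reference their charts as grafana/loki and grafana/alloy, which only resolves if a Helm repository named grafana is registered locally. If that has not been done yet, something along these lines should take care of it; `add_grafana_repo` is just a hypothetical wrapper name for the two commands.

```shell
# Register the Grafana chart repository under the name "grafana" so that
# chart references such as grafana/loki resolve.
# --force-update overwrites an existing entry with the same name, which
# makes the function safe to re-run.
# add_grafana_repo is a hypothetical wrapper name.
add_grafana_repo() {
  helm repo add grafana https://grafana.github.io/helm-charts --force-update
  helm repo update
}

# Example (run manually, once per workstation):
#   add_grafana_repo
```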
Then we will install it with Helm:
helm install weepy-loki grafana/loki -f loki-values.yaml
We can test sending data to Loki using curl. We can exec into any pod running in the cluster and issue a curl call to the push endpoint of the Loki gateway service, as shown below:
curl -H "Content-Type: application/json" -XPOST -s "http://weepy-loki-gateway.default.svc.cluster.local/loki/api/v1/push" \
  --data-raw "{\"streams\": [{\"stream\": {\"job\": \"test\"}, \"values\": [[\"$(date +%s)000000000\", \"fizzbuzz\"]]}]}"
Then verify that Loki received the data using the following command:
curl "http://weepy-loki-gateway.default.svc.cluster.local/loki/api/v1/query_range" --data-urlencode 'query={job="test"}' | jq .data.result
Internal cluster Loki endpoint (push logs to this address):
http://weepy-loki-gateway.default.svc.cluster.local/loki/api/v1/push
Grafana Alloy Installation
We will create the values file. We are going to disable the default config, since we will supply our own manually via a config map. We will also enable mounting the host /var/log into the container so we can get the pod logs natively.
alloy-values.yaml
alloy:
  configMap:
    create: false
  mounts:
    varlog: true
    dockercontainers: true
Helm install:
helm install weepy-alloy grafana/alloy -f alloy-values.yaml
Next, we will need to create the config.alloy file, which provides Alloy with the endpoint to send logs to in Loki.
This config file creates a Loki writer that the logs are sent to. It then has a set of rules to read log files, mutate some labels, and send the entries off to Loki. It also scrapes pod data from the Kubernetes API so you can have native access to logs from pods, which is really convenient.
config.alloy
// Loki Server
loki.write "weepyloki" {
  endpoint {
    url = "http://weepy-loki-gateway.default.svc.cluster.local/loki/api/v1/push"
    // basic_auth {
    //   username = "<USERNAME>"
    //   password = "<PASSWORD>"
    // }
  }
}
// Local Node Log Ingestion
// local.file_match discovers files on the local filesystem using glob patterns and the doublestar library. It returns an array of file paths.
local.file_match "node_logs" {
  path_targets = [{
    // Monitor syslog to scrape node-logs
    __path__ = "/var/log/syslog",
    job = "node/syslog",
    node_name = sys.env("HOSTNAME"),
    cluster = "weepy-cluster",
  }]
}
// loki.source.file reads log entries from files and forwards them to other loki.* components.
// You can specify multiple loki.source.file components by giving them different labels.
loki.source.file "node_logs" {
  targets = local.file_match.node_logs.targets
  forward_to = [loki.write.weepyloki.receiver]
}
// Kubernetes Pod Scraping
// discovery.kubernetes allows you to find scrape targets from Kubernetes resources.
// It watches cluster state and ensures targets are continually synced with what is currently running in your cluster.
discovery.kubernetes "pod" {
  role = "pod"
}
// discovery.relabel rewrites the label set of the input targets by applying one or more relabeling rules.
// If no rules are defined, then the input targets are exported as-is.
discovery.relabel "pod_logs" {
  targets = discovery.kubernetes.pod.targets
  // Label creation - "namespace" field from "__meta_kubernetes_namespace"
  rule {
    source_labels = ["__meta_kubernetes_namespace"]
    action = "replace"
    target_label = "namespace"
  }
  // Label creation - "pod" field from "__meta_kubernetes_pod_name"
  rule {
    source_labels = ["__meta_kubernetes_pod_name"]
    action = "replace"
    target_label = "pod"
  }
  // Label creation - "container" field from "__meta_kubernetes_pod_container_name"
  rule {
    source_labels = ["__meta_kubernetes_pod_container_name"]
    action = "replace"
    target_label = "container"
  }
  // Label creation - "app" field from "__meta_kubernetes_pod_label_app_kubernetes_io_name"
  rule {
    source_labels = ["__meta_kubernetes_pod_label_app_kubernetes_io_name"]
    action = "replace"
    target_label = "app"
  }
  // Label creation - "job" field from "__meta_kubernetes_namespace" and "__meta_kubernetes_pod_container_name"
  // Concatenate values __meta_kubernetes_namespace/__meta_kubernetes_pod_container_name
  rule {
    source_labels = ["__meta_kubernetes_namespace", "__meta_kubernetes_pod_container_name"]
    action = "replace"
    target_label = "job"
    separator = "/"
    replacement = "$1"
  }
  // Label creation - "__path__" field from "__meta_kubernetes_pod_uid" and "__meta_kubernetes_pod_container_name"
  // Concatenate values __meta_kubernetes_pod_uid/__meta_kubernetes_pod_container_name.log
  rule {
    source_labels = ["__meta_kubernetes_pod_uid", "__meta_kubernetes_pod_container_name"]
    action = "replace"
    target_label = "__path__"
    separator = "/"
    replacement = "/var/log/pods/*$1/*.log"
  }
  // Label creation - "container_runtime" field from "__meta_kubernetes_pod_container_id"
  rule {
    source_labels = ["__meta_kubernetes_pod_container_id"]
    action = "replace"
    target_label = "container_runtime"
    regex = "^(\\S+):\\/\\/.+$"
    replacement = "$1"
  }
}
// loki.source.kubernetes tails logs from Kubernetes containers using the Kubernetes API.
loki.source.kubernetes "pod_logs" {
  targets = discovery.relabel.pod_logs.output
  forward_to = [loki.process.pod_logs.receiver]
}
// loki.process receives log entries from other Loki components, applies one or more processing stages,
// and forwards the results to the list of receivers in the component's arguments.
loki.process "pod_logs" {
  stage.static_labels {
    values = {
      cluster = "weepy-cluster",
    }
  }
  forward_to = [loki.write.weepyloki.receiver]
}
// Kubernetes Cluster Events
// loki.source.kubernetes_events tails events from the Kubernetes API and converts them
// into log lines to forward to other Loki components.
loki.source.kubernetes_events "cluster_events" {
  job_name = "integrations/kubernetes/eventhandler"
  log_format = "logfmt"
  forward_to = [
    loki.process.cluster_events.receiver,
  ]
}
// loki.process receives log entries from other Loki components, applies one or more processing stages,
// and forwards the results to the list of receivers in the component's arguments.
loki.process "cluster_events" {
  forward_to = [loki.write.weepyloki.receiver]
  stage.static_labels {
    values = {
      cluster = "weepy-cluster",
    }
  }
  stage.labels {
    values = {
      kubernetes_cluster_events = "job",
    }
  }
}
We will now create the config map from this file. The name of the ConfigMap needs to match the name of the Helm release. In our example, we installed the release as weepy-alloy, therefore the ConfigMap needs to be called weepy-alloy.
kubectl create configmap weepy-alloy "--from-file=config.alloy=./config.alloy"
At this point we can log in to our Grafana instance, and we should be able to see all the metrics and logs coming from our cluster!
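If nothing shows up in Grafana, a quick way to tell whether Alloy is delivering logs at all is to ask Loki which label values it has seen. The sketch below uses Loki's label values endpoint through the same gateway address as the push tests; `loki_label_values` is a hypothetical helper name, and it has to run from a pod inside the cluster because the URL is a cluster-internal service address.

```shell
# Ask Loki which values it has recorded for a given label name.
# The host is the in-cluster gateway service used earlier in this guide.
# loki_label_values is a hypothetical helper name.
loki_label_values() {
  curl -s "http://weepy-loki-gateway.default.svc.cluster.local/loki/api/v1/label/$1/values"
}

# Example (run manually from a pod in the cluster):
#   loki_label_values cluster
#   loki_label_values namespace
```

Once Alloy is shipping logs, the values returned for the cluster label should include weepy-cluster, since the Alloy config attaches that static label to every stream.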