Metrics

À la DPT

Our setup

  • Prometheus Operator
  • Prometheus
  • Grafana

Prometheus Operator

Custom Resource Definitions

  • Prometheus
  • ServiceMonitor
  • Alertmanager
  • PrometheusRule

Prometheus


                            apiVersion: monitoring.coreos.com/v1
                            kind: Prometheus
                            metadata:
                              name: prometheus
                              namespace: monitoring
                            spec:
                              serviceAccountName: prometheus
                              serviceMonitorSelector:
                                matchLabels:
                                  team: DPT
                              resources:
                                requests:
                                  memory: "400Mi"
                        

= 1 running Prometheus pod
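
The manifest above sets `serviceAccountName: prometheus`, so that account has to exist and be allowed to discover scrape targets. A minimal sketch of the RBAC this usually needs, modelled on the pattern in the Prometheus Operator getting-started material; the ClusterRole and ClusterRoleBinding names and the exact rule list are assumptions and may need trimming for a given cluster:

```yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: prometheus
  namespace: monitoring
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: prometheus            # assumed name
rules:
  # Read access to the objects Prometheus uses for service discovery.
  - apiGroups: [""]
    resources: [nodes, nodes/metrics, services, endpoints, pods]
    verbs: [get, list, watch]
  # Allow scraping the /metrics endpoint of the API server itself.
  - nonResourceURLs: ["/metrics"]
    verbs: [get]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: prometheus            # assumed name
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: prometheus
subjects:
  - kind: ServiceAccount
    name: prometheus
    namespace: monitoring
```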

ServiceMonitor


                            apiVersion: monitoring.coreos.com/v1
                            kind: ServiceMonitor
                            metadata:
                              name: my-service
                              labels:
                                team: DPT
                            spec:
                              selector:
                                matchLabels:
                                  app: myService
                              endpoints:
                                - port: metrics
                        
Monitors Services with the label "app: myService", scraping their port named "metrics".
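
For reference, a sketch of a Service that this ServiceMonitor would pick up. The selector matches labels on the Service object itself (not on the pods), and `endpoints[].port` refers to the Service port by name. The Service name, the pod label, and the port number 8080 are assumptions for illustration:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: my-service          # hypothetical name
  labels:
    app: myService          # matched by the ServiceMonitor's spec.selector
spec:
  selector:
    app: myService          # assumed pod label
  ports:
    - name: metrics         # matched by the ServiceMonitor's endpoints[].port
      port: 8080            # assumed port number
      targetPort: 8080
```

Unless the ServiceMonitor sets `namespaceSelector`, it only looks at Services in its own namespace.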

Alertmanager


                            apiVersion: monitoring.coreos.com/v1
                            kind: Alertmanager
                            metadata:
                              name: alertmanager
                            spec:
                              replicas: 3
                        
We do not use this yet, but we are looking into it.
More info is available here!

PrometheusRule


                            apiVersion: monitoring.coreos.com/v1
                            kind: PrometheusRule
                            metadata:
                              name: prometheus-example-rules
                            spec:
                              groups:
                              - name: alerting_rules
                                rules:
                                - alert: LoadAverage15m
                                  expr: node_load15 >= 0.50
                                  labels:
                                    severity: major
                                  annotations:
                                    summary: "Instance {{ $labels.instance }} - high load average"
                                    description: "{{ $labels.instance }} (measured by {{ $labels.job }}) has high load average ({{ $value }}) over 15 minutes."
                        
We do not use this yet, but we are looking into it.
More info is available here!
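
Neither an Alertmanager nor a PrometheusRule has any effect until the Prometheus resource points at it. A sketch of the extra fields that would wire them in, assuming both resources are created in the `monitoring` namespace. `alertmanager-operated` is the Service the operator normally creates for Alertmanager pods, with a port named `web`; treat both names as assumptions to verify in the cluster:

```yaml
apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  name: prometheus
  namespace: monitoring
spec:
  serviceAccountName: prometheus
  # Empty selector: load every PrometheusRule found in this namespace.
  ruleSelector: {}
  alerting:
    alertmanagers:
      - namespace: monitoring
        name: alertmanager-operated   # assumed operator-created Service
        port: web                     # assumed port name
```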

Our CRDs

  • 1 Prometheus that selects all ServiceMonitors
  • 1 ServiceMonitor per Service

Prometheus.yml


                            apiVersion: monitoring.coreos.com/v1
                            kind: Prometheus
                            metadata:
                              name: prometheus
                              namespace: monitoring
                            spec:
                              serviceAccountName: prometheus
                              serviceMonitorSelector: {} # Get all serviceMonitors.
                              serviceMonitorNamespaceSelector: {} # Search all namespaces
                              retention: 336h
                              resources:
                                requests:
                                  memory: 400Mi
                              enableAdminAPI: false
                              securityContext:
                                runAsUser: 1000
                                runAsNonRoot: true
                                fsGroup: 2000
                              storage:
                                volumeClaimTemplate:
                                  spec:
                                    storageClassName: gp2
                                    resources:
                                      requests:
                                        storage: 100Gi
                        
Can be found here!

ServiceMonitor in json-to-pdf


                            apiVersion: monitoring.coreos.com/v1
                            kind: ServiceMonitor
                            metadata:
                              name: json-to-pdf
                              labels:
                                team: DPT
                            spec:
                              endpoints:
                                - interval: 15s
                                  port: api
                              selector:
                                matchLabels:
                                  app: json-to-pdf
                        
Can be found here!

Grafana

  • 1 Deployment
  • 3 ConfigMaps
    • Dashboard Location
    • Datasource Definitions
    • Dashboard Definitions

grafana.yml


                            apiVersion: apps/v1
                            kind: Deployment
                            metadata:
                              labels:
                                app: grafana
                              name: grafana
                              namespace: monitoring
                            spec:
                              replicas: 1
                              selector:
                                matchLabels:
                                  app: grafana
                              template:
                                metadata:
                                  labels:
                                    app: grafana
                                spec:
                                  containers:
                                    - image: grafana/grafana@sha256:1f1260f5a97e18547d6aa703602400d2f46162edda4dcf0f96156a693c3e9f4c # 6.2.2
                                      name: grafana
                                      ports:
                                        - containerPort: 3000
                                          name: http
                                      readinessProbe:
                                        httpGet:
                                          path: /api/health
                                          port: http
                                      resources:
                                        limits:
                                          cpu: 200m
                                          memory: 200Mi
                                        requests:
                                          cpu: 100m
                                          memory: 100Mi
                                      volumeMounts:
                                        - mountPath: /var/lib/grafana
                                          name: grafana-storage
                                          readOnly: false
                                        - mountPath: /etc/grafana/provisioning/datasources
                                          name: grafana-datasources
                                          readOnly: false
                                        - mountPath: /etc/grafana/provisioning/dashboards
                                          name: grafana-dashboards
                                        - mountPath: /grafana-dashboard-definitions/0
                                          name: grafana-dashboard-definitions
                                      env:
                                        - name: GF_INSTALL_PLUGINS
                                          value: "grafana-piechart-panel"
                                  securityContext:
                                    runAsNonRoot: true
                                    runAsUser: 65534
                                  serviceAccountName: grafana
                                  volumes:
                                    - emptyDir: {}
                                      name: grafana-storage
                                    - name: grafana-datasources
                                      configMap:
                                        name: grafana-datasources
                                    - name: grafana-dashboards
                                      configMap:
                                        name: grafana-dashboards
                                    - name: grafana-dashboard-definitions
                                      configMap:
                                        name: grafana-dashboard-definitions
                        

Dashboard location


                            apiVersion: v1
                            kind: ConfigMap
                            metadata:
                              name: grafana-dashboards
                              namespace: monitoring
                            data:
                              dashboards.yaml: |-
                                {
                                    "apiVersion": 1,
                                    "providers": [
                                        {
                                            "folder": "",
                                            "name": "0",
                                            "options": {
                                                "path": "/grafana-dashboard-definitions/0"
                                            },
                                            "orgId": 1,
                                            "type": "file"
                                        }
                                    ]
                                }
                        

Datasource Definitions


                            apiVersion: v1
                            kind: ConfigMap
                            metadata:
                              name: grafana-datasources
                              namespace: monitoring
                            data:
                              prometheus.yaml: |-
                                {
                                    "apiVersion": 1,
                                    "datasources": [
                                        {
                                            "name": "prometheus",
                                            "type": "prometheus",
                                            "editable": false,
                                            "orgId": 1,
                                            "version": 1,
                                            "url": "http://prometheus:9090",
                                            "access": "proxy"
                                        }
                                    ]
                                }
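
The datasource URL `http://prometheus:9090` only resolves if a Service named `prometheus` exists in the same namespace as Grafana. A minimal sketch of such a Service, assuming the operator labels the pods it creates with `prometheus: <name of the Prometheus resource>`; the label is worth confirming with `kubectl get pods --show-labels -n monitoring` before relying on it:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: prometheus
  namespace: monitoring
spec:
  selector:
    prometheus: prometheus   # assumed pod label set by the operator
  ports:
    - name: web
      port: 9090
      targetPort: 9090
```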
                        

Dashboard Definitions


                            apiVersion: v1
                            kind: ConfigMap
                            metadata:
                              name: grafana-dashboard-definitions
                              namespace: monitoring
                            data:
                                # Taken from: https://grafana.com/dashboards/8685
                                dashboard-8685.json: '{"annotations":{"list":[{"builtIn":1,"datasource":"--'
                                # Dashboard to view flask metrics on services.
                                service-dashboard.json: |-
                                  {
                                    "annotations": {
                        

Import Dashboard


                            David@~/w/i/the-port> bin/add_grafana_dashboard.py
                            Dashboard Id (from grafana.com): 8685
                            Name: DS_PROMETHEUS
                            Description:
                            Value [prometheus]:
                            ConfigMap file [k8s/monitoring/grafana/dashboard-definitions.yml]:
                        

One new definition is added to dashboard-definitions.yml.

Demo!


                        kubectl port-forward service/grafana 3000 -n monitoring &
                        kubectl port-forward service/prometheus 9090 -n monitoring &
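
The `port-forward` commands above assume that Services named `grafana` and `prometheus` exist in the `monitoring` namespace. A sketch of what the Grafana one could look like, reusing the `app: grafana` label and the container port named `http` from grafana.yml; the Service itself is not shown in these slides, so its exact shape is an assumption:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: grafana
  namespace: monitoring
spec:
  selector:
    app: grafana             # pod label from the Grafana Deployment
  ports:
    - name: http
      port: 3000
      targetPort: http       # container port name from grafana.yml
```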
                    

All k8s manifests can be found at
gh://instantor/the-port/k8s/monitoring