EKS Monitoring with Amazon Managed Service for Prometheus

Oguzhan
Feb 7, 2023

I want to talk about Amazon Managed Service for Prometheus. We migrated some environments from AWS OpsWorks to EKS. We used the Sensu monitoring tool for our non-container (primitive) environments, but we now choose EKS for new environments and are migrating the existing primitive environments as well. It's a little tricky, but possible.

In any case, for EKS we generally chose cloud-native tools, and Prometheus is a CNCF graduated project.

Our Infrastructure

Here is our monitoring infrastructure:

AMP Infrastructure

I am skipping the Amazon Managed Service for Prometheus setup itself; instead, I will cover EKS integration and rule management.

Prometheus Config
When your AMP (Amazon Managed Service for Prometheus) workspace is ready and your EKS cluster is prepared, you need a Prometheus server and some custom deployments. On the EKS side, the Prometheus configuration looks like the one below.

We followed the Prometheus community Helm chart; the relevant values are:

server:
  remoteWrite:
    - url: ${prometheus_endpoint}
      sigv4:
        region: ${aws_region}
%{ if amp_account != "master" }
        role_arn: arn:aws:iam::${master_aws_account_id}:role/master-prod-amp-role
%{ endif }
      queue_config:
        max_samples_per_send: 1000
        max_shards: 200
        capacity: 2500

We have an AMP workspace, so we don't need to store metrics long-term in any cluster. Configuring the Prometheus Helm chart values as above is enough: when the in-cluster Prometheus server collects metrics, it forwards them to AMP via the remoteWrite configuration.
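
For the sigv4 block above to work without static credentials, the Prometheus server pod needs an AWS identity, typically via IAM Roles for Service Accounts (IRSA). Here is a minimal sketch of the corresponding Helm values, assuming a hypothetical amp-ingest-role that trusts the cluster's OIDC provider and has remote-write permission on the workspace (e.g. the AmazonPrometheusRemoteWriteAccess managed policy):

serviceAccounts:
  server:
    create: true
    annotations:
      # IRSA: lets the Prometheus server pod sign remote-write requests
      # with sigv4. Role name and account variable are examples only.
      eks.amazonaws.com/role-arn: arn:aws:iam::${aws_account_id}:role/amp-ingest-role

If you remote-write across accounts via role_arn as in the values above, this IRSA role additionally needs sts:AssumeRole on the central AMP role.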

Rules
AMP makes Prometheus rule management possible. Prometheus normally ships with default rules, but when you use AMP, rules are managed in the AMP workspace and it is empty by default. You can manage rules via the dashboard or any IaC tool.

The rule format looks like this:

groups:
  - name: kube-apiserver-slos
    rules:
      - alert: KubeAPIErrorBudgetBurn
        annotations:
          description: The API server is burning too much error budget.
          runbook_url: https://runbooks.prometheus-operator.dev/runbooks/kubernetes/kubeapierrorbudgetburn
          summary: The API server is burning too much error budget.
        expr: |-
          sum(apiserver_request:burnrate1h) > (14.40 * 0.01000)
          and
          sum(apiserver_request:burnrate5m) > (14.40 * 0.01000)
        for: 2m
        labels:
          long: 1h
          severity: critical
          short: 5m
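
One thing to watch out for: the expr above uses apiserver_request:burnrate1h and apiserver_request:burnrate5m, which are recording rules rather than raw metrics. Because the AMP workspace starts out empty, those series only exist if the recording rules are either uploaded to AMP alongside the alerts or evaluated by the in-cluster Prometheus and remote-written. Recording rules use the same rule-group format; here is a minimal illustrative sketch (not the actual kubernetes-mixin definitions):

groups:
  - name: example-recording-rules
    rules:
      # Precompute per-instance CPU utilisation over 5m so that alerts
      # and dashboards can query a cheap, pre-aggregated series.
      - record: instance:node_cpu_utilisation:rate5m
        expr: 1 - avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m]))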

Alert Management

AMP also makes alert management possible, for example with PagerDuty. Here is an example:

I chose SNS as the PagerDuty trigger. You need the SNS topic ARN if you use this block.

template_files:
  default_template: |
    {{ define "sns.default.subject" }}[{{ .Status | toUpper }}{{ if eq .Status "firing" }}:{{ .Alerts.Firing | len }}{{ end }}]{{ end }}
    {{ define "__alertmanager" }}AlertManager{{ end }}
    {{ define "__alertmanagerURL" }}{{ .ExternalURL }}/#/alerts?receiver={{ .Receiver | urlquery }}{{ end }}
alertmanager_config: |
  global:
    resolve_timeout: 1m
  route:
    group_by: ["alertname"]
    group_wait: 1m
    group_interval: 5m
    repeat_interval: 3m
    receiver: "pagerduty"
    routes:
      - receiver: "pagerduty"
        group_wait: 1m
        match_re:
          severity: critical
        continue: true

  receivers:
    - name: pagerduty
      sns_configs:
        - send_resolved: true
          topic_arn: "[sns_topic_arn]"
          sigv4:
            region: "[region]"
          message: |
            routing_key: "[routing_key]"
            client_url: {{ .ExternalURL }}
            {{ range .Alerts -}}
            dedup_key: {{ .Labels.alertname }}
            severity: {{ .Labels.severity }}
            description: {{ .Annotations.summary }}
            details:
              details: {{ .Annotations.description }}
              {{ range .Labels.SortedPairs }}
              {{ .Name }}: {{ .Value }}
              {{ end }}
            {{ end }}

Here is an example of the triggered alert:

Pagerduty Prometheus Alert

Oguzhan

Solutions Architect, #AWS, food, chess, books and cheese lover. Philosophy 101.