[AEWS] EKS Observability

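In the AEWS study, we learn the features of Amazon Elastic Kubernetes Service (EKS), AWS's managed Kubernetes, through hands-on labs. This post is part of a series summarizing what I learned while taking part in the study; the posts follow the study's schedule.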

This post covers EKS Observability - Logging in EKS.

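For this lab the EKS WorkerNodeInstanceType is changed to t3.xlarge. In the previous session I left EKS deployed for almost a whole day and worked through the lab slowly while looking up related material. Because this instance type is expensive, this time I deployed EKS and ran through the lab quickly.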

1. EKS Console

Amazon Elastic Kubernetes Service > Clusters > myeks (cluster name) > Resources

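The AWS EKS console also shows a wide range of information about the cluster. When EKS is deployed, ClusterRoles are created by default, and it is through these ClusterRoles that the console can display the information.

The console lists many kinds of objects. Let's look at kube-ops-view, which was deployed for monitoring. Clicking Raw view at the top right also shows the object's YAML.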

2. Logging in EKS

EKS needs three kinds of logging: control plane logging, node logging, and application logging. Control plane logs become visible once cluster logging is enabled.

2-1. Enabling control plane logging

In EKS, the Kubernetes API server, scheduler, controller manager, and the other components below are managed by AWS, so their activity can be inspected only after logging is enabled.

Kubernetes API server component logs (api) – kube-apiserver-<nnn...>
Audit (audit) – kube-apiserver-audit-<nnn...>
Authenticator (authenticator) – authenticator-<nnn...>
Controller manager (controllerManager) – kube-controller-manager-<nnn...>
Scheduler (scheduler) – kube-scheduler-<nnn...>
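As a small illustration of the naming above, the sketch below maps a log stream name to its log type. The function name `stream_to_log_type` is hypothetical, and the sketch assumes stream names always begin with the prefixes listed above. The audit pattern has to be checked before the plain API server pattern, because both start with `kube-apiserver-`.

```shell
# Hypothetical helper (not part of any AWS tool): map a CloudWatch log
# stream name to the EKS control plane log type it belongs to.
stream_to_log_type() {
  case "$1" in
    kube-apiserver-audit-*)    echo "audit" ;;              # must precede kube-apiserver-*
    kube-apiserver-*)          echo "api" ;;
    authenticator-*)           echo "authenticator" ;;
    kube-controller-manager-*) echo "controllerManager" ;;
    kube-scheduler-*)          echo "scheduler" ;;
    *)                         echo "unknown" ;;
  esac
}

# Example with a made-up stream suffix
stream_to_log_type "kube-apiserver-audit-0123abcd"   # prints: audit
```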

All log types are off by default. They can be enabled from the AWS console by clicking Manage logging, or from the CLI. The CLI commands below enable the logs and follow them in real time.

# Enable all control plane log types
aws eks update-cluster-config --region $AWS_DEFAULT_REGION --name $CLUSTER_NAME \
    --logging '{"clusterLogging":[{"types":["api","audit","authenticator","controllerManager","scheduler"],"enabled":true}]}'
 
# Check the log groups
aws logs describe-log-groups | jq
{
  "logGroups": []
}
 
# Tail the logs (see: aws logs tail help)
aws logs tail /aws/eks/$CLUSTER_NAME/cluster | more
 
# Print new log events as they arrive
aws logs tail /aws/eks/$CLUSTER_NAME/cluster --follow
 
# Filter pattern
aws logs tail /aws/eks/$CLUSTER_NAME/cluster --filter-pattern <filter pattern>
 
# Log stream name prefix
aws logs tail /aws/eks/$CLUSTER_NAME/cluster --log-stream-name-prefix <log stream prefix> --follow
aws logs tail /aws/eks/$CLUSTER_NAME/cluster --log-stream-name-prefix kube-controller-manager --follow
# In another terminal, change the coredns replica count to generate controller manager log entries
kubectl scale deployment -n kube-system coredns --replicas=1
kubectl scale deployment -n kube-system coredns --replicas=2
 
# Time range: seconds (s), minutes (m), hours (h), days (d), weeks (w)
aws logs tail /aws/eks/$CLUSTER_NAME/cluster --since 1h30m
 
# Short output format
aws logs tail /aws/eks/$CLUSTER_NAME/cluster --since 1h30m --format short

Following logs in real time from the CLI: the replica count is changed (bottom) while the log is followed (top).
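The `--logging` argument is plain JSON, so enabling only some log types comes down to changing the `types` array. A minimal sketch, assuming a hypothetical helper `build_logging_json` that takes `true` or `false` followed by the log type names:

```shell
# Hypothetical helper: build the JSON document expected by
# `aws eks update-cluster-config --logging` for a chosen set of log types.
# Usage: build_logging_json <true|false> <type> [<type> ...]
build_logging_json() {
  enabled="$1"
  shift
  types=""
  for t in "$@"; do
    # Append a quoted type name, comma-separated after the first one
    types="${types:+$types,}\"$t\""
  done
  printf '{"clusterLogging":[{"types":[%s],"enabled":%s}]}\n' "$types" "$enabled"
}

build_logging_json true api audit
# prints: {"clusterLogging":[{"types":["api","audit"],"enabled":true}]}
```

The result could then be passed as `--logging "$(build_logging_json true api audit)"` to `aws eks update-cluster-config`.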

CloudWatch - checking the log group

After logging is enabled, the cluster's log group in CloudWatch contains log streams for the five log types enabled above.

CloudWatch - Logs Insights

To filter the accumulated logs with queries, use Logs Insights, which is available in the CloudWatch console. Queries look like the examples below.

# Find log entries where an EC2 instance is in the NodeNotReady state
fields @timestamp, @message
| filter @message like /NodeNotReady/
| sort @timestamp desc
 
# In the kube-apiserver-audit logs, count entries per userAgent, most frequent first
fields userAgent, requestURI, @timestamp, @message
| filter @logStream ~= "kube-apiserver-audit"
| stats count(userAgent) as count by userAgent
| sort count desc
 
# kube-scheduler logs, newest first
fields @timestamp, @message
| filter @logStream ~= "kube-scheduler"
| sort @timestamp desc
 
# authenticator logs, newest first
fields @timestamp, @message
| filter @logStream ~= "authenticator"
| sort @timestamp desc
 
# kube-controller-manager logs, newest first
fields @timestamp, @message
| filter @logStream ~= "kube-controller-manager"
| sort @timestamp desc

Running the second example against the audit log shows which user agents called the API.

Control plane metrics can also be viewed in Prometheus format.

# Metric line format: metric_name{label="value"[,...]} value
kubectl get --raw /metrics | more
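Each non-comment line of that output has the shape `name{labels} value`. The sketch below reduces such output to `name value` pairs with awk. The helper name `parse_metrics` is hypothetical, and the sketch assumes lines carry no trailing timestamp.

```shell
# Hypothetical helper: read Prometheus text-format metrics on stdin and
# print "<metric name> <value>" for every sample line.
parse_metrics() {
  awk '!/^#/ && NF >= 2 {      # skip "# HELP" / "# TYPE" comments and blank lines
    name = $1
    sub(/[{].*/, "", name)     # drop the {label="value",...} part, if any
    print name, $NF            # the value is the last field
  }'
}

# Example on a small hand-written sample
printf '%s\n' \
  '# TYPE apiserver_request_total counter' \
  'apiserver_request_total{code="200",verb="GET"} 42' \
  'process_start_time_seconds 1.68e+09' | parse_metrics
```

Against a live cluster this could be used as `kubectl get --raw /metrics | parse_metrics | head`.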


Disabling logging

# Disable EKS control plane logging (CloudWatch Logs)
eksctl utils update-cluster-logging --cluster $CLUSTER_NAME --region $AWS_DEFAULT_REGION --disable-types all --approve
 
# Delete the log group
aws logs delete-log-group --log-group-name /aws/eks/$CLUSTER_NAME/cluster

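Container application logs can be viewed with the kubectl logs command. Regardless of the application type or where its log files are, a single command shows them without logging in to the pod. Let's deploy an NGINX web server to see how this is possible.
The annotations in the NGINX deployment's values file set load-balancer-name, ssl-redirect, and group.name.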

# Deploy the NGINX web server
helm repo add bitnami https://charts.bitnami.com/bitnami
 
# Look up the ARN of the ACM certificate in the current region
CERT_ARN=$(aws acm list-certificates --query 'CertificateSummaryList[].CertificateArn[]' --output text)
echo $CERT_ARN
 
# Check the domain
echo $MyDomain
 
# Create the values file
cat <<EOT > nginx-values.yaml
service:
    type: NodePort
 
ingress:
  enabled: true
  ingressClassName: alb
  hostname: nginx.$MyDomain
  path: /*
  annotations: 
    alb.ingress.kubernetes.io/scheme: internet-facing
    alb.ingress.kubernetes.io/target-type: ip
    alb.ingress.kubernetes.io/listen-ports: '[{"HTTPS":443}, {"HTTP":80}]'
    alb.ingress.kubernetes.io/certificate-arn: $CERT_ARN
    alb.ingress.kubernetes.io/success-codes: 200-399
    alb.ingress.kubernetes.io/load-balancer-name: $CLUSTER_NAME-ingress-alb
    alb.ingress.kubernetes.io/group.name: study
    alb.ingress.kubernetes.io/ssl-redirect: '443'
EOT
cat nginx-values.yaml | yh
 
--
 
# Deploy
helm install nginx bitnami/nginx --version 14.1.0 -f nginx-values.yaml
 
# Check
kubectl get ingress,deploy,svc,ep nginx
kubectl get targetgroupbindings # check the ALB target group
 
# Print the URL and access it
echo -e "Nginx WebServer URL = https://nginx.$MyDomain"
curl -s https://nginx.$MyDomain
kubectl logs deploy/nginx -f
 
# Access repeatedly
while true; do curl -s https://nginx.$MyDomain -I | head -n 1; date; sleep 1; done
 
# (Reference) To delete
helm uninstall nginx

Try the kubectl logs command and check where the container's log files are.

# Follow the logs
kubectl logs deploy/nginx -f
 
# Access the nginx web page
 
# Check where the container's log files are
kubectl exec -it deploy/nginx -- ls -l /opt/bitnami/nginx/logs/
total 0 
lrwxrwxrwx 1 root root 11 Apr 24 10:13 access.log -> /dev/stdout 
lrwxrwxrwx 1 root root 11 Apr 24 10:13 error.log -> /dev/stderr


When the container image is built, the application's main log (access.log) is symlinked to stdout and the error log (error.log) to stderr. That is why the logs can be read with kubectl logs without entering the pod.
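The same mechanism can be tried outside a container. The sketch below is only an illustration: the function name `log_via_symlink` and the temporary directory are assumptions, not part of the Bitnami image. It creates the two symlinks and appends one line to each "file", which then appears on the process's stdout and stderr.

```shell
# Illustration only: reproduce the access.log -> /dev/stdout and
# error.log -> /dev/stderr symlinks in a temporary directory.
# Usage: log_via_symlink <access line> <error line>
log_via_symlink() {
  dir="$(mktemp -d)"
  ln -s /dev/stdout "$dir/access.log"
  ln -s /dev/stderr "$dir/error.log"
  # Writing to the "log files" actually writes to the process's streams
  printf '%s\n' "$1" >> "$dir/access.log"
  printf '%s\n' "$2" >> "$dir/error.log"
  rm -rf "$dir"
}
```

Calling `log_via_symlink "GET / 200" "upstream timed out" 2>/dev/null` should print only the access line, since the error line goes to stderr. A container runtime captures both streams, which is what kubectl logs reads.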

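This approach has two limits: 1) logs of terminated pods cannot be viewed with kubectl logs, and 2) with the default kubelet settings the maximum log file size is 10Mi, so logs beyond 10Mi cannot be viewed in full.

2-3. Pod logging

Pod logs can be collected with two components: CloudWatch Container Insights and Fluent Bit.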

Container Insights metrics in Amazon CloudWatch & Fluent Bit (Logs)

Fluent Bit and the CloudWatch Agent run on every node as DaemonSets. Fluent Bit collects the logs and the CloudWatch Agent collects the metrics, and both are delivered to CloudWatch Logs.

https://aws.amazon.com/ko/blogs/containers/fluent-bit-integration-in-cloudwatch-container-insights-for-eks/

The Fluent Bit container collects three kinds of logs.

1. Container/pod logs: /aws/containerinsights/Cluster_Name/application, log source (all log files in /var/log/containers)
2. Node (host) logs: /aws/containerinsights/Cluster_Name/host, log source (logs from /var/log/dmesg, /var/log/secure, and /var/log/messages)
3. Kubernetes data plane logs: /aws/containerinsights/Cluster_Name/dataplane, log source (/var/log/journal for kubelet.service, kubeproxy.service, and docker.service)

Let's go straight to the lab. After setting the configuration values as variables and installing, SSHing into each worker node shows fluent-bit listening on port 2020.

application-log.conf shows how Fluent Bit takes logs through INPUT/FILTER/OUTPUT and ships them.

FluentBitHttpServer='On'
FluentBitHttpPort='2020'
FluentBitReadFromHead='Off'
FluentBitReadFromTail='On'
curl -s https://raw.githubusercontent.com/aws-samples/amazon-cloudwatch-container-insights/latest/k8s-deployment-manifest-templates/deployment-mode/daemonset/container-insights-monitoring/quickstart/cwagent-fluent-bit-quickstart.yaml | sed 's/{{cluster_name}}/'${CLUSTER_NAME}'/;s/{{region_name}}/'${AWS_DEFAULT_REGION}'/;s/{{http_server_toggle}}/"'${FluentBitHttpServer}'"/;s/{{http_server_port}}/"'${FluentBitHttpPort}'"/;s/{{read_from_head}}/"'${FluentBitReadFromHead}'"/;s/{{read_from_tail}}/"'${FluentBitReadFromTail}'"/' | kubectl apply -f -
 
namespace/amazon-cloudwatch created
serviceaccount/cloudwatch-agent created
clusterrole.rbac.authorization.k8s.io/cloudwatch-agent-role created
clusterrolebinding.rbac.authorization.k8s.io/cloudwatch-agent-role-binding created
configmap/cwagentconfig created
daemonset.apps/cloudwatch-agent created
configmap/fluent-bit-cluster-info created
serviceaccount/fluent-bit created
clusterrole.rbac.authorization.k8s.io/fluent-bit-role created
clusterrolebinding.rbac.authorization.k8s.io/fluent-bit-role-binding created
configmap/fluent-bit-config created
daemonset.apps/fluent-bit created
 
# Verify the installation
kubectl get-all -n amazon-cloudwatch
kubectl get ds,pod,cm,sa -n amazon-cloudwatch
kubectl describe clusterrole cloudwatch-agent-role fluent-bit-role                          # check the ClusterRoles
kubectl describe clusterrolebindings cloudwatch-agent-role-binding fluent-bit-role-binding  # check the ClusterRoleBindings
kubectl -n amazon-cloudwatch logs -l name=cloudwatch-agent -f # check the pod logs
kubectl -n amazon-cloudwatch logs -l k8s-app=fluent-bit -f    # check the pod logs
for node in $N1 $N2 $N3; do echo ">>>>> $node <<<<<"; ssh ec2-user@$node sudo ss -tnlp | grep fluent-bit; echo; done
 
# Check the cloudwatch-agent configuration
kubectl describe cm cwagentconfig -n amazon-cloudwatch
{
  "agent": {
    "region": "ap-northeast-2"
  },
  "logs": {
    "metrics_collected": {
      "kubernetes": {
        "cluster_name": "myeks",
        "metrics_collection_interval": 60
      }
    },
    "force_flush_interval": 5
  }
}
 
# How the CloudWatch agent pod collects data: look at the HostPath entries under Volumes. It mounts the host's / path. Is that safe from a security standpoint? Could the scope be narrowed?
kubectl describe -n amazon-cloudwatch ds cloudwatch-agent
...
ssh ec2-user@$N1 sudo tree /dev/disk
...
 
# Check the Fluent Bit cluster info
kubectl get cm -n amazon-cloudwatch fluent-bit-cluster-info -o yaml | yh
apiVersion: v1
data:
  cluster.name: myeks
  http.port: "2020"
  http.server: "On"
  logs.region: ap-northeast-2
  read.head: "Off"
  read.tail: "On"
kind: ConfigMap
...
 
# Check the Fluent Bit INPUT/FILTER/OUTPUT configuration - link
## Configuration sections: application-log.conf, dataplane-log.conf, fluent-bit.conf, host-log.conf, parsers.conf
kubectl describe cm fluent-bit-config -n amazon-cloudwatch
...
application-log.conf:
----
[INPUT]
    Name                tail
    Tag                 application.*
    Exclude_Path        /var/log/containers/cloudwatch-agent*, /var/log/containers/fluent-bit*, /var/log/containers/aws-node*, /var/log/containers/kube-proxy*
    Path                /var/log/containers/*.log
    multiline.parser    docker, cri
    DB                  /var/fluent-bit/state/flb_container.db
    Mem_Buf_Limit       50MB
    Skip_Long_Lines     On
    Refresh_Interval    10
    Rotate_Wait         30
    storage.type        filesystem
    Read_from_Head      ${READ_FROM_HEAD}
 
[FILTER]
    Name                kubernetes
    Match               application.*
    Kube_URL            https://kubernetes.default.svc:443
    Kube_Tag_Prefix     application.var.log.containers.
    Merge_Log           On
    Merge_Log_Key       log_processed
    K8S-Logging.Parser  On
    K8S-Logging.Exclude Off
    Labels              Off
    Annotations         Off
    Use_Kubelet         On
    Kubelet_Port        10250
    Buffer_Size         0
 
[OUTPUT]
    Name                cloudwatch_logs
    Match               application.*
    region              ${AWS_REGION}
    log_group_name      /aws/containerinsights/${CLUSTER_NAME}/application
    log_stream_prefix   ${HOST_NAME}-
    auto_create_group   true
    extra_user_agent    container-insights
...
 
# How the Fluent Bit pod collects data: look at the HostPath entries under Volumes
kubectl describe -n amazon-cloudwatch ds fluent-bit
...
ssh ec2-user@$N1 sudo tree /var/log
...
 
# (Reference) To delete
curl -s https://raw.githubusercontent.com/aws-samples/amazon-cloudwatch-container-insights/latest/k8s-deployment-manifest-templates/deployment-mode/daemonset/container-insights-monitoring/quickstart/cwagent-fluent-bit-quickstart.yaml | sed 's/{{cluster_name}}/'${CLUSTER_NAME}'/;s/{{region_name}}/'${AWS_DEFAULT_REGION}'/;s/{{http_server_toggle}}/"'${FluentBitHttpServer}'"/;s/{{http_server_port}}/"'${FluentBitHttpPort}'"/;s/{{read_from_head}}/"'${FluentBitReadFromHead}'"/;s/{{read_from_tail}}/"'${FluentBitReadFromTail}'"/' | kubectl delete -f -
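The install and delete commands fill in the `{{...}}` placeholders of the manifest with sed before passing it to kubectl. A reduced sketch of that substitution on a two-line sample follows. The helper name `render_template` and the sample text are assumptions for illustration, not the real manifest.

```shell
# Illustration only: substitute {{cluster_name}} and {{region_name}}
# placeholders in text read from stdin.
# Usage: render_template <cluster name> <region>
render_template() {
  sed "s/{{cluster_name}}/$1/;s/{{region_name}}/$2/"
}

printf '%s\n' \
  'cluster.name: {{cluster_name}}' \
  'logs.region: {{region_name}}' | render_template myeks ap-northeast-2
# prints:
# cluster.name: myeks
# logs.region: ap-northeast-2
```

As in the original command, each expression has no `g` flag, so only the first occurrence on each line is replaced. That is enough when a placeholder appears at most once per line.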


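Checking logs: CloudWatch > Log groups

The three log groups listed above now appear under Log groups. Open application and look at the nginx-related log streams.

Checking metrics: CloudWatch > Container Insights

Container Insights presents the collected metrics in several views.

Resources

Container map

Performance monitoring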

3. Metrics-Server, Alert system

Metrics-server

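Metrics-server is a cluster add-on that collects and aggregates resource metrics from the kubelets. The container metrics themselves come from cAdvisor, which is built into the kubelet. The collected metrics can be viewed with the kubectl top command.

cAdvisor: a daemon, built into the kubelet, that collects, aggregates, and exposes container metrics

kubectl top node and kubectl top pod show the current CPU and memory usage.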

# Deploy
kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml
 
# Check metrics-server: metrics are fetched through cAdvisor every 15 seconds
kubectl get pod -n kube-system -l k8s-app=metrics-server
kubectl api-resources | grep metrics
kubectl get apiservices |egrep '(AVAILABLE|metrics)'
 
# Check node metrics
kubectl top node
 
NAME                                               CPU(cores)   CPU%   MEMORY(bytes)   MEMORY%
ip-192-168-1-68.ap-northeast-2.compute.internal    68m          1%     767Mi           5%
ip-192-168-2-227.ap-northeast-2.compute.internal   79m          2%     682Mi           4%
ip-192-168-3-40.ap-northeast-2.compute.internal    77m          1%     714Mi           4%
 
# Check pod metrics
kubectl top pod -A
kubectl top pod -n kube-system --sort-by='cpu'
kubectl top pod -n kube-system --sort-by='memory'
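The tabular output of `kubectl top node` is easy to post-process. A minimal sketch, assuming a hypothetical helper `nodes_over_mem` that reads that output on stdin and prints the nodes whose MEMORY% is at or above a threshold:

```shell
# Hypothetical helper: print node names whose MEMORY% column (5th field)
# is greater than or equal to the given percentage.
# Usage: kubectl top node | nodes_over_mem <percent>
nodes_over_mem() {
  awk -v limit="$1" 'NR > 1 {   # NR > 1 skips the header row
    pct = $5
    sub(/%/, "", pct)           # "5%" -> "5"
    if (pct + 0 >= limit) print $1
  }'
}

# Example using the sample output shown above
printf '%s\n' \
  'NAME CPU(cores) CPU% MEMORY(bytes) MEMORY%' \
  'ip-192-168-1-68.ap-northeast-2.compute.internal 68m 1% 767Mi 5%' \
  'ip-192-168-2-227.ap-northeast-2.compute.internal 79m 2% 682Mi 4%' \
  'ip-192-168-3-40.ap-northeast-2.compute.internal 77m 1% 714Mi 4%' | nodes_over_mem 5
# prints: ip-192-168-1-68.ap-northeast-2.compute.internal
```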

Botkube

Botkube makes it easy to run queries such as kubectl get from Slack. It requires a SLACK_API_BOT_TOKEN and a SLACK_API_APP_TOKEN, which come from the Slack app configuration.

# Add the repo
helm repo add botkube https://charts.botkube.io
helm repo update
 
# Set variables
export ALLOW_KUBECTL=true
export ALLOW_HELM=true
export SLACK_CHANNEL_NAME=webhook3
 
# Create the values file
cat <<EOT > botkube-values.yaml
actions:
  'describe-created-resource': # kubectl describe
    enabled: true
  'show-logs-on-error': # kubectl logs
    enabled: true
 
executors:
  k8s-default-tools:
    botkube/helm:
      enabled: true
    botkube/kubectl:
      enabled: true
EOT
 
# Install
helm install --version v1.0.0 botkube --namespace botkube --create-namespace \
--set communications.default-group.socketSlack.enabled=true \
--set communications.default-group.socketSlack.channels.default.name=${SLACK_CHANNEL_NAME} \
--set communications.default-group.socketSlack.appToken=${SLACK_API_APP_TOKEN} \
--set communications.default-group.socketSlack.botToken=${SLACK_API_BOT_TOKEN} \
--set settings.clusterName=${CLUSTER_NAME} \
--set 'executors.k8s-default-tools.botkube/kubectl.enabled'=${ALLOW_KUBECTL} \
--set 'executors.k8s-default-tools.botkube/helm.enabled'=${ALLOW_HELM} \
-f botkube-values.yaml botkube/botkube
 
# (Reference) To delete
helm uninstall botkube --namespace botkube

Deploying a pod with a bad image and checking it

# Terminal 1
watch kubectl get pod
 
# Deploy a pod whose image is wrong
kubectl apply -f https://raw.githubusercontent.com/junghoon2/kube-books/main/ch05/nginx-error-pod.yml
kubectl get events -w
@Botkube k get pod
 
# Image update option 2: use kubectl set, which can change some resource fields such as image
kubectl set 
kubectl set image pod nginx-19 nginx-pod=nginx:1.19
@Botkube k get pod
 
# Delete
kubectl delete pod nginx-19

The Botkube alert for the bad-image pod, and the result of get before and after updating the pod.

4. Prometheus

https://prometheus.io/docs/introduction/overview/

Prometheus is a leading open-source monitoring and alerting tool, widely used for Kubernetes monitoring. Its components are as follows.

the main Prometheus server which scrapes and stores time series data
client libraries for instrumenting application code
a push gateway for supporting short-lived jobs
special-purpose exporters for services like HAProxy, StatsD, Graphite, etc.
an alertmanager to handle alerts
various support tools

Installing the Prometheus stack

The stack provides the components needed for monitoring as a single chart.

# Create the namespace and watch its resources
kubectl create ns monitoring
watch kubectl get pod,pvc,svc,ingress -n monitoring
 
# Look up the ARN of the ACM certificate in the current region
CERT_ARN=`aws acm list-certificates --query 'CertificateSummaryList[].CertificateArn[]' --output text`
echo $CERT_ARN
 
# Add the repo
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
 
# Create the values file
cat <<EOT > monitor-values.yaml
prometheus:
  prometheusSpec:
    podMonitorSelectorNilUsesHelmValues: false
    serviceMonitorSelectorNilUsesHelmValues: false
    retention: 5d
    retentionSize: "10GiB"
 
  ingress:
    enabled: true
    ingressClassName: alb
    hosts: 
      - prometheus.$MyDomain
    paths: 
      - /*
    annotations:
      alb.ingress.kubernetes.io/scheme: internet-facing
      alb.ingress.kubernetes.io/target-type: ip
      alb.ingress.kubernetes.io/listen-ports: '[{"HTTPS":443}, {"HTTP":80}]'
      alb.ingress.kubernetes.io/certificate-arn: $CERT_ARN
      alb.ingress.kubernetes.io/success-codes: 200-399
      alb.ingress.kubernetes.io/load-balancer-name: myeks-ingress-alb
      alb.ingress.kubernetes.io/group.name: study
      alb.ingress.kubernetes.io/ssl-redirect: '443'
 
grafana:
  defaultDashboardsTimezone: Asia/Seoul
  adminPassword: prom-operator
 
  ingress:
    enabled: true
    ingressClassName: alb
    hosts: 
      - grafana.$MyDomain
    paths: 
      - /*
    annotations:
      alb.ingress.kubernetes.io/scheme: internet-facing
      alb.ingress.kubernetes.io/target-type: ip
      alb.ingress.kubernetes.io/listen-ports: '[{"HTTPS":443}, {"HTTP":80}]'
      alb.ingress.kubernetes.io/certificate-arn: $CERT_ARN
      alb.ingress.kubernetes.io/success-codes: 200-399
      alb.ingress.kubernetes.io/load-balancer-name: myeks-ingress-alb
      alb.ingress.kubernetes.io/group.name: study
      alb.ingress.kubernetes.io/ssl-redirect: '443'
 
defaultRules:
  create: false
kubeControllerManager:
  enabled: false
kubeEtcd:
  enabled: false
kubeScheduler:
  enabled: false
alertmanager:
  enabled: false
 
# alertmanager:
#   ingress:
#     enabled: true
#     ingressClassName: alb
#     hosts: 
#       - alertmanager.$MyDomain
#     paths: 
#       - /*
#     annotations:
#       alb.ingress.kubernetes.io/scheme: internet-facing
#       alb.ingress.kubernetes.io/target-type: ip
#       alb.ingress.kubernetes.io/listen-ports: '[{"HTTPS":443}, {"HTTP":80}]'
#       alb.ingress.kubernetes.io/certificate-arn: $CERT_ARN
#       alb.ingress.kubernetes.io/success-codes: 200-399
#       alb.ingress.kubernetes.io/load-balancer-name: myeks-ingress-alb
#       alb.ingress.kubernetes.io/group.name: study
#       alb.ingress.kubernetes.io/ssl-redirect: '443'
EOT
cat monitor-values.yaml | yh
 
# Deploy
helm install kube-prometheus-stack prometheus-community/kube-prometheus-stack --version 45.27.2 \
--set prometheus.prometheusSpec.scrapeInterval='15s' --set prometheus.prometheusSpec.evaluationInterval='15s' \
-f monitor-values.yaml --namespace monitoring
 
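# Check
## alertmanager-0: generates alert messages based on predefined policies (e.g. node down, pod Pending) and sends them to an alert channel (Slack, etc.)
## grafana: Prometheus is used to store the metrics, and Grafana visualizes them
## prometheus-0: monitored pods expose their metrics through a separate sidecar-style 'exporter' pod; Prometheus pulls them and stores them in its internal time series database
## node-exporter: exposes each physical node's resource usage (network, storage, and so on) as metrics
## operator: provides CRDs that make it easier to manage alerting policies (prometheus rule), add application monitoring targets, and so on
## kube-state-metrics: a pod that converts the state of the Kubernetes cluster (kube-state) into metrics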
helm list -n monitoring
kubectl get pod,svc,ingress -n monitoring
kubectl get-all -n monitoring
kubectl get prometheus,servicemonitors -n monitoring
kubectl get prometheusrule,alertmanager -n monitoring
kubectl get crd | grep monitoring

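While watching the deployment, the Alertmanager pod shows up. Prometheus collects metrics by pulling them. A node-exporter pod runs on each node so that node metrics can be collected; node-exporter listens on port 9100 of the node IP and serves the node's status information.

Under Service Discovery in the Status menu, you can see that Prometheus discovers resources automatically and collects their metrics. The targets and how to reach them are already configured. The scrape interval, timeout, alert-related settings, and more are also visible.

More detail on these features is in the following post: [PKOS] Kubernetes Monitoring and Logging - Prometheus, Grafana, Loki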

5. kubecost

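Kubecost is built on OpenCost and is actively supported by AWS. It provides visibility into costs broken down by Kubernetes resource.
AWS offers it as an add-on, and it is free with 15-day metric retention.

Pricing - link: Free (15-day metric retention), Business (30-day metric retention, …), Enterprise (.)
Amazon EKS cost monitoring with Kubecost architecture - link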

https://docs.kubecost.com/install-and-configure/install/provider-installations/aws-eks-cost-monitoring

# Create the values file
cat <<EOT > cost-values.yaml
global:
  grafana:
    enabled: true
    proxy: false
 
priority:
  enabled: false
networkPolicy:
  enabled: false
podSecurityPolicy:
  enabled: false
 
persistentVolume:
    storageClass: "gp3"
 
prometheus:
  kube-state-metrics:
    disabled: false
  nodeExporter:
    enabled: true
 
reporting:
  productAnalytics: true
EOT
 
# The kubecost chart includes Prometheus, so delete the existing Prometheus stack first: the node-exporter ports would conflict
helm uninstall -n monitoring kube-prometheus-stack
 
# Deploy
kubectl create ns kubecost
helm install kubecost oci://public.ecr.aws/kubecost/cost-analyzer --version 1.103.2 --namespace kubecost -f cost-values.yaml
 
# Verify the deployment
kubectl get-all -n kubecost
kubectl get all -n kubecost
 
# Store the kubecost-cost-analyzer pod IP in a variable and test access
CAIP=$(kubectl get pod -n kubecost -l app=cost-analyzer -o jsonpath='{.items[0].status.podIP}')
curl -s $CAIP:9090
 
# Reaching a specific pod from outside through the bastion EC2: use socat (SOcket CAT) - link
yum -y install socat
socat TCP-LISTEN:80,fork TCP:$CAIP:9090
# Then open the bastion EC2 IP in a web browser

License: Attribution, NonCommercial, NoDerivatives
