[Kubernetes] Affinity (affinity, anti-affinity)

TTOII 2022. 6. 5. 10:10

🚀 Affinity

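Where nodeSelector is a rigid rule, affinity can express preferences. The scheduler uses the preferred placement when it can and still allows other placements when it can't. Affinity can also express hard requirements, which are covered in the nodeAffinity section.
In other words, affinity makes scheduling more flexible.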

affinity
- pod
- node
anti-affinity
- pod

apiVersion: v1
kind: Pod
metadata:
  name: with-node-affinity
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: kubernetes.io/e2e-az-name
            operator: In
            values:
            - e2e-az1
            - e2e-az2
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 1
        preference:
          matchExpressions:
          - key: another-node-label-key
            operator: In
            values:
            - another-node-label-value
  containers:
  - name: with-node-affinity
    image: registry.k8s.io/pause:2.0

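The affinity field is part of the pod spec.
In this example the requiredDuringSchedulingIgnoredDuringExecution rule is mandatory. The pod can only be scheduled onto a node whose kubernetes.io/e2e-az-name label is e2e-az1 or e2e-az2. The second rule is preferredDuringSchedulingIgnoredDuringExecution, so the pod is still scheduled when no node matches it, and an unsatisfied preference is simply ignored.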
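To make the difference between the two rules concrete, here is a minimal Python sketch of how the example above could be evaluated. It is a simplification, not kube-scheduler's code. The node names and labels are hypothetical, only the In operator is modeled, and the real scheduler combines this score with many other scoring plugins.

```python
# Hypothetical cluster: node name -> node labels (illustration only).
NODES = {
    "node-a": {"kubernetes.io/e2e-az-name": "e2e-az1",
               "another-node-label-key": "another-node-label-value"},
    "node-b": {"kubernetes.io/e2e-az-name": "e2e-az2"},
    "node-c": {"kubernetes.io/e2e-az-name": "e2e-az3",
               "another-node-label-key": "another-node-label-value"},
}


def label_in(labels: dict, key: str, values: list) -> bool:
    """The In operator: the label must exist and have one of the given values."""
    return key in labels and labels[key] in values


def pick_node(nodes: dict):
    # requiredDuringScheduling...: a hard filter; failing nodes are never candidates.
    feasible = [
        name for name, labels in nodes.items()
        if label_in(labels, "kubernetes.io/e2e-az-name", ["e2e-az1", "e2e-az2"])
    ]
    if not feasible:
        return None  # no node qualifies: the pod stays Pending

    # preferredDuringScheduling...: a soft score. A match adds the weight (1 here);
    # a miss adds nothing but does not exclude the node.
    def score(name: str) -> int:
        matched = label_in(nodes[name], "another-node-label-key",
                           ["another-node-label-value"])
        return 1 if matched else 0

    return max(feasible, key=score)  # ties go to the first feasible node


print(pick_node(NODES))                                        # node-a
print(pick_node({k: NODES[k] for k in ("node-b", "node-c")}))  # node-b
print(pick_node({"node-c": NODES["node-c"]}))                  # None
```

node-c matches the preference but never wins, because it fails the required rule. node-b is chosen even though it misses the preference.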

 vagrant@k8s-node1  ~/schedule/affinity  kubectl explain 
FIELDS:
   nodeAffinity <Object>
     Describes node affinity scheduling rules for the pod.

   podAffinity  <Object>
     Describes pod affinity scheduling rules (e.g. co-locate this pod in the
     same node, zone, etc. as some other pod(s)).

   podAntiAffinity      <Object>
     Describes pod anti-affinity scheduling rules (e.g. avoid putting this pod
     in the same node, zone, etc. as some other pod(s)).

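nodeAffinity: node affinity scheduling rules for the pod
podAffinity: pod affinity scheduling rules (place this pod near other pods)
podAntiAffinity: pod anti-affinity scheduling rules (keep this pod away from other pods)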

🚀 nodeAffinity

 vagrant@k8s-node1  ~/schedule/affinity  kubectl explain pod.spec.affinity.nodeAffinity

FIELDS:
   preferredDuringSchedulingIgnoredDuringExecution      <[]Object>

   requiredDuringSchedulingIgnoredDuringExecution       <Object>

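preferredDuringSchedulingIgnoredDuringExecution: preferred (a soft rule)
requiredDuringSchedulingIgnoredDuringExecution: required (must be satisfied)

As the IgnoredDuringExecution part of the names says, both rules only apply at scheduling time. After the pod has been scheduled and its containers are running, the rules are ignored. If a node's labels change later so that a rule no longer matches, the pod keeps running on that node.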

 vagrant@k8s-node1  ~/schedule/affinity  kubectl explain pod.spec.affinity.nodeAffinity.preferredDuringSchedulingIgnoredDuringExecution

FIELDS:
   preference   <Object> -required-
     A node selector term, associated with the corresponding weight.

   weight       <integer> -required-
     Weight associated with matching the corresponding nodeSelectorTerm, in the
     range 1-100.

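preference: how nodes are selected, using
- matchExpressions
- matchFields
weight: the weight, from 1 to 100
- When there is a list of preferences, this sets how much more each one counts. For every node that passes the required rules, the scheduler adds up the weights of the preferred terms the node matches, and nodes with a higher total are favored.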
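The weighting can be sketched as follows. This assumes that the node-affinity score of a node is the sum of the weights of the preferred terms it matches, which is how the Kubernetes documentation describes it. The real scheduler then normalizes that score and combines it with other plugins. The disktype and zone labels and the weights are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class PreferredTerm:
    weight: int   # 1-100, the range allowed by the API
    key: str
    values: list  # In operator only, to keep the sketch short

    def __post_init__(self):
        if not 1 <= self.weight <= 100:
            raise ValueError("weight must be in the range 1-100")


def affinity_score(labels: dict, terms: list) -> int:
    """Sum the weights of every preferred term the node's labels satisfy."""
    return sum(t.weight for t in terms
               if t.key in labels and labels[t.key] in t.values)


# Hypothetical preferences: an SSD matters much more than the zone.
TERMS = [PreferredTerm(80, "disktype", ["ssd"]), PreferredTerm(20, "zone", ["a"])]

NODES = {
    "node1": {"disktype": "ssd", "zone": "b"},
    "node2": {"disktype": "hdd", "zone": "a"},
    "node3": {"disktype": "ssd", "zone": "a"},
}

scores = {name: affinity_score(labels, TERMS) for name, labels in NODES.items()}
print(scores)  # {'node1': 80, 'node2': 20, 'node3': 100}
```

A node that matches no preference still gets a score of 0 and stays schedulable. The weights only rank the candidates.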

 vagrant@k8s-node1  ~/schedule/affinity  kubectl explain pod.spec.affinity.nodeAffinity.requiredDuringSchedulingIgnoredDuring

FIELDS:
   nodeSelectorTerms    <[]Object> -required-
     Required. A list of node selector terms. The terms are ORed.

nodeSelectorTerms
- matchExpressions
- matchFields

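Because this is a hard requirement, there is no weight. The terms in nodeSelectorTerms are ORed, so a node only has to match one of them. The matchExpressions inside a single term are ANDed.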
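The OR/AND semantics can be sketched in a few lines of Python. This is a minimal model covering only the In, NotIn, Exists and DoesNotExist operators. It ignores matchFields and the Gt/Lt operators, and the gpu, zone and ssd labels are hypothetical.

```python
def expr_matches(labels: dict, expr: dict) -> bool:
    key, op = expr["key"], expr["operator"]
    values = expr.get("values", [])
    if op == "In":
        return key in labels and labels[key] in values
    if op == "NotIn":
        # A node without the label also satisfies NotIn.
        return key not in labels or labels[key] not in values
    if op == "Exists":
        return key in labels
    if op == "DoesNotExist":
        return key not in labels
    raise ValueError(f"operator not covered by this sketch: {op}")


def term_matches(labels: dict, term: dict) -> bool:
    exprs = term.get("matchExpressions", [])
    # Expressions inside one term are ANDed; an empty term matches no node.
    return bool(exprs) and all(expr_matches(labels, e) for e in exprs)


def required_matches(labels: dict, node_selector_terms: list) -> bool:
    # Terms are ORed: one matching term is enough.
    return any(term_matches(labels, t) for t in node_selector_terms)


TERMS = [
    {"matchExpressions": [  # term 1: has a GPU AND is in zone a
        {"key": "gpu", "operator": "Exists"},
        {"key": "zone", "operator": "In", "values": ["a"]},
    ]},
    {"matchExpressions": [  # term 2: OR has an SSD
        {"key": "ssd", "operator": "Exists"},
    ]},
]

print(required_matches({"gpu": "true", "zone": "a"}, TERMS))  # True  (term 1)
print(required_matches({"gpu": "true", "zone": "b"}, TERMS))  # False (neither)
print(required_matches({"ssd": "true"}, TERMS))               # True  (term 2)
```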

🚀 Inter-pod affinity and anti-affinity

 vagrant@k8s-node1  ~/schedule/affinity  kubectl explain pod.spec.affinity.podAffinity.preferredDuringSchedulingIgnoredDuringExecution
FIELDS:
   podAffinityTerm      <Object> -required-
     Required. A pod affinity term, associated with the corresponding weight.

   weight       <integer> -required-
     weight associated with matching the corresponding podAffinityTerm, in the
     range 1-100.

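podAffinityTerm: the affinity term between pods
- labelSelector: selects, by label, the other pods this term refers to
- namespaceSelector
- namespaces: which namespaces the labelSelector is matched against
- topologyKey (required)

namespaceSelector and namespaces are rarely used.

If both are omitted, the term only matches pods in the same namespace as the pod being scheduled. In practice it is usually simpler to put the related pods in the same namespace, either in the YAML or with the -n option on the command.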
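Which namespaces a podAffinityTerm looks at can be sketched as follows, based on the API field descriptions. If both namespaces and namespaceSelector are unset, only the incoming pod's own namespace is used. Otherwise the term covers the union of the listed namespaces and the ones selected by namespaceSelector, and an empty selector ({}) selects every namespace. Only matchLabels-style selectors are modeled, and the namespace names and labels are hypothetical.

```python
def term_namespaces(pod_ns, ns_labels, namespaces=None, namespace_selector=None):
    """Return the namespaces whose pods a podAffinityTerm is matched against.

    pod_ns: namespace of the pod being scheduled.
    ns_labels: namespace name -> labels (hypothetical cluster data).
    namespaces: the term's namespaces list (None or [] means unset).
    namespace_selector: matchLabels-style dict; None means unset, {} selects all.
    """
    if not namespaces and namespace_selector is None:
        return {pod_ns}  # default: only the pod's own namespace
    selected = set(namespaces or [])
    if namespace_selector is not None:
        selected |= {
            ns for ns, labels in ns_labels.items()
            if all(labels.get(k) == v for k, v in namespace_selector.items())
        }
    return selected


NS_LABELS = {"default": {}, "db": {"team": "data"}, "web": {"team": "web"}}

print(term_namespaces("web", NS_LABELS))                                       # {'web'}
print(term_namespaces("web", NS_LABELS, namespaces=["db"]))                    # {'db'}
print(term_namespaces("web", NS_LABELS, namespace_selector={"team": "data"}))  # {'db'}
```

For the common case where web and DB pods live in the same namespace, neither field needs to be set.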

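Suppose there are 3 nodes, a ReplicaSet (RS) running 3 web pods, and a StatefulSet (STS) running 3 DB pods.

A client connects to web, and depending on the request, web may need to fetch values from the DB before responding to the client.

The best case is one web pod and one DB pod on every node.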

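The worst case is when all the web pods land on a single node and the DB pods land on the other nodes.

The client sends its request to web, and web sends a request to the DB.

Because web has to reach a DB pod on another node, the traffic must go over the physical network.

If the DB pod were on the same node, the request could be served locally.

The DB pods also need to synchronize with each other, and between nodes that traffic also crosses the physical network. All of this is overhead.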

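To get the best case, each web pod has to be paired with a DB pod.

That pairing is defined with pod-to-pod affinity.

To make this more concrete, the web pods created by the RS carry the label app: web, and the DB pods created by the STS carry the label app: db.

In the web pod spec, we set an affinity whose selector picks pods labeled app: db, declaring that web wants to be near them.
Because every web pod selects DB pods this way, each web pod is always placed on the same node as a DB pod.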

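Of course, another worst case is still possible: every web/DB pair lands on one and the same node.

This is where podAntiAffinity comes in.
The web pods created by the same RS declare anti-affinity toward each other (toward app: web).
Because they repel each other, two web pods cannot be placed on the same node; they have to be spread out.

To summarize, the web pods created by the same RS need anti-affinity toward each other, and the DB pods created by the same STS also need anti-affinity toward each other.

In addition, the web pods need affinity toward the DB pods.

The result is one web/DB pair per node, with each pair keeping the other pairs away.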
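The web/DB arrangement above can be checked with a small greedy simulation. This is a hypothetical sketch, not how kube-scheduler works internally: it has no scoring and places pods strictly in list order. Each DB pod has required anti-affinity against app: db. Each web pod has required anti-affinity against app: web plus required affinity to app: db. topologyKey is assumed to be kubernetes.io/hostname, so "same domain" simply means "same node".

```python
def apps_on(node, placed):
    """app labels of the pods already placed on `node`."""
    return {app for app, n in placed if n == node}


def place_all(pods, nodes):
    """pods: list of (app, affinity_to, anti_affinity_to); None means no rule."""
    placed = []  # (app, node) in scheduling order; node is None if Pending
    for app, affinity_to, anti_to in pods:
        for node in nodes:
            here = apps_on(node, placed)
            if anti_to is not None and anti_to in here:
                continue  # required anti-affinity would be violated
            if affinity_to is not None and affinity_to not in here:
                continue  # required affinity is not satisfied
            placed.append((app, node))
            break
        else:
            placed.append((app, None))  # no feasible node: Pending
    return placed


NODES = ["node1", "node2", "node3"]
DB = ("db", None, "db")     # anti-affinity with other DB pods
WEB = ("web", "db", "web")  # affinity with DB, anti-affinity with other web pods

result = place_all([DB] * 3 + [WEB] * 3, NODES)
for node in NODES:
    print(node, sorted(apps_on(node, result)))
# node1 ['db', 'web']
# node2 ['db', 'web']
# node3 ['db', 'web']
```

A fourth web pod would stay Pending in this sketch because every node already has a web pod. If a web pod is tried before any DB pod exists, it has nowhere to go either. A real cluster would keep such a pod Pending and retry it later.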

 vagrant@k8s-node1  ~/schedule/affinity  kubectl explain pod.spec.affinity.podAffinity.requiredDuringSchedulingIgnoredDuringExecution

FIELDS:
   labelSelector        <Object>

   namespaceSelector    <Object>

   namespaces   <[]string>

   topologyKey  <string> -required-

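topologyKey (required): the node label that decides where pods are placed relative to each other.
Why does a node label show up in a rule about the relationship between pods?

Because the rule needs a definition of what counts as co-located and what counts as not co-located. You can think of this as a domain: nodes with the same value for the topologyKey label belong to the same topology domain.

With two nodes, when pods are placed with affinity or anti-affinity, the question is by what criterion the two nodes count as the same place or as different places.

Every node carries a kubernetes.io/hostname label whose value identifies that node.
If pods must be placed together (affinity), using this key puts them on the same node. With anti-affinity, it puts them on different nodes.

On AWS, if you divide the topology by availability zone (for example with the topology.kubernetes.io/zone node label), you can have some pods placed in ap-northeast-2a and others in ap-northeast-2b.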
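The idea of a topology domain can be sketched as grouping nodes by the value of the topologyKey label. The three nodes and their zone labels below are hypothetical. As a simplification, this sketch leaves nodes without the label out of every domain.

```python
from collections import defaultdict

# Hypothetical nodes: two in ap-northeast-2a, one in ap-northeast-2b.
NODES = {
    "node1": {"kubernetes.io/hostname": "node1",
              "topology.kubernetes.io/zone": "ap-northeast-2a"},
    "node2": {"kubernetes.io/hostname": "node2",
              "topology.kubernetes.io/zone": "ap-northeast-2a"},
    "node3": {"kubernetes.io/hostname": "node3",
              "topology.kubernetes.io/zone": "ap-northeast-2b"},
}


def domains(nodes: dict, topology_key: str) -> dict:
    """Group nodes by the value of `topology_key`; each group is one domain."""
    groups = defaultdict(list)
    for name, labels in nodes.items():
        if topology_key in labels:
            groups[labels[topology_key]].append(name)
    return dict(groups)


print(domains(NODES, "kubernetes.io/hostname"))
# {'node1': ['node1'], 'node2': ['node2'], 'node3': ['node3']}
print(domains(NODES, "topology.kubernetes.io/zone"))
# {'ap-northeast-2a': ['node1', 'node2'], 'ap-northeast-2b': ['node3']}
```

With the hostname key every node is its own domain, so a required anti-affinity rule allows one matching pod per node (three here). With the zone key, node1 and node2 share a domain, so the same rule allows at most one matching pod per zone, which is two in this cluster.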

🚀 Lab

Let's create resources with the following configuration.

myweb-a.yaml

apiVersion: apps/v1
kind: ReplicaSet
metadata:
  name: myweb-a
spec:
  replicas: 2
  selector:
    matchLabels:
      app: a
  template:
    metadata:
      labels:
        app: a
    spec:
      affinity:
        nodeAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
            - weight: 10
              preference:
                matchExpressions:
                  - key: gpu
                    operator: Exists
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            - labelSelector:
                  matchLabels:
                    app: a # repel pods with our own label (pod-to-pod anti-affinity)
              topologyKey: "kubernetes.io/hostname"
      containers:
        - name: myweb
          image: ghcr.io/c1t1d0s7/go-myweb

myweb-b.yaml

apiVersion: apps/v1
kind: ReplicaSet
metadata:
  name: myweb-b
spec:
  replicas: 2
  selector:
    matchLabels:
      app: b
  template:
    metadata:
      labels:
        app: b
    spec:
      affinity:
        nodeAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
            - weight: 10
              preference:
                matchExpressions:
                  - key: gpu
                    operator: Exists
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            - labelSelector:
                  matchLabels:
                    app: b # anti-affinity with our own pods
              topologyKey: "kubernetes.io/hostname"
        podAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            - labelSelector:
                  matchLabels:
                    app: a # affinity with pods labeled app: a
              topologyKey: "kubernetes.io/hostname"
      containers:
        - name: myweb
          image: ghcr.io/c1t1d0s7/go-myweb

 vagrant@k8s-node1  ~/schedule/affinity  kubectl create -f myweb-a.yaml                                                              
replicaset.apps/myweb-a created
 vagrant@k8s-node1  ~/schedule/affinity  kubectl get po -o wide        
NAME                                      READY   STATUS    RESTARTS   AGE     IP              NODE    NOMINATED NODE   READINESS GATES
myweb-a-5glw4                             1/1     Running   0          4s      10.233.96.141   node2   <none>           <none>
myweb-a-gvznj                             1/1     Running   0          4s      10.233.92.160   node3   <none>           <none>
nfs-client-provisioner-758f8cd4d6-wpjbt   1/1     Running   0          2d21h   10.233.92.110   node3   <none>           <none>

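The myweb-a pods were placed only on node2 and node3, so the myweb-b pods must also be placed on node2 and node3. Because of their own anti-affinity, they must also end up one per node.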

 vagrant@k8s-node1  ~/schedule/affinity  kubectl create -f myweb-b.yaml  
replicaset.apps/myweb-b created
 vagrant@k8s-node1  ~/schedule/affinity  kubectl get po -o wide        
NAME                                      READY   STATUS              RESTARTS   AGE     IP              NODE    NOMINATED NODE   READINESS GATES
myweb-a-5glw4                             1/1     Running   0          3m12s   10.233.96.141   node2   <none>           <none>
myweb-a-gvznj                             1/1     Running   0          3m12s   10.233.92.160   node3   <none>           <none>
myweb-b-2jgdr                             1/1     Running   0          28s     10.233.92.161   node3   <none>           <none>
myweb-b-shvpn                             1/1     Running   0          28s     10.233.96.142   node2   <none>           <none>
nfs-client-provisioner-758f8cd4d6-wpjbt   1/1     Running             0          2d21h   10.233.92.110   node3   <none>           <none>

Since node2 and node3 each host one myweb-a pod, the output shows that the myweb-b pods were also placed one each on node2 and node3.
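As a cross-check, the rules can be verified against the kubectl get po -o wide output shown above with a short Python sketch. It assumes the app label can be read from the ReplicaSet name prefix (myweb-a means app: a, myweb-b means app: b). That holds for this lab only; in general you would query the labels, for example with kubectl get po -L app.

```python
# Data rows from the kubectl get po -o wide output above (after creating myweb-b).
OUTPUT = """\
NAME                                      READY   STATUS    RESTARTS   AGE     IP              NODE    NOMINATED NODE   READINESS GATES
myweb-a-5glw4                             1/1     Running   0          3m12s   10.233.96.141   node2   <none>           <none>
myweb-a-gvznj                             1/1     Running   0          3m12s   10.233.92.160   node3   <none>           <none>
myweb-b-2jgdr                             1/1     Running   0          28s     10.233.92.161   node3   <none>           <none>
myweb-b-shvpn                             1/1     Running   0          28s     10.233.96.142   node2   <none>           <none>
nfs-client-provisioner-758f8cd4d6-wpjbt   1/1     Running   0          2d21h   10.233.92.110   node3   <none>           <none>
"""


def pods_by_app(output: str) -> dict:
    """Map 'a' / 'b' to the nodes of the myweb-a / myweb-b pods."""
    nodes = {"a": [], "b": []}
    for line in output.splitlines()[1:]:  # skip the header row
        cols = line.split()
        name, node = cols[0], cols[6]     # NODE is the 7th column
        for app in nodes:
            if name.startswith(f"myweb-{app}-"):
                nodes[app].append(node)
    return nodes


def violations(nodes: dict) -> list:
    problems = []
    for app in ("a", "b"):
        # podAntiAffinity on its own label: at most one pod of each app per node.
        if len(nodes[app]) != len(set(nodes[app])):
            problems.append(f"two app={app} pods share a node")
    # podAffinity of b to a: every b pod must sit on a node that has an a pod.
    for node in nodes["b"]:
        if node not in nodes["a"]:
            problems.append(f"app=b pod on {node} has no app=a pod")
    return problems


placement = pods_by_app(OUTPUT)
print(placement)              # {'a': ['node2', 'node3'], 'b': ['node3', 'node2']}
print(violations(placement))  # []
```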
