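In short, an RPC limit was exceeded: the node holds too many PodSandboxes, and crictl ps -a lists more than 1,400 containers. A large number of exited containers were never deleted, and the accumulated list grew past the RPC limit.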
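This is a PodSandbox leak: crictl pods shows many sandboxes with the same name but different pod IDs, and after several months the kubelet still has not removed them automatically. Commands used to investigate:

    crictl pods
    crictl inspectp <pod id>
    crictl ps -a | grep <pod-id>
    crictl logs <container-id>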
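To spot leaked sandboxes without reading the listing by eye, the output of crictl pods can be grouped by pod name. The following is a minimal sketch, not part of the original investigation. It assumes the JSON produced by `crictl pods -o json` has an `items` array whose entries carry `id`, `state`, and `metadata.name` / `metadata.namespace`; check this against your crictl version. The embedded `sample` data and the function name `duplicateSandboxes` are made up for illustration.

```go
package main

import (
	"encoding/json"
	"fmt"
	"sort"
)

// podSandbox mirrors only the fields this sketch needs from the
// assumed `crictl pods -o json` output.
type podSandbox struct {
	ID       string `json:"id"`
	State    string `json:"state"`
	Metadata struct {
		Name      string `json:"name"`
		Namespace string `json:"namespace"`
	} `json:"metadata"`
}

type podList struct {
	Items []podSandbox `json:"items"`
}

// duplicateSandboxes returns the sandbox IDs of every namespace/name
// that owns more than one sandbox.
func duplicateSandboxes(raw []byte) (map[string][]string, error) {
	var list podList
	if err := json.Unmarshal(raw, &list); err != nil {
		return nil, err
	}
	byPod := make(map[string][]string)
	for _, sb := range list.Items {
		key := sb.Metadata.Namespace + "/" + sb.Metadata.Name
		byPod[key] = append(byPod[key], sb.ID)
	}
	// Drop pods that have a single sandbox; they are not suspicious.
	for key, ids := range byPod {
		if len(ids) < 2 {
			delete(byPod, key)
		}
	}
	return byPod, nil
}

// sample is made-up data in the assumed output shape.
const sample = `{"items":[
 {"id":"aaa","state":"SANDBOX_NOTREADY","metadata":{"name":"web-0","namespace":"default"}},
 {"id":"bbb","state":"SANDBOX_NOTREADY","metadata":{"name":"web-0","namespace":"default"}},
 {"id":"ccc","state":"SANDBOX_READY","metadata":{"name":"web-0","namespace":"default"}},
 {"id":"ddd","state":"SANDBOX_READY","metadata":{"name":"db-0","namespace":"default"}}
]}`

func main() {
	dups, err := duplicateSandboxes([]byte(sample))
	if err != nil {
		panic(err)
	}
	keys := make([]string, 0, len(dups))
	for key := range dups {
		keys = append(keys, key)
	}
	sort.Strings(keys)
	for _, key := range keys {
		fmt.Printf("%s: %d sandboxes %v\n", key, len(dups[key]), dups[key])
	}
}
```

In real use the input would come from the command's output (for example piped through stdin) instead of the `sample` constant.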
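The kubelet talks to containerd through the CRI. crictl can talk to containerd through the same CRI specification.
crictl is a command-line interface for CRI-compatible container runtimes. It can be used to inspect and debug the container runtime and the applications on a k8s node.
The Kubernetes garbage collection (GC) mechanism is carried out by the kubelet, which periodically cleans up containers and images that are no longer in use: container GC runs once every minute, and image GC runs once every five minutes.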
1. Starting GC
pkg/kubelet/kubelet.go:1352, where GC starts: func (kl *Kubelet) StartGarbageCollection()
pkg/kubelet/kuberuntime/kuberuntime_gc.go:409

    // GarbageCollect removes dead containers using the specified container gc policy.
    // Note that gc policy is not applied to sandboxes. Sandboxes are only removed when they are
    // not ready and containing no containers.
    //
    // GarbageCollect consists of the following steps:
    // * gets evictable containers which are not active and created more than gcPolicy.MinAge ago.
    // * removes oldest dead containers for each pod by enforcing gcPolicy.MaxPerPodContainer.
    // * removes oldest dead containers by enforcing gcPolicy.MaxContainers.
    // * gets evictable sandboxes which are not ready and contains no containers.
    // * removes evictable sandboxes.
    func (cgc *containerGC) GarbageCollect(ctx context.Context, gcPolicy kubecontainer.GCPolicy, allSourcesReady bool, evictNonDeletedPods bool) error {
        errors := []error{}
        // Remove evictable containers
        if err := cgc.evictContainers(ctx, gcPolicy, allSourcesReady, evictNonDeletedPods); err != nil {
            errors = append(errors, err)
        }
        // Remove sandboxes with zero containers
        if err := cgc.evictSandboxes(ctx, evictNonDeletedPods); err != nil {
            errors = append(errors, err)
        }
        // Remove pod sandbox log directory
        if err := cgc.evictPodLogsDirectories(ctx, allSourcesReady); err != nil {
            errors = append(errors, err)
        }
        return utilerrors.NewAggregate(errors)
    }
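2. Evicting containers: evictContainers
Building evictUnits: pkg/kubelet/kuberuntime/kuberuntime_gc.go:187
All containers are listed. A container is skipped if its state is ContainerState_CONTAINER_RUNNING or if it was created less than minAge ago (judged by container.CreatedAt).
Everything else is added to evictUnits.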
evictUnits is a map[evictUnit][]containerGCInfo:

    // evictUnit is considered for eviction as units of (UID, container name) pair.
    type evictUnit struct {
        // UID of the pod.
        uid types.UID
        // Name of the container in the pod.
        name string
    }

    // containerGCInfo is the internal information kept for containers being considered for GC.
    type containerGCInfo struct {
        // The ID of the container.
        id string
        // The name of the container.
        name string
        // Creation time for the container.
        createTime time.Time
        // If true, the container is in unknown state. Garbage collector should try
        // to stop containers before removal.
        unknown bool
    }
    // evict all containers that are evictable
    func (cgc *containerGC) evictContainers(ctx context.Context, gcPolicy kubecontainer.GCPolicy, allSourcesReady bool, evictNonDeletedPods bool) error {
        // Separate containers by evict units.
        evictUnits, err := cgc.evictableContainers(ctx, gcPolicy.MinAge)
        if err != nil {
            return err
        }
        // Remove deleted pod containers if all sources are ready.
        // If the pod no longer exists, remove all of its containers.
        if allSourcesReady {
            for key, unit := range evictUnits {
                if cgc.podStateProvider.ShouldPodContentBeRemoved(key.uid) || (evictNonDeletedPods && cgc.podStateProvider.ShouldPodRuntimeBeRemoved(key.uid)) {
                    cgc.removeOldestN(ctx, unit, len(unit)) // Remove all.
                    delete(evictUnits, key)
                }
            }
        }
        // Enforce max containers per evict unit.
        // Apply the GC policy: keep at most MaxPerPodContainer exited containers per evict unit.
        if gcPolicy.MaxPerPodContainer >= 0 {
            cgc.enforceMaxContainersPerEvictUnit(ctx, evictUnits, gcPolicy.MaxPerPodContainer)
        }
        // Enforce max total number of containers.
        // Apply the GC policy: keep at most MaxContainers exited containers on the node.
        if gcPolicy.MaxContainers >= 0 && evictUnits.NumContainers() > gcPolicy.MaxContainers {
            // Leave an equal number of containers per evict unit (min: 1).
            numContainersPerEvictUnit := gcPolicy.MaxContainers / evictUnits.NumEvictUnits()
            if numContainersPerEvictUnit < 1 {
                numContainersPerEvictUnit = 1
            }
            cgc.enforceMaxContainersPerEvictUnit(ctx, evictUnits, numContainersPerEvictUnit)
            // If we still need to evict, evict oldest first.
            numContainers := evictUnits.NumContainers()
            if numContainers > gcPolicy.MaxContainers {
                flattened := make([]containerGCInfo, 0, numContainers)
                for key := range evictUnits {
                    flattened = append(flattened, evictUnits[key]...)
                }
                sort.Sort(byCreated(flattened))
                cgc.removeOldestN(ctx, flattened, numContainers-gcPolicy.MaxContainers)
            }
        }
        return nil
    }
Removing all containers under a pod UID:

    // removeOldestN removes the oldest toRemove containers and returns the resulting slice.
    func (cgc *containerGC) removeOldestN(ctx context.Context, containers []containerGCInfo, toRemove int) []containerGCInfo {
        // Remove from oldest to newest (last to first).
        numToKeep := len(containers) - toRemove
        if numToKeep > 0 {
            sort.Sort(byCreated(containers))
        }
        for i := len(containers) - 1; i >= numToKeep; i-- {
            if containers[i].unknown {
                // Containers in known state could be running, we should try
                // to stop it before removal.
                id := kubecontainer.ContainerID{
                    Type: cgc.manager.runtimeName,
                    ID:   containers[i].id,
                }
                message := "Container is in unknown state, try killing it before removal"
                if err := cgc.manager.killContainer(ctx, nil, id, containers[i].name, message, reasonUnknown, nil); err != nil {
                    klog.ErrorS(err, "Failed to stop container", "containerID", containers[i].id)
                    continue
                }
            }
            if err := cgc.manager.removeContainer(ctx, containers[i].id); err != nil {
                klog.ErrorS(err, "Failed to remove container", "containerID", containers[i].id)
            }
        }
        // Assume we removed the containers so that we're not too aggressive.
        return containers[:numToKeep]
    }
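3. Evicting sandboxes: evictSandboxes
All evictable sandboxes are removed. An evictable sandbox must meet all of the following requirements: (1) it is not in the ready state; (2) it contains no containers; (3) it belongs to a non-existent (i.e. already removed) pod, or it is not the most recently created sandbox for its pod.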
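The symptom here is that crictl pods shows many sandboxes with the same name but different pod IDs. Checking them against the three requirements:
- the sandbox is NotReady: satisfied
- the sandbox contains no containers: not satisfied
- the sandbox is not the most recently created one for its pod: satisfied
So the sandboxes cannot be deleted because the containers under them have not been deleted.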
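The three requirements can be written as a single predicate, which makes it easy to see why the leaked sandboxes survive every GC pass. This is a minimal sketch of the rule as described above, not the kubelet's code; the `sandbox` struct and its fields are hypothetical.

```go
package main

import "fmt"

// sandbox is a hypothetical summary of what GC knows about one sandbox.
type sandbox struct {
	id            string
	ready         bool
	numContainers int  // containers in any state that still reference the sandbox
	podRemoved    bool // the owning pod no longer exists
	newest        bool // most recently created sandbox of its pod
}

// evictable applies the three requirements: not ready, no containers,
// and either the pod is gone or this is not the pod's newest sandbox.
func evictable(s sandbox) bool {
	if s.ready || s.numContainers > 0 {
		return false
	}
	return s.podRemoved || !s.newest
}

func main() {
	leaked := sandbox{id: "old-1", ready: false, numContainers: 2, newest: false}
	fmt.Println("leaked sandbox evictable:", evictable(leaked))

	// Once the exited containers are gone, the same sandbox qualifies.
	leaked.numContainers = 0
	fmt.Println("after containers are removed:", evictable(leaked))
}
```

Under this rule, removing the exited containers is the step that unblocks sandbox cleanup on the next GC pass.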
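After a container exits abnormally, the restart policy restartPolicy: Always causes the pod to be restarted again and again until it fails by exceeding its time limit. Each restart creates a new container and leaves the previous one behind in the exited state.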
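Pod garbage collection
For failed Pods, the corresponding API objects remain on the cluster's API server until a user or a controller process explicitly deletes them.
The Pod garbage collector (PodGC) is a controller in the control plane. It deletes terminated Pods (phase Succeeded or Failed) when the number of Pods exceeds the configured threshold (set by terminated-pod-gc-threshold in kube-controller-manager, default 12500). This avoids the resource leak that would otherwise build up as Pods are created and terminated over time.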
