In OpenShift, and also in OKD, Docker images can get stuck while loading. A severe bug in the CRI-O engine leaves OKD Docker images stuck in an invalid and unusable state. Discussions point to timeouts while pulling the images from the Docker registry, or to overly long file names in the CRI-O layer. Whatever the cause, the OKD Docker image is stuck: its binary content is not available, and every further attempt to load the image fails with “error reserving ctr name”.

Bug leaves image in an inconsistent state – OKD Image is stuck

The CRI-O bug leaves Docker images in a half-loaded, inconsistent state. The image name is reserved, but the binary layer contents are incomplete. As a result, the runtime knows the image but cannot use it. The message “pod sandbox with name … already exists” indicates this conflict.
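To spot this conflict, you can scan pod events for the two error messages quoted in this post. The following Python sketch is a minimal example, assuming the event messages are already available as plain-text strings (for example, copied from the output of `oc get events`). The function name `find_stuck_image_events` and the sample messages are hypothetical, chosen for illustration only.

```python
import re

# Signatures taken from the error messages described in this post.
# Assumption: each event message is available as one plain-text string.
SIGNATURES = (
    re.compile(r"error reserving ctr name"),
    re.compile(r"pod sandbox with name .* already exists"),
)


def find_stuck_image_events(messages):
    """Return the messages that match one of the stuck-image signatures."""
    return [m for m in messages if any(sig.search(m) for sig in SIGNATURES)]


if __name__ == "__main__":
    # Hypothetical sample messages, not taken from a real cluster.
    sample = [
        "Successfully pulled image",
        "error reserving ctr name k8s_app_demo for id 1234",
        "pod sandbox with name k8s_demo_ns already exists",
    ]
    for hit in find_stuck_image_events(sample):
        print(hit)
```

A match only tells you which pods and nodes to look at; it does not repair the half-loaded image.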

OKD image is stuck – what makes the error more likely?

There are several theories about the origin of the error. Some discussions blame overly long path names when the image layers are stored in the file system. Another theory points to network latency or slow Docker registries. In my tests, the bug occurred only randomly: some clusters ran into it, while others were not affected.

Solving the stuck image

When a Docker image gets stuck, the affected image is no longer available on the worker node. In addition, you may see increasing network traffic caused by repeated attempts to reload the image. In some cases, deleting the affected pod or the containing namespace solved the problem. In another case, however, neither deleting the pod nor rebooting the worker node helped, and the affected worker node could no longer run the image. Unfortunately, we had to reinstall that worker node from scratch.

When the bug becomes a trouble amplifier

The problems escalate when they reinforce each other. In one incident, a hardware malfunction removed a worker node from the cluster. The Kubernetes failover worked well and started the missing pods on another worker node. Consequently, Kubernetes began pulling all the missing images. This massive image loading caused network latency, which may make the bug more likely. Unfortunately, several image pulls ran into retries, and the Docker registry activated its pull rate limit.

Conclusion

This bug is a severe error. Imagine it occurring in production: loading container images is so essential that such problems turn operations into a gamble. A different post reports another Docker image-related problem.
