New: OpenShift OKD causes image “layer not known” problems

by | Jan 8, 2021 | Docker

The pod fails to start, and the error message refers to an unknown image layer. The “layer not known” issue may affect one or more cluster nodes. In effect, a corrupt container image sits in the node's local disk cache. Because the corruption is on disk, the problem persists even after a node or cluster restart.

Image layer not known

The CRI-O container runtime v1.18.2 causes this problem. In this release, the developers optimized the code for better performance. Unfortunately, in some cases, this optimization does not flush file operations to disk. As a result, image metadata is missing or corrupt, and the underlying image becomes unusable. OpenShift OKD 4.5 and even the 4.6 nightlies are affected.

CRI-O v1.18.2 image issue

Failed to create pod sandbox: rpc error: code = Unknown desc = error creating pod sandbox with name "XXXX": layer not known
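To see which pods are hitting this error, you can scan the cluster events for the message. The following is a minimal sketch, assuming the default column layout of `oc get events -A` (namespace in the first column, the affected object shown as `pod/<name>`); the function name `find_layer_not_known` is hypothetical and not part of any OKD tooling.

```shell
#!/usr/bin/env bash
# find_layer_not_known (hypothetical helper): reads `oc get events -A` output
# from stdin and prints "<namespace> pod/<name>" for every event whose message
# contains "layer not known". Assumes the namespace is the first column.
find_layer_not_known() {
  awk 'tolower($0) ~ /layer not known/ {
         # The affected object appears as a field such as "pod/web-1".
         for (i = 1; i <= NF; i++) {
           if ($i ~ /^pod\//) { print $1, $i; break }
         }
       }' | sort -u   # one line per pod, even if the event repeats
}
```

Usage: `oc get events -A | find_layer_not_known`. The pods it lists tell you which nodes to inspect, since the corrupt image lives on the node that tried to start the pod.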

CRI-O versions affected by “layer not known”

GitHub issue 4285 tracks this problem. CRI-O v1.18.4 fixes it. OKD 4.6 should also ship with a fixed v1.19. Therefore, consider upgrading.
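To check which nodes run an affected CRI-O release, you can compare each node's runtime version against the broken and fixed versions. This is a sketch under stated assumptions: it treats everything from 1.18.2 up to (but not including) 1.18.4 as affected, although the article only names 1.18.2 as broken and 1.18.4 as fixed; it relies on GNU `sort -V` for version ordering; and the helper names `version_lt`, `crio_version_affected`, and `report_affected_nodes` are hypothetical.

```shell
#!/usr/bin/env bash
# version_lt A B: succeeds if version A sorts strictly before version B.
# Relies on GNU `sort -V` (version sort).
version_lt() {
  [ "$1" != "$2" ] && [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$1" ]
}

# crio_version_affected VERSION (hypothetical helper): succeeds if VERSION
# falls in [1.18.2, 1.18.4). ASSUMPTION: 1.18.3 counts as affected; the
# article only names 1.18.2 as broken and 1.18.4 as fixed.
crio_version_affected() {
  local v="${1#cri-o://}"   # strip the "cri-o://" prefix used in node status
  # Keep only MAJOR.MINOR.PATCH, e.g. "1.18.3-<build suffix>" -> "1.18.3".
  v="$(printf '%s' "$v" | grep -oE '^[0-9]+\.[0-9]+\.[0-9]+')" || return 1
  ! version_lt "$v" "1.18.2" && version_lt "$v" "1.18.4"
}

# report_affected_nodes (hypothetical): reads "<node> <runtime>" lines from
# stdin and prints the lines whose runtime version is affected.
report_affected_nodes() {
  local node runtime
  while read -r node runtime; do
    if crio_version_affected "$runtime"; then
      printf '%s %s\n' "$node" "$runtime"
    fi
  done
}
```

Usage: `oc get nodes -o jsonpath='{range .items[*]}{.metadata.name}{" "}{.status.nodeInfo.containerRuntimeVersion}{"\n"}{end}' | report_affected_nodes`. Nodes report their runtime as `cri-o://<version>`, which is why the prefix is stripped before comparing.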

Manually fixing corrupted images

A suitable workaround is to delete the images from the affected node. After that, the container platform pulls the missing images from the registry again. As a result, the pods find valid images and start.

Delete all images from the cluster node

# Run as root on the affected node.
systemctl stop kubelet
systemctl stop crio
# Remove all locally stored images and containers.
rm -rf /var/lib/containers/
systemctl start crio
systemctl start kubelet
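Because `rm -rf /var/lib/containers/` throws away every local image and container, it can help to preview the steps before running them. The following sketch wraps the same five commands in a function with a dry-run switch; `run`, `wipe_node_images`, and the `DRY_RUN` variable are hypothetical names introduced here, not part of OKD or CRI-O.

```shell
#!/usr/bin/env bash
# run (hypothetical): print the command when DRY_RUN=1, otherwise execute it.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '+ %s\n' "$*"
  else
    "$@"
  fi
}

# wipe_node_images (hypothetical): the manual workaround from above.
# Stops at the first failing step and reports failure via the return code.
wipe_node_images() {
  if [ "${DRY_RUN:-0}" != "1" ] && [ "$(id -u)" -ne 0 ]; then
    echo "wipe_node_images: must run as root" >&2
    return 1
  fi
  run systemctl stop kubelet || return 1
  run systemctl stop crio || return 1
  # Deletes ALL locally stored images and containers on this node.
  run rm -rf /var/lib/containers/ || return 1
  run systemctl start crio || return 1
  run systemctl start kubelet
}
```

Usage: `DRY_RUN=1 wipe_node_images` only prints the commands; running `wipe_node_images` as root on the node executes them in the same order as the manual steps.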

Conclusion

For more details, refer to the Red Hat Bugzilla entry. In my case, deleting all images solved all problems. Consider upgrading to the next OKD release. On a production system, the error is annoying, but at least it is solvable. Another article describes a similar problem. That problem, however, is not so easily solved and, combined with other problems, can lead to a total failure.
