New: OpenShift OKD causes image layer not known problems

by | Jan 8, 2021 | Docker

The pod fails to start, and the error message refers to an image “layer not known”. The “layer not known” issue may affect one or more cluster nodes. In effect, there is a corrupt docker image in the local disk cache of the node. The problem persists even after a node or cluster restart.

Image layer not known

The CRI-O container runtime v1.18.2 causes this problem. In this release, the developers optimized the code for better performance. Unfortunately, in some cases this optimization does not flush file operations to disk. As a result, image metadata is missing or corrupt, and the underlying docker image becomes unusable. OpenShift OKD 4.5 and even the 4.6 nightlies are affected.

CRI-O v1.18.2 image issue

Failed to create pod sandbox: rpc error: code = Unknown desc = error creating pod sandbox with name "XXXX": layer not known
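Because the issue can hit one node or several, it helps to see which nodes report the error. The following is a minimal sketch, assuming the oc client is logged in to the cluster and that the kubelet records the failure as an event with the reason FailedCreatePodSandBox. The function names list_sandbox_events and affected_nodes are made up for this example.

```shell
# Sketch only: list_sandbox_events and affected_nodes are hypothetical helper names.

# Print "<node>\t<message>" for every FailedCreatePodSandBox event in the cluster.
# Assumption: the kubelet fills .source.host with the node name.
list_sandbox_events() {
    oc get events --all-namespaces \
        --field-selector reason=FailedCreatePodSandBox \
        -o jsonpath='{range .items[*]}{.source.host}{"\t"}{.message}{"\n"}{end}'
}

# Read "<node>\t<message>" lines on stdin and print each node once
# if at least one of its messages contains "layer not known".
affected_nodes() {
    awk -F '\t' '/layer not known/ { print $1 }' | sort -u
}
```

Running list_sandbox_events | affected_nodes should then print one node name per line. Events expire after a while, so an empty result does not prove that a node is healthy.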

Layer not known affected CRI-O versions

GitHub issue 4285 tracks this problem. CRI-O v1.18.4 fixes it. Also, OKD 4.6 should include a fixed v1.19. Therefore, you should consider upgrading.
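To find out which nodes still run a CRI-O release older than v1.18.4, the runtime version that each node reports can be compared against that release. This is a rough sketch, assuming the oc client is logged in, that nodes report the runtime as cri-o://X.Y.Z followed by an optional build suffix, and that sort supports -V (GNU coreutils). All function names are invented for this example. The status "predates-fix" only means the version is lower than 1.18.4; the article names v1.18.2 as the broken release, so not every older version is necessarily affected.

```shell
# Sketch only: all function names below are hypothetical helpers.

# Reduce "cri-o://1.18.2-18.rhaos4.5..." or "v1.18.2" to plain "1.18.2".
crio_semver() {
    printf '%s\n' "$1" | sed -E 's#^cri-o://##; s/^v//; s/^([0-9]+\.[0-9]+\.[0-9]+).*/\1/'
}

# Return 0 if version $1 is greater than or equal to version $2.
version_at_least() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Print "<node>\t<runtime version>" for every node in the cluster.
list_node_runtimes() {
    oc get nodes -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.nodeInfo.containerRuntimeVersion}{"\n"}{end}'
}

# Read "<node>\t<runtime version>" lines on stdin and add a status column.
report_nodes() {
    while IFS="$(printf '\t')" read -r node runtime; do
        version="$(crio_semver "$runtime")"
        if version_at_least "$version" "1.18.4"; then
            status="has-fix"
        else
            status="predates-fix"
        fi
        printf '%s\t%s\t%s\n' "$node" "$version" "$status"
    done
}
```

A call such as list_node_runtimes | report_nodes prints one line per node with its CRI-O version and the status.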

Manually fixing corrupted images

A suitable workaround is to delete the images from the affected node. After that, the container platform pulls the missing docker images from the docker registry again. In effect, the pods find valid docker images and start.

Delete all images from the cluster node

systemctl stop kubelet
systemctl stop crio
rm -rf /var/lib/containers/
systemctl start crio
systemctl start kubelet
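Since the sequence above removes a whole directory, it can be reassuring to review the commands before they run for real. The following sketch wraps the same five commands in a function with a dry-run switch. DRY_RUN, run and wipe_node_images are names chosen for this example and are not part of OKD or CRI-O. The function is meant to be run as root on the affected node.

```shell
# Sketch only: DRY_RUN, run and wipe_node_images are hypothetical names.

# Default to dry-run so that nothing is deleted by accident.
DRY_RUN="${DRY_RUN:-1}"

# Print the command in dry-run mode, execute it otherwise.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

# Same order as the manual steps; the && chain stops at the first failure,
# so the directory is not removed while kubelet or crio is still running.
wipe_node_images() {
    run systemctl stop kubelet &&
    run systemctl stop crio &&
    run rm -rf /var/lib/containers/ &&
    run systemctl start crio &&
    run systemctl start kubelet
}
```

Calling wipe_node_images prints the five commands without touching the system. DRY_RUN=0 wipe_node_images executes them.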

Conclusion

For more details, refer to the Red Hat Bugzilla entry. In my case, deleting all images solved all problems. Consider upgrading to the next OKD release. On a production system, the error is annoying, but it is at least solvable. Another article describes a similar problem. That problem is not so easily solvable and, together with other problems, can lead to a total failure.