
How to resolve 'node unavailable, kubelet stopped posting node status' when using Rancher

Problem

When using Rancher, a worker node may occasionally stop responding, and Rancher then shows a warning like this:

Terminal window
Unavailable
kubelet stopped posting node status

Environment

  • Docker: Server Version: 19.03.13
  • Rancher 2.x

Debug

You can debug the node status by running this command:

Terminal window
kubectl describe nodes
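
On a cluster with many nodes, the full describe output can be long. As a minimal sketch, the helper below (the name not_ready_nodes is hypothetical, not part of kubectl) filters the output of kubectl get nodes --no-headers down to the names of nodes whose status is not Ready, so you can describe only those:

```shell
# Hypothetical helper: reads `kubectl get nodes --no-headers` output on stdin
# and prints the name of every node whose STATUS column does not start with
# "Ready" (for example NotReady).
not_ready_nodes() {
  awk '$2 !~ /^Ready/ { print $1 }'
}

# Usage (assumes kubectl is configured for the affected cluster):
#   kubectl get nodes --no-headers | not_ready_nodes
#   kubectl describe node "$(kubectl get nodes --no-headers | not_ready_nodes | head -n 1)"
```

A node shown as Ready,SchedulingDisabled is treated as healthy here, since it is cordoned but still reporting status.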

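Then check the kubelet logs on the node. The command below assumes the kubelet runs as a systemd service; on RKE-provisioned Rancher clusters the kubelet runs as a Docker container, so use docker logs kubelet there instead: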

Terminal window
journalctl -u kubelet
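
Kubelet logs are verbose, so it can help to narrow them down first. The sketch below is one possible filter, not an official tool: kubelet_problems is a hypothetical name, and the search pattern is only a guess at useful keywords that you should adjust to what your logs contain.

```shell
# Hypothetical helper: reads log lines on stdin, keeps those that look like
# problems, and prints only the most recent N of them (default 20).
kubelet_problems() {
  grep -iE 'error|fail|timed out|not ready' | tail -n "${1:-20}"
}

# Usage, when the kubelet is a systemd service:
#   journalctl -u kubelet --since "1 hour ago" --no-pager | kubelet_problems
# Usage, assuming an RKE node where the kubelet container is named "kubelet":
#   docker logs --since 1h kubelet 2>&1 | kubelet_problems 50
```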

Solution

Solution #1: Restart docker/kubelet service

Try restarting the Docker service (and the kubelet, if it runs as a separate service) on the affected node:

On CentOS:

Terminal window
service docker restart

On Ubuntu:

Terminal window
systemctl restart docker
systemctl restart kubelet

Solution #2: Reboot the node

If you have root access and the node can tolerate the downtime, reboot it:

Terminal window
reboot

Solution #3: Recreate the cluster

You can follow this guide to recreate the cluster.

Solution #4: Remove and then re-add the node

  1. Remove the node from the cluster.
  2. Add the node to the cluster again, or perform an etcd snapshot restore by following this guide.

Solution #5: Disable swap memory on the node

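By default, the kubelet refuses to start while swap is enabled. You can follow this guide or simply run the following command, which turns swap off until the next reboot: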

Terminal window
swapoff -a
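
If swap is also listed in /etc/fstab, it comes back after a reboot. As a rough sketch, assuming a conventional fstab layout, the two helpers below (both names are hypothetical) check whether any swap device is still active and print a copy of fstab with its swap entries commented out:

```shell
# Hypothetical helper: succeeds (exit 0) if any swap device is active.
# /proc/swaps starts with one header line; every further line is a device.
swap_is_active() {
  awk 'NR > 1 { found = 1 } END { exit !found }' "${1:-/proc/swaps}"
}

# Hypothetical helper: prints the given fstab with uncommented swap entries
# commented out. It writes to stdout only, so review the result before
# replacing the real file.
comment_swap_entries() {
  sed -E '/^[[:space:]]*[^#[:space:]].*[[:space:]]swap[[:space:]]/ s/^/# /' "${1:-/etc/fstab}"
}

# Usage:
#   swap_is_active && echo "swap is still on"
#   comment_swap_entries /etc/fstab > /tmp/fstab.new   # inspect, then copy into place as root
```

Printing to stdout instead of editing in place is a deliberate choice here: a broken /etc/fstab can prevent the node from booting, so the edit is worth reviewing first.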

Solution #6: Re-enable IP forwarding for Docker

The Docker daemon (dockerd) enables IP forwarding (sysctl net.ipv4.ip_forward) when it starts. However, running service network restart disables IP forwarding while networking is being stopped, so you need to re-enable it afterwards.

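You can check whether Docker reports IP forwarding as disabled. The 2>&1 redirect is there because some Docker CLI versions print these warnings to stderr rather than stdout:

Terminal window
docker info 2>&1 | grep WARNING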

If you see this:

Terminal window
WARNING: IPv4 forwarding is disabled

Then you should re-enable IP forwarding temporarily:

Terminal window
sudo sysctl -w net.ipv4.ip_forward=1

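Or permanently, by appending the setting to /etc/sysctl.conf and then reloading that file so it takes effect immediately:

Terminal window
echo "net.ipv4.ip_forward = 1" | sudo tee -a /etc/sysctl.conf
sudo sysctl -p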
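
Appending with tee -a adds another copy of the line every time the command is run. The sketch below shows one way to make the check and the persistent change repeatable; both function names are hypothetical, and the optional file arguments exist only so the helpers can be tried on a scratch file first.

```shell
# Hypothetical helper: succeeds if the kernel currently has IPv4 forwarding on.
ip_forward_enabled() {
  [ "$(cat "${1:-/proc/sys/net/ipv4/ip_forward}")" = "1" ]
}

# Hypothetical helper: appends the setting to a sysctl config file only if an
# identical active line is not already there, so repeated runs do not
# accumulate duplicate entries.
ensure_ip_forward_persisted() {
  conf="${1:-/etc/sysctl.conf}"
  if ! grep -Eq '^[[:space:]]*net\.ipv4\.ip_forward[[:space:]]*=[[:space:]]*1[[:space:]]*$' "$conf" 2>/dev/null; then
    echo "net.ipv4.ip_forward = 1" >> "$conf"
  fi
}

# Usage (as root):
#   ip_forward_enabled || sysctl -w net.ipv4.ip_forward=1
#   ensure_ip_forward_persisted /etc/sysctl.conf
```

If the file already contains net.ipv4.ip_forward = 0, the appended line comes later in the file and so should take precedence when the file is loaded, but removing the stale entry is cleaner.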

Summary

This post outlined six ways to resolve the “node unavailable, kubelet stopped posting node status” error in Rancher: restarting the Docker and kubelet services, rebooting the node, recreating the cluster, removing and re-adding the node, disabling swap, and re-enabling IP forwarding. Start with the least disruptive options, such as restarting services or checking swap and IP forwarding, before resorting to recreating the cluster.

Final Words + More Resources

My intention with this article was to help others facing this problem, and I hope it has. If you still have any questions, don’t hesitate to email me.
