How to resolve 'node unavailable, kubelet stopped posting node status' when using Rancher

Feb 1, 2025

Problem

When using Rancher, sometimes a worker node may stop working, and you may encounter a warning like this:

Unavailable
kubelet stopped posting node status

Environment

Docker: Server Version: 19.03.13
Rancher 2.x

Debug

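You can inspect the node status and its conditions by running this command:

kubectl describe nodes

Then check the kubelet logs on the affected node. If the kubelet runs as a systemd unit:

journalctl -u kubelet

On nodes provisioned by Rancher with RKE, the kubelet runs as a Docker container named kubelet, so read its logs through Docker instead:

docker logs kubelet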
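On a cluster with many nodes, it can help to list only the nodes that are not healthy before describing them. The following is a minimal sketch, not part of the original procedure: the helper name `not_ready_nodes` is hypothetical, and it assumes the default `kubectl get nodes` layout in which the second column is STATUS.

```shell
#!/bin/sh
# Print the names of nodes whose primary status is not "Ready".
# Reads the output of `kubectl get nodes --no-headers` on stdin.
# Assumption: column 1 is NAME and column 2 is STATUS (the default layout).
not_ready_nodes() {
    awk '{
        # STATUS may carry suffixes such as "Ready,SchedulingDisabled";
        # only the part before the first comma is the readiness state.
        split($2, parts, ",")
        if (parts[1] != "Ready") print $1
    }'
}

# Usage (requires kubectl access to the cluster):
#   kubectl get nodes --no-headers | not_ready_nodes
```

Each name it prints can then be passed to kubectl describe node to see the conditions and recent events for that node only.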

Solution

Solution #1: Restart the Docker/kubelet services

Try restarting the Docker service on the affected node. If the kubelet runs as its own systemd unit, restart it as well:

On CentOS:

service docker restart

On Ubuntu:

systemctl restart docker
systemctl restart kubelet
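After a restart, you may want to confirm that the node actually reports Ready again instead of refreshing the Rancher UI. This is a small sketch built on `kubectl wait`; the function name `wait_for_node_ready` and the 120s default timeout are assumptions chosen for illustration.

```shell
#!/bin/sh
# Block until the given node reports the Ready condition, or until the
# timeout expires (kubectl then exits with a non-zero status).
#   $1: node name as shown by `kubectl get nodes`
#   $2: optional timeout, default 120s (an arbitrary value for this sketch)
wait_for_node_ready() {
    kubectl wait --for=condition=Ready "node/$1" --timeout="${2:-120s}"
}

# Usage (run from a machine with kubectl access, not necessarily the node):
#   wait_for_node_ready worker-1 300s && echo "node is back"
```

The same check applies after any of the fixes in this post, since they all end with the kubelet posting its status again.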

Solution #2: Reboot the node

If you have root access and the node can tolerate a short downtime, reboot it:

reboot

Solution #3: Recreate the cluster

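If nothing else works, you can recreate the cluster by following the Rancher Cluster Provisioning Guide (linked under More Resources below). This is the most disruptive option, so try the other solutions first.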

Solution #4: Remove and then re-add the node

First, remove the node from the cluster.
Second, add the node to the cluster again, or perform an etcd snapshot restore by following the Restoring etcd in Rancher guide (linked under More Resources below).

Solution #5: Disable swap memory on the node

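By default, the kubelet refuses to start while swap is enabled. You can follow the Disable Swap Partition in CentOS/Ubuntu guide (linked under More Resources below) or simply execute the following command, which turns swap off until the next reboot: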

swapoff -a
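Because `swapoff -a` does not survive a reboot, it is worth checking whether swap is active and whether `/etc/fstab` will turn it back on. The sketch below is one way to do both; the function names are hypothetical, and it assumes the standard `/proc/swaps` layout (one header line, then one line per active swap area) and fstab entries whose third field is `swap`.

```shell
#!/bin/sh
# Succeed (exit 0) if the swaps table lists at least one active swap area.
#   $1: optional path to the table, default /proc/swaps
swap_is_active() {
    awk 'NR > 1 && NF > 0 { found = 1 } END { exit !found }' "${1:-/proc/swaps}"
}

# Read an fstab on stdin and print it with every active swap entry
# commented out. Lines that are already comments are left untouched.
comment_out_swap_entries() {
    awk '!/^[[:space:]]*#/ && $3 == "swap" { print "#" $0; next } { print }'
}

# Usage:
#   swap_is_active && sudo swapoff -a
#   comment_out_swap_entries < /etc/fstab > /tmp/fstab.new
#   # Review /tmp/fstab.new, then replace /etc/fstab with it manually.
```

Writing the result to a separate file rather than editing `/etc/fstab` in place is deliberate: a broken fstab can prevent the node from booting, so the change should be reviewed first.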

Solution #6: Re-enable IP forwarding for Docker

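dockerd enables IP forwarding (sysctl net.ipv4.ip_forward) when it starts. However, running service network restart disables IP forwarding again while networking is stopped, and Docker does not turn it back on by itself. Without IP forwarding, containers on the node can lose network connectivity, so you need to re-enable it.

You can check whether Docker reports that IP forwarding is disabled by running:

docker info 2>&1 | grep WARNING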

If you see this:

WARNING: IPv4 forwarding is disabled

Then you should re-enable IP forwarding temporarily:

sudo sysctl -w net.ipv4.ip_forward=1

Or permanently (the file is read at boot, so run sudo sysctl -p afterwards to apply the setting immediately):

echo "net.ipv4.ip_forward = 1" | sudo tee -a /etc/sysctl.conf
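Appending with `tee -a` adds a duplicate line every time the command is run, and it does not override an existing `net.ipv4.ip_forward = 0` entry that appears later in the file. The following sketch shows one way to check the live value and to produce a config with the setting present exactly as intended; the function names are hypothetical, and the file paths are passed in so that nothing is modified in place.

```shell
#!/bin/sh
# Succeed (exit 0) if IPv4 forwarding is currently enabled.
#   $1: optional path to the kernel flag, default /proc/sys/net/ipv4/ip_forward
ip_forward_enabled() {
    [ "$(cat "${1:-/proc/sys/net/ipv4/ip_forward}")" = "1" ]
}

# Print the contents of a sysctl config file with net.ipv4.ip_forward set
# to 1: an existing entry is rewritten, otherwise a new line is appended.
#   $1: path to the config file (for example /etc/sysctl.conf)
ensure_ip_forward_line() {
    conf_file="$1"
    if grep -Eq '^[[:space:]]*net\.ipv4\.ip_forward[[:space:]]*=' "$conf_file"; then
        sed -E 's/^[[:space:]]*net\.ipv4\.ip_forward[[:space:]]*=.*/net.ipv4.ip_forward = 1/' "$conf_file"
    else
        cat "$conf_file"
        echo "net.ipv4.ip_forward = 1"
    fi
}

# Usage:
#   ip_forward_enabled || sudo sysctl -w net.ipv4.ip_forward=1
#   ensure_ip_forward_line /etc/sysctl.conf > /tmp/sysctl.conf.new
#   # Review the new file, copy it over /etc/sysctl.conf, then: sudo sysctl -p
```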

Summary

This post outlines six ways to resolve the “node unavailable, kubelet stopped posting node status” error in Rancher: restarting the Docker and kubelet services, rebooting the node, recreating the cluster, removing and re-adding the node, disabling swap, and re-enabling IP forwarding. Start with the least disruptive options (restarting services, checking swap and IP forwarding) before rebooting the node or rebuilding the cluster.

Final Words + More Resources

I wrote this article to help others who run into the same problem, and I hope it has done that. If you still have any questions, don’t hesitate to email me.

Here are the most important links from this article, along with some further resources on the topic:

👨‍💻 Rancher Cluster Provisioning Guide
👨‍💻 Restoring etcd in Rancher
👨‍💻 Disable Swap Partition in CentOS/Ubuntu

Oh, and if you found these resources useful, don’t forget to support me by starring the repo on GitHub!