How to resolve 'node unavailable, kubelet stopped posting node status' when using Rancher
Problem
When using Rancher, a worker node may occasionally stop working, and the cluster reports a warning such as 'node unavailable, kubelet stopped posting node status'.
Environment
- Docker Server Version: 19.03.13
- Rancher 2.x
Debug
You can debug the node status by running this command:
Then check the kubelet logs on the node:
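The two debugging steps above might look like the sketch below. It assumes kubectl is already configured for the affected cluster and that the node was provisioned by Rancher (RKE), where the kubelet runs as a Docker container named kubelet. The function names and the node name worker-1 are placeholders made up for this example.

```shell
# Hypothetical helpers; nothing runs until you call them.

# Show all nodes, then the conditions and recent events of one node.
# Run this from a machine where kubectl points at the affected cluster.
node_status() {
  kubectl get nodes
  kubectl describe node "$1"
}

# Print the last lines of the kubelet log. Run this on the node itself.
# Assumes the kubelet runs as a Docker container named "kubelet" (RKE).
kubelet_logs() {
  docker logs --tail "${1:-100}" kubelet 2>&1
}

# Example:
#   node_status worker-1
#   kubelet_logs 200
```

If the kubelet runs as a systemd service instead of a container, journalctl -u kubelet is the usual place to look.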
Solution
Solution #1: Restart docker/kubelet service
You can try to restart the Docker service on the non-working node:
On CentOS:
On Ubuntu:
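On systemd-based releases of both CentOS and Ubuntu, the restart can most likely be done with the same systemctl command. The sketch below assumes systemd and, for the second helper, an RKE-style node where the kubelet is a Docker container named kubelet. The function names are made up for illustration.

```shell
# Hypothetical helpers; run them on the affected node.

# Restart the Docker daemon (assumes a systemd-based distribution).
restart_docker() {
  sudo systemctl restart docker
  # Prints "active" once the daemon is back up.
  sudo systemctl is-active docker
}

# On RKE nodes the kubelet is itself a Docker container and can be
# restarted on its own (the container name "kubelet" is an assumption).
restart_kubelet_container() {
  docker restart kubelet
}

# Example:
#   restart_docker
#   restart_kubelet_container
```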
Solution #2: Reboot the node
If you have root privileges and the node can tolerate a short downtime, you can reboot it:
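A cautious way to do the reboot is sketched below. Cordoning the node first is an optional extra that is not part of the original steps; it only stops new pods from being scheduled there while the node is down. The function names and worker-1 are placeholders.

```shell
# Hypothetical helpers; nothing runs until you call them.

# Optional: mark the node unschedulable before the reboot.
# Run from a machine with kubectl access to the cluster.
cordon_node() {
  kubectl cordon "$1"
}

# Reboot the machine. Run this on the node itself.
reboot_node() {
  sudo reboot
}

# Allow scheduling again once the node reports Ready.
uncordon_node() {
  kubectl uncordon "$1"
}

# Example:
#   cordon_node worker-1
#   reboot_node            # on the node
#   uncordon_node worker-1
```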
Solution #3: Recreate the cluster
You can recreate the cluster by following the Rancher Cluster Provisioning Guide (linked under More Resources).
Solution #4: Remove and then re-add the node
- First, remove the node from the cluster.
- Second, add the node to the cluster again, or perform an etcd snapshot restore by following the Restoring etcd in Rancher guide (linked under More Resources).
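In Rancher, removing and re-adding a node is normally done from the UI. For reference, a plain kubectl equivalent of the removal step could look like the sketch below. It assumes kubectl access to the cluster, and the function name and worker-1 are placeholders. The re-add step is left out on purpose, because the registration command is specific to each cluster.

```shell
# Hypothetical helper; run from a machine with kubectl access.

# Delete the node object from the cluster, then list what remains.
remove_node() {
  kubectl delete node "$1"
  kubectl get nodes
}

# Example:
#   remove_node worker-1
```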
Solution #5: Disable swap memory on the node
You can follow the Disable Swap Partition in CentOS/Ubuntu guide (linked under More Resources) or simply execute the following command:
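A minimal sketch of disabling swap is shown below, assuming the standard util-linux tools are installed on the node. The function name is made up for illustration. Note that swapoff only lasts until the next reboot unless the swap entry in /etc/fstab is also commented out.

```shell
# Hypothetical helper; run it on the affected node.

# Turn off every active swap device, then list what is still active.
disable_swap() {
  sudo swapoff -a
  # No output from this command means swap is fully off.
  swapon --show
}

# Example:
#   disable_swap
```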
Solution #6: Re-enable IP forwarding for Docker
Dockerd enables IP forwarding (sysctl net.ipv4.ip_forward) when it starts. However, if you run service network restart, it will disable IP forwarding while stopping networking. You need to re-enable it.
You can verify the ip_forward status by running:
If the output is net.ipv4.ip_forward = 0, IP forwarding is disabled, and you should re-enable it temporarily:
Or permanently:
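The check and both ways of re-enabling IP forwarding can be sketched as follows. The function names and the file name 99-ip-forward.conf are assumptions made for this example; adding the same line to /etc/sysctl.conf would work as well.

```shell
# Hypothetical helpers; run them on the affected node.

# Print the current setting: "net.ipv4.ip_forward = 0" means disabled.
check_ip_forward() {
  sysctl net.ipv4.ip_forward
}

# Enable forwarding on the running kernel only (not persistent).
enable_ip_forward_now() {
  sudo sysctl -w net.ipv4.ip_forward=1
}

# Persist the setting in a sysctl drop-in file and reload all settings.
# The file name is an assumption; any *.conf name in /etc/sysctl.d works.
enable_ip_forward_permanently() {
  echo 'net.ipv4.ip_forward = 1' | sudo tee /etc/sysctl.d/99-ip-forward.conf
  sudo sysctl --system
}

# Example:
#   check_ip_forward
#   enable_ip_forward_now
#   enable_ip_forward_permanently
```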
Summary
This post outlines several methods to resolve the “node unavailable, kubelet stopped posting node status” error in Rancher. Key solutions include restarting Docker and kubelet services, rebooting the node, recreating the cluster, and reconfiguring IP forwarding. These steps should help restore node functionality and ensure smooth operation of your Rancher-managed Kubernetes cluster.
Final Words + More Resources
My intention with this article was to help others who are facing this problem, and I hope it has done that. If you still have any questions, don’t hesitate to email me.
Here are also the most important links from this article along with some further resources that will help you in this scope:
- 👨‍💻 Rancher Cluster Provisioning Guide
- 👨‍💻 Restoring etcd in Rancher
- 👨‍💻 Disable Swap Partition in CentOS/Ubuntu
Oh, and if you found these resources useful, don’t forget to support me by starring the repo on GitHub!