Automating Kubernetes Node Patching with Ansible — By Someone Who Doesn’t Like Ansible

Let’s get the obvious out of the way first: “What do you mean you DON’T LIKE ANSIBLE?”

Ansible is a very popular tool, and for good reason; it’s a powerful automation tool that allows centralized administration for however many servers you have with a variety of great out-of-the-box plugins, and if the builtin capabilities aren’t enough there are hundreds of community plugins available that will let you do pretty much anything against nearly any piece of technology out there that has an API (and probably a bunch that don’t). Knowing this, I will be the first to admit that my dislike of it is not especially logical, and is definitely a “me” thing.

My primary reason is that it is often touted as a holy grail and, in spite of my comment above, it isn’t — at least, not for my primary use case or at least, not until now. I’ll put a note at the bottom of this article explaining that.

Secondly, I strongly dislike the way you write configurations for it. Which actually makes no sense at all, coming from someone who generally likes both Terraform/HCL and YAML. I just didn’t particularly enjoy the way you have to split the configurations up into inventories and playbooks; I found it highly limiting. I have a second note at the bottom about that, too.

But enough whining. Let’s talk about what I did and why.

Why?

My Kubernetes homelab consists of 7 nodes:

4 Raspberry Pi 4b’s with 4 GB’s of RAM; 3 of which are the control plane nodes, all 4 of which have no SD card and are booting off a 256 GB external SSD
2 Intel NUC 12 Pros with i5–1250P processors, 16 GB RAM, and a 512 GB NVMe drive each
1 VM running on my Dell PowerEdge R720xd (which I only occasionally power on)

All 7 nodes are running Ubuntu 22.04 LTS and k3s v1.28.5. I’m also running Longhorn for distributed storage, and this is important later.

Keeping the nodes updated is kind of a chore; not least because the procedure looks a little like this:

kubectl drain <node> --ignore-daemonsets --delete-emptydir-data --pod-selector='app!=csi-attacher,app!=csi-provisioner'
SSH to the host.
sudo apt-get update && sudo apt-get upgrade
sudo reboot
Wait for the node to reboot and kubectl get nodes periodically to show that node as Ready,SchedulingDisabled.
kubectl uncordon <node>
Repeat for the other nodes.

On the Raspberry Pis in particular, due to the overall speed of the nodes, this often takes 5–10 minutes and requires at least one interaction with the SSH command line during step 3. Longhorn also tends to be a grumpy customer, and has held me at step one for 10+ minutes occasionally while constantly complaining that a pod on the node couldn’t be evicted because it would violate its disruption budget; something I didn’t quite grasp the why of until recently (and I’ll share it below).

Also note that here, I’m not upgrading Kubernetes/k3s; that’s for another article.

If I do nothing else, I can get the whole cluster updated in less than an hour. Usually, though, because I do this sort of thing off the side of my desk while I’m working, it takes no less than two. Hardly awful, but I find that I’m distracted and/or things fall by the wayside as I bring home the bacon.

I have been wanting to automate this process for a while, and I have known that the likely best too is Ansible, and when the latest round of me needing to update my cluster came up, I decided to bite the bullet.

How: Docker

A brief aside: I also hate Python.

I don’t hate the language per se, I just hate what it does to my system. The installer is weird, package management is weird, it tends to vomit files all over the system, virtualenv is weird, it’s not especially portable, and because of all of that when I rebuilt my main workstation earlier this year I refused to install it.

Anyway, pedantic ranting aside, Ansible is written in Python, so I was going to need that, but thankfully, Docker containers are a thing.

There isn’t an official Ansible Docker container, so I borrowed a Dockerfile from here and modified it a touch to include kubectl (more on that later), the Kubernetes Python module, and a requirements.ymlto install some Ansible modules. Of particular note, two improvements could be made here:

Move the pip3 install modules list to a requirements.txt as well, since if something changes in the future I then don’t have to update the Dockerfile (semantics, I know)
This could probably be slimmed down; the original source of the Dockerfile had this aimed at a slightly different use case than mine and might include extra things I don’t need

FROM python:slim

ENV DEBIAN_FRONTEND noninteractive

RUN apt-get update \
    && apt-get install -y --no-install-recommends \
    apt-transport-https \
    ca-certificates \
    software-properties-common \
    openssh-client \
    sshpass \
    locales \
    bash \
    git \
    curl \
    rsync \
    zsh \
    nano \
    sudo \
    less \
    && apt-get clean \
    && rm -rf /var/lib/apt/lists/* /tmp/* /var/tmp/* \
    && rm -Rf /usr/share/doc && rm -Rf /usr/share/man

RUN curl -LO "https://dl.k8s.io/release/$(curl -L -s https://dl.k8s.io/release/stable.txt)/bin/linux/amd64/kubectl" 
RUN chmod +x ./kubectl
RUN mv ./kubectl /usr/local/bin

ARG USERNAME=ansible
ARG USER_UID=1000
ARG USER_GID=$USER_UID
ENV HOME=/home/$USERNAME
RUN groupadd --gid $USER_GID $USERNAME
RUN useradd -s /bin/bash --uid $USER_UID --gid $USER_GID -m $USERNAME
RUN echo $USERNAME ALL=\(root\) NOPASSWD:ALL >/etc/sudoers.d/$USERNAME
RUN chmod 0440 /etc/sudoers.d/$USERNAME

RUN pip3 install --no-cache-dir \
    ansible \
    ara \
    hvac \
    dnspython \
    jmespath \
    "hvac[parser]" \
    certifi \
    ansible-lint \
    ansible-modules-hashivault \
    kubernetes

ENV ANSIBLE_GATHERING smart
ENV ANSIBLE_HOST_KEY_CHECKING false
ENV ANSIBLE_RETRY_FILES_ENABLED false
ENV ANSIBLE_FORCE_COLOR true

COPY requirements.yml .

RUN ansible-galaxy collection install -r requirements.yml

RUN echo "LC_ALL=en_US.UTF-8" >> /etc/environment
RUN echo "en_US.UTF-8 UTF-8" >> /etc/locale.gen
RUN echo "LANG=en_US.UTF-8" > /etc/locale.conf
RUN locale-gen en_US.UTF-8

ENV DEBIAN_FRONTEND=dialog

requirements.yml:

collections:
  - name: kubernetes.core
    version: 3.0.0

Build this with: docker build -t djryanj/ansible:latest .

How: Ansible

On to the Ansible pieces. I created a couple of directories and some files to assist:

├── Dockerfile
├── id_rsa
├── inventory
│   └── inventory.yaml
├── playbooks
│   └── update-with-kubernetes.yaml
├── requirements.yml
└── update.sh

Note that this is the complete directory listing.

inventory.yaml :

k8s_controlplanes:
  hosts:
    k3s-rpi-1.domain.com:
    k3s-rpi-2.domain.com:
    k3s-rpi-3.domain.com:

k8s_workers:
  hosts: 
    k3s-rpi-4.domain.com:
    k3s-nuc-1.domain.com:
    k3s-nuc-2.domain.com:
    k3s-vm-1.domain.com:

k8s:
  vars:
    ansible_user: me
    ansible_become: yes
    ansible_ssh_private_key_file: /ansible/id_rsa
  children:
    k8s_controlplanes:
    k8s_workers:

You’ll need to update ansible_user: to whatever username is on your system.

update.sh :

chmod 400 /ansible/id_rsa
ansible-playbook -i /ansible/inventory/inventory.yaml /ansible/playbooks/update-with-kubernetes.yaml -K

Before going on, a couple of notes:

Although it’s slightly unclear at this juncture, I use a volume to forward the id_rsa SSH private key Ansible needs to connect to the nodes into the Docker container (that’s why the line ansible_ssh_private_key_file: /ansible/id_rsa is in inventory.yaml). The first line of update.sh ensures that the key’s permissions are set properly, otherwise Ansible will complain.
I am also forwarding my kubeconfig file; more on this later.
The final -K in the ansible-playbook line in update.sh means that Ansible will ask for a sudo password interactively; this imparts a particular requirement on how we run this using Docker. More on this below.

Finally, the playbook:

update-with-kubernetes.yaml :

---
- hosts: k8s
  gather_facts: false
  serial: 1

  tasks:

  - name: Update apt cache on {{ inventory_hostname_short }}
    ansible.builtin.apt:
      update_cache: yes

  - name: Check if there are updates for {{ inventory_hostname_short }}
    ansible.builtin.command:
      cmd: apt list --upgradable
    register: updates

  - name: Cordon node {{ inventory_hostname_short }}
    delegate_to: localhost
    kubernetes.core.k8s_drain:
      state: cordon
      name: "{{ inventory_hostname_short }}"
    when: updates.stdout_lines | reject('search','Listing...') | list | length > 0

  - name: Evict Longhorn volumes from {{ inventory_hostname_short }}
    delegate_to: localhost
    kubernetes.core.k8s_json_patch:
      kind: nodes
      namespace: longhorn-system
      api_version: longhorn.io/v1beta2
      name: "{{ inventory_hostname_short }}"
      patch:
        - op: replace
          path: /spec/allowScheduling
          value: false
        - op: replace
          path: /spec/evictionRequested
          value: true
    when: updates.stdout_lines | reject('search','Listing...') | list | length > 0

  - name: Wait for Longhorn volume eviction on {{ inventory_hostname_short }}
    delegate_to: localhost
    kubernetes.core.k8s_info:
      kind: nodes
      namespace: longhorn-system
      api_version: longhorn.io/v1beta2
      name: "{{ inventory_hostname_short }}"
    register: replica_list
    until: "replica_list.resources[0] | community.general.json_query('status.diskStatus.*.scheduledReplica') |unique == [{}]"
    retries: 60
    delay: 10
    when: updates.stdout_lines | reject('search','Listing...') | list | length > 0

  - name: Drain node {{ inventory_hostname_short }}
    delegate_to: localhost
    ansible.builtin.shell: kubectl drain {{ inventory_hostname_short }} --ignore-daemonsets --delete-emptydir-data
    when: updates.stdout_lines | reject('search','Listing...') | list | length > 0

  - name: Upgrade all packages on node {{ inventory_hostname_short }}
    ansible.builtin.apt: 
      update_cache: no
      upgrade: yes
      force: yes
      dpkg_options: 'force-confdef,force-confold'
    when: updates.stdout_lines | reject('search','Listing...') | list | length > 0

  # Restart required?
  - name: Check if reboot is needed for {{ inventory_hostname_short }}
    stat: path=/var/run/reboot-required
    register: check_reboot 
    when: updates.stdout_lines | reject('search','Listing...') | list | length > 0

  - name: Reboot node {{ inventory_hostname_short }}
    ansible.builtin.reboot:
      connect_timeout: 5
      reboot_timeout: 600
      pre_reboot_delay: 0
      post_reboot_delay: 30
      test_command: whoami
      msg: "Reboot complete"
    when: check_reboot.stat.exists and updates.stdout_lines | reject('search','Listing...') | list | length > 0

  - name: Uncordon node {{ inventory_hostname_short }}
    delegate_to: localhost
    kubernetes.core.k8s_drain:
      state: uncordon
      name: "{{ inventory_hostname_short }}"
    tags:
      - always
    when: updates.stdout_lines | reject('search','Listing...') | list | length > 0

  - name: Re-enable Longhorn volumes on {{ inventory_hostname_short }}
    delegate_to: localhost
    kubernetes.core.k8s_json_patch:
      kind: nodes
      namespace: longhorn-system
      api_version: longhorn.io/v1beta2
      name: "{{ inventory_hostname_short }}"
      patch:
        - op: replace
          path: /spec/allowScheduling
          value: true
        - op: replace
          path: /spec/evictionRequested
          value: false
    when: updates.stdout_lines | reject('search','Listing...') | list | length > 0

Now let’s talk a bit about the playbook itself, in sections.

serial: 1

This line is important, otherwise Ansible will try to do this work concurrently across the nodes (by default, 5). As much as it’s not as efficient, we only want this to be done on one node at a time. Note that you could run this playbook twice with different concurrencies; 1 for the control planes, and then 2 or whatever makes sense for the workers.

- name: Update apt cache on {{ inventory_hostname_short }}
    ansible.builtin.apt:
      update_cache: yes
- name: Check if there are updates for {{ inventory_hostname_short }}
    ansible.builtin.command:
      cmd: apt list --upgradable
    register: updates

These tasks update the apt cache on the host, then checks if there are any updates to be applied and registers a variable. The idea here is to skip nodes if they don’t have any updates to be applied (e.g., if the playbook fails, for example). In practice this is only somewhat successful, so I welcome any suggestions to improve the logic here.

- name: Cordon node {{ inventory_hostname_short }}
    delegate_to: localhost
    kubernetes.core.k8s_drain:
      state: cordon
      name: "{{ inventory_hostname_short }}"
    when: updates.stdout_lines | reject('search','Listing...') | list | length > 0

Here we’re cordoning the node using the Kubernetes Ansible plugin (note the delegate_to: localhost line). Note that we aren’t draining the node just yet; this is because of some Longhorn stuff we need to do. Also note the last when: line, which is part of the logic in trying to detect if there are any actual updates to apply and in theory should skip it if there aren’t.

- name: Evict Longhorn volumes from {{ inventory_hostname_short }}
    delegate_to: localhost
    kubernetes.core.k8s_json_patch:
      kind: nodes
      namespace: longhorn-system
      api_version: longhorn.io/v1beta2
      name: "{{ inventory_hostname_short }}"
      patch:
        - op: replace
          path: /spec/allowScheduling
          value: false
        - op: replace
          path: /spec/evictionRequested
          value: true
    when: updates.stdout_lines | reject('search','Listing...') | list | length > 0
- name: Wait for Longhorn volume eviction on {{ inventory_hostname_short }}
    delegate_to: localhost
    kubernetes.core.k8s_info:
      kind: nodes
      namespace: longhorn-system
      api_version: longhorn.io/v1beta2
      name: "{{ inventory_hostname_short }}"
    register: replica_list
    until: "replica_list.resources[0] | community.general.json_query('status.diskStatus.*.scheduledReplica') |unique == [{}]"
    retries: 60
    delay: 10
    when: updates.stdout_lines | reject('search','Listing...') | list | length > 0

These tasks first use the excellent json_patch ability of the Kubernetes Ansible plugin to mark the Longhorn node as unschedulable and to request eviction of the Longhorn volumes. This is needed because as of right now, if we issue a drain command to the Kubernetes Ansible plugin, it will almost immediately fail with a 429 Too Many Requests error thrown back by Kubernetes, which the plugin does not seem to handle properly. The root of this is that there are Longhorn pods on each node where Longhorn volumes exist, and they cannot be evicted from the node until all Longhorn volumes are evicted as well. Draining the node does activate this eviction process in Longhorn, but I find that explicitly evicting the volumes is a bit more transparent in terms of what we’re trying to accomplish.

The second task here monitors the eviction. In Longhorn, the eviction process ensures that there is the requested number of replicas of a volume available, which in my case defaults to 3, so what this is actually doing is making a new replica on another node. It does this by copying all the data, and depending on how fast your network/disks are, this could take a bit. So, take care to adjust the retries: and delay: lines here for your environment; obviously, it will only use what’s needed, but in my case I have some decent sized volumes and a 10 minute maximum (60 retries delayed by 10 seconds apiece) is sufficient for now.

- name: Drain node {{ inventory_hostname_short }}
    delegate_to: localhost
    ansible.builtin.shell: kubectl drain {{ inventory_hostname_short }} --ignore-daemonsets --delete-emptydir-data
    when: updates.stdout_lines | reject('search','Listing...') | list | length > 0

Here we drain the nodes, this time using kubectl instead of the Kubernetes Ansible plugin. In spite of my explicit eviction of Longhorn volumes above, was not able to consistently get it to work without errors otherwise.

- name: Upgrade all packages on node {{ inventory_hostname_short }}
    ansible.builtin.apt: 
      update_cache: no
      upgrade: yes
      force: yes
      dpkg_options: 'force-confdef,force-confold'
    when: updates.stdout_lines | reject('search','Listing...') | list | length > 0

  # Restart required?
  - name: Check if reboot is needed for {{ inventory_hostname_short }}
    stat: path=/var/run/reboot-required
    register: check_reboot 
    when: updates.stdout_lines | reject('search','Listing...') | list | length > 0

  - name: Reboot node {{ inventory_hostname_short }}
    ansible.builtin.reboot:
      connect_timeout: 5
      reboot_timeout: 600
      pre_reboot_delay: 0
      post_reboot_delay: 30
      test_command: whoami
      msg: "Reboot complete"
    when: check_reboot.stat.exists and updates.stdout_lines | reject('search','Listing...') | list | length > 0

These are the upgrade tasks. In particular, the line dpkg_options: ‘force-confdef,force-confold’ tells dpkg to accept defaults and not prompt. This is slightly dangerous, but acceptable in my case as my nodes have no specific customizations on them. It then checks if a reboot is needed, then reboots if needed.

- name: Uncordon node {{ inventory_hostname_short }}
    delegate_to: localhost
    kubernetes.core.k8s_drain:
      state: uncordon
      name: "{{ inventory_hostname_short }}"
    tags:
      - always
    when: updates.stdout_lines | reject('search','Listing...') | list | length > 0

  - name: Re-enable Longhorn volumes on {{ inventory_hostname_short }}
    delegate_to: localhost
    kubernetes.core.k8s_json_patch:
      kind: nodes
      namespace: longhorn-system
      api_version: longhorn.io/v1beta2
      name: "{{ inventory_hostname_short }}"
      patch:
        - op: replace
          path: /spec/allowScheduling
          value: true
        - op: replace
          path: /spec/evictionRequested
          value: false
    when: updates.stdout_lines | reject('search','Listing...') | list | length > 0

The final 2 tasks uncordon the node making it ready for Kubernetes scheduling again, and then does the same for Longhorn volumes.

After this, Ansible proceeds with the next host in the inventory.

How: Tying it All Together

The final command to run this in Docker from my Windows 11 machine resembles this:

docker run -it --volume ${pwd}:/ansible --volume C:\Users\ryan\.kube\config:/home/ansible/.kube/config djryanj/ansible bash /ansible/update.sh

As I mentioned above, I’m forwarding both my kubeconfig and the entire contents of my local directory which contains all the files above and the id_rsa private key, which is a key part of keeping this at least somewhat secure.

The -it line in the run command also lets the Docker terminal launch interactively, which means that you will be able to type in the sudo password that ansible-playbook will request.

What about Talos Linux? It avoids all this!

Yes it does, and that’s on my list of things to do in the future, but this was a good exercise on its own.

Next up, I’ll also write some playbooks to automatically upgrade k3s on the nodes.

Addendum

Why I Don’t Like Ansible: Expanded (Slightly), Part 1

The biggest gripe I have with Ansible is that when bootstrapping new VMs or bare metal hardware (especially a Raspberry Pi), there’s always at least one manual intermediate step between deployment and being able to fire Ansible at the system for configuration required in a homelab situation — or, at least, in my homelab situation.

Deployment of a Raspberry Pi looks like this at minimum:

Flash SD card or USB SSD (or whatever) with whatever OS you want. If you’re feeling brave and your OS supports it, try to craft a cloud-init script that automatically sets the IP address of the device on first boot. You will probably fail.
Move the SD card or USB SSD (or whatever) to the device.
Boot the device, discover your cloud-init failed, and proceed to set the IP address manually either by finding the IP address in your DHCP leases list or using a console on the device directly.
Perform any other things to the node you need.
Either import, create, or install your id_rsa.pub file on the node so that Ansible can connect to it automatically.
Finally run Ansible playbooks against it to configure it.

A VM is a little less annoying, depending on your hypervisor, since it can be done completely remotely, but generally:

Create the VM and mount the OS ISO; or, if you have a template, deploy the template.
Boot the VM and console into it.
Install the OS and reboot the VM.
If your template set the IP address, fantastic; if not, using the console, set the IP address.
Perform any other things to the node you need.
Either import, create, or install your id_rsa.pub file on the node so that Ansible can connect to it automatically.
Finally run Ansible playbooks against it to configure it.

I work in cloud, and I’m used to writing Terraform configurations that result in a completely booted VM with options already set. I leverage cloud-init a lot in the cloud, but that’s just not possible in the same way in my homelab. I understand that equating my lab to a massive cloud provider is absolutely ludicrous, but it’s still frustrating.

Many homelabbers use Ansible to do post-boot configuration, but I haven’t found a reliable way to create new hosts in my lab in a completed software-defined way, which is a big part of why I haven’t embraced Ansible.

Why I Don’t Like Ansible: Expanded (Slightly), Part 2

The second reason I mentioned is about how you write configurations. This is much less of a problem than it used to be; my first crack at Ansible was before I began to work with cloud providers, and so things like YAML and even software-defined or Infrastructure as Code things were hard to grasp. Splitting things into inventories, playbooks, or just running a task from a single command line command seemed incredibly arcane at the time.

Ansible’s configuration is not especially user-friendly, and I find their documentation isn’t very approachable either, even as a seasoned IaC professional today. It assumes a level of skill that not everyone has, and while I understand that it’s aimed at a particular demographic of IT professionals, personally I think that it’s a miss on the part of the Ansible team because I firmly believe that designing and writing for the edges means everyone benefits.

Even putting this playbook together, I had to seek out the syntax I needed to write for tasks based on the documentation for a particular task (like kubernetes.core.k8s_drain , for example), and then figure out how those translated to the actual task writing because I was unfamiliar with Ansible and annoyingly, very few (if any) examples are provided on the documentation website for tasks.

Contrast this to Terraform providers, most of which make a point of including a detailed example for every resource type (somewhat analogous to a task in an Ansible playbook, though not exactly) on every documentation page.

Once again, I fully admit this is a “me” thing and once you get to know the particular foibles of a piece of technology they usually fade into the background. I know lots of people have complained about technologies I like based on personal preferences, which is something I both understand and support — to a degree.

Personally and professionally, I approach technologies with as objective of an eye as I can, and when I recognize that something is the best tool for the job, I work through my biases to get what needs doing done, and this particular exploration of Ansible (and this article) are testament to that.

Thanks for reading.

-r-