Kubernetes in the Hetzner cloud

Last month I started my journey down the path of Kubernetes, and today I can announce that every hosted resource of my home lab is successfully migrated into my new cluster. What? A new cluster? Indeed, I’m dogfooding this stuff to see what all the fuss is about. So this includes a bunch of websites including this one, some of my old tools like Jenkins, new tools like ArgoCD, Keycloak, Grafana, Nextcloud, my e-mail infrastructure and initially also my public anonymous FTP service but that eventually found a different home. Here’s the gist of my journey, all of which now sits in the Hetzner Cloud environment.

First off, why Hetzner? Simply put: because it is cheap. I’ve been a customer with them on and off for the past 20 years or so, and my previous setup was based around a big fat dedicated server of theirs (the SX-64 type) that I have been running for a few years now. It has 64TB of raw HDD-based storage, 64GB of RAM and a decent Ryzen CPU to top it all off. That box handled just about anything I could throw at it, running very reliably on FreeBSD. The cost? Slightly over EUR 100 per month, including a USB stick for the read-only boot volume.

The challenge: redundancy. I want my environment to be more resilient against component failure. The server has been aging, and I’m starting to see blips in the SMART reporting against the disks. FreeBSD definitely started to dislike the box, which it showed by occasionally rebooting for no apparent reason and without a trace in the logs. That’s never a good sign, but since FreeBSD is no longer officially supported at Hetzner I’m out of luck there. The choice:

A new server would just postpone the inevitable. FreeBSD still won’t be supported at Hetzner, and I’ll run into hardware breakage again at some point sooner or later. Hosting everything at home is still a viable route, but not one my wallet likes very much right now. An upfront investment hurts more than a subscription-based model. So cloud then? Sure, why not!

The major hyperscalers are out, though. I’d like you to believe that sovereignty is the only driver here, but financials are a massive factor as well. I want to stay below the EUR 100/month threshold, and preferably save some money with this. So cloud it is, and since Hetzner offers that as well, that’s where I started. I was pleasantly surprised by what Hetzner offers, even if it’s a far cry from what the big hyperscalers provide.

Hetzner Cloud for resilient infrastructure

My shopping list needs a few things:

To run this low-cost I could plonk everything onto a pair of CX53 VMs and be done with most of it, and I probably would have gone that route if FreeBSD were still a first-class citizen at Hetzner. Migration would have been simple.

With FreeBSD out of the picture, I looked at Kubernetes. More specifically: Talos Linux. It’s the only viable way you can run Kubernetes on an otherwise self-managed platform. There is no way in hell that I’m going to manage both something like Debian and Kubernetes on top of that.

Connectivity

I’m building a Kubernetes cluster on VMs, starting with the Hetzner CX33, which gives me 4 CPU cores, 8GB of RAM and 80GB of local storage. That’s fine to bootstrap the initial cluster onto, at a price of one cent per hour. Eventually that single VM would grow into at least three nodes for control plane redundancy, but for testing the waters a single Talos node is fine.

Now a VM in the Hetzner cloud gets provisioned with public networking enabled by default. You get an IPv4 address and an IPv6 /64 subnet. Optionally you can add a private network alongside, which is interesting because I want my nodes to have local connectivity away from the internet. So I opted for a design where a single gateway VM of the cheapest type (a CX23, at EUR 3.62/month) would run OPNSense. That box gets public addresses and the .2 address in my newly envisioned private network. It’ll then serve as the gateway for my actual Kubernetes nodes to reach the internet: at EUR 3.62 per month I’m evading the monthly cost of EUR 19.84 for a Hetzner load balancer with enough capacity for my needs. The redundancy hit is a risk I’m willing to accept for now.

So step 1: set up OPNSense and wire it into my existing WireGuard-based WAN. My private network in Hetzner’s cloud lives at 10.50.0.0/24 (DC Falkenstein) while an existing network for my old server lives at 10.20.0.0/24 (DC Helsinki), all routed over my home gateway in the Netherlands for now. Problematic? Not really, I’m not going to push terabytes over this convoluted contraption. Some level of direct reachability is useful though.

Adding a Talos node became an interesting exercise. Hetzner’s Private Networks aren’t flat L2 domains. Everything gets forced through their .1 address as the default gateway and your servers get a /32 IPv4 address ONLY, so no IPv6 at all there.
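To make the /32 behaviour concrete, here is a minimal Python sketch of the route lookup a node ends up doing. The node address 10.50.0.3 and the exact route entries are assumptions for illustration; the post only fixes the 10.50.0.0/24 network and the .1 gateway.

```python
import ipaddress

# Hypothetical node address; a /32 means no neighbours are on-link.
NODE_ADDRESS = ipaddress.ip_interface("10.50.0.3/32")
HETZNER_GATEWAY = ipaddress.ip_address("10.50.0.1")

# Assumed routing table for such a node: only the gateway is reachable
# directly, everything else (even the node's own /24) goes through it.
ROUTES = [
    (ipaddress.ip_network("10.50.0.1/32"), None),  # on-link
    (ipaddress.ip_network("10.50.0.0/24"), HETZNER_GATEWAY),
    (ipaddress.ip_network("0.0.0.0/0"), HETZNER_GATEWAY),
]


def next_hop(destination: str):
    """Longest-prefix match; None means 'deliver directly on the link'."""
    dest = ipaddress.ip_address(destination)
    matches = [(net, gw) for net, gw in ROUTES if dest in net]
    _, gateway = max(matches, key=lambda item: item[0].prefixlen)
    return gateway


for target in ("10.50.0.1", "10.50.0.2", "192.0.2.10"):
    print(target, "->", next_hop(target) or "on-link")
```

Even a neighbour in the same /24, such as the OPNSense box at .2, is reached through the .1 hop, which is why L2-based tricks don't work here.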

Using OPNSense, living at address .2, as a gateway requires that the Hetzner network recognizes it as its gateway into the rest of the universe. So I had to create a route for 0.0.0.0/0 inside the Hetzner Cloud environment that points at 10.50.0.2 for my Talos node to see the outside world. This took a while to figure out, but it’s understandable given the architecture requirement of having a private network stretched across multiple DC locations. I get where Hetzner is coming from with these restrictions.
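The route can be created in the Hetzner Cloud console, but it can also be scripted. The sketch below only builds the HTTP request and does not send it. It assumes the public Hetzner Cloud API's `add_route` network action as I understand it, so check the API reference before relying on it. The network ID and token are placeholders, not values from this setup.

```python
import json
import urllib.request

API_BASE = "https://api.hetzner.cloud/v1"


def build_add_route_request(network_id: int, token: str,
                            destination: str, gateway: str) -> urllib.request.Request:
    """Build (but do not send) the request that adds a route to a network."""
    body = json.dumps({"destination": destination, "gateway": gateway}).encode()
    return urllib.request.Request(
        url=f"{API_BASE}/networks/{network_id}/actions/add_route",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )


# Placeholder network ID and token; the destination and gateway are from the post.
request = build_add_route_request(12345, "HCLOUD_TOKEN_PLACEHOLDER",
                                  "0.0.0.0/0", "10.50.0.2")
print(request.get_method(), request.full_url)
print(request.data.decode())
# urllib.request.urlopen(request) would actually send it.
```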

Conveniently, Hetzner offers a bunch of ISO images for OPNSense and Talos that are recent enough to serve as a starting point for all of this. Installing OPNSense this way was a breeze, but Talos was a bit finicky. Its ISO would boot and not have a viable network configuration if the machine only had private network connectivity. So the procedure here turned into the following:

Now the node comes up, ready for Kubernetes bootstrapping or joining a cluster.

The Hetzner private network, not being an L2 domain and being severely restricted from a custom routing perspective, is not a good place for things like MetalLB to provide virtual load-balanced IPs from a Kubernetes cluster. I spent days in this quagmire and gave up on it. Sure, you can talk BGP to OPNSense, but I haven’t figured out how to cross that forced hop over the .1 address at Hetzner. I’m open to suggestions!

Instead, I set up HAProxy inside OPNSense. The idea is simple: expose NodePorts on the cluster, register all the nodes in HAProxy and let it figure out availability through health checks. In TCP mode, HAProxy also doesn’t interfere all too much with what my cluster is doing, and many of my services accept the PROXY protocol, so I still get visibility into real source IPs.
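In OPNSense this is all clicked together in the HAProxy plugin, but the resulting configuration boils down to something like the fragment this Python sketch renders. The node names, node addresses, NodePort number and the choice of PROXY protocol v2 are assumptions for illustration; none of them come from the actual setup.

```python
# Hypothetical node addresses and NodePort; the post does not list them.
NODES = {
    "talos-1": "10.50.0.11",
    "talos-2": "10.50.0.12",
    "talos-3": "10.50.0.13",
}
HTTPS_NODEPORT = 30443


def render_tcp_service(name, listen_port, node_port, nodes, proxy_protocol=True):
    """Render a frontend/backend pair that forwards raw TCP to a NodePort."""
    # 'check' enables health checks; 'send-proxy-v2' prepends a PROXY
    # protocol header so the backend can see the real client address.
    server_options = "check" + (" send-proxy-v2" if proxy_protocol else "")
    lines = [
        f"frontend {name}_in",
        f"    bind :{listen_port}",
        "    mode tcp",
        f"    default_backend {name}_nodes",
        "",
        f"backend {name}_nodes",
        "    mode tcp",
    ]
    for node_name, address in nodes.items():
        lines.append(f"    server {node_name} {address}:{node_port} {server_options}")
    return "\n".join(lines)


config = render_tcp_service("https", 443, HTTPS_NODEPORT, NODES)
print(config)
```

The service behind the NodePort has to be configured to expect the PROXY header. For services that cannot parse it, pass `proxy_protocol=False` and accept that they only see the gateway's address.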

The main disadvantage at this point is IPv6. My OPNSense gateway uses a single address from its routable subnet which it uses for HAProxy forwarding. The services in the cluster, being confined to IPv4 for now, don’t always like this. It’s something I’ll probably be able to solve eventually, but for now it’s acceptable.

As things stand at this point, I’m spending a little over EUR 20 per month for a 3-node cluster with a decent firewall/gateway in front of it. In a later post I’ll elaborate on the creation of the cluster itself and how I went about migrating actual services from FreeBSD into it.