February 2026 at Cloudfleet: scheduling that respects reality

February was a month of meaningful platform upgrades: more accurate scheduling, fewer surprise restarts on your nodes, stronger guardrails around the infrastructure we manage on your behalf, and a smarter rate limiting layer that coordinates every cloud API call across the entire control plane.

Here’s what shipped, and why each change matters more than its one-line release note suggests.

Resource reservations now adapt to the node you actually got

Kubernetes lets the kubelet reserve some CPU and memory on every node for system services and for itself (the kubelet’s system-reserved and kube-reserved settings), so that cluster components do not get squeezed out by tenant workloads. The number is configurable, but most platforms set it once and forget it.

The trouble is that the right reservation on a 64 vCPU server is wildly different from the right reservation on a 2 vCPU server. Set a static percentage, and you’ll either waste a meaningful slice of capacity on big nodes or under-reserve and risk overcommit on small ones. Set a static absolute amount, and you flip the problem the other way.

In February, node auto-provisioning started calculating these reservations dynamically based on instance size. The result is more honest scheduling: when the scheduler thinks a 4 vCPU node has 3.6 vCPU available for your pods, it actually does. On smaller instance types, this also removes a class of memory overcommit issues that could otherwise show up as OOM kills or evictions under pressure.
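To make "calculated dynamically based on instance size" concrete, here is a minimal sketch of a tiered CPU reservation in Python. The tier boundaries and percentages are illustrative assumptions, similar in shape to the tiered schemes some managed Kubernetes services document; they are not Cloudfleet’s actual values, so the results will not match the 3.6 vCPU figure above. The function names are hypothetical.

```python
# Illustrative tiers only: (upper bound of the tier in millicores, fraction reserved).
# These numbers are assumptions for this sketch, not Cloudfleet's real formula.
CPU_TIERS = [
    (1000, 0.06),             # 6% of the first core
    (2000, 0.01),             # 1% of the second core
    (4000, 0.005),            # 0.5% of cores three and four
    (float("inf"), 0.0025),   # 0.25% of everything above four cores
]


def reserved_cpu_millicores(node_millicores: int) -> int:
    """Sum the reservation tier by tier, so it grows with node size
    but at a decreasing rate."""
    reserved = 0.0
    lower = 0
    for upper, fraction in CPU_TIERS:
        if node_millicores <= lower:
            break
        span = min(node_millicores, upper) - lower
        reserved += span * fraction
        lower = upper
    return round(reserved)


def allocatable_cpu_millicores(node_millicores: int) -> int:
    """What the scheduler may hand out to pods after the reservation."""
    return node_millicores - reserved_cpu_millicores(node_millicores)
```

With these placeholder tiers, a 2 vCPU node reserves 70m, a 4 vCPU node reserves 80m, and a 64 vCPU node reserves 230m. The shape of the curve is the point, not the specific numbers.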

If you’ve ever stared at a node’s Allocatable and wondered where the missing capacity went, you’ll appreciate this one.

Taking control of node upgrades

Cloudfleet worker nodes are based on Ubuntu, and Ubuntu ships with unattended-upgrades enabled by default. On a desktop or a single-tenant server, that’s a sensible security feature. On a Kubernetes node running production workloads, an unannounced package upgrade in the middle of the night can trigger a kubelet restart, which ripples into pod restarts and a brief loss of node readiness. In the worst case, the confusing timing masks a more serious problem.

We’ve disabled automatic Ubuntu upgrades on cluster nodes and moved that responsibility fully into Cloudfleet. We own the node image and the upgrade cadence end-to-end, so security patches still land promptly, but they land on a schedule we control and through node lifecycle events you can plan around.
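If you run your own Ubuntu machines and want to check for the same behavior, the setting usually lives in an apt periodic configuration file such as /etc/apt/apt.conf.d/20auto-upgrades, under the key APT::Periodic::Unattended-Upgrade. The sketch below parses that text and reports whether unattended upgrades are switched on. It is not how Cloudfleet implements the change (the post doesn’t describe the mechanism). The function name is hypothetical, and treating a missing key as "disabled" is an assumption of this sketch. It also handles only numeric values and a single file’s contents.

```python
import re

# Matches lines like: APT::Periodic::Unattended-Upgrade "1";
# Lines commented out with // do not match because of the line-start anchor.
_SETTING = re.compile(
    r'^\s*APT::Periodic::Unattended-Upgrade\s+"(\d+)"\s*;',
    re.MULTILINE,
)


def unattended_upgrades_enabled(apt_conf_text: str) -> bool:
    """Return True if the given apt configuration text enables unattended upgrades.

    Assumption: when the key appears more than once, the last value wins,
    and when it is absent we treat the feature as disabled.
    """
    values = _SETTING.findall(apt_conf_text)
    return bool(values) and int(values[-1]) > 0


if __name__ == "__main__":
    sample = (
        'APT::Periodic::Update-Package-Lists "1";\n'
        'APT::Periodic::Unattended-Upgrade "0";\n'
    )
    print(unattended_upgrades_enabled(sample))
```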

This is a direct upgrade to the reliability of your cluster on nights and weekends: the kubelet restart that didn’t happen at 3 a.m. is one less page for your on-call engineer.

Don’t touch my virtual machines

One of the quietly powerful things about running Kubernetes on Hetzner Cloud with CFKE (Cloudfleet Kubernetes Engine) is that you can park your own virtual machines in the same private network as your cluster nodes, then talk to them over private IP without crossing the public internet. A lot of customers were doing exactly that: managed or self-hosted databases, message brokers, jump hosts, legacy services that haven’t moved into containers yet, build runners, monitoring stacks, you name it. Anything that isn’t on Kubernetes yet but still needs a fast, private path to the workloads that are.

The problem: when CFKE cleaned up resources at the end of a cluster’s life, the cleanup logic could remove the Hetzner network and firewall even when customer-managed VMs were still attached to them. That’s not data loss, but it is a network disruption nobody asked for, and it cuts the cord between your remaining VMs and anything else they were talking to over that network.

The February fix is straightforward in description and careful in execution. Before removing a network or firewall, CFKE now checks whether anything outside its own bookkeeping is attached. If your VMs are using the network, the network stays. We’ll log the situation and skip the cleanup rather than do anything destructive.
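The check described above boils down to a set difference: everything attached to the network, minus everything the control plane knows it created. Here is a minimal sketch under that assumption. The names (plan_network_cleanup, CleanupDecision) and the idea of passing in plain ID sets are hypothetical, and the real implementation will have to deal with cloud API calls, races, and retries that this leaves out.

```python
import logging
from dataclasses import dataclass
from typing import FrozenSet, Iterable

log = logging.getLogger("cleanup")


@dataclass(frozen=True)
class CleanupDecision:
    delete: bool                         # True only when nothing foreign is attached
    foreign_attachments: FrozenSet[str]  # resources we did not create


def plan_network_cleanup(
    attached_ids: Iterable[str],
    managed_ids: Iterable[str],
) -> CleanupDecision:
    """Decide whether a network (or firewall) is safe to remove.

    attached_ids: IDs of everything currently attached, as reported by the cloud.
    managed_ids:  IDs the control plane recorded as its own.
    """
    foreign = frozenset(attached_ids) - frozenset(managed_ids)
    if foreign:
        # Log and skip rather than do anything destructive.
        log.warning("skipping cleanup, unmanaged resources attached: %s", sorted(foreign))
    return CleanupDecision(delete=not foreign, foreign_attachments=foreign)
```

For example, plan_network_cleanup({"node-1", "customer-db"}, {"node-1"}) would refuse the delete and report "customer-db" as the reason.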

This is exactly the kind of guardrail you want from a control plane that touches your real infrastructure.

Smarter rate limiting across every cloud API we touch

Rate limits are one of the more interesting distributed systems problems we deal with, and they get harder the moment you stop treating the control plane as a single thing.

Cloudfleet’s control plane is made up of multiple components that each have legitimate reasons to talk to the same cloud provider API. The node auto-provisioner is asking the cloud about instance availability and creating servers. The cloud controller manager is reconciling load balancers and routes. The garbage collector is sweeping up orphaned resources. The CSI plumbing is provisioning and attaching volumes. From the cloud provider’s side, all of those calls land on the same API key against the same rate-limit budget. From our side, none of those components are individually aware of how much budget the others are spending.

The naive approach is to let each component manage its own backoff and hope for the best. In practice that means any one component, under load, can quietly exhaust the shared budget and starve every other component. The first sign anything is wrong is usually that an unrelated cluster operation, such as a load balancer reconciliation or a volume attach, mysteriously slows down or fails.

We rolled out a smarter rate limiting layer that sits in front of every cloud provider API call, regardless of which control plane component is making it. It tracks how close we are to the published API limits in aggregate across all callers, paces requests so we stay comfortably under them, and applies circuit breaking when error rates spike so a misbehaving call path can’t drag the rest of the cluster’s operations down with it.
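One way to picture such a layer is a single token bucket shared by every caller, with a simple circuit breaker on top. The sketch below is an illustration under that assumption, not Cloudfleet’s implementation: the class name, parameters, and thresholds are all hypothetical, and a real control plane would need this state shared across processes rather than held in one object.

```python
import threading
import time
from collections import Counter, deque


class SharedCloudApiLimiter:
    """One budget for all components calling the same cloud API (sketch)."""

    def __init__(self, rate_per_sec, burst, failure_threshold=5, window=20,
                 cooldown_sec=30.0, clock=time.monotonic):
        self._rate = rate_per_sec            # sustained requests per second
        self._burst = burst                  # maximum tokens that can accumulate
        self._tokens = float(burst)
        self._clock = clock                  # injectable so tests can fake time
        self._last_refill = clock()
        self._recent = deque(maxlen=window)  # outcomes of the most recent calls
        self._failure_threshold = failure_threshold
        self._cooldown = cooldown_sec
        self._open_until = float("-inf")     # circuit is closed at start
        self._lock = threading.Lock()
        self.spent_by_caller = Counter()     # who is using the shared budget

    def try_acquire(self, caller: str) -> bool:
        """Return True if `caller` may make one API request right now."""
        with self._lock:
            now = self._clock()
            if now < self._open_until:
                return False                 # circuit open: reject everyone
            elapsed = now - self._last_refill
            self._tokens = min(self._burst, self._tokens + elapsed * self._rate)
            self._last_refill = now
            if self._tokens < 1:
                return False                 # aggregate budget exhausted
            self._tokens -= 1
            self.spent_by_caller[caller] += 1
            return True

    def record_result(self, ok: bool) -> None:
        """Report a call outcome; open the circuit when failures pile up."""
        with self._lock:
            self._recent.append(ok)
            if self._recent.count(False) >= self._failure_threshold:
                self._open_until = self._clock() + self._cooldown
                self._recent.clear()
```

Each component would call try_acquire("autoprovisioner") or try_acquire("csi") before hitting the provider and record_result(...) afterwards. Because the bucket is shared, one busy component slows itself down instead of silently starving the others, and spent_by_caller shows where the budget went.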

The first version landed for Hetzner Cloud in February, and we’re extending the same layer to every other cloud provider Cloudfleet integrates with: AWS, GCP, OVH, Scaleway, Exoscale, UpCloud, the lot. “Treat every vendor API as a shared, partially cooperative dependency, and coordinate consumption across our own components” is now the default posture across the platform.

What the rest of the industry was doing

While we were patching the platform, the Cloud Native Computing Foundation published its State of Cloud Native 2026 report. A few signals from it match what we hear from Cloudfleet customers every week:

The CNCF ecosystem now spans more than 230 projects and 300,000 contributors. The center of gravity has shifted well beyond container orchestration into observability, platform engineering, FinOps, and AI tooling.
Cost pressure on AI workloads is pushing teams to mix hyperscalers with regional providers. The bet on multi-cloud as a cost lever, not just a resilience lever, is paying off.
Platform engineering keeps getting more concrete. The conversation is moving away from “should we have a platform team” and toward “what does the developer-facing surface area look like in practice.”

All three of those resonate with us. CFKE is built on the bet that running Kubernetes well across many clouds is a hard problem worth solving once, so individual teams don’t keep reinventing it. The industry data suggests we’re not alone in that belief.

Looking ahead

March brings KubeCon + CloudNativeCon Europe in Amsterdam. We’ll be watching the announcements closely, and you can expect more behind-the-scenes platform work to land while we do. See you next month.
