Welcome to the first of our monthly recaps. We’ve been heads-down shipping, and we want to start sharing more of what’s changing inside Cloudfleet, what we’re learning from running it, and where the cloud-native world is headed.
Q4 2025 was one of our busiest quarters ever. Rather than treat it as old news, we want to open 2026 by putting it all in one place. Here’s the tour.
The headline: Cloudfleet Container Registry
The launch we are quietly proudest of is the preview release of the Cloudfleet Container Registry (CFCR), our own fully managed, private, OCI-compliant registry that integrates natively with Cloudfleet clusters.
The deeper reason we built CFCR is that we want to own the full experience of running containers on Cloudfleet, end to end. The cluster, the nodes, the network, and now the registry that holds the images those nodes pull. Every handoff between vendors in that chain is a place where authentication breaks, latency creeps in, or a credential rotation slips through the cracks. Owning the whole path lets us make it actually feel like one product.
The user-facing pitch is just as simple: most teams running on Cloudfleet were already paying for, configuring, and rotating credentials against a third-party registry. CFCR removes that whole tier of toil. Push your Docker images, multi-architecture images, and Helm charts to a registry that already knows about your cluster. Clusters authenticate automatically, so you do not need to create image pull secrets or manage credentials by hand. CFCR is available across our Europe, North America, and APAC regions during the preview.
If your build pipeline currently has a screen full of YAML dedicated to registry authentication, you’re going to enjoy this one.
An MCP server inside the Cloudfleet CLI
The other launch we keep getting questions about is the Model Context Protocol (MCP) server built directly into the Cloudfleet CLI. With it, AI assistants like Claude, Cursor, and other MCP-compatible tools can list clusters, query Kubernetes resources, and manage your infrastructure through natural language. The setup steps live in the MCP server documentation.
Why this matters: a lot of the day-to-day operator work on Kubernetes is just “go look at the thing.” Pulling cluster state into an AI assistant turns that into a conversation rather than a context switch into a terminal. We’re using it ourselves on call and at our desks, and it has changed how we triage.
This is the start, not the end. The MCP surface area will keep growing through 2026, alongside the rest of our automation story.
Cloud provider polish
A lot of Q4 was about making the multi-cloud experience smoother in the places where customers actually live.
GCP load balancing went GA. Service type: LoadBalancer is now fully supported on GCP nodes through native GCP load balancers, completing the trio with AWS and Hetzner. See the Exposing applications to the internet documentation for the current matrix.
Hetzner Cloud shipped a new generation of shared vCPU servers, the CX Gen3 (cx23, cx33, cx43, cx53) and the CPX Gen2 (cpx12, cpx22, cpx32, cpx42, cpx52, cpx62). Our node auto-provisioner considers them automatically. The flip side of that change is that Hetzner deprecated the previous-generation cx22–cx52 and cpx11–cpx51 types. As of January 1, 2026, Hetzner no longer offers them for new orders. Cloudfleet keeps them available for as long as Hetzner does, because they may still be cost-optimal for certain Hetzner Kubernetes workloads. If you want to exclude them explicitly, the December 13 release note has a node anti-affinity snippet you can drop straight into your pod spec.
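To make the exclusion idea concrete, here is a minimal sketch of a node affinity rule that keeps pods off the previous-generation types. It is not the snippet from the release note. It assumes nodes carry the well-known node.kubernetes.io/instance-type label with the Hetzner type name as its value, and the helper name and the expansion of the cx22–cx52 and cpx11–cpx51 ranges are ours for illustration.

```python
import json

# Assumption: nodes expose the well-known Kubernetes instance-type label.
INSTANCE_TYPE_LABEL = "node.kubernetes.io/instance-type"

# Illustrative expansion of the deprecated ranges named in the text.
DEPRECATED_HETZNER_TYPES = [
    "cx22", "cx32", "cx42", "cx52",
    "cpx11", "cpx21", "cpx31", "cpx41", "cpx51",
]


def exclude_instance_types(types):
    """Hypothetical helper: build a pod `affinity` stanza that refuses nodes
    whose instance-type label matches any of the given types."""
    return {
        "nodeAffinity": {
            "requiredDuringSchedulingIgnoredDuringExecution": {
                "nodeSelectorTerms": [
                    {
                        "matchExpressions": [
                            {
                                "key": INSTANCE_TYPE_LABEL,
                                "operator": "NotIn",
                                "values": sorted(types),
                            }
                        ]
                    }
                ]
            }
        }
    }


if __name__ == "__main__":
    # Print the stanza as JSON; place it under spec.affinity in a pod spec.
    print(json.dumps(exclude_instance_types(DEPRECATED_HETZNER_TYPES), indent=2))
```

The NotIn operator is what makes this an exclusion: the scheduler treats every other instance type as eligible, so newer generations are picked up without further changes.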
The auto-provisioner got smarter about capacity exhaustion. Capacity-out is one of the more frustrating failure modes in multi-cloud Kubernetes. Your cluster decides it needs a node, the cloud provider says “sorry, none of those left in this region,” and you sit there waiting for the auto-provisioner to recover. We shipped a fix: when a provider reports capacity exhaustion for the chosen instance type, the auto-provisioner now automatically falls forward to the next acceptable option that meets your workload requirements, instead of getting stuck. In practice, this turns a class of stuck-scheduling incidents into an extra second or two of latency before your pod gets a node.
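The fall-forward behavior can be sketched in a few lines. This is an illustration of the idea under stated assumptions, not the auto-provisioner's actual code: the InstanceType fields, the CapacityExhausted error, the cheapest-first ordering, and the create_node callback are all hypothetical names introduced here.

```python
from dataclasses import dataclass


class CapacityExhausted(Exception):
    """Hypothetical error: the provider has no capacity left for a type."""


@dataclass(frozen=True)
class InstanceType:
    name: str
    vcpus: int
    memory_gib: int
    hourly_cost: float


def provision_with_fallback(candidates, min_vcpus, min_memory_gib, create_node):
    """Try acceptable instance types cheapest-first, moving on to the next
    one whenever the provider reports capacity exhaustion."""
    acceptable = sorted(
        (
            c
            for c in candidates
            if c.vcpus >= min_vcpus and c.memory_gib >= min_memory_gib
        ),
        key=lambda c: c.hourly_cost,
    )
    exhausted = []
    for candidate in acceptable:
        try:
            return create_node(candidate)
        except CapacityExhausted:
            # Remember the failure and fall forward instead of retrying.
            exhausted.append(candidate.name)
    raise CapacityExhausted(f"no capacity for any acceptable type: {exhausted}")
```

The key design choice is that capacity exhaustion advances the loop rather than triggering a retry of the same type, so the cost of a capacity-out is one extra provider call per exhausted candidate.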
GPU sharing for NVIDIA workloads
For clusters with NVIDIA GPUs, we added support for GPU sharing via time-slicing and MPS. Multiple pods can now share a single GPU, which is the right answer for inference workloads, dev/test environments, and any workload that does not need a whole GPU to itself. The GPU-based workloads documentation has the configuration details.
This is a big quality-of-life improvement for teams that are GPU-constrained, particularly in the AI inference space where utilization on a per-card basis is often the gating cost factor.
CLI everywhere: now in a YUM repo
If you run Red Hat-based Linux distributions, the Cloudfleet CLI is now available through our official YUM repository. Fedora, RHEL, CentOS, Rocky Linux, and AlmaLinux users can install and upgrade with dnf, exactly like every other native package. This joins our existing Homebrew, APT, Winget, and Nix support, so the CLI now meets you on whatever platform you happen to be working on.
Hardening on the data plane
Two changes in this category. First, we shipped enhanced DNS amplification mitigation so that clusters cannot be used as reflectors in DNS-based DDoS attacks. This is the sort of work you only notice when it is missing.
Second, we identified and fixed a critical issue affecting large clusters and clusters that span multiple clouds, regions, or datacenters. The symptom was elevated CPU and bandwidth consumption on nodes, plus intermittent network issues for inter-node traffic. It is exactly the kind of bug that hides in plain sight when individual clusters are small and only surfaces when the topology gets interesting. The fix is rolled out and every freshly auto-provisioned node picks it up. If you operate larger clusters and want to bring older nodes onto the fix without waiting for natural replacement, drain the existing nodes and let auto-provisioning re-create them, or use CLI 0.8.8 and Terraform Provider 0.1.7 to re-add your self-managed nodes. If a rolling node replacement is not feasible for your environment, Cloudfleet support can plan a tailored migration with you.
A quieter Cilium upgrade, and one BGP note
The CFKE data plane was upgraded to Cilium 1.18.5. The upgrade is invisible for the vast majority of customers, but there is one carve-out worth flagging. This release of Cilium ships its BGP CRDs under the stable cilium.io/v2 API version, deprecating the older v2alpha1. If you run on-premises Kubernetes with BGP-based load balancing, update your manifests accordingly. The On-premises load balancing with BGP documentation has details.
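For teams with many BGP manifests, the apiVersion bump can be scripted. The following is a rough sketch that operates on manifests already loaded as Python dictionaries. The list of BGP kinds is our assumption based on the Cilium BGP control plane resources, and the helper only rewrites the apiVersion field. It does not check for field-level differences between the two API versions, so treat the Cilium documentation as the authority.

```python
import copy

OLD_API_VERSION = "cilium.io/v2alpha1"
NEW_API_VERSION = "cilium.io/v2"

# Assumption: the BGP resource kinds affected by the API version change.
BGP_KINDS = {
    "CiliumBGPClusterConfig",
    "CiliumBGPPeerConfig",
    "CiliumBGPAdvertisement",
    "CiliumBGPNodeConfigOverride",
}


def migrate_bgp_manifest(manifest):
    """Hypothetical helper: return a copy of the manifest with its apiVersion
    moved to cilium.io/v2 if it is a BGP resource on v2alpha1. Any other
    manifest is returned unchanged."""
    if (
        manifest.get("apiVersion") == OLD_API_VERSION
        and manifest.get("kind") in BGP_KINDS
    ):
        updated = copy.deepcopy(manifest)
        updated["apiVersion"] = NEW_API_VERSION
        return updated
    return manifest
```

Restricting the rewrite to an explicit set of kinds avoids touching other Cilium resources that may legitimately remain on v2alpha1.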
A short January note: when a vendor’s catalog is wrong
January itself was a quieter month on launches and a busier one on resilience work, and there’s one incident worth telling briefly because the lesson generalizes.
A small percentage of clusters hit intermittent node provisioning failures. The root cause turned out to be a Hetzner instance family that the vendor had retired but its API still listed as available. From the auto-provisioner’s perspective, it looked like a perfectly valid candidate. When the server creation call came back with an error we hadn’t seen before, the retry path concluded the request had merely failed transiently and tried again. The retries piled up, hit the API rate limit, and from that point on every other operation that needed the same vendor backend slowed down too.
We pushed two fixes. We removed the affected instance family from our catalog so the auto-provisioner stops considering it, and we taught the auto-provisioner to recognize the new error shape and stop retrying on it. The combination broke the loop and restored normal cluster operations.
The deeper lesson is older than Cloudfleet: vendor APIs are eventually consistent with reality, and they will surprise you. Defensive programming on the consumer side, with sensible retry budgets, circuit breaking, and a well-curated instance catalog, is non-negotiable for any platform that talks to multiple clouds. You’ll see more of that defensive layering land through February.
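As an illustration of that defensive posture, here is a minimal sketch of a bounded retry that only retries errors it explicitly recognizes as transient. It is not Cloudfleet's implementation: the TransientError class, the function name, and the default budget and delays are hypothetical. The point it demonstrates is that an unrecognized error is treated as final, which is the opposite of the behavior that caused the loop described above.

```python
import time


class TransientError(Exception):
    """Hypothetical marker for errors known to be safe to retry."""


def call_with_retry_budget(operation, max_attempts=3, base_delay=0.5, sleep=time.sleep):
    """Run `operation`, retrying only on TransientError and never more than
    `max_attempts` times in total. Any other exception propagates at once."""
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except TransientError:
            if attempt == max_attempts:
                # Budget spent: surface the failure instead of looping.
                raise
            # Exponential backoff: base_delay, 2x, 4x, ...
            sleep(base_delay * 2 ** (attempt - 1))
```

Because the except clause names only TransientError, a new error shape from a vendor fails fast on the first attempt and cannot consume the shared API rate limit. A circuit breaker that pauses all calls to a failing backend would be a natural next layer, but it is omitted here to keep the sketch short.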
The wider Kubernetes world
While we were shipping the above, the upstream community wrapped up Kubernetes v1.35 “Timbernetes”. Two stable-track features from the recent releases are especially relevant for Cloudfleet customers:
- Dynamic Resource Allocation (DRA) went stable in v1.34 and continues to mature. For GPU-heavy workloads, DRA is the long-awaited replacement for ad-hoc device plugins, with a richer model for requesting and sharing accelerators. CFKE’s existing GPU workloads support already covers the common cases, and we’re aligning our roadmap with DRA so multi-cloud GPU pools become genuinely portable.
- Topology-aware routing is GA. CFKE enabled this by default more than a year ago, so this is just a quiet confirmation that the direction was correct.
What’s next
The themes for the first half of 2026 are coming into focus: smarter scheduling, calmer infrastructure, more clouds, and tighter integration with the tools you already use to ship software. February already has a stack of improvements out the door, and we’ll cover them next month.
If something here resonates, or you’d like to see a particular topic in a future recap, reach out via Cloudfleet support or your account manager. These posts get better when you tell us what you actually want to read about.
