Self-managed nodes
For supported cloud providers (AWS, GCP, and Hetzner Cloud), Cloudfleet provides node auto-provisioning that automatically creates and manages compute nodes based on your workload requirements. For on-premises infrastructure and other cloud providers, you manually provision compute nodes and register them with Cloudfleet. This guide explains how to add self-managed nodes to your CFKE cluster.
Not sure which approach to use? See How Cloudfleet provisions nodes for a comparison of node auto-provisioning versus self-managed nodes.
Self-managed nodes are Linux machines that you provision yourself and register with CFKE so that it can run workloads on them. This allows you to use your own hardware or third-party cloud resources to expand your cluster. Self-managed nodes act as equal members of the cluster and can run any workload that their hardware supports.
Requirements
To add self-managed nodes to your cluster, each node must meet the following requirements:
- Your node must run Ubuntu 22.04 or 24.04 (Red Hat Enterprise Linux and variants are on the roadmap)
- You must be able to SSH into the node with root access, either from a bastion host or from your local workstation
- Your node must have egress internet access
- Your firewall must allow the node to initiate TCP connections to *:443, send UDP from :41641 to *:*, and send UDP to *:3478.
- Your node can be behind NAT and does not need a public IP address. Cloudfleet is able to establish a secure tunnel behind NAT.
Please note that egress internet access is required for the self-managed nodes to be able to communicate with the CFKE control plane and establish VPN connections with other nodes in the cluster. If two self-managed nodes are in the same network and have a connection via a private IP, they communicate with each other over the private network. In this case, the VPN between nodes is still established and node-to-node communication is encrypted.
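Before registering a node, you can optionally sanity-check outbound connectivity from the machine. The sketch below assumes the OpenBSD netcat variant shipped with Ubuntu and uses placeholder hostnames (example.com, stun.example.com); substitute endpoints that are reachable from your network, and note that UDP checks with netcat are best-effort only.
# Outbound TCP 443 check (example.com is a placeholder for any public HTTPS endpoint)
curl -sI --max-time 10 https://example.com > /dev/null && echo "TCP 443 egress: OK"

# Best-effort outbound UDP 3478 check (stun.example.com is a placeholder STUN server);
# netcat only reports failure if an ICMP "port unreachable" reply comes back
nc -u -z -w 5 stun.example.com 3478 && echo "UDP 3478 egress: likely OK"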
Adding self-managed nodes with Terraform
For production deployments, infrastructure-as-code workflows, or when adding multiple nodes at once, use the Cloudfleet Terraform provider. The provider supports two methods:
- Cloud-init (preferred): Generates configuration that automatically registers nodes when they boot. Use this method when your target platform supports cloud-init, as it enables fully automated provisioning without requiring SSH access during setup.
- SSH: Connects to existing machines via SSH to install and configure them. Use this for bare-metal servers, existing VMs, or platforms that do not support cloud-init.
Using Terraform with cloud-init
The Cloudfleet Terraform provider generates cloud-init userdata that you can use with any cloud provider or virtualization platform that supports cloud-init. This is the preferred method when your infrastructure supports it, as nodes automatically register themselves during boot without requiring SSH connectivity from your Terraform environment:
terraform {
  required_providers {
    cloudfleet = {
      source = "terraform.cloudfleet.ai/cloudfleet/cloudfleet"
    }
  }
}

provider "cloudfleet" {}

# Reference your existing cluster
data "cloudfleet_cfke_cluster" "cluster" {
  id = "YOUR_CLUSTER_ID"
}

# Generate cloud-init configuration for self-managed nodes
resource "cloudfleet_cfke_node_join_information" "node" {
  cluster_id = data.cloudfleet_cfke_cluster.cluster.id
  region     = "your-region" # e.g., "eu-west-1", "on-premises"
  zone       = "your-zone"   # e.g., "datacenter-1", "rack-a"

  node_labels = {
    "cfke.io/provider" = "custom" # Optional: identify node origin
  }
}

# The rendered cloud-init configuration is available as:
# cloudfleet_cfke_node_join_information.node.rendered
#
# Pass this to your VM's user_data field
The region and zone values become Kubernetes node labels (topology.kubernetes.io/region and topology.kubernetes.io/zone), allowing you to schedule workloads to specific locations using standard node selectors.
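For example, a workload can be pinned to nodes in a given region with a standard nodeSelector. The following is a minimal sketch; the pod name and image are illustrative, and the region value must match what you set in the resource above.
# Pin a Pod to nodes whose topology.kubernetes.io/region label matches "your-region"
kubectl apply -f - <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: region-pinned-example   # example name
spec:
  nodeSelector:
    topology.kubernetes.io/region: your-region
  containers:
    - name: app
      image: nginx:1.27         # example image
EOF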
Using the cloud-init output
The rendered attribute contains the cloud-init userdata. Pass this to your infrastructure provider’s user_data or cloud-init field:
# Example: Generic VM resource (syntax varies by provider)
resource "your_provider_instance" "node" {
  # ... other configuration ...
  user_data = cloudfleet_cfke_node_join_information.node.rendered
}
When the VM boots, cloud-init executes the configuration and automatically:
- Installs required packages (kubelet, container runtime, networking)
- Configures the node to join your CFKE cluster
- Establishes secure connectivity with other nodes
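If you can reach the new machine over SSH or a console, you can confirm that this boot-time provisioning has completed before checking the Kubernetes API; the following is a minimal sketch.
# On the node: block until cloud-init finishes and report its result
sudo cloud-init status --wait

# From your workstation: the node should appear and become Ready shortly afterwards
kubectl get nodes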
Provider-specific guides
For complete Terraform examples including VM provisioning, firewall configuration, and security best practices, see the provider-specific guides:
| Provider | Use case | Guide |
|---|---|---|
| Proxmox | On-premises virtualization | VMs on Proxmox VE |
| OVH | European cloud provider | OVH Public Cloud instances |
| Scaleway | European cloud provider | Scaleway instances |
| Vultr | Global cloud provider | Vultr cloud instances |
| Exoscale | European cloud provider | Exoscale compute instances |
For other providers that support cloud-init (DigitalOcean, Linode, VMware, OpenStack, etc.), adapt the examples above using your provider’s Terraform resources.
Adding GPU support with cloud-init
To enable NVIDIA GPU support on cloud-init provisioned nodes, add the install_nvidia_drivers option:
resource "cloudfleet_cfke_node_join_information" "gpu_node" {
cluster_id = data.cloudfleet_cfke_cluster.cluster.id
region = "your-region"
zone = "your-zone"
install_nvidia_drivers = true
node_labels = {
"cfke.io/provider" = "custom"
"cfke.io/accelerator-name" = "V100" # Optional: specify GPU model
}
}
Using Terraform with SSH
The cloudfleet_cfke_self_managed_node resource connects directly to existing machines via SSH and configures them as Kubernetes nodes. Use this method for bare-metal servers, existing VMs, or platforms that do not support cloud-init:
resource "cloudfleet_cfke_self_managed_node" "server" {
cluster_id = cloudfleet_cfke_cluster.example.id
region = "datacenter-1"
zone = "rack-a"
ssh {
host = "192.168.1.100"
user = "ubuntu"
private_key_path = "~/.ssh/id_rsa"
port = 22 # Optional, defaults to 22
}
node_labels = {
"environment" = "production"
}
}
The region and zone values become Kubernetes node labels (topology.kubernetes.io/region and topology.kubernetes.io/zone), allowing you to schedule workloads to specific locations using standard node selectors.
For GPU nodes with SSH provisioning:
resource "cloudfleet_cfke_self_managed_node" "gpu_server" {
cluster_id = cloudfleet_cfke_cluster.example.id
region = "datacenter-1"
zone = "rack-b"
install_nvidia_drivers = true
ssh {
host = "192.168.1.101"
user = "ubuntu"
private_key_path = "~/.ssh/id_rsa"
}
node_labels = {
"cfke.io/accelerator-name" = "RTX-4090"
}
}
For complete SSH resource documentation, see the Terraform resources reference.
Adding self-managed nodes with the CLI
For quick testing, single-node additions, or environments where Terraform is not available, use the Cloudfleet CLI.
Basic CLI usage
Install the Cloudfleet CLI on a workstation by following the instructions here and configure it for your account.
Run the following command to add a self-managed node to your cluster:
cloudfleet clusters add-self-managed-node CLUSTER_ID \
  --host HOST_IP \
  --ssh-username SSH_USERNAME \
  --ssh-key SSH_KEY_LOCATION \
  --ssh-port SSH_PORT \
  --region DATACENTER_REGION \
  --zone DATACENTER_ZONE
The --region and --zone flags are mandatory and specify the location of the node. These values become Kubernetes node labels (topology.kubernetes.io/region and topology.kubernetes.io/zone), allowing you to schedule workloads to specific locations using standard node selectors. You can use any string that makes sense for your infrastructure; a good practice is to use the datacenter region as the region value and the failure domain (e.g., rack) as the zone value.
- You can omit the --ssh-username flag if your node uses the default username root.
- You can omit the --ssh-key flag if you have an SSH agent running and the key is added to the agent.
- You can omit the --ssh-port flag if your node uses the default SSH port 22.
Verify that the node was added by running:
kubectl get nodes
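To confirm that the node registered with the expected topology labels, you can list them alongside each node:
# Show every node together with its region and zone labels
kubectl get nodes -L topology.kubernetes.io/region -L topology.kubernetes.io/zone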
Adding GPU support with the CLI
If you have a self-managed node with an NVIDIA GPU, enable GPU support by adding the --install-nvidia-drivers flag:
cloudfleet clusters add-self-managed-node CLUSTER_ID \
  --host HOST_IP \
  --ssh-username SSH_USERNAME \
  --ssh-key SSH_KEY_LOCATION \
  --region DATACENTER_REGION \
  --zone DATACENTER_ZONE \
  --install-nvidia-drivers
This installs NVIDIA drivers from Ubuntu’s official repository and configures the NVIDIA container runtime.
NVIDIA GPU configuration
When GPU support is enabled (via Terraform or CLI), Cloudfleet:
- Installs NVIDIA drivers from Ubuntu’s official repository
- Installs and configures the NVIDIA container runtime
- Labels the node with cfke.io/accelerator-manufacturer: NVIDIA
- Updates the node capacity field nvidia.com/gpu with the number of GPUs
Please note that use of the NVIDIA drivers is subject to the NVIDIA Driver License Agreement; by using the drivers, you agree to its terms.
Verify GPU configuration:
kubectl get node -o custom-columns=NAME:.metadata.name,CAPACITY:.status.capacity
After the node is initialized, allow a few seconds for the capacity value to be updated.
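To confirm that containers can actually access the GPU, you can run a short-lived test pod. This is a minimal sketch; the pod name and CUDA image tag are examples, not values required by CFKE.
# Run nvidia-smi inside a CUDA base image, requesting one GPU
kubectl apply -f - <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: gpu-smoke-test                            # example name
spec:
  restartPolicy: Never
  containers:
    - name: nvidia-smi
      image: nvidia/cuda:12.2.0-base-ubuntu22.04  # example image tag
      command: ["nvidia-smi"]
      resources:
        limits:
          nvidia.com/gpu: 1
EOF

# Inspect the output once the pod has completed
kubectl logs gpu-smoke-test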
Adding GPU model labels
When you have different GPU models in the cluster, add extra labels to identify them:
Option 1: Manual labeling
kubectl label node NODE_NAME cfke.io/accelerator-name=V100
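Once the label is in place, workloads can target that GPU model with a standard nodeSelector. A minimal sketch (the pod name and image are illustrative):
kubectl apply -f - <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: v100-workload-example                     # example name
spec:
  nodeSelector:
    cfke.io/accelerator-name: V100
  containers:
    - name: app
      image: nvidia/cuda:12.2.0-base-ubuntu22.04  # example image tag
      command: ["nvidia-smi"]
      resources:
        limits:
          nvidia.com/gpu: 1
EOF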
Option 2: Automated labeling with Node Feature Discovery
The Node Feature Discovery (NFD) project, maintained by Kubernetes SIGs, can label nodes automatically: it discovers hardware features and advertises them as node labels.
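A common way to install NFD is via the project's Helm chart; the commands below follow the upstream NFD documentation at the time of writing, so check that documentation for the current chart location and recommended values. Note that NFD publishes its own feature.node.kubernetes.io/ labels rather than the cfke.io/accelerator-name label used above.
# Install Node Feature Discovery from its upstream Helm chart
helm repo add nfd https://kubernetes-sigs.github.io/node-feature-discovery/charts
helm repo update
helm install nfd nfd/node-feature-discovery \
  --namespace node-feature-discovery --create-namespace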
For more information about GPU workloads, see GPU-based workloads and the Kubernetes GPU documentation.
Removing a self-managed node
To remove a node from a cluster:
- Run the following commands on the node:
sudo apt remove -y kubelet
sudo rm -rf /etc/kubernetes/
sudo rm -rf /var/lib/kubelet/
- Remove the node from the cluster:
kubectl delete node NODE_NAME
Even if you do not run this last command, the cluster garbage collector will delete the node after it becomes NotReady. However, deleting the node manually immediately removes the pods scheduled on it, so they are recreated on other nodes without waiting for garbage collection.
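If the node is still Ready and serving traffic, you can optionally cordon and drain it before running the steps above, so its workloads are rescheduled gracefully instead of being deleted abruptly:
# Optional: stop new pods from being scheduled on the node and evict the existing ones
kubectl cordon NODE_NAME
kubectl drain NODE_NAME --ignore-daemonsets --delete-emptydir-data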
Moving a node to another cluster
To add a self-managed node to a different cluster, first remove it from the current cluster following the steps above. This ensures the node is properly unregistered and does not retain any configuration from the previous cluster. After removal, follow the steps in this guide to add the node to the new cluster.