Self-managed nodes

For supported cloud providers (AWS, GCP, and Hetzner Cloud), Cloudfleet provides node auto-provisioning that automatically creates and manages compute nodes based on your workload requirements. For on-premises infrastructure and other cloud providers, you manually provision compute nodes and register them with Cloudfleet. This guide explains how to add self-managed nodes to your CFKE cluster.

Not sure which approach to use? See How Cloudfleet provisions nodes for a comparison of node auto-provisioning versus self-managed nodes.

Self-managed nodes are Linux machines that you provision yourself and register with CFKE so that it can run workloads on them. This lets you use your own hardware or third-party cloud resources to expand your cluster. Self-managed nodes act as equal members of the cluster and can run any workload their hardware supports.

Requirements

To add a self-managed node to your cluster, the node must meet the following requirements:

  • Your node must run Ubuntu 22.04 or 24.04 (Red Hat Enterprise Linux and variants are on the roadmap)
  • You must have SSH access to the node, with root privileges, from a bastion host or your local workstation
  • Your node must have egress internet access
  • Your firewall must allow your server to initiate outbound TCP connections to *:443, outbound UDP from source port 41641 to any destination (*:*), and outbound UDP to *:3478.
  • Your node can be behind NAT and does not need a public IP address. Cloudfleet is able to establish a secure tunnel behind NAT.

Please note that egress internet access is required for the self-managed nodes to be able to communicate with the CFKE control plane and establish VPN connections with other nodes in the cluster. If two self-managed nodes are in the same network and have a connection via a private IP, they communicate with each other over the private network. In this case, the VPN between nodes is still established and node-to-node communication is encrypted.
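
If your nodes sit behind a deny-by-default egress policy, the following sketch shows how the requirements above could be opened with ufw on Ubuntu; this is only an illustration (most default ufw setups already allow all outbound traffic), so adapt it to your own firewall tooling:

# Sketch: outbound rules matching the requirements above (ufw on Ubuntu)
sudo ufw allow out to any port 443 proto tcp                # control plane traffic
sudo ufw allow out from any port 41641 to any proto udp     # node-to-node VPN traffic
sudo ufw allow out to any port 3478 proto udp               # NAT traversal (STUN)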

Adding self-managed nodes with Terraform

For production deployments, infrastructure-as-code workflows, or when adding multiple nodes at once, use the Cloudfleet Terraform provider. The provider supports two methods:

  • Cloud-init (preferred): Generates configuration that automatically registers nodes when they boot. Use this method when your target platform supports cloud-init, as it enables fully automated provisioning without requiring SSH access during setup.
  • SSH: Connects to existing machines via SSH to install and configure them. Use this for bare-metal servers, existing VMs, or platforms that do not support cloud-init.

Using Terraform with cloud-init

The Cloudfleet Terraform provider generates cloud-init userdata that you can use with any cloud provider or virtualization platform that supports cloud-init. This is the preferred method when your infrastructure supports it, as nodes automatically register themselves during boot without requiring SSH connectivity from your Terraform environment:

terraform {
  required_providers {
    cloudfleet = {
      source = "terraform.cloudfleet.ai/cloudfleet/cloudfleet"
    }
  }
}

provider "cloudfleet" {}

# Reference your existing cluster
data "cloudfleet_cfke_cluster" "cluster" {
  id = "YOUR_CLUSTER_ID"
}

# Generate cloud-init configuration for self-managed nodes
resource "cloudfleet_cfke_node_join_information" "node" {
  cluster_id = data.cloudfleet_cfke_cluster.cluster.id
  region     = "your-region"    # e.g., "eu-west-1", "on-premises"
  zone       = "your-zone"      # e.g., "datacenter-1", "rack-a"

  node_labels = {
    "cfke.io/provider" = "custom"  # Optional: identify node origin
  }
}

# The rendered cloud-init configuration is available as:
# cloudfleet_cfke_node_join_information.node.rendered
#
# Pass this to your VM's user_data field

The region and zone values become Kubernetes node labels (topology.kubernetes.io/region and topology.kubernetes.io/zone), allowing you to schedule workloads to specific locations using standard node selectors.
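
For example, once a node has joined, you can confirm its topology labels and pin a workload to that location with a standard nodeSelector; the region and zone values below are placeholders:

# Show the topology labels that region/zone map to
kubectl get nodes -L topology.kubernetes.io/region,topology.kubernetes.io/zone

# Schedule a Pod onto nodes in a specific region/zone (placeholder values)
kubectl apply -f - <<EOF
apiVersion: v1
kind: Pod
metadata:
  name: location-pinned
spec:
  nodeSelector:
    topology.kubernetes.io/region: your-region
    topology.kubernetes.io/zone: your-zone
  containers:
    - name: app
      image: nginx:stable
EOF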

Using the cloud-init output

The rendered attribute contains the cloud-init userdata. Pass this to your infrastructure provider’s user_data or cloud-init field:

# Example: Generic VM resource (syntax varies by provider)
resource "your_provider_instance" "node" {
  # ... other configuration ...

  user_data = cloudfleet_cfke_node_join_information.node.rendered
}

When the VM boots, cloud-init executes the configuration and automatically:

  1. Installs required packages (kubelet, container runtime, networking)
  2. Configures the node to join your CFKE cluster
  3. Establishes secure connectivity with other nodes
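
If you want to verify this, a minimal sanity check from the node itself might look like the following (this assumes you have shell access to the machine, which is not otherwise required for the cloud-init method):

# On the node: wait for cloud-init to finish and confirm the kubelet is active
cloud-init status --wait
systemctl is-active kubelet

# From your workstation: the node should appear in the cluster shortly afterwards
kubectl get nodes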

Provider-specific guides

For complete Terraform examples including VM provisioning, firewall configuration, and security best practices, see the provider-specific guides:

Provider   Use case                     Guide
Proxmox    On-premises virtualization   VMs on Proxmox VE
OVH        European cloud provider      OVH Public Cloud instances
Scaleway   European cloud provider      Scaleway instances
Vultr      Global cloud provider        Vultr cloud instances
Exoscale   European cloud provider      Exoscale compute instances

For other providers that support cloud-init (DigitalOcean, Linode, VMware, OpenStack, etc.), adapt the examples above using your provider’s Terraform resources.

Adding GPU support with cloud-init

To enable NVIDIA GPU support on cloud-init provisioned nodes, add the install_nvidia_drivers option:

resource "cloudfleet_cfke_node_join_information" "gpu_node" {
  cluster_id             = data.cloudfleet_cfke_cluster.cluster.id
  region                 = "your-region"
  zone                   = "your-zone"
  install_nvidia_drivers = true

  node_labels = {
    "cfke.io/provider"         = "custom"
    "cfke.io/accelerator-name" = "V100"  # Optional: specify GPU model
  }
}

Using Terraform with SSH

The cloudfleet_cfke_self_managed_node resource connects directly to existing machines via SSH and configures them as Kubernetes nodes. Use this method for bare-metal servers, existing VMs, or platforms that do not support cloud-init:

resource "cloudfleet_cfke_self_managed_node" "server" {
  cluster_id = cloudfleet_cfke_cluster.example.id
  region     = "datacenter-1"
  zone       = "rack-a"

  ssh {
    host             = "192.168.1.100"
    user             = "ubuntu"
    private_key_path = "~/.ssh/id_rsa"
    port             = 22  # Optional, defaults to 22
  }

  node_labels = {
    "environment" = "production"
  }
}

The region and zone values become Kubernetes node labels (topology.kubernetes.io/region and topology.kubernetes.io/zone), allowing you to schedule workloads to specific locations using standard node selectors.

For GPU nodes with SSH provisioning:

resource "cloudfleet_cfke_self_managed_node" "gpu_server" {
  cluster_id             = cloudfleet_cfke_cluster.example.id
  region                 = "datacenter-1"
  zone                   = "rack-b"
  install_nvidia_drivers = true

  ssh {
    host             = "192.168.1.101"
    user             = "ubuntu"
    private_key_path = "~/.ssh/id_rsa"
  }

  node_labels = {
    "cfke.io/accelerator-name" = "RTX-4090"
  }
}

For complete SSH resource documentation, see the Terraform resources reference.

Adding self-managed nodes with the CLI

For quick testing, single-node additions, or environments where Terraform is not available, use the Cloudfleet CLI.

Basic CLI usage

  1. Install the Cloudfleet CLI on a workstation by following the instructions here and configure it for your account.

  2. Run the following command to add a self-managed node to your cluster:

cloudfleet clusters add-self-managed-node CLUSTER_ID \
  --host HOST_IP \
  --ssh-username SSH_USERNAME \
  --ssh-key SSH_KEY_LOCATION \
  --ssh-port SSH_PORT \
  --region DATACENTER_REGION \
  --zone DATACENTER_ZONE

region and zone are mandatory flags that specify the location of the node. These values become Kubernetes node labels (topology.kubernetes.io/region and topology.kubernetes.io/zone), allowing you to schedule workloads to specific locations using standard node selectors. You can use any string that makes sense for your infrastructure. The best practice is to use the datacenter region as the region value and the failure domain (e.g., rack) as the zone value.
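
For example, a node in a Frankfurt datacenter placed in rack 12 could be registered like this (all values are illustrative):

cloudfleet clusters add-self-managed-node CLUSTER_ID \
  --host 203.0.113.10 \
  --ssh-username ubuntu \
  --ssh-key ~/.ssh/id_ed25519 \
  --region frankfurt-1 \
  --zone rack-12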

  • You can omit the --ssh-username flag if your node uses the default username root.
  • You can omit the --ssh-key flag if you have an SSH agent running and the key is added to the agent.
  • You can omit the --ssh-port flag if your node uses the default SSH port 22.
  3. Verify that the node is added by running:
kubectl get nodes

Adding GPU support with the CLI

If you have a self-managed node with an NVIDIA GPU, enable GPU support by adding the --install-nvidia-drivers flag:

cloudfleet clusters add-self-managed-node CLUSTER_ID \
  --host HOST_IP \
  --ssh-username SSH_USERNAME \
  --ssh-key SSH_KEY_LOCATION \
  --region DATACENTER_REGION \
  --zone DATACENTER_ZONE \
  --install-nvidia-drivers

This installs NVIDIA drivers from Ubuntu’s official repository and configures the NVIDIA container runtime.

NVIDIA GPU configuration

When GPU support is enabled (via Terraform or CLI), Cloudfleet:

  • Installs NVIDIA drivers from Ubuntu’s official repository
  • Installs and configures the NVIDIA container runtime
  • Labels the node with cfke.io/accelerator-manufacturer: NVIDIA
  • Updates the node capacity nvidia.com/gpu field with the number of GPUs

Please note that use of the NVIDIA drivers is subject to the NVIDIA Driver License Agreement; by using them, you agree to its terms.

Verify GPU configuration:

kubectl get node -o custom-columns=NAME:.metadata.name,CAPACITY:.status.capacity

After the node is initialized, allow a few seconds for the capacity value to be updated.
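
As a quick end-to-end check, you can run a throwaway Pod that requests a GPU and prints the nvidia-smi output; the CUDA image tag below is only an example, so pick one that matches the installed driver:

kubectl apply -f - <<EOF
apiVersion: v1
kind: Pod
metadata:
  name: gpu-smoke-test
spec:
  restartPolicy: Never
  containers:
    - name: cuda
      image: nvidia/cuda:12.4.1-base-ubuntu22.04
      command: ["nvidia-smi"]
      resources:
        limits:
          nvidia.com/gpu: 1
EOF

# Once the Pod has completed, inspect the nvidia-smi output
kubectl logs gpu-smoke-test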

Adding GPU model labels

When you have different GPU models in the cluster, add extra labels to identify them:

Option 1: Manual labeling

kubectl label node NODE_NAME cfke.io/accelerator-name=V100

Option 2: Automated labeling with Node Feature Discovery

The Node Feature Discovery (NFD) project, maintained by Kubernetes SIGs, detects hardware features on each node and advertises them as node labels, so this labeling can be automated.
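
As a sketch, NFD can be deployed with its upstream kustomize overlay; the version tag below is an example, so check the project's releases for the current one. Note that the labels NFD produces (for example, PCI vendor labels) differ from the cfke.io/accelerator-name convention shown above.

# Deploy Node Feature Discovery (version tag is an example; see the NFD releases)
kubectl apply -k "https://github.com/kubernetes-sigs/node-feature-discovery/deployment/overlays/default?ref=v0.16.0"

# Inspect the feature labels NFD adds to a node
kubectl get node NODE_NAME --show-labels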

For more information about GPU workloads, see GPU-based workloads and the Kubernetes GPU documentation.

Removing a self-managed node

To remove a node from a cluster:

  1. Run the following commands on the node:
sudo apt remove -y kubelet
sudo rm -rf /etc/kubernetes/
sudo rm -rf /var/lib/kubelet/
  2. Remove the node from the cluster:
kubectl delete node NODE_NAME

Even if you do not run this last command, the cluster garbage collector deletes the node after it becomes NotReady. Deleting the node manually, however, immediately removes the pods scheduled on it so that they are recreated on other nodes without delay.
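
If you prefer to move workloads off the node gracefully before uninstalling the kubelet, you can cordon and drain it first; these are standard kubectl commands, not specific to Cloudfleet:

# Optional: evict workloads gracefully before removing the node
kubectl cordon NODE_NAME
kubectl drain NODE_NAME --ignore-daemonsets --delete-emptydir-data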

Moving a node to another cluster

To add a self-managed node to a different cluster, first remove it from the current cluster following the steps above. This ensures the node is properly unregistered and does not retain any configuration from the previous cluster. After removal, follow the steps in this guide to add the node to the new cluster.