Self-managed nodes
For supported cloud providers, Cloudfleet can provision compute nodes automatically with its Node Auto-provisioner. For on-premises infrastructure, you need to provision the compute nodes manually and register them with Cloudfleet. This guide explains how to set up Cloudfleet with on-premises infrastructure. You can also use this guide to add nodes to CFKE clusters from other cloud providers that are not supported by the Node Auto-provisioner.
Self-managed nodes are Linux machines that you provision and give to CFKE to allow it to run workloads on them. This allows you to use your own hardware or third-party cloud resources to expand your cluster. Self-managed nodes act as equal members of the cluster and can run any workload that their hardware supports.
Requirements
To add a self-managed node to your cluster, the node must meet the following requirements:
- Your node must run Ubuntu 22.04 or 24.04 (Red Hat Enterprise Linux and variants are on the roadmap)
- You must have SSH access to the node with root privileges, either from a bastion host or from your local workstation
- Your node must have egress internet access
Please note that egress internet access is required for self-managed nodes to communicate with the CFKE control plane and to establish VPN connections with other nodes in the cluster. If two self-managed nodes are on the same network and can reach each other via private IPs, they communicate over the private network. The VPN between the nodes is still established in this case, so node-to-node communication remains encrypted.
Adding self-managed nodes
- Install the Cloudfleet CLI on a workstation by following the instructions here and configure it for your account.
- Run the following command to add a self-managed node to your cluster:
cloudfleet clusters add-self-managed-node CLUSTER_ID --host HOST_IP --ssh-username SSH_USERNAME --ssh-key SSH_KEY_LOCATION --ssh-port SSH_PORT --region DATACENTER_REGION --zone DATACENTER_ZONE
The --region and --zone flags are mandatory and specify the region and zone of the node. These are arbitrary strings; you can use any values that make sense to you. They are translated into node labels in the Kubernetes cluster and can be used to schedule workloads to specific nodes. The best practice is to use the datacenter region as the region value and the failure domain (e.g. rack) as the zone value.
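For example, a node reachable at 203.0.113.10 in a Frankfurt datacenter, rack A, could be added as follows (the host, username, key path, region, and zone values are illustrative placeholders; replace them with your own):
cloudfleet clusters add-self-managed-node CLUSTER_ID --host 203.0.113.10 --ssh-username ubuntu --ssh-key ~/.ssh/id_ed25519 --ssh-port 22 --region eu-frankfurt --zone rack-a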
- You can omit the --ssh-username flag if your node uses the default username root.
- You can omit the --ssh-key flag if you have an SSH agent running and the key is added to the agent.
- You can omit the --ssh-port flag if your node uses the default SSH port 22.
After running the command, the CLI will install the required packages and automatically add the node to your cluster. You can verify that the node has been added by running the following command:
kubectl get nodes
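Once the node has joined, the region and zone values you supplied appear as node labels that you can target from a Pod spec. Below is a minimal sketch that assumes the values are exposed through the standard topology.kubernetes.io/region and topology.kubernetes.io/zone labels; verify the exact label keys on your node with kubectl get node NODE_NAME --show-labels before relying on them:
apiVersion: v1
kind: Pod
metadata:
  name: zone-pinned-app
spec:
  # Schedule only onto nodes registered with the matching region and zone values
  nodeSelector:
    topology.kubernetes.io/region: DATACENTER_REGION
    topology.kubernetes.io/zone: DATACENTER_ZONE
  containers:
    - name: app
      image: nginx:1.27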
Adding NVIDIA Accelerator support
If you have a self-managed node with an NVIDIA GPU, you can enable GPU support by adding the --install-nvidia-drivers flag to the add-self-managed-node command. This will not only install the NVIDIA drivers but also install and configure the NVIDIA container runtime on the node. The NVIDIA drivers are installed from Ubuntu's official repository and the NVIDIA container runtime is installed from the NVIDIA container runtime repository.
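For example, to register a GPU node with driver and container runtime setup included (placeholder values as in the earlier example, with the SSH key taken from your agent):
cloudfleet clusters add-self-managed-node CLUSTER_ID --host 203.0.113.11 --ssh-username ubuntu --install-nvidia-drivers --region eu-frankfurt --zone rack-a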
Please note that use of the NVIDIA drivers is subject to the NVIDIA Driver License Agreement; by using the drivers, you agree to its terms.
NVIDIA driver-enabled nodes are labeled with the cfke.io/accelerator-manufacturer: NVIDIA label, and if an NVIDIA GPU is actually present on the node, the node's capacity field nvidia.com/gpu is updated with the number of GPUs on the node. You can use the following command to verify that the node is labeled correctly:
kubectl get node -o custom-columns=NAME:.metadata.name,CAPACITY:.status.capacity
After the node is initialized, allow a few seconds for the capacity value to be updated.
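Once the capacity is reported, workloads can request GPUs through the standard Kubernetes resource mechanism. A minimal sketch (the image and command are only examples used to verify driver access):
apiVersion: v1
kind: Pod
metadata:
  name: gpu-smoke-test
spec:
  restartPolicy: Never
  containers:
    - name: cuda
      image: nvidia/cuda:12.4.1-base-ubuntu22.04
      command: ["nvidia-smi"]
      resources:
        limits:
          # Request a single GPU from the node's nvidia.com/gpu capacity
          nvidia.com/gpu: 1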
In addition to the default label added by Cloudfleet, you may need extra labels in some scenarios, for example, when you have different GPU models in the cluster. You have two options to add extra labels to the node:
- You can use the kubectl label command to add extra labels to the node. For example, to add the label cfke.io/accelerator-name: V100 to a node, you can run the following command (see the Pod sketch after this list for how to target such a label):
kubectl label node NODE_NAME cfke.io/accelerator-name=V100
- You can use the Node Feature Discovery project to automate labeling nodes based on their hardware features. Node Feature Discovery is a Kubernetes SIG project that discovers the hardware features of the nodes in a Kubernetes cluster and advertises them as node labels. Please note that Node Feature Discovery is not maintained by Cloudfleet; refer to the project's documentation for more information.
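If you label nodes with a GPU model as shown above, you can pin a GPU workload to that model by combining the resource request with a nodeSelector. A short sketch, assuming the cfke.io/accelerator-name=V100 label from the example has been applied:
apiVersion: v1
kind: Pod
metadata:
  name: gpu-v100-job
spec:
  restartPolicy: Never
  # Only schedule onto nodes carrying the custom accelerator label
  nodeSelector:
    cfke.io/accelerator-name: V100
  containers:
    - name: cuda
      image: nvidia/cuda:12.4.1-base-ubuntu22.04
      command: ["nvidia-smi"]
      resources:
        limits:
          nvidia.com/gpu: 1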
For more information and examples on how to use NVIDIA GPUs with Cloudfleet Kubernetes Engine, please refer to the scheduling workloads on GPUs guide.
For more information about using GPUs on Kubernetes, please refer to the official Kubernetes documentation.