Kubernetes provides a powerful set of primitives for running containerized workloads: Pods, Deployments, Services, ConfigMaps, and Secrets handle the majority of stateless application patterns. But when you need to run something more complex, like a database cluster with automated failover, a message queue with partition rebalancing, or a monitoring stack that configures itself based on your workloads, the built-in resources are not enough.
This is where operators come in. The operator pattern extends Kubernetes with application-specific automation, encoding the knowledge that a human administrator would use to deploy, scale, back up, and recover a system.
The operator pattern
An operator is a combination of two Kubernetes concepts working together:
Custom Resource Definitions (CRDs) extend the Kubernetes API with new resource types. A CRD lets you define a resource like PostgresCluster or PrometheusStack with its own schema, just like the built-in Deployment or Service resources. Users interact with these custom resources through kubectl and standard Kubernetes tooling.
Custom controllers watch for changes to those custom resources and take action to reconcile the actual state of the system with the desired state. This is the same control loop pattern that Kubernetes itself uses for built-in resources. When you create a Deployment, a controller ensures the right number of Pods are running. An operator controller does the same thing for application-specific concerns.
The formula is straightforward: operator = CRD + controller. The CRD defines what you want. The controller makes it happen and keeps it that way.
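The "what you want" half of the formula is the CRD. As a sketch, here is what registering a hypothetical PostgresCluster type might look like; the group, names, and schema fields are illustrative placeholders, not taken from any real operator:

```yaml
# Illustrative CRD for a hypothetical PostgresCluster resource type.
# Group, names, and schema are placeholders for this example.
apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  name: postgresclusters.example.com
spec:
  group: example.com
  scope: Namespaced
  names:
    plural: postgresclusters
    singular: postgrescluster
    kind: PostgresCluster
  versions:
    - name: v1
      served: true
      storage: true
      schema:
        openAPIV3Schema:
          type: object
          properties:
            spec:
              type: object
              properties:
                replicas:
                  type: integer
                version:
                  type: string
```

Once this CRD is applied, kubectl get postgresclusters works like any built-in resource. The controller half of the operator watches these objects and does the actual work.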
What operators actually do
The value of an operator becomes clear when you separate the application lifecycle into two phases.
Day-1 operations cover initial deployment: installing the software, configuring it, and getting it running. Helm charts and static manifests handle this well. You define the desired configuration, apply it, and the application starts.
Day-2 operations are everything that happens after deployment: upgrades, backups, failover, scaling, certificate rotation, schema migrations, and recovery from failures. These are the tasks that require operational knowledge specific to the application. A Postgres backup is not the same as a MongoDB backup. Scaling an Elasticsearch cluster requires rebalancing shards. Upgrading Kafka requires coordinated rolling restarts with partition leadership migration.
Operators encode this day-2 knowledge into software. Instead of a runbook that a human follows at 3am, the operator watches the cluster continuously and takes the right action automatically. If a database replica falls behind, the operator detects it and initiates recovery. If a TLS certificate is about to expire, the operator renews it. If a new version is available, the operator can perform a rolling upgrade with the correct sequence of steps for that specific application.
A practical example
Consider running PostgreSQL on Kubernetes without an operator. You would need to:
- Create a StatefulSet with the right container image and configuration
- Set up persistent volumes for data storage
- Configure a Service for client connections
- Write scripts for backup and restore
- Implement health checks and failover logic
- Handle version upgrades manually, coordinating primary and replica ordering
- Manage connection pooling, monitoring, and alerting separately
With a Postgres operator (such as the Zalando Postgres Operator or CrunchyData PGO), you instead create a single custom resource:
```yaml
apiVersion: acid.zalan.do/v1
kind: postgresql
metadata:
  name: myteam-my-database   # Zalando requires the name to start with the teamId
spec:
  teamId: "myteam"
  numberOfInstances: 3
  volume:
    size: 50Gi
  postgresql:
    version: "16"
```
The operator handles everything else: creating the StatefulSet, configuring streaming replication, setting up connection pooling via PgBouncer, scheduling backups to object storage, performing failover when the primary becomes unavailable, and rolling out version upgrades in the correct order.
Widely used operators in production
The operator ecosystem has matured significantly. These are the operators you are most likely to encounter in production Kubernetes clusters.
cert-manager automates TLS certificate management. It integrates with Let’s Encrypt, HashiCorp Vault, and other certificate authorities to issue, renew, and rotate certificates automatically. It is one of the most widely deployed add-ons in production Kubernetes clusters.
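A typical cert-manager request looks like the sketch below; the issuer name is an assumption and must match an Issuer or ClusterIssuer already configured in your cluster:

```yaml
# A cert-manager Certificate. cert-manager issues the certificate and
# keeps the referenced Secret renewed before expiry.
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: example-tls
  namespace: default
spec:
  secretName: example-tls     # Secret where the key pair is stored
  dnsNames:
    - example.com
  issuerRef:
    name: letsencrypt-prod    # assumed ClusterIssuer name
    kind: ClusterIssuer
```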
Prometheus Operator simplifies deploying and configuring Prometheus monitoring. It introduces ServiceMonitor and PodMonitor CRDs that let application teams define monitoring targets declaratively, without editing a central Prometheus configuration file.
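For example, a team can declare a scrape target with a ServiceMonitor like this sketch; the labels and port name are assumptions that must match your Services and your Prometheus instance's serviceMonitorSelector:

```yaml
# A ServiceMonitor telling the Prometheus Operator to scrape any Service
# labeled app: my-api on its "metrics" port, every 30 seconds.
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: my-api
  labels:
    release: prometheus   # assumed; must match the Prometheus serviceMonitorSelector
spec:
  selector:
    matchLabels:
      app: my-api
  endpoints:
    - port: metrics
      interval: 30s
```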
External Secrets Operator synchronizes secrets from external secret stores (AWS Secrets Manager, HashiCorp Vault, Google Secret Manager, Azure Key Vault) into Kubernetes Secrets. This keeps sensitive data out of Git repositories and YAML manifests.
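A minimal ExternalSecret might look like the following sketch; the store name, remote key path, and API version are assumptions (the project has been promoting its APIs, so check the version your installation serves):

```yaml
# Syncs one key from an external secret store into a Kubernetes Secret,
# re-reading the source every hour.
apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: db-credentials
spec:
  refreshInterval: 1h
  secretStoreRef:
    name: aws-secrets          # assumed ClusterSecretStore name
    kind: ClusterSecretStore
  target:
    name: db-credentials       # resulting Kubernetes Secret
  data:
    - secretKey: password
      remoteRef:
        key: prod/db/password  # path in the external store
```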
Strimzi manages Apache Kafka on Kubernetes. It handles broker configuration, topic management, user authentication, and rolling upgrades for Kafka clusters, which are notoriously complex to operate manually.
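Topic management, for instance, becomes declarative. This sketch assumes a Strimzi-managed cluster named my-cluster already exists:

```yaml
# A Strimzi KafkaTopic. The strimzi.io/cluster label must name an existing
# Kafka cluster managed by the operator.
apiVersion: kafka.strimzi.io/v1beta2
kind: KafkaTopic
metadata:
  name: orders
  labels:
    strimzi.io/cluster: my-cluster   # assumed cluster name
spec:
  partitions: 12
  replicas: 3
  config:
    retention.ms: 604800000   # 7 days
```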
Rook provides storage orchestration for Kubernetes, most commonly used to deploy and manage Ceph storage clusters. It turns Ceph’s complex deployment and management into a set of Kubernetes-native resources.
Crossplane takes the operator pattern in a different direction. Instead of managing applications inside the cluster, Crossplane manages cloud infrastructure resources (databases, storage buckets, VPCs) from within Kubernetes using CRDs. You define an RDS instance or a GCS bucket as a Kubernetes resource, and Crossplane provisions it in the cloud provider.
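As an illustration, creating an S3 bucket with the Upbound AWS provider looks roughly like this; the API group/version and ProviderConfig name vary by provider release, so treat the details as assumptions:

```yaml
# A Crossplane managed resource: the provider reconciles this object
# into a real S3 bucket in the referenced AWS account.
apiVersion: s3.aws.upbound.io/v1beta1
kind: Bucket
metadata:
  name: my-app-assets
spec:
  forProvider:
    region: us-east-1
  providerConfigRef:
    name: default   # assumed ProviderConfig with AWS credentials
```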
The full catalog of available operators is browsable at OperatorHub.io, which lists hundreds of operators for databases, monitoring, networking, security, and storage.
Operators vs Helm
Helm and operators are frequently compared, but they serve different purposes and work well together.
Helm is a package manager. It templates and packages Kubernetes manifests, making it easy to distribute, install, and upgrade applications. Helm handles day-1 operations: getting the application deployed with the right configuration. Once helm install finishes, Helm steps back. It does not watch the cluster, react to failures, or manage the application lifecycle.
Operators are runtime managers. They watch the cluster continuously and take action to maintain the desired state. An operator handles both day-1 and day-2 operations. Many operators are themselves distributed as Helm charts, using Helm for initial installation while the operator handles everything after that.
The practical distinction: use Helm for stateless applications where Kubernetes built-in controllers (Deployments, ReplicaSets) handle all the lifecycle management you need. Use operators for stateful or complex applications where application-specific logic is required for upgrades, backups, failover, and scaling.
When you do not need an operator
Not every application benefits from an operator. The operator pattern adds complexity, specifically a custom controller that runs in your cluster and needs its own monitoring, upgrades, and RBAC permissions. Before adopting an operator, consider whether simpler alternatives work:
Stateless web services and APIs are well served by Deployments, Horizontal Pod Autoscalers, and Ingress controllers. Kubernetes has built-in primitives for rolling updates, scaling, and service discovery. Adding an operator for a stateless application is unnecessary overhead.
Applications with simple configuration that can be fully described with ConfigMaps, Secrets, and environment variables do not need CRDs. If kubectl apply gets the job done, an operator is overkill.
Managed cloud services may be a better choice than running your own operator-managed database. If your cloud provider offers a managed Postgres, Redis, or Elasticsearch service, using it eliminates the need to operate the database software entirely, which is simpler than even the best operator.
When operators are the right choice
Operators provide clear value in specific scenarios:
Stateful applications with complex lifecycle requirements. Databases, message queues, distributed caches, and search engines all have application-specific operational procedures that cannot be captured in a Deployment spec. Operators encode these procedures and execute them reliably.
Applications that require automated day-2 operations. If your team spends significant time on manual backup schedules, certificate rotation, version upgrades, or failover procedures, an operator can automate those tasks and reduce operational toil.
Infrastructure standardization. When you need multiple teams to deploy the same application consistently, a CRD provides a well-defined interface. Instead of each team writing their own StatefulSet configuration for Postgres, they create a PostgresCluster resource with the parameters they need, and the operator ensures it is deployed correctly.
Self-service platforms. If you are building an internal developer platform, operators provide the abstraction layer between platform teams and application teams. The platform team manages the operator. Application teams consume the CRD without needing to understand the underlying infrastructure.
Building your own operator
Most teams should use existing operators rather than building their own. The ecosystem covers the majority of common applications. But if you have a custom application with complex operational requirements, several frameworks are available:
- Kubebuilder (Go) is the most widely used framework, maintained by the Kubernetes SIG API Machinery group
- Operator SDK (Go, Ansible, Helm) is part of the Operator Framework project and builds on top of Kubebuilder
- Kopf (Python) provides a simpler entry point for teams more comfortable with Python
- kube-rs (Rust) is growing in popularity for operators where performance and resource efficiency matter
- Java Operator SDK brings the pattern to Java/JVM teams
Building a production-quality operator requires understanding of Kubernetes API conventions, finalizers, status subresources, leader election, and reconciliation idempotency. Plan for significant engineering investment if you go this route.
The operator landscape in 2026
The operator pattern has evolved from an experimental idea introduced by CoreOS in 2016 to a standard part of the Kubernetes ecosystem. CRDs are a stable API (v1), with support for CEL validation and server-side apply. Multi-cluster operators are increasingly common, managing resources across multiple Kubernetes clusters from a single control plane.
For teams running Kubernetes in production, operators are not optional for stateful workloads. The question is not whether to use them, but which ones to adopt and how to manage them alongside the rest of your cluster infrastructure.