Feature Gates

The Thanos Operator provides several experimental features controlled by feature gates. These features are designed to extend the operator’s capabilities while maintaining backward compatibility for the APIs. Features can be enabled individually using the --enable-feature command-line flag.

Overview

Feature gates allow you to:

- Enable experimental functionality before it becomes stable
- Test new features in development environments
- Incrementally adopt new capabilities as they mature
- Maintain backward compatibility by keeping new features disabled by default

Available Features

Feature	Flag	Status	Description
ServiceMonitor	`service-monitor`	Experimental	Automatic ServiceMonitor creation for Prometheus scraping
PrometheusRule	`prometheus-rule`	Experimental	Automatic discovery and mounting of PrometheusRule objects
OpenTelemetry Sidecar	`otel-sidecar`	Experimental	OpenTelemetry collector sidecar injection for distributed tracing
KubeResourceSync	`kube-resource-sync`	Experimental	Immediate ConfigMap/Secret synchronization via sidecar
Volume Resize	`volume-resize`	Experimental	Automatic PVC resizing for Thanos component storage expansion

Enabling Features

Command Line

Enable features using the --enable-feature flag when starting the operator:

# Enable a single feature
./thanos-operator --enable-feature service-monitor

# Enable multiple features
./thanos-operator \
  --enable-feature service-monitor \
  --enable-feature prometheus-rule \
  --enable-feature otel-sidecar \
  --enable-feature volume-resize
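
If the operator's arguments are assembled programmatically (for example by a deployment script), the repeated-flag pattern above can be sketched as follows. This is an illustration only and not part of the operator: the `feature_args` helper is hypothetical, and the set of known flags is copied from the Available Features table.

```python
# Hypothetical helper: builds the repeated --enable-feature arguments shown above.
KNOWN_FEATURES = {
    "service-monitor",
    "prometheus-rule",
    "otel-sidecar",
    "kube-resource-sync",
    "volume-resize",
}


def feature_args(features):
    """Return one `--enable-feature <name>` pair per requested feature."""
    args = []
    for name in features:
        if name not in KNOWN_FEATURES:
            # Fail early on typos instead of passing an unknown gate to the operator.
            raise ValueError(f"unknown feature gate: {name}")
        args += ["--enable-feature", name]
    return args


print(feature_args(["service-monitor", "volume-resize"]))
```

Validating names before building the argument list catches misspelled gates at deploy time rather than at operator startup.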

ServiceMonitor Feature

Flag: service-monitor

What It Achieves

Automatically creates and manages ServiceMonitor resources for Thanos components, eliminating the need to manually define ServiceMonitors for scraping the resources deployed by the operator. This feature relies on the Prometheus Operator, which defines the ServiceMonitor custom resource.

How It Works

When enabled, the operator automatically creates ServiceMonitor resources alongside each Thanos component. These ServiceMonitors are configured with appropriate labels, selectors, and endpoints to enable Prometheus discovery and scraping.

Prerequisites

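- The Prometheus Operator must be installed in the cluster, since it provides the ServiceMonitor custom resource definition.
- The Prometheus instance must be configured to discover the generated ServiceMonitors, for example through a serviceMonitorSelector that matches their labels.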

PrometheusRule Feature

Flag: prometheus-rule

What It Achieves

Enables ThanosRuler to automatically discover and mount PrometheusRule resources as configuration. This allows you to define alerting and recording rules as Kubernetes custom resources rather than manually managing ConfigMaps.

How It Works

The operator watches for PrometheusRule resources that match the configured label selector, converts them into ConfigMaps, and mounts them into ThanosRuler pods. This provides automatic rule discovery and lifecycle management.

Configuration

Configure PrometheusRule discovery in your ThanosRuler spec:

apiVersion: monitoring.thanos.io/v1alpha1
kind: ThanosRuler
metadata:
  name: thanos-ruler
spec:
  ruleConfigSelector:
    matchLabels:
      prometheus: main
      role: alert-rules
  
  # Optional: Enable multi-tenancy
  ruleTenancyConfig:
    # Label on PrometheusRule that contains tenant value (default: "operator.thanos.io/tenant")
    tenantSpecifierLabel: tenant
    # Label injected into rule groups and PromQL expressions (default: "tenant_id") 
    enforcedTenantIdentifier: tenant_id

PrometheusRule Example

apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: thanos-alerts
  labels:
    prometheus: main
    role: alert-rules
    tenant: platform
spec:
  groups:
  - name: thanos.rules
    rules:
    - alert: ThanosQueryInstanceDown
      expr: up{job="thanos-query"} == 0
      for: 5m
      labels:
        severity: warning
      annotations:
        summary: "Thanos Query instance is down"

PrometheusRule Discovery

The operator discovers PrometheusRule resources using the ruleConfigSelector you define. This is a standard Kubernetes label selector that determines which PrometheusRules the ThanosRuler should process.

Discovery process:

- You define ruleConfigSelector on the ThanosRuler with the labels you want to match.
- PrometheusRules whose labels match the selector are discovered and processed.
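
The matchLabels semantics used by the selector can be sketched in a few lines: a resource matches only when every key/value pair in the selector is present in its labels. The `matches` function below is a hypothetical illustration of that rule, not the operator's code, and it covers only matchLabels (not matchExpressions).

```python
def matches(match_labels, labels):
    """True when every selector key/value pair is present in `labels`."""
    return all(labels.get(key) == value for key, value in match_labels.items())


# Selector and labels taken from the Configuration and PrometheusRule examples.
selector = {"prometheus": "main", "role": "alert-rules"}
rule_labels = {"prometheus": "main", "role": "alert-rules", "tenant": "platform"}

print(matches(selector, rule_labels))             # True
print(matches(selector, {"prometheus": "main"}))  # False: "role" is missing
```

Extra labels on the PrometheusRule, such as `tenant`, do not prevent a match.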

Multi-Tenancy Processing

When ruleTenancyConfig is configured, the operator performs tenant injection:

Tenant Discovery:
- Looks for tenantSpecifierLabel (default: operator.thanos.io/tenant) on PrometheusRule metadata
- Uses the label value as the tenant identifier
Rule Group Processing:
- Adds enforcedTenantIdentifier label to each rule group
- Example: tenant_id: platform added to group labels
PromQL Expression Enforcement:
- Injects tenant label into PromQL expressions using enforceTenantLabelInPromQL
- Original: up{job="thanos-query"} == 0
- Modified: up{job="thanos-query", tenant_id="platform"} == 0

Tenancy Example:

Input PrometheusRule:

metadata:
  labels:
    tenant: platform  # tenantSpecifierLabel value
spec:
  groups:
  - name: thanos.rules
    rules:
    - alert: ThanosQueryInstanceDown
      expr: up{job="thanos-query"} == 0

Generated ConfigMap content:

groups:
- name: thanos.rules
  labels:
    tenant_id: platform  # enforcedTenantIdentifier added
  rules:
  - alert: ThanosQueryInstanceDown
    expr: up{job="thanos-query", tenant_id="platform"} == 0  # PromQL modified
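
The transformation from the input PrometheusRule to the generated content can be approximated as below. This is a deliberately naive sketch, not the operator's implementation: `enforce_tenant` and `apply_tenancy` are hypothetical names, and the regular expression only handles selectors written as `metric{...}`. Bare metric names and label values containing `}` are not covered, which a real PromQL parser would handle.

```python
import re

# Naive: matches a metric name immediately followed by a `{...}` matcher list.
_SELECTOR = re.compile(r"([a-zA-Z_:][a-zA-Z0-9_:]*)\{([^}]*)\}")


def enforce_tenant(expr, label, tenant):
    """Append `label="tenant"` to every `metric{...}` selector in `expr`."""
    def add(match):
        name, matchers = match.group(1), match.group(2).strip()
        extra = f'{label}="{tenant}"'
        joined = f"{matchers}, {extra}" if matchers else extra
        return f"{name}{{{joined}}}"
    return _SELECTOR.sub(add, expr)


def apply_tenancy(rule_labels, groups, specifier_label, enforced_label):
    """Return copies of `groups` with the tenant label on groups and in expressions."""
    tenant = rule_labels[specifier_label]  # tenant discovery from metadata labels
    result = []
    for group in groups:
        new_group = dict(group)
        new_group["labels"] = {**group.get("labels", {}), enforced_label: tenant}
        new_group["rules"] = [
            {**rule, "expr": enforce_tenant(rule["expr"], enforced_label, tenant)}
            for rule in group["rules"]
        ]
        result.append(new_group)
    return result


groups = [{
    "name": "thanos.rules",
    "rules": [{"alert": "ThanosQueryInstanceDown",
               "expr": 'up{job="thanos-query"} == 0'}],
}]
print(apply_tenancy({"tenant": "platform"}, groups, "tenant", "tenant_id"))
```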

Rule Processing

The operator continuously watches and processes PrometheusRule resources:

- Discovery: Uses the ruleConfigSelector to find matching PrometheusRules in the same namespace
- Tenancy Processing: When ruleTenancyConfig is enabled, applies tenant label injection to rule groups and PromQL expressions
- Conversion: Transforms PrometheusRule specs into Prometheus rule file format
- ConfigMap Generation: Creates ConfigMaps named {ruler-name}-promrule-{index} containing the rule files
- Mounting: ThanosRuler pods automatically mount these ConfigMaps as rule files
- Precedence: PrometheusRule-derived ConfigMaps override any conflicting user-created ConfigMaps
- Metrics: Exposes discovery and processing metrics including per-tenant rule counts
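
When inspecting a namespace, the naming pattern {ruler-name}-promrule-{index} makes it possible to tell generated ConfigMaps apart from user-created ones. The sketch below assumes the index is a plain decimal integer, which this page does not state; `generated_configmaps` is a hypothetical helper.

```python
import re


def generated_configmaps(ruler_name, configmap_names):
    """Return names matching {ruler-name}-promrule-{index}, ordered by index."""
    # Assumption: {index} is a decimal integer.
    pattern = re.compile(rf"^{re.escape(ruler_name)}-promrule-(\d+)$")
    found = []
    for name in configmap_names:
        match = pattern.match(name)
        if match:
            found.append((int(match.group(1)), name))
    return [name for _, name in sorted(found)]


names = ["thanos-ruler-promrule-1", "my-rules", "thanos-ruler-promrule-0"]
print(generated_configmaps("thanos-ruler", names))
```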

Use Cases

- GitOps rule management: Store rules in version control as PrometheusRule resources
- Multi-tenant alerting: Separate rules per tenant with automatic label injection
- Dynamic rule updates: Rules update automatically when PrometheusRule resources change

OpenTelemetry Sidecar Feature

Flag: otel-sidecar

What It Achieves

Enables automatic injection of OpenTelemetry collector sidecars into Thanos component pods, providing distributed tracing capabilities across the entire Thanos stack without manual configuration.

How It Works

When enabled, the operator adds the sidecar.opentelemetry.io/inject: "true" annotation to Thanos pods and configures Thanos components with OTLP tracing endpoints. The OpenTelemetry Operator handles the actual sidecar injection.

Automatic Configuration

The operator automatically adds tracing configuration to Thanos components:

# Automatically added tracing config
--tracing.config=type: OTLP
config:
  client_type: http
  endpoint: localhost:4318
  insecure: true
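
Note that the whole YAML document above is the value of a single --tracing.config flag. The following sketch makes that explicit by rendering the flag as one argument string; `tracing_arg` is a hypothetical helper written for illustration, and its defaults simply mirror the values shown above.

```python
def tracing_arg(endpoint="localhost:4318", insecure=True):
    """Render the --tracing.config flag as one multi-line argument."""
    config = "\n".join([
        "type: OTLP",
        "config:",
        "  client_type: http",
        f"  endpoint: {endpoint}",
        f"  insecure: {str(insecure).lower()}",
    ])
    return f"--tracing.config={config}"


print(tracing_arg())
```

The endpoint localhost:4318 points at the collector sidecar in the same pod, which is why the connection is configured as insecure.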

Prerequisites

- The OpenTelemetry Operator must be installed in the cluster.
- An OpenTelemetryCollector resource in sidecar mode (mode: sidecar) must exist so that the OpenTelemetry Operator has a collector to inject.

KubeResourceSync Feature

Flag: kube-resource-sync

What It Achieves

Synchronizes ConfigMap and Secret changes through a dedicated sidecar container as soon as they occur, avoiding the kubelet's volume sync delay (typically 60 seconds or more) for critical configuration updates. It is currently implemented for the ThanosReceive hashring configuration.

How It Works

The operator injects a kube-resource-sync sidecar container that watches Kubernetes resources in real time and syncs changes to a shared volume as they happen. An init container ensures the data is available before the main Thanos container starts.

Implementation Details

When enabled for ThanosReceive router:

- Volume Change: The ConfigMap volume mount is replaced with an emptyDir volume
- Init Container: Ensures the initial configuration is synced before Thanos starts
- Sidecar Container: Continuously watches for configuration changes
- RBAC: Automatically creates a Role and RoleBinding for resource access
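
The sync step can be pictured as writing each new version of the watched data into the shared volume atomically, so the Thanos container never reads a half-written file. The sketch below illustrates that idea only; it does not reflect how the kube-resource-sync sidecar is actually implemented, and `sync_to_volume` and the file name used in the example are hypothetical.

```python
import os
import tempfile


def sync_to_volume(directory, filename, content):
    """Write `content` to directory/filename atomically; return True if it changed."""
    target = os.path.join(directory, filename)
    try:
        with open(target, "r", encoding="utf-8") as existing:
            if existing.read() == content:
                return False  # nothing to do, avoid touching the file
    except FileNotFoundError:
        pass
    # Write to a temp file in the same directory, then rename over the target.
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as tmp:
            tmp.write(content)
        os.replace(tmp_path, target)  # atomic within one filesystem
    except BaseException:
        os.unlink(tmp_path)
        raise
    return True


with tempfile.TemporaryDirectory() as shared:
    print(sync_to_volume(shared, "hashrings.json", "[]"))  # True: first write
    print(sync_to_volume(shared, "hashrings.json", "[]"))  # False: unchanged
```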

RBAC Configuration

apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: thanos-receive
rules:
- apiGroups: [""]
  resources: ["configmaps"]
  verbs: ["get", "list", "watch"]
  resourceNames: ["thanos-receive"]

---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: thanos-receive
subjects:
- kind: ServiceAccount
  name: thanos-receive
roleRef:
  kind: Role
  name: thanos-receive
  apiGroup: rbac.authorization.k8s.io

Custom Image Configuration

# Environment variable that overrides the kube-resource-sync sidecar image
export KUBE_RESOURCE_SYNC_IMAGE=custom-registry/kube-resource-sync:v1.0.0

Monitoring Feature Gates

The operator exposes metrics about enabled feature gates:

# Check which features are enabled
thanos_operator_feature_gates_info{feature="service-monitor"} == 1
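
To list the enabled gates outside of Prometheus, the same metric can be read from the operator's metrics output in the text exposition format. The sketch below is a minimal parser under stated assumptions: the sample text is invented for illustration (including the line with value 0), and only the metric name and the `feature` label come from the query above.

```python
import re

_LINE = re.compile(r"^thanos_operator_feature_gates_info\{([^}]*)\}\s+(\S+)")
_FEATURE = re.compile(r'feature="([^"]*)"')


def enabled_features(metrics_text):
    """Return the sorted feature names whose info metric has value 1."""
    enabled = []
    for line in metrics_text.splitlines():
        match = _LINE.match(line.strip())
        if not match:
            continue  # comments and unrelated metrics
        feature = _FEATURE.search(match.group(1))
        if feature and float(match.group(2)) == 1:
            enabled.append(feature.group(1))
    return sorted(enabled)


# Invented sample output, for illustration only.
sample = """\
# TYPE thanos_operator_feature_gates_info gauge
thanos_operator_feature_gates_info{feature="service-monitor"} 1
thanos_operator_feature_gates_info{feature="otel-sidecar"} 0
"""
print(enabled_features(sample))
```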

Volume Resize Feature

Flag: volume-resize

What It Achieves

Enables automatic PVC (PersistentVolumeClaim) resizing for Thanos component StatefulSets when storage expansion is needed. This lets you grow the storage volumes of Thanos components without resizing each PVC by hand.

How It Works

The volume resize controller watches StatefulSets managed by the Thanos Operator and resizes the associated PVCs when the storage size annotation requests a larger size. Because a StatefulSet's volumeClaimTemplates cannot be changed in place, the StatefulSet is then deleted with orphan propagation, leaving its Pods running, and recreated with the new volume size.

Resize Process

When the feature is enabled, the controller performs the following steps:

- StatefulSet Discovery: Identifies StatefulSets managed by the Thanos Operator
- Annotation Processing: Reads the operator.thanos.io/storage-size annotation from the StatefulSet
- PVC Comparison: Compares the requested size with the current PVC storage size
- Volume Expansion: If the requested size is larger, updates the PVC storage request
- StatefulSet Recreation: Orphans and deletes the StatefulSet so it can be recreated with the new volume sizes
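
The PVC comparison step boils down to parsing two Kubernetes quantities and expanding only when the requested one is strictly larger. The sketch below shows that decision under simplifying assumptions: `parse_quantity` and `needs_resize` are hypothetical names, and only the common binary and decimal storage suffixes are handled.

```python
from decimal import Decimal

# Subset of Kubernetes quantity suffixes relevant to storage sizes.
_SUFFIXES = {
    "Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40, "Pi": 2**50, "Ei": 2**60,
    "k": 10**3, "M": 10**6, "G": 10**9, "T": 10**12, "P": 10**15, "E": 10**18,
}


def parse_quantity(quantity):
    """Convert a storage quantity such as "10Gi" to a number of bytes."""
    for suffix, factor in _SUFFIXES.items():
        if quantity.endswith(suffix):
            return Decimal(quantity[: -len(suffix)]) * factor
    return Decimal(quantity)  # plain byte count


def needs_resize(requested, current):
    """True only when the requested size is strictly larger (expansion only)."""
    return parse_quantity(requested) > parse_quantity(current)


print(needs_resize("20Gi", "10Gi"))  # True
print(needs_resize("5Gi", "10Gi"))   # False: shrinking is not supported
```

Comparing parsed byte counts rather than strings matters because mixed units are legal: "1G" (10^9 bytes) is smaller than "1Gi" (2^30 bytes).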

Supported Components

The volume resize feature works with all Thanos components that use StatefulSets:

- ThanosStore: Storage gateway persistent volumes
- ThanosCompact: Compactor working directory and metadata storage
- ThanosRuler: Rule evaluation state and WAL storage
- ThanosReceive: Ingester TSDB storage

Storage Class Requirements

The underlying StorageClass must support volume expansion:

allowVolumeExpansion: true  # Required in StorageClass

Monitoring

The controller exposes metrics for monitoring resize operations:

# Total resize attempts
thanos_operator_volume_resize_attempts_total{statefulset="thanos-store"}

# Resize failures  
thanos_operator_volume_resize_failures_total{statefulset="thanos-store"}

Limitations

- Expansion Only: Volumes can only be expanded, not shrunk
- StorageClass Support: Requires allowVolumeExpansion: true in the StorageClass
