Feature Gates
The Thanos Operator provides several experimental features controlled by feature gates. These features are designed to extend the operator’s capabilities while maintaining backward compatibility for the APIs. Features can be enabled individually using the --enable-feature command-line flag.
Overview
Feature gates allow you to:
- Enable experimental functionality before it becomes stable
- Test new features in development environments
- Incrementally adopt new capabilities as they mature
- Maintain backward compatibility by keeping new features disabled by default
Available Features
| Feature | Flag | Status | Description |
|---|---|---|---|
| ServiceMonitor | service-monitor |
Experimental | Automatic ServiceMonitor creation for Prometheus scraping |
| PrometheusRule | prometheus-rule |
Experimental | Automatic discovery and mounting of PrometheusRule objects |
| OpenTelemetry Sidecar | otel-sidecar |
Experimental | OpenTelemetry collector sidecar injection for distributed tracing |
| KubeResourceSync | kube-resource-sync |
Experimental | Immediate ConfigMap/Secret synchronization via sidecar |
| Volume Resize | volume-resize |
Experimental | Automatic PVC resizing for Thanos component storage expansion |
Enabling Features
Command Line
Enable features using the --enable-feature flag when starting the operator:
# Enable a single feature
./thanos-operator --enable-feature service-monitor
# Enable multiple features
./thanos-operator \
--enable-feature service-monitor \
--enable-feature prometheus-rule \
--enable-feature otel-sidecar \
--enable-feature volume-resizeServiceMonitor Feature
Flag: service-monitor
What It Achieves
Automatically creates and manages ServiceMonitor resources for Thanos components, eliminating the need to manually define ServiceMonitors for scraping the resources deployed by the operator. This feature integrates seamlessly with the Prometheus Operator ecosystem.
How It Works
When enabled, the operator automatically creates ServiceMonitor resources alongside each Thanos component. These ServiceMonitors are configured with appropriate labels, selectors, and endpoints to enable Prometheus discovery and scraping.
Prerequisites
- Prometheus Operator must be installed in the cluster
- Prometheus instance must be configured to discover ServiceMonitors with appropriate selectors
PrometheusRule Feature
Flag: prometheus-rule
What It Achieves
Enables ThanosRuler to automatically discover and mount PrometheusRule resources as configuration. This allows you to define alerting and recording rules as Kubernetes custom resources rather than manually managing ConfigMaps.
How It Works
The operator watches for PrometheusRule resources that match the configured label selector, converts them into ConfigMaps, and mounts them into ThanosRuler pods. This provides automatic rule discovery and lifecycle management.
Configuration
Configure PrometheusRule discovery in your ThanosRuler spec:
apiVersion: monitoring.thanos.io/v1alpha1
kind: ThanosRuler
metadata:
name: thanos-ruler
spec:
ruleConfigSelector:
matchLabels:
prometheus: main
role: alert-rules
# Optional: Enable multi-tenancy
ruleTenancyConfig:
# Label on PrometheusRule that contains tenant value (default: "operator.thanos.io/tenant")
tenantSpecifierLabel: tenant
# Label injected into rule groups and PromQL expressions (default: "tenant_id")
enforcedTenantIdentifier: tenant_idPrometheusRule Example
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
name: thanos-alerts
labels:
prometheus: main
role: alert-rules
tenant: platform
spec:
groups:
- name: thanos.rules
rules:
- alert: ThanosQueryInstanceDown
expr: up{job="thanos-query"} == 0
for: 5m
labels:
severity: warning
annotations:
summary: "Thanos Query instance is down"PrometheusRule Discovery
The operator discovers PrometheusRule resources using the ruleConfigSelector you define. This is a standard Kubernetes label selector that determines which PrometheusRules the ThanosRuler should process.
Discovery process:
- You define
ruleConfigSelectorwith your desired labels - PrometheusRules matching the combined selector are discovered and processed
Multi-Tenancy Processing
When ruleTenancyConfig is configured, the operator performs tenant injection:
-
Tenant Discovery:
- Looks for
tenantSpecifierLabel(default:operator.thanos.io/tenant) on PrometheusRule metadata - Uses the label value as the tenant identifier
- Looks for
-
Rule Group Processing:
- Adds
enforcedTenantIdentifierlabel to each rule group - Example:
tenant_id: platformadded to group labels
- Adds
-
PromQL Expression Enforcement:
- Injects tenant label into PromQL expressions using
enforceTenantLabelInPromQL - Original:
up{job="thanos-query"} == 0 - Modified:
up{job="thanos-query", tenant_id="platform"} == 0
- Injects tenant label into PromQL expressions using
Tenancy Example:
Input PrometheusRule:
metadata:
labels:
tenant: platform # tenantSpecifierLabel value
spec:
groups:
- name: thanos.rules
rules:
- alert: ThanosQueryInstanceDown
expr: up{job="thanos-query"} == 0Generated ConfigMap content:
groups:
- name: thanos.rules
labels:
tenant_id: platform # enforcedTenantIdentifier added
rules:
- alert: ThanosQueryInstanceDown
expr: up{job="thanos-query", tenant_id="platform"} == 0 # PromQL modifiedRule Processing
The operator continuously watches and processes PrometheusRule resources:
- Discovery: Uses the
ruleConfigSelectorto find matching PrometheusRules in the same namespace - Tenancy Processing: When
ruleTenancyConfigis enabled, applies tenant label injection to rule groups and PromQL expressions - Conversion: Transforms PrometheusRule specs into Prometheus rule file format
- ConfigMap Generation: Creates ConfigMaps named
{ruler-name}-promrule-{index}containing the rule files - Mounting: ThanosRuler pods automatically mount these ConfigMaps as rule files
- Precedence: PrometheusRule-derived ConfigMaps override any conflicting user-created ConfigMaps
- Metrics: Exposes discovery and processing metrics including per-tenant rule counts
Use Cases
- GitOps rule management: Store rules in version control as PrometheusRule resources
- Multi-tenant alerting: Separate rules per tenant with automatic label injection
- Dynamic rule updates: Rules update automatically when PrometheusRule resources change
OpenTelemetry Sidecar Feature
Flag: otel-sidecar
What It Achieves
Enables automatic injection of OpenTelemetry collector sidecars into Thanos component pods, providing distributed tracing capabilities across the entire Thanos stack without manual configuration.
How It Works
When enabled, the operator adds the sidecar.opentelemetry.io/inject: "true" annotation to Thanos pods and configures Thanos components with OTLP tracing endpoints. The OpenTelemetry Operator handles the actual sidecar injection.
Automatic Configuration
The operator automatically adds tracing configuration to Thanos components:
# Automatically added tracing config
--tracing.config=type: OTLP
config:
client_type: http
endpoint: localhost:4318
insecure: truePrerequisites
- OpenTelemetry Operator must be installed in the cluster
- OpenTelemetryCollector resource must be configured for sidecar injection
KubeResourceSync Feature
Flag: kube-resource-sync
What It Achieves
Provides immediate ConfigMap and Secret synchronization via a specialized sidecar container, eliminating kubelet sync delays (typically 60+ seconds) for critical configuration updates. Currently implemented for ThanosReceive hashring configuration.
How It Works
The operator injects a kube-resource-sync sidecar container that watches Kubernetes resources in real-time and immediately syncs changes to a shared volume. An init container ensures data is available before the main Thanos container starts.
Implementation Details
When enabled for ThanosReceive router:
- Volume Change: ConfigMap volume mount is replaced with EmptyDir
- Init Container: Ensures initial configuration is synced before Thanos starts
- Sidecar Container: Continuously watches for configuration changes
- RBAC: Automatically creates Role and RoleBinding for resource access
RBAC Configuration
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
name: thanos-receive
rules:
- apiGroups: [""]
resources: ["configmaps"]
verbs: ["get", "list", "watch"]
resourceNames: ["thanos-receive"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
name: thanos-receive
subjects:
- kind: ServiceAccount
name: thanos-receive
roleRef:
kind: Role
name: thanos-receive
apiGroup: rbac.authorization.k8s.ioCustom Image Configuration
# Environment variable
export KUBE_RESOURCE_SYNC_IMAGE=custom-registry/kube-resource-sync:v1.0.0Monitoring Feature Gates
The operator exposes metrics about enabled feature gates:
# Check which features are enabled
thanos_operator_feature_gates_info{feature="service-monitor"} == 1Volume Resize Feature
Flag: volume-resize
What It Achieves
Enables automatic PVC (PersistentVolumeClaim) resizing for Thanos component StatefulSets when storage expansion is needed. This feature allows you to resize storage volumes without manual intervention, providing seamless storage scaling for Thanos components.
How It Works
The volume resize controller watches for StatefulSets managed by the Thanos Operator and automatically resizes associated PVCs when a storage size annotation indicates a larger size is needed. After successful resize, the StatefulSet is orphaned and recreated to pick up the new volume sizes.
Resize Process
When the feature is enabled, the controller performs the following steps:
- StatefulSet Discovery: Identifies StatefulSets managed by the Thanos Operator
- Annotation Processing: Reads the
operator.thanos.io/storage-sizeannotation from the StatefulSet - PVC Comparison: Compares the requested size with current PVC storage size
- Volume Expansion: If requested size is larger, updates the PVC storage request
- StatefulSet Recreation: Orphans and deletes the StatefulSet so it can be recreated with new volume sizes
Supported Components
The volume resize feature works with all Thanos components that use StatefulSets:
- ThanosStore: Storage gateway persistent volumes
- ThanosCompact: Compactor working directory and metadata storage
- ThanosRuler: Rule evaluation state and WAL storage
- ThanosReceive: Ingester TSDB storage
Storage Class Requirements
The underlying StorageClass must support volume expansion:
allowVolumeExpansion: true # Required in StorageClassMonitoring
The controller exposes metrics for monitoring resize operations:
# Total resize attempts
thanos_operator_volume_resize_attempts_total{statefulset="thanos-store"}
# Resize failures
thanos_operator_volume_resize_failures_total{statefulset="thanos-store"}Limitations
- Expansion Only: Volumes can only be expanded, not shrunk
- StorageClass Support: Requires
allowVolumeExpansion: truein the StorageClass
See Also
- Prometheus Operator Documentation
- OpenTelemetry Documentation
- kube-resource-sync Project
- Thanos Documentation