Class FeedbackAdaptiveRateLimiter

java.lang.Object
  org.apache.hadoop.hbase.quotas.RateLimiter
    org.apache.hadoop.hbase.quotas.FeedbackAdaptiveRateLimiter

@Private @Evolving public class FeedbackAdaptiveRateLimiter extends RateLimiter
An adaptive rate limiter that dynamically adjusts its behavior based on observed usage patterns to achieve stable, full utilization of configured quota allowances while managing client contention.

Core Algorithm: This rate limiter divides time into fixed refill intervals (configurable via hbase.quota.rate.limiter.refill.interval.ms; the default is one refill per TimeUnit of the RateLimiter). At the beginning of each interval, a fresh allocation of resources becomes available based on the configured limit. Clients consume resources as they make requests. When resources are exhausted, clients must wait for the next refill, or until enough resources become available again.
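
The Java sketch below illustrates the interval-refill mechanics described above. It is not the actual implementation of this class; the class name, field names (refillIntervalMs, available, limit), and structure are assumptions chosen for clarity.

// Illustrative sketch of interval-based refill, not the actual HBase implementation.
class IntervalRefillSketch {
  private final long refillIntervalMs; // e.g. hbase.quota.rate.limiter.refill.interval.ms
  private final long limit;            // resources granted per interval
  private long available;              // resources left in the current interval
  private long nextRefillTimeMs;

  IntervalRefillSketch(long limit, long refillIntervalMs) {
    this.limit = limit;
    this.refillIntervalMs = refillIntervalMs;
    this.available = limit;
    this.nextRefillTimeMs = System.currentTimeMillis() + refillIntervalMs;
  }

  synchronized boolean tryConsume(long amount, long nowMs) {
    if (nowMs >= nextRefillTimeMs) {
      // New interval: a fresh allocation becomes available.
      available = limit;
      nextRefillTimeMs = nowMs + refillIntervalMs;
    }
    if (amount <= available) {
      available -= amount;
      return true;
    }
    return false; // caller must wait for the next refill
  }
}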

Adaptive Backpressure: When multiple threads compete for limited resources (contention), this limiter detects the contention and applies increasing backpressure by extending wait intervals. This prevents thundering herd behavior where many threads wake simultaneously and compete for the same resources. The backoff multiplier increases by a small increment (see FEEDBACK_ADAPTIVE_BACKOFF_MULTIPLIER_INCREMENT) per interval when contention occurs, and decreases (see FEEDBACK_ADAPTIVE_BACKOFF_MULTIPLIER_DECREMENT) when no contention is detected, converging toward optimal throughput. The multiplier is capped at a maximum value (see FEEDBACK_ADAPTIVE_MAX_BACKOFF_MULTIPLIER) to prevent unbounded waits.
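
A minimal sketch of the backoff-multiplier feedback loop, assuming the per-interval adjustment described above. The increment, decrement, and cap values here are illustrative placeholders, not the actual defaults behind the FEEDBACK_ADAPTIVE_* settings.

// Hypothetical sketch of the backoff-multiplier feedback loop.
class BackoffMultiplierSketch {
  private static final double INCREMENT = 0.05;      // illustrative value
  private static final double DECREMENT = 0.05;      // illustrative value
  private static final double MAX_MULTIPLIER = 4.0;  // illustrative cap
  private double backoffMultiplier = 1.0;

  // Called once per refill interval with whether contention was observed.
  void onIntervalEnd(boolean contentionDetected) {
    if (contentionDetected) {
      backoffMultiplier = Math.min(MAX_MULTIPLIER, backoffMultiplier + INCREMENT);
    } else {
      backoffMultiplier = Math.max(1.0, backoffMultiplier - DECREMENT);
    }
  }

  // Scales a base wait so that contending threads spread out their retries.
  long scaleWait(long baseWaitMs) {
    return (long) (baseWaitMs * backoffMultiplier);
  }
}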

Contention is detected when getWaitInterval(long, long, long) is called with insufficient available resources (i.e., amount > available), indicating a thread needs to wait for resources. If this occurs more than once in a refill interval, the limiter identifies it as contention requiring increased backpressure.
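
The contention signal can be pictured as follows. This is an illustrative sketch, not the class's actual getWaitInterval(long, long, long) implementation; only the amount > available check and the "more than once per interval" rule come from the description above, and the field and method names are assumptions.

// Sketch of contention detection: more than one "must wait" call within a
// single refill interval is treated as contention.
class ContentionDetectorSketch {
  private final long refillIntervalMs;
  private int waitsThisInterval = 0;

  ContentionDetectorSketch(long refillIntervalMs) {
    this.refillIntervalMs = refillIntervalMs;
  }

  // Mirrors the decision point in getWaitInterval(limit, available, amount).
  synchronized long getWaitInterval(long limit, long available, long amount) {
    if (amount <= available) {
      return 0; // enough resources are available; no wait required
    }
    waitsThisInterval++;     // a thread had to wait: potential contention
    return refillIntervalMs; // wait roughly until the next refill
  }

  // Called when a new refill interval begins.
  synchronized boolean resetAndCheckContention() {
    boolean contention = waitsThisInterval > 1;
    waitsThisInterval = 0;
    return contention;
  }
}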

Oversubscription for Full Utilization: In practice, synchronization overhead and timing variations often prevent clients from consuming exactly their full allowance, resulting in consistent under-utilization. This limiter addresses this by tracking utilization via an exponentially weighted moving average (EWMA). When average utilization falls below the target range (determined by FEEDBACK_ADAPTIVE_UTILIZATION_ERROR_BUDGET), the limiter gradually increases the oversubscription proportion (see FEEDBACK_ADAPTIVE_OVERSUBSCRIPTION_INCREMENT), allowing more resources per interval than the base limit. Conversely, when utilization exceeds the target range, oversubscription is decreased (see FEEDBACK_ADAPTIVE_OVERSUBSCRIPTION_DECREMENT). Oversubscription is capped (see FEEDBACK_ADAPTIVE_MAX_OVERSUBSCRIPTION) to prevent excessive bursts while still enabling consistent full utilization.
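
Below is a sketch of the utilization EWMA and the oversubscription adjustment, assuming utilization is measured as the fraction of the base allowance consumed per interval. The smoothing factor, error budget, step sizes, and cap are illustrative stand-ins for the FEEDBACK_ADAPTIVE_* settings, not their real defaults.

// Sketch of the utilization EWMA and oversubscription feedback loop.
class OversubscriptionSketch {
  private static final double ALPHA = 0.2;          // EWMA smoothing factor (assumed)
  private static final double ERROR_BUDGET = 0.05;  // tolerated deviation from full utilization (assumed)
  private static final double OVERSUB_INCREMENT = 0.01;
  private static final double OVERSUB_DECREMENT = 0.01;
  private static final double MAX_OVERSUBSCRIPTION = 0.5;

  private double utilizationEwma = 1.0;  // fraction of the base allowance consumed
  private double oversubscription = 0.0;

  // Called at the end of each refill interval with the observed utilization.
  void onIntervalEnd(double observedUtilization) {
    utilizationEwma = ALPHA * observedUtilization + (1 - ALPHA) * utilizationEwma;
    if (utilizationEwma < 1.0 - ERROR_BUDGET) {
      // Consistently under target: hand out a little more per interval.
      oversubscription = Math.min(MAX_OVERSUBSCRIPTION, oversubscription + OVERSUB_INCREMENT);
    } else if (utilizationEwma > 1.0 + ERROR_BUDGET) {
      // Above the target range (exact bounds assumed here): pull back.
      oversubscription = Math.max(0.0, oversubscription - OVERSUB_DECREMENT);
    }
  }

  long effectiveLimit(long baseLimit) {
    return Math.round(baseLimit * (1.0 + oversubscription));
  }
}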

Example Scenario: Consider a quota of 1000 requests per second with a 1-second refill interval. Without oversubscription, clients might typically achieve only 950 req/s due to coordination delays. This limiter would detect the under-utilization and gradually increase oversubscription, allowing slightly more resources per interval; this compensates for the inefficiency and yields stable throughput closer to the configured quota. If multiple threads simultaneously try to consume resources and repeatedly wait, the backoff multiplier increases their wait times, spreading out their retry attempts and reducing wasted CPU cycles.
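
The arithmetic behind this scenario can be made concrete with a small, self-contained example. The 5% shortfall, 2% error budget, 1% increment, and ten-interval horizon are assumptions chosen purely for illustration.

// Worked numbers for the scenario above; all constants are illustrative.
public class ScenarioExample {
  public static void main(String[] args) {
    long baseLimit = 1000;            // configured quota: 1000 req/s
    double utilizationEwma = 0.95;    // clients only reach ~950 req/s
    double errorBudget = 0.02;        // assumed tolerance below full utilization
    double increment = 0.01;          // assumed oversubscription step
    double maxOversubscription = 0.5; // assumed cap
    double oversubscription = 0.0;

    // Each interval of under-utilization nudges oversubscription upward.
    for (int interval = 0; interval < 10; interval++) {
      if (utilizationEwma < 1.0 - errorBudget) {
        oversubscription = Math.min(maxOversubscription, oversubscription + increment);
      }
    }
    long effectiveLimit = Math.round(baseLimit * (1.0 + oversubscription));
    // ~1100 resources per interval are now available, so realized throughput
    // can approach the configured 1000 req/s despite coordination overhead.
    System.out.println("Effective limit after 10 intervals: " + effectiveLimit);
  }
}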

Configuration Parameters: The behavior described above is tuned via FEEDBACK_ADAPTIVE_BACKOFF_MULTIPLIER_INCREMENT, FEEDBACK_ADAPTIVE_BACKOFF_MULTIPLIER_DECREMENT, FEEDBACK_ADAPTIVE_MAX_BACKOFF_MULTIPLIER, FEEDBACK_ADAPTIVE_UTILIZATION_ERROR_BUDGET, FEEDBACK_ADAPTIVE_OVERSUBSCRIPTION_INCREMENT, FEEDBACK_ADAPTIVE_OVERSUBSCRIPTION_DECREMENT, and FEEDBACK_ADAPTIVE_MAX_OVERSUBSCRIPTION, while the refill interval is set with hbase.quota.rate.limiter.refill.interval.ms.
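
As a hedged example, the refill interval could be tuned through the standard HBase configuration as sketched below. Only the hbase.quota.rate.limiter.refill.interval.ms key is taken from this documentation; the 1000 ms value is an example, and the FEEDBACK_ADAPTIVE_* names above refer to class constants rather than the key shown here.

// Illustrative configuration snippet; the value used is an example only.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;

public class QuotaTuningExample {
  public static void main(String[] args) {
    Configuration conf = HBaseConfiguration.create();
    // Refill once per second (example value; the default is one refill per
    // TimeUnit of the RateLimiter, per the description above).
    conf.setLong("hbase.quota.rate.limiter.refill.interval.ms", 1000L);
    System.out.println(conf.get("hbase.quota.rate.limiter.refill.interval.ms"));
  }
}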

This algorithm converges toward stable operation where: (1) wait intervals are just long enough to prevent excessive contention, and (2) oversubscription is just high enough to achieve consistent full utilization of the configured allowance.