Analytical characterization and efficient simulation of batched arrivals in the Kafka broker

Abstract: Apache Kafka is a key component in event-driven and microservice architectures that rely on distributed publish-subscribe messaging for scalable and fault-tolerant streaming of real-time data. To reduce distribution overhead, messages are buffered and dispatched to the broker when either a maximum batch size N is reached or a timeout T expires, which allows control over the trade-off between high throughput and low latency. However, this trade-off has been explored only through empirical studies, which are tied to specific system deployments and are not suited to runtime adaptation under variable workload conditions. We provide an analytical characterization of the arrival process induced by the Kafka batching policy under Poisson arrivals. The analysis builds on the observation that the time for buffering a full batch and the size of a batch dispatched at expiration of the timeout follow truncated Erlang and Poisson distributions, respectively. Leveraging this insight, we derive closed forms for quantities that characterize the arrival process, and we propose a method for efficient simulation of the process embedded at dispatching times. To support practical implementation, we evaluate solutions for drawing samples from an Erlang distribution provided by NumPy, PyTorch, and R, and we also propose a novel approach based on rejection sampling with a proposal function in the family of Kumaraswamy distributions, whose parameters are automatically optimized with respect to the number of phases. Numerical experimentation shows that: (i) the aggregated simulation enabled by the analytical formulation is insensitive to the value of the batch size and clearly outperforms fine-grained simulation of individual message arrivals; (ii) the best efficiency is obtained with the NumPy implementation of Erlang sampling, while the novel Kumaraswamy-based approach shows promising results, comparable to PyTorch and better than R.
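To make the batch-level view concrete, the following is a minimal sketch of simulating one record per dispatched batch instead of one per message. It is not the authors' implementation. It assumes Poisson arrivals with rate `lam` and, as a simplification not stated in the abstract, that the timeout clock restarts at each dispatch, so empty batches can occur. The function names `simulate_batches` and `prob_full_batch` are hypothetical. The time to collect N messages is drawn as Erlang(N, lam) through NumPy's gamma sampler. If it does not exceed T, the batch is full. Otherwise the batch is dispatched at T with a Poisson(lam*T) size conditioned on being below N, drawn here by plain rejection.

```python
import math
import numpy as np


def simulate_batches(lam, N, T, n_batches, rng=None):
    """Sample (duration, size) for each dispatched batch.

    Assumption (not from the abstract): the timeout clock restarts
    at every dispatch, so a batch may be empty.
    """
    rng = np.random.default_rng() if rng is None else rng
    # Erlang(N, lam) is Gamma(shape=N, scale=1/lam): time to buffer N messages.
    fill = rng.gamma(shape=N, scale=1.0 / lam, size=n_batches)
    full = fill <= T
    # Full batches are dispatched at their fill time, the others at T.
    durations = np.where(full, fill, T)
    sizes = np.full(n_batches, N, dtype=np.int64)
    # Timed-out batches: Poisson(lam*T) truncated to {0, ..., N-1},
    # sampled by rejecting draws that are >= N.
    pending = np.flatnonzero(~full)
    while pending.size:
        draw = rng.poisson(lam * T, size=pending.size)
        ok = draw < N
        sizes[pending[ok]] = draw[ok]
        pending = pending[~ok]
    return durations, sizes


def prob_full_batch(lam, N, T):
    """P(Erlang(N, lam) <= T) = 1 - P(Poisson(lam*T) < N)."""
    mu = lam * T
    return 1.0 - sum(math.exp(-mu) * mu**k / math.factorial(k) for k in range(N))
```

Under these assumptions the cost per batch does not grow with N, because a single gamma draw replaces N exponential inter-arrival draws. The empirical fraction of full batches can be compared with `prob_full_batch` as a sanity check.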

Proceedings of MASCOTS, pp. -, IEEE Computer Society, 2025

Stochastic Processes, Performance Models, Applications


