A study on inference latency for vision transformers on mobile devices
Z. Li, M. Paolieri, L. Golubchik
Abstract: Given the significant advances in machine learning techniques on mobile devices, particularly in the domain of computer vision, in this work we quantitatively study the performance characteristics of 190 real-world vision transformers (ViTs) on mobile devices. Through a comparison with 102 real-world convolutional neural networks (CNNs), we provide insights into the factors that influence the latency of ViT architectures on mobile devices. Based on these insights, we develop a dataset including measured latencies of 1000 synthetic ViTs with representative building blocks and state-of-the-art architectures from two machine learning frameworks and six mobile platforms. Using this dataset, we show that inference latency of new ViTs can be predicted with sufficient accuracy for real-world applications.
Proceedings of VALUETOOLS, pp. -, Springer, 2024 2024copy bib | save bib | save pdf | go to publisher
🏠 Home