TY - GEN
T1 - FPGA Based Approximate Vector Operation Accelerator for VLMs
AU - Kim, Raehyeong
AU - Lee, Chaebin
AU - Lee, Dayoung
AU - Jeong, Yue Ri
AU - Lee, Seung Eun
N1 - Publisher Copyright:
© 2025 IEEE.
PY - 2025
Y1 - 2025
N2 - As artificial intelligence continues to advance, the importance of autonomous robotic systems and human-robot interactions is growing, particularly in the areas of language-based control and behavior prediction. Vision-language models (VLMs), which simultaneously process images and texts, are essential for interpreting commands and analyzing environments. However, their high computational demands, particularly for vector operations like softmax and layer normalization, pose significant challenges. These operations, involving complex functions such as exponentials and square roots, consume substantial resources as model sizes grow. This paper proposes a vector operation accelerator for VLMs through approximation techniques, specifically Newton-Raphson and piecewise linear approximations, to improve speed while reducing resource usage. The optimized architecture reuses resources by targeting redundant operations. Implemented on an FPGA, the accelerator achieved up to 54% faster performance compared to an RTX 3070 GPU, with minimal approximation error.
AB - As artificial intelligence continues to advance, the importance of autonomous robotic systems and human-robot interactions is growing, particularly in the areas of language-based control and behavior prediction. Vision-language models (VLMs), which simultaneously process images and texts, are essential for interpreting commands and analyzing environments. However, their high computational demands, particularly for vector operations like softmax and layer normalization, pose significant challenges. These operations, involving complex functions such as exponentials and square roots, consume substantial resources as model sizes grow. This paper proposes a vector operation accelerator for VLMs through approximation techniques, specifically Newton-Raphson and piecewise linear approximations, to improve speed while reducing resource usage. The optimized architecture reuses resources by targeting redundant operations. Implemented on an FPGA, the accelerator achieved up to 54% faster performance compared to an RTX 3070 GPU, with minimal approximation error.
KW - accelerator
KW - approximation
KW - transformer
KW - vision-language models
UR - https://www.scopus.com/pages/publications/86000022247
U2 - 10.1109/ICEIC64972.2025.10879665
DO - 10.1109/ICEIC64972.2025.10879665
M3 - Conference contribution
AN - SCOPUS:86000022247
T3 - 2025 International Conference on Electronics, Information, and Communication, ICEIC 2025
BT - 2025 International Conference on Electronics, Information, and Communication, ICEIC 2025
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2025 International Conference on Electronics, Information, and Communication, ICEIC 2025
Y2 - 19 January 2025 through 22 January 2025
ER -
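
Note: the abstract names Newton-Raphson and piecewise linear approximation for the square-root and exponential operations in layer normalization and softmax, but the record does not include the implementation. The Python sketch below is only an illustration of those two standard techniques, not the paper's FPGA architecture; all function names, the segment count, and the input range are hypothetical choices.

```python
import math

import numpy as np


def nr_rsqrt(x: float, iters: int = 2) -> float:
    """Approximate 1/sqrt(x) with Newton-Raphson iterations.

    A cheap seed is taken from the binary exponent of x; each
    iteration refines y via y <- y * (1.5 - 0.5 * x * y * y).
    """
    m, e = math.frexp(x)             # x = m * 2**e, 0.5 <= m < 1
    y = math.ldexp(1.0, -(e // 2))   # rough seed: about 2**(-e/2)
    for _ in range(iters):
        y = y * (1.5 - 0.5 * x * y * y)
    return y


# Piecewise-linear table for exp(x) on [-8, 0]; softmax inputs are
# non-positive after subtracting the row maximum. The 16-segment
# resolution is a made-up example, not the paper's configuration.
_XS = np.linspace(-8.0, 0.0, 17)
_SLOPES = np.diff(np.exp(_XS)) / np.diff(_XS)
_INTERCEPTS = np.exp(_XS[:-1]) - _SLOPES * _XS[:-1]


def pwl_exp(x: np.ndarray) -> np.ndarray:
    """Piecewise-linear approximation of exp(x) for x in [-8, 0]."""
    x = np.clip(x, -8.0, 0.0)
    idx = np.minimum(((x + 8.0) / 0.5).astype(int), 15)
    return _SLOPES[idx] * x + _INTERCEPTS[idx]


def approx_softmax(v: np.ndarray) -> np.ndarray:
    """Softmax built from the approximate exponential."""
    e = pwl_exp(v - v.max())
    return e / e.sum()


def approx_layer_norm(v: np.ndarray, eps: float = 1e-5) -> np.ndarray:
    """Layer normalization using the Newton-Raphson reciprocal sqrt."""
    mu = v.mean()
    var = ((v - mu) ** 2).mean()
    return (v - mu) * nr_rsqrt(var + eps)
```

Both approximations trade a small numerical error for hardware-friendly arithmetic: the Newton-Raphson update uses only multiplies and adds, and the piecewise-linear exponential reduces to a table lookup plus one multiply-add, which is the kind of resource reuse the abstract attributes to the accelerator.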