TY - GEN
T1 - ANNA: Specialized Architecture for Approximate Nearest Neighbor Search
T2 - 28th Annual IEEE International Symposium on High-Performance Computer Architecture, HPCA 2022
AU - Lee, Yejin
AU - Choi, Hyunji
AU - Min, Sunhong
AU - Lee, Hyunseung
AU - Beak, Sangwon
AU - Jeong, Dawoon
AU - Lee, Jae W.
AU - Ham, Tae Jun
N1 - Publisher Copyright:
© 2022 IEEE.
PY - 2022
Y1 - 2022
AB - Similarity search, or nearest neighbor search, is the task of retrieving the vectors in a (vector) database that are most similar to a provided query vector. It has long been a key kernel for many applications, and it has become especially important now that modern neural networks and machine learning models represent the semantics of images, videos, and documents as high-dimensional vectors called embeddings. Finding the embeddings most similar to a provided query embedding is now a critical operation for modern recommender systems and semantic search engines. Since exhaustively searching for the most similar vectors among billions is prohibitively expensive, approximate nearest neighbor search (ANNS) is often employed in real-world use cases. Unfortunately, we find that using server-class CPUs and GPUs for the ANNS task leads to suboptimal performance and energy efficiency. To address these limitations, we propose a specialized architecture named ANNA (Approximate Nearest Neighbor search Accelerator), which is compatible with state-of-the-art ANNS algorithms such as Google ScaNN and Facebook Faiss. By combining the benefits of a specialized dataflow pipeline with efficient data reuse, ANNA achieves multiple orders of magnitude higher energy efficiency, 2.3-61.6× higher throughput, and 4.3-82.1× lower latency than a conventional CPU or GPU for both million- and billion-scale datasets.
KW - Approximate Nearest Neighbor Search
KW - Hardware Accelerator
KW - Product Quantization
KW - Similarity Search
UR - http://www.scopus.com/inward/record.url?scp=85130704187&partnerID=8YFLogxK
U2 - 10.1109/HPCA53966.2022.00021
DO - 10.1109/HPCA53966.2022.00021
M3 - Conference contribution
AN - SCOPUS:85130704187
T3 - Proceedings - International Symposium on High-Performance Computer Architecture
SP - 169
EP - 183
BT - Proceedings - 2022 IEEE International Symposium on High-Performance Computer Architecture, HPCA 2022
PB - IEEE Computer Society
Y2 - 2 April 2022 through 6 April 2022
ER -