TY - JOUR
T1 - Understanding the performance characteristics of computational storage drives
T2 - A case study with SmartSSD
AU - Kim, Hwajung
AU - Yeom, Heon Y.
AU - Sung, Hanul
N1 - Publisher Copyright:
© 2021 by the authors. Licensee MDPI, Basel, Switzerland.
PY - 2021/11/1
Y1 - 2021/11/1
AB - Emerging computational storage drives (CSDs) provide new opportunities by moving data computation closer to the storage. Performing computation within storage drives enables data pre/post-processing without expensive data transfers. Moreover, large amounts of data can be processed in parallel thanks to the field-programmable gate array (FPGA) included in CSDs. In a CSD, there are several implementation techniques that support parallel processing, each of which provides a different degree of parallelism. However, without a sufficient understanding of the CSD’s parallel processing techniques, misusing them can add overhead instead of delivering the benefits of task offloading. Thus, to get the best performance from CSDs, it is important to properly adjust the degree of parallelism of each implementation technique. In this paper, we study how CSD performance differs across various combinations of parallel processing techniques. To investigate the performance differences, we implement a data verification algorithm, offload it to the CSD, and analyze the performance and resource utilization. The experimental results show that implementing the data verification algorithm with a sufficient understanding of the CSD’s parallel processing techniques can improve performance by up to 20 times. Moreover, even with the same degree of parallelism, performance can differ by 59% depending on the combination of implementation techniques. These results imply that proper orchestration of different implementation techniques leads to better performance and more efficient resource utilization.
KW - Computational storage drives
KW - Offloading
KW - Parallelization
UR - https://www.scopus.com/pages/publications/85117918356
U2 - 10.3390/electronics10212617
DO - 10.3390/electronics10212617
M3 - Article
AN - SCOPUS:85117918356
SN - 2079-9292
VL - 10
JO - Electronics (Switzerland)
JF - Electronics (Switzerland)
IS - 21
M1 - 2617
ER -