Abstract
Designing a high-performance hardware sorter for resource-constrained systems is challenging due to physical limitations and the need to balance streaming bandwidth with memory throughput. This brief introduces a novel, scalable hardware sorter architecture with fully-streaming support and an accompanying RTL generator to provide versatile, energy-efficient hardware acceleration. Our solution employs a dual-layer architecture consisting of a parallel one-way linear insertion sorter (OLIS) for bandwidth optimization and a cyclic bitonic merge network (CBMN) for a compact, high-throughput implementation. Furthermore, we developed an RTL generator written in Chisel to enable agile implementation of the scalable architecture. Experimental results targeting the Xilinx XVU37P-FSVH2892-2L-E FPGA show that, compared to a state-of-the-art streaming sorter, our design increases throughput by 126.26% and reduces latency by 68.46%, at the cost of an increase of no more than 132.94% in LUTs and with a 79.84% reduction in flip-flops. The source code is available at https://github.com/hyun-woo-oh/DL-Sort-Generator.
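As a rough illustration of the two-layer idea described above, the following Python sketch models it in software: short runs are sorted by insertion, then pairs of runs are combined with a bitonic merge. It is a behavioral sketch under stated assumptions, not the authors' OLIS or CBMN design and not the generated RTL. The function names are hypothetical, the "one-way" and "cyclic" hardware aspects are not modeled, and the run length and the number of runs are both assumed to be powers of two.

```python
from typing import List


def insertion_sort_run(stream: List[int]) -> List[int]:
    """Model a linear insertion sorter: each arriving element is placed
    into an already-sorted array of cells (ascending order)."""
    cells: List[int] = []
    for value in stream:
        pos = 0
        # Find the first cell holding a larger value. Hardware would
        # compare against all cells in parallel; this loop is sequential.
        while pos < len(cells) and cells[pos] <= value:
            pos += 1
        cells.insert(pos, value)
    return cells


def bitonic_merge(seq: List[int]) -> List[int]:
    """Sort a bitonic sequence whose length is a power of two (ascending)."""
    n = len(seq)
    if n <= 1:
        return list(seq)
    half = n // 2
    # Half-cleaner stage: compare-exchange elements that are `half` apart.
    lo = [min(seq[i], seq[i + half]) for i in range(half)]
    hi = [max(seq[i], seq[i + half]) for i in range(half)]
    return bitonic_merge(lo) + bitonic_merge(hi)


def merge_sorted_runs(a: List[int], b: List[int]) -> List[int]:
    """Merge two ascending runs of equal power-of-two length."""
    # An ascending run followed by a descending run is bitonic.
    return bitonic_merge(a + b[::-1])


def two_layer_sort(data: List[int], run_len: int) -> List[int]:
    """Sort non-empty data whose length is run_len * 2**k (assumption)."""
    # Layer 1: insertion-sort fixed-length runs.
    runs = [insertion_sort_run(data[i:i + run_len])
            for i in range(0, len(data), run_len)]
    # Layer 2: repeatedly merge neighbouring runs until one remains.
    while len(runs) > 1:
        runs = [merge_sorted_runs(runs[i], runs[i + 1])
                for i in range(0, len(runs), 2)]
    return runs[0]
```

For example, `two_layer_sort([7, 3, 9, 1, 8, 2, 6, 4], run_len=2)` first forms four sorted runs of two elements and then needs two merge passes. In this model, a longer run length shifts work toward the insertion layer, and a shorter one shifts it toward the merge layer.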
| Original language | English |
|---|---|
| Pages (from-to) | 2549-2553 |
| Number of pages | 5 |
| Journal | IEEE Transactions on Circuits and Systems II: Express Briefs |
| Volume | 71 |
| Issue number | 5 |
| State | Published - 1 May 2024 |
UN SDGs
This output contributes to the following UN Sustainable Development Goals (SDGs)
- SDG 7 Affordable and Clean Energy
Keywords
- bitonic sort
- energy-efficient computing
- hardware acceleration
- scalable architecture
- sorting network
Fingerprint
Dive into the research topics of 'DL-Sort: A Hybrid Approach to Scalable Hardware-Accelerated Fully-Streaming Sorting'. Together they form a unique fingerprint.