TY - GEN
T1 - Cost-effectively offering private buffers in SoCs and CMPs
AU - Fang, Zhen
AU - Zhao, Li
AU - Iyer, Ravishankar R.
AU - Fajardo, Carlos Flores
AU - Garcia, German Fabila
AU - Lee, Seung Eun
AU - Li, Bin
AU - King, Steve R.
AU - Jiang, Xiaowei
AU - Makineni, Srihari
PY - 2011
Y1 - 2011
N2 - High-performance SoCs and CMPs integrate multiple cores and hardware accelerators such as network interface devices and speech recognition engines. Cores make use of SRAM organized as a cache. Accelerators make use of SRAM as special-purpose storage such as FIFOs, scratchpad memory, or other forms of private buffers. Dedicated private buffers provide benefits such as deterministic access, but are highly area-inefficient due to the low average utilization of the total available storage. We propose Buffer-integrated-Caching (BiC), which integrates private buffers and traditional caches into a single shared SRAM block. Much like shared caches improve SRAM utilization on CMPs, the BiC architecture generalizes this advantage for a heterogeneous mix of cores and accelerators in future SoCs and CMPs. We demonstrate the cost-effectiveness of BiC using SoC-based low-power servers and CMP-based servers with an on-chip NIC. We show that with a small extra area added to the baseline cache, BiC removes the need for large, dedicated SRAMs, with minimal performance impact.
KW - accelerators
KW - cache
KW - sram
UR - https://www.scopus.com/pages/publications/79959616648
U2 - 10.1145/1995896.1995940
DO - 10.1145/1995896.1995940
M3 - Conference contribution
AN - SCOPUS:79959616648
SN - 9781450301022
T3 - Proceedings of the International Conference on Supercomputing
SP - 275
EP - 284
BT - ICS'11 - Proceedings of the 2011 ACM International Conference on Supercomputing
T2 - 25th ACM International Conference on Supercomputing, ICS 2011
Y2 - 31 May 2011 through 4 June 2011
ER -