Abstract
A parallel bi-conjugate gradient algorithm was developed in CUDA for solving the incompressible Navier-Stokes equations. The governing equations were discretized using a splitting P2P1 finite element method. An asymmetric stenotic flow problem was solved to validate the proposed algorithm, and the parallel performance of the GPU was then assessed by measuring elapsed times. GPU performance for sparse matrix-vector multiplication was also investigated using a matrix from a fluid-structure interaction problem. A kernel was written to compute the inner products of all rows of the sparse matrix with a vector simultaneously, and it was optimized using both parallel reduction and memory coalescing. The effect of the warp on the parallel performance of the CUDA implementation was also examined during kernel construction. In double precision, the GPU computation was more than 7 times faster than a single CPU.
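The kernel structure described in the abstract can be illustrated with a small CPU-side sketch. This is not the authors' CUDA code. It assumes a CSR storage format and a one-warp-per-row layout, neither of which is stated in the abstract, and all function names (`warp_row_dot`, `csr_matvec`, `csr_rmatvec`, `bicg`) are hypothetical. Each row's inner product is accumulated by 32 emulated lanes reading with stride 32 (the access pattern that coalesces on a GPU), followed by a tree reduction. The result is used inside a plain, unpreconditioned bi-conjugate gradient loop.

```python
# CPU sketch only: emulates a warp-per-row CSR SpMV and uses it inside BiCG.
# Assumptions (not from the paper): CSR storage, one warp per row, no preconditioner.
WARP = 32  # CUDA warp size; the lanes of a warp are emulated by a Python list


def warp_row_dot(vals, cols, x, start, end):
    """Inner product of one CSR row with x, shaped like a warp-per-row kernel."""
    lanes = [0.0] * WARP
    # Lane l reads entries start+l, start+l+WARP, ...: neighbouring lanes touch
    # neighbouring entries of vals/cols, which is the coalesced access pattern.
    for lane in range(WARP):
        for k in range(start + lane, end, WARP):
            lanes[lane] += vals[k] * x[cols[k]]
    # Tree (parallel) reduction: the number of active lanes halves each step.
    offset = WARP // 2
    while offset > 0:
        for lane in range(offset):
            lanes[lane] += lanes[lane + offset]
        offset //= 2
    return lanes[0]


def csr_matvec(ptr, cols, vals, x):
    """y = A x; on a GPU all rows would be processed at once, one warp each."""
    return [warp_row_dot(vals, cols, x, ptr[i], ptr[i + 1])
            for i in range(len(ptr) - 1)]


def csr_rmatvec(ptr, cols, vals, x, ncols):
    """y = A^T x, which BiCG needs to update the shadow residual."""
    y = [0.0] * ncols
    for i in range(len(ptr) - 1):
        for k in range(ptr[i], ptr[i + 1]):
            y[cols[k]] += vals[k] * x[i]
    return y


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def bicg(ptr, cols, vals, b, tol=1e-10, maxiter=100):
    """Unpreconditioned BiCG for a square CSR matrix, starting from x = 0."""
    n = len(b)
    x = [0.0] * n
    bnorm = dot(b, b) ** 0.5
    if bnorm == 0.0:
        return x
    r = list(b)                 # r0 = b - A x0 with x0 = 0
    rt = list(r)                # shadow residual, chosen equal to r0
    p, pt = list(r), list(rt)   # search directions
    rho = dot(rt, r)
    for _ in range(maxiter):
        q = csr_matvec(ptr, cols, vals, p)
        qt = csr_rmatvec(ptr, cols, vals, pt, n)
        alpha = rho / dot(pt, q)  # no breakdown safeguard in this sketch
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * qi for ri, qi in zip(r, q)]
        rt = [ri - alpha * qi for ri, qi in zip(rt, qt)]
        if dot(r, r) ** 0.5 <= tol * bnorm:
            break
        rho_new = dot(rt, r)
        beta = rho_new / rho
        p = [ri + beta * pi for ri, pi in zip(r, p)]
        pt = [ri + beta * pi for ri, pi in zip(rt, pt)]
        rho = rho_new
    return x
```

In a real CUDA kernel the lane loop and the reduction would run concurrently across the threads of a warp, so the Python version conveys only the data access order and the reduction tree, not the timing behaviour the paper measures.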
| Original language | English |
|---|---|
| Pages (from-to) | 597-604 |
| Number of pages | 8 |
| Journal | Transactions of the Korean Society of Mechanical Engineers, B |
| Volume | 40 |
| Issue number | 9 |
| State | Published - Sep 2016 |
Keywords
- Bi-conjugate gradient method
- CFD
- Finite element method
- GPU