Battling memory requirements of array programming through streaming
Research output: Contribution to journal › Conference article › Research › peer-review
Standard
Battling memory requirements of array programming through streaming. / Kristensen, Mads Ruben Burgdorff; Avery, James Emil; Blum, Troels; Lund, Simon Andreas Frimann; Vinter, Brian.
In: Lecture notes in computer science, Vol. 2016, 2016, p. 451-469.
RIS
TY - GEN
T1 - Battling memory requirements of array programming through streaming
AU - Kristensen, Mads Ruben Burgdorff
AU - Avery, James Emil
AU - Blum, Troels
AU - Lund, Simon Andreas Frimann
AU - Vinter, Brian
N1 - Conference code: 1
PY - 2016
Y1 - 2016
AB - A barrier to efficient array programming, for example in Python/NumPy, is that algorithms written as pure array operations completely without loops, while most efficient on small input, can lead to explosions in memory use. The present paper presents a solution to this problem using array streaming, implemented in the automatic parallelization high-performance framework Bohrium. This makes it possible to use array programming in Python/NumPy code directly, even when the apparent memory requirement exceeds the machine capacity, since the automatic streaming eliminates the temporary memory overhead by performing calculations in per-thread registers. Using Bohrium, we automatically fuse, JIT-compile, and execute NumPy array operations on GPGPUs without modification to the user programs. We present performance evaluations of three benchmarks, all of which show dramatic reductions in memory use from streaming, yielding corresponding improvements in speed and utilization of GPGPU-cores. The streaming-enabled Bohrium effortlessly runs programs on input sizes much beyond sizes that crash on pure NumPy due to exhausting system memory.
U2 - 10.1007/978-3-319-46079-6_32
DO - 10.1007/978-3-319-46079-6_32
M3 - Conference article
VL - 2016
SP - 451
EP - 469
JO - Lecture Notes in Computer Science
JF - Lecture Notes in Computer Science
SN - 0302-9743
T2 - 1st International Workshop on Performance Portable Programming Models for Accelerators
Y2 - 23 June 2016 through 23 June 2016
ER -
ID: 178247789