Battling memory requirements of array programming through streaming

Research output: Contribution to journal, Conference article, Research, peer-review

Standard

Battling memory requirements of array programming through streaming. / Kristensen, Mads Ruben Burgdorff; Avery, James Emil; Blum, Troels; Lund, Simon Andreas Frimann; Vinter, Brian.

In: Lecture notes in computer science, Vol. 2016, 2016, p. 451-469.

Harvard

Kristensen, MRB, Avery, JE, Blum, T, Lund, SAF & Vinter, B 2016, 'Battling memory requirements of array programming through streaming', Lecture notes in computer science, vol. 2016, pp. 451-469. https://doi.org/10.1007/978-3-319-46079-6_32

APA

Kristensen, M. R. B., Avery, J. E., Blum, T., Lund, S. A. F., & Vinter, B. (2016). Battling memory requirements of array programming through streaming. Lecture notes in computer science, 2016, 451-469. https://doi.org/10.1007/978-3-319-46079-6_32

Vancouver

Kristensen MRB, Avery JE, Blum T, Lund SAF, Vinter B. Battling memory requirements of array programming through streaming. Lecture notes in computer science. 2016;2016:451-469. https://doi.org/10.1007/978-3-319-46079-6_32

Author

Kristensen, Mads Ruben Burgdorff ; Avery, James Emil ; Blum, Troels ; Lund, Simon Andreas Frimann ; Vinter, Brian. / Battling memory requirements of array programming through streaming. In: Lecture notes in computer science. 2016 ; Vol. 2016. pp. 451-469.

Bibtex

@inproceedings{9cc93502ae944312b40999922b6a9d3f,
title = "Battling memory requirements of array programming through streaming",
abstract = "A barrier to efficient array programming, for example in Python/NumPy, is that algorithms written as pure array operations completely without loops, while most efficient on small input, can lead to explosions in memory use. The present paper presents a solution to this problem using array streaming, implemented in the automatic parallelization high-performance framework Bohrium. This makes it possible to use array programming in Python/NumPy code directly, even when the apparent memory requirement exceeds the machine capacity, since the automatic streaming eliminates the temporary memory overhead by performing calculations in per-thread registers. Using Bohrium, we automatically fuse, JIT-compile, and execute NumPy array operations on GPGPUs without modification to the user programs. We present performance evaluations of three benchmarks, all of which show dramatic reductions in memory use from streaming, yielding corresponding improvements in speed and utilization of GPGPU-cores. The streaming-enabled Bohrium effortlessly runs programs on input sizes much beyond sizes that crash on pure NumPy due to exhausting system memory.",
author = "Kristensen, {Mads Ruben Burgdorff} and Avery, {James Emil} and Blum, Troels and Lund, {Simon Andreas Frimann} and Vinter, Brian",
year = "2016",
doi = "10.1007/978-3-319-46079-6_32",
language = "English",
volume = "2016",
pages = "451--469",
journal = "Lecture Notes in Computer Science",
issn = "0302-9743",
publisher = "Springer",
note = "1st International Workshop on Performance Portable Programming Models for Accelerators, P$^3$MA ; Conference date: 23-06-2016 Through 23-06-2016",

}

RIS

TY - GEN

T1 - Battling memory requirements of array programming through streaming

AU - Kristensen, Mads Ruben Burgdorff

AU - Avery, James Emil

AU - Blum, Troels

AU - Lund, Simon Andreas Frimann

AU - Vinter, Brian

N1 - Conference code: 1

PY - 2016

Y1 - 2016

N2 - A barrier to efficient array programming, for example in Python/NumPy, is that algorithms written as pure array operations completely without loops, while most efficient on small input, can lead to explosions in memory use. The present paper presents a solution to this problem using array streaming, implemented in the automatic parallelization high-performance framework Bohrium. This makes it possible to use array programming in Python/NumPy code directly, even when the apparent memory requirement exceeds the machine capacity, since the automatic streaming eliminates the temporary memory overhead by performing calculations in per-thread registers. Using Bohrium, we automatically fuse, JIT-compile, and execute NumPy array operations on GPGPUs without modification to the user programs. We present performance evaluations of three benchmarks, all of which show dramatic reductions in memory use from streaming, yielding corresponding improvements in speed and utilization of GPGPU-cores. The streaming-enabled Bohrium effortlessly runs programs on input sizes much beyond sizes that crash on pure NumPy due to exhausting system memory.

AB - A barrier to efficient array programming, for example in Python/NumPy, is that algorithms written as pure array operations completely without loops, while most efficient on small input, can lead to explosions in memory use. The present paper presents a solution to this problem using array streaming, implemented in the automatic parallelization high-performance framework Bohrium. This makes it possible to use array programming in Python/NumPy code directly, even when the apparent memory requirement exceeds the machine capacity, since the automatic streaming eliminates the temporary memory overhead by performing calculations in per-thread registers. Using Bohrium, we automatically fuse, JIT-compile, and execute NumPy array operations on GPGPUs without modification to the user programs. We present performance evaluations of three benchmarks, all of which show dramatic reductions in memory use from streaming, yielding corresponding improvements in speed and utilization of GPGPU-cores. The streaming-enabled Bohrium effortlessly runs programs on input sizes much beyond sizes that crash on pure NumPy due to exhausting system memory.

U2 - 10.1007/978-3-319-46079-6_32

DO - 10.1007/978-3-319-46079-6_32

M3 - Conference article

VL - 2016

SP - 451

EP - 469

JO - Lecture Notes in Computer Science

JF - Lecture Notes in Computer Science

SN - 0302-9743

T2 - 1st International Workshop on Performance Portable Programming Models for Accelerators

Y2 - 23 June 2016 through 23 June 2016

ER -

ID: 178247789