Compilers for Heterogeneous Supercomputing Now Available
PGI 17.7
Provides Support for NVIDIA Volta GPUs, OpenACC interoperability with CUDA
Unified Memory, OpenMP 4.5 for Multicore CPUs
SINGAPORE
— September 18, 2017 — Version 17.7 of the PGI® 2017
Compilers and Tools is now available, delivering improved performance
and programming simplicity to high performance computing (HPC) developers who
target multicore CPUs and heterogeneous GPU-accelerated systems.
Key new features of the PGI 17.7 Compilers & Tools,
available immediately, include:
• Tesla
V100 GPU support – PGI OpenACC and CUDA Fortran now
support the new NVIDIA Volta GV100 GPU, offering more memory bandwidth,
more streaming multiprocessors, next-generation NVIDIA NVLink™ and new
microarchitectural features that add up to better performance and
programmability.
• OpenACC
for CUDA Unified Memory – the PGI 17.7 compilers leverage CUDA
Unified Memory to simplify OpenACC programming on GPU-accelerated
systems. When OpenACC allocatable data is placed in CUDA Unified Memory using a
simple compiler option, no explicit data movement code or directives are
needed.
• OpenMP
4.5 for Multicore CPUs – Initial support for OpenMP 4.5
syntax and features allows the compilation of most OpenMP 4.5 programs
for parallel execution across all the cores of a multicore CPU system. TARGET
regions are implemented with default support for the multicore host as the
target, and PARALLEL and DISTRIBUTE loops are parallelised across all OpenMP
threads.
• Automatic Deep Copy of Fortran Derived Types – Movement
of aggregate or deeply nested Fortran data objects between CPU host and
GPU device memory, including traversal and management of pointer-based objects,
is now supported using OpenACC directives.
• C++
Enhancements – The PGI 17.7 C++ compiler includes
incremental C++17 features, and is supported as a CUDA 9.0 NVCC host
compiler. It delivers an average 20 percent performance improvement on the
LCALS loops benchmarks.
• Use
C++14 Lambdas with Capture in OpenACC Regions – C++
lambda expressions provide a convenient way to define anonymous function
objects at the location where they are invoked or passed as arguments. Starting
with the PGI 17.7 release, lambdas are supported in OpenACC compute regions in
C++ programs, for example to drive code generation customised to different
programming models or platforms. C++14 opens doors for more lambda use cases,
especially for polymorphic lambdas. Those capabilities are now usable in
OpenACC programs.
• Interoperability
with the cuSOLVER Library – call optimised cuSolverDN routines
from CUDA Fortran and OpenACC Fortran, C and C++ using the PGI-supplied
interface module and the PGI-compiled version of the cuSOLVER library bundled
with PGI 17.7.
• PGI
Unified Binary for NVIDIA Tesla and Multicore CPUs – use
OpenACC to build applications for both GPU acceleration and parallel
execution on multicore CPUs. When run on a GPU-enabled system, OpenACC regions
offload and execute on the GPU. When run on a system without GPUs installed,
OpenACC regions execute in parallel across all CPU cores in the system.
• New
Profiling features for CUDA Unified Memory and OpenACC – The
PGI 17.7 Profiler adds new OpenACC profiling features including support
on multicore CPUs with or without attached GPUs, and a new summary view that
shows time spent in each OpenACC construct. New CUDA Unified Memory features
include correlating CPU page faults with the source code lines where the
associated data was allocated, support for new CUDA Unified Memory page
thrashing, throttling and remote map events, NVLink support and more.
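As an illustration of two of the features listed above (lambdas with capture in OpenACC compute regions, and OpenACC without explicit data directives when allocatable data lives in CUDA Unified Memory), here is a minimal C++ sketch. It is not taken from PGI materials. The names parallel_for, saxpy and saxpy_demo are hypothetical, and the compiler option that places dynamically allocated data in Unified Memory is assumed here to be -ta=tesla:managed; check the PGI 17.7 documentation for the exact spelling. A compiler without OpenACC support ignores the pragma and runs the loop serially with the same result.

```cpp
#include <cstddef>

// Hypothetical helper: calls body(i) for every i in [0, n).
// With an OpenACC compiler the loop is offloaded or run in parallel;
// without one the pragma is ignored and the loop runs serially.
template <typename Body>
void parallel_for(std::size_t n, Body body) {
    #pragma acc parallel loop
    for (std::size_t i = 0; i < n; ++i) {
        body(i);
    }
}

// Computes y[i] = a * x[i] + y[i] using a lambda that captures
// a, x and y by value. No data clauses appear here: the sketch
// assumes x and y were allocated in CUDA Unified Memory (or that
// the code runs on the host), so no explicit copies are written.
void saxpy(std::size_t n, float a, const float* x, float* y) {
    parallel_for(n, [=](std::size_t i) { y[i] = a * x[i] + y[i]; });
}

// Allocates two arrays with new[], runs saxpy with a = 3,
// x[i] = 1 and y[i] = 2, and returns the sum of the updated y.
float saxpy_demo(std::size_t n) {
    float* x = new float[n];
    float* y = new float[n];
    for (std::size_t i = 0; i < n; ++i) {
        x[i] = 1.0f;
        y[i] = 2.0f;
    }

    saxpy(n, 3.0f, x, y);

    float sum = 0.0f;
    for (std::size_t i = 0; i < n; ++i) {
        sum += y[i];
    }

    delete[] x;
    delete[] y;
    return sum;
}
```

Because the loop body is passed in as a lambda, the same parallel_for wrapper could in principle be retargeted to another programming model by swapping the pragma, which is the kind of customised code generation the lambda feature is described as enabling.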
Other features and enhancements of PGI 17.7
include comprehensive support for environment modules on all supported
platforms, prebuilt versions of popular open source libraries and applications,
and a new “Introduction to Parallel Computing with OpenACC” video tutorial
series.
For a complete list of PGI 17.7 features and capabilities,
visit
PGI
17.7 is available for download today from the PGI website to all PGI Professional customers
with active maintenance.
About PGI Compilers & Tools
An NVIDIA Corporation brand, PGI includes high-performance parallel Fortran, C and
C++ compilers and tools for x86-64 and OpenPOWER processor-based systems
and NVIDIA Tesla GPU accelerators running Linux, Microsoft Windows or Apple
macOS operating systems. More information is available at www.pgicompilers.com.
To Keep Current on PGI and NVIDIA:
•
View OpenACC videos on YouTube.
•
Like NVIDIA on Facebook.
•
Keep up with the NVIDIA Blog.
•
Use the Pulse news reader to subscribe to the
NVIDIA Daily News feed.
About NVIDIA
NVIDIA’s
(NASDAQ: NVDA) invention of the GPU in 1999 sparked the growth of the PC gaming
market, redefined modern computer graphics and revolutionised parallel
computing. More recently, GPU deep learning ignited modern AI — the next era
of computing — with the GPU acting as the brain of computers, robots and
self-driving cars that can perceive and understand the world. More information
at http://nvidianews.nvidia.com/.