Topic outline

    • This tarball contains the source code for the vector-addition examples and the gravitational potential exercise, as well as sample code for the diffusion equation homework exercise.

      The src/gravity directory contains two Python programs to test and benchmark the different solutions to the gravitational potential exercise; the solution functions are in the gravity_calculators subdirectory. The Sapporo N-body library has to be downloaded, patched, and built before it can be used here; the get_sapporo.sh script in the sapporo2 directory does this automatically (tested on Mist; some tweaks may be needed to get it working on other systems).

    • What we learned on Monday:

      • Modern CPUs have SIMD hardware.
        • We (usually) depend on the compiler to vectorize serial code.
      • A GPU is a co-processor.
        • It’s similar to a CPU but with more parallel machinery.
        • It has its own instruction stream and memory.
        • Therefore, an application will have both host code and device code.
        • Similarly, variables can live either in host RAM (CPU side) or in video memory (GPU side).
          • Copying data back and forth is needed.
          • Copying data can be explicit or implicit.
      • The thread is the basic unit of parallelism (thread hierarchy).
        • Threads are collected into blocks, the blocks make up the grid.
        • The total number of threads should be >> number of CUDA “cores” / stream processors.
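      The grid-sizing arithmetic behind the thread hierarchy can be sketched in plain Python (the problem size and block size below are illustrative, not from the course):

      ```python
      from math import ceil

      # Illustrative numbers: launch enough blocks to cover all N elements.
      N = 1000                 # hypothetical problem size
      threads_per_block = 256  # a typical block size
      blocks = ceil(N / threads_per_block)

      # Inside a kernel, each thread computes its global index from its
      # block ID and thread ID (blockIdx.x * blockDim.x + threadIdx.x):
      def global_index(block_id, thread_id):
          return block_id * threads_per_block + thread_id

      # The last thread of the last block:
      idx = global_index(blocks - 1, threads_per_block - 1)

      # Since blocks * threads_per_block >= N, threads with idx >= N must
      # do nothing — the usual bounds check in a kernel.
      print(blocks, idx)  # -> 4 1023
      ```

      Note that the launch is deliberately oversized: rounding up means some threads in the last block fall past the end of the data, which is why kernels carry a bounds check.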

    • What we learned on Wednesday:

      • There are many GPU programming frameworks:
        • Native APIs
        • Graphics APIs (less suited to general-purpose programming)
        • Abstraction libraries help make code more portable
        • Directive-based approaches
        • Ready-made GPU-accelerated programs and libraries
      • We looked at a basic CUDA program:
        • We created pointers to device memory.
        • Copied memory between host and device.
        • Wrote and launched a kernel.
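      The host/device flow of that basic CUDA program can be emulated in plain Python — this is not CUDA, just the same steps spelled out serially, with all names illustrative:

      ```python
      import math

      def vector_add_emulated(a, b):
          """Pure-Python emulation of the CUDA vector-add flow (illustrative only)."""
          n = len(a)
          # 1. "Allocate" device memory (stands in for cudaMalloc).
          d_a, d_b, d_c = [0.0] * n, [0.0] * n, [0.0] * n
          # 2. Copy host -> device (stands in for cudaMemcpy, HostToDevice).
          d_a[:], d_b[:] = a, b
          # 3. "Launch" the kernel: one logical thread per element,
          #    here just iterated serially over blocks and threads.
          tpb = 4                           # threads per block (illustrative)
          blocks = math.ceil(n / tpb)
          for block in range(blocks):
              for thread in range(tpb):
                  i = block * tpb + thread  # global thread index
                  if i < n:                 # bounds check, as in the real kernel
                      d_c[i] = d_a[i] + d_b[i]
          # 4. Copy device -> host (stands in for cudaMemcpy, DeviceToHost).
          return list(d_c)

      print(vector_add_emulated([1, 2, 3], [10, 20, 30]))  # -> [11, 22, 33]
      ```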
      • Numba is a JIT compiler for Python.
        • Allows access to the CUDA API from Python.
        • Runs on Nvidia GPUs.
        • Has kernel and ufunc modes, as well as reduction operations.
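      The ufunc idea — write a scalar function and have it applied elementwise over whole arrays — can be illustrated on the CPU with plain NumPy's np.vectorize; Numba's @vectorize decorator compiles the same pattern (with target="cuda" to run it on the GPU). The rel_diff function here is a made-up example, not from the course:

      ```python
      import numpy as np

      def rel_diff(a, b):
          # Scalar function: relative difference of two numbers.
          return abs(a - b) / (abs(a) + abs(b))

      # Turn the scalar function into an elementwise (ufunc-like) one.
      rel_diff_u = np.vectorize(rel_diff)

      x = np.array([1.0, 2.0, 4.0])
      y = np.array([1.0, 4.0, 2.0])
      print(rel_diff_u(x, y))
      ```

      np.vectorize is only a convenience loop; the point of Numba's ufunc mode is that the same scalar code gets compiled, and on the GPU each element can be handled by a separate thread.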
      • We showcased a few more frameworks:
        • HIP is a clone of CUDA; the differences are mostly just branding.
          • It’s part of AMD’s open source ROCm platform.
        • SYCL is a modern C++-based standard.
          • Pushed by Intel.
        • Thrust is a high-level C++ wrapper over CUDA that provides useful abstractions.
          • Such as containers (vectors).
          • And algorithms (transforms, reductions, sorting, ...).
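      As a rough analogy only (Python stand-ins, not Thrust itself), the container-plus-algorithms style composes like this:

      ```python
      from functools import reduce

      # A "container" of values (Thrust would use a device_vector,
      # with the data living in GPU memory).
      v = [3, 1, 2]

      # "Algorithms" applied to the container:
      squared = list(map(lambda x: x * x, v))       # cf. thrust::transform
      total = reduce(lambda a, b: a + b, squared)   # cf. thrust::reduce
      ordered = sorted(squared)                     # cf. thrust::sort

      print(squared, total, ordered)  # -> [9, 1, 4] 14 [1, 4, 9]
      ```

      The appeal is the same in both cases: you compose named algorithms over containers instead of writing explicit loops — except that Thrust runs them as GPU kernels.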
        • Directive-based approaches annotate loops with #pragma directives, letting the compiler offload them to the GPU.
          • Easy to get started.
          • Complex code may require extensive optimization.