Topic outline

    • [slides available for enrolled students]

    • What we learned on Monday:

      • Modern CPUs have SIMD hardware.
        • We (usually) depend on the compiler to vectorize serial code.
      • A GPU is a co-processor.
        • It’s similar to a CPU but with more parallel machinery.
        • It has its own instruction stream and memory.
        • Therefore, an application will have both host code and device code.
        • Similarly, variables can live either in host RAM (CPU side) or in device memory (GPU side).
          • Copying data back and forth is needed.
          • Copying data can be explicit or implicit.
      • The thread is the basic unit of parallelism (thread hierarchy).
        • Threads are collected into blocks, the blocks make up the grid.
        • The total number of threads should be >> number of CUDA “cores” / stream processors.
      • We looked at basic CUDA and Thrust programs.
        • In CUDA,
          • We created pointers to device memory.
          • We copied data between host and device.
          • We wrote and launched a kernel.
        • In Thrust,
          • We created host and device vectors.
          • We launched a kernel implicitly using the thrust::transform algorithm.
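A minimal CUDA sketch of the steps above (a device pointer, host/device copies, a kernel launch). The kernel name, array size, and block shape are illustrative choices, and running it requires a CUDA-capable GPU and nvcc:

```cuda
#include <cuda_runtime.h>
#include <cstdio>

// Kernel (device code): each thread scales one array element.
__global__ void scale(float *x, float a, int n) {
    // Map the thread hierarchy to a global index.
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) x[i] *= a;
}

int main() {
    const int n = 1 << 20;
    float *h = new float[n];            // host buffer
    for (int i = 0; i < n; ++i) h[i] = 1.0f;

    float *d = nullptr;                 // pointer to device memory
    cudaMalloc(&d, n * sizeof(float));
    cudaMemcpy(d, h, n * sizeof(float), cudaMemcpyHostToDevice);

    // Launch enough blocks of 256 threads to cover all n elements.
    scale<<<(n + 255) / 256, 256>>>(d, 2.0f, n);

    cudaMemcpy(h, d, n * sizeof(float), cudaMemcpyDeviceToHost);
    printf("h[0] = %f\n", h[0]);        // expect 2.0
    cudaFree(d);
    delete[] h;
}
```

The Thrust version of the same program would replace the explicit allocation and copies with thrust::host_vector/thrust::device_vector and launch the kernel implicitly through thrust::transform.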

    • What we learned on Wednesday:

      • We discussed the diffusion equation.
        • Finite-difference method, stencils.
      • We showcased a few more frameworks:
        • HIP is essentially a clone of CUDA; the differences are mostly branding (API names start with hip instead of cuda).
          • It’s part of AMD’s open-source ROCm platform.
        • OpenCL is an open standard.
          • It’s similar to CUDA/HIP but with different nomenclature.
          • Device code is a string, compiled at runtime.
          • Behavior and performance depend on each vendor’s implementation, which can be problematic.
          • Tonnes of boilerplate code.
        • Directive-based approaches use #pragma directives to transform loops into GPU code.
          • Easy to get started.
          • Complex code may require extensive optimization.
        • SYCL is a modern C++-based standard.
          • Championed primarily by Intel (e.g., in its oneAPI/DPC++ toolchain).
        • Numba is a JIT compiler for Python.
          • It can target Nvidia GPUs.
          • It offers a kernel mode and a ufunc/vectorize mode, plus reduction operations.