Topic outline

    • This tarball contains the source code for the vector-addition examples and the gravitational potential exercise, as well as sample code for the diffusion equation homework exercise.

      The src/gravity directory contains two Python programs to test and benchmark the different solutions to the gravitational potential exercise; the solution functions are in the gravity_calculators subdirectory. The Sapporo N-body library has to be downloaded, patched, and built before it can be used here; the get_sapporo.sh script in the sapporo2 directory does this automatically (tested on Mist; some tweaks may be needed to get it working on other systems).

    • What we learned on Monday:

      • Modern CPUs have SIMD hardware.
        • We (usually) depend on the compiler to vectorize serial code.
      • A GPU is a co-processor.
        • It’s similar to a CPU but with more parallel machinery.
        • It has its own instruction stream and memory.
        • Therefore, an application will have both host code and device code.
        • Similarly, variables can live either in host RAM (CPU side) or in video memory (GPU side).
          • Copying data back and forth is needed.
          • Copying data can be explicit or implicit.
      • The thread is the basic unit of parallelism (thread hierarchy).
        • Threads are collected into blocks, the blocks make up the grid.
        • The total number of threads should be >> number of CUDA “cores” / stream processors.
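      The grid-sizing arithmetic behind the thread hierarchy can be sketched in plain Python (the problem size and block size below are illustrative, not from the course):

      ```python
      from math import ceil

      # Illustrative numbers: launch enough blocks to cover all N elements.
      N = 1000                 # hypothetical problem size
      threads_per_block = 256  # a typical block size
      blocks = ceil(N / threads_per_block)

      # Inside a kernel, each thread computes its global index from its
      # block ID and thread ID (blockIdx.x * blockDim.x + threadIdx.x):
      def global_index(block_id, thread_id):
          return block_id * threads_per_block + thread_id

      # The last thread of the last block:
      idx = global_index(blocks - 1, threads_per_block - 1)

      # Since blocks * threads_per_block >= N, threads with idx >= N must
      # do nothing — the usual bounds check in a kernel.
      print(blocks, idx)  # -> 4 1023
      ```

      Note that the launch is deliberately oversized: rounding up means some threads in the last block fall past the end of the data, which is why kernels carry a bounds check.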

    • What we learned on Wednesday:

      • There are many GPU programming frameworks:
        • Native APIs
        • Graphics APIs (less suited to general-purpose programming)
        • Abstraction libraries help make code more portable
        • Directive-based approaches
        • Ready-made GPU-accelerated programs and libraries
      • We looked at a basic CUDA program:
        • We created pointers to device memory.
        • Copied memory between host and device.
        • Wrote and launched a kernel.
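      The host/device flow of that basic CUDA program can be emulated in plain Python — this is not CUDA, just the same steps spelled out serially, with all names illustrative:

      ```python
      import math

      def vector_add_emulated(a, b):
          """Pure-Python emulation of the CUDA vector-add flow (illustrative only)."""
          n = len(a)
          # 1. "Allocate" device memory (stands in for cudaMalloc).
          d_a, d_b, d_c = [0.0] * n, [0.0] * n, [0.0] * n
          # 2. Copy host -> device (stands in for cudaMemcpy, HostToDevice).
          d_a[:], d_b[:] = a, b
          # 3. "Launch" the kernel: one logical thread per element,
          #    here just iterated serially over blocks and threads.
          tpb = 4                           # threads per block (illustrative)
          blocks = math.ceil(n / tpb)
          for block in range(blocks):
              for thread in range(tpb):
                  i = block * tpb + thread  # global thread index
                  if i < n:                 # bounds check, as in the real kernel
                      d_c[i] = d_a[i] + d_b[i]
          # 4. Copy device -> host (stands in for cudaMemcpy, DeviceToHost).
          return list(d_c)

      print(vector_add_emulated([1, 2, 3], [10, 20, 30]))  # -> [11, 22, 33]
      ```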
      • Numba is a JIT compiler for Python.
        • Allows access to the CUDA API from Python.
        • Runs on Nvidia GPUs.
        • Has kernel and ufunc modes, as well as reduction operations.
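      The ufunc idea — write a scalar function and have it applied elementwise over whole arrays — can be illustrated on the CPU with plain NumPy's np.vectorize; Numba's @vectorize decorator compiles the same pattern (with target="cuda" to run it on the GPU). The rel_diff function here is a made-up example, not from the course:

      ```python
      import numpy as np

      def rel_diff(a, b):
          # Scalar function: relative difference of two numbers.
          return abs(a - b) / (abs(a) + abs(b))

      # Turn the scalar function into an elementwise (ufunc-like) one.
      rel_diff_u = np.vectorize(rel_diff)

      x = np.array([1.0, 2.0, 4.0])
      y = np.array([1.0, 4.0, 2.0])
      print(rel_diff_u(x, y))
      ```

      np.vectorize is only a convenience loop; the point of Numba's ufunc mode is that the same scalar code gets compiled, and on the GPU each element can be handled by a separate thread.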
      • We showcased a few more frameworks:
        • HIP is a clone of CUDA; the differences are mostly just branding.
          • It’s part of AMD’s open source ROCm platform.
        • SYCL is a modern C++-based standard.
          • Pushed by Intel.
        • Thrust is a high-level C++ wrapper over CUDA that provides useful abstractions.
          • Such as containers (vectors).
          • And algorithms (transforms, reductions, sorting, ...).
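      As a rough analogy only (Python stand-ins, not Thrust itself), the container-plus-algorithms style composes like this:

      ```python
      from functools import reduce

      # A "container" of values (Thrust would use a device_vector,
      # with the data living in GPU memory).
      v = [3, 1, 2]

      # "Algorithms" applied to the container:
      squared = list(map(lambda x: x * x, v))       # cf. thrust::transform
      total = reduce(lambda a, b: a + b, squared)   # cf. thrust::reduce
      ordered = sorted(squared)                     # cf. thrust::sort

      print(squared, total, ordered)  # -> [9, 1, 4] 14 [1, 4, 9]
      ```

      The appeal is the same in both cases: you compose named algorithms over containers instead of writing explicit loops — except that Thrust runs them as GPU kernels.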
        • Directive-based approaches annotate loops with #pragma directives, letting the compiler offload them to the GPU.
          • Easy to get started.
          • Complex code may require extensive optimization.