This week's colloquium: "Accelerated DataFrame with Dask-cuDF on multiple GPUs" by Jinhui Qin from SHARCNET. The Compute Ontario Colloquia are weekly Zoom presentations on Advanced Research Computing, High Performance Computing, Research Data Management, and Research Software topics, delivered by staff from the three Compute Ontario consortia (CAC, SciNet, SHARCNET) and by guest speakers. Each colloquium is one hour long and includes time for questions. No registration is required. Most presentations are recorded and uploaded to the hosting consortium's video channel.
cuDF is a GPU DataFrame library for Python. It provides a pandas-like API and accelerates DataFrame operations on a single GPU. However, the size of the dataset it can handle is limited by the memory available on that one GPU. Dask-cuDF integrates cuDF with Dask, a framework for scalable computing, so that DataFrame workloads can be scaled out across multiple GPUs. This webinar introduces Dask-cuDF with demo examples run on a multi-GPU node on the national clusters.