Collocated with the 8th Colombian HPC Summer School

Objective

The objective of this program is to introduce advanced students (ideally at the master’s and doctoral levels), researchers, faculty members, and professionals to parallel programming models and parallel computer architectures. The focus is on computing platforms (GPUs and NPUs) relevant to artificial intelligence (AI) applications.

Date/Time

June 19, 2025.
8:30 AM - 5:00 PM (COT)

Program Structure

The program consists of two technical sessions:

Session 1: Neural Processing Unit (NPU) Programming for AI

The Neural Processing Unit (NPU) in AMD Ryzen processors is an AI engine built on AMD's XDNA architecture and designed to accelerate machine-learning tasks. It is a dedicated processing unit that works alongside the main x86 CPU, offering improved performance and efficiency for AI workloads.

Topics Covered

  • Introduction to NPU and AI Engine architecture
  • AIE core, array configuration, and host application code compilation
  • Data movement and communication abstraction layers
  • Performance tracing and monitoring
  • Practical examples, including matrix multiplication and convolutions, as foundational components for machine learning and computer vision applications (a minimal kernel sketch follows this list)
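
As a taste of what the exercises involve, below is a minimal, hypothetical sketch (plain C++, not the tutorial's exact code) of the kind of scalar kernel that runs on a single AIE compute core in the vector-scalar multiply exercise. The function name and argument layout are illustrative assumptions; the host-side and objectFIFO data-movement code that feeds the kernel is written separately and is not shown.

```cpp
#include <cstdint>

// Hypothetical per-core kernel: multiplies one tile of input data by a scalar
// factor. The input tile and the factor arrive through buffers set up by the
// structural (array-configuration) code; the result is written to an output
// buffer that another objectFIFO drains back toward the host.
extern "C" void vector_scalar_mul_scalar(const int32_t *in, int32_t *out,
                                         const int32_t *factor, int32_t n) {
    for (int32_t i = 0; i < n; ++i) {
        out[i] = in[i] * (*factor);
    }
}
```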

Time    Topic
8:30    Introduction to spatial compute and explicit data movement
8:45    “Hello World” from Ryzen™ AI
9:05    Exercise 1: Build and run your first program
9:20    Data movement on Ryzen™ AI with objectFIFOs
9:30    Exercise 2: Explore AIE DMA capabilities
9:50    Your First Program
10:20   Exercise 3: Vector-scalar multiplication
10:30   Coffee break
11:00   Tracing and performance analysis
11:20   Exercise 4: Tracing vector-scalar multiplication
11:30   Vectorizing on AIE
11:50   Exercise 5: Tracing the vectorized vector-scalar multiplication
12:00   Dataflow and larger designs
12:10   Exercise 6: More examples
12:20   Tutorial close

Session 2: GPU Programming with HIP and ROCm

AMD’s Heterogeneous-computing Interface for Portability (HIP) is a C++ runtime API and kernel language that allows developers to create portable applications that run on AMD accelerators as well as on CUDA-capable devices.
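
As a concrete illustration of the runtime API and kernel language, the sketch below is a self-contained HIP vector-addition program: a __global__ kernel, device memory allocation and copies through the runtime API, and a launch with the triple-chevron syntax shared with CUDA. It is a minimal example written for this announcement, not the tutorial's exercise code; array sizes and names are arbitrary.

```cpp
#include <hip/hip_runtime.h>
#include <cstdio>
#include <vector>

// Kernel: each thread adds one element. blockIdx/blockDim/threadIdx give the
// thread's position within the grid hierarchy.
__global__ void vector_add(const float *a, const float *b, float *c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) c[i] = a[i] + b[i];
}

int main() {
    const int n = 1 << 20;
    std::vector<float> ha(n, 1.0f), hb(n, 2.0f), hc(n, 0.0f);

    // Allocate device buffers and copy the inputs over.
    float *da, *db, *dc;
    hipMalloc(&da, n * sizeof(float));
    hipMalloc(&db, n * sizeof(float));
    hipMalloc(&dc, n * sizeof(float));
    hipMemcpy(da, ha.data(), n * sizeof(float), hipMemcpyHostToDevice);
    hipMemcpy(db, hb.data(), n * sizeof(float), hipMemcpyHostToDevice);

    // Launch enough 256-thread blocks to cover all n elements.
    int threads = 256;
    int blocks = (n + threads - 1) / threads;
    vector_add<<<blocks, threads>>>(da, db, dc, n);

    hipMemcpy(hc.data(), dc, n * sizeof(float), hipMemcpyDeviceToHost);
    printf("c[0] = %.1f (expected 3.0)\n", hc[0]);

    hipFree(da);
    hipFree(db);
    hipFree(dc);
    return 0;
}
```

Built with hipcc, the same source can target either an AMD GPU through ROCm or, via HIP's CUDA backend, an NVIDIA GPU.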

Topics Covered

  • Overview of parallel programming models, AMD GPUs, and the ROCm platform
  • Structure of HIP code: API usage, grid hierarchy, and kernel definitions
  • Practical examples: matrix and vector addition
  • Use of streams to overlap kernel execution and data transfers (see the sketch after this list)
  • Migration of CUDA applications to HIP using hipify and hipifly
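
Following up on the streams bullet above, here is a minimal, illustrative sketch of splitting work across two HIP streams so that host-device transfers in one stream can overlap with the kernel running in the other. Buffer names and sizes are assumptions made for this sketch; pinned host memory (hipHostMalloc) is used because pageable host memory prevents hipMemcpyAsync from being truly asynchronous.

```cpp
#include <hip/hip_runtime.h>

// Simple kernel that doubles each element of a buffer.
__global__ void scale(float *x, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) x[i] *= 2.0f;
}

int main() {
    const int n = 1 << 20, half = n / 2;

    // Pinned host buffer so asynchronous copies can actually overlap.
    float *h = nullptr, *d = nullptr;
    hipHostMalloc((void **)&h, n * sizeof(float), hipHostMallocDefault);
    hipMalloc(&d, n * sizeof(float));
    for (int i = 0; i < n; ++i) h[i] = 1.0f;

    hipStream_t streams[2];
    hipStreamCreate(&streams[0]);
    hipStreamCreate(&streams[1]);

    // Each stream copies in, processes, and copies back one half of the data;
    // the copies in one stream can overlap with the kernel in the other.
    for (int k = 0; k < 2; ++k) {
        float *hp = h + k * half;
        float *dp = d + k * half;
        hipMemcpyAsync(dp, hp, half * sizeof(float), hipMemcpyHostToDevice, streams[k]);
        scale<<<(half + 255) / 256, 256, 0, streams[k]>>>(dp, half);
        hipMemcpyAsync(hp, dp, half * sizeof(float), hipMemcpyDeviceToHost, streams[k]);
    }
    hipDeviceSynchronize();  // wait for both streams to finish

    hipStreamDestroy(streams[0]);
    hipStreamDestroy(streams[1]);
    hipFree(d);
    hipHostFree(h);
    return 0;
}
```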

Time               Topic                            Description
1:30 PM - 2:40 PM  Resource Access                  Log in to the cluster and clone the tutorial repository
                   Background                       Parallel programming models, AMD GPUs, ROCm, HIP
                   HIP Basics with Vector Addition  Basic structure of HIP code: API calls, grid hierarchy, kernels
                   HIP Error Checking               Detecting and handling errors from HIP API calls and kernel launches
2:45 PM - 3:00 PM  Hands-On Session 1               Participants work on exercises covering the content presented so far
3:00 PM - 3:30 PM  Break
3:30 PM - 4:10 PM  2D Grids                         Matrix addition example
                   Concurrency                      Streams; overlapping kernel execution and data transfers with compute
                   CUDA-to-HIP Translation          hipify and hipifly
4:10 PM - 4:40 PM  Hands-On Session 2               Participants work on exercises covering the content presented so far
4:40 PM - 5:00 PM  Differences Between HIP & CUDA   Architecture, tools, low-level optimizations
                   AI on ROCm                       Supported frameworks and ecosystem partners
                   Wrap-Up                          Additional resources and where to go from here
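
Related to the HIP error-checking segment in the schedule above, the short sketch below shows the usual pattern: every HIP runtime call returns a hipError_t, and kernel-launch problems are surfaced through hipGetLastError(). The HIP_CHECK macro name is a common convention rather than part of the API.

```cpp
#include <hip/hip_runtime.h>
#include <cstdio>
#include <cstdlib>

// Wrap every HIP runtime call; print a readable message and abort on failure.
#define HIP_CHECK(call)                                                      \
    do {                                                                     \
        hipError_t err_ = (call);                                            \
        if (err_ != hipSuccess) {                                            \
            fprintf(stderr, "HIP error '%s' at %s:%d\n",                     \
                    hipGetErrorString(err_), __FILE__, __LINE__);            \
            exit(EXIT_FAILURE);                                              \
        }                                                                    \
    } while (0)

__global__ void empty_kernel() {}

int main() {
    float *d = nullptr;
    HIP_CHECK(hipMalloc(&d, 1024 * sizeof(float)));

    empty_kernel<<<1, 64>>>();
    HIP_CHECK(hipGetLastError());        // catches invalid launch configurations
    HIP_CHECK(hipDeviceSynchronize());   // catches errors raised during execution

    HIP_CHECK(hipFree(d));
    return 0;
}
```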

Official Language

English: All sessions, tutorials, and instructions will be conducted in English.

Participant Requirements

This is an advanced-level program. Participants are expected to meet the following requirements:

  • The program is open to anyone, whether or not currently enrolled in the Summer School: students at any level (undergraduate or graduate), alumni, researchers, and professionals are all eligible to apply
  • Basic familiarity with Linux command-line tools (e.g., vim or nano)
  • Access to a system capable of connecting to a remote cluster via SSH
  • Completion of the registration form by June 5, 2025, providing the following information:
    • Full name
    • Institution
    • University email address
    • GitHub account

University Requirements

Participating institutions must complete the following form by June 5, 2025, including:

  • Full name of the institutional representative
  • Institution name
  • Institutional email address of the representative
  • Public IP address of the institution (required by AMD to enable access to the training platform)
  • Additional notes (e.g., technical considerations regarding the provided IP or connection)

Technical Review Meeting

A technical review will be scheduled for June 9, 2025 (time to be confirmed). Please confirm:

  • Whether the institutional labs intended for training access can be used for this meeting
  • Whether a student from the institution can attend the meeting and follow instructions in English
  • Whether the attending student can take notes and report on access to the training platform

Connection Details

  • AMD will generate the Zoom link for the session, enabling recording and access control
  • CyberColombia will distribute the link to the participating host universities