Neural Network Training

  • Feedforward computation and backpropagation for gradient descent (see the sketch below)
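
A minimal sketch of one training step, assuming a PyTorch-style API (the model, data, and hyperparameters are illustrative):

    # One gradient-descent step: feedforward, backpropagation, update.
    import torch
    import torch.nn as nn

    model = nn.Linear(10, 1)                    # toy model
    opt = torch.optim.SGD(model.parameters(), lr=0.01)
    x, y = torch.randn(32, 10), torch.randn(32, 1)

    pred = model(x)                             # feedforward
    loss = nn.functional.mse_loss(pred, y)      # scalar loss
    opt.zero_grad()
    loss.backward()                             # backpropagation computes gradients
    opt.step()                                  # gradient-descent parameter update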

Batch Gradient Descent
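
Each update uses the gradient of the loss averaged over the entire training set of N examples:

    \theta \leftarrow \theta - \eta \, \nabla_\theta \frac{1}{N} \sum_{i=1}^{N} L(\theta; x_i, y_i)

where \eta is the learning rate. Every step uses the exact gradient, but requires a full pass over the data.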

Stochastic Gradient Descent
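
Each update uses the gradient on a single randomly sampled example i, trading gradient noise for much cheaper steps:

    \theta \leftarrow \theta - \eta \, \nabla_\theta L(\theta; x_i, y_i)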

Minibatch-Based SGD
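
Each update averages the gradient over a small random batch B (commonly 32 to 512 examples), balancing gradient noise against per-step cost and mapping well to GPU parallelism:

    \theta \leftarrow \theta - \eta \, \nabla_\theta \frac{1}{|B|} \sum_{i \in B} L(\theta; x_i, y_i)

A minimal sketch of the loop, assuming PyTorch (toy data and model):

    import torch
    import torch.nn as nn
    from torch.utils.data import DataLoader, TensorDataset

    data = TensorDataset(torch.randn(1024, 10), torch.randn(1024, 1))
    loader = DataLoader(data, batch_size=64, shuffle=True)
    model = nn.Linear(10, 1)
    opt = torch.optim.SGD(model.parameters(), lr=0.01)

    for x, y in loader:                         # one epoch of minibatches
        opt.zero_grad()
        loss = nn.functional.mse_loss(model(x), y)
        loss.backward()
        opt.step()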

Model Training Cost
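
Training cost is commonly measured in GPU-hours (or total floating-point operations) and grows with both model size and dataset size.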

Distributed Training is Necessary

  • Developers' and researchers' time is more valuable than hardware.
  • If training a model takes 10 GPU-days, parallelize it with distributed training.
  • 1024 GPUs can finish in about 14 minutes (ideally)! See the arithmetic below.
  • The development and research cycle is greatly accelerated.
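
The 14-minute figure is just ideal linear scaling, ignoring communication overhead:

    10 GPU-days = 10 × 24 × 60 = 14,400 GPU-minutes
    14,400 GPU-minutes / 1024 GPUs ≈ 14 minutes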

Introduction to Distributed Training

Data Parallelism

  • Train by splitting the training data across many GPUs, each of which holds a full replica of the model (see the sketch below)
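
A minimal sketch of data-parallel training, assuming PyTorch's DistributedDataParallel (the process-group setup, model, and data shards are illustrative; launch one process per GPU, e.g. with torchrun):

    import torch
    import torch.distributed as dist
    import torch.nn as nn
    from torch.nn.parallel import DistributedDataParallel as DDP

    dist.init_process_group("nccl")             # rank/world size come from the launcher
    rank = dist.get_rank()                      # assumes one node, so rank == local GPU id
    torch.cuda.set_device(rank)

    model = DDP(nn.Linear(10, 1).cuda(rank))    # full model replica on every GPU
    opt = torch.optim.SGD(model.parameters(), lr=0.01)

    # Each rank trains on its own shard of the data (toy shard here).
    x, y = torch.randn(64, 10).cuda(rank), torch.randn(64, 1).cuda(rank)
    opt.zero_grad()
    loss = nn.functional.mse_loss(model(x), y)
    loss.backward()                             # DDP all-reduces gradients across ranks
    opt.step()                                  # every replica applies the same update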

Scaling Distributed Machine Learning with the Parameter Server

  • All worker nodes synchronize through a single point
  • Two roles in the framework:
    • Parameter server: receives gradients from the workers and sends back the aggregated result
    • Workers: compute gradients on their split of the dataset and send them to the parameter server (see the toy sketch below)
  • Problem: the single synchronization point becomes a communication bottleneck as the number of workers grows
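
A toy, single-process sketch of the synchronous parameter-server pattern in Python with NumPy (the real system runs server and workers on separate machines and supports asynchronous updates; all names here are illustrative):

    import numpy as np

    class ParameterServer:
        def __init__(self, dim, lr=0.01):
            self.theta = np.zeros(dim)          # global model parameters
            self.lr = lr

        def push_and_pull(self, grads):
            # Aggregate worker gradients, update, return new parameters.
            self.theta -= self.lr * np.mean(grads, axis=0)
            return self.theta

    def worker_gradient(theta, x_split, y_split):
        # Least-squares gradient on this worker's split of the data.
        return 2 * x_split.T @ (x_split @ theta - y_split) / len(y_split)

    rng = np.random.default_rng(0)
    X, y = rng.normal(size=(1000, 10)), rng.normal(size=1000)
    splits = np.array_split(np.arange(1000), 4) # 4 workers, disjoint splits

    ps = ParameterServer(dim=10)
    theta = ps.theta
    for step in range(100):
        grads = [worker_gradient(theta, X[s], y[s]) for s in splits]
        theta = ps.push_and_pull(grads)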

Distributed Communication

Note: the last two are the ones typically used in industry at the moment.
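
As one illustration of a collective commonly used for gradient synchronization, a sketch of all-reduce averaging, assuming torch.distributed with an already-initialized process group (e.g. via torchrun):

    import torch
    import torch.distributed as dist

    grad = torch.randn(10)                      # this rank's local gradient
    dist.all_reduce(grad, op=dist.ReduceOp.SUM) # sum across all ranks
    grad /= dist.get_world_size()               # sum -> average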