microddp

Principles

Syllabus

Intro

What is distributed training?

Manual

Manually split the batch across two “GPUs” and average gradients.
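As a rough illustration of this step, the sketch below simulates two "GPUs" in plain Python, with no real devices or framework involved. The toy model `y_hat = w * x`, the mean-squared-error loss, and the helper names `grad_mse`, `split_batch`, and `data_parallel_grad` are all assumptions made for this example, not the course's own code. It shows that averaging the gradients from equal-sized shards gives the same gradient as the full batch.

```python
# Toy model (an assumption for illustration): y_hat = w * x,
# loss = mean((w * x - y)^2) over the batch.

def grad_mse(w, xs, ys):
    """Gradient of the mean squared error with respect to w."""
    n = len(xs)
    return sum(2.0 * (w * x - y) * x for x, y in zip(xs, ys)) / n


def split_batch(xs, ys, world_size):
    """Split a batch into equal contiguous shards, one per simulated GPU."""
    assert len(xs) % world_size == 0, "batch must divide evenly across ranks"
    shard = len(xs) // world_size
    return [
        (xs[r * shard:(r + 1) * shard], ys[r * shard:(r + 1) * shard])
        for r in range(world_size)
    ]


def data_parallel_grad(w, xs, ys, world_size=2):
    """Compute one local gradient per shard, then average across ranks."""
    shards = split_batch(xs, ys, world_size)
    local_grads = [grad_mse(w, sx, sy) for sx, sy in shards]
    return sum(local_grads) / world_size


xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]
w = 0.5

full_grad = grad_mse(w, xs, ys)          # gradient on the whole batch
dp_grad = data_parallel_grad(w, xs, ys)  # averaged gradient from two shards
print(full_grad, dp_grad)
```

The two values match only because the shards are the same size and the loss is a per-example mean; with uneven shards, a plain average of local gradients would weight examples unequally.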

Sandbox

Play with all-reduce ops.
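For readers who want something to experiment with before touching real hardware, here is a minimal single-process simulation of all-reduce. The function `all_reduce` and its `op` strings are hypothetical names chosen for this sketch; they are not the `torch.distributed` API. Each inner list stands in for one rank's buffer, and after the call every rank holds the same elementwise-reduced result.

```python
import operator
from functools import reduce


def all_reduce(rank_buffers, op="sum"):
    """Simulate all-reduce: reduce buffers elementwise across ranks,
    then hand every rank its own copy of the result."""
    ops = {"sum": operator.add, "max": max, "min": min}
    world_size = len(rank_buffers)
    if op == "mean":
        combined = [sum(vals) / world_size for vals in zip(*rank_buffers)]
    else:
        combined = [reduce(ops[op], vals) for vals in zip(*rank_buffers)]
    # Every rank ends up with an identical, independent copy.
    return [list(combined) for _ in range(world_size)]


buffers = [[1.0, 2.0], [3.0, 6.0]]  # rank 0 and rank 1
summed = all_reduce(buffers, "sum")
averaged = all_reduce(buffers, "mean")
largest = all_reduce(buffers, "max")
print(summed, averaged, largest)
```

This models only the result of the collective, not how data moves between devices; real implementations use communication patterns such as ring or tree reductions to produce the same outcome.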

All-Reduce

Performance Analysis

When is DDP worth it, and how well does it scale?