microddp

Performance Analysis of DDP

Communication Overhead

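DDP adds communication overhead: on every training step, ranks must synchronize their gradients (typically with an all-reduce) before the optimizer update. Understanding when that cost is worth paying is crucial.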

Overhead = Communication Time / Total Time
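
As a rough illustration, the ratio can be computed from per-step timings. The function name and the sample numbers below are hypothetical and not part of microddp. Both timings are assumed to be measured wall-clock seconds for a single training step.

```python
def communication_overhead(comm_time_s: float, total_time_s: float) -> float:
    """Fraction of a training step spent on communication (0.0 to 1.0)."""
    if total_time_s <= 0:
        raise ValueError("total_time_s must be positive")
    if not 0 <= comm_time_s <= total_time_s:
        raise ValueError("comm_time_s must lie between 0 and total_time_s")
    return comm_time_s / total_time_s


# Hypothetical step: 20 ms of gradient synchronization inside a 100 ms step.
overhead = communication_overhead(0.020, 0.100)
print(f"overhead = {overhead:.0%}")  # prints "overhead = 20%"
```

An overhead near 0 means the step is almost pure computation. An overhead near 1 means the ranks spend most of the step waiting on each other.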

When DDP Breaks Down

DDP becomes inefficient when:

  1. Model too small: per-step communication time exceeds computation time
  2. Network too slow: high latency or low bandwidth stretches every gradient synchronization
  3. Batch size per rank too small: too little computation per step to hide communication behind
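
The three conditions above can be checked with a back-of-the-envelope estimate. The sketch below assumes the standard ring all-reduce cost model (2(N-1) steps, each moving 1/N of the gradient buffer). Whether microddp or the underlying backend actually uses a ring is an assumption, and the function name and all numbers are hypothetical.

```python
def ring_allreduce_seconds(
    num_params: int,
    world_size: int,
    bandwidth_bytes_per_s: float,
    latency_s: float = 0.0,
    bytes_per_param: int = 4,
) -> float:
    """Estimate one gradient all-reduce under the ring cost model (an assumption)."""
    if world_size < 2:
        return 0.0  # a single rank has nothing to synchronize
    grad_bytes = num_params * bytes_per_param
    steps = 2 * (world_size - 1)  # reduce-scatter phase + all-gather phase
    # Each step moves a 1/world_size chunk of the gradient buffer.
    transfer = steps * (grad_bytes / world_size) / bandwidth_bytes_per_s
    return transfer + steps * latency_s


# Hypothetical setup: 1M fp32 parameters, 4 ranks, 1 GB/s links, 50 us latency.
comm_s = ring_allreduce_seconds(1_000_000, 4, 1e9, latency_s=50e-6)
compute_s = 0.002  # assumed forward + backward time per step
print(f"comm = {comm_s * 1e3:.2f} ms, compute = {compute_s * 1e3:.2f} ms")
print("communication dominates" if comm_s > compute_s else "computation dominates")
```

With these made-up numbers the model is small enough that synchronization costs more than the computation it accompanies, which is the first failure mode in the list. A larger per-rank batch raises `compute_s` without changing `comm_s`.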

Bottleneck Analysis

Computation-Bound

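Computation time ≫ communication time. The step is dominated by the forward and backward passes, so gradient synchronization is a small fraction of the total and adding ranks tends to scale well.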
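
A minimal sketch of why this regime scales well, under an idealized model: with perfect overlap the step takes as long as the slower of the two phases, and without overlap the two times add. PyTorch's DDP overlaps gradient all-reduce with the backward pass. Whether microddp does is not stated here, so both cases are shown. The function names and timings are hypothetical.

```python
def step_time_s(compute_s: float, comm_s: float, overlap: bool = True) -> float:
    """Idealized step time: perfect overlap hides the shorter phase entirely."""
    return max(compute_s, comm_s) if overlap else compute_s + comm_s


def scaling_efficiency(compute_s: float, comm_s: float, overlap: bool = True) -> float:
    """Share of the step spent on useful computation (1.0 means ideal scaling)."""
    return compute_s / step_time_s(compute_s, comm_s, overlap)


# Hypothetical computation-bound step: 200 ms compute, 10 ms communication.
print(f"with overlap:    {scaling_efficiency(0.200, 0.010, overlap=True):.3f}")
print(f"without overlap: {scaling_efficiency(0.200, 0.010, overlap=False):.3f}")
```

Even without any overlap, the efficiency in this example stays above 95%, because communication is small relative to computation.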

Communication-Bound

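Communication time ≫ computation time. Ranks spend most of each step waiting on gradient synchronization, so adding ranks yields little or no speedup.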
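
One way to reason about this regime is to ask how much link bandwidth would be needed for communication to take no longer than computation. The sketch below assumes a ring all-reduce, in which each rank sends roughly 2(N-1)/N times the gradient size, and it ignores latency. The function name and the numbers are hypothetical.

```python
def min_bandwidth_bytes_per_s(
    num_params: int,
    world_size: int,
    compute_s: float,
    bytes_per_param: int = 4,
) -> float:
    """Bandwidth at which ring all-reduce time equals compute time (latency ignored)."""
    if compute_s <= 0:
        raise ValueError("compute_s must be positive")
    if world_size < 2:
        return 0.0  # a single rank never communicates
    grad_bytes = num_params * bytes_per_param
    # Bytes each rank sends over the whole all-reduce under the ring model.
    traffic_per_rank = 2 * (world_size - 1) / world_size * grad_bytes
    return traffic_per_rank / compute_s


# Hypothetical: 100M fp32 parameters, 8 ranks, 50 ms of compute per step.
needed = min_bandwidth_bytes_per_s(100_000_000, 8, 0.050)
print(f"need at least {needed / 1e9:.1f} GB/s per link")  # prints 14.0 GB/s
```

If the available bandwidth is well below this figure, the job is communication-bound under this model. Increasing the per-rank batch size raises `compute_s` and lowers the requirement.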

Memory-Bound

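GPU memory limits batch size. Because DDP replicates the full model on every rank, parameters, gradients, optimizer state, and activations must all fit on each GPU, which caps the per-rank batch size and, in turn, how much computation is available to offset communication.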
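
A rough way to see the limit is to subtract the fixed model state from GPU memory and divide what remains by the activation cost of one sample. The sketch below assumes fp32 training with Adam (about 16 bytes of state per parameter) and a known activation footprint per sample. It ignores framework overhead, communication buffers, and fragmentation, and every name and number in it is hypothetical.

```python
def max_batch_per_rank(
    gpu_bytes: int,
    num_params: int,
    activation_bytes_per_sample: int,
    state_bytes_per_param: int = 16,
) -> int:
    """Largest per-rank batch that fits after reserving model state (rough estimate)."""
    # Assumed 16 bytes/param: fp32 weights (4) + fp32 gradients (4) + Adam moments (8).
    fixed = num_params * state_bytes_per_param
    free = gpu_bytes - fixed
    if free <= 0:
        return 0  # the model state alone does not fit
    return free // activation_bytes_per_sample


# Hypothetical: 16 GB GPU, 100M parameters, 50 MB of activations per sample.
print(max_batch_per_rank(16 * 10**9, 100 * 10**6, 50 * 10**6))  # prints 288
```

Adding ranks does not raise this per-rank ceiling, since each rank holds a full replica. It only increases the global batch size.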

Further Reading