Large-scale training; Dropout; CNN; Transformer

Pascal Notin, Aidan N. Gomez, Joanna Yoo, Yarin Gal

Preprint. A memory-efficient dropout-inspired scheme to train large neural networks faster with no loss in accuracy.