Large scale training; Dropout; CNN; Transformer
Improving Compute Efficacy Frontiers with SliceOut
Pascal Notin, Aidan N. Gomez, Joanna Yoo, Yarin Gal
Preprint.
A memory-efficient dropout-inspired scheme to train large neural networks faster with no loss in accuracy.
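To make the idea concrete, here is a minimal NumPy sketch of a SliceOut-style linear layer. All names and details below are illustrative assumptions, not taken from the paper: unlike standard dropout, which zeroes a random scattered mask of units, the sketch drops a random contiguous slice of output units, so the surviving weights form a smaller dense matrix and the layer saves both memory and compute.

```python
import numpy as np

def sliceout_forward(x, W, b, keep_frac=0.5, rng=None):
    """Hypothetical SliceOut-style forward pass: keep a random
    contiguous slice of output units and compute only that slice."""
    rng = rng or np.random.default_rng()
    n_out = W.shape[1]
    keep = int(n_out * keep_frac)
    start = rng.integers(0, n_out - keep + 1)
    idx = slice(start, start + keep)
    # Dense (in_dim, keep) matmul over the kept slice only --
    # smaller than the full (in_dim, n_out) product.
    h = x @ W[:, idx] + b[idx]
    # Rescale as in inverted dropout to preserve expected activation scale.
    return h / keep_frac, idx

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 8))
W = rng.standard_normal((8, 16))
b = np.zeros(16)
h, idx = sliceout_forward(x, W, b, keep_frac=0.5, rng=rng)
print(h.shape)  # (4, 8): half of the 16 output units survive
```

Because the kept units are contiguous, the sliced matmul runs on a genuinely smaller dense tensor rather than a masked full-size one, which is where the memory and speed savings come from.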
PDF | Cite | arXiv | Press (The Batch)