Large scale training; Dropout; CNN; Transformer

Improving Compute Efficacy Frontiers with SliceOut

Preprint. A memory-efficient dropout-inspired scheme to train large neural networks faster with no loss in accuracy.