University of Toronto | AGI Progress Tracker

Major

Adam: A Method for Stochastic Optimization

2014-12-22

Diederik Kingma and Jimmy Ba introduced Adam, an optimization algorithm that computes an adaptive learning rate for each parameter from running estimates of the first and second moments of the gradients. It combines the benefits of AdaGrad, which handles sparse gradients well, and RMSProp, which handles non-stationary objectives well. Adam became a de facto default optimizer for training deep neural networks and is implemented in virtually all modern deep learning frameworks.

Adaptive learning rates per parameter
Combines AdaGrad and RMSProp benefits
Computationally efficient
Works well with sparse gradients
Became default optimizer in deep learning
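
The per-parameter update summarized above can be sketched in a few lines of plain Python. This is a minimal illustration rather than a framework implementation: the function names adam_step and minimize_quadratic, the flat-list parameter layout, and the toy quadratic objective are assumptions made for this example. The default hyperparameters (step size 0.001, beta1 0.9, beta2 0.999, epsilon 1e-8) follow the values suggested in the paper.

```python
import math


def adam_step(params, grads, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """Apply one Adam update to flat lists of floats.

    m and v hold the running first- and second-moment estimates;
    t is the 1-based step count used for bias correction.
    Returns the updated (params, m, v) as new lists.
    """
    new_params, new_m, new_v = [], [], []
    for p, g, m_i, v_i in zip(params, grads, m, v):
        # Exponential moving averages of the gradient and its square.
        m_i = beta1 * m_i + (1.0 - beta1) * g
        v_i = beta2 * v_i + (1.0 - beta2) * g * g
        # Bias correction compensates for the zero initialization of m and v.
        m_hat = m_i / (1.0 - beta1 ** t)
        v_hat = v_i / (1.0 - beta2 ** t)
        # Each parameter gets its own effective step size.
        new_params.append(p - lr * m_hat / (math.sqrt(v_hat) + eps))
        new_m.append(m_i)
        new_v.append(v_i)
    return new_params, new_m, new_v


def minimize_quadratic(steps=2000, lr=0.01):
    """Toy usage (hypothetical objective): minimize f(x, y) = x^2 + 10 * y^2."""
    params = [3.0, -2.0]
    m = [0.0, 0.0]
    v = [0.0, 0.0]
    for t in range(1, steps + 1):
        grads = [2.0 * params[0], 20.0 * params[1]]
        params, m, v = adam_step(params, grads, m, v, t, lr=lr)
    return params
```

On the very first step the bias-corrected ratio reduces to roughly the sign of the gradient, so each parameter moves by about the step size regardless of gradient scale. That scale invariance is one reason the same default settings tend to transfer across problems.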

research-paper, optimization, deep-learning, adam

Sources

Adam Paper (ICLR 2015)

Major

Dropout: A Simple Way to Prevent Neural Networks from Overfitting

2013-07-01

Nitish Srivastava, Geoffrey Hinton, and colleagues introduced Dropout, a regularization technique that randomly drops units, along with their connections, during training. This simple method substantially reduced overfitting and became a standard technique in deep learning, improving performance across computer vision, NLP, and speech recognition tasks.

Randomly drops neurons during training
Reduces co-adaptation between neurons
Simple yet highly effective regularization
Became standard in deep learning
Improved state-of-the-art on many benchmarks
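
The random-drop mechanism listed above can be sketched as follows. This is a minimal illustration that assumes the "inverted dropout" variant common in current frameworks, where surviving activations are rescaled during training. The paper instead describes scaling the weights at test time, which gives the same expected activation. The function name dropout and its parameters are hypothetical.

```python
import random


def dropout(activations, p_drop=0.5, training=True, rng=None):
    """Inverted dropout over a list of floats.

    During training each unit is zeroed with probability p_drop and the
    survivors are scaled by 1 / (1 - p_drop), so the expected activation
    is unchanged. At inference time the input passes through untouched.
    """
    if not 0.0 <= p_drop < 1.0:
        raise ValueError("p_drop must be in [0, 1)")
    if not training or p_drop == 0.0:
        return list(activations)
    rng = rng or random.Random()
    keep = 1.0 - p_drop
    return [a / keep if rng.random() < keep else 0.0 for a in activations]


# Usage: a fresh random mask is sampled on every training call,
# while inference (training=False) is deterministic.
hidden = [0.2, 1.5, -0.7, 3.0]
train_out = dropout(hidden, p_drop=0.5, rng=random.Random(0))
eval_out = dropout(hidden, training=False)
```

Because a different subset of units is active on each call, no unit can rely on a specific partner being present, which is the co-adaptation effect the technique targets.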

research-paper, regularization, deep-learning, neural-networks

Sources

Dropout Paper (Journal of Machine Learning Research)

Landmark

ImageNet Classification with Deep Convolutional Neural Networks

2012-12-03

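AlexNet, the network by Alex Krizhevsky, Ilya Sutskever, and Geoffrey Hinton, demonstrated that a deep convolutional neural network could dramatically outperform prior methods on ImageNet, reaching a top-5 test error of 15.3% against 26.2% for the second-best entry in the 2012 competition. The result triggered the modern deep learning surge in computer vision by combining GPU training, ReLU activations, and dropout regularization.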

Winning ImageNet 2012 entry
Showed deep CNNs could scale
Popularized GPU-accelerated training
Helped trigger the deep learning revolution
Combined ReLU activations with dropout regularization

research-paper, deep-learning, vision, cnn

Sources

Major

Rectified Linear Units Improve Restricted Boltzmann Machines

2010-06-22

Vinod Nair and Geoffrey Hinton introduced rectified linear units for restricted Boltzmann machines, showing that they improve learning speed and learn better features than binary units for object recognition and face verification. The paper became one of the key early references behind modern deep learning activation design.

Early ReLU-based deep learning paper
Improved training speed and feature quality
Important precursor to later CNN practice
Helped normalize rectified activations in deep nets
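
The rectified activation itself can be written in a couple of lines. The sketch below shows only the deterministic function max(0, x) and compares its gradient with that of a sigmoid unit. The paper's RBM formulation uses a noisy variant of the unit, which this illustration omits, and the function names here are chosen for this example.

```python
import math


def relu(x):
    """Rectified linear unit: passes positive inputs, zeroes the rest."""
    return max(0.0, x)


def relu_grad(x):
    """Gradient of ReLU: 1 for positive inputs, 0 otherwise."""
    return 1.0 if x > 0 else 0.0


def sigmoid_grad(x):
    """Gradient of the logistic sigmoid, shown for comparison."""
    s = 1.0 / (1.0 + math.exp(-x))
    return s * (1.0 - s)


# For large positive inputs the sigmoid gradient shrinks toward zero,
# while the ReLU gradient stays at 1.
for x in (0.5, 5.0, 10.0):
    print(x, relu(x), relu_grad(x), sigmoid_grad(x))
```

The non-saturating gradient on the positive side is a commonly cited reason rectified units train faster than sigmoid units in deep networks.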

research-paper, deep-learning, machine-learning, neural-networks