Stochastic gradient descent (SGD) and variants from scratch: A Python hands-on experience
Stochastic gradient descent (SGD) is a simple yet very influential algorithm
used to find the minimum of a loss (cost) function that depends on datasets of large
cardinality, as is typically the case in deep learning (DL). In contrast to the
well-known gradient descent (GD) algorithm, SGD uses a noisy approximation of the gradient
(computed from only a small fraction of the dataset) instead of the full gradient;
the key motivation for this choice is speed.
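
As a rough illustration of the difference, a minimal NumPy sketch (grad_fn, lr, and
batch_size are illustrative placeholders, not part of the course material) might look
as follows:

    import numpy as np

    def gd_step(w, X, y, grad_fn, lr=0.1):
        # Full-batch GD: one step uses the gradient over the whole dataset.
        return w - lr * grad_fn(w, X, y)

    def sgd_step(w, X, y, grad_fn, rng, lr=0.1, batch_size=32):
        # SGD: the gradient is approximated on a small random mini-batch;
        # each step is much cheaper than a GD step, at the price of noise.
        idx = rng.choice(X.shape[0], size=batch_size, replace=False)
        return w - lr * grad_fn(w, X[idx], y[idx])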
There exist several variants of/improvements over the “vanilla” SGD, such as Adagrad,
RMSprop, Adadelta, Adam, Nadam, etc.; from a high-level perspective, all of these variants
may be understood as using an adaptive, element-wise step size.
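
Adagrad offers perhaps the simplest example of this idea; a minimal sketch (with
illustrative variable names, assuming the squared-gradient accumulator accum is carried
between steps) could be:

    import numpy as np

    def adagrad_step(w, g, accum, lr=0.01, eps=1e-8):
        # Accumulate squared gradients element-wise; coordinates with a large
        # gradient history automatically receive a smaller effective step size.
        accum = accum + g ** 2
        return w - lr * g / (np.sqrt(accum) + eps), accum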
The primary objective of this course is to give an overview of the essential
theoretical aspects related to SGD and its variants in order to program them in Python,
from scratch (i.e., not based on DL libraries such as TensorFlow, PyTorch, or MXNet).
In order to test these implementations, the MNIST and CIFAR-10 datasets will be used
along with shallow networks (consisting of hidden layers and a softmax as the last layer).
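
For reference, the softmax layer mentioned above is short enough to write directly in
NumPy; a minimal, numerically stable sketch (not the course's actual implementation) is:

    import numpy as np

    def softmax(z):
        # Shifting by the row-wise max does not change the output, but it
        # avoids overflow in np.exp for large logits.
        z = z - z.max(axis=1, keepdims=True)
        e = np.exp(z)
        return e / e.sum(axis=1, keepdims=True)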
Requirements
- A laptop with Python. If you have a graphics card (CUDA), the CUDA/NVIDIA
  suite should be installed.
- A Gmail account to use Colab (equivalent to a Python virtual environment), or
  a local installation of the standard modules (e.g., numpy, scipy, matplotlib,
  pyfftw, pickle, etc.) plus the Jupyter and TensorFlow modules (and dependencies).
About the instructor
Dr. Paul Rodriguez, Pontificia Universidad Católica del Perú
He received the BSc degree in electrical engineering from the
Pontificia Universidad Católica del Perú (PUCP), Lima, Peru, in 1997, and the MSc and PhD
degrees in electrical engineering from the University of New Mexico, USA, in 2003 and 2005,
respectively.
He spent two years (2005-2007) as a postdoctoral researcher at Los Alamos National
Laboratory, and is currently a Full Professor with the Department of Electrical Engineering
at PUCP.
His research interests include AM-FM models, parallel algorithms, adaptive signal
decompositions, and optimization algorithms for inverse problems in signal and image
processing, such as total variation, basis pursuit, principal component pursuit (a.k.a.
robust PCA), convolutional sparse representations, extreme learning machines, etc.