Optimization algorithms, understanding dropout and batch normalization

In this section, we will explain the different methods for computing moving averages according to the stationarity of the process, and relate these results to the optimization methods used in deep learning. We will also discuss several techniques proposed in recent years to tune the learning rate in a more systematic way. Additionally, we will motivate why dropout is regarded as a regularization technique and explain the underlying effects of its use. We will consider some simple neural networks for which a better understanding of dropout is available and extend this interpretation to other deep networks. Finally, we will describe “batch normalization” as a technique to speed up training in deep networks, and, very briefly, we will present gradient checking as a way to validate the backpropagation of gradients.
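As a small illustration of the link between moving averages and optimizers, the sketch below contrasts a cumulative average, which weighs all past values equally and suits a stationary signal, with an exponentially weighted moving average of the kind used inside momentum SGD and Adam. The decay rate `beta`, the bias-correction option, and the toy gradient stream are illustrative assumptions, not taken from the original text.

```python
def cumulative_average(values):
    """Running mean: every past value gets the same weight (stationary case)."""
    avg, averages = 0.0, []
    for t, v in enumerate(values, start=1):
        avg += (v - avg) / t          # incremental update of the mean
        averages.append(avg)
    return averages


def ewma(values, beta=0.9, bias_correct=True):
    """Exponentially weighted moving average: old values are forgotten.

    With bias correction, this is the form used for the moment estimates
    in Adam; without it, it resembles the velocity term in momentum SGD.
    """
    m, averages = 0.0, []
    for t, v in enumerate(values, start=1):
        m = beta * m + (1.0 - beta) * v
        averages.append(m / (1.0 - beta ** t) if bias_correct else m)
    return averages


if __name__ == "__main__":
    # Toy "gradient" stream whose mean shifts halfway through (non-stationary).
    grads = [1.0] * 5 + [3.0] * 5
    print(cumulative_average(grads))   # adapts slowly to the shift
    print(ewma(grads, beta=0.9))       # tracks the new level faster
```

On a non-stationary stream such as the one above, the exponentially weighted average tracks the recent level much faster than the cumulative mean, which is why optimizers that must adapt to changing gradient statistics rely on the former.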

Class: Data Science 2: Advanced Topics in Data Science
