Last updated on April 6, 2021, 8:07 p.m. by vinit
This paper analyzes the effect of dropout on different properties of a neural network and describes how dropout interacts with the network's hyperparameters.
Although dropout alone gives significant improvements, combining it with max-norm regularization, large decaying learning rates, and high momentum provides a further boost over dropout alone.
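The max-norm regularization mentioned above constrains the norm of each hidden unit's incoming weight vector to lie below a constant c. A minimal sketch (assuming weights are stored with one unit's incoming weights per row; the function name and the value c=3.0 are illustrative, not from the paper's code):

```python
import numpy as np

rng = np.random.default_rng(0)

def max_norm(W, c=3.0):
    """Project each unit's incoming weight vector onto the ball ||w|| <= c.

    Assumes each row of W holds the incoming weights of one hidden unit;
    c is the max-norm constant.
    """
    norms = np.linalg.norm(W, axis=1, keepdims=True)
    scale = np.minimum(1.0, c / np.maximum(norms, 1e-12))
    return W * scale

# After each SGD-with-momentum step, re-project the weights:
W = rng.normal(size=(4, 8)) * 5.0   # deliberately oversized weights
W = max_norm(W, c=3.0)
```

The projection is applied after every gradient update, which is what allows the large learning rates and high momentum: the weights can move quickly but can never blow up.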
The best performing convolutional nets that do not use dropout achieve an error rate of 3.95%. Adding dropout only to the fully connected layers reduces the error to 3.02%. Adding dropout to the convolutional layers as well further reduces the error to 2.55%. Even more gains can be obtained by using maxout units.
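At the layer level, dropout simply zeroes each unit's activation with probability 1 − p at training time. A minimal sketch of the forward pass, using the common "inverted dropout" variant that rescales at train time so test time needs no change (the paper instead scales the weights down by p at test time; the two are equivalent in expectation):

```python
import numpy as np

rng = np.random.default_rng(0)

def dropout_forward(x, p_retain=0.5, train=True):
    """Inverted dropout: zero each unit with probability 1 - p_retain at
    train time and rescale the survivors by 1/p_retain, so the expected
    activation matches and test time is just the identity."""
    if not train:
        return x
    mask = rng.random(x.shape) < p_retain
    return x * mask / p_retain

h = np.ones((2, 6))                       # activations of a fully connected layer
h_train = dropout_forward(h, p_retain=0.5, train=True)
h_test = dropout_forward(h, train=False)  # identity at test time
```

Applying this after the fully connected layers gives the 3.95% → 3.02% step above; applying it to the convolutional layers as well gives the further drop to 2.55%.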
Method                                              CIFAR-10   CIFAR-100
ConvNet + max pooling                                  15.60       43.48
ConvNet + stochastic pooling                           15.13       42.51
ConvNet + max pooling                                  14.98           -
ConvNet + max pooling + dropout fully connected        14.32       41.26
ConvNet + max pooling + dropout in all layers          12.61       37.20
ConvNet + maxout                                       11.68       38.57

Table 4: Error rates on CIFAR-10 and CIFAR-100.
A 4-layer net pretrained with a stack of RBMs gets a phone error rate of 22.7%. With dropout, this reduces to 19.7%. Similarly, for an 8-layer net the error reduces from 20.5% to 19.7%.

Method                                   Phone Error Rate %
NN                                                     23.4
Dropout NN                                             21.8
DBN-pretrained NN                                      22.7
DBN-pretrained NN                                      22.4
DBN-pretrained NN                                      20.7
mcRBM-DBN-pretrained NN                                20.5
DBN-pretrained NN + dropout                            19.7
DBN-pretrained NN + dropout                            19.7
Results on a Text Data Set

To test the usefulness of dropout in the text domain, we used dropout networks to train a document classifier.
Method                                         Test Classification error %
L2                                                                    1.62
L2 + L1 applied towards the end of training                           1.60
L2 + KL-sparsity                                                      1.55
Max-norm                                                              1.35
Dropout + L2                                                          1.25
Dropout + Max-norm                                                    1.05

Comparison of different regularization methods on MNIST.

We also see how the advantages obtained from dropout vary with the probability of retaining units, the size of the network, and the size of the training set.
In our experiments, we use CD-1 for training dropout RBMs.

Effect on Features

Dropout in feed-forward networks improved the quality of features by reducing co-adaptations.
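One CD-1 update for a dropout RBM can be sketched as follows: sample a dropout mask over the hidden units, run the usual positive and negative phases with that mask applied, and update with the contrastive-divergence gradient estimate. This is a toy numpy sketch under assumed conventions (binary units, mask resampled per minibatch, illustrative names), not the paper's implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cd1_dropout_step(v0, W, b, c, p_retain=0.5, lr=0.1):
    """One CD-1 update for a binary RBM whose hidden units are dropped
    with probability 1 - p_retain (a sketch; the masking convention and
    variable names are assumptions)."""
    r = (rng.random(c.shape) < p_retain).astype(float)   # dropout mask on hiddens
    # Positive phase: hidden probabilities, masked by r.
    h0 = sigmoid(v0 @ W + c) * r
    # Negative phase: sample hiddens, reconstruct visibles, recompute hiddens.
    h0_s = (rng.random(h0.shape) < h0).astype(float)
    v1 = sigmoid(h0_s @ W.T + b)
    h1 = sigmoid(v1 @ W + c) * r
    # Contrastive-divergence gradient estimates, averaged over the batch.
    W += lr * (v0.T @ h0 - v1.T @ h1) / v0.shape[0]
    b += lr * (v0 - v1).mean(axis=0)
    c += lr * (h0 - h1).mean(axis=0)
    return W, b, c

v0 = (rng.random((8, 6)) < 0.5).astype(float)  # toy batch of binary visibles
W = 0.01 * rng.normal(size=(6, 4))
b, c = np.zeros(6), np.zeros(4)
W, b, c = cd1_dropout_step(v0, W, b, c)
```

Dropped-out hidden units contribute nothing to either phase, so their weights receive no update on that minibatch, mirroring how dropout behaves in feed-forward training.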
Important points: