Chi / Larissa Face Detection #9 – Cutting Cloud Costs with Infrastructure Automation (Part IV – Preparing and Training Code on AWS with Drop Out – MNIST)

I said in the last post that I was ready to try this on our faces. I’m totally not ready – I’m just missing one last thing I wanted to try. TFlearn gives us the option of easily including dropout in the fully connected layers of the NN, and I wanted to explore that really quickly before moving on because I’ve read about it and it seems super easy to implement.

Dropout

Dropout is a simple regularization concept in NNs. When we implement dropout, we’re telling the NN to basically pretend that a random few of the neurons in the fully connected layer don’t exist. This forces the remaining neurons (the ones that haven’t been dropped out) to adjust their weights / thresholds to work with a different set of neighbours in every iteration of training.
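
To make the idea a bit more concrete, here’s a rough NumPy sketch of the concept – just the idea with made-up numbers, not TFlearn’s actual implementation. The rescaling by the keep probability is the common “inverted dropout” trick so the surviving activations keep the same expected scale:

import numpy as np

# Conceptual sketch of dropout (not how TFlearn does it internally).
# During training, each neuron's activation is kept with probability keep_prob
# and zeroed out otherwise, so a different random subset of neurons "exists"
# on every iteration.
np.random.seed(0)

activations = np.random.rand(4, 1024)  # pretend output of a fully connected layer (batch of 4)
keep_prob = 0.8                        # same "keep" probability we pass to TFlearn's dropout()

mask = np.random.rand(*activations.shape) < keep_prob  # keep ~80% of neurons at random
dropped = activations * mask / keep_prob               # zero out the rest, rescale the survivors

print 'Fraction of neurons kept this iteration: {:.3f}'.format(mask.mean())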

I like to think of this method as employing the same logic that subsampling does in tree techniques like random forests. When we subsample in a random forest, each tree is trained on only a subset of samples and / or features. This forces features to prove themselves alongside some features while being cut off from others, which gives the model a bit more perspective on which features are truly important.

Dropout in NNs similarly forces features (or linear combinations of features, depending on how deep we get into our fully connected layers) to play nicely with different subsets of other features, so by the end of training we have a better sense of which ones really make an impact in every situation.

Dropout in TFlearn is super simple. There’s literally a dropout function where you tell it what fraction of the neurons to keep in every iteration, and TFlearn handles the rest by dropping out random neurons for you.

It’s actually been in the code this whole time because sentdex used dropout in his model and I just copied him, but I’ve been commenting it out until now. All the code below is exactly the same as the previous post except for the dropout line when building our NN.

In [1]:
# Install tflearn when working on a fresh AWS Deep Learning AMI (which doesn't come with TFlearn)
import os
os.system("sudo pip install tflearn")
Out[1]:
0
In [2]:
# TFlearn libraries
import tflearn
from tflearn.layers.conv import conv_2d, max_pool_2d
from tflearn.layers.core import input_data, dropout, fully_connected
from tflearn.layers.estimator import regression
import tflearn.datasets.mnist as mnist

# General purpose libraries
import matplotlib.pyplot as plt
import numpy as np
import math
In [3]:
# Extract data from mnist.load_data()
x, y, x_test, y_test = mnist.load_data(one_hot = True)
Extracting mnist/train-images-idx3-ubyte.gz
Extracting mnist/train-labels-idx1-ubyte.gz
Extracting mnist/t10k-images-idx3-ubyte.gz
Extracting mnist/t10k-labels-idx1-ubyte.gz
In [4]:
# Reshape x
x_reshaped = x.reshape([-1, 28, 28, 1])
print 'x_reshaped has the shape {}'.format(x_reshaped.shape)
x_reshaped has the shape (55000, 28, 28, 1)
In [5]:
# Reshape x_test
x_test_reshaped = x_test.reshape([-1, 28, 28, 1])
print 'x_test_reshaped has the shape {}'.format(x_test_reshaped.shape)
x_test_reshaped has the shape (10000, 28, 28, 1)
In [6]:
# sentdex's code to build the neural net using tflearn
#   Input layer --> conv layer w/ max pooling --> conv layer w/ max pooling --> fully connected layer --> output layer
convnet = input_data(shape = [None, 28, 28, 1], name = 'input')

convnet = conv_2d(convnet, 32, 2, activation = 'relu')
convnet = max_pool_2d(convnet, 2)

convnet = conv_2d(convnet, 64, 2, activation = 'relu')
convnet = max_pool_2d(convnet, 2)

convnet = fully_connected(convnet, 1024, activation = 'relu')
convnet = dropout(convnet, 0.8)

convnet = fully_connected(convnet, 10, activation = 'softmax')
convnet = regression(convnet, optimizer = 'sgd', learning_rate = 0.01, loss = 'categorical_crossentropy', name = 'targets')
In [7]:
model = tflearn.DNN(convnet)
model.fit(
    {'input': x_reshaped}, 
    {'targets': y}, 
    n_epoch = 5, 
    validation_set = ({'input': x_test_reshaped}, {'targets': y_test}), 
    snapshot_step = 500, 
    show_metric = True
)
Training Step: 4299  | total loss: 0.11463 | time: 11.579s
| SGD | epoch: 005 | loss: 0.11463 - acc: 0.9676 -- iter: 54976/55000
Training Step: 4300  | total loss: 0.11163 | time: 12.897s
| SGD | epoch: 005 | loss: 0.11163 - acc: 0.9693 | val_loss: 0.09128 - val_acc: 0.9733 -- iter: 55000/55000
--

Results

Nice. We’re at 97.3% and we’ve gained like 0.7%, which is actually a big deal in this application! At this level, we’re basically trying to squeeze every 1/10th of a percent we can get haha.
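
If I wanted to double-check that validation number myself, or keep the trained model around instead of retraining every time I spin up an instance, tflearn’s DNN object has evaluate(), predict(), and save() methods. Something like this should do it (just a sketch I haven’t folded into the notebook above – the filename is made up):

# Sketch only -- double-checking the test accuracy and saving the trained model
score = model.evaluate(x_test_reshaped, y_test)
print 'Test set accuracy: {}'.format(score)

# Eyeball a single prediction
prediction = model.predict(x_test_reshaped[0].reshape([-1, 28, 28, 1]))
print 'Predicted digit: {}, actual digit: {}'.format(np.argmax(prediction), np.argmax(y_test[0]))

# Save the weights so we don't have to retrain on the next AWS session
model.save('mnist_dropout_model.tflearn')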

If the USPS or Canada Post were using this algorithm to read the handwritten characters that make up addresses, that 0.7% on a million letters is the difference between 7,000 people receiving the correct mail or not. Sure, in the grand scheme of things, maybe 7,000 isn’t too bad, but the USPS actually processes something like 21.1M mailpieces per hour, which works out to roughly 147,700 misclassified pieces of mail every hour. All of a sudden, every tenth of a percent is looking pretty solid right now.
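
Just to spell out the back-of-the-envelope math behind those numbers (the hourly volume is approximate):

# Back-of-the-envelope math on what a 0.7% accuracy gain means at USPS volume
letters_per_hour = 21.1e6  # approximate USPS hourly processing volume
accuracy_gain = 0.007      # the ~0.7% improvement we saw after adding dropout

print 'Extra pieces routed correctly per hour: {:,.0f}'.format(letters_per_hour * accuracy_gain)
# Extra pieces routed correctly per hour: 147,700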

I actually thought that dropout would decrease the training time, but it seems here that it actually added time. I’m not sure if this is the overhead of picking and zeroing out the dropped nodes or something else, and I won’t pretend I know exactly how TFlearn performs this process. In fact, I won’t rule out that it’s an uncontrolled variable that caused this – there could have been some external factor that made this run take longer. I’ll leave it for now, but I wanted to make note of it!
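
If I wanted to rule out the uncontrolled-variable explanation, one simple check would be to wrap the same fit call in a timer and run it twice on the same instance, with and without the dropout line – again just a sketch, I haven’t actually run this comparison:

import time

# Sketch: run this twice on the same instance -- once with dropout(convnet, 0.8)
# in the network and once with that line commented out -- and compare elapsed times.
start = time.time()
model.fit(
    {'input': x_reshaped},
    {'targets': y},
    n_epoch = 5,
    validation_set = ({'input': x_test_reshaped}, {'targets': y_test}),
    snapshot_step = 500,
    show_metric = True
)
print 'Training took {:.1f} seconds'.format(time.time() - start)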
