Chi / Larissa Face Detection #12 – Building & Training Convolutional Neural Network (AWS)


Note that this same notebook crashed my laptop when I tried to train my CNN, so I’m migrating it onto AWS. The code here is nearly a copy of the previous notebook in this series, except for this preface and the section after the model has been successfully trained.

In [1]:
# Install tflearn
import os
os.system("sudo pip install tflearn tqdm boto3 opencv-python")

Feature Building

In [2]:
import cv2
import numpy as np
import pandas as pd
import urllib
import math
import boto3
import os
import copy
from tqdm import tqdm
from matplotlib import pyplot as plt
%matplotlib inline
In [3]:
# Connect to s3 bucket
s3 = boto3.resource('s3', region_name = 'ca-central-1')
my_bucket = s3.Bucket('2017edmfasatb')
In [4]:
# Get all files in the project directory under chi_lars_face_detection/photos/
chi_photos = [i.key for i in my_bucket.objects.all() if 'chi_lars_face_detection/photos/chi/' in i.key]
lars_photos = [i.key for i in my_bucket.objects.all() if 'chi_lars_face_detection/photos/lars/' in i.key]
In [6]:
# Define function to convert URL to numpy array
def url_to_image(url):
    # Download the image, convert it to a numpy array, and then read it into OpenCV format
    resp = urllib.urlopen(url)
    image = np.asarray(bytearray(resp.read()), dtype="uint8")
    image = cv2.imdecode(image, cv2.IMREAD_GRAYSCALE)
    # Rotate image
    image = np.rot90(image, 3)
    # Build resize into function
    image = cv2.resize(image, (0,0), fx=0.03, fy=0.03)

    # Return the image
    return image
In [10]:
# Loop through all files to download into a single array from AWS
url_prefix = ''
In [11]:
# Trying out the new tqdm library for progress bar
chi_photos_list = [url_to_image(os.path.join(url_prefix, x)) for x in tqdm(chi_photos)]
100%|██████████| 203/203 [04:20<00:00,  1.36s/it]
In [12]:
# Trying out the new tqdm library for progress bar
lars_photos_list = [url_to_image(os.path.join(url_prefix, x)) for x in tqdm(lars_photos)]
100%|██████████| 200/200 [04:21<00:00,  1.22s/it]
In [13]:
# Convert to numpy arrays
chi_photos_np = np.array(chi_photos_list)
lars_photos_np = np.array(lars_photos_list)
In [14]:
# Temporarily save np arrays
np.save('chi_photos_np_0.03_compress', chi_photos_np)
np.save('lars_photos_np_0.03_compress', lars_photos_np)
In [15]:
# Temporarily load from np arrays
chi_photos_np = np.load('chi_photos_np_0.03_compress.npy')
lars_photos_np = np.load('lars_photos_np_0.03_compress.npy')
In [16]:
# View shape of numpy array
chi_photos_np.shape
(203, 91, 91)
In [17]:
# Set width var
width = chi_photos_np.shape[-1]

Scaling Inputs

In [19]:
# Try out scaler on a manually set data (min of 0, max of 255)
from sklearn.preprocessing import MinMaxScaler
In [20]:
# Set test data list to train on (min of 0, max of 255)
test_list = np.array([0, 255]).reshape(-1, 1)
test_list
array([[  0],
       [255]])
In [21]:
# Initialize scaler
scaler = MinMaxScaler()
In [22]:
# Fit test list
scaler.fit(test_list)
/Users/chiwang/anaconda/lib/python2.7/site-packages/sklearn/utils/ DataConversionWarning: Data with input dtype int64 was converted to float64 by MinMaxScaler.
  warnings.warn(msg, DataConversionWarning)
MinMaxScaler(copy=True, feature_range=(0, 1))
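
As a rough illustration of what the fitted scaler does to each pixel, here is a minimal plain-Python sketch of min-max scaling, assuming the same fit range of 0 to 255. The helper name `min_max_scale` is made up for this example and is not part of scikit-learn.

```python
def min_max_scale(value, data_min=0, data_max=255, feature_range=(0.0, 1.0)):
    """Rescale one value from [data_min, data_max] into feature_range.

    Hypothetical helper that applies the usual min-max formula:
    scaled = (x - min) / (max - min) * (hi - lo) + lo
    """
    lo, hi = feature_range
    return (value - data_min) / float(data_max - data_min) * (hi - lo) + lo


# Pixel intensities are 8-bit, so 0 maps to 0.0 and 255 maps to 1.0
for pixel in (0, 135, 255):
    print('{} -> {:.8f}'.format(pixel, min_max_scale(pixel)))
```

With a 0 to 255 range and a (0, 1) target, this reduces to dividing each pixel by 255.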

Reshaping 3D Array To 4D Array

In [24]:
chi_photos_np.reshape(-1, width, width, 1).shape
(203, 91, 91, 1)

Putting It All Together

In [25]:
# Reshape to prepare for scaler
chi_photos_np_flat = chi_photos_np.reshape(1, -1)
chi_photos_np_flat
array([[135, 139, 139, ..., 210, 142, 136]], dtype=uint8)
In [26]:
# Scale
chi_photos_np_scaled = scaler.transform(chi_photos_np_flat)
chi_photos_np_scaled
array([[ 0.52941176,  0.54509804,  0.54509804, ...,  0.82352941,
         0.55686275,  0.53333333]])
In [27]:
# Reshape to prepare for scaler
lars_photos_np_flat = lars_photos_np.reshape(1, -1)
lars_photos_np_scaled = scaler.transform(lars_photos_np_flat)

Now let’s reshape.

In [28]:
# Reshape
chi_photos_reshaped = chi_photos_np_scaled.reshape(-1, width, width, 1)
lars_photos_reshaped = lars_photos_np_scaled.reshape(-1, width, width, 1)

print('{} has shape: {}'. format('chi_photos_reshaped', chi_photos_reshaped.shape))
print('{} has shape: {}'. format('lars_photos_reshaped', lars_photos_reshaped.shape))
chi_photos_reshaped has shape: (203, 91, 91, 1)
lars_photos_reshaped has shape: (200, 91, 91, 1)
In [29]:
# Create copy of chi's photos to start populating x_input
x_input = copy.deepcopy(chi_photos_reshaped)

print('{} has shape: {}'. format('x_input', x_input.shape))
x_input has shape: (203, 91, 91, 1)
In [30]:
# Concatenate lars' photos to existing x_input
x_input = np.append(x_input, lars_photos_reshaped, axis = 0)

print('{} has shape: {}'. format('x_input', x_input.shape))
x_input has shape: (403, 91, 91, 1)

Preparing Labels

In [31]:
# Create label arrays
y_chi = np.array([[1, 0] for i in chi_photos_reshaped])
y_lars = np.array([[0, 1] for i in lars_photos_reshaped])

print('{} has shape: {}'. format('y_chi', y_chi.shape))
print('{} has shape: {}'. format('y_lars', y_lars.shape))
y_chi has shape: (203, 2)
y_lars has shape: (200, 2)
In [32]:
# Preview the first few elements
y_chi[:5]
array([[1, 0],
       [1, 0],
       [1, 0],
       [1, 0],
       [1, 0]])
In [33]:
y_lars[:5]
array([[0, 1],
       [0, 1],
       [0, 1],
       [0, 1],
       [0, 1]])
In [34]:
# Create copy of chi's labels to start populating y_input
y_input = copy.deepcopy(y_chi)

print('{} has shape: {}'. format('y_input', y_input.shape))
y_input has shape: (203, 2)
In [35]:
# Concatenate lars' labels to existing y_input
y_input = np.append(y_input, y_lars, axis = 0)

print('{} has shape: {}'. format('y_input', y_input.shape))
y_input has shape: (403, 2)


I’m going to just copy and paste the CNN structure I used for the MNIST tutorial and see what happens. I’m running this on my own laptop, by the way, so let’s observe the speed.

In [36]:
# TFlearn libraries
import tflearn
from tflearn.layers.conv import conv_2d, max_pool_2d
from tflearn.layers.core import input_data, dropout, fully_connected
from tflearn.layers.estimator import regression
In [37]:
# sentdex's code to build the neural net using tflearn
#   Input layer --> conv layer w/ max pooling --> conv layer w/ max pooling --> fully connected layer --> output layer
convnet = input_data(shape = [None, width, width, 1], name = 'input')

convnet = conv_2d(convnet, 32, 10, activation = 'relu')
convnet = max_pool_2d(convnet, 2)

convnet = conv_2d(convnet, 64, 10, activation = 'relu')
convnet = max_pool_2d(convnet, 2)

convnet = fully_connected(convnet, 1024, activation = 'relu')
convnet = dropout(convnet, 0.8)

convnet = fully_connected(convnet, 2, activation = 'softmax')
convnet = regression(convnet, optimizer = 'sgd', learning_rate = 0.01, loss = 'categorical_crossentropy', name = 'targets')

Train Test Split

I’m just going to do a 90 / 10 train test split here:

  • My training data will consist of roughly 360 training images
  • My test data will consist of roughly 40 test images
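
To see where the "roughly 360 / 40" figures come from, here is a small arithmetic sketch for the 403 images built above. It assumes the test count is rounded up and the remainder goes to training, which is how I understand scikit-learn to handle a fractional `test_size`. The helper `split_sizes` is a hypothetical name for this example only.

```python
import math


def split_sizes(n_samples, test_size):
    """Return (n_train, n_test) for a fractional test_size.

    Assumption: the test count is rounded up and the rest is training data.
    """
    n_test = int(math.ceil(test_size * n_samples))
    return n_samples - n_test, n_test


n_train, n_test = split_sizes(403, 0.1)
print('train: {}, test: {}'.format(n_train, n_test))
```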
In [38]:
# Import library
from sklearn.cross_validation import train_test_split
In [39]:
# Check the final shapes of the inputs and labels
print(x_input.shape)
print(y_input.shape)
(403, 91, 91, 1)
(403, 2)
In [40]:
# Perform train test split
x_train, x_test, y_train, y_test = train_test_split(x_input, y_input, test_size = 0.1, stratify = y_input)


Let’s try training with 3 epochs.

In [41]:
# Train with data
model = tflearn.DNN(convnet)
model.fit(
    {'input': x_train},
    {'targets': y_train},
    n_epoch = 3,
    validation_set = ({'input': x_test}, {'targets': y_test}),
    snapshot_step = 500,
    show_metric = True
)
Training Step: 35  | total loss: 0.01546 | time: 123.529s
| SGD | epoch: 003 | loss: 0.01546 - acc: 0.9999 -- iter: 704/724
Training Step: 36  | total loss: 0.01470 | time: 141.459s
| SGD | epoch: 003 | loss: 0.01470 - acc: 0.9999 | val_loss: 0.00820 - val_acc: 1.0000 -- iter: 724/724
In [42]:
# Save model
model.save('model_4_epochs_0.03_compression_99.6.tflearn')
INFO:tensorflow:/Users/chiwang/Documents/Projects/Dev/chi_lars_face_detection/notebook/model_4_epochs_0.03_compression_99.6.tflearn is not in all_model_checkpoint_paths. Manually adding it.


Okay, so that was quite the wild ride. I’ve gotten something to work and it’s giving me 99.99% training accuracy (0.0147 loss). The loss is categorical cross entropy (measuring how far the predicted class probabilities are from the one-hot labels, something to the tune of $D=-\sum_{k=1}^{K} y_{k}\log{\hat{p}_{k}}$), so that seems like quite a good loss value to have. Let’s try to predict on our test set and generate a simple confusion matrix just to make sure we’re sane.
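
To make that loss value a bit more concrete, here is a minimal plain-Python sketch of categorical cross entropy for a single one-hot label. It is not TFlearn's implementation, and the function name `categorical_cross_entropy` is a hypothetical helper for illustration. The confident probability pair is the first row of the predictions printed further down in this post.

```python
import math


def categorical_cross_entropy(y_true, y_pred, eps=1e-12):
    """Cross entropy between a one-hot label and predicted probabilities."""
    # eps guards against log(0) when a probability is exactly zero
    return -sum(t * math.log(max(p, eps)) for t, p in zip(y_true, y_pred))


# A confident, correct prediction for the [0, 1] ("lars") class
confident = categorical_cross_entropy([0, 1], [0.0052014, 0.99479854])
# A coin-flip prediction for the same label
unsure = categorical_cross_entropy([0, 1], [0.5, 0.5])

print('confident: {:.5f}'.format(confident))
print('unsure:    {:.5f}'.format(unsure))
```

A coin flip between two classes costs log(2), about 0.69, so a loss in the low hundredths suggests the network is assigning nearly all of the probability to the correct class.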

In [46]:
# Predict on test set, generating probabilities of each class (one-hot style)
y_pred_proba = model.predict(x_test)
y_pred_proba[:10]
array([[ 0.0052014 ,  0.99479854],
       [ 0.00778673,  0.99221331],
       [ 0.00471839,  0.99528164],
       [ 0.00564974,  0.99435025],
       [ 0.00522123,  0.99477875],
       [ 0.00666202,  0.99333799],
       [ 0.00466673,  0.99533325],
       [ 0.00777068,  0.99222928],
       [ 0.98755813,  0.01244185],
       [ 0.99169731,  0.00830262]], dtype=float32)
In [54]:
# Convert probabilities to direct predictions
y_pred_labels = np.array(['chi' if y[0] >= 0.5 else 'lars' for y in y_pred_proba])
y_pred_labels[:10]
array(['lars', 'lars', 'lars', 'lars', 'lars', 'lars', 'lars', 'lars',
       'chi', 'chi'], 
In [56]:
# Convert y_test to direct labels
y_test_labels = np.array(['chi' if y[0] >= 0.5 else 'lars' for y in y_test])
y_test_labels[:10]
array(['lars', 'lars', 'lars', 'lars', 'lars', 'lars', 'lars', 'lars',
       'chi', 'chi'], 

It looks like it’s gotten the first 10 right so far… creepy. Let’s make a confusion matrix.

In [58]:
# Confusion matrix
from sklearn.metrics import confusion_matrix
confusion_matrix(y_test_labels, y_pred_labels)
array([[35,  0],
       [ 0, 47]])
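
For anyone unsure how to read that matrix, here is a small plain-Python sketch of the counting behind it. Rows are the true labels and columns are the predicted labels, with labels in alphabetical order ('chi' first, then 'lars'). The helpers `confusion_counts` and `accuracy` are hypothetical names for this illustration, not scikit-learn functions.

```python
def confusion_counts(y_true, y_pred, labels):
    """Build a confusion matrix as nested lists (rows: truth, columns: prediction)."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for truth, guess in zip(y_true, y_pred):
        matrix[index[truth]][index[guess]] += 1
    return matrix


def accuracy(matrix):
    """Fraction of samples that sit on the diagonal."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / float(total)


# The first ten test labels and predictions shown above
y_true = ['lars'] * 8 + ['chi'] * 2
y_pred = ['lars'] * 8 + ['chi'] * 2
first_ten = confusion_counts(y_true, y_pred, labels=['chi', 'lars'])
print(first_ten)

# Accuracy for the full matrix reported above
full_accuracy = accuracy([[35, 0], [0, 47]])
print(full_accuracy)
```

Any off-diagonal count would be a misclassification, so two zeros there means every test photo was labelled correctly.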

Well, there’s that 100%… It guessed 35 / 35 of my photos and 47 / 47 of my girlfriend’s photos… Amazing? I really don’t know enough yet to answer that question.

On one hand…


I feel like the neural net can understand not only my face, but my life and my soul… I’m a bit scared, but it’s what I set out to do so I’m obviously very happy.

On the other hand, an experienced practitioner can probably scoff and laugh at how controlled my experiment was…

  • I had a small training sample size
  • I had a small test sample size
  • I had the same background
  • We wore the same clothes
  • Accessories and hairstyles were all the same
  • Only the faces changed, not the face or head position relative to the photo frame

We’d probably want to test this with a few more photos with varying factors to understand a bit better how our NN operates. I’ll continue with this in the next post, but before that I want to talk about some very important details that I have glossed over.

Looping Back To The Details

There are many things I want to talk about because it took me basically an entire evening to actually get that thing to train.

The biggest problem I had was getting an “OOM” or Out-Of-Memory error. This took me so long to figure out, but the issue was that I was using way too many filters in my convolutional layers. Before, I had decided on using 100 filters in my first layer and 200 filters in my second layer. My logic here was clearly flawed, but it was based on the fact that we had 32 filters for our 28 x 28 MNIST data set, so I thought I should use a number of filters that was similar to the dimensions of my photo. This immediately caused an OOM error.
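
A back-of-the-envelope parameter count gives a feel for why the filter counts matter. This is only a sketch under stated assumptions: 10 x 10 kernels for both filter settings (the post only confirms them for the 32 / 64 network), 'same' padding so the convolutions keep the image size, and 2 x 2 max pooling that halves the size and rounds up (which I believe are TFlearn's defaults). The function `count_params` is a hypothetical helper.

```python
import math


def count_params(filters_1, filters_2, width=91, kernel=10, fc_nodes=1024, n_classes=2):
    """Count weights and biases for the two-conv-layer network used here.

    Assumptions: 'same' padding in the conv layers, and each 2x2 max pool
    halves the image size, rounding up.
    """
    conv_1 = kernel * kernel * 1 * filters_1 + filters_1
    conv_2 = kernel * kernel * filters_1 * filters_2 + filters_2
    pooled = int(math.ceil(math.ceil(width / 2.0) / 2.0))  # 91 -> 46 -> 23
    fc_1 = pooled * pooled * filters_2 * fc_nodes + fc_nodes
    fc_2 = fc_nodes * n_classes + n_classes
    return conv_1 + conv_2 + fc_1 + fc_2


small = count_params(32, 64)
large = count_params(100, 200)

# 4 bytes per float32 parameter; weights only, no activations or gradients
for name, n in (('32 / 64 filters', small), ('100 / 200 filters', large)):
    print('{}: {:,} parameters, ~{:.0f} MB'.format(name, n, n * 4 / 1e6))
```

Under these assumptions the 100 / 200 network has about three times as many parameters, nearly all of them in the first fully connected layer. Training also has to hold activations and gradients for every image in a batch, so the real memory footprint would be a good deal larger than the weights alone.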

The second problem was that I don’t quite understand how variables and objects are built and stored within TFlearn. After I ran the model that yielded the OOM error, I tried tweaking the parameters and kept getting the same error. I tried many different things: the number and size of the convolutional filters, the number of nodes in the fully connected layer, and the input size of the image (I thought that maybe ~100 x 100 pixels was too large an input, so I scaled the image even smaller to ~25 x 25 like MNIST), but the OOM error persisted. Eventually, I figured out that I needed to restart the kernel in the Jupyter notebook, because every time I built my NN I was building on top of the NN I had already built, just adding onto the first 100-filter convolutional layer. It turns out, in the end, that the 100 and 200 filters were what was causing the problems.

I also want to touch really quickly on the number of epochs. TFlearn gives us the benefit of seeing how training progresses through the epochs. Ideally, I would use cross validation to find the optimal number of epochs (not to mention all the other NN parameters), but I haven’t quite figured out / explored how to use GridSearchCV with TFlearn yet. I just watched the error metrics while the epochs trained, and 3 seemed to do the best: at one point during training it reached 100% accuracy, and the final result still yielded 100% accuracy.

At the end of the day, after I figured out the memory issues, I actually just ended up training the NN on my own laptop. At such a small sample size and only 3 epochs, the process took around 6 minutes. No big deal and I can save myself \$0.20 in the process :).

That’s all for now. Let’s test with a more complex test set in the next post!
