# Dog Breed Classification

By David Jimenez Barrero / Machine Learning

## Notes on the problem

This is a challenging problem: the classifier must correctly sort ~120 different classes, which is a complicated task. In my experience, it is unlikely that a single model can correctly classify that many classes. For this reason I propose an ensemble of models, where each one learns to separate a single class from the rest. For the models themselves, I propose Convolutional Neural Networks, given their extensive use in image classification.
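To make the ensemble idea concrete, here is a minimal sketch of the decision rule used later (the probability values and the function name are illustrative, not the real models): each sub-model reports the probability that a sample belongs to its own class, and the class whose sub-model is most confident wins.

```python
def ensemble_predict(per_class_probabilities):
    """per_class_probabilities: list where entry i is P(sample is class i)
    according to sub-model i. Returns the index of the winning class."""
    return max(range(len(per_class_probabilities)),
               key=lambda i: per_class_probabilities[i])

# Example: sub-model 2 is the most confident, so class 2 wins.
print(ensemble_predict([0.10, 0.35, 0.90, 0.05]))  # -> 2
```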

For this implementation I decided to switch to Keras: it is a higher-level library than TensorFlow, which allowed me to build a complex network faster.

Next, I briefly explain the functions used to create this network.

In [20]:
# Import packages

import os
import cv2
import pandas as pd
import fnmatch
import numpy as np
import random

import matplotlib.pyplot as plt
from itertools import cycle
from sklearn.metrics import roc_curve, auc

from keras.models import Sequential
from keras.layers import Dense
from keras.layers import Dropout
from keras.layers import Flatten
from keras.constraints import maxnorm
from keras.optimizers import SGD
from keras.layers.convolutional import Conv2D
from keras.layers.convolutional import MaxPooling2D
from keras import backend as K
# Use TensorFlow ('tf', channels-last) ordering, matching the
# input_shape=(height, width, 3) used when building the network
K.set_image_dim_ordering('tf')

In [21]:
def get_classes():
    ''' Function to get the classes' names and their folders '''

    # Get folders' names
    classes_raw = [name for name in os.listdir("Images")
                   if os.path.isdir("Images/" + name)]
    # Get classes' names (folders are named <id>-<breed>)
    classes = [c.split('-', 1)[-1] for c in classes_raw]

    # Make arrays into a pandas DataFrame
    classes_df = pd.DataFrame(data=classes, columns=['breeds'])
    classes_df['folder_name'] = classes_raw

    return classes_df, classes_raw


In the following function I sample the test set by randomly selecting 10% of the images in each class.

In [22]:
def sample_test_set(classes_raw):
    '''
    Sample the test set: 10% of the images in every class are selected
    randomly for the test set.
    '''
    # Initialize the test sets
    tst_img = []
    tst_clss = []

    class_count = 0  # Keeps track of the classes seen

    # Get 10% of the images in every class for the test set
    for breed in classes_raw:
        # Get list of filenames of images in the current class
        list_filenames = fnmatch.filter(os.listdir("Images/" + breed), '*.jpg')

        # Number of images in the class
        num_elem_class = len(list_filenames)

        # Number of elements that make up 10%
        num_elem_test_set = int(np.ceil(num_elem_class * .10))

        # Randomly select the 10% of the images to be in the test set
        elements_to_go_in_test = random.sample(range(0, num_elem_class),
                                               num_elem_test_set)

        image_ref = 0  # Keeps track of the image currently being sorted

        for filename in list_filenames:  # Iterate over files in the class folder

            path = "Images/" + breed + "/" + filename

            # Insert elements in the test set
            if image_ref in elements_to_go_in_test:
                tst_img.append(path)
                # Append the respective class index
                tst_clss.append(class_count)

            image_ref += 1

        class_count += 1

    return tst_img, tst_clss

In [23]:
def sample_target_class(target_class, tst_set, classes_raw):
    '''
    @Param target_class: Class which is the target of this sub-model
    @Param tst_set: Previously calculated test set
    @Param classes_raw: List of folder names for every class

    This function creates a list of all images in the target class folder,
    except for the images already present in the test set.
    '''
    # One-hot encode the (positive) class
    label = np.zeros(2)
    label[0] = 1

    # Initialize the arrays
    training_img = []
    training_clss = []

    folder = classes_raw[target_class]

    # Get list of filenames of images in the class
    list_filenames = fnmatch.filter(os.listdir("Images/" + folder), '*.jpg')

    for filename in list_filenames:
        # Create path
        path = "Images/" + folder + "/" + filename

        # Check the path was not selected for the test set
        if path not in tst_set:
            training_img.append(path)
            training_clss.append(label)

    return training_img, training_clss


Notice I do not store the images themselves, only the paths to them. Creating arrays of full images would put a severe strain on the computer's memory.
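A rough back-of-envelope, assuming this notebook's 700×700 RGB images stored as float32 (and, for the total, a dataset on the order of 20,000 images, an assumption used only for this estimate), shows why:

```python
# Memory cost of holding decoded images in RAM, assuming 700x700 RGB float32
bytes_per_image = 700 * 700 * 3 * 4           # H * W * channels * sizeof(float32)
mb_per_image = bytes_per_image / (1024 ** 2)
print(round(mb_per_image, 2))                 # -> 5.61 (MB per image)

# Assuming ~20,000 images in the dataset (illustrative figure only)
total_gb = bytes_per_image * 20000 / (1024 ** 3)
print(round(total_gb, 1))                     # -> 109.5 (GB if all held at once)
```

Paths, by contrast, are a few dozen bytes each, so keeping paths and decoding one batch at a time is the practical choice.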

In [24]:
def import_data(target_class, num_times_target, tst_img):
    '''
    One-vs-All scheme

    @Param target_class: Class which is the target of this sub-model
    @Param num_times_target: Number of times the target class is re-sampled
    to reduce class imbalance

    Two arrays are created:
    - Training array with paths to the images
    - Training array with the respective image classes

    Image classes are collapsed to a binary classification: either belonging
    to the target class or not.

    Notice I perform an indirect subsample of the classes different from the
    target, as I only select as many samples for the negative class as there
    are samples for the positive one. Since there are far more negatives than
    positives, this is a subsample.
    '''

    # Get classes from the folders' names
    _, classes_raw = get_classes()

    # Initialize the arrays
    training_img = []
    training_clss = []

    # The target class is re-sampled
    for _ in range(num_times_target):
        tr_img, tr_cl = sample_target_class(target_class, tst_img, classes_raw)
        training_img += tr_img
        training_clss += tr_cl

    num_elems_targ = len(training_img)
    # Count sampled elements for the negative class
    count_non_target = 0

    # Sample as many elements for the negative class as there are positives
    while count_non_target <= num_elems_targ:

        # Select a random folder
        random_folder = random.sample(classes_raw, 1)[0]

        # Get list of filenames of images in the current class
        list_filenames = fnmatch.filter(os.listdir("Images/" + random_folder),
                                        '*.jpg')

        # Select a random image from the folder
        random_img = random.sample(list_filenames, 1)[0]

        # Build the path to the image
        path = "Images/" + random_folder + "/" + random_img

        # Make sure the path is in neither the test set nor the training set
        if (path not in tst_img) and (path not in training_img):

            # One-hot encode the negative class
            label = np.zeros(2)
            label[1] = 1

            # Append data
            training_img.append(path)
            training_clss.append(label)

            count_non_target += 1

    return training_img, training_clss


In the following function, I import and process an image given a path and a desired height and width. Two remarks:

• Larger images: the function crops a larger image down to the desired dimensions. It keeps the center of the image and discards the borders that exceed the dimensions, since the target (the dog) is most likely located near the center of the image.

• Smaller images: Convolutional Neural Networks are translation-invariant models, designed to detect features regardless of their spatial location. For this reason, I pad small images with a reflection of the image itself. This way, I can reinforce the detection of features, as they can appear 2, 3 or even 4 times with the reflection technique.
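The reflection rule used below (cv2.BORDER_REFLECT_101, which mirrors around the edge pixel without repeating it) can be sketched in one dimension. This is an illustrative pure-Python version, valid as long as the padding is smaller than the signal length:

```python
def reflect_pad_right(signal, pad):
    """Mirror-pad a 1D signal on the right, excluding the edge sample
    (the rule cv2.BORDER_REFLECT_101 follows). Assumes pad < len(signal)."""
    out = list(signal)
    for k in range(pad):
        # Reflect around the last original sample: ...c, b, a
        out.append(signal[-2 - k])
    return out

print(reflect_pad_right([1, 2, 3, 4], 3))  # -> [1, 2, 3, 4, 3, 2, 1]
```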

In [25]:
def load_and_process_image(arb_height, arb_width, path):
    '''
    This function loads an image and makes it of a predefined size
    arb_height x arb_width. If the image is larger than this size,
    a window of size arb_height x arb_width is placed in the center of
    the image, which is then cropped. If conversely the image is smaller,
    a padding is added using the reflection of the image at the border.
    This works well with Convolutional Neural Networks, as they extract
    features regardless of location.
    '''
    # Load the image from disk
    img = cv2.imread(path)

    # Get dimensions (the channels are the RGB)
    height, width, channels = img.shape

    # If the image is larger than arb_height or arb_width,
    # a subimage of this size is cropped from the center
    if height > arb_height or width > arb_width:

        # Center of the current image
        c_height = np.ceil(height / 2)
        c_width = np.ceil(width / 2)

        # Half-dimensions of the subimage
        corr_height = np.ceil(arb_height / 2)
        corr_width = np.ceil(arb_width / 2)

        # Crop the image
        img = img[int(c_height - corr_height):int(c_height + corr_height),
                  int(c_width - corr_width):int(c_width + corr_width)]

        # New dimensions
        height, width, channels = img.shape

    # If the image is smaller, add a border which is a reflection.
    # Due to convolution, Conv NNs are not affected by this change.

    # Calculate the padding size required to match the target image size
    border_horizontal = arb_width - width
    border_vertical = arb_height - height

    img = cv2.copyMakeBorder(img, 0, border_vertical,
                             0, border_horizontal, cv2.BORDER_REFLECT_101)

    # Normalize the image to [0, 1]
    image = img.astype(np.float32)
    image = np.multiply(image, 1.0 / 255.0)

    return image


Now, as mentioned before, creating arrays of so many images would create a huge memory overhead. So I use Python generators, which create elements on the fly. They build batches for training and testing, storing only one batch at a time rather than the whole dataset.
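The wrap-around batching idea can be sketched independently of Keras. This toy generator (hypothetical, using plain list items instead of images) shows the mechanism the real generators below build on:

```python
def batch_generator(items, batch_size):
    """Minimal sketch of the batching idea: yield fixed-size batches
    forever, wrapping around when the list is exhausted. The real
    generators below also load images and reshuffle at the wrap point."""
    i = 0
    while True:
        batch = []
        for _ in range(batch_size):
            if i == len(items):
                i = 0  # wrap around; the training generator reshuffles here
            batch.append(items[i])
            i += 1
        yield batch

gen = batch_generator(['a', 'b', 'c'], 2)
print(next(gen))  # -> ['a', 'b']
print(next(gen))  # -> ['c', 'a']
```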

In [26]:
def generate_data(list_of_paths, list_of_classes, batch_size):
    '''
    Python generator. This generator allows Keras to learn in batches.
    It creates a batch of images for the learning function of Keras.
    '''
    i = 0

    while True:
        image_batch = []
        class_batch = []

        for b in range(batch_size):

            # If all the images have been added to batches, then I shuffle
            if i == len(list_of_paths):
                # Reset index
                i = 0
                # Shuffle data
                list_of_paths, list_of_classes = shuffle_lists_together(
                    list_of_paths, list_of_classes)

            # Build batches (arb_height and arb_width are globals)
            image_batch.append(load_and_process_image(arb_height, arb_width,
                                                      list_of_paths[i]))
            class_batch.append(list_of_classes[i])
            i += 1

        yield np.array(image_batch), np.array(class_batch)

def generate_for_pred(list_of_paths, batch_size):
    '''
    Python generator. This generator allows Keras to predict in batches.
    It creates a batch of images for the network prediction.
    '''
    i = 0

    while True:

        image_batch = []
        for b in range(batch_size):
            # Creates a batch of images to classify

            # If all the images have been added to batches, wrap around
            if i == len(list_of_paths):
                # Reset index
                i = 0

            # Build the batch
            image_batch.append(load_and_process_image(arb_height, arb_width,
                                                      list_of_paths[i]))
            i += 1

        yield np.array(image_batch)

In [27]:
def shuffle_lists_together(a, b):
    ''' Shuffles a list of images together with the list of classes they belong to '''
    c = list(zip(a, b))
    random.shuffle(c)

    # zip(*c) yields tuples; convert back to lists
    a, b = zip(*c)
    return list(a), list(b)

In [28]:
def one_hot_encode_all_vs_one(set_, target_breed):
    ''' One-hot encode following the One-vs-All scheme '''

    one_h_array = []

    for ele in set_:

        label = np.zeros(2)

        # One-vs-All encoding
        if ele == target_breed:
            label[0] = 1
        else:
            label[1] = 1

        # Append the respective class
        one_h_array.append(label)

    return one_h_array

In [39]:
# Define ROC diagram function
def build_ROC_curve(true_o, est_o, title, n_classes=11, first_class=0):
    ''' Compute ROC curve and ROC area for each class '''

    # Create dictionaries for False Positive Rate, True Positive Rate,
    # and Area Under the Curve
    fpr = dict()
    tpr = dict()
    roc_auc = dict()

    # Perform a One-vs-All scheme for calculating the ROC
    for i in range(n_classes):
        fpr[i], tpr[i], _ = roc_curve(true_o[:, i], est_o[:, i])
        roc_auc[i] = auc(fpr[i], tpr[i])

    # Build figure
    plt.figure()
    lw = 2
    colors = cycle(['blue', 'darkorange', 'cornflowerblue', 'orange',
                    'green', 'red', 'purple', 'brown', 'pink', 'gray',
                    'olive', 'cyan'])
    for i, color in zip(range(n_classes), colors):
        plt.scatter(fpr[i][1], tpr[i][1], color=color, lw=lw,
                    label='Class {0} (TPR = {1:0.2f}, FPR = {2:0.2f})'
                          ''.format(i + first_class, tpr[i][1], fpr[i][1]))

    plt.plot([0, 1], [0, 1], 'k--', lw=lw)
    plt.xlim([0.0, 1.0])
    plt.ylim([0.0, 1.05])
    plt.xlabel('False Positive Rate')
    plt.ylabel('True Positive Rate')
    plt.title(title)
    plt.legend(loc="lower right")
    plt.show()


## Notes on network structure

I chose the following stack of layers for the network:

• Convolutional input layer, 8 feature maps with filter size 6×6.
• Dropout regularization, 20%.
• Convolutional layer, 8 feature maps with filter size 6×6.
• Max pooling with filter size 2×2.
• Convolutional layer, 16 feature maps with filter size 6×6.
• Dropout regularization, 20%.
• Convolutional layer, 16 feature maps with filter size 6×6.
• Max pooling with filter size 2×2.
• Convolutional layer, 32 feature maps with filter size 6×6.
• Dropout regularization, 20%.
• Convolutional layer, 32 feature maps with filter size 6×6.
• Max pooling with filter size 2×2.
• Flatten layer.
• Dropout regularization, 20%.
• Fully connected layer with 1024 units.
• Dropout regularization, 20%.
• Fully connected layer with 512 units.
• Dropout regularization, 20%.
• Fully connected output layer with 2 units and a softmax activation function.

I chose this particular structure inspired by Conv NNs used in the literature to classify images (usually of very distinct objects: cars, planes, people, etc.), and adapted it to this particular problem. For example, I chose a relatively large filter size (6×6) because the images are relatively large (I chose 700×700), and because it is hard to tell one dog apart from another by looking at a few pixels; even humans usually require more information. Guided by this intuition, I selected the filter sizes.

In [36]:
def build_conv_nn(arb_height, arb_width, num_classes=2):
    '''
    This function creates the structure of the Conv_NN and returns a model object

    @Param arb_height: Images' height
    @Param arb_width: Images' width
    @Param num_classes: Number of classes; in One-vs-All, this is 2
    '''

    # Create the model. Sequential models are a linear stack of layers.
    # Padding 'same' keeps the image size intact; the maxnorm kernel
    # constraint keeps the weights bounded; dropout regularizes; max
    # pooling (2x2) reduces the size; Flatten connects the convolutional
    # stack to the fully connected layers. ReLU activations are assumed
    # for all hidden layers.
    model = Sequential()
    model.add(Conv2D(8, (6, 6), input_shape=(arb_height, arb_width, 3),
                     padding='same', activation='relu',
                     kernel_constraint=maxnorm(3)))
    model.add(Dropout(0.2))
    model.add(Conv2D(8, (6, 6), padding='same', activation='relu',
                     kernel_constraint=maxnorm(3)))
    model.add(MaxPooling2D(pool_size=(2, 2)))
    model.add(Conv2D(16, (6, 6), padding='same', activation='relu',
                     kernel_constraint=maxnorm(3)))
    model.add(Dropout(0.2))
    model.add(Conv2D(16, (6, 6), padding='same', activation='relu',
                     kernel_constraint=maxnorm(3)))
    model.add(MaxPooling2D(pool_size=(2, 2)))
    model.add(Conv2D(32, (6, 6), padding='same', activation='relu',
                     kernel_constraint=maxnorm(3)))
    model.add(Dropout(0.2))
    model.add(Conv2D(32, (6, 6), padding='same', activation='relu',
                     kernel_constraint=maxnorm(3)))
    model.add(MaxPooling2D(pool_size=(2, 2)))
    model.add(Flatten())
    model.add(Dropout(0.2))
    model.add(Dense(1024, activation='relu', kernel_constraint=maxnorm(3)))
    model.add(Dropout(0.2))
    model.add(Dense(512, activation='relu', kernel_constraint=maxnorm(3)))
    model.add(Dropout(0.2))
    model.add(Dense(num_classes, activation='softmax'))

    return model


For training, I define the function:

In [37]:
def train_conv_nn(model, training_img, training_clss, batch_size, lrate):
    '''
    Function for training a model.

    @Param model: Model to train
    @Param training_img: Training set, contains paths to images
    @Param training_clss: Training set, contains the respective image classes
    @Param batch_size: Size of the training batches
    @Param lrate: Learning rate
    '''

    # Define the optimization

    # Decay: factor by which lrate decreases every iteration.
    # lr := lr * 1/(1 + decay*iters). `epochs` is defined globally below.
    decay = lrate / epochs
    sgd = SGD(lr=lrate, momentum=0.9, decay=decay, nesterov=False)
    model.compile(loss='categorical_crossentropy', optimizer=sgd)

    # Print structure; disabled by default, as there are 120 classes
    # and thus 120 structures to print
    print_summary = False
    if print_summary:
        print(model.summary())

    # Train
    model.fit_generator(
        generate_data(training_img, training_clss, batch_size),
        steps_per_epoch=len(training_img) // batch_size,
        epochs=epochs, workers=2)
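The decay rule in the comment above can be checked numerically. With this notebook's lrate = 1e-2 and epochs = 1000, decay = 1e-5, so the learning rate drops by only about 1% over the first 1000 update steps:

```python
def decayed_lr(lr0, decay, iteration):
    """Keras SGD decay rule: lr_t = lr0 / (1 + decay * t),
    applied once per update step."""
    return lr0 / (1.0 + decay * iteration)

lr0 = 1e-2
decay = lr0 / 1000                             # lrate / epochs, as above
print(decayed_lr(lr0, decay, 0))               # -> 0.01
print(round(decayed_lr(lr0, decay, 1000), 6))  # -> 0.009901
```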


Now, I define the hyper-parameters, process the data, and generate the ensemble:

In [ ]:
### Define hyper-parameters

# As I transform the problem to a One-vs-All classification approach,
# the number of classes is 2: an image either belongs to class c or not.
num_classes = 2

# Batch size
batch_size = 100

# Desired dimensions of the image
arb_height = 700
arb_width = 700

# Training hyper-parameters
epochs = 1000
lrate = 1e-2

### Generate the test set

# Get classes from the folders' names
classes_pd, classes_raw = get_classes()

# Generate test set
tst_img, tst_clss = sample_test_set(classes_raw)

# Shuffle set
tst_img, tst_clss = shuffle_lists_together(tst_img, tst_clss)

### Build ensemble

ensamble_networks = []

for i in range(len(classes_raw)):

    # i is the target breed; the target class is re-sampled twice
    training_img, training_clss = import_data(i, 2, tst_img)

    # Shuffle training set
    training_img, training_clss = shuffle_lists_together(training_img,
                                                         training_clss)

    # Build model
    _temp_mod = build_conv_nn(arb_height, arb_width, num_classes)

    # Store model in an array
    ensamble_networks.append(_temp_mod)

    # Train the last stored model
    train_conv_nn(ensamble_networks[-1],
                  training_img, training_clss,
                  batch_size, lrate)



After the ensemble is trained, it makes predictions by showing a given sample to every sub-model in it. Each sub-model then gives its "opinion" on the sample: whether or not it belongs to the class that sub-model focused on. These "opinions" are probabilities (due to the softmax activation function) of belonging to the sub-model's positive class. After every sub-model has seen the sample, the sample is assigned to the class of the sub-model that gave it the highest probability.

In [ ]:
### Test the ensemble on data

test_batch_size = 100  # prediction batch size (same order as for training)

opinions = pd.DataFrame()

curr_model = 0  # Keeps track of models seen; works as an index

# "Show" the samples to every sub-model in the ensemble
for model in ensamble_networks:

    # The sub-model makes a prediction on the samples
    preds_raw = model.predict_generator(
        generate_for_pred(tst_img, test_batch_size),
        steps=np.ceil(len(tst_img) / test_batch_size), workers=2)

    # I am only interested in the probability of a sample belonging to this
    # model's target class; the probability of not belonging is redundant
    # here. Thus, I keep only the first output.
    prob_belong = [p[0] for p in preds_raw]

    # Store this probability
    opinions[curr_model] = prob_belong

    curr_model += 1


In the previous cell, I stored each sample's probability of belonging to every sub-model's class. Now, I assign each sample to the class of the model that returned the largest probability:

In [ ]:
pred = opinions.idxmax(axis=1).to_numpy()


To evaluate the model I plot the results on a ROC diagram.

In [ ]:
# Make ROC curve

# Full one-hot encoding over all classes (the binary encoder above
# only covers the One-vs-All case)
one_h_tr = np.eye(len(classes_raw))[np.array(tst_clss)]
one_h_pr = np.eye(len(classes_raw))[np.array(pred)]

build_ROC_curve(one_h_tr, one_h_pr,
                "Dog Breed Ensemble Classification, Test Set",
                n_classes=len(classes_raw), first_class=0)


## Conclusions

Due to the complexity of this network and the limited computing power I have available at home, I was not able to fully test this approach.

Running this script for target class 0, I obtained the following result for the One-vs-All binary classification of the Silky Terrier class:

This result was obtained after only 50 epochs. That is very early stopping, as these algorithms usually run for thousands of epochs, especially for such a complex and large network; nevertheless, it produced good results. Now, let us take a look at the training error during the last epochs:

The error was still decreasing when the algorithm stopped, which means that with more epochs (perhaps 1000 or more) a better result could likely be achieved. However, it took several hours to obtain this result on my computer.

This suggests the model is on a good track, particularly since, in an ensemble, this result has to compete with the results of the other learners, and only the highest-probability class prevails. This boosts the collective performance.