Dog Breed Classification

By David Jimenez Barrero / Machine Learning

Notes on the problem

This is a challenging problem: the classifier must correctly sort ~120 different classes. In my experience, it is unlikely that a single model can correctly classify that many classes. For this reason I propose an ensemble of models, where each one learns to separate a single class from the rest. For the models themselves, I propose a Convolutional Neural Network, given its extensive use in image classification.
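
As a minimal sketch of the one-vs-all decision rule this notebook builds up to (the probabilities below are made up for three hypothetical sub-models):

In [ ]:
import numpy as np

# Each entry is one sub-model's softmax probability that the sample
# belongs to that sub-model's own target class (hypothetical numbers)
opinions = np.array([0.10, 0.75, 0.30])
print(opinions.argmax())  # -> 1: the sample is assigned to sub-model 1's class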

For this implementation I decided to switch to Keras: it is a higher-level API than raw TensorFlow, and it allowed me to build a complex network faster.

Next, I briefly explain the functions used to create this network.

In [20]:
# Import packages

import os
import cv2
import pandas as pd
import fnmatch
import numpy as np
import random

import matplotlib.pyplot as plt
from itertools import cycle
from sklearn.metrics import roc_curve, auc


from keras.models import Sequential
from keras.layers import Dense
from keras.layers import Dropout
from keras.layers import Flatten
from keras.constraints import maxnorm
from keras.optimizers import SGD
from keras.layers.convolutional import Conv2D
from keras.layers.convolutional import MaxPooling2D
from keras import backend as K
# Use TensorFlow (channels-last) ordering, matching input_shape=(height, width, 3) below
K.set_image_dim_ordering('tf')
In [21]:
def get_classes():
    ''' Function to get the classes names and their folders '''
    
    # Get folders' names
    classes_raw = [name for name in os.listdir("Images") \
        if os.path.isdir("Images/" + name) ]
    # Get classes names
    classes = [c.split('-', 1)[-1] for c in classes_raw]
    
    # Make arrays into pandas
    classes_df = pd.DataFrame(data=classes,columns=['breeds'])
    classes_df['folder_name'] = classes_raw
    
    return classes_df,classes_raw
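
For illustration, the split above turns a folder name into a breed name. Assuming the Images/ folders follow the usual Stanford Dogs naming convention (e.g. n02085620-Chihuahua; the name below is just an example):

In [ ]:
folder = "n02085620-Chihuahua"   # hypothetical folder name
print(folder.split('-', 1)[-1])  # -> "Chihuahua"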

In the following method I sample the test set by randomly selecting 10% of the images in each class.

In [22]:
def sample_test_set(classes_raw):
    ''' 
    Sample the test set. 10% of the images on every class are selected randomly
    for the test set.
    
    '''
    # Initialize the test sets
    tst_img = []
    tst_clss = []
        
    class_count = 0 # Keeps track of the classes seen
    
    # Get 10% of images on every class for the test set
    for breed in classes_raw:
        # Get list of filenames of images on the current class
        list_filenames = [ filename for filename in fnmatch.filter(os.listdir("Images/" + breed),'*.jpg')]
       
        # Get the number of images per class to select
        # 10% that will go on the test set
        num_elem_class = len( list_filenames )
        
        # Num Elements that makeup 10%
        num_elem_test_set = int(np.ceil( num_elem_class * .10 )) 
        
        # Randomly select the  10% of the images to be on the test set
        elements_to_go_in_test = random.sample( range(0, num_elem_class) , num_elem_test_set)
        
        
        image_ref = 0 # Keeps track of the image currently being sorted
        
        for filename in list_filenames: # Iterates over files in the class folder
            
            path = "Images/" + breed + "/" + filename
            
            # Insert elements in test set
            if( image_ref in elements_to_go_in_test ):
                tst_img.append(path)
                
                # Append the integer class label (one-hot encoding happens later)
                tst_clss.append(class_count)
        
            image_ref += 1
    
        class_count += 1
        
    return tst_img,tst_clss
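
For example, for a hypothetical class containing 152 images, the 10% draw works out as follows:

In [ ]:
import numpy as np
import random

num_elem_class = 152
num_elem_test_set = int(np.ceil(num_elem_class * .10))  # ceil(15.2) -> 16 images
print(num_elem_test_set)
print(sorted(random.sample(range(num_elem_class), num_elem_test_set)))  # chosen indices
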
In [23]:
def sample_target_class(target_class,tst_set,classes_raw):
    ''' 
    
    @Param target_class: Class which is the target of this sub-model
    @Param tst_set: Previously calculated test set
    @Param classes_raw: list of folder names for every class
    
    This function creates a list of all images in the target class folder, except
    for the images already present in the test set.
    
    '''
    # One-hot encode the class
    label = np.zeros(2)
    label[0] = 1
    
    # Initialize the arrays
    training_img = []
    training_clss = []
    
    folder = classes_raw[target_class]
        
    # Get list of filenames of images on the class
    list_filenames = [ filename for filename in fnmatch.filter(os.listdir("Images/" + folder),'*.jpg')]
    
    for filename in list_filenames:
        # Create path
        path = "Images/" + folder + "/" + filename
        
        # Check the path is not selected on the tst set
        if path not in tst_set:
            training_img.append(path)
            training_clss.append(label)
    
    return training_img,training_clss

Notice that I do not store the images themselves, only their paths. Keeping arrays of full images would put a severe strain on the machine's memory.
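
A back-of-the-envelope calculation shows why (assuming roughly 20,000 images in the dataset, each resized to the 700×700 used below and stored as float32):

In [ ]:
# One 700x700 RGB image stored as float32:
bytes_per_image = 700 * 700 * 3 * 4       # ~5.9 MB per image
print(bytes_per_image * 20000 / 1e9)      # ~117.6 GB for the whole dataset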

In [24]:
def import_data(target_class, num_times_target, tst_img ):
    ''' 
    One-vs-All scheme
    
    @Param target_class: Class which is the target of this sub-model
    @Param num_times_target: The number of times the target class is re-sampled
        to reduce unbalanced data
        

    Two arrays are created:
        - Training array with paths to the images
        - Training array with the respective image classes
        
    Image classes are collapsed to a binary classification: an image either
    belongs to the target class or it does not.
    
    Note that this indirectly subsamples the classes different from the target:
    I only draw as many negative samples as there are positive ones, and since
    there are far more negatives than positives overall, the negative class
    ends up subsampled.
    
    '''
    
    # Get classes from folder's names
    _,classes_raw = get_classes()
        
    # Initialize the arrays
    training_img = []
    training_clss = []
    
    # Target class is re-sampled
    for _ in range(num_times_target):
        tr_img, tr_cl = sample_target_class(target_class,tst_img,classes_raw)
        training_img += tr_img
        training_clss += tr_cl
    
    num_elems_targ = len(training_img)
    # Count sampled elements for the negative class
    count_non_target = 0
    
    # Sample as many elements for the negative class as there are positive samples
    while(count_non_target < num_elems_targ):
        
        # Select a random folder
        random_folder = random.sample( classes_raw, 1 )[0]
        
        # Get list of filenames of images on the current class
        list_filenames = [ filename for filename in fnmatch.filter(os.listdir("Images/" + random_folder),'*.jpg')]
        
        # Select a random image from the folder
        random_img = random.sample( list_filenames, 1 )[0]
        
        # Build path to the image
        path = "Images/" + random_folder + "/" + random_img
        
        # Make sure the path is not sampled yet in tst set nor training
        if (path not in tst_img) and (path not in training_img):
            
            # One-hot encode the negative class
            label = np.zeros(2)
            label[1] = 1
            
            #Append data
            training_img.append(path)
            training_clss.append(label)
            
            count_non_target += 1
    
    return training_img,training_clss
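
As a rough sanity check of the resulting class balance (assuming on the order of 150 images per breed, as in the Stanford Dogs dataset): with num_times_target = 2, the positive class contributes about 2 × 0.9 × 150 ≈ 270 paths, and the loop then draws roughly the same number of negatives, so each sub-model trains on an approximately balanced set of ~540 samples.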

In the following function, I import and process an image given a path and a desired height and width. Two remarks:

  • Larger images: the function crops larger images down to the desired dimensions. The crop keeps the center of the image and discards the borders that exceed the target size, since the subject (the dog) is very likely located near the center of the image.

  • Smaller images: Convolutional Neural Networks are translation invariant, meaning they are designed to detect features regardless of where in the image those features appear. For this reason, I pad small images with a reflection of the image about its border. This can even reinforce feature detection, as a feature may appear 2, 3, or even 4 times thanks to the reflection.

In [25]:
def load_and_process_image(arb_height,arb_width,path):
    ''' 
    This function loads an image and makes it of a predefined size 
    arb_height x arb_width. If the image is larger than this size,
    a window of size arb_height x arb_width is placed in the center of the
    image and then cropped. If conversely the image is smaller,
    a padding is added using the reflection of the image with the border. This
    works well with Convolutional Neural Networks as they extract features
    not based on location.
    '''
    
    # Load image
    img = cv2.imread(path)
            
    # Get dimensions (note: OpenCV loads images in BGR channel order)
    height, width, channels = img.shape
    
    # If the image is larger than arb_height or arb_width, 
    # a subimage  with this size is cropped from the center 
    if(height > arb_height or width > arb_width):
        
        # Current dimensions
        c_height = np.ceil(height/2)
        c_width = np.ceil(width/2)
        
        # Half-dimensions of the target crop (note: for odd target sizes the
        # crop comes out one pixel larger; the 700x700 used here is even)
        corr_height = np.ceil(arb_height/2)
        corr_width = np.ceil(arb_width/2)
        
        # Crop image
        img = img[int(c_height-corr_height):int(c_height+corr_height),
                  int(c_width-corr_width):int(c_width+corr_width)]
    
        # New dimensions
        height, width, channels = img.shape
    
    # If the image is smaller, I add a border which is a reflection.
    # Due to convolution, Conv_NN are not affected by this change.
    
    # Calculate the padding size required to match the target image size
    border_horizontal = arb_width-width
    border_vertical = arb_height-height
    
    img = cv2.copyMakeBorder( img, 0 , border_vertical ,
                             0 , border_horizontal, cv2.BORDER_REFLECT_101 )
    
    # Normalize the image
    image = img.astype(np.float32)
    image = np.multiply(image, 1.0 / 255.0)
    
    return image
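
A quick sanity check of the reflection padding on synthetic data (the random array below stands in for a small 100×150 image, padded up to 200×200):

In [ ]:
import numpy as np
import cv2

small = (np.random.rand(100, 150, 3) * 255).astype(np.uint8)  # fake small 'image'
padded = cv2.copyMakeBorder(small, 0, 200 - 100, 0, 200 - 150, cv2.BORDER_REFLECT_101)
print(padded.shape)  # -> (200, 200, 3): reflections fill the missing rows and columns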

Now, as mentioned before, holding arrays of so many images would create a huge memory overhead. Instead, I use Python generators, which create elements "on the fly". They build the batches for training and testing while keeping only one batch in memory at a time, never the whole dataset.

In [26]:
def generate_data(list_of_paths, list_of_classes, batch_size):
    ''' 
    Python Generator. This Generator allows Keras to learn in batches.
    It creates a batch of images for the learning function of Keras.
    '''
    i = 0
    
    while True:
        image_batch = []
        class_batch = []
        
        for b in range(batch_size):
            
            # If all the images have been added to batches, then I shuffle
            # the data and proceed loading batches
            if i == len(list_of_paths):
                # Reset index
                i = 0
                # Shuffle data
                list_of_paths, list_of_classes = shuffle_lists_together(
                        list_of_paths, list_of_classes )
            
            # Load images
            _temp_loaded_image = load_and_process_image( arb_height, arb_width,
                                                       list_of_paths[i] )
            # Build batches
            image_batch.append( _temp_loaded_image )
            class_batch.append( list_of_classes[i] )
            i += 1
            
        yield np.array(image_batch),np.array(class_batch)
        
def generate_for_pred(list_of_paths,batch_size):
    ''' 
    Python Generator. This Generator allows Keras to predict in batches.
    It creates a batch of images for the network prediction.
    '''
    i = 0
    
    while True:
        
        image_batch = []
        for b in range(batch_size):
            # Creates a batch of images to classify
            
            # If all the images have been added to batches, wrap around and
            # start again from the beginning (no shuffling at prediction time)
            if i == len(list_of_paths):
                # Reset index
                i = 0
                
            # Loads image
            _temp_loaded_image = load_and_process_image( arb_height,
                                                        arb_width,list_of_paths[i] )
            # Builds batch
            image_batch.append( _temp_loaded_image )
            i += 1
            
        yield np.array(image_batch)
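
A minimal sketch of how the training generator is consumed (assuming arb_height, arb_width and a small training set of paths and one-hot labels already exist):

In [ ]:
gen = generate_data(training_img, training_clss, batch_size=4)
images, labels = next(gen)           # pull a single batch
print(images.shape, labels.shape)    # e.g. (4, 700, 700, 3) (4, 2)
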
In [27]:
def shuffle_lists_together(a, b):
    ''' Shuffles a list of images together with the classes they belong to '''
    c = list(zip(a, b))
    random.shuffle(c)

    # zip(*c) yields tuples (an iterator in Python 3); convert back to lists
    a, b = zip(*c)
    return list(a), list(b)
In [28]:
def one_hot_encode_all_vs_one(set_ ,target_breed ):
    # One-hot following all vs one scheme
    
    one_h_array = []
    
    for ele in set_:

        label = np.zeros(2)

        # All-vs-One encoding
        if(ele == target_breed):
            label[0] = 1
        else:
            label[1] = 1

        # Append respective class
        one_h_array.append(label)
    
    return one_h_array
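
The evaluation cell at the end of the notebook also calls a multi-class one_hot_encode, which is not defined anywhere above; a minimal implementation consistent with that usage could be:

In [ ]:
def one_hot_encode(labels, num_classes):
    ''' One-hot encode a list of integer class labels into an
    (n_samples, num_classes) array, as expected by build_ROC_curve '''
    labels = np.asarray(labels, dtype=int)
    one_hot = np.zeros((len(labels), num_classes))
    one_hot[np.arange(len(labels)), labels] = 1
    return one_hot
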
In [39]:
# Define ROC diagram function
def build_ROC_curve(true_o, est_o, title, n_classes = 11, first_class=0):
    ''' Compute ROC curve and ROC area for each class '''

    #Create dictionaries for False Positive Rate, True Positive Rate, and Area Under the Curve
    fpr = dict()
    tpr = dict()
    roc_auc = dict()

    #Perform a one vs all scheme for calculating the ROC
    for i in range(n_classes):
        fpr[i], tpr[i], _ = roc_curve(true_o[:, i], est_o[:, i])
        roc_auc[i] = auc(fpr[i], tpr[i])

    #Build Figure
    plt.figure()
    lw = 2
    colors = cycle(['blue', 'darkorange', 'cornflowerblue', 'orange', 'green', 'red', 'purple', 'brown', 'pink', 'gray', 'olive', 'cyan'])
    for i, color in zip(range(n_classes), colors):
        plt.scatter(fpr[i][1], tpr[i][1], color=color, lw=lw,
         label='Class {0} (TPR = {1:0.2f}, FPR = {2:0.2f})'
         ''.format(i+first_class, tpr[i][1],fpr[i][1]))

    plt.plot([0, 1], [0, 1], 'k--', lw=lw)
    plt.xlim([0.0, 1.0])
    plt.ylim([0.0, 1.05])
    plt.xlabel('False Positive Rate')
    plt.ylabel('True Positive Rate')
    plt.title(title)
    plt.legend(loc="lower right")
    plt.show()

Notes on network structure

I chose the structure of the network as the following stacked layers:

  • Convolutional input layer, 8 feature maps with filter size of 6×6.
  • Dropout regularization 20%.
  • Convolutional layer, 8 feature maps with filter size of 6×6.
  • Max Pooling with filter size 2×2.
  • Convolutional layer, 16 feature maps with filter size of 6×6.
  • Dropout regularization 20%.
  • Convolutional layer, 16 feature maps with filter size of 6×6.
  • Max Pooling with filter size 2×2.
  • Convolutional layer, 32 feature maps with filter size of 6×6.
  • Dropout regularization 20%.
  • Convolutional layer, 32 feature maps with filter size of 6×6.
  • Max Pooling with filter size 2×2.
  • Flatten layer.
  • Dropout regularization 20%.
  • Fully connected layer with 1024 units.
  • Dropout regularization 20%.
  • Fully connected layer with 512 units.
  • Dropout regularization 20%.
  • Fully connected output layer with 2 units and a softmax activation function.

I chose this particular structure inspired by CNN architectures in the literature used to classify images (usually of very distinct objects: cars, planes, people, etc.), and adapted it to this particular problem. For example, I chose a relatively large filter size (6×6) because the images are relatively large (I use 700×700), and because it is hard to tell one dog breed apart from another by looking at only a few pixels; humans usually need more context. This intuition guided my choice of filter sizes.
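
To get a feel for the scale of this network, a rough calculation (assuming the default 'floor' behaviour of Keras max pooling): three 2×2 poolings reduce 700×700 to 87×87, so the flatten layer feeds the first dense layer with 87 × 87 × 32 = 242,208 units, i.e. roughly 248 million weights in that layer alone. This goes some way toward explaining the training times mentioned in the conclusions.

In [ ]:
h = w = 700
for _ in range(3):          # three 2x2 max-pooling stages
    h, w = h // 2, w // 2
print(h, w, h * w * 32)     # -> 87 87 242208 units into the 1024-unit dense layer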

In [36]:
def build_conv_nn( arb_height, arb_width, num_classes = 2):
    ''' 
    This function creates the structure of the Conv_NN and returns a model object 
    
    @Param arb_height: Images' height
    @Param arb_width: Images' width
    @Param num_classes: Number of classes, in One-vs-All, this is 2
    
    '''
    
    # Create the model. Sequential models are a linear stack of layers
    model = Sequential()
    # Convolutional layer: Filter size 6x6x3, with 8 output layers.
    # Padding as same to keep image size intact
    # Kernel constraint caps the norm of the weights
    model.add(Conv2D(8, (6, 6), input_shape=(arb_height, arb_width, 3),
                     padding='same', activation='relu', kernel_constraint=maxnorm(3)))
    # Dropout regularization
    model.add(Dropout(0.2))
    model.add(Conv2D(8, (6, 6), activation='relu', padding='same'))
    # Perform pooling to reduce the size. Filter size 2x2.
    # (channels-last is already the default given the 'tf' image ordering set above)
    model.add(MaxPooling2D(pool_size=(2, 2)))
    model.add(Conv2D(16, (6, 6), activation='relu', padding='same'))
    model.add(Dropout(0.2))
    model.add(Conv2D(16, (6, 6), activation='relu', padding='same'))
    model.add(MaxPooling2D(pool_size=(2, 2)))
    model.add(Conv2D(32, (6, 6), activation='relu', padding='same'))
    model.add(Dropout(0.2))
    model.add(Conv2D(32, (6, 6), activation='relu', padding='same'))
    model.add(MaxPooling2D(pool_size=(2, 2)))
    # Flatten the network to connect it to the fully connected layer
    model.add(Flatten())
    model.add(Dropout(0.2))
    model.add(Dense(1024, activation='relu', kernel_constraint=maxnorm(3)))
    model.add(Dropout(0.2))
    model.add(Dense(512, activation='relu', kernel_constraint=maxnorm(3)))
    model.add(Dropout(0.2))
    model.add(Dense(num_classes, activation='softmax'))
    
    return model

For training, I define the function:

In [37]:
def train_conv_nn( model, training_img, training_clss, batch_size, lrate):
    ''' 
    Function for training a model.
    
    @Param model: Model to train
    @Param training_img: Training set, contains paths to images
    @Param training_clss: Training set, contains respective images classes
    @Param batch_size: Size of the training batches
    @Param lrate: Learning rate
    
    '''
    
    # Define the optimizer
    # Decay: factor by which lrate decreases every iteration, lr := lr / (1 + decay*iters)
    # Note: `epochs` is read from the global scope (set in the hyper-parameters cell below)
    decay = lrate/epochs
    sgd = SGD(lr=lrate, momentum=0.9, decay=decay, nesterov=False)
    model.compile(loss='categorical_crossentropy', optimizer=sgd)
    
    # Print the structure. Disabled by default: with 120 sub-models there
    # would be 120 summaries to print.
    if (False):
        print(model.summary())
    
    # Train
    model.fit_generator(
            generate_data(training_img, training_clss, batch_size),
            steps_per_epoch=len(training_img) // batch_size,
            epochs = epochs, workers=2)

Now, I define the hyper-parameters, process the data, and generate the ensemble.

In [ ]:
### Define Hyper-parameters

# As I transform the problem to a One-vs-All classification approach,
# the number of classes is 2, an image either belongs to class c or not.
num_classes = 2

# Batch size for training
batch_size = 100
# Batch size used when predicting on the test set
test_batch_size = 100

# Desired dimensions of the image
arb_height = 700
arb_width = 700

# Training Hyper-parameters
epochs = 1000
lrate = 1e-2

### Generate the test set

# Get classes from folder's names
classes_pd,classes_raw = get_classes()

# Generate test set
tst_img,tst_clss = sample_test_set(classes_raw)

# Shuffle set
tst_img,tst_clss = shuffle_lists_together(tst_img,tst_clss)


### Build Ensemble

ensemble_networks = []

for i in range(len(classes_raw)):
    
    # Build the training set for breed i, excluding paths already in the test set
    training_img,training_clss = import_data( i , 2, tst_img) # i is the target breed

    # Shuffle training set
    training_img,training_clss = shuffle_lists_together(training_img,training_clss)
    
    # Build model
    _temp_mod = build_conv_nn( arb_height, arb_width, num_classes )
    
    # Store model in an array
    ensemble_networks.append( _temp_mod )
    
    # Train the most recently stored model
    train_conv_nn( ensemble_networks[-1] ,
                  training_img, training_clss,
                  batch_size, lrate)
    

After the ensemble is trained, it makes predictions by showing each sample to every sub-model in it. Each sub-model then gives its "opinion" on the sample: whether or not it belongs to the class that sub-model focused on. These "opinions" are probabilities (thanks to the softmax activation) of the sample belonging to the sub-model's positive class. After every sub-model has seen the sample, the sample is assigned to the class of the sub-model that gave the highest probability.

In [ ]:
### Test the ensemble on data

opinions = pd.DataFrame()

curr_model = 0 # Keeps track of models seen. Works as index

# "Shows" the sample to every sub-model in the ensamble
for model in ensamble_network:


    # The sub-model makes a prediction on every test sample
    preds_raw = model.predict_generator(
        generate_for_pred(tst_img,test_batch_size),
        steps = int(np.ceil(len(tst_img) / float(test_batch_size))) , workers=2)
    
    # I am only interested in the probability of each sample belonging to this
    # sub-model's target class; the complementary probability is redundant.
    # Thus, I keep the first softmax output only.
    prob_belong = [p[0] for p in preds_raw]
    
    # Stores this probability
    opinions[ curr_model ] = prob_belong
    
    curr_model += 1

In the previous cell, I stored each sample's probability of belonging to each sub-model's target class. Now, I assign each sample to the class of the model that returned the largest probability:

In [ ]:
pred = opinions.idxmax(axis=1).values[:len(tst_img)]  # drop the wrap-around padding from the final batch

To evaluate the model I plot the results on a ROC diagram.

In [ ]:
# Make ROC curve

one_h_tr = one_hot_encode( tst_clss, len(classes_raw) )
one_h_pr = one_hot_encode( pred, len(classes_raw) )

build_ROC_curve( one_h_tr, one_h_pr, 
                "Dog Breed Ensemble Classification, Test Set",
                n_classes = len(classes_raw) )

Conclusions

Due to the complexity of this network and the limited computing power I have available at home, I was not able to fully test this approach.

Running this script for target class 0, I obtained the following result for the One-vs-All binary classification of the Silky Terrier:

[Figure: ROC curve for the Silky Terrier one-vs-all classifier]

This result was obtained after only 50 epochs! That amounts to very early stopping, as such networks usually train for epochs on the order of thousands, especially one as complex and large as this; nevertheless, it produced good results. Now, let us take a look at the training error during the last epochs:

[Figure: training error during the final epochs]

The error was still decreasing when the algorithm stopped, which means that with more epochs (perhaps 1000 or more) a better result could likely be achieved. However, it took several hours to obtain even this result on my computer.

This suggests the model is on the right track, particularly since in an ensemble this result has to compete with the results generated by the other learners, and only the highest-probability class prevails. This boosts the collective performance.