Chapter 3 Convolutional Neural Networks

We will start with a plain multilayer perceptron (MLP) as a baseline for image classification, then rebuild the classifier as a CNN and compare.

3.1 Image Classification with an MLP

3.1.1 Input Layer

Let’s start by building a model that flattens each 28x28 image matrix into a 784-element vector.
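
A minimal sketch with the keras R package; the object name `model` is an assumption used throughout:

```r
library(keras)

# Start a sequential model whose first layer flattens each
# 28x28 image matrix into a length-784 vector
model <- keras_model_sequential() %>%
  layer_flatten(input_shape = c(28, 28))
```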

Great, now let’s add some hidden layers.

3.1.2 Hidden Layers

We want to add two dense layers with 512 nodes each.
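
Something like the following; relu is an assumption, since the text does not name an activation (the parameter counts in the summary below don’t depend on it):

```r
# Two fully connected hidden layers of 512 nodes; relu is an
# assumption, the text does not specify the activation
model %>%
  layer_dense(units = 512, activation = "relu") %>%
  layer_dense(units = 512, activation = "relu")
```

In the R interface, piping layers onto a model adds them in place, so there is no need to reassign `model`.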

3.1.3 Output Layer

Now we need to add an output layer using softmax, with one node for each of the ten digits (0-9).
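
One more piped layer finishes the network:

```r
# Ten output units, one per digit, with softmax turning the raw
# activations into class probabilities that sum to 1
model %>%
  layer_dense(units = 10, activation = "softmax")
```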

To see the whole model together, and what it produces, we can print a summary:
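
```r
summary(model)
```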

## Model: "sequential_9"
## ______________________________________________________________________________________________
## Layer (type)                              Output Shape                         Param #        
## ==============================================================================================
## flatten_7 (Flatten)                       (None, 784)                          0              
## ______________________________________________________________________________________________
## dense_23 (Dense)                          (None, 512)                          401920         
## ______________________________________________________________________________________________
## dense_24 (Dense)                          (None, 512)                          262656         
## ______________________________________________________________________________________________
## dense_25 (Dense)                          (None, 10)                           5130           
## ==============================================================================================
## Total params: 669,706
## Trainable params: 669,706
## Non-trainable params: 0
## ______________________________________________________________________________________________
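
As a sanity check on those numbers: the first dense layer has 784 × 512 weights plus 512 biases, or 401,920 parameters; the second has 512 × 512 + 512 = 262,656; and the output layer has 512 × 10 + 10 = 5,130. The flatten layer learns nothing.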

3.1.4 Bonus: fit

First, let’s grab the MNIST data that ships with keras. We’ll rescale the pixel values to lie between 0 and 1 to speed up convergence, and run the labels through to_categorical() so they line up with the softmax output.
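
A sketch of that prep, with assumed variable names:

```r
# MNIST ships with keras: 60,000 training and 10,000 test images
mnist <- dataset_mnist()

# Rescale pixel intensities from 0-255 down to 0-1
x_train <- mnist$train$x / 255
x_test  <- mnist$test$x / 255

# One-hot encode the labels to match the 10-unit softmax layer
y_train <- to_categorical(mnist$train$y, num_classes = 10)
y_test  <- to_categorical(mnist$test$y, num_classes = 10)
```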

What do these images look like?
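
One quick way to peek in base R (an illustrative sketch, not necessarily the original plotting code):

```r
# Plot the first six training digits with their labels
par(mfrow = c(2, 3), mar = c(0, 0, 2, 0))
for (i in 1:6) {
  img <- mnist$train$x[i, , ]
  # transpose and flip so the digit displays upright,
  # dark ink on a light background
  image(t(img)[, 28:1], col = gray((255:0) / 255),
        axes = FALSE, main = mnist$train$y[i])
}
```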

Let’s compile the network with stochastic gradient descent as our optimizer, categorical crossentropy as our loss function, and accuracy as the metric to report.
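
In keras that looks like:

```r
model %>% compile(
  optimizer = "sgd",
  loss = "categorical_crossentropy",
  metrics = "accuracy"
)
```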

Great! So… let’s fit it!
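
A sketch; the epoch count, batch size, and validation split are assumptions. Plotting the returned history is what produces the geom_smooth() message below:

```r
history <- model %>% fit(
  x_train, y_train,
  epochs = 10,            # assumed; not stated in the text
  batch_size = 128,       # assumed
  validation_split = 0.2  # assumed
)

# Plot loss and accuracy by epoch (a ggplot under the hood,
# which is where the geom_smooth() message comes from)
plot(history)
```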

## `geom_smooth()` using formula 'y ~ x'

We can also evaluate this model against the test data:
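
```r
model %>% evaluate(x_test, y_test)
```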

##      loss  accuracy 
## 0.1211862 0.9787000

Look at that accuracy! Not bad!

Let’s look in more detail with caret and see how well our model behaves for each digit.
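
A minimal sketch of one way to get there, assuming the model and data objects from above: take the most probable class for each test image, then hand predictions and truth to caret’s confusionMatrix() as factors.

```r
library(caret)

# Most probable class for each test image, as a digit 0-9
probs <- model %>% predict(x_test)
preds <- max.col(probs) - 1

confusionMatrix(factor(preds, levels = 0:9),
                factor(mnist$test$y, levels = 0:9))
```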

## Confusion Matrix and Statistics
## 
##           Reference
## Prediction    0    1    2    3    4    5    6    7    8    9
##          0  974    0    3    2    1    4    4    1    6    3
##          1    2 1129    0    0    1    0    3    4    1    2
##          2    1    3 1022    6   11    1    3   16    8    1
##          3    0    0    2  991    0   10    1    1    6    5
##          4    0    0    0    0  944    0    1    0    1    4
##          5    0    1    0    4    0  873   12    0    7    4
##          6    2    1    1    0    5    1  932    0    0    0
##          7    1    1    2    3    7    1    0 1004    3    8
##          8    0    0    2    2    1    1    2    1  939    3
##          9    0    0    0    2   12    1    0    1    3  979
## 
## Overall Statistics
##                                           
##                Accuracy : 0.9787          
##                  95% CI : (0.9757, 0.9814)
##     No Information Rate : 0.1135          
##     P-Value [Acc > NIR] : < 2.2e-16       
##                                           
##                   Kappa : 0.9763          
##                                           
##  Mcnemar's Test P-Value : NA              
## 
## Statistics by Class:
## 
##                      Class: 0 Class: 1 Class: 2 Class: 3 Class: 4 Class: 5 Class: 6 Class: 7
## Sensitivity            0.9939   0.9947   0.9903   0.9812   0.9613   0.9787   0.9729   0.9767
## Specificity            0.9973   0.9985   0.9944   0.9972   0.9993   0.9969   0.9989   0.9971
## Pos Pred Value         0.9760   0.9886   0.9534   0.9754   0.9937   0.9689   0.9894   0.9748
## Neg Pred Value         0.9993   0.9993   0.9989   0.9979   0.9958   0.9979   0.9971   0.9973
## Prevalence             0.0980   0.1135   0.1032   0.1010   0.0982   0.0892   0.0958   0.1028
## Detection Rate         0.0974   0.1129   0.1022   0.0991   0.0944   0.0873   0.0932   0.1004
## Detection Prevalence   0.0998   0.1142   0.1072   0.1016   0.0950   0.0901   0.0942   0.1030
## Balanced Accuracy      0.9956   0.9966   0.9924   0.9892   0.9803   0.9878   0.9859   0.9869
##                      Class: 8 Class: 9
## Sensitivity            0.9641   0.9703
## Specificity            0.9987   0.9979
## Pos Pred Value         0.9874   0.9810
## Neg Pred Value         0.9961   0.9967
## Prevalence             0.0974   0.1009
## Detection Rate         0.0939   0.0979
## Detection Prevalence   0.0951   0.0998
## Balanced Accuracy      0.9814   0.9841

Pretty good. There’s some funkiness with fives and eights, but otherwise the model holds up well across the digits.

3.2 Image Classification using CNNs

3.2.1 Building the model architecture

OK, let’s build a CNN to analyze MNIST.
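
A sketch that matches the summary below: 3x3 kernels with "same" padding reproduce the 320 and 18,496 parameter counts and the unchanged 28x28 and 14x14 output shapes, so those are solid inferences; the relu activations are assumptions.

```r
model_cnn <- keras_model_sequential() %>%
  # 32 3x3 filters over the single-channel 28x28 input;
  # "same" padding keeps the 28x28 spatial size
  layer_conv_2d(filters = 32, kernel_size = c(3, 3), padding = "same",
                activation = "relu", input_shape = c(28, 28, 1)) %>%
  layer_max_pooling_2d(pool_size = c(2, 2)) %>%   # 28x28 -> 14x14
  layer_conv_2d(filters = 64, kernel_size = c(3, 3), padding = "same",
                activation = "relu") %>%
  layer_max_pooling_2d(pool_size = c(2, 2)) %>%   # 14x14 -> 7x7
  layer_flatten() %>%                             # 7*7*64 = 3136
  layer_dense(units = 64, activation = "relu") %>%
  layer_dense(units = 64, activation = "relu") %>%
  layer_dense(units = 10, activation = "softmax")
```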

So, what does this look like?
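
Printing the model object shows its summary:

```r
model_cnn
```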

## Model
## Model: "sequential_10"
## ______________________________________________________________________________________________
## Layer (type)                              Output Shape                         Param #        
## ==============================================================================================
## conv2d_9 (Conv2D)                         (None, 28, 28, 32)                   320            
## ______________________________________________________________________________________________
## max_pooling2d_8 (MaxPooling2D)            (None, 14, 14, 32)                   0              
## ______________________________________________________________________________________________
## conv2d_10 (Conv2D)                        (None, 14, 14, 64)                   18496          
## ______________________________________________________________________________________________
## max_pooling2d_9 (MaxPooling2D)            (None, 7, 7, 64)                     0              
## ______________________________________________________________________________________________
## flatten_8 (Flatten)                       (None, 3136)                         0              
## ______________________________________________________________________________________________
## dense_26 (Dense)                          (None, 64)                           200768         
## ______________________________________________________________________________________________
## dense_27 (Dense)                          (None, 64)                           4160           
## ______________________________________________________________________________________________
## dense_28 (Dense)                          (None, 10)                           650            
## ==============================================================================================
## Total params: 224,394
## Trainable params: 224,394
## Non-trainable params: 0
## ______________________________________________________________________________________________
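
The parameter counts check out the same way: the first convolution has (3 × 3 × 1) × 32 weights plus 32 biases = 320, the second has (3 × 3 × 32) × 64 + 64 = 18,496, and the pooling and flatten layers learn nothing. Note that the whole CNN has about a third the parameters of the MLP (224,394 vs. 669,706).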

3.2.2 Bonus: Let’s fit this and compare performance

First, we need to reshape the data a bit so that each image carries an explicit single channel (black and white). Don’t worry, we’ve already rescaled.
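
A sketch, reusing the rescaled arrays from the MLP section; the compile and fit settings are assumptions carried over from before:

```r
# Add a trailing channel dimension: (n, 28, 28) -> (n, 28, 28, 1)
x_train_cnn <- array_reshape(x_train, c(dim(x_train)[1], 28, 28, 1))
x_test_cnn  <- array_reshape(x_test,  c(dim(x_test)[1], 28, 28, 1))

# Compile and fit with the same settings used for the MLP (assumed)
model_cnn %>% compile(
  optimizer = "sgd",
  loss = "categorical_crossentropy",
  metrics = "accuracy"
)
model_cnn %>% fit(
  x_train_cnn, y_train,
  epochs = 10, batch_size = 128, validation_split = 0.2  # assumed
)
```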

We can also evaluate it as before:
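
```r
model_cnn %>% evaluate(x_test_cnn, y_test)
```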

##       loss   accuracy 
## 0.03208445 0.99140000

Wow. That is a damn sight better.

And the confusion matrix and a breakdown by class, built just as before against the reshaped test set:
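
```r
probs_cnn <- model_cnn %>% predict(x_test_cnn)
preds_cnn <- max.col(probs_cnn) - 1

confusionMatrix(factor(preds_cnn, levels = 0:9),
                factor(mnist$test$y, levels = 0:9))
```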


Also here. Excellent.