Chapter 3 Convolutional Neural Networks

We will start with a plain multilayer perceptron (MLP) as a baseline for image classification, then rebuild the classifier as a CNN and compare.

3.1 Image Classification with an MLP

3.1.1 Input Layer

Let’s start by building a model that flattens each 28x28 image matrix into a 784-element vector.
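
A minimal sketch with the keras R package; the object name `model` is an assumption used throughout:

```r
library(keras)

# Start a sequential model whose first layer flattens each
# 28x28 image matrix into a length-784 vector
model <- keras_model_sequential() %>%
  layer_flatten(input_shape = c(28, 28))
```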

Great, now let’s add some hidden layers.

3.1.2 Hidden Layers

We want to add two dense layers with 512 nodes each.
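
Something like the following; relu is an assumption, since the text does not name an activation (the parameter counts in the summary below don’t depend on it):

```r
# Two fully connected hidden layers of 512 nodes; relu is an
# assumption, the text does not specify the activation
model %>%
  layer_dense(units = 512, activation = "relu") %>%
  layer_dense(units = 512, activation = "relu")
```

In the R interface, piping layers onto a model adds them in place, so there is no need to reassign `model`.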

3.1.3 Output Layer

Now we need to add an output layer using softmax, with one node for each of the ten digits (0-9).
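
One more piped layer finishes the network:

```r
# Ten output units, one per digit, with softmax turning the raw
# activations into class probabilities that sum to 1
model %>%
  layer_dense(units = 10, activation = "softmax")
```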

To see the whole model together, and what it produces, we can print a summary:
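
```r
summary(model)
```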

## Model: "sequential_9"
## ______________________________________________________________________________________________
## Layer (type)                              Output Shape                         Param #        
## ==============================================================================================
## flatten_7 (Flatten)                       (None, 784)                          0              
## ______________________________________________________________________________________________
## dense_23 (Dense)                          (None, 512)                          401920         
## ______________________________________________________________________________________________
## dense_24 (Dense)                          (None, 512)                          262656         
## ______________________________________________________________________________________________
## dense_25 (Dense)                          (None, 10)                           5130           
## ==============================================================================================
## Total params: 669,706
## Trainable params: 669,706
## Non-trainable params: 0
## ______________________________________________________________________________________________
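
As a sanity check on those numbers: the first dense layer has 784 × 512 weights plus 512 biases, or 401,920 parameters; the second has 512 × 512 + 512 = 262,656; and the output layer has 512 × 10 + 10 = 5,130. The flatten layer learns nothing.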

3.1.4 Bonus: fit

First, let’s grab the MNIST data that ships with keras. We’ll rescale the pixel values to lie between 0 and 1 to speed up convergence, and run the labels through to_categorical() so they line up with the softmax output.
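
A sketch of that prep, with assumed variable names:

```r
# MNIST ships with keras: 60,000 training and 10,000 test images
mnist <- dataset_mnist()

# Rescale pixel intensities from 0-255 down to 0-1
x_train <- mnist$train$x / 255
x_test  <- mnist$test$x / 255

# One-hot encode the labels to match the 10-unit softmax layer
y_train <- to_categorical(mnist$train$y, num_classes = 10)
y_test  <- to_categorical(mnist$test$y, num_classes = 10)
```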

What do these images look like?
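
One quick way to peek in base R (an illustrative sketch, not necessarily the original plotting code):

```r
# Plot the first six training digits with their labels
par(mfrow = c(2, 3), mar = c(0, 0, 2, 0))
for (i in 1:6) {
  img <- mnist$train$x[i, , ]
  # transpose and flip so the digit displays upright,
  # dark ink on a light background
  image(t(img)[, 28:1], col = gray((255:0) / 255),
        axes = FALSE, main = mnist$train$y[i])
}
```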

Let’s compile the network with stochastic gradient descent as our optimizer, categorical crossentropy as our loss function, and accuracy as the metric to report.
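
In keras that looks like:

```r
model %>% compile(
  optimizer = "sgd",
  loss = "categorical_crossentropy",
  metrics = "accuracy"
)
```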

Great! So… let’s fit it!
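
A sketch; the epoch count, batch size, and validation split are assumptions. Plotting the returned history is what produces the geom_smooth() message below:

```r
history <- model %>% fit(
  x_train, y_train,
  epochs = 10,            # assumed; not stated in the text
  batch_size = 128,       # assumed
  validation_split = 0.2  # assumed
)

# Plot loss and accuracy by epoch (a ggplot under the hood,
# which is where the geom_smooth() message comes from)
plot(history)
```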

## `geom_smooth()` using formula 'y ~ x'

We can also evaluate this model against the test data:
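
```r
model %>% evaluate(x_test, y_test)
```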

##      loss  accuracy 
## 0.1211862 0.9787000

Look at that accuracy! Not bad!

Let’s look in more detail with caret and see how well our model behaves for each digit.
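
A minimal sketch of one way to get there, assuming the model and data objects from above: take the most probable class for each test image, then hand predictions and truth to caret’s confusionMatrix() as factors.

```r
library(caret)

# Most probable class for each test image, as a digit 0-9
probs <- model %>% predict(x_test)
preds <- max.col(probs) - 1

confusionMatrix(factor(preds, levels = 0:9),
                factor(mnist$test$y, levels = 0:9))
```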

## Confusion Matrix and Statistics
## 
##           Reference
## Prediction    0    1    2    3    4    5    6    7    8    9
##          0  974    0    3    2    1    4    4    1    6    3
##          1    2 1129    0    0    1    0    3    4    1    2
##          2    1    3 1022    6   11    1    3   16    8    1
##          3    0    0    2  991    0   10    1    1    6    5
##          4    0    0    0    0  944    0    1    0    1    4
##          5    0    1    0    4    0  873   12    0    7    4
##          6    2    1    1    0    5    1  932    0    0    0
##          7    1    1    2    3    7    1    0 1004    3    8
##          8    0    0    2    2    1    1    2    1  939    3
##          9    0    0    0    2   12    1    0    1    3  979
## 
## Overall Statistics
##                                           
##                Accuracy : 0.9787          
##                  95% CI : (0.9757, 0.9814)
##     No Information Rate : 0.1135          
##     P-Value [Acc > NIR] : < 2.2e-16       
##                                           
##                   Kappa : 0.9763          
##                                           
##  Mcnemar's Test P-Value : NA              
## 
## Statistics by Class:
## 
##                      Class: 0 Class: 1 Class: 2 Class: 3 Class: 4 Class: 5 Class: 6 Class: 7
## Sensitivity            0.9939   0.9947   0.9903   0.9812   0.9613   0.9787   0.9729   0.9767
## Specificity            0.9973   0.9985   0.9944   0.9972   0.9993   0.9969   0.9989   0.9971
## Pos Pred Value         0.9760   0.9886   0.9534   0.9754   0.9937   0.9689   0.9894   0.9748
## Neg Pred Value         0.9993   0.9993   0.9989   0.9979   0.9958   0.9979   0.9971   0.9973
## Prevalence             0.0980   0.1135   0.1032   0.1010   0.0982   0.0892   0.0958   0.1028
## Detection Rate         0.0974   0.1129   0.1022   0.0991   0.0944   0.0873   0.0932   0.1004
## Detection Prevalence   0.0998   0.1142   0.1072   0.1016   0.0950   0.0901   0.0942   0.1030
## Balanced Accuracy      0.9956   0.9966   0.9924   0.9892   0.9803   0.9878   0.9859   0.9869
##                      Class: 8 Class: 9
## Sensitivity            0.9641   0.9703
## Specificity            0.9987   0.9979
## Pos Pred Value         0.9874   0.9810
## Neg Pred Value         0.9961   0.9967
## Prevalence             0.0974   0.1009
## Detection Rate         0.0939   0.0979
## Detection Prevalence   0.0951   0.0998
## Balanced Accuracy      0.9814   0.9841

Pretty good. There’s some funkiness with fives and eights, but otherwise the model holds up well across the digits.

3.2 Image Classification using CNNs

3.2.1 Building the model architecture

OK, let’s build a CNN to analyze MNIST.
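
A sketch that matches the summary below: 3x3 kernels with "same" padding reproduce the 320 and 18,496 parameter counts and the unchanged 28x28 and 14x14 output shapes, so those are solid inferences; the relu activations are assumptions.

```r
model_cnn <- keras_model_sequential() %>%
  # 32 3x3 filters over the single-channel 28x28 input;
  # "same" padding keeps the 28x28 spatial size
  layer_conv_2d(filters = 32, kernel_size = c(3, 3), padding = "same",
                activation = "relu", input_shape = c(28, 28, 1)) %>%
  layer_max_pooling_2d(pool_size = c(2, 2)) %>%   # 28x28 -> 14x14
  layer_conv_2d(filters = 64, kernel_size = c(3, 3), padding = "same",
                activation = "relu") %>%
  layer_max_pooling_2d(pool_size = c(2, 2)) %>%   # 14x14 -> 7x7
  layer_flatten() %>%                             # 7*7*64 = 3136
  layer_dense(units = 64, activation = "relu") %>%
  layer_dense(units = 64, activation = "relu") %>%
  layer_dense(units = 10, activation = "softmax")
```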

So, what does this look like?
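
Printing the model object shows its summary:

```r
model_cnn
```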

## Model
## Model: "sequential_10"
## ______________________________________________________________________________________________
## Layer (type)                              Output Shape                         Param #        
## ==============================================================================================
## conv2d_9 (Conv2D)                         (None, 28, 28, 32)                   320            
## ______________________________________________________________________________________________
## max_pooling2d_8 (MaxPooling2D)            (None, 14, 14, 32)                   0              
## ______________________________________________________________________________________________
## conv2d_10 (Conv2D)                        (None, 14, 14, 64)                   18496          
## ______________________________________________________________________________________________
## max_pooling2d_9 (MaxPooling2D)            (None, 7, 7, 64)                     0              
## ______________________________________________________________________________________________
## flatten_8 (Flatten)                       (None, 3136)                         0              
## ______________________________________________________________________________________________
## dense_26 (Dense)                          (None, 64)                           200768         
## ______________________________________________________________________________________________
## dense_27 (Dense)                          (None, 64)                           4160           
## ______________________________________________________________________________________________
## dense_28 (Dense)                          (None, 10)                           650            
## ==============================================================================================
## Total params: 224,394
## Trainable params: 224,394
## Non-trainable params: 0
## ______________________________________________________________________________________________
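
The parameter counts check out the same way: the first convolution has (3 × 3 × 1) × 32 weights plus 32 biases = 320, the second has (3 × 3 × 32) × 64 + 64 = 18,496, and the pooling and flatten layers learn nothing. Note that the whole CNN has about a third the parameters of the MLP (224,394 vs. 669,706).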

3.2.2 Bonus: Let’s fit this and compare performance

First, we need to reshape the data a bit so that each image carries an explicit single channel (black and white). Don’t worry, we’ve already rescaled.
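
A sketch, reusing the rescaled arrays from the MLP section; the compile and fit settings are assumptions carried over from before:

```r
# Add a trailing channel dimension: (n, 28, 28) -> (n, 28, 28, 1)
x_train_cnn <- array_reshape(x_train, c(dim(x_train)[1], 28, 28, 1))
x_test_cnn  <- array_reshape(x_test,  c(dim(x_test)[1], 28, 28, 1))

# Compile and fit with the same settings used for the MLP (assumed)
model_cnn %>% compile(
  optimizer = "sgd",
  loss = "categorical_crossentropy",
  metrics = "accuracy"
)
model_cnn %>% fit(
  x_train_cnn, y_train,
  epochs = 10, batch_size = 128, validation_split = 0.2  # assumed
)
```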

We can also evaluate it as before:
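
```r
model_cnn %>% evaluate(x_test_cnn, y_test)
```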

##       loss   accuracy 
## 0.03208445 0.99140000

Wow. That is a damn sight better.

And the confusion matrix and a breakdown by class, built just as before against the reshaped test set:
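
```r
probs_cnn <- model_cnn %>% predict(x_test_cnn)
preds_cnn <- max.col(probs_cnn) - 1

confusionMatrix(factor(preds_cnn, levels = 0:9),
                factor(mnist$test$y, levels = 0:9))
```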


Also here. Excellent.