Introduction to Neural Networks

IN2004B: Generation of Value with Data Analytics

Alan R. Vazquez

Department of Industrial Engineering

Agenda



  1. Basics and Terminology

  2. Application

  3. When to use a Neural Network

Basics and Terminology

Historical Background

  • The idea of having “an electronic brain” dates back to the 1940s.

  • Neural networks rose to fame in the late 1980s, but they did not take off, due to the lack of computing power and the discovery of mathematically tractable machine learning algorithms such as Support Vector Machines.

  • In the 2010s, NNs resurged thanks to the computational speed-ups provided by newly developed GPUs. They were also rebranded as deep learning.

  • Many innovations followed, such as massively parallel computing with GPUs, modern architectures such as CNNs and RNNs, and modern optimization algorithms to train NNs.

Nowadays…



NN have become prominent in many scientific fields.

Their most successful applications include image classification, video classification, etc. Some of them are fun:

Neural Network (NN)

Essentially, a neural network is a nonlinear function \(f(\boldsymbol{X})\) to predict the response \(Y\) using a vector of p inputs, \(\boldsymbol{X} = (X_1, X_2, ..., X_p)\).

The function \(f(\boldsymbol{X})\) can be represented as a “network” with several interconnected nodes.

The nodes and network structure might make us think of how the neurons in the brain are connected and communicate with each other. However, the analogy should not be taken literally, since we still do not know how the brain actually works!

Terminology



  • The structure of the NN is called its architecture, which depicts all the components and steps taken to reach the prediction.

  • The nodes in the network are called units.

  • The units are divided into groups called layers.

Perceptron

The simplest type of neural network is the perceptron.

Activation Function

An activation function turns several input values into a single number.

NNs were originally developed for classification tasks.

Therefore, they use the Logistic or Sigmoid function:

\[g(z) = \frac{1}{1 + e^{-z}}.\]

The function’s output is between 0 and 1, which can be interpreted as the probability for the target class.
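As a quick sanity check, the sigmoid can be written in a couple of lines of numpy; a minimal sketch:

```python
import numpy as np

def sigmoid(z):
    # Logistic (sigmoid) activation: g(z) = 1 / (1 + e^(-z)),
    # mapping any real number into the interval (0, 1).
    return 1.0 / (1.0 + np.exp(-z))

print(sigmoid(0.0))   # exactly 0.5
print(sigmoid(5.0))   # close to 1
print(sigmoid(-5.0))  # close to 0
```

Note that \(g(z) + g(-z) = 1\), which is why the output can be read as the probability of the target class versus its complement.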

Single Layer NN…

has two layers and multiple hidden units.



Modern NNs use the Rectified Linear Unit (ReLU) as an activation function:

\[g(z) = \begin{cases} 0 \text{ if } z < 0 \\ z \text{ if } z \geq 0 \end{cases}\]

  • The function’s output is between 0 and \(\infty\).

  • The ReLU function allows for a more computationally efficient training of a NN than the sigmoid function.
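The ReLU function is a one-liner in numpy, since np.maximum applies the maximum elementwise; a minimal sketch:

```python
import numpy as np

def relu(z):
    # ReLU: returns 0 for negative inputs and z itself otherwise.
    return np.maximum(0.0, z)

# Negative entries are clipped to 0; non-negative ones pass through.
print(relu(np.array([-2.0, -0.5, 0.0, 1.5, 3.0])))
```

Its simplicity is exactly why it trains faster than the sigmoid: there is no exponential to evaluate, and the gradient is either 0 or 1.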

Output Function



The output function takes the values of the hidden units as inputs and produces the final prediction of the response.


The function \(f\) depends on the type of problem:

  • Regression: \(f(\beta_{0} + \sum_{k=1}^{2} \beta_{k} A_k) = \beta_{0} + \sum_{k=1}^{2} \beta_{k} A_k\)

  • Classification: \(f(\beta_{0} + \sum_{k=1}^{2} \beta_{k} A_k) = \frac{1}{1 + e^{-(\beta_{0} + \sum_{k=1}^{2} \beta_{k} A_k)}}\).
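To make the formulas concrete, here is a minimal numpy sketch of the forward pass of a single-layer NN with \(K = 2\) hidden units, using ReLU in the hidden layer. The weights, biases, and input below are hypothetical values chosen only for illustration:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def single_layer_nn(x, W, b, beta, beta0, task="regression"):
    # Hidden layer: A_k = g(b_k + w_k' x), with g = ReLU.
    A = np.maximum(0.0, W @ x + b)
    # Linear combination of the hidden units.
    z = beta0 + beta @ A
    # The output function f depends on the type of problem.
    return sigmoid(z) if task == "classification" else z

# Hypothetical parameters for p = 3 inputs and K = 2 hidden units.
W = np.array([[0.5, -0.2, 0.1],
              [0.3,  0.8, -0.5]])
b = np.array([0.1, -0.2])
beta = np.array([1.0, -0.7])
beta0 = 0.5
x = np.array([1.0, 2.0, 3.0])

print(single_layer_nn(x, W, b, beta, beta0, task="regression"))      # 0.86
print(single_layer_nn(x, W, b, beta, beta0, task="classification"))  # in (0, 1)
```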

Discussion



  • The user must specify the network architecture: the number of hidden units (\(K\)) to use.

  • The more hidden units, the longer the training time and the more complex the NN.

  • In theory, a NN with one layer and sufficiently many hidden units will work for any prediction or classification problem. In other words, a NN is a universal approximator.

Multilayer Neural Networks


Multilayer NN have multiple hidden layers.


All neurons in one layer are fully connected to those in the next layer. This is referred to as a fully connected multilayer NN.


Three layers is typical, but more are possible too (a deeper multilayer NN).

Some Comments



  • Multilayer NNs are cheaper to train than a single-layer NN with many hidden units.

  • This is because their training leverages modern GPUs and parallel computing.

  • Ideally, the multilayer NN should be neither too wide nor too deep.

Application

A new library: tensorflow

  • TensorFlow is an open-source Python library for building and training machine learning models, especially neural networks.
  • It provides high-level APIs such as Keras for fast and intuitive model development.
  • It supports efficient computation on CPUs, GPUs, and TPUs for large-scale learning tasks.
  • https://www.tensorflow.org/

The libraries



Let’s import tensorflow and our standard Python libraries.

import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.metrics import confusion_matrix, ConfusionMatrixDisplay
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

Here, we use specific functions from the numpy, matplotlib, seaborn, sklearn, and tensorflow libraries in Python.

Classifying Hand-Written Digits


The MNIST dataset is one of the most widely used datasets for illustrating the performance of neural networks.

  • Contains 70,000 grayscale images of handwritten digits (0–9)
  • Each image has a size of 28 × 28 pixels
  • Pixel values range from 0 (black) to 255 (white)
  • The response variable is the digit label (0–9)

Goal: Predict the correct digit based on the pixel values of the image.

The MNIST dataset

It is a built-in dataset in tensorflow. It even comes already partitioned into a training and a test dataset, and into a predictor matrix and a response vector.

(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()

Let’s inspect the shapes of the datasets.

print(f"Shape of x_train: {x_train.shape}")
print(f"Shape of y_train: {y_train.shape}")
print(f"Shape of x_test: {x_test.shape}")
print(f"Shape of y_test: {y_test.shape}")
Shape of x_train: (60000, 28, 28)
Shape of y_train: (60000,)
Shape of x_test: (10000, 28, 28)
Shape of y_test: (10000,)

Example images

Code
plt.figure(figsize=(10, 4))
for i in range(10):
    ax = plt.subplot(2, 5, i + 1)  # 2 rows, 5 columns    
    sns.heatmap(
        x_train[i],
        cmap='gray',
        cbar=False,
        xticklabels=False,
        yticklabels=False,
        ax=ax
    )    
    ax.set_title(f"Label: {y_train[i]}")
    ax.axis('off')
plt.suptitle('First 10 MNIST Training Images', fontsize=16)
plt.tight_layout(rect=[0, 0.03, 1, 0.95])
plt.show()

Data Pre-processing


Neural networks work best with inputs (or predictors) that take values between 0 and 1. For the images in the MNIST dataset, we can normalize the pixels of an image by dividing them by the largest possible pixel value (255).


In Python, it will be something like this:

x_train_normalized = x_train.astype('float32') / 255.0
x_test_normalized = x_test.astype('float32') / 255.0

Flattening: Turn a matrix into a vector


The input images of the MNIST dataset are matrices. However, the neural networks take vectors as input, which is why we need to flatten each matrix of pixels into a single vector.


We achieve this using the code below. Note that we apply it to the predictors from both the training and test dataset.

image_size = x_train.shape[1] * x_train.shape[2] # 28 * 28 = 784
x_train_flattened = x_train_normalized.reshape(-1, image_size)
x_test_flattened = x_test_normalized.reshape(-1, image_size)



Let’s see the new dimensions of the input.

print(f"Shape of x_train_flattened after flattening: {x_train_flattened.shape}")
print(f"Shape of x_test_flattened after flattening: {x_test_flattened.shape}")
Shape of x_train_flattened after flattening: (60000, 784)
Shape of x_test_flattened after flattening: (10000, 784)


Since the images are \(28 \times 28\), the predictor input is a \(1 \times 784\) vector for each image.

Pre-processing the response

We must also turn the response into a categorical response with its corresponding dummy-variable (one-hot) encoding. First, we specify that there are 10 categories.

num_classes = 10

Next, we turn the responses into dummy variables in the training and test datasets.

y_train_one_hot = tf.keras.utils.to_categorical(y_train, num_classes)
y_test_one_hot = tf.keras.utils.to_categorical(y_test, num_classes)

print(f"Shape of y_train_one_hot after one-hot encoding: {y_train_one_hot.shape}")
print(f"Shape of y_test_one_hot after one-hot encoding: {y_test_one_hot.shape}")
Shape of y_train_one_hot after one-hot encoding: (60000, 10)
Shape of y_test_one_hot after one-hot encoding: (10000, 10)

Setting the Structure of the Neural Network

The neural network we will build has the following structure:

  1. Input Layer / First Hidden Layer: A Dense layer with 256 units and a ReLU (Rectified Linear Unit) activation function. The input_shape is (784,), corresponding to the flattened 28x28 MNIST images. ReLU is chosen as the activation function due to its computational efficiency and its ability to mitigate the vanishing gradient problem.



  2. Second Hidden Layer: Another Dense layer with 128 units, also using ReLU activation. This layer further processes the features extracted by the previous layer.

  3. Output Layer: A Dense layer with 10 units, corresponding to the 10 possible digit classes (0-9) in the MNIST dataset. It uses a Softmax activation function, which outputs a probability distribution over the classes. The class with the highest probability is chosen as the model’s prediction.
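The softmax function generalizes the sigmoid to multiple classes: it turns a vector of raw scores into a probability distribution. A minimal numpy sketch, with hypothetical scores for three classes:

```python
import numpy as np

def softmax(z):
    # Subtract the maximum before exponentiating for numerical
    # stability; the outputs are non-negative and sum to 1.
    e = np.exp(z - np.max(z))
    return e / e.sum()

scores = np.array([2.0, 1.0, 0.1])  # hypothetical raw class scores
probs = softmax(scores)
print(probs)             # a probability distribution over 3 classes
print(np.argmax(probs))  # 0: the class with the highest score wins
```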

In tensorflow

model = Sequential()
model.add(Dense(256, activation='relu', input_shape=(image_size,)))
model.add(Dense(128, activation='relu'))
model.add(Dense(num_classes, activation='softmax'))
model.compile(optimizer = 'adam', loss = 'categorical_crossentropy', 
              metrics = ['accuracy'])
model.summary()
Model: "sequential"
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Layer (type)                     Output Shape                  Param # ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ dense (Dense)                   │ (None, 256)            │       200,960 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ dense_1 (Dense)                 │ (None, 128)            │        32,896 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ dense_2 (Dense)                 │ (None, 10)             │         1,290 │
└─────────────────────────────────┴────────────────────────┴───────────────┘
 Total params: 235,146 (918.54 KB)
 Trainable params: 235,146 (918.54 KB)
 Non-trainable params: 0 (0.00 B)
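The parameter counts in the summary can be verified by hand: each Dense layer has (number of inputs × number of units) weights, plus one bias per unit.

```python
# Weights plus biases for each Dense layer of the model above.
layer1 = 784 * 256 + 256   # input -> first hidden layer
layer2 = 256 * 128 + 128   # first -> second hidden layer
layer3 = 128 * 10 + 10     # second hidden -> output layer

print(layer1, layer2, layer3)    # 200960 32896 1290
print(layer1 + layer2 + layer3)  # 235146, matching the summary
```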

Number of epochs

To train a neural network, we must set the number of epochs, which is the number of times the training algorithm passes through the entire dataset.

The number of epochs can be seen as the number of iterations of the training algorithm. We expect that, the larger the number of epochs, the better the performance of the algorithm. However, a larger number also increases the computing time needed to train the network.

Let’s use 10 epochs as an example.

epochs = 10

Batch size


Another important parameter is the batch_size. Essentially, this parameter controls the number of images that are processed at a time in each iteration (gradient-update step) during training.

It is recommended that this number be a power of two, such as 8, 16, 64, or 128. Here, we fix it to 64.

batch_size = 64
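With these settings, we can compute how many gradient-update steps one epoch takes; this is the step count that Keras reports while training (60,000 training images in batches of 64):

```python
import math

n_train = 60000   # training images in MNIST
batch_size = 64

# One epoch is one full pass over the data, so the number of
# batches (steps) per epoch is n_train / batch_size, rounded up.
steps_per_epoch = math.ceil(n_train / batch_size)
print(steps_per_epoch)  # 938
```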

Training

Now, we are ready to train the neural network.

history = model.fit(
    x_train_flattened, y_train_one_hot,
    epochs=epochs,
    batch_size=batch_size,
    validation_data=(x_test_flattened, y_test_one_hot)
)
Epoch 1/10
938/938 ━━━━━━━━━━━━━━━━━━━━ 1s 1ms/step - accuracy: 0.8825 - loss: 0.4057 - val_accuracy: 0.9671 - val_loss: 0.1021
Epoch 2/10
938/938 ━━━━━━━━━━━━━━━━━━━━ 1s 1ms/step - accuracy: 0.9729 - loss: 0.0890 - val_accuracy: 0.9658 - val_loss: 0.1001
Epoch 3/10
938/938 ━━━━━━━━━━━━━━━━━━━━ 1s 1ms/step - accuracy: 0.9815 - loss: 0.0579 - val_accuracy: 0.9769 - val_loss: 0.0738
Epoch 4/10
938/938 ━━━━━━━━━━━━━━━━━━━━ 1s 1ms/step - accuracy: 0.9867 - loss: 0.0423 - val_accuracy: 0.9759 - val_loss: 0.0772
Epoch 5/10
938/938 ━━━━━━━━━━━━━━━━━━━━ 1s 1ms/step - accuracy: 0.9910 - loss: 0.0292 - val_accuracy: 0.9762 - val_loss: 0.0820
Epoch 6/10
938/938 ━━━━━━━━━━━━━━━━━━━━ 1s 1ms/step - accuracy: 0.9929 - loss: 0.0238 - val_accuracy: 0.9790 - val_loss: 0.0723
Epoch 7/10
938/938 ━━━━━━━━━━━━━━━━━━━━ 1s 1ms/step - accuracy: 0.9937 - loss: 0.0194 - val_accuracy: 0.9780 - val_loss: 0.0848
Epoch 8/10
938/938 ━━━━━━━━━━━━━━━━━━━━ 1s 1ms/step - accuracy: 0.9943 - loss: 0.0165 - val_accuracy: 0.9806 - val_loss: 0.0767
Epoch 9/10
938/938 ━━━━━━━━━━━━━━━━━━━━ 1s 1ms/step - accuracy: 0.9956 - loss: 0.0133 - val_accuracy: 0.9800 - val_loss: 0.0779
Epoch 10/10
938/938 ━━━━━━━━━━━━━━━━━━━━ 1s 1ms/step - accuracy: 0.9963 - loss: 0.0108 - val_accuracy: 0.9780 - val_loss: 0.0872

Evaluate Algorithm Performance


We evaluate the classification performance of the neural network using the test data. To this end, we apply the neural network classifier using the method .evaluate().

Remember, the input must be the flattened (vectorized) images in x_test_flattened and the dummy-coded responses in y_test_one_hot.

test_loss, test_accuracy = model.evaluate(x_test_flattened, y_test_one_hot)

# Show the accuracy on the test dataset.
print(f"Test Accuracy: {test_accuracy:.4f}")

Accuracy




In the multi-class problem, accuracy is still the proportion of correct decisions made.

print(f"Test Accuracy: {test_accuracy:.4f}")
Test Accuracy: 0.9780

Individual Classifications



As with any classifier seen before, the neural network calculates the probability that an image belongs to each class (0-9). We can have a look at these probabilities using the method .predict().

y_pred_probabilities = model.predict(x_test_flattened)




Following the Bayes classifier, we classify each observation in the test dataset into the most probable class. This can be done using the command below, where np.argmax() is a function from numpy that returns the index of the maximum value.

y_pred_labels = np.argmax(y_pred_probabilities, axis=1)
y_pred_labels
array([7, 2, 1, ..., 4, 5, 6])

Confusion Matrix

Code
y_true_labels = np.argmax(y_test_one_hot, axis=1)
cm = confusion_matrix(y_true_labels, y_pred_labels)
ConfusionMatrixDisplay(cm).plot()
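The overall accuracy can also be read off any confusion matrix: it is the sum of the diagonal (the correct decisions) divided by the total number of observations. A short sketch with a small hypothetical 3-class confusion matrix:

```python
import numpy as np

# Hypothetical confusion matrix for a 3-class problem:
# rows are true classes, columns are predicted classes.
cm = np.array([[50,  2,  1],
               [ 3, 45,  2],
               [ 0,  4, 43]])

# Correct decisions lie on the diagonal.
accuracy = np.trace(cm) / cm.sum()
print(accuracy)  # 0.92
```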

When to use a Neural Network

General Remarks



  • The “deep” in deep learning is not a reference to any kind of deeper understanding achieved by the approach.

  • Instead, it stands for the idea of successive layers of representations.

  • Modern deep learning often involves tens or even hundreds of successive layers.

When should I use a NN?

NNs vs. Other Algorithms

Return to main page