IN2004B: Generation of Value with Data Analytics
Department of Industrial Engineering
Basics and Terminology
Application
When to use a Neural Network
The idea of having “an electronic brain” dates back to the 1940s.
Neural networks rose to fame in the late 1980s, but they did not take off due to the lack of computing power and the emergence of mathematically tractable machine learning algorithms such as Support Vector Machines.
In the 2010s, NN resurged thanks to the computational speed-ups given by newly developed GPUs. The field was also rebranded as deep learning.
Many innovations followed, such as massively parallel computing with GPUs, modern architectures such as CNNs and RNNs, modern optimization algorithms to train NN, and so on.
NN have become prominent in many scientific fields.
Their most successful applications include image classification, video classification, and many others; some of them are quite fun.
Essentially, a neural network is a nonlinear function \(f(\boldsymbol{X})\) to predict the response \(Y\) using a vector of p inputs, \(\boldsymbol{X} = (X_1, X_2, ..., X_p)\).
The function \(f(\boldsymbol{X})\) can be represented as a “network” with several interconnected nodes.
The nodes and network structure might make us think of how the neurons in the brain are connected and communicate with each other. However, this analogy should not be taken literally, since we still do not know how the brain actually works!
The structure of the NN is called its architecture, which depicts all the components and steps taken to reach the prediction.
The nodes in the network are called units.
The units are divided into groups called layers.
The simplest type of neural network is the perceptron.
An activation function turns several input values into a single number.
NN were originally developed for classification tasks.
Therefore, they use the Logistic or Sigmoid function:
\[g(z) = \frac{1}{1 + e^{-z}}.\]
The function’s output is between 0 and 1, which can be interpreted as the probability for the target class.
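A minimal sketch of the sigmoid, assuming only NumPy, illustrates how any real input is squashed into the (0, 1) interval:

```python
import numpy as np

def sigmoid(z):
    # Maps any real number z to the interval (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

print(sigmoid(0))    # 0.5
print(sigmoid(5))    # close to 1
print(sigmoid(-5))   # close to 0
```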
A single-hidden-layer NN has two layers (a hidden layer and an output layer) and multiple hidden units.
Modern NNs use the Rectified Linear Unit (ReLU) as an activation function:
\[g(z) = \begin{cases} 0 \text{ if } z < 0 \\ z \text{ if } z \geq 0 \end{cases}\]
The function’s output is between 0 and \(\infty\).
The ReLU function allows for a more computationally efficient training of a NN than the sigmoid function.
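ReLU is also cheap to evaluate, as this NumPy sketch shows: it is just an elementwise maximum with zero.

```python
import numpy as np

def relu(z):
    # Outputs 0 for negative inputs and z itself otherwise
    return np.maximum(0, z)

print(relu(np.array([-2.0, 0.0, 3.0])))  # [0. 0. 3.]
```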
The output function takes the values of the hidden units as inputs to produce the final prediction of the response.
The function \(f\) depends on the type of problem:
Regression: \(f(\beta_{0} + \sum_{k=1}^{2} \beta_{k} A_k) = \beta_{0} + \sum_{k=1}^{2} \beta_{k} A_k\)
Classification: \(f(\beta_{0} + \sum_{k=1}^{2} \beta_{k} A_k) = \frac{1}{1 + e^{-(\beta_{0} + \sum_{k=1}^{2} \beta_{k} A_k)}}\).
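A small worked example in NumPy, with K = 2 hidden units as in the formulas above. The coefficient and hidden-unit values here are hypothetical, chosen only to illustrate the two output functions:

```python
import numpy as np

beta0 = 0.5
beta = np.array([1.0, -2.0])   # hypothetical coefficients beta_1, beta_2
A = np.array([0.3, 0.1])       # hypothetical hidden-unit values A_1, A_2

z = beta0 + beta @ A           # beta_0 + sum_k beta_k * A_k = 0.6

regression_output = z                          # identity for regression
classification_output = 1 / (1 + np.exp(-z))   # sigmoid for classification
```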
The user must specify the network architecture: the number of hidden units (\(K\)) to use.
The more hidden units, the longer the training time and the more complex the NN.
In theory, a NN with one hidden layer and sufficiently many units can approximate any prediction or classification function. In other words, a NN is a universal approximator.
Multilayer NN have multiple hidden layers.
All neurons in one layer are fully connected to those in the next layer. This is referred to as a fully connected multilayer NN.
Three layers is typical, but more are possible too (a deeper multilayer NN).
Multilayer NN are cheaper to train than a single-layer NN with many hidden units.
This is because their training leverages modern GPUs and parallel computing.
Ideally, the multilayer NN should be neither too wide nor too deep.

Let’s import tensorflow and our standard Python libraries.
Here, we use specific functions from the pandas, matplotlib, seaborn and sklearn libraries in Python.
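One plausible import cell is sketched below; the exact sklearn functions used later are not shown here, so only the top-level packages are imported:

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import sklearn
import tensorflow as tf
```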
The MNIST dataset is one of the most widely used datasets for illustrating the performance of neural networks.
Goal: Predict the correct digit based on the pixel values of the image.
It is included as a dataset in tensorflow. It even comes pre-partitioned into a training and a test dataset, and into a predictor matrix and a response vector.
Let’s inspect the shapes of the datasets.
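A sketch of loading MNIST via tf.keras and printing the array shapes (the shapes shown are the standard MNIST dimensions):

```python
import tensorflow as tf

# Load MNIST; it comes already split into training and test sets
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()

print("x_train:", x_train.shape)  # (60000, 28, 28)
print("y_train:", y_train.shape)  # (60000,)
print("x_test:", x_test.shape)    # (10000, 28, 28)
print("y_test:", y_test.shape)    # (10000,)
```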
plt.figure(figsize=(10, 4))
for i in range(10):
    ax = plt.subplot(2, 5, i + 1)  # 2 rows, 5 columns
    sns.heatmap(
        x_train[i],
        cmap='gray',
        cbar=False,
        xticklabels=False,
        yticklabels=False,
        ax=ax
    )
    ax.set_title(f"Label: {y_train[i]}")
    ax.axis('off')
plt.suptitle('First 10 MNIST Training Images', fontsize=16)
plt.tight_layout(rect=[0, 0.03, 1, 0.95])
plt.show()
Neural networks train best when the inputs (or predictors) take values between 0 and 1. For the images in the MNIST dataset, we can normalize the pixels of an image by dividing them by the largest possible pixel value (255).
In Python, it will be something like this:
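A minimal sketch of the normalization step; a small random array stands in for the MNIST pixels so the example is self-contained (in the notebook, x_train and x_test come from load_data):

```python
import numpy as np

# Stand-in for MNIST images: uint8 pixels in 0-255
x_train = np.random.randint(0, 256, size=(5, 28, 28), dtype=np.uint8)
x_test = np.random.randint(0, 256, size=(5, 28, 28), dtype=np.uint8)

# Divide by the largest possible pixel value to map into [0, 1]
x_train_norm = x_train / 255.0
x_test_norm = x_test / 255.0

print(x_train_norm.min() >= 0.0 and x_train_norm.max() <= 1.0)  # True
```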
The input images of the MNIST dataset are matrices. However, the neural networks take vectors as input, which is why we need to flatten each matrix of pixels into a single vector.
We achieve this using the code below. Note that we apply it to the predictors from both the training and test dataset.
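A sketch of the flattening step with NumPy's reshape; zero arrays of the MNIST dimensions stand in for the real data so the example runs on its own:

```python
import numpy as np

# Stand-ins with the MNIST shapes
x_train = np.zeros((60000, 28, 28), dtype=np.uint8)
x_test = np.zeros((10000, 28, 28), dtype=np.uint8)

# Flatten each 28x28 image into a single 784-dimensional vector
x_train_flattened = x_train.reshape(-1, 28 * 28)
x_test_flattened = x_test.reshape(-1, 28 * 28)

print(x_train_flattened.shape)  # (60000, 784)
print(x_test_flattened.shape)   # (10000, 784)
```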
Let’s see the new dimensions of the input.
Shape of x_train_flattened after flattening: (60000, 784)
Shape of x_test_flattened after flattening: (10000, 784)
Since the images are \(28 \times 28\), the predictor input is a \(1 \times 784\) vector for each image.
We must also turn the response into a categorical response with its corresponding dummy variable encoding. First, we specify that there are 10 categories.
Next, we turn the responses into dummy variables in the training and test datasets.
y_train_one_hot = tf.keras.utils.to_categorical(y_train, num_classes)
y_test_one_hot = tf.keras.utils.to_categorical(y_test, num_classes)
print(f"Shape of y_train_one_hot after one-hot encoding: {y_train_one_hot.shape}")
print(f"Shape of y_test_one_hot after one-hot encoding: {y_test_one_hot.shape}")
Shape of y_train_one_hot after one-hot encoding: (60000, 10)
Shape of y_test_one_hot after one-hot encoding: (10000, 10)
The neural network we will build has the following structure:
First Hidden Layer: A Dense layer with 256 units and a ReLU (Rectified Linear Unit) activation function. The input_shape is (784,), corresponding to the flattened 28x28 MNIST images. ReLU is chosen as the activation function due to its computational efficiency and its ability to mitigate the vanishing gradient problem.
Second Hidden Layer: Another Dense layer with 128 units, also using ReLU activation. This layer further processes the features extracted by the previous layer.
Output Layer: A Dense layer with 10 units, corresponding to the 10 possible digit classes (0-9) in the MNIST dataset. It uses a Softmax activation function, which outputs a probability distribution over the classes. The class with the highest probability is chosen as the model’s prediction.
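The three layers described above can be sketched with the Keras Sequential API. This is one way to express the stated architecture (256 ReLU, 128 ReLU, 10 softmax); the exact original code is not shown:

```python
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.Input(shape=(784,)),                     # flattened 28x28 images
    tf.keras.layers.Dense(256, activation='relu'),    # first hidden layer
    tf.keras.layers.Dense(128, activation='relu'),    # second hidden layer
    tf.keras.layers.Dense(10, activation='softmax'),  # one unit per digit class
])

model.summary()
```
The parameter counts in the summary (200,960 + 32,896 + 1,290 = 235,146) follow from (inputs × units + biases) per layer.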
Model: "sequential"
Layer (type)      Output Shape     Param #
dense (Dense)     (None, 256)      200,960
dense_1 (Dense)   (None, 128)       32,896
dense_2 (Dense)   (None, 10)         1,290
Total params: 235,146 (918.54 KB)
Trainable params: 235,146 (918.54 KB)
Non-trainable params: 0 (0.00 B)
To train a neural network, we must set the number of epochs, which is the number of times the training algorithm passes through the entire dataset.
The number of epochs can be seen as the iterations of the training algorithm. We expect that, the larger the number of epochs, the better the performance of the algorithm. However, larger numbers increase the computing time needed to train the network.
Let’s use 10 epochs as an example.
Another important parameter is the batch_size. Essentially, this parameter controls the number of images that are processed in each iteration during training.
It is recommended that this number be a power of two, such as 8, 16, 32, 64, or 128. Here, we fix it to 64; in other words, 64 samples are processed at a time in each iteration.
Now, we are ready to train the neural network.
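A self-contained sketch of the compile-and-fit step. The optimizer choice ('adam') is an assumption, since the original compile call is not shown, and small random stand-in data replaces the MNIST arrays so the sketch runs on its own; in the notebook you would pass x_train_flattened / y_train_one_hot and the test arrays as validation_data:

```python
import numpy as np
import tensorflow as tf

# Architecture from the previous section
model = tf.keras.Sequential([
    tf.keras.Input(shape=(784,)),
    tf.keras.layers.Dense(256, activation='relu'),
    tf.keras.layers.Dense(128, activation='relu'),
    tf.keras.layers.Dense(10, activation='softmax'),
])

model.compile(optimizer='adam',                  # assumption: optimizer not shown in the notes
              loss='categorical_crossentropy',   # matches the one-hot encoded response
              metrics=['accuracy'])

# Random stand-in data (256 samples) instead of the 60,000 MNIST images
x_train_flattened = np.random.rand(256, 784).astype('float32')
y_train_one_hot = tf.keras.utils.to_categorical(np.random.randint(0, 10, 256), 10)

history = model.fit(x_train_flattened, y_train_one_hot,
                    epochs=10, batch_size=64, verbose=0)
```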
Epoch 1/10  938/938 - accuracy: 0.8825 - loss: 0.4057 - val_accuracy: 0.9671 - val_loss: 0.1021
Epoch 2/10  938/938 - accuracy: 0.9729 - loss: 0.0890 - val_accuracy: 0.9658 - val_loss: 0.1001
Epoch 3/10  938/938 - accuracy: 0.9815 - loss: 0.0579 - val_accuracy: 0.9769 - val_loss: 0.0738
Epoch 4/10  938/938 - accuracy: 0.9867 - loss: 0.0423 - val_accuracy: 0.9759 - val_loss: 0.0772
Epoch 5/10  938/938 - accuracy: 0.9910 - loss: 0.0292 - val_accuracy: 0.9762 - val_loss: 0.0820
Epoch 6/10  938/938 - accuracy: 0.9929 - loss: 0.0238 - val_accuracy: 0.9790 - val_loss: 0.0723
Epoch 7/10  938/938 - accuracy: 0.9937 - loss: 0.0194 - val_accuracy: 0.9780 - val_loss: 0.0848
Epoch 8/10  938/938 - accuracy: 0.9943 - loss: 0.0165 - val_accuracy: 0.9806 - val_loss: 0.0767
Epoch 9/10  938/938 - accuracy: 0.9956 - loss: 0.0133 - val_accuracy: 0.9800 - val_loss: 0.0779
Epoch 10/10 938/938 - accuracy: 0.9963 - loss: 0.0108 - val_accuracy: 0.9780 - val_loss: 0.0872
We evaluate the classification performance of the neural network using the test data. To this end, we apply the neural network classifier using the .evaluate() method.
Remember, the input must be the flattened (vectorized) images in x_test_flattened and the dummy-coded responses in y_test_one_hot.
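A sketch of the evaluation call. An untrained stand-in model and random data keep the example self-contained; in the notebook you would call .evaluate() on the trained model with x_test_flattened and y_test_one_hot:

```python
import numpy as np
import tensorflow as tf

# Untrained stand-in model (same output layer as in the notes)
model = tf.keras.Sequential([
    tf.keras.Input(shape=(784,)),
    tf.keras.layers.Dense(10, activation='softmax'),
])
model.compile(optimizer='adam',
              loss='categorical_crossentropy',
              metrics=['accuracy'])

# Random stand-ins for x_test_flattened and y_test_one_hot
x_test_flattened = np.random.rand(100, 784).astype('float32')
y_test_one_hot = tf.keras.utils.to_categorical(np.random.randint(0, 10, 100), 10)

test_loss, test_acc = model.evaluate(x_test_flattened, y_test_one_hot, verbose=0)
print(f"Test accuracy: {test_acc:.4f}")
```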
In the multi-class problem, accuracy is still the proportion of correct decisions made.
As with any classifier seen before, the neural network calculates the probability that an image belongs to each class (0-9). We can have a look at the probabilities using the .predict() method.
Following the Bayes classifier, we classify each observation in the test dataset to the most probable class. This can be done using the command below, where np.argmax() is a function from numpy that returns the index of the maximum value.
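A sketch of the argmax step on hypothetical predicted probabilities (in the notebook, the probability matrix would come from model.predict(x_test_flattened)):

```python
import numpy as np

# Hypothetical predicted probabilities for 3 test images over the 10 classes
probs = np.array([
    [0.01, 0.02, 0.90, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01],
    [0.05, 0.80, 0.05, 0.02, 0.02, 0.02, 0.01, 0.01, 0.01, 0.01],
    [0.10, 0.05, 0.05, 0.05, 0.05, 0.05, 0.05, 0.50, 0.05, 0.05],
])

# Bayes classifier: assign each image to its most probable class
predicted_classes = np.argmax(probs, axis=1)
print(predicted_classes)  # [2 1 7]
```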
The “deep” in deep learning is not a reference to any kind of deeper understanding achieved by the approach.
Instead, it stands for the idea of successive layers of representations.
Modern deep learning often involves tens or even hundreds of successive layers.

Tecnologico de Monterrey