EN | How Will Cybersecurity Technologies Be Shaped by AI? — Part 1: Mathematics… lots of mathematics!
In this series, I will examine how artificial intelligence can shape, develop, and even render obsolete cybersecurity technologies and tools. In the first installment, we begin with a detailed look at how AI systems operate today. We will explore the layers of deep learning and artificial neural networks, how data flows between these layers, the ReLU and Softmax activation functions, how weights are applied, forward and backward passes, and the mathematical operations an input goes through to produce a probability estimate.
Behind the scenes, AI’s structure represents a world where certain aspects of human intelligence are imitated, simulated, or automated through computer systems. The human brain — considered the most complex and intricate known structure in the universe — is being transformed through mathematical operations and theoretical approaches into an entity that can comprehend and produce responses. But how?
In recent years, the most commonly used and advanced structure in the field of AI is deep learning. By employing multilayer artificial neural networks arranged sequentially, deep learning enables the learning of complex functions.
This structure is built from mathematical building blocks such as linear algebra and matrix computations, differentiation and differentiability, and optimization theory, including non-convex optimization.
Before delving into the details of how this process works, let’s first take a look at the layers. Artificial neural networks are formed by organizing a series of computational units called neurons into layers.
- Input layer: This layer provides the model with raw data (image pixels, text, numerical values, etc.).
- Hidden layers: In these layers, the data is transformed into meaningful representations. In deep learning models, the number of hidden layers can reach into the hundreds or even thousands. Each layer processes the data received from the previous layer and transforms it through a series of weights and activation functions.
- Output layer: Finally, from the transformed data, the final predictions, classifications, or output values are produced in this layer.
In artificial neural networks, data generally flows in two directions: the first is the forward pass, and the second is the backward pass. Each of these processes plays a critical role in how neural networks function and learn. The forward pass is the mechanism by which raw data moves from the input layer through the layers, all the way to the output layer. In this flow, raw data — such as the pixels of an image or a sequence of text — is first fed into the input layer. This data is then processed in each layer’s neurons through weights and activation functions. At each neuron, the incoming data is multiplied by the corresponding weights, the results are summed, and a bias value is added. The resulting value is then passed through a nonlinear activation function. The outputs obtained from this step become the input to the next layer, and the process continues through each hidden layer. Each hidden layer refines the data into more abstract and meaningful features. Eventually, when the processed data reaches the output layer, the network produces its final prediction, classification, or regression result.
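To make this concrete, here is a minimal sketch (in NumPy, with made-up values) of what a single layer does during the forward pass: multiply the inputs by the weights, add the bias, and pass the result through an activation function.

```python
import numpy as np

# Minimal sketch of one forward-pass step through a single layer.
# The input, weights, and bias are made-up illustrative values.
rng = np.random.default_rng(0)

x = np.array([0.2, -0.5, 1.0])   # input vector from the previous layer (or raw input)
W = rng.normal(size=(4, 3))      # weight matrix: 4 neurons, each with 3 incoming weights
b = rng.normal(size=4)           # one bias value per neuron

z = W @ x + b                    # weighted sum of inputs plus bias, for every neuron at once
a = np.maximum(0, z)             # nonlinear activation (ReLU): negative values become 0

print("pre-activation z:", z)
print("layer output a:", a)      # this vector becomes the input to the next layer
```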
During the forward pass, there isn’t actually any “learning” taking place; it’s simply a computation (a prediction) based on the current weight values. Let’s illustrate this with an example. Suppose we have a sample input text: “Alican Kiraz.” Let’s look at how this input moves through five layers — four hidden layers and one output layer — to produce a final output.
We’ll consider a simple, fully-connected artificial neural network with five layers (four hidden layers and one output layer) to demonstrate this progression. Note that this will be a theoretical example because, in a real network, transforming text into a numerical representation involves embeddings, word-to-vector conversions, or language models, which can be more complex. However, the theoretical principle remains similar.
Let’s assume the input (“Alican Kiraz”) has undergone some preprocessing and is represented by a 3-dimensional vector after a series of steps.
Our network consists of an input layer followed by 5 computational layers: 4 hidden layers and 1 output layer.
- Input dimension: 3
- Each hidden layer: 4 neurons with ReLU activation
- Output layer: 2 neurons with Softmax activation
ReLU and Softmax are activation functions. ReLU is defined as ReLU(z) = max(0, z), meaning it sets negative values to zero and leaves positive values unchanged. This introduces non-linearity to neural networks and makes the training of large, deep networks more efficient.
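As a quick illustration, here is a minimal NumPy sketch of ReLU applied element-wise to arbitrary sample values:

```python
import numpy as np

def relu(z):
    # ReLU(z) = max(0, z), applied element-wise
    return np.maximum(0, z)

z = np.array([-2.0, -0.3, 0.0, 1.5, 4.0])
print(relu(z))   # [0.  0.  0.  1.5 4. ]
```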
The Softmax function applies an exponential function to each element in a vector and then divides by the sum of these exponentials to produce a probability distribution. For an output vector z = (z₁, z₂, …, z_K), the Softmax function is defined as Softmax(zᵢ) = exp(zᵢ) / Σₖ exp(zₖ).
For each output class, a probability between 0 and 1 is obtained, and the sum of all probabilities equals 1. The Softmax function is used in the final layer of classification problems.
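A minimal NumPy sketch of the same idea (the input values are arbitrary; subtracting the maximum is a common numerical-stability trick that does not change the result):

```python
import numpy as np

def softmax(z):
    # Exponentiate each element, then normalize by the sum of the exponentials.
    # Subtracting max(z) first is a numerical-stability trick; it does not
    # change the resulting probabilities.
    e = np.exp(z - np.max(z))
    return e / e.sum()

z = np.array([2.0, 1.0, 0.1])
p = softmax(z)
print(p)         # roughly [0.659 0.242 0.099]: each value between 0 and 1
print(p.sum())   # 1.0 (up to floating-point rounding)
```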
Returning to our summary: we will use Input (3 dimensions) → Layer 1 (4 neurons) → Layer 2 (4 neurons) → Layer 3 (4 neurons) → Layer 4 (4 neurons) → Output Layer (2 neurons). Our activation functions will be ReLU(z) = max(0, z), and we will use Softmax at the output.
Below, we provide example weight matrices (Wₗ) and bias vectors (bₗ) with hypothetical values for each layer. Since Medium’s formatting is limited, you can see them below as well. (Because this topic requires quite extensive mathematical knowledge, and I’m a bit rusty in math, let’s take advantage of ChatGPT and mathematical computation tools.)
The weight matrix W₁ (size 4x3) and bias b₁ (4x1) will be used in our forward pass calculation, where z₁ = W₁x₀ + b₁. Let’s first compute W₁x₀:
Then we add the bias:
A ReLU activation is applied, and ReLU(z) sets negative values to 0:
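Since the weight and bias figures are shown as images, here is a hedged stand-in with hypothetical values of the same shapes (W₁ is 4x3, b₁ is 4x1); the values below are invented, but the arithmetic follows the same three steps just described:

```python
import numpy as np

# Hypothetical stand-ins for the W1 (4x3) weight matrix and b1 (4x1) bias
# shown as figures in the article; the values are invented, but the steps
# mirror the text: compute W1 x0, add b1, then apply ReLU.
x0 = np.array([0.5, 0.1, 0.9])                # placeholder 3-dimensional input

W1 = np.array([[ 0.2, -0.4,  0.7],
               [-0.3,  0.8,  0.1],
               [ 0.5,  0.5, -0.6],
               [-0.9,  0.2,  0.4]])
b1 = np.array([0.1, -0.2, 0.05, 0.3])

Wx = W1 @ x0                 # step 1: W1 x0
z1 = Wx + b1                 # step 2: add the bias
x1 = np.maximum(0, z1)       # step 3: ReLU sets negative entries to 0

print(Wx)
print(z1)
print(x1)
```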
This process continues for three more steps until we reach the output layer. When the output-layer weights W₅ are applied to x₄ in the same way, the computation looks like this:
After adding the bias and applying the Softmax function, we obtain a two-element probability vector of approximately [0.5145, 0.4855].
In this output, the network’s result for the input “Alican Kiraz” is a probability distribution over two output classes. Of course, this example is based entirely on fabricated weights; in reality, during network training, the weights are learned from the data. In other words, each time you train the network, you teach it to “think” with different weights, and the final values are interpreted according to the purpose for which the network is configured.
The Softmax activation produces a “probability” value for each class, ranging from 0 to 1, with all probabilities summing to 1. In our example, we obtained a result like [0.5145, 0.4855], meaning our hypothetical model gave the first class a 51.45% probability and the second class a 48.55% probability. This probability acts as the predicted output in the AI’s decision-making mechanism: for a classification task, the output can be turned into a “decision” by choosing the class with the highest probability. Since 0.5145 is greater than 0.4855, the AI would label the input “Alican Kiraz” as belonging to the first class. For instance, if the network’s two classes were “Person Names” (first value) and “Product Names” (second value), it would classify the input as “Person Names” because that probability is higher.
That is exactly why the parameters are adjusted during training: so that the AI can perform tasks such as classification, prediction, or decision-making. At the end of the training process, the AI knows how to predict, decide, or classify for the given problem.
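For readers who want to reproduce the whole pipeline, here is a sketch of the complete forward pass in NumPy. The weights are random placeholders rather than the values used above, so the resulting probabilities will differ from [0.5145, 0.4855], but the mechanics are identical.

```python
import numpy as np

# Sketch of the complete forward pass for the toy network above:
# input of 3 -> four hidden layers of 4 neurons (ReLU) -> 2 output neurons (Softmax).
# The weights are random placeholders, not the article's figures, so the
# printed probabilities will differ from [0.5145, 0.4855].
rng = np.random.default_rng(42)

def relu(z):
    return np.maximum(0, z)

def softmax(z):
    e = np.exp(z - np.max(z))
    return e / e.sum()

layer_sizes = [3, 4, 4, 4, 4, 2]   # input, four hidden layers, output
weights = [rng.normal(size=(n_out, n_in))
           for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])]
biases = [rng.normal(size=n_out) for n_out in layer_sizes[1:]]

x = np.array([0.3, 0.7, 0.1])      # stand-in for the preprocessed "Alican Kiraz" vector

a = x
for i, (W, b) in enumerate(zip(weights, biases)):
    z = W @ a + b                               # z_l = W_l x_(l-1) + b_l
    a = softmax(z) if i == len(weights) - 1 else relu(z)

print("class probabilities:", a)   # two values between 0 and 1 that sum to 1
```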
There is also the backward pass. In this flow, the network adjusts its weights based on the error in its output, and the process is used only during training. The prediction generated by the network is compared to the actual label (the correct answer), and an error (or loss) value is calculated. The effect of this error on each weight is determined through derivative (gradient) calculations, a process known as backpropagation. Using gradient descent or a similar optimization algorithm, the weights are then adjusted slightly, so that the next time the model processes similar data, it moves in a direction that reduces the error.
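As a rough sketch of that idea, here is one training step for a single linear layer with Softmax output and cross-entropy loss; deep networks repeat the same gradient logic layer by layer via backpropagation. All numbers are illustrative.

```python
import numpy as np

# One training step for a single linear layer with Softmax output and
# cross-entropy loss: forward pass, loss, gradients, gradient-descent update.
rng = np.random.default_rng(1)

def softmax(z):
    e = np.exp(z - np.max(z))
    return e / e.sum()

x = np.array([0.3, 0.7, 0.1])   # input vector
y = np.array([1.0, 0.0])        # one-hot "correct answer": class 0

W = rng.normal(size=(2, 3))
b = np.zeros(2)
lr = 0.1                        # learning rate (size of each weight adjustment)

p = softmax(W @ x + b)          # forward pass: prediction with current weights
loss = -np.sum(y * np.log(p))   # cross-entropy between prediction and label
print("loss before update:", loss)

dz = p - y                      # for Softmax + cross-entropy, dL/dz = p - y
dW = np.outer(dz, x)            # gradient of the loss w.r.t. each weight
db = dz                         # gradient of the loss w.r.t. each bias

W -= lr * dW                    # gradient descent: move weights against the gradient
b -= lr * db

print("loss after update:", -np.sum(y * np.log(softmax(W @ x + b))))
```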
Alright, if you’ve grown tired of the mathematical details, let’s step back a bit. But I should emphasize: AI = Math and Math = Life. So let’s delve a little deeper :D
What does each neuron contain? These artificial neurons are often likened to the biological neurons in the human brain. Biological neurons consist of dendrites, a soma, and an axon. Dendrites receive signals at their input terminals, the soma processes these signals, and the axon transmits the output to other neurons. In an artificial neuron, this process is translated into mathematical terms.
Each neuron receives a set of numerical inputs from either the network’s input layer or the previous layer’s neurons. These inputs are represented as x₁, x₂, x₃, …, each associated with a weight (w₁, w₂, w₃, …). These weights determine how much importance the neuron assigns to each input signal.
The neuron multiplies each input by its corresponding weight and sums them all up. This calculation can be expressed as:
z = w₁x₁ + w₂x₂ + ⋯ + wₙxₙ + b
Here, the term b is the bias, a constant offset value that shifts the output of the neuron.
If the neuron’s output requires a nonlinear mapping, activation functions such as Sigmoid, Tanh, ReLU (Rectified Linear Unit), Leaky ReLU, ELU, or Softmax come into play. The purpose of these functions is to enable the network to learn nonlinear relationships, making it possible to solve complex problems.
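Putting the pieces of this section together, a single artificial neuron can be sketched in a few lines of NumPy; the inputs, weights, and bias below are arbitrary illustrative values.

```python
import numpy as np

# A single artificial neuron: weighted sum of its inputs plus a bias,
# followed by a choice of activation function. All values are illustrative.
x = np.array([0.5, -1.2, 0.8])   # inputs x1, x2, x3
w = np.array([0.4, 0.3, -0.6])   # weights w1, w2, w3
b = 0.1                          # bias

z = np.dot(w, x) + b             # z = w1*x1 + w2*x2 + w3*x3 + b

activations = {
    "sigmoid":    1 / (1 + np.exp(-z)),
    "tanh":       np.tanh(z),
    "relu":       max(0.0, z),
    "leaky_relu": z if z > 0 else 0.01 * z,
}
for name, value in activations.items():
    print(f"{name}: {value:.4f}")
```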
References:
- https://neptune.ai/blog/backpropagation-algorithm-in-neural-networks-guide
- https://iq.opengenus.org/purpose-of-different-layers-in-ml/
- https://math.mit.edu/~gs/learningfromdata/siam.pdf
- https://www.mathworks.com/discovery/deep-learning.html
- https://www.ibm.com/topics/deep-learning
- https://cloud.google.com/discover/what-is-deep-learning
- https://builtin.com/machine-learning/relu-activation-function
- https://thegradient.pub/the-limitations-of-visual-deep-learning-and-how-we-might-fix-them/