This article is based on the book “Swarm Intelligence” by James Kennedy and Russell C. Eberhart.
Neural networks are machine learning algorithms that learn how to perform a task from training examples. They loosely imitate the way our brains learn patterns in data.
We have three main parts in Neural Networks: neurons, layers, and backpropagation.
In this image, we have a neural network with one layer and 3 neurons. The neural network gets the data from the input layer and transforms the data according to the structure defined.
Each of these arrows and neurons represents an operation in the network. Usually, these operations are multiplications of the data by the weights, followed by a sum.
In this article, we will optimize our neural network without backpropagation. Instead, we will apply a bio-inspired algorithm: Particle Swarm Optimization.
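The multiply-and-sum operation described above can be sketched with NumPy. This is only an illustration: the 4 input features, the softmax activation, and the function name `dense_forward` are assumptions, not details taken from the image.

```python
import numpy as np

def dense_forward(x, weights, bias):
    """One dense layer: multiply inputs by weights, sum, add bias, apply softmax."""
    z = x @ weights + bias                    # multiplication and sum per neuron
    z = z - z.max(axis=1, keepdims=True)      # shift for numerical stability
    exp_z = np.exp(z)
    return exp_z / exp_z.sum(axis=1, keepdims=True)

# One sample with 4 features feeding a layer of 3 neurons (assumed sizes).
x = np.array([[5.1, 3.5, 1.4, 0.2]])
weights = np.zeros((4, 3))   # untrained weights, all zero for the demo
bias = np.zeros(3)

probs = dense_forward(x, weights, bias)
print(probs)  # with all-zero weights, every neuron gets the same probability
```

Training, whether by backpropagation or by PSO, means searching for the values in `weights` and `bias` that make these outputs match the labels.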
Particle Swarm Optimization (PSO)
PSO is an optimization algorithm inspired by biological behavior. Unlike backpropagation, PSO does not use gradients. It is a metaheuristic: it does not guarantee an optimal solution, but it can search very large spaces of candidate solutions. There are three main parts in PSO: particles, constants, and iterations.
The first element in PSO is the particle. At the beginning, we define the number of particles to create. Every particle is a potential solution. As we want to optimize a neural network using PSO, each particle holds one combination of weights for our network.
Each particle also has three properties: a velocity (random at the beginning), a personal best solution (the best solution found by that particle), and the global best solution (the best solution found by the swarm). These properties determine the direction in which the particle moves.
There are three main constants in this algorithm: the cognitive coefficient (c1), the social coefficient (c2), and the inertia (w). They weight, respectively, the personal best solution, the global best solution, and the current velocity of the particle.
In its standard form, the update rule is:
velocity = w * velocity + c1 * r1 * (pbest - X) + c2 * r2 * (gbest - X)
X = X + velocity
Where X is the current position, r1 and r2 are random numbers, pbest is the personal best solution, gbest is the global best solution, and velocity is the current velocity.
Following this rule for each particle, we will optimize the solution balancing between what a particle believes is the best solution, what the swarm believes is the best solution, and the current movement inertia.
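A single update of one particle can be sketched as follows, assuming the standard PSO formulation. The function name `update_particle` and the numbers are illustrative, and r1 and r2 are fixed here (instead of random) so the arithmetic can be checked by hand.

```python
import numpy as np

def update_particle(position, velocity, pbest, gbest, w, c1, c2, r1, r2):
    """Apply one standard PSO velocity and position update to a particle."""
    new_velocity = (
        w * velocity                       # inertia: keep part of the current movement
        + c1 * r1 * (pbest - position)     # cognitive pull toward the personal best
        + c2 * r2 * (gbest - position)     # social pull toward the global best
    )
    new_position = position + new_velocity
    return new_position, new_velocity

position = np.array([0.0, 0.0])
velocity = np.array([1.0, -1.0])
pbest = np.array([1.0, 1.0])
gbest = np.array([2.0, 0.0])

new_position, new_velocity = update_particle(
    position, velocity, pbest, gbest,
    w=0.4, c1=0.4, c2=0.8, r1=0.5, r2=0.5,
)
print(new_velocity)  # [ 1.4 -0.2]
print(new_position)  # [ 1.4 -0.2]
```

The three terms contribute [0.4, -0.4], [0.2, 0.2], and [0.8, 0.0], so the larger social coefficient pulls this particle mostly toward the global best.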
The iterations are the number of times that the particles will update their velocities. In each iteration, the particles will move to search for the best solution.
In this image, we can observe how each particle moves around to find the best solution. The particle that has found the best solution so far attracts all the others, until another particle finds a better one.
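To make the iteration loop concrete, here is a minimal from-scratch sketch that minimizes a toy function (the sum of squares) rather than a neural network. The swarm size, number of iterations, initial ranges, and the names `sphere` and `minimal_pso` are assumptions chosen for illustration; pyswarms implements the same idea with more features.

```python
import numpy as np

def sphere(positions):
    """Toy cost: sum of squares per particle; the minimum is 0 at the origin."""
    return np.sum(positions ** 2, axis=1)

def minimal_pso(cost_fn, n_particles=20, dimensions=2, iters=100,
                w=0.4, c1=0.4, c2=0.8, seed=0):
    rng = np.random.default_rng(seed)
    positions = rng.uniform(-1.0, 1.0, size=(n_particles, dimensions))
    velocities = rng.uniform(-0.1, 0.1, size=(n_particles, dimensions))

    # Each particle remembers its own best; the swarm shares one global best.
    pbest_pos = positions.copy()
    pbest_cost = cost_fn(positions)
    best_idx = np.argmin(pbest_cost)
    gbest_pos = pbest_pos[best_idx].copy()
    gbest_cost = pbest_cost[best_idx]

    for _ in range(iters):
        r1 = rng.random((n_particles, dimensions))
        r2 = rng.random((n_particles, dimensions))
        velocities = (w * velocities
                      + c1 * r1 * (pbest_pos - positions)
                      + c2 * r2 * (gbest_pos - positions))
        positions = positions + velocities

        # Update personal bests where the new position is better.
        costs = cost_fn(positions)
        improved = costs < pbest_cost
        pbest_pos[improved] = positions[improved]
        pbest_cost[improved] = costs[improved]

        # Update the global best if any particle beat it.
        best_idx = np.argmin(pbest_cost)
        if pbest_cost[best_idx] < gbest_cost:
            gbest_cost = pbest_cost[best_idx]
            gbest_pos = pbest_pos[best_idx].copy()

    return gbest_pos, gbest_cost

best_pos, best_cost = minimal_pso(sphere)
print(best_pos, best_cost)
```

Because the global best is only replaced by a strictly better solution, the reported cost never gets worse from one iteration to the next.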
In this project, we are going to use the PSO optimization library pyswarms.
The library can be installed using pip:
pip install pyswarms
Building the neural network
First, we are going to build a test case using Keras and scikit-learn. We are going to use the Iris dataset.
The function create_custom_model will help us to build our network.
Then, we build our network structure with 4 neurons and 1 layer.
In order to use PSO, we need to extract and set the structure of the network.
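The idea of extracting and setting the structure can be sketched with plain NumPy. In Keras, `model.get_weights()` returns a list of NumPy arrays and `model.set_weights()` accepts a list with the same shapes; the helper names below and the layer sizes (4 inputs, 4 hidden neurons, 3 outputs) are assumptions for illustration, not the article's original code.

```python
import numpy as np

def get_shapes(weights):
    """Record the shape of every weight array in the network."""
    return [w.shape for w in weights]

def flatten_weights(weights):
    """Concatenate all weight arrays into one flat vector (one particle)."""
    return np.concatenate([w.ravel() for w in weights])

def unflatten_weights(flat, shapes):
    """Rebuild the list of weight arrays from a flat particle vector."""
    arrays, start = [], 0
    for shape in shapes:
        size = int(np.prod(shape))
        arrays.append(flat[start:start + size].reshape(shape))
        start += size
    return arrays

# Stand-in for model.get_weights(): kernel and bias of two dense layers
# (assumed sizes: 4 inputs -> 4 neurons -> 3 outputs).
rng = np.random.default_rng(0)
weights = [rng.normal(size=(4, 4)), np.zeros(4),
           rng.normal(size=(4, 3)), np.zeros(3)]

shapes = get_shapes(weights)
flat = flatten_weights(weights)
restored = unflatten_weights(flat, shapes)
print(flat.shape)  # (35,) -> the number of dimensions each particle needs
```

With a real Keras model, the list returned by `unflatten_weights` could be passed to `model.set_weights()` before evaluating a particle.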
The optimization function
We need to build an optimization function.
The optimization function in pyswarms takes one required parameter, W, which holds the candidate solution of every particle (one row per particle). After it, we can add any additional arguments that we need. In our case, we pass the shape of the network structure, the training dataset, and its labels.
Finally, the function needs to return one value per particle. In our case, we return the variable results, which collects the accuracy of each model. pyswarms minimizes the function, so we return 1 minus each accuracy: the lower the value, the better the model.
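The shape of such an optimization function can be sketched as follows. To keep the example self-contained, a single NumPy dense layer stands in for the Keras model, and the tiny dataset is made up for the demo; the names `predict` and `objective` are hypothetical.

```python
import numpy as np

def predict(flat_weights, X, n_features, n_classes):
    """Single dense layer used as a NumPy stand-in for the Keras model."""
    split = n_features * n_classes
    kernel = flat_weights[:split].reshape(n_features, n_classes)
    bias = flat_weights[split:split + n_classes]
    return np.argmax(X @ kernel + bias, axis=1)

def objective(W, X, y, n_features, n_classes):
    """Return one cost per particle; W has shape (n_particles, dimensions)."""
    results = []
    for particle in W:
        accuracy = np.mean(predict(particle, X, n_features, n_classes) == y)
        results.append(1.0 - accuracy)   # the optimizer minimizes, so use the error
    return np.array(results)

# Toy data (made up for this demo): 2 features, 2 classes.
X = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 0.1], [0.1, 1.0]])
y = np.array([0, 1, 0, 1])

# Two particles: one classifies every sample correctly, the other none.
good = np.array([1.0, 0.0, 0.0, 1.0, 0.0, 0.0])
bad = np.array([0.0, 1.0, 1.0, 0.0, 0.0, 0.0])

costs = objective(np.stack([good, bad]), X, y, n_features=2, n_classes=2)
print(costs)  # [0. 1.]
```

The important contract is the return value: an array with exactly one cost per row of W.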
Configuration and optimization
We need to define the properties of our swarm (c1, c2, w, and boundaries).
In this example, we set c1 to 0.4, c2 to 0.8, and w to 0.4. Because this problem has low complexity, exploitation will be crucial. We also define boundaries so that each weight stays between -1.0 and 1.0.
Finally, we set the number of iterations in the optimize function, and we use the test dataset to evaluate the result of our optimization.
We then evaluate the same network structure using 5-fold cross-validation.
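A configuration along these lines could look like the sketch below. The coefficients and boundaries come from the text; the number of particles, the number of iterations, and the helper names `build_swarm_config` and `run_pso` are assumptions. The calls `ps.single.GlobalBestPSO(...)` and `optimizer.optimize(...)` are part of pyswarms, and extra keyword arguments given to `optimize` are forwarded to the objective function.

```python
import numpy as np

def build_swarm_config(dimensions, c1=0.4, c2=0.8, w=0.4, low=-1.0, high=1.0):
    """Build the options dict and the (lower, upper) bounds for the swarm."""
    options = {"c1": c1, "c2": c2, "w": w}
    bounds = (np.full(dimensions, low), np.full(dimensions, high))
    return options, bounds

def run_pso(objective, dimensions, n_particles=20, iters=50, **objective_kwargs):
    """Run a global-best PSO; n_particles and iters are illustrative defaults."""
    # Imported here so build_swarm_config works even without pyswarms installed.
    import pyswarms as ps

    options, bounds = build_swarm_config(dimensions)
    optimizer = ps.single.GlobalBestPSO(
        n_particles=n_particles,
        dimensions=dimensions,
        options=options,
        bounds=bounds,
    )
    # Extra keyword arguments are passed on to the objective function.
    best_cost, best_pos = optimizer.optimize(objective, iters=iters, **objective_kwargs)
    return best_cost, best_pos

# 35 dimensions is an assumed weight count, not a value from the article.
options, bounds = build_swarm_config(dimensions=35)
print(options)
```

The returned `best_pos` is a flat vector, so it has to be reshaped into the network's weight arrays before the model is evaluated on the test dataset.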
0.96 +/- 0.02
0.91 +/- 0.03
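Scores in the "mean +/- standard deviation" format can be produced with a small k-fold loop. The sketch below is a dependency-free stand-in (scikit-learn offers `KFold` for the same purpose); `dummy_scorer` is a placeholder where a real experiment would train the model on the training indices and return its accuracy on the test indices.

```python
import numpy as np

def kfold_indices(n_samples, n_folds=5, seed=0):
    """Yield (train_idx, test_idx) pairs for a shuffled k-fold split."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(n_samples), n_folds)
    for i in range(n_folds):
        train_idx = np.concatenate([folds[j] for j in range(n_folds) if j != i])
        yield train_idx, folds[i]

def cross_validate(train_and_score, n_samples, n_folds=5):
    """Call train_and_score on each fold and return the mean and std of the scores."""
    scores = [train_and_score(train_idx, test_idx)
              for train_idx, test_idx in kfold_indices(n_samples, n_folds)]
    return float(np.mean(scores)), float(np.std(scores))

def dummy_scorer(train_idx, test_idx):
    # Placeholder only: returns the held-out fraction instead of a real accuracy.
    return len(test_idx) / (len(train_idx) + len(test_idx))

# The Iris dataset has 150 samples, so each of the 5 folds holds out 30.
mean_score, std_score = cross_validate(dummy_scorer, n_samples=150)
summary = f"{mean_score:.2f} +/- {std_score:.2f}"
print(summary)  # 0.20 +/- 0.00
```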
There are more optimizations that we can apply to PSO to improve the results, such as decreasing inertia, Lévy flight distribution, imperial modification, and combining PSO with backpropagation. We will explore them in other articles.
Moreover, other research compares the performance of PSO against backpropagation, such as the following:
PSO can produce competitive results with an MLP. However, in some cases PSO cannot find a good solution, which reduces the average accuracy. Also, backpropagation benefits from optimizations in TensorFlow that reduce processing time.
PSO does not need a differentiable function, so it can be used when the gradient is not available. PSO can perform a global search in the problem space. With PSO, we can also tune the speed of learning and the performance through the optimization function.