Particle swarm optimization

From Wikipedia, the free encyclopedia

Particle swarm optimization (PSO) is a form of swarm intelligence. Imagine a swarm of insects or a school of fish. If one member sees a desirable path (e.g., toward food or protection), the rest of the swarm can follow quickly, even members on the opposite side of the swarm. On the other hand, to explore the search space well, each particle should have a certain level of "craziness" or randomness in its movement, so that the swarm as a whole retains an explorative capability: each particle should be influenced by the rest of the swarm but should also explore independently to a certain extent. (This is a manifestation of the basic exploration-exploitation tradeoff that occurs in any search problem.)

This is modeled by particles in multidimensional space that have a position and a velocity. These particles are flying through hyperspace (i.e., $\mathbb{R}^n$ ) and have two essential reasoning capabilities: their memory of their own best position and knowledge of the swarm's best, "best" simply meaning the position with the smallest objective value. Members of a swarm communicate good positions to each other and adjust their own position and velocity based on these good positions. There are two main ways this communication is done:

- a global best that is known to all particles and is updated immediately when any particle in the swarm finds a new best position
- "neighborhood" bests, where each particle immediately communicates best positions only to a subset of the swarm

An algorithm is presented below that uses a global best rather than neighborhood bests. Neighborhood bests allow better exploration of the search space and make PSO less susceptible to falling into local minima, but they slow convergence. Note that, because neighborhoods overlap, they merely slow the spread of new bests rather than creating isolated subswarms: to make neighborhoods of size 4, say, particle 1 would communicate only with particles 2 through 5, particle 2 with particles 3 through 6, and so on. A new best position discovered in particle 2's neighborhood would then reach particle 1's neighborhood at the next iteration of the PSO algorithm presented below. Smaller neighborhoods lead to slower convergence and larger neighborhoods to faster convergence, with a global best corresponding to a neighborhood consisting of the entire swarm.
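The overlapping-neighborhood idea can be sketched in Python as follows. This is only an illustration under stated assumptions: the particles are arranged on a ring, each particle's neighborhood is taken to be itself plus the next `k` particles (wrapping around), and the function name `neighborhood_best` is hypothetical.

```python
def neighborhood_best(best_values, k):
    """For each particle i, return the index of the lowest-valued personal
    best among particle i and the next k particles on a ring (assumed layout)."""
    n = len(best_values)
    result = []
    for i in range(n):
        # Particle i plus the k particles that follow it, wrapping around.
        neighbors = [(i + offset) % n for offset in range(k + 1)]
        result.append(min(neighbors, key=lambda j: best_values[j]))
    return result


# Personal-best objective values of 8 particles; particle 6 holds the best.
values = [5.0, 4.0, 7.0, 3.0, 9.0, 8.0, 1.0, 6.0]
print(neighborhood_best(values, 4))  # [3, 3, 6, 6, 6, 6, 6, 3]
```

In this example particles 0, 1 and 7 do not yet see particle 6's position. Because the neighborhoods overlap, that information can still reach them over later iterations, once particles inside their neighborhoods have improved by moving toward it.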

1 A Basic, Canonical PSO Algorithm
- 1.1 Pseudo code
- 1.2 Discussion
2 Variations and Practicalities
3 Applications
4 See also
5 External links
6 References

A Basic, Canonical PSO Algorithm

Let $f : \mathbb{R}^m \rightarrow \mathbb{R}$ be the objective function. Let there be $n$ particles, each with an associated position $\mathbf{x}_i \in \mathbb{R}^m$ and velocity $\mathbf{v}_i \in \mathbb{R}^m$, $i = 1, \ldots, n$. Let $\hat{\mathbf{x}}_i$ be the best position found so far by particle $i$ and let $\hat{\mathbf{g}}$ be the global best.

Initialize $\mathbf{x}_i$ and $\mathbf{v}_i$ for all $i$. One common choice is to draw $x_{ij} \sim U[a_j, b_j]$ and set $\mathbf{v}_i = \mathbf{0}$ for all $i$ and $j = 1, \ldots, m$, where $a_j, b_j$ are the limits of the search domain in dimension $j$.
$\hat{\mathbf{x}}_i \leftarrow \mathbf{x}_i$ for $i = 1, \ldots, n$, and $\hat{\mathbf{g}} \leftarrow \arg\min_{\mathbf{x}_i} f(\mathbf{x}_i)$ over $i = 1, \ldots, n$.

While not converged:
- For $i = 1, \ldots, n$:
  - $\mathbf{x}_i \leftarrow \mathbf{x}_i+\mathbf{v}_i$ .
  - $\mathbf{v}_i \leftarrow {\omega}\mathbf{v}_i+c_1\mathbf{r}_1\circ(\hat{\mathbf{x}}_i-\mathbf{x}_i)+c_2\mathbf{r}_2\circ(\hat{\mathbf{g}}-\mathbf{x}_i)$ .
  - If $f(\mathbf{x}_i) < f(\hat{\mathbf{x}}_i)$ , $\hat{\mathbf{x}}_i \leftarrow \mathbf{x}_i$ .
  - If $f(\mathbf{x}_i) < f(\hat{\mathbf{g}})$ , $\hat{\mathbf{g}} \leftarrow \mathbf{x}_i$ .

Note the following about the above algorithm:

- $\omega$ is an inertial constant. Good values are usually slightly less than 1.
- $c_1$ and $c_2$ are constants that say how strongly the particle is directed towards good positions. They represent a "cognitive" and a "social" component, respectively, in that they affect how much the particle's personal best and the global best (respectively) influence its movement. Usually, we take $c_1, c_2 \approx 1$, but this need not be the case.
- $\mathbf{r}_1, \mathbf{r}_2$ are two random vectors, with each component generally a uniform random number between 0 and 1.
- The $\circ$ operator indicates element-by-element multiplication.
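The steps above can be put together in a short Python sketch. It is a minimal illustration rather than a reference implementation: the function name `pso`, the fixed iteration budget used in place of a convergence test, the default values `omega=0.7`, `c1=c2=1.5`, and the `sphere` test function are all assumptions chosen for the example. Positions are not kept inside the initial box after initialization.

```python
import random


def pso(f, lower, upper, n_particles=30, iterations=200,
        omega=0.7, c1=1.5, c2=1.5, seed=None):
    """Minimize f with a global-best PSO (illustrative parameter defaults)."""
    rng = random.Random(seed)
    m = len(lower)

    # Initialization: uniform positions in [lower[j], upper[j]], zero velocities.
    x = [[rng.uniform(lower[j], upper[j]) for j in range(m)]
         for _ in range(n_particles)]
    v = [[0.0] * m for _ in range(n_particles)]

    # Personal bests start at the initial positions.
    p = [xi[:] for xi in x]
    p_val = [f(xi) for xi in x]

    # The global best is the best of the personal bests.
    best = min(range(n_particles), key=lambda i: p_val[i])
    g, g_val = p[best][:], p_val[best]

    for _ in range(iterations):
        for i in range(n_particles):
            for j in range(m):
                # Position update first, then velocity update, as listed above.
                x[i][j] += v[i][j]
                r1, r2 = rng.random(), rng.random()
                v[i][j] = (omega * v[i][j]
                           + c1 * r1 * (p[i][j] - x[i][j])
                           + c2 * r2 * (g[j] - x[i][j]))
            value = f(x[i])
            if value < p_val[i]:
                p[i], p_val[i] = x[i][:], value
            if value < g_val:
                g, g_val = x[i][:], value
    return g, g_val


def sphere(x):
    """Simple test objective with its minimum of 0 at the origin."""
    return sum(t * t for t in x)


best_x, best_f = pso(sphere, [-5.0] * 3, [5.0] * 3, seed=1)
print(best_x, best_f)
```

Drawing `r1` and `r2` separately for every component corresponds to the random vectors $\mathbf{r}_1, \mathbf{r}_2$ combined with element-by-element multiplication.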

Pseudo code

Here follows a pseudo code example of the velocity and position update of the swarm. X, V and P hold the positions, velocities and personal best positions of the particles, and G holds the global best position. Note that the random vectors $\mathbf{r}_1, \mathbf{r}_2$ are implemented as scalars drawn inside the dimension loop, which is equivalent to the mathematical description given above.

 for I = 1 to number of particles n do
   for J = 1 to number of dimensions m do
     R1 = uniform random number in [0, 1]
     R2 = uniform random number in [0, 1]
     V[I][J] = w*V[I][J]
             + C1*R1*(P[I][J] - X[I][J])
             + C2*R2*(G[J] - X[I][J])
     X[I][J] = X[I][J] + V[I][J]
   enddo
 enddo

Discussion

By studying this algorithm, we see that it is essentially a discrete-time simulation in which each iteration represents a "tick" of time. The particles "communicate" what they find to each other by updating their velocities in terms of personal and global bests; when a new best is found, the particles change their positions accordingly, so that the new information is "broadcast" to the swarm. The particles are always drawn back both to their own personal best positions and to the best position of the entire swarm. They also have a stochastic exploration capability through the random multipliers $\mathbf{r}_1, \mathbf{r}_2$. The vector, floating-point nature of the algorithm suggests that high-performance implementations could take advantage of hardware vectorization extensions such as Streaming SIMD Extensions and AltiVec.

Typical convergence conditions include reaching a certain number of iterations, reaching a certain fitness value, and so on.
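The two stopping conditions named above can be combined in a small helper. This is a sketch only: the name `should_stop` and the default thresholds are hypothetical, and other criteria (for example, stopping when the best value has not improved for some time) could be added in the same way.

```python
def should_stop(iteration, best_value, max_iterations=1000, target_value=1e-8):
    """Return True once the iteration budget is spent or the target
    objective value has been reached (both thresholds are illustrative)."""
    return iteration >= max_iterations or best_value <= target_value


print(should_stop(10, 0.5))      # False: budget left, target not reached
print(should_stop(10, 1e-9))     # True: target value reached
print(should_stop(1000, 0.5))    # True: iteration budget spent
```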

Variations and Practicalities

There are a number of considerations in using PSO in practice; one might wish to clamp the velocities to a certain maximum amount, for instance. The considerable adaptability of PSO to variations and hybrids is seen as a strength over other robust evolutionary optimization mechanisms, such as genetic algorithms. For example, one common, reasonable modification is to add a probabilistic bit-flipping local search heuristic to the loop. Normally, a stochastic hill-climber risks falling into local minima, but the stochastic exploration and communication of the swarm overcomes this. Thus, PSO can be seen as a basic search "workbench" that can be adapted as needed for the problem at hand.
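The velocity clamping mentioned at the start of this section might look like the following in Python. It is a sketch under the assumption that each dimension has its own symmetric limit; the name `clamp_velocity` and the per-dimension limits `v_max` are hypothetical.

```python
def clamp_velocity(v, v_max):
    """Limit each velocity component to the interval [-v_max[j], v_max[j]]."""
    return [max(-limit, min(limit, component))
            for component, limit in zip(v, v_max)]


print(clamp_velocity([3.0, -0.2, -7.5], [1.0, 1.0, 2.0]))  # [1.0, -0.2, -2.0]
```

Such a step would typically be applied right after the velocity update and before the velocity is next used to move the particle.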

Note that the research literature has uncovered many heuristics and variants reported to improve convergence speed and robustness, such as clever choices of $\omega$, $c_i$, and $\mathbf{r}_i$. There are also other variants of the algorithm, such as discretized versions for searching over subsets of $\mathbb{Z}^n$ rather than $\mathbb{R}^n$. There has also been experimentation with coevolutionary versions of the PSO algorithm, with good results reported. Very frequently the value of $\omega$ is taken to decrease over time; e.g., one might run PSO for a fixed number of iterations and decrease $\omega$ linearly from a starting value (0.9, say) to a final value (0.4, say) in order to favor exploitation over exploration in the later stages of the search. The literature is full of such heuristics. In other words, on several common function optimization benchmarks the canonical PSO algorithm is not as strong as the various improvements that have been developed, so consulting the literature for ideas on parameter choices and variants for particular problems is likely to be helpful.
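A linear inertia schedule running from 0.9 down to 0.4, as in the example values above, can be sketched as follows. The function name `inertia` and its parameter names are assumptions made for illustration; the iteration counter is taken to start at 0.

```python
def inertia(iteration, max_iterations, start=0.9, end=0.4):
    """Linearly interpolate the inertia weight from start (iteration 0)
    to end (last iteration); later iterations keep the end value."""
    fraction = iteration / max(1, max_iterations - 1)
    return start + (end - start) * min(1.0, fraction)


for t in (0, 50, 99):
    print(t, inertia(t, 100))
```

The returned value would replace the constant $\omega$ in the velocity update at each iteration.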

Significant, non-trivial modifications have been developed for multi-objective optimization, versions designed to find solutions satisfying linear or non-linear constraints, as well as "niching" versions designed to find multiple solutions to problems where it is believed or known that there are multiple global minima which ought to be located. For an up-to-date literature survey and in-depth discussion of PSO along with the related paradigm of ant colony optimization and most of the documented modifications and heuristics mentioned above, see Engelbrecht's book below.

Applications

Although a relatively new paradigm, PSO has been applied to a variety of tasks, such as the training of artificial neural networks. Very recently, PSO has been applied in combination with grammatical evolution to create a hybrid optimization paradigm called "grammatical swarms." Engelbrecht's book has a chapter on applications of PSO.

See also

External links

Particle Swarm Central

JSwarm-PSO Particle swarm optimization package

Simple introduction to PSO with link to Java source code

DMOZ Particle Swarm People

Java Applet for 3D-visualisation of PSO

Links to PSO source codes

Some papers and source codes (C and Java)

CILib - GPLed computational intelligence simulation and research environment written in Java, includes various PSO implementations

Understanding the Particle Swarm Optimization Algorithm - What are some of the effects of diversity?

References

M. Clerc. Particle Swarm Optimization. ISTE, 2006.

A. Chatterjee and P. Siarry. Nonlinear inertia variation for dynamic adaptation in particle swarm optimization. Computers and Operations Research, Vol. 33, No. 3, pp. 859–871, 2006.

A. P. Engelbrecht. Fundamentals of Computational Swarm Intelligence. Wiley, 2005.

D. N. Wilke. Analysis of the particle swarm optimization algorithm. Master's dissertation, University of Pretoria, 2005.

M. Clerc and J. Kennedy. The particle swarm - explosion, stability, and convergence in a multidimensional complex space. IEEE Transactions on Evolutionary Computation, Vol. 6, pp. 58–73, 2002.