Simulated annealing (SA) is a probabilistic technique for approximating the global optimum of a given function. Specifically, it is a metaheuristic to approximate global optimization in a large search space for an optimization problem. It is often used when the search space is discrete (e.g., all tours that visit a given set of cities). For problems where finding an approximate global optimum is more important than finding a precise local optimum in a fixed amount of time, simulated annealing may be preferable to alternatives such as gradient descent.
The name and inspiration come from annealing in metallurgy, a technique involving heating and controlled cooling of a material to increase the size of its crystals and reduce their defects. Both are attributes of the material that depend on its thermodynamic free energy. Heating and cooling the material affects both the temperature and the thermodynamic free energy. The simulation of annealing can be used to find an approximation of a global minimum for a function with a large number of variables.
This notion of slow cooling implemented in the simulated annealing algorithm is interpreted as a slow decrease in the probability of accepting worse solutions as the solution space is explored. Accepting worse solutions is a fundamental property of metaheuristics because it allows for a more extensive search for the global optimal solution. In general, the simulated annealing algorithms work as follows. At each time step, the algorithm randomly selects a solution close to the current one, measures its quality, and then decides to move to it or to stay with the current solution based on either one of two probabilities between which it chooses on the basis of the fact that the new solution is better or worse than the current one. During the search, the temperature is progressively decreased from an initial positive value to zero and affects the two probabilities: at each step, the probability of moving to a better new solution is either kept to 1 or is changed towards a positive value; instead, the probability of moving to a worse new solution is progressively changed towards zero.
The simulation can be performed either by a solution of kinetic equations for density functions or by using the stochastic sampling method. The method is an adaptation of the Metropolis-Hastings algorithm, a Monte Carlo method to generate sample states of a thermodynamic system, published by N. Metropolis et al. in 1953.
The state of some physical systems, and the function E(s) to be minimized, is analogous to the internal energy of the system in that state. The goal is to bring the system, from an arbitrary initial state, to a state with the minimum possible energy.
At each step, the simulated annealing heuristic considers some neighboring state s* of the current state s, and probabilistically decides between moving the system to state s* or staying in-state s. These probabilities ultimately lead the system to move to states of lower energy. Typically this step is repeated until the system reaches a state that is good enough for the application, or until a given computation budget has been exhausted.
Optimization of a solution involves evaluating the neighbours of a state of the problem, which are new states produced through conservatively altering a given state. For example, in the travelling salesman problem each state is typically defined as a permutation of the cities to be visited, and its neighbours are the set of permutations produced by reversing the order of any two successive cities. The well-defined way in which the states are altered to produce neighbouring states is called a "move", and different moves give different sets of neighbouring states. These moves usually result in minimal alterations of the last state, in an attempt to progressively improve the solution through iteratively improving its parts (such as the city connections in the traveling salesman problem).
Simple heuristics like hill climbing, which move by finding better neighbour after better neighbour and stop when they have reached a solution which has no neighbours that are better solutions, cannot guarantee to lead to any of the existing better solutions – their outcome may easily be just a local optimum, while the actual best solution would be a global optimum that could be different. Metaheuristics use the neighbours of a solution as a way to explore the solutions space, and although they prefer better neighbours, they also accept worse neighbours in order to avoid getting stuck in local optima; they can find the global optimum if run for a long enough amount of time.
The probability of making the transition from the current state to a candidate new state is specified by an acceptance probability function , that depends on the energies and of the two states, and on a global time-varying parameter called the temperature. States with a smaller energy are better than those with a greater energy. The probability function must be positive even when is greater than . This feature prevents the method from becoming stuck at a local minimum that is worse than the global one.
When tends to zero, the probability must tend to zero if and to a positive value otherwise. For sufficiently small values of , the system will then increasingly favor moves that go "downhill" (i.e., to lower energy values), and avoid those that go "uphill." With the procedure reduces to the greedy algorithm, which makes only the downhill transitions.
In the original description of simulated annealing, the probability was equal to 1 when --i.e., the procedure always moved downhill when it found a way to do so, irrespective of the temperature. Many descriptions and implementations of simulated annealing still take this condition as part of the method's definition. However, this condition is not essential for the method to work.
The function is usually chosen so that the probability of accepting a move decreases when the difference increases--that is, small uphill moves are more likely than large ones. However, this requirement is not strictly necessary, provided that the above requirements are met.
Given these properties, the temperature plays a crucial role in controlling the evolution of the state of the system with regard to its sensitivity to the variations of system energies. To be precise, for a large , the evolution of is sensitive to coarser energy variations, while it is sensitive to finer energy variations when is small.
The name and inspiration of the algorithm demand an interesting feature related to the temperature variation to be embedded in the operational characteristics of the algorithm. This necessitates a gradual reduction of the temperature as the simulation proceeds. The algorithm starts initially with set to a high value (or infinity), and then it is decreased at each step following some annealing schedule--which may be specified by the user, but must end with towards the end of the allotted time budget. In this way, the system is expected to wander initially towards a broad region of the search space containing good solutions, ignoring small features of the energy function; then drift towards low-energy regions that become narrower and narrower; and finally move downhill according to the steepest descent heuristic.
Example illustrating the effect of cooling schedule on the performance of simulated annealing. The problem is to rearrange the pixels of an image so as to minimize a certain potential energy function, which causes similar colours to attract at short range and repel at a slightly larger distance. The elementary moves swap two adjacent pixels. These images were obtained with a fast cooling schedule (left) and a slow cooling schedule (right), producing results similar to amorphous and crystalline solids, respectively.
For any given finite problem, the probability that the simulated annealing algorithm terminates with a global optimal solution approaches 1 as the annealing schedule is extended. This theoretical result, however, is not particularly helpful, since the time required to ensure a significant probability of success will usually exceed the time required for a complete search of the solution space.
The following pseudocode presents the simulated annealing heuristic as described above. It starts from a state s0 and continues until a maximum of kmax steps have been taken. In the process, the call neighbour(s) should generate a randomly chosen neighbour of a given state s; the call random(0, 1) should pick and return a value in the range [0, 1], uniformly at random. The annealing schedule is defined by the call temperature(r), which should yield the temperature to use, given the fraction r of the time budget that has been expended so far.
In order to apply the simulated annealing method to a specific problem, one must specify the following parameters: the state space, the energy (goal) function E, the candidate generator procedure neighbour, the acceptance probability function P, and the annealing schedule temperature AND initial temperature <init temp>. These choices can have a significant impact on the method's effectiveness. Unfortunately, there are no choices of these parameters that will be good for all problems, and there is no general way to find the best choices for a given problem. The following sections give some general guidelines.
Simulated annealing may be modeled as a random walk on a search graph, whose vertices are all possible states, and whose edges are the candidate moves. An essential requirement for the neighbour function is that it must provide a sufficiently short path on this graph from the initial state to any state which may be the global optimum – the diameter of the search graph must be small. In the traveling salesman example above, for instance, the search space for n = 20 cities has n! = 2,432,902,008,176,640,000 (2.4 quintillion) states; yet the neighbor generator function that swaps two consecutive cities can get from any state (tour) to any other state in at most steps (this is equivalent to ).
To investigate the behavior of simulated annealing on a particular problem, it can be useful to consider the transition probabilities that result from the various design choices made in the implementation of the algorithm. For each edge of the search graph, the transition probability is defined as the probability that the simulated annealing algorithm will move to state when its current state is . This probability depends on the current temperature as specified by temperature, on the order in which the candidate moves are generated by the neighbour function, and on the acceptance probability function P. (Note that the transition probability is not simply , because the candidates are tested serially.)
The specification of neighbour, P, and temperature is partially redundant. In practice, it's common to use the same acceptance function P for many problems, and adjust the other two functions according to the specific problem.
In the formulation of the method by Kirkpatrick et al., the acceptance probability function was defined as 1 if , and otherwise. This formula was superficially justified by analogy with the transitions of a physical system; it corresponds to the Metropolis-Hastings algorithm, in the case where T=1 and the proposal distribution of Metropolis->Hastings is symmetric. However, this acceptance probability is often used for simulated annealing even when the neighbour function, which is analogous to the proposal distribution in Metropolis-Hastings, is not symmetric, or not probabilistic at all. As a result, the transition probabilities of the simulated annealing algorithm do not correspond to the transitions of the analogous physical system, and the long-term distribution of states at a constant temperature need not bear any resemblance to the thermodynamic equilibrium distribution over states of that physical system, at any temperature. Nevertheless, most descriptions of simulated annealing assume the original acceptance function, which is probably hard-coded in many implementations of SA.
When choosing the candidate generator neighbour, one must consider that after a few iterations of the simulated annealing algorithm, the current state is expected to have much lower energy than a random state. Therefore, as a general rule, one should skew the generator towards candidate moves where the energy of the destination state is likely to be similar to that of the current state. This heuristic (which is the main principle of the Metropolis-Hastings algorithm) tends to exclude "very good" candidate moves as well as "very bad" ones; however, the former are usually much less common than the latter, so the heuristic is generally quite effective.
In the traveling salesman problem above, for example, swapping two consecutive cities in a low-energy tour is expected to have a modest effect on its energy (length); whereas swapping two arbitrary cities is far more likely to increase its length than to decrease it. Thus, the consecutive-swap neighbour generator is expected to perform better than the arbitrary-swap one, even though the latter could provide a somewhat shorter path to the optimum (with swaps, instead of ).
A more precise statement of the heuristic is that one should try first candidate states for which is large. For the "standard" acceptance function above, it means that is on the order of or less. Thus, in the traveling salesman example above, one could use a neighbour function that swaps two random cities, where the probability of choosing a city pair vanishes as their distance increases beyond .
When choosing the candidate generator neighbour one must also try to reduce the number of "deep" local minima--states (or sets of connected states) that have much lower energy than all its neighbouring states. Such "closed catchment basins" of the energy function may trap the simulated annealing algorithm with high probability (roughly proportional to the number of states in the basin) and for a very long time (roughly exponential on the energy difference between the surrounding states and the bottom of the basin).
As a rule, it is impossible to design a candidate generator that will satisfy this goal and also prioritize candidates with similar energy. On the other hand, one can often vastly improve the efficiency of simulated annealing by relatively simple changes to the generator. In the traveling salesman problem, for instance, it is not hard to exhibit two tours , , with nearly equal lengths, such that (1) is optimal, (2) every sequence of city-pair swaps that converts to goes through tours that are much longer than both, and (3) can be transformed into by flipping (reversing the order of) a set of consecutive cities. In this example, and lie in different "deep basins" if the generator performs only random pair-swaps; but they will be in the same basin if the generator performs random segment-flips.
The physical analogy that is used to justify simulated annealing assumes that the cooling rate is low enough for the probability distribution of the current state to be near thermodynamic equilibrium at all times. Unfortunately, the relaxation time--the time one must wait for the equilibrium to be restored after a change in temperature--strongly depends on the "topography" of the energy function and on the current temperature. In the simulated annealing algorithm, the relaxation time also depends on the candidate generator, in a very complicated way. Note that all these parameters are usually provided as black box functions to the simulated annealing algorithm. Therefore, the ideal cooling rate cannot be determined beforehand, and should be empirically adjusted for each problem. Adaptive simulated annealing algorithm address this problem by connecting the cooling schedule to the search progress. Other adaptive approach as Thermodynamic Simulated Annealing, automatically adjusts the temperature at each step based on the energy difference between the two states, according to the laws of thermodynamics.
Sometimes it is better to move back to a solution that was significantly better rather than always moving from the current state. This process is called restarting of simulated annealing. To do this we set
ebest and perhaps restart the annealing schedule. The decision to restart could be based on several criteria. Notable among these include restarting based on a fixed number of steps, based on whether the current energy is too high compared to the best energy obtained so far, restarting randomly, etc.
Manage research, learning and skills at defaultlogic.com. Create an account using LinkedIn to manage and organize your omni-channel knowledge. defaultlogic.com is like a shopping cart for information -- helping you to save, discuss and share.