You have a problem. You already have some solution to your problem, but you want the very best solution.

There is a set of local modifications that you can make to your solution. After you make a modification, you have a new solution that neighbors your old solution.

You try each modification. If none of the neighboring solutions is better than your solution, then you stay with your solution. If at least one of them is better than your solution, then you change your solution to the best of them. With this new solution now as your solution, you repeat the process. You continue like this until no neighboring solution is better than your solution.

This strategy is an instance of local search. Specifically, it is hill climbing.

The issue with this approach is that, although your solution improves as it iteratively changes into its best neighboring solution, the very best solution might never be in the neighborhood of your solution. So, you might never find the very best solution.

Simulated annealing combats this issue. At each iteration, a neighboring solution is drawn at random. Like in hill climbing, if it is better than the current solution, then it becomes the current solution. However, unlike in hill climbing, there is still a probability that it will become the current solution even if it is worse than the current solution. The hope is that by chance the solution will enter a neighborhood where the very best solution lives.

In fact, if you model the sequence of solutions as a Markov chain, then you can show that simulated annealing converges toward the set of optimal solutions. However, it converges in the limit. Like hill climbing, simulated annealing is a heuristic. There is no guarantee that it will find a global optimum.

In the single machine total weighted tardiness problem, you want to process a set of jobs on a single machine. Each job has a processing time, a due date, and a weight that represents its importance. You can only process one job at a time. If a job finishes after its due date, then it is tardy. The importance of tardiness depends on the importance of the job, and you want to minimize the total weighted tardiness for all jobs.

I wrote a Java program that applies simulated annealing to the single machine total weighted tardiness problem. It is up at my GitHub.

Further references:

http://www.inf.ufpr.br/aurora/disciplinas/topicosia2/livros/search/SA.pdf

http://www.theprojectspot.com/tutorial-post/simulated-annealing-algorithm-for-beginners/6

http://katrinaeg.com/simulated-annealing.html