Algorithmics of sudoku

From Wikipedia, the free encyclopedia

The class of Sudoku puzzles consists of a partially completed row-column grid of cells partitioned into N regions or zones each of size N cells, to be filled in using a prescribed set of N distinct symbols (typically the numbers {1, ..., N}), so that each row, column and region contains exactly one of each element of the set. The puzzle can be solved using a variety of algorithms. This page provides algorithms only.

Contents

[edit] Solving a blank sudoku grid

Although sudoku grids that come with some of their cells pre-filled can often be quite challenging to solve, blank sudoku grids can actually be solved very quickly.

Perhaps the easiest way of doing this is to produce the root solution, which can be achieved using the following very simple, polynomial time algorithm, first proposed by Lewis[1]

For the standard n2 x n2 grid this algorithm is as follows:

int x := 0;
for i := 1 to n do
   for j := 1 to n do
      for k := 1 to n^2 do
         grid[n*(i-1)+j][k] := (x mod n^2) + 1;
         x := x + 1;
       x := x + n;
   x := x + 1

(Here, the notation grid[i][j] represents the value that is contained in the ith row and jth column of cells). If we set n = 3, then this procedure will produce the following root solution.

+-----------------------+
| 1 2 3 | 4 5 6 | 7 8 9 |
| 4 5 6 | 7 8 9 | 1 2 3 |
| 7 8 9 | 1 2 3 | 4 5 6 |
|-------+-------+-------|
| 2 3 4 | 5 6 7 | 8 9 1 |
| 5 6 7 | 8 9 1 | 2 3 4 |
| 8 9 1 | 2 3 4 | 5 6 7 |
|-------+-------+-------|
| 3 4 5 | 6 7 8 | 9 1 2 |
| 6 7 8 | 9 1 2 | 3 4 5 |
| 9 1 2 | 3 4 5 | 6 7 8 |
+-----------------------+

Obviously, this algorithm can easily be adapted to different sized puzzles if needed.

[edit] Solving sudokus by backtracking

The basic backtracking algorithm can be adapted to solve sudokus. This is straightforward. Say a zone is a subset of N boxes of an N x N grid, which must contain the numbers from 1 to N. A standard sudoku contains 27 zones, namely 9 rows, 9 columns and 9 squares that are 3 x 3. In a jigsaw sudoku, the square zones are replaced by zones having irregular boundaries, like a jigsaw piece.

One possible algorithm that uses backtracking to solve such sudokus constructs a graph on N2 vertices, one vertex for each box of the grid. Two vertices are connected by an edge if there exists a zone containing the two boxes. The problem is then equivalent to coloring this graph with N colors, where adjacent vertices may not have the same color. This is done by starting with an empty assignment of colors and assigning colors to vertices one after another, using some fixed order of the vertices. Whenever a color is being assigned, we check whether it is compatible with the existing assignments, i.e. whether the new color occurs among the neighbors of that vertex. If it doesn't, then we may assign it to the vertex and try to process another vertex. We backtrack once all N colors have been tried for a given vertex. If all vertices have been assigned a color, then we have found a solution. There are of course much more sophisticated algorithms to solve graph coloring. If the sudoku contains initial data, i.e. some boxes have already been filled, then these go into the color assignment before backtracking begins and the vertex sequence includes only the empty boxes.

The above algorithm was used to solve a 10x10 jigsaw sudoku that was proposed on Les-Mathematiques.net A link to the proposal may be found in the section for external links. The first section of the program defines the 10 jigsaw pieces (zones), the second the row and column zones. Thereafter the graph is constructed as an adjacency list. The search procedure prints completed solutions (when all 100 boxes have been assigned). Otherwise it computes the set of colors present among the neighbors of the next vertex to be processed, and recursively tries those assignments that do not conflict with this set. The search starts with an empty assignment.

[edit] Exact Cover in solving sudokus

Sudoku can be described as an Exact cover problem. This allows both for a very elegant description of the problem and an efficient backtrack algorithm for solving the problem.

In an exact cover problem, there is given a universe U of elements and a collection \mathcal{S} of subsets of U. The task is to find a subcollection \mathcal{S}^* of \mathcal{S} such that every element in U is contained in exactly one set in \mathcal{S}^*.

Let R, C, B, and V denote the set of rows, columns, blocks, and values in the sudoku. Each of these sets has nine elements. Each of the following combinations must occur exactly once in each sudoku: R×C (in each row-column intersection there must be exactly one number), R×V (each value must appear exactly once in each row), C×V (each value must appear exactly once in each column), and B×V (each value must appear exactly once in each 3×3 block. The union of these four sets of 81 elements each is the universe of 324 elements in the exact cover formulation.

The subsets available for covering the elements of the universe correspond to writing a number in a square. Writing the value v in row r in column c in block b corresponds to the set with the elements (r, c), (r, v), (c, v), (b, v). The sudoku can then be solved by finding 81 such subsets that together cover each element in the universe exactly once.

Where this method really excels is that one does not have to consider the various types of constraints separately. Rather, one can simply structure the backtrack search so that at each level of recursion one chooses the element from the universe for which there are the fewest number of sets with which it can be covered without violating the exact cover constraints.

With this approach and an efficient library for solving exact cover problems, one can solve 9×9 sudokus in such a short time on a modern PC that measuring the computation time becomes challenging.

[edit] Standard Sudoku solver and enumerator

This section contains a discussion of a modified backtracking algorithm that can be used to create and solve Sudokus even with modest computing resources. A standard Sudoku is an NxN Sudoku where N = k2 whose zones consist of the rows, columns and kxk subsquares as in the classic 9x9 Sudoku.

[edit] Key modifications to the algorithm

Conceptually there is one modification only: the backtracking algorithm sorts the vertices by the number of colors already assigned among its neighbors before selecting the next vertex to try. The vertex with the largest number of assigned colors, and hence, the smallest number of choices, is tried first. (There may be more than one such vertex).

The data structures used during the backtracking search are chosen to make this easy and fast, although further optimization is possible. The search state is stored in three data structures: a hash table whose keys are the vertices and whose values are the colors that have been assigned to them. There is an array that contains the vertices that have not yet been assigned a color. Finally, there is a hash table whose keys are again the vertices and whose values are hash tables containing the colors present among the neighbors of the respective vertex, as well as a hint as to who assigned them.

[edit] Discussion of the modified algorithm

The algorithm follows this sequence of steps:

  • Create the row and column zones, then create the subsquare zones.
  • Construct the adjacency matrix of the graph.
  • The search routine:
    • Print the solution if there are no more vertices to be assigned a color, and return.
    • Sort the remaining vertices in descending order of colors present among their neighbors.
    • Pick the/a vertex with the largest number of assigned colors.
    • Try each of the remaining possible colors recursively.
      • Update the hash table of vertex neighboring colors to reflect the assignment.
      • Update the partial solution to reflect the assignment.
      • Recurse.
      • Remove the color from the partial solution.
      • Undo the color assignments from the neighboring colors hash table.
  • Before the search begins, read the initial color assignment.
  • Compute the set of vertices to be assigned a color, i.e. not present in the initial assignment.
  • Compute the initial state of the hash table of neighbor colors.
  • Start the search.

The above algorithm can enter into loops. To detect this, add a hash table that stores seen configurations. When this happens, terminate the computation and indicate FAIL (e.g. by throwing an exception). Repeat with a different seed of the random number generator if desired.

[edit] Example of a back-tracking Sudoku solver (in Ruby programming language)

def read_matrix
  matrix = []

  (0..8).each { |i|
    l = readline
    matrix[i] = []
    (0..8).each { |j|
      matrix[i][j] = l[j..j].to_i
    }
  }
  matrix
end

def permissible(matrix, i, j)
  ok = [true,true,true,true,true,true,true,true,true]
  # Same as another in the column isn't permissible...
  (0..8).each { |i2|
    next if matrix[i2][j] == 0
    ok[matrix[i2][j] - 1] = false
  }
  # Same as another in the row isn't permissible...
  (0..8).each { |j2|
    next if matrix[i][j2] == 0
    ok[matrix[i][j2] - 1] = false
  }
  # Same as another in the 3x3 block isn't permissible...
  igroup = (i / 3) * 3
  jgroup = (j / 3) * 3
  (igroup..(igroup + 2)).each { |i2|
    (jgroup..(jgroup + 2)).each { |j2|
      next if matrix[i2][j2] == 0
      ok[matrix[i2][j2] - 1] = false
    }
  }
  # Convert to the array format...
  ret = []
  (0..8).each { |i2| ret.push(i2 + 1) if ok[i2] }
  ret
end

def deep_copy_sudoku(matrix)
  newmat = []
  (0..8).each { |i|
    newmat[i] = []
    (0..8).each { |j|
      newmat[i][j] = matrix[i][j]
    }
  }
  newmat
end

def solve_sudoku(matrix)
  while true
    options = []
    (0..8).each { |i|
      (0..8).each { |j|
        next if matrix[i][j] != 0
        p = permissible(matrix, i, j)
        # If nothing is permissible, there is no solution at this level.
        return false if (p.length == 0)
        options.push({:i => i, :j => j, :permissible => p})
      }
    }
    # If the matrix is complete, we have a solution...
    return matrix if options.length == 0

    omin = options.min { | a, b |
      a[:permissible].length <=> b[:permissible].length
    }

    # If there is an option with only one solution, set it and re-check permissibility
    if omin[:permissible].length == 1
      matrix[omin[:i]][omin[:j]] = omin[:permissible][0]
      next
    end

    # We have two or more choices. We need to search both...
    omin[:permissible].each { |v|
      mtmp = deep_copy_sudoku(matrix)
      mtmp[omin[:i]][omin[:j]] = v
      ret = solve_sudoku(mtmp)
      if ret != false
        return ret
      end
    }

    # We did an exhaustive search on this branch and nothing worked out.
    return false
  end
end

def print_matrix(matrix)
  if (matrix == false)
    print "Impossible\n"
    return
  end

  (0..8).each { |i|
    (0..8).each { |j|
      print matrix[i][j]
    }
    print "\n"
  }
end

print_matrix(solve_sudoku(read_matrix()))

[edit] Solving sudokus by a brute-force algorithm

Some hobbyists have developed computer programs that will solve sudoku puzzles using a brute force algorithm. Although it has been established that approximately 6.67 x 1021 final grids exist, using a brute force algorithm can be a practical method to solve puzzles using a computer program if the code is well designed.

An advantage of this method is that if the puzzle is valid, a solution is guaranteed. There is not a strong relation between the solving time and the degree of difficulty of the puzzle; generating a solution is just a matter of waiting until the algorithm advances to the set of numbers that satisfies the puzzle. The disadvantage of this method is that it may be comparatively slow when compared to computer solution methods modeled after human-style deductive methods.

Briefly, a brute force program would solve a puzzle by placing the digit "1" in the first cell and checking if it is allowed to be there. If there are no violations (checking row, column, and box constraints) then the algorithm advances to the next cell, and places a "1" in that cell. When checking for violations, it is discovered that the "1" is not allowed, so the value is advanced to a "2". If a cell is discovered where none of the 9 digits is allowed, then the algorithm leaves that cell blank and moves back to the previous cell. The value in that cell is then incremented by one. The algorithm is repeated until the allowed value in the 81st cell is discovered. The construction of 81 numbers is parsed to form the 9 x 9 solution matrix.

Most Sudoku puzzles will be solved in just a few seconds with this method, but there are exceptions. The following puzzle was designed to be a near worst case situation for solution by brute force (although it is not regarded as a difficult puzzle when solved by other methods).

"Near worst case" sudoku puzzle for brute force solver.
"Near worst case" sudoku puzzle for brute force solver.

Solving this puzzle by brute-force requires a large number of iterations because it has a low number of clues (17), the top row has no clues at all, and the solution has "987654321" as its first row. Thus a brute-force solver will spend an enormous amount of time "counting" upward before it arrives at the final grid which satisfies the puzzle. If one iteration is defined as one attempt to place one value in one cell, then this puzzles requires 641,580,843 iterations to solve. These iterations do not include the work involved at each step to learn if each digit entered is valid or not (required for every iteration). Based on the specific construction of the computer code, programmers have found the solution time for this puzzle to be between 30 and 45 minutes with a computer processor running at 3 GHz. Many programmers have developed variations of the brute force algorithm which will solve this puzzle in a minute or less with a 3 GHz computer processor.

One programmer has created charts of the progression of a pointer as it advances through the 81 positions of a sudoku using a brute force algorithm. An example is the chart for the solution to a sudoku "Star Burst Leo" shown here.[2][3]

"Star Burst Leo" Sudoku
"Star Burst Leo" Sudoku
Chart of cell position for solution to "Star Burst Leo" when solved by a brute force algorithm
Chart of cell position for solution to "Star Burst Leo" when solved by a brute force algorithm

[edit] Solving sudokus via stochastic search / optimisation methods

Recently, some researchers have also shown how sudoku can be solved using stochastic -- i.e. random-based -- search. Perhaps the first algorithm of this kind was produced by Lewis.

This particular algorithm starts by randomly assigning numbers to the blank cells in the grid in a haphazard fashion. The number of errors in the grid is then calculated and the objective of the algorithm is to then "shuffle" the numbers around the grid until the number of mistakes has been reduced to zero. A solution to the puzzle will then have been found. In this algorithm, the optimization is achieved using simulated annealing, though other optimization methods such as tabu search may prove just as applicable.

The advantage of this type of method is that the puzzle does not have to be "logic-solvable" in order for the algorithm to be able to solve it. In other words, unlike other methods, the puzzles that are given to this algorithm do not have to be specially constructed so that they provide sufficient clues for filling the grid using forward chaining logic only. In fact, the only prerequisite for the stochastic search algorithm to work is that puzzle has at least one solution.

The simulated annealing method in Lewis's paper is also quite fast (though perhaps not as quick as some logic-based techniques with logic solvable puzzles): depending on the type of instance given to the algorithm, generally 9x9 puzzles will be solved in less than 1 second on a typical year-2000 lap top; 16x16 puzzles will take around 10-15 seconds.

It is also possible to express a Sudoku as an integer linear programming problem. This gets close to the solution quickly, and then will use a branching approach towards the end. The advantage is that the simplex algorithm handles no solutions and multiple solutions easily.

[edit] Exceptionally difficult Sudokus (Hardest Sudokus)

Some of the sudoku puzzles can only be solved using logic that is too complex for human solvers. Most would describe them as unsolvable after exhausting their arsenal of sudoku solving techniques and would proceed to a trial and error path to reach a solution.

It was no surprise that computer programmers were interested in this subject, trying to generate even more difficult puzzles or even trying to find new ways to logically solve and rate them.

Rating puzzles that are beyond human capabilities proved to be difficult as different points of view regarding what could be considered a yardstick for measuring difficulty resulted in heterogeneous opinions on which puzzle is the hardest of them all.

Several active discussions on this issue took place on a number of popular sudoku forums since 2005.[4][5][6] Several openly accessible solver programs became popular between users for the purpose of rating and generating such puzzles. (See: External links)

The following is a compilation of the latest hardest sudoku puzzles according to a number of openly accessible solver Programs:

Rating Program: gsf's sudoku q1 (rating) 
Rating: 99408 
Poster: JPF 
Label: Easter Monster 
1.......2.9.4...5...6...7...5.9.3.......7.......85..4.7.....6...3...9.8...2.....1 
1 . . | . . . | . . 2  
. 9 . | 4 . . | . 5 .  
. . 6 | . . . | 7 . .  
------+-------+------ 
. 5 . | 9 . 3 | . . .  
. . . | . 7 . | . . .  
. . . | 8 5 . | . 4 .  
------+-------+------ 
7 . . | . . . | 6 . .  
. 3 . | . . 9 | . 8 .  
. . 2 | . . . | . . 1  
Rating Program: gsf's sudoku q1 (Processing time) 
Rating: 4m19s@2 GHz 
Poster: tarek 
Label: tarek071223170000-052 
..1..4.......6.3.5...9.....8.....7.3.......285...7.6..3...8...6..92......4...1... 
. . 1 | . . 4 | . . .  
. . . | . 6 . | 3 . 5  
. . . | 9 . . | . . .  
------+-------+------ 
8 . . | . . . | 7 . 3  
. . . | . . . | . 2 8  
5 . . | . 7 . | 6 . .  
------+-------+------ 
3 . . | . 8 . | . . 6  
. . 9 | 2 . . | . . .  
. 4 . | . . 1 | . . .  
Rating Program: Nicolas Juillerat's Sudoku explainer 1.2.1 
Rating: 11.9 
Poster: tarek 
Label: golden nugget 
.......39.....1..5..3.5.8....8.9...6.7...2...1..4.......9.8..5..2....6..4..7..... 
. . . | . . . | . 3 9  
. . . | . . 1 | . . 5  
. . 3 | . 5 . | 8 . .  
------+-------+------ 
. . 8 | . 9 . | . . 6  
. 7 . | . . 2 | . . .  
1 . . | 4 . . | . . .  
------+-------+------ 
. . 9 | . 8 . | . 5 .  
. 2 . | . . . | 6 . .  
4 . . | 7 . . | . . .  
Rating Program: dukuso's suexrat9 
Rating: 4483 
Poster: coloin 
Label: col-02-08-071 
.2.4.37.........32........4.4.2...7.8...5.........1...5.....9...3.9....7..1..86.. 
. 2 . | 4 . 3 | 7 . .  
. . . | . . . | . 3 2  
. . . | . . . | . . 4  
------+-------+------ 
. 4 . | 2 . . | . 7 .  
8 . . | . 5 . | . . .  
. . . | . . 1 | . . .  
------+-------+------ 
5 . . | . . . | 9 . .  
. 3 . | 9 . . | . . 7  
. . 1 | . . 8 | 6 . .  
Rating Program: dukuso's suexratt (10000 2 option) 
Rating: 2141 
Poster: tarek 
Label: golden nugget 
.......39.....1..5..3.5.8....8.9...6.7...2...1..4.......9.8..5..2....6..4..7..... 
. . . | . . . | . 3 9  
. . . | . . 1 | . . 5  
. . 3 | . 5 . | 8 . .  
------+-------+------ 
. . 8 | . 9 . | . . 6  
. 7 . | . . 2 | . . .  
1 . . | 4 . . | . . .  
------+-------+------ 
. . 9 | . 8 . | . 5 .  
. 2 . | . . . | 6 . .  
4 . . | 7 . . | . . .

[edit] References

  1. ^ Lewis, R (2007) Metaheuristics Can Solve Sudoku Puzzles Journal of Heuristics, vol. 13 (4), pp 387-401.
  2. ^ "flickr.com/photos/npcomplete/"
  3. ^ "flickr.com/photos/npcomplete/2384354604/" (Star Burst Leo sudoku)
  4. ^ "The hardest sudokus" On the Sudoku Players' Forums
  5. ^ "what is the hardest known suduko ?" On the Sudoku Programmers Forums
  6. ^ "How regular is to generate sudoku with difficulty 9+ SE?" On the Sudoku Players' Forums

[edit] External links