This page describes an algorithm for solving Sudoku puzzles, which have suddenly become popular in the UK, and provides a Python script to automate the process.
Updates:Oct 9 2005 - Replaced the Python script with an easier to understand version that doesn't use recursion. Altered this page to reflect the new algorithm.. |
|
We will start off with an easy example. This one appeared in the Evening Standard on 31st May 2005, where it was described as "Hard". It has 27 squares filled in, the examples they describe as "Easy" have 32. This doesn't make much difference to a computer algorithm, though if you were working through this by hand the more initial numbers you are given the quicker you should be able to solve it. I call it "easy" as it can be solved after applying only the simpler parts of the algorithm. |
The algorithm works by writing out all the possible answers and then eliminating those that are incompatible with the numbers published in the puzzle. The starting position is that every cell can contain any of the numbers 1-9. We represent this by writing all the possible values into each cell, which gives us a grid that looks like this.
123456789 | 123456789 | 123456789 | 123456789 | 123456789 | 123456789 | 123456789 | 123456789 | 123456789 |
123456789 | 123456789 | 123456789 | 123456789 | 123456789 | 123456789 | 123456789 | 123456789 | 123456789 |
123456789 | 123456789 | 123456789 | 123456789 | 123456789 | 123456789 | 123456789 | 123456789 | 123456789 |
123456789 | 123456789 | 123456789 | 123456789 | 123456789 | 123456789 | 123456789 | 123456789 | 123456789 |
123456789 | 123456789 | 123456789 | 123456789 | 123456789 | 123456789 | 123456789 | 123456789 | 123456789 |
123456789 | 123456789 | 123456789 | 123456789 | 123456789 | 123456789 | 123456789 | 123456789 | 123456789 |
123456789 | 123456789 | 123456789 | 123456789 | 123456789 | 123456789 | 123456789 | 123456789 | 123456789 |
123456789 | 123456789 | 123456789 | 123456789 | 123456789 | 123456789 | 123456789 | 123456789 | 123456789 |
123456789 | 123456789 | 123456789 | 123456789 | 123456789 | 123456789 | 123456789 | 123456789 | 123456789 |
The overall approach is to eliminate the impossible so that what remains must be the answer.
In the example above we have been given the values for 27 of the cells. The first thing to do is to apply these to our grid and eliminate all the values that are inconsistent with these. Working across the top row the first value we have is in the 5th cell. This is a 1. This means that there can be no other '1's on the top row or in the 5th column, or in the top-middle block of 9. Crossing out all these '1's leaves
23456789 | 23456789 | 23456789 | 23456789 | 1 | 23456789 | 23456789 | 23456789 | 23456789 |
123456789 | 123456789 | 123456789 | 23456789 | 23456789 | 23456789 | 123456789 | 123456789 | 123456789 |
123456789 | 123456789 | 123456789 | 23456789 | 23456789 | 23456789 | 123456789 | 123456789 | 123456789 |
123456789 | 123456789 | 123456789 | 123456789 | 23456789 | 123456789 | 123456789 | 123456789 | 123456789 |
123456789 | 123456789 | 123456789 | 123456789 | 23456789 | 123456789 | 123456789 | 123456789 | 123456789 |
123456789 | 123456789 | 123456789 | 123456789 | 23456789 | 123456789 | 123456789 | 123456789 | 123456789 |
123456789 | 123456789 | 123456789 | 123456789 | 23456789 | 123456789 | 123456789 | 123456789 | 123456789 |
123456789 | 123456789 | 123456789 | 123456789 | 23456789 | 123456789 | 123456789 | 123456789 | 123456789 |
123456789 | 123456789 | 123456789 | 123456789 | 23456789 | 123456789 | 123456789 | 123456789 | 123456789 |
Continuing this process; after entering the 9 given values on the top three rows we have
2368 | 2368 | 2368 | 23567 | 1 | 9 | 3578 | 4 | 3578 |
1239 | 1239 | 4 | 8 | 2357 | 2357 | 6 | 13579 | 13579 |
7 | 5 | 13689 | 346 | 346 | 346 | 1389 | 1389 | 2 |
12345689 | 12346789 | 12356789 | 12345679 | 23456789 | 12345678 | 12345789 | 12356789 | 13456789 |
12345689 | 12346789 | 12356789 | 12345679 | 23456789 | 12345678 | 12345789 | 12356789 | 13456789 |
12345689 | 12346789 | 12356789 | 12345679 | 23456789 | 12345678 | 12345789 | 12356789 | 13456789 |
12345689 | 12346789 | 12356789 | 12345679 | 23456789 | 12345678 | 12345789 | 12356789 | 13456789 |
12345689 | 12346789 | 12356789 | 12345679 | 23456789 | 12345678 | 12345789 | 12356789 | 13456789 |
12345689 | 12346789 | 12356789 | 12345679 | 23456789 | 12345678 | 12345789 | 12356789 | 13456789 |
This hasn't made much impact on the bottom two thirds of the grid because so far we've only had one value in each column. But we've got 18 more values to place. When we get to '6' in the 6th column of the 6th row the grid is becoming a lot sparser and we notice that the only possible value left for the sixth cell on the third row is '4'. So we have found our first missing number (shown in green).
2368 | 2368 | 2368 | 23567 | 1 | 9 | 3578 | 4 | 3578 |
1239 | 123 | 4 | 8 | 2357 | 57 | 6 | 13579 | 13579 |
7 | 5 | 13689 | 36 | 346 | 4 | 1389 | 1389 | 2 |
368 | 9 | 3678 | 1 | 578 | 2 | 3578 | 35678 | 4 |
12468 | 124678 | 12678 | 579 | 5789 | 3 | 125789 | 1256789 | 156789 |
5 | 12378 | 12378 | 4 | 789 | 6 | 123789 | 123789 | 13789 |
1234689 | 1234678 | 12356789 | 235679 | 23456789 | 14578 | 12345789 | 12356789 | 1356789 |
1234689 | 1234678 | 12356789 | 235679 | 23456789 | 14578 | 12345789 | 12356789 | 1356789 |
1234689 | 1234678 | 12356789 | 235679 | 23456789 | 14578 | 12345789 | 12356789 | 1356789 |
After we've used up all the given numbers (and the one we found) we we are left with the grid.
236 | 2368 | 238 | 3567 | 1 | 9 | 3578 | 4 | 578 |
1239 | 23 | 4 | 8 | 2357 | 57 | 6 | 159 | 1579 |
7 | 5 | 1389 | 36 | 36 | 4 | 1389 | 189 | 2 |
36 | 9 | 378 | 1 | 578 | 2 | 578 | 568 | 4 |
1246 | 24678 | 1278 | 579 | 578 | 3 | 125789 | 125689 | 156789 |
5 | 278 | 1278 | 4 | 78 | 6 | 12789 | 3 | 1789 |
8 | 24 | 259 | 56 | 456 | 15 | 1259 | 7 | 3 |
239 | 237 | 6 | 357 | 357 | 8 | 4 | 1259 | 159 |
34 | 1 | 357 | 2 | 9 | 57 | 58 | 568 | 568 |
So we move onto phase 2, "Looking for Singletons".
Phase two involves counting how often the undecided numbers appear in a set of cells (row, column or 3x3 super-cell). If we start with the first column, the undecided cells are the 1st, 2nd, 4th, 5th, 8th and 9th. The possibilities are '236', '1239', '36', '1246', '239' and '34'. If we count how often each undecided number appears (the known ones are excluded from this process) we get
Number | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
Frequency | 2 | 4 | 5 | 2 | - | 3 | - | - | 2 |
Every undecided number appears at least twice, so this doesn't help us. Repeating this exercise for columns 2 and 3 doesn't produce any useful information either. When we try column 4 though, we get
Number | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
Frequency | - | - | 3 | - | 4 | 3 | 3 | - | 1 |
Now the '9' only appears once, in the 5th row, so that is where it must be. We have found our second number. We can now set the '9' and repeat the process described in phase 1 giving.
236 | 2368 | 238 | 3567 | 1 | 9 | 3578 | 4 | 578 |
1239 | 23 | 4 | 8 | 2357 | 57 | 6 | 159 | 1579 |
7 | 5 | 1389 | 36 | 36 | 4 | 1389 | 189 | 2 |
36 | 9 | 378 | 1 | 578 | 2 | 578 | 568 | 4 |
1246 | 24678 | 1278 | 9 | 578 | 3 | 12578 | 12568 | 15678 |
5 | 278 | 1278 | 4 | 78 | 6 | 12789 | 3 | 1789 |
8 | 24 | 259 | 56 | 456 | 15 | 1259 | 7 | 3 |
239 | 237 | 6 | 357 | 357 | 8 | 4 | 1259 | 159 |
34 | 1 | 357 | 2 | 9 | 57 | 58 | 568 | 568 |
That didn't help very much, so we restart this phase at the beginning. The first four columns offer no additional insights, but now the frequencies in column 5 are
Number | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
Frequency | - | 1 | 3 | 1 | 5 | 2 | 5 | 3 | - |
Which gives us two more discoveries, '2' on the second row and '4' on the 7th. Filling these in using the phase 1 algorithm sets off a chain reaction...
Action | Consequences |
(2,5) = 2 | (2,2) = 3 |
(2,2) = 3 | None |
(7,5) = 4 | (7,2) = 2 |
(7,2) = 2 | (8,2) = 7 |
(8,2) = 7 | (6,2) = 8 |
(6,2) = 8 | (1,2) = 6, (6,5) = 7 |
(1,2) = 6 | (1,1) = 2, (5,2) = 4 |
(6,5) = 7 | None |
(1,1) = 2 | (1,3) = 8 |
(5,2) = 4 | None |
(1,3) = 8 | None |
Co-ordinates are (row, column) with the top left being (1,1). The grid now looks like this (with the new discoveries highlighted).
2 | 6 | 8 | 357 | 1 | 9 | 357 | 4 | 57 |
19 | 3 | 4 | 8 | 2 | 57 | 6 | 159 | 1579 |
7 | 5 | 19 | 36 | 36 | 4 | 1389 | 189 | 2 |
36 | 9 | 37 | 1 | 58 | 2 | 578 | 568 | 4 |
16 | 4 | 127 | 9 | 58 | 3 | 12578 | 12568 | 15678 |
5 | 8 | 12 | 4 | 7 | 6 | 129 | 3 | 19 |
8 | 2 | 59 | 56 | 4 | 15 | 159 | 7 | 3 |
39 | 7 | 6 | 35 | 35 | 8 | 4 | 1259 | 159 |
34 | 1 | 35 | 2 | 9 | 57 | 58 | 568 | 568 |
Restartind g this phase counting down the columns shows us that the bottom cell in the first column must be '4'. This doesn't lead to any further discoveries. Trying again reveals that the top cell in the 4th column must be '7'. This second discovery forces both (1,9) and (2,6) to be '5', which in turn forces three more discoveries, leaving.
2 | 6 | 8 | 7 | 1 | 9 | 3 | 4 | 5 |
19 | 3 | 4 | 8 | 2 | 5 | 6 | 19 | 179 |
7 | 5 | 19 | 36 | 36 | 4 | 189 | 189 | 2 |
36 | 9 | 37 | 1 | 58 | 2 | 578 | 568 | 4 |
16 | 4 | 127 | 9 | 58 | 3 | 12578 | 12568 | 1678 |
5 | 8 | 12 | 4 | 7 | 6 | 129 | 3 | 19 |
8 | 2 | 59 | 56 | 4 | 1 | 59 | 7 | 3 |
39 | 7 | 6 | 35 | 35 | 8 | 4 | 1259 | 19 |
4 | 1 | 35 | 2 | 9 | 7 | 58 | 568 | 68 |
Searching the columns again we see that the third cell in the fifth column must be a 6. Setting this triggers another cascade of discoveries which culminates with the solution.
2 | 6 | 8 | 7 | 1 | 9 | 3 | 4 | 5 |
1 | 3 | 4 | 8 | 2 | 5 | 6 | 9 | 7 |
7 | 5 | 9 | 3 | 6 | 4 | 1 | 8 | 2 |
3 | 9 | 7 | 1 | 8 | 2 | 5 | 6 | 4 |
6 | 4 | 2 | 9 | 5 | 3 | 7 | 1 | 8 |
5 | 8 | 1 | 4 | 7 | 6 | 2 | 3 | 9 |
8 | 2 | 5 | 6 | 4 | 1 | 9 | 7 | 3 |
9 | 7 | 6 | 5 | 3 | 8 | 4 | 2 | 1 |
4 | 1 | 3 | 2 | 9 | 7 | 8 | 5 | 6 |
For this example that is as far as we need to go, which is why I've described it as "Easy". I don't have any examples, but any patterns that can be solved only by using the first phase could be described as "Trivial". The next sections discuss some more difficult examples of the problem.
The first two phases are sufficient to solve most examples published in popular newspapers, but here is a more difficult example someone sent me. It only has 23 given numbers, 4 less than the previous example.
2 | ||||||||
5 | 8 | 6 | ||||||
3 | 8 | 5 | ||||||
1 | 4 | 7 | 6 | |||||
9 | 6 | 5 | 7 | |||||
7 | 3 | 9 | 4 | |||||
7 | 6 | 8 | ||||||
9 | 8 | 1 | ||||||
9 |
After applying phases 1 and 2 we are still left with.
134 | 379 | 134 | 1578 | 14589 | 14 | 2 | 679 | 13469 |
1234 | 5 | 8 | 127 | 1249 | 6 | 3479 | 79 | 1349 |
6 | 279 | 124 | 3 | 1249 | 124 | 479 | 8 | 5 |
238 | 1 | 23 | 4 | 7 | 5 | 6 | 29 | 289 |
9 | 4 | 6 | 128 | 128 | 12 | 5 | 3 | 7 |
5 | 28 | 7 | 6 | 3 | 9 | 1 | 4 | 28 |
7 | 6 | 12345 | 125 | 1245 | 8 | 349 | 259 | 2349 |
234 | 23 | 2345 | 9 | 2456 | 7 | 8 | 1 | 2346 |
1248 | 28 | 9 | 125 | 12456 | 3 | 47 | 2567 | 246 |
Which appears to have still a long way to go. If we examine the second column we can see things of interest.
The first of these rules will allow us to simplify cells (1,2) and (3,2) which currently show the options 3,7,9 and 2,7,9. Because 7 and 9 only appear in these two cells, one must contain 7 and the other 9, though we don't know which is which. This does though allow us to remove the possibilities that these cells contain a 3 or a 2.
The second of these rules applies to cells (6,2) and (9,2), both of which can only contain a 2 or an 8. Again this means that one of the cells must contain 2 and the other 8. Although we don't know which is which we do no that none of the other cells in the column can be a 2 or an 8. This allows us to remove the twos from (3,2) and (8,2), though in this case (3,2) has already had its two removed using the previous rule.
The table now looks like
134 | 79 | 134 | 1578 | 14589 | 14 | 2 | 679 | 13469 |
1234 | 5 | 8 | 127 | 1249 | 6 | 3479 | 79 | 1349 |
6 | 79 | 124 | 3 | 1249 | 124 | 479 | 8 | 5 |
238 | 1 | 23 | 4 | 7 | 5 | 6 | 29 | 289 |
9 | 4 | 6 | 128 | 128 | 12 | 5 | 3 | 7 |
5 | 28 | 7 | 6 | 3 | 9 | 1 | 4 | 28 |
7 | 6 | 12345 | 125 | 1245 | 8 | 349 | 259 | 2349 |
234 | 3 | 2345 | 9 | 2456 | 7 | 8 | 1 | 2346 |
1248 | 28 | 9 | 125 | 12456 | 3 | 47 | 2567 | 246 |
With the changed cells in green. The fact that 3 is now in a cell on it's own means that we can eliminate all the other 3's in the same block or row.
134 | 79 | 134 | 1578 | 14589 | 14 | 2 | 679 | 13469 |
1234 | 5 | 8 | 127 | 1249 | 6 | 3479 | 79 | 1349 |
6 | 79 | 124 | 3 | 1249 | 124 | 479 | 8 | 5 |
238 | 1 | 23 | 4 | 7 | 5 | 6 | 29 | 289 |
9 | 4 | 6 | 128 | 128 | 12 | 5 | 3 | 7 |
5 | 28 | 7 | 6 | 3 | 9 | 1 | 4 | 28 |
7 | 6 | 1245 | 125 | 1245 | 8 | 349 | 259 | 2349 |
24 | 3 | 245 | 9 | 2456 | 7 | 8 | 1 | 246 |
1248 | 28 | 9 | 125 | 12456 | 3 | 47 | 2567 | 246 |
In this case there are no more pairs that can be used to simplify the grid, so we move onto phase 4, Eliminating Bad Guesses.
This phase is simple to described but harder to implement. In the previous example, after exhausting phase 3 we are still left with
134 | 79 | 134 | 1578 | 14589 | 14 | 2 | 679 | 13469 |
1234 | 5 | 8 | 127 | 1249 | 6 | 3479 | 79 | 1349 |
6 | 79 | 124 | 3 | 1249 | 124 | 479 | 8 | 5 |
238 | 1 | 23 | 4 | 7 | 5 | 6 | 29 | 289 |
9 | 4 | 6 | 128 | 128 | 12 | 5 | 3 | 7 |
5 | 28 | 7 | 6 | 3 | 9 | 1 | 4 | 28 |
7 | 6 | 1245 | 125 | 1245 | 8 | 349 | 259 | 2349 |
24 | 3 | 245 | 9 | 2456 | 7 | 8 | 1 | 246 |
1248 | 28 | 9 | 125 | 12456 | 3 | 47 | 2567 | 246 |
The next phase involves making guesses and seeing the consequences. There are three possible outcomes,
This first outcome at first glance appears to be ideal, but at this moment we don't know there is only one solution so we ignore it. Instead what we look for is outcome 2, the contradiction becase we now know that that guess must be wrong. Looking at the table above there are 141 possible guesses, starting with the top left cell being 1.
This part of the algorithm involves making that guess and then applying the three previous phases to see where it leads us. In this case setting cell (1,1) to 1 leads to a contradiction so we can eliminate that guess leaving only 3 or 4 in the top left cell. Trying 3 causes no problems but 4 leads to a contradiction, so we can eliminate that too.
Now we only have 3 left so that must be the value in the top left cell. Fixing this value and applying the previous phases leads to the solution
3 | 9 | 1 | 8 | 5 | 4 | 2 | 7 | 6 |
4 | 5 | 8 | 7 | 2 | 6 | 3 | 9 | 1 |
6 | 7 | 2 | 3 | 9 | 1 | 4 | 8 | 5 |
8 | 1 | 3 | 4 | 7 | 5 | 6 | 2 | 9 |
9 | 4 | 6 | 1 | 8 | 2 | 5 | 3 | 7 |
5 | 2 | 7 | 6 | 3 | 9 | 1 | 4 | 8 |
7 | 6 | 4 | 2 | 1 | 8 | 9 | 5 | 3 |
2 | 3 | 5 | 9 | 6 | 7 | 8 | 1 | 4 |
1 | 8 | 9 | 5 | 4 | 3 | 7 | 6 | 2 |
If we take the example from the start of phase 3 (Pairs) and remove the '6' on the fifth row we are left with this puzzle.
2 | ||||||||
5 | 8 | 6 | ||||||
3 | 8 | 5 | ||||||
1 | 4 | 7 | 6 | |||||
9 | 5 | 7 | ||||||
7 | 3 | 9 | 4 | |||||
7 | 6 | 8 | ||||||
9 | 8 | 1 | ||||||
9 |
After applying the first 4 phases we are left with
134 | 79 | 134 | 78 | 1589 | 45 | 2 | 67 | 1346 |
1234 | 5 | 8 | 127 | 129 | 6 | 3479 | 379 | 134 |
126 | 79 | 26 | 3 | 129 | 14 | 479 | 8 | 5 |
8 | 1 | 35 | 4 | 7 | 25 | 6 | 239 | 39 |
9 | 34 | 346 | 168 | 68 | 12 | 5 | 23 | 7 |
56 | 2 | 7 | 56 | 3 | 9 | 1 | 4 | 8 |
7 | 6 | 1234 | 12 | 1245 | 8 | 349 | 359 | 2349 |
2345 | 34 | 2345 | 9 | 2456 | 7 | 8 | 1 | 2346 |
145 | 8 | 9 | 1256 | 12456 | 3 | 47 | 567 | 246 |
Phase 5 works by working through the possible values recursively until it finds a solution or a contradiction. It steps through the possible guesses sequentially. The first cell (1,1) can be 1, 3 or 4. Choosing 1 gives
1 | 79 | 34 | 8 | 59 | 45 | 2 | 67 | 346 |
34 | 5 | 8 | 7 | 2 | 6 | 349 | 39 | 1 |
26 | 79 | 26 | 3 | 19 | 14 | 479 | 8 | 5 |
8 | 1 | 35 | 4 | 7 | 25 | 6 | 239 | 39 |
9 | 34 | 346 | 16 | 8 | 12 | 5 | 23 | 7 |
56 | 2 | 7 | 56 | 3 | 9 | 1 | 4 | 8 |
7 | 6 | 1 | 2 | 45 | 8 | 349 | 359 | 349 |
2345 | 34 | 2345 | 9 | 456 | 7 | 8 | 1 | 346 |
45 | 8 | 9 | 156 | 1456 | 3 | 47 | 567 | 2 |
The next guess is to try 7 in cell (1,2). This leads to
1 | 7 | 3 | 8 | 9 | 5 | 2 | 6 | 4 |
4 | 5 | 8 | 7 | 2 | 6 | 39 | 39 | 1 |
2 | 9 | 6 | 3 | 1 | 4 | 7 | 8 | 5 |
8 | 1 | 5 | 4 | 7 | 2 | 6 | 39 | 39 |
9 | 3 | 4 | 6 | 8 | 1 | 5 | 2 | 7 |
6 | 2 | 7 | 5 | 3 | 9 | 1 | 4 | 8 |
7 | 6 | 1 | 2 | 4 | 8 | 39 | 5 | 39 |
3 | 4 | 2 | 9 | 5 | 7 | 8 | 1 | 6 |
5 | 8 | 9 | 1 | 6 | 3 | 4 | 7 | 2 |
The third guess, setting (2,7) to 3 leads to a solution
1 | 7 | 3 | 8 | 9 | 5 | 2 | 6 | 4 |
4 | 5 | 8 | 7 | 2 | 6 | 3 | 9 | 1 |
2 | 9 | 6 | 3 | 1 | 4 | 7 | 8 | 5 |
8 | 1 | 5 | 4 | 7 | 2 | 6 | 3 | 9 |
9 | 3 | 4 | 6 | 8 | 1 | 5 | 2 | 7 |
6 | 2 | 7 | 5 | 3 | 9 | 1 | 4 | 8 |
7 | 6 | 1 | 2 | 4 | 8 | 9 | 5 | 3 |
3 | 4 | 2 | 9 | 5 | 7 | 8 | 1 | 6 |
5 | 8 | 9 | 1 | 6 | 3 | 4 | 7 | 2 |
Stepping back one guess and trying (2,7) = 9 also leads to a solution. All in all there are 32 possibilities, see here.
There are two scripts to choose from. Click on the link to download the corresponding one.
SuSolver.py | The new script. This implements the 5 phases described above except the first part of phase 3 (looking for numbers that only appear twice and in pairs). This solves all the examples I have tried. When it can't solve a puzzle using the first 4 phases it enters phase 5 which is intended to enumerate the possible solutions (on the assumption that there are more than one). If there is only 1 solution which isn't found after the first 4 phases then phase 5 will find it. |
OldSuSolver.py | This is the previous version I published. It is very similar but in place of the current phase 4 it went straight into a recursive algorithm. The problem with this version is that it will always find a solution even when there aren't any, picking the first one it finds from the set of possibilities. I'm pretty confident though, that if there is a unique solution it will find it. |
More information about running Python scripts can be found on the Python Patterns page.
Visits since June 2005: | ![]() |