diff --git a/source/_posts/2013-03-11-sudoku-and-haskell-a-sudoku-solver.markdown b/source/_posts/2013-03-11-sudoku-and-haskell-a-sudoku-solver.markdown index 562a659..bc71c68 100644 --- a/source/_posts/2013-03-11-sudoku-and-haskell-a-sudoku-solver.markdown +++ b/source/_posts/2013-03-11-sudoku-and-haskell-a-sudoku-solver.markdown @@ -38,8 +38,8 @@ import Data.List.Split (chunksOf) import Data.Maybe (fromJust) import Data.Ord (comparing) import Control.Monad (foldM, guard) + ``` -> Now that the imports are out of the way let's setup the basic functionalities. @@ -73,8 +73,8 @@ updateBoard (Board ixMap) cell@Cell{..} = Board (M.insert cellIdx cell ixMap) emptyBoard :: Board emptyBoard = Board $ foldl' (\m i -> M.insert i (Cell i allDigits) m) M.empty [0 .. 80] + ``` -> A `Digit` is just one of the nine possible values. It derives `Eq`, `Ord` and `Enum`. We use the fact the `Digit` is enumerable to create a custom `Show` instance which is just the `Digit`'s ordinal @@ -119,8 +119,8 @@ readBoard str = do return $ updateBoard board (Cell i cellVals)) emptyBoard $ zip [0 .. 80 ] str + ``` -> `readBoard` converts a string to a `Board`. It returns `Just Board` if the string represents a valid Sudoku board, otherwise it returns `Nothing`. We use the `Monad` nature of `Maybe` to guard against the @@ -149,8 +149,8 @@ asciiShowBoard = instance Show Board where show = showBoard + ``` -> `showBoard` does the reverse of `readBoard`. It takes a board and creates a valid string representation of it. It does so by mapping over each cell of the board in the order of their index @@ -188,7 +188,7 @@ Here is an example run in _ghci_: Is it solved yet? -------------------- -Before we proceed to write a full-fledged Sudoku solver, it is good to have a function which tells us +Before we proceed to write a full-fledged Sudoku solver, we must have a function which tells us whether a board is filled completely and whether that solution is a valid one. ```haskell @@ -215,8 +215,8 @@ blockIxs = blockRow1 ++ blockRow2 ++ blockRow3) row1 row2 row3) . chunksOf 3 . map (chunksOf 3) $ rowIxs + ``` -> We start by defining the board state as an enumeration of three value corresponding to the solved, incomplete and invalid states. The `boardState` function takes a board and gives its current state. @@ -314,7 +314,7 @@ a node in the search graph with the moves linking them as edges. A move is filli with a digit. So now we can solve the board just by finding a path from the given board configuration to a configuration where all cells are filled. -We can use [Depth first Search][5] (DSF) algorithm to accomplish this. DFS is a brute force +We can use [Depth first Search][5] (DFS) algorithm to accomplish this. DFS is a brute force technique and in worst case it may visit all the nodes in the search graph. In case of Sudoku, this search graph is very large (approximately 6.67×1021) so this is not a very efficient way of solving Sudoku. For now, we'll add one optimization in DFS: while listing the next possible @@ -346,7 +346,38 @@ dfsSolver board = dfs board nextBoards ((== SOLVED) . boardState) ``` -Let's try this out in _ghci_ now: +The `dfs` functions is a literal translation of the DFS algorithm. It visits the graph node by node, +keeping a track of the nodes visited and the goals seen till now. It first checks if the current +node is a goal and if so it just returns it wrapped in a list. If the current node is not a goal, it +checks if has already been visited and if so just returns an empty list. This is done to avoid getting +stuck into infinite loops, going down the same path again and again. If both of these checks fail then +it gets to the general case in which it just finds all the next nodes for the current node, +recursively maps itself over them accumulating all the goals found (by flattening the lists) and +returns them. + +`dfsSolver` uses `dfs` to solve Sudoku as a DFS by supplying the `getNext` and `isGoal` functions. +the `isGoal` function is simple, it just checks if the current board is solved by calling the +`boardState` function. `nextBoards` is a little complicated so let's go over it step by step, +reading from bottom to up: + +1. it gets all the cells in the board +2. filters in only the empty cells, the cells with more than one cell values +3. sorts the empty cells comparing the count of their cell values +4. for each cell, it takes each cell value and create a cell containing only that cell value and +flattens this list of list of cells into a list of cells +5. for each cell so created, it creates a board by updating the current board with that cell + +So in effect, it goes over all the empty cells in the board in ascending order of their cell value +count, picking each cell value in turn and creating a board where that cell is filled with that cell +value. Hence it outputs all the next board configurations for the current board. + +And then `dfs` goes to work; it goes over all the whole graph and finds all the solutions. Note that +since a solution is reachable from more than one path (fill cell 1 first and then cell 2 or do it in +reverse order), the solutions returned are in general not unique. Also, because of the `concatMap` in +`dfs`, it finds the solutions one by one in a lazy fashion. So it is possible to stop the search early +and just get the first solution found. + +Let's try this out now in _ghci_: ```haskell *Sudoku> let boardStr = ".839216579.734582125187649354813297672956413813679824537268951481425376969541738." @@ -369,6 +400,15 @@ Let's try this out in _ghci_ now: (7.32 secs, 1322834576 bytes) ``` +And it solves the Sudoku as expected! Note how we have to use `nub` to find the unique solutions. Also +note how the time take increases from 1.75 secs for 3 empty cells to 66.64 secs for 4 empty cells, +indicating the exponential nature of the problem graph and the brute force nature of the solver. The +first solution in case of 4 empty cells is however found in just 7.32 secs using `head` to stop the +search early. + +So we have now successfully written our first Sudoku solver. Too bad it can't be used for solving boards +with more than few empty cells. Let's see a way to improve the solver drastically, in the next section. + Constraint propagation ---------------------- diff --git a/source/downloads/code/sudoku1.hs b/source/downloads/code/sudoku1.hs index 3d3fe89..9ad21ec 100644 --- a/source/downloads/code/sudoku1.hs +++ b/source/downloads/code/sudoku1.hs @@ -10,6 +10,7 @@ import Data.List.Split (chunksOf) import Data.Maybe (fromJust) import Data.Ord (comparing) import Control.Monad (foldM, guard) + data Digit = ONE | TWO | TRE | FOR | FIV | SIX | SVN | EGT | NIN deriving (Eq, Ord, Enum) @@ -39,6 +40,7 @@ updateBoard (Board ixMap) cell@Cell{..} = Board (M.insert cellIdx cell ixMap) emptyBoard :: Board emptyBoard = Board $ foldl' (\m i -> M.insert i (Cell i allDigits) m) M.empty [0 .. 80] + readBoard :: String -> Maybe Board readBoard str = do guard $ length str == 81 @@ -50,6 +52,7 @@ readBoard str = do return $ updateBoard board (Cell i cellVals)) emptyBoard $ zip [0 .. 80 ] str + showBoard :: Board -> String showBoard = map (\Cell{..} -> @@ -70,6 +73,7 @@ asciiShowBoard = instance Show Board where show = showBoard + data BoardState = SOLVED | INCOMPLETE | INVALID deriving (Eq, Show) @@ -93,6 +97,7 @@ blockIxs = blockRow1 ++ blockRow2 ++ blockRow3) row1 row2 row3) . chunksOf 3 . map (chunksOf 3) $ rowIxs + dfs :: Ord a => a -> (a -> [a]) -> (a -> Bool) -> [a] dfs start getNext isGoal = go start S.empty where diff --git a/source/downloads/code/sudoku1.lhs b/source/downloads/code/sudoku1.lhs index 1c1088c..e464bb6 100644 --- a/source/downloads/code/sudoku1.lhs +++ b/source/downloads/code/sudoku1.lhs @@ -38,7 +38,7 @@ Basic setup > import Data.Maybe (fromJust) > import Data.Ord (comparing) > import Control.Monad (foldM, guard) -> +> Now that the imports are out of the way let's setup the basic functionalities. @@ -71,7 +71,7 @@ Now that the imports are out of the way let's setup the basic functionalities. > emptyBoard :: Board > emptyBoard = > Board $ foldl' (\m i -> M.insert i (Cell i allDigits) m) M.empty [0 .. 80] -> +> A `Digit` is just one of the nine possible values. It derives `Eq`, `Ord` and `Enum`. We use the fact the `Digit` is enumerable to create a custom `Show` instance which is just the `Digit`'s ordinal @@ -115,7 +115,7 @@ right column. An example: > return $ updateBoard board (Cell i cellVals)) > emptyBoard > $ zip [0 .. 80 ] str -> +> `readBoard` converts a string to a `Board`. It returns `Just Board` if the string represents a valid Sudoku board, otherwise it returns `Nothing`. We use the `Monad` nature of `Maybe` to guard against the @@ -143,7 +143,7 @@ string contained a digit at the cell index else they have all the digits as cell > > instance Show Board where > show = showBoard -> +> `showBoard` does the reverse of `readBoard`. It takes a board and creates a valid string representation of it. It does so by mapping over each cell of the board in the order of their index @@ -181,7 +181,7 @@ Here is an example run in _ghci_: Is it solved yet? -------------------- -Before we proceed to write a full-fledged Sudoku solver, it is good to have a function which tells us +Before we proceed to write a full-fledged Sudoku solver, we must have a function which tells us whether a board is filled completely and whether that solution is a valid one. > data BoardState = SOLVED | INCOMPLETE | INVALID @@ -207,7 +207,7 @@ whether a board is filled completely and whether that solution is a valid one. > blockRow1 ++ blockRow2 ++ blockRow3) > row1 row2 row3) > . chunksOf 3 . map (chunksOf 3) $ rowIxs -> +> We start by defining the board state as an enumeration of three value corresponding to the solved, incomplete and invalid states. The `boardState` function takes a board and gives its current state. @@ -305,7 +305,7 @@ a node in the search graph with the moves linking them as edges. A move is filli with a digit. So now we can solve the board just by finding a path from the given board configuration to a configuration where all cells are filled. -We can use [Depth first Search][5] (DSF) algorithm to accomplish this. DFS is a brute force +We can use [Depth first Search][5] (DFS) algorithm to accomplish this. DFS is a brute force technique and in worst case it may visit all the nodes in the search graph. In case of Sudoku, this search graph is very large (approximately 6.67×1021) so this is not a very efficient way of solving Sudoku. For now, we'll add one optimization in DFS: while listing the next possible @@ -335,7 +335,38 @@ We can write this in two parts: a general DFS function and a solver which uses i > $ board > -Let's try this out in _ghci_ now: +The `dfs` functions is a literal translation of the DFS algorithm. It visits the graph node by node, +keeping a track of the nodes visited and the goals seen till now. It first checks if the current +node is a goal and if so it just returns it wrapped in a list. If the current node is not a goal, it +checks if has already been visited and if so just returns an empty list. This is done to avoid getting +stuck into infinite loops, going down the same path again and again. If both of these checks fail then +it gets to the general case in which it just finds all the next nodes for the current node, +recursively maps itself over them accumulating all the goals found (by flattening the lists) and +returns them. + +`dfsSolver` uses `dfs` to solve Sudoku as a DFS by supplying the `getNext` and `isGoal` functions. +the `isGoal` function is simple, it just checks if the current board is solved by calling the +`boardState` function. `nextBoards` is a little complicated so let's go over it step by step, +reading from bottom to up: + +1. it gets all the cells in the board +2. filters in only the empty cells, the cells with more than one cell values +3. sorts the empty cells comparing the count of their cell values +4. for each cell, it takes each cell value and create a cell containing only that cell value and +flattens this list of list of cells into a list of cells +5. for each cell so created, it creates a board by updating the current board with that cell + +So in effect, it goes over all the empty cells in the board in ascending order of their cell value +count, picking each cell value in turn and creating a board where that cell is filled with that cell +value. Hence it outputs all the next board configurations for the current board. + +And then `dfs` goes to work; it goes over all the whole graph and finds all the solutions. Note that +since a solution is reachable from more than one path (fill cell 1 first and then cell 2 or do it in +reverse order), the solutions returned are in general not unique. Also, because of the `concatMap` in +`dfs`, it finds the solutions one by one in a lazy fashion. So it is possible to stop the search early +and just get the first solution found. + +Let's try this out now in _ghci_:
haskell
 *Sudoku> let boardStr = ".839216579.734582125187649354813297672956413813679824537268951481425376969541738."
@@ -358,6 +389,15 @@ Let's try this out in _ghci_ now:
 (7.32 secs, 1322834576 bytes)
 
+And it solves the Sudoku as expected! Note how we have to use `nub` to find the unique solutions. Also +note how the time take increases from 1.75 secs for 3 empty cells to 66.64 secs for 4 empty cells, +indicating the exponential nature of the problem graph and the brute force nature of the solver. The +first solution in case of 4 empty cells is however found in just 7.32 secs using `head` to stop the +search early. + +So we have now successfully written our first Sudoku solver. Too bad it can't be used for solving boards +with more than few empty cells. Let's see a way to improve the solver drastically, in the next section. + Constraint propagation ----------------------