# Build a Speech-Controlled, Sudoku-Solving Robot

## Using a Raspberry Pi Microprocessor and Camera

### Solving Sudoku puzzles is difficult and time-consuming for most people. In this article, Arijit explains how he and his team members built a speaking, voice-controlled robot, using a Raspberry Pi 4 Model B, that can quickly solve any sudoku puzzle. Then, he walks us through the details of the building and solution processes.


Sudoku (shortened from a Japanese phrase that means “the numerals must remain single”) is a puzzle in which missing numbers are filled into a grid of 9×9 squares. The squares are subdivided into 3×3 boxes. To solve the puzzle, each box, row, and column must contain the numbers 1-9 without duplication. Figure 1 shows a sample sudoku puzzle.
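These three constraints are easy to express in code. The sketch below is illustrative only (the function name `can_place` is ours, not the robot's; the robot's own version appears later as valid() in Listing 12). It checks whether a digit may legally be placed in a given cell:

```python
# Minimal sketch: check whether placing `val` at (row, col) keeps a
# 9x9 sudoku grid (0 = empty cell) consistent with all three rules.
def can_place(grid, row, col, val):
    if any(grid[row][c] == val for c in range(9)):   # row rule
        return False
    if any(grid[r][col] == val for r in range(9)):   # column rule
        return False
    br, bc = 3 * (row // 3), 3 * (col // 3)          # 3x3 box rule
    return all(grid[r][c] != val
               for r in range(br, br + 3)
               for c in range(bc, bc + 3))
```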

Most individuals find it difficult to solve sudoku puzzles; a beginner could take many hours. My team members and I wondered whether it might be possible to develop a robot that could quickly and easily solve sudoku puzzles just by looking at them with its onboard camera. Here, we describe how we built a simple, voice-controlled robot that can solve any sudoku puzzle in seconds, regardless of the puzzle’s difficulty. Additionally, we discuss the steps of the entire sudoku-solving process in detail. A short demonstration of the robot in action is available on YouTube.

Figure 1: A sample 9×9 sudoku puzzle. Note that some boxes are prefilled with numbers at the start. The objective is to fill in numbers such that each row, column, and 3×3 box contains the non-repeating numbers 1-9.
##### HARDWARE

The robot was constructed using the following components:

• Raspberry Pi 4 Model B microprocessor
• Raspberry Pi Camera Module V2
• Raspberry Pi 3.5” Touchscreen Module
• Speaker (small)
• LED (small)
• Raspberry Pi power supply
• USB microphone
• Jumper wires
• 3D-printed body parts
##### CHARACTERISTICS OF THE ROBOT

Before discussing how the robot was built and programmed, it’s first important to understand its characteristics and operations. How the robot performs each operation is detailed later in this article, in sections on programming and important functions in the code.

The robot was designed to respond to voice instructions after it is powered on. It features an LED indicator on the top of its head (Figure 2). When the LED is on, the robot is listening to the user, and when the LED is off, it is processing the prior command. The robot is capable of speech recognition, and the user can ask the robot to introduce itself (Figure 3).

The user instructs the robot to begin solving a sudoku puzzle, and the robot begins capturing video. Also, it streams the video on its touchscreen, as shown in Figure 4. A sudoku puzzle, either on paper or on a digital device’s screen, must be held in front of the robot, and the user instructs the robot to capture the sudoku image (Figure 5).

After capturing the puzzle, the robot detects the sudoku in the image, extracts the digits, and forms its own version of the unsolved puzzle (Figure 6). Then it demonstrates the procedure used to solve the puzzle, employing a backtracking approach (Figure 7). Using its artificial voice, the robot announces every step as it solves the sudoku. After solving the puzzle, the robot displays it on the touchscreen (Figure 8), and waits for the user to tell it to solve other puzzles.

Figure 3: The robot introducing itself. Note that the LED is on, and its face on the screen is illuminated.
##### ASSEMBLING THE ROBOT

We carried out several steps to get the robot ready for programming. First, we needed an appropriate body for our robot; we used one of my old toys for this purpose. Next, we attached the Raspberry Pi touchscreen to the robot’s head (Figure 9), and linked the Raspberry Pi camera to the robot’s “belly” (Figure 10). The jumper wires were then connected to the screen (Figure 11), and the speaker and LED were fastened to the robot’s back (Figure 12). We then connected the speaker, microphone, camera, and jumper wires to the Raspberry Pi (Figure 13). Finally, several 3D-printed elements were added to improve the robot’s appearance. The finished robot is shown in Figure 14.

Figure 9: The touchscreen is attached to the robot’s head. The hole in the center of the body is where the camera will be placed.
##### PREPARING THE RASPBERRY PI FOR PROGRAMMING

The instructions below were used to prepare the Raspberry Pi for programming.

• Using Raspberry Pi Imager, install Raspberry Pi OS (Legacy, Buster) on a memory card.
• Insert the memory card into the Raspberry Pi.
• Power the Pi using the official Raspberry Pi power supply.
• Before setting up the Raspberry Pi, connect it to a large display through HDMI, or connect to it over SSH.
• Install the required drivers for the Raspberry Pi touchscreen. (Details are available in the touchscreen’s manual.)
• Enable the camera using “raspi-config”.
• Set “Audio Output” to “3.5mm jack” using “raspi-config,” or from settings.
• Connect the Raspberry Pi to Wi-Fi, to control it without a large display.
• Set up the microphone and test it. In this step, you will need to edit the “/home/pi/.asoundrc” file.
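As an example, a minimal “/home/pi/.asoundrc” that routes capture to a USB microphone and playback to the Pi’s built-in audio might look like the following. The card and device numbers here are assumptions, not taken from the article; check the output of `arecord -l` and `aplay -l` on your own Pi and adjust accordingly:

```
# Hypothetical /home/pi/.asoundrc -- card/device numbers vary by hardware
pcm.!default {
    type asym
    capture.pcm "mic"
    playback.pcm "speaker"
}
pcm.mic {
    type plug
    slave {
        pcm "hw:1,0"    # USB microphone (from `arecord -l`)
    }
}
pcm.speaker {
    type plug
    slave {
        pcm "hw:0,0"    # on-board audio / 3.5mm jack (from `aplay -l`)
    }
}
```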
##### PROGRAMMING THE ROBOT

Python 3 was used to program the robot. To simplify and streamline the development process, all the robot’s functions, including voice recognition, facial animation, and sudoku solving, are carried out by a single Python program. Note that the code (discussed later) makes extensive use of threads, which are managed by global flags. Additionally, those global flags’ values are adjusted in response to the user’s voice instructions.
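A minimal sketch of this flag-driven pattern is shown below. The variable and function names are illustrative, not taken from the robot’s code: a worker thread polls a global flag on each iteration, and the main thread (which in the robot reacts to voice commands) sets or clears that flag.

```python
import threading
import time

face = True   # global flag: worker runs while this is True
done = []     # records work performed, for demonstration

def face_worker():
    # Stand-in for the face-animation worker: poll the global flag
    # each iteration and do one unit of work (e.g., draw one frame).
    while face:
        done.append("frame")
        time.sleep(0.01)

t = threading.Thread(target=face_worker)
t.start()
time.sleep(0.05)
face = False   # in the robot, a recognized voice command clears the flag
t.join()
```

In the robot, the main thread plays the role of the command loop: recognized phrases set or clear flags such as `face` and `solve_sudoku`, and the worker threads poll them.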

We used several Python libraries for programming the robot’s various operations and for working with arrays. Some of the specific functions used from each library are mentioned later in this article. The libraries we used are:

• OpenCV
• Imutils
• Pytesseract
• PyGame
• SpeechRecognition
• NumPy

An algorithm was created to help our robot solve sudoku puzzles. A simple backtracking method, which works well in similar applications, was employed:

1. Find the row and column of an unassigned cell. If there is none, return true.
2. For digits from 1 to 9:
   a) If there is no conflict for the digit at (row, col), assign the digit and recursively try to fill in the rest of the grid.
   b) If the recursion is successful, return true.
   c) Else, remove the digit and try another.
3. If all digits have been tried and nothing worked, return false.
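The steps above can be run as plain Python. The sketch below is a standalone illustration of the same backtracking method, independent of the robot’s PyGame display code (Listing 13 shows the robot’s actual, display-integrated version):

```python
# Standalone backtracking sudoku solver; 0 marks an empty cell.
def find_empty(g):
    # Step 1: find the row, col of an unassigned cell
    for r in range(9):
        for c in range(9):
            if g[r][c] == 0:
                return r, c
    return None

def no_conflict(g, r, c, d):
    # Digit d must not repeat in row r, column c, or the 3x3 box
    if d in g[r] or any(g[i][c] == d for i in range(9)):
        return False
    br, bc = 3 * (r // 3), 3 * (c // 3)
    return all(g[i][j] != d
               for i in range(br, br + 3)
               for j in range(bc, bc + 3))

def solve(g):
    cell = find_empty(g)
    if cell is None:
        return True                  # no unassigned cell: solved
    r, c = cell
    for d in range(1, 10):           # try digits 1..9
        if no_conflict(g, r, c, d):
            g[r][c] = d
            if solve(g):             # recursively fill the rest
                return True
            g[r][c] = 0              # backtrack: remove digit, try another
    return False                     # nothing worked
```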

##### IMPORTANT FUNCTIONS IN THE CODE

The complete code, with all the required resources, is available on GitHub. Some important functions in the code are discussed below.

faceAnimation(): Using the PyGame package, this function generates facial animations and shows them on the screen (see Listing 1). Figure 15 and Figure 16 show the “face1.png” and “face2.png” files, which this function uses to produce the animations.

Figure 15: The file “face1.png” used by the code in Listing 1.
Figure 16: The file “face2.png” used by the code in Listing 1.

focusGrid(): This function finds the largest contour in the supplied picture, determines whether it is in proper shape, cleans it up with certain transformations, and then returns it (see Listing 2).

splitUp(): This function takes the largest, cleanest contour as input, splits it into cells, and returns the matrix of the cells (see Listing 3).

highlightDigit(): This function uses connected-component analysis to remove the noisy areas from an input cell, leaving only the cell’s digits (see Listing 4).

highlightCells(): This function applies connected-component analysis to all the cells (Listing 5).

getDigits(): Using the pytesseract library, this function extracts digits via optical character recognition from the highlighted cells (see Listing 6).

extractGrid(): This function uses the focusGrid(), splitUp(), highlightCells(), and getDigits() functions to get the grid of recognized digits from the provided sudoku image (see Listing 7).

draw(): This function uses PyGame to help draw the puzzle in the display (see Listing 8).

draw_box(): This function draws a red box around the cell the robot is working on, when the robot is solving a sudoku puzzle (see Listing 9).

draw_val(): This function draws a value (digit) in a cell, while the robot is solving the puzzle (see Listing 10).

show_puzzle(): This function displays the sudoku puzzle using PyGame. It uses the draw() function internally (see Listing 11).

valid(): This function checks whether entering a certain value into a specific cell is consistent with the current state of the sudoku grid, at each step of the solving process (see Listing 12).

solve(): This function solves the sudoku using recursion. The functions valid(), draw(), and draw_box() are used internally (see Listing 13).

sudoku_solve(): This function runs the video streaming and image capture, and passes them to other functions to solve the puzzle. It also provides the voice output at different steps in the solving process (see Listing 14).

main: Here, we employed the voice recognition system, set global variables, and built several threads. This section of the code alters the values of global variables according to user instructions, and different actions are carried out based on those values (see Listing 15).

``````LISTING 1
Using the PyGame package, faceAnimation() generates facial animations and shows them on the screen.

def faceAnimation(display_surface):
    # image and image2 ("face1.png"/"face2.png") are loaded globally
    global face, talking

    while face or talking:
        if face:
            display_surface.blit(image, (0, 0))
            pygame.display.update()
        elif talking:
            display_surface.blit(image, (0, 0))
            pygame.display.update()
            time.sleep(0.5)
            display_surface.blit(image2, (0, 0))
            pygame.display.update()
            time.sleep(0.5)
        for event in pygame.event.get():
            if event.type == pygame.QUIT:
                pygame.quit()
                quit()
``````
``````LISTING 2
The focusGrid() function finds the largest contour in the supplied picture, determines whether it is in proper shape, cleans it up with certain transformations, and then returns it.

def focusGrid(ogimg):
    # Scale the image so its smaller side becomes roughly 500 px
    rx = 500.0 / ogimg.shape[1]
    ry = 500.0 / ogimg.shape[0]
    r = max([rx, ry])
    ogimg = cv2.resize(ogimg, (0, 0), fx=r, fy=r)
    img = cv2.cvtColor(ogimg, cv2.COLOR_BGR2GRAY)
    # (an adaptiveThreshold(..., cv2.THRESH_BINARY, 25, 25) call was
    # truncated here in print)
    blur = cv2.GaussianBlur(img, (3, 3), 3)
    edged = cv2.Canny(blur, 100, 180)
    contours, hierarchy = cv2.findContours(
        edged, cv2.RETR_LIST, cv2.CHAIN_APPROX_SIMPLE)[-2:]

    if len(contours) == 0:
        print("No contours found")
        return None
    cnt = None
    maxArea = 0
    for c in contours:
        area = cv2.contourArea(c)
        if area > maxArea:
            maxArea = area
            cnt = c
    if cnt is None:
        print("No biggest contour")
        return None
    epsilon = 0.01 * cv2.arcLength(cnt, True)
    approx = cv2.approxPolyDP(cnt, epsilon, True)

    # A sudoku grid should reduce to a quadrilateral (4 points x 2 coords)
    if approx.size != 8:
        print("Wrong shape of grid")
        return None
    approx = approx.reshape(4, 2)
    # rearrangeCorners() is a helper defined elsewhere in the code that
    # orders the four corners consistently
    approx = rearrangeCorners(approx, ogimg.shape[1], ogimg.shape[0])
    approx = np.array(approx.tolist(), np.float32)

    gridSize = cellSize * 9
    final = np.array([
        [0, 0],
        [0, gridSize],
        [gridSize, gridSize],
        [gridSize, 0]], dtype="float32")

    M = cv2.getPerspectiveTransform(approx, final)
    fixed = cv2.warpPerspective(img, M, (gridSize, gridSize))

    return fixed
``````
``````LISTING 3
The splitUp() function takes the largest, cleanest contour as input, splits it into cells, and returns the matrix of the cells.

def splitUp(grid):
    cells = []
    for i in range(0, 9):
        row = []
        for j in range(0, 9):
            cropped = grid[
                cellSize * i + border:cellSize * (i + 1) - border,
                cellSize * j + border:cellSize * (j + 1) - border]
            row.append(cropped)
        cells.append(row)
    return cells
``````
``````LISTING 4
The highlightDigit() function uses connected-component analysis to remove the noisy areas from an input cell, leaving only the cell’s digit.

def highlightDigit(cell):
    if cell is None:
        return None
    img = cv2.cvtColor(cell, cv2.COLOR_GRAY2RGB)
    gray = cv2.bitwise_not(cell)
    # Returns (num_labels, labels, stats, centroids)
    num_labels, labels, stats, centroids = \
        cv2.connectedComponentsWithStats(gray, 8, cv2.CV_32S)
    if num_labels <= 1:
        return None
    # The last stats column is the component area; skip label 0 (background)
    largest_label = 1 + np.argmax(stats[1:, -1])
    height, width = gray.shape[:2]
    x, y, w, h, _ = stats[largest_label]
    bX = x + w / 2.0
    bY = y + h / 2.0

    cX = width / 2.0
    cY = height / 2.0

    tX = cX - bX
    tY = cY - bY

    # Reject components that are far off-center or too large to be a digit
    if (abs(tX) + abs(tY) > 10) or (w * h > 0.5 * width * height):
        return None
    img = img[y:y + h, x:x + w]
    return img
``````
``````LISTING 5
The highlightCells() function applies connected-component analysis to all the cells.

def highlightCells(cells):
    for i in range(len(cells)):
        for j in range(len(cells[i])):
            cells[i][j] = highlightDigit(cells[i][j])
    return cells
``````
``````LISTING 6
Using the pytesseract library, the getDigits() function extracts digits via optical character recognition from the highlighted cells.

def getDigits(cells):
    # flatten() and hconcat_resize_min() are helpers defined elsewhere in
    # the code; they line all digit cells up in one image for a single
    # OCR pass
    line = flatten(cells)
    cellsWithDigits = list(
        filter(lambda x: x is not None, line))
    line = hconcat_resize_min(cellsWithDigits)
    custom_config = r'--psm 6 outputbase digits'
    text = pytesseract.image_to_string(line, config=custom_config)
    if len(text) == 0:
        return None
    text = text.partition('\n')[0]
    if len(text) == 0:
        return None
    text = "".join(re.findall(r'\d+', text))
    if len(text) != len(cellsWithDigits) or not text.isdigit():
        return None
    print(text)
    grid = []
    c = 0
    for i in range(0, len(cells)):
        row = []
        for j in range(0, len(cells[i])):
            if cells[i][j] is not None:
                row.append(int(text[c]))
                c += 1
            else:
                row.append(0)
        grid.append(row)
    return grid
``````
``````LISTING 7
The extractGrid() function uses the focusGrid(), splitUp(), highlightCells(), and getDigits() functions to get the grid of recognized digits from the provided sudoku image.

def extractGrid(img):
    if img is None:
        print("No such image found")
        return None

    clean = focusGrid(img)
    if clean is None:
        print("Failed")
        return None

    cells = splitUp(clean)
    cells = highlightCells(cells)
    grid = getDigits(cells)
    if grid is None:
        return None
    return grid
``````
``````LISTING 8
The draw() function uses PyGame to help draw the puzzle in the display.

def draw(grid):
    for i in range(9):
        for j in range(9):
            if grid[i][j] != 0:
                pygame.draw.rect(screen, (101, 152, 224),
                                 (padding + i * dif, j * dif,
                                  dif + 1, dif + 1))
                text1 = font1.render(str(grid[i][j]), 1, (255, 255, 255))
                screen.blit(text1, (padding + i * dif + 15, j * dif + 10))
    for i in range(10):
        if i % 3 == 0:
            thick = 7
        else:
            thick = 1
        pygame.draw.line(screen, (255, 255, 255), (padding, i * dif),
                         (padding + 320, i * dif), thick)
        pygame.draw.line(screen, (255, 255, 255), (i * dif + padding, 0),
                         (i * dif + padding, 500), thick)
``````
``````LISTING 9
The draw_box() function draws a red box around the cell the robot is working on, when the robot is solving a sudoku puzzle.

def draw_box():
    # x and y are globals holding the cell the solver is currently on
    for i in range(2):
        pygame.draw.line(screen, (255, 0, 0),
                         (padding + x * dif - 3, (y + i) * dif),
                         (padding + x * dif + dif + 3, (y + i) * dif), 4)
        pygame.draw.line(screen, (255, 0, 0),
                         (padding + (x + i) * dif, y * dif),
                         (padding + (x + i) * dif, y * dif + dif), 4)
``````
``````LISTING 10
The draw_val() function draws a value (digit) in a cell, while the robot is solving the puzzle.

def draw_val(val):
    text1 = font1.render(str(val), 1, (255, 255, 255))
    screen.blit(text1, (x * dif + 15, y * dif + 15))
``````
``````LISTING 11
The show_puzzle() function displays the sudoku puzzle using PyGame. It uses the draw() function internally.

def show_puzzle(grid):
    screen.fill((75, 75, 75))
    for event in pygame.event.get():
        if event.type == pygame.QUIT:
            return
    draw(grid)
    pygame.display.update()
``````
``````LISTING 12
The valid() function checks whether entering a certain value into a specific cell is consistent with the current state of the grid, at each step of the solving process.

def valid(m, i, j, val):
    for it in range(9):
        if m[i][it] == val:
            return False
        if m[it][j] == val:
            return False
    it = i // 3
    jt = j // 3
    for i in range(it * 3, it * 3 + 3):
        for j in range(jt * 3, jt * 3 + 3):
            if m[i][j] == val:
                return False
    return True
``````
``````LISTING 13
The solve() function solves the sudoku using recursion. The functions valid(), draw(), and draw_box() are used internally.

def solve(grid, i, j):
    # Advance to the next unassigned cell
    while grid[i][j] != 0:
        if i < 8:
            i += 1
        elif i == 8 and j < 8:
            i = 0
            j += 1
        elif i == 8 and j == 8:
            return True
    pygame.event.pump()
    for it in range(1, 10):
        if valid(grid, i, j, it) == True:
            grid[i][j] = it
            global x, y
            x = i
            y = j
            screen.fill((75, 75, 75))
            draw(grid)
            draw_box()
            pygame.display.update()
            pygame.time.delay(20)
            if solve(grid, i, j) == 1:
                return True
            else:
                grid[i][j] = 0   # backtrack: remove the digit, try another
                screen.fill((75, 75, 75))
                draw(grid)
                draw_box()
                pygame.display.update()
                pygame.time.delay(50)
    return False
``````
``````LISTING 14
The sudoku_solve() function runs the video streaming and image capture, and passes them to other functions to solve the puzzle. It also provides the voice output at different steps in the solving process. (Several lines of this listing were truncated in print; reconstructed lines are marked in comments.)

def sudoku_solve():
    global solve_sudoku, show_solution, capture
    val = 0
    vs = VideoStream(usePiCamera=True,
                     resolution=(1280, 720)).start()
    time.sleep(1.0)
    img_name = "temp.png"
    while True:
        while solve_sudoku:
            initial_frame = vs.read()   # (reconstructed; line lost in print)
            show_solution = True
            up_points = (screen_size_x, 360)
            frame = cv2.resize(initial_frame, up_points,
                               interpolation=cv2.INTER_LINEAR)
            cv2.normalize(frame, frame, 0, 255, cv2.NORM_MINMAX)
            frame = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
            frame = numpy.rot90(frame, 3)
            frame = numpy.fliplr(frame)
            frame = pygame.surfarray.make_surface(frame)
            screen.blit(frame, (0, 0))
            pygame.display.update()
            if capture:
                capture = False
                cv2.imwrite(img_name, initial_frame)
                time.sleep(2)
                print("{} written!".format(img_name))
                try:
                    # (reconstructed; the extractGrid() call was lost in print)
                    grid = extractGrid(cv2.imread(img_name))
                    if grid == None:
                        print("No Sudoku found")
                        # Speech-output call truncated in print; the robot
                        # announces that it has not "found any sudoku in
                        # the image"
                        continue
                except:
                    print("Error")
                    continue

                grid = [list(i) for i in zip(*grid)]
                show_puzzle(grid)

                # Speech-output call truncated in print: "I have recognised
                # the sudoku, and now I am solving it."
                time.sleep(1)

                run = True
                show = True
                flag1 = 0

                while run:
                    for event in pygame.event.get():
                        if event.type == pygame.QUIT:
                            run = False

                    if solve(grid, 0, 0):
                        run = False
                    if val != 0:
                        draw_val(val)
                        if valid(grid, int(x), int(y), val) == True:
                            grid[int(x)][int(y)] = val
                            flag1 = 0
                        else:
                            grid[int(x)][int(y)] = 0
                        val = 0

                    draw(grid)
                    if flag1 == 1:
                        draw_box()
                    pygame.display.update()
                time.sleep(1)
                # Speech-output call truncated in print:
                # "I have solved the sudoku"

                while show_solution:
                    time.sleep(1)

        for event in pygame.event.get():
            if event.type == pygame.QUIT:
                exit()

    pygame.quit()
``````
``````LISTING 15
The main section of the code alters the values of global variables according to user instructions, and different actions are carried out based on those values. (A few lines of this listing were truncated in print; reconstructed lines are marked in comments.)

if __name__ == "__main__":
    pos_x = 0
    pos_y = -1
    os.environ['SDL_VIDEO_WINDOW_POS'] = '%i,%i' % (pos_x, pos_y)
    os.environ['SDL_VIDEO_CENTERED'] = '0'

    screen_size_x = 500
    screen_size_y = 320
    cellSize = 56
    border = 3
    padding = (screen_size_x - screen_size_y) / 2
    x = 0
    y = 0
    dif = screen_size_y / 9

    talking = False
    solve_sudoku = False
    capture = False
    face = True
    show_solution = False

    pygame.init()
    pygame.font.init()
    screen = pygame.display.set_mode(
        (screen_size_x, screen_size_y), pygame.NOFRAME)
    font1 = pygame.font.SysFont("comicsans", 25)

    GPIO.setwarnings(False)
    GPIO.setmode(GPIO.BOARD)
    GPIO.setup(40, GPIO.OUT, initial=GPIO.LOW)

    # Worker-thread creation truncated in print; the face animation runs
    # in its own thread (reconstructed from the surviving args=(screen,)):
    threading.Thread(target=faceAnimation, args=(screen,)).start()

    sample_rate = 48000
    chunk_size = 2048
    r = sr.Recognizer()

    with sr.Microphone(device_index=2,
                       sample_rate=sample_rate,
                       chunk_size=chunk_size) as source:
        while True:
            print("Say Something")
            GPIO.output(40, GPIO.HIGH)   # LED on: robot is listening
            audio = r.listen(source)

            try:
                GPIO.output(40, GPIO.LOW)   # LED off: processing command
                # (reconstructed; the recognition call was lost in print)
                text = r.recognize_google(audio)
                print("you said: " + text)
                if any(x in text for x in ["intro"]):
                    talking = True
                    face = False
                elif any(x in text for x in
                         ["start", "sudoku", "solving"]):
                    solve_sudoku = True
                    face = False
                elif any(x in text for x in ["capture"]):
                    capture = True
                elif any(x in text for x in ["stop"]):
                    solve_sudoku = False
                    face = True
                elif any(x in text for x in ["thank"]):
                    show_solution = False
                elif any(x in text for x in ["exit"]):
                    exit()
                elif any(x in text for x in ["sleep"]):
                    os.system("sudo poweroff")

            except sr.UnknownValueError:
                print("Google Speech Recognition could not understand audio")

            except sr.RequestError as e:
                print("error")
``````
##### CONCLUSION

We named our robot “SUDO,” and displayed an early version of it (without 3D-printed parts) at the Kolkata Mini Maker Faire. The audience was highly receptive (Figure 17, Figure 18, and Figure 19).

Figure 17: Displaying an early version of SUDO, the sudoku-solving robot, at the Kolkata Mini Maker Faire.

Although our robot currently can only solve sudoku puzzles, its capable processing unit allows it to perform a variety of other activities based on computer vision. More fascinating and exciting uses are therefore possible in the future. Additionally, we anticipate that Circuit Cellar’s readers will improve the robot over time by adding new functions to it.

RESOURCES
Raspberry Pi | www.raspberrypi.com
Sudoku splitter reference code: https://github.com/cunananm2000/Sudoku/blob/79984e4f35c6869aae81754a6334231c4515bf92/sudokuSplitter.py
Building and visualizing a sudoku game using PyGame: https://www.geeksforgeeks.org/building-and-visualizing-sudoku-game-using-pygame/

REFERENCES
Demonstration of the sudoku-solving robot in action: https://youtu.be/gCwES3D2PGY
Raspberry Pi OS images: https://www.raspberrypi.com/software/operating-systems/
GeeksforGeeks, the source of the sudoku backtracking algorithm: https://www.geeksforgeeks.org/
Complete code for our speech-controlled, sudoku-solving robot: https://github.com/Arijit1080/Speech-Controlled-Sudoku-Solving-Robot
YouTube channel for the project team
The project team’s Facebook page

PUBLISHED IN CIRCUIT CELLAR MAGAZINE • NOVEMBER 2023 #388
