
A Picture-Sound Association Game for Pre-Readers

Written by Jeff Bachiochi

STEM or Toy?

Educational toys using synthesized speech, such as TI’s Speak & Spell, have been around for nearly 50 years. For this month’s project, Jeff developed a portable association game for pre-readers, using an M5Stack (based on the ESP32 microcontroller). Players match pictures with sounds or verbal descriptions, or sounds with pictures, pertaining to “themes” chosen by the developer. Instead of text to speech, MP3 files of sounds and speech are used.


One of the first tech toys I bought for my kids was the Texas Instruments Speak & Spell (Figure 1). It was the first educational toy that used a speech synthesizer, and you can still purchase a second-generation knockoff with a membrane keyboard for less than $30. The synthesized voice sounded like the Cylons, a race of robots from the TV show Battlestar Galactica. Though crude, linear predictive coding (LPC) is actually a close approximation of the way our vocal tract (nasal cavity, oral cavity, pharynx, and laryngeal cavity) creates speech.

FIGURE 1
TI's Speak & Spell was the first use of artificial speech to create an interesting educational toy used to teach spelling.

For this month’s project, I wanted to create an educational game for children who cannot yet read, and I wanted to use some speech to make it more interesting. I mention the Speak & Spell because its characteristic speech quality is what you get when you do text to speech (TTS) on a microcontroller. For some, it’s difficult to understand. It was for me: during project development, I had to change the text I entered from actual spelling to a modified phonetic spelling to correct some of the pronunciation. I considered the result so crude that it took away from the fun of the project.

There used to be a free online method to handle the TTS and provide an audio clip of the translation. Apparently Google Translate (and others) decided that this could provide an income stream, so it is no longer free (except for some very limited use). These services are still available through manual text input and audio output on a webpage, but receiving an audio file through an HTTP request is no longer free.

Playing an MP3 audio file on a microcontroller is not a problem; however, you have to be connected to the Internet for an online TTS service to work. I thought this might be an issue when carrying the project around from place to place, so I decided on an alternative: provide MP3 files of everything that needs to be spoken. This is rather labor-intensive to put together, but it could make the project more personal if the recipient can recognize your voice.

EDUCATION VS. FUN

Most teachers would agree that if you can keep a student’s attention, you can actually teach something. Once their mind wanders, you’ve lost them and potentially created some competition for your attention. While the attention span of youngsters seems to be a matter of seconds, they do enjoy being engaged in something new. The purpose of TI’s Speak & Spell was to aid in learning how to associate a word with its correct spelling. The device was limited to a single line of text and a rather mechanical voice. A better display and more natural voice would be a must for any project that might compete in today’s higher tech world.

I first bought an M5Stack (Figure 2) from M5Stack [1] about 5 years ago. It is a modular, stackable development kit, based on the ESP32 microcontroller. I knew this would be a useful product for several reasons. I believe the ESP32 has changed the whole microcontroller industry. It is a powerful, yet inexpensive MCU that includes Wi-Fi and Bluetooth peripherals. Take this micro, add a few useful external peripherals like an SD card, user I/O, buttons, LCD screen, speaker, USB, and a lithium battery with charging circuitry, package it in a small enclosure, and you have an excellent and useful product.

FIGURE 2
M5Stack has taken my favorite micro, the ESP32, and added just the right peripherals to create a solid and useful solution to many applications.

While my desire here is to use this as is, the I/O is exposed and the package can be expanded, while still retaining its finished product appearance. Since my purchase, I see M5Stack has replaced the three push buttons with an LCD screen with touch input; but I will use my less expensive version for this project.

Here is the essence of my project. For those youngsters who, due to age or disability, are not able to read, this project will create an association game for learning the names of things. It will require the tot to match the picture of something and a sound associated with it. I’m grouping the things into themes such as Animals, Vehicles, and Jobs.

The idea will be to first choose a theme, contained in a particular file folder on an SD card. Each theme folder has two additional folders, Pictures and Sounds. The next choice determines the type of game—matching Pictures to Sounds or Sounds to Pictures. The first displays a random picture, and the child must determine if the random sound played is associated with the picture or not. In the second game, a random sound is played, and the child must determine if the random picture is associated with the sound or not.

PICTURES

By far, the most time-consuming part of this project is gathering pictures and sounds. Many can be found free if you attribute them with their origin [2]. I found the caricatures to be the easiest to find and most friendly, unless you wish to portray reality.

The M5Stack has a screen size of 320 pixels wide by 240 pixels high. All the picture files (JPG) must be sized to fit this display. As an example, let’s size a picture of a monkey to fit our screen. The two pictures I found to demo here are shown in Figure 3 (original and cropped photo) and Figure 4 (original options and resized cartoon circled in red). The first original was 364 × 600 pixels, the second was 5,970 × 4,124 pixels. I cropped the top of the first, and the second was a subset of the array of monkeys. I used Microsoft Paint on my PC to crop them to include the element I wanted at roughly the right aspect ratio. Then I used “resize” to get an exact width of 320 pixels. If you deselect the “maintain the aspect ratio” box, you can stretch or shrink the height slightly to force it to 240 pixels.
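
The cropping step can also be worked out numerically before opening an image editor. The display's 320 × 240 pixels is a 4:3 aspect ratio, so the largest part of a source image that scales to the screen without stretching is the largest 4:3 rectangle that fits inside it. The sketch below is only an illustration in plain C++, not part of the project code; the names CropSize and largestCropForDisplay are hypothetical.

```cpp
// Target display size from the article: 320 x 240 pixels (a 4:3 aspect ratio).
constexpr int kDisplayWidth  = 320;
constexpr int kDisplayHeight = 240;

// Hypothetical helper type: the size of a crop region, in source pixels.
struct CropSize {
  int width;
  int height;
};

// Returns the largest region with the display's aspect ratio that fits
// inside a source image of srcWidth x srcHeight pixels.
CropSize largestCropForDisplay(int srcWidth, int srcHeight) {
  // Compare srcWidth/srcHeight with 320/240 by cross-multiplying,
  // which avoids floating-point rounding.
  const long long lhs = static_cast<long long>(srcWidth) * kDisplayHeight;
  const long long rhs = static_cast<long long>(srcHeight) * kDisplayWidth;
  if (lhs >= rhs) {
    // Source is at least as wide as 4:3: keep the full height, trim the width.
    return { static_cast<int>(rhs / kDisplayHeight), srcHeight };
  }
  // Source is taller than 4:3: keep the full width, trim the height.
  return { srcWidth, static_cast<int>(lhs / kDisplayWidth) };
}
```

For the 364 × 600 monkey photo this gives a 364 × 273 region, and for a full 5,970 × 4,124 image it gives 5,498 × 4,124. Either region then resizes to 320 × 240 with little or no stretching.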

Sometimes the simplest drawings are the best. If you are an artist, go ahead and create your own pictures. Lots of color makes the images really pop on the M5Stack’s colorful display.

FIGURE 3
Here I've taken the image of a real monkey, cropped and resized it to fit on the M5Stack's display (320 x 240 pixels) [3].
FIGURE 4
I've cropped and resized a single pose from this menagerie of monkey caricatures to the correct size of the M5Stack display [2].
SOUNDS

The M5Stack has an internal speaker that can be used for “beeps” and “boops,” but it can also play audio files. I’ll be saving audio as .mp3 files. Many audio files are already available in .mp3 format. I found one that would work, but it was just a bit too long, so I used Audacity, a free, open-source audio application [4], to trim the playing time of my monkey.mp3 audio file.

As shown in Figure 5, the original file was just under 15 seconds long. I wanted to limit the sounds to around 3 seconds each. Note the central section of Figure 5 that I chose is about 3 seconds long. I can isolate this section and save it as a mono channel .mp3. Note that the picture file and the sound file must be named the same (except for the extensions)—hence, monkey.jpg and monkey.mp3.

FIGURE 5
When it comes to editing (or creating) audio files, Audacity [4] is the easiest to use and it’s open source. I used it to crop all my audio to around 3 seconds.

You might wish to use Audacity to record your own sound files. I think I can make monkey sounds just fine, and that would add a personal touch to this project if it were a gift. You could also record some audio cues such as, “You are correct!” or “Sorry, that doesn’t match, try again.” I’ll be using an “applause” audio file for right answers and a “booing” audio file for wrong answers.

STORING PICTURE AND SOUND FILES

There is room on an SD card for a whole lot of pictures and sounds, because the files are so small, so you can use the smallest SD card that you can find. You might not be able to find a microSD card smaller than 8GB, and you might need a microSD card adapter to allow your card to plug into your PC’s SD card slot. Some microSD cards come with the adapter.

To make this microSD card compatible with our program, you must first set up folders for every theme you want to use. I’m using three themes here, so their names will fit on a single screen without having to build a scrolling routine. Each theme folder contains folders for Pictures and Sounds. All your “Animal” theme files go in their associated folders. Note that you can call them anything, but the associated Picture and Sound files should have the same name before the file extension. I have two additional audio files in the root directory for responses: Good.mp3 for correct answers and Bad.mp3 for wrong answers.

The way to set up your microSD card is shown in Figure 6. Create separate folders for each theme, and separate Pictures and Sounds sub-folders for each theme. You can use as many picture and sound files as you’d like; the more the merrier. Just make sure there is one picture file for each sound file.
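
Because a picture and its sound share a base name, one item name is enough to locate both files. The following plain C++ sketch shows one way the paths could be built from the layout in Figure 6. It is an illustration only: the function names are hypothetical, and it assumes the sub-folders are named exactly Pictures and Sounds and that file names match in case.

```cpp
#include <string>

// Hypothetical helpers: build SD card paths from a theme name and an item name,
// following the layout /<Theme>/Pictures/<item>.jpg and /<Theme>/Sounds/<item>.mp3.
std::string picturePath(const std::string& theme, const std::string& item) {
  return "/" + theme + "/Pictures/" + item + ".jpg";
}

std::string soundPath(const std::string& theme, const std::string& item) {
  return "/" + theme + "/Sounds/" + item + ".mp3";
}

// Strips the folders and the extension from a path, so that
// "/Animals/Pictures/monkey.jpg" and "/Animals/Sounds/monkey.mp3"
// both reduce to "monkey".
std::string baseName(const std::string& path) {
  const std::string::size_type slash = path.find_last_of('/');
  const std::string::size_type start =
      (slash == std::string::npos) ? 0 : slash + 1;
  const std::string::size_type dot = path.find_last_of('.');
  const std::string::size_type end =
      (dot == std::string::npos || dot < start) ? path.size() : dot;
  return path.substr(start, end - start);
}

// A picture and a sound are associated when their base names are identical.
bool isAssociated(const std::string& picture, const std::string& sound) {
  return baseName(picture) == baseName(sound);
}
```

With this naming rule, adding a new item to a theme means dropping one .jpg and one .mp3 with the same name into the two sub-folders; no table of associations has to be maintained.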

FIGURE 6
SD card folder/files. Create separate folders for each theme, and separate Pictures and Sounds sub-folders for each theme. You can use as many picture and sound files as you'd like; the more the merrier. Just make sure there is one picture file for each sound file.
MULTI-PLATFORM SUPPORT

The M5Stack is compatible with UIFlow, MicroPython, Arduino, and .NET nanoFramework development tools [5, 6, 7]. I’ll be using the Arduino IDE V2.2.2 for this project. Links to information about the other tools are available in the Resources section of the Circuit Cellar Article Materials and Resources webpage.

Including the M5Stack.h library pulls in most of the libraries we need. However, to play .mp3 files, we need to add a few header files from the ESP8266Audio library, which you can install using the Arduino Library Manager. The required #include lines are shown in Listing 1.

LISTING 1
Header files needed from the ESP8266Audio library, which is installed using the Arduino Library Manager.

#include "AudioFileSourceSD.h"   // retrieve the .mp3 file
#include "AudioGeneratorMP3.h"   // the file format is .mp3
#include "AudioFileSourceID3.h"  // strip off metadata from the file
#include "AudioOutputI2S.h"      // output .mp3 data via internal DAC (speaker)

Now that we have the support that we need for this project, let’s look and see what’s on the SD card. Although the routine actually reads all of a theme’s files, they are only displayed via USB (debugging port) if the showFiles bit is enabled in myDebug. Both picture and sound files are listed, so you can verify them. We only need to hang on to those files in a particular theme’s folder. So let’s see how this is displayed and how the user chooses one.

M5STACK USER I/O

Referring back to Figure 2, the M5Stack is shown running this application. The three theme folders on my SD card are Animals, Vehicles, and Jobs. These folder names are displayed under the page title, “Themes.” This is the largest font—a 24-point font set (32 pixels) that provides up to six lines of text. Here I’ve spread the text out, because I have only four lines to show. Note that I’ve only used a few command lines to display the title, “Themes;” however, the folder names use a little more code, since I use the color green to indicate which theme is highlighted (see Listing 2).

LISTING 2
Display group code. This is the themes display. The three push buttons, A, B, and C, can be thought of as Next, Select, and Exit. Next increments groupHighlight. Select chooses the highlighted theme. Exit restarts the game.

void groupShow()
{
  Serial.println(group[0]);
  M5.Lcd.fillScreen(BLACK);          // Clear the screen to a black background.
  M5.Lcd.setCursor(0, 60);
  M5.Lcd.setFreeFont(FSB24);
  M5.Lcd.setTextColor(TFT_WHITE);
  M5.Lcd.println(group[0]);          // Display "Themes".
  //
  if(groupHighlight == 1)
  {
    M5.Lcd.setTextColor(TFT_GREEN);  // Green text marks the highlighted theme.
    if(myDebug & showHighlighted)
    {
      Serial.println(group[1] + " Highlighted");
    }
  }
  else
  {
    M5.Lcd.setTextColor(TFT_RED);    // Red text marks a theme that is not highlighted.
  }
  M5.Lcd.println("   " + group[1]);  // Display first folder name.
  ...
}

If you refer to the loop() flow chart in Figure 7, you’ll notice that all operations are based on button pushes. Button C always resets the game. Initially, groupChoice = 0: no theme has been selected, so all the themes found on the SD card are displayed. The first theme is highlighted, and Button A will change the highlighted theme without selecting it. Button B is used to select the highlighted theme.

FIGURE 7
Once I have added all my picture and audio files to my SD card, the M5Stack can access them as necessary. The main loop() function handles the game play by executing display or audio functions, based on the player's use of the three buttons on the lower face of the M5Stack (Figure 2). Button A is used to highlight a theme and Button B selects that theme. Once a theme is selected, the same buttons pick whether the player will match a picture with random sounds or match a sound to random pictures.

Once a theme has been selected, groupChoice has become theme 1, 2, or 3, depending on which was highlighted. The display now shows “Match” and two choices for this theme, Picture and Sound. At this point, associationChoice = 0; that is, no association has been selected. Button A will change the highlighted association without selecting it. Button B is used to select the highlighted association.
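
The two menus described above amount to a small state machine driven by the three buttons. The following is a minimal sketch of that logic in plain C++, not the project's loop() code. The names groupChoice, groupHighlight, and associationChoice come from the text; MenuState, themeCount, associationHighlight, the numbering of Picture and Sound, and the wrap-around of the highlight are assumptions made for illustration.

```cpp
// Buttons on the M5Stack face, in the roles used by the menus.
enum class Button { A, B, C };  // A = Next, B = Select, C = Exit

// Hypothetical container for the menu state described in the text.
struct MenuState {
  int themeCount           = 3;  // theme folders found on the SD card
  int groupHighlight       = 1;  // theme currently highlighted (1-based)
  int groupChoice          = 0;  // 0 = no theme selected yet
  int associationHighlight = 1;  // assumed: 1 = Picture, 2 = Sound
  int associationChoice    = 0;  // 0 = no association selected yet
};

// Applies one button press to the menu state.
void handleMenuButton(MenuState& s, Button b) {
  if (b == Button::C) {
    // Button C always resets the game, keeping the known theme count.
    const int themes = s.themeCount;
    s = MenuState{};
    s.themeCount = themes;
    return;
  }
  if (s.groupChoice == 0) {
    // Themes menu: A moves the highlight, B selects the highlighted theme.
    if (b == Button::A) {
      s.groupHighlight = (s.groupHighlight % s.themeCount) + 1;  // assumed wrap
    } else {
      s.groupChoice = s.groupHighlight;
    }
  } else if (s.associationChoice == 0) {
    // Match menu: A toggles between Picture and Sound, B selects.
    if (b == Button::A) {
      s.associationHighlight = (s.associationHighlight % 2) + 1;
    } else {
      s.associationChoice = s.associationHighlight;
    }
  }
}
```

Keeping the state in one structure makes the reset on Button C a single assignment, and the display routines only need to read the highlight and choice fields to decide what to draw.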

Choosing the Picture association will present the player with a picture from the selected theme folder. The player must decide whether the short random audio file being played from the selected theme folder is associated with the picture. The player is then coached by the text “No Yes Quit” presented above Buttons A, B, and C. If the player chooses “No,” a new random audio file is played. If the player answers “Yes” and the association is correct, they get a positive audio file and win the round. If the association is incorrect, they get a negative audio file and a new random audio file is played. Choosing the Sound association is just the opposite: a sound file is played and the player must match an associated picture.
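
The rules for a single answer can be captured in one small function. This is a sketch in plain C++ under the assumption that items are identified by their shared base file name (for example, monkey); the names Answer, Outcome, and judgeAnswer are hypothetical and are not taken from the project code.

```cpp
#include <string>

// The player's response to the question "does this match?"
enum class Answer { No, Yes };

// What the game does next.
enum class Outcome {
  TryAnother,       // "No": play or show a new random item
  WrongTryAnother,  // "Yes" but no match: negative audio, then a new random item
  Won               // "Yes" and a match: positive audio, round is over
};

// Decides the outcome of one answer. shownItem is the item the round is built
// around; offeredItem is the random item currently presented against it.
Outcome judgeAnswer(const std::string& shownItem,
                    const std::string& offeredItem,
                    Answer answer) {
  if (answer == Answer::No) {
    // Per the rules, "No" always moves on, whether or not the pair matched.
    return Outcome::TryAnother;
  }
  return (shownItem == offeredItem) ? Outcome::Won : Outcome::WrongTryAnother;
}
```

The same function serves both game types, since matching a picture to sounds and matching a sound to pictures differ only in which item is held fixed.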

As you might imagine, there is a lot of flexibility in the kinds of pictures and sound files you choose. For instance, the Animals theme uses actual (or mimicked) animal sounds, whereas the Jobs theme has audio files of typical speech that one might use in that job. My Vehicle theme has audio files of typical speech that might describe the vehicle, though I could have used audio files of actual vehicle sounds (for example, siren, tug boat toot, or chugging locomotive).

ESP32—MY MICRO OF CHOICE

If I were to design a product like the M5Stack, there is little I would change. It uses the most widely recognized, new-age microcontroller, the ESP32. The ESP32’s sister product, the ESP8266, was originally designed as a Wi-Fi peripheral that you could add to your controller project using a serial port and simple “AT” commands. It wasn’t long, though, before it was discovered that its embedded micro could be reprogrammed, and it could handle your simple application and the Wi-Fi on its own (without an additional processor).

The next generation ESP32 filled in a few gaps, which made this processor more powerful and added Bluetooth to its repertoire. It now stands as my micro of choice. M5Stack (the company) has done a great job, by not only creating a product with all the right stuff, but also packaging it in a useful form. This relieves the dreamer from having to stuff the prototyping circuitry into a usable package. As you can see in Figure 8, the package is small and self-contained, and all the I/O is inconspicuously available as both male and female single-row connectors. An internal lithium-ion battery and charger make it a complete handheld system. The backplane allows the unit to be expanded to include other specific I/O or a larger battery. Each add-on is housed in a form factor that only increases the product depth, while retaining the original width and height.

FIGURE 8
With the M5Stack's case split open, you can see its basic circuitry and the I/O bus, which lets you attach one of M5Stack's many compatible peripherals or make use of the I/O brought out to the four edges of the lower section.

Since the ESP32 already has Wi-Fi and Bluetooth capability, you can do a great deal with the basic M5Stack. There are useful add-on peripherals like Zigbee, HDMI, LoRa, and RS485, so you can do practically anything while still staying in this form factor. However, like many companies trying to expand their market share, M5Stack has branched out into other areas that may not even require a local display. These offshoots are designed with other markets in mind. I suggest you peruse their website [1] to see what I mean.

RANDOMNESS

I decided to use randomness to help make this project less predictable—so the same pictures would not always occur after one another, or the same sequence of sounds would not always follow the selection of a particular picture, for example. There are a few issues with using many random functions. Many “random number generators” are actually “pseudorandom number generators” (PRNGs); they generate numbers that only look random, but are in fact predetermined. When started, they always begin with the same sequence of numbers. To get around this, you can provide a seed number that will be used to produce a new but predetermined set of numbers. But, because digital computations will always produce the same results, you really want to add some physical process that is expected to be random. This might be as exotic as a white noise source or as simple as ambient light strength. An analog input is commonly used. It could be from a sensor or just sampling the analog voltage on an unused input, which generally floats to some arbitrary value. This is the approach I use here.
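
The effect of the seed is easy to demonstrate off-target. The sketch below uses standard C++ rather than the Arduino core, so it is an illustration of the principle and not the project's code: std::mt19937 stands in for the microcontroller's PRNG, and std::random_device stands in for sampling a floating analog input. In an Arduino sketch, the equivalent idea is to pass an analogRead() of an unused pin to randomSeed(). The function names drawSequence and readEntropySeed are hypothetical.

```cpp
#include <cstdint>
#include <random>
#include <vector>

// Draws `count` values in [0, fieldSize) from a PRNG started with `seed`.
// The same seed always yields the same sequence.
std::vector<int> drawSequence(std::uint32_t seed, int count, int fieldSize) {
  std::mt19937 prng(seed);  // stand-in PRNG for this illustration
  std::vector<int> picks;
  for (int i = 0; i < count; ++i) {
    // Modulo keeps the example simple; its slight bias is negligible here.
    picks.push_back(
        static_cast<int>(prng() % static_cast<std::uint32_t>(fieldSize)));
  }
  return picks;
}

// Hypothetical stand-in for a physical noise source such as a floating
// analog input. On a PC, std::random_device plays that role.
std::uint32_t readEntropySeed() {
  std::random_device source;
  return source();
}
```

Calling drawSequence(42, 10, 5) twice returns identical lists, which is exactly the predictability a fixed seed causes at every power-up. Calling drawSequence(readEntropySeed(), 10, 5) gives a sequence that is expected to differ from one run to the next.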

Once we have implemented a workable process, we have a new problem. With truly random numbers, every new random number should have the same probability of coming up. Therefore, repeating the same number is possible. In fact, any particular number may never come up. In our case, once we have randomly chosen a picture or a sound, we could keep randomly picking sounds or pictures to present with it and never come up with the correct one! The player could think things are broken or become disenchanted with the whole game.

I have only five items in each theme on my SD card, but what if there were 100? In my trials of the game, I’ve gone through many wrong associations before the correct one was found. Increasing this field to 100 choices would most likely make the game unplayable, since there would be a much larger field to choose from. Is there a way to prevent this from happening? There is, and it would also force the game to a conclusion.

We can force the “random” number not to repeat by creating a Boolean array the size of our field. Each time a random number is chosen, its associated bit in the array is checked. If the bit is not set, the number is a legal choice and the bit is then set; if the bit is already set, that number will not be used again. There is a slight logic flaw with this, because it doesn’t change the way the random number is selected. We might just keep getting this or other already picked numbers.

If we reduce the field of choices by one after every pick, we will always have a new (unpicked) number, because the choice would become an index into the array of unset bits. As the number of “unpicked bits” diminishes, so does the maximum random number we are picking from. This will eliminate the possibility of never being presented with the correct association.
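
The shrinking-field idea can be sketched as a small class. This is a minimal illustration in standard C++ under stated assumptions, not the project's implementation: the class name NonRepeatingPicker is hypothetical, and std::mt19937 stands in for whatever PRNG the target provides. Each pick draws a number from 0 to remaining − 1 and uses it as an index into the entries whose bits are still unset.

```cpp
#include <cstddef>
#include <random>
#include <vector>

// Hands out each index in [0, fieldSize) exactly once, in random order.
class NonRepeatingPicker {
 public:
  NonRepeatingPicker(int fieldSize, unsigned seed)
      : picked_(static_cast<std::size_t>(fieldSize), false),
        remaining_(fieldSize),
        prng_(seed) {}

  // Number of indices that have not been handed out yet.
  int remaining() const { return remaining_; }

  // Returns an index that has not been returned before, or -1 if none are left.
  int next() {
    if (remaining_ <= 0) return -1;
    // Random position among the entries that are still unpicked; the range
    // shrinks by one after every pick.
    int skip = static_cast<int>(prng_() % static_cast<unsigned>(remaining_));
    for (std::size_t i = 0; i < picked_.size(); ++i) {
      if (picked_[i]) continue;   // already used, does not count
      if (skip == 0) {
        picked_[i] = true;        // mark as used
        --remaining_;
        return static_cast<int>(i);
      }
      --skip;
    }
    return -1;  // not reached while remaining_ matches the unset bits
  }

 private:
  std::vector<bool> picked_;  // true = this index has been handed out
  int remaining_;             // count of entries still false in picked_
  std::mt19937 prng_;         // stand-in PRNG for this illustration
};
```

With N items, independent draws need N tries on average to reach the matching item and have no upper bound, while non-repeating draws reach it within N tries and after (N + 1)/2 on average. For five items that is an average of 3 tries instead of 5; for 100 items, about 50 instead of 100.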

SCORING

I didn’t want to complicate the game play by including scoring. Instead, I just use audio feedback based on a player’s “Yes” selection. Either the player is informed the answer is incorrect and prompted to continue, or the answer is correct and the player may try a new game. If you wish, you can create a more exciting reward for successive correct answers. Be creative.

For those who need some running tally on their progress, you might like to keep track of the number of correct and incorrect answers, and present that after each game.
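
A running tally needs very little state. The sketch below is one possible shape for it in plain C++; the Tally type and its members are hypothetical and are not part of the project code.

```cpp
// Hypothetical running tally of a player's "Yes" answers.
struct Tally {
  int correct   = 0;
  int incorrect = 0;

  // Records one "Yes" answer as right or wrong.
  void record(bool wasCorrect) {
    if (wasCorrect) {
      ++correct;
    } else {
      ++incorrect;
    }
  }

  // Total number of "Yes" answers recorded so far.
  int total() const { return correct + incorrect; }
};
```

Calling record() wherever the positive or negative audio file is played would keep the counts in step with the feedback the player already hears, and the totals could be shown on the display after each game.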

As engineers, we must be wary of falling into the trap of constant improvements. The feature set should be decided before one line of code gets typed in. There is always too much to do and so little time.


REFERENCES
[1] M5Stack Website: https://m5stack.com
[2] Site: https://www.freepik.com/
     Image: wild-animals-set-with-nature-elements_1308-111630.jpg
[3] Page: https://en.wikipedia.org/wiki/Monkey
     Image: Bonnet_macaque_(Macaca_radiata)_Photograph_By_Shantanu_Kuveskar.jpg
[4] Audacity Website: www.audacityteam.org

SOURCES
UIFlow: https://flow.m5stack.com/
MicroPython: https://micropython.org
.NET nanoFramework: https://github.com/nanoframework/nanoFramework.M5Stac

RESOURCES
Audacity | www.audacityteam.org
Espressif Systems | www.espressif.com
M5Stack | m5stack.com


PUBLISHED IN CIRCUIT CELLAR MAGAZINE • DECEMBER 2023 #401 – Get a PDF of the issue


Jeff Bachiochi (pronounced BAH-key-AH-key) has been writing for Circuit Cellar since 1988. His background includes product design and manufacturing. You can reach him at: jeff.bachiochi@imaginethatnow.com or at: www.imaginethatnow.com.

Copyright © KCK Media Corp.
All Rights Reserved

Copyright © 2024 KCK Media Corp.
