Controlling electronic devices with hand gestures may seem like the stuff of science fiction. But the technology is easily available today, even for MCU-level embedded systems. Learn how these three Cornell students built a motion/gesture-controlled speaker using sensors, a computer and a Microchip PIC32 MCU. With hand gestures, the system lets you control the volume, play/pause and change songs by skipping forward and backward.
Our Motion-Controlled Speaker project is an application that uses non-contact sensors to control the audio output from a speaker, based on motion patterns that the sensors detect. This project idea originated when we began discussing innovations that would be of interest to us. We immediately took a liking to this idea, because we could see it being implemented into products in the near future. We all have personal interests in music and in working on something that could be built into different products, such as smart watches or other similar smart products with streaming capabilities. Our objective was to build a prototype of this type of technology, using Sharp GP2Y0A21YK0F IR sensors from Pololu, the PIC32 microcontroller (MCU) and another main component that would be based on whether we decided to stream the music or to play downloaded music from a device with memory.
After much research and trying out various methods, we decided to use a Raspberry Pi 3B embedded computer board as a device for the playback of songs. The final product, using the IR sensors, PIC32 and Raspberry Pi 3, was a working prototype that was able to pause and play songs, turn the volume up and down and change to the previous and next songs—solely based on hand motions. The schematic of the project is shown in Figure 1.
A significant logical part of our project was the communication between the Raspberry Pi and the PIC32 MCU, through the utilization of the UART hardware on each device. The serial port between the two was set up at a baud rate of 115200 bps—the fastest speed that the serial port can transfer data between the two computers. There was an optimization trade-off that we had to consider. We knew that using this baud rate allowed us to send the greatest amount of data at a fast enough rate, but with a higher chance of data corruption or data loss. Fortunately, we didn’t observe any of these errors, so we chose to continue using the highest speed possible.
The baud rate is the speed at which bits can be transferred. Assuming the usual framing of one start bit and one stop bit around each byte, 115,200bps works out to a maximum of about 11,520 bytes per second. With that in mind, we had to downsample most tracks of music, which originally were sampled at a rate of 44,100Hz. We used Mathworks MATLAB code provided by a Cornell professor to downsample the tracks to a quarter of that frequency, or 11,025Hz, which was the greatest rate we could obtain that was below the maximum. This decreased the quality of the relayed music, but it was still clear and enjoyable.
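The arithmetic behind these figures can be checked with a short sketch. It assumes 8N1 framing (10 bits on the wire per byte) and one byte per audio sample, neither of which is stated explicitly above. The function names are illustrative only.

```python
def max_bytes_per_second(baud_rate, bits_per_frame=10):
    """Bytes per second a UART can carry; 8N1 framing (assumed) costs 10 bits per byte."""
    return baud_rate // bits_per_frame


def highest_rate_below(source_rate_hz, limit_hz):
    """Divide the source rate by the smallest whole factor that fits under the limit."""
    factor = 1
    while source_rate_hz / factor > limit_hz:
        factor += 1
    return factor, source_rate_hz / factor


limit = max_bytes_per_second(115200)             # UART ceiling in bytes (samples) per second
factor, rate = highest_rate_below(44100, limit)  # downsampling factor and resulting rate
print(limit, factor, rate)
```

Under these assumptions, a factor of four is the smallest whole-number reduction of 44,100Hz that fits within the serial link.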
The structure of our project is as follows: The first part is resetting the PIC and running our code on the Raspberry Pi 3. Once the PIC has finished resetting, it sends a ready signal to the Pi. The Pi receives this signal and then begins to process and send the music data that has been prewritten onto it, byte by byte. These bytes are received by the PIC and stored into two buffers, where one receives the data, and when full, starts playing. While this buffer is playing, the other buffer continues receiving data where the previous left off. When one of the buffers is full, it sends a signal that it is ready to receive more data. Figure 2 is a logical structure diagram illustrating this process.
We made sure to follow the typical multi-processor communication protocol that was relevant for our purposes. Because we were using the UART hardware present on the PIC and the Pi, we followed the RS-232 standard. Unlike other motion-activated audio emitters, our project is more focused around using specific motions to control a sound or music playback device, with different patterns of motion producing different results.
The PIC32 does not have enough memory (only 128KB of flash memory) to store full songs. So, we used the Raspberry Pi 3 to store music and stream to the PIC32, because its memory is limited only by the size of its memory card, and it is also a high-performance device for its price. The serial communication between the PIC and the Raspberry Pi required connecting the UART transmit pin on the PIC to the UART receive pin on the Raspberry Pi and vice versa. We also ensured that they share a common ground.
The sensor array, made up of the four IR distance sensors, acts as the interface between a user and the music streaming system that we designed. We screwed the sensors to a small rectangular wooden board to keep them stable and make it easy to control the program using hand gestures. We assembled the sensors in a diamond formation (Figure 3). This formation makes it easy to control the flow of the music by simply holding a hand over different combinations of sensors to perform different actions. For example, holding a hand over the bottom and top sensors pauses or resumes playing the music. The diamond formation also makes it possible to add a swiping feature to our system in the future, such that swiping across the sensors from left to right will switch the music to the next track.
The main software component of our motion-sensor speaker system consists of two threads and an interrupt service routine (ISR) on the PIC, plus a Python program on the Raspberry Pi. The program executes continuously until it is terminated. For the entire system to run successfully, the threads, ISR and Raspberry Pi program must be synchronized with each other and communicate efficiently. The sensor thread reads the analog input from the IR distance sensors and controls the state of the system based on these data.
The serial thread’s main function is to spawn another thread that reads and sends data through the UART module based on the state of the system. The ISR processes data received through the UART and outputs the processed data through the digital-to-analog converter (DAC). The serial communication program that runs on the Raspberry Pi loads the music header files and sends these data through the UART. This program also receives data through the UART from the PIC that affect the state of the program.
For testing, we used a MATLAB program to prepare the music files. This program converts downloaded WAV files to C language header files that can be output through the DAC once transmitted to the PIC. The program first reads a WAV audio file from a specified location on the computer. The WAV files have a sampling frequency of 44.1kHz, which is too fast for our system to play, so the program downsamples the audio by a factor of four.
This allows the audio to be played at a sampling frequency of about 11kHz. The samples then have to be scaled so they can be played by the 12-bit DAC. These converted audio samples are then stored in the header file, with a line break between samples. After being converted to a header file, the music is ready to be loaded into our program to be played.
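The conversion steps can be illustrated in Python rather than MATLAB. This is a rough sketch, not the MATLAB program itself. It assumes 16-bit signed PCM input, naive decimation with no anti-alias filter, and scaling to signed 8-bit values. The array name `song_data` and all function names are hypothetical.

```python
def downsample(samples, factor=4):
    """Keep every `factor`-th sample (naive decimation, no anti-alias filter)."""
    return samples[::factor]


def scale_to_8bit(samples, source_bits=16):
    """Reduce signed PCM samples to signed 8-bit by dropping the low-order bits."""
    shift = source_bits - 8
    return [s >> shift for s in samples]


def to_header(samples, name="song_data"):
    """Render the samples as a C array, one sample per line (array name is hypothetical)."""
    body = ",\n".join(str(s) for s in samples)
    return f"const signed char {name}[{len(samples)}] = {{\n{body}\n}};\n"


# Eight made-up 16-bit samples standing in for the contents of a WAV file.
pcm = [0, 1000, 2000, 3000, 32767, -32768, 256, 512]
converted = scale_to_8bit(downsample(pcm, 4))
header = to_header(converted)
print(header)
```

A real converter would read the samples from the WAV file and would normally low-pass filter before decimating, to limit aliasing.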
The first PIC32 thread begins by reading the first four channels of the analog-to-digital converter (ADC). The ADC converts the analog output from the four IR distance sensors to a digital format that is then stored in variables (adc_12). We set a minimum threshold of 400 ADC units for a sensor reading that counts as a valid detection. We found this to be an ideal threshold through trial and error. Because the sensor's output rises as an object gets closer, a threshold that is too large means you have to hold your hand too close to the sensor for motion to be detected. If the threshold is too small, then objects that are far away may be unintentionally detected.
We implemented a counter for each sensor to keep track of how long a hand is being detected by a sensor. The corresponding counter is incremented with every consecutive iteration of this thread during which a hand is still being detected by the same sensor. If a sensor no longer detects a hand, then its corresponding counter is reset to zero. These counters are a form of debouncing the sensors. For example, if someone quickly waves a hand over a sensor by accident, it will not be acknowledged by our program.
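The counter scheme can be sketched in a few lines. The snippet below models it in Python for illustration; the firmware itself runs on the PIC32. It assumes a reading at or above the 400-unit threshold counts as a detection. The sensor names and the `update_counters` function are hypothetical.

```python
THRESHOLD = 400  # minimum ADC reading treated as a detection (value from the article)


def update_counters(counters, readings, threshold=THRESHOLD):
    """Increment a sensor's counter while it sees a hand; reset it to zero otherwise."""
    return {
        name: (counters.get(name, 0) + 1) if value >= threshold else 0
        for name, value in readings.items()
    }


# Three consecutive passes of the sensor thread with made-up ADC readings.
counters = {}
for readings in ({"top": 500, "bottom": 100},
                 {"top": 450, "bottom": 100},
                 {"top": 50, "bottom": 420}):
    counters = update_counters(counters, readings)
    print(counters)
```

A hand that passes over a sensor for only one pass never pushes its counter past 1, so it cannot satisfy a condition that requires a count of 2 or 3.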
These counters act as the control signal for the state of the music playback. They also signal actions that should be done to the playback. The two states that our system can be in are “play” and “pause.” When someone holds a hand over the bottom and top sensors, one state is switched to the other state. In other words, the state switches from play to pause or pause to play. We found that if the top counter is equal to 3 and the bottom counter is greater than 1, it is a solid enough sign that someone is attempting either to resume playing music or pause the music. We discovered that when we set the condition for both counters to be equal to the same value, the switch of states was inconsistent.
The volume of the music can be turned up or down when a hand is held over the top or bottom sensor, respectively. To adjust the volume, the counter corresponding to that sensor must be equal to 2, and the counter for the opposite sensor must be less than 1 (that is, zero). We added the “less than 1” condition to differentiate this action from changing the play/pause state. A variable for volume is then incremented or decremented based on which action is signaled.
We also implemented an action state for switching to a new song. When a hand is held over the right or left sensor, the next/previous track should play. This thread sets the action variable to next or previous track if either of these counters is equal to 2. The desire to switch tracks is later signaled to the Raspberry Pi by the serial thread.
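The counter conditions described above can be gathered into a single decision function. This Python sketch is illustrative only. The order in which the conditions are checked, the action names, and the mapping of the right sensor to "next" and the left sensor to "previous" are assumptions.

```python
def decode_gesture(counters):
    """Map the four debounce counters to an action name, or None if nothing matches."""
    top, bottom = counters["top"], counters["bottom"]
    left, right = counters["left"], counters["right"]
    if top == 3 and bottom > 1:
        return "toggle_play_pause"   # hand held over both top and bottom sensors
    if top == 2 and bottom < 1:
        return "volume_up"           # top sensor only
    if bottom == 2 and top < 1:
        return "volume_down"         # bottom sensor only
    if right == 2:
        return "next_track"          # assumed mapping: right sensor means next
    if left == 2:
        return "previous_track"      # assumed mapping: left sensor means previous
    return None


print(decode_gesture({"top": 3, "bottom": 2, "left": 0, "right": 0}))
print(decode_gesture({"top": 2, "bottom": 0, "left": 0, "right": 0}))
```

Each condition tests for an exact count rather than a minimum, so a hand that keeps hovering triggers the action once instead of on every pass of the thread.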
The second PIC thread spawns another thread that communicates with the Raspberry Pi by sending and receiving data through the UART’s Tx and Rx pins. Each byte of data received by the PIC is stored into one of the buffers, while the other buffer is being read in the ISR. We use two buffers to ensure constant playback: one buffer receives data while the other relays its data to the DAC. This arrangement is necessary so that new data can be received and stored at the same time that earlier data are being played, and it enables a seamless transition between successive blocks of music data. The spawned thread is not killed until the buffer currently being written to has been filled with the latest 8,000 samples of the transmitted music data.
In the spawned thread, we continuously check the state of our system and send a signal to the Pi based on this state. When the state of our system is in “play,” we continuously receive data from the Pi by first sending it the ready signal. And when the state is in “pause,” we stop sending the ready signal, which stops the transfer of data—but we make sure to save the spot that we stopped at on both ends. When switching to the next or previous track, we output the corresponding signal that basically informs the Pi to begin outputting data from the next or previous set of data that was downloaded. Each set is given a corresponding number on the Pi side.
The next or previous signal is sent only once, then we return to continuously sending the ready signal, so that the PIC, in turn, receives and plays the song immediately. This makes the entire song-switching process occur in real time. It is important to clear either buffer when necessary, such as when switching songs, so that playback of different song data does not overlap—an issue that we encountered briefly.
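The signaling rules on the PIC side can be summarized as a small decision function. This Python sketch is only a model of that logic. It uses the single-character signals "A" (ready), "N" (next) and "P" (previous) that the Raspberry Pi program listens for. The function name is hypothetical, and so is the assumption that a track change is sent even while paused.

```python
READY, NEXT, PREVIOUS = b"A", b"N", b"P"


def next_signal(state, pending_action):
    """Pick the byte to send to the Pi, or None when nothing should be sent."""
    if pending_action == "next_track":
        return NEXT
    if pending_action == "previous_track":
        return PREVIOUS
    if state == "play":
        return READY   # ask for the next block of samples
    return None        # paused: stay silent so the Pi stops transmitting


print(next_signal("play", None))
print(next_signal("play", "next_track"))
print(next_signal("pause", None))
```

The caller would clear the pending action after sending it, so that the next or previous signal goes out only once.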
INTERRUPT SERVICE ROUTINE
The ISR formats the samples in the serial buffers for the DAC, before transmitting them through the second SPI channel and outputting through the DAC. This ISR is triggered by a timer interrupt at a rate determined by the sampling frequency of the music playback. We set the timer to trigger an interrupt every 3,628 cycles, which is the PIC32 clock frequency divided by the 11,025Hz sampling frequency. Each time the ISR is executed, a new sample is sent through the SPI channel to the DAC, but only if the state of the system is “play.” Before the sample is transmitted through the SPI port, it is manipulated based on the state of the system and the requirements of the DAC. The sample is first converted to an integer, before being left-shifted by a number determined by the volume variable.
A greater left shift creates a larger value, which consequently makes the sample louder. The most that each sample can be left-shifted is four. That’s because the samples are 8-bit values, and the DAC only supports 12 bits. The shift amount cannot go below zero, because a negative shift would act as a right shift and discard some of the sample’s data. The shifted sample is then added to 2,048, the midpoint of the DAC’s 12-bit range, so that the waveform is centered and can use the DAC’s full output swing. Before being written to the SPI channel, the sample is OR-ed with the DAC A configuration bits. These bits tell the DAC to output the sample on channel A.
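The timer period and the per-sample arithmetic can be worked through in a short sketch. It is written in Python for illustration, whereas the actual ISR runs on the PIC32. Three things here are assumptions: a 40MHz peripheral clock (consistent with the 3,628-cycle period), signed 8-bit samples, and the `0x3000` configuration value, which follows the command-word layout of MCP4822-style DACs.

```python
DAC_A_CONFIG = 0x3000  # assumed: channel A, 1x gain, output active (MCP4822-style word)
MAX_SHIFT = 4          # an 8-bit sample shifted left by 4 fills the DAC's 12 bits


def timer_period(clock_hz=40_000_000, sample_rate_hz=11_025):
    """Timer ticks between interrupts; the 40MHz clock is an assumption."""
    return clock_hz // sample_rate_hz


def format_sample(sample, volume):
    """Turn a signed 8-bit sample into a 16-bit DAC command word."""
    shift = max(0, min(MAX_SHIFT, volume))   # clamp the volume shift to 0..4
    value = (int(sample) << shift) + 2048    # offset to the middle of the 12-bit range
    return DAC_A_CONFIG | (value & 0x0FFF)   # combine config bits with 12 data bits


print(timer_period())
print(hex(format_sample(127, 4)), hex(format_sample(-128, 4)))
```

At the maximum shift, the signed range of -128 to 127 maps to 0 through 4,080, which stays inside the 12-bit limit of 4,095.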
The buffer that is not being written to at the time is read and sent through SPI to the DAC. Once this buffer’s samples have all been transmitted to the DAC, the ISR switches the buffer state. This indicates that this buffer should now be written to, and the other buffer should be read from. This switching of buffers allows for continuous playing of music, because samples from the Pi are always being received and stored at the same time that the PIC is outputting these samples.
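The two-buffer handoff can be modeled with a small class. This is a simplified Python sketch, not the firmware. It assumes the swap happens only once the buffer being written is full, and every name in it is hypothetical.

```python
class DoubleBuffer:
    """One buffer fills from the UART while the other is drained by the ISR."""

    def __init__(self, size):
        self.size = size
        self.buffers = [[], []]
        self.write_index = 0   # buffer currently being filled
        self.read_pos = 0      # next sample to play from the other buffer

    def write(self, sample):
        """Store one received sample; return False if the write buffer is full."""
        buf = self.buffers[self.write_index]
        if len(buf) >= self.size:
            return False
        buf.append(sample)
        return True

    def read(self):
        """Return the next sample to play, swapping buffers when one is drained."""
        read_buf = self.buffers[1 - self.write_index]
        if self.read_pos >= len(read_buf):
            if len(self.buffers[self.write_index]) < self.size:
                return None                    # nothing ready yet (underrun)
            read_buf.clear()                   # the drained buffer becomes the write buffer
            self.write_index = 1 - self.write_index
            self.read_pos = 0
            read_buf = self.buffers[1 - self.write_index]
        sample = read_buf[self.read_pos]
        self.read_pos += 1
        return sample


db = DoubleBuffer(size=3)
for s in (1, 2, 3):
    db.write(s)
print(db.read(), db.write(4), db.read())
```

Clearing the drained buffer at the swap also keeps stale samples from an earlier song out of playback.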
The main purpose of the Python program on the Raspberry Pi is to transmit music samples to the PIC using serial communication. It reads and writes serial data through the UART module on the Raspberry Pi. The program begins by initializing a serial writer and reader, to send and read signals through the UART. We set the baud rate for this communication to 115,200bps, which is fast enough to transmit about 11.5 thousand samples per second. We then read the header files into variables, which we convert to integer format. One final conversion is then performed on the data: conversion to byte-array format. At this point, the music samples can be transmitted through the Raspberry Pi’s UART transmit pin.
Once the initial procedures performed on the data are completed, the program enters an infinite loop and begins reading the serial input. If an “A” is received, then a flag is set to indicate that the PIC is waiting to receive data. Once this flag has been set, the next 8,000 samples of the current song being played are transmitted through the Pi’s transmit pin to the PIC.
If the song is finished being transmitted, then on the next iteration, the next song starts being transmitted to the PIC. If an “N” (or “P”) is received instead, the next (or previous) song read by the program starts to be transmitted. Our current Python serial communication program transmits only three songs, but this program can be expanded to transmit many more songs simply by converting and loading more WAV files onto the Raspberry Pi’s memory. It will also require duplicating much of the code.
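The command handling on the Raspberry Pi can be sketched as follows. This is a minimal illustration, not our actual program. It assumes each song is already loaded as a `bytes` object and that the playlist wraps around at either end. The `Player` class and `serve` function are hypothetical names. The `port` argument can be any object with `read(size)` and `write(data)` methods, such as a pySerial `serial.Serial` instance.

```python
CHUNK = 8000  # samples sent per ready signal (value from the article)


class Player:
    def __init__(self, songs, chunk=CHUNK):
        self.songs = songs   # list of bytes objects, one per song
        self.chunk = chunk
        self.song = 0        # index of the current song
        self.pos = 0         # next sample to send within that song

    def handle(self, command):
        """Return the bytes to transmit in response to one command byte."""
        if command == b"N":
            self.song = (self.song + 1) % len(self.songs)
            self.pos = 0
            return b""
        if command == b"P":
            self.song = (self.song - 1) % len(self.songs)
            self.pos = 0
            return b""
        if command == b"A":
            if self.pos >= len(self.songs[self.song]):
                # Current song finished: move on to the next one.
                self.song = (self.song + 1) % len(self.songs)
                self.pos = 0
            data = self.songs[self.song][self.pos:self.pos + self.chunk]
            self.pos += len(data)
            return data
        return b""           # ignore anything unrecognized


def serve(port, player):
    """Answer command bytes until a read returns nothing (for example, a timeout)."""
    while True:
        command = port.read(1)
        if not command:
            break
        reply = player.handle(command)
        if reply:
            port.write(reply)


demo = Player([b"abcde", b"XYZ"], chunk=2)
print(demo.handle(b"A"), demo.handle(b"N"), demo.handle(b"A"))
```

Keeping the songs in a list means that adding a track only extends the list, without per-song code.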
RESULT OF DESIGN
We were pleased with what we were able to achieve with our project. It covered all the bases of what we had initially aimed to do. When we loaded three songs through the Pi, we were able to pause and play a song, skip forward to the next song, skip back to the previous song and control the volume. The Pi read properly from the serial interface, and did not start transmitting the music until it received permission from the PIC. The PIC read the inputs from the distance sensors, and used that information either to control the volume level, or tell the Raspberry Pi to stop transmitting or change the song it was sending. All of this was performed quickly and smoothly and, most importantly, in real time. We were able to exhibit all of this during our demo. A YouTube video of our project demo (Figure 4) is shown below:
FIGURE 4 – YouTube video demonstration of our project.
The design also showcases all the things we considered throughout the development of the speaker. By positioning the sensors on a board similar to a remote and in an efficient manner, we ensured that each gesture will be correctly interpreted. This design as a whole is preferable and useful because all it takes is a simple hand gesture over the apparatus to control the user’s music.
A more advanced version of this prototype could be useful in many situations. The primary and most popular use would be providing a fun, innovative and relatively effortless method in which users can interface with their devices. However, there are also some serious applications for this technology. For example, people with impaired vision could benefit greatly from this type of gesture technology. If the motion sensing speaker were attached to the user’s wrist—perhaps as part of a smart watch application—the user could switch between songs or perhaps pages of an audio book without needing touchscreen buttons. The user would simply wave a hand over the screen in the desired direction to switch or flip pages. Those with mental disabilities could possibly benefit from this technology as well, because they might find it easier to use certain gestures and hand motions, rather than the typical button inputs required to interface with devices. Overall, we can see many amusing and functional uses for a more advanced version of this prototype.
On the whole, our final product worked quite seamlessly and met, if not exceeded, our expectations. We had to deviate from some initial plans as we progressed with this project. But, in the end, we fully achieved our goal of having a quality speaker system that could be motion-controlled by hand gestures.
There are a few additions we could make in the future to further bolster our system. It would be desirable to develop a method of streaming the audio WAV file directly on the Raspberry Pi. This would eliminate the lengthy process of the MATLAB header conversion for each song. Another improvement would be refining the gesture-detection software, so that users could perform more engaging motions, such as swiping up to raise the volume, and swiping down to lower it.
An additional interesting feature that we talked about was installing a microphone that could “listen to the room.” It would adjust the volume of the music based on the background noise present in the current environment. However, this might not be feasible or desirable in some situations. Our final, most advantageous improvement would be the ability to play music from a streaming application, such as Spotify or Apple Music. This would make our system a lot more useful and popular. All in all, these improvements would be nice additions to our system, but we are very happy with our current final product.
PUBLISHED IN CIRCUIT CELLAR MAGAZINE • JANUARY 2020 #354
Jidenna Nwosu is a Cornell University graduate who majored in Electrical and Computer Engineering and Information Science Engineering. Jidenna is looking to work as an embedded software/hardware engineer.
Benjamin Francis is a Cornell University graduate (May 2019) who majored in Electrical and Computer Engineering. He currently works as a Systems Engineer at L3Harris Technologies.
Ayomi Sanni is a Cornell University graduate (May 2019) who majored in Electrical and Computer Engineering. Ayomi is currently looking to work as a software engineer.