Using Wavelet Transform
Microcontrollers are perfect for systems that need to process analog signals such as audio and perform real-time digital control alongside them. Along just those lines, learn how these two Cornell students recreated the classic arcade game “Dance Dance Revolution” using a Microchip PIC32 MCU. Their version performs wavelet transforms to detect beats in an audio signal and synthesize dance move instructions in real time, without preprocessing.
We designed a version of the traditional arcade game, “Dance Dance Revolution,” that synthesizes dance instructions from any audio source using the PIC32 MCU. Unlike the original game, in which users must choose from a pre-selected list of songs, our system allows users to plug in their own audio device and play songs of their choice. The dance move instructions are then generated in real time by buffering the audio and processing it using the discrete wavelet transform.
We were inspired by a mutual desire to work on a music-related project, and both of us had fond memories of playing this kind of game. We also wanted to add some sort of novel, interesting component, so we brainstormed the idea of allowing the player to play whatever song he or she wanted. The game is much more fun when the song playing is your favorite tune. All versions of the commercial game have pre-programmed song libraries, so replay value is limited. Our version has no such limitation. The discrete wavelet transform was selected as a processing method because we needed both time and frequency resolution. We also needed a computationally efficient algorithm.
The system requires two kinds of user input: an audio source and button presses from the dance mat floor tiles. The audio input must be processed, so it needs to be delayed or buffered until the processing is complete. Another reason for the delay of the audio output is to give ample time for the user to react to the instructions created from processing the audio. In contrast, the user input needs to be in real time. We use two PIC32s to do the input processing—one detects beats and reads the dance mat input, while the other buffers audio. We use a macOS application to display the beats and handle scoring.
We built a custom dance mat for the game, consisting of five individual tiles that could each detect when players put their weight on them (Figure 1). The tiles needed to be both durable and sensitive to pressure. To achieve this, we used force-sensitive resistors wired in parallel, polled at approximately 20 Hz for changes in resistance. These resistors were placed directly between the tiles—which were made out of canvas-covered boards—and the supports that raised them off the ground.
We used a PIC32 development board designed by Sean Carroll, with a DAC socket and GPIO pins brought out, to provide flexibility during development. We also used a second PIC32 on a smaller development board, with connections to the floor tiles and the serial-to-USB cable. The floor tiles were wired underneath to a protoboard, and all the important signals were fed up to our main protoboard using a ribbon cable. Figure 2 shows the schematic of the system, incorporating Sean Carroll's full-size and small PIC32 development boards.
We soldered our audio circuitry on a protoboard to make it easier to debug and to reduce noise. The audio input jack fed into a 500 µF capacitor to block any DC component, and the signal then passed through an offset circuit so that the ADCs could read it without clipping. The DAC output was fed directly to an audio jack and speakers.
Another PIC32 MCU handled audio buffering, because SPI communication with both a 128 KB serial SRAM and the DAC took too many cycles to perform alongside the necessary signal processing. We used our professor Bruce Land's code for reading from and writing to the SRAM chip and for writing to the DAC. His code included read/write methods and handled the SPI setup and mode changes. We added code to read from the ADC in a timer interrupt at 40 kHz, write the sample to a location in the SRAM, then read from a different location and write that value to the DAC. The locations written to and read from were incremented each time, creating a loop around the SRAM memory locations. To change how long the audio was buffered, we only needed to change the values of MAX_ADDR and MIN_ADDR: the closer together they were, the smaller the range of the SRAM we used. This mattered because using the entire SRAM gave us a buffer of about 3.3 seconds, and we wanted only about 2.5 seconds.
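The delay loop can be sketched as follows. This is an illustration rather than the project's actual firmware: a plain array stands in for the SPI SRAM, the buffer_step name is ours, and the MIN_ADDR/MAX_ADDR values are only representative of a roughly 2.5-second delay at 40 kHz.

```c
#include <stdint.h>

#define MIN_ADDR 0u
#define MAX_ADDR 100000u   /* ~2.5 s of 8-bit samples at 40 kHz (illustrative) */

/* A plain array stands in for the 128 KB SPI SRAM in this sketch. */
static uint8_t sram[MAX_ADDR];
static uint32_t write_addr = MIN_ADDR;

/* Called once per 40 kHz timer tick: store the newest ADC sample and
   return the oldest buffered sample for the DAC. The read location
   trails the write location by the full MIN_ADDR..MAX_ADDR span. */
uint8_t buffer_step(uint8_t adc_sample)
{
    sram[write_addr] = adc_sample;

    uint32_t read_addr = write_addr + 1;
    if (read_addr >= MAX_ADDR)
        read_addr = MIN_ADDR;          /* wrap around the buffer range */

    uint8_t out = sram[read_addr];
    write_addr = read_addr;            /* both pointers advance together */
    return out;
}
```

On the real hardware the array accesses are SPI transactions to the serial SRAM, which is what consumed most of the CPU cycles and motivated the second PIC32.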
The major consideration that affected our tile construction was a desire for resiliency. Because users would probably stomp on each of the tiles fairly hard, we wanted to make sure that our press detection system could withstand a lot of force. We also wanted a simple, easy solution to mock-up and build.
Initially we looked into using strain gauges, but they would have required mounting to a base plate for the tile to push against. Traditional buttons did not seem like a robust enough option. Instead, we decided to use force-sensitive resistors (FSRs). Initial testing revealed that an unpressed FSR had a resistance of approximately 6 MΩ; when pressed, it dropped to approximately 1 kΩ. This huge discrepancy made it easy to detect a press. We thank Interlink Electronics, who were gracious enough to donate 10 FSR402s for use in our project.
Because the FSRs were small in area (less than 1 square inch), we added two to each tile to make detection more robust. We used a plywood plank as our first version of the tile, but found it was too heavy, so we switched to canvas boards, which were significantly lighter. We hot-glued 1/4-inch nuts to each of the FSRs and placed them near the center, along one diagonal of the board. We then hot-glued washers and nuts to each of the four corners. With this setup, the resistors typically had no force applied to them, because they barely rested on the floor. Once the board was stepped on, however, it bent slightly and the FSRs were squeezed. We wired the two resistors in parallel, then in series with a 10 kΩ resistor. Running 3.3 V across the setup and measuring the voltage at the intersection point gave us levels of roughly 0 V when unpressed and 3.2 V when pressed, which meant we could wire each tile directly to a digital pin. We constructed four of these, wired them to a protoboard hidden underneath the center tile, then fed a ribbon cable up to our main protoboard.
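The divider arithmetic can be checked with a short sketch; the function name and the assumption of two equal FSR resistances are ours.

```c
/* Node voltage of the press-detect divider: two equal FSRs in
   parallel on the 3.3 V side, in series with a 10 kOhm resistor
   to ground. The node is measured across the 10 kOhm resistor. */
double divider_voltage(double r_fsr_each)
{
    const double vdd = 3.3;
    const double r_fixed = 10e3;
    double r_pair = r_fsr_each / 2.0;          /* two equal FSRs in parallel */
    return vdd * r_fixed / (r_pair + r_fixed); /* voltage at the junction */
}
```

With 6 MΩ per FSR the node sits near 0 V; with about 1 kΩ per FSR it rises above 3 V, matching the roughly 0 V/3.2 V levels observed and comfortably clearing a digital input's logic-high threshold.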
It was critical that the audio processing completed quickly and allowed us to distinguish beats in the audio signal. We were inspired to look at wavelet transforms by a class Michael took, and found a proof of concept in a paper on beat detection. Wavelet transforms are a way to quickly assess the frequency components of a signal, with the best possible time resolution at each frequency (in contrast to a typical Fourier transform). They are attractive for an embedded-systems application like this one because they are relatively computationally inexpensive, for reasons explained later.
We specifically use the discrete wavelet transform (DWT). It is performed on a discrete time signal using a simple non-periodic function known as “the mother wavelet, Ψ” (Figure 3). There are many different kinds of mother wavelets, but the simplest is the Haar wavelet—essentially a square pulse, partially reflected along the x-axis. When the Haar wavelet is used, the math to compute each layer of the DWT is fairly simple, requiring only a few multiply-accumulates (MACs). Moreover, these calculations are recursive and extremely efficient.
The DWT outputs a series of coefficient values, which correspond to the energy of the signal at various frequencies. A signal of length L = 2^N will have N sets of coefficients. Each set of coefficients is calculated recursively from the previous set, so each has half as many elements as the previous—the first set has L/2 elements. Together, these can be used to reconstruct the signal or, in our case, to get a general understanding of its frequency components. Each layer of the transform—that is, each set of coefficients—represents a different scale of frequencies. The lowest layers of the transform, which have the most elements, represent the highest frequencies, and vice versa.
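As a concrete illustration, here is a minimal integer Haar DWT that produces this layered coefficient layout. It is a sketch under our own simplifications (unnormalized sums and differences, a fixed 2,048-element scratch buffer), not the stripped-down GSL code the project actually used.

```c
#include <stddef.h>
#include <stdint.h>

/* Full Haar DWT on x[0..n), n a power of two (n <= 2048), in place.
   Each pass replaces the current approximation of length len with
   len/2 approximation values (pairwise sums) followed by len/2
   detail values (pairwise differences). After the final pass, x[0]
   holds the overall sum and the remaining entries hold the detail
   layers, coarsest (lowest frequency) first, finest last. */
void haar_dwt(int32_t *x, size_t n)
{
    static int32_t tmp[2048];                  /* scratch, no heap needed */
    for (size_t len = n; len >= 2; len /= 2) {
        for (size_t i = 0; i < len / 2; i++) {
            tmp[i]         = x[2*i] + x[2*i + 1];  /* approximation */
            tmp[len/2 + i] = x[2*i] - x[2*i + 1];  /* detail        */
        }
        for (size_t i = 0; i < len; i++)
            x[i] = tmp[i];
    }
}
```

In this layout the highest-frequency (finest) detail layer occupies the top half of the array, mirroring the statement above that the layers with the most elements represent the highest frequencies.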
Our initial idea was to perform the wavelet transform and then set threshold values for each set of coefficients. If the transform produced a value above one of those thresholds, a beat would be detected. Our final implementation incorporated active averaging to set dynamic thresholds for these detections.
We adapted the GNU Scientific Library (GSL), an open-source scientific computing library, for use in our project. It provides a variety of well-documented wavelet functions written in C, including the DWT. We had to make several modifications, though. We stripped the library of everything we did not need (2D wavelet transforms, wavelets besides the Haar wavelet, needless dependencies and so on). The top priority was to modify GSL's DWT to no longer allocate any memory dynamically, because the PIC32 MCU does not have heap space defined. Instead, our implementation allocates the memory beforehand, by declaring global variables of the sizes needed to compute the DWT. The DWT is always computed on an array of length 2,048, so arrays of constant size can be used. We also converted all the math except constants to integers, because a large dynamic range was not a priority.
We performed the DWT approximately 10 times a second, giving us good time resolution for beat detection. Sampling at 40 kHz, this worked out to about 4,096 samples for each transform. The function requires an input array whose length is a power of 2. We downsample by 2, giving us 2,048 samples and 11 layers of coefficients. When a buffer of samples fills, we trigger the DWT computation. GSL's DWT returns the coefficients in one large array, so we defined an absmax function that returns the maximum absolute coefficient value in a specified range. This allows us to find the “peak intensity” of each coefficient layer.
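The absmax function is a simple scan over a slice of the flat coefficient array; the exact signature below is our assumption, not the project's code.

```c
#include <stddef.h>
#include <stdint.h>

/* Peak intensity of one coefficient layer: the largest absolute
   value in coeffs[lo..hi) of the flat array returned by the DWT. */
int32_t absmax(const int32_t *coeffs, size_t lo, size_t hi)
{
    int32_t peak = 0;
    for (size_t i = lo; i < hi; i++) {
        int32_t v = coeffs[i] < 0 ? -coeffs[i] : coeffs[i];
        if (v > peak)
            peak = v;
    }
    return peak;
}
```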
We then take those peaks and compare them to our thresholds. Figure 4 shows the different levels of recursion of the DWT, performed on a frequency sweep. We have two thresholds for each coefficient layer, one active and one passive. If the absmax value is greater than the sum of the active and passive thresholds, a beat is detected. The passive thresholds were set by trial and error during testing. The active ones are essentially a running average of the past four or so absmax values from that coefficient layer, implemented through an integer approximation of an RC filter: ((old_val - new_val) >> 1) + ((old_val - new_val) >> 2) + new_val. We then send the detected beats to the laptop for the GUI and scoring. Each coefficient layer is assigned to one of four possible arrow instructions (left/right/up/down), and if a beat is detected in that layer, that particular arrow is sent. This creates a pleasing level of variance in our gameplay.
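In code, the threshold update and beat test might look like the following; the function names are ours, and the right-shift of a negative difference relies on arithmetic shifts, which the PIC32 toolchain provides. The update is equivalent to new_threshold = 0.75*old + 0.25*peak, a cheap exponential moving average.

```c
#include <stdint.h>

/* Active threshold update: integer approximation of an RC low-pass,
   equivalent to 0.75*old_val + 0.25*new_val. */
int32_t update_active_threshold(int32_t old_val, int32_t new_val)
{
    int32_t diff = old_val - new_val;
    return (diff >> 1) + (diff >> 2) + new_val;
}

/* A beat in a layer fires when its peak exceeds active + passive. */
int beat_detected(int32_t peak, int32_t active, int32_t passive)
{
    return peak > active + passive;
}
```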
PIC32 Software Architecture: The audio-buffering PIC32 was built around a simple timer interrupt that fired at 40 kHz, during which it would read the signal from the ADC, write it to the SRAM, read a delayed sample from the SRAM and write that value to the DAC. The SPI communication to serial memory took approximately 70% of available CPU cycles, because we could not use memory burst mode; this is what required the use of a secondary PIC32.
Our processing PIC32 ran a similar interrupt that also fired at 40 kHz and read the ADC value. The values were fed into a static array in memory, and when it filled, a flag was set for the wavelet transform to begin. In addition, this PIC32 received user input from the dance mat to send to the macOS application. A protothread ran every 60 ms, took any user input data or new dance instructions, and spawned a serial write thread, which used DMA to write to the serial UART buffer. Luckily, we had only four arrows, so we could send new instructions in 4 bits and user presses in 4 bits, for a total of just 1 byte transferred over serial.
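The 1-byte frame could be packed and unpacked as below; the nibble assignment (arrows in the high nibble, presses in the low nibble) is our assumption for illustration.

```c
#include <stdint.h>

/* Pack one serial frame: four instruction-arrow bits in the high
   nibble, four dance-mat press bits in the low nibble. */
uint8_t pack_frame(uint8_t arrows, uint8_t presses)
{
    return (uint8_t)(((arrows & 0x0F) << 4) | (presses & 0x0F));
}

/* Split a received frame back into its two 4-bit fields. */
void unpack_frame(uint8_t frame, uint8_t *arrows, uint8_t *presses)
{
    *arrows  = frame >> 4;
    *presses = frame & 0x0F;
}
```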
macOS Application: The macOS app reads data through serial UART. We first had to install a driver for the specific serial cable bought for the project. To consume the incoming UART data, we used a framework called ORSSerialPort, developed by Andrew Madsen. This framework allowed us to use a common pattern in Cocoa development called the delegate pattern. We created an ORSSerialPort object and set its delegate to be our app's view controller. Subsequently, an interface function, implemented as part of the delegate protocol, would be called every time there was new data to read. This made reading from the PIC very simple, because each time that method was called we knew more information had been sent from the PIC.
The application also displays the arrow sequences and user input; this was done by adapting the simple CoreAnimation game library developed by Matt Gallagher. This library offers simple on-screen game sprites, centralized game data, and CoreAnimation layers for the game objects—all synchronized with NSTimers and Key-Value Observation. The game data starts a timer upon a new game, which calls update functions for each game object currently in the data. Each time an object updates its properties, the value change notifies the game sprite's attached CoreAnimation layer to update on the screen. We adapted the library for our arrow sprites, and the game data now starts a new game with outline arrows and a score of zero. We also modified how sprites were removed from the screen, using CATransactions rather than removing them abruptly, which had caused an odd lag as off-screen arrows were removed.
Combining these two libraries, we were able to make a simple display for our game, using open-source images from the Internet and colorful gradients made in photo-editing software. We also added some game logic for scoring: points are inversely proportional to the distance between the dance-arrow instruction and the user-input arrow display. The score accumulates as users correctly time their killer dance moves.
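An inversely proportional score of this kind might look like the following sketch; the constants and falloff shape are illustrative, not the app's actual numbers.

```c
/* Points for one hit, falling off inversely with the on-screen
   distance (in pixels) between the instruction arrow and the
   user's press indicator. Constants are illustrative. */
int score_for_hit(double distance_px)
{
    const double max_points = 100.0;
    const double falloff_px = 10.0;   /* distance at which points halve */
    if (distance_px < 0.0)
        distance_px = -distance_px;   /* early and late misses score alike */
    return (int)(max_points / (1.0 + distance_px / falloff_px));
}
```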
The final product is exciting. The audio buffering worked extremely well, user input was clearly visible on the screen, beat detection worked, and the macOS app accurately kept track of the score. Our floor tiles were somewhat difficult to use, because they felt fragile to users who stepped on them. To prevent anything from breaking, much of the final testing was performed by stepping on the tiles while standing next to the system—essentially being as delicate as possible. The game was definitely functional, though. Both authors found the game to be exceptionally difficult. Figure 5 shows the macOS application in action.
The beat detection algorithm was remarkably successful. We tested it with a wide variety of music across genres. Once the algorithm was implemented on the PIC and arrows were being sent to the monitor, we used a simple trial-and-error system of testing. To calibrate timing, we would play one drum beat, then measure the time between when it played on the speakers and when the generated arrow reached the top of the screen. We adjusted the buffer size on the PIC and the speed of the arrows on the Mac to achieve synchronous behavior. Once this was done, we simply played as much music as we could. Early attempts revealed that preset thresholds were not effective for beat detection across many different types of music. We then implemented the active averaging system to detect beats, which was significantly more versatile.
Our algorithm is highly effective at detecting beats and generating interesting arrow patterns when distinctive low-frequency components in the music manifest themselves in short beats. Playing classic rock and other music with loud, repetitive drums (AC/DC's “Highway to Hell”; Darius Rucker's “Wagon Wheel”) results in clear beat detection as soon as the drums kick in. Other sounds (guitar modulation, vocals and so on) result in variations in the arrow sequences. Our system also performs well with electronic music (Mr. Fingers' “Mystery of Love”; LCD Soundsystem's “Dance Yrself Clean”). These songs also have distinctive drums or quick bass notes that make up the beat of the song. Analog drums are fairly constant in tone from hit to hit. But because the bass notes here are electronic and can actually have a melody, they can produce really interesting, fun-to-play patterns.
As we expected at the outset of the project, certain genres of music did not work well with our system. We had two different types of failure cases. In the first case, no beats were detected. This occurred when a piece of music was slower and had few instances of “spikes in the signal,” so to speak. The discrete wavelet transform essentially detects high-energy transitions that look like the mother wavelet, and these are much less common in songs without modern bass. Classical music therefore fails to have beats detected. We also had poor results with jazz (Louis Armstrong, Dave Brubeck).
The other failure case we encountered was when too many beats were detected. This occurred primarily when bass notes were too long, or the music simply had “too much going on.” We consistently failed to generate anything remotely usable when playing hip-hop music, because long 808 bass notes caused our beat-detection system to freak out, generating far too many notes. Quick hi-hats also resulted in many beat detections. This oversaturation made the game unplayable.
Despite the quirks in our final result, we were extremely pleased with our project as a whole and the promise it showed for games of this style. We were happy with the sensitivity achieved with the averaged thresholds based on the results of the wavelet transform, and how the variations in the music could provide variations from the bassline to make the game more interesting.
References:
S. Carroll, PIC32MX250F128B Development Boards
B. Land, SPI to 23LC1024 Serial RAM & DAC
G. Tzanetakis, “Audio Analysis Using the Discrete Wavelet Transform,” Princeton University
A. Dunkels and B. Land, Protothreads (with modifications)
A. Madsen, ORSSerialPort, https://github.com/armadsen/ORSSerialPort
M. Gallagher, Core Animation Library
AC/DC, “Highway to Hell,” Highway to Hell, 1979, Spotify
Darius Rucker, “Wagon Wheel,” True Believers, 2013, Spotify
Mr. Fingers, “Mystery of Love,” Amnesia, 1989, Spotify
LCD Soundsystem, “Dance Yrself Clean,” This Is Happening, 2010, Spotify
GNU Scientific Library, https://github.com/ampl/gsl
B. Land, ECE 4760, http://people.ece.cornell.edu/land/courses/ece4760/
Dunne and M. Solomentsev, “Dance Dance Evolution,” 2017, http://people.ece.cornell.edu/land/courses/ece4760/FinalProjects/f2017/asd222_mys29/asd222_mys29/asd222_mys29/index.html
FSR402 Force Sensitive Resistor, Interlink Electronics, https://www.interlinkelectronics.com/
PUBLISHED IN CIRCUIT CELLAR MAGAZINE • NOVEMBER 2018 #340