Music Using an MCU
Gone are the days when even a basic music synthesizer was a bulky system requiring highly specialized design knowledge. These two Cornell students developed a portable music synthesizer using a Microchip PIC32 MCU. The portable system performs digital audio synthesis on the fly, and produces sounds that range from simple sine waves to heavily modulated waveforms.
We developed a small but powerful music synthesizer for musicians who want to make music on the go. Many current synthesizers are large and bulky, and contain a limited number of preset instruments. These limitations can obstruct creative moments and hinder experimentation. With that in mind, we wanted to create a portable device that can generate an infinite array of sounds. This way, we could provide musical inspiration for our users, wherever they may be. Our two main goals were to design compact hardware and create a powerful and flexible sound synthesis algorithm.
The system contains a full-octave (13-key) keyboard, and allows the user to play two notes simultaneously. We based the sound synthesis algorithm on frequency modulation (FM) synthesis, because it is simple to implement, yet it is capable of generating complex waveforms. The sound synthesis has 15 user-adjustable parameters, which can dramatically affect the resulting wave. These parameters are displayed on an LCD screen and can be adjusted with three rotary encoders that double as push buttons (Figure 1). The user needs only to plug in a power cable and either speakers or headphones to start experimenting with sounds!
An important consideration in our design was how the user interacts with the system. We wanted the user interface to be intuitive for a musician. We organized the 13-key keyboard in the shape of a traditional piano keyboard, starting at the C note. Additionally, we used rotary encoders for all variable inputs. These are used in many synthesizers and other musical devices to adjust parameters.
On our LCD screen, three parameters are displayed at a time, which correspond to the three rotary encoders. The 15 adjustable parameters were organized into four different categories or “screens,” which are: Main Settings, Waveform Designer, Main Envelope and FM Envelope (Figure 2). The user can toggle through these four screens by pressing down on two of the rotary encoders simultaneously. Individual parameters on each screen can be toggled through by pressing down on a single rotary encoder. That allows the user to easily cycle between the screens, and quickly find an individual parameter to adjust.
Our project is based on the Microchip PIC32MX250F128B microcontroller (MCU). The PIC32 provided all the computational power necessary to interpret user input and produce the appropriate output. This MCU was integrated into a development board created by Sean Carroll, and the entire development board was enclosed within our project. Beyond the PIC32 MCU, several noteworthy hardware elements were integrated into our design. A full schematic of our system’s hardware is shown in Figure 3.
The first elements were the push buttons, which enable the user to play notes and navigate on the LCD screen. We used 13 push buttons for our full-octave keyboard. An additional three push buttons were built into the rotary encoders. To reduce the number of I/O pins required to handle 16 push buttons, we wired the push buttons in a 4 × 4 matrix configuration. A schematic of our push-button matrix, along with a diagram of how the push buttons were arranged in our final design, is shown in Figure 4. While the buttons were wired as a matrix, they were physically arranged in a more linear manner to represent a keyboard.
The four rows in the Figure 4 schematic were each connected to a separate port expander I/O pin configured as an output. The four columns were connected to separate port expander I/O pins configured as inputs. Each intersection represents a different push button. To detect button presses, logic-low pulses were sequentially sent onto the rows. The input pins connected to the columns continuously monitored for logic-low signals.
When an input pin detected a logic-low signal, the system knew a button had been pressed. To identify which button had been pressed, the system simply had to note which output pin was sending the logic-low pulse when the signal was detected at the input pin. Therefore, the button at the intersection of the row connected to the output pin and the column connected to the input pin must have been the button pressed.
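The scanning procedure described above can be sketched in C as follows. This is a minimal, hardware-independent illustration rather than our firmware: the `pressed` array and `read_columns` function are hypothetical stand-ins for the real port-expander reads, and the sketch assumes that idle column inputs read logic-high.

```c
#include <stdint.h>
#include <stdbool.h>

#define ROWS 4
#define COLS 4

/* Hypothetical stand-in for the hardware: pressed[r][c] is true while the
   button at row r, column c is held down. */
static bool pressed[ROWS][COLS];

/* Simulated column read. With one row driven low, a column input reads low
   (bit = 0) only if the button joining it to that row is pressed. Idle
   columns are assumed to read high. */
static uint8_t read_columns(int driven_row)
{
    uint8_t cols = 0x0F;
    for (int c = 0; c < COLS; c++) {
        if (pressed[driven_row][c]) {
            cols &= (uint8_t)~(1u << c);
        }
    }
    return cols;
}

/* Pulse each row low in turn and note which columns read low.
   Bit (r * COLS + c) of the result is set when button (r, c) is down. */
uint16_t scan_matrix(void)
{
    uint16_t mask = 0;
    for (int r = 0; r < ROWS; r++) {
        uint8_t cols = read_columns(r);
        for (int c = 0; c < COLS; c++) {
            if ((cols & (1u << c)) == 0) {
                mask |= (uint16_t)(1u << (r * COLS + c));
            }
        }
    }
    return mask;
}
```

Returning a single 16-bit mask keeps the result compact: the caller can compare it against the previous scan to find newly pressed or released buttons.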
In addition to the push buttons, we used rotary encoders to accept user input. Their primary function in our design was to control the adjustment of system parameters. As shown in Figure 2, each system screen displayed three parameters. Each parameter could be adjusted by rotating the corresponding rotary encoder through its 24 equally spaced angular positions.
The rotary encoders had three pins: two signal pins and one ground pin. For each rotary encoder, we configured two I/O pins as inputs, and connected them to the two signal pins, for a total of six I/O pins. To decode the output from these encoders, we connected internal pull-up resistors to each of the six input pins. This meant these pins would be at logic-high when the encoders were in an idle state. When the user rotated one of the encoders to its next angular position, each signal pin contacted the ground pin during the transition. The critical detail here is that the signal pins do not come into contact with the ground pin at exactly the same time. During a clockwise rotation, signal pin A comes into and out of contact with the ground pin before signal pin B. During a counterclockwise rotation, the opposite is true—B comes into and out of contact with the ground pin first. The state of each signal pin during rotations is shown in Figure 5.
The logic to distinguish between clockwise and counterclockwise rotations is fairly simple. When a transition from logic-high to logic-low is detected on pin A, the state of pin B is checked. If B is logic-high, the rotation is clockwise. If B is logic-low, the rotation is counterclockwise. Once a rotation has been detected and distinguished, it can be used to increase or decrease the corresponding system parameter. This type of encoding scheme is known as “quadrature encoding.”
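The direction test described above fits in a few lines of C. The sketch below is illustrative only: the `encoder_t` type and `encoder_step` function are hypothetical names, and the pin levels are passed in as booleans so the logic can be exercised without hardware.

```c
#include <stdbool.h>

/* Per-encoder state: the level of pin A at the previous sample. */
typedef struct {
    bool prev_a;
} encoder_t;

/* Call once per sample with the current levels of pins A and B.
   Returns +1 for a clockwise step, -1 for a counterclockwise step,
   and 0 when no falling edge on A was seen. */
int encoder_step(encoder_t *enc, bool a, bool b)
{
    int step = 0;
    if (enc->prev_a && !a) {      /* logic-high to logic-low on pin A */
        step = b ? +1 : -1;       /* B still high means clockwise */
    }
    enc->prev_a = a;
    return step;
}
```

The returned step can be added directly to the parameter assigned to that encoder. This sketch does no debouncing, so on real hardware it would need a filter in front of it.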
Another important consideration was how to ensure that the system had enough I/O pins. To avoid running out, we added a port expander to our design. This port expander used SPI to communicate with the PIC32 MCU, and provided 16 extra I/O pins. We could read it about a million times a second, which was more than adequate for our purposes. We used it for all eight pins needed for our push-button matrix, and for the six signal pins of the rotary encoders.
There were three key elements for output in our design: an SPI DAC, an audio socket and a TFT LCD. The digital audio signals produced by our system were output to the DAC via SPI. The DAC itself was responsible for converting the digital signals into analog signals capable of being played on a set of speakers or headphones. Once converted, the analog signals were sent to a basic 3.5 mm audio socket. The user could then plug speakers or headphones into the audio socket and hear the synthesized sounds. We used a TFT LCD for visual output in our design. The LCD had 320 × 240 color pixels, and communicated with the PIC32 via SPI.
The last noteworthy hardware element of our design was the enclosure. Creating a fully enclosed system was another one of our primary design goals. We felt a custom enclosure was needed to make the system as compact as possible and to protect the internal electronics. We 3D-printed the enclosure, and made the design using Autodesk Tinkercad. To give an idea of the compactness of the final enclosure, the maximum dimensions were 18.8 cm × 10.6 cm × 2.3 cm, and some sections were smaller than these maximums. We were pleased to be able to enclose all system components in such a small package, because this made the system easily portable.
The software was organized into three major threads: the Sound Generation Interrupt Service Routine (ISR), the Input Thread and the Update Screen Thread (Figure 6). The main focus of the software design was to optimize the execution time of each thread, since we were generating the audio in real time. Therefore, we had to make our FM synthesis algorithm very fast. Additionally, we used quick debouncing algorithms to accurately read the rotary encoder inputs. We also moved slow calculations with floating-point numbers and divisions to the slower Input Thread, which only needs to run on a millisecond scale, fast enough to stay within human reaction time. Finally, we updated the LCD screen only when necessary.
Sound Generation ISR: The integral part of our synthesizer is the fast Direct Digital Synthesis (DDS) algorithm that is run in the Sound Generation ISR. The ISR is triggered by a hardware timer to run every 2,000 cycles, which results in a 20 kHz sampling rate. At this sampling rate, we can generate frequencies up to 10 kHz, which is half the sampling rate. Although humans can hear up to 20 kHz, we chose this rate because most people can’t discern differences in the 10-20 kHz range. Additionally, this allows our notes to range up to a C8 note (4,186 Hz), with room for harmonic frequencies. Every 2,000 cycles, the ISR reads all the current values of the various parameters, and performs quick, fixed-point arithmetic to calculate the next sample to output to the DAC over SPI.
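The timing figures above can be checked with a short calculation. The 40 MHz clock used here is not stated in the text; it is the value implied by an interrupt every 2,000 cycles producing a 20 kHz rate, so treat it as an assumption. The symbols are chosen for illustration.

```latex
f_s = \frac{f_{\text{clk}}}{N} = \frac{40\ \text{MHz}}{2000} = 20\ \text{kHz},
\qquad
f_{\max} = \frac{f_s}{2} = 10\ \text{kHz}
```

Here N is the number of cycles between interrupts and f_max is the highest frequency that can be represented at that sampling rate.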
But how do we calculate the next sample to output? To create a wide variety of sounds, we use FM synthesis. Our FM synthesis algorithm is based on two oscillators: a carrier and a modulator. The carrier wave corresponds to the main pitch of the note we want to generate. For example, 440 Hz is an A4 note. The modulator wave alters the frequency of the carrier wave, which creates harmonic frequencies and drastically affects the timbre of the note. To allow even more variation, we apply amplitude modulation to both the carrier and modulator, which affects the volume of the sound over time. This amplitude modulation is defined by an ADSR envelope, which stands for “attack, decay, sustain and release.”
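To make the ADSR idea concrete, here is a minimal sketch of a linear envelope in C. It is an illustration under assumptions, not our implementation: the `adsr_t` fields, the function names and the use of `float` are all hypothetical, and a real-time version would more likely use fixed-point steps.

```c
/* Hypothetical envelope description; times are in seconds. */
typedef struct {
    float attack_s;   /* time to rise from 0 to full level (1.0) */
    float decay_s;    /* time to fall from 1.0 to the sustain level */
    float sustain;    /* level held while the key stays down, 0..1 */
    float release_s;  /* time to fall to 0 after the key is released */
} adsr_t;

/* Envelope level t seconds after key-down, while the key is held. */
float adsr_held(const adsr_t *e, float t)
{
    if (t < e->attack_s) {
        return t / e->attack_s;                      /* attack ramp */
    }
    t -= e->attack_s;
    if (t < e->decay_s) {
        return 1.0f - (1.0f - e->sustain) * (t / e->decay_s);  /* decay */
    }
    return e->sustain;                               /* sustain plateau */
}

/* Envelope level t seconds after key-up, falling linearly from the
   level the envelope had at the moment of release. */
float adsr_released(const adsr_t *e, float level_at_release, float t)
{
    if (t >= e->release_s) {
        return 0.0f;
    }
    return level_at_release * (1.0f - t / e->release_s);
}
```

Multiplying each output sample by the current envelope level shapes the volume of a note over time. Applying a second envelope to the modulator changes the timbre over time instead.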
The output of our FM synthesis is determined by two interacting waves: the frequency of the carrier wave function (wave_c) is affected by the modulating wave (wave_m), and both the carrier and modulator waves have amplitudes that change over time.
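A common textbook form of this relationship is sketched below. The symbols A_c, A_m, f_c and f_m (carrier and modulator envelopes and frequencies) are assumed here for illustration, and the expression may differ in detail from the one used in the project.

```latex
y(t) = A_c(t)\,\mathrm{wave}_c\!\bigl(2\pi f_c t + A_m(t)\,\mathrm{wave}_m(2\pi f_m t)\bigr)
```

When A_m(t) is zero, the output reduces to the plain carrier wave scaled by its envelope. Larger values of A_m(t) push more energy into the harmonic sidebands.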
Therefore, to perform FM synthesis, we need to create two waves. Various waveforms (sine, square, sawtooth and noise) are stored in wave look-up tables. To choose the correct value in the wave table, we use 32-bit phase accumulators and phase increments for both the carrier and modulator. To generate a fixed-frequency wave, the constant phase increment is added to the phase accumulator every ISR call. A large phase increment steps quickly through the wave look-up table, generating a high-frequency wave, and a small phase increment creates a low-frequency wave. Using this method, we create the modulator wave. The carrier wave is generated in a similar way, but a scaled value of the modulator wave is added to the phase increment of the carrier wave. This is how the modulator wave changes the frequency of the carrier wave.
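The phase-accumulator scheme can be sketched as follows. This is a simplified illustration and not the project code: the 256-entry sawtooth table, the `osc_t` type, the function names and the `mod_depth` scale factor are all assumptions made for the example. The 20 kHz sample rate comes from the text.

```c
#include <stdint.h>

#define TABLE_BITS     8
#define TABLE_SIZE     (1u << TABLE_BITS)
#define SAMPLE_RATE_HZ 20000.0

/* One look-up table; a sawtooth is used so it can be built with integers. */
static int16_t saw_table[TABLE_SIZE];

typedef struct {
    uint32_t phase;      /* 32-bit phase accumulator, wraps at 2^32 */
    uint32_t increment;  /* phase step added once per sample */
} osc_t;

void tables_init(void)
{
    for (uint32_t i = 0; i < TABLE_SIZE; i++) {
        saw_table[i] = (int16_t)(((int32_t)i - 128) * 16);  /* -2048..2032 */
    }
}

/* Slow path with floating-point math, intended for a non-ISR thread:
   increment = f * 2^32 / sample rate. */
uint32_t phase_increment(double freq_hz)
{
    return (uint32_t)(freq_hz * 4294967296.0 / SAMPLE_RATE_HZ);
}

/* Fast path, called once per sample: integer add, shift, table look-up.
   mod_depth scales the modulator; keep it below about 2^19 so the
   32-bit product cannot overflow. */
int16_t fm_next_sample(osc_t *carrier, osc_t *modulator, int32_t mod_depth)
{
    modulator->phase += modulator->increment;
    int16_t m = saw_table[modulator->phase >> (32 - TABLE_BITS)];

    /* The scaled modulator value is added to the carrier's increment. */
    carrier->phase += carrier->increment + (uint32_t)((int32_t)m * mod_depth);
    return saw_table[carrier->phase >> (32 - TABLE_BITS)];
}
```

Only the top 8 bits of the accumulator select the table entry. The remaining 24 bits act as fractional phase, which is what gives fine frequency resolution from a small table.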
The use of phase accumulators/increments makes our sound generation very efficient. All the phase increments are calculated in a separate thread, since they involve slow float division. Therefore, the ISR only needs to do a quick integer addition, a bit shift and a look-up to an array. Due to the many user-adjustable parameters, the ISR also performs several multiplications to determine the output volume, modulator volume and ADSR envelopes. The values of these parameters range from 0 to 1, and are GCC standard, 32-bit, fixed-point variables with 16 integer bits and 15 fraction bits. Fixed-point multiplication was used to decrease the execution time. Multiplication with fixed-point variables takes about 28 cycles, compared to about 55 cycles for float multiplication.
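For readers without a compiler that supports GCC's fixed-point types, the same 16.15 format can be emulated with plain integers. The sketch below is such an emulation: the `fix15_t` type and helper names are hypothetical, and it is not the code that ran on the PIC32.

```c
#include <stdint.h>

/* 32-bit fixed point: 1 sign bit, 16 integer bits, 15 fraction bits. */
typedef int32_t fix15_t;

#define FIX15_ONE ((fix15_t)1 << 15)   /* the value 1.0 */

/* Conversions use floating point, so they belong outside the ISR. */
static inline fix15_t fix15_from_double(double x)
{
    return (fix15_t)(x * 32768.0);
}

static inline double fix15_to_double(fix15_t x)
{
    return (double)x / 32768.0;
}

/* Multiply two 16.15 values. The 64-bit product carries 30 fraction
   bits, so shifting right by 15 restores the 16.15 format. Right-shifting
   a negative value is implementation-defined in C; it is arithmetic on
   common compilers, and envelope values in 0..1 are never negative. */
static inline fix15_t fix15_mul(fix15_t a, fix15_t b)
{
    return (fix15_t)(((int64_t)a * (int64_t)b) >> 15);
}
```

For example, scaling a sample by an envelope level of 0.5 becomes `fix15_mul(sample, fix15_from_double(0.5))`, with the conversion done ahead of time in a slower thread.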
In addition to calculating the next sample to output, we also read the rotary encoder pins in the ISR. This was necessary because reading in a millisecond scale thread was too slow to capture every turn of the encoder. The rotary encoders are connected to the port expander, so we read the bits over SPI. However, rotary encoders are noisy, so we needed to use a fast digital debounce filter. Each time the ISR is called, the reading of the A terminal of the rotary encoder is shifted left into an integer value.
We continuously check whether this integer is equal to -4,096. As a 32-bit two’s-complement value, that is 0xFFFFF000, so its low 13 bits hold a 1 followed by 12 zeros. This state ensures that we’ve received a logic-high reading followed by twelve consecutive logic-low readings. With that, we know the signal is stable and no longer bouncing. Once this check passes, we can tell in which direction the encoder was turned, based on the B terminal. As explained in the Hardware Design section, if the B terminal is high, the turn was clockwise. Otherwise, the turn was counterclockwise. Finally, parameter values for the sound synthesis algorithm are updated based on these rotary encoder readings.
Input Thread: After sufficiently optimizing the ISR, we next focused on the Input Thread. This thread reads the push buttons so the correct notes are played, and also performs slower calculations like float multiplication and division. These tasks are grouped together, because they don’t need to be run as quickly as the ISR, but fast enough so the latency isn’t noticeable. Pianos inherently have a delay between when a key is pressed and when the hammer hits a string, so a latency of approximately 20 ms is acceptable to most musicians.
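A shift-register debounce of this kind can be sketched as shown below. The comparison value matches the -4,096 described above, but the rest is an assumption: in particular, forcing the upper bits to 1 on every sample (the `DEBOUNCE_FILL` mask) is one common way to make the comparison work, and the names are hypothetical.

```c
#include <stdint.h>
#include <stdbool.h>

/* History of recent readings of one pin, newest reading in bit 0. */
typedef struct {
    uint32_t history;
} debounce_t;

#define DEBOUNCE_FILL  0xFFFFE000u  /* forces bits 13..31 to 1 */
#define DEBOUNCE_MATCH 0xFFFFF000u  /* -4,096 when read as a signed 32-bit int */

/* Call once per sample with the current pin level. Returns true exactly
   once per clean falling edge: one high reading followed by twelve
   consecutive low readings. */
bool debounce_falling(debounce_t *d, bool level_high)
{
    d->history = (d->history << 1) | (level_high ? 1u : 0u) | DEBOUNCE_FILL;
    return d->history == DEBOUNCE_MATCH;
}
```

Any high reading during the twelve-sample window puts a 1 in the low bits and spoils the match, so contact bounce cannot trigger a step. At a 20 kHz call rate, twelve samples correspond to 0.6 ms of stable signal.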
Because both the push-button readings and the DAC data travel over SPI, we need to create a critical section whenever we access the port expander. We enter the critical section by turning off the Timer2 interrupt, so that the timer-driven ISR cannot interrupt the transfer. We then read all the push buttons on the keypad matrix, and the push buttons connected to the rotary encoders. This information is then used to play specific notes or to change which parameters the rotary encoders affect.
Update Screen Thread: The last thread that had to be optimized was the Update Screen thread. This was especially important, because if the calculations took too long, the screen would visibly jitter and not update correctly. The screen is updated at 10 frames/s, or once every 100 ms. Drawing to the screen takes many cycles, so we optimize by drawing only what is necessary. To do this, we keep track of which parameters have changed since the last time the frame was drawn. If nothing has changed, we don’t update the screen at all, which saves a lot of execution time. Otherwise, we draw any necessary changes.
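The change-tracking approach described above is often implemented with per-parameter "dirty" flags. The sketch below shows the idea under assumptions: the array names, `set_param`, `update_screen` and the `draw_calls` counter are hypothetical, and `draw_param` stands in for the real LCD drawing code.

```c
#include <stdbool.h>

#define NUM_PARAMS 15

static int  param_value[NUM_PARAMS];
static bool param_dirty[NUM_PARAMS];  /* true if changed since last frame */
static int  draw_calls;               /* counts redraws, for illustration */

/* Record a new value and mark the parameter dirty only if it changed. */
void set_param(int idx, int value)
{
    if (param_value[idx] != value) {
        param_value[idx] = value;
        param_dirty[idx] = true;
    }
}

/* Placeholder for the expensive LCD drawing routine. */
static void draw_param(int idx)
{
    (void)idx;
    draw_calls++;
}

/* Called once per frame. Redraws only the parameters that changed and
   returns how many were redrawn; returns 0 when nothing changed. */
int update_screen(void)
{
    int drawn = 0;
    for (int i = 0; i < NUM_PARAMS; i++) {
        if (param_dirty[i]) {
            draw_param(i);
            param_dirty[i] = false;
            drawn++;
        }
    }
    return drawn;
}
```

Writing the same value twice does not set the flag, so an encoder that is touched but ends up at the same setting costs no drawing time.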
With these many optimizations, we were able to create a powerful and flexible system. Audio was generated in real time with a 20 kHz sample rate. A user can play notes and make changes to the audio synthesis algorithm on the fly, with no discernible delay. The LCD screen also updates immediately to reflect any changes the user makes to the various parameters.
RESULTS & CONCLUSIONS
A video of our fully functional system is given in Figure 7. After we finished constructing our system, we needed to evaluate its performance. The first method used was to get visual confirmation of the waveforms produced, using an oscilloscope. Figure 8 shows the oscilloscope screen capture we obtained for a sine wave produced by our system, and a screen capture of a Fast Fourier Transform (FFT) for an A-440 note. Clearly, the system produced a clean sine wave with only small amounts of noise. The FFT shows that the magnitude of the fundamental frequency was about 30 dB greater than the magnitude of the harmonic frequencies. The oscilloscope screen captures also confirmed that the system produced square waves and sawtooth waves with very little noise.
We should mention here that another consideration in our design was whether or not to include a low-pass filter between the SPI DAC and the audio socket. It would have been relatively easy to add a simple RC, low-pass filter into our design, if we felt there were too many high-frequency components distorting the output. However, by considering the waveforms and FFTs observed on the oscilloscope, we determined that the output did not require a low-pass filter.
The next evaluation method was to quantify the frequency accuracy of our system. We used the FFT function on an oscilloscope to confirm the frequency content of our outputs. These oscilloscope readings showed us the fundamental frequency of each tone produced. We measured the fundamental frequencies of all 13 notes in an octave, and compared them to their target frequencies. By doing this, we found that the system’s average percent error between the target frequencies and the actual frequencies produced was only 0.423%. We were very pleased with this level of accuracy.
We also wanted an estimate of our maximum CPU load, to give us an idea of how efficiently our code ran. To do this, we calculated the number of cycles the system took to execute the Sound Generation ISR, and saved this value in a variable. As stated in the Software Design section, this ISR was configured to run at 20 kHz. This meant a timer interrupt triggered its execution every 2,000 cycles. By printing the variable on the LCD, we found that a maximum of 1,200 cycles were needed to execute the ISR. This maximum execution time occurred when two notes were being played simultaneously. Our maximum CPU load was roughly 1,200/2,000 or 60%.
SCREEN FRAME RATE
The final method of evaluation was measuring the frame rate of the screen under various system conditions. Our goal was to maintain a frame rate of at least 10 frames/s. The maximum frame rates for an assortment of system conditions are summarized in Table 1. Our desired frame rate could be met for every condition, except when changing between system screens. However, these screen changes occurred so quickly that any frame rate lag was almost undetectable by the user.
By using these methods to evaluate the system’s performance, we were able to confirm the desired functionality quantitatively. Overall, we achieved many of our initial goals. We created a system capable of producing a wide variety of high-quality, user-generated sounds. We believe the user interface was easy-to-use, and it demonstrated immediate responsiveness to user input. The system also was easily portable and simple to start up.
With all that said, extensions to the project are also possible. For example, the push-button matrix in our design was subject to something known as the “ghosting problem.” This prevented the user from reliably pressing more than two buttons simultaneously. One solution to this problem involves adding a diode at each intersection in the push-button matrix. This would enable the user to play chords by pressing as many buttons as desired, without getting unintended responses from the system. Other potential extensions include enabling the user to save sound presets, which would allow easy replication of sounds the user discovers, and adding a looping feature, which would allow the user to create a complete song.
PUBLISHED IN CIRCUIT CELLAR MAGAZINE • AUGUST 2019 #349
T.J. Hurd (firstname.lastname@example.org) is currently an undergraduate student at Cornell University studying Electrical and Computer Engineering. He’s passionate about music and the technologies behind it.
Ben Roberge (email@example.com) is currently an undergraduate student in the School of Electrical and Computer Engineering at Cornell University. His technical interests are in MCUs and embedded systems.