Build a Microcontroller-Based, Sight-Reading Assistant
A team of university students developed a real-time, sight-reading assistant that displays scrolling sheet music on a monitor and compares the notes to those played on a handmade piano keyboard. If the user plays a correct note at the right time, he or she gains points. If the user plays an incorrect note or plays a note too early or too late, he or she loses points.
Due to the popularity of music-based video games such as Guitar Hero and Rock Band, we built a sight-reading assistant for intermediate and advanced musicians. Our prototype displays scrolling sheet music on an LCD TV and compares the scrolling notes, in real time, to the notes a user plays on a handmade piano keyboard. If the user plays a correct note at the correct time, he or she earns points. If the user plays an incorrect note or plays a note too early or too late, he or she loses points. The keyboard is paired with custom gloves, which are required to play it. The handmade keyboard is connected to a microcontroller that monitors the user’s key presses and compares them to the notes that are supposed to be played. A second microcontroller displays properly formatted sheet music on the TV screen for sight-reading. We use a series of dot matrix notes to display sheet music for a song. In this article, we will describe the various technical aspects of our project in detail. Figure 1 is a diagram of the entire system.
Our sight-reading system is composed of four parts: the keyboard and gloves for playing music, the interface for displaying the sheet music and score, a system for sampling the keys that are being played, and the algorithms for generating realistic musical tones. We used two Microchip (formerly Atmel) ATmega1284 microcontrollers. One microcontroller is responsible for sampling the input data from the keyboard and synthesizing the music. The other microcontroller is responsible for generating the interface, displaying and updating the sheet music on an LCD TV via NTSC, and calculating scores based on the correctness of the user’s performance. The music is displayed on staves with a treble clef. This kind of musical display is standard among musicians, and the treble clef is often preferred by pianists.
The sheet music is displayed in the following fashion:
1. Note structures are created in order for the entire song.
2. Notes are placed sequentially until their total value is more than a full measure.
3. A measure (bar) line is placed and the beat is reset.
4. Steps 2 and 3 are repeated until the line is full.
5. A new line is started and steps 2 through 4 are repeated until the page is full (or the last note is drawn).
6. Staff lines and clefs are drawn to complete the page.
7. The display is halted until the user plays the entire page.
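The placement steps above can be sketched in C as follows. This is a minimal illustration, not the project's actual code: the `PageLayout` type, the `layout_page` function, the 4/4 time signature counted in sixteenth-note ticks, and the measures-per-line and lines-per-page limits are all assumptions made for the example. The sketch closes a measure when the running total reaches a full measure.

```c
#include <stddef.h>
#include <stdint.h>

#define TICKS_PER_MEASURE 16  /* assumed: 4/4 time counted in sixteenth notes */
#define MEASURES_PER_LINE 4   /* hypothetical layout limit */
#define LINES_PER_PAGE    3   /* hypothetical layout limit */

typedef struct {
    size_t next_note;   /* index of the first note that did not fit on the page */
    int    bar_lines;   /* number of bar lines placed on the page */
    int    lines_used;  /* number of staff lines that received notes */
} PageLayout;

/* Lay out one page starting at note index `first`.
 * ticks[i] is the duration of note i in sixteenth-note ticks. */
PageLayout layout_page(const uint8_t *ticks, size_t count, size_t first)
{
    PageLayout page = { first, 0, 0 };
    size_t i = first;

    for (int line = 0; line < LINES_PER_PAGE && i < count; line++) {
        int measures = 0;
        int beat = 0;
        page.lines_used++;
        while (measures < MEASURES_PER_LINE && i < count) {
            beat += ticks[i];              /* place the next note */
            i++;
            if (beat >= TICKS_PER_MEASURE) {
                page.bar_lines++;          /* place a bar line */
                measures++;
                beat = 0;                  /* reset the beat */
            }
        }
    }
    page.next_note = i;                    /* the next page resumes here */
    return page;
}
```

Calling `layout_page` again with `first` set to the returned `next_note` would lay out the following page once the user has played the current one.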
The interface was designed using a mapping of bit values. Each kind of note was essentially drawn as a square of 0s and 1s, with 1 being a white pixel and 0 being a black pixel.
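A bit-mapped note of this kind can be sketched in C as shown below. The 8 × 8 note-head pattern, the frame buffer dimensions, and the function names are hypothetical and chosen only to illustrate the 1 = white, 0 = black convention; they are not taken from the project or from the NTSC video library it uses.

```c
#include <stdint.h>

#define SCREEN_W 160   /* hypothetical frame width in pixels */
#define SCREEN_H 200   /* hypothetical frame height in pixels */

/* One bit per pixel, eight pixels per byte, leftmost pixel in the MSB. */
static uint8_t frame[SCREEN_H][SCREEN_W / 8];

/* Hypothetical filled note head: each byte is one row, 1 = white pixel. */
static const uint8_t NOTE_HEAD[8] = {
    0x00, 0x3C, 0x7E, 0xFF, 0xFF, 0x7E, 0x3C, 0x00
};

void set_pixel(int x, int y, int white)
{
    if (x < 0 || x >= SCREEN_W || y < 0 || y >= SCREEN_H) return;
    uint8_t mask = (uint8_t)(0x80u >> (x & 7));
    if (white) frame[y][x >> 3] |= mask;
    else       frame[y][x >> 3] &= (uint8_t)~mask;
}

int get_pixel(int x, int y)
{
    return (frame[y][x >> 3] >> (7 - (x & 7))) & 1;
}

/* Copy an 8x8 bit pattern into the frame buffer with its top-left at (x0, y0). */
void draw_sprite8(int x0, int y0, const uint8_t rows[8])
{
    for (int row = 0; row < 8; row++)
        for (int col = 0; col < 8; col++)
            set_pixel(x0 + col, y0 + row, (rows[row] >> (7 - col)) & 1);
}
```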
The real-time sheet music display and the score system give the user immediate feedback about his or her performance (see Photo 1). The feedback is relatively clear. If a note is played correctly, the score increases immediately by 10 points. What might make it difficult for the average pianist are the slightly wider key spaces and the lack of auditory tempo feedback. Currently, the beat is displayed on the screen, which makes it difficult to monitor while the user is also reading the music on the screen.
The game interface code utilizes the NTSC protocols to generate visuals on a screen with a 60-Hz refresh rate. Any logic for the displays or score generation is computed between screen refreshes, so the code must execute as quickly as possible. From the player’s perspective, the ultimate goal is to press the correct key at the correct time. Since there is intrinsic human imprecision in timing when pressing the keys, our program allows for about a 0.25-s window before and after the expected key press.
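The timing window can be expressed in frame counts: at 60 frames per second, 0.25 s corresponds to 15 frames. The sketch below is an illustration under assumptions, not the project's scoring code. The function name `score_delta` and the penalty of 5 points are hypothetical (the article states only that points are lost), while the 10-point reward and the 0.25-s window come from the text.

```c
#include <stdlib.h>

#define FRAME_RATE_HZ 60
#define WINDOW_FRAMES (FRAME_RATE_HZ / 4)  /* 0.25 s at 60 Hz = 15 frames */
#define REWARD  10                         /* points for a correct note */
#define PENALTY 5                          /* hypothetical; amount not given */

/* Return the score change for one key press.
 * Frames are counted from the start of the song. */
int score_delta(int expected_key, long expected_frame,
                int played_key, long played_frame)
{
    long offset = labs(played_frame - expected_frame);

    if (played_key == expected_key && offset <= WINDOW_FRAMES)
        return REWARD;      /* right key, inside the window */
    return -PENALTY;        /* wrong key, too early, or too late */
}
```

Because the comparison is a subtraction and one branch, it fits comfortably in the time between screen refreshes.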
Ideally, our program would be able to play multiple songs and let the user choose between them. Because we knew we did not have enough time to create a system that could read and parse files, we hardcoded the song in our program.
KEYBOARD & GLOVE DESIGN
The keyboard comprises keys that can play up to two octaves. Our range is from Middle C (C4) to B5. Our keyboard does not support sharps or flats. We cut out adhesive copper strips as piano keys and attached them onto a plastic board that acts as the keyboard. The keys are connected to 5 VCC. To play on the keyboard, the user needs to wear our complementary gloves, which are made of copper strips that are connected to ground (see Photo 2). When the user plays on the keyboard, the glove grounds the 5-V charged key, and the microcontroller detects this change in the key’s voltage to know that the key is pressed.
The 10-kΩ pull-up resistor of each key ensures that when one key is pressed and grounded the other keys remain charged. Figure 2 shows the keyboard’s circuit. (Note that the keyboard circuit schematic shows only keys C4 to F4.) Initially, instead of building our own keyboard, we were considering modifying a commercial piano keyboard by adding sensors at the bottom of the keys to detect key presses. However, after closely inspecting a commercial piano keyboard, we realized that modifying a commercial keyboard to work with our microcontroller would be onerous. In addition, we liked the idea of playing music with a set of gloves. The copper-strip coated gloves add a steampunk feeling to the entire project.
SYSTEM FOR SAMPLING INPUTS
To sample inputs in real time, the microcontroller needs to be able to detect the status of all the keys at once. Wiring each key to its own microcontroller pin would require 14 separate connections and a large bundle of wires at the microcontroller. To simplify the hardware interface, we decided to use two 8-bit 74HC166 parallel-in-serial-out (PISO) shift registers to sample the parallel inputs from the keyboard and generate a serial output to the microcontroller (see Figure 3). Each PISO shift register is capable of receiving eight inputs and generating one serial output. Two shift registers sample the 14 key inputs, and two serial output wires carry the data to the microcontroller.
Parallel-in refers to the fact that the shift register can read all parallel inputs from D0 to D7 into the shift register. At a rising clock edge when the parallel enable input (PE, active-low) becomes low, new values are read. And at each subsequent rising clock edge, a value is shifted out at Q7 beginning with D7. So at the first rising edge, we get the value of D7. At the second rising edge, we get the value of D6. At each rising clock edge, we sample the data at the Q7 port.
The microcontroller generates both the clock signal and the PE input signals for the shift registers. When we built this sight-reading system, we assumed that a person cannot strike keys faster than once every one-tenth of a second. So, we decided to sample the keyboard 10 times per second. Our design samples all of the keys every 100 ms (i.e., 1 s/10 = 0.1 s). Since each shift register has eight inputs and its own serial output, eight clock cycles are needed to serially shift out all the keys. The shifted-out values are read at each rising clock edge, in parallel with the shifting itself. The shift registers load new values at a rising clock edge when PE is pulled low.
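The load-then-shift sequence can be modeled in software, which is a convenient way to check the bit order before wiring the hardware. The sketch below is a behavioral model written for illustration, not the project's firmware: the `Piso166` type and function names are hypothetical, and the model covers only the parallel load, the shift, and the Q7 output of a 74HC166-style register. Because a pressed key is grounded, a pressed key reads as 0, so the sketch inverts the byte to obtain a "pressed" mask.

```c
#include <stdint.h>

/* Behavioral model of one 8-bit PISO register: bit n holds stage Qn. */
typedef struct { uint8_t stages; } Piso166;

/* Apply one rising clock edge.
 * pe_low != 0 means the active-low PE input is asserted (parallel load). */
static void piso_clock(Piso166 *r, uint8_t parallel_in, int pe_low, int serial_in)
{
    if (pe_low)
        r->stages = parallel_in;                               /* load D0..D7 */
    else
        r->stages = (uint8_t)((r->stages << 1) | (serial_in & 1)); /* shift toward Q7 */
}

static int piso_q7(const Piso166 *r) { return (r->stages >> 7) & 1; }

/* Read all eight inputs using eight clock edges: one load, then seven shifts.
 * Returns a byte whose bit n is the level seen on input Dn. */
uint8_t piso_read_byte(Piso166 *r, uint8_t pins)
{
    uint8_t value = 0;

    piso_clock(r, pins, 1, 0);                 /* edge 1: PE low, load; Q7 = D7 */
    for (int bit = 7; bit >= 0; bit--) {
        value |= (uint8_t)(piso_q7(r) << bit); /* sample Q7 */
        if (bit > 0)
            piso_clock(r, pins, 0, 0);         /* edges 2..8: shift the next bit out */
    }
    return value;
}

/* Keys are pulled up to 5 V and grounded when pressed, so invert the levels. */
uint8_t pressed_mask(uint8_t levels) { return (uint8_t)~levels; }
```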
We used the ATmega1284P’s Timer2 to generate the clock signal with a frequency of 160 Hz for the shift registers. To achieve a 160-Hz clock, we used Clear Timer on Compare Match (CTC) mode and toggled the output signal. We set OCR2A to 96. When the timer count matches OCR2A, the counter clears to zero. We set the timer’s prescaler to 1,024. Evaluating the timer’s frequency equation with these values gives 160 Hz for the clock signal.
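The arithmetic can be checked with a short C sketch. It assumes a 16-MHz system clock, which the article does not state, and the function names are hypothetical. In CTC mode the counter runs from 0 to OCR2A, so a compare match occurs every prescaler × (OCR2A + 1) system clock cycles, and an output pin toggled on every match completes one full square-wave period every two matches.

```c
/* Assumed system clock; the article does not give the crystal frequency. */
#define F_CPU_HZ 16000000UL

/* Rate of compare-match events in CTC mode. */
double ctc_match_rate_hz(unsigned long f_cpu, unsigned prescaler, unsigned ocr)
{
    return (double)f_cpu / ((double)prescaler * (double)(ocr + 1u));
}

/* Frequency of a square wave produced by toggling a pin on every match. */
double ctc_toggle_output_hz(unsigned long f_cpu, unsigned prescaler, unsigned ocr)
{
    return ctc_match_rate_hz(f_cpu, prescaler, ocr) / 2.0;
}
```

Under the 16-MHz assumption, `ctc_match_rate_hz(F_CPU_HZ, 1024, 96)` evaluates to roughly 161 Hz, close to the 160 Hz quoted in the text, and the toggled square wave would run at half that rate. A different crystal would change both numbers proportionally.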
Our system utilizes two microcontrollers. One microcontroller samples the keyboard input and generates the sound using the Karplus-Strong algorithm. The other microcontroller generates the video and keeps track of how well the player is playing the given song. We originally planned on using a single microcontroller to handle keyboard reading, sound generation, and the video interface. However, the video generation interrupt service routine (ISR) needs to be very fast, and placing sound generation and key press readings in the same ISR would distort the video signal. Moreover, adding other ISRs for keyboard reading or sound generation would interfere with the video ISR and create a flickering video. For these reasons, we decided to use a separate microcontroller to handle the sound and key inputs. We use UART to send the status of the keys to the microcontroller that is responsible for generating the video.
Microcontroller 1 acts as the sender and microcontroller 2 acts as the receiver. The communication is unidirectional: microcontroller 2 can’t send data, and microcontroller 1 can’t receive any data. We used the built-in UART serial communication as the protocol to send and receive data on the microcontrollers. Microcontroller 1 uses UART port0 Tx to send data to the receiver’s UART port1 Rx. The UDRn register in the UART port stores both incoming data and outgoing data. The sender uses UDR0 to send data, whereas on the receiver’s end data arrives in UDR1. The UART’s data rate is set to 57,600 bps, so that the receiver’s end can keep up with the speed of video generation. Any slower rate causes a significant delay in the video generation, which affects the gameplay.
The payload of the status of key presses is split into two UART packets to be sent from microcontroller 1 to microcontroller 2 (see Figure 4). Each UART packet contains 8 bits or a char. We used a single bit to represent the status of each key on the keyboard. Since there are 14 functioning keys on the keyboard, we would need at least 14 bits to represent the entire keyboard. We divided the 14 status bits of the keyboard in half and each packet contains seven keys.
Since two packets are sent and each packet represents one half of the keyboard, it is important to get the order of the packets correct. We initially did not include packet headers to signify the order of the packets, because we assumed that if a packet is sent first it will also be received first. However, in reality, the first packet that gets sent may not always be received as the first packet. To solve this problem, we made the most significant bit of each packet the packet header, which signifies the order in which the packets should be received. Packet0’s packet header is set to 0 to indicate that there are still more packets to follow. Packet1’s packet header is set to 1 to indicate that it is the last packet with nothing after it. When the receiver receives a packet, it then knows where that packet belongs in the sequence. Bit 6 of packet0 represents the leftmost key on the keyboard (C4). Bit 0 of packet1 represents the last key on the keyboard (B5). Thus, our packets use big-endian format to represent the keys.
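The packet layout described above can be sketched in C as follows. The function names, the `KeyDecoder` type, and the convention that bit k of a 16-bit mask means "key k is pressed" (key 0 = C4, key 13 = B5) are assumptions made for this illustration. The receiver in this sketch also drops a second-half packet that arrives without a preceding first half, which is one possible way to use the header bit.

```c
#include <stdint.h>

#define KEYS_PER_PACKET 7
#define NUM_KEYS        14

/* Sender side: pack a 14-bit key mask into two bytes.
 * out[0] has header bit 7 = 0 and carries keys 0..6 (C4 in bit 6).
 * out[1] has header bit 7 = 1 and carries keys 7..13 (B5 in bit 0). */
void encode_keys(uint16_t keys, uint8_t out[2])
{
    out[0] = 0x00;
    out[1] = 0x80;
    for (int k = 0; k < NUM_KEYS; k++) {
        if (keys & (1u << k))
            out[k / KEYS_PER_PACKET] |= (uint8_t)(1u << (6 - k % KEYS_PER_PACKET));
    }
}

/* Unpack the seven payload bits of one packet, starting at key `first_key`. */
static uint16_t unpack_half(uint8_t byte, int first_key)
{
    uint16_t keys = 0;
    for (int i = 0; i < KEYS_PER_PACKET; i++) {
        if (byte & (1u << (6 - i)))
            keys |= (uint16_t)(1u << (first_key + i));
    }
    return keys;
}

/* Receiver side: remembers packet0 until packet1 arrives. */
typedef struct { uint8_t first_half; int have_first; } KeyDecoder;

/* Feed one received byte. Returns 1 and writes *keys_out when a full
 * 14-bit key mask has been assembled, otherwise returns 0. */
int decode_byte(KeyDecoder *d, uint8_t byte, uint16_t *keys_out)
{
    if ((byte & 0x80u) == 0) {          /* header 0: first half */
        d->first_half = byte;
        d->have_first = 1;
        return 0;
    }
    if (!d->have_first)                 /* header 1 without a first half: drop */
        return 0;
    *keys_out = (uint16_t)(unpack_half(d->first_half, 0) |
                           unpack_half(byte, KEYS_PER_PACKET));
    d->have_first = 0;
    return 1;
}
```

On the AVR, each byte from `encode_keys` would be written to the sender's UART data register and each received byte passed to `decode_byte`; that register access is omitted here so the sketch stays hardware-independent.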
When a note is played on the handmade keyboard, the tone for that note is generated in real time using the Karplus-Strong (KS) algorithm. The KS algorithm produces a realistic string sound and is a relatively easy tone-generation algorithm to implement. It operates on a delay line, simulating a traveling wave on a string. The length of the delay line, L_delay, determines the frequency of the generated tone: f_tone = F_S/L_delay, where F_S is the sample rate.
The KS algorithm feeds white noise into the delay line. It then low-pass filters the tail (output) of the delay line and places the result into the head of the delay line. The output of the delay line drives a speaker. The resulting tone frequency equals F_S/L_delay.
In our implementation, the analog delay line is replaced by an emulated circular buffer and fractional delay unit (FDU). The length of the buffer plus the length of the fractional delay corresponds to the length of the delay line: L_delay = L_buffer (integer part) + L_fdu (fractional part). The FDU is critical for tuning because the required delay length usually contains a fractional part, which the FDU supplies. The circular buffer is embedded within a larger array; thus, the buffer’s length can change dynamically. The fractional delay can also change dynamically. Thus, tones of different frequencies can be generated.
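As a worked example of this split, the sketch below divides the delay length F_S/f_tone into its integer and fractional parts. It assumes a 16-kHz sample rate and uses hypothetical names (`DelaySplit`, `split_delay`). It also ignores any extra delay contributed by the low-pass filter, so the values it produces are not necessarily those in Table 1.

```c
typedef struct {
    int    buffer_len;   /* L_buffer: integer part, circular buffer length */
    double fractional;   /* L_fdu: fractional part, handled by the FDU */
} DelaySplit;

/* Split L_delay = F_S / f_tone into integer and fractional parts. */
DelaySplit split_delay(double sample_rate_hz, double tone_hz)
{
    DelaySplit d;
    double total = sample_rate_hz / tone_hz;   /* L_delay in samples */

    d.buffer_len = (int)total;                 /* truncation = floor for positive values */
    d.fractional = total - d.buffer_len;
    return d;
}

/* Recover the tone frequency from a split, as a consistency check. */
double tone_from_delay(double sample_rate_hz, DelaySplit d)
{
    return sample_rate_hz / (d.buffer_len + d.fractional);
}
```

For A4 (440 Hz) at 16 kHz, the delay is about 36.36 samples, so the buffer would hold 36 samples and the FDU would supply the remaining 0.36 of a sample. Without the fractional part, a 36-sample delay would sound at about 444 Hz, which is audibly sharp.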
Figure 5 shows the Karplus-Strong unit that was implemented in this project. It also shows the peripherals that interact with the KS unit. These peripherals include the clock signal, speaker, and interface. The KS C program consists of two sections: an ISR and a while loop in main(). The main while loop has a period of 50 ms. Its purpose is to execute a debounced read of the key presses every 50 ms. The ISR executes at 16 kHz. Its purpose is to generate the sound, update the state of the circular buffer, and increment the pointers (see Figure 5). When a pointer reaches the last element of the buffer, it wraps back around to the first element.
Whenever a new key is pressed, the ISR changes the buffer length and fractional delay so that the new tone can be generated. The ISR must also energize the entire buffer with white noise by populating each element with uniformly random values. In subsequent executions of the ISR, the buffer outputs values to the PWM, which is heard as the tone of the desired frequency. While the key is pressed down, the ISR decreases the damping so that the tone is sustained for a longer period, as would happen in a real piano. When the key is released, the damping goes back to the normal level. In this case, as the system progresses, the damping drives the energy of the buffer back down to zero and no tone is produced until the user presses a key again.
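The pluck, filter, and decay behavior can be sketched in portable C as shown below. This is a simplified illustration of the basic Karplus-Strong loop, not the project's ISR: it omits the fractional delay unit, uses a two-point average as the low-pass filter, and returns each sample instead of writing it to a PWM register. The type and function names, the buffer size, the pseudo-random generator, and the two damping constants are all assumptions.

```c
#include <stddef.h>
#include <stdint.h>

#define KS_MAX_LEN        128     /* size of the larger array holding the buffer */
#define KS_DAMP_HELD      32700   /* hypothetical: about 0.998 in Q15, key held */
#define KS_DAMP_RELEASED  32113   /* hypothetical: about 0.98 in Q15, key released */

typedef struct {
    int16_t  buf[KS_MAX_LEN];
    size_t   len;     /* active circular-buffer length (sets the pitch) */
    size_t   pos;     /* pointer that wraps at len */
    int16_t  prev;    /* previous output sample, used by the averaging filter */
    uint32_t rng;     /* state of a simple pseudo-random generator */
} KsString;

/* Linear congruential generator used as a stand-in white-noise source. */
static int16_t ks_noise(KsString *s)
{
    s->rng = s->rng * 1664525u + 1013904223u;
    return (int16_t)((int32_t)(s->rng >> 16) - 32768);
}

/* New key press: set the buffer length and fill the buffer with noise. */
void ks_pluck(KsString *s, size_t len)
{
    if (len < 1) len = 1;
    if (len > KS_MAX_LEN) len = KS_MAX_LEN;
    s->len = len;
    s->pos = 0;
    s->prev = 0;
    for (size_t i = 0; i < len; i++)
        s->buf[i] = ks_noise(s);
}

/* One sample period: output the tail, low-pass filter it, apply damping,
 * and write the result back into the delay line. */
int16_t ks_tick(KsString *s, int32_t damping_q15)
{
    int16_t out = s->buf[s->pos];
    int32_t avg = ((int32_t)out + (int32_t)s->prev) / 2;      /* two-point average */

    s->buf[s->pos] = (int16_t)((avg * damping_q15) / 32768);  /* damped feedback */
    s->prev = out;
    s->pos = (s->pos + 1) % s->len;                           /* wrap the pointer */
    return out;
}
```

A caller would pass `KS_DAMP_HELD` to `ks_tick` while the key is down and `KS_DAMP_RELEASED` after it is released, so the tone sustains while held and then dies away.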
Our piano supports 14 notes. Photos 3a, 3b, and 3c show the measured FFT for three different keys, respectively: the lowest frequency key, C4; A4; and the highest frequency key, B5. All tones exhibit harmonics. Photo 3d shows the effect of damping for A4, along with the clock signal. The tone decays in about three divisions, which corresponds to about 300 ms. Table 1 shows the Karplus-Strong string length and tuning that we implemented for all the keys on the keyboard.
RESULTS & IMPROVEMENTS
The final project was a great success. We were able to create a keyboard comprising keys that can play up to two octaves. The aim of this design is similar to popular video games like Guitar Hero and Rock Band. However, our product is very different in two ways: it uses a realistic piano keyboard and it displays music in a sheet music (traditional) fashion. In addition, the keyboard is laid out the same way as an actual piano. Our sight-reading assistant is very intuitive for intermediate to advanced pianists. However, a user without piano training probably won’t find it enjoyable since it can take months for new musicians to learn to read music (and many more months to learn to sight-read). While this is indeed a sight-reading assistant, it would be incredibly difficult for a new pianist to use.
While our design is fully functional, there are clearly some improvements that can be made. Most noticeably, our program currently only plays one song (Cornell University’s alma mater). One improvement would be to support reading MIDI files (perhaps by SD card) and allowing the user to choose between the MIDI files on the SD card. Another improvement would be to implement support for multiple-note input and multiple-note display, thus making the program much more robust. Currently, the keyboard detects multiple inputs, but the sound generation and video game logic only support one-hot inputs.
We could also improve the music display in a few ways. Currently, sharps and flats aren’t supported, which means only music in C major and A minor can be displayed. In addition, there is no feedback when a wrong note is played. The score is the only feedback from input that is visible on the display. Having some way of highlighting the current note you’re supposed to be playing, or some feedback telling you what note you pressed incorrectly, would be a helpful improvement to the system.
Our keyboard and gloves are very robust. The only downside is that the contact surface of the glove is not smooth. The ripples on the copper tips of the glove sometimes trigger multiple key presses on the keyboard. The double key press issue could not be solved by conventional key press debouncing. A significant improvement would be to smooth out the surface of the glove so as to avoid double key strikes.
Although requiring users to wear gloves creates a cool, steampunk feel, musicians typically don’t wear gloves during real performances. So, a potential improvement would be to build a keyboard that does not require the user to wear them. This would involve more complicated sensor systems on the keyboard to detect the press of the finger.
Authors’ Note: We thank Dr. Bruce Land for teaching Cornell’s ECE 4760 course and providing materials for our final project. We also thank Ahmed Kamel for helping with technical problems.
B. Land, ATmega1284 NTSC Video Library in C, Cornell University, http://people.ece.cornell.edu/land/courses/ece4760/video/index.html.
———, “Karplus-Strong and Digital Sound Generation,” Cornell University, 2013, http://people.ece.cornell.edu/land/courses/ece4760/Math/avrDSP.htm.
Microchip Technology (formerly Atmel) | www.microchip.com
PUBLISHED IN CIRCUIT CELLAR MAGAZINE • MARCH 2016 #308