Projects Research & Design Hub

Sound Localization

Using a PIC32 MCU

Learn how these two Cornell students built a sound localization device. The employed a Microchip PIC32 MCU and a set of microphones to determine the direction from which an arbitrary sound is coming. They recorded input from three microphones to identify the time delay between the audio recordings. These time delays provide a means to calculate the direction of the sound.

It’s amazing to see what kinds of sound analysis can be done using a 32-bit MCU. Our project is the construction of a sound localization device. The Microchip PIC32 microcontroller (MCU)-based device is a triangular arrangement of microphones used to localize the direction from which an arbitrary sound is coming. By recording input from three microphones, we were able to identify the time delay between the audio recordings. These time delays provide a means to compute the direction of the sound.

The hardware for the project is made up of three main parts: three microphone circuits, a TFT (thin-film-transistor) LCD (liquid crystal display) and a custom PIC32 prototyping board. The prototyping board gives a breakout for pins on the PIC32 in addition to 3.3V power, an SPI-controlled DAC and an SPI-controlled TFT display. The prototyping board uses the PIC32MX250F128B [1], but theoretically, any PIC32MX MCU should have the same hardware we used.

Each of the three microphone circuits includes an electret microphone, a set of filters and an amplifier. The output of each microphone circuit is fed into an ADC channel on the PIC32. The TFT display is used to show debugging information and to point in the direction of the sound. The full schematic is shown in Figure 1.

FIGURE 1 – Full schematic of the project, including MCU, microphones and power circuitry. Microphone and amplifier circuitry are on the right side.

THREE-PART CIRCUIT
The microphone circuit consists of three parts: The microphone itself [2], a high-pass filter to center the signal around half voltage and an amplifier, which uses an active band-pass filter to amplify only frequencies of interest. A Texas Instruments (TI) LM4562 audio op amp [3] acts as the core component of the amplifier and filter part of the circuit. Because the LM4562 is not a rail-to-rail op amp and does not work with 3.3V, a different set of rails was required to keep the op amp out of saturation. 9V were supplied to the positive rail, and -3V were supplied to the negative rail. Because we found that noise from the MCU can get tracking into the audio circuitry, the 3.3V supply for the microphones was generated from a separate regulator with a constant load constructed out of a few resistors.

The initial high-pass filter has a cut-off of around 160Hz. The high-pass filter on the band-pass amplifier was selected to roughly match the cut-off of the initial high-pass filter. The low-pass filter of the op amp was selected to give roughly a 7.3kHz cutoff frequency. The gain used was 100:1.

Each output from a microphone circuit was attached to an I/O pin with analog functionality. To protect these pins from any over-voltage or under-voltage conditions, a pair of Schottky diodes and a resistor were added to the output to construct a voltage snubber. The Schottky diodes provide a conducting path in case of one of these conditions, while the resistor limits the current flow through these diodes.

— ADVERTISMENT—

Advertise Here

The TFT display shows debugging information and points in the direction of the source of the sound. The part we used is an Adafruit breakout (part number 1480) [4] that provides the TFT display, a TFT display driver and an SD card reader (unused). The code for this was a library that was adapted from the library Adafruit supplied for running the TFT with an Arduino. The TFT breakout uses an SPI channel, along with a few other digital I/O pins. Cornell’s ECE4760 course links to a library for the TFT display that was adapted from an Arduino library by Tahmid [5].

A Microchip Technology MCP4822 digital-to-analog converter (DAC) [6] was used for debugging the system but was not used for the project itself. By sending the waveform to the DAC at a rate of 5kHz, we can review what the output of the system looks like. More on that later in the results section.

SOFTWARE AND MATH
First, to locate the direction of the sound, the system needs to record the reading from each microphone channel. Second, the recording of each channel is cross-correlated with the next channel to identify the relative time shift from one recording to the other. Third, the relative timing between each pair of microphone channels is used to compute the direction of the origin of the sound source. The relative direction is computed from the timing differences and the physical arrangement of the microphone placement, to derive the direction of the sound source in degrees. This cycle is repeated as quickly as possible, and the direction estimates are digitally low-passed using an averaging filter to give an estimate of the direction of the sound. Finally, the angle is written to the TFT display to show the result.

The analog outputs from the three microphone channels are connected to the three separate ADC channels. The ADC is configured to operate in a timer-triggered sampling mode, which starts a new sample each time the timer-interrupt flag is raised. It is also set to sample three channels and store the results as 16-bit signed integers in the ADC’s internal buffer. The ADC is set to raise an interrupt flag once every three samples. To make the system run at the intended frequency, the timer for triggering the ADC sampling is set to run three times faster than the intended sampling frequency.

This causes the timer to trigger three ADC samples—one for each microphone at the intended frequency. A set of three DMA channels is used to transfer the data from the ADC output buffers and into storage. Once the DMA channels are enabled, the DMA transfers 16-bit cells whenever the ADC interrupt flag is raised. When the entire block is transferred, the DMA channel raises the DMA_EV_DST_FULL to signal completion of the transfer. The computation_thread checks the completion flag for all three DMA channels to determine when to begin the computation of the sound localization.

Once the microphone data is fully recorded in the arrays, as indicated by the DMA_EV_DIST_FULL flags, the time delay is calculated between each pair of microphone recordings. The main mathematical technique we use to compute the time delay between two signals—the microphone recordings—is cross-correlation, which measures the similarity of two signals by taking the sum each pair of points in the signal as one signal slides along another. The formula is as follows:

Each correlation value gives the similarity value between the first signal and the second signal, shifted by some amount of time. The index of the entry for the maximum value is the time delay in units of the sampling rate. For example, for the two square waves shown in Figure 2, we see the result of taking the cross-correlation (bottom graph).

FIGURE 2 – Top: two identical square waves with a time shift. Bottom: the cross-correlation between the two square waves shown in the top

CROSS CORRELATION
The cross-correlation gives a peak at the maximum overlap. In this case, the orange curve was used as the signal f in the equation, and the blue curve was used as the signal g. The maximum overlap is the time shift of f with respect to g, which in this case is at -200 in both plots. Although we referred to the position of the peak as an index, we don’t mean an index into an array. It simply means the location in the signal. The index of -200 means that the blue curve must be shifted backward in time by 200 time units to match the orange curve. To get a long signal that we can easily measure, we used a swept sine wave such as the one shown in Figure 3, since a swept sine wave is not repetitive. This prevented a situation where the correlation gives a match for multiple different time shifts.

FIGURE 3 – An example of a swept sine wave, also referred to as a chirp

Figure 4 shows the result of taking the cross-correlation of a swept sine wave with another swept sine wave, in which the distinct maximum gives the relative time shift between the signals. In this case, the maximum is at 0.05, which indicates that the second signal is shifted 0.05 time units ahead of the other.

— ADVERTISMENT—

Advertise Here

FIGURE 4 – An example of the cross-correlation between two swept sine waves with a time shift of 0.05 units

In the process of computing the true direction, we evaluate a few constants. First is mic_data_size which is the size of the array that holds the microphone data. This is computed by taking [sampling rate] × [sample duration]. Another is max_diff which we use as the maximum value of the time shift of the cross-correlation peak. max_diff is the max possible shift geometrically possible, and is computed by:

where length is the length of one leg of the triangular arrangement and speed is the speed of sound. These values set how much data is sampled and what region of the sampled data is used in cross-correlation, as discussed next.

The cross-correlations are calculated on channel 0 with respect to channel 1, channel 1 with respect to channel 2, and channel 2 with respect to channel 0. Each cross-correlation is computed by sliding the middle (mic_data_size – (2 × max_diff)) wide section of the first recording fully along the second recording, to compute the sum of dot products of the fully overlapped recordings. The resulting cross-correlation values are stored in an array of size (2 × max_diff) + 1. Care must be taken to ensure that (2 × max_diff) + 1 is reasonably smaller than mic_data_size to ensure a sufficient number of data points are used in the computation. As the cross-correlation values are computed, the peak value and its associated time shift of each of the three pairs are identified and recorded to compute the direction of the source sound.

Listing 1 is the function used for the cross-correlation. In this code, we compute cross-correlation using the definition equation. A section of one of the inputs is shifted across the other input. Solving the cross-correlation via an FFT (fast Fourier transform), element-wise multiplication and IFFT (inverse FFT) has a better run-time complexity for very large inputs. However, we did not pursue this in our project, because the coefficient of the run time is unknown, and our input size is bounded by the parameters we define.

// cross-correlate each pair of microphone recordings
void cross_correlate() {
int channel, idx, shift;
for (channel = 0; channel < num_mic_channel; channel++) {
// shift the kernel -max_diff to max_diff of mic_data_size
int correlate_channel = (channel + 1 > 2) ? 0 : channel + 1;
for (shift = -(max_diff); shift <= (max_diff); shift++) {
long tmp_sum = 0;
// kernel size is mic_data_size - (2*(max_diff) + 1)
for (idx = (max_diff) + 1; idx < (mic_data_size) - (max_diff); idx++) {
int idx2 = (idx + shift);
tmp_sum += (((long) mic_data[channel][idx] + mic_bias[channel]) * ((long) mic_
data[correlate_channel][idx2] + mic_bias[correlate_channel]))>>2;
}
// save correlation results for DAC debugging
correlate_data[channel][shift + (max_diff)] = tmp_sum > 0 ? tmp_sum : 0)>>3;
// update new max peak
if (correlate_data[channel][shift + (max_diff)] > peak_max[channel]) {
peak_max[channel] = correlate_data[channel][shift + (max_diff)];
peak_index[channel] = shift + correlate_bias_adj[channel];
}
}
if (abs(peak_index[channel]) < 2)
peak_index[channel] = 0;
}
}

LISTING 1 – The code for computing a cross-correlation via direct application of the definition. Only the center region of on signal is used to avoid the need to normalize the results.

The direction of the sound source is computed using the three peak_index values identified in the cross-correlation calculations. In this case, the name “peak_index” is a bit of a misnomer, since these values actually represent the time shift, and not an index into an array.

To measure the direction, the time delay between each pair of microphones is used to compute an angle for each pair. The angle is computed by using the distance between the two microphones and the distance sound travels in the measured time delay. That is:

This angle is calculated between two microphones, but leaves ambiguity for which side of the two microphones the sound comes from. Each time delay results in both the black arrow and the red arrow (Figure 5). We compute an angle for each pair of microphones and then use the arrangement of the microphones, shown in Figure 6 to remove the ambiguity.

FIGURE 5 – An example in which the sound comes from the direction indicated by the large blue arrow. The other smaller angles show the angle estimates given by the correlation between the two microphones. The tails of the arrows start from the edge that sits between the two microphones that were correlated together.
FIGURE 6 – The arrangement of the microphones with an example sound source. Computation for the sound directions assumes that the sound source is far from the microphones.

Using each time shift, we can determine to which microphone the correlation indicates that the sound is closer. For example, in Figure 5, each pair of arrows sitting between a pair of microphones points toward one of the two microphones. In the case of Figure 5, these indicate microphone 0 for each of the upper two legs of the triangle, and microphone 1 for the lower leg. Using the arrangement of the three microphones, as long as the time shifts do not all indicate different microphones, a 60-degree range can be selected, as depicted in Figure 5 by the dotted dividing lines.

ANGLE ANALYSIS
As noted earlier, each angle estimate gives two possible angles. In Figure 5, the correct angle is marked in black, and the false angle is marked in red. For any of the 60-degree regions, one angle estimate always faces outward (of the triangular arrangement), and one faces inward. The remaining angle estimate is ambiguous. The outward-facing angle is the angle computed from 0-1, the inward-facing angle is the angle computed from 1-2 and the ambiguous angle is the angle computed from 2-0. Note that if the direction the sound came from was a bit closer to microphone 0, then the correct direction would be outward-facing rather than inward-facing, which it is now. First, the two angle estimates that are on known sides are computed and averaged together. Using this averaged value, the side for the ambiguous side is chosen by evaluating which is closest to the averaged value. This gives the last angle estimate. All three of these values are then averaged together, which gives the final angle estimate.

One of the six possible ranges of 60 degrees is selected by taking the sign of the time shift and converting it into a binary encoding. This is done in the switch statement in Listing 2. Only the first two cases were included in Listing 2 for brevity. It turns out the computations for each of the six ranges are similar, and we simply need to swap which angles estimates are the outward, inward, and ambiguous ones. This is done by using idxPidxN and idxU, which represent the index of the outward angle, the index of the inward angle and the index of the ambiguous angle, respectively, where index here identifies which cross-correlation time shift is used. The helper function range_checker determines if the ambiguous angle is outward or inward, based on the angle estimate given. The array angles_adj holds the offset of each angle estimate with the baseline direction. In this project, it is the direction of microphone 0. The other helpers lim_angles and lim_index limit the range of values to within the range of angles and indices, respectively.

int val = (peak_index[0] > 0) << 2 | (peak_index[1] > 0) << 1 | (peak_index[2] > 0);
int idxP = -1, idxN = -1, idxU = -1;
switch (val) {
case 0b110: //0-60
idxP = 0;
idxN = 1;
idxU = 2;
break;
case 0b010: //60-120
idxP = 0;
idxN = 2;
idxU = 1;
break;


. . .


}
int x;
for (x = 0; x < 3; x++){
lim_index(peak_index[x]);
angles[x] = ((double) peak_index[x])/((double)(max_diff));
angles[x] = acos(angles[x]);
}
angles[idxP] += angles_adj[idxP];
angles[idxN] *= -1.0;
angles[idxN] += angles_adj[idxN];
lim_angles(angles[idxP]);
lim_angles(angles[idxN]);
angle = angles[idxP] + angles[idxN];
if (range_checker(idxU, angle))
angles[idxU] *= -1.0;
angles[idxU] += angles_adj[idxU];
lim_angles(angles[idxU]);
double x_pos = cos(angles[idxP]) + cos(angles[idxN]) + cos(angles[idxU]);
double y_pos = sin(angles[idxP]) + sin(angles[idxN]) + sin(angles[idxU]);
angle = atan2(y_pos,x_pos);

LISTING 2 – The code for computing the sound source’s direction. The […] section omits additional cases for brevity.

The low-pass filter, which averages the new angle with the old angle to produce the result, is not shown. The process is the same as averaging the three angles, except that the component values of each vector are weighted to set the cut-off frequency of the low pass.

Averaging angles has a pitfall, in that angles wrap around. Say we wish to average two angles, 170 degrees and -170 degrees. We would like this to give the value 180 or -180 degrees, but a simple averaging of the angles gives the angle 0, which is the exact opposite of what is desired. To average the angle correctly, we instead convert each angle into a unit vector and average the components. The average vector is then converted back into an angle, giving the average angle.

RESULTS
The sound localization worked reasonably well. Although the computation delay was almost indistinguishable when we ran single sweeps, the delay turned out to limit the max accuracy of the system, since it relies on an average of multiple sweeps. The final version of the device used 20cm legs on the triangular arrangement, an 80kHz sampling rate, and 0.025 second sampling time. The remaining parameters were all derived from these values and computed at compile time, using C macros. Using multiple sweeps, we were able to get the system to home in on the direction of the sound. Figure 7 shows oscilloscope traces of the original waveform of the cross-correlation computation output from an early prototype.

— ADVERTISMENT—

Advertise Here

FIGURE 7 – Oscilloscope traces of the original waveform of the cross-correlation computation output from an early prototype. The upper trace illustrates recorded signal from one of the microphones, and the bottom trace shows the cross-correlation result from a pair of the microphone channels. Traces such as shown were used for debugging the system.

The upper trace in Figure 7 illustrates recorded signal from one of the microphones, and the bottom trace shows the cross-correlation result from a pair of the microphone channels. A high signal is used on the cross-correlation trace to show when the data start and end. This gives us a point at which to set the oscilloscope to trigger, and allows us to see where the start and end of the cross-correlation are, along with the relative location of the peak. In this image, the peak is roughly centered, showing that the sound signals arrived at the two microphones at the same time.

The plot differs from the cross-correlation of the swept sine wave examined in the math background, because this plot takes the absolute value of the cross-correlation. In testing, we found that swept sine waves picked up by the system almost always had nicely formed cross-correlation plots, such as the one shown in Figure 7. However, we found that the location of the peak wouldn’t always be in the same place. Further testing and experimentation showed that adjusting the circuit to have a well-defined phase shift for each frequency was key to making the system work. It’s essential that all filtering and amplification circuits have the same phase shift for every frequency.

After reworking the circuitry and moving to op amps with a higher gain-bandwidth product (from the MCP6242 to the LM4562), we found that the system appeared to get the angle correct to within 30 degrees in the worst case, and usually within 15 degrees of the correct location. In a more controlled environment, it is highly likely that the system would achieve better performance. The environment in which we tested the device was cluttered with lab equipment, which gave reflected sound waves and multipath distortion. Our code is based on examples given in Cornell’s ECE4760 course website [7]. The linked pages also contain the example code that our code uses as a basis. 

RESOURCES

References:
[1]: https://www.microchip.com/wwwproducts/en/en557425 microcontroller link hardware
[2]: https://cdn-shop.adafruit.com/datasheets/CMA-4544PF-W.pdf datasheet for microphone
[3]: http://www.ti.com/lit/ds/symlink/lm4562.pdf datasheet for the Op-Amp
[4]: https://www.adafruit.com/product/1480 link for the TFT display
[5]: http://tahmidmc.blogspot.com/2014/10/interfacing-color-tft-display-with.html for TFT library
[6]: https://www.microchip.com/wwwproducts/en/MCP4822 link for the Digital-to-Analog Converter
[7]: http://people.ece.cornell.edu/land/courses/ece4760   general reference for PIC32 related

Microchip Technology | www.microchip.com
Texas Instruments | www.ti.com

PUBLISHED IN CIRCUIT CELLAR MAGAZINE • JANUARY 2020 #354 – Get a PDF of the issue


Don't miss out on upcoming issues of Circuit Cellar. Subscribe today!

 
 
Note: We’ve made the October 2017 issue of Circuit Cellar available as a free sample issue. In it, you’ll find a rich variety of the kinds of articles and information that exemplify a typical issue of the current magazine.


Would you like to write for Circuit Cellar? We are always accepting articles/posts from the technical community. Get in touch with us and let's discuss your ideas.

Become a Sponsor

Alvin Pan is a recent graduate in Electrical and Computer Engineering and Computer Science from Cornell University with interests in embedded systems and robotics. He can be reached at [email protected]

JinJie Chen is a recent ECE graduate of Cornell University and now an embedded software engineer at Google. He is particularly interested in embedded system and edge computing. He can be contacted at [email protected]