Fancy Filtering with the Teensy 3.6

Written by Brian Millier

Arm-ed for DSP

Signal filtering entails some tricky trade-offs. A fast MCU that provides hardware-based floating-point capability eases some of those trade-offs. Here, Brian has used the Arm-based Teensy MCU modules to serve those needs. He taps the Teensy 3.6 Arm MCU module to perform real-time audio FFT-convolution filtering.

Signal filtering can be done either with analog circuitry or digitally using a microcontroller (MCU) coupled with analog-to-digital and digital-to-analog converters. The strength of analog filters is that they can cover wide frequency ranges. If they are designed entirely with passive components, the range of signal amplitudes that can be handled is limited only by the voltage rating of the various capacitors that are used. Additionally, they don’t add much, if any, noise to the signal. However, a limitation of analog filters is that they can’t provide a sharp cut-off rate at their corner frequency (Fc), unless you cascade many filter sections and use close-tolerance components.

If you need high-performance filters, then digital filters might be the way to go. You can design very sharp low-pass, high-pass, notch and band-pass filters using digital techniques, if you use high-resolution ADC/DACs to convert the analog signal into the digital domain and (optionally) back to the analog domain. However, the MCU that you use must be fast and, in general, feature hardware-based floating-point operations. Two years ago, I discovered a line of Arm-based MCU modules that fill the bill nicely.

In Circuit Cellar issues 324 (July 2017) and 325 (August 2017), I described a digital guitar amplifier based upon the Teensy 3.2 Module, which contains an Arm Cortex-M4 MCU. The analog guitar signal was converted to a 16-bit digital signal for processing, and then back to an analog signal for power amplification, by an NXP Semiconductor SGTL5000 Codec contained on the PJRC Audio Shield. This project was made possible largely due to the extremely powerful Audio library provided by the manufacturer of the Teensy modules. This library consists of many audio functions, all of which operate using DMA transfers and interrupt service routines (that is, as a background task). The sampling is done at CD quality (44,100 samples/s at 16-bit resolution).

That project involved many different audio functions—some from the Teensy Audio Library, and some that I wrote myself. The filtering I used for the project was in the form of a 5-band parametric equalizer (EQ). This consists of five blocks of band-pass filters, each one centered on a specific frequency in the audible range. Such an EQ is basically a sophisticated “tone control” for the guitar signal. While most of the other guitar signal processing was done within the Teensy 3.2 MCU, using the Audio library, the 5-band parametric EQ was handled by a DSP block contained within the SGTL5000 Codec on the Teensy Audio Shield.

After finishing that project, I became interested in more sophisticated filtering algorithms that could be performed by the Arm MCU found on the Teensy modules. The Teensy Audio Library routines work with all the Arm-based MCUs in the Teensy module family (except the lowest-cost LC model). The Audio library contains three types of digital filters:

1) Biquad (low pass, high pass, band pass, notch)
2) FIR (up to 200 taps)
3) State-variable (Chamberlin)

The Biquad algorithm executes quickly, and its coefficients are easy to calculate on the fly, which makes it easy to change the filter bandwidth and Fc quickly. Finite impulse response (FIR) filters can provide much better filter characteristics, if you configure them with enough “taps”. However, as you increase the number of taps used, the execution time increases proportionately.

All the above filters use 16-bit, fixed-point math (Arm Cortex M4 DSP instructions using the Q15 data format). This is fast and reasonably accurate, but not enough to provide very sharp filter “skirts”. When you attempt to cascade several sections of such filters, you start to see the limitations in the precision of the fixed-point math.
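To make the precision limit concrete, here is a minimal sketch of Q15 quantization in plain C++. It is not the CMSIS or Audio library code: the helper names to_q15 and from_q15 are made up for illustration, and rounding to nearest is an assumption.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Hypothetical helper: convert a float in [-1.0, 1.0) to Q15,
// rounding to nearest and saturating at the 16-bit limits.
int16_t to_q15(float x) {
    assert(!std::isnan(x));
    long v = std::lround(x * 32768.0f);
    if (v > 32767) v = 32767;
    if (v < -32768) v = -32768;
    return static_cast<int16_t>(v);
}

// Hypothetical helper: convert a Q15 value back to float.
float from_q15(int16_t q) {
    return static_cast<float>(q) / 32768.0f;
}
```

The Q15 step size is 1/32768, or about 0.00003. A coefficient of 0.00001 therefore rounds to zero, which is the kind of loss that limits long or cascaded fixed-point filters.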

The higher-end Teensy modules (Teensy 3.5 and 3.6) contain the more powerful Arm Cortex M4F core. These devices have hardware floating-point instructions, which basically allow you to do floating-point operations as quickly as you could do the 16-bit fixed-point operations with the DSP instructions available on Teensy 3.2’s Arm Cortex M4 MCU.

By using a Teensy 3.6 with hardware floating-point instructions, I figured that I could handle more sophisticated filtering algorithms. Another consideration was that the Teensy 3.6 MCU runs at 180 MHz, compared to the 72 MHz clock speed of the Teensy 3.2. Also, the Teensy 3.6 can be safely overclocked to 240 MHz, compared to the 120 MHz maximum overclocked speed of the Teensy 3.2. Figure 1 shows the Teensy 3.6 module. Figure 2 shows the Audio Shield that I used. It contains the NXP SGTL5000 Codec device (A/D and D/A converters, mic preamplification, headphone driver and digital signal processing).

FIGURE 1 – Top view of the Teensy 3.6 Arm MCU module. To the right is the on-board MicroSD socket, which accepts the MicroSD card containing the Cabinet Impulse Response file.

FIGURE 2 – Top view of the Teensy Audio Shield. The two rows of 14 holes are fitted with header pins that plug directly into the Teensy 3.6 MCU module. All interconnections between the two boards are via these 28 pins.

Although I have used digital filters in FIR and Biquad configurations, prior to this project I wasn’t familiar with the term “convolution” filtering. As part of my music/recording hobby, I had encountered the term convolution regarding:

1) Guitar amplifier cabinet simulation
2) High-end, “space-accurate” reverberation processors


Convolution reverberation processors are not relevant to this discussion. However, guitar amplifier cabinet simulation is basically a fancy way of saying that you are simulating the exact frequency/phase response of a guitar amplifier and its loudspeaker(s), mounted in a specific cabinet, with the recording microphone oriented a specific way.

The “shape” of the frequency response curve of any given guitar amplifier/speaker combination will not be a “flat” response over the useful range of guitar notes. Instead it will consist of many small peaks and dips over the frequency range of interest. These “aberrations” provide the distinctive sound of interest to the musician. To some extent, one can simulate a given guitar amplifier/speaker by using a multiband parametric equalizer (EQ) and fiddling with it until it sounds the way you know the actual amplifier/speaker sounds. However, experts in the field learned that they could go one step further using the following method.

Rather than feeding an actual guitar signal into the amplifier/speaker cabinet, they feed it a short pulse, with rise/fall times as fast as possible. This short pulse is called a “finite-impulse signal.” The sound emitted by the speaker cabinet is then picked up by a professional-quality microphone, amplified, converted to digital form and stored in a file. This file represents the FIR of the guitar amplifier/speaker cabinet. I admit that I don’t have the best understanding of the mathematical “magic” involved here, but suffice it to say that all the frequency response “personality” of the guitar amplifier/speaker cabinet is contained in the finite-impulse-response (FIR) file that has been collected. The higher the sample rate used to record the impulse, the better the simulation, and the larger this FIR file will be.

Once you have this FIR file, you can use it to provide the coefficients needed for a digital FIR filter. If you pass your “raw” guitar signal through this FIR filter, it will be modified in virtually the same way that it would be if it were sent out to the specifically modeled guitar amplifier/speaker cabinet. Effectively, you can digitally record a “raw” guitar signal, which, when converted back to analog and listened to, will sound as if you were listening to it “live,” through the specific guitar amplifier/speaker that you have modeled. The FIR filter routine does what’s called a “convolution” of the guitar’s time-domain signal with the FIR array of coefficients—which is also time-domain data.
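The time-domain convolution described above can be sketched in a few lines of plain C++. This is an illustration only, not the Teensy Audio library's FIR block; the function name convolve_direct is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Full linear convolution of signal x with FIR coefficients h.
// The output holds x.size() + h.size() - 1 samples.
std::vector<float> convolve_direct(const std::vector<float>& x,
                                   const std::vector<float>& h) {
    assert(!x.empty() && !h.empty());
    std::vector<float> y(x.size() + h.size() - 1, 0.0f);
    for (std::size_t n = 0; n < x.size(); ++n) {
        for (std::size_t k = 0; k < h.size(); ++k) {
            y[n + k] += x[n] * h[k];  // one multiply-accumulate per tap
        }
    }
    return y;
}
```

Every input sample costs one multiply-accumulate per tap, which is why the execution time of this approach grows in direct proportion to the number of taps.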

Once you absorb the idea behind this simulation technique, it becomes clear that you could implement a complex digital filter to reproduce almost any complex frequency response with this technique. I’m certain that mathematicians and electronics engineers in the communication field discovered and used this technique to design complex filters long before guitar players saw its usefulness. However, it was the guitar cabinet simulation concept that led me to investigate the FIR filtering technique more fully.

It turns out that implementing a FIR filter with enough “taps” or coefficients to perform realistic guitar amplifier/cabinet simulation generally requires a FIR filter with 512 taps or more. The Teensy 3.6, running at 240 MHz (overclocked)—and using its built-in DSP 16-bit fixed-point instructions—can process a 100-tap FIR filter (using the Teensy Audio library’s FIR filter block), using only 7% of available MCU time. This is for 16-bit data at a 44,100 Hz sample rate. That 7% figure is strongly influenced by the fact that most of the SGTL5000 Codec data transfers (in and out) are done under DMA, which frees up the main MCU from performing this time-consuming task.

Because a FIR filter’s execution time is directly proportional to the number of taps [1], a 512-tap FIR should require about 36% of available execution time (7% × 512/100). This timing seems reasonable, but implementing a FIR filter with such a large number of taps is impractical when using 16-bit fixed-point numbers. The accuracy is not nearly good enough to achieve proper results.
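The 36% estimate is simple proportional scaling. As a sketch, assuming the load really is linear in the tap count and ignoring any fixed overhead (the function name is hypothetical):

```cpp
#include <cassert>

// Estimate the CPU load of a direct-form FIR from one measured data point,
// assuming load scales linearly with the number of taps.
double scaled_fir_load(double measured_percent, int measured_taps, int target_taps) {
    assert(measured_taps > 0 && target_taps > 0);
    return measured_percent * target_taps / measured_taps;
}
```

Calling scaled_fir_load(7.0, 100, 512) gives roughly 35.8, which rounds to the 36% quoted above.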


What is needed is a way to implement a floating-point 512-tap convolution process that is fast enough to handle 16-bit signals at a 44,100-Hz sample rate, in real time. A powerful set of math/DSP routines for Arm Cortex devices is contained within the Cortex Microcontroller Software Interface Standard (CMSIS) library. I made use of several floating-point math functions contained in the CMSIS-DSP library.

The previous discussion involved processing signals in the time domain. That is, we sample a signal at a fixed sample rate, process these data and then send the data out at the same sample rate. We could also do the electronic filter processing in the frequency domain. This would involve converting our time-domain signal into the frequency domain. This means doing basically the same filtering (but in a different way), converting the frequency-domain signal back into the time domain, and then sending it out. On the surface, it would seem that this unnecessarily complicates the procedure, but there is a good reason to do it this way.

Converting the time-domain signal into the frequency domain can be done with a Fast Fourier Transform (FFT) routine. Converting it back into the time domain can be done with an Inverse Fast Fourier Transform routine (iFFT). Both the FFT and iFFT routines are available in the CMSIS DSP library available for Arm MCUs. For the Cortex M4F cores with built-in floating-point operations, the applicable CMSIS libraries perform those operations in floating point, very efficiently.

The big advantage to doing the filtering in the frequency domain rather than the time domain is that the computationally intensive convolution routine can be replaced by a point-by-point complex multiply routine. I referenced Steven Smith’s The Scientist and Engineer’s Guide to Digital Signal Processing [1] while doing this project. A link to it is available in RESOURCES at the end of the article. In Chapter 18 he mentions that the execution time for a standard FIR convolution routine is proportional to the number of FIR “taps,” whereas an FFT convolution routine’s execution time increases only as the logarithm of the number of FIR taps. Smith assumes that equivalent floating-point math instructions are used for both methods, and the following holds true:

1) For < 64 taps, standard convolution routines are faster.
2) For > 64 taps, FFT convolution routines are faster.


In Figure 18-3 of Smith’s text [1], he shows that a 512-tap standard convolution is almost 4 times slower than a 512-tap FFT convolution. I had no idea how much slower the Teensy 3.6’s floating-point instructions would be compared to its highly optimized DSP 16-bit fixed-point instructions. Therefore, I couldn’t tell whether it would be possible to implement a standard 512-tap floating-point FIR filter in real time (at 16-bit, 44,100 Hz sample rate). Considering that the FFT convolution routine should be 4 times faster, I decided to use that technique. Looking at the result that I show later in the article, this proved to be a wise choice.

Figure 3 shows the basic algorithm used for a 513-tap FFT FIR convolution filter. First let’s consider the 513-tap figure. When doing a convolution, the filter “kernel” that is used must be symmetrical around its central point (Figure 4). That is why a 513-tap value (an odd number) is used rather than 512. 512 is 2^9 (FFTs are generally 2^n in size).

FIGURE 3 – Block diagram of the algorithm used in the Convolution Filter. The details of the overlap-add operations are not shown here, but are explained in the article.

FIGURE 4 – Shown here is a representative plot of the coefficients of a FIR low-pass filter. Notice that it is symmetrical around the half-way point in the number of taps.

Before doing any processing on the audio input stream, we must first obtain a “filter mask.” This is derived from the array containing the FIR filter coefficients—after it has been processed by a floating-point complex FFT routine, which brings it into the frequency domain. In Figure 3, I show the 513-point FIR coefficients as a 16-bit integer array. That is how a guitar cabinet impulse response file is structured—it is supplied as a WAV file in 16-bit signed format. I convert this to a floating-point array (using CMSIS arm_q15_to_float), so that it can be processed by the 1024-point, floating-point complex FFT routine (CMSIS arm_cfft_f32). Note that if you were instead trying to implement a FIR filter using coefficients from a FIR filter calculator [2], they would be normalized floating-point numbers. My FIR Filter Mask processing routine expects 16-bit integer values, so you would have to multiply those normalized floating-point coefficients by 32,768. The FIR Filter Mask, as described above, needs to be calculated only once for any given FIR filter profile. You might wonder why I am using a 1024-point complex FFT routine, when I have only 513 data-points. I’ll discuss that later.

Next, let’s look at the processing needed for filtration of the signal in real time. The Teensy Audio library does all its audio data transfer and processing in blocks of 128 samples of 16-bit audio data. This means the incoming digital audio signal (from the SGTL5000 Codec) is transferred into Teensy 3.6 SRAM by a DMA burst transaction of 128 words (256 bytes). Similarly, these 128-word blocks are moved between various SRAM memory locations under DMA control for processing. Finally, the output data also are sent back to the Codec under DMA control.

This block size is a compromise chosen to minimize latency time (2.9 ms per 128-sample block), while still allowing for efficient DMA transfers and other data-processing chores. However, for the 513-tap FIR routine to work, we need our 16-bit audio data to be available in 512-sample blocks. Without going into any detail yet, let’s just say that four of the Teensy Audio library’s 128-sample blocks are concatenated into one 512-sample block. An integer-to-float routine (CMSIS arm_q15_to_float) is used to convert this into a 512-element floating-point array.
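The 2.9 ms figure follows directly from the block size and the sample rate. A small sketch of the arithmetic (the function name is hypothetical):

```cpp
#include <cassert>

// Duration in milliseconds of a block of `samples` at `sample_rate_hz`.
double block_duration_ms(int samples, double sample_rate_hz) {
    assert(samples > 0 && sample_rate_hz > 0.0);
    return 1000.0 * samples / sample_rate_hz;
}
```

At 44,100 samples/s, a 128-sample block spans about 2.9 ms. Gathering four blocks into a 512-sample frame therefore takes about 11.6 ms, which is the price paid in latency for the larger block.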

This 512-sample array of time-domain audio data must now be converted into the frequency domain. This is done using a 1024-point complex floating-point FFT (CMSIS arm_cfft_f32). Why do we need a 1024-point complex FFT when we are processing only a 512-sample audio block? To begin, the audio signal data coming in consists of only the real part, not the imaginary part of a complex array. The math behind this is beyond my pay grade. But I know from experience that the sound coming out of the filter won’t be correct if you don’t use a complex FFT routine, and you must fill the imaginary portion of the input array with the same audio data that you have in the real portion. The complex FFT routine expects its input array to have the real and imaginary values interleaved, so when you are transferring the incoming audio data into the FFT array, you write each value twice before advancing to the next incoming data point.
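The interleaved layout is easier to see in code. The following sketch packs a block of real samples into a {real, imag, real, imag, ...} buffer and leaves the remainder zeroed. It uses std::vector for clarity rather than the fixed arrays a Teensy program would use, and the function name is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Pack real samples into an interleaved complex buffer of fft_len points.
// Each sample is written to both the real and the imaginary slot, as the
// article describes; unused points stay at zero.
std::vector<float> pack_complex_block(const std::vector<float>& samples,
                                      std::size_t fft_len) {
    assert(samples.size() <= fft_len);
    std::vector<float> buf(2 * fft_len, 0.0f);  // 2 floats per complex point
    std::size_t k = 0;
    for (float s : samples) {
        buf[k++] = s;  // real
        buf[k++] = s;  // imag
    }
    return buf;
}
```

With 512 samples and fft_len set to 1024, the result is a 2048-float array whose first half holds the audio and whose second half is zero.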

The second question here is why are we doing a 1024-point FFT on only 512 input samples? Where are we getting the extra 512-points that we need to present to the 1024-point FFT? Here again, the theory is somewhat above my pay grade, but this is how I understand it.

Let’s go back to thinking in terms of a time-domain signal. If we are considering a continuous stream of digital audio data, it is obvious that the MCU cannot process the continuous data stream all at once. We must break the signal into smaller blocks and do the filtering on each block individually. Without getting into any math, I think it’s intuitive that filtering is just doing some form of weighted averaging over several data-points. At the very start of the datastream, there won’t be any “past history,” so the averaging process won’t be accurate. But that only happens once, at start of processing. The middle section of the block will filter okay, but as we get toward the end of the block, we’ll be missing the data present at the start of the next block, so that the averaging (filtering) will again be inaccurate. We therefore need to process the data in a way that takes into consideration the data from the next 512-sample block of data.

When a FIR digital filter with a 100-point filter kernel processes 100 incoming data points, it will result in an output of roughly 200 data-points (199, to be exact, since convolving N samples with M taps yields N + M - 1 samples). Obviously, we can’t send out 200 data-points for every 100 data-points coming in, given that the input and output sample rates are identical. If you analyze the math involved, it turns out that to provide an accurate filtered signal you must:

1) Break the incoming signal into a block half the size of the FIR filter kernel.
2) Add a block of zeros to the end of these signal data, to make the total length equal to the size of the filter kernel.
3) Perform the FIR filtering on this block, resulting in an output block equal to twice the size of the filter kernel.
4) Send the first half of this output block out to the Codec, and save the last half of this block for later.
5) Perform steps 1, 2 and 3 again on the next incoming block of data. However, for step 4, recover the saved block of data from before, add it to the first half of the output block, then send this composite first half block out to Codec. Save the second half of the block for later (as in 4).

This process is referred to as the overlap-add method in DSP texts.
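Here is a minimal time-domain sketch of overlap-add in its general textbook form, for any block size B and kernel length M. It is not the code used in this project; the struct and function names are made up for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct OverlapAddState {
    std::vector<float> kernel;  // FIR coefficients
    std::vector<float> tail;    // samples saved for the next block
};

OverlapAddState make_overlap_add(const std::vector<float>& kernel) {
    assert(!kernel.empty());
    return OverlapAddState{kernel, std::vector<float>(kernel.size() - 1, 0.0f)};
}

// Filter one block and return the same number of output samples.
std::vector<float> process_block(OverlapAddState& st, const std::vector<float>& block) {
    assert(!block.empty());
    const std::size_t B = block.size();
    const std::size_t M = st.kernel.size();
    // Convolving B samples with M taps yields B + M - 1 samples.
    std::vector<float> y(B + M - 1, 0.0f);
    for (std::size_t n = 0; n < B; ++n)
        for (std::size_t k = 0; k < M; ++k)
            y[n + k] += block[n] * st.kernel[k];
    // The "add": fold in the tail saved from the previous block.
    for (std::size_t i = 0; i < st.tail.size(); ++i) y[i] += st.tail[i];
    // Send out the first B samples and save the remaining M - 1 for later.
    std::vector<float> out(y.begin(), y.begin() + B);
    st.tail.assign(y.begin() + B, y.end());
    return out;
}
```

Feeding the samples 1 through 6 in blocks of two through a {1, 1, 1} kernel produces 1, 3, 6, 9, 12, 15, the same as filtering the whole stream in one pass.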

When we consider the FIR convolution process being done in the frequency domain, similar considerations will apply. We take 512 samples of the audio data and place them in the first half of the 1024-point FFT input array, filling both the real and imaginary elements with the audio data as mentioned above. We then fill the second half of the array with zeros (for both the real and imaginary elements). After the 1024-point FFT is performed, we will have a 1024-element array of complex data in the frequency domain. In a similar fashion, the 513 FIR coefficients are padded out to 1024 points before undergoing the 1024-point FFT, which produces the Filter Mask.

The FIR convolution process in the time domain is equal to an array multiplication in the frequency domain. So, we take the FFT array from the incoming signal and multiply it with the FFT array from the FIR filter coefficients (the Filter Mask that was pre-calculated). The resulting 1024-point array, still in the frequency domain, must now be converted back into the time domain. This is done using a 1024-point iFFT routine (CMSIS arm_cfft_f32). Note that both the CMSIS FFT and iFFT routines are called using the same “arm_cfft_f32” label, but there is a parameter passed to this routine for which a “0” designates an FFT and a “1” designates an iFFT routine.
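The equivalence between multiplication in the frequency domain and convolution in the time domain can be checked on a desktop with a few lines of C++. This sketch uses a slow, naive DFT instead of the CMSIS FFT so that it has no dependencies; the function names are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

using cvec = std::vector<std::complex<double>>;

// Naive O(N^2) DFT. With inverse == true it computes the inverse
// transform, including the 1/N scaling.
cvec naive_dft(const cvec& in, bool inverse) {
    const std::size_t N = in.size();
    assert(N > 0);
    const double pi = std::acos(-1.0);
    const double sign = inverse ? 1.0 : -1.0;
    cvec out(N);
    for (std::size_t k = 0; k < N; ++k) {
        std::complex<double> acc(0.0, 0.0);
        for (std::size_t n = 0; n < N; ++n) {
            const double ang = sign * 2.0 * pi * static_cast<double>(k * n)
                               / static_cast<double>(N);
            acc += in[n] * std::complex<double>(std::cos(ang), std::sin(ang));
        }
        out[k] = inverse ? acc / static_cast<double>(N) : acc;
    }
    return out;
}

// Convolve x with h by zero-padding both to x.size() + h.size() - 1 points,
// multiplying the two spectra point by point, and transforming back.
std::vector<double> convolve_via_dft(const std::vector<double>& x,
                                     const std::vector<double>& h) {
    assert(!x.empty() && !h.empty());
    const std::size_t N = x.size() + h.size() - 1;
    cvec X(N), H(N);  // value-initialized to zero, which provides the padding
    for (std::size_t i = 0; i < x.size(); ++i) X[i] = x[i];
    for (std::size_t i = 0; i < h.size(); ++i) H[i] = h[i];
    X = naive_dft(X, false);
    H = naive_dft(H, false);
    for (std::size_t i = 0; i < N; ++i) X[i] *= H[i];
    X = naive_dft(X, true);
    std::vector<double> y(N);
    for (std::size_t i = 0; i < N; ++i) y[i] = X[i].real();
    return y;
}
```

Convolving {1, 2, 3} with {1, 1} this way yields {1, 3, 5, 3} to within rounding error, matching the direct time-domain result. The zero padding matters: without it, the tail of the result wraps around and corrupts the first samples.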

We are now back in the time domain with an array of 1024 floating-point digital audio samples. We take the first half of this array and add it to the 512 points of data saved from the last block. These 512 floating-point numbers are then converted back to 16-bit integers (CMSIS arm_float_to_q15) and sent out to the Codec to be converted to an analog signal. We then save the second half of this array to a temporary array, which will be added into the output stream the next time around. You can see that the overlap-add method that I discussed in terms of the time-domain FIR convolution is also performed, in a similar way, in the frequency-domain FIR convolution process. Note that in Figure 3, I’ve simplified the diagram somewhat by not including the zeroing of the second half of the signal input array (and Filter Mask routine) nor have I shown the addition of the saved arrays from the previous block calculations. Listing 1 shows the “C” program code to perform the filtering as explained above.

LISTING 1 – Most of the actual computation is performed in this section of the program. The complexity is hidden by the use of high-level, DSP-like routines contained in the Arm CMSIS library.

// 4 blocks are in - now do the FFT1024, complex multiply and iFFT1024 on 512 samples of data
// using the overlap/add method
// 1st convert Q15 samples to float
arm_q15_to_float(buffer, float_buffer_L, 512);
// float_buffer samples are now standardized from > -1.0 to < 1.0
if (passThru == 0) {
  // zero pad last half of array - necessary to prevent aliasing in FFT
  memset(FFT_buffer + 1024, 0, sizeof(FFT_buffer) / 2);
  // fill FFT_buffer with current audio samples
  k = 0;
  for (i = 0; i < 512; i++) {
    FFT_buffer[k++] = float_buffer_L[i]; // real
    FFT_buffer[k++] = float_buffer_L[i]; // imag
  }
  // calculations are performed in-place in FFT routines
  arm_cfft_f32(&arm_cfft_sR_f32_len1024, FFT_buffer, 0, 1); // perform complex FFT
  // complex multiplication in Freq domain = convolution in time domain
  arm_cmplx_mult_cmplx_f32(FFT_buffer, FIR_filter_mask, iFFT_buffer, FFT_length);
  arm_cfft_f32(&arm_cfft_sR_f32_len1024, iFFT_buffer, 1, 1); // perform complex inverse FFT
  k = 0;    // first real value in the first half of iFFT_buffer
  l = 1024; // first real value in the second half of iFFT_buffer
  for (int i = 0; i < 512; i++) {
    // this performs the "ADD" in overlap/Add
    float_buffer_L[i] = last_sample_buffer_L[i] + iFFT_buffer[k++];
    // this saves 512 samples (overlap) for next time around
    last_sample_buffer_L[i] = iFFT_buffer[l++];
    k++; // skip the interleaved imaginary values
    l++;
  }
} // end if passThru
// convert floats to Q15 and save in temporary array tbuffer
arm_float_to_q15(&float_buffer_L[0], &tbuffer[0], BUFFER_SIZE*4);

The above description assumes that 512 audio samples are available to filter, all at once. However, the Teensy Audio library doesn’t work this way. It operates with a timed interrupt service routine (ISR) that occurs every 2.9 ms and processes a single, 128-sample block of audio data.

All the Teensy Audio processing libraries must contain a routine called “update.” This routine is responsible for receiving one of these blocks, doing whatever processing is required, and then transmitting that block and releasing its memory. You can use numerous Audio library functions in series, if so desired. So, every 2.9 ms, the Audio ISR fires, and the update code for each of the audio functions that the programmer has used in the program will be executed in sequence. Each one is processing a single, 128-sample block of audio data, and then passing it along.

Obviously, I had to write some code to adapt this 128-sample block processing into one that works with 512 samples at a time. To do this, I define a variable called “state,” which persists between these Audio ISR “update” calls. At each update, “state” is incremented by 1. For states 0 to 3, I store the incoming 128 samples of audio data in a temporary 512-element integer buffer (incrementing the buffer pointer by 256 bytes each time).

When state=3, this temporary buffer is full, so I call the 512-point FFT convolution routine (described in the last section and shown in Listing 1). That fills up a 512-element integer transmit buffer. The state variable is now set to zero, to start the process over again. In addition, for states 0 through 3, I point to successive one-quarter sections of this transmit buffer, and send a 128-sample block from this buffer section back out to the Audio library’s queue, where it will either undergo further processing (if required by the program) or be sent out to the Codec to be converted to an analog audio signal. The transmit buffer will have no valid data in it the first four times that the Audio update occurs, since no filter processing has yet taken place. So, you could get a short “blip” of noise (around 12 ms) when the program first starts processing audio data.
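The block-gathering scheme can be sketched as a small state machine. This is one plausible arrangement written for illustration, not the library code: the struct name is made up, and process_frame() is a placeholder that simply copies input to output where the real program would run the FFT convolution.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

constexpr std::size_t kBlock = 128;         // Audio library block size
constexpr std::size_t kFrame = 4 * kBlock;  // samples per filter pass

struct BlockGatherer {
    std::array<int16_t, kFrame> in_buf{};   // gathers four incoming blocks
    std::array<int16_t, kFrame> out_buf{};  // most recent processed frame
    int state = 0;

    // Placeholder for the 512-sample filter: copies input to output.
    void process_frame() { out_buf = in_buf; }

    // Called once per 128-sample block; `block` is overwritten with output.
    void update(int16_t* block) {
        assert(block != nullptr);
        const std::size_t off = static_cast<std::size_t>(state) * kBlock;
        for (std::size_t i = 0; i < kBlock; ++i) {
            in_buf[off + i] = block[i];   // store the incoming quarter
            block[i] = out_buf[off + i];  // emit the matching quarter of the last frame
        }
        if (state == 3) {
            process_frame();  // all four quarters are in: filter the frame
            state = 0;
        } else {
            ++state;
        }
    }
};
```

In this sketch the first four calls emit the zero-initialized output buffer, and every later call emits data that arrived four blocks earlier. That four-block delay is the roughly 12 ms start-up gap described above.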

If you’ve carefully followed the above explanation, you can see that out of four consecutive, ISR-driven “updates,” three of them do no processing apart from moving data from one buffer to another. It is the fourth update that does all the filter processing. Using the Audio library’s AudioProcessorUsage() function, I found that the percentage of available MCU processing power used by updates 1 through 3 was less than 1%, and update 4 was 47%. These figures are obtained with the Teensy 3.6 overclocked at 240 MHz. The figures—quoted on my original GitHub site for this project [3]—are for a Teensy 3.6 clocked at 180 MHz, and are proportionately higher.

Earlier, I explained that the desired FIR coefficients must be converted into what’s called the Filter Mask, for frequency-domain filtering. Basically, I was interested in two sources for these FIR coefficients:

1) FIR filter coefficients for standard types of filters, obtained by filter calculation programs—either web-based tools or dedicated programs running on either a PC or an embedded MCU
2) Guitar Cabinet Impulse files

Let’s look at #1 first, because this type of filtering could be used more widely. If you need a filter with specific parameters that will seldom or never change, you are probably best served using a FIR filter design application, either web-based or a PC application that can be downloaded. A common web-based program is TFilter [2].

Using this program, there are a few considerations to note. For use with the Teensy Audio library/Audio Shield, the sample frequency must be set to 44,117 Hz. The Teensy Audio library actually runs at a sample rate of 44,117 Hz, slightly different from the CD standard of 44,100 Hz. Also, the filter coefficients will be output in either double-precision floating-point or integer. You would choose integer in this case, as my program is designed to work primarily with Guitar Cabinet Impulse files, which are normally formatted as Microsoft WAV files. These files use a 16-bit waveform format. Figure 5 is a screenshot of TFilter showing a low-pass filter.

FIGURE 5 – A screen-capture of the Web-based program TFilter. This program can be used to generate FIR filter coefficients for various types of digital filters. To the right, you can see I’ve selected integer coefficients, because that is what my program expects. But floating-point numbers can also be chosen.

If the parameters of the filter must be changeable while the Teensy 3.6/Audio Shield is running, then another approach must be taken. If you needed only a few FIR filter profiles, it would be possible to pre-calculate them using TFilter, and then load several banks of FIR coefficients into flash memory, to be switched in and out of SRAM as needed. The Teensy 3.6 contains 1 MB of flash memory, so there’s plenty of room for filter coefficient banks.

Another approach is to embed a FIR filter calculation routine in the Teensy 3.6’s code, itself. I have included a Teensy program that includes the calc_FIR_coeffs function. This routine calculates floating-point FIR coefficients for Low-pass, High-Pass and Band-pass filters, for a user-selected number of FIR taps. Since this routine provides normalized floating-point coefficients, I multiply all the values by 32,768 before sending them to the “cabinet_impulse” array (a 16-bit integer array).

The parameters passed to this routine are as follows:

calc_FIR_coeffs (float * coeffs, int numCoeffs, float32_t fc, float32_t Astop, int type, float dfc, float Fsamprate)

float * coeffs: a pointer to a float32_t array large enough to handle the designated number of coefficients (taps)
int numCoeffs: an integer specifying the number of coefficients
float32_t fc: a floating-point number specifying center or cutoff frequency
float32_t Astop: a floating-point number specifying expected stopband attenuation in dB
int type: type of filter- 0-Low-Pass 1-High Pass 2-BandPass
float dfc: a floating-point number specifying half-filter bandwidth (for BandPass only)
float Fsamprate: a floating-point number specifying the sample rate in Hz.
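To give a feel for what such a routine does, here is a minimal sketch of a low-pass design using the windowed-sinc method. It is not the calc_FIR_coeffs routine from this project: it uses a fixed Hamming window rather than taking a stopband attenuation parameter, it handles low-pass only, and the function names are hypothetical. The second function shows the multiply-by-32,768 step, with saturation added as a precaution.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hamming-windowed sinc low-pass, normalized to unity gain at DC.
std::vector<float> design_lowpass(std::size_t num_taps, float fc_hz, float fs_hz) {
    assert(num_taps >= 3 && num_taps % 2 == 1);  // odd length, symmetric about the centre tap
    assert(fc_hz > 0.0f && fc_hz < fs_hz / 2.0f);
    const double pi = std::acos(-1.0);
    const double fn = static_cast<double>(fc_hz) / fs_hz;  // cutoff as a fraction of the sample rate
    const double mid = (num_taps - 1) / 2.0;
    std::vector<double> h(num_taps);
    double sum = 0.0;
    for (std::size_t n = 0; n < num_taps; ++n) {
        const double t = static_cast<double>(n) - mid;
        const double sinc = (t == 0.0) ? 2.0 * fn
                                       : std::sin(2.0 * pi * fn * t) / (pi * t);
        const double win = 0.54 - 0.46 * std::cos(2.0 * pi * n / (num_taps - 1));
        h[n] = sinc * win;
        sum += h[n];
    }
    std::vector<float> out(num_taps);
    for (std::size_t n = 0; n < num_taps; ++n)
        out[n] = static_cast<float>(h[n] / sum);
    return out;
}

// Scale normalized coefficients by 32768 into 16-bit integers, saturating.
std::vector<int16_t> to_int16_coeffs(const std::vector<float>& h) {
    std::vector<int16_t> q(h.size());
    for (std::size_t i = 0; i < h.size(); ++i) {
        long v = std::lround(h[i] * 32768.0f);
        if (v > 32767) v = 32767;
        if (v < -32768) v = -32768;
        q[i] = static_cast<int16_t>(v);
    }
    return q;
}
```

For example, design_lowpass(513, 2000.0f, 44117.0f) returns 513 coefficients that are symmetrical around tap 256, in the manner of Figure 4.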

For Cabinet Impulse FIR coefficients, the coefficients are generally stored in a Microsoft WAV file. The ones I have seen contain enough data to fill a 513-tap FIR filter coefficient array. For some reason, the ones I have seen are often very long files—many hundreds of thousands of bytes or more. Of this, only the first 512 or so data points in the “wave” chunk of the file contain actual coefficient data. The rest are zero-padded. Microsoft WAV files do not just contain raw wave data—they also include a header section at the beginning of the file. This header section contains “meta-data” about the format of the file, a pointer to the start of the wave data, and the length of the wave.

For my Teensy 3.6 application, I place the WAV file containing the Cabinet Impulse file onto an SD card. This card must be inserted into the SD CARD socket on the Teensy 3.6, itself—not in the SD card socket found on the Audio Shield. In the program, I open the file “MG.WAV,” which is the name of the sample file I used. You must modify this line of my program to match the filename you have, or rename your file to match.
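Because of the header described above, the samples cannot simply be read from the first byte of the file. The following is a minimal sketch of locating them in a file image already held in memory: it scans for the ASCII tag “data,” skips the 4-byte size field, and reads little-endian 16-bit samples. It is written for a desktop compiler rather than the Teensy SD library, and the function name is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Return up to `count` 16-bit samples that follow the first "data" tag.
// Returns an empty vector if the tag is not found.
std::vector<int16_t> read_wav_samples(const std::vector<uint8_t>& file,
                                      std::size_t count) {
    assert(count > 0);
    std::vector<int16_t> samples;
    for (std::size_t pos = 0; pos + 8 <= file.size(); ++pos) {
        if (std::memcmp(&file[pos], "data", 4) != 0) continue;
        std::size_t p = pos + 8;  // skip the tag (4 bytes) and the size field (4 bytes)
        while (samples.size() < count && p + 1 < file.size()) {
            // WAV sample data is little-endian: low byte first
            const uint16_t raw = static_cast<uint16_t>(file[p] | (file[p + 1] << 8));
            samples.push_back(static_cast<int16_t>(raw));
            p += 2;
        }
        break;
    }
    return samples;
}
```

A plain byte search like this is the simplest approach, but it would be fooled by a file in which the four letters “data” happen to occur earlier, for instance inside a text chunk. A stricter parser would walk the chunk list using each chunk’s size field.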

To find the start of the wave data, I open the file and search for the string “data.” Assuming it is a true WAV file, the string “data” should be found. I then skip over the next 4 bytes (the wave data size field) and then read in 513 integer values. These are stored in the array cabinet_impulse (type int16_t). Whichever method you use to generate the FIR coefficients, the coefficient data in the cabinet_impulse array must be converted into a frequency-domain Filter Mask. This is done in the Setup portion of the program, as follows:

// set to zero to disable audio processing until impulse has been loaded
// generates Filter Mask and enables the audio stream

Once convolution.impulse has executed, a valid Filter Mask array will exist, and the real-time processing (filtering) of the incoming audio stream will begin. Listing 2 shows the convolution.impulse routine. The routine is passed a pointer to the 513-element FIR coefficient array generated as described above. It first converts this integer array to a floating-point array. Then it fills up the first 513 real elements of the 1024-element FIR_filter_mask array with those 513 coefficients. Since the FIR_filter_mask array must hold complex values, every second element is set to zero, which zeroes out the imaginary part. The final 511 complex elements of this array are also zeroed out. The rationale for zeroing out the last part of the array is explained in Smith’s text [1]. The filter produces a lot of artifacts in the signal if this is not done!

LISTING 2 – The convolution.impulse routine takes a 513-element FIR array (integer) and converts it into a 1024-element Filter Mask (floating-point). The CMSIS complex FFT routine is used for this purpose.

void AudioFilterConvolution::impulse(int16_t *coefs) {
  arm_q15_to_float(coefs, FIR_coef, 513); // convert int16 coefficients to 32-bit float
  int k = 0;
  int i = 0;
  enabled = 0; // shut off audio stream while impulse is loading
  // interleave: real part = coefficient, imaginary part = 0
  for (i = 0; i < (FFT_length / 2) + 1; i++) {
    FIR_filter_mask[k++] = FIR_coef[i];
    FIR_filter_mask[k++] = 0;
  }
  // zero out the remainder of the complex array
  for (i = FFT_length + 1; i < FFT_length * 2; i++) {
    FIR_filter_mask[i] = 0.0;
  }
  // in-place forward complex FFT (ifftFlag = 0, bitReverseFlag = 1)
  arm_cfft_f32(&arm_cfft_sR_f32_len1024, FIR_filter_mask, 0, 1);
  // optional debug dump of the Filter Mask:
  // for (int i = 0; i < 1024; i++) Serial.println(FIR_filter_mask[i] * 32768);
  // for 1st time thru, zero out the last sample buffer
  memset(last_sample_buffer_L, 0, sizeof(last_sample_buffer_L));
  state = 0;
  enabled = 1; // enable audio stream again
}

After the 1024 element array has been prepared as above, a complex FFT is performed on the array (CMSIS arm_cfft_f32). Part of the “magic” in these CMSIS FFT routines is that they do the FFT process “in place”—in other words, no separate array is needed for the transformed result. As a last step, this routine zeroes out the last_sample_buffer array, which is used in the overlap-add process mentioned earlier. The first time through the overlap-add process, there is no valid last_sample_buffer array data, so it needs to be zeroed out.
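
To illustrate why the saved buffer must start out as zeros, here is a minimal, portable sketch of overlap-add. It is not the filter's actual code: the function name overlap_add_block is hypothetical, the `tail` vector stands in for last_sample_buffer, and each block is convolved directly rather than by FFT so that the example stays self-contained.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Filter one block and return block.size() output samples. `tail` holds the
// h.size() - 1 samples that spill past the end of a block. It must be all
// zeros before the first call and is updated here for the next call.
std::vector<double> overlap_add_block(const std::vector<double>& h,
                                      const std::vector<double>& block,
                                      std::vector<double>& tail) {
    assert(!h.empty() && tail.size() == h.size() - 1);
    // Full convolution of this block: block.size() + h.size() - 1 samples.
    std::vector<double> full(block.size() + h.size() - 1, 0.0);
    for (std::size_t i = 0; i < h.size(); ++i)
        for (std::size_t j = 0; j < block.size(); ++j)
            full[i + j] += h[i] * block[j];
    // Add the tail saved from the previous block to the start of this one.
    for (std::size_t k = 0; k < tail.size(); ++k)
        full[k] += tail[k];
    // Emit the first block.size() samples; keep the rest as the new tail.
    const std::ptrdiff_t split = static_cast<std::ptrdiff_t>(block.size());
    std::vector<double> out(full.begin(), full.begin() + split);
    tail.assign(full.begin() + split, full.end());
    return out;
}
```

Feeding a signal through this function block by block gives the same samples as convolving the whole signal at once. If the tail held leftover values on the first call, they would be added into the first output block as spurious signal.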

A few program details merit discussion. The FFT convolution filter that I wrote is structured to work with the Teensy Audio library. When the Teensyduino Arduino add-in is installed, this Audio library will be installed by default—unless you specifically un-check the box corresponding to it during the installation routine.

Two things must be done to the Audio library to include this convolution filter:

1) You must install some CMSIS files to the Teensy core library. The exact procedure for doing this can be found in the text file “Adding CMSIS 4 library files” located at Circuit Cellar’s article code & files download webpage. Alternatively, instructions can be found on my GitHub site [3], where I also include instructions for incorporating the newer CMSIS 5.3 library. Either one will work properly.

2) The convolution filter code consists of two files: filter_convolution.h and filter_convolution.cpp
These files must be added to the folder containing the Teensy Audio library. This folder is located under the folder in which you installed the Arduino/Teensyduino IDE. The path is: c:\your arduino folder\hardware\teensy\avr\libraries\Audio

Also, in that folder, edit Audio.h by adding the following line at the end:

#include "filter_convolution.h" // library file added by Brian Millier

Like any custom Audio library object that you add yourself, this one will not show up in the Audio System Design Tool found on the PJRC site. Probably the easiest way to generate the setup/connection code needed to incorporate this filter into your audio configuration is to draw your configuration using the Design Tool web program, placing a FIR filter where the convolution filter belongs. Import this configuration into your sketch. Then, within the “// GUItool: begin automatically generated code” section, replace “AudioFilterFIR fir1” with “AudioFilterConvolution convolution”. Also, on the two AudioConnection lines that mention it, replace “fir1” with “convolution”.

In my sample program, this configuration has already been done. The above procedure is only needed if you are writing your own program using additional Audio library objects. If you want the convolution filter keywords to be highlighted in orange in the Arduino IDE (like all the other Audio library objects), you can add the following line to the keywords file (contained in the Audio folder):

AudioFilterConvolution<TAB>KEYWORD2

Note that you must separate these two words with a TAB character, not with spaces.

I’ll just mention a few details about the NXP SGTL5000 Codec found on the Teensy Audio Shield. It contains both Line and Microphone inputs. The Microphone input is configured for an electret microphone (DC bias is provided). The SGTL5000 has a programmable gain preamplifier for the Microphone input. Both Line out and Headphone outputs (stereo) are available, and the Headphone output channel has a wide-range volume control, which is adjusted under program control. The SGTL5000 contains its own Digital Audio Processor (DAP), basically a specialized DSP that can perform various EQ and Auto Level Control functions. An easy way to become familiar with the capabilities and settings of this device is to use the online Teensy Audio Library Design Tool. See RESOURCES below for the link.

When using the Audio Shield, your sketch must contain the SGTL5000 control object. When this is included, all the necessary initialization code will be added to set up the SGTL5000 in a default configuration. The SGTL5000 is configured via the I2C bus. Its I2C address is 0x0A, which shouldn’t conflict with most other I2C devices that you might also want to use.

Figure 6 is a screenshot of the Audio Library Design Tool, showing part of the SGTL5000 info screen on the right. Figure 7 is a screenshot of the Audio Library Design Tool configured for this project. Note that a standard fir1 filter object is placed in the workspace. See the explanation in the prior section on how to replace the code generated for the fir1 object with code that implements the Convolution filter instead.

FIGURE 6 – An easy way to familiarize yourself with the SGTL5000 Codec used in the project is to refer to its Help file in PJRC’s online Audio System Design Tool.

FIGURE 7 – The Audio System Design Tool’s workspace for this project. Note that the FIR object is shown. See article text for explanation.

Figure 8 is the schematic of the project. As you can see, it comprises two modules: a Teensy 3.6 MCU module and the Teensy Audio Shield. The Audio Shield is designed so that it can be mounted directly on the Teensy 3.2, 3.5 or 3.6 MCU modules, eliminating any interconnecting wiring. The audio Line In and Line Out are available on a 10-pin IDC header. The signal designations are shown on the bottom of the board, and the pinout matches that of a PC motherboard’s audio Line in/out connector. A filter In/Out switch is connected to the Teensy Digital 37 GPIO pin. A 5-V power source can be applied either via the micro-USB port or via the Vin pin on the Teensy 3.6 module.

FIGURE 8 – Schematic diagram of the hardware used for this project. It consists of only a Teensy 3.6 module with a PJRC Teensy Audio Shield mounted on it.

I believe I spent more time figuring out how to write this code than on any other non-work-related program I’ve tackled. The final code seems very simple, because it makes extensive use of the CMSIS library routines. However, learning how they worked and how to integrate them into the pre-existing Teensy Audio Library was quite challenging. On the other side of the coin, building the circuit was trivial due to the easy integration of the Teensy 3.6 MCU module with the PJRC Audio Shield.

Author’s Note: I’d like to acknowledge all the programming effort of Paul Stoffregen, who wrote the Teensyduino Arduino add-in and the core of the Audio Library. I also referenced work done by Frank (DD4WH) on his Teensy SDR project, which included similar FFT convolution routines. A link to Frank’s Teensy SDR project can be found on the Circuit Cellar article materials webpage. 


[1] Steven W. Smith, Ph.D., “The Scientist and Engineer’s Guide to Digital Signal Processing”
[2] TFilter, online FIR filter design
[3] Author’s GitHub site for this project

RESOURCES

Teensy Audio Library Design Tool

Teensy 3.6 Arm MCU module, Teensy Audio Shield

SGTL5000 Codec

Frank’s Teensy SDR project

NXP Semiconductors



Brian Millier runs Computer Interface Consultants. He was an instrumentation engineer in the Department of Chemistry at Dalhousie University (Halifax, NS, Canada) for 29 years.


Copyright © KCK Media Corp.
All Rights Reserved

Copyright © 2024 KCK Media Corp.
