Your Gaussian Guide
Although it goes by many names, the Gaussian function appears everywhere in statistics. In this article, Robert takes a look at the math of the Gaussian function in the context of coin-toss data analysis. He then shows why this is significant in electronics, by linking the function to RMS (root mean square) measurements.
Welcome back to “The Darker Side.” I am writing this article in April 2020, in the middle of the COVID-19 crisis, from my home in France. Like hundreds of millions of people, I have been locked down in my home for one month and will be for at least a couple more months. I am not sure whether we are already in the middle of it, but I hope that you are all safe, and that the worldwide situation will have improved by the time you read this article.
Anyway, I am spending a lot of time reading articles full of medical statistics—trying to balance actual facts and the effects of randomness. That’s what gave me the idea for this article. This month I will bring you into the world of statistics. More precisely, my goal is to show you why the so-called Gaussian function appears everywhere. If you have already seen expressions like “normal distribution” or “bell curve,” this is exactly the same thing. In fact, this ubiquitous notion has several names. Because we are electronics addicts, I’ll show you why this notion is closely linked with RMS (root mean square) measurements.
HEADS OR TAILS?
Let’s start with the most basic random game: Tossing a coin (Figure 1). As you probably know, this is a very old method to invoke randomness. Even the Romans played a version of it, calling it navia aut caput, which means “ship or head.” That’s because the coins at that time displayed a ship on one side and the emperor’s head on the other.
Tossing a coin is an example of a so-called Bernoulli Trial—an experiment with two possible outcomes, where the probability of success stays constant as more and more trials are performed. More precisely, the probability of getting a head is always 0.5—assuming the coin is not biased.
Moreover, this probability is independent of the past. This means that if you toss a coin several times, each try is independent of the others. In such a case, their probabilities can simply be multiplied. For example, if you toss a coin twice, the probability of getting two heads is p(HH) = p(H) × p(H) = 0.5 × 0.5 = 0.25. The same holds for the other combinations of heads and tails: p(HH) = p(HT) = p(TH) = p(TT) = 0.25. If you sum them up, you get 1, as expected.
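The multiplication rule is easy to check by brute force. The short Python sketch below is only an illustration: the names `P_HEAD`, `p_face` and `outcomes` are made up for this example, and a fair coin is assumed. It lists the four possible two-toss sequences with their probabilities.

```python
from itertools import product

P_HEAD = 0.5  # assumption: the coin is fair

# Probability of each face for a single toss
p_face = {"H": P_HEAD, "T": 1 - P_HEAD}

# Independent tosses: multiply the per-toss probabilities
outcomes = {a + b: p_face[a] * p_face[b] for a, b in product("HT", repeat=2)}

for sequence, probability in outcomes.items():
    print(sequence, probability)
print("sum =", sum(outcomes.values()))
```

Changing `repeat=2` to a larger value (and joining more letters) would list longer sequences in the same way.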
PASCAL’S TRIANGLE AND BINOMIALS
Now let’s play a little more complex game. Imagine that you toss a coin n times, and count the number of heads—winning, say, one dollar for each head. Will you become rich? The probability of getting p heads could be calculated by determining how many different combinations of n throws can give p heads. This is what mathematicians call a binomial coefficient C(p,n), and an easy way to visualize these distributions is the well-known Pascal’s triangle (Figure 2).
How does it work? Each successive line of the triangle shows the situation after one more throw, and the arrows show the possible paths. Starting from the top, the first throw can have two outcomes: heads or tails. Tails gives you $0, heads $1, with the same probability. Now look at the second line. There are three possible gains: $0, $1 or $2. However, the probability of having $1 is twice the probability of getting either $0 or $2, since there are two possible paths to get this amount (namely heads then tails, or tails then heads).
Look now at the last line, which summarizes what could happen after five throws. There are 2^5 = 32 possible paths (TTTTT, TTTTH and so on), and your gain will be between $0 and $5. The most probable outcome is a gain of either $2 or $3, each with a probability of 10/32 = 0.31 = 31%. You may also end up with either $1 or $4 (5/32 = 16% chance each). Last, if you are very unlucky or very lucky, you may get either $0 or $5 (1/32 = 3% chance each). The sum of these probabilities is, of course, 1. Check it yourself if you don’t trust me.
I’m sure you remember how Pascal’s triangle is built: At each step, the path count for a given state is simply the sum of the path counts of its parent states. You also remember that this triangle allows you to find the algebraic expansion of a binomial expression such as (x+y)^n, right? For example:
\[ (x+y)^5 = x^5 + 5x^4y + 10x^3y^2 + 10x^2y^3 + 5xy^4 + y^5 \]
The binomial coefficients of the terms are, respectively, 1, 5, 10, 10, 5, 1, exactly the same as the path counts on the fifth line of Pascal’s triangle. If you think twice, or expand (x+y)^5 manually on a piece of paper, you will soon find out why.
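The construction rule of Pascal’s triangle fits in a few lines of Python. The sketch below is an illustration only (the function name `pascal_row` is made up for this example). It builds the fifth line by summing parent path counts, converts the counts into probabilities, and compares them with the binomial coefficients returned by the standard `math.comb` function.

```python
from math import comb

def pascal_row(n):
    """Return line n of Pascal's triangle, built by summing parent path counts."""
    row = [1]
    for _ in range(n):
        # Each new entry is the sum of its two parents (0 outside the triangle)
        row = [a + b for a, b in zip([0] + row, row + [0])]
    return row

row5 = pascal_row(5)
total = sum(row5)                          # number of possible paths, 2**5
probs = [count / total for count in row5]  # probability of each gain, $0 to $5

print(row5)
print(total)
print([round(p, 3) for p in probs])
print(row5 == [comb(5, k) for k in range(6)])
```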
OK, now imagine that you have plenty of time, and you are throwing the coin five times over and over. What will be your average gain? By average gain we mean the sum of the number of heads in each trial, divided by the number of trials. It will be $2.50. Moreover, the larger the number of trials, the closer to $2.50 your average gain will be. This may seem obvious, but this is in fact a theorem: The law of large numbers (LLN), first demonstrated by the same J. Bernoulli mentioned earlier. What will be the distribution of the gains around this average? They will follow the binomial coefficients as calculated by Pascal’s triangle. Let’s try it. Throw a coin five times, count the number of heads, redo the same test hundreds of times and plot the histogram of your gains.
Because all that would be a boring game, I wrote a small simulation script in Scilab to do this experiment automatically (Listing 1). Scilab is a free and open-source alternative to MATLAB. Even if you don’t know its syntax, I am sure you will read this source code easily. A for loop executes the same experiment ntries times. At each iteration of the loop, the grand function is called, which is Scilab’s random generator function. With the syntax I used, this function outputs a vector of five random floating-point numbers, each ranging from 0 to 1.999999. I then take the integer part of this result, which is a vector of five numbers, each either 0 or 1. The gain of the game, in dollars, is the sum of these numbers. The script then calculates and plots the histogram of the results, as well as another function I will talk about later.
What is the resulting graph? Figure 3 shows the results of 100 tries. The yellow bars show the actual result of the simulation, and the crosses show the theoretical probabilities as calculated with the binomial coefficients. As expected, they are close—and would be closer and closer as the number of trials increases.
To get a more interesting example, I ran the test again but this time tossing the coin 40 times, and doing the test either 100 times (Figure 4) or 10,000 times (Figure 5). As expected, the distribution is centered around an average gain of $20. When the number of trials is low, the resulting gains are more or less randomly distributed around this average. As the number of trials gets higher and higher, the shape of the distribution gets closer to the values of the binomial coefficients as calculated on the 40th line of Pascal’s triangle.
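For readers without Scilab, the experiment described above can be sketched in Python. This is not the Scilab script of Listing 1, only a rough equivalent under stated assumptions: the names `play`, `N_THROWS` and `N_TRIES` are made up for this example, the random generator is Python’s own with a fixed seed, and the histogram is printed as numbers instead of being plotted.

```python
import random
from collections import Counter
from math import comb

def play(n_throws, n_tries, seed=1):
    """Return a Counter mapping each gain (number of heads) to how often it occurred."""
    rng = random.Random(seed)  # fixed seed so that the run is repeatable
    gains = Counter()
    for _ in range(n_tries):
        # The integer part of a uniform number in [0, 2) is 0 or 1 with equal probability
        heads = sum(int(2 * rng.random()) for _ in range(n_throws))
        gains[heads] += 1
    return gains

N_THROWS, N_TRIES = 5, 10_000
gains = play(N_THROWS, N_TRIES)

for k in range(N_THROWS + 1):
    simulated = gains[k] / N_TRIES
    theory = comb(N_THROWS, k) / 2**N_THROWS
    print(f"${k}: simulated {simulated:.3f}, binomial {theory:.3f}")

average = sum(k * count for k, count in gains.items()) / N_TRIES
print("average gain:", average)
```

Setting `N_THROWS = 40` reproduces the second experiment, and lowering `N_TRIES` to 100 shows how ragged the histogram is with few tries.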
As you saw in this example, by tossing a coin 40 times you will probably get a reward of between, say, $15 and $25. You may get $0 or $40, but this would be very unlikely. To measure the width of such a distribution, statisticians use a metric named “standard deviation.” A low value indicates that the samples tend to stay close to the mean. This standard deviation, noted with the Greek letter σ, is calculated as the square root of the average of the squared differences from the mean. For those who prefer a small equation to a long sentence, with x_i the N samples and \bar{X} their average:
\[ \sigma = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(x_i - \bar{X}\right)^2} \]
Now let’s come back to our heads or tails game and its binomial distribution of gains. The good news is that in this case the standard deviation is very easily calculated. It is simply the square root of the number of throws n, divided by 2:
\[ \sigma = \frac{\sqrt{n}}{2} \]
For example, in the test illustrated in Figure 5, there were 40 throws per try. So, the standard deviation is the square root of 40 divided by 2. Do the calculation, and you will find 3.16. This means that you may expect the gain to be “usually” in the range of $20 ± $3.16, and Figure 5 confirms that.
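The shortcut can be checked against the definition of the standard deviation. The Python sketch below is an illustration (the function name `binomial_sigma` is made up for this example). It weights each possible gain by its binomial probability, assuming a fair coin, and compares the result with the square root of 40 divided by 2.

```python
from math import comb, sqrt

def binomial_sigma(n):
    """Standard deviation of the number of heads in n fair tosses, from the definition."""
    probs = [comb(n, k) / 2**n for k in range(n + 1)]
    mean = sum(k * p for k, p in enumerate(probs))
    variance = sum((k - mean) ** 2 * p for k, p in enumerate(probs))
    return sqrt(variance)

print(binomial_sigma(40))  # from the definition
print(sqrt(40) / 2)        # shortcut formula
```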
For an electronics-oriented guy, this standard deviation has another very common name: Root Mean Square (RMS), or more exactly: AC-coupled RMS. Yes, when an EE guy talks about RMS measurement, this is the same as when a statistician talks about standard deviation.
Why? Imagine that the gain in dollars for each try is a voltage sample. The average gain of $20 translates into an average voltage of 20V, with fluctuations around this value. Now put in a series capacitor to remove the DC component of the signal: You now get an average voltage of 0V, with the same fluctuations around zero. You have subtracted the average voltage. Last, switch on a voltmeter or oscilloscope, in true RMS measurement mode. The value you will get is exactly the standard deviation of the signal, with v_i the N voltage samples and \bar{V} their average:
\[ V_{RMS} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(v_i - \bar{V}\right)^2} \]
Looks like the definition of the standard deviation, doesn’t it?
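A few lines of Python make the equivalence concrete. This is a minimal sketch with made-up sample values; the helper name `rms` is an assumption of this example, while `fmean` and `pstdev` come from Python’s standard `statistics` module. It removes the average, as the series capacitor would, and compares the RMS of what is left with the standard deviation.

```python
from math import sqrt
from statistics import fmean, pstdev

samples = [18.0, 23.0, 20.0, 17.0, 22.0, 20.0]  # made-up "voltages" around 20 V

def rms(values):
    """Root mean square: square each sample, average, take the square root."""
    return sqrt(fmean(v * v for v in values))

dc = fmean(samples)
ac_coupled = [v - dc for v in samples]  # what the series capacitor does

print("DC-coupled RMS:", rms(samples))
print("AC-coupled RMS:", rms(ac_coupled))
print("std deviation :", pstdev(samples))
```

The DC-coupled RMS is larger because it also contains the average: its square is the square of the average plus the square of the standard deviation.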
Up to now I’ve talked about heads and tails games and binomial distributions. As explained, the binomial coefficients allow you to calculate the distribution of such games, as illustrated by the Scilab simulation earlier. The downside is that such coefficients are very complex to calculate when the number of trials is high. However, if you look once more at Figure 5, you will see that the shape of the binomial distribution looks very much like the so-called Gaussian curve (or normal curve or bell curve—once again, this is all the same). What is the link between these notions? Well, the link is very, very close.
Let’s go back in time. A French mathematician, Abraham de Moivre, demonstrated in 1733 that quite a simple formula provides a good approximation of binomial distributions, at least when the number of trials is large. This result was forgotten until another great mathematician, Carl Friedrich Gauss, rediscovered and generalized it in the early 1800s. He gave his name to the resulting function, and here is the Gaussian function:
\[ f(x) = \frac{1}{\sigma\sqrt{2\pi}} \, e^{-\frac{(x - \bar{X})^2}{2\sigma^2}} \]
It may seem complex but it isn’t. This formula directly gives the shape of the histogram around the average, without any tedious calculation of binomial coefficients, even for a very large number of trials. The standard deviation I spoke about is σ, and \bar{X} is the average.
Now look again at Figure 3 and Figure 5. In these graphs, I plotted the value of the binomial coefficients as small crosses, but also plotted the Gaussian function as a plain, red curve. You may also want another look at the source code (Listing 1) to check how it’s done. If you look closely at Figure 3, with five tosses per try, you will see that the crosses are close to, but not exactly on, the red line. The Gaussian function approximates the binomial distribution quite well, but with some error. However, in Figure 5, with 40 tosses per trial, the approximation is excellent and makes it unnecessary to calculate complex binomial coefficients.
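The quality of the approximation can also be estimated numerically. The Python sketch below is an illustration, not the code of Listing 1: the names `gaussian` and `worst_gap` are made up for this example, and the choice of expressing the error relative to the peak of the distribution is arbitrary. It compares each binomial probability with the Gaussian function evaluated at the same gain, using an average of n/2 and a standard deviation of the square root of n divided by 2.

```python
from math import comb, exp, pi, sqrt

def gaussian(x, mean, sigma):
    """Gaussian function with the given average and standard deviation."""
    return exp(-((x - mean) ** 2) / (2 * sigma**2)) / (sigma * sqrt(2 * pi))

def worst_gap(n):
    """Largest gap between binomial and Gaussian values, relative to the binomial peak."""
    mean, sigma = n / 2, sqrt(n) / 2
    peak = comb(n, n // 2) / 2**n
    gaps = [abs(comb(n, k) / 2**n - gaussian(k, mean, sigma)) for k in range(n + 1)]
    return max(gaps) / peak

for n in (5, 40):
    print(f"{n} tosses: worst gap = {100 * worst_gap(n):.1f}% of the peak")
```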
Now here comes the magic trick. Another mathematical result makes Gaussian distributions ubiquitous: The so-called central limit theorem (CLT). For the sake of simplicity, I will not dig into the details, but will give you an engineering translation of this very strong result:
Take any set of random signals and sum them. Then plot the histogram of their values. You will always get a Gaussian curve as soon as you have enough samples and enough variables!
Mathematicians will immediately note that such a statement leaves out plenty of hypotheses, but, in real life, that is it. This result is very strong because it doesn’t assume that the individual random signals must follow a Gaussian curve. And even if they don’t, their sum will. This was, in fact, exactly the case in our heads and tails games: Each throw can give only 0 or 1, but their sum is Gaussian.
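This engineering version of the theorem can be tried with signals that are clearly not Gaussian. The Python sketch below is a minimal experiment under arbitrary assumptions: 12 independent uniform random signals, 50,000 samples, a fixed seed, and made-up names such as `fraction_within`. It sums the signals and counts how many sums fall within one, two and three standard deviations of the average, which for a Gaussian curve should be about 68%, 95% and 99.7%.

```python
import random
from statistics import fmean, pstdev

rng = random.Random(2020)  # fixed seed for a repeatable run

N_SIGNALS = 12       # how many independent uniform "signals" are summed
N_SAMPLES = 50_000   # how many sums are collected

sums = [sum(rng.random() for _ in range(N_SIGNALS)) for _ in range(N_SAMPLES)]
mean, sigma = fmean(sums), pstdev(sums)

def fraction_within(k):
    """Fraction of the sums lying no more than k standard deviations from the average."""
    return sum(abs(s - mean) <= k * sigma for s in sums) / N_SAMPLES

for k in (1, 2, 3):
    print(f"within {k} sigma: {100 * fraction_within(k):.1f}%")
```

With `N_SIGNALS = 1` the distribution is flat and the percentages move well away from the Gaussian values, which is a quick way to see the effect of the summing.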
Put simply, CLT states that the spread of anything affected by cumulative effects of randomness is Gaussian. The best example in electronics is thermal noise, which is due to the sum of plenty of small noises. It is Gaussian. Similarly, take a laser and check the shape of the beam: Gaussian. The distribution of blood pressure in a population? Gaussian. The IQ score of Circuit Cellar’s readers? Maybe with an average above 100, but Gaussian.
Because this Gaussian curve is everywhere, it is important to know its characteristics. Look first at Figure 6. It shows a Gaussian curve with an average value normalized at zero and a given standard deviation. The vertical bars show the limits at 1, 2 or 3 standard deviations from the average. Last, the percentages provide the overall probability to be in the given range. The Figure 6 graph is very important: For all Gaussian-shaped phenomena (and, as said, nearly all are), about 68% of the values will be within one standard deviation from the average. About 95% (more exactly 13.6+34.1+34.1+13.6) will stay within two standard deviations, and about 99.7% within three standard deviations.
What’s another way to express this result? You will almost never see a signal sample more than 3σ from the average. More specifically, it will likely be within two standard deviations (95 out of 100 should be), and almost certainly within 3σ (997 out of 1,000). Interested in more unusual events? Only 1 sample in each 15,000 will be more than 4σ, and 1 in each 1,700,000 will be more than 5σ.
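These percentages do not need to be memorized, because they follow from the error function available in most math libraries. The Python sketch below uses `math.erf` and `math.erfc` from the standard library; the helper names `inside` and `outside` are made up for this example.

```python
from math import erf, erfc, sqrt

def inside(k):
    """Probability that a Gaussian sample is within k standard deviations of the average."""
    return erf(k / sqrt(2))

def outside(k):
    """Probability that it is more than k standard deviations away (both sides together)."""
    return erfc(k / sqrt(2))  # erfc stays accurate for very small tail values

for k in (1, 2, 3):
    print(f"within {k} sigma: {100 * inside(k):.2f}%")
for k in (4, 5):
    print(f"beyond {k} sigma: about 1 sample in {1 / outside(k):,.0f}")
```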
A final interesting property about Gaussian curves is illustrated in Figure 7. If you take any Gaussian-shaped histogram, and cut it halfway from the maximum, you will get a width of 2.355 times the standard deviation. This value is called the full width at half maximum (FWHM).
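The 2.355 factor is twice the square root of 2 ln 2, which follows from solving the Gaussian function for half of its maximum. The Python sketch below checks this numerically; the function names and the bisection search are assumptions of this example, and any root finder would do.

```python
from math import exp, log, sqrt

def gaussian_shape(x, sigma):
    """Gaussian centered on zero, scaled so that its maximum is 1."""
    return exp(-(x**2) / (2 * sigma**2))

def half_max_crossing(sigma):
    """Find by bisection the positive x where the curve drops to half its maximum."""
    lo, hi = 0.0, 10 * sigma
    for _ in range(100):
        mid = (lo + hi) / 2
        if gaussian_shape(mid, sigma) > 0.5:
            lo = mid   # still above half maximum: move outward
        else:
            hi = mid
    return (lo + hi) / 2

sigma = 1.0
fwhm = 2 * half_max_crossing(sigma)
print(fwhm / sigma)          # numerical result
print(2 * sqrt(2 * log(2)))  # closed form
```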
SWITCH ON THE OSCILLOSCOPE
Checking if the FWHM of a distribution is close to 2.355σ or not is, in fact, a way to check whether a distribution could be Gaussian. Why not check this with an actual signal and an oscilloscope? For the purposes of this article, I did a very simple test. I switched on one of the nice scopes we have in our company’s lab: a Teledyne LeCroy Waverunner 610Zi. Without anything connected to its input, I increased the vertical scale up to 1mV per division to measure the input noise. The result is the top curve shown in Figure 8—noise… Using the measurement tools, I employed the scope to provide me with some statistics on this noise. Looking at Figure 8, I get the following:
Peak-to-peak voltage = 1.7mV
Average voltage = 30µV
RMS voltage = 173µV
Standard deviation = 170µV
Are these figures consistent? Let’s see. The average should be zero. It is 30µV, so it’s close to zero. The standard deviation and RMS voltage should be equal if the average voltage were zero. They are, respectively, 170µV and 173µV. Very close again. Fine. Finally, is the peak-to-peak voltage consistent with the standard deviation? This is a difficult question, because it all depends on the number of samples. With Gaussian noise, the longer you wait, the higher the peak-to-peak measurement will be. Theoretically, if you wait for an infinite time, then you should get an infinite peak-to-peak measurement, because very infrequent events do happen!
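The small difference between the RMS voltage and the standard deviation can itself be checked, because the square of the RMS value is the sum of the squares of the average and of the standard deviation. The Python sketch below applies this relation to the readings quoted above; the variable names are made up for this example.

```python
from math import sqrt

# Readings quoted from the scope, in microvolts
average_uv = 30.0
std_dev_uv = 170.0
rms_reading_uv = 173.0

# RMS combines the DC part (average) and the AC part (standard deviation)
rms_expected_uv = sqrt(average_uv**2 + std_dev_uv**2)

print(f"expected RMS: {rms_expected_uv:.1f} uV, scope reading: {rms_reading_uv:.0f} uV")
```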
When I did the measurement, the scope was configured to use a 2M-point memory. This means that there are 2 million points on the horizontal trace, all used for the statistical calculations. We can assume that the peak-to-peak measurement did catch random events with a probability of roughly one in 2 million. Remember that a Gaussian sample lands more than 5σ from the average only about once in 1.7 million? That’s roughly it, so we can expect the peak-to-peak measurement to span ±5σ, which is an amplitude of 10σ peak-to-peak. Is it true? You bet it is! The scope measured a standard deviation of 170µV and a peak-to-peak voltage of 1.7mV, exactly 10 times more!
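The same reasoning can be turned into a rough calculator. The Python sketch below rests on a simplifying assumption, namely that the extreme samples sit at the distance from the average that only about one sample out of the whole record is expected to exceed. The function names are made up for this example, and the results should be read as orders of magnitude, not as exact predictions.

```python
from math import erfc, sqrt

def tail_probability(k):
    """Chance that a Gaussian sample lies more than k sigma from the average."""
    return erfc(k / sqrt(2))

def sigma_reach(n_points):
    """Distance k (in sigma) that about one sample out of n_points is expected to exceed."""
    lo, hi = 0.0, 10.0
    for _ in range(100):
        mid = (lo + hi) / 2
        if tail_probability(mid) > 1 / n_points:
            lo = mid   # such an excursion is still too likely: move outward
        else:
            hi = mid
    return (lo + hi) / 2

for n_points in (1_000, 100_000, 2_000_000):
    k = sigma_reach(n_points)
    print(f"{n_points:>9,} points: peak-to-peak of roughly {2 * k:.1f} sigma")
```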
This small experiment also shows you that you must be very cautious with peak-to-peak measurements. In particular, take with great care the recommendations found on the web, such as “eyeball the p-p jitter on your scope, and divide by 5 to get an estimated RMS value.” This may be true when using an old analog scope or a low-end digital scope with some thousand points per acquisition, but if its memory is millions of points, then you should divide by 10 and not 5.
Because this LeCroy scope has some advanced statistical features, I pressed some buttons to display an actual histogram of the noise (Figure 8, bottom curve). As expected, it looks like a Gaussian curve. But is it actually a Gaussian? The scope can measure the FWHM of the histogram, and I got FWHM = 400µV. Is this consistent with the standard deviation of 170µV? It should be 2.355 times higher. And indeed 170µV × 2.355 = 400.35µV. It is wonderful when theory and practice match, isn’t it?
I can’t resist showing you another great way to get a Gaussian curve, this time using a purely mechanical device. Remember the Pascal’s triangle illustrated in Figure 2? Remember that the binomial distribution—which is very close to a Gaussian curve—is calculated by determining the number of paths going to each final value? Well, it is indeed possible to measure this number of paths by actually dropping beads on a vertical board with interleaved rows of pegs.
Statistically, each bead will take one of the paths at random, and you get at the bottom a number of beads proportional to the binomial coefficients. Such a device is called a “bean machine” or “Galton board,” because it was invented by Sir Francis Galton. According to Wikipedia, a large version of such a machine is on display at the Boston Museum of Science and other places. Personally, I bought a small version on Amazon from Four Pines Publishing. Its uses are limited, but it is lots of fun to see it in action. One is shown in Figure 9.
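A Galton board is also easy to imitate in software. The Python sketch below is a toy model under simple assumptions: each bead bounces left or right with equal probability at every row of pegs, the board has 10 rows, and 5,000 beads are dropped with a fixed random seed. The name `galton_board` and all the parameters are made up for this example. It prints the bead count of each bin next to the count predicted by the binomial coefficients.

```python
import random
from collections import Counter
from math import comb

def galton_board(n_rows, n_beads, seed=7):
    """Drop beads through n_rows of pegs and return how many land in each bin."""
    rng = random.Random(seed)
    bins = Counter()
    for _ in range(n_beads):
        # At every peg the bead bounces right (1) or left (0) with equal chance
        bin_index = sum(rng.randint(0, 1) for _ in range(n_rows))
        bins[bin_index] += 1
    return [bins[i] for i in range(n_rows + 1)]

N_ROWS, N_BEADS = 10, 5_000
counts = galton_board(N_ROWS, N_BEADS)

for i, count in enumerate(counts):
    expected = N_BEADS * comb(N_ROWS, i) / 2**N_ROWS
    print(f"bin {i:2d}: {count:5d} beads (binomial: {expected:7.1f})  {'#' * (count // 25)}")
```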
Here we are. As usual, I have only introduced the topic due to lack of space. There are plenty of other great properties of Gaussian curves. Do you want some examples just for fun? What is the Fourier transform of a Gaussian? A Gaussian. What is the distribution with maximum entropy for a given standard deviation? A Gaussian. What is the product of two Gaussians? A Gaussian. The convolution of two Gaussians? A Gaussian. With such a list of properties, it isn’t a surprise to find Gaussians everywhere, for example in reliability calculations. Here’s one last one: Did you know that GSM cellular systems use a modulation named GMSK? Do you know what the G stands for? You guessed it: Gaussian! If you want to know why, read my earlier column on that topic (“The Darker Side: Pulse Shaping Techniques,” Circuit Cellar 285, April 2014).
Central limit theorem
Who really discovered the Bell Curve
Robert Matthews https://www.sciencefocus.com/science/who-really-discovered-the-bell-curve/
History of the Normal Distribution
Jenny Kenkel https://www.math.utah.edu/~kenkel/normaldistributiontalk.pdf
The Pi-Cubed Programming Challenge – Week 9: Tossing a Coin and the Bell Curve
Teledyne LeCroy Waverunner 610Zi oscilloscope
PUBLISHED IN CIRCUIT CELLAR MAGAZINE • AUGUST 2020 #361