It’s relatively easy to store and manipulate integers in digital systems since digital systems are based on discrete values. Typically, we store unsigned integers in plain binary representation and signed integers in two’s complement representation.
Binary representation is fairly straightforward with each successive bit (moving from right to left) carrying a weighting of the next highest power of two. This is analogous to the decimal representation we are all familiar with, where each successive digit carries a weighting of the next highest power of 10.
For example, an 8-bit value consists of bits zero through seven, where bit zero carries a weighting of 2^0 = 1, bit one carries a weighting of 2^1 = 2, and so on up to bit seven, which carries a weighting of 2^7 = 128. The decimal value V of an n-bit binary number with bits a_0 through a_(n-1) is therefore:
$$ V = \sum_{i=0}^{n-1} a_i \, 2^i $$
We can therefore store the values between zero and 2^n – 1 in an n-bit word (0 to 255 in 8 bits, 0 to 65 535 in 16 bits and 0 to 4 294 967 295 in 32 bits).
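The weighting scheme described above can be sketched in a few lines of C. This is only an illustration: the function name unsigned_value is hypothetical, and in real code you would simply use the integer directly, since the hardware already stores it this way.

```c
#include <stdint.h>

/* Illustrative only: rebuild the value of an 8-bit unsigned number by
   summing the weight of every bit that is set. The name unsigned_value
   is an assumption made for this sketch. */
uint32_t unsigned_value(uint8_t bits)
{
    uint32_t value = 0;

    for (int i = 0; i < 8; i++) {
        if (bits & (1u << i)) {
            value += 1u << i;   /* bit i carries a weighting of 2^i */
        }
    }
    return value;
}
```

Calling unsigned_value(0x81), for instance, adds the weightings of bit seven (128) and bit zero (1).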
For negative integers we most often use two’s complement representation. You will often hear the most significant bit of a signed integer referred to as the “sign bit”, leaving the remaining bits to represent the magnitude of the number. I find this way of thinking unhelpful.
Rather than viewing the most significant bit, a_(n-1), as a sign flag, I find it easier to think of it as having a negative weighting instead of a positive one. That is, –(2^(n-1)) instead of the 2^(n-1) it would have for an unsigned integer. In the case of an 8-bit signed integer, bit seven would have a weighting of –(2^7) or –128. The other bits carry the same weightings as for an unsigned number. Now if the most significant bit is zero, we have a non-negative number in the range 0 to 127, and if it is one, we have a negative number in the range –128 to –1. Table 1 shows how this works.
In the general case, the decimal value V of an n-bit two's complement number is:
$$ V = -a_{n-1} \, 2^{n-1} + \sum_{i=0}^{n-2} a_i \, 2^i $$
We can therefore store the values between –2^(n-1) and 2^(n-1) – 1 in an n-bit word (–128 to +127 in 8 bits, –32 768 to +32 767 in 16 bits and –2 147 483 648 to +2 147 483 647 in 32 bits).
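The negative-weighting view of two's complement can be checked with a short C sketch. The function name twos_complement_value is hypothetical, and the code is written for 8 bits only to keep it readable.

```c
#include <stdint.h>

/* Illustrative only: interpret an 8-bit pattern as two's complement by
   giving bit seven a weighting of -128 and bits zero through six their
   usual positive weightings. The function name is an assumption. */
int32_t twos_complement_value(uint8_t bits)
{
    int32_t value = 0;

    for (int i = 0; i < 7; i++) {
        if (bits & (1u << i)) {
            value += (int32_t)1 << i;   /* bits 0..6: weighting +2^i */
        }
    }
    if (bits & 0x80u) {
        value -= 128;                   /* bit 7: weighting -(2^7) */
    }
    return value;
}
```

With this view, the pattern 0xFF is –128 + 127 = –1 and 0x80 is –128 + 0 = –128, with no special handling of a "sign bit" required.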
Unfortunately, the world does not always come in neat integer values. Many of the real-world quantities that we are interested in are continuous (length, time, temperature) and are described by real numbers. Real numbers are those that can take any (one-dimensional) value and include the integers as well as rational numbers like –1/3 or 1.25 and irrational numbers like π or √2.
Real numbers cannot be precisely represented in digital form – the best we can do is approximate them. One of the simplest ways to do this is to use a fixed-point representation. This is really just integer representation with an implicit scaling factor. Using a decimal example may help. Four decimal digits can contain the values 0 000 to 9 999 or the values 00.00 to 99.99 if we decide to place a decimal point between the second and third digit. This is the same as scaling the integer by 1/100 and lets us approximate real numbers between zero and one hundred.
In the case of a fixed-point binary representation, for example, we could decide that the highest 16 bits of a 32-bit number are the integer part, and the lowest 16 bits are the fractional part. We have effectively scaled a 32-bit signed integer down by a factor of 2^16. This means we can represent numbers from –32 768.00000 to (approximately) +32 767.99998 with a resolution of 2^-16, or about 15 × 10^-6. This is a pretty useful range, but we can always vary the number of bits and the scaling to suit our application.
You will often see fixed-point representations denoted by a “Q-number” written as Qm.n, where m denotes the number of integer bits and n the number of fractional bits (and therefore the scale factor of 2^-n). This notation was devised by Texas Instruments (TI) and applies to signed fixed-point numbers, so the length of the overall word is m + n + 1 (since one bit is required for the sign). Our example is therefore a Q15.16 fixed-point representation, meaning we have a sign bit, 15 integer bits and 16 fractional bits.
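As a minimal sketch of the scaling idea, the following C fragment converts between double and a Q15.16 value held in a 32-bit signed integer. The type name q15_16_t, the macros and the function names are all assumptions made for this illustration, not part of any particular library. The conversion rounds to the nearest representable value and does no range checking, so the caller must keep inputs within roughly –32 768 to +32 767.99998.

```c
#include <stdint.h>

/* Hypothetical names for this sketch: a Q15.16 value is a 32-bit
   signed integer with an implicit scale factor of 1/65536. */
typedef int32_t q15_16_t;

#define Q_FRAC_BITS 16
#define Q_ONE       ((q15_16_t)1 << Q_FRAC_BITS)   /* 1.0 in Q15.16 */

/* Scale up by 2^16 and round to nearest. No range check is done. */
q15_16_t q_from_double(double x)
{
    return (q15_16_t)(x * Q_ONE + (x >= 0.0 ? 0.5 : -0.5));
}

/* Scale back down by 2^16 to recover the approximated real value. */
double q_to_double(q15_16_t q)
{
    return (double)q / Q_ONE;
}
```

A value such as 1.25 is stored as the integer 81 920 (1.25 × 65 536) and converts back exactly, whereas a value such as 0.1 can only be stored to the nearest multiple of 2^-16.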
You can then add and subtract fixed-point numbers just as you do with integers. If you multiply or divide them, you will need to make sure to re-scale afterwards. If you are doing any amount of fixed-point maths, I recommend using one of the many open-source fixed-point maths libraries, such as “libfixmath” or “Fixed Point Math Library for C”, which provide useful functions to convert to and from fixed-point format, perform complex mathematical operations such as square root and trig functions, and much more.
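To show why the re-scaling is needed, here is a hedged sketch of Q15.16 multiplication and division in C. The names q15_16_t, q_mul and q_div are hypothetical and do not come from the libraries mentioned above. Multiplying two values that are each scaled by 2^16 gives a result scaled by 2^32, so it must be divided by 2^16; dividing cancels the scale factor entirely, so the dividend is pre-scaled by 2^16 first. A 64-bit intermediate avoids losing bits. The sketch truncates toward zero, does not detect overflow of the final result, and leaves division by zero to the caller.

```c
#include <stdint.h>

/* Hypothetical names for this sketch. */
typedef int32_t q15_16_t;

#define Q_SCALE 65536   /* 2^16, the Q15.16 scale factor */

/* (a * 2^16) * (b * 2^16) = (a * b) * 2^32, so divide by 2^16 once. */
q15_16_t q_mul(q15_16_t a, q15_16_t b)
{
    int64_t product = (int64_t)a * b;
    return (q15_16_t)(product / Q_SCALE);
}

/* (a * 2^16) / (b * 2^16) = a / b with no scaling left, so scale the
   dividend up by 2^16 before dividing. The caller must ensure b != 0. */
q15_16_t q_div(q15_16_t a, q15_16_t b)
{
    int64_t dividend = (int64_t)a * Q_SCALE;
    return (q15_16_t)(dividend / b);
}
```

For example, 1.5 (stored as 98 304) multiplied by 2.0 (stored as 131 072) yields 196 608, which is 3.0 in Q15.16. Division by Q_SCALE is used instead of a right shift because shifting a negative signed value is implementation-defined in C.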
You might ask why not just use floating-point numbers, which are built into many languages including C and C++. It depends on your application, but you may not want or need the considerable code and time overhead that comes with floating-point libraries. Unless your MCU has a hardware FPU, floating-point operations will almost always be slower than the equivalent fixed-point operation. This is the reason a lot of DSP code is written for fixed-point numbers.
The important thing is to keep in mind that, whatever the representation, real numbers can only ever be approximated in digital systems. It's up to you to make sure this approximation is suitable for your purpose.
“Signed Number Representations.” In Wikipedia, August 29, 2022. https://en.wikipedia.org/w/index.php?title=Signed_number_representations&oldid=1107359112.
“Fixed-Point Arithmetic.” In Wikipedia, January 8, 2023. https://en.wikipedia.org/w/index.php?title=Fixed-point_arithmetic&oldid=1132409268.
“Q (Number Format).” In Wikipedia, December 22, 2022. https://en.wikipedia.org/w/index.php?title=Q_(number_format)&oldid=1128955177.
SourceForge. “Fixed Point Math Library for C.” Accessed January 13, 2023. https://sourceforge.net/projects/fixedptc/.
Mayer, Markus. “Libfixmath.” C, December 28, 2022. https://github.com/sunsided/libfixmath.
Andrew Levido (firstname.lastname@example.org) earned a bachelor’s degree in Electrical Engineering in Sydney, Australia, in 1986. He worked for several years in R&D for power electronics and telecommunication companies before moving into management roles. Andrew has maintained a hands-on interest in electronics, particularly embedded systems, power electronics, and control theory in his free time. Over the years he has written a number of articles for various electronics publications and occasionally provides consulting services as time allows.