Circuit Cellar Flash Back – Motion Triggered Video Camera Multiplexer

The new year 2018 is almost upon us. It’s a special year for us because it marks Circuit Cellar’s 30th anniversary. In tribute to that, we thought we’d share an article from the very first issue.



Motion Triggered Video Camera Multiplexer
by Steve Ciarcia

One of the most successful Circuit Cellar projects ever was the ImageWise video digitizing and display system (BYTE, May-August ‘87). It seems to be finding its way into a lot of industrial applications. I suppose I should feel flattered that a whole segment of American industry might someday depend on a Circuit Cellar project, but I can’t let that hinder me from completing the project that was the original incentive for ImageWise. Let me explain.

How it all started

When I’m not in the Circuit Cellar I’m across town at INK or in an office that I use to meet a prospective consulting client so that he doesn’t think that I only lead a subterranean existence. Rather than discuss the work done for other clients to make my point, however, I usually demonstrate my engineering expertise more subtly by just leaving some of the electronic “toys” I’ve presented lying around. The Fraggle Rock lunchbox with the dual-disk SB180 in it gets them every time! ImageWise was initially conceived to be the “pièce de résistance” of these hardware toys. The fact that it may have had some commercial potential was secondary. I just wanted to see the expressions on the faces of usually stern businessmen when I explained that the monitor on the corner of my desk wasn’t a closed-circuit picture of the parking lot outside my office building. It was a live video data transmission from the driveway at my house in an adjacent town.

Implementing this video system took a lot of work and it seems like I’ve opened Pandora’s box in the process. It would have been a simple matter to just aim a camera at my house and transmit a picture to the monitor on the desk, but the Circuit Cellar creed is that hardware should actually work, not just impress business executives. ImageWise is a standalone serial video digitizer (there is a companion serial-input video display unit as well) which is not computer dependent. Attached to a standard video camera, it takes a “video snapshot” at timed intervals or when manually triggered. The 256×244-pixel (64-level grayscale) image is digitized and stored in a 62K-byte block of memory. It is then serially transmitted either as an uncompressed or run-length-encoded compressed file (this will generally reduce the 62K bytes to about 40K bytes per picture, depending upon content).

An ImageWise digitizer/transmitter normally communicates with its companion receiver/display at 28.8K bits per second. Digitized pictures therefore can be taken and displayed about every 14 seconds. While this might seem like a long time, it is quite adequate for surveillance activities and approximates the picture taking rate of bank security cameras.

“Real-Time” is relative

When we have to deal with remote rather than direct communication, “freeze-frame” imaging systems such as ImageWise can lose most of their “real time” effectiveness as continuous-activity monitors due to slow transmission mediums. Using a 9600-bps modem, a compressed image will take about 40 seconds to be displayed. At 1200 bps it will take over 5 minutes!

Of course, using such narrow logic could lead one to dismiss freeze-frame video imaging and opt for hardwired direct video, whatever the distance. However, unless you own a cable television or telephone company you might have a lot of trouble stringing wires across town. All humor aside, the only reason for using continuous monitoring systems at all is to capture and record asynchronous “events.” In the case of a surveillance system, the “event” is when someone enters the area under surveillance. For the rest of the time, you might as well be taking nature photos because, according to the definition of an event, nothing else is important. Most continuous surveillance video systems are, by necessity, real-time monitors as well. Because they have no way to ascertain that an event has occurred they simply record everything and ultimately capture the “event” by default. So what if there is 6 hours of video tape or 200 gigabytes of useless video data transmission around a 4-second “event.”

If we know exactly when an event occurs and take a freeze frame picture exactly at that time, there is no difference between its record of the event and a real-time recorder or snap-shot camera at the same instant. The only difference is that a freeze-frame recorder needs some local intelligence to ascertain that an event is occurring so that it knows when to snap a picture. Sounds simple, right?

To put real-timing into my driveway monitor, I combined a video camera and an infrared motion detector. When someone (or something) enters the trigger zone of the motion detector it will also be within the field of the video camera. If motion is detected, the controller triggers the ImageWise to capture that video frame at that instant and transmit the picture via modem immediately. The result is, in fact, real-time video, albeit delayed by 40 seconds. Using a 9600-bps modem, you will see what is going on 40 seconds after it has occurred. (Of course, you’ll see parts of the picture sooner as it is painting on the screen.) Subsequent motion will trigger additional pictures until eventually the system senses nothing and goes back to timed update. With such a system you’ll also gain new knowledge. You’ll know that it was the UPS truck that drove over the hedge because you were watching, but you aren’t quite sure who bagged the flower bed.

Of course knowing a little bit is sometimes worse than nothing at all. While a single video camera and motion detector might cover the average driveway, my driveway has multiple entrances and a variety of parking areas. When I first installed a single camera to cover the main entrance all it did was create frustration. I would see a car enter and park. If the person exited the vehicle they were soon out of view of the camera and I’d be thinking, “OK, what are they doing?” Rather than laying booby traps for some poor guy delivering newspapers, I decided to expand the system to cover additional territory. Ultimately, I installed three cameras and four motion detectors which could cover all important areas and provide enough video resolution to specifically recognize individuals. (Since I have four telephone lines into my house and only one is being used with ImageWise, I suppose the next step is to use one of them as a live intercom to speak to these visitors. A third line already goes to the home control system so I could entertain less-welcome visitors with a few special effects).

Motion Triggered Video MUX

Enough of how I got into this mess! What this is all leading to is the design of my motion triggered video camera multiplexing (MTVCM) system. I am presenting it because it was fun to do, it solved a particular personal problem, and if I don’t document it somehow, I’ll never remember what I’ve got wired the next time I work on it.

The MTVCM is a 3-board microcomputer-based 4-channel video multiplexer with optoisolated trigger control inputs (see Figure 1). Unlike the high-tech, totally solid-state audio/video multiplexer (AVMUX) which I presented a couple of years ago (BYTE, Feb ‘86), the MTVCM is designed to be simple, lightning-proof, reliable, and above all flexible.


The MTVCM is designed for relatively harsh environments. To minimize wire lengths from cameras and sensors, the MTVCM is mounted in an outside garage where its anticipated operating temperature range is -20°C to +85°C. The MTVCM operates as a standalone unit running a preprogrammed control program or can be remotely commanded to operate in a specific manner. It is connected to the ImageWise and additional electronics in the house via a twisted-pair RS-232 line, one TTL “camera ready” line, and a video output cable. At the heart of the MTVCM is an industrial temperature version of the Micromint BCC52 8052-based controller which has an onboard full floating-point 8K BASIC, EPROM programmer, 48K bytes of memory, 24 bits of parallel I/O and 2 serial ports (for more information on the BCC52 contact Micromint directly, see the Articles section of the Circuit Cellar BBS, see my article “Build the BASIC-52 Computer,” BYTE, Aug ‘85, or send $2 to CIRCUIT CELLAR INK for a reprint of the original BCC52 article).

Because the BCC52 is well documented, I will not discuss it here.
The MTVCM is nothing more than a specific application of a BCC52 process controller with a little custom I/O. In the MTVCM the custom I/O consists of a 4-channel relay multiplexer board and a 4-channel optoisolated input board (Micromint now manufactures a BCC40R 8-channel relay output board and a BCC40D direct decoding 8-channel optoisolated input/output board. Their design is different and should not be confused with my MTVCM custom I/O boards). Each of my custom circuits is mounted on a BCC55 decoded and buffered BCC-bus prototyping board.

Figure 2 details the basic circuitry of the BCC55 BCC-bus prototyping board. The 44-pin BCC-bus is a relatively straightforward connection system utilizing a low-order multiplexed address/data configuration directly compatible with many standard microprocessors such as the Z8, 8085, and the 8052. On the protoboard all the pertinent busses are fully latched and buffered. The full 16-bit address is presented on J19 and J20 while the 8-bit buffered data bus is available at J21. J22 presents eight decoded I/O strobes within an address range selected via JP2.

The Multiplexer Board

Figure 3 is the schematic of the relay multiplexer added to the prototyping board. The relay circuit is specifically addressed at C900H and any data sent to that location via an XBY command [typically XBY(0C900H)=X] will be latched into the 74LS273. Since it can be destructive to attach two video outputs together, the four relays are not directly controlled by the latch outputs. Instead, bits D0 and D1 are used to address a 74LS139 one-of-four decoder chip. The decoder is enabled by a high-level output on bit D3. Therefore, a 1000 (binary) code selects relay 4 while a 1011 code selects relay 1. An output of 0000 shuts off the relay mux (eliminating the decoder and going directly to the relay drivers allows parallel control of the four relays).
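The MTVCM itself drives this latch from BASIC-52 on the BCC52; the short Python sketch below is only our illustration of the bit layout just described (address 0C900H, D0-D1 as the channel select, D3 as the decoder enable).

```python
# Illustration only: the MTVCM is programmed in BASIC-52 (XBY(0C900H)=X).
# This sketch just models the control byte described above: D0-D1 address the
# 74LS139, D3 enables it, and 0000 shuts the relay mux off.

RELAY_MUX_ADDR = 0xC900  # address of the 74LS273 latch on the relay board

def mux_code(camera):
    """Return the byte to latch for cameras 1-4, or 0 to shut the mux off."""
    if camera == 0:
        return 0b0000                    # decoder disabled, all relays open
    if not 1 <= camera <= 4:
        raise ValueError("camera must be 0 (off) or 1-4")
    return 0b1000 | (4 - camera)         # D3 = enable, D0-D1 = channel select

# Matches the article's examples: 1011 selects relay 1, 1000 selects relay 4.
assert mux_code(1) == 0b1011 and mux_code(4) == 0b1000
```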


All the normally-open relay contacts are connected together as a common output. Since only a single relay is ever on at one time, that video signal will be routed to the output. If the computer fails or there is a power interrupt, the default output state of a 74LS273 is normally high. Therefore, the highest priority camera should be attached to that input. If the system gets deep-sixed, the output will default to that camera and will still be of some use (I could also have used one of the normally-closed contacts instead but chose not to).

Fools and Mother Nature

I’m sure you’re curious so I will anticipate your question and answer it at the same time. With all the high-tech stuff that I continually present, how come I used mechanical relays? The answer is lightning! Anyone familiar with my writings will remember that I live in a hazardous environment when it comes to Mother Nature. Every year I get blasted and it’s always the high-tech stuff that gets blitzed.

Because the MTVCM has to work continuously as well as be reliable I had to take measures to protect it from externally-induced calamities. This meant that all the inputs and outputs had to be isolated. In the case of the video mux, the only low-cost totally isolated switches are mechanical relays. CMOS multiplexer chips like the ones I’ve used in other projects are not isolated and would be too susceptible. (Just think of the MTVCM as a computer with three 150-foot lightning collectors running to the cameras.) Relays still serve a useful purpose whatever the level of integrated circuit technology. They also work.

Because the infrared motion sensors are connected to the AC power line and their outputs are common with it, these too had to be isolated to protect the MTVCM. Figure 4 details the circuit of the 4-channel optoisolator input board which connects to the motion detectors.

The Optoisolator Board

The opto board is addressed at CA00H. Reading location CA00H [typically X=XBY(0CA00H)] will read the 8 inputs of the 74LS244. Bits 0-3 are connected to the four optoisolators and bits 4-7 are connected to a 4-pole DIP switch which is used for configuration and setup. Between the optoisolators and the LS244 are four 74LS86 exclusive-OR gates. They function as selectable inverters. Depending upon the inputs to the optoisolators (normally high or low) and the settings of DIP SW2 you can select what level outputs you want for your program (guys like me who never got the hang of using PNP transistors have to design hardware so that whatever programming we are forced to do can at least be done in positive logic).

The optoisolators are common units sold by Opto 22, Gordos, and other manufacturers. They are generically designated as either IAC5 or IDC5 modules depending upon whether the input voltage is 115 VAC or 5-48 VDC. Since the motion detectors I used were designed to control AC flood lights, I used the IAC5 units connected across the lights.

Now that we have the hardware I suppose we have to have some software. For all practical purposes, however, virtually none is required. Since the MTVCM is designed with hardcoded parallel port addressing, you only need about a three-line program to read the inputs, make a decision and select a video mux channel; you know, something like READ CA00H, juggle it, and OUT C900H. I love simple software.
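For readers who want that “read it, juggle it, write it” loop spelled out, here is a hedged sketch in Python rather than the BASIC-52 the BCC52 actually runs; the addresses and bit assignments come from the text, while the read_xby/write_xby helpers and the detector-to-camera mapping are our stand-ins.

```python
# Illustration only: the real program is a few lines of BASIC-52 using
# X=XBY(0CA00H) and XBY(0C900H)=X. read_xby/write_xby are hypothetical helpers.

OPTO_ADDR  = 0xCA00   # 74LS244 input port: bits 0-3 are the motion detectors
RELAY_ADDR = 0xC900   # 74LS273 latch that drives the relay mux
CODE_FOR_CAMERA = {0: 0b0000, 1: 0b1011, 2: 0b1010, 3: 0b1001, 4: 0b1000}

def control_step(read_xby, write_xby):
    motion = read_xby(OPTO_ADDR) & 0x0F            # mask off DIP-switch bits 4-7
    for detector in range(4):                      # assume detector n covers camera n+1
        if motion & (1 << detector):
            write_xby(RELAY_ADDR, CODE_FOR_CAMERA[detector + 1])
            return detector + 1                    # report which camera was selected
    write_xby(RELAY_ADDR, CODE_FOR_CAMERA[1])      # no motion: fall back to camera 1
    return 1
```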

Of course, I got a little more carried away when I actually wrote my camera control program. I use a lot of REM statements to figure out what I did. Since it would take up too much room here, I’ve posted the MTVCM mux control software on the Circuit Cellar BBS (203-871-1988) where you can download it if you want to learn more. Basically, it just sits there looking at camera #1. If it receives a motion input from one of the sensors, it switches to the appropriate camera and generates a “camera ready” output (TTL output which is optoisolated at the other end) to the ImageWise in the house. It stays on that camera if it senses additional motion or switches to other cameras if it senses motion in their surveillance area. Eventually, it times out and goes back to camera #1.

Basically, that’s all there is to the MTVCM. If you are an engineer you can think of it as a lightning-proof electrically-isolated process-control system. If not, just put it in your entertainment room and use it as a real neat camera controller. Now I’ve opened a real bag of worms. Remotely controlling the ImageWise digitizer/transmitter from my office through the house to the MTVCM is turning into a bigger task than I originally conceived. Getting the proper picture and tracking someone in the driveway is only part of the task.


I can already envision a rack of computer equipment in the house which has to synchronize this data traffic. My biggest worry is not how much coordination or equipment it will involve, but how I can design it so that I can do it all with a three-line BASIC program! Be assured that I’ll tell you how as the saga unfolds.

Article first appeared in Issue 1 of Circuit Cellar magazine – January/February 1988

Talking Hands: American Sign Language Gesture Recognition Glove

Roberto developed a glove that enables communication between the user and those around him. While the design is intended for use by people communicating in American Sign Language, you can apply what you learn in this article to a variety of communications applications.
PHOTO 1: Here you see the finished product with all of the sensors sewn in. The use of string as opposed to adhesive for the sensors allowed the components to smoothly slide back and forth as the hand was articulated.

By Roberto Villalba

While studying at Cornell University in 2014, my lab partner Monica Lin and I designed and built a glove to be worn on the right hand that uses a machine learning (ML) algorithm to translate sign language into spoken English (see Photo 1). Our goal was to create a way for the speech impaired to be able to communicate with the general public more easily. Since every person’s hand is a unique size and shape, we aimed to create a device that could provide reliable translations regardless of those differences. Our device relies on a variety of sensors, such as flex sensors, a gyroscope, an accelerometer, and touch sensors to quantify the state of the user’s hand. These sensors allow us to capture the flex on each of the fingers, the hand’s orientation, rotation, and points of contact. By collecting a moderate amount of this data for each sign and feeding it into a ML algorithm, we are able to learn the association between sensor readings and their corresponding signs. We make use of a microcontroller to read, filter and send the data from the glove to a PC. Initially, some data is gathered from the users and the information is used to train a classifier that learns to differentiate between signs. Once the training is done, the user is able to put on the glove and make gestures which the computer then turns into audible output.

FIGURE 1: After performing some calculations and characterizing our flex sensors, we decided to use a 10-kΩ resistor. Note that the rightmost point goes into one of the microcontroller’s ADC inputs.

HIGH-LEVEL DESIGN
We use the microcontroller’s analog-to-digital converter (ADC) to read the voltage drop across each of the flex sensors. We then move on to reading the linear acceleration and rotation values from the accelerometer and gyro sensor using I2C. And finally, we get binary readings from each of the touch sensors indicating whether or not there is contact. We perform as many readings as possible within a given window of time and use all of this data to do some smoothing. This information is then sent over serial to the PC where it is gathered and processed. Python must listen to information coming in from the microcontroller and either store data or predict based on already learned information. Our code includes scripts for gathering data, loading stored data, classifying the data that is being streamed live, and some additional scripts to help with visualization of sensor readings and so on.

MCU & SENSORS
The design comprises an Atmel ATmega1284P microcontroller and a glove onto which the various sensors and necessary wires were sewn. Each finger has one Spectra Symbol flex sensor stitched on the backside of the glove. The accelerometer and gyro sensors are attached to the center of the back of the glove. The two contact sensors were made out of copper tape and wire that was affixed to four key locations.

Since each flex sensor has a resistance that varies depending on how much the finger is bent, we attached each flex sensor as part of a voltage divider circuit in order to obtain a corresponding voltage that can then be input into the microcontroller.

We determined a good value for R1 by analyzing expected values from the flex sensor. Each one has a flat resistance of 10 kΩ and a maximum expected resistance (obtained by measuring its resistance on a clenched fist) of about 27 kΩ. In order to obtain the maximum range of possible output voltages from the divider circuit given an input voltage of 5 V, we plotted the expected ranges using the voltage-divider equation and values of R1 in the range of 10 to 22 kΩ. We found that the differences between the ranges were negligible and opted to use 10 kΩ for R1 (see Figure 1).
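As a quick sanity check of that choice, a few lines of Python reproduce the comparison; the divider orientation (flex sensor to 5 V, R1 to ground) is our assumption, chosen because it matches the roughly 1 V output range mentioned below.

```python
# Rough check of the R1 selection, assuming Vout = Vin * R1 / (R1 + Rflex),
# i.e., flex sensor on the 5 V side and R1 to ground (our assumption).
VIN = 5.0
R_FLAT, R_BENT = 10e3, 27e3            # flex sensor: straight vs. clenched fist

def vout(r1, rflex):
    return VIN * r1 / (r1 + rflex)

for r1 in (10e3, 15e3, 22e3):
    swing = vout(r1, R_FLAT) - vout(r1, R_BENT)
    print(f"R1 = {r1/1e3:.0f} kOhm: output swings about {swing:.2f} V")
# All three candidates swing roughly 1.1-1.2 V, so 10 kOhm is as good as any.
```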

Our resulting voltage divider has an output range of about 1 V. We were initially concerned that the resulting values from the microcontroller’s ADC would be too close together for the learning algorithm to discern between different values sufficiently. We planned to address this by increasing the input voltage to the voltage divider if necessary, but we found that the range of voltages described earlier was sufficient and performed extremely well.

The InvenSense MPU-6050 accelerometer and gyro sensor package operates on a lower VCC (3.3 V) compared to the microcontroller’s 5 V. So as not to burn out the chip, we created a voltage regulator using an NPN transistor and a trimpot, connected as shown. The trimpot was adjusted so that the output of the regulator reads 3.3 V. This voltage also serves as the source for the pull-up resistors on the SDA and SCL wires to the microcontroller. Since the I2C devices are capable only of driving the input voltages low, we connect them to VCC via two 4.7-kΩ pull-up resistors (see Figure 2).

As described later, we found that we needed to add contact sensors to several key spots on the glove (see Figure 3). These essentially function as switches that pull the microcontroller input pins to ground to signal contact (be sure to set up the microcontroller pins to use the internal pull-up resistors).

Figure 2: Here we see the schematic of the voltage regulator circuit that we created in order to obtain 3.3 V. The bottom of the schematic shows how this same regulator was used to pull up the signals at SCL and SDA.

Figure 3: The contact sensor circuitry was quite simple. The input pins of the microcontroller are set to the internal pull-up resistors and whenever the two corresponding copper ends on the fingers touch the input is pulled low.

I2C COMMUNICATIONS
Interfacing with the MPU-6050 required I2C communication, for which we chose to use Peter Fleury’s public I2C library for AVR microcontrollers. I2C is designed to support multiple devices using a single dedicated data (SDA) bus and a single clock (SCL) bus. Even though we were only using the interface for the microcontroller to regularly poll the MPU-6050, we had to adhere to the I2C protocol. Fleury’s library provided us with macros for issuing start and stop conditions from the microcontroller (the signals that indicate the microcontroller is requesting data from the MPU-6050 or is releasing control of the bus). These macros allowed us to easily initialize the I2C interface, set up the MPU-6050, and request and receive the accelerometer and gyroscope data (described later).

Figure 4: The image is the visual output received from plotting sequences of sensor readings. The clear divisions across the horizontal signal the different signs A, B, C, and D, respectively.

While testing our I2C communication with the MPU-6050, we found that the microcontroller would on rare occasions hang while waiting for data from the I2C bus. To prevent this from stalling our program, we enabled a watchdog timer that would reset the system after 0.5 seconds unless our program progressed to regular checkpoints, at which point we would reset the watchdog timer to prevent it from unnecessarily resetting the system. We were able to take this approach because our microcontroller’s work consists primarily of continuously collecting sensor data and sending packets to a separate PC.

Photo 2: In this image we see the hand gestures for R, U, and V. As you can tell, there is not much difference in the hand’s orientation or the amount of flex on the fingers. However, note that the copper pieces make different kinds of contact for each of the signs.

TINYREALTIME
For the majority of the code, we used Dan Henriksson and Anton Cervin’s TinyRealTime kernel. The primary reason for using this kernel is that we wanted to take advantage of the already implemented non-blocking UART library in order to communicate with the PC. While we only had a single thread running, we tried to squeeze in as much computation as possible while the data was being transmitted.

The program first initializes the I2C interface, the MPU-6050, and the ADC. It then enters an infinite loop in which it resets the watchdog timer and takes 16 readings from each of the sensors: accelerometer, gyroscope, flex sensors, and touch sensors. We then compute filtered values by summing the 16 readings from each sensor. Since summing the IMU readings can overflow, we make sure to shift their values by 8 bits before summing them. The data is then wrapped up into a byte-array packet organized as a header (0xA1B2C3D4), the data, and a checksum of the data. Each sensor value is stored in 2 bytes, and the checksum is calculated by summing the unsigned representation of each of the bytes in the data portion of the packet into a 2-byte integer. Once the packet has been created it is sent through the USB cable to the computer and the process repeats.
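To make that framing concrete, here is a sketch of the packet layout in Python (the firmware builds it in C on the microcontroller); the 0xA1B2C3D4 header, 2-byte fields, and summed checksum are from the text, the 13-value count matches the sensor total noted later in the article, and the byte order is our assumption.

```python
import struct

HEADER = bytes.fromhex("A1B2C3D4")   # packet header described above
NUM_SENSORS = 13                     # one value per sensor, as noted later in the article

def build_packet(values, order=">"):
    """Pack sensor values as header + 2-byte fields + 2-byte checksum.
    The big-endian byte order here is an assumption, not from the article."""
    data = struct.pack(f"{order}{NUM_SENSORS}H", *values)
    checksum = sum(data) & 0xFFFF    # sum of the unsigned data bytes, kept to 2 bytes
    return HEADER + data + struct.pack(f"{order}H", checksum)
```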

PYTHON COMMUNICATION
Communication with the microcontroller was established through the use of Python’s socket and struct libraries. We created a class called SerialWrapper whose main goal is to receive data from the microcontroller. It does so by opening a port and running a separate thread that waits for new data to become available. The data is then scanned for the header and a packet of the right length is removed when available. The checksum is then calculated and verified, and, if valid, the data is unpacked into the appropriate values and fed into a queue for other processes to extract. Since we know the format of the packet, we can use the struct library to extract all of the data from the packet, which is in a byte-array format. We then provide the user with two modes of use: one that continuously captures and labels data in order to build a dataset, and another that continuously tries to classify incoming data.

Support Vector Machines (SVMs) are a widely used family of ML algorithms that learn to classify by using a kernel. While the kernel can take various forms, the most common kind are the linear SVMs. Simply put, the classification, or sign, for a set of readings is decided by taking the dot product of the readings and the classifier. While this may seem like a simple approach, the results are quite impressive. For more information about SVMs, take a look at scikit-learn’s “Support Vector Machines” (http://scikit-learn.org/stable/modules/svm.html).
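The article doesn’t reproduce SerialWrapper itself, so the following is only a sketch of that scan-verify-unpack step, reusing the packet layout assumed in the previous listing; the function name and buffer handling are ours.

```python
import struct

HEADER = bytes.fromhex("A1B2C3D4")
NUM_SENSORS = 13
PACKET_LEN = len(HEADER) + 2 * NUM_SENSORS + 2      # header + data + checksum

def extract_packets(buffer: bytearray):
    """Yield tuples of sensor values from a growing receive buffer, discarding
    bytes until a header with a valid checksum is found."""
    while True:
        start = buffer.find(HEADER)
        if start < 0 or len(buffer) - start < PACKET_LEN:
            return                                   # wait for more serial data
        data = bytes(buffer[start + 4 : start + 4 + 2 * NUM_SENSORS])
        (checksum,) = struct.unpack(">H", buffer[start + PACKET_LEN - 2 : start + PACKET_LEN])
        del buffer[: start + PACKET_LEN]             # consume up to the end of this packet
        if sum(data) & 0xFFFF == checksum:
            yield struct.unpack(f">{NUM_SENSORS}H", data)
```

Each tuple it yields would then be pushed onto the queue that the data-collection or classification mode reads from.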

PYTHON MACHINE LEARNING
For the purposes of this project we chose to focus primarily on the alphabet, A-Z, and we added two more labels, “nothing” and “relaxed”, to the set. Our rationale for providing the “nothing” class was to have a class made up mostly of noise. This class would not only provide negative instances to help learn our other classes, but it also gave the classifier a way of indicating that the gestured sign is not recognized as one of the ones we care about. In addition, we didn’t want the classifier to try to predict any of the letters when the user was simply standing by, so we taught it what a “relaxed” state was. This state was simply the position the user put his/her hand in when not signing anything. In total there were 28 signs, or labels.

For our project we made extensive use of Python’s scikit-learn library. Since we were using various kinds of sensors with drastically different ranges of values, it was important to scale all of our data so that the SVM would have an easier time classifying. To do so we made use of the preprocessing tools available in scikit-learn. We chose to take all of our data and scale it so that the mean for each sensor was centered at zero and the readings had unit variance. This approach brought about drastic improvements in our performance and is strongly recommended. The classifier that we ended up using was an SVM provided by scikit-learn under the name SVC.
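The article doesn’t list the exact scikit-learn calls, so the following is a minimal sketch of the scale-then-classify pipeline it describes; the dataset file names, train/test split, and leaving SVC at its default kernel are our assumptions.

```python
# Minimal sketch of the preprocessing + SVC setup described above.
# The dataset file names and the train/test split are hypothetical.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X = np.load("glove_readings.npy")    # one row of 13 sensor readings per sample
y = np.load("glove_labels.npy")      # one of the 28 labels per sample

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

# StandardScaler centers each sensor at zero mean and unit variance, as in the text.
clf = make_pipeline(StandardScaler(), SVC())
clf.fit(X_train, y_train)
print("held-out accuracy:", clf.score(X_test, y_test))
```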

Figure 5: The confusion matrix demonstrates how many times each label is predicted and how many times that prediction is accurate. We would like to see a perfect diagonal line, but we see that one square does not adhere to this. This square corresponds to “predicted V when it was really U” and it shows about a 66% accuracy.

Another thing that was crucial to us as developers was the use of plotting to visualize the data and judge how well a learning algorithm should be able to predict the various signs. The main tool we developed for this was the plotting of a sequence of sensor readings as an image (see Figure 4). Since each packet contained a value for each of the sensors (13 in total), we could concatenate multiple packets to create a matrix. Each row thus corresponds to one sensor, and reading a row from left to right gives progressively later sensor readings. In addition, every packet makes up a column. This matrix could then be plotted with instances of the same sign grouped together, and the differences between these and the others could then be observed. If the difference is clear to us, then the learning algorithm should have no issue telling them apart. If not, the algorithm could struggle and changes to the approach might have been necessary.
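A sketch of that visualization, assuming matplotlib (which the article doesn’t name) and the packet tuples from the earlier listings:

```python
# Stack packets as columns so each row is one sensor over time, then show the
# matrix as an image; groups of columns from the same sign should look alike.
import numpy as np
import matplotlib.pyplot as plt

def plot_readings(packets):
    """packets: sequence of 13-value sensor tuples, grouped by sign."""
    matrix = np.array(packets, dtype=float).T     # rows = sensors, columns = time
    plt.imshow(matrix, aspect="auto", interpolation="nearest")
    plt.xlabel("packet (time)")
    plt.ylabel("sensor index")
    plt.colorbar(label="reading")
    plt.show()
```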

The final step in classification is to pass the output of the classifier through a final level of filtering and debouncing before the output reaches the user. To accomplish this, we fill a buffer with the last 10 predictions and only consider something a valid prediction if it has been predicted in at least nine of those 10. Furthermore, we debounce this output and only notify the user if it is a novel prediction and not just a continuation of the previous one. We print the result on the screen and also make use of Peter Parente’s pyttsx cross-platform text-to-speech library to output the result as audio whenever it is neither “nothing” nor “relaxed.”
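A sketch of that 9-of-10 filter and debounce, with pyttsx used as the article describes; the helper name and window handling are ours.

```python
# Keep the last 10 predictions; speak a label only when it dominates the window
# (at least 9 of 10) and differs from the last label announced.
from collections import Counter, deque
import pyttsx

window = deque(maxlen=10)
last_spoken = None
engine = pyttsx.init()

def handle_prediction(label):
    global last_spoken
    window.append(label)
    if len(window) < window.maxlen:
        return
    candidate, count = Counter(window).most_common(1)[0]
    if count >= 9 and candidate != last_spoken:   # stable and novel
        last_spoken = candidate
        print(candidate)
        if candidate not in ("nothing", "relaxed"):
            engine.say(candidate)
            engine.runAndWait()
```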

RESULTS
Our original glove did not have contact sensors on the index and middle fingers. As a result, it had a hard time predicting “R,” “U,” and “V” properly. These signs are actually quite similar to each other in terms of hand orientation and flex. To mitigate this, we added two contact sensors: one set on the tips of the index and middle fingers to detect “R,” and another pair in between the index and middle fingers to discern between “U” and “V.”

As you might have guessed, the speed of our approach is limited by the rate of communication between the microcontroller and the computer and by the rate at which we are able to poll the ADC on the microprocessor. We determined how quickly we could send data to the PC by sending data serially and increasing the send rate until we noticed a difference between the rate at which data was being received and the rate at which data was being sent. We then reduced the send frequency back to a reasonable value and converted this into a loop interval (about 3 ms).

We then aimed to gather as much data as possible from the sensors in between packet transmissions. To accomplish this, in addition to sending a packet, the microcontroller also sent the number of readings that it had performed. We then used this number to come up with a reasonable number of values to poll before aggregating the data and sending it to the PC. We concluded that the microcontroller was capable of reading each of the sensors 16 times per packet, which for our purposes provided enough room to do some averaging.

The Python algorithm is currently limited by the rate at which the microcontroller sends data to the PC and the time it takes the speech engine to say the word or letter. The rate of transfer is currently about 30 Hz, and we wait to fill a buffer with about ten unanimous predictions. This means that the fastest we could output a prediction is about three times per second, which for our needs was suitable. Of course, one can play with these values in order to get faster but slightly less accurate predictions. However, we felt that the glove was responsive enough at three predictions per second.

While we were able to get very accurate predictions, we did see some slight variations in accuracy depending on the size of the person’s hands. The accuracy of each flex sensor is limited beyond a certain point. Smaller hands produce a larger degree of bend, so the difference between similar signs with a lot of flex tends to be smaller for users with more petite hands. For example, consider the signs for “M” and “S.” The only difference between these signs is that “S” elicits slightly more flex in the fingers. For smaller hands, however, the change in the flex sensor’s resistance is small, and the algorithm may be unable to discern the difference between these signs.

Figure 6: We can see that even with very small amounts of data the classifier does quite well. After gathering just over 60 readings per sign it achieves an accuracy of over 98%.

In the end, our classifier was able to achieve an accuracy of 98% (the error being composed almost solely of “U”/“V” confusion) on a task of 28 signs: the full alphabet as well as “relaxed” and “nothing” (see Figure 5). A random classifier would guess correctly only about 4% of the time, so our device is clearly quite accurate. It is, however, worth noting that the algorithm could greatly benefit from improved touch sensors (seeing as the most common mistake is confusing “U” for “V”), from being trained on a larger population of users, and especially from larger datasets. With a broad enough dataset we could provide new users with a small training script that covers only the hardest-to-predict letters and relies on the already available data for the rest. The software has currently been trained on the two team members and has been tested on some users outside the team. The results were excellent for the team members who trained the glove and mostly satisfying, though not perfect, for the other volunteers. Since the volunteers did not have a chance to train the glove and were not very familiar with the signs, it is hard to say whether their lower accuracy was a result of overfitting, individual variations in signing, or inexperience with American Sign Language. Regardless, the software was near perfect for the users who trained it and mostly accurate for users who had no prior American Sign Language experience and did not train the glove.

Lastly it is worth noting that the amount of data necessary for training the classifier was actually surprisingly small. With about 60 instances per label the classifier was able to reach the 98% mark. Given that we receive 30 samples per second and that there are 28 signs, this would mean that gathering data for training could be done in under a minute (see Figure 6).

FUTURE UPGRADES
The project met our expectations. Our initial goal was to create a system capable of recognizing and classifying gestures, and we were able to do so with more than 98% average accuracy across all 28 classes. While we did not have a firm requirement for the rate of prediction, the resulting speed made using the glove comfortable, and it did not feel sluggish. Looking ahead, it would make sense to improve our approach to the touch sensors, since the majority of the ambiguity in signs comes from the difference between “U” and “V.” We want to use materials that lend themselves more seamlessly to clothing and provide a more reliable connection. In addition, it would be beneficial to test and train our project on a larger group of people, since this would provide us with richer data and more consistency. Lastly, we hope to make the glove wireless, which would allow it to easily communicate with phones and other devices and make the system truly portable.

RESOURCES
Arduino, “MPU-6050 Accelerometer + Gyro,” http://playground.arduino.cc/Main/MPU-6050.

Atmel Corp., “8-Bit AVR Microcontroller with 128K Bytes In-System Programmable Flash: ATmega1284P,” 8059D-AVR-11/09, 2009, www.atmel.com/images/doc8059.pdf.

P. Fleury, “AVR-Software,” 2006, http://homepage.hispeed.ch/peterfleury/avrsoftware.html.

Lund University, “Tiny Real Time,” 2006, www.control.lth.se/~anton/tinyrealtime/.

P. Parente, “pyttsx – Text-to-speech x-platform.”

Python Software Foundation, “struct – Interpret Strings as Packed Binary Data,” https://docs.python.org/2/library/struct.html.

scikit-learn, “Preprocessing Data,” http://scikit-learn.org/stable/modules/preprocessing.html.

scikit-learn, “Support Vector Machines,” http://scikit-learn.org/stable/modules/svm.html.

Spectra Symbol, “Flex Sensor FS,” 2015, www.spectrasymbol.com/wp-content/themes/spectra/images/datasheets/FlexSensor.pdf.

R. Villalba and M. Lin, “Sign Language Glove,” ECE4760, Cornell University, 2014, http://people.ece.cornell.edu/land/courses/ece4760/FinalProjects/f2014/rdv28_mjl256/webpage/.

SOURCES
ATmega1284P Microcontroller Atmel | www.atmel.com

MPU-6050 MEMS MotionTracking Device InvenSense | www.invensense.com

Article originally published in Circuit Cellar June 2016, Issue #311