Natural Human-Computer Interaction

Recent innovations in both hardware and software have brought on a new wave of interaction techniques that depart from mice and keyboards. The widespread adoption of smartphones and tablets with capacitive touchscreens shows that people prefer to manipulate virtual objects directly with their hands.

Going beyond touch-only interaction, the Microsoft Kinect sensor enables users to play games with their entire body. More recently, Leap Motion’s new compact sensor, consisting of two cameras and three infrared LEDs, has opened up the possibility of accurate fingertip tracking. With Project Glass, Google is pioneering new technology in the wearable human-computer interface. Other new additions to wearable technology include Samsung’s Galaxy Gear Smartwatch and Apple’s rumored iWatch.

This shows the hand tracking result from Kinect data. The red regions are our tracking results and the green lines are the skeleton tracking results from the Kinect SDK (based on data from the ChAirGest corpus: https://project.eia-fr.ch/chairgest/Pages/Overview.aspx).

A natural interface flattens the learning curve (i.e., it reduces the amount of time and energy a person needs to learn how to complete a particular task). Instead of a user learning to communicate with a machine through a programming language, the machine is now learning to understand the user.

Hardware advancements have turned our clunky computer boxes into miniaturized, stylish, sci-fi-like phones and watches. Along with these shrinking computers come ever-smaller sensors that enable a once keyboard-constrained computer to listen, see, and feel. These developments pave the way to natural human-computer interfaces.

If sensors are like eyes and ears, then software is analogous to the brain. Understanding human speech and gestures in real time is a challenging task for natural human-computer interaction. At a high level, speech and gesture recognition require similar processing pipelines that include data streaming from sensors, feature extraction, and pattern recognition of a time series of feature vectors. One of the main differences between the two is feature representation because speech involves audio data while gestures involve video data.

For gesture recognition, the first main step is locating the user’s hand. Popular libraries for doing this include Microsoft’s Kinect SDK and PrimeSense’s NITE library. However, these libraries only give the coordinates of the hands as points, so the actual hand shapes cannot be evaluated.

Fingertip tracking using a Kinect sensor. The green dots are the tracked fingertips.

Our team at the Massachusetts Institute of Technology (MIT) Computer Science and Artificial Intelligence Laboratory has developed methods that use a combination of skin-color and motion detection to compute a probability map of gesture salience location. The gesture salience computation takes into consideration the amount of movement and the closeness of movement to the observer (i.e., the sensor).
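As a rough illustration of how such cues might be combined, the following Python sketch multiplies a per-pixel skin-color likelihood by a motion magnitude and divides by the distance to the sensor, then normalizes the result so it sums to one. The function names, the multiplicative weighting, and the inverse-depth term are assumptions made for this example; the article does not specify the actual formula.

```python
def salience_map(skin_prob, motion, depth, eps=1e-9):
    """Combine per-pixel cues into a normalized salience probability map.

    skin_prob: 2-D list of skin-color likelihoods in [0, 1]
    motion:    2-D list of non-negative motion magnitudes (e.g., frame differences)
    depth:     2-D list of distances from the sensor (positive values)

    The product/inverse-depth weighting is an assumption for illustration only.
    """
    raw = [
        [s * m / (d + eps) for s, m, d in zip(srow, mrow, drow)]
        for srow, mrow, drow in zip(skin_prob, motion, depth)
    ]
    total = sum(sum(row) for row in raw)
    if total <= 0:
        # No evidence anywhere: fall back to a uniform map.
        n = sum(len(row) for row in raw)
        return [[1.0 / n for _ in row] for row in raw]
    return [[v / total for v in row] for row in raw]


def most_salient_pixel(prob_map):
    """Return the (row, column) of the highest-probability pixel."""
    _, r, c = max(
        (v, r, c) for r, row in enumerate(prob_map) for c, v in enumerate(row)
    )
    return r, c


if __name__ == "__main__":
    skin = [[0.9, 0.1], [0.8, 0.2]]
    motion = [[0.0, 1.0], [2.0, 0.5]]
    depth = [[1.0, 1.0], [2.0, 1.0]]
    print(most_salient_pixel(salience_map(skin, motion, depth)))
```

In this toy input, the skin-colored pixel that does not move receives zero salience, while the skin-colored pixel with strong motion dominates the map even though it is farther from the sensor.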

We can use the probability map to find the most likely area of the gesturing hands. For each time frame, after extracting the depth data for the entire hand, we compute a histogram of oriented gradients to represent the hand shape as a more compact feature descriptor. The final feature vector for a time frame includes 3-D position, velocity, and hand acceleration as well as the hand shape descriptor. We also apply principal component analysis to reduce the feature vector’s final dimension.
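The motion part of that feature vector can be sketched with simple finite differences. The example below is a minimal illustration, assuming hand positions arrive as (x, y, z) tuples at a fixed frame period; the function name `motion_features` is hypothetical, and the hand shape descriptor and the principal component analysis step are deliberately left out.

```python
def motion_features(positions, dt):
    """Build per-frame [position, velocity, acceleration] vectors.

    positions: list of (x, y, z) hand positions, one per frame
    dt:        time between frames in seconds

    Velocity and acceleration are backward finite differences, so the
    first two frames produce no output (an assumption of this sketch).
    """
    features = []
    for t in range(2, len(positions)):
        p0, p1, p2 = positions[t - 2], positions[t - 1], positions[t]
        prev_vel = [(b - a) / dt for a, b in zip(p0, p1)]
        vel = [(b - a) / dt for a, b in zip(p1, p2)]
        acc = [(v - pv) / dt for pv, v in zip(prev_vel, vel)]
        features.append(list(p2) + vel + acc)
    return features


if __name__ == "__main__":
    track = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (3.0, 0.0, 0.0)]
    print(motion_features(track, dt=1.0))
```

In a fuller pipeline, the shape descriptor would be appended to each nine-element vector before dimensionality reduction.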

A 3-D model of pointing gestures using a Kinect sensor. The top left video shows background subtraction, arm segmentation, and fingertip tracking. The top right video shows the raw depth-mapped data. The bottom left video shows the 3-D model with the white plane as the tabletop, the green line as the arm, and the small red dot as the fingertip.

The next step in the gesture-recognition pipeline is to classify the feature vector sequence into different gestures. Many machine-learning methods have been used to solve this problem. A popular one is called the hidden Markov model (HMM), which is commonly used to model sequence data. It was earlier used in speech recognition with great success.

There are two steps in gesture classification. First, we need to obtain training data to learn the models for different gestures. Then, during recognition, we find the model that is most likely to have produced the observed feature vectors. New developments in the area involve variations on the HMM, such as using hierarchical HMMs for real-time inference or using discriminative training to increase the recognition accuracy.
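The recognition step can be sketched as follows: score the observation sequence against each trained gesture model with the forward algorithm and pick the model with the highest likelihood. This is a minimal sketch, assuming discrete observation symbols (for example, feature vectors that were quantized beforehand) rather than the continuous feature vectors described above. The gesture names and all probabilities are made up for illustration, and training is not shown.

```python
import math


def log_likelihood(obs, start, trans, emit):
    """Scaled forward algorithm: log P(obs | model) for a discrete HMM.

    start[i]    : probability of starting in state i
    trans[i][j] : probability of moving from state i to state j
    emit[i][k]  : probability that state i emits symbol k
    """
    n = len(start)
    alpha = [start[i] * emit[i][obs[0]] for i in range(n)]
    log_p = 0.0
    for t in range(len(obs)):
        if t > 0:
            alpha = [
                sum(alpha[i] * trans[i][j] for i in range(n)) * emit[j][obs[t]]
                for j in range(n)
            ]
        scale = sum(alpha)
        if scale == 0.0:
            return float("-inf")  # the model cannot produce this sequence
        alpha = [a / scale for a in alpha]  # rescale to avoid underflow
        log_p += math.log(scale)
    return log_p


def classify(obs, models):
    """Return the name of the model with the highest log-likelihood."""
    return max(models, key=lambda name: log_likelihood(obs, *models[name]))


if __name__ == "__main__":
    # Hypothetical two-state, left-to-right models with two symbols.
    models = {
        "swipe": ([1.0, 0.0], [[0.5, 0.5], [0.0, 1.0]], [[0.9, 0.1], [0.1, 0.9]]),
        "circle": ([1.0, 0.0], [[0.5, 0.5], [0.0, 1.0]], [[0.1, 0.9], [0.9, 0.1]]),
    }
    print(classify([0, 0, 1, 1], models))
```

Rescaling the forward variables at each step keeps long sequences from underflowing, which matters when a gesture spans many frames.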

Currently, the newest development in speech recognition at the industry scale is a method called deep learning. Earlier machine-learning methods require careful selection of feature vectors. The goal of deep learning is the automatic discovery of powerful features from raw input data. So far, it has shown promising results in speech recognition. It could also be applied to gesture recognition to see whether it further improves accuracy.

As component form factors shrink, sensor resolutions grow, and recognition algorithms become more accurate, natural human-computer interaction will become more and more ubiquitous in our everyday life.

Ying Yin

Ying Yin is a PhD candidate and a Research Assistant at the Massachusetts Institute of Technology (MIT) Computer Science and Artificial Intelligence Laboratory. Originally from Suzhou, China, Ying received her BASc in Computer Engineering from the University of British Columbia in Vancouver, Canada, in 2008 and an MS in Computer Science from MIT in 2010. Her research focuses on applying machine learning and computer vision methods to multimodal human-computer interaction. Ying is also interested in web and mobile application development. She has won awards in web and mobile programming competitions at MIT.

Web-Based Remote I/O Control

The RIO-2010 is a web-based remote I/O control module. The Ethernet-ready module is equipped with eight relays, 16 photo-isolated digital inputs, and a 1-Wire interface for digital temperature sensor connection. The RIO-2010’s built-in web server enables you to access the I/O and use a standard web browser to remotely control the RIO-2010’s relays.

The RIO-2010 can be easily integrated into supervisory control and data acquisition (SCADA) and industrial automation systems using the standard Modbus TCP protocol. The I/O module also comes with an RS-485 serial interface for applications requiring Modbus RTU/ASCII. Its built-in web server enables you to use standard web-editing tools and Ajax dynamic page technology to customize your webpage.
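To give a sense of what a Modbus TCP request looks like on the wire, the Python sketch below builds a standard "Write Single Coil" (function code 0x05) frame, which is the usual way to switch a relay-style output. The frame layout follows the Modbus TCP specification, but the helper name, the unit ID, and the idea that coil address 0 maps to the first relay are assumptions for illustration; check the module's documentation for its actual register map.

```python
import struct


def write_single_coil_request(transaction_id, unit_id, coil_address, on):
    """Build a Modbus TCP 'Write Single Coil' (function 0x05) request frame."""
    # The protocol encodes ON as 0xFF00 and OFF as 0x0000.
    value = 0xFF00 if on else 0x0000
    # PDU: function code, coil address, coil value (all big-endian).
    pdu = struct.pack(">BHH", 0x05, coil_address, value)
    # MBAP header: transaction ID, protocol ID (0 for Modbus),
    # number of bytes that follow (unit ID + PDU), unit ID.
    mbap = struct.pack(">HHHB", transaction_id, 0, len(pdu) + 1, unit_id)
    return mbap + pdu


if __name__ == "__main__":
    # Hypothetical mapping: coil address 0 is assumed to be the first relay.
    frame = write_single_coil_request(1, 1, 0, True)
    print(frame.hex())
```

The resulting bytes would typically be sent over a TCP connection to port 502, the registered Modbus TCP port, and a successful write is acknowledged with an echo of the request.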

Contact Artila for pricing.

Artila Electronics Co., Ltd.
www.artila.com

DC Motor for Fine Rotary Motions

The RE 30 EB precious metal brushed motor features a low start-up voltage, even after a long period at standstill. With a 53-mNm rated torque, the powerful motor provides twice the power of a Maxon RE 25 EB. In addition, the RE 30 EB features minimal high-frequency interference.

The RE 30 EB motor is specifically designed for haptic applications (e.g., surgical robots). Therefore, the motor can also be used as a highly sensitive sensor, acting as the sense of touch to register mechanical resistance.

Contact Maxon for pricing.

Maxon Precision Motors
www.maxonmotorusa.com

Dual-Display Digital Multimeter

The DM3058E digital multimeter (DMM) is designed with 5.5-digit resolution and dual display. The DMM can enable system integration and is suitable for high-precision, multifunction, and automatic measurement applications.

The DM3058E is capable of measuring up to 123 readings per second. It can quickly save or recall up to 10 preset configurations, including built-in cold terminal compensation for thermocouples.

The DMM provides a convenient and flexible platform with an easy-to-use design and a built-in help system for information acquisition. In addition, it supports 10 different measurement types including DC voltage (200 mV to 1,000 V), AC voltage (200 mV to 750 V), DC current (200 µA to 10 A), AC current (20 mA to 10 A), frequency measurement (20 Hz to 1 MHz), 2-Wire and 4-Wire resistance (200 Ω to 100 MΩ), and diode, continuity, and capacitance.

The DM3058E is ideal for research and development labs and educational applications, as well as low-end detection, maintenance, and quality tests where automation combined with capability and value is needed.

The DM3058E digital multimeter costs $449.

Rigol Technologies, Inc.
www.rigolna.com

Two-Channel CW Laser Diode Driver with an MCU Interface

The iC-HT laser diode driver enables microcontroller-based activation of laser diodes in continuous wave (CW) mode. With this device, laser diodes can be driven by the optical output power (using automatic power control, or APC), the laser diode current (using automatic current control, or ACC), or a full controller-based power control unit.

The maximum laser diode current per channel is 750 mA. Both channels can be switched in parallel for high laser diode currents of up to 1.5 A. A current limit can also be configured for each channel.

Internal operating points and voltages can be output through ADCs. The integrated temperature sensor enables the system temperature to be monitored and can also be used to analyze control circuit feedback. Logarithmic DACs enable optimum power regulation across a large dynamic range. Therefore, a variety of laser diodes can be used.

The relevant configuration is stored in two equivalent memory areas. Internal current limits, a supply-voltage monitor, channel-specific interrupt-switching inputs, and a watchdog safeguard the laser diodes’ operation through iC-HT.

The device can also be operated by pin configuration in place of the SPI or I2C interface, where external resistors define the APC performance targets. An external supply voltage can be controlled through current output device configuration overlay (DCO) to reduce the system power dissipation (e.g., in battery-operated devices or systems).

The iC-HT operates on 2.8 to 8 V and can drive both blue and green laser diodes. The diode driver has a –40°C-to-125°C operating temperature range and is housed in a 5-mm × 5-mm, 28-pin QFN package.

The iC-HT costs $13.20 in 1,000-unit quantities.

iC-Haus GmbH
www.ichaus.com