3-D Object Segmentation for Robot Handling

A commercial humanoid service robot must be able to perform human-like tasks. In a medical scenario, one such task would be to bring medicine to a patient: the robot would need to detect the medicine bottle and move its hand to the bottle to pick it up. Locating and picking up a medicine bottle is trivial for a human, but it is a challenging problem for a robot. A robot has to make sense of its environment from the visual information it receives from a camera. Even with that information, creating efficient algorithms to identify an object of interest in an image, calculate the location of the robot’s arm in space, and enable the arm to pick the object up is a daunting task. For our senior capstone project at Portland State University, we researched techniques that would enable a humanoid robot to locate and identify a common object (e.g., a medicine bottle) and acquire real-time position information about the robot’s hand in order to guide it to the target object. We used an InMoov open-source, 3-D-printed humanoid robot for this project (see Photo 1).

Photo 1: The InMoov robot built at Portland State University’s robotics lab


In the field of computer vision, there are two dominant approaches to this problem: one uses pixel-based 2-D imagery, and the other uses 3-D depth imagery. We chose the 3-D approach because of the availability of state-of-the-art open-source algorithms and the recent influx of inexpensive stereo depth cameras, such as the Intel RealSense R200.

Solving this problem requires a proper combination of hardware and software, along with a physical robot on which to implement the concept. For hardware, we used an Intel RealSense R200 depth camera to collect 3-D images and an Intel NUC with a 5th Generation Core i5 to process the 3-D image information. For software, we used the open-source Point Cloud Library (PCL) to process 3-D point cloud data.[1] PCL contains several state-of-the-art 3-D segmentation and recognition algorithms, which made it easier for us to compare our design with other work in the same area. The robot arm and object positions computed by our algorithms are published to the robot via the Robot Operating System (ROS), where other modules, such as a robot arm controller, can use them to move the robot hand.


Object segmentation is widely applied in computer vision to locate objects in an image.[2] The basic architecture of our package, as well as many others in this field, is a sequence of processing stages—that is, a pipeline. The segmentation pipeline starts with capturing an image from a 3-D depth camera. By the last stage of the pipeline, we have obtained the location and boundary information of the objects of interest, such as the hand of the robot and the nearest grabbable object.

Figure 1: 3-D object segmentation pipeline

The object segmentation pipeline of our design is shown in Figure 1. There are four main stages in our pipeline: downsampling the input raw image, using RANSAC and plane extraction algorithms, using the Euclidean Clustering technique to segment objects, and applying a bounding box to separate objects. Let’s review each one.

The raw clouds coming from the camera have a resolution that is far too high for segmentation to be feasible in real time. The basic technique for solving this problem is called “voxel filtering,” which entails merging several nearby points into a single point.[3] In other words, all points in a specified cubical region of volume are combined into a single point. The parameter that controls the size of this volume element is called the “leaf size.” Figure 2 shows an example of applying the voxel filter with several different leaf sizes. As the leaf size increases, the point cloud density decreases.
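To make the idea of a leaf size concrete, here is a minimal sketch of voxel filtering in plain Python. It is not the PCL implementation used in the project. The function name `voxel_downsample` is hypothetical, and replacing each voxel's points with their centroid is an assumption made for illustration.

```python
from collections import defaultdict
from math import floor


def voxel_downsample(points, leaf_size):
    """Merge all points that fall in the same cubic voxel into one point.

    points    -- iterable of (x, y, z) tuples, in the same unit as leaf_size
    leaf_size -- edge length of the cubic voxel (the "leaf size")
    Returns one point per occupied voxel (assumption: the centroid).
    """
    if leaf_size <= 0:
        raise ValueError("leaf_size must be positive")

    # voxel index -> [sum_x, sum_y, sum_z, point_count]
    sums = defaultdict(lambda: [0.0, 0.0, 0.0, 0])
    for x, y, z in points:
        key = (floor(x / leaf_size), floor(y / leaf_size), floor(z / leaf_size))
        acc = sums[key]
        acc[0] += x
        acc[1] += y
        acc[2] += z
        acc[3] += 1

    # Replace every occupied voxel with the centroid of its points.
    return [(sx / n, sy / n, sz / n) for sx, sy, sz, n in sums.values()]
```

Calling `voxel_downsample(cloud, 0.01)` on a cloud measured in meters would keep at most one point per 1-cm cube. A larger leaf size merges more points per voxel and so returns a sparser cloud.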

Figure 2: Down-sampling results for different leaf sizes

Random sample consensus (RANSAC) is an iterative method for fitting a mathematical model to data that contains outliers. In the case of a plane, RANSAC repeatedly generates candidate planes at different orientations and positions in the scene, looking for the plane that the most data points fit (i.e., inliers). The two parameters used are the threshold distance and the number of iterations. The greater the threshold, the thicker the plane can be. The more iterations RANSAC is allowed, the greater the probability of finding the plane with the most inliers.

Figure 3: The effects of varying the number of iterations of RANSAC. Notice that the plane on the left (a), which only used 200 iterations, was not correctly identified, while the one on the right (b), with 600 iterations, was correctly identified.

Refer to Figure 3 to see what happens as the number of iterations changes. The blue points represent the original data, the red points represent the plane inliers, and the magenta points represent the noise (i.e., outliers) remaining after a prism extraction. In the image on the left, the plane of the table was not found because RANSAC was not given enough iterations. In the image on the right, the plane was found, and the objects above the plane are properly segmented from the original data.
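The roles of the threshold distance and the iteration count can be seen in a small sketch of RANSAC plane fitting. It follows the usual formulation in which each iteration fits a candidate plane to three randomly chosen points. It is an illustration in plain Python, not the authors' PCL code, and the names `fit_plane`, `ransac_plane`, and `seed` are hypothetical.

```python
import random


def fit_plane(p1, p2, p3):
    """Return (a, b, c, d) for the plane a*x + b*y + c*z + d = 0 through three
    points, with (a, b, c) a unit normal, or None if the points are collinear."""
    ux, uy, uz = (p2[i] - p1[i] for i in range(3))
    vx, vy, vz = (p3[i] - p1[i] for i in range(3))
    # The normal is the cross product of the two edge vectors.
    a, b, c = uy * vz - uz * vy, uz * vx - ux * vz, ux * vy - uy * vx
    norm = (a * a + b * b + c * c) ** 0.5
    if norm < 1e-12:
        return None
    a, b, c = a / norm, b / norm, c / norm
    d = -(a * p1[0] + b * p1[1] + c * p1[2])
    return a, b, c, d


def ransac_plane(points, distance_threshold, max_iterations, seed=None):
    """Return (model, inlier_indices) for the candidate plane with the most inliers."""
    rng = random.Random(seed)
    best_model, best_inliers = None, []
    for _ in range(max_iterations):
        model = fit_plane(*rng.sample(points, 3))
        if model is None:
            continue  # degenerate sample, try again
        a, b, c, d = model
        # A point is an inlier if it lies within the threshold distance of the plane.
        inliers = [i for i, (x, y, z) in enumerate(points)
                   if abs(a * x + b * y + c * z + d) <= distance_threshold]
        if len(inliers) > len(best_inliers):
            best_model, best_inliers = model, inliers
    return best_model, best_inliers
```

With too few iterations, the loop may never draw three points that all lie on the table, which is one way to end up with a result like the left image in Figure 3. A larger `distance_threshold` accepts points farther from the candidate plane, which corresponds to a thicker plane.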

After RANSAC and plane extraction, the next stage in the segmentation pipeline is Euclidean Clustering. This process takes the down-sampled point cloud, with the plane and its convex hull removed, and breaks it into clusters. Ideally, each cluster corresponds to one of the objects on the table.[4] This is accomplished by first creating a kd-tree data structure, which stores the remaining points in the cloud in a way that can be searched efficiently. The algorithm then iterates over the cloud points, performing a radius search for each point. Neighboring points within the threshold radius are added to the current cluster and marked as processed. This continues until all points in the cloud have been marked as processed and assigned to a segment, at which point the algorithm terminates. After object segmentation and recognition have been performed, the robot knows which object to pick up, but it doesn’t know the boundaries of the object.
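The clustering loop described above can be sketched in plain Python as follows. To keep the example short and dependency-free, the radius search uses a uniform grid of cells instead of a kd-tree. That substitution is an assumption made for illustration, and the names `euclidean_clusters` and `min_size` are hypothetical rather than taken from the project.

```python
from collections import defaultdict, deque
from math import floor


def euclidean_clusters(points, radius, min_size=1):
    """Group point indices so that each point is within `radius` of at least
    one other point in its cluster (region growing by radius search)."""
    def cell(p):
        return tuple(floor(c / radius) for c in p)

    # Spatial index: grid cell -> indices of the points inside it.
    # (Stands in for the kd-tree; cells have edge length `radius`.)
    grid = defaultdict(list)
    for idx, p in enumerate(points):
        grid[cell(p)].append(idx)

    def radius_search(idx):
        px, py, pz = points[idx]
        cx, cy, cz = cell(points[idx])
        # Any point within `radius` must lie in one of the 27 surrounding cells.
        for dx in (-1, 0, 1):
            for dy in (-1, 0, 1):
                for dz in (-1, 0, 1):
                    for j in grid.get((cx + dx, cy + dy, cz + dz), ()):
                        qx, qy, qz = points[j]
                        if (qx - px) ** 2 + (qy - py) ** 2 + (qz - pz) ** 2 <= radius * radius:
                            yield j

    processed = [False] * len(points)
    clusters = []
    for seed in range(len(points)):
        if processed[seed]:
            continue
        processed[seed] = True
        queue, cluster = deque([seed]), []
        while queue:
            i = queue.popleft()
            cluster.append(i)
            for j in radius_search(i):
                if not processed[j]:
                    processed[j] = True  # mark as processed and grow the cluster
                    queue.append(j)
        if len(cluster) >= min_size:
            clusters.append(cluster)
    return clusters
```

Each returned cluster is a list of indices into `points`. Setting `min_size` above 1 discards very small clusters, which may help when leftover noise points survive plane extraction.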

Saroj Bardewa (saroj@pdx.edu) is pursuing an MS in Electrical and Computer Engineering at Portland State University, where he earned a BS in Computer Engineering in June 2016. His interests include computer architecture, computer vision, machine learning, and robotics.

Sean Hendrickson (hsean@pdx.edu) is a senior studying Computer Engineering at Portland State University. His interests include computer vision and machine learning.

This complete article appears in Circuit Cellar 320 (March 2017).

Microchip and SiS Offer PCAP and 3D-Gesture Interface Modules

Microchip Technology and Silicon Integrated Systems Corp. (SiS) recently partnered to offer complete projected-capacitive touch (PCAP) and 3D-gesture interface modules. The modules are intended to simplify the design of multi-touch and 3D gesture displays with Microchip’s GestIC technology.

Microchip’s GestIC is intended to be combined with multi-touch PCAP controllers. The modules from SiS integrate 2D PCAP and 3D gesture technologies. SiS modules with Microchip’s GestIC technology will enable engineers to deliver innovative 3D control displays in the consumer, home-automation, and Internet of Things markets.

Source: Microchip Technology

High-Accuracy, 3-D Magnetic Sensor

Infineon Technologies recently announced the availability of the TLV493D-A1B6, a 3-D magnetic sensor that features highly accurate three-dimensional sensing with extremely low power consumption in a small six-pin TSOP package. Magnetic field detection in x, y, and z directions enables the sensor to measure 3-D, linear, and rotation movements. The implemented digital I²C interface enables fast and bidirectional communication between the sensor and microcontroller.

The TLV493D-A1B6 is intended for consumer and industrial applications that require accurate 3-D or angular measurements or low power consumption, such as joysticks and electric meters, where the 3-D magnetic sensor helps to protect against tampering. With its contactless position sensing and the high temperature stability of its magnetic threshold, the TLV493D-A1B6 enables these systems to become smaller, more accurate, and more robust.

The TLV493D-A1B6 3-D magnetic sensor enables smaller and more energy-efficient e-meter systems. Today, up to three magnetic sensors, one for each dimension of the external magnetic field, are needed to detect tampering attempts with large magnets. In the future, the TLV493D-A1B6 will replace all three sensors, making e-meters smaller and more energy efficient.

The TLV493D-A1B6 detects all three dimensions of a magnetic field. Using lateral Hall plates for the z direction and vertical Hall plates for the x and y directions of the magnetic field, the sensor can be used in a large magnetic field range of ±150 mT for all three dimensions. This allows it to measure and cover a long magnet movement. The large operating range also makes the magnetic circuit design easy, robust, and flexible.

The TLV493D-A1B6 provides 12-bit data resolution for each measurement direction. This allows a high data resolution of 0.098 mT per bit (LSB) so that even the smallest magnet movements can be measured.
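As a worked example of the 0.098 mT per LSB figure, the sketch below converts a raw 12-bit reading into millitesla. It assumes the raw value is a signed two's-complement number, which is not stated in the text above. The function name `raw12_to_millitesla` is hypothetical and does not come from any Infineon library.

```python
LSB_MT = 0.098  # millitesla per bit, as quoted in the text


def raw12_to_millitesla(raw):
    """Convert a raw 12-bit reading (0..4095) to millitesla.

    Assumption: the reading is two's complement, so bit 11 is the sign bit.
    """
    if not 0 <= raw <= 0xFFF:
        raise ValueError("raw must be a 12-bit value")
    signed = raw - 0x1000 if raw & 0x800 else raw
    return signed * LSB_MT
```

Under that assumption, a raw value of 0x001 corresponds to +0.098 mT and 0xFFF to -0.098 mT, so a single count is the smallest field change the output can express.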

One of the main development goals for the TLV493D-A1B6 sensor was low power consumption. In Power Down mode, the sensor requires only 7 nA of supply current. To perform magnetic measurements, the sensor can be set to one of five different power modes. In Ultra Low Power Mode, for example, the sensor performs a magnetic measurement every 100 ms (10 Hz), resulting in a current consumption of 10 µA. The time between measurement cycles can be set flexibly, allowing system-specific solutions. When the sensor is used with continuous measurements, the maximum current consumption is only 3.7 mA. The power modes can also be changed during operation.

The TLV493D-A1B6 uses a standard I²C digital protocol to communicate with external microcontrollers. The sensors can be operated in a bus mode to eliminate additional wiring cost and effort.

Targeting industrial and consumer applications, TLV493D-A1B6 can be operated on supply voltages between 2.7 and 3.5 V and in a temperature range from –40°C to 125°C. The product is qualified according to industry standard JESD47.

For a fast design-in process, Infineon offers the “3D Magnetic 2Go” evaluation board. In combination with the free 3-D sensor software, first magnetic measurements are attainable within minutes. The evaluation board uses the Infineon 32-bit XMC1100 microcontroller, which is based on the ARM Cortex-M0 processor.

The “3D Magnetic 2Go” is currently available (www.ehitex.com). Engineering samples of the TLV493D-A1B6 designed for consumer and industrial applications will be available as of July 2015. Volume production is expected to start in January 2016.

Source: Infineon Technologies

GestIC Controller Enables One-step Design-in of 3-D Gesture Recognition

Microchip Technology recently announced a new addition to its patented GestIC family. The new MGC3030 3-D gesture controller features simplified user-interface options focused on gesture detection, enabling true one-step design-in of 3-D gesture recognition in consumer and embedded devices. Housed in an easy-to-manufacture SSOP28 package, the MGC3030 expands the use of 3-D gesture control features to high-volume, cost-sensitive applications such as audio, lighting, and toys.

The simplicity of gesture-detection integration offered by the MGC3030 is also achieved through Microchip’s free, downloadable AUREA graphical user interface (GUI) and easily configurable general-purpose IO ports that even allow for host MCU/processor-free usage. The MGC3030’s on-chip 32-bit digital signal processor executes real-time gesture processing, which eliminates the need for external cameras or controllers for host processing and allows for faster and more natural user interaction with devices.

The MGC3030 makes full use of the GestIC family development tools, such as Microchip’s Colibri Gesture Suite, which is an on-chip software library of gesture features. Intuitive and natural movements of the human hand are recognized, making the operation of a device functional, intuitive, and fun. Without the need to touch the device, features such as Flick Gestures, the Air Wheel, or the proximity detection perform commands such as changing audio tracks, adjusting volume control or backlighting, and many others. All gestures are processed on-chip, allowing manufacturers to realize powerful user interfaces with very low development effort.

Unique to GestIC technology, the programmable Auto Wake-Up On Approach feature begins operating in the range of 100-µW power consumption, enabling always-on gesture sensing in power-constrained applications. If real user interaction is detected, the system automatically switches into full sensing mode and alternates back to auto wake-up mode once the user leaves the sensing area. These combined features and capabilities provide designers with the ability to quickly integrate gesture detection features at price points that are ideal for high-volume devices.

Also available is Microchip’s Woodstar MGC3030 Development Kit (DM160226). The $139 kit is available via any Microchip sales representative, authorized worldwide distributor, or microchipDIRECT (www.microchip.com/Dev-Kit-012015a). The kit comes with the AUREA GUI, the central tool to parameterize the MGC3030 and the Colibri Suite to suit the needs of any design. AUREA is available via a free download at www.microchip.com/AUREA-GUI-012015a. The Colibri Gesture Suite is an extensive library of proven and natural 3-D gestures for hands and fingers that is preprogrammed into the MGC3030.

The MGC3030 featuring GestIC technology is available in a 28-pin SSOP package. Each unit costs under $2 in high volumes.

Source: Microchip Technology

Ultra-Compact Ultrasonic Sensor Series

The UCXL-MaxSonar-WR series of sensors are flexible, OEM-customizable products that can be integrated into a system with MaxBotix’s horns or flush-mounted into an existing housing. Mounting design recommendations are provided through MaxBotix’s 3-D CAD models (available in multiple formats) to facilitate your design process. The sensor layout offers four conveniently placed mounting holes for design flexibility.

The rugged, high-performance sensors are individually calibrated and feature a 1-cm resolution, an operational temperature range from –40°C to 70°C, real-time automatic calibration (voltage, humidity, and ambient noise), 200,000+ h mean time between failures (MTBF), and an operational 3-to-5.5-V voltage range with a low 3.4-mA average current requirement.

Contact MaxBotix for pricing.

MaxBotix, Inc.