Design Solutions Research & Design Hub

Write an Object Tracking Drone Application

Using OpenCV, MAVSDK and PX4

In his August article, Raul described how to set up a development environment and install a basic toolchain including PX4 and MAVSDK software. Continuing that theme, in this article Raul discusses how to develop a basic autonomous object tracking quadrotor application in simulation, by integrating computer vision object detection with the MAVSDK MAVLink library for autonomous flight control.

  • How to develop a basic autonomous object tracking quadrotor drone application

  • How to convert between Cartesian and NED coordinate systems

  • How to initialize your MAVSDK interface 

  • How to implement color range detection in your drone

  • How to make use of a Low-Pass Moving Average Filter in your drone program

  • How to test your autonomous object tracking application

  • PX4 SITL simulation environment

  • MAVSDK-Python library

  • Python OpenCV library

In this article, I discuss how to develop a basic quadrotor application for autonomous object tracking in simulation, by integrating computer vision object detection with the MAVSDK MAVLink library for autonomous flight control. PX4 Software in the Loop (SITL) simulation will be used to test the code example. I will talk about how to set up a development environment in the Ubuntu operating system that includes all the required tools—the PX4 SITL simulation environment, the MAVSDK-Python library and the Python OpenCV library. I will then explain a workflow to read and process images to detect objects by using a very basic computer vision technique—color range segmentation. Results obtained from the detection process will then be used to control a simulated quadcopter with the MAVSDK-Python MAVLink library, to make it autonomously track the detected object.

Color range segmentation is neither the best nor the most robust approach to detect and track objects with computer vision, but it serves well for the purpose of introducing concepts about interfacing computer vision detection tasks with drone autonomous flight.

The aim of this article is to present a basic example about how to integrate OpenCV, MASVDK and the PX4 simulation environment for students and hobbyists interested in experimenting with autonomous drone tracking. To follow this article, you must have at least very basic experience with OpenCV, and general knowledge about how quadcopters work, the MAVLink protocol and how to use PX4 SITL simulation to write drone autonomous flight applications with the MAVSDK-Python Library. With that in mind, check out my two-part article “Writing MAVSDK/PX4 Drone Applications” in Circuit Cellar 361 (August 2020) [1] for a quick introduction to MAVSDK development with PX4 SITL simulation.

Because we will use simulation, all the hardware needs to run the code example is a computer with Ubuntu 18.04 operating system and a webcam. If you are using a Windows or MacOS machine, you can Install VirtualBox and create an Ubuntu 18.04 virtual machine to try the example.

Figure 1 shows the main algorithm’s flow diagram. First, we initialize the webcam, make a connection to the drone and set its default coordinates. Next, the program enters into an infinite loop, in which the following tasks will be performed recurrently. Then, we get the detected target object’s “camera image” coordinates. If no target is detected, these coordinates will be undefined, in which case we will set them equal to the image frame’s center point coordinates. These will be our Cartesian coordinate system’s origin. If a target is detected, we will apply a Low-Pass Moving Average filter to the coordinates, to get rid of the high-frequency noise inherent in the computer vision detection process.

Figure 1
Main algorithm’s flow diagram

The coordinates returned by the detection algorithm are “camera image” coordinates (Width, Height) in pixel units. This coordinate system is used with cameras and image libraries in typical computing environments, to represent pixel positions in a digital image. Its origin is located at the image’s upper left corner (Figure 2a). Meanwhile, coordinates in the NED (North, East, Down) system are used to control the drone.

Figure 2
Coordinate systems. a) “Camera image” and Cartesian b) NED

The conversion between Cartesian and NED coordinate systems is straightforward, as you can see in Figure 2b. So, after obtaining the target object’s camera image coordinates, we must convert them to Cartesian coordinates, by using the equations shown in Figure 2a. Camera image and Cartesian coordinates are in pixel units. Once we have the target point in Cartesian coordinates, we must determine its position relative to a “center rectangle” area defined around the image frame’s center—the orange rectangle in Figure 3. If the target is outside the center rectangle, the target must be followed (that is, it must be re-centered in the image frame), so we compute a new set of NED coordinates from the Cartesian coordinates we already have. If the target is inside the center rectangle, the target is already centered and the last computed NED coordinates will remain as current.

Figure 3
Detection image frame

To obtain the NED coordinates, we multiply the Cartesian coordinates by a COORD_SYS_CONV_FACTOR conversion factor determined empirically. This conversion factor gives us a rough correspondence between the distances in the image frame in pixel units and the real distances in the ground in meters, assuming we are tracking objects over the ground. See the file included with the article’s source code for hints on how to determine this conversion factor. That file—and all the associated code and support files for this article—are available on Circuit Cellar’s article code and files webpage.


Advertise Here

Next, we command the drone to navigate to the current NED coordinates. Then, we will visualize the detection frame (Figure 3) in the computer’s display and repeat the infinite loop, until the user issues a “quit” command key from the computer keyboard to break it. If so, we return the drone to the home position and end the application.

The file outlines the process of installing the PX4 SITL simulation environment, the MAVSDK-Python library and QGroundControl ground station software. This is the same installation guide presented in my previous article “Writing MAVSDK/PX4 Drone Applications” [1]. From now on, please refer to that article if you want to review concepts mentioned here related to MAVSDK, PX4 SITL simulation and Python asynchronous programming with the asyncio library. These will not be discussed in detail in this article.

Once MAVSDK and the SITL simulation environment are installed, you must install the Python OpenCV computer vision library. The process is straightforward (see the guide file included with this article’s source code).

Listing 1 shows the async def run() function definition, which is the “main” entry function for our application. In line 6, vid_cam = cv2.VideoCapture(0) we instantiate an OpenCV VideoCapture object, after passing as argument the webcam’s index we want to access. 0 will usually be the index for the default webcam connected to your PC. In lines 8-11, we check if the VideoCapture object has been initialized properly. If it isn’t, we print an error message and issue a return instruction to terminate the application. In line 13, we run the get_image_params(vid_cam) function, which will compute a number of useful constant parameters derived from the image frame size. These parameters are stored in the params Python dictionary—a global variable accessed in various parts of the code to make key calculations. Next, in lines 16-17 we initialize the MAVSDK interface with the drone by creating a drone object and opening a user datagram protocol (UDP) connection to it. I will omit the explanation of other drone initialization steps that follow, because I already covered them in my previous article [1].

Listing 1
Main entry function

 1: ### ---------- This is the application's 'main' asynchronous function ----------
 2: async def run():
 3: """ Detects a target by using color range segmentation and follows it
 4: by using Offboard control and position NED coordinates. """
 6: vid_cam = cv2.VideoCapture(0)
 8: if vid_cam.isOpened() is False: 
 9:     print('[ERROR] Couldnt open the camera.')
10:      return
11:  print('-- Camera opened successfully')
13:  await get_image_params(vid_cam) 
14:  print(f"-- Original image width, height: {params['image_width']},{params['image_height']}")
16:  drone = System()
17:  await drone.connect(system_address="udp://:14540")
18:  # ... Additional drone initialization code lines go here...
20:  N_coord = 0
21:  E_coord = 0
23:  yaw_angle = 0
25:  await drone.offboard.set_position_ned(PositionNedYaw(N_coord, E_coord, D_coord,yaw_angle))
26:  await asyncio.sleep(4)
28:  while True:
29:      tgt_cam_coord, frame, contour = await get_target_coordinates(vid_cam)
31:      if tgt_cam_coord['width'] is not None and tgt_cam_coord['height'] is not None:
32:        tgt_filt_cam_coord = await moving_average_filter(tgt_cam_coord)
33:      else:
34:        tgt_cam_coord = {'width':params['y_ax_pos'], 'height':params['x_ax_pos']}
35:        tgt_filt_cam_coord = {'width':params['y_ax_pos'], 'height':params['x_ax_pos']}
37:      tgt_cart_coord = {'x':(tgt_filt_cam_coord['width'] - params['y_ax_pos']),
38:                        'y':(params['x_ax_pos'] - tgt_filt_cam_coord['height'])}
40:      COORD_SYS_CONV_FACTOR = 0.1
42:      if abs(tgt_cart_coord['x']) > params['cent_rect_half_width'] or \
43:      abs(tgt_cart_coord['y']) > params['cent_rect_half_height']:
44:          E_coord = tgt_cart_coord['x'] * COORD_SYS_CONV_FACTOR
45:          N_coord = tgt_cart_coord['y'] * COORD_SYS_CONV_FACTOR
46:          # D_coord, yaw_angle don't change
48:      await drone.offboard.set_position_ned(PositionNedYaw(N_coord, E_coord, D_coord,yaw_angle))
50:      frame = await draw_objects(tgt_cam_coord, tgt_filt_cam_coord, frame, contour)
52:      cv2.imshow("Detect and Track", frame)
54:      key = cv2.waitKey(1) & 0xFF
55:      if key == ord("q"):
56:          break
58:  print("-- Return to launch...")
59:  await drone.action.return_to_launch()
60:  print("NOTE: check the drone has landed already before running again this script.")   61:  await asyncio.sleep(5)

In lines 20-23, we define a default pose for the drone, with North, East coordinates and Yaw angle at zero, and the Down coordinate at -HOVERING_ALTITUDE, which sets the drone’s hovering altitude to the aforementioned constant defined at the beginning of the program (see the full source code listing). Line 25 sends the initial pose to the drone, which will cause the drone to take off, and line 26 generates a 4-second delay to give time for the drone to get away from the ground.

Lines 28-56 show the application’s main tasks, as outlined in the flow diagram in Figure 1. For instance, in line 29, we call the function get_target_coordinates(vid_cam), which will return the detected target’s “camera image” coordinates, along with the image frame to which the detection algorithm has been applied and the detected target’s contour.


Advertise Here

Next, in line 31, we check if the returned coordinates are valid (not “None”). If so, we call the function moving_average_filter(tgt_cam_coord), passing those coordinates as parameters to obtain their filtered values in return. Otherwise, we set the target coordinates equal to the image center’s coordinates—that is, the point at which the Cartesian origin will be placed (see lines 34-35). In lines 37-38 we make the conversion from “camera image” to Cartesian coordinates by using the equations described in Figure 2.a.

In lines 42-45, we check that the target coordinates are outside the “center rectangle” (the orange rectangle at the image center in Figure 3). If so, we compute new East and North coordinates, multiplying the (x, y) Cartesian coordinates by COORD_SYS_CONV_FACTOR. This is the conversion factor from pixel units in the “camera image” and Cartesian coordinate systems to meters in the NED coordinate system. See the file on how to estimate this conversion factor. If the target is inside the center rectangle, we assume the target is sufficiently “centered” in the image frame, so the drone will not change its previous North and East coordinates and it will remain in its current position. Otherwise, the drone must change position to re-center the target in the image frame.

In line 48 we call the drone.offboard.set_position_ned() function to command the drone to the current NED coordinates. In line 50 we draw some objects in the image frame to visualize the detection process, as shown in Figure 3. We draw the Cartesian coordinate axes, the center rectangle, the unfiltered point and the text of its coordinates (in red), the filtered point and its coordinates (in blue) and the target’s contour (in green). For this, we call the draw_objects() function, passing as arguments the unfiltered and filtered target camera coordinates, the image frame and the target contour.

In line 52 we call OpenCV’s imshow function to create a window in which the detection image frame will be shown (Figure 3). In lines 54-56, we check if the user has pressed the “q” (quit) key in the computer’s keyboard to break the while infinite loop. If not, we will continue to repeat all steps in the while loop until the “q” key is pressed. Finally, after breaking the infinite loop, in line 59 we call the drone.action.return_to_launch() function to make the drone return to home, before the script ends execution.

Listing 2 shows the definition of the get_image_params(vid_cam) function. It runs once in the main function, before the algorithm enters the while infinite loop, and computes a set of parameters derived from the image frame size. Those parameters will be used in the rest of the code to perform key calculations. In lines 4-5 the function reads one image frame from the camera and gets its height and width. Next, in lines 7-10 it compares the original image frame height with the DESIRED_IMAGE_HEIGHT constant defined at the beginning of the script (refer to the full source code listing on Circuit Cellar’s code and files download webpage). If they are different, it computes a scaling_factor that will be used to reduce or enlarge the image frame for the detection process. Typically, we will want to reduce the image size to lower the computation workload in the detection process (fewer pixels, less computation). In lines 12-13, we compute the image frame’s new width and height by applying the scaling_factor, and in lines 14-15 we resize the current frame accordingly, before computing the rest of the parameters.


Advertise Here

Listing 2
Auxiliary functions 1

 1: async def get_image_params(vid_cam):
 2:  """ Computes useful general parameters derived from the camera image size."""
 4:  _, frame =
 5:  params['image_height'], params['image_width'], _ = frame.shape
 7:  if params['image_height'] != DESIRED_IMAGE_HEIGHT:
 8:    params['scaling_factor'] = round((DESIRED_IMAGE_HEIGHT / params['image_height']),2)
 9:  else:
10:      params['scaling_factor'] = 1
12:  params['resized_width'] = int(params['image_width'] * params['scaling_factor'])
13:  params['resized_height'] = int(params['image_height'] * params['scaling_factor'])
14:  dimension = (params['resized_width'], params['resized_height'])
15:  frame = cv2.resize(frame, dimension, interpolation = cv2.INTER_AREA)
17:  params['cent_rect_half_width'] =round(params['resized_width'] * (0.5 * PERCENT_CENTER_RECT)) 
18:  params['cent_rect_half_height'] =round(params['resized_height'] * (0.5 * PERCENT_CENTER_RECT))
20:  params['min_tgt_radius'] = round(params['resized_width'] * PERCENT_TARGET_RADIUS)
22:  params['x_ax_pos'] = int(params['resized_height']/2 - 1)
23:  params['y_ax_pos'] = int(params['resized_width']/2 - 1)
25:  params['cent_rect_p1'] = (params['y_ax_pos']-params['cent_rect_half_width'], 
26:                            params['x_ax_pos']-params['cent_rect_half_height'])
27:  params['cent_rect_p2'] = (params['y_ax_pos']+params['cent_rect_half_width'], 
28:                            params['x_ax_pos']+params['cent_rect_half_height'])
30:  return
32: async def get_target_coordinates(vid_cam):
33: """ Detects a target by using color range segmentation and returns its'camera pixel' coordinates."""
35:  HSV_LOWER_BOUND = (107, 119, 41)
36:  HSV_UPPER_BOUND = (124, 255, 255)
38:  _, frame =
40:  if params['scaling_factor'] != 1:
41:      dimension = (params['resized_width'], params['resized_height'])
42:      frame = cv2.resize(frame, dimension, interpolation = cv2.INTER_AREA)
44:  blurred = cv2.GaussianBlur(frame, (11, 11), 0) 
45:  hsv = cv2.cvtColor(blurred, cv2.COLOR_BGR2HSV)
46:  mask = cv2.inRange(hsv, HSV_LOWER_BOUND, HSV_UPPER_BOUND)
47:  mask = cv2.erode(mask, None, iterations=3)
48:  mask = cv2.dilate(mask, None, iterations=3)
50:  _, contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
52:  cX = None
53:  cY = None
54:  largest_contour = None
56:  if len(contours) > 0:
57:      largest_contour = max(contours, key=cv2.contourArea)
58:      ((x, y), radius) = cv2.minEnclosingCircle(largest_contour)
60:      if radius > params['min_tgt_radius']:
61:          M = cv2.moments(largest_contour)
62:          cX = int(M["m10"] / M["m00"])
63:          cY = int(M["m01"] / M["m00"])
64:  return {'width':cX, 'height':cY}, frame, largest_contour

In lines 17-18 we compute the size of the center rectangle used to “center” the target in the image frame. In line 20 we compute a minimum target radius. Only objects with a radius greater than that will be tracked by the drone, whereas smaller targets will be ignored. In lines 22-23 we compute the position of the X and Y axes for the Cartesian coordinate system in the image frame. Finally, in lines 25-28, we compute the upper left and lower right corner points that define the center rectangle.

The function get_target_coordinates(vid_cam) begins to be defined in line 32 of Listing 2. Line 38 invokes the read() function from the vid_cam object passed as argument, which references the webcam connected to our computer. The read() function will return in the frame variable a multidimensional array containing the BGR (Blue, Green, Red) pixel values of the image. Next, in lines 40-42 we will resize the frame if the scaling_factor parameter is other than 1. In line 44 we apply the OpenCV GaussianBlur filter to the image, which is a low-pass filter that will remove noise from it. More generally, it will remove high frequency content and make the image look smoother.

Once the image is blurred, in line 45 we convert it from the BGR (Blue, Red Green) color space to HSV (Hue, Saturation, Value) color space. Generally, it is easier to segment a color in HSV color space than in BGR, because in HSV, the color we want to segment depends only on one channel (Hue). In BGR color space, the targeted color will be a combination of three channels (Blue, Green, Red). See Figure 4, for a color example with values in both color spaces, and check the Appendix.pdf file included with the article’s source code for a brief explanation about BGR and HSV color spaces.

Figure 4
Color spaces (Image source: AlloyUI [2].)

Next, in line 46, we call the OpenCV function inRange(hsv, HSV_LOWER_BOUND, HSV_UPPER_BOUND) to obtain image segments of all pixels with values in the HSV color range, defined by the tuples HSV_LOWER_BOUND and HSV_UPPER_BOUND. The function receives as first argument the image frame we want to segment (called hsv in the code). The second and third arguments are the lower and upper bound tuples (Hue, Saturation, Value) defining the color range. This function will return a mask binary (black and white) image, in which all pixels in our defined color range will be represented as white pixels, and all the rest will be represented as black pixels. Figure 5 shows the image frame in four steps of the process.

Figure 5
Target detection. a) Original frame, b) Blurred frame, c) Masked frame, d) Eroded and dilated frame.

There’s a Python script that’s part of the tutorial files in the OpenCV library’s code repository that’s used to exemplify color segmentation with the inRange function [3]. The example is called and is not only useful for learning about the subject, but it can also help us to obtain the color range values for the object we want to track. Figure 6 shows how I obtained the color range for my target by running this script. I included the script with the source code. See the file for a detailed description of how to use the script to obtain your own target color range.

Figure 6
Defining your target’s color range

Once we obtain the color segmentation, in lines 47-48, we apply erosion and dilation to the image three times each (iterations=3)erode() will erode away the boundaries of the white mask in the image, diminishing or removing small features (noise) in the boundaries and detaching connected elements. This will decrease the mask area and shrink its size. dilate() will increase the mask size to counteract the erosion effect and reconnect some features still present.

In line 50, we call the findContours() function, which will return a list of all detected contours (of disconnected island objects) in the mask frame. Then, in line 56 we check that at least one contour has been found. In line 57 we get the contour with the largest area of all, and in line 58, we calculate the radius of an enclosing circle for that contour. This radius will be used next (line 60) to verify that the detected object is sufficiently large to be tracked by the drone, by comparing it to the min_tgt_radius parameter.

Line 61 computes the largest contour’s raw moments and returns them in a Python dictionary (called M in the code), from which key image properties can be derived. These include the contour area (the dictionary element M[“m00”]) and the contour’s centroid (x = M[“m10”] / M[“m00”], y = M[“m01”] / M[“m00”]). I will not elaborate further about moments and why exactly we obtain the object’s centroid in this way, but there’s a link [4] in RESOURCES at the end of this article, if you want to delve deep into this matter. Finally, in line 64, the function returns a Python dictionary with the target centroid’s (width, height) “camera image” coordinates, the scaled image frame used in the detection process and the contour.

The always varying amount of light, color balance and focus in the image, will change—from frame to frame—the number of pixels detected as being part of the target. That, in turn, varies the detected shape, area and centroid coordinates. This produces the effect of the centroid “trembling” at high frequency in the detection video stream. Moreover, a fair amount of vibration in a real drone, due to the motors and propellers, would also add trembling to the centroid point. To minimize this effect, we pass the centroid coordinates through a Low-Pass Moving Average filter—one of the easiest methods to filter high-frequency noise in sensor measurements. Explaining how a Low-Pass Moving Average filter works is beyond the scope of this article. See the Appendix.pdf file included with the article’s source code for a fairly detailed explanation about this filter. Lines 1-19 in Listing 3 show the moving_average_filter(coord) function definition.

Listing 3
Auxiliary functions 2

 1: async def moving_average_filter(coord):
 2:     """ Applies Low-Pass Moving Average Filter to a pair of (x, y) coordinates."""
 4:    filt_buffer['width'].append(coord['width'])
 5:    filt_buffer['height'].append(coord['height'])
 7:    if len(filt_buffer['width']) > NUM_FILT_POINTS:
 8:        filt_buffer['width'] = filt_buffer['width'][1:]
 9:        filt_buffer['height'] = filt_buffer['height'][1:]
11:    N = len(filt_buffer['width'])
13:    w_sum = sum( filt_buffer['width'] )
14:    h_sum = sum( filt_buffer['height'] )
16:    w_filt = int(round(w_sum / N))
17:    h_filt = int(round(h_sum / N))
19:    return {'width':w_filt, 'height':h_filt}
22:  async def draw_objects(cam_coord, filt_cam_coord, frame, contour):
23:    """ Draws visualization objects from the detection process.
24:        Position coordinates of every object are always in 'camera pixel' units"""
26:  cv2.line(frame, (0, params['x_ax_pos']), (params['resized_width'], params['x_ax_pos']),(0, 128, 255), 1)
27:  cv2.line(frame, (params['y_ax_pos'], 0), (params['y_ax_pos'], params['resized_height']),(0, 128, 255), 1)
28:, (params['y_ax_pos'], params['x_ax_pos']), 1, (255, 255, 255), -1)
30:  cv2.rectangle(frame, params['cent_rect_p1'], params['cent_rect_p2'], (0, 178, 255), 1)
32:  cv2.drawContours(frame, [contour], -1, (0, 255, 0), 2)
34:  x_cart_coord = cam_coord['width'] - params['y_ax_pos']
35:  y_cart_coord = params['x_ax_pos'] - cam_coord['height']
37:  x_filt_cart_coord = filt_cam_coord['width'] - params['y_ax_pos']
38:  y_filt_cart_coord = params['x_ax_pos'] - filt_cam_coord['height']
40:, (cam_coord['width'], cam_coord['height']), 5, (0, 0, 255), -1) 
41:  cv2.putText(frame, str(x_cart_coord) + ", " + str(y_cart_coord), 
42:     (cam_coord['width'] + 25, cam_coord['height'] - 25), cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0, 0, 255), 1)
44:, (filt_cam_coord['width'], filt_cam_coord['height']), 5,(255, 30, 30), -1)
45:  cv2.putText(frame, str(x_filt_cart_coord) + ", " + str(y_filt_cart_coord), 
46:     (filt_cam_coord['width'] + 25, filt_cam_coord['height'] - 25), cv2.FONT_HERSHEY_SIMPLEX,0.5, 
                 (255, 30, 30), 1)
47:  return frame

If you recall, after filtering the coordinates in the “main” function, we call the draw_objects() function to draw the detection visualization objects in the image frame (Figure 3). We use the OpenCV functions line()circle()rectangle(), drawContours() and putText() to draw lines, circles, rectangles, the contour and text, respectively. Lines 26-46 in Listing 3 show the code in charge of the drawing. There are tutorials about OpenCV drawing functions on many websites (including OpenCVs official site), if you want to better understand how these functions work [5].

To control the drone, we use the drone.offboard.set_position_ned() function (see line 48 in Listing 1), which receives coordinates in the North, East, Down (NED) system, plus a Yaw angle. North is the distance to the goal point in the northern axis, East is the distance in the eastern axis and Down is the altitude. Yaw is the clockwise angle around the drone’s vertical axis (measured from the northern line) to which the drone must point when reaching the goal pose. Note that for positive altitudes, the Down coordinate must be negative. For instance, to command the drone to an altitude of 10m, the Down coordinate must be equal to -10 (“Up“ is really “minus Down“).

In my previous article “Writing MAVSDK/PX4 Drone Applications“ [1], I discussed in more detail how to control a drone’s position by sending NED position setpoints in the “offboard” flight mode. In the while loop, the detected target’s Cartesian coordinates are converted to NED coordinates, as explained at the beginning, and sent to the drone to control its position and make it follow the target.

To test the code example in simulation, take the following steps: First, run the PX4 & Gazebo, or PX4 & JMAVSim simulator. Second, open QGroundControl. Third, run the script. If the camera detects a blue object within the HSV color range defined in the code, you will see the drone moving over QGroundControl’s map in the direction of the detected target. Furthermore, if you move the blue target in front of the camera, the drone will move in the same direction, tracking the target only when it is outside the center rectangle. The text file summarizes all detailed steps to run the code in simulation. Figure 7 shows a screen capture of the script running.

Figure 7
Example “” {{ < italic }} running

As I said at the beginning, color segmentation is not the best approach to detect and track objects with computer vision. Moreover, the approach to find the target’s coordinates taken here, is not the best either. For instance, we should have taken into account the fact that a camera shot taken by a drone from above in the air, is a 2D perspective projection of the real 3D world in the camera’s field of view.

To make it right, we would need to compute the inverse of that projection, from 2D image coordinate system to the 3D world coordinate system, to obtain the target’s real (x, y) world coordinates. Obviously, we lose one dimension in the process of going from 3D to 2D, and then going back from 2D to 3D. Implementing a tracking algorithm like that was definitely out of the scope of this article. Instead, the main purpose of this article is to get you acquainted with basic computer vision and drone autonomous flight programming.

There are other, more robust computer vision detection techniques as well—much better than the one we used here—that allow detecting and classifying many types of items—such as cars, animals or people. For example, there are some Machine Learning approaches, such as the Haar Cascade Classifier or the Histogram of Oriented Gradients & SVM Classifier, and other even more effective, and nowadays more popular Deep Learning approaches. In a follow-up to this article, we will revisit this same topic to write an object-tracking autonomous drone application with object detection by Deep Learning inference. Stay tuned! 


[1] “Writing MAVSDK/PX4 Drone Applications” in Circuit Cellar 361 (August 2020)
[2] Figure 4 image source:
[3] Thresholding Operations using inRange,
[4] Moments: Find the Center of a Blob (Centroid) using OpenCV (C++/Python)
[5] Drawing Functions in OpenCV

Demo Videos of ‘’

Smoothing Images

Thresholding Operations using inRange

The Ultimate Guide to Understanding Hue, Tint, Tone and Shade

Ardupilot |
PX4 Autopilot |
QGroundControl |


Keep up-to-date with our FREE Weekly Newsletter!

Don't miss out on upcoming issues of Circuit Cellar.

Note: We’ve made the Dec 2022 issue of Circuit Cellar available as a free sample issue. In it, you’ll find a rich variety of the kinds of articles and information that exemplify a typical issue of the current magazine.

Would you like to write for Circuit Cellar? We are always accepting articles/posts from the technical community. Get in touch with us and let's discuss your ideas.

Sponsor this Article
| + posts

Raul Alvarez Torrico has a B.E. in electronics engineering and is the founder of TecBolivia, a company offering services in physical computing and educational robotics in Bolivia. In his spare time, he likes to experiment with wireless sensor networks, robotics and artificial intelligence. He also publishes articles and video tutorials about embedded systems and programming in his native language (Spanish), at his company’s web site You may contact him at

Supporting Companies

Upcoming Events

Copyright © KCK Media Corp.
All Rights Reserved

Copyright © 2024 KCK Media Corp.

Write an Object Tracking Drone Application

by Raul Alvarez Torrico time to read: 20 min