Embedded Database Systems in the Age of IoT
Embedded database systems have been around since the late 1970s. Industry veterans like me remember Btrieve, C-tree, Empress, db_VISTA and dBASE. Back in the day, embedded databases were used for departmental computing and line-of-business applications, mostly on PCs and small Unix systems. These systems were silos: they didn’t share their data, and didn’t need to. To the extent that data was shared, it was in the form of a report the system generated. For example, at one database system vendor I worked for, we developed a system to track the movement of nuclear missile motors (rocket engines), which was used for compliance with the START treaty. It was a PC-based system that, if a Soviet inspector ever asked, could generate a report about any specific motor: where it was, where it had been, whether it had been expended in a test and so on.
Today the Internet of Things (IoT) generates huge volumes of data, and to extract value from that data, it must be shared. So, while the embedded database systems of yore only had to manage data at rest, modern embedded database systems must also manage data in flight (Figure 1). That brings new challenges for vendors who endeavor to offer embedded database management solutions for the IoT.
DATA AT REST
First, let’s consider the data at rest. IoT systems have the potential to generate large amounts of data at the edge. Not all IoT systems generate a lot of data, but an embedded database system vendor can’t make assumptions about where and how the DBMS will be used. We have to develop for the most extreme cases.
At the same time, edge systems often have limited compute and storage capacity. Consider relatively modest MIPS-, Arm- and PowerPC-based systems, or even a Raspberry Pi. The edge database system must use the available resources as efficiently as possible. In practical terms, this means being able to store and process the data locally while remaining fast enough to meet real-time requirements, for example when response times can’t tolerate the latency inherent in communicating with a cloud-based component. Our database system, eXtremeDB, includes a sophisticated event notification system that can trigger an action outside of the database system, such as activating a solenoid or changing a quality-of-service parameter.
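To illustrate the general idea of a local trigger that reacts to new data without a round trip to the cloud, here is a minimal Python sketch. The `EdgeTable` class, its `on_insert` method, the `pressure_guard` handler and the 5.0 threshold are all hypothetical names and values invented for this example; they are not the eXtremeDB API. On real hardware the handler would drive an output pin rather than update a dictionary.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Reading:
    # Illustrative schema only: one sensor sample.
    sensor_id: int
    timestamp_ms: int
    value: float


Handler = Callable[[Reading], None]


@dataclass
class EdgeTable:
    """Toy in-memory table that fires local callbacks on insert.

    Hypothetical stand-in for a DBMS event-notification mechanism.
    """
    rows: List[Reading] = field(default_factory=list)
    handlers: List[Handler] = field(default_factory=list)

    def on_insert(self, handler: Handler) -> None:
        self.handlers.append(handler)

    def insert(self, reading: Reading) -> None:
        self.rows.append(reading)
        # Notify subscribers synchronously, with no network round trip.
        for handler in self.handlers:
            handler(reading)


# Hypothetical actuator state: sensor ID -> solenoid opened.
solenoid_open: Dict[int, bool] = {}


def pressure_guard(reading: Reading, limit: float = 5.0) -> None:
    # Open the relief solenoid for this sensor when its value exceeds the limit.
    if reading.value > limit:
        solenoid_open[reading.sensor_id] = True


table = EdgeTable()
table.on_insert(pressure_guard)
table.insert(Reading(sensor_id=3, timestamp_ms=1000, value=4.2))
table.insert(Reading(sensor_id=7, timestamp_ms=1001, value=6.8))
print(solenoid_open)  # {7: True}
```

Because the handler runs in the same process as the insert, the reaction time is bounded by local compute rather than by network latency, which is the point the paragraph above makes about real-time requirements.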
As mentioned earlier, edge devices often have limited storage capacity. But consider an edge device gathering time series data from 10 sensors every millisecond. Each reading could consist of a 2-byte sensor ID, a 4-byte timestamp and a 4-byte floating-point value. 10 bytes × 10 sensors × 1,000 readings per second = 100,000 bytes of raw data per second, or 6 MB per minute, 360 MB per hour and so on. The database system should store the data compactly and impose little overhead in the form of additional storage for metadata, indexes and so on. eXtremeDB achieves this with three capabilities: a hybrid row and columnar layout for time series data (in this case, the sensor ID is not repeated with every time series entry), run-length encoding of time series data (time series data is often highly repetitive; we often see 90% compression) and zip-like compression of non-time series data.
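The arithmetic above, and the reason a columnar layout with run-length encoding saves so much space, can be checked with a short Python sketch. The 10-byte record is packed with the standard `struct` module. The columnar "series" layout, the 4-byte run-length counter and the sample values are assumptions made for illustration only; this is not eXtremeDB's storage format, and the deliberately repetitive synthetic data overstates the savings you would see on real sensor streams.

```python
import struct

# One row-oriented reading: 2-byte sensor ID, 4-byte timestamp, 4-byte float.
# "<" selects little-endian with no padding, so each record is exactly 10 bytes.
RECORD = struct.Struct("<HIf")

SENSORS = 10
RATE_HZ = 1_000  # one reading per sensor per millisecond

bytes_per_second = RECORD.size * SENSORS * RATE_HZ
bytes_per_minute = bytes_per_second * 60
bytes_per_hour = bytes_per_minute * 60
print(bytes_per_second, bytes_per_minute, bytes_per_hour)
# 100000 6000000 360000000  (100 kB/s, 6 MB/min, 360 MB/h)


def rle_encode(values):
    """Run-length encode a sequence into [value, run_length] pairs."""
    runs = []
    for v in values:
        if runs and runs[-1][0] == v:
            runs[-1][1] += 1
        else:
            runs.append([v, 1])
    return runs


def rle_decode(runs):
    """Expand [value, run_length] pairs back into the original sequence."""
    return [v for v, n in runs for _ in range(n)]


# Hypothetical columnar layout for ONE sensor's series: the sensor ID is stored
# once, fixed-interval timestamps shrink to a start time plus a step, and the
# value column is run-length encoded.
values = [21.5] * 600 + [21.6] * 300 + [21.5] * 100  # one second of synthetic data
series = {"sensor_id": 3, "start_ms": 0, "step_ms": 1, "values": rle_encode(values)}

row_bytes = RECORD.size * len(values)  # every row repeats ID and timestamp
# 2-byte ID + 4-byte start + 4-byte step, then (4-byte value, 4-byte count) per run.
col_bytes = 2 + 4 + 4 + len(series["values"]) * (4 + 4)
print(row_bytes, col_bytes)  # 10000 34
assert rle_decode(series["values"]) == values
```

The savings come from two separate effects: the columnar layout removes the per-row repetition of the sensor ID and timestamp, and run-length encoding collapses repeated values into a single value and a count.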
DATA IN FLIGHT
As for data in flight, embedded database system vendors face two key challenges beyond the fundamental requirement of being able to replicate data and synchronize database instances. The first is potentially slow connections. The second is potentially intermittent connectivity.
LoRaWAN’s data rate is 27 Kbps. That’s minuscule compared to wired GigE networks, or even the 50 Mbps uplink rate of 4G/LTE. An IoT embedded database system vendor must solve for this limited bandwidth when replicating data from the edge toward the cloud. eXtremeDB does this in two ways. First, we compress the data before it’s put into the communication channel. Second, and this requires a choice on the part of the IoT developer, the data can be aggregated before it is sent. It’s unusual for IoT systems upstream from the edge to need the fine-grained data that the edge system needs to make local decisions; upstream systems can fulfill their purpose with aggregated data. For example, the 1,000 Hz data can be rolled up to 1 Hz data. eXtremeDB includes a library of functions to aggregate time series data, such as simple, grid and window averages.
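A quick calculation shows why both techniques matter on a 27 Kbps link. The minimal Python sketch below reuses the 10-sensor, 10-byte-record example from the data-at-rest section. It then implements generic versions of a grid average (non-overlapping blocks) and a window (moving) average, and compresses a batch with the standard `zlib` module. The function names and the exact definitions of "grid" and "window" averaging used here are assumptions; eXtremeDB's own aggregation functions may define them differently.

```python
import struct
import zlib

LINK_BPS = 27_000  # the LoRaWAN figure quoted in the text, in bits per second
SENSORS = 10
RECORD = struct.Struct("<HIf")  # 2-byte ID + 4-byte timestamp + 4-byte float


def bits_per_second(rate_hz: int) -> int:
    """Upstream bandwidth needed to ship every sensor's readings at rate_hz."""
    return RECORD.size * SENSORS * rate_hz * 8


raw_bps = bits_per_second(1_000)  # 800,000 bps: roughly 30x the link rate
rolled_bps = bits_per_second(1)   # 800 bps: fits comfortably
print(raw_bps > LINK_BPS, rolled_bps < LINK_BPS)  # True True


def grid_average(samples, size):
    """Average non-overlapping blocks of `size` samples (e.g. 1,000 Hz -> 1 Hz)."""
    return [sum(samples[i:i + size]) / size
            for i in range(0, len(samples) - size + 1, size)]


def window_average(samples, width):
    """Sliding (moving) average over `width` consecutive samples."""
    out, total = [], 0.0
    for i, x in enumerate(samples):
        total += x
        if i >= width:
            total -= samples[i - width]  # drop the sample leaving the window
        if i >= width - 1:
            out.append(total / width)
    return out


# Three seconds of one sensor's 1,000 Hz data, rolled up to 1 Hz.
samples = [20.0] * 1000 + [21.0] * 1000 + [22.0] * 1000
print(grid_average(samples, 1000))  # [20.0, 21.0, 22.0]

# Compress a one-second batch of raw records before handing it to the radio.
batch = b"".join(RECORD.pack(3, t, v) for t, v in enumerate(samples[:1000]))
packed = zlib.compress(batch, 9)
assert zlib.decompress(packed) == batch
print(len(batch), len(packed) < len(batch))
```

Even with compression, the raw 1,000 Hz stream needs roughly 30 times the link's capacity. That is why the aggregation choice, rather than compression alone, is what makes the upstream feed practical in this example.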
The second key challenge for managing data in flight is that edge systems may have intermittent connections. This might be deliberate, as in a battery-powered system that connects only periodically to maximize battery life. Embedded database systems for the IoT must accommodate this by keeping track of the data that has changed locally and being ready to transmit it when a connection becomes available. In eXtremeDB, this capability is encapsulated in its Active Replication Fabric feature.
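The store-and-forward pattern described above can be sketched in a few lines of Python. The `ChangeLog` class, its `write` and `sync` methods and the `send` callback are hypothetical, invented for this illustration; they do not represent how Active Replication Fabric is implemented or invoked. The sketch applies local writes immediately, queues each change in an ordered outbox, and drains the outbox only while a link is up, removing an entry only once it has been acknowledged.

```python
from collections import deque
from typing import Callable, Deque, Dict, List, Tuple

Change = Tuple[int, str, float]  # (sequence number, key, new value)


class ChangeLog:
    """Hypothetical store-and-forward change tracker (not the eXtremeDB API)."""

    def __init__(self) -> None:
        self.data: Dict[str, float] = {}    # local state, always up to date
        self.outbox: Deque[Change] = deque()  # changes not yet acknowledged
        self.next_seq = 0

    def write(self, key: str, value: float) -> None:
        # Apply locally right away, then remember the change for later upload.
        self.data[key] = value
        self.outbox.append((self.next_seq, key, value))
        self.next_seq += 1

    def sync(self, link_up: bool, send: Callable[[Change], bool]) -> int:
        """Transmit pending changes in order; return how many were acknowledged."""
        sent = 0
        while link_up and self.outbox:
            change = self.outbox[0]
            if not send(change):  # no acknowledgment: keep it, retry next window
                break
            self.outbox.popleft()
            sent += 1
        return sent


upstream: List[Change] = []


def send(change: Change) -> bool:
    # Stands in for a radio transmission followed by an acknowledgment.
    upstream.append(change)
    return True


log = ChangeLog()
log.write("temp", 21.5)
log.write("temp", 21.7)
print(log.sync(link_up=False, send=send), len(log.outbox))  # 0 2
print(log.sync(link_up=True, send=send), len(log.outbox))   # 2 0
```

Keeping sequence numbers lets the upstream side detect gaps or duplicates after a dropped connection. A variation that suits very constrained links is to coalesce repeated writes to the same key and send only the latest value, at the cost of losing intermediate history.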
In summary, when looking for an embedded database system to support edge functionality in an IoT system, verify that candidates can meet the challenges of 1) running exceptionally well on resource-constrained devices, 2) storing the data locally, within the available storage space, for as long as needed until it can be transmitted upstream and 3) managing intermittent connectivity (if required).
For detailed article references and additional resources go to:
McObject | www.mcobject.com
PUBLISHED IN CIRCUIT CELLAR MAGAZINE • JUNE 2019 #347