Data Intelligence for IoT Security: Opportunities and Challenges

November 12th, 2019

#big data #privacy #differential privacy #ai #5g #machine learning #iot

Millions of new Internet-enabled devices are appearing on ISPs' networks. These devices are not like the smartphones, PCs, and servers that dominate today. Many of them are simple, automated devices: webcams; electrical meters; “dockless” rental scooters; light bulbs; soil moisture meters for smart farms; automotive traffic sensors; and so on. Notably, many of them are inexpensive, so we can expect to see them in large numbers. An order-of-magnitude increase in the scale of the network will bring corresponding challenges in how we manage and secure that network while continuing to deliver value to customers.

Considering security in particular, the security of IoT devices seems to be about the same as what we are used to seeing in PCs, which is to say, poor. IoT devices suffer from the usual maladies: botnets and worms, vulnerabilities that are never patched, default credentials still used for authentication, vendors that disappear, and inattentive or overwhelmed customers. Unfortunately, what we can tolerate as background noise in the PC world is liable to become a crisis when the number of devices grows to IoT scale.

That is the risk side of the equation. We also have new opportunities to provide value to our customers. The structure of a cellular IoT deployment is very different from that of a traditional enterprise or consumer network, in which many devices connect to a customer-provided local area network (LAN) whose details are hidden from the ISP by network address translation (NAT). Cellular IoT devices, in contrast, connect directly to the ISP network and communicate through it with the customer's cloud back end (Figure 1). Because of this different structure, the ISP sees and handles the traffic of each individual device.

Figure 1. Cellular vs local area network.

The cellular ISP is therefore uniquely positioned to detect device misbehavior, such as an infected device communicating with a malware command-and-control system. A cellular ISP can also take direct action to mitigate such misbehavior, for example, by quarantining infected devices on behalf of customers. This opens up research opportunities to extract security-related intelligence from IoT traffic data, using techniques ranging from basic statistics to advanced machine learning and deep learning algorithms.

IoT traffic has distinct characteristics. IoT device manufacturers are frugal: pennies matter when manufacturing millions of devices. Therefore, IoT devices tend to have just enough resources (memory, CPU, network bandwidth, etc.) to perform their function. For example, a smart lightbulb may only need to process commands to turn the light on or off or to change its brightness, and in many cases such commands are the only data the device ever exchanges. As a result, these devices tend to have low-volume, predictable network behavior, so the behavior of an infected device stands out more readily.

By analyzing network traffic, one can build a better understanding of the expected network behavior of IoT devices, and when anomalies or misbehavior are observed, an ISP can take action to remediate them. To detect anomalous and malicious behavior from network traffic data, the data can in some cases be analyzed locally, on the IoT device itself or on an edge gateway, using the device's own resources to compute basic statistics. For example, an Internet-connected device communicates via a suite of IP-based protocols, and for each connection one can record the source IP address, destination IP address, protocol, source port, destination port, bytes sent, bytes received, and so on. These basic features can be captured for all connections to and from IoT devices. From them, basic statistics or derived features can be computed to establish a device baseline over a period of time (an hour, a day, a week, etc.); such statistics are simple enough to be collected from IoT devices and computed locally at the edge. These features can address some of the most common problems, such as DDoS attacks, data exfiltration, and data usage spikes that lead to billing issues.
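To make the baseline idea concrete, here is a minimal Python sketch, assuming per-connection records carrying the fields listed above. The window features chosen here (connection count, distinct destinations, bytes sent and received), the z-score rule, the threshold of 3 and the `min_std` floor are illustrative assumptions, not a description of any production system; names such as `ConnectionRecord` and `anomalies` are hypothetical.

```python
from dataclasses import dataclass
from statistics import mean, pstdev


@dataclass(frozen=True)
class ConnectionRecord:
    """Hypothetical per-connection record holding the basic features listed above."""
    src_ip: str
    dst_ip: str
    protocol: str  # e.g. "TCP" or "UDP"
    src_port: int
    dst_port: int
    bytes_sent: int
    bytes_received: int


def window_features(records):
    """Derived features for one device over one time window (e.g. one hour)."""
    return {
        "connections": len(records),
        "distinct_destinations": len({r.dst_ip for r in records}),
        "bytes_sent": sum(r.bytes_sent for r in records),
        "bytes_received": sum(r.bytes_received for r in records),
    }


def build_baseline(history):
    """Mean and population standard deviation of each feature over past windows."""
    feats = [window_features(w) for w in history]
    return {
        name: (mean([f[name] for f in feats]), pstdev([f[name] for f in feats]))
        for name in feats[0]
    }


def anomalies(baseline, records, z_threshold=3.0, min_std=1.0):
    """Return {feature: z-score} for features of the current window far from the baseline."""
    current = window_features(records)
    flagged = {}
    for name, (mu, sigma) in baseline.items():
        # min_std keeps a perfectly constant baseline from causing division by zero.
        z = (current[name] - mu) / max(sigma, min_std)
        if abs(z) > z_threshold:
            flagged[name] = round(z, 1)
    return flagged


# Usage: 24 hourly windows of normal traffic, then an hour with a large upload.
def rec(dst, sent):
    return ConnectionRecord("10.0.0.5", dst, "TCP", 40000, 443, sent, 200)


history = [[rec("203.0.113.10", 500 + i), rec("203.0.113.10", 520)] for i in range(24)]
baseline = build_baseline(history)
suspicious = [rec("203.0.113.10", 520), rec("198.51.100.7", 5_000_000)]
print(anomalies(baseline, suspicious))  # only "bytes_sent" is flagged
```

A real deployment would likely prefer more robust statistics (medians or quantiles, for example) and per-device thresholds; this version only illustrates the baseline-then-compare pattern.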

In other cases, the analysis happens in the cloud, where more data is available to provide a global view of the network, and where more powerful computing systems can run more complex algorithms and more advanced analytics. One example of advanced analysis is device identification, which is important for both security and device inventory. As more IoT devices connect to the network, each type with its own network patterns, knowing the distribution of device types helps with network planning. When a critical vulnerability is discovered, identifying which devices in the network are susceptible, and addressing the vulnerability, greatly reduces the risk of compromise. For service providers, it is therefore important to find a methodology for identifying IoT devices by building their signatures from network data. This analysis requires traffic data from many devices in order to find what they have in common.
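The article does not specify how such signatures are built. One simple possibility, sketched here purely as an assumption, treats a device type's signature as the set of endpoints (destination, port, protocol) that most devices of that type contact, and identifies an unlabeled device by its Jaccard similarity to each signature. The `min_share` and `min_similarity` thresholds, the function names and all endpoint names are hypothetical.

```python
from collections import Counter

# Hypothetical endpoint representation: (destination host, destination port, protocol).


def build_signature(endpoint_sets, min_share=0.8):
    """Signature of one known device type: endpoints contacted by at least
    min_share of the observed devices of that type."""
    counts = Counter(ep for eps in endpoint_sets for ep in set(eps))
    n = len(endpoint_sets)
    return frozenset(ep for ep, c in counts.items() if c / n >= min_share)


def jaccard(a, b):
    """Size of the intersection over size of the union; 0 for two empty sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def identify(endpoints, signatures, min_similarity=0.5):
    """Best-matching device type for one device, or None if no signature is close enough."""
    observed = frozenset(endpoints)
    best_type, best_score = None, 0.0
    for device_type, signature in signatures.items():
        score = jaccard(observed, signature)
        if score > best_score:
            best_type, best_score = device_type, score
    return best_type if best_score >= min_similarity else None


# Usage with made-up endpoints (".example" names are placeholders).
FW = ("fw.camvendor.example", 443, "TCP")
VIDEO = ("video.camvendor.example", 8443, "TCP")
NTP = ("ntp.example", 123, "UDP")
CDN = ("cdn.example", 443, "TCP")
METER = ("collect.metervendor.example", 5683, "UDP")

signatures = {
    "camera": build_signature([{FW, VIDEO, NTP}, {FW, VIDEO, NTP}, {FW, VIDEO, NTP, CDN}]),
    "meter": build_signature([{METER, NTP}, {METER, NTP}]),
}
print(identify({FW, VIDEO, NTP}, signatures))                # camera
print(identify({("irc.example", 6667, "TCP")}, signatures))  # None
```

Building the signature from many devices of the same type is what lets it ignore endpoints, like `CDN` above, that only a few individual devices happen to contact.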

One method of identifying devices is heartbeat detection. Many devices automatically generate periodic traffic in order to carry out their basic functions, such as checking for software updates, reporting device status, or caching weather or news data locally. Algorithms can therefore detect a device's critical recurring connections as heartbeats. A new heartbeat may suggest a botnet infection, and a lost heartbeat could be a sign of device failure. An even more powerful form of advanced analysis is to cluster similar devices by the set of addresses they communicate with: their community of interest. Knowing which devices have similar network traffic can assist network planning, identify susceptible devices as a group, detect group anomalies, and support cross-checking against the device inventory.
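Heartbeat detection can be sketched as a periodicity test on a device's connection times to each destination. In this minimal Python version, which is an assumption rather than a documented algorithm, a destination counts as a heartbeat when at least 90% of the gaps between consecutive connections fall within 10% of the median gap; comparing the heartbeat sets of two periods then yields new and lost heartbeats. The thresholds, function names and destination names are hypothetical.

```python
from statistics import median


def heartbeat_period(timestamps, min_events=5, tolerance=0.1, min_regular_share=0.9):
    """Return the period in seconds if connection times to one destination look
    periodic, else None. A gap counts as regular if it is within `tolerance`
    (as a fraction) of the median gap."""
    ts = sorted(timestamps)
    if len(ts) < min_events:
        return None
    gaps = [later - earlier for earlier, later in zip(ts, ts[1:])]
    period = median(gaps)
    if period <= 0:
        return None
    regular = sum(1 for g in gaps if abs(g - period) <= tolerance * period)
    return period if regular / len(gaps) >= min_regular_share else None


def heartbeats(connections):
    """Map destination -> heartbeat period for one device's (destination, time) pairs."""
    times_by_dst = {}
    for dst, t in connections:
        times_by_dst.setdefault(dst, []).append(t)
    periods = {dst: heartbeat_period(ts) for dst, ts in times_by_dst.items()}
    return {dst: p for dst, p in periods.items() if p is not None}


def compare_heartbeats(before, after):
    """New heartbeats may suggest infection; lost ones may signal device failure."""
    return {
        "new": sorted(after.keys() - before.keys()),
        "lost": sorted(before.keys() - after.keys()),
    }


# Usage with synthetic timestamps in seconds (destination names are placeholders).
yesterday = (
    [("status.vendor.example", 300 * i) for i in range(12)]
    + [("update.vendor.example", 3600 * i) for i in range(6)]
    + [("news.example", t) for t in (17, 950, 1000, 4000, 4020, 9000)]
)
today = (
    [("status.vendor.example", 300 * i) for i in range(12)]
    + [("c2.unknown.example", 60 * i) for i in range(30)]
)
print(heartbeats(yesterday))
# {'status.vendor.example': 300, 'update.vendor.example': 3600}
print(compare_heartbeats(heartbeats(yesterday), heartbeats(today)))
# {'new': ['c2.unknown.example'], 'lost': ['update.vendor.example']}
```

Using the median gap rather than the mean keeps a single missed or delayed heartbeat from distorting the estimated period, while the irregular `news.example` traffic is correctly left out.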

Whenever we deploy algorithms that collect and analyze customer data, we must be concerned with the security and privacy of the data itself. Several techniques can help. For example, we could capture and analyze the data on the device itself, or on an edge gateway, without ever collecting it centrally in the cloud. The closer we can keep the data to the customer, the better.

Surprisingly, technologies such as differential privacy can be used to perform even global analytics in the cloud with mathematically provable privacy guarantees. For example, we can perform a global analysis with bounded error by carefully introducing random noise into the data that devices report to the cloud. The cloud's data for any single device could then be completely wrong, yet the results of the global analysis will be close to the correct answer. Anyone who looks at the global data can never be sure that the data for a particular device is correct, which preserves each individual's privacy.
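Randomized response is one of the simplest mechanisms with exactly this property (it is a local form of differential privacy); the article does not say which mechanism it has in mind, so treat this as an illustrative substitute. In this minimal Python sketch, each device reports a yes/no fact, here the hypothetical "contacted a flagged destination today": with probability `p_truth` it tells the truth, otherwise it reports a fair coin flip. The cloud cannot trust any single report, but it can invert the known noise to estimate the population fraction. Function names and the 3% scenario are assumptions.

```python
import math
import random


def randomized_response(true_bit, p_truth, rng):
    """Device side: report the true bit with probability p_truth,
    otherwise report the result of a fair coin flip."""
    if rng.random() < p_truth:
        return true_bit
    return rng.random() < 0.5


def estimate_fraction(reports, p_truth):
    """Cloud side: unbiased estimate of the true fraction f of 1 bits.
    E[observed] = p_truth * f + (1 - p_truth) / 2, solved for f."""
    observed = sum(reports) / len(reports)
    return (observed - (1 - p_truth) / 2) / p_truth


def epsilon(p_truth):
    """Privacy parameter of one report:
    ln(P[report = b | bit = b] / P[report = b | bit != b]) = ln((1 + p) / (1 - p))."""
    return math.log((1 + p_truth) / (1 - p_truth))


# Usage: 100,000 devices, 3% of which contacted a flagged destination.
rng = random.Random(7)
true_bits = [i < 3_000 for i in range(100_000)]
reports = [randomized_response(bit, 0.5, rng) for bit in true_bits]
print(f"estimated fraction: {estimate_fraction(reports, 0.5):.3f}")  # near 0.03; depends on the seed
print(f"epsilon per report: {epsilon(0.5):.2f}")                     # 1.10
```

Raising `p_truth` makes each report more accurate but increases epsilon, that is, it weakens the privacy guarantee; this is the accuracy/privacy trade-off behind the bounded error described above.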

Given the expected growth in the number of IoT devices and in the amount of data they will generate, there are many research opportunities to derive security intelligence from this data to better protect IoT devices and the networks they connect to. AT&T, as an ISP, is researching the opportunities described above and more, and we envision many other challenges and opportunities arising with new devices, new technologies, and unforeseen new applications.