Data leakage is becoming more difficult to stop or even trace as chips become increasingly complex and heterogeneous, and as more data is stored and utilized by chipmakers for other designs.
Unlike a cyberattack, which typically is done for a specific purpose, such as collecting private data or holding a system ransom, data leaks can spring up anywhere. And as the value of data increases, they can be just as costly. But they are much more difficult to pinpoint and stop because the causes are diverse, unpredictable, and frequently unintentional. They include:
- Manufacturing defects and circuit aging (electromigration, time-dependent dielectric breakdown, and thermal-related damage, among other things), which can create openings for side-channel attacks without anyone actually hacking into a chip. They also can give attackers easier access to important data.
- Knowledge repositories for designing chips that are aimed at keeping learning in-house, but which also make it harder to keep track of proprietary third-party IP.
- Continued talent shortages at all levels of design through manufacturing, which often mean that in-depth skills and competitive knowledge developed in one company will follow employees to jobs at new companies.
From a hardware perspective, leakage can be a function of complexity in a chip or package, or it can be due to a flaw in design or manufacturing. That can open the door for someone to extract data without even touching the chip.
“Maybe it’s not even a flaw, but a weakness that’s exploited in a very clever fashion,” said Raj Jammy, chief technologist at MITRE Engenuity and executive director of the Semiconductor Alliance. “When you combine multiple chips, your vulnerabilities increase. So you have to think through this much differently. And it’s not just limited to chip-level security. You also have to worry about what happens at the package level when you put everything together on a single substrate. Sometimes people call this leakage, but it’s probably one of those weaknesses where you can sense the timing of a given chip. And once you know the timing and you start reading out what is transmitting, then you can predict what are the bits going through the chain. That could be leakage, which is non-invasive and almost like interception. In other parts, there may be weak connections due to aging, and then you may not even be operating the chip appropriately. The bigger risk is if there are spurious signals that you’re sending to a neighboring chip or chiplet in a package.”
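The timing weakness Jammy describes can be illustrated in software. What follows is a minimal sketch, not a model of any real chip: `SECRET`, `leaky_compare`, and `recover_secret` are hypothetical names, the attacker is assumed to know the secret's length, and a step counter stands in for the execution time an outside observer would measure. It shows how a comparison that exits early gives away a secret one byte at a time, without the observer ever reading the secret directly.

```python
import hmac

SECRET = b"k3y!"  # hypothetical secret held by the device


def leaky_compare(guess: bytes, secret: bytes = SECRET) -> tuple[bool, int]:
    """Early-exit comparison. Returns (match, steps); steps is a stand-in
    for the execution time an observer could measure from outside."""
    steps = 0
    if len(guess) != len(secret):
        return False, steps
    for g, s in zip(guess, secret):
        steps += 1
        if g != s:
            return False, steps
    return True, steps


def recover_secret(length: int) -> bytes:
    """Recover the secret byte by byte using only the observed step count."""
    known = bytearray()
    for pos in range(length):
        for candidate in range(256):
            padding = bytes(length - pos - 1)
            guess = bytes(known) + bytes([candidate]) + padding
            match, steps = leaky_compare(guess)
            # A correct byte lets the comparison run past this position.
            if match or steps > pos + 1:
                known.append(candidate)
                break
    return bytes(known)


def constant_time_equal(guess: bytes, secret: bytes = SECRET) -> bool:
    """Comparison whose duration does not depend on where the bytes differ."""
    return hmac.compare_digest(guess, secret)


if __name__ == "__main__":
    print(recover_secret(len(SECRET)) == SECRET)
```

The attack needs at most 256 guesses per byte rather than 256 to the power of the secret's length. Python's standard `hmac.compare_digest` is the usual software countermeasure. Hardware designs need their own equivalents, which this sketch does not attempt to model.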
Much of this is exacerbated by heterogeneous designs, where different chips, chiplets, or materials can have different life expectancies. Unlike in the past, when everything was developed at the same process node and integrated into a processor or SoC by one company, these components now are disaggregated and acquired from a global supply chain. Various processing elements, memories, and other components are developed using different manufacturing processes, sometimes by different foundries. That makes it more difficult to fuse these components together, and it can create weaknesses that can be leveraged without ever actually touching a device. Typically that doesn’t provide access to all data, but it may not have to. Depending upon which data is leaking and from where, it still may be extremely valuable.
“If you have a chiplet-based approach, or a multi-chip package, then all of these chips have to work together to yield the security you need,” said Peter Laackmann, distinguished engineer for the Connected Secure Systems Division at Infineon. “For example, there have been attacks where there was a security chip inside, which was certified and quite good, but it was also in the same package as a standard microcontroller. The problem was that the standard microcontroller was fully controlling the security chip. After a few attacks on the microcontroller, then you get the keys. This means the security controller cannot protect the complete system. And the same applies for all sorts of chiplets and multi-chip packages.”
Laackmann said that for security chips/chiplets, this is unlikely to be a problem because those chips typically are not stressed the way a processing element would be. But for other components, aging can cause circuits to behave differently, and that differential can be used to collect important data. “Some chips have pins that are used to supply the internal core voltage. If you access them, you have access to the internal core voltage, which normally is smoothed with external capacitors. If you defeat these capacitors, you could create a good side channel analyzer. These chips are not prepared for that. Also, you can add glitches or spikes to the internal core voltage of the chip to make false operations that jump over instructions or a pin or password entry.”
Chiplets add their own issues. “The chiplets will process data, and all those computations will have to be secured against side-channel attacks and protected against fault injection, among other things,” said Pim Tuyls, CEO of Intrinsic-ID. “But on top of that, you now have to make sure that the communications channels between all those different chiplets are secure, too. That’s a challenge in itself.”
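One piece of securing a chiplet-to-chiplet channel is making sure a receiver can reject forged, altered, or replayed messages. The sketch below is an illustration under stated assumptions, not a description of any vendor's interconnect: it assumes the two chiplets already share a secret key, the frame layout and the names `seal` and `open_frame` are hypothetical, and it provides authenticity and replay protection only, not confidentiality.

```python
import hashlib
import hmac
import struct

TAG_LEN = 32  # SHA-256 output size in bytes


def seal(key: bytes, counter: int, payload: bytes) -> bytes:
    """Build a frame: 8-byte big-endian counter, payload, HMAC-SHA256 tag."""
    header = struct.pack(">Q", counter)
    tag = hmac.new(key, header + payload, hashlib.sha256).digest()
    return header + payload + tag


def open_frame(key: bytes, last_counter: int, frame: bytes) -> tuple[int, bytes]:
    """Verify a frame and return (counter, payload), or raise ValueError."""
    if len(frame) < 8 + TAG_LEN:
        raise ValueError("frame too short")
    header, payload, tag = frame[:8], frame[8:-TAG_LEN], frame[-TAG_LEN:]
    expected = hmac.new(key, header + payload, hashlib.sha256).digest()
    # Constant-time comparison, so the check itself does not leak timing.
    if not hmac.compare_digest(tag, expected):
        raise ValueError("authentication failed")
    (counter,) = struct.unpack(">Q", header)
    # The counter must strictly increase, which rejects replayed frames.
    if counter <= last_counter:
        raise ValueError("replayed or stale frame")
    return counter, payload


if __name__ == "__main__":
    demo_key = bytes(range(32))  # hypothetical key; never hard-code real keys
    frame = seal(demo_key, 1, b"sensor reading")
    print(open_frame(demo_key, 0, frame))
```

A real design also would need key provisioning, encryption of the payload, and protection of the key against the physical attacks discussed above, none of which is shown here.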
AI/ML and IP reuse
When lots of data flows out of a system, generally it will be noticed. That may prompt a security patch, or a wholesale chip/package/system replacement. “If you have 800 Gbps Ethernet traffic, that’s a lot of data passing through,” said Neeraj Paliwal, vice president and general manager of Rambus’ Security IP Business. “We don’t have too many use cases where that kind of unencrypted data passes through and where the data speeds are really fast.”
Data leakage, in contrast, tends to be much more subtle and harder to spot, and in many cases it’s unintentional. But with system optimization through AI/ML, which occurs in what is essentially a black box, it’s also almost impossible to trace, let alone establish legal protections.
“Layers and layers of technology are stacked on each other in our business, from the fabrication to the standard cells to the higher-level building blocks,” said Steve Roddy, CMO at Quadric. “You use those to build up other IP. Where does one end versus another begin? You scrape prior designs to figure out efficient patterns and then apply that to the next design that comes down the line. So now you’re taking insights from the RTL structures and saying, ‘With this kind of structure I should place it in a certain way.’ You’re not actually copying anything. But because the input was someone’s RTL, you’re comparing it to other similar things. And you’re basing that on someone else’s IP. The mask sets are someone else’s IP, too. With AI, you may be using customer design data to drive the training set. If so, who owns that customer data?”
Keeping track of IP as it moves through a design is another challenge. “With reinforcement learning, it takes IP and turns it into project IP,” said Simon Rance, vice president of marketing at Cliosoft. “Then you optimize it for new releases and the next version of a chiplet or other IP, and take in all the real-time data. The IP is provided by a company, the real-time data is from the field, and you are supposed to partition that based on which pieces of IP are owned by whom. It may come from different regions of the globe, so then part of the IP is owned by them and part of it by the customer. So now you have to go through the metadata to figure out who the players are in the project for that IP. A lot of times there is no clear answer. There are no standards. And to really address this you need full traceability about who has seen it, which is highly unlikely to happen.”
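The metadata problem Rance describes can be pictured as a lineage graph. The following is a minimal sketch under loud assumptions: there is no standard for this, as the quote notes, and the `Artifact` record, its fields, and the example owners are all hypothetical. It shows how ownership and viewing history could be collected by walking from a derived design back through everything it was built from.

```python
from dataclasses import dataclass, field


@dataclass
class Artifact:
    """Hypothetical provenance record for one piece of design data."""
    name: str
    owner: str
    derived_from: list["Artifact"] = field(default_factory=list)
    viewed_by: set[str] = field(default_factory=set)


def contributing_owners(artifact: Artifact) -> set[str]:
    """Every party that owns the artifact or anything it was derived from."""
    owners = {artifact.owner}
    for parent in artifact.derived_from:
        owners |= contributing_owners(parent)
    return owners


def everyone_who_saw(artifact: Artifact) -> set[str]:
    """Everyone recorded as having viewed the artifact or its ancestors."""
    viewers = set(artifact.viewed_by)
    for parent in artifact.derived_from:
        viewers |= everyone_who_saw(parent)
    return viewers


if __name__ == "__main__":
    vendor_ip = Artifact("licensed block", "IP vendor", viewed_by={"alice"})
    field_data = Artifact("field telemetry", "customer", viewed_by={"bob"})
    tuned = Artifact("tuned block v2", "design house",
                     derived_from=[vendor_ip, field_data],
                     viewed_by={"alice", "carol"})
    print(sorted(contributing_owners(tuned)))
    print(sorted(everyone_who_saw(tuned)))
```

The sketch only works if every derivation and every viewing is recorded in the first place, and that is the part the quote suggests is unlikely to happen in practice.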
That makes it essential for companies to guard their IP much more assiduously than in the past. “We provide more detail here on what we’re doing than any company I’ve ever worked at,” said Paul Karazuba, vice president of marketing at Expedera. “An IP license gets you the right to understand what we’re doing. But there’s still a black box aspect of what we do, and that will remain, because what we do is significantly different than what everyone else does. And we’re patenting everything we possibly can to protect ourselves. There are still state secrets that we don’t want to reveal about our company.”
More collaboration in the market makes this more difficult to protect, something that is complicated by the increasing prominence of generative AI. “You’re dealing with knowledge that represents best practices, but so far it’s been incremental bits because you don’t want to give up everything to the customer,” said Frank Schirrmeister, vice president of marketing at Arteris IP. “AI is a solution to drive optimization, and if it’s a big customer, they’ll want their own spin. But with generative AI, where you are dealing with very specific market needs, they’re customizing lots of data for their own use. That adds all sorts of new copyright and IP issues.”
The human factor
Data has been leaking from chip companies since the first semiconductors were developed. As people change jobs, or change companies, they bring along knowledge of whatever they learned on a previous job. But the rate of churn is increasing on a global scale, and the impact can be seen in IP infringement litigation. Between 2012 and 2021, there were slightly more than 46,000 patent case filings in the United States, according to Lex Machina, or an average of 4,600 per year. In contrast, in 2021 there were nearly 32,000 patent infringement lawsuits filed in China, up from about 5,800 in 2010, according to The Law Reviews.
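The arithmetic behind those litigation figures can be checked in a few lines. This is a back-of-the-envelope sketch using only the rounded numbers quoted above, so the results are approximate. The compound annual growth rate is an added illustration that does not come from the cited sources, and it assumes smooth growth between the two endpoint years.

```python
# Figures as quoted in the article; all are rounded or approximate.
us_filings_total = 46_000        # 2012 through 2021, inclusive
us_years = 2021 - 2012 + 1       # 10 calendar years
china_2010 = 5_800
china_2021 = 32_000

us_average = us_filings_total / us_years
growth_factor = china_2021 / china_2010
# Compound annual growth over the 11 year-to-year intervals from 2010 to 2021.
cagr = growth_factor ** (1 / (2021 - 2010)) - 1

print(f"US average per year: {us_average:,.0f}")
print(f"China growth factor: {growth_factor:.1f}x")
print(f"China compound annual growth: {cagr:.1%}")
```

On these inputs, filings in China grew roughly 5.5-fold over the period, which works out to somewhere around 17% per year compounded.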
Much of the ongoing dispute between the United States and both China and Russia involves IP or patent infringement. A 2022 report issued by the Office of the U.S. Trade Representative cited gaps in trade secret protection and enforcement, particularly in China and Russia. “Theft may arise in a variety of circumstances, including those involving departing employees taking portable storage devices containing trade secrets, failed joint ventures, cyber intrusion and hacking, and misuse of information submitted by trade secret owners to government entities for purposes of complying with regulatory obligations,” said the report.
Given the complexity of the devices being developed, the often-limited access that individual employees have, and the rapid pace of improvements, this generally falls under the heading of data leakage. But the volume of leakage, and the value of the leaked data, have risen to the point where governments are now seeking agreements or imposing sanctions. And for the chip industry, what used to be confined to a single company, or a limited set of suppliers, is now being disaggregated into many different parts that stretch around the globe.
Nevertheless, demand for talent is extremely high, particularly for engineers with expertise gained in previous roles. The challenge will be to structure engineering work so that projects are broken up among teams that cannot see the whole picture, which some companies already have done with design teams located in different countries, according to industry sources. That can be cumbersome to manage, but it also can limit the value of leaked data.
Going forward, chipmakers and IP developers will have to work harder to maintain a divide-and-conquer strategy for employees, and to stay vigilant in monitoring the flow of data wherever possible.
“Nothing is foolproof in security,” said Rambus’ Paliwal. “You’re buying security to make it harder, not to make it foolproof. Measurable security is becoming very important. That’s why you see in the industry that Siemens has acquired UltraSoC, and both proteanTecs and Synopsys are heavily into this. And now, we are starting to put structures in our IP that will help measurability. You need very smart sensors on your hardware so you know when something has changed, or wherever you see a potential data leak where the secrets are being stored.”
The more difficult part will be tracking data as it moves through chip companies, where it is used to generate other designs that may or may not recognize the source of that IP. This will be particularly challenging with machine learning, which can store optimized data for future use, and with generative AI, which the tech world is just beginning to grapple with. Ultimately, standards for behavior and laws will be enacted, but data will continue to move uninhibited across international borders where protection levels vary greatly.
Data leakage cannot be totally prevented, but it can be limited and managed better than it has been. Still, it will take a concerted effort by the entire industry, and it’s not clear at this point who would spearhead that effort or when it might happen.