How to render cloud FPGAs useless

Day 2 19:15 Fuse en Security

Dec. 28, 2025 19:15-19:55

While FPGA developers usually try to minimize the power consumption of their designs, we approached the problem from the opposite perspective: what is the maximum power consumption that can be achieved, or wasted, on an FPGA? Short answer: we found that it’s easy to implement oscillators running at 6 GHz that can theoretically dissipate around 20 kW on a large cloud FPGA when driving the signal to all available resources. Interestingly, this power density is not far from that of the surface of the sun. However, such a jump in power load is usually not a problem, as it will trigger protection circuitry. This led us to the next question: would a localized hotspot with such a power density damage the chip if we remain within the typical power envelope of a cloud FPGA (~100 W)? While we could not “fry” the chip or induce permanent errors (and we tried several variants), we did observe that a few routing wires aged to become up to 70% slower in just a few days of stressing the chip. This means that such an FPGA cannot be rented out to cloud users without risking timing violations. In this talk, we will present how we optimized power wasting, how we measured wire latencies with ps accuracy, how we attacked 100 FPGA cloud instances, and how FPGAs can be protected against such DoS attacks.
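Picosecond wire-latency measurements of this kind are typically made with a time-to-digital converter (TDC). The abstract does not describe the exact circuit, so the following assumes a common design: a tapped delay line. An edge is launched through the wire under test and then runs along a chain of delay elements. At the sampling clock edge, the taps the edge has already passed read 1, which produces a thermometer code. The Python sketch below decodes such captures, averages them, and converts the drop in tap count caused by inserting the wire into a delay estimate. The uniform tap delay `TAP_DELAY_PS`, the string encoding of captures, and all function names are hypothetical. Real delay lines have non-uniform taps that need per-device calibration.

```python
from statistics import mean

# Hypothetical delay of one tap in picoseconds. This is an assumption:
# real tap delays vary and must be calibrated for each device.
TAP_DELAY_PS = 10.0


def decode_thermometer(sample: str) -> int:
    """Return how many taps the edge had passed in one TDC capture.

    Counting all ones (instead of locating the first 0) tolerates the
    'bubbles' that often appear in real delay-line captures.
    """
    return sample.count("1")


def mean_taps(samples: list[str]) -> float:
    """Average many captures. Measurement noise acts as dither, so the
    mean can resolve fractions of a tap."""
    return mean(decode_thermometer(s) for s in samples)


def wire_delay_ps(ref_samples: list[str], wire_samples: list[str],
                  tap_delay_ps: float = TAP_DELAY_PS) -> float:
    """Estimate the delay of the wire under test.

    ref_samples:  captures with the wire bypassed.
    wire_samples: captures with the wire inserted before the delay line.
    The extra wire delay means the edge reaches fewer taps before the
    sampling clock, so delay = (drop in mean tap count) * tap delay.
    """
    return (mean_taps(ref_samples) - mean_taps(wire_samples)) * tap_delay_ps


def slowdown_percent(before_ps: float, after_ps: float) -> float:
    """Relative delay increase of a wire after stress (aging)."""
    return 100.0 * (after_ps - before_ps) / before_ps


if __name__ == "__main__":
    # Made-up captures for illustration only.
    ref = ["1111111111110000"] * 4                        # wire bypassed
    fresh = ["1111110000000000"] * 4                      # fresh wire
    aged = ["1100000000000000"] * 4 + ["1000000000000000"]  # after stress
    before = wire_delay_ps(ref, fresh)
    after = wire_delay_ps(ref, aged)
    print(f"before: {before:.1f} ps, after: {after:.1f} ps, "
          f"slowdown: {slowdown_percent(before, after):.1f}%")
```

With these invented captures, the edge reaches 12 taps when the wire is bypassed, 6 taps with the fresh wire, and 1.8 taps on average after stress. That gives estimates of 60 ps before and 102 ps after, a 70% slowdown. The numbers were chosen only to mirror the magnitude reported in the talk.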

FPGA instances are now offered by multiple cloud service providers (including Amazon EC2 F1/F2 instances, Alibaba ECS Instances, and Microsoft Azure NP-Series). The low-level programmability of FPGAs enables new attack vectors, including DoS attacks. Some severe attacks (such as short circuits) cannot easily be deployed, because users are prevented from loading their own configuration bitstreams onto cloud FPGAs. Even so, it has been demonstrated that it is possible to leak information (like cloud instance scheduling policies or the physical topology of the FPGA servers) or to mount DoS attacks by excessive power hammering. For instance, basically all cloud FPGAs provide logic cells that can be configured as small shift registers. This allows building toggle shift registers with 10K or more flip-flops, which can draw over 1 kW when clocked at a few hundred MHz. In our work, we created fast ring oscillators that bypass all design checks applied during cloud bitstream deployment. By using glitch amplification, we achieved toggle rates of 8 GHz inside an FPGA, which we calibrated with the help of a time-to-digital converter (TDC). As a first attack, we used power hammering to crash AWS F1 instances by increasing their power consumption to 300 W (three times the allowed power envelope). We used physical unclonable functions (PUFs) to examine the behaviour of the attacked FPGA cloud instances and found that most remained unavailable for several hours after the attack. As a more subtle attack, we tried to cause permanent damage to FPGAs in our lab by driving fast-toggling signals into virtually every available wire (and primitive) within a small region of the chip. With this, we created hotspot designs that draw 130 W in less than 1% of the available logic and routing resources of a datacenter FPGA. Even though the achieved power density was excessive, it was insufficient to cause permanent damage.
This is largely due to the area inefficiency of FPGAs, which limits power density. For instance, FPGAs use large multiplexers to implement the switchable connections. Only one active path is routed through each multiplexer, which leaves most of its transistors idle. Similarly, FPGAs provide a large number of configuration memory cells (about 1 Gb on a typical datacenter device) that draw negligible power, as they do not switch during operation. All these idle elements force the power-drawing circuits to be spread out, which limits power density. However, when experimenting with different hotspot variants, we observed thermal runaway effects and excessive device aging, with up to a 70% increase in delay on some wires. We achieved this aging in just a few days under normal operating conditions (i.e., staying within the available power budget with board cooling running). Such a large increase in latency can render an FPGA useless, as it will usually no longer be fast enough to host (realistic) user designs. Beyond exploring these attack vectors, we developed countermeasures and design guidelines to prevent such attacks. These include scanning user designs, restricting the use of resources such as IOs and clock trees, and runtime monitoring with FPGA health checks. With these measures, we believe that FPGAs can be operated securely and reliably in a cloud setting.
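One possible building block of a design scan is a check for combinational loops, the feedback structure a classic ring oscillator relies on. The sketch below assumes a hypothetical netlist format: a dict that maps each cell to the cells its output drives, plus a set of sequential cells (flip-flops) that break timing paths. It runs an iterative depth-first search over the combinational cells only and returns the first loop it finds. This is an illustration, not the authors' scanner. As described above, their oscillators bypassed the existing deployment checks, so a plain loop check like this can be at most one layer of defense.

```python
from typing import Optional


def find_combinational_loop(
    fanout: dict[str, list[str]], sequential: set[str]
) -> Optional[list[str]]:
    """Return one combinational loop as a list of cell names, or None.

    fanout:     hypothetical netlist, cell -> cells driven by its output.
    sequential: cells (e.g. flip-flops) that break combinational paths.
    """
    WHITE, GRAY, BLACK = 0, 1, 2  # unvisited, on current path, finished
    color: dict[str, int] = {}

    for root in fanout:
        if root in sequential or color.get(root, WHITE) != WHITE:
            continue
        path = [root]                          # cells on the current DFS path
        stack = [iter(fanout.get(root, []))]   # pending fanout per path cell
        color[root] = GRAY
        while stack:
            nxt = next(stack[-1], None)
            if nxt is None:
                # All fanout of this cell explored: retire it.
                color[path.pop()] = BLACK
                stack.pop()
                continue
            if nxt in sequential:
                continue  # a flip-flop breaks the combinational path
            state = color.get(nxt, WHITE)
            if state == GRAY:
                # nxt is already on the current path, so we closed a loop.
                return path[path.index(nxt):] + [nxt]
            if state == WHITE:
                color[nxt] = GRAY
                path.append(nxt)
                stack.append(iter(fanout.get(nxt, [])))
    return None


if __name__ == "__main__":
    ring = {"lut_a": ["lut_b"], "lut_b": ["lut_c"], "lut_c": ["lut_a"]}
    registered = {"lut_a": ["ff_q"], "ff_q": ["lut_b"], "lut_b": ["lut_a"]}
    print(find_combinational_loop(ring, set()))
    print(find_combinational_loop(registered, {"ff_q"}))
```

The search is iterative rather than recursive because netlists of datacenter FPGAs can contain millions of cells, and a recursive search would exceed Python's default recursion limit.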

Speakers of this event

Dirk

Dirk leads the Novel Computing Technologies group at Heidelberg University. He and his group work mostly on FPGAs, including applications, designing and taping out their own FPGA chips (check out the FABulous eFPGA project), and various FPGA tools (like the FPGADefender FPGA bitstream virus scanner). His group designed a single-FPGA system with 16 RISC-V threads running at 500 MHz and has just achieved the first 1 GHz RISC-V implementation running on an FPGA.