A Microsoft research project shows how FPGAs can turn into flexible data centre resources as they’re more widely deployed for hardware acceleration.
Moore’s Law has been slowing down for years. The latest CPUs no longer increase application performance the way they used to, and the gains you do get come with rising power demands. For the most demanding applications — like indexing the web, managing high-speed software-defined networking and machine learning — hardware accelerators are becoming increasingly common. At first, these were GPUs, which are programmable and highly parallelised. Working with a GPU means moving all the data to the GPU before processing it, so GPUs are well suited to high-throughput batch computation at the cost of latency, but they consume a lot of power.
If you know exactly what computation you need to do, you can use a custom accelerator, like Arm’s machine learning-specific Arm ML processor or Google’s TPUs (Tensor Processing Units), which handle the small set of instructions used in machine learning at a lower numeric precision, making the chip more power-efficient than a general-purpose CPU. Or you can create an ASIC, an Application-Specific Integrated Circuit: custom silicon designed to run a single application very efficiently — but to make that worthwhile, you need to freeze the code you’re going to run and keep using it unchanged for several years.
FPGAs (Field Programmable Gate Arrays) sit in between: they’re less power-hungry than a GPU and can process streams of data in parallel with low latency; they’re not as efficient as an ASIC, but you can change the code they run. FPGAs have never become common, though, because they’re not easy to program (get the Verilog code wrong and you can potentially damage the hardware), and they haven’t been easy to integrate with standard server hardware.
Microsoft started working on how to use FPGAs for AI in 2011. By 2014 it was looking specifically at accelerating deep-learning networks with FPGAs to power Bing indexing, as well as Azure networking. In 2016, Microsoft built an FPGA-powered supercomputer for inference — that’s running rather than training machine-learning models — to power the Bing index, and to accelerate deep learning in Azure.
Now customers can run their own trained machine-learning models on FPGAs in Azure, or on the Intel Arria 10 FPGA in Azure Data Box Edge. That’s an appliance you put in your own data centre, either to pre-process data you’re sending to Azure or to run the machine-learning models you created in Azure locally to get results more quickly (while also sending the data to Azure to keep improving the model).
That hides all the complexity not just of programming FPGAs, but also of deploying them in the data centre. The FPGAs in the 2016 ‘inference supercomputer’ were on a secondary network, which meant extra cabling, and only the 48 FPGAs in a rack could communicate directly. Now the FPGAs on Azure are connected directly to the network, sitting between the network switches and the servers — all the network traffic goes through them — as well as being connected to the CPU of the server they’re physically installed in. That means an FPGA can act as a local accelerator for that server, but it can also be part of a pool of FPGAs that a server can use to handle a data model too big to fit on a single server.
That’s an approach that Doug Burger (who pioneered the FPGA work in Microsoft Research and has now moved over to be the Technical Fellow in the Azure hardware division) calls ‘hardware microservices’. “In Bing, we treat FPGAs as a pool, a fabric of network-attached devices that we manage as a collective,” he told TechRepublic. “As we move farther into the accelerated world post-Moore’s law, these hardware microservices communicating at microsecond and hundreds of nanosecond latencies is something you’ll see over and over again in the approach we take.”
Azure isn’t the only cloud with FPGAs — Baidu uses them to accelerate SSD access, for example. And on AWS, developers can use them to accelerate applications that would usually run on an FPGA-equipped appliance in the data centre.
As more people get interested in deploying FPGAs in data centres, more research is being done into how to manage them as data centre infrastructure, and how to abstract away hardware details like the way an FPGA connects to memory, storage and the network. This means that developers writing accelerators to run on an FPGA don’t have to deal with these details, making it easier to deploy different FPGAs. It also ensures that a badly programmed accelerator can’t accidentally (or maliciously) damage the FPGA hardware — for example by creating a logic loop that causes dangerous overheating.
Microsoft Research has an early take on that — an FPGA operating system called Feniks that runs on the FPGAs and both manages and connects them. Feniks can divide an FPGA into multiple ‘virtual’ accelerators, virtualise I/O, and give FPGAs direct access to resources like disk drives over PCIe rather than having to go through the CPU — which means the CPU doesn’t get interrupted while it’s running its own workload.
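Feniks itself isn’t public, but the idea of slicing one board into isolated ‘virtual’ accelerators can be sketched in software terms. This is a hypothetical illustration — all the names are invented, and a real FPGA shell does this routing in hardware logic, not Python:

```python
# Hypothetical sketch: one physical FPGA exposed as several 'virtual'
# accelerators, with I/O tagged per slot and multiplexed by a thin shell.
class VirtualAccelerator:
    def __init__(self, slot_id, kernel):
        self.slot_id = slot_id
        self.kernel = kernel      # the user logic loaded into this region

    def run(self, data):
        return self.kernel(data)

class FpgaShell:
    """Feniks-like shell: owns the physical board, routes I/O per slot."""
    def __init__(self, num_slots):
        self.slots = [None] * num_slots

    def load(self, slot_id, kernel):
        self.slots[slot_id] = VirtualAccelerator(slot_id, kernel)

    def io_request(self, slot_id, data):
        # Tag-and-route: each region only ever sees its own traffic,
        # so a buggy accelerator can't touch another slot's I/O.
        return self.slots[slot_id].run(data)
```

The point of the shell layer is isolation: two tenants can share one board, but neither can see the other’s traffic or issue raw commands that might damage the hardware.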
That’s more granularity than Microsoft is using today: an FPGA workload in Azure runs on at least one FPGA and might use many FPGAs together. When the AI for Earth team wanted to use machine learning on maps of the whole of North America to detect patterns of land use (buildings, roads, airports, farming, forests, lakes and rivers, and everything else), they used 800 FPGAs to process 20 terabytes of images in just over ten minutes. But if you don’t have thousands of FPGAs, the ability to hardware-accelerate data compression (like Azure’s Project Zipline) and network processing like an OpenFlow-based firewall on the same FPGA, without the workloads interfering with each other, gives you much more flexible ways of using the hardware.
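Those headline numbers imply a useful back-of-the-envelope throughput figure. This is a rough sketch: the exact run time and per-board overheads aren’t published, so a flat ten minutes and decimal terabytes are assumed:

```python
# Back-of-the-envelope throughput for the AI for Earth land-use run.
# Assumes exactly 600 s ("just over ten minutes") and 20 TB = 20e12 bytes.
TOTAL_BYTES = 20e12
NUM_FPGAS = 800
SECONDS = 600

aggregate_gb_s = TOTAL_BYTES / SECONDS / 1e9              # fleet-wide rate
per_fpga_mb_s = TOTAL_BYTES / NUM_FPGAS / SECONDS / 1e6   # per-board share

print(f"aggregate: {aggregate_gb_s:.1f} GB/s")   # ~33.3 GB/s
print(f"per FPGA:  {per_fpga_mb_s:.1f} MB/s")    # ~41.7 MB/s
```

In other words, the fleet sustained roughly 33 GB/s in aggregate, with each board handling a modest ~42 MB/s share — the scale comes from the width of the pool, not from any single device.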
Feniks handles allocating FPGAs as resources: it tracks what FPGAs are already in use, picks one to deploy a new accelerator to, and sends configuration commands to set it up and load the accelerator. That’s the kind of job scheduling that underlies distributed software platforms like Hadoop and Kubernetes — again, turning FPGAs into hardware microservices.
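That allocation step can be sketched as a toy scheduler. All the names here are hypothetical — the real Feniks placement policy and the configuration commands it sends are not public:

```python
from dataclasses import dataclass

@dataclass
class Fpga:
    board_id: str
    free_slots: int   # virtual-accelerator slots still unallocated

class FpgaPool:
    """Toy Feniks-style allocator: track usage, pick a board, 'configure' it."""
    def __init__(self, boards):
        self.boards = list(boards)
        self.placements = {}   # accelerator name -> board_id

    def deploy(self, accel_name):
        # Least-loaded placement: the board with the most free slots wins.
        board = max(self.boards, key=lambda b: b.free_slots)
        if board.free_slots == 0:
            raise RuntimeError("no FPGA capacity left in the pool")
        board.free_slots -= 1
        self.placements[accel_name] = board.board_id
        # A real scheduler would now push configuration commands and a
        # bitstream to the chosen board before reporting success.
        return board.board_id
```

Swap `max(...)` for a bin-packing or locality-aware policy and you have the same design space that software schedulers like Kubernetes explore — which is exactly the ‘hardware microservices’ analogy.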
Feniks is very much a research project and is not available outside Microsoft Research. But as FPGAs become more important as hardware accelerators, they’re going to require a data centre operating system just as CPU-based servers do. This is an interesting glimpse of the kind of features that FPGA data centre OSs will support.