The group includes Google; hardware vendors IBM, Hewlett Packard Enterprise, and Dell EMC; and Intel’s more direct competitors AMD and NVIDIA, among others. IBM’s forthcoming Power9 processors, expected to launch next year, will support the standard, as will the IBM servers they power.
Intel currently dominates the market for server processors, and hyperscale data center operators such as Google, which spend enormous amounts on hardware every quarter, want a viable alternative. They have generally adopted a multi-vendor strategy for sourcing nearly every component of their infrastructure, but it is difficult to extend that strategy to processors given the size of Intel’s lead in the market.
OpenCAPI and Power9 are aimed at the high end of the server market: computers used for data-intensive analytics workloads or machine learning. The group claims the standard can boost server performance tenfold.
That performance improvement comes from two things: higher bandwidth on the links between CPUs and accelerators, and cache coherency, which essentially means data has to be shuffled around less within the system as it is processed, saving resources.
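To see why coherency saves work, here is a toy Python sketch, purely illustrative and not a model of any real hardware, contrasting a non-coherent design, where data must be explicitly copied into and out of a separate device buffer, with a coherent one, where the CPU and the accelerator operate on the same memory:

```python
# Toy illustration: count the data transfers each design needs.
# The "accelerator" here is simulated; nothing models real hardware.

copies = 0

def to_device(buf):
    """Non-coherent path: transferring data to the device duplicates it."""
    global copies
    copies += 1
    return list(buf)  # simulate a DMA copy into device memory

def from_device(buf):
    """Non-coherent path: results must be copied back to host memory."""
    global copies
    copies += 1
    return list(buf)

# Non-coherent design: copy in, compute, copy out.
data = [1, 2, 3, 4]
dev = to_device(data)
dev = [x * 2 for x in dev]      # "accelerator" does the work
data = from_device(dev)
assert copies == 2              # two full transfers were required

# Coherent design: the accelerator works on the memory the CPU sees.
shared = [1, 2, 3, 4]
for i, x in enumerate(shared):  # "accelerator" work happens in place
    shared[i] = x * 2
assert copies == 2              # no additional copies were needed
```

Both paths produce the same result, but the coherent one skips the round-trip copies, which is the resource saving the OpenCAPI group is pointing to.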
Accelerators, additional processors that take on a portion of the CPU’s workload to free up its resources, have been a mainstay of supercomputing for years, but their role is growing in server architecture for cloud data centers and for the fast-emerging area of machine learning.
Most accelerators in use today are GPUs, made by the likes of AMD and Nvidia, though Intel’s Xeon Phi plays in this space as well, and FPGAs, or Field-Programmable Gate Arrays, have also been gaining ground as accelerators. The advantage of FPGAs is that they can be reconfigured as workload needs change.
Intel made a big bet on FPGAs a year ago, paying $16.7 billion to acquire FPGA specialist Altera. The most prominent user of FPGAs to accelerate cloud workloads is Microsoft, whose latest-generation cloud server design supports the technology.
It is unclear at this point what kind of architecture will dominate the market for machine learning hardware. There are divergent views on this today, with companies like Nvidia championing GPU-accelerated AI servers and Intel arguing that approach isn’t scalable, pitching the next generation of its Xeon Phi processors, codenamed Knights Mill and expected to hit the market next year, as the better choice.
- Amazon’s cloud servers for data-intensive workloads, including machine learning, rely on GPUs, as does Big Sur, Facebook’s open source server design for AI workloads.
- Google has designed its own custom processor for machine learning, called the Tensor Processing Unit, or TPU.
- The company hasn’t revealed any details about the TPU’s design, saying only that it’s an ASIC (Application-Specific Integrated Circuit) and that it’s optimized for TensorFlow, its open source software library for building AI applications.
Google is also working on a server design jointly with Rackspace, which will run on IBM’s Power9 processors and feature the OpenCAPI interface. The companies released the first draft of the Zaius server spec today and plan to contribute it to the Open Compute Project.
The OpenCAPI consortium also counts an FPGA player among its members, alongside server and GPU vendors: San Jose-based Xilinx plans to support OpenCAPI-enabled FPGAs, according to Friday’s announcement.
IBM’s accelerator strategy has been to support as broad an array of choices as possible.