Facebook Expands the Open Compute Project (OCP) to Cover AI Applications
Facebook serves 2.7 billion people, makes over 200 trillion predictions, and performs more than 6 billion translations daily. Additionally, 3.5 billion images are analyzed to better recognize and tag content. So it should come as no surprise that AI is used extensively to make all this happen. To support various AI tasks (training, inference, feature engineering, etc.), the company has extended the Open Compute Project (OCP) to include hardware platforms suited to AI-intensive computations.
They announced the availability of two new platforms: Zion, intended for model training, and Kings Canyon, suited for inference jobs. These platforms consist of server blades, AI accelerator modules, interconnects, and chassis.
So why have two separate platforms? The hardware architectures needed for effective training and inference are vastly different. Gear intended for training requires massive amounts of memory and support for floating-point operations, while latency, power, and cost matter less. Inference hardware, on the other hand, needs less memory and can get by with integer-only math. Its most critical requirement is latency, since inferences must be performed in real time.
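The float-for-training, integer-for-inference split above is commonly bridged by quantization. The sketch below (illustrative only, not Facebook's actual pipeline; the function names are my own) shows the basic idea: float32 weights produced during training are mapped onto int8 for inference, cutting memory 4x and enabling fast integer math at the cost of a small rounding error.

```python
import numpy as np

def quantize_int8(weights: np.ndarray):
    """Map float32 weights onto the int8 range using a single scale factor."""
    scale = np.abs(weights).max() / 127.0
    q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Approximately reconstruct the original float weights."""
    return q.astype(np.float32) * scale

weights = np.random.randn(1000).astype(np.float32)  # stand-in for trained weights
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)

print(q.nbytes, weights.nbytes)          # int8 copy uses 1/4 the memory
print(np.abs(weights - restored).max())  # worst-case rounding error (<= scale/2)
```

Real deployments use per-channel scales, calibration data, and quantization-aware training, but the memory and arithmetic trade-off is the same one driving the Zion/Kings Canyon split.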
Zion, the training platform, is designed to efficiently handle the training of a spectrum of neural networks, including CNN, LSTM, and SparseNN. The AI accelerators are housed in vendor-agnostic OCP accelerator modules (OAM), which can accept devices from AMD, Habana, Graphcore, Intel, and NVIDIA.
Kings Canyon is specifically designed for inference tasks. Its AI accelerator chips are housed in M.2 modules that are likewise vendor-agnostic and can accept devices from Esperanto, Habana, Intel, Marvell, and Qualcomm.
Uber Architecting Next-generation AI Servers
Similar to Facebook and Google, Uber is yet another company that utilizes AI in just about every aspect of its business, including recommendation engines, Uber Eats, and fraud detection services, among others. Imagine doing this while accommodating 15 million daily rides in 600 cities across 65 countries. To support such scale, Uber operates two large data centers (each consuming 5 MW) and several smaller ones around the world.
Their AI training and inference engines presently run on NVIDIA GPUs. The existing solution consumes 40 kW per rack, twice the power needed for standard servers. In search of better power efficiency and performance, they are actively evaluating accelerators from the likes of Eyeris, Graphcore, and Wave Computing.
Ultra-Compact Speech Recognition Model from Google
Google announced the availability of an end-to-end, all-neural speech recognition model compact enough to reside on mobile devices. The model is trained using RNN transducer (RNN-T) technology. The significance here is that effective real-time speech recognition can be performed locally, even when the device is offline. An added benefit of this approach is the elimination of latency due to spotty service coverage. The model requires 80 MB, compared to the 450 MB needed for traditional server-based speech recognition models. This makes it quite feasible to bring state-of-the-art speech recognition to a whole host of battery-operated edge and IoT devices.
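To see why the 80 MB figure matters, a model's on-disk footprint is roughly parameter count times bytes per parameter. The numbers below are purely illustrative (Google has not published them in these terms); the point is that shrinking both precision and parameter count is what makes an on-device model plausible versus a server-side one.

```python
def model_size_mb(num_params: int, bytes_per_param: int) -> float:
    """Approximate model size in megabytes (1 MB = 1e6 bytes)."""
    return num_params * bytes_per_param / 1e6

# Hypothetical counts for illustration only:
# ~112M float32 parameters server-side vs. ~80M int8 parameters on device.
server_mb = model_size_mb(112_000_000, 4)
device_mb = model_size_mb(80_000_000, 1)

print(f"server ~ {server_mb:.0f} MB, on-device ~ {device_mb:.0f} MB")
```

The same arithmetic explains why quantization (fewer bytes per parameter) and architecture compression (fewer parameters) are the standard levers for fitting models into phone-sized storage budgets.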
I hope you have benefited from this issue. Please forward it to others if you find value in this content. I always welcome feedback.