
AI Inference on Azure Solution Brief

Dpsv5 Virtual Machines Powered by Ampere Altra Processors

Overview

Ampere® Altra® processors are designed to deliver exceptional performance for Cloud Native applications such as AI Inference. With an innovative architecture that delivers predictable high performance, linear scaling, and high energy efficiency, Ampere Altra allows workloads to run with minimal variance under increasing loads. This enables industry-leading performance per watt and a smaller carbon footprint, so you can run AI inference workloads with both high performance and high energy efficiency.

Microsoft offers a comprehensive line of Azure Virtual Machines featuring the Ampere Altra Cloud Native processor that can run a diverse and broad set of scale-out workloads such as web servers, open-source databases, in-memory applications, big data analytics, gaming, media, and more. The Dpsv5 VMs are general-purpose VMs that provide 2 GB of memory per vCPU and a combination of vCPUs, memory, and local storage to cost-effectively run workloads that do not require larger amounts of RAM per vCPU. The Epsv5 VMs are memory-optimized VMs that provide 4 GB of memory per vCPU, which can benefit memory-intensive workloads, including open-source databases, in-memory caching applications, gaming, and data analytics engines.

MLPerf™ Inference is a benchmark suite consisting of carefully selected AI architectures that represent the forefront of today’s artificial intelligence. It is a comprehensive test of how well a given system performs on a variety of representative machine learning tasks, including natural language processing, computer vision, recommendation engines, and more. It is a result of a consensus on the best benchmarking techniques forged by experts in architecture, systems, and machine learning.

Results and Key Findings

Ampere Altra VMs offer great performance on a variety of AI workloads, including the models in the MLPerf Inference benchmark. ResNet-50 v1.5 is a popular neural network architecture primarily used in computer vision. This model, trained for the ImageNet classification task, is part of the MLPerf Inference suite. We run an MLPerf-like benchmarking script that measures the performance of model inference without any internal conversion to proprietary formats. This provides unbiased comparisons of performance across architectures while running the same neural network.

Ampere Altra-based Dpsv5 VMs are the only cloud CPU instances on Azure that natively support FP16 vectorized computation. FP16 can deliver up to a 2x performance gain over FP32 without sacrificing model accuracy. Ampere optimized TensorFlow takes full advantage of FP16 to deliver the best performance and price-performance over legacy x86 VMs.
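One way to see where a figure like "up to 2x" can come from is to count how many values fit in a single vector register at each precision. The sketch below does only that. The 128-bit register width is an assumption made for illustration (this brief does not state it), and the lane count is an upper bound on per-instruction gain, not a measured result.

```python
import struct

# Assumed SIMD register width in bits (illustrative; not stated in this brief).
VECTOR_BITS = 128


def lanes_per_vector(fmt: str, vector_bits: int = VECTOR_BITS) -> int:
    """Return how many values of a struct format fit in one vector register."""
    bytes_per_value = struct.calcsize(fmt)
    return vector_bits // (8 * bytes_per_value)


fp32_lanes = lanes_per_vector("f")  # IEEE 754 single precision, 4 bytes
fp16_lanes = lanes_per_vector("e")  # IEEE 754 half precision, 2 bytes

print(f"FP32 lanes per vector: {fp32_lanes}")
print(f"FP16 lanes per vector: {fp16_lanes}")
print(f"Upper bound on per-instruction gain: {fp16_lanes / fp32_lanes:.1f}x")
```

Halving the width of each value doubles the number of values processed per vector instruction, which is why the gain is quoted as "up to" 2x; real speedups depend on the model and on memory bandwidth.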

In the single-stream scenario, which measures the 99th percentile latency of processing a single input image, the Dpsv5 VM performed 36% better than the Intel Ice Lake-based Dsv5 VMs and 2.6x better than the AMD Milan-based Dasv5 VMs, as shown in Figure 1. On price-performance, shown in Figure 2, the Ampere Altra-based Dpsv5 VMs had a 68% advantage over the Dsv5 VMs and a 2.9x advantage over the Dasv5 VMs.

Fig.1: ResNet-50 v1.5 Single Stream Performance on Microsoft Azure Dpsv5 Virtual Machines Powered by Ampere Altra Processors

Fig.2: ResNet-50 v1.5 Single Stream Price-Performance on Microsoft Azure Dpsv5 Virtual Machines Powered by Ampere Altra Processors
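The single-stream metric described above can be sketched as follows: issue one query at a time, record each latency, and report the 99th percentile. This is a minimal illustration, not the benchmarking script used for this brief. The `run_inference` callable and the `fake_inference` stand-in are hypothetical, and the nearest-rank percentile method is an assumption.

```python
import math
import time
from typing import Callable, List


def percentile_nearest_rank(samples: List[float], pct: float) -> float:
    """Return the pct-th percentile of samples using the nearest-rank method."""
    if not samples:
        raise ValueError("samples must not be empty")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100.0))
    return ordered[rank - 1]


def single_stream_p99(run_inference: Callable[[], None],
                      num_queries: int = 1000) -> float:
    """Issue one query at a time and return the 99th percentile latency in seconds."""
    latencies = []
    for _ in range(num_queries):
        start = time.perf_counter()
        run_inference()  # hypothetical: processes exactly one input image
        latencies.append(time.perf_counter() - start)
    return percentile_nearest_rank(latencies, 99.0)


def fake_inference() -> None:
    """Stand-in workload so the sketch runs without a model (hypothetical)."""
    sum(i * i for i in range(1000))


p99 = single_stream_p99(fake_inference, num_queries=200)
print(f"p99 latency: {p99 * 1e3:.3f} ms")
```

Reporting the 99th percentile rather than the mean captures tail latency, so lower is better and run-to-run variance shows up directly in the result.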

In the offline scenario, which measures the maximum throughput of the system (number of processed inputs in a fixed unit of time) without latency constraints, the Ampere Altra-based Dpsv5 VM came out on top: it delivered 11% higher throughput than the Dsv5 VM and 2.1x the throughput of the Dasv5 VM, as shown in Figure 3.

Fig.3: ResNet-50 v1.5 Offline Throughput on Microsoft Azure Dpsv5 Virtual Machines Powered by Ampere Altra Processors

Fig.4: ResNet-50 v1.5 Offline Throughput Price-Performance on Microsoft Azure Dpsv5 Virtual Machines Powered by Ampere Altra Processors

On price-performance, as shown in Figure 4, the Ampere Altra-based Dpsv5 VM was 39% more cost-efficient than the Dsv5 VM and 2.3x as cost-efficient as the Dasv5 VM.
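The offline scenario can be sketched in the same spirit: process every input as fast as possible in batches and divide the number of inputs by the elapsed time. This is an illustrative sketch only. The `process_batch` callable, the `fake_batch` stand-in, and the batch size of 32 are assumptions, not details taken from the tests behind this brief.

```python
import time
from typing import Callable, List, Sequence


def offline_throughput(process_batch: Callable[[Sequence[int]], None],
                       inputs: Sequence[int],
                       batch_size: int = 32) -> float:
    """Process all inputs with no latency constraint and return inputs per second."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    start = time.perf_counter()
    for i in range(0, len(inputs), batch_size):
        process_batch(inputs[i:i + batch_size])
    elapsed = time.perf_counter() - start
    return len(inputs) / elapsed


processed: List[int] = []


def fake_batch(batch: Sequence[int]) -> None:
    """Stand-in workload that records batch sizes (hypothetical)."""
    processed.append(len(batch))
    sum(x * x for x in batch for _ in range(100))


throughput = offline_throughput(fake_batch, list(range(1000)), batch_size=32)
print(f"Offline throughput: {throughput:.1f} inputs/s")
```

Because no per-query latency bound applies, batch size becomes a tuning parameter: larger batches usually raise throughput until memory or cache limits are reached.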

Benchmarking Configuration

The results in this workload brief are based on measurements with the Ampere Model Library (AML) for D16ps v5, D16s v5, and D16as v5 VMs.

D16ps v5 results were measured using Ampere optimized TensorFlow 2.7.1 available here running on Ubuntu 20.04 with Linux kernel 5.15.0-0.bpo.3-cloud-arm64.
D16s v5 results were measured using Intel optimized TensorFlow 2.7 available here running on Ubuntu 20.04 with Linux kernel 4.19.0-19-cloud-amd64.
D16as v5 results were measured using AMD ZenDNN TensorFlow 2.7 available here running on Ubuntu 20.04 with Linux kernel 4.19.0-19-cloud-amd64.

Price-performance data is based on Azure on-demand pricing in the Iowa region as of July 12, 2022.
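The relationship between the performance and price-performance figures can be made explicit with a little arithmetic. The sketch below assumes that price-performance means performance divided by hourly on-demand price, and that "36% better" and "68% advantage" correspond to ratios of 1.36 and 1.68; neither definition is spelled out in this brief. Under those assumptions, dividing each reported performance ratio by the matching price-performance ratio gives the relative hourly price the two figures imply. The inputs are rounded, so the results are approximate.

```python
def price_performance(performance: float, price_per_hour: float) -> float:
    """Performance delivered per dollar-hour (assumed definition)."""
    if price_per_hour <= 0:
        raise ValueError("price_per_hour must be positive")
    return performance / price_per_hour


def implied_price_ratio(perf_ratio: float, price_perf_ratio: float) -> float:
    """Hourly price of VM A relative to VM B implied by the two ratios.

    Derivation: (perf_A / price_A) / (perf_B / price_B) = price_perf_ratio,
    so price_A / price_B = perf_ratio / price_perf_ratio.
    """
    return perf_ratio / price_perf_ratio


# (performance ratio, price-performance ratio) for Dpsv5 relative to each
# x86 VM, as reported in this brief.
reported = {
    "single-stream vs Dsv5": (1.36, 1.68),
    "single-stream vs Dasv5": (2.6, 2.9),
    "offline vs Dsv5": (1.11, 1.39),
    "offline vs Dasv5": (2.1, 2.3),
}

for name, (perf, price_perf) in reported.items():
    ratio = implied_price_ratio(perf, price_perf)
    print(f"{name}: implied Dpsv5 hourly price is {ratio:.2f}x the x86 VM price")
```

If the definition assumed here matches the one used for the figures, the single-stream and offline results should imply roughly the same price ratio for each x86 VM, which makes this a quick consistency check on the reported numbers.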

Key Findings and Conclusions

AI Inference is rapidly growing as a workload in the cloud. Ampere optimized frameworks (TensorFlow, PyTorch, and ONNX Runtime) provide best-in-class Inference performance for a variety of AI tasks such as computer vision, natural language processing, and recommendation engines. We studied ResNet-50, a popular computer vision model, on several Azure VMs. In our tests, the Microsoft Azure Dpsv5 VMs powered by Ampere Altra Cloud Native processors and Ampere optimized TensorFlow delivered better Inference performance and price-performance than legacy x86 VMs: up to 2.6x the performance and up to 2.9x the price-performance in the single-stream scenario. Overall, the Dpsv5 VMs offer great performance and compelling price-performance, all while reducing your carbon footprint.

Footnotes

All data and information contained herein is for informational purposes only and Ampere reserves the right to change it without notice. This document may contain technical inaccuracies, omissions and typographical errors, and Ampere is under no obligation to update or correct this information. Ampere makes no representations or warranties of any kind, including but not limited to express or implied guarantees of noninfringement, merchantability, or fitness for a particular purpose, and assumes no liability of any kind. All information is provided “AS IS.” This document is not an offer or a binding commitment by Ampere. Use of the products contemplated herein requires the subsequent negotiation and execution of a definitive agreement or is subject to Ampere’s Terms and Conditions for the Sale of Goods.

System configurations, components, software versions, and testing environments that differ from those used in Ampere’s tests may result in different measurements than those obtained by Ampere.

Price performance was calculated using Microsoft's Virtual Machines Pricing, in September of 2022. Refer to individual tests for more information.

©2022 Ampere Computing. All Rights Reserved. Ampere, Ampere Computing, Altra and the ‘A’ logo are all registered trademarks or trademarks of Ampere Computing. Arm is a registered trademark of Arm Limited (or its subsidiaries). All other product names used in this publication are for identification purposes only and may be trademarks of their respective companies.

Ampere Computing® / 4655 Great America Parkway, Suite 601 / Santa Clara, CA 95054 / amperecomputing.com

Created At : September 27th 2022, 10:33:13 am

Last Updated At : February 14th 2024, 12:48:29 am
