
AI: Ampere Altra Family vs. Graviton

Benefits of Running AI Inference on Ampere

Contents:
  • Introduction
  • Technical Features Comparison
  • Cost/Performance Comparison
  • Conclusions
  • End Notes
  • Disclaimer
Introduction

In launching the first Cloud Native Processors, Ampere created a new CPU category in the cloud and enterprise-class server markets. These processors benefit from a ground-up design centered on cloud-native workloads, eliminating many legacy hardware features of the x86 architecture while boosting performance, reducing unnecessary complexity, and cutting power consumption. Ampere® Altra® and Ampere Altra Max Cloud Native Processors, based on the ARM v8.2 instruction set, are being increasingly adopted by major cloud service providers and outperform legacy CPUs in both raw performance and performance per watt. Ampere also shows significant performance leadership on AI inference workloads. This advantage holds not only against legacy x86 architectures but also against the ARM v8-based processor family introduced by AWS. In what follows, we examine and compare the performance of Altra and Altra Max against AWS's Graviton 2 and Graviton 3.

Technical Features Comparison

| Device | Altra | Altra Max | Graviton 2 | Graviton 3 |
| Process Node | 7nm | 7nm | 7nm | 5nm |
| CPU Cores | 80 | 128 | 64 | 64 |
| Fmax | 3.3GHz | 3.0GHz | 2.5GHz | 2.6GHz |
| Architecture | ARM v8.2 | ARM v8.2 | ARM v8.2 | ARM v8.5 |
| Micro-architecture | Neoverse N1 | Neoverse N1 | Neoverse N1 | Neoverse V1 + 256b SVE |
| L1 Cache | 64KB I + 64KB D | 64KB I + 64KB D | 64KB I + 64KB D | 64KB I + 64KB D |
| L2 Cache | 1MB | 1MB | 1MB | 1MB |
| L3 Cache | 32MB shared | 16MB shared | 32MB shared | 64MB shared |
| Memory Channels | 8x DDR4-3200 | 8x DDR4-3200 | 8x DDR4-3200 | 8x DDR5-4800 |
| Encryption | AES-256 | AES-256 | AES-256 | AES-256 |
| PCIe | 128x PCIe 4.0 | 128x PCIe 4.0 | 64x PCIe 4.0 | 32x PCIe 5.0 |
| TDP | 40–187W | 132–183W | 80–110W (est.) | 80–110W (est.) |

Table 1: Ampere Altra and Altra Max vs. Graviton 2 and Graviton 3 key features


Ampere Altra and Ampere Altra Max offer several clear advantages:

  • Up to 2x more CPU cores, providing up to 2x the compute capacity on a single device
  • Up to 20% higher CPU core clock speeds with no visible power penalty
  • Similar or lower silicon cost at twice the compute capacity

In addition, the Ampere-optimized AI frameworks (TensorFlow, PyTorch, and ONNX Runtime) provide further speedups and take full advantage of the Ampere Altra family's built-in hardware support for the fp16 half-precision data format. As a result, the Ampere Altra family delivers consistently superior performance compared to Graviton 2 and Graviton 3 across the majority of AI workloads. In this post we discuss computer vision and NLP benchmarks for the Ampere Altra family of processors that exemplify our performance advantage over the Graviton family.

In ResNet-50 v1.5 benchmarks, we measured the latency and throughput of Altra Max, Graviton 2, and Graviton 3 (see Figures 1.1 and 1.2). All benchmarks were run on 64 cores in a single-threaded configuration. Latency tests used a batch size of one, and throughput tests a batch size of 64. Altra Max is 7x faster than Graviton 2 in latency and more than 2x faster in throughput. While Graviton 3 significantly improves on Graviton 2, it still falls short of Altra Max, which remains more than 3x faster in latency and more than 2x faster in throughput.
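As a rough illustration of this measurement setup (batch size one for latency, a larger batch for throughput), a minimal timing harness might look like the following sketch. The `run_inference` function here is a hypothetical stand-in for an actual model call, not Ampere's benchmark code.

```python
import time
import statistics

def run_inference(batch):
    # Hypothetical stand-in for a real model call (e.g., a TensorFlow
    # session run on ResNet-50).
    return [x * 2 for x in batch]

def measure_latency(fn, batch, iters=100):
    """Median per-request latency in seconds (use batch size 1)."""
    times = []
    for _ in range(iters):
        start = time.perf_counter()
        fn(batch)
        times.append(time.perf_counter() - start)
    return statistics.median(times)

def measure_throughput(fn, batch, iters=100):
    """Samples processed per second at the given batch size."""
    start = time.perf_counter()
    for _ in range(iters):
        fn(batch)
    elapsed = time.perf_counter() - start
    return iters * len(batch) / elapsed

latency = measure_latency(run_inference, [0.0])             # batch size 1
throughput = measure_throughput(run_inference, [0.0] * 64)  # batch size 64
```

In a real run the same harness would be pinned to a fixed set of cores so that latency and throughput numbers are comparable across devices.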


Figure 1.1: Altra Max latency vs. Graviton 2 and Graviton 3 for ResNet-50 v1.5 (In latency smaller is better)


Figure 1.2: Altra Max throughput vs. Graviton 2 and Graviton 3 for ResNet-50 v1.5 (In throughput larger is better)


In NLP workloads, Altra Max maintains its advantage in both latency and throughput. Graviton 3 improves on Graviton 2 but still falls short of Altra's performance levels. Figures 2.1 and 2.2 summarize the BERT_large_MLPERF_Squad benchmark for the three devices. Altra Max's latency is 2.4x better than Graviton 2's and 1.7x better than Graviton 3's, and its throughput is 1.7x higher than Graviton 2's and 1.5x higher than Graviton 3's.

The performance advantage described above is based on Altra's fp32 mode. When run in fp16 mode, with no impact on accuracy, the performance gap widens further in Altra's favor, as can be observed in Figures 1 and 2.


Figure 2.1: Altra Max relative latency vs. Graviton 2 and Graviton 3 for BERT_large_MLPERF_Squad (In latency smaller is better)


Figure 2.2: Altra Max relative throughput vs. Graviton 2 and Graviton 3 for BERT_large_MLPERF_Squad (In throughput larger is better)

Cost/Performance Comparison

Access to Graviton instances is only possible through AWS. Ampere Altra is available on OCI (Oracle Cloud Infrastructure), Microsoft Azure, Google Cloud, and Tencent Cloud, while Equinix and Hetzner offer bare-metal instances. The pricing of the A1 instances is about half that of both Graviton 2 (c6g.xlarge) and Graviton 3 (c7g.xlarge), and the A1 instance offers 24GB of memory at that price, against 16GB for c6g and only 8GB for c7g. Given the performance advantages Ampere Altra and Altra Max deliver over Graviton 2 and Graviton 3, they are the obvious choice for AI inference workloads.

Furthermore, Table 2 lists the pricing of the compute instances required to run the actual benchmarks, or equivalent workloads, at the performance levels shown in Figures 1 and 2.


| Device | Cloud Service | Compute Instance | Configuration | $/CPU hour | Monthly Cost |
| Ampere Altra | OCI | A1 | 64 vCPUs, 128 GB | $0.832 | $599.04 |
| Graviton 2 | AWS | c6g.16xlarge | 64 vCPUs, 128 GB | $2.176 | $1,000.76 |
| Graviton 3 | AWS | c7g.16xlarge | 64 vCPUs, 128 GB | $2.312 | $1,111.61 |

Table 2: Altra, Graviton 2, and Graviton 3 cost at a 64 vCPU + 128 GB configuration


Finally, when many different workloads and models are considered, Altra's average price/performance advantages over Graviton 2 and Graviton 3 can be compiled; the composite results are shown in Table 3, with the methodology described in the End Notes.

| Device | Altra fp16 | Altra fp32 |
| Over Graviton 2 | 10.2x | 6.5x |
| Over Graviton 3 | 4.7x | 2.8x |

Table 3: Altra's composite price/performance advantage over Graviton 2 and Graviton 3 (using the compute configurations shown in Table 2)

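A price/performance advantage of the kind shown above combines relative throughput with instance pricing. The following sketch shows the arithmetic; the numbers in it are purely illustrative, not Ampere's measured values.

```python
def price_performance_advantage(perf_a, price_a, perf_b, price_b):
    """Ratio of performance-per-dollar of system A over system B."""
    return (perf_a / price_a) / (perf_b / price_b)

# Illustrative numbers only: a system that is 2x faster at half the
# hourly price has a 4x performance-per-dollar advantage.
adv = price_performance_advantage(perf_a=2.0, price_a=0.5,
                                  perf_b=1.0, price_b=1.0)
print(adv)  # 4.0
```

This is why a roughly 2x throughput lead combined with roughly half the instance price compounds into the large composite advantages reported in Table 3.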
Conclusions

Ampere Altra and Altra Max CPUs lead AWS's ARM-based Graviton 2 and Graviton 3 compute instances by a wide margin in both performance and price, making them the clear choice for cloud AI inference.

End Notes

The composite price/performance numbers are based on benchmark results for 12 industry-standard computer vision and NLP models, tested in single-stream latency and offline throughput on Ampere Altra, Graviton 2, and Graviton 3. The models used for the benchmarks are shown in Table 4:


tf_bert_base_c_squad
tf_bert_large_c_wwm_squad
tf_bert_large_mlperf_squad
tf_densenet_169
tf_distilbert_base_c_squad
tf_inception_v2
tf_resnet_50_v1.5
tf_roberta_base_squad
tf_ssd_mobilenet_v1
tf_ssd_resnet_34
tf_vgg_16
tf_yolo_v4_tiny

Table 4: Models used in Altra vs. Graviton Benchmarks
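A composite advantage across many models is typically summarized with a geometric mean of the per-model ratios, which keeps a single outlier model from dominating the average. A minimal sketch (the ratios below are illustrative, not the measured values behind Table 3):

```python
import math

def geometric_mean(ratios):
    """Geometric mean of per-model advantage ratios."""
    return math.exp(sum(math.log(r) for r in ratios) / len(ratios))

# Illustrative per-model price/performance ratios for two models:
# the geometric mean of 2.0 and 8.0 is 4.0.
composite = geometric_mean([2.0, 8.0])
```

In a full report, one ratio per model in Table 4 would feed into this mean, separately for latency and throughput and for fp16 and fp32.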


The hardware platforms along with the TensorFlow versions used in the benchmarks are shown in Table 5:


| Ampere Altra, 1P 80c | AWS Graviton 2, 1P 64c, c6g | AWS Graviton 3, 1P 64c, c7g |
| TF 2.7.2 + AIO | TF 2.7 | TF 2.10 ACL, TF 2.10, or TF 2.8.2 (based on availability) |

Table 5: Hardware platforms and Software versions used in the benchmarks.


Given the availability of multiple versions of TensorFlow for Graviton 3, only the best performance results were used in the final benchmark report.

Disclaimer

All data and information contained herein is for informational purposes only and Ampere reserves the right to change it without notice. This document may contain technical inaccuracies, omissions and typographical errors, and Ampere is under no obligation to update or correct this information. Ampere makes no representations or warranties of any kind, including but not limited to express or implied guarantees of noninfringement, merchantability, or fitness for a particular purpose, and assumes no liability of any kind. All information is provided “AS IS.” This document is not an offer or a binding commitment by Ampere. Use of the products contemplated herein requires the subsequent negotiation and execution of a definitive agreement or is subject to Ampere’s Terms and Conditions for the Sale of Goods.

System configurations, components, software versions, and testing environments that differ from those used in Ampere’s tests may result in different measurements than those obtained by Ampere.

©2022 Ampere Computing. All Rights Reserved. Ampere, Ampere Computing, Altra and the ‘A’ logo are all registered trademarks or trademarks of Ampere Computing. Arm is a registered trademark of Arm Limited (or its subsidiaries). All other product names used in this publication are for identification purposes only and may be trademarks of their respective companies.
