
Samsung FIO Installation and Tuning Guide

for Ampere Altra Processors

Overview

FIO (Flexible I/O Tester) is a widely used tool for establishing a baseline for storage performance, and solution architects designing large-scale, high-performance storage often run it against high-performance drives. FIO spawns a number of threads or processes, each performing the type of I/O action specified by the user. It accepts a set of global parameters, which every thread inherits unless a job-specific parameter overrides them. The typical way to use FIO is to write a job file that matches the I/O load you want to simulate.

This section of the document guides you through the steps that are required to install and tune your FIO performance benchmark.

System Configurations

Ampere Altra Processor

  • 80 64-bit CPU cores up to 3.30 GHz
  • 64 KB L1 I-cache, 64 KB L1 D-cache per core
  • 1 MB L2 cache per core
  • 32 MB System Level Cache (SLC)
  • 2x full-width (128b) SIMD
  • Coherent mesh-based interconnect

Memory

  • 8x 72-bit DDR4-3200 channels
  • ECC and DDR4 RAS
  • Up to 16 DIMMs and 4 TB addressable memory

Connectivity

  • 128 lanes of PCIe Gen4
  • Coherent multi-socket support
  • 4 x16 CCIX lanes

Technology & Functionality

  • Arm v8.2+, SBSA Level 4
  • Advanced Power Management

Performance

  • SPECrate®2017 Integer Estimated: 300

Altra Platform Configuration

  • Altra 80-core 2P Mt. Jade server
  • 512GB Memory
  • 24x PCIe Gen. 4 Samsung PM1733a SSD drives – 30.72TB
  • NVMe 0-7 on socket 0
  • NVMe 9-24 on socket 1
  • OS: CentOS 8

Ampere Altra Max Processor

  • 128 Arm v8.2+ 64-bit CPU cores up to 3.0 GHz maximum
  • 64 KB L1 I-cache, 64 KB L1 D-cache per core
  • 1 MB L2 cache per core
  • 16 MB System Level Cache (SLC)
  • 2x full-width (128b) SIMD
  • Coherent mesh-based interconnect – Distributed snoop filtering

Memory

  • 8x 72-bit DDR4-3200 channels
  • ECC, Symbol-based ECC, and DDR4 RAS features 
  • Up to 16 DIMMs and 4 TB/socket

Connectivity

  • 128 lanes of PCIe Gen4
  • Coherent multi-socket support
  • 4 x16 CCIX lanes

Technology & Functionality

  • Arm v8.2+, SBSA Level 4
  • Advanced Power Management

Performance

  • SPECrate® 2017_int_base: 359

Altra Max Platform Configuration

  • Altra Max 128-core 2P Mt. Jade server
  • 512GB Memory
  • 24x PCIe Gen. 4 Samsung PM1733a SSD drives – 30.72TB
  • NVMe 0-7 on socket 0
  • NVMe 9-24 on socket 1
  • OS: CentOS 8

System Settings

GRUB Setting

Add the following parameter to the kernel command line:


iommu.passthrough=1
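After a reboot it can be worth confirming that the parameter actually reached the running kernel. The following is a minimal sketch and not part of the original guide: it checks a kernel command line string for the setting as a whole token. The helper name `has_kernel_arg` is hypothetical; `/proc/cmdline` is the standard Linux file that holds the boot parameters.

```python
from pathlib import Path


def has_kernel_arg(cmdline: str, arg: str) -> bool:
    """Return True if `arg` appears as a whole token on the kernel command line."""
    return arg in cmdline.split()


def running_kernel_has(arg: str, path: str = "/proc/cmdline") -> bool:
    """Check the parameters the running kernel was booted with (Linux only)."""
    return has_kernel_arg(Path(path).read_text(), arg)
```

Calling `running_kernel_has("iommu.passthrough=1")` on the system under test should return `True` once the setting is active. On CentOS 8 the parameter is typically added with grubby, for example `grubby --update-kernel=ALL --args="iommu.passthrough=1"`, followed by a reboot; treat that command as an assumption about your boot setup rather than a step from this guide.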

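NUMA Setting Considerations

On a 2P platform, bind each FIO job's memory and CPUs to the NUMA node that the drive under test is attached to. The cpus_allowed ranges below correspond to 128 cores per socket; adjust them to match your system's CPU numbering.

For NUMA node 0:

numa_mem_policy=bind:0
numa_cpu_nodes=0
cpus_allowed=0-127

For NUMA node 1:

numa_mem_policy=bind:1
numa_cpu_nodes=1
cpus_allowed=128-255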
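The two sets of options above follow a simple pattern, which the sketch below spells out for other core counts. It assumes one NUMA node per socket and contiguous CPU numbering (node 0 owns the first block of CPUs, node 1 the next); neither assumption is guaranteed by the guide, so confirm the layout with `lscpu` first. The function names are hypothetical.

```python
def numa_binding(node: int, cores_per_socket: int) -> dict:
    """Build the fio NUMA options for one socket.

    Assumes one NUMA node per socket and contiguous CPU numbering:
    node 0 owns CPUs 0..N-1, node 1 owns N..2N-1.
    """
    first = node * cores_per_socket
    last = first + cores_per_socket - 1
    return {
        "numa_mem_policy": f"bind:{node}",
        "numa_cpu_nodes": str(node),
        "cpus_allowed": f"{first}-{last}",
    }


def as_job_lines(options: dict) -> str:
    """Render options as fio job-file lines, one key=value per line."""
    return "\n".join(f"{key}={value}" for key, value in options.items())
```

For example, `as_job_lines(numa_binding(1, 128))` reproduces the node 1 settings shown above, and `numa_binding(1, 80)` would give `cpus_allowed=80-159` for an 80-core part under the same assumptions.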

PCIe speed and width check

Check the link status of all 24 drives to make sure the PCIe speed and width are detected correctly.

[root@localhost samsung]# ./LnkSta.sh
LnkSta: Speed 16GT/s (ok), Width x4 (ok)
LnkSta: Speed 16GT/s (ok), Width x4 (ok)
LnkSta: Speed 16GT/s (ok), Width x4 (ok)
LnkSta: Speed 16GT/s (ok), Width x4 (ok)
LnkSta: Speed 16GT/s (ok), Width x4 (ok)
LnkSta: Speed 16GT/s (ok), Width x4 (ok)
LnkSta: Speed 16GT/s (ok), Width x4 (ok)
LnkSta: Speed 16GT/s (ok), Width x4 (ok)
LnkSta: Speed 8GT/s (ok), Width x4 (ok)
LnkSta: Speed 16GT/s (ok), Width x4 (ok)
LnkSta: Speed 16GT/s (ok), Width x4 (ok)
LnkSta: Speed 16GT/s (ok), Width x4 (ok)
LnkSta: Speed 16GT/s (ok), Width x4 (ok)
LnkSta: Speed 16GT/s (ok), Width x4 (ok)
LnkSta: Speed 16GT/s (ok), Width x4 (ok)
LnkSta: Speed 16GT/s (ok), Width x4 (ok)
LnkSta: Speed 16GT/s (ok), Width x4 (ok)
LnkSta: Speed 16GT/s (ok), Width x4 (ok)
LnkSta: Speed 16GT/s (ok), Width x4 (ok)
LnkSta: Speed 16GT/s (ok), Width x4 (ok)
LnkSta: Speed 16GT/s (ok), Width x4 (ok)
LnkSta: Speed 16GT/s (ok), Width x4 (ok)
LnkSta: Speed 16GT/s (ok), Width x4 (ok)
LnkSta: Speed 16GT/s (ok), Width x4 (ok)
LnkSta: Speed 16GT/s (ok), Width x4 (ok)
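The LnkSta.sh script itself is not reproduced in this guide. As a rough sketch of the kind of check it supports, the following parses `LnkSta:` lines in the format shown above and lists every entry whose speed or width differs from the expected Gen4 x4 values. The function name and the default expected values are assumptions chosen for illustration.

```python
import re

# Matches text such as "LnkSta: Speed 16GT/s (ok), Width x4 (ok)".
LNKSTA_RE = re.compile(r"LnkSta:\s+Speed\s+([\d.]+)GT/s.*?Width\s+x(\d+)")


def find_unexpected_links(text: str, speed: str = "16", width: int = 4) -> list:
    """Return (index, speed, width) for every link that differs from the expected values."""
    unexpected = []
    for index, match in enumerate(LNKSTA_RE.finditer(text)):
        found_speed, found_width = match.group(1), int(match.group(2))
        if found_speed != speed or found_width != width:
            unexpected.append((index, found_speed, found_width))
    return unexpected
```

Applied to the output above, this would flag only the ninth entry (index 8), which reports 8GT/s.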

Max Read Request must be set to 512 bytes:

[root@localhost samsung]# ./mrr.sh
MaxPayload 512 bytes, MaxReadReq 512 bytes
MaxPayload 256 bytes, MaxReadReq 512 bytes
MaxPayload 512 bytes, MaxReadReq 512 bytes
MaxPayload 256 bytes, MaxReadReq 512 bytes
MaxPayload 512 bytes, MaxReadReq 512 bytes
MaxPayload 256 bytes, MaxReadReq 512 bytes
MaxPayload 512 bytes, MaxReadReq 512 bytes
MaxPayload 256 bytes, MaxReadReq 512 bytes
MaxPayload 256 bytes, MaxReadReq 512 bytes
MaxPayload 512 bytes, MaxReadReq 512 bytes
MaxPayload 256 bytes, MaxReadReq 512 bytes
MaxPayload 512 bytes, MaxReadReq 512 bytes
MaxPayload 256 bytes, MaxReadReq 512 bytes
MaxPayload 512 bytes, MaxReadReq 512 bytes
MaxPayload 256 bytes, MaxReadReq 512 bytes
MaxPayload 512 bytes, MaxReadReq 512 bytes
MaxPayload 256 bytes, MaxReadReq 512 bytes
MaxPayload 512 bytes, MaxReadReq 512 bytes
MaxPayload 256 bytes, MaxReadReq 512 bytes
MaxPayload 512 bytes, MaxReadReq 512 bytes
MaxPayload 256 bytes, MaxReadReq 512 bytes
MaxPayload 512 bytes, MaxReadReq 512 bytes
MaxPayload 256 bytes, MaxReadReq 512 bytes
MaxPayload 512 bytes, MaxReadReq 512 bytes
MaxPayload 256 bytes, MaxReadReq 512 bytes
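With this many lines, a scripted check is less error-prone than reading the values by eye. The sketch below is an illustration only (the mrr.sh script is not shown in this guide): it extracts every `MaxReadReq` value from text in the format above and confirms that all of them equal 512 bytes. The function names are hypothetical.

```python
import re

# Matches the "MaxReadReq 512 bytes" field as it appears in the output above.
MRR_RE = re.compile(r"MaxReadReq\s+(\d+)\s+bytes")


def max_read_requests(text: str) -> list:
    """Extract every MaxReadReq value (in bytes) from the given text."""
    return [int(value) for value in MRR_RE.findall(text)]


def all_set_to(text: str, expected: int = 512) -> bool:
    """True if at least one value was found and every value equals `expected`."""
    values = max_read_requests(text)
    return bool(values) and all(value == expected for value in values)
```

An empty input returns `False` on purpose, so a script that produced no output is not mistaken for a passing check.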

Precondition the drives

Before using the drives and running the benchmark, format and then precondition all of the drives.
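The guide does not specify a preconditioning workload. One common approach, stated here as an assumption rather than as Ampere's or Samsung's procedure, is to fill each drive sequentially more than once before measuring. The sketch below only builds the fio command line for such a pass and does not run it; the device path, the number of loops, and the function name are all illustrative.

```python
def precondition_command(device: str, loops: int = 2) -> list:
    """Build (but do not run) an fio command that sequentially fills `device`.

    WARNING: running the resulting command overwrites all data on the device.
    The 128K block size and two loops are assumptions, not values from the guide.
    """
    return [
        "fio",
        "--name=precondition",
        f"--filename={device}",
        "--rw=write",
        "--bs=128K",
        "--direct=1",
        "--ioengine=libaio",
        "--iodepth=64",
        f"--loops={loops}",
    ]
```

For instance, `" ".join(precondition_command("/dev/nvme0n1"))` yields a command string that can be reviewed before it is executed on a drive that holds no data you need.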

Random Read/Write Configuration for 24 Samsung PM1733a SSDs

Random Read


[global]
name=random
rw=randread
bs=4K
direct=1
numjobs=16
runtime=600
ioengine=libaio
iodepth=64
norandommap
group_reporting
randrepeat=1
random_generator=tausworthe64
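The job files in this section show only the [global] section; fio also needs at least one job section that names a target. The following sketch renders a complete job file with one section per drive, where each section inherits the global options and overrides only the target and its NUMA binding. The device paths (`/dev/nvme0n1`, `/dev/nvme9n1`), the drive-to-node mapping, and the helper names are illustrative assumptions, not values taken from the tested system.

```python
GLOBAL_RANDREAD = {
    "name": "random",
    "rw": "randread",
    "bs": "4K",
    "direct": "1",
    "numjobs": "16",
    "runtime": "600",
    "ioengine": "libaio",
    "iodepth": "64",
    "norandommap": None,  # flag-style option, written without a value
    "group_reporting": None,
    "randrepeat": "1",
    "random_generator": "tausworthe64",
}


def render_section(title: str, options: dict) -> str:
    """Render one job-file section; options with a None value become bare flags."""
    lines = [f"[{title}]"]
    for key, value in options.items():
        lines.append(key if value is None else f"{key}={value}")
    return "\n".join(lines)


def render_job_file(global_options: dict, drives: dict) -> str:
    """Render a job file; `drives` maps a device path to its NUMA node."""
    sections = [render_section("global", global_options)]
    for device, node in drives.items():
        job = {
            "filename": device,
            "numa_mem_policy": f"bind:{node}",
            "numa_cpu_nodes": str(node),
        }
        # Use the device basename (for example "nvme0n1") as the section title.
        sections.append(render_section(device.rsplit("/", 1)[-1], job))
    return "\n\n".join(sections) + "\n"
```

A call such as `render_job_file(GLOBAL_RANDREAD, {"/dev/nvme0n1": 0, "/dev/nvme9n1": 1})` returns text that can be saved to a file and passed to fio. Keeping the shared settings in [global] means the per-drive sections stay short and the workload definition lives in one place.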

Random Write


[global]
name=randomwrite
rw=randwrite
bs=4K
direct=1
numjobs=16
ramp_time=20
runtime=600
ioengine=libaio
iodepth=64
norandommap
group_reporting
randrepeat=1
random_generator=tausworthe64

Sequential Read/Write Configuration for 24 Samsung PM1733a SSDs

Sequential Read

[global]
name=sequence
rw=read
bs=128K
direct=1
numjobs=4
runtime=600
ioengine=libaio
iodepth=64
norandommap
group_reporting
randrepeat=1
random_generator=tausworthe64

Sequential Write

[global]
name=sequence
rw=write
bs=128K
direct=1
numjobs=4
runtime=600
ioengine=libaio
iodepth=64
norandommap
group_reporting
randrepeat=1
random_generator=tausworthe64

Samsung PM1733a FIO Performance on Ampere Altra

[Figure: Random Read/Write (MBps), bs=4k, jobs=16, iodepth=64, 10min]
[Figure: Sequential Read/Write (MBps), bs=128k, jobs=4, iodepth=64, 10min]
[Figure: CPU Utilization (%)]

Samsung PM1733a FIO Performance on Ampere Altra Max

[Figure: Random Read/Write (kIOPS), Altra Max]
[Figure: Sequential Read/Write (MBps), Altra Max]
[Figure: CPU Utilization (%)]

Samsung PM1733a FIO Performance on Ampere Altra vs. Altra Max

[Figure: Random Read (kIOPS), Altra vs. Altra Max]
[Figure: CPU Utilization of Random Read, Altra vs. Altra Max]
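The chart titles above use two units, kIOPS for some random I/O results and MBps for others. When comparing them, throughput is simply the operation rate multiplied by the block size. The helper below is a small sketch of that arithmetic; it assumes 1 MB = 10^6 bytes and a 4K block size of 4096 bytes, and the function name is hypothetical.

```python
def kiops_to_mbps(kiops: float, block_size_bytes: int) -> float:
    """Convert thousands of I/O operations per second to MB/s (1 MB = 10**6 bytes)."""
    return kiops * 1000 * block_size_bytes / 1_000_000
```

Under these assumptions, 1000 kIOPS at a 4096-byte block size corresponds to 4096 MB/s.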
Created At : April 18th 2023, 11:02:13 am
Last Updated At : July 31st 2023, 5:08:30 pm
© 2023 Ampere Computing LLC. All rights reserved. Ampere, Altra and the A and Ampere logos are registered trademarks or trademarks of Ampere Computing.