Cassandra tests were performed on bare-metal, single-socket servers with equivalent memory, networking, and storage configurations for each of the platforms shown. The processors tested were: AMD EPYC 7763 "Milan"; Intel Xeon 8380 "Ice Lake"; Ampere Altra Q80-30; Ampere Altra Max M128-30.
The tests used cassandra-stress, the load generator that ships with Cassandra, to benchmark Cassandra. Each test was configured to run for 3 minutes with multiple threads and multiple clients.
It is recommended to compile Cassandra with JDK 15 or newer (with the JDK itself built using GCC 10.2 and suitable compiler flags), because newer Java versions have made significant progress in generating optimized code, which can improve performance for AArch64 applications.
G1GC was used as the Java garbage collector, with appropriate memory and thread settings for the JVM. Cassandra data was stored on an NVMe drive, while the commit log was stored on tmpfs.
CentOS 8.4 (kernel 4.18) and Cassandra 4.0.1 were used. For each test, a similar number of clients was used to generate requests to Cassandra.
Because throughput is most realistically measured under a specified Service Level Agreement (SLA), a 99th percentile latency (p.99) limit of 10 milliseconds was used. This ensured that 99 percent of the requests had a response time of at most 10 ms.
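To make the SLA check concrete, the following is a minimal sketch of how a p.99 latency could be computed from a list of per-request latencies and compared against the 10 ms limit. It assumes the nearest-rank percentile definition; the function names are hypothetical, and the tooling used in these tests may compute percentiles differently (for example, from histograms).

```python
def percentile_nearest_rank(samples_ms, pct):
    """Return the pct-th percentile (pct is an integer 1..100) by nearest rank."""
    if not samples_ms:
        raise ValueError("no latency samples")
    ordered = sorted(samples_ms)
    # Integer ceiling of pct * n / 100, avoiding floating-point rounding.
    rank = (pct * len(ordered) + 99) // 100
    return ordered[max(rank, 1) - 1]


def meets_sla(samples_ms, limit_ms=10.0, pct=99):
    """True if the chosen percentile latency is at most limit_ms."""
    return percentile_nearest_rank(samples_ms, pct) <= limit_ms
```

With 100 samples, the nearest-rank p.99 is the 99th smallest value, so a single slow outlier does not violate the SLA but two do.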
Each test ran for 3 minutes, with warmup, using a mix of 90% writes and 10% reads. This is a key use case for Cassandra, which is optimized for write operations. Initially, an appropriate number of clients and threads was used to load one instance of Cassandra, while ensuring the p.99 latency was at most 10 ms.
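As an illustration of the workload described above, the sketch below builds a cassandra-stress command line for a 3-minute mixed run at a 9:1 write-to-read ratio. The node address and thread count are hypothetical placeholders, and the exact options used in these tests are not stated in this document, so treat this as one plausible invocation rather than the actual configuration.

```python
def build_stress_command(node, threads, duration="3m", write_parts=9, read_parts=1):
    """Build an argv list for a mixed cassandra-stress run.

    node and threads are caller-supplied assumptions; the document does not
    specify the values used in the tests.
    """
    return [
        "cassandra-stress",
        "mixed",
        f"ratio(write={write_parts},read={read_parts})",  # 9:1 is 90% write, 10% read
        f"duration={duration}",
        "-rate", f"threads={threads}",
        "-node", node,
    ]
```

Passing the list directly to `subprocess.run` avoids the shell escaping that the parentheses in the `ratio(...)` argument would otherwise need.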
Next, the number of Cassandra instances was successively increased until one or more instances violated the p.99 latency SLA. The aggregate throughput of all instances was used as the primary performance metric. The test was run three times, and minimal run-to-run variation was observed.
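The scaling procedure can be sketched as a simple loop. The sketch assumes a hypothetical callable, `run_instances(count)`, that launches `count` Cassandra instances, drives the load, and returns one `(throughput_ops_per_sec, p99_ms)` pair per instance. It also assumes the reported figure comes from the largest instance count at which every instance still met the SLA; the document does not state this explicitly.

```python
def find_peak_throughput(run_instances, sla_ms=10.0, max_instances=64):
    """Increase the instance count until any instance violates the p.99 SLA.

    run_instances is a hypothetical benchmark driver supplied by the caller.
    Returns (instance_count, aggregate_throughput) for the last passing step.
    """
    best_count, best_throughput = 0, 0
    for count in range(1, max_instances + 1):
        results = run_instances(count)
        # Stop as soon as one or more instances exceed the latency limit.
        if any(p99_ms > sla_ms for _, p99_ms in results):
            break
        best_count = count
        best_throughput = sum(ops for ops, _ in results)
    return best_count, best_throughput
```

Repeating the whole search several times, as was done here with three runs, gives a check on run-to-run variation.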