Synthetic Benchmarks¶
CPU Benchmarks¶
- `front` node: Intel Core i9-13900H CPU (see its topology in the Description section)
- `az4-n4090` and `az4-a7900` partitions: AMD Ryzen 9 7945HX CPU (see its topology in the Description section)
- `iml-ia770` partition: Intel Core Ultra 9 185H CPU (see its topology in the Description section)
- `az5-a890m` partition: AMD Ryzen AI 9 HX 370 CPU
Memory Throughput¶
Memory throughput is measured with the `bandwidth` benchmark. Each thread is explicitly pinned to its own CPU core, with only one thread per core, even if the core has more than one PU (hardware thread).
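The pinning policy above (one thread per physical core, whatever the number of PUs) can be sketched in Python on Linux. This is a hypothetical illustration, not the `bandwidth` benchmark itself: it assumes the standard sysfs topology files (`/sys/devices/system/cpu/cpuN/topology/physical_package_id` and `core_id`), keeps the lowest-numbered PU of each core, pins the calling thread with `os.sched_setaffinity`, and times a plain buffer copy. The names `one_pu_per_core` and `copy_throughput_gbs` are made up for this sketch.

```python
import glob
import os
import re
import time


def one_pu_per_core() -> list[int]:
    """Return one PU (logical CPU) per physical core, among the PUs we may use.

    Assumes the Linux sysfs layout /sys/devices/system/cpu/cpuN/topology/.
    """
    allowed = os.sched_getaffinity(0)
    first_pu: dict[tuple[int, int], int] = {}
    for path in glob.glob("/sys/devices/system/cpu/cpu[0-9]*"):
        match = re.search(r"cpu(\d+)$", path)
        if match is None:
            continue
        pu = int(match.group(1))
        if pu not in allowed:
            continue
        try:
            with open(os.path.join(path, "topology", "physical_package_id")) as f:
                package = int(f.read())
            with open(os.path.join(path, "topology", "core_id")) as f:
                core = int(f.read())
        except (OSError, ValueError):
            # No topology information: treat this PU as a core of its own.
            package, core = -1, pu
        key = (package, core)
        # Keep the lowest-numbered PU of each (package, core) pair.
        first_pu[key] = min(pu, first_pu.get(key, pu))
    # Fall back to every allowed PU if sysfs gave us nothing usable.
    return sorted(first_pu.values()) or sorted(allowed)


def copy_throughput_gbs(pu: int, size_mib: int = 64, repeats: int = 5) -> float:
    """Pin the calling thread to `pu`, then time a buffer copy (crude estimate)."""
    os.sched_setaffinity(0, {pu})  # on Linux, pid 0 targets the calling thread
    src = bytearray(size_mib * 1024 * 1024)
    dst = bytearray(len(src))
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        dst[:] = src  # same-size slice assignment copies the whole buffer
        best = min(best, time.perf_counter() - start)
    # Count both the bytes read from src and the bytes written to dst.
    return 2 * len(src) / best / 1e9
```

Calling `copy_throughput_gbs(pu)` for each `pu` in `one_pu_per_core()` mimics the one-thread-per-core policy; an interpreted copy loop like this one only gives a rough order of magnitude compared with a dedicated benchmark.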
AMD Ryzen 9 7945HX¶
`az4-mixed` RAM: Corsair Vengeance SO-DIMM 96 GB (2 x 48 GB, dual channel) DDR5 5200 MT/s CL44.
Intel Core i9-13900H¶
`front` node RAM: Corsair Vengeance SO-DIMM 96 GB (2 x 48 GB, dual channel) DDR5 5200 MT/s CL44.
Intel Core Ultra 9 185H¶
`iml-ia770` RAM: 32 GB (2 x 16 GB, dual channel) DDR5 5600 MT/s.
AMD Ryzen AI 9 HX 370¶
`az5-a890m` node RAM: 32 GB (4 x 8 GB, quad channel) LPDDR5X 7500 MT/s.
Throughput Comparison between the different Core Types¶
In the following graphics, `c`, `p`, `e` and `LPe` mean "core", "p-core", "e-core" and "LP e-core", respectively. They characterize the type of cores used in the benchmark. Each abbreviation is preceded by the number of cores involved in the benchmark. For instance, `4e` means that 4 e-cores have been used.
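The label convention can be made explicit with a small parser. This is a hypothetical helper (`parse_core_label` is not part of any benchmark) that assumes a label is always a decimal core count followed by exactly one of `c`, `p`, `e` or `LPe`.

```python
import re

# Abbreviation -> core type, as defined in the graph legend.
CORE_TYPES = {"c": "core", "p": "p-core", "e": "e-core", "LPe": "LP e-core"}

# A decimal count followed by one abbreviation; "LPe" is tried before "e".
_LABEL = re.compile(r"(\d+)(LPe|c|p|e)")


def parse_core_label(label: str) -> tuple[int, str]:
    """Split a graph label such as "4e" into (number of cores, core type)."""
    match = _LABEL.fullmatch(label)
    if match is None:
        raise ValueError(f"unrecognized core label: {label!r}")
    return int(match.group(1)), CORE_TYPES[match.group(2)]
```

For example, `parse_core_label("4e")` returns `(4, "e-core")` and `parse_core_label("2LPe")` returns `(2, "LP e-core")`.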
For all core types, the L1d cache is private to each core.
The configuration of the L2 cache depends on the core type:
- Ryzen 9 7945HX cores, Core i9-13900H p-cores, Core Ultra 9 185H p-cores and Ryzen AI 9 HX 370 p-/e-cores: each core has its own L2 cache
- Core i9-13900H and Core Ultra 9 185H e-cores: each L2 cache is shared by a cluster of 4 cores
- Core Ultra 9 185H LP e-cores: the L2 cache is shared by 2 cores
The configuration of the L3 cache depends on the core type:
- Ryzen 9 7945HX cores: each L3 cache is shared by 8 cores
- Core i9-13900H p-cores, Core Ultra 9 185H p-cores and Core Ultra 9 185H e-cores: a single L3 cache is shared by all of these cores
- Core Ultra 9 185H LP e-cores: no L3 cache
- Ryzen AI 9 HX 370 p-cores: an L3 cache is shared by 4 cores
- Ryzen AI 9 HX 370 e-cores: an L3 cache is shared by 8 cores
The RAM throughput mainly depends on the memory modules:
- Ryzen 9 7945HX and Core i9-13900H cores: Corsair Vengeance SO-DIMM 96 GB (2 x 48 GB, dual channel) DDR5 5200 MT/s CL44
- Core Ultra 9 185H: 32 GB (2 x 16 GB, dual channel) DDR5 5600 MT/s
- Ryzen AI 9 HX 370: 32 GB (4 x 8 GB, quad channel) LPDDR5X 7500 MT/s
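As a point of reference for the measured RAM throughput, the theoretical peak bandwidth of a memory configuration is its transfer rate multiplied by its data-bus width in bytes. The sketch below assumes that each DDR5 SO-DIMM provides a 64-bit data path, so a dual-channel setup is 128 bits wide; it leaves out the LPDDR5X configuration because its bus width is not stated here. `peak_bandwidth_gbs` is a hypothetical helper.

```python
def peak_bandwidth_gbs(transfers_mt_s: float, bus_width_bits: int) -> float:
    """Theoretical peak bandwidth in GB/s (10^9 bytes per second)."""
    bytes_per_transfer = bus_width_bits / 8
    return transfers_mt_s * 1e6 * bytes_per_transfer / 1e9


# Assumption: 2 x 64-bit DDR5 SO-DIMMs in dual channel = 128-bit data bus.
DUAL_CHANNEL_DDR5_BITS = 2 * 64

ddr5_5200 = peak_bandwidth_gbs(5200, DUAL_CHANNEL_DDR5_BITS)  # Ryzen 9 7945HX, Core i9-13900H
ddr5_5600 = peak_bandwidth_gbs(5600, DUAL_CHANNEL_DDR5_BITS)  # Core Ultra 9 185H
print(f"DDR5 5200: {ddr5_5200:.1f} GB/s, DDR5 5600: {ddr5_5600:.1f} GB/s")
```

Under these assumptions, the script prints 83.2 GB/s for DDR5 5200 MT/s and 89.6 GB/s for DDR5 5600 MT/s.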
Raw Data¶
Benchmark | # Cores | AMD Ryzen 9 7945HX | Intel Core i9-13900H - p-cores | Intel Core i9-13900H - e-cores | Intel Core Ultra 9 185H - p-cores | Intel Core Ultra 9 185H - e-cores | Intel Core Ultra 9 185H - LP e-cores | AMD Ryzen AI 9 HX 370 - p-cores (Zen5) | AMD Ryzen AI 9 HX 370 - e-cores (Zen5c) |
---|---|---|---|---|---|---|---|---|---|
bandwidth | 1 | download | download | download | download | download | download | download | download |
bandwidth | 2 | download | download | download | download | download | download | download | download |
bandwidth | 3 | download | download | download | download | download | - | download | download |
bandwidth | 4 | download | download | download | download | download | - | download | download |
bandwidth | 5 | download | download | download | download | download | - | - | download |
bandwidth | 6 | download | download | download | download | download | - | - | download |
bandwidth | 7 | download | - | download | - | download | - | - | download |
bandwidth | 8 | download | - | download | - | download | - | - | download |
bandwidth | 9 | download | - | - | - | - | - | - | - |
bandwidth | 10 | download | - | - | - | - | - | - | - |
bandwidth | 11 | download | - | - | - | - | - | - | - |
bandwidth | 12 | download | - | - | - | - | - | - | - |
bandwidth | 13 | download | - | - | - | - | - | - | - |
bandwidth | 14 | download | - | - | - | - | - | - | - |
bandwidth | 15 | download | - | - | - | - | - | - | - |
bandwidth | 16 | download | - | - | - | - | - | - | - |
Peak Performance¶
The `cpufp` benchmark is used. The CPU peak performance is measured for the following operations:
- FMA - Fused Multiply-Add: computes \(d = a \times b + c\) on 64-bit or 32-bit floating-point numbers (referred to as f64 and f32 here)
- DPA2 - computes the dot product of two 2-element vectors of 16-bit integers (i16) and accumulates the result into a 32-bit integer (i32): \(c^{i32} = c^{i32} + \sum_{s=1}^{2} a_s^{i16} \times b_s^{i16}\)
- DPA4 - computes the dot product of two 4-element vectors of 8-bit integers (i8) and accumulates the result into a 32-bit integer (i32): \(c^{i32} = c^{i32} + \sum_{s=1}^{4} a_s^{i8} \times b_s^{i8}\)
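The integer operations above can be written as scalar reference functions, which is handy for checking what a single DPA2 or DPA4 computes. This Python sketch assumes that the 32-bit accumulator wraps around on overflow (two's complement), which the text does not specify; the hardware instructions exercised by `cpufp` may behave differently (for example, by saturating). The function names are hypothetical.

```python
def _check_range(values: list[int], bits: int) -> None:
    """Raise if a value does not fit in a signed integer of `bits` bits."""
    low, high = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    for v in values:
        if not low <= v <= high:
            raise ValueError(f"{v} does not fit in i{bits}")


def wrap_i32(x: int) -> int:
    """Reduce an unbounded Python int to a two's-complement 32-bit value."""
    return ((x + (1 << 31)) % (1 << 32)) - (1 << 31)


def _dpa(c: int, a: list[int], b: list[int], lanes: int, in_bits: int) -> int:
    if len(a) != lanes or len(b) != lanes:
        raise ValueError(f"expected {lanes} elements in a and b")
    _check_range(a + b, in_bits)
    _check_range([c], 32)
    # c = c + sum_s a_s * b_s, with the result kept in 32 bits (assumed wrap-around).
    return wrap_i32(c + sum(x * y for x, y in zip(a, b)))


def dpa2(c: int, a: list[int], b: list[int]) -> int:
    """i32 accumulator plus the dot product of two 2-element i16 vectors."""
    return _dpa(c, a, b, lanes=2, in_bits=16)


def dpa4(c: int, a: list[int], b: list[int]) -> int:
    """i32 accumulator plus the dot product of two 4-element i8 vectors."""
    return _dpa(c, a, b, lanes=4, in_bits=8)
```

For example, `dpa4(0, [1, 2, 3, 4], [5, 6, 7, 8])` returns 70. The i32 accumulator can overflow even from zero: `dpa2(0, [-32768, -32768], [-32768, -32768])` sums two products of \(2^{30}\), giving \(2^{31}\), which wraps to -2147483648 under the assumption above.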
In the following graphics, `c`, `p`, `e` and `LPe` mean "core", "p-core", "e-core" and "LP e-core", respectively. They characterize the type of cores used in the benchmark. Each abbreviation is preceded by the number of cores involved in the benchmark. For instance, `4e` means that 4 e-cores have been used.
Raw Data¶
Benchmark | AMD Ryzen 9 7945HX | Intel Core i9-13900H - p-cores | Intel Core i9-13900H - e-cores | Intel Core Ultra 9 185H - p-cores | Intel Core Ultra 9 185H - e-cores | Intel Core Ultra 9 185H - LP e-cores |
---|---|---|---|---|---|---|
cpufp | download | download | download | download | download | download |
GPU Benchmarks¶
- `front` node: Intel Iris Xe iGPU
- `az4-n4090` partition: AMD Radeon 610M iGPU and Nvidia GeForce RTX 4090 dGPU
- `az4-a7900` partition: AMD Radeon 610M iGPU and AMD Radeon RX 7900 XTX dGPU
- `iml-ia770` partition: Intel Arc Mobile iGPU and Intel Arc 770 eGPU
- `az5-a890m` partition: AMD Radeon 890M iGPU
Memory Throughput¶
Measurement of the memory throughput between the GPU and its global memory (VRAM) with the `clpeak` benchmark. For an iGPU, the global memory is the system RAM, which is shared with the CPU.
Peak Performance¶
Measurement of the GPU peak performance with the `clpeak` benchmark, an OpenCL benchmark that runs compute-intensive kernels to estimate peak performance.
Warning
The Nvidia GeForce RTX 4090 supports the float16 format in hardware, but the Nvidia OpenCL driver does not expose this support. This is why the PoCL driver has also been installed.
Kernel Launch Latency¶
The kernel launch latency is the time elapsed between the moment the CPU (from user space) orders the execution of a kernel and the moment the kernel starts executing on the GPU, excluding data buffer transfers. The `clpeak` benchmark is used.
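OpenCL exposes per-command timestamps, in nanoseconds, through event profiling, notably `CL_PROFILING_COMMAND_QUEUED` and `CL_PROFILING_COMMAND_START`. Assuming the launch latency is taken as START minus QUEUED (clpeak's exact method may differ), the post-processing of such samples can be sketched as follows. `launch_latency_us` is a hypothetical helper that works on already-collected timestamp pairs, so no OpenCL runtime is needed to run it; it also applies a basic sanity check that rejects samples where START precedes QUEUED.

```python
from statistics import median


def launch_latency_us(samples: list[tuple[int, int]]) -> float:
    """Median kernel launch latency in microseconds.

    `samples` holds (queued_ns, start_ns) pairs, e.g. the values of
    CL_PROFILING_COMMAND_QUEUED and CL_PROFILING_COMMAND_START read from
    the profiling info of one event per kernel launch.
    """
    if not samples:
        raise ValueError("no samples")
    deltas = []
    for queued_ns, start_ns in samples:
        # Sanity check: a kernel cannot start before it was queued.
        if start_ns < queued_ns:
            raise ValueError(f"inconsistent timestamps: {queued_ns} > {start_ns}")
        deltas.append(start_ns - queued_ns)
    # The median is less sensitive than the mean to an occasional slow launch.
    return median(deltas) / 1000
```

For example, three launches with deltas of 5000, 7000 and 6000 ns give a median latency of 6.0 µs.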
The latency of the GeForce RTX 4090 with the PoCL driver is not reported in the previous histogram because it is abnormally high (3.6 ms).
Warning
Some OpenCL implementations do not report a correct kernel launch latency, which is why the Radeon 610M and the Radeon RX 7900 XTX do not appear in the previous histogram.
Raw Data¶
Benchmark | AMD Radeon 610M | Intel Iris Xe | Intel Arc Mobile | AMD Radeon 890M | Intel Arc 770 | AMD Radeon RX 7900 XTX | Nvidia GeForce RTX 4090 |
---|---|---|---|---|---|---|---|
clpeak | download | download | download [Xe][i915] | download | download [Xe][i915] | download | download |
SSD Benchmarks¶
- `front` node: 3x Samsung 990 PRO 4 TB SSD
- `az4-n4090` partition: 1x Samsung 990 PRO 4 TB SSD
- `az4-a7900` partition: 1x Samsung 990 PRO 2 TB SSD
- `iml-ia770` partition: 1x Kingston OM8PGP41024Q-A0 1 TB SSD
- `az5-a890m` partition: 1x Crucial P3 Plus CT1000P3PSSD8 1 TB SSD
Raw Data¶
Benchmark | Samsung 990 PRO 4 TB (`front`) | Samsung 990 PRO 4 TB (`az4-n4090`) | Samsung 990 PRO 2 TB | Kingston OM8PGP41024Q-A0 1 TB | Crucial P3 Plus CT1000P3PSSD8 1 TB |
---|---|---|---|---|---|
dd | download | download | download | download | download |
iozone | download | download | download | download | download |