Synthetic Benchmarks¶
CPU Benchmarks¶
- front node: Intel Core i9-13900H CPU (see its topology in the Description section)
- az4-n4090 and az4-a7900 partitions: AMD Ryzen 9 7945HX CPU (see its topology in the Description section)
- iml-ia770 partition: Intel Core Ultra 9 185H CPU (see its topology in the Description section)
- az5-a890m partition: AMD Ryzen AI 9 HX 370 CPU
Memory Throughput¶
Memory throughput is measured with the bandwidth benchmark. In every run, each
thread is explicitly pinned to its own CPU core, so at most one thread runs per
core even when the core exposes more than one PU (processing unit).
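The pinning described above can be sketched as follows on Linux. This is only an illustration and not the code of the bandwidth benchmark: the helper name `pin_to_cpu` is hypothetical, and the choice of the logical CPU (one PU per physical core) is assumed to be made beforehand, for instance from the node topology.

```python
import os

def pin_to_cpu(cpu_id: int) -> set:
    """Pin the caller to a single logical CPU (Linux only) and return the new affinity mask."""
    # A pid of 0 designates the caller itself.
    os.sched_setaffinity(0, {cpu_id})
    return os.sched_getaffinity(0)

if __name__ == "__main__":
    allowed = sorted(os.sched_getaffinity(0))
    # Assumption: the first allowed logical CPU is a PU of the core to measure.
    print(pin_to_cpu(allowed[0]))
```

On a core with several PUs, pinning to exactly one of its logical CPUs keeps the sibling PU idle, which matches the "one thread per core" policy stated above.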
AMD Ryzen 9 7945HX¶
az4-mixed RAM is: Corsair Vengeance SO-DIMM 96 GB (2 x 48 GB, dual channel) DDR5 5200 MT/s CL44.
Intel Core i9-13900H¶
front node RAM is: Corsair Vengeance SO-DIMM 96 GB (2 x 48 GB, dual channel) DDR5 5200 MT/s CL44.
Intel Core Ultra 9 185H¶
iml-ia770 RAM is: 32 GB (2 x 16 GB, dual channel) DDR5 5600 MT/s.
AMD Ryzen AI 9 HX 370¶
az5-890m node RAM is: 4 x 8 GB, quad channel LPDDR5x 7500 MT/s.
Throughput Comparison between the different Core Types¶
In the following graphics, c, p, e and LPe stand for "core", "p-core",
"e-core" and "LP e-core", respectively. They identify the type of cores used in
the benchmark. The number placed just before each abbreviation is the number of
cores involved in the benchmark. For instance, 4e means that 4 e-cores have
been used.
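As an illustration of this labelling convention, a label can be split into its core count and core type with a small parser. This is a minimal sketch: the function name `parse_label` and the `CORE_TYPES` table are hypothetical and only mirror the abbreviations listed above.

```python
import re

# Abbreviations used in the graphics, as listed in the text above.
CORE_TYPES = {"c": "core", "p": "p-core", "e": "e-core", "LPe": "LP e-core"}

# A label is a core count immediately followed by one abbreviation.
_LABEL = re.compile(r"(\d+)(LPe|c|p|e)")

def parse_label(label: str) -> tuple:
    """Return (number of cores, core type) for a label such as '4e' or '2LPe'."""
    match = _LABEL.fullmatch(label)
    if match is None:
        raise ValueError(f"not a valid core label: {label!r}")
    return int(match.group(1)), CORE_TYPES[match.group(2)]

print(parse_label("4e"))
print(parse_label("2LPe"))
```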
For all the core types, the L1d cache is private to each core.
The configuration of the L2 cache depends on the core type:
- Ryzen 9 7945HX cores, Core i9-13900H p-cores, Core Ultra 9 185H p-cores and Ryzen AI 9 HX 370 p-/e-cores: an L2 cache is dedicated to each core
- Core i9-13900H and Core Ultra 9 185H e-cores: 4 cores share one L2 cache
- Core Ultra 9 185H LP e-cores: 2 cores share one L2 cache
The configuration of the L3 cache depends on the core type:
- Ryzen 9 7945HX cores: an L3 cache is shared between 8 cores
- Core i9-13900H p-cores, Core Ultra 9 185H p-cores and Core Ultra 9 185H e-cores: the L3 cache is shared by all these cores
- Core Ultra 9 185H LP e-cores do not have an L3 cache
- Ryzen AI 9 HX 370 p-cores: an L3 cache is shared between 4 cores
- Ryzen AI 9 HX 370 e-cores: an L3 cache is shared between 8 cores
The RAM throughput mainly depends on the type of the memory modules:
- Ryzen 9 7945HX and Core i9-13900H cores: Corsair Vengeance SO-DIMM 96 GB (2 x 48 GB, dual channel) DDR5 5200 MT/s CL44
- Core Ultra 9 185H: 32 GB (2 x 16 GB, dual channel) DDR5 5600 MT/s
- Ryzen AI 9 HX 370: 4 x 8 GB, quad channel LPDDR5x 7500 MT/s
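To put the measured RAM throughput in perspective, the theoretical peak of a memory configuration can be estimated from its transfer rate and data bus width. The sketch below is an estimate under stated assumptions, not a measured result: the function name `peak_bandwidth_gbs` is hypothetical, and a dual-channel DDR5 setup is assumed to expose a 128-bit data bus in total (2 x 64 bits). The LPDDR5x configuration is left out because its channel width is not stated here.

```python
def peak_bandwidth_gbs(transfer_rate_mts: float, bus_width_bits: int) -> float:
    """Theoretical peak DRAM bandwidth in GB/s (1 GB = 1e9 bytes)."""
    bytes_per_transfer = bus_width_bits / 8
    return transfer_rate_mts * 1e6 * bytes_per_transfer / 1e9

# Assumption: dual channel DDR5 = 2 x 64-bit channels = 128-bit data bus.
print(peak_bandwidth_gbs(5200, 128))  # DDR5 5200 MT/s, dual channel
print(peak_bandwidth_gbs(5600, 128))  # DDR5 5600 MT/s, dual channel
```

The throughput reported by a benchmark is expected to stay below this bound, since the bound ignores refresh cycles, access latencies and controller overheads.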
Raw Data¶
| Benchmark | # Cores | AMD Ryzen 9 7945HX | Intel Core i9-13900H - p-cores | Intel Core i9-13900H - e-cores | Intel Core Ultra 9 185H - p-cores | Intel Core Ultra 9 185H - e-cores | Intel Core Ultra 9 185H - LP e-cores | AMD Ryzen AI 9 HX 370 - p-cores (Zen5) | AMD Ryzen AI 9 HX 370 - e-cores (Zen5c) | 
|---|---|---|---|---|---|---|---|---|---|
| bandwidth | 1 | download | download | download | download | download | download | download | download | 
| bandwidth | 2 | download | download | download | download | download | download | download | download | 
| bandwidth | 3 | download | download | download | download | download | - | download | download | 
| bandwidth | 4 | download | download | download | download | download | - | download | download | 
| bandwidth | 5 | download | download | download | download | download | - | - | download | 
| bandwidth | 6 | download | download | download | download | download | - | - | download | 
| bandwidth | 7 | download | - | download | - | download | - | - | download | 
| bandwidth | 8 | download | - | download | - | download | - | - | download | 
| bandwidth | 9 | download | - | - | - | - | - | - | - | 
| bandwidth | 10 | download | - | - | - | - | - | - | - | 
| bandwidth | 11 | download | - | - | - | - | - | - | - | 
| bandwidth | 12 | download | - | - | - | - | - | - | - | 
| bandwidth | 13 | download | - | - | - | - | - | - | - | 
| bandwidth | 14 | download | - | - | - | - | - | - | - | 
| bandwidth | 15 | download | - | - | - | - | - | - | - | 
| bandwidth | 16 | download | - | - | - | - | - | - | - | 
Peak Performance¶
The CPU peak performance is measured with the cpufp benchmark for the
following operations:
- FMA - Fused Multiply–Add, performs the following operation: \(d = a \times b + c\) on 64-bit or 32-bit floating-point numbers (referred to as f64 & f32 here)
- DPA2 - Performs the dot product of two vectors of two 16-bit integers (i16) and accumulates the result in a 32-bit integer (i32): \(c^{i32} = c^{i32} + \sum^2_{s = 1}{ a_s^{i16} \times b_s^{i16}}\)
- DPA4 - Performs the dot product of two vectors of four 8-bit integers (i8) and accumulates the result in a 32-bit integer (i32): \(c^{i32} = c^{i32} + \sum^4_{s = 1}{ a_s^{i8} \times b_s^{i8}}\)
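The semantics of the DPA2 and DPA4 operations above can be emulated on scalars as follows. This is a sketch for illustration, not the cpufp implementation: the function names are hypothetical, both operands are assumed to be signed, and the 32-bit accumulator is assumed to wrap around on overflow (actual instructions may differ in signedness or saturate instead).

```python
def wrap_i32(x: int) -> int:
    """Wrap a Python integer to a signed 32-bit value (two's complement)."""
    return (x + 2**31) % 2**32 - 2**31

def _dpa(c: int, a, b, lanes: int, bits: int) -> int:
    """c + sum(a[s] * b[s]) over `lanes` signed `bits`-bit elements, wrapped to i32."""
    lo, hi = -(2 ** (bits - 1)), 2 ** (bits - 1) - 1
    if len(a) != lanes or len(b) != lanes:
        raise ValueError(f"expected {lanes} elements per operand")
    if any(not lo <= v <= hi for v in (*a, *b)):
        raise ValueError(f"element out of range for i{bits}")
    return wrap_i32(c + sum(x * y for x, y in zip(a, b)))

def dpa2(c: int, a, b) -> int:
    """DPA2: dot product of two i16 pairs accumulated into an i32."""
    return _dpa(c, a, b, lanes=2, bits=16)

def dpa4(c: int, a, b) -> int:
    """DPA4: dot product of two i8 quadruples accumulated into an i32."""
    return _dpa(c, a, b, lanes=4, bits=8)

print(dpa2(10, (3, -4), (5, 6)))            # 10 + 3*5 + (-4)*6
print(dpa4(0, (1, 2, 3, 4), (5, 6, 7, 8)))  # 1*5 + 2*6 + 3*7 + 4*8
```

Each DPA2 counts as 2 multiplications and 2 additions, and each DPA4 as 4 of each, which is why these operations reach a higher peak in operations per second than FMA on the same vector width.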
In the following graphics, c, p, e and LPe stand for "core", "p-core",
"e-core" and "LP e-core", respectively. They identify the type of cores used in
the benchmark. The number placed just before each abbreviation is the number of
cores involved in the benchmark. For instance, 4e means that 4 e-cores have
been used.
Raw Data¶
| Benchmark | AMD Ryzen 9 7945HX | Intel Core i9-13900H - p-cores | Intel Core i9-13900H - e-cores | Intel Core Ultra 9 185H - p-cores | Intel Core Ultra 9 185H - e-cores | Intel Core Ultra 9 185H - LP e-cores | 
|---|---|---|---|---|---|---|
| cpufp | download | download | download | download | download | download | 
GPU Benchmarks¶
- front node: Intel Iris Xe iGPU
- az4-n4090 partition: AMD Radeon 610M iGPU and Nvidia GeForce RTX 4090 dGPU
- az4-a7900 partition: AMD Radeon 610M iGPU and AMD Radeon RX 7900 XTX dGPU
- iml-ia770 partition: Intel Arc Mobile iGPU and Intel Arc 770 eGPU
- az5-a890m partition: AMD Radeon 890M iGPU
Memory Throughput¶
The memory throughput between the GPU and its global memory (VRAM) is measured
with the clpeak benchmark. For an iGPU, the global memory is the system RAM,
which is shared with the CPU.
Peak Performance¶
The GPU peak performance is measured with the clpeak benchmark. It is an
OpenCL benchmark that executes a compute-intensive program to estimate the peak
performance.
Warning
The Nvidia GeForce RTX 4090 GPU supports the float16 format in hardware,
but this format is not supported by the Nvidia OpenCL driver. This is why the
PoCL driver has also been installed.
Kernel Launch Latency¶
The kernel launch latency is the time elapsed between the request to execute a
kernel, issued from the CPU user space, and the start of its execution on the
GPU (data buffer transfers excluded). The clpeak benchmark is used.
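The definition above can be turned into a small computation. In OpenCL, such timestamps can be read from a profiling event with `clGetEventProfilingInfo` (`CL_PROFILING_COMMAND_QUEUED` and `CL_PROFILING_COMMAND_START`, both in nanoseconds). Whether clpeak relies on exactly this pair of timestamps is an assumption here, and the function names and sample values below are made up for illustration.

```python
from statistics import mean

def launch_latency_us(queued_ns: int, start_ns: int) -> float:
    """Latency in microseconds between enqueueing a kernel and its start on the device."""
    if start_ns < queued_ns:
        raise ValueError("the kernel cannot start before it is queued")
    return (start_ns - queued_ns) / 1000.0

def mean_launch_latency_us(events) -> float:
    """Average latency over several (queued_ns, start_ns) pairs."""
    return mean(launch_latency_us(q, s) for q, s in events)

# Made-up (queued, start) timestamps in nanoseconds, for illustration only.
samples = [(1_000, 6_000), (20_000, 24_000), (50_000, 56_000)]
print(mean_launch_latency_us(samples))
```

Averaging over many launches smooths out the jitter of individual measurements. A driver that returns inconsistent profiling timestamps yields a meaningless latency with this method.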
The latency of the PoCL GeForce RTX 4090 is not reported in the previous histogram because it is abnormally high (3.6 ms).
Warning
Some OpenCL implementations do not report the correct kernel launch latency, which is why the Radeon 610M and the Radeon RX 7900 XTX do not appear in the previous histogram.
Raw Data¶
| Benchmark | AMD Radeon 610M | Intel Iris Xe | Intel Arc Mobile | AMD Radeon 890M | Intel Arc 770 | AMD Radeon RX 7900 XTX | Nvidia GeForce RTX 4090 | 
|---|---|---|---|---|---|---|---|
| clpeak | download | download | download [Xe][i915] | download | download [Xe][i915] | download | download | 
SSD Benchmarks¶
- front node: 3x Samsung 990 PRO 4 TB SSD
- az4-n4090 partition: 1x Samsung 990 PRO 4 TB SSD
- az4-a7900 partition: 1x Samsung 990 PRO 2 TB SSD
- iml-ia770 partition: 1x Kingston OM8PGP41024Q-A0 1 TB SSD
- az5-a890m partition: 1x Crucial P3 Plus CT1000P3PSSD8 1 TB SSD
Raw Data¶
| Benchmark | Samsung 990 PRO 4 TB (front) | Samsung 990 PRO 4 TB (az4-n4090) | Samsung 990 PRO 2 TB | Kingston OM8PGP41024Q-A0 1 TB | Crucial P3 Plus CT1000P3PSSD8 1 TB | 
|---|---|---|---|---|---|
| dd | download | download | download | download | download | 
| iozone | download | download | download | download | download |