./cpufp --thread_pool=[12]
Number Threads: 1
Thread Pool Binding: 12
--------------------------------------------------------------
| Instruction Set | Core Computation      | Peak Performance |
| AVX_VNNI        | DP4A(s32,u8,s8)       | 86.02 GOPS       |
| AVX_VNNI        | DP2A(s32,s16,s16)     | 43.01 GOPS       |
| FMA             | FMA(f32,f32,f32)      | 43.008 GFLOPS    |
| FMA             | FMA(f64,f64,f64)      | 21.505 GFLOPS    |
| AVX             | ADD(MUL(f32,f32),f32) | 21.505 GFLOPS    |
| AVX             | ADD(MUL(f64,f64),f64) | 10.752 GFLOPS    |
| SSE             | ADD(MUL(f32,f32),f32) | 21.503 GFLOPS    |
| SSE2            | ADD(MUL(f64,f64),f64) | 10.752 GFLOPS    |
--------------------------------------------------------------

./cpufp --thread_pool=[12-19]
Number Threads: 8
Thread Pool Binding: 12 13 14 15 16 17 18 19
--------------------------------------------------------------
| Instruction Set | Core Computation      | Peak Performance |
| AVX_VNNI        | DP4A(s32,u8,s8)       | 688.23 GOPS      |
| AVX_VNNI        | DP2A(s32,s16,s16)     | 344.11 GOPS      |
| FMA             | FMA(f32,f32,f32)      | 344.09 GFLOPS    |
| FMA             | FMA(f64,f64,f64)      | 172.05 GFLOPS    |
| AVX             | ADD(MUL(f32,f32),f32) | 172.05 GFLOPS    |
| AVX             | ADD(MUL(f64,f64),f64) | 86.023 GFLOPS    |
| SSE             | ADD(MUL(f32,f32),f32) | 172.03 GFLOPS    |
| SSE2            | ADD(MUL(f64,f64),f64) | 86.016 GFLOPS    |
--------------------------------------------------------------
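
The two runs above can be compared to see how well the peak figures scale from one thread to eight. The following is a minimal sketch of that arithmetic, not part of cpufp itself: the numbers are copied by hand from the output above, and the names `single`, `multi`, and `scaling` are illustrative choices made here.

```python
# Peak figures copied from the two cpufp runs above.
# Units are GOPS for the AVX_VNNI rows and GFLOPS for all others.
single = {
    "AVX_VNNI DP4A(s32,u8,s8)": 86.02,
    "AVX_VNNI DP2A(s32,s16,s16)": 43.01,
    "FMA FMA(f32,f32,f32)": 43.008,
    "FMA FMA(f64,f64,f64)": 21.505,
    "AVX ADD(MUL(f32,f32),f32)": 21.505,
    "AVX ADD(MUL(f64,f64),f64)": 10.752,
    "SSE ADD(MUL(f32,f32),f32)": 21.503,
    "SSE2 ADD(MUL(f64,f64),f64)": 10.752,
}
multi = {
    "AVX_VNNI DP4A(s32,u8,s8)": 688.23,
    "AVX_VNNI DP2A(s32,s16,s16)": 344.11,
    "FMA FMA(f32,f32,f32)": 344.09,
    "FMA FMA(f64,f64,f64)": 172.05,
    "AVX ADD(MUL(f32,f32),f32)": 172.05,
    "AVX ADD(MUL(f64,f64),f64)": 86.023,
    "SSE ADD(MUL(f32,f32),f32)": 172.03,
    "SSE2 ADD(MUL(f64,f64),f64)": 86.016,
}
THREADS = 8  # "Number Threads" reported by the second run


def scaling(single, multi, threads):
    """Return {kernel: (speedup, efficiency)} for each benchmarked kernel.

    speedup    = multi-thread peak / single-thread peak
    efficiency = speedup / thread count (1.0 means perfectly linear scaling)
    """
    return {
        name: (multi[name] / single[name], multi[name] / single[name] / threads)
        for name in single
    }


results = scaling(single, multi, THREADS)
for name, (speedup, efficiency) in results.items():
    print(f"{name:<28} {speedup:6.3f}x  {efficiency:7.2%}")
```

With these figures, every kernel comes out at a speedup within a fraction of a percent of 8, so the measured peak scales essentially linearly across the eight bound cores.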