./cpufp --thread_pool=[20]
Number Threads: 1
Thread Pool Binding: 20
--------------------------------------------------------------
| Instruction Set | Core Computation      | Peak Performance |
| AVX_VNNI        | DP4A(s32,u8,s8)       | 139.3 GOPS       |
| AVX_VNNI        | DP2A(s32,s16,s16)     | 69.222 GOPS      |
| FMA             | FMA(f32,f32,f32)      | 34.956 GFLOPS    |
| FMA             | FMA(f64,f64,f64)      | 17.595 GFLOPS    |
| AVX             | ADD(MUL(f32,f32),f32) | 17.464 GFLOPS    |
| AVX             | ADD(MUL(f64,f64),f64) | 8.7641 GFLOPS    |
| SSE             | ADD(MUL(f32,f32),f32) | 17.717 GFLOPS    |
| SSE2            | ADD(MUL(f64,f64),f64) | 8.6435 GFLOPS    |
--------------------------------------------------------------

./cpufp --thread_pool=[20,21]
Number Threads: 2
Thread Pool Binding: 20 21
--------------------------------------------------------------
| Instruction Set | Core Computation      | Peak Performance |
| AVX_VNNI        | DP4A(s32,u8,s8)       | 298.27 GOPS      |
| AVX_VNNI        | DP2A(s32,s16,s16)     | 149.24 GOPS      |
| FMA             | FMA(f32,f32,f32)      | 74.84 GFLOPS     |
| FMA             | FMA(f64,f64,f64)      | 37.259 GFLOPS    |
| AVX             | ADD(MUL(f32,f32),f32) | 37.011 GFLOPS    |
| AVX             | ADD(MUL(f64,f64),f64) | 18.686 GFLOPS    |
| SSE             | ADD(MUL(f32,f32),f32) | 37.204 GFLOPS    |
| SSE2            | ADD(MUL(f64,f64),f64) | 18.8 GFLOPS      |
--------------------------------------------------------------
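
To compare the one-thread and two-thread runs above, the peak figures can be divided row by row. The following is a minimal sketch, not part of cpufp: the dictionary names and the `scaling` helper are hypothetical, and the numbers are copied by hand from the output above. With these figures, every ratio comes out a little above 2x.

```python
# Peak throughput copied by hand from the two cpufp runs above.
# Units are GOPS for the AVX_VNNI rows and GFLOPS for the rest.
one_thread = {
    "AVX_VNNI DP4A(s32,u8,s8)": 139.3,
    "AVX_VNNI DP2A(s32,s16,s16)": 69.222,
    "FMA FMA(f32,f32,f32)": 34.956,
    "FMA FMA(f64,f64,f64)": 17.595,
    "AVX ADD(MUL(f32,f32),f32)": 17.464,
    "AVX ADD(MUL(f64,f64),f64)": 8.7641,
    "SSE ADD(MUL(f32,f32),f32)": 17.717,
    "SSE2 ADD(MUL(f64,f64),f64)": 8.6435,
}
two_threads = {
    "AVX_VNNI DP4A(s32,u8,s8)": 298.27,
    "AVX_VNNI DP2A(s32,s16,s16)": 149.24,
    "FMA FMA(f32,f32,f32)": 74.84,
    "FMA FMA(f64,f64,f64)": 37.259,
    "AVX ADD(MUL(f32,f32),f32)": 37.011,
    "AVX ADD(MUL(f64,f64),f64)": 18.686,
    "SSE ADD(MUL(f32,f32),f32)": 37.204,
    "SSE2 ADD(MUL(f64,f64),f64)": 18.8,
}


def scaling(single, dual):
    """Return the dual/single throughput ratio for each kernel in both runs."""
    return {name: dual[name] / single[name] for name in single if name in dual}


# Hypothetical helper output: one ratio per kernel.
ratios = scaling(one_thread, two_threads)
for name, ratio in ratios.items():
    print(f"{name:<28} {ratio:.2f}x")
```

The ratio only describes how these two particular runs relate; it says nothing about why the second run scales the way it does.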