./cpufp --thread_pool=[12]
Number Threads: 1
Thread Pool Binding: 12
--------------------------------------------------------------
| Instruction Set | Core Computation      | Peak Performance |
| AVX_VNNI        | DP4A(s32,u8,s8)       | 239.53 GOPS      |
| AVX_VNNI        | DP2A(s32,s16,s16)     | 120.52 GOPS      |
| FMA             | FMA(f32,f32,f32)      | 60.053 GFLOPS    |
| FMA             | FMA(f64,f64,f64)      | 30.138 GFLOPS    |
| AVX             | ADD(MUL(f32,f32),f32) | 30.094 GFLOPS    |
| AVX             | ADD(MUL(f64,f64),f64) | 14.952 GFLOPS    |
| SSE             | ADD(MUL(f32,f32),f32) | 30.213 GFLOPS    |
| SSE2            | ADD(MUL(f64,f64),f64) | 14.995 GFLOPS    |
--------------------------------------------------------------

./cpufp --thread_pool=[12-19]
Number Threads: 8
Thread Pool Binding: 12 13 14 15 16 17 18 19
--------------------------------------------------------------
| Instruction Set | Core Computation      | Peak Performance |
| AVX_VNNI        | DP4A(s32,u8,s8)       | 1.6827 TOPS      |
| AVX_VNNI        | DP2A(s32,s16,s16)     | 835.95 GOPS      |
| FMA             | FMA(f32,f32,f32)      | 420.67 GFLOPS    |
| FMA             | FMA(f64,f64,f64)      | 210.29 GFLOPS    |
| AVX             | ADD(MUL(f32,f32),f32) | 210.43 GFLOPS    |
| AVX             | ADD(MUL(f64,f64),f64) | 104.49 GFLOPS    |
| SSE             | ADD(MUL(f32,f32),f32) | 210.23 GFLOPS    |
| SSE2            | ADD(MUL(f64,f64),f64) | 105.21 GFLOPS    |
--------------------------------------------------------------
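
One way to read the two runs above side by side is to compute the 8-thread speedup over the single-thread figure for each kernel. The sketch below is not part of cpufp. The dictionary names and the `scaling` helper are made up for illustration, the numbers are copied by hand from the output above, and 1.6827 TOPS is written as 1682.7 GOPS so that both runs use the same unit.

```python
# Peak figures copied from the two cpufp runs above, in GOPS / GFLOPS.
# Keys are illustrative labels, not identifiers used by cpufp itself.
single = {
    "AVX_VNNI DP4A(s32,u8,s8)": 239.53,
    "AVX_VNNI DP2A(s32,s16,s16)": 120.52,
    "FMA f32": 60.053,
    "FMA f64": 30.138,
    "AVX f32": 30.094,
    "AVX f64": 14.952,
    "SSE f32": 30.213,
    "SSE2 f64": 14.995,
}
multi = {
    "AVX_VNNI DP4A(s32,u8,s8)": 1682.7,  # reported as 1.6827 TOPS
    "AVX_VNNI DP2A(s32,s16,s16)": 835.95,
    "FMA f32": 420.67,
    "FMA f64": 210.29,
    "AVX f32": 210.43,
    "AVX f64": 104.49,
    "SSE f32": 210.23,
    "SSE2 f64": 105.21,
}
THREADS = 8  # "Number Threads" of the second run


def scaling(single, multi, threads):
    """Return {kernel: (speedup, efficiency)}; efficiency = speedup / threads."""
    result = {}
    for kernel, one_thread in single.items():
        speedup = multi[kernel] / one_thread
        result[kernel] = (speedup, speedup / threads)
    return result


if __name__ == "__main__":
    for kernel, (speedup, eff) in scaling(single, multi, THREADS).items():
        print(f"{kernel:<28} {speedup:5.2f}x  {eff:6.1%}")
```

With these figures every kernel lands at roughly 7x on 8 threads. The output alone does not say why the scaling falls short of 8x, so treat the ratio as a descriptive number rather than a diagnosis.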