./cpufp --thread_pool=[0]
Number Threads: 1
Thread Pool Binding: 0
--------------------------------------------------------------
| Instruction Set | Core Computation      | Peak Performance |
| AVX_VNNI        | DP4A(s32,u8,s8)       | 484.3 GOPS       |
| AVX_VNNI        | DP2A(s32,s16,s16)     | 242.17 GOPS      |
| FMA             | FMA(f32,f32,f32)      | 121.08 GFLOPS    |
| FMA             | FMA(f64,f64,f64)      | 60.545 GFLOPS    |
| AVX             | ADD(MUL(f32,f32),f32) | 88.979 GFLOPS    |
| AVX             | ADD(MUL(f64,f64),f64) | 44.496 GFLOPS    |
| SSE             | ADD(MUL(f32,f32),f32) | 45.252 GFLOPS    |
| SSE2            | ADD(MUL(f64,f64),f64) | 22.638 GFLOPS    |
--------------------------------------------------------------

./cpufp --thread_pool=[0,2,4,6,8,10]
Number Threads: 6
Thread Pool Binding: 0 2 4 6 8 10
--------------------------------------------------------------
| Instruction Set | Core Computation      | Peak Performance |
| AVX_VNNI        | DP4A(s32,u8,s8)       | 2.906 TOPS       |
| AVX_VNNI        | DP2A(s32,s16,s16)     | 1.453 TOPS       |
| FMA             | FMA(f32,f32,f32)      | 726.46 GFLOPS    |
| FMA             | FMA(f64,f64,f64)      | 363.3 GFLOPS     |
| AVX             | ADD(MUL(f32,f32),f32) | 533.68 GFLOPS    |
| AVX             | ADD(MUL(f64,f64),f64) | 266.95 GFLOPS    |
| SSE             | ADD(MUL(f32,f32),f32) | 270.96 GFLOPS    |
| SSE2            | ADD(MUL(f64,f64),f64) | 135.84 GFLOPS    |
--------------------------------------------------------------
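
One way to read the two runs above together is to compare each 6-thread figure with its 1-thread counterpart. The following is a minimal sketch, not part of cpufp: the dictionaries simply copy the reported peaks (with TOPS converted to GOPS), and the helper name `scaling` is hypothetical.

```python
# Peak figures copied from the two runs above, in GOPS / GFLOPS.
# TOPS values are converted to GOPS (1 TOPS = 1000 GOPS).
single = {
    "AVX_VNNI DP4A(s32,u8,s8)": 484.3,
    "AVX_VNNI DP2A(s32,s16,s16)": 242.17,
    "FMA FMA(f32,f32,f32)": 121.08,
    "FMA FMA(f64,f64,f64)": 60.545,
    "AVX ADD(MUL(f32,f32),f32)": 88.979,
    "AVX ADD(MUL(f64,f64),f64)": 44.496,
    "SSE ADD(MUL(f32,f32),f32)": 45.252,
    "SSE2 ADD(MUL(f64,f64),f64)": 22.638,
}
six = {
    "AVX_VNNI DP4A(s32,u8,s8)": 2906.0,
    "AVX_VNNI DP2A(s32,s16,s16)": 1453.0,
    "FMA FMA(f32,f32,f32)": 726.46,
    "FMA FMA(f64,f64,f64)": 363.3,
    "AVX ADD(MUL(f32,f32),f32)": 533.68,
    "AVX ADD(MUL(f64,f64),f64)": 266.95,
    "SSE ADD(MUL(f32,f32),f32)": 270.96,
    "SSE2 ADD(MUL(f64,f64),f64)": 135.84,
}


def scaling(one_thread, many_threads, threads):
    """Hypothetical helper: map each kernel to (speedup, parallel efficiency)."""
    result = {}
    for kernel, base in one_thread.items():
        speedup = many_threads[kernel] / base
        result[kernel] = (speedup, speedup / threads)
    return result


for kernel, (speedup, efficiency) in scaling(single, six, 6).items():
    print(f"{kernel:<28} {speedup:5.2f}x  {efficiency:6.1%}")
```

On these numbers every kernel comes out between roughly 5.99x and 6.00x, so the 6-thread run scales almost linearly with the thread count.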