./cpufp --thread_pool=[0]
Number Threads: 1
Thread Pool Binding: 0
--------------------------------------------------------------
| Instruction Set | Core Computation      | Peak Performance |
| AVX_VNNI        | DP4A(s32,u8,s8)       | 610.96 GOPS      |
| AVX_VNNI        | DP2A(s32,s16,s16)     | 305.4 GOPS       |
| FMA             | FMA(f32,f32,f32)      | 152.74 GFLOPS    |
| FMA             | FMA(f64,f64,f64)      | 76.437 GFLOPS    |
| AVX             | ADD(MUL(f32,f32),f32) | 110.88 GFLOPS    |
| AVX             | ADD(MUL(f64,f64),f64) | 56.909 GFLOPS    |
| SSE             | ADD(MUL(f32,f32),f32) | 57.308 GFLOPS    |
| SSE2            | ADD(MUL(f64,f64),f64) | 28.489 GFLOPS    |
--------------------------------------------------------------

./cpufp --thread_pool=[0,1,3,6,8,10]
Number Threads: 6
Thread Pool Binding: 0 1 3 6 8 10
--------------------------------------------------------------
| Instruction Set | Core Computation      | Peak Performance |
| AVX_VNNI        | DP4A(s32,u8,s8)       | 3.4526 TOPS      |
| AVX_VNNI        | DP2A(s32,s16,s16)     | 1.7241 TOPS      |
| FMA             | FMA(f32,f32,f32)      | 860.4 GFLOPS     |
| FMA             | FMA(f64,f64,f64)      | 430.91 GFLOPS    |
| AVX             | ADD(MUL(f32,f32),f32) | 641.66 GFLOPS    |
| AVX             | ADD(MUL(f64,f64),f64) | 320.9 GFLOPS     |
| SSE             | ADD(MUL(f32,f32),f32) | 323.75 GFLOPS    |
| SSE2            | ADD(MUL(f64,f64),f64) | 160.74 GFLOPS    |
--------------------------------------------------------------
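
One way to read the two runs together is to divide each 6-thread figure by its 1-thread counterpart. The sketch below does that arithmetic on the numbers reported above. It is not part of cpufp: the names `SINGLE_THREAD`, `SIX_THREADS`, and `scaling` are hypothetical, and the TOPS values are converted to GOPS assuming 1 TOPS = 1000 GOPS.

```python
# Peak figures copied from the two cpufp runs above, in GOPS / GFLOPS.
# Keys are (instruction set, core computation); TOPS rows are multiplied by 1000.
SINGLE_THREAD = {
    ("AVX_VNNI", "DP4A(s32,u8,s8)"): 610.96,
    ("AVX_VNNI", "DP2A(s32,s16,s16)"): 305.4,
    ("FMA", "FMA(f32,f32,f32)"): 152.74,
    ("FMA", "FMA(f64,f64,f64)"): 76.437,
    ("AVX", "ADD(MUL(f32,f32),f32)"): 110.88,
    ("AVX", "ADD(MUL(f64,f64),f64)"): 56.909,
    ("SSE", "ADD(MUL(f32,f32),f32)"): 57.308,
    ("SSE2", "ADD(MUL(f64,f64),f64)"): 28.489,
}

SIX_THREADS = {
    ("AVX_VNNI", "DP4A(s32,u8,s8)"): 3452.6,
    ("AVX_VNNI", "DP2A(s32,s16,s16)"): 1724.1,
    ("FMA", "FMA(f32,f32,f32)"): 860.4,
    ("FMA", "FMA(f64,f64,f64)"): 430.91,
    ("AVX", "ADD(MUL(f32,f32),f32)"): 641.66,
    ("AVX", "ADD(MUL(f64,f64),f64)"): 320.9,
    ("SSE", "ADD(MUL(f32,f32),f32)"): 323.75,
    ("SSE2", "ADD(MUL(f64,f64),f64)"): 160.74,
}


def scaling(single, multi, threads):
    """Map each kernel to (speedup, parallel efficiency) for the given thread count."""
    result = {}
    for kernel, base in single.items():
        speedup = multi[kernel] / base
        result[kernel] = (speedup, speedup / threads)
    return result


if __name__ == "__main__":
    for (isa, op), (speedup, eff) in scaling(SINGLE_THREAD, SIX_THREADS, 6).items():
        print(f"{isa:<9} {op:<22} {speedup:5.2f}x  {eff:6.1%}")
```

On these figures every kernel speeds up by between about 5.6x and 5.8x on six threads, which is roughly 94 to 96 percent of ideal linear scaling.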