./clpeak

Platform: AMD Accelerated Parallel Processing
  Device: gfx1150
    Driver version  : 3635.0 (HSA1.1,LC) (Linux x64)
    Compute units   : 8
    Clock frequency : 2900 MHz

    Global memory bandwidth (GBPS)
      float   : 89.48
      float2  : 95.48
      float4  : 96.20
      float8  : 93.38
      float16 : 96.39

    Single-precision compute (GFLOPS)
      float   : 4199.26
      float2  : 4300.34
      float4  : 4285.72
      float8  : 4205.67
      float16 : 4236.92

    Half-precision compute (GFLOPS)
      half   : 4290.57
      half2  : 7295.99
      half4  : 7111.85
      half8  : 7103.91
      half16 : 6717.18

    Double-precision compute (GFLOPS)
      double   : 181.86
      double2  : 181.30
      double4  : 180.56
      double8  : 180.13
      double16 : 178.96

    Integer compute (GIOPS)
      int   : 1074.42
      int2  : 1008.43
      int4  : 1031.80
      int8  : 904.98
      int16 : 1079.03

    Integer compute Fast 24bit (GIOPS)
      int   : 3266.01
      int2  : 3493.31
      int4  : 3482.08
      int8  : 3331.33
      int16 : 3383.28

    Integer char (8bit) compute (GIOPS)
      char   : 3244.71
      char2  : 2195.23
      char4  : 2145.75
      char8  : 1948.54
      char16 : 1953.09

    Integer short (16bit) compute (GIOPS)
      short   : 3371.77
      short2  : 3419.20
      short4  : 3380.22
      short8  : 3337.97
      short16 : 3374.91

    Transfer bandwidth (GBPS)
      enqueueWriteBuffer              : 26.51
      enqueueReadBuffer               : 26.45
      enqueueWriteBuffer non-blocking : 26.52
      enqueueReadBuffer non-blocking  : 26.48
      enqueueMapBuffer(for read)      : 380085.59
        memcpy from mapped ptr        : 26.44
      enqueueUnmap(after write)       : 681740.81
        memcpy to mapped ptr          : 26.54

    Kernel launch latency : 5.83 us