Platform: AMD Accelerated Parallel Processing
  Device: gfx1036
    Driver version  : 3635.0 (HSA1.1,LC) (Linux x64)
    Compute units   : 1
    Clock frequency : 2200 MHz

    Global memory bandwidth (GBPS)
      float   : 37.70
      float2  : 47.62
      float4  : 49.18
      float8  : 43.33
      float16 : 35.86

    Single-precision compute (GFLOPS)
      float   : 512.54
      float2  : 516.92
      float4  : 520.29
      float8  : 509.35
      float16 : 502.36

    Half-precision compute (GFLOPS)
      half   : 510.31
      half2  : 1067.73
      half4  : 1069.36
      half8  : 1037.82
      half16 : 1048.27

    Double-precision compute (GFLOPS)
      double   : 34.19
      double2  : 34.10
      double4  : 34.13
      double8  : 33.89
      double16 : 33.54

    Integer compute (GIOPS)
      int   : 107.46
      int2  : 107.12
      int4  : 108.21
      int8  : 107.37
      int16 : 106.68

    Integer compute Fast 24bit (GIOPS)
      int   : 533.57
      int2  : 536.99
      int4  : 535.24
      int8  : 522.02
      int16 : 478.56

    Integer char (8bit) compute (GIOPS)
      char   : 541.26
      char2  : 288.72
      char4  : 278.60
      char8  : 256.80
      char16 : 254.43

    Integer short (16bit) compute (GIOPS)
      short   : 535.69
      short2  : 535.77
      short4  : 535.26
      short8  : 532.39
      short16 : 509.55

    Transfer bandwidth (GBPS)
      enqueueWriteBuffer              : 17.56
      enqueueReadBuffer               : 4.97
      enqueueWriteBuffer non-blocking : 17.72
      enqueueReadBuffer non-blocking  : 4.96
      enqueueMapBuffer(for read)      : 244032.22
        memcpy from mapped ptr        : 4.87
      enqueueUnmap(after write)       : 370255.78
        memcpy to mapped ptr          : 16.53

    Kernel launch latency : 1545461120.00 us