Platform: AMD Accelerated Parallel Processing
  Device: gfx1100
    Driver version  : 3635.0 (HSA1.1,LC) (Linux x64)
    Compute units   : 48
    Clock frequency : 2371 MHz

    Global memory bandwidth (GBPS)
      float   : 707.45
      float2  : 750.59
      float4  : 756.57
      float8  : 811.21
      float16 : 856.90

    Single-precision compute (GFLOPS)
      float   : 30893.80
      float2  : 33078.14
      float4  : 32527.02
      float8  : 33777.64
      float16 : 28855.14

    Half-precision compute (GFLOPS)
      half   : 31073.23
      half2  : 64274.54
      half4  : 63923.77
      half8  : 63553.30
      half16 : 63412.55

    Double-precision compute (GFLOPS)
      double   : 1141.44
      double2  : 1137.77
      double4  : 1134.46
      double8  : 1127.09
      double16 : 1119.44

    Integer compute (GIOPS)
      int   : 6710.68
      int2  : 6667.33
      int4  : 6689.00
      int8  : 6660.87
      int16 : 6622.93

    Integer compute Fast 24bit (GIOPS)
      int   : 29006.98
      int2  : 28720.87
      int4  : 28630.46
      int8  : 28535.36
      int16 : 24233.41

    Integer char (8bit) compute (GIOPS)
      char   : 31074.16
      char2  : 16723.87
      char4  : 16627.11
      char8  : 14027.69
      char16 : 13643.48

    Integer short (16bit) compute (GIOPS)
      short   : 31083.53
      short2  : 29873.99
      short4  : 29376.22
      short8  : 28949.41
      short16 : 28414.74

    Transfer bandwidth (GBPS)
      enqueueWriteBuffer              : 17.18
      enqueueReadBuffer               : 4.65
      enqueueWriteBuffer non-blocking : 17.06
      enqueueReadBuffer non-blocking  : 4.61
      enqueueMapBuffer(for read)      : 254140.08
        memcpy from mapped ptr        : 4.54
      enqueueUnmap(after write)       : 338186.41
        memcpy to mapped ptr          : 16.28

    Kernel launch latency : 1545461120.00 us