Using a kernel-level profiler, I found that TensorFlow uses DepthwiseConv2dGPUKernelNHWC, which takes approximately 6.8 ms per iteration in the following test case, while PyTorch uses ...