Using a kernel-level profiler, I found that TensorFlow uses the DepthwiseConv2dGPUKernelNHWC kernel, which takes approximately 6.8 ms per iteration in the following test case, while PyTorch uses ...