Visual Profiler collects the NVLink topology and NVLink transmit/receive throughput metrics, and maps the metrics onto the topology. The topology is collected by default along with the timeline. Throughput/utilization metrics are generated only when the NVLink option is chosen. NVLink information is presented in the Results section of Examine GPU Usage in CUDA Application Analysis in Guided Analysis. NVLink Analysis shows a topology of the logical NVLink connections between different devices. A logical link comprises 1 to 4 physical NVLinks with the same properties connected between two devices. Visual Profiler lists the properties and achieved utilization for logical NVLinks in the ‘Logical NVLink Properties’ table. It also lists the transmit and receive throughputs for each logical NVLink in the ‘Logical NVLink Throughput’ table.
The official documentation is here: https://docs.nvidia.com/cuda/pdf/CUDA_Profiler_Users_Guide.pdf
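To make the "logical link" idea concrete, here is a minimal Python sketch that groups physical NVLinks sharing the same endpoints and properties into logical links. Everything in it is an assumption for illustration: the device names, the 25 GB/s per-link figure, and the choice to sum per-link bandwidth into a logical peak are made up, and this is not how Visual Profiler itself computes its tables.

```python
from collections import defaultdict

# Hypothetical sample data: (device_a, device_b, per-link bandwidth in GB/s).
# Device names and bandwidth values are invented for this sketch.
physical_links = [
    ("GPU0", "GPU1", 25),
    ("GPU0", "GPU1", 25),
    ("GPU0", "GPU2", 25),
    ("GPU1", "GPU2", 25),
    ("GPU1", "GPU2", 25),
    ("GPU1", "GPU2", 25),
    ("GPU1", "GPU2", 25),
]


def group_logical_links(links):
    """Group physical links with the same endpoints and bandwidth into logical links."""
    counts = defaultdict(int)
    for dev_a, dev_b, bandwidth in links:
        # Sort the endpoints so (A, B) and (B, A) fall into the same group.
        key = (tuple(sorted((dev_a, dev_b))), bandwidth)
        counts[key] += 1

    logical = []
    for (devices, bandwidth), count in sorted(counts.items()):
        # The text above says a logical link holds 1 to 4 physical links.
        if not 1 <= count <= 4:
            raise ValueError(f"{devices}: {count} physical links, expected 1 to 4")
        logical.append(
            {
                "devices": devices,
                "physical_links": count,
                # Assumption: logical peak is the sum of the physical links.
                "peak_bandwidth": bandwidth * count,
            }
        )
    return logical


logical_links = group_logical_links(physical_links)
for link in logical_links:
    print(link)
```

With the sample data this yields three logical links, one per device pair, which mirrors the kind of per-logical-link rows the ‘Logical NVLink Properties’ table describes.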
Appendix 2: PXN - PCI × NVLink
The new feature introduced in NCCL 2.12 is called PXN, as PCI × NVLink, as it enables a GPU to communicate with a NIC on the node through NVLink and then PCI. This is instead of going through the CPU using QPI or other inter-CPU protocols, which would not be able to deliver full bandwidth. That way, even though each GPU still tries to use its local NIC as much as possible, it can reach other NICs if required. In other words, traffic between nodes goes over the NICs and traffic within a node goes over NVLink. It is a new feature, but it runs on the same old topology. Note, however, that the inter-node network side of training uses a doubling implementation.
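The routing idea behind PXN can be sketched as a toy path-selection function. This is only an illustrative model under stated assumptions: the four-GPU, two-NIC layout, the `LOCAL_NIC` table, and the `path_to_nic` helper are all hypothetical, and the code does not reflect NCCL's actual implementation or API.

```python
# Hypothetical node layout: which NIC is "local" to each GPU.
# The GPU and NIC names are invented for this sketch.
LOCAL_NIC = {"GPU0": "NIC0", "GPU1": "NIC0", "GPU2": "NIC1", "GPU3": "NIC1"}


def path_to_nic(gpu, nic, pxn_enabled=True):
    """Return the hops from `gpu` to `nic` in this toy model."""
    if LOCAL_NIC[gpu] == nic:
        # The GPU uses its local NIC directly whenever it can.
        return [gpu, "PCI", nic]
    if pxn_enabled:
        # PXN: go over NVLink to a GPU that is local to the target NIC,
        # then over PCI from that GPU to the NIC.
        proxy = next(g for g, n in sorted(LOCAL_NIC.items()) if n == nic)
        return [gpu, "NVLink", proxy, "PCI", nic]
    # Without PXN, the traffic crosses the CPU interconnect (QPI or similar).
    return [gpu, "PCI", "CPU interconnect", "PCI", nic]


print(path_to_nic("GPU0", "NIC0"))
print(path_to_nic("GPU0", "NIC1"))
print(path_to_nic("GPU0", "NIC1", pxn_enabled=False))
```

The second call shows the PXN case: GPU0 reaches the non-local NIC1 by first hopping over NVLink to GPU2, avoiding the CPU interconnect that the third call has to cross.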