NVIDIA Xavier NX platform PCIe speed adjustment debugging record

1. Preface

How can the maximum PCIe link speed of the Jetson Xavier NX be increased?

The link currently trains at only 2.5 GT/s, yet Xavier should be able to reach 8 GT/s.

This record uses JetPack 4.5.

0004:00:00.0 PCI bridge: NVIDIA Corporation Device 1ad1 (rev a1) (prog-if 00 [Normal decode])
        LnkCap: Port #0, Speed 8GT/s, Width x1, ASPM not supported, Exit Latency L0s <1us, L1 <64us
        LnkSta: Speed 2.5GT/s, Width x1, TrErr- Train- SlotClk+ DLActive+ BWMgmt+ ABWMgmt-
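For reference, the speeds lspci prints come from the Current Link Speed field (bits 3:0) of the PCIe Link Status register, which the script further below reads with `setpci -s <dev> CAP_EXP+12.W`. A small helper sketch for decoding that field (the value-to-speed mapping is from the PCIe spec):

```shell
# Decode the Current Link Speed field (bits 3:0) of a raw Link Status value,
# e.g. as read by `setpci -s <dev> CAP_EXP+12.W`.
decode_speed() {
    case $(( 0x$1 & 0xF )) in
        1) echo "2.5 GT/s" ;;   # Gen1
        2) echo "5 GT/s"   ;;   # Gen2
        3) echo "8 GT/s"   ;;   # Gen3
        4) echo "16 GT/s"  ;;   # Gen4
        *) echo "unknown"  ;;
    esac
}

decode_speed 1001   # hypothetical raw value -> 2.5 GT/s
```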

Full output when no device is connected to the NX:

0004:00:00.0 PCI bridge: NVIDIA Corporation Device 1ad1 (rev a1) (prog-if 00 [Normal decode])
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx-
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
Latency: 0
Interrupt: pin A routed to IRQ 33
Bus: primary=00, secondary=01, subordinate=ff, sec-latency=0
I/O behind bridge: 00001000-00001fff
Memory behind bridge: 40000000-400fffff
Secondary status: 66MHz- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- <SERR- <PERR-
BridgeCtl: Parity- SERR- NoISA- VGA- MAbort- >Reset- FastB2B-
PriDiscTmr- SecDiscTmr- DiscTmrStat- DiscTmrSERREn-
Capabilities: [40] Power Management version 3
Flags: PMEClk- DSI- D1- D2- AuxCurrent=375mA PME(D0+,D1-,D2-,D3hot+,D3cold+)
Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
Capabilities: [50] MSI: Enable- Count=1/1 Maskable+ 64bit+
Address: 0000000000000000  Data: 0000
Masking: 00000000  Pending: 00000000
Capabilities: [70] Express (v2) Root Port (Slot-), MSI 00
DevCap: MaxPayload 256 bytes, PhantFunc 0
ExtTag- RBE+
DevCtl: Report errors: Correctable+ Non-Fatal+ Fatal+ Unsupported+
RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop+
MaxPayload 128 bytes, MaxReadReq 512 bytes
DevSta: CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr+ TransPend-
LnkCap: Port #0, Speed 8GT/s, Width x1, ASPM not supported, Exit Latency L0s <1us, L1 <64us
ClockPM- Surprise+ LLActRep+ BwNot+ ASPMOptComp+
LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- CommClk+
ExtSynch- ClockPM- AutWidDis- BWInt+ AutBWInt-
LnkSta: Speed 2.5GT/s, Width x1, TrErr- Train- SlotClk+ DLActive+ BWMgmt+ ABWMgmt-
RootCtl: ErrCorrectable- ErrNon-Fatal- ErrFatal- PMEIntEna+ CRSVisible+
RootCap: CRSVisible+
RootSta: PME ReqID 0000, PMEStatus- PMEPending-
DevCap2: Completion Timeout: Range ABCD, TimeoutDis+, LTR+, OBFF Not Supported ARIFwd-
DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR+, OBFF Disabled ARIFwd-
LnkCtl2: Target Link Speed: 8GT/s, EnterCompliance- SpeedDis-
 Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
 Compliance De-emphasis: -6dB
LnkSta2: Current De-emphasis Level: -3.5dB, EqualizationComplete-, EqualizationPhase1-
 EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest-
Capabilities: [b0] MSI-X: Enable- Count=8 Masked-
Vector table: BAR=2 offset=00000000
PBA: BAR=2 offset=00010000
Capabilities: [100 v2] Advanced Error Reporting
UESta: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
UEMsk: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
UESvrt: DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
CESta: RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr-
CEMsk: RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
AERCap: First Error Pointer: 00, GenCap+ CGenEn- ChkCap+ ChkEn-
Capabilities: [148 v1] #19
Capabilities: [168 v1] #26
Capabilities: [18c v1] #27
Capabilities: [1ac v1] L1 PM Substates
L1SubCap: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2- ASPM_L1.1- L1_PM_Substates+
  PortCommonModeRestoreTime=60us PortTPowerOnTime=40us
L1SubCtl1: PCI-PM_L1.2- PCI-PM_L1.1- ASPM_L1.2- ASPM_L1.1-
   T_CommonMode=60us
L1SubCtl2: T_PwrOn=60us
Capabilities: [1bc v1] Vendor Specific Information: ID=0002 Rev=4 Len=100 <?>
Capabilities: [2bc v1] Vendor Specific Information: ID=0001 Rev=1 Len=038 <?>
Capabilities: [2f4 v1] #25
Capabilities: [300 v1] Precision Time Measurement
PTMCap: Requester:+ Responder:+ Root:+
PTMClockGranularity: 16ns
PTMControl: Enabled:- RootSelected:-
PTMEffectiveGranularity: Unknown
Capabilities: [30c v1] Vendor Specific Information: ID=0004 Rev=1 Len=054 <?>
Kernel driver in use: pcieport

2. Consulting the documentation

According to the documentation, the Jetson Xavier root ports actually support Gen-4 speed (i.e. 16 GT/s).

This is the default capability: when a Gen-4-capable device is connected, the link trains at Gen-4 speed. Otherwise, the negotiated link speed depends on what is connected to the root port.

In other words, the final speed is limited by the endpoint device.

You can use the following script, pcie_set_speed.sh, to change the speed:

    #!/bin/bash

    dev=$1
    speed=$2

    if [ -z "$dev" ]; then
        echo "Error: no device specified"
        exit 1
    fi

    if [ ! -e "/sys/bus/pci/devices/$dev" ]; then
        dev="0000:$dev"
    fi

    if [ ! -e "/sys/bus/pci/devices/$dev" ]; then
        echo "Error: device $dev not found"
        exit 1
    fi

    # Read the PCI Express Capabilities register (CAP_EXP+02h) and extract
    # the Device/Port Type field (bits 7:4)
    pciec=$(setpci -s $dev CAP_EXP+02.W)
    pt=$((("0x$pciec" & 0xF0) >> 4))

    # The link speed is controlled from the upstream side, so for endpoints (0),
    # legacy endpoints (1) and upstream switch ports (5), walk up to the parent bridge
    port=$(basename $(dirname $(readlink "/sys/bus/pci/devices/$dev")))

    if (($pt == 0)) || (($pt == 1)) || (($pt == 5)); then
        dev=$port
    fi

    # Link Capabilities (CAP_EXP+0ch) and Link Status (CAP_EXP+12h)
    lc=$(setpci -s $dev CAP_EXP+0c.L)
    ls=$(setpci -s $dev CAP_EXP+12.W)

    max_speed=$(("0x$lc" & 0xF))

    echo "Link capabilities:" $lc
    echo "Max link speed:" $max_speed
    echo "Link status:" $ls
    echo "Current link speed:" $(("0x$ls" & 0xF))

    if [ -z "$speed" ]; then
        speed=$max_speed
    fi

    if (($speed > $max_speed)); then
        speed=$max_speed
    fi

    echo "Configuring $dev..."

    # Link Control 2 (CAP_EXP+30h); bits 3:0 hold the Target Link Speed
    lc2=$(setpci -s $dev CAP_EXP+30.L)

    echo "Original link control 2:" $lc2
    echo "Original link target speed:" $(("0x$lc2" & 0xF))

    lc2n=$(printf "%08x" $((("0x$lc2" & 0xFFFFFFF0) | $speed)))

    echo "New target link speed:" $speed
    echo "New link control 2:" $lc2n

    setpci -s $dev CAP_EXP+30.L=$lc2n

    echo "Triggering link retraining..."

    # Link Control (CAP_EXP+10h); bit 5 is Retrain Link
    lc=$(setpci -s $dev CAP_EXP+10.L)

    echo "Original link control:" $lc

    # Set the Retrain Link bit to trigger renegotiation
    lcn=$(printf "%08x" $(("0x$lc" | 0x20)))

    echo "New link control:" $lcn

    setpci -s $dev CAP_EXP+10.L=$lcn

    sleep 0.1

    ls=$(setpci -s $dev CAP_EXP+12.W)

    echo "Link status:" $ls
    echo "Current link speed:" $(("0x$ls" & 0xF))
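The two register updates the script performs can be sketched in isolation: the low four bits of Link Control 2 (Target Link Speed) are replaced while the rest of the register is preserved, and the Retrain Link bit (bit 5) of Link Control is set. With hypothetical raw values:

```shell
# Replace the Target Link Speed field (bits 3:0) of a Link Control 2 value
lc2=10000001          # hypothetical raw register value
speed=3               # request Gen3 (8 GT/s)
lc2n=$(printf "%08x" $(( (0x$lc2 & 0xFFFFFFF0) | speed )))
echo "$lc2n"          # -> 10000003

# Set the Retrain Link bit (0x20) of a Link Control value
lc=00000040           # hypothetical raw register value
lcn=$(printf "%08x" $(( 0x$lc | 0x20 )))
echo "$lcn"           # -> 00000060
```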

Is there a more permanent way to change the PCIe speed, instead of executing this script every time?

3. Installing an 8 GT/s device

Without executing any script and with nothing connected, there is no partner to negotiate with, so the link speed stays at 2.5 GT/s.

If you install an 8 GT/s device, the link speed adjusts accordingly.

This is an excerpt for an NVMe device, running at 8 GT/s x4:

0005:01:00.0 Non-Volatile memory controller: Micron/Crucial Technology Device 540a (rev 01) (prog-if 02 [NVM Express])
        Subsystem: Micron/Crucial Technology Device 540a
        Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Latency: 0
        Interrupt: pin A routed to IRQ 35
        IOMMU group: 61
        Region 0: Memory at 1f40000000 (64-bit, non-prefetchable) [size=16K]
        Capabilities: [80] Express (v2) Endpoint, MSI 00
                DevCap: MaxPayload 256 bytes, PhantFunc 0, Latency L0s unlimited, L1 unlimited
                        ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ FLReset+ SlotPowerLimit 0.000W
                DevCtl: CorrErr+ NonFatalErr+ FatalErr+ UnsupReq+
                        RlxdOrd+ ExtTag+ PhantFunc- AuxPwr- NoSnoop+ FLReset-
                        MaxPayload 256 bytes, MaxReadReq 512 bytes
                DevSta: CorrErr- NonFatalErr- FatalErr- UnsupReq- AuxPwr- TransPend-
                LnkCap: Port #1, Speed 8GT/s, Width x4, ASPM L1, Exit Latency L1 unlimited
                        ClockPM- Surprise- LLActRep- BwNot- ASPMOptComp+
                LnkCtl: ASPM Disabled; RCB 64 bytes, Disabled- CommClk+
                        ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
                LnkSta: Speed 8GT/s (ok), Width x4 (ok)
                        TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
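For context, the usable bandwidth of this 8 GT/s x4 link can be estimated from the raw rate and the Gen3 128b/130b line encoding (a back-of-the-envelope figure that ignores packet and protocol overhead):

```shell
# Theoretical throughput of a Gen3 x4 link:
# 8 GT/s per lane x 4 lanes x 128/130 encoding efficiency, converted to bytes
awk 'BEGIN { printf "%.2f GB/s\n", 8 * 4 * (128 / 130) / 8 }'
# -> 3.94 GB/s
```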

This is the bridge it is connected to:

0005:00:00.0 PCI bridge: NVIDIA Corporation Device 1ad0 (rev a1) (prog-if 00 [Normal decode])
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx-
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Latency: 0
        Interrupt: pin A routed to IRQ 35
        IOMMU group: 60
        Bus: primary=00, secondary=01, subordinate=ff, sec-latency=0
        I/O behind bridge: 0000f000-00000fff [disabled]
        Memory behind bridge: 40000000-400fffff [size=1M]
        Prefetchable memory behind bridge: 00000000fff00000-00000000000fffff [disabled]
        Secondary status: 66MHz- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- <SERR- <PERR-
        BridgeCtl: Parity- SERR- NoISA- VGA- VGA16- MAbort- >Reset- FastB2B-
                PriDiscTmr- SecDiscTmr- DiscTmrStat- DiscTmrSERREn-
        Capabilities: [40] Power Management version 3
                Flags: PMEClk- DSI- D1- D2- AuxCurrent=375mA PME(D0+,D1-,D2-,D3hot+,D3cold+)
                Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
        Capabilities: [50] MSI: Enable- Count=1/1 Maskable+ 64bit+
                Address: 0000000000000000  Data: 0000
                Masking: 00000000  Pending: 00000000
        Capabilities: [70] Express (v2) Root Port (Slot-), MSI 00
                DevCap: MaxPayload 256 bytes, PhantFunc 0
                        ExtTag- RBE+
                DevCtl: CorrErr+ NonFatalErr+ FatalErr+ UnsupReq+
                        RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop+
                        MaxPayload 256 bytes, MaxReadReq 512 bytes
                DevSta: CorrErr- NonFatalErr- FatalErr- UnsupReq- AuxPwr+ TransPend-
                LnkCap: Port #0, Speed 16GT/s, Width x8, ASPM not supported
                        ClockPM- Surprise+ LLActRep+ BwNot+ ASPMOptComp+
                LnkCtl: ASPM Disabled; RCB 64 bytes, Disabled- CommClk+
                        ExtSynch- ClockPM- AutWidDis- BWInt+ AutBWInt-
                LnkSta: Speed 8GT/s (downgraded), Width x4 (downgraded)
                        TrErr- Train- SlotClk+ DLActive+ BWMgmt+ ABWMgmt+
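When comparing the bridge and endpoint dumps like this, a grep filter pulls out just the link lines; shown here against an inline sample (in practice, pipe `sudo lspci -vv -s 0005:00:00.0` into the same grep):

```shell
# Filter the LnkCap/LnkSta lines out of lspci -vv output
grep -E 'Lnk(Cap|Sta):' <<'EOF'
        LnkCap: Port #0, Speed 16GT/s, Width x8, ASPM not supported
        LnkSta: Speed 8GT/s (downgraded), Width x4 (downgraded)
EOF
```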

4. Mining program debugging

When the mining program starts, it checks the memory space visible over PCIe.

With the default settings, nothing can be mined on the Xavier NX, and the log shows:

cuda-0   Using Pci Id : 00:00.0 Xavier (Compute 7.2) Memory : 2.5 GB

The process requires at least 4.2 GB to generate the DAG.

After changing the PCIe speed and re-running the mining process, you get the following message:

cuda-0   Using Pci Id : 00:00.0 Xavier (Compute 7.2) Memory : 6.19 GB

The mining process ran successfully because it had enough memory to generate the DAG this time.

So the reported memory is, one way or another, tied to the PCIe link speed, and with the higher speed the mining process can run on this card.

5. Adjusting the device tree

There is a device tree property named "nvidia,init-speed".

You can try adding it to the PCIe controller nodes via a device tree overlay:

    pcie@14160000 {
        nvidia,init-speed = <3>;
    };
    pcie@141a0000 {
        nvidia,init-speed = <4>;
    };

This method involves building a new DTB that is loaded by the kernel at boot time.

The easier way, however, is to run the pcie_set_speed.sh script automatically at startup.

This can easily be done with a systemd service.

Save the following unit file as /etc/systemd/system/pcie_set_speed.service:

[Unit]
Description=Set PCIe Speed

[Service]
Type=oneshot
# The script needs the root port and target speed as arguments
# (here the 0004:00:00.0 port from above, forced to Gen3/8 GT/s)
ExecStart=/root/pcie_set_speed.sh 0004:00:00.0 3

[Install]
WantedBy=sysinit.target

Then copy the pcie_set_speed.sh script to /root/ and make sure it is executable. Now run:

$ sudo systemctl daemon-reload
$ sudo systemctl enable pcie_set_speed
$ sudo systemctl start pcie_set_speed
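After a reboot, the result can be checked without setpci: the kernel exposes the negotiated and maximum link speeds in sysfs (attribute names are part of the Linux PCI sysfs ABI; the device address is the 0004:00:00.0 root port from above):

```shell
# Read the maximum and currently negotiated link speed from sysfs
cat /sys/bus/pci/devices/0004:00:00.0/max_link_speed
cat /sys/bus/pci/devices/0004:00:00.0/current_link_speed
```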

The configuration is now complete.

 

Keywords: nvidia pci-e

Added by jannoy on Fri, 31 Dec 2021 22:19:40 +0200