Message ID | 20240529201744.15420-1-namcao@linutronix.de (mailing list archive) |
---|---|
State | New, archived |
Series | [v2] pci-bridge/xio3130_downstream: fix invalid link speed and link width |
On Wed, 29 May 2024 22:17:44 +0200
Nam Cao <namcao@linutronix.de> wrote:

> Set link width to x1 and link speed to 2.5 Gb/s as specified by the
> datasheet. Without this, these fields in the link status register read
> zero, which is incorrect.
>
> This problem appeared since 3d67447fe7c2 ("pcie: Fill PCIESlot link fields
> to support higher speeds and widths"), which allows PCIe slot to set link
> width and link speed. However, if PCIe slot does not explicitly set these
> properties, they will be zero. Before this commit, the width and speed
> default to x1 and 2.5 Gb/s.
>
> Fixes: 3d67447fe7c2 ("pcie: Fill PCIESlot link fields to support higher speeds and widths")
> Signed-off-by: Nam Cao <namcao@linutronix.de>

Hi Nam,

I'm feeling a bit guilty about this one, as I've known it was there for a while.

I was lazy when fixing the equivalent CXL case a while back, on the
basis that no one had noticed, and unlike CXL (where migration is broken for
a lot of reasons) fixing this may need to take into account migration from
broken to fixed versions. Have you tested that?

I did the CXL fix slightly differently. Can't remember why though - looking
at the fact it uses an instance_post_init, is there an issue with accidentally
overwriting the parameters? Or did I just over-engineer the fix?

https://gitlab.com/jic23/qemu/-/commit/314f5033c639ebe8218078a17513935747f15d9d

> ---
> v2: implement this in .realize() instead
> ---
>  hw/pci-bridge/xio3130_downstream.c | 3 +++
>  1 file changed, 3 insertions(+)
>
> diff --git a/hw/pci-bridge/xio3130_downstream.c b/hw/pci-bridge/xio3130_downstream.c
> index 38a2361fa2..2df1ee203d 100644
> --- a/hw/pci-bridge/xio3130_downstream.c
> +++ b/hw/pci-bridge/xio3130_downstream.c
> @@ -72,6 +72,9 @@ static void xio3130_downstream_realize(PCIDevice *d, Error **errp)
>      pci_bridge_initfn(d, TYPE_PCIE_BUS);
>      pcie_port_init_reg(d);
>
> +    s->speed = QEMU_PCI_EXP_LNK_2_5GT;
> +    s->width = QEMU_PCI_EXP_LNK_X1;
> +
>      rc = msi_init(d, XIO3130_MSI_OFFSET, XIO3130_MSI_NR_VECTOR,
>                    XIO3130_MSI_SUPPORTED_FLAGS & PCI_MSI_FLAGS_64BIT,
>                    XIO3130_MSI_SUPPORTED_FLAGS & PCI_MSI_FLAGS_MASKBIT,
On Fri, May 31, 2024 at 11:14:00AM +0100, Jonathan Cameron wrote:
> On Wed, 29 May 2024 22:17:44 +0200
> Nam Cao <namcao@linutronix.de> wrote:
>
> > Set link width to x1 and link speed to 2.5 Gb/s as specified by the
> > datasheet. Without this, these fields in the link status register read
> > zero, which is incorrect.
> >
> > This problem appeared since 3d67447fe7c2 ("pcie: Fill PCIESlot link fields
> > to support higher speeds and widths"), which allows PCIe slot to set link
> > width and link speed. However, if PCIe slot does not explicitly set these
> > properties, they will be zero. Before this commit, the width and speed
> > default to x1 and 2.5 Gb/s.
> >
> > Fixes: 3d67447fe7c2 ("pcie: Fill PCIESlot link fields to support higher speeds and widths")
> > Signed-off-by: Nam Cao <namcao@linutronix.de>
> Hi Nam,
>
> I'm feeling a bit guilty about this one, as I've known it was there for a while.
>
> I was lazy when fixing the equivalent CXL case a while back, on the
> basis that no one had noticed, and unlike CXL (where migration is broken for
> a lot of reasons) fixing this may need to take into account migration from
> broken to fixed versions. Have you tested that?

I tested this patch with the Linux kernel.

I noticed this bug when Linux complained that the PCI link was broken.
Linux determines whether a link is up by checking if these speed/width
fields have valid values.

Repro:
    qemu-system-x86_64 \
        -machine pc-q35-2.10 \
        -kernel bzImage \
        -drive "file=img,format=raw" \
        -m 2048 -smp 1 -enable-kvm \
        -append "console=ttyS0 root=/dev/sda debug" \
        -nographic \
        -device pcie-root-port,bus=pcie.0,slot=1,id=rp1,bus-reserve=253 \
        -device x3130-upstream,id=up1,bus=rp1 \
        -device xio3130-downstream,id=dp1,bus=up1,chassis=1,slot=1

Then after Linux has booted:
    device_add e1000,bus=dp1,id=eth0

Then Linux complains that something is wrong with the link:
    pcieport 0000:02:00.0: pciehp: Slot(1-1): Cannot train link: status 0x2000

This patch gets rid of Linux's complaint, and the hot-plug now works fine.

> I did the CXL fix slightly differently. Can't remember why though - looking
> at the fact it uses an instance_post_init, is there an issue with accidentally
> overwriting the parameters? Or did I just over-engineer the fix?

I would say over-engineered. I think CXL does not take link speed and link
width as parameters.

Best regards,
Nam
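The `status 0x2000` value in the log above can be decoded by hand. The sketch below is a standalone illustration, not kernel code: the mask values follow the PCIe Link Status register layout (as also defined in Linux's `pci_regs.h`), while the helper names and the exact pass/fail rule in `link_looks_trained()` are assumptions made for this example. It shows that 0x2000 has the Data Link Layer Link Active bit set but a zero speed and a zero width, whereas an x1 link at 2.5 GT/s would read 0x2011.

```c
#include <stdbool.h>
#include <stdint.h>

/* Link Status register fields, per the PCIe register layout. */
#define LNKSTA_CLS   0x000f /* Current Link Speed (1 = 2.5 GT/s) */
#define LNKSTA_NLW   0x03f0 /* Negotiated Link Width (1 = x1) */
#define LNKSTA_LT    0x0800 /* Link Training in progress */
#define LNKSTA_DLLLA 0x2000 /* Data Link Layer Link Active */

/* Hypothetical helpers: extract the speed and width fields. */
static unsigned lnksta_speed(uint16_t lnksta)
{
    return lnksta & LNKSTA_CLS;
}

static unsigned lnksta_width(uint16_t lnksta)
{
    return (lnksta & LNKSTA_NLW) >> 4;
}

/*
 * Assumed model of the guest-side check: the link is rejected while
 * training is still in progress or when the negotiated width is zero.
 */
static bool link_looks_trained(uint16_t lnksta)
{
    return !(lnksta & LNKSTA_LT) && (lnksta & LNKSTA_NLW) != 0;
}
```

Under this model, `link_looks_trained(0x2000)` is false (width field is zero) and `link_looks_trained(0x2011)` is true, which matches the behaviour reported before and after the patch.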
On Fri, 31 May 2024 12:36:35 +0200
Nam Cao <namcao@linutronix.de> wrote:

> On Fri, May 31, 2024 at 11:14:00AM +0100, Jonathan Cameron wrote:
> > On Wed, 29 May 2024 22:17:44 +0200
> > Nam Cao <namcao@linutronix.de> wrote:
> >
> > > Set link width to x1 and link speed to 2.5 Gb/s as specified by the
> > > datasheet. Without this, these fields in the link status register read
> > > zero, which is incorrect.
> > >
> > > This problem appeared since 3d67447fe7c2 ("pcie: Fill PCIESlot link fields
> > > to support higher speeds and widths"), which allows PCIe slot to set link
> > > width and link speed. However, if PCIe slot does not explicitly set these
> > > properties, they will be zero. Before this commit, the width and speed
> > > default to x1 and 2.5 Gb/s.
> > >
> > > Fixes: 3d67447fe7c2 ("pcie: Fill PCIESlot link fields to support higher speeds and widths")
> > > Signed-off-by: Nam Cao <namcao@linutronix.de>
> > Hi Nam,
> >
> > I'm feeling a bit guilty about this one, as I've known it was there for a while.
> >
> > I was lazy when fixing the equivalent CXL case a while back, on the
> > basis that no one had noticed, and unlike CXL (where migration is broken for
> > a lot of reasons) fixing this may need to take into account migration from
> > broken to fixed versions. Have you tested that?
>

I've run into problems in the past around updating config space registers,
because when we migrate from a pre-patch QEMU instance to a post-patch one
the config space registers are compared. I'm not sure if LNKCAP is included
in that. LNKSTA is explicitly ruled out, I think.

For examples, see all the machine version checks in hw/core/machine.c.
The one that bit me was fixed with x-pcie-err-unc-mask, when I was fixing a
register that didn't match the spec-defined values.

> I tested this patch with the Linux kernel.
>
> I noticed this bug when Linux complained that the PCI link was broken.
> Linux determines whether a link is up by checking if these speed/width
> fields have valid values.
>
> Repro:
>     qemu-system-x86_64 \
>         -machine pc-q35-2.10 \
>         -kernel bzImage \
>         -drive "file=img,format=raw" \
>         -m 2048 -smp 1 -enable-kvm \
>         -append "console=ttyS0 root=/dev/sda debug" \
>         -nographic \
>         -device pcie-root-port,bus=pcie.0,slot=1,id=rp1,bus-reserve=253 \
>         -device x3130-upstream,id=up1,bus=rp1 \
>         -device xio3130-downstream,id=dp1,bus=up1,chassis=1,slot=1
>
> Then after Linux has booted:
>     device_add e1000,bus=dp1,id=eth0
>
> Then Linux complains that something is wrong with the link:
>     pcieport 0000:02:00.0: pciehp: Slot(1-1): Cannot train link: status 0x2000
>
> This patch gets rid of Linux's complaint, and the hot-plug now works fine.
>
> > I did the CXL fix slightly differently. Can't remember why though - looking
> > at the fact it uses an instance_post_init, is there an issue with accidentally
> > overwriting the parameters? Or did I just over-engineer the fix?
>
> I would say over-engineered. I think CXL does not take link speed and link
> width as parameters.

I've implemented control, but this still ends up over-engineered because the
reason I want to control this is to vary access parameters for calculating
latency and bandwidth. That is easiest done by controlling the EP status to
degrade the link. For that I just set the CAP register on the switch DSP to
allow suitably high values and let pcie_sync_bridge() match this to the
status of the EP (which I have properties to control).
There seems to be only one-way 'negotiation' of these parameters, so it needs
to be EP driven.

Jonathan

>
> Best regards,
> Nam
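The EP-driven, one-way "negotiation" described above can be pictured as clamping the endpoint's reported speed and width to what the downstream port's Link Capabilities register allows. The following is a minimal sketch under that assumption only; `sync_port_lnksta()` is a hypothetical helper written for this illustration and is not QEMU's implementation. The masks follow the PCIe layout, where the speed field occupies bits 3:0 and the width field bits 9:4 in both LNKCAP and LNKSTA.

```c
#include <stdint.h>

#define LNK_SPEED_MASK 0x000f /* CLS in LNKSTA, SLS in LNKCAP */
#define LNK_WIDTH_MASK 0x03f0 /* NLW in LNKSTA, MLW in LNKCAP */

/*
 * Hypothetical helper: derive the port's LNKSTA speed/width fields from
 * the endpoint's LNKSTA, never exceeding the port's own LNKCAP limits.
 * Only the endpoint side drives the result, hence "one-way".
 */
static uint16_t sync_port_lnksta(uint32_t port_lnkcap, uint16_t ep_lnksta)
{
    uint16_t speed = ep_lnksta & LNK_SPEED_MASK;
    uint16_t width = ep_lnksta & LNK_WIDTH_MASK;
    uint16_t cap_speed = (uint16_t)(port_lnkcap & LNK_SPEED_MASK);
    uint16_t cap_width = (uint16_t)(port_lnkcap & LNK_WIDTH_MASK);

    if (speed > cap_speed) {
        speed = cap_speed; /* the port cannot run faster than its cap */
    }
    if (width > cap_width) {
        width = cap_width; /* nor wider than its cap */
    }
    return (uint16_t)(speed | width);
}
```

With a generous port capability, degrading the endpoint's status is enough to degrade the link seen at the port: a port capped at x16 and speed code 4 (0x104) paired with an endpoint reporting x4 and speed code 3 (0x043) yields 0x043.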
diff --git a/hw/pci-bridge/xio3130_downstream.c b/hw/pci-bridge/xio3130_downstream.c
index 38a2361fa2..2df1ee203d 100644
--- a/hw/pci-bridge/xio3130_downstream.c
+++ b/hw/pci-bridge/xio3130_downstream.c
@@ -72,6 +72,9 @@ static void xio3130_downstream_realize(PCIDevice *d, Error **errp)
     pci_bridge_initfn(d, TYPE_PCIE_BUS);
     pcie_port_init_reg(d);
 
+    s->speed = QEMU_PCI_EXP_LNK_2_5GT;
+    s->width = QEMU_PCI_EXP_LNK_X1;
+
     rc = msi_init(d, XIO3130_MSI_OFFSET, XIO3130_MSI_NR_VECTOR,
                   XIO3130_MSI_SUPPORTED_FLAGS & PCI_MSI_FLAGS_64BIT,
                   XIO3130_MSI_SUPPORTED_FLAGS & PCI_MSI_FLAGS_MASKBIT,
Set link width to x1 and link speed to 2.5 Gb/s as specified by the
datasheet. Without this, these fields in the link status register read
zero, which is incorrect.

This problem appeared since 3d67447fe7c2 ("pcie: Fill PCIESlot link fields
to support higher speeds and widths"), which allows PCIe slot to set link
width and link speed. However, if PCIe slot does not explicitly set these
properties, they will be zero. Before this commit, the width and speed
default to x1 and 2.5 Gb/s.

Fixes: 3d67447fe7c2 ("pcie: Fill PCIESlot link fields to support higher speeds and widths")
Signed-off-by: Nam Cao <namcao@linutronix.de>
---
v2: implement this in .realize() instead
---
 hw/pci-bridge/xio3130_downstream.c | 3 +++
 1 file changed, 3 insertions(+)
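To illustrate why unset slot properties surface as zero fields in the link status register, here is a small standalone sketch. The `struct slot_link` type and both helpers are hypothetical stand-ins, not QEMU's `PCIESlot` or its register code; the only facts relied on are the PCIe encodings (speed code 1 means 2.5 GT/s, width code 1 means x1, speed in bits 3:0, width in bits 9:4).

```c
#include <stdint.h>

/* Hypothetical stand-ins for the slot's link properties. */
enum { LNK_SPEED_2_5GT = 1 }; /* PCIe speed code for 2.5 GT/s */
enum { LNK_WIDTH_X1 = 1 };    /* PCIe width code for x1 */

struct slot_link {
    unsigned speed; /* 0 when the property was never set */
    unsigned width; /* 0 when the property was never set */
};

/* Pack speed and width into the LNKSTA field positions. */
static uint16_t lnksta_fields(const struct slot_link *s)
{
    return (uint16_t)((s->speed & 0xf) | ((s->width & 0x3f) << 4));
}

/* Mirrors the idea of the patch: fix the values the datasheet specifies. */
static void apply_datasheet_defaults(struct slot_link *s)
{
    s->speed = LNK_SPEED_2_5GT;
    s->width = LNK_WIDTH_X1;
}
```

A zero-initialised `struct slot_link` packs to 0x0000, the invalid reading described in the commit message; after `apply_datasheet_defaults()` the same call yields 0x0011.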