Message ID | 20210712133500.1126371-3-kai.heng.feng@canonical.com (mailing list archive) |
---|---|
State | Awaiting Upstream |
Delegated to: | Netdev Maintainers |
Headers | show |
Series | [1/3] e1000e: Separate TGP from SPT | expand |
Context | Check | Description |
---|---|---|
netdev/cover_letter | warning | Series does not have a cover letter |
netdev/fixes_present | success | Link |
netdev/patch_count | success | Link |
netdev/tree_selection | success | Guessed tree name to be net-next |
netdev/subject_prefix | warning | Target tree name not specified in the subject |
netdev/cc_maintainers | success | CCed 6 of 6 maintainers |
netdev/source_inline | success | Was 0 now: 0 |
netdev/verify_signedoff | success | Link |
netdev/module_param | success | Was 0 now: 0 |
netdev/build_32bit | success | Errors and warnings before: 0 this patch: 0 |
netdev/kdoc | success | Errors and warnings before: 0 this patch: 0 |
netdev/verify_fixes | success | Link |
netdev/checkpatch | success | total: 0 errors, 0 warnings, 0 checks, 26 lines checked |
netdev/build_allmodconfig_warn | success | Errors and warnings before: 0 this patch: 0 |
netdev/header_inline | success | Link |
Hi Sasha, On Mon, Jul 12, 2021 at 9:35 PM Kai-Heng Feng <kai.heng.feng@canonical.com> wrote: > > On TGL systems, PCI_COMMAND may randomly flip to 0 on system resume. > This is devastating to drivers that use pci_set_master(), like NVMe and > xHCI, to enable DMA in their resume routine, as pci_set_master() can > inadvertently disable PCI_COMMAND_IO and PCI_COMMAND_MEMORY, making > resources inaccessible. > > The issue is reproducible on all kernel releases, but the situation is > exacerbated by commit 6cecf02e77ab ("Revert "e1000e: disable s0ix entry > and exit flows for ME systems""). > > Seems like ME can do many things to other PCI devices until it's finally out of > ULP polling. So ensure e1000e PM ops are serialized by enforcing suspend/resume > order to workaround the issue. > > Of course this will make system suspend and resume a bit slower, but we > probably need to settle on this workaround until ME is fully supported. > > Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=212039 > Signed-off-by: Kai-Heng Feng <kai.heng.feng@canonical.com> Series "e1000e: Add handshake with the CSME to support s0ix" doesn't fix the issue, so this patch is still needed. Kai-Heng > --- > drivers/net/ethernet/intel/e1000e/netdev.c | 14 +++++++++++++- > 1 file changed, 13 insertions(+), 1 deletion(-) > > diff --git a/drivers/net/ethernet/intel/e1000e/netdev.c b/drivers/net/ethernet/intel/e1000e/netdev.c > index e63445a8ce12..0244d3dd90a3 100644 > --- a/drivers/net/ethernet/intel/e1000e/netdev.c > +++ b/drivers/net/ethernet/intel/e1000e/netdev.c > @@ -7319,7 +7319,8 @@ static const struct net_device_ops e1000e_netdev_ops = { > > static void e1000e_create_device_links(struct pci_dev *pdev) > { > - struct pci_dev *tgp_mei_me; > + struct pci_bus *bus = pdev->bus; > + struct pci_dev *tgp_mei_me, *p; > > /* Find TGP mei_me devices and make e1000e power depend on mei_me */ > tgp_mei_me = pci_get_device(PCI_VENDOR_ID_INTEL, 0xa0e0, NULL); > @@ -7335,6 +7336,17 @@ static void e1000e_create_device_links(struct pci_dev *pdev) > pci_info(pdev, "System and runtime PM depends on %s\n", > pci_name(tgp_mei_me)); > > + /* Find other devices in the SoC and make them depend on e1000e */ > + list_for_each_entry(p, &bus->devices, bus_list) { > + if (&p->dev == &pdev->dev || &p->dev == &tgp_mei_me->dev) > + continue; > + > + if (device_link_add(&p->dev, &pdev->dev, > + DL_FLAG_AUTOREMOVE_SUPPLIER)) > + pci_info(p, "System PM depends on %s\n", > + pci_name(pdev)); > + } > + > pci_dev_put(tgp_mei_me); > } > > -- > 2.31.1 >
On 7/27/2021 09:53, Kai-Heng Feng wrote: > Hi Sasha, > > On Mon, Jul 12, 2021 at 9:35 PM Kai-Heng Feng > <kai.heng.feng@canonical.com> wrote: >> >> On TGL systems, PCI_COMMAND may randomly flip to 0 on system resume. >> This is devastating to drivers that use pci_set_master(), like NVMe and >> xHCI, to enable DMA in their resume routine, as pci_set_master() can >> inadvertently disable PCI_COMMAND_IO and PCI_COMMAND_MEMORY, making >> resources inaccessible. >> >> The issue is reproducible on all kernel releases, but the situation is >> exacerbated by commit 6cecf02e77ab ("Revert "e1000e: disable s0ix entry >> and exit flows for ME systems""). >> >> Seems like ME can do many things to other PCI devices until it's finally out of >> ULP polling. So ensure e1000e PM ops are serialized by enforcing suspend/resume >> order to workaround the issue. >> >> Of course this will make system suspend and resume a bit slower, but we >> probably need to settle on this workaround until ME is fully supported. >> >> Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=212039 >> Signed-off-by: Kai-Heng Feng <kai.heng.feng@canonical.com> > > Series "e1000e: Add handshake with the CSME to support s0ix" doesn't > fix the issue, so this patch is still needed. Hello Kai-Heng, This problem is still under investigation by the ME team. Let's wait for their response. Series "e1000e: Add handshake with the CSME to support s0ix" - support only s0ix flow on AMT/CSME none provisioned systems and not related to this problem. > > Kai-Heng > >> --- >> drivers/net/ethernet/intel/e1000e/netdev.c | 14 +++++++++++++- >> 1 file changed, 13 insertions(+), 1 deletion(-) >> >> diff --git a/drivers/net/ethernet/intel/e1000e/netdev.c b/drivers/net/ethernet/intel/e1000e/netdev.c >> index e63445a8ce12..0244d3dd90a3 100644 >> --- a/drivers/net/ethernet/intel/e1000e/netdev.c >> +++ b/drivers/net/ethernet/intel/e1000e/netdev.c >> @@ -7319,7 +7319,8 @@ static const struct net_device_ops e1000e_netdev_ops = { >> >> static void e1000e_create_device_links(struct pci_dev *pdev) >> { >> - struct pci_dev *tgp_mei_me; >> + struct pci_bus *bus = pdev->bus; >> + struct pci_dev *tgp_mei_me, *p; >> >> /* Find TGP mei_me devices and make e1000e power depend on mei_me */ >> tgp_mei_me = pci_get_device(PCI_VENDOR_ID_INTEL, 0xa0e0, NULL); >> @@ -7335,6 +7336,17 @@ static void e1000e_create_device_links(struct pci_dev *pdev) >> pci_info(pdev, "System and runtime PM depends on %s\n", >> pci_name(tgp_mei_me)); >> >> + /* Find other devices in the SoC and make them depend on e1000e */ >> + list_for_each_entry(p, &bus->devices, bus_list) { >> + if (&p->dev == &pdev->dev || &p->dev == &tgp_mei_me->dev) >> + continue; >> + >> + if (device_link_add(&p->dev, &pdev->dev, >> + DL_FLAG_AUTOREMOVE_SUPPLIER)) >> + pci_info(p, "System PM depends on %s\n", >> + pci_name(pdev)); >> + } >> + >> pci_dev_put(tgp_mei_me); >> } >> >> -- >> 2.31.1 >> sasha
diff --git a/drivers/net/ethernet/intel/e1000e/netdev.c b/drivers/net/ethernet/intel/e1000e/netdev.c index e63445a8ce12..0244d3dd90a3 100644 --- a/drivers/net/ethernet/intel/e1000e/netdev.c +++ b/drivers/net/ethernet/intel/e1000e/netdev.c @@ -7319,7 +7319,8 @@ static const struct net_device_ops e1000e_netdev_ops = { static void e1000e_create_device_links(struct pci_dev *pdev) { - struct pci_dev *tgp_mei_me; + struct pci_bus *bus = pdev->bus; + struct pci_dev *tgp_mei_me, *p; /* Find TGP mei_me devices and make e1000e power depend on mei_me */ tgp_mei_me = pci_get_device(PCI_VENDOR_ID_INTEL, 0xa0e0, NULL); @@ -7335,6 +7336,17 @@ static void e1000e_create_device_links(struct pci_dev *pdev) pci_info(pdev, "System and runtime PM depends on %s\n", pci_name(tgp_mei_me)); + /* Find other devices in the SoC and make them depend on e1000e */ + list_for_each_entry(p, &bus->devices, bus_list) { + if (&p->dev == &pdev->dev || &p->dev == &tgp_mei_me->dev) + continue; + + if (device_link_add(&p->dev, &pdev->dev, + DL_FLAG_AUTOREMOVE_SUPPLIER)) + pci_info(p, "System PM depends on %s\n", + pci_name(pdev)); + } + pci_dev_put(tgp_mei_me); }
On TGL systems, PCI_COMMAND may randomly flip to 0 on system resume. This is devastating to drivers that use pci_set_master(), like NVMe and xHCI, to enable DMA in their resume routine, as pci_set_master() can inadvertently disable PCI_COMMAND_IO and PCI_COMMAND_MEMORY, making resources inaccessible. The issue is reproducible on all kernel releases, but the situation is exacerbated by commit 6cecf02e77ab ("Revert "e1000e: disable s0ix entry and exit flows for ME systems""). Seems like ME can do many things to other PCI devices until it's finally out of ULP polling. So ensure e1000e PM ops are serialized by enforcing suspend/resume order to workaround the issue. Of course this will make system suspend and resume a bit slower, but we probably need to settle on this workaround until ME is fully supported. Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=212039 Signed-off-by: Kai-Heng Feng <kai.heng.feng@canonical.com> --- drivers/net/ethernet/intel/e1000e/netdev.c | 14 +++++++++++++- 1 file changed, 13 insertions(+), 1 deletion(-)