Message ID | 20201214153450.874339-1-mario.limonciello@dell.com (mailing list archive) |
---|---|
Headers | show |
Series | Improve s0ix flows for systems i219LM | expand |
On Mon, Dec 14, 2020 at 7:35 AM Mario Limonciello <mario.limonciello@dell.com> wrote: > > commit e086ba2fccda ("e1000e: disable s0ix entry and exit flows for ME systems") > disabled s0ix flows for systems that have various incarnations of the > i219-LM ethernet controller. This was done because of some regressions > caused by an earlier > commit 632fbd5eb5b0e ("e1000e: fix S0ix flows for cable connected case") > with i219-LM controller. > > Per discussion with Intel architecture team this direction should be changed and > allow S0ix flows to be used by default. This patch series includes directional > changes for their conclusions in https://lkml.org/lkml/2020/12/13/15. > > Changes from v3 to v4: > - Drop patch 1 for proper s0i3.2 entry, it was separated and is now merged in kernel > - Add patch to only run S0ix flows if shutdown succeeded which was suggested in > thread > - Adjust series for guidance from https://lkml.org/lkml/2020/12/13/15 > * Revert i219-LM disallow-list. > * Drop all patches for systems tested by Dell in an allow list > * Increase ULP timeout to 1000ms > Changes from v2 to v3: > - Correct some grammar and spelling issues caught by Bjorn H. > * s/s0ix/S0ix/ in all commit messages > * Fix a typo in commit message > * Fix capitalization of proper nouns > - Add more pre-release systems that pass > - Re-order the series to add systems only at the end of the series > - Add Fixes tag to a patch in series. > > Changes from v1 to v2: > - Directly incorporate Vitaly's dependency patch in the series > - Split out s0ix code into it's own file > - Adjust from DMI matching to PCI subsystem vendor ID/device matching > - Remove module parameter and sysfs, use ethtool flag instead. > - Export s0ix flag to ethtool private flags > - Include more people and lists directly in this submission chain. > > Mario Limonciello (4): > e1000e: Only run S0ix flows if shutdown succeeded > e1000e: bump up timeout to wait when ME un-configure ULP mode > Revert "e1000e: disable s0ix entry and exit flows for ME systems" > e1000e: Export S0ix flags to ethtool > > drivers/net/ethernet/intel/e1000e/e1000.h | 1 + > drivers/net/ethernet/intel/e1000e/ethtool.c | 40 ++++++++++++++ > drivers/net/ethernet/intel/e1000e/ich8lan.c | 4 +- > drivers/net/ethernet/intel/e1000e/netdev.c | 59 ++++----------------- > 4 files changed, 53 insertions(+), 51 deletions(-) > The changes look good to me. Reviewed-by: Alexander Duyck <alexanderduyck@fb.com>
Hi All, Sasha (and the other intel-wired-lan folks), thank you for investigating this further and for coming up with a better solution. Mario, thank you for implementing the new scheme. I've tested this patch set on a Lenovo X1C8 with vPRO and AMT enabled in the BIOS (the previous issues were soon on a X1C7). I have good and bad news: The good news is that after reverting the "e1000e: disable s0ix entry and exit flows for ME systems" I can reproduce the original issue on the X1C8 (I no longer have a X1C7 to test on). The bad news is that increasing the timeout to 1 second does not fix the issue. Suspend/resume is still broken after one suspend/resume cycle, as described in the original bug-report: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1865570 More good news though, bumping the timeout to 250 poll iterations (approx 2.5 seconds) as done in Aaron Ma's original patch for this fixes this on the X1C8 just as it did on the X1C7 (it takes 2 seconds for ULP_CONFIG_DONE to clear). I've ran some extra tests and the poll loop succeeds on its first iteration when an ethernet-cable is connected. It seems that Lenovo's variant of the ME firmware waits up to 2 seconds for a link, causing the long wait for ULP_CONFIG_DONE to clear. I think that for now the best fix would be to increase the timeout to 2.5 seconds as done in Aaron Ma's original patch. Combined with a broken-firmware warning when we waited longer then 1 second, to make it clear that there is a firmware issue here and that the long wait / slow resume is not the fault of the driver. ### I've added Mark Pearson from Lenovo to the Cc so that Lenovo can investigate this issue further. Mark, this thread is about an issue with enabling S0ix support for e1000e (i219lm) controllers. This was enabled in the kernel a while ago, but then got disabled again on vPro / AMT enabled systems because on some systems (Lenovo X1C7 and now also X1C8) this lead to suspend/resume issues. When AMT is active then there is a handover handshake for the OS to get access to the ethernet controller from the ME. The Intel folks have checked and the Windows driver is using a timeout of 1 second for this handshake, yet on Lenovo systems this is taking 2 seconds. This likely has something to do with the ME firmware on these Lenovo models, can you get the firmware team at Lenovo to investigate this further ? Regards, Hans p.s. I also have a small review remark on patch 4/4 I will reply to that patch separately. On 12/14/20 4:34 PM, Mario Limonciello wrote: > commit e086ba2fccda ("e1000e: disable s0ix entry and exit flows for ME systems") > disabled s0ix flows for systems that have various incarnations of the > i219-LM ethernet controller. This was done because of some regressions > caused by an earlier > commit 632fbd5eb5b0e ("e1000e: fix S0ix flows for cable connected case") > with i219-LM controller. > > Per discussion with Intel architecture team this direction should be changed and > allow S0ix flows to be used by default. This patch series includes directional > changes for their conclusions in https://lkml.org/lkml/2020/12/13/15. > > Changes from v3 to v4: > - Drop patch 1 for proper s0i3.2 entry, it was separated and is now merged in kernel > - Add patch to only run S0ix flows if shutdown succeeded which was suggested in > thread > - Adjust series for guidance from https://lkml.org/lkml/2020/12/13/15 > * Revert i219-LM disallow-list. > * Drop all patches for systems tested by Dell in an allow list > * Increase ULP timeout to 1000ms > Changes from v2 to v3: > - Correct some grammar and spelling issues caught by Bjorn H. > * s/s0ix/S0ix/ in all commit messages > * Fix a typo in commit message > * Fix capitalization of proper nouns > - Add more pre-release systems that pass > - Re-order the series to add systems only at the end of the series > - Add Fixes tag to a patch in series. > > Changes from v1 to v2: > - Directly incorporate Vitaly's dependency patch in the series > - Split out s0ix code into it's own file > - Adjust from DMI matching to PCI subsystem vendor ID/device matching > - Remove module parameter and sysfs, use ethtool flag instead. > - Export s0ix flag to ethtool private flags > - Include more people and lists directly in this submission chain. > > Mario Limonciello (4): > e1000e: Only run S0ix flows if shutdown succeeded > e1000e: bump up timeout to wait when ME un-configure ULP mode > Revert "e1000e: disable s0ix entry and exit flows for ME systems" > e1000e: Export S0ix flags to ethtool > > drivers/net/ethernet/intel/e1000e/e1000.h | 1 + > drivers/net/ethernet/intel/e1000e/ethtool.c | 40 ++++++++++++++ > drivers/net/ethernet/intel/e1000e/ich8lan.c | 4 +- > drivers/net/ethernet/intel/e1000e/netdev.c | 59 ++++----------------- > 4 files changed, 53 insertions(+), 51 deletions(-) > > -- > 2.25.1 >
Thanks Hans On 14/12/2020 13:31, Mark Pearson wrote: > > > ------------------------------------------------------------------------ > *From:* Hans de Goede <hdegoede@redhat.com> > *Sent:* December 14, 2020 13:24 > *To:* Mario Limonciello <mario.limonciello@dell.com>; Jeff Kirsher > <jeffrey.t.kirsher@intel.com>; Tony Nguyen <anthony.l.nguyen@intel.com>; > intel-wired-lan@lists.osuosl.org <intel-wired-lan@lists.osuosl.org>; > David Miller <davem@davemloft.net>; Aaron Ma <aaron.ma@canonical.com>; > Mark Pearson <mpearson@lenovo.com> > *Cc:* linux-kernel@vger.kernel.org <linux-kernel@vger.kernel.org>; > Netdev <netdev@vger.kernel.org>; Alexander Duyck > <alexander.duyck@gmail.com>; Jakub Kicinski <kuba@kernel.org>; Sasha > Netfin <sasha.neftin@intel.com>; Aaron Brown <aaron.f.brown@intel.com>; > Stefan Assmann <sassmann@redhat.com>; darcari@redhat.com > <darcari@redhat.com>; Yijun.Shen@dell.com <Yijun.Shen@dell.com>; > Perry.Yuan@dell.com <Perry.Yuan@dell.com>; anthony.wong@canonical.com > <anthony.wong@canonical.com> > *Subject:* [External] Re: [PATCH v4 0/4] Improve s0ix flows for systems > i219LM > > Hi All, > <snip> > > ### > > I've added Mark Pearson from Lenovo to the Cc so that Lenovo > can investigate this issue further. > > Mark, this thread is about an issue with enabling S0ix support for > e1000e (i219lm) controllers. This was enabled in the kernel a > while ago, but then got disabled again on vPro / AMT enabled > systems because on some systems (Lenovo X1C7 and now also X1C8) > this lead to suspend/resume issues. > > When AMT is active then there is a handover handshake for the > OS to get access to the ethernet controller from the ME. The > Intel folks have checked and the Windows driver is using a timeout > of 1 second for this handshake, yet on Lenovo systems this is > taking 2 seconds. This likely has something to do with the > ME firmware on these Lenovo models, can you get the firmware > team at Lenovo to investigate this further ? Absolutely - I'll ask them to look into this again. We did try to make progress with this previously - but it got a bit stuck and hence the need for these patches....but I believe things may have changed a bit so it's worth trying again Mark
> Hi All, > > Sasha (and the other intel-wired-lan folks), thank you for investigating this > further and for coming up with a better solution. > > Mario, thank you for implementing the new scheme. > Sure. > I've tested this patch set on a Lenovo X1C8 with vPRO and AMT enabled in the > BIOS > (the previous issues were soon on a X1C7). > > I have good and bad news: > > The good news is that after reverting the > "e1000e: disable s0ix entry and exit flows for ME systems" > I can reproduce the original issue on the X1C8 (I no longer have > a X1C7 to test on). > > The bad news is that increasing the timeout to 1 second does > not fix the issue. Suspend/resume is still broken after one > suspend/resume cycle, as described in the original bug-report: > https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1865570 > > More good news though, bumping the timeout to 250 poll iterations > (approx 2.5 seconds) as done in Aaron Ma's original patch for > this fixes this on the X1C8 just as it did on the X1C7 > (it takes 2 seconds for ULP_CONFIG_DONE to clear). > > I've ran some extra tests and the poll loop succeeds on its > first iteration when an ethernet-cable is connected. It seems > that Lenovo's variant of the ME firmware waits up to 2 seconds > for a link, causing the long wait for ULP_CONFIG_DONE to clear. > > I think that for now the best fix would be to increase the timeout > to 2.5 seconds as done in Aaron Ma's original patch. Combined > with a broken-firmware warning when we waited longer then 1 second, > to make it clear that there is a firmware issue here and that > the long wait / slow resume is not the fault of the driver. > OK. I've submitted v5 with this suggestion. > ### > > I've added Mark Pearson from Lenovo to the Cc so that Lenovo > can investigate this issue further. > > Mark, this thread is about an issue with enabling S0ix support for > e1000e (i219lm) controllers. This was enabled in the kernel a > while ago, but then got disabled again on vPro / AMT enabled > systems because on some systems (Lenovo X1C7 and now also X1C8) > this lead to suspend/resume issues. > > When AMT is active then there is a handover handshake for the > OS to get access to the ethernet controller from the ME. The > Intel folks have checked and the Windows driver is using a timeout > of 1 second for this handshake, yet on Lenovo systems this is > taking 2 seconds. This likely has something to do with the > ME firmware on these Lenovo models, can you get the firmware > team at Lenovo to investigate this further ? > Please be very careful with nomenclature. AMT active, or AMT capable? The goal for this series is to support AMT capable systems with an i219LM where AMT has not been provisioned by the end user or organization. OEMs do not ship systems with AMD provisioned. I don't know that this series will work properly with AMT active, and we will need more guidance from Intel's team to enable that feature. Please lets keep that discussion separate from this series. > Regards, > > Hans > > p.s. > > I also have a small review remark on patch 4/4 I will > reply to that patch separately. > Thanks. > > > > > > > > On 12/14/20 4:34 PM, Mario Limonciello wrote: > > commit e086ba2fccda ("e1000e: disable s0ix entry and exit flows for ME > systems") > > disabled s0ix flows for systems that have various incarnations of the > > i219-LM ethernet controller. This was done because of some regressions > > caused by an earlier > > commit 632fbd5eb5b0e ("e1000e: fix S0ix flows for cable connected case") > > with i219-LM controller. > > > > Per discussion with Intel architecture team this direction should be changed > and > > allow S0ix flows to be used by default. This patch series includes > directional > > changes for their conclusions in https://lkml.org/lkml/2020/12/13/15. > > > > Changes from v3 to v4: > > - Drop patch 1 for proper s0i3.2 entry, it was separated and is now merged > in kernel > > - Add patch to only run S0ix flows if shutdown succeeded which was > suggested in > > thread > > - Adjust series for guidance from https://lkml.org/lkml/2020/12/13/15 > > * Revert i219-LM disallow-list. > > * Drop all patches for systems tested by Dell in an allow list > > * Increase ULP timeout to 1000ms > > Changes from v2 to v3: > > - Correct some grammar and spelling issues caught by Bjorn H. > > * s/s0ix/S0ix/ in all commit messages > > * Fix a typo in commit message > > * Fix capitalization of proper nouns > > - Add more pre-release systems that pass > > - Re-order the series to add systems only at the end of the series > > - Add Fixes tag to a patch in series. > > > > Changes from v1 to v2: > > - Directly incorporate Vitaly's dependency patch in the series > > - Split out s0ix code into it's own file > > - Adjust from DMI matching to PCI subsystem vendor ID/device matching > > - Remove module parameter and sysfs, use ethtool flag instead. > > - Export s0ix flag to ethtool private flags > > - Include more people and lists directly in this submission chain. > > > > Mario Limonciello (4): > > e1000e: Only run S0ix flows if shutdown succeeded > > e1000e: bump up timeout to wait when ME un-configure ULP mode > > Revert "e1000e: disable s0ix entry and exit flows for ME systems" > > e1000e: Export S0ix flags to ethtool > > > > drivers/net/ethernet/intel/e1000e/e1000.h | 1 + > > drivers/net/ethernet/intel/e1000e/ethtool.c | 40 ++++++++++++++ > > drivers/net/ethernet/intel/e1000e/ich8lan.c | 4 +- > > drivers/net/ethernet/intel/e1000e/netdev.c | 59 ++++----------------- > > 4 files changed, 53 insertions(+), 51 deletions(-) > > > > -- > > 2.25.1 > >
Hi, On 12/14/20 8:36 PM, Limonciello, Mario wrote: >> Hi All, >> >> Sasha (and the other intel-wired-lan folks), thank you for investigating this >> further and for coming up with a better solution. >> >> Mario, thank you for implementing the new scheme. >> > > Sure. > >> I've tested this patch set on a Lenovo X1C8 with vPRO and AMT enabled in the >> BIOS >> (the previous issues were soon on a X1C7). >> >> I have good and bad news: >> >> The good news is that after reverting the >> "e1000e: disable s0ix entry and exit flows for ME systems" >> I can reproduce the original issue on the X1C8 (I no longer have >> a X1C7 to test on). >> >> The bad news is that increasing the timeout to 1 second does >> not fix the issue. Suspend/resume is still broken after one >> suspend/resume cycle, as described in the original bug-report: >> https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1865570 >> >> More good news though, bumping the timeout to 250 poll iterations >> (approx 2.5 seconds) as done in Aaron Ma's original patch for >> this fixes this on the X1C8 just as it did on the X1C7 >> (it takes 2 seconds for ULP_CONFIG_DONE to clear). >> >> I've ran some extra tests and the poll loop succeeds on its >> first iteration when an ethernet-cable is connected. It seems >> that Lenovo's variant of the ME firmware waits up to 2 seconds >> for a link, causing the long wait for ULP_CONFIG_DONE to clear. >> >> I think that for now the best fix would be to increase the timeout >> to 2.5 seconds as done in Aaron Ma's original patch. Combined >> with a broken-firmware warning when we waited longer then 1 second, >> to make it clear that there is a firmware issue here and that >> the long wait / slow resume is not the fault of the driver. >> > > OK. I've submitted v5 with this suggestion. > >> ### >> >> I've added Mark Pearson from Lenovo to the Cc so that Lenovo >> can investigate this issue further. >> >> Mark, this thread is about an issue with enabling S0ix support for >> e1000e (i219lm) controllers. This was enabled in the kernel a >> while ago, but then got disabled again on vPro / AMT enabled >> systems because on some systems (Lenovo X1C7 and now also X1C8) >> this lead to suspend/resume issues. >> >> When AMT is active then there is a handover handshake for the >> OS to get access to the ethernet controller from the ME. The >> Intel folks have checked and the Windows driver is using a timeout >> of 1 second for this handshake, yet on Lenovo systems this is >> taking 2 seconds. This likely has something to do with the >> ME firmware on these Lenovo models, can you get the firmware >> team at Lenovo to investigate this further ? >> > > Please be very careful with nomenclature. AMT active, or AMT capable? > The goal for this series is to support AMT capable systems with an i219LM > where AMT has not been provisioned by the end user or organization. > OEMs do not ship systems with AMD provisioned. Ah, sorry about that. What I meant with "active" is set to "Enabled" in the BIOS. Also FWIW I just tried disabling AMT in the BIOS (using the "Disabled" option, not the "Permanently Disabled" option) on the Lenovo X1 Carbon 8th gen, but that does not make a difference. It still takes 2 seconds for ULP_CONFIG_DONE to clear even with AMT set to "Disabled" in the BIOS :| Regards, Hans
On 12/14/2020 20:40, Mark Pearson wrote: > Thanks Hans > > On 14/12/2020 13:31, Mark Pearson wrote: >> >> >> ------------------------------------------------------------------------ >> *From:* Hans de Goede <hdegoede@redhat.com> >> *Sent:* December 14, 2020 13:24 >> *To:* Mario Limonciello <mario.limonciello@dell.com>; Jeff Kirsher >> <jeffrey.t.kirsher@intel.com>; Tony Nguyen <anthony.l.nguyen@intel.com>; >> intel-wired-lan@lists.osuosl.org <intel-wired-lan@lists.osuosl.org>; >> David Miller <davem@davemloft.net>; Aaron Ma <aaron.ma@canonical.com>; >> Mark Pearson <mpearson@lenovo.com> >> *Cc:* linux-kernel@vger.kernel.org <linux-kernel@vger.kernel.org>; >> Netdev <netdev@vger.kernel.org>; Alexander Duyck >> <alexander.duyck@gmail.com>; Jakub Kicinski <kuba@kernel.org>; Sasha >> Netfin <sasha.neftin@intel.com>; Aaron Brown <aaron.f.brown@intel.com>; >> Stefan Assmann <sassmann@redhat.com>; darcari@redhat.com >> <darcari@redhat.com>; Yijun.Shen@dell.com <Yijun.Shen@dell.com>; >> Perry.Yuan@dell.com <Perry.Yuan@dell.com>; anthony.wong@canonical.com >> <anthony.wong@canonical.com> >> *Subject:* [External] Re: [PATCH v4 0/4] Improve s0ix flows for systems >> i219LM >> >> Hi All, >> > <snip> >> >> ### >> >> I've added Mark Pearson from Lenovo to the Cc so that Lenovo >> can investigate this issue further. >> >> Mark, this thread is about an issue with enabling S0ix support for >> e1000e (i219lm) controllers. This was enabled in the kernel a >> while ago, but then got disabled again on vPro / AMT enabled >> systems because on some systems (Lenovo X1C7 and now also X1C8) >> this lead to suspend/resume issues. >> >> When AMT is active then there is a handover handshake for the >> OS to get access to the ethernet controller from the ME. The >> Intel folks have checked and the Windows driver is using a timeout >> of 1 second for this handshake, yet on Lenovo systems this is >> taking 2 seconds. This likely has something to do with the >> ME firmware on these Lenovo models, can you get the firmware >> team at Lenovo to investigate this further ? > Absolutely - I'll ask them to look into this again. > we need to explain why on Windows systems required 1s and on Linux systems up to 2.5s - otherwise it is not reliable approach - you will encounter others buggy system. (ME not POR on the Linux systems - is only one possible answer) > We did try to make progress with this previously - but it got a bit > stuck and hence the need for these patches....but I believe things may > have changed a bit so it's worth trying again > > Mark > Sasha
> > Absolutely - I'll ask them to look into this again. > > > we need to explain why on Windows systems required 1s and on Linux > systems up to 2.5s - otherwise it is not reliable approach - you will > encounter others buggy system. > (ME not POR on the Linux systems - is only one possible answer) Sasha: In your opinion does this information need to block the series? or can we follow up with more changes later on as more information becomes available? For now v5 of the series extends the timeout but at least makes a mention that there appears to be a firmware bug when more than 1 second is taken.
On 12/15/2020 19:20, Limonciello, Mario wrote: > >>> Absolutely - I'll ask them to look into this again. >>> >> we need to explain why on Windows systems required 1s and on Linux >> systems up to 2.5s - otherwise it is not reliable approach - you will >> encounter others buggy system. >> (ME not POR on the Linux systems - is only one possible answer) > > Sasha: In your opinion does this information need to block the series? > or can we follow up with more changes later on as more information becomes > available? > I do not think this should block the patches series. > For now v5 of the series extends the timeout but at least makes a mention > that there appears to be a firmware bug when more than 1 second is taken. >