
[V1] PCI/ASPM: Save/restore L1SS Capability for suspend/resume

Message ID 20220201123536.12962-1-vidyas@nvidia.com (mailing list archive)
State Changes Requested
Delegated to: Bjorn Helgaas
Series [V1] PCI/ASPM: Save/restore L1SS Capability for suspend/resume

Commit Message

Vidya Sagar Feb. 1, 2022, 12:35 p.m. UTC
Previously, the ASPM L1 Substates control registers (CTL1 and CTL2) weren't
saved and restored during suspend/resume, leading to the L1 Substates
configuration being lost post-resume.

Save the L1 Substates control registers so that the configuration is
retained post-resume.

Signed-off-by: Vidya Sagar <vidyas@nvidia.com>
---
Hi,
A similar patch was merged in the past through commit 4257f7e008ea
("PCI/ASPM: Save/restore L1SS Capability for suspend/resume"), but it was
later reverted through commit 40fb68c7725a after Kenneth R. Crudup
<kenny@panix.com> reported disk I/O errors because of it. I couldn't spend
much time debugging the issue back then, but taking a fresh look now, it
seems more like an issue with the platform in question than with this
change itself. The reason is that there are other devices on that platform
that support ASPM L1 Sub-States (as observed in the lspci output mentioned at
https://lore.kernel.org/linux-pci/53d3bd83-c0c2-d71f-9e5b-1dbdde55786@panix.com/ )
and, assuming that L1 Sub-States are indeed enabled for those devices,
no issues were reported from any of them except the NVMe drive.
When it comes to the NVMe driver, the code at
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/nvme/host/pci.c?h=v5.17-rc2#n3008
has quirks for some models from Dell Inc., and I'm wondering if the model
on which the issue was observed might need a quirk of its own.

So,
Kenneth R. Crudup <kenny@panix.com>
Could you please try this patch on top of linux-next and collect more info?
- 'sudo lspci -vvvv' output before and after hibernation
- Could you please confirm that this patch works for non-NVMe devices that
  support L1 Sub-States?
- Could you please try the "NVME_QUIRK_NO_DEEPEST_PS" and "NVME_QUIRK_SIMPLE_SUSPEND"
  quirks (one at a time) in the check_vendor_combination_bug() function and
  see if either makes any difference? (See the rough sketch below.)
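
For illustration only, here is a rough sketch of where such a quirk might be
added in check_vendor_combination_bug(); the vendor/device IDs and the DMI
product string below are placeholders that would have to be taken from
'lspci -nn' and the DMI data of the affected machine:

	if (pdev->vendor == 0x1e0f /* KIOXIA? placeholder, confirm via lspci -nn */ &&
	    pdev->device == 0x0001 /* placeholder device ID */) {
		/*
		 * Hypothetical quirk for the affected Dell model; try
		 * NVME_QUIRK_NO_DEEPEST_PS first and, if that doesn't
		 * help, NVME_QUIRK_SIMPLE_SUSPEND instead.
		 */
		if (dmi_match(DMI_SYS_VENDOR, "Dell Inc.") &&
		    dmi_match(DMI_PRODUCT_NAME, "XPS 13 7390 2-in-1"))
			return NVME_QUIRK_NO_DEEPEST_PS;
	}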

Thanks & Regards,
Vidya Sagar

 drivers/pci/pci.c       |  7 +++++++
 drivers/pci/pci.h       |  4 ++++
 drivers/pci/pcie/aspm.c | 44 +++++++++++++++++++++++++++++++++++++++++
 3 files changed, 55 insertions(+)

Comments

Kenneth Crudup Feb. 1, 2022, 1:54 p.m. UTC | #1
On Tue, 1 Feb 2022, Vidya Sagar wrote:

> So,
> Could you please try this patch on top of linux-next and collect more info?

Sure! Give me a day or so (traveling) and I'll report back.

	-Kenny
Kenneth Crudup Feb. 1, 2022, 7:03 p.m. UTC | #2
The attachments made the messages too big; resending one at a time.

On Tue, 1 Feb 2022, Kenneth R. Crudup wrote:

>
> On Tue, 1 Feb 2022, Vidya Sagar wrote:
>
> > Kenneth R. Crudup <kenny@panix.com>
> > Could you please try this patch on top of linux-next and collect more info?
> > - 'sudo lspci -vvvv' output before and after hibernation
>
> See attached. Added a diff file, too.
>
> > - could you please confirm the working of this patch for non NVMe devices that
> >   support L1 Sub-States?
>
> It seems to work, I'm on it now. I saw a "GPU Hang" from the i915 driver, but
> it doesn't seem to be affecting the GPU any.
>
> > - Could you please try "NVME_QUIRK_NO_DEEPEST_PS" and "NVME_QUIRK_SIMPLE_SUSPEND"
> >   quirks (one at a time) in check_vendor_combination_bug() API and see if it
> >   makes any difference?
>
> Do you still need me to do that, since this appears to work?
>
> I will try another hibernate attempt just to be sure, too.
>
> 	-Kenny
>
>
Kenneth Crudup Feb. 1, 2022, 7:05 p.m. UTC | #3
The attachments made the messages too big; resending one at a time.

Kenneth Crudup Feb. 1, 2022, 7:10 p.m. UTC | #4
The attachments made the messages too big; resending one at a time.
(This one is the "lspci" after the 2nd hibernation attempt (which worked)).

Vidya Sagar Feb. 1, 2022, 7:24 p.m. UTC | #5
Hi Kenneth,
Thanks for sharing the files.
BTW, I see that the ASPM L1SS capability is supported by only two 
endpoints, viz. KIOXIA's NVMe drive and Realtek's card reader. None of 
the root ports seem to support it. So, I'm wondering how it was even 
getting enabled in the first place.
(OR)
Was it the case that L1SS sub-states were never enabled earlier either, 
and the issue was occurring without ASPM L1SS enabled (with only L0s 
and L1 enabled)?
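
For reference, an ASPM L1 substate can only be enabled on a link when both 
ends of the link advertise it in their L1SS Capabilities register; a 
simplified sketch of that check (not the exact aspm.c code, and the helper 
name is made up):

	/* Returns true if ASPM L1.2 could be enabled on the link between
	 * 'parent' (the root port or switch downstream port) and 'child'
	 * (the endpoint): both ends must advertise it in PCI_L1SS_CAP.
	 */
	static bool link_supports_aspm_l1_2(struct pci_dev *parent,
					    struct pci_dev *child)
	{
		int parent_l1ss = pci_find_ext_capability(parent, PCI_EXT_CAP_ID_L1SS);
		int child_l1ss = pci_find_ext_capability(child, PCI_EXT_CAP_ID_L1SS);
		u32 parent_cap, child_cap;

		if (!parent_l1ss || !child_l1ss)
			return false;

		pci_read_config_dword(parent, parent_l1ss + PCI_L1SS_CAP, &parent_cap);
		pci_read_config_dword(child, child_l1ss + PCI_L1SS_CAP, &child_cap);

		return parent_cap & child_cap & PCI_L1SS_CAP_ASPM_L1_2;
	}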

Also, I see from the 'before' and 'after' logs that for both the NVMe 
drive and the card reader, and their corresponding root ports, none of 
the ASPM states are enabled (not even L0s or L1).
Did you set the policy to 'powersupersave' before hibernating the system?

On 2/2/2022 12:23 AM, Kenneth R. Crudup wrote:
>
> On Tue, 1 Feb 2022, Vidya Sagar wrote:
> 
>> Kenneth R. Crudup <kenny@panix.com>
>> Could you please try this patch on top of linux-next and collect more info?
>> - 'sudo lspci -vvvv' output before and after hibernation
> 
> See attached. Added a diff file, too.
> 
>> - could you please confirm the working of this patch for non NVMe devices that
>>    support L1 Sub-States?
> 
> It seems to work, I'm on it now. I saw a "GPU Hang" from the i915 driver, but
> it doesn't seem to be affecting the GPU any.
> 
>> - Could you please try "NVME_QUIRK_NO_DEEPEST_PS" and "NVME_QUIRK_SIMPLE_SUSPEND"
>>    quirks (one at a time) in check_vendor_combination_bug() API and see if it
>>    makes any difference?
> 
> Do you still need me to do that, since this appears to work?
> 
> I will try another hibernate attempt just to be sure, too.
> 
>          -Kenny
> 
> --
> Kenneth R. Crudup / Sr. SW Engineer, Scott County Consulting, Orange County CA
>
Kenneth Crudup Feb. 1, 2022, 10:25 p.m. UTC | #6
On Wed, 2 Feb 2022, Vidya Sagar wrote:

> BTW, I see that the ASPM L1SS capability is supported by only two endpoints
> viz. KIOXIA's NVMe drive and Realtek's Card reader. None of the root ports
> seem to have the support. So, I'm wondering how was it even getting enabled in
> the first place earlier?

> (OR)

> was it the case that L1SS sub-states were never enabled earlier also and the
> issue was occurring without having ASPM L1SS enabled? (but with only L0s and
> L1 enabled??)

I'm not proficient enough in PCIe to be able to be sure of the answers to those-
what can/could I do to determine this?

> Also, I see that from 'before' and 'after' logs that for both NVMe and Card
> reader and their corresponding root ports, none of the ASPM states are enabled
> (not even L0s or L1).
> Did you set the policy to 'powersupersave' before hibernating the system?

Yeah:

CONFIG_PCIEASPM_POWER_SUPERSAVE=y

My laptop loses ~1.5%/hr in S3; I was trying anything I could to reduce that,
if possible.

	-Kenny
Vidya Sagar Feb. 2, 2022, 4:17 a.m. UTC | #7
On 2/2/2022 3:55 AM, Kenneth R. Crudup wrote:
>
> On Wed, 2 Feb 2022, Vidya Sagar wrote:
> 
>> BTW, I see that the ASPM L1SS capability is supported by only two endpoints
>> viz. KIOXIA's NVMe drive and Realtek's Card reader. None of the root ports
>> seem to have the support. So, I'm wondering how was it even getting enabled in
>> the first place earlier?
> 
>> (OR)
> 
>> was it the case that L1SS sub-states were never enabled earlier also and the
>> issue was occurring without having ASPM L1SS enabled? (but with only L0s and
>> L1 enabled??)
> 
> I'm not proficient enough in PCIe to be able to be sure of the answers to those-
> what can/could I do to determine this?
Nothing at this point, but could you please confirm that you are using 
the same system as before? If that is the case, then I'm not sure how 
it is possible that the earlier patch, which also just saves/restores 
the L1SS registers, could affect a system that doesn't even support L1SS.

Bjorn, any thoughts on this?

> 
>> Also, I see that from 'before' and 'after' logs that for both NVMe and Card
>> reader and their corresponding root ports, none of the ASPM states are enabled
>> (not even L0s or L1).
>> Did you set the policy to 'powersupersave' before hibernating the system?
> 
> Yeah:
> 
> CONFIG_PCIEASPM_POWER_SUPERSAVE=y
> 
> My laptop loses ~1.5%/hr in S3; I was trying anything I could to reduce that,
> if possible.
> 
>          -Kenny
> 
> --
> Kenneth R. Crudup / Sr. SW Engineer, Scott County Consulting, Orange County CA
>
Kenneth Crudup Feb. 4, 2022, 1:18 a.m. UTC | #8
On Wed, 2 Feb 2022, Vidya Sagar wrote:

> but could you please confirm that you are using the same system as before?

Yeah, the same Dell XPS 7390 2-in-1 as last year.

I've merged this change into Linus' master and it's been
suspending/resuming/hibernating with no issues so far. Is there anything else
you'd like me to test?

	-Kenny
Vidya Sagar Feb. 4, 2022, 5:37 a.m. UTC | #9
On 2/4/2022 6:48 AM, Kenneth R. Crudup wrote:
>
> On Wed, 2 Feb 2022, Vidya Sagar wrote:
> 
>> but could you please confirm that you are using the same system as before?
> 
> Yeah, the same Dell XPS 7390 2-in-1 as last year.
> 
> I've merged this change into Linus' master and it's been
> suspending/resuming/hibernating with no issues so far. Is there anything else
> you'd like me to test?
Nothing more at this point. Thanks for your time though.

> 
>          -Kenny
> 
> --
> Kenneth R. Crudup / Sr. SW Engineer, Scott County Consulting, Orange County CA
>
Abhishek Sahu Feb. 4, 2022, 11:23 a.m. UTC | #10
On 2/1/2022 6:05 PM, Vidya Sagar wrote:
> Previously ASPM L1 Substates control registers (CTL1 and CTL2) weren't
> saved and restored during suspend/resume leading to L1 Substates
> configuration being lost post-resume.
> 
> Save the L1 Substates control registers so that the configuration is
> retained post-resume.
> 

 Thanks Vidya for making this change.
 
 I have applied your patch on v5.17-rc2 and did 100 cycles of
 suspend/resume on an Alder Lake based notebook which has an
 NVIDIA discrete GPU, and it is working fine.

 After Boot:

 # lspci -d "0x10de:" -vvv|grep "L1SubCtl1"
   L1SubCtl1: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+ ASPM_L1.1+

 After Suspend/resume without this patch:

   L1SubCtl1: PCI-PM_L1.2- PCI-PM_L1.1- ASPM_L1.2- ASPM_L1.1-

 After Suspend/resume with this patch:
   L1SubCtl1: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+ ASPM_L1.1+


 Tested-by: Abhishek Sahu <abhsahu@nvidia.com>

 Thanks,
 Abhishek

> Signed-off-by: Vidya Sagar <vidyas@nvidia.com>
> ---
> Hi,
> Similar patch was merged in the past through the commit 4257f7e008ea
> ("PCI/ASPM: Save/restore L1SS Capability for suspend/resume") but it later
> got reverted through the commit 40fb68c7725a as Kenneth R. Crudup
> <kenny@panix.com> reported disk IO errors because of it. I couldn't spend much
> time debugging the issue back then, but taking a fresh look at the issue, it
> seems more like an issue with the platform in question than this change itself.
> Reason being that there are other devices that support ASPM L1 Sub-States
> on that platform (as observed in the lspci output mentioned at
> https://lore.kernel.org/linux-pci/53d3bd83-c0c2-d71f-9e5b-1dbdde55786@panix.com/ )
> and assuming that L1 Sub-States are indeed enabled for those devices, there
> are no issues reported from those devices except from the NVMe drive.
> When it comes to the NVMe driver, the code at
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/nvme/host/pci.c?h=v5.17-rc2#n3008
> has some quirks for some of the models from Dell Inc and I'm wondering if
> the model on which the issue was observed might need a quirk of its own??
> 
> So,
> Kenneth R. Crudup <kenny@panix.com>
> Could you please try this patch on top of linux-next and collect more info?
> - 'sudo lspci -vvvv' output before and after hibernation
> - could you please confirm the working of this patch for non NVMe devices that
>   support L1 Sub-States?
> - Could you please try "NVME_QUIRK_NO_DEEPEST_PS" and "NVME_QUIRK_SIMPLE_SUSPEND"
>   quirks (one at a time) in check_vendor_combination_bug() API and see if it
>   makes any difference?
> 
> Thanks & Regards,
> Vidya Sagar
> 
>  drivers/pci/pci.c       |  7 +++++++
>  drivers/pci/pci.h       |  4 ++++
>  drivers/pci/pcie/aspm.c | 44 +++++++++++++++++++++++++++++++++++++++++
>  3 files changed, 55 insertions(+)
> 
> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> index 9ecce435fb3f..75a8b264ddac 100644
> --- a/drivers/pci/pci.c
> +++ b/drivers/pci/pci.c
> @@ -1617,6 +1617,7 @@ int pci_save_state(struct pci_dev *dev)
>  		return i;
>  
>  	pci_save_ltr_state(dev);
> +	pci_save_aspm_l1ss_state(dev);
>  	pci_save_dpc_state(dev);
>  	pci_save_aer_state(dev);
>  	pci_save_ptm_state(dev);
> @@ -1723,6 +1724,7 @@ void pci_restore_state(struct pci_dev *dev)
>  	 * LTR itself (in the PCIe capability).
>  	 */
>  	pci_restore_ltr_state(dev);
> +	pci_restore_aspm_l1ss_state(dev);
>  
>  	pci_restore_pcie_state(dev);
>  	pci_restore_pasid_state(dev);
> @@ -3430,6 +3432,11 @@ void pci_allocate_cap_save_buffers(struct pci_dev *dev)
>  	if (error)
>  		pci_err(dev, "unable to allocate suspend buffer for LTR\n");
>  
> +	error = pci_add_ext_cap_save_buffer(dev, PCI_EXT_CAP_ID_L1SS,
> +					    2 * sizeof(u32));
> +	if (error)
> +		pci_err(dev, "unable to allocate suspend buffer for ASPM-L1SS\n");
> +
>  	pci_allocate_vc_save_buffers(dev);
>  }
>  
> diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
> index 3d60cabde1a1..5de1cfe07749 100644
> --- a/drivers/pci/pci.h
> +++ b/drivers/pci/pci.h
> @@ -562,11 +562,15 @@ void pcie_aspm_init_link_state(struct pci_dev *pdev);
>  void pcie_aspm_exit_link_state(struct pci_dev *pdev);
>  void pcie_aspm_pm_state_change(struct pci_dev *pdev);
>  void pcie_aspm_powersave_config_link(struct pci_dev *pdev);
> +void pci_save_aspm_l1ss_state(struct pci_dev *dev);
> +void pci_restore_aspm_l1ss_state(struct pci_dev *dev);
>  #else
>  static inline void pcie_aspm_init_link_state(struct pci_dev *pdev) { }
>  static inline void pcie_aspm_exit_link_state(struct pci_dev *pdev) { }
>  static inline void pcie_aspm_pm_state_change(struct pci_dev *pdev) { }
>  static inline void pcie_aspm_powersave_config_link(struct pci_dev *pdev) { }
> +static inline void pci_save_aspm_l1ss_state(struct pci_dev *dev) { }
> +static inline void pci_restore_aspm_l1ss_state(struct pci_dev *dev) { }
>  #endif
>  
>  #ifdef CONFIG_PCIE_ECRC
> diff --git a/drivers/pci/pcie/aspm.c b/drivers/pci/pcie/aspm.c
> index a96b7424c9bc..2c29fdd20059 100644
> --- a/drivers/pci/pcie/aspm.c
> +++ b/drivers/pci/pcie/aspm.c
> @@ -726,6 +726,50 @@ static void pcie_config_aspm_l1ss(struct pcie_link_state *link, u32 state)
>  				PCI_L1SS_CTL1_L1SS_MASK, val);
>  }
>  
> +void pci_save_aspm_l1ss_state(struct pci_dev *dev)
> +{
> +	int aspm_l1ss;
> +	struct pci_cap_saved_state *save_state;
> +	u32 *cap;
> +
> +	if (!pci_is_pcie(dev))
> +		return;
> +
> +	aspm_l1ss = pci_find_ext_capability(dev, PCI_EXT_CAP_ID_L1SS);
> +	if (!aspm_l1ss)
> +		return;
> +
> +	save_state = pci_find_saved_ext_cap(dev, PCI_EXT_CAP_ID_L1SS);
> +	if (!save_state)
> +		return;
> +
> +	cap = (u32 *)&save_state->cap.data[0];
> +	pci_read_config_dword(dev, aspm_l1ss + PCI_L1SS_CTL2, cap++);
> +	pci_read_config_dword(dev, aspm_l1ss + PCI_L1SS_CTL1, cap++);
> +}
> +
> +void pci_restore_aspm_l1ss_state(struct pci_dev *dev)
> +{
> +	int aspm_l1ss;
> +	struct pci_cap_saved_state *save_state;
> +	u32 *cap;
> +
> +	if (!pci_is_pcie(dev))
> +		return;
> +
> +	aspm_l1ss = pci_find_ext_capability(dev, PCI_EXT_CAP_ID_L1SS);
> +	if (!aspm_l1ss)
> +		return;
> +
> +	save_state = pci_find_saved_ext_cap(dev, PCI_EXT_CAP_ID_L1SS);
> +	if (!save_state)
> +		return;
> +
> +	cap = (u32 *)&save_state->cap.data[0];
> +	pci_write_config_dword(dev, aspm_l1ss + PCI_L1SS_CTL2, *cap++);
> +	pci_write_config_dword(dev, aspm_l1ss + PCI_L1SS_CTL1, *cap++);
> +}
> +
>  static void pcie_config_aspm_dev(struct pci_dev *pdev, u32 val)
>  {
>  	pcie_capability_clear_and_set_word(pdev, PCI_EXP_LNKCTL,
Bjorn Helgaas Feb. 4, 2022, 11:02 p.m. UTC | #11
On Wed, Feb 02, 2022 at 09:47:06AM +0530, Vidya Sagar wrote:
> On 2/2/2022 3:55 AM, Kenneth R. Crudup wrote:
> > On Wed, 2 Feb 2022, Vidya Sagar wrote:
> > > BTW, I see that the ASPM L1SS capability is supported by only
> > > two endpoints viz. KIOXIA's NVMe drive and Realtek's Card
> > > reader. None of the root ports seem to have the support. So, I'm
> > > wondering how was it even getting enabled in the first place
> > > earlier?
> > 
> > > (OR)
> > 
> > > was it the case that L1SS sub-states were never enabled earlier
> > > also and the issue was occurring without having ASPM L1SS
> > > enabled? (but with only L0s and L1 enabled??)
> > 
> > I'm not proficient enough in PCIe to be able to be sure of the
> > answers to those- what can/could I do to determine this?
>
> Nothing at this point, but could you please confirm that you are
> using the same system as before? if that is the case, then, I'm not
> sure how is it possible that the earlier patch which is also for
> saving/restoring L1SS registers could affect a system that doesn't
> even support L1SS.
> 
> Bjorn, any thoughts on this?

Do we have a theory on what might have caused the regression before?
Or at least something that is different this time around?

There's been a lot of discussion on this thread, so I'm going to
ignore this patch for now.  If you think it's good to go, please post
it again with the relevant Tested-bys and a note about why we think it
should work this time when it didn't last time.

Bjorn

> > > Also, I see that from 'before' and 'after' logs that for both
> > > NVMe and Card reader and their corresponding root ports, none of
> > > the ASPM states are enabled (not even L0s or L1).
> > > Did you set the policy to 'powersupersave' before hibernating
> > > the system?
> > 
> > Yeah:
> > 
> > CONFIG_PCIEASPM_POWER_SUPERSAVE=y
> > 
> > My laptop loses ~1.5%/hr in S3; I was trying anything I could to reduce that,
> > if possible.
> > 
> >          -Kenny
> > 
> > --
> > Kenneth R. Crudup / Sr. SW Engineer, Scott County Consulting, Orange County CA
> >
Kenneth Crudup Feb. 4, 2022, 11:17 p.m. UTC | #12
On Fri, 4 Feb 2022, Bjorn Helgaas wrote:

> Do we have a theory on what might have caused the regression before?
> Or at least something that is different this time around?

If you'd like, I could try re-applying the previous problem commit or your
attempted fix on top of Linus' master, to see if something was fixed
somewhere else in the PCIe subsystem; but if you think it's not worthwhile,
I'm satisfied with the current fix (or, more exactly for my particular
machine, the lack of regression).

	-Kenny
Vidya Sagar Feb. 5, 2022, 4:44 p.m. UTC | #13
On 2/5/2022 4:47 AM, Kenneth R. Crudup wrote:
>
> On Fri, 4 Feb 2022, Bjorn Helgaas wrote:
> 
>> Do we have a theory on what might have caused the regression before?
>> Or at least something that is different this time around?
> 
> If you'd like, I could try re-applying the previous problem commit or your
> attempted fix on top of Linus' master if you'd like to see if something was
> fixed somewhere else in the PCIe subsystem, but if you think it's not worth-
> while I'm satisfied with the current fix (or probably more-exactly for my
> particular machine, lack of regression).
That would be a good starting point to understand it better. In fact, if 
the previous problematic patch works fine on master, then we can be sure 
that something in the subsystem has since fixed the issue.
I'll wait for the outcome of this experiment before re-sending my patch.
Thanks in advance, Kenny, for volunteering for this activity.

- Vidya Sagar
> 
>          -Kenny
> 
> --
> Kenneth R. Crudup / Sr. SW Engineer, Scott County Consulting, Orange County CA
>
Kenneth Crudup Feb. 5, 2022, 5:32 p.m. UTC | #14
On Sat, 5 Feb 2022, Kenneth R. Crudup wrote:

> > If you'd like, I could try re-applying the previous problem commit or your
> > attempted fix on top of Linus' master if you'd like to see if something was
> > fixed somewhere else in the PCIe subsystem, but if you think it's not worth-
> > while I'm satisfied with the current fix (or probably more-exactly for my
> > particular machine, lack of regression).

On Sat, 5 Feb 2022, Vidya Sagar wrote:

> That would be a good starting point to understand it better. In fact if the
> previous problematic patch works fine on master, then, we are sure that
> something in the sub-system would have fixed the issue.

So this is my report of the regression I'd found with Bjorn's original commit:
----
Date: Fri, 25 Dec 2020 16:38:56
From: Kenneth R. Crudup <kenny@panix.com>
To: vidyas@nvidia.com
Cc: bhelgaas@google.com
Subject: Commit 4257f7e0 ("PCI/ASPM: Save/restore L1SS Capability for suspend/resume") causing hibernate resume
    failures

I've been running Linus' master branch on my laptop (Dell XPS 13 2-in-1). With
this commit in place, after resuming from hibernate my machine is essentially
useless, with a torrent of disk I/O errors on my NVMe device (at least, and
possibly other devices affected) until a reboot.

I do use tlp to set the PCIe ASPM to "performance" on AC and "powersupersave"
on battery.

Let me know if you need more information.
----

I just reapplied it on top of Linus' master and not only did it go in cleanly(!),
NOW I'm not getting any issues after a suspend/resume. I've attached the output
of "lspci -vvvvnn" before a hibernation (but not the very *first* one; if you
need that output, let me know) and will submit the same post-hibernation (which
is the same as the pre-hibernation case) and the post-suspend case (which is
slightly different) in subsequent E-mails (due to attachment size).

	-Kenny
Kenneth Crudup Feb. 5, 2022, 5:33 p.m. UTC | #15
After-suspend lspci

	-Kenny

Patch

diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index 9ecce435fb3f..75a8b264ddac 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -1617,6 +1617,7 @@  int pci_save_state(struct pci_dev *dev)
 		return i;
 
 	pci_save_ltr_state(dev);
+	pci_save_aspm_l1ss_state(dev);
 	pci_save_dpc_state(dev);
 	pci_save_aer_state(dev);
 	pci_save_ptm_state(dev);
@@ -1723,6 +1724,7 @@  void pci_restore_state(struct pci_dev *dev)
 	 * LTR itself (in the PCIe capability).
 	 */
 	pci_restore_ltr_state(dev);
+	pci_restore_aspm_l1ss_state(dev);
 
 	pci_restore_pcie_state(dev);
 	pci_restore_pasid_state(dev);
@@ -3430,6 +3432,11 @@  void pci_allocate_cap_save_buffers(struct pci_dev *dev)
 	if (error)
 		pci_err(dev, "unable to allocate suspend buffer for LTR\n");
 
+	error = pci_add_ext_cap_save_buffer(dev, PCI_EXT_CAP_ID_L1SS,
+					    2 * sizeof(u32));
+	if (error)
+		pci_err(dev, "unable to allocate suspend buffer for ASPM-L1SS\n");
+
 	pci_allocate_vc_save_buffers(dev);
 }
 
diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
index 3d60cabde1a1..5de1cfe07749 100644
--- a/drivers/pci/pci.h
+++ b/drivers/pci/pci.h
@@ -562,11 +562,15 @@  void pcie_aspm_init_link_state(struct pci_dev *pdev);
 void pcie_aspm_exit_link_state(struct pci_dev *pdev);
 void pcie_aspm_pm_state_change(struct pci_dev *pdev);
 void pcie_aspm_powersave_config_link(struct pci_dev *pdev);
+void pci_save_aspm_l1ss_state(struct pci_dev *dev);
+void pci_restore_aspm_l1ss_state(struct pci_dev *dev);
 #else
 static inline void pcie_aspm_init_link_state(struct pci_dev *pdev) { }
 static inline void pcie_aspm_exit_link_state(struct pci_dev *pdev) { }
 static inline void pcie_aspm_pm_state_change(struct pci_dev *pdev) { }
 static inline void pcie_aspm_powersave_config_link(struct pci_dev *pdev) { }
+static inline void pci_save_aspm_l1ss_state(struct pci_dev *dev) { }
+static inline void pci_restore_aspm_l1ss_state(struct pci_dev *dev) { }
 #endif
 
 #ifdef CONFIG_PCIE_ECRC
diff --git a/drivers/pci/pcie/aspm.c b/drivers/pci/pcie/aspm.c
index a96b7424c9bc..2c29fdd20059 100644
--- a/drivers/pci/pcie/aspm.c
+++ b/drivers/pci/pcie/aspm.c
@@ -726,6 +726,50 @@  static void pcie_config_aspm_l1ss(struct pcie_link_state *link, u32 state)
 				PCI_L1SS_CTL1_L1SS_MASK, val);
 }
 
+void pci_save_aspm_l1ss_state(struct pci_dev *dev)
+{
+	int aspm_l1ss;
+	struct pci_cap_saved_state *save_state;
+	u32 *cap;
+
+	if (!pci_is_pcie(dev))
+		return;
+
+	aspm_l1ss = pci_find_ext_capability(dev, PCI_EXT_CAP_ID_L1SS);
+	if (!aspm_l1ss)
+		return;
+
+	save_state = pci_find_saved_ext_cap(dev, PCI_EXT_CAP_ID_L1SS);
+	if (!save_state)
+		return;
+
+	cap = (u32 *)&save_state->cap.data[0];
+	pci_read_config_dword(dev, aspm_l1ss + PCI_L1SS_CTL2, cap++);
+	pci_read_config_dword(dev, aspm_l1ss + PCI_L1SS_CTL1, cap++);
+}
+
+void pci_restore_aspm_l1ss_state(struct pci_dev *dev)
+{
+	int aspm_l1ss;
+	struct pci_cap_saved_state *save_state;
+	u32 *cap;
+
+	if (!pci_is_pcie(dev))
+		return;
+
+	aspm_l1ss = pci_find_ext_capability(dev, PCI_EXT_CAP_ID_L1SS);
+	if (!aspm_l1ss)
+		return;
+
+	save_state = pci_find_saved_ext_cap(dev, PCI_EXT_CAP_ID_L1SS);
+	if (!save_state)
+		return;
+
+	cap = (u32 *)&save_state->cap.data[0];
+	pci_write_config_dword(dev, aspm_l1ss + PCI_L1SS_CTL2, *cap++);
+	pci_write_config_dword(dev, aspm_l1ss + PCI_L1SS_CTL1, *cap++);
+}
+
 static void pcie_config_aspm_dev(struct pci_dev *pdev, u32 val)
 {
 	pcie_capability_clear_and_set_word(pdev, PCI_EXP_LNKCTL,