diff mbox series

[v1] rtw88: pci: disable aspm for platform inter-op with module parameter

Message ID 20200605074703.32726-1-yhchuang@realtek.com (mailing list archive)
State Accepted
Commit 68aa716b7dd36f55e080da9e27bc594346334c41
Delegated to: Kalle Valo
Headers show
Series [v1] rtw88: pci: disable aspm for platform inter-op with module parameter | expand

Commit Message

Tony Chuang June 5, 2020, 7:47 a.m. UTC
From: Yan-Hsuan Chuang <yhchuang@realtek.com>

Some platforms cannot read the DBI register successfully for the
ASPM settings. After the read failed, the bus could be unstable,
and the device just became unavailable [1]. For those platforms,
the ASPM should be disabled. But as the ASPM can help the driver
to save the power consumption in power save mode, the ASPM is still
needed. So, add a module parameter for them to disable it, then
the device can still work, while others can benefit from the less
power consumption that brings by ASPM enabled.

[1] https://bugzilla.kernel.org/show_bug.cgi?id=206411
[2] Note that my lenovo T430 is the same.

Fixes: 3dff7c6e3749 ("rtw88: allows to enable/disable HCI link PS mechanism")
Signed-off-by: Yan-Hsuan Chuang <yhchuang@realtek.com>
---
 drivers/net/wireless/realtek/rtw88/pci.c | 9 +++++++++
 1 file changed, 9 insertions(+)

Comments

Sebastian Andrzej Siewior June 10, 2020, 9:37 p.m. UTC | #1
On 2020-06-05 15:47:03 [+0800], yhchuang@realtek.com wrote:
> From: Yan-Hsuan Chuang <yhchuang@realtek.com>
> 
> Some platforms cannot read the DBI register successfully for the
> ASPM settings. After the read failed, the bus could be unstable,
> and the device just became unavailable [1]. For those platforms,
> the ASPM should be disabled. But as the ASPM can help the driver
> to save the power consumption in power save mode, the ASPM is still
> needed. So, add a module parameter for them to disable it, then
> the device can still work, while others can benefit from the less
> power consumption that brings by ASPM enabled.

Can you set disable_aspm if rtw_dbi_read8() fails? Or make a test if it
is save to use?

If someone notices the warning they still have to search for the warning
in order to make the link towards loading the module with the
disable_aspm=1 paramter.
Is it known what causes the failure?

> [1] https://bugzilla.kernel.org/show_bug.cgi?id=206411
> [2] Note that my lenovo T430 is the same.

Sebastian
Tony Chuang June 16, 2020, 11:06 a.m. UTC | #2
> On 2020-06-05 15:47:03 [+0800], yhchuang@realtek.com wrote:
> > From: Yan-Hsuan Chuang <yhchuang@realtek.com>
> >
> > Some platforms cannot read the DBI register successfully for the
> > ASPM settings. After the read failed, the bus could be unstable,
> > and the device just became unavailable [1]. For those platforms,
> > the ASPM should be disabled. But as the ASPM can help the driver
> > to save the power consumption in power save mode, the ASPM is still
> > needed. So, add a module parameter for them to disable it, then
> > the device can still work, while others can benefit from the less
> > power consumption that brings by ASPM enabled.
> 
> Can you set disable_aspm if rtw_dbi_read8() fails? Or make a test if it
> is save to use?
> 
> If someone notices the warning they still have to search for the warning
> in order to make the link towards loading the module with the
> disable_aspm=1 paramter.
> Is it known what causes the failure?
> 

I think as long as the rtw_dbi_read() fails, the consequent register
operation will also fail, and still get an error read/write the register.
And this is some sort of PCI issue, and I am not really familiar with it.
Such as the root cause or how it fails.

If we can default disable it, then we can help those platforms, but
then other platform will suffer from higher power consumption.

Yen-Hsuan
Sebastian Andrzej Siewior June 16, 2020, 1:35 p.m. UTC | #3
On 2020-06-16 11:06:28 [+0000], Tony Chuang wrote:
> > On 2020-06-05 15:47:03 [+0800], yhchuang@realtek.com wrote:
> > > From: Yan-Hsuan Chuang <yhchuang@realtek.com>
> > >
> > > Some platforms cannot read the DBI register successfully for the
> > > ASPM settings. After the read failed, the bus could be unstable,
> > > and the device just became unavailable [1]. For those platforms,
> > > the ASPM should be disabled. But as the ASPM can help the driver
> > > to save the power consumption in power save mode, the ASPM is still
> > > needed. So, add a module parameter for them to disable it, then
> > > the device can still work, while others can benefit from the less
> > > power consumption that brings by ASPM enabled.
> > 
> > Can you set disable_aspm if rtw_dbi_read8() fails? Or make a test if it
> > is save to use?
> > 
> > If someone notices the warning they still have to search for the warning
> > in order to make the link towards loading the module with the
> > disable_aspm=1 paramter.
> > Is it known what causes the failure?
> > 
> 
> I think as long as the rtw_dbi_read() fails, the consequent register
> operation will also fail, and still get an error read/write the register.
> And this is some sort of PCI issue, and I am not really familiar with it.
> Such as the root cause or how it fails.

Then it does not sound safe to enable it by default.

> If we can default disable it, then we can help those platforms, but
> then other platform will suffer from higher power consumption.

So for those platform, where the error occurs, you expect that the user
manages to read the error message (a backtrace from rtw_dbi_read8()) and
connects this the need to set a certain module option.

> Yen-Hsuan

Sebastian
Tony Chuang June 17, 2020, 5:30 a.m. UTC | #4
0000], Tony Chuang wrote:
> > > On 2020-06-05 15:47:03 [+0800], yhchuang@realtek.com wrote:
> > > > From: Yan-Hsuan Chuang <yhchuang@realtek.com>
> > > >
> > > > Some platforms cannot read the DBI register successfully for the
> > > > ASPM settings. After the read failed, the bus could be unstable,
> > > > and the device just became unavailable [1]. For those platforms,
> > > > the ASPM should be disabled. But as the ASPM can help the driver
> > > > to save the power consumption in power save mode, the ASPM is still
> > > > needed. So, add a module parameter for them to disable it, then
> > > > the device can still work, while others can benefit from the less
> > > > power consumption that brings by ASPM enabled.
> > >
> > > Can you set disable_aspm if rtw_dbi_read8() fails? Or make a test if it
> > > is save to use?
> > >
> > > If someone notices the warning they still have to search for the warning
> > > in order to make the link towards loading the module with the
> > > disable_aspm=1 paramter.
> > > Is it known what causes the failure?
> > >
> >
> > I think as long as the rtw_dbi_read() fails, the consequent register
> > operation will also fail, and still get an error read/write the register.
> > And this is some sort of PCI issue, and I am not really familiar with it.
> > Such as the root cause or how it fails.
> 
> Then it does not sound safe to enable it by default.

We have had a discussion about this, but I cannot find the thread now.
People suggested that the module parameter should not be used.
And they think that if the ASPM can help for power consumption, then
it should be default enabled. But I think it should be based on that the
other platforms will not just fail to bring up the device. However, the
platforms are less than the others, not sure if default enable or disable
is better.

> 
> > If we can default disable it, then we can help those platforms, but
> > then other platform will suffer from higher power consumption.
> 
> So for those platform, where the error occurs, you expect that the user
> manages to read the error message (a backtrace from rtw_dbi_read8()) and
> connects this the need to set a certain module option.

Yes, we can discuss if it should be default enabled or not. Otherwise the
people with those platforms can only do that to prevent this. Really bad.

> Sebastian
> 

Yen-Hsuan
Sebastian Andrzej Siewior June 17, 2020, 7:29 a.m. UTC | #5
On 2020-06-17 05:30:22 [+0000], Tony Chuang wrote:
> 0000], Tony Chuang wrote:
> > > > On 2020-06-05 15:47:03 [+0800], yhchuang@realtek.com wrote:
> > > > > From: Yan-Hsuan Chuang <yhchuang@realtek.com>
> > > > >
> > > > > Some platforms cannot read the DBI register successfully for the
> > > > > ASPM settings. After the read failed, the bus could be unstable,
> > > > > and the device just became unavailable [1]. For those platforms,
> > > > > the ASPM should be disabled. But as the ASPM can help the driver
> > > > > to save the power consumption in power save mode, the ASPM is still
> > > > > needed. So, add a module parameter for them to disable it, then
> > > > > the device can still work, while others can benefit from the less
> > > > > power consumption that brings by ASPM enabled.
> > > >
> > > > Can you set disable_aspm if rtw_dbi_read8() fails? Or make a test if it
> > > > is save to use?
> > > >
> > > > If someone notices the warning they still have to search for the warning
> > > > in order to make the link towards loading the module with the
> > > > disable_aspm=1 paramter.
> > > > Is it known what causes the failure?
> > > >
> > >
> > > I think as long as the rtw_dbi_read() fails, the consequent register
> > > operation will also fail, and still get an error read/write the register.
> > > And this is some sort of PCI issue, and I am not really familiar with it.
> > > Such as the root cause or how it fails.
> > 
> > Then it does not sound safe to enable it by default.
> 
> We have had a discussion about this, but I cannot find the thread now.
> People suggested that the module parameter should not be used.
> And they think that if the ASPM can help for power consumption, then
> it should be default enabled. But I think it should be based on that the
> other platforms will not just fail to bring up the device. However, the
> platforms are less than the others, not sure if default enable or disable
> is better.

What I fail to understand is if this error affects other PCI devices as
well or just this one. And if it is possible to reset the wifi device
and everything gets back no normal. Or is it just the register reading,
that spams the log and would affect the system otherwise if you would
just avoided after the first fail.

> > > If we can default disable it, then we can help those platforms, but
> > > then other platform will suffer from higher power consumption.
> > 
> > So for those platform, where the error occurs, you expect that the user
> > manages to read the error message (a backtrace from rtw_dbi_read8()) and
> > connects this the need to set a certain module option.
> 
> Yes, we can discuss if it should be default enabled or not. Otherwise the
> people with those platforms can only do that to prevent this. Really bad.

It would be good to know the root cause of this. So then default enable
would depend on it.
You could have a allow/forbid list based on DMI once you identified
good/bad systems but this includes additional maintenance.

I think that at the very least, if the read fails you should give the
user additional information how to stop this from happening again. And
either stop issuing the commands again or skip driver loading (depending
what it means for device stability).

> Yen-Hsuan

Sebastian
Larry Finger June 20, 2020, 7:53 p.m. UTC | #6
On 6/5/20 2:47 AM, yhchuang@realtek.com wrote:
> From: Yan-Hsuan Chuang <yhchuang@realtek.com>
> 
> Some platforms cannot read the DBI register successfully for the
> ASPM settings. After the read failed, the bus could be unstable,
> and the device just became unavailable [1]. For those platforms,
> the ASPM should be disabled. But as the ASPM can help the driver
> to save the power consumption in power save mode, the ASPM is still
> needed. So, add a module parameter for them to disable it, then
> the device can still work, while others can benefit from the less
> power consumption that brings by ASPM enabled.
> 
> [1] https://bugzilla.kernel.org/show_bug.cgi?id=206411
> [2] Note that my lenovo T430 is the same.

As someone who maintains these drivers in a GitHub repo so that users of older 
kernels can have access to them, I am in favor of this module option. Only a 
very few cases would need to disable ASPM, and I see no reason for everyone else 
to use additional power as would be needed with automatic disabling. Adding a 
new machine to some quirk list would be more difficult than merely telling a 
novice user how to turn ASPM off for their system.

In case someone is collecting machines that would need a quirk, I found another 
one as shown in https://github.com/lwfinger/rtlwifi_new/issues/622. That one is 
a Lenovo Thinkpad E490.

Larry
Kalle Valo July 15, 2020, 9 a.m. UTC | #7
Sebastian Andrzej Siewior <bigeasy@linutronix.de> writes:

> On 2020-06-17 05:30:22 [+0000], Tony Chuang wrote:
>> 0000], Tony Chuang wrote:
>> > > > On 2020-06-05 15:47:03 [+0800], yhchuang@realtek.com wrote:
>> > > > > From: Yan-Hsuan Chuang <yhchuang@realtek.com>
>> > > > >
>> > > > > Some platforms cannot read the DBI register successfully for the
>> > > > > ASPM settings. After the read failed, the bus could be unstable,
>> > > > > and the device just became unavailable [1]. For those platforms,
>> > > > > the ASPM should be disabled. But as the ASPM can help the driver
>> > > > > to save the power consumption in power save mode, the ASPM is still
>> > > > > needed. So, add a module parameter for them to disable it, then
>> > > > > the device can still work, while others can benefit from the less
>> > > > > power consumption that brings by ASPM enabled.
>> > > >
>> > > > Can you set disable_aspm if rtw_dbi_read8() fails? Or make a test if it
>> > > > is save to use?
>> > > >
>> > > > If someone notices the warning they still have to search for the warning
>> > > > in order to make the link towards loading the module with the
>> > > > disable_aspm=1 paramter.
>> > > > Is it known what causes the failure?
>> > > >
>> > >
>> > > I think as long as the rtw_dbi_read() fails, the consequent register
>> > > operation will also fail, and still get an error read/write the register.
>> > > And this is some sort of PCI issue, and I am not really familiar with it.
>> > > Such as the root cause or how it fails.
>> > 
>> > Then it does not sound safe to enable it by default.
>> 
>> We have had a discussion about this, but I cannot find the thread now.
>> People suggested that the module parameter should not be used.
>> And they think that if the ASPM can help for power consumption, then
>> it should be default enabled. But I think it should be based on that the
>> other platforms will not just fail to bring up the device. However, the
>> platforms are less than the others, not sure if default enable or disable
>> is better.
>
> What I fail to understand is if this error affects other PCI devices as
> well or just this one. And if it is possible to reset the wifi device
> and everything gets back no normal. Or is it just the register reading,
> that spams the log and would affect the system otherwise if you would
> just avoided after the first fail.
>
>> > > If we can default disable it, then we can help those platforms, but
>> > > then other platform will suffer from higher power consumption.
>> > 
>> > So for those platform, where the error occurs, you expect that the user
>> > manages to read the error message (a backtrace from rtw_dbi_read8()) and
>> > connects this the need to set a certain module option.
>> 
>> Yes, we can discuss if it should be default enabled or not. Otherwise the
>> people with those platforms can only do that to prevent this. Really bad.
>
> It would be good to know the root cause of this. So then default enable
> would depend on it.
> You could have a allow/forbid list based on DMI once you identified
> good/bad systems but this includes additional maintenance.

I think there should be this kind of quirk list in rtw88 which would
disable ASPM automatically on problematic platforms. We should not
require the user to figure out the problem on their own and disable ASPM
manually using the module parameter.

> I think that at the very least, if the read fails you should give the
> user additional information how to stop this from happening again. And
> either stop issuing the commands again or skip driver loading (depending
> what it means for device stability).

Yes, if we can guess that this is an ASPM problem giving additional
information is very helpful for the user, please do that as well.
Kalle Valo July 15, 2020, 9:10 a.m. UTC | #8
<yhchuang@realtek.com> wrote:

> From: Yan-Hsuan Chuang <yhchuang@realtek.com>
> 
> Some platforms cannot read the DBI register successfully for the
> ASPM settings. After the read failed, the bus could be unstable,
> and the device just became unavailable [1]. For those platforms,
> the ASPM should be disabled. But as the ASPM can help the driver
> to save the power consumption in power save mode, the ASPM is still
> needed. So, add a module parameter for them to disable it, then
> the device can still work, while others can benefit from the less
> power consumption that brings by ASPM enabled.
> 
> [1] https://bugzilla.kernel.org/show_bug.cgi?id=206411
> [2] Note that my lenovo T430 is the same.
> 
> Fixes: 3dff7c6e3749 ("rtw88: allows to enable/disable HCI link PS mechanism")
> Signed-off-by: Yan-Hsuan Chuang <yhchuang@realtek.com>

Patch applied to wireless-drivers-next.git, thanks.

68aa716b7dd3 rtw88: pci: disable aspm for platform inter-op with module parameter
Tony Chuang July 15, 2020, 9:31 a.m. UTC | #9
> Sebastian Andrzej Siewior <bigeasy@linutronix.de> writes:
> 
> > On 2020-06-17 05:30:22 [+0000], Tony Chuang wrote:
> >> 0000], Tony Chuang wrote:
> >> > > > On 2020-06-05 15:47:03 [+0800], yhchuang@realtek.com wrote:
> >> > > > > From: Yan-Hsuan Chuang <yhchuang@realtek.com>
> >> > > > >
> >> > > > > Some platforms cannot read the DBI register successfully for the
> >> > > > > ASPM settings. After the read failed, the bus could be unstable,
> >> > > > > and the device just became unavailable [1]. For those platforms,
> >> > > > > the ASPM should be disabled. But as the ASPM can help the driver
> >> > > > > to save the power consumption in power save mode, the ASPM is still
> >> > > > > needed. So, add a module parameter for them to disable it, then
> >> > > > > the device can still work, while others can benefit from the less
> >> > > > > power consumption that brings by ASPM enabled.
> >> > > >
> >> > > > Can you set disable_aspm if rtw_dbi_read8() fails? Or make a test if it
> >> > > > is save to use?
> >> > > >
> >> > > > If someone notices the warning they still have to search for the warning
> >> > > > in order to make the link towards loading the module with the
> >> > > > disable_aspm=1 paramter.
> >> > > > Is it known what causes the failure?
> >> > > >
> >> > >
> >> > > I think as long as the rtw_dbi_read() fails, the consequent register
> >> > > operation will also fail, and still get an error read/write the register.
> >> > > And this is some sort of PCI issue, and I am not really familiar with it.
> >> > > Such as the root cause or how it fails.
> >> >
> >> > Then it does not sound safe to enable it by default.
> >>
> >> We have had a discussion about this, but I cannot find the thread now.
> >> People suggested that the module parameter should not be used.
> >> And they think that if the ASPM can help for power consumption, then
> >> it should be default enabled. But I think it should be based on that the
> >> other platforms will not just fail to bring up the device. However, the
> >> platforms are less than the others, not sure if default enable or disable
> >> is better.
> >
> > What I fail to understand is if this error affects other PCI devices as
> > well or just this one. And if it is possible to reset the wifi device
> > and everything gets back no normal. Or is it just the register reading,
> > that spams the log and would affect the system otherwise if you would
> > just avoided after the first fail.
> >
> >> > > If we can default disable it, then we can help those platforms, but
> >> > > then other platform will suffer from higher power consumption.
> >> >
> >> > So for those platform, where the error occurs, you expect that the user
> >> > manages to read the error message (a backtrace from rtw_dbi_read8())
> and
> >> > connects this the need to set a certain module option.
> >>
> >> Yes, we can discuss if it should be default enabled or not. Otherwise the
> >> people with those platforms can only do that to prevent this. Really bad.
> >
> > It would be good to know the root cause of this. So then default enable
> > would depend on it.
> > You could have a allow/forbid list based on DMI once you identified
> > good/bad systems but this includes additional maintenance.
> 
> I think there should be this kind of quirk list in rtw88 which would
> disable ASPM automatically on problematic platforms. We should not
> require the user to figure out the problem on their own and disable ASPM
> manually using the module parameter.

OK, I'll add a quirk list for the platforms.

> 
> > I think that at the very least, if the read fails you should give the
> > user additional information how to stop this from happening again. And
> > either stop issuing the commands again or skip driver loading (depending
> > what it means for device stability).
> 
> Yes, if we can guess that this is an ASPM problem giving additional
> information is very helpful for the user, please do that as well.
> 

Yes, should add additional information for the users, so they can
help to report the platforms.

Yen-Hsuan
diff mbox series

Patch

diff --git a/drivers/net/wireless/realtek/rtw88/pci.c b/drivers/net/wireless/realtek/rtw88/pci.c
index 8228db9a5fc8..3413973bc475 100644
--- a/drivers/net/wireless/realtek/rtw88/pci.c
+++ b/drivers/net/wireless/realtek/rtw88/pci.c
@@ -14,8 +14,11 @@ 
 #include "debug.h"
 
 static bool rtw_disable_msi;
+static bool rtw_pci_disable_aspm;
 module_param_named(disable_msi, rtw_disable_msi, bool, 0644);
+module_param_named(disable_aspm, rtw_pci_disable_aspm, bool, 0644);
 MODULE_PARM_DESC(disable_msi, "Set Y to disable MSI interrupt support");
+MODULE_PARM_DESC(disable_aspm, "Set Y to disable PCI ASPM support");
 
 static u32 rtw_pci_tx_queue_idx_addr[] = {
 	[RTW_TX_QUEUE_BK]	= RTK_PCI_TXBD_IDX_BKQ,
@@ -1200,6 +1203,9 @@  static void rtw_pci_clkreq_set(struct rtw_dev *rtwdev, bool enable)
 	u8 value;
 	int ret;
 
+	if (rtw_pci_disable_aspm)
+		return;
+
 	ret = rtw_dbi_read8(rtwdev, RTK_PCIE_LINK_CFG, &value);
 	if (ret) {
 		rtw_err(rtwdev, "failed to read CLKREQ_L1, ret=%d", ret);
@@ -1219,6 +1225,9 @@  static void rtw_pci_aspm_set(struct rtw_dev *rtwdev, bool enable)
 	u8 value;
 	int ret;
 
+	if (rtw_pci_disable_aspm)
+		return;
+
 	ret = rtw_dbi_read8(rtwdev, RTK_PCIE_LINK_CFG, &value);
 	if (ret) {
 		rtw_err(rtwdev, "failed to read ASPM, ret=%d", ret);