[1/3] mwifiex: disable ps_mode explicitly by default instead

Message ID 20201028142433.18501-2-kitakar@gmail.com (mailing list archive)
State Not Applicable
Delegated to: Netdev Maintainers
Series mwifiex: disable ps_mode by default for stability

Commit Message

Tsuchiya Yuto Oct. 28, 2020, 2:24 p.m. UTC
On Microsoft Surface devices (PCIe-88W8897), ps_mode makes the
connection unstable, especially with 5GHz APs, and eventually causes
a fw crash.

This commit disables ps_mode by default instead of enabling it.

Required code is extracted from mwifiex_drv_set_power().

Signed-off-by: Tsuchiya Yuto <kitakar@gmail.com>
---
 drivers/net/wireless/marvell/mwifiex/sta_cmd.c | 11 ++++++++---
 1 file changed, 8 insertions(+), 3 deletions(-)

Comments

Brian Norris Oct. 29, 2020, 6:25 p.m. UTC | #1
On Wed, Oct 28, 2020 at 7:04 PM Tsuchiya Yuto <kitakar@gmail.com> wrote:
>
> On Microsoft Surface devices (PCIe-88W8897), ps_mode makes the
> connection unstable, especially with 5GHz APs, and eventually causes
> a fw crash.
>
> This commit disables ps_mode by default instead of enabling it.
>
> Required code is extracted from mwifiex_drv_set_power().
>
> Signed-off-by: Tsuchiya Yuto <kitakar@gmail.com>

You should read up on WIPHY_FLAG_PS_ON_BY_DEFAULT and
CONFIG_CFG80211_DEFAULT_PS, and set/respect those appropriately (hint:
mwifiex sets WIPHY_FLAG_PS_ON_BY_DEFAULT, and your patch makes this a
lie). Also, this seems like a quirk that you haven't properly worked
out -- if you're working on a quirk framework in your other series,
you should just key into that.
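
The mismatch described above can be modelled in a few lines of userspace C. This is only a sketch under stated assumptions: `MOCK_FLAG_PS_ON_BY_DEFAULT`, `struct mock_wiphy`, `struct mock_adapter` and both helpers are hypothetical stand-ins, not the cfg80211 or mwifiex definitions. It shows why the advertised default and the driver's initial ps_mode should come from a single decision.

```c
#include <stdbool.h>

/* Hypothetical stand-in for WIPHY_FLAG_PS_ON_BY_DEFAULT; the value is arbitrary. */
#define MOCK_FLAG_PS_ON_BY_DEFAULT (1u << 0)

/* Hypothetical stand-ins for the driver's two power modes
 * (CAM: power save off, PSP: power save on). */
enum mock_ps_mode { MOCK_PS_CAM, MOCK_PS_PSP };

struct mock_wiphy { unsigned int flags; };
struct mock_adapter { enum mock_ps_mode ps_mode; };

/* Derive the advertised flag and the driver default from one input,
 * so the two cannot disagree. */
static void mock_apply_ps_default(struct mock_wiphy *wiphy,
				  struct mock_adapter *adapter,
				  bool ps_broken)
{
	if (ps_broken) {
		wiphy->flags &= ~MOCK_FLAG_PS_ON_BY_DEFAULT;
		adapter->ps_mode = MOCK_PS_CAM;
	} else {
		wiphy->flags |= MOCK_FLAG_PS_ON_BY_DEFAULT;
		adapter->ps_mode = MOCK_PS_PSP;
	}
}

/* True when what is advertised matches what the driver actually set. */
static bool mock_ps_default_is_consistent(const struct mock_wiphy *wiphy,
					  const struct mock_adapter *adapter)
{
	bool advertised = (wiphy->flags & MOCK_FLAG_PS_ON_BY_DEFAULT) != 0;

	return advertised == (adapter->ps_mode == MOCK_PS_PSP);
}
```

In this model the patch as posted corresponds to a state with the flag set and ps_mode at CAM, which the consistency check rejects.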

For the record, Chrome OS supports plenty of mwifiex systems with 8897
(SDIO only) and 8997 (PCIe), with PS enabled, and you're hurting
those. Your problem sounds to be exclusively a problem with the PCIe
8897 firmware.

As-is, NAK.

Brian
Andy Shevchenko Oct. 29, 2020, 6:36 p.m. UTC | #2
On Thu, Oct 29, 2020 at 8:29 PM Brian Norris <briannorris@chromium.org> wrote:
> On Wed, Oct 28, 2020 at 7:04 PM Tsuchiya Yuto <kitakar@gmail.com> wrote:

...

> For the record, Chrome OS supports plenty of mwifiex systems with 8897
> (SDIO only) and 8997 (PCIe), with PS enabled, and you're hurting
> those. Your problem sounds to be exclusively a problem with the PCIe
> 8897 firmware.

And this feeling (that it's a FW issue) is what I have. But the problem
here is that Marvell didn't fix and probably won't fix their FW...

Just wondering if Google (and MS in their turn) use different
firmwares to what we have available in Linux.
Brian Norris Oct. 29, 2020, 6:56 p.m. UTC | #3
On Thu, Oct 29, 2020 at 11:37 AM Andy Shevchenko
<andy.shevchenko@gmail.com> wrote:
> And this feeling (that it's a FW issue) is what I have. But the problem
> here is that Marvell didn't fix and probably won't fix their FW...

Sure, I wouldn't hold your breath. So some of these tactics (disabling
PS, etc.) may be valid, but you have to do them smartly, acknowledging
that there are other (more stable) firmwares and chips in use for this
same driver.

> Just wondering if Google (and MS in their turn) use different
> firmwares to what we have available in Linux.

No clue about MS. But Chrom{e,ium} OS generally publishes all this
stuff where possible. You can see what we use here:

https://chromium.googlesource.com/chromiumos/third_party/linux-firmware/+/HEAD/mrvl/
https://chromium.googlesource.com/chromiumos/third_party/marvell/+/HEAD/

We try to stay somewhat in sync / parallel with "upstream"
linux-firmware, and strongly encourage vendors to send the same
binaries upstream when they hand them to us, but there are exceptions
and oversights (e.g., old products might have used a different
firmware branch).

Notably, I'll repeat: we (Chrome OS) don't actually support the PCIe
variant of 8897, so the report in question ("PCIe-88W8897") has no
equivalent in a supported Chrome OS system (even if there are binaries
in the links above, we don't use them). I would not be surprised if
there are an enormous number of firmware bugs there, as there were
initially for PCIe-88W8997 (which we do support).

Brian
Tsuchiya Yuto Oct. 30, 2020, 8:04 a.m. UTC | #4
On Thu, 2020-10-29 at 11:25 -0700, Brian Norris wrote:
> On Wed, Oct 28, 2020 at 7:04 PM Tsuchiya Yuto <kitakar@gmail.com> wrote:
> > 
> > On Microsoft Surface devices (PCIe-88W8897), ps_mode makes the
> > connection unstable, especially with 5GHz APs, and eventually causes
> > a fw crash.
> > 
> > This commit disables ps_mode by default instead of enabling it.
> > 
> > Required code is extracted from mwifiex_drv_set_power().
> > 
> > Signed-off-by: Tsuchiya Yuto <kitakar@gmail.com>
> 
> You should read up on WIPHY_FLAG_PS_ON_BY_DEFAULT and
> CONFIG_CFG80211_DEFAULT_PS, and set/respect those appropriately (hint:
> mwifiex sets WIPHY_FLAG_PS_ON_BY_DEFAULT, and your patch makes this a
> lie). Also, this seems like a quirk that you haven't properly worked
> out -- if you're working on a quirk framework in your other series,
> you should just key into that.

Thanks for the review! I didn't know about the flag, much appreciated.
After explicitly clearing the flag, userspace indeed no longer tries to
enable power_save, at least in the short time I have tested it. I wonder
if I can drop the second patch (adding the module parameter) now. But I
still want to make sure that power_save won't be enabled by userspace
tools by default.

Regarding quirks, I also don't want to break existing users. So, of course
I can try to use the quirk framework if we really can't fix the firmware.
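
A minimal sketch of what keying this into a quirk table could look like, written as plain userspace C. Everything here is hypothetical: the `mock_` names, the `MOCK_QUIRK_NO_PS_DEFAULT` bit, and the choice to match on bus plus chip name are assumptions for illustration. The quirk framework in the other series may match on something else entirely.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical quirk bit: do not enable power save by default. */
#define MOCK_QUIRK_NO_PS_DEFAULT (1u << 0)

struct mock_quirk_entry {
	const char *interface;	/* bus type, e.g. "pcie" or "sdio" */
	const char *chip;	/* chip name, e.g. "88W8897" */
	unsigned int quirks;	/* bitmask of MOCK_QUIRK_* values */
};

/* Only the combination reported as broken in this thread is listed. */
static const struct mock_quirk_entry mock_quirk_table[] = {
	{ "pcie", "88W8897", MOCK_QUIRK_NO_PS_DEFAULT },
};

/* Return the quirk bitmask for a device, or 0 when it is not listed. */
static unsigned int mock_lookup_quirks(const char *interface, const char *chip)
{
	size_t i;
	size_t n = sizeof(mock_quirk_table) / sizeof(mock_quirk_table[0]);

	for (i = 0; i < n; i++) {
		const struct mock_quirk_entry *e = &mock_quirk_table[i];

		if (!strcmp(e->interface, interface) && !strcmp(e->chip, chip))
			return e->quirks;
	}
	return 0;
}
```

With a lookup like this, SDIO 8897 and PCIe 8997 systems get no quirk bits and keep power save enabled, which is the behaviour existing users rely on.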

> For the record, Chrome OS supports plenty of mwifiex systems with 8897
> (SDIO only) and 8997 (PCIe), with PS enabled, and you're hurting
> those. Your problem sounds to be exclusively a problem with the PCIe
> 8897 firmware.

Actually, I already know that some Chromebooks use these mwifiex cards
(but not our PCIe-88W8897) because I personally like chromiumos. I'm
always wondering what is the difference. If the difference is firmware,
our PCIe-88W8897 firmware should really be fixed instead of this stupid
series.

Yes, I'm sorry that I know this series is just a stupid one but I have to
send this anyway because this stability issue has not been fixed for a
long time. I should have added this buglink to every commit as well:

BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=109681

If the firmware can't be fixed, I'm afraid I have to go this way. It makes
no sense to keep enabling power_save for the affected devices if we know
it's broken.

> As-is, NAK.
> 
> Brian
Brian Norris Nov. 20, 2020, 9:04 p.m. UTC | #5
On Fri, Oct 30, 2020 at 1:04 AM Tsuchiya Yuto <kitakar@gmail.com> wrote:
> On Thu, 2020-10-29 at 11:25 -0700, Brian Norris wrote:
> > For the record, Chrome OS supports plenty of mwifiex systems with 8897
> > (SDIO only) and 8997 (PCIe), with PS enabled, and you're hurting
> > those. Your problem sounds to be exclusively a problem with the PCIe
> > 8897 firmware.
>
> Actually, I already know that some Chromebooks use these mwifiex cards
> (but not our PCIe-88W8897) because I personally like chromiumos. I'm
> always wondering what is the difference. If the difference is firmware,
> our PCIe-88W8897 firmware should really be fixed instead of this stupid
> series.

PCIe is a very different beast. (For one, it uses DMA and
memory-mapped registers, where SDIO has neither.) It was a very
difficult slog to get PCIe/8997 working reliably for the few
Chromebooks that shipped it, and lots of that work is in firmware. I
would not be surprised if the PCIe-related changes Marvell made for
8997 never fed back into their PCIe-8897 firmware. Or maybe they only
ever launched PCIe-8897 for Windows, and the Windows driver included
workarounds that were never published to their Linux driver. But now
I'm just speculating.

> Yes, I'm sorry that I know this series is just a stupid one but I have to
> send this anyway because this stability issue has not been fixed for a
> long time. I should have added this buglink to every commit as well:
>
> BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=109681
>
> If the firmware can't be fixed, I'm afraid I have to go this way. It makes
> no sense to keep enabling power_save for the affected devices if we know
> it's broken.

Condolences and sympathy, seriously. You likely have little chance of
getting the firmware fixed, so without new information (e.g., other
workarounds?), this is probably the right way to go.

Brian
Tsuchiya Yuto Nov. 26, 2020, 7:44 p.m. UTC | #6
On Fri, 2020-11-20 at 13:04 -0800, Brian Norris wrote:
> On Fri, Oct 30, 2020 at 1:04 AM Tsuchiya Yuto <kitakar@gmail.com> wrote:
> > On Thu, 2020-10-29 at 11:25 -0700, Brian Norris wrote:
> > > For the record, Chrome OS supports plenty of mwifiex systems with 8897
> > > (SDIO only) and 8997 (PCIe), with PS enabled, and you're hurting
> > > those. Your problem sounds to be exclusively a problem with the PCIe
> > > 8897 firmware.
> > 
> > Actually, I already know that some Chromebooks use these mwifiex cards
> > (but not our PCIe-88W8897) because I personally like chromiumos. I'm
> > always wondering what is the difference. If the difference is firmware,
> > our PCIe-88W8897 firmware should really be fixed instead of this stupid
> > series.
> 
> PCIe is a very different beast. (For one, it uses DMA and
> memory-mapped registers, where SDIO has neither.) It was a very
> difficult slog to get PCIe/8997 working reliably for the few
> Chromebooks that shipped it, and lots of that work is in firmware. I
> would not be surprised if the PCIe-related changes Marvell made for
> 8997 never fed back into their PCIe-8897 firmware. Or maybe they only
> ever launched PCIe-8897 for Windows, and the Windows driver included
> workarounds that were never published to their Linux driver. But now
> I'm just speculating.

Thanks. Yeah, this is indeed hard work. Actually, I (and maybe also other
users) am already thankful that there is a wifi driver/firmware available
on Linux :) and it would be even better if we could fix the
ps_mode-related issues.

> > Yes, I'm sorry that I know this series is just a stupid one but I have to
> > send this anyway because this stability issue has not been fixed for a
> > long time. I should have added this buglink to every commit as well:
> > 
> > BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=109681
> > 
> > If the firmware can't be fixed, I'm afraid I have to go this way. It makes
> > no sense to keep enabling power_save for the affected devices if we know
> > it's broken.
> 
> Condolences and sympathy, seriously. You likely have little chance of
> getting the firmware fixed, so without new information (e.g., other
> workarounds?), this is probably the right way to go.

Thank you for the pointer!

There are two issues regarding ps_mode:
1) fw crashes with "Firmware wakeup failed"
   (I haven't mentioned this in this series, but ps_mode also causes fw crashes)
2) connection instability (like large ping delay or even ping not reaching)

If anyone is ever interested in dmesg log with debug_mask=0xffffffff and
device_dump, I posted them to the Bugzilla [1] before.

Regarding #2, although it is not even a workaround, I found that
scanning for APs fixes this. So, when I encounter this issue, I keep
scanning APs like "watch -n10 sudo iw dev ${dev_name} scan". So, it
seems that scanning APs somehow wakes wifi up? In other words, wifi
is sleeping when it shouldn't be? Or wifi somehow fails to wake up when
it should?

Regarding #1, we don't have any ideas yet. There is a guess that a memory
leak occurs in the fw every time wifi goes to sleep, but we don't know.

We don't even have exact reproducers for either #1 or #2. What we
know so far is that enabling ps_mode causes these issues.

[1] https://bugzilla.kernel.org/show_bug.cgi?id=109681#c130

> Brian

Patch

diff --git a/drivers/net/wireless/marvell/mwifiex/sta_cmd.c b/drivers/net/wireless/marvell/mwifiex/sta_cmd.c
index d3a968ef21ef9..9b7b52fbc9c45 100644
--- a/drivers/net/wireless/marvell/mwifiex/sta_cmd.c
+++ b/drivers/net/wireless/marvell/mwifiex/sta_cmd.c
@@ -2333,14 +2333,19 @@  int mwifiex_sta_init_cmd(struct mwifiex_private *priv, u8 first_sta, bool init)
 			return -1;
 
 		if (priv->bss_type != MWIFIEX_BSS_TYPE_UAP) {
-			/* Enable IEEE PS by default */
-			priv->adapter->ps_mode = MWIFIEX_802_11_POWER_MODE_PSP;
+			/* Disable IEEE PS by default */
+			priv->adapter->ps_mode = MWIFIEX_802_11_POWER_MODE_CAM;
 			ret = mwifiex_send_cmd(priv,
 					       HostCmd_CMD_802_11_PS_MODE_ENH,
-					       EN_AUTO_PS, BITMAP_STA_PS, NULL,
+					       DIS_AUTO_PS, BITMAP_STA_PS, NULL,
 					       true);
 			if (ret)
 				return -1;
+			ret = mwifiex_send_cmd(priv,
+					       HostCmd_CMD_802_11_PS_MODE_ENH,
+					       GET_PS, 0, NULL, false);
+			if (ret)
+				return -1;
 		}
 
 		if (drcs) {
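
The command sequence in the hunk above can be modelled outside the kernel. This is a sketch under stated assumptions, not the driver's code: `mock_send_cmd()`, `struct mock_priv` and the `MOCK_*` actions are hypothetical stand-ins for `mwifiex_send_cmd()` and its arguments, and the `disable_ps` parameter is an assumed per-device decision that the posted patch does not have (it disables PS unconditionally).

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for EN_AUTO_PS, DIS_AUTO_PS and GET_PS. */
enum mock_ps_action { MOCK_EN_AUTO_PS, MOCK_DIS_AUTO_PS, MOCK_GET_PS };

#define MOCK_LOG_MAX 4

struct mock_priv {
	enum mock_ps_action log[MOCK_LOG_MAX];	/* commands sent so far */
	size_t n_sent;
	bool fail_next;				/* simulate a command failure */
};

/* Record a command instead of talking to firmware; -1 on failure. */
static int mock_send_cmd(struct mock_priv *priv, enum mock_ps_action action)
{
	if (priv->fail_next || priv->n_sent >= MOCK_LOG_MAX)
		return -1;
	priv->log[priv->n_sent++] = action;
	return 0;
}

/* Mirror the hunk's flow: set the PS state and, when disabling, follow
 * up with a GET_PS query as the patch does. */
static int mock_init_ps(struct mock_priv *priv, bool disable_ps)
{
	if (mock_send_cmd(priv, disable_ps ? MOCK_DIS_AUTO_PS : MOCK_EN_AUTO_PS))
		return -1;
	if (disable_ps && mock_send_cmd(priv, MOCK_GET_PS))
		return -1;
	return 0;
}
```

Passing `disable_ps` as false reproduces the pre-patch behaviour of a single enable command, so devices without the problem would be unaffected.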