Message ID | 20201028142625.18642-1-kitakar@gmail.com (mailing list archive) |
---|---|
State | Changes Requested |
Delegated to: | Kalle Valo |
Headers | show |
Series | mwifiex: pcie: add enable_device_dump module parameter | expand |
On Wed, Oct 28, 2020 at 3:58 PM Tsuchiya Yuto <kitakar@gmail.com> wrote: > > The devicve_dump may take a little bit long time and users may want to > disable the dump for daily usage. > > This commit adds a new module parameter enable_device_dump and disables > the device_dump by default. As with one of your other patches, please don't change the defaults and hide them under a module parameter. If you're adding a module parameter, leave the default behavior alone. This also seems like something that might be nicer as a user-space knob in generic form (similar to "/sys/class/devcoredump/disabled", except on a per-device basis, and fed back to the driver so it doesn't waste time generating such dumps), but I suppose I can see why a module parameter (so you can just stick your configuration in /etc/modprobe.d/) might be easier to deal with in some cases. Brian
On Wed, 2020-10-28 at 17:12 -0700, Brian Norris wrote: > On Wed, Oct 28, 2020 at 3:58 PM Tsuchiya Yuto <kitakar@gmail.com> wrote: > > > > The devicve_dump may take a little bit long time and users may want to > > disable the dump for daily usage. > > > > This commit adds a new module parameter enable_device_dump and disables > > the device_dump by default. > > As with one of your other patches, please don't change the defaults > and hide them under a module parameter. If you're adding a module > parameter, leave the default behavior alone. Thanks for the review! I mentioned about power_save stability on the other patches. But I should have added this fact into the commit message of this patch that even if we disable that power_save, the firmware crashes a lot on some specific devices. Really a lot. For example, as far as I know Surface Pro 5 needs ASPM L1.2 substate disabled to avoid the firmware crash. Disabling it is still acceptable. On the other hand, Surface 3 needs L1 ASPM state disabled. This is not acceptable because this breaks S0ix. Anyway, handling ASPM should be done in firmware I think. So, the context of why I sent this patch is the next. We can't fix the fw crash itself, so, we decided to just let it crash and reset by itself (with the other fw reset quirks I sent). In this way, the time it does device_dump is really annoying if fw crashes so often. Let me know if splitting this patch like this works. 1) The first patch is to add this module parameter but don't change the default behavior. 2) The second patch is to change the parameter value depending on the DMI matching or something so that it doesn't break the existing users. But what I want to say here as well is that, if the firmware can be fixed, we don't need a patch like this. > This also seems like something that might be nicer as a user-space > knob in generic form (similar to "/sys/class/devcoredump/disabled", > except on a per-device basis, and fed back to the driver so it doesn't > waste time generating such dumps), but I suppose I can see why a > module parameter (so you can just stick your configuration in > /etc/modprobe.d/) might be easier to deal with in some cases. Agreed. > Brian
(Sorry if anything's a bit slow here. I don't really have time to write out full proposals myself.) On Fri, Oct 30, 2020 at 3:30 AM Tsuchiya Yuto <kitakar@gmail.com> wrote: > Let me know if splitting this patch like this works. 1) The first patch > is to add this module parameter but don't change the default behavior. That *could* be OK with me, although it's already been said that there are many people who dislike extra module parameters. I also don't see why this needs to be a module parameter at all. If you do #2 right, you don't really need this, as there are already several standard ways of doing this (e.g., via Kconfig, or via nl80211 on a per-device basis). > 2) The second patch is to change the parameter value depending on the > DMI matching or something so that it doesn't break the existing users. Point 2 sounds good, and this is the key point. Note that you can do point 2 without making it a module parameter. Just keep a flag in the driver-private structures. > But what I want to say here as well is that, if the firmware can be fixed, > we don't need a patch like this. Sure. That's also where we don't necessarily need more ways to control this from user space (e.g., module parameters), but just better detection of currently broken systems (in the driver). And if firmware ever gets fixed, we can undo the "broken device" detection. Brian
On Fri, 2020-11-20 at 12:53 -0800, Brian Norris wrote: > (Sorry if anything's a bit slow here. I don't really have time to > write out full proposals myself.) Don’t worry, I (and other owners) am already glad to hear from upstream devs, including you :) (and I'm also sorry for the late reply from me. It was difficult to find spare time these days.) > On Fri, Oct 30, 2020 at 3:30 AM Tsuchiya Yuto <kitakar@gmail.com> wrote: > > Let me know if splitting this patch like this works. 1) The first patch > > is to add this module parameter but don't change the default behavior. > > That *could* be OK with me, although it's already been said that there > are many people who dislike extra module parameters. I also don't see > why this needs to be a module parameter at all. If you do #2 right, > you don't really need this, as there are already several standard ways > of doing this (e.g., via Kconfig, or via nl80211 on a per-device > basis). > > > 2) The second patch is to change the parameter value depending on the > > DMI matching or something so that it doesn't break the existing users. > > Point 2 sounds good, and this is the key point. Note that you can do > point 2 without making it a module parameter. Just keep a flag in the > driver-private structures. I chose to also provide the module parameter because I thought it would be a little bit complicated for users to re-enable this dump feature if I chose not to provide this parameter. If I don't provide the parameter but still want to allow users to re-enable this dump feature, we can provide a way to change the bit flags of quirks (via a new debugfs entry or another module parameter). That said, is there a way to easily change the quirk flags only for this dump feature? For example, assume that the following three quirks are enabled by default: /* quirks */ #define QUIRK_FW_RST_D3COLD BIT(0) #define QUIRK_NO_DEVICE_DUMP BIT(1) #define QUIRK_FOO BIT(2) .driver_data = (void *)(QUIRK_FW_RST_D3COLD | QUIRK_NO_DEVICE_DUMP | QUIRK_FOO), card->quirks = (uintptr_t)dmi_id->driver_data; /* and assume that card->quirks can be changed by users by a file named * "quirks" under debugfs. */ So, the card->quirks will be "7" in decimal by default. Then, if users want to re-enable the dump feature, as far as I know, users need to know the default value "7", then need to know that device_dump is controlled by bit 1. Finally, users can set the new quirk flags "5" in decimal (101 in binary). echo 5 > /sys/kernel/debug/mwifiex/mlan0/quirks I'm glad if there is another nice way to control only the device_dump quirk flag, if we only provide a way to change quirk flags. But at the same time I also think that if someone dare to want to re-enable this feature, maybe the person won't feel it's complicated haha. So, I'll drop the device_dump module parameter and switch to use the quirk framework, adding a way to change the quirk flags value by users. That said, we may even drop disabling the dump. Take a look at my comment I wrote below. > > But what I want to say here as well is that, if the firmware can be fixed, > > we don't need a patch like this. > > Sure. That's also where we don't necessarily need more ways to control > this from user space (e.g., module parameters), but just better > detection of currently broken systems (in the driver). And if firmware > ever gets fixed, we can undo the "broken device" detection. There are two types of firmware crashes we observes: 1) cmd_timedout (other than ps_mode-related) 2) Firmware wakeup failed (ps_mode-related) The #2 is observed when we enabled ps_mode. The #1 is observed for the other causes. And hopefully, verdre (in Cc) found a "fix" [1] for the #1 fw crash. We are trying the fix now. The pull req (still WIP) basically addresses fw crashing by using "non-posted" register writes and uninterruptible wait queue for PCI operations in the kernel driver side. We still don't have any ideas to "fix" the #2 fw crash, but now we don't see the #1 crash anymore so far. I'd like to see where the attempt goes before I start working on this patch here again. Thank you, everyone. [1] https://github.com/linux-surface/kernel/pull/70 [WIP] Properly fix wifi firmware crashes by jonas2515 · Pull Request #70 · linux-surface/kernel > Brian
diff --git a/drivers/net/wireless/marvell/mwifiex/pcie.c b/drivers/net/wireless/marvell/mwifiex/pcie.c index 6a10ff0377a24..8254e06fb22ce 100644 --- a/drivers/net/wireless/marvell/mwifiex/pcie.c +++ b/drivers/net/wireless/marvell/mwifiex/pcie.c @@ -33,6 +33,11 @@ static struct mwifiex_if_ops pcie_ops; +static bool enable_device_dump; +module_param(enable_device_dump, bool, 0644); +MODULE_PARM_DESC(enable_device_dump, + "enable device_dump (default: disabled)"); + static const struct mwifiex_pcie_card_reg mwifiex_reg_8766 = { .cmd_addr_lo = PCIE_SCRATCH_0_REG, .cmd_addr_hi = PCIE_SCRATCH_1_REG, @@ -2938,6 +2943,12 @@ static void mwifiex_pcie_fw_dump(struct mwifiex_adapter *adapter) static void mwifiex_pcie_device_dump_work(struct mwifiex_adapter *adapter) { + if (!enable_device_dump) { + mwifiex_dbg(adapter, MSG, + "device_dump is disabled by module parameter\n"); + return; + } + adapter->devdump_data = vzalloc(MWIFIEX_FW_DUMP_SIZE); if (!adapter->devdump_data) { mwifiex_dbg(adapter, ERROR,
The devicve_dump may take a little bit long time and users may want to disable the dump for daily usage. This commit adds a new module parameter enable_device_dump and disables the device_dump by default. Signed-off-by: Tsuchiya Yuto <kitakar@gmail.com> --- drivers/net/wireless/marvell/mwifiex/pcie.c | 11 +++++++++++ 1 file changed, 11 insertions(+)