Message ID | 20211011134238.16551-1-verdre@v0yd.nl (mailing list archive) |
---|---|
State | Awaiting Upstream |
Delegated to: | Netdev Maintainers |
Headers | show |
Series | mwifiex: Add quirk resetting the PCI bridge on MS Surface devices | expand |
Context | Check | Description |
---|---|---|
netdev/tree_selection | success | Not a local patch |
[+cc Alex] On Mon, Oct 11, 2021 at 03:42:38PM +0200, Jonas Dreßler wrote: > The most recent firmware (15.68.19.p21) of the 88W8897 PCIe+USB card > reports a hardcoded LTR value to the system during initialization, > probably as an (unsuccessful) attempt of the developers to fix firmware > crashes. This LTR value prevents most of the Microsoft Surface devices > from entering deep powersaving states (either platform C-State 10 or > S0ix state), because the exit latency of that state would be higher than > what the card can tolerate. S0ix and C-State 10 are ACPI concepts that don't mean anything in a PCIe context. I think LTR is only involved in deciding whether to enter the ASPM L1.2 substate. Maybe the system will only enter C-State 10 or S0ix when the link is in L1.2? > Turns out the card works just the same (including the firmware crashes) > no matter if that hardcoded LTR value is reported or not, so it's kind > of useless and only prevents us from saving power. > > To get rid of those hardcoded LTR requirements, it's possible to reset > the PCI bridge device after initializing the cards firmware. I'm not > exactly sure why that works, maybe the power management subsystem of the > PCH resets its stored LTR values when doing a function level reset of > the bridge device. Doing the reset once after starting the wifi firmware > works very well, probably because the firmware only reports that LTR > value a single time during firmware startup. > > Signed-off-by: Jonas Dreßler <verdre@v0yd.nl> > --- > drivers/net/wireless/marvell/mwifiex/pcie.c | 12 +++++++++ > .../wireless/marvell/mwifiex/pcie_quirks.c | 26 +++++++++++++------ > .../wireless/marvell/mwifiex/pcie_quirks.h | 1 + > 3 files changed, 31 insertions(+), 8 deletions(-) > > diff --git a/drivers/net/wireless/marvell/mwifiex/pcie.c b/drivers/net/wireless/marvell/mwifiex/pcie.c > index c6ccce426b49..2506e7e49f0c 100644 > --- a/drivers/net/wireless/marvell/mwifiex/pcie.c > +++ b/drivers/net/wireless/marvell/mwifiex/pcie.c > @@ -1748,9 +1748,21 @@ mwifiex_pcie_send_boot_cmd(struct mwifiex_adapter *adapter, struct sk_buff *skb) > static int mwifiex_pcie_init_fw_port(struct mwifiex_adapter *adapter) > { > struct pcie_service_card *card = adapter->card; > + struct pci_dev *pdev = card->dev; > + struct pci_dev *parent_pdev = pci_upstream_bridge(pdev); > const struct mwifiex_pcie_card_reg *reg = card->pcie.reg; > int tx_wrap = card->txbd_wrptr & reg->tx_wrap_mask; > > + /* Trigger a function level reset of the PCI bridge device, this makes > + * the firmware (latest version 15.68.19.p21) of the 88W8897 PCIe+USB > + * card stop reporting a fixed LTR value that prevents the system from > + * entering package C10 and S0ix powersaving states. I don't believe this. Why would resetting the root port change what the downstream device reports via LTR messages? From PCIe r5.0, sec 5.5.1: The following rules define how the L1.1 and L1.2 substates are entered: ... * When in ASPM L1.0 and the ASPM L1.2 Enable bit is Set, the L1.2 substate must be entered when CLKREQ# is deasserted and all of the following conditions are true: - The reported snooped LTR value last sent or received by this Port is greater than or equal to the value set by the LTR_L1.2_THRESHOLD Value and Scale fields, or there is no snoop service latency requirement. - The reported non-snooped LTR last sent or received by this Port value is greater than or equal to the value set by the LTR_L1.2_THRESHOLD Value and Scale fields, or there is no non-snoop service latency requirement. From the LTR Message format in sec 6.18: No-Snoop Latency and Snoop Latency: As shown in Figure 6-15, these fields include a Requirement bit that indicates if the device has a latency requirement for the given type of Request. If the Requirement bit is Set, the LatencyValue and LatencyScale fields describe the latency requirement. If the Requirement bit is Clear, there is no latency requirement and the LatencyValue and LatencyScale fields are ignored. Resetting the root port might make it forget the LTR value it last received. If that's equivalent to having no service latency requirement, it *might* enable L1.2 entry, although that doesn't seem equivalent to the downstream device having sent an LTR message with the Requirement bit cleared. I think the endpoint is required to send a new LTR message before it goes to a non-D0 state (sec 6.18), so the bridge will capture the latency again, and we'll probably be back in the same state. This all seems fragile to me. If we force the link to L1.2 without knowing accurate exit latencies and latency tolerance, the device is liable to drop packets. > + * We need to do it here because it must happen after firmware > + * initialization and this function is called right after that is done. > + */ > + if (card->quirks & QUIRK_DO_FLR_ON_BRIDGE) > + pci_reset_function(parent_pdev); PCIe r5.0, sec 7.5.3.3, says Function Level Reset can only be supported by endpoints, so I guess this will actually do some other kind of reset. > /* Write the RX ring read pointer in to reg->rx_rdptr */ > if (mwifiex_write_reg(adapter, reg->rx_rdptr, card->rxbd_rdptr | > tx_wrap)) { > diff --git a/drivers/net/wireless/marvell/mwifiex/pcie_quirks.c b/drivers/net/wireless/marvell/mwifiex/pcie_quirks.c > index 0234cf3c2974..cbf0565353ae 100644 > --- a/drivers/net/wireless/marvell/mwifiex/pcie_quirks.c > +++ b/drivers/net/wireless/marvell/mwifiex/pcie_quirks.c > @@ -27,7 +27,8 @@ static const struct dmi_system_id mwifiex_quirk_table[] = { > DMI_EXACT_MATCH(DMI_SYS_VENDOR, "Microsoft Corporation"), > DMI_EXACT_MATCH(DMI_PRODUCT_NAME, "Surface Pro 4"), > }, > - .driver_data = (void *)QUIRK_FW_RST_D3COLD, > + .driver_data = (void *)(QUIRK_FW_RST_D3COLD | > + QUIRK_DO_FLR_ON_BRIDGE), > }, > { > .ident = "Surface Pro 5", > @@ -36,7 +37,8 @@ static const struct dmi_system_id mwifiex_quirk_table[] = { > DMI_EXACT_MATCH(DMI_SYS_VENDOR, "Microsoft Corporation"), > DMI_EXACT_MATCH(DMI_PRODUCT_SKU, "Surface_Pro_1796"), > }, > - .driver_data = (void *)QUIRK_FW_RST_D3COLD, > + .driver_data = (void *)(QUIRK_FW_RST_D3COLD | > + QUIRK_DO_FLR_ON_BRIDGE), > }, > { > .ident = "Surface Pro 5 (LTE)", > @@ -45,7 +47,8 @@ static const struct dmi_system_id mwifiex_quirk_table[] = { > DMI_EXACT_MATCH(DMI_SYS_VENDOR, "Microsoft Corporation"), > DMI_EXACT_MATCH(DMI_PRODUCT_SKU, "Surface_Pro_1807"), > }, > - .driver_data = (void *)QUIRK_FW_RST_D3COLD, > + .driver_data = (void *)(QUIRK_FW_RST_D3COLD | > + QUIRK_DO_FLR_ON_BRIDGE), > }, > { > .ident = "Surface Pro 6", > @@ -53,7 +56,8 @@ static const struct dmi_system_id mwifiex_quirk_table[] = { > DMI_EXACT_MATCH(DMI_SYS_VENDOR, "Microsoft Corporation"), > DMI_EXACT_MATCH(DMI_PRODUCT_NAME, "Surface Pro 6"), > }, > - .driver_data = (void *)QUIRK_FW_RST_D3COLD, > + .driver_data = (void *)(QUIRK_FW_RST_D3COLD | > + QUIRK_DO_FLR_ON_BRIDGE), > }, > { > .ident = "Surface Book 1", > @@ -61,7 +65,8 @@ static const struct dmi_system_id mwifiex_quirk_table[] = { > DMI_EXACT_MATCH(DMI_SYS_VENDOR, "Microsoft Corporation"), > DMI_EXACT_MATCH(DMI_PRODUCT_NAME, "Surface Book"), > }, > - .driver_data = (void *)QUIRK_FW_RST_D3COLD, > + .driver_data = (void *)(QUIRK_FW_RST_D3COLD | > + QUIRK_DO_FLR_ON_BRIDGE), > }, > { > .ident = "Surface Book 2", > @@ -69,7 +74,8 @@ static const struct dmi_system_id mwifiex_quirk_table[] = { > DMI_EXACT_MATCH(DMI_SYS_VENDOR, "Microsoft Corporation"), > DMI_EXACT_MATCH(DMI_PRODUCT_NAME, "Surface Book 2"), > }, > - .driver_data = (void *)QUIRK_FW_RST_D3COLD, > + .driver_data = (void *)(QUIRK_FW_RST_D3COLD | > + QUIRK_DO_FLR_ON_BRIDGE), > }, > { > .ident = "Surface Laptop 1", > @@ -77,7 +83,8 @@ static const struct dmi_system_id mwifiex_quirk_table[] = { > DMI_EXACT_MATCH(DMI_SYS_VENDOR, "Microsoft Corporation"), > DMI_EXACT_MATCH(DMI_PRODUCT_NAME, "Surface Laptop"), > }, > - .driver_data = (void *)QUIRK_FW_RST_D3COLD, > + .driver_data = (void *)(QUIRK_FW_RST_D3COLD | > + QUIRK_DO_FLR_ON_BRIDGE), > }, > { > .ident = "Surface Laptop 2", > @@ -85,7 +92,8 @@ static const struct dmi_system_id mwifiex_quirk_table[] = { > DMI_EXACT_MATCH(DMI_SYS_VENDOR, "Microsoft Corporation"), > DMI_EXACT_MATCH(DMI_PRODUCT_NAME, "Surface Laptop 2"), > }, > - .driver_data = (void *)QUIRK_FW_RST_D3COLD, > + .driver_data = (void *)(QUIRK_FW_RST_D3COLD | > + QUIRK_DO_FLR_ON_BRIDGE), > }, > {} > }; > @@ -103,6 +111,8 @@ void mwifiex_initialize_quirks(struct pcie_service_card *card) > dev_info(&pdev->dev, "no quirks enabled\n"); > if (card->quirks & QUIRK_FW_RST_D3COLD) > dev_info(&pdev->dev, "quirk reset_d3cold enabled\n"); > + if (card->quirks & QUIRK_DO_FLR_ON_BRIDGE) > + dev_info(&pdev->dev, "quirk do_flr_on_bridge enabled\n"); > } > > static void mwifiex_pcie_set_power_d3cold(struct pci_dev *pdev) > diff --git a/drivers/net/wireless/marvell/mwifiex/pcie_quirks.h b/drivers/net/wireless/marvell/mwifiex/pcie_quirks.h > index 8ec4176d698f..f8d463f4269a 100644 > --- a/drivers/net/wireless/marvell/mwifiex/pcie_quirks.h > +++ b/drivers/net/wireless/marvell/mwifiex/pcie_quirks.h > @@ -18,6 +18,7 @@ > #include "pcie.h" > > #define QUIRK_FW_RST_D3COLD BIT(0) > +#define QUIRK_DO_FLR_ON_BRIDGE BIT(1) > > void mwifiex_initialize_quirks(struct pcie_service_card *card); > int mwifiex_pcie_reset_d3cold_quirk(struct pci_dev *pdev); > -- > 2.31.1 >
On Monday 11 October 2021 15:42:38 Jonas Dreßler wrote: > The most recent firmware (15.68.19.p21) of the 88W8897 PCIe+USB card > reports a hardcoded LTR value to the system during initialization, > probably as an (unsuccessful) attempt of the developers to fix firmware > crashes. This LTR value prevents most of the Microsoft Surface devices > from entering deep powersaving states (either platform C-State 10 or > S0ix state), because the exit latency of that state would be higher than > what the card can tolerate. This description looks like a generic issue in 88W8897 chip or its firmware and not something to Surface PCIe controller or Surface HW. But please correct me if I'm wrong here. Has somebody 88W8897-based PCIe card in non-Surface device and can check or verify if this issue happens also outside of the Surface device? It would be really nice to know if this is issue in Surface or in 8897. > Turns out the card works just the same (including the firmware crashes) > no matter if that hardcoded LTR value is reported or not, so it's kind > of useless and only prevents us from saving power. > > To get rid of those hardcoded LTR requirements, it's possible to reset > the PCI bridge device after initializing the cards firmware. I'm not > exactly sure why that works, maybe the power management subsystem of the > PCH resets its stored LTR values when doing a function level reset of > the bridge device. Doing the reset once after starting the wifi firmware > works very well, probably because the firmware only reports that LTR > value a single time during firmware startup. > > Signed-off-by: Jonas Dreßler <verdre@v0yd.nl> > --- > drivers/net/wireless/marvell/mwifiex/pcie.c | 12 +++++++++ > .../wireless/marvell/mwifiex/pcie_quirks.c | 26 +++++++++++++------ > .../wireless/marvell/mwifiex/pcie_quirks.h | 1 + > 3 files changed, 31 insertions(+), 8 deletions(-) > > diff --git a/drivers/net/wireless/marvell/mwifiex/pcie.c b/drivers/net/wireless/marvell/mwifiex/pcie.c > index c6ccce426b49..2506e7e49f0c 100644 > --- a/drivers/net/wireless/marvell/mwifiex/pcie.c > +++ b/drivers/net/wireless/marvell/mwifiex/pcie.c > @@ -1748,9 +1748,21 @@ mwifiex_pcie_send_boot_cmd(struct mwifiex_adapter *adapter, struct sk_buff *skb) > static int mwifiex_pcie_init_fw_port(struct mwifiex_adapter *adapter) > { > struct pcie_service_card *card = adapter->card; > + struct pci_dev *pdev = card->dev; > + struct pci_dev *parent_pdev = pci_upstream_bridge(pdev); > const struct mwifiex_pcie_card_reg *reg = card->pcie.reg; > int tx_wrap = card->txbd_wrptr & reg->tx_wrap_mask; > > + /* Trigger a function level reset of the PCI bridge device, this makes > + * the firmware (latest version 15.68.19.p21) of the 88W8897 PCIe+USB > + * card stop reporting a fixed LTR value that prevents the system from > + * entering package C10 and S0ix powersaving states. > + * We need to do it here because it must happen after firmware > + * initialization and this function is called right after that is done. > + */ > + if (card->quirks & QUIRK_DO_FLR_ON_BRIDGE) > + pci_reset_function(parent_pdev); > + > /* Write the RX ring read pointer in to reg->rx_rdptr */ > if (mwifiex_write_reg(adapter, reg->rx_rdptr, card->rxbd_rdptr | > tx_wrap)) { > diff --git a/drivers/net/wireless/marvell/mwifiex/pcie_quirks.c b/drivers/net/wireless/marvell/mwifiex/pcie_quirks.c > index 0234cf3c2974..cbf0565353ae 100644 > --- a/drivers/net/wireless/marvell/mwifiex/pcie_quirks.c > +++ b/drivers/net/wireless/marvell/mwifiex/pcie_quirks.c > @@ -27,7 +27,8 @@ static const struct dmi_system_id mwifiex_quirk_table[] = { > DMI_EXACT_MATCH(DMI_SYS_VENDOR, "Microsoft Corporation"), > DMI_EXACT_MATCH(DMI_PRODUCT_NAME, "Surface Pro 4"), > }, > - .driver_data = (void *)QUIRK_FW_RST_D3COLD, > + .driver_data = (void *)(QUIRK_FW_RST_D3COLD | > + QUIRK_DO_FLR_ON_BRIDGE), > }, > { > .ident = "Surface Pro 5", > @@ -36,7 +37,8 @@ static const struct dmi_system_id mwifiex_quirk_table[] = { > DMI_EXACT_MATCH(DMI_SYS_VENDOR, "Microsoft Corporation"), > DMI_EXACT_MATCH(DMI_PRODUCT_SKU, "Surface_Pro_1796"), > }, > - .driver_data = (void *)QUIRK_FW_RST_D3COLD, > + .driver_data = (void *)(QUIRK_FW_RST_D3COLD | > + QUIRK_DO_FLR_ON_BRIDGE), > }, > { > .ident = "Surface Pro 5 (LTE)", > @@ -45,7 +47,8 @@ static const struct dmi_system_id mwifiex_quirk_table[] = { > DMI_EXACT_MATCH(DMI_SYS_VENDOR, "Microsoft Corporation"), > DMI_EXACT_MATCH(DMI_PRODUCT_SKU, "Surface_Pro_1807"), > }, > - .driver_data = (void *)QUIRK_FW_RST_D3COLD, > + .driver_data = (void *)(QUIRK_FW_RST_D3COLD | > + QUIRK_DO_FLR_ON_BRIDGE), > }, > { > .ident = "Surface Pro 6", > @@ -53,7 +56,8 @@ static const struct dmi_system_id mwifiex_quirk_table[] = { > DMI_EXACT_MATCH(DMI_SYS_VENDOR, "Microsoft Corporation"), > DMI_EXACT_MATCH(DMI_PRODUCT_NAME, "Surface Pro 6"), > }, > - .driver_data = (void *)QUIRK_FW_RST_D3COLD, > + .driver_data = (void *)(QUIRK_FW_RST_D3COLD | > + QUIRK_DO_FLR_ON_BRIDGE), > }, > { > .ident = "Surface Book 1", > @@ -61,7 +65,8 @@ static const struct dmi_system_id mwifiex_quirk_table[] = { > DMI_EXACT_MATCH(DMI_SYS_VENDOR, "Microsoft Corporation"), > DMI_EXACT_MATCH(DMI_PRODUCT_NAME, "Surface Book"), > }, > - .driver_data = (void *)QUIRK_FW_RST_D3COLD, > + .driver_data = (void *)(QUIRK_FW_RST_D3COLD | > + QUIRK_DO_FLR_ON_BRIDGE), > }, > { > .ident = "Surface Book 2", > @@ -69,7 +74,8 @@ static const struct dmi_system_id mwifiex_quirk_table[] = { > DMI_EXACT_MATCH(DMI_SYS_VENDOR, "Microsoft Corporation"), > DMI_EXACT_MATCH(DMI_PRODUCT_NAME, "Surface Book 2"), > }, > - .driver_data = (void *)QUIRK_FW_RST_D3COLD, > + .driver_data = (void *)(QUIRK_FW_RST_D3COLD | > + QUIRK_DO_FLR_ON_BRIDGE), > }, > { > .ident = "Surface Laptop 1", > @@ -77,7 +83,8 @@ static const struct dmi_system_id mwifiex_quirk_table[] = { > DMI_EXACT_MATCH(DMI_SYS_VENDOR, "Microsoft Corporation"), > DMI_EXACT_MATCH(DMI_PRODUCT_NAME, "Surface Laptop"), > }, > - .driver_data = (void *)QUIRK_FW_RST_D3COLD, > + .driver_data = (void *)(QUIRK_FW_RST_D3COLD | > + QUIRK_DO_FLR_ON_BRIDGE), > }, > { > .ident = "Surface Laptop 2", > @@ -85,7 +92,8 @@ static const struct dmi_system_id mwifiex_quirk_table[] = { > DMI_EXACT_MATCH(DMI_SYS_VENDOR, "Microsoft Corporation"), > DMI_EXACT_MATCH(DMI_PRODUCT_NAME, "Surface Laptop 2"), > }, > - .driver_data = (void *)QUIRK_FW_RST_D3COLD, > + .driver_data = (void *)(QUIRK_FW_RST_D3COLD | > + QUIRK_DO_FLR_ON_BRIDGE), > }, > {} > }; > @@ -103,6 +111,8 @@ void mwifiex_initialize_quirks(struct pcie_service_card *card) > dev_info(&pdev->dev, "no quirks enabled\n"); > if (card->quirks & QUIRK_FW_RST_D3COLD) > dev_info(&pdev->dev, "quirk reset_d3cold enabled\n"); > + if (card->quirks & QUIRK_DO_FLR_ON_BRIDGE) > + dev_info(&pdev->dev, "quirk do_flr_on_bridge enabled\n"); > } > > static void mwifiex_pcie_set_power_d3cold(struct pci_dev *pdev) > diff --git a/drivers/net/wireless/marvell/mwifiex/pcie_quirks.h b/drivers/net/wireless/marvell/mwifiex/pcie_quirks.h > index 8ec4176d698f..f8d463f4269a 100644 > --- a/drivers/net/wireless/marvell/mwifiex/pcie_quirks.h > +++ b/drivers/net/wireless/marvell/mwifiex/pcie_quirks.h > @@ -18,6 +18,7 @@ > #include "pcie.h" > > #define QUIRK_FW_RST_D3COLD BIT(0) > +#define QUIRK_DO_FLR_ON_BRIDGE BIT(1) > > void mwifiex_initialize_quirks(struct pcie_service_card *card); > int mwifiex_pcie_reset_d3cold_quirk(struct pci_dev *pdev); > -- > 2.31.1 >
On 10/11/21 18:53, Bjorn Helgaas wrote: > [+cc Alex] > > On Mon, Oct 11, 2021 at 03:42:38PM +0200, Jonas Dreßler wrote: >> The most recent firmware (15.68.19.p21) of the 88W8897 PCIe+USB card >> reports a hardcoded LTR value to the system during initialization, >> probably as an (unsuccessful) attempt of the developers to fix firmware >> crashes. This LTR value prevents most of the Microsoft Surface devices >> from entering deep powersaving states (either platform C-State 10 or >> S0ix state), because the exit latency of that state would be higher than >> what the card can tolerate. > > S0ix and C-State 10 are ACPI concepts that don't mean anything in a > PCIe context. > > I think LTR is only involved in deciding whether to enter the ASPM > L1.2 substate. Maybe the system will only enter C-State 10 or S0ix > when the link is in L1.2? Yup, this is indeed the case, see https://01.org/blogs/qwang59/2020/linux-s0ix-troubleshooting (ctrl+f "IP LINK PM STATE"). > >> Turns out the card works just the same (including the firmware crashes) >> no matter if that hardcoded LTR value is reported or not, so it's kind >> of useless and only prevents us from saving power. >> >> To get rid of those hardcoded LTR requirements, it's possible to reset >> the PCI bridge device after initializing the cards firmware. I'm not >> exactly sure why that works, maybe the power management subsystem of the >> PCH resets its stored LTR values when doing a function level reset of >> the bridge device. Doing the reset once after starting the wifi firmware >> works very well, probably because the firmware only reports that LTR >> value a single time during firmware startup. >> >> Signed-off-by: Jonas Dreßler <verdre@v0yd.nl> >> --- >> drivers/net/wireless/marvell/mwifiex/pcie.c | 12 +++++++++ >> .../wireless/marvell/mwifiex/pcie_quirks.c | 26 +++++++++++++------ >> .../wireless/marvell/mwifiex/pcie_quirks.h | 1 + >> 3 files changed, 31 insertions(+), 8 deletions(-) >> >> diff --git a/drivers/net/wireless/marvell/mwifiex/pcie.c b/drivers/net/wireless/marvell/mwifiex/pcie.c >> index c6ccce426b49..2506e7e49f0c 100644 >> --- a/drivers/net/wireless/marvell/mwifiex/pcie.c >> +++ b/drivers/net/wireless/marvell/mwifiex/pcie.c >> @@ -1748,9 +1748,21 @@ mwifiex_pcie_send_boot_cmd(struct mwifiex_adapter *adapter, struct sk_buff *skb) >> static int mwifiex_pcie_init_fw_port(struct mwifiex_adapter *adapter) >> { >> struct pcie_service_card *card = adapter->card; >> + struct pci_dev *pdev = card->dev; >> + struct pci_dev *parent_pdev = pci_upstream_bridge(pdev); >> const struct mwifiex_pcie_card_reg *reg = card->pcie.reg; >> int tx_wrap = card->txbd_wrptr & reg->tx_wrap_mask; >> >> + /* Trigger a function level reset of the PCI bridge device, this makes >> + * the firmware (latest version 15.68.19.p21) of the 88W8897 PCIe+USB >> + * card stop reporting a fixed LTR value that prevents the system from >> + * entering package C10 and S0ix powersaving states. > > I don't believe this. Why would resetting the root port change what > the downstream device reports via LTR messages? > > From PCIe r5.0, sec 5.5.1: > > The following rules define how the L1.1 and L1.2 substates are entered: > ... > * When in ASPM L1.0 and the ASPM L1.2 Enable bit is Set, the L1.2 > substate must be entered when CLKREQ# is deasserted and all of > the following conditions are true: > > - The reported snooped LTR value last sent or received by this > Port is greater than or equal to the value set by the > LTR_L1.2_THRESHOLD Value and Scale fields, or there is no > snoop service latency requirement. > > - The reported non-snooped LTR last sent or received by this > Port value is greater than or equal to the value set by the > LTR_L1.2_THRESHOLD Value and Scale fields, or there is no > non-snoop service latency requirement. > > From the LTR Message format in sec 6.18: > > No-Snoop Latency and Snoop Latency: As shown in Figure 6-15, these > fields include a Requirement bit that indicates if the device has a > latency requirement for the given type of Request. If the > Requirement bit is Set, the LatencyValue and LatencyScale fields > describe the latency requirement. If the Requirement bit is Clear, > there is no latency requirement and the LatencyValue and > LatencyScale fields are ignored. > > Resetting the root port might make it forget the LTR value it last > received. If that's equivalent to having no service latency > requirement, it *might* enable L1.2 entry, although that doesn't seem > equivalent to the downstream device having sent an LTR message with > the Requirement bit cleared. > > I think the endpoint is required to send a new LTR message before it > goes to a non-D0 state (sec 6.18), so the bridge will capture the > latency again, and we'll probably be back in the same state. Indeed that happens when suspending the device, after resuming the LTR value is back to the initial value. mwifiex_pcie_init_fw_port() is executed on resume, too though (I should probably have mentioned this in the commit message, will do in v2), so this is taken care of. While suspended, the device goes into D3 anyway and S0ix is achieved regardless of the LTR value. > > This all seems fragile to me. If we force the link to L1.2 without > knowing accurate exit latencies and latency tolerance, the device is > liable to drop packets. Yeah, I'm not saying this patch isn't an ugly hack... What I can say though is that this patch has been running in the linux-surface (https://github.com/linux-surface/kernel/pull/72) kernel for a few months now, and so far we've only received positive feedback. There's two alternatives I can think of to deal with this issue: 1) Revert the cards firmware in linux-firmware back to the second-latest version. That firmware didn't report a fixed LTR value and also doesn't have any other obvious issues I know of compared to the latest one. 2) Somehow interact with the PMC Core driver to make it ignore the LTR values reported by the card (I doubt that's possible from mwifiex). It can be done manually via debugfs by writing to /sys/kernel/debug/pmc_core/ltr_ignore. > >> + * We need to do it here because it must happen after firmware >> + * initialization and this function is called right after that is done. >> + */ >> + if (card->quirks & QUIRK_DO_FLR_ON_BRIDGE) >> + pci_reset_function(parent_pdev); > > PCIe r5.0, sec 7.5.3.3, says Function Level Reset can only be > supported by endpoints, so I guess this will actually do some other > kind of reset. Interesting, I briefly searched and it doesn't seem like think there's public documentation available by Intel that goes into the specifics here, maybe someone working at Intel knows more? > >> /* Write the RX ring read pointer in to reg->rx_rdptr */ >> if (mwifiex_write_reg(adapter, reg->rx_rdptr, card->rxbd_rdptr | >> tx_wrap)) { >> diff --git a/drivers/net/wireless/marvell/mwifiex/pcie_quirks.c b/drivers/net/wireless/marvell/mwifiex/pcie_quirks.c >> index 0234cf3c2974..cbf0565353ae 100644 >> --- a/drivers/net/wireless/marvell/mwifiex/pcie_quirks.c >> +++ b/drivers/net/wireless/marvell/mwifiex/pcie_quirks.c >> @@ -27,7 +27,8 @@ static const struct dmi_system_id mwifiex_quirk_table[] = { >> DMI_EXACT_MATCH(DMI_SYS_VENDOR, "Microsoft Corporation"), >> DMI_EXACT_MATCH(DMI_PRODUCT_NAME, "Surface Pro 4"), >> }, >> - .driver_data = (void *)QUIRK_FW_RST_D3COLD, >> + .driver_data = (void *)(QUIRK_FW_RST_D3COLD | >> + QUIRK_DO_FLR_ON_BRIDGE), >> }, >> { >> .ident = "Surface Pro 5", >> @@ -36,7 +37,8 @@ static const struct dmi_system_id mwifiex_quirk_table[] = { >> DMI_EXACT_MATCH(DMI_SYS_VENDOR, "Microsoft Corporation"), >> DMI_EXACT_MATCH(DMI_PRODUCT_SKU, "Surface_Pro_1796"), >> }, >> - .driver_data = (void *)QUIRK_FW_RST_D3COLD, >> + .driver_data = (void *)(QUIRK_FW_RST_D3COLD | >> + QUIRK_DO_FLR_ON_BRIDGE), >> }, >> { >> .ident = "Surface Pro 5 (LTE)", >> @@ -45,7 +47,8 @@ static const struct dmi_system_id mwifiex_quirk_table[] = { >> DMI_EXACT_MATCH(DMI_SYS_VENDOR, "Microsoft Corporation"), >> DMI_EXACT_MATCH(DMI_PRODUCT_SKU, "Surface_Pro_1807"), >> }, >> - .driver_data = (void *)QUIRK_FW_RST_D3COLD, >> + .driver_data = (void *)(QUIRK_FW_RST_D3COLD | >> + QUIRK_DO_FLR_ON_BRIDGE), >> }, >> { >> .ident = "Surface Pro 6", >> @@ -53,7 +56,8 @@ static const struct dmi_system_id mwifiex_quirk_table[] = { >> DMI_EXACT_MATCH(DMI_SYS_VENDOR, "Microsoft Corporation"), >> DMI_EXACT_MATCH(DMI_PRODUCT_NAME, "Surface Pro 6"), >> }, >> - .driver_data = (void *)QUIRK_FW_RST_D3COLD, >> + .driver_data = (void *)(QUIRK_FW_RST_D3COLD | >> + QUIRK_DO_FLR_ON_BRIDGE), >> }, >> { >> .ident = "Surface Book 1", >> @@ -61,7 +65,8 @@ static const struct dmi_system_id mwifiex_quirk_table[] = { >> DMI_EXACT_MATCH(DMI_SYS_VENDOR, "Microsoft Corporation"), >> DMI_EXACT_MATCH(DMI_PRODUCT_NAME, "Surface Book"), >> }, >> - .driver_data = (void *)QUIRK_FW_RST_D3COLD, >> + .driver_data = (void *)(QUIRK_FW_RST_D3COLD | >> + QUIRK_DO_FLR_ON_BRIDGE), >> }, >> { >> .ident = "Surface Book 2", >> @@ -69,7 +74,8 @@ static const struct dmi_system_id mwifiex_quirk_table[] = { >> DMI_EXACT_MATCH(DMI_SYS_VENDOR, "Microsoft Corporation"), >> DMI_EXACT_MATCH(DMI_PRODUCT_NAME, "Surface Book 2"), >> }, >> - .driver_data = (void *)QUIRK_FW_RST_D3COLD, >> + .driver_data = (void *)(QUIRK_FW_RST_D3COLD | >> + QUIRK_DO_FLR_ON_BRIDGE), >> }, >> { >> .ident = "Surface Laptop 1", >> @@ -77,7 +83,8 @@ static const struct dmi_system_id mwifiex_quirk_table[] = { >> DMI_EXACT_MATCH(DMI_SYS_VENDOR, "Microsoft Corporation"), >> DMI_EXACT_MATCH(DMI_PRODUCT_NAME, "Surface Laptop"), >> }, >> - .driver_data = (void *)QUIRK_FW_RST_D3COLD, >> + .driver_data = (void *)(QUIRK_FW_RST_D3COLD | >> + QUIRK_DO_FLR_ON_BRIDGE), >> }, >> { >> .ident = "Surface Laptop 2", >> @@ -85,7 +92,8 @@ static const struct dmi_system_id mwifiex_quirk_table[] = { >> DMI_EXACT_MATCH(DMI_SYS_VENDOR, "Microsoft Corporation"), >> DMI_EXACT_MATCH(DMI_PRODUCT_NAME, "Surface Laptop 2"), >> }, >> - .driver_data = (void *)QUIRK_FW_RST_D3COLD, >> + .driver_data = (void *)(QUIRK_FW_RST_D3COLD | >> + QUIRK_DO_FLR_ON_BRIDGE), >> }, >> {} >> }; >> @@ -103,6 +111,8 @@ void mwifiex_initialize_quirks(struct pcie_service_card *card) >> dev_info(&pdev->dev, "no quirks enabled\n"); >> if (card->quirks & QUIRK_FW_RST_D3COLD) >> dev_info(&pdev->dev, "quirk reset_d3cold enabled\n"); >> + if (card->quirks & QUIRK_DO_FLR_ON_BRIDGE) >> + dev_info(&pdev->dev, "quirk do_flr_on_bridge enabled\n"); >> } >> >> static void mwifiex_pcie_set_power_d3cold(struct pci_dev *pdev) >> diff --git a/drivers/net/wireless/marvell/mwifiex/pcie_quirks.h b/drivers/net/wireless/marvell/mwifiex/pcie_quirks.h >> index 8ec4176d698f..f8d463f4269a 100644 >> --- a/drivers/net/wireless/marvell/mwifiex/pcie_quirks.h >> +++ b/drivers/net/wireless/marvell/mwifiex/pcie_quirks.h >> @@ -18,6 +18,7 @@ >> #include "pcie.h" >> >> #define QUIRK_FW_RST_D3COLD BIT(0) >> +#define QUIRK_DO_FLR_ON_BRIDGE BIT(1) >> >> void mwifiex_initialize_quirks(struct pcie_service_card *card); >> int mwifiex_pcie_reset_d3cold_quirk(struct pci_dev *pdev); >> -- >> 2.31.1 >>
On 10/11/21 19:02, Pali Rohár wrote: > On Monday 11 October 2021 15:42:38 Jonas Dreßler wrote: >> The most recent firmware (15.68.19.p21) of the 88W8897 PCIe+USB card >> reports a hardcoded LTR value to the system during initialization, >> probably as an (unsuccessful) attempt of the developers to fix firmware >> crashes. This LTR value prevents most of the Microsoft Surface devices >> from entering deep powersaving states (either platform C-State 10 or >> S0ix state), because the exit latency of that state would be higher than >> what the card can tolerate. > > This description looks like a generic issue in 88W8897 chip or its > firmware and not something to Surface PCIe controller or Surface HW. But > please correct me if I'm wrong here. > > Has somebody 88W8897-based PCIe card in non-Surface device and can check > or verify if this issue happens also outside of the Surface device? > > It would be really nice to know if this is issue in Surface or in 8897. > Fairly sure the LTR value is something that's reported by the firmware and will be the same on all 8897 devices (as mentioned in my reply to Bjorn the second-latest firmware doesn't report that fixed LTR value). The thing is I'm not sure if this hack works fine on non-Surface devices or maybe breaks things there (I guess the change had some effect on Marvells test platform at least), so this is simply the minimum risk approach. >> Turns out the card works just the same (including the firmware crashes) >> no matter if that hardcoded LTR value is reported or not, so it's kind >> of useless and only prevents us from saving power. >> >> To get rid of those hardcoded LTR requirements, it's possible to reset >> the PCI bridge device after initializing the cards firmware. I'm not >> exactly sure why that works, maybe the power management subsystem of the >> PCH resets its stored LTR values when doing a function level reset of >> the bridge device. Doing the reset once after starting the wifi firmware >> works very well, probably because the firmware only reports that LTR >> value a single time during firmware startup. >> >> Signed-off-by: Jonas Dreßler <verdre@v0yd.nl> >> --- >> drivers/net/wireless/marvell/mwifiex/pcie.c | 12 +++++++++ >> .../wireless/marvell/mwifiex/pcie_quirks.c | 26 +++++++++++++------ >> .../wireless/marvell/mwifiex/pcie_quirks.h | 1 + >> 3 files changed, 31 insertions(+), 8 deletions(-) >> >> diff --git a/drivers/net/wireless/marvell/mwifiex/pcie.c b/drivers/net/wireless/marvell/mwifiex/pcie.c >> index c6ccce426b49..2506e7e49f0c 100644 >> --- a/drivers/net/wireless/marvell/mwifiex/pcie.c >> +++ b/drivers/net/wireless/marvell/mwifiex/pcie.c >> @@ -1748,9 +1748,21 @@ mwifiex_pcie_send_boot_cmd(struct mwifiex_adapter *adapter, struct sk_buff *skb) >> static int mwifiex_pcie_init_fw_port(struct mwifiex_adapter *adapter) >> { >> struct pcie_service_card *card = adapter->card; >> + struct pci_dev *pdev = card->dev; >> + struct pci_dev *parent_pdev = pci_upstream_bridge(pdev); >> const struct mwifiex_pcie_card_reg *reg = card->pcie.reg; >> int tx_wrap = card->txbd_wrptr & reg->tx_wrap_mask; >> >> + /* Trigger a function level reset of the PCI bridge device, this makes >> + * the firmware (latest version 15.68.19.p21) of the 88W8897 PCIe+USB >> + * card stop reporting a fixed LTR value that prevents the system from >> + * entering package C10 and S0ix powersaving states. >> + * We need to do it here because it must happen after firmware >> + * initialization and this function is called right after that is done. >> + */ >> + if (card->quirks & QUIRK_DO_FLR_ON_BRIDGE) >> + pci_reset_function(parent_pdev); >> + >> /* Write the RX ring read pointer in to reg->rx_rdptr */ >> if (mwifiex_write_reg(adapter, reg->rx_rdptr, card->rxbd_rdptr | >> tx_wrap)) { >> diff --git a/drivers/net/wireless/marvell/mwifiex/pcie_quirks.c b/drivers/net/wireless/marvell/mwifiex/pcie_quirks.c >> index 0234cf3c2974..cbf0565353ae 100644 >> --- a/drivers/net/wireless/marvell/mwifiex/pcie_quirks.c >> +++ b/drivers/net/wireless/marvell/mwifiex/pcie_quirks.c >> @@ -27,7 +27,8 @@ static const struct dmi_system_id mwifiex_quirk_table[] = { >> DMI_EXACT_MATCH(DMI_SYS_VENDOR, "Microsoft Corporation"), >> DMI_EXACT_MATCH(DMI_PRODUCT_NAME, "Surface Pro 4"), >> }, >> - .driver_data = (void *)QUIRK_FW_RST_D3COLD, >> + .driver_data = (void *)(QUIRK_FW_RST_D3COLD | >> + QUIRK_DO_FLR_ON_BRIDGE), >> }, >> { >> .ident = "Surface Pro 5", >> @@ -36,7 +37,8 @@ static const struct dmi_system_id mwifiex_quirk_table[] = { >> DMI_EXACT_MATCH(DMI_SYS_VENDOR, "Microsoft Corporation"), >> DMI_EXACT_MATCH(DMI_PRODUCT_SKU, "Surface_Pro_1796"), >> }, >> - .driver_data = (void *)QUIRK_FW_RST_D3COLD, >> + .driver_data = (void *)(QUIRK_FW_RST_D3COLD | >> + QUIRK_DO_FLR_ON_BRIDGE), >> }, >> { >> .ident = "Surface Pro 5 (LTE)", >> @@ -45,7 +47,8 @@ static const struct dmi_system_id mwifiex_quirk_table[] = { >> DMI_EXACT_MATCH(DMI_SYS_VENDOR, "Microsoft Corporation"), >> DMI_EXACT_MATCH(DMI_PRODUCT_SKU, "Surface_Pro_1807"), >> }, >> - .driver_data = (void *)QUIRK_FW_RST_D3COLD, >> + .driver_data = (void *)(QUIRK_FW_RST_D3COLD | >> + QUIRK_DO_FLR_ON_BRIDGE), >> }, >> { >> .ident = "Surface Pro 6", >> @@ -53,7 +56,8 @@ static const struct dmi_system_id mwifiex_quirk_table[] = { >> DMI_EXACT_MATCH(DMI_SYS_VENDOR, "Microsoft Corporation"), >> DMI_EXACT_MATCH(DMI_PRODUCT_NAME, "Surface Pro 6"), >> }, >> - .driver_data = (void *)QUIRK_FW_RST_D3COLD, >> + .driver_data = (void *)(QUIRK_FW_RST_D3COLD | >> + QUIRK_DO_FLR_ON_BRIDGE), >> }, >> { >> .ident = "Surface Book 1", >> @@ -61,7 +65,8 @@ static const struct dmi_system_id mwifiex_quirk_table[] = { >> DMI_EXACT_MATCH(DMI_SYS_VENDOR, "Microsoft Corporation"), >> DMI_EXACT_MATCH(DMI_PRODUCT_NAME, "Surface Book"), >> }, >> - .driver_data = (void *)QUIRK_FW_RST_D3COLD, >> + .driver_data = (void *)(QUIRK_FW_RST_D3COLD | >> + QUIRK_DO_FLR_ON_BRIDGE), >> }, >> { >> .ident = "Surface Book 2", >> @@ -69,7 +74,8 @@ static const struct dmi_system_id mwifiex_quirk_table[] = { >> DMI_EXACT_MATCH(DMI_SYS_VENDOR, "Microsoft Corporation"), >> DMI_EXACT_MATCH(DMI_PRODUCT_NAME, "Surface Book 2"), >> }, >> - .driver_data = (void *)QUIRK_FW_RST_D3COLD, >> + .driver_data = (void *)(QUIRK_FW_RST_D3COLD | >> + QUIRK_DO_FLR_ON_BRIDGE), >> }, >> { >> .ident = "Surface Laptop 1", >> @@ -77,7 +83,8 @@ static const struct dmi_system_id mwifiex_quirk_table[] = { >> DMI_EXACT_MATCH(DMI_SYS_VENDOR, "Microsoft Corporation"), >> DMI_EXACT_MATCH(DMI_PRODUCT_NAME, "Surface Laptop"), >> }, >> - .driver_data = (void *)QUIRK_FW_RST_D3COLD, >> + .driver_data = (void *)(QUIRK_FW_RST_D3COLD | >> + QUIRK_DO_FLR_ON_BRIDGE), >> }, >> { >> .ident = "Surface Laptop 2", >> @@ -85,7 +92,8 @@ static const struct dmi_system_id mwifiex_quirk_table[] = { >> DMI_EXACT_MATCH(DMI_SYS_VENDOR, "Microsoft Corporation"), >> DMI_EXACT_MATCH(DMI_PRODUCT_NAME, "Surface Laptop 2"), >> }, >> - .driver_data = (void *)QUIRK_FW_RST_D3COLD, >> + .driver_data = (void *)(QUIRK_FW_RST_D3COLD | >> + QUIRK_DO_FLR_ON_BRIDGE), >> }, >> {} >> }; >> @@ -103,6 +111,8 @@ void mwifiex_initialize_quirks(struct pcie_service_card *card) >> dev_info(&pdev->dev, "no quirks enabled\n"); >> if (card->quirks & QUIRK_FW_RST_D3COLD) >> dev_info(&pdev->dev, "quirk reset_d3cold enabled\n"); >> + if (card->quirks & QUIRK_DO_FLR_ON_BRIDGE) >> + dev_info(&pdev->dev, "quirk do_flr_on_bridge enabled\n"); >> } >> >> static void mwifiex_pcie_set_power_d3cold(struct pci_dev *pdev) >> diff --git a/drivers/net/wireless/marvell/mwifiex/pcie_quirks.h b/drivers/net/wireless/marvell/mwifiex/pcie_quirks.h >> index 8ec4176d698f..f8d463f4269a 100644 >> --- a/drivers/net/wireless/marvell/mwifiex/pcie_quirks.h >> +++ b/drivers/net/wireless/marvell/mwifiex/pcie_quirks.h >> @@ -18,6 +18,7 @@ >> #include "pcie.h" >> >> #define QUIRK_FW_RST_D3COLD BIT(0) >> +#define QUIRK_DO_FLR_ON_BRIDGE BIT(1) >> >> void mwifiex_initialize_quirks(struct pcie_service_card *card); >> int mwifiex_pcie_reset_d3cold_quirk(struct pci_dev *pdev); >> -- >> 2.31.1 >>
On Tuesday 12 October 2021 10:48:49 Jonas Dreßler wrote: > 1) Revert the cards firmware in linux-firmware back to the second-latest > version. That firmware didn't report a fixed LTR value and also doesn't > have any other obvious issues I know of compared to the latest one. FYI, there are new bugs with new firmware versions for 8997 sent by NXP to linux-firmware repository... and questions what to do with it. Seems that NXP again do not respond to any questions after new version was merged into linux-firmware repo. https://lore.kernel.org/linux-firmware/edeb34bc-7c85-7f1d-79e4-e3e21df86334@gk2.net/ So firmware revert also for other ex-Marvell / NXP chips is not something which could not happen.
On 10/12/21 11:00, Pali Rohár wrote: > On Tuesday 12 October 2021 10:48:49 Jonas Dreßler wrote: >> 1) Revert the cards firmware in linux-firmware back to the second-latest >> version. That firmware didn't report a fixed LTR value and also doesn't >> have any other obvious issues I know of compared to the latest one. > > FYI, there are new bugs with new firmware versions for 8997 sent by NXP > to linux-firmware repository... and questions what to do with it. Seems > that NXP again do not respond to any questions after new version was > merged into linux-firmware repo. > > https://lore.kernel.org/linux-firmware/edeb34bc-7c85-7f1d-79e4-e3e21df86334@gk2.net/ > > So firmware revert also for other ex-Marvell / NXP chips is not > something which could not happen. > Argh, nevermind, it seems like my memory is fooling me once again, sorry. I just tried the older firmware and I was completely wrong, there's no difference at all between the versions when it comes to LTR messages. So there goes alternative 1). Something interesting and reassuring I noticed though: After resuming from suspend the firmware actually doesn't send a new LTR message, which means now the LTR is 0 and we enter PC10/S0ix just fine. So basically the change this patch does is already in effect, just after one suspend/resume cycle. That gives me more confidence that we should maybe apply this patch for all 8897 devices, not only Surface devices?
On Tue, Oct 12, 2021 at 10:48:49AM +0200, Jonas Dreßler wrote: > On 10/11/21 18:53, Bjorn Helgaas wrote: > > On Mon, Oct 11, 2021 at 03:42:38PM +0200, Jonas Dreßler wrote: > > > The most recent firmware (15.68.19.p21) of the 88W8897 PCIe+USB card > > > reports a hardcoded LTR value to the system during initialization, > > > probably as an (unsuccessful) attempt of the developers to fix firmware > > > crashes. This LTR value prevents most of the Microsoft Surface devices > > > from entering deep powersaving states (either platform C-State 10 or > > > S0ix state), because the exit latency of that state would be higher than > > > what the card can tolerate. > > > > S0ix and C-State 10 are ACPI concepts that don't mean anything in a > > PCIe context. > > > > I think LTR is only involved in deciding whether to enter the ASPM > > L1.2 substate. Maybe the system will only enter C-State 10 or S0ix > > when the link is in L1.2? > > Yup, this is indeed the case, see https://01.org/blogs/qwang59/2020/linux-s0ix-troubleshooting > (ctrl+f "IP LINK PM STATE"). I think it would be helpful if the commit log included this missing link, e.g., the LTR value prevents the link from going to L1.2, which in turn prevents use of C-State 10/S0ix. > There's two alternatives I can think of to deal with this issue: > > 1) Revert the cards firmware in linux-firmware back to the second-latest > version. That firmware didn't report a fixed LTR value and also doesn't > have any other obvious issues I know of compared to the latest one. You've mentioned "fixed LTR value" more than once. My weak understanding of LTR and L1.2 is that the latencies a device reports via LTR messages are essentially a function of buffering in the device and electrical characteristics of the link. I expect them to be set once and not changed. But did the previous firmware report different latencies at different times? Or did it just not advertise L1.2 support at all? Or do you mean the new firmware reports a "corrected" LTR value that doesn't work as well? > 2) Somehow interact with the PMC Core driver to make it ignore the LTR > values reported by the card (I doubt that's possible from mwifiex). > It can be done manually via debugfs by writing to > /sys/kernel/debug/pmc_core/ltr_ignore. Interesting; I wasn't aware of that, thanks. This still feels like a configuration issue. If we ignore the reported LTR values, I guess you mean the root port assumes it's *always* safe to enter L1.2, i.e., the device has enough buffering to deal with the exit latency? I would think there would be a way to program the LTR capability to have the device itself report that, so we wouldn't have to fiddle with the upstream end. > > > + * We need to do it here because it must happen after firmware > > > + * initialization and this function is called right after that is done. > > > + */ > > > + if (card->quirks & QUIRK_DO_FLR_ON_BRIDGE) > > > + pci_reset_function(parent_pdev); > > > > PCIe r5.0, sec 7.5.3.3, says Function Level Reset can only be > > supported by endpoints, so I guess this will actually do some other > > kind of reset. > > Interesting, I briefly searched and it doesn't seem like think > there's public documentation available by Intel that goes into > the specifics here, maybe someone working at Intel knows more? "lspci -vv" will tell you whether the root port advertises FLR support. The spec says it shouldn't, but I think pci_reset_function() relies on what DevCap says. You could instrument pci_reset_function() to see exactly what kind of reset we do. Bjorn
[+cc Vidya, Victor, ASPM L1.2 config issue; beginning of thread: https://lore.kernel.org/all/20211011134238.16551-1-verdre@v0yd.nl/] On Tue, Oct 12, 2021 at 10:55:03AM +0200, Jonas Dreßler wrote: > On 10/11/21 19:02, Pali Rohár wrote: > > On Monday 11 October 2021 15:42:38 Jonas Dreßler wrote: > > > The most recent firmware (15.68.19.p21) of the 88W8897 PCIe+USB card > > > reports a hardcoded LTR value to the system during initialization, > > > probably as an (unsuccessful) attempt of the developers to fix firmware > > > crashes. This LTR value prevents most of the Microsoft Surface devices > > > from entering deep powersaving states (either platform C-State 10 or > > > S0ix state), because the exit latency of that state would be higher than > > > what the card can tolerate. > > > > This description looks like a generic issue in 88W8897 chip or its > > firmware and not something to Surface PCIe controller or Surface HW. But > > please correct me if I'm wrong here. > > > > Has somebody 88W8897-based PCIe card in non-Surface device and can check > > or verify if this issue happens also outside of the Surface device? > > > > It would be really nice to know if this is issue in Surface or in 8897. > > Fairly sure the LTR value is something that's reported by the firmware > and will be the same on all 8897 devices (as mentioned in my reply to Bjorn > the second-latest firmware doesn't report that fixed LTR value). I suggested earlier that the LTR values reported by the device might depend on the electrical characteristics of the link and hence be platform-dependent, but I think that might be wrong. The spec (PCIe r5.0, sec 5.5.4) does say that some of the *other* parameters related to L1.2 entry are platform-dependent: Prior to setting either or both of the enable bits for L1.2, the values for TPOWER_ON, Common_Mode_Restore_Time, and, if the ASPM L1.2 Enable bit is to be Set, the LTR_L1.2_THRESHOLD (both Value and Scale fields) must be programmed. The TPOWER_ON and Common_Mode_Restore_Time fields must be programmed to the appropriate values based on the components and AC coupling capacitors used in the connection linking the two components. The determination of these values is design implementation specific. These T_POWER_ON, Common_Mode_Restore_Time, and LTR_L1.2_THRESHOLD values are in the L1 PM Substates Control registers. I don't know of a way for the kernel or the device firmware to learn these circuit characteristics or the appropriate values, so I think only system firmware can program the L1 PM Substates Control registers (a corollary of this is that I don't see a way for hot-plugged devices to *ever* use L1.2). I wonder if this reset quirk works because pci_reset_function() saves and restores much of config space, but it currently does *not* restore the L1 PM Substates capability, so those T_POWER_ON, Common_Mode_Restore_Time, and LTR_L1.2_THRESHOLD values probably get cleared out by the reset. We did briefly save/restore it [1], but we had to revert that because of a regression that AFAIK was never resolved [2]. I expect we will eventually save/restore this, so if the quirk depends on it *not* being restored, that would be a problem. You should be able to test whether this is the critical thing by clearing those registers with setpci instead of doing the reset. Per spec, they can only be modified when L1.2 is disabled, so you would have to disable it via sysfs (for the endpoint, I think) /sys/.../l1_2_aspm and /sys/.../l1_2_pcipm, do the setpci on the root port, then re-enable L1.2. [1] https://git.kernel.org/linus/4257f7e008ea [2] https://lore.kernel.org/all/20210127160449.2990506-1-helgaas@kernel.org/
On 10/12/21 13:29, Bjorn Helgaas wrote: > On Tue, Oct 12, 2021 at 10:48:49AM +0200, Jonas Dreßler wrote: >> On 10/11/21 18:53, Bjorn Helgaas wrote: >>> On Mon, Oct 11, 2021 at 03:42:38PM +0200, Jonas Dreßler wrote: >>>> The most recent firmware (15.68.19.p21) of the 88W8897 PCIe+USB card >>>> reports a hardcoded LTR value to the system during initialization, >>>> probably as an (unsuccessful) attempt of the developers to fix firmware >>>> crashes. This LTR value prevents most of the Microsoft Surface devices >>>> from entering deep powersaving states (either platform C-State 10 or >>>> S0ix state), because the exit latency of that state would be higher than >>>> what the card can tolerate. >>> >>> S0ix and C-State 10 are ACPI concepts that don't mean anything in a >>> PCIe context. >>> >>> I think LTR is only involved in deciding whether to enter the ASPM >>> L1.2 substate. Maybe the system will only enter C-State 10 or S0ix >>> when the link is in L1.2? >> >> Yup, this is indeed the case, see https://01.org/blogs/qwang59/2020/linux-s0ix-troubleshooting >> (ctrl+f "IP LINK PM STATE"). > > I think it would be helpful if the commit log included this missing > link, e.g., the LTR value prevents the link from going to L1.2, which > in turn prevents use of C-State 10/S0ix. > >> There's two alternatives I can think of to deal with this issue: >> >> 1) Revert the cards firmware in linux-firmware back to the second-latest >> version. That firmware didn't report a fixed LTR value and also doesn't >> have any other obvious issues I know of compared to the latest one. > > You've mentioned "fixed LTR value" more than once. My weak > understanding of LTR and L1.2 is that the latencies a device reports > via LTR messages are essentially a function of buffering in the device > and electrical characteristics of the link. I expect them to be set > once and not changed. I'm not an expert on PCI at all, but from my understanding the idea behind the LTR mechanism is to be able to dynamically communicate latency requirements depending on the current situation/workload. So for example in case the wifi card is receiving a lot of data, it would report a lower latency tolerance because its rx buffers are filling up fast. Looking at ltr_show in the pmc_core debug driver, that also seems to be how other devices handle it: For example moving the USB mouse makes the XHCI controller report a non-null LTR for a few seconds. So with "fixed LTR value" I meant to say that in my understanding this is exactly the opposite of how LTR is supposed to be used: Instead of dynamically reporting a new latency tolerance when the card is being used, it reports a "fixed" tolerance once during firmware startup, maybe with the intention of papering over bugs in the firmware... > > But did the previous firmware report different latencies at different > times? Or did it just not advertise L1.2 support at all? Or do you > mean the new firmware reports a "corrected" LTR value that doesn't > work as well? Sorry, as mentioned in my other reply the previous firmware is actually behaving in the exact same way, no clue why I remembered this wrong. > >> 2) Somehow interact with the PMC Core driver to make it ignore the LTR >> values reported by the card (I doubt that's possible from mwifiex). >> It can be done manually via debugfs by writing to >> /sys/kernel/debug/pmc_core/ltr_ignore. > > Interesting; I wasn't aware of that, thanks. This still feels like a > configuration issue. If we ignore the reported LTR values, I guess > you mean the root port assumes it's *always* safe to enter L1.2, i.e., > the device has enough buffering to deal with the exit latency? Not sure about that, in theory there's also the whole negotiation via the CLKREQ# pin when entering ASPM L1.2. The card could use that to reject L1.2 entry I guess. > > I would think there would be a way to program the LTR capability to > have the device itself report that, so we wouldn't have to fiddle with > the upstream end. Well I mean the device does report LTR capabilities and a maximum snoop and non-snoop latency via extended capabilities, so I guess that means it is supported. > >>>> + * We need to do it here because it must happen after firmware >>>> + * initialization and this function is called right after that is done. >>>> + */ > >>>> + if (card->quirks & QUIRK_DO_FLR_ON_BRIDGE) >>>> + pci_reset_function(parent_pdev); >>> >>> PCIe r5.0, sec 7.5.3.3, says Function Level Reset can only be >>> supported by endpoints, so I guess this will actually do some other >>> kind of reset. >> >> Interesting, I briefly searched and it doesn't seem like think >> there's public documentation available by Intel that goes into >> the specifics here, maybe someone working at Intel knows more? > > "lspci -vv" will tell you whether the root port advertises FLR > support. The spec says it shouldn't, but I think pci_reset_function() > relies on what DevCap says. You could instrument pci_reset_function() > to see exactly what kind of reset we do. Ahh indeed, I wasn't aware there's multiple kinds of resets. Looks like what it uses is pci_pm_reset(). > > Bjorn >
On 10/12/21 17:39, Bjorn Helgaas wrote: > [+cc Vidya, Victor, ASPM L1.2 config issue; beginning of thread: > https://lore.kernel.org/all/20211011134238.16551-1-verdre@v0yd.nl/] > > On Tue, Oct 12, 2021 at 10:55:03AM +0200, Jonas Dreßler wrote: >> On 10/11/21 19:02, Pali Rohár wrote: >>> On Monday 11 October 2021 15:42:38 Jonas Dreßler wrote: >>>> The most recent firmware (15.68.19.p21) of the 88W8897 PCIe+USB card >>>> reports a hardcoded LTR value to the system during initialization, >>>> probably as an (unsuccessful) attempt of the developers to fix firmware >>>> crashes. This LTR value prevents most of the Microsoft Surface devices >>>> from entering deep powersaving states (either platform C-State 10 or >>>> S0ix state), because the exit latency of that state would be higher than >>>> what the card can tolerate. >>> >>> This description looks like a generic issue in 88W8897 chip or its >>> firmware and not something to Surface PCIe controller or Surface HW. But >>> please correct me if I'm wrong here. >>> >>> Has somebody 88W8897-based PCIe card in non-Surface device and can check >>> or verify if this issue happens also outside of the Surface device? >>> >>> It would be really nice to know if this is issue in Surface or in 8897. >> >> Fairly sure the LTR value is something that's reported by the firmware >> and will be the same on all 8897 devices (as mentioned in my reply to Bjorn >> the second-latest firmware doesn't report that fixed LTR value). > > I suggested earlier that the LTR values reported by the device might > depend on the electrical characteristics of the link and hence be > platform-dependent, but I think that might be wrong. > > The spec (PCIe r5.0, sec 5.5.4) does say that some of the *other* > parameters related to L1.2 entry are platform-dependent: > > Prior to setting either or both of the enable bits for L1.2, the > values for TPOWER_ON, Common_Mode_Restore_Time, and, if the ASPM > L1.2 Enable bit is to be Set, the LTR_L1.2_THRESHOLD (both Value > and Scale fields) must be programmed. The TPOWER_ON and > Common_Mode_Restore_Time fields must be programmed to the > appropriate values based on the components and AC coupling > capacitors used in the connection linking the two components. The > determination of these values is design implementation specific. > > These T_POWER_ON, Common_Mode_Restore_Time, and LTR_L1.2_THRESHOLD > values are in the L1 PM Substates Control registers. > > I don't know of a way for the kernel or the device firmware to learn > these circuit characteristics or the appropriate values, so I think > only system firmware can program the L1 PM Substates Control registers > (a corollary of this is that I don't see a way for hot-plugged devices > to *ever* use L1.2). > > I wonder if this reset quirk works because pci_reset_function() saves > and restores much of config space, but it currently does *not* restore > the L1 PM Substates capability, so those T_POWER_ON, > Common_Mode_Restore_Time, and LTR_L1.2_THRESHOLD values probably get > cleared out by the reset. We did briefly save/restore it [1], but we > had to revert that because of a regression that AFAIK was never > resolved [2]. I expect we will eventually save/restore this, so if > the quirk depends on it *not* being restored, that would be a problem. > > You should be able to test whether this is the critical thing by > clearing those registers with setpci instead of doing the reset. Per > spec, they can only be modified when L1.2 is disabled, so you would > have to disable it via sysfs (for the endpoint, I think) > /sys/.../l1_2_aspm and /sys/.../l1_2_pcipm, do the setpci on the root > port, then re-enable L1.2. > > [1] https://git.kernel.org/linus/4257f7e008ea > [2] https://lore.kernel.org/all/20210127160449.2990506-1-helgaas@kernel.org/ > Hmm, interesting, thanks for those links. Are you sure the config values will get lost on the reset? If we only reset the port by going into D3hot and back into D0, the device will remain powered and won't lose the config space, will it? Because when I reset the bridge using pci_reset_function() (ie. pci_pm_reset()) or when I suspend and resume the laptop, all the L1 PM Substates registers are still the same as before, nothing is lost. That said, our new mwifiex_pcie_reset_d3cold_quirk() puts *both the card and the bridge* into D3cold, so I gave that a try, and indeed the cards L1 Substate Ctl registers are cleared out (so T_CommonMode, LTR1.2_Threshold and T_PwrOn), but the bridge still has its values, no clue why that's the case.
Jonas Dreßler <verdre@v0yd.nl> wrote: > The most recent firmware (15.68.19.p21) of the 88W8897 PCIe+USB card > reports a hardcoded LTR value to the system during initialization, > probably as an (unsuccessful) attempt of the developers to fix firmware > crashes. This LTR value prevents most of the Microsoft Surface devices > from entering deep powersaving states (either platform C-State 10 or > S0ix state), because the exit latency of that state would be higher than > what the card can tolerate. > > Turns out the card works just the same (including the firmware crashes) > no matter if that hardcoded LTR value is reported or not, so it's kind > of useless and only prevents us from saving power. > > To get rid of those hardcoded LTR requirements, it's possible to reset > the PCI bridge device after initializing the cards firmware. I'm not > exactly sure why that works, maybe the power management subsystem of the > PCH resets its stored LTR values when doing a function level reset of > the bridge device. Doing the reset once after starting the wifi firmware > works very well, probably because the firmware only reports that LTR > value a single time during firmware startup. > > Signed-off-by: Jonas Dreßler <verdre@v0yd.nl> I'm not sure what was the conclusion from the discussion, so dropping the patch. Please resend once it's ready to be applied. Patch set to Changes Requested.
On Thu, Oct 14, 2021 at 12:08:31AM +0200, Jonas Dreßler wrote: > On 10/12/21 17:39, Bjorn Helgaas wrote: > > [+cc Vidya, Victor, ASPM L1.2 config issue; beginning of thread: > > https://lore.kernel.org/all/20211011134238.16551-1-verdre@v0yd.nl/] > > I wonder if this reset quirk works because pci_reset_function() saves > > and restores much of config space, but it currently does *not* restore > > the L1 PM Substates capability, so those T_POWER_ON, > > Common_Mode_Restore_Time, and LTR_L1.2_THRESHOLD values probably get > > cleared out by the reset. We did briefly save/restore it [1], but we > > had to revert that because of a regression that AFAIK was never > > resolved [2]. I expect we will eventually save/restore this, so if > > the quirk depends on it *not* being restored, that would be a problem. > > > > You should be able to test whether this is the critical thing by > > clearing those registers with setpci instead of doing the reset. Per > > spec, they can only be modified when L1.2 is disabled, so you would > > have to disable it via sysfs (for the endpoint, I think) > > /sys/.../l1_2_aspm and /sys/.../l1_2_pcipm, do the setpci on the root > > port, then re-enable L1.2. > > > > [1] https://git.kernel.org/linus/4257f7e008ea > > [2] https://lore.kernel.org/all/20210127160449.2990506-1-helgaas@kernel.org/ > > Hmm, interesting, thanks for those links. > > Are you sure the config values will get lost on the reset? If we only reset > the port by going into D3hot and back into D0, the device will remain powered > and won't lose the config space, will it? I think you're doing a PM reset (transition to D3hot and back to D0). Linux only does this when PCI_PM_CTRL_NO_SOFT_RESET == 0. The spec doesn't actually *require* the device to be reset; it only says the internal state of the device is undefined after these transitions.
On 10/18/21 17:35, Bjorn Helgaas wrote: > On Thu, Oct 14, 2021 at 12:08:31AM +0200, Jonas Dreßler wrote: >> On 10/12/21 17:39, Bjorn Helgaas wrote: >>> [+cc Vidya, Victor, ASPM L1.2 config issue; beginning of thread: >>> https://lore.kernel.org/all/20211011134238.16551-1-verdre@v0yd.nl/] > >>> I wonder if this reset quirk works because pci_reset_function() saves >>> and restores much of config space, but it currently does *not* restore >>> the L1 PM Substates capability, so those T_POWER_ON, >>> Common_Mode_Restore_Time, and LTR_L1.2_THRESHOLD values probably get >>> cleared out by the reset. We did briefly save/restore it [1], but we >>> had to revert that because of a regression that AFAIK was never >>> resolved [2]. I expect we will eventually save/restore this, so if >>> the quirk depends on it *not* being restored, that would be a problem. >>> >>> You should be able to test whether this is the critical thing by >>> clearing those registers with setpci instead of doing the reset. Per >>> spec, they can only be modified when L1.2 is disabled, so you would >>> have to disable it via sysfs (for the endpoint, I think) >>> /sys/.../l1_2_aspm and /sys/.../l1_2_pcipm, do the setpci on the root >>> port, then re-enable L1.2. >>> >>> [1] https://git.kernel.org/linus/4257f7e008ea >>> [2] https://lore.kernel.org/all/20210127160449.2990506-1-helgaas@kernel.org/ >> >> Hmm, interesting, thanks for those links. >> >> Are you sure the config values will get lost on the reset? If we only reset >> the port by going into D3hot and back into D0, the device will remain powered >> and won't lose the config space, will it? > > I think you're doing a PM reset (transition to D3hot and back to D0). > Linux only does this when PCI_PM_CTRL_NO_SOFT_RESET == 0. The spec > doesn't actually *require* the device to be reset; it only says the > internal state of the device is undefined after these transitions. > Not requiring the device to be reset sounds sensible to me given that D3hot is what devices are transitioned into during suspend. But anyway, that doesn't really get us any further except it somewhat gives an explanation why the LTR is suddenly 0 after the reset. Or are you making the point that we shouldn't rely on "undefined state" for this hack because not all PCI bridges/ports will necessarily behave the same?
On Mon, Oct 25, 2021 at 06:45:29PM +0200, Jonas Dreßler wrote: > On 10/18/21 17:35, Bjorn Helgaas wrote: > > On Thu, Oct 14, 2021 at 12:08:31AM +0200, Jonas Dreßler wrote: > > > On 10/12/21 17:39, Bjorn Helgaas wrote: > > > > [+cc Vidya, Victor, ASPM L1.2 config issue; beginning of thread: > > > > https://lore.kernel.org/all/20211011134238.16551-1-verdre@v0yd.nl/] > > > > > > I wonder if this reset quirk works because pci_reset_function() saves > > > > and restores much of config space, but it currently does *not* restore > > > > the L1 PM Substates capability, so those T_POWER_ON, > > > > Common_Mode_Restore_Time, and LTR_L1.2_THRESHOLD values probably get > > > > cleared out by the reset. We did briefly save/restore it [1], but we > > > > had to revert that because of a regression that AFAIK was never > > > > resolved [2]. I expect we will eventually save/restore this, so if > > > > the quirk depends on it *not* being restored, that would be a problem. > > > > > > > > You should be able to test whether this is the critical thing by > > > > clearing those registers with setpci instead of doing the reset. Per > > > > spec, they can only be modified when L1.2 is disabled, so you would > > > > have to disable it via sysfs (for the endpoint, I think) > > > > /sys/.../l1_2_aspm and /sys/.../l1_2_pcipm, do the setpci on the root > > > > port, then re-enable L1.2. > > > > > > > > [1] https://git.kernel.org/linus/4257f7e008ea > > > > [2] https://lore.kernel.org/all/20210127160449.2990506-1-helgaas@kernel.org/ > > > > > > Hmm, interesting, thanks for those links. > > > > > > Are you sure the config values will get lost on the reset? If we > > > only reset the port by going into D3hot and back into D0, the > > > device will remain powered and won't lose the config space, will > > > it? > > > > I think you're doing a PM reset (transition to D3hot and back to > > D0). Linux only does this when PCI_PM_CTRL_NO_SOFT_RESET == 0. > > The spec doesn't actually *require* the device to be reset; it > > only says the internal state of the device is undefined after > > these transitions. > > Not requiring the device to be reset sounds sensible to me given > that D3hot is what devices are transitioned into during suspend. > > But anyway, that doesn't really get us any further except it > somewhat gives an explanation why the LTR is suddenly 0 after the > reset. Or are you making the point that we shouldn't rely on > "undefined state" for this hack because not all PCI bridges/ports > will necessarily behave the same? I guess I'm just making the point that I don't understand why the bridge reset fixes something, and I'm not confident that the fix will work on every system and continue working even if/when the PCI core starts saving and restoring the L1 PM Substates capability.
On 10/26/21 01:56, Bjorn Helgaas wrote: > On Mon, Oct 25, 2021 at 06:45:29PM +0200, Jonas Dreßler wrote: >> On 10/18/21 17:35, Bjorn Helgaas wrote: >>> On Thu, Oct 14, 2021 at 12:08:31AM +0200, Jonas Dreßler wrote: >>>> On 10/12/21 17:39, Bjorn Helgaas wrote: >>>>> [+cc Vidya, Victor, ASPM L1.2 config issue; beginning of thread: >>>>> https://lore.kernel.org/all/20211011134238.16551-1-verdre@v0yd.nl/] >>> >>>>> I wonder if this reset quirk works because pci_reset_function() saves >>>>> and restores much of config space, but it currently does *not* restore >>>>> the L1 PM Substates capability, so those T_POWER_ON, >>>>> Common_Mode_Restore_Time, and LTR_L1.2_THRESHOLD values probably get >>>>> cleared out by the reset. We did briefly save/restore it [1], but we >>>>> had to revert that because of a regression that AFAIK was never >>>>> resolved [2]. I expect we will eventually save/restore this, so if >>>>> the quirk depends on it *not* being restored, that would be a problem. >>>>> >>>>> You should be able to test whether this is the critical thing by >>>>> clearing those registers with setpci instead of doing the reset. Per >>>>> spec, they can only be modified when L1.2 is disabled, so you would >>>>> have to disable it via sysfs (for the endpoint, I think) >>>>> /sys/.../l1_2_aspm and /sys/.../l1_2_pcipm, do the setpci on the root >>>>> port, then re-enable L1.2. >>>>> >>>>> [1] https://git.kernel.org/linus/4257f7e008ea >>>>> [2] https://lore.kernel.org/all/20210127160449.2990506-1-helgaas@kernel.org/ >>>> >>>> Hmm, interesting, thanks for those links. >>>> >>>> Are you sure the config values will get lost on the reset? If we >>>> only reset the port by going into D3hot and back into D0, the >>>> device will remain powered and won't lose the config space, will >>>> it? >>> >>> I think you're doing a PM reset (transition to D3hot and back to >>> D0). Linux only does this when PCI_PM_CTRL_NO_SOFT_RESET == 0. >>> The spec doesn't actually *require* the device to be reset; it >>> only says the internal state of the device is undefined after >>> these transitions. >> >> Not requiring the device to be reset sounds sensible to me given >> that D3hot is what devices are transitioned into during suspend. >> >> But anyway, that doesn't really get us any further except it >> somewhat gives an explanation why the LTR is suddenly 0 after the >> reset. Or are you making the point that we shouldn't rely on >> "undefined state" for this hack because not all PCI bridges/ports >> will necessarily behave the same? > > I guess I'm just making the point that I don't understand why the > bridge reset fixes something, and I'm not confident that the fix will > work on every system and continue working even if/when the PCI core > starts saving and restoring the L1 PM Substates capability. > FWIW, I've tested it with the restoring of L1 PM Substates enabled now and the bridge reset worked just as before. But yeah I, too, have no clue why exactly the bridge reset does what it does... Anyway, I've also confirmed that it actually impacts the power usage by measuring consumed energy during idle over a few minutes: Applying either the bridge reset quirk or ignoring the LTR via pmc_core results in about 7% less energy usage. Given that the overall energy usage was almost nothing to make the measurement easier, those 7% are not a lot, but nonetheless it confirms that the quirk works.
diff --git a/drivers/net/wireless/marvell/mwifiex/pcie.c b/drivers/net/wireless/marvell/mwifiex/pcie.c index c6ccce426b49..2506e7e49f0c 100644 --- a/drivers/net/wireless/marvell/mwifiex/pcie.c +++ b/drivers/net/wireless/marvell/mwifiex/pcie.c @@ -1748,9 +1748,21 @@ mwifiex_pcie_send_boot_cmd(struct mwifiex_adapter *adapter, struct sk_buff *skb) static int mwifiex_pcie_init_fw_port(struct mwifiex_adapter *adapter) { struct pcie_service_card *card = adapter->card; + struct pci_dev *pdev = card->dev; + struct pci_dev *parent_pdev = pci_upstream_bridge(pdev); const struct mwifiex_pcie_card_reg *reg = card->pcie.reg; int tx_wrap = card->txbd_wrptr & reg->tx_wrap_mask; + /* Trigger a function level reset of the PCI bridge device, this makes + * the firmware (latest version 15.68.19.p21) of the 88W8897 PCIe+USB + * card stop reporting a fixed LTR value that prevents the system from + * entering package C10 and S0ix powersaving states. + * We need to do it here because it must happen after firmware + * initialization and this function is called right after that is done. + */ + if (card->quirks & QUIRK_DO_FLR_ON_BRIDGE) + pci_reset_function(parent_pdev); + /* Write the RX ring read pointer in to reg->rx_rdptr */ if (mwifiex_write_reg(adapter, reg->rx_rdptr, card->rxbd_rdptr | tx_wrap)) { diff --git a/drivers/net/wireless/marvell/mwifiex/pcie_quirks.c b/drivers/net/wireless/marvell/mwifiex/pcie_quirks.c index 0234cf3c2974..cbf0565353ae 100644 --- a/drivers/net/wireless/marvell/mwifiex/pcie_quirks.c +++ b/drivers/net/wireless/marvell/mwifiex/pcie_quirks.c @@ -27,7 +27,8 @@ static const struct dmi_system_id mwifiex_quirk_table[] = { DMI_EXACT_MATCH(DMI_SYS_VENDOR, "Microsoft Corporation"), DMI_EXACT_MATCH(DMI_PRODUCT_NAME, "Surface Pro 4"), }, - .driver_data = (void *)QUIRK_FW_RST_D3COLD, + .driver_data = (void *)(QUIRK_FW_RST_D3COLD | + QUIRK_DO_FLR_ON_BRIDGE), }, { .ident = "Surface Pro 5", @@ -36,7 +37,8 @@ static const struct dmi_system_id mwifiex_quirk_table[] = { DMI_EXACT_MATCH(DMI_SYS_VENDOR, "Microsoft Corporation"), DMI_EXACT_MATCH(DMI_PRODUCT_SKU, "Surface_Pro_1796"), }, - .driver_data = (void *)QUIRK_FW_RST_D3COLD, + .driver_data = (void *)(QUIRK_FW_RST_D3COLD | + QUIRK_DO_FLR_ON_BRIDGE), }, { .ident = "Surface Pro 5 (LTE)", @@ -45,7 +47,8 @@ static const struct dmi_system_id mwifiex_quirk_table[] = { DMI_EXACT_MATCH(DMI_SYS_VENDOR, "Microsoft Corporation"), DMI_EXACT_MATCH(DMI_PRODUCT_SKU, "Surface_Pro_1807"), }, - .driver_data = (void *)QUIRK_FW_RST_D3COLD, + .driver_data = (void *)(QUIRK_FW_RST_D3COLD | + QUIRK_DO_FLR_ON_BRIDGE), }, { .ident = "Surface Pro 6", @@ -53,7 +56,8 @@ static const struct dmi_system_id mwifiex_quirk_table[] = { DMI_EXACT_MATCH(DMI_SYS_VENDOR, "Microsoft Corporation"), DMI_EXACT_MATCH(DMI_PRODUCT_NAME, "Surface Pro 6"), }, - .driver_data = (void *)QUIRK_FW_RST_D3COLD, + .driver_data = (void *)(QUIRK_FW_RST_D3COLD | + QUIRK_DO_FLR_ON_BRIDGE), }, { .ident = "Surface Book 1", @@ -61,7 +65,8 @@ static const struct dmi_system_id mwifiex_quirk_table[] = { DMI_EXACT_MATCH(DMI_SYS_VENDOR, "Microsoft Corporation"), DMI_EXACT_MATCH(DMI_PRODUCT_NAME, "Surface Book"), }, - .driver_data = (void *)QUIRK_FW_RST_D3COLD, + .driver_data = (void *)(QUIRK_FW_RST_D3COLD | + QUIRK_DO_FLR_ON_BRIDGE), }, { .ident = "Surface Book 2", @@ -69,7 +74,8 @@ static const struct dmi_system_id mwifiex_quirk_table[] = { DMI_EXACT_MATCH(DMI_SYS_VENDOR, "Microsoft Corporation"), DMI_EXACT_MATCH(DMI_PRODUCT_NAME, "Surface Book 2"), }, - .driver_data = (void *)QUIRK_FW_RST_D3COLD, + .driver_data = (void *)(QUIRK_FW_RST_D3COLD | + QUIRK_DO_FLR_ON_BRIDGE), }, { .ident = "Surface Laptop 1", @@ -77,7 +83,8 @@ static const struct dmi_system_id mwifiex_quirk_table[] = { DMI_EXACT_MATCH(DMI_SYS_VENDOR, "Microsoft Corporation"), DMI_EXACT_MATCH(DMI_PRODUCT_NAME, "Surface Laptop"), }, - .driver_data = (void *)QUIRK_FW_RST_D3COLD, + .driver_data = (void *)(QUIRK_FW_RST_D3COLD | + QUIRK_DO_FLR_ON_BRIDGE), }, { .ident = "Surface Laptop 2", @@ -85,7 +92,8 @@ static const struct dmi_system_id mwifiex_quirk_table[] = { DMI_EXACT_MATCH(DMI_SYS_VENDOR, "Microsoft Corporation"), DMI_EXACT_MATCH(DMI_PRODUCT_NAME, "Surface Laptop 2"), }, - .driver_data = (void *)QUIRK_FW_RST_D3COLD, + .driver_data = (void *)(QUIRK_FW_RST_D3COLD | + QUIRK_DO_FLR_ON_BRIDGE), }, {} }; @@ -103,6 +111,8 @@ void mwifiex_initialize_quirks(struct pcie_service_card *card) dev_info(&pdev->dev, "no quirks enabled\n"); if (card->quirks & QUIRK_FW_RST_D3COLD) dev_info(&pdev->dev, "quirk reset_d3cold enabled\n"); + if (card->quirks & QUIRK_DO_FLR_ON_BRIDGE) + dev_info(&pdev->dev, "quirk do_flr_on_bridge enabled\n"); } static void mwifiex_pcie_set_power_d3cold(struct pci_dev *pdev) diff --git a/drivers/net/wireless/marvell/mwifiex/pcie_quirks.h b/drivers/net/wireless/marvell/mwifiex/pcie_quirks.h index 8ec4176d698f..f8d463f4269a 100644 --- a/drivers/net/wireless/marvell/mwifiex/pcie_quirks.h +++ b/drivers/net/wireless/marvell/mwifiex/pcie_quirks.h @@ -18,6 +18,7 @@ #include "pcie.h" #define QUIRK_FW_RST_D3COLD BIT(0) +#define QUIRK_DO_FLR_ON_BRIDGE BIT(1) void mwifiex_initialize_quirks(struct pcie_service_card *card); int mwifiex_pcie_reset_d3cold_quirk(struct pci_dev *pdev);
The most recent firmware (15.68.19.p21) of the 88W8897 PCIe+USB card reports a hardcoded LTR value to the system during initialization, probably as an (unsuccessful) attempt of the developers to fix firmware crashes. This LTR value prevents most of the Microsoft Surface devices from entering deep powersaving states (either platform C-State 10 or S0ix state), because the exit latency of that state would be higher than what the card can tolerate. Turns out the card works just the same (including the firmware crashes) no matter if that hardcoded LTR value is reported or not, so it's kind of useless and only prevents us from saving power. To get rid of those hardcoded LTR requirements, it's possible to reset the PCI bridge device after initializing the cards firmware. I'm not exactly sure why that works, maybe the power management subsystem of the PCH resets its stored LTR values when doing a function level reset of the bridge device. Doing the reset once after starting the wifi firmware works very well, probably because the firmware only reports that LTR value a single time during firmware startup. Signed-off-by: Jonas Dreßler <verdre@v0yd.nl> --- drivers/net/wireless/marvell/mwifiex/pcie.c | 12 +++++++++ .../wireless/marvell/mwifiex/pcie_quirks.c | 26 +++++++++++++------ .../wireless/marvell/mwifiex/pcie_quirks.h | 1 + 3 files changed, 31 insertions(+), 8 deletions(-)