[net] net: wireless: intel: iwlwifi: fix GRO_NORMAL packet stalling
diff mbox series

Message ID 20191127094123.18161-1-alobakin@dlink.ru
State New
Delegated to: Luca Coelho
Headers show
Series
  • [net] net: wireless: intel: iwlwifi: fix GRO_NORMAL packet stalling
Related show

Commit Message

Alexander Lobakin Nov. 27, 2019, 9:41 a.m. UTC
Commit 6570bc79c0df ("net: core: use listified Rx for GRO_NORMAL in
napi_gro_receive()") has applied batched GRO_NORMAL packets processing
to all napi_gro_receive() users, including mac80211-based drivers.

However, this change has led to a regression in iwlwifi driver [1][2] as
it is required for NAPI users to call napi_complete_done() or
napi_complete() and the end of every polling iteration, whilst iwlwifi
doesn't use NAPI scheduling at all and just calls napi_gro_flush().
In that particular case, packets which have not been already flushed
from napi->rx_list stall in it until at least next Rx cycle.

Fix this by adding a manual flushing of the list to iwlwifi driver right
before napi_gro_flush() call to mimic napi_complete() logics.

I prefer to open-code gro_normal_list() rather than exporting it for 2
reasons:
* to prevent from using it and napi_gro_flush() in any new drivers,
  as it is the *really* bad way to use NAPI that should be avoided;
* to keep gro_normal_list() static and don't lose any CC optimizations.

I also don't add the "Fixes:" tag as the mentioned commit was only a
trigger that only exposed an improper usage of NAPI in this particular
driver.

[1] https://lore.kernel.org/netdev/PSXP216MB04388962C411CD0B17A86F47804A0@PSXP216MB0438.KORP216.PROD.OUTLOOK.COM
[2] https://bugzilla.kernel.org/show_bug.cgi?id=205647

Signed-off-by: Alexander Lobakin <alobakin@dlink.ru>
---
 drivers/net/wireless/intel/iwlwifi/pcie/rx.c | 13 +++++++++++--
 1 file changed, 11 insertions(+), 2 deletions(-)

Comments

Luciano Coelho Nov. 27, 2019, 9:58 a.m. UTC | #1
On Wed, 2019-11-27 at 12:41 +0300, Alexander Lobakin wrote:
> Commit 6570bc79c0df ("net: core: use listified Rx for GRO_NORMAL in
> napi_gro_receive()") has applied batched GRO_NORMAL packets processing
> to all napi_gro_receive() users, including mac80211-based drivers.
> 
> However, this change has led to a regression in iwlwifi driver [1][2] as
> it is required for NAPI users to call napi_complete_done() or
> napi_complete() and the end of every polling iteration, whilst iwlwifi
> doesn't use NAPI scheduling at all and just calls napi_gro_flush().
> In that particular case, packets which have not been already flushed
> from napi->rx_list stall in it until at least next Rx cycle.
> 
> Fix this by adding a manual flushing of the list to iwlwifi driver right
> before napi_gro_flush() call to mimic napi_complete() logics.
> 
> I prefer to open-code gro_normal_list() rather than exporting it for 2
> reasons:
> * to prevent from using it and napi_gro_flush() in any new drivers,
>   as it is the *really* bad way to use NAPI that should be avoided;
> * to keep gro_normal_list() static and don't lose any CC optimizations.
> 
> I also don't add the "Fixes:" tag as the mentioned commit was only a
> trigger that only exposed an improper usage of NAPI in this particular
> driver.
> 
> [1] https://lore.kernel.org/netdev/PSXP216MB04388962C411CD0B17A86F47804A0@PSXP216MB0438.KORP216.PROD.OUTLOOK.COM
> [2] https://bugzilla.kernel.org/show_bug.cgi?id=205647
> 
> Signed-off-by: Alexander Lobakin <alobakin@dlink.ru>
> ---

We don't usually use "net: wireless: intel:" in the commit message, we
would use "iwlwifi: pcie:", but I don't care much.

Otherwise:

Acked-by: Luca Coelho <luciano.coelho@intel.com>

Thanks a lot for the fix!

Dave, I'm assuming you'll take this directly into your tree, right?

--
Cheers,
Luca.
Alexander Lobakin Nov. 27, 2019, 10:12 a.m. UTC | #2
Luciano Coelho wrote 27.11.2019 12:58:
> On Wed, 2019-11-27 at 12:41 +0300, Alexander Lobakin wrote:
>> Commit 6570bc79c0df ("net: core: use listified Rx for GRO_NORMAL in
>> napi_gro_receive()") has applied batched GRO_NORMAL packets processing
>> to all napi_gro_receive() users, including mac80211-based drivers.
>> 
>> However, this change has led to a regression in iwlwifi driver [1][2] 
>> as
>> it is required for NAPI users to call napi_complete_done() or
>> napi_complete() and the end of every polling iteration, whilst iwlwifi
>> doesn't use NAPI scheduling at all and just calls napi_gro_flush().
>> In that particular case, packets which have not been already flushed
>> from napi->rx_list stall in it until at least next Rx cycle.
>> 
>> Fix this by adding a manual flushing of the list to iwlwifi driver 
>> right
>> before napi_gro_flush() call to mimic napi_complete() logics.
>> 
>> I prefer to open-code gro_normal_list() rather than exporting it for 2
>> reasons:
>> * to prevent from using it and napi_gro_flush() in any new drivers,
>>   as it is the *really* bad way to use NAPI that should be avoided;
>> * to keep gro_normal_list() static and don't lose any CC 
>> optimizations.
>> 
>> I also don't add the "Fixes:" tag as the mentioned commit was only a
>> trigger that only exposed an improper usage of NAPI in this particular
>> driver.
>> 
>> [1] 
>> https://lore.kernel.org/netdev/PSXP216MB04388962C411CD0B17A86F47804A0@PSXP216MB0438.KORP216.PROD.OUTLOOK.COM
>> [2] https://bugzilla.kernel.org/show_bug.cgi?id=205647
>> 
>> Signed-off-by: Alexander Lobakin <alobakin@dlink.ru>
>> ---
> 
> We don't usually use "net: wireless: intel:" in the commit message, we
> would use "iwlwifi: pcie:", but I don't care much.
> 
> Otherwise:
> 
> Acked-by: Luca Coelho <luciano.coelho@intel.com>

Thank you!

> Thanks a lot for the fix!
> 
> Dave, I'm assuming you'll take this directly into your tree, right?

Also please let me know if I should send v2 with Ack and fixed commit
subject!

> --
> Cheers,
> Luca.

Regards,
ᚷ ᛖ ᚢ ᚦ ᚠ ᚱ
Alexander Lobakin Nov. 27, 2019, 10:29 a.m. UTC | #3
Nicholas Johnson wrote 27.11.2019 13:23:
> Hi,

Hi Nicholas,

>  Sorry for top down reply, stuck with my phone. If it replies HTML
> then I am so done with Outlook client.
> 
>  Does my Reported-by tag apply here?
> 
>  As the reporter, should I check to see that it indeed solves the
> issue on the original hardware setup? I can do this within two hours
> and give Tested-by then.

Oops, I'm sorry I forgot to mention you in the commit message. Let's
see what Dave will say, I have no problems with waiting for your test
results and publishing v2.

>  Thanks
> 
>  Regards,
> 
>  Nicholas

Regards,
ᚷ ᛖ ᚢ ᚦ ᚠ ᚱ
Nicholas Johnson Nov. 27, 2019, 11:16 a.m. UTC | #4
On Wed, Nov 27, 2019 at 01:29:03PM +0300, Alexander Lobakin wrote:
> Nicholas Johnson wrote 27.11.2019 13:23:
> > Hi,
> 
> Hi Nicholas,
> 
> >  Sorry for top down reply, stuck with my phone. If it replies HTML
> > then I am so done with Outlook client.

I am very sorry to everybody for the improper reply. It looks like it 
was HTML as vger.kernel.org told me I was spam. If anybody knows a good 
email client for kernel development for Android then I am all ears.

I went home early and I have my computer(s) now.

> > 
> >  Does my Reported-by tag apply here?
> > 
> >  As the reporter, should I check to see that it indeed solves the
> > issue on the original hardware setup? I can do this within two hours
> > and give Tested-by then.
> 
> Oops, I'm sorry I forgot to mention you in the commit message. Let's
> see what Dave will say, I have no problems with waiting for your test
> results and publishing v2.

All good. :)

I tested the the patch and it works fine. Great work, the first 
hypothesis as to what the problem was is correct. It now connects to 
wireless networks without any hassles.

Reported-by: Nicholas Johnson <nicholas.johnson-opensource@outlook.com.au>
Tested-by: Nicholas Johnson <nicholas.johnson-opensource@outlook.com.au>

I do not understand the networking subsystem well enough to give 
Reviewed-by, yet. Hopefully in time.

Thanks to everybody for handling my report.

> 
> >  Thanks
> > 
> >  Regards,
> > 
> >  Nicholas
> 
> Regards,
> ᚷ ᛖ ᚢ ᚦ ᚠ ᚱ

Regards,
Nicholas
Alexander Lobakin Nov. 27, 2019, 11:44 a.m. UTC | #5
Nicholas Johnson wrote 27.11.2019 14:16:
> On Wed, Nov 27, 2019 at 01:29:03PM +0300, Alexander Lobakin wrote:
>> Nicholas Johnson wrote 27.11.2019 13:23:
>> > Hi,
>> 
>> Hi Nicholas,
>> 
>> >  Sorry for top down reply, stuck with my phone. If it replies HTML
>> > then I am so done with Outlook client.
> 
> I am very sorry to everybody for the improper reply. It looks like it
> was HTML as vger.kernel.org told me I was spam. If anybody knows a good
> email client for kernel development for Android then I am all ears.
> 
> I went home early and I have my computer(s) now.
> 
>> >
>> >  Does my Reported-by tag apply here?
>> >
>> >  As the reporter, should I check to see that it indeed solves the
>> > issue on the original hardware setup? I can do this within two hours
>> > and give Tested-by then.
>> 
>> Oops, I'm sorry I forgot to mention you in the commit message. Let's
>> see what Dave will say, I have no problems with waiting for your test
>> results and publishing v2.
> 
> All good. :)
> 
> I tested the the patch and it works fine. Great work, the first
> hypothesis as to what the problem was is correct. It now connects to
> wireless networks without any hassles.
> 
> Reported-by: Nicholas Johnson 
> <nicholas.johnson-opensource@outlook.com.au>
> Tested-by: Nicholas Johnson 
> <nicholas.johnson-opensource@outlook.com.au>

Oh, much thanks for testing this out! I think this one will hit netdev
fixes tree soon.

> I do not understand the networking subsystem well enough to give
> Reviewed-by, yet. Hopefully in time.

I'm sure you will :)

> Thanks to everybody for handling my report.
> 
>> 
>> >  Thanks
>> >
>> >  Regards,
>> >
>> >  Nicholas
>> 
>> Regards,
>> ᚷ ᛖ ᚢ ᚦ ᚠ ᚱ
> 
> Regards,
> Nicholas

Regards,
ᚷ ᛖ ᚢ ᚦ ᚠ ᚱ
Kalle Valo Nov. 27, 2019, 1:38 p.m. UTC | #6
Nicholas Johnson <nicholas.johnson-opensource@outlook.com.au> writes:

> On Wed, Nov 27, 2019 at 01:29:03PM +0300, Alexander Lobakin wrote:
>> Nicholas Johnson wrote 27.11.2019 13:23:
>> > Hi,
>> 
>> Hi Nicholas,
>> 
>> >  Sorry for top down reply, stuck with my phone. If it replies HTML
>> > then I am so done with Outlook client.
>
> I am very sorry to everybody for the improper reply. It looks like it 
> was HTML as vger.kernel.org told me I was spam. If anybody knows a good 
> email client for kernel development for Android then I am all ears.

I use K-9 Mail because Greg K-H used it :) TBH I use it mostly only for
reading but seems to work quite well:

https://k9mail.github.io
Edward Cree Nov. 27, 2019, 4:05 p.m. UTC | #7
On 27/11/2019 09:41, Alexander Lobakin wrote:
> Commit 6570bc79c0df ("net: core: use listified Rx for GRO_NORMAL in
> napi_gro_receive()") has applied batched GRO_NORMAL packets processing
> to all napi_gro_receive() users, including mac80211-based drivers.
>
> However, this change has led to a regression in iwlwifi driver [1][2] as
> it is required for NAPI users to call napi_complete_done() or
> napi_complete() and the end of every polling iteration, whilst iwlwifi
> doesn't use NAPI scheduling at all and just calls napi_gro_flush().
> In that particular case, packets which have not been already flushed
> from napi->rx_list stall in it until at least next Rx cycle.
>
> Fix this by adding a manual flushing of the list to iwlwifi driver right
> before napi_gro_flush() call to mimic napi_complete() logics.
>
> I prefer to open-code gro_normal_list() rather than exporting it for 2
> reasons:
> * to prevent from using it and napi_gro_flush() in any new drivers,
>   as it is the *really* bad way to use NAPI that should be avoided;
> * to keep gro_normal_list() static and don't lose any CC optimizations.
>
> I also don't add the "Fixes:" tag as the mentioned commit was only a
> trigger that only exposed an improper usage of NAPI in this particular
> driver.
>
> [1] https://lore.kernel.org/netdev/PSXP216MB04388962C411CD0B17A86F47804A0@PSXP216MB0438.KORP216.PROD.OUTLOOK.COM
> [2] https://bugzilla.kernel.org/show_bug.cgi?id=205647
>
> Signed-off-by: Alexander Lobakin <alobakin@dlink.ru>
Reviewed-by: Edward Cree <ecree@solarflare.com>
Alexander Lobakin Nov. 27, 2019, 4:31 p.m. UTC | #8
Edward Cree wrote 27.11.2019 19:05:
> On 27/11/2019 09:41, Alexander Lobakin wrote:
>> Commit 6570bc79c0df ("net: core: use listified Rx for GRO_NORMAL in
>> napi_gro_receive()") has applied batched GRO_NORMAL packets processing
>> to all napi_gro_receive() users, including mac80211-based drivers.
>> 
>> However, this change has led to a regression in iwlwifi driver [1][2] 
>> as
>> it is required for NAPI users to call napi_complete_done() or
>> napi_complete() and the end of every polling iteration, whilst iwlwifi
>> doesn't use NAPI scheduling at all and just calls napi_gro_flush().
>> In that particular case, packets which have not been already flushed
>> from napi->rx_list stall in it until at least next Rx cycle.
>> 
>> Fix this by adding a manual flushing of the list to iwlwifi driver 
>> right
>> before napi_gro_flush() call to mimic napi_complete() logics.
>> 
>> I prefer to open-code gro_normal_list() rather than exporting it for 2
>> reasons:
>> * to prevent from using it and napi_gro_flush() in any new drivers,
>>   as it is the *really* bad way to use NAPI that should be avoided;
>> * to keep gro_normal_list() static and don't lose any CC 
>> optimizations.
>> 
>> I also don't add the "Fixes:" tag as the mentioned commit was only a
>> trigger that only exposed an improper usage of NAPI in this particular
>> driver.
>> 
>> [1] 
>> https://lore.kernel.org/netdev/PSXP216MB04388962C411CD0B17A86F47804A0@PSXP216MB0438.KORP216.PROD.OUTLOOK.COM
>> [2] https://bugzilla.kernel.org/show_bug.cgi?id=205647
>> 
>> Signed-off-by: Alexander Lobakin <alobakin@dlink.ru>
> Reviewed-by: Edward Cree <ecree@solarflare.com>

Thanks! And you were the first who's found the root of the issue.

Regards,
ᚷ ᛖ ᚢ ᚦ ᚠ ᚱ
David Miller Nov. 27, 2019, 7:23 p.m. UTC | #9
From: Alexander Lobakin <alobakin@dlink.ru>
Date: Wed, 27 Nov 2019 12:41:23 +0300

> Commit 6570bc79c0df ("net: core: use listified Rx for GRO_NORMAL in
> napi_gro_receive()") has applied batched GRO_NORMAL packets processing
> to all napi_gro_receive() users, including mac80211-based drivers.
> 
> However, this change has led to a regression in iwlwifi driver [1][2] as
> it is required for NAPI users to call napi_complete_done() or
> napi_complete() and the end of every polling iteration, whilst iwlwifi
> doesn't use NAPI scheduling at all and just calls napi_gro_flush().
> In that particular case, packets which have not been already flushed
> from napi->rx_list stall in it until at least next Rx cycle.
> 
> Fix this by adding a manual flushing of the list to iwlwifi driver right
> before napi_gro_flush() call to mimic napi_complete() logics.
> 
> I prefer to open-code gro_normal_list() rather than exporting it for 2
> reasons:
> * to prevent from using it and napi_gro_flush() in any new drivers,
>   as it is the *really* bad way to use NAPI that should be avoided;
> * to keep gro_normal_list() static and don't lose any CC optimizations.
> 
> I also don't add the "Fixes:" tag as the mentioned commit was only a
> trigger that only exposed an improper usage of NAPI in this particular
> driver.
> 
> [1] https://lore.kernel.org/netdev/PSXP216MB04388962C411CD0B17A86F47804A0@PSXP216MB0438.KORP216.PROD.OUTLOOK.COM
> [2] https://bugzilla.kernel.org/show_bug.cgi?id=205647
> 
> Signed-off-by: Alexander Lobakin <alobakin@dlink.ru>

Applied, thanks for the quick turnaround.
Alexander Lobakin Nov. 27, 2019, 7:28 p.m. UTC | #10
David Miller wrote 27.11.2019 22:23:
> From: Alexander Lobakin <alobakin@dlink.ru>
> Date: Wed, 27 Nov 2019 12:41:23 +0300
> 
>> Commit 6570bc79c0df ("net: core: use listified Rx for GRO_NORMAL in
>> napi_gro_receive()") has applied batched GRO_NORMAL packets processing
>> to all napi_gro_receive() users, including mac80211-based drivers.
>> 
>> However, this change has led to a regression in iwlwifi driver [1][2] 
>> as
>> it is required for NAPI users to call napi_complete_done() or
>> napi_complete() and the end of every polling iteration, whilst iwlwifi
>> doesn't use NAPI scheduling at all and just calls napi_gro_flush().
>> In that particular case, packets which have not been already flushed
>> from napi->rx_list stall in it until at least next Rx cycle.
>> 
>> Fix this by adding a manual flushing of the list to iwlwifi driver 
>> right
>> before napi_gro_flush() call to mimic napi_complete() logics.
>> 
>> I prefer to open-code gro_normal_list() rather than exporting it for 2
>> reasons:
>> * to prevent from using it and napi_gro_flush() in any new drivers,
>>   as it is the *really* bad way to use NAPI that should be avoided;
>> * to keep gro_normal_list() static and don't lose any CC 
>> optimizations.
>> 
>> I also don't add the "Fixes:" tag as the mentioned commit was only a
>> trigger that only exposed an improper usage of NAPI in this particular
>> driver.
>> 
>> [1] 
>> https://lore.kernel.org/netdev/PSXP216MB04388962C411CD0B17A86F47804A0@PSXP216MB0438.KORP216.PROD.OUTLOOK.COM
>> [2] https://bugzilla.kernel.org/show_bug.cgi?id=205647
>> 
>> Signed-off-by: Alexander Lobakin <alobakin@dlink.ru>
> 
> Applied, thanks for the quick turnaround.

Thank you all folks!

Regards,
ᚷ ᛖ ᚢ ᚦ ᚠ ᚱ

Patch
diff mbox series

diff --git a/drivers/net/wireless/intel/iwlwifi/pcie/rx.c b/drivers/net/wireless/intel/iwlwifi/pcie/rx.c
index a4d325fcf94a..452da44a21e0 100644
--- a/drivers/net/wireless/intel/iwlwifi/pcie/rx.c
+++ b/drivers/net/wireless/intel/iwlwifi/pcie/rx.c
@@ -1421,6 +1421,7 @@  static struct iwl_rx_mem_buffer *iwl_pcie_get_rxb(struct iwl_trans *trans,
 static void iwl_pcie_rx_handle(struct iwl_trans *trans, int queue)
 {
 	struct iwl_trans_pcie *trans_pcie = IWL_TRANS_GET_PCIE_TRANS(trans);
+	struct napi_struct *napi;
 	struct iwl_rxq *rxq;
 	u32 r, i, count = 0;
 	bool emergency = false;
@@ -1526,8 +1527,16 @@  static void iwl_pcie_rx_handle(struct iwl_trans *trans, int queue)
 	if (unlikely(emergency && count))
 		iwl_pcie_rxq_alloc_rbs(trans, GFP_ATOMIC, rxq);
 
-	if (rxq->napi.poll)
-		napi_gro_flush(&rxq->napi, false);
+	napi = &rxq->napi;
+	if (napi->poll) {
+		if (napi->rx_count) {
+			netif_receive_skb_list(&napi->rx_list);
+			INIT_LIST_HEAD(&napi->rx_list);
+			napi->rx_count = 0;
+		}
+
+		napi_gro_flush(napi, false);
+	}
 
 	iwl_pcie_rxq_restock(trans, rxq);
 }