Message ID | 20180103113540.GA10306@redhat.com (mailing list archive) |
---|---|
State | RFC |
Delegated to: | Kalle Valo |
I'll apply your patch and let you know. Thank you very much for your help, your patience, and for re-posting the patch. Your work is very valuable; sorry for the slowness and for not testing this earlier.

Enrico

On Wed, 3 Jan 2018, Stanislaw Gruszka wrote:

> Date: Wed, 3 Jan 2018 12:35:41
> From: Stanislaw Gruszka <sgruszka@redhat.com>
> To: Enrico Mioso <mrkiko.rs@gmail.com>
> Cc: linux-wireless@vger.kernel.org, Johannes Berg <johannes.berg@intel.com>,
>     Daniel Golle <daniel@makrotopia.org>, Arnd Bergmann <arnd@arndb.de>,
>     John Crispin <john@phrozen.org>, nbd@nbd.name
> Subject: Re: ieee80211 phy0: rt2x00queue_write_tx_frame: Error - Dropping
>     frame due to full tx queue...?
>
> Hi
>
> On Sun, Dec 24, 2017 at 01:19:06PM +0100, Enrico Mioso wrote:
>> Unfortunately, the error did appear again. Still, I experienced no stalls.
>> But I am starting to think this doesn't necessarily happen when a device is going out of range. Now I don't know when this triggers.
>
> <snip>
>> Sun Dec 24 10:10:50 2017 authpriv.info dropbear[840]: Exit (root): Disconnect received
>> Sun Dec 24 11:04:33 2017 kern.err kernel: [ 1510.616638] ieee80211 phy0: rt2x00queue_write_tx_frame: Error - Dropping frame due to full tx queue 2
>> Sun Dec 24 11:04:33 2017 kern.err kernel: [ 1510.626106] ieee80211 phy0: rt2x00queue_write_tx_frame: Error - Dropping frame due to full tx queue 2
>> Sun Dec 24 11:04:33 2017 kern.err kernel: [ 1510.635562] ieee80211 phy0: rt2x00queue_write_tx_frame: Error - Dropping frame due to full tx queue 2
>
> I would also try a bigger threshold for waking the queue, as in the
> patch below, for those errors:
>
> diff --git a/drivers/net/wireless/ralink/rt2x00/rt2x00dev.c b/drivers/net/wireless/ralink/rt2x00/rt2x00dev.c
> index 357c0941aaad..b8bdf57ed7ea 100644
> --- a/drivers/net/wireless/ralink/rt2x00/rt2x00dev.c
> +++ b/drivers/net/wireless/ralink/rt2x00/rt2x00dev.c
> @@ -416,7 +416,7 @@ static void rt2x00lib_clear_entry(struct rt2x00_dev *rt2x00dev,
>  	 * before it was stopped.
>  	 */
>  	spin_lock_bh(&entry->queue->tx_lock);
> -	if (!rt2x00queue_threshold(entry->queue))
> +	if (rt2x00queue_available(entry->queue) > 2 * entry->queue->threshold)
>  		rt2x00queue_unpause_queue(entry->queue);
>  	spin_unlock_bh(&entry->queue->tx_lock);
>  }
>
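The patch above changes when a paused TX queue is woken: instead of unpausing as soon as the queue is no longer below its threshold, it waits until more than twice the threshold of entries are free, which puts hysteresis between the stop and wake points. A minimal C sketch of that idea follows. It uses a hypothetical `struct txq_model` rather than the real rt2x00 `struct data_queue`, and the pause condition (stop once `threshold` or fewer entries are free) is an assumption chosen for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical, simplified model of one TX ring with stop/wake hysteresis.
 * Names are illustrative only; they are not the rt2x00 structures.
 */
struct txq_model {
	unsigned int limit;     /* total entries in the ring */
	unsigned int length;    /* entries currently handed to the hardware */
	unsigned int threshold; /* pause once this few entries remain free */
	bool paused;            /* true while the upper layer is told to stop */
};

static void txq_init(struct txq_model *q, unsigned int limit,
		     unsigned int threshold)
{
	/* The wake level (2 * threshold) must be reachable in this ring. */
	assert(2 * threshold < limit);
	q->limit = limit;
	q->length = 0;
	q->threshold = threshold;
	q->paused = false;
}

static unsigned int txq_available(const struct txq_model *q)
{
	return q->limit - q->length;
}

/* Queue one frame; returns false when the frame has to be dropped. */
static bool txq_enqueue(struct txq_model *q)
{
	if (q->length == q->limit)
		return false;          /* ring completely full: drop */
	q->length++;
	if (txq_available(q) <= q->threshold)
		q->paused = true;      /* stop early, before the ring is full */
	return true;
}

/* Complete one frame; wake only once more than 2 * threshold are free. */
static void txq_complete(struct txq_model *q)
{
	if (q->length > 0)
		q->length--;
	if (q->paused && txq_available(q) > 2 * q->threshold)
		q->paused = false;
}
```

With a 64-entry ring and a threshold of 8, this model pauses when 56 entries are in flight and wakes only after 9 completions, so the upper layer is not stopped and restarted on every single completed frame.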
The crash is happening here too, on the WL-330N3G device; unfortunately, there are no messages to report.
Regarding your hypothesis about the firmware not being able to report its inability to send frames: around the time OpenWRT had the 3.9.9 kernel, this device worked non-stop for weeks with no issues, I think.
And I don't think this is only a coincidence. So I suspect, though it's only an impression (also because I know very, very little about this), that at some point things worked.
Thank you very much for your patience and help. Sorry for the slowness. I triggered this problem this morning after about 1 hour of use.
Thank you again,
Enrico

On Thu, 11 Jan 2018, Tom Psyborg wrote:

> Date: Thu, 11 Jan 2018 17:00:14
> From: Tom Psyborg <pozega.tomislav@gmail.com>
> To: Enrico Mioso <mrkiko.rs@gmail.com>
> Cc: Stanislaw Gruszka <sgruszka@redhat.com>,
>     linux-wireless <linux-wireless@vger.kernel.org>,
>     Johannes Berg <johannes.berg@intel.com>,
>     Daniel Golle <daniel@makrotopia.org>, Arnd Bergmann <arnd@arndb.de>,
>     John Crispin <john@phrozen.org>, Felix Fietkau <nbd@nbd.name>
> Subject: Re: ieee80211 phy0: rt2x00queue_write_tx_frame: Error - Dropping
>     frame due to full tx queue...?
>
> does not help as expected.
> there was even a crash:
> [ 92.559018] ieee80211 phy1: rt2800mmio_txstatus_is_spurious: Warning - 4 spurious TX_FIFO_STATUS interrupt(s)
> [ 92.563751] ieee80211 phy1: rt2800mmio_txdone: Warning - Got TX status for an empty queue 2, dropping
> [ 92.937179] ieee80211 phy1: rt2800mmio_txstatus_is_spurious: Warning - 4 spurious TX_FIFO_STATUS interrupt(s)
> [ 92.942395] ieee80211 phy1: rt2800mmio_txdone: Warning - Got TX status for an empty queue 2, dropping
> [ 104.195138] ieee80211 phy1: rt2800mmio_txstatus_is_spurious: Warning - 4 spurious TX_FIFO_STATUS interrupt(s)
> [ 104.199798] ieee80211 phy1: rt2800mmio_txdone: Warning - Got TX status for an empty queue 2, dropping
> [ 107.175301] ieee80211 phy1: rt2800mmio_txstatus_is_spurious: Warning - 4 spurious TX_FIFO_STATUS interrupt(s)
> [ 107.179910] ieee80211 phy1: rt2800mmio_txdone: Warning - Got TX status for an empty queue 2, dropping
> [ 131.973838] ieee80211 phy1: rt2800mmio_txstatus_is_spurious: Warning - 4 spurious TX_FIFO_STATUS interrupt(s)
> [ 131.978753] ieee80211 phy1: rt2800mmio_txdone: Warning - Got TX status for an empty queue 2, dropping
> [ 134.551378] ieee80211 phy1: rt2800mmio_txstatus_is_spurious: Warning - 4 spurious TX_FIFO_STATUS interrupt(s)
> [ 134.556511] ieee80211 phy1: rt2800mmio_txdone: Warning - Got TX status for an empty queue 2, dropping
> [ 143.642447] ieee80211 phy1: rt2800mmio_txstatus_is_spurious: Warning - 4 spurious TX_FIFO_STATUS interrupt(s)
> [ 143.647579] ieee80211 phy1: rt2800mmio_txdone: Warning - Got TX status for an empty queue 2, dropping
> [ 176.019555] ieee80211 phy1: rt2x00lib_rxdone_read_signal: Warning - Frame received with unrecognized signal, mode=0x0001, signal=0x010f, type=4
> [ 176.025456] ieee80211 phy1: rt2x00lib_rxdone_read_signal: Warning - Frame received with unrecognized signal, mode=0x0001, signal=0x010f, type=4
> [ 177.724843] ieee80211 phy1: rt2x00lib_rxdone_read_signal: Warning - Frame received with unrecognized signal, mode=0x0001, signal=0x010e, type=4
> [ 179.044594] ieee80211 phy1: rt2x00lib_rxdone_read_signal: Warning - Frame received with unrecognized signal, mode=0x0001, signal=0x010f, type=4
> [ 179.313118] ieee80211 phy1: rt2x00lib_rxdone_read_signal: Warning - Frame received with unrecognized signal, mode=0x0001, signal=0x010f, type=4
> [ 179.668916] ieee80211 phy1: rt2x00lib_rxdone_read_signal: Warning - Frame received with unrecognized signal, mode=0x0001, signal=0x010f, type=4
> [ 180.496958] ieee80211 phy1: rt2x00lib_rxdone_read_signal: Warning - Frame received with unrecognized signal, mode=0x0001, signal=0x010f, type=4
> [ 180.675051] ieee80211 phy1: rt2x00lib_rxdone_read_signal: Warning - Frame received with unrecognized signal, mode=0x0001, signal=0x010e, type=4
> [ 187.155300] ieee80211 phy1: rt2x00lib_rxdone_read_signal: Warning - Frame received with unrecognized signal, mode=0x0001, signal=0x010e, type=4
> [ 191.620272] ieee80211 phy1: rt2800mmio_txstatus_is_spurious: Warning - 4 spurious TX_FIFO_STATUS interrupt(s)
> [ 191.624983] ieee80211 phy1: rt2800mmio_txdone: Warning - Got TX status for an empty queue 2, dropping
> [ 200.584720] ------------[ cut here ]------------
> [ 200.586933] WARNING: CPU: 0 PID: 0 at backports-2017-11-01/net/mac80211/rx.c:629 ieee80211_rx_napi+0x254/0x924 [mac80211]
> [ 200.591898] Modules linked in: ath9k_htc ath9k_common rt2800usb rt2800soc rt2800pci rt2800mmio rt2800lib pppoe ppp_async ath9k_hw ath rt2x00usb rt2x00soc rt2x00pci rt2x00mmio rt2x00lib pppox ppp_generic mt76x2e mt7603e mt76 mac80211
> iptable_nat iptable_mangle iptable_filter ipt_REJECT ipt_MASQUERADE ip_tables cfg80211 xt_time xt_tcpudp xt_state xt_nat xt_multiport xt_mark xt_mac xt_limit xt_conntrack xt_comment xt_TCPMSS xt_REDIRECT xt_LOG x_tables slhc
> nf_reject_ipv4 nf_nat_redirect nf_nat_masquerade_ipv4 nf_conntrack_ipv4 nf_nat_ipv4 nf_nat nf_log_ipv4 nf_log_common nf_defrag_ipv4 nf_conntrack_rtcache nf_conntrack crc_itu_t crc_ccitt compat eeprom_93cx6 leds_gpio ohci_platform
> ohci_hcd ehci_platform ehci_hcd gpio_button_hotplug usbcore nls_base usb_common
> [ 200.622366] CPU: 0 PID: 0 Comm: swapper Not tainted 4.9.73 #0
> [ 200.624960] Stack : 803e751a 00000031 00000000 00000001 80392f34 80392ba7 80349710 00000000
> [ 200.628749] 803e3660 00000275 00000004 870e2730 00000000 8004d250 8034eef0 80390000
> [ 200.632536] 00000003 00000275 8034d0b8 87c0ddd4 00000000 8007cb3c 803e751a 0000006d
> [ 200.636325] 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
> [ 200.640112] 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
> [ 200.643901] ...
> [ 200.645007] Call Trace:
> [ 200.646120] [<8000e478>] show_stack+0x54/0x88
> [ 200.648101] [<800239f0>] __warn+0xe4/0x118
> [ 200.649954] [<80023ab8>] warn_slowpath_null+0x1c/0x34
> [ 200.652292] [<877255e0>] ieee80211_rx_napi+0x254/0x924 [mac80211]
> [ 200.655081] [<87054ccc>] ath9k_rx_tasklet+0x144/0x194 [ath9k_htc]
> [ 200.657846] [<80026bac>] tasklet_action+0x80/0xc8
> [ 200.659979] [<8002655c>] __do_softirq+0x250/0x298
> [ 200.662112] [<800096f8>] handle_int+0x138/0x144
> [ 200.664160] [<8000b1e4>] r4k_wait_irqoff+0x18/0x20
> [ 200.666330] [<80047d24>] cpu_startup_entry+0x84/0xd0
> [ 200.668580] [<803b2bb4>] start_kernel+0x44c/0x46c
> [ 200.670710] ---[ end trace 56da331f0ae836de ]---
>
> On 3 January 2018 at 15:04, Enrico Mioso <mrkiko.rs@gmail.com> wrote:
>> <snip>
On Thu, Jan 11, 2018 at 05:00:14PM +0100, Tom Psyborg wrote:
> does not help as expected. there was even a crash:

Do you mean the kernel hangs? There is a WARNING at the end of the logs, but it comes from ath9k, not from rt2800.

> [ 92.559018] ieee80211 phy1: rt2800mmio_txstatus_is_spurious: Warning - 4
> spurious TX_FIFO_STATUS interrupt(s)

This is suspicious; we should not get spurious interrupts. Perhaps there is something wrong at the platform level, maybe with pinmux?

> for an empty queue 2, dropping
> [ 176.019555] ieee80211 phy1: rt2x00lib_rxdone_read_signal: Warning -
> Frame received with unrecognized signal, mode=0x0001, signal=0x010f, type=4
> [ 176.025456] ieee80211 phy1: rt2x00lib_rxdone_read_signal: Warning -

And this indicates that the HW/FW no longer provides correct frames to the driver. Again, I think we are doing something that leaves the device unable to function normally. Changing the queue stop/start thresholds will not help with that.

Cheers
Stanislaw
On Mon, Jan 22, 2018 at 06:45:57AM +0100, Enrico Mioso wrote:
> The crash is happening here too, on the WL-330N3G device; unfortunately, there are no messages to report.
> Regarding your hypothesis about the firmware not being able to report its inability to send frames: around the time OpenWRT had the 3.9.9 kernel, this device worked non-stop for weeks with no issues, I think.
> And I don't think this is only a coincidence. So I suspect, though it's only an impression (also because I know very, very little about this), that at some point things worked.

I think the FW was also upgraded since the 3.9 kernel. But this could also be a problem with how the driver programs the device registers, not only a FW problem.

Regards
Stanislaw
Good morning!
First of all, thank you very much for your help and patience: it's not common to find kind people, and people who care to help with older hardware (as in my case).
So, regarding the WL-330N3G, I think you are right. And after all, thinking about it, my previous reasoning isn't very plausible, sorry.
Regarding the firmware, in my specific case it seems OpenWRT extracts calibration data from the flash, but doesn't upload a firmware to the device. I'll look closely when the device is in a more reachable state. I may well be wrong.
Thank you again for everything, Stanislaw, and Tom for helping out with reporting. And in general, all of you.
Enrico
Hello!
I can confirm: there is no firmware being uploaded to the device, or at least, if it's there, it's not uploaded from an external file.

root@smalltalk:~# ls -Fa -1 /lib/firmware/
./
../
regulatory.db
root@smalltalk:~#

Is there anything I might do? I can also give you an SSH session to the device if needed.

Enrico

On Tue, 23 Jan 2018, Stanislaw Gruszka wrote:

> Date: Tue, 23 Jan 2018 14:22:35
> From: Stanislaw Gruszka <sgruszka@redhat.com>
> To: Enrico Mioso <mrkiko.rs@gmail.com>
> Cc: Tom Psyborg <pozega.tomislav@gmail.com>,
>     linux-wireless <linux-wireless@vger.kernel.org>,
>     Johannes Berg <johannes.berg@intel.com>,
>     Daniel Golle <daniel@makrotopia.org>, Arnd Bergmann <arnd@arndb.de>,
>     John Crispin <john@phrozen.org>, Felix Fietkau <nbd@nbd.name>
> Subject: Re: ieee80211 phy0: rt2x00queue_write_tx_frame: Error - Dropping
>     frame due to full tx queue...?
>
> <snip>
Hi

So if 3.9 works and the current kernel does not, the best way to address the issue would be to identify the change that broke it. Ideally a bisection could be performed, but this can be hard given that the problem is not easily reproducible. A further difficulty comes from the fact that OpenWRT carries custom patches. So I suggest some kind of manual semi-bisection on the OpenWRT repository, with long testing, to find the first (or close to the first) OpenWRT change that started to break things.

Cheers
Stanislaw

On Wed, Jan 24, 2018 at 09:18:23AM +0100, Enrico Mioso wrote:
> <snip>
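The manual semi-bisection suggested above is a binary search over an ordered list of builds. A minimal C sketch follows, assuming the candidate OpenWRT builds are numbered chronologically, that the oldest is known good and the newest known bad, and that the regression persists once introduced. `first_bad_build` and the oracle callback are hypothetical names; on real hardware each oracle call stands for a long soak test, and `simulated_oracle` is only a stand-in for trying the search.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical oracle: returns nonzero if build number `idx` is good. */
typedef int (*build_is_good_fn)(size_t idx, void *ctx);

/*
 * Binary search for the first bad build in [0, count), assuming build 0 is
 * good, build count - 1 is bad, and every build after a bad one is bad.
 * If `tests_run` is non-NULL it receives the number of oracle calls made.
 */
static size_t first_bad_build(size_t count, build_is_good_fn is_good,
			      void *ctx, size_t *tests_run)
{
	size_t good = 0, bad = count - 1, runs = 0;

	assert(count >= 2);
	while (bad - good > 1) {
		size_t mid = good + (bad - good) / 2;

		runs++;
		if (is_good(mid, ctx))
			good = mid;   /* regression is later than mid */
		else
			bad = mid;    /* regression is at or before mid */
	}
	if (tests_run)
		*tests_run = runs;
	return bad;
}

/* Stand-in oracle: ctx points to the index of the first bad build. */
static int simulated_oracle(size_t idx, void *ctx)
{
	return idx < *(const size_t *)ctx;
}
```

With 100 candidate builds the search needs at most 7 soak tests, which is why even a slow, manual bisection can be practical.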
[forwarding to all other involved players]

On Thu, Mar 01, 2018 at 05:50:51PM +0300, Jamie Stuart wrote:
> Hi Daniel,
> The driver seems much improved after this fix.

It's about these two:
[PATCH 1/2] rt2x00: pause almost full queue early
[PATCH 2/2] rt2x00: do not pause queue unconditionally on error path

> Under very heavy load (30 clients downloading multi-GB files from SD card on the server concurrently), wifi dies with errors:
>
> [ 7794.230376] ieee80211 phy0: rt2x00lib_rxdone_read_signal: Warning - Frame received with unrecognized signal, mode=0x0001, signal=0x010c, type=4
>
> Thu Mar 1 16:36:47 2018 kern.err kernel: [ 8702.146403] ieee80211 phy0: rt2x00queue_write_tx_frame: Error - Arrived at non-free entry in the non-full queue 2
> Thu Mar 1 16:36:47 2018 kern.err kernel: [ 8702.146403] Please file bug report to http://rt2x00.serialmonkey.com
> Thu Mar 1 16:36:48 2018 kern.err kernel: [ 8702.288149] ieee80211 phy0: rt2x00queue_write_tx_frame: Error - Arrived at non-free entry in the non-full queue 2
> Thu Mar 1 16:36:48 2018 kern.err kernel: [ 8702.288149] Please file bug report to http://rt2x00.serialmonkey.com
> Thu Mar 1 16:36:48 2018 kern.err kernel: [ 8702.380761] ieee80211 phy0: rt2x00queue_write_tx_frame: Error - Arrived at non-free entry in the non-full queue 2
> Thu Mar 1 16:36:48 2018 kern.err kernel: [ 8702.380761] Please file bug report to http://rt2x00.serialmonkey.com
>
>
> > On 20 Dec 2017, at 06:00, Daniel Golle <daniel@makrotopia.org> wrote:
> >
> > Hi Jamie,
> >
> > there was another round of discussion regarding the queue management
> > problem and Stanislaw Gruszka (the main rt2x00 maintainer) suggested
> > two patches which would be a more correct fix for the
> > rt2x00queue_write_tx_frame: Error - Dropping frame due to full tx queue
> > message and subsequent problems.
> >
> > I imported them to my staging tree. Please retest and join the
> > discussion on
> > https://wireless.wiki.kernel.org/en/developers/mailinglists#linux-wireless_online_archives
> > see posts
> > https://marc.info/?l=linux-wireless&m=151368325114059&w=2
> > https://marc.info/?l=linux-wireless&m=151368325914062&w=2
> > and
> > https://marc.info/?l=linux-wireless&m=151368992716169&w=2
> >
> >
> > Cheers
> >
> >
> > Daniel
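The log above contains two different failures of the TX write path: a queue that is genuinely full (the frame is dropped), and a queue whose fill-level accounting says there is room while the next ring entry is still owned by the device ("Arrived at non-free entry in the non-full queue"). The following simplified C model, with hypothetical names and a fixed 8-entry ring, shows how the two checks differ. It assumes the inconsistency comes from a completion that is counted without releasing the expected entry; that is only one possible cause, and the model is not taken from the driver.

```c
#include <assert.h>
#include <stdbool.h>

#define RING_SIZE 8

/* Hypothetical simplified TX ring; not the rt2x00 data_queue layout. */
struct ring_model {
	bool owned[RING_SIZE];  /* entry currently owned by the device */
	unsigned int write_idx; /* next entry the driver fills */
	unsigned int done_idx;  /* next entry the device completes */
	unsigned int length;    /* number of entries counted as in flight */
};

enum tx_result { TX_OK, TX_DROP_FULL, TX_NON_FREE_ENTRY };

static enum tx_result ring_write(struct ring_model *r)
{
	if (r->length == RING_SIZE)
		return TX_DROP_FULL;      /* accounting says the ring is full */
	if (r->owned[r->write_idx])
		return TX_NON_FREE_ENTRY; /* accounting says "not full", but
					     the next slot is still busy */
	r->owned[r->write_idx] = true;
	r->write_idx = (r->write_idx + 1) % RING_SIZE;
	r->length++;
	return TX_OK;
}

static void ring_complete(struct ring_model *r)
{
	assert(r->length > 0);
	r->owned[r->done_idx] = false; /* hand the entry back to the driver */
	r->done_idx = (r->done_idx + 1) % RING_SIZE;
	r->length--;
}
```

In this model the second error can only appear when `length` and the `owned` flags disagree, which is why it points at bookkeeping rather than at load alone.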
Hello to all of you.
I would be very happy if this problem could be sorted out. But I should be honest: when the only option seemed to be running a git bisect on OpenWRT, I was discouraged. Too many builds, and some patching would be needed, considering that this thing worked with the 3.9 kernel or something like that (I might be wrong, i.e. it could be 3.6 or similar). That's why I didn't actually start this; sorry to all of you.
Stanislaw's help has been greatly appreciated, and invaluable.
The problem seems to manifest differently depending on the card: on an MT7620A-based SoC, namely the TP-LINK Archer MR200, the error occurs but doesn't seem to get the card stuck. It's a different story for the wireless NIC found in the Asus WL-330N3G device. The circumstances in which the problem shows up are not clear to me. Still, it would be very important to get this solved, given how widespread this hardware is.
I don't have many Wi-Fi clients here: the iPhone has still been a good tool for causing some problems.
I am available for code testing. As some of you already know, I may be slow in responding and testing overall, and it can take some time for the problem to show up. Still, I am available.
Thank you for your work, time and patience.

On Thu, 1 Mar 2018, Daniel Golle wrote:

> Date: Thu, 1 Mar 2018 16:30:10
> From: Daniel Golle <daniel@makrotopia.org>
> To: Stanislaw Gruszka <sgruszka@redhat.com>
> Cc: Enrico Mioso <mrkiko.rs@gmail.com>,
>     Tom Psyborg <pozega.tomislav@gmail.com>,
>     linux-wireless <linux-wireless@vger.kernel.org>,
>     Johannes Berg <johannes.berg@intel.com>, Arnd Bergmann <arnd@arndb.de>,
>     John Crispin <john@phrozen.org>, Felix Fietkau <nbd@nbd.name>,
>     Jamie Stuart <jamie@onebillion.org>
> Subject: Re: ieee80211 phy0: rt2x00queue_write_tx_frame: Error - Dropping
>     frame due to full tx queue...?
>
> <snip>
On Thu, Mar 01, 2018 at 04:30:10PM +0100, Daniel Golle wrote:
> [forwarding to all other involved players]
>
> On Thu, Mar 01, 2018 at 05:50:51PM +0300, Jamie Stuart wrote:
> > Hi Daniel,
> > The driver seems much improved after this fix.
>
> It's about these two:
> [PATCH 1/2] rt2x00: pause almost full queue early
> [PATCH 2/2] rt2x00: do not pause queue unconditionally on error path
>
> > Under very heavy load (30 clients downloading multi-GB files from SD card on the server concurrently), wifi dies with errors:

Is this some kind of testbed? Could you share how you set up such an environment and what the client devices are?

> > [ 7794.230376] ieee80211 phy0: rt2x00lib_rxdone_read_signal: Warning - Frame received with unrecognized signal, mode=0x0001, signal=0x010c, type=4

This is an indicator that the HW/FW has a problem. There could be various reasons for that. One possible reason, which I can also observe in my setup, is a strange mishmash of sequence numbers on frames that were not acked in a BlockAck and had to be resent. This can happen when many frames are wrongly decoded (i.e. when there are bad radio conditions, or when we do not have a correct low-level RF/BBP setup for a Ralink device). To mitigate that problem we can limit the length of aggregated A-MPDU frames.

I attached two patches which do that, one for the RX side and one for the TX side. Please check if they make a difference. You can also hardcode ba_size = 0 for that 30-client setup.

Note that the patches can cause a (possibly small) performance degradation on good setups.

Mathias, could you check them as well and make sure they do not cause a performance regression on your device? The last time I changed the ba_size setting, it caused a problem on your setup.
> > Thu Mar 1 16:36:47 2018 kern.err kernel: [ 8702.146403] ieee80211 phy0: rt2x00queue_write_tx_frame: Error - Arrived at non-free entry in the non-full queue 2
> > Thu Mar 1 16:36:47 2018 kern.err kernel: [ 8702.146403] Please file bug report to http://rt2x00.serialmonkey.com
> > Thu Mar 1 16:36:48 2018 kern.err kernel: [ 8702.288149] ieee80211 phy0: rt2x00queue_write_tx_frame: Error - Arrived at non-free entry in the non-full queue 2
> > Thu Mar 1 16:36:48 2018 kern.err kernel: [ 8702.288149] Please file bug report to http://rt2x00.serialmonkey.com
> > Thu Mar 1 16:36:48 2018 kern.err kernel: [ 8702.380761] ieee80211 phy0: rt2x00queue_write_tx_frame: Error - Arrived at non-free entry in the non-full queue 2
> > Thu Mar 1 16:36:48 2018 kern.err kernel: [ 8702.380761] Please file bug report to http://rt2x00.serialmonkey.com

For those errors I recommend removing the

600-23-rt2x00-rt2800mmio-add-a-workaround-for-spurious-TX_F.patch

patch. It would be good if OpenWRT developers could apply this patch only on targets where it is really needed, not on all rt2800 devices.

Thanks
Stanislaw
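The retransmission issue described above depends on how a BlockAck reports which frames of an aggregate arrived: in a compressed BlockAck, bit `i` of a 64-bit bitmap acknowledges the frame with sequence number `SSN + i`, modulo 4096 because 802.11 sequence numbers are 12 bits wide. The sketch below uses a hypothetical `unacked_seqs` helper to list the frames that would have to be resent; its `window` parameter stands in for a reduced ba_size and is an assumption for illustration, not the code from the attached patches.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SEQ_MODULO 4096u /* 802.11 sequence numbers are 12 bits wide */

/*
 * Given the starting sequence number (SSN) and the 64-bit bitmap of a
 * compressed BlockAck, store the sequence numbers of frames in the first
 * `window` positions (1..64) that were NOT acknowledged into `out`
 * (at most `out_len` of them). Returns how many were stored.
 */
static size_t unacked_seqs(uint16_t ssn, uint64_t bitmap, unsigned int window,
			   uint16_t *out, size_t out_len)
{
	size_t n = 0;

	assert(window >= 1 && window <= 64);
	for (unsigned int i = 0; i < window && n < out_len; i++) {
		if (!(bitmap & ((uint64_t)1 << i)))
			out[n++] = (uint16_t)((ssn + i) % SEQ_MODULO);
	}
	return n;
}
```

With SSN 4094 and bitmap 0x05, a window of 8 leaves six frames to resend (4095 and 1 through 5, wrapping past 4095), while a window of 4 leaves only two (4095 and 1). A shorter aggregate puts fewer frames at risk per exchange at some cost in throughput, which matches the performance caveat in the message.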
Hi Stanis,

Our environment has the following wireless config (MT7620A):

config wifi-device 'radio0'
	option type 'mac80211'
	option channel 'auto'
	option hwmode '11g'
	option path 'platform/10180000.wmac'
	option txpower '20'
	option country 'GB'
	option htmode 'HT40'
	option noscan '1'

config wifi-iface 'default_radio0'
	option device 'radio0'
	option network 'lan'
	option mode 'ap'
	option encryption 'psk2+aes'
	option key 'KEY'
	option maxassoc '96'
	option ssid 'SSID'
	option disassoc_low_ack '0'

The 30 clients are all Apple iPads (a mixture of iPad mini and mini 2, running iOS 9-11). During this testing period, all clients were simultaneously downloading files from the device's SD card (served via nginx running on the device). Although this is not a typical use case, it was useful for stress-testing the wireless setup.

We'll try your patches, Stanis.
Daniel, forgive the stupid question, but how do we apply Stanis' patches on top of your staging tree (or both yours and Stanis' on top of trunk)?

> On 7 Mar 2018, at 14:27, Stanislaw Gruszka <sgruszka@redhat.com> wrote:
>
> <snip>
Thank you very very much guys for keeping me up to date with the progress, it's important for me. Even if I am not subscribed to the list. Thank you for all guys. On Wed, 7 Mar 2018, Jamie Stuart wrote: > Date: Wed, 7 Mar 2018 16:47:34 > From: Jamie Stuart <jamie@onebillion.org> > To: Stanislaw Gruszka <sgruszka@redhat.com> > Cc: Daniel Golle <daniel@makrotopia.org>, Enrico Mioso <mrkiko.rs@gmail.com>, > Tom Psyborg <pozega.tomislav@gmail.com>, > linux-wireless <linux-wireless@vger.kernel.org>, > Johannes Berg <johannes.berg@intel.com>, Arnd Bergmann <arnd@arndb.de>, > John Crispin <john@phrozen.org>, Felix Fietkau <nbd@nbd.name>, > Mathias Kresin <dev@kresin.me> > Subject: Re: ieee80211 phy0: rt2x00queue_write_tx_frame: Error - Dropping > frame due to full tx queue...? > > Hi Stanis, > > Our environment has the following wireless config (MT7620A): > > config wifi-device 'radio0' > option type 'mac80211' > option channel 'auto' > option hwmode '11g' > option path 'platform/10180000.wmac' > option txpower '20' > option country 'GB' > option htmode 'HT40' > option noscan '1' > > config wifi-iface 'default_radio0' > option device 'radio0' > option network 'lan' > option mode 'ap' > option encryption 'psk2+aes' > option key ‘KEY' > option maxassoc '96' > option ssid ’SSID' > option disassoc_low_ack ‘0' > > > The 30 clients are all Apple iPads (a mixture of iPad mini and mini 2, running iOS 9-11). During this testing period, all clients were simultaneously downloading files from the devices’ sdcard (served via nginx running on the device). Although this is not a typical use-case, it was useful in stress-testing the wireless setup. > > We’ll try your patches Stanis. > Daniel - forgive the stupid question, but how do we apply Stanis’ patches atop your staging tree (or both yours and Stanis’ atop trunk?) 
>
>> On 7 Mar 2018, at 14:27, Stanislaw Gruszka <sgruszka@redhat.com> wrote:
>>
>> On Thu, Mar 01, 2018 at 04:30:10PM +0100, Daniel Golle wrote:
>>> [forwarding to all other involved players]
>>>
>>> On Thu, Mar 01, 2018 at 05:50:51PM +0300, Jamie Stuart wrote:
>>>> Hi Daniel,
>>>> The driver seems much improved after this fix.
>>>
>>> It's about these two:
>>> [PATCH 1/2] rt2x00: pause almost full queue early
>>> [PATCH 2/2] rt2x00: do not pause queue unconditionally on error path
>>>
>>>> Under very heavy load (30 clients downloading multi-GB files from SD card on the server concurrently), wifi dies with errors:
>>
>> Is this some testbed? Could you share how you set up such an
>> environment and what the client devices are?
>>
>>>> [ 7794.230376] ieee80211 phy0: rt2x00lib_rxdone_read_signal: Warning - Frame received with unrecognized signal, mode=0x0001, signal=0x010c, type=4
>>
>> This is an indicator that the HW/FW has a problem. There could be various
>> reasons for that. One possible cause, which I can also observe in my setup, is a strange
>> mishmash of sequence numbers on frames which were not acked in a BlockAck and
>> had to be resent. This can happen when many frames are wrongly decoded
>> (i.e. when there are bad radio conditions or we do not have a correct low-level
>> RF/BBP setup for a Ralink device). To mitigate that problem we can
>> limit the length of aggregated AMPDU frames.
>>
>> I attached two patches which do that, one for the RX side and one for the TX side.
>> Please check if they make a difference. You can also hardcode ba_size = 0
>> for that 30-client setup.
>>
>> Note the patches can cause (possibly small) performance degradation on
>> good setups.
>>
>> Mathias, could you check them as well and see if they cause a
>> performance regression on your device? Last time I changed the ba_size
>> setting, it caused a problem on your setup.
>>
>>>> Thu Mar 1 16:36:47 2018 kern.err kernel: [ 8702.146403] ieee80211 phy0: rt2x00queue_write_tx_frame: Error - Arrived at non-free entry in the non-full queue 2
>>>> Thu Mar 1 16:36:47 2018 kern.err kernel: [ 8702.146403] Please file bug report to http://rt2x00.serialmonkey.com
>>>> Thu Mar 1 16:36:48 2018 kern.err kernel: [ 8702.288149] ieee80211 phy0: rt2x00queue_write_tx_frame: Error - Arrived at non-free entry in the non-full queue 2
>>>> Thu Mar 1 16:36:48 2018 kern.err kernel: [ 8702.288149] Please file bug report to http://rt2x00.serialmonkey.com
>>>> Thu Mar 1 16:36:48 2018 kern.err kernel: [ 8702.380761] ieee80211 phy0: rt2x00queue_write_tx_frame: Error - Arrived at non-free entry in the non-full queue 2
>>>> Thu Mar 1 16:36:48 2018 kern.err kernel: [ 8702.380761] Please file bug report to http://rt2x00.serialmonkey.com
>>
>> For those errors I recommend removing the
>>
>> 600-23-rt2x00-rt2800mmio-add-a-workaround-for-spurious-TX_F.patch
>>
>> patch. It would be good if OpenWRT developers could apply this patch only
>> on targets where it is really needed, not for all rt2800 devices.
>>
>> Thanks
>> Stanislaw
And BTW, thank you to all of you involved in this, for the testing, the work and everything else. BTW, happy Women's Day to all of you women, and to the ones you know.

Enrico
diff --git a/drivers/net/wireless/ralink/rt2x00/rt2x00dev.c b/drivers/net/wireless/ralink/rt2x00/rt2x00dev.c
index 357c0941aaad..b8bdf57ed7ea 100644
--- a/drivers/net/wireless/ralink/rt2x00/rt2x00dev.c
+++ b/drivers/net/wireless/ralink/rt2x00/rt2x00dev.c
@@ -416,7 +416,7 @@ static void rt2x00lib_clear_entry(struct rt2x00_dev *rt2x00dev,
 	 * before it was stopped.
 	 */
 	spin_lock_bh(&entry->queue->tx_lock);
-	if (!rt2x00queue_threshold(entry->queue))
+	if (rt2x00queue_available(entry->queue) > 2 * entry->queue->threshold)
 		rt2x00queue_unpause_queue(entry->queue);
 	spin_unlock_bh(&entry->queue->tx_lock);
 }
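The patch changes the wake-up condition in rt2x00lib_clear_entry(): instead of unpausing as soon as the queue is no longer below its threshold, it waits until rt2x00queue_available() reports more than twice queue->threshold free entries, i.e. it adds hysteresis. The following is a minimal userspace sketch in C, not driver code, that models only that decision. It assumes the queue is paused as soon as fewer than threshold entries are free; struct model_queue, model_tx(), model_txdone(), simulate() and the example values limit = 64 and threshold = 8 are hypothetical and chosen purely for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, userspace-only model of a TX data queue. It keeps just the
 * fields needed to reason about pausing and waking; it is not driver code. */
struct model_queue {
	int limit;     /* total number of entries in the queue */
	int length;    /* entries currently in use */
	int threshold; /* "almost full" when fewer than this many are free */
	bool paused;   /* true while the upper layer must not submit frames */
	int wakeups;   /* number of paused -> running transitions */
};

/* Free entries, analogous to rt2x00queue_available(). */
static int model_available(const struct model_queue *q)
{
	return q->limit - q->length;
}

/* "Almost full" test, analogous to rt2x00queue_threshold() (assumption). */
static bool model_almost_full(const struct model_queue *q)
{
	return model_available(q) < q->threshold;
}

/* Submit one frame and pause the queue as soon as it is almost full. */
static void model_tx(struct model_queue *q)
{
	assert(!q->paused && q->length < q->limit);
	q->length++;
	if (model_almost_full(q))
		q->paused = true;
}

/* Complete one frame. Without hysteresis the queue is woken as soon as it
 * is no longer almost full; with hysteresis only once more than twice the
 * threshold is free, mirroring the condition in the patch. */
static void model_txdone(struct model_queue *q, bool hysteresis)
{
	assert(q->length > 0);
	q->length--;

	bool wake = hysteresis ? model_available(q) > 2 * q->threshold
			       : !model_almost_full(q);
	if (q->paused && wake) {
		q->paused = false;
		q->wakeups++;
	}
}

/* Saturating sender: each tick, fill the queue until it pauses (or is
 * completely full), then let the "hardware" complete a single frame.
 * Returns how many times the queue was woken. */
int simulate(bool hysteresis, int ticks)
{
	struct model_queue q = { .limit = 64, .length = 0, .threshold = 8 };

	for (int t = 0; t < ticks; t++) {
		while (!q.paused && q.length < q.limit)
			model_tx(&q);
		model_txdone(&q, hysteresis);
	}
	return q.wakeups;
}
```

With these example values, simulate(false, 100) returns 100, because once the queue first fills it is woken on every single completion and immediately paused again by the next frame. simulate(true, 100) returns 10, because the queue is only woken after ten frames have drained. The model only shows how often the queue toggles between paused and running; it says nothing about whether the hysteresis avoids the "Dropping frame due to full tx queue" errors on real hardware, which is what the testing in this thread is meant to establish.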