Message ID | 20180528125039.GA22122@redhat.com (mailing list archive) |
---|---|
State | RFC |
Headers | show |
Hi! On Mon, May 28, 2018 at 02:50:40PM +0200, Stanislaw Gruszka wrote: > I have some updates here. > > ... > Another issue is that sometimes we do not sent rate probe frames, > I posted fix for that (revert of my older change): > https://marc.info/?l=linux-wireless&m=152750671715369&w=2 > This patch improve performance when mt7620 is connected with Intel iwl7265 > device. With the patch I have 70 Mbits/sec compared to about 17 Mbits/sec > before. Perhaps it also fix stability issues. Oh goodness, I didn't expect anything like that to still happen at this stage. Thanks a lot for investigating the issue! I've imported the patch to OpenWrt git: https://git.openwrt.org/?p=openwrt/openwrt.git;a=commit;h=f4a639a3d7d40b4f63c431c2d554c479fbcc6b74 Once we got some feedback from users (please test!) we can push that also to still be part of the openwrt-18.06 release. Cheers Daniel
On Mon, May 28, 2018 at 02:50:39PM +0200, Stanislaw Gruszka wrote:
> I have some updates here.
And now more updates. I have 5 patches to test here:
https://github.com/sgruszka/wireless-drivers-next/commits/rt2800-draft-v2
as reported by T-Bone here:
https://bugzilla.kernel.org/show_bug.cgi?id=82751
first 3 patches made Netgear WN3000RPv3 router workable.
Another 2 should be further improvement, but were not tested broadly
and can have some bugs.
Daniel, could you apply this on your staging tree for testing?
Please also remove:
600-23-rt2x00-rt2800mmio-add-a-workaround-for-spurious-TX_F.patch
991-rt2800_change_rx_ampdu_factor.patch
992-rt2800_change_ba_size.patch
993-rt2800_change_rx_ampdu_density.patch
Those are not needed and can be harmful with the test patches,
(especially spurious interrupt one, patches will not apply cleanly
with it).
Patches should make "Dropping frame due to full tx queue" and
"Queue 2 failed to flush" errors gone. However if device is somewhat
wrongly configured by driver at low level (i.e. wrong MAC/BBP/RF
registers programing) there still will be troubles to have stable
and fast wireless connection.
On TODO site, the proper watchdog should be added, but I'm not sure
how to exactly detect device/firmware hung and how exactly device should
be rested (by some HW reset by register or by doing full reinitialization).
Another thing is fixing RATE_PROBE frames which are aggregated with
other frames and not sent at requested rate. I implemented qsel queue patch
similar to mt76, but this not work as expected on older Ralink chips.
https://github.com/sgruszka/wireless-drivers-next/commit/846d205edd8c36d1b7828fee54bf4cf40bf8cb1a
Any help with those 2 problems are welcome.
Thanks
Stanislaw
Hi Stanislaw, On Wed, Aug 15, 2018 at 01:40:30PM +0200, Stanislaw Gruszka wrote: > On Mon, May 28, 2018 at 02:50:39PM +0200, Stanislaw Gruszka wrote: > > I have some updates here. > > And now more updates. I have 5 patches to test here: > https://github.com/sgruszka/wireless-drivers-next/commits/rt2800-draft-v2 > > as reported by T-Bone here: > https://bugzilla.kernel.org/show_bug.cgi?id=82751 > first 3 patches made Netgear WN3000RPv3 router workable. > Another 2 should be further improvement, but were not tested broadly > and can have some bugs. > > Daniel, could you apply this on your staging tree for testing? Done and rebased on top of blogic's tree with mac80211 from v4.18-rc7 instead of (outdated) wireless-testing. Only compile-tested by myself for now, maybe someone can run a build on actual MT7620 hardware? Clone the master branch of https://git.openwrt.org/openwrt/staging/dangole.git and give it shot and let us know the results. > Please also remove: > 600-23-rt2x00-rt2800mmio-add-a-workaround-for-spurious-TX_F.patch > 991-rt2800_change_rx_ampdu_factor.patch > 992-rt2800_change_ba_size.patch > 993-rt2800_change_rx_ampdu_density.patch > Those are not needed and can be harmful with the test patches, > (especially spurious interrupt one, patches will not apply cleanly > with it). Ack. Hope it doesn't break Rt3883 and/or Rt3663 for which that patch was added by Gabor Juhos a decade ago... > > Patches should make "Dropping frame due to full tx queue" and > "Queue 2 failed to flush" errors gone. However if device is somewhat > wrongly configured by driver at low level (i.e. wrong MAC/BBP/RF > registers programing) there still will be troubles to have stable > and fast wireless connection. > > On TODO site, the proper watchdog should be added, but I'm not sure > how to exactly detect device/firmware hung and how exactly device should > be rested (by some HW reset by register or by doing full reinitialization). > > Another thing is fixing RATE_PROBE frames which are aggregated with > other frames and not sent at requested rate. I implemented qsel queue patch > similar to mt76, but this not work as expected on older Ralink chips. > https://github.com/sgruszka/wireless-drivers-next/commit/846d205edd8c36d1b7828fee54bf4cf40bf8cb1a Which hardware did you try? Just so I can reproduce what's going on and maybe help fixing it... Cheers Daniel > > Any help with those 2 problems are welcome. > > Thanks > Stanislaw >
Hello On Thu, Aug 16, 2018 at 12:35:29AM +0200, Daniel Golle wrote: > Clone the master branch of > https://git.openwrt.org/openwrt/staging/dangole.git > and give it shot and let us know the results. Thanks! > > Please also remove: > > 600-23-rt2x00-rt2800mmio-add-a-workaround-for-spurious-TX_F.patch > > 991-rt2800_change_rx_ampdu_factor.patch > > 992-rt2800_change_ba_size.patch > > 993-rt2800_change_rx_ampdu_density.patch > > Those are not needed and can be harmful with the test patches, > > (especially spurious interrupt one, patches will not apply cleanly > > with it). > > Ack. Hope it doesn't break Rt3883 and/or Rt3663 for which that patch > was added by Gabor Juhos a decade ago... The new patches change the way we handle TX status interrupt. I think spurious interrupts were a problem because we do not disable the IRQ and read statuses, so we could get interrupt after we empty TX_STA_FIFO register. With current implementation this should not be an issue. > > Another thing is fixing RATE_PROBE frames which are aggregated with > > other frames and not sent at requested rate. I implemented qsel queue patch > > similar to mt76, but this not work as expected on older Ralink chips. > > https://github.com/sgruszka/wireless-drivers-next/commit/846d205edd8c36d1b7828fee54bf4cf40bf8cb1a > > Which hardware did you try? Just so I can reproduce what's going on > and maybe help fixing it... I tested on RT3062 PCI and some USB dongles, don't remember chip version. Basically HW become unresponsive after sent some traffic. I debugged the problem by printing sequence number and rate of RATE PROBE frames by something like this: diff --git a/drivers/net/wireless/ralink/rt2x00/rt2x00queue.c b/drivers/net/wireless/ralink/rt2x00/rt2x00queue.c index 92ddc19..d7f250b 100644 --- a/drivers/net/wireless/ralink/rt2x00/rt2x00queue.c +++ b/drivers/net/wireless/ralink/rt2x00/rt2x00queue.c @@ -334,6 +334,12 @@ static void rt2x00queue_create_tx_descriptor_ht(struct rt2x00_dev *rt2x00dev, txdesc->u.ht.mcs |= 0x08; } + if (tx_info->flags & IEEE80211_TX_CTL_RATE_CTRL_PROBE) { + int ssn = le16_to_cpu(hdr->seq_ctrl) >> 4; + printk("RATE PROBE %d %d\n", ssn, txdesc->u.ht.mcs); + } + + if (test_bit(CONFIG_HT_DISABLED, &rt2x00dev->flags)) { if (!(tx_info->flags & IEEE80211_TX_CTL_FIRST_FRAGMENT)) txdesc->u.ht.txop = TXOP_SIFS; and compared that with on air traffic captured in monitor mode on different device in wireshark. On air traffic showed different probe frames rate than requested. Regards Stanislaw
Still seeing a lot of those Sat Aug 18 16:05:51 2018 kern.err kernel: [ 8364.972151] ieee80211 phy0: rt2x00queue_write_tx_frame: Error - Dropping frame due to full tx queue 2 Sat Aug 18 16:05:51 2018 kern.err kernel: [ 8364.981622] ieee80211 phy0: rt2x00queue_write_tx_frame: Error - Dropping frame due to full tx queue 2 Sat Aug 18 16:05:51 2018 kern.err kernel: [ 8364.991070] ieee80211 phy0: rt2x00queue_write_tx_frame: Error - Dropping frame due to full tx queue 2 Sat Aug 18 16:05:51 2018 kern.err kernel: [ 8365.000506] ieee80211 phy0: rt2x00queue_write_tx_frame: Error - Dropping frame due to full tx queue 2 Sat Aug 18 16:05:51 2018 kern.err kernel: [ 8365.009935] ieee80211 phy0: rt2x00queue_write_tx_frame: Error - Dropping frame due to full tx queue 2 while wifi somehow keeps working at degraded level once the error occured. On Thu, Aug 16, 2018 at 07:30:40PM +0000, Craig Matsuura wrote: > FWIW we would use iperf and push a lot of data to a STA and would cause lock ups and issues with the queues. > > > Craig > > > Craig Matsuura • Technical Director, Embedded Software Architecture > > cmatsuura@vivint.com<mailto:cmatsuura@vivint.com> • P: 801.229.6005 > > > > simply smarter • vivint.com > > 3401 N Ashton Blvd. Lehi, UT 84043 > > > > [1497369905956_vivint-logo-orange.png] > > ________________________________ > From: linux-wireless-owner@vger.kernel.org <linux-wireless-owner@vger.kernel.org> on behalf of Stanislaw Gruszka <sgruszka@redhat.com> > Sent: Thursday, August 16, 2018 5:01:04 AM > To: Daniel Golle > Cc: Kofi Agor; Enrico Mioso; Craig Matsuura; Mathias Kresin; Tom Psyborg; linux-wireless; John Crispin; Felix Fietkau; Jamie Stuart > Subject: Re: ieee80211 phy0: rt2x00queue_write_tx_frame: Error - Dropping frame due to full tx queue...? > > Hello > > On Thu, Aug 16, 2018 at 12:35:29AM +0200, Daniel Golle wrote: > > Clone the master branch of > > https://na01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgit.openwrt.org%2Fopenwrt%2Fstaging%2Fdangole.git&data=01%7C01%7Ccmatsuura%40vivint.com%7C259b5ae9949e445dd9fe08d603680486%7C54cc98ca024a470185483741e3b8d59d%7C0&sdata=8mOOSNRnqvjDVnCNHH15c%2BfEzr69yPpCkbSdD6Q8ZuA%3D&reserved=0 > > and give it shot and let us know the results. > > Thanks! > > > > Please also remove: > > > 600-23-rt2x00-rt2800mmio-add-a-workaround-for-spurious-TX_F.patch > > > 991-rt2800_change_rx_ampdu_factor.patch > > > 992-rt2800_change_ba_size.patch > > > 993-rt2800_change_rx_ampdu_density.patch > > > Those are not needed and can be harmful with the test patches, > > > (especially spurious interrupt one, patches will not apply cleanly > > > with it). > > > > Ack. Hope it doesn't break Rt3883 and/or Rt3663 for which that patch > > was added by Gabor Juhos a decade ago... > > The new patches change the way we handle TX status interrupt. I think > spurious interrupts were a problem because we do not disable the IRQ > and read statuses, so we could get interrupt after we empty TX_STA_FIFO > register. With current implementation this should not be an issue. > > > > Another thing is fixing RATE_PROBE frames which are aggregated with > > > other frames and not sent at requested rate. I implemented qsel queue patch > > > similar to mt76, but this not work as expected on older Ralink chips. > > > https://na01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2Fsgruszka%2Fwireless-drivers-next%2Fcommit%2F846d205edd8c36d1b7828fee54bf4cf40bf8cb1a&data=01%7C01%7Ccmatsuura%40vivint.com%7C259b5ae9949e445dd9fe08d603680486%7C54cc98ca024a470185483741e3b8d59d%7C0&sdata=mI%2BPAh7OalmB1hxzjF%2FL6GqxwHDPi7Hy5P9IWADu5gI%3D&reserved=0 > > > > Which hardware did you try? Just so I can reproduce what's going on > > and maybe help fixing it... > > I tested on RT3062 PCI and some USB dongles, don't remember chip version. > Basically HW become unresponsive after sent some traffic. > > I debugged the problem by printing sequence number and rate of RATE PROBE > frames by something like this: > > diff --git a/drivers/net/wireless/ralink/rt2x00/rt2x00queue.c b/drivers/net/wireless/ralink/rt2x00/rt2x00queue.c > index 92ddc19..d7f250b 100644 > --- a/drivers/net/wireless/ralink/rt2x00/rt2x00queue.c > +++ b/drivers/net/wireless/ralink/rt2x00/rt2x00queue.c > @@ -334,6 +334,12 @@ static void rt2x00queue_create_tx_descriptor_ht(struct rt2x00_dev *rt2x00dev, > txdesc->u.ht.mcs |= 0x08; > } > > + if (tx_info->flags & IEEE80211_TX_CTL_RATE_CTRL_PROBE) { > + int ssn = le16_to_cpu(hdr->seq_ctrl) >> 4; > + printk("RATE PROBE %d %d\n", ssn, txdesc->u.ht.mcs); > + } > + > + > if (test_bit(CONFIG_HT_DISABLED, &rt2x00dev->flags)) { > if (!(tx_info->flags & IEEE80211_TX_CTL_FIRST_FRAGMENT)) > txdesc->u.ht.txop = TXOP_SIFS; > > and compared that with on air traffic captured in monitor mode on > different device in wireshark. On air traffic showed different > probe frames rate than requested. > > Regards > Stanislaw
Hi Daniel On Sat, Aug 18, 2018 at 06:08:40PM +0200, Daniel Golle wrote: > Still seeing a lot of those > > Sat Aug 18 16:05:51 2018 kern.err kernel: [ 8364.972151] ieee80211 phy0: rt2x00queue_write_tx_frame: Error - Dropping frame due to full tx queue 2 > Sat Aug 18 16:05:51 2018 kern.err kernel: [ 8364.981622] ieee80211 phy0: rt2x00queue_write_tx_frame: Error - Dropping frame due to full tx queue 2 > Sat Aug 18 16:05:51 2018 kern.err kernel: [ 8364.991070] ieee80211 phy0: rt2x00queue_write_tx_frame: Error - Dropping frame due to full tx queue 2 > Sat Aug 18 16:05:51 2018 kern.err kernel: [ 8365.000506] ieee80211 phy0: rt2x00queue_write_tx_frame: Error - Dropping frame due to full tx queue 2 > Sat Aug 18 16:05:51 2018 kern.err kernel: [ 8365.009935] ieee80211 phy0: rt2x00queue_write_tx_frame: Error - Dropping frame due to full tx queue 2 > > while wifi somehow keeps working at degraded level once the error > occured. I would like to know why we get those errors at all. Perhaps there is something wrong with queue pause/unpasue logic. Or with locking. Or maybe this is normal, we pausue queue in mac80211 but it still have some pending skbs which are scheduled to the driver. Let's check if pausing is correct. Please apply this patch: https://github.com/sgruszka/wireless-drivers-next/commit/525c50486e17446793b21ac7a8498cb48b3bb210.patch and provide dmesg output. Note that adding printk messages can make itself somehow AP unresponsive, as I pointed earlier in this thread. However at this point I would like to check if queue pause works as it should in the driver. Also, how do test? I can not reproduce those errors. Thanks Stanislaw
Hello On Mon, Aug 20, 2018 at 02:20:41PM +0200, Stanislaw Gruszka wrote: > On Sat, Aug 18, 2018 at 06:08:40PM +0200, Daniel Golle wrote: > > Still seeing a lot of those > > > > Sat Aug 18 16:05:51 2018 kern.err kernel: [ 8364.972151] ieee80211 phy0: rt2x00queue_write_tx_frame: Error - Dropping frame due to full tx queue 2 > > Sat Aug 18 16:05:51 2018 kern.err kernel: [ 8364.981622] ieee80211 phy0: rt2x00queue_write_tx_frame: Error - Dropping frame due to full tx queue 2 > > Sat Aug 18 16:05:51 2018 kern.err kernel: [ 8364.991070] ieee80211 phy0: rt2x00queue_write_tx_frame: Error - Dropping frame due to full tx queue 2 > > Sat Aug 18 16:05:51 2018 kern.err kernel: [ 8365.000506] ieee80211 phy0: rt2x00queue_write_tx_frame: Error - Dropping frame due to full tx queue 2 > > Sat Aug 18 16:05:51 2018 kern.err kernel: [ 8365.009935] ieee80211 phy0: rt2x00queue_write_tx_frame: Error - Dropping frame due to full tx queue 2 > > > > while wifi somehow keeps working at degraded level once the error > > occured. > > I would like to know why we get those errors at all. Perhaps there is > something wrong with queue pause/unpasue logic. Or with locking. > Or maybe this is normal, we pausue queue in mac80211 but it still have > some pending skbs which are scheduled to the driver. > > Let's check if pausing is correct. Please apply this patch: > https://github.com/sgruszka/wireless-drivers-next/commit/525c50486e17446793b21ac7a8498cb48b3bb210.patch > and provide dmesg output. > > Note that adding printk messages can make itself somehow AP > unresponsive, as I pointed earlier in this thread. However > at this point I would like to check if queue pause works as it > should in the driver. I get testing results from T-Bone user in bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=82751 And get this: [ 781.644185] ieee80211 phy0: rt2x00queue_write_tx_frame: Error - Dropping frame due to full tx queue 2 [ 781.662620] 2 d1 s1 p1 ms1 Looks like rt2x00 correctly stop queue in mac80211, but sill get skb's. So we can do 3 things: increase queue size to 128, increase threshold to 16 and make the massage debug one instead of error (I checked some other drivers and looks most of them silently drop the frame in case queue is full). Especially removing the message can be useful since printk can somehow make mt7620 router unresponsive for some time resulting in connection drops. Thoughts ? Regards Stanislaw
diff --git a/net/mac80211/agg-tx.c b/net/mac80211/agg-tx.c index 595c662a61e8..de2643f7189f 100644 --- a/net/mac80211/agg-tx.c +++ b/net/mac80211/agg-tx.c @@ -135,6 +135,8 @@ void ieee80211_send_bar(struct ieee80211_vif *vif, u8 *ra, u16 tid, u16 ssn) bar->control = cpu_to_le16(bar_control); bar->start_seq_num = cpu_to_le16(ssn); + printk("%s tid %d ssn %04x %d\n", __func__, tid, ssn , ssn >> 4); + IEEE80211_SKB_CB(skb)->flags |= IEEE80211_TX_INTFL_DONT_ENCRYPT | IEEE80211_TX_CTL_REQ_TX_STATUS; ieee80211_tx_skb_tid(sdata, skb, tid); This patch alone cause wifi disconnects. I don't know where problem is, but remove printing i.e. rt2x00_warn() messages can be thing to check if you observe wifi disconnects. Another issue is BA ssn mishmashes which I can observe. Those were causing by older rt2800 usb hardware not responding correctly to BAR frame. I.e for BAR frame with SSN 0x0060 (6), it responded BA with SSN 7fe0 (2046). Removing of posting BAR by this patch: diff --git a/drivers/net/wireless/ralink/rt2x00/rt2x00dev.c b/drivers/net/wireless/ralink/rt2x00/rt2x00dev.c index 357c0941aaad..7e2786464bcf 100644 --- a/drivers/net/wireless/ralink/rt2x00/rt2x00dev.c +++ b/drivers/net/wireless/ralink/rt2x00/rt2x00dev.c @@ -382,9 +382,6 @@ static void rt2x00lib_fill_tx_status(struct rt2x00_dev *rt2x00dev, IEEE80211_TX_CTL_AMPDU; tx_info->status.ampdu_len = 1; tx_info->status.ampdu_ack_len = success ? 1 : 0; - - if (!success) - tx_info->flags |= IEEE80211_TX_STAT_AMPDU_NO_BACK; } if (rate_flags & IEEE80211_TX_RC_USE_RTS_CTS) {