[05/15] rtw88: pci: release tx skbs DMAed when stop

Message ID	1568617425-28062-6-git-send-email-yhchuang@realtek.com (mailing list archive)
State	Accepted
Commit	0e41edcdfe86435fef709b7de8397e8a5a0e1b2f
Delegated to:	Kalle Valo
Headers
Return-Path: <SRS0=mNgj=XL=vger.kernel.org=linux-wireless-owner@kernel.org>
Authenticated-By:
X-SpamFilter-By: BOX Solutions SpamTrap 5.62 with qID x8G73rS3029972, This message is accepted by code: ctloc85258
From: <yhchuang@realtek.com>
To: <kvalo@codeaurora.org>
CC: <linux-wireless@vger.kernel.org>, <briannorris@chromium.org>
Subject: [PATCH 05/15] rtw88: pci: release tx skbs DMAed when stop
Date: Mon, 16 Sep 2019 15:03:35 +0800
Message-ID: <1568617425-28062-6-git-send-email-yhchuang@realtek.com>
In-Reply-To: <1568617425-28062-1-git-send-email-yhchuang@realtek.com>
References: <1568617425-28062-1-git-send-email-yhchuang@realtek.com>
MIME-Version: 1.0
Content-Type: text/plain
Sender: linux-wireless-owner@vger.kernel.org
Precedence: bulk
Series	rtw88: Add support for deep PS mode
[00/15] rtw88: Add support for deep PS mode
[01/15] rtw88: remove redundant flag check helper function
[02/15] rtw88: configure firmware after HCI started
[03/15] rtw88: pci: reset H2C queue indexes in a single write
[04/15] rtw88: pci: extract skbs free routine for trx rings
[05/15] rtw88: pci: release tx skbs DMAed when stop
[06/15] rtw88: not to enter or leave PS under IRQ
[07/15] rtw88: not to control LPS by each vif
[08/15] rtw88: remove unused lps state check helper
[09/15] rtw88: LPS enter/leave should be protected by lock
[10/15] rtw88: leave PS state for dynamic mechanism
[11/15] rtw88: add deep power save support
[12/15] rtw88: not to enter LPS by coex strategy
[13/15] rtw88: select deep PS mode when module is inserted
[14/15] rtw88: add deep PS PG mode for 8822c
[15/15] rtw88: remove misleading module parameter rtw_fw_support_lps

Tony Chuang Sept. 16, 2019, 7:03 a.m. UTC

From: Yan-Hsuan Chuang <yhchuang@realtek.com>

Interrupts are disabled when PCI is stopped, which means the skbs
queued on each TX ring will no longer be released by the DMA
interrupt. To avoid leaving those skbs in the skb queue until the
PCI device is removed, the driver needs to release them itself.

Signed-off-by: Yan-Hsuan Chuang <yhchuang@realtek.com>
---
 drivers/net/wireless/realtek/rtw88/pci.c | 12 ++++++++++++
 1 file changed, 12 insertions(+)
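
The diff itself is not reproduced in this archive text. As a rough illustration of the idea in the commit message, here is a minimal userspace sketch, not the actual rtw88 change. Every name in it (tx_buf, tx_ring, ring_queue, ring_complete_irq, ring_stop, ring_live_mappings) is hypothetical, and malloc/free stand in for skb allocation and DMA map/unmap. It shows one release helper shared by the completion path and the stop path, so that buffers still queued when interrupts are turned off are drained explicitly.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

#define RING_LEN 8

/* Hypothetical stand-in for a TX buffer (an skb plus its DMA mapping). */
struct tx_buf {
    void *data;     /* payload, stands in for the skb data */
    bool mapped;    /* true while the simulated DMA mapping is live */
};

/* Hypothetical TX ring: a FIFO of buffers handed to the hardware. */
struct tx_ring {
    struct tx_buf *queue[RING_LEN];
    size_t head;    /* oldest buffer not yet released */
    size_t count;   /* buffers still queued */
};

static size_t live_mappings;    /* maps minus unmaps, for leak checking */

/* Queue one buffer for transmission; returns false if full or out of memory. */
bool ring_queue(struct tx_ring *ring, size_t len)
{
    struct tx_buf *buf;

    if (ring->count == RING_LEN)
        return false;
    buf = malloc(sizeof(*buf));
    if (!buf)
        return false;
    buf->data = malloc(len);
    if (!buf->data) {
        free(buf);
        return false;
    }
    buf->mapped = true;
    live_mappings++;
    ring->queue[(ring->head + ring->count) % RING_LEN] = buf;
    ring->count++;
    return true;
}

/* Unmap and free the oldest buffer; shared by completion and stop. */
static void ring_release_one(struct tx_ring *ring)
{
    struct tx_buf *buf = ring->queue[ring->head];

    assert(buf && buf->mapped);
    buf->mapped = false;
    live_mappings--;
    free(buf->data);
    free(buf);
    ring->queue[ring->head] = NULL;
    ring->head = (ring->head + 1) % RING_LEN;
    ring->count--;
}

/* Normal path: the completion interrupt releases finished buffers. */
void ring_complete_irq(struct tx_ring *ring, size_t done)
{
    while (done-- && ring->count)
        ring_release_one(ring);
}

/* Stop path: interrupts are off, so drain whatever is still queued. */
void ring_stop(struct tx_ring *ring)
{
    while (ring->count)
        ring_release_one(ring);
}

size_t ring_live_mappings(void)
{
    return live_mappings;
}
```

In this sketch, a caller that queues five buffers, completes two through ring_complete_irq() and then calls ring_stop() ends with ring_live_mappings() back at zero. Without the ring_stop() drain, the remaining three would stay allocated.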

Brian Norris Sept. 18, 2019, 12:41 a.m. UTC | #1

Maybe a dumb question, but:

On Mon, Sep 16, 2019 at 12:03 AM <yhchuang@realtek.com> wrote:
>
> From: Yan-Hsuan Chuang <yhchuang@realtek.com>
>
> Interrupt is disabled to stop PCI, which means the skbs
> queued for each TX ring will not be released via DMA
> interrupt.

In what cases do you hit this? I think you do this when entering PS
mode, no? But then, see below.

> To avoid those skbs remained being left in
> the skb queue until PCI has been removed, driver needs
> to release skbs by itself.

Doesn't that also mean you're dropping these packets? Shouldn't you be
delaying PS transitions until you've finished TX'ing?

Brian

Tony Chuang Sept. 18, 2019, 2:10 a.m. UTC | #2

> Maybe a dumb question, but:
> 
> On Mon, Sep 16, 2019 at 12:03 AM <yhchuang@realtek.com> wrote:
> >
> > From: Yan-Hsuan Chuang <yhchuang@realtek.com>
> >
> > Interrupt is disabled to stop PCI, which means the skbs
> > queued for each TX ring will not be released via DMA
> > interrupt.
> 
> In what cases do you hit this? I think you do this when entering PS
> mode, no? But then, see below.

I hit this in ieee80211_ops::stop and in rtw_power_off.
Both turn off the device, so there is no more DMA activity.
If we don't release the skbs that the DMA interrupt has not yet
released when powering off, they can be leaked.

> 
> > To avoid those skbs remained being left in
> > the skb queue until PCI has been removed, driver needs
> > to release skbs by itself.
> 
> Doesn't that also mean you're dropping these packets? Shouldn't you be
> delaying PS transitions until you've finished TX'ing?
> 
> Brian
> 

Yan-Hsuan

Brian Norris Sept. 20, 2019, 12:35 a.m. UTC | #3

On Tue, Sep 17, 2019 at 7:10 PM Tony Chuang <yhchuang@realtek.com> wrote:
> > On Mon, Sep 16, 2019 at 12:03 AM <yhchuang@realtek.com> wrote:
> > >
> > > From: Yan-Hsuan Chuang <yhchuang@realtek.com>
> > >
> > > Interrupt is disabled to stop PCI, which means the skbs
> > > queued for each TX ring will not be released via DMA
> > > interrupt.
> >
> > In what cases do you hit this? I think you do this when entering PS
> > mode, no? But then, see below.
>
> I'll hit this when ieee80211_ops::stop, or rtw_power_off.
> Both are to turn off the device, so there's no more DMA activities.
> If we don't release the SKBs that are not released by DMA interrupt
> when powering off, these could be leaked.

Ah, I was a bit confused. So it does get called from "PS" routines:
rtw_enter_ips() -> rtw_core_stop()
but that "IPS" mode means "Inactive" Power Save, and it's only used
when transitioning into idle states (IEEE80211_CONF_IDLE).

Incidentally, I think this also may explain many of the leaks I've
been seeing elsewhere, when I leave a device sitting and scanning for
a very long time -- each scan attempt is making a single transition
out-and-back to IPS mode, which meant it may be leaking any
outstanding TX DMA. And testing confirms this: if I just bring up the
interface, run a scan, then bring it down, I see many fewer unmaps
than maps. Doing this enough times, I run out of contiguous DMA memory
and the device stops working. This fixes that problem for me. So:

Reviewed-by: Brian Norris <briannorris@chromium.org>
Tested-by: Brian Norris <briannorris@chromium.org>

I wonder if, given the problems I've seen (the driver can become
totally inoperable), this patch and the previous patch (its only
real dependency) should be fast-tracked to the current release.

Brian
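
To make the leak arithmetic described above concrete, here is a small userspace model, not driver code. The pool size (POOL_SLOTS), the number of frames still in flight at each stop (TX_PER_CYCLE), and all function names are assumptions chosen for illustration. It counts maps and unmaps across repeated up/scan/down cycles, with and without releasing outstanding buffers at stop.

```c
#include <assert.h>
#include <stdbool.h>

#define POOL_SLOTS   32   /* hypothetical size of the DMA-able pool */
#define TX_PER_CYCLE 3    /* hypothetical frames still in flight at stop */

struct dma_stats {
    unsigned maps;
    unsigned unmaps;
};

/* Try to map one buffer; fails once the simulated pool is exhausted. */
static bool dma_map(struct dma_stats *s)
{
    if (s->maps - s->unmaps >= POOL_SLOTS)
        return false;
    s->maps++;
    return true;
}

static void dma_unmap(struct dma_stats *s)
{
    assert(s->unmaps < s->maps);
    s->unmaps++;
}

/*
 * Run up/scan/down cycles until a mapping fails or max_cycles is reached.
 * Returns the number of cycles that completed.
 */
unsigned run_cycles(struct dma_stats *s, unsigned max_cycles,
                    bool release_on_stop)
{
    unsigned cycle;

    for (cycle = 0; cycle < max_cycles; cycle++) {
        unsigned queued = 0;

        for (unsigned i = 0; i < TX_PER_CYCLE; i++) {
            if (!dma_map(s))
                return cycle;   /* pool exhausted: device stops working */
            queued++;
        }
        /* Stop: interrupts are off, nothing completes by itself. */
        if (release_on_stop) {
            while (queued--)
                dma_unmap(s);
        }
    }
    return cycle;
}
```

With these assumed numbers, the leaking variant completes only 10 cycles before the 32-slot pool is full, while the variant that releases at stop keeps maps equal to unmaps no matter how many cycles run.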

Kalle Valo Sept. 20, 2019, 7:26 a.m. UTC | #4

Brian Norris <briannorris@chromium.org> writes:

> On Tue, Sep 17, 2019 at 7:10 PM Tony Chuang <yhchuang@realtek.com> wrote:
>> > On Mon, Sep 16, 2019 at 12:03 AM <yhchuang@realtek.com> wrote:
>> > >
>> > > From: Yan-Hsuan Chuang <yhchuang@realtek.com>
>> > >
>> > > Interrupt is disabled to stop PCI, which means the skbs
>> > > queued for each TX ring will not be released via DMA
>> > > interrupt.
>> >
>> > In what cases do you hit this? I think you do this when entering PS
>> > mode, no? But then, see below.
>>
>> I'll hit this when ieee80211_ops::stop, or rtw_power_off.
>> Both are to turn off the device, so there's no more DMA activities.
>> If we don't release the SKBs that are not released by DMA interrupt
>> when powering off, these could be leaked.
>
> Ah, I was a bit confused. So it does get called from "PS" routines:
> rtw_enter_ips() -> rtw_core_stop()
> but that "IPS" mode means "Inactive" Power Save, and it's only used
> when transitioning into idle states (IEEE80211_CONF_IDLE).
>
> Incidentally, I think this also may explain many of the leaks I've
> been seeing elsewhere, when I leave a device sitting and scanning for
> a very long time -- each scan attempt is making a single transition
> out-and-back to IPS mode, which meant it may be leaking any
> outstanding TX DMA. And testing confirms this: if I just bring up the
> interface, run a scan, then bring it down, I see many fewer unmaps
> than maps. Doing this enough times, I run out of contiguous DMA memory
> and the device stops working. This fixes that problem for me. So:
>
> Reviewed-by: Brian Norris <briannorris@chromium.org>
> Tested-by: Brian Norris <briannorris@chromium.org>
>
> I wonder if, given the problems I've seen (the driver can become
> totally inoperable), this patch and the previous patch (its only
> real dependency) should be fast-tracked to the current release.

I agree, this sounds like a serious problem. So I'm planning to queue
patches 4 and 5 to v5.4, if it's ok for Tony.

Tony Chuang Sept. 20, 2019, 8:29 a.m. UTC | #5

> Brian Norris <briannorris@chromium.org> writes:
> 
> > On Tue, Sep 17, 2019 at 7:10 PM Tony Chuang <yhchuang@realtek.com> wrote:
> >> > On Mon, Sep 16, 2019 at 12:03 AM <yhchuang@realtek.com> wrote:
> >> > >
> >> > > From: Yan-Hsuan Chuang <yhchuang@realtek.com>
> >> > >
> >> > > Interrupt is disabled to stop PCI, which means the skbs
> >> > > queued for each TX ring will not be released via DMA
> >> > > interrupt.
> >> >
> >> > In what cases do you hit this? I think you do this when entering PS
> >> > mode, no? But then, see below.
> >>
> >> I'll hit this when ieee80211_ops::stop, or rtw_power_off.
> >> Both are to turn off the device, so there's no more DMA activities.
> >> If we don't release the SKBs that are not released by DMA interrupt
> >> when powering off, these could be leaked.
> >
> > Ah, I was a bit confused. So it does get called from "PS" routines:

I thought you were talking about IEEE80211_CONF_PS instead of
IEEE80211_CONF_IDLE.

> > rtw_enter_ips() -> rtw_core_stop()
> > but that "IPS" mode means "Inactive" Power Save, and it's only used
> > when transitioning into idle states (IEEE80211_CONF_IDLE).
> >
> > Incidentally, I think this also may explain many of the leaks I've
> > been seeing elsewhere, when I leave a device sitting and scanning for
> > a very long time -- each scan attempt is making a single transition
> > out-and-back to IPS mode, which meant it may be leaking any
> > outstanding TX DMA. And testing confirms this: if I just bring up the
> > interface, run a scan, then bring it down, I see many fewer unmaps
> > than maps. Doing this enough times, I run out of contiguous DMA memory
> > and the device stops working. This fixes that problem for me. So:
> >
> > Reviewed-by: Brian Norris <briannorris@chromium.org>
> > Tested-by: Brian Norris <briannorris@chromium.org>
> >
> > I wonder if, given the problems I've seen (the driver can become
> > totally inoperable), this patch and the previous patch (its only
> > real dependency) should be fast-tracked to the current release.
> 
> I agree, this sounds like a serious problem. So I'm planning to queue
> patches 4 and 5 to v5.4, if it's ok for Tony.

It's OK with me. I didn't realize that this is a serious problem, so I missed it.
Also, if possible, you should queue patch 2 as well. Without that reordering,
two H2C skbs are not released, because HCI hasn't started, every time the
device enters/leaves the IDLE state (rtw_power_[on|off]).

Should I resend and add a v5.4 prefix or something?

Yan-Hsuan

Kalle Valo Sept. 20, 2019, 8:35 a.m. UTC | #6

Tony Chuang <yhchuang@realtek.com> writes:

>> > I wonder if, given the problems I've seen (the driver can become
>> > totally inoperable), this patch and the previous patch (its only
>> > real dependency) should be fast-tracked to the current release.
>> 
>> I agree, this sounds like a serious problem. So I'm planning to queue
>> patches 4 and 5 to v5.4, if it's ok for Tony.
>
> It's OK for me, didn't realize that this is a serious problem, so I missed it.
> Also if possible you should queue patch 2, that reordering will cause
> two H2C skbs not be released because HCI hasn't started, every time
> enter/leave IDLE state (rtw_power_[on|off]).
>
> Should I resend and add a v5.4 prefix or something?

No need to resend, I'll try to apply patches 2, 4 and 5 as is and will
let you know if there are any problems.

Brian Norris Sept. 20, 2019, 10:33 p.m. UTC | #7

On Fri, Sep 20, 2019 at 1:29 AM Tony Chuang <yhchuang@realtek.com> wrote:
> > Brian Norris <briannorris@chromium.org> writes:
> > > Ah, I was a bit confused. So it does get called from "PS" routines:
>
> I thought you were talking about IEEE80211_CONF_PS instead of
> IEEE80211_CONF_IDLE.

Like I said, I was confused :)

On first glance, I just saw the codepath showing up in ps.c, but then
I noticed it's only for IDLE, not PS.

> Also if possible you should queue patch 2, that reordering will cause
> two H2C skbs not be released because HCI hasn't started, every time
> enter/leave IDLE state (rtw_power_[on|off]).

That patch also looks good to me, FWIW.

Side note: it's a little bit strange that your driver can silently
misbehave so badly, just by TX'ing in the wrong state. Would this be a
good case to add some WARN_ON() or WARN_ON_ONCE() (e.g., in functions
like rtw_fw_send_h2c_packet()), to check for the appropriate "started"
state?

Brian
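
The WARN_ON_ONCE() suggestion above could look roughly like the following. This is a userspace sketch only: warn_on_once() is a stand-in written here for the kernel macro, and fake_dev, fake_send_h2c() and fake_warning_count() are hypothetical names, not the rtw88 structures or rtw_fw_send_h2c_packet() itself. The point is simply that a send attempted before the device is "started" is rejected and reported once, instead of silently queuing a buffer that will never be released.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

static unsigned warnings_printed;

/*
 * Userspace stand-in for the kernel's WARN_ON_ONCE(): reports a true
 * condition only the first time, and always returns the condition.
 */
static bool warn_on_once(bool cond, bool *warned, const char *what)
{
    if (cond && !*warned) {
        *warned = true;
        warnings_printed++;
        fprintf(stderr, "WARNING: %s\n", what);
    }
    return cond;
}

/* Hypothetical device state; "started" mirrors the HCI-started idea. */
struct fake_dev {
    bool started;
    unsigned sent;
};

/* Returns 0 on success, -1 if called before the device was started. */
int fake_send_h2c(struct fake_dev *dev)
{
    static bool warned;

    assert(dev != NULL);
    if (warn_on_once(!dev->started, &warned, "H2C sent before HCI start"))
        return -1;
    dev->sent++;
    return 0;
}

unsigned fake_warning_count(void)
{
    return warnings_printed;
}
```

Calling fake_send_h2c() twice on a device that has not been started fails both times but prints a single warning. Once started is set, the call succeeds and increments sent.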
