diff mbox series

[05/15] rtw88: pci: release tx skbs DMAed when stop

Message ID 1568617425-28062-6-git-send-email-yhchuang@realtek.com (mailing list archive)
State Accepted
Commit 0e41edcdfe86435fef709b7de8397e8a5a0e1b2f
Delegated to: Kalle Valo
Headers show
Series rtw88: Add support for deep PS mode | expand

Commit Message

Tony Chuang Sept. 16, 2019, 7:03 a.m. UTC
From: Yan-Hsuan Chuang <yhchuang@realtek.com>

Interrupt is disabled to stop PCI, which means the skbs
queued for each TX ring will not be released via DMA
interrupt. To avoid those skbs remained being left in
the skb queue until PCI has been removed, driver needs
to release skbs by itself.

Signed-off-by: Yan-Hsuan Chuang <yhchuang@realtek.com>
---
 drivers/net/wireless/realtek/rtw88/pci.c | 12 ++++++++++++
 1 file changed, 12 insertions(+)

Comments

Brian Norris Sept. 18, 2019, 12:41 a.m. UTC | #1
May be a dumb question but:

On Mon, Sep 16, 2019 at 12:03 AM <yhchuang@realtek.com> wrote:
>
> From: Yan-Hsuan Chuang <yhchuang@realtek.com>
>
> Interrupt is disabled to stop PCI, which means the skbs
> queued for each TX ring will not be released via DMA
> interrupt.

In what cases do you hit this? I think you do this when entering PS
mode, no? But then, see below.

> To avoid those skbs remained being left in
> the skb queue until PCI has been removed, driver needs
> to release skbs by itself.

Doesn't that also mean your dropping these packets? Shouldn't you be
delaying PS transitions until you've finished TX'ing?

Brian
Tony Chuang Sept. 18, 2019, 2:10 a.m. UTC | #2
> May be a dumb question but:
> 
> On Mon, Sep 16, 2019 at 12:03 AM <yhchuang@realtek.com> wrote:
> >
> > From: Yan-Hsuan Chuang <yhchuang@realtek.com>
> >
> > Interrupt is disabled to stop PCI, which means the skbs
> > queued for each TX ring will not be released via DMA
> > interrupt.
> 
> In what cases do you hit this? I think you do this when entering PS
> mode, no? But then, see below.

I'll hit this when ieee80211_ops::stop, or rtw_power_off.
Both are to turn off the device, so there's no more DMA activities.
If we don't release the SKBs that are not released by DMA interrupt
when powering off, these could be leaked.

> 
> > To avoid those skbs remained being left in
> > the skb queue until PCI has been removed, driver needs
> > to release skbs by itself.
> 
> Doesn't that also mean your dropping these packets? Shouldn't you be
> delaying PS transitions until you've finished TX'ing?
> 
> Brian
> 

Yan-Hsuan
Brian Norris Sept. 20, 2019, 12:35 a.m. UTC | #3
On Tue, Sep 17, 2019 at 7:10 PM Tony Chuang <yhchuang@realtek.com> wrote:
> > On Mon, Sep 16, 2019 at 12:03 AM <yhchuang@realtek.com> wrote:
> > >
> > > From: Yan-Hsuan Chuang <yhchuang@realtek.com>
> > >
> > > Interrupt is disabled to stop PCI, which means the skbs
> > > queued for each TX ring will not be released via DMA
> > > interrupt.
> >
> > In what cases do you hit this? I think you do this when entering PS
> > mode, no? But then, see below.
>
> I'll hit this when ieee80211_ops::stop, or rtw_power_off.
> Both are to turn off the device, so there's no more DMA activities.
> If we don't release the SKBs that are not released by DMA interrupt
> when powering off, these could be leaked.

Ah, I was a bit confused. So it does get called from "PS" routines:
rtw_enter_ips() -> rtw_core_stop()
but that "IPS" mode means "Inactive" Power Save, and it's only used
when transitioning into idle states (IEEE80211_CONF_IDLE).

Incidentally, I think this also may explain many of the leaks I've
been seeing elsewhere, when I leave a device sitting and scanning for
a very long time -- each scan attempt is making a single transition
out-and-back to IPS mode, which meant it may be leaking any
outstanding TX DMA. And testing confirms this: if I just bring up the
interface, run a scan, then bring it down, I see many fewer unmaps
than maps. Doing this enough times, I run out of contiguous DMA memory
and the device stops working. This fixes that problem for me. So:

Reviewed-by: Brian Norris <briannorris@chromium.org>
Tested-by: Brian Norris <briannorris@chromium.org>

I wonder if, given the problems I've seen (the driver can become
totally ineroperable), this patch and the previous patch (its only
real dependency) should be fast-tracked to the current release.

Brian
Kalle Valo Sept. 20, 2019, 7:26 a.m. UTC | #4
Brian Norris <briannorris@chromium.org> writes:

> On Tue, Sep 17, 2019 at 7:10 PM Tony Chuang <yhchuang@realtek.com> wrote:
>> > On Mon, Sep 16, 2019 at 12:03 AM <yhchuang@realtek.com> wrote:
>> > >
>> > > From: Yan-Hsuan Chuang <yhchuang@realtek.com>
>> > >
>> > > Interrupt is disabled to stop PCI, which means the skbs
>> > > queued for each TX ring will not be released via DMA
>> > > interrupt.
>> >
>> > In what cases do you hit this? I think you do this when entering PS
>> > mode, no? But then, see below.
>>
>> I'll hit this when ieee80211_ops::stop, or rtw_power_off.
>> Both are to turn off the device, so there's no more DMA activities.
>> If we don't release the SKBs that are not released by DMA interrupt
>> when powering off, these could be leaked.
>
> Ah, I was a bit confused. So it does get called from "PS" routines:
> rtw_enter_ips() -> rtw_core_stop()
> but that "IPS" mode means "Inactive" Power Save, and it's only used
> when transitioning into idle states (IEEE80211_CONF_IDLE).
>
> Incidentally, I think this also may explain many of the leaks I've
> been seeing elsewhere, when I leave a device sitting and scanning for
> a very long time -- each scan attempt is making a single transition
> out-and-back to IPS mode, which meant it may be leaking any
> outstanding TX DMA. And testing confirms this: if I just bring up the
> interface, run a scan, then bring it down, I see many fewer unmaps
> than maps. Doing this enough times, I run out of contiguous DMA memory
> and the device stops working. This fixes that problem for me. So:
>
> Reviewed-by: Brian Norris <briannorris@chromium.org>
> Tested-by: Brian Norris <briannorris@chromium.org>
>
> I wonder if, given the problems I've seen (the driver can become
> totally ineroperable), this patch and the previous patch (its only
> real dependency) should be fast-tracked to the current release.

I agree, this sounds like a serious problem. So I'm planning to queue
patches 4 and 5 to v5.4, if it's ok for Tony.
Tony Chuang Sept. 20, 2019, 8:29 a.m. UTC | #5
> Brian Norris <briannorris@chromium.org> writes:
> 
> > On Tue, Sep 17, 2019 at 7:10 PM Tony Chuang <yhchuang@realtek.com>
> wrote:
> >> > On Mon, Sep 16, 2019 at 12:03 AM <yhchuang@realtek.com> wrote:
> >> > >
> >> > > From: Yan-Hsuan Chuang <yhchuang@realtek.com>
> >> > >
> >> > > Interrupt is disabled to stop PCI, which means the skbs
> >> > > queued for each TX ring will not be released via DMA
> >> > > interrupt.
> >> >
> >> > In what cases do you hit this? I think you do this when entering PS
> >> > mode, no? But then, see below.
> >>
> >> I'll hit this when ieee80211_ops::stop, or rtw_power_off.
> >> Both are to turn off the device, so there's no more DMA activities.
> >> If we don't release the SKBs that are not released by DMA interrupt
> >> when powering off, these could be leaked.
> >
> > Ah, I was a bit confused. So it does get called from "PS" routines:

I thought you're talking about IEEE80211_CONF_PS instead of
IEEE80211_CONF_IDLE.

> > rtw_enter_ips() -> rtw_core_stop()
> > but that "IPS" mode means "Inactive" Power Save, and it's only used
> > when transitioning into idle states (IEEE80211_CONF_IDLE).
> >
> > Incidentally, I think this also may explain many of the leaks I've
> > been seeing elsewhere, when I leave a device sitting and scanning for
> > a very long time -- each scan attempt is making a single transition
> > out-and-back to IPS mode, which meant it may be leaking any
> > outstanding TX DMA. And testing confirms this: if I just bring up the
> > interface, run a scan, then bring it down, I see many fewer unmaps
> > than maps. Doing this enough times, I run out of contiguous DMA memory
> > and the device stops working. This fixes that problem for me. So:
> >
> > Reviewed-by: Brian Norris <briannorris@chromium.org>
> > Tested-by: Brian Norris <briannorris@chromium.org>
> >
> > I wonder if, given the problems I've seen (the driver can become
> > totally ineroperable), this patch and the previous patch (its only
> > real dependency) should be fast-tracked to the current release.
> 
> I agree, this sounds like a serious problem. So I'm planning to queue
> patches 4 and 5 to v5.4, if it's ok for Tony.

It's OK for me, didn't realize that this is a serious problem, so I missed it.
Also if possible you should queue patch 2, that reordering will cause
two H2C skbs not be released because HCI hasn't started, everytime
enter/leave IDLE state (rtw_power_[on|off]).

Should I resend and add a v5.4 prefix or something?

Yan-Hsuan
Kalle Valo Sept. 20, 2019, 8:35 a.m. UTC | #6
Tony Chuang <yhchuang@realtek.com> writes:

>> > I wonder if, given the problems I've seen (the driver can become
>> > totally ineroperable), this patch and the previous patch (its only
>> > real dependency) should be fast-tracked to the current release.
>> 
>> I agree, this sounds like a serious problem. So I'm planning to queue
>> patches 4 and 5 to v5.4, if it's ok for Tony.
>
> It's OK for me, didn't realize that this is a serious problem, so I missed it.
> Also if possible you should queue patch 2, that reordering will cause
> two H2C skbs not be released because HCI hasn't started, everytime
> enter/leave IDLE state (rtw_power_[on|off]).
>
> Should I resend and add a v5.4 prefix or something?

No need to resend, I'll try to apply patches 2, 4 and 5 as is and will
let you know if there are any problems.
Brian Norris Sept. 20, 2019, 10:33 p.m. UTC | #7
On Fri, Sep 20, 2019 at 1:29 AM Tony Chuang <yhchuang@realtek.com> wrote:
> > Brian Norris <briannorris@chromium.org> writes:
> > > Ah, I was a bit confused. So it does get called from "PS" routines:
>
> I thought you're talking about IEEE80211_CONF_PS instead of
> IEEE80211_CONF_IDLE.

Like I said, I was confused :)

On first glance, I just saw the codepath showing up in ps.c, but then
I noticed it's only for IDLE, not PS.

> Also if possible you should queue patch 2, that reordering will cause
> two H2C skbs not be released because HCI hasn't started, everytime
> enter/leave IDLE state (rtw_power_[on|off]).

That patch also looks good to me, FWIW.

Side note: it's a little bit strange that your driver can silently
misbehave so badly, just by TX'ing in the wrong state. Would this be a
good case to add some WARN_ON() or WARN_ON_ONCE() (e.g., in functions
like rtw_fw_send_h2c_packet()), to check for the appropriate "started"
state?

Brian
diff mbox series

Patch

diff --git a/drivers/net/wireless/realtek/rtw88/pci.c b/drivers/net/wireless/realtek/rtw88/pci.c
index 3238161..509743c 100644
--- a/drivers/net/wireless/realtek/rtw88/pci.c
+++ b/drivers/net/wireless/realtek/rtw88/pci.c
@@ -500,6 +500,17 @@  static void rtw_pci_dma_reset(struct rtw_dev *rtwdev, struct rtw_pci *rtwpci)
 	rtwpci->rx_tag = 0;
 }
 
+static void rtw_pci_dma_release(struct rtw_dev *rtwdev, struct rtw_pci *rtwpci)
+{
+	struct rtw_pci_tx_ring *tx_ring;
+	u8 queue;
+
+	for (queue = 0; queue < RTK_MAX_TX_QUEUE_NUM; queue++) {
+		tx_ring = &rtwpci->tx_rings[queue];
+		rtw_pci_free_tx_ring_skbs(rtwdev, tx_ring);
+	}
+}
+
 static int rtw_pci_start(struct rtw_dev *rtwdev)
 {
 	struct rtw_pci *rtwpci = (struct rtw_pci *)rtwdev->priv;
@@ -521,6 +532,7 @@  static void rtw_pci_stop(struct rtw_dev *rtwdev)
 
 	spin_lock_irqsave(&rtwpci->irq_lock, flags);
 	rtw_pci_disable_interrupt(rtwdev, rtwpci);
+	rtw_pci_dma_release(rtwdev, rtwpci);
 	spin_unlock_irqrestore(&rtwpci->irq_lock, flags);
 }