Regression associated with commit c8628155ece3 - "tcp: reduce out_of_order memory use"

Message ID: 1345825432.19483.20.camel@edumazet-glaptop (mailing list archive)
State: Not Applicable, archived

Commit Message

Eric Dumazet Aug. 24, 2012, 4:23 p.m. UTC
On Fri, 2012-08-24 at 18:18 +0200, Eric Dumazet wrote:
> On Fri, 2012-08-24 at 10:58 -0500, Larry Finger wrote:
> > On 08/24/2012 10:19 AM, David Miller wrote:
> > >
> > > This looks like full-on data corruption to me.
> > 
> > I agree. The question is why does it happen with r8712u, and only after the 
> > commit in the subject. Drivers for other devices that I have are OK. Thus far, I 
> > have tested b43, rtl8187, ath9k_htc, and rtl8192cu. To my knowledge, there are 
> > no reports posted for this bug with any other device.
> 
> bugs can sit unnoticed, and one change somewhere can uncover them.
> 
> Really this driver must have a bug, if not half a dozen of bugs.
> 
> For example this sequence of code is a clear bug :
> 
> sub_skb = dev_alloc_skb(nSubframe_Length + 12); 
> skb_reserve(sub_skb, 12);
> 
> 
> Also the free_recv_skb_queue looks really suspect to me
> 
> What the hell is doing recv_tasklet() I really wonder.
> 
> This code, combined with the skb_clone() in recvbuf2recvframe()
> can clearly reuse an skb passed to upper stacks.
> 
> 
> queueing one skb in free_recv_skb_queue should be done
> only if no clone of this skb exist somewhere.
> 
> Please someone fix this buggy driver.
> 

Try the following patch for a start



--
To unsubscribe from this list: send the line "unsubscribe linux-wireless" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

Larry Finger Aug. 24, 2012, 4:58 p.m. UTC | #1
On 08/24/2012 11:23 AM, Eric Dumazet wrote:
> On Fri, 2012-08-24 at 18:18 +0200, Eric Dumazet wrote:
>> On Fri, 2012-08-24 at 10:58 -0500, Larry Finger wrote:
>>> On 08/24/2012 10:19 AM, David Miller wrote:
>>>>
>>>> This looks like full-on data corruption to me.
>>>
>>> I agree. The question is why does it happen with r8712u, and only after the
>>> commit in the subject. Drivers for other devices that I have are OK. Thus far, I
>>> have tested b43, rtl8187, ath9k_htc, and rtl8192cu. To my knowledge, there are
>>> no reports posted for this bug with any other device.
>>
>> bugs can sit unnoticed, and one change somewhere can uncover them.
>>
>> Really this driver must have a bug, if not half a dozen of bugs.
>>
>> For example this sequence of code is a clear bug :
>>
>> sub_skb = dev_alloc_skb(nSubframe_Length + 12);
>> skb_reserve(sub_skb, 12);
>>
>>
>> Also the free_recv_skb_queue looks really suspect to me
>>
>> What the hell is doing recv_tasklet() I really wonder.
>>
>> This code, combined with the skb_clone() in recvbuf2recvframe()
>> can clearly reuse an skb passed to upper stacks.
>>
>>
>> queueing one skb in free_recv_skb_queue should be done
>> only if no clone of this skb exist somewhere.
>>
>> Please someone fix this buggy driver.
>>
>
> Try the following patch for a start
>
> diff --git a/drivers/staging/rtl8712/rtl8712_recv.c b/drivers/staging/rtl8712/rtl8712_recv.c
> index 8e82ce2..88e3ca6 100644
> --- a/drivers/staging/rtl8712/rtl8712_recv.c
> +++ b/drivers/staging/rtl8712/rtl8712_recv.c
> @@ -1127,6 +1127,9 @@ static void recv_tasklet(void *priv)
>   		recvbuf2recvframe(padapter, pskb);
>   		skb_reset_tail_pointer(pskb);
>   		pskb->len = 0;
> -		skb_queue_tail(&precvpriv->free_recv_skb_queue, pskb);
> +		if (!skb_cloned(pskb))
> +			skb_queue_tail(&precvpriv->free_recv_skb_queue, pskb);
> +		else
> +			consume_skb(pskb);
>   	}
>   }

This one did not help. There is no doubt it is needed for the case where memory 
is tight, an allocation fails, and the driver clones the skb. In the present 
case, debug statements have shown that the skb_clone() call was not made.

In the long term, this driver will be replaced with one that uses mac80211, but 
in the short term, I am trying to fix it.

As I said earlier, my skb skills are minimal. Could you explain what is wrong 
with the following sequence?

  sub_skb = dev_alloc_skb(nSubframe_Length + 12);
  skb_reserve(sub_skb, 12);

Thanks,

Larry


Eric Dumazet Aug. 24, 2012, 5:47 p.m. UTC | #2
On Fri, 2012-08-24 at 11:58 -0500, Larry Finger wrote:
> On 08/24/2012 11:23 AM, Eric Dumazet wrote:
> > On Fri, 2012-08-24 at 18:18 +0200, Eric Dumazet wrote:
> >> On Fri, 2012-08-24 at 10:58 -0500, Larry Finger wrote:
> >>> On 08/24/2012 10:19 AM, David Miller wrote:
> >>>>
> >>>> This looks like full-on data corruption to me.
> >>>
> >>> I agree. The question is why does it happen with r8712u, and only after the
> >>> commit in the subject. Drivers for other devices that I have are OK. Thus far, I
> >>> have tested b43, rtl8187, ath9k_htc, and rtl8192cu. To my knowledge, there are
> >>> no reports posted for this bug with any other device.
> >>
> >> bugs can sit unnoticed, and one change somewhere can uncover them.
> >>
> >> Really this driver must have a bug, if not half a dozen of bugs.
> >>
> >> For example this sequence of code is a clear bug :
> >>
> >> sub_skb = dev_alloc_skb(nSubframe_Length + 12);
> >> skb_reserve(sub_skb, 12);
> >>
> >>
> >> Also the free_recv_skb_queue looks really suspect to me
> >>
> >> What the hell is doing recv_tasklet() I really wonder.
> >>
> >> This code, combined with the skb_clone() in recvbuf2recvframe()
> >> can clearly reuse an skb passed to upper stacks.
> >>
> >>
> >> queueing one skb in free_recv_skb_queue should be done
> >> only if no clone of this skb exist somewhere.
> >>
> >> Please someone fix this buggy driver.
> >>
> >
> > Try the following patch for a start
> >
> > diff --git a/drivers/staging/rtl8712/rtl8712_recv.c b/drivers/staging/rtl8712/rtl8712_recv.c
> > index 8e82ce2..88e3ca6 100644
> > --- a/drivers/staging/rtl8712/rtl8712_recv.c
> > +++ b/drivers/staging/rtl8712/rtl8712_recv.c
> > @@ -1127,6 +1127,9 @@ static void recv_tasklet(void *priv)
> >   		recvbuf2recvframe(padapter, pskb);
> >   		skb_reset_tail_pointer(pskb);
> >   		pskb->len = 0;
> > -		skb_queue_tail(&precvpriv->free_recv_skb_queue, pskb);
> > +		if (!skb_cloned(pskb))
> > +			skb_queue_tail(&precvpriv->free_recv_skb_queue, pskb);
> > +		else
> > +			consume_skb(pskb);
> >   	}
> >   }
> 
> This one did not help. There is no doubt it is needed for the case where memory 
> is tight, an allocation fails, and the driver clones the skb. In the present 
> case, debug statements have shown that the skb_clone() call was not made.
> 
> In the long term, this driver will be replaced with one that uses mac80211, but 
> in the short term, I am trying to fix it.
> 
> As I said earlier, my skb skills are minimal. Could you explain what is wrong 
> with the following sequence?
> 
>   sub_skb = dev_alloc_skb(nSubframe_Length + 12);
>   skb_reserve(sub_skb, 12);

dev_alloc_skb() can return NULL

-> crash

skb_clone() can also return NULL 

-> crash
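
A minimal userspace sketch of the missing check, not code from the driver. The mock_* names and alloc_subframe() are hypothetical stand-ins so the pattern can be compiled outside the kernel; in the driver the same test would sit between dev_alloc_skb() and skb_reserve(), and the caller would presumably drop the frame on failure.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct sk_buff; only what the sketch needs. */
struct mock_skb {
	unsigned char *head;	/* start of the allocated buffer */
	unsigned char *data;	/* start of the payload area */
	size_t size;
};

/* Test knob: set to 1 to simulate memory pressure. */
static int mock_alloc_should_fail;

/* Like dev_alloc_skb(), this returns NULL when allocation fails. */
static struct mock_skb *mock_dev_alloc_skb(size_t len)
{
	struct mock_skb *skb;

	if (mock_alloc_should_fail)
		return NULL;
	skb = malloc(sizeof(*skb));
	if (!skb)
		return NULL;
	skb->head = malloc(len);
	if (!skb->head) {
		free(skb);
		return NULL;
	}
	skb->data = skb->head;
	skb->size = len;
	return skb;
}

/*
 * The kernel's skb_reserve() does not check its argument, so a NULL skb
 * is dereferenced there. The assert makes that precondition explicit.
 */
static void mock_skb_reserve(struct mock_skb *skb, size_t len)
{
	assert(skb != NULL);
	skb->data += len;
}

static void mock_kfree_skb(struct mock_skb *skb)
{
	if (!skb)
		return;
	free(skb->head);
	free(skb);
}

/* Allocate a subframe buffer with 12 bytes of headroom, or return NULL. */
static struct mock_skb *alloc_subframe(size_t subframe_len)
{
	struct mock_skb *sub_skb = mock_dev_alloc_skb(subframe_len + 12);

	if (!sub_skb)
		return NULL;	/* check before touching the skb */
	mock_skb_reserve(sub_skb, 12);
	return sub_skb;
}
```

The same shape applies to skb_clone(): test the returned pointer before using it.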



Larry Finger Aug. 27, 2012, 5:55 p.m. UTC | #3
On 08/24/2012 12:47 PM, Eric Dumazet wrote:
>
> dev_alloc_skb() can return NULL
>
> -> crash
>
> skb_clone() can also return NULL
>
> -> crash

I have prepared a patch to fix all the unchecked allocations.

Over the weekend I made some progress. To test the latest vendor driver, I 
installed a 32-bit system. Their driver is not compatible with a 64-bit system. 
I found that not only did the vendor driver work with secure sites, but so did 
the in-kernel version. I now have tcpdump output for the 32-bit case that works, 
and the 64-bit case that fails. It seems likely that I missed some 32/64 bit 
incompatibility when I did the conversion.

Thanks for all your help in trying to resolve this issue.

Larry


Eric Dumazet Aug. 27, 2012, 6:21 p.m. UTC | #4
On Mon, 2012-08-27 at 12:55 -0500, Larry Finger wrote:

> I have prepared a patch to fix all the unchecked allocations.
> 
> Over the weekend I made some progress. To test the latest vendor driver, I 
> installed a 32-bit system. Their driver is not compatible with a 64-bit system. 
> I found that not only did the vendor driver work with secure sites, but so did 
> the in-kernel version. I now have tcpdump output for the 32-bit case that works, 
> and the 64-bit case that fails. It seems likely that I missed some 32/64 bit 
> incompatibility when I did the conversion.
> 
> Thanks for all your help in trying to resolve this issue.

Interesting, so these 32/64 bits problems did not happen in prior
kernels ? (< 3.4 )


Larry Finger Aug. 27, 2012, 8:39 p.m. UTC | #5
On 08/27/2012 01:21 PM, Eric Dumazet wrote:
> On Mon, 2012-08-27 at 12:55 -0500, Larry Finger wrote:
>
>> I have prepared a patch to fix all the unchecked allocations.
>>
>> Over the weekend I made some progress. To test the latest vendor driver, I
>> installed a 32-bit system. Their driver is not compatible with a 64-bit system.
>> I found that not only did the vendor driver work with secure sites, but so did
>> the in-kernel version. I now have tcpdump output for the 32-bit case that works,
>> and the 64-bit case that fails. It seems likely that I missed some 32/64 bit
>> incompatibility when I did the conversion.
>>
>> Thanks for all your help in trying to resolve this issue.
>
> Interesting, so these 32/64 bits problems did not happen in prior
> kernels ? (< 3.4 )

Prior kernels are OK.

Larry

Eric Dumazet Sept. 10, 2012, 8:39 a.m. UTC | #6
On Mon, 2012-08-27 at 12:55 -0500, Larry Finger wrote:

> I have prepared a patch to fix all the unchecked allocations.
> 
> Over the weekend I made some progress. To test the latest vendor driver, I 
> installed a 32-bit system. Their driver is not compatible with a 64-bit system. 
> I found that not only did the vendor driver work with secure sites, but so did 
> the in-kernel version. I now have tcpdump output for the 32-bit case that works, 
> and the 64-bit case that fails. It seems likely that I missed some 32/64 bit 
> incompatibility when I did the conversion.
> 
> Thanks for all your help in trying to resolve this issue.
> 
> Larry
> 
> 

Hi Larry

It appears I have a D-Link N300 (DWA-131) nano USB adapter, using
staging/rtl8712 driver.

I tried many kernel versions (including 3.3) and none seems to work
reliably.

Sometime, I have some traffic but only for about 50 frames...
It might be because my access point is a netgear wndr3800, because I
have following warning a bit before the freezes :

r8712u: [r8712_got_addbareq_event_callback] mac = 20:4e:7f:5a:cd:30, sea = 80, tid = 0

Thanks


Larry Finger Sept. 10, 2012, 2:53 p.m. UTC | #7
On 09/10/2012 03:39 AM, Eric Dumazet wrote:
> On Mon, 2012-08-27 at 12:55 -0500, Larry Finger wrote:
>
>> I have prepared a patch to fix all the unchecked allocations.
>>
>> Over the weekend I made some progress. To test the latest vendor driver, I
>> installed a 32-bit system. Their driver is not compatible with a 64-bit system.
>> I found that not only did the vendor driver work with secure sites, but so did
>> the in-kernel version. I now have tcpdump output for the 32-bit case that works,
>> and the 64-bit case that fails. It seems likely that I missed some 32/64 bit
>> incompatibility when I did the conversion.
>>
>> Thanks for all your help in trying to resolve this issue.
>>
>> Larry
>>
>>
>
> Hi Larry
>
> It appears I have a D-Link N300 (DWA-131) nano USB adapter, using
> staging/rtl8712 driver.
>
> I tried many kernel versions (including 3.3) and none seems to work
> reliably.
>
> Sometime, I have some traffic but only for about 50 frames...
> It might be because my access point is a netgear wndr3800, because I
> have following warning a bit before the freezes :
>
> r8712u: [r8712_got_addbareq_event_callback] mac = 20:4e:7f:5a:cd:30, sea = 80, tid = 0

Eric,

What is the md5sum for the firmware file /lib/firmware/rtlwifi/rtl8712u.bin? 
Over the weekend, there was a report of another device that had problems with 
firmware that was added to the linux-firmware repo in July with md5sum of 
c6f3b7b880aefb7b3f249428d659bdbb. An older version with md5sum of 
200fd952db3cc9259b1fd05e3e51966f works in that case. I'll send you this one 
privately.

I have a Netgear WNDR3300 and also get the addbareq events, but I do not get 
freezes. I'm not sure the message is correlated.

Larry


Eric Dumazet Sept. 10, 2012, 3:04 p.m. UTC | #8
On Mon, 2012-09-10 at 09:53 -0500, Larry Finger wrote:

> Eric,
> 
> What is the md5sum for the firmware file /lib/firmware/rtlwifi/rtl8712u.bin? 
> Over the weekend, there was a report of another device that had problems with 
> firmware that was added to the linux-firmware repo in July with md5sum of 
> c6f3b7b880aefb7b3f249428d659bdbb. An older version with md5sum of 
> 200fd952db3cc9259b1fd05e3e51966f works in that case. I'll send you this one 
> privately.
> 
> I have a Netgear WNDR3300 and also get the addbareq events, but I do not get 
> freezes. I'm not sure the message is correlated.
> 

It seems I have the c6f3b7b880aefb7b3f249428d659bdbb one
# md5sum /data/src/linux-firmware/rtlwifi/rtl8712u.bin /lib/firmware/rtlwifi/rtl8712u.bin
c6f3b7b880aefb7b3f249428d659bdbb  /data/src/linux-firmware/rtlwifi/rtl8712u.bin
c6f3b7b880aefb7b3f249428d659bdbb  /lib/firmware/rtlwifi/rtl8712u.bin

Since I have linux-firmware tree, should I go back to initial commit 
commit 8f919160792e4702c6b7a67a243cea4f757407e4
Author: Larry Finger <Larry.Finger@lwfinger.net>
Date:   Mon Nov 1 23:56:52 2010 -0500

    linux-firmware: Add firmware files for Realtek RTL8712U and RTL8192CE

Thanks !




Patch

diff --git a/drivers/staging/rtl8712/rtl8712_recv.c b/drivers/staging/rtl8712/rtl8712_recv.c
index 8e82ce2..88e3ca6 100644
--- a/drivers/staging/rtl8712/rtl8712_recv.c
+++ b/drivers/staging/rtl8712/rtl8712_recv.c
@@ -1127,6 +1127,9 @@  static void recv_tasklet(void *priv)
 		recvbuf2recvframe(padapter, pskb);
 		skb_reset_tail_pointer(pskb);
 		pskb->len = 0;
-		skb_queue_tail(&precvpriv->free_recv_skb_queue, pskb);
+		if (!skb_cloned(pskb))
+			skb_queue_tail(&precvpriv->free_recv_skb_queue, pskb);
+		else
+			consume_skb(pskb);
 	}
 }
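
The reasoning behind the skb_cloned() test can be illustrated with a simplified userspace model. This is a sketch under assumptions, not kernel code: the mock_* types and recycle_or_release() are hypothetical, and the real skb_cloned() also consults the skb's cloned flag and masks the data reference count. The point it shows is that a clone shares the packet data with the original, so putting the original back on a free list lets the driver overwrite data the upper stack may still be reading.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Shared packet data with a reference count (stands in for dataref). */
struct mock_buf {
	int dataref;
	unsigned char bytes[64];
};

/* Hypothetical stand-in for struct sk_buff: a header over shared data. */
struct mock_skb {
	struct mock_buf *buf;
};

static struct mock_skb *mock_alloc_skb(void)
{
	struct mock_skb *skb = malloc(sizeof(*skb));

	if (!skb)
		return NULL;
	skb->buf = calloc(1, sizeof(*skb->buf));
	if (!skb->buf) {
		free(skb);
		return NULL;
	}
	skb->buf->dataref = 1;
	return skb;
}

/* New header, same data: both skbs now point at one buffer. */
static struct mock_skb *mock_skb_clone(struct mock_skb *skb)
{
	struct mock_skb *clone = malloc(sizeof(*clone));

	if (!clone)
		return NULL;	/* cloning can fail too */
	clone->buf = skb->buf;
	clone->buf->dataref++;
	return clone;
}

/* True when some other skb still references the same data. */
static bool mock_skb_cloned(const struct mock_skb *skb)
{
	return skb->buf->dataref != 1;
}

/* Drop one reference; free the data only when the last holder lets go. */
static void mock_consume_skb(struct mock_skb *skb)
{
	assert(skb->buf->dataref > 0);
	if (--skb->buf->dataref == 0)
		free(skb->buf);
	free(skb);
}

/*
 * Mirror of the patched branch: recycle the skb only if nobody else
 * shares its data, otherwise release our reference.
 * Returns true if the skb was stored in *free_slot for reuse.
 */
static bool recycle_or_release(struct mock_skb *skb, struct mock_skb **free_slot)
{
	if (!mock_skb_cloned(skb)) {
		*free_slot = skb;
		return true;
	}
	mock_consume_skb(skb);
	return false;
}
```

In this model an unshared skb lands in the free slot, while a cloned one is released and its clone keeps the data alive.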