diff mbox

bridge: orphan frags on local receive

Message ID 530B4534.3000106@huawei.com (mailing list archive)
State New, archived
Headers show

Commit Message

Qinchuanyu Feb. 24, 2014, 1:12 p.m. UTC
with vhost tx zero_copy, guest nic might get hang when host reserving
skb in socket queue delivered by guest, the case has been solved in
tun, it also been needed by bridge. This could easily happened when a
LAST_ACK state tcp occuring between guest and host.

Signed-off-by: Chuanyu Qin <qinchuanyu@huawei.com>
---
  net/bridge/br_input.c |    3 +++
  1 files changed, 3 insertions(+), 0 deletions(-)

Comments

Michael S. Tsirkin Feb. 24, 2014, 1:29 p.m. UTC | #1
On Mon, Feb 24, 2014 at 09:12:20PM +0800, Qin Chuanyu wrote:
> with vhost tx zero_copy, guest nic might get hang when host reserving
> skb in socket queue delivered by guest, the case has been solved in
> tun, it also been needed by bridge. This could easily happened when a
> LAST_ACK state tcp occuring between guest and host.
> 
> Signed-off-by: Chuanyu Qin <qinchuanyu@huawei.com>

Do you actually observe guest hang?

I would expect orphan frags in
__netif_receive_skb_core to be enough.


> ---
>  net/bridge/br_input.c |    3 +++
>  1 files changed, 3 insertions(+), 0 deletions(-)
> 
> diff --git a/net/bridge/br_input.c b/net/bridge/br_input.c
> index 28d5446..744e27a 100644
> --- a/net/bridge/br_input.c
> +++ b/net/bridge/br_input.c
> @@ -117,6 +117,8 @@ int br_handle_frame_finish(struct sk_buff *skb)
>  		br->dev->stats.multicast++;
>  	} else if ((dst = __br_fdb_get(br, dest, vid)) &&
>  			dst->is_local) {
> +		if (unlikely(skb_orphan_frags(skb, GFP_ATOMIC)))
> +			goto drop;
>  		skb2 = skb;
>  		/* Do not forward the packet since it's local. */
>  		skb = NULL;
> @@ -136,6 +138,7 @@ int br_handle_frame_finish(struct sk_buff *skb)
>  out:
>  	return 0;
>  drop:
> +	skb_tx_error(skb);
>  	kfree_skb(skb);
>  	goto out;
>  }
> -- 
> 1.7.3.1.msysgit.0
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Qinchuanyu Feb. 24, 2014, 1:52 p.m. UTC | #2
On 2014/2/24 21:29, Michael S. Tsirkin wrote:
> On Mon, Feb 24, 2014 at 09:12:20PM +0800, Qin Chuanyu wrote:
>> with vhost tx zero_copy, guest nic might get hang when host reserving
>> skb in socket queue delivered by guest, the case has been solved in
>> tun, it also been needed by bridge. This could easily happened when a
>> LAST_ACK state tcp occuring between guest and host.
>>
>> Signed-off-by: Chuanyu Qin <qinchuanyu@huawei.com>
>
> Do you actually observe guest hang?
>
yes, guest nic could not xmit any more until the skb holded by host has
been freed, the mainly reason is that though virtio-net could use vring
desc out of order, but ubufs is been used by vhost in order. so only
one skb could cause guest nic hang.
> I would expect orphan frags in
> __netif_receive_skb_core to be enough.
>
yes, it would be better. I would deliver another patch soon.
>
>> ---
>>   net/bridge/br_input.c |    3 +++
>>   1 files changed, 3 insertions(+), 0 deletions(-)
>>
>> diff --git a/net/bridge/br_input.c b/net/bridge/br_input.c
>> index 28d5446..744e27a 100644
>> --- a/net/bridge/br_input.c
>> +++ b/net/bridge/br_input.c
>> @@ -117,6 +117,8 @@ int br_handle_frame_finish(struct sk_buff *skb)
>>   		br->dev->stats.multicast++;
>>   	} else if ((dst = __br_fdb_get(br, dest, vid)) &&
>>   			dst->is_local) {
>> +		if (unlikely(skb_orphan_frags(skb, GFP_ATOMIC)))
>> +			goto drop;
>>   		skb2 = skb;
>>   		/* Do not forward the packet since it's local. */
>>   		skb = NULL;
>> @@ -136,6 +138,7 @@ int br_handle_frame_finish(struct sk_buff *skb)
>>   out:
>>   	return 0;
>>   drop:
>> +	skb_tx_error(skb);
>>   	kfree_skb(skb);
>>   	goto out;
>>   }
>> --
>> 1.7.3.1.msysgit.0
>
>


--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Qinchuanyu Feb. 24, 2014, 2:31 p.m. UTC | #3
On 2014/2/24 21:52, Qin Chuanyu wrote:
> On 2014/2/24 21:29, Michael S. Tsirkin wrote:
>> On Mon, Feb 24, 2014 at 09:12:20PM +0800, Qin Chuanyu wrote:
>>> with vhost tx zero_copy, guest nic might get hang when host reserving
>>> skb in socket queue delivered by guest, the case has been solved in
>>> tun, it also been needed by bridge. This could easily happened when a
>>> LAST_ACK state tcp occuring between guest and host.
>>>
>>> Signed-off-by: Chuanyu Qin <qinchuanyu@huawei.com>
>>
>> Do you actually observe guest hang?
>>
> yes, guest nic could not xmit any more until the skb holded by host has
> been freed, the mainly reason is that though virtio-net could use vring
> desc out of order, but ubufs is been used by vhost in order. so only
> one skb could cause guest nic hang.
>> I would expect orphan frags in
>> __netif_receive_skb_core to be enough.
>>
> yes, it would be better. I would deliver another patch soon.
>>
After thinking for a while, I think that orphan frags could only been
involved in virtual nic receive path, because after
__netif_receive_skb_core, skb would been transmitted by physic nic, in
this way orphan frags is unnecessary.
>>> ---
>>>   net/bridge/br_input.c |    3 +++
>>>   1 files changed, 3 insertions(+), 0 deletions(-)
>>>
>>> diff --git a/net/bridge/br_input.c b/net/bridge/br_input.c
>>> index 28d5446..744e27a 100644
>>> --- a/net/bridge/br_input.c
>>> +++ b/net/bridge/br_input.c
>>> @@ -117,6 +117,8 @@ int br_handle_frame_finish(struct sk_buff *skb)
>>>           br->dev->stats.multicast++;
>>>       } else if ((dst = __br_fdb_get(br, dest, vid)) &&
>>>               dst->is_local) {
>>> +        if (unlikely(skb_orphan_frags(skb, GFP_ATOMIC)))
>>> +            goto drop;
>>>           skb2 = skb;
>>>           /* Do not forward the packet since it's local. */
>>>           skb = NULL;
>>> @@ -136,6 +138,7 @@ int br_handle_frame_finish(struct sk_buff *skb)
>>>   out:
>>>       return 0;
>>>   drop:
>>> +    skb_tx_error(skb);
>>>       kfree_skb(skb);
>>>       goto out;
>>>   }
>>> --
>>> 1.7.3.1.msysgit.0
>>
>>
>
>
> --
> To unsubscribe from this list: send the line "unsubscribe kvm" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>
>


--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Michael S. Tsirkin Feb. 24, 2014, 3:49 p.m. UTC | #4
On Mon, Feb 24, 2014 at 09:52:11PM +0800, Qin Chuanyu wrote:
> On 2014/2/24 21:29, Michael S. Tsirkin wrote:
> >On Mon, Feb 24, 2014 at 09:12:20PM +0800, Qin Chuanyu wrote:
> >>with vhost tx zero_copy, guest nic might get hang when host reserving
> >>skb in socket queue delivered by guest, the case has been solved in
> >>tun, it also been needed by bridge. This could easily happened when a
> >>LAST_ACK state tcp occuring between guest and host.
> >>
> >>Signed-off-by: Chuanyu Qin <qinchuanyu@huawei.com>
> >
> >Do you actually observe guest hang?
> >
> yes, guest nic could not xmit any more until the skb holded by host has
> been freed, the mainly reason is that though virtio-net could use vring
> desc out of order, but ubufs is been used by vhost in order. so only
> one skb could cause guest nic hang.
> >I would expect orphan frags in
> >__netif_receive_skb_core to be enough.
> >
> yes, it would be better. I would deliver another patch soon.

well __netif_receive_skb_core already calls orphan frags already.
What's left to do?

> >
> >>---
> >>  net/bridge/br_input.c |    3 +++
> >>  1 files changed, 3 insertions(+), 0 deletions(-)
> >>
> >>diff --git a/net/bridge/br_input.c b/net/bridge/br_input.c
> >>index 28d5446..744e27a 100644
> >>--- a/net/bridge/br_input.c
> >>+++ b/net/bridge/br_input.c
> >>@@ -117,6 +117,8 @@ int br_handle_frame_finish(struct sk_buff *skb)
> >>  		br->dev->stats.multicast++;
> >>  	} else if ((dst = __br_fdb_get(br, dest, vid)) &&
> >>  			dst->is_local) {
> >>+		if (unlikely(skb_orphan_frags(skb, GFP_ATOMIC)))
> >>+			goto drop;
> >>  		skb2 = skb;
> >>  		/* Do not forward the packet since it's local. */
> >>  		skb = NULL;
> >>@@ -136,6 +138,7 @@ int br_handle_frame_finish(struct sk_buff *skb)
> >>  out:
> >>  	return 0;
> >>  drop:
> >>+	skb_tx_error(skb);
> >>  	kfree_skb(skb);
> >>  	goto out;
> >>  }
> >>--
> >>1.7.3.1.msysgit.0
> >
> >
> 
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Qinchuanyu Feb. 25, 2014, 2:02 a.m. UTC | #5
On 2014/2/24 23:49, Michael S. Tsirkin wrote:
> On Mon, Feb 24, 2014 at 09:52:11PM +0800, Qin Chuanyu wrote:
>> On 2014/2/24 21:29, Michael S. Tsirkin wrote:
>>> On Mon, Feb 24, 2014 at 09:12:20PM +0800, Qin Chuanyu wrote:
>>>> with vhost tx zero_copy, guest nic might get hang when host reserving
>>>> skb in socket queue delivered by guest, the case has been solved in
>>>> tun, it also been needed by bridge. This could easily happened when a
>>>> LAST_ACK state tcp occuring between guest and host.
>>>>
>>>> Signed-off-by: Chuanyu Qin <qinchuanyu@huawei.com>
>>>
>>> Do you actually observe guest hang?
>>>
>> yes, guest nic could not xmit any more until the skb holded by host has
>> been freed, the mainly reason is that though virtio-net could use vring
>> desc out of order, but ubufs is been used by vhost in order. so only
>> one skb could cause guest nic hang.
>>> I would expect orphan frags in
>>> __netif_receive_skb_core to be enough.
>>>
>> yes, it would be better. I would deliver another patch soon.
>
> well __netif_receive_skb_core already calls orphan frags already.
> What's left to do?
>
None :) , I will use with this change.
>>>
>>>> ---
>>>>   net/bridge/br_input.c |    3 +++
>>>>   1 files changed, 3 insertions(+), 0 deletions(-)
>>>>
>>>> diff --git a/net/bridge/br_input.c b/net/bridge/br_input.c
>>>> index 28d5446..744e27a 100644
>>>> --- a/net/bridge/br_input.c
>>>> +++ b/net/bridge/br_input.c
>>>> @@ -117,6 +117,8 @@ int br_handle_frame_finish(struct sk_buff *skb)
>>>>   		br->dev->stats.multicast++;
>>>>   	} else if ((dst = __br_fdb_get(br, dest, vid)) &&
>>>>   			dst->is_local) {
>>>> +		if (unlikely(skb_orphan_frags(skb, GFP_ATOMIC)))
>>>> +			goto drop;
>>>>   		skb2 = skb;
>>>>   		/* Do not forward the packet since it's local. */
>>>>   		skb = NULL;
>>>> @@ -136,6 +138,7 @@ int br_handle_frame_finish(struct sk_buff *skb)
>>>>   out:
>>>>   	return 0;
>>>>   drop:
>>>> +	skb_tx_error(skb);
>>>>   	kfree_skb(skb);
>>>>   	goto out;
>>>>   }
>>>> --
>>>> 1.7.3.1.msysgit.0
>>>
>>>
>>
>
> .
>


--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Jason Wang Feb. 25, 2014, 3:53 a.m. UTC | #6
On 02/24/2014 09:12 PM, Qin Chuanyu wrote:
> with vhost tx zero_copy, guest nic might get hang when host reserving
> skb in socket queue delivered by guest, the case has been solved in
> tun, it also been needed by bridge. This could easily happened when a
> LAST_ACK state tcp occuring between guest and host.
>
> Signed-off-by: Chuanyu Qin <qinchuanyu@huawei.com>
> ---
>  net/bridge/br_input.c |    3 +++
>  1 files changed, 3 insertions(+), 0 deletions(-)
>
> diff --git a/net/bridge/br_input.c b/net/bridge/br_input.c
> index 28d5446..744e27a 100644
> --- a/net/bridge/br_input.c
> +++ b/net/bridge/br_input.c
> @@ -117,6 +117,8 @@ int br_handle_frame_finish(struct sk_buff *skb)
>          br->dev->stats.multicast++;
>      } else if ((dst = __br_fdb_get(br, dest, vid)) &&
>              dst->is_local) {
> +        if (unlikely(skb_orphan_frags(skb, GFP_ATOMIC)))
> +            goto drop;
>          skb2 = skb;
>          /* Do not forward the packet since it's local. */
>          skb = NULL;
> @@ -136,6 +138,7 @@ int br_handle_frame_finish(struct sk_buff *skb)
>  out:
>      return 0;
>  drop:
> +    skb_tx_error(skb);
>      kfree_skb(skb);
>      goto out;
>  }

Pretty similar to the issue we found for non-work conserving qdiscs. The
questions is whether or not should we audit all such cases ( I believe
we could find even more ) and does the skb_orphan_frags(), or doing
something like switch to use data copy in this case

I will post a patch that switch to use data copy when the number of
pending DMAs exceed a limit. Looks like it can help here.

Another question is whether or nor do we need a skb_orphan() here now
(at least for packets from tun) ?
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/net/bridge/br_input.c b/net/bridge/br_input.c
index 28d5446..744e27a 100644
--- a/net/bridge/br_input.c
+++ b/net/bridge/br_input.c
@@ -117,6 +117,8 @@  int br_handle_frame_finish(struct sk_buff *skb)
  		br->dev->stats.multicast++;
  	} else if ((dst = __br_fdb_get(br, dest, vid)) &&
  			dst->is_local) {
+		if (unlikely(skb_orphan_frags(skb, GFP_ATOMIC)))
+			goto drop;
  		skb2 = skb;
  		/* Do not forward the packet since it's local. */
  		skb = NULL;
@@ -136,6 +138,7 @@  int br_handle_frame_finish(struct sk_buff *skb)
  out:
  	return 0;
  drop:
+	skb_tx_error(skb);
  	kfree_skb(skb);
  	goto out;
  }