
tun: use netif_receive_skb instead of netif_rx_ni

Message ID 52FA32C5.9040601@huawei.com (mailing list archive)
State New, archived

Commit Message

Qinchuanyu Feb. 11, 2014, 2:25 p.m. UTC
We can xmit directly instead of going through the softirq path, which improves
both throughput and latency.
Test model: VM-Host-Host, transmit only, with the vhost thread and the NIC
interrupt bound to cpu1. netperf is used for the throughput test and qperf for
the latency test.
Host OS: suse11sp3, Guest OS: suse11sp3

latency result (us):
packet_len(B)  64  256  512  1460
old(UDP)       44   47   48    66
new(UDP)       38   41   42    66

old(TCP)       52   55   70   117
new(TCP)       45   48   61   114

throughput result (Gbit/s):
packet_len(B)   64   512   1024   1460
old(UDP)      0.42  2.02   3.75   4.68
new(UDP)      0.45  2.14   3.77   5.06

For TCP, due to the latency, the client could not build packets big enough
to benefit from the NIC's TSO, so the result shows that it sends more packets
per second but gets lower throughput.

Eric mentioned that this would have a problem with the cgroup classifier, but
a patch for that had already been sent by Herbert Xu.
patch_id f845172531fb7410c7fb7780b1a6e51ee6df7d52

Signed-off-by: Chuanyu Qin <qinchuanyu@huawei.com>
---
  drivers/net/tun.c |    4 +++-
  1 files changed, 3 insertions(+), 1 deletions(-)


Comments

Eric Dumazet Feb. 11, 2014, 3:24 p.m. UTC | #1
On Tue, 2014-02-11 at 22:25 +0800, Qin Chuanyu wrote:
> we could xmit directly instead of going through softirq to gain 
> throughput and lantency improved.
> test model: VM-Host-Host just do transmit. with vhost thread and nic
> interrupt bind cpu1. netperf do throuhput test and qperf do lantency test.
> Host OS: suse11sp3, Guest OS: suse11sp3
> 
> latency result(us):
> packet_len 64 256 512 1460
> old(UDP)   44  47  48   66
> new(UDP)   38  41  42   66
> 
> old(TCP)   52  55  70  117
> new(TCP)   45  48  61  114
> 
> throughput result(Gbit/s):
> packet_len   64   512   1024   1460
> old(UDP)   0.42  2.02   3.75   4.68
> new(UDP)   0.45  2.14   3.77   5.06
> 
> TCP due to the latency, client couldn't send packet big enough
> to get benefit from TSO of nic, so the result show it will send
> more packet per sencond but get lower throughput.
> 
> Eric mentioned that it would has problem with cgroup, but the patch
> had been sent by Herbert Xu.
> patch_id f845172531fb7410c7fb7780b1a6e51ee6df7d52
> 
> Signed-off-by: Chuanyu Qin <qinchuanyu@huawei.com>
> ---
>   drivers/net/tun.c |    4 +++-
>   1 files changed, 3 insertions(+), 1 deletions(-)
> 
> diff --git a/drivers/net/tun.c b/drivers/net/tun.c
> index 44c4db8..90b4e58 100644
> --- a/drivers/net/tun.c
> +++ b/drivers/net/tun.c
> @@ -1184,7 +1184,9 @@ static ssize_t tun_get_user(struct tun_struct 
> *tun, struct tun_file *tfile,
>   	skb_probe_transport_header(skb, 0);
> 
>   	rxhash = skb_get_hash(skb);
> -	netif_rx_ni(skb);
> +	rcu_read_lock_bh();
> +	netif_receive_skb(skb);
> +	rcu_read_unlock_bh();
> 
>   	tun->dev->stats.rx_packets++;
>   	tun->dev->stats.rx_bytes += len;

I already said this patch is not good :

rcu_read_lock_bh() makes no sense here.

What is really needed is local_bh_disable();

Herbert patch ( http://patchwork.ozlabs.org/patch/52963/ ) had a much
cleaner form.

Just use it, CC him, credit him, please ?
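
For reference, a minimal sketch of the local_bh_disable() variant Eric is
pointing to (this follows the shape of Herbert's earlier patch rather than the
exact hunk behind the ozlabs link; the surrounding tun_get_user() lines are
taken from the diff above):

	rxhash = skb_get_hash(skb);

	/* Deliver the skb synchronously in process context; bottom halves
	 * must stay disabled across netif_receive_skb() on this path. */
	local_bh_disable();
	netif_receive_skb(skb);
	local_bh_enable();

	tun->dev->stats.rx_packets++;
	tun->dev->stats.rx_bytes += len;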



Michael S. Tsirkin Feb. 11, 2014, 3:47 p.m. UTC | #2
On Tue, Feb 11, 2014 at 07:24:04AM -0800, Eric Dumazet wrote:
> On Tue, 2014-02-11 at 22:25 +0800, Qin Chuanyu wrote:
> > we could xmit directly instead of going through softirq to gain 
> > throughput and lantency improved.
> > test model: VM-Host-Host just do transmit. with vhost thread and nic
> > interrupt bind cpu1. netperf do throuhput test and qperf do lantency test.
> > Host OS: suse11sp3, Guest OS: suse11sp3
> > 
> > latency result(us):
> > packet_len 64 256 512 1460
> > old(UDP)   44  47  48   66
> > new(UDP)   38  41  42   66
> > 
> > old(TCP)   52  55  70  117
> > new(TCP)   45  48  61  114
> > 
> > throughput result(Gbit/s):
> > packet_len   64   512   1024   1460
> > old(UDP)   0.42  2.02   3.75   4.68
> > new(UDP)   0.45  2.14   3.77   5.06
> > 
> > TCP due to the latency, client couldn't send packet big enough
> > to get benefit from TSO of nic, so the result show it will send
> > more packet per sencond but get lower throughput.
> > 
> > Eric mentioned that it would has problem with cgroup, but the patch
> > had been sent by Herbert Xu.
> > patch_id f845172531fb7410c7fb7780b1a6e51ee6df7d52
> > 
> > Signed-off-by: Chuanyu Qin <qinchuanyu@huawei.com>
> > ---
> >   drivers/net/tun.c |    4 +++-
> >   1 files changed, 3 insertions(+), 1 deletions(-)
> > 
> > diff --git a/drivers/net/tun.c b/drivers/net/tun.c
> > index 44c4db8..90b4e58 100644
> > --- a/drivers/net/tun.c
> > +++ b/drivers/net/tun.c
> > @@ -1184,7 +1184,9 @@ static ssize_t tun_get_user(struct tun_struct 
> > *tun, struct tun_file *tfile,
> >   	skb_probe_transport_header(skb, 0);
> > 
> >   	rxhash = skb_get_hash(skb);
> > -	netif_rx_ni(skb);
> > +	rcu_read_lock_bh();
> > +	netif_receive_skb(skb);
> > +	rcu_read_unlock_bh();
> > 
> >   	tun->dev->stats.rx_packets++;
> >   	tun->dev->stats.rx_bytes += len;
> 
> I already said this patch is not good :
> 
> rcu_read_lock_bh() makes no sense here.
> 
> What is really needed is local_bh_disable();
> 
> Herbert patch ( http://patchwork.ozlabs.org/patch/52963/ ) had a much
> cleaner form.
> 
> Just use it, CC him, credit him, please ?
> 

But not before making sure (testing) that patch does not break the
cgroups classifier please.
Qinchuanyu Feb. 12, 2014, 1:50 a.m. UTC | #3
On 2014/2/11 23:24, Eric Dumazet wrote:
>> diff --git a/drivers/net/tun.c b/drivers/net/tun.c
>> index 44c4db8..90b4e58 100644
>> --- a/drivers/net/tun.c
>> +++ b/drivers/net/tun.c
>> @@ -1184,7 +1184,9 @@ static ssize_t tun_get_user(struct tun_struct
>> *tun, struct tun_file *tfile,
>>    	skb_probe_transport_header(skb, 0);
>>
>>    	rxhash = skb_get_hash(skb);
>> -	netif_rx_ni(skb);
>> +	rcu_read_lock_bh();
>> +	netif_receive_skb(skb);
>> +	rcu_read_unlock_bh();
>>
>>    	tun->dev->stats.rx_packets++;
>>    	tun->dev->stats.rx_bytes += len;
>
> I already said this patch is not good :
>
> rcu_read_lock_bh() makes no sense here.
>
> What is really needed is local_bh_disable();
>
> Herbert patch ( http://patchwork.ozlabs.org/patch/52963/ ) had a much
> cleaner form.
>
> Just use it, CC him, credit him, please ?
>
To: Herbert Xu
I saw that you had already delivered a patch to resolve the cgroup problem.
patch num is f845172531fb7410c7fb7780b1a6e51ee6df7d52

So would you deliver your patch for tun again? I have tested it and found
that it works well.

To: Eric
I agree with your point that rcu_read_lock_bh() makes no sense here, but I
couldn't understand why dev_queue_xmit() uses it.
What is the difference between the vhost thread using tap to xmit an skb and
another thread using dev_queue_xmit() to xmit an skb?
I would really appreciate it if you could explain this to me.
Thanks.
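
As background for that question, here is a rough, heavily elided sketch of the
shape of dev_queue_xmit() in kernels of this era (approximate, not the literal
net/core/dev.c source): the whole transmit path runs under rcu_read_lock_bh(),
which both disables bottom halves (the qdisc and txq locking assumes BHs are
off) and provides the RCU read-side protection for the device and qdisc
structures it dereferences.

	int dev_queue_xmit(struct sk_buff *skb)
	{
		/* RCU read-side lock + BH disable for the whole tx path. */
		rcu_read_lock_bh();

		/* ... select the tx queue, enqueue to the qdisc or hand the
		 * skb straight to the driver ... (elided) */

		rcu_read_unlock_bh();
		return NET_XMIT_SUCCESS;	/* placeholder; the real function
						 * returns the actual tx status */
	}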



Jason Wang Feb. 12, 2014, 5:28 a.m. UTC | #4
On 02/11/2014 10:25 PM, Qin Chuanyu wrote:
> we could xmit directly instead of going through softirq to gain
> throughput and lantency improved.
> test model: VM-Host-Host just do transmit. with vhost thread and nic
> interrupt bind cpu1. netperf do throuhput test and qperf do lantency
> test.
> Host OS: suse11sp3, Guest OS: suse11sp3
>
> latency result(us):
> packet_len 64 256 512 1460
> old(UDP)   44  47  48   66
> new(UDP)   38  41  42   66
>
> old(TCP)   52  55  70  117
> new(TCP)   45  48  61  114
>
> throughput result(Gbit/s):
> packet_len   64   512   1024   1460
> old(UDP)   0.42  2.02   3.75   4.68
> new(UDP)   0.45  2.14   3.77   5.06
>
> TCP due to the latency, client couldn't send packet big enough
> to get benefit from TSO of nic, so the result show it will send
> more packet per sencond but get lower throughput.
>
> Eric mentioned that it would has problem with cgroup, but the patch
> had been sent by Herbert Xu.
> patch_id f845172531fb7410c7fb7780b1a6e51ee6df7d52
>
> Signed-off-by: Chuanyu Qin <qinchuanyu@huawei.com>
> ---

A question: without NAPI weight, could this starve other net devices?
>  drivers/net/tun.c |    4 +++-
>  1 files changed, 3 insertions(+), 1 deletions(-)
>
> diff --git a/drivers/net/tun.c b/drivers/net/tun.c
> index 44c4db8..90b4e58 100644
> --- a/drivers/net/tun.c
> +++ b/drivers/net/tun.c
> @@ -1184,7 +1184,9 @@ static ssize_t tun_get_user(struct tun_struct
> *tun, struct tun_file *tfile,
>      skb_probe_transport_header(skb, 0);
>
>      rxhash = skb_get_hash(skb);
> -    netif_rx_ni(skb);
> +    rcu_read_lock_bh();
> +    netif_receive_skb(skb);
> +    rcu_read_unlock_bh();
>
>      tun->dev->stats.rx_packets++;
>      tun->dev->stats.rx_bytes += len;

Eric Dumazet Feb. 12, 2014, 5:47 a.m. UTC | #5
On Wed, 2014-02-12 at 13:28 +0800, Jason Wang wrote:

> A question: without NAPI weight, could this starve other net devices?

Not really, as net devices are serviced by softirq handler.


Jason Wang Feb. 12, 2014, 5:50 a.m. UTC | #6
On 02/12/2014 01:47 PM, Eric Dumazet wrote:
> On Wed, 2014-02-12 at 13:28 +0800, Jason Wang wrote:
>
>> A question: without NAPI weight, could this starve other net devices?
> Not really, as net devices are serviced by softirq handler.
>
>

Yes, then the issue is that tun could be starved by other net devices.
Eric Dumazet Feb. 12, 2014, 6:26 a.m. UTC | #7
On Wed, 2014-02-12 at 13:50 +0800, Jason Wang wrote:
> On 02/12/2014 01:47 PM, Eric Dumazet wrote:
> > On Wed, 2014-02-12 at 13:28 +0800, Jason Wang wrote:
> >
> >> A question: without NAPI weight, could this starve other net devices?
> > Not really, as net devices are serviced by softirq handler.
> >
> >
> 
> Yes, then the issue is tun could be starved by other net devices.

How does this patch change anything about this 'problem'?

netif_rx_ni() can only be called if your process is not preempted by
other high prio tasks/softirqs.

If this process is scheduled on a cpu, then disabling bh to process
_one_ packet won't fundamentally change the dynamics of the system.
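
For reference, netif_rx_ni() in kernels of this period is roughly the
following (approximate sketch from net/core/dev.c, tracing and details
elided): it enqueues the skb the same way netif_rx() does and then runs any
pending softirq itself, so the caller already ends up doing the softirq work
in its own context.

	int netif_rx_ni(struct sk_buff *skb)
	{
		int err;

		preempt_disable();
		err = netif_rx(skb);		/* queue to the per-cpu backlog */
		if (local_softirq_pending())
			do_softirq();		/* process NET_RX_SOFTIRQ right here */
		preempt_enable();

		return err;
	}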



Qinchuanyu Feb. 12, 2014, 6:46 a.m. UTC | #8
On 2014/2/12 13:28, Jason Wang wrote:

> A question: without NAPI weight, could this starve other net devices?
The tap xmits the skb in thread context; the poll function of the physical
NIC driver can still be called in softirq context, unchanged.

I have tested it by binding the vhost thread and the physical NIC interrupt
to the same vcpu, using netperf to xmit UDP; the test model is VM1-Host1-Host2.

If only VM1 xmits skbs, top shows:
Cpu1 :0.0%us, 95.0%sy, 0.0%ni, 0.0%id,  0.0%wa, 0.0%hi, 5.0%si,  0.0%st

Then, with Host2 also xmitting skbs to VM1, top shows:
Cpu1 :0.0%us, 41.0%sy, 0.0%ni, 0.0%id,  0.0%wa, 0.0%hi, 59.0%si, 0.0%st

So I think there is no problem with this change.

>>   drivers/net/tun.c |    4 +++-
>>   1 files changed, 3 insertions(+), 1 deletions(-)
>>
>> diff --git a/drivers/net/tun.c b/drivers/net/tun.c
>> index 44c4db8..90b4e58 100644
>> --- a/drivers/net/tun.c
>> +++ b/drivers/net/tun.c
>> @@ -1184,7 +1184,9 @@ static ssize_t tun_get_user(struct tun_struct
>> *tun, struct tun_file *tfile,
>>       skb_probe_transport_header(skb, 0);
>>
>>       rxhash = skb_get_hash(skb);
>> -    netif_rx_ni(skb);
>> +    rcu_read_lock_bh();
>> +    netif_receive_skb(skb);
>> +    rcu_read_unlock_bh();
>>
>>       tun->dev->stats.rx_packets++;
>>       tun->dev->stats.rx_bytes += len;
>
>
> .
>


Jason Wang Feb. 12, 2014, 7:38 a.m. UTC | #9
On 02/12/2014 02:26 PM, Eric Dumazet wrote:
> On Wed, 2014-02-12 at 13:50 +0800, Jason Wang wrote:
>> On 02/12/2014 01:47 PM, Eric Dumazet wrote:
>>> On Wed, 2014-02-12 at 13:28 +0800, Jason Wang wrote:
>>>
>>>> A question: without NAPI weight, could this starve other net devices?
>>> Not really, as net devices are serviced by softirq handler.
>>>
>>>
>> Yes, then the issue is tun could be starved by other net devices.
> How this patch changes anything to this 'problem' ?
>
> netif_rx_ni() can only be called if your process is not preempted by
> other high prio tasks/softirqs.
>
> If this process is scheduled on a cpu, then disabling bh to process
> _one_ packet wont fundamentally change dynamic of the system.
>

After looking at the code for a while, I agree it won't be a great change.

Thanks
Jason Wang Feb. 12, 2014, 7:40 a.m. UTC | #10
On 02/12/2014 02:46 PM, Qin Chuanyu wrote:
> On 2014/2/12 13:28, Jason Wang wrote:
>
>> A question: without NAPI weight, could this starve other net devices?
> tap xmit skb use thread context?the poll func of physical nic driver
> could be called in softirq context without change.
>
> I had test it by binding vhost thread and physic nic interrupt on the
> same vcpu, use netperf xmit udp, test model is VM1-Host1-Host2.
>
> if only VM1 xmit skb, the top show as below :
> Cpu1 :0.0%us, 95.0%sy, 0.0%ni, 0.0%id,  0.0%wa, 0.0%hi, 5.0%si,  0.0%st
>
> then use host2 xmit skb to VM1, the top show as below :
> Cpu1 :0.0%us, 41.0%sy, 0.0%ni, 0.0%id,  0.0%wa, 0.0%hi, 59.0%si, 0.0%st
>
> so I think there is no problem with this change.

Yes, I realized it was OK after Eric's comment.

Thanks
Herbert Xu Feb. 16, 2014, 1:03 p.m. UTC | #11
On Wed, Feb 12, 2014 at 09:50:03AM +0800, Qin Chuanyu wrote:
> On 2014/2/11 23:24, Eric Dumazet wrote:
> >>diff --git a/drivers/net/tun.c b/drivers/net/tun.c
> >>index 44c4db8..90b4e58 100644
> >>--- a/drivers/net/tun.c
> >>+++ b/drivers/net/tun.c
> >>@@ -1184,7 +1184,9 @@ static ssize_t tun_get_user(struct tun_struct
> >>*tun, struct tun_file *tfile,
> >>   	skb_probe_transport_header(skb, 0);
> >>
> >>   	rxhash = skb_get_hash(skb);
> >>-	netif_rx_ni(skb);
> >>+	rcu_read_lock_bh();
> >>+	netif_receive_skb(skb);
> >>+	rcu_read_unlock_bh();
> >>
> >>   	tun->dev->stats.rx_packets++;
> >>   	tun->dev->stats.rx_bytes += len;
> >
> >I already said this patch is not good :
> >
> >rcu_read_lock_bh() makes no sense here.
> >
> >What is really needed is local_bh_disable();
> >
> >Herbert patch ( http://patchwork.ozlabs.org/patch/52963/ ) had a much
> >cleaner form.
> >
> >Just use it, CC him, credit him, please ?
> >
> To: Herbert Xu
> I saw that you had delivered a patch to resolve the problem of cgroup.
> patch num is f845172531fb7410c7fb7780b1a6e51ee6df7d52
> 
> so would you deliver your patch for tun again? I had test it and
> found it work well.

Feel free to resubmit my patch either as is or with your
modifications as I no longer need it for my original purposes.

Thanks,

Patch

diff --git a/drivers/net/tun.c b/drivers/net/tun.c
index 44c4db8..90b4e58 100644
--- a/drivers/net/tun.c
+++ b/drivers/net/tun.c
@@ -1184,7 +1184,9 @@  static ssize_t tun_get_user(struct tun_struct 
*tun, struct tun_file *tfile,
  	skb_probe_transport_header(skb, 0);

  	rxhash = skb_get_hash(skb);
-	netif_rx_ni(skb);
+	rcu_read_lock_bh();
+	netif_receive_skb(skb);
+	rcu_read_unlock_bh();

  	tun->dev->stats.rx_packets++;