[net-next,RFC,2/2] vhost_net: basic polling support

Message ID	1445491649-62614-2-git-send-email-jasowang@redhat.com (mailing list archive)
State	New, archived
Headers	show Return-Path: <kvm-owner@kernel.org> From: Jason Wang <jasowang@redhat.com> To: mst@redhat.com, kvm@vger.kernel.org, virtualization@lists.linux-foundation.org, netdev@vger.kernel.org, linux-kernel@vger.kernel.org Cc: Jason Wang <jasowang@redhat.com> Subject: [PATCH net-next RFC 2/2] vhost_net: basic polling support Date: Thu, 22 Oct 2015 01:27:29 -0400 Message-Id: <1445491649-62614-2-git-send-email-jasowang@redhat.com> In-Reply-To: <1445491649-62614-1-git-send-email-jasowang@redhat.com> References: <1445491649-62614-1-git-send-email-jasowang@redhat.com> Sender: kvm-owner@vger.kernel.org Precedence: bulk

Jason Wang Oct. 22, 2015, 5:27 a.m. UTC

This patch tries to poll for new added tx buffer for a while at the
end of tx processing. The maximum time spent on polling were limited
through a module parameter. To avoid block rx, the loop will end it
there's new other works queued on vhost so in fact socket receive
queue is also be polled.

busyloop_timeout = 50 gives us following improvement on TCP_RR test:

size/session/+thu%/+normalize%
    1/     1/   +5%/  -20%
    1/    50/  +17%/   +3%

Signed-off-by: Jason Wang <jasowang@redhat.com>
---
 drivers/vhost/net.c | 19 +++++++++++++++++++
 1 file changed, 19 insertions(+)

Michael S. Tsirkin Oct. 22, 2015, 9:33 a.m. UTC | #1

On Thu, Oct 22, 2015 at 01:27:29AM -0400, Jason Wang wrote:
> This patch tries to poll for new added tx buffer for a while at the
> end of tx processing. The maximum time spent on polling were limited
> through a module parameter. To avoid block rx, the loop will end it
> there's new other works queued on vhost so in fact socket receive
> queue is also be polled.
> 
> busyloop_timeout = 50 gives us following improvement on TCP_RR test:
> 
> size/session/+thu%/+normalize%
>     1/     1/   +5%/  -20%
>     1/    50/  +17%/   +3%

Is there a measureable increase in cpu utilization
with busyloop_timeout = 0?

> Signed-off-by: Jason Wang <jasowang@redhat.com>

We might be able to shave off the minor regression
by careful use of likely/unlikely, or maybe
deferring 

> ---
>  drivers/vhost/net.c | 19 +++++++++++++++++++
>  1 file changed, 19 insertions(+)
> 
> diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
> index 9eda69e..bbb522a 100644
> --- a/drivers/vhost/net.c
> +++ b/drivers/vhost/net.c
> @@ -31,7 +31,9 @@
>  #include "vhost.h"
>  
>  static int experimental_zcopytx = 1;
> +static int busyloop_timeout = 50;
>  module_param(experimental_zcopytx, int, 0444);
> +module_param(busyloop_timeout, int, 0444);

Pls add a description, including the units and the special
value 0.

>  MODULE_PARM_DESC(experimental_zcopytx, "Enable Zero Copy TX;"
>  		                       " 1 -Enable; 0 - Disable");
>  
> @@ -287,12 +289,23 @@ static void vhost_zerocopy_callback(struct ubuf_info *ubuf, bool success)
>  	rcu_read_unlock_bh();
>  }
>  
> +static bool tx_can_busy_poll(struct vhost_dev *dev,
> +			     unsigned long endtime)
> +{
> +	unsigned long now = local_clock() >> 10;

local_clock might go backwards if we jump between CPUs.
One way to fix would be to record the CPU id and break
out of loop if that changes.

Also - defer this until we actually know we need it?

> +
> +	return busyloop_timeout && !need_resched() &&
> +	       !time_after(now, endtime) && !vhost_has_work(dev) &&
> +	       single_task_running();

signal pending as well?

> +}
> +
>  /* Expects to be always run from workqueue - which acts as
>   * read-size critical section for our kind of RCU. */
>  static void handle_tx(struct vhost_net *net)
>  {
>  	struct vhost_net_virtqueue *nvq = &net->vqs[VHOST_NET_VQ_TX];
>  	struct vhost_virtqueue *vq = &nvq->vq;
> +	unsigned long endtime;
>  	unsigned out, in;
>  	int head;
>  	struct msghdr msg = {
> @@ -331,6 +344,8 @@ static void handle_tx(struct vhost_net *net)
>  			      % UIO_MAXIOV == nvq->done_idx))
>  			break;
>  
> +		endtime  = (local_clock() >> 10) + busyloop_timeout;
> +again:
>  		head = vhost_get_vq_desc(vq, vq->iov,
>  					 ARRAY_SIZE(vq->iov),
>  					 &out, &in,
> @@ -340,6 +355,10 @@ static void handle_tx(struct vhost_net *net)
>  			break;
>  		/* Nothing new?  Wait for eventfd to tell us they refilled. */
>  		if (head == vq->num) {
> +			if (tx_can_busy_poll(vq->dev, endtime)) {
> +				cpu_relax();
> +				goto again;
> +			}
>  			if (unlikely(vhost_enable_notify(&net->dev, vq))) {
>  				vhost_disable_notify(&net->dev, vq);
>  				continue;
> -- 
> 1.8.3.1
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Rick Jones Oct. 22, 2015, 3:46 p.m. UTC | #2

On 10/22/2015 02:33 AM, Michael S. Tsirkin wrote:
> On Thu, Oct 22, 2015 at 01:27:29AM -0400, Jason Wang wrote:
>> This patch tries to poll for new added tx buffer for a while at the
>> end of tx processing. The maximum time spent on polling were limited
>> through a module parameter. To avoid block rx, the loop will end it
>> there's new other works queued on vhost so in fact socket receive
>> queue is also be polled.
>>
>> busyloop_timeout = 50 gives us following improvement on TCP_RR test:
>>
>> size/session/+thu%/+normalize%
>>      1/     1/   +5%/  -20%
>>      1/    50/  +17%/   +3%
>
> Is there a measureable increase in cpu utilization
> with busyloop_timeout = 0?

And since a netperf TCP_RR test is involved, be careful about what 
netperf reports for CPU util if that increase isn't in the context of 
the guest OS.

For completeness, looking at the effect on TCP_STREAM and TCP_MAERTS, 
aggregate _RR and even aggregate _RR/packets per second for many VMs on 
the same system would be in order.

happy benchmarking,

rick jones

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Michael S. Tsirkin Oct. 22, 2015, 4:16 p.m. UTC | #3

On Thu, Oct 22, 2015 at 08:46:33AM -0700, Rick Jones wrote:
> On 10/22/2015 02:33 AM, Michael S. Tsirkin wrote:
> >On Thu, Oct 22, 2015 at 01:27:29AM -0400, Jason Wang wrote:
> >>This patch tries to poll for new added tx buffer for a while at the
> >>end of tx processing. The maximum time spent on polling were limited
> >>through a module parameter. To avoid block rx, the loop will end it
> >>there's new other works queued on vhost so in fact socket receive
> >>queue is also be polled.
> >>
> >>busyloop_timeout = 50 gives us following improvement on TCP_RR test:
> >>
> >>size/session/+thu%/+normalize%
> >>     1/     1/   +5%/  -20%
> >>     1/    50/  +17%/   +3%
> >
> >Is there a measureable increase in cpu utilization
> >with busyloop_timeout = 0?
> 
> And since a netperf TCP_RR test is involved, be careful about what netperf
> reports for CPU util if that increase isn't in the context of the guest OS.
> 
> For completeness, looking at the effect on TCP_STREAM and TCP_MAERTS,
> aggregate _RR and even aggregate _RR/packets per second for many VMs on the
> same system would be in order.
> 
> happy benchmarking,
> 
> rick jones

Absolutely, merging a new kernel API just for a specific
benchmark doesn't make sense.
I'm guessing this is just an early RFC, a fuller submission
will probably include more numbers.

Jason Wang Oct. 23, 2015, 7:13 a.m. UTC | #4

On 10/22/2015 05:33 PM, Michael S. Tsirkin wrote:
> On Thu, Oct 22, 2015 at 01:27:29AM -0400, Jason Wang wrote:
>> This patch tries to poll for new added tx buffer for a while at the
>> end of tx processing. The maximum time spent on polling were limited
>> through a module parameter. To avoid block rx, the loop will end it
>> there's new other works queued on vhost so in fact socket receive
>> queue is also be polled.
>>
>> busyloop_timeout = 50 gives us following improvement on TCP_RR test:
>>
>> size/session/+thu%/+normalize%
>>     1/     1/   +5%/  -20%
>>     1/    50/  +17%/   +3%
> Is there a measureable increase in cpu utilization
> with busyloop_timeout = 0?

Just run TCP_RR, no increasing. Will run a complete test on next version.

>
>> Signed-off-by: Jason Wang <jasowang@redhat.com>
> We might be able to shave off the minor regression
> by careful use of likely/unlikely, or maybe
> deferring 

Yes, but what did "deferring" mean here?
 
>
>> ---
>>  drivers/vhost/net.c | 19 +++++++++++++++++++
>>  1 file changed, 19 insertions(+)
>>
>> diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
>> index 9eda69e..bbb522a 100644
>> --- a/drivers/vhost/net.c
>> +++ b/drivers/vhost/net.c
>> @@ -31,7 +31,9 @@
>>  #include "vhost.h"
>>  
>>  static int experimental_zcopytx = 1;
>> +static int busyloop_timeout = 50;
>>  module_param(experimental_zcopytx, int, 0444);
>> +module_param(busyloop_timeout, int, 0444);
> Pls add a description, including the units and the special
> value 0.

Ok.

>
>>  MODULE_PARM_DESC(experimental_zcopytx, "Enable Zero Copy TX;"
>>  		                       " 1 -Enable; 0 - Disable");
>>  
>> @@ -287,12 +289,23 @@ static void vhost_zerocopy_callback(struct ubuf_info *ubuf, bool success)
>>  	rcu_read_unlock_bh();
>>  }
>>  
>> +static bool tx_can_busy_poll(struct vhost_dev *dev,
>> +			     unsigned long endtime)
>> +{
>> +	unsigned long now = local_clock() >> 10;
> local_clock might go backwards if we jump between CPUs.
> One way to fix would be to record the CPU id and break
> out of loop if that changes.

Right, or maybe disable preemption in this case?

>
> Also - defer this until we actually know we need it?

Right.

>
>> +
>> +	return busyloop_timeout && !need_resched() &&
>> +	       !time_after(now, endtime) && !vhost_has_work(dev) &&
>> +	       single_task_running();
> signal pending as well?

Yes.

>> +}
>> +
>>  /* Expects to be always run from workqueue - which acts as
>>   * read-size critical section for our kind of RCU. */
>>  static void handle_tx(struct vhost_net *net)
>>  {
>>  	struct vhost_net_virtqueue *nvq = &net->vqs[VHOST_NET_VQ_TX];
>>  	struct vhost_virtqueue *vq = &nvq->vq;
>> +	unsigned long endtime;
>>  	unsigned out, in;
>>  	int head;
>>  	struct msghdr msg = {
>> @@ -331,6 +344,8 @@ static void handle_tx(struct vhost_net *net)
>>  			      % UIO_MAXIOV == nvq->done_idx))
>>  			break;
>>  
>> +		endtime  = (local_clock() >> 10) + busyloop_timeout;
>> +again:
>>  		head = vhost_get_vq_desc(vq, vq->iov,
>>  					 ARRAY_SIZE(vq->iov),
>>  					 &out, &in,
>> @@ -340,6 +355,10 @@ static void handle_tx(struct vhost_net *net)
>>  			break;
>>  		/* Nothing new?  Wait for eventfd to tell us they refilled. */
>>  		if (head == vq->num) {
>> +			if (tx_can_busy_poll(vq->dev, endtime)) {
>> +				cpu_relax();
>> +				goto again;
>> +			}
>>  			if (unlikely(vhost_enable_notify(&net->dev, vq))) {
>>  				vhost_disable_notify(&net->dev, vq);
>>  				continue;
>> -- 
>> 1.8.3.1
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Jason Wang Oct. 23, 2015, 7:14 a.m. UTC | #5

On 10/23/2015 12:16 AM, Michael S. Tsirkin wrote:
> On Thu, Oct 22, 2015 at 08:46:33AM -0700, Rick Jones wrote:
>> On 10/22/2015 02:33 AM, Michael S. Tsirkin wrote:
>>> On Thu, Oct 22, 2015 at 01:27:29AM -0400, Jason Wang wrote:
>>>> This patch tries to poll for new added tx buffer for a while at the
>>>> end of tx processing. The maximum time spent on polling were limited
>>>> through a module parameter. To avoid block rx, the loop will end it
>>>> there's new other works queued on vhost so in fact socket receive
>>>> queue is also be polled.
>>>>
>>>> busyloop_timeout = 50 gives us following improvement on TCP_RR test:
>>>>
>>>> size/session/+thu%/+normalize%
>>>>     1/     1/   +5%/  -20%
>>>>     1/    50/  +17%/   +3%
>>> Is there a measureable increase in cpu utilization
>>> with busyloop_timeout = 0?
>> And since a netperf TCP_RR test is involved, be careful about what netperf
>> reports for CPU util if that increase isn't in the context of the guest OS.

Right, the cpu utilization is measured on host.

>>
>> For completeness, looking at the effect on TCP_STREAM and TCP_MAERTS,
>> aggregate _RR and even aggregate _RR/packets per second for many VMs on the
>> same system would be in order.
>>
>> happy benchmarking,
>>
>> rick jones
> Absolutely, merging a new kernel API just for a specific
> benchmark doesn't make sense.
> I'm guessing this is just an early RFC, a fuller submission
> will probably include more numbers.
>

Yes, will run more complete tests.

Thanks

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Michael S. Tsirkin Oct. 23, 2015, 1:39 p.m. UTC | #6

On Fri, Oct 23, 2015 at 03:13:07PM +0800, Jason Wang wrote:
> 
> 
> On 10/22/2015 05:33 PM, Michael S. Tsirkin wrote:
> > On Thu, Oct 22, 2015 at 01:27:29AM -0400, Jason Wang wrote:
> >> This patch tries to poll for new added tx buffer for a while at the
> >> end of tx processing. The maximum time spent on polling were limited
> >> through a module parameter. To avoid block rx, the loop will end it
> >> there's new other works queued on vhost so in fact socket receive
> >> queue is also be polled.
> >>
> >> busyloop_timeout = 50 gives us following improvement on TCP_RR test:
> >>
> >> size/session/+thu%/+normalize%
> >>     1/     1/   +5%/  -20%
> >>     1/    50/  +17%/   +3%
> > Is there a measureable increase in cpu utilization
> > with busyloop_timeout = 0?
> 
> Just run TCP_RR, no increasing. Will run a complete test on next version.
> 
> >
> >> Signed-off-by: Jason Wang <jasowang@redhat.com>
> > We might be able to shave off the minor regression
> > by careful use of likely/unlikely, or maybe
> > deferring 
> 
> Yes, but what did "deferring" mean here?

Don't call local_clock until we know we'll need it.

> >
> >> ---
> >>  drivers/vhost/net.c | 19 +++++++++++++++++++
> >>  1 file changed, 19 insertions(+)
> >>
> >> diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
> >> index 9eda69e..bbb522a 100644
> >> --- a/drivers/vhost/net.c
> >> +++ b/drivers/vhost/net.c
> >> @@ -31,7 +31,9 @@
> >>  #include "vhost.h"
> >>  
> >>  static int experimental_zcopytx = 1;
> >> +static int busyloop_timeout = 50;
> >>  module_param(experimental_zcopytx, int, 0444);
> >> +module_param(busyloop_timeout, int, 0444);
> > Pls add a description, including the units and the special
> > value 0.
> 
> Ok.
> 
> >
> >>  MODULE_PARM_DESC(experimental_zcopytx, "Enable Zero Copy TX;"
> >>  		                       " 1 -Enable; 0 - Disable");
> >>  
> >> @@ -287,12 +289,23 @@ static void vhost_zerocopy_callback(struct ubuf_info *ubuf, bool success)
> >>  	rcu_read_unlock_bh();
> >>  }
> >>  
> >> +static bool tx_can_busy_poll(struct vhost_dev *dev,
> >> +			     unsigned long endtime)
> >> +{
> >> +	unsigned long now = local_clock() >> 10;
> > local_clock might go backwards if we jump between CPUs.
> > One way to fix would be to record the CPU id and break
> > out of loop if that changes.
> 
> Right, or maybe disable preemption in this case?
> 
> >
> > Also - defer this until we actually know we need it?
> 
> Right.
> 
> >
> >> +
> >> +	return busyloop_timeout && !need_resched() &&
> >> +	       !time_after(now, endtime) && !vhost_has_work(dev) &&
> >> +	       single_task_running();
> > signal pending as well?
> 
> Yes.
> 
> >> +}
> >> +
> >>  /* Expects to be always run from workqueue - which acts as
> >>   * read-size critical section for our kind of RCU. */
> >>  static void handle_tx(struct vhost_net *net)
> >>  {
> >>  	struct vhost_net_virtqueue *nvq = &net->vqs[VHOST_NET_VQ_TX];
> >>  	struct vhost_virtqueue *vq = &nvq->vq;
> >> +	unsigned long endtime;
> >>  	unsigned out, in;
> >>  	int head;
> >>  	struct msghdr msg = {
> >> @@ -331,6 +344,8 @@ static void handle_tx(struct vhost_net *net)
> >>  			      % UIO_MAXIOV == nvq->done_idx))
> >>  			break;
> >>  
> >> +		endtime  = (local_clock() >> 10) + busyloop_timeout;
> >> +again:
> >>  		head = vhost_get_vq_desc(vq, vq->iov,
> >>  					 ARRAY_SIZE(vq->iov),
> >>  					 &out, &in,
> >> @@ -340,6 +355,10 @@ static void handle_tx(struct vhost_net *net)
> >>  			break;
> >>  		/* Nothing new?  Wait for eventfd to tell us they refilled. */
> >>  		if (head == vq->num) {
> >> +			if (tx_can_busy_poll(vq->dev, endtime)) {
> >> +				cpu_relax();
> >> +				goto again;
> >> +			}
> >>  			if (unlikely(vhost_enable_notify(&net->dev, vq))) {
> >>  				vhost_disable_notify(&net->dev, vq);
> >>  				continue;
> >> -- 
> >> 1.8.3.1
> > --
> > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> > the body of a message to majordomo@vger.kernel.org
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> > Please read the FAQ at  http://www.tux.org/lkml/
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[net-next,RFC,2/2] vhost_net: basic polling support

Commit Message

Comments

Patch