[2/6] vhost_net: use vhost_add_used_and_signal_n() in vhost_zerocopy_signal_used()

Message ID 1376630190-5912-3-git-send-email-jasowang@redhat.com (mailing list archive)
State New, archived

Commit Message

Jason Wang Aug. 16, 2013, 5:16 a.m. UTC
Switch to vhost_add_used_and_signal_n() to avoid multiple calls to
vhost_add_used_and_signal(). With this patch we call it at most twice
(considering done_idx wrap-around), compared to N times without it.

Signed-off-by: Jason Wang <jasowang@redhat.com>
---
 drivers/vhost/net.c |   13 ++++++++-----
 1 files changed, 8 insertions(+), 5 deletions(-)
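
To see why at most two calls are needed: the completed entries form one
contiguous run in a UIO_MAXIOV-slot ring starting at done_idx, and that
run can cross the end of the array at most once. A minimal userspace
sketch of the slicing (flush() and the concrete indices are hypothetical
stand-ins, not vhost code):

#include <stdio.h>

#define UIO_MAXIOV 1024

static int min_int(int a, int b) { return a < b ? a : b; }

/* Stand-in for one batched vhost_add_used_and_signal_n() call. */
static void flush(int idx, int n)
{
	printf("batched call: %d entries starting at slot %d\n", n, idx);
}

int main(void)
{
	int done_idx = 1020;	/* hypothetical: near the end of the ring */
	int j = 10;		/* completed run spans the wrap-around */

	while (j) {
		/* largest contiguous slice before the array end */
		int add = min_int(UIO_MAXIOV - done_idx, j);

		flush(done_idx, add);
		done_idx = (done_idx + add) % UIO_MAXIOV;
		j -= add;
	}
	/* prints exactly two batched calls: 4 at 1020, then 6 at 0 */
	return 0;
}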

Comments

Michael S. Tsirkin Aug. 16, 2013, 9:54 a.m. UTC | #1
On Fri, Aug 16, 2013 at 01:16:26PM +0800, Jason Wang wrote:
> Switch to vhost_add_used_and_signal_n() to avoid multiple calls to
> vhost_add_used_and_signal(). With this patch we call it at most twice
> (considering done_idx wrap-around), compared to N times without it.
> 
> Signed-off-by: Jason Wang <jasowang@redhat.com>

So? Does this help performance then?

> ---
>  drivers/vhost/net.c |   13 ++++++++-----
>  1 files changed, 8 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
> index 280ee66..8a6dd0d 100644
> --- a/drivers/vhost/net.c
> +++ b/drivers/vhost/net.c
> @@ -281,7 +281,7 @@ static void vhost_zerocopy_signal_used(struct vhost_net *net,
>  {
>  	struct vhost_net_virtqueue *nvq =
>  		container_of(vq, struct vhost_net_virtqueue, vq);
> -	int i;
> +	int i, add;
>  	int j = 0;
>  
>  	for (i = nvq->done_idx; i != nvq->upend_idx; i = (i + 1) % UIO_MAXIOV) {
> @@ -289,14 +289,17 @@ static void vhost_zerocopy_signal_used(struct vhost_net *net,
>  			vhost_net_tx_err(net);
>  		if (VHOST_DMA_IS_DONE(vq->heads[i].len)) {
>  			vq->heads[i].len = VHOST_DMA_CLEAR_LEN;
> -			vhost_add_used_and_signal(vq->dev, vq,
> -						  vq->heads[i].id, 0);
>  			++j;
>  		} else
>  			break;
>  	}
> -	if (j)
> -		nvq->done_idx = i;
> +	while (j) {
> +		add = min(UIO_MAXIOV - nvq->done_idx, j);
> +		vhost_add_used_and_signal_n(vq->dev, vq,
> +					    &vq->heads[nvq->done_idx], add);
> +		nvq->done_idx = (nvq->done_idx + add) % UIO_MAXIOV;
> +		j -= add;
> +	}
>  }
>  
>  static void vhost_zerocopy_callback(struct ubuf_info *ubuf, bool success)
> -- 
> 1.7.1
Jason Wang Aug. 20, 2013, 2:33 a.m. UTC | #2
On 08/16/2013 05:54 PM, Michael S. Tsirkin wrote:
> On Fri, Aug 16, 2013 at 01:16:26PM +0800, Jason Wang wrote:
>> > Switch to vhost_add_used_and_signal_n() to avoid multiple calls to
>> > vhost_add_used_and_signal(). With this patch we call it at most twice
>> > (considering done_idx wrap-around), compared to N times without it.
>> > 
>> > Signed-off-by: Jason Wang <jasowang@redhat.com>
> So? Does this help performance then?
>

Looks like it can, especially when the guest supports event index. When
the guest enables tx interrupts, this can save us some unnecessary
signals to the guest. I will do some tests.
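
For reference, with VIRTIO_RING_F_EVENT_IDX the decision to signal is
made by vring_need_event() from include/uapi/linux/virtio_ring.h: the
host signals only when the guest's event index falls inside the window
of used entries published since the last check. Publishing a whole batch
before checking crosses that window at most once, where per-entry
publishing could trigger a signal for each descriptor:

/* From include/uapi/linux/virtio_ring.h (__u16 is from linux/types.h).
 * Signal only if event_idx lies in the half-open window (old, new_idx]
 * of newly published used entries; the unsigned arithmetic makes the
 * comparison robust against 16-bit index wrap-around.
 */
static inline int vring_need_event(__u16 event_idx, __u16 new_idx, __u16 old)
{
	return (__u16)(new_idx - event_idx - 1) < (__u16)(new_idx - old);
}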
Jason Wang Aug. 23, 2013, 8:50 a.m. UTC | #3
On 08/20/2013 10:33 AM, Jason Wang wrote:
> On 08/16/2013 05:54 PM, Michael S. Tsirkin wrote:
>> On Fri, Aug 16, 2013 at 01:16:26PM +0800, Jason Wang wrote:
>>>> Switch to vhost_add_used_and_signal_n() to avoid multiple calls to
>>>> vhost_add_used_and_signal(). With this patch we call it at most twice
>>>> (considering done_idx wrap-around), compared to N times without it.
>>>>
>>>> Signed-off-by: Jason Wang <jasowang@redhat.com>
>> So? Does this help performance then?
>>
> Looks like it can, especially when the guest supports event index. When
> the guest enables tx interrupts, this can save us some unnecessary
> signals to the guest. I will do some tests.

Have done some tests. I can see a 2% - 3% increase in both aggregate
transaction rate and per-cpu transaction rate in the TCP_RR and UDP_RR
tests.

I'm using ixgbe. Without this patch, I can see more than 100 calls to
vhost_add_used_and_signal() in one vhost_zerocopy_signal_used(). This is
because ixgbe (like other modern ethernet drivers) tends to free old tx
skbs in a loop during the tx interrupt, and vhost tends to batch the
used-entry adding and signaling in vhost_zerocopy_callback(). Switching
to vhost_add_used_and_signal_n() means saving ~100 used idx updates and
memory barriers.
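
The saving can be sketched like this (a simplified userspace model, not
the real vhost code; put_used_entry(), publish_idx() and the barrier
counting are hypothetical stand-ins): the per-entry path pays one write
barrier and one used-index update per completion, while the batched path
pays each once per flush.

#include <stdio.h>

/* Userspace stand-in for the kernel's smp_wmb(). */
#define smp_wmb()  __sync_synchronize()

static unsigned short used_idx;
static int barriers, idx_updates;

static void put_used_entry(int head) { (void)head; }
static void publish_idx(unsigned short idx) { used_idx = idx; idx_updates++; }
static void barrier_counted(void) { smp_wmb(); barriers++; }

int main(void)
{
	int n = 100, i;

	/* per-entry path: one barrier and one index update per completion */
	barriers = idx_updates = 0;
	for (i = 0; i < n; i++) {
		put_used_entry(i);
		barrier_counted();	/* order the entry before the index */
		publish_idx(used_idx + 1);
	}
	printf("per-entry: %d barriers, %d idx updates\n", barriers, idx_updates);

	/* batched path: one barrier and one index update for the whole run */
	barriers = idx_updates = 0;
	for (i = 0; i < n; i++)
		put_used_entry(i);
	barrier_counted();
	publish_idx(used_idx + n);
	printf("batched:   %d barriers, %d idx updates\n", barriers, idx_updates);
	return 0;
}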
Michael S. Tsirkin Aug. 25, 2013, 11:48 a.m. UTC | #4
On Fri, Aug 23, 2013 at 04:50:38PM +0800, Jason Wang wrote:
> On 08/20/2013 10:33 AM, Jason Wang wrote:
> > On 08/16/2013 05:54 PM, Michael S. Tsirkin wrote:
> >> On Fri, Aug 16, 2013 at 01:16:26PM +0800, Jason Wang wrote:
> >>>> Switch to vhost_add_used_and_signal_n() to avoid multiple calls to
> >>>> vhost_add_used_and_signal(). With this patch we call it at most twice
> >>>> (considering done_idx wrap-around), compared to N times without it.
> >>>>
> >>>> Signed-off-by: Jason Wang <jasowang@redhat.com>
> >> So? Does this help performance then?
> >>
> > Looks like it can, especially when the guest supports event index. When
> > the guest enables tx interrupts, this can save us some unnecessary
> > signals to the guest. I will do some tests.
> 
> Have done some tests. I can see a 2% - 3% increase in both aggregate
> transaction rate and per-cpu transaction rate in the TCP_RR and UDP_RR
> tests.
> 
> I'm using ixgbe. Without this patch, I can see more than 100 calls to
> vhost_add_used_and_signal() in one vhost_zerocopy_signal_used(). This is
> because ixgbe (like other modern ethernet drivers) tends to free old tx
> skbs in a loop during the tx interrupt, and vhost tends to batch the
> used-entry adding and signaling in vhost_zerocopy_callback(). Switching
> to vhost_add_used_and_signal_n() means saving ~100 used idx updates and
> memory barriers.

Well, it's only an smp_wmb(), which is a nop on most architectures, so
a 2% gain is surprising. I'm guessing the cache miss on the write is
what's giving us the speedup here.

I'll review the code, thanks.

Patch

diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
index 280ee66..8a6dd0d 100644
--- a/drivers/vhost/net.c
+++ b/drivers/vhost/net.c
@@ -281,7 +281,7 @@  static void vhost_zerocopy_signal_used(struct vhost_net *net,
 {
 	struct vhost_net_virtqueue *nvq =
 		container_of(vq, struct vhost_net_virtqueue, vq);
-	int i;
+	int i, add;
 	int j = 0;
 
 	for (i = nvq->done_idx; i != nvq->upend_idx; i = (i + 1) % UIO_MAXIOV) {
@@ -289,14 +289,17 @@  static void vhost_zerocopy_signal_used(struct vhost_net *net,
 			vhost_net_tx_err(net);
 		if (VHOST_DMA_IS_DONE(vq->heads[i].len)) {
 			vq->heads[i].len = VHOST_DMA_CLEAR_LEN;
-			vhost_add_used_and_signal(vq->dev, vq,
-						  vq->heads[i].id, 0);
 			++j;
 		} else
 			break;
 	}
-	if (j)
-		nvq->done_idx = i;
+	while (j) {
+		add = min(UIO_MAXIOV - nvq->done_idx, j);
+		vhost_add_used_and_signal_n(vq->dev, vq,
+					    &vq->heads[nvq->done_idx], add);
+		nvq->done_idx = (nvq->done_idx + add) % UIO_MAXIOV;
+		j -= add;
+	}
 }
 
 static void vhost_zerocopy_callback(struct ubuf_info *ubuf, bool success)