
kvm: virtio-net: Run TX from the I/O thread

Message ID 20090121230642.10404.65372.stgit@kvm.aw (mailing list archive)
State New, archived

Commit Message

Alex Williamson Jan. 21, 2009, 11:08 p.m. UTC
This is an attempt to improve the latency of virtio-net while not hurting
throughput.  I wanted to try moving packet TX into a different thread
so we can quickly return to the guest after it kicks us to send packets
out.  I also switched the order of when the tx_timer comes into play, so
we can get an initial burst of packets out, then wait for the timer to
fire and notify us if there's more to do.  Here's what it does for me
(average of 5 runs each, testing to a remote system on a 1Gb network):

netperf TCP_STREAM: 939.22Mb/s -> 935.24Mb/s =  99.58%
netperf TCP_RR:      2028.72/s -> 3927.99/s  = 193.62%
tbench:              92.99MB/s -> 99.97MB/s  = 107.51%

I'd be interested to hear if it helps or hurts anyone else.  Thanks,

Alex


Signed-off-by: Alex Williamson <alex.williamson@hp.com>
---

 qemu/hw/virtio-net.c |   57 ++++++++++++++++++++++++++++++++++++++++++++++++++
 1 files changed, 57 insertions(+), 0 deletions(-)
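
The core of the change is the standard eventfd(2) kick pattern: the vcpu thread writes an 8-byte value to wake the I/O thread and returns to the guest quickly, and the flush runs later in the I/O thread. Below is a minimal, generic sketch of that pattern; the names (tx_kick_fd, tx_kick, tx_poll_handler, flush_pending_tx) are illustrative only, and the actual patch further down wires this up through QEMU's qemu_eventfd() helper and fd handlers instead.

#include <sys/eventfd.h>
#include <stdint.h>
#include <stdio.h>
#include <unistd.h>

static int tx_kick_fd = -1;

/* Hypothetical stand-in for the real virtqueue flush; not in the patch. */
static void flush_pending_tx(void)
{
    /* ... pop TX descriptors and hand the packets to the tap fd ... */
}

static int tx_kick_init(void)
{
    tx_kick_fd = eventfd(0, 0);         /* counter-style eventfd */
    return tx_kick_fd;
}

/* vcpu thread: the kick is just an 8-byte write, so the exit back to the
 * guest stays short. */
static void tx_kick(void)
{
    uint64_t one = 1;

    if (write(tx_kick_fd, &one, sizeof(one)) != sizeof(one))
        fprintf(stderr, "tx kick failed\n");
}

/* I/O thread: called when tx_kick_fd becomes readable.  The read resets
 * the eventfd counter, so any number of kicks coalesce into one wakeup. */
static void tx_poll_handler(void)
{
    uint64_t count;

    if (read(tx_kick_fd, &count, sizeof(count)) == sizeof(count))
        flush_pending_tx();
}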



Comments

Avi Kivity Jan. 22, 2009, 12:12 p.m. UTC | #1
Alex Williamson wrote:
> This is an attempt to improve the latency of virtio-net while not hurting
> throughput.  I wanted to try moving packet TX into a different thread
> so we can quickly return to the guest after it kicks us to send packets
> out.  I also switched the order of when the tx_timer comes into play, so
> we can get an initial burst of packets out, then wait for the timer to
> fire and notify us if there's more to do.  Here's what it does for me
> (average of 5 runs each, testing to a remote system on a 1Gb network):
>
> netperf TCP_STREAM: 939.22Mb/s -> 935.24Mb/s =  99.58%
> netperf TCP_RR:      2028.72/s -> 3927.99/s  = 193.62%
> tbench:              92.99MB/s -> 99.97MB/s  = 107.51%
>
> I'd be interested to hear if it helps or hurts anyone else.  Thanks,
>   

My worry with this change is that it increases cpu utilization even more 
than it increases bandwidth, so that our bits/cycle measure decreases.  
The descriptors (and perhaps data) are likely on the same cache as the 
vcpu, and moving the transmit to the iothread will cause them to move to 
the iothread's cache.

My preferred approach to increasing both bandwidth and bits/cycle (the 
latter figure is more important IMO, unfortunately benchmarks don't 
measure it) is to aio-enable tap and raw sockets.  The vcpu thread would 
only touch the packet descriptors (not data) and submit all packets in 
one io_submit() call.  Unfortunately a huge amount of work is needed to 
pull this off.
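
For reference, the batching described here maps onto Linux AIO roughly as sketched below: one iocb per pending packet, all handed to the kernel in a single io_submit() call. This assumes a tap or raw-socket fd that actually accepts AIO writes, which is exactly the part that does not exist yet; tx_batch_submit and MAX_TX_BATCH are made-up names for illustration.

#include <libaio.h>
#include <stddef.h>

#define MAX_TX_BATCH 64

static io_context_t tx_ctx;             /* zero-initialized before io_setup() */

static int tx_batch_init(void)
{
    return io_setup(MAX_TX_BATCH, &tx_ctx);
}

/* Queue one pwrite iocb per packet and submit the whole batch at once. */
static int tx_batch_submit(int tap_fd, void **pkts, size_t *lens, int npkts)
{
    struct iocb iocbs[MAX_TX_BATCH];
    struct iocb *iocbp[MAX_TX_BATCH];
    int i;

    if (npkts > MAX_TX_BATCH)
        npkts = MAX_TX_BATCH;

    for (i = 0; i < npkts; i++) {
        /* The offset is meaningless for a stream-like fd; each iocb
         * carries one packet's buffer. */
        io_prep_pwrite(&iocbs[i], tap_fd, pkts[i], lens[i], 0);
        iocbp[i] = &iocbs[i];
    }

    /* All pending packets reach the kernel in one system call; completions
     * would be reaped later with io_getevents(). */
    return io_submit(tx_ctx, npkts, iocbp);
}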
Mark McLoughlin Jan. 22, 2009, 12:47 p.m. UTC | #2
Hi Alex,

On Wed, 2009-01-21 at 16:08 -0700, Alex Williamson wrote:
> This is an attempt to improve the latency of virtio-net while not hurting
> throughput.  I wanted to try moving packet TX into a different thread
> so we can quickly return to the guest after it kicks us to send packets
> out.  I also switched the order of when the tx_timer comes into play, so
> we can get an initial burst of packets out, then wait for the timer to
> fire and notify us if there's more to do.  Here's what it does for me
> (average of 5 runs each, testing to a remote system on a 1Gb network):
> 
> netperf TCP_STREAM: 939.22Mb/s -> 935.24Mb/s =  99.58%
> netperf TCP_RR:      2028.72/s -> 3927.99/s  = 193.62%
> tbench:              92.99MB/s -> 99.97MB/s  = 107.51%
> 
> I'd be interested to hear if it helps or hurts anyone else.  Thanks,

Avi and I went back and forth on this one in great detail before:

  http://www.mail-archive.com/kvm@vger.kernel.org/msg06431.html

Avi's arguments make a lot of sense, but looking over those patches
again now, I still think that applying them would be a better approach
than what we have right now.

Cheers,
Mark.

Mark McLoughlin Jan. 22, 2009, 12:48 p.m. UTC | #3
On Thu, 2009-01-22 at 14:12 +0200, Avi Kivity wrote:

>  My worry with this change is that it increases cpu utilization even more 
> than it increases bandwidth, so that our bits/cycle measure decreases.  

Yep, agreed it's important to watch out for this.

> The descriptors (and perhaps data) are likely on the same cache as the 
> vcpu, and moving the transmit to the iothread will cause them to move to 
> the iothread's cache.

We flush from the I/O thread right now.

We only ever flush from the vcpu thread when the ring fills up, which
rarely happens from what I've observed.

Cheers,
Mark.
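
For context, the pre-patch TX path looks roughly like the reconstruction below (pieced together from the #else branch and the timer callback visible in the diff at the bottom, with the elided middle lines filled in, so treat the exact body as approximate): the common case arms the timer and suppresses further kicks, which is why the flush normally runs from the timer in the I/O thread.

static void virtio_net_handle_tx(VirtIODevice *vdev, VirtQueue *vq)
{
    VirtIONet *n = to_virtio_net(vdev);

    if (n->tx_timer_active) {
        /* a kick arrived while the timer was pending: flush now, from
         * the vcpu thread */
        virtio_queue_set_notification(vq, 1);
        qemu_del_timer(n->tx_timer);
        n->tx_timer_active = 0;
        virtio_net_flush_tx(n, vq);
    } else {
        /* common case: defer the flush to the timer (I/O thread) and
         * suppress further notifications until then */
        qemu_mod_timer(n->tx_timer,
                       qemu_get_clock(vm_clock) + TX_TIMER_INTERVAL);
        n->tx_timer_active = 1;
        virtio_queue_set_notification(vq, 0);
    }
}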

Avi Kivity Jan. 22, 2009, 1:41 p.m. UTC | #4
Mark McLoughlin wrote:
> Hi Alex,
>
> On Wed, 2009-01-21 at 16:08 -0700, Alex Williamson wrote:
>   
>> This is an attempt to improve the latency of virtio-net while not hurting
>> throughput.  I wanted to try moving packet TX into a different thread
>> so we can quickly return to the guest after it kicks us to send packets
>> out.  I also switched the order of when the tx_timer comes into play, so
>> we can get an initial burst of packets out, then wait for the timer to
>> fire and notify us if there's more to do.  Here's what it does for me
>> (average of 5 runs each, testing to a remote system on a 1Gb network):
>>
>> netperf TCP_STREAM: 939.22Mb/s -> 935.24Mb/s =  99.58%
>> netperf TCP_RR:      2028.72/s -> 3927.99/s  = 193.62%
>> tbench:              92.99MB/s -> 99.97MB/s  = 107.51%
>>
>> I'd be interested to hear if it helps or hurts anyone else.  Thanks,
>>     
>
> Avi and I went back and forth on this one in great detail before:
>
>   http://www.mail-archive.com/kvm@vger.kernel.org/msg06431.html
>
> Avi's arguments make a lot of sense, but looking over those patches
> again now, I still think that applying them would be a better approach
> than what we have right now.
>   

I can go with that.  Anthony, should I wait for a qemu iothread so it 
can go upstream directly?
Avi Kivity Jan. 22, 2009, 1:42 p.m. UTC | #5
(copying Anthony this time)

Mark McLoughlin wrote:
> Hi Alex,
>
> On Wed, 2009-01-21 at 16:08 -0700, Alex Williamson wrote:
>   
>> This is an attempt to improve the latency of virtio-net while not hurting
>> throughput.  I wanted to try moving packet TX into a different thread
>> so we can quickly return to the guest after it kicks us to send packets
>> out.  I also switched the order of when the tx_timer comes into play, so
>> we can get an initial burst of packets out, then wait for the timer to
>> fire and notify us if there's more to do.  Here's what it does for me
>> (average of 5 runs each, testing to a remote system on a 1Gb network):
>>
>> netperf TCP_STREAM: 939.22Mb/s -> 935.24Mb/s =  99.58%
>> netperf TCP_RR:      2028.72/s -> 3927.99/s  = 193.62%
>> tbench:              92.99MB/s -> 99.97MB/s  = 107.51%
>>
>> I'd be interested to hear if it helps or hurts anyone else.  Thanks,
>>     
>
> Avi and I went back and forth on this one in great detail before:
>
>   http://www.mail-archive.com/kvm@vger.kernel.org/msg06431.html
>
> Avi's arguments make a lot of sense, but looking over those patches
> again now, I still think that applying them would be a better approach
> than what we have right now.
>   

I can go with that.  Anthony, should I wait for a qemu iothread so it
can go upstream directly?
Alex Williamson Jan. 22, 2009, 6:36 p.m. UTC | #6
On Thu, 2009-01-22 at 12:48 +0000, Mark McLoughlin wrote:
> On Thu, 2009-01-22 at 14:12 +0200, Avi Kivity wrote:
> 
> >  My worry with this change is that it increases cpu utilization even more 
> > than it increases bandwidth, so that our bits/cycle measure decreases.  
> 
> Yep, agreed it's important to watch out for this.
> 
> > The descriptors (and perhaps data) are likely on the same cache as the 
> > vcpu, and moving the transmit to the iothread will cause them to move to 
> > the iothread's cache.
> 
> We flush from the I/O thread right now.
> 
> We only ever flush from the vcpu thread when the ring fills up, which
> rarely happens from what I've observed.

Sorry to have come in late to the discussion, but it seems like it
needed another kick after a couple of months anyway.  As noted, we are
mostly (almost exclusively?) doing TX from the timer already, so neither
this change nor Mark's previous patch series really alters the current
cache effects.  I am curious what Mark's series does for latency, since
that isn't really addressed by the charts; hopefully good things without
the tx_timer.

A thread per device or perhaps even a thread per RX/TX stream seems like
a logical goal, but these current patches do provide a worthwhile
incremental improvement.  Perhaps we could affinitize the guest to do
I/O on a specific vcpu via _PXM methods in ACPI so we can provide hints
to the scheduler to keep a vcpu thread and its associated I/O threads
nearby.  Thanks,

Alex
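
On the host side, the affinity hinting Alex mentions could be as simple as pinning a device's I/O thread onto the CPU (or NUMA node) the corresponding vcpu thread runs on; a rough sketch follows. The function and parameter names are hypothetical, and the guest-visible ACPI _PXM part is a separate piece not shown here.

#define _GNU_SOURCE
#include <pthread.h>
#include <sched.h>

/* Pin an I/O thread to the host CPU a given vcpu is running on.  A real
 * policy would more likely use the whole NUMA node rather than one CPU. */
static int pin_iothread_near_vcpu(pthread_t iothread, int vcpu_host_cpu)
{
    cpu_set_t set;

    CPU_ZERO(&set);
    CPU_SET(vcpu_host_cpu, &set);

    /* returns 0 on success, an error number on failure */
    return pthread_setaffinity_np(iothread, sizeof(set), &set);
}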
Anthony Liguori Jan. 22, 2009, 6:43 p.m. UTC | #7
Avi Kivity wrote:
> Mark McLoughlin wrote:
>> Avi and I went back and forth on this one in great detail before:
>>
>>   http://www.mail-archive.com/kvm@vger.kernel.org/msg06431.html
>>
>> Avi's arguments make a lot of sense, but looking over those patches
>> again now, I still think that applying them would be a better approach
>> than what we have right now.
>>   
>
> I can go with that.  Anthony, should I wait for a qemu iothread so it 
> can go upstream directly?

Uh, go ahead and apply it now.  The io thread should come soon, but I
don't think waiting and merging this later will be any easier than
dealing with the conflict when you resync against QEMU after the IO thread.

Regards,

Anthony Liguori


Patch

diff --git a/qemu/hw/virtio-net.c b/qemu/hw/virtio-net.c
index 8f3c41d..26508bd 100644
--- a/qemu/hw/virtio-net.c
+++ b/qemu/hw/virtio-net.c
@@ -17,6 +17,8 @@ 
 #include "virtio-net.h"
 #ifdef USE_KVM
 #include "qemu-kvm.h"
+#include "compatfd.h"
+#include "qemu-char.h"
 #endif
 
 #define TAP_VNET_HDR
@@ -49,6 +51,9 @@  typedef struct VirtIONet
         int enabled;
         uint32_t *vlans;
     } vlan_table;
+#ifdef USE_KVM
+    int fds[2];
+#endif
 } VirtIONet;
 
 /* TODO
@@ -570,10 +575,50 @@  static void virtio_net_flush_tx(VirtIONet *n, VirtQueue *vq)
     }
 }
 
+#ifdef USE_KVM
+static void virtio_net_do_tx(void *opaque)
+{
+    VirtIONet *n = opaque;
+
+    virtio_net_flush_tx(n, n->tx_vq);
+    qemu_mod_timer(n->tx_timer,
+                   qemu_get_clock(vm_clock) + TX_TIMER_INTERVAL);
+}
+
+static void virtio_net_notify_tx(int fd)
+{
+    uint64_t value = 1;
+    char buffer[8];
+    size_t offset = 0;
+
+    memcpy(buffer, &value, sizeof(value));
+
+    while (offset < 8) {
+        ssize_t len;
+
+        len = write(fd, buffer + offset, 8 - offset);
+        if (len == -1 && errno == EINTR)
+            continue;
+
+        if (len <= 0)
+            break;
+
+        offset += len;
+    }
+
+    if (offset != 8)
+        fprintf(stderr, "virtio-net: failed to notify tx thread\n");
+}
+#endif
+
 static void virtio_net_handle_tx(VirtIODevice *vdev, VirtQueue *vq)
 {
     VirtIONet *n = to_virtio_net(vdev);
 
+#ifdef USE_KVM
+    virtio_queue_set_notification(n->tx_vq, 0);
+    virtio_net_notify_tx(n->fds[0]);
+#else
     if (n->tx_timer_active) {
         virtio_queue_set_notification(vq, 1);
         qemu_del_timer(n->tx_timer);
@@ -585,6 +630,7 @@  static void virtio_net_handle_tx(VirtIODevice *vdev, VirtQueue *vq)
         n->tx_timer_active = 1;
         virtio_queue_set_notification(vq, 0);
     }
+#endif
 }
 
 static void virtio_net_tx_timer(void *opaque)
@@ -598,7 +644,9 @@  static void virtio_net_tx_timer(void *opaque)
         return;
 
     virtio_queue_set_notification(n->tx_vq, 1);
+#ifndef USE_KVM
     virtio_net_flush_tx(n, n->tx_vq);
+#endif
 }
 
 /*
@@ -712,6 +760,15 @@  PCIDevice *virtio_net_init(PCIBus *bus, NICInfo *nd, int devfn)
                                  virtio_net_receive, virtio_net_can_receive, n);
     n->vc->link_status_changed = virtio_net_set_link_status;
 
+#ifdef USE_KVM
+    if (qemu_eventfd(n->fds) == -1) {
+        fprintf(stderr, "failed to setup virtio-net eventfd\n");
+        return NULL;
+    }
+
+    qemu_set_fd_handler2(n->fds[0], NULL, virtio_net_do_tx, NULL, (void *)n);
+#endif
+
     qemu_format_nic_info_str(n->vc, n->mac);
 
     n->tx_timer = qemu_new_timer(vm_clock, virtio_net_tx_timer, n);