diff mbox

Fix migration issue when the destination is loaded

Message ID 4A5DEFB3.4060702@redhat.com (mailing list archive)
State New, archived
Headers show

Commit Message

Dor Laor July 15, 2009, 3:03 p.m. UTC
From abbe3b57af6a28bb81e5fb8b09b10802a8ccc3fe Mon Sep 17 00:00:00 2001
From: Dor Laor <dor@redhat.com>
Date: Wed, 15 Jul 2009 17:53:16 +0300
Subject: [PATCH] Fix migration issue when the destination is loaded

If the migration socket is full, we get EAGAIN for the write.
The set_fd_handler2 defers the write for later on. The function
tries to wake up the iothread by qemu_kvm_notify_work.
Since this happens in a loop, multiple times, the pipe that emulates eventfd
becomes full and we get a deadlock.

It is solved by checking for write-readiness using select.
Note that I check for select only for full 8 byte write and not
for partial writes. This is because we'll break the reader otherwise.

Signed-off-by: Dor Laor <dor@redhat.com>
---
 qemu-kvm.c |   16 ++++++++++++++++
 1 files changed, 16 insertions(+), 0 deletions(-)

Comments

Mark McLoughlin July 16, 2009, 9:39 a.m. UTC | #1
Hi Dor,

On Wed, 2009-07-15 at 18:03 +0300, Dor Laor wrote:
> If the migration socket is full, we get EAGAIN for the write.
> The set_fd_handler2 defers the write for later on. The function
> tries to wake up the iothread by qemu_kvm_notify_work.
> Since this happens in a loop, multiple times, the pipe that emulates
> eventfd becomes full and we get a deadlock.

I'm not sure I follow:

  - You're seeing qemu_kvm_notify_work() being called many times

  - The call chain is migrate_fd_put_buffer(), qemu_set_fd_handler2(), 
    main_loop_break()

  - This happens when write() in migrate_fd_put_buffer() returns EAGAIN 
    because the socket buffer has filled up

Correct?

That sounds like migrate_fd_put_buffer() is being called repeatedly
while we know the socket isn't writable?

Shouldn't the buffered file could stop attempting to call put_buffer()
until it has been notified that the underlying fd is writable?

Cheers,
Mark.

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Dor Laor July 16, 2009, noon UTC | #2
On 07/16/2009 12:39 PM, Mark McLoughlin wrote:
> Hi Dor,
>
> On Wed, 2009-07-15 at 18:03 +0300, Dor Laor wrote:
>> If the migration socket is full, we get EAGAIN for the write.
>> The set_fd_handler2 defers the write for later on. The function
>> tries to wake up the iothread by qemu_kvm_notify_work.
>> Since this happens in a loop, multiple times, the pipe that emulates
>> eventfd becomes full and we get a deadlock.
>
> I'm not sure I follow:
>
>    - You're seeing qemu_kvm_notify_work() being called many times
>
>    - The call chain is migrate_fd_put_buffer(), qemu_set_fd_handler2(),
>      main_loop_break()
>
>    - This happens when write() in migrate_fd_put_buffer() returns EAGAIN
>      because the socket buffer has filled up
>
> Correct?
>
> That sounds like migrate_fd_put_buffer() is being called repeatedly
> while we know the socket isn't writable?
>
> Shouldn't the buffered file could stop attempting to call put_buffer()
> until it has been notified that the underlying fd is writable?

There are two fds here:
The one returning EAGAIN, is the migration socket. That's why 
migrate_fd_put_buffer is called and a call back is rescheduled by 
qemu_set_fd_handler2.

In the procedure of qemu_set_fd_handler2, the main_loop_break is called. 
It needs to notify the iothread. It does is by qemu_eventfd, since it is 
being emulated on older kernels, we use a pipe.

The pipe fd is the one that blocks.
I though of setting it to non-blocking but we might get partial writes 
with a non blocking fd (in theory) too.


>
> Cheers,
> Mark.
>

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Mark McLoughlin July 16, 2009, 3:20 p.m. UTC | #3
On Thu, 2009-07-16 at 15:00 +0300, Dor Laor wrote:
> On 07/16/2009 12:39 PM, Mark McLoughlin wrote:
> > Hi Dor,
> >
> > On Wed, 2009-07-15 at 18:03 +0300, Dor Laor wrote:
> >> If the migration socket is full, we get EAGAIN for the write.
> >> The set_fd_handler2 defers the write for later on. The function
> >> tries to wake up the iothread by qemu_kvm_notify_work.
> >> Since this happens in a loop, multiple times, the pipe that emulates
> >> eventfd becomes full and we get a deadlock.
> >
> > I'm not sure I follow:
> >
> >    - You're seeing qemu_kvm_notify_work() being called many times
> >
> >    - The call chain is migrate_fd_put_buffer(), qemu_set_fd_handler2(),
> >      main_loop_break()
> >
> >    - This happens when write() in migrate_fd_put_buffer() returns EAGAIN
> >      because the socket buffer has filled up
> >
> > Correct?
> >
> > That sounds like migrate_fd_put_buffer() is being called repeatedly
> > while we know the socket isn't writable?
> >
> > Shouldn't the buffered file could stop attempting to call put_buffer()
> > until it has been notified that the underlying fd is writable?
> 
> There are two fds here:
> The one returning EAGAIN, is the migration socket. That's why 
> migrate_fd_put_buffer is called and a call back is rescheduled by 
> qemu_set_fd_handler2.
> 
> In the procedure of qemu_set_fd_handler2, the main_loop_break is called. 
> It needs to notify the iothread. It does is by qemu_eventfd, since it is 
> being emulated on older kernels, we use a pipe.
> 
> The pipe fd is the one that blocks.
> I though of setting it to non-blocking but we might get partial writes 
> with a non blocking fd (in theory) too.

Yes, but if the pipe fd is blocking, that means we're writing a lot of
events to it, which means the write to the migration socket is failing
with EAGAIN a lot.

As soon as we get EAGAIN on the migration socket, we should stop trying
to write to socket until the we get notification that its writable.

If we did that, we'd only be writing a very small number of events to
the pipe and we wouldn't block.

Cheers,
Mark.

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/qemu-kvm.c b/qemu-kvm.c
index ed7e466..0ea67a7 100644
--- a/qemu-kvm.c
+++ b/qemu-kvm.c
@@ -2094,12 +2094,28 @@  void qemu_kvm_notify_work(void)
     uint64_t value = 1;
     char buffer[8];
     size_t offset = 0;
+    fd_set wfds;
+    struct timeval tv;
+    int retval;
 
     if (io_thread_fd == -1)
 	return;
 
     memcpy(buffer, &value, sizeof(value));
 
+    FD_ZERO(&wfds);
+    FD_SET(io_thread_fd, &wfds);
+    tv.tv_sec = tv.tv_usec = 0;
+    retval = select(io_thread_fd + 1, NULL, &wfds, NULL, &tv);
+    if (retval == -1) {
+        fprintf(stderr, "failed to notify io thread due to select error\n");
+        return;
+    } else if (retval == 0)
+        /* We probably ponded this pipe too much and it is full now */
+        return;
+
+    assert(FD_ISSET(io_thread_fd, &wfds));
+
     while (offset < 8) {
 	ssize_t len;