| Message ID | 20170313124434.1043-14-quintela@redhat.com (mailing list archive) |
|---|---|
| State | New, archived |
On 13/03/2017 13:44, Juan Quintela wrote:
>          case RAM_SAVE_FLAG_MULTIFD_PAGE:
>              fd_num = qemu_get_be16(f);
> -            if (fd_num != 0) {
> -                /* this is yet an unused variable, changed later */
> -                fd_num = fd_num;
> -            }
> +            multifd_recv_page(host, fd_num);
>              qemu_get_buffer(f, host, TARGET_PAGE_SIZE);
>              break;

I still believe this design is a mistake.

Paolo
* Paolo Bonzini (pbonzini@redhat.com) wrote:
>
> On 13/03/2017 13:44, Juan Quintela wrote:
> >          case RAM_SAVE_FLAG_MULTIFD_PAGE:
> >              fd_num = qemu_get_be16(f);
> > -            if (fd_num != 0) {
> > -                /* this is yet an unused variable, changed later */
> > -                fd_num = fd_num;
> > -            }
> > +            multifd_recv_page(host, fd_num);
> >              qemu_get_buffer(f, host, TARGET_PAGE_SIZE);
> >              break;
>
> I still believe this design is a mistake.

Is it a use of a separate FD carrying all of the flags/addresses that
you object to?

Dave

> Paolo
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
On 17/03/2017 14:02, Dr. David Alan Gilbert wrote:
>>>          case RAM_SAVE_FLAG_MULTIFD_PAGE:
>>>              fd_num = qemu_get_be16(f);
>>> -            if (fd_num != 0) {
>>> -                /* this is yet an unused variable, changed later */
>>> -                fd_num = fd_num;
>>> -            }
>>> +            multifd_recv_page(host, fd_num);
>>>              qemu_get_buffer(f, host, TARGET_PAGE_SIZE);
>>>              break;
>> I still believe this design is a mistake.
> Is it a use of a separate FD carrying all of the flags/addresses that
> you object to?

Yes, it introduces a serialization point unnecessarily, and I don't
believe the rationale that Juan offered was strong enough.

This is certainly true on the receive side, but serialization is not
even necessary on the send side.  Multiple threads can efficiently
split the work among themselves and visit the dirty bitmap without a
central distributor.

I need to study the code more to understand another issue.  Say you
have a page that is sent to two different threads in two different
iterations, like

    thread 1
        iteration 1: pages 3, 7
    thread 2
        iteration 1: page 3
        iteration 2: page 7

Does the code ensure that all threads wait at the end of an iteration?
Otherwise, thread 2 could process page 7 from iteration 2 before or
while thread 1 processes the same page from iteration 1.

Paolo
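The ordering hazard raised here, and the end-of-iteration synchronization that prevents it, can be shown with a minimal, hypothetical pthread program (not QEMU code; all names are illustrative). Without the barrier, worker 2 could write page 7 from iteration 2 while worker 1 is still writing the stale iteration-1 copy of the same page:

```c
#include <pthread.h>
#include <stdint.h>
#include <stdio.h>

#define NUM_WORKERS 2

static pthread_barrier_t iter_barrier;

/* Stands in for copying TARGET_PAGE_SIZE bytes into guest RAM. */
static void write_page(int worker, int iter, int page)
{
    printf("worker %d, iteration %d: page %d\n", worker, iter, page);
}

static void *worker(void *opaque)
{
    int id = (int)(intptr_t)opaque;

    /* iteration 1 */
    if (id == 1) {
        write_page(1, 1, 3);
        write_page(1, 1, 7);
    } else {
        write_page(2, 1, 3);
    }

    /*
     * Every worker must finish iteration 1 before any worker starts
     * iteration 2; this is the sync point the discussion is about.
     */
    pthread_barrier_wait(&iter_barrier);

    /* iteration 2 */
    if (id == 2) {
        write_page(2, 2, 7);    /* now guaranteed to follow worker 1's page 7 */
    }
    return NULL;
}

int main(void)
{
    pthread_t threads[NUM_WORKERS];

    pthread_barrier_init(&iter_barrier, NULL, NUM_WORKERS);
    for (int i = 0; i < NUM_WORKERS; i++) {
        pthread_create(&threads[i], NULL, worker, (void *)(intptr_t)(i + 1));
    }
    for (int i = 0; i < NUM_WORKERS; i++) {
        pthread_join(threads[i], NULL);
    }
    pthread_barrier_destroy(&iter_barrier);
    return 0;
}
```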
* Paolo Bonzini (pbonzini@redhat.com) wrote:
>
> On 17/03/2017 14:02, Dr. David Alan Gilbert wrote:
> >>>          case RAM_SAVE_FLAG_MULTIFD_PAGE:
> >>>              fd_num = qemu_get_be16(f);
> >>> -            if (fd_num != 0) {
> >>> -                /* this is yet an unused variable, changed later */
> >>> -                fd_num = fd_num;
> >>> -            }
> >>> +            multifd_recv_page(host, fd_num);
> >>>              qemu_get_buffer(f, host, TARGET_PAGE_SIZE);
> >>>              break;
> >> I still believe this design is a mistake.
> > Is it a use of a separate FD carrying all of the flags/addresses that
> > you object to?
>
> Yes, it introduces a serialization point unnecessarily, and I don't
> believe the rationale that Juan offered was strong enough.
>
> This is certainly true on the receive side, but serialization is not
> even necessary on the send side.

Is there an easy way to benchmark it (without writing both) to figure
out if sending (word) (page) on one fd is less efficient than sending
two fd's with the pages and words separate?

> Multiple threads can efficiently split
> the work among themselves and visit the dirty bitmap without a central
> distributor.

I mostly agree; I kind of fancy the idea of having one per NUMA node;
but a central distributor might be a good idea anyway in the cases
where you find the heavy writers all happen to be in the same area.

>
> I need to study the code more to understand another issue.  Say you have
> a page that is sent to two different threads in two different
> iterations, like
>
>     thread 1
>         iteration 1: pages 3, 7
>     thread 2
>         iteration 1: page 3
>         iteration 2: page 7
>
> Does the code ensure that all threads wait at the end of an iteration?
> Otherwise, thread 2 could process page 7 from iteration 2 before or
> while thread 1 processes the same page from iteration 1.

I think there's a sync at the end of each iteration on Juan's current
code that stops that.

Dave

>
> Paolo
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
On 17/03/2017 20:36, Dr. David Alan Gilbert wrote:
> * Paolo Bonzini (pbonzini@redhat.com) wrote:
>> On 17/03/2017 14:02, Dr. David Alan Gilbert wrote:
>>>>>          case RAM_SAVE_FLAG_MULTIFD_PAGE:
>>>>>              fd_num = qemu_get_be16(f);
>>>>> -            if (fd_num != 0) {
>>>>> -                /* this is yet an unused variable, changed later */
>>>>> -                fd_num = fd_num;
>>>>> -            }
>>>>> +            multifd_recv_page(host, fd_num);
>>>>>              qemu_get_buffer(f, host, TARGET_PAGE_SIZE);
>>>>>              break;
>>>> I still believe this design is a mistake.
>>> Is it a use of a separate FD carrying all of the flags/addresses that
>>> you object to?
>>
>> Yes, it introduces a serialization point unnecessarily, and I don't
>> believe the rationale that Juan offered was strong enough.
>>
>> This is certainly true on the receive side, but serialization is not
>> even necessary on the send side.
>
> Is there an easy way to benchmark it (without writing both) to figure
> out if sending (word) (page) on one fd is less efficient than sending
> two fd's with the pages and words separate?

I think it shouldn't be hard to write a version which keeps the central
distributor but puts the metadata in the auxiliary fds too.

But I think what matters is not efficiency, but rather being more
future-proof.  Besides the liberty of changing the implementation,
Juan's current code simply has no commands in the auxiliary file
descriptors, which can be very limiting.

Paolo

>> Multiple threads can efficiently split
>> the work among themselves and visit the dirty bitmap without a central
>> distributor.
>
> I mostly agree; I kind of fancy the idea of having one per NUMA node;
> but a central distributor might be a good idea anyway in the cases
> where you find the heavy writers all happen to be in the same area.
>
>>
>> I need to study the code more to understand another issue.  Say you have
>> a page that is sent to two different threads in two different
>> iterations, like
>>
>>     thread 1
>>         iteration 1: pages 3, 7
>>     thread 2
>>         iteration 1: page 3
>>         iteration 2: page 7
>>
>> Does the code ensure that all threads wait at the end of an iteration?
>> Otherwise, thread 2 could process page 7 from iteration 2 before or
>> while thread 1 processes the same page from iteration 1.
>
> I think there's a sync at the end of each iteration on Juan's current
> code that stops that.
Paolo Bonzini <pbonzini@redhat.com> wrote:
> On 17/03/2017 20:36, Dr. David Alan Gilbert wrote:
>> * Paolo Bonzini (pbonzini@redhat.com) wrote:
>>> On 17/03/2017 14:02, Dr. David Alan Gilbert wrote:
>>>>>>          case RAM_SAVE_FLAG_MULTIFD_PAGE:
>>>>>>              fd_num = qemu_get_be16(f);
>>>>>> -            if (fd_num != 0) {
>>>>>> -                /* this is yet an unused variable, changed later */
>>>>>> -                fd_num = fd_num;
>>>>>> -            }
>>>>>> +            multifd_recv_page(host, fd_num);
>>>>>>              qemu_get_buffer(f, host, TARGET_PAGE_SIZE);
>>>>>>              break;
>>>>> I still believe this design is a mistake.
>>>> Is it a use of a separate FD carrying all of the flags/addresses that
>>>> you object to?
>>>
>>> Yes, it introduces a serialization point unnecessarily, and I don't
>>> believe the rationale that Juan offered was strong enough.
>>>
>>> This is certainly true on the receive side, but serialization is not
>>> even necessary on the send side.
>>
>> Is there an easy way to benchmark it (without writing both) to figure
>> out if sending (word) (page) on one fd is less efficient than sending
>> two fd's with the pages and words separate?
>
> I think it shouldn't be hard to write a version which keeps the central
> distributor but puts the metadata in the auxiliary fds too.

That is not difficult to do (famous last words).  I will try to test
both approaches for the next version, thanks.

>
> But I think what matters is not efficiency, but rather being more
> future-proof.  Besides the liberty of changing the implementation,
> Juan's current code simply has no commands in the auxiliary file
> descriptors, which can be very limiting.
>
> Paolo
>
>>> Multiple threads can efficiently split
>>> the work among themselves and visit the dirty bitmap without a central
>>> distributor.
>>
>> I mostly agree; I kind of fancy the idea of having one per NUMA node;
>> but a central distributor might be a good idea anyway in the cases
>> where you find the heavy writers all happen to be in the same area.
>>
>>>
>>> I need to study the code more to understand another issue.  Say you have
>>> a page that is sent to two different threads in two different
>>> iterations, like
>>>
>>>     thread 1
>>>         iteration 1: pages 3, 7
>>>     thread 2
>>>         iteration 1: page 3
>>>         iteration 2: page 7
>>>
>>> Does the code ensure that all threads wait at the end of an iteration?
>>> Otherwise, thread 2 could process page 7 from iteration 2 before or
>>> while thread 1 processes the same page from iteration 1.
>>
>> I think there's a sync at the end of each iteration on Juan's current
>> code that stops that.

This can't happen by design.  We sync all threads at the end of each
migration.

Later, Juan.
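Paolo's point about the auxiliary file descriptors carrying no commands can be made concrete with a hypothetical wire format that is not part of this series: prefix every transfer on each channel with a small self-describing header, which leaves room for commands such as a sync marker or a shutdown request. All names and fields below are illustrative assumptions, not QEMU code:

```c
#include <stdint.h>

enum {
    MULTIFD_CMD_PAGES = 1,   /* payload: 'count' pages of data */
    MULTIFD_CMD_SYNC  = 2,   /* no payload: end-of-iteration marker */
    MULTIFD_CMD_QUIT  = 3,   /* no payload: tear the channel down */
};

/* Fixed-size header sent on each auxiliary fd before the payload. */
typedef struct MultiFDPacketHdr {
    uint32_t magic;      /* sanity check against stream desync */
    uint32_t version;    /* lets the format evolve later */
    uint16_t cmd;        /* one of MULTIFD_CMD_* */
    uint16_t count;      /* number of pages that follow, if any */
    uint64_t offsets[];  /* 'count' guest RAM offsets; page data follows on the wire */
} MultiFDPacketHdr;
```

With such a header, the per-channel streams stay useful even if the central distributor is later removed, which is the forward-compatibility concern being discussed.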
diff --git a/migration/ram.c b/migration/ram.c
index 6f5ca50..3b1a2dc 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -44,6 +44,7 @@
 #include "exec/ram_addr.h"
 #include "qemu/rcu_queue.h"
 #include "migration/colo.h"
+#include "qemu/iov.h"

 static int dirty_rate_high_cnt;

@@ -536,7 +537,7 @@ int migrate_multifd_send_threads_create(void)
     return 0;
 }

-static int multifd_send_page(uint8_t *address)
+static uint16_t multifd_send_page(uint8_t *address, bool last_page)
 {
     int i, j;
     MultiFDSendParams *p = NULL; /* make happy gcc */
@@ -552,8 +553,10 @@ static int multifd_send_page(uint8_t *address)
     pages.iov[pages.num].iov_len = TARGET_PAGE_SIZE;
     pages.num++;

-    if (pages.num < (pages.size - 1)) {
-        return UINT16_MAX;
+    if (!last_page) {
+        if (pages.num < (pages.size - 1)) {
+            return UINT16_MAX;
+        }
     }

     qemu_sem_wait(&multifd_send_state->sem);
@@ -581,13 +584,18 @@
 }

 struct MultiFDRecvParams {
+    /* not changed */
     int id;
     QemuThread thread;
     QIOChannel *c;
     QemuSemaphore init;
+    QemuSemaphore ready;
     QemuSemaphore sem;
     QemuMutex mutex;
+    /* proteced by param mutex */
     bool quit;
+    multifd_pages_t pages;
+    bool done;
 };
 typedef struct MultiFDRecvParams MultiFDRecvParams;

@@ -641,6 +649,7 @@ static void *multifd_recv_thread(void *opaque)
     qio_channel_read(p->c, &start, 1, &error_abort);

     qemu_sem_post(&p->init);
+    qemu_sem_post(&p->ready);

     while (true) {
         qemu_mutex_lock(&p->mutex);
@@ -648,6 +657,13 @@
             qemu_mutex_unlock(&p->mutex);
             break;
         }
+        if (p->pages.num) {
+            p->pages.num = 0;
+            p->done = true;
+            qemu_mutex_unlock(&p->mutex);
+            qemu_sem_post(&p->ready);
+            continue;
+        }
         qemu_mutex_unlock(&p->mutex);
         qemu_sem_wait(&p->sem);
     }
@@ -672,8 +688,11 @@
         qemu_mutex_init(&p->mutex);
         qemu_sem_init(&p->sem, 0);
         qemu_sem_init(&p->init, 0);
+        qemu_sem_init(&p->ready, 0);
         p->quit = false;
         p->id = i;
+        p->done = false;
+        multifd_init_group(&p->pages);

         p->c = socket_recv_channel_create();
         if (!p->c) {
@@ -690,6 +709,42 @@
     return 0;
 }

+static void multifd_recv_page(uint8_t *address, uint16_t fd_num)
+{
+    int thread_count;
+    MultiFDRecvParams *p;
+    static multifd_pages_t pages;
+    static bool once;
+
+    if (!once) {
+        multifd_init_group(&pages);
+        once = true;
+    }
+
+    pages.iov[pages.num].iov_base = address;
+    pages.iov[pages.num].iov_len = TARGET_PAGE_SIZE;
+    pages.num++;
+
+    if (fd_num == UINT16_MAX) {
+        return;
+    }
+
+    thread_count = migrate_multifd_threads();
+    assert(fd_num < thread_count);
+    p = &multifd_recv_state->params[fd_num];
+
+    qemu_sem_wait(&p->ready);
+
+    qemu_mutex_lock(&p->mutex);
+    p->done = false;
+    iov_copy(p->pages.iov, pages.num, pages.iov, pages.num, 0,
+             iov_size(pages.iov, pages.num));
+    p->pages.num = pages.num;
+    pages.num = 0;
+    qemu_mutex_unlock(&p->mutex);
+    qemu_sem_post(&p->sem);
+}
+
 /**
  * save_page_header: Write page header to wire
  *
@@ -1153,7 +1208,7 @@ static int ram_multifd_page(QEMUFile *f, PageSearchStatus *pss,
     if (pages == -1) {
         *bytes_transferred += save_page_header(f, block,
                                                offset | RAM_SAVE_FLAG_MULTIFD_PAGE);
-        fd_num = multifd_send_page(p);
+        fd_num = multifd_send_page(p, migration_dirty_pages == 1);
         qemu_put_be16(f, fd_num);
         *bytes_transferred += 2; /* size of fd_num */
         qemu_put_buffer(f, p, TARGET_PAGE_SIZE);
@@ -3012,10 +3067,7 @@ static int ram_load(QEMUFile *f, void *opaque, int version_id)

         case RAM_SAVE_FLAG_MULTIFD_PAGE:
             fd_num = qemu_get_be16(f);
-            if (fd_num != 0) {
-                /* this is yet an unused variable, changed later */
-                fd_num = fd_num;
-            }
+            multifd_recv_page(host, fd_num);
             qemu_get_buffer(f, host, TARGET_PAGE_SIZE);
             break;
We make the locking and the transfer of information specific, even if
we are still receiving things through the main thread.

Signed-off-by: Juan Quintela <quintela@redhat.com>
---
 migration/ram.c | 68 ++++++++++++++++++++++++++++++++++++++++++++++++++-------
 1 file changed, 60 insertions(+), 8 deletions(-)
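The handshake the patch above sets up between the main receive thread (multifd_recv_page) and each multifd_recv_thread can be distilled into the following standalone sketch. It replaces QEMU's primitives with plain POSIX semaphores and a mutex, omits error handling and the actual iovec copy, and restructures the loop slightly, so it is illustrative only; all names are assumptions, not part of the series:

```c
#include <pthread.h>
#include <semaphore.h>
#include <stddef.h>
#include <stdio.h>

typedef struct Worker {
    sem_t ready;            /* worker -> main: "I am idle, send me a batch" */
    sem_t sem;              /* main -> worker: "a batch (or quit) is queued" */
    pthread_mutex_t mutex;  /* protects 'num' and 'quit' */
    size_t num;             /* number of pages queued for this worker */
    int quit;
} Worker;

/* Main-thread side, playing the role of multifd_recv_page(). */
static void queue_batch(Worker *w, size_t num_pages)
{
    sem_wait(&w->ready);             /* wait until the worker is idle */
    pthread_mutex_lock(&w->mutex);
    w->num = num_pages;              /* hand over the batch description */
    pthread_mutex_unlock(&w->mutex);
    sem_post(&w->sem);               /* wake the worker */
}

static void stop_worker(Worker *w)
{
    sem_wait(&w->ready);
    pthread_mutex_lock(&w->mutex);
    w->quit = 1;
    pthread_mutex_unlock(&w->mutex);
    sem_post(&w->sem);
}

/* Worker side, playing the role of multifd_recv_thread(). */
static void *recv_worker(void *opaque)
{
    Worker *w = opaque;

    sem_post(&w->ready);             /* initial "ready", as in the patch */
    for (;;) {
        sem_wait(&w->sem);
        pthread_mutex_lock(&w->mutex);
        if (w->quit) {
            pthread_mutex_unlock(&w->mutex);
            break;
        }
        printf("worker: would receive %zu pages on its own fd\n", w->num);
        w->num = 0;
        pthread_mutex_unlock(&w->mutex);
        sem_post(&w->ready);         /* "done, give me the next batch" */
    }
    return NULL;
}

int main(void)
{
    Worker w = { .num = 0, .quit = 0 };
    pthread_t t;

    sem_init(&w.ready, 0, 0);
    sem_init(&w.sem, 0, 0);
    pthread_mutex_init(&w.mutex, NULL);
    pthread_create(&t, NULL, recv_worker, &w);

    queue_batch(&w, 16);             /* hand one batch of 16 pages over */
    stop_worker(&w);
    pthread_join(t, NULL);
    return 0;
}
```

The key property, mirrored from the patch, is that the main thread never queues a new batch until the worker has posted "ready" for the previous one, so each auxiliary channel handles at most one batch at a time.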