
[2/2] migration/multifd: Allow to sync with sender threads only

Message ID 20241205185303.897010-3-peterx@redhat.com (mailing list archive)
State New
Series migration/multifd: Some VFIO preparations

Commit Message

Peter Xu Dec. 5, 2024, 6:53 p.m. UTC
Teach multifd_send_sync_main() to sync with threads only.

We already have such a request: when mapped-ram is enabled with multifd.
In that case, no SYNC messages will be pushed to the stream when multifd
syncs the sender threads, because there are no destination threads waiting
for them.  The whole point of the sync is to make sure all threads have
flushed their jobs.

So fundamentally we have a request to do the sync in two different ways:

  - Either to sync the threads only,
  - Or to sync the threads but also with the destination side

Mapped-ram already does this thanks to the use_packets check in the sync
handler of the sender thread.  It works.

However, this may stop working when e.g. VFIO starts to reuse multifd
channels to push device state.  In that case VFIO has a similar request
for a "thread-only sync", but we can't rely on checking a flag, because
such a sync request can still come from RAM, which needs the on-wire
notifications.

Pave the way for that by allowing multifd_send_sync_main() to specify
what kind of sync the caller needs.  We can use it for mapped-ram already.

No functional change intended.

Signed-off-by: Peter Xu <peterx@redhat.com>
---
 migration/multifd.h        | 16 +++++++++++++---
 migration/multifd-nocomp.c |  8 +++++++-
 migration/multifd.c        | 14 ++++++++------
 3 files changed, 28 insertions(+), 10 deletions(-)
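
For illustration, a standalone toy model of the two request kinds described
above. This is not the QEMU code; only the enum names are taken from the
patch, everything else is a stand-in:

/* Toy model of the two sync request kinds (not QEMU code). */
#include <stdbool.h>
#include <stdio.h>

typedef enum {
    MULTIFD_SYNC_NONE = 0,            /* no sync request pending */
    MULTIFD_SYNC_THREADS,             /* flush the sender threads only */
    MULTIFD_SYNC_THREADS_AND_NOTIFY,  /* flush threads and push SYNC on the wire */
} MultiFDSyncReq;

/* Models the sync branch of a sender thread: reaching this point means all
 * previously queued jobs are done; the wire message is only needed when a
 * destination thread is waiting for it. */
static void handle_sync(MultiFDSyncReq req, bool *sync_msg_sent)
{
    if (req == MULTIFD_SYNC_THREADS_AND_NOTIFY) {
        *sync_msg_sent = true;   /* socket migration: destination waits for SYNC */
    }
}

int main(void)
{
    bool sent = false;

    handle_sync(MULTIFD_SYNC_THREADS, &sent);             /* mapped-ram / file */
    printf("mapped-ram: SYNC message sent = %d\n", sent);

    sent = false;
    handle_sync(MULTIFD_SYNC_THREADS_AND_NOTIFY, &sent);  /* live migration */
    printf("live:       SYNC message sent = %d\n", sent);
    return 0;
}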

Comments

Fabiano Rosas Dec. 5, 2024, 8:16 p.m. UTC | #1
Peter Xu <peterx@redhat.com> writes:

> Teach multifd_send_sync_main() to sync with threads only.
>
> We already have such requests, which is when mapped-ram is enabled with
> multifd.  In that case, no SYNC messages will be pushed to the stream when
> multifd syncs the sender threads because there's no destination threads
> waiting for that.  The whole point of the sync is to make sure all threads
> flushed their jobs.
>
> So fundamentally we have a request to do the sync in different ways:
>
>   - Either to sync the threads only,
>   - Or to sync the threads but also with the destination side
>
> Mapped-ram did it already because of the use_packet check in the sync
> handler of the sender thread.  It works.
>
> However it may stop working when e.g. VFIO may start to reuse multifd
> channels to push device states.  In that case VFIO has similar request on
> "thread-only sync" however we can't check a flag because such sync request
> can still come from RAM which needs the on-wire notifications.
>
> Paving way for that by allowing the multifd_send_sync_main() to specify
> what kind of sync the caller needs.  We can use it for mapped-ram already.
>
> No functional change intended.
>
> Signed-off-by: Peter Xu <peterx@redhat.com>
> ---
>  migration/multifd.h        | 16 +++++++++++++---
>  migration/multifd-nocomp.c |  8 +++++++-
>  migration/multifd.c        | 14 ++++++++------
>  3 files changed, 28 insertions(+), 10 deletions(-)
>
> diff --git a/migration/multifd.h b/migration/multifd.h
> index 50d58c0c9c..6b2f60a917 100644
> --- a/migration/multifd.h
> +++ b/migration/multifd.h
> @@ -19,6 +19,15 @@
>  typedef struct MultiFDRecvData MultiFDRecvData;
>  typedef struct MultiFDSendData MultiFDSendData;
>  
> +typedef enum {
> +    /* No sync request */
> +    MULTIFD_SYNC_NONE = 0,
> +    /* Sync on the sender threads without pushing messages */
> +    MULTIFD_SYNC_THREADS,
> +    /* Sync on the sender threads, meanwhile push "SYNC" message to the wire */

s/meanwhile//

> +    MULTIFD_SYNC_THREADS_AND_NOTIFY,
> +} MultiFDSyncReq;

I think I'd prefer the local vs. remote terminology I introduced in my
proposal [1] for cleaning up the multifd_flush_after_each_section() code:

LOCAL - sync the local threads between themselves
REMOTE - put a message on the stream for the remote end to perform a
         sync on their threads.

Down below you're passing MULTIFD_SYNC_THREADS_AND_NOTIFY into the send
thread, but the "sync threads" part of it is really done outside the
thread, so that part has no meaning inside the thread.

1- https://lore.kernel.org/r/875xo8n4ue.fsf@suse.de

Also, please provide your input there^; it would be nice to unify the
terminology and reasoning across both changes.
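
For concreteness, a rough sketch of the shape I have in mind (the names
below are placeholders only, nothing I'm set on):

/* Placeholder names, sketching the local/remote split: */
typedef enum {
    MULTIFD_SYNC_NONE = 0,
    /* Sync the local sender threads between themselves; nothing on the wire */
    MULTIFD_SYNC_LOCAL,
    /* Local sync, plus a SYNC message on the stream so the remote end syncs
     * its threads too */
    MULTIFD_SYNC_LOCAL_AND_REMOTE,
} MultiFDSyncReq;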

> +
>  bool multifd_send_setup(void);
>  void multifd_send_shutdown(void);
>  void multifd_send_channel_created(void);
> @@ -28,7 +37,7 @@ void multifd_recv_shutdown(void);
>  bool multifd_recv_all_channels_created(void);
>  void multifd_recv_new_channel(QIOChannel *ioc, Error **errp);
>  void multifd_recv_sync_main(void);
> -int multifd_send_sync_main(void);
> +int multifd_send_sync_main(MultiFDSyncReq req);
>  bool multifd_queue_page(RAMBlock *block, ram_addr_t offset);
>  bool multifd_recv(void);
>  MultiFDRecvData *multifd_get_recv_data(void);
> @@ -143,7 +152,7 @@ typedef struct {
>      /* multifd flags for each packet */
>      uint32_t flags;
>      /*
> -     * The sender thread has work to do if either of below boolean is set.
> +     * The sender thread has work to do if either of below field is set.
>       *
>       * @pending_job:  a job is pending
>       * @pending_sync: a sync request is pending
> @@ -152,7 +161,8 @@ typedef struct {
>       * cleared by the multifd sender threads.
>       */
>      bool pending_job;
> -    bool pending_sync;
> +    MultiFDSyncReq pending_sync;
> +
>      MultiFDSendData *data;
>  
>      /* thread local variables. No locking required */
> diff --git a/migration/multifd-nocomp.c b/migration/multifd-nocomp.c
> index 55191152f9..f64c4c9abd 100644
> --- a/migration/multifd-nocomp.c
> +++ b/migration/multifd-nocomp.c
> @@ -345,6 +345,8 @@ retry:
>  
>  int multifd_ram_flush_and_sync(void)
>  {
> +    MultiFDSyncReq req;
> +
>      if (!migrate_multifd()) {
>          return 0;
>      }
> @@ -356,7 +358,11 @@ int multifd_ram_flush_and_sync(void)
>          }
>      }
>  
> -    return multifd_send_sync_main();
> +    /* File migrations only need to sync with threads */
> +    req = migrate_mapped_ram() ?
> +        MULTIFD_SYNC_THREADS : MULTIFD_SYNC_THREADS_AND_NOTIFY;
> +
> +    return multifd_send_sync_main(req);
>  }
>  
>  bool multifd_send_prepare_common(MultiFDSendParams *p)
> diff --git a/migration/multifd.c b/migration/multifd.c
> index 498e71fd10..77645e87a0 100644
> --- a/migration/multifd.c
> +++ b/migration/multifd.c
> @@ -523,7 +523,7 @@ static int multifd_zero_copy_flush(QIOChannel *c)
>      return ret;
>  }
>  
> -int multifd_send_sync_main(void)
> +int multifd_send_sync_main(MultiFDSyncReq req)
>  {
>      int i;
>      bool flush_zero_copy;
> @@ -543,8 +543,8 @@ int multifd_send_sync_main(void)
>           * We should be the only user so far, so not possible to be set by
>           * others concurrently.
>           */
> -        assert(qatomic_read(&p->pending_sync) == false);
> -        qatomic_set(&p->pending_sync, true);
> +        assert(qatomic_read(&p->pending_sync) == MULTIFD_SYNC_NONE);
> +        qatomic_set(&p->pending_sync, req);

Hmm, isn't it easier to skip the whole loop if req ==
MULTIFD_SYNC_THREADS? I don't remember why we kept this loop here for
mapped-ram.

>          qemu_sem_post(&p->sem);
>      }
>      for (i = 0; i < migrate_multifd_channels(); i++) {
> @@ -635,14 +635,16 @@ static void *multifd_send_thread(void *opaque)
>               */
>              qatomic_store_release(&p->pending_job, false);
>          } else {
> +            MultiFDSyncReq req = qatomic_read(&p->pending_sync);
> +
>              /*
>               * If not a normal job, must be a sync request.  Note that
>               * pending_sync is a standalone flag (unlike pending_job), so
>               * it doesn't require explicit memory barriers.
>               */
> -            assert(qatomic_read(&p->pending_sync));
> +            assert(req != MULTIFD_SYNC_NONE);
>  
> -            if (use_packets) {
> +            if (req == MULTIFD_SYNC_THREADS_AND_NOTIFY) {

Good, more explicit.

>                  p->flags = MULTIFD_FLAG_SYNC;
>                  multifd_send_fill_packet(p);
>                  ret = qio_channel_write_all(p->c, (void *)p->packet,
> @@ -654,7 +656,7 @@ static void *multifd_send_thread(void *opaque)
>                  stat64_add(&mig_stats.multifd_bytes, p->packet_len);
>              }
>  
> -            qatomic_set(&p->pending_sync, false);
> +            qatomic_set(&p->pending_sync, MULTIFD_SYNC_NONE);

It's a bit weird that MULTIFD_SYNC_THREADS will never have a use inside
the thread. Makes me think it shouldn't exist in the first place. But
then we're back to pending_sync + use_packets... looks like it would
be less convoluted to skip the loop up there and assert(!use_packets) in
here.

Unless I'm missing something...

>              qemu_sem_post(&p->sem_sync);
>          }
>      }
Peter Xu Dec. 5, 2024, 8:35 p.m. UTC | #2
On Thu, Dec 05, 2024 at 05:16:05PM -0300, Fabiano Rosas wrote:
> Peter Xu <peterx@redhat.com> writes:
> 
> > Teach multifd_send_sync_main() to sync with threads only.
> >
> > We already have such requests, which is when mapped-ram is enabled with
> > multifd.  In that case, no SYNC messages will be pushed to the stream when
> > multifd syncs the sender threads because there's no destination threads
> > waiting for that.  The whole point of the sync is to make sure all threads
> > flushed their jobs.
> >
> > So fundamentally we have a request to do the sync in different ways:
> >
> >   - Either to sync the threads only,
> >   - Or to sync the threads but also with the destination side
> >
> > Mapped-ram did it already because of the use_packet check in the sync
> > handler of the sender thread.  It works.
> >
> > However it may stop working when e.g. VFIO may start to reuse multifd
> > channels to push device states.  In that case VFIO has similar request on
> > "thread-only sync" however we can't check a flag because such sync request
> > can still come from RAM which needs the on-wire notifications.
> >
> > Paving way for that by allowing the multifd_send_sync_main() to specify
> > what kind of sync the caller needs.  We can use it for mapped-ram already.
> >
> > No functional change intended.
> >
> > Signed-off-by: Peter Xu <peterx@redhat.com>
> > ---
> >  migration/multifd.h        | 16 +++++++++++++---
> >  migration/multifd-nocomp.c |  8 +++++++-
> >  migration/multifd.c        | 14 ++++++++------
> >  3 files changed, 28 insertions(+), 10 deletions(-)
> >
> > diff --git a/migration/multifd.h b/migration/multifd.h
> > index 50d58c0c9c..6b2f60a917 100644
> > --- a/migration/multifd.h
> > +++ b/migration/multifd.h
> > @@ -19,6 +19,15 @@
> >  typedef struct MultiFDRecvData MultiFDRecvData;
> >  typedef struct MultiFDSendData MultiFDSendData;
> >  
> > +typedef enum {
> > +    /* No sync request */
> > +    MULTIFD_SYNC_NONE = 0,
> > +    /* Sync on the sender threads without pushing messages */
> > +    MULTIFD_SYNC_THREADS,
> > +    /* Sync on the sender threads, meanwhile push "SYNC" message to the wire */
> 
> s/meanwhile//
> 
> > +    MULTIFD_SYNC_THREADS_AND_NOTIFY,
> > +} MultiFDSyncReq;
> 
> I think I'd prefer the local vs. remote terminology I introduced in my
> proposal [1] for cleaning up the multifd_flush_after_each_section() code:

I'm OK with your naming, as long as the comment explains it.

> 
> LOCAL - sync the local threads between themselves
> REMOTE - put a message on the stream for the remote end to perform a
>          sync on their threads.
> 
> Down below you're passing the
> MULTIFD_SYNC_THREADS_AND_NOTIFY into the send thread, but the "sync
> threads" part of this is really done outside the thread, so that part
> doesn't have a meaning inside the thread.
> 
> 1- https://lore.kernel.org/r/875xo8n4ue.fsf@suse.de
> 
> Also, please provide your input there^, it would be nice to unify the
> terminology and reasoning about both changes.

Yes, I'm mostly flushing my inbox in time order unless prioritized, so I'm
getting there today or tomorrow.

> 
> > +
> >  bool multifd_send_setup(void);
> >  void multifd_send_shutdown(void);
> >  void multifd_send_channel_created(void);
> > @@ -28,7 +37,7 @@ void multifd_recv_shutdown(void);
> >  bool multifd_recv_all_channels_created(void);
> >  void multifd_recv_new_channel(QIOChannel *ioc, Error **errp);
> >  void multifd_recv_sync_main(void);
> > -int multifd_send_sync_main(void);
> > +int multifd_send_sync_main(MultiFDSyncReq req);
> >  bool multifd_queue_page(RAMBlock *block, ram_addr_t offset);
> >  bool multifd_recv(void);
> >  MultiFDRecvData *multifd_get_recv_data(void);
> > @@ -143,7 +152,7 @@ typedef struct {
> >      /* multifd flags for each packet */
> >      uint32_t flags;
> >      /*
> > -     * The sender thread has work to do if either of below boolean is set.
> > +     * The sender thread has work to do if either of below field is set.
> >       *
> >       * @pending_job:  a job is pending
> >       * @pending_sync: a sync request is pending
> > @@ -152,7 +161,8 @@ typedef struct {
> >       * cleared by the multifd sender threads.
> >       */
> >      bool pending_job;
> > -    bool pending_sync;
> > +    MultiFDSyncReq pending_sync;
> > +
> >      MultiFDSendData *data;
> >  
> >      /* thread local variables. No locking required */
> > diff --git a/migration/multifd-nocomp.c b/migration/multifd-nocomp.c
> > index 55191152f9..f64c4c9abd 100644
> > --- a/migration/multifd-nocomp.c
> > +++ b/migration/multifd-nocomp.c
> > @@ -345,6 +345,8 @@ retry:
> >  
> >  int multifd_ram_flush_and_sync(void)
> >  {
> > +    MultiFDSyncReq req;
> > +
> >      if (!migrate_multifd()) {
> >          return 0;
> >      }
> > @@ -356,7 +358,11 @@ int multifd_ram_flush_and_sync(void)
> >          }
> >      }
> >  
> > -    return multifd_send_sync_main();
> > +    /* File migrations only need to sync with threads */
> > +    req = migrate_mapped_ram() ?
> > +        MULTIFD_SYNC_THREADS : MULTIFD_SYNC_THREADS_AND_NOTIFY;
> > +
> > +    return multifd_send_sync_main(req);
> >  }
> >  
> >  bool multifd_send_prepare_common(MultiFDSendParams *p)
> > diff --git a/migration/multifd.c b/migration/multifd.c
> > index 498e71fd10..77645e87a0 100644
> > --- a/migration/multifd.c
> > +++ b/migration/multifd.c
> > @@ -523,7 +523,7 @@ static int multifd_zero_copy_flush(QIOChannel *c)
> >      return ret;
> >  }
> >  
> > -int multifd_send_sync_main(void)
> > +int multifd_send_sync_main(MultiFDSyncReq req)
> >  {
> >      int i;
> >      bool flush_zero_copy;
> > @@ -543,8 +543,8 @@ int multifd_send_sync_main(void)
> >           * We should be the only user so far, so not possible to be set by
> >           * others concurrently.
> >           */
> > -        assert(qatomic_read(&p->pending_sync) == false);
> > -        qatomic_set(&p->pending_sync, true);
> > +        assert(qatomic_read(&p->pending_sync) == MULTIFD_SYNC_NONE);
> > +        qatomic_set(&p->pending_sync, req);
> 
> Hmm, isn't it easier to skip the whole loop if req ==
> MULTIFD_SYNC_THREADS? I don't remember why we kept this loop here for
> mapped-ram.

The "thread-only" version of request (or, in your preferred naming, "local"
sync request) says: "please flush all the works enqueued in sender thread".
Sync is still needed even for mapped-ram to make sure pwrite()s all land.
Also needed for VFIO.
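
To make that concrete, here is a tiny standalone model of the handshake.
It is not the QEMU code: POSIX semaphores and plain ints stand in for
p->sem, p->sem_sync and the qatomic_* flags, but it shows that by the time
the main thread gets the answer on sem_sync, every job queued before the
request (e.g. a pwrite()) has already completed:

/* Standalone model of the sem/sem_sync handshake.  Compile: cc -pthread model.c */
#include <pthread.h>
#include <semaphore.h>
#include <stdio.h>

static sem_t sem;        /* models p->sem: wake up the sender thread */
static sem_t sem_sync;   /* models p->sem_sync: sender answers the sync */
static int pending_job;  /* models p->pending_job */
static int pending_sync; /* models p->pending_sync (boolean here) */
static int jobs_done;    /* stands in for completed work, e.g. pwrite()s */

static void *sender_thread(void *arg)
{
    (void)arg;
    for (;;) {
        sem_wait(&sem);
        if (pending_job) {
            jobs_done++;              /* flush one queued job */
            pending_job = 0;
        } else if (pending_sync) {
            pending_sync = 0;
            sem_post(&sem_sync);      /* all earlier jobs are done by now */
            return NULL;
        }
    }
}

int main(void)
{
    pthread_t t;

    sem_init(&sem, 0, 0);
    sem_init(&sem_sync, 0, 0);
    pthread_create(&t, NULL, sender_thread, NULL);

    pending_job = 1;
    sem_post(&sem);          /* queue one job */
    pending_sync = 1;
    sem_post(&sem);          /* then request a (local) sync */
    sem_wait(&sem_sync);     /* returns only after the queued job completed */

    printf("jobs completed before the sync returned: %d\n", jobs_done);
    pthread_join(t, NULL);
    return 0;
}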

> 
> >          qemu_sem_post(&p->sem);
> >      }
> >      for (i = 0; i < migrate_multifd_channels(); i++) {
> > @@ -635,14 +635,16 @@ static void *multifd_send_thread(void *opaque)
> >               */
> >              qatomic_store_release(&p->pending_job, false);
> >          } else {
> > +            MultiFDSyncReq req = qatomic_read(&p->pending_sync);
> > +
> >              /*
> >               * If not a normal job, must be a sync request.  Note that
> >               * pending_sync is a standalone flag (unlike pending_job), so
> >               * it doesn't require explicit memory barriers.
> >               */
> > -            assert(qatomic_read(&p->pending_sync));
> > +            assert(req != MULTIFD_SYNC_NONE);
> >  
> > -            if (use_packets) {
> > +            if (req == MULTIFD_SYNC_THREADS_AND_NOTIFY) {
> 
> Good, more explicit.
> 
> >                  p->flags = MULTIFD_FLAG_SYNC;
> >                  multifd_send_fill_packet(p);
> >                  ret = qio_channel_write_all(p->c, (void *)p->packet,
> > @@ -654,7 +656,7 @@ static void *multifd_send_thread(void *opaque)
> >                  stat64_add(&mig_stats.multifd_bytes, p->packet_len);
> >              }
> >  
> > -            qatomic_set(&p->pending_sync, false);
> > +            qatomic_set(&p->pending_sync, MULTIFD_SYNC_NONE);
> 
> It's a bit weird that MULTIFD_SYNC_THREADS will never have an use inside
> the thread.

It has; it guarantees that any already-queued pending_job is completed.

> Makes me think it should never exist in the first place. But
> then we're back into pending_sync + use_packets... looks like it would
> be less convoluted to skip the loop up there and assert(!use_packets) in
> here.
> 
> Unless I'm missing something...
> 
> >              qemu_sem_post(&p->sem_sync);
> >          }
> >      }
>
Fabiano Rosas Dec. 5, 2024, 9:50 p.m. UTC | #3
Peter Xu <peterx@redhat.com> writes:

> On Thu, Dec 05, 2024 at 05:16:05PM -0300, Fabiano Rosas wrote:
>> Peter Xu <peterx@redhat.com> writes:
>> 
>> > Teach multifd_send_sync_main() to sync with threads only.
>> >
>> > We already have such requests, which is when mapped-ram is enabled with
>> > multifd.  In that case, no SYNC messages will be pushed to the stream when
>> > multifd syncs the sender threads because there's no destination threads
>> > waiting for that.  The whole point of the sync is to make sure all threads
>> > flushed their jobs.
>> >
>> > So fundamentally we have a request to do the sync in different ways:
>> >
>> >   - Either to sync the threads only,
>> >   - Or to sync the threads but also with the destination side
>> >
>> > Mapped-ram did it already because of the use_packet check in the sync
>> > handler of the sender thread.  It works.
>> >
>> > However it may stop working when e.g. VFIO may start to reuse multifd
>> > channels to push device states.  In that case VFIO has similar request on
>> > "thread-only sync" however we can't check a flag because such sync request
>> > can still come from RAM which needs the on-wire notifications.
>> >
>> > Paving way for that by allowing the multifd_send_sync_main() to specify
>> > what kind of sync the caller needs.  We can use it for mapped-ram already.
>> >
>> > No functional change intended.
>> >
>> > Signed-off-by: Peter Xu <peterx@redhat.com>
>> > ---
>> >  migration/multifd.h        | 16 +++++++++++++---
>> >  migration/multifd-nocomp.c |  8 +++++++-
>> >  migration/multifd.c        | 14 ++++++++------
>> >  3 files changed, 28 insertions(+), 10 deletions(-)
>> >
>> > diff --git a/migration/multifd.h b/migration/multifd.h
>> > index 50d58c0c9c..6b2f60a917 100644
>> > --- a/migration/multifd.h
>> > +++ b/migration/multifd.h
>> > @@ -19,6 +19,15 @@
>> >  typedef struct MultiFDRecvData MultiFDRecvData;
>> >  typedef struct MultiFDSendData MultiFDSendData;
>> >  
>> > +typedef enum {
>> > +    /* No sync request */
>> > +    MULTIFD_SYNC_NONE = 0,
>> > +    /* Sync on the sender threads without pushing messages */
>> > +    MULTIFD_SYNC_THREADS,
>> > +    /* Sync on the sender threads, meanwhile push "SYNC" message to the wire */
>> 
>> s/meanwhile//
>> 
>> > +    MULTIFD_SYNC_THREADS_AND_NOTIFY,
>> > +} MultiFDSyncReq;
>> 
>> I think I'd prefer the local vs. remote terminology I introduced in my
>> proposal [1] for cleaning up the multifd_flush_after_each_section() code:
>
> I'm ok with your naming, as long as the comment will explain.
>
>> 
>> LOCAL - sync the local threads between themselves
>> REMOTE - put a message on the stream for the remote end to perform a
>>          sync on their threads.
>> 
>> Down below you're passing the
>> MULTIFD_SYNC_THREADS_AND_NOTIFY into the send thread, but the "sync
>> threads" part of this is really done outside the thread, so that part
>> doesn't have a meaning inside the thread.
>> 
>> 1- https://lore.kernel.org/r/875xo8n4ue.fsf@suse.de
>> 
>> Also, please provide your input there^, it would be nice to unify the
>> terminology and reasoning about both changes.
>
> Yes, I'm mostly flushing my inbox in time order unless prioritized, so I'm
> getting there today or tomorrow.
>
>> 
>> > +
>> >  bool multifd_send_setup(void);
>> >  void multifd_send_shutdown(void);
>> >  void multifd_send_channel_created(void);
>> > @@ -28,7 +37,7 @@ void multifd_recv_shutdown(void);
>> >  bool multifd_recv_all_channels_created(void);
>> >  void multifd_recv_new_channel(QIOChannel *ioc, Error **errp);
>> >  void multifd_recv_sync_main(void);
>> > -int multifd_send_sync_main(void);
>> > +int multifd_send_sync_main(MultiFDSyncReq req);
>> >  bool multifd_queue_page(RAMBlock *block, ram_addr_t offset);
>> >  bool multifd_recv(void);
>> >  MultiFDRecvData *multifd_get_recv_data(void);
>> > @@ -143,7 +152,7 @@ typedef struct {
>> >      /* multifd flags for each packet */
>> >      uint32_t flags;
>> >      /*
>> > -     * The sender thread has work to do if either of below boolean is set.
>> > +     * The sender thread has work to do if either of below field is set.
>> >       *
>> >       * @pending_job:  a job is pending
>> >       * @pending_sync: a sync request is pending
>> > @@ -152,7 +161,8 @@ typedef struct {
>> >       * cleared by the multifd sender threads.
>> >       */
>> >      bool pending_job;
>> > -    bool pending_sync;
>> > +    MultiFDSyncReq pending_sync;
>> > +
>> >      MultiFDSendData *data;
>> >  
>> >      /* thread local variables. No locking required */
>> > diff --git a/migration/multifd-nocomp.c b/migration/multifd-nocomp.c
>> > index 55191152f9..f64c4c9abd 100644
>> > --- a/migration/multifd-nocomp.c
>> > +++ b/migration/multifd-nocomp.c
>> > @@ -345,6 +345,8 @@ retry:
>> >  
>> >  int multifd_ram_flush_and_sync(void)
>> >  {
>> > +    MultiFDSyncReq req;
>> > +
>> >      if (!migrate_multifd()) {
>> >          return 0;
>> >      }
>> > @@ -356,7 +358,11 @@ int multifd_ram_flush_and_sync(void)
>> >          }
>> >      }
>> >  
>> > -    return multifd_send_sync_main();
>> > +    /* File migrations only need to sync with threads */
>> > +    req = migrate_mapped_ram() ?
>> > +        MULTIFD_SYNC_THREADS : MULTIFD_SYNC_THREADS_AND_NOTIFY;
>> > +
>> > +    return multifd_send_sync_main(req);
>> >  }
>> >  
>> >  bool multifd_send_prepare_common(MultiFDSendParams *p)
>> > diff --git a/migration/multifd.c b/migration/multifd.c
>> > index 498e71fd10..77645e87a0 100644
>> > --- a/migration/multifd.c
>> > +++ b/migration/multifd.c
>> > @@ -523,7 +523,7 @@ static int multifd_zero_copy_flush(QIOChannel *c)
>> >      return ret;
>> >  }
>> >  
>> > -int multifd_send_sync_main(void)
>> > +int multifd_send_sync_main(MultiFDSyncReq req)
>> >  {
>> >      int i;
>> >      bool flush_zero_copy;
>> > @@ -543,8 +543,8 @@ int multifd_send_sync_main(void)
>> >           * We should be the only user so far, so not possible to be set by
>> >           * others concurrently.
>> >           */
>> > -        assert(qatomic_read(&p->pending_sync) == false);
>> > -        qatomic_set(&p->pending_sync, true);
>> > +        assert(qatomic_read(&p->pending_sync) == MULTIFD_SYNC_NONE);
>> > +        qatomic_set(&p->pending_sync, req);
>> 
>> Hmm, isn't it easier to skip the whole loop if req ==
>> MULTIFD_SYNC_THREADS? I don't remember why we kept this loop here for
>> mapped-ram.
>
> The "thread-only" version of request (or, in your preferred naming, "local"
> sync request) says: "please flush all the works enqueued in sender thread".
> Sync is still needed even for mapped-ram to make sure pwrite()s all land.
> Also needed for VFIO.

I think I remember now: what's needed is to post p->sem and wait on
p->sem_sync (one in each of these loops). We don't need to set the
pending_sync flag if it's not going to be used:

multifd_send_sync_main:
    for () {
        ...
        if (remote_sync) {
            assert(qatomic_read(&p->pending_sync) == false);
            qatomic_set(&p->pending_sync, true);
        }
        qemu_sem_post(&p->sem);
    }
    for () {
        ...
        qemu_sem_wait(&multifd_send_state->channels_ready);
        qemu_sem_wait(&p->sem_sync);
    }

in multifd_send_thread:

        if (qatomic_load_acquire(&p->pending_job)) {
            ...
            qatomic_store_release(&p->pending_job, false);
        } else if (qatomic_read(&p->pending_sync)) {
            ...
            p->flags = MULTIFD_FLAG_SYNC;
            qatomic_set(&p->pending_sync, false);
            qemu_sem_post(&p->sem_sync);
        } else {
            qemu_sem_post(&p->sem_sync);
        }

Is this clearer? Then we avoid the enum altogether; a boolean would
suffice.

>
>> 
>> >          qemu_sem_post(&p->sem);
>> >      }
>> >      for (i = 0; i < migrate_multifd_channels(); i++) {
>> > @@ -635,14 +635,16 @@ static void *multifd_send_thread(void *opaque)
>> >               */
>> >              qatomic_store_release(&p->pending_job, false);
>> >          } else {
>> > +            MultiFDSyncReq req = qatomic_read(&p->pending_sync);
>> > +
>> >              /*
>> >               * If not a normal job, must be a sync request.  Note that
>> >               * pending_sync is a standalone flag (unlike pending_job), so
>> >               * it doesn't require explicit memory barriers.
>> >               */
>> > -            assert(qatomic_read(&p->pending_sync));
>> > +            assert(req != MULTIFD_SYNC_NONE);
>> >  
>> > -            if (use_packets) {
>> > +            if (req == MULTIFD_SYNC_THREADS_AND_NOTIFY) {
>> 
>> Good, more explicit.
>> 
>> >                  p->flags = MULTIFD_FLAG_SYNC;
>> >                  multifd_send_fill_packet(p);
>> >                  ret = qio_channel_write_all(p->c, (void *)p->packet,
>> > @@ -654,7 +656,7 @@ static void *multifd_send_thread(void *opaque)
>> >                  stat64_add(&mig_stats.multifd_bytes, p->packet_len);
>> >              }
>> >  
>> > -            qatomic_set(&p->pending_sync, false);
>> > +            qatomic_set(&p->pending_sync, MULTIFD_SYNC_NONE);
>>
Peter Xu Dec. 5, 2024, 10:12 p.m. UTC | #4
On Thu, Dec 05, 2024 at 06:50:23PM -0300, Fabiano Rosas wrote:
> I think I remember now, what's needed is to release p->sem and wait on
> p->sem_sync (one in each of these loops). We don't need to set the
> pending_sync flag if it's not going to be used:
> 
> multifd_send_sync_main:
>     for () {
>         ...
>         if (remote_sync) {
>             assert(qatomic_read(&p->pending_sync) == false);
>             qatomic_set(&p->pending_sync, true);
>         }
>         qemu_sem_post(&p->sem);
>     }
>     for () {
>         ...
>         qemu_sem_wait(&multifd_send_state->channels_ready);
>         qemu_sem_wait(&p->sem_sync);
>     }
> 
> in multifd_send_thread:
> 
>         if (qatomic_load_acquire(&p->pending_job)) {
>             ...
>             qatomic_store_release(&p->pending_job, false);
>         } else if (qatomic_read(&p->pending_sync)) {
>             ...
>             p->flags = MULTIFD_FLAG_SYNC;
>             qatomic_set(&p->pending_sync, false);
>             qemu_sem_post(&p->sem_sync);
>         } else {

How do you trigger this "else" path at all without setting pending_sync
first?

>             qemu_sem_post(&p->sem_sync);
>         }
> 
> Is this clearer? Then we avoid the enum altogether, a boolean would
> suffice.

So one of us missed something. :)

In case it's me who missed it, a runnable patch would help clarify.
Peter Xu Dec. 5, 2024, 10:20 p.m. UTC | #5
On Thu, Dec 05, 2024 at 05:12:47PM -0500, Peter Xu wrote:
> In case if I missed it, a runnable patch would work to clarify.

Ohhh no need now, I see what you meant.

But then you'll really need to comment p->sem, with something like:

-    /* sem where to wait for more work */
+    /* sem where to wait for more work.  If there is no work at all, it
+     * means a local sync. */
     QemuSemaphore sem;

Do you like it?  I definitely don't, because it's confusing that p->sem on
its own can imply a sync request when we already have pending_sync.  IMHO
it's cleaner, since we have pending_sync, to use it for all kinds of syncs.

Patch

diff --git a/migration/multifd.h b/migration/multifd.h
index 50d58c0c9c..6b2f60a917 100644
--- a/migration/multifd.h
+++ b/migration/multifd.h
@@ -19,6 +19,15 @@ 
 typedef struct MultiFDRecvData MultiFDRecvData;
 typedef struct MultiFDSendData MultiFDSendData;
 
+typedef enum {
+    /* No sync request */
+    MULTIFD_SYNC_NONE = 0,
+    /* Sync on the sender threads without pushing messages */
+    MULTIFD_SYNC_THREADS,
+    /* Sync on the sender threads, meanwhile push "SYNC" message to the wire */
+    MULTIFD_SYNC_THREADS_AND_NOTIFY,
+} MultiFDSyncReq;
+
 bool multifd_send_setup(void);
 void multifd_send_shutdown(void);
 void multifd_send_channel_created(void);
@@ -28,7 +37,7 @@  void multifd_recv_shutdown(void);
 bool multifd_recv_all_channels_created(void);
 void multifd_recv_new_channel(QIOChannel *ioc, Error **errp);
 void multifd_recv_sync_main(void);
-int multifd_send_sync_main(void);
+int multifd_send_sync_main(MultiFDSyncReq req);
 bool multifd_queue_page(RAMBlock *block, ram_addr_t offset);
 bool multifd_recv(void);
 MultiFDRecvData *multifd_get_recv_data(void);
@@ -143,7 +152,7 @@  typedef struct {
     /* multifd flags for each packet */
     uint32_t flags;
     /*
-     * The sender thread has work to do if either of below boolean is set.
+     * The sender thread has work to do if either of below field is set.
      *
      * @pending_job:  a job is pending
      * @pending_sync: a sync request is pending
@@ -152,7 +161,8 @@  typedef struct {
      * cleared by the multifd sender threads.
      */
     bool pending_job;
-    bool pending_sync;
+    MultiFDSyncReq pending_sync;
+
     MultiFDSendData *data;
 
     /* thread local variables. No locking required */
diff --git a/migration/multifd-nocomp.c b/migration/multifd-nocomp.c
index 55191152f9..f64c4c9abd 100644
--- a/migration/multifd-nocomp.c
+++ b/migration/multifd-nocomp.c
@@ -345,6 +345,8 @@  retry:
 
 int multifd_ram_flush_and_sync(void)
 {
+    MultiFDSyncReq req;
+
     if (!migrate_multifd()) {
         return 0;
     }
@@ -356,7 +358,11 @@  int multifd_ram_flush_and_sync(void)
         }
     }
 
-    return multifd_send_sync_main();
+    /* File migrations only need to sync with threads */
+    req = migrate_mapped_ram() ?
+        MULTIFD_SYNC_THREADS : MULTIFD_SYNC_THREADS_AND_NOTIFY;
+
+    return multifd_send_sync_main(req);
 }
 
 bool multifd_send_prepare_common(MultiFDSendParams *p)
diff --git a/migration/multifd.c b/migration/multifd.c
index 498e71fd10..77645e87a0 100644
--- a/migration/multifd.c
+++ b/migration/multifd.c
@@ -523,7 +523,7 @@  static int multifd_zero_copy_flush(QIOChannel *c)
     return ret;
 }
 
-int multifd_send_sync_main(void)
+int multifd_send_sync_main(MultiFDSyncReq req)
 {
     int i;
     bool flush_zero_copy;
@@ -543,8 +543,8 @@  int multifd_send_sync_main(void)
          * We should be the only user so far, so not possible to be set by
          * others concurrently.
          */
-        assert(qatomic_read(&p->pending_sync) == false);
-        qatomic_set(&p->pending_sync, true);
+        assert(qatomic_read(&p->pending_sync) == MULTIFD_SYNC_NONE);
+        qatomic_set(&p->pending_sync, req);
         qemu_sem_post(&p->sem);
     }
     for (i = 0; i < migrate_multifd_channels(); i++) {
@@ -635,14 +635,16 @@  static void *multifd_send_thread(void *opaque)
              */
             qatomic_store_release(&p->pending_job, false);
         } else {
+            MultiFDSyncReq req = qatomic_read(&p->pending_sync);
+
             /*
              * If not a normal job, must be a sync request.  Note that
              * pending_sync is a standalone flag (unlike pending_job), so
              * it doesn't require explicit memory barriers.
              */
-            assert(qatomic_read(&p->pending_sync));
+            assert(req != MULTIFD_SYNC_NONE);
 
-            if (use_packets) {
+            if (req == MULTIFD_SYNC_THREADS_AND_NOTIFY) {
                 p->flags = MULTIFD_FLAG_SYNC;
                 multifd_send_fill_packet(p);
                 ret = qio_channel_write_all(p->c, (void *)p->packet,
@@ -654,7 +656,7 @@  static void *multifd_send_thread(void *opaque)
                 stat64_add(&mig_stats.multifd_bytes, p->packet_len);
             }
 
-            qatomic_set(&p->pending_sync, false);
+            qatomic_set(&p->pending_sync, MULTIFD_SYNC_NONE);
             qemu_sem_post(&p->sem_sync);
         }
     }