[v3,20/29] postcopy: postcopy_notify_shared_wake

Message ID	20180216131625.9639-21-dgilbert@redhat.com (mailing list archive)
State	New, archived
Headers
	Return-Path: <qemu-devel-bounces+patchwork-qemu-devel=patchwork.kernel.org@nongnu.org>
	From: "Dr. David Alan Gilbert (git)" <dgilbert@redhat.com>
	To: qemu-devel@nongnu.org, maxime.coquelin@redhat.com, marcandre.lureau@redhat.com, peterx@redhat.com, imammedo@redhat.com, mst@redhat.com
	Date: Fri, 16 Feb 2018 13:16:16 +0000
	Message-Id: <20180216131625.9639-21-dgilbert@redhat.com>
	In-Reply-To: <20180216131625.9639-1-dgilbert@redhat.com>
	References: <20180216131625.9639-1-dgilbert@redhat.com>
	Subject: [Qemu-devel] [PATCH v3 20/29] postcopy: postcopy_notify_shared_wake
	Precedence: list
	Cc: aarcange@redhat.com, quintela@redhat.com
	Errors-To: qemu-devel-bounces+patchwork-qemu-devel=patchwork.kernel.org@nongnu.org
	Sender: "Qemu-devel" <qemu-devel-bounces+patchwork-qemu-devel=patchwork.kernel.org@nongnu.org>

Dr. David Alan Gilbert Feb. 16, 2018, 1:16 p.m. UTC

From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>

Add a hook to allow a client userfaultfd to be 'woken'
when a page arrives, and a walker that calls that
hook for relevant clients given a RAMBlock and offset.

Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
 migration/postcopy-ram.c | 16 ++++++++++++++++
 migration/postcopy-ram.h | 10 ++++++++++
 2 files changed, 26 insertions(+)

Peter Xu March 2, 2018, 7:51 a.m. UTC | #1

On Fri, Feb 16, 2018 at 01:16:16PM +0000, Dr. David Alan Gilbert (git) wrote:
> From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> 
> Add a hook to allow a client userfaultfd to be 'woken'
> when a page arrives, and a walker that calls that
> hook for relevant clients given a RAMBlock and offset.
> 
> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> ---
>  migration/postcopy-ram.c | 16 ++++++++++++++++
>  migration/postcopy-ram.h | 10 ++++++++++
>  2 files changed, 26 insertions(+)
> 
> diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c
> index 67deae7e1c..879711968c 100644
> --- a/migration/postcopy-ram.c
> +++ b/migration/postcopy-ram.c
> @@ -824,6 +824,22 @@ static int qemu_ufd_copy_ioctl(int userfault_fd, void *host_addr,
>      return ret;
>  }
>  
> +int postcopy_notify_shared_wake(RAMBlock *rb, uint64_t offset)
> +{
> +    int i;
> +    MigrationIncomingState *mis = migration_incoming_get_current();
> +    GArray *pcrfds = mis->postcopy_remote_fds;
> +
> +    for (i = 0; i < pcrfds->len; i++) {
> +        struct PostCopyFD *cur = &g_array_index(pcrfds, struct PostCopyFD, i);
> +        int ret = cur->waker(cur, rb, offset);
> +        if (ret) {
> +            return ret;
> +        }
> +    }
> +    return 0;
> +}
> +

We should know which FD needs which pages, right?  With that
information, could we notify only the clients that have page faulted on
exactly that page?  Otherwise we do a UFFDIO_WAKE for every
client each time a page is ready, even for clients that have not page
faulted at all?

But for the first version, I think it's fine.  And I believe if we
maintain the faulted addresses we need some way to sync between the
wake thread and fault thread too.  And I have no idea whether
this difference would be any kind of bottleneck at all, since I guess
the network link should still be the postcopy bottleneck considering
that 10g is mostly what we have now (or even 1g).

Reviewed-by: Peter Xu <peterx@redhat.com>

>  /*
>   * Place a host page (from) at (host) atomically
>   * returns 0 on success
> diff --git a/migration/postcopy-ram.h b/migration/postcopy-ram.h
> index 2e3dd844d5..2b71cf958e 100644
> --- a/migration/postcopy-ram.h
> +++ b/migration/postcopy-ram.h
> @@ -146,6 +146,10 @@ struct PostCopyFD;
>  
>  /* ufd is a pointer to the struct uffd_msg *TODO: more Portable! */
>  typedef int (*pcfdhandler)(struct PostCopyFD *pcfd, void *ufd);
> +/* Notification to wake, either on place or on reception of
> + * a fault on something that's already arrived (race)
> + */
> +typedef int (*pcfdwake)(struct PostCopyFD *pcfd, RAMBlock *rb, uint64_t offset);
>  
>  struct PostCopyFD {
>      int fd;
> @@ -153,6 +157,8 @@ struct PostCopyFD {
>      void *data;
>      /* Handler to be called whenever we get a poll event */
>      pcfdhandler handler;
> +    /* Notification to wake shared client */
> +    pcfdwake waker;
>      /* A string to use in error messages */
>      const char *idstr;
>  };
> @@ -162,6 +168,10 @@ struct PostCopyFD {
>   */
>  void postcopy_register_shared_ufd(struct PostCopyFD *pcfd);
>  void postcopy_unregister_shared_ufd(struct PostCopyFD *pcfd);
> +/* Call each of the shared 'waker's registered telling them of
> + * availability of a block.
> + */
> +int postcopy_notify_shared_wake(RAMBlock *rb, uint64_t offset);
>  /* Notify a client ufd that a page is available
>   * Note: The 'client_address' is in the address space of the client
>   * program not QEMU
> -- 
> 2.14.3
>

Dr. David Alan Gilbert March 5, 2018, 7:55 p.m. UTC | #2

* Peter Xu (peterx@redhat.com) wrote:
> On Fri, Feb 16, 2018 at 01:16:16PM +0000, Dr. David Alan Gilbert (git) wrote:
> > From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> > 
> > Add a hook to allow a client userfaultfd to be 'woken'
> > when a page arrives, and a walker that calls that
> > hook for relevant clients given a RAMBlock and offset.
> > 
> > Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> > ---
> >  migration/postcopy-ram.c | 16 ++++++++++++++++
> >  migration/postcopy-ram.h | 10 ++++++++++
> >  2 files changed, 26 insertions(+)
> > 
> > diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c
> > index 67deae7e1c..879711968c 100644
> > --- a/migration/postcopy-ram.c
> > +++ b/migration/postcopy-ram.c
> > @@ -824,6 +824,22 @@ static int qemu_ufd_copy_ioctl(int userfault_fd, void *host_addr,
> >      return ret;
> >  }
> >  
> > +int postcopy_notify_shared_wake(RAMBlock *rb, uint64_t offset)
> > +{
> > +    int i;
> > +    MigrationIncomingState *mis = migration_incoming_get_current();
> > +    GArray *pcrfds = mis->postcopy_remote_fds;
> > +
> > +    for (i = 0; i < pcrfds->len; i++) {
> > +        struct PostCopyFD *cur = &g_array_index(pcrfds, struct PostCopyFD, i);
> > +        int ret = cur->waker(cur, rb, offset);
> > +        if (ret) {
> > +            return ret;
> > +        }
> > +    }
> > +    return 0;
> > +}
> > +
> 
> We should know which FD needs which pages, right?  With that
> information, could we notify only the clients that have page faulted on
> exactly that page?  Otherwise we do a UFFDIO_WAKE for every
> client each time a page is ready, even for clients that have not page
> faulted at all?

The 'waker' function we call knows that, we don't; see the
'vhost_user_postcopy_waker' in the next patch, and it hunts down whether
the address the waker is called for is one it's responsible for.
Also note that a shared page might be shared between multiple other
programs - not just one.  In our case that could be two vhost-user
devices wired to two separate processes.
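
To make that division of labour concrete, here is a rough standalone
sketch.  It is not the QEMU code: RAMBlock and struct PostCopyFD are
cut down to simplified stand-ins, and ClientRange, range_waker and
notify_shared_wake are hypothetical names used only for illustration.
The walker calls every registered waker, and each waker decides for
itself whether the offset is one it is responsible for.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-ins for QEMU's RAMBlock and struct PostCopyFD. */
typedef struct RAMBlock { const char *idstr; } RAMBlock;

struct PostCopyFD;
typedef int (*pcfdwake)(struct PostCopyFD *pcfd, RAMBlock *rb, uint64_t offset);

struct PostCopyFD {
    void *data;       /* client-private state */
    pcfdwake waker;   /* called when a page becomes available */
};

/* Hypothetical client state: the single region this client has mapped. */
typedef struct ClientRange {
    RAMBlock *rb;
    uint64_t start;
    uint64_t len;
    unsigned wakes;   /* counts the wakes that were actually issued */
} ClientRange;

/* The waker filters: offsets outside its region are ignored, not errors. */
static int range_waker(struct PostCopyFD *pcfd, RAMBlock *rb, uint64_t offset)
{
    ClientRange *cr = pcfd->data;

    if (rb != cr->rb || offset < cr->start || offset - cr->start >= cr->len) {
        return 0;
    }
    cr->wakes++;      /* a real client would issue its wake here */
    return 0;
}

/* The walker knows nothing about ranges; it stops at the first error. */
static int notify_shared_wake(struct PostCopyFD *fds, size_t n,
                              RAMBlock *rb, uint64_t offset)
{
    for (size_t i = 0; i < n; i++) {
        assert(fds[i].waker != NULL);
        int ret = fds[i].waker(&fds[i], rb, offset);
        if (ret) {
            return ret;
        }
    }
    return 0;
}
```

With two clients registered on different blocks, a call for one block
reaches both wakers but only the owner of that block acts on it.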

> But for the first version, I think it's fine.  And I believe if we
> maintain the faulted addresses we need some way to sync between the
> wake thread and fault thread too.

Hmm can you explain that a bit more?

> And I have no idea whether
> this difference would be any kind of bottleneck at all, since I guess
> the network link should still be the postcopy bottleneck considering
> that 10g is mostly what we have now (or even 1g).
> 
> Reviewed-by: Peter Xu <peterx@redhat.com>

Thanks.

Dave

> 
> >  /*
> >   * Place a host page (from) at (host) atomically
> >   * returns 0 on success
> > diff --git a/migration/postcopy-ram.h b/migration/postcopy-ram.h
> > index 2e3dd844d5..2b71cf958e 100644
> > --- a/migration/postcopy-ram.h
> > +++ b/migration/postcopy-ram.h
> > @@ -146,6 +146,10 @@ struct PostCopyFD;
> >  
> >  /* ufd is a pointer to the struct uffd_msg *TODO: more Portable! */
> >  typedef int (*pcfdhandler)(struct PostCopyFD *pcfd, void *ufd);
> > +/* Notification to wake, either on place or on reception of
> > + * a fault on something that's already arrived (race)
> > + */
> > +typedef int (*pcfdwake)(struct PostCopyFD *pcfd, RAMBlock *rb, uint64_t offset);
> >  
> >  struct PostCopyFD {
> >      int fd;
> > @@ -153,6 +157,8 @@ struct PostCopyFD {
> >      void *data;
> >      /* Handler to be called whenever we get a poll event */
> >      pcfdhandler handler;
> > +    /* Notification to wake shared client */
> > +    pcfdwake waker;
> >      /* A string to use in error messages */
> >      const char *idstr;
> >  };
> > @@ -162,6 +168,10 @@ struct PostCopyFD {
> >   */
> >  void postcopy_register_shared_ufd(struct PostCopyFD *pcfd);
> >  void postcopy_unregister_shared_ufd(struct PostCopyFD *pcfd);
> > +/* Call each of the shared 'waker's registered telling them of
> > + * availability of a block.
> > + */
> > +int postcopy_notify_shared_wake(RAMBlock *rb, uint64_t offset);
> >  /* Notify a client ufd that a page is available
> >   * Note: The 'client_address' is in the address space of the client
> >   * program not QEMU
> > -- 
> > 2.14.3
> > 
> 
> -- 
> Peter Xu
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

Peter Xu March 6, 2018, 3:37 a.m. UTC | #3

On Mon, Mar 05, 2018 at 07:55:13PM +0000, Dr. David Alan Gilbert wrote:
> * Peter Xu (peterx@redhat.com) wrote:
> > On Fri, Feb 16, 2018 at 01:16:16PM +0000, Dr. David Alan Gilbert (git) wrote:
> > > From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> > > 
> > > Add a hook to allow a client userfaultfd to be 'woken'
> > > when a page arrives, and a walker that calls that
> > > hook for relevant clients given a RAMBlock and offset.
> > > 
> > > Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> > > ---
> > >  migration/postcopy-ram.c | 16 ++++++++++++++++
> > >  migration/postcopy-ram.h | 10 ++++++++++
> > >  2 files changed, 26 insertions(+)
> > > 
> > > diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c
> > > index 67deae7e1c..879711968c 100644
> > > --- a/migration/postcopy-ram.c
> > > +++ b/migration/postcopy-ram.c
> > > @@ -824,6 +824,22 @@ static int qemu_ufd_copy_ioctl(int userfault_fd, void *host_addr,
> > >      return ret;
> > >  }
> > >  
> > > +int postcopy_notify_shared_wake(RAMBlock *rb, uint64_t offset)
> > > +{
> > > +    int i;
> > > +    MigrationIncomingState *mis = migration_incoming_get_current();
> > > +    GArray *pcrfds = mis->postcopy_remote_fds;
> > > +
> > > +    for (i = 0; i < pcrfds->len; i++) {
> > > +        struct PostCopyFD *cur = &g_array_index(pcrfds, struct PostCopyFD, i);
> > > +        int ret = cur->waker(cur, rb, offset);
> > > +        if (ret) {
> > > +            return ret;
> > > +        }
> > > +    }
> > > +    return 0;
> > > +}
> > > +
> > 
> > We should know which FD needs which pages, right?  With that
> > information, could we notify only the clients that have page faulted on
> > exactly that page?  Otherwise we do a UFFDIO_WAKE for every
> > client each time a page is ready, even for clients that have not page
> > faulted at all?
> 
> The 'waker' function we call knows that, we don't; see the
> 'vhost_user_postcopy_waker' in the next patch, and it hunts down whether
> the address the waker is called for is one it's responsible for.

Vhost-user devices should always be responsible for nearly all of the
RAM exported on the guest, shouldn't they?  If so, they will be
notified to wake up every time a page is copied?

Here I was thinking about more than the responsible ranges: whether
each PostCopyFD could note down the faulted addresses that are still
waiting to be serviced.  Then when we do the wakeup, we could
possibly skip notifying a PostCopyFD if the copied page does not
cover any of the faulted addresses recorded for it?

> Also note that a shared page might be shared between multiple other
> programs - not just one.  In our case that could be two vhost-user
> devices wired to two separate processes.

Yeah, but the idea still stands IMHO - we can notify only those
PostCopyFDs that have already faulted on the page and skip the rest.
For sure there can be more than one candidate for the wakeup, since
there can be multiple PostCopyFDs that captured a page fault on the
same page (or even the same address).

> 
> > But for the first version, I think it's fine.  And I believe if we
> > maintain the faulted addresses we need some way to sync between the
> > wake thread and fault thread too.
> 
> Hmm can you explain that a bit more?

Basically the above is what I had in mind: record the faulted addresses
against the specific PostCopyFD when a page fault happens, so that we
know which page(s) each PostCopyFD needs.  But with that, we'll
probably need a lock to protect the information (or some other sync
method).
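
A minimal sketch of that bookkeeping, under stated assumptions: one
fixed-size table of pending fault addresses per client, guarded by a
pthread mutex that the fault thread (which records) and the wake
thread (which checks and clears) both take.  FaultTracker,
tracker_record, tracker_take and the fixed 4 KiB page size are all
illustrative and not part of this series.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TRACKER_SLOTS     64
#define TRACKER_PAGE_SIZE 4096u   /* assumption: fixed 4 KiB pages */

/* Hypothetical per-client record of pages with an outstanding fault. */
typedef struct FaultTracker {
    pthread_mutex_t lock;              /* shared by fault and wake threads */
    uint64_t pending[TRACKER_SLOTS];   /* page-aligned faulted addresses */
    size_t count;
} FaultTracker;

static uint64_t tracker_page(uint64_t addr)
{
    return addr & ~(uint64_t)(TRACKER_PAGE_SIZE - 1);
}

static void tracker_init(FaultTracker *t)
{
    assert(t != NULL);
    pthread_mutex_init(&t->lock, NULL);
    t->count = 0;
}

/* Fault thread: remember that this client faulted on addr.
 * Returns false when the table is full, in which case the caller
 * would have to fall back to waking unconditionally. */
static bool tracker_record(FaultTracker *t, uint64_t addr)
{
    uint64_t page = tracker_page(addr);
    bool ok = true;
    size_t i;

    pthread_mutex_lock(&t->lock);
    for (i = 0; i < t->count && t->pending[i] != page; i++) {
        /* search for an existing entry */
    }
    if (i == t->count) {               /* not recorded yet */
        if (t->count < TRACKER_SLOTS) {
            t->pending[t->count++] = page;
        } else {
            ok = false;
        }
    }
    pthread_mutex_unlock(&t->lock);
    return ok;
}

/* Wake thread: true if the client was waiting on the page holding addr;
 * the entry is dropped so a later arrival of that page is skipped. */
static bool tracker_take(FaultTracker *t, uint64_t addr)
{
    uint64_t page = tracker_page(addr);
    bool found = false;

    pthread_mutex_lock(&t->lock);
    for (size_t i = 0; i < t->count; i++) {
        if (t->pending[i] == page) {
            t->pending[i] = t->pending[--t->count];   /* swap-remove */
            found = true;
            break;
        }
    }
    pthread_mutex_unlock(&t->lock);
    return found;
}
```

The sketch only covers the table and its lock.  It does not settle the
ordering between "page placed" and "fault recorded", which is the race
the patch's own comment on pcfdwake mentions.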

(Hope I didn't miss anything important along the way)

Thanks,

Dr. David Alan Gilbert March 6, 2018, 10:54 a.m. UTC | #4

* Peter Xu (peterx@redhat.com) wrote:
> On Mon, Mar 05, 2018 at 07:55:13PM +0000, Dr. David Alan Gilbert wrote:
> > * Peter Xu (peterx@redhat.com) wrote:
> > > On Fri, Feb 16, 2018 at 01:16:16PM +0000, Dr. David Alan Gilbert (git) wrote:
> > > > From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> > > > 
> > > > Add a hook to allow a client userfaultfd to be 'woken'
> > > > when a page arrives, and a walker that calls that
> > > > hook for relevant clients given a RAMBlock and offset.
> > > > 
> > > > Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> > > > ---
> > > >  migration/postcopy-ram.c | 16 ++++++++++++++++
> > > >  migration/postcopy-ram.h | 10 ++++++++++
> > > >  2 files changed, 26 insertions(+)
> > > > 
> > > > diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c
> > > > index 67deae7e1c..879711968c 100644
> > > > --- a/migration/postcopy-ram.c
> > > > +++ b/migration/postcopy-ram.c
> > > > @@ -824,6 +824,22 @@ static int qemu_ufd_copy_ioctl(int userfault_fd, void *host_addr,
> > > >      return ret;
> > > >  }
> > > >  
> > > > +int postcopy_notify_shared_wake(RAMBlock *rb, uint64_t offset)
> > > > +{
> > > > +    int i;
> > > > +    MigrationIncomingState *mis = migration_incoming_get_current();
> > > > +    GArray *pcrfds = mis->postcopy_remote_fds;
> > > > +
> > > > +    for (i = 0; i < pcrfds->len; i++) {
> > > > +        struct PostCopyFD *cur = &g_array_index(pcrfds, struct PostCopyFD, i);
> > > > +        int ret = cur->waker(cur, rb, offset);
> > > > +        if (ret) {
> > > > +            return ret;
> > > > +        }
> > > > +    }
> > > > +    return 0;
> > > > +}
> > > > +
> > > 
> > > We should know which FD needs which pages, right?  With that
> > > information, could we notify only the clients that have page faulted on
> > > exactly that page?  Otherwise we do a UFFDIO_WAKE for every
> > > client each time a page is ready, even for clients that have not page
> > > faulted at all?
> > 
> > The 'waker' function we call knows that, we don't; see the
> > 'vhost_user_postcopy_waker' in the next patch, and it hunts down whether
> > the address the waker is called for is one it's responsible for.
> 
> Vhost-user devices should always be responsible for nearly all of the
> RAM exported on the guest, shouldn't they?  If so, they will be
> notified to wake up every time a page is copied?

Right; but this patch isn't vhost-user specific; this is more general.

> Here I was thinking about more than the responsible ranges: whether
> each PostCopyFD could note down the faulted addresses that are still
> waiting to be serviced.  Then when we do the wakeup, we could
> possibly skip notifying a PostCopyFD if the copied page does not
> cover any of the faulted addresses recorded for it?

Yes, that would be possible - in this case I made that the job of
the device that had registered (i.e. the waker method) rather than
the core postcopy code.

> > Also note that a shared page might be shared between multiple other
> > programs - not just one.  In our case that could be two vhost-user
> > devices wired to two separate processes.
> 
> Yeah, but the idea still stands IMHO - we can notify only those
> PostCopyFDs that have already faulted on the page and skip the rest.
> For sure there can be more than one candidate for the wakeup, since
> there can be multiple PostCopyFDs that captured a page fault on the
> same page (or even the same address).
> 
> > 
> > > But for the first version, I think it's fine.  And I believe if we
> > > maintain the faulted addresses we need some way to sync between the
> > > wake thread and fault thread too.
> > 
> > Hmm can you explain that a bit more?
> 
> Basically the above is what I had in mind: record the faulted addresses
> against the specific PostCopyFD when a page fault happens, so that we
> know which page(s) each PostCopyFD needs.  But with that, we'll
> probably need a lock to protect the information (or some other sync
> method).

OK, but I think you're suggesting building a whole new data structure to
know which ones need notifying;  that sounds like a lot of extra
complexity for not much gain.

Dave

> (Hope I didn't miss anything important along the way)
> 
> Thanks,
> 
> -- 
> Peter Xu
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

Peter Xu March 7, 2018, 10:13 a.m. UTC | #5

On Tue, Mar 06, 2018 at 10:54:18AM +0000, Dr. David Alan Gilbert wrote:

[...]

> > Basically the above is what I had in mind: record the faulted addresses
> > against the specific PostCopyFD when a page fault happens, so that we
> > know which page(s) each PostCopyFD needs.  But with that, we'll
> > probably need a lock to protect the information (or some other sync
> > method).
> 
> OK, but I think you're suggesting building a whole new data structure to
> know which ones need notifying;  that sounds like a lot of extra
> complexity for not much gain.

Yes, we may need a new structure (or just a list of addresses?), and
indeed I have no idea how much that would help us.  I think it depends
on how many useless wakeups we would have, and how expensive each
such wakeup notification is.  Again, I think the current solution is
good enough as long as we don't see an explicit performance blocker,
and we can rethink that when really needed.  Thanks,
