diff mbox series

[8/9] migration: Add support for fdset with multifd + file

Message ID 20240426142042.14573-9-farosas@suse.de (mailing list archive)
State New, archived
Headers show
Series migration/mapped-ram: Add direct-io support | expand

Commit Message

Fabiano Rosas April 26, 2024, 2:20 p.m. UTC
Allow multifd to use an fdset when migrating to a file. This is useful
for the scenario where the management layer wants to have control over
the migration file.

By receiving the file descriptors directly, QEMU can delegate some
high level operating system operations to the management layer (such
as mandatory access control). The management layer might also want to
add its own headers before the migration stream.

Enable the "file:/dev/fdset/#" syntax for the multifd migration with
mapped-ram. The requirements for the fdset mechanism are:

On the migration source side:

- the fdset must contain two fds that are not duplicates between
  themselves;
- if direct-io is to be used, exactly one of the fds must have the
  O_DIRECT flag set;
- the file must be opened with WRONLY both times.

On the migration destination side:

- the fdset must contain one fd;
- the file must be opened with RDONLY.

Signed-off-by: Fabiano Rosas <farosas@suse.de>
---
 docs/devel/migration/main.rst       | 18 ++++++++++++++
 docs/devel/migration/mapped-ram.rst |  6 ++++-
 migration/file.c                    | 38 ++++++++++++++++++++++++++++-
 3 files changed, 60 insertions(+), 2 deletions(-)

Comments

Daniel P. Berrangé May 8, 2024, 8:53 a.m. UTC | #1
On Fri, Apr 26, 2024 at 11:20:41AM -0300, Fabiano Rosas wrote:
> Allow multifd to use an fdset when migrating to a file. This is useful
> for the scenario where the management layer wants to have control over
> the migration file.
> 
> By receiving the file descriptors directly, QEMU can delegate some
> high level operating system operations to the management layer (such
> as mandatory access control). The management layer might also want to
> add its own headers before the migration stream.
> 
> Enable the "file:/dev/fdset/#" syntax for the multifd migration with
> mapped-ram. The requirements for the fdset mechanism are:
> 
> On the migration source side:
> 
> - the fdset must contain two fds that are not duplicates between
>   themselves;
> - if direct-io is to be used, exactly one of the fds must have the
>   O_DIRECT flag set;
> - the file must be opened with WRONLY both times.
> 
> On the migration destination side:
> 
> - the fdset must contain one fd;
> - the file must be opened with RDONLY.
> 
> Signed-off-by: Fabiano Rosas <farosas@suse.de>
> ---
>  docs/devel/migration/main.rst       | 18 ++++++++++++++
>  docs/devel/migration/mapped-ram.rst |  6 ++++-
>  migration/file.c                    | 38 ++++++++++++++++++++++++++++-
>  3 files changed, 60 insertions(+), 2 deletions(-)
> 
> diff --git a/docs/devel/migration/main.rst b/docs/devel/migration/main.rst
> index 54385a23e5..50f6096470 100644
> --- a/docs/devel/migration/main.rst
> +++ b/docs/devel/migration/main.rst
> @@ -47,6 +47,24 @@ over any transport.
>    QEMU interference. Note that QEMU does not flush cached file
>    data/metadata at the end of migration.
>  
> +  The file migration also supports using a file that has already been
> +  opened. A set of file descriptors is passed to QEMU via an "fdset"
> +  (see add-fd QMP command documentation). This method allows a
> +  management application to have control over the migration file
> +  opening operation. There are, however, strict requirements to this
> +  interface:
> +
> +  On the migration source side:
> +    - if the multifd capability is to be used, the fdset must contain
> +      two file descriptors that are not duplicates between themselves;
> +    - if the direct-io capability is to be used, exactly one of the
> +      file descriptors must have the O_DIRECT flag set;
> +    - the file must be opened with WRONLY.
> +
> +  On the migration destination side:
> +    - the fdset must contain one file descriptor;
> +    - the file must be opened with RDONLY.
> +
>  In addition, support is included for migration using RDMA, which
>  transports the page data using ``RDMA``, where the hardware takes care of
>  transporting the pages, and the load on the CPU is much lower.  While the
> diff --git a/docs/devel/migration/mapped-ram.rst b/docs/devel/migration/mapped-ram.rst
> index fa4cefd9fc..e6505511f0 100644
> --- a/docs/devel/migration/mapped-ram.rst
> +++ b/docs/devel/migration/mapped-ram.rst
> @@ -16,7 +16,7 @@ location in the file, rather than constantly being added to a
>  sequential stream. Having the pages at fixed offsets also allows the
>  usage of O_DIRECT for save/restore of the migration stream as the
>  pages are ensured to be written respecting O_DIRECT alignment
> -restrictions (direct-io support not yet implemented).
> +restrictions.
>  
>  Usage
>  -----
> @@ -35,6 +35,10 @@ Use a ``file:`` URL for migration:
>  Mapped-ram migration is best done non-live, i.e. by stopping the VM on
>  the source side before migrating.
>  
> +For best performance enable the ``direct-io`` capability as well:
> +
> +    ``migrate_set_capability direct-io on``
> +
>  Use-cases
>  ---------
>  
> diff --git a/migration/file.c b/migration/file.c
> index b9265b14dd..3bc8bc7463 100644
> --- a/migration/file.c
> +++ b/migration/file.c
> @@ -17,6 +17,7 @@
>  #include "io/channel-file.h"
>  #include "io/channel-socket.h"
>  #include "io/channel-util.h"
> +#include "monitor/monitor.h"
>  #include "options.h"
>  #include "trace.h"
>  
> @@ -54,10 +55,18 @@ static void file_remove_fdset(void)
>      }
>  }
>  
> +/*
> + * With multifd, due to the behavior of the dup() system call, we need
> + * the fdset to have two non-duplicate fds so we can enable direct IO
> + * in the secondary channels without affecting the main channel.
> + */
>  static bool file_parse_fdset(const char *filename, int64_t *fdset_id,
>                               Error **errp)
>  {
> +    FdsetInfoList *fds_info;
> +    FdsetFdInfoList *fd_info;
>      const char *fdset_id_str;
> +    int nfds = 0;
>  
>      *fdset_id = -1;
>  
> @@ -71,6 +80,32 @@ static bool file_parse_fdset(const char *filename, int64_t *fdset_id,
>          return false;
>      }
>  
> +    if (!migrate_multifd() || !migrate_direct_io()) {
> +        return true;
> +    }
> +
> +    for (fds_info = qmp_query_fdsets(NULL); fds_info;
> +         fds_info = fds_info->next) {
> +
> +        if (*fdset_id != fds_info->value->fdset_id) {
> +            continue;
> +        }
> +
> +        for (fd_info = fds_info->value->fds; fd_info; fd_info = fd_info->next) {
> +            if (nfds++ > 2) {
> +                break;
> +            }
> +        }
> +    }
> +
> +    if (nfds != 2) {
> +        error_setg(errp, "Outgoing migration needs two fds in the fdset, "
> +                   "got %d", nfds);
> +        qmp_remove_fd(*fdset_id, false, -1, NULL);
> +        *fdset_id = -1;
> +        return false;
> +    }
> +
>      return true;
>  }

Related to my thoughts in an earlier patch, where I say that use of fdsets
ought to be transparent to QEMU code, I'm not a fan of having this logic
in migration code.

IIUC, the migration code will call  qio_channel_file_new_path twice,
once with O_DIRECT and once without. This should trigger two calls
into monitor_fdset_dup_fd_add with different flags. If we're matching
flags in that monitor_fdset_dup_fd_add(), then if only 1 FD was
provided, are we not able to report an error there ?

>  
> @@ -209,10 +244,11 @@ void file_start_incoming_migration(FileMigrationArgs *file_args, Error **errp)
>      g_autofree char *filename = g_strdup(file_args->filename);
>      QIOChannelFile *fioc = NULL;
>      uint64_t offset = file_args->offset;
> +    int flags = O_RDONLY;
>  
>      trace_migration_file_incoming(filename);
>  
> -    fioc = qio_channel_file_new_path(filename, O_RDONLY, 0, errp);
> +    fioc = qio_channel_file_new_path(filename, flags, 0, errp);
>      if (!fioc) {
>          return;
>      }
> -- 
> 2.35.3
> 

With regards,
Daniel
Peter Xu May 8, 2024, 6:23 p.m. UTC | #2
On Wed, May 08, 2024 at 09:53:48AM +0100, Daniel P. Berrangé wrote:
> On Fri, Apr 26, 2024 at 11:20:41AM -0300, Fabiano Rosas wrote:
> > Allow multifd to use an fdset when migrating to a file. This is useful
> > for the scenario where the management layer wants to have control over
> > the migration file.
> > 
> > By receiving the file descriptors directly, QEMU can delegate some
> > high level operating system operations to the management layer (such
> > as mandatory access control). The management layer might also want to
> > add its own headers before the migration stream.
> > 
> > Enable the "file:/dev/fdset/#" syntax for the multifd migration with
> > mapped-ram. The requirements for the fdset mechanism are:
> > 
> > On the migration source side:
> > 
> > - the fdset must contain two fds that are not duplicates between
> >   themselves;
> > - if direct-io is to be used, exactly one of the fds must have the
> >   O_DIRECT flag set;
> > - the file must be opened with WRONLY both times.
> > 
> > On the migration destination side:
> > 
> > - the fdset must contain one fd;
> > - the file must be opened with RDONLY.
> > 
> > Signed-off-by: Fabiano Rosas <farosas@suse.de>
> > ---
> >  docs/devel/migration/main.rst       | 18 ++++++++++++++
> >  docs/devel/migration/mapped-ram.rst |  6 ++++-
> >  migration/file.c                    | 38 ++++++++++++++++++++++++++++-
> >  3 files changed, 60 insertions(+), 2 deletions(-)
> > 
> > diff --git a/docs/devel/migration/main.rst b/docs/devel/migration/main.rst
> > index 54385a23e5..50f6096470 100644
> > --- a/docs/devel/migration/main.rst
> > +++ b/docs/devel/migration/main.rst
> > @@ -47,6 +47,24 @@ over any transport.
> >    QEMU interference. Note that QEMU does not flush cached file
> >    data/metadata at the end of migration.
> >  
> > +  The file migration also supports using a file that has already been
> > +  opened. A set of file descriptors is passed to QEMU via an "fdset"
> > +  (see add-fd QMP command documentation). This method allows a
> > +  management application to have control over the migration file
> > +  opening operation. There are, however, strict requirements to this
> > +  interface:
> > +
> > +  On the migration source side:
> > +    - if the multifd capability is to be used, the fdset must contain
> > +      two file descriptors that are not duplicates between themselves;
> > +    - if the direct-io capability is to be used, exactly one of the
> > +      file descriptors must have the O_DIRECT flag set;
> > +    - the file must be opened with WRONLY.
> > +
> > +  On the migration destination side:
> > +    - the fdset must contain one file descriptor;
> > +    - the file must be opened with RDONLY.
> > +
> >  In addition, support is included for migration using RDMA, which
> >  transports the page data using ``RDMA``, where the hardware takes care of
> >  transporting the pages, and the load on the CPU is much lower.  While the
> > diff --git a/docs/devel/migration/mapped-ram.rst b/docs/devel/migration/mapped-ram.rst
> > index fa4cefd9fc..e6505511f0 100644
> > --- a/docs/devel/migration/mapped-ram.rst
> > +++ b/docs/devel/migration/mapped-ram.rst
> > @@ -16,7 +16,7 @@ location in the file, rather than constantly being added to a
> >  sequential stream. Having the pages at fixed offsets also allows the
> >  usage of O_DIRECT for save/restore of the migration stream as the
> >  pages are ensured to be written respecting O_DIRECT alignment
> > -restrictions (direct-io support not yet implemented).
> > +restrictions.
> >  
> >  Usage
> >  -----
> > @@ -35,6 +35,10 @@ Use a ``file:`` URL for migration:
> >  Mapped-ram migration is best done non-live, i.e. by stopping the VM on
> >  the source side before migrating.
> >  
> > +For best performance enable the ``direct-io`` capability as well:
> > +
> > +    ``migrate_set_capability direct-io on``
> > +
> >  Use-cases
> >  ---------
> >  
> > diff --git a/migration/file.c b/migration/file.c
> > index b9265b14dd..3bc8bc7463 100644
> > --- a/migration/file.c
> > +++ b/migration/file.c
> > @@ -17,6 +17,7 @@
> >  #include "io/channel-file.h"
> >  #include "io/channel-socket.h"
> >  #include "io/channel-util.h"
> > +#include "monitor/monitor.h"
> >  #include "options.h"
> >  #include "trace.h"
> >  
> > @@ -54,10 +55,18 @@ static void file_remove_fdset(void)
> >      }
> >  }
> >  
> > +/*
> > + * With multifd, due to the behavior of the dup() system call, we need
> > + * the fdset to have two non-duplicate fds so we can enable direct IO
> > + * in the secondary channels without affecting the main channel.
> > + */
> >  static bool file_parse_fdset(const char *filename, int64_t *fdset_id,
> >                               Error **errp)
> >  {
> > +    FdsetInfoList *fds_info;
> > +    FdsetFdInfoList *fd_info;
> >      const char *fdset_id_str;
> > +    int nfds = 0;
> >  
> >      *fdset_id = -1;
> >  
> > @@ -71,6 +80,32 @@ static bool file_parse_fdset(const char *filename, int64_t *fdset_id,
> >          return false;
> >      }
> >  
> > +    if (!migrate_multifd() || !migrate_direct_io()) {
> > +        return true;
> > +    }
> > +
> > +    for (fds_info = qmp_query_fdsets(NULL); fds_info;
> > +         fds_info = fds_info->next) {
> > +
> > +        if (*fdset_id != fds_info->value->fdset_id) {
> > +            continue;
> > +        }
> > +
> > +        for (fd_info = fds_info->value->fds; fd_info; fd_info = fd_info->next) {
> > +            if (nfds++ > 2) {
> > +                break;
> > +            }
> > +        }
> > +    }
> > +
> > +    if (nfds != 2) {
> > +        error_setg(errp, "Outgoing migration needs two fds in the fdset, "
> > +                   "got %d", nfds);
> > +        qmp_remove_fd(*fdset_id, false, -1, NULL);
> > +        *fdset_id = -1;
> > +        return false;
> > +    }
> > +
> >      return true;
> >  }
> 
> Related to my thoughts in an earlier patch, where I say that use of fdsets
> ought to be transparent to QEMU code, I'm not a fan of having this logic
> in migration code.
> 
> IIUC, the migration code will call  qio_channel_file_new_path twice,
> once with O_DIRECT and once without. This should trigger two calls
> into monitor_fdset_dup_fd_add with different flags. If we're matching
> flags in that monitor_fdset_dup_fd_add(), then if only 1 FD was
> provided, are we not able to report an error there ?

Right, this sounds working.

For a real sanity check, we may want to somehow check the two fds returned
from qio_channel_file_new_path() to point to the same file underneath.

What mentioned in the other thread (kcmp with KCMP_FILE) might not work, as
the whole purpose of having two fds is to make sure they have different
struct file to back the fd (and only one of them has O_DIRECT).  fstat()
might work in this case over the st_ino field, etc. maybe fstatfs() too but
perhaps that's over cautious.  Just a pain to use two fds as a start..

Thanks,
Fabiano Rosas May 8, 2024, 8:39 p.m. UTC | #3
Peter Xu <peterx@redhat.com> writes:

> On Wed, May 08, 2024 at 09:53:48AM +0100, Daniel P. Berrangé wrote:
>> On Fri, Apr 26, 2024 at 11:20:41AM -0300, Fabiano Rosas wrote:
>> > Allow multifd to use an fdset when migrating to a file. This is useful
>> > for the scenario where the management layer wants to have control over
>> > the migration file.
>> > 
>> > By receiving the file descriptors directly, QEMU can delegate some
>> > high level operating system operations to the management layer (such
>> > as mandatory access control). The management layer might also want to
>> > add its own headers before the migration stream.
>> > 
>> > Enable the "file:/dev/fdset/#" syntax for the multifd migration with
>> > mapped-ram. The requirements for the fdset mechanism are:
>> > 
>> > On the migration source side:
>> > 
>> > - the fdset must contain two fds that are not duplicates between
>> >   themselves;
>> > - if direct-io is to be used, exactly one of the fds must have the
>> >   O_DIRECT flag set;
>> > - the file must be opened with WRONLY both times.
>> > 
>> > On the migration destination side:
>> > 
>> > - the fdset must contain one fd;
>> > - the file must be opened with RDONLY.
>> > 
>> > Signed-off-by: Fabiano Rosas <farosas@suse.de>
>> > ---
>> >  docs/devel/migration/main.rst       | 18 ++++++++++++++
>> >  docs/devel/migration/mapped-ram.rst |  6 ++++-
>> >  migration/file.c                    | 38 ++++++++++++++++++++++++++++-
>> >  3 files changed, 60 insertions(+), 2 deletions(-)
>> > 
>> > diff --git a/docs/devel/migration/main.rst b/docs/devel/migration/main.rst
>> > index 54385a23e5..50f6096470 100644
>> > --- a/docs/devel/migration/main.rst
>> > +++ b/docs/devel/migration/main.rst
>> > @@ -47,6 +47,24 @@ over any transport.
>> >    QEMU interference. Note that QEMU does not flush cached file
>> >    data/metadata at the end of migration.
>> >  
>> > +  The file migration also supports using a file that has already been
>> > +  opened. A set of file descriptors is passed to QEMU via an "fdset"
>> > +  (see add-fd QMP command documentation). This method allows a
>> > +  management application to have control over the migration file
>> > +  opening operation. There are, however, strict requirements to this
>> > +  interface:
>> > +
>> > +  On the migration source side:
>> > +    - if the multifd capability is to be used, the fdset must contain
>> > +      two file descriptors that are not duplicates between themselves;
>> > +    - if the direct-io capability is to be used, exactly one of the
>> > +      file descriptors must have the O_DIRECT flag set;
>> > +    - the file must be opened with WRONLY.
>> > +
>> > +  On the migration destination side:
>> > +    - the fdset must contain one file descriptor;
>> > +    - the file must be opened with RDONLY.
>> > +
>> >  In addition, support is included for migration using RDMA, which
>> >  transports the page data using ``RDMA``, where the hardware takes care of
>> >  transporting the pages, and the load on the CPU is much lower.  While the
>> > diff --git a/docs/devel/migration/mapped-ram.rst b/docs/devel/migration/mapped-ram.rst
>> > index fa4cefd9fc..e6505511f0 100644
>> > --- a/docs/devel/migration/mapped-ram.rst
>> > +++ b/docs/devel/migration/mapped-ram.rst
>> > @@ -16,7 +16,7 @@ location in the file, rather than constantly being added to a
>> >  sequential stream. Having the pages at fixed offsets also allows the
>> >  usage of O_DIRECT for save/restore of the migration stream as the
>> >  pages are ensured to be written respecting O_DIRECT alignment
>> > -restrictions (direct-io support not yet implemented).
>> > +restrictions.
>> >  
>> >  Usage
>> >  -----
>> > @@ -35,6 +35,10 @@ Use a ``file:`` URL for migration:
>> >  Mapped-ram migration is best done non-live, i.e. by stopping the VM on
>> >  the source side before migrating.
>> >  
>> > +For best performance enable the ``direct-io`` capability as well:
>> > +
>> > +    ``migrate_set_capability direct-io on``
>> > +
>> >  Use-cases
>> >  ---------
>> >  
>> > diff --git a/migration/file.c b/migration/file.c
>> > index b9265b14dd..3bc8bc7463 100644
>> > --- a/migration/file.c
>> > +++ b/migration/file.c
>> > @@ -17,6 +17,7 @@
>> >  #include "io/channel-file.h"
>> >  #include "io/channel-socket.h"
>> >  #include "io/channel-util.h"
>> > +#include "monitor/monitor.h"
>> >  #include "options.h"
>> >  #include "trace.h"
>> >  
>> > @@ -54,10 +55,18 @@ static void file_remove_fdset(void)
>> >      }
>> >  }
>> >  
>> > +/*
>> > + * With multifd, due to the behavior of the dup() system call, we need
>> > + * the fdset to have two non-duplicate fds so we can enable direct IO
>> > + * in the secondary channels without affecting the main channel.
>> > + */
>> >  static bool file_parse_fdset(const char *filename, int64_t *fdset_id,
>> >                               Error **errp)
>> >  {
>> > +    FdsetInfoList *fds_info;
>> > +    FdsetFdInfoList *fd_info;
>> >      const char *fdset_id_str;
>> > +    int nfds = 0;
>> >  
>> >      *fdset_id = -1;
>> >  
>> > @@ -71,6 +80,32 @@ static bool file_parse_fdset(const char *filename, int64_t *fdset_id,
>> >          return false;
>> >      }
>> >  
>> > +    if (!migrate_multifd() || !migrate_direct_io()) {
>> > +        return true;
>> > +    }
>> > +
>> > +    for (fds_info = qmp_query_fdsets(NULL); fds_info;
>> > +         fds_info = fds_info->next) {
>> > +
>> > +        if (*fdset_id != fds_info->value->fdset_id) {
>> > +            continue;
>> > +        }
>> > +
>> > +        for (fd_info = fds_info->value->fds; fd_info; fd_info = fd_info->next) {
>> > +            if (nfds++ > 2) {
>> > +                break;
>> > +            }
>> > +        }
>> > +    }
>> > +
>> > +    if (nfds != 2) {
>> > +        error_setg(errp, "Outgoing migration needs two fds in the fdset, "
>> > +                   "got %d", nfds);
>> > +        qmp_remove_fd(*fdset_id, false, -1, NULL);
>> > +        *fdset_id = -1;
>> > +        return false;
>> > +    }
>> > +
>> >      return true;
>> >  }
>> 
>> Related to my thoughts in an earlier patch, where I say that use of fdsets
>> ought to be transparent to QEMU code, I'm not a fan of having this logic
>> in migration code.
>> 
>> IIUC, the migration code will call  qio_channel_file_new_path twice,
>> once with O_DIRECT and once without. This should trigger two calls
>> into monitor_fdset_dup_fd_add with different flags. If we're matching
>> flags in that monitor_fdset_dup_fd_add(), then if only 1 FD was
>> provided, are we not able to report an error there ?
>
> Right, this sounds working.

It works, but due to how low-level fdset is, it's difficult to match the
low level error to anything meaningful we can report to the user. I'll
have to add an errp to monitor_fdset_dup_fd_add(). Its returns are not
very useful.

-1 with no errno
-1 with EACCES (should actually be EBADF)
-1 with ENOENT

There has been some discusstion around this before actually:

https://lists.gnu.org/archive/html/qemu-devel/2021-08/msg02544.html

Or, you know, let the management layer figure it out. We seem to be
heading in this direction already. I imagine once the code is written to
interact with QEMU, it would not have any reason to change, so it might
be ok to replace some of the code I'm adding in this series with
documentation and call it a day. I don't like this approach very much,
but it would definitely make this series way simpler.

>
> For a real sanity check, we may want to somehow check the two fds returned
> from qio_channel_file_new_path() to point to the same file underneath.
>
> What mentioned in the other thread (kcmp with KCMP_FILE) might not work, as
> the whole purpose of having two fds is to make sure they have different
> struct file to back the fd (and only one of them has O_DIRECT).  fstat()
> might work in this case over the st_ino field, etc. maybe fstatfs() too but
> perhaps that's over cautious.  Just a pain to use two fds as a start..
>
> Thanks,
Daniel P. Berrangé May 9, 2024, 8:08 a.m. UTC | #4
On Wed, May 08, 2024 at 05:39:53PM -0300, Fabiano Rosas wrote:
> Peter Xu <peterx@redhat.com> writes:
> 
> > On Wed, May 08, 2024 at 09:53:48AM +0100, Daniel P. Berrangé wrote:
> >> On Fri, Apr 26, 2024 at 11:20:41AM -0300, Fabiano Rosas wrote:
> >> > Allow multifd to use an fdset when migrating to a file. This is useful
> >> > for the scenario where the management layer wants to have control over
> >> > the migration file.
> >> > 
> >> > By receiving the file descriptors directly, QEMU can delegate some
> >> > high level operating system operations to the management layer (such
> >> > as mandatory access control). The management layer might also want to
> >> > add its own headers before the migration stream.
> >> > 
> >> > Enable the "file:/dev/fdset/#" syntax for the multifd migration with
> >> > mapped-ram. The requirements for the fdset mechanism are:
> >> > 
> >> > On the migration source side:
> >> > 
> >> > - the fdset must contain two fds that are not duplicates between
> >> >   themselves;
> >> > - if direct-io is to be used, exactly one of the fds must have the
> >> >   O_DIRECT flag set;
> >> > - the file must be opened with WRONLY both times.
> >> > 
> >> > On the migration destination side:
> >> > 
> >> > - the fdset must contain one fd;
> >> > - the file must be opened with RDONLY.
> >> > 
> >> > Signed-off-by: Fabiano Rosas <farosas@suse.de>
> >> > ---
> >> >  docs/devel/migration/main.rst       | 18 ++++++++++++++
> >> >  docs/devel/migration/mapped-ram.rst |  6 ++++-
> >> >  migration/file.c                    | 38 ++++++++++++++++++++++++++++-
> >> >  3 files changed, 60 insertions(+), 2 deletions(-)
> >> > 
> >> > diff --git a/docs/devel/migration/main.rst b/docs/devel/migration/main.rst
> >> > index 54385a23e5..50f6096470 100644
> >> > --- a/docs/devel/migration/main.rst
> >> > +++ b/docs/devel/migration/main.rst
> >> > @@ -47,6 +47,24 @@ over any transport.
> >> >    QEMU interference. Note that QEMU does not flush cached file
> >> >    data/metadata at the end of migration.
> >> >  
> >> > +  The file migration also supports using a file that has already been
> >> > +  opened. A set of file descriptors is passed to QEMU via an "fdset"
> >> > +  (see add-fd QMP command documentation). This method allows a
> >> > +  management application to have control over the migration file
> >> > +  opening operation. There are, however, strict requirements to this
> >> > +  interface:
> >> > +
> >> > +  On the migration source side:
> >> > +    - if the multifd capability is to be used, the fdset must contain
> >> > +      two file descriptors that are not duplicates between themselves;
> >> > +    - if the direct-io capability is to be used, exactly one of the
> >> > +      file descriptors must have the O_DIRECT flag set;
> >> > +    - the file must be opened with WRONLY.
> >> > +
> >> > +  On the migration destination side:
> >> > +    - the fdset must contain one file descriptor;
> >> > +    - the file must be opened with RDONLY.
> >> > +
> >> >  In addition, support is included for migration using RDMA, which
> >> >  transports the page data using ``RDMA``, where the hardware takes care of
> >> >  transporting the pages, and the load on the CPU is much lower.  While the
> >> > diff --git a/docs/devel/migration/mapped-ram.rst b/docs/devel/migration/mapped-ram.rst
> >> > index fa4cefd9fc..e6505511f0 100644
> >> > --- a/docs/devel/migration/mapped-ram.rst
> >> > +++ b/docs/devel/migration/mapped-ram.rst
> >> > @@ -16,7 +16,7 @@ location in the file, rather than constantly being added to a
> >> >  sequential stream. Having the pages at fixed offsets also allows the
> >> >  usage of O_DIRECT for save/restore of the migration stream as the
> >> >  pages are ensured to be written respecting O_DIRECT alignment
> >> > -restrictions (direct-io support not yet implemented).
> >> > +restrictions.
> >> >  
> >> >  Usage
> >> >  -----
> >> > @@ -35,6 +35,10 @@ Use a ``file:`` URL for migration:
> >> >  Mapped-ram migration is best done non-live, i.e. by stopping the VM on
> >> >  the source side before migrating.
> >> >  
> >> > +For best performance enable the ``direct-io`` capability as well:
> >> > +
> >> > +    ``migrate_set_capability direct-io on``
> >> > +
> >> >  Use-cases
> >> >  ---------
> >> >  
> >> > diff --git a/migration/file.c b/migration/file.c
> >> > index b9265b14dd..3bc8bc7463 100644
> >> > --- a/migration/file.c
> >> > +++ b/migration/file.c
> >> > @@ -17,6 +17,7 @@
> >> >  #include "io/channel-file.h"
> >> >  #include "io/channel-socket.h"
> >> >  #include "io/channel-util.h"
> >> > +#include "monitor/monitor.h"
> >> >  #include "options.h"
> >> >  #include "trace.h"
> >> >  
> >> > @@ -54,10 +55,18 @@ static void file_remove_fdset(void)
> >> >      }
> >> >  }
> >> >  
> >> > +/*
> >> > + * With multifd, due to the behavior of the dup() system call, we need
> >> > + * the fdset to have two non-duplicate fds so we can enable direct IO
> >> > + * in the secondary channels without affecting the main channel.
> >> > + */
> >> >  static bool file_parse_fdset(const char *filename, int64_t *fdset_id,
> >> >                               Error **errp)
> >> >  {
> >> > +    FdsetInfoList *fds_info;
> >> > +    FdsetFdInfoList *fd_info;
> >> >      const char *fdset_id_str;
> >> > +    int nfds = 0;
> >> >  
> >> >      *fdset_id = -1;
> >> >  
> >> > @@ -71,6 +80,32 @@ static bool file_parse_fdset(const char *filename, int64_t *fdset_id,
> >> >          return false;
> >> >      }
> >> >  
> >> > +    if (!migrate_multifd() || !migrate_direct_io()) {
> >> > +        return true;
> >> > +    }
> >> > +
> >> > +    for (fds_info = qmp_query_fdsets(NULL); fds_info;
> >> > +         fds_info = fds_info->next) {
> >> > +
> >> > +        if (*fdset_id != fds_info->value->fdset_id) {
> >> > +            continue;
> >> > +        }
> >> > +
> >> > +        for (fd_info = fds_info->value->fds; fd_info; fd_info = fd_info->next) {
> >> > +            if (nfds++ > 2) {
> >> > +                break;
> >> > +            }
> >> > +        }
> >> > +    }
> >> > +
> >> > +    if (nfds != 2) {
> >> > +        error_setg(errp, "Outgoing migration needs two fds in the fdset, "
> >> > +                   "got %d", nfds);
> >> > +        qmp_remove_fd(*fdset_id, false, -1, NULL);
> >> > +        *fdset_id = -1;
> >> > +        return false;
> >> > +    }
> >> > +
> >> >      return true;
> >> >  }
> >> 
> >> Related to my thoughts in an earlier patch, where I say that use of fdsets
> >> ought to be transparent to QEMU code, I'm not a fan of having this logic
> >> in migration code.
> >> 
> >> IIUC, the migration code will call  qio_channel_file_new_path twice,
> >> once with O_DIRECT and once without. This should trigger two calls
> >> into monitor_fdset_dup_fd_add with different flags. If we're matching
> >> flags in that monitor_fdset_dup_fd_add(), then if only 1 FD was
> >> provided, are we not able to report an error there ?
> >
> > Right, this sounds working.
> 
> It works, but due to how low-level fdset is, it's difficult to match the
> low level error to anything meaningful we can report to the user. I'll
> have to add an errp to monitor_fdset_dup_fd_add(). Its returns are not
> very useful.
> 
> -1 with no errno
> -1 with EACCES (should actually be EBADF)
> -1 with ENOENT
> 
> There has been some discusstion around this before actually:
> 
> https://lists.gnu.org/archive/html/qemu-devel/2021-08/msg02544.html

The only caller of monitor_fdset_dup_fd_add is qemu_open_internal
and that has a "Error **errp" parameter.  We should rewrite
monitor_fdset_dup_fd_add to also have an "Error **errp" at which
point we can actually report useful, actionable error messages
from it. Errnos be gone !

With regards,
Daniel
Fabiano Rosas May 17, 2024, 10:43 p.m. UTC | #5
Daniel P. Berrangé <berrange@redhat.com> writes:

> On Wed, May 08, 2024 at 05:39:53PM -0300, Fabiano Rosas wrote:
>> Peter Xu <peterx@redhat.com> writes:
>> 
>> > On Wed, May 08, 2024 at 09:53:48AM +0100, Daniel P. Berrangé wrote:
>> >> On Fri, Apr 26, 2024 at 11:20:41AM -0300, Fabiano Rosas wrote:
>> >> > Allow multifd to use an fdset when migrating to a file. This is useful
>> >> > for the scenario where the management layer wants to have control over
>> >> > the migration file.
>> >> > 
>> >> > By receiving the file descriptors directly, QEMU can delegate some
>> >> > high level operating system operations to the management layer (such
>> >> > as mandatory access control). The management layer might also want to
>> >> > add its own headers before the migration stream.
>> >> > 
>> >> > Enable the "file:/dev/fdset/#" syntax for the multifd migration with
>> >> > mapped-ram. The requirements for the fdset mechanism are:
>> >> > 
>> >> > On the migration source side:
>> >> > 
>> >> > - the fdset must contain two fds that are not duplicates between
>> >> >   themselves;
>> >> > - if direct-io is to be used, exactly one of the fds must have the
>> >> >   O_DIRECT flag set;
>> >> > - the file must be opened with WRONLY both times.
>> >> > 
>> >> > On the migration destination side:
>> >> > 
>> >> > - the fdset must contain one fd;
>> >> > - the file must be opened with RDONLY.
>> >> > 
>> >> > Signed-off-by: Fabiano Rosas <farosas@suse.de>
>> >> > ---
>> >> >  docs/devel/migration/main.rst       | 18 ++++++++++++++
>> >> >  docs/devel/migration/mapped-ram.rst |  6 ++++-
>> >> >  migration/file.c                    | 38 ++++++++++++++++++++++++++++-
>> >> >  3 files changed, 60 insertions(+), 2 deletions(-)
>> >> > 
>> >> > diff --git a/docs/devel/migration/main.rst b/docs/devel/migration/main.rst
>> >> > index 54385a23e5..50f6096470 100644
>> >> > --- a/docs/devel/migration/main.rst
>> >> > +++ b/docs/devel/migration/main.rst
>> >> > @@ -47,6 +47,24 @@ over any transport.
>> >> >    QEMU interference. Note that QEMU does not flush cached file
>> >> >    data/metadata at the end of migration.
>> >> >  
>> >> > +  The file migration also supports using a file that has already been
>> >> > +  opened. A set of file descriptors is passed to QEMU via an "fdset"
>> >> > +  (see add-fd QMP command documentation). This method allows a
>> >> > +  management application to have control over the migration file
>> >> > +  opening operation. There are, however, strict requirements to this
>> >> > +  interface:
>> >> > +
>> >> > +  On the migration source side:
>> >> > +    - if the multifd capability is to be used, the fdset must contain
>> >> > +      two file descriptors that are not duplicates between themselves;
>> >> > +    - if the direct-io capability is to be used, exactly one of the
>> >> > +      file descriptors must have the O_DIRECT flag set;
>> >> > +    - the file must be opened with WRONLY.
>> >> > +
>> >> > +  On the migration destination side:
>> >> > +    - the fdset must contain one file descriptor;
>> >> > +    - the file must be opened with RDONLY.
>> >> > +
>> >> >  In addition, support is included for migration using RDMA, which
>> >> >  transports the page data using ``RDMA``, where the hardware takes care of
>> >> >  transporting the pages, and the load on the CPU is much lower.  While the
>> >> > diff --git a/docs/devel/migration/mapped-ram.rst b/docs/devel/migration/mapped-ram.rst
>> >> > index fa4cefd9fc..e6505511f0 100644
>> >> > --- a/docs/devel/migration/mapped-ram.rst
>> >> > +++ b/docs/devel/migration/mapped-ram.rst
>> >> > @@ -16,7 +16,7 @@ location in the file, rather than constantly being added to a
>> >> >  sequential stream. Having the pages at fixed offsets also allows the
>> >> >  usage of O_DIRECT for save/restore of the migration stream as the
>> >> >  pages are ensured to be written respecting O_DIRECT alignment
>> >> > -restrictions (direct-io support not yet implemented).
>> >> > +restrictions.
>> >> >  
>> >> >  Usage
>> >> >  -----
>> >> > @@ -35,6 +35,10 @@ Use a ``file:`` URL for migration:
>> >> >  Mapped-ram migration is best done non-live, i.e. by stopping the VM on
>> >> >  the source side before migrating.
>> >> >  
>> >> > +For best performance enable the ``direct-io`` capability as well:
>> >> > +
>> >> > +    ``migrate_set_capability direct-io on``
>> >> > +
>> >> >  Use-cases
>> >> >  ---------
>> >> >  
>> >> > diff --git a/migration/file.c b/migration/file.c
>> >> > index b9265b14dd..3bc8bc7463 100644
>> >> > --- a/migration/file.c
>> >> > +++ b/migration/file.c
>> >> > @@ -17,6 +17,7 @@
>> >> >  #include "io/channel-file.h"
>> >> >  #include "io/channel-socket.h"
>> >> >  #include "io/channel-util.h"
>> >> > +#include "monitor/monitor.h"
>> >> >  #include "options.h"
>> >> >  #include "trace.h"
>> >> >  
>> >> > @@ -54,10 +55,18 @@ static void file_remove_fdset(void)
>> >> >      }
>> >> >  }
>> >> >  
>> >> > +/*
>> >> > + * With multifd, due to the behavior of the dup() system call, we need
>> >> > + * the fdset to have two non-duplicate fds so we can enable direct IO
>> >> > + * in the secondary channels without affecting the main channel.
>> >> > + */
>> >> >  static bool file_parse_fdset(const char *filename, int64_t *fdset_id,
>> >> >                               Error **errp)
>> >> >  {
>> >> > +    FdsetInfoList *fds_info;
>> >> > +    FdsetFdInfoList *fd_info;
>> >> >      const char *fdset_id_str;
>> >> > +    int nfds = 0;
>> >> >  
>> >> >      *fdset_id = -1;
>> >> >  
>> >> > @@ -71,6 +80,32 @@ static bool file_parse_fdset(const char *filename, int64_t *fdset_id,
>> >> >          return false;
>> >> >      }
>> >> >  
>> >> > +    if (!migrate_multifd() || !migrate_direct_io()) {
>> >> > +        return true;
>> >> > +    }
>> >> > +
>> >> > +    for (fds_info = qmp_query_fdsets(NULL); fds_info;
>> >> > +         fds_info = fds_info->next) {
>> >> > +
>> >> > +        if (*fdset_id != fds_info->value->fdset_id) {
>> >> > +            continue;
>> >> > +        }
>> >> > +
>> >> > +        for (fd_info = fds_info->value->fds; fd_info; fd_info = fd_info->next) {
>> >> > +            if (nfds++ > 2) {
>> >> > +                break;
>> >> > +            }
>> >> > +        }
>> >> > +    }
>> >> > +
>> >> > +    if (nfds != 2) {
>> >> > +        error_setg(errp, "Outgoing migration needs two fds in the fdset, "
>> >> > +                   "got %d", nfds);
>> >> > +        qmp_remove_fd(*fdset_id, false, -1, NULL);
>> >> > +        *fdset_id = -1;
>> >> > +        return false;
>> >> > +    }
>> >> > +
>> >> >      return true;
>> >> >  }
>> >> 
>> >> Related to my thoughts in an earlier patch, where I say that use of fdsets
>> >> ought to be transparent to QEMU code, I'm not a fan of having this logic
>> >> in migration code.
>> >> 
>> >> IIUC, the migration code will call  qio_channel_file_new_path twice,
>> >> once with O_DIRECT and once without. This should trigger two calls
>> >> into monitor_fdset_dup_fd_add with different flags. If we're matching
>> >> flags in that monitor_fdset_dup_fd_add(), then if only 1 FD was
>> >> provided, are we not able to report an error there ?
>> >
>> > Right, this sounds working.
>> 
>> It works, but due to how low-level fdset is, it's difficult to match the
>> low level error to anything meaningful we can report to the user. I'll
>> have to add an errp to monitor_fdset_dup_fd_add(). Its returns are not
>> very useful.
>> 
>> -1 with no errno
>> -1 with EACCES (should actually be EBADF)
>> -1 with ENOENT
>> 
>> There has been some discusstion around this before actually:
>> 
>> https://lists.gnu.org/archive/html/qemu-devel/2021-08/msg02544.html
>
> The only caller of monitor_fdset_dup_fd_add is qemu_open_internal
> and that has a "Error **errp" parameter.  We should rewrite
> monitor_fdset_dup_fd_add to also have an "Error **errp" at which
> point we can actually report useful, actionable error messages
> from it. Errnos be gone !

I can do that, but qemu_open_old() does not pass the error in. Please
see if this works for you:

-->8--
From 16e333cc5aeca1fab3f75f79048c0ab0d62d5b08 Mon Sep 17 00:00:00 2001
From: Fabiano Rosas <farosas@suse.de>
Date: Fri, 17 May 2024 19:30:39 -0300
Subject: [PATCH] io: Stop using qemu_open_old in channel-file

We want to make use of the Error object to report fdset errors from
qemu_open_internal() and passing the error pointer to qemu_open_old()
would require changing all callers. Move the file channel to the new
API instead.

Signed-off-by: Fabiano Rosas <farosas@suse.de>
---
 io/channel-file.c | 8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/io/channel-file.c b/io/channel-file.c
index 6436cfb6ae..2ea8d08360 100644
--- a/io/channel-file.c
+++ b/io/channel-file.c
@@ -68,11 +68,13 @@ qio_channel_file_new_path(const char *path,
 
     ioc = QIO_CHANNEL_FILE(object_new(TYPE_QIO_CHANNEL_FILE));
 
-    ioc->fd = qemu_open_old(path, flags, mode);
+    if (flags & O_CREAT) {
+        ioc->fd = qemu_create(path, flags & ~O_CREAT, mode, errp);
+    } else {
+        ioc->fd = qemu_open(path, flags, errp);
+    }
     if (ioc->fd < 0) {
         object_unref(OBJECT(ioc));
-        error_setg_errno(errp, errno,
-                         "Unable to open %s", path);
         return NULL;
     }
Daniel P. Berrangé May 18, 2024, 8:36 a.m. UTC | #6
On Fri, May 17, 2024 at 07:43:35PM -0300, Fabiano Rosas wrote:
> Daniel P. Berrangé <berrange@redhat.com> writes:
> 
> >
> > The only caller of monitor_fdset_dup_fd_add is qemu_open_internal
> > and that has a "Error **errp" parameter.  We should rewrite
> > monitor_fdset_dup_fd_add to also have an "Error **errp" at which
> > point we can actually report useful, actionable error messages
> > from it. Errnos be gone !
> 
> I can do that, but qemu_open_old() does not pass the error in. Please
> see if this works for you:
> 
> -->8--
> From 16e333cc5aeca1fab3f75f79048c0ab0d62d5b08 Mon Sep 17 00:00:00 2001
> From: Fabiano Rosas <farosas@suse.de>
> Date: Fri, 17 May 2024 19:30:39 -0300
> Subject: [PATCH] io: Stop using qemu_open_old in channel-file
> 
> We want to make use of the Error object to report fdset errors from
> qemu_open_internal() and passing the error pointer to qemu_open_old()
> would require changing all callers. Move the file channel to the new
> API instead.
> 
> Signed-off-by: Fabiano Rosas <farosas@suse.de>
> ---
>  io/channel-file.c | 8 +++++---
>  1 file changed, 5 insertions(+), 3 deletions(-)
> 
> diff --git a/io/channel-file.c b/io/channel-file.c
> index 6436cfb6ae..2ea8d08360 100644
> --- a/io/channel-file.c
> +++ b/io/channel-file.c
> @@ -68,11 +68,13 @@ qio_channel_file_new_path(const char *path,
>  
>      ioc = QIO_CHANNEL_FILE(object_new(TYPE_QIO_CHANNEL_FILE));
>  
> -    ioc->fd = qemu_open_old(path, flags, mode);
> +    if (flags & O_CREAT) {
> +        ioc->fd = qemu_create(path, flags & ~O_CREAT, mode, errp);
> +    } else {
> +        ioc->fd = qemu_open(path, flags, errp);
> +    }
>      if (ioc->fd < 0) {
>          object_unref(OBJECT(ioc));
> -        error_setg_errno(errp, errno,
> -                         "Unable to open %s", path);
>          return NULL;
>      }

Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>


With regards,
Daniel
diff mbox series

Patch

diff --git a/docs/devel/migration/main.rst b/docs/devel/migration/main.rst
index 54385a23e5..50f6096470 100644
--- a/docs/devel/migration/main.rst
+++ b/docs/devel/migration/main.rst
@@ -47,6 +47,24 @@  over any transport.
   QEMU interference. Note that QEMU does not flush cached file
   data/metadata at the end of migration.
 
+  The file migration also supports using a file that has already been
+  opened. A set of file descriptors is passed to QEMU via an "fdset"
+  (see add-fd QMP command documentation). This method allows a
+  management application to have control over the migration file
+  opening operation. There are, however, strict requirements to this
+  interface:
+
+  On the migration source side:
+    - if the multifd capability is to be used, the fdset must contain
+      two file descriptors that are not duplicates between themselves;
+    - if the direct-io capability is to be used, exactly one of the
+      file descriptors must have the O_DIRECT flag set;
+    - the file must be opened with WRONLY.
+
+  On the migration destination side:
+    - the fdset must contain one file descriptor;
+    - the file must be opened with RDONLY.
+
 In addition, support is included for migration using RDMA, which
 transports the page data using ``RDMA``, where the hardware takes care of
 transporting the pages, and the load on the CPU is much lower.  While the
diff --git a/docs/devel/migration/mapped-ram.rst b/docs/devel/migration/mapped-ram.rst
index fa4cefd9fc..e6505511f0 100644
--- a/docs/devel/migration/mapped-ram.rst
+++ b/docs/devel/migration/mapped-ram.rst
@@ -16,7 +16,7 @@  location in the file, rather than constantly being added to a
 sequential stream. Having the pages at fixed offsets also allows the
 usage of O_DIRECT for save/restore of the migration stream as the
 pages are ensured to be written respecting O_DIRECT alignment
-restrictions (direct-io support not yet implemented).
+restrictions.
 
 Usage
 -----
@@ -35,6 +35,10 @@  Use a ``file:`` URL for migration:
 Mapped-ram migration is best done non-live, i.e. by stopping the VM on
 the source side before migrating.
 
+For best performance enable the ``direct-io`` capability as well:
+
+    ``migrate_set_capability direct-io on``
+
 Use-cases
 ---------
 
diff --git a/migration/file.c b/migration/file.c
index b9265b14dd..3bc8bc7463 100644
--- a/migration/file.c
+++ b/migration/file.c
@@ -17,6 +17,7 @@ 
 #include "io/channel-file.h"
 #include "io/channel-socket.h"
 #include "io/channel-util.h"
+#include "monitor/monitor.h"
 #include "options.h"
 #include "trace.h"
 
@@ -54,10 +55,18 @@  static void file_remove_fdset(void)
     }
 }
 
+/*
+ * With multifd, due to the behavior of the dup() system call, we need
+ * the fdset to have two non-duplicate fds so we can enable direct IO
+ * in the secondary channels without affecting the main channel.
+ */
 static bool file_parse_fdset(const char *filename, int64_t *fdset_id,
                              Error **errp)
 {
+    FdsetInfoList *fds_info;
+    FdsetFdInfoList *fd_info;
     const char *fdset_id_str;
+    int nfds = 0;
 
     *fdset_id = -1;
 
@@ -71,6 +80,32 @@  static bool file_parse_fdset(const char *filename, int64_t *fdset_id,
         return false;
     }
 
+    if (!migrate_multifd() || !migrate_direct_io()) {
+        return true;
+    }
+
+    for (fds_info = qmp_query_fdsets(NULL); fds_info;
+         fds_info = fds_info->next) {
+
+        if (*fdset_id != fds_info->value->fdset_id) {
+            continue;
+        }
+
+        for (fd_info = fds_info->value->fds; fd_info; fd_info = fd_info->next) {
+            if (nfds++ > 2) {
+                break;
+            }
+        }
+    }
+
+    if (nfds != 2) {
+        error_setg(errp, "Outgoing migration needs two fds in the fdset, "
+                   "got %d", nfds);
+        qmp_remove_fd(*fdset_id, false, -1, NULL);
+        *fdset_id = -1;
+        return false;
+    }
+
     return true;
 }
 
@@ -209,10 +244,11 @@  void file_start_incoming_migration(FileMigrationArgs *file_args, Error **errp)
     g_autofree char *filename = g_strdup(file_args->filename);
     QIOChannelFile *fioc = NULL;
     uint64_t offset = file_args->offset;
+    int flags = O_RDONLY;
 
     trace_migration_file_incoming(filename);
 
-    fioc = qio_channel_file_new_path(filename, O_RDONLY, 0, errp);
+    fioc = qio_channel_file_new_path(filename, flags, 0, errp);
     if (!fioc) {
         return;
     }