Message ID | 20240426142042.14573-9-farosas@suse.de (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | migration/mapped-ram: Add direct-io support | expand |
On Fri, Apr 26, 2024 at 11:20:41AM -0300, Fabiano Rosas wrote: > Allow multifd to use an fdset when migrating to a file. This is useful > for the scenario where the management layer wants to have control over > the migration file. > > By receiving the file descriptors directly, QEMU can delegate some > high level operating system operations to the management layer (such > as mandatory access control). The management layer might also want to > add its own headers before the migration stream. > > Enable the "file:/dev/fdset/#" syntax for the multifd migration with > mapped-ram. The requirements for the fdset mechanism are: > > On the migration source side: > > - the fdset must contain two fds that are not duplicates between > themselves; > - if direct-io is to be used, exactly one of the fds must have the > O_DIRECT flag set; > - the file must be opened with WRONLY both times. > > On the migration destination side: > > - the fdset must contain one fd; > - the file must be opened with RDONLY. > > Signed-off-by: Fabiano Rosas <farosas@suse.de> > --- > docs/devel/migration/main.rst | 18 ++++++++++++++ > docs/devel/migration/mapped-ram.rst | 6 ++++- > migration/file.c | 38 ++++++++++++++++++++++++++++- > 3 files changed, 60 insertions(+), 2 deletions(-) > > diff --git a/docs/devel/migration/main.rst b/docs/devel/migration/main.rst > index 54385a23e5..50f6096470 100644 > --- a/docs/devel/migration/main.rst > +++ b/docs/devel/migration/main.rst > @@ -47,6 +47,24 @@ over any transport. > QEMU interference. Note that QEMU does not flush cached file > data/metadata at the end of migration. > > + The file migration also supports using a file that has already been > + opened. A set of file descriptors is passed to QEMU via an "fdset" > + (see add-fd QMP command documentation). This method allows a > + management application to have control over the migration file > + opening operation. There are, however, strict requirements to this > + interface: > + > + On the migration source side: > + - if the multifd capability is to be used, the fdset must contain > + two file descriptors that are not duplicates between themselves; > + - if the direct-io capability is to be used, exactly one of the > + file descriptors must have the O_DIRECT flag set; > + - the file must be opened with WRONLY. > + > + On the migration destination side: > + - the fdset must contain one file descriptor; > + - the file must be opened with RDONLY. > + > In addition, support is included for migration using RDMA, which > transports the page data using ``RDMA``, where the hardware takes care of > transporting the pages, and the load on the CPU is much lower. While the > diff --git a/docs/devel/migration/mapped-ram.rst b/docs/devel/migration/mapped-ram.rst > index fa4cefd9fc..e6505511f0 100644 > --- a/docs/devel/migration/mapped-ram.rst > +++ b/docs/devel/migration/mapped-ram.rst > @@ -16,7 +16,7 @@ location in the file, rather than constantly being added to a > sequential stream. Having the pages at fixed offsets also allows the > usage of O_DIRECT for save/restore of the migration stream as the > pages are ensured to be written respecting O_DIRECT alignment > -restrictions (direct-io support not yet implemented). > +restrictions. > > Usage > ----- > @@ -35,6 +35,10 @@ Use a ``file:`` URL for migration: > Mapped-ram migration is best done non-live, i.e. by stopping the VM on > the source side before migrating. > > +For best performance enable the ``direct-io`` capability as well: > + > + ``migrate_set_capability direct-io on`` > + > Use-cases > --------- > > diff --git a/migration/file.c b/migration/file.c > index b9265b14dd..3bc8bc7463 100644 > --- a/migration/file.c > +++ b/migration/file.c > @@ -17,6 +17,7 @@ > #include "io/channel-file.h" > #include "io/channel-socket.h" > #include "io/channel-util.h" > +#include "monitor/monitor.h" > #include "options.h" > #include "trace.h" > > @@ -54,10 +55,18 @@ static void file_remove_fdset(void) > } > } > > +/* > + * With multifd, due to the behavior of the dup() system call, we need > + * the fdset to have two non-duplicate fds so we can enable direct IO > + * in the secondary channels without affecting the main channel. > + */ > static bool file_parse_fdset(const char *filename, int64_t *fdset_id, > Error **errp) > { > + FdsetInfoList *fds_info; > + FdsetFdInfoList *fd_info; > const char *fdset_id_str; > + int nfds = 0; > > *fdset_id = -1; > > @@ -71,6 +80,32 @@ static bool file_parse_fdset(const char *filename, int64_t *fdset_id, > return false; > } > > + if (!migrate_multifd() || !migrate_direct_io()) { > + return true; > + } > + > + for (fds_info = qmp_query_fdsets(NULL); fds_info; > + fds_info = fds_info->next) { > + > + if (*fdset_id != fds_info->value->fdset_id) { > + continue; > + } > + > + for (fd_info = fds_info->value->fds; fd_info; fd_info = fd_info->next) { > + if (nfds++ > 2) { > + break; > + } > + } > + } > + > + if (nfds != 2) { > + error_setg(errp, "Outgoing migration needs two fds in the fdset, " > + "got %d", nfds); > + qmp_remove_fd(*fdset_id, false, -1, NULL); > + *fdset_id = -1; > + return false; > + } > + > return true; > } Related to my thoughts in an earlier patch, where I say that use of fdsets ought to be transparent to QEMU code, I'm not a fan of having this logic in migration code. IIUC, the migration code will call qio_channel_file_new_path twice, once with O_DIRECT and once without. This should trigger two calls into monitor_fdset_dup_fd_add with different flags. If we're matching flags in that monitor_fdset_dup_fd_add(), then if only 1 FD was provided, are we not able to report an error there ? > > @@ -209,10 +244,11 @@ void file_start_incoming_migration(FileMigrationArgs *file_args, Error **errp) > g_autofree char *filename = g_strdup(file_args->filename); > QIOChannelFile *fioc = NULL; > uint64_t offset = file_args->offset; > + int flags = O_RDONLY; > > trace_migration_file_incoming(filename); > > - fioc = qio_channel_file_new_path(filename, O_RDONLY, 0, errp); > + fioc = qio_channel_file_new_path(filename, flags, 0, errp); > if (!fioc) { > return; > } > -- > 2.35.3 > With regards, Daniel
On Wed, May 08, 2024 at 09:53:48AM +0100, Daniel P. Berrangé wrote: > On Fri, Apr 26, 2024 at 11:20:41AM -0300, Fabiano Rosas wrote: > > Allow multifd to use an fdset when migrating to a file. This is useful > > for the scenario where the management layer wants to have control over > > the migration file. > > > > By receiving the file descriptors directly, QEMU can delegate some > > high level operating system operations to the management layer (such > > as mandatory access control). The management layer might also want to > > add its own headers before the migration stream. > > > > Enable the "file:/dev/fdset/#" syntax for the multifd migration with > > mapped-ram. The requirements for the fdset mechanism are: > > > > On the migration source side: > > > > - the fdset must contain two fds that are not duplicates between > > themselves; > > - if direct-io is to be used, exactly one of the fds must have the > > O_DIRECT flag set; > > - the file must be opened with WRONLY both times. > > > > On the migration destination side: > > > > - the fdset must contain one fd; > > - the file must be opened with RDONLY. > > > > Signed-off-by: Fabiano Rosas <farosas@suse.de> > > --- > > docs/devel/migration/main.rst | 18 ++++++++++++++ > > docs/devel/migration/mapped-ram.rst | 6 ++++- > > migration/file.c | 38 ++++++++++++++++++++++++++++- > > 3 files changed, 60 insertions(+), 2 deletions(-) > > > > diff --git a/docs/devel/migration/main.rst b/docs/devel/migration/main.rst > > index 54385a23e5..50f6096470 100644 > > --- a/docs/devel/migration/main.rst > > +++ b/docs/devel/migration/main.rst > > @@ -47,6 +47,24 @@ over any transport. > > QEMU interference. Note that QEMU does not flush cached file > > data/metadata at the end of migration. > > > > + The file migration also supports using a file that has already been > > + opened. A set of file descriptors is passed to QEMU via an "fdset" > > + (see add-fd QMP command documentation). This method allows a > > + management application to have control over the migration file > > + opening operation. There are, however, strict requirements to this > > + interface: > > + > > + On the migration source side: > > + - if the multifd capability is to be used, the fdset must contain > > + two file descriptors that are not duplicates between themselves; > > + - if the direct-io capability is to be used, exactly one of the > > + file descriptors must have the O_DIRECT flag set; > > + - the file must be opened with WRONLY. > > + > > + On the migration destination side: > > + - the fdset must contain one file descriptor; > > + - the file must be opened with RDONLY. > > + > > In addition, support is included for migration using RDMA, which > > transports the page data using ``RDMA``, where the hardware takes care of > > transporting the pages, and the load on the CPU is much lower. While the > > diff --git a/docs/devel/migration/mapped-ram.rst b/docs/devel/migration/mapped-ram.rst > > index fa4cefd9fc..e6505511f0 100644 > > --- a/docs/devel/migration/mapped-ram.rst > > +++ b/docs/devel/migration/mapped-ram.rst > > @@ -16,7 +16,7 @@ location in the file, rather than constantly being added to a > > sequential stream. Having the pages at fixed offsets also allows the > > usage of O_DIRECT for save/restore of the migration stream as the > > pages are ensured to be written respecting O_DIRECT alignment > > -restrictions (direct-io support not yet implemented). > > +restrictions. > > > > Usage > > ----- > > @@ -35,6 +35,10 @@ Use a ``file:`` URL for migration: > > Mapped-ram migration is best done non-live, i.e. by stopping the VM on > > the source side before migrating. > > > > +For best performance enable the ``direct-io`` capability as well: > > + > > + ``migrate_set_capability direct-io on`` > > + > > Use-cases > > --------- > > > > diff --git a/migration/file.c b/migration/file.c > > index b9265b14dd..3bc8bc7463 100644 > > --- a/migration/file.c > > +++ b/migration/file.c > > @@ -17,6 +17,7 @@ > > #include "io/channel-file.h" > > #include "io/channel-socket.h" > > #include "io/channel-util.h" > > +#include "monitor/monitor.h" > > #include "options.h" > > #include "trace.h" > > > > @@ -54,10 +55,18 @@ static void file_remove_fdset(void) > > } > > } > > > > +/* > > + * With multifd, due to the behavior of the dup() system call, we need > > + * the fdset to have two non-duplicate fds so we can enable direct IO > > + * in the secondary channels without affecting the main channel. > > + */ > > static bool file_parse_fdset(const char *filename, int64_t *fdset_id, > > Error **errp) > > { > > + FdsetInfoList *fds_info; > > + FdsetFdInfoList *fd_info; > > const char *fdset_id_str; > > + int nfds = 0; > > > > *fdset_id = -1; > > > > @@ -71,6 +80,32 @@ static bool file_parse_fdset(const char *filename, int64_t *fdset_id, > > return false; > > } > > > > + if (!migrate_multifd() || !migrate_direct_io()) { > > + return true; > > + } > > + > > + for (fds_info = qmp_query_fdsets(NULL); fds_info; > > + fds_info = fds_info->next) { > > + > > + if (*fdset_id != fds_info->value->fdset_id) { > > + continue; > > + } > > + > > + for (fd_info = fds_info->value->fds; fd_info; fd_info = fd_info->next) { > > + if (nfds++ > 2) { > > + break; > > + } > > + } > > + } > > + > > + if (nfds != 2) { > > + error_setg(errp, "Outgoing migration needs two fds in the fdset, " > > + "got %d", nfds); > > + qmp_remove_fd(*fdset_id, false, -1, NULL); > > + *fdset_id = -1; > > + return false; > > + } > > + > > return true; > > } > > Related to my thoughts in an earlier patch, where I say that use of fdsets > ought to be transparent to QEMU code, I'm not a fan of having this logic > in migration code. > > IIUC, the migration code will call qio_channel_file_new_path twice, > once with O_DIRECT and once without. This should trigger two calls > into monitor_fdset_dup_fd_add with different flags. If we're matching > flags in that monitor_fdset_dup_fd_add(), then if only 1 FD was > provided, are we not able to report an error there ? Right, this sounds working. For a real sanity check, we may want to somehow check the two fds returned from qio_channel_file_new_path() to point to the same file underneath. What mentioned in the other thread (kcmp with KCMP_FILE) might not work, as the whole purpose of having two fds is to make sure they have different struct file to back the fd (and only one of them has O_DIRECT). fstat() might work in this case over the st_ino field, etc. maybe fstatfs() too but perhaps that's over cautious. Just a pain to use two fds as a start.. Thanks,
Peter Xu <peterx@redhat.com> writes: > On Wed, May 08, 2024 at 09:53:48AM +0100, Daniel P. Berrangé wrote: >> On Fri, Apr 26, 2024 at 11:20:41AM -0300, Fabiano Rosas wrote: >> > Allow multifd to use an fdset when migrating to a file. This is useful >> > for the scenario where the management layer wants to have control over >> > the migration file. >> > >> > By receiving the file descriptors directly, QEMU can delegate some >> > high level operating system operations to the management layer (such >> > as mandatory access control). The management layer might also want to >> > add its own headers before the migration stream. >> > >> > Enable the "file:/dev/fdset/#" syntax for the multifd migration with >> > mapped-ram. The requirements for the fdset mechanism are: >> > >> > On the migration source side: >> > >> > - the fdset must contain two fds that are not duplicates between >> > themselves; >> > - if direct-io is to be used, exactly one of the fds must have the >> > O_DIRECT flag set; >> > - the file must be opened with WRONLY both times. >> > >> > On the migration destination side: >> > >> > - the fdset must contain one fd; >> > - the file must be opened with RDONLY. >> > >> > Signed-off-by: Fabiano Rosas <farosas@suse.de> >> > --- >> > docs/devel/migration/main.rst | 18 ++++++++++++++ >> > docs/devel/migration/mapped-ram.rst | 6 ++++- >> > migration/file.c | 38 ++++++++++++++++++++++++++++- >> > 3 files changed, 60 insertions(+), 2 deletions(-) >> > >> > diff --git a/docs/devel/migration/main.rst b/docs/devel/migration/main.rst >> > index 54385a23e5..50f6096470 100644 >> > --- a/docs/devel/migration/main.rst >> > +++ b/docs/devel/migration/main.rst >> > @@ -47,6 +47,24 @@ over any transport. >> > QEMU interference. Note that QEMU does not flush cached file >> > data/metadata at the end of migration. >> > >> > + The file migration also supports using a file that has already been >> > + opened. A set of file descriptors is passed to QEMU via an "fdset" >> > + (see add-fd QMP command documentation). This method allows a >> > + management application to have control over the migration file >> > + opening operation. There are, however, strict requirements to this >> > + interface: >> > + >> > + On the migration source side: >> > + - if the multifd capability is to be used, the fdset must contain >> > + two file descriptors that are not duplicates between themselves; >> > + - if the direct-io capability is to be used, exactly one of the >> > + file descriptors must have the O_DIRECT flag set; >> > + - the file must be opened with WRONLY. >> > + >> > + On the migration destination side: >> > + - the fdset must contain one file descriptor; >> > + - the file must be opened with RDONLY. >> > + >> > In addition, support is included for migration using RDMA, which >> > transports the page data using ``RDMA``, where the hardware takes care of >> > transporting the pages, and the load on the CPU is much lower. While the >> > diff --git a/docs/devel/migration/mapped-ram.rst b/docs/devel/migration/mapped-ram.rst >> > index fa4cefd9fc..e6505511f0 100644 >> > --- a/docs/devel/migration/mapped-ram.rst >> > +++ b/docs/devel/migration/mapped-ram.rst >> > @@ -16,7 +16,7 @@ location in the file, rather than constantly being added to a >> > sequential stream. Having the pages at fixed offsets also allows the >> > usage of O_DIRECT for save/restore of the migration stream as the >> > pages are ensured to be written respecting O_DIRECT alignment >> > -restrictions (direct-io support not yet implemented). >> > +restrictions. >> > >> > Usage >> > ----- >> > @@ -35,6 +35,10 @@ Use a ``file:`` URL for migration: >> > Mapped-ram migration is best done non-live, i.e. by stopping the VM on >> > the source side before migrating. >> > >> > +For best performance enable the ``direct-io`` capability as well: >> > + >> > + ``migrate_set_capability direct-io on`` >> > + >> > Use-cases >> > --------- >> > >> > diff --git a/migration/file.c b/migration/file.c >> > index b9265b14dd..3bc8bc7463 100644 >> > --- a/migration/file.c >> > +++ b/migration/file.c >> > @@ -17,6 +17,7 @@ >> > #include "io/channel-file.h" >> > #include "io/channel-socket.h" >> > #include "io/channel-util.h" >> > +#include "monitor/monitor.h" >> > #include "options.h" >> > #include "trace.h" >> > >> > @@ -54,10 +55,18 @@ static void file_remove_fdset(void) >> > } >> > } >> > >> > +/* >> > + * With multifd, due to the behavior of the dup() system call, we need >> > + * the fdset to have two non-duplicate fds so we can enable direct IO >> > + * in the secondary channels without affecting the main channel. >> > + */ >> > static bool file_parse_fdset(const char *filename, int64_t *fdset_id, >> > Error **errp) >> > { >> > + FdsetInfoList *fds_info; >> > + FdsetFdInfoList *fd_info; >> > const char *fdset_id_str; >> > + int nfds = 0; >> > >> > *fdset_id = -1; >> > >> > @@ -71,6 +80,32 @@ static bool file_parse_fdset(const char *filename, int64_t *fdset_id, >> > return false; >> > } >> > >> > + if (!migrate_multifd() || !migrate_direct_io()) { >> > + return true; >> > + } >> > + >> > + for (fds_info = qmp_query_fdsets(NULL); fds_info; >> > + fds_info = fds_info->next) { >> > + >> > + if (*fdset_id != fds_info->value->fdset_id) { >> > + continue; >> > + } >> > + >> > + for (fd_info = fds_info->value->fds; fd_info; fd_info = fd_info->next) { >> > + if (nfds++ > 2) { >> > + break; >> > + } >> > + } >> > + } >> > + >> > + if (nfds != 2) { >> > + error_setg(errp, "Outgoing migration needs two fds in the fdset, " >> > + "got %d", nfds); >> > + qmp_remove_fd(*fdset_id, false, -1, NULL); >> > + *fdset_id = -1; >> > + return false; >> > + } >> > + >> > return true; >> > } >> >> Related to my thoughts in an earlier patch, where I say that use of fdsets >> ought to be transparent to QEMU code, I'm not a fan of having this logic >> in migration code. >> >> IIUC, the migration code will call qio_channel_file_new_path twice, >> once with O_DIRECT and once without. This should trigger two calls >> into monitor_fdset_dup_fd_add with different flags. If we're matching >> flags in that monitor_fdset_dup_fd_add(), then if only 1 FD was >> provided, are we not able to report an error there ? > > Right, this sounds working. It works, but due to how low-level fdset is, it's difficult to match the low level error to anything meaningful we can report to the user. I'll have to add an errp to monitor_fdset_dup_fd_add(). Its returns are not very useful. -1 with no errno -1 with EACCES (should actually be EBADF) -1 with ENOENT There has been some discusstion around this before actually: https://lists.gnu.org/archive/html/qemu-devel/2021-08/msg02544.html Or, you know, let the management layer figure it out. We seem to be heading in this direction already. I imagine once the code is written to interact with QEMU, it would not have any reason to change, so it might be ok to replace some of the code I'm adding in this series with documentation and call it a day. I don't like this approach very much, but it would definitely make this series way simpler. > > For a real sanity check, we may want to somehow check the two fds returned > from qio_channel_file_new_path() to point to the same file underneath. > > What mentioned in the other thread (kcmp with KCMP_FILE) might not work, as > the whole purpose of having two fds is to make sure they have different > struct file to back the fd (and only one of them has O_DIRECT). fstat() > might work in this case over the st_ino field, etc. maybe fstatfs() too but > perhaps that's over cautious. Just a pain to use two fds as a start.. > > Thanks,
On Wed, May 08, 2024 at 05:39:53PM -0300, Fabiano Rosas wrote: > Peter Xu <peterx@redhat.com> writes: > > > On Wed, May 08, 2024 at 09:53:48AM +0100, Daniel P. Berrangé wrote: > >> On Fri, Apr 26, 2024 at 11:20:41AM -0300, Fabiano Rosas wrote: > >> > Allow multifd to use an fdset when migrating to a file. This is useful > >> > for the scenario where the management layer wants to have control over > >> > the migration file. > >> > > >> > By receiving the file descriptors directly, QEMU can delegate some > >> > high level operating system operations to the management layer (such > >> > as mandatory access control). The management layer might also want to > >> > add its own headers before the migration stream. > >> > > >> > Enable the "file:/dev/fdset/#" syntax for the multifd migration with > >> > mapped-ram. The requirements for the fdset mechanism are: > >> > > >> > On the migration source side: > >> > > >> > - the fdset must contain two fds that are not duplicates between > >> > themselves; > >> > - if direct-io is to be used, exactly one of the fds must have the > >> > O_DIRECT flag set; > >> > - the file must be opened with WRONLY both times. > >> > > >> > On the migration destination side: > >> > > >> > - the fdset must contain one fd; > >> > - the file must be opened with RDONLY. > >> > > >> > Signed-off-by: Fabiano Rosas <farosas@suse.de> > >> > --- > >> > docs/devel/migration/main.rst | 18 ++++++++++++++ > >> > docs/devel/migration/mapped-ram.rst | 6 ++++- > >> > migration/file.c | 38 ++++++++++++++++++++++++++++- > >> > 3 files changed, 60 insertions(+), 2 deletions(-) > >> > > >> > diff --git a/docs/devel/migration/main.rst b/docs/devel/migration/main.rst > >> > index 54385a23e5..50f6096470 100644 > >> > --- a/docs/devel/migration/main.rst > >> > +++ b/docs/devel/migration/main.rst > >> > @@ -47,6 +47,24 @@ over any transport. > >> > QEMU interference. Note that QEMU does not flush cached file > >> > data/metadata at the end of migration. > >> > > >> > + The file migration also supports using a file that has already been > >> > + opened. A set of file descriptors is passed to QEMU via an "fdset" > >> > + (see add-fd QMP command documentation). This method allows a > >> > + management application to have control over the migration file > >> > + opening operation. There are, however, strict requirements to this > >> > + interface: > >> > + > >> > + On the migration source side: > >> > + - if the multifd capability is to be used, the fdset must contain > >> > + two file descriptors that are not duplicates between themselves; > >> > + - if the direct-io capability is to be used, exactly one of the > >> > + file descriptors must have the O_DIRECT flag set; > >> > + - the file must be opened with WRONLY. > >> > + > >> > + On the migration destination side: > >> > + - the fdset must contain one file descriptor; > >> > + - the file must be opened with RDONLY. > >> > + > >> > In addition, support is included for migration using RDMA, which > >> > transports the page data using ``RDMA``, where the hardware takes care of > >> > transporting the pages, and the load on the CPU is much lower. While the > >> > diff --git a/docs/devel/migration/mapped-ram.rst b/docs/devel/migration/mapped-ram.rst > >> > index fa4cefd9fc..e6505511f0 100644 > >> > --- a/docs/devel/migration/mapped-ram.rst > >> > +++ b/docs/devel/migration/mapped-ram.rst > >> > @@ -16,7 +16,7 @@ location in the file, rather than constantly being added to a > >> > sequential stream. Having the pages at fixed offsets also allows the > >> > usage of O_DIRECT for save/restore of the migration stream as the > >> > pages are ensured to be written respecting O_DIRECT alignment > >> > -restrictions (direct-io support not yet implemented). > >> > +restrictions. > >> > > >> > Usage > >> > ----- > >> > @@ -35,6 +35,10 @@ Use a ``file:`` URL for migration: > >> > Mapped-ram migration is best done non-live, i.e. by stopping the VM on > >> > the source side before migrating. > >> > > >> > +For best performance enable the ``direct-io`` capability as well: > >> > + > >> > + ``migrate_set_capability direct-io on`` > >> > + > >> > Use-cases > >> > --------- > >> > > >> > diff --git a/migration/file.c b/migration/file.c > >> > index b9265b14dd..3bc8bc7463 100644 > >> > --- a/migration/file.c > >> > +++ b/migration/file.c > >> > @@ -17,6 +17,7 @@ > >> > #include "io/channel-file.h" > >> > #include "io/channel-socket.h" > >> > #include "io/channel-util.h" > >> > +#include "monitor/monitor.h" > >> > #include "options.h" > >> > #include "trace.h" > >> > > >> > @@ -54,10 +55,18 @@ static void file_remove_fdset(void) > >> > } > >> > } > >> > > >> > +/* > >> > + * With multifd, due to the behavior of the dup() system call, we need > >> > + * the fdset to have two non-duplicate fds so we can enable direct IO > >> > + * in the secondary channels without affecting the main channel. > >> > + */ > >> > static bool file_parse_fdset(const char *filename, int64_t *fdset_id, > >> > Error **errp) > >> > { > >> > + FdsetInfoList *fds_info; > >> > + FdsetFdInfoList *fd_info; > >> > const char *fdset_id_str; > >> > + int nfds = 0; > >> > > >> > *fdset_id = -1; > >> > > >> > @@ -71,6 +80,32 @@ static bool file_parse_fdset(const char *filename, int64_t *fdset_id, > >> > return false; > >> > } > >> > > >> > + if (!migrate_multifd() || !migrate_direct_io()) { > >> > + return true; > >> > + } > >> > + > >> > + for (fds_info = qmp_query_fdsets(NULL); fds_info; > >> > + fds_info = fds_info->next) { > >> > + > >> > + if (*fdset_id != fds_info->value->fdset_id) { > >> > + continue; > >> > + } > >> > + > >> > + for (fd_info = fds_info->value->fds; fd_info; fd_info = fd_info->next) { > >> > + if (nfds++ > 2) { > >> > + break; > >> > + } > >> > + } > >> > + } > >> > + > >> > + if (nfds != 2) { > >> > + error_setg(errp, "Outgoing migration needs two fds in the fdset, " > >> > + "got %d", nfds); > >> > + qmp_remove_fd(*fdset_id, false, -1, NULL); > >> > + *fdset_id = -1; > >> > + return false; > >> > + } > >> > + > >> > return true; > >> > } > >> > >> Related to my thoughts in an earlier patch, where I say that use of fdsets > >> ought to be transparent to QEMU code, I'm not a fan of having this logic > >> in migration code. > >> > >> IIUC, the migration code will call qio_channel_file_new_path twice, > >> once with O_DIRECT and once without. This should trigger two calls > >> into monitor_fdset_dup_fd_add with different flags. If we're matching > >> flags in that monitor_fdset_dup_fd_add(), then if only 1 FD was > >> provided, are we not able to report an error there ? > > > > Right, this sounds working. > > It works, but due to how low-level fdset is, it's difficult to match the > low level error to anything meaningful we can report to the user. I'll > have to add an errp to monitor_fdset_dup_fd_add(). Its returns are not > very useful. > > -1 with no errno > -1 with EACCES (should actually be EBADF) > -1 with ENOENT > > There has been some discusstion around this before actually: > > https://lists.gnu.org/archive/html/qemu-devel/2021-08/msg02544.html The only caller of monitor_fdset_dup_fd_add is qemu_open_internal and that has a "Error **errp" parameter. We should rewrite monitor_fdset_dup_fd_add to also have an "Error **errp" at which point we can actually report useful, actionable error messages from it. Errnos be gone ! With regards, Daniel
Daniel P. Berrangé <berrange@redhat.com> writes: > On Wed, May 08, 2024 at 05:39:53PM -0300, Fabiano Rosas wrote: >> Peter Xu <peterx@redhat.com> writes: >> >> > On Wed, May 08, 2024 at 09:53:48AM +0100, Daniel P. Berrangé wrote: >> >> On Fri, Apr 26, 2024 at 11:20:41AM -0300, Fabiano Rosas wrote: >> >> > Allow multifd to use an fdset when migrating to a file. This is useful >> >> > for the scenario where the management layer wants to have control over >> >> > the migration file. >> >> > >> >> > By receiving the file descriptors directly, QEMU can delegate some >> >> > high level operating system operations to the management layer (such >> >> > as mandatory access control). The management layer might also want to >> >> > add its own headers before the migration stream. >> >> > >> >> > Enable the "file:/dev/fdset/#" syntax for the multifd migration with >> >> > mapped-ram. The requirements for the fdset mechanism are: >> >> > >> >> > On the migration source side: >> >> > >> >> > - the fdset must contain two fds that are not duplicates between >> >> > themselves; >> >> > - if direct-io is to be used, exactly one of the fds must have the >> >> > O_DIRECT flag set; >> >> > - the file must be opened with WRONLY both times. >> >> > >> >> > On the migration destination side: >> >> > >> >> > - the fdset must contain one fd; >> >> > - the file must be opened with RDONLY. >> >> > >> >> > Signed-off-by: Fabiano Rosas <farosas@suse.de> >> >> > --- >> >> > docs/devel/migration/main.rst | 18 ++++++++++++++ >> >> > docs/devel/migration/mapped-ram.rst | 6 ++++- >> >> > migration/file.c | 38 ++++++++++++++++++++++++++++- >> >> > 3 files changed, 60 insertions(+), 2 deletions(-) >> >> > >> >> > diff --git a/docs/devel/migration/main.rst b/docs/devel/migration/main.rst >> >> > index 54385a23e5..50f6096470 100644 >> >> > --- a/docs/devel/migration/main.rst >> >> > +++ b/docs/devel/migration/main.rst >> >> > @@ -47,6 +47,24 @@ over any transport. >> >> > QEMU interference. Note that QEMU does not flush cached file >> >> > data/metadata at the end of migration. >> >> > >> >> > + The file migration also supports using a file that has already been >> >> > + opened. A set of file descriptors is passed to QEMU via an "fdset" >> >> > + (see add-fd QMP command documentation). This method allows a >> >> > + management application to have control over the migration file >> >> > + opening operation. There are, however, strict requirements to this >> >> > + interface: >> >> > + >> >> > + On the migration source side: >> >> > + - if the multifd capability is to be used, the fdset must contain >> >> > + two file descriptors that are not duplicates between themselves; >> >> > + - if the direct-io capability is to be used, exactly one of the >> >> > + file descriptors must have the O_DIRECT flag set; >> >> > + - the file must be opened with WRONLY. >> >> > + >> >> > + On the migration destination side: >> >> > + - the fdset must contain one file descriptor; >> >> > + - the file must be opened with RDONLY. >> >> > + >> >> > In addition, support is included for migration using RDMA, which >> >> > transports the page data using ``RDMA``, where the hardware takes care of >> >> > transporting the pages, and the load on the CPU is much lower. While the >> >> > diff --git a/docs/devel/migration/mapped-ram.rst b/docs/devel/migration/mapped-ram.rst >> >> > index fa4cefd9fc..e6505511f0 100644 >> >> > --- a/docs/devel/migration/mapped-ram.rst >> >> > +++ b/docs/devel/migration/mapped-ram.rst >> >> > @@ -16,7 +16,7 @@ location in the file, rather than constantly being added to a >> >> > sequential stream. Having the pages at fixed offsets also allows the >> >> > usage of O_DIRECT for save/restore of the migration stream as the >> >> > pages are ensured to be written respecting O_DIRECT alignment >> >> > -restrictions (direct-io support not yet implemented). >> >> > +restrictions. >> >> > >> >> > Usage >> >> > ----- >> >> > @@ -35,6 +35,10 @@ Use a ``file:`` URL for migration: >> >> > Mapped-ram migration is best done non-live, i.e. by stopping the VM on >> >> > the source side before migrating. >> >> > >> >> > +For best performance enable the ``direct-io`` capability as well: >> >> > + >> >> > + ``migrate_set_capability direct-io on`` >> >> > + >> >> > Use-cases >> >> > --------- >> >> > >> >> > diff --git a/migration/file.c b/migration/file.c >> >> > index b9265b14dd..3bc8bc7463 100644 >> >> > --- a/migration/file.c >> >> > +++ b/migration/file.c >> >> > @@ -17,6 +17,7 @@ >> >> > #include "io/channel-file.h" >> >> > #include "io/channel-socket.h" >> >> > #include "io/channel-util.h" >> >> > +#include "monitor/monitor.h" >> >> > #include "options.h" >> >> > #include "trace.h" >> >> > >> >> > @@ -54,10 +55,18 @@ static void file_remove_fdset(void) >> >> > } >> >> > } >> >> > >> >> > +/* >> >> > + * With multifd, due to the behavior of the dup() system call, we need >> >> > + * the fdset to have two non-duplicate fds so we can enable direct IO >> >> > + * in the secondary channels without affecting the main channel. >> >> > + */ >> >> > static bool file_parse_fdset(const char *filename, int64_t *fdset_id, >> >> > Error **errp) >> >> > { >> >> > + FdsetInfoList *fds_info; >> >> > + FdsetFdInfoList *fd_info; >> >> > const char *fdset_id_str; >> >> > + int nfds = 0; >> >> > >> >> > *fdset_id = -1; >> >> > >> >> > @@ -71,6 +80,32 @@ static bool file_parse_fdset(const char *filename, int64_t *fdset_id, >> >> > return false; >> >> > } >> >> > >> >> > + if (!migrate_multifd() || !migrate_direct_io()) { >> >> > + return true; >> >> > + } >> >> > + >> >> > + for (fds_info = qmp_query_fdsets(NULL); fds_info; >> >> > + fds_info = fds_info->next) { >> >> > + >> >> > + if (*fdset_id != fds_info->value->fdset_id) { >> >> > + continue; >> >> > + } >> >> > + >> >> > + for (fd_info = fds_info->value->fds; fd_info; fd_info = fd_info->next) { >> >> > + if (nfds++ > 2) { >> >> > + break; >> >> > + } >> >> > + } >> >> > + } >> >> > + >> >> > + if (nfds != 2) { >> >> > + error_setg(errp, "Outgoing migration needs two fds in the fdset, " >> >> > + "got %d", nfds); >> >> > + qmp_remove_fd(*fdset_id, false, -1, NULL); >> >> > + *fdset_id = -1; >> >> > + return false; >> >> > + } >> >> > + >> >> > return true; >> >> > } >> >> >> >> Related to my thoughts in an earlier patch, where I say that use of fdsets >> >> ought to be transparent to QEMU code, I'm not a fan of having this logic >> >> in migration code. >> >> >> >> IIUC, the migration code will call qio_channel_file_new_path twice, >> >> once with O_DIRECT and once without. This should trigger two calls >> >> into monitor_fdset_dup_fd_add with different flags. If we're matching >> >> flags in that monitor_fdset_dup_fd_add(), then if only 1 FD was >> >> provided, are we not able to report an error there ? >> > >> > Right, this sounds working. >> >> It works, but due to how low-level fdset is, it's difficult to match the >> low level error to anything meaningful we can report to the user. I'll >> have to add an errp to monitor_fdset_dup_fd_add(). Its returns are not >> very useful. >> >> -1 with no errno >> -1 with EACCES (should actually be EBADF) >> -1 with ENOENT >> >> There has been some discusstion around this before actually: >> >> https://lists.gnu.org/archive/html/qemu-devel/2021-08/msg02544.html > > The only caller of monitor_fdset_dup_fd_add is qemu_open_internal > and that has a "Error **errp" parameter. We should rewrite > monitor_fdset_dup_fd_add to also have an "Error **errp" at which > point we can actually report useful, actionable error messages > from it. Errnos be gone ! I can do that, but qemu_open_old() does not pass the error in. Please see if this works for you: -->8-- From 16e333cc5aeca1fab3f75f79048c0ab0d62d5b08 Mon Sep 17 00:00:00 2001 From: Fabiano Rosas <farosas@suse.de> Date: Fri, 17 May 2024 19:30:39 -0300 Subject: [PATCH] io: Stop using qemu_open_old in channel-file We want to make use of the Error object to report fdset errors from qemu_open_internal() and passing the error pointer to qemu_open_old() would require changing all callers. Move the file channel to the new API instead. Signed-off-by: Fabiano Rosas <farosas@suse.de> --- io/channel-file.c | 8 +++++--- 1 file changed, 5 insertions(+), 3 deletions(-) diff --git a/io/channel-file.c b/io/channel-file.c index 6436cfb6ae..2ea8d08360 100644 --- a/io/channel-file.c +++ b/io/channel-file.c @@ -68,11 +68,13 @@ qio_channel_file_new_path(const char *path, ioc = QIO_CHANNEL_FILE(object_new(TYPE_QIO_CHANNEL_FILE)); - ioc->fd = qemu_open_old(path, flags, mode); + if (flags & O_CREAT) { + ioc->fd = qemu_create(path, flags & ~O_CREAT, mode, errp); + } else { + ioc->fd = qemu_open(path, flags, errp); + } if (ioc->fd < 0) { object_unref(OBJECT(ioc)); - error_setg_errno(errp, errno, - "Unable to open %s", path); return NULL; }
On Fri, May 17, 2024 at 07:43:35PM -0300, Fabiano Rosas wrote: > Daniel P. Berrangé <berrange@redhat.com> writes: > > > > > The only caller of monitor_fdset_dup_fd_add is qemu_open_internal > > and that has a "Error **errp" parameter. We should rewrite > > monitor_fdset_dup_fd_add to also have an "Error **errp" at which > > point we can actually report useful, actionable error messages > > from it. Errnos be gone ! > > I can do that, but qemu_open_old() does not pass the error in. Please > see if this works for you: > > -->8-- > From 16e333cc5aeca1fab3f75f79048c0ab0d62d5b08 Mon Sep 17 00:00:00 2001 > From: Fabiano Rosas <farosas@suse.de> > Date: Fri, 17 May 2024 19:30:39 -0300 > Subject: [PATCH] io: Stop using qemu_open_old in channel-file > > We want to make use of the Error object to report fdset errors from > qemu_open_internal() and passing the error pointer to qemu_open_old() > would require changing all callers. Move the file channel to the new > API instead. > > Signed-off-by: Fabiano Rosas <farosas@suse.de> > --- > io/channel-file.c | 8 +++++--- > 1 file changed, 5 insertions(+), 3 deletions(-) > > diff --git a/io/channel-file.c b/io/channel-file.c > index 6436cfb6ae..2ea8d08360 100644 > --- a/io/channel-file.c > +++ b/io/channel-file.c > @@ -68,11 +68,13 @@ qio_channel_file_new_path(const char *path, > > ioc = QIO_CHANNEL_FILE(object_new(TYPE_QIO_CHANNEL_FILE)); > > - ioc->fd = qemu_open_old(path, flags, mode); > + if (flags & O_CREAT) { > + ioc->fd = qemu_create(path, flags & ~O_CREAT, mode, errp); > + } else { > + ioc->fd = qemu_open(path, flags, errp); > + } > if (ioc->fd < 0) { > object_unref(OBJECT(ioc)); > - error_setg_errno(errp, errno, > - "Unable to open %s", path); > return NULL; > } Reviewed-by: Daniel P. Berrangé <berrange@redhat.com> With regards, Daniel
diff --git a/docs/devel/migration/main.rst b/docs/devel/migration/main.rst index 54385a23e5..50f6096470 100644 --- a/docs/devel/migration/main.rst +++ b/docs/devel/migration/main.rst @@ -47,6 +47,24 @@ over any transport. QEMU interference. Note that QEMU does not flush cached file data/metadata at the end of migration. + The file migration also supports using a file that has already been + opened. A set of file descriptors is passed to QEMU via an "fdset" + (see add-fd QMP command documentation). This method allows a + management application to have control over the migration file + opening operation. There are, however, strict requirements to this + interface: + + On the migration source side: + - if the multifd capability is to be used, the fdset must contain + two file descriptors that are not duplicates between themselves; + - if the direct-io capability is to be used, exactly one of the + file descriptors must have the O_DIRECT flag set; + - the file must be opened with WRONLY. + + On the migration destination side: + - the fdset must contain one file descriptor; + - the file must be opened with RDONLY. + In addition, support is included for migration using RDMA, which transports the page data using ``RDMA``, where the hardware takes care of transporting the pages, and the load on the CPU is much lower. While the diff --git a/docs/devel/migration/mapped-ram.rst b/docs/devel/migration/mapped-ram.rst index fa4cefd9fc..e6505511f0 100644 --- a/docs/devel/migration/mapped-ram.rst +++ b/docs/devel/migration/mapped-ram.rst @@ -16,7 +16,7 @@ location in the file, rather than constantly being added to a sequential stream. Having the pages at fixed offsets also allows the usage of O_DIRECT for save/restore of the migration stream as the pages are ensured to be written respecting O_DIRECT alignment -restrictions (direct-io support not yet implemented). +restrictions. Usage ----- @@ -35,6 +35,10 @@ Use a ``file:`` URL for migration: Mapped-ram migration is best done non-live, i.e. by stopping the VM on the source side before migrating. +For best performance enable the ``direct-io`` capability as well: + + ``migrate_set_capability direct-io on`` + Use-cases --------- diff --git a/migration/file.c b/migration/file.c index b9265b14dd..3bc8bc7463 100644 --- a/migration/file.c +++ b/migration/file.c @@ -17,6 +17,7 @@ #include "io/channel-file.h" #include "io/channel-socket.h" #include "io/channel-util.h" +#include "monitor/monitor.h" #include "options.h" #include "trace.h" @@ -54,10 +55,18 @@ static void file_remove_fdset(void) } } +/* + * With multifd, due to the behavior of the dup() system call, we need + * the fdset to have two non-duplicate fds so we can enable direct IO + * in the secondary channels without affecting the main channel. + */ static bool file_parse_fdset(const char *filename, int64_t *fdset_id, Error **errp) { + FdsetInfoList *fds_info; + FdsetFdInfoList *fd_info; const char *fdset_id_str; + int nfds = 0; *fdset_id = -1; @@ -71,6 +80,32 @@ static bool file_parse_fdset(const char *filename, int64_t *fdset_id, return false; } + if (!migrate_multifd() || !migrate_direct_io()) { + return true; + } + + for (fds_info = qmp_query_fdsets(NULL); fds_info; + fds_info = fds_info->next) { + + if (*fdset_id != fds_info->value->fdset_id) { + continue; + } + + for (fd_info = fds_info->value->fds; fd_info; fd_info = fd_info->next) { + if (nfds++ > 2) { + break; + } + } + } + + if (nfds != 2) { + error_setg(errp, "Outgoing migration needs two fds in the fdset, " + "got %d", nfds); + qmp_remove_fd(*fdset_id, false, -1, NULL); + *fdset_id = -1; + return false; + } + return true; } @@ -209,10 +244,11 @@ void file_start_incoming_migration(FileMigrationArgs *file_args, Error **errp) g_autofree char *filename = g_strdup(file_args->filename); QIOChannelFile *fioc = NULL; uint64_t offset = file_args->offset; + int flags = O_RDONLY; trace_migration_file_incoming(filename); - fioc = qio_channel_file_new_path(filename, O_RDONLY, 0, errp); + fioc = qio_channel_file_new_path(filename, flags, 0, errp); if (!fioc) { return; }
Allow multifd to use an fdset when migrating to a file. This is useful for the scenario where the management layer wants to have control over the migration file. By receiving the file descriptors directly, QEMU can delegate some high level operating system operations to the management layer (such as mandatory access control). The management layer might also want to add its own headers before the migration stream. Enable the "file:/dev/fdset/#" syntax for the multifd migration with mapped-ram. The requirements for the fdset mechanism are: On the migration source side: - the fdset must contain two fds that are not duplicates between themselves; - if direct-io is to be used, exactly one of the fds must have the O_DIRECT flag set; - the file must be opened with WRONLY both times. On the migration destination side: - the fdset must contain one fd; - the file must be opened with RDONLY. Signed-off-by: Fabiano Rosas <farosas@suse.de> --- docs/devel/migration/main.rst | 18 ++++++++++++++ docs/devel/migration/mapped-ram.rst | 6 ++++- migration/file.c | 38 ++++++++++++++++++++++++++++- 3 files changed, 60 insertions(+), 2 deletions(-)