Message ID | 20220512154320.19697-6-avihaih@nvidia.com (mailing list archive) |
---|---|
State | New, archived |
Series | vfio/migration: Implement VFIO migration protocol v2 |
Avihai Horon <avihaih@nvidia.com> wrote:
> Add new function qemu_file_get_to_fd() that allows reading data from
> QEMUFile and writing it straight into a given fd.
>
> This will be used later in VFIO migration code.
>
> Signed-off-by: Avihai Horon <avihaih@nvidia.com>
> ---
>  migration/qemu-file.c | 34 ++++++++++++++++++++++++++++++++++
>  migration/qemu-file.h |  1 +
>  2 files changed, 35 insertions(+)
>
> diff --git a/migration/qemu-file.c b/migration/qemu-file.c
> index 1479cddad9..cad3d32eb3 100644
> --- a/migration/qemu-file.c
> +++ b/migration/qemu-file.c
> @@ -867,3 +867,37 @@ QIOChannel *qemu_file_get_ioc(QEMUFile *file)
>  {
>      return file->has_ioc ? QIO_CHANNEL(file->opaque) : NULL;
>  }
> +
> +/*
> + * Read size bytes from QEMUFile f and write them to fd.
> + */
> +int qemu_file_get_to_fd(QEMUFile *f, int fd, size_t size)
> +{
> +    while (size) {
> +        size_t pending = f->buf_size - f->buf_index;
> +        ssize_t rc;
> +
> +        if (!pending) {
> +            rc = qemu_fill_buffer(f);
> +            if (rc < 0) {
> +                return rc;
> +            }
> +            if (rc == 0) {
> +                return -1;
> +            }
> +            continue;
> +        }
> +
> +        rc = write(fd, f->buf + f->buf_index, MIN(pending, size));
> +        if (rc < 0) {
> +            return rc;
> +        }
> +        if (rc == 0) {
> +            return -1;
> +        }
> +        f->buf_index += rc;
> +        size -= rc;
> +    }
> +
> +    return 0;
> +}

Is there really a performance difference compared to just using:

    uint8_t buffer[size];

    qemu_get_buffer(f, buffer, size);
    write(fd, buffer, size);

Or, to put it another way, what sizes are we talking about here?

Thanks, Juan.

> diff --git a/migration/qemu-file.h b/migration/qemu-file.h
> index 3f36d4dc8c..dd26037450 100644
> --- a/migration/qemu-file.h
> +++ b/migration/qemu-file.h
> @@ -162,6 +162,7 @@ int qemu_file_shutdown(QEMUFile *f);
>  QEMUFile *qemu_file_get_return_path(QEMUFile *f);
>  void qemu_fflush(QEMUFile *f);
>  void qemu_file_set_blocking(QEMUFile *f, bool block);
> +int qemu_file_get_to_fd(QEMUFile *f, int fd, size_t size);
>
>  void ram_control_before_iterate(QEMUFile *f, uint64_t flags);
>  void ram_control_after_iterate(QEMUFile *f, uint64_t flags);
On 5/16/2022 2:31 PM, Juan Quintela wrote:
> External email: Use caution opening links or attachments
>
>
> Avihai Horon <avihaih@nvidia.com> wrote:
>> Add new function qemu_file_get_to_fd() that allows reading data from
>> QEMUFile and writing it straight into a given fd.
>>
>> This will be used later in VFIO migration code.
>>
>> Signed-off-by: Avihai Horon <avihaih@nvidia.com>
>> ---
>>  migration/qemu-file.c | 34 ++++++++++++++++++++++++++++++++++
>>  migration/qemu-file.h |  1 +
>>  2 files changed, 35 insertions(+)
>>
>> diff --git a/migration/qemu-file.c b/migration/qemu-file.c
>> index 1479cddad9..cad3d32eb3 100644
>> --- a/migration/qemu-file.c
>> +++ b/migration/qemu-file.c
>> @@ -867,3 +867,37 @@ QIOChannel *qemu_file_get_ioc(QEMUFile *file)
>>  {
>>      return file->has_ioc ? QIO_CHANNEL(file->opaque) : NULL;
>>  }
>> +
>> +/*
>> + * Read size bytes from QEMUFile f and write them to fd.
>> + */
>> +int qemu_file_get_to_fd(QEMUFile *f, int fd, size_t size)
>> +{
>> +    while (size) {
>> +        size_t pending = f->buf_size - f->buf_index;
>> +        ssize_t rc;
>> +
>> +        if (!pending) {
>> +            rc = qemu_fill_buffer(f);
>> +            if (rc < 0) {
>> +                return rc;
>> +            }
>> +            if (rc == 0) {
>> +                return -1;
>> +            }
>> +            continue;
>> +        }
>> +
>> +        rc = write(fd, f->buf + f->buf_index, MIN(pending, size));
>> +        if (rc < 0) {
>> +            return rc;
>> +        }
>> +        if (rc == 0) {
>> +            return -1;
>> +        }
>> +        f->buf_index += rc;
>> +        size -= rc;
>> +    }
>> +
>> +    return 0;
>> +}
> Is there really a performance difference compared to just using:
>
>     uint8_t buffer[size];
>
>     qemu_get_buffer(f, buffer, size);
>     write(fd, buffer, size);
>
> Or, to put it another way, what sizes are we talking about here?

It depends on the device, but it can range from a few MBs to several
GBs AFAIK.

Thanks.

>
> Thanks, Juan.
>
>
>> diff --git a/migration/qemu-file.h b/migration/qemu-file.h
>> index 3f36d4dc8c..dd26037450 100644
>> --- a/migration/qemu-file.h
>> +++ b/migration/qemu-file.h
>> @@ -162,6 +162,7 @@ int qemu_file_shutdown(QEMUFile *f);
>>  QEMUFile *qemu_file_get_return_path(QEMUFile *f);
>>  void qemu_fflush(QEMUFile *f);
>>  void qemu_file_set_blocking(QEMUFile *f, bool block);
>> +int qemu_file_get_to_fd(QEMUFile *f, int fd, size_t size);
>>
>>  void ram_control_before_iterate(QEMUFile *f, uint64_t flags);
>>  void ram_control_after_iterate(QEMUFile *f, uint64_t flags);
Avihai Horon <avihaih@nvidia.com> wrote:
> On 5/16/2022 2:31 PM, Juan Quintela wrote:
>> External email: Use caution opening links or attachments
>>
>>
>> Avihai Horon <avihaih@nvidia.com> wrote:
>>> Add new function qemu_file_get_to_fd() that allows reading data from
>>> QEMUFile and writing it straight into a given fd.
>>>
>>> This will be used later in VFIO migration code.
>>>
>>> Signed-off-by: Avihai Horon <avihaih@nvidia.com>
>>> ---
>>>  migration/qemu-file.c | 34 ++++++++++++++++++++++++++++++++++
>>>  migration/qemu-file.h |  1 +
>>>  2 files changed, 35 insertions(+)
>>>
>>> diff --git a/migration/qemu-file.c b/migration/qemu-file.c
>>> index 1479cddad9..cad3d32eb3 100644
>>> --- a/migration/qemu-file.c
>>> +++ b/migration/qemu-file.c
>>> @@ -867,3 +867,37 @@ QIOChannel *qemu_file_get_ioc(QEMUFile *file)
>>>  {
>>>      return file->has_ioc ? QIO_CHANNEL(file->opaque) : NULL;
>>>  }
>>> +
>>> +/*
>>> + * Read size bytes from QEMUFile f and write them to fd.
>>> + */
>>> +int qemu_file_get_to_fd(QEMUFile *f, int fd, size_t size)
>>> +{
>>> +    while (size) {
>>> +        size_t pending = f->buf_size - f->buf_index;
>>> +        ssize_t rc;
>>> +
>>> +        if (!pending) {
>>> +            rc = qemu_fill_buffer(f);
>>> +            if (rc < 0) {
>>> +                return rc;
>>> +            }
>>> +            if (rc == 0) {
>>> +                return -1;
>>> +            }
>>> +            continue;
>>> +        }
>>> +
>>> +        rc = write(fd, f->buf + f->buf_index, MIN(pending, size));
>>> +        if (rc < 0) {
>>> +            return rc;
>>> +        }
>>> +        if (rc == 0) {
>>> +            return -1;
>>> +        }
>>> +        f->buf_index += rc;
>>> +        size -= rc;
>>> +    }
>>> +
>>> +    return 0;
>>> +}
>> Is there really a performance difference compared to just using:
>>
>>     uint8_t buffer[size];
>>
>>     qemu_get_buffer(f, buffer, size);
>>     write(fd, buffer, size);
>>
>> Or, to put it another way, what sizes are we talking about here?
>
> It depends on the device, but it can range from a few MBs to several
> GBs AFAIK.

A few MB is OK.

Several GB on the main migration channel, without a single
header/whatever? This sounds like a recipe for disaster IMHO.

Remember that on the source side we have a migration thread. This patch
will just block that migration thread. If we are using 10 Gigabit
networking, call it 1 GB/second to keep the math easy, each GB will take
1 second of downtime. During that time, migration control is basically
handicapped.

And you are sending this data with qemu_put_buffer_async(). This
function just adds its buffer to an iovec; its only current user adds
4 KB (or whatever your page size is) entries to the iovec. You are
talking about adding gigabytes here. I don't know how well this is
going to work, but my guess is that migrate_cancel is not going to be
happy.

On the destination side this is even worse, because we receive this
multi-gigabyte chunk in a coroutine, and I am not sure that this will
not completely block all of QEMU while it is happening (again, a
multi-second time).

I have to think more about this problem, but I can't see how this is
going to be able to go through the migration main channel.

Later, Juan.
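For reference, the back-of-the-envelope arithmetic behind Juan's downtime
estimate (assuming line-rate transfer and ignoring protocol overhead):

    10 Gbit/s ≈ 1.25 GB/s, so sending N GB of device state over the main
    channel occupies the migration thread for roughly N / 1.25 ≈ 0.8 * N
    seconds; with several GB of state that is multiple seconds during
    which migration commands cannot be serviced.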
On Wed, May 18, 2022 at 01:54:34PM +0200, Juan Quintela wrote:
> >> Is there really a performance difference compared to just using:
> >>
> >>     uint8_t buffer[size];
> >>
> >>     qemu_get_buffer(f, buffer, size);
> >>     write(fd, buffer, size);
> >>
> >> Or, to put it another way, what sizes are we talking about here?
> >
> > It depends on the device, but it can range from a few MBs to several
> > GBs AFAIK.
>
> A few MB is OK.
>
> Several GB on the main migration channel, without a single
> header/whatever?

IIRC it iterates in multi-megabyte chunks, each of which gets a header.

The chunking size is set by the size of the buffer mmap.

The overall point is that memcpying GBs is going to be taxing, so we
want to eliminate copies on this path, especially copies that result
in more system calls.

We are expecting to look into further optimization down the road here,
because even this is still too slow.

Jason
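A minimal sketch of the source-side chunking Jason describes, assuming a
per-device data fd, an mmap-ed staging buffer, and a per-chunk marker; the
names below (vfio_save_chunks, VFIO_MIG_FLAG_DEV_DATA_STATE, data_fd) are
illustrative, not the actual code in this series:

    #include "qemu/osdep.h"
    #include "migration/qemu-file.h"

    /* Illustrative marker written before each chunk of device state. */
    #define VFIO_MIG_FLAG_DEV_DATA_STATE (0xffffffffef100002ULL)

    static int vfio_save_chunks(QEMUFile *f, int data_fd, void *buf,
                                size_t buf_size)
    {
        for (;;) {
            /* Fill one chunk from the device's migration data fd. */
            ssize_t n = read(data_fd, buf, buf_size);

            if (n < 0) {
                return -errno;
            }
            if (n == 0) {
                break;                      /* device has no more state */
            }

            qemu_put_be64(f, VFIO_MIG_FLAG_DEV_DATA_STATE); /* chunk header */
            qemu_put_be64(f, n);                            /* chunk length */
            /* Queue the chunk without copying it into the QEMUFile buffer. */
            qemu_put_buffer_async(f, buf, n, false);
            /* Flush before the mmap-ed buffer is reused for the next chunk. */
            qemu_fflush(f);
        }
        return 0;
    }

Each chunk is bounded by buf_size, which is how the size of the mmap-ed
buffer ends up setting the chunking granularity.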
On Wed, May 18, 2022 at 12:42:37PM -0300, Jason Gunthorpe wrote:
> On Wed, May 18, 2022 at 01:54:34PM +0200, Juan Quintela wrote:
> > >> Is there really a performance difference compared to just using:
> > >>
> > >>     uint8_t buffer[size];
> > >>
> > >>     qemu_get_buffer(f, buffer, size);
> > >>     write(fd, buffer, size);
> > >>
> > >> Or, to put it another way, what sizes are we talking about here?
> > >
> > > It depends on the device, but it can range from a few MBs to several
> > > GBs AFAIK.
> >
> > A few MB is OK.
> >
> > Several GB on the main migration channel, without a single
> > header/whatever?
>
> IIRC it iterates in multi-megabyte chunks, each of which gets a header.
>
> The chunking size is set by the size of the buffer mmap.
>
> The overall point is that memcpying GBs is going to be taxing, so we
> want to eliminate copies on this path, especially copies that result
> in more system calls.
>
> We are expecting to look into further optimization down the road here,
> because even this is still too slow.

Considering the possibility of future optimization, IMHO adding this
kind of API at the QEMUFile level is too high; we'd be better off
pushing the impl down into the QIOChannel API level:

  int64_t qio_channel_copy_range(QIOChannel *srcioc,
                                 QIOChannel *tgtioc,
                                 size_t len);

The QIOChannel impl can do pretty much what you showed in the general
case, but in special cases it could have the option to offload to the
kernel copy_file_range() syscall to avoid the context switches.

With regards,
Daniel
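To make the suggestion concrete, the generic fallback for such an API could
be little more than a bounce-buffer loop at the QIOChannel level. This is
only a sketch of what Daniel proposes (the function does not exist in QEMU,
an Error ** parameter is added for QEMU style, and error handling is
simplified), with room for a later fast path that offloads to
copy_file_range() when both channels are fd-backed:

    #include "qemu/osdep.h"
    #include "qapi/error.h"
    #include "io/channel.h"

    static int64_t qio_channel_copy_range(QIOChannel *srcioc,
                                          QIOChannel *tgtioc,
                                          size_t len, Error **errp)
    {
        char buf[64 * 1024];    /* bounce buffer for the generic path */
        int64_t total = 0;

        while (len) {
            ssize_t nread = qio_channel_read(srcioc, buf,
                                             MIN(len, sizeof(buf)), errp);
            if (nread < 0) {
                return -1;
            }
            if (nread == 0) {
                error_setg(errp, "Unexpected end of stream");
                return -1;
            }
            /* qio_channel_write_all() loops until the chunk is written. */
            if (qio_channel_write_all(tgtioc, buf, nread, errp) < 0) {
                return -1;
            }
            total += nread;
            len -= nread;
        }
        return total;
    }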
On Wed, May 18, 2022 at 05:00:26PM +0100, Daniel P. Berrangé wrote:
> On Wed, May 18, 2022 at 12:42:37PM -0300, Jason Gunthorpe wrote:
> > On Wed, May 18, 2022 at 01:54:34PM +0200, Juan Quintela wrote:
> > > >> Is there really a performance difference compared to just using:
> > > >>
> > > >>     uint8_t buffer[size];
> > > >>
> > > >>     qemu_get_buffer(f, buffer, size);
> > > >>     write(fd, buffer, size);
> > > >>
> > > >> Or, to put it another way, what sizes are we talking about here?
> > > >
> > > > It depends on the device, but it can range from a few MBs to several
> > > > GBs AFAIK.
> > >
> > > A few MB is OK.
> > >
> > > Several GB on the main migration channel, without a single
> > > header/whatever?
> >
> > IIRC it iterates in multi-megabyte chunks, each of which gets a header.
> >
> > The chunking size is set by the size of the buffer mmap.
> >
> > The overall point is that memcpying GBs is going to be taxing, so we
> > want to eliminate copies on this path, especially copies that result
> > in more system calls.
> >
> > We are expecting to look into further optimization down the road here,
> > because even this is still too slow.
>
> Considering the possibility of future optimization, IMHO adding this
> kind of API at the QEMUFile level is too high; we'd be better off
> pushing the impl down into the QIOChannel API level:
>
>   int64_t qio_channel_copy_range(QIOChannel *srcioc,
>                                  QIOChannel *tgtioc,
>                                  size_t len);
>
> The QIOChannel impl can do pretty much what you showed in the general
> case, but in special cases it could have the option to offload to the
> kernel copy_file_range() syscall to avoid the context switches.

This is probably something to do down the road, when we figure out
exactly what is best.

Currently we don't have kernel support for an optimized
copy_file_range() (i.e. a splice_read fop) inside the VFIO drivers, so
copy_file_range() will just fail.

I didn't look closely again, but IIRC the challenge is that the
QIOChannel doesn't have a ready large temporary buffer to use for a
non-splice fallback path.

Jason
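As an illustration of why the offload has to stay optional: a plain-fd
sketch (not QEMU code, names are made up) of attempting copy_file_range()
and dropping back to a read()/write() bounce buffer when the kernel refuses,
which is what would happen with today's VFIO device fds:

    #define _GNU_SOURCE
    #include <unistd.h>
    #include <errno.h>
    #include <stdbool.h>
    #include <sys/types.h>

    static ssize_t copy_some(int src_fd, int dst_fd, size_t len,
                             bool *use_fallback)
    {
        if (!*use_fallback) {
            ssize_t n = copy_file_range(src_fd, NULL, dst_fd, NULL, len, 0);

            if (n >= 0) {
                return n;              /* kernel did the copy for us */
            }
            if (errno != EINVAL && errno != EXDEV && errno != EOPNOTSUPP) {
                return -1;             /* a real I/O error */
            }
            *use_fallback = true;      /* fd can't be spliced; bounce instead */
        }

        char buf[64 * 1024];           /* userspace bounce buffer */
        ssize_t n = read(src_fd, buf, len < sizeof(buf) ? len : sizeof(buf));

        if (n <= 0) {
            return n;
        }
        return write(dst_fd, buf, n);  /* caller handles short writes */
    }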
diff --git a/migration/qemu-file.c b/migration/qemu-file.c
index 1479cddad9..cad3d32eb3 100644
--- a/migration/qemu-file.c
+++ b/migration/qemu-file.c
@@ -867,3 +867,37 @@ QIOChannel *qemu_file_get_ioc(QEMUFile *file)
 {
     return file->has_ioc ? QIO_CHANNEL(file->opaque) : NULL;
 }
+
+/*
+ * Read size bytes from QEMUFile f and write them to fd.
+ */
+int qemu_file_get_to_fd(QEMUFile *f, int fd, size_t size)
+{
+    while (size) {
+        size_t pending = f->buf_size - f->buf_index;
+        ssize_t rc;
+
+        if (!pending) {
+            rc = qemu_fill_buffer(f);
+            if (rc < 0) {
+                return rc;
+            }
+            if (rc == 0) {
+                return -1;
+            }
+            continue;
+        }
+
+        rc = write(fd, f->buf + f->buf_index, MIN(pending, size));
+        if (rc < 0) {
+            return rc;
+        }
+        if (rc == 0) {
+            return -1;
+        }
+        f->buf_index += rc;
+        size -= rc;
+    }
+
+    return 0;
+}
diff --git a/migration/qemu-file.h b/migration/qemu-file.h
index 3f36d4dc8c..dd26037450 100644
--- a/migration/qemu-file.h
+++ b/migration/qemu-file.h
@@ -162,6 +162,7 @@ int qemu_file_shutdown(QEMUFile *f);
 QEMUFile *qemu_file_get_return_path(QEMUFile *f);
 void qemu_fflush(QEMUFile *f);
 void qemu_file_set_blocking(QEMUFile *f, bool block);
+int qemu_file_get_to_fd(QEMUFile *f, int fd, size_t size);

 void ram_control_before_iterate(QEMUFile *f, uint64_t flags);
 void ram_control_after_iterate(QEMUFile *f, uint64_t flags);
Add new function qemu_file_get_to_fd() that allows reading data from
QEMUFile and writing it straight into a given fd.

This will be used later in VFIO migration code.

Signed-off-by: Avihai Horon <avihaih@nvidia.com>
---
 migration/qemu-file.c | 34 ++++++++++++++++++++++++++++++++++
 migration/qemu-file.h |  1 +
 2 files changed, 35 insertions(+)
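As a rough illustration of the intended caller (an assumption about the
later VFIO patches, not code from this series), the destination side could
stream a chunk of device state from the migration channel straight into the
device's migration data fd:

    #include "qemu/osdep.h"
    #include "qemu/error-report.h"
    #include "migration/qemu-file.h"

    /*
     * Illustrative only: the function name and the idea of a per-device
     * data_fd are assumptions made for the sake of the example.
     */
    static int vfio_load_device_data(QEMUFile *f, int data_fd,
                                     uint64_t data_size)
    {
        /* Copy data_size bytes from the stream into the device fd,
         * without staging them in an intermediate buffer in QEMU. */
        int ret = qemu_file_get_to_fd(f, data_fd, data_size);

        if (ret) {
            error_report("Failed to load %" PRIu64 " bytes of device state",
                         data_size);
        }
        return ret;
    }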