[v2,00/11] vfio/migration: Implement VFIO migration protocol v2

Message ID	20220530170739.19072-1-avihaih@nvidia.com (mailing list archive)
Headers	show Return-Path: <qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org> Received-SPF: Pass (protection.outlook.com: domain of nvidia.com designates 12.22.5.235 as permitted sender) receiver=protection.outlook.com; client-ip=12.22.5.235; helo=mail.nvidia.com; pr=C From: Avihai Horon <avihaih@nvidia.com> To: <qemu-devel@nongnu.org>, Cornelia Huck <cohuck@redhat.com>, "Alex Williamson" <alex.williamson@redhat.com>, Juan Quintela <quintela@redhat.com>, "Dr . David Alan Gilbert" <dgilbert@redhat.com> CC: Joao Martins <joao.m.martins@oracle.com>, Yishai Hadas <yishaih@nvidia.com>, Jason Gunthorpe <jgg@nvidia.com>, Mark Bloch <mbloch@nvidia.com>, Maor Gottlieb <maorg@nvidia.com>, Kirti Wankhede <kwankhede@nvidia.com>, Tarun Gupta <targupta@nvidia.com>, Avihai Horon <avihaih@nvidia.com> Subject: [PATCH v2 00/11] vfio/migration: Implement VFIO migration protocol v2 Date: Mon, 30 May 2022 20:07:28 +0300 Message-ID: <20220530170739.19072-1-avihaih@nvidia.com> MIME-Version: 1.0 Content-Transfer-Encoding: 8bit Content-Type: text/plain Received-SPF: softfail client-ip=40.107.244.84; envelope-from=avihaih@nvidia.com; helo=NAM12-MW2-obe.outbound.protection.outlook.com X-Spam_score_int: -21 X-Spam_score: -2.2 X-Spam_bar: -- X-Spam_report: (-2.2 / 5.0 requ) BAYES_00=-1.9, DKIMWL_WL_HIGH=-0.082, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, RCVD_IN_DNSWL_NONE=-0.0001, RCVD_IN_MSPIKE_H2=-0.001, SPF_HELO_PASS=-0.001, SPF_PASS=-0.001, T_SCC_BODY_TEXT_LINE=-0.01 autolearn=ham autolearn_force=no X-Spam_action: no action Precedence: list Errors-To: qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org Sender: "Qemu-devel" <qemu-devel-bounces+qemu-devel=archiver.kernel.org@nongnu.org>
Series	vfio/migration: Implement VFIO migration protocol v2 \| expand [v2,00/11] vfio/migration: Implement VFIO migration protocol v2 [v2,01/11] vfio/migration: Fix NULL pointer dereference bug [v2,02/11] vfio/migration: Skip pre-copy if dirty page tracking is not supported [v2,03/11] migration/qemu-file: Add qemu_file_get_to_fd() [v2,04/11] vfio/common: Change vfio_devices_all_running_and_saving() logic to equivalent one [v2,05/11] vfio/migration: Move migration v1 logic to vfio_migration_init() [v2,06/11] vfio/migration: Rename functions/structs related to v1 protocol [v2,07/11] vfio/migration: Implement VFIO migration protocol v2 [v2,08/11] vfio/migration: Remove VFIO migration protocol v1 [v2,09/11] vfio/migration: Reset device if setting recover state fails [v2,10/11] vfio: Alphabetize migration section of VFIO trace-events file [v2,11/11] docs/devel: Align vfio-migration docs to VFIO migration v2

Avihai Horon May 30, 2022, 5:07 p.m. UTC

Hello,

Following VFIO migration protocol v2 acceptance in kernel, this series
implements VFIO migration according to the new v2 protocol and replaces
the now deprecated v1 implementation.

The main differences between v1 and v2 migration protocols are:
1. VFIO device state is represented as a finite state machine instead of
   a bitmap.

2. The migration interface with kernel is done using VFIO_DEVICE_FEATURE
   ioctl and normal read() and write() instead of the migration region
   used in v1.

3. Migration protocol v2 currently doesn't support the pre-copy phase of
   migration.

Full description of the v2 protocol and the differences from v1 can be
found here [1].

Patches 1-3 are prep patches fixing bugs and adding QEMUFile function
that will be used later.

Patches 4-6 refactor v1 protocol code to make it easier to add v2
protocol.

Patches 7-11 implement v2 protocol and remove v1 protocol.

Thanks.

[1]
https://lore.kernel.org/all/20220224142024.147653-10-yishaih@nvidia.com/

Changes from v1: https://lore.kernel.org/all/20220512154320.19697-1-avihaih@nvidia.com/
- Split the big patch that replaced v1 with v2 into several patches as
  suggested by Joao, to make review easier.
- Change warn_report to warn_report_once when container doesn't support
  dirty tracking.
- Add Reviewed-by tag.

Avihai Horon (11):
  vfio/migration: Fix NULL pointer dereference bug
  vfio/migration: Skip pre-copy if dirty page tracking is not supported
  migration/qemu-file: Add qemu_file_get_to_fd()
  vfio/common: Change vfio_devices_all_running_and_saving() logic to
    equivalent one
  vfio/migration: Move migration v1 logic to vfio_migration_init()
  vfio/migration: Rename functions/structs related to v1 protocol
  vfio/migration: Implement VFIO migration protocol v2
  vfio/migration: Remove VFIO migration protocol v1
  vfio/migration: Reset device if setting recover state fails
  vfio: Alphabetize migration section of VFIO trace-events file
  docs/devel: Align vfio-migration docs to VFIO migration v2

 docs/devel/vfio-migration.rst |  77 ++--
 hw/vfio/common.c              |  21 +-
 hw/vfio/migration.c           | 640 ++++++++--------------------------
 hw/vfio/trace-events          |  25 +-
 include/hw/vfio/vfio-common.h |   8 +-
 migration/migration.c         |   5 +
 migration/migration.h         |   3 +
 migration/qemu-file.c         |  34 ++
 migration/qemu-file.h         |   1 +
 9 files changed, 252 insertions(+), 562 deletions(-)

Avihai Horon June 7, 2022, 5:44 p.m. UTC | #1

On 5/30/2022 8:07 PM, Avihai Horon wrote:
> Hello,
>
> Following VFIO migration protocol v2 acceptance in kernel, this series
> implements VFIO migration according to the new v2 protocol and replaces
> the now deprecated v1 implementation.
>
> The main differences between v1 and v2 migration protocols are:
> 1. VFIO device state is represented as a finite state machine instead of
>     a bitmap.
>
> 2. The migration interface with kernel is done using VFIO_DEVICE_FEATURE
>     ioctl and normal read() and write() instead of the migration region
>     used in v1.
>
> 3. Migration protocol v2 currently doesn't support the pre-copy phase of
>     migration.
>
> Full description of the v2 protocol and the differences from v1 can be
> found here [1].
>
> Patches 1-3 are prep patches fixing bugs and adding QEMUFile function
> that will be used later.
>
> Patches 4-6 refactor v1 protocol code to make it easier to add v2
> protocol.
>
> Patches 7-11 implement v2 protocol and remove v1 protocol.
>
> Thanks.
>
> [1]
> https://lore.kernel.org/all/20220224142024.147653-10-yishaih@nvidia.com/
>
> Changes from v1: https://lore.kernel.org/all/20220512154320.19697-1-avihaih@nvidia.com/
> - Split the big patch that replaced v1 with v2 into several patches as
>    suggested by Joao, to make review easier.
> - Change warn_report to warn_report_once when container doesn't support
>    dirty tracking.
> - Add Reviewed-by tag.
>
> Avihai Horon (11):
>    vfio/migration: Fix NULL pointer dereference bug
>    vfio/migration: Skip pre-copy if dirty page tracking is not supported
>    migration/qemu-file: Add qemu_file_get_to_fd()
>    vfio/common: Change vfio_devices_all_running_and_saving() logic to
>      equivalent one
>    vfio/migration: Move migration v1 logic to vfio_migration_init()
>    vfio/migration: Rename functions/structs related to v1 protocol
>    vfio/migration: Implement VFIO migration protocol v2
>    vfio/migration: Remove VFIO migration protocol v1
>    vfio/migration: Reset device if setting recover state fails
>    vfio: Alphabetize migration section of VFIO trace-events file
>    docs/devel: Align vfio-migration docs to VFIO migration v2
>
>   docs/devel/vfio-migration.rst |  77 ++--
>   hw/vfio/common.c              |  21 +-
>   hw/vfio/migration.c           | 640 ++++++++--------------------------
>   hw/vfio/trace-events          |  25 +-
>   include/hw/vfio/vfio-common.h |   8 +-
>   migration/migration.c         |   5 +
>   migration/migration.h         |   3 +
>   migration/qemu-file.c         |  34 ++
>   migration/qemu-file.h         |   1 +
>   9 files changed, 252 insertions(+), 562 deletions(-)
>
Ping.

Alex Williamson June 7, 2022, 9:32 p.m. UTC | #2

On Tue, 7 Jun 2022 20:44:23 +0300
Avihai Horon <avihaih@nvidia.com> wrote:

> On 5/30/2022 8:07 PM, Avihai Horon wrote:
> > Hello,
> >
> > Following VFIO migration protocol v2 acceptance in kernel, this series
> > implements VFIO migration according to the new v2 protocol and replaces
> > the now deprecated v1 implementation.
> >
> > The main differences between v1 and v2 migration protocols are:
> > 1. VFIO device state is represented as a finite state machine instead of
> >     a bitmap.
> >
> > 2. The migration interface with kernel is done using VFIO_DEVICE_FEATURE
> >     ioctl and normal read() and write() instead of the migration region
> >     used in v1.
> >
> > 3. Migration protocol v2 currently doesn't support the pre-copy phase of
> >     migration.
> >
> > Full description of the v2 protocol and the differences from v1 can be
> > found here [1].
> >
> > Patches 1-3 are prep patches fixing bugs and adding QEMUFile function
> > that will be used later.
> >
> > Patches 4-6 refactor v1 protocol code to make it easier to add v2
> > protocol.
> >
> > Patches 7-11 implement v2 protocol and remove v1 protocol.
> >
> > Thanks.
> >
> > [1]
> > https://lore.kernel.org/all/20220224142024.147653-10-yishaih@nvidia.com/
> >
> > Changes from v1: https://lore.kernel.org/all/20220512154320.19697-1-avihaih@nvidia.com/
> > - Split the big patch that replaced v1 with v2 into several patches as
> >    suggested by Joao, to make review easier.
> > - Change warn_report to warn_report_once when container doesn't support
> >    dirty tracking.
> > - Add Reviewed-by tag.
> >
> > Avihai Horon (11):
> >    vfio/migration: Fix NULL pointer dereference bug
> >    vfio/migration: Skip pre-copy if dirty page tracking is not supported
> >    migration/qemu-file: Add qemu_file_get_to_fd()
> >    vfio/common: Change vfio_devices_all_running_and_saving() logic to
> >      equivalent one
> >    vfio/migration: Move migration v1 logic to vfio_migration_init()
> >    vfio/migration: Rename functions/structs related to v1 protocol
> >    vfio/migration: Implement VFIO migration protocol v2
> >    vfio/migration: Remove VFIO migration protocol v1
> >    vfio/migration: Reset device if setting recover state fails
> >    vfio: Alphabetize migration section of VFIO trace-events file
> >    docs/devel: Align vfio-migration docs to VFIO migration v2
> >
> >   docs/devel/vfio-migration.rst |  77 ++--
> >   hw/vfio/common.c              |  21 +-
> >   hw/vfio/migration.c           | 640 ++++++++--------------------------
> >   hw/vfio/trace-events          |  25 +-
> >   include/hw/vfio/vfio-common.h |   8 +-
> >   migration/migration.c         |   5 +
> >   migration/migration.h         |   3 +
> >   migration/qemu-file.c         |  34 ++
> >   migration/qemu-file.h         |   1 +
> >   9 files changed, 252 insertions(+), 562 deletions(-)
> >  
> Ping.

Based on the changelog, this seems like a mostly cosmetic spin and I
don't see that all of the discussion threads from v1 were resolved to
everyone's satisfaction.  I'm certainly still uncomfortable with the
pre-copy behavior and I thought there were still some action items to
figure out whether an SLA is present and vet the solution with
management tools.  Thanks,

Alex

Avihai Horon June 13, 2022, 11:21 a.m. UTC | #3

On 6/8/2022 12:32 AM, Alex Williamson wrote:
> External email: Use caution opening links or attachments
>
>
> On Tue, 7 Jun 2022 20:44:23 +0300
> Avihai Horon <avihaih@nvidia.com> wrote:
>
>> On 5/30/2022 8:07 PM, Avihai Horon wrote:
>>> Hello,
>>>
>>> Following VFIO migration protocol v2 acceptance in kernel, this series
>>> implements VFIO migration according to the new v2 protocol and replaces
>>> the now deprecated v1 implementation.
>>>
>>> The main differences between v1 and v2 migration protocols are:
>>> 1. VFIO device state is represented as a finite state machine instead of
>>>      a bitmap.
>>>
>>> 2. The migration interface with kernel is done using VFIO_DEVICE_FEATURE
>>>      ioctl and normal read() and write() instead of the migration region
>>>      used in v1.
>>>
>>> 3. Migration protocol v2 currently doesn't support the pre-copy phase of
>>>      migration.
>>>
>>> Full description of the v2 protocol and the differences from v1 can be
>>> found here [1].
>>>
>>> Patches 1-3 are prep patches fixing bugs and adding QEMUFile function
>>> that will be used later.
>>>
>>> Patches 4-6 refactor v1 protocol code to make it easier to add v2
>>> protocol.
>>>
>>> Patches 7-11 implement v2 protocol and remove v1 protocol.
>>>
>>> Thanks.
>>>
>>> [1]
>>> https://lore.kernel.org/all/20220224142024.147653-10-yishaih@nvidia.com/
>>>
>>> Changes from v1: https://lore.kernel.org/all/20220512154320.19697-1-avihaih@nvidia.com/
>>> - Split the big patch that replaced v1 with v2 into several patches as
>>>     suggested by Joao, to make review easier.
>>> - Change warn_report to warn_report_once when container doesn't support
>>>     dirty tracking.
>>> - Add Reviewed-by tag.
>>>
>>> Avihai Horon (11):
>>>     vfio/migration: Fix NULL pointer dereference bug
>>>     vfio/migration: Skip pre-copy if dirty page tracking is not supported
>>>     migration/qemu-file: Add qemu_file_get_to_fd()
>>>     vfio/common: Change vfio_devices_all_running_and_saving() logic to
>>>       equivalent one
>>>     vfio/migration: Move migration v1 logic to vfio_migration_init()
>>>     vfio/migration: Rename functions/structs related to v1 protocol
>>>     vfio/migration: Implement VFIO migration protocol v2
>>>     vfio/migration: Remove VFIO migration protocol v1
>>>     vfio/migration: Reset device if setting recover state fails
>>>     vfio: Alphabetize migration section of VFIO trace-events file
>>>     docs/devel: Align vfio-migration docs to VFIO migration v2
>>>
>>>    docs/devel/vfio-migration.rst |  77 ++--
>>>    hw/vfio/common.c              |  21 +-
>>>    hw/vfio/migration.c           | 640 ++++++++--------------------------
>>>    hw/vfio/trace-events          |  25 +-
>>>    include/hw/vfio/vfio-common.h |   8 +-
>>>    migration/migration.c         |   5 +
>>>    migration/migration.h         |   3 +
>>>    migration/qemu-file.c         |  34 ++
>>>    migration/qemu-file.h         |   1 +
>>>    9 files changed, 252 insertions(+), 562 deletions(-)
>>>
>> Ping.
> Based on the changelog, this seems like a mostly cosmetic spin and I
> don't see that all of the discussion threads from v1 were resolved to
> everyone's satisfaction.  I'm certainly still uncomfortable with the
> pre-copy behavior and I thought there were still some action items to
> figure out whether an SLA is present and vet the solution with
> management tools.  Thanks,

Yes.
OK, so let's clear things up and reach an agreement before I prepare the 
v3 series.

There are three topics that came up in previous discussion:

 1. [PATCH v2 01/11] vfio/migration: Fix NULL pointer dereference bug.
    Juan gave his Reviewed-by but he wasn't sure about qemu_file_* usage
    outside migration thread.
    This code existed before and I fixed a NULL pointer dereference that
    I encountered.
    I suggested that later we can refactor VMChangeStateHandler to
    return error.
    I prefer not to do this refactor right now because I am not sure
    it's as straightforward change as it might seem - if some notifier
    fails and we abort do_vm_stop/vm_prepare_start in the middle, can
    this leave the VM in some unstable state?
    We plan to leave it as is and not do the refactor as part of this
    series.
    Are you ok with this?

 2. [PATCH v2 02/11] vfio/migration: Skip pre-copy if dirty page
    tracking is not supported.
    As previously discussed, this patch doesn't consider the configured
    downtime limit.
    One way to fix it is to allow such migration only when "no SLA" (no
    downtime limit) is set. AFAIK today there is no way that one can set
    "no SLA".
    If we go with this option, we change normal flow of migration
    (skipping pre-copy) and might need to change management tools.

Instead, what about letting QEMU VFIO code mark all pages dirty (instead 
of kernel)?
This way we don’t skip pre-copy and we get the same behavior we have now 
of perpetual dirtying all RAM, which respects SLA.
If we go with this option, do we need to block migration when IOMMU is 
sPAPR TCE?
Until now migration would be blocked because sPAPR TCE doesn't report 
dirty_pages_supported cap, but going with this option we will allow 
migration even when dirty_pages_supported cap is not set (and let QEMU 
dirty all pages).

 3. [PATCH v2 03/11] migration/qemu-file: Add qemu_file_get_to_fd().
    Juan expressed his concern about the amount of data that will go
    through main migration thread.

This is already the case in v1 protocol - VFIO devices send all their 
data in the main migration thread. Note that like in v1 protocol, here 
as well the data is sent in small sized chunks, each with a header.
This patch just aims to eliminate an extra copy.

We plan to leave it as is. Is this ok?

Thanks.

Alex Williamson June 17, 2022, 9:51 p.m. UTC | #4

On Mon, 13 Jun 2022 14:21:26 +0300
Avihai Horon <avihaih@nvidia.com> wrote:

> On 6/8/2022 12:32 AM, Alex Williamson wrote:
> > External email: Use caution opening links or attachments
> >
> >
> > On Tue, 7 Jun 2022 20:44:23 +0300
> > Avihai Horon <avihaih@nvidia.com> wrote:
> >  
> >> On 5/30/2022 8:07 PM, Avihai Horon wrote:  
> >>> Hello,
> >>>
> >>> Following VFIO migration protocol v2 acceptance in kernel, this series
> >>> implements VFIO migration according to the new v2 protocol and replaces
> >>> the now deprecated v1 implementation.
> >>>
> >>> The main differences between v1 and v2 migration protocols are:
> >>> 1. VFIO device state is represented as a finite state machine instead of
> >>>      a bitmap.
> >>>
> >>> 2. The migration interface with kernel is done using VFIO_DEVICE_FEATURE
> >>>      ioctl and normal read() and write() instead of the migration region
> >>>      used in v1.
> >>>
> >>> 3. Migration protocol v2 currently doesn't support the pre-copy phase of
> >>>      migration.
> >>>
> >>> Full description of the v2 protocol and the differences from v1 can be
> >>> found here [1].
> >>>
> >>> Patches 1-3 are prep patches fixing bugs and adding QEMUFile function
> >>> that will be used later.
> >>>
> >>> Patches 4-6 refactor v1 protocol code to make it easier to add v2
> >>> protocol.
> >>>
> >>> Patches 7-11 implement v2 protocol and remove v1 protocol.
> >>>
> >>> Thanks.
> >>>
> >>> [1]
> >>> https://lore.kernel.org/all/20220224142024.147653-10-yishaih@nvidia.com/
> >>>
> >>> Changes from v1: https://lore.kernel.org/all/20220512154320.19697-1-avihaih@nvidia.com/
> >>> - Split the big patch that replaced v1 with v2 into several patches as
> >>>     suggested by Joao, to make review easier.
> >>> - Change warn_report to warn_report_once when container doesn't support
> >>>     dirty tracking.
> >>> - Add Reviewed-by tag.
> >>>
> >>> Avihai Horon (11):
> >>>     vfio/migration: Fix NULL pointer dereference bug
> >>>     vfio/migration: Skip pre-copy if dirty page tracking is not supported
> >>>     migration/qemu-file: Add qemu_file_get_to_fd()
> >>>     vfio/common: Change vfio_devices_all_running_and_saving() logic to
> >>>       equivalent one
> >>>     vfio/migration: Move migration v1 logic to vfio_migration_init()
> >>>     vfio/migration: Rename functions/structs related to v1 protocol
> >>>     vfio/migration: Implement VFIO migration protocol v2
> >>>     vfio/migration: Remove VFIO migration protocol v1
> >>>     vfio/migration: Reset device if setting recover state fails
> >>>     vfio: Alphabetize migration section of VFIO trace-events file
> >>>     docs/devel: Align vfio-migration docs to VFIO migration v2
> >>>
> >>>    docs/devel/vfio-migration.rst |  77 ++--
> >>>    hw/vfio/common.c              |  21 +-
> >>>    hw/vfio/migration.c           | 640 ++++++++--------------------------
> >>>    hw/vfio/trace-events          |  25 +-
> >>>    include/hw/vfio/vfio-common.h |   8 +-
> >>>    migration/migration.c         |   5 +
> >>>    migration/migration.h         |   3 +
> >>>    migration/qemu-file.c         |  34 ++
> >>>    migration/qemu-file.h         |   1 +
> >>>    9 files changed, 252 insertions(+), 562 deletions(-)
> >>>  
> >> Ping.  
> > Based on the changelog, this seems like a mostly cosmetic spin and I
> > don't see that all of the discussion threads from v1 were resolved to
> > everyone's satisfaction.  I'm certainly still uncomfortable with the
> > pre-copy behavior and I thought there were still some action items to
> > figure out whether an SLA is present and vet the solution with
> > management tools.  Thanks,  
> 
> Yes.
> OK, so let's clear things up and reach an agreement before I prepare the 
> v3 series.
> 
> There are three topics that came up in previous discussion:
> 
>  1. [PATCH v2 01/11] vfio/migration: Fix NULL pointer dereference bug.
>     Juan gave his Reviewed-by but he wasn't sure about qemu_file_* usage
>     outside migration thread.
>     This code existed before and I fixed a NULL pointer dereference that
>     I encountered.
>     I suggested that later we can refactor VMChangeStateHandler to
>     return error.
>     I prefer not to do this refactor right now because I am not sure
>     it's as straightforward change as it might seem - if some notifier
>     fails and we abort do_vm_stop/vm_prepare_start in the middle, can
>     this leave the VM in some unstable state?
>     We plan to leave it as is and not do the refactor as part of this
>     series.
>     Are you ok with this?

I'll defer to Juan here, it's not 100% clear to me from the last reply
if he's looking for that sooner than later.  Juan?
 
>  2. [PATCH v2 02/11] vfio/migration: Skip pre-copy if dirty page
>     tracking is not supported.
>     As previously discussed, this patch doesn't consider the configured
>     downtime limit.
>     One way to fix it is to allow such migration only when "no SLA" (no
>     downtime limit) is set. AFAIK today there is no way that one can set
>     "no SLA".
>     If we go with this option, we change normal flow of migration
>     (skipping pre-copy) and might need to change management tools.
> 
> Instead, what about letting QEMU VFIO code mark all pages dirty (instead 
> of kernel)?
> This way we don’t skip pre-copy and we get the same behavior we have now 
> of perpetual dirtying all RAM, which respects SLA.
> If we go with this option, do we need to block migration when IOMMU is 
> sPAPR TCE?
> Until now migration would be blocked because sPAPR TCE doesn't report 
> dirty_pages_supported cap, but going with this option we will allow 
> migration even when dirty_pages_supported cap is not set (and let QEMU 
> dirty all pages).

It's ok by me if QEMU vfio is the one that marks all mapped pages dirty
if the host interface provides no way to do so.  Would we toggle that
based on whether the device has bus-master enabled?

Regarding SPAPR, I'd tend to think that if we're dirtying in QEMU then
nothing prevents us from implementing the same there, but also I'm not
going to stand in the way of simply disabling migration for that IOMMU
backend unless someone speaks up that they think it deserves parity.
 
>  3. [PATCH v2 03/11] migration/qemu-file: Add qemu_file_get_to_fd().
>     Juan expressed his concern about the amount of data that will go
>     through main migration thread.
> 
> This is already the case in v1 protocol - VFIO devices send all their 
> data in the main migration thread. Note that like in v1 protocol, here 
> as well the data is sent in small sized chunks, each with a header.
> This patch just aims to eliminate an extra copy.
> 
> We plan to leave it as is. Is this ok?

I don't think we should lean too heavily on this being a bump from v1 to
v2 protocol as v1 was only ever experimental and hasn't been widely
used in practice AFAIK.  Again, I'll defer to the migration folks for
this, it requires their buy-in.  Thanks,

Alex

Jason Gunthorpe June 23, 2022, 2:56 p.m. UTC | #5

On Fri, Jun 17, 2022 at 03:51:29PM -0600, Alex Williamson wrote:

> It's ok by me if QEMU vfio is the one that marks all mapped pages dirty
> if the host interface provides no way to do so.  Would we toggle that
> based on whether the device has bus-master enabled?

I don't think so, that is a very niche optimization, it would only
happen if a device is plugged in but never used.

If a device truely doesn't have bus master capability at all then it's
VFIO migration driver should implement report dirties and report no
dirties.

> Regarding SPAPR, I'd tend to think that if we're dirtying in QEMU then
> nothing prevents us from implementing the same there, but also I'm not
> going to stand in the way of simply disabling migration for that IOMMU
> backend unless someone speaks up that they think it deserves parity.

If the VFIO device internal tracker is being used it should work with
SPAPR too.

The full algorithm should be to try to find a dirty tracker for each
VFIO migration device and if none is found then always dirty
everything at STOP_COPY.

iommufd will provide the only global dirty tracker, so if SPAPR or
legacy VFIO type1 is used without a device internal tracker then it
should do the all-dirties.

Jason

Avihai Horon June 27, 2022, 7:36 a.m. UTC | #6

On 6/18/2022 12:51 AM, Alex Williamson wrote:
> External email: Use caution opening links or attachments
>
>
> On Mon, 13 Jun 2022 14:21:26 +0300
> Avihai Horon <avihaih@nvidia.com> wrote:
>
>> On 6/8/2022 12:32 AM, Alex Williamson wrote:
>>> External email: Use caution opening links or attachments
>>>
>>>
>>> On Tue, 7 Jun 2022 20:44:23 +0300
>>> Avihai Horon <avihaih@nvidia.com> wrote:
>>>
>>>> On 5/30/2022 8:07 PM, Avihai Horon wrote:
>>>>> Hello,
>>>>>
>>>>> Following VFIO migration protocol v2 acceptance in kernel, this series
>>>>> implements VFIO migration according to the new v2 protocol and replaces
>>>>> the now deprecated v1 implementation.
>>>>>
>>>>> The main differences between v1 and v2 migration protocols are:
>>>>> 1. VFIO device state is represented as a finite state machine instead of
>>>>>       a bitmap.
>>>>>
>>>>> 2. The migration interface with kernel is done using VFIO_DEVICE_FEATURE
>>>>>       ioctl and normal read() and write() instead of the migration region
>>>>>       used in v1.
>>>>>
>>>>> 3. Migration protocol v2 currently doesn't support the pre-copy phase of
>>>>>       migration.
>>>>>
>>>>> Full description of the v2 protocol and the differences from v1 can be
>>>>> found here [1].
>>>>>
>>>>> Patches 1-3 are prep patches fixing bugs and adding QEMUFile function
>>>>> that will be used later.
>>>>>
>>>>> Patches 4-6 refactor v1 protocol code to make it easier to add v2
>>>>> protocol.
>>>>>
>>>>> Patches 7-11 implement v2 protocol and remove v1 protocol.
>>>>>
>>>>> Thanks.
>>>>>
>>>>> [1]
>>>>> https://lore.kernel.org/all/20220224142024.147653-10-yishaih@nvidia.com/
>>>>>
>>>>> Changes from v1: https://lore.kernel.org/all/20220512154320.19697-1-avihaih@nvidia.com/
>>>>> - Split the big patch that replaced v1 with v2 into several patches as
>>>>>      suggested by Joao, to make review easier.
>>>>> - Change warn_report to warn_report_once when container doesn't support
>>>>>      dirty tracking.
>>>>> - Add Reviewed-by tag.
>>>>>
>>>>> Avihai Horon (11):
>>>>>      vfio/migration: Fix NULL pointer dereference bug
>>>>>      vfio/migration: Skip pre-copy if dirty page tracking is not supported
>>>>>      migration/qemu-file: Add qemu_file_get_to_fd()
>>>>>      vfio/common: Change vfio_devices_all_running_and_saving() logic to
>>>>>        equivalent one
>>>>>      vfio/migration: Move migration v1 logic to vfio_migration_init()
>>>>>      vfio/migration: Rename functions/structs related to v1 protocol
>>>>>      vfio/migration: Implement VFIO migration protocol v2
>>>>>      vfio/migration: Remove VFIO migration protocol v1
>>>>>      vfio/migration: Reset device if setting recover state fails
>>>>>      vfio: Alphabetize migration section of VFIO trace-events file
>>>>>      docs/devel: Align vfio-migration docs to VFIO migration v2
>>>>>
>>>>>     docs/devel/vfio-migration.rst |  77 ++--
>>>>>     hw/vfio/common.c              |  21 +-
>>>>>     hw/vfio/migration.c           | 640 ++++++++--------------------------
>>>>>     hw/vfio/trace-events          |  25 +-
>>>>>     include/hw/vfio/vfio-common.h |   8 +-
>>>>>     migration/migration.c         |   5 +
>>>>>     migration/migration.h         |   3 +
>>>>>     migration/qemu-file.c         |  34 ++
>>>>>     migration/qemu-file.h         |   1 +
>>>>>     9 files changed, 252 insertions(+), 562 deletions(-)
>>>>>
>>>> Ping.
>>> Based on the changelog, this seems like a mostly cosmetic spin and I
>>> don't see that all of the discussion threads from v1 were resolved to
>>> everyone's satisfaction.  I'm certainly still uncomfortable with the
>>> pre-copy behavior and I thought there were still some action items to
>>> figure out whether an SLA is present and vet the solution with
>>> management tools.  Thanks,
>> Yes.
>> OK, so let's clear things up and reach an agreement before I prepare the
>> v3 series.
>>
>> There are three topics that came up in previous discussion:
>>
>>   1. [PATCH v2 01/11] vfio/migration: Fix NULL pointer dereference bug.
>>      Juan gave his Reviewed-by but he wasn't sure about qemu_file_* usage
>>      outside migration thread.
>>      This code existed before and I fixed a NULL pointer dereference that
>>      I encountered.
>>      I suggested that later we can refactor VMChangeStateHandler to
>>      return error.
>>      I prefer not to do this refactor right now because I am not sure
>>      it's as straightforward change as it might seem - if some notifier
>>      fails and we abort do_vm_stop/vm_prepare_start in the middle, can
>>      this leave the VM in some unstable state?
>>      We plan to leave it as is and not do the refactor as part of this
>>      series.
>>      Are you ok with this?
> I'll defer to Juan here, it's not 100% clear to me from the last reply
> if he's looking for that sooner than later.  Juan?
<snip>
>>   3. [PATCH v2 03/11] migration/qemu-file: Add qemu_file_get_to_fd().
>>      Juan expressed his concern about the amount of data that will go
>>      through main migration thread.
>>
>> This is already the case in v1 protocol - VFIO devices send all their
>> data in the main migration thread. Note that like in v1 protocol, here
>> as well the data is sent in small sized chunks, each with a header.
>> This patch just aims to eliminate an extra copy.
>>
>> We plan to leave it as is. Is this ok?
> I don't think we should lean too heavily on this being a bump from v1 to
> v2 protocol as v1 was only ever experimental and hasn't been widely
> used in practice AFAIK.  Again, I'll defer to the migration folks for
> this, it requires their buy-in.  Thanks,

Ping.
Juan, can you respond to items 1 and 3?
Thanks.

[v2,00/11] vfio/migration: Implement VFIO migration protocol v2

Message

Comments