[v5,0/8] vfio/hisilicon: add ACC live migration driver

Message ID 20220221114043.2030-1-shameerali.kolothum.thodi@huawei.com (mailing list archive)

Message

Shameerali Kolothum Thodi Feb. 21, 2022, 11:40 a.m. UTC
Hi,

This series attempts to add vfio live migration support for
HiSilicon ACC VF devices based on the new v2 migration protocol
definition and mlx5 v8 series discussed here[0].

RFCv4 --> v5
  - Dropped RFC tag as v2 migration APIs are more stable now.
  - Addressed review comments from Jason and Alex (Thanks!).

This is sanity tested on a HiSilicon platform using the Qemu branch
provided here[1].

Please take a look and let me know your feedback.

Thanks,
Shameer
[0] https://lore.kernel.org/kvm/20220220095716.153757-1-yishaih@nvidia.com/
[1] https://github.com/jgunthorpe/qemu/commits/vfio_migration_v2


v3 --> RFCv4
-Based on migration v2 protocol and mlx5 v7 series.
-Added RFC tag again as migration v2 protocol is still under discussion.
-Added new patch #6 to retrieve the PF QM data.
-PRE_COPY compatibility check is now done after the migration data
 transfer. This is not ideal and needs discussion.

RFC v2 --> v3
 -Dropped RFC tag as the vfio_pci_core subsystem framework is now
  part of 5.15-rc1.
 -Added override methods for vfio_device_ops read/write/mmap calls
  to limit the access within the functional register space.
 -Patches 1 to 3 are code refactoring to move the common ACC QM
  definitions and header around.

RFCv1 --> RFCv2

 -Adds a new vendor-specific vfio_pci driver(hisi-acc-vfio-pci)
  for HiSilicon ACC VF devices based on the new vfio-pci-core
  framework proposal.

 -Since HiSilicon ACC VF device MMIO space contains both the
  functional register space and migration control register space,
  override the vfio_device_ops ioctl method to report only the
  functional space to VMs.

 -For a successful migration, we still need access to VF dev
  functional register space mainly to read the status registers.
  But accessing these while the Guest vCPUs are running may leave
  a security hole. To avoid any potential security issues, we
  map/unmap the MMIO regions on an as-needed basis, which is safe to do.
  (Please see hisi_acc_vf_ioremap/unmap() fns in patch #4).
 
 -Dropped debugfs support for now.
 -Uses common QM functions for mailbox access(patch #3).
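
As a rough illustration of the access restriction described in the notes
above (report only the functional space, and limit read/write/mmap to it),
here is a minimal userspace C sketch. The struct, the function names, and
the assumption that the functional registers sit at the start of the BAR
are all hypothetical; the code in the series itself may differ.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical layout (an assumption for this sketch): the functional
 * registers occupy the first func_len bytes of the BAR and the migration
 * control registers follow them.
 */
struct vf_bar_layout {
	uint64_t bar_len;   /* full BAR size as seen by the host */
	uint64_t func_len;  /* part that may be exposed to the guest */
};

/* Size that a region-info query would report to userspace. */
static uint64_t vf_reported_region_size(const struct vf_bar_layout *l)
{
	assert(l->func_len <= l->bar_len);
	return l->func_len;
}

/*
 * Return 0 if [pos, pos + count) lies entirely inside the functional
 * space, -EINVAL otherwise. Written to avoid overflow in pos + count.
 */
static int vf_rw_access_check(const struct vf_bar_layout *l,
			      uint64_t pos, size_t count)
{
	uint64_t end = vf_reported_region_size(l);

	if (pos > end || count > end - pos)
		return -EINVAL;
	return 0;
}
```

The same bound is used for both the reported size and the access check, so
a guest can never be told about more space than it is allowed to touch.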

Longfang Liu (2):
  crypto: hisilicon/qm: Move few definitions to common header
  hisi_acc_vfio_pci: Add support for VFIO live migration

Shameer Kolothum (6):
  crypto: hisilicon/qm: Move the QM header to include/linux
  hisi_acc_qm: Move PCI device IDs to common header
  hisi_acc_vfio_pci: add new vfio_pci driver for HiSilicon ACC devices
  hisi_acc_vfio_pci: Restrict access to VF dev BAR2 migration region
  hisi_acc_vfio_pci: Add helper to retrieve the PF qm data
  hisi_acc_vfio_pci: Use its own PCI reset_done error handler

 drivers/crypto/hisilicon/hpre/hpre.h          |    2 +-
 drivers/crypto/hisilicon/hpre/hpre_main.c     |   18 +-
 drivers/crypto/hisilicon/qm.c                 |   34 +-
 drivers/crypto/hisilicon/sec2/sec.h           |    2 +-
 drivers/crypto/hisilicon/sec2/sec_main.c      |   20 +-
 drivers/crypto/hisilicon/sgl.c                |    2 +-
 drivers/crypto/hisilicon/zip/zip.h            |    2 +-
 drivers/crypto/hisilicon/zip/zip_main.c       |   17 +-
 drivers/vfio/pci/Kconfig                      |    2 +
 drivers/vfio/pci/Makefile                     |    2 +
 drivers/vfio/pci/hisilicon/Kconfig            |   16 +
 drivers/vfio/pci/hisilicon/Makefile           |    4 +
 .../vfio/pci/hisilicon/hisi_acc_vfio_pci.c    | 1316 +++++++++++++++++
 .../vfio/pci/hisilicon/hisi_acc_vfio_pci.h    |  119 ++
 .../qm.h => include/linux/hisi_acc_qm.h       |   44 +
 include/linux/pci_ids.h                       |    6 +
 16 files changed, 1552 insertions(+), 54 deletions(-)
 create mode 100644 drivers/vfio/pci/hisilicon/Kconfig
 create mode 100644 drivers/vfio/pci/hisilicon/Makefile
 create mode 100644 drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.c
 create mode 100644 drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.h
 rename drivers/crypto/hisilicon/qm.h => include/linux/hisi_acc_qm.h (88%)

Comments

Jason Gunthorpe Feb. 22, 2022, 12:49 a.m. UTC | #1
On Mon, Feb 21, 2022 at 11:40:35AM +0000, Shameer Kolothum wrote:
> 
> Hi,
> 
> This series attempts to add vfio live migration support for
> HiSilicon ACC VF devices based on the new v2 migration protocol
> definition and mlx5 v8 series discussed here[0].
> 
> RFCv4 --> v5
>   - Dropped RFC tag as v2 migration APIs are more stable now.
>   - Addressed review comments from Jason and Alex (Thanks!).
> 
> This is sanity tested on a HiSilicon platform using the Qemu branch
> provided here[1].
> 
> Please take a look and let me know your feedback.
> 
> Thanks,
> Shameer
> [0] https://lore.kernel.org/kvm/20220220095716.153757-1-yishaih@nvidia.com/
> [1] https://github.com/jgunthorpe/qemu/commits/vfio_migration_v2
> 
> 
> v3 --> RFCv4
> -Based on migration v2 protocol and mlx5 v7 series.
> -Added RFC tag again as migration v2 protocol is still under discussion.
> -Added new patch #6 to retrieve the PF QM data.
> -PRE_COPY compatibility check is now done after the migration data
>  transfer. This is not ideal and needs discussion.

Alex, do you want to keep the PRE_COPY in just for acc for now? Or do
you think this is not a good temporary use for it?

We have some work toward doing the compatibility more generally, but I
think it will be a while before that is all settled.

Jason

Alex Williamson Feb. 22, 2022, 7:29 p.m. UTC | #2
On Mon, 21 Feb 2022 20:49:43 -0400
Jason Gunthorpe <jgg@nvidia.com> wrote:

> On Mon, Feb 21, 2022 at 11:40:35AM +0000, Shameer Kolothum wrote:
> > 
> > Hi,
> > 
> > This series attempts to add vfio live migration support for
> > HiSilicon ACC VF devices based on the new v2 migration protocol
> > definition and mlx5 v8 series discussed here[0].
> > 
> > RFCv4 --> v5
> >   - Dropped RFC tag as v2 migration APIs are more stable now.
> >   - Addressed review comments from Jason and Alex (Thanks!).
> > 
> > This is sanity tested on a HiSilicon platform using the Qemu branch
> > provided here[1].
> > 
> > Please take a look and let me know your feedback.
> > 
> > Thanks,
> > Shameer
> > [0] https://lore.kernel.org/kvm/20220220095716.153757-1-yishaih@nvidia.com/
> > [1] https://github.com/jgunthorpe/qemu/commits/vfio_migration_v2
> > 
> > 
> > v3 --> RFCv4
> > -Based on migration v2 protocol and mlx5 v7 series.
> > -Added RFC tag again as migration v2 protocol is still under discussion.
> > -Added new patch #6 to retrieve the PF QM data.
> > -PRE_COPY compatibility check is now done after the migration data
> >  transfer. This is not ideal and needs discussion.  
> 
> Alex, do you want to keep the PRE_COPY in just for acc for now? Or do
> you think this is not a good temporary use for it?
> 
> We have some work toward doing the compatibility more generally, but I
> think it will be a while before that is all settled.

In the original migration protocol I recall that we discussed using
the pre-copy phase for compatibility testing, even without additional
device data, as a valid use case.  The migration driver of
course needs to account for the fact that userspace is not required to
perform a pre-copy, and therefore cannot rely on that exclusively for
compatibility testing, but failing a migration earlier due to detection
of an incompatibility is generally a good thing.

If the ACC driver wants to re-incorporate this behavior into a non-RFC
proposed series and we could align accepting them into the same kernel
release, that sounds ok to me.  Thanks,

Alex

Shameerali Kolothum Thodi Feb. 23, 2022, 3:53 p.m. UTC | #3
> -----Original Message-----
> From: Alex Williamson [mailto:alex.williamson@redhat.com]
> Sent: 22 February 2022 19:30
> To: Jason Gunthorpe <jgg@nvidia.com>
> Cc: Shameerali Kolothum Thodi <shameerali.kolothum.thodi@huawei.com>;
> kvm@vger.kernel.org; linux-kernel@vger.kernel.org;
> linux-crypto@vger.kernel.org; cohuck@redhat.com; mgurtovoy@nvidia.com;
> yishaih@nvidia.com; Linuxarm <linuxarm@huawei.com>; liulongfang
> <liulongfang@huawei.com>; Zengtao (B) <prime.zeng@hisilicon.com>;
> Jonathan Cameron <jonathan.cameron@huawei.com>; Wangzhou (B)
> <wangzhou1@hisilicon.com>
> Subject: Re: [PATCH v5 0/8] vfio/hisilicon: add ACC live migration driver
> 
> On Mon, 21 Feb 2022 20:49:43 -0400
> Jason Gunthorpe <jgg@nvidia.com> wrote:
> 
> > On Mon, Feb 21, 2022 at 11:40:35AM +0000, Shameer Kolothum wrote:
> > >
> > > Hi,
> > >
> > > This series attempts to add vfio live migration support for
> > > HiSilicon ACC VF devices based on the new v2 migration protocol
> > > definition and mlx5 v8 series discussed here[0].
> > >
> > > RFCv4 --> v5
> > >   - Dropped RFC tag as v2 migration APIs are more stable now.
> > >   - Addressed review comments from Jason and Alex (Thanks!).
> > >
> > > This is sanity tested on a HiSilicon platform using the Qemu branch
> > > provided here[1].
> > >
> > > Please take a look and let me know your feedback.
> > >
> > > Thanks,
> > > Shameer
> > > [0]
> https://lore.kernel.org/kvm/20220220095716.153757-1-yishaih@nvidia.com/
> > > [1] https://github.com/jgunthorpe/qemu/commits/vfio_migration_v2
> > >
> > >
> > > v3 --> RFCv4
> > > -Based on migration v2 protocol and mlx5 v7 series.
> > > -Added RFC tag again as migration v2 protocol is still under discussion.
> > > -Added new patch #6 to retrieve the PF QM data.
> > > -PRE_COPY compatibility check is now done after the migration data
> > >  transfer. This is not ideal and needs discussion.
> >
> > Alex, do you want to keep the PRE_COPY in just for acc for now? Or do
> > you think this is not a good temporary use for it?
> >
> > We have some work toward doing the compatibility more generally, but I
> > think it will be a while before that is all settled.
> 
> In the original migration protocol I recall that we discussed using
> the pre-copy phase for compatibility testing, even without additional
> device data, as a valid use case.  The migration driver of
> course needs to account for the fact that userspace is not required to
> perform a pre-copy, and therefore cannot rely on that exclusively for
> compatibility testing, but failing a migration earlier due to detection
> of an incompatibility is generally a good thing.
> 
> If the ACC driver wants to re-incorporate this behavior into a non-RFC
> proposed series and we could align accepting them into the same kernel
> release, that sounds ok to me.  Thanks,

Ok. I will add support for PRE_COPY and check compatibility early.

From an FSM arc point of view, I guess that means adding:

STATE_RUNNING --> STATE_PRE_COPY
   create the saving file.
   get_match_data();
   return fd;

STATE_PRE_COPY  --> STATE_STOP_COPY
   stop_device()
   get_device_data()
   update the saving migf total_len;

resume_write()
   check compatibility once we have enough bytes.
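
The arcs above can be sketched as a small userspace C model. This is only
an illustration under stated assumptions: struct match_data, struct
mig_file, vf_set_state() and vf_resume_write() are hypothetical names, the
match data is assumed to sit at the start of the migration stream, and
arcs other than the two listed are simply rejected.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DEV_DATA_LEN 64 /* placeholder size for the device state */

enum mig_state { STATE_RUNNING, STATE_PRE_COPY, STATE_STOP_COPY };

/* Hypothetical identity used for the compatibility check. */
struct match_data { uint32_t dev_id; uint32_t qm_version; };

/* Saving file: match data first, device data appended later. */
struct mig_file {
	struct match_data match;
	uint8_t dev_data[DEV_DATA_LEN];
	size_t total_len;
};

struct vf_dev {
	enum mig_state state;
	struct match_data ident;
	int stopped;
	struct mig_file saving;
};

/* Source side: only the two arcs from the description are modelled. */
static int vf_set_state(struct vf_dev *vf, enum mig_state new_state)
{
	assert(vf != NULL);
	if (vf->state == STATE_RUNNING && new_state == STATE_PRE_COPY) {
		/* create the saving file and fill in the match data */
		memset(&vf->saving, 0, sizeof(vf->saving));
		vf->saving.match = vf->ident;
		vf->saving.total_len = sizeof(vf->saving.match);
	} else if (vf->state == STATE_PRE_COPY &&
		   new_state == STATE_STOP_COPY) {
		/* stop the device, capture its state, update total_len */
		vf->stopped = 1;
		memset(vf->saving.dev_data, 0xAB, DEV_DATA_LEN);
		vf->saving.total_len += DEV_DATA_LEN;
	} else {
		return -EINVAL;
	}
	vf->state = new_state;
	return 0;
}

/* Destination side: buffer incoming bytes, check compatibility early. */
struct resume_ctx {
	struct match_data local;
	uint8_t buf[sizeof(struct match_data) + DEV_DATA_LEN];
	size_t filled;
	int checked;
};

static int vf_resume_write(struct resume_ctx *rc, const void *data,
			   size_t len)
{
	assert(rc != NULL);
	if (len > sizeof(rc->buf) - rc->filled)
		return -ENOSPC;
	memcpy(rc->buf + rc->filled, data, len);
	rc->filled += len;

	/* check compatibility once we have enough bytes */
	if (!rc->checked && rc->filled >= sizeof(struct match_data)) {
		struct match_data m;

		memcpy(&m, rc->buf, sizeof(m));
		if (m.dev_id != rc->local.dev_id ||
		    m.qm_version != rc->local.qm_version)
			return -EINVAL;
		rc->checked = 1;
	}
	return 0;
}
```

Because the check lives in the resume path rather than in the pre-copy arc
itself, it still runs when the data only arrives at STOP_COPY time; the
pre-copy phase just lets an incompatibility be detected earlier.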

Also add support for the VFIO_DEVICE_MIG_PRECOPY ioctl.

I will have a go and send out a revised one.

Thanks,
Shameer