Message ID | 20240306135855.4123535-1-xin.zeng@intel.com (mailing list archive) |
---|---|
Series | crypto: qat - enable QAT GEN4 SRIOV VF live migration for QAT GEN4 |
On Wed, Mar 06, 2024 at 09:58:45PM +0800, Xin Zeng wrote:
> This set enables live migration for Intel QAT GEN4 SRIOV Virtual
> Functions (VFs).
> It is composed of 10 patches. Patch 1~6 refactor the original QAT PF
> driver implementation which will be reused by the following patches.
> Patch 7 introduces the logic to the QAT PF driver that allows to save
> and restore the state of a bank (a QAT VF is a wrapper around banks) and
> drain a ring pair. Patch 8 adds the QAT PF driver a set of interfaces to
> allow to save and restore the state of a VF that will be called by the
> module qat_vfio_pci which will be introduced in the last patch. Patch 9
> implements the defined device interfaces. The last one adds a vfio pci
> extension specific for QAT which intercepts the vfio device operations
> for a QAT VF to allow live migration.
>
> Here are the steps required to test the live migration of a QAT GEN4 VF:
> 1. Bind one or more QAT GEN4 VF devices to the module qat_vfio_pci.ko
> 2. Assign the VFs to the virtual machine and enable device live
>    migration
> 3. Run a workload using a QAT VF inside the VM, for example using qatlib
>    (https://github.com/intel/qatlib)
> 4. Migrate the VM from the source node to a destination node
>
> Changes in v5 since v4: https://lore.kernel.org/kvm/20240228143402.89219-9-xin.zeng@intel.com
> - Remove device ID recheck as no consensus has been reached yet (Kevin)
> - Add missing state PRE_COPY_P2P in precopy_ioctl (Kevin)
> - Rearrange the state transition flow for better readability (Kevin)
> - Remove unnecessary Reviewed-by in commit message (Kevin)
>
> Changes in v4 since v3: https://lore.kernel.org/kvm/20240221155008.960369-11-xin.zeng@intel.com
> - Change the order of maintainer entry for QAT vfio pci driver in
>   MAINTAINERS to make it alphabetical (Alex)
> - Put QAT VFIO PCI driver under vfio/pci directly instead of
>   vfio/pci/intel (Alex)
> - Add id_table recheck during device probe (Alex)
>
> Changes in v3 since v2: https://lore.kernel.org/kvm/20240220032052.66834-1-xin.zeng@intel.com
> - Use state_mutex directly instead of unnecessary deferred_reset mode
>   (Jason)
>
> Changes in v2 since v1: https://lore.kernel.org/all/20240201153337.4033490-1-xin.zeng@intel.com
> - Add VFIO_MIGRATION_PRE_COPY support (Alex)
> - Remove unnecessary module dependency in Kconfig (Alex)
> - Use direct function calls instead of function pointers in qat vfio
>   variant driver (Jason)
> - Address the comments including unnecessary pointer check and kfree,
>   missing lock and direct use of pci_iov_vf_id (Shameer)
> - Change CHECK_STAT macro to avoid repeat comparison (Kamlesh)
>
> Changes in v1 since RFC: https://lore.kernel.org/all/20230630131304.64243-1-xin.zeng@intel.com
> - Address comments including the right module dependency in Kconfig,
>   source file name and module description (Alex)
> - Added PCI error handler and P2P state handler (Suggested by Kevin)
> - Refactor the state check during loading ring state (Kevin)
> - Fix missed call to vfio_put_device in the error case (Breet)
> - Migrate the shadow states in PF driver
> - Rebase on top of 6.8-rc1
>
> Giovanni Cabiddu (2):
>   crypto: qat - adf_get_etr_base() helper
>   crypto: qat - relocate CSR access code
>
> Siming Wan (3):
>   crypto: qat - rename get_sla_arr_of_type()
>   crypto: qat - expand CSR operations for QAT GEN4 devices
>   crypto: qat - add bank save and restore flows
>
> Xin Zeng (5):
>   crypto: qat - relocate and rename 4xxx PF2VM definitions
>   crypto: qat - move PFVF compat checker to a function
>   crypto: qat - add interface for live migration
>   crypto: qat - implement interface for live migration
>   vfio/qat: Add vfio_pci driver for Intel QAT VF devices
>
>  MAINTAINERS                                   |    8 +
>  .../intel/qat/qat_420xx/adf_420xx_hw_data.c   |    3 +
>  .../intel/qat/qat_4xxx/adf_4xxx_hw_data.c     |    5 +
>  .../intel/qat/qat_c3xxx/adf_c3xxx_hw_data.c   |    1 +
>  .../qat/qat_c3xxxvf/adf_c3xxxvf_hw_data.c     |    1 +
>  .../intel/qat/qat_c62x/adf_c62x_hw_data.c     |    1 +
>  .../intel/qat/qat_c62xvf/adf_c62xvf_hw_data.c |    1 +
>  drivers/crypto/intel/qat/qat_common/Makefile  |    6 +-
>  .../intel/qat/qat_common/adf_accel_devices.h  |   88 ++
>  .../intel/qat/qat_common/adf_common_drv.h     |   10 +
>  .../qat/qat_common/adf_gen2_hw_csr_data.c     |  101 ++
>  .../qat/qat_common/adf_gen2_hw_csr_data.h     |   86 ++
>  .../intel/qat/qat_common/adf_gen2_hw_data.c   |   97 --
>  .../intel/qat/qat_common/adf_gen2_hw_data.h   |   76 --
>  .../qat/qat_common/adf_gen4_hw_csr_data.c     |  231 ++++
>  .../qat/qat_common/adf_gen4_hw_csr_data.h     |  188 +++
>  .../intel/qat/qat_common/adf_gen4_hw_data.c   |  380 +++++--
>  .../intel/qat/qat_common/adf_gen4_hw_data.h   |  127 +--
>  .../intel/qat/qat_common/adf_gen4_pfvf.c      |    8 +-
>  .../intel/qat/qat_common/adf_gen4_vf_mig.c    | 1010 +++++++++++++++++
>  .../intel/qat/qat_common/adf_gen4_vf_mig.h    |   10 +
>  .../intel/qat/qat_common/adf_mstate_mgr.c     |  318 ++++++
>  .../intel/qat/qat_common/adf_mstate_mgr.h     |   89 ++
>  .../intel/qat/qat_common/adf_pfvf_pf_proto.c  |    8 +-
>  .../intel/qat/qat_common/adf_pfvf_utils.h     |   11 +
>  drivers/crypto/intel/qat/qat_common/adf_rl.c  |   10 +-
>  drivers/crypto/intel/qat/qat_common/adf_rl.h  |    2 +
>  .../crypto/intel/qat/qat_common/adf_sriov.c   |    7 +-
>  .../intel/qat/qat_common/adf_transport.c      |    4 +-
>  .../crypto/intel/qat/qat_common/qat_mig_dev.c |  130 +++
>  .../qat/qat_dh895xcc/adf_dh895xcc_hw_data.c   |    1 +
>  .../qat_dh895xccvf/adf_dh895xccvf_hw_data.c   |    1 +
>  drivers/vfio/pci/Kconfig                      |    2 +
>  drivers/vfio/pci/Makefile                     |    2 +
>  drivers/vfio/pci/qat/Kconfig                  |   12 +
>  drivers/vfio/pci/qat/Makefile                 |    3 +
>  drivers/vfio/pci/qat/main.c                   |  662 +++++++++++
>  include/linux/qat/qat_mig_dev.h               |   31 +
>  38 files changed, 3344 insertions(+), 387 deletions(-)
>  create mode 100644 drivers/crypto/intel/qat/qat_common/adf_gen2_hw_csr_data.c
>  create mode 100644 drivers/crypto/intel/qat/qat_common/adf_gen2_hw_csr_data.h
>  create mode 100644 drivers/crypto/intel/qat/qat_common/adf_gen4_hw_csr_data.c
>  create mode 100644 drivers/crypto/intel/qat/qat_common/adf_gen4_hw_csr_data.h
>  create mode 100644 drivers/crypto/intel/qat/qat_common/adf_gen4_vf_mig.c
>  create mode 100644 drivers/crypto/intel/qat/qat_common/adf_gen4_vf_mig.h
>  create mode 100644 drivers/crypto/intel/qat/qat_common/adf_mstate_mgr.c
>  create mode 100644 drivers/crypto/intel/qat/qat_common/adf_mstate_mgr.h
>  create mode 100644 drivers/crypto/intel/qat/qat_common/qat_mig_dev.c
>  create mode 100644 drivers/vfio/pci/qat/Kconfig
>  create mode 100644 drivers/vfio/pci/qat/Makefile
>  create mode 100644 drivers/vfio/pci/qat/main.c
>  create mode 100644 include/linux/qat/qat_mig_dev.h
>
>
> base-commit: 318407ed77e4140d02e43a001b1f4753e3ce6b5f
> --
> 2.18.2

Patches 1-9 applied.  Thanks.
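Step 1 of the test procedure in the cover letter (binding a QAT GEN4 VF to qat_vfio_pci) is commonly done through the PCI sysfs `driver_override` mechanism. The following is a minimal sketch of that sequence, not something taken from the series itself. It assumes the driver registers under the same name as the module (`qat_vfio_pci`), and the PCI address `0000:6b:00.1` is a hypothetical placeholder for a real VF address. `bind_plan()` only computes the writes; `apply_plan()` performs them and needs root.

```python
import os
from pathlib import PurePosixPath
from typing import List, Tuple

SYSFS_PCI = PurePosixPath("/sys/bus/pci")


def bind_plan(bdf: str, driver: str = "qat_vfio_pci") -> List[Tuple[str, str]]:
    """Return the (sysfs path, value) writes that rebind one PCI function.

    The driver name is an assumption (module name reused as driver name).
    Nothing is written here; the caller decides whether to apply the plan.
    """
    dev = SYSFS_PCI / "devices" / bdf
    return [
        # Detach the VF from whatever driver currently owns it.
        (str(dev / "driver" / "unbind"), bdf),
        # Restrict matching so that only the named driver may claim the VF.
        (str(dev / "driver_override"), driver),
        # Ask the PCI core to probe the device again.
        (str(SYSFS_PCI / "drivers_probe"), bdf),
    ]


def apply_plan(plan: List[Tuple[str, str]]) -> None:
    """Perform the writes from bind_plan(); requires root privileges."""
    for path, value in plan:
        # The unbind file only exists while a driver is bound to the device.
        if path.endswith("/unbind") and not os.path.exists(path):
            continue
        with open(path, "w") as f:
            f.write(value)


if __name__ == "__main__":
    # Hypothetical VF address; print the equivalent shell commands.
    for path, value in bind_plan("0000:6b:00.1"):
        print(f"echo {value} > {path}")
```

Keeping the plan separate from its application makes it possible to review the exact writes before touching sysfs on a production host.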
On Thu, 28 Mar 2024 18:51:41 +0800 Herbert Xu <herbert@gondor.apana.org.au> wrote:
> On Wed, Mar 06, 2024 at 09:58:45PM +0800, Xin Zeng wrote:
> > This set enables live migration for Intel QAT GEN4 SRIOV Virtual
> > Functions (VFs).
> > [...]
>
> Patches 1-9 applied.  Thanks.

Hi Herbert,

Would you mind making a branch available for those in anticipation of
the qat vfio variant driver itself being merged through the vfio tree?
Thanks,

Alex
On Thu, Mar 28, 2024 at 09:03:49AM -0600, Alex Williamson wrote:
>
> Would you mind making a branch available for those in anticipation of
> the qat vfio variant driver itself being merged through the vfio tree?
> Thanks,

OK, I've just pushed out a vfio branch.  Please take a look to
see if I messed anything up.

Cheers,
Hi Alex,

On Tue, Apr 02, 2024 at 10:52:06AM +0800, Herbert Xu wrote:
> On Thu, Mar 28, 2024 at 09:03:49AM -0600, Alex Williamson wrote:
> >
> > Would you mind making a branch available for those in anticipation of
> > the qat vfio variant driver itself being merged through the vfio tree?
> > Thanks,
>
> OK, I've just pushed out a vfio branch.  Please take a look to
> see if I messed anything up.
What are the next steps here?

Shall we re-send the patch `vfio/qat: Add vfio_pci driver for Intel QAT
VF devices` rebased against vfio-next?
Or, wait for you to merge the branch from Herbert, then rebase and re-send?
Or, are you going to take the patch that was sent to the mailing list as is
and handle the rebase? (There is only a small conflict to sort on the
makefiles).

Thanks,
On Fri, 12 Apr 2024 15:19:14 +0100 "Cabiddu, Giovanni" <giovanni.cabiddu@intel.com> wrote:
> Hi Alex,
>
> On Tue, Apr 02, 2024 at 10:52:06AM +0800, Herbert Xu wrote:
> > On Thu, Mar 28, 2024 at 09:03:49AM -0600, Alex Williamson wrote:
> > >
> > > Would you mind making a branch available for those in anticipation of
> > > the qat vfio variant driver itself being merged through the vfio tree?
> > > Thanks,
> >
> > OK, I've just pushed out a vfio branch.  Please take a look to
> > see if I messed anything up.
> What are the next steps here?
>
> Shall we re-send the patch `vfio/qat: Add vfio_pci driver for Intel QAT
> VF devices` rebased against vfio-next?
> Or, wait for you to merge the branch from Herbert, then rebase and re-send?
> Or, are you going to take the patch that was sent to the mailing list as is
> and handle the rebase? (There is only a small conflict to sort on the
> makefiles).

Hi Giovanni,

The code itself looks fine to me, the Makefile conflict is trivial,
MAINTAINERS also requires a trivial re-ordering to keep it alphabetical
now that virtio-vfio-pci is merged.

The only thing I spot that could use some attention is the
documentation, where our acceptance criteria requests:

  Additionally, drivers should make an attempt to provide sufficient
  documentation for reviewers to understand the device specific
  extensions, for example in the case of migration data, how is the
  device state composed and consumed, which portions are not otherwise
  available to the user via vfio-pci, what safeguards exist to validate
  the data, etc.

A lot of the code here is very similar in flow to the other migration
drivers, but I think it would be good to address some of the topics
above in comments throughout the driver.  For example, how does the
driver address P2P states, what information is provided in PRE_COPY,
how is versioning handled, is user sensitive data included in the
device migration data, typical ranges of device migration data size,
etc.

Kevin might have an edge in understanding the theory of operation here
already and documenting the interesting aspects of the driver in
comments might drive a little more engagement.  Thanks,

Alex
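For readers following the documentation questions above (how versioning is handled, what safeguards validate the data), here is a minimal sketch of one generic way a migration data stream can carry a version and be checked before use. It is purely illustrative and does not describe the QAT driver's actual layout: the magic value, header fields, CRC32 checksum and section framing are all hypothetical choices made for this example.

```python
import struct
import zlib
from typing import Dict

MAGIC = 0x4D494731               # hypothetical stream tag, not a QAT constant
VERSION = 1                      # hypothetical format version
HDR = struct.Struct("<IHHI")     # magic, version, section count, payload CRC32
SECT = struct.Struct("<II")      # section id, section length in bytes


def pack_state(sections: Dict[int, bytes]) -> bytes:
    """Serialize {section id: data} into one versioned, checksummed blob."""
    payload = b"".join(
        SECT.pack(sid, len(data)) + data for sid, data in sections.items()
    )
    return HDR.pack(MAGIC, VERSION, len(sections), zlib.crc32(payload)) + payload


def unpack_state(blob: bytes, max_version: int = VERSION) -> Dict[int, bytes]:
    """Validate a blob and return its sections; raise ValueError on any defect."""
    if len(blob) < HDR.size:
        raise ValueError("truncated header")
    magic, version, count, crc = HDR.unpack_from(blob)
    if magic != MAGIC:
        raise ValueError("bad magic")
    if version > max_version:
        raise ValueError("unsupported version")
    payload = blob[HDR.size:]
    if zlib.crc32(payload) != crc:
        raise ValueError("checksum mismatch")
    sections: Dict[int, bytes] = {}
    off = 0
    for _ in range(count):
        if off + SECT.size > len(payload):
            raise ValueError("truncated section header")
        sid, length = SECT.unpack_from(payload, off)
        off += SECT.size
        # Reject a section whose declared length runs past the blob.
        if off + length > len(payload):
            raise ValueError("section overruns blob")
        sections[sid] = payload[off:off + length]
        off += length
    if off != len(payload):
        raise ValueError("trailing bytes after last section")
    return sections
```

The point of the sketch is the order of checks on the receiving side: the version and integrity of the stream are verified first, and every section length is bounds-checked before any data is consumed.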