Message ID | 20220221114043.2030-1-shameerali.kolothum.thodi@huawei.com (mailing list archive) |
---|---|
Series | vfio/hisilicon: add ACC live migration driver |
On Mon, Feb 21, 2022 at 11:40:35AM +0000, Shameer Kolothum wrote:
>
> Hi,
>
> This series attempts to add vfio live migration support for
> HiSilicon ACC VF devices based on the new v2 migration protocol
> definition and mlx5 v8 series discussed here[0].
>
> RFCv4 --> v5
> - Dropped RFC tag as v2 migration APIs are more stable now.
> - Addressed review comments from Jason and Alex (Thanks!).
>
> This is sanity tested on a HiSilicon platform using the Qemu branch
> provided here[1].
>
> Please take a look and let me know your feedback.
>
> Thanks,
> Shameer
> [0] https://lore.kernel.org/kvm/20220220095716.153757-1-yishaih@nvidia.com/
> [1] https://github.com/jgunthorpe/qemu/commits/vfio_migration_v2
>
>
> v3 --> RFCv4
> -Based on migration v2 protocol and mlx5 v7 series.
> -Added RFC tag again as migration v2 protocol is still under discussion.
> -Added new patch #6 to retrieve the PF QM data.
> -PRE_COPY compatibility check is now done after the migration data
>  transfer. This is not ideal and needs discussion.

Alex, do you want to keep the PRE_COPY in just for acc for now? Or do
you think this is not a good temporary use for it?

We have some work toward doing the compatibility more generally, but I
think it will be a while before that is all settled.

Jason

On Mon, 21 Feb 2022 20:49:43 -0400
Jason Gunthorpe <jgg@nvidia.com> wrote:

> On Mon, Feb 21, 2022 at 11:40:35AM +0000, Shameer Kolothum wrote:
> >
> > Hi,
> >
> > This series attempts to add vfio live migration support for
> > HiSilicon ACC VF devices based on the new v2 migration protocol
> > definition and mlx5 v8 series discussed here[0].
> >
> > RFCv4 --> v5
> > - Dropped RFC tag as v2 migration APIs are more stable now.
> > - Addressed review comments from Jason and Alex (Thanks!).
> >
> > This is sanity tested on a HiSilicon platform using the Qemu branch
> > provided here[1].
> >
> > Please take a look and let me know your feedback.
> >
> > Thanks,
> > Shameer
> > [0] https://lore.kernel.org/kvm/20220220095716.153757-1-yishaih@nvidia.com/
> > [1] https://github.com/jgunthorpe/qemu/commits/vfio_migration_v2
> >
> >
> > v3 --> RFCv4
> > -Based on migration v2 protocol and mlx5 v7 series.
> > -Added RFC tag again as migration v2 protocol is still under discussion.
> > -Added new patch #6 to retrieve the PF QM data.
> > -PRE_COPY compatibility check is now done after the migration data
> >  transfer. This is not ideal and needs discussion.
>
> Alex, do you want to keep the PRE_COPY in just for acc for now? Or do
> you think this is not a good temporary use for it?
>
> We have some work toward doing the compatibility more generally, but I
> think it will be a while before that is all settled.

In the original migration protocol I recall that we discussed using
the pre-copy phase for compatibility testing, even without additional
device data, as a valid use case. The migration driver of course needs
to account for the fact that userspace is not required to perform a
pre-copy, and therefore cannot rely on that exclusively for
compatibility testing, but failing a migration earlier due to detection
of an incompatibility is generally a good thing.

If the ACC driver wants to re-incorporate this behavior into a non-RFC
proposed series and we could align accepting them into the same kernel
release, that sounds ok to me. Thanks,

Alex

> -----Original Message-----
> From: Alex Williamson [mailto:alex.williamson@redhat.com]
> Sent: 22 February 2022 19:30
> To: Jason Gunthorpe <jgg@nvidia.com>
> Cc: Shameerali Kolothum Thodi <shameerali.kolothum.thodi@huawei.com>;
> kvm@vger.kernel.org; linux-kernel@vger.kernel.org;
> linux-crypto@vger.kernel.org; cohuck@redhat.com; mgurtovoy@nvidia.com;
> yishaih@nvidia.com; Linuxarm <linuxarm@huawei.com>; liulongfang
> <liulongfang@huawei.com>; Zengtao (B) <prime.zeng@hisilicon.com>;
> Jonathan Cameron <jonathan.cameron@huawei.com>; Wangzhou (B)
> <wangzhou1@hisilicon.com>
> Subject: Re: [PATCH v5 0/8] vfio/hisilicon: add ACC live migration driver
>
> On Mon, 21 Feb 2022 20:49:43 -0400
> Jason Gunthorpe <jgg@nvidia.com> wrote:
>
> > On Mon, Feb 21, 2022 at 11:40:35AM +0000, Shameer Kolothum wrote:
> > >
> > > Hi,
> > >
> > > This series attempts to add vfio live migration support for
> > > HiSilicon ACC VF devices based on the new v2 migration protocol
> > > definition and mlx5 v8 series discussed here[0].
> > >
> > > RFCv4 --> v5
> > > - Dropped RFC tag as v2 migration APIs are more stable now.
> > > - Addressed review comments from Jason and Alex (Thanks!).
> > >
> > > This is sanity tested on a HiSilicon platform using the Qemu branch
> > > provided here[1].
> > >
> > > Please take a look and let me know your feedback.
> > >
> > > Thanks,
> > > Shameer
> > > [0] https://lore.kernel.org/kvm/20220220095716.153757-1-yishaih@nvidia.com/
> > > [1] https://github.com/jgunthorpe/qemu/commits/vfio_migration_v2
> > >
> > >
> > > v3 --> RFCv4
> > > -Based on migration v2 protocol and mlx5 v7 series.
> > > -Added RFC tag again as migration v2 protocol is still under discussion.
> > > -Added new patch #6 to retrieve the PF QM data.
> > > -PRE_COPY compatibility check is now done after the migration data
> > >  transfer. This is not ideal and needs discussion.
> >
> > Alex, do you want to keep the PRE_COPY in just for acc for now? Or do
> > you think this is not a good temporary use for it?
> >
> > We have some work toward doing the compatibility more generally, but I
> > think it will be a while before that is all settled.
>
> In the original migration protocol I recall that we discussed using
> the pre-copy phase for compatibility testing, even without additional
> device data, as a valid use case. The migration driver of course needs
> to account for the fact that userspace is not required to perform a
> pre-copy, and therefore cannot rely on that exclusively for
> compatibility testing, but failing a migration earlier due to detection
> of an incompatibility is generally a good thing.
>
> If the ACC driver wants to re-incorporate this behavior into a non-RFC
> proposed series and we could align accepting them into the same kernel
> release, that sounds ok to me. Thanks,

Ok. I will add support for PRE_COPY and check compatibility early.

From the FSM arc point of view, I guess it is adding:

STATE_RUNNING --> STATE_PRE_COPY
    create the saving file.
    get_match_data();
    return fd;

STATE_PRE_COPY --> STATE_STOP_COPY
    stop_device()
    get_device_data()
    update the saving migf total_len;

resume_write()
    check compatibility once we have enough bytes.

Also add support for the IOCTL VFIO_DEVICE_MIG_PRECOPY.

I will have a go and send out a revised one.

Thanks,
Shameer
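
The arcs outlined in the last mail can be sketched in plain userspace C. This is only an illustration under stated assumptions, not the driver's code: `struct acc_dev`, `struct mig_file`, `struct match_data`, the `MIG_*` states, `set_state()`, the buffer sizes and the field layouts are all hypothetical. Only the step names (`get_match_data()`, `get_device_data()`, `resume_write()`, `total_len`) come from the mail. A direct RUNNING to STOP_COPY arc is included because, as noted earlier in the thread, userspace is not required to perform a pre-copy.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical device states, loosely named after the v2 protocol states. */
enum mig_state { MIG_RUNNING, MIG_PRE_COPY, MIG_STOP_COPY };

/* Hypothetical identity header placed first in the migration stream. */
struct match_data {
    unsigned int dev_id;
    unsigned int qp_num;
};

/* Hypothetical stand-in for the saving/resuming migration file. */
struct mig_file {
    unsigned char buf[256];
    size_t total_len;   /* number of valid bytes in buf */
    int checked;        /* resume side: header already validated */
};

struct acc_dev {
    enum mig_state state;
    struct match_data ident;
    unsigned char dev_state[32];   /* placeholder for real device state */
    int stopped;
    struct mig_file migf;
};

/* Put the match header at the start of a fresh saving file. */
static void get_match_data(struct acc_dev *dev)
{
    memset(&dev->migf, 0, sizeof(dev->migf));
    memcpy(dev->migf.buf, &dev->ident, sizeof(dev->ident));
    dev->migf.total_len = sizeof(dev->ident);
}

/* Append the device state and update total_len. */
static void get_device_data(struct acc_dev *dev)
{
    struct mig_file *migf = &dev->migf;

    memcpy(migf->buf + migf->total_len, dev->dev_state, sizeof(dev->dev_state));
    migf->total_len += sizeof(dev->dev_state);
}

/* Walk one FSM arc; returns 0 on success or -EINVAL for an unsupported arc. */
int set_state(struct acc_dev *dev, enum mig_state next)
{
    assert(dev != NULL);

    if (dev->state == MIG_RUNNING && next == MIG_PRE_COPY) {
        get_match_data(dev);            /* header only, device keeps running */
    } else if (dev->state == MIG_PRE_COPY && next == MIG_STOP_COPY) {
        dev->stopped = 1;               /* stands in for stop_device() */
        get_device_data(dev);
    } else if (dev->state == MIG_RUNNING && next == MIG_STOP_COPY) {
        get_match_data(dev);            /* pre-copy skipped: still emit header */
        dev->stopped = 1;
        get_device_data(dev);
    } else {
        return -EINVAL;
    }
    dev->state = next;
    return 0;
}

/*
 * Destination side: buffer incoming bytes and check compatibility as soon
 * as a full header has arrived. Returns bytes consumed or a negative errno.
 */
long resume_write(struct acc_dev *dst, const void *data, size_t len)
{
    struct mig_file *migf = &dst->migf;

    assert(data != NULL);
    if (len > sizeof(migf->buf) - migf->total_len)
        return -ENOSPC;
    memcpy(migf->buf + migf->total_len, data, len);
    migf->total_len += len;

    if (!migf->checked && migf->total_len >= sizeof(struct match_data)) {
        struct match_data peer;

        memcpy(&peer, migf->buf, sizeof(peer));
        if (peer.dev_id != dst->ident.dev_id || peer.qp_num != dst->ident.qp_num)
            return -EINVAL;             /* fail the migration early */
        migf->checked = 1;
    }
    return (long)len;
}
```

Because the header leads the stream on both the pre-copy and the direct stop-copy path, the check in `resume_write()` runs either way; pre-copy only lets it happen before the source device is stopped.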