mbox series

[v4,0/2] introduction of migration_version attribute for VFIO live migration

Message ID 20190531004438.24528-1-yan.y.zhao@intel.com (mailing list archive)
Headers show
Series introduction of migration_version attribute for VFIO live migration | expand

Message

Yan Zhao May 31, 2019, 12:44 a.m. UTC
This patchset introduces a migration_version attribute under sysfs of VFIO
Mediated devices.

This migration_version attribute is used to check migration compatibility
between two mdev devices of the same mdev type.

Patch 1 defines migration_version attribute in
Documentation/vfio-mediated-device.txt

Patch 2 uses GVT as an example to show how to expose migration_version
attribute and check migration compatibility in vendor driver.

v4:
1. fixed indentation/spell errors, reworded several error messages
2. added a missing memory free for error handling in patch 2

v3:
1. renamed version to migration_version
2. let errno to be freely defined by vendor driver
3. let checking mdev_type be prerequisite of migration compatibility check
4. reworded most part of patch 1
5. print detailed error log in patch 2 and generate migration_version
string at init time

v2:
1. renamed patched 1
2. made definition of device version string completely private to vendor
driver
3. reverted changes to sample mdev drivers
4. described intent and usage of version attribute more clearly.


Yan Zhao (2):
  vfio/mdev: add migration_version attribute for mdev device
  drm/i915/gvt: export migration_version to mdev sysfs for Intel vGPU

 Documentation/vfio-mediated-device.txt       | 113 +++++++++++++
 drivers/gpu/drm/i915/gvt/Makefile            |   2 +-
 drivers/gpu/drm/i915/gvt/gvt.c               |  39 +++++
 drivers/gpu/drm/i915/gvt/gvt.h               |   5 +
 drivers/gpu/drm/i915/gvt/migration_version.c | 168 +++++++++++++++++++
 drivers/gpu/drm/i915/gvt/vgpu.c              |  13 +-
 6 files changed, 337 insertions(+), 3 deletions(-)
 create mode 100644 drivers/gpu/drm/i915/gvt/migration_version.c

Comments

Alex Williamson June 3, 2019, 7:29 p.m. UTC | #1
On Thu, 30 May 2019 20:44:38 -0400
Yan Zhao <yan.y.zhao@intel.com> wrote:

> This patchset introduces a migration_version attribute under sysfs of VFIO
> Mediated devices.
> 
> This migration_version attribute is used to check migration compatibility
> between two mdev devices of the same mdev type.
> 
> Patch 1 defines migration_version attribute in
> Documentation/vfio-mediated-device.txt
> 
> Patch 2 uses GVT as an example to show how to expose migration_version
> attribute and check migration compatibility in vendor driver.

Thanks for iterating through this, it looks like we've settled on
something reasonable, but now what?  This is one piece of the puzzle to
supporting mdev migration, but I don't think it makes sense to commit
this upstream on its own without also defining the remainder of how we
actually do migration, preferably with more than one working
implementation and at least prototyped, if not final, QEMU support.  I
hope that was the intent, and maybe it's now time to look at the next
piece of the puzzle.  Thanks,

Alex
Yan Zhao June 4, 2019, 12:34 a.m. UTC | #2
On Tue, Jun 04, 2019 at 03:29:32AM +0800, Alex Williamson wrote:
> On Thu, 30 May 2019 20:44:38 -0400
> Yan Zhao <yan.y.zhao@intel.com> wrote:
> 
> > This patchset introduces a migration_version attribute under sysfs of VFIO
> > Mediated devices.
> > 
> > This migration_version attribute is used to check migration compatibility
> > between two mdev devices of the same mdev type.
> > 
> > Patch 1 defines migration_version attribute in
> > Documentation/vfio-mediated-device.txt
> > 
> > Patch 2 uses GVT as an example to show how to expose migration_version
> > attribute and check migration compatibility in vendor driver.
> 
> Thanks for iterating through this, it looks like we've settled on
> something reasonable, but now what?  This is one piece of the puzzle to
> supporting mdev migration, but I don't think it makes sense to commit
> this upstream on its own without also defining the remainder of how we
> actually do migration, preferably with more than one working
> implementation and at least prototyped, if not final, QEMU support.  I
> hope that was the intent, and maybe it's now time to look at the next
> piece of the puzzle.  Thanks,
> 
> Alex

Got it. 
Also thank you and all for discussing and guiding all along:)
We'll move to the next episode now.

Thanks
Yan
Alex Williamson March 23, 2020, 9:29 p.m. UTC | #3
On Mon, 3 Jun 2019 20:34:22 -0400
Yan Zhao <yan.y.zhao@intel.com> wrote:

> On Tue, Jun 04, 2019 at 03:29:32AM +0800, Alex Williamson wrote:
> > On Thu, 30 May 2019 20:44:38 -0400
> > Yan Zhao <yan.y.zhao@intel.com> wrote:
> >   
> > > This patchset introduces a migration_version attribute under sysfs of VFIO
> > > Mediated devices.
> > > 
> > > This migration_version attribute is used to check migration compatibility
> > > between two mdev devices of the same mdev type.
> > > 
> > > Patch 1 defines migration_version attribute in
> > > Documentation/vfio-mediated-device.txt
> > > 
> > > Patch 2 uses GVT as an example to show how to expose migration_version
> > > attribute and check migration compatibility in vendor driver.  
> > 
> > Thanks for iterating through this, it looks like we've settled on
> > something reasonable, but now what?  This is one piece of the puzzle to
> > supporting mdev migration, but I don't think it makes sense to commit
> > this upstream on its own without also defining the remainder of how we
> > actually do migration, preferably with more than one working
> > implementation and at least prototyped, if not final, QEMU support.  I
> > hope that was the intent, and maybe it's now time to look at the next
> > piece of the puzzle.  Thanks,
> > 
> > Alex  
> 
> Got it. 
> Also thank you and all for discussing and guiding all along:)
> We'll move to the next episode now.

Hi Yan,

As we're hopefully moving towards a migration API, would it make sense
to refresh this series at the same time?  I think we're still expecting
a vendor driver implementing Kirti's migration API to also implement
this sysfs interface for compatibility verification.  Thanks,

Alex
Yan Zhao March 24, 2020, 3:53 a.m. UTC | #4
On Tue, Mar 24, 2020 at 05:29:59AM +0800, Alex Williamson wrote:
> On Mon, 3 Jun 2019 20:34:22 -0400
> Yan Zhao <yan.y.zhao@intel.com> wrote:
> 
> > On Tue, Jun 04, 2019 at 03:29:32AM +0800, Alex Williamson wrote:
> > > On Thu, 30 May 2019 20:44:38 -0400
> > > Yan Zhao <yan.y.zhao@intel.com> wrote:
> > >   
> > > > This patchset introduces a migration_version attribute under sysfs of VFIO
> > > > Mediated devices.
> > > > 
> > > > This migration_version attribute is used to check migration compatibility
> > > > between two mdev devices of the same mdev type.
> > > > 
> > > > Patch 1 defines migration_version attribute in
> > > > Documentation/vfio-mediated-device.txt
> > > > 
> > > > Patch 2 uses GVT as an example to show how to expose migration_version
> > > > attribute and check migration compatibility in vendor driver.  
> > > 
> > > Thanks for iterating through this, it looks like we've settled on
> > > something reasonable, but now what?  This is one piece of the puzzle to
> > > supporting mdev migration, but I don't think it makes sense to commit
> > > this upstream on its own without also defining the remainder of how we
> > > actually do migration, preferably with more than one working
> > > implementation and at least prototyped, if not final, QEMU support.  I
> > > hope that was the intent, and maybe it's now time to look at the next
> > > piece of the puzzle.  Thanks,
> > > 
> > > Alex  
> > 
> > Got it. 
> > Also thank you and all for discussing and guiding all along:)
> > We'll move to the next episode now.
> 
> Hi Yan,
> 
> As we're hopefully moving towards a migration API, would it make sense
> to refresh this series at the same time?  I think we're still expecting
> a vendor driver implementing Kirti's migration API to also implement
> this sysfs interface for compatibility verification.  Thanks,
>
Hi Alex
Got it!
Thanks for reminding of this. And as now we have vfio-pci implementing
vendor ops to allow live migration of pass-through devices, is it
necessary to implement similar sysfs node for those devices?
or do you think just PCI IDs of those devices are enough for libvirt to
know device compatibility ?

Thanks
Yan
Dr. David Alan Gilbert March 24, 2020, 9:23 a.m. UTC | #5
* Yan Zhao (yan.y.zhao@intel.com) wrote:
> On Tue, Mar 24, 2020 at 05:29:59AM +0800, Alex Williamson wrote:
> > On Mon, 3 Jun 2019 20:34:22 -0400
> > Yan Zhao <yan.y.zhao@intel.com> wrote:
> > 
> > > On Tue, Jun 04, 2019 at 03:29:32AM +0800, Alex Williamson wrote:
> > > > On Thu, 30 May 2019 20:44:38 -0400
> > > > Yan Zhao <yan.y.zhao@intel.com> wrote:
> > > >   
> > > > > This patchset introduces a migration_version attribute under sysfs of VFIO
> > > > > Mediated devices.
> > > > > 
> > > > > This migration_version attribute is used to check migration compatibility
> > > > > between two mdev devices of the same mdev type.
> > > > > 
> > > > > Patch 1 defines migration_version attribute in
> > > > > Documentation/vfio-mediated-device.txt
> > > > > 
> > > > > Patch 2 uses GVT as an example to show how to expose migration_version
> > > > > attribute and check migration compatibility in vendor driver.  
> > > > 
> > > > Thanks for iterating through this, it looks like we've settled on
> > > > something reasonable, but now what?  This is one piece of the puzzle to
> > > > supporting mdev migration, but I don't think it makes sense to commit
> > > > this upstream on its own without also defining the remainder of how we
> > > > actually do migration, preferably with more than one working
> > > > implementation and at least prototyped, if not final, QEMU support.  I
> > > > hope that was the intent, and maybe it's now time to look at the next
> > > > piece of the puzzle.  Thanks,
> > > > 
> > > > Alex  
> > > 
> > > Got it. 
> > > Also thank you and all for discussing and guiding all along:)
> > > We'll move to the next episode now.
> > 
> > Hi Yan,
> > 
> > As we're hopefully moving towards a migration API, would it make sense
> > to refresh this series at the same time?  I think we're still expecting
> > a vendor driver implementing Kirti's migration API to also implement
> > this sysfs interface for compatibility verification.  Thanks,
> >
> Hi Alex
> Got it!
> Thanks for reminding of this. And as now we have vfio-pci implementing
> vendor ops to allow live migration of pass-through devices, is it
> necessary to implement similar sysfs node for those devices?
> or do you think just PCI IDs of those devices are enough for libvirt to
> know device compatibility ?

Wasn't the problem that we'd have to know how to check for things like:
  a) Whether different firmware versions in the device were actually
compatible
  b) Whether minor hardware differences were compatible - e.g. some
hardware might let you migrate to the next version of hardware up.

Dave

> Thanks
> Yan
> 
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
Alex Williamson March 24, 2020, 2:49 p.m. UTC | #6
On Tue, 24 Mar 2020 09:23:31 +0000
"Dr. David Alan Gilbert" <dgilbert@redhat.com> wrote:

> * Yan Zhao (yan.y.zhao@intel.com) wrote:
> > On Tue, Mar 24, 2020 at 05:29:59AM +0800, Alex Williamson wrote:  
> > > On Mon, 3 Jun 2019 20:34:22 -0400
> > > Yan Zhao <yan.y.zhao@intel.com> wrote:
> > >   
> > > > On Tue, Jun 04, 2019 at 03:29:32AM +0800, Alex Williamson wrote:  
> > > > > On Thu, 30 May 2019 20:44:38 -0400
> > > > > Yan Zhao <yan.y.zhao@intel.com> wrote:
> > > > >     
> > > > > > This patchset introduces a migration_version attribute under sysfs of VFIO
> > > > > > Mediated devices.
> > > > > > 
> > > > > > This migration_version attribute is used to check migration compatibility
> > > > > > between two mdev devices of the same mdev type.
> > > > > > 
> > > > > > Patch 1 defines migration_version attribute in
> > > > > > Documentation/vfio-mediated-device.txt
> > > > > > 
> > > > > > Patch 2 uses GVT as an example to show how to expose migration_version
> > > > > > attribute and check migration compatibility in vendor driver.    
> > > > > 
> > > > > Thanks for iterating through this, it looks like we've settled on
> > > > > something reasonable, but now what?  This is one piece of the puzzle to
> > > > > supporting mdev migration, but I don't think it makes sense to commit
> > > > > this upstream on its own without also defining the remainder of how we
> > > > > actually do migration, preferably with more than one working
> > > > > implementation and at least prototyped, if not final, QEMU support.  I
> > > > > hope that was the intent, and maybe it's now time to look at the next
> > > > > piece of the puzzle.  Thanks,
> > > > > 
> > > > > Alex    
> > > > 
> > > > Got it. 
> > > > Also thank you and all for discussing and guiding all along:)
> > > > We'll move to the next episode now.  
> > > 
> > > Hi Yan,
> > > 
> > > As we're hopefully moving towards a migration API, would it make sense
> > > to refresh this series at the same time?  I think we're still expecting
> > > a vendor driver implementing Kirti's migration API to also implement
> > > this sysfs interface for compatibility verification.  Thanks,
> > >  
> > Hi Alex
> > Got it!
> > Thanks for reminding of this. And as now we have vfio-pci implementing
> > vendor ops to allow live migration of pass-through devices, is it
> > necessary to implement similar sysfs node for those devices?
> > or do you think just PCI IDs of those devices are enough for libvirt to
> > know device compatibility ?  
> 
> Wasn't the problem that we'd have to know how to check for things like:
>   a) Whether different firmware versions in the device were actually
> compatible
>   b) Whether minor hardware differences were compatible - e.g. some
> hardware might let you migrate to the next version of hardware up.

Yes, minor changes in hardware or firmware that may not be represented
in the device ID or hardware revision.  Also the version is as much for
indicating the compatibility of the vendor defined migration protocol
as it is for the hardware itself.  I certainly wouldn't be so bold as
to create a protocol that is guaranteed compatible forever.  We'll need
to expose the same sysfs attribute in some standard location for
non-mdev devices.  I assume vfio-pci would provide the vendor ops some
mechanism to expose these in a standard namespace of sysfs attributes
under the device itself.  Perhaps that indicates we need to link the
mdev type version under the mdev device as well to make this
transparent to userspace tools like libvirt.  Thanks,

Alex
Yan Zhao March 25, 2020, 12:56 a.m. UTC | #7
On Tue, Mar 24, 2020 at 10:49:54PM +0800, Alex Williamson wrote:
> On Tue, 24 Mar 2020 09:23:31 +0000
> "Dr. David Alan Gilbert" <dgilbert@redhat.com> wrote:
> 
> > * Yan Zhao (yan.y.zhao@intel.com) wrote:
> > > On Tue, Mar 24, 2020 at 05:29:59AM +0800, Alex Williamson wrote:  
> > > > On Mon, 3 Jun 2019 20:34:22 -0400
> > > > Yan Zhao <yan.y.zhao@intel.com> wrote:
> > > >   
> > > > > On Tue, Jun 04, 2019 at 03:29:32AM +0800, Alex Williamson wrote:  
> > > > > > On Thu, 30 May 2019 20:44:38 -0400
> > > > > > Yan Zhao <yan.y.zhao@intel.com> wrote:
> > > > > >     
> > > > > > > This patchset introduces a migration_version attribute under sysfs of VFIO
> > > > > > > Mediated devices.
> > > > > > > 
> > > > > > > This migration_version attribute is used to check migration compatibility
> > > > > > > between two mdev devices of the same mdev type.
> > > > > > > 
> > > > > > > Patch 1 defines migration_version attribute in
> > > > > > > Documentation/vfio-mediated-device.txt
> > > > > > > 
> > > > > > > Patch 2 uses GVT as an example to show how to expose migration_version
> > > > > > > attribute and check migration compatibility in vendor driver.    
> > > > > > 
> > > > > > Thanks for iterating through this, it looks like we've settled on
> > > > > > something reasonable, but now what?  This is one piece of the puzzle to
> > > > > > supporting mdev migration, but I don't think it makes sense to commit
> > > > > > this upstream on its own without also defining the remainder of how we
> > > > > > actually do migration, preferably with more than one working
> > > > > > implementation and at least prototyped, if not final, QEMU support.  I
> > > > > > hope that was the intent, and maybe it's now time to look at the next
> > > > > > piece of the puzzle.  Thanks,
> > > > > > 
> > > > > > Alex    
> > > > > 
> > > > > Got it. 
> > > > > Also thank you and all for discussing and guiding all along:)
> > > > > We'll move to the next episode now.  
> > > > 
> > > > Hi Yan,
> > > > 
> > > > As we're hopefully moving towards a migration API, would it make sense
> > > > to refresh this series at the same time?  I think we're still expecting
> > > > a vendor driver implementing Kirti's migration API to also implement
> > > > this sysfs interface for compatibility verification.  Thanks,
> > > >  
> > > Hi Alex
> > > Got it!
> > > Thanks for reminding of this. And as now we have vfio-pci implementing
> > > vendor ops to allow live migration of pass-through devices, is it
> > > necessary to implement similar sysfs node for those devices?
> > > or do you think just PCI IDs of those devices are enough for libvirt to
> > > know device compatibility ?  
> > 
> > Wasn't the problem that we'd have to know how to check for things like:
> >   a) Whether different firmware versions in the device were actually
> > compatible
> >   b) Whether minor hardware differences were compatible - e.g. some
> > hardware might let you migrate to the next version of hardware up.
> 
> Yes, minor changes in hardware or firmware that may not be represented
> in the device ID or hardware revision.  Also the version is as much for
> indicating the compatibility of the vendor defined migration protocol
> as it is for the hardware itself.  I certainly wouldn't be so bold as
> to create a protocol that is guaranteed compatible forever.  We'll need
> to expose the same sysfs attribute in some standard location for
> non-mdev devices.  I assume vfio-pci would provide the vendor ops some
> mechanism to expose these in a standard namespace of sysfs attributes
> under the device itself.  Perhaps that indicates we need to link the
> mdev type version under the mdev device as well to make this
> transparent to userspace tools like libvirt.  Thanks,
>
Got it. will do it.
Thanks!

Yan