mbox series

[v2,0/2] introduction of version attribute for VFIO live migration

Message ID 20190506014514.3555-1-yan.y.zhao@intel.com (mailing list archive)
Headers show
Series introduction of version attribute for VFIO live migration | expand

Message

Yan Zhao May 6, 2019, 1:45 a.m. UTC
This patchset introduces a version attribute under sysfs of VFIO Mediated
devices.

This version attribute is used to check whether two mdev devices are
compatible.
user space software can take advantage of this version attribute to
determine whether to launch live migration between two mdev devices.

Patch 1 defines version attribute in Documentation/vfio-mediated-device.txt

Patch 2 uses GVT as an example to show how to expose version attribute and
check device compatibility in vendor driver.


v2:
1. renamed patched 1
2. made definition of device version string completely private to vendor
   driver
3. abandoned changes to sample mdev drivers
4. described intent and usage of version attribute more clearly.

Yan Zhao (2):
  vfio/mdev: add version attribute for mdev device
  drm/i915/gvt: export mdev device version to sysfs for Intel vGPU

 Documentation/vfio-mediated-device.txt    | 135 ++++++++++++++++++++++
 drivers/gpu/drm/i915/gvt/Makefile         |   2 +-
 drivers/gpu/drm/i915/gvt/device_version.c |  84 ++++++++++++++
 drivers/gpu/drm/i915/gvt/gvt.c            |  51 ++++++++
 drivers/gpu/drm/i915/gvt/gvt.h            |   6 +
 5 files changed, 277 insertions(+), 1 deletion(-)
 create mode 100644 drivers/gpu/drm/i915/gvt/device_version.c

Comments

Cornelia Huck May 7, 2019, 9:19 a.m. UTC | #1
On Sun,  5 May 2019 21:49:04 -0400
Yan Zhao <yan.y.zhao@intel.com> wrote:

> version attribute is used to check two mdev devices' compatibility.
> 
> The key point of this version attribute is that it's rw.
> User space has no need to understand internal of device version and no
> need to compare versions by itself.
> Compared to reading version strings from both two mdev devices being
> checked, user space only reads from one mdev device's version attribute.
> After getting its version string, user space writes this string into the
> other mdev device's version attribute. Vendor driver of mdev device
> whose version attribute being written will check device compatibility of
> the two mdev devices for user space and return success for compatibility
> or errno for incompatibility.

I'm still missing a bit _what_ is actually supposed to be
compatible/incompatible. I'd assume some internal state descriptions
(even if this is not actually limited to migration).

> So two readings of version attributes + checking in user space are now
> changed to one reading + one writing of version attributes + checking in
> vendor driver.

I'm not sure that needs to go into the patch description (sounds like
it is rather a change log?)

> Format and length of version strings are now private to vendor driver
> who can define them freely.

Same here; simply drop the 'now'?

> 
>              __ user space
>               /\          \
>              /             \write
>             / read          \
>      ______/__           ___\|/___
>     | version |         | version |-->check compatibility
>     -----------         -----------
>     mdev device A       mdev device B
> 
> This version attribute is optional. If a mdev device does not provide
> with a version attribute, this mdev device is incompatible to all other
> mdev devices.

Again, I'd like an explanation here what kind of compatibility we're
talking about.

> 
> Live migration is able to take advantage of this version attribute.
> Before user space actually starts live migration, it can first check
> whether two mdev devices are compatible.
> 
> v2:
> 1. added detailed intent and usage
> 2. made definition of version string completely private to vendor driver
>    (Alex Williamson)
> 3. abandoned changes to sample mdev drivers (Alex Williamson)
> 4. mandatory --> optional (Cornelia Huck)
> 5. added description for errno (Cornelia Huck)

This changelog should go below the ---, so that it does not actually
show up in the patch description later :)

> 
> Cc: Alex Williamson <alex.williamson@redhat.com>
> Cc: Erik Skultety <eskultet@redhat.com>
> Cc: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> Cc: Cornelia Huck <cohuck@redhat.com>
> Cc: "Tian, Kevin" <kevin.tian@intel.com>
> Cc: Zhenyu Wang <zhenyuw@linux.intel.com>
> Cc: "Wang, Zhi A" <zhi.a.wang@intel.com>
> Cc: Neo Jia <cjia@nvidia.com>
> Cc: Kirti Wankhede <kwankhede@nvidia.com>
> Cc: Daniel P. Berrangé <berrange@redhat.com>
> Cc: Christophe de Dinechin <dinechin@redhat.com>
> 
> Signed-off-by: Yan Zhao <yan.y.zhao@intel.com>
> ---
>  Documentation/vfio-mediated-device.txt | 140 +++++++++++++++++++++++++
>  1 file changed, 140 insertions(+)
> 
> diff --git a/Documentation/vfio-mediated-device.txt b/Documentation/vfio-mediated-device.txt
> index c3f69bcaf96e..013a764968eb 100644
> --- a/Documentation/vfio-mediated-device.txt
> +++ b/Documentation/vfio-mediated-device.txt
> @@ -202,6 +202,7 @@ Directories and files under the sysfs for Each Physical Device
>    |     |   |--- available_instances
>    |     |   |--- device_api
>    |     |   |--- description
> +  |     |   |--- version
>    |     |   |--- [devices]
>    |     |--- [<type-id>]
>    |     |   |--- create
> @@ -209,6 +210,7 @@ Directories and files under the sysfs for Each Physical Device
>    |     |   |--- available_instances
>    |     |   |--- device_api
>    |     |   |--- description
> +  |     |   |--- version
>    |     |   |--- [devices]
>    |     |--- [<type-id>]
>    |          |--- create
> @@ -216,6 +218,7 @@ Directories and files under the sysfs for Each Physical Device
>    |          |--- available_instances
>    |          |--- device_api
>    |          |--- description
> +  |          |--- version
>    |          |--- [devices]
>  
>  * [mdev_supported_types]
> @@ -246,6 +249,143 @@ Directories and files under the sysfs for Each Physical Device
>    This attribute should show the number of devices of type <type-id> that can be
>    created.
>  
> +* version
> +
> +  This attribute is rw, and is optional.
> +  It is used to check device compatibility between two mdev devices and is
> +  accessed in pairs between the two mdev devices being checked.
> +  The intent of this attribute is to make an mdev device's version opaque to
> +  user space, so instead of reading two mdev devices' version strings and
> +  comparing in userspace, user space should only read one mdev device's version
> +  attribute, and writes this version string into the other mdev device's version
> +  attribute. Then vendor driver of mdev device whose version attribute being
> +  written would check the incoming version string and tell user space whether
> +  the two mdev devices are compatible via return value. That's why this
> +  attribute is writable.

I would reword this a bit:

"This attribute provides a way to check device compatibility between
two mdev devices from userspace. The intended usage is for userspace to
read the version attribute from one mdev device and then writing that
value to the version attribute of the other mdev device. The second
mdev device indicates compatibility via the return code of the write
operation. This makes compatibility between mdev devices completely
vendor-defined and opaque to userspace."

We still should explain _what_ compatibility we're talking about here,
though.

> +
> +  when reading this attribute, it should show device version string of
> +  the device of type <type-id>.
> +
> +  This string is private to vendor driver itself. Vendor driver is able to
> +  freely define format and length of device version string.
> +  e.g. It can use a combination of pciid of parent device + mdev type.
> +
> +  When writing a string to this attribute, vendor driver should analyze this
> +  string and check whether the mdev device being identified by this string is
> +  compatible with the mdev device for this attribute. vendor driver should then
> +  return written string's length if it regards the two mdev devices are
> +  compatible; vendor driver should return negative errno if it regards the two
> +  mdev devices are not compatible.
> +
> +  User space should treat ANY of below conditions as two mdev devices not
> +  compatible:
> +  (1) any one of the two mdev devices does not have a version attribute
> +  (2) error when read from one mdev device's version attribute

s/read/reading/

> +  (3) error when write one mdev device's version string to the other mdev

s/write/writing/

> +  device's version attribute
> +
> +  User space should regard two mdev devices compatible when ALL of below
> +  conditions are met:
> +  (1) success when read from one mdev device's version attribute.

s/read/reading/

> +  (2) success when write one mdev device's version string to the other mdev

s/write/writing/

> +  device's version attribute
> +
> +  Errno:
> +  If vendor driver wants to claim a mdev device incompatible to all other mdev

"If the vendor driver wants to designate a mdev device..."

> +  devices, it should not register version attribute for this mdev device. But if
> +  a vendor driver has already registered version attribute and it wants to claim
> +  a mdev device incompatible to all other mdev devices, it needs to return
> +  -ENODEV on access to this mdev device's version attribute.
> +  If a mdev device is only incompatible to certain mdev devices, write of
> +  incompatible mdev devices's version strings to its version attribute should
> +  return -EINVAL;


Maybe put the defined return code into a bulleted list instead? But
this looks reasonable as well.

> +
> +  This attribute can be taken advantage of by live migration.
> +  If user space detects two mdev devices are compatible through version
> +  attribute, it can start migration between the two mdev devices, otherwise it
> +  should abort its migration attempts between the two mdev devices.

(...)
Alex Williamson May 7, 2019, 9:18 p.m. UTC | #2
On Sun,  5 May 2019 21:49:04 -0400
Yan Zhao <yan.y.zhao@intel.com> wrote:

> version attribute is used to check two mdev devices' compatibility.
> 
> The key point of this version attribute is that it's rw.
> User space has no need to understand internal of device version and no
> need to compare versions by itself.
> Compared to reading version strings from both two mdev devices being
> checked, user space only reads from one mdev device's version attribute.
> After getting its version string, user space writes this string into the
> other mdev device's version attribute. Vendor driver of mdev device
> whose version attribute being written will check device compatibility of
> the two mdev devices for user space and return success for compatibility
> or errno for incompatibility.
> So two readings of version attributes + checking in user space are now
> changed to one reading + one writing of version attributes + checking in
> vendor driver.
> Format and length of version strings are now private to vendor driver
> who can define them freely.
> 
>              __ user space
>               /\          \
>              /             \write
>             / read          \
>      ______/__           ___\|/___
>     | version |         | version |-->check compatibility
>     -----------         -----------
>     mdev device A       mdev device B
> 
> This version attribute is optional. If a mdev device does not provide
> with a version attribute, this mdev device is incompatible to all other
> mdev devices.
> 
> Live migration is able to take advantage of this version attribute.
> Before user space actually starts live migration, it can first check
> whether two mdev devices are compatible.
> 
> v2:
> 1. added detailed intent and usage
> 2. made definition of version string completely private to vendor driver
>    (Alex Williamson)
> 3. abandoned changes to sample mdev drivers (Alex Williamson)
> 4. mandatory --> optional (Cornelia Huck)
> 5. added description for errno (Cornelia Huck)
> 
> Cc: Alex Williamson <alex.williamson@redhat.com>
> Cc: Erik Skultety <eskultet@redhat.com>
> Cc: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> Cc: Cornelia Huck <cohuck@redhat.com>
> Cc: "Tian, Kevin" <kevin.tian@intel.com>
> Cc: Zhenyu Wang <zhenyuw@linux.intel.com>
> Cc: "Wang, Zhi A" <zhi.a.wang@intel.com>
> Cc: Neo Jia <cjia@nvidia.com>
> Cc: Kirti Wankhede <kwankhede@nvidia.com>
> Cc: Daniel P. Berrangé <berrange@redhat.com>
> Cc: Christophe de Dinechin <dinechin@redhat.com>
> 
> Signed-off-by: Yan Zhao <yan.y.zhao@intel.com>
> ---
>  Documentation/vfio-mediated-device.txt | 140 +++++++++++++++++++++++++
>  1 file changed, 140 insertions(+)
> 
> diff --git a/Documentation/vfio-mediated-device.txt b/Documentation/vfio-mediated-device.txt
> index c3f69bcaf96e..013a764968eb 100644
> --- a/Documentation/vfio-mediated-device.txt
> +++ b/Documentation/vfio-mediated-device.txt
> @@ -202,6 +202,7 @@ Directories and files under the sysfs for Each Physical Device
>    |     |   |--- available_instances
>    |     |   |--- device_api
>    |     |   |--- description
> +  |     |   |--- version
>    |     |   |--- [devices]
>    |     |--- [<type-id>]
>    |     |   |--- create
> @@ -209,6 +210,7 @@ Directories and files under the sysfs for Each Physical Device
>    |     |   |--- available_instances
>    |     |   |--- device_api
>    |     |   |--- description
> +  |     |   |--- version
>    |     |   |--- [devices]
>    |     |--- [<type-id>]
>    |          |--- create
> @@ -216,6 +218,7 @@ Directories and files under the sysfs for Each Physical Device
>    |          |--- available_instances
>    |          |--- device_api
>    |          |--- description
> +  |          |--- version
>    |          |--- [devices]

I thought there was a request to make this more specific to migration
by renaming it to something like migration_version.  Also, as an
optional attribute, it seems the example should perhaps not add it to
all types to illustrate that it is not required.

>  
>  * [mdev_supported_types]
> @@ -246,6 +249,143 @@ Directories and files under the sysfs for Each Physical Device
>    This attribute should show the number of devices of type <type-id> that can be
>    created.
>  
> +* version
> +
> +  This attribute is rw, and is optional.
> +  It is used to check device compatibility between two mdev devices and is

between two mdev devices of the same type.

> +  accessed in pairs between the two mdev devices being checked.

"in pairs"?

> +  The intent of this attribute is to make an mdev device's version opaque to
> +  user space, so instead of reading two mdev devices' version strings and

perhaps "...instead of reading the version string of two mdev devices
and comparing them in userspace..."

> +  comparing in userspace, user space should only read one mdev device's version
> +  attribute, and writes this version string into the other mdev device's version
> +  attribute. Then vendor driver of mdev device whose version attribute being
> +  written would check the incoming version string and tell user space whether
> +  the two mdev devices are compatible via return value. That's why this
> +  attribute is writable.
> +
> +  when reading this attribute, it should show device version string of
> +  the device of type <type-id>.
> +
> +  This string is private to vendor driver itself. Vendor driver is able to
> +  freely define format and length of device version string.
> +  e.g. It can use a combination of pciid of parent device + mdev type.

Can the user assume the data contents of the string is ascii
characters?  It's good that the vendor driver defines the format and
length, but the user probably needs some expectation bounding that
length.  Should we define it as no larger than PATH_MAX (4096), or maybe
NAME_MAX (255) might be more reasonable?

> +
> +  When writing a string to this attribute, vendor driver should analyze this
> +  string and check whether the mdev device being identified by this string is
> +  compatible with the mdev device for this attribute. vendor driver should then

Compatible for what purpose?  I think this is where specifically
calling this a migration_version potentially has value.

> +  return written string's length if it regards the two mdev devices are
> +  compatible; vendor driver should return negative errno if it regards the two
> +  mdev devices are not compatible.

IOW, the write(2) will succeed if the version is determined to be
compatible and otherwise fail with vendor specific errno.

> +
> +  User space should treat ANY of below conditions as two mdev devices not
> +  compatible:

(0) The mdev devices are not of the same type.

> +  (1) any one of the two mdev devices does not have a version attribute
> +  (2) error when read from one mdev device's version attribute

Is this intended to support that the vendor driver can supply a version
attribute but not support migration?  TBH, this sounds like a vendor
driver bug, but maybe it's necessary if the vendor driver could have
some types that support migration and others that do not?  IOW, we're
supplying the same attribute groups to all devices from a vendor, in
which case my comment above regarding an example type without a version
attribute might be invalid.

> +  (3) error when write one mdev device's version string to the other mdev
> +  device's version attribute
> +
> +  User space should regard two mdev devices compatible when ALL of below
> +  conditions are met:

(0) The mdev devices are of the same type

> +  (1) success when read from one mdev device's version attribute.
> +  (2) success when write one mdev device's version string to the other mdev
> +  device's version attribute
> +
> +  Errno:
> +  If vendor driver wants to claim a mdev device incompatible to all other mdev
> +  devices, it should not register version attribute for this mdev device. But if
> +  a vendor driver has already registered version attribute and it wants to claim
> +  a mdev device incompatible to all other mdev devices, it needs to return
> +  -ENODEV on access to this mdev device's version attribute.
> +  If a mdev device is only incompatible to certain mdev devices, write of
> +  incompatible mdev devices's version strings to its version attribute should
> +  return -EINVAL;

I think it's best not to define the specific errno returned for a
specific situation, let the vendor driver decide, userspace simply
needs to know that an errno on read indicates the device does not
support migration version comparison and that an errno on write
indicates the devices are incompatible or the target doesn't support
migration versions.

> +
> +  This attribute can be taken advantage of by live migration.
> +  If user space detects two mdev devices are compatible through version
> +  attribute, it can start migration between the two mdev devices, otherwise it
> +  should abort its migration attempts between the two mdev devices.
> +
> +  Example Usage:
> +  case 1:
> +  source side mdev device is of uuid 5ac1fb20-2bbf-4842-bb7e-36c58c3be9cd,
> +  its mdev type is i915-GVTg_V5_4. pci id of parent device is 8086-193b.
> +  target side mdev device is if of uuid 882cc4da-dede-11e7-9180-078a62063ab1,
> +  its mdev type is i915-GVTg_V5_4. pci id of parent device is 8086-193b.
> +
> +  # readlink /sys/bus/pci/devices/0000\:00\:02.0/\
> +  5ac1fb20-2bbf-4842-bb7e-36c58c3be9cd/mdev_type
> +  ../mdev_supported_types/i915-GVTg_V5_4
> +
> +  # readlink /sys/bus/pci/devices/0000\:00\:02.0/\
> +  882cc4da-dede-11e7-9180-078a62063ab1/mdev_type
> +  ../mdev_supported_types/i915-GVTg_V5_4
> +
> +  (1) read source side mdev device's version.
> +  #cat \
> +    /sys/bus/pci/devices/0000\:00\:02.0/5ac1fb20-2bbf-4842-bb7e-36c58c3be9cd/\
> +    mdev_type/version
> +  8086-193b-i915-GVTg_V5_4

Is this really the version information exposed in 2/2?  This is opaque,
so of course you can add things later, but it seems short sighted not
to even append a version 0 tag to account for software compatibility
differences since the above only represents a parent and mdev type
based version.

> +  (2) write source side mdev device's version string into target side mdev
> +  device's version attribute.
> +  # echo 8086-193b-i915-GVTg_V5_4 >
> +   /sys/bus/pci/devices/0000\:00\:02.0/882cc4da-dede-11e7-9180-078a62063ab1/\
> +  mdev_type/version
> +  # echo $?
> +  0

TBH, there's a lot of superfluous information in this example that can
be stripped out.  For example:

"
(1) Compare mdev types:

The mdev type of an instantiated device can be read from the mdev_type
link within the device instance in sysfs, for example:

  # basename $(readlink -f /sys/bus/mdev/devices/$MDEV_UUID/mdev_type/)

The mdev types available on a given host system can also be found
through /sys/class/mdev_bus, for example:

  # ls /sys/class/mdev_bus/*/mdev_supported_types/

Migration is only possible between devices of the same mdev type.

(2) Retrieve the mdev source version:

The migration version information can either be read from the mdev_type
link on an instantiated device:

  # cat /sys/bus/mdev/devices/$UUID1/mdev_type/version

Or it can be read from the mdev type definition, for example:

  # cat /sys/class/mdev_bus/*/mdev_supported_types/$MDEV_TYPE/version

If reading the source version generates an error, migration is not
possible.  NB, there might be several parent devices for a given mdev
type on a host system, each may support or expose different versions.
Matching the specific mdev type to a parent may become important in
such configurations.

(3) Test source version at target:

Given a version as outlined above, its compatibility to an instantiated
device of the same mdev type can be tested as:

  # echo $VERSION > /sys/bus/mdev/devices/$UUID2/mdev_type/version

If this write fails, the source and target versions are not compatible
or the target does not support migration.

Compatibility can also be tested prior to target device creation using
the mdev type definition for a parent device with a previously found
matching mdev type, for example:

  # echo $VERSION > /sys/class/mdev_bus/$PARENT/mdev_supported_types/$MDEV_TYPE/version

Again, an error writing the version indicates that an instance of this
mdev type would not support a migration from the provided version.
"

In particular from the provided example, the specific UUIDs, mdev
types, parent information, and contents of the version attribute do not
contribute to illustrating the protocol.  In fact, displaying the
contents of the version attribute may tempt users to do comparison on
their own, especially given how easy it is to decide the GVT-g version
string.


> +
> +  in this case, user space's write to target side mdev device's version
> +  attribute returns success to indicate the two mdev devices are compatible.
> +
> +  case 2:
> +  source side mdev device is of uuid 5ac1fb20-2bbf-4842-bb7e-36c58c3be9cd,
> +  its mdev type is i915-GVTg_V5_4. pci id of parent device is 8086-193b.
> +  target side mdev device is if of uuid 882cc4da-dede-11e7-9180-078a62063ab1,
> +  its mdev type is i915-GVTg_V5_4. pci id of parent device is 8086-191b.
> +
> +  # readlink /sys/bus/pci/devices/0000\:00\:02.0/\
> +  5ac1fb20-2bbf-4842-bb7e-36c58c3be9cd/mdev_type
> +  ../mdev_supported_types/i915-GVTg_V5_4
> +
> +  # readlink /sys/bus/pci/devices/0000\:00\:02.0/\
> +  882cc4da-dede-11e7-9180-078a62063ab1/mdev_type
> +  ../mdev_supported_types/i915-GVTg_V5_4
> +
> +  (1) read source side mdev device's version.
> +  #cat \
> +    /sys/bus/pci/devices/0000\:00\:02.0/5ac1fb20-2bbf-4842-bb7e-36c58c3be9cd/\
> +    mdev_type/version
> +  8086-193b-i915-GVTg_V5_4
> +
> +  (2) write source side mdev device's version string into target side mdev
> +  device's version attribute.
> +  # echo 8086-193b-i915-GVTg_V5_4 >
> +   /sys/bus/pci/devices/0000\:00\:02.0/882cc4da-dede-11e7-9180-078a62063ab1/\
> +  mdev_type/version
> +  -bash: echo: write error: Invalid argument
> +
> +  in this case, user space's write to target side mdev device's version
> +  attribute returns error to indicate the two mdev devices are incompatible.
> +  (incompatible because pci ids of the two mdev devices' parent devices are
> +  different).
> +
> +  case 3:
> +  source side mdev device is of uuid 5ac1fb20-2bbf-4842-bb7e-36c58c3be9cd,
> +  its mdev type is i915-GVTg_V5_4. pci id of parent device is 8086-193b.
> +  But vendor driver does not provide version attribute for this device.
> +
> +  (1) read source side mdev device's version.
> +  #cat \
> +    /sys/bus/pci/devices/0000\:00\:02.0/5ac1fb20-2bbf-4842-bb7e-36c58c3be9cd/\
> +    mdev_type/version
> +  cat: '/sys/bus/pci/devices/0000:00:02.0/5ac1fb20-2bbf-4842-bb7e-36c58c3be9cd/\
> +  mdev_type/version': No such file or directory
> +
> +  in this case, user space reads source side mdev device's version attribute
> +  which does not exist however. user space regards the two mdev devices as not
> +  compatible and will not start migration between the two mdev devices.
> +
> +

This is far too long for description and examples, it's not this
complicated.  Thanks,

Alex

>  * [device]
>  
>    This directory contains links to the devices of type <type-id> that have been
Yan Zhao May 8, 2019, 11:27 a.m. UTC | #3
On Wed, May 08, 2019 at 05:18:26AM +0800, Alex Williamson wrote:
> On Sun,  5 May 2019 21:49:04 -0400
> Yan Zhao <yan.y.zhao@intel.com> wrote:
> 
> > version attribute is used to check two mdev devices' compatibility.
> > 
> > The key point of this version attribute is that it's rw.
> > User space has no need to understand internal of device version and no
> > need to compare versions by itself.
> > Compared to reading version strings from both two mdev devices being
> > checked, user space only reads from one mdev device's version attribute.
> > After getting its version string, user space writes this string into the
> > other mdev device's version attribute. Vendor driver of mdev device
> > whose version attribute being written will check device compatibility of
> > the two mdev devices for user space and return success for compatibility
> > or errno for incompatibility.
> > So two readings of version attributes + checking in user space are now
> > changed to one reading + one writing of version attributes + checking in
> > vendor driver.
> > Format and length of version strings are now private to vendor driver
> > who can define them freely.
> > 
> >              __ user space
> >               /\          \
> >              /             \write
> >             / read          \
> >      ______/__           ___\|/___
> >     | version |         | version |-->check compatibility
> >     -----------         -----------
> >     mdev device A       mdev device B
> > 
> > This version attribute is optional. If a mdev device does not provide
> > with a version attribute, this mdev device is incompatible to all other
> > mdev devices.
> > 
> > Live migration is able to take advantage of this version attribute.
> > Before user space actually starts live migration, it can first check
> > whether two mdev devices are compatible.
> > 
> > v2:
> > 1. added detailed intent and usage
> > 2. made definition of version string completely private to vendor driver
> >    (Alex Williamson)
> > 3. abandoned changes to sample mdev drivers (Alex Williamson)
> > 4. mandatory --> optional (Cornelia Huck)
> > 5. added description for errno (Cornelia Huck)
> > 
> > Cc: Alex Williamson <alex.williamson@redhat.com>
> > Cc: Erik Skultety <eskultet@redhat.com>
> > Cc: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> > Cc: Cornelia Huck <cohuck@redhat.com>
> > Cc: "Tian, Kevin" <kevin.tian@intel.com>
> > Cc: Zhenyu Wang <zhenyuw@linux.intel.com>
> > Cc: "Wang, Zhi A" <zhi.a.wang@intel.com>
> > Cc: Neo Jia <cjia@nvidia.com>
> > Cc: Kirti Wankhede <kwankhede@nvidia.com>
> > Cc: Daniel P. Berrangé <berrange@redhat.com>
> > Cc: Christophe de Dinechin <dinechin@redhat.com>
> > 
> > Signed-off-by: Yan Zhao <yan.y.zhao@intel.com>
> > ---
> >  Documentation/vfio-mediated-device.txt | 140 +++++++++++++++++++++++++
> >  1 file changed, 140 insertions(+)
> > 
> > diff --git a/Documentation/vfio-mediated-device.txt b/Documentation/vfio-mediated-device.txt
> > index c3f69bcaf96e..013a764968eb 100644
> > --- a/Documentation/vfio-mediated-device.txt
> > +++ b/Documentation/vfio-mediated-device.txt
> > @@ -202,6 +202,7 @@ Directories and files under the sysfs for Each Physical Device
> >    |     |   |--- available_instances
> >    |     |   |--- device_api
> >    |     |   |--- description
> > +  |     |   |--- version
> >    |     |   |--- [devices]
> >    |     |--- [<type-id>]
> >    |     |   |--- create
> > @@ -209,6 +210,7 @@ Directories and files under the sysfs for Each Physical Device
> >    |     |   |--- available_instances
> >    |     |   |--- device_api
> >    |     |   |--- description
> > +  |     |   |--- version
> >    |     |   |--- [devices]
> >    |     |--- [<type-id>]
> >    |          |--- create
> > @@ -216,6 +218,7 @@ Directories and files under the sysfs for Each Physical Device
> >    |          |--- available_instances
> >    |          |--- device_api
> >    |          |--- description
> > +  |          |--- version
> >    |          |--- [devices]
> 
> I thought there was a request to make this more specific to migration
> by renaming it to something like migration_version.  Also, as an
>
so this attribute may not only include a mdev device's parent device info and
mdev type, but also include numeric software version of vendor specific
migration code, right?
This actually makes sense.
So, do I need to add a disclaimer in this doc like:
vendor driver should be responsible by itself for a mdev device's migration
compatibility. 
During migration setup phase, general migration code in user space VFIO only
checks this version of VFIO migration region, and will not check software version
of vendor specific migration code.
It is suggested to incorporate at least parent device info and software version
of vendor specific migration code into this migration_version attribute.

> optional attribute, it seems the example should perhaps not add it to
> all types to illustrate that it is not required.
ok. got it.


> >  
> >  * [mdev_supported_types]
> > @@ -246,6 +249,143 @@ Directories and files under the sysfs for Each Physical Device
> >    This attribute should show the number of devices of type <type-id> that can be
> >    created.
> >  
> > +* version
> > +
> > +  This attribute is rw, and is optional.
> > +  It is used to check device compatibility between two mdev devices and is
> 
> between two mdev devices of the same type.
>
ok. got it.
But I have a question about aggregation proposed earlier.
Do we also have to assume the two mdev devices are of the same aggregation
count?
However, aggregation count is not available before a mdev device is created. :(


> > +  accessed in pairs between the two mdev devices being checked.
> 
> "in pairs"?
I meant, user space needs to access version attributes from two mdev device.
but seems that it's needless to mention that... I'll remove it :)


> > +  The intent of this attribute is to make an mdev device's version opaque to
> > +  user space, so instead of reading two mdev devices' version strings and
> 
> perhaps "...instead of reading the version string of two mdev devices
> and comparing them in userspace..."
yes, better, thanks:)

> > +  comparing in userspace, user space should only read one mdev device's version
> > +  attribute, and writes this version string into the other mdev device's version
> > +  attribute. Then vendor driver of mdev device whose version attribute being
> > +  written would check the incoming version string and tell user space whether
> > +  the two mdev devices are compatible via return value. That's why this
> > +  attribute is writable.
> > +
> > +  when reading this attribute, it should show device version string of
> > +  the device of type <type-id>.
> > +
> > +  This string is private to vendor driver itself. Vendor driver is able to
> > +  freely define format and length of device version string.
> > +  e.g. It can use a combination of pciid of parent device + mdev type.
> 
> Can the user assume the data contents of the string is ascii
> characters?  It's good that the vendor driver defines the format and
> length, but the user probably needs some expectation bounding that
> length.  Should we define it as no larger than PATH_MAX (4096), or maybe
> NAME_MAX (255) might be more reasonable?
I think so. I'll add those restrictions in next revision. 

> > +
> > +  When writing a string to this attribute, vendor driver should analyze this
> > +  string and check whether the mdev device being identified by this string is
> > +  compatible with the mdev device for this attribute. vendor driver should then
> 
> Compatible for what purpose?  I think this is where specifically
> calling this a migration_version potentially has value.
yes. if it also covers version of vendor specific migration code, calling it
migration_version is more appropriate.

> > +  return written string's length if it regards the two mdev devices are
> > +  compatible; vendor driver should return negative errno if it regards the two
> > +  mdev devices are not compatible.
> 
> IOW, the write(2) will succeed if the version is determined to be
> compatible and otherwise fail with vendor specific errno.
>
thanks:)

> > +
> > +  User space should treat ANY of below conditions as two mdev devices not
> > +  compatible:
> 
> (0) The mdev devices are not of the same type.
>
the same as above. do we also need to take aggregation count into consideration?

> > +  (1) any one of the two mdev devices does not have a version attribute
> > +  (2) error when read from one mdev device's version attribute
> 
> Is this intended to support that the vendor driver can supply a version
> attribute but not support migration?  TBH, this sounds like a vendor
> driver bug, but maybe it's necessary if the vendor driver could have
> some types that support migration and others that do not?  IOW, we're
> supplying the same attribute groups to all devices from a vendor, in
> which case my comment above regarding an example type without a version
> attribute might be invalid.
hmm, this is to make life easier for vendor driver to have some types that
support migration and others that do not. while we can get rid of returning
errno by providing different attribute groups to different devices, the way of
returning errno gives a simpler choice to vendors.

> 
> > +  (3) error when write one mdev device's version string to the other mdev
> > +  device's version attribute
> > +
> > +  User space should regard two mdev devices compatible when ALL of below
> > +  conditions are met:
> 
> (0) The mdev devices are of the same type
> 
> > +  (1) success when read from one mdev device's version attribute.
> > +  (2) success when write one mdev device's version string to the other mdev
> > +  device's version attribute
> > +
> > +  Errno:
> > +  If vendor driver wants to claim a mdev device incompatible to all other mdev
> > +  devices, it should not register version attribute for this mdev device. But if
> > +  a vendor driver has already registered version attribute and it wants to claim
> > +  a mdev device incompatible to all other mdev devices, it needs to return
> > +  -ENODEV on access to this mdev device's version attribute.
> > +  If a mdev device is only incompatible to certain mdev devices, write of
> > +  incompatible mdev devices's version strings to its version attribute should
> > +  return -EINVAL;
> 
> I think it's best not to define the specific errno returned for a
> specific situation, let the vendor driver decide, userspace simply
> needs to know that an errno on read indicates the device does not
> support migration version comparison and that an errno on write
> indicates the devices are incompatible or the target doesn't support
> migration versions.
>
yes, user space only gets 0 or 1 as return code, not those errno. 
maybe I only need to describe errno in patch 2/2.

> > +
> > +  This attribute can be taken advantage of by live migration.
> > +  If user space detects two mdev devices are compatible through version
> > +  attribute, it can start migration between the two mdev devices, otherwise it
> > +  should abort its migration attempts between the two mdev devices.
> > +
> > +  Example Usage:
> > +  case 1:
> > +  source side mdev device is of uuid 5ac1fb20-2bbf-4842-bb7e-36c58c3be9cd,
> > +  its mdev type is i915-GVTg_V5_4. pci id of parent device is 8086-193b.
> > +  target side mdev device is if of uuid 882cc4da-dede-11e7-9180-078a62063ab1,
> > +  its mdev type is i915-GVTg_V5_4. pci id of parent device is 8086-193b.
> > +
> > +  # readlink /sys/bus/pci/devices/0000\:00\:02.0/\
> > +  5ac1fb20-2bbf-4842-bb7e-36c58c3be9cd/mdev_type
> > +  ../mdev_supported_types/i915-GVTg_V5_4
> > +
> > +  # readlink /sys/bus/pci/devices/0000\:00\:02.0/\
> > +  882cc4da-dede-11e7-9180-078a62063ab1/mdev_type
> > +  ../mdev_supported_types/i915-GVTg_V5_4
> > +
> > +  (1) read source side mdev device's version.
> > +  #cat \
> > +    /sys/bus/pci/devices/0000\:00\:02.0/5ac1fb20-2bbf-4842-bb7e-36c58c3be9cd/\
> > +    mdev_type/version
> > +  8086-193b-i915-GVTg_V5_4
> 
> Is this really the version information exposed in 2/2?  This is opaque,
> so of course you can add things later, but it seems short sighted not
> to even append a version 0 tag to account for software compatibility
> differences since the above only represents a parent and mdev type
> based version.
> 
yes, currently in 2/2, the version only includes <vendor id> + <device id> +
<mdev type>. but you are right, it's better to include software migration
version number.
so vendor drivers have below 3 ways to designate a mdev device has no migration
capability.
1. not registering migration_version attribute
2. on reading migration_version, returning errno
3. on reading migration_version, returning string indicating non-migratable.

The reason of not giving up way 2 is that maybe it can accelerate user space
getting information of device incompatible. if we only keep way 3, it would not
know this info until writing this string to target attribute.

do you agree?

> > +  (2) write source side mdev device's version string into target side mdev
> > +  device's version attribute.
> > +  # echo 8086-193b-i915-GVTg_V5_4 >
> > +   /sys/bus/pci/devices/0000\:00\:02.0/882cc4da-dede-11e7-9180-078a62063ab1/\
> > +  mdev_type/version
> > +  # echo $?
> > +  0
> 
> TBH, there's a lot of superfluous information in this example that can
> be stripped out.  For example:
> 
> "
> (1) Compare mdev types:
> 
> The mdev type of an instantiated device can be read from the mdev_type
> link within the device instance in sysfs, for example:
> 
>   # basename $(readlink -f /sys/bus/mdev/devices/$MDEV_UUID/mdev_type/)
> 
> The mdev types available on a given host system can also be found
> through /sys/class/mdev_bus, for example:
> 
>   # ls /sys/class/mdev_bus/*/mdev_supported_types/
> 
> Migration is only possible between devices of the same mdev type.
> 
> (2) Retrieve the mdev source version:
> 
> The migration version information can either be read from the mdev_type
> link on an instantiated device:
> 
>   # cat /sys/bus/mdev/devices/$UUID1/mdev_type/version
> 
> Or it can be read from the mdev type definition, for example:
> 
>   # cat /sys/class/mdev_bus/*/mdev_supported_types/$MDEV_TYPE/version
> 
> If reading the source version generates an error, migration is not
> possible.  NB, there might be several parent devices for a given mdev
> type on a host system, each may support or expose different versions.
> Matching the specific mdev type to a parent may become important in
> such configurations.
> 
> (3) Test source version at target:
> 
> Given a version as outlined above, its compatibility to an instantiated
> device of the same mdev type can be tested as:
> 
>   # echo $VERSION > /sys/bus/mdev/devices/$UUID2/mdev_type/version
> 
> If this write fails, the source and target versions are not compatible
> or the target does not support migration.
> 
> Compatibility can also be tested prior to target device creation using
> the mdev type definition for a parent device with a previously found
> matching mdev type, for example:
> 
>   # echo $VERSION > /sys/class/mdev_bus/$PARENT/mdev_supported_types/$MDEV_TYPE/version
> 
> Again, an error writing the version indicates that an instance of this
> mdev type would not support a migration from the provided version.
> "
> 
> In particular from the provided example, the specific UUIDs, mdev
> types, parent information, and contents of the version attribute do not
> contribute to illustrating the protocol.  In fact, displaying the
> contents of the version attribute may tempt users to do comparison on
> their own, especially given how easy it is to decide the GVT-g version
> string.
>
got it!
great thanks!
I'll update it to the next revision.
> 
> > +
> > +  in this case, user space's write to target side mdev device's version
> > +  attribute returns success to indicate the two mdev devices are compatible.
> > +
> > +  case 2:
> > +  source side mdev device is of uuid 5ac1fb20-2bbf-4842-bb7e-36c58c3be9cd,
> > +  its mdev type is i915-GVTg_V5_4. pci id of parent device is 8086-193b.
> > +  target side mdev device is if of uuid 882cc4da-dede-11e7-9180-078a62063ab1,
> > +  its mdev type is i915-GVTg_V5_4. pci id of parent device is 8086-191b.
> > +
> > +  # readlink /sys/bus/pci/devices/0000\:00\:02.0/\
> > +  5ac1fb20-2bbf-4842-bb7e-36c58c3be9cd/mdev_type
> > +  ../mdev_supported_types/i915-GVTg_V5_4
> > +
> > +  # readlink /sys/bus/pci/devices/0000\:00\:02.0/\
> > +  882cc4da-dede-11e7-9180-078a62063ab1/mdev_type
> > +  ../mdev_supported_types/i915-GVTg_V5_4
> > +
> > +  (1) read source side mdev device's version.
> > +  #cat \
> > +    /sys/bus/pci/devices/0000\:00\:02.0/5ac1fb20-2bbf-4842-bb7e-36c58c3be9cd/\
> > +    mdev_type/version
> > +  8086-193b-i915-GVTg_V5_4
> > +
> > +  (2) write source side mdev device's version string into target side mdev
> > +  device's version attribute.
> > +  # echo 8086-193b-i915-GVTg_V5_4 >
> > +   /sys/bus/pci/devices/0000\:00\:02.0/882cc4da-dede-11e7-9180-078a62063ab1/\
> > +  mdev_type/version
> > +  -bash: echo: write error: Invalid argument
> > +
> > +  in this case, user space's write to target side mdev device's version
> > +  attribute returns error to indicate the two mdev devices are incompatible.
> > +  (incompatible because pci ids of the two mdev devices' parent devices are
> > +  different).
> > +
> > +  case 3:
> > +  source side mdev device is of uuid 5ac1fb20-2bbf-4842-bb7e-36c58c3be9cd,
> > +  its mdev type is i915-GVTg_V5_4. pci id of parent device is 8086-193b.
> > +  But vendor driver does not provide version attribute for this device.
> > +
> > +  (1) read source side mdev device's version.
> > +  #cat \
> > +    /sys/bus/pci/devices/0000\:00\:02.0/5ac1fb20-2bbf-4842-bb7e-36c58c3be9cd/\
> > +    mdev_type/version
> > +  cat: '/sys/bus/pci/devices/0000:00:02.0/5ac1fb20-2bbf-4842-bb7e-36c58c3be9cd/\
> > +  mdev_type/version': No such file or directory
> > +
> > +  in this case, user space reads source side mdev device's version attribute
> > +  which does not exist however. user space regards the two mdev devices as not
> > +  compatible and will not start migration between the two mdev devices.
> > +
> > +
> 
> This is far too long for description and examples, it's not this
> complicated.  Thanks,
>
got it. I'll follow your above example :)

thanks
Yan 
> >  * [device]
> >  
> >    This directory contains links to the devices of type <type-id> that have been
>
Yan Zhao May 8, 2019, 11:57 a.m. UTC | #4
On Tue, May 07, 2019 at 05:19:54PM +0800, Cornelia Huck wrote:
> On Sun,  5 May 2019 21:49:04 -0400
> Yan Zhao <yan.y.zhao@intel.com> wrote:
> 
> > version attribute is used to check two mdev devices' compatibility.
> > 
> > The key point of this version attribute is that it's rw.
> > User space has no need to understand internal of device version and no
> > need to compare versions by itself.
> > Compared to reading version strings from both two mdev devices being
> > checked, user space only reads from one mdev device's version attribute.
> > After getting its version string, user space writes this string into the
> > other mdev device's version attribute. Vendor driver of mdev device
> > whose version attribute being written will check device compatibility of
> > the two mdev devices for user space and return success for compatibility
> > or errno for incompatibility.
> 
> I'm still missing a bit _what_ is actually supposed to be
> compatible/incompatible. I'd assume some internal state descriptions
> (even if this is not actually limited to migration).
>
right.
originally, I thought this attribute should only contain a device's hardware
compatibility info. But seems also including vendor specific software migration
version is more reasonable, because general VFIO migration code cannot know
version of vendor specific software migration code until migration data is
transferring to the target vm. Then renaming it to migration_version is more
appropriate.
:)

> > So two readings of version attributes + checking in user space are now
> > changed to one reading + one writing of version attributes + checking in
> > vendor driver.
> 
> I'm not sure that needs to go into the patch description (sounds like
> it is rather a change log?)
> 
Indeed :)

> > Format and length of version strings are now private to vendor driver
> > who can define them freely.
> 
> Same here; simply drop the 'now'?
> 
ok. thanks:)

> > 
> >              __ user space
> >               /\          \
> >              /             \write
> >             / read          \
> >      ______/__           ___\|/___
> >     | version |         | version |-->check compatibility
> >     -----------         -----------
> >     mdev device A       mdev device B
> > 
> > This version attribute is optional. If a mdev device does not provide
> > with a version attribute, this mdev device is incompatible to all other
> > mdev devices.
> 
> Again, I'd like an explanation here what kind of compatibility we're
> talking about.
> 
as above, let me reword it to migration compatibility.

> > 
> > Live migration is able to take advantage of this version attribute.
> > Before user space actually starts live migration, it can first check
> > whether two mdev devices are compatible.
> > 
> > v2:
> > 1. added detailed intent and usage
> > 2. made definition of version string completely private to vendor driver
> >    (Alex Williamson)
> > 3. abandoned changes to sample mdev drivers (Alex Williamson)
> > 4. mandatory --> optional (Cornelia Huck)
> > 5. added description for errno (Cornelia Huck)
> 
> This changelog should go below the ---, so that it does not actually
> show up in the patch description later :)
got it:)

> > 
> > Cc: Alex Williamson <alex.williamson@redhat.com>
> > Cc: Erik Skultety <eskultet@redhat.com>
> > Cc: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> > Cc: Cornelia Huck <cohuck@redhat.com>
> > Cc: "Tian, Kevin" <kevin.tian@intel.com>
> > Cc: Zhenyu Wang <zhenyuw@linux.intel.com>
> > Cc: "Wang, Zhi A" <zhi.a.wang@intel.com>
> > Cc: Neo Jia <cjia@nvidia.com>
> > Cc: Kirti Wankhede <kwankhede@nvidia.com>
> > Cc: Daniel P. Berrangé <berrange@redhat.com>
> > Cc: Christophe de Dinechin <dinechin@redhat.com>
> > 
> > Signed-off-by: Yan Zhao <yan.y.zhao@intel.com>
> > ---
> >  Documentation/vfio-mediated-device.txt | 140 +++++++++++++++++++++++++
> >  1 file changed, 140 insertions(+)
> > 
> > diff --git a/Documentation/vfio-mediated-device.txt b/Documentation/vfio-mediated-device.txt
> > index c3f69bcaf96e..013a764968eb 100644
> > --- a/Documentation/vfio-mediated-device.txt
> > +++ b/Documentation/vfio-mediated-device.txt
> > @@ -202,6 +202,7 @@ Directories and files under the sysfs for Each Physical Device
> >    |     |   |--- available_instances
> >    |     |   |--- device_api
> >    |     |   |--- description
> > +  |     |   |--- version
> >    |     |   |--- [devices]
> >    |     |--- [<type-id>]
> >    |     |   |--- create
> > @@ -209,6 +210,7 @@ Directories and files under the sysfs for Each Physical Device
> >    |     |   |--- available_instances
> >    |     |   |--- device_api
> >    |     |   |--- description
> > +  |     |   |--- version
> >    |     |   |--- [devices]
> >    |     |--- [<type-id>]
> >    |          |--- create
> > @@ -216,6 +218,7 @@ Directories and files under the sysfs for Each Physical Device
> >    |          |--- available_instances
> >    |          |--- device_api
> >    |          |--- description
> > +  |          |--- version
> >    |          |--- [devices]
> >  
> >  * [mdev_supported_types]
> > @@ -246,6 +249,143 @@ Directories and files under the sysfs for Each Physical Device
> >    This attribute should show the number of devices of type <type-id> that can be
> >    created.
> >  
> > +* version
> > +
> > +  This attribute is rw, and is optional.
> > +  It is used to check device compatibility between two mdev devices and is
> > +  accessed in pairs between the two mdev devices being checked.
> > +  The intent of this attribute is to make an mdev device's version opaque to
> > +  user space, so instead of reading two mdev devices' version strings and
> > +  comparing in userspace, user space should only read one mdev device's version
> > +  attribute, and writes this version string into the other mdev device's version
> > +  attribute. Then vendor driver of mdev device whose version attribute being
> > +  written would check the incoming version string and tell user space whether
> > +  the two mdev devices are compatible via return value. That's why this
> > +  attribute is writable.
> 
> I would reword this a bit:
> 
> "This attribute provides a way to check device compatibility between
> two mdev devices from userspace. The intended usage is for userspace to
> read the version attribute from one mdev device and then writing that
> value to the version attribute of the other mdev device. The second
> mdev device indicates compatibility via the return code of the write
> operation. This makes compatibility between mdev devices completely
> vendor-defined and opaque to userspace."
> 
> We still should explain _what_ compatibility we're talking about here,
> though.
> 
Thanks. It's much better than mine:) 
Then I'll change compatibility --> migration compatibility.

> > +
> > +  when reading this attribute, it should show device version string of
> > +  the device of type <type-id>.
> > +
> > +  This string is private to vendor driver itself. Vendor driver is able to
> > +  freely define format and length of device version string.
> > +  e.g. It can use a combination of pciid of parent device + mdev type.
> > +
> > +  When writing a string to this attribute, vendor driver should analyze this
> > +  string and check whether the mdev device being identified by this string is
> > +  compatible with the mdev device for this attribute. vendor driver should then
> > +  return written string's length if it regards the two mdev devices are
> > +  compatible; vendor driver should return negative errno if it regards the two
> > +  mdev devices are not compatible.
> > +
> > +  User space should treat ANY of below conditions as two mdev devices not
> > +  compatible:
> > +  (1) any one of the two mdev devices does not have a version attribute
> > +  (2) error when read from one mdev device's version attribute
> 
> s/read/reading/
> 
> > +  (3) error when write one mdev device's version string to the other mdev
> 
> s/write/writing/
> 
> > +  device's version attribute
> > +
> > +  User space should regard two mdev devices compatible when ALL of below
> > +  conditions are met:
> > +  (1) success when read from one mdev device's version attribute.
> 
> s/read/reading/
> 
> > +  (2) success when write one mdev device's version string to the other mdev
> 
> s/write/writing/
got it. thanks for pointing them out:)
> 
> > +  device's version attribute
> > +
> > +  Errno:
> > +  If vendor driver wants to claim a mdev device incompatible to all other mdev
> 
> "If the vendor driver wants to designate a mdev device..."
> 
ok. thanks:)
> > +  devices, it should not register version attribute for this mdev device. But if
> > +  a vendor driver has already registered version attribute and it wants to claim
> > +  a mdev device incompatible to all other mdev devices, it needs to return
> > +  -ENODEV on access to this mdev device's version attribute.
> > +  If a mdev device is only incompatible to certain mdev devices, write of
> > +  incompatible mdev devices's version strings to its version attribute should
> > +  return -EINVAL;
> 
> 
> Maybe put the defined return code into a bulleted list instead? But
> this looks reasonable as well.
> 
as user space have no idea of those errno and only gets 0/1 as return code from
read/write. maybe I can move this description of errno to patch 2/2 as an
example?

> > +
> > +  This attribute can be taken advantage of by live migration.
> > +  If user space detects two mdev devices are compatible through version
> > +  attribute, it can start migration between the two mdev devices, otherwise it
> > +  should abort its migration attempts between the two mdev devices.
> 
> (...)
> _______________________________________________
> intel-gvt-dev mailing list
> intel-gvt-dev@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/intel-gvt-dev
Boris Fiuczynski May 8, 2019, 3:27 p.m. UTC | #5
On 5/8/19 11:22 PM, Alex Williamson wrote:
>>> I thought there was a request to make this more specific to migration
>>> by renaming it to something like migration_version.  Also, as an
>>>   
>> so this attribute may not only include a mdev device's parent device info and
>> mdev type, but also include numeric software version of vendor specific
>> migration code, right?
> It's a vendor defined string, it should be considered opaque to the
> user, the vendor can include whatever they feel is relevant.
> 
Would a vendor also be allowed to provide a string expressing required 
features as well as containing backend resource requirements which need 
to be compatible for a successful migration? Somehow a bit like a cpu 
model... maybe even as json or xml...
I am asking this with vfio-ap in mind. In that context checking 
compatibility of two vfio-ap mdev devices is not as simple as checking 
if version A is smaller or equal to version B.
Alex Williamson May 8, 2019, 9:22 p.m. UTC | #6
On Wed, 8 May 2019 07:27:40 -0400
Yan Zhao <yan.y.zhao@intel.com> wrote:

> On Wed, May 08, 2019 at 05:18:26AM +0800, Alex Williamson wrote:
> > On Sun,  5 May 2019 21:49:04 -0400
> > Yan Zhao <yan.y.zhao@intel.com> wrote:
> >   
> > > version attribute is used to check two mdev devices' compatibility.
> > > 
> > > The key point of this version attribute is that it's rw.
> > > User space has no need to understand internal of device version and no
> > > need to compare versions by itself.
> > > Compared to reading version strings from both two mdev devices being
> > > checked, user space only reads from one mdev device's version attribute.
> > > After getting its version string, user space writes this string into the
> > > other mdev device's version attribute. Vendor driver of mdev device
> > > whose version attribute being written will check device compatibility of
> > > the two mdev devices for user space and return success for compatibility
> > > or errno for incompatibility.
> > > So two readings of version attributes + checking in user space are now
> > > changed to one reading + one writing of version attributes + checking in
> > > vendor driver.
> > > Format and length of version strings are now private to vendor driver
> > > who can define them freely.
> > > 
> > >              __ user space
> > >               /\          \
> > >              /             \write
> > >             / read          \
> > >      ______/__           ___\|/___
> > >     | version |         | version |-->check compatibility
> > >     -----------         -----------
> > >     mdev device A       mdev device B
> > > 
> > > This version attribute is optional. If a mdev device does not provide
> > > with a version attribute, this mdev device is incompatible to all other
> > > mdev devices.
> > > 
> > > Live migration is able to take advantage of this version attribute.
> > > Before user space actually starts live migration, it can first check
> > > whether two mdev devices are compatible.
> > > 
> > > v2:
> > > 1. added detailed intent and usage
> > > 2. made definition of version string completely private to vendor driver
> > >    (Alex Williamson)
> > > 3. abandoned changes to sample mdev drivers (Alex Williamson)
> > > 4. mandatory --> optional (Cornelia Huck)
> > > 5. added description for errno (Cornelia Huck)
> > > 
> > > Cc: Alex Williamson <alex.williamson@redhat.com>
> > > Cc: Erik Skultety <eskultet@redhat.com>
> > > Cc: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> > > Cc: Cornelia Huck <cohuck@redhat.com>
> > > Cc: "Tian, Kevin" <kevin.tian@intel.com>
> > > Cc: Zhenyu Wang <zhenyuw@linux.intel.com>
> > > Cc: "Wang, Zhi A" <zhi.a.wang@intel.com>
> > > Cc: Neo Jia <cjia@nvidia.com>
> > > Cc: Kirti Wankhede <kwankhede@nvidia.com>
> > > Cc: Daniel P. Berrangé <berrange@redhat.com>
> > > Cc: Christophe de Dinechin <dinechin@redhat.com>
> > > 
> > > Signed-off-by: Yan Zhao <yan.y.zhao@intel.com>
> > > ---
> > >  Documentation/vfio-mediated-device.txt | 140 +++++++++++++++++++++++++
> > >  1 file changed, 140 insertions(+)
> > > 
> > > diff --git a/Documentation/vfio-mediated-device.txt b/Documentation/vfio-mediated-device.txt
> > > index c3f69bcaf96e..013a764968eb 100644
> > > --- a/Documentation/vfio-mediated-device.txt
> > > +++ b/Documentation/vfio-mediated-device.txt
> > > @@ -202,6 +202,7 @@ Directories and files under the sysfs for Each Physical Device
> > >    |     |   |--- available_instances
> > >    |     |   |--- device_api
> > >    |     |   |--- description
> > > +  |     |   |--- version
> > >    |     |   |--- [devices]
> > >    |     |--- [<type-id>]
> > >    |     |   |--- create
> > > @@ -209,6 +210,7 @@ Directories and files under the sysfs for Each Physical Device
> > >    |     |   |--- available_instances
> > >    |     |   |--- device_api
> > >    |     |   |--- description
> > > +  |     |   |--- version
> > >    |     |   |--- [devices]
> > >    |     |--- [<type-id>]
> > >    |          |--- create
> > > @@ -216,6 +218,7 @@ Directories and files under the sysfs for Each Physical Device
> > >    |          |--- available_instances
> > >    |          |--- device_api
> > >    |          |--- description
> > > +  |          |--- version
> > >    |          |--- [devices]  
> > 
> > I thought there was a request to make this more specific to migration
> > by renaming it to something like migration_version.  Also, as an
> >  
> so this attribute may not only include a mdev device's parent device info and
> mdev type, but also include numeric software version of vendor specific
> migration code, right?

It's a vendor defined string, it should be considered opaque to the
user, the vendor can include whatever they feel is relevant.

> This actually makes sense.
> So, do I need to add a disclaimer in this doc like:
> vendor driver should be responsible by itself for a mdev device's migration
> compatibility. 

I thought that was the purpose of this attribute.

> During migration setup phase, general migration code in user space VFIO only
> checks this version of VFIO migration region, and will not check software version
> of vendor specific migration code.

What is "software version of vendor specific migration code"?  If
you're asking whether anything will check for parent device
compatibility or parent vendor driver compatibility, the answer is no,
that's what this interface is meant to provide.  Userspace should need
to do nothing more than verify the mdev types match and then use the
version attribute to confirm source to target compatibility.

> It is suggested to incorporate at least parent device info and software version
> of vendor specific migration code into this migration_version attribute.

We can make recommendations as "best practices", but ultimately it's an
opaque string defined by the vendor driver.

But you never addressed my comment that previous reviews asked for the
attribute to be named something more specific to migration.

> > optional attribute, it seems the example should perhaps not add it to
> > all types to illustrate that it is not required.  
> ok. got it.
> 
> 
> > >  
> > >  * [mdev_supported_types]
> > > @@ -246,6 +249,143 @@ Directories and files under the sysfs for Each Physical Device
> > >    This attribute should show the number of devices of type <type-id> that can be
> > >    created.
> > >  
> > > +* version
> > > +
> > > +  This attribute is rw, and is optional.
> > > +  It is used to check device compatibility between two mdev devices and is  
> > 
> > between two mdev devices of the same type.
> >  
> ok. got it.
> But I have a question about aggregation proposed earlier.
> Do we also have to assume the two mdev devices are of the same aggregation
> count?
> However, aggregation count is not available before a mdev device is created. :(

We don't support aggregation yet, but yes, that's going to introduce
issues here.  Any configuration beyond the base mdev type would imply
that the base type could be compatible for migration, but the specific
instances might not.  Resolving that would imply that our version
information needs to be relative to an instance, not just the base type.

How would we extend this interface to support that?  We could have a
version attribute on each device instance, which might report something
like:

  0123456789,aggregate=2

IOW the device instance of version concatenates the mdev type version
with the additional create parameters for that device.  Writing this to
the type attribute should be parsed by the vendor driver as support for
given base device with specified additional create parameters.

I'm afraid this also bring us around to treacherous questions around
who is responsible for creating that device on the migration target and
where is this meta information about the device exposed.  Maybe instead
of a per instance version attribute we would instead expose the create
parameters as an attribute per instance and it would be userspace's
responsibility to create a version string from the mdev type and create
parameters similar to above.  This would also make it possible to
create a compatible instance on the target without pre-knowledge of how
the device was created.

Also, this issue exists already, but compatibility and capacity are two
separate things, I think we want to limit this interface to the
former.  For instance, if I want to migrate an i915-GVTg_V5_1 device to
another system where available_instances for that type is zero, the
version attribute should strictly report the device compatibility, it's
not responsible for returning an errno due to lack of resources.
Similarly if we were to do something with aggregation, the version
attribute would strictly report if the target supports creating that
device with those parameters, not whether it has capacity to create
such as device at that instant in time.
  
> > > +  accessed in pairs between the two mdev devices being checked.  
> > 
> > "in pairs"?  
> I meant, user space needs to access version attributes from two mdev device.
> but seems that it's needless to mention that... I'll remove it :)
> 
> 
> > > +  The intent of this attribute is to make an mdev device's version opaque to
> > > +  user space, so instead of reading two mdev devices' version strings and  
> > 
> > perhaps "...instead of reading the version string of two mdev devices
> > and comparing them in userspace..."  
> yes, better, thanks:)
> 
> > > +  comparing in userspace, user space should only read one mdev device's version
> > > +  attribute, and writes this version string into the other mdev device's version
> > > +  attribute. Then vendor driver of mdev device whose version attribute being
> > > +  written would check the incoming version string and tell user space whether
> > > +  the two mdev devices are compatible via return value. That's why this
> > > +  attribute is writable.
> > > +
> > > +  when reading this attribute, it should show device version string of
> > > +  the device of type <type-id>.
> > > +
> > > +  This string is private to vendor driver itself. Vendor driver is able to
> > > +  freely define format and length of device version string.
> > > +  e.g. It can use a combination of pciid of parent device + mdev type.  
> > 
> > Can the user assume the data contents of the string is ascii
> > characters?  It's good that the vendor driver defines the format and
> > length, but the user probably needs some expectation bounding that
> > length.  Should we define it as no larger than PATH_MAX (4096), or maybe
> > NAME_MAX (255) might be more reasonable?  
> I think so. I'll add those restrictions in next revision. 

If we start adding creation parameters, PATH_MAX may actually be the
more reasonable limit.

> > > +
> > > +  When writing a string to this attribute, vendor driver should analyze this
> > > +  string and check whether the mdev device being identified by this string is
> > > +  compatible with the mdev device for this attribute. vendor driver should then  
> > 
> > Compatible for what purpose?  I think this is where specifically
> > calling this a migration_version potentially has value.  
> yes. if it also covers version of vendor specific migration code, calling it
> migration_version is more appropriate.

I think we're discussing an interface that validates "I [the vendor
driver] am able to import the state of this version", therefore it must
include every relevant aspect of the vendor support for that.

> > > +  return written string's length if it regards the two mdev devices are
> > > +  compatible; vendor driver should return negative errno if it regards the two
> > > +  mdev devices are not compatible.  
> > 
> > IOW, the write(2) will succeed if the version is determined to be
> > compatible and otherwise fail with vendor specific errno.
> >  
> thanks:)
> 
> > > +
> > > +  User space should treat ANY of below conditions as two mdev devices not
> > > +  compatible:  
> > 
> > (0) The mdev devices are not of the same type.
> >  
> the same as above. do we also need to take aggregation count into consideration?
> 
> > > +  (1) any one of the two mdev devices does not have a version attribute
> > > +  (2) error when read from one mdev device's version attribute  
> > 
> > Is this intended to support that the vendor driver can supply a version
> > attribute but not support migration?  TBH, this sounds like a vendor
> > driver bug, but maybe it's necessary if the vendor driver could have
> > some types that support migration and others that do not?  IOW, we're
> > supplying the same attribute groups to all devices from a vendor, in
> > which case my comment above regarding an example type without a version
> > attribute might be invalid.  
> hmm, this is to make life easier for vendor driver to have some types that
> support migration and others that do not. while we can get rid of returning
> errno by providing different attribute groups to different devices, the way of
> returning errno gives a simpler choice to vendors.

Yes, I think it might be overly complicated to provide different
attribute groups for different devices, we have more flexibility if the
user does not make any assumptions based only on the presence of a
version attribute.

> > > +  (3) error when write one mdev device's version string to the other mdev
> > > +  device's version attribute
> > > +
> > > +  User space should regard two mdev devices compatible when ALL of below
> > > +  conditions are met:  
> > 
> > (0) The mdev devices are of the same type
> >   
> > > +  (1) success when read from one mdev device's version attribute.
> > > +  (2) success when write one mdev device's version string to the other mdev
> > > +  device's version attribute
> > > +
> > > +  Errno:
> > > +  If vendor driver wants to claim a mdev device incompatible to all other mdev
> > > +  devices, it should not register version attribute for this mdev device. But if
> > > +  a vendor driver has already registered version attribute and it wants to claim
> > > +  a mdev device incompatible to all other mdev devices, it needs to return
> > > +  -ENODEV on access to this mdev device's version attribute.
> > > +  If a mdev device is only incompatible to certain mdev devices, write of
> > > +  incompatible mdev devices's version strings to its version attribute should
> > > +  return -EINVAL;  
> > 
> > I think it's best not to define the specific errno returned for a
> > specific situation, let the vendor driver decide, userspace simply
> > needs to know that an errno on read indicates the device does not
> > support migration version comparison and that an errno on write
> > indicates the devices are incompatible or the target doesn't support
> > migration versions.
> >  
> yes, user space only gets 0 or 1 as return code, not those errno. 
> maybe I only need to describe errno in patch 2/2.
> 
> > > +
> > > +  This attribute can be taken advantage of by live migration.
> > > +  If user space detects two mdev devices are compatible through version
> > > +  attribute, it can start migration between the two mdev devices, otherwise it
> > > +  should abort its migration attempts between the two mdev devices.
> > > +
> > > +  Example Usage:
> > > +  case 1:
> > > +  source side mdev device is of uuid 5ac1fb20-2bbf-4842-bb7e-36c58c3be9cd,
> > > +  its mdev type is i915-GVTg_V5_4. pci id of parent device is 8086-193b.
> > > +  target side mdev device is if of uuid 882cc4da-dede-11e7-9180-078a62063ab1,
> > > +  its mdev type is i915-GVTg_V5_4. pci id of parent device is 8086-193b.
> > > +
> > > +  # readlink /sys/bus/pci/devices/0000\:00\:02.0/\
> > > +  5ac1fb20-2bbf-4842-bb7e-36c58c3be9cd/mdev_type
> > > +  ../mdev_supported_types/i915-GVTg_V5_4
> > > +
> > > +  # readlink /sys/bus/pci/devices/0000\:00\:02.0/\
> > > +  882cc4da-dede-11e7-9180-078a62063ab1/mdev_type
> > > +  ../mdev_supported_types/i915-GVTg_V5_4
> > > +
> > > +  (1) read source side mdev device's version.
> > > +  #cat \
> > > +    /sys/bus/pci/devices/0000\:00\:02.0/5ac1fb20-2bbf-4842-bb7e-36c58c3be9cd/\
> > > +    mdev_type/version
> > > +  8086-193b-i915-GVTg_V5_4  
> > 
> > Is this really the version information exposed in 2/2?  This is opaque,
> > so of course you can add things later, but it seems short sighted not
> > to even append a version 0 tag to account for software compatibility
> > differences since the above only represents a parent and mdev type
> > based version.
> >   
> yes, currently in 2/2, the version only includes <vendor id> + <device id> +
> <mdev type>. but you are right, it's better to include software migration
> version number.
> so vendor drivers have below 3 ways to designate a mdev device has no migration
> capability.
> 1. not registering migration_version attribute
> 2. on reading migration_version, returning errno
> 3. on reading migration_version, returning string indicating non-migratable.
> 
> The reason of not giving up way 2 is that maybe it can accelerate user space
> getting information of device incompatible. if we only keep way 3, it would not
> know this info until writing this string to target attribute.
> 
> do you agree?

The string is opaque to the user, so if you're asking in (3) that the
user read and parse some information in the string that indicates the
device is non-migratable then no, I don't agree with that policy.  If
reading the version attribute produces a result, the only thing the
user can do with that result is to test it by writing it to another
version attribute.  Thanks,

Alex

> > > +  (2) write source side mdev device's version string into target side mdev
> > > +  device's version attribute.
> > > +  # echo 8086-193b-i915-GVTg_V5_4 >
> > > +   /sys/bus/pci/devices/0000\:00\:02.0/882cc4da-dede-11e7-9180-078a62063ab1/\
> > > +  mdev_type/version
> > > +  # echo $?
> > > +  0  
> > 
> > TBH, there's a lot of superfluous information in this example that can
> > be stripped out.  For example:
> > 
> > "
> > (1) Compare mdev types:
> > 
> > The mdev type of an instantiated device can be read from the mdev_type
> > link within the device instance in sysfs, for example:
> > 
> >   # basename $(readlink -f /sys/bus/mdev/devices/$MDEV_UUID/mdev_type/)
> > 
> > The mdev types available on a given host system can also be found
> > through /sys/class/mdev_bus, for example:
> > 
> >   # ls /sys/class/mdev_bus/*/mdev_supported_types/
> > 
> > Migration is only possible between devices of the same mdev type.
> > 
> > (2) Retrieve the mdev source version:
> > 
> > The migration version information can either be read from the mdev_type
> > link on an instantiated device:
> > 
> >   # cat /sys/bus/mdev/devices/$UUID1/mdev_type/version
> > 
> > Or it can be read from the mdev type definition, for example:
> > 
> >   # cat /sys/class/mdev_bus/*/mdev_supported_types/$MDEV_TYPE/version
> > 
> > If reading the source version generates an error, migration is not
> > possible.  NB, there might be several parent devices for a given mdev
> > type on a host system, each may support or expose different versions.
> > Matching the specific mdev type to a parent may become important in
> > such configurations.
> > 
> > (3) Test source version at target:
> > 
> > Given a version as outlined above, its compatibility to an instantiated
> > device of the same mdev type can be tested as:
> > 
> >   # echo $VERSION > /sys/bus/mdev/devices/$UUID2/mdev_type/version
> > 
> > If this write fails, the source and target versions are not compatible
> > or the target does not support migration.
> > 
> > Compatibility can also be tested prior to target device creation using
> > the mdev type definition for a parent device with a previously found
> > matching mdev type, for example:
> > 
> >   # echo $VERSION > /sys/class/mdev_bus/$PARENT/mdev_supported_types/$MDEV_TYPE/version
> > 
> > Again, an error writing the version indicates that an instance of this
> > mdev type would not support a migration from the provided version.
> > "
> > 
> > In particular from the provided example, the specific UUIDs, mdev
> > types, parent information, and contents of the version attribute do not
> > contribute to illustrating the protocol.  In fact, displaying the
> > contents of the version attribute may tempt users to do comparison on
> > their own, especially given how easy it is to decide the GVT-g version
> > string.
> >  
> got it!
> great thanks!
> I'll update it to the next revision.
> >   
> > > +
> > > +  in this case, user space's write to target side mdev device's version
> > > +  attribute returns success to indicate the two mdev devices are compatible.
> > > +
> > > +  case 2:
> > > +  source side mdev device is of uuid 5ac1fb20-2bbf-4842-bb7e-36c58c3be9cd,
> > > +  its mdev type is i915-GVTg_V5_4. pci id of parent device is 8086-193b.
> > > +  target side mdev device is if of uuid 882cc4da-dede-11e7-9180-078a62063ab1,
> > > +  its mdev type is i915-GVTg_V5_4. pci id of parent device is 8086-191b.
> > > +
> > > +  # readlink /sys/bus/pci/devices/0000\:00\:02.0/\
> > > +  5ac1fb20-2bbf-4842-bb7e-36c58c3be9cd/mdev_type
> > > +  ../mdev_supported_types/i915-GVTg_V5_4
> > > +
> > > +  # readlink /sys/bus/pci/devices/0000\:00\:02.0/\
> > > +  882cc4da-dede-11e7-9180-078a62063ab1/mdev_type
> > > +  ../mdev_supported_types/i915-GVTg_V5_4
> > > +
> > > +  (1) read source side mdev device's version.
> > > +  #cat \
> > > +    /sys/bus/pci/devices/0000\:00\:02.0/5ac1fb20-2bbf-4842-bb7e-36c58c3be9cd/\
> > > +    mdev_type/version
> > > +  8086-193b-i915-GVTg_V5_4
> > > +
> > > +  (2) write source side mdev device's version string into target side mdev
> > > +  device's version attribute.
> > > +  # echo 8086-193b-i915-GVTg_V5_4 >
> > > +   /sys/bus/pci/devices/0000\:00\:02.0/882cc4da-dede-11e7-9180-078a62063ab1/\
> > > +  mdev_type/version
> > > +  -bash: echo: write error: Invalid argument
> > > +
> > > +  in this case, user space's write to target side mdev device's version
> > > +  attribute returns error to indicate the two mdev devices are incompatible.
> > > +  (incompatible because pci ids of the two mdev devices' parent devices are
> > > +  different).
> > > +
> > > +  case 3:
> > > +  source side mdev device is of uuid 5ac1fb20-2bbf-4842-bb7e-36c58c3be9cd,
> > > +  its mdev type is i915-GVTg_V5_4. pci id of parent device is 8086-193b.
> > > +  But vendor driver does not provide version attribute for this device.
> > > +
> > > +  (1) read source side mdev device's version.
> > > +  #cat \
> > > +    /sys/bus/pci/devices/0000\:00\:02.0/5ac1fb20-2bbf-4842-bb7e-36c58c3be9cd/\
> > > +    mdev_type/version
> > > +  cat: '/sys/bus/pci/devices/0000:00:02.0/5ac1fb20-2bbf-4842-bb7e-36c58c3be9cd/\
> > > +  mdev_type/version': No such file or directory
> > > +
> > > +  in this case, user space reads source side mdev device's version attribute
> > > +  which does not exist however. user space regards the two mdev devices as not
> > > +  compatible and will not start migration between the two mdev devices.
> > > +
> > > +  
> > 
> > This is far too long for description and examples, it's not this
> > complicated.  Thanks,
> >  
> got it. I'll follow your above example :)
> 
> thanks
> Yan 
> > >  * [device]
> > >  
> > >    This directory contains links to the devices of type <type-id> that have been  
> >
Yan Zhao May 9, 2019, 3:10 a.m. UTC | #7
On Thu, May 09, 2019 at 05:22:42AM +0800, Alex Williamson wrote:
> On Wed, 8 May 2019 07:27:40 -0400
> Yan Zhao <yan.y.zhao@intel.com> wrote:
> 
> > On Wed, May 08, 2019 at 05:18:26AM +0800, Alex Williamson wrote:
> > > On Sun,  5 May 2019 21:49:04 -0400
> > > Yan Zhao <yan.y.zhao@intel.com> wrote:
> > >   
> > > > version attribute is used to check two mdev devices' compatibility.
> > > > 
> > > > The key point of this version attribute is that it's rw.
> > > > User space has no need to understand internal of device version and no
> > > > need to compare versions by itself.
> > > > Compared to reading version strings from both two mdev devices being
> > > > checked, user space only reads from one mdev device's version attribute.
> > > > After getting its version string, user space writes this string into the
> > > > other mdev device's version attribute. Vendor driver of mdev device
> > > > whose version attribute being written will check device compatibility of
> > > > the two mdev devices for user space and return success for compatibility
> > > > or errno for incompatibility.
> > > > So two readings of version attributes + checking in user space are now
> > > > changed to one reading + one writing of version attributes + checking in
> > > > vendor driver.
> > > > Format and length of version strings are now private to vendor driver
> > > > who can define them freely.
> > > > 
> > > >              __ user space
> > > >               /\          \
> > > >              /             \write
> > > >             / read          \
> > > >      ______/__           ___\|/___
> > > >     | version |         | version |-->check compatibility
> > > >     -----------         -----------
> > > >     mdev device A       mdev device B
> > > > 
> > > > This version attribute is optional. If a mdev device does not provide
> > > > with a version attribute, this mdev device is incompatible to all other
> > > > mdev devices.
> > > > 
> > > > Live migration is able to take advantage of this version attribute.
> > > > Before user space actually starts live migration, it can first check
> > > > whether two mdev devices are compatible.
> > > > 
> > > > v2:
> > > > 1. added detailed intent and usage
> > > > 2. made definition of version string completely private to vendor driver
> > > >    (Alex Williamson)
> > > > 3. abandoned changes to sample mdev drivers (Alex Williamson)
> > > > 4. mandatory --> optional (Cornelia Huck)
> > > > 5. added description for errno (Cornelia Huck)
> > > > 
> > > > Cc: Alex Williamson <alex.williamson@redhat.com>
> > > > Cc: Erik Skultety <eskultet@redhat.com>
> > > > Cc: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> > > > Cc: Cornelia Huck <cohuck@redhat.com>
> > > > Cc: "Tian, Kevin" <kevin.tian@intel.com>
> > > > Cc: Zhenyu Wang <zhenyuw@linux.intel.com>
> > > > Cc: "Wang, Zhi A" <zhi.a.wang@intel.com>
> > > > Cc: Neo Jia <cjia@nvidia.com>
> > > > Cc: Kirti Wankhede <kwankhede@nvidia.com>
> > > > Cc: Daniel P. Berrangé <berrange@redhat.com>
> > > > Cc: Christophe de Dinechin <dinechin@redhat.com>
> > > > 
> > > > Signed-off-by: Yan Zhao <yan.y.zhao@intel.com>
> > > > ---
> > > >  Documentation/vfio-mediated-device.txt | 140 +++++++++++++++++++++++++
> > > >  1 file changed, 140 insertions(+)
> > > > 
> > > > diff --git a/Documentation/vfio-mediated-device.txt b/Documentation/vfio-mediated-device.txt
> > > > index c3f69bcaf96e..013a764968eb 100644
> > > > --- a/Documentation/vfio-mediated-device.txt
> > > > +++ b/Documentation/vfio-mediated-device.txt
> > > > @@ -202,6 +202,7 @@ Directories and files under the sysfs for Each Physical Device
> > > >    |     |   |--- available_instances
> > > >    |     |   |--- device_api
> > > >    |     |   |--- description
> > > > +  |     |   |--- version
> > > >    |     |   |--- [devices]
> > > >    |     |--- [<type-id>]
> > > >    |     |   |--- create
> > > > @@ -209,6 +210,7 @@ Directories and files under the sysfs for Each Physical Device
> > > >    |     |   |--- available_instances
> > > >    |     |   |--- device_api
> > > >    |     |   |--- description
> > > > +  |     |   |--- version
> > > >    |     |   |--- [devices]
> > > >    |     |--- [<type-id>]
> > > >    |          |--- create
> > > > @@ -216,6 +218,7 @@ Directories and files under the sysfs for Each Physical Device
> > > >    |          |--- available_instances
> > > >    |          |--- device_api
> > > >    |          |--- description
> > > > +  |          |--- version
> > > >    |          |--- [devices]  
> > > 
> > > I thought there was a request to make this more specific to migration
> > > by renaming it to something like migration_version.  Also, as an
> > >  
> > so this attribute may not only include a mdev device's parent device info and
> > mdev type, but also include numeric software version of vendor specific
> > migration code, right?
> 
> It's a vendor defined string, it should be considered opaque to the
> user, the vendor can include whatever they feel is relevant.
> 
> > This actually makes sense.
> > So, do I need to add a disclaimer in this doc like:
> > vendor driver should be responsible by itself for a mdev device's migration
> > compatibility. 
> 
> I thought that was the purpose of this attribute.
> 
> > During migration setup phase, general migration code in user space VFIO only
> > checks this version of VFIO migration region, and will not check software version
> > of vendor specific migration code.
> 
> What is "software version of vendor specific migration code"?  If
> you're asking whether anything will check for parent device
> compatibility or parent vendor driver compatibility, the answer is no,
> that's what this interface is meant to provide.  Userspace should need
> to do nothing more than verify the mdev types match and then use the
> version attribute to confirm source to target compatibility.
> 
> > It is suggested to incorporate at least parent device info and software version
> > of vendor specific migration code into this migration_version attribute.
> 
> We can make recommendations as "best practices", but ultimately it's an
> opaque string defined by the vendor driver.
> 
> But you never addressed my comment that previous reviews asked for the
> attribute to be named something more specific to migration.
>
I aggree to rename it to migration_version.

> > > optional attribute, it seems the example should perhaps not add it to
> > > all types to illustrate that it is not required.  
> > ok. got it.
> > 
> > 
> > > >  
> > > >  * [mdev_supported_types]
> > > > @@ -246,6 +249,143 @@ Directories and files under the sysfs for Each Physical Device
> > > >    This attribute should show the number of devices of type <type-id> that can be
> > > >    created.
> > > >  
> > > > +* version
> > > > +
> > > > +  This attribute is rw, and is optional.
> > > > +  It is used to check device compatibility between two mdev devices and is  
> > > 
> > > between two mdev devices of the same type.
> > >  
> > ok. got it.
> > But I have a question about aggregation proposed earlier.
> > Do we also have to assume the two mdev devices are of the same aggregation
> > count?
> > However, aggregation count is not available before a mdev device is created. :(
> 
> We don't support aggregation yet, but yes, that's going to introduce
> issues here.  Any configuration beyond the base mdev type would imply
> that the base type could be compatible for migration, but the specific
> instances might not.  Resolving that would imply that our version
> information needs to be relative to an instance, not just the base type.
> 
> How would we extend this interface to support that?  We could have a
> version attribute on each device instance, which might report something
> like:
> 
>   0123456789,aggregate=2
> 
> IOW the device instance of version concatenates the mdev type version
> with the additional create parameters for that device.  Writing this to
> the type attribute should be parsed by the vendor driver as support for
> given base device with specified additional create parameters.
> 
> I'm afraid this also bring us around to treacherous questions around
> who is responsible for creating that device on the migration target and
> where is this meta information about the device exposed.  Maybe instead
> of a per instance version attribute we would instead expose the create
> parameters as an attribute per instance and it would be userspace's
> responsibility to create a version string from the mdev type and create
> parameters similar to above.  This would also make it possible to
> create a compatible instance on the target without pre-knowledge of how
> the device was created.
> 
> Also, this issue exists already, but compatibility and capacity are two
> separate things, I think we want to limit this interface to the
> former.  For instance, if I want to migrate an i915-GVTg_V5_1 device to
> another system where available_instances for that type is zero, the
> version attribute should strictly report the device compatibility, it's
> not responsible for returning an errno due to lack of resources.
> Similarly if we were to do something with aggregation, the version
> attribute would strictly report if the target supports creating that
> device with those parameters, not whether it has capacity to create
> such as device at that instant in time.
>
I think it's good to have a migration_version attribute under each device
instance.
It has two pros:
1. vendor driver can incorperate into the string things like:
   parent device info + mdev type + aggregate count + software version
2. even for non mdev devices, like a VF in SRIOV
   PF driver can export this migration_version under a VF instance. so a VF is
   possible to migrate with vfio-pci driver installed. (though with current VFIO
   live migration RFCs, with vfio-pci driver is not able to migrate. but it
   provides a possibility)


could we maintain two migration_version attributes? 
"create parameter + mdev_type migraton_version" for user space software to
create a migration compatible mdev device.
"per device instance migration_version" for verifying migration compatibility of
devices already created.


> > > > +  accessed in pairs between the two mdev devices being checked.  
> > > 
> > > "in pairs"?  
> > I meant, user space needs to access version attributes from two mdev device.
> > but seems that it's needless to mention that... I'll remove it :)
> > 
> > 
> > > > +  The intent of this attribute is to make an mdev device's version opaque to
> > > > +  user space, so instead of reading two mdev devices' version strings and  
> > > 
> > > perhaps "...instead of reading the version string of two mdev devices
> > > and comparing them in userspace..."  
> > yes, better, thanks:)
> > 
> > > > +  comparing in userspace, user space should only read one mdev device's version
> > > > +  attribute, and writes this version string into the other mdev device's version
> > > > +  attribute. Then vendor driver of mdev device whose version attribute being
> > > > +  written would check the incoming version string and tell user space whether
> > > > +  the two mdev devices are compatible via return value. That's why this
> > > > +  attribute is writable.
> > > > +
> > > > +  when reading this attribute, it should show device version string of
> > > > +  the device of type <type-id>.
> > > > +
> > > > +  This string is private to vendor driver itself. Vendor driver is able to
> > > > +  freely define format and length of device version string.
> > > > +  e.g. It can use a combination of pciid of parent device + mdev type.  
> > > 
> > > Can the user assume the data contents of the string is ascii
> > > characters?  It's good that the vendor driver defines the format and
> > > length, but the user probably needs some expectation bounding that
> > > length.  Should we define it as no larger than PATH_MAX (4096), or maybe
> > > NAME_MAX (255) might be more reasonable?  
> > I think so. I'll add those restrictions in next revision. 
> 
> If we start adding creation parameters, PATH_MAX may actually be the
> more reasonable limit.
>
got it.
> > > > +
> > > > +  When writing a string to this attribute, vendor driver should analyze this
> > > > +  string and check whether the mdev device being identified by this string is
> > > > +  compatible with the mdev device for this attribute. vendor driver should then  
> > > 
> > > Compatible for what purpose?  I think this is where specifically
> > > calling this a migration_version potentially has value.  
> > yes. if it also covers version of vendor specific migration code, calling it
> > migration_version is more appropriate.
> 
> I think we're discussing an interface that validates "I [the vendor
> driver] am able to import the state of this version", therefore it must
> include every relevant aspect of the vendor support for that.
> 
> > > > +  return written string's length if it regards the two mdev devices are
> > > > +  compatible; vendor driver should return negative errno if it regards the two
> > > > +  mdev devices are not compatible.  
> > > 
> > > IOW, the write(2) will succeed if the version is determined to be
> > > compatible and otherwise fail with vendor specific errno.
> > >  
> > thanks:)
> > 
> > > > +
> > > > +  User space should treat ANY of below conditions as two mdev devices not
> > > > +  compatible:  
> > > 
> > > (0) The mdev devices are not of the same type.
> > >  
> > the same as above. do we also need to take aggregation count into consideration?
> > 
> > > > +  (1) any one of the two mdev devices does not have a version attribute
> > > > +  (2) error when read from one mdev device's version attribute  
> > > 
> > > Is this intended to support that the vendor driver can supply a version
> > > attribute but not support migration?  TBH, this sounds like a vendor
> > > driver bug, but maybe it's necessary if the vendor driver could have
> > > some types that support migration and others that do not?  IOW, we're
> > > supplying the same attribute groups to all devices from a vendor, in
> > > which case my comment above regarding an example type without a version
> > > attribute might be invalid.  
> > hmm, this is to make life easier for vendor driver to have some types that
> > support migration and others that do not. while we can get rid of returning
> > errno by providing different attribute groups to different devices, the way of
> > returning errno gives a simpler choice to vendors.
> 
> Yes, I think it might be overly complicated to provide different
> attribute groups for different devices, we have more flexibility if the
> user does not make any assumptions based only on the presence of a
> version attribute.
> 
> > > > +  (3) error when write one mdev device's version string to the other mdev
> > > > +  device's version attribute
> > > > +
> > > > +  User space should regard two mdev devices compatible when ALL of below
> > > > +  conditions are met:  
> > > 
> > > (0) The mdev devices are of the same type
> > >   
> > > > +  (1) success when read from one mdev device's version attribute.
> > > > +  (2) success when write one mdev device's version string to the other mdev
> > > > +  device's version attribute
> > > > +
> > > > +  Errno:
> > > > +  If vendor driver wants to claim a mdev device incompatible to all other mdev
> > > > +  devices, it should not register version attribute for this mdev device. But if
> > > > +  a vendor driver has already registered version attribute and it wants to claim
> > > > +  a mdev device incompatible to all other mdev devices, it needs to return
> > > > +  -ENODEV on access to this mdev device's version attribute.
> > > > +  If a mdev device is only incompatible to certain mdev devices, write of
> > > > +  incompatible mdev devices's version strings to its version attribute should
> > > > +  return -EINVAL;  
> > > 
> > > I think it's best not to define the specific errno returned for a
> > > specific situation, let the vendor driver decide, userspace simply
> > > needs to know that an errno on read indicates the device does not
> > > support migration version comparison and that an errno on write
> > > indicates the devices are incompatible or the target doesn't support
> > > migration versions.
> > >  
> > yes, user space only gets 0 or 1 as return code, not those errno. 
> > maybe I only need to describe errno in patch 2/2.
> > 
> > > > +
> > > > +  This attribute can be taken advantage of by live migration.
> > > > +  If user space detects two mdev devices are compatible through version
> > > > +  attribute, it can start migration between the two mdev devices, otherwise it
> > > > +  should abort its migration attempts between the two mdev devices.
> > > > +
> > > > +  Example Usage:
> > > > +  case 1:
> > > > +  source side mdev device is of uuid 5ac1fb20-2bbf-4842-bb7e-36c58c3be9cd,
> > > > +  its mdev type is i915-GVTg_V5_4. pci id of parent device is 8086-193b.
> > > > +  target side mdev device is if of uuid 882cc4da-dede-11e7-9180-078a62063ab1,
> > > > +  its mdev type is i915-GVTg_V5_4. pci id of parent device is 8086-193b.
> > > > +
> > > > +  # readlink /sys/bus/pci/devices/0000\:00\:02.0/\
> > > > +  5ac1fb20-2bbf-4842-bb7e-36c58c3be9cd/mdev_type
> > > > +  ../mdev_supported_types/i915-GVTg_V5_4
> > > > +
> > > > +  # readlink /sys/bus/pci/devices/0000\:00\:02.0/\
> > > > +  882cc4da-dede-11e7-9180-078a62063ab1/mdev_type
> > > > +  ../mdev_supported_types/i915-GVTg_V5_4
> > > > +
> > > > +  (1) read source side mdev device's version.
> > > > +  #cat \
> > > > +    /sys/bus/pci/devices/0000\:00\:02.0/5ac1fb20-2bbf-4842-bb7e-36c58c3be9cd/\
> > > > +    mdev_type/version
> > > > +  8086-193b-i915-GVTg_V5_4  
> > > 
> > > Is this really the version information exposed in 2/2?  This is opaque,
> > > so of course you can add things later, but it seems short sighted not
> > > to even append a version 0 tag to account for software compatibility
> > > differences since the above only represents a parent and mdev type
> > > based version.
> > >   
> > yes, currently in 2/2, the version only includes <vendor id> + <device id> +
> > <mdev type>. but you are right, it's better to include software migration
> > version number.
> > so vendor drivers have below 3 ways to designate a mdev device has no migration
> > capability.
> > 1. not registering migration_version attribute
> > 2. on reading migration_version, returning errno
> > 3. on reading migration_version, returning string indicating non-migratable.
> > 
> > The reason of not giving up way 2 is that maybe it can accelerate user space
> > getting information of device incompatible. if we only keep way 3, it would not
> > know this info until writing this string to target attribute.
> > 
> > do you agree?
> 
> The string is opaque to the user, so if you're asking in (3) that the
> user read and parse some information in the string that indicates the
> device is non-migratable then no, I don't agree with that policy.  If
> reading the version attribute produces a result, the only thing the
> user can do with that result is to test it by writing it to another
> version attribute.  Thanks,
> 
> Alex

ok. so we will keep way 2 as a valid choice :)

Thanks
Yan
> > > > +  (2) write source side mdev device's version string into target side mdev
> > > > +  device's version attribute.
> > > > +  # echo 8086-193b-i915-GVTg_V5_4 >
> > > > +   /sys/bus/pci/devices/0000\:00\:02.0/882cc4da-dede-11e7-9180-078a62063ab1/\
> > > > +  mdev_type/version
> > > > +  # echo $?
> > > > +  0  
> > > 
> > > TBH, there's a lot of superfluous information in this example that can
> > > be stripped out.  For example:
> > > 
> > > "
> > > (1) Compare mdev types:
> > > 
> > > The mdev type of an instantiated device can be read from the mdev_type
> > > link within the device instance in sysfs, for example:
> > > 
> > >   # basename $(readlink -f /sys/bus/mdev/devices/$MDEV_UUID/mdev_type/)
> > > 
> > > The mdev types available on a given host system can also be found
> > > through /sys/class/mdev_bus, for example:
> > > 
> > >   # ls /sys/class/mdev_bus/*/mdev_supported_types/
> > > 
> > > Migration is only possible between devices of the same mdev type.
> > > 
> > > (2) Retrieve the mdev source version:
> > > 
> > > The migration version information can either be read from the mdev_type
> > > link on an instantiated device:
> > > 
> > >   # cat /sys/bus/mdev/devices/$UUID1/mdev_type/version
> > > 
> > > Or it can be read from the mdev type definition, for example:
> > > 
> > >   # cat /sys/class/mdev_bus/*/mdev_supported_types/$MDEV_TYPE/version
> > > 
> > > If reading the source version generates an error, migration is not
> > > possible.  NB, there might be several parent devices for a given mdev
> > > type on a host system, each may support or expose different versions.
> > > Matching the specific mdev type to a parent may become important in
> > > such configurations.
> > > 
> > > (3) Test source version at target:
> > > 
> > > Given a version as outlined above, its compatibility to an instantiated
> > > device of the same mdev type can be tested as:
> > > 
> > >   # echo $VERSION > /sys/bus/mdev/devices/$UUID2/mdev_type/version
> > > 
> > > If this write fails, the source and target versions are not compatible
> > > or the target does not support migration.
> > > 
> > > Compatibility can also be tested prior to target device creation using
> > > the mdev type definition for a parent device with a previously found
> > > matching mdev type, for example:
> > > 
> > >   # echo $VERSION > /sys/class/mdev_bus/$PARENT/mdev_supported_types/$MDEV_TYPE/version
> > > 
> > > Again, an error writing the version indicates that an instance of this
> > > mdev type would not support a migration from the provided version.
> > > "
> > > 
> > > In particular from the provided example, the specific UUIDs, mdev
> > > types, parent information, and contents of the version attribute do not
> > > contribute to illustrating the protocol.  In fact, displaying the
> > > contents of the version attribute may tempt users to do comparison on
> > > their own, especially given how easy it is to decide the GVT-g version
> > > string.
> > >  
> > got it!
> > great thanks!
> > I'll update it to the next revision.
> > >   
> > > > +
> > > > +  in this case, user space's write to target side mdev device's version
> > > > +  attribute returns success to indicate the two mdev devices are compatible.
> > > > +
> > > > +  case 2:
> > > > +  source side mdev device is of uuid 5ac1fb20-2bbf-4842-bb7e-36c58c3be9cd,
> > > > +  its mdev type is i915-GVTg_V5_4. pci id of parent device is 8086-193b.
> > > > +  target side mdev device is if of uuid 882cc4da-dede-11e7-9180-078a62063ab1,
> > > > +  its mdev type is i915-GVTg_V5_4. pci id of parent device is 8086-191b.
> > > > +
> > > > +  # readlink /sys/bus/pci/devices/0000\:00\:02.0/\
> > > > +  5ac1fb20-2bbf-4842-bb7e-36c58c3be9cd/mdev_type
> > > > +  ../mdev_supported_types/i915-GVTg_V5_4
> > > > +
> > > > +  # readlink /sys/bus/pci/devices/0000\:00\:02.0/\
> > > > +  882cc4da-dede-11e7-9180-078a62063ab1/mdev_type
> > > > +  ../mdev_supported_types/i915-GVTg_V5_4
> > > > +
> > > > +  (1) read source side mdev device's version.
> > > > +  #cat \
> > > > +    /sys/bus/pci/devices/0000\:00\:02.0/5ac1fb20-2bbf-4842-bb7e-36c58c3be9cd/\
> > > > +    mdev_type/version
> > > > +  8086-193b-i915-GVTg_V5_4
> > > > +
> > > > +  (2) write source side mdev device's version string into target side mdev
> > > > +  device's version attribute.
> > > > +  # echo 8086-193b-i915-GVTg_V5_4 >
> > > > +   /sys/bus/pci/devices/0000\:00\:02.0/882cc4da-dede-11e7-9180-078a62063ab1/\
> > > > +  mdev_type/version
> > > > +  -bash: echo: write error: Invalid argument
> > > > +
> > > > +  in this case, user space's write to target side mdev device's version
> > > > +  attribute returns error to indicate the two mdev devices are incompatible.
> > > > +  (incompatible because pci ids of the two mdev devices' parent devices are
> > > > +  different).
> > > > +
> > > > +  case 3:
> > > > +  source side mdev device is of uuid 5ac1fb20-2bbf-4842-bb7e-36c58c3be9cd,
> > > > +  its mdev type is i915-GVTg_V5_4. pci id of parent device is 8086-193b.
> > > > +  But vendor driver does not provide version attribute for this device.
> > > > +
> > > > +  (1) read source side mdev device's version.
> > > > +  #cat \
> > > > +    /sys/bus/pci/devices/0000\:00\:02.0/5ac1fb20-2bbf-4842-bb7e-36c58c3be9cd/\
> > > > +    mdev_type/version
> > > > +  cat: '/sys/bus/pci/devices/0000:00:02.0/5ac1fb20-2bbf-4842-bb7e-36c58c3be9cd/\
> > > > +  mdev_type/version': No such file or directory
> > > > +
> > > > +  in this case, user space reads source side mdev device's version attribute
> > > > +  which does not exist however. user space regards the two mdev devices as not
> > > > +  compatible and will not start migration between the two mdev devices.
> > > > +
> > > > +  
> > > 
> > > This is far too long for description and examples, it's not this
> > > complicated.  Thanks,
> > >  
> > got it. I'll follow your above example :)
> > 
> > thanks
> > Yan 
> > > >  * [device]
> > > >  
> > > >    This directory contains links to the devices of type <type-id> that have been  
> > >   
>
Alex Williamson May 9, 2019, 3:38 a.m. UTC | #8
On Wed, 8 May 2019 23:10:55 -0400
Yan Zhao <yan.y.zhao@intel.com> wrote:

> On Thu, May 09, 2019 at 05:22:42AM +0800, Alex Williamson wrote:
> > On Wed, 8 May 2019 07:27:40 -0400
> > Yan Zhao <yan.y.zhao@intel.com> wrote:
> >   
> > > On Wed, May 08, 2019 at 05:18:26AM +0800, Alex Williamson wrote:  
> > > > On Sun,  5 May 2019 21:49:04 -0400
> > > > Yan Zhao <yan.y.zhao@intel.com> wrote:
> > > >     
> > > > > version attribute is used to check two mdev devices' compatibility.
> > > > > 
> > > > > The key point of this version attribute is that it's rw.
> > > > > User space has no need to understand internal of device version and no
> > > > > need to compare versions by itself.
> > > > > Compared to reading version strings from both two mdev devices being
> > > > > checked, user space only reads from one mdev device's version attribute.
> > > > > After getting its version string, user space writes this string into the
> > > > > other mdev device's version attribute. Vendor driver of mdev device
> > > > > whose version attribute being written will check device compatibility of
> > > > > the two mdev devices for user space and return success for compatibility
> > > > > or errno for incompatibility.
> > > > > So two readings of version attributes + checking in user space are now
> > > > > changed to one reading + one writing of version attributes + checking in
> > > > > vendor driver.
> > > > > Format and length of version strings are now private to vendor driver
> > > > > who can define them freely.
> > > > > 
> > > > >              __ user space
> > > > >               /\          \
> > > > >              /             \write
> > > > >             / read          \
> > > > >      ______/__           ___\|/___
> > > > >     | version |         | version |-->check compatibility
> > > > >     -----------         -----------
> > > > >     mdev device A       mdev device B
> > > > > 
> > > > > This version attribute is optional. If a mdev device does not provide
> > > > > with a version attribute, this mdev device is incompatible to all other
> > > > > mdev devices.
> > > > > 
> > > > > Live migration is able to take advantage of this version attribute.
> > > > > Before user space actually starts live migration, it can first check
> > > > > whether two mdev devices are compatible.
> > > > > 
> > > > > v2:
> > > > > 1. added detailed intent and usage
> > > > > 2. made definition of version string completely private to vendor driver
> > > > >    (Alex Williamson)
> > > > > 3. abandoned changes to sample mdev drivers (Alex Williamson)
> > > > > 4. mandatory --> optional (Cornelia Huck)
> > > > > 5. added description for errno (Cornelia Huck)
> > > > > 
> > > > > Cc: Alex Williamson <alex.williamson@redhat.com>
> > > > > Cc: Erik Skultety <eskultet@redhat.com>
> > > > > Cc: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
> > > > > Cc: Cornelia Huck <cohuck@redhat.com>
> > > > > Cc: "Tian, Kevin" <kevin.tian@intel.com>
> > > > > Cc: Zhenyu Wang <zhenyuw@linux.intel.com>
> > > > > Cc: "Wang, Zhi A" <zhi.a.wang@intel.com>
> > > > > Cc: Neo Jia <cjia@nvidia.com>
> > > > > Cc: Kirti Wankhede <kwankhede@nvidia.com>
> > > > > Cc: Daniel P. Berrangé <berrange@redhat.com>
> > > > > Cc: Christophe de Dinechin <dinechin@redhat.com>
> > > > > 
> > > > > Signed-off-by: Yan Zhao <yan.y.zhao@intel.com>
> > > > > ---
> > > > >  Documentation/vfio-mediated-device.txt | 140 +++++++++++++++++++++++++
> > > > >  1 file changed, 140 insertions(+)
> > > > > 
> > > > > diff --git a/Documentation/vfio-mediated-device.txt b/Documentation/vfio-mediated-device.txt
> > > > > index c3f69bcaf96e..013a764968eb 100644
> > > > > --- a/Documentation/vfio-mediated-device.txt
> > > > > +++ b/Documentation/vfio-mediated-device.txt
> > > > > @@ -202,6 +202,7 @@ Directories and files under the sysfs for Each Physical Device
> > > > >    |     |   |--- available_instances
> > > > >    |     |   |--- device_api
> > > > >    |     |   |--- description
> > > > > +  |     |   |--- version
> > > > >    |     |   |--- [devices]
> > > > >    |     |--- [<type-id>]
> > > > >    |     |   |--- create
> > > > > @@ -209,6 +210,7 @@ Directories and files under the sysfs for Each Physical Device
> > > > >    |     |   |--- available_instances
> > > > >    |     |   |--- device_api
> > > > >    |     |   |--- description
> > > > > +  |     |   |--- version
> > > > >    |     |   |--- [devices]
> > > > >    |     |--- [<type-id>]
> > > > >    |          |--- create
> > > > > @@ -216,6 +218,7 @@ Directories and files under the sysfs for Each Physical Device
> > > > >    |          |--- available_instances
> > > > >    |          |--- device_api
> > > > >    |          |--- description
> > > > > +  |          |--- version
> > > > >    |          |--- [devices]    
> > > > 
> > > > I thought there was a request to make this more specific to migration
> > > > by renaming it to something like migration_version.  Also, as an
> > > >    
> > > so this attribute may not only include a mdev device's parent device info and
> > > mdev type, but also include numeric software version of vendor specific
> > > migration code, right?  
> > 
> > It's a vendor defined string, it should be considered opaque to the
> > user, the vendor can include whatever they feel is relevant.
> >   
> > > This actually makes sense.
> > > So, do I need to add a disclaimer in this doc like:
> > > vendor driver should be responsible by itself for a mdev device's migration
> > > compatibility.   
> > 
> > I thought that was the purpose of this attribute.
> >   
> > > During migration setup phase, general migration code in user space VFIO only
> > > checks this version of VFIO migration region, and will not check software version
> > > of vendor specific migration code.  
> > 
> > What is "software version of vendor specific migration code"?  If
> > you're asking whether anything will check for parent device
> > compatibility or parent vendor driver compatibility, the answer is no,
> > that's what this interface is meant to provide.  Userspace should need
> > to do nothing more than verify the mdev types match and then use the
> > version attribute to confirm source to target compatibility.
> >   
> > > It is suggested to incorporate at least parent device info and software version
> > > of vendor specific migration code into this migration_version attribute.  
> > 
> > We can make recommendations as "best practices", but ultimately it's an
> > opaque string defined by the vendor driver.
> > 
> > But you never addressed my comment that previous reviews asked for the
> > attribute to be named something more specific to migration.
> >  
> I aggree to rename it to migration_version.
> 
> > > > optional attribute, it seems the example should perhaps not add it to
> > > > all types to illustrate that it is not required.    
> > > ok. got it.
> > > 
> > >   
> > > > >  
> > > > >  * [mdev_supported_types]
> > > > > @@ -246,6 +249,143 @@ Directories and files under the sysfs for Each Physical Device
> > > > >    This attribute should show the number of devices of type <type-id> that can be
> > > > >    created.
> > > > >  
> > > > > +* version
> > > > > +
> > > > > +  This attribute is rw, and is optional.
> > > > > +  It is used to check device compatibility between two mdev devices and is    
> > > > 
> > > > between two mdev devices of the same type.
> > > >    
> > > ok. got it.
> > > But I have a question about aggregation proposed earlier.
> > > Do we also have to assume the two mdev devices are of the same aggregation
> > > count?
> > > However, aggregation count is not available before a mdev device is created. :(  
> > 
> > We don't support aggregation yet, but yes, that's going to introduce
> > issues here.  Any configuration beyond the base mdev type would imply
> > that the base type could be compatible for migration, but the specific
> > instances might not.  Resolving that would imply that our version
> > information needs to be relative to an instance, not just the base type.
> > 
> > How would we extend this interface to support that?  We could have a
> > version attribute on each device instance, which might report something
> > like:
> > 
> >   0123456789,aggregate=2
> > 
> > IOW the device instance of version concatenates the mdev type version
> > with the additional create parameters for that device.  Writing this to
> > the type attribute should be parsed by the vendor driver as support for
> > given base device with specified additional create parameters.
> > 
> > I'm afraid this also bring us around to treacherous questions around
> > who is responsible for creating that device on the migration target and
> > where is this meta information about the device exposed.  Maybe instead
> > of a per instance version attribute we would instead expose the create
> > parameters as an attribute per instance and it would be userspace's
> > responsibility to create a version string from the mdev type and create
> > parameters similar to above.  This would also make it possible to
> > create a compatible instance on the target without pre-knowledge of how
> > the device was created.
> > 
> > Also, this issue exists already, but compatibility and capacity are two
> > separate things, I think we want to limit this interface to the
> > former.  For instance, if I want to migrate an i915-GVTg_V5_1 device to
> > another system where available_instances for that type is zero, the
> > version attribute should strictly report the device compatibility, it's
> > not responsible for returning an errno due to lack of resources.
> > Similarly if we were to do something with aggregation, the version
> > attribute would strictly report if the target supports creating that
> > device with those parameters, not whether it has capacity to create
> > such as device at that instant in time.
> >  
> I think it's good to have a migration_version attribute under each device
> instance.
> It has two pros:
> 1. vendor driver can incorperate into the string things like:
>    parent device info + mdev type + aggregate count + software version

The only thing added here is aggregate count, the rest is available for
the base type.

> 2. even for non mdev devices, like a VF in SRIOV
>    PF driver can export this migration_version under a VF instance. so a VF is
>    possible to migrate with vfio-pci driver installed. (though with current VFIO
>    live migration RFCs, with vfio-pci driver is not able to migrate. but it
>    provides a possibility)

I don't follow here, migration version is only one piece of the puzzle
to enable the migration of a device, if a PF driver wants to make a VF
assignable and migratable, it can wrap it in an mdev layer and export
it within the infrastructure we're developing.

> could we maintain two migration_version attributes? 
> "create parameter + mdev_type migraton_version" for user space software to
> create a migration compatible mdev device.
> "per device instance migration_version" for verifying migration compatibility of
> devices already created.

I don't think so.  If it wasn't obvious in my stream of consciousness
previous reply, I prefer the latter mechanism where an instance exposes
the creation attributes allowing the user to concatenate the version
string and creation attributes to ask the type level version about
compatibility.  I think it's confusing to have a version on the
instance that reports something different than the version on the type
but we cannot remove the version on the type because we need to be able
to test compatibility before instantiating an instance.  Therefore
let's not put a version on the instance is the conclusion I've come
to.  Thanks,

Alex

> > > > > +  accessed in pairs between the two mdev devices being checked.    
> > > > 
> > > > "in pairs"?    
> > > I meant, user space needs to access version attributes from two mdev device.
> > > but seems that it's needless to mention that... I'll remove it :)
> > > 
> > >   
> > > > > +  The intent of this attribute is to make an mdev device's version opaque to
> > > > > +  user space, so instead of reading two mdev devices' version strings and    
> > > > 
> > > > perhaps "...instead of reading the version string of two mdev devices
> > > > and comparing them in userspace..."    
> > > yes, better, thanks:)
> > >   
> > > > > +  comparing in userspace, user space should only read one mdev device's version
> > > > > +  attribute, and writes this version string into the other mdev device's version
> > > > > +  attribute. Then vendor driver of mdev device whose version attribute being
> > > > > +  written would check the incoming version string and tell user space whether
> > > > > +  the two mdev devices are compatible via return value. That's why this
> > > > > +  attribute is writable.
> > > > > +
> > > > > +  when reading this attribute, it should show device version string of
> > > > > +  the device of type <type-id>.
> > > > > +
> > > > > +  This string is private to vendor driver itself. Vendor driver is able to
> > > > > +  freely define format and length of device version string.
> > > > > +  e.g. It can use a combination of pciid of parent device + mdev type.    
> > > > 
> > > > Can the user assume the data contents of the string is ascii
> > > > characters?  It's good that the vendor driver defines the format and
> > > > length, but the user probably needs some expectation bounding that
> > > > length.  Should we define it as no larger than PATH_MAX (4096), or maybe
> > > > NAME_MAX (255) might be more reasonable?    
> > > I think so. I'll add those restrictions in next revision.   
> > 
> > If we start adding creation parameters, PATH_MAX may actually be the
> > more reasonable limit.
> >  
> got it.
> > > > > +
> > > > > +  When writing a string to this attribute, vendor driver should analyze this
> > > > > +  string and check whether the mdev device being identified by this string is
> > > > > +  compatible with the mdev device for this attribute. vendor driver should then    
> > > > 
> > > > Compatible for what purpose?  I think this is where specifically
> > > > calling this a migration_version potentially has value.    
> > > yes. if it also covers version of vendor specific migration code, calling it
> > > migration_version is more appropriate.  
> > 
> > I think we're discussing an interface that validates "I [the vendor
> > driver] am able to import the state of this version", therefore it must
> > include every relevant aspect of the vendor support for that.
> >   
> > > > > +  return written string's length if it regards the two mdev devices are
> > > > > +  compatible; vendor driver should return negative errno if it regards the two
> > > > > +  mdev devices are not compatible.    
> > > > 
> > > > IOW, the write(2) will succeed if the version is determined to be
> > > > compatible and otherwise fail with vendor specific errno.
> > > >    
> > > thanks:)
> > >   
> > > > > +
> > > > > +  User space should treat ANY of below conditions as two mdev devices not
> > > > > +  compatible:    
> > > > 
> > > > (0) The mdev devices are not of the same type.
> > > >    
> > > the same as above. do we also need to take aggregation count into consideration?
> > >   
> > > > > +  (1) any one of the two mdev devices does not have a version attribute
> > > > > +  (2) error when read from one mdev device's version attribute    
> > > > 
> > > > Is this intended to support that the vendor driver can supply a version
> > > > attribute but not support migration?  TBH, this sounds like a vendor
> > > > driver bug, but maybe it's necessary if the vendor driver could have
> > > > some types that support migration and others that do not?  IOW, we're
> > > > supplying the same attribute groups to all devices from a vendor, in
> > > > which case my comment above regarding an example type without a version
> > > > attribute might be invalid.    
> > > hmm, this is to make life easier for vendor driver to have some types that
> > > support migration and others that do not. while we can get rid of returning
> > > errno by providing different attribute groups to different devices, the way of
> > > returning errno gives a simpler choice to vendors.  
> > 
> > Yes, I think it might be overly complicated to provide different
> > attribute groups for different devices, we have more flexibility if the
> > user does not make any assumptions based only on the presence of a
> > version attribute.
> >   
> > > > > +  (3) error when write one mdev device's version string to the other mdev
> > > > > +  device's version attribute
> > > > > +
> > > > > +  User space should regard two mdev devices compatible when ALL of below
> > > > > +  conditions are met:    
> > > > 
> > > > (0) The mdev devices are of the same type
> > > >     
> > > > > +  (1) success when read from one mdev device's version attribute.
> > > > > +  (2) success when write one mdev device's version string to the other mdev
> > > > > +  device's version attribute
> > > > > +
> > > > > +  Errno:
> > > > > +  If vendor driver wants to claim a mdev device incompatible to all other mdev
> > > > > +  devices, it should not register version attribute for this mdev device. But if
> > > > > +  a vendor driver has already registered version attribute and it wants to claim
> > > > > +  a mdev device incompatible to all other mdev devices, it needs to return
> > > > > +  -ENODEV on access to this mdev device's version attribute.
> > > > > +  If a mdev device is only incompatible to certain mdev devices, write of
> > > > > +  incompatible mdev devices's version strings to its version attribute should
> > > > > +  return -EINVAL;    
> > > > 
> > > > I think it's best not to define the specific errno returned for a
> > > > specific situation, let the vendor driver decide, userspace simply
> > > > needs to know that an errno on read indicates the device does not
> > > > support migration version comparison and that an errno on write
> > > > indicates the devices are incompatible or the target doesn't support
> > > > migration versions.
> > > >    
> > > yes, user space only gets 0 or 1 as return code, not those errno. 
> > > maybe I only need to describe errno in patch 2/2.
> > >   
> > > > > +
> > > > > +  This attribute can be taken advantage of by live migration.
> > > > > +  If user space detects two mdev devices are compatible through version
> > > > > +  attribute, it can start migration between the two mdev devices, otherwise it
> > > > > +  should abort its migration attempts between the two mdev devices.
> > > > > +
> > > > > +  Example Usage:
> > > > > +  case 1:
> > > > > +  source side mdev device is of uuid 5ac1fb20-2bbf-4842-bb7e-36c58c3be9cd,
> > > > > +  its mdev type is i915-GVTg_V5_4. pci id of parent device is 8086-193b.
> > > > > +  target side mdev device is if of uuid 882cc4da-dede-11e7-9180-078a62063ab1,
> > > > > +  its mdev type is i915-GVTg_V5_4. pci id of parent device is 8086-193b.
> > > > > +
> > > > > +  # readlink /sys/bus/pci/devices/0000\:00\:02.0/\
> > > > > +  5ac1fb20-2bbf-4842-bb7e-36c58c3be9cd/mdev_type
> > > > > +  ../mdev_supported_types/i915-GVTg_V5_4
> > > > > +
> > > > > +  # readlink /sys/bus/pci/devices/0000\:00\:02.0/\
> > > > > +  882cc4da-dede-11e7-9180-078a62063ab1/mdev_type
> > > > > +  ../mdev_supported_types/i915-GVTg_V5_4
> > > > > +
> > > > > +  (1) read source side mdev device's version.
> > > > > +  #cat \
> > > > > +    /sys/bus/pci/devices/0000\:00\:02.0/5ac1fb20-2bbf-4842-bb7e-36c58c3be9cd/\
> > > > > +    mdev_type/version
> > > > > +  8086-193b-i915-GVTg_V5_4    
> > > > 
> > > > Is this really the version information exposed in 2/2?  This is opaque,
> > > > so of course you can add things later, but it seems short sighted not
> > > > to even append a version 0 tag to account for software compatibility
> > > > differences since the above only represents a parent and mdev type
> > > > based version.
> > > >     
> > > yes, currently in 2/2, the version only includes <vendor id> + <device id> +
> > > <mdev type>. but you are right, it's better to include software migration
> > > version number.
> > > so vendor drivers have below 3 ways to designate a mdev device has no migration
> > > capability.
> > > 1. not registering migration_version attribute
> > > 2. on reading migration_version, returning errno
> > > 3. on reading migration_version, returning string indicating non-migratable.
> > > 
> > > The reason of not giving up way 2 is that maybe it can accelerate user space
> > > getting information of device incompatible. if we only keep way 3, it would not
> > > know this info until writing this string to target attribute.
> > > 
> > > do you agree?  
> > 
> > The string is opaque to the user, so if you're asking in (3) that the
> > user read and parse some information in the string that indicates the
> > device is non-migratable then no, I don't agree with that policy.  If
> > reading the version attribute produces a result, the only thing the
> > user can do with that result is to test it by writing it to another
> > version attribute.  Thanks,
> > 
> > Alex  
> 
> ok. so we will keep way 2 as a valid choice :)
> 
> Thanks
> Yan
> > > > > +  (2) write source side mdev device's version string into target side mdev
> > > > > +  device's version attribute.
> > > > > +  # echo 8086-193b-i915-GVTg_V5_4 >
> > > > > +   /sys/bus/pci/devices/0000\:00\:02.0/882cc4da-dede-11e7-9180-078a62063ab1/\
> > > > > +  mdev_type/version
> > > > > +  # echo $?
> > > > > +  0    
> > > > 
> > > > TBH, there's a lot of superfluous information in this example that can
> > > > be stripped out.  For example:
> > > > 
> > > > "
> > > > (1) Compare mdev types:
> > > > 
> > > > The mdev type of an instantiated device can be read from the mdev_type
> > > > link within the device instance in sysfs, for example:
> > > > 
> > > >   # basename $(readlink -f /sys/bus/mdev/devices/$MDEV_UUID/mdev_type/)
> > > > 
> > > > The mdev types available on a given host system can also be found
> > > > through /sys/class/mdev_bus, for example:
> > > > 
> > > >   # ls /sys/class/mdev_bus/*/mdev_supported_types/
> > > > 
> > > > Migration is only possible between devices of the same mdev type.
> > > > 
> > > > (2) Retrieve the mdev source version:
> > > > 
> > > > The migration version information can either be read from the mdev_type
> > > > link on an instantiated device:
> > > > 
> > > >   # cat /sys/bus/mdev/devices/$UUID1/mdev_type/version
> > > > 
> > > > Or it can be read from the mdev type definition, for example:
> > > > 
> > > >   # cat /sys/class/mdev_bus/*/mdev_supported_types/$MDEV_TYPE/version
> > > > 
> > > > If reading the source version generates an error, migration is not
> > > > possible.  NB, there might be several parent devices for a given mdev
> > > > type on a host system, each may support or expose different versions.
> > > > Matching the specific mdev type to a parent may become important in
> > > > such configurations.
> > > > 
> > > > (3) Test source version at target:
> > > > 
> > > > Given a version as outlined above, its compatibility to an instantiated
> > > > device of the same mdev type can be tested as:
> > > > 
> > > >   # echo $VERSION > /sys/bus/mdev/devices/$UUID2/mdev_type/version
> > > > 
> > > > If this write fails, the source and target versions are not compatible
> > > > or the target does not support migration.
> > > > 
> > > > Compatibility can also be tested prior to target device creation using
> > > > the mdev type definition for a parent device with a previously found
> > > > matching mdev type, for example:
> > > > 
> > > >   # echo $VERSION > /sys/class/mdev_bus/$PARENT/mdev_supported_types/$MDEV_TYPE/version
> > > > 
> > > > Again, an error writing the version indicates that an instance of this
> > > > mdev type would not support a migration from the provided version.
> > > > "
> > > > 
> > > > In particular from the provided example, the specific UUIDs, mdev
> > > > types, parent information, and contents of the version attribute do not
> > > > contribute to illustrating the protocol.  In fact, displaying the
> > > > contents of the version attribute may tempt users to do comparison on
> > > > their own, especially given how easy it is to decide the GVT-g version
> > > > string.
> > > >    
> > > got it!
> > > great thanks!
> > > I'll update it to the next revision.  
> > > >     
> > > > > +
> > > > > +  in this case, user space's write to target side mdev device's version
> > > > > +  attribute returns success to indicate the two mdev devices are compatible.
> > > > > +
> > > > > +  case 2:
> > > > > +  source side mdev device is of uuid 5ac1fb20-2bbf-4842-bb7e-36c58c3be9cd,
> > > > > +  its mdev type is i915-GVTg_V5_4. pci id of parent device is 8086-193b.
> > > > > +  target side mdev device is if of uuid 882cc4da-dede-11e7-9180-078a62063ab1,
> > > > > +  its mdev type is i915-GVTg_V5_4. pci id of parent device is 8086-191b.
> > > > > +
> > > > > +  # readlink /sys/bus/pci/devices/0000\:00\:02.0/\
> > > > > +  5ac1fb20-2bbf-4842-bb7e-36c58c3be9cd/mdev_type
> > > > > +  ../mdev_supported_types/i915-GVTg_V5_4
> > > > > +
> > > > > +  # readlink /sys/bus/pci/devices/0000\:00\:02.0/\
> > > > > +  882cc4da-dede-11e7-9180-078a62063ab1/mdev_type
> > > > > +  ../mdev_supported_types/i915-GVTg_V5_4
> > > > > +
> > > > > +  (1) read source side mdev device's version.
> > > > > +  #cat \
> > > > > +    /sys/bus/pci/devices/0000\:00\:02.0/5ac1fb20-2bbf-4842-bb7e-36c58c3be9cd/\
> > > > > +    mdev_type/version
> > > > > +  8086-193b-i915-GVTg_V5_4
> > > > > +
> > > > > +  (2) write source side mdev device's version string into target side mdev
> > > > > +  device's version attribute.
> > > > > +  # echo 8086-193b-i915-GVTg_V5_4 >
> > > > > +   /sys/bus/pci/devices/0000\:00\:02.0/882cc4da-dede-11e7-9180-078a62063ab1/\
> > > > > +  mdev_type/version
> > > > > +  -bash: echo: write error: Invalid argument
> > > > > +
> > > > > +  in this case, user space's write to target side mdev device's version
> > > > > +  attribute returns error to indicate the two mdev devices are incompatible.
> > > > > +  (incompatible because pci ids of the two mdev devices' parent devices are
> > > > > +  different).
> > > > > +
> > > > > +  case 3:
> > > > > +  source side mdev device is of uuid 5ac1fb20-2bbf-4842-bb7e-36c58c3be9cd,
> > > > > +  its mdev type is i915-GVTg_V5_4. pci id of parent device is 8086-193b.
> > > > > +  But vendor driver does not provide version attribute for this device.
> > > > > +
> > > > > +  (1) read source side mdev device's version.
> > > > > +  #cat \
> > > > > +    /sys/bus/pci/devices/0000\:00\:02.0/5ac1fb20-2bbf-4842-bb7e-36c58c3be9cd/\
> > > > > +    mdev_type/version
> > > > > +  cat: '/sys/bus/pci/devices/0000:00:02.0/5ac1fb20-2bbf-4842-bb7e-36c58c3be9cd/\
> > > > > +  mdev_type/version': No such file or directory
> > > > > +
> > > > > +  in this case, user space reads source side mdev device's version attribute
> > > > > +  which does not exist however. user space regards the two mdev devices as not
> > > > > +  compatible and will not start migration between the two mdev devices.
> > > > > +
> > > > > +    
> > > > 
> > > > This is far too long for description and examples, it's not this
> > > > complicated.  Thanks,
> > > >    
> > > got it. I'll follow your above example :)
> > > 
> > > thanks
> > > Yan   
> > > > >  * [device]
> > > > >  
> > > > >    This directory contains links to the devices of type <type-id> that have been    
> > > >     
> >
Yan Zhao May 9, 2019, 6:55 a.m. UTC | #9
On Wed, May 08, 2019 at 11:27:47PM +0800, Boris Fiuczynski wrote:
> On 5/8/19 11:22 PM, Alex Williamson wrote:
> >>> I thought there was a request to make this more specific to migration
> >>> by renaming it to something like migration_version.  Also, as an
> >>>   
> >> so this attribute may not only include a mdev device's parent device info and
> >> mdev type, but also include numeric software version of vendor specific
> >> migration code, right?
> > It's a vendor defined string, it should be considered opaque to the
> > user, the vendor can include whatever they feel is relevant.
> > 
> Would a vendor also be allowed to provide a string expressing required 
> features as well as containing backend resource requirements which need 
> to be compatible for a successful migration? Somehow a bit like a cpu 
> model... maybe even as json or xml...
> I am asking this with vfio-ap in mind. In that context checking 
> compatibility of two vfio-ap mdev devices is not as simple as checking 
> if version A is smaller or equal to version B.
>
I think so. vendor driver is allowed to put whatever content into the
migration_version string as long as it thinks it's necessary. 
vendor driver only needs ensure in the target mdev device, the write(2)
operation on its migration_version attribute would correctly fail or succeeed
based on the input string.

Thanks
Yan
> -- 
> Mit freundlichen GrĂĽĂźen/Kind regards
>     Boris Fiuczynski
> 
> IBM Deutschland Research & Development GmbH
> Vorsitzender des Aufsichtsrats: Matthias Hartmann
> Geschäftsführung: Dirk Wittkopp
> Sitz der Gesellschaft: Böblingen
> Registergericht: Amtsgericht Stuttgart, HRB 243294
>
Cornelia Huck May 9, 2019, 3:24 p.m. UTC | #10
On Wed, 8 May 2019 07:57:05 -0400
Yan Zhao <yan.y.zhao@intel.com> wrote:

> On Tue, May 07, 2019 at 05:19:54PM +0800, Cornelia Huck wrote:
> > On Sun,  5 May 2019 21:49:04 -0400
> > Yan Zhao <yan.y.zhao@intel.com> wrote:
> >   
> > > version attribute is used to check two mdev devices' compatibility.
> > > 
> > > The key point of this version attribute is that it's rw.
> > > User space has no need to understand internal of device version and no
> > > need to compare versions by itself.
> > > Compared to reading version strings from both two mdev devices being
> > > checked, user space only reads from one mdev device's version attribute.
> > > After getting its version string, user space writes this string into the
> > > other mdev device's version attribute. Vendor driver of mdev device
> > > whose version attribute being written will check device compatibility of
> > > the two mdev devices for user space and return success for compatibility
> > > or errno for incompatibility.  
> > 
> > I'm still missing a bit _what_ is actually supposed to be
> > compatible/incompatible. I'd assume some internal state descriptions
> > (even if this is not actually limited to migration).
> >  
> right.
> originally, I thought this attribute should only contain a device's hardware
> compatibility info. But seems also including vendor specific software migration
> version is more reasonable, because general VFIO migration code cannot know
> version of vendor specific software migration code until migration data is
> transferring to the target vm. Then renaming it to migration_version is more
> appropriate.
> :)

Nod.

(...)

> > > @@ -246,6 +249,143 @@ Directories and files under the sysfs for Each Physical Device
> > >    This attribute should show the number of devices of type <type-id> that can be
> > >    created.
> > >  
> > > +* version
> > > +
> > > +  This attribute is rw, and is optional.
> > > +  It is used to check device compatibility between two mdev devices and is
> > > +  accessed in pairs between the two mdev devices being checked.
> > > +  The intent of this attribute is to make an mdev device's version opaque to
> > > +  user space, so instead of reading two mdev devices' version strings and
> > > +  comparing in userspace, user space should only read one mdev device's version
> > > +  attribute, and writes this version string into the other mdev device's version
> > > +  attribute. Then vendor driver of mdev device whose version attribute being
> > > +  written would check the incoming version string and tell user space whether
> > > +  the two mdev devices are compatible via return value. That's why this
> > > +  attribute is writable.  
> > 
> > I would reword this a bit:
> > 
> > "This attribute provides a way to check device compatibility between
> > two mdev devices from userspace. The intended usage is for userspace to
> > read the version attribute from one mdev device and then writing that
> > value to the version attribute of the other mdev device. The second
> > mdev device indicates compatibility via the return code of the write
> > operation. This makes compatibility between mdev devices completely
> > vendor-defined and opaque to userspace."
> > 
> > We still should explain _what_ compatibility we're talking about here,
> > though.
> >   
> Thanks. It's much better than mine:) 
> Then I'll change compatibility --> migration compatibility.

Ok, with that it should be clear enough.

> 
> > > +
> > > +  when reading this attribute, it should show device version string of
> > > +  the device of type <type-id>.
> > > +
> > > +  This string is private to vendor driver itself. Vendor driver is able to
> > > +  freely define format and length of device version string.
> > > +  e.g. It can use a combination of pciid of parent device + mdev type.
> > > +
> > > +  When writing a string to this attribute, vendor driver should analyze this
> > > +  string and check whether the mdev device being identified by this string is
> > > +  compatible with the mdev device for this attribute. vendor driver should then
> > > +  return written string's length if it regards the two mdev devices are
> > > +  compatible; vendor driver should return negative errno if it regards the two
> > > +  mdev devices are not compatible.
> > > +
> > > +  User space should treat ANY of below conditions as two mdev devices not
> > > +  compatible:
> > > +  (1) any one of the two mdev devices does not have a version attribute
> > > +  (2) error when read from one mdev device's version attribute  
> > 
> > s/read/reading/
> >   
> > > +  (3) error when write one mdev device's version string to the other mdev  
> > 
> > s/write/writing/
> >   
> > > +  device's version attribute
> > > +
> > > +  User space should regard two mdev devices compatible when ALL of below
> > > +  conditions are met:
> > > +  (1) success when read from one mdev device's version attribute.  
> > 
> > s/read/reading/
> >   
> > > +  (2) success when write one mdev device's version string to the other mdev  
> > 
> > s/write/writing/  
> got it. thanks for pointing them out:)
> >   
> > > +  device's version attribute
> > > +
> > > +  Errno:
> > > +  If vendor driver wants to claim a mdev device incompatible to all other mdev  
> > 
> > "If the vendor driver wants to designate a mdev device..."
> >   
> ok. thanks:)
> > > +  devices, it should not register version attribute for this mdev device. But if
> > > +  a vendor driver has already registered version attribute and it wants to claim
> > > +  a mdev device incompatible to all other mdev devices, it needs to return
> > > +  -ENODEV on access to this mdev device's version attribute.
> > > +  If a mdev device is only incompatible to certain mdev devices, write of
> > > +  incompatible mdev devices's version strings to its version attribute should
> > > +  return -EINVAL;  
> > 
> > 
> > Maybe put the defined return code into a bulleted list instead? But
> > this looks reasonable as well.
> >   
> as user space have no idea of those errno and only gets 0/1 as return code from
> read/write. maybe I can move this description of errno to patch 2/2 as an
> example?

Confused. They should get -EINVAL/-ENODEV/... all right, shouldn't they?

> 
> > > +
> > > +  This attribute can be taken advantage of by live migration.
> > > +  If user space detects two mdev devices are compatible through version
> > > +  attribute, it can start migration between the two mdev devices, otherwise it
> > > +  should abort its migration attempts between the two mdev devices.  
> > 
> > (...)
> > _______________________________________________
> > intel-gvt-dev mailing list
> > intel-gvt-dev@lists.freedesktop.org
> > https://lists.freedesktop.org/mailman/listinfo/intel-gvt-dev
Cornelia Huck May 9, 2019, 3:38 p.m. UTC | #11
On Tue, 7 May 2019 15:18:26 -0600
Alex Williamson <alex.williamson@redhat.com> wrote:

> On Sun,  5 May 2019 21:49:04 -0400
> Yan Zhao <yan.y.zhao@intel.com> wrote:

> > +  Errno:
> > +  If vendor driver wants to claim a mdev device incompatible to all other mdev
> > +  devices, it should not register version attribute for this mdev device. But if
> > +  a vendor driver has already registered version attribute and it wants to claim
> > +  a mdev device incompatible to all other mdev devices, it needs to return
> > +  -ENODEV on access to this mdev device's version attribute.
> > +  If a mdev device is only incompatible to certain mdev devices, write of
> > +  incompatible mdev devices's version strings to its version attribute should
> > +  return -EINVAL;  
> 
> I think it's best not to define the specific errno returned for a
> specific situation, let the vendor driver decide, userspace simply
> needs to know that an errno on read indicates the device does not
> support migration version comparison and that an errno on write
> indicates the devices are incompatible or the target doesn't support
> migration versions.

I think I have to disagree here: It's probably valuable to have an
agreed error for 'cannot migrate at all' vs 'cannot migrate between
those two particular devices'. Userspace might want to do different
things (e.g. trying with different device pairs).
Dr. David Alan Gilbert May 9, 2019, 3:48 p.m. UTC | #12
* Cornelia Huck (cohuck@redhat.com) wrote:
> On Tue, 7 May 2019 15:18:26 -0600
> Alex Williamson <alex.williamson@redhat.com> wrote:
> 
> > On Sun,  5 May 2019 21:49:04 -0400
> > Yan Zhao <yan.y.zhao@intel.com> wrote:
> 
> > > +  Errno:
> > > +  If vendor driver wants to claim a mdev device incompatible to all other mdev
> > > +  devices, it should not register version attribute for this mdev device. But if
> > > +  a vendor driver has already registered version attribute and it wants to claim
> > > +  a mdev device incompatible to all other mdev devices, it needs to return
> > > +  -ENODEV on access to this mdev device's version attribute.
> > > +  If a mdev device is only incompatible to certain mdev devices, write of
> > > +  incompatible mdev devices's version strings to its version attribute should
> > > +  return -EINVAL;  
> > 
> > I think it's best not to define the specific errno returned for a
> > specific situation, let the vendor driver decide, userspace simply
> > needs to know that an errno on read indicates the device does not
> > support migration version comparison and that an errno on write
> > indicates the devices are incompatible or the target doesn't support
> > migration versions.
> 
> I think I have to disagree here: It's probably valuable to have an
> agreed error for 'cannot migrate at all' vs 'cannot migrate between
> those two particular devices'. Userspace might want to do different
> things (e.g. trying with different device pairs).

Trying to stuff these things down an errno seems a bad idea; we can't
get much information that way.

Dave

--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
Cornelia Huck May 9, 2019, 3:54 p.m. UTC | #13
On Thu, 9 May 2019 16:48:57 +0100
"Dr. David Alan Gilbert" <dgilbert@redhat.com> wrote:

> * Cornelia Huck (cohuck@redhat.com) wrote:
> > On Tue, 7 May 2019 15:18:26 -0600
> > Alex Williamson <alex.williamson@redhat.com> wrote:
> >   
> > > On Sun,  5 May 2019 21:49:04 -0400
> > > Yan Zhao <yan.y.zhao@intel.com> wrote:  
> >   
> > > > +  Errno:
> > > > +  If vendor driver wants to claim a mdev device incompatible to all other mdev
> > > > +  devices, it should not register version attribute for this mdev device. But if
> > > > +  a vendor driver has already registered version attribute and it wants to claim
> > > > +  a mdev device incompatible to all other mdev devices, it needs to return
> > > > +  -ENODEV on access to this mdev device's version attribute.
> > > > +  If a mdev device is only incompatible to certain mdev devices, write of
> > > > +  incompatible mdev devices's version strings to its version attribute should
> > > > +  return -EINVAL;    
> > > 
> > > I think it's best not to define the specific errno returned for a
> > > specific situation, let the vendor driver decide, userspace simply
> > > needs to know that an errno on read indicates the device does not
> > > support migration version comparison and that an errno on write
> > > indicates the devices are incompatible or the target doesn't support
> > > migration versions.  
> > 
> > I think I have to disagree here: It's probably valuable to have an
> > agreed error for 'cannot migrate at all' vs 'cannot migrate between
> > those two particular devices'. Userspace might want to do different
> > things (e.g. trying with different device pairs).  
> 
> Trying to stuff these things down an errno seems a bad idea; we can't
> get much information that way.

So, what would be a reasonable approach? Userspace should first read
the version attributes on both devices (to find out whether migration
is supported at all), and only then figure out via writing whether they
are compatible?

(Or just go ahead and try, if it does not care about the reason.)
Dr. David Alan Gilbert May 9, 2019, 4:48 p.m. UTC | #14
* Cornelia Huck (cohuck@redhat.com) wrote:
> On Thu, 9 May 2019 16:48:57 +0100
> "Dr. David Alan Gilbert" <dgilbert@redhat.com> wrote:
> 
> > * Cornelia Huck (cohuck@redhat.com) wrote:
> > > On Tue, 7 May 2019 15:18:26 -0600
> > > Alex Williamson <alex.williamson@redhat.com> wrote:
> > >   
> > > > On Sun,  5 May 2019 21:49:04 -0400
> > > > Yan Zhao <yan.y.zhao@intel.com> wrote:  
> > >   
> > > > > +  Errno:
> > > > > +  If vendor driver wants to claim a mdev device incompatible to all other mdev
> > > > > +  devices, it should not register version attribute for this mdev device. But if
> > > > > +  a vendor driver has already registered version attribute and it wants to claim
> > > > > +  a mdev device incompatible to all other mdev devices, it needs to return
> > > > > +  -ENODEV on access to this mdev device's version attribute.
> > > > > +  If a mdev device is only incompatible to certain mdev devices, write of
> > > > > +  incompatible mdev devices's version strings to its version attribute should
> > > > > +  return -EINVAL;    
> > > > 
> > > > I think it's best not to define the specific errno returned for a
> > > > specific situation, let the vendor driver decide, userspace simply
> > > > needs to know that an errno on read indicates the device does not
> > > > support migration version comparison and that an errno on write
> > > > indicates the devices are incompatible or the target doesn't support
> > > > migration versions.  
> > > 
> > > I think I have to disagree here: It's probably valuable to have an
> > > agreed error for 'cannot migrate at all' vs 'cannot migrate between
> > > those two particular devices'. Userspace might want to do different
> > > things (e.g. trying with different device pairs).  
> > 
> > Trying to stuff these things down an errno seems a bad idea; we can't
> > get much information that way.
> 
> So, what would be a reasonable approach? Userspace should first read
> the version attributes on both devices (to find out whether migration
> is supported at all), and only then figure out via writing whether they
> are compatible?
> 
> (Or just go ahead and try, if it does not care about the reason.)

Well, I'm OK with something like writing to test whether it's
compatible, it's just we need a better way of saying 'no'.
I'm not sure if that involves reading back from somewhere after
the write or what.

Dave

--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
Yan Zhao May 10, 2019, 2:43 a.m. UTC | #15
On Thu, May 09, 2019 at 11:24:49PM +0800, Cornelia Huck wrote:
> On Wed, 8 May 2019 07:57:05 -0400
> Yan Zhao <yan.y.zhao@intel.com> wrote:
> 
> > On Tue, May 07, 2019 at 05:19:54PM +0800, Cornelia Huck wrote:
> > > On Sun,  5 May 2019 21:49:04 -0400
> > > Yan Zhao <yan.y.zhao@intel.com> wrote:
> > >   
> > > > version attribute is used to check two mdev devices' compatibility.
> > > > 
> > > > The key point of this version attribute is that it's rw.
> > > > User space has no need to understand internal of device version and no
> > > > need to compare versions by itself.
> > > > Compared to reading version strings from both two mdev devices being
> > > > checked, user space only reads from one mdev device's version attribute.
> > > > After getting its version string, user space writes this string into the
> > > > other mdev device's version attribute. Vendor driver of mdev device
> > > > whose version attribute being written will check device compatibility of
> > > > the two mdev devices for user space and return success for compatibility
> > > > or errno for incompatibility.  
> > > 
> > > I'm still missing a bit _what_ is actually supposed to be
> > > compatible/incompatible. I'd assume some internal state descriptions
> > > (even if this is not actually limited to migration).
> > >  
> > right.
> > originally, I thought this attribute should only contain a device's hardware
> > compatibility info. But seems also including vendor specific software migration
> > version is more reasonable, because general VFIO migration code cannot know
> > version of vendor specific software migration code until migration data is
> > transferring to the target vm. Then renaming it to migration_version is more
> > appropriate.
> > :)
> 
> Nod.
> 
> (...)
> 
> > > > @@ -246,6 +249,143 @@ Directories and files under the sysfs for Each Physical Device
> > > >    This attribute should show the number of devices of type <type-id> that can be
> > > >    created.
> > > >  
> > > > +* version
> > > > +
> > > > +  This attribute is rw, and is optional.
> > > > +  It is used to check device compatibility between two mdev devices and is
> > > > +  accessed in pairs between the two mdev devices being checked.
> > > > +  The intent of this attribute is to make an mdev device's version opaque to
> > > > +  user space, so instead of reading two mdev devices' version strings and
> > > > +  comparing in userspace, user space should only read one mdev device's version
> > > > +  attribute, and writes this version string into the other mdev device's version
> > > > +  attribute. Then vendor driver of mdev device whose version attribute being
> > > > +  written would check the incoming version string and tell user space whether
> > > > +  the two mdev devices are compatible via return value. That's why this
> > > > +  attribute is writable.  
> > > 
> > > I would reword this a bit:
> > > 
> > > "This attribute provides a way to check device compatibility between
> > > two mdev devices from userspace. The intended usage is for userspace to
> > > read the version attribute from one mdev device and then writing that
> > > value to the version attribute of the other mdev device. The second
> > > mdev device indicates compatibility via the return code of the write
> > > operation. This makes compatibility between mdev devices completely
> > > vendor-defined and opaque to userspace."
> > > 
> > > We still should explain _what_ compatibility we're talking about here,
> > > though.
> > >   
> > Thanks. It's much better than mine:) 
> > Then I'll change compatibility --> migration compatibility.
> 
> Ok, with that it should be clear enough.
> 
> > 
> > > > +
> > > > +  when reading this attribute, it should show device version string of
> > > > +  the device of type <type-id>.
> > > > +
> > > > +  This string is private to vendor driver itself. Vendor driver is able to
> > > > +  freely define format and length of device version string.
> > > > +  e.g. It can use a combination of pciid of parent device + mdev type.
> > > > +
> > > > +  When writing a string to this attribute, vendor driver should analyze this
> > > > +  string and check whether the mdev device being identified by this string is
> > > > +  compatible with the mdev device for this attribute. vendor driver should then
> > > > +  return written string's length if it regards the two mdev devices are
> > > > +  compatible; vendor driver should return negative errno if it regards the two
> > > > +  mdev devices are not compatible.
> > > > +
> > > > +  User space should treat ANY of below conditions as two mdev devices not
> > > > +  compatible:
> > > > +  (1) any one of the two mdev devices does not have a version attribute
> > > > +  (2) error when read from one mdev device's version attribute  
> > > 
> > > s/read/reading/
> > >   
> > > > +  (3) error when write one mdev device's version string to the other mdev  
> > > 
> > > s/write/writing/
> > >   
> > > > +  device's version attribute
> > > > +
> > > > +  User space should regard two mdev devices compatible when ALL of below
> > > > +  conditions are met:
> > > > +  (1) success when read from one mdev device's version attribute.  
> > > 
> > > s/read/reading/
> > >   
> > > > +  (2) success when write one mdev device's version string to the other mdev  
> > > 
> > > s/write/writing/  
> > got it. thanks for pointing them out:)
> > >   
> > > > +  device's version attribute
> > > > +
> > > > +  Errno:
> > > > +  If vendor driver wants to claim a mdev device incompatible to all other mdev  
> > > 
> > > "If the vendor driver wants to designate a mdev device..."
> > >   
> > ok. thanks:)
> > > > +  devices, it should not register version attribute for this mdev device. But if
> > > > +  a vendor driver has already registered version attribute and it wants to claim
> > > > +  a mdev device incompatible to all other mdev devices, it needs to return
> > > > +  -ENODEV on access to this mdev device's version attribute.
> > > > +  If a mdev device is only incompatible to certain mdev devices, write of
> > > > +  incompatible mdev devices's version strings to its version attribute should
> > > > +  return -EINVAL;  
> > > 
> > > 
> > > Maybe put the defined return code into a bulleted list instead? But
> > > this looks reasonable as well.
> > >   
> > as user space have no idea of those errno and only gets 0/1 as return code from
> > read/write. maybe I can move this description of errno to patch 2/2 as an
> > example?
> 
> Confused. They should get -EINVAL/-ENODEV/... all right, shouldn't they?
>
sorry. my previous statement is not right.
read(2)/write(2) return -1 on error, error cause is returned through errno.
So, it's also fine if we can get an agreement in this doc that
-ENODEV meaning a mdev device is not compatible to all devices,
-EINVAL meaning a mdev device is not compatible to specified device.


> > 
> > > > +
> > > > +  This attribute can be taken advantage of by live migration.
> > > > +  If user space detects two mdev devices are compatible through version
> > > > +  attribute, it can start migration between the two mdev devices, otherwise it
> > > > +  should abort its migration attempts between the two mdev devices.  
> > > 
> > > (...)
> > > _______________________________________________
> > > intel-gvt-dev mailing list
> > > intel-gvt-dev@lists.freedesktop.org
> > > https://lists.freedesktop.org/mailman/listinfo/intel-gvt-dev  
> 
> _______________________________________________
> intel-gvt-dev mailing list
> intel-gvt-dev@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/intel-gvt-dev
Cornelia Huck May 10, 2019, 9:08 a.m. UTC | #16
On Thu, 9 May 2019 17:48:26 +0100
"Dr. David Alan Gilbert" <dgilbert@redhat.com> wrote:

> * Cornelia Huck (cohuck@redhat.com) wrote:
> > On Thu, 9 May 2019 16:48:57 +0100
> > "Dr. David Alan Gilbert" <dgilbert@redhat.com> wrote:
> >   
> > > * Cornelia Huck (cohuck@redhat.com) wrote:  
> > > > On Tue, 7 May 2019 15:18:26 -0600
> > > > Alex Williamson <alex.williamson@redhat.com> wrote:
> > > >     
> > > > > On Sun,  5 May 2019 21:49:04 -0400
> > > > > Yan Zhao <yan.y.zhao@intel.com> wrote:    
> > > >     
> > > > > > +  Errno:
> > > > > > +  If vendor driver wants to claim a mdev device incompatible to all other mdev
> > > > > > +  devices, it should not register version attribute for this mdev device. But if
> > > > > > +  a vendor driver has already registered version attribute and it wants to claim
> > > > > > +  a mdev device incompatible to all other mdev devices, it needs to return
> > > > > > +  -ENODEV on access to this mdev device's version attribute.
> > > > > > +  If a mdev device is only incompatible to certain mdev devices, write of
> > > > > > +  incompatible mdev devices's version strings to its version attribute should
> > > > > > +  return -EINVAL;      
> > > > > 
> > > > > I think it's best not to define the specific errno returned for a
> > > > > specific situation, let the vendor driver decide, userspace simply
> > > > > needs to know that an errno on read indicates the device does not
> > > > > support migration version comparison and that an errno on write
> > > > > indicates the devices are incompatible or the target doesn't support
> > > > > migration versions.    
> > > > 
> > > > I think I have to disagree here: It's probably valuable to have an
> > > > agreed error for 'cannot migrate at all' vs 'cannot migrate between
> > > > those two particular devices'. Userspace might want to do different
> > > > things (e.g. trying with different device pairs).    
> > > 
> > > Trying to stuff these things down an errno seems a bad idea; we can't
> > > get much information that way.  
> > 
> > So, what would be a reasonable approach? Userspace should first read
> > the version attributes on both devices (to find out whether migration
> > is supported at all), and only then figure out via writing whether they
> > are compatible?
> > 
> > (Or just go ahead and try, if it does not care about the reason.)  
> 
> Well, I'm OK with something like writing to test whether it's
> compatible, it's just we need a better way of saying 'no'.
> I'm not sure if that involves reading back from somewhere after
> the write or what.

Hm, so I basically see two ways of doing that:
- standardize on some error codes... problem: error codes can be hard
  to fit to reasons
- make the error available in some attribute that can be read

I'm not sure how we can serialize the readback with the last write,
though (this looks inherently racy).

How important is detailed error reporting here?
Dr. David Alan Gilbert May 10, 2019, 9:36 a.m. UTC | #17
* Cornelia Huck (cohuck@redhat.com) wrote:
> On Thu, 9 May 2019 17:48:26 +0100
> "Dr. David Alan Gilbert" <dgilbert@redhat.com> wrote:
> 
> > * Cornelia Huck (cohuck@redhat.com) wrote:
> > > On Thu, 9 May 2019 16:48:57 +0100
> > > "Dr. David Alan Gilbert" <dgilbert@redhat.com> wrote:
> > >   
> > > > * Cornelia Huck (cohuck@redhat.com) wrote:  
> > > > > On Tue, 7 May 2019 15:18:26 -0600
> > > > > Alex Williamson <alex.williamson@redhat.com> wrote:
> > > > >     
> > > > > > On Sun,  5 May 2019 21:49:04 -0400
> > > > > > Yan Zhao <yan.y.zhao@intel.com> wrote:    
> > > > >     
> > > > > > > +  Errno:
> > > > > > > +  If vendor driver wants to claim a mdev device incompatible to all other mdev
> > > > > > > +  devices, it should not register version attribute for this mdev device. But if
> > > > > > > +  a vendor driver has already registered version attribute and it wants to claim
> > > > > > > +  a mdev device incompatible to all other mdev devices, it needs to return
> > > > > > > +  -ENODEV on access to this mdev device's version attribute.
> > > > > > > +  If a mdev device is only incompatible to certain mdev devices, write of
> > > > > > > +  incompatible mdev devices's version strings to its version attribute should
> > > > > > > +  return -EINVAL;      
> > > > > > 
> > > > > > I think it's best not to define the specific errno returned for a
> > > > > > specific situation, let the vendor driver decide, userspace simply
> > > > > > needs to know that an errno on read indicates the device does not
> > > > > > support migration version comparison and that an errno on write
> > > > > > indicates the devices are incompatible or the target doesn't support
> > > > > > migration versions.    
> > > > > 
> > > > > I think I have to disagree here: It's probably valuable to have an
> > > > > agreed error for 'cannot migrate at all' vs 'cannot migrate between
> > > > > those two particular devices'. Userspace might want to do different
> > > > > things (e.g. trying with different device pairs).    
> > > > 
> > > > Trying to stuff these things down an errno seems a bad idea; we can't
> > > > get much information that way.  
> > > 
> > > So, what would be a reasonable approach? Userspace should first read
> > > the version attributes on both devices (to find out whether migration
> > > is supported at all), and only then figure out via writing whether they
> > > are compatible?
> > > 
> > > (Or just go ahead and try, if it does not care about the reason.)  
> > 
> > Well, I'm OK with something like writing to test whether it's
> > compatible, it's just we need a better way of saying 'no'.
> > I'm not sure if that involves reading back from somewhere after
> > the write or what.
> 
> Hm, so I basically see two ways of doing that:
> - standardize on some error codes... problem: error codes can be hard
>   to fit to reasons
> - make the error available in some attribute that can be read
> 
> I'm not sure how we can serialize the readback with the last write,
> though (this looks inherently racy).
> 
> How important is detailed error reporting here?

I think we need something, otherwise we're just going to get vague
user reports of 'but my VM doesn't migrate'; I'd like the error to be
good enough to point most users to something they can understand
(e.g. wrong card family/too old a driver etc).

Dave

--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
Cornelia Huck May 10, 2019, 9:48 a.m. UTC | #18
On Fri, 10 May 2019 10:36:09 +0100
"Dr. David Alan Gilbert" <dgilbert@redhat.com> wrote:

> * Cornelia Huck (cohuck@redhat.com) wrote:
> > On Thu, 9 May 2019 17:48:26 +0100
> > "Dr. David Alan Gilbert" <dgilbert@redhat.com> wrote:
> >   
> > > * Cornelia Huck (cohuck@redhat.com) wrote:  
> > > > On Thu, 9 May 2019 16:48:57 +0100
> > > > "Dr. David Alan Gilbert" <dgilbert@redhat.com> wrote:
> > > >     
> > > > > * Cornelia Huck (cohuck@redhat.com) wrote:    
> > > > > > On Tue, 7 May 2019 15:18:26 -0600
> > > > > > Alex Williamson <alex.williamson@redhat.com> wrote:
> > > > > >       
> > > > > > > On Sun,  5 May 2019 21:49:04 -0400
> > > > > > > Yan Zhao <yan.y.zhao@intel.com> wrote:      
> > > > > >       
> > > > > > > > +  Errno:
> > > > > > > > +  If vendor driver wants to claim a mdev device incompatible to all other mdev
> > > > > > > > +  devices, it should not register version attribute for this mdev device. But if
> > > > > > > > +  a vendor driver has already registered version attribute and it wants to claim
> > > > > > > > +  a mdev device incompatible to all other mdev devices, it needs to return
> > > > > > > > +  -ENODEV on access to this mdev device's version attribute.
> > > > > > > > +  If a mdev device is only incompatible to certain mdev devices, write of
> > > > > > > > +  incompatible mdev devices's version strings to its version attribute should
> > > > > > > > +  return -EINVAL;        
> > > > > > > 
> > > > > > > I think it's best not to define the specific errno returned for a
> > > > > > > specific situation, let the vendor driver decide, userspace simply
> > > > > > > needs to know that an errno on read indicates the device does not
> > > > > > > support migration version comparison and that an errno on write
> > > > > > > indicates the devices are incompatible or the target doesn't support
> > > > > > > migration versions.      
> > > > > > 
> > > > > > I think I have to disagree here: It's probably valuable to have an
> > > > > > agreed error for 'cannot migrate at all' vs 'cannot migrate between
> > > > > > those two particular devices'. Userspace might want to do different
> > > > > > things (e.g. trying with different device pairs).      
> > > > > 
> > > > > Trying to stuff these things down an errno seems a bad idea; we can't
> > > > > get much information that way.    
> > > > 
> > > > So, what would be a reasonable approach? Userspace should first read
> > > > the version attributes on both devices (to find out whether migration
> > > > is supported at all), and only then figure out via writing whether they
> > > > are compatible?
> > > > 
> > > > (Or just go ahead and try, if it does not care about the reason.)    
> > > 
> > > Well, I'm OK with something like writing to test whether it's
> > > compatible, it's just we need a better way of saying 'no'.
> > > I'm not sure if that involves reading back from somewhere after
> > > the write or what.  
> > 
> > Hm, so I basically see two ways of doing that:
> > - standardize on some error codes... problem: error codes can be hard
> >   to fit to reasons
> > - make the error available in some attribute that can be read
> > 
> > I'm not sure how we can serialize the readback with the last write,
> > though (this looks inherently racy).
> > 
> > How important is detailed error reporting here?  
> 
> I think we need something, otherwise we're just going to get vague
> user reports of 'but my VM doesn't migrate'; I'd like the error to be
> good enough to point most users to something they can understand
> (e.g. wrong card family/too old a driver etc).

Ok, that sounds like a reasonable point. Not that I have a better idea
how to achieve that, though... we could also log a more verbose error
message to the kernel log, but that's not necessarily where a user will
look first.

Ideally, we'd want to have the user space program setting up things
querying the general compatibility for migration (so that it becomes
their problem on how to alert the user to problems :), but I'm not sure
how to eliminate the race between asking the vendor driver for
compatibility and getting the result of that operation.

Unless we introduce an interface that can retrieve _all_ results
together with the written value? Or is that not going to be much of a
problem in practice?
Yan Zhao May 13, 2019, 1:16 a.m. UTC | #19
On Fri, May 10, 2019 at 05:48:38PM +0800, Cornelia Huck wrote:
> On Fri, 10 May 2019 10:36:09 +0100
> "Dr. David Alan Gilbert" <dgilbert@redhat.com> wrote:
> 
> > * Cornelia Huck (cohuck@redhat.com) wrote:
> > > On Thu, 9 May 2019 17:48:26 +0100
> > > "Dr. David Alan Gilbert" <dgilbert@redhat.com> wrote:
> > >   
> > > > * Cornelia Huck (cohuck@redhat.com) wrote:  
> > > > > On Thu, 9 May 2019 16:48:57 +0100
> > > > > "Dr. David Alan Gilbert" <dgilbert@redhat.com> wrote:
> > > > >     
> > > > > > * Cornelia Huck (cohuck@redhat.com) wrote:    
> > > > > > > On Tue, 7 May 2019 15:18:26 -0600
> > > > > > > Alex Williamson <alex.williamson@redhat.com> wrote:
> > > > > > >       
> > > > > > > > On Sun,  5 May 2019 21:49:04 -0400
> > > > > > > > Yan Zhao <yan.y.zhao@intel.com> wrote:      
> > > > > > >       
> > > > > > > > > +  Errno:
> > > > > > > > > +  If vendor driver wants to claim a mdev device incompatible to all other mdev
> > > > > > > > > +  devices, it should not register version attribute for this mdev device. But if
> > > > > > > > > +  a vendor driver has already registered version attribute and it wants to claim
> > > > > > > > > +  a mdev device incompatible to all other mdev devices, it needs to return
> > > > > > > > > +  -ENODEV on access to this mdev device's version attribute.
> > > > > > > > > +  If a mdev device is only incompatible to certain mdev devices, write of
> > > > > > > > > +  incompatible mdev devices's version strings to its version attribute should
> > > > > > > > > +  return -EINVAL;        
> > > > > > > > 
> > > > > > > > I think it's best not to define the specific errno returned for a
> > > > > > > > specific situation, let the vendor driver decide, userspace simply
> > > > > > > > needs to know that an errno on read indicates the device does not
> > > > > > > > support migration version comparison and that an errno on write
> > > > > > > > indicates the devices are incompatible or the target doesn't support
> > > > > > > > migration versions.      
> > > > > > > 
> > > > > > > I think I have to disagree here: It's probably valuable to have an
> > > > > > > agreed error for 'cannot migrate at all' vs 'cannot migrate between
> > > > > > > those two particular devices'. Userspace might want to do different
> > > > > > > things (e.g. trying with different device pairs).      
> > > > > > 
> > > > > > Trying to stuff these things down an errno seems a bad idea; we can't
> > > > > > get much information that way.    
> > > > > 
> > > > > So, what would be a reasonable approach? Userspace should first read
> > > > > the version attributes on both devices (to find out whether migration
> > > > > is supported at all), and only then figure out via writing whether they
> > > > > are compatible?
> > > > > 
> > > > > (Or just go ahead and try, if it does not care about the reason.)    
> > > > 
> > > > Well, I'm OK with something like writing to test whether it's
> > > > compatible, it's just we need a better way of saying 'no'.
> > > > I'm not sure if that involves reading back from somewhere after
> > > > the write or what.  
> > > 
> > > Hm, so I basically see two ways of doing that:
> > > - standardize on some error codes... problem: error codes can be hard
> > >   to fit to reasons
> > > - make the error available in some attribute that can be read
> > > 
> > > I'm not sure how we can serialize the readback with the last write,
> > > though (this looks inherently racy).
> > > 
> > > How important is detailed error reporting here?  
> > 
> > I think we need something, otherwise we're just going to get vague
> > user reports of 'but my VM doesn't migrate'; I'd like the error to be
> > good enough to point most users to something they can understand
> > (e.g. wrong card family/too old a driver etc).
> 
> Ok, that sounds like a reasonable point. Not that I have a better idea
> how to achieve that, though... we could also log a more verbose error
> message to the kernel log, but that's not necessarily where a user will
> look first.
> 
> Ideally, we'd want to have the user space program setting up things
> querying the general compatibility for migration (so that it becomes
> their problem on how to alert the user to problems :), but I'm not sure
> how to eliminate the race between asking the vendor driver for
> compatibility and getting the result of that operation.
> 
> Unless we introduce an interface that can retrieve _all_ results
> together with the written value? Or is that not going to be much of a
> problem in practice?
what about defining a migration_errors attribute, storing recent 10 error
records with format like:
    input string: error
as identical input strings always have the same error string, the 10 error
records may meet 10+ reason querying operations. And in practice, I think there
wouldn't be 10 simultaneous migration requests?

or could we just define some common errno? like 
#define ENOMIGRATION         140  /* device not supporting migration */
#define EUNATCH              49  /* software version not match */
#define EHWNM                142  /* hardware not matching*/
Erik Skultety May 13, 2019, 1:28 p.m. UTC | #20
On Fri, May 10, 2019 at 11:48:38AM +0200, Cornelia Huck wrote:
> On Fri, 10 May 2019 10:36:09 +0100
> "Dr. David Alan Gilbert" <dgilbert@redhat.com> wrote:
>
> > * Cornelia Huck (cohuck@redhat.com) wrote:
> > > On Thu, 9 May 2019 17:48:26 +0100
> > > "Dr. David Alan Gilbert" <dgilbert@redhat.com> wrote:
> > >
> > > > * Cornelia Huck (cohuck@redhat.com) wrote:
> > > > > On Thu, 9 May 2019 16:48:57 +0100
> > > > > "Dr. David Alan Gilbert" <dgilbert@redhat.com> wrote:
> > > > >
> > > > > > * Cornelia Huck (cohuck@redhat.com) wrote:
> > > > > > > On Tue, 7 May 2019 15:18:26 -0600
> > > > > > > Alex Williamson <alex.williamson@redhat.com> wrote:
> > > > > > >
> > > > > > > > On Sun,  5 May 2019 21:49:04 -0400
> > > > > > > > Yan Zhao <yan.y.zhao@intel.com> wrote:
> > > > > > >
> > > > > > > > > +  Errno:
> > > > > > > > > +  If vendor driver wants to claim a mdev device incompatible to all other mdev
> > > > > > > > > +  devices, it should not register version attribute for this mdev device. But if
> > > > > > > > > +  a vendor driver has already registered version attribute and it wants to claim
> > > > > > > > > +  a mdev device incompatible to all other mdev devices, it needs to return
> > > > > > > > > +  -ENODEV on access to this mdev device's version attribute.
> > > > > > > > > +  If a mdev device is only incompatible to certain mdev devices, write of
> > > > > > > > > +  incompatible mdev devices's version strings to its version attribute should
> > > > > > > > > +  return -EINVAL;
> > > > > > > >
> > > > > > > > I think it's best not to define the specific errno returned for a
> > > > > > > > specific situation, let the vendor driver decide, userspace simply
> > > > > > > > needs to know that an errno on read indicates the device does not
> > > > > > > > support migration version comparison and that an errno on write
> > > > > > > > indicates the devices are incompatible or the target doesn't support
> > > > > > > > migration versions.
> > > > > > >
> > > > > > > I think I have to disagree here: It's probably valuable to have an
> > > > > > > agreed error for 'cannot migrate at all' vs 'cannot migrate between
> > > > > > > those two particular devices'. Userspace might want to do different
> > > > > > > things (e.g. trying with different device pairs).
> > > > > >
> > > > > > Trying to stuff these things down an errno seems a bad idea; we can't
> > > > > > get much information that way.
> > > > >
> > > > > So, what would be a reasonable approach? Userspace should first read
> > > > > the version attributes on both devices (to find out whether migration
> > > > > is supported at all), and only then figure out via writing whether they
> > > > > are compatible?
> > > > >
> > > > > (Or just go ahead and try, if it does not care about the reason.)
> > > >
> > > > Well, I'm OK with something like writing to test whether it's
> > > > compatible, it's just we need a better way of saying 'no'.
> > > > I'm not sure if that involves reading back from somewhere after
> > > > the write or what.
> > >
> > > Hm, so I basically see two ways of doing that:
> > > - standardize on some error codes... problem: error codes can be hard
> > >   to fit to reasons
> > > - make the error available in some attribute that can be read
> > >
> > > I'm not sure how we can serialize the readback with the last write,
> > > though (this looks inherently racy).
> > >
> > > How important is detailed error reporting here?
> >
> > I think we need something, otherwise we're just going to get vague
> > user reports of 'but my VM doesn't migrate'; I'd like the error to be
> > good enough to point most users to something they can understand
> > (e.g. wrong card family/too old a driver etc).
>
> Ok, that sounds like a reasonable point. Not that I have a better idea
> how to achieve that, though... we could also log a more verbose error
> message to the kernel log, but that's not necessarily where a user will
> look first.

In case of libvirt checking the compatibility, it won't matter how good the
error message in the kernel log is and regardless of how many error states you
want to handle, libvirt's only limited to errno here, since we're going to do
plain read/write, so our internal error message returned to the user is only
going to contain what the errno says - okay, of course we can (and we DO)
provide libvirt specific string, further specifying the error but like I
mentioned, depending on how many error cases we want to distinguish this may be
hard for anyone to figure out solely on the error code, as apps will most
probably not parse the
logs.

Regards,
Erik
>
> Ideally, we'd want to have the user space program setting up things
> querying the general compatibility for migration (so that it becomes
> their problem on how to alert the user to problems :), but I'm not sure
> how to eliminate the race between asking the vendor driver for
> compatibility and getting the result of that operation.
>
> Unless we introduce an interface that can retrieve _all_ results
> together with the written value? Or is that not going to be much of a
> problem in practice?
Cornelia Huck May 14, 2019, 7:03 a.m. UTC | #21
On Tue, 14 May 2019 02:12:35 -0400
Yan Zhao <yan.y.zhao@intel.com> wrote:

> On Mon, May 13, 2019 at 09:28:04PM +0800, Erik Skultety wrote:

> > In case of libvirt checking the compatibility, it won't matter how good the
> > error message in the kernel log is and regardless of how many error states you
> > want to handle, libvirt's only limited to errno here, since we're going to do
> > plain read/write, so our internal error message returned to the user is only
> > going to contain what the errno says - okay, of course we can (and we DO)
> > provide libvirt specific string, further specifying the error but like I
> > mentioned, depending on how many error cases we want to distinguish this may be
> > hard for anyone to figure out solely on the error code, as apps will most
> > probably not parse the
> > logs.
> > 
> > Regards,
> > Erik  
> hi Erik
> do you mean you are agreeing on defining common errors and only returning errno?
> 
> e.g.
> #define ENOMIGRATION         140  /* device not supporting migration */
> #define EUNATCH              49  /* software version not match */
> #define EHWNM                142  /* hardware not matching*/

Defining custom error codes is probably not such a good idea... can we
match to common error codes instead? Do we have a good idea about
common error categories, anyway?

(Btw: does libvirt do a generic error-to-description translation, or
does it match to the context? I.e., can libvirt translate well-defined
error codes to a useful message for a specific case?)
Erik Skultety May 14, 2019, 7:20 a.m. UTC | #22
On Tue, May 14, 2019 at 02:12:35AM -0400, Yan Zhao wrote:
> On Mon, May 13, 2019 at 09:28:04PM +0800, Erik Skultety wrote:
> > On Fri, May 10, 2019 at 11:48:38AM +0200, Cornelia Huck wrote:
> > > On Fri, 10 May 2019 10:36:09 +0100
> > > "Dr. David Alan Gilbert" <dgilbert@redhat.com> wrote:
> > >
> > > > * Cornelia Huck (cohuck@redhat.com) wrote:
> > > > > On Thu, 9 May 2019 17:48:26 +0100
> > > > > "Dr. David Alan Gilbert" <dgilbert@redhat.com> wrote:
> > > > >
> > > > > > * Cornelia Huck (cohuck@redhat.com) wrote:
> > > > > > > On Thu, 9 May 2019 16:48:57 +0100
> > > > > > > "Dr. David Alan Gilbert" <dgilbert@redhat.com> wrote:
> > > > > > >
> > > > > > > > * Cornelia Huck (cohuck@redhat.com) wrote:
> > > > > > > > > On Tue, 7 May 2019 15:18:26 -0600
> > > > > > > > > Alex Williamson <alex.williamson@redhat.com> wrote:
> > > > > > > > >
> > > > > > > > > > On Sun,  5 May 2019 21:49:04 -0400
> > > > > > > > > > Yan Zhao <yan.y.zhao@intel.com> wrote:
> > > > > > > > >
> > > > > > > > > > > +  Errno:
> > > > > > > > > > > +  If vendor driver wants to claim a mdev device incompatible to all other mdev
> > > > > > > > > > > +  devices, it should not register version attribute for this mdev device. But if
> > > > > > > > > > > +  a vendor driver has already registered version attribute and it wants to claim
> > > > > > > > > > > +  a mdev device incompatible to all other mdev devices, it needs to return
> > > > > > > > > > > +  -ENODEV on access to this mdev device's version attribute.
> > > > > > > > > > > +  If a mdev device is only incompatible to certain mdev devices, write of
> > > > > > > > > > > +  incompatible mdev devices's version strings to its version attribute should
> > > > > > > > > > > +  return -EINVAL;
> > > > > > > > > >
> > > > > > > > > > I think it's best not to define the specific errno returned for a
> > > > > > > > > > specific situation, let the vendor driver decide, userspace simply
> > > > > > > > > > needs to know that an errno on read indicates the device does not
> > > > > > > > > > support migration version comparison and that an errno on write
> > > > > > > > > > indicates the devices are incompatible or the target doesn't support
> > > > > > > > > > migration versions.
> > > > > > > > >
> > > > > > > > > I think I have to disagree here: It's probably valuable to have an
> > > > > > > > > agreed error for 'cannot migrate at all' vs 'cannot migrate between
> > > > > > > > > those two particular devices'. Userspace might want to do different
> > > > > > > > > things (e.g. trying with different device pairs).
> > > > > > > >
> > > > > > > > Trying to stuff these things down an errno seems a bad idea; we can't
> > > > > > > > get much information that way.
> > > > > > >
> > > > > > > So, what would be a reasonable approach? Userspace should first read
> > > > > > > the version attributes on both devices (to find out whether migration
> > > > > > > is supported at all), and only then figure out via writing whether they
> > > > > > > are compatible?
> > > > > > >
> > > > > > > (Or just go ahead and try, if it does not care about the reason.)
> > > > > >
> > > > > > Well, I'm OK with something like writing to test whether it's
> > > > > > compatible, it's just we need a better way of saying 'no'.
> > > > > > I'm not sure if that involves reading back from somewhere after
> > > > > > the write or what.
> > > > >
> > > > > Hm, so I basically see two ways of doing that:
> > > > > - standardize on some error codes... problem: error codes can be hard
> > > > >   to fit to reasons
> > > > > - make the error available in some attribute that can be read
> > > > >
> > > > > I'm not sure how we can serialize the readback with the last write,
> > > > > though (this looks inherently racy).
> > > > >
> > > > > How important is detailed error reporting here?
> > > >
> > > > I think we need something, otherwise we're just going to get vague
> > > > user reports of 'but my VM doesn't migrate'; I'd like the error to be
> > > > good enough to point most users to something they can understand
> > > > (e.g. wrong card family/too old a driver etc).
> > >
> > > Ok, that sounds like a reasonable point. Not that I have a better idea
> > > how to achieve that, though... we could also log a more verbose error
> > > message to the kernel log, but that's not necessarily where a user will
> > > look first.
> >
> > In case of libvirt checking the compatibility, it won't matter how good the
> > error message in the kernel log is and regardless of how many error states you
> > want to handle, libvirt's only limited to errno here, since we're going to do
> > plain read/write, so our internal error message returned to the user is only
> > going to contain what the errno says - okay, of course we can (and we DO)
> > provide libvirt specific string, further specifying the error but like I
> > mentioned, depending on how many error cases we want to distinguish this may be
> > hard for anyone to figure out solely on the error code, as apps will most
> > probably not parse the
> > logs.
> >
> > Regards,
> > Erik
> hi Erik
> do you mean you are agreeing on defining common errors and only returning errno?

In a sense, yes. While it is highly desirable to have logs with descriptive
messages which will help in troubleshooting tremendously, I wanted to point out
that spending time with error logs may not be that worthwhile especially since
most apps (like libvirt) will solely rely on using read(3)/write(3) to sysfs.
That means that we're limited by the errnos available, so apart from
reporting the generic system message we can't any more magic in terms of the
error messages, so the driver needs to assure that a proper message is
propagated to the journal and at best libvirt can direct the user (consumer) to
look through the system logs for more info. I also agree with the point
mentioned above that defining a specific errno is IMO not the way to go, as
these would be just too specific for the read(3)/write(3) use case.

That said, from libvirt POV as a consumer, I'd expect there to be truly only 2
errors (I believe Alex has mentioned something similar in one of his responses
in one of the threads):
    a) read error indicating that an mdev type doesn't support migration
        - I assume if one type doesn't support migration, none of the other
          types exposed on the parent device do, is that a fair assumption?
    b) write error indicating that the mdev types are incompatible for
    migration

Regards,
Erik
Yan Zhao May 14, 2019, 7:32 a.m. UTC | #23
On Tue, May 14, 2019 at 03:20:40PM +0800, Erik Skultety wrote:
> On Tue, May 14, 2019 at 02:12:35AM -0400, Yan Zhao wrote:
> > On Mon, May 13, 2019 at 09:28:04PM +0800, Erik Skultety wrote:
> > > On Fri, May 10, 2019 at 11:48:38AM +0200, Cornelia Huck wrote:
> > > > On Fri, 10 May 2019 10:36:09 +0100
> > > > "Dr. David Alan Gilbert" <dgilbert@redhat.com> wrote:
> > > >
> > > > > * Cornelia Huck (cohuck@redhat.com) wrote:
> > > > > > On Thu, 9 May 2019 17:48:26 +0100
> > > > > > "Dr. David Alan Gilbert" <dgilbert@redhat.com> wrote:
> > > > > >
> > > > > > > * Cornelia Huck (cohuck@redhat.com) wrote:
> > > > > > > > On Thu, 9 May 2019 16:48:57 +0100
> > > > > > > > "Dr. David Alan Gilbert" <dgilbert@redhat.com> wrote:
> > > > > > > >
> > > > > > > > > * Cornelia Huck (cohuck@redhat.com) wrote:
> > > > > > > > > > On Tue, 7 May 2019 15:18:26 -0600
> > > > > > > > > > Alex Williamson <alex.williamson@redhat.com> wrote:
> > > > > > > > > >
> > > > > > > > > > > On Sun,  5 May 2019 21:49:04 -0400
> > > > > > > > > > > Yan Zhao <yan.y.zhao@intel.com> wrote:
> > > > > > > > > >
> > > > > > > > > > > > +  Errno:
> > > > > > > > > > > > +  If vendor driver wants to claim a mdev device incompatible to all other mdev
> > > > > > > > > > > > +  devices, it should not register version attribute for this mdev device. But if
> > > > > > > > > > > > +  a vendor driver has already registered version attribute and it wants to claim
> > > > > > > > > > > > +  a mdev device incompatible to all other mdev devices, it needs to return
> > > > > > > > > > > > +  -ENODEV on access to this mdev device's version attribute.
> > > > > > > > > > > > +  If a mdev device is only incompatible to certain mdev devices, write of
> > > > > > > > > > > > +  incompatible mdev devices's version strings to its version attribute should
> > > > > > > > > > > > +  return -EINVAL;
> > > > > > > > > > >
> > > > > > > > > > > I think it's best not to define the specific errno returned for a
> > > > > > > > > > > specific situation, let the vendor driver decide, userspace simply
> > > > > > > > > > > needs to know that an errno on read indicates the device does not
> > > > > > > > > > > support migration version comparison and that an errno on write
> > > > > > > > > > > indicates the devices are incompatible or the target doesn't support
> > > > > > > > > > > migration versions.
> > > > > > > > > >
> > > > > > > > > > I think I have to disagree here: It's probably valuable to have an
> > > > > > > > > > agreed error for 'cannot migrate at all' vs 'cannot migrate between
> > > > > > > > > > those two particular devices'. Userspace might want to do different
> > > > > > > > > > things (e.g. trying with different device pairs).
> > > > > > > > >
> > > > > > > > > Trying to stuff these things down an errno seems a bad idea; we can't
> > > > > > > > > get much information that way.
> > > > > > > >
> > > > > > > > So, what would be a reasonable approach? Userspace should first read
> > > > > > > > the version attributes on both devices (to find out whether migration
> > > > > > > > is supported at all), and only then figure out via writing whether they
> > > > > > > > are compatible?
> > > > > > > >
> > > > > > > > (Or just go ahead and try, if it does not care about the reason.)
> > > > > > >
> > > > > > > Well, I'm OK with something like writing to test whether it's
> > > > > > > compatible, it's just we need a better way of saying 'no'.
> > > > > > > I'm not sure if that involves reading back from somewhere after
> > > > > > > the write or what.
> > > > > >
> > > > > > Hm, so I basically see two ways of doing that:
> > > > > > - standardize on some error codes... problem: error codes can be hard
> > > > > >   to fit to reasons
> > > > > > - make the error available in some attribute that can be read
> > > > > >
> > > > > > I'm not sure how we can serialize the readback with the last write,
> > > > > > though (this looks inherently racy).
> > > > > >
> > > > > > How important is detailed error reporting here?
> > > > >
> > > > > I think we need something, otherwise we're just going to get vague
> > > > > user reports of 'but my VM doesn't migrate'; I'd like the error to be
> > > > > good enough to point most users to something they can understand
> > > > > (e.g. wrong card family/too old a driver etc).
> > > >
> > > > Ok, that sounds like a reasonable point. Not that I have a better idea
> > > > how to achieve that, though... we could also log a more verbose error
> > > > message to the kernel log, but that's not necessarily where a user will
> > > > look first.
> > >
> > > In case of libvirt checking the compatibility, it won't matter how good the
> > > error message in the kernel log is and regardless of how many error states you
> > > want to handle, libvirt's only limited to errno here, since we're going to do
> > > plain read/write, so our internal error message returned to the user is only
> > > going to contain what the errno says - okay, of course we can (and we DO)
> > > provide libvirt specific string, further specifying the error but like I
> > > mentioned, depending on how many error cases we want to distinguish this may be
> > > hard for anyone to figure out solely on the error code, as apps will most
> > > probably not parse the
> > > logs.
> > >
> > > Regards,
> > > Erik
> > hi Erik
> > do you mean you are agreeing on defining common errors and only returning errno?
> 
> In a sense, yes. While it is highly desirable to have logs with descriptive
> messages which will help in troubleshooting tremendously, I wanted to point out
> that spending time with error logs may not be that worthwhile especially since
> most apps (like libvirt) will solely rely on using read(3)/write(3) to sysfs.
> That means that we're limited by the errnos available, so apart from
> reporting the generic system message we can't any more magic in terms of the
> error messages, so the driver needs to assure that a proper message is
> propagated to the journal and at best libvirt can direct the user (consumer) to
> look through the system logs for more info. I also agree with the point
> mentioned above that defining a specific errno is IMO not the way to go, as
> these would be just too specific for the read(3)/write(3) use case.
> 
> That said, from libvirt POV as a consumer, I'd expect there to be truly only 2
> errors (I believe Alex has mentioned something similar in one of his responses
> in one of the threads):
>     a) read error indicating that an mdev type doesn't support migration
>         - I assume if one type doesn't support migration, none of the other
>           types exposed on the parent device do, is that a fair assumption?
>     b) write error indicating that the mdev types are incompatible for
>     migration
> 
> Regards,
> Erik
Thanks for this explanation.
so, can we arrive at below agreements?

1. "not to define the specific errno returned for a specific situation,
let the vendor driver decide, userspace simply needs to know that an errno on
read indicates the device does not support migration version comparison and
that an errno on write indicates the devices are incompatible or the target
doesn't support migration versions. "
2. vendor driver should log detailed error reasons in kernel log. 

Thanks
Yan

> _______________________________________________
> intel-gvt-dev mailing list
> intel-gvt-dev@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/intel-gvt-dev
Erik Skultety May 14, 2019, 7:43 a.m. UTC | #24
On Tue, May 14, 2019 at 03:32:19AM -0400, Yan Zhao wrote:
> On Tue, May 14, 2019 at 03:20:40PM +0800, Erik Skultety wrote:
> > On Tue, May 14, 2019 at 02:12:35AM -0400, Yan Zhao wrote:
> > > On Mon, May 13, 2019 at 09:28:04PM +0800, Erik Skultety wrote:
> > > > On Fri, May 10, 2019 at 11:48:38AM +0200, Cornelia Huck wrote:
> > > > > On Fri, 10 May 2019 10:36:09 +0100
> > > > > "Dr. David Alan Gilbert" <dgilbert@redhat.com> wrote:
> > > > >
> > > > > > * Cornelia Huck (cohuck@redhat.com) wrote:
> > > > > > > On Thu, 9 May 2019 17:48:26 +0100
> > > > > > > "Dr. David Alan Gilbert" <dgilbert@redhat.com> wrote:
> > > > > > >
> > > > > > > > * Cornelia Huck (cohuck@redhat.com) wrote:
> > > > > > > > > On Thu, 9 May 2019 16:48:57 +0100
> > > > > > > > > "Dr. David Alan Gilbert" <dgilbert@redhat.com> wrote:
> > > > > > > > >
> > > > > > > > > > * Cornelia Huck (cohuck@redhat.com) wrote:
> > > > > > > > > > > On Tue, 7 May 2019 15:18:26 -0600
> > > > > > > > > > > Alex Williamson <alex.williamson@redhat.com> wrote:
> > > > > > > > > > >
> > > > > > > > > > > > On Sun,  5 May 2019 21:49:04 -0400
> > > > > > > > > > > > Yan Zhao <yan.y.zhao@intel.com> wrote:
> > > > > > > > > > >
> > > > > > > > > > > > > +  Errno:
> > > > > > > > > > > > > +  If vendor driver wants to claim a mdev device incompatible to all other mdev
> > > > > > > > > > > > > +  devices, it should not register version attribute for this mdev device. But if
> > > > > > > > > > > > > +  a vendor driver has already registered version attribute and it wants to claim
> > > > > > > > > > > > > +  a mdev device incompatible to all other mdev devices, it needs to return
> > > > > > > > > > > > > +  -ENODEV on access to this mdev device's version attribute.
> > > > > > > > > > > > > +  If a mdev device is only incompatible to certain mdev devices, write of
> > > > > > > > > > > > > +  incompatible mdev devices's version strings to its version attribute should
> > > > > > > > > > > > > +  return -EINVAL;
> > > > > > > > > > > >
> > > > > > > > > > > > I think it's best not to define the specific errno returned for a
> > > > > > > > > > > > specific situation, let the vendor driver decide, userspace simply
> > > > > > > > > > > > needs to know that an errno on read indicates the device does not
> > > > > > > > > > > > support migration version comparison and that an errno on write
> > > > > > > > > > > > indicates the devices are incompatible or the target doesn't support
> > > > > > > > > > > > migration versions.
> > > > > > > > > > >
> > > > > > > > > > > I think I have to disagree here: It's probably valuable to have an
> > > > > > > > > > > agreed error for 'cannot migrate at all' vs 'cannot migrate between
> > > > > > > > > > > those two particular devices'. Userspace might want to do different
> > > > > > > > > > > things (e.g. trying with different device pairs).
> > > > > > > > > >
> > > > > > > > > > Trying to stuff these things down an errno seems a bad idea; we can't
> > > > > > > > > > get much information that way.
> > > > > > > > >
> > > > > > > > > So, what would be a reasonable approach? Userspace should first read
> > > > > > > > > the version attributes on both devices (to find out whether migration
> > > > > > > > > is supported at all), and only then figure out via writing whether they
> > > > > > > > > are compatible?
> > > > > > > > >
> > > > > > > > > (Or just go ahead and try, if it does not care about the reason.)
> > > > > > > >
> > > > > > > > Well, I'm OK with something like writing to test whether it's
> > > > > > > > compatible, it's just we need a better way of saying 'no'.
> > > > > > > > I'm not sure if that involves reading back from somewhere after
> > > > > > > > the write or what.
> > > > > > >
> > > > > > > Hm, so I basically see two ways of doing that:
> > > > > > > - standardize on some error codes... problem: error codes can be hard
> > > > > > >   to fit to reasons
> > > > > > > - make the error available in some attribute that can be read
> > > > > > >
> > > > > > > I'm not sure how we can serialize the readback with the last write,
> > > > > > > though (this looks inherently racy).
> > > > > > >
> > > > > > > How important is detailed error reporting here?
> > > > > >
> > > > > > I think we need something, otherwise we're just going to get vague
> > > > > > user reports of 'but my VM doesn't migrate'; I'd like the error to be
> > > > > > good enough to point most users to something they can understand
> > > > > > (e.g. wrong card family/too old a driver etc).
> > > > >
> > > > > Ok, that sounds like a reasonable point. Not that I have a better idea
> > > > > how to achieve that, though... we could also log a more verbose error
> > > > > message to the kernel log, but that's not necessarily where a user will
> > > > > look first.
> > > >
> > > > In case of libvirt checking the compatibility, it won't matter how good the
> > > > error message in the kernel log is and regardless of how many error states you
> > > > want to handle, libvirt's only limited to errno here, since we're going to do
> > > > plain read/write, so our internal error message returned to the user is only
> > > > going to contain what the errno says - okay, of course we can (and we DO)
> > > > provide libvirt specific string, further specifying the error but like I
> > > > mentioned, depending on how many error cases we want to distinguish this may be
> > > > hard for anyone to figure out solely on the error code, as apps will most
> > > > probably not parse the
> > > > logs.
> > > >
> > > > Regards,
> > > > Erik
> > > hi Erik
> > > do you mean you are agreeing on defining common errors and only returning errno?
> >
> > In a sense, yes. While it is highly desirable to have logs with descriptive
> > messages which will help in troubleshooting tremendously, I wanted to point out
> > that spending time with error logs may not be that worthwhile especially since
> > most apps (like libvirt) will solely rely on using read(3)/write(3) to sysfs.
> > That means that we're limited by the errnos available, so apart from
> > reporting the generic system message we can't any more magic in terms of the
> > error messages, so the driver needs to assure that a proper message is
> > propagated to the journal and at best libvirt can direct the user (consumer) to
> > look through the system logs for more info. I also agree with the point
> > mentioned above that defining a specific errno is IMO not the way to go, as
> > these would be just too specific for the read(3)/write(3) use case.
> >
> > That said, from libvirt POV as a consumer, I'd expect there to be truly only 2
> > errors (I believe Alex has mentioned something similar in one of his responses
> > in one of the threads):
> >     a) read error indicating that an mdev type doesn't support migration
> >         - I assume if one type doesn't support migration, none of the other
> >           types exposed on the parent device do, is that a fair assumption?
> >     b) write error indicating that the mdev types are incompatible for
> >     migration
> >
> > Regards,
> > Erik
> Thanks for this explanation.
> so, can we arrive at below agreements?
>
> 1. "not to define the specific errno returned for a specific situation,
> let the vendor driver decide, userspace simply needs to know that an errno on
> read indicates the device does not support migration version comparison and
> that an errno on write indicates the devices are incompatible or the target
> doesn't support migration versions. "
> 2. vendor driver should log detailed error reasons in kernel log.

That would be my take on this, yes, but I open to hear any other suggestions and
ideas I couldn't think of as well.

Erik
Yan Zhao May 14, 2019, 7:47 a.m. UTC | #25
On Tue, May 14, 2019 at 03:43:44PM +0800, Erik Skultety wrote:
> On Tue, May 14, 2019 at 03:32:19AM -0400, Yan Zhao wrote:
> > On Tue, May 14, 2019 at 03:20:40PM +0800, Erik Skultety wrote:
> > > On Tue, May 14, 2019 at 02:12:35AM -0400, Yan Zhao wrote:
> > > > On Mon, May 13, 2019 at 09:28:04PM +0800, Erik Skultety wrote:
> > > > > On Fri, May 10, 2019 at 11:48:38AM +0200, Cornelia Huck wrote:
> > > > > > On Fri, 10 May 2019 10:36:09 +0100
> > > > > > "Dr. David Alan Gilbert" <dgilbert@redhat.com> wrote:
> > > > > >
> > > > > > > * Cornelia Huck (cohuck@redhat.com) wrote:
> > > > > > > > On Thu, 9 May 2019 17:48:26 +0100
> > > > > > > > "Dr. David Alan Gilbert" <dgilbert@redhat.com> wrote:
> > > > > > > >
> > > > > > > > > * Cornelia Huck (cohuck@redhat.com) wrote:
> > > > > > > > > > On Thu, 9 May 2019 16:48:57 +0100
> > > > > > > > > > "Dr. David Alan Gilbert" <dgilbert@redhat.com> wrote:
> > > > > > > > > >
> > > > > > > > > > > * Cornelia Huck (cohuck@redhat.com) wrote:
> > > > > > > > > > > > On Tue, 7 May 2019 15:18:26 -0600
> > > > > > > > > > > > Alex Williamson <alex.williamson@redhat.com> wrote:
> > > > > > > > > > > >
> > > > > > > > > > > > > On Sun,  5 May 2019 21:49:04 -0400
> > > > > > > > > > > > > Yan Zhao <yan.y.zhao@intel.com> wrote:
> > > > > > > > > > > >
> > > > > > > > > > > > > > +  Errno:
> > > > > > > > > > > > > > +  If vendor driver wants to claim a mdev device incompatible to all other mdev
> > > > > > > > > > > > > > +  devices, it should not register version attribute for this mdev device. But if
> > > > > > > > > > > > > > +  a vendor driver has already registered version attribute and it wants to claim
> > > > > > > > > > > > > > +  a mdev device incompatible to all other mdev devices, it needs to return
> > > > > > > > > > > > > > +  -ENODEV on access to this mdev device's version attribute.
> > > > > > > > > > > > > > +  If a mdev device is only incompatible to certain mdev devices, write of
> > > > > > > > > > > > > > +  incompatible mdev devices's version strings to its version attribute should
> > > > > > > > > > > > > > +  return -EINVAL;
> > > > > > > > > > > > >
> > > > > > > > > > > > > I think it's best not to define the specific errno returned for a
> > > > > > > > > > > > > specific situation, let the vendor driver decide, userspace simply
> > > > > > > > > > > > > needs to know that an errno on read indicates the device does not
> > > > > > > > > > > > > support migration version comparison and that an errno on write
> > > > > > > > > > > > > indicates the devices are incompatible or the target doesn't support
> > > > > > > > > > > > > migration versions.
> > > > > > > > > > > >
> > > > > > > > > > > > I think I have to disagree here: It's probably valuable to have an
> > > > > > > > > > > > agreed error for 'cannot migrate at all' vs 'cannot migrate between
> > > > > > > > > > > > those two particular devices'. Userspace might want to do different
> > > > > > > > > > > > things (e.g. trying with different device pairs).
> > > > > > > > > > >
> > > > > > > > > > > Trying to stuff these things down an errno seems a bad idea; we can't
> > > > > > > > > > > get much information that way.
> > > > > > > > > >
> > > > > > > > > > So, what would be a reasonable approach? Userspace should first read
> > > > > > > > > > the version attributes on both devices (to find out whether migration
> > > > > > > > > > is supported at all), and only then figure out via writing whether they
> > > > > > > > > > are compatible?
> > > > > > > > > >
> > > > > > > > > > (Or just go ahead and try, if it does not care about the reason.)
> > > > > > > > >
> > > > > > > > > Well, I'm OK with something like writing to test whether it's
> > > > > > > > > compatible, it's just we need a better way of saying 'no'.
> > > > > > > > > I'm not sure if that involves reading back from somewhere after
> > > > > > > > > the write or what.
> > > > > > > >
> > > > > > > > Hm, so I basically see two ways of doing that:
> > > > > > > > - standardize on some error codes... problem: error codes can be hard
> > > > > > > >   to fit to reasons
> > > > > > > > - make the error available in some attribute that can be read
> > > > > > > >
> > > > > > > > I'm not sure how we can serialize the readback with the last write,
> > > > > > > > though (this looks inherently racy).
> > > > > > > >
> > > > > > > > How important is detailed error reporting here?
> > > > > > >
> > > > > > > I think we need something, otherwise we're just going to get vague
> > > > > > > user reports of 'but my VM doesn't migrate'; I'd like the error to be
> > > > > > > good enough to point most users to something they can understand
> > > > > > > (e.g. wrong card family/too old a driver etc).
> > > > > >
> > > > > > Ok, that sounds like a reasonable point. Not that I have a better idea
> > > > > > how to achieve that, though... we could also log a more verbose error
> > > > > > message to the kernel log, but that's not necessarily where a user will
> > > > > > look first.
> > > > >
> > > > > In case of libvirt checking the compatibility, it won't matter how good the
> > > > > error message in the kernel log is and regardless of how many error states you
> > > > > want to handle, libvirt's only limited to errno here, since we're going to do
> > > > > plain read/write, so our internal error message returned to the user is only
> > > > > going to contain what the errno says - okay, of course we can (and we DO)
> > > > > provide libvirt specific string, further specifying the error but like I
> > > > > mentioned, depending on how many error cases we want to distinguish this may be
> > > > > hard for anyone to figure out solely on the error code, as apps will most
> > > > > probably not parse the
> > > > > logs.
> > > > >
> > > > > Regards,
> > > > > Erik
> > > > hi Erik
> > > > do you mean you are agreeing on defining common errors and only returning errno?
> > >
> > > In a sense, yes. While it is highly desirable to have logs with descriptive
> > > messages which will help in troubleshooting tremendously, I wanted to point out
> > > that spending time with error logs may not be that worthwhile especially since
> > > most apps (like libvirt) will solely rely on using read(3)/write(3) to sysfs.
> > > That means that we're limited by the errnos available, so apart from
> > > reporting the generic system message we can't any more magic in terms of the
> > > error messages, so the driver needs to assure that a proper message is
> > > propagated to the journal and at best libvirt can direct the user (consumer) to
> > > look through the system logs for more info. I also agree with the point
> > > mentioned above that defining a specific errno is IMO not the way to go, as
> > > these would be just too specific for the read(3)/write(3) use case.
> > >
> > > That said, from libvirt POV as a consumer, I'd expect there to be truly only 2
> > > errors (I believe Alex has mentioned something similar in one of his responses
> > > in one of the threads):
> > >     a) read error indicating that an mdev type doesn't support migration
> > >         - I assume if one type doesn't support migration, none of the other
> > >           types exposed on the parent device do, is that a fair assumption?
> > >     b) write error indicating that the mdev types are incompatible for
> > >     migration
> > >
> > > Regards,
> > > Erik
> > Thanks for this explanation.
> > so, can we arrive at below agreements?
> >
> > 1. "not to define the specific errno returned for a specific situation,
> > let the vendor driver decide, userspace simply needs to know that an errno on
> > read indicates the device does not support migration version comparison and
> > that an errno on write indicates the devices are incompatible or the target
> > doesn't support migration versions. "
> > 2. vendor driver should log detailed error reasons in kernel log.
> 
> That would be my take on this, yes, but I open to hear any other suggestions and
> ideas I couldn't think of as well.
> 
> Erik
got it. thanks a lot!

hi Cornelia and Dave,
do you also agree on:
1. "not to define the specific errno returned for a specific situation,
let the vendor driver decide, userspace simply needs to know that an errno on
read indicates the device does not support migration version comparison and
that an errno on write indicates the devices are incompatible or the target
doesn't support migration versions. "
2. vendor driver should log detailed error reasons in kernel log.
 
Thanks
Yan
Cornelia Huck May 14, 2019, 9:51 a.m. UTC | #26
On Tue, 14 May 2019 03:47:36 -0400
Yan Zhao <yan.y.zhao@intel.com> wrote:

> On Tue, May 14, 2019 at 03:43:44PM +0800, Erik Skultety wrote:
> > On Tue, May 14, 2019 at 03:32:19AM -0400, Yan Zhao wrote:  
> > > On Tue, May 14, 2019 at 03:20:40PM +0800, Erik Skultety wrote:  

> > > > That said, from libvirt POV as a consumer, I'd expect there to be truly only 2
> > > > errors (I believe Alex has mentioned something similar in one of his responses
> > > > in one of the threads):
> > > >     a) read error indicating that an mdev type doesn't support migration
> > > >         - I assume if one type doesn't support migration, none of the other
> > > >           types exposed on the parent device do, is that a fair assumption?

Probably; but there might be cases where the migratability depends not
on the device type, but how the partitioning has been done... or is
that too contrived?

> > > >     b) write error indicating that the mdev types are incompatible for
> > > >     migration
> > > >
> > > > Regards,
> > > > Erik  
> > > Thanks for this explanation.
> > > so, can we arrive at below agreements?
> > >
> > > 1. "not to define the specific errno returned for a specific situation,
> > > let the vendor driver decide, userspace simply needs to know that an errno on
> > > read indicates the device does not support migration version comparison and
> > > that an errno on write indicates the devices are incompatible or the target
> > > doesn't support migration versions. "
> > > 2. vendor driver should log detailed error reasons in kernel log.  
> > 
> > That would be my take on this, yes, but I open to hear any other suggestions and
> > ideas I couldn't think of as well.

So, read to find out whether migration is supported at all, write to
find out whether it is supported for that concrete pairing is
reasonable for libvirt?

> > 
> > Erik  
> got it. thanks a lot!
> 
> hi Cornelia and Dave,
> do you also agree on:
> 1. "not to define the specific errno returned for a specific situation,
> let the vendor driver decide, userspace simply needs to know that an errno on
> read indicates the device does not support migration version comparison and
> that an errno on write indicates the devices are incompatible or the target
> doesn't support migration versions. "
> 2. vendor driver should log detailed error reasons in kernel log.

Two questions:
- How reasonable is it to refer to the system log in order to find out
  what exactly went wrong?
- If detailed error reporting is basically done to the syslog, do
  different error codes still provide useful information? Or should the
  vendor driver decide what it wants to do?
Erik Skultety May 14, 2019, 10:57 a.m. UTC | #27
On Tue, May 14, 2019 at 11:51:35AM +0200, Cornelia Huck wrote:
> On Tue, 14 May 2019 03:47:36 -0400
> Yan Zhao <yan.y.zhao@intel.com> wrote:
>
> > On Tue, May 14, 2019 at 03:43:44PM +0800, Erik Skultety wrote:
> > > On Tue, May 14, 2019 at 03:32:19AM -0400, Yan Zhao wrote:
> > > > On Tue, May 14, 2019 at 03:20:40PM +0800, Erik Skultety wrote:
>
> > > > > That said, from libvirt POV as a consumer, I'd expect there to be truly only 2
> > > > > errors (I believe Alex has mentioned something similar in one of his responses
> > > > > in one of the threads):
> > > > >     a) read error indicating that an mdev type doesn't support migration
> > > > >         - I assume if one type doesn't support migration, none of the other
> > > > >           types exposed on the parent device do, is that a fair assumption?
>
> Probably; but there might be cases where the migratability depends not
> on the device type, but how the partitioning has been done... or is
> that too contrived?

No, you have a point - once again I let my thoughts be carried away by the idea
of heterogeneous setups, which is a discussion for another time anyway, I was
just thinking out loud.

>
> > > > >     b) write error indicating that the mdev types are incompatible for
> > > > >     migration
> > > > >
> > > > > Regards,
> > > > > Erik
> > > > Thanks for this explanation.
> > > > so, can we arrive at below agreements?
> > > >
> > > > 1. "not to define the specific errno returned for a specific situation,
> > > > let the vendor driver decide, userspace simply needs to know that an errno on
> > > > read indicates the device does not support migration version comparison and
> > > > that an errno on write indicates the devices are incompatible or the target
> > > > doesn't support migration versions. "
> > > > 2. vendor driver should log detailed error reasons in kernel log.
> > >
> > > That would be my take on this, yes, but I open to hear any other suggestions and
> > > ideas I couldn't think of as well.
>
> So, read to find out whether migration is supported at all, write to
> find out whether it is supported for that concrete pairing is
> reasonable for libvirt?

Yes, more specifically, in the prepare phase of migration, we'd retrieve the
string (potentially reporting an error like: "Failed to query migration
support: <errno translation>"), put the string into the migration cookie and
do the check with write on destination. The only thing is that if the error is
on the destination, the error message in kernel log lives only on the
destination, which doesn't help libvirt users, so it would require setting up
remote logging, but for layered products, this is not a problem since those
already utilize central logging nodes.

Then there are the libvirt-specific bits out of scope of this discussion,
whether we should only assume identical mdev type pairs, or whether we should
employ best effort approach and iterate over all the available types exposed by
the vendor and check whether any of the types would support this migration
(back to your note Connie, partitioning would come into the picture here).


>
> > >
> > > Erik
> > got it. thanks a lot!
> >
> > hi Cornelia and Dave,
> > do you also agree on:
> > 1. "not to define the specific errno returned for a specific situation,
> > let the vendor driver decide, userspace simply needs to know that an errno on
> > read indicates the device does not support migration version comparison and
> > that an errno on write indicates the devices are incompatible or the target
> > doesn't support migration versions. "
> > 2. vendor driver should log detailed error reasons in kernel log.
>
> Two questions:
> - How reasonable is it to refer to the system log in order to find out
>   what exactly went wrong?
> - If detailed error reporting is basically done to the syslog, do
>   different error codes still provide useful information? Or should the
>   vendor driver decide what it wants to do?

I'd leave anything beyond returning -1 on read/write from/to the sysfs to the
vendor driver, as user space has no control over it, even if there was a
facility to interpret different return codes for us, I'm not sure (in this
migration-related case) how much would userspace be able to recover or
fallback anyway, you either can or cannot migrate smoothely.

Regards,
Erik
Dr. David Alan Gilbert May 14, 2019, 11:01 a.m. UTC | #28
* Cornelia Huck (cohuck@redhat.com) wrote:
> On Tue, 14 May 2019 03:47:36 -0400
> Yan Zhao <yan.y.zhao@intel.com> wrote:
> 
> > On Tue, May 14, 2019 at 03:43:44PM +0800, Erik Skultety wrote:
> > > On Tue, May 14, 2019 at 03:32:19AM -0400, Yan Zhao wrote:  
> > > > On Tue, May 14, 2019 at 03:20:40PM +0800, Erik Skultety wrote:  
> 
> > > > > That said, from libvirt POV as a consumer, I'd expect there to be truly only 2
> > > > > errors (I believe Alex has mentioned something similar in one of his responses
> > > > > in one of the threads):
> > > > >     a) read error indicating that an mdev type doesn't support migration
> > > > >         - I assume if one type doesn't support migration, none of the other
> > > > >           types exposed on the parent device do, is that a fair assumption?
> 
> Probably; but there might be cases where the migratability depends not
> on the device type, but how the partitioning has been done... or is
> that too contrived?
> 
> > > > >     b) write error indicating that the mdev types are incompatible for
> > > > >     migration
> > > > >
> > > > > Regards,
> > > > > Erik  
> > > > Thanks for this explanation.
> > > > so, can we arrive at below agreements?
> > > >
> > > > 1. "not to define the specific errno returned for a specific situation,
> > > > let the vendor driver decide, userspace simply needs to know that an errno on
> > > > read indicates the device does not support migration version comparison and
> > > > that an errno on write indicates the devices are incompatible or the target
> > > > doesn't support migration versions. "
> > > > 2. vendor driver should log detailed error reasons in kernel log.  
> > > 
> > > That would be my take on this, yes, but I open to hear any other suggestions and
> > > ideas I couldn't think of as well.
> 
> So, read to find out whether migration is supported at all, write to
> find out whether it is supported for that concrete pairing is
> reasonable for libvirt?
> 
> > > 
> > > Erik  
> > got it. thanks a lot!
> > 
> > hi Cornelia and Dave,
> > do you also agree on:
> > 1. "not to define the specific errno returned for a specific situation,
> > let the vendor driver decide, userspace simply needs to know that an errno on
> > read indicates the device does not support migration version comparison and
> > that an errno on write indicates the devices are incompatible or the target
> > doesn't support migration versions. "
> > 2. vendor driver should log detailed error reasons in kernel log.
> 
> Two questions:
> - How reasonable is it to refer to the system log in order to find out
>   what exactly went wrong?
> - If detailed error reporting is basically done to the syslog, do
>   different error codes still provide useful information? Or should the
>   vendor driver decide what it wants to do?

I don't see error codes as being that helpful; if we can't actually get
an error message back up the stack (which was my preference), then I guess
syslog is as good as it will get.

Dave

--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
Cornelia Huck May 14, 2019, 11:30 a.m. UTC | #29
On Tue, 14 May 2019 12:01:45 +0100
"Dr. David Alan Gilbert" <dgilbert@redhat.com> wrote:

> * Cornelia Huck (cohuck@redhat.com) wrote:
> > On Tue, 14 May 2019 03:47:36 -0400
> > Yan Zhao <yan.y.zhao@intel.com> wrote:

> > > hi Cornelia and Dave,
> > > do you also agree on:
> > > 1. "not to define the specific errno returned for a specific situation,
> > > let the vendor driver decide, userspace simply needs to know that an errno on
> > > read indicates the device does not support migration version comparison and
> > > that an errno on write indicates the devices are incompatible or the target
> > > doesn't support migration versions. "
> > > 2. vendor driver should log detailed error reasons in kernel log.  
> > 
> > Two questions:
> > - How reasonable is it to refer to the system log in order to find out
> >   what exactly went wrong?
> > - If detailed error reporting is basically done to the syslog, do
> >   different error codes still provide useful information? Or should the
> >   vendor driver decide what it wants to do?  
> 
> I don't see error codes as being that helpful; if we can't actually get
> an error message back up the stack (which was my preference), then I guess
> syslog is as good as it will get.

Ok, so letting the vendor driver simply return an(y) error and possibly
dumping an error message into the syslog seems to be the most
reasonable approach.
Alex Williamson May 14, 2019, 3:01 p.m. UTC | #30
On Tue, 14 May 2019 09:43:44 +0200
Erik Skultety <eskultet@redhat.com> wrote:

> On Tue, May 14, 2019 at 03:32:19AM -0400, Yan Zhao wrote:
> > On Tue, May 14, 2019 at 03:20:40PM +0800, Erik Skultety wrote:  
> > > On Tue, May 14, 2019 at 02:12:35AM -0400, Yan Zhao wrote:  
> > > > On Mon, May 13, 2019 at 09:28:04PM +0800, Erik Skultety wrote:  
> > > > > On Fri, May 10, 2019 at 11:48:38AM +0200, Cornelia Huck wrote:  
> > > > > > On Fri, 10 May 2019 10:36:09 +0100
> > > > > > "Dr. David Alan Gilbert" <dgilbert@redhat.com> wrote:
> > > > > >  
> > > > > > > * Cornelia Huck (cohuck@redhat.com) wrote:  
> > > > > > > > On Thu, 9 May 2019 17:48:26 +0100
> > > > > > > > "Dr. David Alan Gilbert" <dgilbert@redhat.com> wrote:
> > > > > > > >  
> > > > > > > > > * Cornelia Huck (cohuck@redhat.com) wrote:  
> > > > > > > > > > On Thu, 9 May 2019 16:48:57 +0100
> > > > > > > > > > "Dr. David Alan Gilbert" <dgilbert@redhat.com> wrote:
> > > > > > > > > >  
> > > > > > > > > > > * Cornelia Huck (cohuck@redhat.com) wrote:  
> > > > > > > > > > > > On Tue, 7 May 2019 15:18:26 -0600
> > > > > > > > > > > > Alex Williamson <alex.williamson@redhat.com> wrote:
> > > > > > > > > > > >  
> > > > > > > > > > > > > On Sun,  5 May 2019 21:49:04 -0400
> > > > > > > > > > > > > Yan Zhao <yan.y.zhao@intel.com> wrote:  
> > > > > > > > > > > >  
> > > > > > > > > > > > > > +  Errno:
> > > > > > > > > > > > > > +  If vendor driver wants to claim a mdev device incompatible to all other mdev
> > > > > > > > > > > > > > +  devices, it should not register version attribute for this mdev device. But if
> > > > > > > > > > > > > > +  a vendor driver has already registered version attribute and it wants to claim
> > > > > > > > > > > > > > +  a mdev device incompatible to all other mdev devices, it needs to return
> > > > > > > > > > > > > > +  -ENODEV on access to this mdev device's version attribute.
> > > > > > > > > > > > > > +  If a mdev device is only incompatible to certain mdev devices, write of
> > > > > > > > > > > > > > +  incompatible mdev devices's version strings to its version attribute should
> > > > > > > > > > > > > > +  return -EINVAL;  
> > > > > > > > > > > > >
> > > > > > > > > > > > > I think it's best not to define the specific errno returned for a
> > > > > > > > > > > > > specific situation, let the vendor driver decide, userspace simply
> > > > > > > > > > > > > needs to know that an errno on read indicates the device does not
> > > > > > > > > > > > > support migration version comparison and that an errno on write
> > > > > > > > > > > > > indicates the devices are incompatible or the target doesn't support
> > > > > > > > > > > > > migration versions.  
> > > > > > > > > > > >
> > > > > > > > > > > > I think I have to disagree here: It's probably valuable to have an
> > > > > > > > > > > > agreed error for 'cannot migrate at all' vs 'cannot migrate between
> > > > > > > > > > > > those two particular devices'. Userspace might want to do different
> > > > > > > > > > > > things (e.g. trying with different device pairs).  
> > > > > > > > > > >
> > > > > > > > > > > Trying to stuff these things down an errno seems a bad idea; we can't
> > > > > > > > > > > get much information that way.  
> > > > > > > > > >
> > > > > > > > > > So, what would be a reasonable approach? Userspace should first read
> > > > > > > > > > the version attributes on both devices (to find out whether migration
> > > > > > > > > > is supported at all), and only then figure out via writing whether they
> > > > > > > > > > are compatible?
> > > > > > > > > >
> > > > > > > > > > (Or just go ahead and try, if it does not care about the reason.)  
> > > > > > > > >
> > > > > > > > > Well, I'm OK with something like writing to test whether it's
> > > > > > > > > compatible, it's just we need a better way of saying 'no'.
> > > > > > > > > I'm not sure if that involves reading back from somewhere after
> > > > > > > > > the write or what.  
> > > > > > > >
> > > > > > > > Hm, so I basically see two ways of doing that:
> > > > > > > > - standardize on some error codes... problem: error codes can be hard
> > > > > > > >   to fit to reasons
> > > > > > > > - make the error available in some attribute that can be read
> > > > > > > >
> > > > > > > > I'm not sure how we can serialize the readback with the last write,
> > > > > > > > though (this looks inherently racy).
> > > > > > > >
> > > > > > > > How important is detailed error reporting here?  
> > > > > > >
> > > > > > > I think we need something, otherwise we're just going to get vague
> > > > > > > user reports of 'but my VM doesn't migrate'; I'd like the error to be
> > > > > > > good enough to point most users to something they can understand
> > > > > > > (e.g. wrong card family/too old a driver etc).  
> > > > > >
> > > > > > Ok, that sounds like a reasonable point. Not that I have a better idea
> > > > > > how to achieve that, though... we could also log a more verbose error
> > > > > > message to the kernel log, but that's not necessarily where a user will
> > > > > > look first.  
> > > > >
> > > > > In case of libvirt checking the compatibility, it won't matter how good the
> > > > > error message in the kernel log is and regardless of how many error states you
> > > > > want to handle, libvirt's only limited to errno here, since we're going to do
> > > > > plain read/write, so our internal error message returned to the user is only
> > > > > going to contain what the errno says - okay, of course we can (and we DO)
> > > > > provide libvirt specific string, further specifying the error but like I
> > > > > mentioned, depending on how many error cases we want to distinguish this may be
> > > > > hard for anyone to figure out solely on the error code, as apps will most
> > > > > probably not parse the
> > > > > logs.
> > > > >
> > > > > Regards,
> > > > > Erik  
> > > > hi Erik
> > > > do you mean you are agreeing on defining common errors and only returning errno?  
> > >
> > > In a sense, yes. While it is highly desirable to have logs with descriptive
> > > messages which will help in troubleshooting tremendously, I wanted to point out
> > > that spending time with error logs may not be that worthwhile especially since
> > > most apps (like libvirt) will solely rely on using read(3)/write(3) to sysfs.
> > > That means that we're limited by the errnos available, so apart from
> > > reporting the generic system message we can't any more magic in terms of the
> > > error messages, so the driver needs to assure that a proper message is
> > > propagated to the journal and at best libvirt can direct the user (consumer) to
> > > look through the system logs for more info. I also agree with the point
> > > mentioned above that defining a specific errno is IMO not the way to go, as
> > > these would be just too specific for the read(3)/write(3) use case.
> > >
> > > That said, from libvirt POV as a consumer, I'd expect there to be truly only 2
> > > errors (I believe Alex has mentioned something similar in one of his responses
> > > in one of the threads):
> > >     a) read error indicating that an mdev type doesn't support migration
> > >         - I assume if one type doesn't support migration, none of the other
> > >           types exposed on the parent device do, is that a fair assumption?

I'd prefer not to make this assumption.  Let's leave open the
possibility that (for whatever reason) a vendor may choose to support
migration on some types, but not others.

> > >     b) write error indicating that the mdev types are incompatible for
> > >     migration
> > >
> > > Regards,
> > > Erik  
> > Thanks for this explanation.
> > so, can we arrive at below agreements?
> >
> > 1. "not to define the specific errno returned for a specific situation,
> > let the vendor driver decide, userspace simply needs to know that an errno on
> > read indicates the device does not support migration version comparison and
> > that an errno on write indicates the devices are incompatible or the target
> > doesn't support migration versions. "
> > 2. vendor driver should log detailed error reasons in kernel log.  
> 
> That would be my take on this, yes, but I open to hear any other suggestions and
> ideas I couldn't think of as well.

Kernel logging tends to be rather ineffective, it's surprisingly
difficult to get users to look in dmesg and it's not really a good
choice for scraping diagnostic information either.  I'd probably leave
this to vendor driver's discretion at this point.  Thanks,

Alex
Alex Williamson May 14, 2019, 3:31 p.m. UTC | #31
On Wed, 8 May 2019 17:27:47 +0200
Boris Fiuczynski <fiuczy@linux.ibm.com> wrote:

> On 5/8/19 11:22 PM, Alex Williamson wrote:
> >>> I thought there was a request to make this more specific to migration
> >>> by renaming it to something like migration_version.  Also, as an
> >>>     
> >> so this attribute may not only include a mdev device's parent device info and
> >> mdev type, but also include numeric software version of vendor specific
> >> migration code, right?  
> > It's a vendor defined string, it should be considered opaque to the
> > user, the vendor can include whatever they feel is relevant.
> >   
> Would a vendor also be allowed to provide a string expressing required 
> features as well as containing backend resource requirements which need 
> to be compatible for a successful migration? Somehow a bit like a cpu 
> model... maybe even as json or xml...
> I am asking this with vfio-ap in mind. In that context checking 
> compatibility of two vfio-ap mdev devices is not as simple as checking 
> if version A is smaller or equal to version B.

Two pieces to this, the first is that the string is opaque exactly so
that the vendor driver can express whatever they need in it.  The user
should never infer that two devices are compatible.  The second is that
this is not a resource availability or reservation interface.  The fact
that a target device would be compatible for migration should not take
into account whether the target has the resources to actually create
such a device.  Doing so would imply some sort of resource reservation
support that does not exist.  Matrix devices are clearly a bit
complicated here since maybe the source is expressing a component of
the device that doesn't exist on the target.  In such a "resource not
available at all" case, it might be fair to nak the compatibility test,
but a "ok, but resource not currently available" case should pass,
imo.  Thanks,

Alex
Yan Zhao May 16, 2019, 1 a.m. UTC | #32
On Tue, May 14, 2019 at 11:01:42PM +0800, Alex Williamson wrote:
> On Tue, 14 May 2019 09:43:44 +0200
> Erik Skultety <eskultet@redhat.com> wrote:
> 
> > On Tue, May 14, 2019 at 03:32:19AM -0400, Yan Zhao wrote:
> > > On Tue, May 14, 2019 at 03:20:40PM +0800, Erik Skultety wrote:  
> > > > On Tue, May 14, 2019 at 02:12:35AM -0400, Yan Zhao wrote:  
> > > > > On Mon, May 13, 2019 at 09:28:04PM +0800, Erik Skultety wrote:  
> > > > > > On Fri, May 10, 2019 at 11:48:38AM +0200, Cornelia Huck wrote:  
> > > > > > > On Fri, 10 May 2019 10:36:09 +0100
> > > > > > > "Dr. David Alan Gilbert" <dgilbert@redhat.com> wrote:
> > > > > > >  
> > > > > > > > * Cornelia Huck (cohuck@redhat.com) wrote:  
> > > > > > > > > On Thu, 9 May 2019 17:48:26 +0100
> > > > > > > > > "Dr. David Alan Gilbert" <dgilbert@redhat.com> wrote:
> > > > > > > > >  
> > > > > > > > > > * Cornelia Huck (cohuck@redhat.com) wrote:  
> > > > > > > > > > > On Thu, 9 May 2019 16:48:57 +0100
> > > > > > > > > > > "Dr. David Alan Gilbert" <dgilbert@redhat.com> wrote:
> > > > > > > > > > >  
> > > > > > > > > > > > * Cornelia Huck (cohuck@redhat.com) wrote:  
> > > > > > > > > > > > > On Tue, 7 May 2019 15:18:26 -0600
> > > > > > > > > > > > > Alex Williamson <alex.williamson@redhat.com> wrote:
> > > > > > > > > > > > >  
> > > > > > > > > > > > > > On Sun,  5 May 2019 21:49:04 -0400
> > > > > > > > > > > > > > Yan Zhao <yan.y.zhao@intel.com> wrote:  
> > > > > > > > > > > > >  
> > > > > > > > > > > > > > > +  Errno:
> > > > > > > > > > > > > > > +  If vendor driver wants to claim a mdev device incompatible to all other mdev
> > > > > > > > > > > > > > > +  devices, it should not register version attribute for this mdev device. But if
> > > > > > > > > > > > > > > +  a vendor driver has already registered version attribute and it wants to claim
> > > > > > > > > > > > > > > +  a mdev device incompatible to all other mdev devices, it needs to return
> > > > > > > > > > > > > > > +  -ENODEV on access to this mdev device's version attribute.
> > > > > > > > > > > > > > > +  If a mdev device is only incompatible to certain mdev devices, write of
> > > > > > > > > > > > > > > +  incompatible mdev devices's version strings to its version attribute should
> > > > > > > > > > > > > > > +  return -EINVAL;  
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > I think it's best not to define the specific errno returned for a
> > > > > > > > > > > > > > specific situation, let the vendor driver decide, userspace simply
> > > > > > > > > > > > > > needs to know that an errno on read indicates the device does not
> > > > > > > > > > > > > > support migration version comparison and that an errno on write
> > > > > > > > > > > > > > indicates the devices are incompatible or the target doesn't support
> > > > > > > > > > > > > > migration versions.  
> > > > > > > > > > > > >
> > > > > > > > > > > > > I think I have to disagree here: It's probably valuable to have an
> > > > > > > > > > > > > agreed error for 'cannot migrate at all' vs 'cannot migrate between
> > > > > > > > > > > > > those two particular devices'. Userspace might want to do different
> > > > > > > > > > > > > things (e.g. trying with different device pairs).  
> > > > > > > > > > > >
> > > > > > > > > > > > Trying to stuff these things down an errno seems a bad idea; we can't
> > > > > > > > > > > > get much information that way.  
> > > > > > > > > > >
> > > > > > > > > > > So, what would be a reasonable approach? Userspace should first read
> > > > > > > > > > > the version attributes on both devices (to find out whether migration
> > > > > > > > > > > is supported at all), and only then figure out via writing whether they
> > > > > > > > > > > are compatible?
> > > > > > > > > > >
> > > > > > > > > > > (Or just go ahead and try, if it does not care about the reason.)  
> > > > > > > > > >
> > > > > > > > > > Well, I'm OK with something like writing to test whether it's
> > > > > > > > > > compatible, it's just we need a better way of saying 'no'.
> > > > > > > > > > I'm not sure if that involves reading back from somewhere after
> > > > > > > > > > the write or what.  
> > > > > > > > >
> > > > > > > > > Hm, so I basically see two ways of doing that:
> > > > > > > > > - standardize on some error codes... problem: error codes can be hard
> > > > > > > > >   to fit to reasons
> > > > > > > > > - make the error available in some attribute that can be read
> > > > > > > > >
> > > > > > > > > I'm not sure how we can serialize the readback with the last write,
> > > > > > > > > though (this looks inherently racy).
> > > > > > > > >
> > > > > > > > > How important is detailed error reporting here?  
> > > > > > > >
> > > > > > > > I think we need something, otherwise we're just going to get vague
> > > > > > > > user reports of 'but my VM doesn't migrate'; I'd like the error to be
> > > > > > > > good enough to point most users to something they can understand
> > > > > > > > (e.g. wrong card family/too old a driver etc).  
> > > > > > >
> > > > > > > Ok, that sounds like a reasonable point. Not that I have a better idea
> > > > > > > how to achieve that, though... we could also log a more verbose error
> > > > > > > message to the kernel log, but that's not necessarily where a user will
> > > > > > > look first.  
> > > > > >
> > > > > > In case of libvirt checking the compatibility, it won't matter how good the
> > > > > > error message in the kernel log is and regardless of how many error states you
> > > > > > want to handle, libvirt's only limited to errno here, since we're going to do
> > > > > > plain read/write, so our internal error message returned to the user is only
> > > > > > going to contain what the errno says - okay, of course we can (and we DO)
> > > > > > provide libvirt specific string, further specifying the error but like I
> > > > > > mentioned, depending on how many error cases we want to distinguish this may be
> > > > > > hard for anyone to figure out solely on the error code, as apps will most
> > > > > > probably not parse the
> > > > > > logs.
> > > > > >
> > > > > > Regards,
> > > > > > Erik  
> > > > > hi Erik
> > > > > do you mean you are agreeing on defining common errors and only returning errno?  
> > > >
> > > > In a sense, yes. While it is highly desirable to have logs with descriptive
> > > > messages which will help in troubleshooting tremendously, I wanted to point out
> > > > that spending time with error logs may not be that worthwhile especially since
> > > > most apps (like libvirt) will solely rely on using read(3)/write(3) to sysfs.
> > > > That means that we're limited by the errnos available, so apart from
> > > > reporting the generic system message we can't any more magic in terms of the
> > > > error messages, so the driver needs to assure that a proper message is
> > > > propagated to the journal and at best libvirt can direct the user (consumer) to
> > > > look through the system logs for more info. I also agree with the point
> > > > mentioned above that defining a specific errno is IMO not the way to go, as
> > > > these would be just too specific for the read(3)/write(3) use case.
> > > >
> > > > That said, from libvirt POV as a consumer, I'd expect there to be truly only 2
> > > > errors (I believe Alex has mentioned something similar in one of his responses
> > > > in one of the threads):
> > > >     a) read error indicating that an mdev type doesn't support migration
> > > >         - I assume if one type doesn't support migration, none of the other
> > > >           types exposed on the parent device do, is that a fair assumption?
> 
> I'd prefer not to make this assumption.  Let's leave open the
> possibility that (for whatever reason) a vendor may choose to support
> migration on some types, but not others.
> 
> > > >     b) write error indicating that the mdev types are incompatible for
> > > >     migration
> > > >
> > > > Regards,
> > > > Erik  
> > > Thanks for this explanation.
> > > so, can we arrive at below agreements?
> > >
> > > 1. "not to define the specific errno returned for a specific situation,
> > > let the vendor driver decide, userspace simply needs to know that an errno on
> > > read indicates the device does not support migration version comparison and
> > > that an errno on write indicates the devices are incompatible or the target
> > > doesn't support migration versions. "
> > > 2. vendor driver should log detailed error reasons in kernel log.  
> > 
> > That would be my take on this, yes, but I open to hear any other suggestions and
> > ideas I couldn't think of as well.
> 
> Kernel logging tends to be rather ineffective, it's surprisingly
> difficult to get users to look in dmesg and it's not really a good
> choice for scraping diagnostic information either.  I'd probably leave
> this to vendor driver's discretion at this point.  Thanks,
> 
> Alex

got it.
Thank you all!
I'll follow it to prepare the next revision.

Thanks
Yan

> _______________________________________________
> intel-gvt-dev mailing list
> intel-gvt-dev@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/intel-gvt-dev
Boris Fiuczynski May 28, 2019, 8:57 p.m. UTC | #33
On 5/14/19 5:31 PM, Alex Williamson wrote:
> On Wed, 8 May 2019 17:27:47 +0200
> Boris Fiuczynski <fiuczy@linux.ibm.com> wrote:
> 
>> On 5/8/19 11:22 PM, Alex Williamson wrote:
>>>>> I thought there was a request to make this more specific to migration
>>>>> by renaming it to something like migration_version.  Also, as an
>>>>>      
>>>> so this attribute may not only include a mdev device's parent device info and
>>>> mdev type, but also include numeric software version of vendor specific
>>>> migration code, right?
>>> It's a vendor defined string, it should be considered opaque to the
>>> user, the vendor can include whatever they feel is relevant.
>>>    
>> Would a vendor also be allowed to provide a string expressing required
>> features as well as containing backend resource requirements which need
>> to be compatible for a successful migration? Somehow a bit like a cpu
>> model... maybe even as json or xml...
>> I am asking this with vfio-ap in mind. In that context checking
>> compatibility of two vfio-ap mdev devices is not as simple as checking
>> if version A is smaller or equal to version B.
> 
> Two pieces to this, the first is that the string is opaque exactly so
> that the vendor driver can express whatever they need in it.  The user
> should never infer that two devices are compatible.  The second is that
I agree.

> this is not a resource availability or reservation interface.  The fact
I also agree. The migration_version (version in this case is not really 
a good fit) is a summary of requirements the source mdev has which a 
target mdev needs to be able to fulfill in order to allow migration.
The target mdev already exists and was already configured by other means 
not involved in the migration check process.
Using the migrations_version as some kind of configuration transport 
and/or reservation mechanism wasn't my intention and IMHO would both be 
wrong.

> that a target device would be compatible for migration should not take
> into account whether the target has the resources to actually create
> such a device.  Doing so would imply some sort of resource reservation
> support that does not exist.  Matrix devices are clearly a bit
> complicated here since maybe the source is expressing a component of
> the device that doesn't exist on the target.  In such a "resource not
> available at all" case, it might be fair to nak the compatibility test,
> but a "ok, but resource not currently available" case should pass,
> imo.  Thanks,
> 
> Alex
> 
> --
> libvir-list mailing list
> libvir-list@redhat.com
> https://www.redhat.com/mailman/listinfo/libvir-list
>
Alex Williamson May 29, 2019, 2:08 p.m. UTC | #34
On Tue, 28 May 2019 22:57:15 +0200
Boris Fiuczynski <fiuczy@linux.ibm.com> wrote:

> On 5/14/19 5:31 PM, Alex Williamson wrote:
> > On Wed, 8 May 2019 17:27:47 +0200
> > Boris Fiuczynski <fiuczy@linux.ibm.com> wrote:
> >   
> >> On 5/8/19 11:22 PM, Alex Williamson wrote:  
> >>>>> I thought there was a request to make this more specific to migration
> >>>>> by renaming it to something like migration_version.  Also, as an
> >>>>>        
> >>>> so this attribute may not only include a mdev device's parent device info and
> >>>> mdev type, but also include numeric software version of vendor specific
> >>>> migration code, right?  
> >>> It's a vendor defined string, it should be considered opaque to the
> >>> user, the vendor can include whatever they feel is relevant.
> >>>      
> >> Would a vendor also be allowed to provide a string expressing required
> >> features as well as containing backend resource requirements which need
> >> to be compatible for a successful migration? Somehow a bit like a cpu
> >> model... maybe even as json or xml...
> >> I am asking this with vfio-ap in mind. In that context checking
> >> compatibility of two vfio-ap mdev devices is not as simple as checking
> >> if version A is smaller or equal to version B.  
> > 
> > Two pieces to this, the first is that the string is opaque exactly so
> > that the vendor driver can express whatever they need in it.  The user
> > should never infer that two devices are compatible.  The second is that  
> I agree.
> 
> > this is not a resource availability or reservation interface.  The fact  
> I also agree. The migration_version (version in this case is not really 
> a good fit) is a summary of requirements the source mdev has which a 
> target mdev needs to be able to fulfill in order to allow migration.
> The target mdev already exists and was already configured by other means 
> not involved in the migration check process.

Just a nit here (I hope), the target mdev does not necessarily exist at
the time we're testing migration version compatibility.  The intention
is that this feature can be used to select a target host system which
can possibly generate a compatible target mdev device before committing
to create that device.  For instance a management tool might test for
migration compatibility across a data center, narrowing the set of
potential target hosts, then proceed to select a best choice based on
factors including the ability to actually instantiate such a device on
the host.

> Using the migrations_version as some kind of configuration transport 
> and/or reservation mechanism wasn't my intention and IMHO would both be 
> wrong.

Sounds good.  Thanks,

Alex

> > that a target device would be compatible for migration should not take
> > into account whether the target has the resources to actually create
> > such a device.  Doing so would imply some sort of resource reservation
> > support that does not exist.  Matrix devices are clearly a bit
> > complicated here since maybe the source is expressing a component of
> > the device that doesn't exist on the target.  In such a "resource not
> > available at all" case, it might be fair to nak the compatibility test,
> > but a "ok, but resource not currently available" case should pass,
> > imo.  Thanks,
> > 
> > Alex
> > 
> > --
> > libvir-list mailing list
> > libvir-list@redhat.com
> > https://www.redhat.com/mailman/listinfo/libvir-list
> >   
> 
>