diff mbox series

[v7,22/22] s390: doc: detailed specifications for AP virtualization

Message ID 20180726195429.31960-23-borntraeger@de.ibm.com (mailing list archive)
State New, archived
Headers show
Series vfio-ap: guest dedicated crypto adapters | expand

Commit Message

Christian Borntraeger July 26, 2018, 7:54 p.m. UTC
From: Tony Krowiak <akrowiak@linux.ibm.com>

This patch provides documentation describing the AP architecture and
design concepts behind the virtualization of AP devices. It also
includes an example of how to configure AP devices for exclusive
use of KVM guests.

Signed-off-by: Tony Krowiak <akrowiak@linux.ibm.com>
Reviewed-by: Halil Pasic <pasic@linux.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
---
 Documentation/s390/vfio-ap.txt | 591 +++++++++++++++++++++++++++++++++
 MAINTAINERS                    |   1 +
 2 files changed, 592 insertions(+)
 create mode 100644 Documentation/s390/vfio-ap.txt

Comments

Alex Williamson July 27, 2018, 5:44 p.m. UTC | #1
On Thu, 26 Jul 2018 21:54:29 +0200
Christian Borntraeger <borntraeger@de.ibm.com> wrote:
...
> +The process for reserving an AP queue for use by a KVM guest is:
> +
> +* The vfio-ap driver during its initialization will perform the following:
> +  * Create the 'vfio_ap' root device - /sys/devices/virtual/misc/vfio_ap
> +  * Create the 'matrix' device in the 'vfio_ap' root
> +  * Register the matrix device with the device core
> +* Register with the ap_bus for AP queue devices of type 10 devices (CEX4 and
> +  newer) and to provide the vfio_ap driver's probe and remove callback
> +  interfaces. The reason why older devices are not supported is because there
> +  are no systems available on which to test.
> +* The admin needs to reserve the AP Queue for vfio_ap driver. This can be
> +  done by writing the AP adapter mask to /sys/bus/ap/apmask and the AP domain
> +  mask to /sys/bus/ap/aqmask.
> +
> +  For example to reserve all the AP Queues on the system for vfio_ap driver:
> +
> +  echo 0x0000000000000000000000000000000000000000000000000000000000000000 > /sys/bus/ap/apmask
> +  echo 0x0000000000000000000000000000000000000000000000000000000000000000 > /sys/bus/ap/aqmask

And this is a reasonable user syntax? ;)

...
> +  * mdev_attr_groups
> +    This attribute group identifies the user-defined sysfs attributes of the
> +    mediated device. When a device is registered with the VFIO mediated device
> +    framework, the sysfs attributes files identified in the 'mdev_attr_groups'
> +    structure will be created in the mediated matrix device's directory. The
> +    sysfs attributes for a mediated matrix device are:
> +    * assign_adapter:
> +      A write-only file for assigning an AP adapter to the mediated matrix
> +      device. To assign an adapter, the APID of the adapter is written to the
> +      file.
> +    * assign_domain:
> +      A write-only file for assigning an AP usage domain to the mediated matrix
> +      device. To assign a domain, the APQI of the AP queue corresponding to a
> +      usage domain is written to the file.
> +    * matrix:
> +      A read-only file for displaying the APQNs derived from the adapters and
> +      domains assigned to the mediated matrix device.
> +    * assign_control_domain:
> +      A write-only file for assigning an AP control domain to the mediated
> +      matrix device. To assign a control domain, the ID of a domain to be
> +      controlled is written to the file. For the initial implementation, the set
> +      of control domains will always include the set of usage domains, so it is
> +      only necessary to assign control domains that are not also assigned as
> +      usage domains.


How will the user know when this changes?  What's the transition plan
to maintain compatibility when it does change?

...
> +4. The administrator now needs to configure the matrixes for mediated
> +   devices $uuid1 (for Guest1) and $uuid2 (for Guest2).
> +
> +   This is how the matrix is configured for Guest1:
> +
> +   echo 5 > assign_adapter
> +   echo 6 > assign_adapter
> +   echo 4 > assign_domain
> +   echo 0xab > assign_domain
> +
> +   For this implementation, all usage domains - i.e., domains assigned
> +   via the assign_domain attribute file - will also be configured in the ADM
> +   field of the KVM guest's CRYCB, so there is no need to assign control
> +   domains here unless you want to assign control domains that are not
> +   assigned as usage domains.
> +
> +   If a mistake is made configuring an adapter, domain or control domain,
> +   you can use the unassign_xxx files to unassign the adapter, domain or
> +   control domain.

Would it be more consistent with other sysfs entries to call these
"bind" and "unbind" rather than "assign" and "unassign"?

> +
> +   To display the matrix configuration for Guest1:
> +
> +   cat matrix
> +
> +   This is how the matrix is configured for Guest2:
> +
> +   echo 5 > assign_adapter
> +   echo 0x47 > assign_domain
> +   echo 0xff > assign_domain
> +
> +5. The adminstrator now needs to activate the mediated devices.
> +   echo 1 > activate

Or optionally not.  As in reply to cover letter, I don't think this
interface is sufficiently justified, or necessarily desirable.

> +6. Start Guest1:
> +
> +   /usr/bin/qemu-system-s390x ... -cpu xxx,ap=on,apft=on \
> +      -device vfio-ap,sysfsdev=/sys/devices/virtual/misc/vfio_ap/mdev_supported_types/vfio_ap-passthrough/devices/$uuid1 ...
> +
> +7. Start Guest2:
> +
> +   /usr/bin/qemu-system-s390x ... -cpu xxx,ap=on,apft=on \
> +      -device vfio-ap,sysfsdev=/sys/devices/virtual/misc/vfio_ap/mdev_supported_types/vfio_ap-passthrough/devices/$uuid2 ...
> +
> +When the guest is shut down, the mediated matrix device may be removed.
> +
> +Using our example again, to remove the mediated matrix device $uuid1:
> +
> +   /sys/devices/virtual/misc
> +      --- [vfio_ap]
> +      --------- [mdev_supported_types]
> +      ------------ [vfio_ap-passthrough]
> +      --------------- [devices]
> +      ------------------ [$uuid1]
> +      --------------------- activate
> +      --------------------- remove
> +
> +
> +   echo 0 > activate
> +   echo 1 > remove
> +
> +   This will release all the AP queues for the mediated device and
> +   remove all of the mdev matrix device's sysfs structures. To
> +   recreate and reconfigure the mdev matrix device, all of the steps starting
> +   with step 4 will have to be performed again.
> +
> +   It is not necessary to remove an mdev matrix device, but one may want to
> +   remove it if no guest will use it during the lifetime of the linux host. If
> +   the mdev matrix device is removed, one may want to unbind the AP queues the
> +   guest was using from the vfio_ap device driver and bind them back to the
> +   default driver. Alternatively, the AP queues can be configured for another
> +   mdev matrix (i.e., guest). In either case, one must take care to change the
> +   secure key configured for the domain to which the queue is connected.

This seems prime for data leakage, extremely sensitive data at that.
Who's responsibility is it to prevent this?  Shouldn't closing the
device reset the device, which should perhaps wipe any key
configuration?

> +
> +
> +Limitations
> +===========
> +An admin needs to be careful when writing to sysfs files
> +/sys/bus/ap/apmask and /sys/bus/ap/aqmask. These files control the driver
> +to which an AP queue is bound to. Once an AP queue is bound vfio_ap
> +driver and assigned to a mediated device, admin should not re-assign the
> +AP queues back to the default driver, because of technical limitations.
> +The kernel does not prevent you from re-assigning from AP queues reserved
> +for the vfio_ap driver back to default driver.  Future enhancements in
> +the ap_bus and vfio_ap are likely to introduce complete support for such
> +dynamic reconfiguration. But until then be extremely careful.

Why don't we have these enhancements now?  Should the whole thing be
marked experimental without them?  This sounds similar to a vfio-pci
case where a group with multiple devices could have device which is
unused by the user unbound from vfio-pci and re-bound to a native host
driver.  We BUG_ON when this occurs to protect the data.

Probably also need to evaluate updates to
Documentation/ABI/testing/sysfs-bus-vfio-mdev, especially if libvirt is
supposed to interact with an 'activate' attribute.  Thanks,

Alex
Halil Pasic Aug. 1, 2018, 11:12 a.m. UTC | #2
On 07/27/2018 07:44 PM, Alex Williamson wrote:
> On Thu, 26 Jul 2018 21:54:29 +0200
> Christian Borntraeger <borntraeger@de.ibm.com> wrote:
> ...
>> +The process for reserving an AP queue for use by a KVM guest is:
>> +
>> +* The vfio-ap driver during its initialization will perform the following:
>> +  * Create the 'vfio_ap' root device - /sys/devices/virtual/misc/vfio_ap
>> +  * Create the 'matrix' device in the 'vfio_ap' root
>> +  * Register the matrix device with the device core
>> +* Register with the ap_bus for AP queue devices of type 10 devices (CEX4 and
>> +  newer) and to provide the vfio_ap driver's probe and remove callback
>> +  interfaces. The reason why older devices are not supported is because there
>> +  are no systems available on which to test.
>> +* The admin needs to reserve the AP Queue for vfio_ap driver. This can be
>> +  done by writing the AP adapter mask to /sys/bus/ap/apmask and the AP domain
>> +  mask to /sys/bus/ap/aqmask.
>> +
>> +  For example to reserve all the AP Queues on the system for vfio_ap driver:
>> +
>> +  echo 0x0000000000000000000000000000000000000000000000000000000000000000 > /sys/bus/ap/apmask
>> +  echo 0x0000000000000000000000000000000000000000000000000000000000000000 > /sys/bus/ap/aqmask
> 
> And this is a reasonable user syntax? ;)
> 

We intend to provide tooling that makes this more human friendly. BTW
echo 0 > sys/bus/ap/apmask would also work -- and looks less scary.

> ...
>> +  * mdev_attr_groups
>> +    This attribute group identifies the user-defined sysfs attributes of the
>> +    mediated device. When a device is registered with the VFIO mediated device
>> +    framework, the sysfs attributes files identified in the 'mdev_attr_groups'
>> +    structure will be created in the mediated matrix device's directory. The
>> +    sysfs attributes for a mediated matrix device are:
>> +    * assign_adapter:
>> +      A write-only file for assigning an AP adapter to the mediated matrix
>> +      device. To assign an adapter, the APID of the adapter is written to the
>> +      file.
>> +    * assign_domain:
>> +      A write-only file for assigning an AP usage domain to the mediated matrix
>> +      device. To assign a domain, the APQI of the AP queue corresponding to a
>> +      usage domain is written to the file.
>> +    * matrix:
>> +      A read-only file for displaying the APQNs derived from the adapters and
>> +      domains assigned to the mediated matrix device.
>> +    * assign_control_domain:
>> +      A write-only file for assigning an AP control domain to the mediated
>> +      matrix device. To assign a control domain, the ID of a domain to be
>> +      controlled is written to the file. For the initial implementation, the set
>> +      of control domains will always include the set of usage domains, so it is
>> +      only necessary to assign control domains that are not also assigned as
>> +      usage domains.
> 
> 
> How will the user know when this changes?  What's the transition plan
> to maintain compatibility when it does change?
> 

I don't think this is supposed to change under the users feet. This document
probably started out as a design document. While I'm personally in favor of
managing the two separate, I think we need to commit to either-or and
get rid of 'For the initial implementation,'.

> ...
>> +4. The administrator now needs to configure the matrixes for mediated
>> +   devices $uuid1 (for Guest1) and $uuid2 (for Guest2).
>> +
>> +   This is how the matrix is configured for Guest1:
>> +
>> +   echo 5 > assign_adapter
>> +   echo 6 > assign_adapter
>> +   echo 4 > assign_domain
>> +   echo 0xab > assign_domain
>> +
>> +   For this implementation, all usage domains - i.e., domains assigned
>> +   via the assign_domain attribute file - will also be configured in the ADM
>> +   field of the KVM guest's CRYCB, so there is no need to assign control
>> +   domains here unless you want to assign control domains that are not
>> +   assigned as usage domains.
>> +
>> +   If a mistake is made configuring an adapter, domain or control domain,
>> +   you can use the unassign_xxx files to unassign the adapter, domain or
>> +   control domain.
> 
> Would it be more consistent with other sysfs entries to call these
> "bind" and "unbind" rather than "assign" and "unassign"?
> 

I'm not sure. Let me recap what is this about. First we partition our AP
resources (queues) into 'to be used by the host' and 'to be used by the
guests' using the /sys/bus/ap/a[pq]mask. Then we sub-partition 'to be
used by the guests' among the guests. Each vfio ap_matrix represents a
partition that corresponds to a single VM (guest). Such a partition can
be seen as a device that grants access (usage or administration) to a set
of queues. We can grant access to queues even if these are not currently
available in the system. E.g. we can partition an AP card that is
currently not online or even plugged. When The resources become available
(e.g. new card plugged, or just 'configured'), the resources (should)
become available at all levels if the authorization is right. I know
'bind' binding a device (that currently exists within the system) to a
driver. The vfio_ap stuff seems different enough to seek a different name
for it.
.

>> +
>> +   To display the matrix configuration for Guest1:
>> +
>> +   cat matrix
>> +
>> +   This is how the matrix is configured for Guest2:
>> +
>> +   echo 5 > assign_adapter
>> +   echo 0x47 > assign_domain
>> +   echo 0xff > assign_domain
>> +
>> +5. The adminstrator now needs to activate the mediated devices.
>> +   echo 1 > activate
> 
> Or optionally not.  As in reply to cover letter, I don't think this
> interface is sufficiently justified, or necessarily desirable.
> 

You are right. Activation is not necessary because the open will try to
auto-activate it. But honestly, I would recommend activating it in
advance, so the admin can be sure the resources are claimed.

In our current design we choose to give the administrator the freedom, to
decide on his policy how does he want to manage the vfio_ap mdevs
lifecycle. And  to decide what guarantees does he want to have at a given
point in time. With 'activate' both 'dynamically and on-demand of
starting a VM' and 'create known vfio_ap mdevs at system bring up' are
viable. If there were a consensus on how the life-cycle of the mdev
devices is to be managed, it would be easier to not give this freedom to
the admin.

>> +6. Start Guest1:
>> +
>> +   /usr/bin/qemu-system-s390x ... -cpu xxx,ap=on,apft=on \
>> +      -device vfio-ap,sysfsdev=/sys/devices/virtual/misc/vfio_ap/mdev_supported_types/vfio_ap-passthrough/devices/$uuid1 ...
>> +
>> +7. Start Guest2:
>> +
>> +   /usr/bin/qemu-system-s390x ... -cpu xxx,ap=on,apft=on \
>> +      -device vfio-ap,sysfsdev=/sys/devices/virtual/misc/vfio_ap/mdev_supported_types/vfio_ap-passthrough/devices/$uuid2 ...
>> +
>> +When the guest is shut down, the mediated matrix device may be removed.
>> +
>> +Using our example again, to remove the mediated matrix device $uuid1:
>> +
>> +   /sys/devices/virtual/misc
>> +      --- [vfio_ap]
>> +      --------- [mdev_supported_types]
>> +      ------------ [vfio_ap-passthrough]
>> +      --------------- [devices]
>> +      ------------------ [$uuid1]
>> +      --------------------- activate
>> +      --------------------- remove
>> +
>> +
>> +   echo 0 > activate
>> +   echo 1 > remove
>> +
>> +   This will release all the AP queues for the mediated device and
>> +   remove all of the mdev matrix device's sysfs structures. To
>> +   recreate and reconfigure the mdev matrix device, all of the steps starting
>> +   with step 4 will have to be performed again.
>> +
>> +   It is not necessary to remove an mdev matrix device, but one may want to
>> +   remove it if no guest will use it during the lifetime of the linux host. If
>> +   the mdev matrix device is removed, one may want to unbind the AP queues the
>> +   guest was using from the vfio_ap device driver and bind them back to the
>> +   default driver. Alternatively, the AP queues can be configured for another
>> +   mdev matrix (i.e., guest). In either case, one must take care to change the
>> +   secure key configured for the domain to which the queue is connected.
> 
> This seems prime for data leakage, extremely sensitive data at that.
> Who's responsibility is it to prevent this?  Shouldn't closing the
> device reset the device, which should perhaps wipe any key
> configuration?
> 

The documentation seems to be a bit outdated here. We do reset the device
in the release callback and on VFIO_DEVICE_RESET ioctl. There might be some
stuff that is not supposed to be reset by us, but that is fine (master keys
managed via special infrastructure called the TKE). I'm not sure what is the
meaning of last sentence of the paragraph in question  ('change the secure
key configured ...') though. Anyway we will have to update this.

>> +
>> +
>> +Limitations
>> +===========
>> +An admin needs to be careful when writing to sysfs files
>> +/sys/bus/ap/apmask and /sys/bus/ap/aqmask. These files control the driver
>> +to which an AP queue is bound to. Once an AP queue is bound vfio_ap
>> +driver and assigned to a mediated device, admin should not re-assign the
>> +AP queues back to the default driver, because of technical limitations.
>> +The kernel does not prevent you from re-assigning from AP queues reserved
>> +for the vfio_ap driver back to default driver.  Future enhancements in
>> +the ap_bus and vfio_ap are likely to introduce complete support for such
>> +dynamic reconfiguration. But until then be extremely careful.
> 
> Why don't we have these enhancements now?  Should the whole thing be
> marked experimental without them?  This sounds similar to a vfio-pci
> case where a group with multiple devices could have device which is
> unused by the user unbound from vfio-pci and re-bound to a native host
> driver.  We BUG_ON when this occurs to protect the data.

I think, marking vfio_ap as experimental is probably a good idea. I think
we can do these enhancements later. Of course, having everything right away
is the best, but the problem with that is that it takes (more) time.

> 
> Probably also need to evaluate updates to
> Documentation/ABI/testing/sysfs-bus-vfio-mdev, especially if libvirt is
> supposed to interact with an 'activate' attribute.  Thanks,
> 

I'm not sure documenting the 'activate' in Documentation/ABI/testing/sysfs-bus-vfio-mdev
is necessary. I see it more like the other type specific attributes. For
instance drivers/gpu/drm/i915/gvt/kvmgt.c seems to have vgpu_id and
https://docs.nvidia.com/grid/latest/grid-vgpu-user-guide/index.html#setting-vgpu-plugin-parameters-on-red-hat-el-kvm
suggests that Nvidia vgpus have a vgpu_params (read-write) attribute to
'control the behavior of the vGPU, such as the frame rate limiter...'.

AFAIK the activate and the assign/unassign attributes are because we can
not do much with homogeneous vfio_ap mdevs. The only safe default for
a vfio_ap mdev is all zero masks, that can't use any resources, that
is essentially useless. And this is how we construct the vfio_ap mdev
devices. In that sense we could as well construct them 'activated',
as empty won't result in any kind of conflicts (and allow the admin
to de-activate these so we are still flexible about the lifecycle
management).

Regards,
Halil




>
Cornelia Huck Aug. 1, 2018, 11:27 a.m. UTC | #3
On Wed, 1 Aug 2018 13:12:05 +0200
Halil Pasic <pasic@linux.ibm.com> wrote:

> On 07/27/2018 07:44 PM, Alex Williamson wrote:
> > On Thu, 26 Jul 2018 21:54:29 +0200
> > Christian Borntraeger <borntraeger@de.ibm.com> wrote:
> > ...  
> >> +The process for reserving an AP queue for use by a KVM guest is:
> >> +
> >> +* The vfio-ap driver during its initialization will perform the following:
> >> +  * Create the 'vfio_ap' root device - /sys/devices/virtual/misc/vfio_ap
> >> +  * Create the 'matrix' device in the 'vfio_ap' root
> >> +  * Register the matrix device with the device core
> >> +* Register with the ap_bus for AP queue devices of type 10 devices (CEX4 and
> >> +  newer) and to provide the vfio_ap driver's probe and remove callback
> >> +  interfaces. The reason why older devices are not supported is because there
> >> +  are no systems available on which to test.
> >> +* The admin needs to reserve the AP Queue for vfio_ap driver. This can be
> >> +  done by writing the AP adapter mask to /sys/bus/ap/apmask and the AP domain
> >> +  mask to /sys/bus/ap/aqmask.
> >> +
> >> +  For example to reserve all the AP Queues on the system for vfio_ap driver:
> >> +
> >> +  echo 0x0000000000000000000000000000000000000000000000000000000000000000 > /sys/bus/ap/apmask
> >> +  echo 0x0000000000000000000000000000000000000000000000000000000000000000 > /sys/bus/ap/aqmask  
> > 
> > And this is a reasonable user syntax? ;)
> >   
> 
> We intend to provide tooling that makes this more human friendly. BTW
> echo 0 > sys/bus/ap/apmask would also work -- and looks less scary.

This should be reasonably configurable even without tooling. If you
want to have any APQNs reserved to the default drivers, you'll end up
counting positions, and with the mask lengths here, it's like
configuring zfcp, only worse.

See also my comments to the patch introducing the masks.
Halil Pasic Aug. 1, 2018, 12:15 p.m. UTC | #4
On 08/01/2018 01:27 PM, Cornelia Huck wrote:
> On Wed, 1 Aug 2018 13:12:05 +0200
> Halil Pasic <pasic@linux.ibm.com> wrote:
> 
>> On 07/27/2018 07:44 PM, Alex Williamson wrote:
>>> On Thu, 26 Jul 2018 21:54:29 +0200
>>> Christian Borntraeger <borntraeger@de.ibm.com> wrote:
>>> ...
>>>> +The process for reserving an AP queue for use by a KVM guest is:
>>>> +
>>>> +* The vfio-ap driver during its initialization will perform the following:
>>>> +  * Create the 'vfio_ap' root device - /sys/devices/virtual/misc/vfio_ap
>>>> +  * Create the 'matrix' device in the 'vfio_ap' root
>>>> +  * Register the matrix device with the device core
>>>> +* Register with the ap_bus for AP queue devices of type 10 devices (CEX4 and
>>>> +  newer) and to provide the vfio_ap driver's probe and remove callback
>>>> +  interfaces. The reason why older devices are not supported is because there
>>>> +  are no systems available on which to test.
>>>> +* The admin needs to reserve the AP Queue for vfio_ap driver. This can be
>>>> +  done by writing the AP adapter mask to /sys/bus/ap/apmask and the AP domain
>>>> +  mask to /sys/bus/ap/aqmask.
>>>> +
>>>> +  For example to reserve all the AP Queues on the system for vfio_ap driver:
>>>> +
>>>> +  echo 0x0000000000000000000000000000000000000000000000000000000000000000 > /sys/bus/ap/apmask
>>>> +  echo 0x0000000000000000000000000000000000000000000000000000000000000000 > /sys/bus/ap/aqmask
>>>
>>> And this is a reasonable user syntax? ;)
>>>    
>>
>> We intend to provide tooling that makes this more human friendly. BTW
>> echo 0 > sys/bus/ap/apmask would also work -- and looks less scary.
> 
> This should be reasonably configurable even without tooling. If you
> want to have any APQNs reserved to the default drivers, you'll end up
> counting positions, and with the mask lengths here, it's like
> configuring zfcp, only worse.
> 
> See also my comments to the patch introducing the masks.
> 

Yes, I've seen those comments. I did not feel like meddling in. These
questions are best discussed with Harald. Harald is still on vacation
however. My opinion is that both approaches (bitmap and list-based
whatever) have their pros and cons.

Halil
Alex Williamson Aug. 1, 2018, 6:08 p.m. UTC | #5
On Wed, 1 Aug 2018 13:12:05 +0200
Halil Pasic <pasic@linux.ibm.com> wrote:

> On 07/27/2018 07:44 PM, Alex Williamson wrote:
> > On Thu, 26 Jul 2018 21:54:29 +0200
> > Christian Borntraeger <borntraeger@de.ibm.com> wrote:
> > ...  
> >> +The process for reserving an AP queue for use by a KVM guest is:
> >> +
> >> +* The vfio-ap driver during its initialization will perform the following:
> >> +  * Create the 'vfio_ap' root device - /sys/devices/virtual/misc/vfio_ap
> >> +  * Create the 'matrix' device in the 'vfio_ap' root
> >> +  * Register the matrix device with the device core
> >> +* Register with the ap_bus for AP queue devices of type 10 devices (CEX4 and
> >> +  newer) and to provide the vfio_ap driver's probe and remove callback
> >> +  interfaces. The reason why older devices are not supported is because there
> >> +  are no systems available on which to test.
> >> +* The admin needs to reserve the AP Queue for vfio_ap driver. This can be
> >> +  done by writing the AP adapter mask to /sys/bus/ap/apmask and the AP domain
> >> +  mask to /sys/bus/ap/aqmask.
> >> +
> >> +  For example to reserve all the AP Queues on the system for vfio_ap driver:
> >> +
> >> +  echo 0x0000000000000000000000000000000000000000000000000000000000000000 > /sys/bus/ap/apmask
> >> +  echo 0x0000000000000000000000000000000000000000000000000000000000000000 > /sys/bus/ap/aqmask  
> > 
> > And this is a reasonable user syntax? ;)
> >   
> 
> We intend to provide tooling that makes this more human friendly. BTW
> echo 0 > sys/bus/ap/apmask would also work -- and looks less scary.

If tooling is required, perhaps the interface is wrong by design.
Maybe the interface should accept a list of bits, for example:

echo 255,0 > /sys/bus/ap/aqmask

vs

echo 0x8000000000000000000000000000000000000000000000000000000000000001 > /sys/bus/ap/aqmask

> > ...  
> >> +  * mdev_attr_groups
> >> +    This attribute group identifies the user-defined sysfs attributes of the
> >> +    mediated device. When a device is registered with the VFIO mediated device
> >> +    framework, the sysfs attributes files identified in the 'mdev_attr_groups'
> >> +    structure will be created in the mediated matrix device's directory. The
> >> +    sysfs attributes for a mediated matrix device are:
> >> +    * assign_adapter:
> >> +      A write-only file for assigning an AP adapter to the mediated matrix
> >> +      device. To assign an adapter, the APID of the adapter is written to the
> >> +      file.
> >> +    * assign_domain:
> >> +      A write-only file for assigning an AP usage domain to the mediated matrix
> >> +      device. To assign a domain, the APQI of the AP queue corresponding to a
> >> +      usage domain is written to the file.
> >> +    * matrix:
> >> +      A read-only file for displaying the APQNs derived from the adapters and
> >> +      domains assigned to the mediated matrix device.
> >> +    * assign_control_domain:
> >> +      A write-only file for assigning an AP control domain to the mediated
> >> +      matrix device. To assign a control domain, the ID of a domain to be
> >> +      controlled is written to the file. For the initial implementation, the set
> >> +      of control domains will always include the set of usage domains, so it is
> >> +      only necessary to assign control domains that are not also assigned as
> >> +      usage domains.  
> > 
> > 
> > How will the user know when this changes?  What's the transition plan
> > to maintain compatibility when it does change?
> >   
> 
> I don't think this is supposed to change under the users feet. This document
> probably started out as a design document. While I'm personally in favor of
> managing the two separate, I think we need to commit to either-or and
> get rid of 'For the initial implementation,'.

+1

> > ...  
> >> +4. The administrator now needs to configure the matrixes for mediated
> >> +   devices $uuid1 (for Guest1) and $uuid2 (for Guest2).
> >> +
> >> +   This is how the matrix is configured for Guest1:
> >> +
> >> +   echo 5 > assign_adapter
> >> +   echo 6 > assign_adapter
> >> +   echo 4 > assign_domain
> >> +   echo 0xab > assign_domain
> >> +
> >> +   For this implementation, all usage domains - i.e., domains assigned
> >> +   via the assign_domain attribute file - will also be configured in the ADM
> >> +   field of the KVM guest's CRYCB, so there is no need to assign control
> >> +   domains here unless you want to assign control domains that are not
> >> +   assigned as usage domains.
> >> +
> >> +   If a mistake is made configuring an adapter, domain or control domain,
> >> +   you can use the unassign_xxx files to unassign the adapter, domain or
> >> +   control domain.  
> > 
> > Would it be more consistent with other sysfs entries to call these
> > "bind" and "unbind" rather than "assign" and "unassign"?
> >   
> 
> I'm not sure. Let me recap what is this about. First we partition our AP
> resources (queues) into 'to be used by the host' and 'to be used by the
> guests' using the /sys/bus/ap/a[pq]mask. Then we sub-partition 'to be
> used by the guests' among the guests. Each vfio ap_matrix represents a
> partition that corresponds to a single VM (guest). Such a partition can
> be seen as a device that grants access (usage or administration) to a set
> of queues. We can grant access to queues even if these are not currently
> available in the system. E.g. we can partition an AP card that is
> currently not online or even plugged. When The resources become available
> (e.g. new card plugged, or just 'configured'), the resources (should)
> become available at all levels if the authorization is right. I know
> 'bind' binding a device (that currently exists within the system) to a
> driver. The vfio_ap stuff seems different enough to seek a different name
> for it.

As discussed elsewhere, I'm not in favor of allowing mdev devices to
exist which rely on resources that are not present or in-use
elsewhere.  If we remove that concept, then we are actually binding and
unbinding resources to the mdev device.

> >> +
> >> +   To display the matrix configuration for Guest1:
> >> +
> >> +   cat matrix
> >> +
> >> +   This is how the matrix is configured for Guest2:
> >> +
> >> +   echo 5 > assign_adapter
> >> +   echo 0x47 > assign_domain
> >> +   echo 0xff > assign_domain
> >> +
> >> +5. The adminstrator now needs to activate the mediated devices.
> >> +   echo 1 > activate  
> > 
> > Or optionally not.  As in reply to cover letter, I don't think this
> > interface is sufficiently justified, or necessarily desirable.
> >   
> 
> You are right. Activation is not necessary because the open will try to
> auto-activate it. But honestly, I would recommend activating it in
> advance, so the admin can be sure the resources are claimed.

This can also be solved by claiming the resource at the point at which
it is bound/assigned/attached to the mdev device (ie. at any point
where we introduce new intersections in the matrix).

> In our current design we choose to give the administrator the freedom, to
> decide on his policy how does he want to manage the vfio_ap mdevs
> lifecycle. And  to decide what guarantees does he want to have at a given
> point in time. With 'activate' both 'dynamically and on-demand of
> starting a VM' and 'create known vfio_ap mdevs at system bring up' are
> viable. If there were a consensus on how the life-cycle of the mdev
> devices is to be managed, it would be easier to not give this freedom to
> the admin.

Please, this is not a freedom of the admin issue, this simply defines
the point in time at which bound/assigned/attached resources are
committed to the mdev device.  In the model proposed here, the activate
attribute allows resources to be over-committed, placing the burden on
the management tools to implicitly understand which mdev devices can be
used simultaneously.  With the existing model, resources are committed
on creation.  I'll agree that such a model doesn't necessarily apply to
these new devices, but I'd argue the next logical step is that
resources are committed as they're attached such that the mdev device
is always in a usable state.  AFAICT, the same "freedom" is available
to the admin by simply scripting mdev creation and assembly without a
change to the mdev interface.

> >> +6. Start Guest1:
> >> +
> >> +   /usr/bin/qemu-system-s390x ... -cpu xxx,ap=on,apft=on \
> >> +      -device vfio-ap,sysfsdev=/sys/devices/virtual/misc/vfio_ap/mdev_supported_types/vfio_ap-passthrough/devices/$uuid1 ...
> >> +
> >> +7. Start Guest2:
> >> +
> >> +   /usr/bin/qemu-system-s390x ... -cpu xxx,ap=on,apft=on \
> >> +      -device vfio-ap,sysfsdev=/sys/devices/virtual/misc/vfio_ap/mdev_supported_types/vfio_ap-passthrough/devices/$uuid2 ...
> >> +
> >> +When the guest is shut down, the mediated matrix device may be removed.
> >> +
> >> +Using our example again, to remove the mediated matrix device $uuid1:
> >> +
> >> +   /sys/devices/virtual/misc
> >> +      --- [vfio_ap]
> >> +      --------- [mdev_supported_types]
> >> +      ------------ [vfio_ap-passthrough]
> >> +      --------------- [devices]
> >> +      ------------------ [$uuid1]
> >> +      --------------------- activate
> >> +      --------------------- remove
> >> +
> >> +
> >> +   echo 0 > activate
> >> +   echo 1 > remove
> >> +
> >> +   This will release all the AP queues for the mediated device and
> >> +   remove all of the mdev matrix device's sysfs structures. To
> >> +   recreate and reconfigure the mdev matrix device, all of the steps starting
> >> +   with step 4 will have to be performed again.
> >> +
> >> +   It is not necessary to remove an mdev matrix device, but one may want to
> >> +   remove it if no guest will use it during the lifetime of the linux host. If
> >> +   the mdev matrix device is removed, one may want to unbind the AP queues the
> >> +   guest was using from the vfio_ap device driver and bind them back to the
> >> +   default driver. Alternatively, the AP queues can be configured for another
> >> +   mdev matrix (i.e., guest). In either case, one must take care to change the
> >> +   secure key configured for the domain to which the queue is connected.  
> > 
> > This seems prime for data leakage, extremely sensitive data at that.
> > Who's responsibility is it to prevent this?  Shouldn't closing the
> > device reset the device, which should perhaps wipe any key
> > configuration?
> >   
> 
> The documentation seems to be a bit outdated here. We do reset the device
> in the release callback and on VFIO_DEVICE_RESET ioctl. There might be some
> stuff that is not supposed to be reset by us, but that is fine (master keys
> managed via special infrastructure called the TKE). I'm not sure what is the
> meaning of last sentence of the paragraph in question  ('change the secure
> key configured ...') though. Anyway we will have to update this.
> 
> >> +
> >> +
> >> +Limitations
> >> +===========
> >> +An admin needs to be careful when writing to sysfs files
> >> +/sys/bus/ap/apmask and /sys/bus/ap/aqmask. These files control the driver
> >> +to which an AP queue is bound to. Once an AP queue is bound vfio_ap
> >> +driver and assigned to a mediated device, admin should not re-assign the
> >> +AP queues back to the default driver, because of technical limitations.
> >> +The kernel does not prevent you from re-assigning from AP queues reserved
> >> +for the vfio_ap driver back to default driver.  Future enhancements in
> >> +the ap_bus and vfio_ap are likely to introduce complete support for such
> >> +dynamic reconfiguration. But until then be extremely careful.  
> > 
> > Why don't we have these enhancements now?  Should the whole thing be
> > marked experimental without them?  This sounds similar to a vfio-pci
> > case where a group with multiple devices could have device which is
> > unused by the user unbound from vfio-pci and re-bound to a native host
> > driver.  We BUG_ON when this occurs to protect the data.  
> 
> I think, marking vfio_ap as experimental is probably a good idea. I think
> we can do these enhancements later. Of course, having everything right away
> is the best, but the problem with that is that it takes (more) time.
> 
> > 
> > Probably also need to evaluate updates to
> > Documentation/ABI/testing/sysfs-bus-vfio-mdev, especially if libvirt is
> > supposed to interact with an 'activate' attribute.  Thanks,
> >   
> 
> I'm not sure documenting the 'activate' in Documentation/ABI/testing/sysfs-bus-vfio-mdev
> is necessary. I see it more like the other type specific attributes. For
> instance drivers/gpu/drm/i915/gvt/kvmgt.c seems to have vgpu_id and
> https://docs.nvidia.com/grid/latest/grid-vgpu-user-guide/index.html#setting-vgpu-plugin-parameters-on-red-hat-el-kvm
> suggests that Nvidia vgpus have a vgpu_params (read-write) attribute to
> 'control the behavior of the vGPU, such as the frame rate limiter...'.

I disagree, the difference is that the existing vendor attributes can
modify the behavior of the mdev device, but are not part of the
standard mdev sysfs ABI.  You recommend above that 'activate' should be
used prior to open(2) in order to commit mdev resources.  That's a
fundamental change in the management of the device, which is only
obscured by the fact that open(2) does an implicit activate.  Let's
please just drop this whole notion of 'activate'.

> AFAIK the activate and the assign/unassign attributes are because we can
> not do much with homogeneous vfio_ap mdevs. The only safe default for
> a vfio_ap mdev is all zero masks, that can't use any resources, that
> is essentially useless. And this is how we construct the vfio_ap mdev
> devices. In that sense we could as well construct them 'activated',
> as empty won't result in any kind of conflicts (and allow the admin
> to de-activate these so we are still flexible about the lifecycle
> management).

I agree that a zero mask is the only safe default for creation and that
vendor specific attributes are necessary to build the configuration up
to something that provides some usable value (unless we decide to
expand the create interface).  As above though, I disagree about what
'activate' does or does not provide.  I don't see that it adds and
flexibility in lifecycle management, at least not in a way that's
compatible with existing mdev devices or in a way to facilitates upper
level software management.  It seems to only allow the admin to avoid
dynamically creating and attaching resources to mdevs, at the cost of
manage layers needing to account for over-commitment.  Thanks,

Alex
Anthony Krowiak Aug. 1, 2018, 11:05 p.m. UTC | #6
On 07/27/2018 01:44 PM, Alex Williamson wrote:
> On Thu, 26 Jul 2018 21:54:29 +0200
> Christian Borntraeger <borntraeger@de.ibm.com> wrote:
> ...
>> +The process for reserving an AP queue for use by a KVM guest is:
>> +
>> +* The vfio-ap driver during its initialization will perform the following:
>> +  * Create the 'vfio_ap' root device - /sys/devices/virtual/misc/vfio_ap
>> +  * Create the 'matrix' device in the 'vfio_ap' root
>> +  * Register the matrix device with the device core
>> +* Register with the ap_bus for AP queue devices of type 10 devices (CEX4 and
>> +  newer) and to provide the vfio_ap driver's probe and remove callback
>> +  interfaces. The reason why older devices are not supported is because there
>> +  are no systems available on which to test.
>> +* The admin needs to reserve the AP Queue for vfio_ap driver. This can be
>> +  done by writing the AP adapter mask to /sys/bus/ap/apmask and the AP domain
>> +  mask to /sys/bus/ap/aqmask.
>> +
>> +  For example to reserve all the AP Queues on the system for vfio_ap driver:
>> +
>> +  echo 0x0000000000000000000000000000000000000000000000000000000000000000 > /sys/bus/ap/apmask
>> +  echo 0x0000000000000000000000000000000000000000000000000000000000000000 > /sys/bus/ap/aqmask
> And this is a reasonable user syntax? ;)

I am not a fan of writing a 256-bit bitmask as such. It is entirely too 
difficult to
figure out which character to specify to indicate a bit number to be set 
for anything
past the first few characters. Of course, tooling could be used to 
accomplish this
task, but the sysfs interfaces should probably be user-friendly. A 
comma-separated
list could be used; for example to set reserve adapters 21, 99 and 249:

echo 21,99,249 > /sys/bus/ap/apmask

The problem with this is how do we indicate that all bits are to be 
cleared? Do you
have any suggestions for a reasonable user syntax?

>
> ...
>> +  * mdev_attr_groups
>> +    This attribute group identifies the user-defined sysfs attributes of the
>> +    mediated device. When a device is registered with the VFIO mediated device
>> +    framework, the sysfs attributes files identified in the 'mdev_attr_groups'
>> +    structure will be created in the mediated matrix device's directory. The
>> +    sysfs attributes for a mediated matrix device are:
>> +    * assign_adapter:
>> +      A write-only file for assigning an AP adapter to the mediated matrix
>> +      device. To assign an adapter, the APID of the adapter is written to the
>> +      file.
>> +    * assign_domain:
>> +      A write-only file for assigning an AP usage domain to the mediated matrix
>> +      device. To assign a domain, the APQI of the AP queue corresponding to a
>> +      usage domain is written to the file.
>> +    * matrix:
>> +      A read-only file for displaying the APQNs derived from the adapters and
>> +      domains assigned to the mediated matrix device.
>> +    * assign_control_domain:
>> +      A write-only file for assigning an AP control domain to the mediated
>> +      matrix device. To assign a control domain, the ID of a domain to be
>> +      controlled is written to the file. For the initial implementation, the set
>> +      of control domains will always include the set of usage domains, so it is
>> +      only necessary to assign control domains that are not also assigned as
>> +      usage domains.
>
> How will the user know when this changes?  What's the transition plan
> to maintain compatibility when it does change?

I'm sorry, I don't understand the question. When you say user, about 
whom are
you talking; the person doing the assignments? What changes are you talking
about?

>
> ...
>> +4. The administrator now needs to configure the matrixes for mediated
>> +   devices $uuid1 (for Guest1) and $uuid2 (for Guest2).
>> +
>> +   This is how the matrix is configured for Guest1:
>> +
>> +   echo 5 > assign_adapter
>> +   echo 6 > assign_adapter
>> +   echo 4 > assign_domain
>> +   echo 0xab > assign_domain
>> +
>> +   For this implementation, all usage domains - i.e., domains assigned
>> +   via the assign_domain attribute file - will also be configured in the ADM
>> +   field of the KVM guest's CRYCB, so there is no need to assign control
>> +   domains here unless you want to assign control domains that are not
>> +   assigned as usage domains.
>> +
>> +   If a mistake is made configuring an adapter, domain or control domain,
>> +   you can use the unassign_xxx files to unassign the adapter, domain or
>> +   control domain.
> Would it be more consistent with other sysfs entries to call these
> "bind" and "unbind" rather than "assign" and "unassign

Aren't the bind and unbind sysfs interfaces typically associated with
binding devices to and unbinding devices from a device driver? In this
case we are talking about configuring a mediated device's AP matrix - i.e.,
setting bits identifying the adapters, domains and control domains defined
for the mediated device (i.e., a guest). Maybe there are better words we
could use than assign/unassign, but I find the use of bind/unbind to be
confusing. I'm open to suggestions.

>
>> +
>> +   To display the matrix configuration for Guest1:
>> +
>> +   cat matrix
>> +
>> +   This is how the matrix is configured for Guest2:
>> +
>> +   echo 5 > assign_adapter
>> +   echo 0x47 > assign_domain
>> +   echo 0xff > assign_domain
>> +
>> +5. The adminstrator now needs to activate the mediated devices.
>> +   echo 1 > activate
> Or optionally not.  As in reply to cover letter, I don't think this
> interface is sufficiently justified, or necessarily desirable.

To clarify your objection, let me state what I think you are saying.

1. You do not think there is a good reason to assign duplicate APQNs
    - i.e., the cross product of all adapter IDs and domain IDs assigned
    to the mdev.

2. All mdevs should be validly configured for use regardless of when
    guests using them are started. In other words, libvirt should be
    able to depend on the fact that an mdev is ready for use at all
    times.

I will discuss this with the team when we meet on Thursday and get
back to you. If I am mistaken in my analysis of your concern, please
clarify.

I would like to mention that when a guest using an mdev starts,
the mdev will be activated if not already activated. The guest
will not start if another activated mdev is using an APQN assigned
to the mdev of the guest being started.

>
>> +6. Start Guest1:
>> +
>> +   /usr/bin/qemu-system-s390x ... -cpu xxx,ap=on,apft=on \
>> +      -device vfio-ap,sysfsdev=/sys/devices/virtual/misc/vfio_ap/mdev_supported_types/vfio_ap-passthrough/devices/$uuid1 ...
>> +
>> +7. Start Guest2:
>> +
>> +   /usr/bin/qemu-system-s390x ... -cpu xxx,ap=on,apft=on \
>> +      -device vfio-ap,sysfsdev=/sys/devices/virtual/misc/vfio_ap/mdev_supported_types/vfio_ap-passthrough/devices/$uuid2 ...
>> +
>> +When the guest is shut down, the mediated matrix device may be removed.
>> +
>> +Using our example again, to remove the mediated matrix device $uuid1:
>> +
>> +   /sys/devices/virtual/misc
>> +      --- [vfio_ap]
>> +      --------- [mdev_supported_types]
>> +      ------------ [vfio_ap-passthrough]
>> +      --------------- [devices]
>> +      ------------------ [$uuid1]
>> +      --------------------- activate
>> +      --------------------- remove
>> +
>> +
>> +   echo 0 > activate
>> +   echo 1 > remove
>> +
>> +   This will release all the AP queues for the mediated device and
>> +   remove all of the mdev matrix device's sysfs structures. To
>> +   recreate and reconfigure the mdev matrix device, all of the steps starting
>> +   with step 4 will have to be performed again.
>> +
>> +   It is not necessary to remove an mdev matrix device, but one may want to
>> +   remove it if no guest will use it during the lifetime of the linux host. If
>> +   the mdev matrix device is removed, one may want to unbind the AP queues the
>> +   guest was using from the vfio_ap device driver and bind them back to the
>> +   default driver. Alternatively, the AP queues can be configured for another
>> +   mdev matrix (i.e., guest). In either case, one must take care to change the
>> +   secure key configured for the domain to which the queue is connected.
> This seems prime for data leakage, extremely sensitive data at that.
> Who's responsibility is it to prevent this?  Shouldn't closing the
> device reset the device, which should perhaps wipe any key
> configuration?

The device is zeroized (reset) when it is removed. The zeroize 
instruction - to
the best of my knowledge - does nothing with the secure key for a given 
queue.
There is also no instruction that I know of to clear keys, so it would 
appear
that this is left to the system administrator.

>
>> +
>> +
>> +Limitations
>> +===========
>> +An admin needs to be careful when writing to sysfs files
>> +/sys/bus/ap/apmask and /sys/bus/ap/aqmask. These files control the driver
>> +to which an AP queue is bound to. Once an AP queue is bound vfio_ap
>> +driver and assigned to a mediated device, admin should not re-assign the
>> +AP queues back to the default driver, because of technical limitations.
>> +The kernel does not prevent you from re-assigning from AP queues reserved
>> +for the vfio_ap driver back to default driver.  Future enhancements in
>> +the ap_bus and vfio_ap are likely to introduce complete support for such
>> +dynamic reconfiguration. But until then be extremely careful.
> Why don't we have these enhancements now?  Should the whole thing be
> marked experimental without them?  This sounds similar to a vfio-pci
> case where a group with multiple devices could have device which is
> unused by the user unbound from vfio-pci and re-bound to a native host
> driver.  We BUG_ON when this occurs to protect the data.

We have customers that have been waiting a long time and are very anxious
to have guest access to AP devices. In the interest of reducing time to
market, we have decided to simplify the initial implementation as much
as possible. I wouldn't necessarily call this experimental, but view it
more as the minimal viable product.

As far as BUG_ON, I will discuss this with the crypto team at our
meeting on Thursday and get back to you.

>
> Probably also need to evaluate updates to
> Documentation/ABI/testing/sysfs-bus-vfio-mdev, especially if libvirt is
> supposed to interact with an 'activate' attribute.  Thanks,

As it stands, libvirt will not create or modify the attributes of an mdev.
If a guest is started under libvirt and its mdev has not been activated,
it will be activated when the mdev fd is opened. Of course, if another
activated mdev is assigned an APQN assigned to the mdev of the guest
being started, the guest will be terminated.

>
> Alex
>
Halil Pasic Aug. 2, 2018, 2:56 p.m. UTC | #7
On 08/01/2018 08:08 PM, Alex Williamson wrote:
> On Wed, 1 Aug 2018 13:12:05 +0200
> Halil Pasic <pasic@linux.ibm.com> wrote:
> 
>> On 07/27/2018 07:44 PM, Alex Williamson wrote:
>>> On Thu, 26 Jul 2018 21:54:29 +0200
>>> Christian Borntraeger <borntraeger@de.ibm.com> wrote:
>>> ...
>>>> +The process for reserving an AP queue for use by a KVM guest is:
>>>> +
>>>> +* The vfio-ap driver during its initialization will perform the following:
>>>> +  * Create the 'vfio_ap' root device - /sys/devices/virtual/misc/vfio_ap
>>>> +  * Create the 'matrix' device in the 'vfio_ap' root
>>>> +  * Register the matrix device with the device core
>>>> +* Register with the ap_bus for AP queue devices of type 10 devices (CEX4 and
>>>> +  newer) and to provide the vfio_ap driver's probe and remove callback
>>>> +  interfaces. The reason why older devices are not supported is because there
>>>> +  are no systems available on which to test.
>>>> +* The admin needs to reserve the AP Queue for vfio_ap driver. This can be
>>>> +  done by writing the AP adapter mask to /sys/bus/ap/apmask and the AP domain
>>>> +  mask to /sys/bus/ap/aqmask.
>>>> +
>>>> +  For example to reserve all the AP Queues on the system for vfio_ap driver:
>>>> +
>>>> +  echo 0x0000000000000000000000000000000000000000000000000000000000000000 > /sys/bus/ap/apmask
>>>> +  echo 0x0000000000000000000000000000000000000000000000000000000000000000 > /sys/bus/ap/aqmask
>>>
>>> And this is a reasonable user syntax? ;)
>>>    
>>
>> We intend to provide tooling that makes this more human friendly. BTW
>> echo 0 > sys/bus/ap/apmask would also work -- and looks less scary.
> 
> If tooling is required, perhaps the interface is wrong by design.
> Maybe the interface should accept a list of bits, for example:
> 
> echo 255,0 > /sys/bus/ap/aqmask
> 
> vs
> 
> echo 0x8000000000000000000000000000000000000000000000000000000000000001 > /sys/bus/ap/aqmask
> 

Sorry I'm not the right person to have this discussion with. Just wanted to
provide some background as someone who was there when this was discussed internally.
Harald should be back next week.

[..]

>>> Would it be more consistent with other sysfs entries to call these
>>> "bind" and "unbind" rather than "assign" and "unassign"?
>>>    
>>
>> I'm not sure. Let me recap what is this about. First we partition our AP
>> resources (queues) into 'to be used by the host' and 'to be used by the
>> guests' using the /sys/bus/ap/a[pq]mask. Then we sub-partition 'to be
>> used by the guests' among the guests. Each vfio ap_matrix represents a
>> partition that corresponds to a single VM (guest). Such a partition can
>> be seen as a device that grants access (usage or administration) to a set
>> of queues. We can grant access to queues even if these are not currently
>> available in the system. E.g. we can partition an AP card that is
>> currently not online or even plugged. When The resources become available
>> (e.g. new card plugged, or just 'configured'), the resources (should)
>> become available at all levels if the authorization is right. I know
>> 'bind' binding a device (that currently exists within the system) to a
>> driver. The vfio_ap stuff seems different enough to seek a different name
>> for it.
> 
> As discussed elsewhere, I'm not in favor of allowing mdev devices to
> exist which rely on resources that are not present or in-use
> elsewhere.  If we remove that concept, then we are actually binding and
> unbinding resources to the mdev device.
> 

I understand and respect your point. I'm torn between the mdev perspective
and the perspective given by the architecture. On s390x KVM is a second and
higher level hyperviseor, and I prefer if the things are consistent among
the different levels of virtualization (if possible). What I described (granting
and gaining access to AP resources that are not a part of the configuration at
a certain moment) is how things work at LPAR level.

In use elsewhere is a taboo of course, and we check for that. Here we are
completely on the same page. But I fail to see why 'not (yet) present but
guaranteed to be 'ours' once it shows up (if)' has to be excluded on
KVM level if it is valid on LPAR level. Can you help me understand that
aspect?

>>>> +
>>>> +   To display the matrix configuration for Guest1:
>>>> +
>>>> +   cat matrix
>>>> +
>>>> +   This is how the matrix is configured for Guest2:
>>>> +
>>>> +   echo 5 > assign_adapter
>>>> +   echo 0x47 > assign_domain
>>>> +   echo 0xff > assign_domain
>>>> +
>>>> +5. The adminstrator now needs to activate the mediated devices.
>>>> +   echo 1 > activate
>>>
>>> Or optionally not.  As in reply to cover letter, I don't think this
>>> interface is sufficiently justified, or necessarily desirable.
>>>    
>>
>> You are right. Activation is not necessary because the open will try to
>> auto-activate it. But honestly, I would recommend activating it in
>> advance, so the admin can be sure the resources are claimed.
> 
> This can also be solved by claiming the resource at the point at which
> it is bound/assigned/attached to the mdev device (ie. at any point
> where we introduce new intersections in the matrix).
> 

See below.

>> In our current design we choose to give the administrator the freedom, to
>> decide on his policy how does he want to manage the vfio_ap mdevs
>> lifecycle. And  to decide what guarantees does he want to have at a given
>> point in time. With 'activate' both 'dynamically and on-demand of
>> starting a VM' and 'create known vfio_ap mdevs at system bring up' are
>> viable. If there were a consensus on how the life-cycle of the mdev
>> devices is to be managed, it would be easier to not give this freedom to
>> the admin.
> 
> Please, this is not a freedom of the admin issue, this simply defines
> the point in time at which bound/assigned/attached resources are
> committed to the mdev device.  In the model proposed here, the activate
> attribute allows resources to be over-committed, placing the burden on
> the management tools to implicitly understand which mdev devices can be
> used simultaneously. 

We have never seen this as the responsibility of the management software. The
idea was that this would be another thing were management software does not have
complete understanding. If the admin decides to do things that way activating
the vfio_ap mdev was just supposed to fail.

> With the existing model, resources are committed
> on creation.  I'll agree that such a model doesn't necessarily apply to
> these new devices, but I'd argue the next logical step is that
> resources are committed as they're attached such that the mdev device
> is always in a usable state.

Having the device in a not usable state is indeed very ugly. When
we were discussing this internally I was arguing for an atomic bulk
update of the matrix that either succeeds or fails. Makes things easier
to think about. If we go by the acquire the diff on each assign/(bind)
operation we kind of have an transaction to get in the desired state.
Deadlocks may happen if no protocol is established, and similar.

Please also consider we don't add one resource at a time. If we have the
the adapters 0, 1 and 2 assigned, but no domains, and we assign domain
0 we 'get' not one but three queues.

The 'activate' is a quite new, and in my opinion somewhat messy concept
that was proposed during the workshop mentioned by Christian and introduced
in this round.

One of the things that is associated with 'activate' is kind of an 'edit'
and 'commit/save' workflow. Especially thinking about changing the authorizations
(the matrix) for a running VM (which is not supported yet), not being
able to do it in one go does not sound like a good idea to me. But that would
effectively mean that we would have to remain functional with the 'old' set
of resources (initially empty). The we could maintain device is operational
with the claimed  resources an invariant.

> AFAICT, the same "freedom" is available
> to the admin by simply scripting mdev creation and assembly without a
> change to the mdev interface.
> 

You mean the qemu hooks in libvirt I guess. I don't quite understand. AFAIU
if we use the hooks we commit to populate (and create) the vfio_ap mdevs
dynamically on demand.

AFAIU we can not have all three of the following at the same time
1) mutually overlapping definitions of vfio_ap devcies
2) construct and populate mdevs according to each of these definitions
    e.g at system bring up time.
3) Maintain all 'resources claimed' as an invariant for a vfio_ap mdev.

Your proposal resolves this by compromising on: not creating and populating
a vfio_ap device for each definition, but only for the ones that are actually
needed. I used to argue along the same lines, but I'm not sure how acceptable
is this hooks based solution to our management software people.

[..]

>>
>> I'm not sure documenting the 'activate' in Documentation/ABI/testing/sysfs-bus-vfio-mdev
>> is necessary. I see it more like the other type specific attributes. For
>> instance drivers/gpu/drm/i915/gvt/kvmgt.c seems to have vgpu_id and
>> https://docs.nvidia.com/grid/latest/grid-vgpu-user-guide/index.html#setting-vgpu-plugin-parameters-on-red-hat-el-kvm
>> suggests that Nvidia vgpus have a vgpu_params (read-write) attribute to
>> 'control the behavior of the vGPU, such as the frame rate limiter...'.
> 
> I disagree, the difference is that the existing vendor attributes can
> modify the behavior of the mdev device, but are not part of the
> standard mdev sysfs ABI.  You recommend above that 'activate' should be
> used prior to open(2) in order to commit mdev resources.  That's a
> fundamental change in the management of the device, which is only
> obscured by the fact that open(2) does an implicit activate.

I agree, you have a point. Requiring to do an activate before doing an
open is a heavy interference with the mdev ABI. And making the open
fail is ugly indeed.

> Let's please just drop this whole notion of 'activate'.
> 
>> AFAIK the activate and the assign/unassign attributes are because we can
>> not do much with homogeneous vfio_ap mdevs. The only safe default for
>> a vfio_ap mdev is all zero masks, that can't use any resources, that
>> is essentially useless. And this is how we construct the vfio_ap mdev
>> devices. In that sense we could as well construct them 'activated',
>> as empty won't result in any kind of conflicts (and allow the admin
>> to de-activate these so we are still flexible about the lifecycle
>> management).
> 
> I agree that a zero mask is the only safe default for creation and that
> vendor specific attributes are necessary to build the configuration up
> to something that provides some usable value (unless we decide to
> expand the create interface).  As above though, I disagree about what
> 'activate' does or does not provide.  I don't see that it adds and
> flexibility in lifecycle management, at least not in a way that's
> compatible with existing mdev devices or in a way to facilitates upper
> level software management.  It seems to only allow the admin to avoid
> dynamically creating and attaching resources to mdevs, at the cost of
> manage layers needing to account for over-commitment.  Thanks,

I tried to clarify my views on some of these points (responsibility of
management layers regarding over-commitment, and the flexibility in lifecycle
management). The truth is, I used to argue against 'activate', and I would
be very happy to forget about it if we can come up with something better.

It's a bit unfortunate that I will be on a vacation (I'm already on
vacation) for some time. I will try to stay tuned though. Thanks for the
nice and productive discussion. We (and especially I) learned a lot about your
take on the problem.

Cheers,
Halil
Alex Williamson Aug. 2, 2018, 3:25 p.m. UTC | #8
On Wed, 1 Aug 2018 19:05:35 -0400
Tony Krowiak <akrowiak@linux.ibm.com> wrote:

> On 07/27/2018 01:44 PM, Alex Williamson wrote:
> > On Thu, 26 Jul 2018 21:54:29 +0200
> > Christian Borntraeger <borntraeger@de.ibm.com> wrote:
> > ...  
> >> +The process for reserving an AP queue for use by a KVM guest is:
> >> +
> >> +* The vfio-ap driver during its initialization will perform the following:
> >> +  * Create the 'vfio_ap' root device - /sys/devices/virtual/misc/vfio_ap
> >> +  * Create the 'matrix' device in the 'vfio_ap' root
> >> +  * Register the matrix device with the device core
> >> +* Register with the ap_bus for AP queue devices of type 10 devices (CEX4 and
> >> +  newer) and to provide the vfio_ap driver's probe and remove callback
> >> +  interfaces. The reason why older devices are not supported is because there
> >> +  are no systems available on which to test.
> >> +* The admin needs to reserve the AP Queue for vfio_ap driver. This can be
> >> +  done by writing the AP adapter mask to /sys/bus/ap/apmask and the AP domain
> >> +  mask to /sys/bus/ap/aqmask.
> >> +
> >> +  For example to reserve all the AP Queues on the system for vfio_ap driver:
> >> +
> >> +  echo 0x0000000000000000000000000000000000000000000000000000000000000000 > /sys/bus/ap/apmask
> >> +  echo 0x0000000000000000000000000000000000000000000000000000000000000000 > /sys/bus/ap/aqmask  
> > And this is a reasonable user syntax? ;)  
> 
> I am not a fan of writing a 256-bit bitmask as such. It is entirely too 
> difficult to
> figure out which character to specify to indicate a bit number to be set 
> for anything
> past the first few characters. Of course, tooling could be used to 
> accomplish this
> task, but the sysfs interfaces should probably be user-friendly. A 
> comma-separated
> list could be used; for example to set reserve adapters 21, 99 and 249:
> 
> echo 21,99,249 > /sys/bus/ap/apmask
> 
> The problem with this is how do we indicate that all bits are to be 
> cleared? Do you
> have any suggestions for a reasonable user syntax?

Perhaps an empty write, just a carriage return, "null".  I'd shy away
from magic digits, like 256 to clear.

> > ...  
> >> +  * mdev_attr_groups
> >> +    This attribute group identifies the user-defined sysfs attributes of the
> >> +    mediated device. When a device is registered with the VFIO mediated device
> >> +    framework, the sysfs attributes files identified in the 'mdev_attr_groups'
> >> +    structure will be created in the mediated matrix device's directory. The
> >> +    sysfs attributes for a mediated matrix device are:
> >> +    * assign_adapter:
> >> +      A write-only file for assigning an AP adapter to the mediated matrix
> >> +      device. To assign an adapter, the APID of the adapter is written to the
> >> +      file.
> >> +    * assign_domain:
> >> +      A write-only file for assigning an AP usage domain to the mediated matrix
> >> +      device. To assign a domain, the APQI of the AP queue corresponding to a
> >> +      usage domain is written to the file.
> >> +    * matrix:
> >> +      A read-only file for displaying the APQNs derived from the adapters and
> >> +      domains assigned to the mediated matrix device.
> >> +    * assign_control_domain:
> >> +      A write-only file for assigning an AP control domain to the mediated
> >> +      matrix device. To assign a control domain, the ID of a domain to be
> >> +      controlled is written to the file. For the initial implementation, the set
> >> +      of control domains will always include the set of usage domains, so it is
> >> +      only necessary to assign control domains that are not also assigned as
> >> +      usage domains.  
> >
> > How will the user know when this changes?  What's the transition plan
> > to maintain compatibility when it does change?  
> 
> I'm sorry, I don't understand the question. When you say user, about 
> whom are
> you talking; the person doing the assignments? What changes are you talking
> about?

The user would be the person interaction with the sysfs attribute
assign_control_domain where it states "[f]or the initial implementation,
the set of control domains will always include the set of usage
domains, so it is only necessary to assign control domains that are not
also assigned as usage domains."  I infer from the usage of "initial
implementation" that this behavior may not always be the case and
therefore if it's not final what is the transition plan so that a user
interacting with this attribute can know the current behavior.

> > ...  
> >> +4. The administrator now needs to configure the matrixes for mediated
> >> +   devices $uuid1 (for Guest1) and $uuid2 (for Guest2).
> >> +
> >> +   This is how the matrix is configured for Guest1:
> >> +
> >> +   echo 5 > assign_adapter
> >> +   echo 6 > assign_adapter
> >> +   echo 4 > assign_domain
> >> +   echo 0xab > assign_domain
> >> +
> >> +   For this implementation, all usage domains - i.e., domains assigned
> >> +   via the assign_domain attribute file - will also be configured in the ADM
> >> +   field of the KVM guest's CRYCB, so there is no need to assign control
> >> +   domains here unless you want to assign control domains that are not
> >> +   assigned as usage domains.
> >> +
> >> +   If a mistake is made configuring an adapter, domain or control domain,
> >> +   you can use the unassign_xxx files to unassign the adapter, domain or
> >> +   control domain.  
> > Would it be more consistent with other sysfs entries to call these
> > "bind" and "unbind" rather than "assign" and "unassign  
> 
> Aren't the bind and unbind sysfs interfaces typically associated with
> binding devices to and unbinding devices from a device driver?

Yes

> In this
> case we are talking about configuring a mediated device's AP matrix - i.e.,
> setting bits identifying the adapters, domains and control domains defined
> for the mediated device (i.e., a guest). Maybe there are better words we
> could use than assign/unassign, but I find the use of bind/unbind to be
> confusing. I'm open to suggestions.

If we remove the 'activate' interface and dedicate resources to the
mdev device at the point where we create an intersection in the matrix,
wouldn't that resource be "bound" to the mdev at that time?  This is
not a strong issue for me, I just thought "assign" seemed inconsistent
as the operation felt more like a traditional "bind".

> >> +
> >> +   To display the matrix configuration for Guest1:
> >> +
> >> +   cat matrix
> >> +
> >> +   This is how the matrix is configured for Guest2:
> >> +
> >> +   echo 5 > assign_adapter
> >> +   echo 0x47 > assign_domain
> >> +   echo 0xff > assign_domain
> >> +
> >> +5. The adminstrator now needs to activate the mediated devices.
> >> +   echo 1 > activate  
> > Or optionally not.  As in reply to cover letter, I don't think this
> > interface is sufficiently justified, or necessarily desirable.  
> 
> To clarify your objection, let me state what I think you are saying.
> 
> 1. You do not think there is a good reason to assign duplicate APQNs
>     - i.e., the cross product of all adapter IDs and domain IDs assigned
>     to the mdev.

I think the cross product of adapters and domains is a basic part of
the design here, what I object to is that mdevs can exist with
overlapping cross products and which one can be activated is dependent
on activation ordering.  For example, if I have the following mdevs:

{adapters}{domains}
A: {0,1}{0,1}
B: {0}{0}
C: {0}{1}
D: {1}{0}
E: {1}{1}

Who is supposed to understand and debug that A cannot be activated
while any of B/C/D/E are activated?  What does the debugging process
look like?  If the mdev is not explicitly activated, only opened, how
do we determine the fault is from a resource conflict on the mdev
definition?  On the other hand if the cross product is committed at the
time it's specified, then a scenario where E already exists and we're
trying to create A might look like:

1. Specify adapter mask of {0,1}
2. Specify domain mask of {0,1} <-- write(2) fails with -EBUSY

Here we know that one or more of the domains we've specified are not
available on the set of adapters we've configured.

> 2. All mdevs should be validly configured for use regardless of when
>     guests using them are started. In other words, libvirt should be
>     able to depend on the fact that an mdev is ready for use at all
>     times.

Yes, an mdev should represent committed resources.  We support dynamic
creation and removal if the resources need to be freed for the host or
another mdev configuration.

> I will discuss this with the team when we meet on Thursday and get
> back to you. If I am mistaken in my analysis of your concern, please
> clarify.
> 
> I would like to mention that when a guest using an mdev starts,
> the mdev will be activated if not already activated. The guest
> will not start if another activated mdev is using an APQN assigned
> to the mdev of the guest being started.

Understood.

> >> +6. Start Guest1:
> >> +
> >> +   /usr/bin/qemu-system-s390x ... -cpu xxx,ap=on,apft=on \
> >> +      -device vfio-ap,sysfsdev=/sys/devices/virtual/misc/vfio_ap/mdev_supported_types/vfio_ap-passthrough/devices/$uuid1 ...
> >> +
> >> +7. Start Guest2:
> >> +
> >> +   /usr/bin/qemu-system-s390x ... -cpu xxx,ap=on,apft=on \
> >> +      -device vfio-ap,sysfsdev=/sys/devices/virtual/misc/vfio_ap/mdev_supported_types/vfio_ap-passthrough/devices/$uuid2 ...
> >> +
> >> +When the guest is shut down, the mediated matrix device may be removed.
> >> +
> >> +Using our example again, to remove the mediated matrix device $uuid1:
> >> +
> >> +   /sys/devices/virtual/misc
> >> +      --- [vfio_ap]
> >> +      --------- [mdev_supported_types]
> >> +      ------------ [vfio_ap-passthrough]
> >> +      --------------- [devices]
> >> +      ------------------ [$uuid1]
> >> +      --------------------- activate
> >> +      --------------------- remove
> >> +
> >> +
> >> +   echo 0 > activate
> >> +   echo 1 > remove
> >> +
> >> +   This will release all the AP queues for the mediated device and
> >> +   remove all of the mdev matrix device's sysfs structures. To
> >> +   recreate and reconfigure the mdev matrix device, all of the steps starting
> >> +   with step 4 will have to be performed again.
> >> +
> >> +   It is not necessary to remove an mdev matrix device, but one may want to
> >> +   remove it if no guest will use it during the lifetime of the linux host. If
> >> +   the mdev matrix device is removed, one may want to unbind the AP queues the
> >> +   guest was using from the vfio_ap device driver and bind them back to the
> >> +   default driver. Alternatively, the AP queues can be configured for another
> >> +   mdev matrix (i.e., guest). In either case, one must take care to change the
> >> +   secure key configured for the domain to which the queue is connected.  
> > This seems prime for data leakage, extremely sensitive data at that.
> > Who's responsibility is it to prevent this?  Shouldn't closing the
> > device reset the device, which should perhaps wipe any key
> > configuration?  
> 
> The device is zeroized (reset) when it is removed. The zeroize 
> instruction - to
> the best of my knowledge - does nothing with the secure key for a given 
> queue.
> There is also no instruction that I know of to clear keys, so it would 
> appear
> that this is left to the system administrator.

:-o  That seems like a pretty serious fault.  Given that this interface
is specifically for crypto devices where such sensitive information
seems common, doesn't it seem like the kernel, ie. the mdev vendor
driver, should provide guarantees that should your user instance crash
or be killed that keys are not available to the system admin?

> >> +
> >> +
> >> +Limitations
> >> +===========
> >> +An admin needs to be careful when writing to sysfs files
> >> +/sys/bus/ap/apmask and /sys/bus/ap/aqmask. These files control the driver
> >> +to which an AP queue is bound to. Once an AP queue is bound vfio_ap
> >> +driver and assigned to a mediated device, admin should not re-assign the
> >> +AP queues back to the default driver, because of technical limitations.
> >> +The kernel does not prevent you from re-assigning from AP queues reserved
> >> +for the vfio_ap driver back to default driver.  Future enhancements in
> >> +the ap_bus and vfio_ap are likely to introduce complete support for such
> >> +dynamic reconfiguration. But until then be extremely careful.  
> > Why don't we have these enhancements now?  Should the whole thing be
> > marked experimental without them?  This sounds similar to a vfio-pci
> > case where a group with multiple devices could have device which is
> > unused by the user unbound from vfio-pci and re-bound to a native host
> > driver.  We BUG_ON when this occurs to protect the data.  
> 
> We have customers that have been waiting a long time and are very anxious
> to have guest access to AP devices. In the interest of reducing time to
> market, we have decided to simplify the initial implementation as much
> as possible. I wouldn't necessarily call this experimental, but view it
> more as the minimal viable product.
> 
> As far as BUG_ON, I will discuss this with the crypto team at our
> meeting on Thursday and get back to you.

A minimum viable product still needs to protect the host and does not
excuse us from later breaking the user exposed ABI.

> > Probably also need to evaluate updates to
> > Documentation/ABI/testing/sysfs-bus-vfio-mdev, especially if libvirt is
> > supposed to interact with an 'activate' attribute.  Thanks,  
> 
> As it stands, libvirt will not create or modify the attributes of an mdev.
> If a guest is started under libvirt and its mdev has not been activated,
> it will be activated when the mdev fd is opened. Of course, if another
> activated mdev is assigned an APQN assigned to the mdev of the guest
> being started, the guest will be terminated.

Exactly why I disapprove of the activated interface, but it's being
proposed and recommended for use as a standard part of the lifecycle
management for these devices.  Thanks,

Alex
Anthony Krowiak Aug. 2, 2018, 9:27 p.m. UTC | #9
On 08/01/2018 02:08 PM, Alex Williamson wrote:
> On Wed, 1 Aug 2018 13:12:05 +0200
> Halil Pasic <pasic@linux.ibm.com> wrote:
>
>> On 07/27/2018 07:44 PM, Alex Williamson wrote:
>>> On Thu, 26 Jul 2018 21:54:29 +0200
>>> Christian Borntraeger <borntraeger@de.ibm.com> wrote:
>>> ...
>>>> +The process for reserving an AP queue for use by a KVM guest is:
>>>> +
>>>> +* The vfio-ap driver during its initialization will perform the following:
>>>> +  * Create the 'vfio_ap' root device - /sys/devices/virtual/misc/vfio_ap
>>>> +  * Create the 'matrix' device in the 'vfio_ap' root
>>>> +  * Register the matrix device with the device core
>>>> +* Register with the ap_bus for AP queue devices of type 10 devices (CEX4 and
>>>> +  newer) and to provide the vfio_ap driver's probe and remove callback
>>>> +  interfaces. The reason why older devices are not supported is because there
>>>> +  are no systems available on which to test.
>>>> +* The admin needs to reserve the AP Queue for vfio_ap driver. This can be
>>>> +  done by writing the AP adapter mask to /sys/bus/ap/apmask and the AP domain
>>>> +  mask to /sys/bus/ap/aqmask.
>>>> +
>>>> +  For example to reserve all the AP Queues on the system for vfio_ap driver:
>>>> +
>>>> +  echo 0x0000000000000000000000000000000000000000000000000000000000000000 > /sys/bus/ap/apmask
>>>> +  echo 0x0000000000000000000000000000000000000000000000000000000000000000 > /sys/bus/ap/aqmask
>>> And this is a reasonable user syntax? ;)
>>>    
>> We intend to provide tooling that makes this more human friendly. BTW
>> echo 0 > sys/bus/ap/apmask would also work -- and looks less scary.
> If tooling is required, perhaps the interface is wrong by design.
> Maybe the interface should accept a list of bits, for example:
>
> echo 255,0 > /sys/bus/ap/aqmask
>
> vs
>
> echo 0x8000000000000000000000000000000000000000000000000000000000000001 > /sys/bus/ap/aqmask
>
>>> ...
>>>> +  * mdev_attr_groups
>>>> +    This attribute group identifies the user-defined sysfs attributes of the
>>>> +    mediated device. When a device is registered with the VFIO mediated device
>>>> +    framework, the sysfs attributes files identified in the 'mdev_attr_groups'
>>>> +    structure will be created in the mediated matrix device's directory. The
>>>> +    sysfs attributes for a mediated matrix device are:
>>>> +    * assign_adapter:
>>>> +      A write-only file for assigning an AP adapter to the mediated matrix
>>>> +      device. To assign an adapter, the APID of the adapter is written to the
>>>> +      file.
>>>> +    * assign_domain:
>>>> +      A write-only file for assigning an AP usage domain to the mediated matrix
>>>> +      device. To assign a domain, the APQI of the AP queue corresponding to a
>>>> +      usage domain is written to the file.
>>>> +    * matrix:
>>>> +      A read-only file for displaying the APQNs derived from the adapters and
>>>> +      domains assigned to the mediated matrix device.
>>>> +    * assign_control_domain:
>>>> +      A write-only file for assigning an AP control domain to the mediated
>>>> +      matrix device. To assign a control domain, the ID of a domain to be
>>>> +      controlled is written to the file. For the initial implementation, the set
>>>> +      of control domains will always include the set of usage domains, so it is
>>>> +      only necessary to assign control domains that are not also assigned as
>>>> +      usage domains.
>>>
>>> How will the user know when this changes?  What's the transition plan
>>> to maintain compatibility when it does change?
>>>    
>> I don't think this is supposed to change under the users feet. This document
>> probably started out as a design document. While I'm personally in favor of
>> managing the two separate, I think we need to commit to either-or and
>> get rid of 'For the initial implementation,'.
> +1
>
>>> ...
>>>> +4. The administrator now needs to configure the matrixes for mediated
>>>> +   devices $uuid1 (for Guest1) and $uuid2 (for Guest2).
>>>> +
>>>> +   This is how the matrix is configured for Guest1:
>>>> +
>>>> +   echo 5 > assign_adapter
>>>> +   echo 6 > assign_adapter
>>>> +   echo 4 > assign_domain
>>>> +   echo 0xab > assign_domain
>>>> +
>>>> +   For this implementation, all usage domains - i.e., domains assigned
>>>> +   via the assign_domain attribute file - will also be configured in the ADM
>>>> +   field of the KVM guest's CRYCB, so there is no need to assign control
>>>> +   domains here unless you want to assign control domains that are not
>>>> +   assigned as usage domains.
>>>> +
>>>> +   If a mistake is made configuring an adapter, domain or control domain,
>>>> +   you can use the unassign_xxx files to unassign the adapter, domain or
>>>> +   control domain.
>>> Would it be more consistent with other sysfs entries to call these
>>> "bind" and "unbind" rather than "assign" and "unassign"?
>>>    
>> I'm not sure. Let me recap what is this about. First we partition our AP
>> resources (queues) into 'to be used by the host' and 'to be used by the
>> guests' using the /sys/bus/ap/a[pq]mask. Then we sub-partition 'to be
>> used by the guests' among the guests. Each vfio ap_matrix represents a
>> partition that corresponds to a single VM (guest). Such a partition can
>> be seen as a device that grants access (usage or administration) to a set
>> of queues. We can grant access to queues even if these are not currently
>> available in the system. E.g. we can partition an AP card that is
>> currently not online or even plugged. When The resources become available
>> (e.g. new card plugged, or just 'configured'), the resources (should)
>> become available at all levels if the authorization is right. I know
>> 'bind' binding a device (that currently exists within the system) to a
>> driver. The vfio_ap stuff seems different enough to seek a different name
>> for it.
> As discussed elsewhere, I'm not in favor of allowing mdev devices to
> exist which rely on resources that are not present or in-use
> elsewhere.  If we remove that concept, then we are actually binding and
> unbinding resources to the mdev device.
>
>>>> +
>>>> +   To display the matrix configuration for Guest1:
>>>> +
>>>> +   cat matrix
>>>> +
>>>> +   This is how the matrix is configured for Guest2:
>>>> +
>>>> +   echo 5 > assign_adapter
>>>> +   echo 0x47 > assign_domain
>>>> +   echo 0xff > assign_domain
>>>> +
>>>> +5. The adminstrator now needs to activate the mediated devices.
>>>> +   echo 1 > activate
>>> Or optionally not.  As in reply to cover letter, I don't think this
>>> interface is sufficiently justified, or necessarily desirable.
>>>    
>> You are right. Activation is not necessary because the open will try to
>> auto-activate it. But honestly, I would recommend activating it in
>> advance, so the admin can be sure the resources are claimed.
> This can also be solved by claiming the resource at the point at which
> it is bound/assigned/attached to the mdev device (ie. at any point
> where we introduce new intersections in the matrix).
>
>> In our current design we choose to give the administrator the freedom, to
>> decide on his policy how does he want to manage the vfio_ap mdevs
>> lifecycle. And  to decide what guarantees does he want to have at a given
>> point in time. With 'activate' both 'dynamically and on-demand of
>> starting a VM' and 'create known vfio_ap mdevs at system bring up' are
>> viable. If there were a consensus on how the life-cycle of the mdev
>> devices is to be managed, it would be easier to not give this freedom to
>> the admin.
> Please, this is not a freedom of the admin issue, this simply defines
> the point in time at which bound/assigned/attached resources are
> committed to the mdev device.  In the model proposed here, the activate
> attribute allows resources to be over-committed, placing the burden on
> the management tools to implicitly understand which mdev devices can be
> used simultaneously.  With the existing model, resources are committed
> on creation.  I'll agree that such a model doesn't necessarily apply to
> these new devices, but I'd argue the next logical step is that
> resources are committed as they're attached such that the mdev device
> is always in a usable state.  AFAICT, the same "freedom" is available
> to the admin by simply scripting mdev creation and assembly without a
> change to the mdev interface.
>
>>>> +6. Start Guest1:
>>>> +
>>>> +   /usr/bin/qemu-system-s390x ... -cpu xxx,ap=on,apft=on \
>>>> +      -device vfio-ap,sysfsdev=/sys/devices/virtual/misc/vfio_ap/mdev_supported_types/vfio_ap-passthrough/devices/$uuid1 ...
>>>> +
>>>> +7. Start Guest2:
>>>> +
>>>> +   /usr/bin/qemu-system-s390x ... -cpu xxx,ap=on,apft=on \
>>>> +      -device vfio-ap,sysfsdev=/sys/devices/virtual/misc/vfio_ap/mdev_supported_types/vfio_ap-passthrough/devices/$uuid2 ...
>>>> +
>>>> +When the guest is shut down, the mediated matrix device may be removed.
>>>> +
>>>> +Using our example again, to remove the mediated matrix device $uuid1:
>>>> +
>>>> +   /sys/devices/virtual/misc
>>>> +      --- [vfio_ap]
>>>> +      --------- [mdev_supported_types]
>>>> +      ------------ [vfio_ap-passthrough]
>>>> +      --------------- [devices]
>>>> +      ------------------ [$uuid1]
>>>> +      --------------------- activate
>>>> +      --------------------- remove
>>>> +
>>>> +
>>>> +   echo 0 > activate
>>>> +   echo 1 > remove
>>>> +
>>>> +   This will release all the AP queues for the mediated device and
>>>> +   remove all of the mdev matrix device's sysfs structures. To
>>>> +   recreate and reconfigure the mdev matrix device, all of the steps starting
>>>> +   with step 4 will have to be performed again.
>>>> +
>>>> +   It is not necessary to remove an mdev matrix device, but one may want to
>>>> +   remove it if no guest will use it during the lifetime of the linux host. If
>>>> +   the mdev matrix device is removed, one may want to unbind the AP queues the
>>>> +   guest was using from the vfio_ap device driver and bind them back to the
>>>> +   default driver. Alternatively, the AP queues can be configured for another
>>>> +   mdev matrix (i.e., guest). In either case, one must take care to change the
>>>> +   secure key configured for the domain to which the queue is connected.
>>> This seems prime for data leakage, extremely sensitive data at that.
>>> Who's responsibility is it to prevent this?  Shouldn't closing the
>>> device reset the device, which should perhaps wipe any key
>>> configuration?
>>>    
>> The documentation seems to be a bit outdated here. We do reset the device
>> in the release callback and on VFIO_DEVICE_RESET ioctl. There might be some
>> stuff that is not supposed to be reset by us, but that is fine (master keys
>> managed via special infrastructure called the TKE). I'm not sure what is the
>> meaning of last sentence of the paragraph in question  ('change the secure
>> key configured ...') though. Anyway we will have to update this.
>>
>>>> +
>>>> +
>>>> +Limitations
>>>> +===========
>>>> +An admin needs to be careful when writing to sysfs files
>>>> +/sys/bus/ap/apmask and /sys/bus/ap/aqmask. These files control the driver
>>>> +to which an AP queue is bound to. Once an AP queue is bound vfio_ap
>>>> +driver and assigned to a mediated device, admin should not re-assign the
>>>> +AP queues back to the default driver, because of technical limitations.
>>>> +The kernel does not prevent you from re-assigning from AP queues reserved
>>>> +for the vfio_ap driver back to default driver.  Future enhancements in
>>>> +the ap_bus and vfio_ap are likely to introduce complete support for such
>>>> +dynamic reconfiguration. But until then be extremely careful.
>>> Why don't we have these enhancements now?  Should the whole thing be
>>> marked experimental without them?  This sounds similar to a vfio-pci
>>> case where a group with multiple devices could have device which is
>>> unused by the user unbound from vfio-pci and re-bound to a native host
>>> driver.  We BUG_ON when this occurs to protect the data.
>> I think, marking vfio_ap as experimental is probably a good idea. I think
>> we can do these enhancements later. Of course, having everything right away
>> is the best, but the problem with that is that it takes (more) time.
>>
>>> Probably also need to evaluate updates to
>>> Documentation/ABI/testing/sysfs-bus-vfio-mdev, especially if libvirt is
>>> supposed to interact with an 'activate' attribute.  Thanks,
>>>    
>> I'm not sure documenting the 'activate' in Documentation/ABI/testing/sysfs-bus-vfio-mdev
>> is necessary. I see it more like the other type specific attributes. For
>> instance drivers/gpu/drm/i915/gvt/kvmgt.c seems to have vgpu_id and
>> https://docs.nvidia.com/grid/latest/grid-vgpu-user-guide/index.html#setting-vgpu-plugin-parameters-on-red-hat-el-kvm
>> suggests that Nvidia vgpus have a vgpu_params (read-write) attribute to
>> 'control the behavior of the vGPU, such as the frame rate limiter...'.
> I disagree, the difference is that the existing vendor attributes can
> modify the behavior of the mdev device, but are not part of the
> standard mdev sysfs ABI.  You recommend above that 'activate' should be
> used prior to open(2) in order to commit mdev resources.  That's a
> fundamental change in the management of the device, which is only
> obscured by the fact that open(2) does an implicit activate.  Let's
> please just drop this whole notion of 'activate'.

You'll be happy to know the team has decided to drop the activate.

>
>> AFAIK the activate and the assign/unassign attributes are because we can
>> not do much with homogeneous vfio_ap mdevs. The only safe default for
>> a vfio_ap mdev is all zero masks, that can't use any resources, that
>> is essentially useless. And this is how we construct the vfio_ap mdev
>> devices. In that sense we could as well construct them 'activated',
>> as empty won't result in any kind of conflicts (and allow the admin
>> to de-activate these so we are still flexible about the lifecycle
>> management).
> I agree that a zero mask is the only safe default for creation and that
> vendor specific attributes are necessary to build the configuration up
> to something that provides some usable value (unless we decide to
> expand the create interface).  As above though, I disagree about what
> 'activate' does or does not provide.  I don't see that it adds and
> flexibility in lifecycle management, at least not in a way that's
> compatible with existing mdev devices or in a way to facilitates upper
> level software management.  It seems to only allow the admin to avoid
> dynamically creating and attaching resources to mdevs, at the cost of
> manage layers needing to account for over-commitment.  Thanks,
>
> Alex
>
Anthony Krowiak Aug. 2, 2018, 9:59 p.m. UTC | #10
On 08/02/2018 11:25 AM, Alex Williamson wrote:
> On Wed, 1 Aug 2018 19:05:35 -0400
> Tony Krowiak <akrowiak@linux.ibm.com> wrote:
>
>> On 07/27/2018 01:44 PM, Alex Williamson wrote:
>>> On Thu, 26 Jul 2018 21:54:29 +0200
>>> Christian Borntraeger <borntraeger@de.ibm.com> wrote:
>>> ...
>>>> +The process for reserving an AP queue for use by a KVM guest is:
>>>> +
>>>> +* The vfio-ap driver during its initialization will perform the following:
>>>> +  * Create the 'vfio_ap' root device - /sys/devices/virtual/misc/vfio_ap
>>>> +  * Create the 'matrix' device in the 'vfio_ap' root
>>>> +  * Register the matrix device with the device core
>>>> +* Register with the ap_bus for AP queue devices of type 10 devices (CEX4 and
>>>> +  newer) and to provide the vfio_ap driver's probe and remove callback
>>>> +  interfaces. The reason why older devices are not supported is because there
>>>> +  are no systems available on which to test.
>>>> +* The admin needs to reserve the AP Queue for vfio_ap driver. This can be
>>>> +  done by writing the AP adapter mask to /sys/bus/ap/apmask and the AP domain
>>>> +  mask to /sys/bus/ap/aqmask.
>>>> +
>>>> +  For example to reserve all the AP Queues on the system for vfio_ap driver:
>>>> +
>>>> +  echo 0x0000000000000000000000000000000000000000000000000000000000000000 > /sys/bus/ap/apmask
>>>> +  echo 0x0000000000000000000000000000000000000000000000000000000000000000 > /sys/bus/ap/aqmask
>>> And this is a reasonable user syntax? ;)
>> I am not a fan of writing a 256-bit bitmask as such. It is entirely too
>> difficult to
>> figure out which character to specify to indicate a bit number to be set
>> for anything
>> past the first few characters. Of course, tooling could be used to
>> accomplish this
>> task, but the sysfs interfaces should probably be user-friendly. A
>> comma-separated
>> list could be used; for example to set reserve adapters 21, 99 and 249:
>>
>> echo 21,99,249 > /sys/bus/ap/apmask
>>
>> The problem with this is how do we indicate that all bits are to be
>> cleared? Do you
>> have any suggestions for a reasonable user syntax?
> Perhaps an empty write, just a carriage return, "null".  I'd shy away
> from magic digits, like 256 to clear.

The crypto team has decided to eliminate these sysfs attributes for the 
AP bus
and rely on bind/unbind.

>
>>> ...
>>>> +  * mdev_attr_groups
>>>> +    This attribute group identifies the user-defined sysfs attributes of the
>>>> +    mediated device. When a device is registered with the VFIO mediated device
>>>> +    framework, the sysfs attributes files identified in the 'mdev_attr_groups'
>>>> +    structure will be created in the mediated matrix device's directory. The
>>>> +    sysfs attributes for a mediated matrix device are:
>>>> +    * assign_adapter:
>>>> +      A write-only file for assigning an AP adapter to the mediated matrix
>>>> +      device. To assign an adapter, the APID of the adapter is written to the
>>>> +      file.
>>>> +    * assign_domain:
>>>> +      A write-only file for assigning an AP usage domain to the mediated matrix
>>>> +      device. To assign a domain, the APQI of the AP queue corresponding to a
>>>> +      usage domain is written to the file.
>>>> +    * matrix:
>>>> +      A read-only file for displaying the APQNs derived from the adapters and
>>>> +      domains assigned to the mediated matrix device.
>>>> +    * assign_control_domain:
>>>> +      A write-only file for assigning an AP control domain to the mediated
>>>> +      matrix device. To assign a control domain, the ID of a domain to be
>>>> +      controlled is written to the file. For the initial implementation, the set
>>>> +      of control domains will always include the set of usage domains, so it is
>>>> +      only necessary to assign control domains that are not also assigned as
>>>> +      usage domains.
>>> How will the user know when this changes?  What's the transition plan
>>> to maintain compatibility when it does change?
>> I'm sorry, I don't understand the question. When you say user, about
>> whom are
>> you talking; the person doing the assignments? What changes are you talking
>> about?
> The user would be the person interaction with the sysfs attribute
> assign_control_domain where it states "[f]or the initial implementation,
> the set of control domains will always include the set of usage
> domains, so it is only necessary to assign control domains that are not
> also assigned as usage domains."  I infer from the usage of "initial
> implementation" that this behavior may not always be the case and
> therefore if it's not final what is the transition plan so that a user
> interacting with this attribute can know the current behavior.

Ah, I get your point now. I see how this verbiage would lead you to that
conclusion and it will be removed. This behavior will not change.

>
>>> ...
>>>> +4. The administrator now needs to configure the matrixes for mediated
>>>> +   devices $uuid1 (for Guest1) and $uuid2 (for Guest2).
>>>> +
>>>> +   This is how the matrix is configured for Guest1:
>>>> +
>>>> +   echo 5 > assign_adapter
>>>> +   echo 6 > assign_adapter
>>>> +   echo 4 > assign_domain
>>>> +   echo 0xab > assign_domain
>>>> +
>>>> +   For this implementation, all usage domains - i.e., domains assigned
>>>> +   via the assign_domain attribute file - will also be configured in the ADM
>>>> +   field of the KVM guest's CRYCB, so there is no need to assign control
>>>> +   domains here unless you want to assign control domains that are not
>>>> +   assigned as usage domains.
>>>> +
>>>> +   If a mistake is made configuring an adapter, domain or control domain,
>>>> +   you can use the unassign_xxx files to unassign the adapter, domain or
>>>> +   control domain.
>>> Would it be more consistent with other sysfs entries to call these
>>> "bind" and "unbind" rather than "assign" and "unassign
>> Aren't the bind and unbind sysfs interfaces typically associated with
>> binding devices to and unbinding devices from a device driver?
> Yes
>
>> In this
>> case we are talking about configuring a mediated device's AP matrix - i.e.,
>> setting bits identifying the adapters, domains and control domains defined
>> for the mediated device (i.e., a guest). Maybe there are better words we
>> could use than assign/unassign, but I find the use of bind/unbind to be
>> confusing. I'm open to suggestions.
> If we remove the 'activate' interface and dedicate resources to the
> mdev device at the point where we create an intersection in the matrix,
> wouldn't that resource be "bound" to the mdev at that time?  This is
> not a strong issue for me, I just thought "assign" seemed inconsistent
> as the operation felt more like a traditional "bind".

We are going to remove the activate for the next series, but stick with
the assign terminology because we're not really binding devices, but 
constructing a
matrix that identifies the AP queues to which the guest using the mdev
will have access.

>
>>>> +
>>>> +   To display the matrix configuration for Guest1:
>>>> +
>>>> +   cat matrix
>>>> +
>>>> +   This is how the matrix is configured for Guest2:
>>>> +
>>>> +   echo 5 > assign_adapter
>>>> +   echo 0x47 > assign_domain
>>>> +   echo 0xff > assign_domain
>>>> +
>>>> +5. The adminstrator now needs to activate the mediated devices.
>>>> +   echo 1 > activate
>>> Or optionally not.  As in reply to cover letter, I don't think this
>>> interface is sufficiently justified, or necessarily desirable.
>> To clarify your objection, let me state what I think you are saying.
>>
>> 1. You do not think there is a good reason to assign duplicate APQNs
>>      - i.e., the cross product of all adapter IDs and domain IDs assigned
>>      to the mdev.
> I think the cross product of adapters and domains is a basic part of
> the design here, what I object to is that mdevs can exist with
> overlapping cross products and which one can be activated is dependent
> on activation ordering.  For example, if I have the following mdevs:
>
> {adapters}{domains}
> A: {0,1}{0,1}
> B: {0}{0}
> C: {0}{1}
> D: {1}{0}
> E: {1}{1}
>
> Who is supposed to understand and debug that A cannot be activated
> while any of B/C/D/E are activated?  What does the debugging process
> look like?  If the mdev is not explicitly activated, only opened, how
> do we determine the fault is from a resource conflict on the mdev
> definition?  On the other hand if the cross product is committed at the
> time it's specified, then a scenario where E already exists and we're
> trying to create A might look like:
>
> 1. Specify adapter mask of {0,1}
> 2. Specify domain mask of {0,1} <-- write(2) fails with -EBUSY
>
> Here we know that one or more of the domains we've specified are not
> available on the set of adapters we've configured.

This is moot for the next series because the activate will be removed and
consistency checking will be done as adapters/domains are assigned.

>
>> 2. All mdevs should be validly configured for use regardless of when
>>      guests using them are started. In other words, libvirt should be
>>      able to depend on the fact that an mdev is ready for use at all
>>      times.
> Yes, an mdev should represent committed resources.  We support dynamic
> creation and removal if the resources need to be freed for the host or
> another mdev configuration.

See my previous response.

>
>> I will discuss this with the team when we meet on Thursday and get
>> back to you. If I am mistaken in my analysis of your concern, please
>> clarify.
>>
>> I would like to mention that when a guest using an mdev starts,
>> the mdev will be activated if not already activated. The guest
>> will not start if another activated mdev is using an APQN assigned
>> to the mdev of the guest being started.
> Understood.
>
>>>> +6. Start Guest1:
>>>> +
>>>> +   /usr/bin/qemu-system-s390x ... -cpu xxx,ap=on,apft=on \
>>>> +      -device vfio-ap,sysfsdev=/sys/devices/virtual/misc/vfio_ap/mdev_supported_types/vfio_ap-passthrough/devices/$uuid1 ...
>>>> +
>>>> +7. Start Guest2:
>>>> +
>>>> +   /usr/bin/qemu-system-s390x ... -cpu xxx,ap=on,apft=on \
>>>> +      -device vfio-ap,sysfsdev=/sys/devices/virtual/misc/vfio_ap/mdev_supported_types/vfio_ap-passthrough/devices/$uuid2 ...
>>>> +
>>>> +When the guest is shut down, the mediated matrix device may be removed.
>>>> +
>>>> +Using our example again, to remove the mediated matrix device $uuid1:
>>>> +
>>>> +   /sys/devices/virtual/misc
>>>> +      --- [vfio_ap]
>>>> +      --------- [mdev_supported_types]
>>>> +      ------------ [vfio_ap-passthrough]
>>>> +      --------------- [devices]
>>>> +      ------------------ [$uuid1]
>>>> +      --------------------- activate
>>>> +      --------------------- remove
>>>> +
>>>> +
>>>> +   echo 0 > activate
>>>> +   echo 1 > remove
>>>> +
>>>> +   This will release all the AP queues for the mediated device and
>>>> +   remove all of the mdev matrix device's sysfs structures. To
>>>> +   recreate and reconfigure the mdev matrix device, all of the steps starting
>>>> +   with step 4 will have to be performed again.
>>>> +
>>>> +   It is not necessary to remove an mdev matrix device, but one may want to
>>>> +   remove it if no guest will use it during the lifetime of the linux host. If
>>>> +   the mdev matrix device is removed, one may want to unbind the AP queues the
>>>> +   guest was using from the vfio_ap device driver and bind them back to the
>>>> +   default driver. Alternatively, the AP queues can be configured for another
>>>> +   mdev matrix (i.e., guest). In either case, one must take care to change the
>>>> +   secure key configured for the domain to which the queue is connected.
>>> This seems prime for data leakage, extremely sensitive data at that.
>>> Who's responsibility is it to prevent this?  Shouldn't closing the
>>> device reset the device, which should perhaps wipe any key
>>> configuration?
>> The device is zeroized (reset) when it is removed. The zeroize
>> instruction - to
>> the best of my knowledge - does nothing with the secure key for a given
>> queue.
>> There is also no instruction that I know of to clear keys, so it would
>> appear
>> that this is left to the system administrator.
> :-o  That seems like a pretty serious fault.  Given that this interface
> is specifically for crypto devices where such sensitive information
> seems common, doesn't it seem like the kernel, ie. the mdev vendor
> driver, should provide guarantees that should your user instance crash
> or be killed that keys are not available to the system admin?

I may have misspoke with regard to my understanding of the AP reset. I've
been told that the keys stored with the domain will be reset. I will
verify this and get back to you.

>
>>>> +
>>>> +
>>>> +Limitations
>>>> +===========
>>>> +An admin needs to be careful when writing to sysfs files
>>>> +/sys/bus/ap/apmask and /sys/bus/ap/aqmask. These files control the driver
>>>> +to which an AP queue is bound to. Once an AP queue is bound vfio_ap
>>>> +driver and assigned to a mediated device, admin should not re-assign the
>>>> +AP queues back to the default driver, because of technical limitations.
>>>> +The kernel does not prevent you from re-assigning from AP queues reserved
>>>> +for the vfio_ap driver back to default driver.  Future enhancements in
>>>> +the ap_bus and vfio_ap are likely to introduce complete support for such
>>>> +dynamic reconfiguration. But until then be extremely careful.
>>> Why don't we have these enhancements now?  Should the whole thing be
>>> marked experimental without them?  This sounds similar to a vfio-pci
>>> case where a group with multiple devices could have device which is
>>> unused by the user unbound from vfio-pci and re-bound to a native host
>>> driver.  We BUG_ON when this occurs to protect the data.
>> We have customers that have been waiting a long time and are very anxious
>> to have guest access to AP devices. In the interest of reducing time to
>> market, we have decided to simplify the initial implementation as much
>> as possible. I wouldn't necessarily call this experimental, but view it
>> more as the minimal viable product.
>>
>> As far as BUG_ON, I will discuss this with the crypto team at our
>> meeting on Thursday and get back to you.
> A minimum viable product still needs to protect the host and does not
> excuse us from later breaking the user exposed ABI.

I will have a more thorough look into it.

>
>>> Probably also need to evaluate updates to
>>> Documentation/ABI/testing/sysfs-bus-vfio-mdev, especially if libvirt is
>>> supposed to interact with an 'activate' attribute.  Thanks,
>> As it stands, libvirt will not create or modify the attributes of an mdev.
>> If a guest is started under libvirt and its mdev has not been activated,
>> it will be activated when the mdev fd is opened. Of course, if another
>> activated mdev is assigned an APQN assigned to the mdev of the guest
>> being started, the guest will be terminated.
> Exactly why I disapprove of the activated interface, but it's being
> proposed and recommended for use as a standard part of the lifecycle
> management for these devices.  Thanks,

As I stated elsewhere, we are removing the activate interface. All
consistency checking will be done as adapters/domains are assigned to
the mdev.

>
> Alex
>
diff mbox series

Patch

diff --git a/Documentation/s390/vfio-ap.txt b/Documentation/s390/vfio-ap.txt
new file mode 100644
index 000000000000..e7cda892817e
--- /dev/null
+++ b/Documentation/s390/vfio-ap.txt
@@ -0,0 +1,591 @@ 
+Introduction:
+============
+The Adjunct Processor (AP) facility is an IBM Z cryptographic facility comprised
+of three AP instructions and from 1 up to 256 PCIe cryptographic adapter cards.
+The AP devices provide cryptographic functions to all CPUs assigned to a
+linux system running in an IBM Z system LPAR.
+
+The AP adapter cards are exposed via the AP bus. The motivation for vfio-ap
+is to make AP cards available to KVM guests using the VFIO mediated device
+framework. This implementation relies considerably on the s390 virtualization
+facilities which do most of the hard work of providing direct access to AP
+devices.
+
+AP Architectural Overview:
+=========================
+To facilitate the comprehension of the design, let's start with some
+definitions:
+
+* AP adapter
+
+  An AP adapter is an IBM Z adapter card that can perform cryptographic
+  functions. There can be from 0 to 256 adapters assigned to an LPAR. Adapters
+  assigned to the LPAR in which a linux host is running will be available to
+  the linux host. Each adapter is identified by a number from 0 to 255. When
+  installed, an AP adapter is accessed by AP instructions executed by any CPU.
+
+  The AP adapter cards are assigned to a given LPAR via the system's Activation
+  Profile which can be edited via the HMC. When the system is IPL'd, the AP bus
+  module is loaded and detects the AP adapter cards assigned to the LPAR. The AP
+  bus creates a sysfs device for each adapter as they are detected. For example,
+  if AP adapters 4 and 10 (0x0a) are assigned to the LPAR, the AP bus will
+  create the following sysfs entries:
+
+    /sys/devices/ap/card04
+    /sys/devices/ap/card0a
+
+  Symbolic links to these devices will also be created in the AP bus devices
+  sub-directory:
+
+    /sys/bus/ap/devices/[card04]
+    /sys/bus/ap/devices/[card04]
+
+* AP domain
+
+  An adapter is partitioned into domains. Each domain can be thought of as
+  a set of hardware registers for processing AP instructions. An adapter can
+  hold up to 256 domains. Each domain is identified by a number from 0 to 255.
+  Domains can be further classified into two types:
+
+    * Usage domains are domains that can be accessed directly to process AP
+      commands.
+
+    * Control domains are domains that are accessed indirectly by AP
+      commands sent to a usage domain to control or change the domain, for
+      example; to set a secure private key for the domain.
+
+  The AP usage and control domains are assigned to a given LPAR via the system's
+  Activation Profile which can be edited via the HMC. When the system is IPL'd,
+  the AP bus module is loaded and detects the AP usage and control domains
+  assigned to the LPAR. The domain number of each usage domain will be coupled
+  with the adapter number of each AP adapter assigned to the LPAR to identify
+  the AP queues (see AP Queue section below). The domain number of each control
+  domain will be represented in a bitmask and stored in a sysfs file
+  /sys/bus/ap/ap_control_domain_mask created by the bus. The bits in the mask,
+  from most to least significant bit, correspond to domains 0-255.
+
+  A domain may be assigned to a system as both a usage and control domain, or
+  as a control domain only. Consequently, all domains assigned as both a usage
+  and control domain can both process AP commands as well as be changed by an AP
+  command sent to any usage domain assigned to the same system. Domains assigned
+  only as control domains can not process AP commands but can be changed by AP
+  commands sent to any usage domain assigned to the system.
+
+* AP Queue
+
+  An AP queue is the means by which an AP command-request message is sent to a
+  usage domain inside a specific adapter. An AP queue is identified by a tuple
+  comprised of an AP adapter ID (APID) and an AP queue index (APQI). The
+  APQI corresponds to a given usage domain number within the adapter. This tuple
+  forms an AP Queue Number (APQN) uniquely identifying an AP queue. AP
+  instructions include a field containing the APQN to identify the AP queue to
+  which the AP command-request message is to be sent for processing.
+
+  The AP bus will create a sysfs device for each APQN that can be derived from
+  the intersection of the AP adapter and usage domain numbers detected when the
+  AP bus module is loaded. For example, if adapters 4 and 10 (0x0a) and usage
+  domains 6 and 71 (0x47) are assigned to the LPAR, the AP bus will create the
+  following sysfs entries:
+
+    /sys/devices/ap/card04/04.0006
+    /sys/devices/ap/card04/04.0047
+    /sys/devices/ap/card0a/0a.0006
+    /sys/devices/ap/card0a/0a.0047
+
+  The following symbolic links to these devices will be created in the AP bus
+  devices subdirectory:
+
+    /sys/bus/ap/devices/[04.0006]
+    /sys/bus/ap/devices/[04.0047]
+    /sys/bus/ap/devices/[0a.0006]
+    /sys/bus/ap/devices/[0a.0047]
+
+* AP Instructions:
+
+  There are three AP instructions:
+
+  * NQAP: to enqueue an AP command-request message to a queue
+  * DQAP: to dequeue an AP command-reply message from a queue
+  * PQAP: to administer the queues
+
+AP and SIE:
+==========
+Let's now see how AP instructions are interpreted by the hardware.
+
+A satellite control block called the Crypto Control Block is attached to our
+main hardware virtualization control block. The CRYCB contains three fields to
+identify the adapters, usage domains and control domains assigned to the KVM
+guest:
+
+* The AP Mask (APM) field is a bit mask that identifies the AP adapters assigned
+  to the KVM guest. Each bit in the mask, from most significant to least
+  significant bit, corresponds to an APID from 0-255. If a bit is set, the
+  corresponding adapter is valid for use by the KVM guest.
+
+* The AP Queue Mask (AQM) field is a bit mask identifying the AP usage domains
+  assigned to the KVM guest. Each bit in the mask, from most significant to
+  least significant bit, corresponds to an AP queue index (APQI) from 0-255. If
+  a bit is set, the corresponding queue is valid for use by the KVM guest.
+
+* The AP Domain Mask field is a bit mask that identifies the AP control domains
+  assigned to the KVM guest. The ADM bit mask controls which domains can be
+  changed by an AP command-request message sent to a usage domain from the
+  guest. Each bit in the mask, from least significant to most significant bit,
+  corresponds to a domain from 0-255. If a bit is set, the corresponding domain
+  can be modified by an AP command-request message sent to a usage domain
+  configured for the KVM guest.
+
+If you recall from the description of an AP Queue, AP instructions include
+an APQN to identify the AP adapter and AP queue to which an AP command-request
+message is to be sent (NQAP and PQAP instructions), or from which a
+command-reply message is to be received (DQAP instruction). The validity of an
+APQN is defined by the matrix calculated from the APM and AQM; it is the
+cross product of all assigned adapter numbers (APM) with all assigned queue
+indexes (AQM). For example, if adapters 1 and 2 and usage domains 5 and 6 are
+assigned to a guest, the APQNs (1,5), (1,6), (2,5) and (2,6) will be valid for
+the guest.
+
+The APQNs can provide secure key functionality - i.e., a private key is stored
+on the adapter card for each of its domains - so each APQN must be assigned to
+at most one guest or the linux host.
+
+   Example 1: Valid configuration:
+   ------------------------------
+   Guest1: adapters 1,2  domains 5,6
+   Guest2: adapter  1,2  domain 7
+
+   This is valid because both guests have a unique set of APQNs: Guest1 has
+   APQNs (1,5), (1,6), (2,5) and (2,6); Guest2 has APQNs (1,7) and (2,7).
+
+   Example 2: Invalid configuration:
+   --------------------------------is assigned by writing the adapter's number into the
+   Guest1: adapters 1,2  domains 5,6
+   Guest2: adapter  1    domains 6,7
+
+   This is an invalid configuration because both guests have access to
+   APQN (1,6).
+
+The Design:
+===========
+The design introduces three new objects:
+
+1. AP matrix device
+2. VFIO AP device driver (vfio_ap.ko)
+3. AP mediated matrix passthrough device
+
+The VFIO AP device driver
+-------------------------
+The VFIO AP (vfio_ap) device driver serves the following purposes:
+
+1. Provides the interfaces to reserve APQNs for exclusive use of KVM guests.
+
+2. Sets up the VFIO mediated device interfaces to manage the mediated matrix
+   device and create the sysfs interfaces for assigning adapters, usage domains,
+   and control domains comprising the matrix for a KVM guest.
+
+3. Configure the APM, AQM and ADM in the CRYCB referenced by a KVM guest's
+   SIE state description to grant the guest access to AP devices
+
+4. Initialize the CPU model feature indicating that a KVM guest may use
+   AP facilities installed on the linux host.
+
+5. Enable interpretive execution mode for the KVM guest.
+
+Reserve APQNs for exclusive use of KVM guests
+---------------------------------------------
+The following block diagram illustrates the mechanism by which APQNs are
+reserved:
+
+                              +------------------+
+                 remove       |                  |   unbind
+         +------------------->+ cex4queue driver +<-----------+
+         |                    |                  |            |
+         |                    +------------------+            |
+         |                                                    |
+         |                                                    |
+         |                                                    |
++--------+---------+ register +------------------+      +-----+------+
+|                  +<---------+                  | bind |            |
+|      ap_bus      |          |  vfio_ap driver  +<-----+    admin   |
+|                  +--------->+                  |      |            |
++------------------+  probe   +---+--------+-----+      +------------+
+                                  |        |
+                           create |        | store APQN
+                                  |        |
+                                  v        v
+                              +---+--------+-----+
+                              |                  |
+                              |  matrix device   |
+                              |                  |
+                              +------------------+
+
+The process for reserving an AP queue for use by a KVM guest is:
+
+* The vfio-ap driver during its initialization will perform the following:
+  * Create the 'vfio_ap' root device - /sys/devices/virtual/misc/vfio_ap
+  * Create the 'matrix' device in the 'vfio_ap' root
+  * Register the matrix device with the device core
+* Register with the ap_bus for AP queue devices of type 10 devices (CEX4 and
+  newer) and to provide the vfio_ap driver's probe and remove callback
+  interfaces. The reason why older devices are not supported is because there
+  are no systems available on which to test.
+* The admin needs to reserve the AP Queue for vfio_ap driver. This can be
+  done by writing the AP adapter mask to /sys/bus/ap/apmask and the AP domain
+  mask to /sys/bus/ap/aqmask.
+
+  For example to reserve all the AP Queues on the system for vfio_ap driver:
+
+  echo 0x0000000000000000000000000000000000000000000000000000000000000000 > /sys/bus/ap/apmask
+  echo 0x0000000000000000000000000000000000000000000000000000000000000000 > /sys/bus/ap/aqmask
+
+  This would unbind all the AP Queues from the default zcrypt (cex4queue) driver and
+  make the AP Queues available for vfio_ap driver.
+
+  Another option would be to specify the apmask and aqmask via kernel command line.
+  eg:
+  ap.apmask=0x0 ap.aqmask=0x0
+
+  This would reserve all the AP Queues to be by vfio_ap driver.
+
+Set up the VFIO mediated device interfaces
+------------------------------------------
+The VFIO AP device driver utilizes the common interface of the VFIO mediated
+device core driver to:
+* Register an AP mediated bus driver to add a mediated matrix device to and
+  remove it from a VFIO group.
+* Create and destroy a mediated matrix device
+* Add a mediated matrix device to and remove it from the AP mediated bus driver
+* Add a mediated matrix device to and remove it from an IOMMU group
+
+The following high-level block diagram shows the main components and interfaces
+of the VFIO AP mediated matrix device driver:
+
+ +-------------+
+ |             |
+ | +---------+ | mdev_register_driver() +--------------+
+ | |  Mdev   | +<-----------------------+              |
+ | |  bus    | |                        | vfio_mdev.ko |
+ | | driver  | +----------------------->+              |<-> VFIO user
+ | +---------+ |    probe()/remove()    +--------------+    APIs
+ |             |
+ |  MDEV CORE  |
+ |   MODULE    |
+ |   mdev.ko   |
+ | +---------+ | mdev_register_device() +--------------+
+ | |Physical | +<-----------------------+              |
+ | | device  | |                        |  vfio_ap.ko  |<-> matrix
+ | |interface| +----------------------->+              |    device
+ | +---------+ |       callback         +--------------+
+ +-------------+
+
+During initialization of the vfio_ap module, the matrix device is registered
+with an 'mdev_parent_ops' structure that provides the sysfs attribute
+structures, mdev functions and callback interfaces for managing the mediated
+matrix device.
+
+* sysfs attribute structures:
+  * supported_type_groups
+    The VFIO mediated device framework supports creation of user-defined
+    mediated device types. These mediated device types are specified
+    via the 'supported_type_groups' structure when a device is registered
+    with the mediated device framework. The registration process creates the
+    sysfs structures for each mediated device type specified in the
+    'mdev_supported_types' sub-directory of the device being registered. Along
+    with the device type, the sysfs attributes of the mediated device type are
+    provided.
+
+    The VFIO AP device driver will register one mediated device type for
+    passthrough devices:
+        /sys/devices/vfio_ap/mdev_supported_types/vfio_ap-passthrough
+    Only the three read-only attributes required by the VFIO mdev framework will
+    be provided:
+        /sys/devices/vfio_ap/mdev_supported_types
+        ... name
+        ... device_api
+        ... available_instances
+        Where:
+        * name: specifies the name of the mediated device type
+        * device_api: the mediated device type's API
+        * available_instances: the number of mediated matrix passthrough devices
+                               that can be created
+  * mdev_attr_groups
+    This attribute group identifies the user-defined sysfs attributes of the
+    mediated device. When a device is registered with the VFIO mediated device
+    framework, the sysfs attributes files identified in the 'mdev_attr_groups'
+    structure will be created in the mediated matrix device's directory. The
+    sysfs attributes for a mediated matrix device are:
+    * assign_adapter:
+      A write-only file for assigning an AP adapter to the mediated matrix
+      device. To assign an adapter, the APID of the adapter is written to the
+      file.
+    * assign_domain:
+      A write-only file for assigning an AP usage domain to the mediated matrix
+      device. To assign a domain, the APQI of the AP queue corresponding to a
+      usage domain is written to the file.
+    * matrix:
+      A read-only file for displaying the APQNs derived from the adapters and
+      domains assigned to the mediated matrix device.
+    * assign_control_domain:
+      A write-only file for assigning an AP control domain to the mediated
+      matrix device. To assign a control domain, the ID of a domain to be
+      controlled is written to the file. For the initial implementation, the set
+      of control domains will always include the set of usage domains, so it is
+      only necessary to assign control domains that are not also assigned as
+      usage domains.
+    * control_domains:
+      A read-only file for displaying the control domain numbers assigned to the
+      mediated matrix device.
+
+* functions:
+  * create:
+    allocates the ap_matrix_mdev structure used by the vfio_ap driver to:
+    * Keep track of the available instances
+    * Store the reference to the struct kvm for the KVM guest
+    * Provide the notifier callback that will get invoked to handle the
+      VFIO_GROUP_NOTIFY_SET_KVM event. When received, the vfio_ap driver will
+      store the reference in the mediated matrix device's ap_matrix_mdev
+      structure and enable the interpretive execution mode for the KVM guest.
+  * remove:
+    deallocates the mediated matrix device's ap_matrix_mdev structure.
+  * activate:
+    Claims AP Queue resources for a mediated device.
+
+* callback interfaces
+  * open:
+    The vfio_ap driver uses this callback to register a
+    VFIO_GROUP_NOTIFY_SET_KVM notifier callback function for the mdev matrix
+    device. The notifier is invoked when QEMU connects the VFIO iommu group
+    for the mdev matrix device to the MDEV bus. Access to the KVM structure used
+    to configure the KVM guest is provided via this callback. The KVM structure,
+    is used to configure the guest's access to the AP matrix defined via the
+    mediated matrix device's sysfs attribute files.
+  * release:
+    unregisters the VFIO_GROUP_NOTIFY_SET_KVM notifier callback function for the
+    mdev matrix device and deconfigures the guest's AP matrix.
+
+Configure the APM, AQM and ADM in the CRYCB:
+-------------------------------------------
+Configuring the AP matrix for a KVM guest will be performed when the
+VFIO_GROUP_NOTIFY_SET_KVM notifier callback is invoked. The notifier
+function is called when QEMU connects the VFIO iommu group for the mdev matrix
+device to the MDEV bus. The CRYCB is configured by:
+* Setting the bits in the APM corresponding to the APIDs assigned to the
+  mediated matrix device via its 'assign_adapter' interface.
+* Setting the bits in the AQM corresponding to the APQIs assigned to the
+  mediated matrix device via its 'assign_domain' interface.
+* Setting the bits in the ADM corresponding to the domain dIDs assigned to the
+  mediated matrix device via its 'assign_control_domains' interface.
+
+Initialize the CPU model feature for AP
+---------------------------------------
+A new CPU model feature, KVM_S390_VM_CPU_FEAT_AP, is introduced to indicate that
+AP instructions are available to the KVM guest. This feature will be enabled by
+KVM only if the AP instructions are installed on the linux host. The feature
+must be turned on for the guest in order to access AP devices from the guest.
+For example, to turn the AP facilities on from the QEMU command line:
+
+    /usr/bin/qemu-system-s390x ... -cpu xxx,ap=on
+
+    Where xxx is the CPU model being used.
+
+    If the CPU model feature is not enabled by the kernel, QEMU will fail and
+    report that the feature is not supported. For most Linux guests the cpu model
+    feature apft must also be turned on.
+
+Example:
+=======
+Let's now provide an example to illustrate how KVM guests may be given
+access to AP facilities. For this example, we will show how to configure
+two guests such that executing the lszcrypt command on the guests would
+look like this:
+
+Guest1
+------
+CARD.DOMAIN TYPE  MODE
+------------------------------
+05          CEX5C CCA-Coproc
+05.0004     CEX5C CCA-Coproc
+05.00ab     CEX5C CCA-Coproc
+06          CEX5A Accelerator
+06.0004     CEX5A Accelerator
+06.00ab     CEX5C CCA-Coproc
+
+Guest2
+------
+CARD.DOMAIN TYPE  MODE
+------------------------------
+05          CEX5A Accelerator
+05.0047     CEX5A Accelerator
+05.00ff     CEX5A Accelerator
+
+These are the steps:
+
+1. Install the vfio_ap module on the linux host. The dependency chain for the
+   vfio_ap module is:
+   * vfio
+   * mdev
+   * vfio_mdev
+   * KVM
+   * vfio_ap
+
+2. Secure the AP queues to be used by the two guests so that the host can not
+   access them. Only type 10 adapters (i.e., CEX4 and later) are supported
+   due to the fact that no test systems with older card types are available
+   for testing.
+
+   To secure the AP queues each, each AP Queue device must first be unbound from
+   the cex4queue device driver. The sysfs location of the driver is:
+
+   /sys/bus/ap
+   --- [drivers]
+   ------ [cex4queue]
+   --------- [05.0004]
+   --------- [05.0047]
+   --------- [05.00ab]
+   --------- [05.00ff]
+   --------- [06.0004]
+   --------- [06.00ab]
+   --------- unbind
+
+   To unbind the AP queues from cex4queue driver, we need to update the
+   apmask and aqmask files.
+
+   echo 0x0000000000000000000000000000000000000000000000000000000000000000 > /sys/bus/ap/apmask
+   echo 0x0000000000000000000000000000000000000000000000000000000000000000 > /sys/bus/ap/aqmask
+
+   This would unbind all the AP queues and makes them available for vfio_ap
+   driver.
+
+   Take note that the AP queues bound to the vfio_ap driver will be available
+   for guest usage until they are unbound from the driver, the vfio_ap module
+   is unloaded, or the host system is shut down.
+
+3. Create the mediated devices needed to configure the AP matrixes for the
+   two guests and to provide an interface to the vfio_ap driver for
+   use by the guests:
+
+   /sys/devices/virtual/misc
+   --- [vfio_ap]
+   --------- [mdev_supported_types]
+   ------------ [vfio_ap-passthrough] (passthrough mediated matrix device type)
+   --------------- create
+   --------------- [devices]
+
+   To create the mediated devices for the two guests:
+
+	uuidgen > create
+	uuidgen > create
+
+   This will create two mediated devices in the [devices] subdirectory named
+   with the UUID written to the create attribute file. We call them $uuid1
+   and $uuid2:
+
+   /sys/devices/virtual/misc/
+   --- [vfio_ap]
+   --------- [mdev_supported_types]
+   ------------ [vfio_ap-passthrough]
+   --------------- [devices]
+   ------------------ [$uuid1]
+   --------------------- assign_adapter
+   --------------------- assign_control_domain
+   --------------------- assign_domain
+   --------------------- matrix
+   --------------------- unassign_adapter
+   --------------------- unassign_control_domain
+   --------------------- unassign_domain
+
+   ------------------ [$uuid2]
+   --------------------- assign_adapter
+   --------------------- assign_control_domain
+   --------------------- assign_domain
+   --------------------- matrix
+   --------------------- unassign_adapter
+   --------------------- unassign_control_domain
+   --------------------- unassign_domain
+
+4. The administrator now needs to configure the matrixes for mediated
+   devices $uuid1 (for Guest1) and $uuid2 (for Guest2).
+
+   This is how the matrix is configured for Guest1:
+
+   echo 5 > assign_adapter
+   echo 6 > assign_adapter
+   echo 4 > assign_domain
+   echo 0xab > assign_domain
+
+   For this implementation, all usage domains - i.e., domains assigned
+   via the assign_domain attribute file - will also be configured in the ADM
+   field of the KVM guest's CRYCB, so there is no need to assign control
+   domains here unless you want to assign control domains that are not
+   assigned as usage domains.
+
+   If a mistake is made configuring an adapter, domain or control domain,
+   you can use the unassign_xxx files to unassign the adapter, domain or
+   control domain.
+
+   To display the matrix configuration for Guest1:
+
+   cat matrix
+
+   This is how the matrix is configured for Guest2:
+
+   echo 5 > assign_adapter
+   echo 0x47 > assign_domain
+   echo 0xff > assign_domain
+
+5. The adminstrator now needs to activate the mediated devices.
+   echo 1 > activate
+
+6. Start Guest1:
+
+   /usr/bin/qemu-system-s390x ... -cpu xxx,ap=on,apft=on \
+      -device vfio-ap,sysfsdev=/sys/devices/virtual/misc/vfio_ap/mdev_supported_types/vfio_ap-passthrough/devices/$uuid1 ...
+
+7. Start Guest2:
+
+   /usr/bin/qemu-system-s390x ... -cpu xxx,ap=on,apft=on \
+      -device vfio-ap,sysfsdev=/sys/devices/virtual/misc/vfio_ap/mdev_supported_types/vfio_ap-passthrough/devices/$uuid2 ...
+
+When the guest is shut down, the mediated matrix device may be removed.
+
+Using our example again, to remove the mediated matrix device $uuid1:
+
+   /sys/devices/virtual/misc
+      --- [vfio_ap]
+      --------- [mdev_supported_types]
+      ------------ [vfio_ap-passthrough]
+      --------------- [devices]
+      ------------------ [$uuid1]
+      --------------------- activate
+      --------------------- remove
+
+
+   echo 0 > activate
+   echo 1 > remove
+
+   This will release all the AP queues for the mediated device and
+   remove all of the mdev matrix device's sysfs structures. To
+   recreate and reconfigure the mdev matrix device, all of the steps starting
+   with step 4 will have to be performed again.
+
+   It is not necessary to remove an mdev matrix device, but one may want to
+   remove it if no guest will use it during the lifetime of the linux host. If
+   the mdev matrix device is removed, one may want to unbind the AP queues the
+   guest was using from the vfio_ap device driver and bind them back to the
+   default driver. Alternatively, the AP queues can be configured for another
+   mdev matrix (i.e., guest). In either case, one must take care to change the
+   secure key configured for the domain to which the queue is connected.
+
+
+Limitations
+===========
+An admin needs to be careful when writing to sysfs files
+/sys/bus/ap/apmask and /sys/bus/ap/aqmask. These files control the driver
+to which an AP queue is bound to. Once an AP queue is bound vfio_ap
+driver and assigned to a mediated device, admin should not re-assign the
+AP queues back to the default driver, because of technical limitations.
+The kernel does not prevent you from re-assigning from AP queues reserved
+for the vfio_ap driver back to default driver.  Future enhancements in
+the ap_bus and vfio_ap are likely to introduce complete support for such
+dynamic reconfiguration. But until then be extremely careful.
+
diff --git a/MAINTAINERS b/MAINTAINERS
index 5adff9c424d9..5d5c44e83b38 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -12426,6 +12426,7 @@  S:	Supported
 F:	drivers/s390/crypto/vfio_ap_drv.c
 F:	drivers/s390/crypto/vfio_ap_private.h
 F:	drivers/s390/crypto/vfio_ap_ops.c
+F:	Documentation/s390/vfio-ap.txt
 
 S390 ZFCP DRIVER
 M:	Steffen Maier <maier@linux.ibm.com>