Message ID | 1534196899-16987-23-git-send-email-akrowiak@linux.vnet.ibm.com (mailing list archive) |
---|---|
State | New, archived |
Series | guest dedicated crypto adapters |
On Mon, 13 Aug 2018 17:48:19 -0400
Tony Krowiak <akrowiak@linux.vnet.ibm.com> wrote:

> From: Tony Krowiak <akrowiak@linux.ibm.com>
>
> This patch provides documentation describing the AP architecture and
> design concepts behind the virtualization of AP devices. It also
> includes an example of how to configure AP devices for exclusive
> use of KVM guests.
>
> Signed-off-by: Tony Krowiak <akrowiak@linux.ibm.com>
> Reviewed-by: Halil Pasic <pasic@linux.ibm.com>
> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
> ---
> Documentation/s390/vfio-ap.txt | 615 ++++++++++++++++++++++++++++++++++++++++
> MAINTAINERS | 1 +
> 2 files changed, 616 insertions(+), 0 deletions(-)
> create mode 100644 Documentation/s390/vfio-ap.txt
>
> +AP Architectural Overview:
> +=========================
> +To facilitate the comprehension of the design, let's start with some
> +definitions:
> +
> +* AP adapter
> +
> + An AP adapter is an IBM Z adapter card that can perform cryptographic
> + functions. There can be from 0 to 256 adapters assigned to an LPAR. Adapters
> + assigned to the LPAR in which a linux host is running will be available to
> + the linux host. Each adapter is identified by a number from 0 to 255. When
> + installed, an AP adapter is accessed by AP instructions executed by any CPU.
> +
> + The AP adapter cards are assigned to a given LPAR via the system's Activation
> + Profile which can be edited via the HMC. When the system is IPL'd, the AP bus

There's lots of s390 jargon in here... but one hopes that someone
trying to understand AP is already familiar with the basics...

> + module is loaded and detects the AP adapter cards assigned to the LPAR. The AP
> + bus creates a sysfs device for each adapter as they are detected. For example,
> + if AP adapters 4 and 10 (0x0a) are assigned to the LPAR, the AP bus will
> + create the following sysfs entries:
> +
> + /sys/devices/ap/card04
> + /sys/devices/ap/card0a
> +
> + Symbolic links to these devices will also be created in the AP bus devices
> + sub-directory:
> +
> + /sys/bus/ap/devices/[card04]
> + /sys/bus/ap/devices/[card04]
> +
> +* AP domain
> +
> + An adapter is partitioned into domains. Each domain can be thought of as
> + a set of hardware registers for processing AP instructions. An adapter can
> + hold up to 256 domains. Each domain is identified by a number from 0 to 255.
> + Domains can be further classified into two types:
> +
> + * Usage domains are domains that can be accessed directly to process AP
> + commands.
> +
> + * Control domains are domains that are accessed indirectly by AP
> + commands sent to a usage domain to control or change the domain; for
> + example, to set a secure private key for the domain.
> +
> + The AP usage and control domains are assigned to a given LPAR via the system's
> + Activation Profile which can be edited via the HMC. When the system is IPL'd,
> + the AP bus module is loaded and detects the AP usage and control domains
> + assigned to the LPAR. The domain number of each usage domain will be coupled
> + with the adapter number of each AP adapter assigned to the LPAR to identify
> + the AP queues (see AP Queue section below). The domain number of each control
> + domain will be represented in a bitmask and stored in a sysfs file
> + /sys/bus/ap/ap_control_domain_mask created by the bus. The bits in the mask,
> + from most to least significant bit, correspond to domains 0-255.
> +
> + A domain may be assigned to a system as both a usage and control domain, or
> + as a control domain only. Consequently, all domains assigned as both a usage
> + and control domain can both process AP commands as well as be changed by an AP
> + command sent to any usage domain assigned to the same system. Domains assigned
> + only as control domains can not process AP commands but can be changed by AP
> + commands sent to any usage domain assigned to the system.

I'm struggling a bit with this paragraph. Does that mean that you can
use control domains as the target of an instruction changing
configuration on the system? (Or on the VM, if they are listed in the
relevant control block?)

> +
> +* AP Queue
> +
> + An AP queue is the means by which an AP command-request message is sent to a
> + usage domain inside a specific adapter. An AP queue is identified by a tuple
> + comprised of an AP adapter ID (APID) and an AP queue index (APQI). The
> + APQI corresponds to a given usage domain number within the adapter. This tuple
> + forms an AP Queue Number (APQN) uniquely identifying an AP queue. AP
> + instructions include a field containing the APQN to identify the AP queue to
> + which the AP command-request message is to be sent for processing.
> +
> + The AP bus will create a sysfs device for each APQN that can be derived from
> + the cross product of the AP adapter and usage domain numbers detected when the
> + AP bus module is loaded. For example, if adapters 4 and 10 (0x0a) and usage
> + domains 6 and 71 (0x47) are assigned to the LPAR, the AP bus will create the
> + following sysfs entries:
> +
> + /sys/devices/ap/card04/04.0006
> + /sys/devices/ap/card04/04.0047
> + /sys/devices/ap/card0a/0a.0006
> + /sys/devices/ap/card0a/0a.0047
> +
> + The following symbolic links to these devices will be created in the AP bus
> + devices subdirectory:
> +
> + /sys/bus/ap/devices/[04.0006]
> + /sys/bus/ap/devices/[04.0047]
> + /sys/bus/ap/devices/[0a.0006]
> + /sys/bus/ap/devices/[0a.0047]
> +
> +* AP Instructions:
> +
> + There are three AP instructions:
> +
> + * NQAP: to enqueue an AP command-request message to a queue
> + * DQAP: to dequeue an AP command-reply message from a queue
> + * PQAP: to administer the queues

So, NQAP/DQAP need usage domains, while PQAP needs a control domain? Or
is it that all of them need usage domains, but PQAP can target a control
domain as well?

[I don't want to dive deeply into the AP architecture here, just far
enough to really understand the design implications.]

> +
> +AP and SIE:
> +==========
> +Let's now take a look at how AP instructions executed on a guest are interpreted
> +by the hardware.
> +
> +A satellite control block called the Crypto Control Block (CRYCB) is attached to
> +our main hardware virtualization control block. The CRYCB contains three fields
> +to identify the adapters, usage domains and control domains assigned to the KVM
> +guest:
> +
> +* The AP Mask (APM) field is a bit mask that identifies the AP adapters assigned
> + to the KVM guest. Each bit in the mask, from most significant to least
> + significant bit, corresponds to an APID from 0-255. If a bit is set, the
> + corresponding adapter is valid for use by the KVM guest.
> +
> +* The AP Queue Mask (AQM) field is a bit mask identifying the AP usage domains
> + assigned to the KVM guest. Each bit in the mask, from most significant to
> + least significant bit, corresponds to an AP queue index (APQI) from 0-255. If
> + a bit is set, the corresponding queue is valid for use by the KVM guest.
> +
> +* The AP Domain Mask field is a bit mask that identifies the AP control domains
> + assigned to the KVM guest. The ADM bit mask controls which domains can be
> + changed by an AP command-request message sent to a usage domain from the
> + guest. Each bit in the mask, from least significant to most significant bit,
> + corresponds to a domain from 0-255. If a bit is set, the corresponding domain
> + can be modified by an AP command-request message sent to a usage domain
> + configured for the KVM guest.

OK, that seems to imply that you modify a control domain by sending a
request to (any) usage domain? I do not doubt that, but the whole
architecture is really confusing :)

> +
> +If you recall from the description of an AP Queue, AP instructions include
> +an APQN to identify the AP adapter and AP queue to which an AP command-request
> +message is to be sent (NQAP and PQAP instructions), or from which a
> +command-reply message is to be received (DQAP instruction). The validity of an
> +APQN is defined by the matrix calculated from the APM and AQM; it is the
> +cross product of all assigned adapter numbers (APM) with all assigned queue
> +indexes (AQM). For example, if adapters 1 and 2 and usage domains 5 and 6 are
> +assigned to a guest, the APQNs (1,5), (1,6), (2,5) and (2,6) will be valid for
> +the guest.

How does the control domain mask interact with that? Can you send a
command to an APQN valid for the guest to modify any control domain
specified in the mask? Does the SIE complain if you specify a control
domain that the host does not have access to (I'd guess so)?
> +
> +The APQNs can provide secure key functionality - i.e., a private key is stored
> +on the adapter card for each of its domains - so each APQN must be assigned to
> +at most one guest or to the linux host.
> +
> + Example 1: Valid configuration:
> + ------------------------------
> + Guest1: adapters 1,2 domains 5,6
> + Guest2: adapter 1,2 domain 7
> +
> + This is valid because both guests have a unique set of APQNs: Guest1 has
> + APQNs (1,5), (1,6), (2,5) and (2,6); Guest2 has APQNs (1,7) and (2,7).
> +
> + Example 2: Invalid configuration:
> + Guest1: adapters 1,2 domains 5,6
> + Guest2: adapter 1 domains 6,7
> +
> + This is an invalid configuration because both guests have access to
> + APQN (1,6).

So, the adapters or the domains can overlap , but the cross product
mustn't? If I had

Guest1: adapters 1,2 domains 5,6
Guest2: adapters 3,4 domains 5,6

would that be fine?

Is there any rule about shared control domains?

(...)

> +Limitations
> +===========
> +* The KVM/kernel interfaces do not provide a way to prevent unbinding an AP
> + queue that is still assigned to a mediated device. Even if the device
> + 'remove' callback returns an error, the device core detaches the AP
> + queue from the VFIO AP driver. It is therefore incumbent upon the
> + administrator to make sure there is no mediated device to which the
> + APQN - for the AP queue being unbound - is assigned.
> +
> +* Hot plug/unplug of AP devices is not supported for guests.

Not sure what that sentence means. Adding/removing devices by the
hypervisor is not supported? Or some guest actions, respectively
injecting notifications that would trigger some actions on the real
hardware?

Do you want to add (some of) this in the future?

> +
> +* Live guest migration is not supported for guests using AP devices.
Migration and vfio is an interesting area in general :) Would be great
if vfio-ap could benefit from any generic efforts in that area, but
that probably requires that someone with access to documentation and
hardware keeps an eye on developments.

> \ No newline at end of file

Please add one :)
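
The cross-product rule questioned in the review above can be checked mechanically. The following is a minimal sketch, not code from the patch: the helper names `apqns`, `queue_name` and `shared_apqns` are hypothetical, and the `xx.yyyy` formatting is only inferred from the sysfs examples quoted in the patch (adapter as two hex digits, domain as four).

```python
from itertools import product


def apqns(adapters, domains):
    """Return the set of (APID, APQI) pairs: the cross product of both lists."""
    return set(product(adapters, domains))


def queue_name(apid, apqi):
    """Format an APQN like the quoted sysfs entries, e.g. adapter 10, domain 71."""
    return f"{apid:02x}.{apqi:04x}"


def shared_apqns(guest_a, guest_b):
    """Return the APQNs that two (adapters, domains) assignments have in common."""
    return apqns(*guest_a) & apqns(*guest_b)


if __name__ == "__main__":
    # Example 1 from the patch: adapters overlap, but no APQN is shared.
    print(shared_apqns(([1, 2], [5, 6]), ([1, 2], [7])))
    # Example 2 from the patch: both guests would reach APQN (1,6).
    print(shared_apqns(([1, 2], [5, 6]), ([1], [6, 7])))
    # The case asked about in the review: domains overlap, adapters do not.
    print(shared_apqns(([1, 2], [5, 6]), ([3, 4], [5, 6])))
    # Queue names for the LPAR example (adapters 4 and 10, domains 6 and 71).
    print(sorted(queue_name(a, d) for a, d in apqns([4, 10], [6, 71])))
```

An empty intersection means the two assignments do not collide on any queue, which is the only condition the patch text states; it says nothing about shared control domains, so the sketch does not model them.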
On 08/20/2018 12:03 PM, Cornelia Huck wrote:

> On Mon, 13 Aug 2018 17:48:19 -0400
> Tony Krowiak <akrowiak@linux.vnet.ibm.com> wrote:
>
>> From: Tony Krowiak <akrowiak@linux.ibm.com>
>>
>> This patch provides documentation describing the AP architecture and
>> design concepts behind the virtualization of AP devices. It also
>> includes an example of how to configure AP devices for exclusive
>> use of KVM guests.
>>
>> Signed-off-by: Tony Krowiak <akrowiak@linux.ibm.com>
>> Reviewed-by: Halil Pasic <pasic@linux.ibm.com>
>> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
>> ---
>> Documentation/s390/vfio-ap.txt | 615 ++++++++++++++++++++++++++++++++++++++++
>> MAINTAINERS | 1 +
>> 2 files changed, 616 insertions(+), 0 deletions(-)
>> create mode 100644 Documentation/s390/vfio-ap.txt
>>
>> +AP Architectural Overview:
>> +=========================
>> +To facilitate the comprehension of the design, let's start with some
>> +definitions:
>> +
>> +* AP adapter
>> +
>> + An AP adapter is an IBM Z adapter card that can perform cryptographic
>> + functions. There can be from 0 to 256 adapters assigned to an LPAR. Adapters
>> + assigned to the LPAR in which a linux host is running will be available to
>> + the linux host. Each adapter is identified by a number from 0 to 255. When
>> + installed, an AP adapter is accessed by AP instructions executed by any CPU.
>> +
>> + The AP adapter cards are assigned to a given LPAR via the system's Activation
>> + Profile which can be edited via the HMC. When the system is IPL'd, the AP bus
> There's lots of s390 jargon in here... but one hopes that someone
> trying to understand AP is already familiar with the basics...

I'm not quite sure how one can describe s390-specific devices that can
be installed only on an s390 system without using s390 jargon. I would
think that one who is administering a linux host or guest running on an
s390 system would have some basic knowledge of s390. If you have any
suggestions, I'd be happy to entertain them.

>
>> + module is loaded and detects the AP adapter cards assigned to the LPAR. The AP
>> + bus creates a sysfs device for each adapter as they are detected. For example,
>> + if AP adapters 4 and 10 (0x0a) are assigned to the LPAR, the AP bus will
>> + create the following sysfs entries:
>> +
>> + /sys/devices/ap/card04
>> + /sys/devices/ap/card0a
>> +
>> + Symbolic links to these devices will also be created in the AP bus devices
>> + sub-directory:
>> +
>> + /sys/bus/ap/devices/[card04]
>> + /sys/bus/ap/devices/[card04]
>> +
>> +* AP domain
>> +
>> + An adapter is partitioned into domains. Each domain can be thought of as
>> + a set of hardware registers for processing AP instructions. An adapter can
>> + hold up to 256 domains. Each domain is identified by a number from 0 to 255.
>> + Domains can be further classified into two types:
>> +
>> + * Usage domains are domains that can be accessed directly to process AP
>> + commands.
>> +
>> + * Control domains are domains that are accessed indirectly by AP
>> + commands sent to a usage domain to control or change the domain; for
>> + example, to set a secure private key for the domain.
>> +
>> + The AP usage and control domains are assigned to a given LPAR via the system's
>> + Activation Profile which can be edited via the HMC. When the system is IPL'd,
>> + the AP bus module is loaded and detects the AP usage and control domains
>> + assigned to the LPAR. The domain number of each usage domain will be coupled
>> + with the adapter number of each AP adapter assigned to the LPAR to identify
>> + the AP queues (see AP Queue section below). The domain number of each control
>> + domain will be represented in a bitmask and stored in a sysfs file
>> + /sys/bus/ap/ap_control_domain_mask created by the bus. The bits in the mask,
>> + from most to least significant bit, correspond to domains 0-255.
>> +
>> + A domain may be assigned to a system as both a usage and control domain, or
>> + as a control domain only. Consequently, all domains assigned as both a usage
>> + and control domain can both process AP commands as well as be changed by an AP
>> + command sent to any usage domain assigned to the same system. Domains assigned
>> + only as control domains can not process AP commands but can be changed by AP
>> + commands sent to any usage domain assigned to the system.
> I'm struggling a bit with this paragraph. Does that mean that you can
> use control domains as the target of an instruction changing
> configuration on the system? (Or on the VM, if they are listed in the
> relevant control block?)

Only usage domains can be the target of an AP command request message.
If an AP message sent to a usage domain is a request to change a domain,
the number of the domain to be changed will be contained in the command
request message. That domain number must be configured as a control
domain or the AP command will fail.

The fact you are struggling with understanding the last paragraph leads
me to believe it should probably be rewritten, or eliminated. Allow me
to reconsider this section.

>
>> +
>> +* AP Queue
>> +
>> + An AP queue is the means by which an AP command-request message is sent to a
>> + usage domain inside a specific adapter. An AP queue is identified by a tuple
>> + comprised of an AP adapter ID (APID) and an AP queue index (APQI). The
>> + APQI corresponds to a given usage domain number within the adapter. This tuple
>> + forms an AP Queue Number (APQN) uniquely identifying an AP queue. AP
>> + instructions include a field containing the APQN to identify the AP queue to
>> + which the AP command-request message is to be sent for processing.
>> +
>> + The AP bus will create a sysfs device for each APQN that can be derived from
>> + the cross product of the AP adapter and usage domain numbers detected when the
>> + AP bus module is loaded. For example, if adapters 4 and 10 (0x0a) and usage
>> + domains 6 and 71 (0x47) are assigned to the LPAR, the AP bus will create the
>> + following sysfs entries:
>> +
>> + /sys/devices/ap/card04/04.0006
>> + /sys/devices/ap/card04/04.0047
>> + /sys/devices/ap/card0a/0a.0006
>> + /sys/devices/ap/card0a/0a.0047
>> +
>> + The following symbolic links to these devices will be created in the AP bus
>> + devices subdirectory:
>> +
>> + /sys/bus/ap/devices/[04.0006]
>> + /sys/bus/ap/devices/[04.0047]
>> + /sys/bus/ap/devices/[0a.0006]
>> + /sys/bus/ap/devices/[0a.0047]
>> +
>> +* AP Instructions:
>> +
>> + There are three AP instructions:
>> +
>> + * NQAP: to enqueue an AP command-request message to a queue
>> + * DQAP: to dequeue an AP command-reply message from a queue
>> + * PQAP: to administer the queues
> So, NQAP/DQAP need usage domains, while PQAP needs a control domain? Or
> is it that all of them need usage domains, but PQAP can target a control
> domain as well?

All AP instructions - the lone exception being the PQAP(QCI)
subfunction - identify the usage domain that is the target of the
instruction. I think using the term 'control domain' is the source of
much confusion. It makes it seem as if there are two types of domains
that serve different purposes. That is simply not true. A domain is a
partition within an AP adapter that can process AP command request
messages. All AP commands are sent to a domain. Configuring a domain as
a usage domain means it can be used to process AP commands; in other
words, it can be the target of an AP instruction. Configuring a domain
as a control domain means it can be changed by an AP command. AP
commands that change a domain are sent to a usage domain, but the domain
to be changed is specified in the payload of the AP command message. The
domain thus specified must be identified via the AP configuration as a
control domain, or the AP command will be rejected.
>
> [I don't want to dive deeply into the AP architecture here, just far
> enough to really understand the design implications.]

Are you suggesting some of the above should be removed? If so, what?

>
>> +
>> +AP and SIE:
>> +==========
>> +Let's now take a look at how AP instructions executed on a guest are interpreted
>> +by the hardware.
>> +
>> +A satellite control block called the Crypto Control Block (CRYCB) is attached to
>> +our main hardware virtualization control block. The CRYCB contains three fields
>> +to identify the adapters, usage domains and control domains assigned to the KVM
>> +guest:
>> +
>> +* The AP Mask (APM) field is a bit mask that identifies the AP adapters assigned
>> + to the KVM guest. Each bit in the mask, from most significant to least
>> + significant bit, corresponds to an APID from 0-255. If a bit is set, the
>> + corresponding adapter is valid for use by the KVM guest.
>> +
>> +* The AP Queue Mask (AQM) field is a bit mask identifying the AP usage domains
>> + assigned to the KVM guest. Each bit in the mask, from most significant to
>> + least significant bit, corresponds to an AP queue index (APQI) from 0-255. If
>> + a bit is set, the corresponding queue is valid for use by the KVM guest.
>> +
>> +* The AP Domain Mask field is a bit mask that identifies the AP control domains
>> + assigned to the KVM guest. The ADM bit mask controls which domains can be
>> + changed by an AP command-request message sent to a usage domain from the
>> + guest. Each bit in the mask, from least significant to most significant bit,
>> + corresponds to a domain from 0-255. If a bit is set, the corresponding domain
>> + can be modified by an AP command-request message sent to a usage domain
>> + configured for the KVM guest.
> OK, that seems to imply that you modify a control domain by sending a
> request to (any) usage domain?

That is a true statement. In reality, you are just modifying a domain.
The control domain designation identifies a domain that can be
controlled as opposed to used. Maybe if you think of these bitmasks as
access control masks it would clarify things. The AQM specifies domains
to which AP commands can be sent and the ADM specifies domains that can
be changed by an AP command.

> I do not doubt that, but the whole
> architecture is really confusing :)

I couldn't agree more. It took me a while to wrap my head around it.

>
>> +
>> +If you recall from the description of an AP Queue, AP instructions include
>> +an APQN to identify the AP adapter and AP queue to which an AP command-request
>> +message is to be sent (NQAP and PQAP instructions), or from which a
>> +command-reply message is to be received (DQAP instruction). The validity of an
>> +APQN is defined by the matrix calculated from the APM and AQM; it is the
>> +cross product of all assigned adapter numbers (APM) with all assigned queue
>> +indexes (AQM). For example, if adapters 1 and 2 and usage domains 5 and 6 are
>> +assigned to a guest, the APQNs (1,5), (1,6), (2,5) and (2,6) will be valid for
>> +the guest.
> How does the control domain mask interact with that?

The control domain mask does not interact with the other two masks. It
merely specifies which domains can be modified by an AP command. In
fact, the ADM can have bits set that are not included in the AQM; in
other words, a guest can be used to control domains that it can not use.

> Can you send a
> command to an APQN valid for the guest to modify any control domain
> specified in the mask?

Yes.

> Does the SIE complain if you specify a control
> domain that the host does not have access to (I'd guess so)?

The SIE does not complain if you specify a domain to which the host - or
a lower level guest - does not have access. The firmware performs a
logical AND of the guest's and host's - or lower level guest's - APMs,
AQMs and ADMs to create effective masks EAPM, EAQM and EADM.
Only devices corresponding to the bits set in the EAPM, EAQM and EADM
will be accessible by the guest.

>
>> +
>> +The APQNs can provide secure key functionality - i.e., a private key is stored
>> +on the adapter card for each of its domains - so each APQN must be assigned to
>> +at most one guest or to the linux host.
>> +
>> + Example 1: Valid configuration:
>> + ------------------------------
>> + Guest1: adapters 1,2 domains 5,6
>> + Guest2: adapter 1,2 domain 7
>> +
>> + This is valid because both guests have a unique set of APQNs: Guest1 has
>> + APQNs (1,5), (1,6), (2,5) and (2,6); Guest2 has APQNs (1,7) and (2,7).
>> +
>> + Example 2: Invalid configuration:
>> + Guest1: adapters 1,2 domains 5,6
>> + Guest2: adapter 1 domains 6,7
>> +
>> + This is an invalid configuration because both guests have access to
>> + APQN (1,6).
> So, the adapters or the domains can overlap , but the cross product
> mustn't? If I had
>
> Guest1: adapters 1,2 domains 5,6
> Guest2: adapters 3,4 domains 5,6
>
> would that be fine?

Yes, that would be fine because Guest1 would have access to APQNs (1,5),
(1,6), (2,5) and (2,6) while Guest2 would have access to (3,5), (3,6),
(4,5) and (4,6), but neither would have access to the same APQN.

>
> Is there any rule about shared control domains?

AFAIK there isn't, but I will consult with Reinhard about that.

>
> (...)
>
>> +Limitations
>> +===========
>> +* The KVM/kernel interfaces do not provide a way to prevent unbinding an AP
>> + queue that is still assigned to a mediated device. Even if the device
>> + 'remove' callback returns an error, the device core detaches the AP
>> + queue from the VFIO AP driver. It is therefore incumbent upon the
>> + administrator to make sure there is no mediated device to which the
>> + APQN - for the AP queue being unbound - is assigned.
>> +
>> +* Hot plug/unplug of AP devices is not supported for guests.
> Not sure what that sentence means. Adding/removing devices by the
> hypervisor is not supported? Or some guest actions, respectively
> injecting notifications that would trigger some actions on the real
> hardware?

No means is provided to modify a guest's AP matrix - i.e., APM, AQM and
ADM - while a guest is running. Once a guest is running, its AP
configuration can not be changed dynamically.

>
> Do you want to add (some of) this in the future?

Yes, we plan to introduce dynamic configurations in future releases.

>
>> +
>> +* Live guest migration is not supported for guests using AP devices.
> Migration and vfio is an interesting area in general :) Would be great
> if vfio-ap could benefit from any generic efforts in that area, but
> that probably requires that someone with access to documentation and
> hardware keeps an eye on developments.

I have briefly looked at some of the articles talking about live
migration of passthrough devices, but nothing seemed applicable to AP
architecture. From my limited perspective, it would seem that
architectural changes would have to be implemented to fully support live
migration of in-process AP queues.

>
>> \ No newline at end of file
> Please add one :)

Will do.

>
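
The effective-mask computation described in the reply above (a logical AND of the guest's and host's masks) can be illustrated with a small sketch. This is not firmware or kernel code: the function names are hypothetical, masks are modelled as plain Python integers, and the sketch assumes the most-significant-bit-first numbering that the patch describes for the APM and AQM (ID 0 is the leftmost of 256 bits).

```python
MASK_BITS = 256  # the patch describes IDs in the range 0-255


def mask_from_ids(ids):
    """Build a mask in which ID 0 maps to the most significant bit (assumed)."""
    mask = 0
    for i in ids:
        if not 0 <= i < MASK_BITS:
            raise ValueError(f"ID {i} is outside 0-{MASK_BITS - 1}")
        mask |= 1 << (MASK_BITS - 1 - i)
    return mask


def ids_from_mask(mask):
    """List the IDs whose bits are set, in ascending order."""
    return [i for i in range(MASK_BITS) if mask & (1 << (MASK_BITS - 1 - i))]


def effective_ids(guest_ids, host_ids):
    """AND the guest and host masks, as the reply says the firmware does."""
    return ids_from_mask(mask_from_ids(guest_ids) & mask_from_ids(host_ids))


if __name__ == "__main__":
    # Hypothetical numbers: the guest asks for adapters 1-3, the host has 2-4.
    print(effective_ids([1, 2, 3], [2, 3, 4]))
```

Under these assumptions an ID that is set only in the guest's mask simply drops out of the effective mask, which matches the statement that the SIE does not complain about it.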
On 20.08.2018 18:03, Cornelia Huck wrote:

> On Mon, 13 Aug 2018 17:48:19 -0400
> Tony Krowiak <akrowiak@linux.vnet.ibm.com> wrote:
>
>> From: Tony Krowiak <akrowiak@linux.ibm.com>
>>
>> This patch provides documentation describing the AP architecture and
>> design concepts behind the virtualization of AP devices. It also
>> includes an example of how to configure AP devices for exclusive
>> use of KVM guests.
>>
>> Signed-off-by: Tony Krowiak <akrowiak@linux.ibm.com>
>> Reviewed-by: Halil Pasic <pasic@linux.ibm.com>
>> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
>> ---
>> Documentation/s390/vfio-ap.txt | 615 ++++++++++++++++++++++++++++++++++++++++
>> MAINTAINERS | 1 +
>> 2 files changed, 616 insertions(+), 0 deletions(-)
>> create mode 100644 Documentation/s390/vfio-ap.txt
>>
>> +AP Architectural Overview:
>> +=========================
>> +To facilitate the comprehension of the design, let's start with some
>> +definitions:
>> +
>> +* AP adapter
>> +
>> + An AP adapter is an IBM Z adapter card that can perform cryptographic
>> + functions. There can be from 0 to 256 adapters assigned to an LPAR. Adapters
>> + assigned to the LPAR in which a linux host is running will be available to
>> + the linux host. Each adapter is identified by a number from 0 to 255. When
>> + installed, an AP adapter is accessed by AP instructions executed by any CPU.
>> +
>> + The AP adapter cards are assigned to a given LPAR via the system's Activation
>> + Profile which can be edited via the HMC. When the system is IPL'd, the AP bus
> There's lots of s390 jargon in here... but one hopes that someone
> trying to understand AP is already familiar with the basics...
>
>> + module is loaded and detects the AP adapter cards assigned to the LPAR. The AP
>> + bus creates a sysfs device for each adapter as they are detected. For example,
>> + if AP adapters 4 and 10 (0x0a) are assigned to the LPAR, the AP bus will
>> + create the following sysfs entries:
>> +
>> + /sys/devices/ap/card04
>> + /sys/devices/ap/card0a
>> +
>> + Symbolic links to these devices will also be created in the AP bus devices
>> + sub-directory:
>> +
>> + /sys/bus/ap/devices/[card04]
>> + /sys/bus/ap/devices/[card04]
>> +
>> +* AP domain
>> +
>> + An adapter is partitioned into domains. Each domain can be thought of as
>> + a set of hardware registers for processing AP instructions. An adapter can
>> + hold up to 256 domains. Each domain is identified by a number from 0 to 255.
>> + Domains can be further classified into two types:
>> +
>> + * Usage domains are domains that can be accessed directly to process AP
>> + commands.
>> +
>> + * Control domains are domains that are accessed indirectly by AP
>> + commands sent to a usage domain to control or change the domain; for
>> + example, to set a secure private key for the domain.
>> +
>> + The AP usage and control domains are assigned to a given LPAR via the system's
>> + Activation Profile which can be edited via the HMC. When the system is IPL'd,
>> + the AP bus module is loaded and detects the AP usage and control domains
>> + assigned to the LPAR. The domain number of each usage domain will be coupled
>> + with the adapter number of each AP adapter assigned to the LPAR to identify
>> + the AP queues (see AP Queue section below). The domain number of each control
>> + domain will be represented in a bitmask and stored in a sysfs file
>> + /sys/bus/ap/ap_control_domain_mask created by the bus. The bits in the mask,
>> + from most to least significant bit, correspond to domains 0-255.
>> +
>> + A domain may be assigned to a system as both a usage and control domain, or
>> + as a control domain only. Consequently, all domains assigned as both a usage
>> + and control domain can both process AP commands as well as be changed by an AP
>> + command sent to any usage domain assigned to the same system. Domains assigned
>> + only as control domains can not process AP commands but can be changed by AP
>> + commands sent to any usage domain assigned to the system.
> I'm struggling a bit with this paragraph. Does that mean that you can
> use control domains as the target of an instruction changing
> configuration on the system? (Or on the VM, if they are listed in the
> relevant control block?)

Yes. You can send a CPRB to a (usage) domain which includes a command
for controlling another (control) domain.

>
>> +
>> +* AP Queue
>> +
>> + An AP queue is the means by which an AP command-request message is sent to a
>> + usage domain inside a specific adapter. An AP queue is identified by a tuple
>> + comprised of an AP adapter ID (APID) and an AP queue index (APQI). The
>> + APQI corresponds to a given usage domain number within the adapter. This tuple
>> + forms an AP Queue Number (APQN) uniquely identifying an AP queue. AP
>> + instructions include a field containing the APQN to identify the AP queue to
>> + which the AP command-request message is to be sent for processing.
>> +
>> + The AP bus will create a sysfs device for each APQN that can be derived from
>> + the cross product of the AP adapter and usage domain numbers detected when the
>> + AP bus module is loaded. For example, if adapters 4 and 10 (0x0a) and usage
>> + domains 6 and 71 (0x47) are assigned to the LPAR, the AP bus will create the
>> + following sysfs entries:
>> +
>> + /sys/devices/ap/card04/04.0006
>> + /sys/devices/ap/card04/04.0047
>> + /sys/devices/ap/card0a/0a.0006
>> + /sys/devices/ap/card0a/0a.0047
>> +
>> + The following symbolic links to these devices will be created in the AP bus
>> + devices subdirectory:
>> +
>> + /sys/bus/ap/devices/[04.0006]
>> + /sys/bus/ap/devices/[04.0047]
>> + /sys/bus/ap/devices/[0a.0006]
>> + /sys/bus/ap/devices/[0a.0047]
>> +
>> +* AP Instructions:
>> +
>> + There are three AP instructions:
>> +
>> + * NQAP: to enqueue an AP command-request message to a queue
>> + * DQAP: to dequeue an AP command-reply message from a queue
>> + * PQAP: to administer the queues
> So, NQAP/DQAP need usage domains, while PQAP needs a control domain? Or
> is it that all of them need usage domains, but PQAP can target a control
> domain as well?
>
> [I don't want to dive deeply into the AP architecture here, just far
> enough to really understand the design implications.]

Well, to be honest, nobody ever tried this under Linux. Theoretically
one should be able to send a CPRB to a usage domain where inside the
CPRB another domain (the control domain) is addressed. However, as of
now I am only aware of applications controlling the same usage domain.
I don't know any application which is able to address another control
domain and I am not sure if the zcrypt device driver would handle such
a CPRB correctly.

NQAP, DQAP and PQAP always address a usage domain. But the CPRB sent
down the pipe via NQAP may address some control thing on another
domain. I am not sure which code does the sorting out here, or where.
There are two candidates: the firmware layer in the CEC and the crypto
card code.

>
>> +
>> +AP and SIE:
>> +==========
>> +Let's now take a look at how AP instructions executed on a guest are interpreted
>> +by the hardware.
>> + >> +A satellite control block called the Crypto Control Block (CRYCB) is attached to >> +our main hardware virtualization control block. The CRYCB contains three fields >> +to identify the adapters, usage domains and control domains assigned to the KVM >> +guest: >> + >> +* The AP Mask (APM) field is a bit mask that identifies the AP adapters assigned >> + to the KVM guest. Each bit in the mask, from most significant to least >> + significant bit, corresponds to an APID from 0-255. If a bit is set, the >> + corresponding adapter is valid for use by the KVM guest. >> + >> +* The AP Queue Mask (AQM) field is a bit mask identifying the AP usage domains >> + assigned to the KVM guest. Each bit in the mask, from most significant to >> + least significant bit, corresponds to an AP queue index (APQI) from 0-255. If >> + a bit is set, the corresponding queue is valid for use by the KVM guest. >> + >> +* The AP Domain Mask field is a bit mask that identifies the AP control domains >> + assigned to the KVM guest. The ADM bit mask controls which domains can be >> + changed by an AP command-request message sent to a usage domain from the >> + guest. Each bit in the mask, from least significant to most significant bit, >> + corresponds to a domain from 0-255. If a bit is set, the corresponding domain >> + can be modified by an AP command-request message sent to a usage domain >> + configured for the KVM guest. > OK, that seems to imply that you modify a control domain by sending a > request to (any) usage domain? I do not doubt that, but the whole > architecture is really confusing :) > >> + >> +If you recall from the description of an AP Queue, AP instructions include >> +an APQN to identify the AP adapter and AP queue to which an AP command-request >> +message is to be sent (NQAP and PQAP instructions), or from which a >> +command-reply message is to be received (DQAP instruction). 
The validity of an >> +APQN is defined by the matrix calculated from the APM and AQM; it is the >> +cross product of all assigned adapter numbers (APM) with all assigned queue >> +indexes (AQM). For example, if adapters 1 and 2 and usage domains 5 and 6 are >> +assigned to a guest, the APQNs (1,5), (1,6), (2,5) and (2,6) will be valid for >> +the guest. > How does the control domain mask interact with that? Can you send a > command to an APQN valid for the guest to modify any control domain > specified in the mask? Does the SIE complain if you specify a control > domain that the host does not have access to (I'd guess so)? > >> + >> +The APQNs can provide secure key functionality - i.e., a private key is stored >> +on the adapter card for each of its domains - so each APQN must be assigned to >> +at most one guest or to the linux host. >> + >> + Example 1: Valid configuration: >> + ------------------------------ >> + Guest1: adapters 1,2 domains 5,6 >> + Guest2: adapter 1,2 domain 7 >> + >> + This is valid because both guests have a unique set of APQNs: Guest1 has >> + APQNs (1,5), (1,6), (2,5) and (2,6); Guest2 has APQNs (1,7) and (2,7). >> + >> + Example 2: Invalid configuration: >> + Guest1: adapters 1,2 domains 5,6 >> + Guest2: adapter 1 domains 6,7 >> + >> + This is an invalid configuration because both guests have access to >> + APQN (1,6). > So, the adapters or the domains can overlap , but the cross product > mustn't? If I had > > Guest1: adapters 1,2 domains 5,6 > Guest2: adapters 3,4 domains 5,6 > > would that be fine? > > Is there any rule about shared control domains? > > (...) > >> +Limitations >> +=========== >> +* The KVM/kernel interfaces do not provide a way to prevent unbinding an AP >> + queue that is still assigned to a mediated device. Even if the device >> + 'remove' callback returns an error, the device core detaches the AP >> + queue from the VFIO AP driver. 
It is therefore incumbent upon the >> + administrator to make sure there is no mediated device to which the >> + APQN - for the AP queue being unbound - is assigned. >> + >> +* Hot plug/unplug of AP devices is not supported for guests. > Not sure what that sentence means. Adding/removing devices by the > hypervisor is not supported? Or some guest actions, respectively > injecting notifications that would trigger some actions on the real > hardware? > > Do you want to add (some of) this in the future? > >> + >> +* Live guest migration is not supported for guests using AP devices. > Migration and vfio is an interesting area in general :) Would be great > if vfio-ap could benefit from any generic efforts in that area, but > that probably requires that someone with access to documentation and > hardware keeps an eye on developments. > >> \ No newline at end of file > Please add one :) >
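Editorial illustration of the sysfs naming quoted above: a minimal Python sketch that derives the APQN device paths from the cross product of adapter and usage domain numbers. The function name apqn_sysfs_paths is hypothetical, and the name pattern (two hex digits for the APID, four for the APQI) is an assumption read off the example paths in the patch text, not taken from the AP bus code.

```python
from itertools import product


def apqn_sysfs_paths(adapters, usage_domains):
    """Return the sysfs device paths for every (APID, APQI) pair.

    Assumption: names follow the pattern visible in the quoted example,
    i.e. cardXX/XX.YYYY in lowercase hex.
    """
    return [
        f"/sys/devices/ap/card{apid:02x}/{apid:02x}.{apqi:04x}"
        for apid, apqi in product(sorted(adapters), sorted(usage_domains))
    ]


# Adapters 4 and 10 (0x0a), usage domains 6 and 71 (0x47), as in the patch text.
paths = apqn_sysfs_paths({4, 10}, {6, 71})
for path in paths:
    print(path)
```

With the values from the quoted example this prints the same four entries listed in the patch text, card04/04.0006 through card0a/0a.0047.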
On Tue, 21 Aug 2018 11:00:00 +0200 Harald Freudenberger <freude@linux.ibm.com> wrote: > On 20.08.2018 18:03, Cornelia Huck wrote: > > On Mon, 13 Aug 2018 17:48:19 -0400 > > Tony Krowiak <akrowiak@linux.vnet.ibm.com> wrote: > >> +* AP Instructions: > >> + > >> + There are three AP instructions: > >> + > >> + * NQAP: to enqueue an AP command-request message to a queue > >> + * DQAP: to dequeue an AP command-reply message from a queue > >> + * PQAP: to administer the queues > > So, NQAP/DQAP need usage domains, while PQAP needs a control domain? Or > > is it that all of them need usage domains, but PQAP can target a control > > domain as well? > > > > [I don't want to dive deeply into the AP architecture here, just far > > enough to really understand the design implications.] > Well, to be honest, nobody ever tried this under Linux. Theoretically > one should be able to send a CPRB to a usage domain where inside > the CPRB another domain (the control domain) is addressed. However, > as of now I am only aware of applications controlling the same usage > domain. I don't know any application which is able to address another > control domain and I am not sure if the zcrypt device driver would > handle such a CPRB correctly. NQAP, DQAP and PQAP always address > a usage domain. But the CPRB send down the pipe via NQAP may > address some control thing on another domain. I am not sure which > code and where do the sorting out here. There are two candidates: > the firmware layer in the CEC and the crypto card code. OK, so it's possible as by the architecture, but at least Linux does not (currently) do it? Perhaps we should simply not overthink that whole control domain thingy :) It's mostly yet another knob, and as long as the design does not go against the general architecture, it's probably fine, I guess.
On Mon, 20 Aug 2018 16:16:15 -0400 Tony Krowiak <akrowiak@linux.ibm.com> wrote: > On 08/20/2018 12:03 PM, Cornelia Huck wrote: > > On Mon, 13 Aug 2018 17:48:19 -0400 > > Tony Krowiak <akrowiak@linux.vnet.ibm.com> wrote: > >> +AP Architectural Overview: > >> +========================= > >> +To facilitate the comprehension of the design, let's start with some > >> +definitions: > >> + > >> +* AP adapter > >> + > >> + An AP adapter is an IBM Z adapter card that can perform cryptographic > >> + functions. There can be from 0 to 256 adapters assigned to an LPAR. Adapters > >> + assigned to the LPAR in which a linux host is running will be available to > >> + the linux host. Each adapter is identified by a number from 0 to 255. When > >> + installed, an AP adapter is accessed by AP instructions executed by any CPU. > >> + > >> + The AP adapter cards are assigned to a given LPAR via the system's Activation > >> + Profile which can be edited via the HMC. When the system is IPL'd, the AP bus > > There's lots of s390 jargon in here... but one hopes that someone > > trying to understand AP is already familiar with the basics... > > I'm not quite sure how one can describe s390-specific devices that can > be installed > only on an s390 system without using s390 jargon. I would think that one > who is > administering a linux host or guest running on an s390 system would have > some > basic knowledge of s390. If you have any suggestions, I'd be happy to > entertain them. I fear the jargon is mostly unavoidable :( > >> +* AP Instructions: > >> + > >> + There are three AP instructions: > >> + > >> + * NQAP: to enqueue an AP command-request message to a queue > >> + * DQAP: to dequeue an AP command-reply message from a queue > >> + * PQAP: to administer the queues > > So, NQAP/DQAP need usage domains, while PQAP needs a control domain? Or > > is it that all of them need usage domains, but PQAP can target a control > > domain as well? 
> > All AP instructions - the lone exception being the PQAP(QCI) subfunction - > identify the usage domain that is the target of the instruction. I think > using the term 'control domain' is the source of much confusion. It makes > it seem as if there are two types of domains that serve different purposes. > That is simply not true. A domain is a partition within an AP adapter that > can process AP command request messages. All AP commands are sent to a > domain. Configuring a domain as a usage domain means it can be used to > process AP commands; in other words, it can be the target of an AP > instruction. Configuring a domain as a control domain means it can be > changed by an AP command. AP commands that change a domain are sent to > a usage domain, but the domain to be changed is specified in the payload > of the AP command message. The domain thus specified must be > identified via the AP configuration as a control domain, or the AP command > will be rejected. Yes, the 'control domain' term is a source of much confusion :( > > > > > [I don't want to dive deeply into the AP architecture here, just far > > enough to really understand the design implications.] > > Are you suggesting some of the above should be removed? If so, what? Not removed. What about an explanation like the following somewhere: "AP instructions identify the domain that is targeted to process the command: This must be one of the usage domains. They may modify a domain that is not one of the usage domains, but the modified domain must be one of the control domains." I hope that is both correct and understandable ;) > > Does the SIE complain if you specify a control > > domain that the host does not have access to (I'd guess so)? > > The SIE does not complain if you specify a domain to which the host - or a > lower level guest - does not have access. 
The firmware performs a logical > AND of the guest's and hosts's - or lower level guest's - APMs, AQMs and > ADMs > to create effective masks EAPM, EAQM and EADM. Only devices corresponding to > the bits set in the EAPM, EAQM and EADM will be accessible by the guest. OK, so the guest effectively won't see the domain. That makes sense. > > > > >> + > >> +The APQNs can provide secure key functionality - i.e., a private key is stored > >> +on the adapter card for each of its domains - so each APQN must be assigned to > >> +at most one guest or to the linux host. > >> + > >> + Example 1: Valid configuration: > >> + ------------------------------ > >> + Guest1: adapters 1,2 domains 5,6 > >> + Guest2: adapter 1,2 domain 7 > >> + > >> + This is valid because both guests have a unique set of APQNs: Guest1 has > >> + APQNs (1,5), (1,6), (2,5) and (2,6); Guest2 has APQNs (1,7) and (2,7). > >> + > >> + Example 2: Invalid configuration: > >> + Guest1: adapters 1,2 domains 5,6 > >> + Guest2: adapter 1 domains 6,7 > >> + > >> + This is an invalid configuration because both guests have access to > >> + APQN (1,6). > > So, the adapters or the domains can overlap , but the cross product > > mustn't? If I had > > > > Guest1: adapters 1,2 domains 5,6 > > Guest2: adapters 3,4 domains 5,6 > > > > would that be fine? > > Yes, that would be fine because Guest1 would have access to APQNs > (1,5), (1,6), (2,5) and (2,6) while Guest2 would have access to > (3,5), (3,6), (4,5) AND (4,6), but neither would have access to > the same APQN. Might be a good idea to add this as an additional example. > > > > > Is there any rule about shared control domains? > > AFAIK there isn't, but I will consult with Reinhard about that. > > > > > (...) > > > >> +Limitations > >> +=========== > >> +* The KVM/kernel interfaces do not provide a way to prevent unbinding an AP > >> + queue that is still assigned to a mediated device. 
Even if the device > >> + 'remove' callback returns an error, the device core detaches the AP > >> + queue from the VFIO AP driver. It is therefore incumbent upon the > >> + administrator to make sure there is no mediated device to which the > >> + APQN - for the AP queue being unbound - is assigned. > >> + > >> +* Hot plug/unplug of AP devices is not supported for guests. > > Not sure what that sentence means. Adding/removing devices by the > > hypervisor is not supported? Or some guest actions, respectively > > injecting notifications that would trigger some actions on the real > > hardware? > > No means is provided to modify a guest's AP matrix - i.e., APM, AQM > and ADM - while a guest is running. Once a guest is running, its AP > configuration can not be changed dynamically. > > > > > Do you want to add (some of) this in the future? > > Yes, we plan to introduce dynamic configurations in future releases. What about the following sentence: "Dynamically modifying the AP matrix for a running guest (which would amount to hot(un)plug of AP devices for the guest) is currently not supported." > > > > >> + > >> +* Live guest migration is not supported for guests using AP devices. > > Migration and vfio is an interesting area in general :) Would be great > > if vfio-ap could benefit from any generic efforts in that area, but > > that probably requires that someone with access to documentation and > > hardware keeps an eye on developments. > > I have briefly looked at some of the articles talking about live migration > of passthrough devices, but nothing seemed applicable to AP architecture. Most of the approaches to live migration of vfio devices are focused on pci devices; even ccw devices have different needs. Any halfway generic approach would need a common part and a backend-specific part anyway, I think. > From my limited perspective, it would seem that architectural changes > would have to be implemented to fully support live migration of in-process > AP queues. 
From what I have seen of the AP virtualization architecture, this may very well be the case. I'll keep AP in the back of my head, but it's probably better to focus on the easier targets first.
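Editorial illustration of the sharing rule discussed above: each guest's valid APQNs are the cross product of its adapters and usage domains, and two guests conflict only if those sets intersect. This is a minimal Python sketch; the helper names apqns and shared_apqns are hypothetical and the code is not taken from the vfio_ap driver.

```python
from itertools import product


def apqns(adapters, domains):
    """All (APID, APQI) pairs a guest can address."""
    return set(product(adapters, domains))


def shared_apqns(guest_a, guest_b):
    """APQNs that both guests could reach; must be empty for a valid setup."""
    return apqns(*guest_a) & apqns(*guest_b)


guest1 = ({1, 2}, {5, 6})

# Example 1 from the patch: Guest2 has adapters 1,2 and domain 7.
example1 = shared_apqns(guest1, ({1, 2}, {7}))
# Example 2 from the patch: Guest2 has adapter 1 and domains 6,7.
example2 = shared_apqns(guest1, ({1}, {6, 7}))
# The additional case raised in review: Guest2 has adapters 3,4 and domains 5,6.
example3 = shared_apqns(guest1, ({3, 4}, {5, 6}))

print(example1, example2, example3)
```

Only the second configuration yields a non-empty intersection, (1,6), which matches the patch text and the answer given in the thread: adapters or domains may overlap individually as long as no APQN is shared.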
On 08/20/2018 10:16 PM, Tony Krowiak wrote:
>> Does the SIE complain if you specify a control
>> domain that the host does not have access to (I'd guess so)?
>
> The SIE does not complain if you specify a domain to which the host - or a
> lower level guest - does not have access. The firmware performs a logical
> AND of the guest's and hosts's - or lower level guest's - APMs, AQMs and ADMs

Rather a bit-wise AND, I guess (of the same type masks corresponding to
Guest 1 and Guest 2). The result of a logical AND is a logical value
(true or false) as far as I remember.

> to create effective masks EAPM, EAQM and EADM. Only devices corresponding to
> the bits set in the EAPM, EAQM and EADM will be accessible by the guest.

I'm not sure what the intended meaning of 'the SIE complains' is. If it
means getting out of SIE (when interpreting, let's say, an NQAP under the
discussed circumstances) with some sort of error code, I think Tony's
answer, 'SIE does not complain', makes a lot of sense. It's the guest that
is trying to stretch further than the blanket reaches, and it's the guest
that needs to be educated on this fact.

AFAIR SIE does the right thing (whatever the right thing is) and we don't
have to worry about it.

As a matter of fact I can't recall exactly what is supposed to happen when
a guest tries to modify a domain that the guest does not have the
privileges to modify (in terms of EADM, either because the guest or because
the host does not have the corresponding bit set). I'm sure I did not try
it out. Tony, did you test this scenario? (BTW my best guess at the moment
is that the situation is handled via the command-reply.)

Regards,
Halil
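Editorial illustration of the bit-wise AND mentioned above: a minimal Python sketch that models a 256-bit mask as an integer and computes an effective mask from a guest mask and a host mask. The helper names are hypothetical, and the bit numbering (bit 0 is the most significant bit, as the patch text states for the APM and AQM) is an assumption of the model; the AND result does not depend on it. This only mirrors the description in the thread, not actual firmware behaviour.

```python
MASK_BITS = 256  # masks cover IDs 0-255


def mask_from_ids(ids):
    """Build a mask with one bit per ID; ID 0 maps to the most significant bit."""
    mask = 0
    for i in ids:
        mask |= 1 << (MASK_BITS - 1 - i)
    return mask


def ids_from_mask(mask):
    """List the IDs whose bits are set, in ascending order."""
    return [i for i in range(MASK_BITS) if mask & (1 << (MASK_BITS - 1 - i))]


def effective_mask(guest_mask, host_mask):
    """Bit-wise AND: a bit survives only if it is set in both masks."""
    return guest_mask & host_mask


host_adm = mask_from_ids({5, 6, 7})
guest_adm = mask_from_ids({6, 7, 8})
eadm = effective_mask(guest_adm, host_adm)
print(ids_from_mask(eadm))  # [6, 7]
```

Domain 8 is set in the guest mask but not in the host mask, so it drops out of the effective mask without any error being raised, which is the "guest effectively won't see the domain" outcome described earlier in the thread.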
On 08/21/2018 12:13 PM, Cornelia Huck wrote: > On Mon, 20 Aug 2018 16:16:15 -0400 > Tony Krowiak <akrowiak@linux.ibm.com> wrote: > >> On 08/20/2018 12:03 PM, Cornelia Huck wrote: >>> On Mon, 13 Aug 2018 17:48:19 -0400 >>> Tony Krowiak <akrowiak@linux.vnet.ibm.com> wrote: >>>> +AP Architectural Overview: >>>> +========================= >>>> +To facilitate the comprehension of the design, let's start with some >>>> +definitions: >>>> + >>>> +* AP adapter >>>> + >>>> + An AP adapter is an IBM Z adapter card that can perform cryptographic >>>> + functions. There can be from 0 to 256 adapters assigned to an LPAR. Adapters >>>> + assigned to the LPAR in which a linux host is running will be available to >>>> + the linux host. Each adapter is identified by a number from 0 to 255. When >>>> + installed, an AP adapter is accessed by AP instructions executed by any CPU. >>>> + >>>> + The AP adapter cards are assigned to a given LPAR via the system's Activation >>>> + Profile which can be edited via the HMC. When the system is IPL'd, the AP bus >>> There's lots of s390 jargon in here... but one hopes that someone >>> trying to understand AP is already familiar with the basics... >> I'm not quite sure how one can describe s390-specific devices that can >> be installed >> only on an s390 system without using s390 jargon. I would think that one >> who is >> administering a linux host or guest running on an s390 system would have >> some >> basic knowledge of s390. If you have any suggestions, I'd be happy to >> entertain them. > I fear the jargon is mostly unavoidable :( > >>>> +* AP Instructions: >>>> + >>>> + There are three AP instructions: >>>> + >>>> + * NQAP: to enqueue an AP command-request message to a queue >>>> + * DQAP: to dequeue an AP command-reply message from a queue >>>> + * PQAP: to administer the queues >>> So, NQAP/DQAP need usage domains, while PQAP needs a control domain? 
Or >>> is it that all of them need usage domains, but PQAP can target a control >>> domain as well? >> All AP instructions - the lone exception being the PQAP(QCI) subfunction - >> identify the usage domain that is the target of the instruction. I think >> using the term 'control domain' is the source of much confusion. It makes >> it seem as if there are two types of domains that serve different purposes. >> That is simply not true. A domain is a partition within an AP adapter that >> can process AP command request messages. All AP commands are sent to a >> domain. Configuring a domain as a usage domain means it can be used to >> process AP commands; in other words, it can be the target of an AP >> instruction. Configuring a domain as a control domain means it can be >> changed by an AP command. AP commands that change a domain are sent to >> a usage domain, but the domain to be changed is specified in the payload >> of the AP command message. The domain thus specified must be >> identified via the AP configuration as a control domain, or the AP command >> will be rejected. > Yes, the 'control domain' term is a source of much confusion :( > >>> [I don't want to dive deeply into the AP architecture here, just far >>> enough to really understand the design implications.] >> Are you suggesting some of the above should be removed? If so, what? > Not removed. What about an explanation like the following somewhere: > > "AP instructions identify the domain that is targeted to process the > command: This must be one of the usage domains. They may modify a > domain that is not one of the usage domains, but the modified domain > must be one of the control domains." > > I hope that is both correct and understandable ;) Yes, it is both correct and understandable. > >>> Does the SIE complain if you specify a control >>> domain that the host does not have access to (I'd guess so)? 
>> The SIE does not complain if you specify a domain to which the host - or a >> lower level guest - does not have access. The firmware performs a logical >> AND of the guest's and hosts's - or lower level guest's - APMs, AQMs and >> ADMs >> to create effective masks EAPM, EAQM and EADM. Only devices corresponding to >> the bits set in the EAPM, EAQM and EADM will be accessible by the guest. > OK, so the guest effectively won't see the domain. That makes sense. It is one of the positive aspects of the architecture. > >>> >>>> + >>>> +The APQNs can provide secure key functionality - i.e., a private key is stored >>>> +on the adapter card for each of its domains - so each APQN must be assigned to >>>> +at most one guest or to the linux host. >>>> + >>>> + Example 1: Valid configuration: >>>> + ------------------------------ >>>> + Guest1: adapters 1,2 domains 5,6 >>>> + Guest2: adapter 1,2 domain 7 >>>> + >>>> + This is valid because both guests have a unique set of APQNs: Guest1 has >>>> + APQNs (1,5), (1,6), (2,5) and (2,6); Guest2 has APQNs (1,7) and (2,7). >>>> + >>>> + Example 2: Invalid configuration: >>>> + Guest1: adapters 1,2 domains 5,6 >>>> + Guest2: adapter 1 domains 6,7 >>>> + >>>> + This is an invalid configuration because both guests have access to >>>> + APQN (1,6). >>> So, the adapters or the domains can overlap , but the cross product >>> mustn't? If I had >>> >>> Guest1: adapters 1,2 domains 5,6 >>> Guest2: adapters 3,4 domains 5,6 >>> >>> would that be fine? >> Yes, that would be fine because Guest1 would have access to APQNs >> (1,5), (1,6), (2,5) and (2,6) while Guest2 would have access to >> (3,5), (3,6), (4,5) AND (4,6), but neither would have access to >> the same APQN. > Might be a good idea to add this as an additional example. Will do > >>> Is there any rule about shared control domains? >> AFAIK there isn't, but I will consult with Reinhard about that. >> >>> (...) 
>>> >>>> +Limitations >>>> +=========== >>>> +* The KVM/kernel interfaces do not provide a way to prevent unbinding an AP >>>> + queue that is still assigned to a mediated device. Even if the device >>>> + 'remove' callback returns an error, the device core detaches the AP >>>> + queue from the VFIO AP driver. It is therefore incumbent upon the >>>> + administrator to make sure there is no mediated device to which the >>>> + APQN - for the AP queue being unbound - is assigned. >>>> + >>>> +* Hot plug/unplug of AP devices is not supported for guests. >>> Not sure what that sentence means. Adding/removing devices by the >>> hypervisor is not supported? Or some guest actions, respectively >>> injecting notifications that would trigger some actions on the real >>> hardware? >> No means is provided to modify a guest's AP matrix - i.e., APM, AQM >> and ADM - while a guest is running. Once a guest is running, its AP >> configuration can not be changed dynamically. >> >>> Do you want to add (some of) this in the future? >> Yes, we plan to introduce dynamic configurations in future releases. > What about the following sentence: > > "Dynamically modifying the AP matrix for a running guest (which would > amount to hot(un)plug of AP devices for the guest) is currently not > supported." Sounds fine to me. > >>> >>>> + >>>> +* Live guest migration is not supported for guests using AP devices. >>> Migration and vfio is an interesting area in general :) Would be great >>> if vfio-ap could benefit from any generic efforts in that area, but >>> that probably requires that someone with access to documentation and >>> hardware keeps an eye on developments. >> I have briefly looked at some of the articles talking about live migration >> of passthrough devices, but nothing seemed applicable to AP architecture. > Most of the approaches to live migration of vfio devices are focused on > pci devices; even ccw devices have different needs. 
Any halfway generic > approach would need a common part and a backend-specific part anyway, I > think. Yes, that would seem to be the case. > >> From my limited perspective, it would seem that architectural changes >> would have to be implemented to fully support live migration of in-process >> AP queues. > From what I have seen of the AP virtualization architecture, this may > very well be the case. I'll keep AP in the back of my head, but it's > probably better to focus on the easier targets first. That has been our goal from the start. >
On 21.08.2018 17:53, Cornelia Huck wrote: > On Tue, 21 Aug 2018 11:00:00 +0200 > Harald Freudenberger <freude@linux.ibm.com> wrote: > >> On 20.08.2018 18:03, Cornelia Huck wrote: >>> On Mon, 13 Aug 2018 17:48:19 -0400 >>> Tony Krowiak <akrowiak@linux.vnet.ibm.com> wrote: >>>> +* AP Instructions: >>>> + >>>> + There are three AP instructions: >>>> + >>>> + * NQAP: to enqueue an AP command-request message to a queue >>>> + * DQAP: to dequeue an AP command-reply message from a queue >>>> + * PQAP: to administer the queues >>> So, NQAP/DQAP need usage domains, while PQAP needs a control domain? Or >>> is it that all of them need usage domains, but PQAP can target a control >>> domain as well? >>> >>> [I don't want to dive deeply into the AP architecture here, just far >>> enough to really understand the design implications.] >> Well, to be honest, nobody ever tried this under Linux. Theoretically >> one should be able to send a CPRB to a usage domain where inside >> the CPRB another domain (the control domain) is addressed. However, >> as of now I am only aware of applications controlling the same usage >> domain. I don't know any application which is able to address another >> control domain and I am not sure if the zcrypt device driver would >> handle such a CPRB correctly. NQAP, DQAP and PQAP always address >> a usage domain. But the CPRB send down the pipe via NQAP may >> address some control thing on another domain. I am not sure which >> code and where do the sorting out here. There are two candidates: >> the firmware layer in the CEC and the crypto card code. > OK, so it's possible as by the architecture, but at least Linux does > not (currently) do it? > > Perhaps we should simply not overthink that whole control domain > thingy :) It's mostly yet another knob, and as long as the design does > not go against the general architecture, it's probably fine, I guess. Well, sooner or later this has to work. 
Yesterday we tested the control domain thing by trying to pull some simple
data from a 'controlled' domain to the TKE; it doesn't work with a Linux
LPAR. I will investigate the details in the next weeks. However, long-term
it should be possible to run scenarios like having one KVM guest control
all the domains used by other KVM guests.

With respect to the KVM vfio driver, currently there should be just the
rule that for a guest the control domain mask should be equal to or a
superset of the usage domain mask. This is by convention as the
architecture is not so clear here, but this is enforced in every place
which deals with usage and control domains (SE, TKE).

regards
Harald Freudenberger
On Tue, 21 Aug 2018 20:54:49 +0200 Halil Pasic <pasic@linux.ibm.com> wrote: > On 08/20/2018 10:16 PM, Tony Krowiak wrote: > >> Does the SIE complain if you specify a control > >> domain that the host does not have access to (I'd guess so)? > > > > The SIE does not complain if you specify a domain to which the host - or a > > lower level guest - does not have access. The firmware performs a logical > > AND of the guest's and hosts's - or lower level guest's - APMs, AQMs and ADMs > > Rather a bit-wise AND, I guess (of the same type masks corresponding to Guest 1 and > Guest 2). The result of a logical AND is a logical value (true or false) as > far as I remember. > > > to create effective masks EAPM, EAQM and EADM. Only devices corresponding to > > the bits set in the EAPM, EAQM and EADM will be accessible by the guest. > > I'm not sure what is the intended meaning of 'the SIE complains'. If it means > getting out of (SIE when interpreting lets say an NQAP under the discussed > circumstances) with some sort of error code, I think Tony's answer, ' SIE does not complain' > makes a lot of sense. It's the guest that's is trying to stretch further than > the blanket reaches, and it's the guest that needs to be educated on this fact. Yep, that's what I meant. If the hypervisor can call the SIE with that config, but the guest gets an error if it tries to use something that it cannot use, that's fine.
On Wed, 22 Aug 2018 09:04:13 +0200 Harald Freudenberger <freude@linux.ibm.com> wrote: > Well, sooner or later this has to work. Yesterday we tested the control > domain thing with trying to pull some simple data from a 'controlled' domain > to the TKE - doesn't work with a Linux LPAR. I will investigate the details in the > next weeks. However, long-term it should be possible to run scenarios > like having one KVM guest control all the domains used by other KVM guests. > With respect to the KVM vfio driver, currently there should be just the > rule that for a guest the control domain mask should be equal or a superset > of the usage domain mask. This is by convention as the architecture is > not so clear here, but this is enforced on every place which deals with > usage and control domains (SE, TKE). Thanks for the update; this makes me think we really should fiddle with the masks in the kernel (as opposed to doing it higher up in the stack).
... about control domains

Talked with the s390 firmware guys. The convention that the control domain
mask is a superset of the usage domain mask is only true for 1st level guests.

It is absolutely valid to run a KVM guest with a restricted control domain
mask bitmap in the CRYCB. It is valid to have an empty control domain mask,
and the guest should be able to run crypto CPRBs on the usage domain(s) without
any problems. However, nobody has tried this.

regards
Harald Freudenberger
On 08/22/2018 12:13 PM, Harald Freudenberger wrote: > ... about control domains > > Talked with the s390 firmware guys. The convention that the control domain > mask is a superset of the usage domain mask is only true for 1st level guests. > > It is absolutely valid to run a kvm guest with restricted control domain > mask bitmap in the CRYCB. It is valid to have an empty control domain mask > and the guest should be able to run crypto CPRBs on the usage domain(s) without > any problems. However, nobody has tried this. I did try this ;). > > regards > Harald Freudenberger > >
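Editorial illustration of the two control domain rules that came up in this thread: a minimal Python sketch, using plain sets of domain numbers. The function names adm_covers_aqm and may_modify are hypothetical. The first check models the superset convention reported for 1st level guests; the second models the statement that a command sent to a usage domain may change only a domain configured as a control domain. Neither is taken from the vfio_ap driver or from firmware.

```python
def adm_covers_aqm(usage_domains, control_domains):
    """Convention for 1st level guests: the control domain set is equal to
    or a superset of the usage domain set."""
    return set(control_domains) >= set(usage_domains)


def may_modify(target_domain, control_domains):
    """A domain named in a command payload can be changed only if it is
    configured as a control domain."""
    return target_domain in set(control_domains)


# A 1st level guest following the convention.
lpar_like = adm_covers_aqm({5, 6}, {5, 6, 7})

# A KVM guest with an empty control domain mask: it breaks the convention
# but, per the firmware feedback in this thread, is still a valid setup.
kvm_guest = adm_covers_aqm({5, 6}, set())

print(lpar_like, kvm_guest)
print(may_modify(7, {5, 6, 7}), may_modify(7, set()))
```

The second result is False, yet the thread indicates such a guest can still run crypto CPRBs on its usage domains; it just cannot change any domain, which is why the superset rule is treated as a convention rather than a validity check.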
diff --git a/Documentation/s390/vfio-ap.txt b/Documentation/s390/vfio-ap.txt new file mode 100644 index 0000000..524951a --- /dev/null +++ b/Documentation/s390/vfio-ap.txt @@ -0,0 +1,615 @@ +Introduction: +============ +The Adjunct Processor (AP) facility is an IBM Z cryptographic facility comprised +of three AP instructions and from 1 up to 256 PCIe cryptographic adapter cards. +The AP devices provide cryptographic functions to all CPUs assigned to a +linux system running in an IBM Z system LPAR. + +The AP adapter cards are exposed via the AP bus. The motivation for vfio-ap +is to make AP cards available to KVM guests using the VFIO mediated device +framework. This implementation relies considerably on the s390 virtualization +facilities which do most of the hard work of providing direct access to AP +devices. + +AP Architectural Overview: +========================= +To facilitate the comprehension of the design, let's start with some +definitions: + +* AP adapter + + An AP adapter is an IBM Z adapter card that can perform cryptographic + functions. There can be from 0 to 256 adapters assigned to an LPAR. Adapters + assigned to the LPAR in which a linux host is running will be available to + the linux host. Each adapter is identified by a number from 0 to 255. When + installed, an AP adapter is accessed by AP instructions executed by any CPU. + + The AP adapter cards are assigned to a given LPAR via the system's Activation + Profile which can be edited via the HMC. When the system is IPL'd, the AP bus + module is loaded and detects the AP adapter cards assigned to the LPAR. The AP + bus creates a sysfs device for each adapter as they are detected. 
For example, + if AP adapters 4 and 10 (0x0a) are assigned to the LPAR, the AP bus will + create the following sysfs entries: + + /sys/devices/ap/card04 + /sys/devices/ap/card0a + + Symbolic links to these devices will also be created in the AP bus devices + sub-directory: + + /sys/bus/ap/devices/[card04] + /sys/bus/ap/devices/[card0a] + +* AP domain + + An adapter is partitioned into domains. Each domain can be thought of as + a set of hardware registers for processing AP instructions. An adapter can + hold up to 256 domains. Each domain is identified by a number from 0 to 255. + Domains can be further classified into two types: + + * Usage domains are domains that can be accessed directly to process AP + commands. + + * Control domains are domains that are accessed indirectly by AP + commands sent to a usage domain to control or change the domain; for + example, to set a secure private key for the domain. + + The AP usage and control domains are assigned to a given LPAR via the system's + Activation Profile which can be edited via the HMC. When the system is IPL'd, + the AP bus module is loaded and detects the AP usage and control domains + assigned to the LPAR. The domain number of each usage domain will be coupled + with the adapter number of each AP adapter assigned to the LPAR to identify + the AP queues (see AP Queue section below). The domain number of each control + domain will be represented in a bitmask and stored in a sysfs file + /sys/bus/ap/ap_control_domain_mask created by the bus. The bits in the mask, + from most to least significant bit, correspond to domains 0-255. + + A domain may be assigned to a system as both a usage and control domain, or + as a control domain only. Consequently, all domains assigned as both a usage + and control domain can both process AP commands as well as be changed by an AP + command sent to any usage domain assigned to the same system.
Domains assigned + only as control domains can not process AP commands but can be changed by AP + commands sent to any usage domain assigned to the system. + +* AP Queue + + An AP queue is the means by which an AP command-request message is sent to a + usage domain inside a specific adapter. An AP queue is identified by a tuple + comprised of an AP adapter ID (APID) and an AP queue index (APQI). The + APQI corresponds to a given usage domain number within the adapter. This tuple + forms an AP Queue Number (APQN) uniquely identifying an AP queue. AP + instructions include a field containing the APQN to identify the AP queue to + which the AP command-request message is to be sent for processing. + + The AP bus will create a sysfs device for each APQN that can be derived from + the cross product of the AP adapter and usage domain numbers detected when the + AP bus module is loaded. For example, if adapters 4 and 10 (0x0a) and usage + domains 6 and 71 (0x47) are assigned to the LPAR, the AP bus will create the + following sysfs entries: + + /sys/devices/ap/card04/04.0006 + /sys/devices/ap/card04/04.0047 + /sys/devices/ap/card0a/0a.0006 + /sys/devices/ap/card0a/0a.0047 + + The following symbolic links to these devices will be created in the AP bus + devices subdirectory: + + /sys/bus/ap/devices/[04.0006] + /sys/bus/ap/devices/[04.0047] + /sys/bus/ap/devices/[0a.0006] + /sys/bus/ap/devices/[0a.0047] + +* AP Instructions: + + There are three AP instructions: + + * NQAP: to enqueue an AP command-request message to a queue + * DQAP: to dequeue an AP command-reply message from a queue + * PQAP: to administer the queues + +AP and SIE: +========== +Let's now take a look at how AP instructions executed on a guest are interpreted +by the hardware. + +A satellite control block called the Crypto Control Block (CRYCB) is attached to +our main hardware virtualization control block. 
The CRYCB contains three fields +to identify the adapters, usage domains and control domains assigned to the KVM +guest: + +* The AP Mask (APM) field is a bit mask that identifies the AP adapters assigned + to the KVM guest. Each bit in the mask, from most significant to least + significant bit, corresponds to an APID from 0-255. If a bit is set, the + corresponding adapter is valid for use by the KVM guest. + +* The AP Queue Mask (AQM) field is a bit mask identifying the AP usage domains + assigned to the KVM guest. Each bit in the mask, from most significant to + least significant bit, corresponds to an AP queue index (APQI) from 0-255. If + a bit is set, the corresponding queue is valid for use by the KVM guest. + +* The AP Domain Mask (ADM) field is a bit mask that identifies the AP control + domains assigned to the KVM guest. The ADM bit mask controls which domains can be + changed by an AP command-request message sent to a usage domain from the + guest. Each bit in the mask, from most significant to least significant bit, + corresponds to a domain from 0-255. If a bit is set, the corresponding domain + can be modified by an AP command-request message sent to a usage domain + configured for the KVM guest. + +If you recall from the description of an AP Queue, AP instructions include +an APQN to identify the AP adapter and AP queue to which an AP command-request +message is to be sent (NQAP and PQAP instructions), or from which a +command-reply message is to be received (DQAP instruction). The validity of an +APQN is defined by the matrix calculated from the APM and AQM; it is the +cross product of all assigned adapter numbers (APM) with all assigned queue +indexes (AQM). For example, if adapters 1 and 2 and usage domains 5 and 6 are +assigned to a guest, the APQNs (1,5), (1,6), (2,5) and (2,6) will be valid for +the guest.
+ +The APQNs can provide secure key functionality - i.e., a private key is stored +on the adapter card for each of its domains - so each APQN must be assigned to +at most one guest or to the linux host. + + Example 1: Valid configuration: + ------------------------------ + Guest1: adapters 1,2 domains 5,6 + Guest2: adapters 1,2 domain 7 + + This is valid because both guests have a unique set of APQNs: Guest1 has + APQNs (1,5), (1,6), (2,5) and (2,6); Guest2 has APQNs (1,7) and (2,7). + + Example 2: Invalid configuration: + -------------------------------- + Guest1: adapters 1,2 domains 5,6 + Guest2: adapter 1 domains 6,7 + + This is an invalid configuration because both guests have access to + APQN (1,6). + +The Design: +=========== +The design introduces three new objects: + +1. AP matrix device +2. VFIO AP device driver (vfio_ap.ko) +3. AP mediated matrix passthrough device + +The VFIO AP device driver +------------------------- +The VFIO AP (vfio_ap) device driver serves the following purposes: + +1. Provides the interfaces to bind APQNs for exclusive use of KVM guests. + +2. Sets up the VFIO mediated device interfaces to manage a mediated matrix + device and creates the sysfs interfaces for assigning adapters, usage + domains, and control domains comprising the matrix for a KVM guest. + +3.
Configures the APM, AQM and ADM in the CRYCB referenced by a KVM guest's + SIE state description to grant the guest access to a matrix of AP devices + +Reserve APQNs for exclusive use of KVM guests +--------------------------------------------- +The following block diagram illustrates the mechanism by which APQNs are +reserved: + + +------------------+ + remove | | + +------------------->+ cex4queue driver + + | | | + | +------------------+ + | + | + | remove +------------------+ + | +-----------------+ |<---------------+ + | | probe | Device core | | + | | +--------------+ +<-----------+ | + | | | +--------+---------+ | | + | | | ^ | | + | | | register | | | + | | | vfio_ap device | bind | | unbind + | v v | vfio_ap | | cex4queue ++--------+-----+---+ +--------+---------+ +-+---+---+--+ +| | register | | | | +| ap_bus +<---------+ vfio_ap driver + + admin | +| +--------->+ | | | ++------------------+ probe +---+--------+-----+ +------------+ + | | + create | | assign + | | adapters/domains/control domains + v v + +---+--------+-----+ + | | + | mediated device | + | | + +------------------+ + +The process for reserving an AP queue for use by a KVM guest is: + +* The vfio-ap driver during its initialization will perform the following: + * Create a single 'matrix' device, /sys/devices/vfio_ap/matrix. This will + serve as the parent device for all mediated matrix devices used to configure + an AP matrix which is the cross product of all AP adapter IDs (APID) and + queue indexes (APQI) assigned to a guest. + * Register the matrix device with the device core +* Register with the ap_bus for AP queue devices of type 10 (CEX4 and + newer) and to provide the vfio_ap driver's probe and remove callback + interfaces. Devices older than CEX4 queues are not supported to simplify the + implementation and because older devices will be going out of service in the + relatively near future. 
+* The admin needs to unbind AP Queues to be reserved for use by guests from + the cex4queue device driver and bind them to the vfio_ap device driver. + + +Set up the VFIO mediated device interfaces +------------------------------------------ +The VFIO AP device driver utilizes the common interface of the VFIO mediated +device core driver to: +* Register an AP mediated bus driver to add a mediated matrix device to and + remove it from a VFIO group. +* Create and destroy a mediated matrix device +* Add a mediated matrix device to and remove it from the AP mediated bus driver +* Add a mediated matrix device to and remove it from an IOMMU group + +The following high-level block diagram shows the main components and interfaces +of the VFIO AP mediated matrix device driver: + + +-------------+ + | | + | +---------+ | mdev_register_driver() +--------------+ + | | Mdev | +<-----------------------+ | + | | bus | | | vfio_mdev.ko | + | | driver | +----------------------->+ |<-> VFIO user + | +---------+ | probe()/remove() +--------------+ APIs + | | + | MDEV CORE | + | MODULE | + | mdev.ko | + | +---------+ | mdev_register_device() +--------------+ + | |Physical | +<-----------------------+ | + | | device | | | vfio_ap.ko |<-> matrix + | |interface| +----------------------->+ | device + | +---------+ | callback +--------------+ + +-------------+ + +During initialization of the vfio_ap module, the matrix device is registered +with an 'mdev_parent_ops' structure that provides the sysfs attribute +structures, mdev functions and callback interfaces for managing the mediated +matrix device. + +* sysfs attribute structures: + * supported_type_groups + The VFIO mediated device framework supports creation of user-defined + mediated device types. These mediated device types are specified + via the 'supported_type_groups' structure when a device is registered + with the mediated device framework. 
The registration process creates the + sysfs structures for each mediated device type specified in the + 'mdev_supported_types' sub-directory of the device being registered. Along + with the device type, the sysfs attributes of the mediated device type are + provided. + + The VFIO AP device driver will register one mediated device type for + passthrough devices: + /sys/devices/vfio_ap/matrix/mdev_supported_types/vfio_ap-passthrough + Only the read-only attributes required by the VFIO mdev framework will + be provided: + ... name + ... device_api + ... available_instances + Where: + * name: specifies the name of the mediated device type + * device_api: specifies the mediated device type's API (the VFIO API) + * available_instances: the number of mediated matrix passthrough devices + that can be created + * mdev_attr_groups + This attribute group identifies the user-defined sysfs attributes of the + mediated device. When a device is registered with the VFIO mediated device + framework, the sysfs attributes files identified in the 'mdev_attr_groups' + structure will be created in the mediated matrix device's directory. The + sysfs attributes for a mediated matrix device are: + * assign_adapter: + * unassign_adapter: + Write-only attributes for assigning/unassigning an AP adapter to/from the + mediated matrix device. To assign/unassign an adapter, the APID of the + adapter is written to the respective attribute file. + * assign_domain: + * unassign_domain: + Write-only attributes for assigning/unassigning an AP usage domain to/from + the mediated matrix device. To assign/unassign a domain, the APQI of the + AP queue corresponding to a usage domain is written to the respective + attribute file. + * matrix: + A read-only file for displaying the APQNs derived from the cross product + of the adapters and domains assigned to the mediated matrix device.
+ * assign_control_domain: + * unassign_control_domain: + Write-only attributes for assigning/unassigning an AP control domain + to/from the mediated matrix device. To assign/unassign a control domain, + the ID of a domain to be assigned/unassigned is written to the respective + attribute file. + * control_domains: + A read-only file for displaying the control domain numbers assigned to the + mediated matrix device. + +* functions: + * create: + allocates the ap_matrix_mdev structure used by the vfio_ap driver to: + * Store the reference to the KVM structure for the guest using the mdev + * Store the AP matrix configuration for the adapters, domains, and control + domains assigned via the corresponding sysfs attributes files + * remove: + deallocates the mediated matrix device's ap_matrix_mdev structure. This will + be allowed only if a running guest is not using the mdev. + +* callback interfaces + * open: + The vfio_ap driver uses this callback to register a + VFIO_GROUP_NOTIFY_SET_KVM notifier callback function for the mdev matrix + device. The open is invoked when QEMU connects the VFIO iommu group + for the mdev matrix device to the MDEV bus. Access to the KVM structure used + to configure the KVM guest is provided via this callback. The KVM structure + is used to configure the guest's access to the AP matrix defined via the + mediated matrix device's sysfs attribute files. + * release: + unregisters the VFIO_GROUP_NOTIFY_SET_KVM notifier callback function for the + mdev matrix device and deconfigures the guest's AP matrix. + +Configure the APM, AQM and ADM in the CRYCB: +------------------------------------------- +Configuring the AP matrix for a KVM guest will be performed when the +VFIO_GROUP_NOTIFY_SET_KVM notifier callback is invoked. The notifier +function is called when QEMU connects to KVM. The CRYCB is configured by: +* Setting the bits in the APM corresponding to the APIDs assigned to the + mediated matrix device via its 'assign_adapter' interface.
+* Setting the bits in the AQM corresponding to the APQIs assigned to the + mediated matrix device via its 'assign_domain' interface. +* Setting the bits in the ADM corresponding to the domain IDs assigned to the + mediated matrix device via its 'assign_control_domain' interface. + +The CPU model features for AP +----------------------------- +The AP stack relies on the presence of the AP instructions as well as two +facilities: The AP Facilities Test (APFT) facility; and the AP Query +Configuration Information (QCI) facility. These features/facilities are made +available to a KVM guest via the following CPU model features: + +1. ap: Indicates whether the AP instructions are installed on the guest. This + feature will be enabled by KVM only if the AP instructions are installed + on the host. + +2. apft: Indicates the APFT facility is available on the guest. This facility + can be made available to the guest only if it is available on the host. + +3. apqci: Indicates the AP QCI facility is available on the guest. This facility + can be made available to the guest only if it is available on the host. + +Note that if the user chooses to specify a CPU model different than the 'host' +model to QEMU, the CPU model features and facilities need to be turned on +explicitly; for example: + + /usr/bin/qemu-system-s390x ... -cpu z13,ap=on,apqci=on,apft=on + +A guest can be precluded from using AP features/facilities by turning them off +explicitly; for example: + + /usr/bin/qemu-system-s390x ... -cpu host,ap=off,apqci=off,apft=off + +Example: +======= +Let's now provide an example to illustrate how KVM guests may be given +access to AP facilities.
For this example, we will show how to configure +two guests such that executing the lszcrypt command on the guests would +look like this: + +Guest1 +------ +CARD.DOMAIN TYPE MODE +------------------------------ +05 CEX5C CCA-Coproc +05.0004 CEX5C CCA-Coproc +05.00ab CEX5C CCA-Coproc +06 CEX5A Accelerator +06.0004 CEX5A Accelerator +06.00ab CEX5A Accelerator + +Guest2 +------ +CARD.DOMAIN TYPE MODE +------------------------------ +05 CEX5C CCA-Coproc +05.0047 CEX5C CCA-Coproc +05.00ff CEX5C CCA-Coproc + +These are the steps: + +1. Install the vfio_ap module on the linux host. The dependency chain for the + vfio_ap module is: + * vfio + * mdev + * vfio_mdev + * KVM + * vfio_ap + +2. Secure the AP queues to be used by the two guests so that the host can not + access them. Only type 10 adapters (i.e., CEX4 and later) are supported + for the following reasons: To simplify the implementation; a lack of older + systems on which to test; and because the older hardware will go out of + service in a relatively short time. + + To secure the AP queues, each AP queue device must first be unbound from + the cex4queue device driver. The sysfs location of the driver is: + + /sys/bus/ap + --- [drivers] + ------ [cex4queue] + --------- [05.0004] + --------- [05.0047] + --------- [05.00ab] + --------- [05.00ff] + --------- [06.0004] + --------- [06.00ab] + --------- unbind + + To unbind AP queue 05.0004, for example: + + echo 05.0004 > unbind + + The AP queue devices must then be bound to the vfio_ap driver. The sysfs + location of the driver is: + + /sys/bus/ap + --- [drivers] + ------ [vfio_ap] + --------- bind + + To bind AP queue 05.0004 to the vfio_ap driver: + + echo 05.0004 > bind + + Take note that the AP queues bound to the vfio_ap driver will be available + for guest usage until the vfio_ap module is unloaded, or the host system is + shut down. + +3.
Create the mediated devices needed to configure the AP matrixes for the + two guests and to provide an interface to the vfio_ap driver for + use by the guests: + + /sys/devices/vfio_ap/matrix/ + --- [mdev_supported_types] + ------ [vfio_ap-passthrough] (passthrough mediated matrix device type) + --------- create + --------- [devices] + + To create the mediated devices for the two guests: + + uuidgen > create + uuidgen > create + + This will create two mediated devices in the [devices] subdirectory named + with the UUID written to the create attribute file. We call them $uuid1 + and $uuid2: + + /sys/devices/vfio_ap/matrix/ + --- [mdev_supported_types] + ------ [vfio_ap-passthrough] + --------- [devices] + ------------ [$uuid1] + --------------- assign_adapter + --------------- assign_control_domain + --------------- assign_domain + --------------- matrix + --------------- unassign_adapter + --------------- unassign_control_domain + --------------- unassign_domain + + ------------ [$uuid2] + --------------- assign_adapter + --------------- assign_control_domain + --------------- assign_domain + --------------- matrix + --------------- unassign_adapter + --------------- unassign_control_domain + --------------- unassign_domain + +4. The administrator now needs to configure the matrixes for mediated + devices $uuid1 (for Guest1) and $uuid2 (for Guest2). + + This is how the matrix is configured for Guest1: + + echo 5 > assign_adapter + echo 6 > assign_adapter + echo 4 > assign_domain + echo 0xab > assign_domain + + For this implementation, all usage domains - i.e., domains assigned + via the assign_domain attribute file - will also be configured in the ADM + field of the KVM guest's CRYCB, so there is no need to assign control + domains here unless you want to assign control domains that are not + assigned as usage domains.
+ + If a mistake is made configuring an adapter, domain or control domain, + you can use the unassign_xxx files to unassign the adapter, domain or + control domain. + + To display the matrix configuration for Guest1: + + cat matrix + + This is how the matrix is configured for Guest2: + + echo 5 > assign_adapter + echo 0x47 > assign_domain + echo 0xff > assign_domain + + In order to successfully assign an adapter: + + * All APQNs that can be derived from the adapter ID and the IDs of + the previously assigned domains must be bound to the vfio_ap device + driver. If no domains have yet been assigned, then there must be at least + one APQN with the specified APID bound to the vfio_ap driver. + + No APQN that can be derived from the adapter ID and the IDs of the + previously assigned domains can be assigned to another mediated matrix + device. + + In order to successfully assign a domain: + + * All APQNs that can be derived from the domain ID and the IDs of + the previously assigned adapters must be bound to the vfio_ap device + driver. If no adapters have yet been assigned, then there must be at least + one APQN with the specified APQI bound to the vfio_ap driver. + + No APQN that can be derived from the domain ID and the IDs of the + previously assigned adapters can be assigned to another mediated matrix + device. + +5. Start Guest1: + + /usr/bin/qemu-system-s390x ... -cpu xxx,ap=on,apqci=on,apft=on \ + -device vfio-ap,sysfsdev=/sys/devices/vfio_ap/matrix/$uuid1 ... + +6. Start Guest2: + + /usr/bin/qemu-system-s390x ... -cpu xxx,ap=on,apqci=on,apft=on \ + -device vfio-ap,sysfsdev=/sys/devices/vfio_ap/matrix/$uuid2 ... + +When the guest is shut down, the mediated matrix device may be removed.
+ +Using our example again, to remove the mediated matrix device $uuid1: + + /sys/devices/vfio_ap/matrix/ + --- [mdev_supported_types] + ------ [vfio_ap-passthrough] + --------- [devices] + ------------ [$uuid1] + --------------- remove + + + echo 1 > remove + + This will release all the AP queues configured for the mediated device and + remove all of the mdev matrix device's sysfs structures including the mdev + device itself. To recreate and reconfigure the mdev matrix device, all of the + steps starting with step 3 will have to be performed again. Note that the + remove will fail if a guest using the mdev is still running. + + It is not necessary to remove an mdev matrix device, but one may want to + remove it if no guest will use it during the lifetime of the linux host. If + the mdev matrix device is removed, one may want to unbind the AP queues the + guest was using from the vfio_ap device driver and bind them back to the + default driver. Alternatively, the AP queues can be configured for another + mdev matrix (i.e., guest). + + +Limitations +=========== +* The KVM/kernel interfaces do not provide a way to prevent unbinding an AP + queue that is still assigned to a mediated device. Even if the device + 'remove' callback returns an error, the device core detaches the AP + queue from the VFIO AP driver. It is therefore incumbent upon the + administrator to make sure there is no mediated device to which the + APQN - for the AP queue being unbound - is assigned. + +* Hot plug/unplug of AP devices is not supported for guests. + +* Live guest migration is not supported for guests using AP devices. 
\ No newline at end of file diff --git a/MAINTAINERS b/MAINTAINERS index f60dd56..beeff24 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -12428,6 +12428,7 @@ S: Supported F: drivers/s390/crypto/vfio_ap_drv.c F: drivers/s390/crypto/vfio_ap_private.h F: drivers/s390/crypto/vfio_ap_ops.c +F: Documentation/s390/vfio-ap.txt S390 ZFCP DRIVER M: Steffen Maier <maier@linux.ibm.com>
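The exclusivity rule in the "AP and SIE" section of the patch (each APQN is the cross product of a guest's adapters and usage domains, and may belong to at most one guest) can be illustrated with a small sketch. This is not code from the patch or from the vfio_ap driver; the helper names apqns(), queue_name() and shared_apqns() are hypothetical and exist only for this illustration. The sysfs-style name format (two hex digits, a dot, four hex digits) is inferred from the examples in the document, such as 0a.0047.

```python
from itertools import product


def apqns(adapters, domains):
    """Return the set of (APID, APQI) pairs: the cross product of the
    adapter numbers and usage domain numbers assigned to one guest."""
    return set(product(adapters, domains))


def queue_name(apid, apqi):
    """Format an APQN the way the document's sysfs examples show it,
    e.g. adapter 10, domain 71 -> '0a.0047' (format inferred, not an API)."""
    return f"{apid:02x}.{apqi:04x}"


def shared_apqns(guests):
    """Map every APQN claimed by more than one guest to the sorted list of
    guest names claiming it. An empty result means the layout is valid."""
    owners = {}
    for name, (adapters, domains) in guests.items():
        for apqn in apqns(adapters, domains):
            owners.setdefault(apqn, []).append(name)
    return {q: sorted(names) for q, names in owners.items() if len(names) > 1}


# Example 1 and Example 2 from the document.
example1 = {"Guest1": ({1, 2}, {5, 6}), "Guest2": ({1, 2}, {7})}
example2 = {"Guest1": ({1, 2}, {5, 6}), "Guest2": ({1}, {6, 7})}

print(shared_apqns(example1))  # {}
print(shared_apqns(example2))  # {(1, 6): ['Guest1', 'Guest2']}
print(queue_name(10, 71))      # 0a.0047
```

The sketch only models the bookkeeping. The driver described in the patch additionally requires that every derived APQN be bound to the vfio_ap driver before an adapter or domain can be assigned.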