[v15,00/13] s390/vfio-ap: dynamic configuration support

Message ID 20210406153122.22874-1-akrowiak@linux.ibm.com (mailing list archive)

Message

Anthony Krowiak April 6, 2021, 3:31 p.m. UTC
Note: Patch 1, "s390/vfio-ap: fix circular lockdep when setting/clearing
      crypto masks" does not belong to this series. It is currently
      being merged and is included here because it is a pre-req for
      this series.

      Ignore checkpatch warnings regarding unknown commit IDs; they
      appear to be raised in error.

The current design for AP pass-through does not support making dynamic
changes to the AP matrix of a running guest, resulting in a few
deficiencies that this patch series is intended to mitigate:

1. Adapters, domains and control domains cannot be added to or removed
   from a running guest. In order to modify a guest's AP configuration,
   the guest must be terminated; only then can AP resources be assigned
   to or unassigned from the guest's matrix mdev. The new AP
   configuration becomes available to the guest when it is subsequently
   restarted.

2. The AP bus's /sys/bus/ap/apmask and /sys/bus/ap/aqmask interfaces can
   be modified by a root user without any restrictions. A change to
   either mask can result in AP queue devices being unbound from the
   vfio_ap device driver and bound to a zcrypt device driver even if a
   guest is using the queues, thus giving the host access to the guest's
   private crypto data and vice versa.

3. The APQNs derived from the Cartesian product of the APIDs of the
   adapters and APQIs of the domains assigned to a matrix mdev must
   reference an AP queue device bound to the vfio_ap device driver. The
   AP architecture allows assignment of AP resources that are not
   available to the system, so this artificial restriction is not
   compliant with the architecture.

4. The AP configuration profile can be dynamically changed for the Linux
   host after a KVM guest is started. For example, a new domain can be
   dynamically added to the configuration profile via the SE or an HMC
   connected to a DPM-enabled LPAR. Likewise, AP adapters can be
   dynamically configured (online state) and deconfigured (standby state)
   using the SE, an SCLP command or an HMC connected to a DPM-enabled
   LPAR. This can result in inadvertent sharing of AP queues between the
   guest and host.

5. A root user can manually unbind an AP queue device representing a
   queue in use by a KVM guest via the vfio_ap device driver's sysfs
   unbind attribute. In this case, the guest will be using a queue that
   is not bound to the driver which violates the device model.
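
To make the Cartesian product in item 3 concrete, here is a minimal
Python sketch. It is an illustration only, not code from this series. The
helper names are hypothetical, and the bit layout (APID in the high byte,
APQI in the low byte) and the "xx.yyyy" queue naming are assumptions about
how APQNs are commonly represented.

```python
from itertools import product


def make_apqn(apid: int, apqi: int) -> int:
    """Pack an adapter ID and a domain index into a 16-bit APQN."""
    # Assumed layout: APID in the high byte, APQI in the low byte.
    return ((apid & 0xFF) << 8) | (apqi & 0xFF)


def matrix_apqns(apids, apqis):
    """Return every APQN in the Cartesian product of APIDs and APQIs."""
    return [make_apqn(apid, apqi)
            for apid, apqi in product(sorted(apids), sorted(apqis))]


def format_apqn(apqn: int) -> str:
    """Render an APQN in the assumed 'xx.yyyy' queue device naming style."""
    return f"{apqn >> 8:02x}.{apqn & 0xFF:04x}"


# Two adapters and two domains assigned to an mdev yield four APQNs.
for apqn in matrix_apqns({0x05, 0x06}, {0x04, 0x47}):
    print(format_apqn(apqn))
```

With the current design, each of the four APQNs above would have to
reference a queue device bound to the vfio_ap driver before the
assignment is accepted.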

This patch series introduces the following changes to the current design
to alleviate the shortcomings described above as well as to implement
more of the AP architecture:

1. A root user will be prevented from making edits to the AP bus's
   /sys/bus/ap/apmask or /sys/bus/ap/aqmask if the change would transfer
   ownership of an APQN from the vfio_ap device driver to a zcrypt driver
   while the APQN is assigned to a matrix mdev.

2. Allow a root user to hot plug/unplug AP adapters, domains and control
   domains for a KVM guest using the matrix mdev via its sysfs
   assign/unassign attributes.

3. Allow assignment of an AP adapter or domain to a matrix mdev even if
   it results in assignment of an APQN that does not reference an AP
   queue device bound to the vfio_ap device driver, as long as the APQN
   is not reserved for use by the default zcrypt drivers (also known as
   over-provisioning of AP resources). Allowing over-provisioning of AP
   resources better models the architecture, which does not preclude
   assigning AP resources that are not yet available in the system. Such
   APQNs, however, will not be assigned to the guest using the matrix
   mdev; only APQNs referencing AP queue devices bound to the vfio_ap
   device driver will actually get assigned to the guest.

4. Handle dynamic changes to the AP device model.

1. Rationale for changes to AP bus's apmask/aqmask interfaces:
----------------------------------------------------------
Due to the extremely sensitive nature of cryptographic data, it is
imperative that great care be taken to ensure that such data is secured.
Allowing a root user, either inadvertently or maliciously, to configure
these masks such that a queue is shared between the host and a guest is
avoidable, and preventing it is advisable. It was suggested that this scenario
is better handled in user space with management software, but that does
not preclude a malicious administrator from using the sysfs interfaces
to gain access to a guest's crypto data. It was also suggested that this
scenario could be avoided by taking access to the adapter away from the
guest and zeroing out the queues prior to the vfio_ap driver releasing the
device; however, stealing an adapter in use from a guest as a by-product
of an operation is bad and will likely cause problems for the guest
unnecessarily. It was decided that the most effective solution with the
least number of negative side effects is to prevent the situation at the
source.
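
The check described above can be sketched roughly as follows. This is a
simplified Python model, not the in_use callback from this series. The
function names are hypothetical, and the rule that a queue is reserved for
the default zcrypt drivers only when both its APID is set in apmask and its
APQI is set in aqmask is my assumption about the mask semantics.

```python
def reserved_for_zcrypt(apqn, apm, aqm):
    """Assumed rule: reserved only if the APID is in apm AND the APQI is in aqm."""
    apid, apqi = apqn
    return apid in apm and apqi in aqm


def mask_change_allowed(old_masks, new_masks, mdev_apqns):
    """Reject a mask change that would move an mdev-assigned APQN to zcrypt.

    old_masks and new_masks are (apm, aqm) pairs of sets; mdev_apqns is a
    set of (apid, apqi) tuples currently assigned to matrix mdevs.
    """
    for apqn in mdev_apqns:
        was_reserved = reserved_for_zcrypt(apqn, *old_masks)
        will_be_reserved = reserved_for_zcrypt(apqn, *new_masks)
        if not was_reserved and will_be_reserved:
            return False  # ownership would transfer away from vfio_ap
    return True
```

In this model, adding adapter 5 to apmask is harmless for an mdev that
holds queue (5, 4) as long as domain 4 is not also set in aqmask; the
change is refused only when the combination of both masks would hand the
queue to the host drivers.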

2. Rationale for hot plug/unplug using matrix mdev sysfs interfaces:
----------------------------------------------------------------
Allowing a user to hot plug/unplug AP resources using the matrix mdev
sysfs interfaces circumvents the need to terminate the guest in order to
modify its AP configuration. Allowing dynamic configuration makes
reconfiguring a guest's AP matrix much less disruptive.

3. Rationale for allowing over-provisioning of AP resources:
-----------------------------------------------------------
Allowing assignment of unavailable AP resources to a matrix mdev, and
ultimately to a guest, better models the AP architecture. The architecture
does not preclude assignment of unavailable AP resources. If a queue
subsequently becomes available while a guest is using the matrix mdev to
which its APQN is assigned, the guest will be given access to it. If an APQN
is dynamically unassigned from the underlying host system, it will
automatically become unavailable to the guest.
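
As a rough illustration of how an over-provisioned mdev could be filtered
before anything reaches the guest, here is a small Python sketch. It models
one possible policy, loosely following the "filtered by APID" description
in the v10-v11 change log below: an adapter is passed through only if every
APQN it forms with the assigned domains is bound to vfio_ap. The function
name and the data representation are hypothetical.

```python
def filter_by_apid(assigned_apids, assigned_apqis, bound_apqns):
    """Return the (apids, apqis) that would be exposed to the guest.

    bound_apqns is the set of (apid, apqi) tuples whose queue devices are
    currently bound to the vfio_ap driver.
    """
    guest_apids = set()
    for apid in assigned_apids:
        # Keep the adapter only if all of its queues are available.
        if all((apid, apqi) in bound_apqns for apqi in assigned_apqis):
            guest_apids.add(apid)
    # Without any usable adapter there is nothing to expose.
    guest_apqis = set(assigned_apqis) if guest_apids else set()
    return guest_apids, guest_apqis
```

Re-running the filter whenever a queue is bound or unbound gives the
behavior described above: an adapter appears in the guest once its last
missing queue becomes available, and disappears again if one is taken
away.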


Change log v14-v15:
------------------
* Fixed bug: Unlink mdev from all queues when the mdev is removed.

Change log v13-v14:
------------------
* Removed patch "s390/vfio-ap: clean up vfio_ap resources when KVM pointer
  invalidated". The patch is not necessary because that is handled
  with patch 1 of this series (currently being merged) and
  commit f21916ec4826 ("s390/vfio-ap: clean up vfio_ap resources when KVM pointer invalidated")

* Removed patch "s390/vfio-ap: No need to disable IRQ after queue reset",
  that has already been merged with
  commit 6c12a6384e0c ("s390/vfio-ap: No need to disable IRQ after queue reset").

* Initialize the vfio_ap_queue object before setting the drvdata in
  the probe callback

* Change return code from mdev assignment interfaces to -EAGAIN when
  mutex_trylock fails for the mdev lock.

* Restored missing hunk from v12 in the group notifier callback, but
  had to restore it to the vfio_ap_mdev_set_kvm() function due to
  changes made via merged commits noted above.

* Reordered patch "s390/vfio-ap: sysfs attribute to display the
  guest's matrix" to follow the patches that modify the shadow
  APCB.

* Remove queue from APCB before resetting it in the remove
  callback.

* Split the vfio_ap_mdev_unlink_queue() function into two
  functions: one to remove the link from the matrix mdev to
  the queue; and, one to remove the link from the queue to the matrix
  mdev.

* Removed the QCI call and the shadow_apcb memcpy from the
  vfio_ap_mdev_filter_apcb() function.

* Do not clear shadow_apcb when there are no adapters or domains
  assigned.

* Moved filtering code from "s390/vfio-ap: allow hot plug/unplug of
  AP resources using mdev device" into its own patch.

* Squashed the two patches comprising the handling of changes to
  the AP configuration into one patch.

* Added code to delay hot plug during probe until the AP bus scan
  is complete if the APID of the queue is in the bitmap of adapters
  currently being added to the AP configuration.
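
The trylock-and-return -EAGAIN pattern mentioned in this change log can be
sketched in user space as follows. This is a Python analogy using
threading.Lock in place of the kernel mutex; the lock name and the
assign_adapter() helper are stand-ins, not the driver's code.

```python
import errno
import threading

# Stand-in for the lock protecting the mdev's matrix.
matrix_dev_lock = threading.Lock()


def assign_adapter(matrix: set, apid: int) -> int:
    """Assign an adapter, or return -EAGAIN if the lock is contended."""
    # Non-blocking acquire: never wait here, so this path cannot take
    # part in a lock-ordering deadlock with the current lock holder.
    if not matrix_dev_lock.acquire(blocking=False):
        return -errno.EAGAIN
    try:
        matrix.add(apid)
        return 0
    finally:
        matrix_dev_lock.release()
```

The caller, here whoever writes to the sysfs attribute, is expected to
retry the operation when it sees -EAGAIN.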

Change log v12-v13:
------------------
* Combined patches 12/13 from previous series into one patch

* Moved all changes for linking queues and mdevs into a single patch

* Re-ordered some patches to aid in review

* Using mutex_trylock() function in adapter/domain assignment functions
  to avoid potential deadlock condition with in_use callback

* Using filtering function for refreshing the guest's APCB for all events
  that change the APCB: assign/unassign adapters, domains, control domains;
  bind/unbind of queue devices; and, changes to the host AP configuration.

Change log v11-v12:
------------------
* Moved matrix device lock to protect group notifier callback

* Split the 'No need to disable IRQ after queue reset' patch into
  multiple patches for easier review (move probe/remove callback
  functions and remove disable IRQ after queue reset)

* Added code to decrement reference count for KVM in group notifier
  callback

* Using mutex_trylock() in functions implementing the sysfs assign_adapter
  and assign_domain as well as the in_use callback to avoid deadlock
  between the AP bus's ap_perms mutex and the matrix device lock used by
  vfio_ap driver.

* The sysfs guest_matrix attribute of the vfio_ap mdev will now display
  the shadow APCB regardless of whether a guest is using the mdev or not

* Replaced vfio_ap mdev filtering function with a function that initializes
  the guest's APCB by filtering the vfio_ap mdev by APID.

* No longer using filtering function during adapter/domain assignment
  to/from the vfio_ap mdev; replaced with new hot plug/unplug
  adapter/domain functions.

* No longer using filtering function during bind/unbind; replaced with
  hot plug/unplug queue functions.

* No longer using filtering function for bulk assignment of new adapters
  and domains in on_scan_complete callback; replaced with new hot plug
  functions.


Change log v10-v11:
------------------
* The matrix mdev's configuration is now filtered by APID so that if any
  APQN assigned to the mdev is not bound to the vfio_ap device driver,
  the adapter will not get plugged into the KVM guest on startup, or when
  a new adapter is assigned to the mdev.

* Replaced patch 8 by squashing patches 8 (filtering patch) and 15 (handle
  probe/remove).

* Added a patch 1 to remove disable IRQ after a reset because the reset
  already disables a queue.

* Now using filtering code to update the KVM guest's matrix when
  notified that AP bus scan has completed.

* Fixed issue with probe/remove not initiated by a configuration change
  occurring within a config change.


Change log v9-v10:
-----------------
* Updated the documentation in vfio-ap.rst to include information about the
  AP dynamic configuration support

Change log v8-v9:
----------------
* Fixed errors flagged by the kernel test robot

* Fixed issue with guest losing queues when a new queue is probed due to
  manual bind operation.

Change log v7-v8:
----------------
* Now logging a message when an attempt to reserve APQNs for the zcrypt
  drivers will result in taking a queue away from a KVM guest to provide
  the sysadmin a way to ascertain why the sysfs operation failed.

* Created locked and unlocked versions of the ap_parse_mask_str() function.

* Now using new interface provided by an AP bus patch -
  s390/ap: introduce new ap function ap_get_qdev() - to retrieve
  struct ap_queue representing an AP queue device. This patch is not a
  part of this series but is a prerequisite for this series.

Change log v6-v7:
----------------
* Added callbacks to AP bus:
  - on_config_changed: Notifies implementing drivers that
    the AP configuration has changed since last AP device scan.
  - on_scan_complete: Notifies implementing drivers that the device scan
    has completed.
  - implemented on_config_changed and on_scan_complete callbacks for
    vfio_ap device driver.
  - updated vfio_ap device driver's probe and remove callbacks to handle
    dynamic changes to the AP device model.
* Added code to filter APQNs when assigning AP resources to a KVM guest's
  CRYCB

Change log v5-v6:
----------------
* Fixed a bug in ap_bus.c introduced with patch 2/7 of the v5
  series. Harald Freudenberger pointed out that the mutex lock
  for ap_perms_mutex in the apmask_store and aqmask_store functions
  was not being released.

* Removed patch 6/7 which added logging to the vfio_ap driver
  to expedite acceptance of this series. The logging will be introduced
  with a separate patch series to allow more time to explore options
  such as DBF logging vs. tracepoints.

* Added 3 patches related to ensuring that APQNs that do not reference
  AP queue devices bound to the vfio_ap device driver are not assigned
  to the guest CRYCB:

  Patch 4: Filter CRYCB bits for unavailable queue devices
  Patch 5: sysfs attribute to display the guest CRYCB
  Patch 6: update guest CRYCB in vfio_ap probe and remove callbacks

* Added a patch (Patch 9) to version the vfio_ap module.

* Reshuffled patches to allow the in_use callback implementation to
  invoke the vfio_ap_mdev_verify_no_sharing() function introduced in
  patch 2.

Change log v4-v5:
----------------
* Added a patch to provide kernel s390dbf debug logs for VFIO AP

Change log v3->v4:
-----------------
* Restored patches preventing root user from changing ownership of
  APQNs from zcrypt drivers to the vfio_ap driver if the APQN is
  assigned to an mdev.

* No longer enforcing requirement restricting guest access to
  queues represented by a queue device bound to the vfio_ap
  device driver.

* Removed shadow CRYCB and now directly updating the guest CRYCB
  from the matrix mdev's matrix.

* Rebased the patch series on top of 'vfio: ap: AP Queue Interrupt
  Control' patches.

* Disabled bind/unbind sysfs interfaces for vfio_ap driver

Change log v2->v3:
-----------------
* Allow guest access to an AP queue only if the queue is bound to
  the vfio_ap device driver.

* Removed the patch to test CRYCB masks before taking the vCPUs
  out of SIE. Now checking the shadow CRYCB in the vfio_ap driver.

Change log v1->v2:
-----------------
* Removed patches preventing root user from unbinding AP queues from
  the vfio_ap device driver
* Introduced a shadow CRYCB in the vfio_ap driver to manage dynamic
  changes to the AP guest configuration due to root user interventions
  or hardware anomalies.

Tony Krowiak (13):
  s390/vfio-ap: fix circular lockdep when setting/clearing crypto masks
  s390/vfio-ap: use new AP bus interface to search for queue devices
  s390/vfio-ap: move probe and remove callbacks to vfio_ap_ops.c
  s390/vfio-ap: manage link between queue struct and matrix mdev
  s390/vfio-ap: introduce shadow APCB
  s390/vfio-ap: refresh guest's APCB by filtering APQNs assigned to mdev
  s390/vfio-ap: allow assignment of unavailable AP queues to mdev device
  s390/vfio-ap: allow hot plug/unplug of AP resources using mdev device
  s390/zcrypt: driver callback to indicate resource in use
  s390/vfio-ap: implement in-use callback for vfio_ap driver
  s390/vfio-ap: sysfs attribute to display the guest's matrix
  s390/zcrypt: notify drivers on config changed and scan complete
    callbacks
  s390/vfio-ap: update docs to include dynamic config support

 Documentation/s390/vfio-ap.rst        |  383 ++++++---
 drivers/s390/crypto/ap_bus.c          |  249 +++++-
 drivers/s390/crypto/ap_bus.h          |   16 +
 drivers/s390/crypto/vfio_ap_drv.c     |   46 +-
 drivers/s390/crypto/vfio_ap_ops.c     | 1107 ++++++++++++++++++-------
 drivers/s390/crypto/vfio_ap_private.h |   29 +-
 6 files changed, 1364 insertions(+), 466 deletions(-)

--
2.21.3

Comments

Halil Pasic April 8, 2021, 8:38 p.m. UTC | #1
On Tue,  6 Apr 2021 11:31:09 -0400
Tony Krowiak <akrowiak@linux.ibm.com> wrote:

> Tony Krowiak (13):
>   s390/vfio-ap: fix circular lockdep when setting/clearing crypto masks

The subsequent patches reintroduce this circular locking dependency
problem. See my kernel messages for the details. The link we sever
in the above patch is reintroduced in several places. One of them is
assign_adapter_store().

Regards,
Halil

[  +0.000236] vfio_ap matrix: MDEV: Registered
[  +0.037919] vfio_mdev 4f77ad87-1e62-4959-8b7a-c677c98d2194: Adding to iommu group 1
[  +0.000092] vfio_mdev 4f77ad87-1e62-4959-8b7a-c677c98d2194: MDEV: group_id = 1

[Apr 8 22:31] ======================================================
[  +0.000002] WARNING: possible circular locking dependency detected
[  +0.000002] 5.12.0-rc6-00016-g5bea90816c56 #57 Not tainted
[  +0.000002] ------------------------------------------------------
[  +0.000002] CPU 1/KVM/6651 is trying to acquire lock:
[  +0.000002] 00000000cef9d508 (&matrix_dev->lock){+.+.}-{3:3}, at: handle_pqap+0x56/0x1c8 [vfio_ap]
[  +0.000011] 
              but task is already holding lock:
[  +0.000001] 00000000d41f4308 (&vcpu->mutex){+.+.}-{3:3}, at: kvm_vcpu_ioctl+0x90/0x898 [kvm]
[  +0.000038] 
              which lock already depends on the new lock.

[  +0.000002] 
              the existing dependency chain (in reverse order) is:
[  +0.000001] 
              -> #2 (&vcpu->mutex){+.+.}-{3:3}:
[  +0.000004]        validate_chain+0x796/0xa20
[  +0.000006]        __lock_acquire+0x420/0x7c8
[  +0.000003]        lock_acquire.part.0+0xec/0x1e8
[  +0.000002]        lock_acquire+0xb8/0x208
[  +0.000002]        __mutex_lock+0xa2/0x928
[  +0.000005]        mutex_lock_nested+0x32/0x40
[  +0.000002]        kvm_s390_cpus_to_pv+0x4e/0xf8 [kvm]
[  +0.000019]        kvm_s390_handle_pv+0x1ce/0x6b0 [kvm]
[  +0.000018]        kvm_arch_vm_ioctl+0x3ec/0x550 [kvm]
[  +0.000019]        kvm_vm_ioctl+0x40e/0x4a8 [kvm]
[  +0.000018]        __s390x_sys_ioctl+0xc0/0x100
[  +0.000004]        do_syscall+0x7e/0xd0
[  +0.000043]        __do_syscall+0xc0/0xd8
[  +0.000004]        system_call+0x72/0x98
[  +0.000004] 
              -> #1 (&kvm->lock){+.+.}-{3:3}:
[  +0.000004]        validate_chain+0x796/0xa20
[  +0.000002]        __lock_acquire+0x420/0x7c8
[  +0.000002]        lock_acquire.part.0+0xec/0x1e8
[  +0.000002]        lock_acquire+0xb8/0x208
[  +0.000003]        __mutex_lock+0xa2/0x928
[  +0.000002]        mutex_lock_nested+0x32/0x40
[  +0.000002]        kvm_arch_crypto_set_masks+0x4a/0x2b8 [kvm]
[  +0.000018]        vfio_ap_mdev_refresh_apcb+0xd0/0xe0 [vfio_ap]
[  +0.000003]        assign_adapter_store+0x1f2/0x240 [vfio_ap]
[  +0.000003]        kernfs_fop_write_iter+0x13e/0x1e0
[  +0.000003]        new_sync_write+0x10a/0x198
[  +0.000003]        vfs_write.part.0+0x196/0x290
[  +0.000002]        ksys_write+0x6c/0xf8
[  +0.000003]        do_syscall+0x7e/0xd0
[  +0.000002]        __do_syscall+0xc0/0xd8
[  +0.000003]        system_call+0x72/0x98
[  +0.000002] 
              -> #0 (&matrix_dev->lock){+.+.}-{3:3}:
[  +0.000004]        check_noncircular+0x16e/0x190
[  +0.000002]        check_prev_add+0xec/0xf38
[  +0.000002]        validate_chain+0x796/0xa20
[  +0.000002]        __lock_acquire+0x420/0x7c8
[  +0.000002]        lock_acquire.part.0+0xec/0x1e8
[  +0.000002]        lock_acquire+0xb8/0x208
[  +0.000002]        __mutex_lock+0xa2/0x928
[  +0.000002]        mutex_lock_nested+0x32/0x40
[  +0.000003]        handle_pqap+0x56/0x1c8 [vfio_ap]
[  +0.000002]        handle_pqap+0xe2/0x1d8 [kvm]
[  +0.000019]        kvm_handle_sie_intercept+0x134/0x248 [kvm]
[  +0.000019]        vcpu_post_run+0x2b6/0x580 [kvm]
[  +0.000018]        __vcpu_run+0x27e/0x388 [kvm]
[  +0.000019]        kvm_arch_vcpu_ioctl_run+0x10a/0x278 [kvm]
[  +0.000018]        kvm_vcpu_ioctl+0x2cc/0x898 [kvm]
[  +0.000018]        __s390x_sys_ioctl+0xc0/0x100
[  +0.000003]        do_syscall+0x7e/0xd0
[  +0.000002]        __do_syscall+0xc0/0xd8
[  +0.000002]        system_call+0x72/0x98
[  +0.000003] 
              other info that might help us debug this:

[  +0.000001] Chain exists of:
                &matrix_dev->lock --> &kvm->lock --> &vcpu->mutex

[  +0.000005]  Possible unsafe locking scenario:

[  +0.000001]        CPU0                    CPU1
[  +0.000001]        ----                    ----
[  +0.000002]   lock(&vcpu->mutex);
[  +0.000002]                                lock(&kvm->lock);
[  +0.000002]                                lock(&vcpu->mutex);
[  +0.000002]   lock(&matrix_dev->lock);
[  +0.000002] 
               *** DEADLOCK ***

[  +0.000002] 2 locks held by CPU 1/KVM/6651:
[  +0.000002]  #0: 00000000d41f4308 (&vcpu->mutex){+.+.}-{3:3}, at: kvm_vcpu_ioctl+0x90/0x898 [kvm]
[  +0.000023]  #1: 00000000da2fc508 (&kvm->srcu){....}-{0:0}, at: __vcpu_run+0x1ec/0x388 [kvm]
[  +0.000021] 
              stack backtrace:
[  +0.000002] CPU: 6 PID: 6651 Comm: CPU 1/KVM Not tainted 5.12.0-rc6-00016-g5bea90816c56 #57
[  +0.000004] Hardware name: IBM 8561 T01 701 (LPAR)
[  +0.000001] Call Trace:
[  +0.000002]  [<00000002010e7ef0>] show_stack+0x90/0xf8 
[  +0.000007]  [<00000002010fb5b2>] dump_stack+0xba/0x108 
[  +0.000002]  [<000000020053feb6>] check_noncircular+0x16e/0x190 
[  +0.000003]  [<0000000200541424>] check_prev_add+0xec/0xf38 
[  +0.000002]  [<0000000200542a06>] validate_chain+0x796/0xa20 
[  +0.000003]  [<0000000200545430>] __lock_acquire+0x420/0x7c8 
[  +0.000002]  [<00000002005441a4>] lock_acquire.part.0+0xec/0x1e8 
[  +0.000002]  [<0000000200544358>] lock_acquire+0xb8/0x208 
[  +0.000003]  [<000000020110aeea>] __mutex_lock+0xa2/0x928 
[  +0.000002]  [<000000020110b7a2>] mutex_lock_nested+0x32/0x40 
[  +0.000003]  [<000003ff8060fb5e>] handle_pqap+0x56/0x1c8 [vfio_ap] 
[  +0.000003]  [<000003ff80597412>] handle_pqap+0xe2/0x1d8 [kvm] 
[  +0.000018]  [<000003ff8058c924>] kvm_handle_sie_intercept+0x134/0x248 [kvm] 
[  +0.000020]  [<000003ff80588e96>] vcpu_post_run+0x2b6/0x580 [kvm] 
[  +0.000019]  [<000003ff805893de>] __vcpu_run+0x27e/0x388 [kvm] 
[  +0.000018]  [<000003ff80589d0a>] kvm_arch_vcpu_ioctl_run+0x10a/0x278 [kvm] 
[  +0.000019]  [<000003ff805704d4>] kvm_vcpu_ioctl+0x2cc/0x898 [kvm] 
[  +0.000019]  [<0000000200801ee8>] __s390x_sys_ioctl+0xc0/0x100 
[  +0.000003]  [<000000020046e7ae>] do_syscall+0x7e/0xd0 
[  +0.000003]  [<00000002010ffc20>] __do_syscall+0xc0/0xd8 
[  +0.000002]  [<0000000201110c42>] system_call+0x72/0x98 
[  +0.000003] INFO: lockdep is turned off.
[  +6.846296] vfio_mdev 4f77ad87-1e62-4959-8b7a-c677c98d2194: Removing from iommu group 1
[  +0.000028] vfio_mdev 4f77ad87-1e62-4959-8b7a-c677c98d2194: MDEV: detaching iommu
[  +0.007677] vfio_ap matrix: MDEV: Unregistering


>   s390/vfio-ap: use new AP bus interface to search for queue devices
>   s390/vfio-ap: move probe and remove callbacks to vfio_ap_ops.c
>   s390/vfio-ap: manage link between queue struct and matrix mdev
>   s390/vfio-ap: introduce shadow APCB
>   s390/vfio-ap: refresh guest's APCB by filtering APQNs assigned to mdev
>   s390/vfio-ap: allow assignment of unavailable AP queues to mdev device
>   s390/vfio-ap: allow hot plug/unplug of AP resources using mdev device
>   s390/zcrypt: driver callback to indicate resource in use
>   s390/vfio-ap: implement in-use callback for vfio_ap driver
>   s390/vfio-ap: sysfs attribute to display the guest's matrix
>   s390/zcrypt: notify drivers on config changed and scan complete
>     callbacks
>   s390/vfio-ap: update docs to include dynamic config support
Anthony Krowiak April 9, 2021, 1:26 p.m. UTC | #2
On 4/8/21 4:38 PM, Halil Pasic wrote:
> On Tue,  6 Apr 2021 11:31:09 -0400
> Tony Krowiak <akrowiak@linux.ibm.com> wrote:
>
>> Tony Krowiak (13):
>>    s390/vfio-ap: fix circular lockdep when setting/clearing crypto masks
> The subsequent patches reintroduce this circular locking dependency
> problem. See my kernel messages for the details. The link we sever
> in the above patch is reintroduced in several places. One of them is
> assign_adapter_store().

Like in the patch referenced above, the lockdep splat occurs when
the APCB masks are set which requires acquisition of the kvm lock.
Patch 08/13, allow hot plug/unplug of AP resources using mdev,
introduces code that updates the APCB masks whenever an
adapter, domain or control domain is assigned or unassigned
as well as when a queue device is probed or removed.
I think the solution from the patch above can be implemented
here to resolve this problem.
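
For readers following the splat, the dependency chain it reports can be
modelled as a directed graph of lock classes, where an edge means "taken
while holding". The Python sketch below is only an illustration of that
reasoning, not how lockdep is implemented; the find_cycle() helper is
hypothetical and the edges are read off the three stack traces in the
report.

```python
def find_cycle(edges):
    """Return a list of locks forming a cycle, or None if the order is safe.

    edges is an iterable of (held, wanted) pairs: 'wanted' was acquired
    while 'held' was already held.
    """
    graph = {}
    for held, wanted in edges:
        graph.setdefault(held, []).append(wanted)
        graph.setdefault(wanted, [])

    visiting, done = [], set()

    def visit(node):
        if node in done:
            return None
        if node in visiting:
            # Back edge: the path from 'node' to here closes a cycle.
            return visiting[visiting.index(node):] + [node]
        visiting.append(node)
        for nxt in graph[node]:
            cycle = visit(nxt)
            if cycle:
                return cycle
        visiting.pop()
        done.add(node)
        return None

    for node in graph:
        cycle = visit(node)
        if cycle:
            return cycle
    return None


# Existing chain: assign_adapter_store() and kvm_s390_cpus_to_pv().
existing = [("matrix_dev->lock", "kvm->lock"), ("kvm->lock", "vcpu->mutex")]
# New dependency: handle_pqap() runs under vcpu->mutex.
print(find_cycle(existing + [("vcpu->mutex", "matrix_dev->lock")]))
```

The first two edges alone are acyclic; the cycle only appears once the
handle_pqap() path adds the edge from vcpu->mutex back to
matrix_dev->lock, which is why removing any one edge of the chain is
enough to silence the warning.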
