diff mbox series

[v12,12/17] s390/vfio-ap: allow hot plug/unplug of AP resources using mdev device

Message ID 20201124214016.3013-13-akrowiak@linux.ibm.com (mailing list archive)
State New, archived
Headers show
Series s390/vfio-ap: dynamic configuration support | expand

Commit Message

Anthony Krowiak Nov. 24, 2020, 9:40 p.m. UTC
Let's hot plug/unplug adapters, domains and control domains assigned to or
unassigned from an AP matrix mdev device while it is in use by a guest per
the following rules:

* Assign an adapter to mdev's matrix:

  The adapter will be hot plugged into the guest under the following
  conditions:
  1. The adapter is not yet assigned to the guest's matrix
  2. At least one domain is assigned to the guest's matrix
  3. Each APQN derived from the APID of the newly assigned adapter and
     the APQIs of the domains already assigned to the guest's
     matrix references a queue device bound to the vfio_ap device driver.

  The adapter and each domain assigned to the mdev's matrix will be hot
  plugged into the guest under the following conditions:
  1. The adapter is not yet assigned to the guest's matrix
  2. No domains are assigned to the guest's matrix
  3  At least one domain is assigned to the mdev's matrix
  4. Each APQN derived from the APID of the newly assigned adapter and
     the APQIs of the domains assigned to the mdev's matrix references a
     queue device bound to the vfio_ap device driver.

* Unassign an adapter from mdev's matrix:

  The adapter will be hot unplugged from the KVM guest if it is
  assigned to the guest's matrix.

* Assign a domain to mdev's matrix:

  The domain will be hot plugged into the guest under the following
  conditions:
  1. The domain is not yet assigned to the guest's matrix
  2. At least one adapter is assigned to the guest's matrix
  3. Each APQN derived from the APQI of the newly assigned domain and
     the APIDs of the adapters already assigned to the guest's
     matrix references a queue device bound to the vfio_ap device driver.

  The domain and each adapter assigned to the mdev's matrix will be hot
  plugged into the guest under the following conditions:
  1. The domain is not yet assigned to the guest's matrix
  2. No adapters are assigned to the guest's matrix
  3  At least one adapter is assigned to the mdev's matrix
  4. Each APQN derived from the APQI of the newly assigned domain and
     the APIDs of the adapters assigned to the mdev's matrix references a
     queue device bound to the vfio_ap device driver.

* Unassign adapter from mdev's matrix:

  The domain will be hot unplugged from the KVM guest if it is
  assigned to the guest's matrix.

* Assign a control domain:

  The control domain will be hot plugged into the KVM guest if it is not
  assigned to the guest's APCB. The AP architecture ensures a guest will
  only get access to the control domain if it is in the host's AP
  configuration, so there is no risk in hot plugging it; however, it will
  become automatically available to the guest when it is added to the host
  configuration.

* Unassign a control domain:

  The control domain will be hot unplugged from the KVM guest if it is
  assigned to the guest's APCB.

Note: Now that hot plug/unplug is implemented, there is the possibility
      that an assignment/unassignment of an adapter, domain or control
      domain could be initiated while the guest is starting, so the
      matrix device lock will be taken for the group notification callback
      that initializes the guest's APCB when the KVM pointer is made
      available to the vfio_ap device driver.

Signed-off-by: Tony Krowiak <akrowiak@linux.ibm.com>
---
 drivers/s390/crypto/vfio_ap_ops.c | 190 +++++++++++++++++++++++++-----
 1 file changed, 159 insertions(+), 31 deletions(-)

Comments

Halil Pasic Nov. 29, 2020, 1:52 a.m. UTC | #1
On Tue, 24 Nov 2020 16:40:11 -0500
Tony Krowiak <akrowiak@linux.ibm.com> wrote:

> Let's hot plug/unplug adapters, domains and control domains assigned to or
> unassigned from an AP matrix mdev device while it is in use by a guest per
> the following rules:
> 
> * Assign an adapter to mdev's matrix:
> 
>   The adapter will be hot plugged into the guest under the following
>   conditions:
>   1. The adapter is not yet assigned to the guest's matrix
>   2. At least one domain is assigned to the guest's matrix
>   3. Each APQN derived from the APID of the newly assigned adapter and
>      the APQIs of the domains already assigned to the guest's
>      matrix references a queue device bound to the vfio_ap device driver.
> 
>   The adapter and each domain assigned to the mdev's matrix will be hot
>   plugged into the guest under the following conditions:
>   1. The adapter is not yet assigned to the guest's matrix
>   2. No domains are assigned to the guest's matrix
>   3  At least one domain is assigned to the mdev's matrix
>   4. Each APQN derived from the APID of the newly assigned adapter and
>      the APQIs of the domains assigned to the mdev's matrix references a
>      queue device bound to the vfio_ap device driver.
> 
> * Unassign an adapter from mdev's matrix:
> 
>   The adapter will be hot unplugged from the KVM guest if it is
>   assigned to the guest's matrix.
> 
> * Assign a domain to mdev's matrix:
> 
>   The domain will be hot plugged into the guest under the following
>   conditions:
>   1. The domain is not yet assigned to the guest's matrix
>   2. At least one adapter is assigned to the guest's matrix
>   3. Each APQN derived from the APQI of the newly assigned domain and
>      the APIDs of the adapters already assigned to the guest's
>      matrix references a queue device bound to the vfio_ap device driver.
> 
>   The domain and each adapter assigned to the mdev's matrix will be hot
>   plugged into the guest under the following conditions:
>   1. The domain is not yet assigned to the guest's matrix
>   2. No adapters are assigned to the guest's matrix
>   3  At least one adapter is assigned to the mdev's matrix
>   4. Each APQN derived from the APQI of the newly assigned domain and
>      the APIDs of the adapters assigned to the mdev's matrix references a
>      queue device bound to the vfio_ap device driver.
> 
> * Unassign adapter from mdev's matrix:
> 
>   The domain will be hot unplugged from the KVM guest if it is
>   assigned to the guest's matrix.
> 
> * Assign a control domain:
> 
>   The control domain will be hot plugged into the KVM guest if it is not
>   assigned to the guest's APCB. The AP architecture ensures a guest will
>   only get access to the control domain if it is in the host's AP
>   configuration, so there is no risk in hot plugging it; however, it will
>   become automatically available to the guest when it is added to the host
>   configuration.
> 
> * Unassign a control domain:
> 
>   The control domain will be hot unplugged from the KVM guest if it is
>   assigned to the guest's APCB.

This is where things start getting tricky. E.g. do we need to revise
filtering after an unassign? (For example an assign_adapter X didn't
change the shadow, because queue XY was missing, but now we unplug domain
Y. Should the adapter X pop up? I guess it should.)


> 
> Note: Now that hot plug/unplug is implemented, there is the possibility
>       that an assignment/unassignment of an adapter, domain or control
>       domain could be initiated while the guest is starting, so the
>       matrix device lock will be taken for the group notification callback
>       that initializes the guest's APCB when the KVM pointer is made
>       available to the vfio_ap device driver.
> 
> Signed-off-by: Tony Krowiak <akrowiak@linux.ibm.com>
> ---
>  drivers/s390/crypto/vfio_ap_ops.c | 190 +++++++++++++++++++++++++-----
>  1 file changed, 159 insertions(+), 31 deletions(-)
> 
> diff --git a/drivers/s390/crypto/vfio_ap_ops.c b/drivers/s390/crypto/vfio_ap_ops.c
> index 586ec5776693..4f96b7861607 100644
> --- a/drivers/s390/crypto/vfio_ap_ops.c
> +++ b/drivers/s390/crypto/vfio_ap_ops.c
> @@ -631,6 +631,60 @@ static void vfio_ap_mdev_manage_qlinks(struct ap_matrix_mdev *matrix_mdev,
>  	}
>  }
>  
> +static bool vfio_ap_assign_apid_to_apcb(struct ap_matrix_mdev *matrix_mdev,
> +					unsigned long apid)
> +{
> +	unsigned long apqi, apqn;
> +	unsigned long *aqm = matrix_mdev->shadow_apcb.aqm;
> +
> +	/*
> +	 * If the APID is already assigned to the guest's shadow APCB, there is
> +	 * no need to assign it.
> +	 */
> +	if (test_bit_inv(apid, matrix_mdev->shadow_apcb.apm))
> +		return false;
> +
> +	/*
> +	 * If no domains have yet been assigned to the shadow APCB and one or
> +	 * more domains have been assigned to the matrix mdev, then use
> +	 * the domains assigned to the matrix mdev; otherwise, there is nothing
> +	 * to assign to the shadow APCB.
> +	 */
> +	if (bitmap_empty(matrix_mdev->shadow_apcb.aqm, AP_DOMAINS)) {
> +		if (bitmap_empty(matrix_mdev->matrix.aqm, AP_DOMAINS))
> +			return false;
> +
> +		aqm = matrix_mdev->matrix.aqm;
> +	}
> +
> +	/* Make sure all APQNs are bound to the vfio_ap driver */
> +	for_each_set_bit_inv(apqi, aqm, AP_DOMAINS) {
> +		apqn = AP_MKQID(apid, apqi);
> +
> +		if (vfio_ap_mdev_get_queue(matrix_mdev, apqn) == NULL)
> +			return false;
> +	}
> +
> +	set_bit_inv(apid, matrix_mdev->shadow_apcb.apm);
> +
> +	/*
> +	 * If we verified APQNs using the domains assigned to the matrix mdev,
> +	 * then copy the APQIs of those domains into the guest's APCB
> +	 */
> +	if (bitmap_empty(matrix_mdev->shadow_apcb.aqm, AP_DOMAINS))
> +		bitmap_copy(matrix_mdev->shadow_apcb.aqm,
> +			    matrix_mdev->matrix.aqm, AP_DOMAINS);
> +
> +	return true;
> +}

What is the rationale behind the shadow aqm empty special handling? I.e.
why not simply:


static bool vfio_ap_assign_apid_to_apcb(struct ap_matrix_mdev *matrix_mdev,     
                                        unsigned long apid)                     
{                                                                               
        unsigned long apqi, apqn;                                               
        unsigned long *aqm = matrix_mdev->shadow_apcb.aqm;                      
                                                                                
        /*                                                                      
         * If the APID is already assigned to the guest's shadow APCB, there is 
         * no need to assign it.                                                
         */                                                                     
        if (test_bit_inv(apid, matrix_mdev->shadow_apcb.apm))                   
                return false;                                                   
                                                                                
        /* Make sure all APQNs are bound to the vfio_ap driver */               
        for_each_set_bit_inv(apqi, aqm, AP_DOMAINS) {                           
                apqn = AP_MKQID(apid, apqi);                                    
                                                                                
                if (vfio_ap_mdev_get_queue(matrix_mdev, apqn) == NULL)          
                        return false;                                           
        }                                                                       
                                                                                
        set_bit_inv(apid, matrix_mdev->shadow_apcb.apm);                        
                                                                                
        return true;                                                            
} 

Please answer the questions I've asked, and note that I will have to
return to this patch, later.

Regards,
Halil

> +
> +static void vfio_ap_mdev_hot_plug_adapter(struct ap_matrix_mdev *matrix_mdev,
> +					  unsigned long apid)
> +{
> +	if (vfio_ap_assign_apid_to_apcb(matrix_mdev, apid))
> +		vfio_ap_mdev_commit_shadow_apcb(matrix_mdev);
> +}
> +
>  /**
>   * assign_adapter_store
>   *
> @@ -673,10 +727,6 @@ static ssize_t assign_adapter_store(struct device *dev,
>  	struct mdev_device *mdev = mdev_from_dev(dev);
>  	struct ap_matrix_mdev *matrix_mdev = mdev_get_drvdata(mdev);
>  
> -	/* If the guest is running, disallow assignment of adapter */
> -	if (matrix_mdev->kvm)
> -		return -EBUSY;
> -
>  	ret = kstrtoul(buf, 0, &apid);
>  	if (ret)
>  		return ret;
> @@ -698,12 +748,22 @@ static ssize_t assign_adapter_store(struct device *dev,
>  	}
>  	set_bit_inv(apid, matrix_mdev->matrix.apm);
>  	vfio_ap_mdev_manage_qlinks(matrix_mdev, LINK_APID, apid);
> +	vfio_ap_mdev_hot_plug_adapter(matrix_mdev, apid);
>  	mutex_unlock(&matrix_dev->lock);
>  
>  	return count;
>  }
>  static DEVICE_ATTR_WO(assign_adapter);
>  
> +static void vfio_ap_mdev_hot_unplug_adapter(struct ap_matrix_mdev *matrix_mdev,
> +					    unsigned long apid)
> +{
> +	if (test_bit_inv(apid, matrix_mdev->shadow_apcb.apm)) {
> +		clear_bit_inv(apid, matrix_mdev->shadow_apcb.apm);
> +		vfio_ap_mdev_commit_shadow_apcb(matrix_mdev);
> +	}
> +}
> +
>  /**
>   * unassign_adapter_store
>   *
> @@ -730,10 +790,6 @@ static ssize_t unassign_adapter_store(struct device *dev,
>  	struct mdev_device *mdev = mdev_from_dev(dev);
>  	struct ap_matrix_mdev *matrix_mdev = mdev_get_drvdata(mdev);
>  
> -	/* If the guest is running, disallow un-assignment of adapter */
> -	if (matrix_mdev->kvm)
> -		return -EBUSY;
> -
>  	ret = kstrtoul(buf, 0, &apid);
>  	if (ret)
>  		return ret;
> @@ -744,12 +800,67 @@ static ssize_t unassign_adapter_store(struct device *dev,
>  	mutex_lock(&matrix_dev->lock);
>  	clear_bit_inv((unsigned long)apid, matrix_mdev->matrix.apm);
>  	vfio_ap_mdev_manage_qlinks(matrix_mdev, UNLINK_APID, apid);
> +	vfio_ap_mdev_hot_unplug_adapter(matrix_mdev, apid);
>  	mutex_unlock(&matrix_dev->lock);
>  
>  	return count;
>  }
>  static DEVICE_ATTR_WO(unassign_adapter);
>  
> +static bool vfio_ap_assign_apqi_to_apcb(struct ap_matrix_mdev *matrix_mdev,
> +					unsigned long apqi)
> +{
> +	unsigned long apid, apqn;
> +	unsigned long *apm = matrix_mdev->shadow_apcb.apm;
> +
> +	/*
> +	 * If the APQI is already assigned to the guest's shadow APCB, there is
> +	 * no need to assign it.
> +	 */
> +	if (test_bit_inv(apqi, matrix_mdev->shadow_apcb.aqm))
> +		return false;
> +
> +	/*
> +	 * If no adapters have yet been assigned to the shadow APCB and one or
> +	 * more adapters have been assigned to the matrix mdev, then use
> +	 * the adapters assigned to the matrix mdev; otherwise, there is nothing
> +	 * to assign to the shadow APCB.
> +	 */
> +	if (bitmap_empty(matrix_mdev->shadow_apcb.apm, AP_DEVICES)) {
> +		if (bitmap_empty(matrix_mdev->matrix.apm, AP_DEVICES))
> +			return false;
> +
> +		apm = matrix_mdev->matrix.apm;
> +	}
> +
> +	/* Make sure all APQNs are bound to the vfio_ap driver */
> +	for_each_set_bit_inv(apid, apm, AP_DEVICES) {
> +		apqn = AP_MKQID(apid, apqi);
> +
> +		if (vfio_ap_mdev_get_queue(matrix_mdev, apqn) == NULL)
> +			return false;
> +	}
> +
> +	set_bit_inv(apqi, matrix_mdev->shadow_apcb.aqm);
> +
> +	/*
> +	 * If we verified APQNs using the adapters assigned to the matrix mdev,
> +	 * then copy the APIDs of those adapters into the guest's APCB
> +	 */
> +	if (bitmap_empty(matrix_mdev->shadow_apcb.apm, AP_DEVICES))
> +		bitmap_copy(matrix_mdev->shadow_apcb.apm,
> +			    matrix_mdev->matrix.apm, AP_DEVICES);
> +
> +	return true;
> +}
> +
> +static void vfio_ap_mdev_hot_plug_domain(struct ap_matrix_mdev *matrix_mdev,
> +					 unsigned long apqi)
> +{
> +	if (vfio_ap_assign_apqi_to_apcb(matrix_mdev, apqi))
> +		vfio_ap_mdev_commit_shadow_apcb(matrix_mdev);
> +}
> +
>  /**
>   * assign_domain_store
>   *
> @@ -793,10 +904,6 @@ static ssize_t assign_domain_store(struct device *dev,
>  	struct ap_matrix_mdev *matrix_mdev = mdev_get_drvdata(mdev);
>  	unsigned long max_apqi = matrix_mdev->matrix.aqm_max;
>  
> -	/* If the guest is running, disallow assignment of domain */
> -	if (matrix_mdev->kvm)
> -		return -EBUSY;
> -
>  	ret = kstrtoul(buf, 0, &apqi);
>  	if (ret)
>  		return ret;
> @@ -817,12 +924,21 @@ static ssize_t assign_domain_store(struct device *dev,
>  	}
>  	set_bit_inv(apqi, matrix_mdev->matrix.aqm);
>  	vfio_ap_mdev_manage_qlinks(matrix_mdev, LINK_APQI, apqi);
> +	vfio_ap_mdev_hot_plug_domain(matrix_mdev, apqi);
>  	mutex_unlock(&matrix_dev->lock);
>  
>  	return count;
>  }
>  static DEVICE_ATTR_WO(assign_domain);
>  
> +static void vfio_ap_mdev_hot_unplug_domain(struct ap_matrix_mdev *matrix_mdev,
> +					   unsigned long apqi)
> +{
> +	if (test_bit_inv(apqi, matrix_mdev->shadow_apcb.aqm)) {
> +		clear_bit_inv(apqi, matrix_mdev->shadow_apcb.aqm);
> +		vfio_ap_mdev_commit_shadow_apcb(matrix_mdev);
> +	}
> +}
>  
>  /**
>   * unassign_domain_store
> @@ -850,10 +966,6 @@ static ssize_t unassign_domain_store(struct device *dev,
>  	struct mdev_device *mdev = mdev_from_dev(dev);
>  	struct ap_matrix_mdev *matrix_mdev = mdev_get_drvdata(mdev);
>  
> -	/* If the guest is running, disallow un-assignment of domain */
> -	if (matrix_mdev->kvm)
> -		return -EBUSY;
> -
>  	ret = kstrtoul(buf, 0, &apqi);
>  	if (ret)
>  		return ret;
> @@ -864,12 +976,22 @@ static ssize_t unassign_domain_store(struct device *dev,
>  	mutex_lock(&matrix_dev->lock);
>  	clear_bit_inv((unsigned long)apqi, matrix_mdev->matrix.aqm);
>  	vfio_ap_mdev_manage_qlinks(matrix_mdev, UNLINK_APQI, apqi);
> +	vfio_ap_mdev_hot_unplug_domain(matrix_mdev, apqi);
>  	mutex_unlock(&matrix_dev->lock);
>  
>  	return count;
>  }
>  static DEVICE_ATTR_WO(unassign_domain);
>  
> +static void vfio_ap_mdev_hot_plug_ctl_domain(struct ap_matrix_mdev *matrix_mdev,
> +					     unsigned long domid)
> +{
> +	if (!test_bit_inv(domid, matrix_mdev->shadow_apcb.adm)) {
> +		set_bit_inv(domid, matrix_mdev->shadow_apcb.adm);
> +		vfio_ap_mdev_commit_shadow_apcb(matrix_mdev);
> +	}
> +}
> +
>  /**
>   * assign_control_domain_store
>   *
> @@ -895,10 +1017,6 @@ static ssize_t assign_control_domain_store(struct device *dev,
>  	struct mdev_device *mdev = mdev_from_dev(dev);
>  	struct ap_matrix_mdev *matrix_mdev = mdev_get_drvdata(mdev);
>  
> -	/* If the guest is running, disallow assignment of control domain */
> -	if (matrix_mdev->kvm)
> -		return -EBUSY;
> -
>  	ret = kstrtoul(buf, 0, &id);
>  	if (ret)
>  		return ret;
> @@ -914,12 +1032,23 @@ static ssize_t assign_control_domain_store(struct device *dev,
>  	if (!mutex_trylock(&matrix_dev->lock))
>  		return -EBUSY;
>  	set_bit_inv(id, matrix_mdev->matrix.adm);
> +	vfio_ap_mdev_hot_plug_ctl_domain(matrix_mdev, id);
>  	mutex_unlock(&matrix_dev->lock);
>  
>  	return count;
>  }
>  static DEVICE_ATTR_WO(assign_control_domain);
>  
> +static void
> +vfio_ap_mdev_hot_unplug_ctl_domain(struct ap_matrix_mdev *matrix_mdev,
> +				   unsigned long domid)
> +{
> +	if (test_bit_inv(domid, matrix_mdev->shadow_apcb.adm)) {
> +		clear_bit_inv(domid, matrix_mdev->shadow_apcb.adm);
> +		vfio_ap_mdev_commit_shadow_apcb(matrix_mdev);
> +	}
> +}
> +
>  /**
>   * unassign_control_domain_store
>   *
> @@ -946,10 +1075,6 @@ static ssize_t unassign_control_domain_store(struct device *dev,
>  	struct ap_matrix_mdev *matrix_mdev = mdev_get_drvdata(mdev);
>  	unsigned long max_domid =  matrix_mdev->matrix.adm_max;
>  
> -	/* If the guest is running, disallow un-assignment of control domain */
> -	if (matrix_mdev->kvm)
> -		return -EBUSY;
> -
>  	ret = kstrtoul(buf, 0, &domid);
>  	if (ret)
>  		return ret;
> @@ -958,6 +1083,7 @@ static ssize_t unassign_control_domain_store(struct device *dev,
>  
>  	mutex_lock(&matrix_dev->lock);
>  	clear_bit_inv(domid, matrix_mdev->matrix.adm);
> +	vfio_ap_mdev_hot_unplug_ctl_domain(matrix_mdev, domid);
>  	mutex_unlock(&matrix_dev->lock);
>  
>  	return count;
> @@ -1099,8 +1225,6 @@ static int vfio_ap_mdev_set_kvm(struct ap_matrix_mdev *matrix_mdev,
>  {
>  	struct ap_matrix_mdev *m;
>  
> -	mutex_lock(&matrix_dev->lock);
> -
>  	list_for_each_entry(m, &matrix_dev->mdev_list, node) {
>  		if ((m != matrix_mdev) && (m->kvm == kvm)) {
>  			mutex_unlock(&matrix_dev->lock);
> @@ -1111,7 +1235,6 @@ static int vfio_ap_mdev_set_kvm(struct ap_matrix_mdev *matrix_mdev,
>  	matrix_mdev->kvm = kvm;
>  	kvm_get_kvm(kvm);
>  	kvm->arch.crypto.pqap_hook = &matrix_mdev->pqap_hook;
> -	mutex_unlock(&matrix_dev->lock);
>  
>  	return 0;
>  }
> @@ -1148,7 +1271,7 @@ static int vfio_ap_mdev_iommu_notifier(struct notifier_block *nb,
>  static int vfio_ap_mdev_group_notifier(struct notifier_block *nb,
>  				       unsigned long action, void *data)
>  {
> -	int ret;
> +	int ret = NOTIFY_DONE;
>  	struct ap_matrix_mdev *matrix_mdev;
>  
>  	if (action != VFIO_GROUP_NOTIFY_SET_KVM)
> @@ -1156,23 +1279,28 @@ static int vfio_ap_mdev_group_notifier(struct notifier_block *nb,
>  
>  	matrix_mdev = container_of(nb, struct ap_matrix_mdev, group_notifier);
>  
> +	mutex_lock(&matrix_dev->lock);
> +
>  	if (!data) {
>  		if (matrix_mdev->kvm)
>  			kvm_put_kvm(matrix_mdev->kvm);
>  
>  		matrix_mdev->kvm = NULL;
>  
> -		return NOTIFY_OK;
> +		ret = NOTIFY_OK;
> +		goto done;
>  	}
>  
>  	ret = vfio_ap_mdev_set_kvm(matrix_mdev, data);
>  	if (ret)
> -		return NOTIFY_DONE;
> +		goto done;
>  
>  	vfio_ap_mdev_init_apcb(matrix_mdev);
>  	vfio_ap_mdev_commit_shadow_apcb(matrix_mdev);
>  
> -	return NOTIFY_OK;
> +done:
> +	mutex_unlock(&matrix_dev->lock);
> +	return ret;
>  }
>  
>  static int vfio_ap_mdev_reset_queue(unsigned int apid, unsigned int apqi,
Anthony Krowiak Nov. 30, 2020, 7:36 p.m. UTC | #2
On 11/28/20 8:52 PM, Halil Pasic wrote:
> On Tue, 24 Nov 2020 16:40:11 -0500
> Tony Krowiak <akrowiak@linux.ibm.com> wrote:
>
>> Let's hot plug/unplug adapters, domains and control domains assigned to or
>> unassigned from an AP matrix mdev device while it is in use by a guest per
>> the following rules:
>>
>> * Assign an adapter to mdev's matrix:
>>
>>    The adapter will be hot plugged into the guest under the following
>>    conditions:
>>    1. The adapter is not yet assigned to the guest's matrix
>>    2. At least one domain is assigned to the guest's matrix
>>    3. Each APQN derived from the APID of the newly assigned adapter and
>>       the APQIs of the domains already assigned to the guest's
>>       matrix references a queue device bound to the vfio_ap device driver.
>>
>>    The adapter and each domain assigned to the mdev's matrix will be hot
>>    plugged into the guest under the following conditions:
>>    1. The adapter is not yet assigned to the guest's matrix
>>    2. No domains are assigned to the guest's matrix
>>    3  At least one domain is assigned to the mdev's matrix
>>    4. Each APQN derived from the APID of the newly assigned adapter and
>>       the APQIs of the domains assigned to the mdev's matrix references a
>>       queue device bound to the vfio_ap device driver.
>>
>> * Unassign an adapter from mdev's matrix:
>>
>>    The adapter will be hot unplugged from the KVM guest if it is
>>    assigned to the guest's matrix.
>>
>> * Assign a domain to mdev's matrix:
>>
>>    The domain will be hot plugged into the guest under the following
>>    conditions:
>>    1. The domain is not yet assigned to the guest's matrix
>>    2. At least one adapter is assigned to the guest's matrix
>>    3. Each APQN derived from the APQI of the newly assigned domain and
>>       the APIDs of the adapters already assigned to the guest's
>>       matrix references a queue device bound to the vfio_ap device driver.
>>
>>    The domain and each adapter assigned to the mdev's matrix will be hot
>>    plugged into the guest under the following conditions:
>>    1. The domain is not yet assigned to the guest's matrix
>>    2. No adapters are assigned to the guest's matrix
>>    3  At least one adapter is assigned to the mdev's matrix
>>    4. Each APQN derived from the APQI of the newly assigned domain and
>>       the APIDs of the adapters assigned to the mdev's matrix references a
>>       queue device bound to the vfio_ap device driver.
>>
>> * Unassign adapter from mdev's matrix:
>>
>>    The domain will be hot unplugged from the KVM guest if it is
>>    assigned to the guest's matrix.
>>
>> * Assign a control domain:
>>
>>    The control domain will be hot plugged into the KVM guest if it is not
>>    assigned to the guest's APCB. The AP architecture ensures a guest will
>>    only get access to the control domain if it is in the host's AP
>>    configuration, so there is no risk in hot plugging it; however, it will
>>    become automatically available to the guest when it is added to the host
>>    configuration.
>>
>> * Unassign a control domain:
>>
>>    The control domain will be hot unplugged from the KVM guest if it is
>>    assigned to the guest's APCB.
> This is where things start getting tricky. E.g. do we need to revise
> filtering after an unassign? (For example an assign_adapter X didn't
> change the shadow, because queue XY was missing, but now we unplug domain
> Y. Should the adapter X pop up? I guess it should.)

I suppose that makes sense at the expense of making the code
more complex. It is essentially what we had in the prior version
which used the same filtering code for assignment as well as
host AP configuration changes.

>
>
>> Note: Now that hot plug/unplug is implemented, there is the possibility
>>        that an assignment/unassignment of an adapter, domain or control
>>        domain could be initiated while the guest is starting, so the
>>        matrix device lock will be taken for the group notification callback
>>        that initializes the guest's APCB when the KVM pointer is made
>>        available to the vfio_ap device driver.
>>
>> Signed-off-by: Tony Krowiak <akrowiak@linux.ibm.com>
>> ---
>>   drivers/s390/crypto/vfio_ap_ops.c | 190 +++++++++++++++++++++++++-----
>>   1 file changed, 159 insertions(+), 31 deletions(-)
>>
>> diff --git a/drivers/s390/crypto/vfio_ap_ops.c b/drivers/s390/crypto/vfio_ap_ops.c
>> index 586ec5776693..4f96b7861607 100644
>> --- a/drivers/s390/crypto/vfio_ap_ops.c
>> +++ b/drivers/s390/crypto/vfio_ap_ops.c
>> @@ -631,6 +631,60 @@ static void vfio_ap_mdev_manage_qlinks(struct ap_matrix_mdev *matrix_mdev,
>>   	}
>>   }
>>   
>> +static bool vfio_ap_assign_apid_to_apcb(struct ap_matrix_mdev *matrix_mdev,
>> +					unsigned long apid)
>> +{
>> +	unsigned long apqi, apqn;
>> +	unsigned long *aqm = matrix_mdev->shadow_apcb.aqm;
>> +
>> +	/*
>> +	 * If the APID is already assigned to the guest's shadow APCB, there is
>> +	 * no need to assign it.
>> +	 */
>> +	if (test_bit_inv(apid, matrix_mdev->shadow_apcb.apm))
>> +		return false;
>> +
>> +	/*
>> +	 * If no domains have yet been assigned to the shadow APCB and one or
>> +	 * more domains have been assigned to the matrix mdev, then use
>> +	 * the domains assigned to the matrix mdev; otherwise, there is nothing
>> +	 * to assign to the shadow APCB.
>> +	 */
>> +	if (bitmap_empty(matrix_mdev->shadow_apcb.aqm, AP_DOMAINS)) {
>> +		if (bitmap_empty(matrix_mdev->matrix.aqm, AP_DOMAINS))
>> +			return false;
>> +
>> +		aqm = matrix_mdev->matrix.aqm;
>> +	}
>> +
>> +	/* Make sure all APQNs are bound to the vfio_ap driver */
>> +	for_each_set_bit_inv(apqi, aqm, AP_DOMAINS) {
>> +		apqn = AP_MKQID(apid, apqi);
>> +
>> +		if (vfio_ap_mdev_get_queue(matrix_mdev, apqn) == NULL)
>> +			return false;
>> +	}
>> +
>> +	set_bit_inv(apid, matrix_mdev->shadow_apcb.apm);
>> +
>> +	/*
>> +	 * If we verified APQNs using the domains assigned to the matrix mdev,
>> +	 * then copy the APQIs of those domains into the guest's APCB
>> +	 */
>> +	if (bitmap_empty(matrix_mdev->shadow_apcb.aqm, AP_DOMAINS))
>> +		bitmap_copy(matrix_mdev->shadow_apcb.aqm,
>> +			    matrix_mdev->matrix.aqm, AP_DOMAINS);
>> +
>> +	return true;
>> +}
> What is the rationale behind the shadow aqm empty special handling?

The rationale was to avoid taking the VCPUs
out of SIE in order to make an update to the guest's APCB
unnecessarily. For example, suppose the guest is started
without access to any APQNs (i.e., all matrix and shadow_apcb
masks are zeros). Now suppose the administrator proceeds to
start assigning AP resources to the mdev. Let's say he starts
by assigning adapters 1 through 100. The code below will return
true indicating the shadow_apcb was updated. Consequently,
the calling code will commit the changes to the guest's
APCB. The problem there is that in order to update the guest's
VCPUs, they will have to be taken out of SIE, yet the guest will
not get access to the adapter since no domains have yet been
assigned to the APCB. Doing this 100 times - once for each
adapter 1-100 - is probably a bad idea.

>   I.e.
> why not simply:
>
>
> static bool vfio_ap_assign_apid_to_apcb(struct ap_matrix_mdev *matrix_mdev,
>                                          unsigned long apid)
> {
>          unsigned long apqi, apqn;
>          unsigned long *aqm = matrix_mdev->shadow_apcb.aqm;
>                                                                                  
>          /*
>           * If the APID is already assigned to the guest's shadow APCB, there is
>           * no need to assign it.
>           */
>          if (test_bit_inv(apid, matrix_mdev->shadow_apcb.apm))
>                  return false;
>                                                                                  
>          /* Make sure all APQNs are bound to the vfio_ap driver */
>          for_each_set_bit_inv(apqi, aqm, AP_DOMAINS) {
>                  apqn = AP_MKQID(apid, apqi);
>                                                                                  
>                  if (vfio_ap_mdev_get_queue(matrix_mdev, apqn) == NULL)
>                          return false;
>          }
>                                                                                  
>          set_bit_inv(apid, matrix_mdev->shadow_apcb.apm);
>                                                                                  
>          return true;
> }
>
> Please answer the questions I've asked, and note that I will have to
> return to this patch, later.
>
> Regards,
> Halil
>
>> +
>> +static void vfio_ap_mdev_hot_plug_adapter(struct ap_matrix_mdev *matrix_mdev,
>> +					  unsigned long apid)
>> +{
>> +	if (vfio_ap_assign_apid_to_apcb(matrix_mdev, apid))
>> +		vfio_ap_mdev_commit_shadow_apcb(matrix_mdev);
>> +}
>> +
>>   /**
>>    * assign_adapter_store
>>    *
>> @@ -673,10 +727,6 @@ static ssize_t assign_adapter_store(struct device *dev,
>>   	struct mdev_device *mdev = mdev_from_dev(dev);
>>   	struct ap_matrix_mdev *matrix_mdev = mdev_get_drvdata(mdev);
>>   
>> -	/* If the guest is running, disallow assignment of adapter */
>> -	if (matrix_mdev->kvm)
>> -		return -EBUSY;
>> -
>>   	ret = kstrtoul(buf, 0, &apid);
>>   	if (ret)
>>   		return ret;
>> @@ -698,12 +748,22 @@ static ssize_t assign_adapter_store(struct device *dev,
>>   	}
>>   	set_bit_inv(apid, matrix_mdev->matrix.apm);
>>   	vfio_ap_mdev_manage_qlinks(matrix_mdev, LINK_APID, apid);
>> +	vfio_ap_mdev_hot_plug_adapter(matrix_mdev, apid);
>>   	mutex_unlock(&matrix_dev->lock);
>>   
>>   	return count;
>>   }
>>   static DEVICE_ATTR_WO(assign_adapter);
>>   
>> +static void vfio_ap_mdev_hot_unplug_adapter(struct ap_matrix_mdev *matrix_mdev,
>> +					    unsigned long apid)
>> +{
>> +	if (test_bit_inv(apid, matrix_mdev->shadow_apcb.apm)) {
>> +		clear_bit_inv(apid, matrix_mdev->shadow_apcb.apm);
>> +		vfio_ap_mdev_commit_shadow_apcb(matrix_mdev);
>> +	}
>> +}
>> +
>>   /**
>>    * unassign_adapter_store
>>    *
>> @@ -730,10 +790,6 @@ static ssize_t unassign_adapter_store(struct device *dev,
>>   	struct mdev_device *mdev = mdev_from_dev(dev);
>>   	struct ap_matrix_mdev *matrix_mdev = mdev_get_drvdata(mdev);
>>   
>> -	/* If the guest is running, disallow un-assignment of adapter */
>> -	if (matrix_mdev->kvm)
>> -		return -EBUSY;
>> -
>>   	ret = kstrtoul(buf, 0, &apid);
>>   	if (ret)
>>   		return ret;
>> @@ -744,12 +800,67 @@ static ssize_t unassign_adapter_store(struct device *dev,
>>   	mutex_lock(&matrix_dev->lock);
>>   	clear_bit_inv((unsigned long)apid, matrix_mdev->matrix.apm);
>>   	vfio_ap_mdev_manage_qlinks(matrix_mdev, UNLINK_APID, apid);
>> +	vfio_ap_mdev_hot_unplug_adapter(matrix_mdev, apid);
>>   	mutex_unlock(&matrix_dev->lock);
>>   
>>   	return count;
>>   }
>>   static DEVICE_ATTR_WO(unassign_adapter);
>>   
>> +static bool vfio_ap_assign_apqi_to_apcb(struct ap_matrix_mdev *matrix_mdev,
>> +					unsigned long apqi)
>> +{
>> +	unsigned long apid, apqn;
>> +	unsigned long *apm = matrix_mdev->shadow_apcb.apm;
>> +
>> +	/*
>> +	 * If the APQI is already assigned to the guest's shadow APCB, there is
>> +	 * no need to assign it.
>> +	 */
>> +	if (test_bit_inv(apqi, matrix_mdev->shadow_apcb.aqm))
>> +		return false;
>> +
>> +	/*
>> +	 * If no adapters have yet been assigned to the shadow APCB and one or
>> +	 * more adapters have been assigned to the matrix mdev, then use
>> +	 * the adapters assigned to the matrix mdev; otherwise, there is nothing
>> +	 * to assign to the shadow APCB.
>> +	 */
>> +	if (bitmap_empty(matrix_mdev->shadow_apcb.apm, AP_DEVICES)) {
>> +		if (bitmap_empty(matrix_mdev->matrix.apm, AP_DEVICES))
>> +			return false;
>> +
>> +		apm = matrix_mdev->matrix.apm;
>> +	}
>> +
>> +	/* Make sure all APQNs are bound to the vfio_ap driver */
>> +	for_each_set_bit_inv(apid, apm, AP_DEVICES) {
>> +		apqn = AP_MKQID(apid, apqi);
>> +
>> +		if (vfio_ap_mdev_get_queue(matrix_mdev, apqn) == NULL)
>> +			return false;
>> +	}
>> +
>> +	set_bit_inv(apqi, matrix_mdev->shadow_apcb.aqm);
>> +
>> +	/*
>> +	 * If we verified APQNs using the adapters assigned to the matrix mdev,
>> +	 * then copy the APIDs of those adapters into the guest's APCB
>> +	 */
>> +	if (bitmap_empty(matrix_mdev->shadow_apcb.apm, AP_DEVICES))
>> +		bitmap_copy(matrix_mdev->shadow_apcb.apm,
>> +			    matrix_mdev->matrix.apm, AP_DEVICES);
>> +
>> +	return true;
>> +}
>> +
>> +static void vfio_ap_mdev_hot_plug_domain(struct ap_matrix_mdev *matrix_mdev,
>> +					 unsigned long apqi)
>> +{
>> +	if (vfio_ap_assign_apqi_to_apcb(matrix_mdev, apqi))
>> +		vfio_ap_mdev_commit_shadow_apcb(matrix_mdev);
>> +}
>> +
>>   /**
>>    * assign_domain_store
>>    *
>> @@ -793,10 +904,6 @@ static ssize_t assign_domain_store(struct device *dev,
>>   	struct ap_matrix_mdev *matrix_mdev = mdev_get_drvdata(mdev);
>>   	unsigned long max_apqi = matrix_mdev->matrix.aqm_max;
>>   
>> -	/* If the guest is running, disallow assignment of domain */
>> -	if (matrix_mdev->kvm)
>> -		return -EBUSY;
>> -
>>   	ret = kstrtoul(buf, 0, &apqi);
>>   	if (ret)
>>   		return ret;
>> @@ -817,12 +924,21 @@ static ssize_t assign_domain_store(struct device *dev,
>>   	}
>>   	set_bit_inv(apqi, matrix_mdev->matrix.aqm);
>>   	vfio_ap_mdev_manage_qlinks(matrix_mdev, LINK_APQI, apqi);
>> +	vfio_ap_mdev_hot_plug_domain(matrix_mdev, apqi);
>>   	mutex_unlock(&matrix_dev->lock);
>>   
>>   	return count;
>>   }
>>   static DEVICE_ATTR_WO(assign_domain);
>>   
>> +static void vfio_ap_mdev_hot_unplug_domain(struct ap_matrix_mdev *matrix_mdev,
>> +					   unsigned long apqi)
>> +{
>> +	if (test_bit_inv(apqi, matrix_mdev->shadow_apcb.aqm)) {
>> +		clear_bit_inv(apqi, matrix_mdev->shadow_apcb.aqm);
>> +		vfio_ap_mdev_commit_shadow_apcb(matrix_mdev);
>> +	}
>> +}
>>   
>>   /**
>>    * unassign_domain_store
>> @@ -850,10 +966,6 @@ static ssize_t unassign_domain_store(struct device *dev,
>>   	struct mdev_device *mdev = mdev_from_dev(dev);
>>   	struct ap_matrix_mdev *matrix_mdev = mdev_get_drvdata(mdev);
>>   
>> -	/* If the guest is running, disallow un-assignment of domain */
>> -	if (matrix_mdev->kvm)
>> -		return -EBUSY;
>> -
>>   	ret = kstrtoul(buf, 0, &apqi);
>>   	if (ret)
>>   		return ret;
>> @@ -864,12 +976,22 @@ static ssize_t unassign_domain_store(struct device *dev,
>>   	mutex_lock(&matrix_dev->lock);
>>   	clear_bit_inv((unsigned long)apqi, matrix_mdev->matrix.aqm);
>>   	vfio_ap_mdev_manage_qlinks(matrix_mdev, UNLINK_APQI, apqi);
>> +	vfio_ap_mdev_hot_unplug_domain(matrix_mdev, apqi);
>>   	mutex_unlock(&matrix_dev->lock);
>>   
>>   	return count;
>>   }
>>   static DEVICE_ATTR_WO(unassign_domain);
>>   
>> +static void vfio_ap_mdev_hot_plug_ctl_domain(struct ap_matrix_mdev *matrix_mdev,
>> +					     unsigned long domid)
>> +{
>> +	if (!test_bit_inv(domid, matrix_mdev->shadow_apcb.adm)) {
>> +		set_bit_inv(domid, matrix_mdev->shadow_apcb.adm);
>> +		vfio_ap_mdev_commit_shadow_apcb(matrix_mdev);
>> +	}
>> +}
>> +
>>   /**
>>    * assign_control_domain_store
>>    *
>> @@ -895,10 +1017,6 @@ static ssize_t assign_control_domain_store(struct device *dev,
>>   	struct mdev_device *mdev = mdev_from_dev(dev);
>>   	struct ap_matrix_mdev *matrix_mdev = mdev_get_drvdata(mdev);
>>   
>> -	/* If the guest is running, disallow assignment of control domain */
>> -	if (matrix_mdev->kvm)
>> -		return -EBUSY;
>> -
>>   	ret = kstrtoul(buf, 0, &id);
>>   	if (ret)
>>   		return ret;
>> @@ -914,12 +1032,23 @@ static ssize_t assign_control_domain_store(struct device *dev,
>>   	if (!mutex_trylock(&matrix_dev->lock))
>>   		return -EBUSY;
>>   	set_bit_inv(id, matrix_mdev->matrix.adm);
>> +	vfio_ap_mdev_hot_plug_ctl_domain(matrix_mdev, id);
>>   	mutex_unlock(&matrix_dev->lock);
>>   
>>   	return count;
>>   }
>>   static DEVICE_ATTR_WO(assign_control_domain);
>>   
>> +static void
>> +vfio_ap_mdev_hot_unplug_ctl_domain(struct ap_matrix_mdev *matrix_mdev,
>> +				   unsigned long domid)
>> +{
>> +	if (test_bit_inv(domid, matrix_mdev->shadow_apcb.adm)) {
>> +		clear_bit_inv(domid, matrix_mdev->shadow_apcb.adm);
>> +		vfio_ap_mdev_commit_shadow_apcb(matrix_mdev);
>> +	}
>> +}
>> +
>>   /**
>>    * unassign_control_domain_store
>>    *
>> @@ -946,10 +1075,6 @@ static ssize_t unassign_control_domain_store(struct device *dev,
>>   	struct ap_matrix_mdev *matrix_mdev = mdev_get_drvdata(mdev);
>>   	unsigned long max_domid =  matrix_mdev->matrix.adm_max;
>>   
>> -	/* If the guest is running, disallow un-assignment of control domain */
>> -	if (matrix_mdev->kvm)
>> -		return -EBUSY;
>> -
>>   	ret = kstrtoul(buf, 0, &domid);
>>   	if (ret)
>>   		return ret;
>> @@ -958,6 +1083,7 @@ static ssize_t unassign_control_domain_store(struct device *dev,
>>   
>>   	mutex_lock(&matrix_dev->lock);
>>   	clear_bit_inv(domid, matrix_mdev->matrix.adm);
>> +	vfio_ap_mdev_hot_unplug_ctl_domain(matrix_mdev, domid);
>>   	mutex_unlock(&matrix_dev->lock);
>>   
>>   	return count;
>> @@ -1099,8 +1225,6 @@ static int vfio_ap_mdev_set_kvm(struct ap_matrix_mdev *matrix_mdev,
>>   {
>>   	struct ap_matrix_mdev *m;
>>   
>> -	mutex_lock(&matrix_dev->lock);
>> -
>>   	list_for_each_entry(m, &matrix_dev->mdev_list, node) {
>>   		if ((m != matrix_mdev) && (m->kvm == kvm)) {
>>   			mutex_unlock(&matrix_dev->lock);
>> @@ -1111,7 +1235,6 @@ static int vfio_ap_mdev_set_kvm(struct ap_matrix_mdev *matrix_mdev,
>>   	matrix_mdev->kvm = kvm;
>>   	kvm_get_kvm(kvm);
>>   	kvm->arch.crypto.pqap_hook = &matrix_mdev->pqap_hook;
>> -	mutex_unlock(&matrix_dev->lock);
>>   
>>   	return 0;
>>   }
>> @@ -1148,7 +1271,7 @@ static int vfio_ap_mdev_iommu_notifier(struct notifier_block *nb,
>>   static int vfio_ap_mdev_group_notifier(struct notifier_block *nb,
>>   				       unsigned long action, void *data)
>>   {
>> -	int ret;
>> +	int ret = NOTIFY_DONE;
>>   	struct ap_matrix_mdev *matrix_mdev;
>>   
>>   	if (action != VFIO_GROUP_NOTIFY_SET_KVM)
>> @@ -1156,23 +1279,28 @@ static int vfio_ap_mdev_group_notifier(struct notifier_block *nb,
>>   
>>   	matrix_mdev = container_of(nb, struct ap_matrix_mdev, group_notifier);
>>   
>> +	mutex_lock(&matrix_dev->lock);
>> +
>>   	if (!data) {
>>   		if (matrix_mdev->kvm)
>>   			kvm_put_kvm(matrix_mdev->kvm);
>>   
>>   		matrix_mdev->kvm = NULL;
>>   
>> -		return NOTIFY_OK;
>> +		ret = NOTIFY_OK;
>> +		goto done;
>>   	}
>>   
>>   	ret = vfio_ap_mdev_set_kvm(matrix_mdev, data);
>>   	if (ret)
>> -		return NOTIFY_DONE;
>> +		goto done;
>>   
>>   	vfio_ap_mdev_init_apcb(matrix_mdev);
>>   	vfio_ap_mdev_commit_shadow_apcb(matrix_mdev);
>>   
>> -	return NOTIFY_OK;
>> +done:
>> +	mutex_unlock(&matrix_dev->lock);
>> +	return ret;
>>   }
>>   
>>   static int vfio_ap_mdev_reset_queue(unsigned int apid, unsigned int apqi,
Halil Pasic Nov. 30, 2020, 11:32 p.m. UTC | #3
On Mon, 30 Nov 2020 14:36:10 -0500
Tony Krowiak <akrowiak@linux.ibm.com> wrote:

> 
> 
> On 11/28/20 8:52 PM, Halil Pasic wrote:
[..]
> >> * Unassign adapter from mdev's matrix:
> >>
> >>    The domain will be hot unplugged from the KVM guest if it is
> >>    assigned to the guest's matrix.
> >>
> >> * Assign a control domain:
> >>
> >>    The control domain will be hot plugged into the KVM guest if it is not
> >>    assigned to the guest's APCB. The AP architecture ensures a guest will
> >>    only get access to the control domain if it is in the host's AP
> >>    configuration, so there is no risk in hot plugging it; however, it will
> >>    become automatically available to the guest when it is added to the host
> >>    configuration.
> >>
> >> * Unassign a control domain:
> >>
> >>    The control domain will be hot unplugged from the KVM guest if it is
> >>    assigned to the guest's APCB.
> > This is where things start getting tricky. E.g. do we need to revise
> > filtering after an unassign? (For example an assign_adapter X didn't
> > change the shadow, because queue XY was missing, but now we unplug domain
> > Y. Should the adapter X pop up? I guess it should.)
> 
> I suppose that makes sense at the expense of making the code
> more complex. It is essentially what we had in the prior version
> which used the same filtering code for assignment as well as
> host AP configuration changes.
> 

Will have to think about it some more. Making the user unplug and
replug an adapter because at some point it got filtered, but there
is no need to filter it does not feel right. On the other hand, I'm
afraid I'm complaining in circles. 

> >
> >
> >> Note: Now that hot plug/unplug is implemented, there is the possibility
> >>        that an assignment/unassignment of an adapter, domain or control
> >>        domain could be initiated while the guest is starting, so the
> >>        matrix device lock will be taken for the group notification callback
> >>        that initializes the guest's APCB when the KVM pointer is made
> >>        available to the vfio_ap device driver.
> >>
> >> Signed-off-by: Tony Krowiak <akrowiak@linux.ibm.com>
> >> ---
> >>   drivers/s390/crypto/vfio_ap_ops.c | 190 +++++++++++++++++++++++++-----
> >>   1 file changed, 159 insertions(+), 31 deletions(-)
> >>
> >> diff --git a/drivers/s390/crypto/vfio_ap_ops.c b/drivers/s390/crypto/vfio_ap_ops.c
> >> index 586ec5776693..4f96b7861607 100644
> >> --- a/drivers/s390/crypto/vfio_ap_ops.c
> >> +++ b/drivers/s390/crypto/vfio_ap_ops.c
> >> @@ -631,6 +631,60 @@ static void vfio_ap_mdev_manage_qlinks(struct ap_matrix_mdev *matrix_mdev,
> >>   	}
> >>   }
> >>   
> >> +static bool vfio_ap_assign_apid_to_apcb(struct ap_matrix_mdev *matrix_mdev,
> >> +					unsigned long apid)
> >> +{
> >> +	unsigned long apqi, apqn;
> >> +	unsigned long *aqm = matrix_mdev->shadow_apcb.aqm;
> >> +
> >> +	/*
> >> +	 * If the APID is already assigned to the guest's shadow APCB, there is
> >> +	 * no need to assign it.
> >> +	 */
> >> +	if (test_bit_inv(apid, matrix_mdev->shadow_apcb.apm))
> >> +		return false;
> >> +
> >> +	/*
> >> +	 * If no domains have yet been assigned to the shadow APCB and one or
> >> +	 * more domains have been assigned to the matrix mdev, then use
> >> +	 * the domains assigned to the matrix mdev; otherwise, there is nothing
> >> +	 * to assign to the shadow APCB.
> >> +	 */
> >> +	if (bitmap_empty(matrix_mdev->shadow_apcb.aqm, AP_DOMAINS)) {
> >> +		if (bitmap_empty(matrix_mdev->matrix.aqm, AP_DOMAINS))
> >> +			return false;
> >> +
> >> +		aqm = matrix_mdev->matrix.aqm;
> >> +	}
> >> +
> >> +	/* Make sure all APQNs are bound to the vfio_ap driver */
> >> +	for_each_set_bit_inv(apqi, aqm, AP_DOMAINS) {
> >> +		apqn = AP_MKQID(apid, apqi);
> >> +
> >> +		if (vfio_ap_mdev_get_queue(matrix_mdev, apqn) == NULL)
> >> +			return false;
> >> +	}
> >> +
> >> +	set_bit_inv(apid, matrix_mdev->shadow_apcb.apm);
> >> +
> >> +	/*
> >> +	 * If we verified APQNs using the domains assigned to the matrix mdev,
> >> +	 * then copy the APQIs of those domains into the guest's APCB
> >> +	 */
> >> +	if (bitmap_empty(matrix_mdev->shadow_apcb.aqm, AP_DOMAINS))
> >> +		bitmap_copy(matrix_mdev->shadow_apcb.aqm,
> >> +			    matrix_mdev->matrix.aqm, AP_DOMAINS);
> >> +
> >> +	return true;
> >> +}
> > What is the rationale behind the shadow aqm empty special handling?
> 
> The rationale was to avoid taking the VCPUs
> out of SIE in order to make an update to the guest's APCB
> unnecessarily. For example, suppose the guest is started
> without access to any APQNs (i.e., all matrix and shadow_apcb
> masks are zeros). Now suppose the administrator proceeds to
> start assigning AP resources to the mdev. Let's say he starts
> by assigning adapters 1 through 100. The code below will return
> true indicating the shadow_apcb was updated. Consequently,
> the calling code will commit the changes to the guest's
> APCB. The problem there is that in order to update the guest's
> VCPUs, they will have to be taken out of SIE, yet the guest will
> not get access to the adapter since no domains have yet been
> assigned to the APCB. Doing this 100 times - once for each
> adapter 1-100 - is probably a bad idea.
>

Not yanking the VCPUs out of SIE does make a lot of sense. At least
I understand your motivation now. I will think some more about this,
but in the meanwhile, please try to answer one more question (see
below).
 
> >   I.e.
> > why not simply:
> >
> >
> > static bool vfio_ap_assign_apid_to_apcb(struct ap_matrix_mdev *matrix_mdev,
> >                                          unsigned long apid)
> > {
> >          unsigned long apqi, apqn;
> >          unsigned long *aqm = matrix_mdev->shadow_apcb.aqm;
> >                                                                                  
> >          /*
> >           * If the APID is already assigned to the guest's shadow APCB, there is
> >           * no need to assign it.
> >           */
> >          if (test_bit_inv(apid, matrix_mdev->shadow_apcb.apm))
> >                  return false;
> >                                                                                  
> >          /* Make sure all APQNs are bound to the vfio_ap driver */
> >          for_each_set_bit_inv(apqi, aqm, AP_DOMAINS) {
> >                  apqn = AP_MKQID(apid, apqi);
> >                                                                                  
> >                  if (vfio_ap_mdev_get_queue(matrix_mdev, apqn) == NULL)
> >                          return false;
> >          }
> >                                                                                  
> >          set_bit_inv(apid, matrix_mdev->shadow_apcb.apm);
> >                                                                                  
> >          return true;

Would
s/return true/return !bitmap_empty(matrix_mdev->shadow_apcb.aqm,
AP_DOMAINS)/ 
do the trick?

I mean if empty, then we would not commit the APCB, so we would
not take the vCPUs out of SIE -- see below.

> >> +
> >> +static void vfio_ap_mdev_hot_plug_adapter(struct ap_matrix_mdev *matrix_mdev,
> >> +					  unsigned long apid)
> >> +{
> >> +	if (vfio_ap_assign_apid_to_apcb(matrix_mdev, apid))
> >> +		vfio_ap_mdev_commit_shadow_apcb(matrix_mdev);
> >> +}
> >> +
[..]

Regards,
Halil
Anthony Krowiak Dec. 1, 2020, 12:18 a.m. UTC | #4
On 11/30/20 6:32 PM, Halil Pasic wrote:
> On Mon, 30 Nov 2020 14:36:10 -0500
> Tony Krowiak <akrowiak@linux.ibm.com> wrote:
>
>>
>> On 11/28/20 8:52 PM, Halil Pasic wrote:
> [..]
>>>> * Unassign adapter from mdev's matrix:
>>>>
>>>>     The domain will be hot unplugged from the KVM guest if it is
>>>>     assigned to the guest's matrix.
>>>>
>>>> * Assign a control domain:
>>>>
>>>>     The control domain will be hot plugged into the KVM guest if it is not
>>>>     assigned to the guest's APCB. The AP architecture ensures a guest will
>>>>     only get access to the control domain if it is in the host's AP
>>>>     configuration, so there is no risk in hot plugging it; however, it will
>>>>     become automatically available to the guest when it is added to the host
>>>>     configuration.
>>>>
>>>> * Unassign a control domain:
>>>>
>>>>     The control domain will be hot unplugged from the KVM guest if it is
>>>>     assigned to the guest's APCB.
>>> This is where things start getting tricky. E.g. do we need to revise
>>> filtering after an unassign? (For example an assign_adapter X didn't
>>> change the shadow, because queue XY was missing, but now we unplug domain
>>> Y. Should the adapter X pop up? I guess it should.)
>> I suppose that makes sense at the expense of making the code
>> more complex. It is essentially what we had in the prior version
>> which used the same filtering code for assignment as well as
>> host AP configuration changes.
>>
> Will have to think about it some more. Making the user unplug and
> replug an adapter because at some point it got filtered, but there
> is no need to filter it does not feel right. On the other hand, I'm
> afraid I'm complaining in circles.
>
>>>
>>>> Note: Now that hot plug/unplug is implemented, there is the possibility
>>>>         that an assignment/unassignment of an adapter, domain or control
>>>>         domain could be initiated while the guest is starting, so the
>>>>         matrix device lock will be taken for the group notification callback
>>>>         that initializes the guest's APCB when the KVM pointer is made
>>>>         available to the vfio_ap device driver.
>>>>
>>>> Signed-off-by: Tony Krowiak <akrowiak@linux.ibm.com>
>>>> ---
>>>>    drivers/s390/crypto/vfio_ap_ops.c | 190 +++++++++++++++++++++++++-----
>>>>    1 file changed, 159 insertions(+), 31 deletions(-)
>>>>
>>>> diff --git a/drivers/s390/crypto/vfio_ap_ops.c b/drivers/s390/crypto/vfio_ap_ops.c
>>>> index 586ec5776693..4f96b7861607 100644
>>>> --- a/drivers/s390/crypto/vfio_ap_ops.c
>>>> +++ b/drivers/s390/crypto/vfio_ap_ops.c
>>>> @@ -631,6 +631,60 @@ static void vfio_ap_mdev_manage_qlinks(struct ap_matrix_mdev *matrix_mdev,
>>>>    	}
>>>>    }
>>>>    
>>>> +static bool vfio_ap_assign_apid_to_apcb(struct ap_matrix_mdev *matrix_mdev,
>>>> +					unsigned long apid)
>>>> +{
>>>> +	unsigned long apqi, apqn;
>>>> +	unsigned long *aqm = matrix_mdev->shadow_apcb.aqm;
>>>> +
>>>> +	/*
>>>> +	 * If the APID is already assigned to the guest's shadow APCB, there is
>>>> +	 * no need to assign it.
>>>> +	 */
>>>> +	if (test_bit_inv(apid, matrix_mdev->shadow_apcb.apm))
>>>> +		return false;
>>>> +
>>>> +	/*
>>>> +	 * If no domains have yet been assigned to the shadow APCB and one or
>>>> +	 * more domains have been assigned to the matrix mdev, then use
>>>> +	 * the domains assigned to the matrix mdev; otherwise, there is nothing
>>>> +	 * to assign to the shadow APCB.
>>>> +	 */
>>>> +	if (bitmap_empty(matrix_mdev->shadow_apcb.aqm, AP_DOMAINS)) {
>>>> +		if (bitmap_empty(matrix_mdev->matrix.aqm, AP_DOMAINS))
>>>> +			return false;
>>>> +
>>>> +		aqm = matrix_mdev->matrix.aqm;
>>>> +	}
>>>> +
>>>> +	/* Make sure all APQNs are bound to the vfio_ap driver */
>>>> +	for_each_set_bit_inv(apqi, aqm, AP_DOMAINS) {
>>>> +		apqn = AP_MKQID(apid, apqi);
>>>> +
>>>> +		if (vfio_ap_mdev_get_queue(matrix_mdev, apqn) == NULL)
>>>> +			return false;
>>>> +	}
>>>> +
>>>> +	set_bit_inv(apid, matrix_mdev->shadow_apcb.apm);
>>>> +
>>>> +	/*
>>>> +	 * If we verified APQNs using the domains assigned to the matrix mdev,
>>>> +	 * then copy the APQIs of those domains into the guest's APCB
>>>> +	 */
>>>> +	if (bitmap_empty(matrix_mdev->shadow_apcb.aqm, AP_DOMAINS))
>>>> +		bitmap_copy(matrix_mdev->shadow_apcb.aqm,
>>>> +			    matrix_mdev->matrix.aqm, AP_DOMAINS);
>>>> +
>>>> +	return true;
>>>> +}
>>> What is the rationale behind the shadow aqm empty special handling?
>> The rationale was to avoid taking the VCPUs
>> out of SIE in order to make an update to the guest's APCB
>> unnecessarily. For example, suppose the guest is started
>> without access to any APQNs (i.e., all matrix and shadow_apcb
>> masks are zeros). Now suppose the administrator proceeds to
>> start assigning AP resources to the mdev. Let's say he starts
>> by assigning adapters 1 through 100. The code below will return
>> true indicating the shadow_apcb was updated. Consequently,
>> the calling code will commit the changes to the guest's
>> APCB. The problem there is that in order to update the guest's
>> VCPUs, they will have to be taken out of SIE, yet the guest will
>> not get access to the adapter since no domains have yet been
>> assigned to the APCB. Doing this 100 times - once for each
>> adapter 1-100 - is probably a bad idea.
>>
> Not yanking the VCPUs out of SIE does make a lot of sense. At least
> I understand your motivation now. I will think some more about this,
> but in the meanwhile, please try to answer one more question (see
> below).
>   
>>>    I.e.
>>> why not simply:
>>>
>>>
>>> static bool vfio_ap_assign_apid_to_apcb(struct ap_matrix_mdev *matrix_mdev,
>>>                                           unsigned long apid)
>>> {
>>>           unsigned long apqi, apqn;
>>>           unsigned long *aqm = matrix_mdev->shadow_apcb.aqm;
>>>                                                                                   
>>>           /*
>>>            * If the APID is already assigned to the guest's shadow APCB, there is
>>>            * no need to assign it.
>>>            */
>>>           if (test_bit_inv(apid, matrix_mdev->shadow_apcb.apm))
>>>                   return false;
>>>                                                                                   
>>>           /* Make sure all APQNs are bound to the vfio_ap driver */
>>>           for_each_set_bit_inv(apqi, aqm, AP_DOMAINS) {
>>>                   apqn = AP_MKQID(apid, apqi);
>>>                                                                                   
>>>                   if (vfio_ap_mdev_get_queue(matrix_mdev, apqn) == NULL)
>>>                           return false;
>>>           }
>>>                                                                                   
>>>           set_bit_inv(apid, matrix_mdev->shadow_apcb.apm);
>>>                                                                                   
>>>           return true;
> Would
> s/return true/return !bitmap_empty(matrix_mdev->shadow_apcb.aqm,
> AP_DOMAINS)/
> do the trick?
>
> I mean if empty, then we would not commit the APCB, so we would
> not take the vCPUs out of SIE -- see below.

At first glance I'd say yes, it does the trick; but, I need to consider
all possible scenarios. For example, that will work fine when someone
either assigns all of the adapters or all of the domains first, then assigns
the other.

>
>>>> +
>>>> +static void vfio_ap_mdev_hot_plug_adapter(struct ap_matrix_mdev *matrix_mdev,
>>>> +					  unsigned long apid)
>>>> +{
>>>> +	if (vfio_ap_assign_apid_to_apcb(matrix_mdev, apid))
>>>> +		vfio_ap_mdev_commit_shadow_apcb(matrix_mdev);
>>>> +}
>>>> +
> [..]
>
> Regards,
> Halil
Halil Pasic Dec. 1, 2020, 10:19 a.m. UTC | #5
On Mon, 30 Nov 2020 19:18:30 -0500
Tony Krowiak <akrowiak@linux.ibm.com> wrote:

> >>>> +static bool vfio_ap_assign_apid_to_apcb(struct ap_matrix_mdev *matrix_mdev,
> >>>> +					unsigned long apid)
> >>>> +{
> >>>> +	unsigned long apqi, apqn;
> >>>> +	unsigned long *aqm = matrix_mdev->shadow_apcb.aqm;
> >>>> +
> >>>> +	/*
> >>>> +	 * If the APID is already assigned to the guest's shadow APCB, there is
> >>>> +	 * no need to assign it.
> >>>> +	 */
> >>>> +	if (test_bit_inv(apid, matrix_mdev->shadow_apcb.apm))
> >>>> +		return false;
> >>>> +
> >>>> +	/*
> >>>> +	 * If no domains have yet been assigned to the shadow APCB and one or
> >>>> +	 * more domains have been assigned to the matrix mdev, then use
> >>>> +	 * the domains assigned to the matrix mdev; otherwise, there is nothing
> >>>> +	 * to assign to the shadow APCB.
> >>>> +	 */
> >>>> +	if (bitmap_empty(matrix_mdev->shadow_apcb.aqm, AP_DOMAINS)) {
> >>>> +		if (bitmap_empty(matrix_mdev->matrix.aqm, AP_DOMAINS))
> >>>> +			return false;
> >>>> +
> >>>> +		aqm = matrix_mdev->matrix.aqm;
> >>>> +	}
> >>>> +
> >>>> +	/* Make sure all APQNs are bound to the vfio_ap driver */
> >>>> +	for_each_set_bit_inv(apqi, aqm, AP_DOMAINS) {
> >>>> +		apqn = AP_MKQID(apid, apqi);
> >>>> +
> >>>> +		if (vfio_ap_mdev_get_queue(matrix_mdev, apqn) == NULL)
> >>>> +			return false;
> >>>> +	}
> >>>> +
> >>>> +	set_bit_inv(apid, matrix_mdev->shadow_apcb.apm);
> >>>> +
> >>>> +	/*
> >>>> +	 * If we verified APQNs using the domains assigned to the matrix mdev,
> >>>> +	 * then copy the APQIs of those domains into the guest's APCB
> >>>> +	 */
> >>>> +	if (bitmap_empty(matrix_mdev->shadow_apcb.aqm, AP_DOMAINS))
> >>>> +		bitmap_copy(matrix_mdev->shadow_apcb.aqm,
> >>>> +			    matrix_mdev->matrix.aqm, AP_DOMAINS);
> >>>> +
> >>>> +	return true;
> >>>> +}  
> >>> What is the rationale behind the shadow aqm empty special handling?  
> >> The rationale was to avoid taking the VCPUs
> >> out of SIE in order to make an update to the guest's APCB
> >> unnecessarily. For example, suppose the guest is started
> >> without access to any APQNs (i.e., all matrix and shadow_apcb
> >> masks are zeros). Now suppose the administrator proceeds to
> >> start assigning AP resources to the mdev. Let's say he starts
> >> by assigning adapters 1 through 100. The code below will return
> >> true indicating the shadow_apcb was updated. Consequently,
> >> the calling code will commit the changes to the guest's
> >> APCB. The problem there is that in order to update the guest's
> >> VCPUs, they will have to be taken out of SIE, yet the guest will
> >> not get access to the adapter since no domains have yet been
> >> assigned to the APCB. Doing this 100 times - once for each
> >> adapter 1-100 - is probably a bad idea.
> >>  
> > Not yanking the VCPUs out of SIE does make a lot of sense. At least
> > I understand your motivation now. I will think some more about this,
> > but in the meanwhile, please try to answer one more question (see
> > below).
> >     
> >>>    I.e.
> >>> why not simply:
> >>>
> >>>
> >>> static bool vfio_ap_assign_apid_to_apcb(struct ap_matrix_mdev *matrix_mdev,
> >>>                                           unsigned long apid)
> >>> {
> >>>           unsigned long apqi, apqn;
> >>>           unsigned long *aqm = matrix_mdev->shadow_apcb.aqm;
> >>>                                                                                   
> >>>           /*
> >>>            * If the APID is already assigned to the guest's shadow APCB, there is
> >>>            * no need to assign it.
> >>>            */
> >>>           if (test_bit_inv(apid, matrix_mdev->shadow_apcb.apm))
> >>>                   return false;
> >>>                                                                                   
> >>>           /* Make sure all APQNs are bound to the vfio_ap driver */
> >>>           for_each_set_bit_inv(apqi, aqm, AP_DOMAINS) {
> >>>                   apqn = AP_MKQID(apid, apqi);
> >>>                                                                                   
> >>>                   if (vfio_ap_mdev_get_queue(matrix_mdev, apqn) == NULL)
> >>>                           return false;
> >>>           }
> >>>                                                                                   
> >>>           set_bit_inv(apid, matrix_mdev->shadow_apcb.apm);
> >>>                                                                                   
> >>>           return true;  
> > Would
> > s/return true/return !bitmap_empty(matrix_mdev->shadow_apcb.aqm,
> > AP_DOMAINS)/
> > do the trick?
> >
> > I mean if empty, then we would not commit the APCB, so we would
> > not take the vCPUs out of SIE -- see below.  
> 
> At first glance I'd say yes, it does the trick; but, I need to consider
> all possible scenarios. For example, that will work fine when someone
> either assigns all of the adapters or all of the domains first, then assigns
> the other.

Maybe I can help you. The only caveat I have in mind is the show of the
guest_matrix attribute. We probably don't want to display adapters
without domains and vice-versa. But that can be easily handled with
a flag.

Regards,
Halil
Halil Pasic Dec. 1, 2020, 5:56 p.m. UTC | #6
On Tue, 1 Dec 2020 00:32:27 +0100
Halil Pasic <pasic@linux.ibm.com> wrote:

> > 
> > 
> > On 11/28/20 8:52 PM, Halil Pasic wrote:  
> [..]
> > >> * Unassign adapter from mdev's matrix:
> > >>
> > >>    The domain will be hot unplugged from the KVM guest if it is
> > >>    assigned to the guest's matrix.
> > >>
> > >> * Assign a control domain:
> > >>
> > >>    The control domain will be hot plugged into the KVM guest if it is not
> > >>    assigned to the guest's APCB. The AP architecture ensures a guest will
> > >>    only get access to the control domain if it is in the host's AP
> > >>    configuration, so there is no risk in hot plugging it; however, it will
> > >>    become automatically available to the guest when it is added to the host
> > >>    configuration.
> > >>
> > >> * Unassign a control domain:
> > >>
> > >>    The control domain will be hot unplugged from the KVM guest if it is
> > >>    assigned to the guest's APCB.  
> > > This is where things start getting tricky. E.g. do we need to revise
> > > filtering after an unassign? (For example an assign_adapter X didn't
> > > change the shadow, because queue XY was missing, but now we unplug domain
> > > Y. Should the adapter X pop up? I guess it should.)  
> > 
> > I suppose that makes sense at the expense of making the code
> > more complex. It is essentially what we had in the prior version
> > which used the same filtering code for assignment as well as
> > host AP configuration changes.
> >   
> 
> Will have to think about it some more. Making the user unplug and
> replug an adapter because at some point it got filtered, but there
> is no need to filter it does not feel right. On the other hand, I'm
> afraid I'm complaining in circles. 

I did some thinking. The following statements are about the state of
affairs, when all 17 patches are applied. I'm commenting here, because
I believe this is the patch that introduces the most controversial code.

First about low level problems with the current code/design. The other is
empty handling in vfio_ap_assign_apid_to_apcb() (and
vfio_ap_assign_apqi_to_apcb()) is troublesome. The final product
allows for over-commitment, i.e. assignment of e.g. domains that
are not in the crypto host config. Let's assume the host LPAR
has usage domains 1 and 2, and adapters 1, 2, and 3. The apmask
and aqmask are both 0 (all in on vfio), all bound. We start with an empty
mdev that is tied to a running guest:
assign_adapter 1
assign_adapter 2
assign_adapter 3
assign_adapter 4
all of these will work. The resulting shadow_apcb is completely empty. No
commit_apcb.
assign_domain 1
assign_domain 2
assign_domain 3
all of these will work. But again the shadow_apcb is completely empty at
the end: we did get to the loop that is checking the boundness of the
queues, but please note that we are checking against matrix.apm, and
adapter 4 is not in the config of the host.

I've hacked up a fixup patch for these problems that simplifies the
code considerably, but there are design level issues, that run deeper,
so I'm not sure the fixups are the way to go.

Now lets talk about design level stuff. Currently the assignment
operations are designed in to accommodate the FCFS principle. This
is a blessing and a curse at the same time. 

Consider the following scenarios. We have an empty (nothing assigned
mdev) and the following queues are bound to the vfio_ap driver:
0.0
0.1
1.0
If the we do 
asssign_adapter 0
assign_domain 0
assign_domain 1
assign_adapter 1
We end up with the guest_matrix
0.0
0.1
and the matrix
0.0
0.1
1.0
1.0

That is a different result compared to
asssign_adapter 0
assign_domain 0
assign_adapter 1
assign_domain 1
or the situation where we have 0.0, 0.1, 1.0 and 1.1 bound to vfio_ap
and then 1.1 gets unbound.

For the same system state (bound, config, ap_perm, matrix) you get a
different outcomes (guest_matrix), because the outcomes depend on
history.

Another thing is recovery. I believe the main idea behind shadow_apcb
is that we should auto recover once the resources are available again.
The current design choices make recovery more difficult to think about
because we may end up having either the apid or the apqi filtered on
a 'hole' (an queue missing for reasons different than, belonging to
default, or not being in the host config).

I still think for these cases filtering out the apid is the lesser
evil. Yes a hotplug of a domain making hot unplugging an adapter is
ugly, but at least I can describe that. So I propose the following.
Let me hack up a fixup that morphs things in this direction. Maybe
I will run into unexpected problems, but if I don't then we will
have an alternative design you can run your testcases against. How about
that?

Regards,
Halil
Anthony Krowiak Dec. 1, 2020, 10:12 p.m. UTC | #7
On 12/1/20 12:56 PM, Halil Pasic wrote:
> On Tue, 1 Dec 2020 00:32:27 +0100
> Halil Pasic <pasic@linux.ibm.com> wrote:
>
>>>
>>> On 11/28/20 8:52 PM, Halil Pasic wrote:
>> [..]
>>>>> * Unassign adapter from mdev's matrix:
>>>>>
>>>>>     The domain will be hot unplugged from the KVM guest if it is
>>>>>     assigned to the guest's matrix.
>>>>>
>>>>> * Assign a control domain:
>>>>>
>>>>>     The control domain will be hot plugged into the KVM guest if it is not
>>>>>     assigned to the guest's APCB. The AP architecture ensures a guest will
>>>>>     only get access to the control domain if it is in the host's AP
>>>>>     configuration, so there is no risk in hot plugging it; however, it will
>>>>>     become automatically available to the guest when it is added to the host
>>>>>     configuration.
>>>>>
>>>>> * Unassign a control domain:
>>>>>
>>>>>     The control domain will be hot unplugged from the KVM guest if it is
>>>>>     assigned to the guest's APCB.
>>>> This is where things start getting tricky. E.g. do we need to revise
>>>> filtering after an unassign? (For example an assign_adapter X didn't
>>>> change the shadow, because queue XY was missing, but now we unplug domain
>>>> Y. Should the adapter X pop up? I guess it should.)
>>> I suppose that makes sense at the expense of making the code
>>> more complex. It is essentially what we had in the prior version
>>> which used the same filtering code for assignment as well as
>>> host AP configuration changes.
>>>    
>> Will have to think about it some more. Making the user unplug and
>> replug an adapter because at some point it got filtered, but there
>> is no need to filter it does not feel right. On the other hand, I'm
>> afraid I'm complaining in circles.
> I did some thinking. The following statements are about the state of
> affairs, when all 17 patches are applied. I'm commenting here, because
> I believe this is the patch that introduces the most controversial code.
>
> First about low level problems with the current code/design. The other is
> empty handling in vfio_ap_assign_apid_to_apcb() (and
> vfio_ap_assign_apqi_to_apcb()) is troublesome. The final product
> allows for over-commitment, i.e. assignment of e.g. domains that
> are not in the crypto host config. Let's assume the host LPAR
> has usage domains 1 and 2, and adapters 1, 2, and 3. The apmask
> and aqmask are both 0 (all in on vfio), all bound. We start with an empty
> mdev that is tied to a running guest:
> assign_adapter 1
> assign_adapter 2
> assign_adapter 3
> assign_adapter 4
> all of these will work. The resulting shadow_apcb is completely empty. No
> commit_apcb.
> assign_domain 1
> assign_domain 2
> assign_domain 3
> all of these will work. But again the shadow_apcb is completely empty at
> the end: we did get to the loop that is checking the boundness of the
> queues, but please note that we are checking against matrix.apm, and
> adapter 4 is not in the config of the host.
>
> I've hacked up a fixup patch for these problems that simplifies the
> code considerably, but there are design level issues, that run deeper,
> so I'm not sure the fixups are the way to go.
>
> Now lets talk about design level stuff. Currently the assignment
> operations are designed in to accommodate the FCFS principle. This
> is a blessing and a curse at the same time.
>
> Consider the following scenarios. We have an empty (nothing assigned
> mdev) and the following queues are bound to the vfio_ap driver:
> 0.0
> 0.1
> 1.0
> If the we do
> asssign_adapter 0
> assign_domain 0
> assign_domain 1
> assign_adapter 1
> We end up with the guest_matrix
> 0.0
> 0.1
> and the matrix
> 0.0
> 0.1
> 1.0
> 1.0
>
> That is a different result compared to
> asssign_adapter 0
> assign_domain 0
> assign_adapter 1
> assign_domain 1
> or the situation where we have 0.0, 0.1, 1.0 and 1.1 bound to vfio_ap
> and then 1.1 gets unbound.

In v11 of the patch series, the filtering code always filters
the matrix assigned to the mdev and is invoked whenever
an adapter or domain is assigned, a queue is probed and
when the AP bus scan complete notification is received and
adapters and/or domains have been added to the host AP
configuration. So I made a slight modification to that
filtering function to filter only by APID and ran the above
scenarios. In each case, the resulting guest matrix was
identicle. I also tested the bind/unbind and achieved the
same results.

>
> For the same system state (bound, config, ap_perm, matrix) you get a
> different outcomes (guest_matrix), because the outcomes depend on
> history.
>
> Another thing is recovery. I believe the main idea behind shadow_apcb
> is that we should auto recover once the resources are available again.
> The current design choices make recovery more difficult to think about
> because we may end up having either the apid or the apqi filtered on
> a 'hole' (an queue missing for reasons different than, belonging to
> default, or not being in the host config).

The filtering code from the v11 series with the tweak I
mentioned above accomplishes this. I tested this by
doing manual binds/unbinds of a queue using the
scenarios you layed out.

>
> I still think for these cases filtering out the apid is the lesser
> evil. Yes a hotplug of a domain making hot unplugging an adapter is
> ugly, but at least I can describe that. So I propose the following.
> Let me hack up a fixup that morphs things in this direction. Maybe
> I will run into unexpected problems, but if I don't then we will
> have an alternative design you can run your testcases against. How about
> that?

I appreciate the offer, but I believe with the change to the v11
filtering code I described above we have a solution. One of
your objections to the filtering code was looping over all
assigned adapters/domains each time an adapter or
domain is assigned. It should also be easy to examine only
the APQNs involving the new APID or APQI being assigned.
Again, I appreciate your offer, but I don't think it is necessary
to take you away from your priorities to involve yourself in
mine.




>
> Regards,
> Halil
Halil Pasic Dec. 1, 2020, 11:14 p.m. UTC | #8
On Tue, 1 Dec 2020 17:12:56 -0500
Tony Krowiak <akrowiak@linux.ibm.com> wrote:

> On 12/1/20 12:56 PM, Halil Pasic wrote:
> > On Tue, 1 Dec 2020 00:32:27 +0100
> > Halil Pasic <pasic@linux.ibm.com> wrote:
> >  
> >>>
> >>> On 11/28/20 8:52 PM, Halil Pasic wrote:  
> >> [..]  
> >>>>> * Unassign adapter from mdev's matrix:
> >>>>>
> >>>>>     The domain will be hot unplugged from the KVM guest if it is
> >>>>>     assigned to the guest's matrix.
> >>>>>
> >>>>> * Assign a control domain:
> >>>>>
> >>>>>     The control domain will be hot plugged into the KVM guest if it is not
> >>>>>     assigned to the guest's APCB. The AP architecture ensures a guest will
> >>>>>     only get access to the control domain if it is in the host's AP
> >>>>>     configuration, so there is no risk in hot plugging it; however, it will
> >>>>>     become automatically available to the guest when it is added to the host
> >>>>>     configuration.
> >>>>>
> >>>>> * Unassign a control domain:
> >>>>>
> >>>>>     The control domain will be hot unplugged from the KVM guest if it is
> >>>>>     assigned to the guest's APCB.  
> >>>> This is where things start getting tricky. E.g. do we need to revise
> >>>> filtering after an unassign? (For example an assign_adapter X didn't
> >>>> change the shadow, because queue XY was missing, but now we unplug domain
> >>>> Y. Should the adapter X pop up? I guess it should.)  
> >>> I suppose that makes sense at the expense of making the code
> >>> more complex. It is essentially what we had in the prior version
> >>> which used the same filtering code for assignment as well as
> >>> host AP configuration changes.
> >>>      
> >> Will have to think about it some more. Making the user unplug and
> >> replug an adapter because at some point it got filtered, but there
> >> is no need to filter it does not feel right. On the other hand, I'm
> >> afraid I'm complaining in circles.  
> > I did some thinking. The following statements are about the state of
> > affairs, when all 17 patches are applied. I'm commenting here, because
> > I believe this is the patch that introduces the most controversial code.
> >
> > First about low level problems with the current code/design. The other is
> > empty handling in vfio_ap_assign_apid_to_apcb() (and
> > vfio_ap_assign_apqi_to_apcb()) is troublesome. The final product
> > allows for over-commitment, i.e. assignment of e.g. domains that
> > are not in the crypto host config. Let's assume the host LPAR
> > has usage domains 1 and 2, and adapters 1, 2, and 3. The apmask
> > and aqmask are both 0 (all in on vfio), all bound. We start with an empty
> > mdev that is tied to a running guest:
> > assign_adapter 1
> > assign_adapter 2
> > assign_adapter 3
> > assign_adapter 4
> > all of these will work. The resulting shadow_apcb is completely empty. No
> > commit_apcb.
> > assign_domain 1
> > assign_domain 2
> > assign_domain 3
> > all of these will work. But again the shadow_apcb is completely empty at
> > the end: we did get to the loop that is checking the boundness of the
> > queues, but please note that we are checking against matrix.apm, and
> > adapter 4 is not in the config of the host.
> >
> > I've hacked up a fixup patch for these problems that simplifies the
> > code considerably, but there are design level issues, that run deeper,
> > so I'm not sure the fixups are the way to go.
> >
> > Now lets talk about design level stuff. Currently the assignment
> > operations are designed in to accommodate the FCFS principle. This
> > is a blessing and a curse at the same time.
> >
> > Consider the following scenarios. We have an empty (nothing assigned
> > mdev) and the following queues are bound to the vfio_ap driver:
> > 0.0
> > 0.1
> > 1.0
> > If the we do
> > asssign_adapter 0
> > assign_domain 0
> > assign_domain 1
> > assign_adapter 1
> > We end up with the guest_matrix
> > 0.0
> > 0.1
> > and the matrix
> > 0.0
> > 0.1
> > 1.0
> > 1.0
> >
> > That is a different result compared to
> > asssign_adapter 0
> > assign_domain 0
> > assign_adapter 1
> > assign_domain 1
> > or the situation where we have 0.0, 0.1, 1.0 and 1.1 bound to vfio_ap
> > and then 1.1 gets unbound.  
> 
> In v11 of the patch series, the filtering code always filters
> the matrix assigned to the mdev and is invoked whenever
> an adapter or domain is assigned, a queue is probed and
> when the AP bus scan complete notification is received and
> adapters and/or domains have been added to the host AP
> configuration. So I made a slight modification to that
> filtering function to filter only by APID and ran the above
> scenarios. In each case, the resulting guest matrix was
> identicle. I also tested the bind/unbind and achieved the
> same results.
> 
> >
> > For the same system state (bound, config, ap_perm, matrix) you get a
> > different outcomes (guest_matrix), because the outcomes depend on
> > history.
> >
> > Another thing is recovery. I believe the main idea behind shadow_apcb
> > is that we should auto recover once the resources are available again.
> > The current design choices make recovery more difficult to think about
> > because we may end up having either the apid or the apqi filtered on
> > a 'hole' (an queue missing for reasons different than, belonging to
> > default, or not being in the host config).  
> 
> The filtering code from the v11 series with the tweak I
> mentioned above accomplishes this. I tested this by
> doing manual binds/unbinds of a queue using the
> scenarios you layed out.
> 
> >
> > I still think for these cases filtering out the apid is the lesser
> > evil. Yes a hotplug of a domain making hot unplugging an adapter is
> > ugly, but at least I can describe that. So I propose the following.
> > Let me hack up a fixup that morphs things in this direction. Maybe
> > I will run into unexpected problems, but if I don't then we will
> > have an alternative design you can run your testcases against. How about
> > that?  
> 
> I appreciate the offer, but I believe with the change to the v11
> filtering code I described above we have a solution. One of
> your objections to the filtering code was looping over all
> assigned adapters/domains each time an adapter or
> domain is assigned. It should also be easy to examine only
> the APQNs involving the new APID or APQI being assigned.
> Again, I appreciate your offer, but I don't think it is necessary
> to take you away from your priorities to involve yourself in
> mine.

Seems you have it sorted out. Unfortunately I can't really follow without
code, but I have to trust you. Can you please spin a v13 with these
improvements implemented?

Maybe I didn't comment on every patch, but I did go through all of them.
I believe we have enough material for another iteration, and further
review makes no sense at this point. I intend to come back to this
once v13 is out.

Thanks,
Halil
diff mbox series

Patch

diff --git a/drivers/s390/crypto/vfio_ap_ops.c b/drivers/s390/crypto/vfio_ap_ops.c
index 586ec5776693..4f96b7861607 100644
--- a/drivers/s390/crypto/vfio_ap_ops.c
+++ b/drivers/s390/crypto/vfio_ap_ops.c
@@ -631,6 +631,60 @@  static void vfio_ap_mdev_manage_qlinks(struct ap_matrix_mdev *matrix_mdev,
 	}
 }
 
+static bool vfio_ap_assign_apid_to_apcb(struct ap_matrix_mdev *matrix_mdev,
+					unsigned long apid)
+{
+	unsigned long apqi, apqn;
+	unsigned long *aqm = matrix_mdev->shadow_apcb.aqm;
+
+	/*
+	 * If the APID is already assigned to the guest's shadow APCB, there is
+	 * no need to assign it.
+	 */
+	if (test_bit_inv(apid, matrix_mdev->shadow_apcb.apm))
+		return false;
+
+	/*
+	 * If no domains have yet been assigned to the shadow APCB and one or
+	 * more domains have been assigned to the matrix mdev, then use
+	 * the domains assigned to the matrix mdev; otherwise, there is nothing
+	 * to assign to the shadow APCB.
+	 */
+	if (bitmap_empty(matrix_mdev->shadow_apcb.aqm, AP_DOMAINS)) {
+		if (bitmap_empty(matrix_mdev->matrix.aqm, AP_DOMAINS))
+			return false;
+
+		aqm = matrix_mdev->matrix.aqm;
+	}
+
+	/* Make sure all APQNs are bound to the vfio_ap driver */
+	for_each_set_bit_inv(apqi, aqm, AP_DOMAINS) {
+		apqn = AP_MKQID(apid, apqi);
+
+		if (vfio_ap_mdev_get_queue(matrix_mdev, apqn) == NULL)
+			return false;
+	}
+
+	set_bit_inv(apid, matrix_mdev->shadow_apcb.apm);
+
+	/*
+	 * If we verified APQNs using the domains assigned to the matrix mdev,
+	 * then copy the APQIs of those domains into the guest's APCB
+	 */
+	if (bitmap_empty(matrix_mdev->shadow_apcb.aqm, AP_DOMAINS))
+		bitmap_copy(matrix_mdev->shadow_apcb.aqm,
+			    matrix_mdev->matrix.aqm, AP_DOMAINS);
+
+	return true;
+}
+
+static void vfio_ap_mdev_hot_plug_adapter(struct ap_matrix_mdev *matrix_mdev,
+					  unsigned long apid)
+{
+	if (vfio_ap_assign_apid_to_apcb(matrix_mdev, apid))
+		vfio_ap_mdev_commit_shadow_apcb(matrix_mdev);
+}
+
 /**
  * assign_adapter_store
  *
@@ -673,10 +727,6 @@  static ssize_t assign_adapter_store(struct device *dev,
 	struct mdev_device *mdev = mdev_from_dev(dev);
 	struct ap_matrix_mdev *matrix_mdev = mdev_get_drvdata(mdev);
 
-	/* If the guest is running, disallow assignment of adapter */
-	if (matrix_mdev->kvm)
-		return -EBUSY;
-
 	ret = kstrtoul(buf, 0, &apid);
 	if (ret)
 		return ret;
@@ -698,12 +748,22 @@  static ssize_t assign_adapter_store(struct device *dev,
 	}
 	set_bit_inv(apid, matrix_mdev->matrix.apm);
 	vfio_ap_mdev_manage_qlinks(matrix_mdev, LINK_APID, apid);
+	vfio_ap_mdev_hot_plug_adapter(matrix_mdev, apid);
 	mutex_unlock(&matrix_dev->lock);
 
 	return count;
 }
 static DEVICE_ATTR_WO(assign_adapter);
 
+static void vfio_ap_mdev_hot_unplug_adapter(struct ap_matrix_mdev *matrix_mdev,
+					    unsigned long apid)
+{
+	if (test_bit_inv(apid, matrix_mdev->shadow_apcb.apm)) {
+		clear_bit_inv(apid, matrix_mdev->shadow_apcb.apm);
+		vfio_ap_mdev_commit_shadow_apcb(matrix_mdev);
+	}
+}
+
 /**
  * unassign_adapter_store
  *
@@ -730,10 +790,6 @@  static ssize_t unassign_adapter_store(struct device *dev,
 	struct mdev_device *mdev = mdev_from_dev(dev);
 	struct ap_matrix_mdev *matrix_mdev = mdev_get_drvdata(mdev);
 
-	/* If the guest is running, disallow un-assignment of adapter */
-	if (matrix_mdev->kvm)
-		return -EBUSY;
-
 	ret = kstrtoul(buf, 0, &apid);
 	if (ret)
 		return ret;
@@ -744,12 +800,67 @@  static ssize_t unassign_adapter_store(struct device *dev,
 	mutex_lock(&matrix_dev->lock);
 	clear_bit_inv((unsigned long)apid, matrix_mdev->matrix.apm);
 	vfio_ap_mdev_manage_qlinks(matrix_mdev, UNLINK_APID, apid);
+	vfio_ap_mdev_hot_unplug_adapter(matrix_mdev, apid);
 	mutex_unlock(&matrix_dev->lock);
 
 	return count;
 }
 static DEVICE_ATTR_WO(unassign_adapter);
 
+static bool vfio_ap_assign_apqi_to_apcb(struct ap_matrix_mdev *matrix_mdev,
+					unsigned long apqi)
+{
+	unsigned long apid, apqn;
+	unsigned long *apm = matrix_mdev->shadow_apcb.apm;
+
+	/*
+	 * If the APQI is already assigned to the guest's shadow APCB, there is
+	 * no need to assign it.
+	 */
+	if (test_bit_inv(apqi, matrix_mdev->shadow_apcb.aqm))
+		return false;
+
+	/*
+	 * If no adapters have yet been assigned to the shadow APCB and one or
+	 * more adapters have been assigned to the matrix mdev, then use
+	 * the adapters assigned to the matrix mdev; otherwise, there is nothing
+	 * to assign to the shadow APCB.
+	 */
+	if (bitmap_empty(matrix_mdev->shadow_apcb.apm, AP_DEVICES)) {
+		if (bitmap_empty(matrix_mdev->matrix.apm, AP_DEVICES))
+			return false;
+
+		apm = matrix_mdev->matrix.apm;
+	}
+
+	/* Make sure all APQNs are bound to the vfio_ap driver */
+	for_each_set_bit_inv(apid, apm, AP_DEVICES) {
+		apqn = AP_MKQID(apid, apqi);
+
+		if (vfio_ap_mdev_get_queue(matrix_mdev, apqn) == NULL)
+			return false;
+	}
+
+	set_bit_inv(apqi, matrix_mdev->shadow_apcb.aqm);
+
+	/*
+	 * If we verified APQNs using the adapters assigned to the matrix mdev,
+	 * then copy the APIDs of those adapters into the guest's APCB
+	 */
+	if (bitmap_empty(matrix_mdev->shadow_apcb.apm, AP_DEVICES))
+		bitmap_copy(matrix_mdev->shadow_apcb.apm,
+			    matrix_mdev->matrix.apm, AP_DEVICES);
+
+	return true;
+}
+
+static void vfio_ap_mdev_hot_plug_domain(struct ap_matrix_mdev *matrix_mdev,
+					 unsigned long apqi)
+{
+	if (vfio_ap_assign_apqi_to_apcb(matrix_mdev, apqi))
+		vfio_ap_mdev_commit_shadow_apcb(matrix_mdev);
+}
+
 /**
  * assign_domain_store
  *
@@ -793,10 +904,6 @@  static ssize_t assign_domain_store(struct device *dev,
 	struct ap_matrix_mdev *matrix_mdev = mdev_get_drvdata(mdev);
 	unsigned long max_apqi = matrix_mdev->matrix.aqm_max;
 
-	/* If the guest is running, disallow assignment of domain */
-	if (matrix_mdev->kvm)
-		return -EBUSY;
-
 	ret = kstrtoul(buf, 0, &apqi);
 	if (ret)
 		return ret;
@@ -817,12 +924,21 @@  static ssize_t assign_domain_store(struct device *dev,
 	}
 	set_bit_inv(apqi, matrix_mdev->matrix.aqm);
 	vfio_ap_mdev_manage_qlinks(matrix_mdev, LINK_APQI, apqi);
+	vfio_ap_mdev_hot_plug_domain(matrix_mdev, apqi);
 	mutex_unlock(&matrix_dev->lock);
 
 	return count;
 }
 static DEVICE_ATTR_WO(assign_domain);
 
+static void vfio_ap_mdev_hot_unplug_domain(struct ap_matrix_mdev *matrix_mdev,
+					   unsigned long apqi)
+{
+	if (test_bit_inv(apqi, matrix_mdev->shadow_apcb.aqm)) {
+		clear_bit_inv(apqi, matrix_mdev->shadow_apcb.aqm);
+		vfio_ap_mdev_commit_shadow_apcb(matrix_mdev);
+	}
+}
 
 /**
  * unassign_domain_store
@@ -850,10 +966,6 @@  static ssize_t unassign_domain_store(struct device *dev,
 	struct mdev_device *mdev = mdev_from_dev(dev);
 	struct ap_matrix_mdev *matrix_mdev = mdev_get_drvdata(mdev);
 
-	/* If the guest is running, disallow un-assignment of domain */
-	if (matrix_mdev->kvm)
-		return -EBUSY;
-
 	ret = kstrtoul(buf, 0, &apqi);
 	if (ret)
 		return ret;
@@ -864,12 +976,22 @@  static ssize_t unassign_domain_store(struct device *dev,
 	mutex_lock(&matrix_dev->lock);
 	clear_bit_inv((unsigned long)apqi, matrix_mdev->matrix.aqm);
 	vfio_ap_mdev_manage_qlinks(matrix_mdev, UNLINK_APQI, apqi);
+	vfio_ap_mdev_hot_unplug_domain(matrix_mdev, apqi);
 	mutex_unlock(&matrix_dev->lock);
 
 	return count;
 }
 static DEVICE_ATTR_WO(unassign_domain);
 
+static void vfio_ap_mdev_hot_plug_ctl_domain(struct ap_matrix_mdev *matrix_mdev,
+					     unsigned long domid)
+{
+	if (!test_bit_inv(domid, matrix_mdev->shadow_apcb.adm)) {
+		set_bit_inv(domid, matrix_mdev->shadow_apcb.adm);
+		vfio_ap_mdev_commit_shadow_apcb(matrix_mdev);
+	}
+}
+
 /**
  * assign_control_domain_store
  *
@@ -895,10 +1017,6 @@  static ssize_t assign_control_domain_store(struct device *dev,
 	struct mdev_device *mdev = mdev_from_dev(dev);
 	struct ap_matrix_mdev *matrix_mdev = mdev_get_drvdata(mdev);
 
-	/* If the guest is running, disallow assignment of control domain */
-	if (matrix_mdev->kvm)
-		return -EBUSY;
-
 	ret = kstrtoul(buf, 0, &id);
 	if (ret)
 		return ret;
@@ -914,12 +1032,23 @@  static ssize_t assign_control_domain_store(struct device *dev,
 	if (!mutex_trylock(&matrix_dev->lock))
 		return -EBUSY;
 	set_bit_inv(id, matrix_mdev->matrix.adm);
+	vfio_ap_mdev_hot_plug_ctl_domain(matrix_mdev, id);
 	mutex_unlock(&matrix_dev->lock);
 
 	return count;
 }
 static DEVICE_ATTR_WO(assign_control_domain);
 
+static void
+vfio_ap_mdev_hot_unplug_ctl_domain(struct ap_matrix_mdev *matrix_mdev,
+				   unsigned long domid)
+{
+	if (test_bit_inv(domid, matrix_mdev->shadow_apcb.adm)) {
+		clear_bit_inv(domid, matrix_mdev->shadow_apcb.adm);
+		vfio_ap_mdev_commit_shadow_apcb(matrix_mdev);
+	}
+}
+
 /**
  * unassign_control_domain_store
  *
@@ -946,10 +1075,6 @@  static ssize_t unassign_control_domain_store(struct device *dev,
 	struct ap_matrix_mdev *matrix_mdev = mdev_get_drvdata(mdev);
 	unsigned long max_domid =  matrix_mdev->matrix.adm_max;
 
-	/* If the guest is running, disallow un-assignment of control domain */
-	if (matrix_mdev->kvm)
-		return -EBUSY;
-
 	ret = kstrtoul(buf, 0, &domid);
 	if (ret)
 		return ret;
@@ -958,6 +1083,7 @@  static ssize_t unassign_control_domain_store(struct device *dev,
 
 	mutex_lock(&matrix_dev->lock);
 	clear_bit_inv(domid, matrix_mdev->matrix.adm);
+	vfio_ap_mdev_hot_unplug_ctl_domain(matrix_mdev, domid);
 	mutex_unlock(&matrix_dev->lock);
 
 	return count;
@@ -1099,8 +1225,6 @@  static int vfio_ap_mdev_set_kvm(struct ap_matrix_mdev *matrix_mdev,
 {
 	struct ap_matrix_mdev *m;
 
-	mutex_lock(&matrix_dev->lock);
-
 	list_for_each_entry(m, &matrix_dev->mdev_list, node) {
 		if ((m != matrix_mdev) && (m->kvm == kvm)) {
 			mutex_unlock(&matrix_dev->lock);
@@ -1111,7 +1235,6 @@  static int vfio_ap_mdev_set_kvm(struct ap_matrix_mdev *matrix_mdev,
 	matrix_mdev->kvm = kvm;
 	kvm_get_kvm(kvm);
 	kvm->arch.crypto.pqap_hook = &matrix_mdev->pqap_hook;
-	mutex_unlock(&matrix_dev->lock);
 
 	return 0;
 }
@@ -1148,7 +1271,7 @@  static int vfio_ap_mdev_iommu_notifier(struct notifier_block *nb,
 static int vfio_ap_mdev_group_notifier(struct notifier_block *nb,
 				       unsigned long action, void *data)
 {
-	int ret;
+	int ret = NOTIFY_DONE;
 	struct ap_matrix_mdev *matrix_mdev;
 
 	if (action != VFIO_GROUP_NOTIFY_SET_KVM)
@@ -1156,23 +1279,28 @@  static int vfio_ap_mdev_group_notifier(struct notifier_block *nb,
 
 	matrix_mdev = container_of(nb, struct ap_matrix_mdev, group_notifier);
 
+	mutex_lock(&matrix_dev->lock);
+
 	if (!data) {
 		if (matrix_mdev->kvm)
 			kvm_put_kvm(matrix_mdev->kvm);
 
 		matrix_mdev->kvm = NULL;
 
-		return NOTIFY_OK;
+		ret = NOTIFY_OK;
+		goto done;
 	}
 
 	ret = vfio_ap_mdev_set_kvm(matrix_mdev, data);
 	if (ret)
-		return NOTIFY_DONE;
+		goto done;
 
 	vfio_ap_mdev_init_apcb(matrix_mdev);
 	vfio_ap_mdev_commit_shadow_apcb(matrix_mdev);
 
-	return NOTIFY_OK;
+done:
+	mutex_unlock(&matrix_dev->lock);
+	return ret;
 }
 
 static int vfio_ap_mdev_reset_queue(unsigned int apid, unsigned int apqi,