[v10,13/26] s390: vfio-ap: zeroize the AP queues

Message ID	1536781396-13601-14-git-send-email-akrowiak@linux.vnet.ibm.com (mailing list archive)
State	New, archived
Headers
Return-Path: <kvm-owner@kernel.org>
From: Tony Krowiak <akrowiak@linux.vnet.ibm.com>
To: linux-s390@vger.kernel.org, linux-kernel@vger.kernel.org, kvm@vger.kernel.org
Cc: freude@de.ibm.com, schwidefsky@de.ibm.com, heiko.carstens@de.ibm.com, borntraeger@de.ibm.com, cohuck@redhat.com, kwankhede@nvidia.com, bjsdjshi@linux.vnet.ibm.com, pbonzini@redhat.com, alex.williamson@redhat.com, pmorel@linux.vnet.ibm.com, alifm@linux.vnet.ibm.com, mjrosato@linux.vnet.ibm.com, jjherne@linux.vnet.ibm.com, thuth@redhat.com, pasic@linux.vnet.ibm.com, berrange@redhat.com, fiuczy@linux.vnet.ibm.com, buendgen@de.ibm.com, akrowiak@linux.vnet.ibm.com, frankja@linux.ibm.com, Tony Krowiak <akrowiak@linux.ibm.com>
Subject: [PATCH v10 13/26] s390: vfio-ap: zeroize the AP queues
Date: Wed, 12 Sep 2018 15:43:03 -0400
In-Reply-To: <1536781396-13601-1-git-send-email-akrowiak@linux.vnet.ibm.com>
References: <1536781396-13601-1-git-send-email-akrowiak@linux.vnet.ibm.com>
Message-Id: <1536781396-13601-14-git-send-email-akrowiak@linux.vnet.ibm.com>
Sender: kvm-owner@vger.kernel.org
Precedence: bulk
Series	guest dedicated crypto adapters
[v10,00/26] guest dedicated crypto adapters
[v10,01/26] KVM: s390: vsie: simulate VCPU SIE entry/exit
[v10,02/26] KVM: s390: introduce and use KVM_REQ_VSIE_RESTART
[v10,03/26] KVM: s390: refactor crypto initialization
[v10,04/26] s390: vfio-ap: base implementation of VFIO AP device driver
[v10,05/26] s390: vfio-ap: register matrix device with VFIO mdev framework
[v10,06/26] s390: vfio-ap: sysfs interfaces to configure adapters
[v10,07/26] s390: vfio-ap: sysfs interfaces to configure domains
[v10,08/26] s390: vfio-ap: sysfs interfaces to configure control domains
[v10,09/26] s390: vfio-ap: sysfs interface to view matrix mdev matrix
[v10,10/26] KVM: s390: interfaces to clear CRYCB masks
[v10,11/26] s390: vfio-ap: implement mediated device open callback
[v10,12/26] s390: vfio-ap: implement VFIO_DEVICE_GET_INFO ioctl
[v10,13/26] s390: vfio-ap: zeroize the AP queues
[v10,14/26] s390: vfio-ap: implement VFIO_DEVICE_RESET ioctl
[v10,15/26] KVM: s390: Clear Crypto Control Block when using vSIE
[v10,16/26] KVM: s390: vsie: Do the CRYCB validation first
[v10,17/26] KVM: s390: vsie: Make use of CRYCB FORMAT2 clear
[v10,18/26] KVM: s390: vsie: Allow CRYCB FORMAT-2
[v10,19/26] KVM: s390: vsie: allow CRYCB FORMAT-1
[v10,20/26] KVM: s390: vsie: allow CRYCB FORMAT-0
[v10,21/26] KVM: s390: vsie: allow guest FORMAT-0 CRYCB on host FORMAT-1
[v10,22/26] KVM: s390: vsie: allow guest FORMAT-1 CRYCB on host FORMAT-2
[v10,23/26] KVM: s390: vsie: allow guest FORMAT-0 CRYCB on host FORMAT-2
[v10,24/26] KVM: s390: device attrs to enable/disable AP interpretation
[v10,25/26] KVM: s390: CPU model support for AP virtualization
[v10,26/26] s390: doc: detailed specifications for AP virtualization

Tony Krowiak Sept. 12, 2018, 7:43 p.m. UTC

From: Tony Krowiak <akrowiak@linux.ibm.com>

Let's call PAPQ(ZAPQ) to zeroize a queue for each queue configured
for a mediated matrix device when it is released.

Zeroizing a queue resets the queue, clears all pending
messages for the queue entries and disables adapter interruptions
associated with the queue.

Signed-off-by: Tony Krowiak <akrowiak@linux.ibm.com>
Reviewed-by: Halil Pasic <pasic@linux.ibm.com>
Tested-by: Michael Mueller <mimu@linux.ibm.com>
Tested-by: Farhan Ali <alifm@linux.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
---
 drivers/s390/crypto/vfio_ap_ops.c |   44 +++++++++++++++++++++++++++++++++++++
 1 files changed, 44 insertions(+), 0 deletions(-)

Cornelia Huck Sept. 24, 2018, 11:36 a.m. UTC | #1

On Wed, 12 Sep 2018 15:43:03 -0400
Tony Krowiak <akrowiak@linux.vnet.ibm.com> wrote:

> From: Tony Krowiak <akrowiak@linux.ibm.com>
> 
> Let's call PAPQ(ZAPQ) to zeroize a queue for each queue configured
> for a mediated matrix device when it is released.
> 
> Zeroizing a queue resets the queue, clears all pending
> messages for the queue entries and disables adapter interruptions
> associated with the queue.
> 
> Signed-off-by: Tony Krowiak <akrowiak@linux.ibm.com>
> Reviewed-by: Halil Pasic <pasic@linux.ibm.com>
> Tested-by: Michael Mueller <mimu@linux.ibm.com>
> Tested-by: Farhan Ali <alifm@linux.ibm.com>
> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
> ---
>  drivers/s390/crypto/vfio_ap_ops.c |   44 +++++++++++++++++++++++++++++++++++++
>  1 files changed, 44 insertions(+), 0 deletions(-)
> 
> diff --git a/drivers/s390/crypto/vfio_ap_ops.c b/drivers/s390/crypto/vfio_ap_ops.c
> index f8b276a..48b1b78 100644
> --- a/drivers/s390/crypto/vfio_ap_ops.c
> +++ b/drivers/s390/crypto/vfio_ap_ops.c
> @@ -829,6 +829,49 @@ static int vfio_ap_mdev_group_notifier(struct notifier_block *nb,
>  	return NOTIFY_OK;
>  }
>  
> +static int vfio_ap_mdev_reset_queue(unsigned int apid, unsigned int apqi,
> +				    unsigned int retry)
> +{
> +	struct ap_queue_status status;
> +
> +	do {
> +		status = ap_zapq(AP_MKQID(apid, apqi));
> +		switch (status.response_code) {
> +		case AP_RESPONSE_NORMAL:
> +			return 0;
> +		case AP_RESPONSE_RESET_IN_PROGRESS:
> +		case AP_RESPONSE_BUSY:
> +			msleep(20);
> +			break;
> +		default:
> +			/* things are really broken, give up */
> +			return -EIO;
> +		}
> +	} while (retry--);
> +
> +	return -EBUSY;

So, this function may either return 0, -EIO (things are really broken),
or -EBUSY (still busy after multiple tries)...

> +}
> +
> +static int vfio_ap_mdev_reset_queues(struct mdev_device *mdev)
> +{
> +	int ret;
> +	int rc = 0;
> +	unsigned long apid, apqi;
> +	struct ap_matrix_mdev *matrix_mdev = mdev_get_drvdata(mdev);
> +
> +	for_each_set_bit_inv(apid, matrix_mdev->matrix.apm,
> +			     matrix_mdev->matrix.apm_max + 1) {
> +		for_each_set_bit_inv(apqi, matrix_mdev->matrix.aqm,
> +				     matrix_mdev->matrix.aqm_max + 1) {
> +			ret = vfio_ap_mdev_reset_queue(apid, apqi, 1);
> +			if (ret)
> +				rc = ret;

...and here, we return the last error of any of the resets. Two
questions:

- Does it make sense to continue if we get -EIO? IOW, does "really
  broken" only refer to a certain tuple and other tuples still can/need
  to be reset?
- Is the return code useful in any way, as we don't know which tuple it
  refers to?

> +		}
> +	}
> +
> +	return rc;
> +}
> +
>  static int vfio_ap_mdev_open(struct mdev_device *mdev)
>  {
>  	struct ap_matrix_mdev *matrix_mdev = mdev_get_drvdata(mdev);
> @@ -859,6 +902,7 @@ static void vfio_ap_mdev_release(struct mdev_device *mdev)
>  	if (matrix_mdev->kvm)
>  		kvm_arch_crypto_clear_masks(matrix_mdev->kvm);
>  
> +	vfio_ap_mdev_reset_queues(mdev);
>  	vfio_unregister_notifier(mdev_dev(mdev), VFIO_GROUP_NOTIFY,
>  				 &matrix_mdev->group_notifier);
>  	matrix_mdev->kvm = NULL;
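
To make the three return values discussed in this reply concrete, here is a minimal userspace sketch of the retry loop from the patch. It is not kernel code: every sim_* name is a hypothetical stand-in, the scripted responses replace the real ap_zapq() call, and the msleep(20) between attempts is left out. The control flow is copied from vfio_ap_mdev_reset_queue() as quoted above.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-ins for the AP response codes the patch switches on. */
enum sim_response {
	SIM_RESPONSE_NORMAL,
	SIM_RESPONSE_RESET_IN_PROGRESS,
	SIM_RESPONSE_BUSY,
	SIM_RESPONSE_OTHER,	/* anything the patch handles in "default:" */
};

/* Scripted sequence of responses that replaces the real ap_zapq() call. */
struct sim_queue {
	const enum sim_response *script;
	size_t len;		/* number of scripted responses, must be > 0 */
	size_t calls;		/* how many reset attempts were made */
};

static enum sim_response sim_zapq(struct sim_queue *q)
{
	size_t idx;

	assert(q->len > 0);
	/* Once the script runs out, keep repeating its last entry. */
	idx = q->calls < q->len ? q->calls : q->len - 1;
	q->calls++;
	return q->script[idx];
}

/*
 * Same control flow as vfio_ap_mdev_reset_queue() in the patch.
 * do { } while (retry--) makes retry + 1 attempts in total.
 */
static int sim_reset_queue(struct sim_queue *q, unsigned int retry)
{
	do {
		switch (sim_zapq(q)) {
		case SIM_RESPONSE_NORMAL:
			return 0;
		case SIM_RESPONSE_RESET_IN_PROGRESS:
		case SIM_RESPONSE_BUSY:
			break;	/* the patch sleeps 20 ms here, then retries */
		default:
			return -EIO;
		}
	} while (retry--);

	return -EBUSY;
}
```

With retry set to 1, as in the caller, a queue that stays busy is tried twice before -EBUSY is returned, while any response outside the three named cases ends the loop on the first attempt with -EIO.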

Halil Pasic Sept. 24, 2018, 12:16 p.m. UTC | #2

On 09/24/2018 01:36 PM, Cornelia Huck wrote:
> On Wed, 12 Sep 2018 15:43:03 -0400
> Tony Krowiak <akrowiak@linux.vnet.ibm.com> wrote:
> 
>> From: Tony Krowiak <akrowiak@linux.ibm.com>
>>
>> Let's call PAPQ(ZAPQ) to zeroize a queue for each queue configured
>> for a mediated matrix device when it is released.
>>
>> Zeroizing a queue resets the queue, clears all pending
>> messages for the queue entries and disables adapter interruptions
>> associated with the queue.
>>
>> Signed-off-by: Tony Krowiak <akrowiak@linux.ibm.com>
>> Reviewed-by: Halil Pasic <pasic@linux.ibm.com>
>> Tested-by: Michael Mueller <mimu@linux.ibm.com>
>> Tested-by: Farhan Ali <alifm@linux.ibm.com>
>> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
>> ---
>>  drivers/s390/crypto/vfio_ap_ops.c |   44 +++++++++++++++++++++++++++++++++++++
>>  1 files changed, 44 insertions(+), 0 deletions(-)
>>
>> diff --git a/drivers/s390/crypto/vfio_ap_ops.c b/drivers/s390/crypto/vfio_ap_ops.c
>> index f8b276a..48b1b78 100644
>> --- a/drivers/s390/crypto/vfio_ap_ops.c
>> +++ b/drivers/s390/crypto/vfio_ap_ops.c
>> @@ -829,6 +829,49 @@ static int vfio_ap_mdev_group_notifier(struct notifier_block *nb,
>>  	return NOTIFY_OK;
>>  }
>>  
>> +static int vfio_ap_mdev_reset_queue(unsigned int apid, unsigned int apqi,
>> +				    unsigned int retry)
>> +{
>> +	struct ap_queue_status status;
>> +
>> +	do {
>> +		status = ap_zapq(AP_MKQID(apid, apqi));
>> +		switch (status.response_code) {
>> +		case AP_RESPONSE_NORMAL:
>> +			return 0;
>> +		case AP_RESPONSE_RESET_IN_PROGRESS:
>> +		case AP_RESPONSE_BUSY:
>> +			msleep(20);
>> +			break;
>> +		default:
>> +			/* things are really broken, give up */
>> +			return -EIO;
>> +		}
>> +	} while (retry--);
>> +
>> +	return -EBUSY;
> 
> So, this function may either return 0, -EIO (things are really broken),
> or -EBUSY (still busy after multiple tries)...
> 
>> +}
>> +
>> +static int vfio_ap_mdev_reset_queues(struct mdev_device *mdev)
>> +{
>> +	int ret;
>> +	int rc = 0;
>> +	unsigned long apid, apqi;
>> +	struct ap_matrix_mdev *matrix_mdev = mdev_get_drvdata(mdev);
>> +
>> +	for_each_set_bit_inv(apid, matrix_mdev->matrix.apm,
>> +			     matrix_mdev->matrix.apm_max + 1) {
>> +		for_each_set_bit_inv(apqi, matrix_mdev->matrix.aqm,
>> +				     matrix_mdev->matrix.aqm_max + 1) {
>> +			ret = vfio_ap_mdev_reset_queue(apid, apqi, 1);
>> +			if (ret)
>> +				rc = ret;
> 
> ...and here, we return the last error of any of the resets. Two
> questions:
> 
> - Does it make sense to continue if we get -EIO? IOW, does "really
>   broken" only refer to a certain tuple and other tuples still can/need
>   to be reset?

I think it does make sense to continue, because IMHO "things are really
broken" is an overstatement (I mean the APQN invalid case). One could
argue would skipping the current card (adapter) be justified or not.

IMHO the current code is good enough for the first shot, and we can think
about fine-tuning it later.

> - Is the return code useful in any way, as we don't know which tuple it
>   refers to?
> 

Well, good question. It conveys that the operation can 'fail'. AFAIR -EBUSY
is mostly fine given what the architecture say if we are satisfied with just
reset. And the cases behind -EIO might actually be OK too in the same sense.
My guess is, that based on the return value client code can tell if we have
zeroize for all queues or basically just reset (like rapq). We could log that
to some debug facility or whatever -- I guess, but at the moment we don't care.

In the end I think the code is good enough as is, and if we want we can
improve on it later.

Regards,
Halil


>> +		}
>> +	}
>> +
>> +	return rc;
>> +}
>> +
>>  static int vfio_ap_mdev_open(struct mdev_device *mdev)
>>  {
>>  	struct ap_matrix_mdev *matrix_mdev = mdev_get_drvdata(mdev);
>> @@ -859,6 +902,7 @@ static void vfio_ap_mdev_release(struct mdev_device *mdev)
>>  	if (matrix_mdev->kvm)
>>  		kvm_arch_crypto_clear_masks(matrix_mdev->kvm);
>>  
>> +	vfio_ap_mdev_reset_queues(mdev);
>>  	vfio_unregister_notifier(mdev_dev(mdev), VFIO_GROUP_NOTIFY,
>>  				 &matrix_mdev->group_notifier);
>>  	matrix_mdev->kvm = NULL;
>
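
The "continue after an error, report the last one" behaviour defended in this reply can be illustrated with a small userspace model. This is only a sketch under stated assumptions: the sim_* names, the 4x4 matrix size, and the plain bool arrays (in place of the inverted bitmaps walked by for_each_set_bit_inv) are all made up for illustration, and the per-queue result is scripted instead of coming from hardware.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

#define SIM_MAX_ADAPTERS 4	/* hypothetical small matrix for illustration */
#define SIM_MAX_DOMAINS  4

struct sim_matrix {
	bool apm[SIM_MAX_ADAPTERS];	/* adapters assigned to the mdev */
	bool aqm[SIM_MAX_DOMAINS];	/* domains (queue indexes) assigned */
	/* scripted result of resetting queue (apid, apqi): 0 or -errno */
	int result[SIM_MAX_ADAPTERS][SIM_MAX_DOMAINS];
	/* set once a reset was attempted for (apid, apqi) */
	bool attempted[SIM_MAX_ADAPTERS][SIM_MAX_DOMAINS];
};

/* Mirrors the nested loop of vfio_ap_mdev_reset_queues() in the patch. */
static int sim_reset_queues(struct sim_matrix *m)
{
	int rc = 0;

	assert(m != NULL);
	for (int apid = 0; apid < SIM_MAX_ADAPTERS; apid++) {
		if (!m->apm[apid])
			continue;
		for (int apqi = 0; apqi < SIM_MAX_DOMAINS; apqi++) {
			int ret;

			if (!m->aqm[apqi])
				continue;
			m->attempted[apid][apqi] = true;
			ret = m->result[apid][apqi];
			/* an error does not stop the loop; the last one wins */
			if (ret)
				rc = ret;
		}
	}

	return rc;
}
```

One consequence of this design is that a later -EBUSY overwrites an earlier -EIO, so the caller learns that something failed but not what or where, which is the concern raised in the second question.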

Cornelia Huck Sept. 24, 2018, 12:32 p.m. UTC | #3

On Mon, 24 Sep 2018 14:16:42 +0200
Halil Pasic <pasic@linux.ibm.com> wrote:

> On 09/24/2018 01:36 PM, Cornelia Huck wrote:
> > On Wed, 12 Sep 2018 15:43:03 -0400
> > Tony Krowiak <akrowiak@linux.vnet.ibm.com> wrote:

> >> diff --git a/drivers/s390/crypto/vfio_ap_ops.c b/drivers/s390/crypto/vfio_ap_ops.c
> >> index f8b276a..48b1b78 100644
> >> --- a/drivers/s390/crypto/vfio_ap_ops.c
> >> +++ b/drivers/s390/crypto/vfio_ap_ops.c
> >> @@ -829,6 +829,49 @@ static int vfio_ap_mdev_group_notifier(struct notifier_block *nb,
> >>  	return NOTIFY_OK;
> >>  }
> >>  
> >> +static int vfio_ap_mdev_reset_queue(unsigned int apid, unsigned int apqi,
> >> +				    unsigned int retry)
> >> +{
> >> +	struct ap_queue_status status;
> >> +
> >> +	do {
> >> +		status = ap_zapq(AP_MKQID(apid, apqi));
> >> +		switch (status.response_code) {
> >> +		case AP_RESPONSE_NORMAL:
> >> +			return 0;
> >> +		case AP_RESPONSE_RESET_IN_PROGRESS:
> >> +		case AP_RESPONSE_BUSY:
> >> +			msleep(20);
> >> +			break;
> >> +		default:
> >> +			/* things are really broken, give up */
> >> +			return -EIO;
> >> +		}
> >> +	} while (retry--);
> >> +
> >> +	return -EBUSY;  
> > 
> > So, this function may either return 0, -EIO (things are really broken),
> > or -EBUSY (still busy after multiple tries)...
> >   
> >> +}
> >> +
> >> +static int vfio_ap_mdev_reset_queues(struct mdev_device *mdev)
> >> +{
> >> +	int ret;
> >> +	int rc = 0;
> >> +	unsigned long apid, apqi;
> >> +	struct ap_matrix_mdev *matrix_mdev = mdev_get_drvdata(mdev);
> >> +
> >> +	for_each_set_bit_inv(apid, matrix_mdev->matrix.apm,
> >> +			     matrix_mdev->matrix.apm_max + 1) {
> >> +		for_each_set_bit_inv(apqi, matrix_mdev->matrix.aqm,
> >> +				     matrix_mdev->matrix.aqm_max + 1) {
> >> +			ret = vfio_ap_mdev_reset_queue(apid, apqi, 1);
> >> +			if (ret)
> >> +				rc = ret;  
> > 
> > ...and here, we return the last error of any of the resets. Two
> > questions:
> > 
> > - Does it make sense to continue if we get -EIO? IOW, does "really
> >   broken" only refer to a certain tuple and other tuples still can/need
> >   to be reset?  
> 
> I think it does make sense to continue, because IMHO "things are really
> broken" is an overstatement (I mean the APQN invalid case). One could
> argue would skipping the current card (adapter) be justified or not.

A short comment ("even after -EIO, other devices still need to be
reset") may be helpful here (remember that I don't have any way to
verify this with the architecture).

> 
> IMHO the current code is good enough for the first shot, and we can think
> about fine-tuning it later.

Sure.

> 
> > - Is the return code useful in any way, as we don't know which tuple it
> >   refers to?
> >   
> 
> Well, good question. It conveys that the operation can 'fail'. AFAIR -EBUSY
> is mostly fine given what the architecture say if we are satisfied with just
> reset. And the cases behind -EIO might actually be OK too in the same sense.
> My guess is, that based on the return value client code can tell if we have
> zeroize for all queues or basically just reset (like rapq). We could log that
> to some debug facility or whatever -- I guess, but at the moment we don't care.

Logging would probably be more useful than the return code, but that
can be added later.

> 
> In the end I think the code is good enough as is, and if we want we can
> improve on it later.

I don't object to that; but this is all a bit confusing to readers
without access to the architecture, so I think a comment or two would
really improve things.

> 
> Regards,
> Halil
> 
> 
> >> +		}
> >> +	}
> >> +
> >> +	return rc;
> >> +}
> >> +
> >>  static int vfio_ap_mdev_open(struct mdev_device *mdev)
> >>  {
> >>  	struct ap_matrix_mdev *matrix_mdev = mdev_get_drvdata(mdev);
> >> @@ -859,6 +902,7 @@ static void vfio_ap_mdev_release(struct mdev_device *mdev)
> >>  	if (matrix_mdev->kvm)
> >>  		kvm_arch_crypto_clear_masks(matrix_mdev->kvm);
> >>  
> >> +	vfio_ap_mdev_reset_queues(mdev);
> >>  	vfio_unregister_notifier(mdev_dev(mdev), VFIO_GROUP_NOTIFY,
> >>  				 &matrix_mdev->group_notifier);
> >>  	matrix_mdev->kvm = NULL;  
> >   
>
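
As a rough illustration of the logging idea raised in this reply, the sketch below formats one line per failed queue so that the affected tuple is not lost. Everything here is an assumption for illustration: the sim_* names, the record layout, and the message wording are hypothetical, and a real driver would use a kernel logging or debug facility, not snprintf().

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical record of one failed queue reset. */
struct sim_reset_failure {
	unsigned int apid;	/* adapter number */
	unsigned int apqi;	/* queue index (domain) within the adapter */
	int rc;			/* negative errno returned by the reset */
};

/*
 * Format a single log line naming the failing (apid, apqi) tuple.
 * Returns what snprintf() returns: the length the full line needs.
 */
static int sim_format_failure(char *buf, size_t len,
			      const struct sim_reset_failure *f)
{
	assert(buf != NULL && f != NULL);
	return snprintf(buf, len, "queue %02x.%04x: reset failed (rc=%d)",
			f->apid, f->apqi, f->rc);
}
```

Emitting such a line at the point where the patch currently does "rc = ret" would identify the tuple without changing the return value seen by the caller.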

Harald Freudenberger Sept. 24, 2018, 1:22 p.m. UTC | #4

On 24.09.2018 14:16, Halil Pasic wrote:
>
> On 09/24/2018 01:36 PM, Cornelia Huck wrote:
>> On Wed, 12 Sep 2018 15:43:03 -0400
>> Tony Krowiak <akrowiak@linux.vnet.ibm.com> wrote:
>>
>>> From: Tony Krowiak <akrowiak@linux.ibm.com>
>>>
>>> Let's call PAPQ(ZAPQ) to zeroize a queue for each queue configured
>>> for a mediated matrix device when it is released.
>>>
>>> Zeroizing a queue resets the queue, clears all pending
>>> messages for the queue entries and disables adapter interruptions
>>> associated with the queue.
>>>
>>> Signed-off-by: Tony Krowiak <akrowiak@linux.ibm.com>
>>> Reviewed-by: Halil Pasic <pasic@linux.ibm.com>
>>> Tested-by: Michael Mueller <mimu@linux.ibm.com>
>>> Tested-by: Farhan Ali <alifm@linux.ibm.com>
>>> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
>>> ---
>>>  drivers/s390/crypto/vfio_ap_ops.c |   44 +++++++++++++++++++++++++++++++++++++
>>>  1 files changed, 44 insertions(+), 0 deletions(-)
>>>
>>> diff --git a/drivers/s390/crypto/vfio_ap_ops.c b/drivers/s390/crypto/vfio_ap_ops.c
>>> index f8b276a..48b1b78 100644
>>> --- a/drivers/s390/crypto/vfio_ap_ops.c
>>> +++ b/drivers/s390/crypto/vfio_ap_ops.c
>>> @@ -829,6 +829,49 @@ static int vfio_ap_mdev_group_notifier(struct notifier_block *nb,
>>>  	return NOTIFY_OK;
>>>  }
>>>  
>>> +static int vfio_ap_mdev_reset_queue(unsigned int apid, unsigned int apqi,
>>> +				    unsigned int retry)
>>> +{
>>> +	struct ap_queue_status status;
>>> +
>>> +	do {
>>> +		status = ap_zapq(AP_MKQID(apid, apqi));
>>> +		switch (status.response_code) {
>>> +		case AP_RESPONSE_NORMAL:
>>> +			return 0;
>>> +		case AP_RESPONSE_RESET_IN_PROGRESS:
>>> +		case AP_RESPONSE_BUSY:
>>> +			msleep(20);
>>> +			break;
>>> +		default:
>>> +			/* things are really broken, give up */
>>> +			return -EIO;
>>> +		}
>>> +	} while (retry--);
>>> +
>>> +	return -EBUSY;
>> So, this function may either return 0, -EIO (things are really broken),
>> or -EBUSY (still busy after multiple tries)...
>>
>>> +}
>>> +
>>> +static int vfio_ap_mdev_reset_queues(struct mdev_device *mdev)
>>> +{
>>> +	int ret;
>>> +	int rc = 0;
>>> +	unsigned long apid, apqi;
>>> +	struct ap_matrix_mdev *matrix_mdev = mdev_get_drvdata(mdev);
>>> +
>>> +	for_each_set_bit_inv(apid, matrix_mdev->matrix.apm,
>>> +			     matrix_mdev->matrix.apm_max + 1) {
>>> +		for_each_set_bit_inv(apqi, matrix_mdev->matrix.aqm,
>>> +				     matrix_mdev->matrix.aqm_max + 1) {
>>> +			ret = vfio_ap_mdev_reset_queue(apid, apqi, 1);
>>> +			if (ret)
>>> +				rc = ret;
>> ...and here, we return the last error of any of the resets. Two
>> questions:
>>
>> - Does it make sense to continue if we get -EIO? IOW, does "really
>>   broken" only refer to a certain tuple and other tuples still can/need
>>   to be reset?
> I think it does make sense to continue, because IMHO "things are really
> broken" is an overstatement (I mean the APQN invalid case). One could
> argue would skipping the current card (adapter) be justified or not.
>
> IMHO the current code is good enough for the first shot, and we can think
> about fine-tuning it later.
Absolutely. The -EIO case is reached for example when the APQN
is 'deconfigured' which means the crypto adapter is logically unplugged.
So the -EIO case should NOT lead to some fatal actions like panic()
or cause a KVM guest to shut down or so.
>> - Is the return code useful in any way, as we don't know which tuple it
>>   refers to?
>>
> Well, good question. It conveys that the operation can 'fail'. AFAIR -EBUSY
> is mostly fine given what the architecture say if we are satisfied with just
> reset. And the cases behind -EIO might actually be OK too in the same sense.
> My guess is, that based on the return value client code can tell if we have
> zeroize for all queues or basically just reset (like rapq). We could log that
> to some debug facility or whatever -- I guess, but at the moment we don't care.
>
> In the end I think the code is good enough as is, and if we want we can
> improve on it later.
>
> Regards,
> Halil
>
>
>>> +		}
>>> +	}
>>> +
>>> +	return rc;
>>> +}
>>> +
>>>  static int vfio_ap_mdev_open(struct mdev_device *mdev)
>>>  {
>>>  	struct ap_matrix_mdev *matrix_mdev = mdev_get_drvdata(mdev);
>>> @@ -859,6 +902,7 @@ static void vfio_ap_mdev_release(struct mdev_device *mdev)
>>>  	if (matrix_mdev->kvm)
>>>  		kvm_arch_crypto_clear_masks(matrix_mdev->kvm);
>>>  
>>> +	vfio_ap_mdev_reset_queues(mdev);
>>>  	vfio_unregister_notifier(mdev_dev(mdev), VFIO_GROUP_NOTIFY,
>>>  				 &matrix_mdev->group_notifier);
>>>  	matrix_mdev->kvm = NULL;
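
The point that -EIO must not trigger anything fatal can be sketched as a release path that always finishes its teardown regardless of the reset result. This is a heavily reduced userspace model with hypothetical sim_* names. Note that the patch itself discards the return value of vfio_ap_mdev_reset_queues(); the sketch stores it only so there is something to inspect.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, reduced model of the mdev state touched on release. */
struct sim_mdev {
	bool kvm_attached;
	bool notifier_registered;
	int last_reset_rc;		/* kept only so a caller could log it */
	int (*reset_queues)(void);	/* stand-in for the queue reset helper */
};

/* Stand-in for a reset where one queue sits on a deconfigured adapter. */
static int sim_reset_hits_deconfigured(void)
{
	return -EIO;
}

/* Stand-in for a reset where every queue responds normally. */
static int sim_reset_ok(void)
{
	return 0;
}

/* Same ordering as the release callback: reset, unregister, detach. */
static void sim_release(struct sim_mdev *mdev)
{
	assert(mdev != NULL && mdev->reset_queues != NULL);
	/* a failed reset is recorded but never aborts the teardown */
	mdev->last_reset_rc = mdev->reset_queues();
	mdev->notifier_registered = false;
	mdev->kvm_attached = false;
}
```

Whether the reset succeeds or hits a deconfigured adapter, the model ends in the same torn-down state; only the recorded return code differs.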

Anthony Krowiak Sept. 24, 2018, 4:42 p.m. UTC | #5

On 09/24/2018 09:22 AM, Harald Freudenberger wrote:
> On 24.09.2018 14:16, Halil Pasic wrote:
>>
>> On 09/24/2018 01:36 PM, Cornelia Huck wrote:

(...)

>>> ...and here, we return the last error of any of the resets. Two
>>> questions:
>>>
>>> - Does it make sense to continue if we get -EIO? IOW, does "really
>>>    broken" only refer to a certain tuple and other tuples still can/need
>>>    to be reset?
>> I think it does make sense to continue, because IMHO "things are really
>> broken" is an overstatement (I mean the APQN invalid case). One could
>> argue would skipping the current card (adapter) be justified or not.
>>
>> IMHO the current code is good enough for the first shot, and we can think
>> about fine-tuning it later.
> Absolutely. The -EIO case is reached for example when the APQN
> is 'deconfigured' which means the crypto adapter is logically unplugged.
> So the -EIO case should NOT lead to some fatal actions like panic()
> or cause a KVM guest to shut down or so.
>>> - Is the return code useful in any way, as we don't know which tuple it
>>>    refers to?
>>>
>> Well, good question. It conveys that the operation can 'fail'. AFAIR -EBUSY
>> is mostly fine given what the architecture say if we are satisfied with just
>> reset. And the cases behind -EIO might actually be OK too in the same sense.
>> My guess is, that based on the return value client code can tell if we have
>> zeroize for all queues or basically just reset (like rapq). We could log that
>> to some debug facility or whatever -- I guess, but at the moment we don't care.
>>
>> In the end I think the code is good enough as is, and if we want we can
>> improve on it later.
>>
>> Regards,
>> Halil
>>

I'll note that in v7 a message was logged to indicate for which APQN the 
error occurred, but I was asked to remove the printk log messages. I 
agree with Halil and Harald confirmed that the code is probably okay as 
it stands. I can definitely see enhancing all of AP virtualization down 
the road with some type of debug logging.

>
