Message ID | 1536781396-13601-14-git-send-email-akrowiak@linux.vnet.ibm.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | guest dedicated crypto adapters | expand |
On Wed, 12 Sep 2018 15:43:03 -0400 Tony Krowiak <akrowiak@linux.vnet.ibm.com> wrote: > From: Tony Krowiak <akrowiak@linux.ibm.com> > > Let's call PAPQ(ZAPQ) to zeroize a queue for each queue configured > for a mediated matrix device when it is released. > > Zeroizing a queue resets the queue, clears all pending > messages for the queue entries and disables adapter interruptions > associated with the queue. > > Signed-off-by: Tony Krowiak <akrowiak@linux.ibm.com> > Reviewed-by: Halil Pasic <pasic@linux.ibm.com> > Tested-by: Michael Mueller <mimu@linux.ibm.com> > Tested-by: Farhan Ali <alifm@linux.ibm.com> > Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> > --- > drivers/s390/crypto/vfio_ap_ops.c | 44 +++++++++++++++++++++++++++++++++++++ > 1 files changed, 44 insertions(+), 0 deletions(-) > > diff --git a/drivers/s390/crypto/vfio_ap_ops.c b/drivers/s390/crypto/vfio_ap_ops.c > index f8b276a..48b1b78 100644 > --- a/drivers/s390/crypto/vfio_ap_ops.c > +++ b/drivers/s390/crypto/vfio_ap_ops.c > @@ -829,6 +829,49 @@ static int vfio_ap_mdev_group_notifier(struct notifier_block *nb, > return NOTIFY_OK; > } > > +static int vfio_ap_mdev_reset_queue(unsigned int apid, unsigned int apqi, > + unsigned int retry) > +{ > + struct ap_queue_status status; > + > + do { > + status = ap_zapq(AP_MKQID(apid, apqi)); > + switch (status.response_code) { > + case AP_RESPONSE_NORMAL: > + return 0; > + case AP_RESPONSE_RESET_IN_PROGRESS: > + case AP_RESPONSE_BUSY: > + msleep(20); > + break; > + default: > + /* things are really broken, give up */ > + return -EIO; > + } > + } while (retry--); > + > + return -EBUSY; So, this function may either return 0, -EIO (things are really broken), or -EBUSY (still busy after multiple tries)... > +} > + > +static int vfio_ap_mdev_reset_queues(struct mdev_device *mdev) > +{ > + int ret; > + int rc = 0; > + unsigned long apid, apqi; > + struct ap_matrix_mdev *matrix_mdev = mdev_get_drvdata(mdev); > + > + for_each_set_bit_inv(apid, matrix_mdev->matrix.apm, > + matrix_mdev->matrix.apm_max + 1) { > + for_each_set_bit_inv(apqi, matrix_mdev->matrix.aqm, > + matrix_mdev->matrix.aqm_max + 1) { > + ret = vfio_ap_mdev_reset_queue(apid, apqi, 1); > + if (ret) > + rc = ret; ...and here, we return the last error of any of the resets. Two questions: - Does it make sense to continue if we get -EIO? IOW, does "really broken" only refer to a certain tuple and other tuples still can/need to be reset? - Is the return code useful in any way, as we don't know which tuple it refers to? > + } > + } > + > + return rc; > +} > + > static int vfio_ap_mdev_open(struct mdev_device *mdev) > { > struct ap_matrix_mdev *matrix_mdev = mdev_get_drvdata(mdev); > @@ -859,6 +902,7 @@ static void vfio_ap_mdev_release(struct mdev_device *mdev) > if (matrix_mdev->kvm) > kvm_arch_crypto_clear_masks(matrix_mdev->kvm); > > + vfio_ap_mdev_reset_queues(mdev); > vfio_unregister_notifier(mdev_dev(mdev), VFIO_GROUP_NOTIFY, > &matrix_mdev->group_notifier); > matrix_mdev->kvm = NULL;
On 09/24/2018 01:36 PM, Cornelia Huck wrote: > On Wed, 12 Sep 2018 15:43:03 -0400 > Tony Krowiak <akrowiak@linux.vnet.ibm.com> wrote: > >> From: Tony Krowiak <akrowiak@linux.ibm.com> >> >> Let's call PAPQ(ZAPQ) to zeroize a queue for each queue configured >> for a mediated matrix device when it is released. >> >> Zeroizing a queue resets the queue, clears all pending >> messages for the queue entries and disables adapter interruptions >> associated with the queue. >> >> Signed-off-by: Tony Krowiak <akrowiak@linux.ibm.com> >> Reviewed-by: Halil Pasic <pasic@linux.ibm.com> >> Tested-by: Michael Mueller <mimu@linux.ibm.com> >> Tested-by: Farhan Ali <alifm@linux.ibm.com> >> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> >> --- >> drivers/s390/crypto/vfio_ap_ops.c | 44 +++++++++++++++++++++++++++++++++++++ >> 1 files changed, 44 insertions(+), 0 deletions(-) >> >> diff --git a/drivers/s390/crypto/vfio_ap_ops.c b/drivers/s390/crypto/vfio_ap_ops.c >> index f8b276a..48b1b78 100644 >> --- a/drivers/s390/crypto/vfio_ap_ops.c >> +++ b/drivers/s390/crypto/vfio_ap_ops.c >> @@ -829,6 +829,49 @@ static int vfio_ap_mdev_group_notifier(struct notifier_block *nb, >> return NOTIFY_OK; >> } >> >> +static int vfio_ap_mdev_reset_queue(unsigned int apid, unsigned int apqi, >> + unsigned int retry) >> +{ >> + struct ap_queue_status status; >> + >> + do { >> + status = ap_zapq(AP_MKQID(apid, apqi)); >> + switch (status.response_code) { >> + case AP_RESPONSE_NORMAL: >> + return 0; >> + case AP_RESPONSE_RESET_IN_PROGRESS: >> + case AP_RESPONSE_BUSY: >> + msleep(20); >> + break; >> + default: >> + /* things are really broken, give up */ >> + return -EIO; >> + } >> + } while (retry--); >> + >> + return -EBUSY; > > So, this function may either return 0, -EIO (things are really broken), > or -EBUSY (still busy after multiple tries)... > >> +} >> + >> +static int vfio_ap_mdev_reset_queues(struct mdev_device *mdev) >> +{ >> + int ret; >> + int rc = 0; >> + unsigned long apid, apqi; >> + struct ap_matrix_mdev *matrix_mdev = mdev_get_drvdata(mdev); >> + >> + for_each_set_bit_inv(apid, matrix_mdev->matrix.apm, >> + matrix_mdev->matrix.apm_max + 1) { >> + for_each_set_bit_inv(apqi, matrix_mdev->matrix.aqm, >> + matrix_mdev->matrix.aqm_max + 1) { >> + ret = vfio_ap_mdev_reset_queue(apid, apqi, 1); >> + if (ret) >> + rc = ret; > > ...and here, we return the last error of any of the resets. Two > questions: > > - Does it make sense to continue if we get -EIO? IOW, does "really > broken" only refer to a certain tuple and other tuples still can/need > to be reset? I think it does make sense to continue, because IMHO "things are really broken" is an overstatement (I mean the APQN invalid case). One could argue would skipping the current card (adapter) be justified or not. IMHO the current code is good enough for the first shot, and we can think about fine-tuning it later. > - Is the return code useful in any way, as we don't know which tuple it > refers to? > Well, good question. It conveys that the operation can 'fail'. AFAIR -EBUSY is mostly fine given what the architecture say if we are satisfied with just reset. And the cases behind -EIO might actually be OK too in the same sense. My guess is, that based on the return value client code can tell if we have zeroize for all queues or basically just reset (like rapq). We could log that to some debug facility or whatever -- I guess, but at the moment we don't care. In the end I think the code is good enough as is, and if we want we can improve on it later. Regards, Halil >> + } >> + } >> + >> + return rc; >> +} >> + >> static int vfio_ap_mdev_open(struct mdev_device *mdev) >> { >> struct ap_matrix_mdev *matrix_mdev = mdev_get_drvdata(mdev); >> @@ -859,6 +902,7 @@ static void vfio_ap_mdev_release(struct mdev_device *mdev) >> if (matrix_mdev->kvm) >> kvm_arch_crypto_clear_masks(matrix_mdev->kvm); >> >> + vfio_ap_mdev_reset_queues(mdev); >> vfio_unregister_notifier(mdev_dev(mdev), VFIO_GROUP_NOTIFY, >> &matrix_mdev->group_notifier); >> matrix_mdev->kvm = NULL; >
On Mon, 24 Sep 2018 14:16:42 +0200 Halil Pasic <pasic@linux.ibm.com> wrote: > On 09/24/2018 01:36 PM, Cornelia Huck wrote: > > On Wed, 12 Sep 2018 15:43:03 -0400 > > Tony Krowiak <akrowiak@linux.vnet.ibm.com> wrote: > >> diff --git a/drivers/s390/crypto/vfio_ap_ops.c b/drivers/s390/crypto/vfio_ap_ops.c > >> index f8b276a..48b1b78 100644 > >> --- a/drivers/s390/crypto/vfio_ap_ops.c > >> +++ b/drivers/s390/crypto/vfio_ap_ops.c > >> @@ -829,6 +829,49 @@ static int vfio_ap_mdev_group_notifier(struct notifier_block *nb, > >> return NOTIFY_OK; > >> } > >> > >> +static int vfio_ap_mdev_reset_queue(unsigned int apid, unsigned int apqi, > >> + unsigned int retry) > >> +{ > >> + struct ap_queue_status status; > >> + > >> + do { > >> + status = ap_zapq(AP_MKQID(apid, apqi)); > >> + switch (status.response_code) { > >> + case AP_RESPONSE_NORMAL: > >> + return 0; > >> + case AP_RESPONSE_RESET_IN_PROGRESS: > >> + case AP_RESPONSE_BUSY: > >> + msleep(20); > >> + break; > >> + default: > >> + /* things are really broken, give up */ > >> + return -EIO; > >> + } > >> + } while (retry--); > >> + > >> + return -EBUSY; > > > > So, this function may either return 0, -EIO (things are really broken), > > or -EBUSY (still busy after multiple tries)... > > > >> +} > >> + > >> +static int vfio_ap_mdev_reset_queues(struct mdev_device *mdev) > >> +{ > >> + int ret; > >> + int rc = 0; > >> + unsigned long apid, apqi; > >> + struct ap_matrix_mdev *matrix_mdev = mdev_get_drvdata(mdev); > >> + > >> + for_each_set_bit_inv(apid, matrix_mdev->matrix.apm, > >> + matrix_mdev->matrix.apm_max + 1) { > >> + for_each_set_bit_inv(apqi, matrix_mdev->matrix.aqm, > >> + matrix_mdev->matrix.aqm_max + 1) { > >> + ret = vfio_ap_mdev_reset_queue(apid, apqi, 1); > >> + if (ret) > >> + rc = ret; > > > > ...and here, we return the last error of any of the resets. Two > > questions: > > > > - Does it make sense to continue if we get -EIO? IOW, does "really > > broken" only refer to a certain tuple and other tuples still can/need > > to be reset? > > I think it does make sense to continue, because IMHO "things are really > broken" is an overstatement (I mean the APQN invalid case). One could > argue would skipping the current card (adapter) be justified or not. A short comment ("even after -EIO, other devices still need to be reset") may be helpful here (remember that I don't have any way to verify this with the architecture). > > IMHO the current code is good enough for the first shot, and we can think > about fine-tuning it later. Sure. > > > - Is the return code useful in any way, as we don't know which tuple it > > refers to? > > > > Well, good question. It conveys that the operation can 'fail'. AFAIR -EBUSY > is mostly fine given what the architecture say if we are satisfied with just > reset. And the cases behind -EIO might actually be OK too in the same sense. > My guess is, that based on the return value client code can tell if we have > zeroize for all queues or basically just reset (like rapq). We could log that > to some debug facility or whatever -- I guess, but at the moment we don't care. Logging would probably be more useful than the return code, but that can be added later. > > In the end I think the code is good enough as is, and if we want we can > improve on it later. I don't object to that; but this is all a bit confusing to readers without access to the architecture, so I think a comment or two would really improve things. > > Regards, > Halil > > > >> + } > >> + } > >> + > >> + return rc; > >> +} > >> + > >> static int vfio_ap_mdev_open(struct mdev_device *mdev) > >> { > >> struct ap_matrix_mdev *matrix_mdev = mdev_get_drvdata(mdev); > >> @@ -859,6 +902,7 @@ static void vfio_ap_mdev_release(struct mdev_device *mdev) > >> if (matrix_mdev->kvm) > >> kvm_arch_crypto_clear_masks(matrix_mdev->kvm); > >> > >> + vfio_ap_mdev_reset_queues(mdev); > >> vfio_unregister_notifier(mdev_dev(mdev), VFIO_GROUP_NOTIFY, > >> &matrix_mdev->group_notifier); > >> matrix_mdev->kvm = NULL; > > >
On 24.09.2018 14:16, Halil Pasic wrote: > > On 09/24/2018 01:36 PM, Cornelia Huck wrote: >> On Wed, 12 Sep 2018 15:43:03 -0400 >> Tony Krowiak <akrowiak@linux.vnet.ibm.com> wrote: >> >>> From: Tony Krowiak <akrowiak@linux.ibm.com> >>> >>> Let's call PAPQ(ZAPQ) to zeroize a queue for each queue configured >>> for a mediated matrix device when it is released. >>> >>> Zeroizing a queue resets the queue, clears all pending >>> messages for the queue entries and disables adapter interruptions >>> associated with the queue. >>> >>> Signed-off-by: Tony Krowiak <akrowiak@linux.ibm.com> >>> Reviewed-by: Halil Pasic <pasic@linux.ibm.com> >>> Tested-by: Michael Mueller <mimu@linux.ibm.com> >>> Tested-by: Farhan Ali <alifm@linux.ibm.com> >>> Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com> >>> --- >>> drivers/s390/crypto/vfio_ap_ops.c | 44 +++++++++++++++++++++++++++++++++++++ >>> 1 files changed, 44 insertions(+), 0 deletions(-) >>> >>> diff --git a/drivers/s390/crypto/vfio_ap_ops.c b/drivers/s390/crypto/vfio_ap_ops.c >>> index f8b276a..48b1b78 100644 >>> --- a/drivers/s390/crypto/vfio_ap_ops.c >>> +++ b/drivers/s390/crypto/vfio_ap_ops.c >>> @@ -829,6 +829,49 @@ static int vfio_ap_mdev_group_notifier(struct notifier_block *nb, >>> return NOTIFY_OK; >>> } >>> >>> +static int vfio_ap_mdev_reset_queue(unsigned int apid, unsigned int apqi, >>> + unsigned int retry) >>> +{ >>> + struct ap_queue_status status; >>> + >>> + do { >>> + status = ap_zapq(AP_MKQID(apid, apqi)); >>> + switch (status.response_code) { >>> + case AP_RESPONSE_NORMAL: >>> + return 0; >>> + case AP_RESPONSE_RESET_IN_PROGRESS: >>> + case AP_RESPONSE_BUSY: >>> + msleep(20); >>> + break; >>> + default: >>> + /* things are really broken, give up */ >>> + return -EIO; >>> + } >>> + } while (retry--); >>> + >>> + return -EBUSY; >> So, this function may either return 0, -EIO (things are really broken), >> or -EBUSY (still busy after multiple tries)... >> >>> +} >>> + >>> +static int vfio_ap_mdev_reset_queues(struct mdev_device *mdev) >>> +{ >>> + int ret; >>> + int rc = 0; >>> + unsigned long apid, apqi; >>> + struct ap_matrix_mdev *matrix_mdev = mdev_get_drvdata(mdev); >>> + >>> + for_each_set_bit_inv(apid, matrix_mdev->matrix.apm, >>> + matrix_mdev->matrix.apm_max + 1) { >>> + for_each_set_bit_inv(apqi, matrix_mdev->matrix.aqm, >>> + matrix_mdev->matrix.aqm_max + 1) { >>> + ret = vfio_ap_mdev_reset_queue(apid, apqi, 1); >>> + if (ret) >>> + rc = ret; >> ...and here, we return the last error of any of the resets. Two >> questions: >> >> - Does it make sense to continue if we get -EIO? IOW, does "really >> broken" only refer to a certain tuple and other tuples still can/need >> to be reset? > I think it does make sense to continue, because IMHO "things are really > broken" is an overstatement (I mean the APQN invalid case). One could > argue would skipping the current card (adapter) be justified or not. > > IMHO the current code is good enough for the first shot, and we can think > about fine-tuning it later. Absolutely. The -EIO case is reached for example when the APQN is 'deconfigured' which means the crypto adapter is logically unplugged. So the -EIO case should NOT lead to some fatal actions like panic() or cause a KVM guest to shut down or so. >> - Is the return code useful in any way, as we don't know which tuple it >> refers to? >> > Well, good question. It conveys that the operation can 'fail'. AFAIR -EBUSY > is mostly fine given what the architecture say if we are satisfied with just > reset. And the cases behind -EIO might actually be OK too in the same sense. > My guess is, that based on the return value client code can tell if we have > zeroize for all queues or basically just reset (like rapq). We could log that > to some debug facility or whatever -- I guess, but at the moment we don't care. > > In the end I think the code is good enough as is, and if we want we can > improve on it later. > > Regards, > Halil > > >>> + } >>> + } >>> + >>> + return rc; >>> +} >>> + >>> static int vfio_ap_mdev_open(struct mdev_device *mdev) >>> { >>> struct ap_matrix_mdev *matrix_mdev = mdev_get_drvdata(mdev); >>> @@ -859,6 +902,7 @@ static void vfio_ap_mdev_release(struct mdev_device *mdev) >>> if (matrix_mdev->kvm) >>> kvm_arch_crypto_clear_masks(matrix_mdev->kvm); >>> >>> + vfio_ap_mdev_reset_queues(mdev); >>> vfio_unregister_notifier(mdev_dev(mdev), VFIO_GROUP_NOTIFY, >>> &matrix_mdev->group_notifier); >>> matrix_mdev->kvm = NULL;
On 09/24/2018 09:22 AM, Harald Freudenberger wrote: > On 24.09.2018 14:16, Halil Pasic wrote: >> >> On 09/24/2018 01:36 PM, Cornelia Huck wrote: (...) >>> ...and here, we return the last error of any of the resets. Two >>> questions: >>> >>> - Does it make sense to continue if we get -EIO? IOW, does "really >>> broken" only refer to a certain tuple and other tuples still can/need >>> to be reset? >> I think it does make sense to continue, because IMHO "things are really >> broken" is an overstatement (I mean the APQN invalid case). One could >> argue would skipping the current card (adapter) be justified or not. >> >> IMHO the current code is good enough for the first shot, and we can think >> about fine-tuning it later. > Absolutely. The -EIO case is reached for example when the APQN > is 'deconfigured' which means the crypto adapter is logically unplugged. > So the -EIO case should NOT lead to some fatal actions like panic() > or cause a KVM guest to shut down or so. >>> - Is the return code useful in any way, as we don't know which tuple it >>> refers to? >>> >> Well, good question. It conveys that the operation can 'fail'. AFAIR -EBUSY >> is mostly fine given what the architecture say if we are satisfied with just >> reset. And the cases behind -EIO might actually be OK too in the same sense. >> My guess is, that based on the return value client code can tell if we have >> zeroize for all queues or basically just reset (like rapq). We could log that >> to some debug facility or whatever -- I guess, but at the moment we don't care. >> >> In the end I think the code is good enough as is, and if we want we can >> improve on it later. >> >> Regards, >> Halil >> I'll note that in v7 a message was logged to indicate for which APQN the error occurred, but I was asked to remove the printk log messsages. I agree with Halil and Harald confirmed that the code is probably okay as it stands. I can definitely see enhancing all of AP virtualization down the road with some type of debug logging. >
diff --git a/drivers/s390/crypto/vfio_ap_ops.c b/drivers/s390/crypto/vfio_ap_ops.c index f8b276a..48b1b78 100644 --- a/drivers/s390/crypto/vfio_ap_ops.c +++ b/drivers/s390/crypto/vfio_ap_ops.c @@ -829,6 +829,49 @@ static int vfio_ap_mdev_group_notifier(struct notifier_block *nb, return NOTIFY_OK; } +static int vfio_ap_mdev_reset_queue(unsigned int apid, unsigned int apqi, + unsigned int retry) +{ + struct ap_queue_status status; + + do { + status = ap_zapq(AP_MKQID(apid, apqi)); + switch (status.response_code) { + case AP_RESPONSE_NORMAL: + return 0; + case AP_RESPONSE_RESET_IN_PROGRESS: + case AP_RESPONSE_BUSY: + msleep(20); + break; + default: + /* things are really broken, give up */ + return -EIO; + } + } while (retry--); + + return -EBUSY; +} + +static int vfio_ap_mdev_reset_queues(struct mdev_device *mdev) +{ + int ret; + int rc = 0; + unsigned long apid, apqi; + struct ap_matrix_mdev *matrix_mdev = mdev_get_drvdata(mdev); + + for_each_set_bit_inv(apid, matrix_mdev->matrix.apm, + matrix_mdev->matrix.apm_max + 1) { + for_each_set_bit_inv(apqi, matrix_mdev->matrix.aqm, + matrix_mdev->matrix.aqm_max + 1) { + ret = vfio_ap_mdev_reset_queue(apid, apqi, 1); + if (ret) + rc = ret; + } + } + + return rc; +} + static int vfio_ap_mdev_open(struct mdev_device *mdev) { struct ap_matrix_mdev *matrix_mdev = mdev_get_drvdata(mdev); @@ -859,6 +902,7 @@ static void vfio_ap_mdev_release(struct mdev_device *mdev) if (matrix_mdev->kvm) kvm_arch_crypto_clear_masks(matrix_mdev->kvm); + vfio_ap_mdev_reset_queues(mdev); vfio_unregister_notifier(mdev_dev(mdev), VFIO_GROUP_NOTIFY, &matrix_mdev->group_notifier); matrix_mdev->kvm = NULL;