Message ID | 20200918170234.5807-1-akrowiak@linux.ibm.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | s390/vfio-ap: fix unregister GISC when KVM is already gone results in OOPS | expand |
On 18.09.20 19:02, Tony Krowiak wrote: > Attempting to unregister Guest Interruption Subclass (GISC) when the > link between the matrix mdev and KVM has been removed results in the > following: > > "Kernel panic -not syncing: Fatal exception: panic_on_oops" I think the full backtrace would be better in case someone runs into this and needs to compare this patch to its oops. This also makes it easier to understand the fix. > > This patch fixes this bug by verifying the matrix mdev and KVM are still > linked prior to unregistering the GISC. > > Signed-off-by: Tony Krowiak <akrowiak@linux.ibm.com> Do we need a Fixes tag and cc stable? > --- > drivers/s390/crypto/vfio_ap_ops.c | 14 +++++++++----- > 1 file changed, 9 insertions(+), 5 deletions(-) > > diff --git a/drivers/s390/crypto/vfio_ap_ops.c b/drivers/s390/crypto/vfio_ap_ops.c > index e0bde8518745..847a88642644 100644 > --- a/drivers/s390/crypto/vfio_ap_ops.c > +++ b/drivers/s390/crypto/vfio_ap_ops.c > @@ -119,11 +119,15 @@ static void vfio_ap_wait_for_irqclear(int apqn) > */ > static void vfio_ap_free_aqic_resources(struct vfio_ap_queue *q) > { > - if (q->saved_isc != VFIO_AP_ISC_INVALID && q->matrix_mdev) So we already check for q->matrix_mdev here > - kvm_s390_gisc_unregister(q->matrix_mdev->kvm, q->saved_isc); > - if (q->saved_pfn && q->matrix_mdev) and here > - vfio_unpin_pages(mdev_dev(q->matrix_mdev->mdev), > - &q->saved_pfn, 1); > + if (q->matrix_mdev) { > + if (q->saved_isc != VFIO_AP_ISC_INVALID && q->matrix_mdev->kvm) ^^^^ and this is the only new check? Cant we just add this condition to the first if? > + kvm_s390_gisc_unregister(q->matrix_mdev->kvm, > + q->saved_isc); > + if (q->saved_pfn) > + vfio_unpin_pages(mdev_dev(q->matrix_mdev->mdev), > + &q->saved_pfn, 1); > + } > + > q->saved_pfn = 0; > q->saved_isc = VFIO_AP_ISC_INVALID; > } >
On 2020-09-18 19:02, Tony Krowiak wrote: > Attempting to unregister Guest Interruption Subclass (GISC) when the > link between the matrix mdev and KVM has been removed results in the > following: > > "Kernel panic -not syncing: Fatal exception: panic_on_oops" > > This patch fixes this bug by verifying the matrix mdev and KVM are still > linked prior to unregistering the GISC. > > Signed-off-by: Tony Krowiak <akrowiak@linux.ibm.com> > --- > drivers/s390/crypto/vfio_ap_ops.c | 14 +++++++++----- > 1 file changed, 9 insertions(+), 5 deletions(-) > > diff --git a/drivers/s390/crypto/vfio_ap_ops.c b/drivers/s390/crypto/vfio_ap_ops.c > index e0bde8518745..847a88642644 100644 > --- a/drivers/s390/crypto/vfio_ap_ops.c > +++ b/drivers/s390/crypto/vfio_ap_ops.c > @@ -119,11 +119,15 @@ static void vfio_ap_wait_for_irqclear(int apqn) > */ > static void vfio_ap_free_aqic_resources(struct vfio_ap_queue *q) > { > - if (q->saved_isc != VFIO_AP_ISC_INVALID && q->matrix_mdev) I think you could have make the patch smaller by adding the missing test for kvm here. > - kvm_s390_gisc_unregister(q->matrix_mdev->kvm, q->saved_isc); > - if (q->saved_pfn && q->matrix_mdev) > - vfio_unpin_pages(mdev_dev(q->matrix_mdev->mdev), > - &q->saved_pfn, 1); > + if (q->matrix_mdev) { > + if (q->saved_isc != VFIO_AP_ISC_INVALID && q->matrix_mdev->kvm) > + kvm_s390_gisc_unregister(q->matrix_mdev->kvm, > + q->saved_isc); > + if (q->saved_pfn) > + vfio_unpin_pages(mdev_dev(q->matrix_mdev->mdev), > + &q->saved_pfn, 1); > + } > + > q->saved_pfn = 0; > q->saved_isc = VFIO_AP_ISC_INVALID; > } >
On Fri, 18 Sep 2020 13:02:34 -0400 Tony Krowiak <akrowiak@linux.ibm.com> wrote: > Attempting to unregister Guest Interruption Subclass (GISC) when the > link between the matrix mdev and KVM has been removed results in the > following: > > "Kernel panic -not syncing: Fatal exception: panic_on_oops" I'm wondering how we get there (why are we unregistering the gisc if the mdev and kvm are not yet linked or are already unlinked?), so I agree that the actual backchain would be helpful here. > > This patch fixes this bug by verifying the matrix mdev and KVM are still > linked prior to unregistering the GISC. > > Signed-off-by: Tony Krowiak <akrowiak@linux.ibm.com> > --- > drivers/s390/crypto/vfio_ap_ops.c | 14 +++++++++----- > 1 file changed, 9 insertions(+), 5 deletions(-) > > diff --git a/drivers/s390/crypto/vfio_ap_ops.c b/drivers/s390/crypto/vfio_ap_ops.c > index e0bde8518745..847a88642644 100644 > --- a/drivers/s390/crypto/vfio_ap_ops.c > +++ b/drivers/s390/crypto/vfio_ap_ops.c > @@ -119,11 +119,15 @@ static void vfio_ap_wait_for_irqclear(int apqn) > */ > static void vfio_ap_free_aqic_resources(struct vfio_ap_queue *q) > { > - if (q->saved_isc != VFIO_AP_ISC_INVALID && q->matrix_mdev) If checking for ->kvm is the right thing to do, I agree that moving the check here would be easier to read. > - kvm_s390_gisc_unregister(q->matrix_mdev->kvm, q->saved_isc); > - if (q->saved_pfn && q->matrix_mdev) > - vfio_unpin_pages(mdev_dev(q->matrix_mdev->mdev), > - &q->saved_pfn, 1); > + if (q->matrix_mdev) { > + if (q->saved_isc != VFIO_AP_ISC_INVALID && q->matrix_mdev->kvm) > + kvm_s390_gisc_unregister(q->matrix_mdev->kvm, > + q->saved_isc); > + if (q->saved_pfn) > + vfio_unpin_pages(mdev_dev(q->matrix_mdev->mdev), > + &q->saved_pfn, 1); > + } > + > q->saved_pfn = 0; > q->saved_isc = VFIO_AP_ISC_INVALID; > }
On Mon, 21 Sep 2020 07:48:58 +0200 Christian Borntraeger <borntraeger@de.ibm.com> wrote: > > > On 18.09.20 19:02, Tony Krowiak wrote: > > Attempting to unregister Guest Interruption Subclass (GISC) when the > > link between the matrix mdev and KVM has been removed results in the > > following: > > > > "Kernel panic -not syncing: Fatal exception: panic_on_oops" > > I think the full backtrace would be better in case someone runs into this > and needs to compare this patch to its oops. This also makes it easier to > understand the fix. > > > > This patch fixes this bug by verifying the matrix mdev and KVM are still > > linked prior to unregistering the GISC. > > > > Signed-off-by: Tony Krowiak <akrowiak@linux.ibm.com> > > Do we need a Fixes tag and cc stable? > I believe we do! > > > --- > > drivers/s390/crypto/vfio_ap_ops.c | 14 +++++++++----- > > 1 file changed, 9 insertions(+), 5 deletions(-) > > > > diff --git a/drivers/s390/crypto/vfio_ap_ops.c b/drivers/s390/crypto/vfio_ap_ops.c > > index e0bde8518745..847a88642644 100644 > > --- a/drivers/s390/crypto/vfio_ap_ops.c > > +++ b/drivers/s390/crypto/vfio_ap_ops.c > > @@ -119,11 +119,15 @@ static void vfio_ap_wait_for_irqclear(int apqn) > > */ > > static void vfio_ap_free_aqic_resources(struct vfio_ap_queue *q) > > { > > - if (q->saved_isc != VFIO_AP_ISC_INVALID && q->matrix_mdev) > > So we already check for q->matrix_mdev here > > > - kvm_s390_gisc_unregister(q->matrix_mdev->kvm, q->saved_isc); > > - if (q->saved_pfn && q->matrix_mdev) > > and here > > - vfio_unpin_pages(mdev_dev(q->matrix_mdev->mdev), > > - &q->saved_pfn, 1); > > + if (q->matrix_mdev) { > > + if (q->saved_isc != VFIO_AP_ISC_INVALID && q->matrix_mdev->kvm) > ^^^^ and this is the only > new check? Cant we just add this condition to the first if? You are technically right, but I'm not comfortable with my level of understanding of this logic regardless of the coding style. Will ask some questions directly. Regards, Halil > > > + kvm_s390_gisc_unregister(q->matrix_mdev->kvm, > > + q->saved_isc); > > + if (q->saved_pfn) > > + vfio_unpin_pages(mdev_dev(q->matrix_mdev->mdev), > > + &q->saved_pfn, 1); > > + } > > + > > q->saved_pfn = 0; > > q->saved_isc = VFIO_AP_ISC_INVALID; > > } > >
On Fri, 18 Sep 2020 13:02:34 -0400 Tony Krowiak <akrowiak@linux.ibm.com> wrote: > Attempting to unregister Guest Interruption Subclass (GISC) when the > link between the matrix mdev and KVM has been removed results in the > following: > > "Kernel panic -not syncing: Fatal exception: panic_on_oops" > > This patch fixes this bug by verifying the matrix mdev and KVM are still > linked prior to unregistering the GISC. I read from your commit message that this happens when the link between the KVM and the matrix mdev was established and then got severed. I assume the interrupts were previously enabled, and were not been disabled or cleaned up because q->saved_isc != VFIO_AP_ISC_INVALID. That means the guest enabled interrupts and then for whatever reason got destroyed, and this happens on mdev cleanup. Does it happen all the time or is it some sort of a race? > > Signed-off-by: Tony Krowiak <akrowiak@linux.ibm.com> > --- > drivers/s390/crypto/vfio_ap_ops.c | 14 +++++++++----- > 1 file changed, 9 insertions(+), 5 deletions(-) > > diff --git a/drivers/s390/crypto/vfio_ap_ops.c b/drivers/s390/crypto/vfio_ap_ops.c > index e0bde8518745..847a88642644 100644 > --- a/drivers/s390/crypto/vfio_ap_ops.c > +++ b/drivers/s390/crypto/vfio_ap_ops.c > @@ -119,11 +119,15 @@ static void vfio_ap_wait_for_irqclear(int apqn) > */ > static void vfio_ap_free_aqic_resources(struct vfio_ap_queue *q) > { > - if (q->saved_isc != VFIO_AP_ISC_INVALID && q->matrix_mdev) > - kvm_s390_gisc_unregister(q->matrix_mdev->kvm, q->saved_isc); > - if (q->saved_pfn && q->matrix_mdev) > - vfio_unpin_pages(mdev_dev(q->matrix_mdev->mdev), > - &q->saved_pfn, 1); > + if (q->matrix_mdev) { > + if (q->saved_isc != VFIO_AP_ISC_INVALID && q->matrix_mdev->kvm) > + kvm_s390_gisc_unregister(q->matrix_mdev->kvm, > + q->saved_isc); I don't quite understand the logic here. I suppose we need to ensure that the struct kvm is 'alive' at least until kvm_s390_gisc_unregister() is done. That is supposed be ensured by kvm_get_kvm() in vfio_ap_mdev_set_kvm() and kvm_put_kvm() in vfio_ap_mdev_release(). If the critical section in vfio_ap_mdev_release() is done and matrix_mdev->kvm was set to NULL there then I would expect that the queues are already reset and q->saved_isc == VFIO_AP_ISC_INVALID. So this should not blow up. Now if this happens before the critical section in vfio_ap_mdev_release() is done, I ask myself how are we going to do the kvm_put_kvm()? Another question. Do we hold the matrix_dev->lock here? > + if (q->saved_pfn) > + vfio_unpin_pages(mdev_dev(q->matrix_mdev->mdev), > + &q->saved_pfn, 1); > + } > + > q->saved_pfn = 0; > q->saved_isc = VFIO_AP_ISC_INVALID; > }
On 9/21/20 11:45 AM, Halil Pasic wrote: > On Fri, 18 Sep 2020 13:02:34 -0400 > Tony Krowiak <akrowiak@linux.ibm.com> wrote: > >> Attempting to unregister Guest Interruption Subclass (GISC) when the >> link between the matrix mdev and KVM has been removed results in the >> following: >> >> "Kernel panic -not syncing: Fatal exception: panic_on_oops" >> >> This patch fixes this bug by verifying the matrix mdev and KVM are still >> linked prior to unregistering the GISC. > > I read from your commit message that this happens when the link between > the KVM and the matrix mdev was established and then got severed. > > I assume the interrupts were previously enabled, and were not been > disabled or cleaned up because q->saved_isc != VFIO_AP_ISC_INVALID. > > That means the guest enabled interrupts and then for whatever > reason got destroyed, and this happens on mdev cleanup. > > Does it happen all the time or is it some sort of a race? This is a race condition that happens when a guest is terminated and the mdev is removed in rapid succession. I came across it with one of my hades test cases on cleanup of the resources after the test case completes. There is a bug in the problem appears the vfio_ap_mdev_release function because it tries to reset the APQNs after the bits are cleared from the matrix_mdev.matrix, so the resets never happen. Fixing that, however, does not resolve the issue, so I'm in the process of doing a bunch of tracing to see the flow of the resets etc. during the lifecycle of the mdev during this hades test. I should have a better answer next week. > >> Signed-off-by: Tony Krowiak <akrowiak@linux.ibm.com> >> --- >> drivers/s390/crypto/vfio_ap_ops.c | 14 +++++++++----- >> 1 file changed, 9 insertions(+), 5 deletions(-) >> >> diff --git a/drivers/s390/crypto/vfio_ap_ops.c b/drivers/s390/crypto/vfio_ap_ops.c >> index e0bde8518745..847a88642644 100644 >> --- a/drivers/s390/crypto/vfio_ap_ops.c >> +++ b/drivers/s390/crypto/vfio_ap_ops.c >> @@ -119,11 +119,15 @@ static void vfio_ap_wait_for_irqclear(int apqn) >> */ >> static void vfio_ap_free_aqic_resources(struct vfio_ap_queue *q) >> { >> - if (q->saved_isc != VFIO_AP_ISC_INVALID && q->matrix_mdev) >> - kvm_s390_gisc_unregister(q->matrix_mdev->kvm, q->saved_isc); >> - if (q->saved_pfn && q->matrix_mdev) >> - vfio_unpin_pages(mdev_dev(q->matrix_mdev->mdev), >> - &q->saved_pfn, 1); >> + if (q->matrix_mdev) { >> + if (q->saved_isc != VFIO_AP_ISC_INVALID && q->matrix_mdev->kvm) >> + kvm_s390_gisc_unregister(q->matrix_mdev->kvm, >> + q->saved_isc); > I don't quite understand the logic here. I suppose we need to ensure > that the struct kvm is 'alive' at least until kvm_s390_gisc_unregister() > is done. That is supposed be ensured by kvm_get_kvm() in > vfio_ap_mdev_set_kvm() and kvm_put_kvm() in vfio_ap_mdev_release(). > > If the critical section in vfio_ap_mdev_release() is done and > matrix_mdev->kvm was set to NULL there then I would expect that the > queues are already reset and q->saved_isc == VFIO_AP_ISC_INVALID. So > this should not blow up. > > Now if this happens before the critical section in > vfio_ap_mdev_release() is done, I ask myself how are we going to do the > kvm_put_kvm()? > > Another question. Do we hold the matrix_dev->lock here? > >> + if (q->saved_pfn) >> + vfio_unpin_pages(mdev_dev(q->matrix_mdev->mdev), >> + &q->saved_pfn, 1); >> + } >> + >> q->saved_pfn = 0; >> q->saved_isc = VFIO_AP_ISC_INVALID; >> }
On Fri, 25 Sep 2020 18:29:16 -0400 Tony Krowiak <akrowiak@linux.ibm.com> wrote: > > > On 9/21/20 11:45 AM, Halil Pasic wrote: > > On Fri, 18 Sep 2020 13:02:34 -0400 > > Tony Krowiak <akrowiak@linux.ibm.com> wrote: > > > >> Attempting to unregister Guest Interruption Subclass (GISC) when the > >> link between the matrix mdev and KVM has been removed results in the > >> following: > >> > >> "Kernel panic -not syncing: Fatal exception: panic_on_oops" > >> > >> This patch fixes this bug by verifying the matrix mdev and KVM are still > >> linked prior to unregistering the GISC. > > > > I read from your commit message that this happens when the link between > > the KVM and the matrix mdev was established and then got severed. > > > > I assume the interrupts were previously enabled, and were not been > > disabled or cleaned up because q->saved_isc != VFIO_AP_ISC_INVALID. > > > > That means the guest enabled interrupts and then for whatever > > reason got destroyed, and this happens on mdev cleanup. > > > > Does it happen all the time or is it some sort of a race? > > This is a race condition that happens when a guest is terminated and the > mdev is > removed in rapid succession. I came across it with one of my hades test > cases > on cleanup of the resources after the test case completes. There is a > bug in the problem appears > the vfio_ap_mdev_releasefunction because it tries to reset the APQNs > after the bits are > cleared from the matrix_mdev.matrix, so the resets never happen. > That sounds very strange. I couldn't find the place where we clear the bits in matrix_mdev.matrix except for unassign. Currently the unassign is supposed to be enabled only after we have no guest and we have cleaned up the queues (which should restore VFIO_AP_ISC_INVALID). Does your test do any unassign operations? (I'm not sure the we always do like we are supposed to.) Now if we did not clear the bits from matrix_mdev.matrix then this could be an use after free scenario (where we interpret already re-purposed memory as matrix_mdev.matrix). > Fixing that, however, does not resolve the issue, so I'm in the process > of doing a bunch of > tracing to see the flow of the resets etc. during the lifecycle of the > mdev during this > hades test. I should have a better answer next week. > My take away is that we don't understand what exactly is going wrong, and so this patch is at best a mitigation (not a real fix). Does that sound about correct? Regards, Halil [..]
In trying to recreate this problem in order to get a stack trace, I discovered that it only occurs within a local repository that has several new patches applied, so the problem is not part of the base code and will be fixed via this new set of patches forthcoming. On 9/18/20 1:02 PM, Tony Krowiak wrote: > Attempting to unregister Guest Interruption Subclass (GISC) when the > link between the matrix mdev and KVM has been removed results in the > following: > > "Kernel panic -not syncing: Fatal exception: panic_on_oops" > > This patch fixes this bug by verifying the matrix mdev and KVM are still > linked prior to unregistering the GISC. > > Signed-off-by: Tony Krowiak <akrowiak@linux.ibm.com> > --- > drivers/s390/crypto/vfio_ap_ops.c | 14 +++++++++----- > 1 file changed, 9 insertions(+), 5 deletions(-) > > diff --git a/drivers/s390/crypto/vfio_ap_ops.c b/drivers/s390/crypto/vfio_ap_ops.c > index e0bde8518745..847a88642644 100644 > --- a/drivers/s390/crypto/vfio_ap_ops.c > +++ b/drivers/s390/crypto/vfio_ap_ops.c > @@ -119,11 +119,15 @@ static void vfio_ap_wait_for_irqclear(int apqn) > */ > static void vfio_ap_free_aqic_resources(struct vfio_ap_queue *q) > { > - if (q->saved_isc != VFIO_AP_ISC_INVALID && q->matrix_mdev) > - kvm_s390_gisc_unregister(q->matrix_mdev->kvm, q->saved_isc); > - if (q->saved_pfn && q->matrix_mdev) > - vfio_unpin_pages(mdev_dev(q->matrix_mdev->mdev), > - &q->saved_pfn, 1); > + if (q->matrix_mdev) { > + if (q->saved_isc != VFIO_AP_ISC_INVALID && q->matrix_mdev->kvm) > + kvm_s390_gisc_unregister(q->matrix_mdev->kvm, > + q->saved_isc); > + if (q->saved_pfn) > + vfio_unpin_pages(mdev_dev(q->matrix_mdev->mdev), > + &q->saved_pfn, 1); > + } > + > q->saved_pfn = 0; > q->saved_isc = VFIO_AP_ISC_INVALID; > }
diff --git a/drivers/s390/crypto/vfio_ap_ops.c b/drivers/s390/crypto/vfio_ap_ops.c index e0bde8518745..847a88642644 100644 --- a/drivers/s390/crypto/vfio_ap_ops.c +++ b/drivers/s390/crypto/vfio_ap_ops.c @@ -119,11 +119,15 @@ static void vfio_ap_wait_for_irqclear(int apqn) */ static void vfio_ap_free_aqic_resources(struct vfio_ap_queue *q) { - if (q->saved_isc != VFIO_AP_ISC_INVALID && q->matrix_mdev) - kvm_s390_gisc_unregister(q->matrix_mdev->kvm, q->saved_isc); - if (q->saved_pfn && q->matrix_mdev) - vfio_unpin_pages(mdev_dev(q->matrix_mdev->mdev), - &q->saved_pfn, 1); + if (q->matrix_mdev) { + if (q->saved_isc != VFIO_AP_ISC_INVALID && q->matrix_mdev->kvm) + kvm_s390_gisc_unregister(q->matrix_mdev->kvm, + q->saved_isc); + if (q->saved_pfn) + vfio_unpin_pages(mdev_dev(q->matrix_mdev->mdev), + &q->saved_pfn, 1); + } + q->saved_pfn = 0; q->saved_isc = VFIO_AP_ISC_INVALID; }
Attempting to unregister Guest Interruption Subclass (GISC) when the link between the matrix mdev and KVM has been removed results in the following: "Kernel panic -not syncing: Fatal exception: panic_on_oops" This patch fixes this bug by verifying the matrix mdev and KVM are still linked prior to unregistering the GISC. Signed-off-by: Tony Krowiak <akrowiak@linux.ibm.com> --- drivers/s390/crypto/vfio_ap_ops.c | 14 +++++++++----- 1 file changed, 9 insertions(+), 5 deletions(-)