[v4,04/14] KVM: s390: pv: avoid stalls when making pages secure

Message ID 20210818132620.46770-5-imbrenda@linux.ibm.com (mailing list archive)
State New, archived
Series KVM: s390: pv: implement lazy destroy for reboot

Commit Message

Claudio Imbrenda Aug. 18, 2021, 1:26 p.m. UTC
Improve make_secure_pte to avoid stalls when the system is heavily
overcommitted. This was especially problematic in kvm_s390_pv_unpack,
because of the loop over all pages that needed unpacking.

Due to the locks being held, it was not possible to simply replace
uv_call with uv_call_sched. A more complex approach was
needed, in which uv_call is replaced with __uv_call, which does not
loop. When the UVC needs to be executed again, -EAGAIN is returned, and
the caller (or its caller) will try again.

When -EAGAIN is returned, the path is the same as when the page is in
writeback (and the writeback check is also performed, which is
harmless).

Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
Fixes: 214d9bbcd3a672 ("s390/mm: provide memory management functions for protected KVM guests")
---
 arch/s390/kernel/uv.c | 29 +++++++++++++++++++++++------
 1 file changed, 23 insertions(+), 6 deletions(-)
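
The return-code mapping described in the commit message can be modelled outside the kernel. The following is only a userspace sketch, not the kernel code: the helper name uvc_result_to_errno is hypothetical, and the numeric values of the UVC_CC_* condition codes are assumed for illustration. Only the 0x10a return code and the errno choices are taken from the patch.

```c
#include <assert.h>
#include <errno.h>

/* Condition codes of a one-shot UV call; the values are assumed here. */
enum { UVC_CC_OK = 0, UVC_CC_ERROR = 1, UVC_CC_BUSY = 2, UVC_CC_PARTIAL = 3 };

/*
 * Hypothetical helper: translate the result of a single, non-looping
 * call into the errno values used by the patch.
 */
static int uvc_result_to_errno(int cc, unsigned int uvcb_rc)
{
	assert(cc >= UVC_CC_OK && cc <= UVC_CC_PARTIAL);

	if (cc == UVC_CC_OK)
		return 0;
	/* Busy or partial completion: the caller has to try again. */
	if (cc == UVC_CC_BUSY || cc == UVC_CC_PARTIAL)
		return -EAGAIN;
	/* 0x10a means the page was not mapped; anything else is an error. */
	return uvcb_rc == 0x10a ? -ENXIO : -EINVAL;
}
```

The point of the sketch is that the retry decision moves from a loop inside the call to the return value, so the caller can drop its locks before trying again.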

Comments

Christian Borntraeger Aug. 31, 2021, 2:32 p.m. UTC | #1
On 18.08.21 15:26, Claudio Imbrenda wrote:
> Improve make_secure_pte to avoid stalls when the system is heavily
> overcommitted. This was especially problematic in kvm_s390_pv_unpack,
> because of the loop over all pages that needed unpacking.
> 
> Due to the locks being held, it was not possible to simply replace
> uv_call with uv_call_sched. A more complex approach was
> needed, in which uv_call is replaced with __uv_call, which does not
> loop. When the UVC needs to be executed again, -EAGAIN is returned, and
> the caller (or its caller) will try again.
> 
> When -EAGAIN is returned, the path is the same as when the page is in
> writeback (and the writeback check is also performed, which is
> harmless).

To me it looks like handle_pv_uvc does not handle EAGAIN but also calls into
this code. Is this code path OK, or do we need to change something here?

Claudio Imbrenda Aug. 31, 2021, 3 p.m. UTC | #2
On Tue, 31 Aug 2021 16:32:24 +0200
Christian Borntraeger <borntraeger@de.ibm.com> wrote:

> On 18.08.21 15:26, Claudio Imbrenda wrote:
> > Improve make_secure_pte to avoid stalls when the system is heavily
> > overcommitted. This was especially problematic in kvm_s390_pv_unpack,
> > because of the loop over all pages that needed unpacking.
> > 
> > Due to the locks being held, it was not possible to simply replace
> > uv_call with uv_call_sched. A more complex approach was
> > needed, in which uv_call is replaced with __uv_call, which does not
> > loop. When the UVC needs to be executed again, -EAGAIN is returned, and
> > the caller (or its caller) will try again.
> > 
> > When -EAGAIN is returned, the path is the same as when the page is in
> > writeback (and the writeback check is also performed, which is
> > harmless).  
> 
> To me it looks like
> handle_pv_uvc does not handle EAGAIN but also calls into this code. Is this code
> path ok or do we need to change something here?

EAGAIN will be propagated all the way to userspace, which will retry.

If the UVC fails, the page does not get unpinned, and the next attempt
to run the UVC in the guest will trigger this same path.

If you don't like it, I can change handle_pv_uvc like this:

	if (rc == -EINVAL || rc == -EAGAIN)

which will save a trip to userspace.
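
As a rough illustration of the proposed condition, the decision could be modelled as below. This is a userspace sketch under assumptions: the function name pv_uvc_handler_result is hypothetical, and it assumes that returning 0 from the handler means "resume the guest, which will run the UVC again", while any other value is passed on towards userspace. The real handle_pv_uvc may differ.

```c
#include <assert.h>
#include <errno.h>

/*
 * Hypothetical model of the handler's return decision.
 * rc is the errno-style result of making the page secure.
 */
static int pv_uvc_handler_result(int rc)
{
	assert(rc <= 0);	/* errno-style codes are zero or negative */

	/*
	 * Assumption: 0 lets the guest continue and retry the UVC,
	 * which avoids the round trip to userspace for -EAGAIN.
	 */
	if (rc == -EINVAL || rc == -EAGAIN)
		return 0;
	return rc;
}
```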

Christian Borntraeger Aug. 31, 2021, 3:11 p.m. UTC | #3
On 31.08.21 17:00, Claudio Imbrenda wrote:
> On Tue, 31 Aug 2021 16:32:24 +0200
> Christian Borntraeger <borntraeger@de.ibm.com> wrote:
> 
>> On 18.08.21 15:26, Claudio Imbrenda wrote:
>>> Improve make_secure_pte to avoid stalls when the system is heavily
>>> overcommitted. This was especially problematic in kvm_s390_pv_unpack,
>>> because of the loop over all pages that needed unpacking.
>>>
>>> Due to the locks being held, it was not possible to simply replace
>>> uv_call with uv_call_sched. A more complex approach was
>>> needed, in which uv_call is replaced with __uv_call, which does not
>>> loop. When the UVC needs to be executed again, -EAGAIN is returned, and
>>> the caller (or its caller) will try again.
>>>
>>> When -EAGAIN is returned, the path is the same as when the page is in
>>> writeback (and the writeback check is also performed, which is
>>> harmless).
>>
>> To me it looks like
>> handle_pv_uvc does not handle EAGAIN but also calls into this code. Is this code
>> path ok or do we need to change something here?
> 
> EAGAIN will be propagated all the way to userspace, which will retry.
> 
> if the UVC fails, the page does not get unpinned, and the next attempt
> to run the UVC in the guest will trigger this same path.
> 
> if you don't like it, I can change handle_pv_uvc like this
> 
> 	if (rc == -EINVAL || rc == -EAGAIN)
> 
> which will save a trip to userspace

I think a comment would be good.
Patch

diff --git a/arch/s390/kernel/uv.c b/arch/s390/kernel/uv.c
index aeb0a15bcbb7..68a8fbafcb9c 100644
--- a/arch/s390/kernel/uv.c
+++ b/arch/s390/kernel/uv.c
@@ -180,7 +180,7 @@  static int make_secure_pte(pte_t *ptep, unsigned long addr,
 {
 	pte_t entry = READ_ONCE(*ptep);
 	struct page *page;
-	int expected, rc = 0;
+	int expected, cc = 0;
 
 	if (!pte_present(entry))
 		return -ENXIO;
@@ -196,12 +196,25 @@  static int make_secure_pte(pte_t *ptep, unsigned long addr,
 	if (!page_ref_freeze(page, expected))
 		return -EBUSY;
 	set_bit(PG_arch_1, &page->flags);
-	rc = uv_call(0, (u64)uvcb);
+	/*
+	 * If the UVC does not succeed or fail immediately, we don't want to
+	 * loop for long, or we might get stall notifications.
+	 * On the other hand, this is a complex scenario and we are holding a lot of
+	 * locks, so we can't easily sleep and reschedule. We try only once,
+	 * and if the UVC returned busy or partial completion, we return
+	 * -EAGAIN and we let the callers deal with it.
+	 */
+	cc = __uv_call(0, (u64)uvcb);
 	page_ref_unfreeze(page, expected);
-	/* Return -ENXIO if the page was not mapped, -EINVAL otherwise */
-	if (rc)
-		rc = uvcb->rc == 0x10a ? -ENXIO : -EINVAL;
-	return rc;
+	/*
+	 * Return -ENXIO if the page was not mapped, -EINVAL for other errors.
+	 * If busy or partially completed, return -EAGAIN.
+	 */
+	if (cc == UVC_CC_OK)
+		return 0;
+	else if (cc == UVC_CC_BUSY || cc == UVC_CC_PARTIAL)
+		return -EAGAIN;
+	return uvcb->rc == 0x10a ? -ENXIO : -EINVAL;
 }
 
 /*
@@ -254,6 +267,10 @@  int gmap_make_secure(struct gmap *gmap, unsigned long gaddr, void *uvcb)
 	mmap_read_unlock(gmap->mm);
 
 	if (rc == -EAGAIN) {
+		/*
+		 * If we are here because the UVC returned busy or partial
+		 * completion, this is just a useless check, but it is safe.
+		 */
 		wait_on_page_writeback(page);
 	} else if (rc == -EBUSY) {
 		/*
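
The patch relies on the caller retrying after -EAGAIN once no locks are held. A minimal userspace sketch of that pattern follows; it is not taken from the kernel. The names try_once and retry_until_done are hypothetical, the busy counter only simulates a call that reports "busy" a few times, and sched_yield() stands in for whatever rescheduling the real caller does.

```c
#include <assert.h>
#include <errno.h>
#include <sched.h>

/* Hypothetical one-shot operation: -EAGAIN while *busy_rounds > 0. */
static int try_once(int *busy_rounds)
{
	if (*busy_rounds > 0) {
		(*busy_rounds)--;
		return -EAGAIN;
	}
	return 0;
}

/*
 * Retry from a context where no locks are held, giving up the CPU
 * between attempts instead of spinning inside the call itself.
 */
static int retry_until_done(int *busy_rounds, int *attempts)
{
	int rc;

	assert(busy_rounds && attempts);
	do {
		rc = try_once(busy_rounds);
		(*attempts)++;
		if (rc == -EAGAIN)
			sched_yield();
	} while (rc == -EAGAIN);
	return rc;
}
```

With three simulated busy rounds, the loop makes four attempts and then returns 0.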