
[v9,06/10] mm/memory.c: Allow different return codes for copy_nonpresent_pte()

Message ID 20210524132725.12697-7-apopple@nvidia.com (mailing list archive)
State New, archived
Series Add support for SVM atomics in Nouveau

Commit Message

Alistair Popple May 24, 2021, 1:27 p.m. UTC
Currently, if copy_nonpresent_pte() returns a non-zero value it is
assumed to be a swap entry which requires further processing outside the
loop in copy_pte_range() after dropping locks. This prevents other
values from being returned to signal conditions such as failure, which a
subsequent change requires.

Instead, make copy_nonpresent_pte() return an error code if further
processing is required, and read the value for the swap entry in the main
loop under the ptl.
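
As an illustration, the caller-side handling becomes the following (this
is the hunk below with an explanatory comment added; the comment is not
part of the patch):

	if (unlikely(!pte_present(*src_pte))) {
		ret = copy_nonpresent_pte(dst_mm, src_mm, dst_pte, src_pte,
					  src_vma, addr, rss);
		if (ret == -EAGAIN) {
			/* Still under the ptl, so *src_pte cannot change. */
			entry = pte_to_swp_entry(*src_pte);
			break;
		}
		progress += 8;
		continue;
	}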

Signed-off-by: Alistair Popple <apopple@nvidia.com>

---

v9:

New for v9 to allow device exclusive handling to occur in
copy_nonpresent_pte().
---
 mm/memory.c | 12 +++++++-----
 1 file changed, 7 insertions(+), 5 deletions(-)

Comments

Peter Xu May 26, 2021, 7:50 p.m. UTC | #1
On Mon, May 24, 2021 at 11:27:21PM +1000, Alistair Popple wrote:
> Currently, if copy_nonpresent_pte() returns a non-zero value it is
> assumed to be a swap entry which requires further processing outside the
> loop in copy_pte_range() after dropping locks. This prevents other
> values from being returned to signal conditions such as failure, which a
> subsequent change requires.
> 
> Instead, make copy_nonpresent_pte() return an error code if further
> processing is required, and read the value for the swap entry in the main
> loop under the ptl.
> 
> Signed-off-by: Alistair Popple <apopple@nvidia.com>
> 
> ---
> 
> v9:
> 
> New for v9 to allow device exclusive handling to occur in
> copy_nonpresent_pte().
> ---
>  mm/memory.c | 12 +++++++-----
>  1 file changed, 7 insertions(+), 5 deletions(-)
> 
> diff --git a/mm/memory.c b/mm/memory.c
> index 2fb455c365c2..e061cfa18c11 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -718,7 +718,7 @@ copy_nonpresent_pte(struct mm_struct *dst_mm, struct mm_struct *src_mm,
>  
>  	if (likely(!non_swap_entry(entry))) {
>  		if (swap_duplicate(entry) < 0)
> -			return entry.val;
> +			return -EAGAIN;
>  
>  		/* make sure dst_mm is on swapoff's mmlist. */
>  		if (unlikely(list_empty(&dst_mm->mmlist))) {
> @@ -974,11 +974,13 @@ copy_pte_range(struct vm_area_struct *dst_vma, struct vm_area_struct *src_vma,
>  			continue;
>  		}
>  		if (unlikely(!pte_present(*src_pte))) {
> -			entry.val = copy_nonpresent_pte(dst_mm, src_mm,
> -							dst_pte, src_pte,
> -							src_vma, addr, rss);
> -			if (entry.val)
> +			ret = copy_nonpresent_pte(dst_mm, src_mm,
> +						dst_pte, src_pte,
> +						src_vma, addr, rss);
> +			if (ret == -EAGAIN) {
> +				entry = pte_to_swp_entry(*src_pte);
>  				break;
> +			}
>  			progress += 8;
>  			continue;
>  		}

Note that -EAGAIN was previously used by copy_present_page() for the early-cow
case. Later in copy_pte_range(), although we check entry.val first:

	if (entry.val) {
		if (add_swap_count_continuation(entry, GFP_KERNEL) < 0) {
			ret = -ENOMEM;
			goto out;
		}
		entry.val = 0;
	} else if (ret) {
		WARN_ON_ONCE(ret != -EAGAIN);
		prealloc = page_copy_prealloc(src_mm, src_vma, addr);
		if (!prealloc)
			return -ENOMEM;
		/* We've captured and resolved the error. Reset, try again. */
		ret = 0;
	}

We didn't reset "ret" in the entry.val case (maybe we should?). Then in the
next round of "goto again", if "ret" is unluckily left untouched, it could
reach the second if check, and I think that could cause an unexpected
page_copy_prealloc().
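
In other words, the problematic sequence would be (an illustrative
walk-through, not code from the patch):

	/*
	 * 1. copy_nonpresent_pte() returns -EAGAIN; "ret" is set, the swap
	 *    entry is read, and the loop breaks.
	 * 2. Outside the lock, the entry.val branch runs
	 *    add_swap_count_continuation() and clears entry.val, but "ret"
	 *    still holds -EAGAIN.
	 * 3. "goto again" restarts the loop, which can then break early via
	 *    the progress/need_resched() check without touching "ret".
	 * 4. Now entry.val == 0 while ret == -EAGAIN, so the else-if branch
	 *    calls page_copy_prealloc() even though copy_present_page()
	 *    never failed.
	 */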
Alistair Popple May 27, 2021, 1:20 a.m. UTC | #2
On Thursday, 27 May 2021 5:50:05 AM AEST Peter Xu wrote:
> On Mon, May 24, 2021 at 11:27:21PM +1000, Alistair Popple wrote:
> > Currently, if copy_nonpresent_pte() returns a non-zero value it is
> > assumed to be a swap entry which requires further processing outside the
> > loop in copy_pte_range() after dropping locks. This prevents other
> > values from being returned to signal conditions such as failure, which a
> > subsequent change requires.
> > 
> > Instead, make copy_nonpresent_pte() return an error code if further
> > processing is required, and read the value for the swap entry in the main
> > loop under the ptl.
> > 
> > Signed-off-by: Alistair Popple <apopple@nvidia.com>
> > 
> > ---
> > 
> > v9:
> > 
> > New for v9 to allow device exclusive handling to occur in
> > copy_nonpresent_pte().
> > ---
> > 
> >  mm/memory.c | 12 +++++++-----
> >  1 file changed, 7 insertions(+), 5 deletions(-)
> > 
> > diff --git a/mm/memory.c b/mm/memory.c
> > index 2fb455c365c2..e061cfa18c11 100644
> > --- a/mm/memory.c
> > +++ b/mm/memory.c
> > @@ -718,7 +718,7 @@ copy_nonpresent_pte(struct mm_struct *dst_mm, struct mm_struct *src_mm,
> >  
> >  	if (likely(!non_swap_entry(entry))) {
> >  		if (swap_duplicate(entry) < 0)
> > -			return entry.val;
> > +			return -EAGAIN;
> >  
> >  		/* make sure dst_mm is on swapoff's mmlist. */
> >  		if (unlikely(list_empty(&dst_mm->mmlist))) {
> > @@ -974,11 +974,13 @@ copy_pte_range(struct vm_area_struct *dst_vma, struct vm_area_struct *src_vma,
> >  			continue;
> >  		}
> >  		if (unlikely(!pte_present(*src_pte))) {
> > -			entry.val = copy_nonpresent_pte(dst_mm, src_mm,
> > -							dst_pte, src_pte,
> > -							src_vma, addr, rss);
> > -			if (entry.val)
> > +			ret = copy_nonpresent_pte(dst_mm, src_mm,
> > +						dst_pte, src_pte,
> > +						src_vma, addr, rss);
> > +			if (ret == -EAGAIN) {
> > +				entry = pte_to_swp_entry(*src_pte);
> >  				break;
> > +			}
> >  			progress += 8;
> >  			continue;
> >  		}
> 
> Note that -EAGAIN was previously used by copy_present_page() for the early-cow
> case. Later in copy_pte_range(), although we check entry.val first:
> 
>         if (entry.val) {
>                 if (add_swap_count_continuation(entry, GFP_KERNEL) < 0) {
>                         ret = -ENOMEM;
>                         goto out;
>                 }
>                 entry.val = 0;
>         } else if (ret) {
>                 WARN_ON_ONCE(ret != -EAGAIN);
>                 prealloc = page_copy_prealloc(src_mm, src_vma, addr);
>                 if (!prealloc)
>                         return -ENOMEM;
>                 /* We've captured and resolved the error. Reset, try again. */
>                 ret = 0;
>         }
> 
> We didn't reset "ret" in the entry.val case (maybe we should?). Then in the
> next round of "goto again", if "ret" is unluckily left untouched, it could
> reach the second if check, and I think that could cause an unexpected
> page_copy_prealloc().

Thanks, I had considered that but saw that "ret" was always set either by
copy_nonpresent_pte() or copy_present_pte(). However, I missed the "unlucky"
case at the start of the loop:

	if (progress >= 32) {
		progress = 0;
		if (need_resched() ||
				spin_needbreak(src_ptl) || spin_needbreak(dst_ptl))
			break;

Looking at this again though, checking different variables to figure out what
to do outside the locks and reusing error codes seems error prone. I reused
-EAGAIN for copy_nonpresent_pte() simply because it seemed the most sensible
error code, but I don't think that aids readability, and it might be better
to use a unique error code for each case needing extra handling.

So it might be better if I update this patch to:
1) Use unique error codes for each case requiring special handling outside the 
lock.
2) Only check "ret" to determine what to do outside the locks (i.e. not entry.val).
3) Document these.
4) Always reset ret after handling.

Thoughts?

 - Alistair

> --
> Peter Xu
Peter Xu May 27, 2021, 1:44 a.m. UTC | #3
On Thu, May 27, 2021 at 11:20:36AM +1000, Alistair Popple wrote:
> On Thursday, 27 May 2021 5:50:05 AM AEST Peter Xu wrote:
> > On Mon, May 24, 2021 at 11:27:21PM +1000, Alistair Popple wrote:
> > > Currently, if copy_nonpresent_pte() returns a non-zero value it is
> > > assumed to be a swap entry which requires further processing outside the
> > > loop in copy_pte_range() after dropping locks. This prevents other
> > > values from being returned to signal conditions such as failure, which a
> > > subsequent change requires.
> > > 
> > > Instead, make copy_nonpresent_pte() return an error code if further
> > > processing is required, and read the value for the swap entry in the main
> > > loop under the ptl.
> > > 
> > > Signed-off-by: Alistair Popple <apopple@nvidia.com>
> > > 
> > > ---
> > > 
> > > v9:
> > > 
> > > New for v9 to allow device exclusive handling to occur in
> > > copy_nonpresent_pte().
> > > ---
> > > 
> > >  mm/memory.c | 12 +++++++-----
> > >  1 file changed, 7 insertions(+), 5 deletions(-)
> > > 
> > > diff --git a/mm/memory.c b/mm/memory.c
> > > index 2fb455c365c2..e061cfa18c11 100644
> > > --- a/mm/memory.c
> > > +++ b/mm/memory.c
> > > @@ -718,7 +718,7 @@ copy_nonpresent_pte(struct mm_struct *dst_mm, struct mm_struct *src_mm,
> > >  
> > >  	if (likely(!non_swap_entry(entry))) {
> > >  		if (swap_duplicate(entry) < 0)
> > > -			return entry.val;
> > > +			return -EAGAIN;
> > >  
> > >  		/* make sure dst_mm is on swapoff's mmlist. */
> > >  		if (unlikely(list_empty(&dst_mm->mmlist))) {
> > > @@ -974,11 +974,13 @@ copy_pte_range(struct vm_area_struct *dst_vma, struct vm_area_struct *src_vma,
> > >  			continue;
> > >  		}
> > >  		if (unlikely(!pte_present(*src_pte))) {
> > > -			entry.val = copy_nonpresent_pte(dst_mm, src_mm,
> > > -							dst_pte, src_pte,
> > > -							src_vma, addr, rss);
> > > -			if (entry.val)
> > > +			ret = copy_nonpresent_pte(dst_mm, src_mm,
> > > +						dst_pte, src_pte,
> > > +						src_vma, addr, rss);
> > > +			if (ret == -EAGAIN) {
> > > +				entry = pte_to_swp_entry(*src_pte);
> > >  				break;
> > > +			}
> > >  			progress += 8;
> > >  			continue;
> > >  		}
> > 
> > Note that -EAGAIN was previously used by copy_present_page() for the early-cow
> > case. Later in copy_pte_range(), although we check entry.val first:
> > 
> >         if (entry.val) {
> >                 if (add_swap_count_continuation(entry, GFP_KERNEL) < 0) {
> >                         ret = -ENOMEM;
> >                         goto out;
> >                 }
> >                 entry.val = 0;
> >         } else if (ret) {
> >                 WARN_ON_ONCE(ret != -EAGAIN);
> >                 prealloc = page_copy_prealloc(src_mm, src_vma, addr);
> >                 if (!prealloc)
> >                         return -ENOMEM;
> >                 /* We've captured and resolved the error. Reset, try again. */
> >                 ret = 0;
> >         }
> > 
> > We didn't reset "ret" in the entry.val case (maybe we should?). Then in the
> > next round of "goto again", if "ret" is unluckily left untouched, it could
> > reach the second if check, and I think that could cause an unexpected
> > page_copy_prealloc().
> 
> Thanks, I had considered that but saw that "ret" was always set either by
> copy_nonpresent_pte() or copy_present_pte(). However, I missed the "unlucky"
> case at the start of the loop:
> 
> 	if (progress >= 32) {
> 		progress = 0;
> 		if (need_resched() ||
> 				spin_needbreak(src_ptl) || spin_needbreak(dst_ptl))
> 			break;
> 
> Looking at this again though, checking different variables to figure out what
> to do outside the locks and reusing error codes seems error prone. I reused
> -EAGAIN for copy_nonpresent_pte() simply because it seemed the most sensible
> error code, but I don't think that aids readability, and it might be better
> to use a unique error code for each case needing extra handling.
> 
> So it might be better if I update this patch to:
> 1) Use unique error codes for each case requiring special handling outside the 
> lock.
> 2) Only check "ret" to determine what to do outside the locks (i.e. not entry.val).
> 3) Document these.
> 4) Always reset ret after handling.
> 
> Thoughts?

Looks good to me.  Thanks,
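
For reference, a sketch of what points 1), 2) and 4) could look like at the
call site. The specific error codes are illustrative only; -EIO is used here
for the swap-continuation case simply to stay distinct from
copy_present_page()'s -EAGAIN:

	ret = copy_nonpresent_pte(dst_mm, src_mm, dst_pte, src_pte,
				  src_vma, addr, rss);
	if (ret == -EIO) {
		/* swap_duplicate() needs a continuation; still under the ptl. */
		entry = pte_to_swp_entry(*src_pte);
		break;
	}

	/* ... after dropping the locks, dispatch purely on "ret": */
	if (ret == -EIO) {
		if (add_swap_count_continuation(entry, GFP_KERNEL) < 0) {
			ret = -ENOMEM;
			goto out;
		}
		entry.val = 0;
	} else if (ret == -EAGAIN) {
		/* copy_present_page() needs a preallocated page. */
		prealloc = page_copy_prealloc(src_mm, src_vma, addr);
		if (!prealloc)
			return -ENOMEM;
	} else if (ret) {
		VM_WARN_ON_ONCE(1);	/* no other codes are expected */
	}
	/* Error captured and resolved; reset "ret" before retrying. */
	ret = 0;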

Patch

diff --git a/mm/memory.c b/mm/memory.c
index 2fb455c365c2..e061cfa18c11 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -718,7 +718,7 @@ copy_nonpresent_pte(struct mm_struct *dst_mm, struct mm_struct *src_mm,
 
 	if (likely(!non_swap_entry(entry))) {
 		if (swap_duplicate(entry) < 0)
-			return entry.val;
+			return -EAGAIN;
 
 		/* make sure dst_mm is on swapoff's mmlist. */
 		if (unlikely(list_empty(&dst_mm->mmlist))) {
@@ -974,11 +974,13 @@ copy_pte_range(struct vm_area_struct *dst_vma, struct vm_area_struct *src_vma,
 			continue;
 		}
 		if (unlikely(!pte_present(*src_pte))) {
-			entry.val = copy_nonpresent_pte(dst_mm, src_mm,
-							dst_pte, src_pte,
-							src_vma, addr, rss);
-			if (entry.val)
+			ret = copy_nonpresent_pte(dst_mm, src_mm,
+						dst_pte, src_pte,
+						src_vma, addr, rss);
+			if (ret == -EAGAIN) {
+				entry = pte_to_swp_entry(*src_pte);
 				break;
+			}
 			progress += 8;
 			continue;
 		}