[v2] mm/gup.c: Simplify and fix check_and_migrate_movable_pages() return codes

Message ID 814dee5d3aadd38c3370eaaf438ba7eee9bf9d2b.1659399696.git-series.apopple@nvidia.com (mailing list archive)
State New

Commit Message

Alistair Popple Aug. 2, 2022, 12:30 a.m. UTC

When pinning pages with FOLL_LONGTERM, check_and_migrate_movable_pages()
is called to migrate pages out of zones which should not contain any
longterm pinned pages.

When migration succeeds all pages will have been unpinned so pinning
needs to be retried. This is indicated by returning zero. When all pages
are in the correct zone the number of pinned pages is returned.

However, migration can also fail, in which case pages are unpinned and
-ENOMEM is returned. If the failure was due to being unable to isolate a
page, zero is returned instead. This leads to indefinite looping in
__gup_longterm_locked().

Fix this by simplifying the return codes such that zero indicates all
pages were successfully pinned in the correct zone while errors indicate
either pages were migrated and pinning should be retried or that
migration has failed and therefore the pinning operation should fail.

This fixes the indefinite looping on page isolation failure by failing
the pin operation instead of retrying indefinitely.

Signed-off-by: Alistair Popple <apopple@nvidia.com>

---

Changes for v2:
 - Changed error handling to be more conventional using goto as
   suggested by Jason.
 - Removed coherent_pages check as it isn't necessary.
---
 mm/gup.c | 81 ++++++++++++++++++++++++++++-----------------------------
 1 file changed, 41 insertions(+), 40 deletions(-)


base-commit: 187e7c41445a0f202bb551f08ca7f8158fea1cd7
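
The return-code convention described in the commit message can be
sketched in userspace C roughly as follows. This is only an illustration
under stated assumptions: the fake_* names and struct fake_pin_state are
hypothetical stand-ins, not kernel code, and the sketch keeps the pinned
count in a separate variable so that a short or failed pin is reported
as-is, which is a choice of the sketch rather than something taken from
the patch.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for the state of the pages being pinned. */
struct fake_pin_state {
	int misplaced;		/* pages in a zone that disallows long-term pins */
	bool can_isolate;	/* false models a page that can never be isolated */
	int attempts;		/* how many times pinning was tried */
};

/* Stand-in for the pinning step: returns the number of pages pinned. */
static long fake_pin_pages(struct fake_pin_state *s, long nr_pages)
{
	assert(nr_pages > 0);
	s->attempts++;
	return nr_pages;
}

/*
 * Models the proposed convention:
 *   0        all pages are pinned in an acceptable zone
 *   -EAGAIN  pages were migrated and unpinned, the caller must pin again
 *   other    migration failed, the caller must give up
 */
static long fake_check_and_migrate(struct fake_pin_state *s)
{
	if (s->misplaced == 0)
		return 0;
	if (!s->can_isolate)
		return -EBUSY;
	s->misplaced = 0;	/* pretend migration succeeded */
	return -EAGAIN;
}

/* Caller loop: retry only on -EAGAIN, report the pinned count on success. */
static long fake_gup_longterm(struct fake_pin_state *s, long nr_pages)
{
	long nr_pinned, rc;

	do {
		nr_pinned = fake_pin_pages(s, nr_pages);
		if (nr_pinned <= 0) {
			rc = nr_pinned;
			break;
		}
		rc = fake_check_and_migrate(s);
	} while (rc == -EAGAIN);

	return rc ? rc : nr_pinned;
}
```

With this shape a page that cannot be isolated ends the loop with an
error after a single attempt, while a successful migration causes exactly
one more pinning pass.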

Comments

Jason Gunthorpe Aug. 2, 2022, 1:50 p.m. UTC | #1
On Tue, Aug 02, 2022 at 10:30:12AM +1000, Alistair Popple wrote:
> When pinning pages with FOLL_LONGTERM check_and_migrate_movable_pages()
> is called to migrate pages out of zones which should not contain any
> longterm pinned pages.
> 
> When migration succeeds all pages will have been unpinned so pinning
> needs to be retried. This is indicated by returning zero. When all pages
> are in the correct zone the number of pinned pages is returned.
> 
> However migration can also fail, in which case pages are unpinned and
> -ENOMEM is returned. However if the failure was due to being unable
> to isolate a page zero is returned. This leads to indefinite looping in
> __gup_longterm_locked().
> 
> Fix this by simplifying the return codes such that zero indicates all
> pages were successfully pinned in the correct zone while errors indicate
> either pages were migrated and pinning should be retried or that
> migration has failed and therefore the pinning operation should fail.
> 
> This fixes the indefinite looping on page isolation failure by failing
> the pin operation instead of retrying indefinitely.
> 
> Signed-off-by: Alistair Popple <apopple@nvidia.com>
> 
> ---
> 
> Changes for v2:
>  - Changed error handling to be more conventional using goto as
>    suggested by Jason.
>  - Removed coherent_pages check as it isn't necessary.
> ---
>  mm/gup.c | 81 ++++++++++++++++++++++++++++-----------------------------
>  1 file changed, 41 insertions(+), 40 deletions(-)
> 
> diff --git a/mm/gup.c b/mm/gup.c
> index 364b274..5707c56 100644
> --- a/mm/gup.c
> +++ b/mm/gup.c
> @@ -1901,20 +1901,24 @@ struct page *get_dump_page(unsigned long addr)
>  
>  #ifdef CONFIG_MIGRATION
>  /*
> - * Check whether all pages are pinnable, if so return number of pages.  If some
> - * pages are not pinnable, migrate them, and unpin all pages. Return zero if
> - * pages were migrated, or if some pages were not successfully isolated.
> - * Return negative error if migration fails.
> + * Check whether all pages are pinnable. If some pages are not pinnable migrate
> + * them and unpin all the pages. Returns -EAGAIN if pages were unpinned or zero
> + * if all pages are pinnable and in the right zone. Other errors indicate
> + * migration failure.
>   */
>  static long check_and_migrate_movable_pages(unsigned long nr_pages,
>  					    struct page **pages,
>  					    unsigned int gup_flags)
>  {
> -	unsigned long isolation_error_count = 0, i;
> +	unsigned long i;
>  	struct folio *prev_folio = NULL;
>  	LIST_HEAD(movable_page_list);
> -	bool drain_allow = true, coherent_pages = false;
> -	int ret = 0;
> +	bool drain_allow = true;
> +	int ret = -EAGAIN;

It looked like every goto error set this? Why initialize it?

It looks OK to me, a lot clearer

Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>

Thanks,
Jason

Pasha Tatashin Aug. 2, 2022, 9:36 p.m. UTC | #2
On Mon, Aug 1, 2022 at 8:32 PM Alistair Popple <apopple@nvidia.com> wrote:
>
> When pinning pages with FOLL_LONGTERM check_and_migrate_movable_pages()
> is called to migrate pages out of zones which should not contain any
> longterm pinned pages.
>
> When migration succeeds all pages will have been unpinned so pinning
> needs to be retried. This is indicated by returning zero. When all pages
> are in the correct zone the number of pinned pages is returned.
>
> However migration can also fail, in which case pages are unpinned and
> -ENOMEM is returned. However if the failure was due to being unable
> to isolate a page zero is returned. This leads to indefinite looping in
> __gup_longterm_locked().

Hi Alistair,

During development of the change prohibiting pinning in the movable
zone, there was a discussion where we concluded that isolation errors
should be transient [1]. What isolation errors are you seeing that lead
to an infinite loop? Why do they happen?

Pasha

[1] https://lore.kernel.org/linux-mm/20201218104655.GW32193@dhcp22.suse.cz

Andrew Morton Aug. 3, 2022, 12:12 a.m. UTC | #3
On Tue,  2 Aug 2022 10:30:12 +1000 Alistair Popple <apopple@nvidia.com> wrote:

> When pinning pages with FOLL_LONGTERM check_and_migrate_movable_pages()
> is called to migrate pages out of zones which should not contain any
> longterm pinned pages.
> 
> When migration succeeds all pages will have been unpinned so pinning
> needs to be retried. This is indicated by returning zero. When all pages
> are in the correct zone the number of pinned pages is returned.
> 
> However migration can also fail, in which case pages are unpinned and
> -ENOMEM is returned. However if the failure was due to being unable
> to isolate a page zero is returned. This leads to indefinite looping in
> __gup_longterm_locked().
> 
> Fix this by simplifying the return codes such that zero indicates all
> pages were successfully pinned in the correct zone while errors indicate
> either pages were migrated and pinning should be retried or that
> migration has failed and therefore the pinning operation should fail.
> 
> This fixes the indefinite looping on page isolation failure by failing
> the pin operation instead of retrying indefinitely.
> 

Are we able to identify a Fixes: for this?  Presumably something in the
series "Add MEMORY_DEVICE_COHERENT for coherent device memory mapping"?

Alistair Popple Aug. 4, 2022, 12:01 a.m. UTC | #4
Pasha Tatashin <pasha.tatashin@soleen.com> writes:

> On Mon, Aug 1, 2022 at 8:32 PM Alistair Popple <apopple@nvidia.com> wrote:
>>
>> When pinning pages with FOLL_LONGTERM check_and_migrate_movable_pages()
>> is called to migrate pages out of zones which should not contain any
>> longterm pinned pages.
>>
>> When migration succeeds all pages will have been unpinned so pinning
>> needs to be retried. This is indicated by returning zero. When all pages
>> are in the correct zone the number of pinned pages is returned.
>>
>> However migration can also fail, in which case pages are unpinned and
>> -ENOMEM is returned. However if the failure was due to being unable
>> to isolate a page zero is returned. This leads to indefinite looping in
>> __gup_longterm_locked().
>
> Hi Alistair,
>
> During development of the change prohibiting pinning in the movable
> zone, there was a discussion where we concluded that isolation errors
> should be transient [1]. What isolation errors are you seeing that lead
> to an infinite loop? Why do they happen?

Thanks for the pointer, Pasha. There were reports of qemu running into
the same zero page problem you reported there, see
https://lore.kernel.org/linux-mm/165490039431.944052.12458624139225785964.stgit@omen/

This doesn't directly fix that problem, as we need to allow pinning of
the zero page, but it does prevent the infinite loop. I was going to
re-spin this patch to retry instead of failing instantly. However,
reading that thread, it seems the infinite loop is the desired
behaviour, so I will re-spin this to leave that in place.

 - Alistair
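
To make the looping behaviour under discussion concrete, here is a small
userspace sketch of the old convention, in which zero means "retry" both
after a successful migration and after an isolation failure. The names
old_check and old_retry_passes and the page_never_isolates flag are
hypothetical, and the iteration cap exists only so that the demonstration
terminates. The flag is meant to model a page that isolation keeps
rejecting, such as the zero page case mentioned in the thread.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Old convention, as described in the commit message:
 *   > 0  number of pages, all in an acceptable zone
 *   0    pages were migrated OR some page could not be isolated: retry
 *   < 0  migration failed (not modelled here)
 */
static long old_check(long nr_pages, bool page_never_isolates)
{
	assert(nr_pages > 0);
	return page_never_isolates ? 0 : nr_pages;
}

/*
 * Same loop shape as "while (!rc)", with a cap added only for this
 * demonstration. Returns the number of passes that were made.
 */
static int old_retry_passes(long nr_pages, bool page_never_isolates, int cap)
{
	int passes = 0;
	long rc;

	do {
		passes++;
		rc = old_check(nr_pages, page_never_isolates);
	} while (!rc && passes < cap);

	return passes;
}
```

When the isolation failure is transient the loop exits once it clears.
When it is permanent, the only thing stopping the sketch is the cap.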

> Pasha
>
> [1] https://lore.kernel.org/linux-mm/20201218104655.GW32193@dhcp22.suse.cz

Alistair Popple Aug. 4, 2022, 12:12 a.m. UTC | #5
Andrew Morton <akpm@linux-foundation.org> writes:

> On Tue,  2 Aug 2022 10:30:12 +1000 Alistair Popple <apopple@nvidia.com> wrote:
>
>> When pinning pages with FOLL_LONGTERM check_and_migrate_movable_pages()
>> is called to migrate pages out of zones which should not contain any
>> longterm pinned pages.
>>
>> When migration succeeds all pages will have been unpinned so pinning
>> needs to be retried. This is indicated by returning zero. When all pages
>> are in the correct zone the number of pinned pages is returned.
>>
>> However migration can also fail, in which case pages are unpinned and
>> -ENOMEM is returned. However if the failure was due to being unable
>> to isolate a page zero is returned. This leads to indefinite looping in
>> __gup_longterm_locked().
>>
>> Fix this by simplifying the return codes such that zero indicates all
>> pages were successfully pinned in the correct zone while errors indicate
>> either pages were migrated and pinning should be retried or that
>> migration has failed and therefore the pinning operation should fail.
>>
>> This fixes the indefinite looping on page isolation failure by failing
>> the pin operation instead of retrying indefinitely.
>>
>
> Are we able to identify a Fixes: for this?  Presumably something in the
> series "Add MEMORY_DEVICE_COHERENT for coherent device memory mapping"?

It seems the infinite loop was desired behaviour so I will re-spin this
as a pure clean-up.

David Hildenbrand Aug. 4, 2022, 7:40 a.m. UTC | #6
On 04.08.22 02:12, Alistair Popple wrote:
> 
> Andrew Morton <akpm@linux-foundation.org> writes:
> 
>> On Tue,  2 Aug 2022 10:30:12 +1000 Alistair Popple <apopple@nvidia.com> wrote:
>>
>>> When pinning pages with FOLL_LONGTERM check_and_migrate_movable_pages()
>>> is called to migrate pages out of zones which should not contain any
>>> longterm pinned pages.
>>>
>>> When migration succeeds all pages will have been unpinned so pinning
>>> needs to be retried. This is indicated by returning zero. When all pages
>>> are in the correct zone the number of pinned pages is returned.
>>>
>>> However migration can also fail, in which case pages are unpinned and
>>> -ENOMEM is returned. However if the failure was due to being unable
>>> to isolate a page zero is returned. This leads to indefinite looping in
>>> __gup_longterm_locked().
>>>
>>> Fix this by simplifying the return codes such that zero indicates all
>>> pages were successfully pinned in the correct zone while errors indicate
>>> either pages were migrated and pinning should be retried or that
>>> migration has failed and therefore the pinning operation should fail.
>>>
>>> This fixes the indefinite looping on page isolation failure by failing
>>> the pin operation instead of retrying indefinitely.
>>>
>>
>> Are we able to identify a Fixes: for this?  Presumably something in the
>> series "Add MEMORY_DEVICE_COHERENT for coherent device memory mapping"?
> 
> It seems the infinite loop was desired behaviour so I will re-spin this
> as a pure clean-up.
> 

How can the infinite loop trigger when we allow longterm-pinning the
shared zeropage? (note: disallowing that for now was a bug)

Alistair Popple Aug. 4, 2022, 9:57 a.m. UTC | #7
David Hildenbrand <david@redhat.com> writes:

> On 04.08.22 02:12, Alistair Popple wrote:
>>
>> Andrew Morton <akpm@linux-foundation.org> writes:
>>
>>> On Tue,  2 Aug 2022 10:30:12 +1000 Alistair Popple <apopple@nvidia.com> wrote:
>>>
>>>> When pinning pages with FOLL_LONGTERM check_and_migrate_movable_pages()
>>>> is called to migrate pages out of zones which should not contain any
>>>> longterm pinned pages.
>>>>
>>>> When migration succeeds all pages will have been unpinned so pinning
>>>> needs to be retried. This is indicated by returning zero. When all pages
>>>> are in the correct zone the number of pinned pages is returned.
>>>>
>>>> However migration can also fail, in which case pages are unpinned and
>>>> -ENOMEM is returned. However if the failure was due to being unable
>>>> to isolate a page zero is returned. This leads to indefinite looping in
>>>> __gup_longterm_locked().
>>>>
>>>> Fix this by simplifying the return codes such that zero indicates all
>>>> pages were successfully pinned in the correct zone while errors indicate
>>>> either pages were migrated and pinning should be retried or that
>>>> migration has failed and therefore the pinning operation should fail.
>>>>
>>>> This fixes the indefinite looping on page isolation failure by failing
>>>> the pin operation instead of retrying indefinitely.
>>>>
>>>
>>> Are we able to identify a Fixes: for this?  Presumably something in the
>>> series "Add MEMORY_DEVICE_COHERENT for coherent device memory mapping"?
>>
>> It seems the infinite loop was desired behaviour so I will re-spin this
>> as a pure clean-up.
>>
>
> How can the infinite loop trigger when we allow longterm-pinning the
> shared zeropage? (note: disallowing that for now was a bug)

Right, I don't know of any other triggers, so based on the discussion
Pasha pointed me at, I think the infinite loop is probably fine unless
there are other bugs.

Apologies, I should have copied you on the new version, which is just a
clean-up now:
https://lore.kernel.org/linux-mm/20220804032241.859891-1-apopple@nvidia.com/

Patch

diff --git a/mm/gup.c b/mm/gup.c
index 364b274..5707c56 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -1901,20 +1901,24 @@  struct page *get_dump_page(unsigned long addr)
 
 #ifdef CONFIG_MIGRATION
 /*
- * Check whether all pages are pinnable, if so return number of pages.  If some
- * pages are not pinnable, migrate them, and unpin all pages. Return zero if
- * pages were migrated, or if some pages were not successfully isolated.
- * Return negative error if migration fails.
+ * Check whether all pages are pinnable. If some pages are not pinnable migrate
+ * them and unpin all the pages. Returns -EAGAIN if pages were unpinned or zero
+ * if all pages are pinnable and in the right zone. Other errors indicate
+ * migration failure.
  */
 static long check_and_migrate_movable_pages(unsigned long nr_pages,
 					    struct page **pages,
 					    unsigned int gup_flags)
 {
-	unsigned long isolation_error_count = 0, i;
+	unsigned long i;
 	struct folio *prev_folio = NULL;
 	LIST_HEAD(movable_page_list);
-	bool drain_allow = true, coherent_pages = false;
-	int ret = 0;
+	bool drain_allow = true;
+	int ret = -EAGAIN;
+	struct migration_target_control mtc = {
+		.nid = NUMA_NO_NODE,
+		.gfp_mask = GFP_USER | __GFP_NOWARN,
+	};
 
 	for (i = 0; i < nr_pages; i++) {
 		struct folio *folio = page_folio(pages[i]);
@@ -1935,7 +1939,6 @@  static long check_and_migrate_movable_pages(unsigned long nr_pages,
 			 * pages.
 			 */
 			pages[i] = 0;
-			coherent_pages = true;
 
 			/*
 			 * Migration will fail if the page is pinned, so convert
@@ -1946,10 +1949,10 @@  static long check_and_migrate_movable_pages(unsigned long nr_pages,
 				unpin_user_page(&folio->page);
 			}
 
-			ret = migrate_device_coherent_page(&folio->page);
-			if (ret)
-				goto unpin_pages;
-
+			if (migrate_device_coherent_page(&folio->page)) {
+				ret = -EBUSY;
+				goto error;
+			}
 			continue;
 		}
 
@@ -1960,8 +1963,10 @@  static long check_and_migrate_movable_pages(unsigned long nr_pages,
 		 */
 		if (folio_test_hugetlb(folio)) {
 			if (isolate_hugetlb(&folio->page,
-						&movable_page_list))
-				isolation_error_count++;
+						&movable_page_list)) {
+				ret = -EBUSY;
+				goto error;
+			}
 			continue;
 		}
 
@@ -1971,28 +1976,26 @@  static long check_and_migrate_movable_pages(unsigned long nr_pages,
 		}
 
 		if (folio_isolate_lru(folio)) {
-			isolation_error_count++;
-			continue;
+			ret = -EBUSY;
+			goto error;
 		}
+
 		list_add_tail(&folio->lru, &movable_page_list);
 		node_stat_mod_folio(folio,
 				    NR_ISOLATED_ANON + folio_is_file_lru(folio),
 				    folio_nr_pages(folio));
 	}
 
-	if (!list_empty(&movable_page_list) || isolation_error_count
-		|| coherent_pages)
-		goto unpin_pages;
-
 	/*
-	 * If list is empty, and no isolation errors, means that all pages are
-	 * in the correct zone.
+	 * All pages are in the correct zone.
 	 */
-	return nr_pages;
+	if (list_empty(&movable_page_list))
+		return 0;
 
-unpin_pages:
 	/*
-	 * pages[i] might be NULL if any device coherent pages were found.
+	 * Unpin all pages. If device coherent pages were found
+	 * migrate_device_coherent_page() will have already dropped the pin and
+	 * set pages[i] == NULL.
 	 */
 	for (i = 0; i < nr_pages; i++) {
 		if (!pages[i])
@@ -2004,21 +2007,19 @@  static long check_and_migrate_movable_pages(unsigned long nr_pages,
 			put_page(pages[i]);
 	}
 
-	if (!list_empty(&movable_page_list)) {
-		struct migration_target_control mtc = {
-			.nid = NUMA_NO_NODE,
-			.gfp_mask = GFP_USER | __GFP_NOWARN,
-		};
-
-		ret = migrate_pages(&movable_page_list, alloc_migration_target,
-				    NULL, (unsigned long)&mtc, MIGRATE_SYNC,
-				    MR_LONGTERM_PIN, NULL);
-		if (ret > 0) /* number of pages not migrated */
-			ret = -ENOMEM;
+	if (migrate_pages(&movable_page_list, alloc_migration_target,
+				NULL, (unsigned long)&mtc, MIGRATE_SYNC,
+				MR_LONGTERM_PIN, NULL)) {
+		ret = -ENOMEM;
+		goto error;
 	}
 
-	if (ret && !list_empty(&movable_page_list))
+	return -EAGAIN;
+
+error:
+	if (!list_empty(&movable_page_list))
 		putback_movable_pages(&movable_page_list);
+
 	return ret;
 }
 #else
@@ -2026,7 +2027,7 @@  static long check_and_migrate_movable_pages(unsigned long nr_pages,
 					    struct page **pages,
 					    unsigned int gup_flags)
 {
-	return nr_pages;
+	return 0;
 }
 #endif /* CONFIG_MIGRATION */
 
@@ -2054,10 +2055,10 @@  static long __gup_longterm_locked(struct mm_struct *mm,
 		if (rc <= 0)
 			break;
 		rc = check_and_migrate_movable_pages(rc, pages, gup_flags);
-	} while (!rc);
+	} while (rc == -EAGAIN);
 	memalloc_pin_restore(flags);
 
-	return rc;
+	return rc ? rc : nr_pages;
 }
 
 static bool is_valid_gup_flags(unsigned int gup_flags)
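
The error handling in the patch follows the common single-exit goto
pattern: each failure site assigns an error code and jumps to one label
that undoes partial work. A minimal userspace sketch of that pattern is
shown below. struct item, process_items and the migrate_ok parameter are
hypothetical and only mirror the shape of the function, not its actual
behaviour. Because every jump to the label sets ret first, the sketch
leaves ret uninitialised; whether the patch should do the same is the
question raised in review, and the sketch does not settle it.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical work item standing in for a page. */
struct item {
	bool needs_move;	/* item is in a place it must leave */
	bool can_isolate;	/* isolation of this item would succeed */
	bool on_list;		/* item is currently on the "to move" list */
};

static int process_items(struct item *items, size_t n, bool migrate_ok)
{
	size_t i, listed = 0;
	int ret;	/* every path to "error" assigns this first */

	assert(items != NULL || n == 0);

	for (i = 0; i < n; i++) {
		if (!items[i].needs_move)
			continue;
		if (!items[i].can_isolate) {
			ret = -EBUSY;
			goto error;
		}
		items[i].on_list = true;
		listed++;
	}

	/* Nothing to move: everything is already in place. */
	if (listed == 0)
		return 0;

	if (!migrate_ok) {
		ret = -ENOMEM;
		goto error;
	}

	/* The list is consumed by a successful move; caller must retry. */
	for (i = 0; i < n; i++)
		items[i].on_list = false;
	return -EAGAIN;

error:
	/* Undo partial work, analogous to putting isolated pages back. */
	for (i = 0; i < n; i++)
		items[i].on_list = false;
	return ret;
}
```

Keeping the cleanup behind one label means a new failure site only needs
to set ret and jump, rather than repeating the undo logic.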