diff mbox series

[1/9] mm/migrate.c: Always allow device private pages to migrate

Message ID 20210209010722.13839-2-apopple@nvidia.com (mailing list archive)
State New, archived
Headers show
Series Add support for SVM atomics in Nouveau | expand

Commit Message

Alistair Popple Feb. 9, 2021, 1:07 a.m. UTC
Device private pages are used to represent device memory that is not
directly accessible from the CPU. Extra references to a device private
page are only used to ensure the struct page itself remains valid whilst
waiting for migration entries. Therefore extra references should not
prevent device private page migration as this can lead to failures to
migrate pages back to the CPU which are fatal to the user process.

Signed-off-by: Alistair Popple <apopple@nvidia.com>
---
 mm/migrate.c | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

Comments

Jason Gunthorpe Feb. 9, 2021, 1:39 p.m. UTC | #1
On Tue, Feb 09, 2021 at 12:07:14PM +1100, Alistair Popple wrote:
> Device private pages are used to represent device memory that is not
> directly accessible from the CPU. Extra references to a device private
> page are only used to ensure the struct page itself remains valid whilst
> waiting for migration entries. Therefore extra references should not
> prevent device private page migration as this can lead to failures to
> migrate pages back to the CPU which are fatal to the user process.

This should identify the extra references in expected_count, just
disabling this protection seems unsafe, ZONE_DEVICE is not so special
that the refcount means nothing

Is this a side effect of the extra refcounts that Ralph was trying to
get rid of? I'd rather see that work finished :)

Jason
Alistair Popple Feb. 10, 2021, 3:40 a.m. UTC | #2
On Wednesday, 10 February 2021 12:39:32 AM AEDT Jason Gunthorpe wrote:
> On Tue, Feb 09, 2021 at 12:07:14PM +1100, Alistair Popple wrote:
> > Device private pages are used to represent device memory that is not
> > directly accessible from the CPU. Extra references to a device private
> > page are only used to ensure the struct page itself remains valid whilst
> > waiting for migration entries. Therefore extra references should not
> > prevent device private page migration as this can lead to failures to
> > migrate pages back to the CPU which are fatal to the user process.
> 
> This should identify the extra references in expected_count, just
> disabling this protection seems unsafe, ZONE_DEVICE is not so special
> that the refcount means nothing

This is similar to what migarte_vma_check_page() does now. The issue is that a 
migration wait takes a reference on the device private page so you can end up 
with one thread stuck waiting for migration whilst the other can't migrate due 
to the extra refcount.

Given device private pages can't undergo GUP and that it's not possible to 
differentiate the migration wait refcount from any other refcount we assume 
any possible extra reference must be from migration wait.

> Is this a side effect of the extra refcounts that Ralph was trying to
> get rid of? I'd rather see that work finished :)

I'd like to see that finished too but I don't think it would help here as this 
is not a side effect of that.

 - Alistair

> Jason
Jason Gunthorpe Feb. 10, 2021, 12:56 p.m. UTC | #3
On Wed, Feb 10, 2021 at 02:40:10PM +1100, Alistair Popple wrote:
> On Wednesday, 10 February 2021 12:39:32 AM AEDT Jason Gunthorpe wrote:
> > On Tue, Feb 09, 2021 at 12:07:14PM +1100, Alistair Popple wrote:
> > > Device private pages are used to represent device memory that is not
> > > directly accessible from the CPU. Extra references to a device private
> > > page are only used to ensure the struct page itself remains valid whilst
> > > waiting for migration entries. Therefore extra references should not
> > > prevent device private page migration as this can lead to failures to
> > > migrate pages back to the CPU which are fatal to the user process.
> > 
> > This should identify the extra references in expected_count, just
> > disabling this protection seems unsafe, ZONE_DEVICE is not so special
> > that the refcount means nothing
> 
> This is similar to what migarte_vma_check_page() does now. The issue is that a 
> migration wait takes a reference on the device private page so you can end up 
> with one thread stuck waiting for migration whilst the other can't migrate due 
> to the extra refcount.
> 
> Given device private pages can't undergo GUP and that it's not possible to 
> differentiate the migration wait refcount from any other refcount we assume 
> any possible extra reference must be from migration wait.

GUP is not the only thing that elevates the refcount, I think this is
an unsafe assumption

Why is migration holding an extra refcount anyhow?

Jason
diff mbox series

Patch

diff --git a/mm/migrate.c b/mm/migrate.c
index 20ca887ea769..053228559fd3 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -405,8 +405,13 @@  int migrate_page_move_mapping(struct address_space *mapping,
 	int nr = thp_nr_pages(page);
 
 	if (!mapping) {
-		/* Anonymous page without mapping */
-		if (page_count(page) != expected_count)
+		/*
+		 * Anonymous page without mapping. Device private pages should
+		 * never have extra references except during migration, but it
+		 * is safe to ignore these.
+		 */
+		if (!is_device_private_page(page) &&
+			page_count(page) != expected_count)
 			return -EAGAIN;
 
 		/* No turning back from here */