mm/migrate.c: Rework migration_entry_wait() to not take a pageref

Message ID 20211104103338.891258-1-apopple@nvidia.com (mailing list archive)
State New
Series mm/migrate.c: Rework migration_entry_wait() to not take a pageref

Commit Message

Alistair Popple Nov. 4, 2021, 10:33 a.m. UTC
This fixes a FIXME in migrate_vma_check_page().

Before migrating a page, the migration code takes a reference and checks
that there are no unexpected page references, failing the migration if
there are. When a thread faults on a migration entry it takes a temporary
reference to the page so it can wait for the page to become unlocked,
which signifies that the migration entry has been removed.

This reference is dropped just prior to waiting on the page lock;
however, the extra reference can cause migration failures, so it is
desirable to avoid taking it.

As the migration code already holds a reference to the migrating page,
an extra reference to wait on PG_locked is unnecessary so long as that
reference can't be dropped whilst setting up the wait.

When faulting on a migration entry, the ptl is taken to check the entry.
Removing a migration entry also requires the ptl, and
migration code won't drop its page reference until after the migration
entry has been removed. Therefore retaining the ptl of a migration entry
is sufficient to ensure the page has a reference. Reworking
migration_entry_wait() to hold the ptl until the wait setup is complete
means the extra page reference is no longer needed.
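
In outline, the change to __migration_entry_wait() is (simplified from the
diff below, error handling omitted):

	/* Before: take a temporary reference, drop the ptl, then wait. */
	if (!get_page_unless_zero(page))
		goto out;
	pte_unmap_unlock(ptep, ptl);
	put_and_wait_on_page_locked(page, TASK_UNINTERRUPTIBLE);

	/* After: keep holding the ptl while the wait is set up. The
	 * migrating task must take the ptl to remove the migration entry
	 * and only drops its page reference after doing so, so no extra
	 * reference is needed here.
	 */
	migration_entry_wait_on_locked(page, ptep, ptl);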

Signed-off-by: Alistair Popple <apopple@nvidia.com>
---
 include/linux/pagemap.h |  2 +
 mm/filemap.c            | 82 +++++++++++++++++++++++++++++++++++++++++
 mm/migrate.c            | 28 ++------------
 3 files changed, 87 insertions(+), 25 deletions(-)

Comments

Matthew Wilcox (Oracle) Nov. 4, 2021, 12:21 p.m. UTC | #1
On Thu, Nov 04, 2021 at 09:33:38PM +1100, Alistair Popple wrote:
> +++ b/mm/filemap.c
> @@ -1356,6 +1356,88 @@ static inline int wait_on_page_bit_common(wait_queue_head_t *q,
>  	return wait->flags & WQ_FLAG_WOKEN ? 0 : -EINTR;
>  }
>  
> +/**
> + * migration_entry_wait_on_locked - Wait for a migration entry to be removed
> + * @page: page referenced by the migration entry.
> + * @ptep: mapped pte pointer. This function will return with the ptep unmapped.
> + * @ptl: already locked ptl. This function will drop the lock.
> + *
> + * Wait for a migration entry referencing the given page to be removed. This is
> + * equivalent to put_and_wait_on_page_locked(page, TASK_UNINTERRUPTIBLE) except
> + * this can be called without taking a reference on the page. Instead this
> + * should be called while holding the ptl for the migration entry referencing
> + * the page.
> + *
> + * Returns after unmapping and unlocking the pte/ptl with pte_unmap_unlock().
> + *
> + * This follows the same logic as wait_on_page_bit_common() so see the comments
> + * there.
> + */
> +void migration_entry_wait_on_locked(struct page *page, pte_t *ptep,
> +				spinlock_t *ptl)
> +{
> +	struct wait_page_queue wait_page;
> +	wait_queue_entry_t *wait = &wait_page.wait;
> +	bool thrashing = false;
> +	bool delayacct = false;
> +	unsigned long pflags;
> +	wait_queue_head_t *q;
> +
> +	q = page_waitqueue(page);

You're going to need to update this patch to apply to Linus' current
tree; page_waitqueue() went away in favour of folio_waitqueue().

It seems like it would look simpler if this were a patch which modified
folio_wait_bit_common() instead of doing a manual inline of it into
this function.
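
For reference, after rebasing onto the folio conversion the waitqueue setup
above would look roughly like the sketch below. The folio_* helpers are the
ones now in mm/filemap.c; the exact conversion shown here is illustrative
rather than a posted version of the patch:

	struct folio *folio = page_folio(page);
	struct wait_page_queue wait_page;
	wait_queue_entry_t *wait = &wait_page.wait;
	wait_queue_head_t *q;

	/* folio_waitqueue() replaces the removed page_waitqueue() */
	q = folio_waitqueue(folio);

	init_wait(wait);
	wait->func = wake_page_function;
	wait_page.folio = folio;	/* struct wait_page_queue now carries a folio */
	wait_page.bit_nr = PG_locked;
	wait->flags = 0;

	spin_lock_irq(&q->lock);
	folio_set_waiters(folio);
	if (!folio_trylock_flag(folio, PG_locked, wait))
		__add_wait_queue_entry_tail(q, wait);
	spin_unlock_irq(&q->lock);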
Alistair Popple Nov. 5, 2021, 7:02 a.m. UTC | #2
On Thursday, 4 November 2021 11:21:51 PM AEDT Matthew Wilcox wrote:
> On Thu, Nov 04, 2021 at 09:33:38PM +1100, Alistair Popple wrote:
> > +++ b/mm/filemap.c
> > @@ -1356,6 +1356,88 @@ static inline int wait_on_page_bit_common(wait_queue_head_t *q,
> >  	return wait->flags & WQ_FLAG_WOKEN ? 0 : -EINTR;
> >  }
> >  
> > +/**
> > + * migration_entry_wait_on_locked - Wait for a migration entry to be removed
> > + * @page: page referenced by the migration entry.
> > + * @ptep: mapped pte pointer. This function will return with the ptep unmapped.
> > + * @ptl: already locked ptl. This function will drop the lock.
> > + *
> > + * Wait for a migration entry referencing the given page to be removed. This is
> > + * equivalent to put_and_wait_on_page_locked(page, TASK_UNINTERRUPTIBLE) except
> > + * this can be called without taking a reference on the page. Instead this
> > + * should be called while holding the ptl for the migration entry referencing
> > + * the page.
> > + *
> > + * Returns after unmapping and unlocking the pte/ptl with pte_unmap_unlock().
> > + *
> > + * This follows the same logic as wait_on_page_bit_common() so see the comments
> > + * there.
> > + */
> > +void migration_entry_wait_on_locked(struct page *page, pte_t *ptep,
> > +				spinlock_t *ptl)
> > +{
> > +	struct wait_page_queue wait_page;
> > +	wait_queue_entry_t *wait = &wait_page.wait;
> > +	bool thrashing = false;
> > +	bool delayacct = false;
> > +	unsigned long pflags;
> > +	wait_queue_head_t *q;
> > +
> > +	q = page_waitqueue(page);
> 
> You're going to need to update this patch to apply to Linus' current
> tree; page_waitqueue() went away in favour of folio_waitqueue().

Argh, thanks I had meant to rebase before sending.

> It seems like it would look simpler if this were a patch which modified
> folio_wait_bit_common() instead of doing a manual inline of it into
> this function.

Yes, happy for some opinions here. I was debating a manual inline vs. modifying
folio_wait_bit_common(), but felt that two additional special-case arguments
would make things a bit messy, and there was no obvious way to refactor or
split up folio_wait_bit_common().

However, I just noticed that wait and wait_page are related, so I might be able
to refactor some of the initialisation to reduce code duplication. I will resend
a rebased version doing that.
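
For illustration, the shared initialisation is small enough that a helper
along these lines could be factored out of wait_on_page_bit_common() and the
new function (the helper name and shape are hypothetical, not from a posted
patch):

	/* Hypothetical helper; performs the setup the posted patch open-codes. */
	static void wait_page_init(struct wait_page_queue *wait_page,
				   struct page *page, int bit_nr)
	{
		wait_queue_entry_t *wait = &wait_page->wait;

		init_wait(wait);
		wait->func = wake_page_function;
		wait_page->page = page;
		wait_page->bit_nr = bit_nr;
		wait->flags = 0;
	}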
kernel test robot Nov. 5, 2021, 9:50 a.m. UTC | #3
Hi Alistair,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on linux/master]
[also build test ERROR on v5.15]
[cannot apply to hnaz-mm/master linus/master next-20211105]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:    https://github.com/0day-ci/linux/commits/Alistair-Popple/mm-migrate-c-Rework-migration_entry_wait-to-not-take-a-pageref/20211104-183442
base:   https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git 2f111a6fd5b5297b4e92f53798ca086f7c7d33a4
config: arm-randconfig-r026-20211105 (attached as .config)
compiler: clang version 14.0.0 (https://github.com/llvm/llvm-project 847a6807332b13f43704327c2d30103ec0347c77)
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # install arm cross compiling tool for clang build
        # apt-get install binutils-arm-linux-gnueabi
        # https://github.com/0day-ci/linux/commit/e9447498f8f8758741f3dae044c3e4593130595c
        git remote add linux-review https://github.com/0day-ci/linux
        git fetch --no-tags linux-review Alistair-Popple/mm-migrate-c-Rework-migration_entry_wait-to-not-take-a-pageref/20211104-183442
        git checkout e9447498f8f8758741f3dae044c3e4593130595c
        # save the attached .config to linux build tree
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross W=1 ARCH=arm 

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>

All errors (new ones prefixed by >>):

>> mm/filemap.c:1468:2: error: implicit declaration of function 'pte_unmap' [-Werror,-Wimplicit-function-declaration]
           pte_unmap_unlock(ptep, ptl);
           ^
   include/linux/mm.h:2275:2: note: expanded from macro 'pte_unmap_unlock'
           pte_unmap(pte);                                 \
           ^
   1 error generated.


vim +/pte_unmap +1468 mm/filemap.c

  1413	
  1414	/**
  1415	 * migration_entry_wait_on_locked - Wait for a migration entry to be removed
  1416	 * @page: page referenced by the migration entry.
  1417	 * @ptep: mapped pte pointer. This function will return with the ptep unmapped.
  1418	 * @ptl: already locked ptl. This function will drop the lock.
  1419	 *
  1420	 * Wait for a migration entry referencing the given page to be removed. This is
  1421	 * equivalent to put_and_wait_on_page_locked(page, TASK_UNINTERRUPTIBLE) except
  1422	 * this can be called without taking a reference on the page. Instead this
  1423	 * should be called while holding the ptl for the migration entry referencing
  1424	 * the page.
  1425	 *
  1426	 * Returns after unmapping and unlocking the pte/ptl with pte_unmap_unlock().
  1427	 *
  1428	 * This follows the same logic as wait_on_page_bit_common() so see the comments
  1429	 * there.
  1430	 */
  1431	void migration_entry_wait_on_locked(struct page *page, pte_t *ptep,
  1432					spinlock_t *ptl)
  1433	{
  1434		struct wait_page_queue wait_page;
  1435		wait_queue_entry_t *wait = &wait_page.wait;
  1436		bool thrashing = false;
  1437		bool delayacct = false;
  1438		unsigned long pflags;
  1439		wait_queue_head_t *q;
  1440	
  1441		q = page_waitqueue(page);
  1442		if (!PageUptodate(page) && PageWorkingset(page)) {
  1443			if (!PageSwapBacked(page)) {
  1444				delayacct_thrashing_start();
  1445				delayacct = true;
  1446			}
  1447			psi_memstall_enter(&pflags);
  1448			thrashing = true;
  1449		}
  1450	
  1451		init_wait(wait);
  1452		wait->func = wake_page_function;
  1453		wait_page.page = page;
  1454		wait_page.bit_nr = PG_locked;
  1455		wait->flags = 0;
  1456	
  1457		spin_lock_irq(&q->lock);
  1458		SetPageWaiters(page);
  1459		if (!trylock_page_bit_common(page, PG_locked, wait))
  1460			__add_wait_queue_entry_tail(q, wait);
  1461		spin_unlock_irq(&q->lock);
  1462	
  1463		/*
  1464		 * If a migration entry exists for the page the migration path must hold
  1465		 * a valid reference to the page, and it must take the ptl to remove the
  1466		 * migration entry. So the page is valid until the ptl is dropped.
  1467		 */
> 1468		pte_unmap_unlock(ptep, ptl);
  1469	
  1470		for (;;) {
  1471			unsigned int flags;
  1472	
  1473			set_current_state(TASK_UNINTERRUPTIBLE);
  1474	
  1475			/* Loop until we've been woken or interrupted */
  1476			flags = smp_load_acquire(&wait->flags);
  1477			if (!(flags & WQ_FLAG_WOKEN)) {
  1478				if (signal_pending_state(TASK_UNINTERRUPTIBLE, current))
  1479					break;
  1480	
  1481				io_schedule();
  1482				continue;
  1483			}
  1484			break;
  1485		}
  1486	
  1487		finish_wait(q, wait);
  1488	
  1489		if (thrashing) {
  1490			if (delayacct)
  1491				delayacct_thrashing_end();
  1492			psi_memstall_leave(&pflags);
  1493		}
  1494	}
  1495	

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org
Alistair Popple Nov. 8, 2021, 7:21 a.m. UTC | #4
Got this after sending v2, but it will have the same problem. It occurs for
CONFIG_MMU=n builds, which seem to be broken anyway due to other arch build
errors (at least for Arm SA1100 based builds). Fixing this is easy enough
though - only defining migration_entry_wait_on_locked() when CONFIG_MIGRATION=y
fixes the error and is probably a good idea anyway.
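
Roughly, that means wrapping the definition in mm/filemap.c (and optionally
the declaration in pagemap.h) as sketched below; the exact form is an
assumption since no v3 has been posted yet. Because CONFIG_MIGRATION depends
on CONFIG_MMU, the pte_unmap_unlock() call is then never built for nommu
configurations:

	/* mm/filemap.c */
	#ifdef CONFIG_MIGRATION
	void migration_entry_wait_on_locked(struct page *page, pte_t *ptep,
					spinlock_t *ptl)
	{
		/* body unchanged from the posted patch */
	}
	#endif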

I will wait a bit for feedback on v2 before sending v3 with this fix.

 - Alistair

On Friday, 5 November 2021 8:50:08 PM AEDT kernel test robot wrote:
> Hi Alistair,
> 
> Thank you for the patch! Yet something to improve:
> 
> [auto build test ERROR on linux/master]
> [also build test ERROR on v5.15]
> [cannot apply to hnaz-mm/master linus/master next-20211105]
> 
> All errors (new ones prefixed by >>):
> 
> >> mm/filemap.c:1468:2: error: implicit declaration of function 'pte_unmap' [-Werror,-Wimplicit-function-declaration]
>            pte_unmap_unlock(ptep, ptl);
>            ^
>    include/linux/mm.h:2275:2: note: expanded from macro 'pte_unmap_unlock'
>            pte_unmap(pte);                                 \
>            ^
>    1 error generated.
> 
> [...]
Patch

diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
index ed02aa522263..00e4cbde6ec5 100644
--- a/include/linux/pagemap.h
+++ b/include/linux/pagemap.h
@@ -696,6 +696,8 @@  static inline int wait_on_page_locked_killable(struct page *page)
 	return wait_on_page_bit_killable(compound_head(page), PG_locked);
 }
 
+void migration_entry_wait_on_locked(struct page *page, pte_t *ptep,
+				spinlock_t *ptl);
 int put_and_wait_on_page_locked(struct page *page, int state);
 void wait_on_page_writeback(struct page *page);
 int wait_on_page_writeback_killable(struct page *page);
diff --git a/mm/filemap.c b/mm/filemap.c
index d1458ecf2f51..53fa8f8576fd 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -1356,6 +1356,88 @@  static inline int wait_on_page_bit_common(wait_queue_head_t *q,
 	return wait->flags & WQ_FLAG_WOKEN ? 0 : -EINTR;
 }
 
+/**
+ * migration_entry_wait_on_locked - Wait for a migration entry to be removed
+ * @page: page referenced by the migration entry.
+ * @ptep: mapped pte pointer. This function will return with the ptep unmapped.
+ * @ptl: already locked ptl. This function will drop the lock.
+ *
+ * Wait for a migration entry referencing the given page to be removed. This is
+ * equivalent to put_and_wait_on_page_locked(page, TASK_UNINTERRUPTIBLE) except
+ * this can be called without taking a reference on the page. Instead this
+ * should be called while holding the ptl for the migration entry referencing
+ * the page.
+ *
+ * Returns after unmapping and unlocking the pte/ptl with pte_unmap_unlock().
+ *
+ * This follows the same logic as wait_on_page_bit_common() so see the comments
+ * there.
+ */
+void migration_entry_wait_on_locked(struct page *page, pte_t *ptep,
+				spinlock_t *ptl)
+{
+	struct wait_page_queue wait_page;
+	wait_queue_entry_t *wait = &wait_page.wait;
+	bool thrashing = false;
+	bool delayacct = false;
+	unsigned long pflags;
+	wait_queue_head_t *q;
+
+	q = page_waitqueue(page);
+	if (!PageUptodate(page) && PageWorkingset(page)) {
+		if (!PageSwapBacked(page)) {
+			delayacct_thrashing_start();
+			delayacct = true;
+		}
+		psi_memstall_enter(&pflags);
+		thrashing = true;
+	}
+
+	init_wait(wait);
+	wait->func = wake_page_function;
+	wait_page.page = page;
+	wait_page.bit_nr = PG_locked;
+	wait->flags = 0;
+
+	spin_lock_irq(&q->lock);
+	SetPageWaiters(page);
+	if (!trylock_page_bit_common(page, PG_locked, wait))
+		__add_wait_queue_entry_tail(q, wait);
+	spin_unlock_irq(&q->lock);
+
+	/*
+	 * If a migration entry exists for the page the migration path must hold
+	 * a valid reference to the page, and it must take the ptl to remove the
+	 * migration entry. So the page is valid until the ptl is dropped.
+	 */
+	pte_unmap_unlock(ptep, ptl);
+
+	for (;;) {
+		unsigned int flags;
+
+		set_current_state(TASK_UNINTERRUPTIBLE);
+
+		/* Loop until we've been woken or interrupted */
+		flags = smp_load_acquire(&wait->flags);
+		if (!(flags & WQ_FLAG_WOKEN)) {
+			if (signal_pending_state(TASK_UNINTERRUPTIBLE, current))
+				break;
+
+			io_schedule();
+			continue;
+		}
+		break;
+	}
+
+	finish_wait(q, wait);
+
+	if (thrashing) {
+		if (delayacct)
+			delayacct_thrashing_end();
+		psi_memstall_leave(&pflags);
+	}
+}
+
 void wait_on_page_bit(struct page *page, int bit_nr)
 {
 	wait_queue_head_t *q = page_waitqueue(page);
diff --git a/mm/migrate.c b/mm/migrate.c
index 7e240437e7d9..2218f65b01c4 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -304,15 +304,7 @@  void __migration_entry_wait(struct mm_struct *mm, pte_t *ptep,
 	page = pfn_swap_entry_to_page(entry);
 	page = compound_head(page);
 
-	/*
-	 * Once page cache replacement of page migration started, page_count
-	 * is zero; but we must not call put_and_wait_on_page_locked() without
-	 * a ref. Use get_page_unless_zero(), and just fault again if it fails.
-	 */
-	if (!get_page_unless_zero(page))
-		goto out;
-	pte_unmap_unlock(ptep, ptl);
-	put_and_wait_on_page_locked(page, TASK_UNINTERRUPTIBLE);
+	migration_entry_wait_on_locked(page, ptep, ptl);
 	return;
 out:
 	pte_unmap_unlock(ptep, ptl);
@@ -2406,22 +2398,8 @@  static bool migrate_vma_check_page(struct page *page)
 		return false;
 
 	/* Page from ZONE_DEVICE have one extra reference */
-	if (is_zone_device_page(page)) {
-		/*
-		 * Private page can never be pin as they have no valid pte and
-		 * GUP will fail for those. Yet if there is a pending migration
-		 * a thread might try to wait on the pte migration entry and
-		 * will bump the page reference count. Sadly there is no way to
-		 * differentiate a regular pin from migration wait. Hence to
-		 * avoid 2 racing thread trying to migrate back to CPU to enter
-		 * infinite loop (one stopping migration because the other is
-		 * waiting on pte migration entry). We always return true here.
-		 *
-		 * FIXME proper solution is to rework migration_entry_wait() so
-		 * it does not need to take a reference on page.
-		 */
-		return is_device_private_page(page);
-	}
+	if (is_zone_device_page(page))
+		extra++;
 
 	/* For file back page */
 	if (page_mapping(page))
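
For context, the remainder of migrate_vma_check_page() is unchanged by this
patch; continuing from the trailing context above, it compares the page's
reference count against the expected count, roughly as follows (a sketch,
not a verbatim quote of the source):

		extra += 1 + page_has_private(page);

	if ((page_count(page) - extra) > page_mapcount(page))
		return false;

	return true;
}

Here extra starts at 1 for the reference the migration code itself holds, so
with the FIXME resolved a thread waiting on the migration entry no longer
contributes an unexpected reference that would make this check fail.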