[v3] mm/migrate.c: Rework migration_entry_wait() to not take a pageref

Message ID: 20211115105222.4183286-1-apopple@nvidia.com
State: New
Series: [v3] mm/migrate.c: Rework migration_entry_wait() to not take a pageref

Commit Message

Alistair Popple Nov. 15, 2021, 10:52 a.m. UTC
This fixes the FIXME in migrate_vma_check_page().

Before migrating a page, migration code will take a reference and check
that there are no unexpected page references, failing the migration if
there are. When a thread faults on a migration entry it will take a
temporary reference to the page in order to wait for the page to become
unlocked, signifying that the migration entry has been removed.

This reference is dropped just prior to waiting on the page lock.
However, the extra reference can cause migration failures, so it is
desirable to avoid taking it.

As migration code already holds a reference to the migrating page, an
extra reference to wait on PG_locked is unnecessary so long as that
reference can't be dropped while the wait is being set up.

When faulting on a migration entry, the ptl is taken to check the
migration entry. Removing a migration entry also requires the ptl, and
migration code won't drop its page reference until after the migration
entry has been removed. Therefore holding the ptl of a migration entry
is sufficient to guarantee that the page retains its reference.
Reworking migration_entry_wait() to hold the ptl until the wait setup
is complete means the extra page reference is no longer needed.
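
In code terms, a condensed before/after of the pte fault path, mirroring
the mm/migrate.c hunks below (a sketch, not the full function):

	/* Before: a temporary reference was needed to survive dropping the ptl */
	if (!get_page_unless_zero(page))
		goto out;	/* page was being freed, just fault again */
	pte_unmap_unlock(ptep, ptl);
	put_and_wait_on_page_locked(page, TASK_UNINTERRUPTIBLE);

	/* After: the ptl pins the migration entry, and therefore the page,
	 * until the wait has been queued; migration_entry_wait_on_locked()
	 * drops the ptl itself once wait setup is complete.
	 */
	migration_entry_wait_on_locked(page_folio(page), ptep, ptl);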

Signed-off-by: Alistair Popple <apopple@nvidia.com>

---

v3:
 - Fix a build issue for CONFIG_MMU=n by only building
   migration_entry_wait_on_locked() if CONFIG_MIGRATION=y

v2:
 - Rebase to master with folios
 - Avoid taking a pageref in pmd_migration_entry_wait() as well
---
 include/linux/migrate.h |  2 +
 mm/filemap.c            | 87 +++++++++++++++++++++++++++++++++++++++++
 mm/migrate.c            | 33 ++--------------
 3 files changed, 93 insertions(+), 29 deletions(-)

Comments

kernel test robot Nov. 15, 2021, 5:40 p.m. UTC | #1
Hi Alistair,

Thank you for the patch! Perhaps something to improve:

[auto build test WARNING on linux/master]
[also build test WARNING on linus/master v5.16-rc1 next-20211115]
[cannot apply to hnaz-mm/master]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting a patch, we suggest using '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:    https://github.com/0day-ci/linux/commits/Alistair-Popple/mm-migrate-c-Rework-migration_entry_wait-to-not-take-a-pageref/20211115-185444
base:   https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git debe436e77c72fcee804fb867f275e6d31aa999c
config: nios2-defconfig (attached as .config)
compiler: nios2-linux-gcc (GCC) 11.2.0
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # https://github.com/0day-ci/linux/commit/91a437ddc7606450e331059d80babe2d4c1163e0
        git remote add linux-review https://github.com/0day-ci/linux
        git fetch --no-tags linux-review Alistair-Popple/mm-migrate-c-Rework-migration_entry_wait-to-not-take-a-pageref/20211115-185444
        git checkout 91a437ddc7606450e331059d80babe2d4c1163e0
        # save the attached .config to linux build tree
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-11.2.0 make.cross ARCH=nios2 

If you fix the issue, kindly add the following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>

All warnings (new ones prefixed by >>):

>> mm/filemap.c:1447:6: warning: no previous prototype for 'migration_entry_wait_on_locked' [-Wmissing-prototypes]
    1447 | void migration_entry_wait_on_locked(struct folio *folio, pte_t *ptep,
         |      ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
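
The warning means mm/filemap.c defines migration_entry_wait_on_locked()
without a prototype in scope. A minimal sketch of one way to silence it,
assuming the declaration remains in include/linux/migrate.h as this
patch adds it, is to make that header visible where the function is
defined:

	/* In mm/filemap.c, near the other includes: */
	#include <linux/migrate.h>	/* prototype for migration_entry_wait_on_locked() */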


vim +/migration_entry_wait_on_locked +1447 mm/filemap.c

  1428	
  1429	#ifdef CONFIG_MIGRATION
  1430	/**
  1431	 * migration_entry_wait_on_locked - Wait for a migration entry to be removed
  1432	 * @page: page referenced by the migration entry.
  1433	 * @ptep: mapped pte pointer. This function will return with the ptep unmapped.
  1434	 * @ptl: already locked ptl. This function will drop the lock.
  1435	 *
  1436	 * Wait for a migration entry referencing the given page to be removed. This is
  1437	 * equivalent to put_and_wait_on_page_locked(page, TASK_UNINTERRUPTIBLE) except
  1438	 * this can be called without taking a reference on the page. Instead this
  1439	 * should be called while holding the ptl for the migration entry referencing
  1440	 * the page.
  1441	 *
  1442	 * Returns after unmapping and unlocking the pte/ptl with pte_unmap_unlock().
  1443	 *
  1444	 * This follows the same logic as wait_on_page_bit_common() so see the comments
  1445	 * there.
  1446	 */
> 1447	void migration_entry_wait_on_locked(struct folio *folio, pte_t *ptep,
  1448					spinlock_t *ptl)
  1449	{
  1450		struct wait_page_queue wait_page;
  1451		wait_queue_entry_t *wait = &wait_page.wait;
  1452		bool thrashing = false;
  1453		bool delayacct = false;
  1454		unsigned long pflags;
  1455		wait_queue_head_t *q;
  1456	
  1457		q = folio_waitqueue(folio);
  1458		if (!folio_test_uptodate(folio) && folio_test_workingset(folio)) {
  1459			if (!folio_test_swapbacked(folio)) {
  1460				delayacct_thrashing_start();
  1461				delayacct = true;
  1462			}
  1463			psi_memstall_enter(&pflags);
  1464			thrashing = true;
  1465		}
  1466	
  1467		init_wait(wait);
  1468		wait->func = wake_page_function;
  1469		wait_page.folio = folio;
  1470		wait_page.bit_nr = PG_locked;
  1471		wait->flags = 0;
  1472	
  1473		spin_lock_irq(&q->lock);
  1474		folio_set_waiters(folio);
  1475		if (!folio_trylock_flag(folio, PG_locked, wait))
  1476			__add_wait_queue_entry_tail(q, wait);
  1477		spin_unlock_irq(&q->lock);
  1478	
  1479		/*
  1480		 * If a migration entry exists for the page the migration path must hold
  1481		 * a valid reference to the page, and it must take the ptl to remove the
  1482		 * migration entry. So the page is valid until the ptl is dropped.
  1483		 */
  1484		if (ptep)
  1485			pte_unmap_unlock(ptep, ptl);
  1486		else
  1487			spin_unlock(ptl);
  1488	
  1489		for (;;) {
  1490			unsigned int flags;
  1491	
  1492			set_current_state(TASK_UNINTERRUPTIBLE);
  1493	
  1494			/* Loop until we've been woken or interrupted */
  1495			flags = smp_load_acquire(&wait->flags);
  1496			if (!(flags & WQ_FLAG_WOKEN)) {
  1497				if (signal_pending_state(TASK_UNINTERRUPTIBLE, current))
  1498					break;
  1499	
  1500				io_schedule();
  1501				continue;
  1502			}
  1503			break;
  1504		}
  1505	
  1506		finish_wait(q, wait);
  1507	
  1508		if (thrashing) {
  1509			if (delayacct)
  1510				delayacct_thrashing_end();
  1511			psi_memstall_leave(&pflags);
  1512		}
  1513	}
  1514	#endif
  1515	

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org
Matthew Wilcox Nov. 15, 2021, 5:55 p.m. UTC | #2
On Mon, Nov 15, 2021 at 09:52:22PM +1100, Alistair Popple wrote:
> +#ifdef CONFIG_MIGRATION
> +/**
> + * migration_entry_wait_on_locked - Wait for a migration entry to be removed
> + * @page: page referenced by the migration entry.

This should be @folio (you can test by running 'make htmldocs', or even
'make W=1').

> + * @ptep: mapped pte pointer. This function will return with the ptep unmapped.
> + * @ptl: already locked ptl. This function will drop the lock.
> + *
> + * Wait for a migration entry referencing the given page to be removed. This is
> + * equivalent to put_and_wait_on_page_locked(page, TASK_UNINTERRUPTIBLE) except
> + * this can be called without taking a reference on the page. Instead this
> + * should be called while holding the ptl for the migration entry referencing
> + * the page.

The tool won't tell you to update these page references to be folio
references ... so I will ;-)

> +++ b/mm/migrate.c
> @@ -305,15 +305,7 @@ void __migration_entry_wait(struct mm_struct *mm, pte_t *ptep,
>  	page = pfn_swap_entry_to_page(entry);
>  	page = compound_head(page);

I think this whole function should be folio-based.  That is:

-	struct page *page;
+	struct folio *folio;

and
	folio = page_folio(pfn_swap_entry_to_page(entry));
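
Combined with the call site change in this patch, the end result in
__migration_entry_wait() would be roughly (a sketch, untested):

	folio = page_folio(pfn_swap_entry_to_page(entry));
	migration_entry_wait_on_locked(folio, ptep, ptl);
	return;
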
kernel test robot Nov. 15, 2021, 8:40 p.m. UTC | #3
Hi Alistair,

Thank you for the patch! Perhaps something to improve:

[auto build test WARNING on linux/master]
[also build test WARNING on linus/master v5.16-rc1 next-20211115]
[cannot apply to hnaz-mm/master]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting a patch, we suggest using '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:    https://github.com/0day-ci/linux/commits/Alistair-Popple/mm-migrate-c-Rework-migration_entry_wait-to-not-take-a-pageref/20211115-185444
base:   https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git debe436e77c72fcee804fb867f275e6d31aa999c
config: i386-randconfig-r015-20211115 (attached as .config)
compiler: clang version 14.0.0 (https://github.com/llvm/llvm-project fbe72e41b99dc7994daac300d208a955be3e4a0a)
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # https://github.com/0day-ci/linux/commit/91a437ddc7606450e331059d80babe2d4c1163e0
        git remote add linux-review https://github.com/0day-ci/linux
        git fetch --no-tags linux-review Alistair-Popple/mm-migrate-c-Rework-migration_entry_wait-to-not-take-a-pageref/20211115-185444
        git checkout 91a437ddc7606450e331059d80babe2d4c1163e0
        # save the attached .config to linux build tree
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross W=1 ARCH=i386 

If you fix the issue, kindly add the following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>

All warnings (new ones prefixed by >>):

>> mm/filemap.c:1447:6: warning: no previous prototype for function 'migration_entry_wait_on_locked' [-Wmissing-prototypes]
   void migration_entry_wait_on_locked(struct folio *folio, pte_t *ptep,
        ^
   mm/filemap.c:1447:1: note: declare 'static' if the function is not intended to be used outside of this translation unit
   void migration_entry_wait_on_locked(struct folio *folio, pte_t *ptep,
   ^
   static 
   1 warning generated.



Patch

diff --git a/include/linux/migrate.h b/include/linux/migrate.h
index 4850cc5bf813..54579902ec9f 100644
--- a/include/linux/migrate.h
+++ b/include/linux/migrate.h
@@ -40,6 +40,8 @@  extern int migrate_huge_page_move_mapping(struct address_space *mapping,
 				  struct page *newpage, struct page *page);
 extern int migrate_page_move_mapping(struct address_space *mapping,
 		struct page *newpage, struct page *page, int extra_count);
+void migration_entry_wait_on_locked(struct folio *folio, pte_t *ptep,
+				spinlock_t *ptl);
 void folio_migrate_flags(struct folio *newfolio, struct folio *folio);
 void folio_migrate_copy(struct folio *newfolio, struct folio *folio);
 int folio_migrate_mapping(struct address_space *mapping,
diff --git a/mm/filemap.c b/mm/filemap.c
index daa0e23a6ee6..a812e110df9c 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -1426,6 +1426,93 @@  static inline int folio_wait_bit_common(struct folio *folio, int bit_nr,
 	return wait->flags & WQ_FLAG_WOKEN ? 0 : -EINTR;
 }
 
+#ifdef CONFIG_MIGRATION
+/**
+ * migration_entry_wait_on_locked - Wait for a migration entry to be removed
+ * @page: page referenced by the migration entry.
+ * @ptep: mapped pte pointer. This function will return with the ptep unmapped.
+ * @ptl: already locked ptl. This function will drop the lock.
+ *
+ * Wait for a migration entry referencing the given page to be removed. This is
+ * equivalent to put_and_wait_on_page_locked(page, TASK_UNINTERRUPTIBLE) except
+ * this can be called without taking a reference on the page. Instead this
+ * should be called while holding the ptl for the migration entry referencing
+ * the page.
+ *
+ * Returns after unmapping and unlocking the pte/ptl with pte_unmap_unlock().
+ *
+ * This follows the same logic as wait_on_page_bit_common() so see the comments
+ * there.
+ */
+void migration_entry_wait_on_locked(struct folio *folio, pte_t *ptep,
+				spinlock_t *ptl)
+{
+	struct wait_page_queue wait_page;
+	wait_queue_entry_t *wait = &wait_page.wait;
+	bool thrashing = false;
+	bool delayacct = false;
+	unsigned long pflags;
+	wait_queue_head_t *q;
+
+	q = folio_waitqueue(folio);
+	if (!folio_test_uptodate(folio) && folio_test_workingset(folio)) {
+		if (!folio_test_swapbacked(folio)) {
+			delayacct_thrashing_start();
+			delayacct = true;
+		}
+		psi_memstall_enter(&pflags);
+		thrashing = true;
+	}
+
+	init_wait(wait);
+	wait->func = wake_page_function;
+	wait_page.folio = folio;
+	wait_page.bit_nr = PG_locked;
+	wait->flags = 0;
+
+	spin_lock_irq(&q->lock);
+	folio_set_waiters(folio);
+	if (!folio_trylock_flag(folio, PG_locked, wait))
+		__add_wait_queue_entry_tail(q, wait);
+	spin_unlock_irq(&q->lock);
+
+	/*
+	 * If a migration entry exists for the page the migration path must hold
+	 * a valid reference to the page, and it must take the ptl to remove the
+	 * migration entry. So the page is valid until the ptl is dropped.
+	 */
+	if (ptep)
+		pte_unmap_unlock(ptep, ptl);
+	else
+		spin_unlock(ptl);
+
+	for (;;) {
+		unsigned int flags;
+
+		set_current_state(TASK_UNINTERRUPTIBLE);
+
+		/* Loop until we've been woken or interrupted */
+		flags = smp_load_acquire(&wait->flags);
+		if (!(flags & WQ_FLAG_WOKEN)) {
+			if (signal_pending_state(TASK_UNINTERRUPTIBLE, current))
+				break;
+
+			io_schedule();
+			continue;
+		}
+		break;
+	}
+
+	finish_wait(q, wait);
+
+	if (thrashing) {
+		if (delayacct)
+			delayacct_thrashing_end();
+		psi_memstall_leave(&pflags);
+	}
+}
+#endif
+
 void folio_wait_bit(struct folio *folio, int bit_nr)
 {
 	folio_wait_bit_common(folio, bit_nr, TASK_UNINTERRUPTIBLE, SHARED);
diff --git a/mm/migrate.c b/mm/migrate.c
index cf25b00f03c8..8d29a7903f9e 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -305,15 +305,7 @@  void __migration_entry_wait(struct mm_struct *mm, pte_t *ptep,
 	page = pfn_swap_entry_to_page(entry);
 	page = compound_head(page);
 
-	/*
-	 * Once page cache replacement of page migration started, page_count
-	 * is zero; but we must not call put_and_wait_on_page_locked() without
-	 * a ref. Use get_page_unless_zero(), and just fault again if it fails.
-	 */
-	if (!get_page_unless_zero(page))
-		goto out;
-	pte_unmap_unlock(ptep, ptl);
-	put_and_wait_on_page_locked(page, TASK_UNINTERRUPTIBLE);
+	migration_entry_wait_on_locked(page_folio(page), ptep, ptl);
 	return;
 out:
 	pte_unmap_unlock(ptep, ptl);
@@ -344,10 +336,7 @@  void pmd_migration_entry_wait(struct mm_struct *mm, pmd_t *pmd)
 	if (!is_pmd_migration_entry(*pmd))
 		goto unlock;
 	page = pfn_swap_entry_to_page(pmd_to_swp_entry(*pmd));
-	if (!get_page_unless_zero(page))
-		goto unlock;
-	spin_unlock(ptl);
-	put_and_wait_on_page_locked(page, TASK_UNINTERRUPTIBLE);
+	migration_entry_wait_on_locked(page_folio(page), NULL, ptl);
 	return;
 unlock:
 	spin_unlock(ptl);
@@ -2484,22 +2473,8 @@  static bool migrate_vma_check_page(struct page *page)
 		return false;
 
 	/* Page from ZONE_DEVICE have one extra reference */
-	if (is_zone_device_page(page)) {
-		/*
-		 * Private page can never be pin as they have no valid pte and
-		 * GUP will fail for those. Yet if there is a pending migration
-		 * a thread might try to wait on the pte migration entry and
-		 * will bump the page reference count. Sadly there is no way to
-		 * differentiate a regular pin from migration wait. Hence to
-		 * avoid 2 racing thread trying to migrate back to CPU to enter
-		 * infinite loop (one stopping migration because the other is
-		 * waiting on pte migration entry). We always return true here.
-		 *
-		 * FIXME proper solution is to rework migration_entry_wait() so
-		 * it does not need to take a reference on page.
-		 */
-		return is_device_private_page(page);
-	}
+	if (is_zone_device_page(page))
+		extra++;
 
 	/* For file back page */
 	if (page_mapping(page))