
[RFC,3/5] mm/migrate: add migrate_folios_batch_move to batch the folio move operations

Message ID 20250103172419.4148674-4-ziy@nvidia.com
State RFC
Series Accelerate page migration with batching and multi threads

Commit Message

Zi Yan Jan. 3, 2025, 5:24 p.m. UTC
This is a preparatory patch that enables batch copying for folios
undergoing migration. By batching the copy of folio content, we can
efficiently utilize the capabilities of DMA hardware or multi-threaded
folio copy. It also adds MIGRATE_NO_COPY back to migrate_mode, so that
the folio copy is skipped during the metadata copy step and performed
later in a batch.

Currently, the folio move operation is performed for each folio
individually, in a sequential manner:
for_each_folio() {
        Copy folio metadata like flags and mappings
        Copy the folio content from src to dst
        Update page tables with dst folio
}

With this patch, we transition to a batch processing approach as shown
below:
for_each_folio() {
        Copy folio metadata like flags and mappings
}
Batch copy all src folios to dst
for_each_folio() {
        Update page tables with dst folios
}
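
Once the content copy is decoupled from the per-folio bookkeeping, the
middle phase becomes a natural offload point for DMA engines or worker
threads. As a rough illustration only (a minimal userspace pthreads
sketch with made-up names, not the kernel implementation; the actual
offload paths come later in this series), the batch can be fanned out
like so:

#include <pthread.h>
#include <stdlib.h>
#include <string.h>

struct copy_work {		/* one src->dst copy, cf. folio_mc_copy() */
	void *dst;
	const void *src;
	size_t len;
};

static void *copy_worker(void *arg)
{
	struct copy_work *w = arg;

	memcpy(w->dst, w->src, w->len);
	return NULL;
}

/* Run every copy in the batch concurrently, then wait for all of them. */
static void batch_copy(struct copy_work *work, int nr)
{
	pthread_t *tid = calloc(nr, sizeof(*tid));
	int i;

	for (i = 0; i < nr; i++)
		pthread_create(&tid[i], NULL, copy_worker, &work[i]);
	for (i = 0; i < nr; i++)
		pthread_join(tid[i], NULL);
	free(tid);
}

int main(void)
{
	static char src[4096] = "payload", dst[4096];
	struct copy_work w = { .dst = dst, .src = src, .len = sizeof(src) };

	batch_copy(&w, 1);
	return dst[0] != 'p';
}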

dst->private is used to store the page state and a possible anon_vma
pointer, and thus needs to be cleared during the metadata copy step. To
avoid an additional memory allocation to hold this data during the batch
copy step, it is moved to src->private after the metadata copy, since
src->private is no longer used at that point.
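
For reference, __migrate_folio_record() packs the anon_vma pointer and
the old page-state bits into that one private word, and
__migrate_folio_read()/__migrate_folio_extract() mask them apart again.
A minimal userspace sketch of the packing (flag values mirror the enum
in mm/migrate.c; struct and names here are illustrative):

#include <assert.h>
#include <stdio.h>

#define PAGE_WAS_MAPPED		1UL
#define PAGE_WAS_MLOCKED	2UL
#define PAGE_OLD_STATES		(PAGE_WAS_MAPPED | PAGE_WAS_MLOCKED)

struct anon_vma { long placeholder; };	/* aligned, so low bits are free */

int main(void)
{
	static struct anon_vma vma;
	unsigned long private;
	struct anon_vma *avma;
	int old_page_state;

	/* record: let the state bits ride in the pointer's low bits */
	private = (unsigned long)&vma + PAGE_WAS_MAPPED;

	/* read/extract: mask the state bits back out */
	avma = (struct anon_vma *)(private & ~PAGE_OLD_STATES);
	old_page_state = private & PAGE_OLD_STATES;

	assert(avma == &vma && old_page_state == PAGE_WAS_MAPPED);
	printf("anon_vma=%p state=%d\n", (void *)avma, old_page_state);
	return 0;
}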

Originally-by: Shivank Garg <shivankg@amd.com>
Signed-off-by: Zi Yan <ziy@nvidia.com>
---
 include/linux/migrate_mode.h |   2 +
 mm/migrate.c                 | 207 +++++++++++++++++++++++++++++++++--
 2 files changed, 201 insertions(+), 8 deletions(-)

Comments

kernel test robot Jan. 4, 2025, 4:19 a.m. UTC | #1
Hi Zi,

[This is a private test report for your RFC patch.]
kernel test robot noticed the following build warnings:

[auto build test WARNING on akpm-mm/mm-everything]
[also build test WARNING on linus/master sysctl/sysctl-next v6.13-rc5 next-20241220]
[cannot apply to mcgrof/sysctl-next]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/Zi-Yan/mm-separate-move-undo-doing-on-folio-list-from-migrate_pages_batch/20250104-012955
base:   https://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm.git mm-everything
patch link:    https://lore.kernel.org/r/20250103172419.4148674-4-ziy%40nvidia.com
patch subject: [RFC PATCH 3/5] mm/migrate: add migrate_folios_batch_move to batch the folio move operations
config: i386-buildonly-randconfig-004-20250104 (https://download.01.org/0day-ci/archive/20250104/202501041206.9cE8Qd8b-lkp@intel.com/config)
compiler: gcc-12 (Debian 12.2.0-14) 12.2.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20250104/202501041206.9cE8Qd8b-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202501041206.9cE8Qd8b-lkp@intel.com/

All warnings (new ones prefixed by >>):

   mm/migrate.c: In function 'migrate_folios_batch_move':
>> mm/migrate.c:1805:14: warning: variable 'is_lru' set but not used [-Wunused-but-set-variable]
    1805 |         bool is_lru;
         |              ^~~~~~


vim +/is_lru +1805 mm/migrate.c

  1791	
  1792	static void migrate_folios_batch_move(struct list_head *src_folios,
  1793			struct list_head *dst_folios,
  1794			free_folio_t put_new_folio, unsigned long private,
  1795			enum migrate_mode mode, int reason,
  1796			struct list_head *ret_folios,
  1797			struct migrate_pages_stats *stats,
  1798			int *retry, int *thp_retry, int *nr_failed,
  1799			int *nr_retry_pages)
  1800	{
  1801		struct folio *folio, *folio2, *dst, *dst2;
  1802		int rc, nr_pages = 0, nr_mig_folios = 0;
  1803		int old_page_state = 0;
  1804		struct anon_vma *anon_vma = NULL;
> 1805		bool is_lru;
  1806		int is_thp = 0;
  1807		LIST_HEAD(err_src);
  1808		LIST_HEAD(err_dst);
  1809	
  1810		if (mode != MIGRATE_ASYNC) {
  1811			*retry += 1;
  1812			return;
  1813		}
  1814	
  1815		/*
  1816		 * Iterate over the list of locked src/dst folios to copy the metadata
  1817		 */
  1818		dst = list_first_entry(dst_folios, struct folio, lru);
  1819		dst2 = list_next_entry(dst, lru);
  1820		list_for_each_entry_safe(folio, folio2, src_folios, lru) {
  1821			is_thp = folio_test_large(folio) && folio_test_pmd_mappable(folio);
  1822			nr_pages = folio_nr_pages(folio);
  1823			is_lru = !__folio_test_movable(folio);
  1824	
  1825			/*
  1826			 * dst->private is not cleared here. It is cleared and moved to
  1827			 * src->private in __migrate_folio().
  1828			 */
  1829			__migrate_folio_read(dst, &old_page_state, &anon_vma);
  1830	
  1831			/*
  1832			 * Use MIGRATE_NO_COPY mode in migrate_folio family functions
  1833			 * to copy the flags, mapping and some other ancillary information.
  1834			 * This does everything except the page copy. The actual page copy
  1835			 * is handled later in a batch manner.
  1836			 */
  1837			rc = _move_to_new_folio_prep(dst, folio, MIGRATE_NO_COPY);
  1838	
  1839			/*
  1840			 * -EAGAIN: Move src/dst folios to tmp lists for retry
  1841			 * Other Errno: Put src folio on ret_folios list, remove the dst folio
  1842			 * Success: Copy the folio bytes, restoring working pte, unlock and
  1843			 *	    decrement refcounter
  1844			 */
  1845			if (rc == -EAGAIN) {
  1846				*retry += 1;
  1847				*thp_retry += is_thp;
  1848				*nr_retry_pages += nr_pages;
  1849	
  1850				list_move_tail(&folio->lru, &err_src);
  1851				list_move_tail(&dst->lru, &err_dst);
  1852				__migrate_folio_record(dst, old_page_state, anon_vma);
  1853			} else if (rc != MIGRATEPAGE_SUCCESS) {
  1854				*nr_failed += 1;
  1855				stats->nr_thp_failed += is_thp;
  1856				stats->nr_failed_pages += nr_pages;
  1857	
  1858				list_del(&dst->lru);
  1859				migrate_folio_undo_src(folio, old_page_state & PAGE_WAS_MAPPED,
  1860						anon_vma, true, ret_folios);
  1861				migrate_folio_undo_dst(dst, true, put_new_folio, private);
  1862			} else /* MIGRATEPAGE_SUCCESS */
  1863				nr_mig_folios++;
  1864	
  1865			dst = dst2;
  1866			dst2 = list_next_entry(dst, lru);
  1867		}
  1868	
  1869		/* Exit if folio list for batch migration is empty */
  1870		if (!nr_mig_folios)
  1871			goto out;
  1872	
  1873		/* Batch copy the folios */
  1874		{
  1875			dst = list_first_entry(dst_folios, struct folio, lru);
  1876			dst2 = list_next_entry(dst, lru);
  1877			list_for_each_entry_safe(folio, folio2, src_folios, lru) {
  1878				is_thp = folio_test_large(folio) &&
  1879					 folio_test_pmd_mappable(folio);
  1880				nr_pages = folio_nr_pages(folio);
  1881				rc = folio_mc_copy(dst, folio);
  1882	
  1883				if (rc) {
  1884					int old_page_state = 0;
  1885					struct anon_vma *anon_vma = NULL;
  1886	
  1887					/*
  1888					 * dst->private is moved to src->private in
  1889					 * __migrate_folio(), so page state and anon_vma
  1890					 * values can be extracted from (src) folio.
  1891					 */
  1892					__migrate_folio_extract(folio, &old_page_state,
  1893							&anon_vma);
  1894					migrate_folio_undo_src(folio,
  1895							old_page_state & PAGE_WAS_MAPPED,
  1896							anon_vma, true, ret_folios);
  1897					list_del(&dst->lru);
  1898					migrate_folio_undo_dst(dst, true, put_new_folio,
  1899							private);
  1900				}
  1901	
  1902				switch (rc) {
  1903				case MIGRATEPAGE_SUCCESS:
  1904					stats->nr_succeeded += nr_pages;
  1905					stats->nr_thp_succeeded += is_thp;
  1906					break;
  1907				default:
  1908					*nr_failed += 1;
  1909					stats->nr_thp_failed += is_thp;
  1910					stats->nr_failed_pages += nr_pages;
  1911					break;
  1912				}
  1913	
  1914				dst = dst2;
  1915				dst2 = list_next_entry(dst, lru);
  1916			}
  1917		}
  1918	
  1919		/*
  1920		 * Iterate the folio lists to remove migration pte and restore them
  1921		 * as working pte. Unlock the folios, add/remove them to LRU lists (if
  1922		 * applicable) and release the src folios.
  1923		 */
  1924		dst = list_first_entry(dst_folios, struct folio, lru);
  1925		dst2 = list_next_entry(dst, lru);
  1926		list_for_each_entry_safe(folio, folio2, src_folios, lru) {
  1927			is_thp = folio_test_large(folio) && folio_test_pmd_mappable(folio);
  1928			nr_pages = folio_nr_pages(folio);
  1929			/*
  1930			 * dst->private is moved to src->private in __migrate_folio(),
  1931			 * so page state and anon_vma values can be extracted from
  1932			 * (src) folio.
  1933			 */
  1934			__migrate_folio_extract(folio, &old_page_state, &anon_vma);
  1935			list_del(&dst->lru);
  1936	
  1937			_move_to_new_folio_finalize(dst, folio, MIGRATEPAGE_SUCCESS);
  1938	
  1939			/*
  1940			 * Below few steps are only applicable for lru pages which is
  1941			 * ensured as we have removed the non-lru pages from our list.
  1942			 */
  1943			_migrate_folio_move_finalize1(folio, dst, old_page_state);
  1944	
  1945			_migrate_folio_move_finalize2(folio, dst, reason, anon_vma);
  1946	
  1947			/* Page migration successful, increase stat counter */
  1948			stats->nr_succeeded += nr_pages;
  1949			stats->nr_thp_succeeded += is_thp;
  1950	
  1951			dst = dst2;
  1952			dst2 = list_next_entry(dst, lru);
  1953		}
  1954	out:
  1955		/* Add tmp folios back to the list to let CPU re-attempt migration. */
  1956		list_splice(&err_src, src_folios);
  1957		list_splice(&err_dst, dst_folios);
  1958	}
  1959
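
Assuming is_lru is being kept for later patches in the series rather
than dropped outright, a minimal way to silence the W=1 warning above
is the kernel's __maybe_unused annotation. A standalone sketch (the
userspace equivalent is shown, since __maybe_unused expands to the GCC
unused attribute in include/linux/compiler_attributes.h):

#include <stdbool.h>

#define __maybe_unused __attribute__((__unused__))

static bool folio_is_movable;	/* stand-in for __folio_test_movable() */

int main(void)
{
	/* set but deliberately unused for now; the attribute quiets gcc */
	bool is_lru __maybe_unused;

	is_lru = !folio_is_movable;
	return 0;
}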

Patch

diff --git a/include/linux/migrate_mode.h b/include/linux/migrate_mode.h
index 265c4328b36a..9af6c949a057 100644
--- a/include/linux/migrate_mode.h
+++ b/include/linux/migrate_mode.h
@@ -7,11 +7,13 @@ 
  *	on most operations but not ->writepage as the potential stall time
  *	is too significant
  * MIGRATE_SYNC will block when migrating pages
+ * MIGRATE_NO_COPY will not copy page content
  */
 enum migrate_mode {
 	MIGRATE_ASYNC,
 	MIGRATE_SYNC_LIGHT,
 	MIGRATE_SYNC,
+	MIGRATE_NO_COPY,
 };
 
 enum migrate_reason {
diff --git a/mm/migrate.c b/mm/migrate.c
index a83508f94c57..95c4cc4a7823 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -51,6 +51,7 @@ 
 
 #include "internal.h"
 
+
 bool isolate_movable_page(struct page *page, isolate_mode_t mode)
 {
 	struct folio *folio = folio_get_nontail_page(page);
@@ -752,14 +753,19 @@  static int __migrate_folio(struct address_space *mapping, struct folio *dst,
 			   enum migrate_mode mode)
 {
 	int rc, expected_count = folio_expected_refs(mapping, src);
+	unsigned long dst_private = (unsigned long)dst->private;
 
 	/* Check whether src does not have extra refs before we do more work */
 	if (folio_ref_count(src) != expected_count)
 		return -EAGAIN;
 
-	rc = folio_mc_copy(dst, src);
-	if (unlikely(rc))
-		return rc;
+	if (mode == MIGRATE_NO_COPY)
+		dst->private = NULL;
+	else {
+		rc = folio_mc_copy(dst, src);
+		if (unlikely(rc))
+			return rc;
+	}
 
 	rc = __folio_migrate_mapping(mapping, dst, src, expected_count);
 	if (rc != MIGRATEPAGE_SUCCESS)
@@ -769,6 +775,10 @@  static int __migrate_folio(struct address_space *mapping, struct folio *dst,
 		folio_attach_private(dst, folio_detach_private(src));
 
 	folio_migrate_flags(dst, src);
+
+	if (mode == MIGRATE_NO_COPY)
+		src->private = (void *)dst_private;
+
 	return MIGRATEPAGE_SUCCESS;
 }
 
@@ -1042,7 +1052,7 @@  static int _move_to_new_folio_prep(struct folio *dst, struct folio *src,
 								mode);
 		else
 			rc = fallback_migrate_folio(mapping, dst, src, mode);
-	} else {
+	} else if (mode != MIGRATE_NO_COPY) {
 		const struct movable_operations *mops;
 
 		/*
@@ -1060,7 +1070,8 @@  static int _move_to_new_folio_prep(struct folio *dst, struct folio *src,
 		rc = mops->migrate_page(&dst->page, &src->page, mode);
 		WARN_ON_ONCE(rc == MIGRATEPAGE_SUCCESS &&
 				!folio_test_isolated(src));
-	}
+	} else
+		rc = -EAGAIN;
 out:
 	return rc;
 }
@@ -1138,7 +1149,7 @@  static void __migrate_folio_record(struct folio *dst,
 	dst->private = (void *)anon_vma + old_page_state;
 }
 
-static void __migrate_folio_extract(struct folio *dst,
+static void __migrate_folio_read(struct folio *dst,
 				   int *old_page_state,
 				   struct anon_vma **anon_vmap)
 {
@@ -1146,6 +1157,13 @@  static void __migrate_folio_extract(struct folio *dst,
 
 	*anon_vmap = (struct anon_vma *)(private & ~PAGE_OLD_STATES);
 	*old_page_state = private & PAGE_OLD_STATES;
+}
+
+static void __migrate_folio_extract(struct folio *dst,
+				   int *old_page_state,
+				   struct anon_vma **anon_vmap)
+{
+	__migrate_folio_read(dst, old_page_state, anon_vmap);
 	dst->private = NULL;
 }
 
@@ -1771,6 +1789,174 @@  static void migrate_folios_move(struct list_head *src_folios,
 	}
 }
 
+static void migrate_folios_batch_move(struct list_head *src_folios,
+		struct list_head *dst_folios,
+		free_folio_t put_new_folio, unsigned long private,
+		enum migrate_mode mode, int reason,
+		struct list_head *ret_folios,
+		struct migrate_pages_stats *stats,
+		int *retry, int *thp_retry, int *nr_failed,
+		int *nr_retry_pages)
+{
+	struct folio *folio, *folio2, *dst, *dst2;
+	int rc, nr_pages = 0, nr_mig_folios = 0;
+	int old_page_state = 0;
+	struct anon_vma *anon_vma = NULL;
+	bool is_lru;
+	int is_thp = 0;
+	LIST_HEAD(err_src);
+	LIST_HEAD(err_dst);
+
+	if (mode != MIGRATE_ASYNC) {
+		*retry += 1;
+		return;
+	}
+
+	/*
+	 * Iterate over the list of locked src/dst folios to copy the metadata
+	 */
+	dst = list_first_entry(dst_folios, struct folio, lru);
+	dst2 = list_next_entry(dst, lru);
+	list_for_each_entry_safe(folio, folio2, src_folios, lru) {
+		is_thp = folio_test_large(folio) && folio_test_pmd_mappable(folio);
+		nr_pages = folio_nr_pages(folio);
+		is_lru = !__folio_test_movable(folio);
+
+		/*
+		 * dst->private is not cleared here. It is cleared and moved to
+		 * src->private in __migrate_folio().
+		 */
+		__migrate_folio_read(dst, &old_page_state, &anon_vma);
+
+		/*
+		 * Use MIGRATE_NO_COPY mode in migrate_folio family functions
+		 * to copy the flags, mapping and some other ancillary information.
+		 * This does everything except the page copy. The actual page copy
+		 * is handled later in a batch manner.
+		 */
+		rc = _move_to_new_folio_prep(dst, folio, MIGRATE_NO_COPY);
+
+		/*
+		 * -EAGAIN: Move src/dst folios to tmp lists for retry
+		 * Other Errno: Put src folio on ret_folios list, remove the dst folio
+		 * Success: Copy the folio bytes, restoring working pte, unlock and
+		 *	    decrement refcounter
+		 */
+		if (rc == -EAGAIN) {
+			*retry += 1;
+			*thp_retry += is_thp;
+			*nr_retry_pages += nr_pages;
+
+			list_move_tail(&folio->lru, &err_src);
+			list_move_tail(&dst->lru, &err_dst);
+			__migrate_folio_record(dst, old_page_state, anon_vma);
+		} else if (rc != MIGRATEPAGE_SUCCESS) {
+			*nr_failed += 1;
+			stats->nr_thp_failed += is_thp;
+			stats->nr_failed_pages += nr_pages;
+
+			list_del(&dst->lru);
+			migrate_folio_undo_src(folio, old_page_state & PAGE_WAS_MAPPED,
+					anon_vma, true, ret_folios);
+			migrate_folio_undo_dst(dst, true, put_new_folio, private);
+		} else /* MIGRATEPAGE_SUCCESS */
+			nr_mig_folios++;
+
+		dst = dst2;
+		dst2 = list_next_entry(dst, lru);
+	}
+
+	/* Exit if folio list for batch migration is empty */
+	if (!nr_mig_folios)
+		goto out;
+
+	/* Batch copy the folios */
+	{
+		dst = list_first_entry(dst_folios, struct folio, lru);
+		dst2 = list_next_entry(dst, lru);
+		list_for_each_entry_safe(folio, folio2, src_folios, lru) {
+			is_thp = folio_test_large(folio) &&
+				 folio_test_pmd_mappable(folio);
+			nr_pages = folio_nr_pages(folio);
+			rc = folio_mc_copy(dst, folio);
+
+			if (rc) {
+				int old_page_state = 0;
+				struct anon_vma *anon_vma = NULL;
+
+				/*
+				 * dst->private is moved to src->private in
+				 * __migrate_folio(), so page state and anon_vma
+				 * values can be extracted from (src) folio.
+				 */
+				__migrate_folio_extract(folio, &old_page_state,
+						&anon_vma);
+				migrate_folio_undo_src(folio,
+						old_page_state & PAGE_WAS_MAPPED,
+						anon_vma, true, ret_folios);
+				list_del(&dst->lru);
+				migrate_folio_undo_dst(dst, true, put_new_folio,
+						private);
+			}
+
+			switch (rc) {
+			case MIGRATEPAGE_SUCCESS:
+				stats->nr_succeeded += nr_pages;
+				stats->nr_thp_succeeded += is_thp;
+				break;
+			default:
+				*nr_failed += 1;
+				stats->nr_thp_failed += is_thp;
+				stats->nr_failed_pages += nr_pages;
+				break;
+			}
+
+			dst = dst2;
+			dst2 = list_next_entry(dst, lru);
+		}
+	}
+
+	/*
+	 * Iterate the folio lists to remove migration pte and restore them
+	 * as working pte. Unlock the folios, add/remove them to LRU lists (if
+	 * applicable) and release the src folios.
+	 */
+	dst = list_first_entry(dst_folios, struct folio, lru);
+	dst2 = list_next_entry(dst, lru);
+	list_for_each_entry_safe(folio, folio2, src_folios, lru) {
+		is_thp = folio_test_large(folio) && folio_test_pmd_mappable(folio);
+		nr_pages = folio_nr_pages(folio);
+		/*
+		 * dst->private is moved to src->private in __migrate_folio(),
+		 * so page state and anon_vma values can be extracted from
+		 * (src) folio.
+		 */
+		__migrate_folio_extract(folio, &old_page_state, &anon_vma);
+		list_del(&dst->lru);
+
+		_move_to_new_folio_finalize(dst, folio, MIGRATEPAGE_SUCCESS);
+
+		/*
+		 * Below few steps are only applicable for lru pages which is
+		 * ensured as we have removed the non-lru pages from our list.
+		 */
+		_migrate_folio_move_finalize1(folio, dst, old_page_state);
+
+		_migrate_folio_move_finalize2(folio, dst, reason, anon_vma);
+
+		/* Page migration successful, increase stat counter */
+		stats->nr_succeeded += nr_pages;
+		stats->nr_thp_succeeded += is_thp;
+
+		dst = dst2;
+		dst2 = list_next_entry(dst, lru);
+	}
+out:
+	/* Add tmp folios back to the list to let CPU re-attempt migration. */
+	list_splice(&err_src, src_folios);
+	list_splice(&err_dst, dst_folios);
+}
+
 static void migrate_folios_undo(struct list_head *src_folios,
 		struct list_head *dst_folios,
 		free_folio_t put_new_folio, unsigned long private,
@@ -1981,13 +2167,18 @@  static int migrate_pages_batch(struct list_head *from,
 	/* Flush TLBs for all unmapped folios */
 	try_to_unmap_flush();
 
-	retry = 1;
+	retry = 0;
+	/* Batch move the unmapped folios */
+	migrate_folios_batch_move(&unmap_folios, &dst_folios, put_new_folio,
+			private, mode, reason, ret_folios, stats, &retry,
+			&thp_retry, &nr_failed, &nr_retry_pages);
+
 	for (pass = 0; pass < nr_pass && retry; pass++) {
 		retry = 0;
 		thp_retry = 0;
 		nr_retry_pages = 0;
 
-		/* Move the unmapped folios */
+		/* Move the remaining unmapped folios */
 		migrate_folios_move(&unmap_folios, &dst_folios,
 				put_new_folio, private, mode, reason,
 				ret_folios, stats, &retry, &thp_retry,