
[v4] mm/migrate: split source folio if it is on deferred split list

Message ID 20240320014511.306128-1-zi.yan@sent.com (mailing list archive)

Commit Message

Zi Yan March 20, 2024, 1:45 a.m. UTC
From: Zi Yan <ziy@nvidia.com>

If the source folio is on the deferred split list, it is likely that some
of its subpages are unused. Split it before migration so the unused
subpages are not migrated.

Commit 616b8371539a6 ("mm: thp: enable thp migration in generic path")
did not check whether a THP is on the deferred split list before migration,
so the destination THP is never put on the deferred split list even when
the source THP was. The opportunity to reclaim free pages from a partially
mapped THP during deferred list scanning is lost, but there is no other
harmful consequence[1].
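
For illustration, a conceptual sketch of the idea (the actual hunk in
mm/migrate.c below also handles locking, stats, and retry bookkeeping):

	/*
	 * Conceptual outline only, assuming a large source folio that
	 * is about to be migrated.  A non-empty _deferred_list means
	 * the folio was partially unmapped, so split it and migrate
	 * the remaining subpages individually instead of moving the
	 * whole, partly unused, THP.
	 */
	if (folio_test_large(folio) && !list_empty(&folio->_deferred_list))
		split_folio(folio);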

From v3:
1. Guarded deferred list code behind CONFIG_TRANSPARENT_HUGEPAGE to avoid
   compilation error (per SeongJae Park).

From v2:
1. Split the source folio instead of migrating it (per Matthew Wilcox)[2].

From v1:
1. Used dst to get correct deferred split list after migration
   (per Ryan Roberts).

[1]: https://lore.kernel.org/linux-mm/03CE3A00-917C-48CC-8E1C-6A98713C817C@nvidia.com/
[2]: https://lore.kernel.org/linux-mm/Ze_P6xagdTbcu1Kz@casper.infradead.org/

Fixes: 616b8371539a ("mm: thp: enable thp migration in generic path")
Signed-off-by: Zi Yan <ziy@nvidia.com>
---
 mm/huge_memory.c | 22 -----------------
 mm/internal.h    | 25 +++++++++++++++++++
 mm/migrate.c     | 62 +++++++++++++++++++++++++++++++++++++++---------
 3 files changed, 76 insertions(+), 33 deletions(-)

Comments

Matthew Wilcox March 20, 2024, 4:02 p.m. UTC | #1
On Tue, Mar 19, 2024 at 09:45:11PM -0400, Zi Yan wrote:
> +++ b/mm/migrate.c
> @@ -1654,25 +1654,65 @@ static int migrate_pages_batch(struct list_head *from,
>  
>  			/*
>  			 * Large folio migration might be unsupported or
> -			 * the allocation might be failed so we should retry
> -			 * on the same folio with the large folio split
> +			 * the folio is on deferred split list so we should
> +			 * retry on the same folio with the large folio split
>  			 * to normal folios.
>  			 *
>  			 * Split folios are put in split_folios, and
>  			 * we will migrate them after the rest of the
>  			 * list is processed.
>  			 */
> -			if (!thp_migration_supported() && is_thp) {
> -				nr_failed++;
> -				stats->nr_thp_failed++;
> -				if (!try_split_folio(folio, split_folios)) {
> -					stats->nr_thp_split++;
> -					stats->nr_split++;
> +			if (is_thp) {
> +				bool is_on_deferred_list = false;
> +
> +#ifdef CONFIG_TRANSPARENT_HUGEPAGE
> +				/*
> +				 * Check without taking split_queue_lock to
> +				 * reduce locking overheads. The worst case is
> +				 * that if the folio is put on the deferred
> +				 * split list after the check, it will be
> +				 * migrated and not put back on the list.
> +				 * The migrated folio will not be split
> +				 * via shrinker during memory pressure.
> +				 */
> +				if (!data_race(list_empty(&folio->_deferred_list))) {
> +					struct deferred_split *ds_queue;
> +					unsigned long flags;
> +
> +					ds_queue =
> +						get_deferred_split_queue(folio);
> +					spin_lock_irqsave(&ds_queue->split_queue_lock,
> +							  flags);
> +					/*
> +					 * Only check if the folio is on
> +					 * deferred split list without removing
> +					 * it. Since the folio can be on
> +					 * deferred_split_scan() local list and
> +					 * removing it can cause the local list
> +					 * corruption. Folio split process
> +					 * below can handle it with the help of
> +					 * folio_ref_freeze().
> +					 */
> +					is_on_deferred_list =
> +						!list_empty(&folio->_deferred_list);
> +					spin_unlock_irqrestore(&ds_queue->split_queue_lock,
> +							       flags);
> +				}
> +#endif
> +				if (!thp_migration_supported() ||
> +						is_on_deferred_list) {
> +					nr_failed++;
> +					stats->nr_thp_failed++;
> +					if (!try_split_folio(folio,
> +							     split_folios)) {
> +						stats->nr_thp_split++;
> +						stats->nr_split++;
> +						continue;
> +					}
> +					stats->nr_failed_pages += nr_pages;
> +					list_move_tail(&folio->lru, ret_folios);
>  					continue;
>  				}
> -				stats->nr_failed_pages += nr_pages;
> -				list_move_tail(&folio->lru, ret_folios);
> -				continue;
>  			}

I don't think we need to try quite this hard.  I don't think we need
to take the lock to be certain whether it's on the deferred list -- is
there anything preventing the folio from being added to the deferred
list after we drop the lock?

I also don't think we should account this as a thp split since those
are treated by callers as failures.  So maybe this?

+++ b/mm/migrate.c
@@ -1652,6 +1652,17 @@ static int migrate_pages_batch(struct list_head *from,

                        cond_resched();

+                       /*
+                        * The rare folio on the deferred split list should
+                        * be split now.  It should not count as a failure.
+                        */
+                       if (nr_pages > 2 &&
+                           !list_empty(&folio->_deferred_list)) {
+                               if (try_split_folio(folio, from) == 0) {
+                                       is_large = is_thp = false;
+                                       nr_pages = 1;
+                               }
+                       }
                        /*
                         * Large folio migration might be unsupported or
                         * the allocation might be failed so we should retry
Zi Yan March 20, 2024, 4:24 p.m. UTC | #2
On 20 Mar 2024, at 12:02, Matthew Wilcox wrote:

> On Tue, Mar 19, 2024 at 09:45:11PM -0400, Zi Yan wrote:
>> +++ b/mm/migrate.c
>> @@ -1654,25 +1654,65 @@ static int migrate_pages_batch(struct list_head *from,
>>
>>  			/*
>>  			 * Large folio migration might be unsupported or
>> -			 * the allocation might be failed so we should retry
>> -			 * on the same folio with the large folio split
>> +			 * the folio is on deferred split list so we should
>> +			 * retry on the same folio with the large folio split
>>  			 * to normal folios.
>>  			 *
>>  			 * Split folios are put in split_folios, and
>>  			 * we will migrate them after the rest of the
>>  			 * list is processed.
>>  			 */
>> -			if (!thp_migration_supported() && is_thp) {
>> -				nr_failed++;
>> -				stats->nr_thp_failed++;
>> -				if (!try_split_folio(folio, split_folios)) {
>> -					stats->nr_thp_split++;
>> -					stats->nr_split++;
>> +			if (is_thp) {
>> +				bool is_on_deferred_list = false;
>> +
>> +#ifdef CONFIG_TRANSPARENT_HUGEPAGE
>> +				/*
>> +				 * Check without taking split_queue_lock to
>> +				 * reduce locking overheads. The worst case is
>> +				 * that if the folio is put on the deferred
>> +				 * split list after the check, it will be
>> +				 * migrated and not put back on the list.
>> +				 * The migrated folio will not be split
>> +				 * via shrinker during memory pressure.
>> +				 */
>> +				if (!data_race(list_empty(&folio->_deferred_list))) {
>> +					struct deferred_split *ds_queue;
>> +					unsigned long flags;
>> +
>> +					ds_queue =
>> +						get_deferred_split_queue(folio);
>> +					spin_lock_irqsave(&ds_queue->split_queue_lock,
>> +							  flags);
>> +					/*
>> +					 * Only check if the folio is on
>> +					 * deferred split list without removing
>> +					 * it. Since the folio can be on
>> +					 * deferred_split_scan() local list and
>> +					 * removing it can cause the local list
>> +					 * corruption. Folio split process
>> +					 * below can handle it with the help of
>> +					 * folio_ref_freeze().
>> +					 */
>> +					is_on_deferred_list =
>> +						!list_empty(&folio->_deferred_list);
>> +					spin_unlock_irqrestore(&ds_queue->split_queue_lock,
>> +							       flags);
>> +				}
>> +#endif
>> +				if (!thp_migration_supported() ||
>> +						is_on_deferred_list) {
>> +					nr_failed++;
>> +					stats->nr_thp_failed++;
>> +					if (!try_split_folio(folio,
>> +							     split_folios)) {
>> +						stats->nr_thp_split++;
>> +						stats->nr_split++;
>> +						continue;
>> +					}
>> +					stats->nr_failed_pages += nr_pages;
>> +					list_move_tail(&folio->lru, ret_folios);
>>  					continue;
>>  				}
>> -				stats->nr_failed_pages += nr_pages;
>> -				list_move_tail(&folio->lru, ret_folios);
>> -				continue;
>>  			}
>
> I don't think we need to try quite this hard.  I don't think we need
> to take the lock to be certain if it's on the deferred list -- is
> there anything preventing the folio being added to the deferred list
> after we drop the lock?

No. OK, I will use the simpler version.

>
> I also don't think we should account this as a thp split since those
> are treated by callers as failures.  So maybe this?

I think we need to make the stats match the code behavior, otherwise a
userspace caller can be confused by the results: only a subset of a folio
is migrated, yet the split and failure stats are not bumped accordingly.
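
For example, something along these lines on top of your hunk (a rough,
untested sketch that also bumps the existing split counters):

	if (nr_pages > 2 && !list_empty(&folio->_deferred_list)) {
		if (!try_split_folio(folio, from)) {
			/* Record the split so callers can see why the
			 * THP was broken up rather than migrated whole. */
			if (is_thp)
				stats->nr_thp_split++;
			stats->nr_split++;
			is_large = is_thp = false;
			nr_pages = 1;
		}
	}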

>
> +++ b/mm/migrate.c
> @@ -1652,6 +1652,17 @@ static int migrate_pages_batch(struct list_head *from,
>
>                         cond_resched();
>
> +                       /*
> +                        * The rare folio on the deferred split list should
> +                        * be split now.  It should not count as a failure.
> +                        */
> +                       if (nr_pages > 2 &&
> +                           !list_empty(&folio->_deferred_list)) {
> +                               if (try_split_folio(folio, from) == 0) {
> +                                       is_large = is_thp = false;
> +                                       nr_pages = 1;
> +                               }
> +                       }
>                         /*
>                          * Large folio migration might be unsupported or
>                          * the allocation might be failed so we should retry


--
Best Regards,
Yan, Zi

Patch

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index b40bd9f3ead5..c77cedf45f3a 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -766,28 +766,6 @@  pmd_t maybe_pmd_mkwrite(pmd_t pmd, struct vm_area_struct *vma)
 	return pmd;
 }
 
-#ifdef CONFIG_MEMCG
-static inline
-struct deferred_split *get_deferred_split_queue(struct folio *folio)
-{
-	struct mem_cgroup *memcg = folio_memcg(folio);
-	struct pglist_data *pgdat = NODE_DATA(folio_nid(folio));
-
-	if (memcg)
-		return &memcg->deferred_split_queue;
-	else
-		return &pgdat->deferred_split_queue;
-}
-#else
-static inline
-struct deferred_split *get_deferred_split_queue(struct folio *folio)
-{
-	struct pglist_data *pgdat = NODE_DATA(folio_nid(folio));
-
-	return &pgdat->deferred_split_queue;
-}
-#endif
-
 void folio_prep_large_rmappable(struct folio *folio)
 {
 	if (!folio || !folio_test_large(folio))
diff --git a/mm/internal.h b/mm/internal.h
index 85c3db43454d..56cf2051cb88 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -1106,6 +1106,31 @@  struct page *follow_trans_huge_pmd(struct vm_area_struct *vma,
 				   unsigned long addr, pmd_t *pmd,
 				   unsigned int flags);
 
+#ifdef CONFIG_TRANSPARENT_HUGEPAGE
+#ifdef CONFIG_MEMCG
+static inline
+struct deferred_split *get_deferred_split_queue(struct folio *folio)
+{
+	struct mem_cgroup *memcg = folio_memcg(folio);
+	struct pglist_data *pgdat = NODE_DATA(folio_nid(folio));
+
+	if (memcg)
+		return &memcg->deferred_split_queue;
+	else
+		return &pgdat->deferred_split_queue;
+}
+#else
+static inline
+struct deferred_split *get_deferred_split_queue(struct folio *folio)
+{
+	struct pglist_data *pgdat = NODE_DATA(folio_nid(folio));
+
+	return &pgdat->deferred_split_queue;
+}
+#endif
+#endif
+
+
 /*
  * mm/mmap.c
  */
diff --git a/mm/migrate.c b/mm/migrate.c
index 73a052a382f1..e80cb0f46342 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -1654,25 +1654,65 @@  static int migrate_pages_batch(struct list_head *from,
 
 			/*
 			 * Large folio migration might be unsupported or
-			 * the allocation might be failed so we should retry
-			 * on the same folio with the large folio split
+			 * the folio is on deferred split list so we should
+			 * retry on the same folio with the large folio split
 			 * to normal folios.
 			 *
 			 * Split folios are put in split_folios, and
 			 * we will migrate them after the rest of the
 			 * list is processed.
 			 */
-			if (!thp_migration_supported() && is_thp) {
-				nr_failed++;
-				stats->nr_thp_failed++;
-				if (!try_split_folio(folio, split_folios)) {
-					stats->nr_thp_split++;
-					stats->nr_split++;
+			if (is_thp) {
+				bool is_on_deferred_list = false;
+
+#ifdef CONFIG_TRANSPARENT_HUGEPAGE
+				/*
+				 * Check without taking split_queue_lock to
+				 * reduce locking overheads. The worst case is
+				 * that if the folio is put on the deferred
+				 * split list after the check, it will be
+				 * migrated and not put back on the list.
+				 * The migrated folio will not be split
+				 * via shrinker during memory pressure.
+				 */
+				if (!data_race(list_empty(&folio->_deferred_list))) {
+					struct deferred_split *ds_queue;
+					unsigned long flags;
+
+					ds_queue =
+						get_deferred_split_queue(folio);
+					spin_lock_irqsave(&ds_queue->split_queue_lock,
+							  flags);
+					/*
+					 * Only check if the folio is on
+					 * deferred split list without removing
+					 * it. Since the folio can be on
+					 * deferred_split_scan() local list and
+					 * removing it can cause the local list
+					 * corruption. Folio split process
+					 * below can handle it with the help of
+					 * folio_ref_freeze().
+					 */
+					is_on_deferred_list =
+						!list_empty(&folio->_deferred_list);
+					spin_unlock_irqrestore(&ds_queue->split_queue_lock,
+							       flags);
+				}
+#endif
+				if (!thp_migration_supported() ||
+						is_on_deferred_list) {
+					nr_failed++;
+					stats->nr_thp_failed++;
+					if (!try_split_folio(folio,
+							     split_folios)) {
+						stats->nr_thp_split++;
+						stats->nr_split++;
+						continue;
+					}
+					stats->nr_failed_pages += nr_pages;
+					list_move_tail(&folio->lru, ret_folios);
 					continue;
 				}
-				stats->nr_failed_pages += nr_pages;
-				list_move_tail(&folio->lru, ret_folios);
-				continue;
 			}
 
 			rc = migrate_folio_unmap(get_new_folio, put_new_folio,