Message ID | 20240617143926.1511227-1-zi.yan@sent.com (mailing list archive) |
---|---|
State | New |
Headers | show |
Series | mm/migrate: fix BUG_ON in migrate_misplaced_folio() and compact_zone() | expand |
On Mon, 17 Jun 2024 10:39:25 -0400 Zi Yan <zi.yan@sent.com> wrote: > From: Zi Yan <ziy@nvidia.com> > > Both Hugh Dickins and kernel test bot hit an issue that migrate_pages() > reports success but still leaves unmigrated pages on from list. It > triggers BUG_ON in migrate_misplaced_folio() and compact_zone(). > > Our convention is that if migrate_pages() reports complete success (0), > then the migratepages list will be empty; but if it reports an error or > some pages remaining, then its caller must putback_movable_pages(). > > There's a new case in which migrate_pages() has been reporting complete > success, but returning with pages left on the migratepages list: when > migrate_pages_batch() successfully split a folio on the deferred list, > but then the "Failure isn't counted" call does not dispose of them all. > > Since that block is expecting the large folio to have been counted as 1 > failure already, and since the return code is later adjusted to success > whenever the returned list is found empty, the simple way to fix this > safely is to count splitting the deferred folio as "a failure". > > This patch is based on Hugh's fix and suggestions from Huang, Ying. > I have just sent Hugh's version in to Linus. So if/when that is merged, please send along any needed updates.
diff --git a/mm/migrate.c b/mm/migrate.c index dd04f578c19c..30f9a61598ea 100644 --- a/mm/migrate.c +++ b/mm/migrate.c @@ -1654,7 +1654,16 @@ static int migrate_pages_batch(struct list_head *from, /* * The rare folio on the deferred split list should - * be split now. It should not count as a failure. + * be split now. It should not count as a failure: + * but increment nr_failed, because, without + * doing so, migrate_pages() may report success with + * (split but unmigrated) pages still on its fromlist; + * whereas it always reports success when its fromlist + * is empty. stats->nr_thp_failed should be increased + * too, otherwise stats inconsistency will happen when + * migrate_pages_batch is called via migrate_pages() + * with MIGRATE_SYNC and MIGRATE_ASYNC. + * * Only check it without removing it from the list. * Since the folio can be on deferred_split_scan() * local list and removing it can cause the local list @@ -1669,6 +1678,8 @@ static int migrate_pages_batch(struct list_head *from, if (nr_pages > 2 && !list_empty(&folio->_deferred_list)) { if (try_split_folio(folio, split_folios) == 0) { + nr_failed++; + stats->nr_thp_failed += is_thp; stats->nr_thp_split += is_thp; stats->nr_split++; continue;