Message ID | 1735981122-2085-1-git-send-email-yangge1116@126.com |
---|---|
State | New |
Series | mm: compaction: skip memory compaction when there are not enough migratable pages |
Hi,

kernel test robot noticed the following build warnings:

[auto build test WARNING on akpm-mm/mm-everything]

url:    https://github.com/intel-lab-lkp/linux/commits/yangge1116-126-com/mm-compaction-skip-memory-compaction-when-there-are-not-enough-migratable-pages/20250104-170112
base:   https://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm.git mm-everything
patch link:    https://lore.kernel.org/r/1735981122-2085-1-git-send-email-yangge1116%40126.com
patch subject: [PATCH] mm: compaction: skip memory compaction when there are not enough migratable pages
config: i386-buildonly-randconfig-001-20250104 (https://download.01.org/0day-ci/archive/20250104/202501041908.jDpLhAgL-lkp@intel.com/config)
compiler: clang version 19.1.3 (https://github.com/llvm/llvm-project ab51eccf88f5321e7c60591c5546b254b6afab99)
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20250104/202501041908.jDpLhAgL-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202501041908.jDpLhAgL-lkp@intel.com/

All warnings (new ones prefixed by >>):

   In file included from mm/compaction.c:15:
   include/linux/mm_inline.h:47:41: warning: arithmetic between different enumeration types ('enum node_stat_item' and 'enum lru_list') [-Wenum-enum-conversion]
      47 |         __mod_lruvec_state(lruvec, NR_LRU_BASE + lru, nr_pages);
         |                                    ~~~~~~~~~~~ ^ ~~~
   include/linux/mm_inline.h:49:22: warning: arithmetic between different enumeration types ('enum zone_stat_item' and 'enum lru_list') [-Wenum-enum-conversion]
      49 |                                 NR_ZONE_LRU_BASE + lru, nr_pages);
         |                                 ~~~~~~~~~~~~~~~~ ^ ~~~
>> mm/compaction.c:2386:13: warning: unused variable 'pgdat' [-Wunused-variable]
    2386 |         pg_data_t *pgdat = zone->zone_pgdat;
         |                    ^~~~~
   3 warnings generated.
vim +/pgdat +2386 mm/compaction.c

  2381
  2382  static bool __compaction_suitable(struct zone *zone, int order,
  2383                                    int highest_zoneidx,
  2384                                    unsigned long wmark_target)
  2385  {
> 2386          pg_data_t *pgdat = zone->zone_pgdat;
  2387          unsigned long sum, nr_pinned;
  2388          unsigned long watermark;
  2389
  2390          sum = node_page_state(pgdat, NR_INACTIVE_FILE) +
  2391                  node_page_state(pgdat, NR_INACTIVE_ANON) +
  2392                  node_page_state(pgdat, NR_ACTIVE_FILE) +
  2393                  node_page_state(pgdat, NR_ACTIVE_ANON);
  2394
  2395          nr_pinned = node_page_state(pgdat, NR_FOLL_PIN_ACQUIRED) -
  2396                  node_page_state(pgdat, NR_FOLL_PIN_RELEASED);
  2397
  2398          /*
  2399           * Gup-pinned pages are non-migratable. After subtracting these pages,
  2400           * we need to check if the remaining pages are sufficient for memory
  2401           * compaction.
  2402           */
  2403          if ((sum - nr_pinned) < (1 << order))
  2404                  return false;
  2405
  2406          /*
  2407           * Watermarks for order-0 must be met for compaction to be able to
  2408           * isolate free pages for migration targets. This means that the
  2409           * watermark and alloc_flags have to match, or be more pessimistic than
  2410           * the check in __isolate_free_page(). We don't use the direct
  2411           * compactor's alloc_flags, as they are not relevant for freepage
  2412           * isolation. We however do use the direct compactor's highest_zoneidx
  2413           * to skip over zones where lowmem reserves would prevent allocation
  2414           * even if compaction succeeds.
  2415           * For costly orders, we require low watermark instead of min for
  2416           * compaction to proceed to increase its chances.
  2417           * ALLOC_CMA is used, as pages in CMA pageblocks are considered
  2418           * suitable migration targets
  2419           */
  2420          watermark = (order > PAGE_ALLOC_COSTLY_ORDER) ?
  2421                                  low_wmark_pages(zone) : min_wmark_pages(zone);
  2422          watermark += compact_gap(order);
  2423          return __zone_watermark_ok(zone, 0, watermark, highest_zoneidx,
  2424                                     ALLOC_CMA, wmark_target);
  2425  }
  2426
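A likely explanation for the new warning, assuming this i386 randconfig leaves CONFIG_NUMA disabled: as far as I can tell, include/linux/vmstat.h defines node_page_state(node, item) for !CONFIG_NUMA builds as a macro that expands to global_node_page_state(item) and drops the node argument. Every use of pgdat then vanishes during preprocessing, and clang reports the local as unused. The userspace sketch below reproduces the pattern; the struct layouts, the vm_node_stat array and the anon_lru_pages() helper are simplified or hypothetical stand-ins, not kernel code. It shows one way to avoid the warning, passing zone->zone_pgdat at each call site instead of caching it in a local.

```c
/* Simplified stand-ins for kernel types (hypothetical, for illustration). */
enum node_stat_item { NR_INACTIVE_ANON, NR_ACTIVE_ANON, NR_VM_NODE_STAT_ITEMS };

struct pglist_data {
	unsigned long vm_stat[NR_VM_NODE_STAT_ITEMS];	/* per-node counters */
};

struct zone {
	struct pglist_data *zone_pgdat;
};

/* System-wide counters; a stand-in for the kernel's global node stats. */
static unsigned long vm_node_stat[NR_VM_NODE_STAT_ITEMS];

static unsigned long global_node_page_state(enum node_stat_item item)
{
	return vm_node_stat[item];
}

#ifdef CONFIG_NUMA
static unsigned long node_page_state(struct pglist_data *pgdat,
				     enum node_stat_item item)
{
	return pgdat->vm_stat[item];
}
#else
/*
 * Single-node build: the macro discards its first argument. A local
 * 'pg_data_t *pgdat' that is only ever passed here is therefore never
 * referenced after preprocessing, which -Wunused-variable reports.
 */
#define node_page_state(node, item) global_node_page_state(item)
#endif

/*
 * Warning-free shape (hypothetical helper): pass zone->zone_pgdat at the
 * call site, so no local variable is left behind when the macro drops it.
 */
static unsigned long anon_lru_pages(struct zone *zone)
{
	return node_page_state(zone->zone_pgdat, NR_INACTIVE_ANON) +
	       node_page_state(zone->zone_pgdat, NR_ACTIVE_ANON);
}
```

In a !CONFIG_NUMA build the per-node counters are never read, so anon_lru_pages() returns the global totals regardless of which zone is passed. In kernel code, annotating the local with __maybe_unused would be an alternative fix.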
On 2025/1/4 16:58, yangge1116@126.com wrote:
> From: yangge <yangge1116@126.com>
>
> There are 4 NUMA nodes on my machine, and each NUMA node has 32GB
> of memory. I have configured 16GB of CMA memory on each NUMA node,
> and starting a 32GB virtual machine with device passthrough is
> extremely slow, taking almost an hour.
>
> During the start-up of the virtual machine, it will call
> pin_user_pages_remote(..., FOLL_LONGTERM, ...) to allocate memory.
> Long-term GUP cannot allocate memory from the CMA area, so at most
> 16GB of non-CMA memory on a NUMA node can be used as virtual machine
> memory. There is 16GB of free CMA memory on a NUMA node, which is
> sufficient to pass the order-0 watermark check, causing the
> __compaction_suitable() function to consistently return true.
> However, if there aren't enough migratable pages available, performing
> memory compaction is pointless. Besides checking whether
> the order-0 watermark is met, __compaction_suitable() also needs
> to determine whether there are sufficient migratable pages available
> for memory compaction.
>
> For costly allocations, because __compaction_suitable() always
> returns true, __alloc_pages_slowpath() can't exit at the appropriate
> place, resulting in excessively long virtual machine startup times.
> Call trace:
> __alloc_pages_slowpath
>     if (compact_result == COMPACT_SKIPPED ||
>         compact_result == COMPACT_DEFERRED)
>         goto nopage; // should exit __alloc_pages_slowpath() from here
>
> When the 16GB of non-CMA memory on a single node is exhausted, we will
> fall back to allocating memory on other nodes. To fall back to remote
> nodes quickly, we should skip memory compaction when migratable pages
> are insufficient. After this fix, it only takes a few tens of seconds
> to start a 32GB virtual machine with device passthrough functionality.
>
> Signed-off-by: yangge <yangge1116@126.com>
> ---
>  mm/compaction.c | 19 +++++++++++++++++++
>  1 file changed, 19 insertions(+)
>
> diff --git a/mm/compaction.c b/mm/compaction.c
> index 07bd227..1c469b3 100644
> --- a/mm/compaction.c
> +++ b/mm/compaction.c
> @@ -2383,7 +2383,26 @@ static bool __compaction_suitable(struct zone *zone, int order,
>                                     int highest_zoneidx,
>                                     unsigned long wmark_target)
>  {
> +        pg_data_t *pgdat = zone->zone_pgdat;
> +        unsigned long sum, nr_pinned;
>          unsigned long watermark;
> +
> +        sum = node_page_state(pgdat, NR_INACTIVE_FILE) +
> +                node_page_state(pgdat, NR_INACTIVE_ANON) +
> +                node_page_state(pgdat, NR_ACTIVE_FILE) +
> +                node_page_state(pgdat, NR_ACTIVE_ANON);
> +
> +        nr_pinned = node_page_state(pgdat, NR_FOLL_PIN_ACQUIRED) -
> +                node_page_state(pgdat, NR_FOLL_PIN_RELEASED);
> +
> +        /*
> +         * Gup-pinned pages are non-migratable. After subtracting these pages,
> +         * we need to check if the remaining pages are sufficient for memory
> +         * compaction.
> +         */
> +        if ((sum - nr_pinned) < (1 << order))
> +                return false;
> +

IMO, using the node's statistics to determine whether the zone is
suitable for compaction doesn't make sense. It is possible that even
though the normal zone has long-term pinned pages, the movable zone can
still be suitable for compaction.
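Two properties of the posted check are easy to see in a userspace model. First, every input is a node-wide counter, so the result is identical for every zone on the node (ZONE_NORMAL and ZONE_MOVABLE alike), which is the granularity concern raised above. Second, sum and nr_pinned are unsigned long, so if nr_pinned ever exceeds sum the subtraction wraps around to a huge value and the check lets compaction proceed instead of skipping it. Whether that can happen in practice is an assumption here: as I understand it, NR_FOLL_PIN_ACQUIRED is incremented per pinned reference, so a page pinned twice counts twice, and pinned pages need not sit on the four LRU lists being summed. struct node_counts and all functions below are hypothetical.

```c
#include <stdbool.h>

/*
 * Hypothetical model of the inputs to the patch's check. All fields are
 * node-wide counters; nothing here identifies a zone.
 */
struct node_counts {
	unsigned long inactive_anon, active_anon;
	unsigned long inactive_file, active_file;
	unsigned long foll_pin_acquired, foll_pin_released;
};

static unsigned long lru_sum(const struct node_counts *n)
{
	return n->inactive_file + n->inactive_anon +
	       n->active_file + n->active_anon;
}

static unsigned long pinned(const struct node_counts *n)
{
	return n->foll_pin_acquired - n->foll_pin_released;
}

/* Same expression as the patch; true means "do not skip compaction". */
static bool check_as_posted(const struct node_counts *n, int order)
{
	/* Unsigned arithmetic: if pinned() > lru_sum(), this wraps around. */
	return !((lru_sum(n) - pinned(n)) < (1 << order));
}

/* Guarded variant: more pins than LRU pages means nothing left to move. */
static bool check_guarded(const struct node_counts *n, int order)
{
	unsigned long sum = lru_sum(n);
	unsigned long nr_pinned = pinned(n);

	if (nr_pinned >= sum)
		return false;
	return sum - nr_pinned >= (1UL << order);
}
```

With 100 LRU pages and 150 outstanding pins, check_as_posted() returns true because of the wraparound, while check_guarded() returns false. As far as I know, the FOLL_PIN counters exist only as node statistics, so a guard fixes the arithmetic but not the per-zone granularity question.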
On 2025/1/6 16:12, Baolin Wang wrote:
>
>
> On 2025/1/4 16:58, yangge1116@126.com wrote:
>> From: yangge <yangge1116@126.com>
>> [...]
>
> IMO, using the node's statistics to determine whether the zone is
> suitable for compaction doesn't make sense. It is possible that even
> though the normal zone has long-term pinned pages, the movable zone can
> still be suitable for compaction.

If all the memory used on a node is pinned, then this memory cannot be
migrated anymore, and memory compaction operations would not succeed.
I haven't used the movable zone before; can you explain why memory
compaction is still necessary? Thank you.
On 2025/1/6 16:49, Ge Yang wrote:
>
>
> On 2025/1/6 16:12, Baolin Wang wrote:
>>
>>
>> On 2025/1/4 16:58, yangge1116@126.com wrote:
>>> From: yangge <yangge1116@126.com>
>>> [...]
>>
>> IMO, using the node's statistics to determine whether the zone is
>> suitable for compaction doesn't make sense. It is possible that even
>> though the normal zone has long-term pinned pages, the movable zone
>> can still be suitable for compaction.
> If all the memory used on a node is pinned, then this memory cannot be
> migrated anymore, and memory compaction operations would not succeed.
> I haven't used the movable zone before; can you explain why memory
> compaction is still necessary? Thank you.

Please consider unevictable folios that are not in the active/inactive
file/anon LRU lists, yet can still be migrated.
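The point about unevictable folios can be illustrated by summing every LRU list rather than only the four the patch reads. The kernel's enum lru_list also contains LRU_UNEVICTABLE (with a matching NR_UNEVICTABLE node counter), which holds, for example, mlocked folios; compaction can still isolate and migrate these unless the vm.compact_unevictable_allowed sysctl is set to 0. The sketch below reuses the kernel's list names and order, but struct node_lru and both helpers are hypothetical.

```c
/* Same list names and order as the kernel's enum lru_list. */
enum lru_list {
	LRU_INACTIVE_ANON,
	LRU_ACTIVE_ANON,
	LRU_INACTIVE_FILE,
	LRU_ACTIVE_FILE,
	LRU_UNEVICTABLE,
	NR_LRU_LISTS
};

/* Hypothetical per-node LRU sizes in pages, indexed by enum lru_list. */
struct node_lru {
	unsigned long nr_pages[NR_LRU_LISTS];
};

/* What the patch counts: the active/inactive anon and file lists only. */
static unsigned long lru_pages_evictable(const struct node_lru *n)
{
	unsigned long sum = 0;

	for (int lru = LRU_INACTIVE_ANON; lru <= LRU_ACTIVE_FILE; lru++)
		sum += n->nr_pages[lru];
	return sum;
}

/* Every LRU list, including LRU_UNEVICTABLE (e.g. mlocked folios). */
static unsigned long lru_pages_all(const struct node_lru *n)
{
	unsigned long sum = 0;

	for (int lru = 0; lru < NR_LRU_LISTS; lru++)
		sum += n->nr_pages[lru];
	return sum;
}
```

For a node whose only in-use memory is 4096 mlocked pages, lru_pages_evictable() reports 0, so the posted check would skip an order-9 compaction, while lru_pages_all() reports 4096 migratable candidates.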
On 2025/1/8 10:50, Baolin Wang wrote:
>
>
> On 2025/1/6 16:49, Ge Yang wrote:
>>
>>
>> On 2025/1/6 16:12, Baolin Wang wrote:
>>>
>>>
>>> On 2025/1/4 16:58, yangge1116@126.com wrote:
>>>> From: yangge <yangge1116@126.com>
>>>> [...]
>>>
>>> IMO, using the node's statistics to determine whether the zone is
>>> suitable for compaction doesn't make sense. It is possible that even
>>> though the normal zone has long-term pinned pages, the movable zone
>>> can still be suitable for compaction.
>> If all the memory used on a node is pinned, then this memory cannot be
>> migrated anymore, and memory compaction operations would not succeed.
>> I haven't used the movable zone before; can you explain why memory
>> compaction is still necessary? Thank you.
>
> Please consider unevictable folios that are not in the active/inactive
> file/anon LRU lists, yet can still be migrated.

Ok, thanks.
diff --git a/mm/compaction.c b/mm/compaction.c
index 07bd227..1c469b3 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -2383,7 +2383,26 @@ static bool __compaction_suitable(struct zone *zone, int order,
                                   int highest_zoneidx,
                                   unsigned long wmark_target)
 {
+        pg_data_t *pgdat = zone->zone_pgdat;
+        unsigned long sum, nr_pinned;
         unsigned long watermark;
+
+        sum = node_page_state(pgdat, NR_INACTIVE_FILE) +
+                node_page_state(pgdat, NR_INACTIVE_ANON) +
+                node_page_state(pgdat, NR_ACTIVE_FILE) +
+                node_page_state(pgdat, NR_ACTIVE_ANON);
+
+        nr_pinned = node_page_state(pgdat, NR_FOLL_PIN_ACQUIRED) -
+                node_page_state(pgdat, NR_FOLL_PIN_RELEASED);
+
+        /*
+         * Gup-pinned pages are non-migratable. After subtracting these pages,
+         * we need to check if the remaining pages are sufficient for memory
+         * compaction.
+         */
+        if ((sum - nr_pinned) < (1 << order))
+                return false;
+
         /*
          * Watermarks for order-0 must be met for compaction to be able to
          * isolate free pages for migration targets. This means that the
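The commit message's claim that 16GB of free CMA memory keeps the existing order-0 watermark test passing can be checked with rough arithmetic. compact_gap(order) is 2UL << order in mm/internal.h, and __compaction_suitable() passes ALLOC_CMA, so free CMA pages count toward the watermark even though a FOLL_LONGTERM allocation cannot use them. The model below simplifies __zone_watermark_ok() heavily (a single zone, lowmem_reserve ignored, only free CMA pages treated as unusable without ALLOC_CMA); order 9 (a 2MB THP with 4KB pages) and the watermark values in the usage example are assumptions for illustration, not figures from the report.

```c
#include <stdbool.h>

#define PAGE_ALLOC_COSTLY_ORDER 3

/* Same formula as compact_gap() in mm/internal.h. */
static unsigned long compact_gap(unsigned int order)
{
	return 2UL << order;
}

/*
 * Hypothetical, heavily simplified model of the order-0 watermark test at
 * the end of __compaction_suitable(). Returns true if compaction would be
 * considered suitable on watermark grounds alone.
 */
static bool order0_watermark_ok(unsigned long nr_free, unsigned long nr_free_cma,
				unsigned long min_wmark, unsigned long low_wmark,
				unsigned int order, bool alloc_cma)
{
	/* Costly orders use the low watermark, others the min watermark. */
	unsigned long mark = (order > PAGE_ALLOC_COSTLY_ORDER) ? low_wmark : min_wmark;
	unsigned long usable;

	/* Without ALLOC_CMA, free CMA pages do not count as usable. */
	if (alloc_cma)
		usable = nr_free;
	else
		usable = nr_free > nr_free_cma ? nr_free - nr_free_cma : 0;

	mark += compact_gap(order);
	/* __zone_watermark_ok() fails when free pages <= mark. */
	return usable > mark;
}
```

With 4194304 free pages (16GB at 4KB per page), all of them in CMA pageblocks, an order-9 request needs only low_wmark + 1024 usable pages, so the test passes with ALLOC_CMA and fails without it. That mismatch is why __compaction_suitable() keeps returning true in the scenario described in the commit message.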