[RFC,v2,0/7] Use pageblock_order for cma and alloc_contig_range alignment.

Message ID 20211209230414.2766515-1-zi.yan@sent.com (mailing list archive)

Message

Zi Yan Dec. 9, 2021, 11:04 p.m. UTC
From: Zi Yan <ziy@nvidia.com>

Hi all,

This patchset tries to remove the MAX_ORDER - 1 alignment requirement for CMA
and alloc_contig_range(). It prepares for my upcoming changes to make MAX_ORDER
adjustable at boot time[1].

The MAX_ORDER - 1 alignment requirement exists because alloc_contig_range()
isolates pageblocks to remove free memory from the buddy allocator, but isolating
only a subset of the pageblocks within a page that spans multiple pageblocks
causes free page accounting issues. An isolated page might not be put into the
right free list, since the code assumes the migratetype of the first pageblock
is the migratetype of the whole free page. This is based on the discussion at [2].
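
To make the accounting problem concrete, here is a minimal userspace sketch
(not kernel code). Every name and constant in it is an assumption chosen for
illustration: a pageblock order of 9, a MAX_ORDER - 1 page of order 10 (two
pageblocks per largest free page), and hypothetical helpers such as
freelist_migratetype() and free_page_is_uniform().

```c
#include <stdbool.h>
#include <stddef.h>

/* Illustrative values only: two pageblocks per MAX_ORDER - 1 page. */
#define PAGEBLOCK_ORDER	9
#define MAX_ORDER_M1	10
#define PAGEBLOCK_NR	(1UL << PAGEBLOCK_ORDER)

enum migratetype { MT_UNMOVABLE, MT_MOVABLE, MT_ISOLATE };

/* One migratetype per pageblock, indexed by pfn >> PAGEBLOCK_ORDER. */
static enum migratetype pageblock_mt[4] = {
	MT_MOVABLE, MT_MOVABLE, MT_MOVABLE, MT_MOVABLE
};

static enum migratetype get_pageblock_mt(unsigned long pfn)
{
	return pageblock_mt[pfn >> PAGEBLOCK_ORDER];
}

/*
 * Models the assumption described above: a free page is filed under the
 * migratetype of its FIRST pageblock, whatever its order.
 */
static enum migratetype freelist_migratetype(unsigned long pfn)
{
	return get_pageblock_mt(pfn);
}

/* True if every pageblock covered by the free page has the same type. */
static bool free_page_is_uniform(unsigned long pfn, unsigned int order)
{
	unsigned long end = pfn + (1UL << order);
	unsigned long p;

	for (p = pfn; p < end; p += PAGEBLOCK_NR)
		if (get_pageblock_mt(p) != get_pageblock_mt(pfn))
			return false;
	return true;
}
```

If only the second pageblock of an order-10 free page at pfn 0 is marked
MT_ISOLATE, freelist_migratetype(0) still reports MT_MOVABLE while
free_page_is_uniform(0, MAX_ORDER_M1) is false, which is the mismatch the
paragraph above describes.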

To remove the requirement, this patchset:
1. still isolates pageblocks at MAX_ORDER - 1 granularity;
2. but saves the pageblock migratetypes outside the specified range of
   alloc_contig_range() and restores them after all pages within the range
   become free after __alloc_contig_migrate_range();
3. splits free pages spanning multiple pageblocks at the beginning and the end
   of the range and puts the split pages to the right migratetype free lists
   based on the pageblock migratetypes;
4. returns pages not in the range as it did before this patch.
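
Steps 1 and 2 above can be sketched in userspace as follows. This is only a
model under stated assumptions, not the code in the patches: the array
pageblock_mt[], the helpers isolate_range() and restore_outside(), and the
orders 9 and 10 are all hypothetical, and the requested range is assumed to
be pageblock aligned.

```c
#include <stddef.h>

#define PAGEBLOCK_ORDER	9
#define MAX_ORDER_M1	10
#define PAGEBLOCK_NR	(1UL << PAGEBLOCK_ORDER)
#define MAX_ORDER_NR	(1UL << MAX_ORDER_M1)
#define NR_PAGEBLOCKS	8

enum migratetype { MT_UNMOVABLE, MT_MOVABLE, MT_ISOLATE };

/* One migratetype per pageblock, indexed by pfn >> PAGEBLOCK_ORDER. */
static enum migratetype pageblock_mt[NR_PAGEBLOCKS];

static unsigned long align_down(unsigned long x, unsigned long a)
{
	return x & ~(a - 1);
}

static unsigned long align_up(unsigned long x, unsigned long a)
{
	return (x + a - 1) & ~(a - 1);
}

/*
 * Isolate [start, end) at MAX_ORDER - 1 granularity. The previous type of
 * every pageblock in the aligned range is recorded in saved[].
 */
static void isolate_range(unsigned long start, unsigned long end,
			  enum migratetype *saved)
{
	unsigned long pfn;

	for (pfn = align_down(start, MAX_ORDER_NR);
	     pfn < align_up(end, MAX_ORDER_NR); pfn += PAGEBLOCK_NR) {
		size_t idx = pfn >> PAGEBLOCK_ORDER;

		saved[idx] = pageblock_mt[idx];
		pageblock_mt[idx] = MT_ISOLATE;
	}
}

/*
 * Once the pages in [start, end) are free, restore the saved type of each
 * pageblock that was isolated only because of the alignment.
 */
static void restore_outside(unsigned long start, unsigned long end,
			    const enum migratetype *saved)
{
	unsigned long pfn;

	for (pfn = align_down(start, MAX_ORDER_NR);
	     pfn < align_up(end, MAX_ORDER_NR); pfn += PAGEBLOCK_NR) {
		size_t idx = pfn >> PAGEBLOCK_ORDER;

		if (pfn < start || pfn >= end)
			pageblock_mt[idx] = saved[idx];
	}
}
```

For a request covering only the second pageblock, [512, 1024), both
pageblocks 0 and 1 are isolated first; after restore_outside() only
pageblock 1 stays isolated and pageblock 0 gets its original type back.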

Isolation needs to happen at MAX_ORDER - 1 granularity, because otherwise either
1) extra code is needed to detect pages (free, PageHuge, THP, or PageCompound)
so that all pageblocks belonging to a single page are isolated together, and
pageblocks outside the range later need their migratetypes restored;
or 2) extra logic needs to be added at page free time to split a free
page whose pageblocks have different migratetypes.

Two optimizations might come later:
1. only check for unmovable pages within the range instead of the MAX_ORDER - 1
   aligned range during isolation, to increase the success rate of
   alloc_contig_range().
2. make MIGRATE_ISOLATE a separate bit to avoid saving and restoring existing
   migratetypes before and after isolation respectively.
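
A rough userspace sketch of the second idea follows. The bit layout and the
helper names (MT_MASK, MT_ISOLATE_BIT, set_isolated(), clear_isolated(),
is_isolated(), get_mt()) are assumptions made up for this example and do not
describe the kernel's actual pageblock flags. The point is only that a
separate bit leaves the original migratetype untouched, so nothing has to be
saved or restored.

```c
#include <stdbool.h>
#include <stdint.h>

#define MT_MASK		0x7u	/* low bits hold the migratetype */
#define MT_ISOLATE_BIT	0x8u	/* hypothetical separate isolation flag */

enum migratetype { MT_UNMOVABLE = 0, MT_MOVABLE = 1, MT_RECLAIMABLE = 2 };

/* Mark a pageblock isolated without overwriting its migratetype. */
static inline void set_isolated(uint8_t *flags)
{
	*flags |= MT_ISOLATE_BIT;
}

/* Undo isolation; the migratetype bits were never modified. */
static inline void clear_isolated(uint8_t *flags)
{
	*flags &= (uint8_t)~MT_ISOLATE_BIT;
}

static inline bool is_isolated(uint8_t flags)
{
	return (flags & MT_ISOLATE_BIT) != 0;
}

static inline enum migratetype get_mt(uint8_t flags)
{
	return (enum migratetype)(flags & MT_MASK);
}
```

With this layout, get_mt() returns the same migratetype before, during, and
after isolation.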

Feel free to give comments and suggestions. Thanks.


[1] https://lore.kernel.org/linux-mm/20210805190253.2795604-1-zi.yan@sent.com/
[2] https://lore.kernel.org/linux-mm/d19fb078-cb9b-f60f-e310-fdeea1b947d2@redhat.com/


Zi Yan (7):
  mm: page_alloc: avoid merging non-fallbackable pageblocks with others.
  mm: compaction: handle non-lru compound pages properly in
    isolate_migratepages_block().
  mm: migrate: allocate the right size of non hugetlb or THP compound
    pages.
  mm: make alloc_contig_range work at pageblock granularity
  mm: cma: use pageblock_order as the single alignment
  drivers: virtio_mem: use pageblock size as the minimum virtio_mem
    size.
  arch: powerpc: adjust fadump alignment to be pageblock aligned.

 arch/powerpc/include/asm/fadump-internal.h |   4 +-
 drivers/virtio/virtio_mem.c                |   6 +-
 include/linux/mmzone.h                     |  11 +-
 kernel/dma/contiguous.c                    |   2 +-
 mm/cma.c                                   |   6 +-
 mm/compaction.c                            |  10 +-
 mm/migrate.c                               |   8 +-
 mm/page_alloc.c                            | 203 +++++++++++++++++----
 8 files changed, 196 insertions(+), 54 deletions(-)

Comments

Eric Ren Dec. 10, 2021, 7:30 a.m. UTC | #1
Hi Zi Yan,

On 2021/12/10 07:04, Zi Yan wrote:
> To remove the requirement, this patchset:
> 1. still isolates pageblocks at MAX_ORDER - 1 granularity;
Then, unplug fails if either pageblock of the MAX_ORDER - 1 page has an
unmovable page, right?

Thanks,
Eric
Zi Yan Dec. 10, 2021, 3:30 p.m. UTC | #2
On 10 Dec 2021, at 2:30, Eric Ren wrote:

> Hi Zi Yan,
>
> On 2021/12/10 07:04, Zi Yan wrote:
>> To remove the requirement, this patchset:
>> 1. still isolates pageblocks at MAX_ORDER - 1 granularity;
> Then, unplug fails if either pageblock of the MAX_ORDER - 1 page has an unmovable page, right?

Right. One of the optimizations mentioned targets this by passing the actual
range instead of the MAX_ORDER - 1 aligned range, so that has_unmovable_pages()
will not give false positives, minimizing the isolation failure rate.
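
As a hedged illustration of the difference, the userspace sketch below
compares checking the aligned range with checking only the requested range.
It does not reproduce has_unmovable_pages(); the array
pageblock_has_unmovable[] and the helpers range_has_unmovable(),
can_isolate_aligned() and can_isolate_exact() are hypothetical, as are the
orders 9 and 10.

```c
#include <stdbool.h>

#define PAGEBLOCK_ORDER	9
#define MAX_ORDER_M1	10
#define PAGEBLOCK_NR	(1UL << PAGEBLOCK_ORDER)
#define MAX_ORDER_NR	(1UL << MAX_ORDER_M1)
#define NR_PAGEBLOCKS	8

/* true if the pageblock holds at least one unmovable allocation */
static bool pageblock_has_unmovable[NR_PAGEBLOCKS];

/* Scan [start, end) one pageblock at a time; both must be pageblock aligned. */
static bool range_has_unmovable(unsigned long start, unsigned long end)
{
	unsigned long pfn;

	for (pfn = start; pfn < end; pfn += PAGEBLOCK_NR)
		if (pageblock_has_unmovable[pfn >> PAGEBLOCK_ORDER])
			return true;
	return false;
}

/* Check the whole MAX_ORDER - 1 aligned range around the request. */
static bool can_isolate_aligned(unsigned long start, unsigned long end)
{
	unsigned long s = start & ~(MAX_ORDER_NR - 1);
	unsigned long e = (end + MAX_ORDER_NR - 1) & ~(MAX_ORDER_NR - 1);

	return !range_has_unmovable(s, e);
}

/* Proposed optimization: check only the range that was actually requested. */
static bool can_isolate_exact(unsigned long start, unsigned long end)
{
	return !range_has_unmovable(start, end);
}
```

With an unmovable allocation in pageblock 0 and a request for pageblock 1
only, [512, 1024), the aligned check refuses the isolation while the exact
check allows it.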

>
> Thanks,
> Eric
>> Two optimizations might come later:
>> 1. only check unmovable pages within the range instead of MAX_ORDER - 1 aligned
>>     range during isolation to increase successful rate of alloc_contig_range().
^^^^^^^^^^^^^^



--
Best Regards,
Yan, Zi
Zi Yan Dec. 10, 2021, 8:17 p.m. UTC | #3
On 10 Dec 2021, at 13:36, David Hildenbrand wrote:

> On 10.12.21 00:04, Zi Yan wrote:
>> From: Zi Yan <ziy@nvidia.com>
>>
>> Hi all,
>
> Hi,
>
> thanks for working on that!
>
>> Two optimizations might come later:
>> 1. only check unmovable pages within the range instead of MAX_ORDER - 1 aligned
>>    range during isolation to increase successful rate of alloc_contig_range().
>
> The issue with virtio-mem is that we'll need that as soon as we change
> the granularity to pageblocks, because otherwise, you can heavily
> degrade unplug reliability in sane setups:
>
> Previous:
> * Try unplug free 4M range (2 pageblocks): succeeds
>
> Now:
> * Try unplug 2M range (first pageblock): succeeds.
> * Try unplug next 2M range (second pageblock): fails because first
> contains unmovable allocations.
>

OK. Makes sense. I will add it in the next version.


--
Best Regards,
Yan, Zi