[mm-unstable,v2,0/3] mm/hugetlb: alloc/free gigantic folios

Message ID: 20240814035451.773331-1-yuzhao@google.com

Message

Yu Zhao Aug. 14, 2024, 3:54 a.m. UTC
Using __GFP_COMP for gigantic folios can greatly reduce not only the
amount of code but also the allocation and free time.

Approximate LOC change to mm/hugetlb.c: +60, -240

Time to allocate and free 500 1GB hugeTLB folios without HVO, measured by:
  time echo 500 >/sys/kernel/mm/hugepages/hugepages-1048576kB/nr_hugepages
  time echo 0 >/sys/kernel/mm/hugepages/hugepages-1048576kB/nr_hugepages

       Before  After
Alloc  ~13s    ~10s
Free   ~15s    <1s

These magnitudes generally hold across multiple x86 and arm64 CPU
models.
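
For repeated runs, the two timed commands can be wrapped in a small shell
helper. This is a minimal sketch and not part of the series: the names
NR_HUGEPAGES and resize_pool are hypothetical, it assumes GNU date for
nanosecond timestamps, and writing the real sysfs file requires root. It
also reads the pool size back, since the kernel may end up with fewer
folios than requested when contiguous memory is scarce.

```shell
# Sketch only: NR_HUGEPAGES and resize_pool are illustrative names.
# Assumes GNU date (for %N); the default path needs root to write.
NR_HUGEPAGES="${NR_HUGEPAGES:-/sys/kernel/mm/hugepages/hugepages-1048576kB/nr_hugepages}"

# resize_pool COUNT: request COUNT folios in the pool, then print
# "<requested> <actual> <elapsed_ms>" on one line.
resize_pool() {
    want=$1
    start=$(date +%s%N)                       # nanoseconds since the epoch
    echo "$want" > "$NR_HUGEPAGES" || return 1
    end=$(date +%s%N)
    got=$(cat "$NR_HUGEPAGES")                # pool size actually reached
    echo "$want $got $(( (end - start) / 1000000 ))"
}
```

Running resize_pool 500 and then resize_pool 0 as root corresponds to the
two timed steps. Dividing the elapsed time by the folio count gives a rough
per-folio cost, for example about 26 ms per folio for a ~13s allocation of
500 folios.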

Perf profile before:
  Alloc
    - 99.99% alloc_pool_huge_folio
       - __alloc_fresh_hugetlb_folio
          - 83.23% alloc_contig_pages_noprof
             - 47.46% alloc_contig_range_noprof
                - 20.96% isolate_freepages_range
                     16.10% split_page
                - 14.10% start_isolate_page_range
                - 12.02% undo_isolate_page_range

  Free
    - update_and_free_pages_bulk
       - 87.71% free_contig_range
          - 76.02% free_unref_page
             - 41.30% free_unref_page_commit
                - 32.58% free_pcppages_bulk
                   - 24.75% __free_one_page
               13.96% _raw_spin_trylock
         12.27% __update_and_free_hugetlb_folio

Perf profile after:
  Alloc
    - 99.99% alloc_pool_huge_folio
         alloc_gigantic_folio
       - alloc_contig_pages_noprof
          - 59.15% alloc_contig_range_noprof
             - 20.72% start_isolate_page_range
               20.64% prep_new_page
             - 17.13% undo_isolate_page_range

  Free
    - update_and_free_pages_bulk
       - __folio_put
       - __free_pages_ok
            7.46% free_tail_page_prepare
          - 1.97% free_one_page
               1.86% __free_one_page

Yu Zhao (3):
  mm/contig_alloc: support __GFP_COMP
  mm/cma: add cma_{alloc,free}_folio()
  mm/hugetlb: use __GFP_COMP for gigantic folios

 include/linux/cma.h     |  16 +++
 include/linux/gfp.h     |  23 ++++
 include/linux/hugetlb.h |   9 +-
 mm/cma.c                |  55 ++++++--
 mm/compaction.c         |  41 +-----
 mm/hugetlb.c            | 293 ++++++++--------------------------------
 mm/page_alloc.c         | 111 ++++++++++-----
 7 files changed, 226 insertions(+), 322 deletions(-)

Comments

Jane Chu Aug. 30, 2024, 5:55 p.m. UTC | #1
On 8/13/2024 8:54 PM, Yu Zhao wrote:

> Using __GFP_COMP for gigantic folios can greatly reduce not only the
> amount of code but also the allocation and free time.
>
> Approximate LOC change to mm/hugetlb.c: +60, -240
>
> Time to allocate and free 500 1GB hugeTLB folios without HVO, measured by:

Do you also have numbers with HVO enabled?

>   time echo 500 >/sys/kernel/mm/hugepages/hugepages-1048576kB/nr_hugepages
>   time echo 0 >/sys/kernel/mm/hugepages/hugepages-1048576kB/nr_hugepages
>
>        Before  After
> Alloc  ~13s    ~10s
> Free   ~15s    <1s
>
Thanks,

-jane