Message ID | 20240716135907.4047689-4-ryan.roberts@arm.com
---|---
State | New
Series | mTHP allocation stats for file-backed memory
On Wed, Jul 17, 2024 at 1:59 AM Ryan Roberts <ryan.roberts@arm.com> wrote:
>
> Expose 3 new mTHP stats for file (pagecache) folio allocations:
>
>   /sys/kernel/mm/transparent_hugepage/hugepages-*kB/stats/file_alloc
>   /sys/kernel/mm/transparent_hugepage/hugepages-*kB/stats/file_fallback
>   /sys/kernel/mm/transparent_hugepage/hugepages-*kB/stats/file_fallback_charge
>
> [...]
>
> +file_alloc
> +        is incremented every time a file huge page is successfully
> +        allocated.
> +
> +file_fallback
> +        is incremented if a file huge page is attempted to be allocated
> +        but fails and instead falls back to using small pages.
> +
> +file_fallback_charge
> +        is incremented if a file huge page cannot be charged and instead
> +        falls back to using small pages even though the allocation was
> +        successful.
> +

I realized that when we talk about fallback, it doesn't necessarily mean
small pages; it could also refer to smaller huge pages.

anon_fault_alloc
        is incremented every time a huge page is successfully
        allocated and charged to handle a page fault.

anon_fault_fallback
        is incremented if a page fault fails to allocate or charge
        a huge page and instead falls back to using huge pages with
        lower orders or small pages.

anon_fault_fallback_charge
        is incremented if a page fault fails to charge a huge page and
        instead falls back to using huge pages with lower orders or
        small pages even though the allocation was successful.

This also applies to files, right?

        do {
                gfp_t alloc_gfp = gfp;

                err = -ENOMEM;
                if (order > 0)
                        alloc_gfp |= __GFP_NORETRY | __GFP_NOWARN;
                folio = filemap_alloc_folio(alloc_gfp, order);
                if (!folio)
                        continue;

                /* Init accessed so avoid atomic mark_page_accessed later */
                if (fgp_flags & FGP_ACCESSED)
                        __folio_set_referenced(folio);

                err = filemap_add_folio(mapping, folio, index, gfp);
                if (!err)
                        break;
                folio_put(folio);
                folio = NULL;
        } while (order-- > 0);

> [...]
>
> +static inline struct folio *filemap_alloc_folio_noprof(gfp_t gfp, unsigned int order)
> +{
> +        struct folio *folio;
> +
> +        folio = __filemap_alloc_folio_noprof(gfp, order);
> +
> +        if (!folio)
> +                count_mthp_stat(order, MTHP_STAT_FILE_FALLBACK);
> +
> +        return folio;
> +}

Do we need to add and export __filemap_alloc_folio_noprof()? In any case,
we won't call count_mthp_stat(order, MTHP_STAT_FILE_FALLBACK) and will
only allocate the folio instead?

> [...]
>
>         ret = mem_cgroup_charge(folio, NULL, gfp);
> +        count_mthp_stat(folio_order(folio),
> +                        ret ? MTHP_STAT_FILE_FALLBACK_CHARGE : MTHP_STAT_FILE_ALLOC);
>         if (ret)
>                 return ret;

Would the following be better?

        ret = mem_cgroup_charge(folio, NULL, gfp);
        if (ret) {
                count_mthp_stat(folio_order(folio),
                                MTHP_STAT_FILE_FALLBACK_CHARGE);
                return ret;
        }
        count_mthp_stat(folio_order(folio), MTHP_STAT_FILE_ALLOC);

Anyway, it's up to you. The code just feels a bit off to me :-)

> [...]

Thanks
Barry
On 19/07/2024 01:12, Barry Song wrote:
> On Wed, Jul 17, 2024 at 1:59 AM Ryan Roberts <ryan.roberts@arm.com> wrote:
>> [...]
>>
>> +file_fallback
>> +        is incremented if a file huge page is attempted to be allocated
>> +        but fails and instead falls back to using small pages.
>>
>> [...]
>
> I realized that when we talk about fallback, it doesn't necessarily mean
> small pages; it could also refer to smaller huge pages.

Yes, good point. I'll update the documentation to reflect that for all
memory types.

> [...]
>
> This also applies to files, right?

It does in the place you highlighted (the order-stepping retry loop in
__filemap_get_folio()), but page_cache_ra_order() just falls back
immediately to order-0. Regardless, I think the user-facing docs should
simply allow for falling back to a lower order. That gives the
implementation more flexibility.

> Do we need to add and export __filemap_alloc_folio_noprof()?

It is exported. See the change in filemap.c: previously
filemap_alloc_folio_noprof() was exported, but that is now an inline and
__filemap_alloc_folio_noprof() is exported in its place:

>> -EXPORT_SYMBOL(filemap_alloc_folio_noprof);
>> +EXPORT_SYMBOL(__filemap_alloc_folio_noprof);

> In any case, we won't call count_mthp_stat(order, MTHP_STAT_FILE_FALLBACK)
> and will only allocate the folio instead?

Sorry, I don't understand what you mean by this?

> Would the following be better?
>
>         ret = mem_cgroup_charge(folio, NULL, gfp);
>         if (ret) {
>                 count_mthp_stat(folio_order(folio),
>                                 MTHP_STAT_FILE_FALLBACK_CHARGE);
>                 return ret;
>         }
>         count_mthp_stat(folio_order(folio), MTHP_STAT_FILE_ALLOC);
>
> Anyway, it's up to you. The code just feels a bit off to me :-)

Yes, agreed: your version is better. I also noticed that anon and shmem
increment FALLBACK whenever FALLBACK_CHARGE is incremented, so I should
apply those same semantics here.

Thanks,
Ryan
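[Editor's note: for illustration, a minimal sketch of what the charge path
in filemap_add_folio() might look like with both of those changes applied
(Barry's restructuring, plus incrementing FALLBACK alongside
FALLBACK_CHARGE on charge failure). filemap_charge_folio() is a
hypothetical helper used only to keep the sketch self-contained; in the
patch the logic sits inline in filemap_add_folio(). The stat items are
the ones added by this series.]

/*
 * Sketch only -- not the committed code. Combines Barry's suggested
 * structure with the anon/shmem convention that a charge failure
 * increments both FALLBACK and FALLBACK_CHARGE.
 */
static int filemap_charge_folio(struct folio *folio, gfp_t gfp)
{
        int order = folio_order(folio);
        int ret = mem_cgroup_charge(folio, NULL, gfp);

        if (ret) {
                /* Caller will fall back to a smaller order: count both. */
                count_mthp_stat(order, MTHP_STAT_FILE_FALLBACK);
                count_mthp_stat(order, MTHP_STAT_FILE_FALLBACK_CHARGE);
                return ret;
        }

        count_mthp_stat(order, MTHP_STAT_FILE_ALLOC);
        return 0;
}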
On Fri, Jul 19, 2024 at 8:56 PM Ryan Roberts <ryan.roberts@arm.com> wrote:
> On 19/07/2024 01:12, Barry Song wrote:
>> [...]
>>
>> In any case, we won't call count_mthp_stat(order, MTHP_STAT_FILE_FALLBACK)
>> and will only allocate the folio instead?
>
> Sorry, I don't understand what you mean by this?

Ryan, my question is whether exporting __filemap_alloc_folio_noprof()
might lead to situations where people call __filemap_alloc_folio_noprof()
and forget to call count_mthp_stat(order, MTHP_STAT_FILE_FALLBACK).
Currently, it seems that counting is always necessary?

Another option: keep the count_mthp_stat() call inside
filemap_alloc_folio_noprof() and make that function out-of-line, so that
__filemap_alloc_folio_noprof() doesn't need to be exported at all?

> [...]

Thanks
Barry
On 23/07/2024 23:07, Barry Song wrote:
> On Fri, Jul 19, 2024 at 8:56 PM Ryan Roberts <ryan.roberts@arm.com> wrote:
>> [...]
>
> Ryan, my question is whether exporting __filemap_alloc_folio_noprof()
> might lead to situations where people call __filemap_alloc_folio_noprof()
> and forget to call count_mthp_stat(order, MTHP_STAT_FILE_FALLBACK).
> Currently, it seems that counting is always necessary?

OK, to make sure I've understood: I think you are saying that there is a
risk that people will call __filemap_alloc_folio_noprof() directly and
bypass the stat counting? But it's the same problem with all "_noprof"
functions: if those are called directly, they bypass the memory
allocation profiling infrastructure. So this just needs to be a case of
"don't do that", IMHO. filemap_alloc_folio() is the public API.

> Another option: keep the count_mthp_stat() call inside
> filemap_alloc_folio_noprof() and make that function out-of-line, so that
> __filemap_alloc_folio_noprof() doesn't need to be exported at all?

I could certainly make filemap_alloc_folio_noprof() always out-of-line
and then handle the counting privately in the compilation unit. Before my
change, filemap_alloc_folio_noprof() was inline for the !CONFIG_NUMA case
and I was trying not to change that, but I think what you're suggesting
would be tidier, so I'll make the change. It doesn't solve the wider
problem that people can call private APIs, though; that's what review is
for.

Thanks,
Ryan
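[Editor's note: a sketch of the shape Ryan describes here -- a single
out-of-line filemap_alloc_folio_noprof() in mm/filemap.c, with the
counting kept private to that compilation unit. filemap_alloc_folio_numa()
is a hypothetical stand-in for the existing cpuset/NUMA-aware allocation
path; only the placement of the counting is the point.]

/* mm/filemap.c -- sketch of the out-of-line variant, not committed code */

static struct folio *filemap_alloc_folio_numa(gfp_t gfp, unsigned int order)
{
        /* Hypothetical stand-in for the existing CONFIG_NUMA path. */
        return folio_alloc_noprof(gfp, order);
}

struct folio *filemap_alloc_folio_noprof(gfp_t gfp, unsigned int order)
{
        struct folio *folio = filemap_alloc_folio_numa(gfp, order);

        /* Counting lives in one place, so no caller can forget it. */
        if (!folio)
                count_mthp_stat(order, MTHP_STAT_FILE_FALLBACK);

        return folio;
}
EXPORT_SYMBOL(filemap_alloc_folio_noprof);

With this shape, the header only needs the declaration plus the existing
alloc_hooks() wrapper, and __filemap_alloc_folio_noprof() disappears from
the public surface entirely. The cost Ryan notes is that the !CONFIG_NUMA
case loses its inline fast path.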
On Wed, Jul 24, 2024 at 8:12 PM Ryan Roberts <ryan.roberts@arm.com> wrote:
> On 23/07/2024 23:07, Barry Song wrote:
>> [...]
>
> OK, to make sure I've understood: I think you are saying that there is a
> risk that people will call __filemap_alloc_folio_noprof() directly and
> bypass the stat counting? But it's the same problem with all "_noprof"
> functions: if those are called directly, they bypass the memory
> allocation profiling infrastructure. So this just needs to be a case of
> "don't do that", IMHO. filemap_alloc_folio() is the public API.

Indeed. Maybe I'm overthinking it.

> I could certainly make filemap_alloc_folio_noprof() always out-of-line
> and then handle the counting privately in the compilation unit. [...]
> I think what you're suggesting would be tidier, so I'll make the change.
> It doesn't solve the wider problem that people can call private APIs,
> though; that's what review is for.

Agreed.

Thanks
Barry
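[Editor's note: as background to the "_noprof" naming discussed above --
with memory allocation profiling, each allocation entry point is split
into a _noprof implementation and an un-suffixed caller-facing macro that
records the allocation site. The sketch below is paraphrased from memory
and simplified; consult include/linux/alloc_tag.h for the exact
definitions.]

#define alloc_hooks_tag(_tag, _do_alloc)                        \
({                                                              \
        struct alloc_tag * const _old = alloc_tag_save(_tag);  \
        typeof(_do_alloc) _res = _do_alloc;                     \
        alloc_tag_restore(_tag, _old);                          \
        _res;                                                   \
})

#define alloc_hooks(_do_alloc)                                  \
({                                                              \
        DEFINE_ALLOC_TAG(_alloc_tag);                           \
        alloc_hooks_tag(&_alloc_tag, _do_alloc);                \
})

Calling a _noprof function directly skips the tagging, which is why Ryan
treats direct calls as a "don't do that" for review to catch; the
stats-bypass risk Barry raised for __filemap_alloc_folio_noprof() is the
same class of problem.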
diff --git a/Documentation/admin-guide/mm/transhuge.rst b/Documentation/admin-guide/mm/transhuge.rst
index 058485daf186..d4857e457add 100644
--- a/Documentation/admin-guide/mm/transhuge.rst
+++ b/Documentation/admin-guide/mm/transhuge.rst
@@ -512,6 +512,19 @@ shmem_fallback_charge
         falls back to using small pages even though the allocation was
         successful.
 
+file_alloc
+        is incremented every time a file huge page is successfully
+        allocated.
+
+file_fallback
+        is incremented if a file huge page is attempted to be allocated
+        but fails and instead falls back to using small pages.
+
+file_fallback_charge
+        is incremented if a file huge page cannot be charged and instead
+        falls back to using small pages even though the allocation was
+        successful.
+
 split
         is incremented every time a huge page is successfully split into
         smaller orders. This can happen for a variety of reasons but a
diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
index b8c63c3e967f..4f9109fcdded 100644
--- a/include/linux/huge_mm.h
+++ b/include/linux/huge_mm.h
@@ -123,6 +123,9 @@ enum mthp_stat_item {
         MTHP_STAT_SHMEM_ALLOC,
         MTHP_STAT_SHMEM_FALLBACK,
         MTHP_STAT_SHMEM_FALLBACK_CHARGE,
+        MTHP_STAT_FILE_ALLOC,
+        MTHP_STAT_FILE_FALLBACK,
+        MTHP_STAT_FILE_FALLBACK_CHARGE,
         MTHP_STAT_SPLIT,
         MTHP_STAT_SPLIT_FAILED,
         MTHP_STAT_SPLIT_DEFERRED,
diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
index 6e2f72d03176..95a147b5d117 100644
--- a/include/linux/pagemap.h
+++ b/include/linux/pagemap.h
@@ -562,14 +562,26 @@ static inline void *detach_page_private(struct page *page)
 }
 
 #ifdef CONFIG_NUMA
-struct folio *filemap_alloc_folio_noprof(gfp_t gfp, unsigned int order);
+struct folio *__filemap_alloc_folio_noprof(gfp_t gfp, unsigned int order);
 #else
-static inline struct folio *filemap_alloc_folio_noprof(gfp_t gfp, unsigned int order)
+static inline struct folio *__filemap_alloc_folio_noprof(gfp_t gfp, unsigned int order)
 {
         return folio_alloc_noprof(gfp, order);
 }
 #endif
 
+static inline struct folio *filemap_alloc_folio_noprof(gfp_t gfp, unsigned int order)
+{
+        struct folio *folio;
+
+        folio = __filemap_alloc_folio_noprof(gfp, order);
+
+        if (!folio)
+                count_mthp_stat(order, MTHP_STAT_FILE_FALLBACK);
+
+        return folio;
+}
+
 #define filemap_alloc_folio(...)                                \
         alloc_hooks(filemap_alloc_folio_noprof(__VA_ARGS__))
 
diff --git a/mm/filemap.c b/mm/filemap.c
index 53d5d0410b51..131d514fca29 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -963,6 +963,8 @@ int filemap_add_folio(struct address_space *mapping, struct folio *folio,
         int ret;
 
         ret = mem_cgroup_charge(folio, NULL, gfp);
+        count_mthp_stat(folio_order(folio),
+                        ret ? MTHP_STAT_FILE_FALLBACK_CHARGE : MTHP_STAT_FILE_ALLOC);
         if (ret)
                 return ret;
@@ -990,7 +992,7 @@ int filemap_add_folio(struct address_space *mapping, struct folio *folio,
 EXPORT_SYMBOL_GPL(filemap_add_folio);
 
 #ifdef CONFIG_NUMA
-struct folio *filemap_alloc_folio_noprof(gfp_t gfp, unsigned int order)
+struct folio *__filemap_alloc_folio_noprof(gfp_t gfp, unsigned int order)
 {
         int n;
         struct folio *folio;
@@ -1007,7 +1009,7 @@ struct folio *filemap_alloc_folio_noprof(gfp_t gfp, unsigned int order)
         }
         return folio_alloc_noprof(gfp, order);
 }
-EXPORT_SYMBOL(filemap_alloc_folio_noprof);
+EXPORT_SYMBOL(__filemap_alloc_folio_noprof);
 #endif
 
 /*
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 578ac212c172..26d558e3e80f 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -608,7 +608,14 @@ static struct attribute_group anon_stats_attr_grp = {
         .attrs = anon_stats_attrs,
 };
 
+DEFINE_MTHP_STAT_ATTR(file_alloc, MTHP_STAT_FILE_ALLOC);
+DEFINE_MTHP_STAT_ATTR(file_fallback, MTHP_STAT_FILE_FALLBACK);
+DEFINE_MTHP_STAT_ATTR(file_fallback_charge, MTHP_STAT_FILE_FALLBACK_CHARGE);
+
 static struct attribute *file_stats_attrs[] = {
+        &file_alloc_attr.attr,
+        &file_fallback_attr.attr,
+        &file_fallback_charge_attr.attr,
 #ifdef CONFIG_SHMEM
         &shmem_alloc_attr.attr,
         &shmem_fallback_attr.attr,
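[Editor's note: the DEFINE_MTHP_STAT_ATTR() and count_mthp_stat() calls
above reuse the existing per-CPU mTHP counter infrastructure introduced
for the anon and shmem stats. Roughly, as it looked around this series --
paraphrased from include/linux/huge_mm.h, so treat the details as
approximate. Note that order 0 is deliberately never counted, which
matches the commit message's point about not caring about order-0
allocations.]

struct mthp_stat {
        unsigned long stats[ilog2(MAX_PTRS_PER_PTE) + 1][__MTHP_STAT_COUNT];
};

DECLARE_PER_CPU(struct mthp_stat, mthp_stats);

static inline void count_mthp_stat(int order, enum mthp_stat_item item)
{
        if (order <= 0 || order > PMD_ORDER)
                return;

        this_cpu_inc(mthp_stats.stats[order][item]);
}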
Expose 3 new mTHP stats for file (pagecache) folio allocations:

  /sys/kernel/mm/transparent_hugepage/hugepages-*kB/stats/file_alloc
  /sys/kernel/mm/transparent_hugepage/hugepages-*kB/stats/file_fallback
  /sys/kernel/mm/transparent_hugepage/hugepages-*kB/stats/file_fallback_charge

This will provide some insight on the sizes of large folios being
allocated for file-backed memory, and how often allocation is failing.

All non-order-0 (and most order-0) folio allocations are currently done
through filemap_alloc_folio(), and folios are charged in a subsequent
call to filemap_add_folio(). So count file_fallback when allocation
fails in filemap_alloc_folio() and count file_alloc or
file_fallback_charge in filemap_add_folio(), based on whether charging
succeeded or not. There are some users of filemap_add_folio() that
allocate their own order-0 folio by other means, so we would not count
an allocation failure in this case, but we also don't care about order-0
allocations. This approach feels like it should be good enough and
doesn't require any (impractically large) refactoring.

The existing mTHP stats interface is reused to provide consistency to
users. And because we are reusing the same interface, we can reuse the
same infrastructure on the kernel side.

Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
---
 Documentation/admin-guide/mm/transhuge.rst | 13 +++++++++++++
 include/linux/huge_mm.h                    |  3 +++
 include/linux/pagemap.h                    | 16 ++++++++++++++--
 mm/filemap.c                               |  6 ++++--
 mm/huge_memory.c                           |  7 +++++++
 5 files changed, 41 insertions(+), 4 deletions(-)
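[Editor's note: as a usage example (not part of the patch), a small
hypothetical userspace program that walks the per-size sysfs directories
listed above and sums the new file_alloc counters.]

#include <glob.h>
#include <stdio.h>

int main(void)
{
        const char *pattern = "/sys/kernel/mm/transparent_hugepage/"
                              "hugepages-*kB/stats/file_alloc";
        glob_t g;
        unsigned long long v, total = 0;

        /* One match per enabled mTHP size, e.g. hugepages-64kB. */
        if (glob(pattern, 0, NULL, &g) != 0)
                return 1;

        for (size_t i = 0; i < g.gl_pathc; i++) {
                FILE *f = fopen(g.gl_pathv[i], "r");

                if (f && fscanf(f, "%llu", &v) == 1) {
                        printf("%s: %llu\n", g.gl_pathv[i], v);
                        total += v;
                }
                if (f)
                        fclose(f);
        }
        globfree(&g);
        printf("total file_alloc: %llu\n", total);
        return 0;
}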