Message ID | 20240716135907.4047689-4-ryan.roberts@arm.com
---|---
State | New
Series | mTHP allocation stats for file-backed memory
On Wed, Jul 17, 2024 at 1:59 AM Ryan Roberts <ryan.roberts@arm.com> wrote:
>
> Expose 3 new mTHP stats for file (pagecache) folio allocations:
>
>   /sys/kernel/mm/transparent_hugepage/hugepages-*kB/stats/file_alloc
>   /sys/kernel/mm/transparent_hugepage/hugepages-*kB/stats/file_fallback
>   /sys/kernel/mm/transparent_hugepage/hugepages-*kB/stats/file_fallback_charge
>
> [...]
>
> +file_alloc
> +        is incremented every time a file huge page is successfully
> +        allocated.
> +
> +file_fallback
> +        is incremented if a file huge page is attempted to be allocated
> +        but fails and instead falls back to using small pages.
> +
> +file_fallback_charge
> +        is incremented if a file huge page cannot be charged and instead
> +        falls back to using small pages even though the allocation was
> +        successful.
> +

I realized that when we talk about fallback, it doesn't necessarily mean
small pages; it could also refer to smaller huge pages.

anon_fault_alloc
        is incremented every time a huge page is successfully
        allocated and charged to handle a page fault.

anon_fault_fallback
        is incremented if a page fault fails to allocate or charge
        a huge page and instead falls back to using huge pages with
        lower orders or small pages.

anon_fault_fallback_charge
        is incremented if a page fault fails to charge a huge page and
        instead falls back to using huge pages with lower orders or
        small pages even though the allocation was successful.

This also applies to files, right?

        do {
                gfp_t alloc_gfp = gfp;

                err = -ENOMEM;
                if (order > 0)
                        alloc_gfp |= __GFP_NORETRY | __GFP_NOWARN;
                folio = filemap_alloc_folio(alloc_gfp, order);
                if (!folio)
                        continue;

                /* Init accessed so avoid atomic mark_page_accessed later */
                if (fgp_flags & FGP_ACCESSED)
                        __folio_set_referenced(folio);

                err = filemap_add_folio(mapping, folio, index, gfp);
                if (!err)
                        break;
                folio_put(folio);
                folio = NULL;
        } while (order-- > 0);

> [...]
>
> +static inline struct folio *filemap_alloc_folio_noprof(gfp_t gfp, unsigned int order)
> +{
> +        struct folio *folio;
> +
> +        folio = __filemap_alloc_folio_noprof(gfp, order);
> +
> +        if (!folio)
> +                count_mthp_stat(order, MTHP_STAT_FILE_FALLBACK);
> +
> +        return folio;
> +}

Do we need to add and export __filemap_alloc_folio_noprof()? In any case,
we won't call count_mthp_stat(order, MTHP_STAT_FILE_FALLBACK) and will
only allocate the folio instead?

> [...]
>
>         ret = mem_cgroup_charge(folio, NULL, gfp);
> +        count_mthp_stat(folio_order(folio),
> +                        ret ? MTHP_STAT_FILE_FALLBACK_CHARGE : MTHP_STAT_FILE_ALLOC);
>         if (ret)
>                 return ret;

Would the following be better?

        ret = mem_cgroup_charge(folio, NULL, gfp);
        if (ret) {
                count_mthp_stat(folio_order(folio),
                                MTHP_STAT_FILE_FALLBACK_CHARGE);
                return ret;
        }
        count_mthp_stat(folio_order(folio), MTHP_STAT_FILE_ALLOC);

Anyway, it's up to you. The code just feels a bit off to me :-)

> [...]

Thanks
Barry
On 19/07/2024 01:12, Barry Song wrote:
> On Wed, Jul 17, 2024 at 1:59 AM Ryan Roberts <ryan.roberts@arm.com> wrote:
>> [...]
>>
>> +file_fallback
>> +        is incremented if a file huge page is attempted to be allocated
>> +        but fails and instead falls back to using small pages.
>>
>> [...]
>
> I realized that when we talk about fallback, it doesn't necessarily mean
> small pages; it could also refer to smaller huge pages.

Yes, good point. I'll update the documentation to reflect that for all
memory types.

> [...]
>
> This also applies to files, right?

It does in the place you highlighted (the order-stepping retry loop in
__filemap_get_folio()), but page_cache_ra_order() just falls back
immediately to order-0. Regardless, I think the user-facing docs should
simply allow for falling back to a lower order. That gives the
implementation more flexibility.

> Do we need to add and export __filemap_alloc_folio_noprof()?

It is exported. See the change in filemap.c: previously
filemap_alloc_folio_noprof() was exported, but that is now an inline and
__filemap_alloc_folio_noprof() is exported in its place:

>> -EXPORT_SYMBOL(filemap_alloc_folio_noprof);
>> +EXPORT_SYMBOL(__filemap_alloc_folio_noprof);

> In any case, we won't call count_mthp_stat(order, MTHP_STAT_FILE_FALLBACK)
> and will only allocate the folio instead?

Sorry, I don't understand what you mean by this?

> Would the following be better?
>
>         ret = mem_cgroup_charge(folio, NULL, gfp);
>         if (ret) {
>                 count_mthp_stat(folio_order(folio),
>                                 MTHP_STAT_FILE_FALLBACK_CHARGE);
>                 return ret;
>         }
>         count_mthp_stat(folio_order(folio), MTHP_STAT_FILE_ALLOC);
>
> Anyway, it's up to you. The code just feels a bit off to me :-)

Yes, agreed: your version is better. I also noticed that anon and shmem
increment FALLBACK whenever FALLBACK_CHARGE is incremented, so I should
apply those same semantics here.

Thanks,
Ryan
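[Editor's note: for illustration, a minimal sketch of what the charge path
in filemap_add_folio() might look like with both of those changes applied
(Barry's restructuring, plus incrementing FALLBACK alongside
FALLBACK_CHARGE on charge failure). filemap_charge_folio() is a
hypothetical helper used only to keep the sketch self-contained; in the
patch the logic sits inline in filemap_add_folio(). The stat items are
the ones added by this series.]

/*
 * Sketch only -- not the committed code. Combines Barry's suggested
 * structure with the anon/shmem convention that a charge failure
 * increments both FALLBACK and FALLBACK_CHARGE.
 */
static int filemap_charge_folio(struct folio *folio, gfp_t gfp)
{
        int order = folio_order(folio);
        int ret = mem_cgroup_charge(folio, NULL, gfp);

        if (ret) {
                /* Caller will fall back to a smaller order: count both. */
                count_mthp_stat(order, MTHP_STAT_FILE_FALLBACK);
                count_mthp_stat(order, MTHP_STAT_FILE_FALLBACK_CHARGE);
                return ret;
        }

        count_mthp_stat(order, MTHP_STAT_FILE_ALLOC);
        return 0;
}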
On Fri, Jul 19, 2024 at 8:56 PM Ryan Roberts <ryan.roberts@arm.com> wrote:
> On 19/07/2024 01:12, Barry Song wrote:
>> [...]
>>
>> In any case, we won't call count_mthp_stat(order, MTHP_STAT_FILE_FALLBACK)
>> and will only allocate the folio instead?
>
> Sorry, I don't understand what you mean by this?

Ryan, my question is whether exporting __filemap_alloc_folio_noprof()
might lead to situations where people call __filemap_alloc_folio_noprof()
and forget to call count_mthp_stat(order, MTHP_STAT_FILE_FALLBACK).
Currently, it seems that counting is always necessary?

Another option: keep the count_mthp_stat() call inside
filemap_alloc_folio_noprof() and make that function out-of-line, so that
__filemap_alloc_folio_noprof() doesn't need to be exported at all?

> [...]

Thanks
Barry
On 23/07/2024 23:07, Barry Song wrote:
> On Fri, Jul 19, 2024 at 8:56 PM Ryan Roberts <ryan.roberts@arm.com> wrote:
>> [...]
>
> Ryan, my question is whether exporting __filemap_alloc_folio_noprof()
> might lead to situations where people call __filemap_alloc_folio_noprof()
> and forget to call count_mthp_stat(order, MTHP_STAT_FILE_FALLBACK).
> Currently, it seems that counting is always necessary?

OK, to make sure I've understood: I think you are saying that there is a
risk that people will call __filemap_alloc_folio_noprof() directly and
bypass the stat counting? But it's the same problem with all "_noprof"
functions: if those are called directly, they bypass the memory
allocation profiling infrastructure. So this just needs to be a case of
"don't do that", IMHO. filemap_alloc_folio() is the public API.

> Another option: keep the count_mthp_stat() call inside
> filemap_alloc_folio_noprof() and make that function out-of-line, so that
> __filemap_alloc_folio_noprof() doesn't need to be exported at all?

I could certainly make filemap_alloc_folio_noprof() always out-of-line
and then handle the counting privately in the compilation unit. Before my
change, filemap_alloc_folio_noprof() was inline for the !CONFIG_NUMA case
and I was trying not to change that, but I think what you're suggesting
would be tidier, so I'll make the change. It doesn't solve the wider
problem that people can call private APIs, though; that's what review is
for.

Thanks,
Ryan
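[Editor's note: a sketch of the shape Ryan describes here -- a single
out-of-line filemap_alloc_folio_noprof() in mm/filemap.c, with the
counting kept private to that compilation unit. filemap_alloc_folio_numa()
is a hypothetical stand-in for the existing cpuset/NUMA-aware allocation
path; only the placement of the counting is the point.]

/* mm/filemap.c -- sketch of the out-of-line variant, not committed code */

static struct folio *filemap_alloc_folio_numa(gfp_t gfp, unsigned int order)
{
        /* Hypothetical stand-in for the existing CONFIG_NUMA path. */
        return folio_alloc_noprof(gfp, order);
}

struct folio *filemap_alloc_folio_noprof(gfp_t gfp, unsigned int order)
{
        struct folio *folio = filemap_alloc_folio_numa(gfp, order);

        /* Counting lives in one place, so no caller can forget it. */
        if (!folio)
                count_mthp_stat(order, MTHP_STAT_FILE_FALLBACK);

        return folio;
}
EXPORT_SYMBOL(filemap_alloc_folio_noprof);

With this shape, the header only needs the declaration plus the existing
alloc_hooks() wrapper, and __filemap_alloc_folio_noprof() disappears from
the public surface entirely. The cost Ryan notes is that the !CONFIG_NUMA
case loses its inline fast path.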
On Wed, Jul 24, 2024 at 8:12 PM Ryan Roberts <ryan.roberts@arm.com> wrote:
> On 23/07/2024 23:07, Barry Song wrote:
>> [...]
>
> OK, to make sure I've understood: I think you are saying that there is a
> risk that people will call __filemap_alloc_folio_noprof() directly and
> bypass the stat counting? But it's the same problem with all "_noprof"
> functions: if those are called directly, they bypass the memory
> allocation profiling infrastructure. So this just needs to be a case of
> "don't do that", IMHO. filemap_alloc_folio() is the public API.

Indeed. Maybe I'm overthinking it.

> I could certainly make filemap_alloc_folio_noprof() always out-of-line
> and then handle the counting privately in the compilation unit. [...]
> I think what you're suggesting would be tidier, so I'll make the change.
> It doesn't solve the wider problem that people can call private APIs,
> though; that's what review is for.

Agreed.

Thanks
Barry
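[Editor's note: as background to the "_noprof" naming discussed above --
with memory allocation profiling, each allocation entry point is split
into a _noprof implementation and an un-suffixed caller-facing macro that
records the allocation site. The sketch below is paraphrased from memory
and simplified; consult include/linux/alloc_tag.h for the exact
definitions.]

#define alloc_hooks_tag(_tag, _do_alloc)                        \
({                                                              \
        struct alloc_tag * const _old = alloc_tag_save(_tag);  \
        typeof(_do_alloc) _res = _do_alloc;                     \
        alloc_tag_restore(_tag, _old);                          \
        _res;                                                   \
})

#define alloc_hooks(_do_alloc)                                  \
({                                                              \
        DEFINE_ALLOC_TAG(_alloc_tag);                           \
        alloc_hooks_tag(&_alloc_tag, _do_alloc);                \
})

Calling a _noprof function directly skips the tagging, which is why Ryan
treats direct calls as a "don't do that" for review to catch; the
stats-bypass risk Barry raised for __filemap_alloc_folio_noprof() is the
same class of problem.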
diff --git a/Documentation/admin-guide/mm/transhuge.rst b/Documentation/admin-guide/mm/transhuge.rst
index 058485daf186..d4857e457add 100644
--- a/Documentation/admin-guide/mm/transhuge.rst
+++ b/Documentation/admin-guide/mm/transhuge.rst
@@ -512,6 +512,19 @@ shmem_fallback_charge
         falls back to using small pages even though the allocation was
         successful.
 
+file_alloc
+        is incremented every time a file huge page is successfully
+        allocated.
+
+file_fallback
+        is incremented if a file huge page is attempted to be allocated
+        but fails and instead falls back to using small pages.
+
+file_fallback_charge
+        is incremented if a file huge page cannot be charged and instead
+        falls back to using small pages even though the allocation was
+        successful.
+
 split
         is incremented every time a huge page is successfully split into
         smaller orders. This can happen for a variety of reasons but a
diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
index b8c63c3e967f..4f9109fcdded 100644
--- a/include/linux/huge_mm.h
+++ b/include/linux/huge_mm.h
@@ -123,6 +123,9 @@ enum mthp_stat_item {
         MTHP_STAT_SHMEM_ALLOC,
         MTHP_STAT_SHMEM_FALLBACK,
         MTHP_STAT_SHMEM_FALLBACK_CHARGE,
+        MTHP_STAT_FILE_ALLOC,
+        MTHP_STAT_FILE_FALLBACK,
+        MTHP_STAT_FILE_FALLBACK_CHARGE,
         MTHP_STAT_SPLIT,
         MTHP_STAT_SPLIT_FAILED,
         MTHP_STAT_SPLIT_DEFERRED,
diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
index 6e2f72d03176..95a147b5d117 100644
--- a/include/linux/pagemap.h
+++ b/include/linux/pagemap.h
@@ -562,14 +562,26 @@ static inline void *detach_page_private(struct page *page)
 }
 
 #ifdef CONFIG_NUMA
-struct folio *filemap_alloc_folio_noprof(gfp_t gfp, unsigned int order);
+struct folio *__filemap_alloc_folio_noprof(gfp_t gfp, unsigned int order);
 #else
-static inline struct folio *filemap_alloc_folio_noprof(gfp_t gfp, unsigned int order)
+static inline struct folio *__filemap_alloc_folio_noprof(gfp_t gfp, unsigned int order)
 {
         return folio_alloc_noprof(gfp, order);
 }
 #endif
 
+static inline struct folio *filemap_alloc_folio_noprof(gfp_t gfp, unsigned int order)
+{
+        struct folio *folio;
+
+        folio = __filemap_alloc_folio_noprof(gfp, order);
+
+        if (!folio)
+                count_mthp_stat(order, MTHP_STAT_FILE_FALLBACK);
+
+        return folio;
+}
+
 #define filemap_alloc_folio(...)                                \
         alloc_hooks(filemap_alloc_folio_noprof(__VA_ARGS__))
 
diff --git a/mm/filemap.c b/mm/filemap.c
index 53d5d0410b51..131d514fca29 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -963,6 +963,8 @@ int filemap_add_folio(struct address_space *mapping, struct folio *folio,
         int ret;
 
         ret = mem_cgroup_charge(folio, NULL, gfp);
+        count_mthp_stat(folio_order(folio),
+                        ret ? MTHP_STAT_FILE_FALLBACK_CHARGE : MTHP_STAT_FILE_ALLOC);
         if (ret)
                 return ret;
@@ -990,7 +992,7 @@ int filemap_add_folio(struct address_space *mapping, struct folio *folio,
 EXPORT_SYMBOL_GPL(filemap_add_folio);
 
 #ifdef CONFIG_NUMA
-struct folio *filemap_alloc_folio_noprof(gfp_t gfp, unsigned int order)
+struct folio *__filemap_alloc_folio_noprof(gfp_t gfp, unsigned int order)
 {
         int n;
         struct folio *folio;
@@ -1007,7 +1009,7 @@ struct folio *filemap_alloc_folio_noprof(gfp_t gfp, unsigned int order)
         }
         return folio_alloc_noprof(gfp, order);
 }
-EXPORT_SYMBOL(filemap_alloc_folio_noprof);
+EXPORT_SYMBOL(__filemap_alloc_folio_noprof);
 #endif
 
 /*
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 578ac212c172..26d558e3e80f 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -608,7 +608,14 @@ static struct attribute_group anon_stats_attr_grp = {
         .attrs = anon_stats_attrs,
 };
 
+DEFINE_MTHP_STAT_ATTR(file_alloc, MTHP_STAT_FILE_ALLOC);
+DEFINE_MTHP_STAT_ATTR(file_fallback, MTHP_STAT_FILE_FALLBACK);
+DEFINE_MTHP_STAT_ATTR(file_fallback_charge, MTHP_STAT_FILE_FALLBACK_CHARGE);
+
 static struct attribute *file_stats_attrs[] = {
+        &file_alloc_attr.attr,
+        &file_fallback_attr.attr,
+        &file_fallback_charge_attr.attr,
 #ifdef CONFIG_SHMEM
         &shmem_alloc_attr.attr,
         &shmem_fallback_attr.attr,
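[Editor's note: the DEFINE_MTHP_STAT_ATTR() and count_mthp_stat() calls
above reuse the existing per-CPU mTHP counter infrastructure introduced
for the anon and shmem stats. Roughly, as it looked around this series --
paraphrased from include/linux/huge_mm.h, so treat the details as
approximate. Note that order 0 is deliberately never counted, which
matches the commit message's point about not caring about order-0
allocations.]

struct mthp_stat {
        unsigned long stats[ilog2(MAX_PTRS_PER_PTE) + 1][__MTHP_STAT_COUNT];
};

DECLARE_PER_CPU(struct mthp_stat, mthp_stats);

static inline void count_mthp_stat(int order, enum mthp_stat_item item)
{
        if (order <= 0 || order > PMD_ORDER)
                return;

        this_cpu_inc(mthp_stats.stats[order][item]);
}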
Expose 3 new mTHP stats for file (pagecache) folio allocations:

  /sys/kernel/mm/transparent_hugepage/hugepages-*kB/stats/file_alloc
  /sys/kernel/mm/transparent_hugepage/hugepages-*kB/stats/file_fallback
  /sys/kernel/mm/transparent_hugepage/hugepages-*kB/stats/file_fallback_charge

This will provide some insight on the sizes of large folios being
allocated for file-backed memory, and how often allocation is failing.

All non-order-0 (and most order-0) folio allocations are currently done
through filemap_alloc_folio(), and folios are charged in a subsequent
call to filemap_add_folio(). So count file_fallback when allocation
fails in filemap_alloc_folio() and count file_alloc or
file_fallback_charge in filemap_add_folio(), based on whether charging
succeeded or not. There are some users of filemap_add_folio() that
allocate their own order-0 folio by other means, so we would not count
an allocation failure in this case, but we also don't care about order-0
allocations. This approach feels like it should be good enough and
doesn't require any (impractically large) refactoring.

The existing mTHP stats interface is reused to provide consistency to
users. And because we are reusing the same interface, we can reuse the
same infrastructure on the kernel side.

Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
---
 Documentation/admin-guide/mm/transhuge.rst | 13 +++++++++++++
 include/linux/huge_mm.h                    |  3 +++
 include/linux/pagemap.h                    | 16 ++++++++++++++--
 mm/filemap.c                               |  6 ++++--
 mm/huge_memory.c                           |  7 +++++++
 5 files changed, 41 insertions(+), 4 deletions(-)
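[Editor's note: as a usage example (not part of the patch), a small
hypothetical userspace program that walks the per-size sysfs directories
listed above and sums the new file_alloc counters.]

#include <glob.h>
#include <stdio.h>

int main(void)
{
        const char *pattern = "/sys/kernel/mm/transparent_hugepage/"
                              "hugepages-*kB/stats/file_alloc";
        glob_t g;
        unsigned long long v, total = 0;

        /* One match per enabled mTHP size, e.g. hugepages-64kB. */
        if (glob(pattern, 0, NULL, &g) != 0)
                return 1;

        for (size_t i = 0; i < g.gl_pathc; i++) {
                FILE *f = fopen(g.gl_pathv[i], "r");

                if (f && fscanf(f, "%llu", &v) == 1) {
                        printf("%s: %llu\n", g.gl_pathv[i], v);
                        total += v;
                }
                if (f)
                        fclose(f);
        }
        globfree(&g);
        printf("total file_alloc: %llu\n", total);
        return 0;
}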