diff mbox series

[2/2] kpageflags: fix wrong KPF_THP on non-pmd-mappable compound pages

Message ID 20240626024924.1155558-3-ranxiaokai627@163.com (mailing list archive)
State New
Headers show
Series kpageflags: fix wrong KPF_THP on non-pmd-mappable compound pages | expand

Commit Message

ran xiaokai June 26, 2024, 2:49 a.m. UTC
From: Ran Xiaokai <ran.xiaokai@zte.com.cn>

KPF_COMPOUND_HEAD and KPF_COMPOUND_TAIL are set on "common" compound
pages, which means of any order, but KPF_THP should only be set
when the folio is a 2M pmd mappable THP. Since commit 19eaf44954df
("mm: thp: support allocation of anonymous multi-size THP"),
multiple orders of folios can be allocated and mapped to userspace,
so the folio_test_large() check is not sufficient here,
replace it with folio_test_pmd_mappable() to fix this.

Also kpageflags is not only for userspace memory but for all valid pfn
pages,including slab pages or drivers used pages, so the PG_lru and
is_anon check are unnecessary here.

Signed-off-by: Ran Xiaokai <ran.xiaokai@zte.com.cn>
---
 fs/proc/page.c | 14 ++++----------
 1 file changed, 4 insertions(+), 10 deletions(-)

Comments

Zi Yan June 26, 2024, 3:06 a.m. UTC | #1
On Tue Jun 25, 2024 at 10:49 PM EDT, ran xiaokai wrote:
> From: Ran Xiaokai <ran.xiaokai@zte.com.cn>
>
> KPF_COMPOUND_HEAD and KPF_COMPOUND_TAIL are set on "common" compound
> pages, which means of any order, but KPF_THP should only be set
> when the folio is a 2M pmd mappable THP. Since commit 19eaf44954df
> ("mm: thp: support allocation of anonymous multi-size THP"),
> multiple orders of folios can be allocated and mapped to userspace,
> so the folio_test_large() check is not sufficient here,
> replace it with folio_test_pmd_mappable() to fix this.
>
> Also kpageflags is not only for userspace memory but for all valid pfn
> pages,including slab pages or drivers used pages, so the PG_lru and
> is_anon check are unnecessary here.

But THP is userspace memory. slab pages or driver pages cannot be THP.

>
> Signed-off-by: Ran Xiaokai <ran.xiaokai@zte.com.cn>
> ---
>  fs/proc/page.c | 14 ++++----------
>  1 file changed, 4 insertions(+), 10 deletions(-)
>
> diff --git a/fs/proc/page.c b/fs/proc/page.c
> index 2fb64bdb64eb..3e7b70449c2f 100644
> --- a/fs/proc/page.c
> +++ b/fs/proc/page.c
> @@ -146,19 +146,13 @@ u64 stable_page_flags(const struct page *page)
>  		u |= kpf_copy_bit(k, KPF_COMPOUND_HEAD, PG_head);
>  	else
>  		u |= 1 << KPF_COMPOUND_TAIL;
> +
Unnecessary new line.

>  	if (folio_test_hugetlb(folio))
>  		u |= 1 << KPF_HUGE;
> -	/*
> -	 * We need to check PageLRU/PageAnon
> -	 * to make sure a given page is a thp, not a non-huge compound page.
> -	 */
> -	else if (folio_test_large(folio)) {
> -		if ((k & (1 << PG_lru)) || is_anon)
> -			u |= 1 << KPF_THP;
> -		else if (is_huge_zero_folio(folio)) {
> +	else if (folio_test_pmd_mappable(folio)) {
> +		u |= 1 << KPF_THP;

lru and anon check should stay.

> +		if (is_huge_zero_folio(folio))
>  			u |= 1 << KPF_ZERO_PAGE;
> -			u |= 1 << KPF_THP;
> -		}
>  	} else if (is_zero_pfn(page_to_pfn(page)))
>  		u |= 1 << KPF_ZERO_PAGE;
>
ran xiaokai June 26, 2024, 4:32 a.m. UTC | #2
> On Tue Jun 25, 2024 at 10:49 PM EDT, ran xiaokai wrote:
> > From: Ran Xiaokai <ran.xiaokai@zte.com.cn>
> >
> > KPF_COMPOUND_HEAD and KPF_COMPOUND_TAIL are set on "common" compound
> > pages, which means of any order, but KPF_THP should only be set
> > when the folio is a 2M pmd mappable THP. Since commit 19eaf44954df
> > ("mm: thp: support allocation of anonymous multi-size THP"),
> > multiple orders of folios can be allocated and mapped to userspace,
> > so the folio_test_large() check is not sufficient here,
> > replace it with folio_test_pmd_mappable() to fix this.
> >
> > Also kpageflags is not only for userspace memory but for all valid pfn
> > pages,including slab pages or drivers used pages, so the PG_lru and
> > is_anon check are unnecessary here.
> 
> But THP is userspace memory. slab pages or driver pages cannot be THP.

I see, the THP naming implies userspace memory. Not only compound order.
 
> >
> > Signed-off-by: Ran Xiaokai <ran.xiaokai@zte.com.cn>
> > ---
> >  fs/proc/page.c | 14 ++++----------
> >  1 file changed, 4 insertions(+), 10 deletions(-)
> >
> > diff --git a/fs/proc/page.c b/fs/proc/page.c
> > index 2fb64bdb64eb..3e7b70449c2f 100644
> > --- a/fs/proc/page.c
> > +++ b/fs/proc/page.c
> > @@ -146,19 +146,13 @@ u64 stable_page_flags(const struct page *page)
> >  		u |= kpf_copy_bit(k, KPF_COMPOUND_HEAD, PG_head);
> >  	else
> >  		u |= 1 << KPF_COMPOUND_TAIL;
> > +
> Unnecessary new line.

yes, will fix.

> 
> >  	if (folio_test_hugetlb(folio))
> >  		u |= 1 << KPF_HUGE;
> > -	/*
> > -	 * We need to check PageLRU/PageAnon
> > -	 * to make sure a given page is a thp, not a non-huge compound page.
> > -	 */
> > -	else if (folio_test_large(folio)) {
> > -		if ((k & (1 << PG_lru)) || is_anon)
> > -			u |= 1 << KPF_THP;
> > -		else if (is_huge_zero_folio(folio)) {
> > +	else if (folio_test_pmd_mappable(folio)) {
> > +		u |= 1 << KPF_THP;
> 
> lru and anon check should stay.

thanks, will fix.

> 
> > +		if (is_huge_zero_folio(folio))
> >  			u |= 1 << KPF_ZERO_PAGE;
> > -			u |= 1 << KPF_THP;
> > -		}
> >  	} else if (is_zero_pfn(page_to_pfn(page)))
> >  		u |= 1 << KPF_ZERO_PAGE;
> >  
> 
> -- 
> Best Regards,
> Yan, Zi
Ryan Roberts June 26, 2024, 11:07 a.m. UTC | #3
On 26/06/2024 04:06, Zi Yan wrote:
> On Tue Jun 25, 2024 at 10:49 PM EDT, ran xiaokai wrote:
>> From: Ran Xiaokai <ran.xiaokai@zte.com.cn>
>>
>> KPF_COMPOUND_HEAD and KPF_COMPOUND_TAIL are set on "common" compound
>> pages, which means of any order, but KPF_THP should only be set
>> when the folio is a 2M pmd mappable THP. 

Why should KPF_THP only be set on 2M THP? What problem does it cause as it is
currently configured?

I would argue that mTHP is still THP so should still have the flag. And since
these smaller mTHP sizes are disabled by default, only mTHP-aware user space
will be enabling them, so I'll naively state that it should not cause compat
issues as is.

Also, the script at tools/mm/thpmaps relies on KPF_THP being set for all mTHP
sizes to function correctly. So that would need to be reworked if making this
change.

Thanks,
Ryan
Zi Yan June 26, 2024, 2:40 p.m. UTC | #4
On Wed Jun 26, 2024 at 7:07 AM EDT, Ryan Roberts wrote:
> On 26/06/2024 04:06, Zi Yan wrote:
> > On Tue Jun 25, 2024 at 10:49 PM EDT, ran xiaokai wrote:
> >> From: Ran Xiaokai <ran.xiaokai@zte.com.cn>
> >>
> >> KPF_COMPOUND_HEAD and KPF_COMPOUND_TAIL are set on "common" compound
> >> pages, which means of any order, but KPF_THP should only be set
> >> when the folio is a 2M pmd mappable THP. 
>
> Why should KPF_THP only be set on 2M THP? What problem does it cause as it is
> currently configured?
>
> I would argue that mTHP is still THP so should still have the flag. And since
> these smaller mTHP sizes are disabled by default, only mTHP-aware user space
> will be enabling them, so I'll naively state that it should not cause compat
> issues as is.
>
> Also, the script at tools/mm/thpmaps relies on KPF_THP being set for all mTHP
> sizes to function correctly. So that would need to be reworked if making this
> change.

+ more folks working on mTHP

I agree that mTHP is still THP, but we might want different
stats/counters for it, since people might want to keep the old THP counters
consistent. See recent commits on adding mTHP counters:
ec33687c6749 ("mm: add per-order mTHP anon_fault_alloc and anon_fault_fallback
counters"), 1f97fd042f38 ("mm: shmem: add mTHP counters for anonymous shmem")

and changes to make THP counter to only count PMD THP:
835c3a25aa37 ("mm: huge_memory: add the missing folio_test_pmd_mappable() for
THP split statistics")

In this case, I wonder if we want a new KPF_MTHP bit for mTHP and some
adjustment on tools/mm/thpmaps.
Ryan Roberts June 26, 2024, 2:42 p.m. UTC | #5
On 26/06/2024 15:40, Zi Yan wrote:
> On Wed Jun 26, 2024 at 7:07 AM EDT, Ryan Roberts wrote:
>> On 26/06/2024 04:06, Zi Yan wrote:
>>> On Tue Jun 25, 2024 at 10:49 PM EDT, ran xiaokai wrote:
>>>> From: Ran Xiaokai <ran.xiaokai@zte.com.cn>
>>>>
>>>> KPF_COMPOUND_HEAD and KPF_COMPOUND_TAIL are set on "common" compound
>>>> pages, which means of any order, but KPF_THP should only be set
>>>> when the folio is a 2M pmd mappable THP. 
>>
>> Why should KPF_THP only be set on 2M THP? What problem does it cause as it is
>> currently configured?
>>
>> I would argue that mTHP is still THP so should still have the flag. And since
>> these smaller mTHP sizes are disabled by default, only mTHP-aware user space
>> will be enabling them, so I'll naively state that it should not cause compat
>> issues as is.
>>
>> Also, the script at tools/mm/thpmaps relies on KPF_THP being set for all mTHP
>> sizes to function correctly. So that would need to be reworked if making this
>> change.
> 
> + more folks working on mTHP
> 
> I agree that mTHP is still THP, but we might want different
> stats/counters for it, since people might want to keep the old THP counters
> consistent. See recent commits on adding mTHP counters:
> ec33687c6749 ("mm: add per-order mTHP anon_fault_alloc and anon_fault_fallback
> counters"), 1f97fd042f38 ("mm: shmem: add mTHP counters for anonymous shmem")
> 
> and changes to make THP counter to only count PMD THP:
> 835c3a25aa37 ("mm: huge_memory: add the missing folio_test_pmd_mappable() for
> THP split statistics")
> 
> In this case, I wonder if we want a new KPF_MTHP bit for mTHP and some
> adjustment on tools/mm/thpmaps.

That would work for me, assuming we have KPF bits to spare?
Matthew Wilcox June 26, 2024, 3:15 p.m. UTC | #6
On Wed, Jun 26, 2024 at 12:07:04PM +0100, Ryan Roberts wrote:
> On 26/06/2024 04:06, Zi Yan wrote:
> > On Tue Jun 25, 2024 at 10:49 PM EDT, ran xiaokai wrote:
> >> From: Ran Xiaokai <ran.xiaokai@zte.com.cn>
> >>
> >> KPF_COMPOUND_HEAD and KPF_COMPOUND_TAIL are set on "common" compound
> >> pages, which means of any order, but KPF_THP should only be set
> >> when the folio is a 2M pmd mappable THP. 
> 
> Why should KPF_THP only be set on 2M THP? What problem does it cause as it is
> currently configured?
> 
> I would argue that mTHP is still THP so should still have the flag. And since
> these smaller mTHP sizes are disabled by default, only mTHP-aware user space
> will be enabling them, so I'll naively state that it should not cause compat
> issues as is.
> 
> Also, the script at tools/mm/thpmaps relies on KPF_THP being set for all mTHP
> sizes to function correctly. So that would need to be reworked if making this
> change.

I told you you'd run into trouble calling them "mTHP" ...
Ryan Roberts June 26, 2024, 3:18 p.m. UTC | #7
On 26/06/2024 16:15, Matthew Wilcox wrote:
> On Wed, Jun 26, 2024 at 12:07:04PM +0100, Ryan Roberts wrote:
>> On 26/06/2024 04:06, Zi Yan wrote:
>>> On Tue Jun 25, 2024 at 10:49 PM EDT, ran xiaokai wrote:
>>>> From: Ran Xiaokai <ran.xiaokai@zte.com.cn>
>>>>
>>>> KPF_COMPOUND_HEAD and KPF_COMPOUND_TAIL are set on "common" compound
>>>> pages, which means of any order, but KPF_THP should only be set
>>>> when the folio is a 2M pmd mappable THP. 
>>
>> Why should KPF_THP only be set on 2M THP? What problem does it cause as it is
>> currently configured?
>>
>> I would argue that mTHP is still THP so should still have the flag. And since
>> these smaller mTHP sizes are disabled by default, only mTHP-aware user space
>> will be enabling them, so I'll naively state that it should not cause compat
>> issues as is.
>>
>> Also, the script at tools/mm/thpmaps relies on KPF_THP being set for all mTHP
>> sizes to function correctly. So that would need to be reworked if making this
>> change.
> 
> I told you you'd run into trouble calling them "mTHP" ...

"There are two hard things in computer science; naming, cache invalidation and
off-by-one errors"
kernel test robot June 26, 2024, 3:55 p.m. UTC | #8
Hi ran,

kernel test robot noticed the following build warnings:

[auto build test WARNING on akpm-mm/mm-everything]
[also build test WARNING on linus/master v6.10-rc5 next-20240625]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/ran-xiaokai/mm-Constify-folio_order-folio_test_pmd_mappable/20240626-113027
base:   https://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm.git mm-everything
patch link:    https://lore.kernel.org/r/20240626024924.1155558-3-ranxiaokai627%40163.com
patch subject: [PATCH 2/2] kpageflags: fix wrong KPF_THP on non-pmd-mappable compound pages
config: parisc-allnoconfig (https://download.01.org/0day-ci/archive/20240626/202406262203.FFeFYbhP-lkp@intel.com/config)
compiler: hppa-linux-gcc (GCC) 13.2.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20240626/202406262203.FFeFYbhP-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202406262203.FFeFYbhP-lkp@intel.com/

All warnings (new ones prefixed by >>):

   fs/proc/page.c: In function 'stable_page_flags':
>> fs/proc/page.c:151:42: warning: passing argument 1 of 'folio_test_pmd_mappable' discards 'const' qualifier from pointer target type [-Wdiscarded-qualifiers]
     151 |         else if (folio_test_pmd_mappable(folio)) {
         |                                          ^~~~~
   In file included from include/linux/mm.h:1120,
                    from include/linux/memblock.h:12,
                    from fs/proc/page.c:2:
   include/linux/huge_mm.h:438:58: note: expected 'struct folio *' but argument is of type 'const struct folio *'
     438 | static inline bool folio_test_pmd_mappable(struct folio *folio)
         |                                            ~~~~~~~~~~~~~~^~~~~


vim +151 fs/proc/page.c

   108	
   109	u64 stable_page_flags(const struct page *page)
   110	{
   111		const struct folio *folio;
   112		unsigned long k;
   113		unsigned long mapping;
   114		bool is_anon;
   115		u64 u = 0;
   116	
   117		/*
   118		 * pseudo flag: KPF_NOPAGE
   119		 * it differentiates a memory hole from a page with no flags
   120		 */
   121		if (!page)
   122			return 1 << KPF_NOPAGE;
   123		folio = page_folio(page);
   124	
   125		k = folio->flags;
   126		mapping = (unsigned long)folio->mapping;
   127		is_anon = mapping & PAGE_MAPPING_ANON;
   128	
   129		/*
   130		 * pseudo flags for the well known (anonymous) memory mapped pages
   131		 */
   132		if (page_mapped(page))
   133			u |= 1 << KPF_MMAP;
   134		if (is_anon) {
   135			u |= 1 << KPF_ANON;
   136			if (mapping & PAGE_MAPPING_KSM)
   137				u |= 1 << KPF_KSM;
   138		}
   139	
   140		/*
   141		 * compound pages: export both head/tail info
   142		 * they together define a compound page's start/end pos and order
   143		 */
   144		if (page == &folio->page)
   145			u |= kpf_copy_bit(k, KPF_COMPOUND_HEAD, PG_head);
   146		else
   147			u |= 1 << KPF_COMPOUND_TAIL;
   148	
   149		if (folio_test_hugetlb(folio))
   150			u |= 1 << KPF_HUGE;
 > 151		else if (folio_test_pmd_mappable(folio)) {
   152			u |= 1 << KPF_THP;
   153			if (is_huge_zero_folio(folio))
   154				u |= 1 << KPF_ZERO_PAGE;
   155		} else if (is_zero_pfn(page_to_pfn(page)))
   156			u |= 1 << KPF_ZERO_PAGE;
   157	
   158		/*
   159		 * Caveats on high order pages: PG_buddy and PG_slab will only be set
   160		 * on the head page.
   161		 */
   162		if (PageBuddy(page))
   163			u |= 1 << KPF_BUDDY;
   164		else if (page_count(page) == 0 && is_free_buddy_page(page))
   165			u |= 1 << KPF_BUDDY;
   166	
   167		if (PageOffline(page))
   168			u |= 1 << KPF_OFFLINE;
   169		if (PageTable(page))
   170			u |= 1 << KPF_PGTABLE;
   171		if (folio_test_slab(folio))
   172			u |= 1 << KPF_SLAB;
   173
kernel test robot June 26, 2024, 4:21 p.m. UTC | #9
Hi ran,

kernel test robot noticed the following build errors:

[auto build test ERROR on akpm-mm/mm-everything]
[also build test ERROR on linus/master v6.10-rc5 next-20240625]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/ran-xiaokai/mm-Constify-folio_order-folio_test_pmd_mappable/20240626-113027
base:   https://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm.git mm-everything
patch link:    https://lore.kernel.org/r/20240626024924.1155558-3-ranxiaokai627%40163.com
patch subject: [PATCH 2/2] kpageflags: fix wrong KPF_THP on non-pmd-mappable compound pages
config: x86_64-allnoconfig (https://download.01.org/0day-ci/archive/20240626/202406262300.iAURISyJ-lkp@intel.com/config)
compiler: clang version 18.1.5 (https://github.com/llvm/llvm-project 617a15a9eac96088ae5e9134248d8236e34b91b1)
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20240626/202406262300.iAURISyJ-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202406262300.iAURISyJ-lkp@intel.com/

All errors (new ones prefixed by >>):

>> fs/proc/page.c:151:35: error: passing 'const struct folio *' to parameter of type 'struct folio *' discards qualifiers [-Werror,-Wincompatible-pointer-types-discards-qualifiers]
     151 |         else if (folio_test_pmd_mappable(folio)) {
         |                                          ^~~~~
   include/linux/huge_mm.h:438:58: note: passing argument to parameter 'folio' here
     438 | static inline bool folio_test_pmd_mappable(struct folio *folio)
         |                                                          ^
   1 error generated.


vim +151 fs/proc/page.c

   108	
   109	u64 stable_page_flags(const struct page *page)
   110	{
   111		const struct folio *folio;
   112		unsigned long k;
   113		unsigned long mapping;
   114		bool is_anon;
   115		u64 u = 0;
   116	
   117		/*
   118		 * pseudo flag: KPF_NOPAGE
   119		 * it differentiates a memory hole from a page with no flags
   120		 */
   121		if (!page)
   122			return 1 << KPF_NOPAGE;
   123		folio = page_folio(page);
   124	
   125		k = folio->flags;
   126		mapping = (unsigned long)folio->mapping;
   127		is_anon = mapping & PAGE_MAPPING_ANON;
   128	
   129		/*
   130		 * pseudo flags for the well known (anonymous) memory mapped pages
   131		 */
   132		if (page_mapped(page))
   133			u |= 1 << KPF_MMAP;
   134		if (is_anon) {
   135			u |= 1 << KPF_ANON;
   136			if (mapping & PAGE_MAPPING_KSM)
   137				u |= 1 << KPF_KSM;
   138		}
   139	
   140		/*
   141		 * compound pages: export both head/tail info
   142		 * they together define a compound page's start/end pos and order
   143		 */
   144		if (page == &folio->page)
   145			u |= kpf_copy_bit(k, KPF_COMPOUND_HEAD, PG_head);
   146		else
   147			u |= 1 << KPF_COMPOUND_TAIL;
   148	
   149		if (folio_test_hugetlb(folio))
   150			u |= 1 << KPF_HUGE;
 > 151		else if (folio_test_pmd_mappable(folio)) {
   152			u |= 1 << KPF_THP;
   153			if (is_huge_zero_folio(folio))
   154				u |= 1 << KPF_ZERO_PAGE;
   155		} else if (is_zero_pfn(page_to_pfn(page)))
   156			u |= 1 << KPF_ZERO_PAGE;
   157	
   158		/*
   159		 * Caveats on high order pages: PG_buddy and PG_slab will only be set
   160		 * on the head page.
   161		 */
   162		if (PageBuddy(page))
   163			u |= 1 << KPF_BUDDY;
   164		else if (page_count(page) == 0 && is_free_buddy_page(page))
   165			u |= 1 << KPF_BUDDY;
   166	
   167		if (PageOffline(page))
   168			u |= 1 << KPF_OFFLINE;
   169		if (PageTable(page))
   170			u |= 1 << KPF_PGTABLE;
   171		if (folio_test_slab(folio))
   172			u |= 1 << KPF_SLAB;
   173
Lance Yang June 27, 2024, 1:54 a.m. UTC | #10
On Wed, Jun 26, 2024 at 10:42 PM Ryan Roberts <ryan.roberts@arm.com> wrote:
>
> On 26/06/2024 15:40, Zi Yan wrote:
> > On Wed Jun 26, 2024 at 7:07 AM EDT, Ryan Roberts wrote:
> >> On 26/06/2024 04:06, Zi Yan wrote:
> >>> On Tue Jun 25, 2024 at 10:49 PM EDT, ran xiaokai wrote:
> >>>> From: Ran Xiaokai <ran.xiaokai@zte.com.cn>
> >>>>
> >>>> KPF_COMPOUND_HEAD and KPF_COMPOUND_TAIL are set on "common" compound
> >>>> pages, which means of any order, but KPF_THP should only be set
> >>>> when the folio is a 2M pmd mappable THP.
> >>
> >> Why should KPF_THP only be set on 2M THP? What problem does it cause as it is
> >> currently configured?
> >>
> >> I would argue that mTHP is still THP so should still have the flag. And since
> >> these smaller mTHP sizes are disabled by default, only mTHP-aware user space
> >> will be enabling them, so I'll naively state that it should not cause compat
> >> issues as is.
> >>
> >> Also, the script at tools/mm/thpmaps relies on KPF_THP being set for all mTHP
> >> sizes to function correctly. So that would need to be reworked if making this
> >> change.
> >
> > + more folks working on mTHP
> >
> > I agree that mTHP is still THP, but we might want different
> > stats/counters for it, since people might want to keep the old THP counters
> > consistent. See recent commits on adding mTHP counters:
> > ec33687c6749 ("mm: add per-order mTHP anon_fault_alloc and anon_fault_fallback
> > counters"), 1f97fd042f38 ("mm: shmem: add mTHP counters for anonymous shmem")
> >
> > and changes to make THP counter to only count PMD THP:
> > 835c3a25aa37 ("mm: huge_memory: add the missing folio_test_pmd_mappable() for
> > THP split statistics")
> >
> > In this case, I wonder if we want a new KPF_MTHP bit for mTHP and some
> > adjustment on tools/mm/thpmaps.
>
> That would work for me, assuming we have KPF bits to spare?

+1

Let's check on that and see if we're good ;)

Thanks,
Lance
>
Lance Yang June 27, 2024, 2:07 a.m. UTC | #11
On Wed, Jun 26, 2024 at 11:18 PM Ryan Roberts <ryan.roberts@arm.com> wrote:
>
> On 26/06/2024 16:15, Matthew Wilcox wrote:
> > On Wed, Jun 26, 2024 at 12:07:04PM +0100, Ryan Roberts wrote:
> >> On 26/06/2024 04:06, Zi Yan wrote:
> >>> On Tue Jun 25, 2024 at 10:49 PM EDT, ran xiaokai wrote:
> >>>> From: Ran Xiaokai <ran.xiaokai@zte.com.cn>
> >>>>
> >>>> KPF_COMPOUND_HEAD and KPF_COMPOUND_TAIL are set on "common" compound
> >>>> pages, which means of any order, but KPF_THP should only be set
> >>>> when the folio is a 2M pmd mappable THP.
> >>
> >> Why should KPF_THP only be set on 2M THP? What problem does it cause as it is
> >> currently configured?
> >>
> >> I would argue that mTHP is still THP so should still have the flag. And since
> >> these smaller mTHP sizes are disabled by default, only mTHP-aware user space
> >> will be enabling them, so I'll naively state that it should not cause compat
> >> issues as is.
> >>
> >> Also, the script at tools/mm/thpmaps relies on KPF_THP being set for all mTHP
> >> sizes to function correctly. So that would need to be reworked if making this
> >> change.
> >
> > I told you you'd run into trouble calling them "mTHP" ...
>
> "There are two hard things in computer science; naming, cache invalidation and
> off-by-one errors"

Totally agree. Naming things can be surprisingly challenging ;)

Thanks,
Lance

>
Barry Song June 27, 2024, 4:10 a.m. UTC | #12
On Thu, Jun 27, 2024 at 2:40 AM Zi Yan <ziy@nvidia.com> wrote:
>
> On Wed Jun 26, 2024 at 7:07 AM EDT, Ryan Roberts wrote:
> > On 26/06/2024 04:06, Zi Yan wrote:
> > > On Tue Jun 25, 2024 at 10:49 PM EDT, ran xiaokai wrote:
> > >> From: Ran Xiaokai <ran.xiaokai@zte.com.cn>
> > >>
> > >> KPF_COMPOUND_HEAD and KPF_COMPOUND_TAIL are set on "common" compound
> > >> pages, which means of any order, but KPF_THP should only be set
> > >> when the folio is a 2M pmd mappable THP.
> >
> > Why should KPF_THP only be set on 2M THP? What problem does it cause as it is
> > currently configured?
> >
> > I would argue that mTHP is still THP so should still have the flag. And since
> > these smaller mTHP sizes are disabled by default, only mTHP-aware user space
> > will be enabling them, so I'll naively state that it should not cause compat
> > issues as is.
> >
> > Also, the script at tools/mm/thpmaps relies on KPF_THP being set for all mTHP
> > sizes to function correctly. So that would need to be reworked if making this
> > change.
>
> + more folks working on mTHP
>
> I agree that mTHP is still THP, but we might want different
> stats/counters for it, since people might want to keep the old THP counters
> consistent. See recent commits on adding mTHP counters:
> ec33687c6749 ("mm: add per-order mTHP anon_fault_alloc and anon_fault_fallback
> counters"), 1f97fd042f38 ("mm: shmem: add mTHP counters for anonymous shmem")
>
> and changes to make THP counter to only count PMD THP:
> 835c3a25aa37 ("mm: huge_memory: add the missing folio_test_pmd_mappable() for
> THP split statistics")
>
> In this case, I wonder if we want a new KPF_MTHP bit for mTHP and some
> adjustment on tools/mm/thpmaps.

It seems we have to do this though I think keeping KPF_THP and adding a
separate bit like KPF_PMD_MAPPED makes more sense. but those tools
relying on KPF_THP need to realize this and check the new bit , which is
not done now.
whether the mTHP's name is mTHP or THP will make no difference for
this case:-)

>
>
> --
> Best Regards,
> Yan, Zi
>

Thanks
Barry
Ryan Roberts June 27, 2024, 8:39 a.m. UTC | #13
On 27/06/2024 05:10, Barry Song wrote:
> On Thu, Jun 27, 2024 at 2:40 AM Zi Yan <ziy@nvidia.com> wrote:
>>
>> On Wed Jun 26, 2024 at 7:07 AM EDT, Ryan Roberts wrote:
>>> On 26/06/2024 04:06, Zi Yan wrote:
>>>> On Tue Jun 25, 2024 at 10:49 PM EDT, ran xiaokai wrote:
>>>>> From: Ran Xiaokai <ran.xiaokai@zte.com.cn>
>>>>>
>>>>> KPF_COMPOUND_HEAD and KPF_COMPOUND_TAIL are set on "common" compound
>>>>> pages, which means of any order, but KPF_THP should only be set
>>>>> when the folio is a 2M pmd mappable THP.
>>>
>>> Why should KPF_THP only be set on 2M THP? What problem does it cause as it is
>>> currently configured?
>>>
>>> I would argue that mTHP is still THP so should still have the flag. And since
>>> these smaller mTHP sizes are disabled by default, only mTHP-aware user space
>>> will be enabling them, so I'll naively state that it should not cause compat
>>> issues as is.
>>>
>>> Also, the script at tools/mm/thpmaps relies on KPF_THP being set for all mTHP
>>> sizes to function correctly. So that would need to be reworked if making this
>>> change.
>>
>> + more folks working on mTHP
>>
>> I agree that mTHP is still THP, but we might want different
>> stats/counters for it, since people might want to keep the old THP counters
>> consistent. See recent commits on adding mTHP counters:
>> ec33687c6749 ("mm: add per-order mTHP anon_fault_alloc and anon_fault_fallback
>> counters"), 1f97fd042f38 ("mm: shmem: add mTHP counters for anonymous shmem")
>>
>> and changes to make THP counter to only count PMD THP:
>> 835c3a25aa37 ("mm: huge_memory: add the missing folio_test_pmd_mappable() for
>> THP split statistics")
>>
>> In this case, I wonder if we want a new KPF_MTHP bit for mTHP and some
>> adjustment on tools/mm/thpmaps.
> 
> It seems we have to do this though I think keeping KPF_THP and adding a
> separate bit like KPF_PMD_MAPPED makes more sense. but those tools
> relying on KPF_THP need to realize this and check the new bit , which is
> not done now.
> whether the mTHP's name is mTHP or THP will make no difference for
> this case:-)

I don't quite follow your logic for that last part; If there are 2 separate
bits; KPF_THP and KPF_MTHP, and KPF_THP is only set for PMD-sized THP, that
would be a safe/compatible approach, right? Where as your suggestion requires
changes to existing tools to work.

Thinking about this a bit more, I wonder if PKF_MTHP is the right name for a new
flag; We don't currently expose the term "mTHP" to user space. I can't think of
a better name though.

I'd still like to understand what is actually broken that this change is fixing.
Is the concern that a user could see KPF_THP and advance forward by
"/sys/kernel/mm/transparent_hugepage/hpage_pmd_size / getpagesize()" entries?

> 
>>
>>
>> --
>> Best Regards,
>> Yan, Zi
>>
> 
> Thanks
> Barry
Barry Song June 27, 2024, 9:16 a.m. UTC | #14
On Thu, Jun 27, 2024 at 8:39 PM Ryan Roberts <ryan.roberts@arm.com> wrote:
>
> On 27/06/2024 05:10, Barry Song wrote:
> > On Thu, Jun 27, 2024 at 2:40 AM Zi Yan <ziy@nvidia.com> wrote:
> >>
> >> On Wed Jun 26, 2024 at 7:07 AM EDT, Ryan Roberts wrote:
> >>> On 26/06/2024 04:06, Zi Yan wrote:
> >>>> On Tue Jun 25, 2024 at 10:49 PM EDT, ran xiaokai wrote:
> >>>>> From: Ran Xiaokai <ran.xiaokai@zte.com.cn>
> >>>>>
> >>>>> KPF_COMPOUND_HEAD and KPF_COMPOUND_TAIL are set on "common" compound
> >>>>> pages, which means of any order, but KPF_THP should only be set
> >>>>> when the folio is a 2M pmd mappable THP.
> >>>
> >>> Why should KPF_THP only be set on 2M THP? What problem does it cause as it is
> >>> currently configured?
> >>>
> >>> I would argue that mTHP is still THP so should still have the flag. And since
> >>> these smaller mTHP sizes are disabled by default, only mTHP-aware user space
> >>> will be enabling them, so I'll naively state that it should not cause compat
> >>> issues as is.
> >>>
> >>> Also, the script at tools/mm/thpmaps relies on KPF_THP being set for all mTHP
> >>> sizes to function correctly. So that would need to be reworked if making this
> >>> change.
> >>
> >> + more folks working on mTHP
> >>
> >> I agree that mTHP is still THP, but we might want different
> >> stats/counters for it, since people might want to keep the old THP counters
> >> consistent. See recent commits on adding mTHP counters:
> >> ec33687c6749 ("mm: add per-order mTHP anon_fault_alloc and anon_fault_fallback
> >> counters"), 1f97fd042f38 ("mm: shmem: add mTHP counters for anonymous shmem")
> >>
> >> and changes to make THP counter to only count PMD THP:
> >> 835c3a25aa37 ("mm: huge_memory: add the missing folio_test_pmd_mappable() for
> >> THP split statistics")
> >>
> >> In this case, I wonder if we want a new KPF_MTHP bit for mTHP and some
> >> adjustment on tools/mm/thpmaps.
> >
> > It seems we have to do this though I think keeping KPF_THP and adding a
> > separate bit like KPF_PMD_MAPPED makes more sense. but those tools
> > relying on KPF_THP need to realize this and check the new bit , which is
> > not done now.
> > whether the mTHP's name is mTHP or THP will make no difference for
> > this case:-)
>
> I don't quite follow your logic for that last part; If there are 2 separate
> bits; KPF_THP and KPF_MTHP, and KPF_THP is only set for PMD-sized THP, that
> would be a safe/compatible approach, right? Where as your suggestion requires
> changes to existing tools to work.

Right, my point is that mTHP and THP are both types of THP. The only difference
is whether they are PMD-mapped or PTE-mapped. Adding a bit to describe how
the page is mapped would more accurately reflect reality. However, this change
would disrupt tools that assume KPF_THP always means PMD-mapped THP.
Therefore, we would still need separate bits for THP and mTHP in this case.

I saw Willy complain about mTHP being called "mTHP," but in this case, calling
it "mTHP" or just "THP" doesn't change anything if old tools continue to assume
that KPF_THP means PMD-mapped THP.

>
> Thinking about this a bit more, I wonder if PKF_MTHP is the right name for a new
> flag; We don't currently expose the term "mTHP" to user space. I can't think of
> a better name though.

Yes.  If "compatibility" is a requirement, we cannot disregard it.

> I'd still like to understand what is actually broken that this change is fixing.
> Is the concern that a user could see KPF_THP and advance forward by
> "/sys/kernel/mm/transparent_hugepage/hpage_pmd_size / getpagesize()" entries?
>

Maybe we need an example which is thinking that KPF_THP is PMD-mapped.

> >
> >>
> >>
> >> --
> >> Best Regards,
> >> Yan, Zi
> >>
> >

Thanks
Barry
Ryan Roberts June 27, 2024, 9:27 a.m. UTC | #15
On 27/06/2024 10:16, Barry Song wrote:
> On Thu, Jun 27, 2024 at 8:39 PM Ryan Roberts <ryan.roberts@arm.com> wrote:
>>
>> On 27/06/2024 05:10, Barry Song wrote:
>>> On Thu, Jun 27, 2024 at 2:40 AM Zi Yan <ziy@nvidia.com> wrote:
>>>>
>>>> On Wed Jun 26, 2024 at 7:07 AM EDT, Ryan Roberts wrote:
>>>>> On 26/06/2024 04:06, Zi Yan wrote:
>>>>>> On Tue Jun 25, 2024 at 10:49 PM EDT, ran xiaokai wrote:
>>>>>>> From: Ran Xiaokai <ran.xiaokai@zte.com.cn>
>>>>>>>
>>>>>>> KPF_COMPOUND_HEAD and KPF_COMPOUND_TAIL are set on "common" compound
>>>>>>> pages, which means of any order, but KPF_THP should only be set
>>>>>>> when the folio is a 2M pmd mappable THP.
>>>>>
>>>>> Why should KPF_THP only be set on 2M THP? What problem does it cause as it is
>>>>> currently configured?
>>>>>
>>>>> I would argue that mTHP is still THP so should still have the flag. And since
>>>>> these smaller mTHP sizes are disabled by default, only mTHP-aware user space
>>>>> will be enabling them, so I'll naively state that it should not cause compat
>>>>> issues as is.
>>>>>
>>>>> Also, the script at tools/mm/thpmaps relies on KPF_THP being set for all mTHP
>>>>> sizes to function correctly. So that would need to be reworked if making this
>>>>> change.
>>>>
>>>> + more folks working on mTHP
>>>>
>>>> I agree that mTHP is still THP, but we might want different
>>>> stats/counters for it, since people might want to keep the old THP counters
>>>> consistent. See recent commits on adding mTHP counters:
>>>> ec33687c6749 ("mm: add per-order mTHP anon_fault_alloc and anon_fault_fallback
>>>> counters"), 1f97fd042f38 ("mm: shmem: add mTHP counters for anonymous shmem")
>>>>
>>>> and changes to make THP counter to only count PMD THP:
>>>> 835c3a25aa37 ("mm: huge_memory: add the missing folio_test_pmd_mappable() for
>>>> THP split statistics")
>>>>
>>>> In this case, I wonder if we want a new KPF_MTHP bit for mTHP and some
>>>> adjustment on tools/mm/thpmaps.
>>>
>>> It seems we have to do this though I think keeping KPF_THP and adding a
>>> separate bit like KPF_PMD_MAPPED makes more sense. but those tools
>>> relying on KPF_THP need to realize this and check the new bit , which is
>>> not done now.
>>> whether the mTHP's name is mTHP or THP will make no difference for
>>> this case:-)
>>
>> I don't quite follow your logic for that last part; If there are 2 separate
>> bits; KPF_THP and KPF_MTHP, and KPF_THP is only set for PMD-sized THP, that
>> would be a safe/compatible approach, right? Where as your suggestion requires
>> changes to existing tools to work.
> 
> Right, my point is that mTHP and THP are both types of THP. The only difference
> is whether they are PMD-mapped or PTE-mapped. Adding a bit to describe how
> the page is mapped would more accurately reflect reality. However, this change
> would disrupt tools that assume KPF_THP always means PMD-mapped THP.
> Therefore, we would still need separate bits for THP and mTHP in this case.

I think perhaps PTE- vs PMD-mapped is a separate issue. The issue at hand is
whether PKF_THP implies a fixed size (and alignment). If compat is an issue,
then PKF_THP must continue to imply PMD-size. If compat is not an issue, then
size can be determined by iterating over the entries.

Having a mechanism to determine the level at which a block is mapped would
potentially be a useful feature, but seems orthogonal to me.

> 
> I saw Willy complain about mTHP being called "mTHP," but in this case, calling
> it "mTHP" or just "THP" doesn't change anything if old tools continue to assume
> that KPF_THP means PMD-mapped THP.

I think Willy was just ribbing me because he preferred calling it "anonymous
large folios". That's how I took it anyway.

> 
>>
>> Thinking about this a bit more, I wonder if PKF_MTHP is the right name for a new
>> flag; We don't currently expose the term "mTHP" to user space. I can't think of
>> a better name though.
> 
> Yes.  If "compatibility" is a requirement, we cannot disregard it.
> 
>> I'd still like to understand what is actually broken that this change is fixing.
>> Is the concern that a user could see KPF_THP and advance forward by
>> "/sys/kernel/mm/transparent_hugepage/hpage_pmd_size / getpagesize()" entries?
>>
> 
> Maybe we need an example which is thinking that KPF_THP is PMD-mapped.

Yes, that would help.

> 
>>>
>>>>
>>>>
>>>> --
>>>> Best Regards,
>>>> Yan, Zi
>>>>
>>>
> 
> Thanks
> Barry
ran xiaokai June 27, 2024, 12:46 p.m. UTC | #16
>On 27/06/2024 10:16, Barry Song wrote:
>> On Thu, Jun 27, 2024 at 8:39?PM Ryan Roberts <ryan.roberts@arm.com> wrote:
>>>
>>> On 27/06/2024 05:10, Barry Song wrote:
>>>> On Thu, Jun 27, 2024 at 2:40?AM Zi Yan <ziy@nvidia.com> wrote:
>>>>>
>>>>> On Wed Jun 26, 2024 at 7:07 AM EDT, Ryan Roberts wrote:
>>>>>> On 26/06/2024 04:06, Zi Yan wrote:
>>>>>>> On Tue Jun 25, 2024 at 10:49 PM EDT, ran xiaokai wrote:
>>>>>>>> From: Ran Xiaokai <ran.xiaokai@zte.com.cn>
>>>>>>>>
>>>>>>>> KPF_COMPOUND_HEAD and KPF_COMPOUND_TAIL are set on "common" compound
>>>>>>>> pages, which means of any order, but KPF_THP should only be set
>>>>>>>> when the folio is a 2M pmd mappable THP.
>>>>>>
>>>>>> Why should KPF_THP only be set on 2M THP? What problem does it cause as it is
>>>>>> currently configured?
>>>>>>
>>>>>> I would argue that mTHP is still THP so should still have the flag. And since
>>>>>> these smaller mTHP sizes are disabled by default, only mTHP-aware user space
>>>>>> will be enabling them, so I'll naively state that it should not cause compat
>>>>>> issues as is.
>>>>>>
>>>>>> Also, the script at tools/mm/thpmaps relies on KPF_THP being set for all mTHP
>>>>>> sizes to function correctly. So that would need to be reworked if making this
>>>>>> change.
>>>>>
>>>>> + more folks working on mTHP
>>>>>
>>>>> I agree that mTHP is still THP, but we might want different
>>>>> stats/counters for it, since people might want to keep the old THP counters
>>>>> consistent. See recent commits on adding mTHP counters:
>>>>> ec33687c6749 ("mm: add per-order mTHP anon_fault_alloc and anon_fault_fallback
>>>>> counters"), 1f97fd042f38 ("mm: shmem: add mTHP counters for anonymous shmem")
>>>>>
>>>>> and changes to make THP counter to only count PMD THP:
>>>>> 835c3a25aa37 ("mm: huge_memory: add the missing folio_test_pmd_mappable() for
>>>>> THP split statistics")
>>>>>
>>>>> In this case, I wonder if we want a new KPF_MTHP bit for mTHP and some
>>>>> adjustment on tools/mm/thpmaps.
>>>>
>>>> It seems we have to do this though I think keeping KPF_THP and adding a
>>>> separate bit like KPF_PMD_MAPPED makes more sense. but those tools
>>>> relying on KPF_THP need to realize this and check the new bit , which is
>>>> not done now.
>>>> whether the mTHP's name is mTHP or THP will make no difference for
>>>> this case:-)
>>>
>>> I don't quite follow your logic for that last part; If there are 2 separate
>>> bits; KPF_THP and KPF_MTHP, and KPF_THP is only set for PMD-sized THP, that
>>> would be a safe/compatible approach, right? Where as your suggestion requires
>>> changes to existing tools to work.
>> 
>> Right, my point is that mTHP and THP are both types of THP. The only difference
>> is whether they are PMD-mapped or PTE-mapped. Adding a bit to describe how
>> the page is mapped would more accurately reflect reality. However, this change
>> would disrupt tools that assume KPF_THP always means PMD-mapped THP.
>> Therefore, we would still need separate bits for THP and mTHP in this case.
>
>I think perhaps PTE- vs PMD-mapped is a separate issue. The issue at hand is
>whether PKF_THP implies a fixed size (and alignment). If compat is an issue,
>then PKF_THP must continue to imply PMD-size. If compat is not an issue, then
>size can be determined by iterating over the entries.
>
>Having a mechanism to determine the level at which a block is mapped would
>potentially be a useful feature, but seems orthogonal to me.
>
>> 
>> I saw Willy complain about mTHP being called "mTHP," but in this case, calling
>> it "mTHP" or just "THP" doesn't change anything if old tools continue to assume
>> that KPF_THP means PMD-mapped THP.
>
>I think Willy was just ribbing me because he preferred calling it "anonymous
>large folios". That's how I took it anyway.
>
>> 
>>>
>>> Thinking about this a bit more, I wonder if PKF_MTHP is the right name for a new
>>> flag; We don't currently expose the term "mTHP" to user space. I can't think of
>>> a better name though.
>> 
>> Yes.  If "compatibility" is a requirement, we cannot disregard it.
>> 
>>> I'd still like to understand what is actually broken that this change is fixing.
>>> Is the concern that a user could see KPF_THP and advance forward by
>>> "/sys/kernel/mm/transparent_hugepage/hpage_pmd_size / getpagesize()" entries?
>>>
>> 
>> Maybe we need an example which is thinking that KPF_THP is PMD-mapped.
>
>Yes, that would help.

For now it is the testcase in tools/testing/selftests/mm/split_huge_page_test,
if we try to split THP to other orders other than 0, the testcase will break.

Maybe we can use KPF_COMPOUND_HEAD and KPF_COMPOUND_TAIL to figure out
the compound page's start/end and the order. But these two flags are not
for userspace memory only.
David Hildenbrand June 27, 2024, 1:54 p.m. UTC | #17
On 26.06.24 04:49, ran xiaokai wrote:
> From: Ran Xiaokai <ran.xiaokai@zte.com.cn>
> 
> KPF_COMPOUND_HEAD and KPF_COMPOUND_TAIL are set on "common" compound
> pages, which means of any order, but KPF_THP should only be set
> when the folio is a 2M pmd mappable THP. Since commit 19eaf44954df

"should only be set" -- who says that? :)

The documentation only talks about "Contiguous pages which construct 
transparent hugepages". Sure, when it was added there were only PMD ones.


> ("mm: thp: support allocation of anonymous multi-size THP"),
> multiple orders of folios can be allocated and mapped to userspace,
> so the folio_test_large() check is not sufficient here,
> replace it with folio_test_pmd_mappable() to fix this.
> 

A couple of points:

1) If I am not daydreaming, ever since we supported non-PMD THP in the
    pagecache (much longer than anon mTHP), we already indicate KPF_THP
    for them here. So this is not anon specific? Or am I getting the
    PG_lru check all wrong?

2) Anon THP are disabled as default. If some custom tool cannot deal
    with that "extension" we did with smaller THP, it shall be updated if
    it really has to special-case PMD-mapped THP, before enabled by the
    admin.


I think this interface does exactly what we want, as it is right now. 
Unless there is *good* reason, we should keep it like that.

So I suggest

a) Extend the documentation to just state "THP of any size and any 
mapping granularity" or sth like that.

b) Maybe using folio_test_large_rmappable() instead of "(k & (1 <<
    PG_lru)) || is_anon", so even isolated-from-LRU THPs are indicated
    properly.

c) Whoever is interested in getting the folio size, use this flag along
    with a scan for the KPF_COMPOUND_HEAD.


I'll note that, scanning documentation, PAGE_IS_HUGE currently only 
applies to PMD *mapped* THP. It doesn't consider PTE-mapped THP at all 
(including PMD-ones). Likely, documentation should be updated to state 
"PMD-mapped THP or any hugetlb page".

> Also kpageflags is not only for userspace memory but for all valid pfn
> pages,including slab pages or drivers used pages, so the PG_lru and
> is_anon check are unnecessary here.

I'm completely confused about that statements. We do have KPF_OFFLINE, 
KPF_PGTABLE, KPF_SLAB, ... I'm missing something important.

> 
> Signed-off-by: Ran Xiaokai <ran.xiaokai@zte.com.cn>
> ---
>   fs/proc/page.c | 14 ++++----------
>   1 file changed, 4 insertions(+), 10 deletions(-)
> 
> diff --git a/fs/proc/page.c b/fs/proc/page.c
> index 2fb64bdb64eb..3e7b70449c2f 100644
> --- a/fs/proc/page.c
> +++ b/fs/proc/page.c
> @@ -146,19 +146,13 @@ u64 stable_page_flags(const struct page *page)
>   		u |= kpf_copy_bit(k, KPF_COMPOUND_HEAD, PG_head);
>   	else
>   		u |= 1 << KPF_COMPOUND_TAIL;
> +
>   	if (folio_test_hugetlb(folio))
>   		u |= 1 << KPF_HUGE;
> -	/*
> -	 * We need to check PageLRU/PageAnon
> -	 * to make sure a given page is a thp, not a non-huge compound page.
> -	 */
> -	else if (folio_test_large(folio)) {
> -		if ((k & (1 << PG_lru)) || is_anon)
> -			u |= 1 << KPF_THP;
> -		else if (is_huge_zero_folio(folio)) {
> +	else if (folio_test_pmd_mappable(folio)) {

folio_test_pmd_mappable() would not be future safe. It includes anything 
 >= PMD_ORDER as well.
ran xiaokai June 28, 2024, 3:01 a.m. UTC | #18
>On 26.06.24 04:49, ran xiaokai wrote:
>> From: Ran Xiaokai <ran.xiaokai@zte.com.cn>
>> 
>> KPF_COMPOUND_HEAD and KPF_COMPOUND_TAIL are set on "common" compound
>> pages, which means of any order, but KPF_THP should only be set
>> when the folio is a 2M pmd mappable THP. Since commit 19eaf44954df
>
>"should only be set" -- who says that? :)
>
>The documentation only talks about "Contiguous pages which construct 
>transparent hugepages". Sure, when it was added there were only PMD ones.

Hi, David,
I am working on tools/testing/selftests/mm/split_huge_page_test to support
splitint THP to more orders, not only order-0, and the testcases failed.
It seems the testcodes rely on KPF_THP to indicate only 2M THP.

>> ("mm: thp: support allocation of anonymous multi-size THP"),
>> multiple orders of folios can be allocated and mapped to userspace,
>> so the folio_test_large() check is not sufficient here,
>> replace it with folio_test_pmd_mappable() to fix this.
>> 
>
>A couple of points:
>
>1) If I am not daydreaming, ever since we supported non-PMD THP in the
>    pagecache (much longer than anon mTHP), we already indicate KPF_THP
>    for them here. So this is not anon specific? Or am I getting the
>    PG_lru check all wrong?

Thanks for the clarification.

>2) Anon THP are disabled as default. If some custom tool cannot deal
>    with that "extension" we did with smaller THP, it shall be updated if
>    it really has to special-case PMD-mapped THP, before enabled by the
>    admin.

ok, it seems that it is the testcodes which should be updated.

>
>I think this interface does exactly what we want, as it is right now. 
>Unless there is *good* reason, we should keep it like that.
>
>So I suggest
>
>a) Extend the documentation to just state "THP of any size and any 
>mapping granularity" or sth like that.

yes, it is neccessay to update the documentation to make it more clear.
Since the definition of KPF_THP has been expanded since it was firstly introduced.

>b) Maybe using folio_test_large_rmappable() instead of "(k & (1 <<
>    PG_lru)) || is_anon", so even isolated-from-LRU THPs are indicated
>    properly.

i will investigate on this.

>c) Whoever is interested in getting the folio size, use this flag along
>    with a scan for the KPF_COMPOUND_HEAD.

yes, we can use the combination of KPF_COMPOUND_HEAD and KPF_COMPOUND_TAIL
to figure out the compound order.

>
>I'll note that, scanning documentation, PAGE_IS_HUGE currently only 
>applies to PMD *mapped* THP. It doesn't consider PTE-mapped THP at all 
>(including PMD-ones). Likely, documentation should be updated to state 
>"PMD-mapped THP or any hugetlb page".

i will also investigate on this.

>> Also kpageflags is not only for userspace memory but for all valid pfn
>> pages,including slab pages or drivers used pages, so the PG_lru and
>> is_anon check are unnecessary here.
>
>I'm completely confused about that statements. We do have KPF_OFFLINE, 
>KPF_PGTABLE, KPF_SLAB, ... I'm missing something important.

My statement is wrong, please ignore this.

>> 
>> Signed-off-by: Ran Xiaokai <ran.xiaokai@zte.com.cn>
>> ---
>>   fs/proc/page.c | 14 ++++----------
>>   1 file changed, 4 insertions(+), 10 deletions(-)
>> 
>> diff --git a/fs/proc/page.c b/fs/proc/page.c
>> index 2fb64bdb64eb..3e7b70449c2f 100644
>> --- a/fs/proc/page.c
>> +++ b/fs/proc/page.c
>> @@ -146,19 +146,13 @@ u64 stable_page_flags(const struct page *page)
>>   		u |= kpf_copy_bit(k, KPF_COMPOUND_HEAD, PG_head);
>>   	else
>>   		u |= 1 << KPF_COMPOUND_TAIL;
>> +
>>   	if (folio_test_hugetlb(folio))
>>   		u |= 1 << KPF_HUGE;
>> -	/*
>> -	 * We need to check PageLRU/PageAnon
>> -	 * to make sure a given page is a thp, not a non-huge compound page.
>> -	 */
>> -	else if (folio_test_large(folio)) {
>> -		if ((k & (1 << PG_lru)) || is_anon)
>> -			u |= 1 << KPF_THP;
>> -		else if (is_huge_zero_folio(folio)) {
>> +	else if (folio_test_pmd_mappable(folio)) {
>
>folio_test_pmd_mappable() would not be future safe. It includes anything 
> >= PMD_ORDER as well.
>
>
>-- 
>Cheers,
>
>David / dhildenb
diff mbox series

Patch

diff --git a/fs/proc/page.c b/fs/proc/page.c
index 2fb64bdb64eb..3e7b70449c2f 100644
--- a/fs/proc/page.c
+++ b/fs/proc/page.c
@@ -146,19 +146,13 @@  u64 stable_page_flags(const struct page *page)
 		u |= kpf_copy_bit(k, KPF_COMPOUND_HEAD, PG_head);
 	else
 		u |= 1 << KPF_COMPOUND_TAIL;
+
 	if (folio_test_hugetlb(folio))
 		u |= 1 << KPF_HUGE;
-	/*
-	 * We need to check PageLRU/PageAnon
-	 * to make sure a given page is a thp, not a non-huge compound page.
-	 */
-	else if (folio_test_large(folio)) {
-		if ((k & (1 << PG_lru)) || is_anon)
-			u |= 1 << KPF_THP;
-		else if (is_huge_zero_folio(folio)) {
+	else if (folio_test_pmd_mappable(folio)) {
+		u |= 1 << KPF_THP;
+		if (is_huge_zero_folio(folio))
 			u |= 1 << KPF_ZERO_PAGE;
-			u |= 1 << KPF_THP;
-		}
 	} else if (is_zero_pfn(page_to_pfn(page)))
 		u |= 1 << KPF_ZERO_PAGE;