diff mbox series

[v3,2/2] mm: thp: fix false negative of shmem vma's THP eligibility

Message ID 1560401041-32207-3-git-send-email-yang.shi@linux.alibaba.com (mailing list archive)
State New, archived
Headers show
Series Fix false negative of shmem vma's THP eligibility | expand

Commit Message

Yang Shi June 13, 2019, 4:44 a.m. UTC
The commit 7635d9cbe832 ("mm, thp, proc: report THP eligibility for each
vma") introduced THPeligible bit for processes' smaps. But, when checking
the eligibility for shmem vma, __transparent_hugepage_enabled() is
called to override the result from shmem_huge_enabled().  It may result
in the anonymous vma's THP flag override shmem's.  For example, running a
simple test which create THP for shmem, but with anonymous THP disabled,
when reading the process's smaps, it may show:

7fc92ec00000-7fc92f000000 rw-s 00000000 00:14 27764 /dev/shm/test
Size:               4096 kB
...
[snip]
...
ShmemPmdMapped:     4096 kB
...
[snip]
...
THPeligible:    0

And, /proc/meminfo does show THP allocated and PMD mapped too:

ShmemHugePages:     4096 kB
ShmemPmdMapped:     4096 kB

This doesn't make too much sense.  The shmem objects should be treated
separately from anonymous THP.  Calling shmem_huge_enabled() with checking
MMF_DISABLE_THP sounds good enough.  And, we could skip stack and
dax vma check since we already checked if the vma is shmem already.

Also check if vma is suitable for THP by calling
transhuge_vma_suitable().

And minor fix to smaps output format and documentation.

Fixes: 7635d9cbe832 ("mm, thp, proc: report THP eligibility for each vma")
Cc: Hugh Dickins <hughd@google.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: Yang Shi <yang.shi@linux.alibaba.com>
---
 Documentation/filesystems/proc.txt | 4 ++--
 fs/proc/task_mmu.c                 | 3 ++-
 mm/huge_memory.c                   | 9 +++++++--
 mm/shmem.c                         | 3 +++
 4 files changed, 14 insertions(+), 5 deletions(-)

Comments

Vlastimil Babka June 19, 2019, 12:12 p.m. UTC | #1
On 6/13/19 6:44 AM, Yang Shi wrote:
> The commit 7635d9cbe832 ("mm, thp, proc: report THP eligibility for each
> vma") introduced THPeligible bit for processes' smaps. But, when checking
> the eligibility for shmem vma, __transparent_hugepage_enabled() is
> called to override the result from shmem_huge_enabled().  It may result
> in the anonymous vma's THP flag override shmem's.  For example, running a
> simple test which create THP for shmem, but with anonymous THP disabled,
> when reading the process's smaps, it may show:

...

> diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
> index 01d4eb0..6a13882 100644
> --- a/fs/proc/task_mmu.c
> +++ b/fs/proc/task_mmu.c
> @@ -796,7 +796,8 @@ static int show_smap(struct seq_file *m, void *v)
>  
>  	__show_smap(m, &mss);
>  
> -	seq_printf(m, "THPeligible:    %d\n", transparent_hugepage_enabled(vma));
> +	seq_printf(m, "THPeligible:		%d\n",
> +		   transparent_hugepage_enabled(vma));
>  
>  	if (arch_pkeys_enabled())
>  		seq_printf(m, "ProtectionKey:  %8u\n", vma_pkey(vma));
> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> index 4bc2552..36f0225 100644
> --- a/mm/huge_memory.c
> +++ b/mm/huge_memory.c
> @@ -65,10 +65,15 @@
>  
>  bool transparent_hugepage_enabled(struct vm_area_struct *vma)
>  {
> +	/* The addr is used to check if the vma size fits */
> +	unsigned long addr = (vma->vm_end & HPAGE_PMD_MASK) - HPAGE_PMD_SIZE;
> +
> +	if (!transhuge_vma_suitable(vma, addr))
> +		return false;

Sorry for replying rather late, and not in the v2 thread, but unlike
Hugh I'm not convinced that we should include vma size/alignment in the
test for reporting THPeligible, which was supposed to reflect
administrative settings and madvise hints. I guess it's mostly a matter
of personal feeling. But one objective distinction is that the admin
settings and madvise do have an exact binary result for the whole VMA,
while this check is more fuzzy - only part of the VMA's span might be
properly sized+aligned, and THPeligible will be 1 for the whole VMA.

>  	if (vma_is_anonymous(vma))
>  		return __transparent_hugepage_enabled(vma);
> -	if (vma_is_shmem(vma) && shmem_huge_enabled(vma))
> -		return __transparent_hugepage_enabled(vma);
> +	if (vma_is_shmem(vma))
> +		return shmem_huge_enabled(vma);
>  
>  	return false;
>  }
> diff --git a/mm/shmem.c b/mm/shmem.c
> index 1bb3b8d..a807712 100644
> --- a/mm/shmem.c
> +++ b/mm/shmem.c
> @@ -3872,6 +3872,9 @@ bool shmem_huge_enabled(struct vm_area_struct *vma)
>  	loff_t i_size;
>  	pgoff_t off;
>  
> +	if ((vma->vm_flags & VM_NOHUGEPAGE) ||
> +	    test_bit(MMF_DISABLE_THP, &vma->vm_mm->flags))
> +		return false;
>  	if (shmem_huge == SHMEM_HUGE_FORCE)
>  		return true;
>  	if (shmem_huge == SHMEM_HUGE_DENY)
>
Yang Shi June 19, 2019, 4:28 p.m. UTC | #2
On 6/19/19 5:12 AM, Vlastimil Babka wrote:
> On 6/13/19 6:44 AM, Yang Shi wrote:
>> The commit 7635d9cbe832 ("mm, thp, proc: report THP eligibility for each
>> vma") introduced THPeligible bit for processes' smaps. But, when checking
>> the eligibility for shmem vma, __transparent_hugepage_enabled() is
>> called to override the result from shmem_huge_enabled().  It may result
>> in the anonymous vma's THP flag override shmem's.  For example, running a
>> simple test which create THP for shmem, but with anonymous THP disabled,
>> when reading the process's smaps, it may show:
> ...
>
>> diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
>> index 01d4eb0..6a13882 100644
>> --- a/fs/proc/task_mmu.c
>> +++ b/fs/proc/task_mmu.c
>> @@ -796,7 +796,8 @@ static int show_smap(struct seq_file *m, void *v)
>>   
>>   	__show_smap(m, &mss);
>>   
>> -	seq_printf(m, "THPeligible:    %d\n", transparent_hugepage_enabled(vma));
>> +	seq_printf(m, "THPeligible:		%d\n",
>> +		   transparent_hugepage_enabled(vma));
>>   
>>   	if (arch_pkeys_enabled())
>>   		seq_printf(m, "ProtectionKey:  %8u\n", vma_pkey(vma));
>> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
>> index 4bc2552..36f0225 100644
>> --- a/mm/huge_memory.c
>> +++ b/mm/huge_memory.c
>> @@ -65,10 +65,15 @@
>>   
>>   bool transparent_hugepage_enabled(struct vm_area_struct *vma)
>>   {
>> +	/* The addr is used to check if the vma size fits */
>> +	unsigned long addr = (vma->vm_end & HPAGE_PMD_MASK) - HPAGE_PMD_SIZE;
>> +
>> +	if (!transhuge_vma_suitable(vma, addr))
>> +		return false;
> Sorry for replying rather late, and not in the v2 thread, but unlike
> Hugh I'm not convinced that we should include vma size/alignment in the
> test for reporting THPeligible, which was supposed to reflect
> administrative settings and madvise hints. I guess it's mostly a matter
> of personal feeling. But one objective distinction is that the admin
> settings and madvise do have an exact binary result for the whole VMA,
> while this check is more fuzzy - only part of the VMA's span might be
> properly sized+aligned, and THPeligible will be 1 for the whole VMA.

I think THPeligible is used to tell us if the vma is suitable for 
allocating THP. Both anonymous and shmem THP checks vma size/alignment 
to decide to or not to allocate THP.

And, if vma size/alignment is not checked, THPeligible may show "true" 
for even 4K mapping. This doesn't make too much sense either.

>
>>   	if (vma_is_anonymous(vma))
>>   		return __transparent_hugepage_enabled(vma);
>> -	if (vma_is_shmem(vma) && shmem_huge_enabled(vma))
>> -		return __transparent_hugepage_enabled(vma);
>> +	if (vma_is_shmem(vma))
>> +		return shmem_huge_enabled(vma);
>>   
>>   	return false;
>>   }
>> diff --git a/mm/shmem.c b/mm/shmem.c
>> index 1bb3b8d..a807712 100644
>> --- a/mm/shmem.c
>> +++ b/mm/shmem.c
>> @@ -3872,6 +3872,9 @@ bool shmem_huge_enabled(struct vm_area_struct *vma)
>>   	loff_t i_size;
>>   	pgoff_t off;
>>   
>> +	if ((vma->vm_flags & VM_NOHUGEPAGE) ||
>> +	    test_bit(MMF_DISABLE_THP, &vma->vm_mm->flags))
>> +		return false;
>>   	if (shmem_huge == SHMEM_HUGE_FORCE)
>>   		return true;
>>   	if (shmem_huge == SHMEM_HUGE_DENY)
>>
Hugh Dickins July 17, 2019, 7:44 p.m. UTC | #3
On Thu, 13 Jun 2019, Yang Shi wrote:

> The commit 7635d9cbe832 ("mm, thp, proc: report THP eligibility for each
> vma") introduced THPeligible bit for processes' smaps. But, when checking
> the eligibility for shmem vma, __transparent_hugepage_enabled() is
> called to override the result from shmem_huge_enabled().  It may result
> in the anonymous vma's THP flag override shmem's.  For example, running a
> simple test which create THP for shmem, but with anonymous THP disabled,
> when reading the process's smaps, it may show:
> 
> 7fc92ec00000-7fc92f000000 rw-s 00000000 00:14 27764 /dev/shm/test
> Size:               4096 kB
> ...
> [snip]
> ...
> ShmemPmdMapped:     4096 kB
> ...
> [snip]
> ...
> THPeligible:    0
> 
> And, /proc/meminfo does show THP allocated and PMD mapped too:
> 
> ShmemHugePages:     4096 kB
> ShmemPmdMapped:     4096 kB
> 
> This doesn't make too much sense.  The shmem objects should be treated
> separately from anonymous THP.  Calling shmem_huge_enabled() with checking
> MMF_DISABLE_THP sounds good enough.  And, we could skip stack and
> dax vma check since we already checked if the vma is shmem already.
> 
> Also check if vma is suitable for THP by calling
> transhuge_vma_suitable().
> 
> And minor fix to smaps output format and documentation.
> 
> Fixes: 7635d9cbe832 ("mm, thp, proc: report THP eligibility for each vma")
> Cc: Hugh Dickins <hughd@google.com>

Thanks,
Acked-by: Hugh Dickins <hughd@google.com>

> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
> Cc: Michal Hocko <mhocko@suse.com>
> Cc: Vlastimil Babka <vbabka@suse.cz>
> Cc: David Rientjes <rientjes@google.com>
> Signed-off-by: Yang Shi <yang.shi@linux.alibaba.com>
> ---
>  Documentation/filesystems/proc.txt | 4 ++--
>  fs/proc/task_mmu.c                 | 3 ++-
>  mm/huge_memory.c                   | 9 +++++++--
>  mm/shmem.c                         | 3 +++
>  4 files changed, 14 insertions(+), 5 deletions(-)
> 
> diff --git a/Documentation/filesystems/proc.txt b/Documentation/filesystems/proc.txt
> index 66cad5c..b0ded06 100644
> --- a/Documentation/filesystems/proc.txt
> +++ b/Documentation/filesystems/proc.txt
> @@ -477,8 +477,8 @@ replaced by copy-on-write) part of the underlying shmem object out on swap.
>  "SwapPss" shows proportional swap share of this mapping. Unlike "Swap", this
>  does not take into account swapped out page of underlying shmem objects.
>  "Locked" indicates whether the mapping is locked in memory or not.
> -"THPeligible" indicates whether the mapping is eligible for THP pages - 1 if
> -true, 0 otherwise.
> +"THPeligible" indicates whether the mapping is eligible for allocating THP
> +pages - 1 if true, 0 otherwise. It just shows the current status.
>  
>  "VmFlags" field deserves a separate description. This member represents the kernel
>  flags associated with the particular virtual memory area in two letter encoded
> diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
> index 01d4eb0..6a13882 100644
> --- a/fs/proc/task_mmu.c
> +++ b/fs/proc/task_mmu.c
> @@ -796,7 +796,8 @@ static int show_smap(struct seq_file *m, void *v)
>  
>  	__show_smap(m, &mss);
>  
> -	seq_printf(m, "THPeligible:    %d\n", transparent_hugepage_enabled(vma));
> +	seq_printf(m, "THPeligible:		%d\n",
> +		   transparent_hugepage_enabled(vma));
>  
>  	if (arch_pkeys_enabled())
>  		seq_printf(m, "ProtectionKey:  %8u\n", vma_pkey(vma));
> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> index 4bc2552..36f0225 100644
> --- a/mm/huge_memory.c
> +++ b/mm/huge_memory.c
> @@ -65,10 +65,15 @@
>  
>  bool transparent_hugepage_enabled(struct vm_area_struct *vma)
>  {
> +	/* The addr is used to check if the vma size fits */
> +	unsigned long addr = (vma->vm_end & HPAGE_PMD_MASK) - HPAGE_PMD_SIZE;
> +
> +	if (!transhuge_vma_suitable(vma, addr))
> +		return false;
>  	if (vma_is_anonymous(vma))
>  		return __transparent_hugepage_enabled(vma);
> -	if (vma_is_shmem(vma) && shmem_huge_enabled(vma))
> -		return __transparent_hugepage_enabled(vma);
> +	if (vma_is_shmem(vma))
> +		return shmem_huge_enabled(vma);
>  
>  	return false;
>  }
> diff --git a/mm/shmem.c b/mm/shmem.c
> index 1bb3b8d..a807712 100644
> --- a/mm/shmem.c
> +++ b/mm/shmem.c
> @@ -3872,6 +3872,9 @@ bool shmem_huge_enabled(struct vm_area_struct *vma)
>  	loff_t i_size;
>  	pgoff_t off;
>  
> +	if ((vma->vm_flags & VM_NOHUGEPAGE) ||
> +	    test_bit(MMF_DISABLE_THP, &vma->vm_mm->flags))
> +		return false;
>  	if (shmem_huge == SHMEM_HUGE_FORCE)
>  		return true;
>  	if (shmem_huge == SHMEM_HUGE_DENY)
> -- 
> 1.8.3.1
Andrew Morton July 18, 2019, 9:44 p.m. UTC | #4
On Wed, 19 Jun 2019 09:28:42 -0700 Yang Shi <yang.shi@linux.alibaba.com> wrote:

> > Sorry for replying rather late, and not in the v2 thread, but unlike
> > Hugh I'm not convinced that we should include vma size/alignment in the
> > test for reporting THPeligible, which was supposed to reflect
> > administrative settings and madvise hints. I guess it's mostly a matter
> > of personal feeling. But one objective distinction is that the admin
> > settings and madvise do have an exact binary result for the whole VMA,
> > while this check is more fuzzy - only part of the VMA's span might be
> > properly sized+aligned, and THPeligible will be 1 for the whole VMA.
> 
> I think THPeligible is used to tell us if the vma is suitable for 
> allocating THP. Both anonymous and shmem THP checks vma size/alignment 
> to decide to or not to allocate THP.
> 
> And, if vma size/alignment is not checked, THPeligible may show "true" 
> for even 4K mapping. This doesn't make too much sense either.

This discussion seems rather inconclusive.  I'll merge up the patchset
anyway.  Vlastimil, if you think some changes are needed here then
please let's get them sorted out over the next few weeks?
Vlastimil Babka July 18, 2019, 9:52 p.m. UTC | #5
On 7/18/19 11:44 PM, Andrew Morton wrote:
> On Wed, 19 Jun 2019 09:28:42 -0700 Yang Shi <yang.shi@linux.alibaba.com> wrote:
> 
>>> Sorry for replying rather late, and not in the v2 thread, but unlike
>>> Hugh I'm not convinced that we should include vma size/alignment in the
>>> test for reporting THPeligible, which was supposed to reflect
>>> administrative settings and madvise hints. I guess it's mostly a matter
>>> of personal feeling. But one objective distinction is that the admin
>>> settings and madvise do have an exact binary result for the whole VMA,
>>> while this check is more fuzzy - only part of the VMA's span might be
>>> properly sized+aligned, and THPeligible will be 1 for the whole VMA.
>>
>> I think THPeligible is used to tell us if the vma is suitable for 
>> allocating THP. Both anonymous and shmem THP checks vma size/alignment 
>> to decide to or not to allocate THP.
>>
>> And, if vma size/alignment is not checked, THPeligible may show "true" 
>> for even 4K mapping. This doesn't make too much sense either.
> 
> This discussion seems rather inconclusive.  I'll merge up the patchset
> anyway.  Vlastimil, if you think some changes are needed here then
> please let's get them sorted out over the next few weeks?

Well, Hugh did ack it, albeit without commenting on this part. I don't
feel strongly enough about this for a nack.
Hugh Dickins July 18, 2019, 10:06 p.m. UTC | #6
On Thu, 18 Jul 2019, Vlastimil Babka wrote:
> On 7/18/19 11:44 PM, Andrew Morton wrote:
> > On Wed, 19 Jun 2019 09:28:42 -0700 Yang Shi <yang.shi@linux.alibaba.com> wrote:
> > 
> >>> Sorry for replying rather late, and not in the v2 thread, but unlike
> >>> Hugh I'm not convinced that we should include vma size/alignment in the
> >>> test for reporting THPeligible, which was supposed to reflect
> >>> administrative settings and madvise hints. I guess it's mostly a matter
> >>> of personal feeling. But one objective distinction is that the admin
> >>> settings and madvise do have an exact binary result for the whole VMA,
> >>> while this check is more fuzzy - only part of the VMA's span might be
> >>> properly sized+aligned, and THPeligible will be 1 for the whole VMA.
> >>
> >> I think THPeligible is used to tell us if the vma is suitable for 
> >> allocating THP. Both anonymous and shmem THP checks vma size/alignment 
> >> to decide to or not to allocate THP.
> >>
> >> And, if vma size/alignment is not checked, THPeligible may show "true" 
> >> for even 4K mapping. This doesn't make too much sense either.
> > 
> > This discussion seems rather inconclusive.  I'll merge up the patchset
> > anyway.  Vlastimil, if you think some changes are needed here then
> > please let's get them sorted out over the next few weeks?
> 
> Well, Hugh did ack it, albeit without commenting on this part. I don't
> feel strongly enough about this for a nack.

Right, I had no further comment: Yang and I agreed one way round,
you thought the other way.  I was more persuaded by Yang's 4kB
example than by what you wrote; but not hugely excited either way.

Hugh
diff mbox series

Patch

diff --git a/Documentation/filesystems/proc.txt b/Documentation/filesystems/proc.txt
index 66cad5c..b0ded06 100644
--- a/Documentation/filesystems/proc.txt
+++ b/Documentation/filesystems/proc.txt
@@ -477,8 +477,8 @@  replaced by copy-on-write) part of the underlying shmem object out on swap.
 "SwapPss" shows proportional swap share of this mapping. Unlike "Swap", this
 does not take into account swapped out page of underlying shmem objects.
 "Locked" indicates whether the mapping is locked in memory or not.
-"THPeligible" indicates whether the mapping is eligible for THP pages - 1 if
-true, 0 otherwise.
+"THPeligible" indicates whether the mapping is eligible for allocating THP
+pages - 1 if true, 0 otherwise. It just shows the current status.
 
 "VmFlags" field deserves a separate description. This member represents the kernel
 flags associated with the particular virtual memory area in two letter encoded
diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
index 01d4eb0..6a13882 100644
--- a/fs/proc/task_mmu.c
+++ b/fs/proc/task_mmu.c
@@ -796,7 +796,8 @@  static int show_smap(struct seq_file *m, void *v)
 
 	__show_smap(m, &mss);
 
-	seq_printf(m, "THPeligible:    %d\n", transparent_hugepage_enabled(vma));
+	seq_printf(m, "THPeligible:		%d\n",
+		   transparent_hugepage_enabled(vma));
 
 	if (arch_pkeys_enabled())
 		seq_printf(m, "ProtectionKey:  %8u\n", vma_pkey(vma));
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 4bc2552..36f0225 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -65,10 +65,15 @@ 
 
 bool transparent_hugepage_enabled(struct vm_area_struct *vma)
 {
+	/* The addr is used to check if the vma size fits */
+	unsigned long addr = (vma->vm_end & HPAGE_PMD_MASK) - HPAGE_PMD_SIZE;
+
+	if (!transhuge_vma_suitable(vma, addr))
+		return false;
 	if (vma_is_anonymous(vma))
 		return __transparent_hugepage_enabled(vma);
-	if (vma_is_shmem(vma) && shmem_huge_enabled(vma))
-		return __transparent_hugepage_enabled(vma);
+	if (vma_is_shmem(vma))
+		return shmem_huge_enabled(vma);
 
 	return false;
 }
diff --git a/mm/shmem.c b/mm/shmem.c
index 1bb3b8d..a807712 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -3872,6 +3872,9 @@  bool shmem_huge_enabled(struct vm_area_struct *vma)
 	loff_t i_size;
 	pgoff_t off;
 
+	if ((vma->vm_flags & VM_NOHUGEPAGE) ||
+	    test_bit(MMF_DISABLE_THP, &vma->vm_mm->flags))
+		return false;
 	if (shmem_huge == SHMEM_HUGE_FORCE)
 		return true;
 	if (shmem_huge == SHMEM_HUGE_DENY)