Message ID | 1555971893-52276-1-git-send-email-yang.shi@linux.alibaba.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | mm: thp: fix false negative of shmem vma's THP eligibility | expand |
On Tue 23-04-19 06:24:53, Yang Shi wrote: > The commit 7635d9cbe832 ("mm, thp, proc: report THP eligibility for each > vma") introduced THPeligible bit for processes' smaps. But, when checking > the eligibility for shmem vma, __transparent_hugepage_enabled() is > called to override the result from shmem_huge_enabled(). It may result > in the anonymous vma's THP flag override shmem's. Hmm, I was under impression that thw global sysfs is not anonymous memory specific and it overrides whatever sysfs comes with. Isn't ignoring the global setting a bug in the shmemfs allocation paths? Kirill what is the actual semantic here? > For example, running a > simple test which create THP for shmem, but with anonymous THP disabled, > when reading the process's smaps, it may show: > > 7fc92ec00000-7fc92f000000 rw-s 00000000 00:14 27764 /dev/shm/test > Size: 4096 kB > ... > [snip] > ... > ShmemPmdMapped: 4096 kB > ... > [snip] > ... > THPeligible: 0 > > And, /proc/meminfo does show THP allocated and PMD mapped too: > > ShmemHugePages: 4096 kB > ShmemPmdMapped: 4096 kB > > This doesn't make too much sense. The anonymous THP flag should not > intervene shmem THP. Calling shmem_huge_enabled() with checking > MMF_DISABLE_THP sounds good enough. And, we could skip stack and > dax vma check since we already checked if the vma is shmem already. Even if I am wrong about the /sys/kernel/mm/transparent_hugepage/enabled being the global setting for _all_ THP then this patch is not sufficient because it doesn't reflect VM_NOHUGEPAGE. > > Fixes: 7635d9cbe832 ("mm, thp, proc: report THP eligibility for each vma") > Cc: Michal Hocko <mhocko@suse.com> > Cc: Vlastimil Babka <vbabka@suse.cz> > Cc: David Rientjes <rientjes@google.com> > Cc: Kirill A. Shutemov <kirill@shutemov.name> > Signed-off-by: Yang Shi <yang.shi@linux.alibaba.com> > --- > mm/huge_memory.c | 4 ++-- > mm/shmem.c | 2 ++ > 2 files changed, 4 insertions(+), 2 deletions(-) > > diff --git a/mm/huge_memory.c b/mm/huge_memory.c > index 165ea46..5881e82 100644 > --- a/mm/huge_memory.c > +++ b/mm/huge_memory.c > @@ -67,8 +67,8 @@ bool transparent_hugepage_enabled(struct vm_area_struct *vma) > { > if (vma_is_anonymous(vma)) > return __transparent_hugepage_enabled(vma); > - if (vma_is_shmem(vma) && shmem_huge_enabled(vma)) > - return __transparent_hugepage_enabled(vma); > + if (vma_is_shmem(vma)) > + return shmem_huge_enabled(vma); > > return false; > } > diff --git a/mm/shmem.c b/mm/shmem.c > index 2275a0f..be15e9b 100644 > --- a/mm/shmem.c > +++ b/mm/shmem.c > @@ -3873,6 +3873,8 @@ bool shmem_huge_enabled(struct vm_area_struct *vma) > loff_t i_size; > pgoff_t off; > > + if (test_bit(MMF_DISABLE_THP, &vma->vm_mm->flags)) > + return false; > if (shmem_huge == SHMEM_HUGE_FORCE) > return true; > if (shmem_huge == SHMEM_HUGE_DENY) > -- > 1.8.3.1 >
On 4/22/19 11:50 PM, Michal Hocko wrote: > On Tue 23-04-19 06:24:53, Yang Shi wrote: >> The commit 7635d9cbe832 ("mm, thp, proc: report THP eligibility for each >> vma") introduced THPeligible bit for processes' smaps. But, when checking >> the eligibility for shmem vma, __transparent_hugepage_enabled() is >> called to override the result from shmem_huge_enabled(). It may result >> in the anonymous vma's THP flag override shmem's. > Hmm, I was under impression that thw global sysfs is not anonymous > memory specific and it overrides whatever sysfs comes with. Isn't > ignoring the global setting a bug in the shmemfs allocation paths? > Kirill what is the actual semantic here? I tried 4.9, 4.14, 4.20 and 5.1-rc5, all behaves in the same way. So, I'm supposed "enabled" is for anonymous THP only. > >> For example, running a >> simple test which create THP for shmem, but with anonymous THP disabled, >> when reading the process's smaps, it may show: >> >> 7fc92ec00000-7fc92f000000 rw-s 00000000 00:14 27764 /dev/shm/test >> Size: 4096 kB >> ... >> [snip] >> ... >> ShmemPmdMapped: 4096 kB >> ... >> [snip] >> ... >> THPeligible: 0 >> >> And, /proc/meminfo does show THP allocated and PMD mapped too: >> >> ShmemHugePages: 4096 kB >> ShmemPmdMapped: 4096 kB >> >> This doesn't make too much sense. The anonymous THP flag should not >> intervene shmem THP. Calling shmem_huge_enabled() with checking >> MMF_DISABLE_THP sounds good enough. And, we could skip stack and >> dax vma check since we already checked if the vma is shmem already. > Even if I am wrong about the /sys/kernel/mm/transparent_hugepage/enabled > being the global setting for _all_ THP then this patch is not sufficient > because it doesn't reflect VM_NOHUGEPAGE. Aha, yes, thanks for catching this. Will fix in v2. >> Fixes: 7635d9cbe832 ("mm, thp, proc: report THP eligibility for each vma") >> Cc: Michal Hocko <mhocko@suse.com> >> Cc: Vlastimil Babka <vbabka@suse.cz> >> Cc: David Rientjes <rientjes@google.com> >> Cc: Kirill A. Shutemov <kirill@shutemov.name> >> Signed-off-by: Yang Shi <yang.shi@linux.alibaba.com> >> --- >> mm/huge_memory.c | 4 ++-- >> mm/shmem.c | 2 ++ >> 2 files changed, 4 insertions(+), 2 deletions(-) >> >> diff --git a/mm/huge_memory.c b/mm/huge_memory.c >> index 165ea46..5881e82 100644 >> --- a/mm/huge_memory.c >> +++ b/mm/huge_memory.c >> @@ -67,8 +67,8 @@ bool transparent_hugepage_enabled(struct vm_area_struct *vma) >> { >> if (vma_is_anonymous(vma)) >> return __transparent_hugepage_enabled(vma); >> - if (vma_is_shmem(vma) && shmem_huge_enabled(vma)) >> - return __transparent_hugepage_enabled(vma); >> + if (vma_is_shmem(vma)) >> + return shmem_huge_enabled(vma); >> >> return false; >> } >> diff --git a/mm/shmem.c b/mm/shmem.c >> index 2275a0f..be15e9b 100644 >> --- a/mm/shmem.c >> +++ b/mm/shmem.c >> @@ -3873,6 +3873,8 @@ bool shmem_huge_enabled(struct vm_area_struct *vma) >> loff_t i_size; >> pgoff_t off; >> >> + if (test_bit(MMF_DISABLE_THP, &vma->vm_mm->flags)) >> + return false; >> if (shmem_huge == SHMEM_HUGE_FORCE) >> return true; >> if (shmem_huge == SHMEM_HUGE_DENY) >> -- >> 1.8.3.1 >>
diff --git a/mm/huge_memory.c b/mm/huge_memory.c index 165ea46..5881e82 100644 --- a/mm/huge_memory.c +++ b/mm/huge_memory.c @@ -67,8 +67,8 @@ bool transparent_hugepage_enabled(struct vm_area_struct *vma) { if (vma_is_anonymous(vma)) return __transparent_hugepage_enabled(vma); - if (vma_is_shmem(vma) && shmem_huge_enabled(vma)) - return __transparent_hugepage_enabled(vma); + if (vma_is_shmem(vma)) + return shmem_huge_enabled(vma); return false; } diff --git a/mm/shmem.c b/mm/shmem.c index 2275a0f..be15e9b 100644 --- a/mm/shmem.c +++ b/mm/shmem.c @@ -3873,6 +3873,8 @@ bool shmem_huge_enabled(struct vm_area_struct *vma) loff_t i_size; pgoff_t off; + if (test_bit(MMF_DISABLE_THP, &vma->vm_mm->flags)) + return false; if (shmem_huge == SHMEM_HUGE_FORCE) return true; if (shmem_huge == SHMEM_HUGE_DENY)
The commit 7635d9cbe832 ("mm, thp, proc: report THP eligibility for each vma") introduced THPeligible bit for processes' smaps. But, when checking the eligibility for shmem vma, __transparent_hugepage_enabled() is called to override the result from shmem_huge_enabled(). It may result in the anonymous vma's THP flag override shmem's. For example, running a simple test which create THP for shmem, but with anonymous THP disabled, when reading the process's smaps, it may show: 7fc92ec00000-7fc92f000000 rw-s 00000000 00:14 27764 /dev/shm/test Size: 4096 kB ... [snip] ... ShmemPmdMapped: 4096 kB ... [snip] ... THPeligible: 0 And, /proc/meminfo does show THP allocated and PMD mapped too: ShmemHugePages: 4096 kB ShmemPmdMapped: 4096 kB This doesn't make too much sense. The anonymous THP flag should not intervene shmem THP. Calling shmem_huge_enabled() with checking MMF_DISABLE_THP sounds good enough. And, we could skip stack and dax vma check since we already checked if the vma is shmem already. Fixes: 7635d9cbe832 ("mm, thp, proc: report THP eligibility for each vma") Cc: Michal Hocko <mhocko@suse.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: David Rientjes <rientjes@google.com> Cc: Kirill A. Shutemov <kirill@shutemov.name> Signed-off-by: Yang Shi <yang.shi@linux.alibaba.com> --- mm/huge_memory.c | 4 ++-- mm/shmem.c | 2 ++ 2 files changed, 4 insertions(+), 2 deletions(-)