Message ID | 20220606214414.736109-5-shy828301@gmail.com (mailing list archive) |
---|---|
State | New |
Headers | show |
Series | Cleanup transhuge_xxx helpers | expand |
On Mon, Jun 6, 2022 at 2:44 PM Yang Shi <shy828301@gmail.com> wrote: > > The hugepage_vma_revalidate() needs to check if the address is still in > the aligned HPAGE_PMD_SIZE area of the vma when reacquiring mmap_lock, > but it was open-coded, use transhuge_vma_suitable() to do the job. And > add proper comments for transhuge_vma_suitable(). > > Signed-off-by: Yang Shi <shy828301@gmail.com> > --- > include/linux/huge_mm.h | 6 ++++++ > mm/khugepaged.c | 5 +---- > 2 files changed, 7 insertions(+), 4 deletions(-) > > diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h > index a8f61db47f2a..79d5919beb83 100644 > --- a/include/linux/huge_mm.h > +++ b/include/linux/huge_mm.h > @@ -128,6 +128,12 @@ static inline bool transhuge_vma_size_ok(struct vm_area_struct *vma) > return false; > } > > +/* > + * Do the below checks: > + * - For non-anon vma, check if the vm_pgoff is HPAGE_PMD_NR aligned. > + * - For all vmas, check if the haddr is in an aligned HPAGE_PMD_SIZE > + * area. > + */ AFAIK we aren't checking if vm_pgoff is HPAGE_PMD_NR aligned, but rather that linear_page_index(vma, round_up(vma->vm_start, HPAGE_PMD_SIZE)) is HPAGE_PMD_NR aligned within vma->vm_file. I was pretty confused about this (hopefully I have it right now - if not - case and point :) ), so it might be a good opportunity to add some extra commentary to help future travelers understand why this constraint exists. Also I wonder while we're at it if we can rename this to transhuge_addr_aligned() or transhuge_addr_suitable() or something. Otherwise I think the change is a nice cleanup. > static inline bool transhuge_vma_suitable(struct vm_area_struct *vma, > unsigned long addr) > { > diff --git a/mm/khugepaged.c b/mm/khugepaged.c > index 7a5d1c1a1833..ca1754d3a827 100644 > --- a/mm/khugepaged.c > +++ b/mm/khugepaged.c > @@ -951,7 +951,6 @@ static int hugepage_vma_revalidate(struct mm_struct *mm, unsigned long address, > struct vm_area_struct **vmap) > { > struct vm_area_struct *vma; > - unsigned long hstart, hend; > > if (unlikely(khugepaged_test_exit(mm))) > return SCAN_ANY_PROCESS; > @@ -960,9 +959,7 @@ static int hugepage_vma_revalidate(struct mm_struct *mm, unsigned long address, > if (!vma) > return SCAN_VMA_NULL; > > - hstart = (vma->vm_start + ~HPAGE_PMD_MASK) & HPAGE_PMD_MASK; > - hend = vma->vm_end & HPAGE_PMD_MASK; > - if (address < hstart || address + HPAGE_PMD_SIZE > hend) > + if (!transhuge_vma_suitable(vma, address)) > return SCAN_ADDRESS_RANGE; > if (!hugepage_vma_check(vma, vma->vm_flags)) > return SCAN_VMA_CHECK; > -- > 2.26.3 > >
On Thu, Jun 9, 2022 at 6:52 PM Zach O'Keefe <zokeefe@google.com> wrote: > > On Mon, Jun 6, 2022 at 2:44 PM Yang Shi <shy828301@gmail.com> wrote: > > > > The hugepage_vma_revalidate() needs to check if the address is still in > > the aligned HPAGE_PMD_SIZE area of the vma when reacquiring mmap_lock, > > but it was open-coded, use transhuge_vma_suitable() to do the job. And > > add proper comments for transhuge_vma_suitable(). > > > > Signed-off-by: Yang Shi <shy828301@gmail.com> > > --- > > include/linux/huge_mm.h | 6 ++++++ > > mm/khugepaged.c | 5 +---- > > 2 files changed, 7 insertions(+), 4 deletions(-) > > > > diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h > > index a8f61db47f2a..79d5919beb83 100644 > > --- a/include/linux/huge_mm.h > > +++ b/include/linux/huge_mm.h > > @@ -128,6 +128,12 @@ static inline bool transhuge_vma_size_ok(struct vm_area_struct *vma) > > return false; > > } > > > > +/* > > + * Do the below checks: > > + * - For non-anon vma, check if the vm_pgoff is HPAGE_PMD_NR aligned. > > + * - For all vmas, check if the haddr is in an aligned HPAGE_PMD_SIZE > > + * area. > > + */ > > AFAIK we aren't checking if vm_pgoff is HPAGE_PMD_NR aligned, but > rather that linear_page_index(vma, round_up(vma->vm_start, > HPAGE_PMD_SIZE)) is HPAGE_PMD_NR aligned within vma->vm_file. I was Yeah, you are right. > pretty confused about this (hopefully I have it right now - if not - > case and point :) ), so it might be a good opportunity to add some > extra commentary to help future travelers understand why this > constraint exists. I'm not fully sure I understand this 100%. I think this is related to how page cache is structured. I will try to add more comments. > > Also I wonder while we're at it if we can rename this to > transhuge_addr_aligned() or transhuge_addr_suitable() or something. I think it is still actually used to check vma. > > Otherwise I think the change is a nice cleanup. > > > static inline bool transhuge_vma_suitable(struct vm_area_struct *vma, > > unsigned long addr) > > { > > diff --git a/mm/khugepaged.c b/mm/khugepaged.c > > index 7a5d1c1a1833..ca1754d3a827 100644 > > --- a/mm/khugepaged.c > > +++ b/mm/khugepaged.c > > @@ -951,7 +951,6 @@ static int hugepage_vma_revalidate(struct mm_struct *mm, unsigned long address, > > struct vm_area_struct **vmap) > > { > > struct vm_area_struct *vma; > > - unsigned long hstart, hend; > > > > if (unlikely(khugepaged_test_exit(mm))) > > return SCAN_ANY_PROCESS; > > @@ -960,9 +959,7 @@ static int hugepage_vma_revalidate(struct mm_struct *mm, unsigned long address, > > if (!vma) > > return SCAN_VMA_NULL; > > > > - hstart = (vma->vm_start + ~HPAGE_PMD_MASK) & HPAGE_PMD_MASK; > > - hend = vma->vm_end & HPAGE_PMD_MASK; > > - if (address < hstart || address + HPAGE_PMD_SIZE > hend) > > + if (!transhuge_vma_suitable(vma, address)) > > return SCAN_ADDRESS_RANGE; > > if (!hugepage_vma_check(vma, vma->vm_flags)) > > return SCAN_VMA_CHECK; > > -- > > 2.26.3 > > > >
On Fri, Jun 10, 2022 at 9:59 AM Yang Shi <shy828301@gmail.com> wrote: > > On Thu, Jun 9, 2022 at 6:52 PM Zach O'Keefe <zokeefe@google.com> wrote: > > > > On Mon, Jun 6, 2022 at 2:44 PM Yang Shi <shy828301@gmail.com> wrote: > > > > > > The hugepage_vma_revalidate() needs to check if the address is still in > > > the aligned HPAGE_PMD_SIZE area of the vma when reacquiring mmap_lock, > > > but it was open-coded, use transhuge_vma_suitable() to do the job. And > > > add proper comments for transhuge_vma_suitable(). > > > > > > Signed-off-by: Yang Shi <shy828301@gmail.com> > > > --- > > > include/linux/huge_mm.h | 6 ++++++ > > > mm/khugepaged.c | 5 +---- > > > 2 files changed, 7 insertions(+), 4 deletions(-) > > > > > > diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h > > > index a8f61db47f2a..79d5919beb83 100644 > > > --- a/include/linux/huge_mm.h > > > +++ b/include/linux/huge_mm.h > > > @@ -128,6 +128,12 @@ static inline bool transhuge_vma_size_ok(struct vm_area_struct *vma) > > > return false; > > > } > > > > > > +/* > > > + * Do the below checks: > > > + * - For non-anon vma, check if the vm_pgoff is HPAGE_PMD_NR aligned. > > > + * - For all vmas, check if the haddr is in an aligned HPAGE_PMD_SIZE > > > + * area. > > > + */ > > > > AFAIK we aren't checking if vm_pgoff is HPAGE_PMD_NR aligned, but > > rather that linear_page_index(vma, round_up(vma->vm_start, > > HPAGE_PMD_SIZE)) is HPAGE_PMD_NR aligned within vma->vm_file. I was > > Yeah, you are right. > > > pretty confused about this (hopefully I have it right now - if not - > > case and point :) ), so it might be a good opportunity to add some > > extra commentary to help future travelers understand why this > > constraint exists. > > I'm not fully sure I understand this 100%. I think this is related to > how page cache is structured. I will try to add more comments. How's about "The underlying THP is always properly aligned in page cache, but it may be across the boundary of VMA if the VMA is misaligned, so the THP can't be PMD mapped for this case." > > > > > Also I wonder while we're at it if we can rename this to > > transhuge_addr_aligned() or transhuge_addr_suitable() or something. > > I think it is still actually used to check vma. > > > > > Otherwise I think the change is a nice cleanup. > > > > > static inline bool transhuge_vma_suitable(struct vm_area_struct *vma, > > > unsigned long addr) > > > { > > > diff --git a/mm/khugepaged.c b/mm/khugepaged.c > > > index 7a5d1c1a1833..ca1754d3a827 100644 > > > --- a/mm/khugepaged.c > > > +++ b/mm/khugepaged.c > > > @@ -951,7 +951,6 @@ static int hugepage_vma_revalidate(struct mm_struct *mm, unsigned long address, > > > struct vm_area_struct **vmap) > > > { > > > struct vm_area_struct *vma; > > > - unsigned long hstart, hend; > > > > > > if (unlikely(khugepaged_test_exit(mm))) > > > return SCAN_ANY_PROCESS; > > > @@ -960,9 +959,7 @@ static int hugepage_vma_revalidate(struct mm_struct *mm, unsigned long address, > > > if (!vma) > > > return SCAN_VMA_NULL; > > > > > > - hstart = (vma->vm_start + ~HPAGE_PMD_MASK) & HPAGE_PMD_MASK; > > > - hend = vma->vm_end & HPAGE_PMD_MASK; > > > - if (address < hstart || address + HPAGE_PMD_SIZE > hend) > > > + if (!transhuge_vma_suitable(vma, address)) > > > return SCAN_ADDRESS_RANGE; > > > if (!hugepage_vma_check(vma, vma->vm_flags)) > > > return SCAN_VMA_CHECK; > > > -- > > > 2.26.3 > > > > > >
On Fri, Jun 10, 2022 at 3:04 PM Yang Shi <shy828301@gmail.com> wrote: > > On Fri, Jun 10, 2022 at 9:59 AM Yang Shi <shy828301@gmail.com> wrote: > > > > On Thu, Jun 9, 2022 at 6:52 PM Zach O'Keefe <zokeefe@google.com> wrote: > > > > > > On Mon, Jun 6, 2022 at 2:44 PM Yang Shi <shy828301@gmail.com> wrote: > > > > > > > > The hugepage_vma_revalidate() needs to check if the address is still in > > > > the aligned HPAGE_PMD_SIZE area of the vma when reacquiring mmap_lock, > > > > but it was open-coded, use transhuge_vma_suitable() to do the job. And > > > > add proper comments for transhuge_vma_suitable(). > > > > > > > > Signed-off-by: Yang Shi <shy828301@gmail.com> > > > > --- > > > > include/linux/huge_mm.h | 6 ++++++ > > > > mm/khugepaged.c | 5 +---- > > > > 2 files changed, 7 insertions(+), 4 deletions(-) > > > > > > > > diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h > > > > index a8f61db47f2a..79d5919beb83 100644 > > > > --- a/include/linux/huge_mm.h > > > > +++ b/include/linux/huge_mm.h > > > > @@ -128,6 +128,12 @@ static inline bool transhuge_vma_size_ok(struct vm_area_struct *vma) > > > > return false; > > > > } > > > > > > > > +/* > > > > + * Do the below checks: > > > > + * - For non-anon vma, check if the vm_pgoff is HPAGE_PMD_NR aligned. > > > > + * - For all vmas, check if the haddr is in an aligned HPAGE_PMD_SIZE > > > > + * area. > > > > + */ > > > > > > AFAIK we aren't checking if vm_pgoff is HPAGE_PMD_NR aligned, but > > > rather that linear_page_index(vma, round_up(vma->vm_start, > > > HPAGE_PMD_SIZE)) is HPAGE_PMD_NR aligned within vma->vm_file. I was > > > > Yeah, you are right. > > > > > pretty confused about this (hopefully I have it right now - if not - > > > case and point :) ), so it might be a good opportunity to add some > > > extra commentary to help future travelers understand why this > > > constraint exists. > > > > I'm not fully sure I understand this 100%. I think this is related to > > how page cache is structured. I will try to add more comments. > > How's about "The underlying THP is always properly aligned in page > cache, but it may be across the boundary of VMA if the VMA is > misaligned, so the THP can't be PMD mapped for this case." I could certainly still be wrong / am learning here - but I *thought* the reason for this check was to make sure that the hugepage to-be-collapsed is naturally aligned within the file (since, AFAIK, without this constraint, different mm's might have different ideas about where hugepages in the file should be). > > > > > > > > Also I wonder while we're at it if we can rename this to > > > transhuge_addr_aligned() or transhuge_addr_suitable() or something. > > > > I think it is still actually used to check vma. > > > > > > > > Otherwise I think the change is a nice cleanup. > > > > > > > static inline bool transhuge_vma_suitable(struct vm_area_struct *vma, > > > > unsigned long addr) > > > > { > > > > diff --git a/mm/khugepaged.c b/mm/khugepaged.c > > > > index 7a5d1c1a1833..ca1754d3a827 100644 > > > > --- a/mm/khugepaged.c > > > > +++ b/mm/khugepaged.c > > > > @@ -951,7 +951,6 @@ static int hugepage_vma_revalidate(struct mm_struct *mm, unsigned long address, > > > > struct vm_area_struct **vmap) > > > > { > > > > struct vm_area_struct *vma; > > > > - unsigned long hstart, hend; > > > > > > > > if (unlikely(khugepaged_test_exit(mm))) > > > > return SCAN_ANY_PROCESS; > > > > @@ -960,9 +959,7 @@ static int hugepage_vma_revalidate(struct mm_struct *mm, unsigned long address, > > > > if (!vma) > > > > return SCAN_VMA_NULL; > > > > > > > > - hstart = (vma->vm_start + ~HPAGE_PMD_MASK) & HPAGE_PMD_MASK; > > > > - hend = vma->vm_end & HPAGE_PMD_MASK; > > > > - if (address < hstart || address + HPAGE_PMD_SIZE > hend) > > > > + if (!transhuge_vma_suitable(vma, address)) > > > > return SCAN_ADDRESS_RANGE; > > > > if (!hugepage_vma_check(vma, vma->vm_flags)) > > > > return SCAN_VMA_CHECK; > > > > -- > > > > 2.26.3 > > > > > > > >
On Fri, Jun 10, 2022 at 5:28 PM Zach O'Keefe <zokeefe@google.com> wrote: > > On Fri, Jun 10, 2022 at 3:04 PM Yang Shi <shy828301@gmail.com> wrote: > > > > On Fri, Jun 10, 2022 at 9:59 AM Yang Shi <shy828301@gmail.com> wrote: > > > > > > On Thu, Jun 9, 2022 at 6:52 PM Zach O'Keefe <zokeefe@google.com> wrote: > > > > > > > > On Mon, Jun 6, 2022 at 2:44 PM Yang Shi <shy828301@gmail.com> wrote: > > > > > > > > > > The hugepage_vma_revalidate() needs to check if the address is still in > > > > > the aligned HPAGE_PMD_SIZE area of the vma when reacquiring mmap_lock, > > > > > but it was open-coded, use transhuge_vma_suitable() to do the job. And > > > > > add proper comments for transhuge_vma_suitable(). > > > > > > > > > > Signed-off-by: Yang Shi <shy828301@gmail.com> > > > > > --- > > > > > include/linux/huge_mm.h | 6 ++++++ > > > > > mm/khugepaged.c | 5 +---- > > > > > 2 files changed, 7 insertions(+), 4 deletions(-) > > > > > > > > > > diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h > > > > > index a8f61db47f2a..79d5919beb83 100644 > > > > > --- a/include/linux/huge_mm.h > > > > > +++ b/include/linux/huge_mm.h > > > > > @@ -128,6 +128,12 @@ static inline bool transhuge_vma_size_ok(struct vm_area_struct *vma) > > > > > return false; > > > > > } > > > > > > > > > > +/* > > > > > + * Do the below checks: > > > > > + * - For non-anon vma, check if the vm_pgoff is HPAGE_PMD_NR aligned. > > > > > + * - For all vmas, check if the haddr is in an aligned HPAGE_PMD_SIZE > > > > > + * area. > > > > > + */ > > > > > > > > AFAIK we aren't checking if vm_pgoff is HPAGE_PMD_NR aligned, but > > > > rather that linear_page_index(vma, round_up(vma->vm_start, > > > > HPAGE_PMD_SIZE)) is HPAGE_PMD_NR aligned within vma->vm_file. I was > > > > > > Yeah, you are right. > > > > > > > pretty confused about this (hopefully I have it right now - if not - > > > > case and point :) ), so it might be a good opportunity to add some > > > > extra commentary to help future travelers understand why this > > > > constraint exists. > > > > > > I'm not fully sure I understand this 100%. I think this is related to > > > how page cache is structured. I will try to add more comments. > > > > How's about "The underlying THP is always properly aligned in page > > cache, but it may be across the boundary of VMA if the VMA is > > misaligned, so the THP can't be PMD mapped for this case." > > I could certainly still be wrong / am learning here - but I *thought* > the reason for this check was to make sure that the hugepage > to-be-collapsed is naturally aligned within the file (since, AFAIK, > without this constraint, different mm's might have different ideas > about where hugepages in the file should be). The hugepage is definitely naturally aligned within the file, this is guaranteed by how page cache is organized, you could find some example code from shmem fault, for example, the below code snippet: hindex = round_down(index, folio_nr_pages(folio)); error = shmem_add_to_page_cache(folio, mapping, hindex, NULL, gfp & GFP_RECLAIM_MASK, charge_mm); The index is actually rounded down to HPAGE_PMD_NR aligned. The check in hugepage_vma_check() is used to guarantee there is an PMD aligned area in the vma exactly overlapping with a PMD range in the page cache. For example, you have a vma starting from 0x1000 maps to the file's page offset of 0, even though you get THP for the file, it can not be PMD mapped to the vma. But if it maps to the file's page offset of 1, then starting from 0x200000 (assuming the vma is big enough) it can PMD map the second THP in the page cache. Does it make sense? > > > > > > > > > > > > Also I wonder while we're at it if we can rename this to > > > > transhuge_addr_aligned() or transhuge_addr_suitable() or something. > > > > > > I think it is still actually used to check vma. > > > > > > > > > > > Otherwise I think the change is a nice cleanup. > > > > > > > > > static inline bool transhuge_vma_suitable(struct vm_area_struct *vma, > > > > > unsigned long addr) > > > > > { > > > > > diff --git a/mm/khugepaged.c b/mm/khugepaged.c > > > > > index 7a5d1c1a1833..ca1754d3a827 100644 > > > > > --- a/mm/khugepaged.c > > > > > +++ b/mm/khugepaged.c > > > > > @@ -951,7 +951,6 @@ static int hugepage_vma_revalidate(struct mm_struct *mm, unsigned long address, > > > > > struct vm_area_struct **vmap) > > > > > { > > > > > struct vm_area_struct *vma; > > > > > - unsigned long hstart, hend; > > > > > > > > > > if (unlikely(khugepaged_test_exit(mm))) > > > > > return SCAN_ANY_PROCESS; > > > > > @@ -960,9 +959,7 @@ static int hugepage_vma_revalidate(struct mm_struct *mm, unsigned long address, > > > > > if (!vma) > > > > > return SCAN_VMA_NULL; > > > > > > > > > > - hstart = (vma->vm_start + ~HPAGE_PMD_MASK) & HPAGE_PMD_MASK; > > > > > - hend = vma->vm_end & HPAGE_PMD_MASK; > > > > > - if (address < hstart || address + HPAGE_PMD_SIZE > hend) > > > > > + if (!transhuge_vma_suitable(vma, address)) > > > > > return SCAN_ADDRESS_RANGE; > > > > > if (!hugepage_vma_check(vma, vma->vm_flags)) > > > > > return SCAN_VMA_CHECK; > > > > > -- > > > > > 2.26.3 > > > > > > > > > >
On 10 Jun 20:25, Yang Shi wrote: > On Fri, Jun 10, 2022 at 5:28 PM Zach O'Keefe <zokeefe@google.com> wrote: > > > > On Fri, Jun 10, 2022 at 3:04 PM Yang Shi <shy828301@gmail.com> wrote: > > > > > > On Fri, Jun 10, 2022 at 9:59 AM Yang Shi <shy828301@gmail.com> wrote: > > > > > > > > On Thu, Jun 9, 2022 at 6:52 PM Zach O'Keefe <zokeefe@google.com> wrote: > > > > > > > > > > On Mon, Jun 6, 2022 at 2:44 PM Yang Shi <shy828301@gmail.com> wrote: > > > > > > > > > > > > The hugepage_vma_revalidate() needs to check if the address is still in > > > > > > the aligned HPAGE_PMD_SIZE area of the vma when reacquiring mmap_lock, > > > > > > but it was open-coded, use transhuge_vma_suitable() to do the job. And > > > > > > add proper comments for transhuge_vma_suitable(). > > > > > > > > > > > > Signed-off-by: Yang Shi <shy828301@gmail.com> > > > > > > --- > > > > > > include/linux/huge_mm.h | 6 ++++++ > > > > > > mm/khugepaged.c | 5 +---- > > > > > > 2 files changed, 7 insertions(+), 4 deletions(-) > > > > > > > > > > > > diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h > > > > > > index a8f61db47f2a..79d5919beb83 100644 > > > > > > --- a/include/linux/huge_mm.h > > > > > > +++ b/include/linux/huge_mm.h > > > > > > @@ -128,6 +128,12 @@ static inline bool transhuge_vma_size_ok(struct vm_area_struct *vma) > > > > > > return false; > > > > > > } > > > > > > > > > > > > +/* > > > > > > + * Do the below checks: > > > > > > + * - For non-anon vma, check if the vm_pgoff is HPAGE_PMD_NR aligned. > > > > > > + * - For all vmas, check if the haddr is in an aligned HPAGE_PMD_SIZE > > > > > > + * area. > > > > > > + */ > > > > > > > > > > AFAIK we aren't checking if vm_pgoff is HPAGE_PMD_NR aligned, but > > > > > rather that linear_page_index(vma, round_up(vma->vm_start, > > > > > HPAGE_PMD_SIZE)) is HPAGE_PMD_NR aligned within vma->vm_file. I was > > > > > > > > Yeah, you are right. > > > > > > > > > pretty confused about this (hopefully I have it right now - if not - > > > > > case and point :) ), so it might be a good opportunity to add some > > > > > extra commentary to help future travelers understand why this > > > > > constraint exists. > > > > > > > > I'm not fully sure I understand this 100%. I think this is related to > > > > how page cache is structured. I will try to add more comments. > > > > > > How's about "The underlying THP is always properly aligned in page > > > cache, but it may be across the boundary of VMA if the VMA is > > > misaligned, so the THP can't be PMD mapped for this case." > > > > I could certainly still be wrong / am learning here - but I *thought* > > the reason for this check was to make sure that the hugepage > > to-be-collapsed is naturally aligned within the file (since, AFAIK, > > without this constraint, different mm's might have different ideas > > about where hugepages in the file should be). > > The hugepage is definitely naturally aligned within the file, this is > guaranteed by how page cache is organized, you could find some example > code from shmem fault, for example, the below code snippet: > > hindex = round_down(index, folio_nr_pages(folio)); > error = shmem_add_to_page_cache(folio, mapping, hindex, NULL, gfp & > GFP_RECLAIM_MASK, charge_mm); > > The index is actually rounded down to HPAGE_PMD_NR aligned. Thanks for the reference here. > The check in hugepage_vma_check() is used to guarantee there is an PMD > aligned area in the vma exactly overlapping with a PMD range in the > page cache. For example, you have a vma starting from 0x1000 maps to > the file's page offset of 0, even though you get THP for the file, it > can not be PMD mapped to the vma. But if it maps to the file's page > offset of 1, then starting from 0x200000 (assuming the vma is big > enough) it can PMD map the second THP in the page cache. Does it make > sense? > Yes, this makes sense - thanks for providing your insight. I think I was basically thinking the same thing ; except your description is more accurate (namely, that is *some* pmd-aligned range covered by the vma that maps to a hugepage-aligned offset in the file (I mistakenly took this to be the *first* pmd-aligned address >= vma->vm_start)). Also, with this in mind, your previous suggested comment makes sense. If I had to take a stab at it, I would say something like: "The hugepage is guaranteed to be hugepage-aligned within the file, but we must check that the PMD-aligned addresses in the VMA map to PMD-aligned offsets within the file, else the hugepage will not be PMD-mappable". WDYT? > > > > > > > > > > > > > > > > Also I wonder while we're at it if we can rename this to > > > > > transhuge_addr_aligned() or transhuge_addr_suitable() or something. > > > > > > > > I think it is still actually used to check vma. > > > > > > > > > > > > > > Otherwise I think the change is a nice cleanup. > > > > > > > > > > > static inline bool transhuge_vma_suitable(struct vm_area_struct *vma, > > > > > > unsigned long addr) > > > > > > { > > > > > > diff --git a/mm/khugepaged.c b/mm/khugepaged.c > > > > > > index 7a5d1c1a1833..ca1754d3a827 100644 > > > > > > --- a/mm/khugepaged.c > > > > > > +++ b/mm/khugepaged.c > > > > > > @@ -951,7 +951,6 @@ static int hugepage_vma_revalidate(struct mm_struct *mm, unsigned long address, > > > > > > struct vm_area_struct **vmap) > > > > > > { > > > > > > struct vm_area_struct *vma; > > > > > > - unsigned long hstart, hend; > > > > > > > > > > > > if (unlikely(khugepaged_test_exit(mm))) > > > > > > return SCAN_ANY_PROCESS; > > > > > > @@ -960,9 +959,7 @@ static int hugepage_vma_revalidate(struct mm_struct *mm, unsigned long address, > > > > > > if (!vma) > > > > > > return SCAN_VMA_NULL; > > > > > > > > > > > > - hstart = (vma->vm_start + ~HPAGE_PMD_MASK) & HPAGE_PMD_MASK; > > > > > > - hend = vma->vm_end & HPAGE_PMD_MASK; > > > > > > - if (address < hstart || address + HPAGE_PMD_SIZE > hend) > > > > > > + if (!transhuge_vma_suitable(vma, address)) > > > > > > return SCAN_ADDRESS_RANGE; > > > > > > if (!hugepage_vma_check(vma, vma->vm_flags)) > > > > > > return SCAN_VMA_CHECK; > > > > > > -- > > > > > > 2.26.3 > > > > > > > > > > > >
On Sat, Jun 11, 2022 at 2:43 PM Zach O'Keefe <zokeefe@google.com> wrote: > > On 10 Jun 20:25, Yang Shi wrote: > > On Fri, Jun 10, 2022 at 5:28 PM Zach O'Keefe <zokeefe@google.com> wrote: > > > > > > On Fri, Jun 10, 2022 at 3:04 PM Yang Shi <shy828301@gmail.com> wrote: > > > > > > > > On Fri, Jun 10, 2022 at 9:59 AM Yang Shi <shy828301@gmail.com> wrote: > > > > > > > > > > On Thu, Jun 9, 2022 at 6:52 PM Zach O'Keefe <zokeefe@google.com> wrote: > > > > > > > > > > > > On Mon, Jun 6, 2022 at 2:44 PM Yang Shi <shy828301@gmail.com> wrote: > > > > > > > > > > > > > > The hugepage_vma_revalidate() needs to check if the address is still in > > > > > > > the aligned HPAGE_PMD_SIZE area of the vma when reacquiring mmap_lock, > > > > > > > but it was open-coded, use transhuge_vma_suitable() to do the job. And > > > > > > > add proper comments for transhuge_vma_suitable(). > > > > > > > > > > > > > > Signed-off-by: Yang Shi <shy828301@gmail.com> > > > > > > > --- > > > > > > > include/linux/huge_mm.h | 6 ++++++ > > > > > > > mm/khugepaged.c | 5 +---- > > > > > > > 2 files changed, 7 insertions(+), 4 deletions(-) > > > > > > > > > > > > > > diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h > > > > > > > index a8f61db47f2a..79d5919beb83 100644 > > > > > > > --- a/include/linux/huge_mm.h > > > > > > > +++ b/include/linux/huge_mm.h > > > > > > > @@ -128,6 +128,12 @@ static inline bool transhuge_vma_size_ok(struct vm_area_struct *vma) > > > > > > > return false; > > > > > > > } > > > > > > > > > > > > > > +/* > > > > > > > + * Do the below checks: > > > > > > > + * - For non-anon vma, check if the vm_pgoff is HPAGE_PMD_NR aligned. > > > > > > > + * - For all vmas, check if the haddr is in an aligned HPAGE_PMD_SIZE > > > > > > > + * area. > > > > > > > + */ > > > > > > > > > > > > AFAIK we aren't checking if vm_pgoff is HPAGE_PMD_NR aligned, but > > > > > > rather that linear_page_index(vma, round_up(vma->vm_start, > > > > > > HPAGE_PMD_SIZE)) is HPAGE_PMD_NR aligned within vma->vm_file. I was > > > > > > > > > > Yeah, you are right. > > > > > > > > > > > pretty confused about this (hopefully I have it right now - if not - > > > > > > case and point :) ), so it might be a good opportunity to add some > > > > > > extra commentary to help future travelers understand why this > > > > > > constraint exists. > > > > > > > > > > I'm not fully sure I understand this 100%. I think this is related to > > > > > how page cache is structured. I will try to add more comments. > > > > > > > > How's about "The underlying THP is always properly aligned in page > > > > cache, but it may be across the boundary of VMA if the VMA is > > > > misaligned, so the THP can't be PMD mapped for this case." > > > > > > I could certainly still be wrong / am learning here - but I *thought* > > > the reason for this check was to make sure that the hugepage > > > to-be-collapsed is naturally aligned within the file (since, AFAIK, > > > without this constraint, different mm's might have different ideas > > > about where hugepages in the file should be). > > > > The hugepage is definitely naturally aligned within the file, this is > > guaranteed by how page cache is organized, you could find some example > > code from shmem fault, for example, the below code snippet: > > > > hindex = round_down(index, folio_nr_pages(folio)); > > error = shmem_add_to_page_cache(folio, mapping, hindex, NULL, gfp & > > GFP_RECLAIM_MASK, charge_mm); > > > > The index is actually rounded down to HPAGE_PMD_NR aligned. > > Thanks for the reference here. > > > The check in hugepage_vma_check() is used to guarantee there is an PMD > > aligned area in the vma exactly overlapping with a PMD range in the > > page cache. For example, you have a vma starting from 0x1000 maps to > > the file's page offset of 0, even though you get THP for the file, it > > can not be PMD mapped to the vma. But if it maps to the file's page > > offset of 1, then starting from 0x200000 (assuming the vma is big > > enough) it can PMD map the second THP in the page cache. Does it make > > sense? > > > > Yes, this makes sense - thanks for providing your insight. I think I was > basically thinking the same thing ; except your description is more accurate > (namely, that is *some* pmd-aligned range covered by the vma that maps to a > hugepage-aligned offset in the file (I mistakenly took this to be the *first* > pmd-aligned address >= vma->vm_start)). > > Also, with this in mind, your previous suggested comment makes sense. If I had > to take a stab at it, I would say something like: > > "The hugepage is guaranteed to be hugepage-aligned within the file, but we must > check that the PMD-aligned addresses in the VMA map to PMD-aligned offsets > within the file, else the hugepage will not be PMD-mappable". > > WDYT? Looks good to me. Thanks for the wording. > > > > > > > > > > > > > > > > > > > > > Also I wonder while we're at it if we can rename this to > > > > > > transhuge_addr_aligned() or transhuge_addr_suitable() or something. > > > > > > > > > > I think it is still actually used to check vma. > > > > > > > > > > > > > > > > > Otherwise I think the change is a nice cleanup. > > > > > > > > > > > > > static inline bool transhuge_vma_suitable(struct vm_area_struct *vma, > > > > > > > unsigned long addr) > > > > > > > { > > > > > > > diff --git a/mm/khugepaged.c b/mm/khugepaged.c > > > > > > > index 7a5d1c1a1833..ca1754d3a827 100644 > > > > > > > --- a/mm/khugepaged.c > > > > > > > +++ b/mm/khugepaged.c > > > > > > > @@ -951,7 +951,6 @@ static int hugepage_vma_revalidate(struct mm_struct *mm, unsigned long address, > > > > > > > struct vm_area_struct **vmap) > > > > > > > { > > > > > > > struct vm_area_struct *vma; > > > > > > > - unsigned long hstart, hend; > > > > > > > > > > > > > > if (unlikely(khugepaged_test_exit(mm))) > > > > > > > return SCAN_ANY_PROCESS; > > > > > > > @@ -960,9 +959,7 @@ static int hugepage_vma_revalidate(struct mm_struct *mm, unsigned long address, > > > > > > > if (!vma) > > > > > > > return SCAN_VMA_NULL; > > > > > > > > > > > > > > - hstart = (vma->vm_start + ~HPAGE_PMD_MASK) & HPAGE_PMD_MASK; > > > > > > > - hend = vma->vm_end & HPAGE_PMD_MASK; > > > > > > > - if (address < hstart || address + HPAGE_PMD_SIZE > hend) > > > > > > > + if (!transhuge_vma_suitable(vma, address)) > > > > > > > return SCAN_ADDRESS_RANGE; > > > > > > > if (!hugepage_vma_check(vma, vma->vm_flags)) > > > > > > > return SCAN_VMA_CHECK; > > > > > > > -- > > > > > > > 2.26.3 > > > > > > > > > > > > > >
diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h index a8f61db47f2a..79d5919beb83 100644 --- a/include/linux/huge_mm.h +++ b/include/linux/huge_mm.h @@ -128,6 +128,12 @@ static inline bool transhuge_vma_size_ok(struct vm_area_struct *vma) return false; } +/* + * Do the below checks: + * - For non-anon vma, check if the vm_pgoff is HPAGE_PMD_NR aligned. + * - For all vmas, check if the haddr is in an aligned HPAGE_PMD_SIZE + * area. + */ static inline bool transhuge_vma_suitable(struct vm_area_struct *vma, unsigned long addr) { diff --git a/mm/khugepaged.c b/mm/khugepaged.c index 7a5d1c1a1833..ca1754d3a827 100644 --- a/mm/khugepaged.c +++ b/mm/khugepaged.c @@ -951,7 +951,6 @@ static int hugepage_vma_revalidate(struct mm_struct *mm, unsigned long address, struct vm_area_struct **vmap) { struct vm_area_struct *vma; - unsigned long hstart, hend; if (unlikely(khugepaged_test_exit(mm))) return SCAN_ANY_PROCESS; @@ -960,9 +959,7 @@ static int hugepage_vma_revalidate(struct mm_struct *mm, unsigned long address, if (!vma) return SCAN_VMA_NULL; - hstart = (vma->vm_start + ~HPAGE_PMD_MASK) & HPAGE_PMD_MASK; - hend = vma->vm_end & HPAGE_PMD_MASK; - if (address < hstart || address + HPAGE_PMD_SIZE > hend) + if (!transhuge_vma_suitable(vma, address)) return SCAN_ADDRESS_RANGE; if (!hugepage_vma_check(vma, vma->vm_flags)) return SCAN_VMA_CHECK;
The hugepage_vma_revalidate() needs to check if the address is still in the aligned HPAGE_PMD_SIZE area of the vma when reacquiring mmap_lock, but it was open-coded, use transhuge_vma_suitable() to do the job. And add proper comments for transhuge_vma_suitable(). Signed-off-by: Yang Shi <shy828301@gmail.com> --- include/linux/huge_mm.h | 6 ++++++ mm/khugepaged.c | 5 +---- 2 files changed, 7 insertions(+), 4 deletions(-)