Message ID | 20231116012908.392077-8-peterx@redhat.com (mailing list archive) |
---|---|
State | New |
Series | mm/gup: Unify hugetlb, part 2 |
On Wed, Nov 15, 2023 at 08:29:03PM -0500, Peter Xu wrote:
> All the fast-gup functions take a tail page to operate, always need to do
> page mask calculations before feeding that into record_subpages().
>
> Merge that logic into record_subpages(), so that we always take a head
> page, and leave the rest calculation to record_subpages().

This is a bit fragile.  You're assuming that pmd_page() always returns
a head page, and that's only true today because I looked at the work
required vs the reward and decided to cap the large folio size at PMD
size.  If we allowed 2*PMD_SIZE (eg 4MB on x86), pmd_page() would not
return a head page.  There is a small amount of demand for > PMD size
large folio support, so I suspect we will want to do this eventually.
I'm not particularly trying to do these conversions, but it would be
good to not add more assumptions that pmd_page() returns a head page.

> +static int record_subpages(struct page *head, unsigned long sz,
> +			   unsigned long addr, unsigned long end,
> +			   struct page **pages)

> @@ -2870,8 +2873,8 @@ static int gup_huge_pmd(pmd_t orig, pmd_t *pmdp, unsigned long addr,
>  					     pages, nr);
>  	}
>
> -	page = nth_page(pmd_page(orig), (addr & ~PMD_MASK) >> PAGE_SHIFT);
> -	refs = record_subpages(page, addr, end, pages + *nr);
> +	page = pmd_page(orig);
> +	refs = record_subpages(page, PMD_SIZE, addr, end, pages + *nr);
>
>  	folio = try_grab_folio(page, refs, flags);
>  	if (!folio)
On Thu, Nov 16, 2023 at 02:51:52PM +0000, Matthew Wilcox wrote:
> On Wed, Nov 15, 2023 at 08:29:03PM -0500, Peter Xu wrote:
> > All the fast-gup functions take a tail page to operate, always need to do
> > page mask calculations before feeding that into record_subpages().
> >
> > Merge that logic into record_subpages(), so that we always take a head
> > page, and leave the rest calculation to record_subpages().
>
> This is a bit fragile.  You're assuming that pmd_page() always returns
> a head page, and that's only true today because I looked at the work
> required vs the reward and decided to cap the large folio size at PMD
> size.  If we allowed 2*PMD_SIZE (eg 4MB on x86), pmd_page() would not
> return a head page.  There is a small amount of demand for > PMD size
> large folio support, so I suspect we will want to do this eventually.
> I'm not particularly trying to do these conversions, but it would be
> good to not add more assumptions that pmd_page() returns a head page.

Makes sense.  Actually, IIUC arm64's CONT_PMD pages can already make that
not a head page.

The code should still be correct, though.  AFAIU what I need to do then is
renaming the first field of record_subpages() (s/head/base/) in the next
version, or just keep it the old one ("page"), then update the commit
message.

Thanks,
On Thu, Nov 16, 2023 at 02:40:21PM -0500, Peter Xu wrote:
> On Thu, Nov 16, 2023 at 02:51:52PM +0000, Matthew Wilcox wrote:
> > On Wed, Nov 15, 2023 at 08:29:03PM -0500, Peter Xu wrote:
> > > All the fast-gup functions take a tail page to operate, always need to do
> > > page mask calculations before feeding that into record_subpages().
> > >
> > > Merge that logic into record_subpages(), so that we always take a head
> > > page, and leave the rest calculation to record_subpages().
> >
> > This is a bit fragile.  You're assuming that pmd_page() always returns
> > a head page, and that's only true today because I looked at the work
> > required vs the reward and decided to cap the large folio size at PMD
> > size.  If we allowed 2*PMD_SIZE (eg 4MB on x86), pmd_page() would not
> > return a head page.  There is a small amount of demand for > PMD size
> > large folio support, so I suspect we will want to do this eventually.
> > I'm not particularly trying to do these conversions, but it would be
> > good to not add more assumptions that pmd_page() returns a head page.
>
> Makes sense.  Actually, IIUC arm64's CONT_PMD pages can already make that
> not a head page.
>
> The code should still be correct, though.  AFAIU what I need to do then is
> renaming the first field of record_subpages() (s/head/base/) in the next
> version, or just keep it the old one ("page"), then update the commit
> message.

Yeah, I think just leave it as 'page' would be best.  Thanks.
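The concern raised in the thread can be illustrated with a small userspace sketch. It is not kernel code: it assumes x86-64 constants (4KB pages, 2MB PMD_SIZE) and a hypothetical 4MB folio, and the helpers pmd_base_index() and folio_page_index() are invented for illustration. It shows that for an address in the second half of such a folio, the page that a PMD entry points at would sit at index 512 of the folio rather than at its head, while the offset computed from (addr & (sz - 1)) stays correct relative to that base page.

```c
#include <assert.h>

/* Assumed x86-64 values; other architectures and configs differ. */
#define PAGE_SHIFT 12UL
#define PAGE_SIZE  (1UL << PAGE_SHIFT)
#define PMD_SHIFT  21UL
#define PMD_SIZE   (1UL << PMD_SHIFT)

/*
 * Offset, in pages, of addr within a mapping of size sz. Same expression
 * as the one the patch moves into record_subpages(). sz must be a power
 * of two.
 */
static inline unsigned long subpage_index(unsigned long addr, unsigned long sz)
{
	assert(sz && !(sz & (sz - 1)));
	return (addr & (sz - 1)) >> PAGE_SHIFT;
}

/*
 * Hypothetical helper: index, within a folio starting at folio_start, of
 * the first page mapped by the PMD entry that covers addr. This stands in
 * for what pmd_page() would point at.
 */
static inline unsigned long pmd_base_index(unsigned long folio_start,
					   unsigned long addr)
{
	return ((addr & ~(PMD_SIZE - 1)) - folio_start) >> PAGE_SHIFT;
}

/* Hypothetical helper: index of the page backing addr within the folio. */
static inline unsigned long folio_page_index(unsigned long folio_start,
					     unsigned long addr)
{
	return pmd_base_index(folio_start, addr) +
	       subpage_index(addr, PMD_SIZE);
}
```

With folio_start = 0x40000000 and addr = folio_start + PMD_SIZE + 3 * PAGE_SIZE, the base index is 512 and the offset from the base is 3, so the page lands at index 515 of the folio. The arithmetic is right either way; only the name "head" for the base page would be misleading.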
diff --git a/mm/gup.c b/mm/gup.c
index 424d45e1afb3..69dae51f3eb1 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -2763,11 +2763,14 @@ static int __gup_device_huge_pud(pud_t pud, pud_t *pudp, unsigned long addr,
 }
 #endif
 
-static int record_subpages(struct page *page, unsigned long addr,
-			   unsigned long end, struct page **pages)
+static int record_subpages(struct page *head, unsigned long sz,
+			   unsigned long addr, unsigned long end,
+			   struct page **pages)
 {
+	struct page *page;
 	int nr;
 
+	page = nth_page(head, (addr & (sz - 1)) >> PAGE_SHIFT);
 	for (nr = 0; addr != end; nr++, addr += PAGE_SIZE)
 		pages[nr] = nth_page(page, nr);
 
@@ -2804,8 +2807,8 @@ static int gup_hugepte(pte_t *ptep, unsigned long sz, unsigned long addr,
 	/* hugepages are never "special" */
 	VM_BUG_ON(!pfn_valid(pte_pfn(pte)));
 
-	page = nth_page(pte_page(pte), (addr & (sz - 1)) >> PAGE_SHIFT);
-	refs = record_subpages(page, addr, end, pages + *nr);
+	page = pte_page(pte);
+	refs = record_subpages(page, sz, addr, end, pages + *nr);
 
 	folio = try_grab_folio(page, refs, flags);
 	if (!folio)
@@ -2870,8 +2873,8 @@ static int gup_huge_pmd(pmd_t orig, pmd_t *pmdp, unsigned long addr,
 					     pages, nr);
 	}
 
-	page = nth_page(pmd_page(orig), (addr & ~PMD_MASK) >> PAGE_SHIFT);
-	refs = record_subpages(page, addr, end, pages + *nr);
+	page = pmd_page(orig);
+	refs = record_subpages(page, PMD_SIZE, addr, end, pages + *nr);
 
 	folio = try_grab_folio(page, refs, flags);
 	if (!folio)
@@ -2914,8 +2917,8 @@ static int gup_huge_pud(pud_t orig, pud_t *pudp, unsigned long addr,
 					     pages, nr);
 	}
 
-	page = nth_page(pud_page(orig), (addr & ~PUD_MASK) >> PAGE_SHIFT);
-	refs = record_subpages(page, addr, end, pages + *nr);
+	page = pud_page(orig);
+	refs = record_subpages(page, PUD_SIZE, addr, end, pages + *nr);
 
 	folio = try_grab_folio(page, refs, flags);
 	if (!folio)
@@ -2954,8 +2957,8 @@ static int gup_huge_pgd(pgd_t orig, pgd_t *pgdp, unsigned long addr,
 
 	BUILD_BUG_ON(pgd_devmap(orig));
 
-	page = nth_page(pgd_page(orig), (addr & ~PGDIR_MASK) >> PAGE_SHIFT);
-	refs = record_subpages(page, addr, end, pages + *nr);
+	page = pgd_page(orig);
+	refs = record_subpages(page, PGDIR_SIZE, addr, end, pages + *nr);
 
 	folio = try_grab_folio(page, refs, flags);
 	if (!folio)
All the fast-gup functions take a tail page to operate, always need to do
page mask calculations before feeding that into record_subpages().

Merge that logic into record_subpages(), so that we always take a head
page, and leave the rest calculation to record_subpages().

Signed-off-by: Peter Xu <peterx@redhat.com>
---
 mm/gup.c | 23 +++++++++++++----------
 1 file changed, 13 insertions(+), 10 deletions(-)
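As a rough check of the refactoring described in the commit message, the following userspace model compares the two calling conventions. It is a sketch under stated assumptions, not the kernel implementation: plain page frame numbers stand in for struct page pointers, nth_page(page, n) is modelled as page + n, the constants are the x86-64 values, and the model_* names are invented here.

```c
#include <assert.h>

/* Assumed x86-64 values; not taken from kernel headers. */
#define PAGE_SHIFT 12UL
#define PAGE_SIZE  (1UL << PAGE_SHIFT)
#define PMD_SIZE   (1UL << 21)
#define PMD_MASK   (~(PMD_SIZE - 1))

/* Old convention: the caller has already advanced to the first subpage. */
static inline int model_record_old(unsigned long page, unsigned long addr,
				   unsigned long end, unsigned long *pages)
{
	int nr;

	for (nr = 0; addr != end; nr++, addr += PAGE_SIZE)
		pages[nr] = page + nr;	/* models nth_page(page, nr) */
	return nr;
}

/* New convention: take the base page and mapping size, compute the offset. */
static inline int model_record_new(unsigned long base, unsigned long sz,
				   unsigned long addr, unsigned long end,
				   unsigned long *pages)
{
	unsigned long page;
	int nr;

	assert(sz && !(sz & (sz - 1)));	/* the mask trick needs a power of two */
	page = base + ((addr & (sz - 1)) >> PAGE_SHIFT);
	for (nr = 0; addr != end; nr++, addr += PAGE_SIZE)
		pages[nr] = page + nr;
	return nr;
}

/*
 * Returns 1 if both conventions record the same pages for a PMD-sized
 * mapping. The range must be page aligned and at most 8 pages long.
 */
static inline int model_same_result(unsigned long base, unsigned long addr,
				    unsigned long end)
{
	unsigned long a[8], b[8];
	int na, nb, i;

	assert(addr <= end && end - addr <= 8 * PAGE_SIZE);
	assert((end - addr) % PAGE_SIZE == 0);
	na = model_record_old(base + ((addr & ~PMD_MASK) >> PAGE_SHIFT),
			      addr, end, a);
	nb = model_record_new(base, PMD_SIZE, addr, end, b);
	if (na != nb)
		return 0;
	for (i = 0; i < na; i++)
		if (a[i] != b[i])
			return 0;
	return 1;
}
```

Since ~PMD_MASK equals PMD_SIZE - 1, the caller-side expression (addr & ~PMD_MASK) and the new (addr & (sz - 1)) with sz = PMD_SIZE select the same low bits, which is why the call sites can pass the size instead of doing the masking themselves.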