Message ID | 1560422702-11403-3-git-send-email-kernelfans@gmail.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | mm/gup: fix omission of check on FOLL_LONGTERM in gup fast path | expand |
On Thu, Jun 13, 2019 at 06:45:01PM +0800, Pingfan Liu wrote: > FOLL_LONGTERM suggests a pin which is going to be given to hardware and > can't move. It would truncate CMA permanently and should be excluded. > > FOLL_LONGTERM has already been checked in the slow path, but not checked in > the fast path, which means a possible leak of CMA page to longterm pinned > requirement through this crack. > > Place a check in gup_pte_range() in the fast path. > > Signed-off-by: Pingfan Liu <kernelfans@gmail.com> > Cc: Ira Weiny <ira.weiny@intel.com> > Cc: Andrew Morton <akpm@linux-foundation.org> > Cc: Mike Rapoport <rppt@linux.ibm.com> > Cc: Dan Williams <dan.j.williams@intel.com> > Cc: Matthew Wilcox <willy@infradead.org> > Cc: John Hubbard <jhubbard@nvidia.com> > Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com> > Cc: Keith Busch <keith.busch@intel.com> > Cc: Christoph Hellwig <hch@infradead.org> > Cc: Shuah Khan <shuah@kernel.org> > Cc: linux-kernel@vger.kernel.org > --- > mm/gup.c | 26 ++++++++++++++++++++++++++ > 1 file changed, 26 insertions(+) > > diff --git a/mm/gup.c b/mm/gup.c > index 766ae54..de1b03f 100644 > --- a/mm/gup.c > +++ b/mm/gup.c > @@ -1757,6 +1757,14 @@ static int gup_pte_range(pmd_t pmd, unsigned long addr, unsigned long end, > VM_BUG_ON(!pfn_valid(pte_pfn(pte))); > page = pte_page(pte); > > + /* > + * FOLL_LONGTERM suggests a pin given to hardware. Prevent it > + * from truncating CMA area > + */ > + if (unlikely(flags & FOLL_LONGTERM) && > + is_migrate_cma_page(page)) > + goto pte_unmap; > + > head = try_get_compound_head(page, 1); > if (!head) > goto pte_unmap; > @@ -1900,6 +1908,12 @@ static int gup_huge_pmd(pmd_t orig, pmd_t *pmdp, unsigned long addr, > refs++; > } while (addr += PAGE_SIZE, addr != end); > > + if (unlikely(flags & FOLL_LONGTERM) && > + is_migrate_cma_page(page)) { > + *nr -= refs; > + return 0; > + } > + Why can't we place this check before the while loop and skip subtracting the page count? Can is_migrate_cma_page() operate on any "subpage" of a compound page? Here this calls is_magrate_cma_page() on the tail page of the compound page. I'm not an expert on compound pages nor cma handling so is this ok? It seems like you need to call is_migrate_cma_page() on each page within the while loop? > head = try_get_compound_head(pmd_page(orig), refs); > if (!head) { > *nr -= refs; > @@ -1941,6 +1955,12 @@ static int gup_huge_pud(pud_t orig, pud_t *pudp, unsigned long addr, > refs++; > } while (addr += PAGE_SIZE, addr != end); > > + if (unlikely(flags & FOLL_LONGTERM) && > + is_migrate_cma_page(page)) { > + *nr -= refs; > + return 0; > + } > + Same comment here. > head = try_get_compound_head(pud_page(orig), refs); > if (!head) { > *nr -= refs; > @@ -1978,6 +1998,12 @@ static int gup_huge_pgd(pgd_t orig, pgd_t *pgdp, unsigned long addr, > refs++; > } while (addr += PAGE_SIZE, addr != end); > > + if (unlikely(flags & FOLL_LONGTERM) && > + is_migrate_cma_page(page)) { > + *nr -= refs; > + return 0; > + } > + And here. Ira > head = try_get_compound_head(pgd_page(orig), refs); > if (!head) { > *nr -= refs; > -- > 2.7.5 >
Cc Mike, David, who is an expert of hugetlb and thp On Fri, Jun 14, 2019 at 5:37 AM Ira Weiny <ira.weiny@intel.com> wrote: > > On Thu, Jun 13, 2019 at 06:45:01PM +0800, Pingfan Liu wrote: > > FOLL_LONGTERM suggests a pin which is going to be given to hardware and > > can't move. It would truncate CMA permanently and should be excluded. > > > > FOLL_LONGTERM has already been checked in the slow path, but not checked in > > the fast path, which means a possible leak of CMA page to longterm pinned > > requirement through this crack. > > > > Place a check in gup_pte_range() in the fast path. > > > > Signed-off-by: Pingfan Liu <kernelfans@gmail.com> > > Cc: Ira Weiny <ira.weiny@intel.com> > > Cc: Andrew Morton <akpm@linux-foundation.org> > > Cc: Mike Rapoport <rppt@linux.ibm.com> > > Cc: Dan Williams <dan.j.williams@intel.com> > > Cc: Matthew Wilcox <willy@infradead.org> > > Cc: John Hubbard <jhubbard@nvidia.com> > > Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com> > > Cc: Keith Busch <keith.busch@intel.com> > > Cc: Christoph Hellwig <hch@infradead.org> > > Cc: Shuah Khan <shuah@kernel.org> > > Cc: linux-kernel@vger.kernel.org > > --- > > mm/gup.c | 26 ++++++++++++++++++++++++++ > > 1 file changed, 26 insertions(+) > > > > diff --git a/mm/gup.c b/mm/gup.c > > index 766ae54..de1b03f 100644 > > --- a/mm/gup.c > > +++ b/mm/gup.c > > @@ -1757,6 +1757,14 @@ static int gup_pte_range(pmd_t pmd, unsigned long addr, unsigned long end, > > VM_BUG_ON(!pfn_valid(pte_pfn(pte))); > > page = pte_page(pte); > > > > + /* > > + * FOLL_LONGTERM suggests a pin given to hardware. Prevent it > > + * from truncating CMA area > > + */ > > + if (unlikely(flags & FOLL_LONGTERM) && > > + is_migrate_cma_page(page)) > > + goto pte_unmap; > > + > > head = try_get_compound_head(page, 1); > > if (!head) > > goto pte_unmap; > > @@ -1900,6 +1908,12 @@ static int gup_huge_pmd(pmd_t orig, pmd_t *pmdp, unsigned long addr, > > refs++; > > } while (addr += PAGE_SIZE, addr != end); > > > > + if (unlikely(flags & FOLL_LONGTERM) && > > + is_migrate_cma_page(page)) { > > + *nr -= refs; > > + return 0; > > + } > > + > > Why can't we place this check before the while loop and skip subtracting the > page count? Yes, that will be better. > > Can is_migrate_cma_page() operate on any "subpage" of a compound page? For gigantic page, __alloc_gigantic_page() allocate from MIGRATE_MOVABLE pageblock. For page order < MAX_ORDER, pages are allocated from either free_list[MIGRATE_MOVABLE] or free_list[MIGRATE_CMA]. So all subpage have the same migrate type. Thanks, Pingfan > > Here this calls is_magrate_cma_page() on the tail page of the compound page. > > I'm not an expert on compound pages nor cma handling so is this ok? > > It seems like you need to call is_migrate_cma_page() on each page within the > while loop? > > > head = try_get_compound_head(pmd_page(orig), refs); > > if (!head) { > > *nr -= refs; > > @@ -1941,6 +1955,12 @@ static int gup_huge_pud(pud_t orig, pud_t *pudp, unsigned long addr, > > refs++; > > } while (addr += PAGE_SIZE, addr != end); > > > > + if (unlikely(flags & FOLL_LONGTERM) && > > + is_migrate_cma_page(page)) { > > + *nr -= refs; > > + return 0; > > + } > > + > > Same comment here. > > > head = try_get_compound_head(pud_page(orig), refs); > > if (!head) { > > *nr -= refs; > > @@ -1978,6 +1998,12 @@ static int gup_huge_pgd(pgd_t orig, pgd_t *pgdp, unsigned long addr, > > refs++; > > } while (addr += PAGE_SIZE, addr != end); > > > > + if (unlikely(flags & FOLL_LONGTERM) && > > + is_migrate_cma_page(page)) { > > + *nr -= refs; > > + return 0; > > + } > > + > > And here. > > Ira > > > head = try_get_compound_head(pgd_page(orig), refs); > > if (!head) { > > *nr -= refs; > > -- > > 2.7.5 > >
diff --git a/mm/gup.c b/mm/gup.c index 766ae54..de1b03f 100644 --- a/mm/gup.c +++ b/mm/gup.c @@ -1757,6 +1757,14 @@ static int gup_pte_range(pmd_t pmd, unsigned long addr, unsigned long end, VM_BUG_ON(!pfn_valid(pte_pfn(pte))); page = pte_page(pte); + /* + * FOLL_LONGTERM suggests a pin given to hardware. Prevent it + * from truncating CMA area + */ + if (unlikely(flags & FOLL_LONGTERM) && + is_migrate_cma_page(page)) + goto pte_unmap; + head = try_get_compound_head(page, 1); if (!head) goto pte_unmap; @@ -1900,6 +1908,12 @@ static int gup_huge_pmd(pmd_t orig, pmd_t *pmdp, unsigned long addr, refs++; } while (addr += PAGE_SIZE, addr != end); + if (unlikely(flags & FOLL_LONGTERM) && + is_migrate_cma_page(page)) { + *nr -= refs; + return 0; + } + head = try_get_compound_head(pmd_page(orig), refs); if (!head) { *nr -= refs; @@ -1941,6 +1955,12 @@ static int gup_huge_pud(pud_t orig, pud_t *pudp, unsigned long addr, refs++; } while (addr += PAGE_SIZE, addr != end); + if (unlikely(flags & FOLL_LONGTERM) && + is_migrate_cma_page(page)) { + *nr -= refs; + return 0; + } + head = try_get_compound_head(pud_page(orig), refs); if (!head) { *nr -= refs; @@ -1978,6 +1998,12 @@ static int gup_huge_pgd(pgd_t orig, pgd_t *pgdp, unsigned long addr, refs++; } while (addr += PAGE_SIZE, addr != end); + if (unlikely(flags & FOLL_LONGTERM) && + is_migrate_cma_page(page)) { + *nr -= refs; + return 0; + } + head = try_get_compound_head(pgd_page(orig), refs); if (!head) { *nr -= refs;
FOLL_LONGTERM suggests a pin which is going to be given to hardware and can't move. It would truncate CMA permanently and should be excluded. FOLL_LONGTERM has already been checked in the slow path, but not checked in the fast path, which means a possible leak of CMA page to longterm pinned requirement through this crack. Place a check in gup_pte_range() in the fast path. Signed-off-by: Pingfan Liu <kernelfans@gmail.com> Cc: Ira Weiny <ira.weiny@intel.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Mike Rapoport <rppt@linux.ibm.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: John Hubbard <jhubbard@nvidia.com> Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com> Cc: Keith Busch <keith.busch@intel.com> Cc: Christoph Hellwig <hch@infradead.org> Cc: Shuah Khan <shuah@kernel.org> Cc: linux-kernel@vger.kernel.org --- mm/gup.c | 26 ++++++++++++++++++++++++++ 1 file changed, 26 insertions(+)