Message ID | 1725373521-451395-4-git-send-email-steven.sistare@oracle.com (mailing list archive) |
---|---|
State | New |
Series | memfd-pin huge page fixes |
Hi Steve,

> Subject: [PATCH V1 3/5] mm/hugetlb: fix memfd_pin_folios resv_huge_pages leak
>
[...]
> +struct folio *alloc_hugetlb_folio_reserve(struct hstate *h, int preferred_nid,
> +                                          nodemask_t *nmask, gfp_t gfp_mask)
> +{
> +        struct folio *folio;
> +
> +        spin_lock_irq(&hugetlb_lock);
> +        folio = dequeue_hugetlb_folio_nodemask(h, gfp_mask, preferred_nid,
I am assuming a check for available_huge_pages(h) before calling dequeue
would be redundant as it would simply return NULL if no huge pages are
available?

Acked-by: Vivek Kasireddy <vivek.kasireddy@intel.com>

Thanks,
Vivek

> +                                               nmask);
> +        if (folio) {
> +                VM_BUG_ON(!h->resv_huge_pages);
> +                h->resv_huge_pages--;
> +        }
> +
> +        spin_unlock_irq(&hugetlb_lock);
> +        return folio;
> +}
[...]
On 9/3/2024 9:04 PM, Kasireddy, Vivek wrote:
> Hi Steve,
>
>> Subject: [PATCH V1 3/5] mm/hugetlb: fix memfd_pin_folios resv_huge_pages leak
>>
[...]
>> +        spin_lock_irq(&hugetlb_lock);
>> +        folio = dequeue_hugetlb_folio_nodemask(h, gfp_mask, preferred_nid,
> I am assuming a check for available_huge_pages(h) before calling dequeue
> would be redundant as it would simply return NULL if no huge pages are
> available?

It would be wrong to check available_huge_pages() first, because it would
return false if free_huge_pages == resv_huge_pages, but at this point a page
was already reserved for us in resv_huge_pages, so we should always succeed.

The invariant here is free_huge_pages >= resv_huge_pages.  We simply do
resv_huge_pages-- here, and do free_huge_pages-- in the
dequeue_hugetlb_folio_nodemask call stack.

- Steve

> Acked-by: Vivek Kasireddy <vivek.kasireddy@intel.com>
>
> Thanks,
> Vivek
[...]
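Editorial aside, not part of the thread: a minimal, self-contained sketch of the accounting described in the reply above. The struct and both helper names (`available_unreserved`, `consume_reservation`) are made up for this illustration; `available_unreserved` is only a rough stand-in for what the kernel's available_huge_pages() checks. The point it demonstrates is that free_huge_pages == resv_huge_pages is a legal state in which the dequeue must still succeed, because one of the free folios is already reserved for this allocation.

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified stand-ins for the struct hstate counters (illustration only). */
struct counters {
	unsigned long free_huge_pages;	/* folios sitting on the free lists */
	unsigned long resv_huge_pages;	/* folios promised to existing reservations */
};

/* Roughly what available_huge_pages() asks: is any free folio unreserved? */
static bool available_unreserved(const struct counters *h)
{
	return h->free_huge_pages > h->resv_huge_pages;
}

/* The accounting performed when a reserved folio is handed out. */
static void consume_reservation(struct counters *h)
{
	assert(h->free_huge_pages >= h->resv_huge_pages);	/* the invariant */
	h->free_huge_pages--;	/* done in the dequeue_hugetlb_folio_nodemask() path */
	h->resv_huge_pages--;	/* done by the new alloc_hugetlb_folio_reserve() */
	assert(h->free_huge_pages >= h->resv_huge_pages);	/* still holds */
}

int main(void)
{
	/* One free folio, and it is already reserved for our memfd. */
	struct counters h = { .free_huge_pages = 1, .resv_huge_pages = 1 };

	/* An available_huge_pages()-style gate would refuse here ... */
	assert(!available_unreserved(&h));
	/* ... yet consuming the reservation is exactly what must happen. */
	consume_reservation(&h);
	return 0;
}
```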
diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index 45bf05a..3ddd69b 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -695,6 +695,9 @@ struct folio *alloc_hugetlb_folio(struct vm_area_struct *vma,
 struct folio *alloc_hugetlb_folio_nodemask(struct hstate *h, int preferred_nid,
                                 nodemask_t *nmask, gfp_t gfp_mask,
                                 bool allow_alloc_fallback);
+struct folio *alloc_hugetlb_folio_reserve(struct hstate *h, int preferred_nid,
+                                nodemask_t *nmask, gfp_t gfp_mask);
+
 int hugetlb_add_to_page_cache(struct folio *folio, struct address_space *mapping,
                         pgoff_t idx);
 void restore_reserve_on_error(struct hstate *h, struct vm_area_struct *vma,
@@ -1062,6 +1065,13 @@ static inline struct folio *alloc_hugetlb_folio(struct vm_area_struct *vma,
 }
 
 static inline struct folio *
+alloc_hugetlb_folio_reserve(struct hstate *h, int preferred_nid,
+                            nodemask_t *nmask, gfp_t gfp_mask)
+{
+        return NULL;
+}
+
+static inline struct folio *
 alloc_hugetlb_folio_nodemask(struct hstate *h, int preferred_nid,
                         nodemask_t *nmask, gfp_t gfp_mask,
                         bool allow_alloc_fallback)
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index aaf508b..c2d44a1 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -2564,6 +2564,23 @@ struct folio *alloc_buddy_hugetlb_folio_with_mpol(struct hstate *h,
         return folio;
 }
 
+struct folio *alloc_hugetlb_folio_reserve(struct hstate *h, int preferred_nid,
+                                          nodemask_t *nmask, gfp_t gfp_mask)
+{
+        struct folio *folio;
+
+        spin_lock_irq(&hugetlb_lock);
+        folio = dequeue_hugetlb_folio_nodemask(h, gfp_mask, preferred_nid,
+                                               nmask);
+        if (folio) {
+                VM_BUG_ON(!h->resv_huge_pages);
+                h->resv_huge_pages--;
+        }
+
+        spin_unlock_irq(&hugetlb_lock);
+        return folio;
+}
+
 /* folio migration callback function */
 struct folio *alloc_hugetlb_folio_nodemask(struct hstate *h, int preferred_nid,
                                 nodemask_t *nmask, gfp_t gfp_mask, bool allow_alloc_fallback)
diff --git a/mm/memfd.c b/mm/memfd.c
index e7b7c52..bfe0e71 100644
--- a/mm/memfd.c
+++ b/mm/memfd.c
@@ -82,11 +82,10 @@ struct folio *memfd_alloc_folio(struct file *memfd, pgoff_t idx)
                 gfp_mask = htlb_alloc_mask(hstate_file(memfd));
                 gfp_mask &= ~(__GFP_HIGHMEM | __GFP_MOVABLE);
 
-                folio = alloc_hugetlb_folio_nodemask(hstate_file(memfd),
-                                                     numa_node_id(),
-                                                     NULL,
-                                                     gfp_mask,
-                                                     false);
+                folio = alloc_hugetlb_folio_reserve(hstate_file(memfd),
+                                                    numa_node_id(),
+                                                    NULL,
+                                                    gfp_mask);
                 if (folio && folio_try_get(folio)) {
                         err = hugetlb_add_to_page_cache(folio,
                                                         memfd->f_mapping,
memfd_pin_folios followed by unpin_folios leaves resv_huge_pages elevated
if the pages were not already faulted in.  During a normal page fault,
resv_huge_pages is consumed here:

hugetlb_fault()
  alloc_hugetlb_folio()
    dequeue_hugetlb_folio_vma()
      dequeue_hugetlb_folio_nodemask()
        dequeue_hugetlb_folio_node_exact()
          free_huge_pages--
    resv_huge_pages--

During memfd_pin_folios, the page is created by calling
alloc_hugetlb_folio_nodemask instead of alloc_hugetlb_folio, and
resv_huge_pages is not modified:

memfd_alloc_folio()
  alloc_hugetlb_folio_nodemask()
    dequeue_hugetlb_folio_nodemask()
      dequeue_hugetlb_folio_node_exact()
        free_huge_pages--

alloc_hugetlb_folio_nodemask has other callers that must not modify
resv_huge_pages.  Therefore, to fix, define an alternate version of
alloc_hugetlb_folio_nodemask for this call site that adjusts
resv_huge_pages.

Fixes: 89c1905d9c14 ("mm/gup: introduce memfd_pin_folios() for pinning memfd folios")

Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
---
 include/linux/hugetlb.h | 10 ++++++++++
 mm/hugetlb.c            | 17 +++++++++++++++++
 mm/memfd.c              |  9 ++++-----
 3 files changed, 31 insertions(+), 5 deletions(-)
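Editorial aside, not from the patch or the thread: the leaked reservations are visible from userspace as HugePages_Rsvd in /proc/meminfo, so one way to check for the leak is to sample that counter before and after exercising a pin/unpin cycle on a not-yet-faulted hugetlb memfd (for example through a driver that uses memfd_pin_folios()). The helper name below is hypothetical, and the actual pin/unpin exercise is left as a placeholder.

```c
#include <stdio.h>

/* Hypothetical test helper: read HugePages_Rsvd from /proc/meminfo. */
static long hugepages_rsvd(void)
{
	char line[256];
	long val = -1;
	FILE *f = fopen("/proc/meminfo", "r");

	if (!f)
		return -1;
	while (fgets(line, sizeof(line), f)) {
		if (sscanf(line, "HugePages_Rsvd: %ld", &val) == 1)
			break;
	}
	fclose(f);
	return val;
}

int main(void)
{
	long before = hugepages_rsvd();

	/* ... exercise the memfd_pin_folios()/unpin_folios() path here ... */

	long after = hugepages_rsvd();

	/* With the leak, "after" stays elevated; with the fix it matches "before". */
	printf("HugePages_Rsvd: before=%ld after=%ld\n", before, after);
	return after != before;
}
```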