Message ID | 20231114070044.464451-1-vivek.kasireddy@intel.com (mailing list archive) |
---|---|
State | New |
Headers | show |
Series | mm/gup: Introduce pin_user_pages_fd() for pinning shmem/hugetlbfs file pages (v3) | expand |
On 14.11.23 08:00, Vivek Kasireddy wrote: > For drivers that would like to longterm-pin the pages associated > with a file, the pin_user_pages_fd() API provides an option to > not only pin the pages via FOLL_PIN but also to check and migrate > them if they reside in movable zone or CMA block. This API > currently works with files that belong to either shmem or hugetlbfs. > Files belonging to other filesystems are rejected for now. > > The pages need to be located first before pinning them via FOLL_PIN. > If they are found in the page cache, they can be immediately pinned. > Otherwise, they need to be allocated using the filesystem specific > APIs and then pinned. > > v2: > - Drop gup_flags and improve comments and commit message (David) > - Allocate a page if we cannot find in page cache for the hugetlbfs > case as well (David) > - Don't unpin pages if there is a migration related failure (David) > - Drop the unnecessary nr_pages <= 0 check (Jason) > - Have the caller of the API pass in file * instead of fd (Jason) > > v3: (David) > - Enclose the huge page allocation code with #ifdef CONFIG_HUGETLB_PAGE > (Build error reported by kernel test robot <lkp@intel.com>) > - Don't forget memalloc_pin_restore() on non-migration related errors > - Improve the readability of the cleanup code associated with > non-migration related errors > - Augment the comments by describing FOLL_LONGTERM like behavior > - Include the R-b tag from Jason > > Cc: David Hildenbrand <david@redhat.com> > Cc: Daniel Vetter <daniel.vetter@ffwll.ch> > Cc: Mike Kravetz <mike.kravetz@oracle.com> > Cc: Hugh Dickins <hughd@google.com> > Cc: Peter Xu <peterx@redhat.com> > Cc: Gerd Hoffmann <kraxel@redhat.com> > Cc: Dongwon Kim <dongwon.kim@intel.com> > Cc: Junxiao Chang <junxiao.chang@intel.com> > Suggested-by: Jason Gunthorpe <jgg@nvidia.com> > Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> (v2) > Signed-off-by: Vivek Kasireddy <vivek.kasireddy@intel.com> > --- > include/linux/mm.h | 2 + > mm/gup.c | 109 +++++++++++++++++++++++++++++++++++++++++++++ > 2 files changed, 111 insertions(+) > > diff --git a/include/linux/mm.h b/include/linux/mm.h > index 418d26608ece..1b675fa35059 100644 > --- a/include/linux/mm.h > +++ b/include/linux/mm.h > @@ -2472,6 +2472,8 @@ long get_user_pages_unlocked(unsigned long start, unsigned long nr_pages, > struct page **pages, unsigned int gup_flags); > long pin_user_pages_unlocked(unsigned long start, unsigned long nr_pages, > struct page **pages, unsigned int gup_flags); > +long pin_user_pages_fd(struct file *file, pgoff_t start, > + unsigned long nr_pages, struct page **pages); > > int get_user_pages_fast(unsigned long start, int nr_pages, > unsigned int gup_flags, struct page **pages); > diff --git a/mm/gup.c b/mm/gup.c > index 231711efa390..b3af967cdff1 100644 > --- a/mm/gup.c > +++ b/mm/gup.c > @@ -3410,3 +3410,112 @@ long pin_user_pages_unlocked(unsigned long start, unsigned long nr_pages, > &locked, gup_flags); > } > EXPORT_SYMBOL(pin_user_pages_unlocked); > + > +static struct page *alloc_file_page(struct file *file, pgoff_t idx) > +{ > +#ifdef CONFIG_HUGETLB_PAGE > + struct page *page = ERR_PTR(-ENOMEM); > + struct folio *folio; > + int err; > + > + if (is_file_hugepages(file)) { > + folio = alloc_hugetlb_folio_nodemask(hstate_file(file), > + NUMA_NO_NODE, > + NULL, > + GFP_USER); > + if (folio && folio_try_get(folio)) { > + page = &folio->page; > + err = hugetlb_add_to_page_cache(folio, > + file->f_mapping, > + idx); > + if (err) { > + folio_put(folio); > + free_huge_folio(folio); > + page = ERR_PTR(err); > + } > + } > + return page; You could avoid the "page" variable completely simply by using 3 return statements. LGTM, thanks Reviewed-by: David Hildenbrand <david@redhat.com>
diff --git a/include/linux/mm.h b/include/linux/mm.h index 418d26608ece..1b675fa35059 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -2472,6 +2472,8 @@ long get_user_pages_unlocked(unsigned long start, unsigned long nr_pages, struct page **pages, unsigned int gup_flags); long pin_user_pages_unlocked(unsigned long start, unsigned long nr_pages, struct page **pages, unsigned int gup_flags); +long pin_user_pages_fd(struct file *file, pgoff_t start, + unsigned long nr_pages, struct page **pages); int get_user_pages_fast(unsigned long start, int nr_pages, unsigned int gup_flags, struct page **pages); diff --git a/mm/gup.c b/mm/gup.c index 231711efa390..b3af967cdff1 100644 --- a/mm/gup.c +++ b/mm/gup.c @@ -3410,3 +3410,112 @@ long pin_user_pages_unlocked(unsigned long start, unsigned long nr_pages, &locked, gup_flags); } EXPORT_SYMBOL(pin_user_pages_unlocked); + +static struct page *alloc_file_page(struct file *file, pgoff_t idx) +{ +#ifdef CONFIG_HUGETLB_PAGE + struct page *page = ERR_PTR(-ENOMEM); + struct folio *folio; + int err; + + if (is_file_hugepages(file)) { + folio = alloc_hugetlb_folio_nodemask(hstate_file(file), + NUMA_NO_NODE, + NULL, + GFP_USER); + if (folio && folio_try_get(folio)) { + page = &folio->page; + err = hugetlb_add_to_page_cache(folio, + file->f_mapping, + idx); + if (err) { + folio_put(folio); + free_huge_folio(folio); + page = ERR_PTR(err); + } + } + return page; + } +#endif + return shmem_read_mapping_page(file->f_mapping, idx); +} + +/** + * pin_user_pages_fd() - pin user pages associated with a file + * @file: the file whose pages are to be pinned + * @start: starting file offset + * @nr_pages: number of pages from start to pin + * @pages: array that receives pointers to the pages pinned. + * Should be at-least nr_pages long. + * + * Attempt to pin pages associated with a file that belongs to either shmem + * or hugetlb. The pages are either found in the page cache or allocated if + * necessary. Once the pages are located, they are all pinned via FOLL_PIN. + * And, these pinned pages need to be released either using unpin_user_pages() + * or unpin_user_page(). + * + * It must be noted that the pages may be pinned for an indefinite amount + * of time. And, in most cases, the duration of time they may stay pinned + * would be controlled by the userspace. This behavior is effectively the + * same as using FOLL_LONGTERM with other GUP APIs. + * + * Returns number of pages pinned. This would be equal to the number of + * pages requested. If no pages were pinned, it returns -errno. + */ +long pin_user_pages_fd(struct file *file, pgoff_t start, + unsigned long nr_pages, struct page **pages) +{ + struct page *page; + unsigned int flags, i; + long ret; + + if (start < 0) + return -EINVAL; + + if (!file) + return -EINVAL; + + if (!shmem_file(file) && !is_file_hugepages(file)) + return -EINVAL; + + flags = memalloc_pin_save(); + do { + for (i = 0; i < nr_pages; i++) { + /* + * In most cases, we should be able to find the page + * in the page cache. If we cannot find it, we try to + * allocate one and add it to the page cache. + */ + page = find_get_page_flags(file->f_mapping, + start + i, + FGP_ACCESSED); + if (!page) { + page = alloc_file_page(file, start + i); + if (IS_ERR(page)) { + ret = PTR_ERR(page); + goto err; + } + } + ret = try_grab_page(page, FOLL_PIN); + if (unlikely(ret)) + goto err; + + pages[i] = page; + put_page(pages[i]); + } + + ret = check_and_migrate_movable_pages(nr_pages, pages); + } while (ret == -EAGAIN); + + memalloc_pin_restore(flags); + return ret ? ret : nr_pages; +err: + memalloc_pin_restore(flags); + while (i-- > 0) + if (pages[i]) + unpin_user_page(pages[i]); + + return ret; +} +EXPORT_SYMBOL_GPL(pin_user_pages_fd); +