| Message ID | 20190208075649.3025-2-jhubbard@nvidia.com (mailing list archive) |
|---|---|
| State | New, archived |
| Series | mm: put_user_page() call site conversion first |
On Thu, Feb 07, 2019 at 11:56:48PM -0800, john.hubbard@gmail.com wrote:
> From: John Hubbard <jhubbard@nvidia.com>
> 
> Introduces put_user_page(), which simply calls put_page().
> This provides a way to update all get_user_pages*() callers,
> so that they call put_user_page(), instead of put_page().
> 
> Also introduces put_user_pages(), and a few dirty/locked variations,
> as a replacement for release_pages(), and also as a replacement
> for open-coded loops that release multiple pages.
> These may be used for subsequent performance improvements,
> via batching of pages to be released.
> 
> This is the first step of fixing a problem (also described in [1] and
> [2]) with interactions between get_user_pages ("gup") and filesystems.
> 
> Problem description: let's start with a bug report. Below is what happens
> sometimes, under memory pressure, when a driver pins some pages via gup,
> and then marks those pages dirty, and releases them. Note that the gup
> documentation actually recommends that pattern. The problem is that the
> filesystem may do a writeback while the pages were gup-pinned, and then the
> filesystem believes that the pages are clean. So, when the driver later
> marks the pages as dirty, that conflicts with the filesystem's page
> tracking and results in a BUG(), like this one that I experienced:
> 
> kernel BUG at /build/linux-fQ94TU/linux-4.4.0/fs/ext4/inode.c:1899!
> backtrace:
>     ext4_writepage
>     __writepage
>     write_cache_pages
>     ext4_writepages
>     do_writepages
>     __writeback_single_inode
>     writeback_sb_inodes
>     __writeback_inodes_wb
>     wb_writeback
>     wb_workfn
>     process_one_work
>     worker_thread
>     kthread
>     ret_from_fork
> 
> ...which is due to the file system asserting that there are still buffer
> heads attached:
> 
>     ({                                                  \
>             BUG_ON(!PagePrivate(page));                 \
>             ((struct buffer_head *)page_private(page)); \
>     })
> 
> Dave Chinner's description of this is very clear:
> 
>     "The fundamental issue is that ->page_mkwrite must be called on every
>     write access to a clean file backed page, not just the first one.
>     How long the GUP reference lasts is irrelevant, if the page is clean
>     and you need to dirty it, you must call ->page_mkwrite before it is
>     marked writeable and dirtied. Every. Time."
> 
> This is just one symptom of the larger design problem: filesystems do not
> actually support get_user_pages() being called on their pages, and letting
> hardware write directly to those pages--even though that pattern has been
> going on since about 2005 or so.
> 
> The steps to fix it are:
> 
> 1) (This patch): provide put_user_page*() routines, intended to be used
>    for releasing pages that were pinned via get_user_pages*().
> 
> 2) Convert all of the call sites for get_user_pages*(), to
>    invoke put_user_page*(), instead of put_page(). This involves dozens of
>    call sites, and will take some time.
> 
> 3) After (2) is complete, use get_user_pages*() and put_user_page*() to
>    implement tracking of these pages. This tracking will be separate from
>    the existing struct page refcounting.
> 
> 4) Use the tracking and identification of these pages, to implement
>    special handling (especially in writeback paths) when the pages are
>    backed by a filesystem.
> 
> [1] https://lwn.net/Articles/774411/ : "DMA and get_user_pages()"
> [2] https://lwn.net/Articles/753027/ : "The Trouble with get_user_pages()"
> 
> Cc: Al Viro <viro@zeniv.linux.org.uk>
> Cc: Christoph Hellwig <hch@infradead.org>
> Cc: Christopher Lameter <cl@linux.com>
> Cc: Dan Williams <dan.j.williams@intel.com>
> Cc: Dave Chinner <david@fromorbit.com>
> Cc: Jan Kara <jack@suse.cz>
> Cc: Jason Gunthorpe <jgg@ziepe.ca>
> Cc: Jerome Glisse <jglisse@redhat.com>
> Cc: Matthew Wilcox <willy@infradead.org>
> Cc: Michal Hocko <mhocko@kernel.org>
> Cc: Mike Rapoport <rppt@linux.ibm.com>
> Cc: Ralph Campbell <rcampbell@nvidia.com>
> 
> Reviewed-by: Jan Kara <jack@suse.cz>
> Signed-off-by: John Hubbard <jhubbard@nvidia.com>
> ---
>  include/linux/mm.h | 24 ++++++++++++++
>  mm/swap.c          | 82 ++++++++++++++++++++++++++++++++++++++++++++++
>  2 files changed, 106 insertions(+)
> 
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index 80bb6408fe73..809b7397d41e 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -993,6 +993,30 @@ static inline void put_page(struct page *page)
>  		__put_page(page);
>  }
>  
> +/**
> + * put_user_page() - release a gup-pinned page
> + * @page: pointer to page to be released
> + *
> + * Pages that were pinned via get_user_pages*() must be released via
> + * either put_user_page(), or one of the put_user_pages*() routines
> + * below. This is so that eventually, pages that are pinned via
> + * get_user_pages*() can be separately tracked and uniquely handled. In
> + * particular, interactions with RDMA and filesystems need special
> + * handling.
> + *
> + * put_user_page() and put_page() are not interchangeable, despite this early
> + * implementation that makes them look the same. put_user_page() calls must

I just hope we'll remember to update when the real implementation will be
merged ;-)

Other than that, feel free to add

Reviewed-by: Mike Rapoport <rppt@linux.ibm.com> # docs

> + * be perfectly matched up with get_user_page() calls.
> + */
> +static inline void put_user_page(struct page *page)
> +{
> +	put_page(page);
> +}
> +
> +void put_user_pages_dirty(struct page **pages, unsigned long npages);
> +void put_user_pages_dirty_lock(struct page **pages, unsigned long npages);
> +void put_user_pages(struct page **pages, unsigned long npages);
> +
>  #if defined(CONFIG_SPARSEMEM) && !defined(CONFIG_SPARSEMEM_VMEMMAP)
>  #define SECTION_IN_PAGE_FLAGS
>  #endif
> diff --git a/mm/swap.c b/mm/swap.c
> index 4929bc1be60e..7c42ca45bb89 100644
> --- a/mm/swap.c
> +++ b/mm/swap.c
> @@ -133,6 +133,88 @@ void put_pages_list(struct list_head *pages)
>  }
>  EXPORT_SYMBOL(put_pages_list);
>  
> +typedef int (*set_dirty_func)(struct page *page);
> +
> +static void __put_user_pages_dirty(struct page **pages,
> +				   unsigned long npages,
> +				   set_dirty_func sdf)
> +{
> +	unsigned long index;
> +
> +	for (index = 0; index < npages; index++) {
> +		struct page *page = compound_head(pages[index]);
> +
> +		if (!PageDirty(page))
> +			sdf(page);
> +
> +		put_user_page(page);
> +	}
> +}
> +
> +/**
> + * put_user_pages_dirty() - release and dirty an array of gup-pinned pages
> + * @pages: array of pages to be marked dirty and released.
> + * @npages: number of pages in the @pages array.
> + *
> + * "gup-pinned page" refers to a page that has had one of the get_user_pages()
> + * variants called on that page.
> + *
> + * For each page in the @pages array, make that page (or its head page, if a
> + * compound page) dirty, if it was previously listed as clean. Then, release
> + * the page using put_user_page().
> + *
> + * Please see the put_user_page() documentation for details.
> + *
> + * set_page_dirty(), which does not lock the page, is used here.
> + * Therefore, it is the caller's responsibility to ensure that this is
> + * safe. If not, then put_user_pages_dirty_lock() should be called instead.
> + *
> + */
> +void put_user_pages_dirty(struct page **pages, unsigned long npages)
> +{
> +	__put_user_pages_dirty(pages, npages, set_page_dirty);
> +}
> +EXPORT_SYMBOL(put_user_pages_dirty);
> +
> +/**
> + * put_user_pages_dirty_lock() - release and dirty an array of gup-pinned pages
> + * @pages: array of pages to be marked dirty and released.
> + * @npages: number of pages in the @pages array.
> + *
> + * For each page in the @pages array, make that page (or its head page, if a
> + * compound page) dirty, if it was previously listed as clean. Then, release
> + * the page using put_user_page().
> + *
> + * Please see the put_user_page() documentation for details.
> + *
> + * This is just like put_user_pages_dirty(), except that it invokes
> + * set_page_dirty_lock(), instead of set_page_dirty().
> + *
> + */
> +void put_user_pages_dirty_lock(struct page **pages, unsigned long npages)
> +{
> +	__put_user_pages_dirty(pages, npages, set_page_dirty_lock);
> +}
> +EXPORT_SYMBOL(put_user_pages_dirty_lock);
> +
> +/**
> + * put_user_pages() - release an array of gup-pinned pages.
> + * @pages: array of pages to be marked dirty and released.
> + * @npages: number of pages in the @pages array.
> + *
> + * For each page in the @pages array, release the page using put_user_page().
> + *
> + * Please see the put_user_page() documentation for details.
> + */
> +void put_user_pages(struct page **pages, unsigned long npages)
> +{
> +	unsigned long index;
> +
> +	for (index = 0; index < npages; index++)
> +		put_user_page(pages[index]);
> +}
> +EXPORT_SYMBOL(put_user_pages);
> +
>  /*
>   * get_kernel_pages() - pin kernel pages in memory
>   * @kiov:	An array of struct kvec structures
> -- 
> 2.20.1
> 

On 2/8/19 2:32 AM, Mike Rapoport wrote:
> On Thu, Feb 07, 2019 at 11:56:48PM -0800, john.hubbard@gmail.com wrote:
>> From: John Hubbard <jhubbard@nvidia.com>
[...]
>> +/**
>> + * put_user_page() - release a gup-pinned page
>> + * @page: pointer to page to be released
>> + *
>> + * Pages that were pinned via get_user_pages*() must be released via
>> + * either put_user_page(), or one of the put_user_pages*() routines
>> + * below. This is so that eventually, pages that are pinned via
>> + * get_user_pages*() can be separately tracked and uniquely handled. In
>> + * particular, interactions with RDMA and filesystems need special
>> + * handling.
>> + *
>> + * put_user_page() and put_page() are not interchangeable, despite this early
>> + * implementation that makes them look the same. put_user_page() calls must
> 
> I just hope we'll remember to update when the real implementation will be
> merged ;-)
> 
> Other than that, feel free to add
> 
> Reviewed-by: Mike Rapoport <rppt@linux.ibm.com> # docs
> 

Thanks for the review! Yes, the follow-on patch that turns this into a real
implementation is posted [1], and its documentation is updated accordingly.
(I've already changed "@Returns" to "@Return" locally in that patch, btw.)

[1] https://lore.kernel.org/r/20190204052135.25784-5-jhubbard@nvidia.com

thanks,
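
As an illustration of the conversion described in step 2 of the patch
description (this sketch is not part of the patch or of the thread), here is
what a single, hypothetical driver call site might look like before and after
the change. The example_unpin_pages_*() functions and their arguments are
invented for illustration; the pages are assumed to have been pinned earlier
with one of the get_user_pages*() variants.

/*
 * Hypothetical driver teardown path, before and after conversion to the
 * put_user_page*() helpers introduced above. Illustration only.
 */

/* Before: the open-coded pattern that this series replaces. */
static void example_unpin_pages_old(struct page **pages,
				    unsigned long npages, bool dirty)
{
	unsigned long i;

	for (i = 0; i < npages; i++) {
		if (dirty)
			set_page_dirty_lock(pages[i]);
		put_page(pages[i]);	/* step 2 replaces these put_page() calls */
	}
}

/* After: the same teardown, expressed with the new batched helpers. */
static void example_unpin_pages_new(struct page **pages,
				    unsigned long npages, bool dirty)
{
	if (dirty)
		put_user_pages_dirty_lock(pages, npages);
	else
		put_user_pages(pages, npages);
}

Besides being shorter, routing every release through put_user_page*() is what
makes the later steps possible: once gup-pinned pages are tracked separately
from ordinary page references (steps 3 and 4), these helpers become the single
place where that tracking is dropped.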