
[2/6] mm: introduce put_user_page*(), placeholder versions

Message ID 20181012060014.10242-3-jhubbard@nvidia.com (mailing list archive)
State New, archived
Series RFC: gup+dma: tracking dma-pinned pages

Commit Message

john.hubbard@gmail.com Oct. 12, 2018, 6 a.m. UTC
From: John Hubbard <jhubbard@nvidia.com>

Introduces put_user_page(), which simply calls put_page().
This provides a way to update all get_user_pages*() callers,
so that they call put_user_page(), instead of put_page().

Also introduces put_user_pages(), and a few dirty/locked variations,
as a replacement for release_pages() and for open-coded loops that
release multiple pages.
These may be used for subsequent performance improvements,
via batching of pages to be released.

This is the first step of fixing the problem described in [1]. The steps
are:

1) (This patch): provide put_user_page*() routines, intended to be used
   for releasing pages that were pinned via get_user_pages*().

2) Convert all of the call sites for get_user_pages*(), to
   invoke put_user_page*(), instead of put_page(). This involves dozens of
   call sites, and will take some time; a sketch of a typical conversion
   appears below.

3) After (2) is complete, use get_user_pages*() and put_user_page*() to
   implement tracking of these pages. This tracking will be separate from
   the existing struct page refcounting.

4) Use the tracking and identification of these pages, to implement
   special handling (especially in writeback paths) when the pages are
   backed by a filesystem. Again, [1] provides details as to why that is
   desirable.

[1] https://lwn.net/Articles/753027/ : "The Trouble with get_user_pages()"
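
To illustrate the conversion in step (2), a typical call site change
would look approximately like the following (hypothetical caller code,
for illustration only; the details vary per call site):

	/* Before: open-coded loop that dirties and releases pinned pages. */
	for (index = 0; index < npages; index++) {
		set_page_dirty_lock(pages[index]);
		put_page(pages[index]);
	}

	/* After: */
	put_user_pages_dirty_lock(pages, npages);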

CC: Matthew Wilcox <willy@infradead.org>
CC: Michal Hocko <mhocko@kernel.org>
CC: Christopher Lameter <cl@linux.com>
CC: Jason Gunthorpe <jgg@ziepe.ca>
CC: Dan Williams <dan.j.williams@intel.com>
CC: Jan Kara <jack@suse.cz>
CC: Al Viro <viro@zeniv.linux.org.uk>
CC: Jerome Glisse <jglisse@redhat.com>
CC: Christoph Hellwig <hch@infradead.org>
CC: Ralph Campbell <rcampbell@nvidia.com>

Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: John Hubbard <jhubbard@nvidia.com>
---
 include/linux/mm.h | 20 +++++++++++
 mm/swap.c          | 83 ++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 103 insertions(+)

Comments

Education Directorate Oct. 12, 2018, 7:35 a.m. UTC | #1
On Thu, Oct 11, 2018 at 11:00:10PM -0700, john.hubbard@gmail.com wrote:
> From: John Hubbard <jhubbard@nvidia.com>
> 
> Introduces put_user_page(), which simply calls put_page().
> This provides a way to update all get_user_pages*() callers,
> so that they call put_user_page(), instead of put_page().
> 
> Also introduces put_user_pages(), and a few dirty/locked variations,
> as a replacement for release_pages() and for open-coded loops that
> release multiple pages.
> These may be used for subsequent performance improvements,
> via batching of pages to be released.
> 
> This is the first step of fixing the problem described in [1]. The steps
> are:
> 
> 1) (This patch): provide put_user_page*() routines, intended to be used
>    for releasing pages that were pinned via get_user_pages*().
> 
> 2) Convert all of the call sites for get_user_pages*(), to
>    invoke put_user_page*(), instead of put_page(). This involves dozens of
>    call sites, and will take some time.
> 
> 3) After (2) is complete, use get_user_pages*() and put_user_page*() to
>    implement tracking of these pages. This tracking will be separate from
>    the existing struct page refcounting.
> 
> 4) Use the tracking and identification of these pages, to implement
>    special handling (especially in writeback paths) when the pages are
>    backed by a filesystem. Again, [1] provides details as to why that is
>    desirable.
> 
> [1] https://lwn.net/Articles/753027/ : "The Trouble with get_user_pages()"
> 
> CC: Matthew Wilcox <willy@infradead.org>
> CC: Michal Hocko <mhocko@kernel.org>
> CC: Christopher Lameter <cl@linux.com>
> CC: Jason Gunthorpe <jgg@ziepe.ca>
> CC: Dan Williams <dan.j.williams@intel.com>
> CC: Jan Kara <jack@suse.cz>
> CC: Al Viro <viro@zeniv.linux.org.uk>
> CC: Jerome Glisse <jglisse@redhat.com>
> CC: Christoph Hellwig <hch@infradead.org>
> CC: Ralph Campbell <rcampbell@nvidia.com>
> 
> Reviewed-by: Jan Kara <jack@suse.cz>
> Signed-off-by: John Hubbard <jhubbard@nvidia.com>
> ---
>  include/linux/mm.h | 20 +++++++++++
>  mm/swap.c          | 83 ++++++++++++++++++++++++++++++++++++++++++++++
>  2 files changed, 103 insertions(+)
> 
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index 0416a7204be3..76d18aada9f8 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -943,6 +943,26 @@ static inline void put_page(struct page *page)
>  		__put_page(page);
>  }
>  
> +/*
> + * put_user_page() - release a page that had previously been acquired via
> + * a call to one of the get_user_pages*() functions.
> + *
> + * Pages that were pinned via get_user_pages*() must be released via
> + * either put_user_page(), or one of the put_user_pages*() routines
> + * below. This is so that eventually, pages that are pinned via
> + * get_user_pages*() can be separately tracked and uniquely handled. In
> + * particular, interactions with RDMA and filesystems need special
> + * handling.
> + */
> +static inline void put_user_page(struct page *page)
> +{
> +	put_page(page);
> +}
> +
> +void put_user_pages_dirty(struct page **pages, unsigned long npages);
> +void put_user_pages_dirty_lock(struct page **pages, unsigned long npages);
> +void put_user_pages(struct page **pages, unsigned long npages);
> +
>  #if defined(CONFIG_SPARSEMEM) && !defined(CONFIG_SPARSEMEM_VMEMMAP)
>  #define SECTION_IN_PAGE_FLAGS
>  #endif
> diff --git a/mm/swap.c b/mm/swap.c
> index 26fc9b5f1b6c..efab3a6b6f91 100644
> --- a/mm/swap.c
> +++ b/mm/swap.c
> @@ -134,6 +134,89 @@ void put_pages_list(struct list_head *pages)
>  }
>  EXPORT_SYMBOL(put_pages_list);
>  
> +/*
> + * put_user_pages_dirty() - for each page in the @pages array, make
> + * that page (or its head page, if a compound page) dirty, if it was
> + * previously listed as clean. Then, release the page using
> + * put_user_page().
> + *
> + * Please see the put_user_page() documentation for details.
> + *
> + * set_page_dirty(), which does not lock the page, is used here.
> + * Therefore, it is the caller's responsibility to ensure that this is
> + * safe. If not, then put_user_pages_dirty_lock() should be called instead.
> + *
> + * @pages:  array of pages to be marked dirty and released.
> + * @npages: number of pages in the @pages array.
> + *
> + */
> +void put_user_pages_dirty(struct page **pages, unsigned long npages)
> +{
> +	unsigned long index;
> +
> +	for (index = 0; index < npages; index++) {
Do we need any checks on npages, npages <= (PUD_SHIFT - PAGE_SHIFT)?

> +		struct page *page = compound_head(pages[index]);
> +
> +		if (!PageDirty(page))
> +			set_page_dirty(page);
> +
> +		put_user_page(page);
> +	}
> +}
> +EXPORT_SYMBOL(put_user_pages_dirty);
> +
> +/*
> + * put_user_pages_dirty_lock() - for each page in the @pages array, make
> + * that page (or its head page, if a compound page) dirty, if it was
> + * previously listed as clean. Then, release the page using
> + * put_user_page().
> + *
> + * Please see the put_user_page() documentation for details.
> + *
> + * This is just like put_user_pages_dirty(), except that it invokes
> + * set_page_dirty_lock(), instead of set_page_dirty().
> + *
> + * @pages:  array of pages to be marked dirty and released.
> + * @npages: number of pages in the @pages array.
> + *
> + */
> +void put_user_pages_dirty_lock(struct page **pages, unsigned long npages)
> +{
> +	unsigned long index;
> +
> +	for (index = 0; index < npages; index++) {
> +		struct page *page = compound_head(pages[index]);
> +
> +		if (!PageDirty(page))
> +			set_page_dirty_lock(page);
> +
> +		put_user_page(page);
> +	}
> +}
> +EXPORT_SYMBOL(put_user_pages_dirty_lock);
> +

This can be collapsed together with put_user_pages_dirty(), using a function
pointer indirection for the locked vs. unlocked case; I'm not sure how that
affects function optimization.


> +/*
> + * put_user_pages() - for each page in the @pages array, release the page
> + * using put_user_page().
> + *
> + * Please see the put_user_page() documentation for details.
> + *
> + * This is just like put_user_pages_dirty(), except that it invokes
> + * set_page_dirty_lock(), instead of set_page_dirty().

The comment is incorrect.

> + *
> + * @pages:  array of pages to be marked dirty and released.
> + * @npages: number of pages in the @pages array.
> + *
> + */
> +void put_user_pages(struct page **pages, unsigned long npages)
> +{
> +	unsigned long index;
> +
> +	for (index = 0; index < npages; index++)
> +		put_user_page(pages[index]);
> +}

Ditto in terms of code duplication

How about

for_each_page_index(index, npages) {
	<do the dirty bits if needed>
	put_user_page(pages[index]);
}

Then pass what you want the page iterator to do
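
For example, a rough sketch (the iterator name and exact form here are
made up, just to show the idea):

#define for_each_page_index(index, npages) \
	for (index = 0; index < (npages); index++)

void put_user_pages_dirty(struct page **pages, unsigned long npages)
{
	unsigned long index;

	for_each_page_index(index, npages) {
		struct page *page = compound_head(pages[index]);

		if (!PageDirty(page))
			set_page_dirty(page);

		put_user_page(page);
	}
}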


> +EXPORT_SYMBOL(put_user_pages);
> +
>  /*
>   * get_kernel_pages() - pin kernel pages in memory
>   * @kiov:	An array of struct kvec structures
> -- 
> 2.19.1
> 

Balbir Singh.
John Hubbard Oct. 12, 2018, 10:31 p.m. UTC | #2
On 10/12/18 12:35 AM, Balbir Singh wrote:
> On Thu, Oct 11, 2018 at 11:00:10PM -0700, john.hubbard@gmail.com wrote:
>> From: John Hubbard <jhubbard@nvidia.com>
[...]
>> +/*
>> + * put_user_pages_dirty() - for each page in the @pages array, make
>> + * that page (or its head page, if a compound page) dirty, if it was
>> + * previously listed as clean. Then, release the page using
>> + * put_user_page().
>> + *
>> + * Please see the put_user_page() documentation for details.
>> + *
>> + * set_page_dirty(), which does not lock the page, is used here.
>> + * Therefore, it is the caller's responsibility to ensure that this is
>> + * safe. If not, then put_user_pages_dirty_lock() should be called instead.
>> + *
>> + * @pages:  array of pages to be marked dirty and released.
>> + * @npages: number of pages in the @pages array.
>> + *
>> + */
>> +void put_user_pages_dirty(struct page **pages, unsigned long npages)
>> +{
>> +	unsigned long index;
>> +
>> +	for (index = 0; index < npages; index++) {
> Do we need any checks on npages, npages <= (PUD_SHIFT - PAGE_SHIFT)?
> 

Hi Balbir,

Thanks for combing through this series.

I'd go with "probably not", because the only check that can be done is
what you showed above: "did someone crazy pass in more pages than are
possible for this system?". I don't think checking for that helps here,
as that will show up loud and clear in other ways.

The release_pages() implementation made the same judgment call to not 
check npages, which also influenced me.

A VM_BUG_ON could be added, but I'd prefer not to, as it doesn't seem
to provide enough benefit to be worth it.
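
For reference, the kind of check being discussed would amount to
something like this (a sketch of the idea only, not something I'm
proposing to add):

	VM_BUG_ON(npages > totalram_pages);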


>> +		struct page *page = compound_head(pages[index]);
>> +
>> +		if (!PageDirty(page))
>> +			set_page_dirty(page);
>> +
>> +		put_user_page(page);
>> +	}
>> +}
>> +EXPORT_SYMBOL(put_user_pages_dirty);
>> +
>> +/*
>> + * put_user_pages_dirty_lock() - for each page in the @pages array, make
>> + * that page (or its head page, if a compound page) dirty, if it was
>> + * previously listed as clean. Then, release the page using
>> + * put_user_page().
>> + *
>> + * Please see the put_user_page() documentation for details.
>> + *
>> + * This is just like put_user_pages_dirty(), except that it invokes
>> + * set_page_dirty_lock(), instead of set_page_dirty().
>> + *
>> + * @pages:  array of pages to be marked dirty and released.
>> + * @npages: number of pages in the @pages array.
>> + *
>> + */
>> +void put_user_pages_dirty_lock(struct page **pages, unsigned long npages)
>> +{
>> +	unsigned long index;
>> +
>> +	for (index = 0; index < npages; index++) {
>> +		struct page *page = compound_head(pages[index]);
>> +
>> +		if (!PageDirty(page))
>> +			set_page_dirty_lock(page);
>> +
>> +		put_user_page(page);
>> +	}
>> +}
>> +EXPORT_SYMBOL(put_user_pages_dirty_lock);
>> +
> 
> This can be collapsed together with put_user_pages_dirty(), using a function
> pointer indirection for the locked vs. unlocked case; I'm not sure how that
> affects function optimization.
> 

OK, good point. I initially wanted to avoid the overhead of a function 
pointer, but considering that there are lots of other operations 
happening, I think you're right: best to just get rid of the code 
duplication. If later on we find that changing it back actually helps 
any benchmarks, that can always be done.

See below for how I'm planning on fixing it, and it is a nice little 
cleanup, so thanks for pointing that out.

>> +/*
>> + * put_user_pages() - for each page in the @pages array, release the page
>> + * using put_user_page().
>> + *
>> + * Please see the put_user_page() documentation for details.
>> + *
>> + * This is just like put_user_pages_dirty(), except that it invokes
>> + * set_page_dirty_lock(), instead of set_page_dirty().
> 
> The comment is incorrect.

Yes, it sure is! Jan spotted it before, and I fixed it once, then 
rebased off of the version right *before* the fix, so now I have to 
delete that sentence again.  It's hard to kill! :)

> 
>> + *
>> + * @pages:  array of pages to be marked dirty and released.
>> + * @npages: number of pages in the @pages array.
>> + *
>> + */
>> +void put_user_pages(struct page **pages, unsigned long npages)
>> +{
>> +	unsigned long index;
>> +
>> +	for (index = 0; index < npages; index++)
>> +		put_user_page(pages[index]);
>> +}
> 
> Ditto in terms of code duplication
> 

Here, I think you'll find that the end result is sufficiently
de-duplicated after applying the function pointer above. Here's what it
looks like without the comment blocks, below. I don't want to go any
further than this, because the only thing left is the "for" loops, and
macro-izing such a trivial thing is not really a win:


typedef int (*set_dirty_func)(struct page *page);

static void __put_user_pages_dirty(struct page **pages,
				   unsigned long npages,
				   set_dirty_func sdf)
{
	unsigned long index;

	for (index = 0; index < npages; index++) {
		struct page *page = compound_head(pages[index]);

		if (!PageDirty(page))
			sdf(page);

		put_user_page(page);
	}
}

void put_user_pages_dirty(struct page **pages, unsigned long npages)
{
	__put_user_pages_dirty(pages, npages, set_page_dirty);
}
EXPORT_SYMBOL(put_user_pages_dirty);

void put_user_pages_dirty_lock(struct page **pages, unsigned long npages)
{
	__put_user_pages_dirty(pages, npages, set_page_dirty_lock);
}
EXPORT_SYMBOL(put_user_pages_dirty_lock);

void put_user_pages(struct page **pages, unsigned long npages)
{
	unsigned long index;

	for (index = 0; index < npages; index++)
		put_user_page(pages[index]);
}
EXPORT_SYMBOL(put_user_pages);
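
For context, a typical caller of these routines would end up looking
something like this (hypothetical code, not an actual call site in the
tree):

static int example_pin_and_dma(unsigned long start)
{
	struct page *pages[16];
	int nr;

	/* Pin the user pages for a device write (hence write == 1). */
	nr = get_user_pages_fast(start, 16, 1, pages);
	if (nr <= 0)
		return nr;

	/* ... set up the device to DMA into the pinned pages ... */

	/* Dirty and release via the new API, instead of put_page(). */
	put_user_pages_dirty_lock(pages, nr);
	return nr;
}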

Patch

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 0416a7204be3..76d18aada9f8 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -943,6 +943,26 @@  static inline void put_page(struct page *page)
 		__put_page(page);
 }
 
+/*
+ * put_user_page() - release a page that had previously been acquired via
+ * a call to one of the get_user_pages*() functions.
+ *
+ * Pages that were pinned via get_user_pages*() must be released via
+ * either put_user_page(), or one of the put_user_pages*() routines
+ * below. This is so that eventually, pages that are pinned via
+ * get_user_pages*() can be separately tracked and uniquely handled. In
+ * particular, interactions with RDMA and filesystems need special
+ * handling.
+ */
+static inline void put_user_page(struct page *page)
+{
+	put_page(page);
+}
+
+void put_user_pages_dirty(struct page **pages, unsigned long npages);
+void put_user_pages_dirty_lock(struct page **pages, unsigned long npages);
+void put_user_pages(struct page **pages, unsigned long npages);
+
 #if defined(CONFIG_SPARSEMEM) && !defined(CONFIG_SPARSEMEM_VMEMMAP)
 #define SECTION_IN_PAGE_FLAGS
 #endif
diff --git a/mm/swap.c b/mm/swap.c
index 26fc9b5f1b6c..efab3a6b6f91 100644
--- a/mm/swap.c
+++ b/mm/swap.c
@@ -134,6 +134,89 @@  void put_pages_list(struct list_head *pages)
 }
 EXPORT_SYMBOL(put_pages_list);
 
+/*
+ * put_user_pages_dirty() - for each page in the @pages array, make
+ * that page (or its head page, if a compound page) dirty, if it was
+ * previously listed as clean. Then, release the page using
+ * put_user_page().
+ *
+ * Please see the put_user_page() documentation for details.
+ *
+ * set_page_dirty(), which does not lock the page, is used here.
+ * Therefore, it is the caller's responsibility to ensure that this is
+ * safe. If not, then put_user_pages_dirty_lock() should be called instead.
+ *
+ * @pages:  array of pages to be marked dirty and released.
+ * @npages: number of pages in the @pages array.
+ *
+ */
+void put_user_pages_dirty(struct page **pages, unsigned long npages)
+{
+	unsigned long index;
+
+	for (index = 0; index < npages; index++) {
+		struct page *page = compound_head(pages[index]);
+
+		if (!PageDirty(page))
+			set_page_dirty(page);
+
+		put_user_page(page);
+	}
+}
+EXPORT_SYMBOL(put_user_pages_dirty);
+
+/*
+ * put_user_pages_dirty_lock() - for each page in the @pages array, make
+ * that page (or its head page, if a compound page) dirty, if it was
+ * previously listed as clean. Then, release the page using
+ * put_user_page().
+ *
+ * Please see the put_user_page() documentation for details.
+ *
+ * This is just like put_user_pages_dirty(), except that it invokes
+ * set_page_dirty_lock(), instead of set_page_dirty().
+ *
+ * @pages:  array of pages to be marked dirty and released.
+ * @npages: number of pages in the @pages array.
+ *
+ */
+void put_user_pages_dirty_lock(struct page **pages, unsigned long npages)
+{
+	unsigned long index;
+
+	for (index = 0; index < npages; index++) {
+		struct page *page = compound_head(pages[index]);
+
+		if (!PageDirty(page))
+			set_page_dirty_lock(page);
+
+		put_user_page(page);
+	}
+}
+EXPORT_SYMBOL(put_user_pages_dirty_lock);
+
+/*
+ * put_user_pages() - for each page in the @pages array, release the page
+ * using put_user_page().
+ *
+ * Please see the put_user_page() documentation for details.
+ *
+ * This is just like put_user_pages_dirty(), except that it invokes
+ * set_page_dirty_lock(), instead of set_page_dirty().
+ *
+ * @pages:  array of pages to be marked dirty and released.
+ * @npages: number of pages in the @pages array.
+ *
+ */
+void put_user_pages(struct page **pages, unsigned long npages)
+{
+	unsigned long index;
+
+	for (index = 0; index < npages; index++)
+		put_user_page(pages[index]);
+}
+EXPORT_SYMBOL(put_user_pages);
+
 /*
  * get_kernel_pages() - pin kernel pages in memory
  * @kiov:	An array of struct kvec structures