
[3/4] infiniband/mm: convert to the new put_user_page() call

Message ID 20180928053949.5381-3-jhubbard@nvidia.com (mailing list archive)
State New, archived
Series get_user_pages*() and RDMA: first steps

Commit Message

john.hubbard@gmail.com Sept. 28, 2018, 5:39 a.m. UTC
From: John Hubbard <jhubbard@nvidia.com>

For code that retains pages via get_user_pages*(),
release those pages via the new put_user_page(),
instead of put_page().

This prepares for eventually fixing the problem described
in [1], and is following a plan listed in [2].

[1] https://lwn.net/Articles/753027/ : "The Trouble with get_user_pages()"

[2] https://lkml.kernel.org/r/20180709080554.21931-1-jhubbard@nvidia.com
    Proposed steps for fixing get_user_pages() + DMA problems.

CC: Doug Ledford <dledford@redhat.com>
CC: Jason Gunthorpe <jgg@ziepe.ca>
CC: Mike Marciniszyn <mike.marciniszyn@intel.com>
CC: Dennis Dalessandro <dennis.dalessandro@intel.com>
CC: Christian Benvenuti <benve@cisco.com>

CC: linux-rdma@vger.kernel.org
CC: linux-kernel@vger.kernel.org
CC: linux-mm@kvack.org
Signed-off-by: John Hubbard <jhubbard@nvidia.com>
---
 drivers/infiniband/core/umem.c              | 2 +-
 drivers/infiniband/core/umem_odp.c          | 2 +-
 drivers/infiniband/hw/hfi1/user_pages.c     | 2 +-
 drivers/infiniband/hw/mthca/mthca_memfree.c | 6 +++---
 drivers/infiniband/hw/qib/qib_user_pages.c  | 2 +-
 drivers/infiniband/hw/qib/qib_user_sdma.c   | 8 ++++----
 drivers/infiniband/hw/usnic/usnic_uiom.c    | 2 +-
 7 files changed, 12 insertions(+), 12 deletions(-)

Comments

Jason Gunthorpe Sept. 28, 2018, 3:39 p.m. UTC | #1
On Thu, Sep 27, 2018 at 10:39:47PM -0700, john.hubbard@gmail.com wrote:
> From: John Hubbard <jhubbard@nvidia.com>
> 
> For code that retains pages via get_user_pages*(),
> release those pages via the new put_user_page(),
> instead of put_page().
> 
> This prepares for eventually fixing the problem described
> in [1], and is following a plan listed in [2].
> 
> [1] https://lwn.net/Articles/753027/ : "The Trouble with get_user_pages()"
> 
> [2] https://lkml.kernel.org/r/20180709080554.21931-1-jhubbard@nvidia.com
>     Proposed steps for fixing get_user_pages() + DMA problems.
> 
> CC: Doug Ledford <dledford@redhat.com>
> CC: Jason Gunthorpe <jgg@ziepe.ca>
> CC: Mike Marciniszyn <mike.marciniszyn@intel.com>
> CC: Dennis Dalessandro <dennis.dalessandro@intel.com>
> CC: Christian Benvenuti <benve@cisco.com>
> 
> CC: linux-rdma@vger.kernel.org
> CC: linux-kernel@vger.kernel.org
> CC: linux-mm@kvack.org
> Signed-off-by: John Hubbard <jhubbard@nvidia.com>
>  drivers/infiniband/core/umem.c              | 2 +-
>  drivers/infiniband/core/umem_odp.c          | 2 +-
>  drivers/infiniband/hw/hfi1/user_pages.c     | 2 +-
>  drivers/infiniband/hw/mthca/mthca_memfree.c | 6 +++---
>  drivers/infiniband/hw/qib/qib_user_pages.c  | 2 +-
>  drivers/infiniband/hw/qib/qib_user_sdma.c   | 8 ++++----
>  drivers/infiniband/hw/usnic/usnic_uiom.c    | 2 +-
>  7 files changed, 12 insertions(+), 12 deletions(-)
> 
> diff --git a/drivers/infiniband/core/umem.c b/drivers/infiniband/core/umem.c
> index a41792dbae1f..9430d697cb9f 100644
> +++ b/drivers/infiniband/core/umem.c
> @@ -60,7 +60,7 @@ static void __ib_umem_release(struct ib_device *dev, struct ib_umem *umem, int d
>  		page = sg_page(sg);
>  		if (!PageDirty(page) && umem->writable && dirty)
>  			set_page_dirty_lock(page);
> -		put_page(page);
> +		put_user_page(page);

Would it make sense to have a release/put_user_pages_dirtied to absorb
the set_page_dirty pattern too? I notice in this patch there is some
variety here; I wonder what the right way is?

Also, I'm told this code here is a big performance bottleneck when the
number of pages becomes very large (think >> GB of memory), so having a
future path to use some kind of batching/threading sounds great.

Otherwise this RDMA part seems fine to me, there might be some minor
conflicts however. I assume you want to run this through the -mm tree?

Acked-by: Jason Gunthorpe <jgg@mellanox.com>

Jason
John Hubbard Sept. 29, 2018, 3:12 a.m. UTC | #2
On 9/28/18 8:39 AM, Jason Gunthorpe wrote:
> On Thu, Sep 27, 2018 at 10:39:47PM -0700, john.hubbard@gmail.com wrote:
>> From: John Hubbard <jhubbard@nvidia.com>
[...]
>>
>> diff --git a/drivers/infiniband/core/umem.c b/drivers/infiniband/core/umem.c
>> index a41792dbae1f..9430d697cb9f 100644
>> +++ b/drivers/infiniband/core/umem.c
>> @@ -60,7 +60,7 @@ static void __ib_umem_release(struct ib_device *dev, struct ib_umem *umem, int d
>>  		page = sg_page(sg);
>>  		if (!PageDirty(page) && umem->writable && dirty)
>>  			set_page_dirty_lock(page);
>> -		put_page(page);
>> +		put_user_page(page);
> 
> Would it make sense to have a release/put_user_pages_dirtied to absorb
> the set_page_dity pattern too? I notice in this patch there is some
> variety here, I wonder what is the right way?
> 
> Also, I'm told this code here is a big performance bottleneck when the
> number of pages becomes very long (think >> GB of memory), so having a
> future path to use some kind of batching/threading sound great.
> 

Yes. And you asked for this the first time, too. Consistent! :) Sorry for
being slow to pick it up. It looks like there are several patterns, and
we have to support both set_page_dirty() and set_page_dirty_lock(). So
the best combination looks to be adding a few variations of
release_user_pages*(), but leaving put_user_page() alone, because it's
the "do it yourself" basic one. Scatter-gather will be stuck with that.

Here's a differential patch with that. It shows a nice little cleanup in
a couple of IB places and, as you point out, it also provides the hooks for
future performance upgrades (via batching).

Does this API look about right?

diff --git a/drivers/infiniband/hw/hfi1/user_pages.c b/drivers/infiniband/hw/hfi1/user_pages.c
index c7516029af33..48afec362c31 100644
--- a/drivers/infiniband/hw/hfi1/user_pages.c
+++ b/drivers/infiniband/hw/hfi1/user_pages.c
@@ -123,11 +123,7 @@ void hfi1_release_user_pages(struct mm_struct *mm, struct page **p,
 {
        size_t i;
 
-       for (i = 0; i < npages; i++) {
-               if (dirty)
-                       set_page_dirty_lock(p[i]);
-               put_user_page(p[i]);
-       }
+       release_user_pages_lock(p, npages, dirty);
 
        if (mm) { /* during close after signal, mm can be NULL */
                down_write(&mm->mmap_sem);
diff --git a/drivers/infiniband/hw/qib/qib_user_pages.c b/drivers/infiniband/hw/qib/qib_user_pages.c
index 3f8fd42dd7fc..c57a3a6730b6 100644
--- a/drivers/infiniband/hw/qib/qib_user_pages.c
+++ b/drivers/infiniband/hw/qib/qib_user_pages.c
@@ -40,13 +40,7 @@
 static void __qib_release_user_pages(struct page **p, size_t num_pages,
                                     int dirty)
 {
-       size_t i;
-
-       for (i = 0; i < num_pages; i++) {
-               if (dirty)
-                       set_page_dirty_lock(p[i]);
-               put_user_page(p[i]);
-       }
+       release_user_pages_lock(p, num_pages, dirty);
 }
 
 /*
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 72caf803115f..b280d0181e06 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -138,6 +138,9 @@ extern int overcommit_ratio_handler(struct ctl_table *, int, void __user *,
 extern int overcommit_kbytes_handler(struct ctl_table *, int, void __user *,
                                    size_t *, loff_t *);
 
+int set_page_dirty(struct page *page);
+int set_page_dirty_lock(struct page *page);
+
 #define nth_page(page,n) pfn_to_page(page_to_pfn((page)) + (n))
 
 /* to align the pointer to the (next) page boundary */
@@ -949,12 +952,56 @@ static inline void put_user_page(struct page *page)
        put_page(page);
 }
 
-/* A drop-in replacement for release_pages(): */
+/* For get_user_pages*()-pinned pages, use these variants instead of
+ * release_pages():
+ */
+static inline void release_user_pages_dirty(struct page **pages,
+                                           unsigned long npages)
+{
+       while (npages) {
+               --npages;
+               set_page_dirty(pages[npages]);
+               put_user_page(pages[npages]);
+       }
+}
+
+static inline void release_user_pages_dirty_lock(struct page **pages,
+                                                unsigned long npages)
+{
+       while (npages) {
+               --npages;
+               set_page_dirty_lock(pages[npages]);
+               put_user_page(pages[npages]);
+       }
+}
+
+static inline void release_user_pages_basic(struct page **pages,
+                                           unsigned long npages)
+{
+       while (npages) {
+               --npages;
+               put_user_page(pages[npages]);
+       }
+}
+
 static inline void release_user_pages(struct page **pages,
-                                     unsigned long npages)
+                                     unsigned long npages,
+                                     bool set_dirty)
 {
-       while (npages)
-               put_user_page(pages[--npages]);
+       if (set_dirty)
+               release_user_pages_dirty(pages, npages);
+       else
+               release_user_pages_basic(pages, npages);
+}
+
+static inline void release_user_pages_lock(struct page **pages,
+                                          unsigned long npages,
+                                          bool set_dirty)
+{
+       if (set_dirty)
+               release_user_pages_dirty_lock(pages, npages);
+       else
+               release_user_pages_basic(pages, npages);
 }
 
 #if defined(CONFIG_SPARSEMEM) && !defined(CONFIG_SPARSEMEM_VMEMMAP)
@@ -1548,8 +1595,6 @@ int redirty_page_for_writepage(struct writeback_control *wbc,
 void account_page_dirtied(struct page *page, struct address_space *mapping);
 void account_page_cleaned(struct page *page, struct address_space *mapping,
                          struct bdi_writeback *wb);
-int set_page_dirty(struct page *page);
-int set_page_dirty_lock(struct page *page);
 void __cancel_dirty_page(struct page *page);
 static inline void cancel_dirty_page(struct page *page)
 {


> Otherwise this RDMA part seems fine to me, there might be some minor
> conflicts however. I assume you want to run this through the -mm tree?
> 
> Acked-by: Jason Gunthorpe <jgg@mellanox.com>
> 

Great, thanks for the ACK.


thanks,
Matthew Wilcox Sept. 29, 2018, 4:21 p.m. UTC | #3
On Fri, Sep 28, 2018 at 08:12:33PM -0700, John Hubbard wrote:
> >> +++ b/drivers/infiniband/core/umem.c
> >> @@ -60,7 +60,7 @@ static void __ib_umem_release(struct ib_device *dev, struct ib_umem *umem, int d
> >>  		page = sg_page(sg);
> >>  		if (!PageDirty(page) && umem->writable && dirty)
> >>  			set_page_dirty_lock(page);
> >> -		put_page(page);
> >> +		put_user_page(page);
> > 
> > Would it make sense to have a release/put_user_pages_dirtied to absorb
> > the set_page_dity pattern too? I notice in this patch there is some
> > variety here, I wonder what is the right way?
> > 
> > Also, I'm told this code here is a big performance bottleneck when the
> > number of pages becomes very long (think >> GB of memory), so having a
> > future path to use some kind of batching/threading sound great.
> 
> Yes. And you asked for this the first time, too. Consistent! :) Sorry for
> being slow to pick it up. It looks like there are several patterns, and
> we have to support both set_page_dirty() and set_page_dirty_lock(). So
> the best combination looks to be adding a few variations of
> release_user_pages*(), but leaving put_user_page() alone, because it's
> the "do it yourself" basic one. Scatter-gather will be stuck with that.

I think our current interfaces are wrong.  We should really have a
get_user_sg() / put_user_sg() function that will set up / destroy an
SG list appropriate for that range of user memory.  This is almost
orthogonal to the original intent here, so please don't see this as a
"must do first" kind of argument that might derail the whole thing.
Jason Gunthorpe Sept. 29, 2018, 7:19 p.m. UTC | #4
On Sat, Sep 29, 2018 at 09:21:17AM -0700, Matthew Wilcox wrote:
> On Fri, Sep 28, 2018 at 08:12:33PM -0700, John Hubbard wrote:
> > >> +++ b/drivers/infiniband/core/umem.c
> > >> @@ -60,7 +60,7 @@ static void __ib_umem_release(struct ib_device *dev, struct ib_umem *umem, int d
> > >>  		page = sg_page(sg);
> > >>  		if (!PageDirty(page) && umem->writable && dirty)
> > >>  			set_page_dirty_lock(page);
> > >> -		put_page(page);
> > >> +		put_user_page(page);
> > > 
> > > Would it make sense to have a release/put_user_pages_dirtied to absorb
> > > the set_page_dity pattern too? I notice in this patch there is some
> > > variety here, I wonder what is the right way?
> > > 
> > > Also, I'm told this code here is a big performance bottleneck when the
> > > number of pages becomes very long (think >> GB of memory), so having a
> > > future path to use some kind of batching/threading sound great.
> > 
> > Yes. And you asked for this the first time, too. Consistent! :) Sorry for
> > being slow to pick it up. It looks like there are several patterns, and
> > we have to support both set_page_dirty() and set_page_dirty_lock(). So
> > the best combination looks to be adding a few variations of
> > release_user_pages*(), but leaving put_user_page() alone, because it's
> > the "do it yourself" basic one. Scatter-gather will be stuck with that.
> 
> I think our current interfaces are wrong.  We should really have a
> get_user_sg() / put_user_sg() function that will set up / destroy an
> SG list appropriate for that range of user memory.  This is almost
> orthogonal to the original intent here, so please don't see this as a
> "must do first" kind of argument that might derail the whole thing.

This would be an excellent API, and it is exactly what this 'umem'
code in RDMA is trying to do for RDMA drivers..

I suspect it could do a much better job if it wasn't hindered by the
get_user_pages API - i.e., it could stuff huge pages directly into the SG
list instead of breaking them up, and maybe also avoid the temporary memory
allocations for page * pointers, etc.?

Jason
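
For a purely illustrative picture of the idea, here is a rough sketch of what
a get_user_sg()/put_user_sg() pair could look like if built from today's
primitives. The names, signatures and error handling are hypothetical (not
anything posted in this thread), and it assumes the 4.x-era
get_user_pages_fast(start, nr_pages, write, pages) form:

#include <linux/mm.h>
#include <linux/scatterlist.h>
#include <linux/slab.h>

/*
 * Hypothetical sketch: pin a user range and describe it with a scatterlist.
 * A real implementation would presumably avoid the temporary page array.
 */
static int get_user_sg(unsigned long start, unsigned long nr_pages,
		       int write, struct sg_table *sgt)
{
	struct page **pages;
	long pinned;
	int ret;

	pages = kvmalloc_array(nr_pages, sizeof(*pages), GFP_KERNEL);
	if (!pages)
		return -ENOMEM;

	pinned = get_user_pages_fast(start, nr_pages, write, pages);
	if (pinned != nr_pages) {
		ret = pinned < 0 ? pinned : -EFAULT;
		goto out;
	}

	/* Merge contiguous pages into as few sg entries as possible. */
	ret = sg_alloc_table_from_pages(sgt, pages, nr_pages, 0,
					nr_pages << PAGE_SHIFT, GFP_KERNEL);
out:
	if (ret)
		while (pinned > 0)
			put_user_page(pages[--pinned]);
	kvfree(pages);
	return ret;
}

/* Undo get_user_sg(): dirty (if requested) and release every pinned page. */
static void put_user_sg(struct sg_table *sgt, bool dirty)
{
	struct sg_page_iter piter;

	for_each_sg_page(sgt->sgl, &piter, sgt->orig_nents, 0) {
		struct page *page = sg_page_iter_page(&piter);

		if (dirty)
			set_page_dirty_lock(page);
		put_user_page(page);
	}
	sg_free_table(sgt);
}

A caller would then dma-map sgt with dma_map_sg() as usual and call
put_user_sg() on teardown.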
Christoph Hellwig Oct. 1, 2018, 12:50 p.m. UTC | #5
On Sat, Sep 29, 2018 at 09:21:17AM -0700, Matthew Wilcox wrote:
> > being slow to pick it up. It looks like there are several patterns, and
> > we have to support both set_page_dirty() and set_page_dirty_lock(). So
> > the best combination looks to be adding a few variations of
> > release_user_pages*(), but leaving put_user_page() alone, because it's
> > the "do it yourself" basic one. Scatter-gather will be stuck with that.
> 
> I think our current interfaces are wrong.  We should really have a
> get_user_sg() / put_user_sg() function that will set up / destroy an
> SG list appropriate for that range of user memory.  This is almost
> orthogonal to the original intent here, so please don't see this as a
> "must do first" kind of argument that might derail the whole thing.

The SG list really is the wrong interface, as it mixes up information
about the pages/phys addr range and a potential dma mapping.  I think
the right interface is an array of bio_vecs.  In fact I've recently
been looking into a get_user_pages variant that does fill bio_vecs,
as it fundamentally is the right thing for doing I/O on large pages,
and will really help with direct I/O performance in that case.
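
As a purely illustrative sketch of that idea (a hypothetical helper, not an
API from this thread; it assumes a page-aligned start, a caller-supplied bvec
array with room for nr_pages entries, and the 4.x-era get_user_pages_fast()
signature), a bio_vec-filling variant lets the subpages of a huge page
collapse into a single segment:

#include <linux/bvec.h>
#include <linux/mm.h>
#include <linux/slab.h>

/*
 * Hypothetical sketch: pin user pages and describe them as bio_vecs, merging
 * physically contiguous pages into one segment. Whoever releases this must
 * still drop one gup reference per PAGE_SIZE of each segment (e.g. with
 * put_user_page()).
 */
static long get_user_bvec(unsigned long start, unsigned long nr_pages,
			  int write, struct bio_vec *bvec)
{
	struct page **pages;
	long pinned, i, nsegs = 0;

	pages = kvmalloc_array(nr_pages, sizeof(*pages), GFP_KERNEL);
	if (!pages)
		return -ENOMEM;

	pinned = get_user_pages_fast(start, nr_pages, write, pages);

	for (i = 0; i < pinned; i++) {
		struct bio_vec *prev = nsegs ? &bvec[nsegs - 1] : NULL;

		/* Extend the previous segment if this page follows it physically. */
		if (prev && page_to_pfn(pages[i]) ==
		    page_to_pfn(prev->bv_page) + (prev->bv_len >> PAGE_SHIFT)) {
			prev->bv_len += PAGE_SIZE;
			continue;
		}

		bvec[nsegs].bv_page = pages[i];
		bvec[nsegs].bv_offset = 0;
		bvec[nsegs].bv_len = PAGE_SIZE;
		nsegs++;
	}

	kvfree(pages);
	return pinned < 0 ? pinned : nsegs;	/* segments filled, or -errno */
}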
Dennis Dalessandro Oct. 1, 2018, 2:35 p.m. UTC | #6
On 9/28/2018 11:12 PM, John Hubbard wrote:
> On 9/28/18 8:39 AM, Jason Gunthorpe wrote:
>> On Thu, Sep 27, 2018 at 10:39:47PM -0700, john.hubbard@gmail.com wrote:
>>> From: John Hubbard <jhubbard@nvidia.com>
> [...]
>>>
>>> diff --git a/drivers/infiniband/core/umem.c b/drivers/infiniband/core/umem.c
>>> index a41792dbae1f..9430d697cb9f 100644
>>> +++ b/drivers/infiniband/core/umem.c
>>> @@ -60,7 +60,7 @@ static void __ib_umem_release(struct ib_device *dev, struct ib_umem *umem, int d
>>>   		page = sg_page(sg);
>>>   		if (!PageDirty(page) && umem->writable && dirty)
>>>   			set_page_dirty_lock(page);
>>> -		put_page(page);
>>> +		put_user_page(page);
>>
>> Would it make sense to have a release/put_user_pages_dirtied to absorb
>> the set_page_dity pattern too? I notice in this patch there is some
>> variety here, I wonder what is the right way?
>>
>> Also, I'm told this code here is a big performance bottleneck when the
>> number of pages becomes very long (think >> GB of memory), so having a
>> future path to use some kind of batching/threading sound great.
>>
> 
> Yes. And you asked for this the first time, too. Consistent! :) Sorry for
> being slow to pick it up. It looks like there are several patterns, and
> we have to support both set_page_dirty() and set_page_dirty_lock(). So
> the best combination looks to be adding a few variations of
> release_user_pages*(), but leaving put_user_page() alone, because it's
> the "do it yourself" basic one. Scatter-gather will be stuck with that.
> 
> Here's a differential patch with that, that shows a nice little cleanup in
> a couple of IB places, and as you point out, it also provides the hooks for
> performance upgrades (via batching) in the future.
> 
> Does this API look about right?

I'm on board with that and the changes to hfi1 and qib.

Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Matthew Wilcox Oct. 1, 2018, 3:29 p.m. UTC | #7
On Mon, Oct 01, 2018 at 05:50:13AM -0700, Christoph Hellwig wrote:
> On Sat, Sep 29, 2018 at 09:21:17AM -0700, Matthew Wilcox wrote:
> > > being slow to pick it up. It looks like there are several patterns, and
> > > we have to support both set_page_dirty() and set_page_dirty_lock(). So
> > > the best combination looks to be adding a few variations of
> > > release_user_pages*(), but leaving put_user_page() alone, because it's
> > > the "do it yourself" basic one. Scatter-gather will be stuck with that.
> > 
> > I think our current interfaces are wrong.  We should really have a
> > get_user_sg() / put_user_sg() function that will set up / destroy an
> > SG list appropriate for that range of user memory.  This is almost
> > orthogonal to the original intent here, so please don't see this as a
> > "must do first" kind of argument that might derail the whole thing.
> 
> The SG list really is the wrong interface, as it mixes up information
> about the pages/phys addr range and a potential dma mapping.  I think
> the right interface is an array of bio_vecs.  In fact I've recently
> been looking into a get_user_pages variant that does fill bio_vecs,
> as it fundamentally is the right thing for doing I/O on large pages,
> and will really help with direct I/O performance in that case.

I don't think the bio_vec is really a big improvement; it's just a (page,
offset, length) tuple.  Not to mention that due to the annoying divergence
between block and networking [1] this is actually a less useful interface.

I don't understand the dislike of the sg list.  Other than for special
cases which we shouldn't be optimising for (ramfs, brd, loopback
filesystems), when we get pages to do I/O, we're going to want a dma
mapping for them.  It makes sense to allocate space to store
the mapping at the outset.

[1] Can we ever admit that the bio_vec and the skb_frag_t are actually
the same thing?
Christoph Hellwig Oct. 1, 2018, 3:51 p.m. UTC | #8
On Mon, Oct 01, 2018 at 08:29:29AM -0700, Matthew Wilcox wrote:
> I don't understand the dislike of the sg list.  Other than for special
> cases which we should't be optimising for (ramfs, brd, loopback
> filesystems), when we get a page to do I/O, we're going to want a dma
> mapping for them.  It makes sense to already allocate space to store
> the mapping at the outset.

We don't actually need the space - the scatterlist forces it on us;
otherwise we could translate directly into the on-disk format and
save that duplicate space.  I have prototypes for NVMe and RDMA that do
away with the scatterlist entirely.

And even if we keep using the scatterlist as we do right now, we'd
need a second scatterlist at least for block / file system based I/O,
as we can't plug the scatterlist into the I/O stack (never mind that,
due to splitting and merging, the lower one might not map 1:1 to the
upper one).

> [1] Can we ever admit that the bio_vec and the skb_frag_t are actually
> the same thing?

When I brought this up years ago, the networking folks insisted that
their use of u16 offset/size fields was important for performance,
while for bio_vecs we needed the larger ones for some cases.  Since
then networking has switched to 32-bit fields for what is now the fast
path, so it might be worth giving it another spin.

That should also help with using my new bio_vec based dma-mapping
helpers to batch iommu mappings in networking, which Jesper had on
his todo list, as all the indirect calls are causing performance
issues.
John Hubbard Oct. 3, 2018, 5:40 a.m. UTC | #9
On 10/1/18 7:35 AM, Dennis Dalessandro wrote:
> On 9/28/2018 11:12 PM, John Hubbard wrote:
>> On 9/28/18 8:39 AM, Jason Gunthorpe wrote:
>>> On Thu, Sep 27, 2018 at 10:39:47PM -0700, john.hubbard@gmail.com wrote:
>>>> From: John Hubbard <jhubbard@nvidia.com>
>> [...]
>>>>
>>>> diff --git a/drivers/infiniband/core/umem.c b/drivers/infiniband/core/umem.c
>>>> index a41792dbae1f..9430d697cb9f 100644
>>>> +++ b/drivers/infiniband/core/umem.c
>>>> @@ -60,7 +60,7 @@ static void __ib_umem_release(struct ib_device *dev, struct ib_umem *umem, int d
>>>>           page = sg_page(sg);
>>>>           if (!PageDirty(page) && umem->writable && dirty)
>>>>               set_page_dirty_lock(page);
>>>> -        put_page(page);
>>>> +        put_user_page(page);
>>>
>>> Would it make sense to have a release/put_user_pages_dirtied to absorb
>>> the set_page_dity pattern too? I notice in this patch there is some
>>> variety here, I wonder what is the right way?
>>>
>>> Also, I'm told this code here is a big performance bottleneck when the
>>> number of pages becomes very long (think >> GB of memory), so having a
>>> future path to use some kind of batching/threading sound great.
>>>
>>
>> Yes. And you asked for this the first time, too. Consistent! :) Sorry for
>> being slow to pick it up. It looks like there are several patterns, and
>> we have to support both set_page_dirty() and set_page_dirty_lock(). So
>> the best combination looks to be adding a few variations of
>> release_user_pages*(), but leaving put_user_page() alone, because it's
>> the "do it yourself" basic one. Scatter-gather will be stuck with that.
>>
>> Here's a differential patch with that, that shows a nice little cleanup in
>> a couple of IB places, and as you point out, it also provides the hooks for
>> performance upgrades (via batching) in the future.
>>
>> Does this API look about right?
> 
> I'm on board with that and the changes to hfi1 and qib.
> 
> Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>

Hi Dennis, thanks for the review!

I'll add those new routines in and send out a v2 soon, now that it appears, from 
the recent discussion, that this aspect of the approach is still viable.


thanks,
Jan Kara Oct. 3, 2018, 4:27 p.m. UTC | #10
On Fri 28-09-18 20:12:33, John Hubbard wrote:
>  static inline void release_user_pages(struct page **pages,
> -                                     unsigned long npages)
> +                                     unsigned long npages,
> +                                     bool set_dirty)
>  {
> -       while (npages)
> -               put_user_page(pages[--npages]);
> +       if (set_dirty)
> +               release_user_pages_dirty(pages, npages);
> +       else
> +               release_user_pages_basic(pages, npages);
> +}

Is there a good reason to have this with a set_dirty argument? Generally, bool
arguments are not great for readability (or greppability, for that matter).
Also, in this case callers can just as easily do:
	if (set_dirty)
		release_user_pages_dirty(...);
	else
		release_user_pages(...);

And furthermore it makes the code author think more about whether he needs
set_page_dirty() or set_page_dirty_lock(), rather than just passing 'true'
and hoping the function magically does the right thing for him.

								Honza
John Hubbard Oct. 3, 2018, 11:19 p.m. UTC | #11
On 10/3/18 9:27 AM, Jan Kara wrote:
> On Fri 28-09-18 20:12:33, John Hubbard wrote:
>>  static inline void release_user_pages(struct page **pages,
>> -                                     unsigned long npages)
>> +                                     unsigned long npages,
>> +                                     bool set_dirty)
>>  {
>> -       while (npages)
>> -               put_user_page(pages[--npages]);
>> +       if (set_dirty)
>> +               release_user_pages_dirty(pages, npages);
>> +       else
>> +               release_user_pages_basic(pages, npages);
>> +}
> 
> Is there a good reason to have this with set_dirty argument? Generally bool
> arguments are not great for readability (or greppability for that matter).
> Also in this case callers can just as easily do:
> 	if (set_dirty)
> 		release_user_pages_dirty(...);
> 	else
> 		release_user_pages(...);
> 
> And furthermore it makes the code author think more whether he needs
> set_page_dirty() or set_page_dirty_lock(), rather than just passing 'true'
> and hoping the function magically does the right thing for him.
> 

Ha, I went through *precisely* that argument in my head, too--and then
got seduced by the idea that it pretties up the existing calling code,
because it's a drop-in one-liner at the call sites. But yes, I'll change it
back to omit the bool set_dirty argument.
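
For reference, a minimal sketch of where that ends up: two explicit helpers
instead of one taking a bool. This is illustrative only, not the actual v2
that was later posted:

/* Release pages that were never written to. */
static inline void release_user_pages(struct page **pages,
				      unsigned long npages)
{
	while (npages)
		put_user_page(pages[--npages]);
}

/* Release pages the caller wrote to; caller must be able to sleep. */
static inline void release_user_pages_dirty_lock(struct page **pages,
						 unsigned long npages)
{
	while (npages) {
		--npages;
		set_page_dirty_lock(pages[npages]);
		put_user_page(pages[npages]);
	}
}

Call sites that currently open-code the dirty-then-put loop (hfi1, qib) would
then pick whichever variant matches their locking context.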

thanks,

Patch

diff --git a/drivers/infiniband/core/umem.c b/drivers/infiniband/core/umem.c
index a41792dbae1f..9430d697cb9f 100644
--- a/drivers/infiniband/core/umem.c
+++ b/drivers/infiniband/core/umem.c
@@ -60,7 +60,7 @@  static void __ib_umem_release(struct ib_device *dev, struct ib_umem *umem, int d
 		page = sg_page(sg);
 		if (!PageDirty(page) && umem->writable && dirty)
 			set_page_dirty_lock(page);
-		put_page(page);
+		put_user_page(page);
 	}
 
 	sg_free_table(&umem->sg_head);
diff --git a/drivers/infiniband/core/umem_odp.c b/drivers/infiniband/core/umem_odp.c
index 6ec748eccff7..6227b89cf05c 100644
--- a/drivers/infiniband/core/umem_odp.c
+++ b/drivers/infiniband/core/umem_odp.c
@@ -717,7 +717,7 @@  int ib_umem_odp_map_dma_pages(struct ib_umem *umem, u64 user_virt, u64 bcnt,
 					ret = -EFAULT;
 					break;
 				}
-				put_page(local_page_list[j]);
+				put_user_page(local_page_list[j]);
 				continue;
 			}
 
diff --git a/drivers/infiniband/hw/hfi1/user_pages.c b/drivers/infiniband/hw/hfi1/user_pages.c
index e341e6dcc388..c7516029af33 100644
--- a/drivers/infiniband/hw/hfi1/user_pages.c
+++ b/drivers/infiniband/hw/hfi1/user_pages.c
@@ -126,7 +126,7 @@  void hfi1_release_user_pages(struct mm_struct *mm, struct page **p,
 	for (i = 0; i < npages; i++) {
 		if (dirty)
 			set_page_dirty_lock(p[i]);
-		put_page(p[i]);
+		put_user_page(p[i]);
 	}
 
 	if (mm) { /* during close after signal, mm can be NULL */
diff --git a/drivers/infiniband/hw/mthca/mthca_memfree.c b/drivers/infiniband/hw/mthca/mthca_memfree.c
index cc9c0c8ccba3..b8b12effd009 100644
--- a/drivers/infiniband/hw/mthca/mthca_memfree.c
+++ b/drivers/infiniband/hw/mthca/mthca_memfree.c
@@ -481,7 +481,7 @@  int mthca_map_user_db(struct mthca_dev *dev, struct mthca_uar *uar,
 
 	ret = pci_map_sg(dev->pdev, &db_tab->page[i].mem, 1, PCI_DMA_TODEVICE);
 	if (ret < 0) {
-		put_page(pages[0]);
+		put_user_page(pages[0]);
 		goto out;
 	}
 
@@ -489,7 +489,7 @@  int mthca_map_user_db(struct mthca_dev *dev, struct mthca_uar *uar,
 				 mthca_uarc_virt(dev, uar, i));
 	if (ret) {
 		pci_unmap_sg(dev->pdev, &db_tab->page[i].mem, 1, PCI_DMA_TODEVICE);
-		put_page(sg_page(&db_tab->page[i].mem));
+		put_user_page(sg_page(&db_tab->page[i].mem));
 		goto out;
 	}
 
@@ -555,7 +555,7 @@  void mthca_cleanup_user_db_tab(struct mthca_dev *dev, struct mthca_uar *uar,
 		if (db_tab->page[i].uvirt) {
 			mthca_UNMAP_ICM(dev, mthca_uarc_virt(dev, uar, i), 1);
 			pci_unmap_sg(dev->pdev, &db_tab->page[i].mem, 1, PCI_DMA_TODEVICE);
-			put_page(sg_page(&db_tab->page[i].mem));
+			put_user_page(sg_page(&db_tab->page[i].mem));
 		}
 	}
 
diff --git a/drivers/infiniband/hw/qib/qib_user_pages.c b/drivers/infiniband/hw/qib/qib_user_pages.c
index 16543d5e80c3..3f8fd42dd7fc 100644
--- a/drivers/infiniband/hw/qib/qib_user_pages.c
+++ b/drivers/infiniband/hw/qib/qib_user_pages.c
@@ -45,7 +45,7 @@  static void __qib_release_user_pages(struct page **p, size_t num_pages,
 	for (i = 0; i < num_pages; i++) {
 		if (dirty)
 			set_page_dirty_lock(p[i]);
-		put_page(p[i]);
+		put_user_page(p[i]);
 	}
 }
 
diff --git a/drivers/infiniband/hw/qib/qib_user_sdma.c b/drivers/infiniband/hw/qib/qib_user_sdma.c
index 926f3c8eba69..14f94d823907 100644
--- a/drivers/infiniband/hw/qib/qib_user_sdma.c
+++ b/drivers/infiniband/hw/qib/qib_user_sdma.c
@@ -266,7 +266,7 @@  static void qib_user_sdma_init_frag(struct qib_user_sdma_pkt *pkt,
 	pkt->addr[i].length = len;
 	pkt->addr[i].first_desc = first_desc;
 	pkt->addr[i].last_desc = last_desc;
-	pkt->addr[i].put_page = put_page;
+	pkt->addr[i].put_page = put_user_page;
 	pkt->addr[i].dma_mapped = dma_mapped;
 	pkt->addr[i].page = page;
 	pkt->addr[i].kvaddr = kvaddr;
@@ -321,7 +321,7 @@  static int qib_user_sdma_page_to_frags(const struct qib_devdata *dd,
 		 * the caller can ignore this page.
 		 */
 		if (put) {
-			put_page(page);
+			put_user_page(page);
 		} else {
 			/* coalesce case */
 			kunmap(page);
@@ -635,7 +635,7 @@  static void qib_user_sdma_free_pkt_frag(struct device *dev,
 			kunmap(pkt->addr[i].page);
 
 		if (pkt->addr[i].put_page)
-			put_page(pkt->addr[i].page);
+			put_user_page(pkt->addr[i].page);
 		else
 			__free_page(pkt->addr[i].page);
 	} else if (pkt->addr[i].kvaddr) {
@@ -710,7 +710,7 @@  static int qib_user_sdma_pin_pages(const struct qib_devdata *dd,
 	/* if error, return all pages not managed by pkt */
 free_pages:
 	while (i < j)
-		put_page(pages[i++]);
+		put_user_page(pages[i++]);
 
 done:
 	return ret;
diff --git a/drivers/infiniband/hw/usnic/usnic_uiom.c b/drivers/infiniband/hw/usnic/usnic_uiom.c
index 9dd39daa602b..0f607f31c262 100644
--- a/drivers/infiniband/hw/usnic/usnic_uiom.c
+++ b/drivers/infiniband/hw/usnic/usnic_uiom.c
@@ -91,7 +91,7 @@  static void usnic_uiom_put_pages(struct list_head *chunk_list, int dirty)
 			pa = sg_phys(sg);
 			if (!PageDirty(page) && dirty)
 				set_page_dirty_lock(page);
-			put_page(page);
+			put_user_page(page);
 			usnic_dbg("pa: %pa\n", &pa);
 		}
 		kfree(chunk);