[00/34] put_user_pages(): miscellaneous call sites
mbox series

Message ID 20190802022005.5117-1-jhubbard@nvidia.com
Headers show
Series
  • put_user_pages(): miscellaneous call sites
Related show

Message

john.hubbard@gmail.com Aug. 2, 2019, 2:19 a.m. UTC
From: John Hubbard <jhubbard@nvidia.com>

Hi,

These are best characterized as miscellaneous conversions: many (not all)
call sites that don't involve biovec or iov_iter, nor mm/. It also leaves
out a few call sites that require some more work. These are mostly pretty
simple ones.

It's probably best to send all of these via Andrew's -mm tree, assuming
that there are no significant merge conflicts with ongoing work in other
trees (which I doubt, given that these are small changes).

These patches apply to the latest linux.git. Patch #1 is also already in
Andrew's tree, but given the broad non-linux-mm Cc list, I thought it
would be more convenient to just include that patch here, so that people
can use linux.git as the base--even though these are probably destined
for linux-mm.

This is part a tree-wide conversion, as described in commit fc1d8e7cca2d
("mm: introduce put_user_page*(), placeholder versions"). That commit
has an extensive description of the problem and the planned steps to
solve it, but the highlites are:

1) Provide put_user_page*() routines, intended to be used
for releasing pages that were pinned via get_user_pages*().

2) Convert all of the call sites for get_user_pages*(), to
invoke put_user_page*(), instead of put_page(). This involves dozens of
call sites, and will take some time.

3) After (2) is complete, use get_user_pages*() and put_user_page*() to
implement tracking of these pages. This tracking will be separate from
the existing struct page refcounting.

4) Use the tracking and identification of these pages, to implement
special handling (especially in writeback paths) when the pages are
backed by a filesystem.

And a few references, also from that commit:

[1] https://lwn.net/Articles/774411/ : "DMA and get_user_pages()"
[2] https://lwn.net/Articles/753027/ : "The Trouble with get_user_pages()"


Ira Weiny (1):
  fs/binfmt_elf: convert put_page() to put_user_page*()

John Hubbard (33):
  mm/gup: add make_dirty arg to put_user_pages_dirty_lock()
  net/rds: convert put_page() to put_user_page*()
  net/ceph: convert put_page() to put_user_page*()
  x86/kvm: convert put_page() to put_user_page*()
  drm/etnaviv: convert release_pages() to put_user_pages()
  drm/i915: convert put_page() to put_user_page*()
  drm/radeon: convert put_page() to put_user_page*()
  media/ivtv: convert put_page() to put_user_page*()
  media/v4l2-core/mm: convert put_page() to put_user_page*()
  genwqe: convert put_page() to put_user_page*()
  scif: convert put_page() to put_user_page*()
  vmci: convert put_page() to put_user_page*()
  rapidio: convert put_page() to put_user_page*()
  oradax: convert put_page() to put_user_page*()
  staging/vc04_services: convert put_page() to put_user_page*()
  drivers/tee: convert put_page() to put_user_page*()
  vfio: convert put_page() to put_user_page*()
  fbdev/pvr2fb: convert put_page() to put_user_page*()
  fsl_hypervisor: convert put_page() to put_user_page*()
  xen: convert put_page() to put_user_page*()
  fs/exec.c: convert put_page() to put_user_page*()
  orangefs: convert put_page() to put_user_page*()
  uprobes: convert put_page() to put_user_page*()
  futex: convert put_page() to put_user_page*()
  mm/frame_vector.c: convert put_page() to put_user_page*()
  mm/gup_benchmark.c: convert put_page() to put_user_page*()
  mm/memory.c: convert put_page() to put_user_page*()
  mm/madvise.c: convert put_page() to put_user_page*()
  mm/process_vm_access.c: convert put_page() to put_user_page*()
  crypt: convert put_page() to put_user_page*()
  nfs: convert put_page() to put_user_page*()
  goldfish_pipe: convert put_page() to put_user_page*()
  kernel/events/core.c: convert put_page() to put_user_page*()

 arch/x86/kvm/svm.c                            |   4 +-
 crypto/af_alg.c                               |   7 +-
 drivers/gpu/drm/etnaviv/etnaviv_gem.c         |   4 +-
 drivers/gpu/drm/i915/gem/i915_gem_userptr.c   |   9 +-
 drivers/gpu/drm/radeon/radeon_ttm.c           |   2 +-
 drivers/infiniband/core/umem.c                |   5 +-
 drivers/infiniband/hw/hfi1/user_pages.c       |   5 +-
 drivers/infiniband/hw/qib/qib_user_pages.c    |   5 +-
 drivers/infiniband/hw/usnic/usnic_uiom.c      |   5 +-
 drivers/infiniband/sw/siw/siw_mem.c           |  10 +-
 drivers/media/pci/ivtv/ivtv-udma.c            |  14 +--
 drivers/media/pci/ivtv/ivtv-yuv.c             |  10 +-
 drivers/media/v4l2-core/videobuf-dma-sg.c     |   3 +-
 drivers/misc/genwqe/card_utils.c              |  17 +--
 drivers/misc/mic/scif/scif_rma.c              |  17 ++-
 drivers/misc/vmw_vmci/vmci_context.c          |   2 +-
 drivers/misc/vmw_vmci/vmci_queue_pair.c       |  11 +-
 drivers/platform/goldfish/goldfish_pipe.c     |   9 +-
 drivers/rapidio/devices/rio_mport_cdev.c      |   9 +-
 drivers/sbus/char/oradax.c                    |   2 +-
 .../interface/vchiq_arm/vchiq_2835_arm.c      |  10 +-
 drivers/tee/tee_shm.c                         |  10 +-
 drivers/vfio/vfio_iommu_type1.c               |   8 +-
 drivers/video/fbdev/pvr2fb.c                  |   3 +-
 drivers/virt/fsl_hypervisor.c                 |   7 +-
 drivers/xen/gntdev.c                          |   5 +-
 drivers/xen/privcmd.c                         |   7 +-
 fs/binfmt_elf.c                               |   2 +-
 fs/binfmt_elf_fdpic.c                         |   2 +-
 fs/exec.c                                     |   2 +-
 fs/nfs/direct.c                               |   4 +-
 fs/orangefs/orangefs-bufmap.c                 |   7 +-
 include/linux/mm.h                            |   5 +-
 kernel/events/core.c                          |   2 +-
 kernel/events/uprobes.c                       |   6 +-
 kernel/futex.c                                |  10 +-
 mm/frame_vector.c                             |   4 +-
 mm/gup.c                                      | 115 ++++++++----------
 mm/gup_benchmark.c                            |   2 +-
 mm/madvise.c                                  |   2 +-
 mm/memory.c                                   |   2 +-
 mm/process_vm_access.c                        |  18 +--
 net/ceph/pagevec.c                            |   8 +-
 net/rds/info.c                                |   5 +-
 net/rds/message.c                             |   2 +-
 net/rds/rdma.c                                |  15 ++-
 virt/kvm/kvm_main.c                           |   4 +-
 47 files changed, 151 insertions(+), 266 deletions(-)

Comments

John Hubbard Aug. 2, 2019, 5:48 a.m. UTC | #1
On 8/1/19 9:36 PM, Juergen Gross wrote:
> On 02.08.19 04:19, john.hubbard@gmail.com wrote:
>> From: John Hubbard <jhubbard@nvidia.com>
...
>> diff --git a/drivers/xen/privcmd.c b/drivers/xen/privcmd.c
>> index 2f5ce7230a43..29e461dbee2d 100644
>> --- a/drivers/xen/privcmd.c
>> +++ b/drivers/xen/privcmd.c
>> @@ -611,15 +611,10 @@ static int lock_pages(
>>   static void unlock_pages(struct page *pages[], unsigned int nr_pages)
>>   {
>> -    unsigned int i;
>> -
>>       if (!pages)
>>           return;
>> -    for (i = 0; i < nr_pages; i++) {
>> -        if (pages[i])
>> -            put_page(pages[i]);
>> -    }
>> +    put_user_pages(pages, nr_pages);
> 
> You are not handling the case where pages[i] is NULL here. Or am I
> missing a pending patch to put_user_pages() here?
> 

Hi Juergen,

You are correct--this no longer handles the cases where pages[i]
is NULL. It's intentional, though possibly wrong. :)

I see that I should have added my standard blurb to this
commit description. I missed this one, but some of the other patches
have it. It makes the following, possibly incorrect claim:

"This changes the release code slightly, because each page slot in the
page_list[] array is no longer checked for NULL. However, that check
was wrong anyway, because the get_user_pages() pattern of usage here
never allowed for NULL entries within a range of pinned pages."

The way I've seen these page arrays used with get_user_pages(),
things are either done single page, or with a contiguous range. So
unless I'm missing a case where someone is either

a) releasing individual pages within a range (and thus likely messing
up their count of pages they have), or

b) allocating two gup ranges within the same pages[] array, with a
gap between the allocations,

...then it should be correct. If so, then I'll add the above blurb
to this patch's commit description.

If that's not the case (both here, and in 3 or 4 other patches in this
series, then as you said, I should add NULL checks to put_user_pages()
and put_user_pages_dirty_lock().


thanks,
John Hubbard Aug. 2, 2019, 6:48 p.m. UTC | #2
On 8/2/19 2:19 AM, Joonas Lahtinen wrote:
> Quoting john.hubbard@gmail.com (2019-08-02 05:19:37)
>> From: John Hubbard <jhubbard@nvidia.com>
>>
>> For pages that were retained via get_user_pages*(), release those pages
>> via the new put_user_page*() routines, instead of via put_page() or
>> release_pages().
>>
>> This is part a tree-wide conversion, as described in commit fc1d8e7cca2d
>> ("mm: introduce put_user_page*(), placeholder versions").
>>
>> Note that this effectively changes the code's behavior in
>> i915_gem_userptr_put_pages(): it now calls set_page_dirty_lock(),
>> instead of set_page_dirty(). This is probably more accurate.
> 
> We've already fixed this in drm-tip where the current code uses
> set_page_dirty_lock().
> 
> This would conflict with our tree. Rodrigo is handling
> drm-intel-next for 5.4, so you guys want to coordinate how
> to merge.
> 

Hi Joonas, Rodrigo,

First of all, I apologize for the API breakage: put_user_pages_dirty_lock()
has an additional "dirty" parameter.

In order to deal with the merge problem, I'll drop this patch from my series,
and I'd recommend that the drm-intel-next take the following approach:

1) For now, s/put_page/put_user_page/ in i915_gem_userptr_put_pages(),
and fix up the set_page_dirty() --> set_page_dirty_lock() issue, like this
(based against linux.git):

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_userptr.c 
b/drivers/gpu/drm/i915/gem/i915_gem_userptr.c
index 528b61678334..94721cc0093b 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_userptr.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_userptr.c
@@ -664,10 +664,10 @@ i915_gem_userptr_put_pages(struct drm_i915_gem_object *obj,

         for_each_sgt_page(page, sgt_iter, pages) {
                 if (obj->mm.dirty)
-                       set_page_dirty(page);
+                       set_page_dirty_lock(page);

                 mark_page_accessed(page);
-               put_page(page);
+               put_user_page(page);
         }
         obj->mm.dirty = false;


That will leave you with your original set_page_dirty_lock() calls
and everything works properly.

2) Next cycle, move to the new put_user_pages_dirty_lock().

thanks,
John Hubbard Aug. 3, 2019, 8:03 p.m. UTC | #3
On 8/2/19 11:48 AM, John Hubbard wrote:
> On 8/2/19 2:19 AM, Joonas Lahtinen wrote:
>> Quoting john.hubbard@gmail.com (2019-08-02 05:19:37)
>>> From: John Hubbard <jhubbard@nvidia.com>
...
> In order to deal with the merge problem, I'll drop this patch from my series,
> and I'd recommend that the drm-intel-next take the following approach:

Actually, I just pulled the latest linux.git, and there are a few changes:

> 
> 1) For now, s/put_page/put_user_page/ in i915_gem_userptr_put_pages(),
> and fix up the set_page_dirty() --> set_page_dirty_lock() issue, like this
> (based against linux.git):
> 
> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_userptr.c b/drivers/gpu/drm/i915/gem/i915_gem_userptr.c
> index 528b61678334..94721cc0093b 100644
> --- a/drivers/gpu/drm/i915/gem/i915_gem_userptr.c
> +++ b/drivers/gpu/drm/i915/gem/i915_gem_userptr.c
> @@ -664,10 +664,10 @@ i915_gem_userptr_put_pages(struct drm_i915_gem_object *obj,
> 
>         for_each_sgt_page(page, sgt_iter, pages) {
>                 if (obj->mm.dirty)
> -                       set_page_dirty(page);
> +                       set_page_dirty_lock(page);

I see you've already applied this fix to your tree, in linux.git already.

> 
>                 mark_page_accessed(page);
> -               put_page(page);
> +               put_user_page(page);

But this conversion still needs doing. So I'll repost a patch that only does 
this (plus the other call sites). 

That can go in via either your tree, or Andrew's -mm tree, without generating
any conflicts.

thanks,
John Hubbard Aug. 8, 2019, 3:46 a.m. UTC | #4
On 8/7/19 7:36 PM, Ira Weiny wrote:
> On Wed, Aug 07, 2019 at 10:46:49AM +0200, Michal Hocko wrote:
>> On Wed 07-08-19 10:37:26, Jan Kara wrote:
>>> On Fri 02-08-19 12:14:09, John Hubbard wrote:
>>>> On 8/2/19 7:52 AM, Jan Kara wrote:
>>>>> On Fri 02-08-19 07:24:43, Matthew Wilcox wrote:
>>>>>> On Fri, Aug 02, 2019 at 02:41:46PM +0200, Jan Kara wrote:
>>>>>>> On Fri 02-08-19 11:12:44, Michal Hocko wrote:
>>>>>>>> On Thu 01-08-19 19:19:31, john.hubbard@gmail.com wrote:
  [...]
> Before I go on, I would like to say that the "imbalance" of get_user_pages()
> and put_page() bothers me from a purist standpoint...  However, since this
> discussion cropped up I went ahead and ported my work to Linus' current master
> (5.3-rc3+) and in doing so I only had to steal a bit of Johns code...  Sorry
> John...  :-(
> 
> I don't have the commit messages all cleaned up and I know there may be some
> discussion on these new interfaces but I wanted to throw this series out there
> because I think it may be what Jan and Michal are driving at (or at least in
> that direction.
> 
> Right now only RDMA and DAX FS's are supported.  Other users of GUP will still
> fail on a DAX file and regular files will still be at risk.[2]
> 
> I've pushed this work (based 5.3-rc3+ (33920f1ec5bf)) here[3]:
> 
> https://github.com/weiny2/linux-kernel/tree/linus-rdmafsdax-b0-v3
> 
> I think the most relevant patch to this conversation is:
> 
> https://github.com/weiny2/linux-kernel/commit/5d377653ba5cf11c3b716f904b057bee6641aaf6
> 

ohhh...can you please avoid using the old __put_user_pages_dirty()
function? I thought I'd caught things early enough to get away with
the rename and deletion of that. You could either:

a) open code an implementation of vaddr_put_pages_dirty_lock() that
doesn't call any of the *put_user_pages_dirty*() variants, or

b) include my first patch ("") are part of your series, or

c) base this on Andrews's tree, which already has merged in my first patch.


thanks,
John Hubbard Aug. 8, 2019, 6:18 p.m. UTC | #5
On 8/8/19 9:25 AM, Weiny, Ira wrote:
>>
>> On 8/7/19 7:36 PM, Ira Weiny wrote:
>>> On Wed, Aug 07, 2019 at 10:46:49AM +0200, Michal Hocko wrote:
>>>> On Wed 07-08-19 10:37:26, Jan Kara wrote:
>>>>> On Fri 02-08-19 12:14:09, John Hubbard wrote:
>>>>>> On 8/2/19 7:52 AM, Jan Kara wrote:
>>>>>>> On Fri 02-08-19 07:24:43, Matthew Wilcox wrote:
>>>>>>>> On Fri, Aug 02, 2019 at 02:41:46PM +0200, Jan Kara wrote:
>>>>>>>>> On Fri 02-08-19 11:12:44, Michal Hocko wrote:
>>>>>>>>>> On Thu 01-08-19 19:19:31, john.hubbard@gmail.com wrote:
>>   [...]
> Yep I can do this.  I did not realize that Andrew had accepted any of this work.  I'll check out his tree.  But I don't think he is going to accept this series through his tree.  So what is the ETA on that landing in Linus' tree?
> 

I'd expect it to go into 5.4, according to my understanding of how
the release cycles are arranged.


> To that point I'm still not sure who would take all this as I am now touching mm, procfs, rdma, ext4, and xfs.
> 
> I just thought I would chime in with my progress because I'm to a point where things are working and so I can submit the code but I'm not sure what I can/should depend on landing...  Also, now that 0day has run overnight it has found issues with this rebase so I need to clean those up...  Perhaps I will base on Andrew's tree prior to doing that...

I'm certainly not the right person to answer, but in spite of that, I'd think
Andrew's tree is a reasonable place for it. Sort of.

thanks,
Ira Weiny Aug. 9, 2019, 5:56 p.m. UTC | #6
> 
> On Wed 07-08-19 19:36:37, Ira Weiny wrote:
> > On Wed, Aug 07, 2019 at 10:46:49AM +0200, Michal Hocko wrote:
> > > > So I think your debug option and my suggested renaming serve a bit
> > > > different purposes (and thus both make sense). If you do the
> > > > renaming, you can just grep to see unconverted sites. Also when
> > > > someone merges new GUP user (unaware of the new rules) while you
> > > > switch GUP to use pins instead of ordinary references, you'll get
> > > > compilation error in case of renaming instead of hard to debug
> > > > refcount leak without the renaming. And such conflict is almost
> > > > bound to happen given the size of GUP patch set... Also the
> > > > renaming serves against the "coding inertia" - i.e., GUP is around for
> ages so people just use it without checking any documentation or comments.
> > > > After switching how GUP works, what used to be correct isn't
> > > > anymore so renaming the function serves as a warning that
> > > > something has really changed.
> > >
> > > Fully agreed!
> >
> > Ok Prior to this I've been basing all my work for the RDMA/FS DAX
> > stuff in Johns put_user_pages()...  (Including when I proposed failing
> > truncate with a lease in June [1])
> >
> > However, based on the suggestions in that thread it became clear that
> > a new interface was going to need to be added to pass in the "RDMA
> > file" information to GUP to associate file pins with the correct processes...
> >
> > I have many drawings on my white board with "a whole lot of lines" on
> > them to make sure that if a process opens a file, mmaps it, pins it
> > with RDMA, _closes_ it, and ummaps it; that the resulting file pin can
> > still be traced back to the RDMA context and all the processes which
> > may have access to it....  No matter where the original context may
> > have come from.  I believe I have accomplished that.
> >
> > Before I go on, I would like to say that the "imbalance" of
> > get_user_pages() and put_page() bothers me from a purist standpoint...
> > However, since this discussion cropped up I went ahead and ported my
> > work to Linus' current master
> > (5.3-rc3+) and in doing so I only had to steal a bit of Johns code...
> > Sorry John...  :-(
> >
> > I don't have the commit messages all cleaned up and I know there may
> > be some discussion on these new interfaces but I wanted to throw this
> > series out there because I think it may be what Jan and Michal are
> > driving at (or at least in that direction.
> >
> > Right now only RDMA and DAX FS's are supported.  Other users of GUP
> > will still fail on a DAX file and regular files will still be at
> > risk.[2]
> >
> > I've pushed this work (based 5.3-rc3+ (33920f1ec5bf)) here[3]:
> >
> > https://github.com/weiny2/linux-kernel/tree/linus-rdmafsdax-b0-v3
> >
> > I think the most relevant patch to this conversation is:
> >
> > https://github.com/weiny2/linux-
> kernel/commit/5d377653ba5cf11c3b716f90
> > 4b057bee6641aaf6
> >
> > I stole Jans suggestion for a name as the name I used while
> > prototyping was pretty bad...  So Thanks Jan...  ;-)
> 
> For your function, I'd choose a name like vaddr_pin_leased_pages() so that
> association with a lease is clear from the name :)

My gut was to just change this as you suggested.  But the fact is that these calls can get used on anonymous pages as well.  So the "leased" semantic may not apply...  OTOH if a file is encountered it will fail the pin...  :-/  I'm going to leave it for now and get the patches submitted to the list...

> Also I'd choose the
> counterpart to be vaddr_unpin_leased_page[s](). Especially having put_page
> in the name looks confusing to me...

Ah yes, totally agree with the "pin/unpin" symmetry.  I've changed from "put" to "unpin"...

Thanks,
Ira

> 
> 								Honza
> 
> --
> Jan Kara <jack@suse.com>
> SUSE Labs, CR