Message ID | 20190809225833.6657-16-ira.weiny@intel.com (mailing list archive) |
---|---|
State | RFC |
Headers | show |
Series | RDMA/FS DAX truncate proposal V1,000,002 ;-) | expand |
On 8/9/19 3:58 PM, ira.weiny@intel.com wrote: > From: Ira Weiny <ira.weiny@intel.com> > > The addition of FOLL_LONGTERM has taken on additional meaning for CMA > pages. > > In addition subsystems such as RDMA require new information to be passed > to the GUP interface to track file owning information. As such a simple > FOLL_LONGTERM flag is no longer sufficient for these users to pin pages. > > Introduce a new GUP like call which takes the newly introduced vaddr_pin > information. Failure to pass the vaddr_pin object back to a vaddr_put* > call will result in a failure if pins were created on files during the > pin operation. > > Signed-off-by: Ira Weiny <ira.weiny@intel.com> > > --- > Changes from list: > Change to vaddr_put_pages_dirty_lock > Change to vaddr_unpin_pages_dirty_lock > > include/linux/mm.h | 5 ++++ > mm/gup.c | 59 ++++++++++++++++++++++++++++++++++++++++++++++ > 2 files changed, 64 insertions(+) > > diff --git a/include/linux/mm.h b/include/linux/mm.h > index 657c947bda49..90c5802866df 100644 > --- a/include/linux/mm.h > +++ b/include/linux/mm.h > @@ -1603,6 +1603,11 @@ int account_locked_vm(struct mm_struct *mm, unsigned long pages, bool inc); > int __account_locked_vm(struct mm_struct *mm, unsigned long pages, bool inc, > struct task_struct *task, bool bypass_rlim); > > +long vaddr_pin_pages(unsigned long addr, unsigned long nr_pages, > + unsigned int gup_flags, struct page **pages, > + struct vaddr_pin *vaddr_pin); > +void vaddr_unpin_pages_dirty_lock(struct page **pages, unsigned long nr_pages, > + struct vaddr_pin *vaddr_pin, bool make_dirty); Hi Ira, OK, the API seems fine to me, anyway. :) A bit more below... > bool mapping_inode_has_layout(struct vaddr_pin *vaddr_pin, struct page *page); > > /* Container for pinned pfns / pages */ > diff --git a/mm/gup.c b/mm/gup.c > index eeaa0ddd08a6..6d23f70d7847 100644 > --- a/mm/gup.c > +++ b/mm/gup.c > @@ -2536,3 +2536,62 @@ int get_user_pages_fast(unsigned long start, int nr_pages, > return ret; > } > EXPORT_SYMBOL_GPL(get_user_pages_fast); > + > +/** > + * vaddr_pin_pages pin pages by virtual address and return the pages to the > + * user. > + * > + * @addr, start address What's with the commas? I thought kernel-doc wants colons, like this, right? @addr: start address > + * @nr_pages, number of pages to pin > + * @gup_flags, flags to use for the pin > + * @pages, array of pages returned > + * @vaddr_pin, initalized meta information this pin is to be associated > + * with. > + * > + * NOTE regarding vaddr_pin: > + * > + * Some callers can share pins via file descriptors to other processes. > + * Callers such as this should use the f_owner field of vaddr_pin to indicate > + * the file the fd points to. All other callers should use the mm this pin is > + * being made against. Usually "current->mm". > + * > + * Expects mmap_sem to be read locked. > + */ > +long vaddr_pin_pages(unsigned long addr, unsigned long nr_pages, > + unsigned int gup_flags, struct page **pages, > + struct vaddr_pin *vaddr_pin) > +{ > + long ret; > + > + gup_flags |= FOLL_LONGTERM; Is now the right time to introduce and use FOLL_PIN? If not, then I can always add it on top of this later, as part of gup-tracking patches. But you did point out that FOLL_LONGTERM is taking on additional meaning, and so maybe it's better to split that meaning up right from the start. > + > + if (!vaddr_pin || (!vaddr_pin->mm && !vaddr_pin->f_owner)) > + return -EINVAL; > + > + ret = __gup_longterm_locked(current, > + vaddr_pin->mm, > + addr, nr_pages, > + pages, NULL, gup_flags, > + vaddr_pin); > + return ret; > +} > +EXPORT_SYMBOL(vaddr_pin_pages); > + > +/** > + * vaddr_unpin_pages_dirty_lock - counterpart to vaddr_pin_pages > + * > + * @pages, array of pages returned > + * @nr_pages, number of pages in pages > + * @vaddr_pin, same information passed to vaddr_pin_pages > + * @make_dirty: whether to mark the pages dirty > + * > + * The semantics are similar to put_user_pages_dirty_lock but a vaddr_pin used > + * in vaddr_pin_pages should be passed back into this call for propper Typo: proper > + * tracking. > + */ > +void vaddr_unpin_pages_dirty_lock(struct page **pages, unsigned long nr_pages, > + struct vaddr_pin *vaddr_pin, bool make_dirty) > +{ > + __put_user_pages_dirty_lock(vaddr_pin, pages, nr_pages, make_dirty); > +} > +EXPORT_SYMBOL(vaddr_unpin_pages_dirty_lock); > OK, whew, I'm glad to see the updated _dirty_lock() API used here. :) thanks,
On 8/9/19 3:58 PM, ira.weiny@intel.com wrote: > From: Ira Weiny <ira.weiny@intel.com> > > The addition of FOLL_LONGTERM has taken on additional meaning for CMA > pages. > > In addition subsystems such as RDMA require new information to be passed > to the GUP interface to track file owning information. As such a simple > FOLL_LONGTERM flag is no longer sufficient for these users to pin pages. > > Introduce a new GUP like call which takes the newly introduced vaddr_pin > information. Failure to pass the vaddr_pin object back to a vaddr_put* > call will result in a failure if pins were created on files during the > pin operation. > > Signed-off-by: Ira Weiny <ira.weiny@intel.com> > I'm creating a new call site conversion series, to replace the "put_user_pages(): miscellaneous call sites" series. This uses vaddr_pin_pages*() where appropriate. So it's based on your series here. btw, while doing that, I noticed one more typo while re-reading some of the comments. Thought you probably want to collect them all for the next spin. Below... > --- > Changes from list: > Change to vaddr_put_pages_dirty_lock > Change to vaddr_unpin_pages_dirty_lock > > include/linux/mm.h | 5 ++++ > mm/gup.c | 59 ++++++++++++++++++++++++++++++++++++++++++++++ > 2 files changed, 64 insertions(+) > > diff --git a/include/linux/mm.h b/include/linux/mm.h > index 657c947bda49..90c5802866df 100644 > --- a/include/linux/mm.h > +++ b/include/linux/mm.h > @@ -1603,6 +1603,11 @@ int account_locked_vm(struct mm_struct *mm, unsigned long pages, bool inc); > int __account_locked_vm(struct mm_struct *mm, unsigned long pages, bool inc, > struct task_struct *task, bool bypass_rlim); > > +long vaddr_pin_pages(unsigned long addr, unsigned long nr_pages, > + unsigned int gup_flags, struct page **pages, > + struct vaddr_pin *vaddr_pin); > +void vaddr_unpin_pages_dirty_lock(struct page **pages, unsigned long nr_pages, > + struct vaddr_pin *vaddr_pin, bool make_dirty); > bool mapping_inode_has_layout(struct vaddr_pin *vaddr_pin, struct page *page); > > /* Container for pinned pfns / pages */ > diff --git a/mm/gup.c b/mm/gup.c > index eeaa0ddd08a6..6d23f70d7847 100644 > --- a/mm/gup.c > +++ b/mm/gup.c > @@ -2536,3 +2536,62 @@ int get_user_pages_fast(unsigned long start, int nr_pages, > return ret; > } > EXPORT_SYMBOL_GPL(get_user_pages_fast); > + > +/** > + * vaddr_pin_pages pin pages by virtual address and return the pages to the > + * user. > + * > + * @addr, start address > + * @nr_pages, number of pages to pin > + * @gup_flags, flags to use for the pin > + * @pages, array of pages returned > + * @vaddr_pin, initalized meta information this pin is to be associated Typo: initialized thanks,
On Fri, Aug 09, 2019 at 03:58:29PM -0700, ira.weiny@intel.com wrote: > From: Ira Weiny <ira.weiny@intel.com> > > The addition of FOLL_LONGTERM has taken on additional meaning for CMA > pages. > > In addition subsystems such as RDMA require new information to be passed > to the GUP interface to track file owning information. As such a simple > FOLL_LONGTERM flag is no longer sufficient for these users to pin pages. > > Introduce a new GUP like call which takes the newly introduced vaddr_pin > information. Failure to pass the vaddr_pin object back to a vaddr_put* > call will result in a failure if pins were created on files during the > pin operation. Is this a 'vaddr' in the traditional sense, ie does it work with something returned by valloc? Maybe another name would be better? I also wish GUP like functions took in a 'void __user *' instead of the unsigned long to make this clear :\ Jason
On Fri, Aug 09, 2019 at 05:09:54PM -0700, John Hubbard wrote: > On 8/9/19 3:58 PM, ira.weiny@intel.com wrote: > > From: Ira Weiny <ira.weiny@intel.com> > > > > The addition of FOLL_LONGTERM has taken on additional meaning for CMA > > pages. > > > > In addition subsystems such as RDMA require new information to be passed > > to the GUP interface to track file owning information. As such a simple > > FOLL_LONGTERM flag is no longer sufficient for these users to pin pages. > > > > Introduce a new GUP like call which takes the newly introduced vaddr_pin > > information. Failure to pass the vaddr_pin object back to a vaddr_put* > > call will result in a failure if pins were created on files during the > > pin operation. > > > > Signed-off-by: Ira Weiny <ira.weiny@intel.com> > > > > --- > > Changes from list: > > Change to vaddr_put_pages_dirty_lock > > Change to vaddr_unpin_pages_dirty_lock > > > > include/linux/mm.h | 5 ++++ > > mm/gup.c | 59 ++++++++++++++++++++++++++++++++++++++++++++++ > > 2 files changed, 64 insertions(+) > > > > diff --git a/include/linux/mm.h b/include/linux/mm.h > > index 657c947bda49..90c5802866df 100644 > > --- a/include/linux/mm.h > > +++ b/include/linux/mm.h > > @@ -1603,6 +1603,11 @@ int account_locked_vm(struct mm_struct *mm, unsigned long pages, bool inc); > > int __account_locked_vm(struct mm_struct *mm, unsigned long pages, bool inc, > > struct task_struct *task, bool bypass_rlim); > > > > +long vaddr_pin_pages(unsigned long addr, unsigned long nr_pages, > > + unsigned int gup_flags, struct page **pages, > > + struct vaddr_pin *vaddr_pin); > > +void vaddr_unpin_pages_dirty_lock(struct page **pages, unsigned long nr_pages, > > + struct vaddr_pin *vaddr_pin, bool make_dirty); > > Hi Ira, > > OK, the API seems fine to me, anyway. :) > > A bit more below... > > > bool mapping_inode_has_layout(struct vaddr_pin *vaddr_pin, struct page *page); > > > > /* Container for pinned pfns / pages */ > > diff --git a/mm/gup.c b/mm/gup.c > > index eeaa0ddd08a6..6d23f70d7847 100644 > > --- a/mm/gup.c > > +++ b/mm/gup.c > > @@ -2536,3 +2536,62 @@ int get_user_pages_fast(unsigned long start, int nr_pages, > > return ret; > > } > > EXPORT_SYMBOL_GPL(get_user_pages_fast); > > + > > +/** > > + * vaddr_pin_pages pin pages by virtual address and return the pages to the > > + * user. > > + * > > + * @addr, start address > > What's with the commas? I thought kernel-doc wants colons, like this, right? > > @addr: start address :-/ I don't know. Fixed. > > > > + * @nr_pages, number of pages to pin > > + * @gup_flags, flags to use for the pin > > + * @pages, array of pages returned > > + * @vaddr_pin, initalized meta information this pin is to be associated > > + * with. > > + * > > + * NOTE regarding vaddr_pin: > > + * > > + * Some callers can share pins via file descriptors to other processes. > > + * Callers such as this should use the f_owner field of vaddr_pin to indicate > > + * the file the fd points to. All other callers should use the mm this pin is > > + * being made against. Usually "current->mm". > > + * > > + * Expects mmap_sem to be read locked. > > + */ > > +long vaddr_pin_pages(unsigned long addr, unsigned long nr_pages, > > + unsigned int gup_flags, struct page **pages, > > + struct vaddr_pin *vaddr_pin) > > +{ > > + long ret; > > + > > + gup_flags |= FOLL_LONGTERM; > > > Is now the right time to introduce and use FOLL_PIN? If not, then I can always > add it on top of this later, as part of gup-tracking patches. But you did point > out that FOLL_LONGTERM is taking on additional meaning, and so maybe it's better > to split that meaning up right from the start. > At one point I wanted to (and had in my tree) a new flag but I went away from it. Prior to the discussion on mlock last week I did not think we needed it. But I'm ok to add it back in. I was not ignoring the idea for this RFC I just wanted to get this out there for people to see. I see that you threw out a couple of patches which add this flag in. FWIW, I think it would be good to differentiate between an indefinite pinned page vs a referenced "gotten" page. What you and I have been working on is the former. So it would be easy to change your refcounting patches to simply key off of FOLL_PIN. Would you like me to add in your FOLL_PIN patches to this series? > > > + > > + if (!vaddr_pin || (!vaddr_pin->mm && !vaddr_pin->f_owner)) > > + return -EINVAL; > > + > > + ret = __gup_longterm_locked(current, > > + vaddr_pin->mm, > > + addr, nr_pages, > > + pages, NULL, gup_flags, > > + vaddr_pin); > > + return ret; > > +} > > +EXPORT_SYMBOL(vaddr_pin_pages); > > + > > +/** > > + * vaddr_unpin_pages_dirty_lock - counterpart to vaddr_pin_pages > > + * > > + * @pages, array of pages returned > > + * @nr_pages, number of pages in pages > > + * @vaddr_pin, same information passed to vaddr_pin_pages > > + * @make_dirty: whether to mark the pages dirty > > + * > > + * The semantics are similar to put_user_pages_dirty_lock but a vaddr_pin used > > + * in vaddr_pin_pages should be passed back into this call for propper > > Typo: proper Fixed. > > > + * tracking. > > + */ > > +void vaddr_unpin_pages_dirty_lock(struct page **pages, unsigned long nr_pages, > > + struct vaddr_pin *vaddr_pin, bool make_dirty) > > +{ > > + __put_user_pages_dirty_lock(vaddr_pin, pages, nr_pages, make_dirty); > > +} > > +EXPORT_SYMBOL(vaddr_unpin_pages_dirty_lock); > > > > OK, whew, I'm glad to see the updated _dirty_lock() API used here. :) Yea this was pretty easy to change during the rebase. Again I'm kind of floating these quickly at this point. So sorry about the nits... Ira > > thanks, > -- > John Hubbard > NVIDIA
On Sun, Aug 11, 2019 at 04:07:23PM -0700, John Hubbard wrote: > On 8/9/19 3:58 PM, ira.weiny@intel.com wrote: > > From: Ira Weiny <ira.weiny@intel.com> > > > > The addition of FOLL_LONGTERM has taken on additional meaning for CMA > > pages. > > > > In addition subsystems such as RDMA require new information to be passed > > to the GUP interface to track file owning information. As such a simple > > FOLL_LONGTERM flag is no longer sufficient for these users to pin pages. > > > > Introduce a new GUP like call which takes the newly introduced vaddr_pin > > information. Failure to pass the vaddr_pin object back to a vaddr_put* > > call will result in a failure if pins were created on files during the > > pin operation. > > > > Signed-off-by: Ira Weiny <ira.weiny@intel.com> > > > > I'm creating a new call site conversion series, to replace the > "put_user_pages(): miscellaneous call sites" series. This uses > vaddr_pin_pages*() where appropriate. So it's based on your series here. > > btw, while doing that, I noticed one more typo while re-reading some of the comments. > Thought you probably want to collect them all for the next spin. Below... > > > --- > > Changes from list: > > Change to vaddr_put_pages_dirty_lock > > Change to vaddr_unpin_pages_dirty_lock > > > > include/linux/mm.h | 5 ++++ > > mm/gup.c | 59 ++++++++++++++++++++++++++++++++++++++++++++++ > > 2 files changed, 64 insertions(+) > > > > diff --git a/include/linux/mm.h b/include/linux/mm.h > > index 657c947bda49..90c5802866df 100644 > > --- a/include/linux/mm.h > > +++ b/include/linux/mm.h > > @@ -1603,6 +1603,11 @@ int account_locked_vm(struct mm_struct *mm, unsigned long pages, bool inc); > > int __account_locked_vm(struct mm_struct *mm, unsigned long pages, bool inc, > > struct task_struct *task, bool bypass_rlim); > > > > +long vaddr_pin_pages(unsigned long addr, unsigned long nr_pages, > > + unsigned int gup_flags, struct page **pages, > > + struct vaddr_pin *vaddr_pin); > > +void vaddr_unpin_pages_dirty_lock(struct page **pages, unsigned long nr_pages, > > + struct vaddr_pin *vaddr_pin, bool make_dirty); > > bool mapping_inode_has_layout(struct vaddr_pin *vaddr_pin, struct page *page); > > > > /* Container for pinned pfns / pages */ > > diff --git a/mm/gup.c b/mm/gup.c > > index eeaa0ddd08a6..6d23f70d7847 100644 > > --- a/mm/gup.c > > +++ b/mm/gup.c > > @@ -2536,3 +2536,62 @@ int get_user_pages_fast(unsigned long start, int nr_pages, > > return ret; > > } > > EXPORT_SYMBOL_GPL(get_user_pages_fast); > > + > > +/** > > + * vaddr_pin_pages pin pages by virtual address and return the pages to the > > + * user. > > + * > > + * @addr, start address > > + * @nr_pages, number of pages to pin > > + * @gup_flags, flags to use for the pin > > + * @pages, array of pages returned > > + * @vaddr_pin, initalized meta information this pin is to be associated > > Typo: > initialized Thanks fixed. Ira > > > thanks, > -- > John Hubbard > NVIDIA
On 8/12/19 2:00 PM, Ira Weiny wrote: > On Fri, Aug 09, 2019 at 05:09:54PM -0700, John Hubbard wrote: >> On 8/9/19 3:58 PM, ira.weiny@intel.com wrote: >>> From: Ira Weiny <ira.weiny@intel.com> ... > > At one point I wanted to (and had in my tree) a new flag but I went away from > it. Prior to the discussion on mlock last week I did not think we needed it. > But I'm ok to add it back in. > > I was not ignoring the idea for this RFC I just wanted to get this out there > for people to see. I see that you threw out a couple of patches which add this > flag in. > > FWIW, I think it would be good to differentiate between an indefinite pinned > page vs a referenced "gotten" page. > > What you and I have been working on is the former. So it would be easy to > change your refcounting patches to simply key off of FOLL_PIN. > > Would you like me to add in your FOLL_PIN patches to this series? Sure, that would be perfect. They don't make any sense on their own, and it's all part of the same design idea. thanks,
On Mon, Aug 12, 2019 at 09:28:14AM -0300, Jason Gunthorpe wrote: > On Fri, Aug 09, 2019 at 03:58:29PM -0700, ira.weiny@intel.com wrote: > > From: Ira Weiny <ira.weiny@intel.com> > > > > The addition of FOLL_LONGTERM has taken on additional meaning for CMA > > pages. > > > > In addition subsystems such as RDMA require new information to be passed > > to the GUP interface to track file owning information. As such a simple > > FOLL_LONGTERM flag is no longer sufficient for these users to pin pages. > > > > Introduce a new GUP like call which takes the newly introduced vaddr_pin > > information. Failure to pass the vaddr_pin object back to a vaddr_put* > > call will result in a failure if pins were created on files during the > > pin operation. > > Is this a 'vaddr' in the traditional sense, ie does it work with > something returned by valloc? ...or malloc in user space, yes. I think the idea is that it is a user virtual address. > > Maybe another name would be better? Maybe, the name I had was way worse... So I'm not even going to admit to it... ;-) So I'm open to suggestions. Jan gave me this one, so I figured it was safer to suggest it... :-D > > I also wish GUP like functions took in a 'void __user *' instead of > the unsigned long to make this clear :\ Not a bad idea. But I only see a couple of call sites who actually use a 'void __user *' to pass into GUP... :-/ For RDMA the address is _never_ a 'void __user *' AFAICS. For the new API, it may be tractable to force users to cast to 'void __user *' but it is not going to provide any type safety. But it is easy to change in this series. What do others think? Ira > > Jason >
On Mon, Aug 12, 2019 at 02:48:55PM -0700, Ira Weiny wrote: > On Mon, Aug 12, 2019 at 09:28:14AM -0300, Jason Gunthorpe wrote: > > On Fri, Aug 09, 2019 at 03:58:29PM -0700, ira.weiny@intel.com wrote: > > > From: Ira Weiny <ira.weiny@intel.com> > > > > > > The addition of FOLL_LONGTERM has taken on additional meaning for CMA > > > pages. > > > > > > In addition subsystems such as RDMA require new information to be passed > > > to the GUP interface to track file owning information. As such a simple > > > FOLL_LONGTERM flag is no longer sufficient for these users to pin pages. > > > > > > Introduce a new GUP like call which takes the newly introduced vaddr_pin > > > information. Failure to pass the vaddr_pin object back to a vaddr_put* > > > call will result in a failure if pins were created on files during the > > > pin operation. > > > > Is this a 'vaddr' in the traditional sense, ie does it work with > > something returned by valloc? > > ...or malloc in user space, yes. I think the idea is that it is a user virtual > address. valloc is a kernel call > So I'm open to suggestions. Jan gave me this one, so I figured it was safer to > suggest it... Should have the word user in it, imho > > I also wish GUP like functions took in a 'void __user *' instead of > > the unsigned long to make this clear :\ > > Not a bad idea. But I only see a couple of call sites who actually use a 'void > __user *' to pass into GUP... :-/ > > For RDMA the address is _never_ a 'void __user *' AFAICS. That is actually a bug, converting from u64 to a 'user VA' needs to go through u64_to_user_ptr(). Jason
On Tue, Aug 13, 2019 at 08:47:06AM -0300, Jason Gunthorpe wrote: > On Mon, Aug 12, 2019 at 02:48:55PM -0700, Ira Weiny wrote: > > On Mon, Aug 12, 2019 at 09:28:14AM -0300, Jason Gunthorpe wrote: > > > On Fri, Aug 09, 2019 at 03:58:29PM -0700, ira.weiny@intel.com wrote: > > > > From: Ira Weiny <ira.weiny@intel.com> > > > > > > > > The addition of FOLL_LONGTERM has taken on additional meaning for CMA > > > > pages. > > > > > > > > In addition subsystems such as RDMA require new information to be passed > > > > to the GUP interface to track file owning information. As such a simple > > > > FOLL_LONGTERM flag is no longer sufficient for these users to pin pages. > > > > > > > > Introduce a new GUP like call which takes the newly introduced vaddr_pin > > > > information. Failure to pass the vaddr_pin object back to a vaddr_put* > > > > call will result in a failure if pins were created on files during the > > > > pin operation. > > > > > > Is this a 'vaddr' in the traditional sense, ie does it work with > > > something returned by valloc? > > > > ...or malloc in user space, yes. I think the idea is that it is a user virtual > > address. > > valloc is a kernel call Oh... I thought you meant this: https://linux.die.net/man/3/valloc > > > So I'm open to suggestions. Jan gave me this one, so I figured it was safer to > > suggest it... > > Should have the word user in it, imho Fair enough... user_addr_pin_pages(void __user * addr, ...) ? uaddr_pin_pages(void __user * addr, ...) ? I think I like uaddr... > > > > I also wish GUP like functions took in a 'void __user *' instead of > > > the unsigned long to make this clear :\ > > > > Not a bad idea. But I only see a couple of call sites who actually use a 'void > > __user *' to pass into GUP... :-/ > > > > For RDMA the address is _never_ a 'void __user *' AFAICS. > > That is actually a bug, converting from u64 to a 'user VA' needs to go > through u64_to_user_ptr(). Fair enough. But there are a lot of call sites throughout the kernel who have the same bug... I'm ok with forcing u64_to_user_ptr() to use this new call if others are. Ira
On 8/13/19 10:46 AM, Ira Weiny wrote: > On Tue, Aug 13, 2019 at 08:47:06AM -0300, Jason Gunthorpe wrote: >> On Mon, Aug 12, 2019 at 02:48:55PM -0700, Ira Weiny wrote: >>> On Mon, Aug 12, 2019 at 09:28:14AM -0300, Jason Gunthorpe wrote: >>>> On Fri, Aug 09, 2019 at 03:58:29PM -0700, ira.weiny@intel.com wrote: >>>>> From: Ira Weiny <ira.weiny@intel.com> ... >>> So I'm open to suggestions. Jan gave me this one, so I figured it was safer to >>> suggest it... >> >> Should have the word user in it, imho > > Fair enough... > > user_addr_pin_pages(void __user * addr, ...) ? > > uaddr_pin_pages(void __user * addr, ...) ? > > I think I like uaddr... > Better to spell out "user". "u" prefixes are used for "unsigned" and it is just too ambiguous here. Maybe: vaddr_pin_user_pages() ...which also sounds close enough to get_user_pages() that a bit of history and continuity is preserved, too. thanks,
diff --git a/include/linux/mm.h b/include/linux/mm.h index 657c947bda49..90c5802866df 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -1603,6 +1603,11 @@ int account_locked_vm(struct mm_struct *mm, unsigned long pages, bool inc); int __account_locked_vm(struct mm_struct *mm, unsigned long pages, bool inc, struct task_struct *task, bool bypass_rlim); +long vaddr_pin_pages(unsigned long addr, unsigned long nr_pages, + unsigned int gup_flags, struct page **pages, + struct vaddr_pin *vaddr_pin); +void vaddr_unpin_pages_dirty_lock(struct page **pages, unsigned long nr_pages, + struct vaddr_pin *vaddr_pin, bool make_dirty); bool mapping_inode_has_layout(struct vaddr_pin *vaddr_pin, struct page *page); /* Container for pinned pfns / pages */ diff --git a/mm/gup.c b/mm/gup.c index eeaa0ddd08a6..6d23f70d7847 100644 --- a/mm/gup.c +++ b/mm/gup.c @@ -2536,3 +2536,62 @@ int get_user_pages_fast(unsigned long start, int nr_pages, return ret; } EXPORT_SYMBOL_GPL(get_user_pages_fast); + +/** + * vaddr_pin_pages pin pages by virtual address and return the pages to the + * user. + * + * @addr, start address + * @nr_pages, number of pages to pin + * @gup_flags, flags to use for the pin + * @pages, array of pages returned + * @vaddr_pin, initalized meta information this pin is to be associated + * with. + * + * NOTE regarding vaddr_pin: + * + * Some callers can share pins via file descriptors to other processes. + * Callers such as this should use the f_owner field of vaddr_pin to indicate + * the file the fd points to. All other callers should use the mm this pin is + * being made against. Usually "current->mm". + * + * Expects mmap_sem to be read locked. + */ +long vaddr_pin_pages(unsigned long addr, unsigned long nr_pages, + unsigned int gup_flags, struct page **pages, + struct vaddr_pin *vaddr_pin) +{ + long ret; + + gup_flags |= FOLL_LONGTERM; + + if (!vaddr_pin || (!vaddr_pin->mm && !vaddr_pin->f_owner)) + return -EINVAL; + + ret = __gup_longterm_locked(current, + vaddr_pin->mm, + addr, nr_pages, + pages, NULL, gup_flags, + vaddr_pin); + return ret; +} +EXPORT_SYMBOL(vaddr_pin_pages); + +/** + * vaddr_unpin_pages_dirty_lock - counterpart to vaddr_pin_pages + * + * @pages, array of pages returned + * @nr_pages, number of pages in pages + * @vaddr_pin, same information passed to vaddr_pin_pages + * @make_dirty: whether to mark the pages dirty + * + * The semantics are similar to put_user_pages_dirty_lock but a vaddr_pin used + * in vaddr_pin_pages should be passed back into this call for propper + * tracking. + */ +void vaddr_unpin_pages_dirty_lock(struct page **pages, unsigned long nr_pages, + struct vaddr_pin *vaddr_pin, bool make_dirty) +{ + __put_user_pages_dirty_lock(vaddr_pin, pages, nr_pages, make_dirty); +} +EXPORT_SYMBOL(vaddr_unpin_pages_dirty_lock);