Message ID | 20190211201643.7599-1-ira.weiny@intel.com
---|---
Series | Add gup fast + longterm and use it in HFI1
On Mon, 11 Feb 2019, ira.weiny@intel.com wrote:

> Ira Weiny (3):
>   mm/gup: Change "write" parameter to flags
>   mm/gup: Introduce get_user_pages_fast_longterm()
>   IB/HFI1: Use new get_user_pages_fast_longterm()

Out of curiosity, are you planning on having all rdma drivers
use get_user_pages_fast_longterm()? Ie:

hw/mthca/mthca_memfree.c:  ret = get_user_pages_fast(uaddr & PAGE_MASK, 1, FOLL_WRITE, pages);
hw/qib/qib_user_sdma.c:    ret = get_user_pages_fast(addr, j, 0, pages);

Thanks,
Davidlohr
On Mon, Feb 11, 2019 at 12:16:40PM -0800, ira.weiny@intel.com wrote:
> From: Ira Weiny <ira.weiny@intel.com>
>
> NOTE: This series depends on my clean up patch to remove the write parameter
> from gup_fast_permitted()[1]
>
> HFI1 uses get_user_pages_fast() due to its performance advantages. Like RDMA,
> HFI1 pages can be held for a significant time. But get_user_pages_fast() does
> not protect against mapping of FS DAX pages.

If HFI1 can use the _fast variant, can't all the general RDMA stuff use it too?

What is the guidance on when fast vs not fast should be used?

Jason
On Mon, Feb 11, 2019 at 12:34:17PM -0800, Davidlohr Bueso wrote:
> On Mon, 11 Feb 2019, ira.weiny@intel.com wrote:
> > Ira Weiny (3):
> >   mm/gup: Change "write" parameter to flags
> >   mm/gup: Introduce get_user_pages_fast_longterm()
> >   IB/HFI1: Use new get_user_pages_fast_longterm()
>
> Out of curiosity, are you planning on having all rdma drivers
> use get_user_pages_fast_longterm()? Ie:
>
> hw/mthca/mthca_memfree.c:  ret = get_user_pages_fast(uaddr & PAGE_MASK, 1, FOLL_WRITE, pages);

This one is certainly a mistake - this should be done with a umem.

Jason
> On Mon, Feb 11, 2019 at 12:16:40PM -0800, ira.weiny@intel.com wrote:
> > From: Ira Weiny <ira.weiny@intel.com>
> >
> > NOTE: This series depends on my clean up patch to remove the write
> > parameter from gup_fast_permitted()[1]
> >
> > HFI1 uses get_user_pages_fast() due to its performance advantages.
> > Like RDMA, HFI1 pages can be held for a significant time. But
> > get_user_pages_fast() does not protect against mapping of FS DAX pages.
>
> If HFI1 can use the _fast variant, can't all the general RDMA stuff use it too?
>
> What is the guidance on when fast vs not fast should be used?

Right now it can't because it holds mmap_sem across the call. Once Shiraz's
patches removing the umem->hugetlb flag are accepted, I think we can change
umem.c.

Also, it specifies FOLL_FORCE, which can't currently be specified with GUP
fast. One idea I had was to change get_user_pages_fast() to use gup_flags
instead of a single write flag, but that proved to be a very big cosmetic
change across a lot of callers, so I went this way.

Ira

> Jason
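For reference, the prototype change described above would look roughly like
this; the converted call site is purely illustrative, not a specific caller
in the tree:

    /* Current prototype: a single int "write" argument. */
    int get_user_pages_fast(unsigned long start, int nr_pages,
                            int write, struct page **pages);

    /*
     * Sketch of the proposed prototype: take the full gup_flags so a
     * caller can request FOLL_WRITE, FOLL_LONGTERM, etc. on the fast path.
     */
    int get_user_pages_fast(unsigned long start, int nr_pages,
                            unsigned int gup_flags, struct page **pages);

    /* A caller that used to pass write = 1 would then become: */
    ret = get_user_pages_fast(addr, npages, FOLL_WRITE, pages);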
On Mon, Feb 11, 2019 at 12:34:17PM -0800, Davidlohr Bueso wrote:
> On Mon, 11 Feb 2019, ira.weiny@intel.com wrote:
> > Ira Weiny (3):
> >   mm/gup: Change "write" parameter to flags
> >   mm/gup: Introduce get_user_pages_fast_longterm()
> >   IB/HFI1: Use new get_user_pages_fast_longterm()
>
> Out of curiosity, are you planning on having all rdma drivers
> use get_user_pages_fast_longterm()? Ie:
>
> hw/mthca/mthca_memfree.c:  ret = get_user_pages_fast(uaddr & PAGE_MASK, 1, FOLL_WRITE, pages);
> hw/qib/qib_user_sdma.c:    ret = get_user_pages_fast(addr, j, 0, pages);

I missed that when I changed the other qib call to longterm... :-(

Yes, both of these should be changed, although I need to look into Jason's
comment WRT the mthca call.

Ira

> Thanks,
> Davidlohr
On Mon, Feb 11, 2019 at 01:47:10PM -0700, Jason Gunthorpe wrote:
> On Mon, Feb 11, 2019 at 12:34:17PM -0800, Davidlohr Bueso wrote:
> > On Mon, 11 Feb 2019, ira.weiny@intel.com wrote:
> > > Ira Weiny (3):
> > >   mm/gup: Change "write" parameter to flags
> > >   mm/gup: Introduce get_user_pages_fast_longterm()
> > >   IB/HFI1: Use new get_user_pages_fast_longterm()
> >
> > Out of curiosity, are you planning on having all rdma drivers
> > use get_user_pages_fast_longterm()? Ie:
> >
> > hw/mthca/mthca_memfree.c:  ret = get_user_pages_fast(uaddr & PAGE_MASK, 1, FOLL_WRITE, pages);
>
> This one is certainly a mistake - this should be done with a umem.

It looks like this is mapping a page allocated by user space for a
doorbell?!?!  And this is supporting the old memory-free cards.

I remember that these cards used system memory instead of memory on the
cards, but why it expects user space to allocate that memory, and how it all
works, is way too old for me to even try to remember.

This does not seem to be allocating memory regions. Jason, do you want a
patch to just convert these calls and consider it legacy code?

Ira

> Jason
On Mon, Feb 11, 2019 at 01:42:57PM -0800, Ira Weiny wrote:
> On Mon, Feb 11, 2019 at 01:47:10PM -0700, Jason Gunthorpe wrote:
> > On Mon, Feb 11, 2019 at 12:34:17PM -0800, Davidlohr Bueso wrote:
> > > On Mon, 11 Feb 2019, ira.weiny@intel.com wrote:
> > > > Ira Weiny (3):
> > > >   mm/gup: Change "write" parameter to flags
> > > >   mm/gup: Introduce get_user_pages_fast_longterm()
> > > >   IB/HFI1: Use new get_user_pages_fast_longterm()
> > >
> > > Out of curiosity, are you planning on having all rdma drivers
> > > use get_user_pages_fast_longterm()? Ie:
> > >
> > > hw/mthca/mthca_memfree.c:  ret = get_user_pages_fast(uaddr & PAGE_MASK, 1, FOLL_WRITE, pages);
> >
> > This one is certainly a mistake - this should be done with a umem.
>
> It looks like this is mapping a page allocated by user space for a
> doorbell?!?!

Many drivers do this; the 'doorbell' is a PCI -> CPU thing of some sort.

> This does not seem to be allocating memory regions. Jason, do you
> want a patch to just convert these calls and consider it legacy
> code?

It needs to use umem like all the other drivers on this path.
Otherwise it doesn't get the page pinning logic right.

There is also something else rotten with these longterm callsites: they
seem to have very different ideas how to handle RLIMIT_MEMLOCK.

ie vfio doesn't even touch pinned_vm, and rdma is applying RLIMIT_MEMLOCK
to mm->pinned_vm, while vfio is using locked_vm. No idea which is right,
but they should be the same, and this pattern should probably be in core
code someplace.

Jason
On Mon, Feb 11, 2019 at 09:14:56PM +0000, Weiny, Ira wrote:
> > On Mon, Feb 11, 2019 at 12:16:40PM -0800, ira.weiny@intel.com wrote:
> > > From: Ira Weiny <ira.weiny@intel.com>
> > >
> > > NOTE: This series depends on my clean up patch to remove the write
> > > parameter from gup_fast_permitted()[1]
> > >
> > > HFI1 uses get_user_pages_fast() due to its performance advantages.
> > > Like RDMA, HFI1 pages can be held for a significant time. But
> > > get_user_pages_fast() does not protect against mapping of FS DAX pages.
> >
> > If HFI1 can use the _fast variant, can't all the general RDMA stuff use it too?
> >
> > What is the guidance on when fast vs not fast should be used?
>
> Right now it can't because it holds mmap_sem across the call. Once
> Shiraz's patches are accepted removing the umem->hugetlb flag I
> think we can change umem.c.

Okay, that makes sense; we should change it when Shiraz's patches are
merged.

> Also, it specifies FOLL_FORCE which can't currently be specified
> with gup fast. One idea I had was to change get_user_pages_fast()
> to use gup_flags instead of a single write flag. But that proved to
> be a very big cosmetic change across a lot of callers so I went this
> way.

I think you should do it.. :)

Jason
> On Mon, Feb 11, 2019 at 01:42:57PM -0800, Ira Weiny wrote:
> > On Mon, Feb 11, 2019 at 01:47:10PM -0700, Jason Gunthorpe wrote:
> > > On Mon, Feb 11, 2019 at 12:34:17PM -0800, Davidlohr Bueso wrote:
> > > > On Mon, 11 Feb 2019, ira.weiny@intel.com wrote:
> > > > > Ira Weiny (3):
> > > > >   mm/gup: Change "write" parameter to flags
> > > > >   mm/gup: Introduce get_user_pages_fast_longterm()
> > > > >   IB/HFI1: Use new get_user_pages_fast_longterm()
> > > >
> > > > Out of curiosity, are you planning on having all rdma drivers use
> > > > get_user_pages_fast_longterm()? Ie:
> > > >
> > > > hw/mthca/mthca_memfree.c:  ret = get_user_pages_fast(uaddr & PAGE_MASK, 1, FOLL_WRITE, pages);
> > >
> > > This one is certainly a mistake - this should be done with a umem.
> >
> > It looks like this is mapping a page allocated by user space for a
> > doorbell?!?!
>
> Many drivers do this, the 'doorbell' is a PCI -> CPU thing of some sort

My surprise is why does _userspace_ allocate this memory?

> > This does not seem to be allocating memory regions. Jason, do you
> > want a patch to just convert these calls and consider it legacy code?
>
> It needs to use umem like all the other drivers on this path.
> Otherwise it doesn't get the page pinning logic right

Not sure what you mean regarding the pinning logic?

> There is also something else rotten with these longterm callsites, they seem
> to have very different ideas how to handle RLIMIT_MEMLOCK.
>
> ie vfio doesn't even touch pinned_vm.. and rdma is applying
> RLIMIT_MEMLOCK to mm->pinned_vm, while vfio is using locked_vm.. No
> idea which is right, but they should be the same, and this pattern should
> probably be in core code someplace.

Neither do I. But AFAIK pinned_vm is a subset of locked_vm. So should we be
accounting both of the counters?

Ira
On Mon, Feb 11, 2019 at 10:40:02PM +0000, Weiny, Ira wrote:
> > Many drivers do this, the 'doorbell' is a PCI -> CPU thing of some sort
>
> My surprise is why does _userspace_ allocate this memory?

Well, userspace needs to read the memory, so either userspace allocates it
and the kernel GUPs it, or userspace mmaps a kernel page which was DMA
mapped.

The GUP version lets the doorbells have lower alignment than a PAGE, and
these RDMA drivers hard-require GUP->DMA to function.. So why not use a
umem here? It already has to work.

> > > This does not seem to be allocating memory regions. Jason, do you
> > > want a patch to just convert these calls and consider it legacy code?
> >
> > It needs to use umem like all the other drivers on this path.
> > Otherwise it doesn't get the page pinning logic right
>
> Not sure what you mean regarding the pinning logic?

The RLIMIT_MEMLOCK stuff and so on.

> > There is also something else rotten with these longterm callsites,
> > they seem to have very different ideas how to handle
> > RLIMIT_MEMLOCK.
> >
> > ie vfio doesn't even touch pinned_vm.. and rdma is applying
> > RLIMIT_MEMLOCK to mm->pinned_vm, while vfio is using locked_vm.. No
> > idea which is right, but they should be the same, and this pattern should
> > probably be in core code someplace.
>
> Neither do I. But AFAIK pinned_vm is a subset of locked_vm.

I thought so..

> So should we be accounting both of the counters?

Someone should check :) Since we don't increment locked_vm when we
increment pinned_vm, and vfio only checks RLIMIT_MEMLOCK against locked_vm,
one can certainly exceed the limit by mixing and matching RDMA and VFIO
pins in the same process.

Sure seems like there is a bug somewhere here.

Jason
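For illustration only, the two accounting patterns being compared above look
roughly like this; this is a simplified sketch with locking, atomics, and
error unwinding elided, not the actual driver code:

    unsigned long lock_limit = rlimit(RLIMIT_MEMLOCK) >> PAGE_SHIFT;

    /* RDMA umem style: charge and limit against mm->pinned_vm. */
    if (mm->pinned_vm + npages > lock_limit && !capable(CAP_IPC_LOCK))
            return -ENOMEM;
    mm->pinned_vm += npages;

    /* VFIO type1 style: charge and limit against mm->locked_vm. */
    if (mm->locked_vm + npages > lock_limit && !capable(CAP_IPC_LOCK))
            return -ENOMEM;
    mm->locked_vm += npages;

Because each subsystem checks the limit against a different counter and
never updates the other one, pinning through both paths in the same process
can exceed RLIMIT_MEMLOCK, which is the mismatch described above.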
From: Ira Weiny <ira.weiny@intel.com>
NOTE: This series depends on my clean up patch to remove the write parameter
from gup_fast_permitted()[1]
HFI1, qib, and mthca use get_user_pages_fast() due to its performance
advantages. These pages can be held for a significant time. But
get_user_pages_fast() does not protect against mapping of FS DAX pages.
Introduce FOLL_LONGTERM and use this flag in get_user_pages_fast(), which
retains the performance while also adding the FS DAX checks. XDP has also
shown interest in using this functionality.[2]
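As a sketch, a call site that pins pages for a long-lived mapping would look
something like the following; the caller shown is illustrative, not a
specific driver:

    /*
     * Pin user pages through the fast path while also applying the
     * long-term (FS DAX) checks.  Returns the number of pages pinned,
     * which may be fewer than requested, or a negative errno.
     */
    ret = get_user_pages_fast(vaddr, npages,
                              FOLL_WRITE | FOLL_LONGTERM, pages);
    if (ret < 0)
            return ret;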
In addition, we change get_user_pages() to use the new FOLL_LONGTERM flag and
remove the specialized get_user_pages_longterm() call.
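For existing get_user_pages_longterm() users the conversion is mechanical; an
illustrative before/after (the call shown is not a specific caller in the
tree):

    /* Before: the specialized long-term helper. */
    ret = get_user_pages_longterm(vaddr, npages, FOLL_WRITE, pages, NULL);

    /* After: the regular GUP call with FOLL_LONGTERM added to gup_flags. */
    ret = get_user_pages(vaddr, npages, FOLL_WRITE | FOLL_LONGTERM,
                         pages, NULL);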
[1] https://lkml.org/lkml/2019/2/11/237
[2] https://lkml.org/lkml/2019/2/11/1789
Ira Weiny (7):
mm/gup: Replace get_user_pages_longterm() with FOLL_LONGTERM
mm/gup: Change write parameter to flags in fast walk
mm/gup: Change GUP fast to use flags rather than a write 'bool'
mm/gup: Add FOLL_LONGTERM capability to GUP fast
IB/hfi1: Use the new FOLL_LONGTERM flag to get_user_pages_fast()
IB/qib: Use the new FOLL_LONGTERM flag to get_user_pages_fast()
IB/mthca: Use the new FOLL_LONGTERM flag to get_user_pages_fast()
arch/mips/mm/gup.c | 11 +-
arch/powerpc/kvm/book3s_64_mmu_hv.c | 4 +-
arch/powerpc/kvm/e500_mmu.c | 2 +-
arch/powerpc/mm/mmu_context_iommu.c | 4 +-
arch/s390/kvm/interrupt.c | 2 +-
arch/s390/mm/gup.c | 12 +-
arch/sh/mm/gup.c | 11 +-
arch/sparc/mm/gup.c | 9 +-
arch/x86/kvm/paging_tmpl.h | 2 +-
arch/x86/kvm/svm.c | 2 +-
drivers/fpga/dfl-afu-dma-region.c | 2 +-
drivers/gpu/drm/via/via_dmablit.c | 3 +-
drivers/infiniband/core/umem.c | 5 +-
drivers/infiniband/hw/hfi1/user_pages.c | 5 +-
drivers/infiniband/hw/mthca/mthca_memfree.c | 3 +-
drivers/infiniband/hw/qib/qib_user_pages.c | 8 +-
drivers/infiniband/hw/qib/qib_user_sdma.c | 2 +-
drivers/infiniband/hw/usnic/usnic_uiom.c | 9 +-
drivers/media/v4l2-core/videobuf-dma-sg.c | 6 +-
drivers/misc/genwqe/card_utils.c | 2 +-
drivers/misc/vmw_vmci/vmci_host.c | 2 +-
drivers/misc/vmw_vmci/vmci_queue_pair.c | 6 +-
drivers/platform/goldfish/goldfish_pipe.c | 3 +-
drivers/rapidio/devices/rio_mport_cdev.c | 4 +-
drivers/sbus/char/oradax.c | 2 +-
drivers/scsi/st.c | 3 +-
drivers/staging/gasket/gasket_page_table.c | 4 +-
drivers/tee/tee_shm.c | 2 +-
drivers/vfio/vfio_iommu_spapr_tce.c | 3 +-
drivers/vfio/vfio_iommu_type1.c | 3 +-
drivers/vhost/vhost.c | 2 +-
drivers/video/fbdev/pvr2fb.c | 2 +-
drivers/virt/fsl_hypervisor.c | 2 +-
drivers/xen/gntdev.c | 2 +-
fs/orangefs/orangefs-bufmap.c | 2 +-
include/linux/mm.h | 17 +-
kernel/futex.c | 2 +-
lib/iov_iter.c | 7 +-
mm/gup.c | 220 ++++++++++++--------
mm/gup_benchmark.c | 5 +-
mm/util.c | 8 +-
net/ceph/pagevec.c | 2 +-
net/rds/info.c | 2 +-
net/rds/rdma.c | 3 +-
44 files changed, 232 insertions(+), 180 deletions(-)
From: Ira Weiny <ira.weiny@intel.com>

NOTE: This series depends on my clean up patch to remove the write parameter
from gup_fast_permitted()[1]

HFI1 uses get_user_pages_fast() due to its performance advantages. Like RDMA,
HFI1 pages can be held for a significant time. But get_user_pages_fast() does
not protect against mapping of FS DAX pages.

Introduce a get_user_pages_fast_longterm() which retains the performance
while also adding the FS DAX checks. XDP has also shown interest in using
this functionality.[2]

[1] https://lkml.org/lkml/2019/2/11/237
[2] https://lkml.org/lkml/2019/2/11/1789

Ira Weiny (3):
  mm/gup: Change "write" parameter to flags
  mm/gup: Introduce get_user_pages_fast_longterm()
  IB/HFI1: Use new get_user_pages_fast_longterm()

 drivers/infiniband/hw/hfi1/user_pages.c |   2 +-
 include/linux/mm.h                      |   8 ++
 mm/gup.c                                | 152 ++++++++++++++++--------
 3 files changed, 114 insertions(+), 48 deletions(-)
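For orientation, one plausible shape for the proposed helper is sketched
below. The actual prototype lives in patch 2 of the series and is not
reproduced here, so both the signature and the call site are hypothetical:

    /*
     * Hypothetical prototype: a fast-path GUP that also performs the
     * long-term / FS DAX checks before taking page references.
     */
    int get_user_pages_fast_longterm(unsigned long start, int nr_pages,
                                     unsigned int gup_flags,
                                     struct page **pages);

    /* Hypothetical caller in a driver that holds the pages long term. */
    ret = get_user_pages_fast_longterm(vaddr, npages, FOLL_WRITE, pages);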