Message ID | 20231012082921.546114-1-max7255@meta.com (mailing list archive) |
---|---|
State | Changes Requested |
Headers | show |
Series | [v2] IB: rework memlock limit handling code | expand |
On Thu, Oct 12, 2023 at 01:29:21AM -0700, Maxim Samoylov wrote: > This patch provides the uniform handling for RLIM_INFINITY value > across the infiniband/rdma subsystem. > > Currently in some cases the infinity constant is treated > as an actual limit value, which could be misleading. > > Let's also provide the single helper to check against process > MEMLOCK limit while registering user memory region mappings. > > Signed-off-by: Maxim Samoylov <max7255@meta.com> > --- > > v1 -> v2: rewritten commit message, rebased on recent upstream > > drivers/infiniband/core/umem.c | 7 ++----- > drivers/infiniband/hw/qib/qib_user_pages.c | 7 +++---- > drivers/infiniband/hw/usnic/usnic_uiom.c | 6 ++---- > drivers/infiniband/sw/siw/siw_mem.c | 6 +++--- > drivers/infiniband/sw/siw/siw_verbs.c | 23 ++++++++++------------ > include/rdma/ib_umem.h | 11 +++++++++++ > 6 files changed, 31 insertions(+), 29 deletions(-) <...> > @@ -1321,8 +1322,8 @@ struct ib_mr *siw_reg_user_mr(struct ib_pd *pd, u64 start, u64 len, > struct siw_umem *umem = NULL; > struct siw_ureq_reg_mr ureq; > struct siw_device *sdev = to_siw_dev(pd->device); > - > - unsigned long mem_limit = rlimit(RLIMIT_MEMLOCK); > + unsigned long num_pages = > + (PAGE_ALIGN(len + (start & ~PAGE_MASK))) >> PAGE_SHIFT; > int rv; > > siw_dbg_pd(pd, "start: 0x%pK, va: 0x%pK, len: %llu\n", > @@ -1338,19 +1339,15 @@ struct ib_mr *siw_reg_user_mr(struct ib_pd *pd, u64 start, u64 len, > rv = -EINVAL; > goto err_out; > } > - if (mem_limit != RLIM_INFINITY) { > - unsigned long num_pages = > - (PAGE_ALIGN(len + (start & ~PAGE_MASK))) >> PAGE_SHIFT; > - mem_limit >>= PAGE_SHIFT; > > - if (num_pages > mem_limit - current->mm->locked_vm) { > - siw_dbg_pd(pd, "pages req %lu, max %lu, lock %lu\n", > - num_pages, mem_limit, > - current->mm->locked_vm); > - rv = -ENOMEM; > - goto err_out; > - } > + if (!ib_umem_check_rlimit_memlock(num_pages + current->mm->locked_vm)) { > + siw_dbg_pd(pd, "pages req %lu, max %lu, lock %lu\n", > + num_pages, rlimit(RLIMIT_MEMLOCK), > + current->mm->locked_vm); > + rv = -ENOMEM; > + goto err_out; > } Sorry for late response, but why does this hunk exist in first place? > + > umem = siw_umem_get(start, len, ib_access_writable(rights)); This should be ib_umem_get(). Thanks
On 10/15/23 17:19, Leon Romanovsky wrote: > On Thu, Oct 12, 2023 at 01:29:21AM -0700, Maxim Samoylov wrote: >> This patch provides the uniform handling for RLIM_INFINITY value >> across the infiniband/rdma subsystem. >> >> Currently in some cases the infinity constant is treated >> as an actual limit value, which could be misleading. >> >> Let's also provide the single helper to check against process >> MEMLOCK limit while registering user memory region mappings. >> >> Signed-off-by: Maxim Samoylov<max7255@meta.com> >> --- >> >> v1 -> v2: rewritten commit message, rebased on recent upstream >> >> drivers/infiniband/core/umem.c | 7 ++----- >> drivers/infiniband/hw/qib/qib_user_pages.c | 7 +++---- >> drivers/infiniband/hw/usnic/usnic_uiom.c | 6 ++---- >> drivers/infiniband/sw/siw/siw_mem.c | 6 +++--- >> drivers/infiniband/sw/siw/siw_verbs.c | 23 ++++++++++------------ >> include/rdma/ib_umem.h | 11 +++++++++++ >> 6 files changed, 31 insertions(+), 29 deletions(-) > <...> > >> @@ -1321,8 +1322,8 @@ struct ib_mr *siw_reg_user_mr(struct ib_pd *pd, u64 start, u64 len, >> struct siw_umem *umem = NULL; >> struct siw_ureq_reg_mr ureq; >> struct siw_device *sdev = to_siw_dev(pd->device); >> - >> - unsigned long mem_limit = rlimit(RLIMIT_MEMLOCK); >> + unsigned long num_pages = >> + (PAGE_ALIGN(len + (start & ~PAGE_MASK))) >> PAGE_SHIFT; >> int rv; >> >> siw_dbg_pd(pd, "start: 0x%pK, va: 0x%pK, len: %llu\n", >> @@ -1338,19 +1339,15 @@ struct ib_mr *siw_reg_user_mr(struct ib_pd *pd, u64 start, u64 len, >> rv = -EINVAL; >> goto err_out; >> } >> - if (mem_limit != RLIM_INFINITY) { >> - unsigned long num_pages = >> - (PAGE_ALIGN(len + (start & ~PAGE_MASK))) >> PAGE_SHIFT; >> - mem_limit >>= PAGE_SHIFT; >> >> - if (num_pages > mem_limit - current->mm->locked_vm) { >> - siw_dbg_pd(pd, "pages req %lu, max %lu, lock %lu\n", >> - num_pages, mem_limit, >> - current->mm->locked_vm); >> - rv = -ENOMEM; >> - goto err_out; >> - } >> + if (!ib_umem_check_rlimit_memlock(num_pages + current->mm->locked_vm)) { >> + siw_dbg_pd(pd, "pages req %lu, max %lu, lock %lu\n", >> + num_pages, rlimit(RLIMIT_MEMLOCK), >> + current->mm->locked_vm); >> + rv = -ENOMEM; >> + goto err_out; >> } > Sorry for late response, but why does this hunk exist in first place? > >> + >> umem = siw_umem_get(start, len, ib_access_writable(rights)); > This should be ib_umem_get(). IMO, it deserves a separate patch, and replace siw_umem_get with ib_umem_get is not straightforward given siw_mem has two types of memory (pbl and umem). Thanks, Guoqing
On Mon, Oct 23, 2023 at 09:40:16AM +0800, Guoqing Jiang wrote: > > > On 10/15/23 17:19, Leon Romanovsky wrote: > > On Thu, Oct 12, 2023 at 01:29:21AM -0700, Maxim Samoylov wrote: > > > This patch provides the uniform handling for RLIM_INFINITY value > > > across the infiniband/rdma subsystem. > > > > > > Currently in some cases the infinity constant is treated > > > as an actual limit value, which could be misleading. > > > > > > Let's also provide the single helper to check against process > > > MEMLOCK limit while registering user memory region mappings. > > > > > > Signed-off-by: Maxim Samoylov<max7255@meta.com> > > > --- > > > > > > v1 -> v2: rewritten commit message, rebased on recent upstream > > > > > > drivers/infiniband/core/umem.c | 7 ++----- > > > drivers/infiniband/hw/qib/qib_user_pages.c | 7 +++---- > > > drivers/infiniband/hw/usnic/usnic_uiom.c | 6 ++---- > > > drivers/infiniband/sw/siw/siw_mem.c | 6 +++--- > > > drivers/infiniband/sw/siw/siw_verbs.c | 23 ++++++++++------------ > > > include/rdma/ib_umem.h | 11 +++++++++++ > > > 6 files changed, 31 insertions(+), 29 deletions(-) > > <...> > > > > > @@ -1321,8 +1322,8 @@ struct ib_mr *siw_reg_user_mr(struct ib_pd *pd, u64 start, u64 len, > > > struct siw_umem *umem = NULL; > > > struct siw_ureq_reg_mr ureq; > > > struct siw_device *sdev = to_siw_dev(pd->device); > > > - > > > - unsigned long mem_limit = rlimit(RLIMIT_MEMLOCK); > > > + unsigned long num_pages = > > > + (PAGE_ALIGN(len + (start & ~PAGE_MASK))) >> PAGE_SHIFT; > > > int rv; > > > siw_dbg_pd(pd, "start: 0x%pK, va: 0x%pK, len: %llu\n", > > > @@ -1338,19 +1339,15 @@ struct ib_mr *siw_reg_user_mr(struct ib_pd *pd, u64 start, u64 len, > > > rv = -EINVAL; > > > goto err_out; > > > } > > > - if (mem_limit != RLIM_INFINITY) { > > > - unsigned long num_pages = > > > - (PAGE_ALIGN(len + (start & ~PAGE_MASK))) >> PAGE_SHIFT; > > > - mem_limit >>= PAGE_SHIFT; > > > - if (num_pages > mem_limit - current->mm->locked_vm) { > > > - siw_dbg_pd(pd, "pages req %lu, max %lu, lock %lu\n", > > > - num_pages, mem_limit, > > > - current->mm->locked_vm); > > > - rv = -ENOMEM; > > > - goto err_out; > > > - } > > > + if (!ib_umem_check_rlimit_memlock(num_pages + current->mm->locked_vm)) { > > > + siw_dbg_pd(pd, "pages req %lu, max %lu, lock %lu\n", > > > + num_pages, rlimit(RLIMIT_MEMLOCK), > > > + current->mm->locked_vm); > > > + rv = -ENOMEM; > > > + goto err_out; > > > } > > Sorry for late response, but why does this hunk exist in first place? > > > > > + > > > umem = siw_umem_get(start, len, ib_access_writable(rights)); > > This should be ib_umem_get(). > > IMO, it deserves a separate patch, and replace siw_umem_get with ib_umem_get > is not straightforward given siw_mem has two types of memory (pbl and umem). The thing is that once you convince yourself that SIW should use ib_umem_get(), the same question will arise for other parts of this patch where ib_umem_check_rlimit_memlock() is used. And if we eliminate them all, there won't be a need for this new API call at all. Thanks > > Thanks, > Guoqing
On 23/10/2023 07:52, Leon Romanovsky wrote: > On Mon, Oct 23, 2023 at 09:40:16AM +0800, Guoqing Jiang wrote: >> >> >> On 10/15/23 17:19, Leon Romanovsky wrote: >>> On Thu, Oct 12, 2023 at 01:29:21AM -0700, Maxim Samoylov wrote: >>>> This patch provides the uniform handling for RLIM_INFINITY value >>>> across the infiniband/rdma subsystem. >>>> >>>> Currently in some cases the infinity constant is treated >>>> as an actual limit value, which could be misleading. >>>> >>>> Let's also provide the single helper to check against process >>>> MEMLOCK limit while registering user memory region mappings. >>>> >>>> Signed-off-by: Maxim Samoylov<max7255@meta.com> >>>> --- >>>> >>>> v1 -> v2: rewritten commit message, rebased on recent upstream >>>> >>>> drivers/infiniband/core/umem.c | 7 ++----- >>>> drivers/infiniband/hw/qib/qib_user_pages.c | 7 +++---- >>>> drivers/infiniband/hw/usnic/usnic_uiom.c | 6 ++---- >>>> drivers/infiniband/sw/siw/siw_mem.c | 6 +++--- >>>> drivers/infiniband/sw/siw/siw_verbs.c | 23 ++++++++++------------ >>>> include/rdma/ib_umem.h | 11 +++++++++++ >>>> 6 files changed, 31 insertions(+), 29 deletions(-) >>> <...> >>> >>>> @@ -1321,8 +1322,8 @@ struct ib_mr *siw_reg_user_mr(struct ib_pd *pd, u64 start, u64 len, >>>> struct siw_umem *umem = NULL; >>>> struct siw_ureq_reg_mr ureq; >>>> struct siw_device *sdev = to_siw_dev(pd->device); >>>> - >>>> - unsigned long mem_limit = rlimit(RLIMIT_MEMLOCK); >>>> + unsigned long num_pages = >>>> + (PAGE_ALIGN(len + (start & ~PAGE_MASK))) >> PAGE_SHIFT; >>>> int rv; >>>> siw_dbg_pd(pd, "start: 0x%pK, va: 0x%pK, len: %llu\n", >>>> @@ -1338,19 +1339,15 @@ struct ib_mr *siw_reg_user_mr(struct ib_pd *pd, u64 start, u64 len, >>>> rv = -EINVAL; >>>> goto err_out; >>>> } >>>> - if (mem_limit != RLIM_INFINITY) { >>>> - unsigned long num_pages = >>>> - (PAGE_ALIGN(len + (start & ~PAGE_MASK))) >> PAGE_SHIFT; >>>> - mem_limit >>= PAGE_SHIFT; >>>> - if (num_pages > mem_limit - current->mm->locked_vm) { >>>> - siw_dbg_pd(pd, "pages req %lu, max %lu, lock %lu\n", >>>> - num_pages, mem_limit, >>>> - current->mm->locked_vm); >>>> - rv = -ENOMEM; >>>> - goto err_out; >>>> - } >>>> + if (!ib_umem_check_rlimit_memlock(num_pages + current->mm->locked_vm)) { >>>> + siw_dbg_pd(pd, "pages req %lu, max %lu, lock %lu\n", >>>> + num_pages, rlimit(RLIMIT_MEMLOCK), >>>> + current->mm->locked_vm); >>>> + rv = -ENOMEM; >>>> + goto err_out; >>>> } >>> Sorry for late response, but why does this hunk exist in first place? >>> Trailing newline, will definitely drop it. >>>> + >>>> umem = siw_umem_get(start, len, ib_access_writable(rights)); >>> This should be ib_umem_get(). >> >> IMO, it deserves a separate patch, and replace siw_umem_get with ib_umem_get >> is not straightforward given siw_mem has two types of memory (pbl and umem). > > The thing is that once you convince yourself that SIW should use ib_umem_get(), > the same question will arise for other parts of this patch where > ib_umem_check_rlimit_memlock() is used. > > And if we eliminate them all, there won't be a need for this new API call at all. > > Thanks > Hi! So, as for 31.10.2023 I still see siw_umem_get() call used in linux-rdma repo in "for-next" branch. AFAIU this helper call is used only in a single place and could potentially be replaced with ib_umem_get() as Leon suggests. But should we perform it right inside this memlock helper patch? I can submit later another patch with siw_umem_get() replaced if necessary. >> >> Thanks, >> Guoqing
On Tue, Oct 31, 2023 at 01:30:27PM +0000, Maxim Samoylov wrote: > On 23/10/2023 07:52, Leon Romanovsky wrote: > > On Mon, Oct 23, 2023 at 09:40:16AM +0800, Guoqing Jiang wrote: > >> > >> > >> On 10/15/23 17:19, Leon Romanovsky wrote: > >>> On Thu, Oct 12, 2023 at 01:29:21AM -0700, Maxim Samoylov wrote: > >>>> This patch provides the uniform handling for RLIM_INFINITY value > >>>> across the infiniband/rdma subsystem. > >>>> > >>>> Currently in some cases the infinity constant is treated > >>>> as an actual limit value, which could be misleading. > >>>> > >>>> Let's also provide the single helper to check against process > >>>> MEMLOCK limit while registering user memory region mappings. > >>>> > >>>> Signed-off-by: Maxim Samoylov<max7255@meta.com> > >>>> --- > >>>> > >>>> v1 -> v2: rewritten commit message, rebased on recent upstream > >>>> > >>>> drivers/infiniband/core/umem.c | 7 ++----- > >>>> drivers/infiniband/hw/qib/qib_user_pages.c | 7 +++---- > >>>> drivers/infiniband/hw/usnic/usnic_uiom.c | 6 ++---- > >>>> drivers/infiniband/sw/siw/siw_mem.c | 6 +++--- > >>>> drivers/infiniband/sw/siw/siw_verbs.c | 23 ++++++++++------------ > >>>> include/rdma/ib_umem.h | 11 +++++++++++ > >>>> 6 files changed, 31 insertions(+), 29 deletions(-) > >>> <...> > >>> > >>>> @@ -1321,8 +1322,8 @@ struct ib_mr *siw_reg_user_mr(struct ib_pd *pd, u64 start, u64 len, > >>>> struct siw_umem *umem = NULL; > >>>> struct siw_ureq_reg_mr ureq; > >>>> struct siw_device *sdev = to_siw_dev(pd->device); > >>>> - > >>>> - unsigned long mem_limit = rlimit(RLIMIT_MEMLOCK); > >>>> + unsigned long num_pages = > >>>> + (PAGE_ALIGN(len + (start & ~PAGE_MASK))) >> PAGE_SHIFT; > >>>> int rv; > >>>> siw_dbg_pd(pd, "start: 0x%pK, va: 0x%pK, len: %llu\n", > >>>> @@ -1338,19 +1339,15 @@ struct ib_mr *siw_reg_user_mr(struct ib_pd *pd, u64 start, u64 len, > >>>> rv = -EINVAL; > >>>> goto err_out; > >>>> } > >>>> - if (mem_limit != RLIM_INFINITY) { > >>>> - unsigned long num_pages = > >>>> - (PAGE_ALIGN(len + (start & ~PAGE_MASK))) >> PAGE_SHIFT; > >>>> - mem_limit >>= PAGE_SHIFT; > >>>> - if (num_pages > mem_limit - current->mm->locked_vm) { > >>>> - siw_dbg_pd(pd, "pages req %lu, max %lu, lock %lu\n", > >>>> - num_pages, mem_limit, > >>>> - current->mm->locked_vm); > >>>> - rv = -ENOMEM; > >>>> - goto err_out; > >>>> - } > >>>> + if (!ib_umem_check_rlimit_memlock(num_pages + current->mm->locked_vm)) { > >>>> + siw_dbg_pd(pd, "pages req %lu, max %lu, lock %lu\n", > >>>> + num_pages, rlimit(RLIMIT_MEMLOCK), > >>>> + current->mm->locked_vm); > >>>> + rv = -ENOMEM; > >>>> + goto err_out; > >>>> } > >>> Sorry for late response, but why does this hunk exist in first place? > >>> > > Trailing newline, will definitely drop it. > > >>>> + > >>>> umem = siw_umem_get(start, len, ib_access_writable(rights)); > >>> This should be ib_umem_get(). > >> > >> IMO, it deserves a separate patch, and replace siw_umem_get with ib_umem_get > >> is not straightforward given siw_mem has two types of memory (pbl and umem). > > > > The thing is that once you convince yourself that SIW should use ib_umem_get(), > > the same question will arise for other parts of this patch where > > ib_umem_check_rlimit_memlock() is used. > > > > And if we eliminate them all, there won't be a need for this new API call at all. > > > > Thanks > > > > Hi! > > So, as for 31.10.2023 I still see siw_umem_get() call used in > linux-rdma repo in "for-next" branch. I hoped to hear some feedback from Bernard and Dennis. > > AFAIU this helper call is used only in a single place and could > potentially be replaced with ib_umem_get() as Leon suggests. > > But should we perform it right inside this memlock helper patch? > > I can submit later another patch with siw_umem_get() replaced > if necessary. > > > >> > >> Thanks, > >> Guoqing >
> -----Original Message----- > From: Leon Romanovsky <leon@kernel.org> > Sent: Thursday, November 2, 2023 1:32 PM > To: Maxim Samoylov <max7255@meta.com>; Bernard Metzler > <BMT@zurich.ibm.com>; Dennis Dalessandro > <dennis.dalessandro@cornelisnetworks.com> > Cc: Guoqing Jiang <guoqing.jiang@linux.dev>; linux-rdma@vger.kernel.org; > Jason Gunthorpe <jgg@ziepe.ca>; Christian Benvenuti <benve@cisco.com>; > Vadim Fedorenko <vadim.fedorenko@linux.dev> > Subject: [EXTERNAL] Re: [PATCH v2] IB: rework memlock limit handling code > > On Tue, Oct 31, 2023 at 01:30:27PM +0000, Maxim Samoylov wrote: > > On 23/10/2023 07:52, Leon Romanovsky wrote: > > > On Mon, Oct 23, 2023 at 09:40:16AM +0800, Guoqing Jiang wrote: > > >> > > >> > > >> On 10/15/23 17:19, Leon Romanovsky wrote: > > >>> On Thu, Oct 12, 2023 at 01:29:21AM -0700, Maxim Samoylov wrote: > > >>>> This patch provides the uniform handling for RLIM_INFINITY value > > >>>> across the infiniband/rdma subsystem. > > >>>> > > >>>> Currently in some cases the infinity constant is treated > > >>>> as an actual limit value, which could be misleading. > > >>>> > > >>>> Let's also provide the single helper to check against process > > >>>> MEMLOCK limit while registering user memory region mappings. > > >>>> > > >>>> Signed-off-by: Maxim Samoylov<max7255@meta.com> > > >>>> --- > > >>>> > > >>>> v1 -> v2: rewritten commit message, rebased on recent upstream > > >>>> > > >>>> drivers/infiniband/core/umem.c | 7 ++----- > > >>>> drivers/infiniband/hw/qib/qib_user_pages.c | 7 +++---- > > >>>> drivers/infiniband/hw/usnic/usnic_uiom.c | 6 ++---- > > >>>> drivers/infiniband/sw/siw/siw_mem.c | 6 +++--- > > >>>> drivers/infiniband/sw/siw/siw_verbs.c | 23 ++++++++++------- > ----- > > >>>> include/rdma/ib_umem.h | 11 +++++++++++ > > >>>> 6 files changed, 31 insertions(+), 29 deletions(-) > > >>> <...> > > >>> > > >>>> @@ -1321,8 +1322,8 @@ struct ib_mr *siw_reg_user_mr(struct ib_pd > *pd, u64 start, u64 len, > > >>>> struct siw_umem *umem = NULL; > > >>>> struct siw_ureq_reg_mr ureq; > > >>>> struct siw_device *sdev = to_siw_dev(pd->device); > > >>>> - > > >>>> - unsigned long mem_limit = rlimit(RLIMIT_MEMLOCK); > > >>>> + unsigned long num_pages = > > >>>> + (PAGE_ALIGN(len + (start & ~PAGE_MASK))) >> PAGE_SHIFT; > > >>>> int rv; > > >>>> siw_dbg_pd(pd, "start: 0x%pK, va: 0x%pK, len: %llu\n", > > >>>> @@ -1338,19 +1339,15 @@ struct ib_mr *siw_reg_user_mr(struct ib_pd > *pd, u64 start, u64 len, > > >>>> rv = -EINVAL; > > >>>> goto err_out; > > >>>> } > > >>>> - if (mem_limit != RLIM_INFINITY) { > > >>>> - unsigned long num_pages = > > >>>> - (PAGE_ALIGN(len + (start & ~PAGE_MASK))) >> > PAGE_SHIFT; > > >>>> - mem_limit >>= PAGE_SHIFT; > > >>>> - if (num_pages > mem_limit - current->mm->locked_vm) { > > >>>> - siw_dbg_pd(pd, "pages req %lu, max %lu, lock %lu\n", > > >>>> - num_pages, mem_limit, > > >>>> - current->mm->locked_vm); > > >>>> - rv = -ENOMEM; > > >>>> - goto err_out; > > >>>> - } > > >>>> + if (!ib_umem_check_rlimit_memlock(num_pages + current->mm- > >locked_vm)) { > > >>>> + siw_dbg_pd(pd, "pages req %lu, max %lu, lock %lu\n", > > >>>> + num_pages, rlimit(RLIMIT_MEMLOCK), > > >>>> + current->mm->locked_vm); > > >>>> + rv = -ENOMEM; > > >>>> + goto err_out; > > >>>> } > > >>> Sorry for late response, but why does this hunk exist in first place? > > >>> > > > > Trailing newline, will definitely drop it. > > > > >>>> + > > >>>> umem = siw_umem_get(start, len, ib_access_writable(rights)); > > >>> This should be ib_umem_get(). > > >> > > >> IMO, it deserves a separate patch, and replace siw_umem_get with > ib_umem_get > > >> is not straightforward given siw_mem has two types of memory (pbl and > umem). > > > > > > The thing is that once you convince yourself that SIW should use > ib_umem_get(), > > > the same question will arise for other parts of this patch where > > > ib_umem_check_rlimit_memlock() is used. > > > > > > And if we eliminate them all, there won't be a need for this new API > call at all. > > > > > > Thanks > > > > > > > Hi! > > > > So, as for 31.10.2023 I still see siw_umem_get() call used in > > linux-rdma repo in "for-next" branch. > > I hoped to hear some feedback from Bernard and Dennis. Sorry for the late reply. Not using ib_umem_get(), siw intentionally implements a simpler routine to get its user pages pinned and organized as a two dimensional list. It not only saves memory for the structure holding the pages, but also makes it very easy to resume sending or receiving data at any address without traversing ib_umem linearly. What we could do is using ib_umem_get() to get the page list and use that list to organize pages more efficiently for a software rdma driver. I think this is what rxe implements today. It comes with the penalty of extra data structures being allocated a software rdma driver does not care about. > > > > > AFAIU this helper call is used only in a single place and could > > potentially be replaced with ib_umem_get() as Leon suggests. > > > > But should we perform it right inside this memlock helper patch? > > > > I can submit later another patch with siw_umem_get() replaced > > if necessary. > > > > > > >> > > >> Thanks, > > >> Guoqing > >
On 11/2/23 8:32 AM, Leon Romanovsky wrote: >> >> So, as for 31.10.2023 I still see siw_umem_get() call used in >> linux-rdma repo in "for-next" branch. > > I hoped to hear some feedback from Bernard and Dennis. > Sorry about that. I thought I did respond about qib. Acked-by: Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com>
> -----Original Message----- > From: Leon Romanovsky <leon@kernel.org> > Sent: Thursday, November 2, 2023 1:32 PM > To: Maxim Samoylov <max7255@meta.com>; Bernard Metzler > <BMT@zurich.ibm.com>; Dennis Dalessandro > <dennis.dalessandro@cornelisnetworks.com> > Cc: Guoqing Jiang <guoqing.jiang@linux.dev>; linux-rdma@vger.kernel.org; > Jason Gunthorpe <jgg@ziepe.ca>; Christian Benvenuti <benve@cisco.com>; > Vadim Fedorenko <vadim.fedorenko@linux.dev> > Subject: [EXTERNAL] Re: [PATCH v2] IB: rework memlock limit handling code > > On Tue, Oct 31, 2023 at 01:30:27PM +0000, Maxim Samoylov wrote: > > On 23/10/2023 07:52, Leon Romanovsky wrote: > > > On Mon, Oct 23, 2023 at 09:40:16AM +0800, Guoqing Jiang wrote: > > >> > > >> > > >> On 10/15/23 17:19, Leon Romanovsky wrote: > > >>> On Thu, Oct 12, 2023 at 01:29:21AM -0700, Maxim Samoylov wrote: > > >>>> This patch provides the uniform handling for RLIM_INFINITY value > > >>>> across the infiniband/rdma subsystem. > > >>>> > > >>>> Currently in some cases the infinity constant is treated > > >>>> as an actual limit value, which could be misleading. > > >>>> > > >>>> Let's also provide the single helper to check against process > > >>>> MEMLOCK limit while registering user memory region mappings. > > >>>> > > >>>> Signed-off-by: Maxim Samoylov<max7255@meta.com> > > >>>> --- > > >>>> > > >>>> v1 -> v2: rewritten commit message, rebased on recent upstream > > >>>> > > >>>> drivers/infiniband/core/umem.c | 7 ++----- > > >>>> drivers/infiniband/hw/qib/qib_user_pages.c | 7 +++---- > > >>>> drivers/infiniband/hw/usnic/usnic_uiom.c | 6 ++---- > > >>>> drivers/infiniband/sw/siw/siw_mem.c | 6 +++--- > > >>>> drivers/infiniband/sw/siw/siw_verbs.c | 23 ++++++++++------- > ----- > > >>>> include/rdma/ib_umem.h | 11 +++++++++++ > > >>>> 6 files changed, 31 insertions(+), 29 deletions(-) > > >>> <...> > > >>> > > >>>> @@ -1321,8 +1322,8 @@ struct ib_mr *siw_reg_user_mr(struct ib_pd > *pd, u64 start, u64 len, > > >>>> struct siw_umem *umem = NULL; > > >>>> struct siw_ureq_reg_mr ureq; > > >>>> struct siw_device *sdev = to_siw_dev(pd->device); > > >>>> - > > >>>> - unsigned long mem_limit = rlimit(RLIMIT_MEMLOCK); > > >>>> + unsigned long num_pages = > > >>>> + (PAGE_ALIGN(len + (start & ~PAGE_MASK))) >> PAGE_SHIFT; > > >>>> int rv; > > >>>> siw_dbg_pd(pd, "start: 0x%pK, va: 0x%pK, len: %llu\n", > > >>>> @@ -1338,19 +1339,15 @@ struct ib_mr *siw_reg_user_mr(struct ib_pd > *pd, u64 start, u64 len, > > >>>> rv = -EINVAL; > > >>>> goto err_out; > > >>>> } > > >>>> - if (mem_limit != RLIM_INFINITY) { > > >>>> - unsigned long num_pages = > > >>>> - (PAGE_ALIGN(len + (start & ~PAGE_MASK))) >> > PAGE_SHIFT; > > >>>> - mem_limit >>= PAGE_SHIFT; > > >>>> - if (num_pages > mem_limit - current->mm->locked_vm) { > > >>>> - siw_dbg_pd(pd, "pages req %lu, max %lu, lock %lu\n", > > >>>> - num_pages, mem_limit, > > >>>> - current->mm->locked_vm); > > >>>> - rv = -ENOMEM; > > >>>> - goto err_out; > > >>>> - } > > >>>> + if (!ib_umem_check_rlimit_memlock(num_pages + current->mm- > >locked_vm)) { > > >>>> + siw_dbg_pd(pd, "pages req %lu, max %lu, lock %lu\n", > > >>>> + num_pages, rlimit(RLIMIT_MEMLOCK), > > >>>> + current->mm->locked_vm); > > >>>> + rv = -ENOMEM; > > >>>> + goto err_out; > > >>>> } > > >>> Sorry for late response, but why does this hunk exist in first place? If using ib_umem_get() for siw, as I sent as for-next patch yesterday, we can drop that logic completely, since we now have it in ib_umem_get(). It was only there because of not using ib_umem_get(). I can resend my pending for-next patch as a patch to current, also removing memlock check (I simply forgot to remove it). Not sure if it would obsolete this patch here completely. Leon, please advise. Otherwise: Acked-by: Bernard Metzler <bmt@zurich.ibm.com> > > >>> > > > > Trailing newline, will definitely drop it. > > > > >>>> + > > >>>> umem = siw_umem_get(start, len, ib_access_writable(rights)); > > >>> This should be ib_umem_get(). > > >> > > >> IMO, it deserves a separate patch, and replace siw_umem_get with > ib_umem_get > > >> is not straightforward given siw_mem has two types of memory (pbl and > umem). > > > > > > The thing is that once you convince yourself that SIW should use > ib_umem_get(), > > > the same question will arise for other parts of this patch where > > > ib_umem_check_rlimit_memlock() is used. > > > > > > And if we eliminate them all, there won't be a need for this new API > call at all. > > > > > > Thanks > > > > > > > Hi! > > > > So, as for 31.10.2023 I still see siw_umem_get() call used in > > linux-rdma repo in "for-next" branch. > > I hoped to hear some feedback from Bernard and Dennis. > > > > > AFAIU this helper call is used only in a single place and could > > potentially be replaced with ib_umem_get() as Leon suggests. > > > > But should we perform it right inside this memlock helper patch? > > > > I can submit later another patch with siw_umem_get() replaced > > if necessary. > > > > > > >> > > >> Thanks, > > >> Guoqing > >
On Fri, Nov 03, 2023 at 10:18:51AM +0000, Bernard Metzler wrote: > > > > -----Original Message----- > > From: Leon Romanovsky <leon@kernel.org> > > Sent: Thursday, November 2, 2023 1:32 PM > > To: Maxim Samoylov <max7255@meta.com>; Bernard Metzler > > <BMT@zurich.ibm.com>; Dennis Dalessandro > > <dennis.dalessandro@cornelisnetworks.com> > > Cc: Guoqing Jiang <guoqing.jiang@linux.dev>; linux-rdma@vger.kernel.org; > > Jason Gunthorpe <jgg@ziepe.ca>; Christian Benvenuti <benve@cisco.com>; > > Vadim Fedorenko <vadim.fedorenko@linux.dev> > > Subject: [EXTERNAL] Re: [PATCH v2] IB: rework memlock limit handling code > > > > On Tue, Oct 31, 2023 at 01:30:27PM +0000, Maxim Samoylov wrote: > > > On 23/10/2023 07:52, Leon Romanovsky wrote: > > > > On Mon, Oct 23, 2023 at 09:40:16AM +0800, Guoqing Jiang wrote: > > > >> > > > >> > > > >> On 10/15/23 17:19, Leon Romanovsky wrote: > > > >>> On Thu, Oct 12, 2023 at 01:29:21AM -0700, Maxim Samoylov wrote: > > > >>>> This patch provides the uniform handling for RLIM_INFINITY value > > > >>>> across the infiniband/rdma subsystem. > > > >>>> > > > >>>> Currently in some cases the infinity constant is treated > > > >>>> as an actual limit value, which could be misleading. > > > >>>> > > > >>>> Let's also provide the single helper to check against process > > > >>>> MEMLOCK limit while registering user memory region mappings. > > > >>>> > > > >>>> Signed-off-by: Maxim Samoylov<max7255@meta.com> > > > >>>> --- > > > >>>> > > > >>>> v1 -> v2: rewritten commit message, rebased on recent upstream > > > >>>> > > > >>>> drivers/infiniband/core/umem.c | 7 ++----- > > > >>>> drivers/infiniband/hw/qib/qib_user_pages.c | 7 +++---- > > > >>>> drivers/infiniband/hw/usnic/usnic_uiom.c | 6 ++---- > > > >>>> drivers/infiniband/sw/siw/siw_mem.c | 6 +++--- > > > >>>> drivers/infiniband/sw/siw/siw_verbs.c | 23 ++++++++++------- > > ----- > > > >>>> include/rdma/ib_umem.h | 11 +++++++++++ > > > >>>> 6 files changed, 31 insertions(+), 29 deletions(-) > > > >>> <...> > > > >>> > > > >>>> @@ -1321,8 +1322,8 @@ struct ib_mr *siw_reg_user_mr(struct ib_pd > > *pd, u64 start, u64 len, > > > >>>> struct siw_umem *umem = NULL; > > > >>>> struct siw_ureq_reg_mr ureq; > > > >>>> struct siw_device *sdev = to_siw_dev(pd->device); > > > >>>> - > > > >>>> - unsigned long mem_limit = rlimit(RLIMIT_MEMLOCK); > > > >>>> + unsigned long num_pages = > > > >>>> + (PAGE_ALIGN(len + (start & ~PAGE_MASK))) >> PAGE_SHIFT; > > > >>>> int rv; > > > >>>> siw_dbg_pd(pd, "start: 0x%pK, va: 0x%pK, len: %llu\n", > > > >>>> @@ -1338,19 +1339,15 @@ struct ib_mr *siw_reg_user_mr(struct ib_pd > > *pd, u64 start, u64 len, > > > >>>> rv = -EINVAL; > > > >>>> goto err_out; > > > >>>> } > > > >>>> - if (mem_limit != RLIM_INFINITY) { > > > >>>> - unsigned long num_pages = > > > >>>> - (PAGE_ALIGN(len + (start & ~PAGE_MASK))) >> > > PAGE_SHIFT; > > > >>>> - mem_limit >>= PAGE_SHIFT; > > > >>>> - if (num_pages > mem_limit - current->mm->locked_vm) { > > > >>>> - siw_dbg_pd(pd, "pages req %lu, max %lu, lock %lu\n", > > > >>>> - num_pages, mem_limit, > > > >>>> - current->mm->locked_vm); > > > >>>> - rv = -ENOMEM; > > > >>>> - goto err_out; > > > >>>> - } > > > >>>> + if (!ib_umem_check_rlimit_memlock(num_pages + current->mm- > > >locked_vm)) { > > > >>>> + siw_dbg_pd(pd, "pages req %lu, max %lu, lock %lu\n", > > > >>>> + num_pages, rlimit(RLIMIT_MEMLOCK), > > > >>>> + current->mm->locked_vm); > > > >>>> + rv = -ENOMEM; > > > >>>> + goto err_out; > > > >>>> } > > > >>> Sorry for late response, but why does this hunk exist in first place? > > > If using ib_umem_get() for siw, as I sent as for-next > patch yesterday, we can drop that logic completely, since we now > have it in ib_umem_get(). It was only there because of not > using ib_umem_get(). > > I can resend my pending for-next patch as a patch to current, > also removing memlock check (I simply forgot to remove it). > Not sure if it would obsolete this patch here completely. > Leon, please advise. We are in the middle of merge window, so won't take any patches except bug fixes. So please, resend your patch after after merge window ends. Thanks > > Otherwise: > > Acked-by: Bernard Metzler <bmt@zurich.ibm.com> > > > > > >>> > > > > > > Trailing newline, will definitely drop it. > > > > > > >>>> + > > > >>>> umem = siw_umem_get(start, len, ib_access_writable(rights)); > > > >>> This should be ib_umem_get(). > > > >> > > > >> IMO, it deserves a separate patch, and replace siw_umem_get with > > ib_umem_get > > > >> is not straightforward given siw_mem has two types of memory (pbl and > > umem). > > > > > > > > The thing is that once you convince yourself that SIW should use > > ib_umem_get(), > > > > the same question will arise for other parts of this patch where > > > > ib_umem_check_rlimit_memlock() is used. > > > > > > > > And if we eliminate them all, there won't be a need for this new API > > call at all. > > > > > > > > Thanks > > > > > > > > > > Hi! > > > > > > So, as for 31.10.2023 I still see siw_umem_get() call used in > > > linux-rdma repo in "for-next" branch. > > > > I hoped to hear some feedback from Bernard and Dennis. > > > > > > > > AFAIU this helper call is used only in a single place and could > > > potentially be replaced with ib_umem_get() as Leon suggests. > > > > > > But should we perform it right inside this memlock helper patch? > > > > > > I can submit later another patch with siw_umem_get() replaced > > > if necessary. > > > > > > > > > >> > > > >> Thanks, > > > >> Guoqing > > >
On Thu, Nov 02, 2023 at 04:54:22PM -0400, Dennis Dalessandro wrote: > On 11/2/23 8:32 AM, Leon Romanovsky wrote: > >> > >> So, as for 31.10.2023 I still see siw_umem_get() call used in > >> linux-rdma repo in "for-next" branch. > > > > I hoped to hear some feedback from Bernard and Dennis. > > > > Sorry about that. I thought I did respond about qib. Dennis, qib probably needs to use ib_umem_get too. > > Acked-by: Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com>
diff --git a/drivers/infiniband/core/umem.c b/drivers/infiniband/core/umem.c index f9ab671c8eda..3b197bdc21bf 100644 --- a/drivers/infiniband/core/umem.c +++ b/drivers/infiniband/core/umem.c @@ -35,12 +35,12 @@ #include <linux/mm.h> #include <linux/dma-mapping.h> -#include <linux/sched/signal.h> #include <linux/sched/mm.h> #include <linux/export.h> #include <linux/slab.h> #include <linux/pagemap.h> #include <linux/count_zeros.h> +#include <rdma/ib_umem.h> #include <rdma/ib_umem_odp.h> #include "uverbs.h" @@ -150,7 +150,6 @@ struct ib_umem *ib_umem_get(struct ib_device *device, unsigned long addr, { struct ib_umem *umem; struct page **page_list; - unsigned long lock_limit; unsigned long new_pinned; unsigned long cur_base; unsigned long dma_attr = 0; @@ -200,10 +199,8 @@ struct ib_umem *ib_umem_get(struct ib_device *device, unsigned long addr, goto out; } - lock_limit = rlimit(RLIMIT_MEMLOCK) >> PAGE_SHIFT; - new_pinned = atomic64_add_return(npages, &mm->pinned_vm); - if (new_pinned > lock_limit && !capable(CAP_IPC_LOCK)) { + if (!ib_umem_check_rlimit_memlock(new_pinned)) { atomic64_sub(npages, &mm->pinned_vm); ret = -ENOMEM; goto out; diff --git a/drivers/infiniband/hw/qib/qib_user_pages.c b/drivers/infiniband/hw/qib/qib_user_pages.c index 1bb7507325bc..3889aefdfc6b 100644 --- a/drivers/infiniband/hw/qib/qib_user_pages.c +++ b/drivers/infiniband/hw/qib/qib_user_pages.c @@ -32,8 +32,8 @@ */ #include <linux/mm.h> -#include <linux/sched/signal.h> #include <linux/device.h> +#include <rdma/ib_umem.h> #include "qib.h" @@ -94,14 +94,13 @@ int qib_map_page(struct pci_dev *hwdev, struct page *page, dma_addr_t *daddr) int qib_get_user_pages(unsigned long start_page, size_t num_pages, struct page **p) { - unsigned long locked, lock_limit; + unsigned long locked; size_t got; int ret; - lock_limit = rlimit(RLIMIT_MEMLOCK) >> PAGE_SHIFT; locked = atomic64_add_return(num_pages, ¤t->mm->pinned_vm); - if (locked > lock_limit && !capable(CAP_IPC_LOCK)) { + if (!ib_umem_check_rlimit_memlock(locked)) { ret = -ENOMEM; goto bail; } diff --git a/drivers/infiniband/hw/usnic/usnic_uiom.c b/drivers/infiniband/hw/usnic/usnic_uiom.c index 84e0f41e7dfa..fdbb9737c7f0 100644 --- a/drivers/infiniband/hw/usnic/usnic_uiom.c +++ b/drivers/infiniband/hw/usnic/usnic_uiom.c @@ -34,13 +34,13 @@ #include <linux/mm.h> #include <linux/dma-mapping.h> -#include <linux/sched/signal.h> #include <linux/sched/mm.h> #include <linux/hugetlb.h> #include <linux/iommu.h> #include <linux/workqueue.h> #include <linux/list.h> #include <rdma/ib_verbs.h> +#include <rdma/ib_umem.h> #include "usnic_log.h" #include "usnic_uiom.h" @@ -90,7 +90,6 @@ static int usnic_uiom_get_pages(unsigned long addr, size_t size, int writable, struct scatterlist *sg; struct usnic_uiom_chunk *chunk; unsigned long locked; - unsigned long lock_limit; unsigned long cur_base; unsigned long npages; int ret; @@ -124,9 +123,8 @@ static int usnic_uiom_get_pages(unsigned long addr, size_t size, int writable, mmap_read_lock(mm); locked = atomic64_add_return(npages, ¤t->mm->pinned_vm); - lock_limit = rlimit(RLIMIT_MEMLOCK) >> PAGE_SHIFT; - if ((locked > lock_limit) && !capable(CAP_IPC_LOCK)) { + if (!ib_umem_check_rlimit_memlock(locked)) { ret = -ENOMEM; goto out; } diff --git a/drivers/infiniband/sw/siw/siw_mem.c b/drivers/infiniband/sw/siw/siw_mem.c index c5f7f1669d09..fe11379e6928 100644 --- a/drivers/infiniband/sw/siw/siw_mem.c +++ b/drivers/infiniband/sw/siw/siw_mem.c @@ -5,6 +5,7 @@ #include <linux/gfp.h> #include <rdma/ib_verbs.h> +#include <rdma/ib_umem.h> #include <linux/dma-mapping.h> #include <linux/slab.h> #include <linux/sched/mm.h> @@ -367,7 +368,6 @@ struct siw_umem *siw_umem_get(u64 start, u64 len, bool writable) struct siw_umem *umem; struct mm_struct *mm_s; u64 first_page_va; - unsigned long mlock_limit; unsigned int foll_flags = FOLL_LONGTERM; int num_pages, num_chunks, i, rv = 0; @@ -396,9 +396,9 @@ struct siw_umem *siw_umem_get(u64 start, u64 len, bool writable) mmap_read_lock(mm_s); - mlock_limit = rlimit(RLIMIT_MEMLOCK) >> PAGE_SHIFT; - if (atomic64_add_return(num_pages, &mm_s->pinned_vm) > mlock_limit) { + if (!ib_umem_check_rlimit_memlock( + atomic64_add_return(num_pages, &mm_s->pinned_vm))) { rv = -ENOMEM; goto out_sem_up; } diff --git a/drivers/infiniband/sw/siw/siw_verbs.c b/drivers/infiniband/sw/siw/siw_verbs.c index c5c27db9c2fe..e98a4a239722 100644 --- a/drivers/infiniband/sw/siw/siw_verbs.c +++ b/drivers/infiniband/sw/siw/siw_verbs.c @@ -12,6 +12,7 @@ #include <rdma/iw_cm.h> #include <rdma/ib_verbs.h> +#include <rdma/ib_umem.h> #include <rdma/ib_user_verbs.h> #include <rdma/uverbs_ioctl.h> @@ -1321,8 +1322,8 @@ struct ib_mr *siw_reg_user_mr(struct ib_pd *pd, u64 start, u64 len, struct siw_umem *umem = NULL; struct siw_ureq_reg_mr ureq; struct siw_device *sdev = to_siw_dev(pd->device); - - unsigned long mem_limit = rlimit(RLIMIT_MEMLOCK); + unsigned long num_pages = + (PAGE_ALIGN(len + (start & ~PAGE_MASK))) >> PAGE_SHIFT; int rv; siw_dbg_pd(pd, "start: 0x%pK, va: 0x%pK, len: %llu\n", @@ -1338,19 +1339,15 @@ struct ib_mr *siw_reg_user_mr(struct ib_pd *pd, u64 start, u64 len, rv = -EINVAL; goto err_out; } - if (mem_limit != RLIM_INFINITY) { - unsigned long num_pages = - (PAGE_ALIGN(len + (start & ~PAGE_MASK))) >> PAGE_SHIFT; - mem_limit >>= PAGE_SHIFT; - if (num_pages > mem_limit - current->mm->locked_vm) { - siw_dbg_pd(pd, "pages req %lu, max %lu, lock %lu\n", - num_pages, mem_limit, - current->mm->locked_vm); - rv = -ENOMEM; - goto err_out; - } + if (!ib_umem_check_rlimit_memlock(num_pages + current->mm->locked_vm)) { + siw_dbg_pd(pd, "pages req %lu, max %lu, lock %lu\n", + num_pages, rlimit(RLIMIT_MEMLOCK), + current->mm->locked_vm); + rv = -ENOMEM; + goto err_out; } + umem = siw_umem_get(start, len, ib_access_writable(rights)); if (IS_ERR(umem)) { rv = PTR_ERR(umem); diff --git a/include/rdma/ib_umem.h b/include/rdma/ib_umem.h index 95896472a82b..3970da64b01e 100644 --- a/include/rdma/ib_umem.h +++ b/include/rdma/ib_umem.h @@ -11,6 +11,7 @@ #include <linux/scatterlist.h> #include <linux/workqueue.h> #include <rdma/ib_verbs.h> +#include <linux/sched/signal.h> struct ib_ucontext; struct ib_umem_odp; @@ -71,6 +72,16 @@ static inline size_t ib_umem_num_pages(struct ib_umem *umem) return ib_umem_num_dma_blocks(umem, PAGE_SIZE); } +static inline bool ib_umem_check_rlimit_memlock(unsigned long value) +{ + unsigned long lock_limit = rlimit(RLIMIT_MEMLOCK); + + if (lock_limit == RLIM_INFINITY || capable(CAP_IPC_LOCK)) + return true; + + return value <= PFN_DOWN(lock_limit); +} + static inline void __rdma_umem_block_iter_start(struct ib_block_iter *biter, struct ib_umem *umem, unsigned long pgsz)
This patch provides the uniform handling for RLIM_INFINITY value across the infiniband/rdma subsystem. Currently in some cases the infinity constant is treated as an actual limit value, which could be misleading. Let's also provide the single helper to check against process MEMLOCK limit while registering user memory region mappings. Signed-off-by: Maxim Samoylov <max7255@meta.com> --- v1 -> v2: rewritten commit message, rebased on recent upstream drivers/infiniband/core/umem.c | 7 ++----- drivers/infiniband/hw/qib/qib_user_pages.c | 7 +++---- drivers/infiniband/hw/usnic/usnic_uiom.c | 6 ++---- drivers/infiniband/sw/siw/siw_mem.c | 6 +++--- drivers/infiniband/sw/siw/siw_verbs.c | 23 ++++++++++------------ include/rdma/ib_umem.h | 11 +++++++++++ 6 files changed, 31 insertions(+), 29 deletions(-)