diff mbox

[V3,1/5] RDMA/core: Transport-independent access flags

Message ID 20150714202943.GB26927@obsidianresearch.com (mailing list archive)
State New, archived
Headers show

Commit Message

Jason Gunthorpe July 14, 2015, 8:29 p.m. UTC
On Tue, Jul 14, 2015 at 12:55:11PM -0700, 'Christoph Hellwig' wrote:
> On Tue, Jul 14, 2015 at 02:32:31PM -0500, Steve Wise wrote:
> > You mean "should not", yea?  
> > 
> > Ok.  I'll check for iWARP.  But don't tell me to remove the transport-specific hacks in this series when I post it! ;)
> 
> Just curious if there are any holes in this little scheme to deal with
> the lkey mess:
> 
>  (1) make sure all drivers that currently do not set
>      IB_DEVICE_LOCAL_DMA_LKEY but which can safely use ib_get_dma_mr
>      call it underneath at device setup time, and tear it down before
>      removal.

Yes, I'd like this.

local_dma_lkey appears to be global, it works with any PD.

ib_get_dma_mr is tied to a PD, so it cannot replace local_dma_lkey at
the struct device level.

ib_alloc_pd is the in-kernel entry point, the UAPI calls
device->alloc_pd directly.. So how about the below patch as a starting
       >point?

(Steve the goal of step #1 would be to remove ib_get_dma_mr from ULPs
 Follow on patches would be to convert all ULPs to use this API change.)

In the long run this also makes it easier to develop helper wrappers
around posting because a local_dma_lkey is now always accessible to
the core code.

>  (2) now ULD can check for IB_DEVICE_LOCAL_DMA_LKEY and use local_dma_lkey
>      in that case, or will have to perform a per-IO MR with LOCAL and
>      REMOTE flags if not

If we do the above all ULPs simply use pd->local_dma_lkey and we drop
correct uses of ib_get_dma_mr completely ?

>  (3) remove insecure remote uses of ib_get_dma_mr from ULDs
>  (4) remove ib_get_dma_mr from the public API

Sure would be nice!

> This should help to sort out the lkey side of the memory registrations,
> and isn't too hard.  For rkeys I'd love to go with something like Sagis
> proposal as a first step, and then we can see if we can refine it in
> another iteration.

Agree. I'd love to see FMR fit under that, but even if we cannot, it
is appears to be a sane way to advance indirect MR..

Jason

From 5a9799bf487d822daae5a8b8f3b197f1dda1db45 Mon Sep 17 00:00:00 2001
From: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
Date: Tue, 14 Jul 2015 14:27:37 -0600
Subject: [PATCH] IB/core: Guarantee that a local_dma_lkey is available

Every single ULP requires a local_dma_lkey to do anything with
a QP, so lets us ensure one exists for every PD created.

If the driver can supply a global local_dma_lkey then use that, otherwise
ask the driver to create a local use all physical memory MR associated
with the new PD.

Signed-off-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
---
 drivers/infiniband/core/verbs.c | 40 +++++++++++++++++++++++++++++++++++-----
 include/rdma/ib_verbs.h         |  2 ++
 2 files changed, 37 insertions(+), 5 deletions(-)

Comments

Steve Wise July 14, 2015, 8:40 p.m. UTC | #1
> -----Original Message-----
> From: linux-rdma-owner@vger.kernel.org [mailto:linux-rdma-owner@vger.kernel.org] On Behalf Of Jason Gunthorpe
> Sent: Tuesday, July 14, 2015 3:30 PM
> To: 'Christoph Hellwig'
> Cc: Steve Wise; 'Sagi Grimberg'; 'Steve Wise'; 'Tom Talpey'; 'Doug Ledford'; sagig@mellanox.com; ogerlitz@mellanox.com;
> roid@mellanox.com; linux-rdma@vger.kernel.org; eli@mellanox.com; target-devel@vger.kernel.org; linux-nfs@vger.kernel.org;
> trond.myklebust@primarydata.com; bfields@fieldses.org; 'Oren Duer'
> Subject: Re: [PATCH V3 1/5] RDMA/core: Transport-independent access flags
> 
> On Tue, Jul 14, 2015 at 12:55:11PM -0700, 'Christoph Hellwig' wrote:
> > On Tue, Jul 14, 2015 at 02:32:31PM -0500, Steve Wise wrote:
> > > You mean "should not", yea?
> > >
> > > Ok.  I'll check for iWARP.  But don't tell me to remove the transport-specific hacks in this series when I post it! ;)
> >
> > Just curious if there are any holes in this little scheme to deal with
> > the lkey mess:
> >
> >  (1) make sure all drivers that currently do not set
> >      IB_DEVICE_LOCAL_DMA_LKEY but which can safely use ib_get_dma_mr
> >      call it underneath at device setup time, and tear it down before
> >      removal.
> 
> Yes, I'd like this.
> 
> local_dma_lkey appears to be global, it works with any PD.
> 
> ib_get_dma_mr is tied to a PD, so it cannot replace local_dma_lkey at
> the struct device level.
> 
> ib_alloc_pd is the in-kernel entry point, the UAPI calls
> device->alloc_pd directly.. So how about the below patch as a starting
>        >point?
> 
> (Steve the goal of step #1 would be to remove ib_get_dma_mr from ULPs
>  Follow on patches would be to convert all ULPs to use this API change.)
> 
> In the long run this also makes it easier to develop helper wrappers
> around posting because a local_dma_lkey is now always accessible to
> the core code.
>

I'm not seeing the benefit of adding pd->local_dma_lkey?  pd->device->local_dma_lkey is there for core and ULP use, and we could
have old drivers that don't currently have support for local_dma_lkey allocate their own private pd/dma_mr (via their private
functions for doing this) with only LOCAL_WRITE access flags, and export that lkey as the device->local_dma_lkey.  Wouldn't that be
simpler?

 > >  (2) now ULD can check for IB_DEVICE_LOCAL_DMA_LKEY and use local_dma_lkey
> >      in that case, or will have to perform a per-IO MR with LOCAL and
> >      REMOTE flags if not
> 
> If we do the above all ULPs simply use pd->local_dma_lkey and we drop
> correct uses of ib_get_dma_mr completely ?
> 
> >  (3) remove insecure remote uses of ib_get_dma_mr from ULDs
> >  (4) remove ib_get_dma_mr from the public API
> 
> Sure would be nice!
> 
> > This should help to sort out the lkey side of the memory registrations,
> > and isn't too hard.  For rkeys I'd love to go with something like Sagis
> > proposal as a first step, and then we can see if we can refine it in
> > another iteration.
> 
> Agree. I'd love to see FMR fit under that, but even if we cannot, it
> is appears to be a sane way to advance indirect MR..
> 
> Jason
> 
> >From 5a9799bf487d822daae5a8b8f3b197f1dda1db45 Mon Sep 17 00:00:00 2001
> From: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
> Date: Tue, 14 Jul 2015 14:27:37 -0600
> Subject: [PATCH] IB/core: Guarantee that a local_dma_lkey is available
> 
> Every single ULP requires a local_dma_lkey to do anything with
> a QP, so lets us ensure one exists for every PD created.
> 
> If the driver can supply a global local_dma_lkey then use that, otherwise
> ask the driver to create a local use all physical memory MR associated
> with the new PD.
> 
> Signed-off-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
> ---
>  drivers/infiniband/core/verbs.c | 40 +++++++++++++++++++++++++++++++++++-----
>  include/rdma/ib_verbs.h         |  2 ++
>  2 files changed, 37 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/infiniband/core/verbs.c b/drivers/infiniband/core/verbs.c
> index bac3fb406a74..1ddf06314f36 100644
> --- a/drivers/infiniband/core/verbs.c
> +++ b/drivers/infiniband/core/verbs.c
> @@ -213,24 +213,54 @@ EXPORT_SYMBOL(rdma_port_get_link_layer);
> 
>  /* Protection domains */
> 
> +/* Return a pd for in-kernel use that has a local_dma_lkey which provides
> +   local access to all physical memory. */
>  struct ib_pd *ib_alloc_pd(struct ib_device *device)
>  {
>  	struct ib_pd *pd;
> +	struct ib_device_attr devattr;
> +	int rc;
> +
> +	rc = ib_query_device(device, &devattr);
> +	if (rc)
> +		return ERR_PTR(rc);
> 
>  	pd = device->alloc_pd(device, NULL, NULL);
> +	if (IS_ERR(pd))
> +		return pd;
> +
> +	pd->device = device;
> +	pd->uobject = NULL;
> +	pd->local_mr = NULL;
> +	atomic_set(&pd->usecnt, 0);
> +
> +	if (devattr.device_cap_flags & IB_DEVICE_LOCAL_DMA_LKEY)
> +		pd->local_dma_lkey = device->local_dma_lkey;
> +	else {
> +		struct ib_mr *mr;
> +
> +		mr = ib_get_dma_mr(pd, IB_ACCESS_LOCAL_WRITE);
> +		if (IS_ERR(mr)) {
> +			ib_dealloc_pd(pd);
> +			return (struct ib_pd *)mr;
> +		}
> 
> -	if (!IS_ERR(pd)) {
> -		pd->device  = device;
> -		pd->uobject = NULL;
> -		atomic_set(&pd->usecnt, 0);
> +		pd->local_mr = mr;
> +		pd->local_dma_lkey = pd->local_mr->lkey;
>  	}
> -
>  	return pd;
>  }
> +
>  EXPORT_SYMBOL(ib_alloc_pd);
> 
>  int ib_dealloc_pd(struct ib_pd *pd)
>  {
> +	if (pd->local_mr) {
> +		if (ib_dereg_mr(pd->local_mr))
> +			return -EBUSY;
> +		pd->local_mr = NULL;
> +	}
> +
>  	if (atomic_read(&pd->usecnt))
>  		return -EBUSY;
> 
> diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
> index 986fddb08579..cfda95d7b067 100644
> --- a/include/rdma/ib_verbs.h
> +++ b/include/rdma/ib_verbs.h
> @@ -1255,6 +1255,8 @@ struct ib_pd {
>  	struct ib_device       *device;
>  	struct ib_uobject      *uobject;
>  	atomic_t          	usecnt; /* count all resources */
> +	struct ib_mr	       *local_mr;
> +	u32			local_dma_lkey;
>  };
> 
>  struct ib_xrcd {
> --
> 2.1.4
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Jason Gunthorpe July 14, 2015, 8:44 p.m. UTC | #2
On Tue, Jul 14, 2015 at 03:40:49PM -0500, Steve Wise wrote:
> > local_dma_lkey appears to be global, it works with any PD.
> > 
> > ib_get_dma_mr is tied to a PD, so it cannot replace local_dma_lkey at
> > the struct device level.
> > 
> > ib_alloc_pd is the in-kernel entry point, the UAPI calls
> > device->alloc_pd directly.. So how about the below patch as a starting
> >        >point?
> > 
> > (Steve the goal of step #1 would be to remove ib_get_dma_mr from ULPs
> >  Follow on patches would be to convert all ULPs to use this API change.)

> I'm not seeing the benefit of adding pd->local_dma_lkey?
> pd->device->local_dma_lkey is there for core and ULP use, and we
> could have old drivers that don't currently have support for
> local_dma_lkey allocate their own private pd/dma_mr (via their
> private functions for doing this) with only LOCAL_WRITE access
> flags, and export that lkey as the device->local_dma_lkey.  Wouldn't
> that be simpler?

It would be, but AFAIK that can't work?

My understanding is if you create a QP against a PD then only lkeys
and rkeys (and local_dma_rkey) created against that PD are valid for
use with that QP.

I can't use an lkey from a PD not associated with the QP.

Am I wrong on this?

Jason
--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Tom Talpey July 14, 2015, 8:50 p.m. UTC | #3
On 7/14/2015 4:29 PM, Jason Gunthorpe wrote:
> On Tue, Jul 14, 2015 at 12:55:11PM -0700, 'Christoph Hellwig' wrote:
>> On Tue, Jul 14, 2015 at 02:32:31PM -0500, Steve Wise wrote:
>>> You mean "should not", yea?
>>>
>>> Ok.  I'll check for iWARP.  But don't tell me to remove the transport-specific hacks in this series when I post it! ;)
>>
>> Just curious if there are any holes in this little scheme to deal with
>> the lkey mess:
>>
>>   (1) make sure all drivers that currently do not set
>>       IB_DEVICE_LOCAL_DMA_LKEY but which can safely use ib_get_dma_mr
>>       call it underneath at device setup time, and tear it down before
>>       removal.
>
> Yes, I'd like this.
>
> local_dma_lkey appears to be global, it works with any PD.

Only if it's supported, right? There's an attribute that a provider uses
to expose it. For example, I would not expect a virtualized provider to
be able to support this.

>
> ib_get_dma_mr is tied to a PD, so it cannot replace local_dma_lkey at
> the struct device level.

Correct, and by design, in fact. In most implementations, a different
token is returned for each call, in fact.

--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Steve Wise July 14, 2015, 8:54 p.m. UTC | #4
> -----Original Message-----
> From: linux-nfs-owner@vger.kernel.org [mailto:linux-nfs-owner@vger.kernel.org] On Behalf Of Jason Gunthorpe
> Sent: Tuesday, July 14, 2015 3:45 PM
> To: Steve Wise
> Cc: 'Christoph Hellwig'; 'Sagi Grimberg'; 'Tom Talpey'; 'Doug Ledford'; sagig@mellanox.com; ogerlitz@mellanox.com;
> roid@mellanox.com; linux-rdma@vger.kernel.org; eli@mellanox.com; target-devel@vger.kernel.org; linux-nfs@vger.kernel.org;
> trond.myklebust@primarydata.com; bfields@fieldses.org; 'Oren Duer'
> Subject: Re: [PATCH V3 1/5] RDMA/core: Transport-independent access flags
> 
> On Tue, Jul 14, 2015 at 03:40:49PM -0500, Steve Wise wrote:
> > > local_dma_lkey appears to be global, it works with any PD.
> > >
> > > ib_get_dma_mr is tied to a PD, so it cannot replace local_dma_lkey at
> > > the struct device level.
> > >
> > > ib_alloc_pd is the in-kernel entry point, the UAPI calls
> > > device->alloc_pd directly.. So how about the below patch as a starting
> > >        >point?
> > >
> > > (Steve the goal of step #1 would be to remove ib_get_dma_mr from ULPs
> > >  Follow on patches would be to convert all ULPs to use this API change.)
> 
> > I'm not seeing the benefit of adding pd->local_dma_lkey?
> > pd->device->local_dma_lkey is there for core and ULP use, and we
> > could have old drivers that don't currently have support for
> > local_dma_lkey allocate their own private pd/dma_mr (via their
> > private functions for doing this) with only LOCAL_WRITE access
> > flags, and export that lkey as the device->local_dma_lkey.  Wouldn't
> > that be simpler?
> 
> It would be, but AFAIK that can't work?
> 
> My understanding is if you create a QP against a PD then only lkeys
> and rkeys (and local_dma_rkey) created against that PD are valid for
> use with that QP.
> 
> I can't use an lkey from a PD not associated with the QP.
> 
> Am I wrong on this?

Kernel users can use the local_dma_lkey for all lkey IO on all QPs (ignoring the iwarp read issue).  Look at sc_dma_lkey in the
NFSRDMA server.

> 
> Jason
> --
> To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Jason Gunthorpe July 14, 2015, 8:59 p.m. UTC | #5
On Tue, Jul 14, 2015 at 03:54:15PM -0500, Steve Wise wrote:
> > > I'm not seeing the benefit of adding pd->local_dma_lkey?
> > > pd->device->local_dma_lkey is there for core and ULP use, and we
> > > could have old drivers that don't currently have support for
> > > local_dma_lkey allocate their own private pd/dma_mr (via their
> > > private functions for doing this) with only LOCAL_WRITE access
> > > flags, and export that lkey as the device->local_dma_lkey.  Wouldn't
> > > that be simpler?
> > 
> > It would be, but AFAIK that can't work?
> > 
> > My understanding is if you create a QP against a PD then only lkeys
> > and rkeys (and local_dma_rkey) created against that PD are valid for
> > use with that QP.
> > 
> > I can't use an lkey from a PD not associated with the QP.
> > 
> > Am I wrong on this?
> 
> Kernel users can use the local_dma_lkey for all lkey IO on all QPs
> (ignoring the iwarp read issue).  Look at sc_dma_lkey in the NFSRDMA
> server.

Read the exchange again?

You asked this:

> > > could have old drivers that don't currently have support for
> > > local_dma_lkey allocate their own private pd/dma_mr (via their

The answer to why that doesn't work is: In the generic case of old
drivers every PD has a different local_dma_lkey value.

Jason
--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Christoph Hellwig July 15, 2015, 6:50 a.m. UTC | #6
On Tue, Jul 14, 2015 at 02:29:43PM -0600, Jason Gunthorpe wrote:
> local_dma_lkey appears to be global, it works with any PD.
> 
> ib_get_dma_mr is tied to a PD, so it cannot replace local_dma_lkey at
> the struct device level.
> 
> ib_alloc_pd is the in-kernel entry point, the UAPI calls
> device->alloc_pd directly.. So how about the below patch as a starting
> point?

This looks perfect to me.  After this we can get rid of the
ib_get_dma_mr calls outside of ib_alloc_pd, and eventuall move
setting up ->local_dma_lkey into the HW driver and kill of
ib_get_dma_mr, IB_DEVICE_LOCAL_DMA_LKEY and device->local_dma_lkey.

> >  (2) now ULD can check for IB_DEVICE_LOCAL_DMA_LKEY and use local_dma_lkey
> >      in that case, or will have to perform a per-IO MR with LOCAL and
> >      REMOTE flags if not
> 
> If we do the above all ULPs simply use pd->local_dma_lkey and we drop
> correct uses of ib_get_dma_mr completely ?

Yes, please, although:

> >  (3) remove insecure remote uses of ib_get_dma_mr from ULDs

I kept this separate as the I suspect the "optimizations" using 
ib_get_dma_mr with remote flags will warrant a separate discussion.

> >From 5a9799bf487d822daae5a8b8f3b197f1dda1db45 Mon Sep 17 00:00:00 2001
> From: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
> Date: Tue, 14 Jul 2015 14:27:37 -0600
> Subject: [PATCH] IB/core: Guarantee that a local_dma_lkey is available
> 
> Every single ULP requires a local_dma_lkey to do anything with
> a QP, so lets us ensure one exists for every PD created.
> 
> If the driver can supply a global local_dma_lkey then use that, otherwise
> ask the driver to create a local use all physical memory MR associated
> with the new PD.
> 
> Signed-off-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>


Acked-by: Christoph Hellwig <hch@lst.de>

(especially together with patches removing all other callsites of
"ib_get_dma_mr(pd, IB_ACCESS_LOCAL_WRITE)".
--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Sagi Grimberg July 15, 2015, 8:47 a.m. UTC | #7
On 7/14/2015 11:29 PM, Jason Gunthorpe wrote:
> On Tue, Jul 14, 2015 at 12:55:11PM -0700, 'Christoph Hellwig' wrote:
>> On Tue, Jul 14, 2015 at 02:32:31PM -0500, Steve Wise wrote:
>>> You mean "should not", yea?
>>>
>>> Ok.  I'll check for iWARP.  But don't tell me to remove the transport-specific hacks in this series when I post it! ;)
>>
>> Just curious if there are any holes in this little scheme to deal with
>> the lkey mess:
>>
>>   (1) make sure all drivers that currently do not set
>>       IB_DEVICE_LOCAL_DMA_LKEY but which can safely use ib_get_dma_mr
>>       call it underneath at device setup time, and tear it down before
>>       removal.
>
> Yes, I'd like this.
>
> local_dma_lkey appears to be global, it works with any PD.
>
> ib_get_dma_mr is tied to a PD, so it cannot replace local_dma_lkey at
> the struct device level.
>
> ib_alloc_pd is the in-kernel entry point, the UAPI calls
> device->alloc_pd directly.. So how about the below patch as a starting
>         >point?
>
> (Steve the goal of step #1 would be to remove ib_get_dma_mr from ULPs
>   Follow on patches would be to convert all ULPs to use this API change.)
>
> In the long run this also makes it easier to develop helper wrappers
> around posting because a local_dma_lkey is now always accessible to
> the core code.
>
>>   (2) now ULD can check for IB_DEVICE_LOCAL_DMA_LKEY and use local_dma_lkey
>>       in that case, or will have to perform a per-IO MR with LOCAL and
>>       REMOTE flags if not
>
> If we do the above all ULPs simply use pd->local_dma_lkey and we drop
> correct uses of ib_get_dma_mr completely ?
>
>>   (3) remove insecure remote uses of ib_get_dma_mr from ULDs
>>   (4) remove ib_get_dma_mr from the public API
>
> Sure would be nice!
>
>> This should help to sort out the lkey side of the memory registrations,
>> and isn't too hard.  For rkeys I'd love to go with something like Sagis
>> proposal as a first step, and then we can see if we can refine it in
>> another iteration.
>
> Agree. I'd love to see FMR fit under that, but even if we cannot, it
> is appears to be a sane way to advance indirect MR..
>
> Jason
>
>  From 5a9799bf487d822daae5a8b8f3b197f1dda1db45 Mon Sep 17 00:00:00 2001
> From: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
> Date: Tue, 14 Jul 2015 14:27:37 -0600
> Subject: [PATCH] IB/core: Guarantee that a local_dma_lkey is available
>
> Every single ULP requires a local_dma_lkey to do anything with
> a QP, so lets us ensure one exists for every PD created.
>
> If the driver can supply a global local_dma_lkey then use that, otherwise
> ask the driver to create a local use all physical memory MR associated
> with the new PD.
>
> Signed-off-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
> ---
>   drivers/infiniband/core/verbs.c | 40 +++++++++++++++++++++++++++++++++++-----
>   include/rdma/ib_verbs.h         |  2 ++
>   2 files changed, 37 insertions(+), 5 deletions(-)
>
> diff --git a/drivers/infiniband/core/verbs.c b/drivers/infiniband/core/verbs.c
> index bac3fb406a74..1ddf06314f36 100644
> --- a/drivers/infiniband/core/verbs.c
> +++ b/drivers/infiniband/core/verbs.c
> @@ -213,24 +213,54 @@ EXPORT_SYMBOL(rdma_port_get_link_layer);
>
>   /* Protection domains */
>
> +/* Return a pd for in-kernel use that has a local_dma_lkey which provides
> +   local access to all physical memory. */
>   struct ib_pd *ib_alloc_pd(struct ib_device *device)
>   {
>   	struct ib_pd *pd;
> +	struct ib_device_attr devattr;
> +	int rc;
> +
> +	rc = ib_query_device(device, &devattr);
> +	if (rc)
> +		return ERR_PTR(rc);

Unrelated question,
Can we just cache the dev_attr in ib_device already? It's pretty
obvious that each consumer that opens a device will automatically
query its attributes...

>
>   	pd = device->alloc_pd(device, NULL, NULL);
> +	if (IS_ERR(pd))
> +		return pd;
> +
> +	pd->device = device;
> +	pd->uobject = NULL;
> +	pd->local_mr = NULL;
> +	atomic_set(&pd->usecnt, 0);
> +
> +	if (devattr.device_cap_flags & IB_DEVICE_LOCAL_DMA_LKEY)
> +		pd->local_dma_lkey = device->local_dma_lkey;
> +	else {
> +		struct ib_mr *mr;
> +
> +		mr = ib_get_dma_mr(pd, IB_ACCESS_LOCAL_WRITE);
> +		if (IS_ERR(mr)) {
> +			ib_dealloc_pd(pd);
> +			return (struct ib_pd *)mr;
> +		}
>
> -	if (!IS_ERR(pd)) {
> -		pd->device  = device;
> -		pd->uobject = NULL;
> -		atomic_set(&pd->usecnt, 0);
> +		pd->local_mr = mr;
> +		pd->local_dma_lkey = pd->local_mr->lkey;
>   	}
> -
>   	return pd;
>   }
> +
>   EXPORT_SYMBOL(ib_alloc_pd);
>
>   int ib_dealloc_pd(struct ib_pd *pd)
>   {
> +	if (pd->local_mr) {
> +		if (ib_dereg_mr(pd->local_mr))
> +			return -EBUSY;
> +		pd->local_mr = NULL;
> +	}
> +
>   	if (atomic_read(&pd->usecnt))
>   		return -EBUSY;
>
> diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
> index 986fddb08579..cfda95d7b067 100644
> --- a/include/rdma/ib_verbs.h
> +++ b/include/rdma/ib_verbs.h
> @@ -1255,6 +1255,8 @@ struct ib_pd {
>   	struct ib_device       *device;
>   	struct ib_uobject      *uobject;
>   	atomic_t          	usecnt; /* count all resources */
> +	struct ib_mr	       *local_mr;
> +	u32			local_dma_lkey;
>   };
>
>   struct ib_xrcd {
>


The patch itself looks good.
--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Christoph Hellwig July 15, 2015, 12:19 p.m. UTC | #8
On Wed, Jul 15, 2015 at 11:47:52AM +0300, Sagi Grimberg wrote:
> >  struct ib_pd *ib_alloc_pd(struct ib_device *device)
> >  {
> >  	struct ib_pd *pd;
> >+	struct ib_device_attr devattr;
> >+	int rc;
> >+
> >+	rc = ib_query_device(device, &devattr);
> >+	if (rc)
> >+		return ERR_PTR(rc);
> 
> Unrelated question,
> Can we just cache the dev_attr in ib_device already? It's pretty
> obvious that each consumer that opens a device will automatically
> query its attributes...

Instead of caching it I'd suggest to kill ib_device_attr and move
the field into struct ib_device, similar to how all major kernel
subsystems work.  Otherwise fully agreed.

--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Jason Gunthorpe July 15, 2015, 7:12 p.m. UTC | #9
On Tue, Jul 14, 2015 at 11:50:57PM -0700, 'Christoph Hellwig' wrote:
> On Tue, Jul 14, 2015 at 02:29:43PM -0600, Jason Gunthorpe wrote:
> > local_dma_lkey appears to be global, it works with any PD.
> > 
> > ib_get_dma_mr is tied to a PD, so it cannot replace local_dma_lkey at
> > the struct device level.
> > 
> > ib_alloc_pd is the in-kernel entry point, the UAPI calls
> > device->alloc_pd directly.. So how about the below patch as a starting
> > point?
> 
> This looks perfect to me.  After this we can get rid of the
> ib_get_dma_mr calls outside of ib_alloc_pd, and eventuall move
> setting up ->local_dma_lkey into the HW driver and kill of
> ib_get_dma_mr, IB_DEVICE_LOCAL_DMA_LKEY and device->local_dma_lkey.

Just for clarity, again, we can never do this.

device->local_dma_lkey requires dedicated hardware support. We cannot
create it in software on old hardware. The only option I see is the
different-for-every-PD solution in my patch.

> (especially together with patches removing all other callsites of
> "ib_get_dma_mr(pd, IB_ACCESS_LOCAL_WRITE)".

I might find time to type this in, but I won't be able to find time to
do any testing on the ULPs..

Jason
--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Jason Gunthorpe July 15, 2015, 7:17 p.m. UTC | #10
On Wed, Jul 15, 2015 at 05:19:26AM -0700, 'Christoph Hellwig' wrote:
> On Wed, Jul 15, 2015 at 11:47:52AM +0300, Sagi Grimberg wrote:
> > >  struct ib_pd *ib_alloc_pd(struct ib_device *device)
> > >  {
> > >  	struct ib_pd *pd;
> > >+	struct ib_device_attr devattr;
> > >+	int rc;
> > >+
> > >+	rc = ib_query_device(device, &devattr);
> > >+	if (rc)
> > >+		return ERR_PTR(rc);
> > 
> > Unrelated question,
> > Can we just cache the dev_attr in ib_device already? It's pretty
> > obvious that each consumer that opens a device will automatically
> > query its attributes...
> 
> Instead of caching it I'd suggest to kill ib_device_attr and move
> the field into struct ib_device, similar to how all major kernel
> subsystems work.  Otherwise fully agreed.

Indeed, I think there are quite a few weird cases like this floating
around for someone to work on :)

Jason
--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Jason Gunthorpe July 16, 2015, 6:41 a.m. UTC | #11
On Wed, Jul 15, 2015 at 01:12:57PM -0600, Jason Gunthorpe wrote:

> I might find time to type this in, but I won't be able to find time to
> do any testing on the ULPs..

Here is the typing, I'll look more carefully at it later and send it
via email:

https://github.com/jgunthorpe/linux/commits/remove-ib_get_dma_mr

Jason Gunthorpe (9):
      IB/core: Guarantee that a local_dma_lkey is available
      IB/mad: Remove ib_get_dma_mr calls
      IB/ipoib: Remove ib_get_dma_mr calls
      IB/mlx4: Remove ib_get_dma_mr calls
      IB/mlx5: Remove ib_get_dma_mr calls
      iser-target: Remove ib_get_dma_mr calls
      ib_srpt: Remove ib_get_dma_mr calls
      net/9p: Remove ib_get_dma_mr calls
      rds/ib: Remove ib_get_dma_mr calls

The remaining calls are more complex as they are opening remote
writable physical windows.

Jason
--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Christoph Hellwig July 16, 2015, 8:04 a.m. UTC | #12
On Wed, Jul 15, 2015 at 01:12:57PM -0600, Jason Gunthorpe wrote:
> > This looks perfect to me.  After this we can get rid of the
> > ib_get_dma_mr calls outside of ib_alloc_pd, and eventuall move
> > setting up ->local_dma_lkey into the HW driver and kill of
> > ib_get_dma_mr, IB_DEVICE_LOCAL_DMA_LKEY and device->local_dma_lkey.
> 
> Just for clarity, again, we can never do this.
> 
> device->local_dma_lkey requires dedicated hardware support. We cannot
> create it in software on old hardware. The only option I see is the
> different-for-every-PD solution in my patch.

I don't see how my sentence above contradicts this.

One we use pd->local_dma_lkey everywhere, we can kill of
device->local_dma_lkey as an API - drivers either stick it straight
into pd->local_dma_lkey or do the internal equivalent of ib_get_dma_mr.
--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Jason Gunthorpe July 16, 2015, 4:13 p.m. UTC | #13
On Thu, Jul 16, 2015 at 01:04:02AM -0700, 'Christoph Hellwig' wrote:
> On Wed, Jul 15, 2015 at 01:12:57PM -0600, Jason Gunthorpe wrote:
> > > This looks perfect to me.  After this we can get rid of the
> > > ib_get_dma_mr calls outside of ib_alloc_pd, and eventuall move
> > > setting up ->local_dma_lkey into the HW driver and kill of
> > > ib_get_dma_mr, IB_DEVICE_LOCAL_DMA_LKEY and device->local_dma_lkey.
> > 
> > Just for clarity, again, we can never do this.
> > 
> > device->local_dma_lkey requires dedicated hardware support. We cannot
> > create it in software on old hardware. The only option I see is the
> > different-for-every-PD solution in my patch.
> 
> I don't see how my sentence above contradicts this.
> 
> One we use pd->local_dma_lkey everywhere, we can kill of
> device->local_dma_lkey as an API - drivers either stick it straight
> into pd->local_dma_lkey or do the internal equivalent of
> ib_get_dma_mr.

Right, I misread your message

Thanks,
Jason
--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/drivers/infiniband/core/verbs.c b/drivers/infiniband/core/verbs.c
index bac3fb406a74..1ddf06314f36 100644
--- a/drivers/infiniband/core/verbs.c
+++ b/drivers/infiniband/core/verbs.c
@@ -213,24 +213,54 @@  EXPORT_SYMBOL(rdma_port_get_link_layer);
 
 /* Protection domains */
 
+/* Return a pd for in-kernel use that has a local_dma_lkey which provides
+   local access to all physical memory. */
 struct ib_pd *ib_alloc_pd(struct ib_device *device)
 {
 	struct ib_pd *pd;
+	struct ib_device_attr devattr;
+	int rc;
+
+	rc = ib_query_device(device, &devattr);
+	if (rc)
+		return ERR_PTR(rc);
 
 	pd = device->alloc_pd(device, NULL, NULL);
+	if (IS_ERR(pd))
+		return pd;
+
+	pd->device = device;
+	pd->uobject = NULL;
+	pd->local_mr = NULL;
+	atomic_set(&pd->usecnt, 0);
+
+	if (devattr.device_cap_flags & IB_DEVICE_LOCAL_DMA_LKEY)
+		pd->local_dma_lkey = device->local_dma_lkey;
+	else {
+		struct ib_mr *mr;
+
+		mr = ib_get_dma_mr(pd, IB_ACCESS_LOCAL_WRITE);
+		if (IS_ERR(mr)) {
+			ib_dealloc_pd(pd);
+			return (struct ib_pd *)mr;
+		}
 
-	if (!IS_ERR(pd)) {
-		pd->device  = device;
-		pd->uobject = NULL;
-		atomic_set(&pd->usecnt, 0);
+		pd->local_mr = mr;
+		pd->local_dma_lkey = pd->local_mr->lkey;
 	}
-
 	return pd;
 }
+
 EXPORT_SYMBOL(ib_alloc_pd);
 
 int ib_dealloc_pd(struct ib_pd *pd)
 {
+	if (pd->local_mr) {
+		if (ib_dereg_mr(pd->local_mr))
+			return -EBUSY;
+		pd->local_mr = NULL;
+	}
+
 	if (atomic_read(&pd->usecnt))
 		return -EBUSY;
 
diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
index 986fddb08579..cfda95d7b067 100644
--- a/include/rdma/ib_verbs.h
+++ b/include/rdma/ib_verbs.h
@@ -1255,6 +1255,8 @@  struct ib_pd {
 	struct ib_device       *device;
 	struct ib_uobject      *uobject;
 	atomic_t          	usecnt; /* count all resources */
+	struct ib_mr	       *local_mr;
+	u32			local_dma_lkey;
 };
 
 struct ib_xrcd {