[for-next,2/7] IB/core: Add support for custom types

Message ID	1484132033-3346-3-git-send-email-matanb@mellanox.com (mailing list archive)
State	Superseded
Headers	show Return-Path: <linux-rdma-owner@kernel.org> From: Matan Barak <matanb@mellanox.com> To: Doug Ledford <dledford@redhat.com> Cc: linux-rdma@vger.kernel.org, Yishai Hadas <yishaih@mellanox.com>, Jason Gunthorpe <jgunthorpe@obsidianresearch.com>, Sean Hefty <sean.hefty@intel.com>, Ira Weiny <ira.weiny@intel.com>, Christoph Lameter <cl@linux.com>, Majd Dibbiny <majd@mellanox.com>, Tal Alon <talal@mellanox.com>, Leon Romanovsky <leonro@mellanox.com>, Liran Liss <liranl@mellanox.com>, Haggai Eran <haggaie@mellanox.com>, Matan Barak <matanb@mellanox.com> Subject: [PATCH for-next 2/7] IB/core: Add support for custom types Date: Wed, 11 Jan 2017 12:53:48 +0200 Message-Id: <1484132033-3346-3-git-send-email-matanb@mellanox.com> In-Reply-To: <1484132033-3346-1-git-send-email-matanb@mellanox.com> References: <1484132033-3346-1-git-send-email-matanb@mellanox.com> Sender: linux-rdma-owner@vger.kernel.org Precedence: bulk

Message ID

1484132033-3346-3-git-send-email-matanb@mellanox.com (mailing list archive)

State

Superseded

Headers

From: Matan Barak <matanb@mellanox.com>
To: Doug Ledford <dledford@redhat.com>
Cc: linux-rdma@vger.kernel.org, Yishai Hadas <yishaih@mellanox.com>,
	Jason Gunthorpe <jgunthorpe@obsidianresearch.com>,
	Sean Hefty <sean.hefty@intel.com>, Ira Weiny <ira.weiny@intel.com>,
	Christoph Lameter <cl@linux.com>, Majd Dibbiny <majd@mellanox.com>,
	Tal Alon <talal@mellanox.com>, Leon Romanovsky <leonro@mellanox.com>,
	Liran Liss <liranl@mellanox.com>, Haggai Eran <haggaie@mellanox.com>,
	Matan Barak <matanb@mellanox.com>
Subject: [PATCH for-next 2/7] IB/core: Add support for custom types
Date: Wed, 11 Jan 2017 12:53:48 +0200
Message-Id: <1484132033-3346-3-git-send-email-matanb@mellanox.com>
In-Reply-To: <1484132033-3346-1-git-send-email-matanb@mellanox.com>
References: <1484132033-3346-1-git-send-email-matanb@mellanox.com>
Sender: linux-rdma-owner@vger.kernel.org
Precedence: bulk

Commit Message

Matan Barak Jan. 11, 2017, 10:53 a.m. UTC

The new ioctl infrastructure supports driver specific objects.
Each such object type has a free function, allocation size and an
order of destruction. This information is embedded in the same
table describing the various action allowed on the object, similarly
to object oriented programming.

When a ucontext is created, a new list is created in this ib_ucontext.
This list contains all objects created under this ib_ucontext.
When a ib_ucontext is destroyed, we traverse this list several time
destroying the various objects by the order mentioned in the object
type description. If few object types have the same destruction order,
they are destroyed in an order opposite to their creation order.

Adding an object is done in two parts.
First, an object is allocated and added to IDR/fd table. Then, the
command's handlers (in downstream patches) could work on this object
and fill in its required details.
After a successful command, ib_uverbs_uobject_enable is called and
this user objects becomes ucontext visible.

Removing an uboject is done by calling ib_uverbs_uobject_remove.

We should make sure IDR (per-device) and list (per-ucontext) could
be accessed concurrently without corrupting them.

Signed-off-by: Matan Barak <matanb@mellanox.com>
Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Haggai Eran <haggaie@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
---
 drivers/infiniband/core/Makefile    |   3 +-
 drivers/infiniband/core/rdma_core.c | 264 ++++++++++++++++++++++++++++++++++++
 drivers/infiniband/core/rdma_core.h |  61 +++++++++
 include/rdma/ib_verbs.h             |   9 ++
 include/rdma/uverbs_ioctl.h         |  60 ++++++++
 5 files changed, 396 insertions(+), 1 deletion(-)
 create mode 100644 drivers/infiniband/core/rdma_core.c
 create mode 100644 drivers/infiniband/core/rdma_core.h
 create mode 100644 include/rdma/uverbs_ioctl.h

Comments

Jason Gunthorpe Jan. 11, 2017, 11:28 p.m. UTC | #1

On Wed, Jan 11, 2017 at 12:53:48PM +0200, Matan Barak wrote:

> +static struct ib_uobject *get_uobj_rcu(int id, struct ib_ucontext *context)
> +{
> +	struct ib_uobject *uobj;
> +
> +	RCU_LOCKDEP_WARN(!rcu_read_lock_held(),
> +			 "uverbs: get_uobj_rcu wasn't called in a rcu_read_lock()!");
> +	/* object won't be released as we're protected in rcu */
> +	uobj = idr_find(&context->device->idr, id);

So.. I think you need another patch in your series to rework the idr
globally to use rcu. Waiting till patch 7 seems too late

I'd particularly like to see this series without patch 5..

Reworking this to RCU seems like a good idea. Even better would be to
have per-fd locking domains ;)

> +	/*
> +	 * We start with allocating an idr pointing to NULL. This represents an
> +	 * object which isn't initialized yet. We'll replace it later on with
> +	 * the real object once we commit.
> +	 */
> +	ret = idr_alloc(&uobj->context->device->idr, NULL, 0, 0, GFP_NOWAIT);
> +	if (ret >= 0)
> +		uobj->id = ret;

And probably another patch to replace live with NULL.

These are both big changes, it would be better to see them with their
full context, not as stubs like this. Patch 7 has the code for this
but it is all intermingled with so much other stuff..

> +	/*
> +	 * When we destroy an object, we first just lock it for WRITE and
> +	 * actually DESTROY it in the finalize stage. So, the problematic
> +	 * scenario is when we just stared the finalize stage of the
> +	 * destruction (nothing was executed yet). Now, the other thread
> +	 * fetched the object for READ access, but it didn't lock it yet.
> +	 * The DESTROY thread continues and starts destroying the object.
> +	 * When the other thread continue - without the RCU, it would
> +	 * access freed memory. However, the rcu_read_lock delays the free
> +	 * until the rcu_read_lock of the READ operation quits. Since the
> +	 * write lock of the object is still taken by the DESTROY flow, the
> +	 * READ operation will get -EBUSY and it'll just bail out.
> +	 */
> +	kfree_rcu(uobj, rcu);

Zoiks. But yes, ok, it makes sense. Good comment.

> +/*
> + * Returns the ib_uobject, NULL if the requested object isn't found or an error.
> + * The caller should check for IS_ERR_OR_NULL.

Some people really hate IS_ERR_OR_NULL.. Why would the caller handle
NULL differently? Why not return -ENOENT?

> + */
> +static struct ib_uobject *get_uobject_from_context(struct ib_ucontext *ucontext,
> +						   const struct uverbs_type_alloc_action *type,
> +						   u32 idr,
> +						   enum uverbs_idr_access access)
> +{
> +	struct ib_uobject *uobj;
> +	int ret;
> +
> +	rcu_read_lock();
> +	uobj = get_uobj_rcu(idr, ucontext);

Is there a reason we have get_uobj_rcu() at all ? Just inline it.

> +	if (!uobj)
> +		goto free;
> +
> +	if (uobj->type != type) {
> +		uobj = NULL;

But this is an error, not no found.

> +static struct ib_uobject *uverbs_get_uobject_from_idr(const struct uverbs_type_alloc_action *type_alloc,
> +						      struct ib_ucontext *ucontext,
> +						      enum uverbs_idr_access access,
> +						      uint32_t idr)

u32 for the kernel please, check all patches

Why do we need to have three confusingly named functions to implement
one thing? Does it even make sense that add/get/destroy are all one
function?

I'm not sure it makes any sense to overload
uverbs_get_uobject_from_context for these two very different cases..
They don't even have the same type in the uapi (int vs u32)

I think you should just stick with uverbs_get_uobject_from_idr/_fd and
skip the common getter one..

> +{
> +	struct ib_uobject *uobj;
> +	int ret;
> +
> +	if (access == UVERBS_ACCESS_NEW) {
> +		uobj = kmalloc(type_alloc->obj_size, GFP_KERNEL);
> +		if (!uobj)
> +			return ERR_PTR(-ENOMEM);

Does this really make sense as an access enum and not just a different
function? This one one function is doing far too much I suspect.

Use function pointers:

 struct ub_uobject *obj = type_alloc->ops->allocate(type_alloc, ucontext);

> +static void uverbs_uobject_add(struct ib_uobject *uobject)
> +{
> +	mutex_lock(&uobject->context->uobjects_lock->lock);
> +	list_add(&uobject->list, &uobject->context->uobjects);
> +	mutex_unlock(&uobject->context->uobjects_lock->lock);
> +}
> +
> +static void uverbs_uobject_remove(struct ib_uobject *uobject)
> +{
> +	/*
> +	 * Calling remove requires exclusive access, so it's not possible
> +	 * another thread will use our object since the function is called
> +	 * with exclusive access.
> +	 */
> +	uverbs_idr_remove_uobj(uobject);
> +	mutex_lock(&uobject->context->uobjects_lock->lock);
> +	list_del(&uobject->list);
> +	mutex_unlock(&uobject->context->uobjects_lock->lock);
> +	put_uobj(uobject);

What does this put pair with? Usually a put like this would pair with
a get inside the _add function. So this deserves a comment at least.
If it is pairing with a get the caller performed then it should not be
here.

> +static void uverbs_finalize_idr(struct ib_uobject *uobj,
> +				enum uverbs_idr_access access,
> +				bool commit)
> +{
> +	switch (access) {
> +	case UVERBS_ACCESS_READ:
> +		up_read(&uobj->usecnt);
> +		break;
> +	case UVERBS_ACCESS_NEW:
> +		if (commit) {
> +			uverbs_uobject_add(uobj);
> +			spin_lock(&uobj->context->device->idr_lock);
> +			/*
> +			 * We already allocated this IDR with a NULL object, so
> +			 * this shouldn't fail.
> +			 */
> +			WARN_ON(idr_replace(&uobj->context->device->idr,
> +					    uobj, uobj->id));
> +			spin_unlock(&uobj->context->device->idr_lock);
> +		} else {
> +			uverbs_idr_remove_uobj(uobj);
> +			put_uobj(uobj);
> +		}
> +		break;
> +	case UVERBS_ACCESS_WRITE:
> +		up_write(&uobj->usecnt);
> +		break;
> +	case UVERBS_ACCESS_DESTROY:
> +		if (commit)
> +			uverbs_uobject_remove(uobj);
> +		else
> +			up_write(&uobj->usecnt);

Again wondering if ACCESS_DESTROY makes any sense..

> +		break;
> +	}
> +}
> +
> +void uverbs_finalize_object(struct ib_uobject *uobj,
> +			    enum uverbs_idr_access access,
> +			    bool commit)
> +{
> +	if (uobj->type->type == UVERBS_ATTR_TYPE_IDR)
> +		uverbs_finalize_idr(uobj, access, commit);
> +	else
> +		WARN_ON(true);

I think you should use ops like the rest of the kernel.

  uobj->ops->finalize(uobj,access,commit);

Get rid of these switch version.

> +
>  struct ib_ucontext {
>  	struct ib_device       *device;
>  	struct list_head	pd_list;
> @@ -1346,6 +1348,10 @@ struct ib_ucontext {
>  	struct list_head	rwq_ind_tbl_list;
>  	int			closing;
>  
> +	/* lock for uobjects list */
> +	struct ib_ucontext_lock	*uobjects_lock;

This is so weird. Why do we have a pointer to a struct mutex? This
serves absolutely no function at this point in the series, just inline
the mutex.

> +	struct list_head	uobjects;
> +
>  	struct pid             *tgid;
>  #ifdef CONFIG_INFINIBAND_ON_DEMAND_PAGING
>  	struct rb_root      umem_tree;
> @@ -1373,8 +1379,11 @@ struct ib_uobject {
>  	int			id;		/* index into kernel idr */

Shouldn't this be u32?

>  	struct kref		ref;
>  	struct rw_semaphore	mutex;		/* protects .live */
> +	struct rw_semaphore	usecnt;		/* protects exclusive access */

usecnt isn't really a cnt, better name?

> +	const struct uverbs_type_alloc_action *type;

[..]

> +struct uverbs_type_alloc_action;
> +typedef void (*free_type)(const struct uverbs_type_alloc_action *uobject_type,
> +			  struct ib_uobject *uobject);
> +
> +struct uverbs_type_alloc_action {
> +	enum uverbs_attr_type		type;
> +	int				order;
> +	size_t				obj_size;
> +	free_type			free_fn;
> +};

Just make a sensible meta-class struct like other parts of the kernel:

struct uverbs_obj_type {
   unsigned int destroy_order;
   size_t allocation_size;

   struct ib_uobject (*alloc)(..);
   void (*free)(..);
   void (*lookup)(..);
};

Jason
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Matan Barak Jan. 17, 2017, 5:23 p.m. UTC | #2

On Thu, Jan 12, 2017 at 1:28 AM, Jason Gunthorpe
<jgunthorpe@obsidianresearch.com> wrote:
> On Wed, Jan 11, 2017 at 12:53:48PM +0200, Matan Barak wrote:
>
>> +static struct ib_uobject *get_uobj_rcu(int id, struct ib_ucontext *context)
>> +{
>> +     struct ib_uobject *uobj;
>> +
>> +     RCU_LOCKDEP_WARN(!rcu_read_lock_held(),
>> +                      "uverbs: get_uobj_rcu wasn't called in a rcu_read_lock()!");
>> +     /* object won't be released as we're protected in rcu */
>> +     uobj = idr_find(&context->device->idr, id);
>
> So.. I think you need another patch in your series to rework the idr
> globally to use rcu. Waiting till patch 7 seems too late
>

I'm not sure I followed here.
The idr mechanism uses RCU as of the current code for fetching ids.

> I'd particularly like to see this series without patch 5..
>

The major purpose of this series is to be a parliamentary step to the
ABI, which as you recall,
uses a macro based language to declare types. I can simplify it in
this series to directly initialize the
types, but that means the next series will modify these lines to the
macros language.

> Reworking this to RCU seems like a good idea. Even better would be to
> have per-fd locking domains ;)
>

What do you mean by "per-fd locking domains"?

>> +     /*
>> +      * We start with allocating an idr pointing to NULL. This represents an
>> +      * object which isn't initialized yet. We'll replace it later on with
>> +      * the real object once we commit.
>> +      */
>> +     ret = idr_alloc(&uobj->context->device->idr, NULL, 0, 0, GFP_NOWAIT);
>> +     if (ret >= 0)
>> +             uobj->id = ret;
>
> And probably another patch to replace live with NULL.
>

The whole implementation here is replaced, so if we have a patch that replaces
live with NULL in the current implementation and then later on in the
series we re-work
the API, we actually delete lines we've added at the same series.
Since we replace here the whole
idr/fd objects management, I think it's better to review the
locking/lifetime as an infrastructure
rather than bunch of changes in the usage of the current infrastructure.

> These are both big changes, it would be better to see them with their
> full context, not as stubs like this. Patch 7 has the code for this
> but it is all intermingled with so much other stuff..
>

Although putting an infrastructure and then using it in later patches
is common practice in
the kernel, I can look into embedding the required changes in the
infrastructure changes.
Saying that, testing that it's really bi-sectable could be a
nightmare, so I'm not really sure it's worth here.

>> +     /*
>> +      * When we destroy an object, we first just lock it for WRITE and
>> +      * actually DESTROY it in the finalize stage. So, the problematic
>> +      * scenario is when we just stared the finalize stage of the
>> +      * destruction (nothing was executed yet). Now, the other thread
>> +      * fetched the object for READ access, but it didn't lock it yet.
>> +      * The DESTROY thread continues and starts destroying the object.
>> +      * When the other thread continue - without the RCU, it would
>> +      * access freed memory. However, the rcu_read_lock delays the free
>> +      * until the rcu_read_lock of the READ operation quits. Since the
>> +      * write lock of the object is still taken by the DESTROY flow, the
>> +      * READ operation will get -EBUSY and it'll just bail out.
>> +      */
>> +     kfree_rcu(uobj, rcu);
>
> Zoiks. But yes, ok, it makes sense. Good comment.
>
>> +/*
>> + * Returns the ib_uobject, NULL if the requested object isn't found or an error.
>> + * The caller should check for IS_ERR_OR_NULL.
>
> Some people really hate IS_ERR_OR_NULL.. Why would the caller handle
> NULL differently? Why not return -ENOENT?
>

Ok

>> + */
>> +static struct ib_uobject *get_uobject_from_context(struct ib_ucontext *ucontext,
>> +                                                const struct uverbs_type_alloc_action *type,
>> +                                                u32 idr,
>> +                                                enum uverbs_idr_access access)
>> +{
>> +     struct ib_uobject *uobj;
>> +     int ret;
>> +
>> +     rcu_read_lock();
>> +     uobj = get_uobj_rcu(idr, ucontext);
>
> Is there a reason we have get_uobj_rcu() at all ? Just inline it.
>

It's later used for uverbs_is_live.

>> +     if (!uobj)
>> +             goto free;
>> +
>> +     if (uobj->type != type) {
>> +             uobj = NULL;
>
> But this is an error, not no found.
>

Ok, I'll fix.

>> +static struct ib_uobject *uverbs_get_uobject_from_idr(const struct uverbs_type_alloc_action *type_alloc,
>> +                                                   struct ib_ucontext *ucontext,
>> +                                                   enum uverbs_idr_access access,
>> +                                                   uint32_t idr)
>
> u32 for the kernel please, check all patches
>

Right, actually idr should be an int :)

> Why do we need to have three confusingly named functions to implement
> one thing? Does it even make sense that add/get/destroy are all one
> function?
>

It's a preliminary step for the next series which actually introduces the ABI.
If you recall, the metadata for each attribute contains how to access it.
I'll take a look into changing the enums and switch-cases to function
pointers and examine if it also fits the new model well enough.
Saying that, we only have two types (idr and fds) and we don't expect
to add much later

> I'm not sure it makes any sense to overload
> uverbs_get_uobject_from_context for these two very different cases..
> They don't even have the same type in the uapi (int vs u32)
>

I could split them to two different functions here. There are some
benefits of the current way.
You could of rdma_core as managing a big repository of objects of all
types and you could get
any object using the tuple (type,id). If we go to the function
pointers model, you'll probably just go to
type->lookup function.

> I think you should just stick with uverbs_get_uobject_from_idr/_fd and
> skip the common getter one..
>

Ok

>> +{
>> +     struct ib_uobject *uobj;
>> +     int ret;
>> +
>> +     if (access == UVERBS_ACCESS_NEW) {
>> +             uobj = kmalloc(type_alloc->obj_size, GFP_KERNEL);
>> +             if (!uobj)
>> +                     return ERR_PTR(-ENOMEM);
>
> Does this really make sense as an access enum and not just a different
> function? This one one function is doing far too much I suspect.
>
> Use function pointers:
>
>  struct ub_uobject *obj = type_alloc->ops->allocate(type_alloc, ucontext);
>

For the same reason as seen above - the goal here is to use ACCESS_ identifiers
in the DSL to specify how to use the object. Therefore, I think enum
makes sense.
If we transform this into function pointers, the new ABI (next series)
will probably use a wrapper
function to execute the correct function pointer.

>> +static void uverbs_uobject_add(struct ib_uobject *uobject)
>> +{
>> +     mutex_lock(&uobject->context->uobjects_lock->lock);
>> +     list_add(&uobject->list, &uobject->context->uobjects);
>> +     mutex_unlock(&uobject->context->uobjects_lock->lock);
>> +}
>> +
>> +static void uverbs_uobject_remove(struct ib_uobject *uobject)
>> +{
>> +     /*
>> +      * Calling remove requires exclusive access, so it's not possible
>> +      * another thread will use our object since the function is called
>> +      * with exclusive access.
>> +      */
>> +     uverbs_idr_remove_uobj(uobject);
>> +     mutex_lock(&uobject->context->uobjects_lock->lock);
>> +     list_del(&uobject->list);
>> +     mutex_unlock(&uobject->context->uobjects_lock->lock);
>> +     put_uobj(uobject);
>
> What does this put pair with? Usually a put like this would pair with
> a get inside the _add function. So this deserves a comment at least.
> If it is pairing with a get the caller performed then it should not be
> here.
>

They're a pair in a sense they're both used in the finalize stage of the IDR
object creation. Both object creation and object destruction are split
into two parts.
Object creation is mainly done in the "get" stage while object
destruction is mostly
done in the "finalize" stage.

>> +static void uverbs_finalize_idr(struct ib_uobject *uobj,
>> +                             enum uverbs_idr_access access,
>> +                             bool commit)
>> +{
>> +     switch (access) {
>> +     case UVERBS_ACCESS_READ:
>> +             up_read(&uobj->usecnt);
>> +             break;
>> +     case UVERBS_ACCESS_NEW:
>> +             if (commit) {
>> +                     uverbs_uobject_add(uobj);
>> +                     spin_lock(&uobj->context->device->idr_lock);
>> +                     /*
>> +                      * We already allocated this IDR with a NULL object, so
>> +                      * this shouldn't fail.
>> +                      */
>> +                     WARN_ON(idr_replace(&uobj->context->device->idr,
>> +                                         uobj, uobj->id));
>> +                     spin_unlock(&uobj->context->device->idr_lock);
>> +             } else {
>> +                     uverbs_idr_remove_uobj(uobj);
>> +                     put_uobj(uobj);
>> +             }
>> +             break;
>> +     case UVERBS_ACCESS_WRITE:
>> +             up_write(&uobj->usecnt);
>> +             break;
>> +     case UVERBS_ACCESS_DESTROY:
>> +             if (commit)
>> +                     uverbs_uobject_remove(uobj);
>> +             else
>> +                     up_write(&uobj->usecnt);
>
> Again wondering if ACCESS_DESTROY makes any sense..
>

Why? If you write a destroy_qp handler, it makes sense that the QP_HANDLE
will have ACCESS_DESTROY behavior.

>> +             break;
>> +     }
>> +}
>> +
>> +void uverbs_finalize_object(struct ib_uobject *uobj,
>> +                         enum uverbs_idr_access access,
>> +                         bool commit)
>> +{
>> +     if (uobj->type->type == UVERBS_ATTR_TYPE_IDR)
>> +             uverbs_finalize_idr(uobj, access, commit);
>> +     else
>> +             WARN_ON(true);
>
> I think you should use ops like the rest of the kernel.
>
>   uobj->ops->finalize(uobj,access,commit);
>
> Get rid of these switch version.
>

Maybe we could a type->finalize pointer and populate that
automatically in type creation.
I first want to see how the function pointers model ends up.

>> +
>>  struct ib_ucontext {
>>       struct ib_device       *device;
>>       struct list_head        pd_list;
>> @@ -1346,6 +1348,10 @@ struct ib_ucontext {
>>       struct list_head        rwq_ind_tbl_list;
>>       int                     closing;
>>
>> +     /* lock for uobjects list */
>> +     struct ib_ucontext_lock *uobjects_lock;
>
> This is so weird. Why do we have a pointer to a struct mutex? This
> serves absolutely no function at this point in the series, just inline
> the mutex.
>

It's used in the next patch as a kref-ed lock. The purpose here is
that a FD object could outlive its context.
However, since you suggest to allow keeping a dummy ucontext alive,
I'll just add a kref and inline the lock.

>> +     struct list_head        uobjects;
>> +
>>       struct pid             *tgid;
>>  #ifdef CONFIG_INFINIBAND_ON_DEMAND_PAGING
>>       struct rb_root      umem_tree;
>> @@ -1373,8 +1379,11 @@ struct ib_uobject {
>>       int                     id;             /* index into kernel idr */
>
> Shouldn't this be u32?
>

idr_alloc actually returns an int. Actually, we need to change all ids
to be int and allocate
an idr with S32_MAX as end.

>>       struct kref             ref;
>>       struct rw_semaphore     mutex;          /* protects .live */
>> +     struct rw_semaphore     usecnt;         /* protects exclusive access */
>
> usecnt isn't really a cnt, better name?
>

I could rename it to currently_used. What do you think?

>> +     const struct uverbs_type_alloc_action *type;
>
> [..]
>
>> +struct uverbs_type_alloc_action;
>> +typedef void (*free_type)(const struct uverbs_type_alloc_action *uobject_type,
>> +                       struct ib_uobject *uobject);
>> +
>> +struct uverbs_type_alloc_action {
>> +     enum uverbs_attr_type           type;
>> +     int                             order;
>> +     size_t                          obj_size;
>> +     free_type                       free_fn;
>> +};
>
> Just make a sensible meta-class struct like other parts of the kernel:
>
> struct uverbs_obj_type {
>    unsigned int destroy_order;
>    size_t allocation_size;
>
>    struct ib_uobject (*alloc)(..);
>    void (*free)(..);
>    void (*lookup)(..);
> };
>

We could use function pointers instead of enum. I'll start refactor
the code to use this approach.
If it turns out to make more sense, I'll use this in V2:

struct uverbs_obj_type_ops {
        /*
         * Get an ib_uobject that corresponds to the given id from ucontext,
         * These functions could create or destroy objects if required.
         * The action will be finalized only when commit or abort fops are
         * called.
         */
        struct ib_uobject *(*alloc_begin)(const struct uverbs_obj_type *type,
                                          struct ib_ucontext *ucontext);
        struct ib_uobject *(*lookup_get)(const struct uverbs_obj_type *type,
                                         struct ib_ucontext *ucontext, int id,
                                         bool write);
        void (*alloc_commit)(struct ib_uobject *uobj);
        void (*alloc_abort)(struct ib_uobject *uobj);
        void (*lookup_put)(struct ib_uobject *uobj, bool write);
        void (*destroy_commit)(struct ib_uobject *uobj);
        void (*hot_unplug)(struct ib_uobject *uobj);
};

struct uverbs_obj_type {
        const struct uverbs_obj_type_ops * const ops;
        unsigned int destroy_order;
};

struct uverbs_obj_idr_type {
        struct uverbs_obj_type  type;
        size_t                  obj_size;
        void (*hot_unplug)(struct ib_uobject *uobj);
};

struct uverbs_obj_fd_type {
        struct uverbs_obj_type  type;
        size_t                  obj_size;
        void (*hot_unplug)(struct ib_uobject *uobj);
        const struct file_operations    *fops;
        const char                      *name;
        int                             flags;
};

Probably ops will be the same for all IDR/FD types (use the same
pointer) and the custom behavior
should be embedded in uverbs_obj_idr_type/uverbs_obj_fd_type.

> Jason

Thanks for reviewing.

Matan

> --
> To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Jason Gunthorpe Jan. 17, 2017, 7:01 p.m. UTC | #3

On Tue, Jan 17, 2017 at 07:23:47PM +0200, Matan Barak wrote:
> On Thu, Jan 12, 2017 at 1:28 AM, Jason Gunthorpe
> <jgunthorpe@obsidianresearch.com> wrote:
> > On Wed, Jan 11, 2017 at 12:53:48PM +0200, Matan Barak wrote:
> >
> >> +static struct ib_uobject *get_uobj_rcu(int id, struct ib_ucontext *context)
> >> +{
> >> +     struct ib_uobject *uobj;
> >> +
> >> +     RCU_LOCKDEP_WARN(!rcu_read_lock_held(),
> >> +                      "uverbs: get_uobj_rcu wasn't called in a rcu_read_lock()!");
> >> +     /* object won't be released as we're protected in rcu */
> >> +     uobj = idr_find(&context->device->idr, id);
> >
> > So.. I think you need another patch in your series to rework the idr
> > globally to use rcu. Waiting till patch 7 seems too late
> 
> I'm not sure I followed here.
> The idr mechanism uses RCU as of the current code for fetching ids.

Hmm, Not sure what I was thinking, but yes.. this is the same as
__idr_get_uobj, so why does it exist?

> The major purpose of this series is to be a parliamentary step to
> the ABI, which as you recall, uses a macro based language to declare
> types.

Yes, but that seem controversial, let us get this part done without
that.
 
> What do you mean by "per-fd locking domains"?

Put the IDR into the ucontext and a per-ucontext rcu + spinlock for it..

> > And probably another patch to replace live with NULL.
> 
> The whole implementation here is replaced, so if we have a patch
> that replaces live with NULL in the current implementation and then
> later on in the series we re-work the API, we actually delete lines
> we've added at the same series.

The kernel philsophy is to try an avoid these 'flag day' kinds of
approaches. You are adding new code that does not work with the old
code which makes it incredibly hard to review, and then just swithing
everything over.

At the same time many fundamental designs are being altered.

You might have a point if the two approaches could exist
concurrently, but that is not the case in these patches.

> Saying that, testing that it's really bi-sectable could be a
> nightmare, so I'm not really sure it's worth here.


> >> +     /*
> >> +      * When we destroy an object, we first just lock it for WRITE and
> >> +      * actually DESTROY it in the finalize stage. So, the problematic
> >> +      * scenario is when we just stared the finalize stage of the
> >> +      * destruction (nothing was executed yet). Now, the other thread
> >> +      * fetched the object for READ access, but it didn't lock it yet.
> >> +      * The DESTROY thread continues and starts destroying the object.
> >> +      * When the other thread continue - without the RCU, it would
> >> +      * access freed memory. However, the rcu_read_lock delays the free
> >> +      * until the rcu_read_lock of the READ operation quits. Since the
> >> +      * write lock of the object is still taken by the DESTROY flow, the
> >> +      * READ operation will get -EBUSY and it'll just bail out.
> >> +      */
> >> +     kfree_rcu(uobj, rcu);
> >
> > Zoiks. But yes, ok, it makes sense. Good comment.

Actually, why is this replacing release_uobj? Why is it now
called put_uobj? Why is something that doesn't use a kref called put?
Why is a kfree being called outside a kref release function?

Seems wrong...

> > Is there a reason we have get_uobj_rcu() at all ? Just inline it.
> 
> It's later used for uverbs_is_live.

Huh. That bit of code in ib_uverbs_qp_event_handler seems super
sketchy. Looks like the qp there comes from the open_list? Seems to me
that the open_list needs to be cleaned up before the object is removed
from the idr, eliminating the need for the uverbs_is_live at all.

> >> +                                                   struct ib_ucontext *ucontext,
> >> +                                                   enum uverbs_idr_access access,
> >> +                                                   uint32_t idr)
> >
> > u32 for the kernel please, check all patches
> 
> Right, actually idr should be an int :)

Our UAPI is u32 for sane 64/32 bit compat, the IDR has to be limited
to min(U32_MAX-1,INT_MAX). (may as well reserve U32_MAX for error)

> It's a preliminary step for the next series which actually introduces the ABI.
> If you recall, the metadata for each attribute contains how to access it.
> I'll take a look into changing the enums and switch-cases to function
> pointers and examine if it also fits the new model well enough.
> Saying that, we only have two types (idr and fds) and we don't expect
> to add much later

But this makes no sense here, handle the switch from the enum in the
code that works with those future macro structures.

> I could split them to two different functions here. There are some
> benefits of the current way.  You could of rdma_core as managing a
> big repository of objects of all types and you could get any object
> using the tuple (type,id). If we go to the function pointers model,
> you'll probably just go to type->lookup function.

The different types seem to have significant differences (eg with
locking) so I'm not sure that single model even makes sense.

> in the DSL to specify how to use the object. Therefore, I think enum
> makes sense.  If we transform this into function pointers, the new
> ABI (next series) will probably use a wrapper function to execute
> the correct function pointer.

Seems OK to me

> >> +static void uverbs_uobject_add(struct ib_uobject *uobject)
> >> +{
> >> +     mutex_lock(&uobject->context->uobjects_lock->lock);
> >> +     list_add(&uobject->list, &uobject->context->uobjects);
> >> +     mutex_unlock(&uobject->context->uobjects_lock->lock);
> >> +}
> >> +
> >> +static void uverbs_uobject_remove(struct ib_uobject *uobject)
> >> +{
> >> +     /*
> >> +      * Calling remove requires exclusive access, so it's not possible
> >> +      * another thread will use our object since the function is called
> >> +      * with exclusive access.
> >> +      */
> >> +     uverbs_idr_remove_uobj(uobject);
> >> +     mutex_lock(&uobject->context->uobjects_lock->lock);
> >> +     list_del(&uobject->list);
> >> +     mutex_unlock(&uobject->context->uobjects_lock->lock);
> >> +     put_uobj(uobject);
> >
> > What does this put pair with? Usually a put like this would pair with
> > a get inside the _add function. So this deserves a comment at least.
> > If it is pairing with a get the caller performed then it should not be
> > here.
> 
> They're a pair in a sense they're both used in the finalize stage of
> the IDR object creation. Both object creation and object destruction
> are split into two parts.  Object creation is mainly done in the
> "get" stage while object destruction is mostly done in the
> "finalize" stage.

I think this goes back to my earlier comment about put_uobj not making
much sense - for something named put there needs to be a paired get
and that get isn't clear.

> Why? If you write a destroy_qp handler, it makes sense that the QP_HANDLE
> will have ACCESS_DESTROY behavior.

Sure but it really isn't much of an ACCESS ..

> >>  #ifdef CONFIG_INFINIBAND_ON_DEMAND_PAGING
> >>       struct rb_root      umem_tree;
> >> @@ -1373,8 +1379,11 @@ struct ib_uobject {
> >>       int                     id;             /* index into kernel idr */
> >
> > Shouldn't this be u32?
> 
> idr_alloc actually returns an int. Actually, we need to change all ids
> to be int and allocate
> an idr with S32_MAX as end.

The uapi is u32, see above...

> >>       struct kref             ref;
> >>       struct rw_semaphore     mutex;          /* protects .live */
> >> +     struct rw_semaphore     usecnt;         /* protects exclusive access */
> >
> > usecnt isn't really a cnt, better name?
> 
> I could rename it to currently_used. What do you think?

Maybe call it destroy_lock? reading_lock?

> We could use function pointers instead of enum. I'll start refactor
> the code to use this approach.
> If it turns out to make more sense, I'll use this in V2:
> 
> struct uverbs_obj_type_ops {
>         /*
>          * Get an ib_uobject that corresponds to the given id from ucontext,
>          * These functions could create or destroy objects if required.
>          * The action will be finalized only when commit or abort fops are
>          * called.
>          */
>         struct ib_uobject *(*alloc_begin)(const struct uverbs_obj_type *type,
>                                           struct ib_ucontext *ucontext);
>         struct ib_uobject *(*lookup_get)(const struct uverbs_obj_type *type,
>                                          struct ib_ucontext *ucontext, int id,
>                                          bool write);
>         void (*alloc_commit)(struct ib_uobject *uobj);
>         void (*alloc_abort)(struct ib_uobject *uobj);
>         void (*lookup_put)(struct ib_uobject *uobj, bool write);
>         void (*destroy_commit)(struct ib_uobject *uobj);
>         void (*hot_unplug)(struct ib_uobject *uobj);
> };
> 
> struct uverbs_obj_type {
>         const struct uverbs_obj_type_ops * const ops;
>         unsigned int destroy_order;
> };
> 
> struct uverbs_obj_idr_type {
>         struct uverbs_obj_type  type;
>         size_t                  obj_size;
>         void (*hot_unplug)(struct ib_uobject *uobj);
> };
>
> struct uverbs_obj_fd_type {
>         struct uverbs_obj_type  type;
>         size_t                  obj_size;
>         void (*hot_unplug)(struct ib_uobject *uobj);
>         const struct file_operations    *fops;
>         const char                      *name;
>         int                             flags;
> };

I'd probably just use a union or unused struct members, so much
simpler than all these structs. Like you say we probably only have two
kinds, no reason to be fancy.

> Probably ops will be the same for all IDR/FD types (use the same
> pointer)

Then why have it?

Jason
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

diff --git a/drivers/infiniband/core/Makefile b/drivers/infiniband/core/Makefile
index edaae9f..1819623 100644
--- a/drivers/infiniband/core/Makefile
+++ b/drivers/infiniband/core/Makefile
@@ -28,4 +28,5 @@  ib_umad-y :=			user_mad.o
 
 ib_ucm-y :=			ucm.o
 
-ib_uverbs-y :=			uverbs_main.o uverbs_cmd.o uverbs_marshall.o
+ib_uverbs-y :=			uverbs_main.o uverbs_cmd.o uverbs_marshall.o \
+				rdma_core.o
diff --git a/drivers/infiniband/core/rdma_core.c b/drivers/infiniband/core/rdma_core.c
new file mode 100644
index 0000000..09b44ec
--- /dev/null
+++ b/drivers/infiniband/core/rdma_core.c
@@ -0,0 +1,264 @@ 
+/*
+ * Copyright (c) 2016, Mellanox Technologies inc.  All rights reserved.
+ *
+ * This software is available to you under a choice of one of two
+ * licenses.  You may choose to be licensed under the terms of the GNU
+ * General Public License (GPL) Version 2, available from the file
+ * COPYING in the main directory of this source tree, or the
+ * OpenIB.org BSD license below:
+ *
+ *     Redistribution and use in source and binary forms, with or
+ *     without modification, are permitted provided that the following
+ *     conditions are met:
+ *
+ *      - Redistributions of source code must retain the above
+ *        copyright notice, this list of conditions and the following
+ *        disclaimer.
+ *
+ *      - Redistributions in binary form must reproduce the above
+ *        copyright notice, this list of conditions and the following
+ *        disclaimer in the documentation and/or other materials
+ *        provided with the distribution.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ */
+
+#include <linux/file.h>
+#include <linux/anon_inodes.h>
+#include <rdma/ib_verbs.h>
+#include <rdma/uverbs_ioctl.h>
+#include "uverbs.h"
+#include "rdma_core.h"
+
+static int uverbs_lock_object(struct ib_uobject *uobj,
+			      enum uverbs_idr_access access)
+{
+	if (access == UVERBS_ACCESS_READ)
+		return down_read_trylock(&uobj->usecnt) == 1 ? 0 : -EBUSY;
+
+	/* lock is either WRITE or DESTROY - should be exclusive */
+	return down_write_trylock(&uobj->usecnt) == 1 ? 0 : -EBUSY;
+}
+
+static struct ib_uobject *get_uobj_rcu(int id, struct ib_ucontext *context)
+{
+	struct ib_uobject *uobj;
+
+	RCU_LOCKDEP_WARN(!rcu_read_lock_held(),
+			 "uverbs: get_uobj_rcu wasn't called in a rcu_read_lock()!");
+	/* object won't be released as we're protected in rcu */
+	uobj = idr_find(&context->device->idr, id);
+	if (uobj) {
+		if (uobj->context != context)
+			uobj = NULL;
+	}
+
+	return uobj;
+}
+
+struct ib_ucontext_lock {
+	/* locking the uobjects_list */
+	struct mutex lock;
+};
+
+static void init_uobj(struct ib_uobject *uobj, struct ib_ucontext *context)
+{
+	init_rwsem(&uobj->usecnt);
+	uobj->context     = context;
+}
+
+static int uverbs_idr_add_uobj(struct ib_uobject *uobj)
+{
+	int ret;
+
+	idr_preload(GFP_KERNEL);
+	spin_lock(&uobj->context->device->idr_lock);
+
+	/*
+	 * We start with allocating an idr pointing to NULL. This represents an
+	 * object which isn't initialized yet. We'll replace it later on with
+	 * the real object once we commit.
+	 */
+	ret = idr_alloc(&uobj->context->device->idr, NULL, 0, 0, GFP_NOWAIT);
+	if (ret >= 0)
+		uobj->id = ret;
+
+	spin_unlock(&uobj->context->device->idr_lock);
+	idr_preload_end();
+
+	return ret < 0 ? ret : 0;
+}
+
+static void uverbs_idr_remove_uobj(struct ib_uobject *uobj)
+{
+	spin_lock(&uobj->context->device->idr_lock);
+	idr_remove(&uobj->context->device->idr, uobj->id);
+	spin_unlock(&uobj->context->device->idr_lock);
+}
+
+static void put_uobj(struct ib_uobject *uobj)
+{
+	/*
+	 * When we destroy an object, we first just lock it for WRITE and
+	 * actually DESTROY it in the finalize stage. So, the problematic
+	 * scenario is when we just stared the finalize stage of the
+	 * destruction (nothing was executed yet). Now, the other thread
+	 * fetched the object for READ access, but it didn't lock it yet.
+	 * The DESTROY thread continues and starts destroying the object.
+	 * When the other thread continue - without the RCU, it would
+	 * access freed memory. However, the rcu_read_lock delays the free
+	 * until the rcu_read_lock of the READ operation quits. Since the
+	 * write lock of the object is still taken by the DESTROY flow, the
+	 * READ operation will get -EBUSY and it'll just bail out.
+	 */
+	kfree_rcu(uobj, rcu);
+}
+
+/*
+ * Returns the ib_uobject, NULL if the requested object isn't found or an error.
+ * The caller should check for IS_ERR_OR_NULL.
+ */
+static struct ib_uobject *get_uobject_from_context(struct ib_ucontext *ucontext,
+						   const struct uverbs_type_alloc_action *type,
+						   u32 idr,
+						   enum uverbs_idr_access access)
+{
+	struct ib_uobject *uobj;
+	int ret;
+
+	rcu_read_lock();
+	uobj = get_uobj_rcu(idr, ucontext);
+	if (!uobj)
+		goto free;
+
+	if (uobj->type != type) {
+		uobj = NULL;
+		goto free;
+	}
+
+	ret = uverbs_lock_object(uobj, access);
+	if (ret)
+		uobj = ERR_PTR(ret);
+free:
+	rcu_read_unlock();
+	return uobj;
+}
+
+static struct ib_uobject *uverbs_get_uobject_from_idr(const struct uverbs_type_alloc_action *type_alloc,
+						      struct ib_ucontext *ucontext,
+						      enum uverbs_idr_access access,
+						      uint32_t idr)
+{
+	struct ib_uobject *uobj;
+	int ret;
+
+	if (access == UVERBS_ACCESS_NEW) {
+		uobj = kmalloc(type_alloc->obj_size, GFP_KERNEL);
+		if (!uobj)
+			return ERR_PTR(-ENOMEM);
+
+		init_uobj(uobj, ucontext);
+
+		uobj->type = type_alloc;
+		ret = uverbs_idr_add_uobj(uobj);
+		if (ret) {
+			kfree(uobj);
+			return ERR_PTR(ret);
+		}
+
+	} else {
+		uobj = get_uobject_from_context(ucontext, type_alloc, idr,
+						access);
+
+		if (IS_ERR_OR_NULL(uobj))
+			return ERR_PTR(-ENOENT);
+	}
+
+	return uobj;
+}
+
+struct ib_uobject *uverbs_get_uobject_from_context(const struct uverbs_type_alloc_action *type_alloc,
+						   struct ib_ucontext *ucontext,
+						   enum uverbs_idr_access access,
+						   unsigned int id)
+{
+	if (type_alloc->type == UVERBS_ATTR_TYPE_IDR)
+		return uverbs_get_uobject_from_idr(type_alloc, ucontext, access,
+						   id);
+	else
+		return ERR_PTR(-ENOENT);
+}
+
+static void uverbs_uobject_add(struct ib_uobject *uobject)
+{
+	mutex_lock(&uobject->context->uobjects_lock->lock);
+	list_add(&uobject->list, &uobject->context->uobjects);
+	mutex_unlock(&uobject->context->uobjects_lock->lock);
+}
+
+static void uverbs_uobject_remove(struct ib_uobject *uobject)
+{
+	/*
+	 * Calling remove requires exclusive access, so it's not possible
+	 * another thread will use our object since the function is called
+	 * with exclusive access.
+	 */
+	uverbs_idr_remove_uobj(uobject);
+	mutex_lock(&uobject->context->uobjects_lock->lock);
+	list_del(&uobject->list);
+	mutex_unlock(&uobject->context->uobjects_lock->lock);
+	put_uobj(uobject);
+}
+
+static void uverbs_finalize_idr(struct ib_uobject *uobj,
+				enum uverbs_idr_access access,
+				bool commit)
+{
+	switch (access) {
+	case UVERBS_ACCESS_READ:
+		up_read(&uobj->usecnt);
+		break;
+	case UVERBS_ACCESS_NEW:
+		if (commit) {
+			uverbs_uobject_add(uobj);
+			spin_lock(&uobj->context->device->idr_lock);
+			/*
+			 * We already allocated this IDR with a NULL object, so
+			 * this shouldn't fail.
+			 */
+			WARN_ON(idr_replace(&uobj->context->device->idr,
+					    uobj, uobj->id));
+			spin_unlock(&uobj->context->device->idr_lock);
+		} else {
+			uverbs_idr_remove_uobj(uobj);
+			put_uobj(uobj);
+		}
+		break;
+	case UVERBS_ACCESS_WRITE:
+		up_write(&uobj->usecnt);
+		break;
+	case UVERBS_ACCESS_DESTROY:
+		if (commit)
+			uverbs_uobject_remove(uobj);
+		else
+			up_write(&uobj->usecnt);
+		break;
+	}
+}
+
+void uverbs_finalize_object(struct ib_uobject *uobj,
+			    enum uverbs_idr_access access,
+			    bool commit)
+{
+	if (uobj->type->type == UVERBS_ATTR_TYPE_IDR)
+		uverbs_finalize_idr(uobj, access, commit);
+	else
+		WARN_ON(true);
+}
diff --git a/drivers/infiniband/core/rdma_core.h b/drivers/infiniband/core/rdma_core.h
new file mode 100644
index 0000000..0142573
--- /dev/null
+++ b/drivers/infiniband/core/rdma_core.h
@@ -0,0 +1,61 @@ 
+/*
+ * Copyright (c) 2005 Topspin Communications.  All rights reserved.
+ * Copyright (c) 2005, 2006 Cisco Systems.  All rights reserved.
+ * Copyright (c) 2005-2016 Mellanox Technologies. All rights reserved.
+ * Copyright (c) 2005 Voltaire, Inc. All rights reserved.
+ * Copyright (c) 2005 PathScale, Inc. All rights reserved.
+ *
+ * This software is available to you under a choice of one of two
+ * licenses.  You may choose to be licensed under the terms of the GNU
+ * General Public License (GPL) Version 2, available from the file
+ * COPYING in the main directory of this source tree, or the
+ * OpenIB.org BSD license below:
+ *
+ *     Redistribution and use in source and binary forms, with or
+ *     without modification, are permitted provided that the following
+ *     conditions are met:
+ *
+ *      - Redistributions of source code must retain the above
+ *        copyright notice, this list of conditions and the following
+ *        disclaimer.
+ *
+ *      - Redistributions in binary form must reproduce the above
+ *        copyright notice, this list of conditions and the following
+ *        disclaimer in the documentation and/or other materials
+ *        provided with the distribution.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ */
+
+#ifndef RDMA_CORE_H
+#define RDMA_CORE_H
+
+#include <linux/idr.h>
+#include <rdma/uverbs_ioctl.h>
+#include <rdma/ib_verbs.h>
+#include <linux/mutex.h>
+
+/*
+ * Get an ib_uobject that corresponds to the given id from ucontext, assuming
+ * the object is from the given type. Lock it to the required access.
+ * This function could create (access == NEW) or destroy (access == DESTROY)
+ * objects if required. The action will be finalized only when
+ * uverbs_finalize_object is called.
+ */
+struct ib_uobject *uverbs_get_uobject_from_context(const struct uverbs_type_alloc_action *type_alloc,
+						   struct ib_ucontext *ucontext,
+						   enum uverbs_idr_access access,
+						   unsigned int id);
+
+void uverbs_finalize_object(struct ib_uobject *uobj,
+			    enum uverbs_idr_access access,
+			    bool success);
+
+#endif /* RDMA_CORE_H */
diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
index cad2c00..47f560d 100644
--- a/include/rdma/ib_verbs.h
+++ b/include/rdma/ib_verbs.h
@@ -1331,6 +1331,8 @@  struct ib_fmr_attr {
 
 struct ib_umem;
 
+struct ib_ucontext_lock;
+
 struct ib_ucontext {
 	struct ib_device       *device;
 	struct list_head	pd_list;
@@ -1346,6 +1348,10 @@  struct ib_ucontext {
 	struct list_head	rwq_ind_tbl_list;
 	int			closing;
 
+	/* lock for uobjects list */
+	struct ib_ucontext_lock	*uobjects_lock;
+	struct list_head	uobjects;
+
 	struct pid             *tgid;
 #ifdef CONFIG_INFINIBAND_ON_DEMAND_PAGING
 	struct rb_root      umem_tree;
@@ -1373,8 +1379,11 @@  struct ib_uobject {
 	int			id;		/* index into kernel idr */
 	struct kref		ref;
 	struct rw_semaphore	mutex;		/* protects .live */
+	struct rw_semaphore	usecnt;		/* protects exclusive access */
 	struct rcu_head		rcu;		/* kfree_rcu() overhead */
 	int			live;
+
+	const struct uverbs_type_alloc_action *type;
 };
 
 struct ib_udata {
diff --git a/include/rdma/uverbs_ioctl.h b/include/rdma/uverbs_ioctl.h
new file mode 100644
index 0000000..903f6b3
--- /dev/null
+++ b/include/rdma/uverbs_ioctl.h
@@ -0,0 +1,60 @@ 
+/*
+ * Copyright (c) 2016, Mellanox Technologies inc.  All rights reserved.
+ *
+ * This software is available to you under a choice of one of two
+ * licenses.  You may choose to be licensed under the terms of the GNU
+ * General Public License (GPL) Version 2, available from the file
+ * COPYING in the main directory of this source tree, or the
+ * OpenIB.org BSD license below:
+ *
+ *     Redistribution and use in source and binary forms, with or
+ *     without modification, are permitted provided that the following
+ *     conditions are met:
+ *
+ *      - Redistributions of source code must retain the above
+ *        copyright notice, this list of conditions and the following
+ *        disclaimer.
+ *
+ *      - Redistributions in binary form must reproduce the above
+ *        copyright notice, this list of conditions and the following
+ *        disclaimer in the documentation and/or other materials
+ *        provided with the distribution.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ */
+
+#ifndef _UVERBS_IOCTL_
+#define _UVERBS_IOCTL_
+
+#include <linux/kernel.h>
+
+enum uverbs_attr_type {
+	UVERBS_ATTR_TYPE_IDR,
+};
+
+enum uverbs_idr_access {
+	UVERBS_ACCESS_READ,
+	UVERBS_ACCESS_WRITE,
+	UVERBS_ACCESS_NEW,
+	UVERBS_ACCESS_DESTROY
+};
+
+struct uverbs_type_alloc_action;
+typedef void (*free_type)(const struct uverbs_type_alloc_action *uobject_type,
+			  struct ib_uobject *uobject);
+
+struct uverbs_type_alloc_action {
+	enum uverbs_attr_type		type;
+	int				order;
+	size_t				obj_size;
+	free_type			free_fn;
+};
+
+#endif