
[V1,for-next,2/7] IB/core: Add support for idr types

Message ID 1485952745-58476-3-git-send-email-matanb@mellanox.com (mailing list archive)
State Superseded

Commit Message

Matan Barak Feb. 1, 2017, 12:39 p.m. UTC
The new ioctl infrastructure supports driver-specific objects.
Each such object type has a hot-unplug function, an allocation size
and a destruction order.

When a ucontext is created, a new list is created in this ib_ucontext.
This list contains all objects created under this ib_ucontext.
When an ib_ucontext is destroyed, we traverse this list several times,
destroying the various objects in the order given in the object
type description. If several object types share the same destruction
order, they are destroyed in reverse order of their creation.

Adding an object is done in two parts.
First, an object is allocated and added to the idr tree. Then, the
command's handlers (in downstream patches) can work on this object
and fill in its required details.
After a successful command, the commit part is called and the user
objects become visible to the ucontext. If the handler failed,
alloc_abort should be called.

Removing a uobject is done by calling lookup_get with the write flag
set and finalizing it with destroy_commit.

We should make sure the idr (per-uverbs-file) and the list
(per-ucontext) can be accessed concurrently without corrupting them.
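A rough userspace sketch of the two-phase flow described above (purely illustrative: all names are hypothetical, and a fixed-size array stands in for the idr tree; this is not the kernel implementation):

```c
#include <assert.h>
#include <stdlib.h>

#define MAX_OBJS 8

/* The fixed-size 'repo' array stands in for the kernel idr, and the
 * ucontext object list is modeled as a plain array. */
struct uobject;

struct ucontext {
	struct uobject *uobjects[MAX_OBJS];	/* objects visible to users */
	int nr;
};

struct uobject {
	int id;			/* index into 'repo' (the "idr") */
	struct ucontext *ctx;
};

static struct uobject *repo[MAX_OBJS];

/* Phase 1: allocate and reserve an id; not yet visible on the context */
static struct uobject *alloc_begin(struct ucontext *ctx)
{
	for (int id = 0; id < MAX_OBJS; id++) {
		if (!repo[id]) {
			struct uobject *u = calloc(1, sizeof(*u));

			if (!u)
				return NULL;
			u->id = id;
			u->ctx = ctx;
			repo[id] = u;
			return u;
		}
	}
	return NULL;
}

/* Phase 2a: the handler succeeded; publish the object on the context */
static void alloc_commit(struct uobject *u)
{
	u->ctx->uobjects[u->ctx->nr++] = u;
}

/* Phase 2b: the handler failed; drop the reservation and free */
static void alloc_abort(struct uobject *u)
{
	repo[u->id] = NULL;
	free(u);
}
```

The point being modeled: alloc_begin reserves the handle without making the object reachable from the ucontext; only alloc_commit publishes it.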

Signed-off-by: Matan Barak <matanb@mellanox.com>
Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
---
 drivers/infiniband/core/Makefile    |   3 +-
 drivers/infiniband/core/rdma_core.c | 277 ++++++++++++++++++++++++++++++++++++
 drivers/infiniband/core/rdma_core.h |  55 +++++++
 include/rdma/ib_verbs.h             |   7 +
 include/rdma/uverbs_types.h         | 104 ++++++++++++++
 5 files changed, 445 insertions(+), 1 deletion(-)
 create mode 100644 drivers/infiniband/core/rdma_core.c
 create mode 100644 drivers/infiniband/core/rdma_core.h
 create mode 100644 include/rdma/uverbs_types.h

Comments

Jason Gunthorpe Feb. 10, 2017, 7:56 p.m. UTC | #1
On Wed, Feb 01, 2017 at 02:39:00PM +0200, Matan Barak wrote:

> +static int uverbs_lock_object(struct ib_uobject *uobj, bool write)
> +{
> +	if (!write)
> +		return down_read_trylock(&uobj->currently_used) == 1 ? 0 :
> +			-EBUSY;
> +
> +	/* lock is either WRITE or DESTROY - should be exclusive */
> +	return down_write_trylock(&uobj->currently_used) == 1 ? 0 : -EBUSY;

Is currently_used ever used as an actual blocking rwsem? Looks like
no to me right now, is that the long term plan?

It actually seems to be important that this not be blocking, so it
might be better to code this as a simple atomic.
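A minimal userspace sketch of what such an atomic-based trylock could look like (a hypothetical model, not the kernel code: a counter where 0 means free, positive values count readers, and -1 means exclusively locked):

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical model of a non-blocking per-object lock. Every
 * acquisition is a trylock, so nothing can ever sleep on the object. */
struct uobj_lock {
	atomic_int usecnt;
};

static bool uobj_try_lock(struct uobj_lock *l, bool exclusive)
{
	if (exclusive) {
		int expected = 0;

		/* succeeds only when there are no readers and no writer */
		return atomic_compare_exchange_strong(&l->usecnt,
						      &expected, -1);
	}

	int cur = atomic_load(&l->usecnt);

	while (cur >= 0) {
		/* add one reader unless a writer sneaks in */
		if (atomic_compare_exchange_weak(&l->usecnt, &cur, cur + 1))
			return true;
		/* cur was reloaded by the failed CAS; loop re-checks it */
	}
	return false;	/* exclusively locked; would be -EBUSY */
}

static void uobj_unlock(struct uobj_lock *l, bool exclusive)
{
	if (exclusive)
		atomic_store(&l->usecnt, 0);
	else
		atomic_fetch_sub(&l->usecnt, 1);
}
```

Since no path ever blocks, the lock is safe to take in contexts where sleeping is not allowed, which is the property that matters here.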

And lets call it uverbs_try_lock_object() for clarity.

> +static void uverbs_uobject_put_ref(struct kref *ref)
> +{
> +	struct ib_uobject *uobj =
> +		container_of(ref, struct ib_uobject, ref);
> +
> +	uobj->type->ops->release(uobj);
> +}

This sort of approach starts to become very dangerous when you
contemplate having 'release' be a function in a module. Is that the
case?

But, broadly speaking, I expect to see kfree() in a release function,
and from what I can tell the only issue here is that the IDR sometimes
needs kfree_rcu.

So lets drop the function pointer and just add a 1 bit flag:

if (uobj->needs_rcu)
   kfree_rcu(uobj);
else
   kfree(uobj);

This is clearer; don't overdesign things at this point, because it
becomes hard to see down the road if things are done wrong (e.g.
putting release in a different module)

> +static void init_uobj(struct ib_uobject *uobj, struct ib_ucontext *context,
> +		      const struct uverbs_obj_type *type)
> +{
> +	/*
> +	 * user_handle should be filled by the handler,
> +	 * The object is added to the list in the commit stage.
> +	 */
> +	init_rwsem(&uobj->currently_used);
> +	uobj->context     = context;
> +	uobj->type	  = type;

.. and can you please not add the bogus whitespace in new commits?
That is really not the typical kernel style and makes everything
hard to maintain and read.

> +	kref_init(&uobj->ref);
> +}

Lets call this alloc_uobj, and just use kzalloc(type->obj_size) to do
it.

Due to the kref and the above comment about release it is not a good
idea to have a type specific allocation, we always want kalloc'd
memory here.

> +static void uverbs_idr_remove_uobj(struct ib_uobject *uobj)
> +{
> +	spin_lock(&uobj->context->ufile->idr_lock);
> +	idr_remove(&uobj->context->ufile->idr, uobj->id);
> +	spin_unlock(&uobj->context->ufile->idr_lock);
> +}

Add a clarifying comment

/* The caller must uverbs_uobject_put() uobj */

> +	rcu_read_lock();
> +	/* object won't be released as we're protected in rcu */
> +	uobj = idr_find(&ucontext->ufile->idr, id);

> +	if (!uobj) {
> +		uobj = ERR_PTR(-ENOENT);
> +		goto free;
> +	}
> +
> +	if (uobj->type != type) {
> +		uobj = ERR_PTR(-EINVAL);
> +		goto free;
> +	}
> +
> +	ret = uverbs_lock_object(uobj, write);

Hum. This RCU is a bit exciting.. So this relies on the write lock
being held whenever uverbs_idr_remove is called. Can you add a
LOCKDEP-style assertion to uverbs_idr_remove to prove that?

> +static struct ib_uobject *alloc_begin_idr_uobject(const struct uverbs_obj_type *type,
> +						  struct ib_ucontext *ucontext)
> +{
> +	int ret;
> +	const struct uverbs_obj_idr_type *idr_type =
> +		container_of(type, struct uverbs_obj_idr_type, type);
> +	struct ib_uobject *uobj = kmalloc(idr_type->obj_size, GFP_KERNEL);

> +
> +	if (!uobj)
> +		return ERR_PTR(-ENOMEM);

This would be the thing to move to alloc_uobj. Ditto for fd.

> +	init_uobj(uobj, ucontext, type);
> +	ret = idr_add_uobj(uobj);
> +	if (ret) {
> +		kfree(uobj);

This should be a uverbs_uobject_put()

> +static void uverbs_uobject_add(struct ib_uobject *uobject)
> +{
> +	mutex_lock(&uobject->context->lock);
> +	list_add(&uobject->list, &uobject->context->uobjects);
> +	mutex_unlock(&uobject->context->lock);
> +}

I'd open code this into alloc_commit_idr_uobject. The only place the
list should be updated is directly after the object has been placed
in the IDR; it is confusing to see it in a function as though it can
be called from someplace else

> +static void _put_uobj_ref(struct kref *ref)
> +{
> +	kfree(container_of(ref, struct ib_uobject, ref));
> +}
> +
> +static void alloc_abort_idr_uobject(struct ib_uobject *uobj)
> +{
> +	uverbs_idr_remove_uobj(uobj);
> +	/*
> +	 * we don't need kfree_rcu here, as the uobject wasn't exposed to any
> +	 * other verb.
> +	 */
> +	kref_put(&uobj->ref, _put_uobj_ref);
> +}

Once needs_rcu is added then this ugly stuff goes away. Set needs_rcu
only when the uobject has been added to the IDR.

> +static void destroy_commit_idr_uobject(struct ib_uobject *uobj)

This is a confusing name.. See below..

> +{
> +	uverbs_idr_remove_uobj(uobj);
> +	mutex_lock(&uobj->context->lock);
> +	list_del(&uobj->list);
> +	mutex_unlock(&uobj->context->lock);
> +	uverbs_uobject_put(uobj);
> +}

And this flow is weird: hot_unplug calls an idr_type op, but this
does not? Why?

> +void uverbs_cleanup_ucontext(struct ib_ucontext *ucontext, bool device_removed)
> +{
> +	unsigned int cur_order = 0;
> +
> +	while (!list_empty(&ucontext->uobjects)) {
> +		struct ib_uobject *obj, *next_obj;
> +		unsigned int next_order = UINT_MAX;
> +
> +		/*
> +		 * This shouldn't run while executing other commands on this
> +		 * context, thus no lock is required.
> +		 */
> +		list_for_each_entry_safe(obj, next_obj, &ucontext->uobjects,
> +					 list)
> +			if (obj->type->destroy_order == cur_order) {
> +				list_del(&obj->list);
> +				obj->type->ops->hot_unplug(obj, device_removed);

Let's add a safety:

   WARN_ON(uverbs_lock_object(obj, true));

It would be really nice to see a simpler 'built-in' scheme to
guarantee that WARN_ON never hits. I expect this is relying on the
external SRCU system to make it work out..

For instance since we now have lookup_get/lookup_put could we
ultimately move the SRCU into those functions?

Also, the call to hot_unplug has to be checked

> +	/* locking the uobjects_list */
> +	struct mutex		lock;

Prefer uobjects_lock for clarity

> @@ -1374,8 +1378,11 @@ struct ib_uobject {
>  	int			id;		/* index into kernel idr */

> +
> +struct uverbs_obj_type_ops {
> +	/*
> +	 * Get an ib_uobject that corresponds to the given id from ucontext,
> +	 * These functions could create or destroy objects if required.
> +	 * The action will be finalized only when commit, abort or put fops are
> +	 * called.
> +	 * The flow of the different actions is:
> +	 * [alloc]:	 Starts with alloc_begin. The handlers logic is than
> +	 *		 executed. If the handler is successful, alloc_commit
> +	 *		 is called and the object is inserted to the repository.
> +	 *		 Otherwise, alloc_abort is called and the object is
> +	 *		 destroyed.

Add:

 Once alloc_commit completes the object is visible to other threads
 and userspace.

> +	 * [lookup]:	 Starts with lookup_get which fetches and locks the
> +	 *		 object. After the handler finished using the object, it
> +	 *		 needs to call lookup_put to unlock it. The write flag
> +	 *		 indicates if the object is locked for exclusive access.
> +	 * [destroy]:	 Starts with lookup_get with write flag set. This locks
> +	 *		 the object for exclusive access. If the handler code
> +	 *		 completed successfully, destroy_commit is called and
> +	 *		 the ib_uobject is removed from the context's uobjects
> +	 *		 repository and put. Otherwise, lookup_put should be
> +	 *		 called.

Lets call this 'remove' please.

And add:

 Once remove succeeds new krefs to the object cannot be acquired by
 other threads or userspace and the hardware driver is removed from
 the object. Other krefs on the object may still exist.

> +	 * [hot_unplug]: Used when the context is destroyed (process
> +	 *		 termination, reset flow).

I don't think we need a dedicated entry point. I think you should add
an enum argument to remove:

enum rdma_remove_reason {
 RDMA_REMOVE_DESTROY, // Userspace requested uobject deletion
 RDMA_REMOVE_CLOSE,   // Context deletion. Call cannot fail.
 RDMA_REMOVE_DRIVER_REMOVE, // Driver is being hot-unplugged. Call cannot fail.
};

This gives the drivers more information to do the right thing.

> +	 * [release]:    Releases the memory of the ib_uobject. This is called
> +	 *		 when the last reference is put. We favored a callback
> +	 *		 here as this might require tricks like kfree_rcu.
> +	 *		 This shouldn't be called explicitly by the user as it's
> +	 *		 used by uverbs_uobject_put.

Delete as discussed


> +	void (*destroy_commit)(struct ib_uobject *uobj);

Use:
   /* Must be called with the write lock held. If successful, uobj is
      invalid on return. On failure the uobject is left completely unchanged */
   int __must_check (*remove)(struct ib_uobject *uobj, enum rdma_remove_reason why);

> +struct uverbs_obj_idr_type {
> +	/*
> +	 * In idr based objects, uverbs_obj_type_ops points to a generic
> +	 * idr operations. In order to specialize the underlying types (e.g. CQ,
> +	 * QPs, etc.), we add obj_size and hot_unplug specific callbacks here.
> +	 */
> +	struct uverbs_obj_type  type;
> +	size_t			obj_size;

Move to type; do not support non-kalloc configurations

> +	/* The hot_unplug in ops initialized by idr, calls this callback */
> +	void (*hot_unplug)(struct ib_uobject *uobj);

Use:

/* Free driver resources from the uobject, make the driver uncallable,
   and move the uobject to the detached state. On failure the uobject
   is left completely unchanged. */
int __must_check (*destroy)(struct ib_uobject *uobj, enum rdma_remove_reason why);

And probably add some commentary on which objects support a detached
state. Any object that cannot be detached can only exist in the IDR
and cannot have a kref taken. Perhaps we should enforce that directly
for clarity.

Jason
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Matan Barak Feb. 15, 2017, 1:43 p.m. UTC | #2
On Fri, Feb 10, 2017 at 9:56 PM, Jason Gunthorpe
<jgunthorpe@obsidianresearch.com> wrote:
> On Wed, Feb 01, 2017 at 02:39:00PM +0200, Matan Barak wrote:
>
>> +static int uverbs_lock_object(struct ib_uobject *uobj, bool write)
>> +{
>> +     if (!write)
>> +             return down_read_trylock(&uobj->currently_used) == 1 ? 0 :
>> +                     -EBUSY;
>> +
>> +     /* lock is either WRITE or DESTROY - should be exclusive */
>> +     return down_write_trylock(&uobj->currently_used) == 1 ? 0 : -EBUSY;
>
> Is currently_used ever used as an actual blocking rwsem? Looks like
> no to me right now, is that the long term plan?
>

Yeah, it's never used for blocking and that's the long term plan.

> It actually seems to be important that this not be blocking, so it
> might be better to code this a simple atomic.
>

Actually, in the first version it was coded as an atomic and we decided
to change it to an rwsem after some comments. Anyway, no problem
changing it back to the original version (see
http://www.spinics.net/lists/linux-rdma/msg38346.html).

> And lets call it uverbs_try_lock_object() for clarity.
>

sure

>> +static void uverbs_uobject_put_ref(struct kref *ref)
>> +{
>> +     struct ib_uobject *uobj =
>> +             container_of(ref, struct ib_uobject, ref);
>> +
>> +     uobj->type->ops->release(uobj);
>> +}
>
> This sort of approach starts to become very dangerous when you
> contemplate having 'release' be a function in a module. Is that the
> case?
>

Well, I expect these core meta-types (e.g. idr, fd) to be declared in ib_uverbs
or ib_core.

> But, broadly speaking, I expect to see kfree() in a release function,
> and from what I can tell the only issue here is that the IDR sometimes
> needs kfree_rcu.
>
> So lets drop the function pointer and just add a 1 bit flag:
>
> if (uobj->needs_rcu)
>    kfree_rcu(uobj);
> else
>    kfree(uobj);
>
> This is more clear, don't overdesign things at this point because it
> becomes hard to see if down the road things are done wrong (eg putting
> release in a different module)
>

Ok, we could use that instead of an explicit release function.

>> +static void init_uobj(struct ib_uobject *uobj, struct ib_ucontext *context,
>> +                   const struct uverbs_obj_type *type)
>> +{
>> +     /*
>> +      * user_handle should be filled by the handler,
>> +      * The object is added to the list in the commit stage.
>> +      */
>> +     init_rwsem(&uobj->currently_used);
>> +     uobj->context     = context;
>> +     uobj->type        = type;
>
> .. and can you please not add the bogus whitespace in new commits
> please? That is really not the typical kernel style and makes
> everything hard to maintain and read.
>

Which bogus whitespace? This is from the kernel coding style doc:
The preferred style for long (multi-line) comments is:

.. code-block:: c

        /*
         * This is the preferred style for multi-line
         * comments in the Linux kernel source code.
         * Please use it consistently.
         *
         * Description:  A column of asterisks on the left side,
         * with beginning and ending almost-blank lines.
         */

Only the net subsystem starts the comment from the first line.

>> +     kref_init(&uobj->ref);
>> +}
>
> Lets call this alloc_uobj, and just use kzalloc(type->obj_size) to do
> it.
>
> Due to the kref and the above comment about release it is not a good
> idea to have a type specific allocation, we always want kalloc'd
> memory here.
>

I can move that. I don't think we really need kzalloc, as all fields
are initialized anyway.

>> +static void uverbs_idr_remove_uobj(struct ib_uobject *uobj)
>> +{
>> +     spin_lock(&uobj->context->ufile->idr_lock);
>> +     idr_remove(&uobj->context->ufile->idr, uobj->id);
>> +     spin_unlock(&uobj->context->ufile->idr_lock);
>> +}
>
> Add a clarifying comment
>
> /* The caller must uverbs_uobject_put() uobj */
>

It could actually call kfree directly if the object is guaranteed not
to be used, but this is a micro-optimization that we probably
shouldn't really care about.
I prefer to put needs_rcu on the type and not on each uobject.

>> +     rcu_read_lock();
>> +     /* object won't be released as we're protected in rcu */
>> +     uobj = idr_find(&ucontext->ufile->idr, id);
>
>> +     if (!uobj) {
>> +             uobj = ERR_PTR(-ENOENT);
>> +             goto free;
>> +     }
>> +
>> +     if (uobj->type != type) {
>> +             uobj = ERR_PTR(-EINVAL);
>> +             goto free;
>> +     }
>> +
>> +     ret = uverbs_lock_object(uobj, write);
>
> Hum. This RCU is a bit exciting.. So this relies on the write lock
> being held whenever uverbs_idr_remove is called. Can you add a
> LOCKDEP style of of assertion to uverbs_idr_remove to prove that?
>

If we remove the lock and use atomics instead, there's nothing for
lockdep to check, right?

>> +static struct ib_uobject *alloc_begin_idr_uobject(const struct uverbs_obj_type *type,
>> +                                               struct ib_ucontext *ucontext)
>> +{
>> +     int ret;
>> +     const struct uverbs_obj_idr_type *idr_type =
>> +             container_of(type, struct uverbs_obj_idr_type, type);
>> +     struct ib_uobject *uobj = kmalloc(idr_type->obj_size, GFP_KERNEL);
>
>> +
>> +     if (!uobj)
>> +             return ERR_PTR(-ENOMEM);
>
> This would be the thing to move to alloc_uobj. Ditto for fd.
>

Ok

>> +     init_uobj(uobj, ucontext, type);
>> +     ret = idr_add_uobj(uobj);
>> +     if (ret) {
>> +             kfree(uobj);
>
> This should be a uverbs_uobject_put()
>

It'll just postpone this to rcu, but I guess this doesn't really matter.

>> +static void uverbs_uobject_add(struct ib_uobject *uobject)
>> +{
>> +     mutex_lock(&uobject->context->lock);
>> +     list_add(&uobject->list, &uobject->context->uobjects);
>> +     mutex_unlock(&uobject->context->lock);
>> +}
>
> I'd open code this into alloc_commit_idr_uobject. The only place the
> list should be updated is directly after it has been placed in the
> IDR, it is confusing to see it in a function as though it can be
> called from someplace else
>

It'll be later used in the fds path, so I prefer to leave that as is.

>> +static void _put_uobj_ref(struct kref *ref)
>> +{
>> +     kfree(container_of(ref, struct ib_uobject, ref));
>> +}
>> +
>> +static void alloc_abort_idr_uobject(struct ib_uobject *uobj)
>> +{
>> +     uverbs_idr_remove_uobj(uobj);
>> +     /*
>> +      * we don't need kfree_rcu here, as the uobject wasn't exposed to any
>> +      * other verb.
>> +      */
>> +     kref_put(&uobj->ref, _put_uobj_ref);
>> +}
>
> Once needs_rcu is added then this ugly stuff goes away. Set needs_rcu
> only when the uboject has been added to the IDR.
>

We could do that but needs_rcu is actually a type feature.

>> +static void destroy_commit_idr_uobject(struct ib_uobject *uobj)
>
> This is a confusing name.. See below..
>
>> +{
>> +     uverbs_idr_remove_uobj(uobj);
>> +     mutex_lock(&uobj->context->lock);
>> +     list_del(&uobj->list);
>> +     mutex_unlock(&uobj->context->lock);
>> +     uverbs_uobject_put(uobj);
>> +}
>
> And this flow is weird, hot_unplug calls an idr_type op, but this does
> not? Why?
>

In hot unplug or context removal, we need to clean up the object. In
regular destruction, the handler should take care of destroying the
actual object. It's more symmetrical this way (the handler created the
object and thus it should destroy it), and it lets the handler decide
what to do if the destruction failed, or if the destruction succeeded
but copy_to_user failed.
Scenarios like this wouldn't be supported otherwise:

ret = ib_destroy_xxxx(uobj_xxxx->object, &resp);
if (!ret) {
    copy_to_user(user_resp, &resp, sizeof(resp));
    destroy_commit(uobj_xxxx);
}

>> +void uverbs_cleanup_ucontext(struct ib_ucontext *ucontext, bool device_removed)
>> +{
>> +     unsigned int cur_order = 0;
>> +
>> +     while (!list_empty(&ucontext->uobjects)) {
>> +             struct ib_uobject *obj, *next_obj;
>> +             unsigned int next_order = UINT_MAX;
>> +
>> +             /*
>> +              * This shouldn't run while executing other commands on this
>> +              * context, thus no lock is required.
>> +              */
>> +             list_for_each_entry_safe(obj, next_obj, &ucontext->uobjects,
>> +                                      list)
>> +                     if (obj->type->destroy_order == cur_order) {
>> +                             list_del(&obj->list);
>> +                             obj->type->ops->hot_unplug(obj, device_removed);
>
> Let's add a safety:
>
>    WARN_ON(uverbs_lock_object(obj, true));
>

Ok

> It would be really nice to see a simpler 'built-in' scheme to
> guarentee that WARN_ON never hits. I expect this is relying on the
> external SRCU system to make it work out..
>
> For instance since we now have lookup_get/lookup_put could we
> ultimately move the SRCU into those functions?
>
> Also, the call to hot_unplug has to be checked
>

We assume all handlers finished executing and new handlers won't
execute anymore. It's required here that no handlers are executing,
and the SRCU guarantees that. We could add another such mechanism or
move it, but it would just complicate the code. SRCU protects the
command handling path, not just object fetching and locking.

>> +     /* locking the uobjects_list */
>> +     struct mutex            lock;
>
> Prefer uobjects_lock for clarity
>

Ok

>> @@ -1374,8 +1378,11 @@ struct ib_uobject {
>>       int                     id;             /* index into kernel idr */
>
>> +
>> +struct uverbs_obj_type_ops {
>> +     /*
>> +      * Get an ib_uobject that corresponds to the given id from ucontext,
>> +      * These functions could create or destroy objects if required.
>> +      * The action will be finalized only when commit, abort or put fops are
>> +      * called.
>> +      * The flow of the different actions is:
>> +      * [alloc]:      Starts with alloc_begin. The handlers logic is than
>> +      *               executed. If the handler is successful, alloc_commit
>> +      *               is called and the object is inserted to the repository.
>> +      *               Otherwise, alloc_abort is called and the object is
>> +      *               destroyed.
>> +      * [lookup]:     Starts with lookup_get which fetches and locks the
>> +      *               object. After the handler finished using the object, it
>> +      *               needs to call lookup_put to unlock it. The write flag
>> +      *               indicates if the object is locked for exclusive access.
>> +      * [destroy]:    Starts with lookup_get with write flag set. This locks
>> +      *               the object for exclusive access. If the handler code
>> +      *               completed successfully, destroy_commit is called and
>> +      *               the ib_uobject is removed from the context's uobjects
>> +      *               repository and put. Otherwise, lookup_put should be
>> +      *               called.
>
> Lets call this 'remove' please.
>
> And add:
>
>  Once remove succeeds new krefs to the object cannot be acquired by
>  other threads or userspace and the hardware driver is removed from
>  the object. Other krefs on the object may still exist.
>

Ok

>> +      * [hot_unplug]: Used when the context is destroyed (process
>> +      *               termination, reset flow).
>
> I don't think we need a dedicated entry point. I think you should add
> an enum argument to remove:
>
> enum rdma_remove_reason {
>  RDMA_REMOVE_DESTROY, // Userspace requested uobject deletion
>  RDMA_REMOVE_CLOSE,   // Context deletion. Call cannot fail.
>  RDMA_REMOVE_DRIVER_REMOVE, // Driver is being hot-unplugged. Call cannot fail.
> };
>

The last two differ vastly from the first one. The first one only
deletes the uobject and assumes that the handler itself deleted the
actual object. The last two are invoked when the context is destroyed
and we need to delete both the objects and the uobjects. Therefore, I
can rename destroy_commit to remove_commit and hot_unplug to cleanup.
Cleanup could get the reason.
None of these calls could fail.

> This gives the drivers a more information to the right thing to do.
>
>> +      * [release]:    Releases the memory of the ib_uobject. This is called
>> +      *               when the last reference is put. We favored a callback
>> +      *               here as this might require tricks like kfree_rcu.
>> +      *               This shouldn't be called explicitly by the user as it's
>> +      *               used by uverbs_uobject_put.
>
> Delete as discussed
>
>

Ok

>> +     void (*destroy_commit)(struct ib_uobject *uobj);
>
> Use:
>    /* Must be called with the write lock held. If successful uboj is
>       invalid on return. On failure uobject is left completely unchanged */
>    int __must_check (*remove)(struct ib_uobject *uobj,enum rdma_remove_reason why);
>

I don't think we need to check the result. It just can't fail.

>> +struct uverbs_obj_idr_type {
>> +     /*
>> +      * In idr based objects, uverbs_obj_type_ops points to a generic
>> +      * idr operations. In order to specialize the underlying types (e.g. CQ,
>> +      * QPs, etc.), we add obj_size and hot_unplug specific callbacks here.
>> +      */
>> +     struct uverbs_obj_type  type;
>> +     size_t                  obj_size;
>
> Move to type, do not support no kalloc configurations
>

Ok, I guess this should be enough for the expected types.

>> +     /* The hot_unplug in ops initialized by idr, calls this callback */
>> +     void (*hot_unplug)(struct ib_uobject *uobj);
>
> Use:
>
> /* Free driver resources from the uobject, make the driver uncallable,
>    and move the uobject to the detached state. On failure the uboject
>    is left completely unchanged. */
> int __must_check (*destroy)(struct ib_uobject *uobj,enum rdma_remove_reason why);
>

As above, this can't fail.

> And probably add some commentary what objects support a detached
> state. Any object that cannot be detached can only exist in the IDR
> and cannot have a kref taken. Perhaps we should enforce that directly
> for clarity.
>

Actually, I don't see why we can't support detached IDR objects...
As long as you don't execute commands on this IDR object (for example,
removing it from the objects list), you could use the memory as long
as you want after calling uverbs_uobject_get (until you call
uverbs_uobject_put).

> Jason

Thanks for the review

Matan

Jason Gunthorpe Feb. 15, 2017, 6:06 p.m. UTC | #3
On Wed, Feb 15, 2017 at 03:43:31PM +0200, Matan Barak wrote:
> On Fri, Feb 10, 2017 at 9:56 PM, Jason Gunthorpe
> <jgunthorpe@obsidianresearch.com> wrote:
> > On Wed, Feb 01, 2017 at 02:39:00PM +0200, Matan Barak wrote:
> >
> >> +static int uverbs_lock_object(struct ib_uobject *uobj, bool write)
> >> +{
> >> +     if (!write)
> >> +             return down_read_trylock(&uobj->currently_used) == 1 ? 0 :
> >> +                     -EBUSY;
> >> +
> >> +     /* lock is either WRITE or DESTROY - should be exclusive */
> >> +     return down_write_trylock(&uobj->currently_used) == 1 ? 0 : -EBUSY;
> >
> > Is currently_used ever used as an actual blocking rwsem? Looks like
> > no to me right now, is that the long term plan?
> >
> 
> Yeah, it's never used for blocking and that's the long term plan.
> 
> > It actually seems to be important that this not be blocking, so it
> > might be better to code this a simple atomic.
> >
> 
> Actually, in the first version it was coded as an atomic and we decided
> to change it to rwsem after some comments. Anyway, no problem
> changing it back to the original version (see
> http://www.spinics.net/lists/linux-rdma/msg38346.html).

Yes, in general I don't like seeing open coded locks, but in this case
the non-blocking behaviour is actually an important property, so it
may be clearer; still, the atomic is a messy construct as well.

Maybe a comment on currently_used saying that the rwsem may never be
used for blocking is enough?

> > This sort of approach starts to become very dangerous when you
> > contemplate having 'release' be a function in a module. Is that the
> > case?
> >
> 
> Well, I expect these core meta-types (e.g. idr, fd) to be declared in ib_uverbs
> or ib_core.

So that is two modules and it starts to become tricky..

> >> +     init_rwsem(&uobj->currently_used);
> >> +     uobj->context     = context;
> >> +     uobj->type        = type;
> >
> > .. and can you please not add the bogus whitespace in new commits
> > please? That is really not the typical kernel style and makes
> > everything hard to maintain and read.
> 
> Which bogus whitespace?

The stuff before the = to 'line up' the rhs of the assignment.

> >> +static void uverbs_idr_remove_uobj(struct ib_uobject *uobj)
> >> +{
> >> +     spin_lock(&uobj->context->ufile->idr_lock);
> >> +     idr_remove(&uobj->context->ufile->idr, uobj->id);
> >> +     spin_unlock(&uobj->context->ufile->idr_lock);
> >> +}
> >
> > Add a clarifying comment
> >
> > /* The caller must uverbs_uobject_put() uobj */
> 
> It could actually call kfree directly if the object is guaranteed not
> to be used, but
> this is a micro optimization that we probably shouldn't really care about.
> I prefer to put needs_rcu on the type and not on each uobject.

If skipping the kfree_rcu is important then needs_rcu should be
per-object; otherwise per-type is probably fine.


> > Hum. This RCU is a bit exciting.. So this relies on the write lock
> > being held whenever uverbs_idr_remove is called. Can you add a
> > LOCKDEP style of of assertion to uverbs_idr_remove to prove that?
> 
> if we remove the lock and use atomics instead, there's nothing to do
> lockdep on, right?

Well, lockdep-like: if it is an atomic, then do an atomic test
protected by CONFIG_LOCKDEP or some other such approach

> >> +     init_uobj(uobj, ucontext, type);
> >> +     ret = idr_add_uobj(uobj);
> >> +     if (ret) {
> >> +             kfree(uobj);
> >
> > This should be a uverbs_uobject_put()
> >
> 
> It'll just postpone this to rcu, but I guess this doesn't really matter.

Idiomatically, once the kref is inited then all frees must go through
kref_put.

> >> +static void _put_uobj_ref(struct kref *ref)
> >> +{
> >> +     kfree(container_of(ref, struct ib_uobject, ref));
> >> +}
> >> +
> >> +static void alloc_abort_idr_uobject(struct ib_uobject *uobj)
> >> +{
> >> +     uverbs_idr_remove_uobj(uobj);
> >> +     /*
> >> +      * we don't need kfree_rcu here, as the uobject wasn't exposed to any
> >> +      * other verb.
> >> +      */
> >> +     kref_put(&uobj->ref, _put_uobj_ref);
> >> +}
> >
> > Once needs_rcu is added then this ugly stuff goes away. Set needs_rcu
> > only when the uboject has been added to the IDR.
> 
> We could do that but needs_rcu is actually a type feature.

Either way this stuff goes away.

> >> +{
> >> +     uverbs_idr_remove_uobj(uobj);
> >> +     mutex_lock(&uobj->context->lock);
> >> +     list_del(&uobj->list);
> >> +     mutex_unlock(&uobj->context->lock);
> >> +     uverbs_uobject_put(uobj);
> >> +}
> >
> > And this flow is weird, hot_unplug calls an idr_type op, but this does
> > not? Why?
> >
> 
> In hot unplug or context removal, we need to cleanup the object. In regular
> destruction, the handler should take care of destructing the actual
> object.

So, I think this is a mistake. There is nothing special the handler
does compared to the hot unplug case, and it makes no sense to have
two places duplicating the call out to the driver unplug code. Having
three copies when the ioctl stuff comes along is even worse..

Having them be different is just going to create bugs down the road...

I don't mind the asymmetry because our entire model assumes complex
creation and consistent/trivial destruction.

It is not unlike the device_allocate/device_destroy/device_put sort of
flow where the allocate step has many variations but the destroy is
always uniform.

> destroy it), it lets the handler decide what to do if the
> destruction failed or if the destruction succeeded but copy_to_user
> failed.  Scenarios like this won't be supported as well:

This is why I added the return code to (*destroy) - exactly so the one
place that cares can return the error code, and all the other places
that can't be allowed to fail can have common recovery handling in the
core code.

Too many of our destroy calls toward the driver can fail, we need
this to be handled in one place, not sprinkled throughout the code.

> >> +      * [hot_unplug]: Used when the context is destroyed (process
> >> +      *               termination, reset flow).
> >
> > I don't think we need a dedicated entry point. I think you should add
> > an enum argument to remove:
> >
> > enum rdma_remove_reason {
> >  RDMA_REMOVE_DESTROY, // Userspace requested uobject deletion
> >  RDMA_REMOVE_CLOSE,   // Context deletion. Call cannot fail.
> >  RDMA_REMOVE_DRIVER_REMOVE, // Driver is being hot-unplugged. Call cannot fail.
> > };
> >
> 
> The last two differs vastly from the first one. The first one only
> deletes the uobject, assumes

No, that isn't my proposal. The idea is that all three do the same
thing except that:
 - RDMA_REMOVE_DESTROY the driver can return an error code if it
   wishes, the code is returned to userspace
 - RDMA_REMOVE_CLOSE the driver cannot return an error code and must
   force-destroy the object
 - RDMA_REMOVE_DRIVER_REMOVE the driver cannot return an error code,
   must force-destroy the object, and must leave behind any stuff
   to let userspace keep working (eg a zero page mmap, or whatever)

All flows will call the driver destroy function.

The approach is that the one or two call sites that use
RDMA_REMOVE_CLOSE/RDMA_REMOVE_DRIVER_REMOVE would also include the
WARN_ON to prove the driver is working as expected and then leave
whatever dangling ref around in some kind of controlled way.
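The control flow being proposed can be sketched as below. The enum names come from the mails above; driver_destroy() and its failure knob are invented here purely to make the single-destroy-path idea concrete.

```c
#include <assert.h>
#include <stdio.h>

enum rdma_remove_reason {
	RDMA_REMOVE_DESTROY,		/* userspace asked for deletion */
	RDMA_REMOVE_CLOSE,		/* context teardown, cannot fail */
	RDMA_REMOVE_DRIVER_REMOVE,	/* hot-unplug, cannot fail */
};

#define EBUSY_STUB 16	/* userspace stand-in for EBUSY */

static int driver_destroy_fails;	/* simulated driver error */

static int driver_destroy(void)
{
	return driver_destroy_fails ? -EBUSY_STUB : 0;
}

/* the one removal path shared by all three flows */
static int remove_uobject(enum rdma_remove_reason why)
{
	int ret = driver_destroy();

	if (ret && why != RDMA_REMOVE_DESTROY) {
		/* the kernel would WARN_ON() here: the driver was
		 * required to force-destroy and did not */
		fprintf(stderr, "driver refused a forced destroy\n");
		ret = 0;	/* swallow the error, teardown continues */
	}
	return ret;
}
```

Only the RDMA_REMOVE_DESTROY caller ever sees an error code; the teardown flows warn and press on, so the driver destroy callout lives in exactly one place.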

> None of these calls could fail.

But that isn't really true :)

> > And probably add some commentary what objects support a detached
> > state. Any object that cannot be detached can only exist in the IDR
> > and cannot have a kref taken. Perhaps we should enforce that directly
> > for clarity.
> >
> 
> Actually, I don't see why we can't support detached IDR objects....
> As long as you don't execute commands on this IDR (for example, removing
> it from the objects list), you could use the memory as long as you want
> after you called uverbs_uobject_get (until you call uverbs_uobject_put).

The question about 'detached' revolves around what the users of the
object do. Eg if the users blindly do

ib_foo_action(uboj->driver_obj)

after the driver is disconnected then they will crash. Those objects
do not support 'detach'.

You are right that in general the IDR framework is OK and doesn't
care. But I think the current state of affairs is that only some call
sites, probably the FD related ones, are actually safe for this usage.

This becomes a bit more important down the road if people want to do
things outside the ucontext - eg enumerate all of the QPs in a
process - then we need clear rules for how that continues to work.

Jason

Patch

diff --git a/drivers/infiniband/core/Makefile b/drivers/infiniband/core/Makefile
index edaae9f..1819623 100644
--- a/drivers/infiniband/core/Makefile
+++ b/drivers/infiniband/core/Makefile
@@ -28,4 +28,5 @@  ib_umad-y :=			user_mad.o
 
 ib_ucm-y :=			ucm.o
 
-ib_uverbs-y :=			uverbs_main.o uverbs_cmd.o uverbs_marshall.o
+ib_uverbs-y :=			uverbs_main.o uverbs_cmd.o uverbs_marshall.o \
+				rdma_core.o
diff --git a/drivers/infiniband/core/rdma_core.c b/drivers/infiniband/core/rdma_core.c
new file mode 100644
index 0000000..7ce4d67
--- /dev/null
+++ b/drivers/infiniband/core/rdma_core.c
@@ -0,0 +1,277 @@ 
+/*
+ * Copyright (c) 2016, Mellanox Technologies inc.  All rights reserved.
+ *
+ * This software is available to you under a choice of one of two
+ * licenses.  You may choose to be licensed under the terms of the GNU
+ * General Public License (GPL) Version 2, available from the file
+ * COPYING in the main directory of this source tree, or the
+ * OpenIB.org BSD license below:
+ *
+ *     Redistribution and use in source and binary forms, with or
+ *     without modification, are permitted provided that the following
+ *     conditions are met:
+ *
+ *      - Redistributions of source code must retain the above
+ *        copyright notice, this list of conditions and the following
+ *        disclaimer.
+ *
+ *      - Redistributions in binary form must reproduce the above
+ *        copyright notice, this list of conditions and the following
+ *        disclaimer in the documentation and/or other materials
+ *        provided with the distribution.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ */
+
+#include <linux/file.h>
+#include <linux/anon_inodes.h>
+#include <rdma/ib_verbs.h>
+#include <rdma/uverbs_types.h>
+#include "uverbs.h"
+#include "rdma_core.h"
+
+static void uverbs_uobject_put_ref(struct kref *ref)
+{
+	struct ib_uobject *uobj =
+		container_of(ref, struct ib_uobject, ref);
+
+	uobj->type->ops->release(uobj);
+}
+
+void uverbs_uobject_put(struct ib_uobject *uobj)
+{
+	kref_put(&uobj->ref, uverbs_uobject_put_ref);
+}
+
+static int uverbs_lock_object(struct ib_uobject *uobj, bool write)
+{
+	if (!write)
+		return down_read_trylock(&uobj->currently_used) == 1 ? 0 :
+			-EBUSY;
+
+	/* lock is either WRITE or DESTROY - should be exclusive */
+	return down_write_trylock(&uobj->currently_used) == 1 ? 0 : -EBUSY;
+}
+
+static void init_uobj(struct ib_uobject *uobj, struct ib_ucontext *context,
+		      const struct uverbs_obj_type *type)
+{
+	/*
+	 * user_handle should be filled by the handler.
+	 * The object is added to the list in the commit stage.
+	 */
+	init_rwsem(&uobj->currently_used);
+	uobj->context     = context;
+	uobj->type	  = type;
+	kref_init(&uobj->ref);
+}
+
+static int idr_add_uobj(struct ib_uobject *uobj)
+{
+	int ret;
+
+	idr_preload(GFP_KERNEL);
+	spin_lock(&uobj->context->ufile->idr_lock);
+
+	/*
+	 * We start with allocating an idr pointing to NULL. This represents an
+	 * object which isn't initialized yet. We'll replace it later on with
+	 * the real object once we commit.
+	 */
+	ret = idr_alloc(&uobj->context->ufile->idr, NULL, 0,
+			min_t(unsigned long, U32_MAX - 1, INT_MAX), GFP_NOWAIT);
+	if (ret >= 0)
+		uobj->id = ret;
+
+	spin_unlock(&uobj->context->ufile->idr_lock);
+	idr_preload_end();
+
+	return ret < 0 ? ret : 0;
+}
+
+static void uverbs_idr_remove_uobj(struct ib_uobject *uobj)
+{
+	spin_lock(&uobj->context->ufile->idr_lock);
+	idr_remove(&uobj->context->ufile->idr, uobj->id);
+	spin_unlock(&uobj->context->ufile->idr_lock);
+}
+
+/* Returns the ib_uobject or an error. The caller should check for IS_ERR. */
+static struct ib_uobject *lookup_get_idr_uobject(const struct uverbs_obj_type *type,
+						 struct ib_ucontext *ucontext,
+						 int id, bool write)
+{
+	struct ib_uobject *uobj;
+	int ret;
+
+	rcu_read_lock();
+	/* object won't be released as we're protected in rcu */
+	uobj = idr_find(&ucontext->ufile->idr, id);
+	if (!uobj) {
+		uobj = ERR_PTR(-ENOENT);
+		goto free;
+	}
+
+	if (uobj->type != type) {
+		uobj = ERR_PTR(-EINVAL);
+		goto free;
+	}
+
+	ret = uverbs_lock_object(uobj, write);
+	if (ret)
+		uobj = ERR_PTR(ret);
+free:
+	rcu_read_unlock();
+	return uobj;
+}
+
+static struct ib_uobject *alloc_begin_idr_uobject(const struct uverbs_obj_type *type,
+						  struct ib_ucontext *ucontext)
+{
+	int ret;
+	const struct uverbs_obj_idr_type *idr_type =
+		container_of(type, struct uverbs_obj_idr_type, type);
+	struct ib_uobject *uobj = kmalloc(idr_type->obj_size, GFP_KERNEL);
+
+	if (!uobj)
+		return ERR_PTR(-ENOMEM);
+
+	init_uobj(uobj, ucontext, type);
+	ret = idr_add_uobj(uobj);
+	if (ret) {
+		kfree(uobj);
+		return ERR_PTR(ret);
+	}
+
+	return uobj;
+}
+
+static void uverbs_uobject_add(struct ib_uobject *uobject)
+{
+	mutex_lock(&uobject->context->lock);
+	list_add(&uobject->list, &uobject->context->uobjects);
+	mutex_unlock(&uobject->context->lock);
+}
+
+static void alloc_commit_idr_uobject(struct ib_uobject *uobj)
+{
+	uverbs_uobject_add(uobj);
+	spin_lock(&uobj->context->ufile->idr_lock);
+	/*
+	 * We already allocated this IDR with a NULL object, so
+	 * this shouldn't fail.
+	 */
+	WARN_ON(idr_replace(&uobj->context->ufile->idr,
+			    uobj, uobj->id));
+	spin_unlock(&uobj->context->ufile->idr_lock);
+}
+
+static void _put_uobj_ref(struct kref *ref)
+{
+	kfree(container_of(ref, struct ib_uobject, ref));
+}
+
+static void alloc_abort_idr_uobject(struct ib_uobject *uobj)
+{
+	uverbs_idr_remove_uobj(uobj);
+	/*
+	 * we don't need kfree_rcu here, as the uobject wasn't exposed to any
+	 * other verb.
+	 */
+	kref_put(&uobj->ref, _put_uobj_ref);
+}
+
+static void lookup_put_idr_uobject(struct ib_uobject *uobj, bool write)
+{
+	if (write)
+		up_write(&uobj->currently_used);
+	else
+		up_read(&uobj->currently_used);
+}
+
+static void destroy_commit_idr_uobject(struct ib_uobject *uobj)
+{
+	uverbs_idr_remove_uobj(uobj);
+	mutex_lock(&uobj->context->lock);
+	list_del(&uobj->list);
+	mutex_unlock(&uobj->context->lock);
+	uverbs_uobject_put(uobj);
+}
+
+static void hot_unplug_idr_uobject(struct ib_uobject *uobj, bool device_removed)
+{
+	const struct uverbs_obj_idr_type *idr_type =
+		container_of(uobj->type, struct uverbs_obj_idr_type, type);
+
+	idr_type->hot_unplug(uobj);
+	uverbs_idr_remove_uobj(uobj);
+	uverbs_uobject_put(uobj);
+}
+
+static void release_idr_uobject(struct ib_uobject *uobj)
+{
+	/*
+	 * When we destroy an object, we first just lock it for WRITE and
+	 * actually DESTROY it in the finalize stage. So, the problematic
+	 * scenario is when we just started the finalize stage of the
+	 * destruction (nothing was executed yet). Now, the other thread
+	 * fetched the object for READ access, but it didn't lock it yet.
+	 * The DESTROY thread continues and starts destroying the object.
+	 * When the other thread continues, without RCU it would
+	 * access freed memory. However, kfree_rcu delays the free
+	 * until the READ flow exits its rcu_read_lock section. Since the
+	 * write lock of the object is still taken by the DESTROY flow, the
+	 * READ operation will get -EBUSY and it'll just bail out.
+	 */
+	kfree_rcu(uobj, rcu);
+}
+
+const struct uverbs_obj_type_ops uverbs_idr_ops = {
+	.alloc_begin = alloc_begin_idr_uobject,
+	.lookup_get = lookup_get_idr_uobject,
+	.alloc_commit = alloc_commit_idr_uobject,
+	.alloc_abort = alloc_abort_idr_uobject,
+	.lookup_put = lookup_put_idr_uobject,
+	.destroy_commit = destroy_commit_idr_uobject,
+	.hot_unplug = hot_unplug_idr_uobject,
+	.release = release_idr_uobject,
+};
+
+void uverbs_cleanup_ucontext(struct ib_ucontext *ucontext, bool device_removed)
+{
+	unsigned int cur_order = 0;
+
+	while (!list_empty(&ucontext->uobjects)) {
+		struct ib_uobject *obj, *next_obj;
+		unsigned int next_order = UINT_MAX;
+
+		/*
+		 * This shouldn't run while executing other commands on this
+		 * context, thus no lock is required.
+		 */
+		list_for_each_entry_safe(obj, next_obj, &ucontext->uobjects,
+					 list)
+			if (obj->type->destroy_order == cur_order) {
+				list_del(&obj->list);
+				obj->type->ops->hot_unplug(obj, device_removed);
+			} else {
+				next_order = min(next_order,
+						 obj->type->destroy_order);
+			}
+		cur_order = next_order;
+	}
+}
+
+void uverbs_initialize_ucontext(struct ib_ucontext *ucontext)
+{
+	mutex_init(&ucontext->lock);
+	INIT_LIST_HEAD(&ucontext->uobjects);
+}
+
diff --git a/drivers/infiniband/core/rdma_core.h b/drivers/infiniband/core/rdma_core.h
new file mode 100644
index 0000000..ab665a6
--- /dev/null
+++ b/drivers/infiniband/core/rdma_core.h
@@ -0,0 +1,55 @@ 
+/*
+ * Copyright (c) 2005 Topspin Communications.  All rights reserved.
+ * Copyright (c) 2005, 2006 Cisco Systems.  All rights reserved.
+ * Copyright (c) 2005-2017 Mellanox Technologies. All rights reserved.
+ * Copyright (c) 2005 Voltaire, Inc. All rights reserved.
+ * Copyright (c) 2005 PathScale, Inc. All rights reserved.
+ *
+ * This software is available to you under a choice of one of two
+ * licenses.  You may choose to be licensed under the terms of the GNU
+ * General Public License (GPL) Version 2, available from the file
+ * COPYING in the main directory of this source tree, or the
+ * OpenIB.org BSD license below:
+ *
+ *     Redistribution and use in source and binary forms, with or
+ *     without modification, are permitted provided that the following
+ *     conditions are met:
+ *
+ *      - Redistributions of source code must retain the above
+ *        copyright notice, this list of conditions and the following
+ *        disclaimer.
+ *
+ *      - Redistributions in binary form must reproduce the above
+ *        copyright notice, this list of conditions and the following
+ *        disclaimer in the documentation and/or other materials
+ *        provided with the distribution.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ */
+
+#ifndef RDMA_CORE_H
+#define RDMA_CORE_H
+
+#include <linux/idr.h>
+#include <rdma/uverbs_types.h>
+#include <rdma/ib_verbs.h>
+#include <linux/mutex.h>
+
+/*
+ * These functions initialize the context and clean up its uobjects.
+ * The context has a list of objects which is protected by a mutex
+ * on the context. initialize_ucontext should be called when we create
+ * a context.
+ * cleanup_ucontext removes all uobjects from the context and puts them.
+ */
+void uverbs_cleanup_ucontext(struct ib_ucontext *ucontext, bool device_removed);
+void uverbs_initialize_ucontext(struct ib_ucontext *ucontext);
+
+#endif /* RDMA_CORE_H */
diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
index ab3cae4..4a2a0fc 100644
--- a/include/rdma/ib_verbs.h
+++ b/include/rdma/ib_verbs.h
@@ -1347,6 +1347,10 @@  struct ib_ucontext {
 	struct list_head	rwq_ind_tbl_list;
 	int			closing;
 
+	/* protects the uobjects list */
+	struct mutex		lock;
+	struct list_head	uobjects;
+
 	struct pid             *tgid;
 #ifdef CONFIG_INFINIBAND_ON_DEMAND_PAGING
 	struct rb_root      umem_tree;
@@ -1374,8 +1378,11 @@  struct ib_uobject {
 	int			id;		/* index into kernel idr */
 	struct kref		ref;
 	struct rw_semaphore	mutex;		/* protects .live */
+	struct rw_semaphore	currently_used;	/* protects exclusive access */
 	struct rcu_head		rcu;		/* kfree_rcu() overhead */
 	int			live;
+
+	const struct uverbs_obj_type *type;
 };
 
 struct ib_udata {
diff --git a/include/rdma/uverbs_types.h b/include/rdma/uverbs_types.h
new file mode 100644
index 0000000..bb263f0
--- /dev/null
+++ b/include/rdma/uverbs_types.h
@@ -0,0 +1,104 @@ 
+/*
+ * Copyright (c) 2017, Mellanox Technologies inc.  All rights reserved.
+ *
+ * This software is available to you under a choice of one of two
+ * licenses.  You may choose to be licensed under the terms of the GNU
+ * General Public License (GPL) Version 2, available from the file
+ * COPYING in the main directory of this source tree, or the
+ * OpenIB.org BSD license below:
+ *
+ *     Redistribution and use in source and binary forms, with or
+ *     without modification, are permitted provided that the following
+ *     conditions are met:
+ *
+ *      - Redistributions of source code must retain the above
+ *        copyright notice, this list of conditions and the following
+ *        disclaimer.
+ *
+ *      - Redistributions in binary form must reproduce the above
+ *        copyright notice, this list of conditions and the following
+ *        disclaimer in the documentation and/or other materials
+ *        provided with the distribution.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ */
+
+#ifndef _UVERBS_TYPES_
+#define _UVERBS_TYPES_
+
+#include <linux/kernel.h>
+#include <rdma/ib_verbs.h>
+
+struct uverbs_obj_type;
+
+struct uverbs_obj_type_ops {
+	/*
+	 * Get an ib_uobject that corresponds to the given id from ucontext.
+	 * These functions could create or destroy objects if required.
+	 * The action will be finalized only when commit, abort or put fops are
+	 * called.
+	 * The flow of the different actions is:
+	 * [alloc]:	 Starts with alloc_begin. The handler's logic is then
+	 *		 executed. If the handler is successful, alloc_commit
+	 *		 is called and the object is inserted to the repository.
+	 *		 Otherwise, alloc_abort is called and the object is
+	 *		 destroyed.
+	 * [lookup]:	 Starts with lookup_get which fetches and locks the
+	 *		 object. After the handler finished using the object, it
+	 *		 needs to call lookup_put to unlock it. The write flag
+	 *		 indicates if the object is locked for exclusive access.
+	 * [destroy]:	 Starts with lookup_get with write flag set. This locks
+	 *		 the object for exclusive access. If the handler code
+	 *		 completed successfully, destroy_commit is called and
+	 *		 the ib_uobject is removed from the context's uobjects
+	 *		 repository and put. Otherwise, lookup_put should be
+	 *		 called.
+	 * [hot_unplug]: Used when the context is destroyed (process
+	 *		 termination, reset flow).
+	 * [release]:    Releases the memory of the ib_uobject. This is called
+	 *		 when the last reference is put. We favored a callback
+	 *		 here as this might require tricks like kfree_rcu.
+	 *		 This shouldn't be called explicitly by the user as it's
+	 *		 used by uverbs_uobject_put.
+	 */
+	struct ib_uobject *(*alloc_begin)(const struct uverbs_obj_type *type,
+					  struct ib_ucontext *ucontext);
+	void (*alloc_commit)(struct ib_uobject *uobj);
+	void (*alloc_abort)(struct ib_uobject *uobj);
+
+	struct ib_uobject *(*lookup_get)(const struct uverbs_obj_type *type,
+					 struct ib_ucontext *ucontext, int id,
+					 bool write);
+	void (*lookup_put)(struct ib_uobject *uobj, bool write);
+	void (*destroy_commit)(struct ib_uobject *uobj);
+
+	void (*hot_unplug)(struct ib_uobject *uobj, bool device_removed);
+
+	void (*release)(struct ib_uobject *uobj);
+};
+
+struct uverbs_obj_type {
+	const struct uverbs_obj_type_ops * const ops;
+	unsigned int destroy_order;
+};
+
+struct uverbs_obj_idr_type {
+	/*
+	 * In idr based objects, uverbs_obj_type_ops points to a generic
+	 * idr operations. In order to specialize the underlying types (e.g. CQ,
+	 * QPs, etc.), we add obj_size and hot_unplug specific callbacks here.
+	 */
+	struct uverbs_obj_type  type;
+	size_t			obj_size;
+	/* The hot_unplug in ops, initialized by idr, calls this callback */
+	void (*hot_unplug)(struct ib_uobject *uobj);
+};
+
+#endif