Message ID | 1484132033-3346-3-git-send-email-matanb@mellanox.com (mailing list archive) |
---|---|
State | Superseded |
Headers | show |
On Wed, Jan 11, 2017 at 12:53:48PM +0200, Matan Barak wrote: > +static struct ib_uobject *get_uobj_rcu(int id, struct ib_ucontext *context) > +{ > + struct ib_uobject *uobj; > + > + RCU_LOCKDEP_WARN(!rcu_read_lock_held(), > + "uverbs: get_uobj_rcu wasn't called in a rcu_read_lock()!"); > + /* object won't be released as we're protected in rcu */ > + uobj = idr_find(&context->device->idr, id); So.. I think you need another patch in your series to rework the idr globally to use rcu. Waiting till patch 7 seems too late I'd particularly like to see this series without patch 5.. Reworking this to RCU seems like a good idea. Even better would be to have per-fd locking domains ;) > + /* > + * We start with allocating an idr pointing to NULL. This represents an > + * object which isn't initialized yet. We'll replace it later on with > + * the real object once we commit. > + */ > + ret = idr_alloc(&uobj->context->device->idr, NULL, 0, 0, GFP_NOWAIT); > + if (ret >= 0) > + uobj->id = ret; And probably another patch to replace live with NULL. These are both big changes, it would be better to see them with their full context, not as stubs like this. Patch 7 has the code for this but it is all intermingled with so much other stuff.. > + /* > + * When we destroy an object, we first just lock it for WRITE and > + * actually DESTROY it in the finalize stage. So, the problematic > + * scenario is when we just stared the finalize stage of the > + * destruction (nothing was executed yet). Now, the other thread > + * fetched the object for READ access, but it didn't lock it yet. > + * The DESTROY thread continues and starts destroying the object. > + * When the other thread continue - without the RCU, it would > + * access freed memory. However, the rcu_read_lock delays the free > + * until the rcu_read_lock of the READ operation quits. Since the > + * write lock of the object is still taken by the DESTROY flow, the > + * READ operation will get -EBUSY and it'll just bail out. > + */ > + kfree_rcu(uobj, rcu); Zoiks. But yes, ok, it makes sense. Good comment. > +/* > + * Returns the ib_uobject, NULL if the requested object isn't found or an error. > + * The caller should check for IS_ERR_OR_NULL. Some people really hate IS_ERR_OR_NULL.. Why would the caller handle NULL differently? Why not return -ENOENT? > + */ > +static struct ib_uobject *get_uobject_from_context(struct ib_ucontext *ucontext, > + const struct uverbs_type_alloc_action *type, > + u32 idr, > + enum uverbs_idr_access access) > +{ > + struct ib_uobject *uobj; > + int ret; > + > + rcu_read_lock(); > + uobj = get_uobj_rcu(idr, ucontext); Is there a reason we have get_uobj_rcu() at all ? Just inline it. > + if (!uobj) > + goto free; > + > + if (uobj->type != type) { > + uobj = NULL; But this is an error, not no found. > +static struct ib_uobject *uverbs_get_uobject_from_idr(const struct uverbs_type_alloc_action *type_alloc, > + struct ib_ucontext *ucontext, > + enum uverbs_idr_access access, > + uint32_t idr) u32 for the kernel please, check all patches Why do we need to have three confusingly named functions to implement one thing? Does it even make sense that add/get/destroy are all one function? I'm not sure it makes any sense to overload uverbs_get_uobject_from_context for these two very different cases.. They don't even have the same type in the uapi (int vs u32) I think you should just stick with uverbs_get_uobject_from_idr/_fd and skip the common getter one.. > +{ > + struct ib_uobject *uobj; > + int ret; > + > + if (access == UVERBS_ACCESS_NEW) { > + uobj = kmalloc(type_alloc->obj_size, GFP_KERNEL); > + if (!uobj) > + return ERR_PTR(-ENOMEM); Does this really make sense as an access enum and not just a different function? This one one function is doing far too much I suspect. Use function pointers: struct ub_uobject *obj = type_alloc->ops->allocate(type_alloc, ucontext); > +static void uverbs_uobject_add(struct ib_uobject *uobject) > +{ > + mutex_lock(&uobject->context->uobjects_lock->lock); > + list_add(&uobject->list, &uobject->context->uobjects); > + mutex_unlock(&uobject->context->uobjects_lock->lock); > +} > + > +static void uverbs_uobject_remove(struct ib_uobject *uobject) > +{ > + /* > + * Calling remove requires exclusive access, so it's not possible > + * another thread will use our object since the function is called > + * with exclusive access. > + */ > + uverbs_idr_remove_uobj(uobject); > + mutex_lock(&uobject->context->uobjects_lock->lock); > + list_del(&uobject->list); > + mutex_unlock(&uobject->context->uobjects_lock->lock); > + put_uobj(uobject); What does this put pair with? Usually a put like this would pair with a get inside the _add function. So this deserves a comment at least. If it is pairing with a get the caller performed then it should not be here. > +static void uverbs_finalize_idr(struct ib_uobject *uobj, > + enum uverbs_idr_access access, > + bool commit) > +{ > + switch (access) { > + case UVERBS_ACCESS_READ: > + up_read(&uobj->usecnt); > + break; > + case UVERBS_ACCESS_NEW: > + if (commit) { > + uverbs_uobject_add(uobj); > + spin_lock(&uobj->context->device->idr_lock); > + /* > + * We already allocated this IDR with a NULL object, so > + * this shouldn't fail. > + */ > + WARN_ON(idr_replace(&uobj->context->device->idr, > + uobj, uobj->id)); > + spin_unlock(&uobj->context->device->idr_lock); > + } else { > + uverbs_idr_remove_uobj(uobj); > + put_uobj(uobj); > + } > + break; > + case UVERBS_ACCESS_WRITE: > + up_write(&uobj->usecnt); > + break; > + case UVERBS_ACCESS_DESTROY: > + if (commit) > + uverbs_uobject_remove(uobj); > + else > + up_write(&uobj->usecnt); Again wondering if ACCESS_DESTROY makes any sense.. > + break; > + } > +} > + > +void uverbs_finalize_object(struct ib_uobject *uobj, > + enum uverbs_idr_access access, > + bool commit) > +{ > + if (uobj->type->type == UVERBS_ATTR_TYPE_IDR) > + uverbs_finalize_idr(uobj, access, commit); > + else > + WARN_ON(true); I think you should use ops like the rest of the kernel. uobj->ops->finalize(uobj,access,commit); Get rid of these switch version. > + > struct ib_ucontext { > struct ib_device *device; > struct list_head pd_list; > @@ -1346,6 +1348,10 @@ struct ib_ucontext { > struct list_head rwq_ind_tbl_list; > int closing; > > + /* lock for uobjects list */ > + struct ib_ucontext_lock *uobjects_lock; This is so weird. Why do we have a pointer to a struct mutex? This serves absolutely no function at this point in the series, just inline the mutex. > + struct list_head uobjects; > + > struct pid *tgid; > #ifdef CONFIG_INFINIBAND_ON_DEMAND_PAGING > struct rb_root umem_tree; > @@ -1373,8 +1379,11 @@ struct ib_uobject { > int id; /* index into kernel idr */ Shouldn't this be u32? > struct kref ref; > struct rw_semaphore mutex; /* protects .live */ > + struct rw_semaphore usecnt; /* protects exclusive access */ usecnt isn't really a cnt, better name? > + const struct uverbs_type_alloc_action *type; [..] > +struct uverbs_type_alloc_action; > +typedef void (*free_type)(const struct uverbs_type_alloc_action *uobject_type, > + struct ib_uobject *uobject); > + > +struct uverbs_type_alloc_action { > + enum uverbs_attr_type type; > + int order; > + size_t obj_size; > + free_type free_fn; > +}; Just make a sensible meta-class struct like other parts of the kernel: struct uverbs_obj_type { unsigned int destroy_order; size_t allocation_size; struct ib_uobject (*alloc)(..); void (*free)(..); void (*lookup)(..); }; Jason -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Thu, Jan 12, 2017 at 1:28 AM, Jason Gunthorpe <jgunthorpe@obsidianresearch.com> wrote: > On Wed, Jan 11, 2017 at 12:53:48PM +0200, Matan Barak wrote: > >> +static struct ib_uobject *get_uobj_rcu(int id, struct ib_ucontext *context) >> +{ >> + struct ib_uobject *uobj; >> + >> + RCU_LOCKDEP_WARN(!rcu_read_lock_held(), >> + "uverbs: get_uobj_rcu wasn't called in a rcu_read_lock()!"); >> + /* object won't be released as we're protected in rcu */ >> + uobj = idr_find(&context->device->idr, id); > > So.. I think you need another patch in your series to rework the idr > globally to use rcu. Waiting till patch 7 seems too late > I'm not sure I followed here. The idr mechanism uses RCU as of the current code for fetching ids. > I'd particularly like to see this series without patch 5.. > The major purpose of this series is to be a parliamentary step to the ABI, which as you recall, uses a macro based language to declare types. I can simplify it in this series to directly initialize the types, but that means the next series will modify these lines to the macros language. > Reworking this to RCU seems like a good idea. Even better would be to > have per-fd locking domains ;) > What do you mean by "per-fd locking domains"? >> + /* >> + * We start with allocating an idr pointing to NULL. This represents an >> + * object which isn't initialized yet. We'll replace it later on with >> + * the real object once we commit. >> + */ >> + ret = idr_alloc(&uobj->context->device->idr, NULL, 0, 0, GFP_NOWAIT); >> + if (ret >= 0) >> + uobj->id = ret; > > And probably another patch to replace live with NULL. > The whole implementation here is replaced, so if we have a patch that replaces live with NULL in the current implementation and then later on in the series we re-work the API, we actually delete lines we've added at the same series. Since we replace here the whole idr/fd objects management, I think it's better to review the locking/lifetime as an infrastructure rather than bunch of changes in the usage of the current infrastructure. > These are both big changes, it would be better to see them with their > full context, not as stubs like this. Patch 7 has the code for this > but it is all intermingled with so much other stuff.. > Although putting an infrastructure and then using it in later patches is common practice in the kernel, I can look into embedding the required changes in the infrastructure changes. Saying that, testing that it's really bi-sectable could be a nightmare, so I'm not really sure it's worth here. >> + /* >> + * When we destroy an object, we first just lock it for WRITE and >> + * actually DESTROY it in the finalize stage. So, the problematic >> + * scenario is when we just stared the finalize stage of the >> + * destruction (nothing was executed yet). Now, the other thread >> + * fetched the object for READ access, but it didn't lock it yet. >> + * The DESTROY thread continues and starts destroying the object. >> + * When the other thread continue - without the RCU, it would >> + * access freed memory. However, the rcu_read_lock delays the free >> + * until the rcu_read_lock of the READ operation quits. Since the >> + * write lock of the object is still taken by the DESTROY flow, the >> + * READ operation will get -EBUSY and it'll just bail out. >> + */ >> + kfree_rcu(uobj, rcu); > > Zoiks. But yes, ok, it makes sense. Good comment. > >> +/* >> + * Returns the ib_uobject, NULL if the requested object isn't found or an error. >> + * The caller should check for IS_ERR_OR_NULL. > > Some people really hate IS_ERR_OR_NULL.. Why would the caller handle > NULL differently? Why not return -ENOENT? > Ok >> + */ >> +static struct ib_uobject *get_uobject_from_context(struct ib_ucontext *ucontext, >> + const struct uverbs_type_alloc_action *type, >> + u32 idr, >> + enum uverbs_idr_access access) >> +{ >> + struct ib_uobject *uobj; >> + int ret; >> + >> + rcu_read_lock(); >> + uobj = get_uobj_rcu(idr, ucontext); > > Is there a reason we have get_uobj_rcu() at all ? Just inline it. > It's later used for uverbs_is_live. >> + if (!uobj) >> + goto free; >> + >> + if (uobj->type != type) { >> + uobj = NULL; > > But this is an error, not no found. > Ok, I'll fix. >> +static struct ib_uobject *uverbs_get_uobject_from_idr(const struct uverbs_type_alloc_action *type_alloc, >> + struct ib_ucontext *ucontext, >> + enum uverbs_idr_access access, >> + uint32_t idr) > > u32 for the kernel please, check all patches > Right, actually idr should be an int :) > Why do we need to have three confusingly named functions to implement > one thing? Does it even make sense that add/get/destroy are all one > function? > It's a preliminary step for the next series which actually introduces the ABI. If you recall, the metadata for each attribute contains how to access it. I'll take a look into changing the enums and switch-cases to function pointers and examine if it also fits the new model well enough. Saying that, we only have two types (idr and fds) and we don't expect to add much later > I'm not sure it makes any sense to overload > uverbs_get_uobject_from_context for these two very different cases.. > They don't even have the same type in the uapi (int vs u32) > I could split them to two different functions here. There are some benefits of the current way. You could of rdma_core as managing a big repository of objects of all types and you could get any object using the tuple (type,id). If we go to the function pointers model, you'll probably just go to type->lookup function. > I think you should just stick with uverbs_get_uobject_from_idr/_fd and > skip the common getter one.. > Ok >> +{ >> + struct ib_uobject *uobj; >> + int ret; >> + >> + if (access == UVERBS_ACCESS_NEW) { >> + uobj = kmalloc(type_alloc->obj_size, GFP_KERNEL); >> + if (!uobj) >> + return ERR_PTR(-ENOMEM); > > Does this really make sense as an access enum and not just a different > function? This one one function is doing far too much I suspect. > > Use function pointers: > > struct ub_uobject *obj = type_alloc->ops->allocate(type_alloc, ucontext); > For the same reason as seen above - the goal here is to use ACCESS_ identifiers in the DSL to specify how to use the object. Therefore, I think enum makes sense. If we transform this into function pointers, the new ABI (next series) will probably use a wrapper function to execute the correct function pointer. >> +static void uverbs_uobject_add(struct ib_uobject *uobject) >> +{ >> + mutex_lock(&uobject->context->uobjects_lock->lock); >> + list_add(&uobject->list, &uobject->context->uobjects); >> + mutex_unlock(&uobject->context->uobjects_lock->lock); >> +} >> + >> +static void uverbs_uobject_remove(struct ib_uobject *uobject) >> +{ >> + /* >> + * Calling remove requires exclusive access, so it's not possible >> + * another thread will use our object since the function is called >> + * with exclusive access. >> + */ >> + uverbs_idr_remove_uobj(uobject); >> + mutex_lock(&uobject->context->uobjects_lock->lock); >> + list_del(&uobject->list); >> + mutex_unlock(&uobject->context->uobjects_lock->lock); >> + put_uobj(uobject); > > What does this put pair with? Usually a put like this would pair with > a get inside the _add function. So this deserves a comment at least. > If it is pairing with a get the caller performed then it should not be > here. > They're a pair in a sense they're both used in the finalize stage of the IDR object creation. Both object creation and object destruction are split into two parts. Object creation is mainly done in the "get" stage while object destruction is mostly done in the "finalize" stage. >> +static void uverbs_finalize_idr(struct ib_uobject *uobj, >> + enum uverbs_idr_access access, >> + bool commit) >> +{ >> + switch (access) { >> + case UVERBS_ACCESS_READ: >> + up_read(&uobj->usecnt); >> + break; >> + case UVERBS_ACCESS_NEW: >> + if (commit) { >> + uverbs_uobject_add(uobj); >> + spin_lock(&uobj->context->device->idr_lock); >> + /* >> + * We already allocated this IDR with a NULL object, so >> + * this shouldn't fail. >> + */ >> + WARN_ON(idr_replace(&uobj->context->device->idr, >> + uobj, uobj->id)); >> + spin_unlock(&uobj->context->device->idr_lock); >> + } else { >> + uverbs_idr_remove_uobj(uobj); >> + put_uobj(uobj); >> + } >> + break; >> + case UVERBS_ACCESS_WRITE: >> + up_write(&uobj->usecnt); >> + break; >> + case UVERBS_ACCESS_DESTROY: >> + if (commit) >> + uverbs_uobject_remove(uobj); >> + else >> + up_write(&uobj->usecnt); > > Again wondering if ACCESS_DESTROY makes any sense.. > Why? If you write a destroy_qp handler, it makes sense that the QP_HANDLE will have ACCESS_DESTROY behavior. >> + break; >> + } >> +} >> + >> +void uverbs_finalize_object(struct ib_uobject *uobj, >> + enum uverbs_idr_access access, >> + bool commit) >> +{ >> + if (uobj->type->type == UVERBS_ATTR_TYPE_IDR) >> + uverbs_finalize_idr(uobj, access, commit); >> + else >> + WARN_ON(true); > > I think you should use ops like the rest of the kernel. > > uobj->ops->finalize(uobj,access,commit); > > Get rid of these switch version. > Maybe we could a type->finalize pointer and populate that automatically in type creation. I first want to see how the function pointers model ends up. >> + >> struct ib_ucontext { >> struct ib_device *device; >> struct list_head pd_list; >> @@ -1346,6 +1348,10 @@ struct ib_ucontext { >> struct list_head rwq_ind_tbl_list; >> int closing; >> >> + /* lock for uobjects list */ >> + struct ib_ucontext_lock *uobjects_lock; > > This is so weird. Why do we have a pointer to a struct mutex? This > serves absolutely no function at this point in the series, just inline > the mutex. > It's used in the next patch as a kref-ed lock. The purpose here is that a FD object could outlive its context. However, since you suggest to allow keeping a dummy ucontext alive, I'll just add a kref and inline the lock. >> + struct list_head uobjects; >> + >> struct pid *tgid; >> #ifdef CONFIG_INFINIBAND_ON_DEMAND_PAGING >> struct rb_root umem_tree; >> @@ -1373,8 +1379,11 @@ struct ib_uobject { >> int id; /* index into kernel idr */ > > Shouldn't this be u32? > idr_alloc actually returns an int. Actually, we need to change all ids to be int and allocate an idr with S32_MAX as end. >> struct kref ref; >> struct rw_semaphore mutex; /* protects .live */ >> + struct rw_semaphore usecnt; /* protects exclusive access */ > > usecnt isn't really a cnt, better name? > I could rename it to currently_used. What do you think? >> + const struct uverbs_type_alloc_action *type; > > [..] > >> +struct uverbs_type_alloc_action; >> +typedef void (*free_type)(const struct uverbs_type_alloc_action *uobject_type, >> + struct ib_uobject *uobject); >> + >> +struct uverbs_type_alloc_action { >> + enum uverbs_attr_type type; >> + int order; >> + size_t obj_size; >> + free_type free_fn; >> +}; > > Just make a sensible meta-class struct like other parts of the kernel: > > struct uverbs_obj_type { > unsigned int destroy_order; > size_t allocation_size; > > struct ib_uobject (*alloc)(..); > void (*free)(..); > void (*lookup)(..); > }; > We could use function pointers instead of enum. I'll start refactor the code to use this approach. If it turns out to make more sense, I'll use this in V2: struct uverbs_obj_type_ops { /* * Get an ib_uobject that corresponds to the given id from ucontext, * These functions could create or destroy objects if required. * The action will be finalized only when commit or abort fops are * called. */ struct ib_uobject *(*alloc_begin)(const struct uverbs_obj_type *type, struct ib_ucontext *ucontext); struct ib_uobject *(*lookup_get)(const struct uverbs_obj_type *type, struct ib_ucontext *ucontext, int id, bool write); void (*alloc_commit)(struct ib_uobject *uobj); void (*alloc_abort)(struct ib_uobject *uobj); void (*lookup_put)(struct ib_uobject *uobj, bool write); void (*destroy_commit)(struct ib_uobject *uobj); void (*hot_unplug)(struct ib_uobject *uobj); }; struct uverbs_obj_type { const struct uverbs_obj_type_ops * const ops; unsigned int destroy_order; }; struct uverbs_obj_idr_type { struct uverbs_obj_type type; size_t obj_size; void (*hot_unplug)(struct ib_uobject *uobj); }; struct uverbs_obj_fd_type { struct uverbs_obj_type type; size_t obj_size; void (*hot_unplug)(struct ib_uobject *uobj); const struct file_operations *fops; const char *name; int flags; }; Probably ops will be the same for all IDR/FD types (use the same pointer) and the custom behavior should be embedded in uverbs_obj_idr_type/uverbs_obj_fd_type. > Jason Thanks for reviewing. Matan > -- > To unsubscribe from this list: send the line "unsubscribe linux-rdma" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Tue, Jan 17, 2017 at 07:23:47PM +0200, Matan Barak wrote: > On Thu, Jan 12, 2017 at 1:28 AM, Jason Gunthorpe > <jgunthorpe@obsidianresearch.com> wrote: > > On Wed, Jan 11, 2017 at 12:53:48PM +0200, Matan Barak wrote: > > > >> +static struct ib_uobject *get_uobj_rcu(int id, struct ib_ucontext *context) > >> +{ > >> + struct ib_uobject *uobj; > >> + > >> + RCU_LOCKDEP_WARN(!rcu_read_lock_held(), > >> + "uverbs: get_uobj_rcu wasn't called in a rcu_read_lock()!"); > >> + /* object won't be released as we're protected in rcu */ > >> + uobj = idr_find(&context->device->idr, id); > > > > So.. I think you need another patch in your series to rework the idr > > globally to use rcu. Waiting till patch 7 seems too late > > I'm not sure I followed here. > The idr mechanism uses RCU as of the current code for fetching ids. Hmm, Not sure what I was thinking, but yes.. this is the same as __idr_get_uobj, so why does it exist? > The major purpose of this series is to be a parliamentary step to > the ABI, which as you recall, uses a macro based language to declare > types. Yes, but that seem controversial, let us get this part done without that. > What do you mean by "per-fd locking domains"? Put the IDR into the ucontext and a per-ucontext rcu + spinlock for it.. > > And probably another patch to replace live with NULL. > > The whole implementation here is replaced, so if we have a patch > that replaces live with NULL in the current implementation and then > later on in the series we re-work the API, we actually delete lines > we've added at the same series. The kernel philsophy is to try an avoid these 'flag day' kinds of approaches. You are adding new code that does not work with the old code which makes it incredibly hard to review, and then just swithing everything over. At the same time many fundamental designs are being altered. You might have a point if the two approaches could exist concurrently, but that is not the case in these patches. > Saying that, testing that it's really bi-sectable could be a > nightmare, so I'm not really sure it's worth here. > >> + /* > >> + * When we destroy an object, we first just lock it for WRITE and > >> + * actually DESTROY it in the finalize stage. So, the problematic > >> + * scenario is when we just stared the finalize stage of the > >> + * destruction (nothing was executed yet). Now, the other thread > >> + * fetched the object for READ access, but it didn't lock it yet. > >> + * The DESTROY thread continues and starts destroying the object. > >> + * When the other thread continue - without the RCU, it would > >> + * access freed memory. However, the rcu_read_lock delays the free > >> + * until the rcu_read_lock of the READ operation quits. Since the > >> + * write lock of the object is still taken by the DESTROY flow, the > >> + * READ operation will get -EBUSY and it'll just bail out. > >> + */ > >> + kfree_rcu(uobj, rcu); > > > > Zoiks. But yes, ok, it makes sense. Good comment. Actually, why is this replacing release_uobj? Why is it now called put_uobj? Why is something that doesn't use a kref called put? Why is a kfree being called outside a kref release function? Seems wrong... > > Is there a reason we have get_uobj_rcu() at all ? Just inline it. > > It's later used for uverbs_is_live. Huh. That bit of code in ib_uverbs_qp_event_handler seems super sketchy. Looks like the qp there comes from the open_list? Seems to me that the open_list needs to be cleaned up before the object is removed from the idr, eliminating the need for the uverbs_is_live at all. > >> + struct ib_ucontext *ucontext, > >> + enum uverbs_idr_access access, > >> + uint32_t idr) > > > > u32 for the kernel please, check all patches > > Right, actually idr should be an int :) Our UAPI is u32 for sane 64/32 bit compat, the IDR has to be limited to min(U32_MAX-1,INT_MAX). (may as well reserve U32_MAX for error) > It's a preliminary step for the next series which actually introduces the ABI. > If you recall, the metadata for each attribute contains how to access it. > I'll take a look into changing the enums and switch-cases to function > pointers and examine if it also fits the new model well enough. > Saying that, we only have two types (idr and fds) and we don't expect > to add much later But this makes no sense here, handle the switch from the enum in the code that works with those future macro structures. > I could split them to two different functions here. There are some > benefits of the current way. You could of rdma_core as managing a > big repository of objects of all types and you could get any object > using the tuple (type,id). If we go to the function pointers model, > you'll probably just go to type->lookup function. The different types seem to have significant differences (eg with locking) so I'm not sure that single model even makes sense. > in the DSL to specify how to use the object. Therefore, I think enum > makes sense. If we transform this into function pointers, the new > ABI (next series) will probably use a wrapper function to execute > the correct function pointer. Seems OK to me > >> +static void uverbs_uobject_add(struct ib_uobject *uobject) > >> +{ > >> + mutex_lock(&uobject->context->uobjects_lock->lock); > >> + list_add(&uobject->list, &uobject->context->uobjects); > >> + mutex_unlock(&uobject->context->uobjects_lock->lock); > >> +} > >> + > >> +static void uverbs_uobject_remove(struct ib_uobject *uobject) > >> +{ > >> + /* > >> + * Calling remove requires exclusive access, so it's not possible > >> + * another thread will use our object since the function is called > >> + * with exclusive access. > >> + */ > >> + uverbs_idr_remove_uobj(uobject); > >> + mutex_lock(&uobject->context->uobjects_lock->lock); > >> + list_del(&uobject->list); > >> + mutex_unlock(&uobject->context->uobjects_lock->lock); > >> + put_uobj(uobject); > > > > What does this put pair with? Usually a put like this would pair with > > a get inside the _add function. So this deserves a comment at least. > > If it is pairing with a get the caller performed then it should not be > > here. > > They're a pair in a sense they're both used in the finalize stage of > the IDR object creation. Both object creation and object destruction > are split into two parts. Object creation is mainly done in the > "get" stage while object destruction is mostly done in the > "finalize" stage. I think this goes back to my earlier comment about put_uobj not making much sense - for something named put there needs to be a paired get and that get isn't clear. > Why? If you write a destroy_qp handler, it makes sense that the QP_HANDLE > will have ACCESS_DESTROY behavior. Sure but it really isn't much of an ACCESS .. > >> #ifdef CONFIG_INFINIBAND_ON_DEMAND_PAGING > >> struct rb_root umem_tree; > >> @@ -1373,8 +1379,11 @@ struct ib_uobject { > >> int id; /* index into kernel idr */ > > > > Shouldn't this be u32? > > idr_alloc actually returns an int. Actually, we need to change all ids > to be int and allocate > an idr with S32_MAX as end. The uapi is u32, see above... > >> struct kref ref; > >> struct rw_semaphore mutex; /* protects .live */ > >> + struct rw_semaphore usecnt; /* protects exclusive access */ > > > > usecnt isn't really a cnt, better name? > > I could rename it to currently_used. What do you think? Maybe call it destroy_lock? reading_lock? > We could use function pointers instead of enum. I'll start refactor > the code to use this approach. > If it turns out to make more sense, I'll use this in V2: > > struct uverbs_obj_type_ops { > /* > * Get an ib_uobject that corresponds to the given id from ucontext, > * These functions could create or destroy objects if required. > * The action will be finalized only when commit or abort fops are > * called. > */ > struct ib_uobject *(*alloc_begin)(const struct uverbs_obj_type *type, > struct ib_ucontext *ucontext); > struct ib_uobject *(*lookup_get)(const struct uverbs_obj_type *type, > struct ib_ucontext *ucontext, int id, > bool write); > void (*alloc_commit)(struct ib_uobject *uobj); > void (*alloc_abort)(struct ib_uobject *uobj); > void (*lookup_put)(struct ib_uobject *uobj, bool write); > void (*destroy_commit)(struct ib_uobject *uobj); > void (*hot_unplug)(struct ib_uobject *uobj); > }; > > struct uverbs_obj_type { > const struct uverbs_obj_type_ops * const ops; > unsigned int destroy_order; > }; > > struct uverbs_obj_idr_type { > struct uverbs_obj_type type; > size_t obj_size; > void (*hot_unplug)(struct ib_uobject *uobj); > }; > > struct uverbs_obj_fd_type { > struct uverbs_obj_type type; > size_t obj_size; > void (*hot_unplug)(struct ib_uobject *uobj); > const struct file_operations *fops; > const char *name; > int flags; > }; I'd probably just use a union or unused struct members, so much simpler than all these structs. Like you say we probably only have two kinds, no reason to be fancy. > Probably ops will be the same for all IDR/FD types (use the same > pointer) Then why have it? Jason -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
diff --git a/drivers/infiniband/core/Makefile b/drivers/infiniband/core/Makefile index edaae9f..1819623 100644 --- a/drivers/infiniband/core/Makefile +++ b/drivers/infiniband/core/Makefile @@ -28,4 +28,5 @@ ib_umad-y := user_mad.o ib_ucm-y := ucm.o -ib_uverbs-y := uverbs_main.o uverbs_cmd.o uverbs_marshall.o +ib_uverbs-y := uverbs_main.o uverbs_cmd.o uverbs_marshall.o \ + rdma_core.o diff --git a/drivers/infiniband/core/rdma_core.c b/drivers/infiniband/core/rdma_core.c new file mode 100644 index 0000000..09b44ec --- /dev/null +++ b/drivers/infiniband/core/rdma_core.c @@ -0,0 +1,264 @@ +/* + * Copyright (c) 2016, Mellanox Technologies inc. All rights reserved. + * + * This software is available to you under a choice of one of two + * licenses. You may choose to be licensed under the terms of the GNU + * General Public License (GPL) Version 2, available from the file + * COPYING in the main directory of this source tree, or the + * OpenIB.org BSD license below: + * + * Redistribution and use in source and binary forms, with or + * without modification, are permitted provided that the following + * conditions are met: + * + * - Redistributions of source code must retain the above + * copyright notice, this list of conditions and the following + * disclaimer. + * + * - Redistributions in binary form must reproduce the above + * copyright notice, this list of conditions and the following + * disclaimer in the documentation and/or other materials + * provided with the distribution. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, + * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND + * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS + * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN + * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN + * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE + * SOFTWARE. + */ + +#include <linux/file.h> +#include <linux/anon_inodes.h> +#include <rdma/ib_verbs.h> +#include <rdma/uverbs_ioctl.h> +#include "uverbs.h" +#include "rdma_core.h" + +static int uverbs_lock_object(struct ib_uobject *uobj, + enum uverbs_idr_access access) +{ + if (access == UVERBS_ACCESS_READ) + return down_read_trylock(&uobj->usecnt) == 1 ? 0 : -EBUSY; + + /* lock is either WRITE or DESTROY - should be exclusive */ + return down_write_trylock(&uobj->usecnt) == 1 ? 0 : -EBUSY; +} + +static struct ib_uobject *get_uobj_rcu(int id, struct ib_ucontext *context) +{ + struct ib_uobject *uobj; + + RCU_LOCKDEP_WARN(!rcu_read_lock_held(), + "uverbs: get_uobj_rcu wasn't called in a rcu_read_lock()!"); + /* object won't be released as we're protected in rcu */ + uobj = idr_find(&context->device->idr, id); + if (uobj) { + if (uobj->context != context) + uobj = NULL; + } + + return uobj; +} + +struct ib_ucontext_lock { + /* locking the uobjects_list */ + struct mutex lock; +}; + +static void init_uobj(struct ib_uobject *uobj, struct ib_ucontext *context) +{ + init_rwsem(&uobj->usecnt); + uobj->context = context; +} + +static int uverbs_idr_add_uobj(struct ib_uobject *uobj) +{ + int ret; + + idr_preload(GFP_KERNEL); + spin_lock(&uobj->context->device->idr_lock); + + /* + * We start with allocating an idr pointing to NULL. This represents an + * object which isn't initialized yet. We'll replace it later on with + * the real object once we commit. + */ + ret = idr_alloc(&uobj->context->device->idr, NULL, 0, 0, GFP_NOWAIT); + if (ret >= 0) + uobj->id = ret; + + spin_unlock(&uobj->context->device->idr_lock); + idr_preload_end(); + + return ret < 0 ? ret : 0; +} + +static void uverbs_idr_remove_uobj(struct ib_uobject *uobj) +{ + spin_lock(&uobj->context->device->idr_lock); + idr_remove(&uobj->context->device->idr, uobj->id); + spin_unlock(&uobj->context->device->idr_lock); +} + +static void put_uobj(struct ib_uobject *uobj) +{ + /* + * When we destroy an object, we first just lock it for WRITE and + * actually DESTROY it in the finalize stage. So, the problematic + * scenario is when we just stared the finalize stage of the + * destruction (nothing was executed yet). Now, the other thread + * fetched the object for READ access, but it didn't lock it yet. + * The DESTROY thread continues and starts destroying the object. + * When the other thread continue - without the RCU, it would + * access freed memory. However, the rcu_read_lock delays the free + * until the rcu_read_lock of the READ operation quits. Since the + * write lock of the object is still taken by the DESTROY flow, the + * READ operation will get -EBUSY and it'll just bail out. + */ + kfree_rcu(uobj, rcu); +} + +/* + * Returns the ib_uobject, NULL if the requested object isn't found or an error. + * The caller should check for IS_ERR_OR_NULL. + */ +static struct ib_uobject *get_uobject_from_context(struct ib_ucontext *ucontext, + const struct uverbs_type_alloc_action *type, + u32 idr, + enum uverbs_idr_access access) +{ + struct ib_uobject *uobj; + int ret; + + rcu_read_lock(); + uobj = get_uobj_rcu(idr, ucontext); + if (!uobj) + goto free; + + if (uobj->type != type) { + uobj = NULL; + goto free; + } + + ret = uverbs_lock_object(uobj, access); + if (ret) + uobj = ERR_PTR(ret); +free: + rcu_read_unlock(); + return uobj; +} + +static struct ib_uobject *uverbs_get_uobject_from_idr(const struct uverbs_type_alloc_action *type_alloc, + struct ib_ucontext *ucontext, + enum uverbs_idr_access access, + uint32_t idr) +{ + struct ib_uobject *uobj; + int ret; + + if (access == UVERBS_ACCESS_NEW) { + uobj = kmalloc(type_alloc->obj_size, GFP_KERNEL); + if (!uobj) + return ERR_PTR(-ENOMEM); + + init_uobj(uobj, ucontext); + + uobj->type = type_alloc; + ret = uverbs_idr_add_uobj(uobj); + if (ret) { + kfree(uobj); + return ERR_PTR(ret); + } + + } else { + uobj = get_uobject_from_context(ucontext, type_alloc, idr, + access); + + if (IS_ERR_OR_NULL(uobj)) + return ERR_PTR(-ENOENT); + } + + return uobj; +} + +struct ib_uobject *uverbs_get_uobject_from_context(const struct uverbs_type_alloc_action *type_alloc, + struct ib_ucontext *ucontext, + enum uverbs_idr_access access, + unsigned int id) +{ + if (type_alloc->type == UVERBS_ATTR_TYPE_IDR) + return uverbs_get_uobject_from_idr(type_alloc, ucontext, access, + id); + else + return ERR_PTR(-ENOENT); +} + +static void uverbs_uobject_add(struct ib_uobject *uobject) +{ + mutex_lock(&uobject->context->uobjects_lock->lock); + list_add(&uobject->list, &uobject->context->uobjects); + mutex_unlock(&uobject->context->uobjects_lock->lock); +} + +static void uverbs_uobject_remove(struct ib_uobject *uobject) +{ + /* + * Calling remove requires exclusive access, so it's not possible + * another thread will use our object since the function is called + * with exclusive access. + */ + uverbs_idr_remove_uobj(uobject); + mutex_lock(&uobject->context->uobjects_lock->lock); + list_del(&uobject->list); + mutex_unlock(&uobject->context->uobjects_lock->lock); + put_uobj(uobject); +} + +static void uverbs_finalize_idr(struct ib_uobject *uobj, + enum uverbs_idr_access access, + bool commit) +{ + switch (access) { + case UVERBS_ACCESS_READ: + up_read(&uobj->usecnt); + break; + case UVERBS_ACCESS_NEW: + if (commit) { + uverbs_uobject_add(uobj); + spin_lock(&uobj->context->device->idr_lock); + /* + * We already allocated this IDR with a NULL object, so + * this shouldn't fail. + */ + WARN_ON(idr_replace(&uobj->context->device->idr, + uobj, uobj->id)); + spin_unlock(&uobj->context->device->idr_lock); + } else { + uverbs_idr_remove_uobj(uobj); + put_uobj(uobj); + } + break; + case UVERBS_ACCESS_WRITE: + up_write(&uobj->usecnt); + break; + case UVERBS_ACCESS_DESTROY: + if (commit) + uverbs_uobject_remove(uobj); + else + up_write(&uobj->usecnt); + break; + } +} + +void uverbs_finalize_object(struct ib_uobject *uobj, + enum uverbs_idr_access access, + bool commit) +{ + if (uobj->type->type == UVERBS_ATTR_TYPE_IDR) + uverbs_finalize_idr(uobj, access, commit); + else + WARN_ON(true); +} diff --git a/drivers/infiniband/core/rdma_core.h b/drivers/infiniband/core/rdma_core.h new file mode 100644 index 0000000..0142573 --- /dev/null +++ b/drivers/infiniband/core/rdma_core.h @@ -0,0 +1,61 @@ +/* + * Copyright (c) 2005 Topspin Communications. All rights reserved. + * Copyright (c) 2005, 2006 Cisco Systems. All rights reserved. + * Copyright (c) 2005-2016 Mellanox Technologies. All rights reserved. + * Copyright (c) 2005 Voltaire, Inc. All rights reserved. + * Copyright (c) 2005 PathScale, Inc. All rights reserved. + * + * This software is available to you under a choice of one of two + * licenses. You may choose to be licensed under the terms of the GNU + * General Public License (GPL) Version 2, available from the file + * COPYING in the main directory of this source tree, or the + * OpenIB.org BSD license below: + * + * Redistribution and use in source and binary forms, with or + * without modification, are permitted provided that the following + * conditions are met: + * + * - Redistributions of source code must retain the above + * copyright notice, this list of conditions and the following + * disclaimer. + * + * - Redistributions in binary form must reproduce the above + * copyright notice, this list of conditions and the following + * disclaimer in the documentation and/or other materials + * provided with the distribution. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, + * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND + * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS + * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN + * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN + * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE + * SOFTWARE. + */ + +#ifndef RDMA_CORE_H +#define RDMA_CORE_H + +#include <linux/idr.h> +#include <rdma/uverbs_ioctl.h> +#include <rdma/ib_verbs.h> +#include <linux/mutex.h> + +/* + * Get an ib_uobject that corresponds to the given id from ucontext, assuming + * the object is from the given type. Lock it to the required access. + * This function could create (access == NEW) or destroy (access == DESTROY) + * objects if required. The action will be finalized only when + * uverbs_finalize_object is called. + */ +struct ib_uobject *uverbs_get_uobject_from_context(const struct uverbs_type_alloc_action *type_alloc, + struct ib_ucontext *ucontext, + enum uverbs_idr_access access, + unsigned int id); + +void uverbs_finalize_object(struct ib_uobject *uobj, + enum uverbs_idr_access access, + bool success); + +#endif /* RDMA_CORE_H */ diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h index cad2c00..47f560d 100644 --- a/include/rdma/ib_verbs.h +++ b/include/rdma/ib_verbs.h @@ -1331,6 +1331,8 @@ struct ib_fmr_attr { struct ib_umem; +struct ib_ucontext_lock; + struct ib_ucontext { struct ib_device *device; struct list_head pd_list; @@ -1346,6 +1348,10 @@ struct ib_ucontext { struct list_head rwq_ind_tbl_list; int closing; + /* lock for uobjects list */ + struct ib_ucontext_lock *uobjects_lock; + struct list_head uobjects; + struct pid *tgid; #ifdef CONFIG_INFINIBAND_ON_DEMAND_PAGING struct rb_root umem_tree; @@ -1373,8 +1379,11 @@ struct ib_uobject { int id; /* index into kernel idr */ struct kref ref; struct rw_semaphore mutex; /* protects .live */ + struct rw_semaphore usecnt; /* protects exclusive access */ struct rcu_head rcu; /* kfree_rcu() overhead */ int live; + + const struct uverbs_type_alloc_action *type; }; struct ib_udata { diff --git a/include/rdma/uverbs_ioctl.h b/include/rdma/uverbs_ioctl.h new file mode 100644 index 0000000..903f6b3 --- /dev/null +++ b/include/rdma/uverbs_ioctl.h @@ -0,0 +1,60 @@ +/* + * Copyright (c) 2016, Mellanox Technologies inc. All rights reserved. + * + * This software is available to you under a choice of one of two + * licenses. You may choose to be licensed under the terms of the GNU + * General Public License (GPL) Version 2, available from the file + * COPYING in the main directory of this source tree, or the + * OpenIB.org BSD license below: + * + * Redistribution and use in source and binary forms, with or + * without modification, are permitted provided that the following + * conditions are met: + * + * - Redistributions of source code must retain the above + * copyright notice, this list of conditions and the following + * disclaimer. + * + * - Redistributions in binary form must reproduce the above + * copyright notice, this list of conditions and the following + * disclaimer in the documentation and/or other materials + * provided with the distribution. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, + * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND + * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS + * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN + * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN + * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE + * SOFTWARE. + */ + +#ifndef _UVERBS_IOCTL_ +#define _UVERBS_IOCTL_ + +#include <linux/kernel.h> + +enum uverbs_attr_type { + UVERBS_ATTR_TYPE_IDR, +}; + +enum uverbs_idr_access { + UVERBS_ACCESS_READ, + UVERBS_ACCESS_WRITE, + UVERBS_ACCESS_NEW, + UVERBS_ACCESS_DESTROY +}; + +struct uverbs_type_alloc_action; +typedef void (*free_type)(const struct uverbs_type_alloc_action *uobject_type, + struct ib_uobject *uobject); + +struct uverbs_type_alloc_action { + enum uverbs_attr_type type; + int order; + size_t obj_size; + free_type free_fn; +}; + +#endif