Message ID | 1467293971-25688-8-git-send-email-matanb@mellanox.com (mailing list archive) |
---|---|
State | RFC |
Headers | show |
On Thu, Jun 30, 2016 at 04:39:30PM +0300, Matan Barak wrote: > This patch presents some concept about lockless scheme. It's only > intended to open discussion and still misses some crucial parts. It isn't really lockless, is it? I gather the intent is to remove the per-uobj lock(s) and replace with a ucontext rwsem? > Every ib_uobject has a usecnt atomic variable. Single rwsem in > ib_uverbs_file (protect destruction of device from execution of > commands). > > For every command execution: > a. Check if device is available (as of today, with srcu) > b. Take down_read(&file->close_sem) > c. <User is now protected from closing the process and ib-dev > destruction> > d. When an object is used -> use uverbs_lock_object function > implemented in this patch. > i .If the object isn't available (-EBUSY) I'd like to see a detailed list of all the calls that change from blocking to EBUSY with this approach. I also think this should be stand-alone as part of the uobject rework series.. Again it seems like this benifits any approach. uverbs_lock_object is in an different patch in the series.. I'm always nervous when I see people create their own locks with atomics. Don't do that. That is a rw lock using try_lock. > e. Unlock with uverbs_unlock_object (implemented in this patch) > f. Release up_read(&file->close_sem) g. release srcu If the close_sem is always being held, then I suspect we can drop SRCU as well?? > ib_uobjects are lockless. If two commands are executed in > parallel and one need exclusive access (WRITE/DESTROY) -> one > command will fail. User-space needs to either synchronize or do > something productive with the failure. I think this can work for destroy, but I have a hard time seeing how it is going to be OK for WRITE/MODIFY/etc operations. eg rereg_mr holds the write lock, I'm not sure I'm comfortable seeing that be allowed to fail .. > @@ -308,8 +298,8 @@ ssize_t ib_uverbs_alloc_pd(struct ib_uverbs_file *file, > if (!uobj) > return -ENOMEM; > > - init_uobj(uobj, 0, file->ucontext, &pd_lock_class); > - down_write(&uobj->mutex); > + down_read(&file->close_sem); > + init_uobj(uobj, 0, file->ucontext); Seeing this pattern in your patches really makes me wonder.. What is the down_write even doing at that point??? uobj = kmalloc(sizeof *uobj, GFP_KERNEL); init_uobj(uobj, 0, file->ucontext, &pd_lock_class); down_write(&uobj->mutex); init_uobj doesn't add the memory to any lists or anything. How could another thread get access to the pointer and contend on the mutex? I guess it is because later we do: ret = idr_add_uobj(&ib_uverbs_pd_idr, uobj); While holding the lock. But that is a really ugly API. Nothing should be added to the idr until it is 100% initialized, so we don't need concurrency protection. What is the write lock doing at that point?? This whole flow should be simplified, which will make the locking requirements a lot clearer. uobj = uobj_alloc(file->ucontext, 0, sizeof(*uobj)); [.. full setup ..] uobj_activate(uobj); // live = 1, list_add_tail, idr_add resp.handle = uobj->id; copy_to_user(..); put_uobj(uobj); return ret; err1: uobj_deactivate(uobj); err2: put_obj(uobj); I would do the above sort of restructuring as another (mergable, even) patch, then rip out the locking. Also, I think the destruction should be equally cleaned up. If you introduce a patch with actual ops and a destroy function pointer per-type then a centralized destruct call can be implemented that does the correct lock, wrapping the functionc all. Again this would be a nice clean up to have distinct from the ioctl stuff... Jason -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 13/07/2016 20:16, Jason Gunthorpe wrote: > On Thu, Jun 30, 2016 at 04:39:30PM +0300, Matan Barak wrote: >> This patch presents some concept about lockless scheme. It's only >> intended to open discussion and still misses some crucial parts. > > It isn't really lockless, is it? I gather the intent is to remove the > per-uobj lock(s) and replace with a ucontext rwsem? > >> Every ib_uobject has a usecnt atomic variable. Single rwsem in >> ib_uverbs_file (protect destruction of device from execution of >> commands). >> >> For every command execution: >> a. Check if device is available (as of today, with srcu) >> b. Take down_read(&file->close_sem) >> c. <User is now protected from closing the process and ib-dev >> destruction> >> d. When an object is used -> use uverbs_lock_object function >> implemented in this patch. >> i .If the object isn't available (-EBUSY) > > I'd like to see a detailed list of all the calls that change from > blocking to EBUSY with this approach. > I'm not at the office right now, but AFAIK it's attach_mcast, detach_mcast, rereg_mr and all the destroy functions. > I also think this should be stand-alone as part of the uobject rework > series.. Again it seems like this benifits any approach. > Yeah, but since the new infrastructure is going to do the locks automatically for you, we'll ditch the locks in uverbs_cmd anyway. Therefore, I don't see a real benefit refactoring uverbs_cmd and then ditching all these changes. > uverbs_lock_object is in an different patch in the series.. > > I'm always nervous when I see people create their own locks with > atomics. Don't do that. That is a rw lock using try_lock. > Ok >> e. Unlock with uverbs_unlock_object (implemented in this patch) >> f. Release up_read(&file->close_sem) > > g. release srcu > > If the close_sem is always being held, then I suspect we can drop SRCU > as well?? > No, it's not the same thing. SRCU protects device removal from command execution. close_sem protects context destruction from command execution. You're correct that in the current state of the code, context is only closed when a file is closed or when a device is removed, so it's protected. However, if we want to add a destroy_context action - it becomes necessary. If we move to a per uverbs_file idr, contexts (either file context or client context), we still need to sync destruction of these contexts with carrying out actions. Since the uverbs_file outlives them, it's reasonable to put the lock there. >> ib_uobjects are lockless. If two commands are executed in >> parallel and one need exclusive access (WRITE/DESTROY) -> one >> command will fail. User-space needs to either synchronize or do >> something productive with the failure. > > I think this can work for destroy, but I have a hard time seeing how > it is going to be OK for WRITE/MODIFY/etc operations. > > eg rereg_mr holds the write lock, I'm not sure I'm comfortable seeing > that be allowed to fail .. > Currently, as far as I remember, modify isn't protected by the write lock anyway. Besides that, nothing stops the user from syncing the actions. If you do the synchronization in kernel, you need to make sure locks are taken in the exact same order for all commands to avoid deadlock in the kernel. This is more expensive than just letting the user do that. For example, if a user has a single thread to control his resources, why do we have to waste time or sorting his handles in kernel? >> @@ -308,8 +298,8 @@ ssize_t ib_uverbs_alloc_pd(struct ib_uverbs_file *file, >> if (!uobj) >> return -ENOMEM; >> >> - init_uobj(uobj, 0, file->ucontext, &pd_lock_class); >> - down_write(&uobj->mutex); >> + down_read(&file->close_sem); >> + init_uobj(uobj, 0, file->ucontext); > > Seeing this pattern in your patches really makes me wonder.. What is > the down_write even doing at that point??? > > uobj = kmalloc(sizeof *uobj, GFP_KERNEL); > init_uobj(uobj, 0, file->ucontext, &pd_lock_class); > down_write(&uobj->mutex); > > init_uobj doesn't add the memory to any lists or anything. How could > another thread get access to the pointer and contend on the mutex? > Our latest locking series (not posted yet) removes the mutex from the uobject. > I guess it is because later we do: > > ret = idr_add_uobj(&ib_uverbs_pd_idr, uobj); > > While holding the lock. But that is a really ugly API. Nothing should be > added to the idr until it is 100% initialized, so we don't need > concurrency protection. What is the write lock doing at that point?? > We assume the enable function can't fail. Since adding something to idr could fail, we preferred adding it first with live = 0 flag and then enabling it. > This whole flow should be simplified, which will make the locking > requirements a lot clearer. > > uobj = uobj_alloc(file->ucontext, 0, sizeof(*uobj)); > [.. full setup ..] > uobj_activate(uobj); // live = 1, list_add_tail, idr_add > > resp.handle = uobj->id; > copy_to_user(..); > > put_uobj(uobj); > return ret; > > err1: > uobj_deactivate(uobj); > err2: > put_obj(uobj); > I agree with the general schema, but I think idr_add should be done in uobj_alloc as it can fail. > I would do the above sort of restructuring as another (mergable, even) > patch, then rip out the locking. > This locking is done automatically using the new infrastructure. The goal is to remove all these locking semantics from the user handler. Therefore, I don't think we should refactor uverbs_cmd and then delete all locks from there. > Also, I think the destruction should be equally cleaned up. If you > introduce a patch with actual ops and a destroy function pointer > per-type then a centralized destruct call can be implemented that does > the correct lock, wrapping the functionc all. Again this would be a > nice clean up to have distinct from the ioctl stuff... > We have a free function per object type. The correct locking is indeed done by the wrapper. > Jason > Matan -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Thu, Jul 14, 2016 at 10:59:41AM +0300, Matan Barak wrote: > I'm not at the office right now, but AFAIK it's attach_mcast, detach_mcast, > rereg_mr and all the destroy functions. I don't think we can change those three, I would be too scared that would break existing apps that don't serialize them. I think we need to change the kernel locking for those three cases to rely on a different lock so that the modify style operations can not return EBUSY. That seems like it should be simple enough? Can we use the lock in the kobj or something? Why is it like that today anyhow? > >I also think this should be stand-alone as part of the uobject rework > >series.. Again it seems like this benifits any approach. > > Yeah, but since the new infrastructure is going to do the locks > automatically for you, we'll ditch the locks in uverbs_cmd anyway. > Therefore, I don't see a real benefit refactoring uverbs_cmd and then > ditching all these changes. We still have to go through a transitional approach here, to me that means we should move existing uverbs closer to the new approach (particularly in terms of locking and code sharing), trial that in real apps, and then add the new api on that tested basis. We can't just abandon uverbs, and we probably shouldn't try to do a flag day convert either. So I don't know what your plan is to 'ditch the locks' with the new infrastucture while maintaining the uverbs api.. My preference is to see smaller patch series with clear goals that can be applied - and 'rework uverbs locking' seems to fit that bill quite well. For instance, if we are introducing a uobject type scheme like this series does then it should be another stand alone series that doesn't try and introduce the ioctl stuff.. > No, it's not the same thing. SRCU protects device removal from command > execution. close_sem protects context destruction from command > execution. It is the same, device removal requires 'context destruction' as a precondition, so it should be able to ride on the same lock. The 'context destruction' for device removal is a bit special but it is still basically the same action as close(fd). > You're correct that in the current state of the code, context is only closed > when a file is closed or when a device is removed, so it's protected. > However, if we want to add a destroy_context action - it becomes necessary. What would destroy_context do? Why would we need that? > If we move to a per uverbs_file idr, contexts (either file context or client > context), we still need to sync destruction of these contexts with carrying > out actions. Since the uverbs_file outlives them, it's reasonable to put the > lock there. Are you proposing that all object destroy will be serialized? > Currently, as far as I remember, modify isn't protected by the write lock > anyway. Besides that, nothing stops the user from syncing the actions. You can't make that argument when dealing with compat. We can't change something like the locking requirements in user space. > >While holding the lock. But that is a really ugly API. Nothing should be > >added to the idr until it is 100% initialized, so we don't need > >concurrency protection. What is the write lock doing at that point?? > > We assume the enable function can't fail. Since adding something to idr > could fail, we preferred adding it first with live = 0 flag and then > enabling it. Why can't it fail? > > uobj = uobj_alloc(file->ucontext, 0, sizeof(*uobj)); > > [.. full setup ..] > > uobj_activate(uobj); // live = 1, list_add_tail, idr_add > > > > resp.handle = uobj->id; > > copy_to_user(..); > > > > put_uobj(uobj); > > return ret; > > > >err1: > > uobj_deactivate(uobj); > >err2: > > put_obj(uobj); > > > > I agree with the general schema, but I think idr_add should be done in > uobj_alloc as it can fail. In my version uobj_activate can fail, and we already have to have the goto-unwind infrastructure to cope with that (copy_to_user could fail), so I'm not sure why avoiding it would be helpful? I also wonder if the above flow is enough to remove live? That always seemed like a strange approach to me, most idr users would just use in-idr as 'live' ... > >I would do the above sort of restructuring as another (mergable, even) > >patch, then rip out the locking. > > This locking is done automatically using the new infrastructure. The goal is > to remove all these locking semantics from the user handler. > Therefore, I don't think we should refactor uverbs_cmd and then delete all > locks from there. What happens to uverbs_cmd then? As I said above we still need to keep it around and it still need to fit with the new infrastructure. I'd rather see a patch flow that slowly transforms uverbs_cmd to a point where adding the new ioctl becomes a simpler patch. There is obvoisly a lot of in-kernel infrastructure work to do. Jason -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 14/07/2016 20:00, Jason Gunthorpe wrote: > On Thu, Jul 14, 2016 at 10:59:41AM +0300, Matan Barak wrote: > >> I'm not at the office right now, but AFAIK it's attach_mcast, detach_mcast, >> rereg_mr and all the destroy functions. > > I don't think we can change those three, I would be too scared that > would break existing apps that don't serialize them. > > I think we need to change the kernel locking for those three cases to > rely on a different lock so that the modify style operations can not > return EBUSY. > > That seems like it should be simple enough? Can we use the lock in the > kobj or something? Why is it like that today anyhow? > Yeah, I think it'll be safer to use lower level locking (like in kobj). >>> I also think this should be stand-alone as part of the uobject rework >>> series.. Again it seems like this benifits any approach. >> >> Yeah, but since the new infrastructure is going to do the locks >> automatically for you, we'll ditch the locks in uverbs_cmd anyway. >> Therefore, I don't see a real benefit refactoring uverbs_cmd and then >> ditching all these changes. > > We still have to go through a transitional approach here, to me that > means we should move existing uverbs closer to the new approach > (particularly in terms of locking and code sharing), trial that in > real apps, and then add the new api on that tested basis. We can't > just abandon uverbs, and we probably shouldn't try to do a flag day > convert either. > Well, another option (not saying it's a better one) is to develop the new ABI in a separate branch and try real world applications using it (with a modified libibverbs of course). This shall give use a feel if things are mature enough. The last step could be adding a write->ioctl interface. That's qualified to "flag day" conversion in the main branch. Saying that, I see that advantages of a transitional approach. We need to balance between transitional approach and try avoiding write code that we know will be dumped anyway. Lets decide on the development phases here after we have a clear idea on how the new ABI is going to look like and we could all work towards this approach. > So I don't know what your plan is to 'ditch the locks' with the new > infrastucture while maintaining the uverbs api.. > Using a conversion layer of the old ABI to the new ABI. Move the current write locks to lower kobj level as you suggested. > My preference is to see smaller patch series with clear goals that can > be applied - and 'rework uverbs locking' seems to fit that bill quite > well. > > For instance, if we are introducing a uobject type scheme like this > series does then it should be another stand alone series that doesn't > try and introduce the ioctl stuff.. > Yeah, changes have a clear smaller goals and won't be changed dramatically as the ABI task is evolving should definitely go through a different patch-set. >> No, it's not the same thing. SRCU protects device removal from command >> execution. close_sem protects context destruction from command >> execution. > > It is the same, device removal requires 'context destruction' as > a precondition, so it should be able to ride on the same lock. > > The 'context destruction' for device removal is a bit special but it > is still basically the same action as close(fd). > rechecked - you're right, both calls ib_uverbs_cleanup_ucontext. >> You're correct that in the current state of the code, context is only closed >> when a file is closed or when a device is removed, so it's protected. >> However, if we want to add a destroy_context action - it becomes necessary. > > What would destroy_context do? Why would we need that? > If you want to break device != fd, you might want to destroy a device and start using the same fd with another device. >> If we move to a per uverbs_file idr, contexts (either file context or client >> context), we still need to sync destruction of these contexts with carrying >> out actions. Since the uverbs_file outlives them, it's reasonable to put the >> lock there. > > Are you proposing that all object destroy will be serialized? > When a process/device dies, I think it's reasonable to destroy objects in a serial way. I think it's reasonable that when a device is removed, you serialize this with handling commands on the same fd. Of course destroying unrelated objects could be done in parallel. >> Currently, as far as I remember, modify isn't protected by the write lock >> anyway. Besides that, nothing stops the user from syncing the actions. > > You can't make that argument when dealing with compat. We can't change > something like the locking requirements in user space. > Of course, but if currently modify_qp uses read_lock - there's no change in user space locking requirement. >>> While holding the lock. But that is a really ugly API. Nothing should be >>> added to the idr until it is 100% initialized, so we don't need >>> concurrency protection. What is the write lock doing at that point?? >> >> We assume the enable function can't fail. Since adding something to idr >> could fail, we preferred adding it first with live = 0 flag and then >> enabling it. > > Why can't it fail? > Because in our v2 (to be sent soon) we only adds it to the type list and set the live flag. >>> uobj = uobj_alloc(file->ucontext, 0, sizeof(*uobj)); >>> [.. full setup ..] >>> uobj_activate(uobj); // live = 1, list_add_tail, idr_add >>> >>> resp.handle = uobj->id; >>> copy_to_user(..); >>> >>> put_uobj(uobj); >>> return ret; >>> >>> err1: >>> uobj_deactivate(uobj); >>> err2: >>> put_obj(uobj); >>> >> >> I agree with the general schema, but I think idr_add should be done in >> uobj_alloc as it can fail. > > In my version uobj_activate can fail, and we already have to have the > goto-unwind infrastructure to cope with that (copy_to_user could > fail), so I'm not sure why avoiding it would be helpful? > I think it should be the other way around - copy_to_user is done by the handler and uobj_activate is done by the infrastructure. Therefore, the general doesn't know how to roll back the command and I want to guarantee anything it does after calling the handler succeed. > I also wonder if the above flow is enough to remove live? That always > seemed like a strange approach to me, most idr users would just use in-idr > as 'live' ... > >>> I would do the above sort of restructuring as another (mergable, even) >>> patch, then rip out the locking. >> >> This locking is done automatically using the new infrastructure. The goal is >> to remove all these locking semantics from the user handler. >> Therefore, I don't think we should refactor uverbs_cmd and then delete all >> locks from there. > > What happens to uverbs_cmd then? As I said above we still need to keep > it around and it still need to fit with the new infrastructure. I'd > rather see a patch flow that slowly transforms uverbs_cmd to a point > where adding the new ioctl becomes a simpler patch. > IMHO, uverbs_cmd should become write_format->ioctl_format convertor. We don't want to have two paths doing almost the same thing in two different ways. > There is obvoisly a lot of in-kernel infrastructure work to do. > Yeah, that's why we're still in the RFC phase. I think we need to agree on how we are going to do things and get the general picture clear. > Jason > Matan -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Tue, Jul 19, 2016 at 06:16:11PM +0300, Matan Barak wrote: > On 14/07/2016 20:00, Jason Gunthorpe wrote: > >That seems like it should be simple enough? Can we use the lock in the > >kobj or something? Why is it like that today anyhow? > > Yeah, I think it'll be safer to use lower level locking (like in kobj). That is something that could productively be a dedicated patch series.. > Well, another option (not saying it's a better one) is to develop the new > ABI in a separate branch and try real world applications using it (with a > modified libibverbs of course). This shall give use a feel if things are > mature enough. The last step could be adding a write->ioctl interface. > That's qualified to "flag day" conversion in the main branch. My advice is to not do that. I've observed a low success rate with such huge flag day projects. Transitional is more work, but it is the historical 'kernel way' > >So I don't know what your plan is to 'ditch the locks' with the new > >infrastucture while maintaining the uverbs api.. > Using a conversion layer of the old ABI to the new ABI. Move the current > write locks to lower kobj level as you suggested. Hm, that is a long way to go :) > If you want to break device != fd, you might want to destroy a device and > start using the same fd with another device. But that would return EBUSY like all our other destory calls if the subordinate objects have not been deleted? > >>> uobj = uobj_alloc(file->ucontext, 0, sizeof(*uobj)); > >>> [.. full setup ..] > >>> uobj_activate(uobj); // live = 1, list_add_tail, idr_add > >>> > >>> resp.handle = uobj->id; > >>> copy_to_user(..); > >>> > >>> put_uobj(uobj); > >>> return ret; > >>> > >>>err1: > >>> uobj_deactivate(uobj); > >>>err2: > >>> put_obj(uobj); > >>> > >> > >>I agree with the general schema, but I think idr_add should be done in > >>uobj_alloc as it can fail. > > > >In my version uobj_activate can fail, and we already have to have the > >goto-unwind infrastructure to cope with that (copy_to_user could > >fail), so I'm not sure why avoiding it would be helpful? > > > > I think it should be the other way around - copy_to_user is done by the > handler and uobj_activate is done by the infrastructure. Therefore, the > general doesn't know how to roll back the command and I want to guarantee > anything it does after calling the handler succeed. That just shuffles the requirement around. copy_to_user can always fail, and it always has to happen after the idr_add - so there must always be an unwind for it. Which means we can allow uobj_activate to fail too and use the same basic unwind. > IMHO, uverbs_cmd should become write_format->ioctl_format convertor. > We don't want to have two paths doing almost the same thing in two different > ways. Right, but that is a long way off... Jason -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 19/07/2016 23:44, Jason Gunthorpe wrote: > On Tue, Jul 19, 2016 at 06:16:11PM +0300, Matan Barak wrote: >> On 14/07/2016 20:00, Jason Gunthorpe wrote: >>> That seems like it should be simple enough? Can we use the lock in the >>> kobj or something? Why is it like that today anyhow? >> >> Yeah, I think it'll be safer to use lower level locking (like in kobj). > > That is something that could productively be a dedicated patch > series.. > >> Well, another option (not saying it's a better one) is to develop the new >> ABI in a separate branch and try real world applications using it (with a >> modified libibverbs of course). This shall give use a feel if things are >> mature enough. The last step could be adding a write->ioctl interface. >> That's qualified to "flag day" conversion in the main branch. > > My advice is to not do that. > > I've observed a low success rate with such huge flag day projects. > > Transitional is more work, but it is the historical 'kernel way' > Yeah, I'm aware of that. >>> So I don't know what your plan is to 'ditch the locks' with the new >>> infrastucture while maintaining the uverbs api.. > >> Using a conversion layer of the old ABI to the new ABI. Move the current >> write locks to lower kobj level as you suggested. > > Hm, that is a long way to go :) > >> If you want to break device != fd, you might want to destroy a device and >> start using the same fd with another device. > > But that would return EBUSY like all our other destory calls if the > subordinate objects have not been deleted? > Not sure -EBUSY, but an error of course. I think this should be done in a lower layer. I would like to avoid managing dependencies between object in the general infrastructure. It's the same as you can't destroy a PD when there's a bounded CQ (you should get some error for that). >>>>> uobj = uobj_alloc(file->ucontext, 0, sizeof(*uobj)); >>>>> [.. full setup ..] >>>>> uobj_activate(uobj); // live = 1, list_add_tail, idr_add >>>>> >>>>> resp.handle = uobj->id; >>>>> copy_to_user(..); >>>>> >>>>> put_uobj(uobj); >>>>> return ret; >>>>> >>>>> err1: >>>>> uobj_deactivate(uobj); >>>>> err2: >>>>> put_obj(uobj); >>>>> >>>> >>>> I agree with the general schema, but I think idr_add should be done in >>>> uobj_alloc as it can fail. >>> >>> In my version uobj_activate can fail, and we already have to have the >>> goto-unwind infrastructure to cope with that (copy_to_user could >>> fail), so I'm not sure why avoiding it would be helpful? >>> >> >> I think it should be the other way around - copy_to_user is done by the >> handler and uobj_activate is done by the infrastructure. Therefore, the >> general doesn't know how to roll back the command and I want to guarantee >> anything it does after calling the handler succeed. > > That just shuffles the requirement around. copy_to_user can always > fail, and it always has to happen after the idr_add - so there must > always be an unwind for it. Which means we can allow uobj_activate to > fail too and use the same basic unwind. > [Infra] validate params allocate uobjects and IDRs [handler] copy_from_user execute if (success) { copy_to_user return success } else { return failure } [infra contiune] if success enable_new_uobjs unlock_uobjs destroy_what_has_to_be_destroyed else delete_new_uobjs unlock_uobjs In that way, the infrastructure doesn't need to roll back commands. Only to roll back whatever it executed. >> IMHO, uverbs_cmd should become write_format->ioctl_format convertor. >> We don't want to have two paths doing almost the same thing in two different >> ways. > > Right, but that is a long way off... > > Jason > Matan -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Wed, Jul 20, 2016 at 10:54:55AM +0300, Matan Barak (External) wrote: > Not sure -EBUSY, but an error of course. I think this should be done in a > lower layer. I would like to avoid managing dependencies between object in > the general infrastructure. It's the same as you can't destroy a PD when > there's a bounded CQ (you should get some error for that). Yes, that is my thinking.. > >>I think it should be the other way around - copy_to_user is done by the > >>handler and uobj_activate is done by the infrastructure. Therefore, the > >>general doesn't know how to roll back the command and I want to guarantee > >>anything it does after calling the handler succeed. > > > >That just shuffles the requirement around. copy_to_user can always > >fail, and it always has to happen after the idr_add - so there must > >always be an unwind for it. Which means we can allow uobj_activate to > >fail too and use the same basic unwind. > > > > [Infra] > validate params > allocate uobjects and IDRs > [handler] > copy_from_user > execute > if (success) { > copy_to_user > return success > } else { > return failure > } > > [infra contiune] > if success > enable_new_uobjs > unlock_uobjs > destroy_what_has_to_be_destroyed > else > delete_new_uobjs > unlock_uobjs > > > In that way, the infrastructure doesn't need to roll back commands. Only to > roll back whatever it executed. I'd really prefer to avoid this kind of flow where the idr is installed at the top. The current code does that and uses 'live' to fix it, but that is not how idrs are typically used in the kernel, and getting rid of live is a noble goal, IMHO. I suggest you use the same basic flow but 'handler' returns a functional uobj that is *not* inside the idr, and the destroy of a functional uboj is simply handled by calling the types' destroy method, like any other case. To make this work better I suggest having a standard place in the ioctl response for the allocated uobj # and let the core code fill that in, that way we don't need to copy_to_user in the handler for the uobj #. If idr insertion fails, then call the type's destroy function, just like on a fd close or otherwise. No trouble for the core code. Jason -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
diff --git a/drivers/infiniband/core/uidr.c b/drivers/infiniband/core/uidr.c index 72c4c77..6018717 100644 --- a/drivers/infiniband/core/uidr.c +++ b/drivers/infiniband/core/uidr.c @@ -88,7 +88,7 @@ struct ib_uobject *uverbs_get_type_from_idr(struct uverbs_uobject_type *type, if (!uobj) return ERR_PTR(-ENOMEM); - init_uobj(uobj, 0, ucontext, &type->lock_class); + init_uobj(uobj, 0, ucontext); /* lock idr */ ret = ib_uverbs_uobject_add(uobj, type); @@ -213,10 +213,6 @@ static struct ib_uobject *idr_read_uobj(int id, struct ib_ucontext *context, if (!uobj) return NULL; - if (nested) - down_read_nested(&uobj->mutex, SINGLE_DEPTH_NESTING); - else - down_read(&uobj->mutex); if (!uobj->live) { put_uobj_read(uobj); return NULL; @@ -233,11 +229,8 @@ struct ib_uobject *idr_write_uobj(int id, struct ib_ucontext *context) if (!uobj) return NULL; - down_write(&uobj->mutex); - if (!uobj->live) { - put_uobj_write(uobj); + if (!uobj->live) return NULL; - } return uobj; } diff --git a/drivers/infiniband/core/uobject.c b/drivers/infiniband/core/uobject.c index c3d1098..ed862e8 100644 --- a/drivers/infiniband/core/uobject.c +++ b/drivers/infiniband/core/uobject.c @@ -179,13 +179,11 @@ void ib_uverbs_uobject_enable(struct ib_uobject *uobject) */ void init_uobj(struct ib_uobject *uobj, u64 user_handle, - struct ib_ucontext *context, struct uverbs_lock_class *c) + struct ib_ucontext *context) { uobj->user_handle = user_handle; uobj->context = context; kref_init(&uobj->ref); - init_rwsem(&uobj->mutex); - lockdep_set_class_and_name(&uobj->mutex, &c->key, c->name); uobj->live = 0; } @@ -201,12 +199,10 @@ void put_uobj(struct ib_uobject *uobj) void put_uobj_read(struct ib_uobject *uobj) { - up_read(&uobj->mutex); put_uobj(uobj); } void put_uobj_write(struct ib_uobject *uobj) { - up_write(&uobj->mutex); put_uobj(uobj); } diff --git a/drivers/infiniband/core/uobject.h b/drivers/infiniband/core/uobject.h index 13fdaef..3514a1b 100644 --- a/drivers/infiniband/core/uobject.h +++ b/drivers/infiniband/core/uobject.h @@ -48,12 +48,12 @@ struct uverbs_uobject_type { struct ib_uobject *uobject, struct ib_ucontext *ucontext); u16 obj_type; - struct uverbs_lock_class lock_class; }; /* embed in ucontext per type */ struct uverbs_uobject_list { struct uverbs_uobject_type *type; + /* TODO: replace with hash for faster access */ struct list_head list; struct list_head type_list; }; @@ -64,7 +64,7 @@ void ib_uverbs_uobject_remove(struct ib_uobject *uobject); void ib_uverbs_uobject_enable(struct ib_uobject *uobject); void init_uobj(struct ib_uobject *uobj, u64 user_handle, - struct ib_ucontext *context, struct uverbs_lock_class *c); + struct ib_ucontext *context); void release_uobj(struct kref *kref); void put_uobj(struct ib_uobject *uobj); diff --git a/drivers/infiniband/core/uverbs_cmd.c b/drivers/infiniband/core/uverbs_cmd.c index 22406fe..148e26e 100644 --- a/drivers/infiniband/core/uverbs_cmd.c +++ b/drivers/infiniband/core/uverbs_cmd.c @@ -46,16 +46,6 @@ #include "core_priv.h" -static struct uverbs_lock_class pd_lock_class = { .name = "PD-uobj" }; -static struct uverbs_lock_class mr_lock_class = { .name = "MR-uobj" }; -static struct uverbs_lock_class mw_lock_class = { .name = "MW-uobj" }; -static struct uverbs_lock_class cq_lock_class = { .name = "CQ-uobj" }; -static struct uverbs_lock_class qp_lock_class = { .name = "QP-uobj" }; -static struct uverbs_lock_class ah_lock_class = { .name = "AH-uobj" }; -static struct uverbs_lock_class srq_lock_class = { .name = "SRQ-uobj" }; -static struct uverbs_lock_class xrcd_lock_class = { .name = "XRCD-uobj" }; -static struct uverbs_lock_class rule_lock_class = { .name = "RULE-uobj" }; - ssize_t ib_uverbs_get_context(struct ib_uverbs_file *file, struct ib_device *ib_dev, const char __user *buf, @@ -308,8 +298,8 @@ ssize_t ib_uverbs_alloc_pd(struct ib_uverbs_file *file, if (!uobj) return -ENOMEM; - init_uobj(uobj, 0, file->ucontext, &pd_lock_class); - down_write(&uobj->mutex); + down_read(&file->close_sem); + init_uobj(uobj, 0, file->ucontext); pd = ib_dev->alloc_pd(ib_dev, file->ucontext, &udata); if (IS_ERR(pd)) { @@ -342,7 +332,7 @@ ssize_t ib_uverbs_alloc_pd(struct ib_uverbs_file *file, uobj->live = 1; - up_write(&uobj->mutex); + up_read(&file->close_sem); return in_len; @@ -543,9 +533,8 @@ ssize_t ib_uverbs_open_xrcd(struct ib_uverbs_file *file, goto err_tree_mutex_unlock; } - init_uobj(&obj->uobject, 0, file->ucontext, &xrcd_lock_class); - - down_write(&obj->uobject.mutex); + down_read(&file->close_sem); + init_uobj(&obj->uobject, 0, file->ucontext); if (!xrcd) { xrcd = ib_dev->alloc_xrcd(ib_dev, file->ucontext, &udata); @@ -595,7 +584,7 @@ ssize_t ib_uverbs_open_xrcd(struct ib_uverbs_file *file, mutex_unlock(&file->mutex); obj->uobject.live = 1; - up_write(&obj->uobject.mutex); + up_read(&file->close_sem); mutex_unlock(&file->device->xrcd_tree_mutex); return in_len; @@ -737,8 +726,8 @@ ssize_t ib_uverbs_reg_mr(struct ib_uverbs_file *file, if (!uobj) return -ENOMEM; - init_uobj(uobj, 0, file->ucontext, &mr_lock_class); - down_write(&uobj->mutex); + down_read(&file->close_sem); + init_uobj(uobj, 0, file->ucontext); pd = idr_read_pd(cmd.pd_handle, file->ucontext); if (!pd) { @@ -791,7 +780,7 @@ ssize_t ib_uverbs_reg_mr(struct ib_uverbs_file *file, uobj->live = 1; - up_write(&uobj->mutex); + up_read(&file->close_sem); return in_len; @@ -959,8 +948,8 @@ ssize_t ib_uverbs_alloc_mw(struct ib_uverbs_file *file, if (!uobj) return -ENOMEM; - init_uobj(uobj, 0, file->ucontext, &mw_lock_class); - down_write(&uobj->mutex); + down_read(&file->close_sem); + init_uobj(uobj, 0, file->ucontext); pd = idr_read_pd(cmd.pd_handle, file->ucontext); if (!pd) { @@ -1007,7 +996,7 @@ ssize_t ib_uverbs_alloc_mw(struct ib_uverbs_file *file, uobj->live = 1; - up_write(&uobj->mutex); + up_read(&file->close_sem); return in_len; @@ -1129,8 +1118,8 @@ static struct ib_ucq_object *create_cq(struct ib_uverbs_file *file, if (!obj) return ERR_PTR(-ENOMEM); - init_uobj(&obj->uobject, cmd->user_handle, file->ucontext, &cq_lock_class); - down_write(&obj->uobject.mutex); + down_read(&file->close_sem); + init_uobj(&obj->uobject, cmd->user_handle, file->ucontext); if (cmd->comp_channel >= 0) { ev_file = ib_uverbs_lookup_comp_file(cmd->comp_channel); @@ -1188,7 +1177,7 @@ static struct ib_ucq_object *create_cq(struct ib_uverbs_file *file, obj->uobject.live = 1; - up_write(&obj->uobject.mutex); + up_read(&file->close_sem); return obj; @@ -1530,9 +1519,8 @@ static int create_qp(struct ib_uverbs_file *file, if (!obj) return -ENOMEM; - init_uobj(&obj->uevent.uobject, cmd->user_handle, file->ucontext, - &qp_lock_class); - down_write(&obj->uevent.uobject.mutex); + down_read(&file->close_sem); + init_uobj(&obj->uevent.uobject, cmd->user_handle, file->ucontext); if (cmd->qp_type == IB_QPT_XRC_TGT) { xrcd = idr_read_xrcd(cmd->pd_handle, file->ucontext, @@ -1692,7 +1680,7 @@ static int create_qp(struct ib_uverbs_file *file, obj->uevent.uobject.live = 1; - up_write(&obj->uevent.uobject.mutex); + up_read(&file->close_sem); return 0; err_cb: @@ -1853,8 +1841,8 @@ ssize_t ib_uverbs_open_qp(struct ib_uverbs_file *file, if (!obj) return -ENOMEM; - init_uobj(&obj->uevent.uobject, cmd.user_handle, file->ucontext, &qp_lock_class); - down_write(&obj->uevent.uobject.mutex); + down_read(&file->close_sem); + init_uobj(&obj->uevent.uobject, cmd.user_handle, file->ucontext); xrcd = idr_read_xrcd(cmd.pd_handle, file->ucontext, &xrcd_uobj); if (!xrcd) { @@ -1904,7 +1892,7 @@ ssize_t ib_uverbs_open_qp(struct ib_uverbs_file *file, obj->uevent.uobject.live = 1; - up_write(&obj->uevent.uobject.mutex); + up_read(&file->close_sem); return in_len; @@ -2593,8 +2581,8 @@ ssize_t ib_uverbs_create_ah(struct ib_uverbs_file *file, if (!uobj) return -ENOMEM; - init_uobj(uobj, cmd.user_handle, file->ucontext, &ah_lock_class); - down_write(&uobj->mutex); + down_read(&file->close_sem); + init_uobj(uobj, cmd.user_handle, file->ucontext); pd = idr_read_pd(cmd.pd_handle, file->ucontext); if (!pd) { @@ -2644,7 +2632,7 @@ ssize_t ib_uverbs_create_ah(struct ib_uverbs_file *file, uobj->live = 1; - up_write(&uobj->mutex); + up_read(&file->close_sem); return in_len; @@ -2904,8 +2892,8 @@ int ib_uverbs_ex_create_flow(struct ib_uverbs_file *file, err = -ENOMEM; goto err_free_attr; } - init_uobj(uobj, 0, file->ucontext, &rule_lock_class); - down_write(&uobj->mutex); + down_read(&file->close_sem); + init_uobj(uobj, 0, file->ucontext); qp = idr_read_qp(cmd.qp_handle, file->ucontext); if (!qp) { @@ -2975,7 +2963,7 @@ int ib_uverbs_ex_create_flow(struct ib_uverbs_file *file, uobj->live = 1; - up_write(&uobj->mutex); + up_read(&file->close_sem); kfree(flow_attr); if (cmd.flow_attr.num_of_specs) kfree(kern_flow_attr); @@ -3055,8 +3043,8 @@ static int __uverbs_create_xsrq(struct ib_uverbs_file *file, if (!obj) return -ENOMEM; - init_uobj(&obj->uevent.uobject, cmd->user_handle, file->ucontext, &srq_lock_class); - down_write(&obj->uevent.uobject.mutex); + down_read(&file->close_sem); + init_uobj(&obj->uevent.uobject, cmd->user_handle, file->ucontext); if (cmd->srq_type == IB_SRQT_XRC) { attr.ext.xrc.xrcd = idr_read_xrcd(cmd->xrcd_handle, file->ucontext, &xrcd_uobj); @@ -3144,7 +3132,7 @@ static int __uverbs_create_xsrq(struct ib_uverbs_file *file, obj->uevent.uobject.live = 1; - up_write(&obj->uevent.uobject.mutex); + up_read(&file->close_sem); return 0; diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h index e402473..84d72d2 100644 --- a/include/rdma/ib_verbs.h +++ b/include/rdma/ib_verbs.h @@ -1356,7 +1356,6 @@ struct ib_uobject { int id; /* index into kernel idr */ struct kref ref; atomic_t usecnt; - struct rw_semaphore mutex; /* protects .live */ struct rcu_head rcu; /* kfree_rcu() overhead */ int live; /* List of object under uverbs_object_type */