diff mbox

[RFC,ABI,V1,7/8] RDMA/core: Change locking of ib_uobject

Message ID 1467293971-25688-8-git-send-email-matanb@mellanox.com (mailing list archive)
State RFC
Headers show

Commit Message

Matan Barak June 30, 2016, 1:39 p.m. UTC
This patch presents some concept about lockless scheme. It's only
intended to open discussion and still misses some crucial parts.

Every ib_uobject has a usecnt atomic variable. Single rwsem in
ib_uverbs_file (protect destruction of device from execution of
commands).

For every command execution:
  a. Check if device is available (as of today, with srcu)
  b. Take down_read(&file->close_sem)
  c. <User is now protected from closing the process and ib-dev
     destruction>
  d. When an object is used -> use uverbs_lock_object function
     implemented in this patch.
       i .If the object isn't available (-EBUSY)
  e. Unlock with uverbs_unlock_object (implemented in this patch)
  f. Release up_read(&file->close_sem)

Disassociate/process destruction:
  * In disassociate, do thee following for every process
  1. Grab down_write(&file->close_sem)
  2. Release all objects from context, ordered by type list (call free
     function the driver has specified)
  3. Release up_write(&file->close_sem)

ib_uobjects are lockless. If two commands are executed in
parallel and one need exclusive access (WRITE/DESTROY) -> one
command will fail. User-space needs to either synchronize or do
something productive with the failure.

Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Haggai Eran <haggaie@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
---
 drivers/infiniband/core/uidr.c       | 11 +-----
 drivers/infiniband/core/uobject.c    |  6 +--
 drivers/infiniband/core/uobject.h    |  4 +-
 drivers/infiniband/core/uverbs_cmd.c | 72 +++++++++++++++---------------------
 include/rdma/ib_verbs.h              |  1 -
 5 files changed, 35 insertions(+), 59 deletions(-)

Comments

Jason Gunthorpe July 13, 2016, 5:16 p.m. UTC | #1
On Thu, Jun 30, 2016 at 04:39:30PM +0300, Matan Barak wrote:
> This patch presents some concept about lockless scheme. It's only
> intended to open discussion and still misses some crucial parts.

It isn't really lockless, is it? I gather the intent is to remove the
per-uobj lock(s) and replace with a ucontext rwsem?

> Every ib_uobject has a usecnt atomic variable. Single rwsem in
> ib_uverbs_file (protect destruction of device from execution of
> commands).
> 
> For every command execution:
>   a. Check if device is available (as of today, with srcu)
>   b. Take down_read(&file->close_sem)
>   c. <User is now protected from closing the process and ib-dev
>      destruction>
>   d. When an object is used -> use uverbs_lock_object function
>      implemented in this patch.
>        i .If the object isn't available (-EBUSY)

I'd like to see a detailed list of all the calls that change from
blocking to EBUSY with this approach.

I also think this should be stand-alone as part of the uobject rework
series.. Again it seems like this benifits any approach.

uverbs_lock_object is in an different patch in the series..

I'm always nervous when I see people create their own locks with
atomics. Don't do that. That is a rw lock using try_lock.

>   e. Unlock with uverbs_unlock_object (implemented in this patch)
>   f. Release up_read(&file->close_sem)

g. release srcu

If the close_sem is always being held, then I suspect we can drop SRCU
as well??

> ib_uobjects are lockless. If two commands are executed in
> parallel and one need exclusive access (WRITE/DESTROY) -> one
> command will fail. User-space needs to either synchronize or do
> something productive with the failure.

I think this can work for destroy, but I have a hard time seeing how
it is going to be OK for WRITE/MODIFY/etc operations.

eg rereg_mr holds the write lock, I'm not sure I'm comfortable seeing
that be allowed to fail ..

> @@ -308,8 +298,8 @@ ssize_t ib_uverbs_alloc_pd(struct ib_uverbs_file *file,
>  	if (!uobj)
>  		return -ENOMEM;
>  
> -	init_uobj(uobj, 0, file->ucontext, &pd_lock_class);
> -	down_write(&uobj->mutex);
> +	down_read(&file->close_sem);
> +	init_uobj(uobj, 0, file->ucontext);

Seeing this pattern in your patches really makes me wonder.. What is
the down_write even doing at that point???

        uobj = kmalloc(sizeof *uobj, GFP_KERNEL);
        init_uobj(uobj, 0, file->ucontext, &pd_lock_class);
        down_write(&uobj->mutex);

init_uobj doesn't add the memory to any lists or anything. How could
another thread get access to the pointer and contend on the mutex?

I guess it is because later we do:

        ret = idr_add_uobj(&ib_uverbs_pd_idr, uobj);

While holding the lock. But that is a really ugly API. Nothing should be
added to the idr until it is 100% initialized, so we don't need
concurrency protection. What is the write lock doing at that point??

This whole flow should be simplified, which will make the locking
requirements a lot clearer.

  uobj = uobj_alloc(file->ucontext, 0, sizeof(*uobj));
  [.. full setup ..]
  uobj_activate(uobj); // live = 1, list_add_tail, idr_add

  resp.handle = uobj->id;
  copy_to_user(..);

  put_uobj(uobj);
  return ret;

err1:
  uobj_deactivate(uobj);
err2:
  put_obj(uobj);

I would do the above sort of restructuring as another (mergable, even)
patch, then rip out the locking.

Also, I think the destruction should be equally cleaned up. If you
introduce a patch with actual ops and a destroy function pointer
per-type then a centralized destruct call can be implemented that does
the correct lock, wrapping the functionc all. Again this would be a
nice clean up to have distinct from the ioctl stuff...

Jason
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Matan Barak July 14, 2016, 7:59 a.m. UTC | #2
On 13/07/2016 20:16, Jason Gunthorpe wrote:
> On Thu, Jun 30, 2016 at 04:39:30PM +0300, Matan Barak wrote:
>> This patch presents some concept about lockless scheme. It's only
>> intended to open discussion and still misses some crucial parts.
>
> It isn't really lockless, is it? I gather the intent is to remove the
> per-uobj lock(s) and replace with a ucontext rwsem?
>
>> Every ib_uobject has a usecnt atomic variable. Single rwsem in
>> ib_uverbs_file (protect destruction of device from execution of
>> commands).
>>
>> For every command execution:
>>   a. Check if device is available (as of today, with srcu)
>>   b. Take down_read(&file->close_sem)
>>   c. <User is now protected from closing the process and ib-dev
>>      destruction>
>>   d. When an object is used -> use uverbs_lock_object function
>>      implemented in this patch.
>>        i .If the object isn't available (-EBUSY)
>
> I'd like to see a detailed list of all the calls that change from
> blocking to EBUSY with this approach.
>

I'm not at the office right now, but AFAIK it's attach_mcast, 
detach_mcast, rereg_mr and all the destroy functions.

> I also think this should be stand-alone as part of the uobject rework
> series.. Again it seems like this benifits any approach.
>

Yeah, but since the new infrastructure is going to do the locks 
automatically for you, we'll ditch the locks in uverbs_cmd anyway. 
Therefore, I don't see a real benefit refactoring uverbs_cmd and then 
ditching all these changes.

> uverbs_lock_object is in an different patch in the series..
>
> I'm always nervous when I see people create their own locks with
> atomics. Don't do that. That is a rw lock using try_lock.
>

Ok

>>   e. Unlock with uverbs_unlock_object (implemented in this patch)
>>   f. Release up_read(&file->close_sem)
>
> g. release srcu
>
> If the close_sem is always being held, then I suspect we can drop SRCU
> as well??
>

No, it's not the same thing. SRCU protects device removal from command 
execution. close_sem protects context destruction from command execution.
You're correct that in the current state of the code, context is only 
closed when a file is closed or when a device is removed, so it's 
protected. However, if we want to add a destroy_context action - it 
becomes necessary.

If we move to a per uverbs_file idr, contexts (either file context or 
client context), we still need to sync destruction of these contexts 
with carrying out actions. Since the uverbs_file outlives them, it's 
reasonable to put the lock there.

>> ib_uobjects are lockless. If two commands are executed in
>> parallel and one need exclusive access (WRITE/DESTROY) -> one
>> command will fail. User-space needs to either synchronize or do
>> something productive with the failure.
>
> I think this can work for destroy, but I have a hard time seeing how
> it is going to be OK for WRITE/MODIFY/etc operations.
>
> eg rereg_mr holds the write lock, I'm not sure I'm comfortable seeing
> that be allowed to fail ..
>

Currently, as far as I remember, modify isn't protected by the write 
lock anyway. Besides that, nothing stops the user from syncing the actions.

If you do the synchronization in kernel, you need to make sure locks are 
taken in the exact same order for all commands to avoid deadlock in the 
kernel. This is more expensive than just letting the user do that.
For example, if a user has a single thread to control his resources, why 
do we have to waste time or sorting his handles in kernel?

>> @@ -308,8 +298,8 @@ ssize_t ib_uverbs_alloc_pd(struct ib_uverbs_file *file,
>>  	if (!uobj)
>>  		return -ENOMEM;
>>
>> -	init_uobj(uobj, 0, file->ucontext, &pd_lock_class);
>> -	down_write(&uobj->mutex);
>> +	down_read(&file->close_sem);
>> +	init_uobj(uobj, 0, file->ucontext);
>
> Seeing this pattern in your patches really makes me wonder.. What is
> the down_write even doing at that point???
>
>         uobj = kmalloc(sizeof *uobj, GFP_KERNEL);
>         init_uobj(uobj, 0, file->ucontext, &pd_lock_class);
>         down_write(&uobj->mutex);
>
> init_uobj doesn't add the memory to any lists or anything. How could
> another thread get access to the pointer and contend on the mutex?
>

Our latest locking series (not posted yet) removes the mutex from the 
uobject.

> I guess it is because later we do:
>
>         ret = idr_add_uobj(&ib_uverbs_pd_idr, uobj);
>
> While holding the lock. But that is a really ugly API. Nothing should be
> added to the idr until it is 100% initialized, so we don't need
> concurrency protection. What is the write lock doing at that point??
>

We assume the enable function can't fail. Since adding something to idr 
could fail, we preferred adding it first with live = 0 flag and then 
enabling it.

> This whole flow should be simplified, which will make the locking
> requirements a lot clearer.
>
>   uobj = uobj_alloc(file->ucontext, 0, sizeof(*uobj));
>   [.. full setup ..]
>   uobj_activate(uobj); // live = 1, list_add_tail, idr_add
>
>   resp.handle = uobj->id;
>   copy_to_user(..);
>
>   put_uobj(uobj);
>   return ret;
>
> err1:
>   uobj_deactivate(uobj);
> err2:
>   put_obj(uobj);
>

I agree with the general schema, but I think idr_add should be done in 
uobj_alloc as it can fail.

> I would do the above sort of restructuring as another (mergable, even)
> patch, then rip out the locking.
>

This locking is done automatically using the new infrastructure. The 
goal is to remove all these locking semantics from the user handler.
Therefore, I don't think we should refactor uverbs_cmd and then delete 
all locks from there.

> Also, I think the destruction should be equally cleaned up. If you
> introduce a patch with actual ops and a destroy function pointer
> per-type then a centralized destruct call can be implemented that does
> the correct lock, wrapping the functionc all. Again this would be a
> nice clean up to have distinct from the ioctl stuff...
>

We have a free function per object type. The correct locking is indeed 
done by the wrapper.

> Jason
>

Matan
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Jason Gunthorpe July 14, 2016, 5 p.m. UTC | #3
On Thu, Jul 14, 2016 at 10:59:41AM +0300, Matan Barak wrote:

> I'm not at the office right now, but AFAIK it's attach_mcast, detach_mcast,
> rereg_mr and all the destroy functions.

I don't think we can change those three, I would be too scared that
would break existing apps that don't serialize them.

I think we need to change the kernel locking for those three cases to
rely on a different lock so that the modify style operations can not
return EBUSY.

That seems like it should be simple enough? Can we use the lock in the
kobj or something? Why is it like that today anyhow?

> >I also think this should be stand-alone as part of the uobject rework
> >series.. Again it seems like this benifits any approach.
> 
> Yeah, but since the new infrastructure is going to do the locks
> automatically for you, we'll ditch the locks in uverbs_cmd anyway.
> Therefore, I don't see a real benefit refactoring uverbs_cmd and then
> ditching all these changes.

We still have to go through a transitional approach here, to me that
means we should move existing uverbs closer to the new approach
(particularly in terms of locking and code sharing), trial that in
real apps, and then add the new api on that tested basis. We can't
just abandon uverbs, and we probably shouldn't try to do a flag day
convert either.

So I don't know what your plan is to 'ditch the locks' with the new
infrastucture while maintaining the uverbs api..

My preference is to see smaller patch series with clear goals that can
be applied - and 'rework uverbs locking' seems to fit that bill quite
well.

For instance, if we are introducing a uobject type scheme like this
series does then it should be another stand alone series that doesn't
try and introduce the ioctl stuff..

> No, it's not the same thing. SRCU protects device removal from command
> execution. close_sem protects context destruction from command
> execution.

It is the same, device removal requires 'context destruction' as
a precondition, so it should be able to ride on the same lock.

The 'context destruction' for device removal is a bit special but it
is still basically the same action as close(fd).

> You're correct that in the current state of the code, context is only closed
> when a file is closed or when a device is removed, so it's protected.
> However, if we want to add a destroy_context action - it becomes necessary.

What would destroy_context do? Why would we need that?

> If we move to a per uverbs_file idr, contexts (either file context or client
> context), we still need to sync destruction of these contexts with carrying
> out actions. Since the uverbs_file outlives them, it's reasonable to put the
> lock there.

Are you proposing that all object destroy will be serialized?

> Currently, as far as I remember, modify isn't protected by the write lock
> anyway. Besides that, nothing stops the user from syncing the actions.

You can't make that argument when dealing with compat. We can't change
something like the locking requirements in user space.

> >While holding the lock. But that is a really ugly API. Nothing should be
> >added to the idr until it is 100% initialized, so we don't need
> >concurrency protection. What is the write lock doing at that point??
> 
> We assume the enable function can't fail. Since adding something to idr
> could fail, we preferred adding it first with live = 0 flag and then
> enabling it.

Why can't it fail?

> >  uobj = uobj_alloc(file->ucontext, 0, sizeof(*uobj));
> >  [.. full setup ..]
> >  uobj_activate(uobj); // live = 1, list_add_tail, idr_add
> >
> >  resp.handle = uobj->id;
> >  copy_to_user(..);
> >
> >  put_uobj(uobj);
> >  return ret;
> >
> >err1:
> >  uobj_deactivate(uobj);
> >err2:
> >  put_obj(uobj);
> >
> 
> I agree with the general schema, but I think idr_add should be done in
> uobj_alloc as it can fail.

In my version uobj_activate can fail, and we already have to have the
goto-unwind infrastructure to cope with that (copy_to_user could
fail), so I'm not sure why avoiding it would be helpful?

I also wonder if the above flow is enough to remove live? That always
seemed like a strange approach to me, most idr users would just use in-idr
as 'live' ...

> >I would do the above sort of restructuring as another (mergable, even)
> >patch, then rip out the locking.
> 
> This locking is done automatically using the new infrastructure. The goal is
> to remove all these locking semantics from the user handler.
> Therefore, I don't think we should refactor uverbs_cmd and then delete all
> locks from there.

What happens to uverbs_cmd then? As I said above we still need to keep
it around and it still need to fit with the new infrastructure. I'd
rather see a patch flow that slowly transforms uverbs_cmd to a point
where adding the new ioctl becomes a simpler patch.

There is obvoisly a lot of in-kernel infrastructure work to do.

Jason
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Matan Barak July 19, 2016, 3:16 p.m. UTC | #4
On 14/07/2016 20:00, Jason Gunthorpe wrote:
> On Thu, Jul 14, 2016 at 10:59:41AM +0300, Matan Barak wrote:
>
>> I'm not at the office right now, but AFAIK it's attach_mcast, detach_mcast,
>> rereg_mr and all the destroy functions.
>
> I don't think we can change those three, I would be too scared that
> would break existing apps that don't serialize them.
>
> I think we need to change the kernel locking for those three cases to
> rely on a different lock so that the modify style operations can not
> return EBUSY.
>
> That seems like it should be simple enough? Can we use the lock in the
> kobj or something? Why is it like that today anyhow?
>

Yeah, I think it'll be safer to use lower level locking (like in kobj).

>>> I also think this should be stand-alone as part of the uobject rework
>>> series.. Again it seems like this benifits any approach.
>>
>> Yeah, but since the new infrastructure is going to do the locks
>> automatically for you, we'll ditch the locks in uverbs_cmd anyway.
>> Therefore, I don't see a real benefit refactoring uverbs_cmd and then
>> ditching all these changes.
>
> We still have to go through a transitional approach here, to me that
> means we should move existing uverbs closer to the new approach
> (particularly in terms of locking and code sharing), trial that in
> real apps, and then add the new api on that tested basis. We can't
> just abandon uverbs, and we probably shouldn't try to do a flag day
> convert either.
>

Well, another option (not saying it's a better one) is to develop the 
new ABI in a separate branch and try real world applications using it 
(with a modified libibverbs of course). This shall give use a feel if 
things are mature enough. The last step could be adding a write->ioctl 
interface. That's qualified to "flag day" conversion in the main branch.

Saying that, I see that advantages of a transitional approach. We need 
to balance between transitional approach and try avoiding write code 
that we know will be dumped anyway.

Lets decide on the development phases here after we have a clear idea on 
how the new ABI is going to look like and we could all work towards this 
approach.

> So I don't know what your plan is to 'ditch the locks' with the new
> infrastucture while maintaining the uverbs api..
>

Using a conversion layer of the old ABI to the new ABI. Move the current 
write locks to lower kobj level as you suggested.

> My preference is to see smaller patch series with clear goals that can
> be applied - and 'rework uverbs locking' seems to fit that bill quite
> well.
>
> For instance, if we are introducing a uobject type scheme like this
> series does then it should be another stand alone series that doesn't
> try and introduce the ioctl stuff..
>

Yeah, changes have a clear smaller goals and won't be changed 
dramatically as the ABI task is evolving should definitely go through a 
different patch-set.

>> No, it's not the same thing. SRCU protects device removal from command
>> execution. close_sem protects context destruction from command
>> execution.
>
> It is the same, device removal requires 'context destruction' as
> a precondition, so it should be able to ride on the same lock.
>
> The 'context destruction' for device removal is a bit special but it
> is still basically the same action as close(fd).
>

rechecked - you're right, both calls ib_uverbs_cleanup_ucontext.

>> You're correct that in the current state of the code, context is only closed
>> when a file is closed or when a device is removed, so it's protected.
>> However, if we want to add a destroy_context action - it becomes necessary.
>
> What would destroy_context do? Why would we need that?
>

If you want to break device != fd, you might want to destroy a device 
and start using the same fd with another device.

>> If we move to a per uverbs_file idr, contexts (either file context or client
>> context), we still need to sync destruction of these contexts with carrying
>> out actions. Since the uverbs_file outlives them, it's reasonable to put the
>> lock there.
>
> Are you proposing that all object destroy will be serialized?
>

When a process/device dies, I think it's reasonable to destroy objects 
in a serial way. I think it's reasonable that when a device is removed, 
you serialize this with handling commands on the same fd. Of course 
destroying unrelated objects could be done in parallel.

>> Currently, as far as I remember, modify isn't protected by the write lock
>> anyway. Besides that, nothing stops the user from syncing the actions.
>
> You can't make that argument when dealing with compat. We can't change
> something like the locking requirements in user space.
>

Of course, but if currently modify_qp uses read_lock - there's no change 
in user space locking requirement.

>>> While holding the lock. But that is a really ugly API. Nothing should be
>>> added to the idr until it is 100% initialized, so we don't need
>>> concurrency protection. What is the write lock doing at that point??
>>
>> We assume the enable function can't fail. Since adding something to idr
>> could fail, we preferred adding it first with live = 0 flag and then
>> enabling it.
>
> Why can't it fail?
>

Because in our v2 (to be sent soon) we only adds it to the type list and 
set the live flag.

>>>  uobj = uobj_alloc(file->ucontext, 0, sizeof(*uobj));
>>>  [.. full setup ..]
>>>  uobj_activate(uobj); // live = 1, list_add_tail, idr_add
>>>
>>>  resp.handle = uobj->id;
>>>  copy_to_user(..);
>>>
>>>  put_uobj(uobj);
>>>  return ret;
>>>
>>> err1:
>>>  uobj_deactivate(uobj);
>>> err2:
>>>  put_obj(uobj);
>>>
>>
>> I agree with the general schema, but I think idr_add should be done in
>> uobj_alloc as it can fail.
>
> In my version uobj_activate can fail, and we already have to have the
> goto-unwind infrastructure to cope with that (copy_to_user could
> fail), so I'm not sure why avoiding it would be helpful?
>

I think it should be the other way around - copy_to_user is done by the 
handler and uobj_activate is done by the infrastructure. Therefore, the 
general doesn't know how to roll back the command and I want to 
guarantee anything it does after calling the handler succeed.

> I also wonder if the above flow is enough to remove live? That always
> seemed like a strange approach to me, most idr users would just use in-idr
> as 'live' ...
>
>>> I would do the above sort of restructuring as another (mergable, even)
>>> patch, then rip out the locking.
>>
>> This locking is done automatically using the new infrastructure. The goal is
>> to remove all these locking semantics from the user handler.
>> Therefore, I don't think we should refactor uverbs_cmd and then delete all
>> locks from there.
>
> What happens to uverbs_cmd then? As I said above we still need to keep
> it around and it still need to fit with the new infrastructure. I'd
> rather see a patch flow that slowly transforms uverbs_cmd to a point
> where adding the new ioctl becomes a simpler patch.
>

IMHO, uverbs_cmd should become write_format->ioctl_format convertor.
We don't want to have two paths doing almost the same thing in two 
different ways.

> There is obvoisly a lot of in-kernel infrastructure work to do.
>

Yeah, that's why we're still in the RFC phase. I think we need to agree 
on how we are going to do things and get the general picture clear.

> Jason
>

Matan
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Jason Gunthorpe July 19, 2016, 8:44 p.m. UTC | #5
On Tue, Jul 19, 2016 at 06:16:11PM +0300, Matan Barak wrote:
> On 14/07/2016 20:00, Jason Gunthorpe wrote:
> >That seems like it should be simple enough? Can we use the lock in the
> >kobj or something? Why is it like that today anyhow?
> 
> Yeah, I think it'll be safer to use lower level locking (like in kobj).

That is something that could productively be a dedicated patch
series..

> Well, another option (not saying it's a better one) is to develop the new
> ABI in a separate branch and try real world applications using it (with a
> modified libibverbs of course). This shall give use a feel if things are
> mature enough. The last step could be adding a write->ioctl interface.
> That's qualified to "flag day" conversion in the main branch.

My advice is to not do that.

I've observed a low success rate with such huge flag day projects.

Transitional is more work, but it is the historical 'kernel way'

> >So I don't know what your plan is to 'ditch the locks' with the new
> >infrastucture while maintaining the uverbs api..

> Using a conversion layer of the old ABI to the new ABI. Move the current
> write locks to lower kobj level as you suggested.

Hm, that is a long way to go :)

> If you want to break device != fd, you might want to destroy a device and
> start using the same fd with another device.

But that would return EBUSY like all our other destory calls if the
subordinate objects have not been deleted?

> >>> uobj = uobj_alloc(file->ucontext, 0, sizeof(*uobj));
> >>> [.. full setup ..]
> >>> uobj_activate(uobj); // live = 1, list_add_tail, idr_add
> >>>
> >>> resp.handle = uobj->id;
> >>> copy_to_user(..);
> >>>
> >>> put_uobj(uobj);
> >>> return ret;
> >>>
> >>>err1:
> >>> uobj_deactivate(uobj);
> >>>err2:
> >>> put_obj(uobj);
> >>>
> >>
> >>I agree with the general schema, but I think idr_add should be done in
> >>uobj_alloc as it can fail.
> >
> >In my version uobj_activate can fail, and we already have to have the
> >goto-unwind infrastructure to cope with that (copy_to_user could
> >fail), so I'm not sure why avoiding it would be helpful?
> >
> 
> I think it should be the other way around - copy_to_user is done by the
> handler and uobj_activate is done by the infrastructure. Therefore, the
> general doesn't know how to roll back the command and I want to guarantee
> anything it does after calling the handler succeed.

That just shuffles the requirement around. copy_to_user can always
fail, and it always has to happen after the idr_add - so there must
always be an unwind for it. Which means we can allow uobj_activate to
fail too and use the same basic unwind.

> IMHO, uverbs_cmd should become write_format->ioctl_format convertor.
> We don't want to have two paths doing almost the same thing in two different
> ways.

Right, but that is a long way off...

Jason
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Matan Barak July 20, 2016, 7:54 a.m. UTC | #6
On 19/07/2016 23:44, Jason Gunthorpe wrote:
> On Tue, Jul 19, 2016 at 06:16:11PM +0300, Matan Barak wrote:
>> On 14/07/2016 20:00, Jason Gunthorpe wrote:
>>> That seems like it should be simple enough? Can we use the lock in the
>>> kobj or something? Why is it like that today anyhow?
>>
>> Yeah, I think it'll be safer to use lower level locking (like in kobj).
>
> That is something that could productively be a dedicated patch
> series..
>
>> Well, another option (not saying it's a better one) is to develop the new
>> ABI in a separate branch and try real world applications using it (with a
>> modified libibverbs of course). This shall give use a feel if things are
>> mature enough. The last step could be adding a write->ioctl interface.
>> That's qualified to "flag day" conversion in the main branch.
>
> My advice is to not do that.
>
> I've observed a low success rate with such huge flag day projects.
>
> Transitional is more work, but it is the historical 'kernel way'
>

Yeah, I'm aware of that.

>>> So I don't know what your plan is to 'ditch the locks' with the new
>>> infrastucture while maintaining the uverbs api..
>
>> Using a conversion layer of the old ABI to the new ABI. Move the current
>> write locks to lower kobj level as you suggested.
>
> Hm, that is a long way to go :)
>
>> If you want to break device != fd, you might want to destroy a device and
>> start using the same fd with another device.
>
> But that would return EBUSY like all our other destory calls if the
> subordinate objects have not been deleted?
>

Not sure -EBUSY, but an error of course. I think this should be done in 
a lower layer. I would like to avoid managing dependencies between 
object in the general infrastructure. It's the same as you can't destroy 
a PD when there's a bounded CQ (you should get some error for that).

>>>>> uobj = uobj_alloc(file->ucontext, 0, sizeof(*uobj));
>>>>> [.. full setup ..]
>>>>> uobj_activate(uobj); // live = 1, list_add_tail, idr_add
>>>>>
>>>>> resp.handle = uobj->id;
>>>>> copy_to_user(..);
>>>>>
>>>>> put_uobj(uobj);
>>>>> return ret;
>>>>>
>>>>> err1:
>>>>> uobj_deactivate(uobj);
>>>>> err2:
>>>>> put_obj(uobj);
>>>>>
>>>>
>>>> I agree with the general schema, but I think idr_add should be done in
>>>> uobj_alloc as it can fail.
>>>
>>> In my version uobj_activate can fail, and we already have to have the
>>> goto-unwind infrastructure to cope with that (copy_to_user could
>>> fail), so I'm not sure why avoiding it would be helpful?
>>>
>>
>> I think it should be the other way around - copy_to_user is done by the
>> handler and uobj_activate is done by the infrastructure. Therefore, the
>> general doesn't know how to roll back the command and I want to guarantee
>> anything it does after calling the handler succeed.
>
> That just shuffles the requirement around. copy_to_user can always
> fail, and it always has to happen after the idr_add - so there must
> always be an unwind for it. Which means we can allow uobj_activate to
> fail too and use the same basic unwind.
>

[Infra]
validate params
allocate uobjects and IDRs
		[handler]
			copy_from_user
			execute
			if (success) {
				copy_to_user
				return success
			} else {
				return failure
			}

[infra contiune]
if success
	enable_new_uobjs
	unlock_uobjs
	destroy_what_has_to_be_destroyed
else
	delete_new_uobjs
	unlock_uobjs


In that way, the infrastructure doesn't need to roll back commands. Only 
to roll back whatever it executed.

>> IMHO, uverbs_cmd should become write_format->ioctl_format convertor.
>> We don't want to have two paths doing almost the same thing in two different
>> ways.
>
> Right, but that is a long way off...
>
> Jason
>

Matan
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Jason Gunthorpe July 20, 2016, 5:56 p.m. UTC | #7
On Wed, Jul 20, 2016 at 10:54:55AM +0300, Matan Barak (External) wrote:
> Not sure -EBUSY, but an error of course. I think this should be done in a
> lower layer. I would like to avoid managing dependencies between object in
> the general infrastructure. It's the same as you can't destroy a PD when
> there's a bounded CQ (you should get some error for that).

Yes, that is my thinking..
> >>I think it should be the other way around - copy_to_user is done by the
> >>handler and uobj_activate is done by the infrastructure. Therefore, the
> >>general doesn't know how to roll back the command and I want to guarantee
> >>anything it does after calling the handler succeed.
> >
> >That just shuffles the requirement around. copy_to_user can always
> >fail, and it always has to happen after the idr_add - so there must
> >always be an unwind for it. Which means we can allow uobj_activate to
> >fail too and use the same basic unwind.
> >
> 
> [Infra]
> validate params
> allocate uobjects and IDRs
> 		[handler]
> 			copy_from_user
> 			execute
> 			if (success) {
> 				copy_to_user
> 				return success
> 			} else {
> 				return failure
> 			}
> 
> [infra contiune]
> if success
> 	enable_new_uobjs
> 	unlock_uobjs
> 	destroy_what_has_to_be_destroyed
> else
> 	delete_new_uobjs
> 	unlock_uobjs
> 
> 
> In that way, the infrastructure doesn't need to roll back commands. Only to
> roll back whatever it executed.

I'd really prefer to avoid this kind of flow where the idr is
installed at the top. The current code does that and uses 'live' to
fix it, but that is not how idrs are typically used in the kernel, and
getting rid of live is a noble goal, IMHO.

I suggest you use the same basic flow but 'handler' returns a
functional uobj that is *not* inside the idr, and the destroy of a
functional uboj is simply handled by calling the types' destroy
method, like any other case.

To make this work better I suggest having a standard place in the
ioctl response for the allocated uobj # and let the core code fill
that in, that way we don't need to copy_to_user in the handler for the
uobj #.

If idr insertion fails, then call the type's destroy function, just
like on a fd close or otherwise. No trouble for the core code.

Jason
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/drivers/infiniband/core/uidr.c b/drivers/infiniband/core/uidr.c
index 72c4c77..6018717 100644
--- a/drivers/infiniband/core/uidr.c
+++ b/drivers/infiniband/core/uidr.c
@@ -88,7 +88,7 @@  struct ib_uobject *uverbs_get_type_from_idr(struct uverbs_uobject_type *type,
 		if (!uobj)
 			return ERR_PTR(-ENOMEM);
 
-		init_uobj(uobj, 0, ucontext, &type->lock_class);
+		init_uobj(uobj, 0, ucontext);
 
 		/* lock idr */
 		ret = ib_uverbs_uobject_add(uobj, type);
@@ -213,10 +213,6 @@  static struct ib_uobject *idr_read_uobj(int id, struct ib_ucontext *context,
 	if (!uobj)
 		return NULL;
 
-	if (nested)
-		down_read_nested(&uobj->mutex, SINGLE_DEPTH_NESTING);
-	else
-		down_read(&uobj->mutex);
 	if (!uobj->live) {
 		put_uobj_read(uobj);
 		return NULL;
@@ -233,11 +229,8 @@  struct ib_uobject *idr_write_uobj(int id, struct ib_ucontext *context)
 	if (!uobj)
 		return NULL;
 
-	down_write(&uobj->mutex);
-	if (!uobj->live) {
-		put_uobj_write(uobj);
+	if (!uobj->live)
 		return NULL;
-	}
 
 	return uobj;
 }
diff --git a/drivers/infiniband/core/uobject.c b/drivers/infiniband/core/uobject.c
index c3d1098..ed862e8 100644
--- a/drivers/infiniband/core/uobject.c
+++ b/drivers/infiniband/core/uobject.c
@@ -179,13 +179,11 @@  void ib_uverbs_uobject_enable(struct ib_uobject *uobject)
  */
 
 void init_uobj(struct ib_uobject *uobj, u64 user_handle,
-	       struct ib_ucontext *context, struct uverbs_lock_class *c)
+	       struct ib_ucontext *context)
 {
 	uobj->user_handle = user_handle;
 	uobj->context     = context;
 	kref_init(&uobj->ref);
-	init_rwsem(&uobj->mutex);
-	lockdep_set_class_and_name(&uobj->mutex, &c->key, c->name);
 	uobj->live        = 0;
 }
 
@@ -201,12 +199,10 @@  void put_uobj(struct ib_uobject *uobj)
 
 void put_uobj_read(struct ib_uobject *uobj)
 {
-	up_read(&uobj->mutex);
 	put_uobj(uobj);
 }
 
 void put_uobj_write(struct ib_uobject *uobj)
 {
-	up_write(&uobj->mutex);
 	put_uobj(uobj);
 }
diff --git a/drivers/infiniband/core/uobject.h b/drivers/infiniband/core/uobject.h
index 13fdaef..3514a1b 100644
--- a/drivers/infiniband/core/uobject.h
+++ b/drivers/infiniband/core/uobject.h
@@ -48,12 +48,12 @@  struct uverbs_uobject_type {
 		     struct ib_uobject *uobject,
 		     struct ib_ucontext *ucontext);
 	u16			obj_type;
-	struct uverbs_lock_class lock_class;
 };
 
 /* embed in ucontext per type */
 struct uverbs_uobject_list {
 	struct uverbs_uobject_type	*type;
+	/* TODO: replace with hash for faster access */
 	struct list_head		list;
 	struct list_head		type_list;
 };
@@ -64,7 +64,7 @@  void ib_uverbs_uobject_remove(struct ib_uobject *uobject);
 void ib_uverbs_uobject_enable(struct ib_uobject *uobject);
 
 void init_uobj(struct ib_uobject *uobj, u64 user_handle,
-	       struct ib_ucontext *context, struct uverbs_lock_class *c);
+	       struct ib_ucontext *context);
 
 void release_uobj(struct kref *kref);
 void put_uobj(struct ib_uobject *uobj);
diff --git a/drivers/infiniband/core/uverbs_cmd.c b/drivers/infiniband/core/uverbs_cmd.c
index 22406fe..148e26e 100644
--- a/drivers/infiniband/core/uverbs_cmd.c
+++ b/drivers/infiniband/core/uverbs_cmd.c
@@ -46,16 +46,6 @@ 
 
 #include "core_priv.h"
 
-static struct uverbs_lock_class pd_lock_class	= { .name = "PD-uobj" };
-static struct uverbs_lock_class mr_lock_class	= { .name = "MR-uobj" };
-static struct uverbs_lock_class mw_lock_class	= { .name = "MW-uobj" };
-static struct uverbs_lock_class cq_lock_class	= { .name = "CQ-uobj" };
-static struct uverbs_lock_class qp_lock_class	= { .name = "QP-uobj" };
-static struct uverbs_lock_class ah_lock_class	= { .name = "AH-uobj" };
-static struct uverbs_lock_class srq_lock_class	= { .name = "SRQ-uobj" };
-static struct uverbs_lock_class xrcd_lock_class = { .name = "XRCD-uobj" };
-static struct uverbs_lock_class rule_lock_class = { .name = "RULE-uobj" };
-
 ssize_t ib_uverbs_get_context(struct ib_uverbs_file *file,
 			      struct ib_device *ib_dev,
 			      const char __user *buf,
@@ -308,8 +298,8 @@  ssize_t ib_uverbs_alloc_pd(struct ib_uverbs_file *file,
 	if (!uobj)
 		return -ENOMEM;
 
-	init_uobj(uobj, 0, file->ucontext, &pd_lock_class);
-	down_write(&uobj->mutex);
+	down_read(&file->close_sem);
+	init_uobj(uobj, 0, file->ucontext);
 
 	pd = ib_dev->alloc_pd(ib_dev, file->ucontext, &udata);
 	if (IS_ERR(pd)) {
@@ -342,7 +332,7 @@  ssize_t ib_uverbs_alloc_pd(struct ib_uverbs_file *file,
 
 	uobj->live = 1;
 
-	up_write(&uobj->mutex);
+	up_read(&file->close_sem);
 
 	return in_len;
 
@@ -543,9 +533,8 @@  ssize_t ib_uverbs_open_xrcd(struct ib_uverbs_file *file,
 		goto err_tree_mutex_unlock;
 	}
 
-	init_uobj(&obj->uobject, 0, file->ucontext, &xrcd_lock_class);
-
-	down_write(&obj->uobject.mutex);
+	down_read(&file->close_sem);
+	init_uobj(&obj->uobject, 0, file->ucontext);
 
 	if (!xrcd) {
 		xrcd = ib_dev->alloc_xrcd(ib_dev, file->ucontext, &udata);
@@ -595,7 +584,7 @@  ssize_t ib_uverbs_open_xrcd(struct ib_uverbs_file *file,
 	mutex_unlock(&file->mutex);
 
 	obj->uobject.live = 1;
-	up_write(&obj->uobject.mutex);
+	up_read(&file->close_sem);
 
 	mutex_unlock(&file->device->xrcd_tree_mutex);
 	return in_len;
@@ -737,8 +726,8 @@  ssize_t ib_uverbs_reg_mr(struct ib_uverbs_file *file,
 	if (!uobj)
 		return -ENOMEM;
 
-	init_uobj(uobj, 0, file->ucontext, &mr_lock_class);
-	down_write(&uobj->mutex);
+	down_read(&file->close_sem);
+	init_uobj(uobj, 0, file->ucontext);
 
 	pd = idr_read_pd(cmd.pd_handle, file->ucontext);
 	if (!pd) {
@@ -791,7 +780,7 @@  ssize_t ib_uverbs_reg_mr(struct ib_uverbs_file *file,
 
 	uobj->live = 1;
 
-	up_write(&uobj->mutex);
+	up_read(&file->close_sem);
 
 	return in_len;
 
@@ -959,8 +948,8 @@  ssize_t ib_uverbs_alloc_mw(struct ib_uverbs_file *file,
 	if (!uobj)
 		return -ENOMEM;
 
-	init_uobj(uobj, 0, file->ucontext, &mw_lock_class);
-	down_write(&uobj->mutex);
+	down_read(&file->close_sem);
+	init_uobj(uobj, 0, file->ucontext);
 
 	pd = idr_read_pd(cmd.pd_handle, file->ucontext);
 	if (!pd) {
@@ -1007,7 +996,7 @@  ssize_t ib_uverbs_alloc_mw(struct ib_uverbs_file *file,
 
 	uobj->live = 1;
 
-	up_write(&uobj->mutex);
+	up_read(&file->close_sem);
 
 	return in_len;
 
@@ -1129,8 +1118,8 @@  static struct ib_ucq_object *create_cq(struct ib_uverbs_file *file,
 	if (!obj)
 		return ERR_PTR(-ENOMEM);
 
-	init_uobj(&obj->uobject, cmd->user_handle, file->ucontext, &cq_lock_class);
-	down_write(&obj->uobject.mutex);
+	down_read(&file->close_sem);
+	init_uobj(&obj->uobject, cmd->user_handle, file->ucontext);
 
 	if (cmd->comp_channel >= 0) {
 		ev_file = ib_uverbs_lookup_comp_file(cmd->comp_channel);
@@ -1188,7 +1177,7 @@  static struct ib_ucq_object *create_cq(struct ib_uverbs_file *file,
 
 	obj->uobject.live = 1;
 
-	up_write(&obj->uobject.mutex);
+	up_read(&file->close_sem);
 
 	return obj;
 
@@ -1530,9 +1519,8 @@  static int create_qp(struct ib_uverbs_file *file,
 	if (!obj)
 		return -ENOMEM;
 
-	init_uobj(&obj->uevent.uobject, cmd->user_handle, file->ucontext,
-		  &qp_lock_class);
-	down_write(&obj->uevent.uobject.mutex);
+	down_read(&file->close_sem);
+	init_uobj(&obj->uevent.uobject, cmd->user_handle, file->ucontext);
 
 	if (cmd->qp_type == IB_QPT_XRC_TGT) {
 		xrcd = idr_read_xrcd(cmd->pd_handle, file->ucontext,
@@ -1692,7 +1680,7 @@  static int create_qp(struct ib_uverbs_file *file,
 
 	obj->uevent.uobject.live = 1;
 
-	up_write(&obj->uevent.uobject.mutex);
+	up_read(&file->close_sem);
 
 	return 0;
 err_cb:
@@ -1853,8 +1841,8 @@  ssize_t ib_uverbs_open_qp(struct ib_uverbs_file *file,
 	if (!obj)
 		return -ENOMEM;
 
-	init_uobj(&obj->uevent.uobject, cmd.user_handle, file->ucontext, &qp_lock_class);
-	down_write(&obj->uevent.uobject.mutex);
+	down_read(&file->close_sem);
+	init_uobj(&obj->uevent.uobject, cmd.user_handle, file->ucontext);
 
 	xrcd = idr_read_xrcd(cmd.pd_handle, file->ucontext, &xrcd_uobj);
 	if (!xrcd) {
@@ -1904,7 +1892,7 @@  ssize_t ib_uverbs_open_qp(struct ib_uverbs_file *file,
 
 	obj->uevent.uobject.live = 1;
 
-	up_write(&obj->uevent.uobject.mutex);
+	up_read(&file->close_sem);
 
 	return in_len;
 
@@ -2593,8 +2581,8 @@  ssize_t ib_uverbs_create_ah(struct ib_uverbs_file *file,
 	if (!uobj)
 		return -ENOMEM;
 
-	init_uobj(uobj, cmd.user_handle, file->ucontext, &ah_lock_class);
-	down_write(&uobj->mutex);
+	down_read(&file->close_sem);
+	init_uobj(uobj, cmd.user_handle, file->ucontext);
 
 	pd = idr_read_pd(cmd.pd_handle, file->ucontext);
 	if (!pd) {
@@ -2644,7 +2632,7 @@  ssize_t ib_uverbs_create_ah(struct ib_uverbs_file *file,
 
 	uobj->live = 1;
 
-	up_write(&uobj->mutex);
+	up_read(&file->close_sem);
 
 	return in_len;
 
@@ -2904,8 +2892,8 @@  int ib_uverbs_ex_create_flow(struct ib_uverbs_file *file,
 		err = -ENOMEM;
 		goto err_free_attr;
 	}
-	init_uobj(uobj, 0, file->ucontext, &rule_lock_class);
-	down_write(&uobj->mutex);
+	down_read(&file->close_sem);
+	init_uobj(uobj, 0, file->ucontext);
 
 	qp = idr_read_qp(cmd.qp_handle, file->ucontext);
 	if (!qp) {
@@ -2975,7 +2963,7 @@  int ib_uverbs_ex_create_flow(struct ib_uverbs_file *file,
 
 	uobj->live = 1;
 
-	up_write(&uobj->mutex);
+	up_read(&file->close_sem);
 	kfree(flow_attr);
 	if (cmd.flow_attr.num_of_specs)
 		kfree(kern_flow_attr);
@@ -3055,8 +3043,8 @@  static int __uverbs_create_xsrq(struct ib_uverbs_file *file,
 	if (!obj)
 		return -ENOMEM;
 
-	init_uobj(&obj->uevent.uobject, cmd->user_handle, file->ucontext, &srq_lock_class);
-	down_write(&obj->uevent.uobject.mutex);
+	down_read(&file->close_sem);
+	init_uobj(&obj->uevent.uobject, cmd->user_handle, file->ucontext);
 
 	if (cmd->srq_type == IB_SRQT_XRC) {
 		attr.ext.xrc.xrcd  = idr_read_xrcd(cmd->xrcd_handle, file->ucontext, &xrcd_uobj);
@@ -3144,7 +3132,7 @@  static int __uverbs_create_xsrq(struct ib_uverbs_file *file,
 
 	obj->uevent.uobject.live = 1;
 
-	up_write(&obj->uevent.uobject.mutex);
+	up_read(&file->close_sem);
 
 	return 0;
 
diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
index e402473..84d72d2 100644
--- a/include/rdma/ib_verbs.h
+++ b/include/rdma/ib_verbs.h
@@ -1356,7 +1356,6 @@  struct ib_uobject {
 	int			id;		/* index into kernel idr */
 	struct kref		ref;
 	atomic_t		usecnt;
-	struct rw_semaphore	mutex;		/* protects .live */
 	struct rcu_head		rcu;		/* kfree_rcu() overhead */
 	int			live;
 	/* List of object under uverbs_object_type */