svc_xprt_put is no longer BH-safe

Message ID 1934654A-78D7-4180-A0AA-4B144254BFC2@oracle.com
State New

Commit Message

Chuck Lever Oct. 29, 2016, 4:21 p.m. UTC
Hi Bruce-

I hit this lockdep splat this morning

Oct 29 11:38:25 klimt kernel: =================================
Oct 29 11:38:25 klimt kernel: [ INFO: inconsistent lock state ]
Oct 29 11:38:25 klimt kernel: 4.9.0-rc2-00003-g114ddae #9 Not tainted
Oct 29 11:38:25 klimt kernel: ---------------------------------
Oct 29 11:38:25 klimt kernel: inconsistent {SOFTIRQ-ON-W} -> {IN-SOFTIRQ-W} usage.
Oct 29 11:38:25 klimt kernel: swapper/6/0 [HC0[0]:SC1[1]:HE1:SE0] takes:
Oct 29 11:38:25 klimt kernel: (&(&xps->xps_lock)->rlock){+.?...}, at: [<ffffffffa0488056>] xprt_switch_free+0x26/0xb0 [sunrpc]
Oct 29 11:38:25 klimt kernel: {SOFTIRQ-ON-W} state was registered at:
Oct 29 11:38:25 klimt kernel: [<ffffffff810e2483>] __lock_acquire+0x343/0x1670
Oct 29 11:38:25 klimt kernel: [<ffffffff810e3c97>] lock_acquire+0x197/0x1f0
Oct 29 11:38:25 klimt kernel: [<ffffffff8174cbf8>] _raw_spin_lock+0x38/0x50
Oct 29 11:38:25 klimt kernel: [<ffffffffa0488056>] xprt_switch_free+0x26/0xb0 [sunrpc]
Oct 29 11:38:25 klimt kernel: [<ffffffffa0488242>] xprt_switch_put+0x22/0x30 [sunrpc]
Oct 29 11:38:25 klimt kernel: [<ffffffffa048492d>] svc_xprt_free+0x5d/0x80 [sunrpc]
Oct 29 11:38:25 klimt kernel: [<ffffffffa048554a>] svc_xprt_release+0x12a/0x140 [sunrpc]
Oct 29 11:38:25 klimt kernel: [<ffffffffa0487242>] svc_recv+0xcb2/0xed0 [sunrpc]
Oct 29 11:38:25 klimt kernel: [<ffffffffa04d57b8>] nfsd+0xe8/0x160 [nfsd]
Oct 29 11:38:25 klimt kernel: [<ffffffff810b267b>] kthread+0x10b/0x120
Oct 29 11:38:25 klimt kernel: [<ffffffff8174d8ba>] ret_from_fork+0x2a/0x40
Oct 29 11:38:25 klimt kernel: irq event stamp: 483745258
Oct 29 11:38:25 klimt kernel: hardirqs last  enabled at (483745258): [<ffffffff81093a45>] __local_bh_enable_ip+0x95/0xd0
Oct 29 11:38:25 klimt kernel: hardirqs last disabled at (483745257): [<ffffffff81093a05>] __local_bh_enable_ip+0x55/0xd0
Oct 29 11:38:25 klimt kernel: softirqs last  enabled at (483744848): [<ffffffff81092a52>] _local_bh_enable+0x42/0x50
Oct 29 11:38:25 klimt kernel: softirqs last disabled at (483744849): [<ffffffff81093b4b>] irq_exit+0x5b/0xf0
Oct 29 11:38:25 klimt kernel: other info that might help us debug this:
Oct 29 11:38:25 klimt kernel: Possible unsafe locking scenario:
Oct 29 11:38:25 klimt kernel:       CPU0
Oct 29 11:38:25 klimt kernel:       ----
Oct 29 11:38:25 klimt kernel:  lock(&(&xps->xps_lock)->rlock);
Oct 29 11:38:25 klimt kernel:  <Interrupt>
Oct 29 11:38:25 klimt kernel:    lock(&(&xps->xps_lock)->rlock);
Oct 29 11:38:25 klimt kernel:  *** DEADLOCK ***
Oct 29 11:38:25 klimt kernel: no locks held by swapper/6/0.
Oct 29 11:38:25 klimt kernel: stack backtrace:
Oct 29 11:38:25 klimt kernel: CPU: 6 PID: 0 Comm: swapper/6 Not tainted 4.9.0-rc2-00003-g114ddae #9
Oct 29 11:38:25 klimt kernel: Hardware name: Supermicro Super Server/X10SRL-F, BIOS 1.0c 09/09/2015
Oct 29 11:38:25 klimt kernel: ffff88087bd83be8 ffffffff81392e51 ffff880857008040 ffffffff827e2500
Oct 29 11:38:25 klimt kernel: ffff88087bd83c38 ffffffff811b3e76 0000000000000001 0000000000000001
Oct 29 11:38:25 klimt kernel: 0000000000000000 0000000000000006 ffff880857008040 ffffffff810e1060
Oct 29 11:38:25 klimt kernel: Call Trace:
Oct 29 11:38:25 klimt kernel: <IRQ> 
Oct 29 11:38:25 klimt kernel: [<ffffffff81392e51>] dump_stack+0x85/0xc4
Oct 29 11:38:25 klimt kernel: [<ffffffff811b3e76>] print_usage_bug+0x1eb/0x1fc
Oct 29 11:38:25 klimt kernel: [<ffffffff810e1060>] ? print_irq_inversion_bug+0x200/0x200
Oct 29 11:38:25 klimt kernel: [<ffffffff810e1a65>] mark_lock+0x175/0x290
Oct 29 11:38:25 klimt kernel: [<ffffffff810e23cf>] __lock_acquire+0x28f/0x1670
Oct 29 11:38:25 klimt kernel: [<ffffffff810e3c97>] lock_acquire+0x197/0x1f0
Oct 29 11:38:25 klimt kernel: [<ffffffffa0488056>] ? xprt_switch_free+0x26/0xb0 [sunrpc]
Oct 29 11:38:25 klimt kernel: [<ffffffff8174cbf8>] _raw_spin_lock+0x38/0x50
Oct 29 11:38:25 klimt kernel: [<ffffffffa0488056>] ? xprt_switch_free+0x26/0xb0 [sunrpc]
Oct 29 11:38:25 klimt kernel: [<ffffffffa0488056>] xprt_switch_free+0x26/0xb0 [sunrpc]
Oct 29 11:38:25 klimt kernel: [<ffffffffa0488242>] xprt_switch_put+0x22/0x30 [sunrpc]
Oct 29 11:38:25 klimt kernel: [<ffffffffa048492d>] svc_xprt_free+0x5d/0x80 [sunrpc]
Oct 29 11:38:25 klimt kernel: [<ffffffffa048501d>] svc_xprt_put+0x1d/0x20 [sunrpc]
Oct 29 11:38:25 klimt kernel: [<ffffffffa05753db>] svc_rdma_wc_receive+0xcb/0xe0 [rpcrdma]
Oct 29 11:38:25 klimt kernel: [<ffffffffa0240d55>] __ib_process_cq+0x35/0xc0 [ib_core]
Oct 29 11:38:25 klimt kernel: [<ffffffffa0240ed2>] ib_poll_handler+0x22/0x60 [ib_core]
Oct 29 11:38:25 klimt kernel: [<ffffffff813c8bb5>] irq_poll_softirq+0x85/0x100
Oct 29 11:38:25 klimt kernel: [<ffffffff817507a9>] __do_softirq+0x1f9/0x425
Oct 29 11:38:25 klimt kernel: [<ffffffff81093b4b>] irq_exit+0x5b/0xf0
Oct 29 11:38:25 klimt kernel: [<ffffffff817502df>] do_IRQ+0xef/0x110
Oct 29 11:38:25 klimt kernel: [<ffffffff8174e016>] common_interrupt+0x96/0x96
Oct 29 11:38:25 klimt kernel: <EOI> 
Oct 29 11:38:25 klimt kernel: [<ffffffff815b996c>] ? cpuidle_enter_state+0x22c/0x300
Oct 29 11:38:25 klimt kernel: [<ffffffff815b9a77>] cpuidle_enter+0x17/0x20
Oct 29 11:38:25 klimt kernel: [<ffffffff810dbe7d>] call_cpuidle+0x3d/0x50
Oct 29 11:38:25 klimt kernel: [<ffffffff810dc13e>] cpu_startup_entry+0x19e/0x240
Oct 29 11:38:25 klimt kernel: [<ffffffff81059590>] start_secondary+0x160/0x1a0


In commit 39a9beab5acb83176e8b9a4f0778749a09341f1f ('rpc: share one xps between
all backchannels') you added:

diff --git a/net/sunrpc/svc_xprt.c b/net/sunrpc/svc_xprt.c
index f5572e3..4f01f63 100644
--- a/net/sunrpc/svc_xprt.c
+++ b/net/sunrpc/svc_xprt.c
@@ -136,6 +136,8 @@ static void svc_xprt_free(struct kref *kref)
        /* See comment on corresponding get in xs_setup_bc_tcp(): */
        if (xprt->xpt_bc_xprt)
                xprt_put(xprt->xpt_bc_xprt);
+       if (xprt->xpt_bc_xps)
+               xprt_switch_put(xprt->xpt_bc_xps);
        xprt->xpt_ops->xpo_free(xprt);
        module_put(owner);
 }

svc_xprt_free() is invoked by svc_xprt_put(). svc_xprt_put() is called
from svc_rdma in soft IRQ context (e.g., svc_rdma_wc_receive).

However, xprt_switch_put() takes a spin lock (xps_lock) which is locked
everywhere without disabling BHs.

It looks to me like 39a9beab5acb makes svc_xprt_put() no longer BH-safe?
Not sure if svc_xprt_put() was intended to be BH-safe beforehand.

Maybe xprt_switch_put() could be invoked in ->xpo_free, but that seems
like a temporary solution.
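For reference, a sketch of the inconsistency lockdep is complaining about (not a proposed patch; the call chains are taken from the splat above):

```c
/*
 * Sketch only, not a buildable patch.  xps_lock is taken everywhere
 * with plain spin_lock(), i.e. with bottom halves still enabled:
 *
 *   nfsd -> svc_recv -> svc_xprt_release -> svc_xprt_free
 *        -> xprt_switch_put -> xprt_switch_free
 */
spin_lock(&xps->xps_lock);      /* process context, BHs enabled */
/* ... release the switch ... */
spin_unlock(&xps->xps_lock);

/*
 * If a softirq fires on the same CPU while that lock is held, and the
 * completion path drops the last xprt reference:
 *
 *   irq_poll_softirq -> svc_rdma_wc_receive -> svc_xprt_put
 *        -> svc_xprt_free -> xprt_switch_put -> xprt_switch_free
 *
 * the CPU spins on a lock it already owns.  Making the lock BH-safe
 * would mean using the _bh variants in every process-context caller:
 */
spin_lock_bh(&xps->xps_lock);   /* BHs disabled for the critical section */
/* ... */
spin_unlock_bh(&xps->xps_lock);
```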


--
Chuck Lever



--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

J. Bruce Fields Nov. 8, 2016, 8:03 p.m. UTC | #1
On Sat, Oct 29, 2016 at 12:21:03PM -0400, Chuck Lever wrote:
> Hi Bruce-

Sorry for the slow response!

...
> In commit 39a9beab5acb83176e8b9a4f0778749a09341f1f ('rpc: share one xps between
> all backchannels') you added:
> 
> diff --git a/net/sunrpc/svc_xprt.c b/net/sunrpc/svc_xprt.c
> index f5572e3..4f01f63 100644
> --- a/net/sunrpc/svc_xprt.c
> +++ b/net/sunrpc/svc_xprt.c
> @@ -136,6 +136,8 @@ static void svc_xprt_free(struct kref *kref)
>         /* See comment on corresponding get in xs_setup_bc_tcp(): */
>         if (xprt->xpt_bc_xprt)
>                 xprt_put(xprt->xpt_bc_xprt);
> +       if (xprt->xpt_bc_xps)
> +               xprt_switch_put(xprt->xpt_bc_xps);
>         xprt->xpt_ops->xpo_free(xprt);
>         module_put(owner);
>  }
> 
> svc_xprt_free() is invoked by svc_xprt_put(). svc_xprt_put() is called
> from svc_rdma in soft IRQ context (eg. svc_rdma_wc_receive).

Is that necessary?  I wonder why the svcrdma code seems to be taking so
many of its own references on svc_xprts.

> However, xprt_switch_put() takes a spin lock (xps_lock) which is locked
> everywhere without disabling BHs.
> 
> It looks to me like 39a9beab5acb makes svc_xprt_put() no longer BH-safe?
> Not sure if svc_xprt_put() was intended to be BH-safe beforehand.
> 
> Maybe xprt_switch_put() could be invoked in ->xpo_free, but that seems
> like a temporary solution.

Since xpo_free is also called from svc_xprt_put that doesn't sound like
it would change anything.  Or do we not trunk over RDMA for some reason?

--b.

> 
> 
> --
> Chuck Lever
> 
> 
Chuck Lever Nov. 8, 2016, 8:13 p.m. UTC | #2
> On Nov 8, 2016, at 3:03 PM, J. Bruce Fields <bfields@fieldses.org> wrote:
> 
> On Sat, Oct 29, 2016 at 12:21:03PM -0400, Chuck Lever wrote:
>> Hi Bruce-
> 
> Sorry for the slow response!
> 
> ...
>> In commit 39a9beab5acb83176e8b9a4f0778749a09341f1f ('rpc: share one xps between
>> all backchannels') you added:
>> 
>> diff --git a/net/sunrpc/svc_xprt.c b/net/sunrpc/svc_xprt.c
>> index f5572e3..4f01f63 100644
>> --- a/net/sunrpc/svc_xprt.c
>> +++ b/net/sunrpc/svc_xprt.c
>> @@ -136,6 +136,8 @@ static void svc_xprt_free(struct kref *kref)
>>        /* See comment on corresponding get in xs_setup_bc_tcp(): */
>>        if (xprt->xpt_bc_xprt)
>>                xprt_put(xprt->xpt_bc_xprt);
>> +       if (xprt->xpt_bc_xps)
>> +               xprt_switch_put(xprt->xpt_bc_xps);
>>        xprt->xpt_ops->xpo_free(xprt);
>>        module_put(owner);
>> }
>> 
>> svc_xprt_free() is invoked by svc_xprt_put(). svc_xprt_put() is called
>> from svc_rdma in soft IRQ context (eg. svc_rdma_wc_receive).
> 
> Is that necessary?  I wonder why the svcrdma code seems to be taking so
> many of its own references on svc_xprts.

The idea is to keep the xprt around while Work Requests (I/O) are running,
so that the xprt is guaranteed to be there during work completions. The
completion handlers (where svc_xprt_put is often invoked) run in soft IRQ
context.

It's simple to change completions to use a Work Queue instead, but testing
so far shows that will result in a performance loss. I'm still studying it.

Is there another way to keep the xprt's ref count boosted while I/O is
going on?
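(The pattern being described is roughly the following sketch; the posting-side details are paraphrased for illustration, not quoted from svc_rdma.)

```c
/*
 * Rough sketch, not verbatim svc_rdma code.  A reference pins the
 * svc_xprt across the lifetime of each Work Request:
 */
svc_xprt_get(&rdma->sc_xprt);           /* before posting the WR */
ret = ib_post_recv(rdma->sc_qp, &recv_wr, &bad_wr);
if (ret)
        svc_xprt_put(&rdma->sc_xprt);   /* post failed: drop the ref now */

/* Completion handler, invoked from irq_poll softirq context: */
static void svc_rdma_wc_receive(struct ib_cq *cq, struct ib_wc *wc)
{
        /* ... process the completion, maybe svc_xprt_enqueue() ... */
        svc_xprt_put(xprt);     /* may be the final put, so svc_xprt_free()
                                 * and xprt_switch_put() run in BH context */
}
```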


>> However, xprt_switch_put() takes a spin lock (xps_lock) which is locked
>> everywhere without disabling BHs.
>> 
>> It looks to me like 39a9beab5acb makes svc_xprt_put() no longer BH-safe?
>> Not sure if svc_xprt_put() was intended to be BH-safe beforehand.
>> 
>> Maybe xprt_switch_put() could be invoked in ->xpo_free, but that seems
>> like a temporary solution.
> 
> Since xpo_free is also called from svc_xprt_put that doesn't sound like
> it would change anything.  Or do we not trunk over RDMA for some reason?

It's quite desirable to trunk NFS/RDMA on multiple connections, and it
should work just like it does for TCP, but so far it's not been tested.


--
Chuck Lever



J. Bruce Fields Nov. 8, 2016, 8:20 p.m. UTC | #3
On Tue, Nov 08, 2016 at 03:13:13PM -0500, Chuck Lever wrote:
> 
> > On Nov 8, 2016, at 3:03 PM, J. Bruce Fields <bfields@fieldses.org> wrote:
> > 
> > On Sat, Oct 29, 2016 at 12:21:03PM -0400, Chuck Lever wrote:
> >> Hi Bruce-
> > 
> > Sorry for the slow response!
> > 
> > ...
> >> In commit 39a9beab5acb83176e8b9a4f0778749a09341f1f ('rpc: share one xps between
> >> all backchannels') you added:
> >> 
> >> diff --git a/net/sunrpc/svc_xprt.c b/net/sunrpc/svc_xprt.c
> >> index f5572e3..4f01f63 100644
> >> --- a/net/sunrpc/svc_xprt.c
> >> +++ b/net/sunrpc/svc_xprt.c
> >> @@ -136,6 +136,8 @@ static void svc_xprt_free(struct kref *kref)
> >>        /* See comment on corresponding get in xs_setup_bc_tcp(): */
> >>        if (xprt->xpt_bc_xprt)
> >>                xprt_put(xprt->xpt_bc_xprt);
> >> +       if (xprt->xpt_bc_xps)
> >> +               xprt_switch_put(xprt->xpt_bc_xps);
> >>        xprt->xpt_ops->xpo_free(xprt);
> >>        module_put(owner);
> >> }
> >> 
> >> svc_xprt_free() is invoked by svc_xprt_put(). svc_xprt_put() is called
> >> from svc_rdma in soft IRQ context (eg. svc_rdma_wc_receive).
> > 
> > Is that necessary?  I wonder why the svcrdma code seems to be taking so
> > many of its own references on svc_xprts.
> 
> The idea is to keep the xprt around while Work Requests (I/O) are running,
> so that the xprt is guaranteed to be there during work completions. The
> completion handlers (where svc_xprt_put is often invoked) run in soft IRQ
> context.
> 
> It's simple to change completions to use a Work Queue instead, but testing
> so far shows that will result in a performance loss. I'm still studying it.
> 
> Is there another way to keep the xprt's ref count boosted while I/O is
> going on?

Why do you need the svc_xprt in the completion?

Can the xpo_detach method wait for any pending completions?

--b.

> 
> 
> >> However, xprt_switch_put() takes a spin lock (xps_lock) which is locked
> >> everywhere without disabling BHs.
> >> 
> >> It looks to me like 39a9beab5acb makes svc_xprt_put() no longer BH-safe?
> >> Not sure if svc_xprt_put() was intended to be BH-safe beforehand.
> >> 
> >> Maybe xprt_switch_put() could be invoked in ->xpo_free, but that seems
> >> like a temporary solution.
> > 
> > Since xpo_free is also called from svc_xprt_put that doesn't sound like
> > it would change anything.  Or do we not trunk over RDMA for some reason?
> 
> It's quite desirable to trunk NFS/RDMA on multiple connections, and it
> should work just like it does for TCP, but so far it's not been tested.
> 
> 
> --
> Chuck Lever
> 
> 
Chuck Lever Nov. 8, 2016, 9:05 p.m. UTC | #4
> On Nov 8, 2016, at 3:20 PM, J. Bruce Fields <bfields@fieldses.org> wrote:
> 
> On Tue, Nov 08, 2016 at 03:13:13PM -0500, Chuck Lever wrote:
>> 
>>> On Nov 8, 2016, at 3:03 PM, J. Bruce Fields <bfields@fieldses.org> wrote:
>>> 
>>> On Sat, Oct 29, 2016 at 12:21:03PM -0400, Chuck Lever wrote:
>>>> Hi Bruce-
>>> 
>>> Sorry for the slow response!
>>> 
>>> ...
>>>> In commit 39a9beab5acb83176e8b9a4f0778749a09341f1f ('rpc: share one xps between
>>>> all backchannels') you added:
>>>> 
>>>> diff --git a/net/sunrpc/svc_xprt.c b/net/sunrpc/svc_xprt.c
>>>> index f5572e3..4f01f63 100644
>>>> --- a/net/sunrpc/svc_xprt.c
>>>> +++ b/net/sunrpc/svc_xprt.c
>>>> @@ -136,6 +136,8 @@ static void svc_xprt_free(struct kref *kref)
>>>>       /* See comment on corresponding get in xs_setup_bc_tcp(): */
>>>>       if (xprt->xpt_bc_xprt)
>>>>               xprt_put(xprt->xpt_bc_xprt);
>>>> +       if (xprt->xpt_bc_xps)
>>>> +               xprt_switch_put(xprt->xpt_bc_xps);
>>>>       xprt->xpt_ops->xpo_free(xprt);
>>>>       module_put(owner);
>>>> }
>>>> 
>>>> svc_xprt_free() is invoked by svc_xprt_put(). svc_xprt_put() is called
>>>> from svc_rdma in soft IRQ context (eg. svc_rdma_wc_receive).
>>> 
>>> Is that necessary?  I wonder why the svcrdma code seems to be taking so
>>> many of its own references on svc_xprts.
>> 
>> The idea is to keep the xprt around while Work Requests (I/O) are running,
>> so that the xprt is guaranteed to be there during work completions. The
>> completion handlers (where svc_xprt_put is often invoked) run in soft IRQ
>> context.
>> 
>> It's simple to change completions to use a Work Queue instead, but testing
>> so far shows that will result in a performance loss. I'm still studying it.
>> 
>> Is there another way to keep the xprt's ref count boosted while I/O is
>> going on?
> 
> Why do you need the svc_xprt in the completion?

1. The svc_xprt contains the svcrdma_xprt, which contains the Send Queue
accounting mechanism. SQ accounting has to be updated for each completion:
the completion indicates that the SQ entry used by this Work Request is
now available, and that other callers waiting for enough SQEs can go ahead.

2. When handling a Receive completion, the incoming RPC message is enqueued
on the svc_xprt via svc_xprt_enqueue, unless RDMA Reads are needed.

3. When handling a Read completion, the incoming RPC message is enqueued on
the svc_xprt via svc_xprt_enqueue.
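
Item 1 amounts to something like this sketch (field names assumed for illustration, not quoted from svcrdma):

```c
/*
 * SQ accounting on completion: each completed Work Request returns
 * one Send Queue entry, and anyone blocked waiting for SQ space can
 * retry.
 */
atomic_dec(&rdma->sc_sq_count);         /* this WR's SQE is free again */
wake_up(&rdma->sc_send_wait);           /* let blocked posters retry */
```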


> Can the xpo_detach method wait for any pending completions?

So the completions would call wake_up on a waitqueue, or use a kernel
completion? That sounds more expensive than managing an atomic reference
count.

I suppose some other reference count could be used to trigger a work item
that would invoke xpo_detach.
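
One shape that could take (a sketch of the wait-for-completions idea only; sc_pending and sc_done are hypothetical fields, not a tested design):

```c
/* Each completion handler drops an in-flight count: */
if (atomic_dec_and_test(&rdma->sc_pending))
        complete(&rdma->sc_done);

/*
 * xpo_detach, in process context, drops the bias count taken at
 * connect time and waits for the last completion to finish:
 */
static void svc_rdma_detach(struct svc_xprt *xprt)
{
        struct svcxprt_rdma *rdma =
                container_of(xprt, struct svcxprt_rdma, sc_xprt);

        if (atomic_dec_and_test(&rdma->sc_pending))
                complete(&rdma->sc_done);
        wait_for_completion(&rdma->sc_done);
        /* ... now safe to tear down with no refs held by softirq ... */
}
```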

But based on your comments, then, svc_xprt_put() was never intended to be
BH-safe.


> --b.
> 
>> 
>> 
>>>> However, xprt_switch_put() takes a spin lock (xps_lock) which is locked
>>>> everywhere without disabling BHs.
>>>> 
>>>> It looks to me like 39a9beab5acb makes svc_xprt_put() no longer BH-safe?
>>>> Not sure if svc_xprt_put() was intended to be BH-safe beforehand.
>>>> 
>>>> Maybe xprt_switch_put() could be invoked in ->xpo_free, but that seems
>>>> like a temporary solution.
>>> 
>>> Since xpo_free is also called from svc_xprt_put that doesn't sound like
>>> it would change anything.  Or do we not trunk over RDMA for some reason?
>> 
>> It's quite desirable to trunk NFS/RDMA on multiple connections, and it
>> should work just like it does for TCP, but so far it's not been tested.


--
Chuck Lever



J. Bruce Fields Nov. 8, 2016, 9:42 p.m. UTC | #5
On Tue, Nov 08, 2016 at 04:05:54PM -0500, Chuck Lever wrote:
> 
> > On Nov 8, 2016, at 3:20 PM, J. Bruce Fields <bfields@fieldses.org> wrote:
> > 
> > On Tue, Nov 08, 2016 at 03:13:13PM -0500, Chuck Lever wrote:
> >> 
> >>> On Nov 8, 2016, at 3:03 PM, J. Bruce Fields <bfields@fieldses.org> wrote:
> >>> 
> >>> On Sat, Oct 29, 2016 at 12:21:03PM -0400, Chuck Lever wrote:
> >>>> Hi Bruce-
> >>> 
> >>> Sorry for the slow response!
> >>> 
> >>> ...
> >>>> In commit 39a9beab5acb83176e8b9a4f0778749a09341f1f ('rpc: share one xps between
> >>>> all backchannels') you added:
> >>>> 
> >>>> diff --git a/net/sunrpc/svc_xprt.c b/net/sunrpc/svc_xprt.c
> >>>> index f5572e3..4f01f63 100644
> >>>> --- a/net/sunrpc/svc_xprt.c
> >>>> +++ b/net/sunrpc/svc_xprt.c
> >>>> @@ -136,6 +136,8 @@ static void svc_xprt_free(struct kref *kref)
> >>>>       /* See comment on corresponding get in xs_setup_bc_tcp(): */
> >>>>       if (xprt->xpt_bc_xprt)
> >>>>               xprt_put(xprt->xpt_bc_xprt);
> >>>> +       if (xprt->xpt_bc_xps)
> >>>> +               xprt_switch_put(xprt->xpt_bc_xps);
> >>>>       xprt->xpt_ops->xpo_free(xprt);
> >>>>       module_put(owner);
> >>>> }
> >>>> 
> >>>> svc_xprt_free() is invoked by svc_xprt_put(). svc_xprt_put() is called
> >>>> from svc_rdma in soft IRQ context (eg. svc_rdma_wc_receive).
> >>> 
> >>> Is that necessary?  I wonder why the svcrdma code seems to be taking so
> >>> many of its own references on svc_xprts.
> >> 
> >> The idea is to keep the xprt around while Work Requests (I/O) are running,
> >> so that the xprt is guaranteed to be there during work completions. The
> >> completion handlers (where svc_xprt_put is often invoked) run in soft IRQ
> >> context.
> >> 
> >> It's simple to change completions to use a Work Queue instead, but testing
> >> so far shows that will result in a performance loss. I'm still studying it.
> >> 
> >> Is there another way to keep the xprt's ref count boosted while I/O is
> >> going on?
> > 
> > Why do you need the svc_xprt in the completion?
> 
> 1. The svc_xprt contains the svcrdma_xprt, which contains the Send Queue
> accounting mechanism. SQ accounting has to be updated for each completion:
> the completion indicates that the SQ entry used by this Work Request is
> now available, and that other callers waiting for enough SQEs can go ahead.
> 
> 2. When handling a Receive completion, the incoming RPC message is enqueued
> on the svc_xprt via svc_xprt_enqueue, unless RDMA Reads are needed.
> 
> 3. When handling a Read completion, the incoming RPC message is enqueued on
> the svc_xprt via svc_xprt_enqueue.
> 
> 
> > Can the xpo_detach method wait for any pending completions?
> 
> So the completions would call wake_up on a waitqueue, or use a kernel
> completion? That sounds more expensive than managing an atomic reference
> count.
> 
> I suppose some other reference count could be used to trigger a work item
> that would invoke xpo_detach.
> 
> But based on your comments, then, svc_xprt_put() was never intended to be
> BH-safe.

I'm not sure what was intended.  It doesn't look to me like any other
callers require it.

--b.

> 
> 
> > --b.
> > 
> >> 
> >> 
> >>>> However, xprt_switch_put() takes a spin lock (xps_lock) which is locked
> >>>> everywhere without disabling BHs.
> >>>> 
> >>>> It looks to me like 39a9beab5acb makes svc_xprt_put() no longer BH-safe?
> >>>> Not sure if svc_xprt_put() was intended to be BH-safe beforehand.
> >>>> 
> >>>> Maybe xprt_switch_put() could be invoked in ->xpo_free, but that seems
> >>>> like a temporary solution.
> >>> 
> >>> Since xpo_free is also called from svc_xprt_put that doesn't sound like
> >>> it would change anything.  Or do we not trunk over RDMA for some reason?
> >> 
> >> It's quite desirable to trunk NFS/RDMA on multiple connections, and it
> >> should work just like it does for TCP, but so far it's not been tested.
> 
> 
> --
> Chuck Lever
> 
> 

Patch

diff --git a/net/sunrpc/svc_xprt.c b/net/sunrpc/svc_xprt.c
index f5572e3..4f01f63 100644
--- a/net/sunrpc/svc_xprt.c
+++ b/net/sunrpc/svc_xprt.c
@@ -136,6 +136,8 @@  static void svc_xprt_free(struct kref *kref)
        /* See comment on corresponding get in xs_setup_bc_tcp(): */
        if (xprt->xpt_bc_xprt)
                xprt_put(xprt->xpt_bc_xprt);
+       if (xprt->xpt_bc_xps)
+               xprt_switch_put(xprt->xpt_bc_xps);
        xprt->xpt_ops->xpo_free(xprt);
        module_put(owner);
 }