
[RFC] NFS/RDMA Release resources in svcrdma when device is removed

Message ID 556F4A0C.2030804@oracle.com (mailing list archive)
State New, archived

Commit Message

Shirley Ma June 3, 2015, 6:40 p.m. UTC
When the underlying RDMA device is removed, rmmod hangs forever if there
are any outstanding NFS/RDMA client mounts. The outstanding NFS/RDMA counts
could also prevent the server from shutting down. Further debugging shows
that the existing connections are not torn down and their resources are not
released when the RDMA_CM_EVENT_DEVICE_REMOVAL event is received. It seems
the original code is missing a svc_xprt_put() call in the
RDMA_CM_EVENT_DEVICE_REMOVAL event handler, so svc_xprt_free() is never
invoked to release the existing connection resources.

The patch has been tested by repeatedly removing and re-adding the device
without stopping the NFS/RDMA service. It also allows a device to be
unplugged and swapped out without shutting down the NFS service.

Signed-off-by: Shirley Ma <shirley.ma@oracle.com>
---
 net/sunrpc/xprtrdma/svc_rdma_transport.c | 1 +
 1 file changed, 1 insertion(+)


Shirley
--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

Chuck Lever III June 3, 2015, 6:44 p.m. UTC | #1
On Jun 3, 2015, at 2:40 PM, Shirley Ma <shirley.ma@oracle.com> wrote:

> When removing underlying RDMA device, the rmmod will hang forever if there
> are any outstanding NFS/RDMA client mounts. The outstanding NFS/RDMA counts 
> could also prevent the server from shutting down. Further debugging shows 
> that the existing connections are not teared down and resource are not 
> released when receiving RDMA_CM_EVENT_DEVICE_REMOVAL event. It seems the 
> original code missing svc_xprt_put() in RDMA_CM_EVENT_REMOVAL event handler 
> thus svc_xprt_free is never invoked to release the existing connection resources.
> 
> The patch has been passed removing, adding device back and forth without 
> stopping NFS/RDMA service. This will also allow a device to be unplugged 
> and swapped out without shutting down NFS service.
> 
> Signed-off-by: Shirley Ma <shirley.ma@oracle.com>

Reviewed-by: Chuck Lever <chuck.lever@oracle.com>

> ---
> net/sunrpc/xprtrdma/svc_rdma_transport.c | 1 +
> 1 file changed, 1 insertion(+)
> 
> diff --git a/net/sunrpc/xprtrdma/svc_rdma_transport.c b/net/sunrpc/xprtrdma/svc_rdma_transport.c
> index f609c1c..2b82569 100644
> --- a/net/sunrpc/xprtrdma/svc_rdma_transport.c
> +++ b/net/sunrpc/xprtrdma/svc_rdma_transport.c
> @@ -673,6 +673,7 @@ static int rdma_cma_handler(struct rdma_cm_id *cma_id,
> 		if (xprt) {
> 			set_bit(XPT_CLOSE, &xprt->xpt_flags);
> 			svc_xprt_enqueue(xprt);
> +			svc_xprt_put(xprt);
> 		}
> 		break;
> 	default:
> 
> Shirley

--
Chuck Lever
chuck[dot]lever[at]oracle[dot]com



Chuck Lever III June 3, 2015, 6:49 p.m. UTC | #2
On Jun 3, 2015, at 2:44 PM, Chuck Lever <chuck.lever@oracle.com> wrote:

> 
> On Jun 3, 2015, at 2:40 PM, Shirley Ma <shirley.ma@oracle.com> wrote:
> 
>> When removing underlying RDMA device, the rmmod will hang forever if there
>> are any outstanding NFS/RDMA client mounts. The outstanding NFS/RDMA counts 
>> could also prevent the server from shutting down. Further debugging shows 
>> that the existing connections are not teared down and resource are not 
>> released when receiving RDMA_CM_EVENT_DEVICE_REMOVAL event. It seems the 
>> original code missing svc_xprt_put() in RDMA_CM_EVENT_REMOVAL event handler 
>> thus svc_xprt_free is never invoked to release the existing connection resources.
>> 
>> The patch has been passed removing, adding device back and forth without 
>> stopping NFS/RDMA service. This will also allow a device to be unplugged 
>> and swapped out without shutting down NFS service.

And maybe also add:

BugLink: https://bugzilla.linux-nfs.org/show_bug.cgi?id=252

here.

>> Signed-off-by: Shirley Ma <shirley.ma@oracle.com>
> 
> Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
> 
>> ---
>> net/sunrpc/xprtrdma/svc_rdma_transport.c | 1 +
>> 1 file changed, 1 insertion(+)
>> 
>> diff --git a/net/sunrpc/xprtrdma/svc_rdma_transport.c b/net/sunrpc/xprtrdma/svc_rdma_transport.c
>> index f609c1c..2b82569 100644
>> --- a/net/sunrpc/xprtrdma/svc_rdma_transport.c
>> +++ b/net/sunrpc/xprtrdma/svc_rdma_transport.c
>> @@ -673,6 +673,7 @@ static int rdma_cma_handler(struct rdma_cm_id *cma_id,
>> 		if (xprt) {
>> 			set_bit(XPT_CLOSE, &xprt->xpt_flags);
>> 			svc_xprt_enqueue(xprt);
>> +			svc_xprt_put(xprt);
>> 		}
>> 		break;
>> 	default:
>> 
>> Shirley
> 
> --
> Chuck Lever
> chuck[dot]lever[at]oracle[dot]com
> 
> 
> 

--
Chuck Lever
chuck[dot]lever[at]oracle[dot]com



Shirley Ma June 3, 2015, 9:56 p.m. UTC | #3
On 06/03/2015 11:49 AM, Chuck Lever wrote:
> 
> On Jun 3, 2015, at 2:44 PM, Chuck Lever <chuck.lever@oracle.com> wrote:
> 
>>
>> On Jun 3, 2015, at 2:40 PM, Shirley Ma <shirley.ma@oracle.com> wrote:
>>
>>> When removing underlying RDMA device, the rmmod will hang forever if there
>>> are any outstanding NFS/RDMA client mounts. The outstanding NFS/RDMA counts 
>>> could also prevent the server from shutting down. Further debugging shows 
>>> that the existing connections are not teared down and resource are not 
>>> released when receiving RDMA_CM_EVENT_DEVICE_REMOVAL event. It seems the 
>>> original code missing svc_xprt_put() in RDMA_CM_EVENT_REMOVAL event handler 
>>> thus svc_xprt_free is never invoked to release the existing connection resources.
>>>
>>> The patch has been passed removing, adding device back and forth without 
>>> stopping NFS/RDMA service. This will also allow a device to be unplugged 
>>> and swapped out without shutting down NFS service.
> 
> And maybe also add:
> 
> BugLink: https://bugzilla.linux-nfs.org/show_bug.cgi?id=252

Yes, this patch addresses that bug as well. I had forgotten about it.
 
> here.
> 
>>> Signed-off-by: Shirley Ma <shirley.ma@oracle.com>
>>
>> Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
>>
>>> ---
>>> net/sunrpc/xprtrdma/svc_rdma_transport.c | 1 +
>>> 1 file changed, 1 insertion(+)
>>>
>>> diff --git a/net/sunrpc/xprtrdma/svc_rdma_transport.c b/net/sunrpc/xprtrdma/svc_rdma_transport.c
>>> index f609c1c..2b82569 100644
>>> --- a/net/sunrpc/xprtrdma/svc_rdma_transport.c
>>> +++ b/net/sunrpc/xprtrdma/svc_rdma_transport.c
>>> @@ -673,6 +673,7 @@ static int rdma_cma_handler(struct rdma_cm_id *cma_id,
>>> 		if (xprt) {
>>> 			set_bit(XPT_CLOSE, &xprt->xpt_flags);
>>> 			svc_xprt_enqueue(xprt);
>>> +			svc_xprt_put(xprt);
>>> 		}
>>> 		break;
>>> 	default:
>>>
>>> Shirley
>>
>> --
>> Chuck Lever
>> chuck[dot]lever[at]oracle[dot]com
>>
>>
>>
> 
> --
> Chuck Lever
> chuck[dot]lever[at]oracle[dot]com
> 
> 
> 

Patch

diff --git a/net/sunrpc/xprtrdma/svc_rdma_transport.c b/net/sunrpc/xprtrdma/svc_rdma_transport.c
index f609c1c..2b82569 100644
--- a/net/sunrpc/xprtrdma/svc_rdma_transport.c
+++ b/net/sunrpc/xprtrdma/svc_rdma_transport.c
@@ -673,6 +673,7 @@  static int rdma_cma_handler(struct rdma_cm_id *cma_id,
 		if (xprt) {
 			set_bit(XPT_CLOSE, &xprt->xpt_flags);
 			svc_xprt_enqueue(xprt);
+			svc_xprt_put(xprt);
 		}
 		break;
 	default:
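
The reference-counting argument in the commit message can be illustrated with a small user-space model. This is only a sketch under stated assumptions, not kernel code: struct toy_xprt and every toy_* function are hypothetical stand-ins, and the model simply assumes that the connection holds one reference which the device-removal path is expected to drop. It shows that when that put is skipped, the count never reaches zero and the release step never runs.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for a transport; NOT the kernel's struct svc_xprt. */
struct toy_xprt {
	int refcount;        /* outstanding references */
	bool close_pending;  /* loosely models the XPT_CLOSE flag */
	bool *freed;         /* set to true when the object is released */
};

static struct toy_xprt *toy_xprt_create(bool *freed)
{
	struct toy_xprt *x = malloc(sizeof(*x));

	if (!x)
		return NULL;
	x->refcount = 1;     /* creator's reference */
	x->close_pending = false;
	x->freed = freed;
	return x;
}

static void toy_xprt_get(struct toy_xprt *x)
{
	x->refcount++;
}

/* Drop one reference; release the object when the last one goes away. */
static void toy_xprt_put(struct toy_xprt *x)
{
	if (--x->refcount == 0) {
		*x->freed = true;
		free(x);
	}
}

/*
 * Models the device-removal event: mark the transport for close and,
 * if drop_ref is true, drop the reference held for the connection
 * (the step the patch adds in the real handler).
 */
static void toy_device_removal(struct toy_xprt *x, bool drop_ref)
{
	x->close_pending = true;
	if (drop_ref)
		toy_xprt_put(x);
}

/* Returns true if the transport was released after removal and close. */
static bool toy_run(bool drop_ref)
{
	bool freed = false;
	struct toy_xprt *x = toy_xprt_create(&freed);

	if (!x)
		return false;
	toy_xprt_get(x);                 /* reference held for the connection */
	toy_device_removal(x, drop_ref); /* removal event arrives */
	toy_xprt_put(x);                 /* closer drops the creator's reference */
	if (!freed)
		free(x);                 /* keep the demo itself leak-free */
	return freed;
}
```

In this model, toy_run(true) reports that the transport was released, while toy_run(false) leaves one reference outstanding, which corresponds to the hang described above.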