diff mbox

[v2,02/16] xprtrdma: Warn when there are orphaned IB objects

Message ID 20150511180235.31263.71754.stgit@manet.1015granger.net (mailing list archive)
State Not Applicable
Headers show

Commit Message

Chuck Lever III May 11, 2015, 6:02 p.m. UTC
WARN during transport destruction if ib_dealloc_pd() fails. This is
a sign that xprtrdma orphaned one or more RDMA API objects at some
point, which can pin lower layer kernel modules and cause shutdown
to hang.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Reviewed-by: Sagi Grimberg <sagig@mellanox.com>
Reviewed-by: Devesh Sharma <devesh.sharma@avagotech.com>
---
 net/sunrpc/xprtrdma/verbs.c |   10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)


--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

Schumaker, Anna May 12, 2015, 6:12 p.m. UTC | #1
Hi Chuck,

On 05/11/2015 02:02 PM, Chuck Lever wrote:
> WARN during transport destruction if ib_dealloc_pd() fails. This is
> a sign that xprtrdma orphaned one or more RDMA API objects at some
> point, which can pin lower layer kernel modules and cause shutdown
> to hang.

I'm curious, what would cause an RDMA object to get orphaned in the first place?  Is there any way to prevent that?

Anna

> 
> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
> Reviewed-by: Steve Wise <swise@opengridcomputing.com>
> Reviewed-by: Sagi Grimberg <sagig@mellanox.com>
> Reviewed-by: Devesh Sharma <devesh.sharma@avagotech.com>
> ---
>  net/sunrpc/xprtrdma/verbs.c |   10 +++++-----
>  1 file changed, 5 insertions(+), 5 deletions(-)
> 
> diff --git a/net/sunrpc/xprtrdma/verbs.c b/net/sunrpc/xprtrdma/verbs.c
> index 4870d27..51900e6 100644
> --- a/net/sunrpc/xprtrdma/verbs.c
> +++ b/net/sunrpc/xprtrdma/verbs.c
> @@ -702,17 +702,17 @@ rpcrdma_ia_close(struct rpcrdma_ia *ia)
>  		dprintk("RPC:       %s: ib_dereg_mr returned %i\n",
>  			__func__, rc);
>  	}
> +
>  	if (ia->ri_id != NULL && !IS_ERR(ia->ri_id)) {
>  		if (ia->ri_id->qp)
>  			rdma_destroy_qp(ia->ri_id);
>  		rdma_destroy_id(ia->ri_id);
>  		ia->ri_id = NULL;
>  	}
> -	if (ia->ri_pd != NULL && !IS_ERR(ia->ri_pd)) {
> -		rc = ib_dealloc_pd(ia->ri_pd);
> -		dprintk("RPC:       %s: ib_dealloc_pd returned %i\n",
> -			__func__, rc);
> -	}
> +
> +	/* If the pd is still busy, xprtrdma missed freeing a resource */
> +	if (ia->ri_pd && !IS_ERR(ia->ri_pd))
> +		WARN_ON(ib_dealloc_pd(ia->ri_pd));
>  }
>  
>  /*
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Schumaker, Anna May 12, 2015, 6:12 p.m. UTC | #2
Hi Chuck,

On 05/11/2015 02:02 PM, Chuck Lever wrote:
> WARN during transport destruction if ib_dealloc_pd() fails. This is
> a sign that xprtrdma orphaned one or more RDMA API objects at some
> point, which can pin lower layer kernel modules and cause shutdown
> to hang.

I'm curious, what would cause an RDMA object to get orphaned in the first place?  Is there any way to prevent that?

Anna

> 
> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
> Reviewed-by: Steve Wise <swise@opengridcomputing.com>
> Reviewed-by: Sagi Grimberg <sagig@mellanox.com>
> Reviewed-by: Devesh Sharma <devesh.sharma@avagotech.com>
> ---
>  net/sunrpc/xprtrdma/verbs.c |   10 +++++-----
>  1 file changed, 5 insertions(+), 5 deletions(-)
> 
> diff --git a/net/sunrpc/xprtrdma/verbs.c b/net/sunrpc/xprtrdma/verbs.c
> index 4870d27..51900e6 100644
> --- a/net/sunrpc/xprtrdma/verbs.c
> +++ b/net/sunrpc/xprtrdma/verbs.c
> @@ -702,17 +702,17 @@ rpcrdma_ia_close(struct rpcrdma_ia *ia)
>  		dprintk("RPC:       %s: ib_dereg_mr returned %i\n",
>  			__func__, rc);
>  	}
> +
>  	if (ia->ri_id != NULL && !IS_ERR(ia->ri_id)) {
>  		if (ia->ri_id->qp)
>  			rdma_destroy_qp(ia->ri_id);
>  		rdma_destroy_id(ia->ri_id);
>  		ia->ri_id = NULL;
>  	}
> -	if (ia->ri_pd != NULL && !IS_ERR(ia->ri_pd)) {
> -		rc = ib_dealloc_pd(ia->ri_pd);
> -		dprintk("RPC:       %s: ib_dealloc_pd returned %i\n",
> -			__func__, rc);
> -	}
> +
> +	/* If the pd is still busy, xprtrdma missed freeing a resource */
> +	if (ia->ri_pd && !IS_ERR(ia->ri_pd))
> +		WARN_ON(ib_dealloc_pd(ia->ri_pd));
>  }
>  
>  /*
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Schumaker, Anna May 12, 2015, 6:14 p.m. UTC | #3
Hi Chuck,

On 05/11/2015 02:02 PM, Chuck Lever wrote:
> WARN during transport destruction if ib_dealloc_pd() fails. This is
> a sign that xprtrdma orphaned one or more RDMA API objects at some
> point, which can pin lower layer kernel modules and cause shutdown
> to hang.

I'm curious, what would cause the API objects to get orphaned in the first place?  Is there any way to prevent it?

Anna

> 
> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
> Reviewed-by: Steve Wise <swise@opengridcomputing.com>
> Reviewed-by: Sagi Grimberg <sagig@mellanox.com>
> Reviewed-by: Devesh Sharma <devesh.sharma@avagotech.com>
> ---
>  net/sunrpc/xprtrdma/verbs.c |   10 +++++-----
>  1 file changed, 5 insertions(+), 5 deletions(-)
> 
> diff --git a/net/sunrpc/xprtrdma/verbs.c b/net/sunrpc/xprtrdma/verbs.c
> index 4870d27..51900e6 100644
> --- a/net/sunrpc/xprtrdma/verbs.c
> +++ b/net/sunrpc/xprtrdma/verbs.c
> @@ -702,17 +702,17 @@ rpcrdma_ia_close(struct rpcrdma_ia *ia)
>  		dprintk("RPC:       %s: ib_dereg_mr returned %i\n",
>  			__func__, rc);
>  	}
> +
>  	if (ia->ri_id != NULL && !IS_ERR(ia->ri_id)) {
>  		if (ia->ri_id->qp)
>  			rdma_destroy_qp(ia->ri_id);
>  		rdma_destroy_id(ia->ri_id);
>  		ia->ri_id = NULL;
>  	}
> -	if (ia->ri_pd != NULL && !IS_ERR(ia->ri_pd)) {
> -		rc = ib_dealloc_pd(ia->ri_pd);
> -		dprintk("RPC:       %s: ib_dealloc_pd returned %i\n",
> -			__func__, rc);
> -	}
> +
> +	/* If the pd is still busy, xprtrdma missed freeing a resource */
> +	if (ia->ri_pd && !IS_ERR(ia->ri_pd))
> +		WARN_ON(ib_dealloc_pd(ia->ri_pd));
>  }
>  
>  /*
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Chuck Lever III May 12, 2015, 6:20 p.m. UTC | #4
On May 12, 2015, at 2:12 PM, Anna Schumaker <Anna.Schumaker@netapp.com> wrote:

> Hi Chuck,
> 
> On 05/11/2015 02:02 PM, Chuck Lever wrote:
>> WARN during transport destruction if ib_dealloc_pd() fails. This is
>> a sign that xprtrdma orphaned one or more RDMA API objects at some
>> point, which can pin lower layer kernel modules and cause shutdown
>> to hang.
> 
> I'm curious, what would cause an RDMA object to get orphaned in the first place?

A leaked object means there’s a software bug in the API consumer, which
is xprtrdma in this case. xprtrdma is supposed to track and clean up
every object it creates.

> Is there any way to prevent that?

The usual thing to do is find and fix the bug that allowed the leak.

> Anna
> 
>> 
>> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
>> Reviewed-by: Steve Wise <swise@opengridcomputing.com>
>> Reviewed-by: Sagi Grimberg <sagig@mellanox.com>
>> Reviewed-by: Devesh Sharma <devesh.sharma@avagotech.com>
>> ---
>> net/sunrpc/xprtrdma/verbs.c |   10 +++++-----
>> 1 file changed, 5 insertions(+), 5 deletions(-)
>> 
>> diff --git a/net/sunrpc/xprtrdma/verbs.c b/net/sunrpc/xprtrdma/verbs.c
>> index 4870d27..51900e6 100644
>> --- a/net/sunrpc/xprtrdma/verbs.c
>> +++ b/net/sunrpc/xprtrdma/verbs.c
>> @@ -702,17 +702,17 @@ rpcrdma_ia_close(struct rpcrdma_ia *ia)
>> 		dprintk("RPC:       %s: ib_dereg_mr returned %i\n",
>> 			__func__, rc);
>> 	}
>> +
>> 	if (ia->ri_id != NULL && !IS_ERR(ia->ri_id)) {
>> 		if (ia->ri_id->qp)
>> 			rdma_destroy_qp(ia->ri_id);
>> 		rdma_destroy_id(ia->ri_id);
>> 		ia->ri_id = NULL;
>> 	}
>> -	if (ia->ri_pd != NULL && !IS_ERR(ia->ri_pd)) {
>> -		rc = ib_dealloc_pd(ia->ri_pd);
>> -		dprintk("RPC:       %s: ib_dealloc_pd returned %i\n",
>> -			__func__, rc);
>> -	}
>> +
>> +	/* If the pd is still busy, xprtrdma missed freeing a resource */
>> +	if (ia->ri_pd && !IS_ERR(ia->ri_pd))
>> +		WARN_ON(ib_dealloc_pd(ia->ri_pd));
>> }
>> 
>> /*
>> 
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>> 
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

--
Chuck Lever
chuck[dot]lever[at]oracle[dot]com



--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/net/sunrpc/xprtrdma/verbs.c b/net/sunrpc/xprtrdma/verbs.c
index 4870d27..51900e6 100644
--- a/net/sunrpc/xprtrdma/verbs.c
+++ b/net/sunrpc/xprtrdma/verbs.c
@@ -702,17 +702,17 @@  rpcrdma_ia_close(struct rpcrdma_ia *ia)
 		dprintk("RPC:       %s: ib_dereg_mr returned %i\n",
 			__func__, rc);
 	}
+
 	if (ia->ri_id != NULL && !IS_ERR(ia->ri_id)) {
 		if (ia->ri_id->qp)
 			rdma_destroy_qp(ia->ri_id);
 		rdma_destroy_id(ia->ri_id);
 		ia->ri_id = NULL;
 	}
-	if (ia->ri_pd != NULL && !IS_ERR(ia->ri_pd)) {
-		rc = ib_dealloc_pd(ia->ri_pd);
-		dprintk("RPC:       %s: ib_dealloc_pd returned %i\n",
-			__func__, rc);
-	}
+
+	/* If the pd is still busy, xprtrdma missed freeing a resource */
+	if (ia->ri_pd && !IS_ERR(ia->ri_pd))
+		WARN_ON(ib_dealloc_pd(ia->ri_pd));
 }
 
 /*