
[v2,07/10] xprtrdma: Display async errors

Message ID 20141109011501.8806.23478.stgit@manet.1015granger.net (mailing list archive)
State New, archived

Commit Message

Chuck Lever Nov. 9, 2014, 1:15 a.m. UTC
An async error upcall is a hard error, and should be reported in
the system log.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
---
 net/sunrpc/xprtrdma/verbs.c |   36 ++++++++++++++++++++++++++++++++----
 1 file changed, 32 insertions(+), 4 deletions(-)


--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

Sagi Grimberg Nov. 11, 2014, 2:30 p.m. UTC | #1
On 11/9/2014 3:15 AM, Chuck Lever wrote:
> An async error upcall is a hard error, and should be reported in
> the system log.
>

Could be useful to others... Any chance you put this in ib_core for all
of us?

Sagi.
Chuck Lever Nov. 11, 2014, 4:52 p.m. UTC | #2
> On Nov 11, 2014, at 8:30 AM, Sagi Grimberg <sagig@dev.mellanox.co.il> wrote:
> 
>> On 11/9/2014 3:15 AM, Chuck Lever wrote:
>> An async error upcall is a hard error, and should be reported in
>> the system log.
>> 
> 
> Could be useful to others... Any chance you put this in ib_core for all
> of us?

Eventually. We certainly wouldn't want copies of this array of strings
to appear many times in the kernel. That would be a waste of space.

I have a similar patch that adds an array for CQ status codes, and
xprtrdma has a string array already for connection status. Are those
also interesting?
Sagi Grimberg Nov. 11, 2014, 6:49 p.m. UTC | #3
On 11/11/2014 6:52 PM, Chuck Lever wrote:
>
>> On Nov 11, 2014, at 8:30 AM, Sagi Grimberg <sagig@dev.mellanox.co.il> wrote:
>>
>>> On 11/9/2014 3:15 AM, Chuck Lever wrote:
>>> An async error upcall is a hard error, and should be reported in
>>> the system log.
>>>
>>
>> Could be useful to others... Any chance you put this in ib_core for all
>> of us?
>
> Eventually. We certainly wouldn't want copies of this array of strings
> to appear many times in the kernel. That would be a waste of space.
>
> I have a similar patch that adds an array for CQ status codes, and
> xprtrdma has a string array already for connection status. Are those
> also interesting?
>

Yep, also RDMA_CM events. Would certainly help people avoid source
code navigation to understand what is going on...
Or Gerlitz Nov. 11, 2014, 8:30 p.m. UTC | #4
On Tue, Nov 11, 2014 at 8:49 PM, Sagi Grimberg <sagig@dev.mellanox.co.il> wrote:
> On 11/11/2014 6:52 PM, Chuck Lever wrote:
>>
>>
>>> On Nov 11, 2014, at 8:30 AM, Sagi Grimberg <sagig@dev.mellanox.co.il>
>>> wrote:
>>>
>>>> On 11/9/2014 3:15 AM, Chuck Lever wrote:
>>>> An async error upcall is a hard error, and should be reported in
>>>> the system log.
>>>>
>>>
>>> Could be useful to others... Any chance you put this in ib_core for all
>>> of us?
>>
>>
>> Eventually. We certainly wouldn't want copies of this array of strings
>> to appear many times in the kernel. That would be a waste of space.
>>
>> I have a similar patch that adds an array for CQ status codes, and
>> xprtrdma has a string array already for connection status. Are those
>> also interesting?
>>
>
> Yep, also RDMA_CM events. Would certainly help people avoid source
> code navigation to understand what is going on...

Oh yes, Chuck, it would be good if you could pick this up. AFAIR, most of the
strings are already in the RDS code (net/rds). Please refactor them
from there into some IB core helpers. Thanks a lot!

Patch

diff --git a/net/sunrpc/xprtrdma/verbs.c b/net/sunrpc/xprtrdma/verbs.c
index e6ac964..5783c1a 100644
--- a/net/sunrpc/xprtrdma/verbs.c
+++ b/net/sunrpc/xprtrdma/verbs.c
@@ -106,6 +106,32 @@  rpcrdma_run_tasklet(unsigned long data)
 
 static DECLARE_TASKLET(rpcrdma_tasklet_g, rpcrdma_run_tasklet, 0UL);
 
+static const char * const async_event[] = {
+	"CQ error",
+	"QP fatal error",
+	"QP request error",
+	"QP access error",
+	"communication established",
+	"send queue drained",
+	"path migration successful",
+	"path mig error",
+	"device fatal error",
+	"port active",
+	"port error",
+	"LID change",
+	"P_key change",
+	"SM change",
+	"SRQ error",
+	"SRQ limit reached",
+	"last WQE reached",
+	"client reregister",
+	"GID change",
+};
+
+#define ASYNC_MSG(status)					\
+	((status) < ARRAY_SIZE(async_event) ?			\
+		async_event[(status)] : "unknown async error")
+
 static void
 rpcrdma_schedule_tasklet(struct list_head *sched_list)
 {
@@ -122,8 +148,9 @@  rpcrdma_qp_async_error_upcall(struct ib_event *event, void *context)
 {
 	struct rpcrdma_ep *ep = context;
 
-	dprintk("RPC:       %s: QP error %X on device %s ep %p\n",
-		__func__, event->event, event->device->name, context);
+	pr_err("RPC:       %s: %s on device %s ep %p\n",
+	       __func__, ASYNC_MSG(event->event),
+		event->device->name, context);
 	if (ep->rep_connected == 1) {
 		ep->rep_connected = -EIO;
 		ep->rep_func(ep);
@@ -136,8 +163,9 @@  rpcrdma_cq_async_error_upcall(struct ib_event *event, void *context)
 {
 	struct rpcrdma_ep *ep = context;
 
-	dprintk("RPC:       %s: CQ error %X on device %s ep %p\n",
-		__func__, event->event, event->device->name, context);
+	pr_err("RPC:       %s: %s on device %s ep %p\n",
+	       __func__, ASYNC_MSG(event->event),
+		event->device->name, context);
 	if (ep->rep_connected == 1) {
 		ep->rep_connected = -EIO;
 		ep->rep_func(ep);
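
For readers following the ib_core discussion above, here is a rough userspace sketch of what a single shared lookup helper might look like, as opposed to a per-module string array plus macro. It is an illustration only, not code from this series: the demo_* names and the enum are hypothetical stand-ins, and the enum values are assumed to start at 0 and follow the order of the strings in the patch. Designated initializers tie each string to its enum value, so a gap or a reordering in the enum yields "unknown" rather than a mismatched string.

```c
#include <stddef.h>

/* Userspace stand-in for the kernel's ARRAY_SIZE() macro. */
#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

/*
 * Hypothetical stand-in for the verbs event enum. Only a few values
 * are listed; the numbering is an assumption made for this sketch.
 */
enum demo_event_type {
	DEMO_EVENT_CQ_ERR = 0,
	DEMO_EVENT_QP_FATAL = 1,
	DEMO_EVENT_QP_REQ_ERR = 2,
	DEMO_EVENT_GID_CHANGE = 18,
};

/* Slots without an initializer (3..17 here) are NULL. */
static const char * const demo_event_str[] = {
	[DEMO_EVENT_CQ_ERR]	= "CQ error",
	[DEMO_EVENT_QP_FATAL]	= "QP fatal error",
	[DEMO_EVENT_QP_REQ_ERR]	= "QP request error",
	[DEMO_EVENT_GID_CHANGE]	= "GID change",
};

/*
 * One shared helper instead of a copy of the table in every consumer.
 * Negative values, out-of-range values, and empty slots all fall back
 * to a fixed string, so the result can be handed to a "%s" format
 * without further checks.
 */
const char *demo_event_msg(int event)
{
	size_t index = (size_t)event;

	if (event < 0 || index >= ARRAY_SIZE(demo_event_str) ||
	    !demo_event_str[index])
		return "unknown async error";
	return demo_event_str[index];
}
```

A caller would then pass demo_event_msg(event->event) wherever the patch uses ASYNC_MSG(event->event). The explicit NULL check is the one behavioral difference from the macro in the patch, which relies on the array having no holes.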