diff mbox

list corruption in IPOIB

Message ID 519898B0.1000901@profitbricks.com (mailing list archive)
State Rejected
Headers show

Commit Message

Jinpu Wang May 19, 2013, 9:17 a.m. UTC
On 2013?05?19? 08:00, Or Gerlitz wrote:
> On 19/05/2013 00:36, Jack Wang wrote:
>> I tried 3.4.23, and mainline kernel from Roland's rdma-for-linus, we
>> added bug injection interface,  run multithread iperf, and switched ib
>> mode between connected and datagram in sync on each side as Shlomo
>> suggested.
> 
> Can you be more specific re the  bug injection interface, is that
> existing kernel mechanism or something you added? so the bug triggers
> when you run iperf in multi-threaded mode AND in parallel inject errors
> AND  in parallel switch between datagram and connected mode? bee --- I
> assume this isn't something you do just for the fun of it... so some
> problem X hits you in production and this problem Y you get with the
> above juggling, any known or empiric relation between the two?
> 
> Or.

we added inject_bug sysfs node to make function run into error case,
like something below.
Yes, you are right, we want to speedup the bug reproduce process,
and we saw the warning and come to conclusion the neigh->list corrupted
some where.
What's your opinion?

Regards,
Jack
                           wc->status, wr_id, wc->vendor_err);
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

Or Gerlitz May 20, 2013, 9:05 a.m. UTC | #1
On 19/05/2013 12:17, Jack Wang wrote:
> we added inject_bug sysfs node to make function run into error case, like something below. Yes, you are right, we want to speedup the bug reproduce process, and we saw the warning and come to conclusion the neigh->list corrupted some where. What's your opinion?

Yes, for the synthetic experiment you made, there's a possible point here:

Under the CM error flow (e.g your injected error), we delete a 
neighbour, but we also do it from the flow that flush neighbours (e.g 
when changing the device mode from UD to CM, as you did in your script). 
When this happens concurrently, these two code pieces call for deleting 
the neighbour from the list. So the spinlock might not be enough and we 
should have do list_del_init(&neigh->list) instead of list_del, helps?

Or.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Jinpu Wang May 20, 2013, 9:10 a.m. UTC | #2
which list_del do you mean? in ipoib_cm_tx_start?

On Mon, May 20, 2013 at 11:05 AM, Or Gerlitz <ogerlitz@mellanox.com> wrote:
> On 19/05/2013 12:17, Jack Wang wrote:
>>
>> we added inject_bug sysfs node to make function run into error case, like
>> something below. Yes, you are right, we want to speedup the bug reproduce
>> process, and we saw the warning and come to conclusion the neigh->list
>> corrupted some where. What's your opinion?
>
>
> Yes, for the synthetic experiment you made, there's a possible point here:
>
> Under the CM error flow (e.g your injected error), we delete a neighbour,
> but we also do it from the flow that flush neighbours (e.g when changing the
> device mode from UD to CM, as you did in your script). When this happens
> concurrently, these two code pieces call for deleting the neighbour from the
> list. So the spinlock might not be enough and we should have do
> list_del_init(&neigh->list) instead of list_del, helps?
>
> Or.
Or Gerlitz May 20, 2013, 10:58 a.m. UTC | #3
On 20/05/2013 12:10, Jinpu Wang wrote:
> which list_del do you mean? in ipoib_cm_tx_start?
yes, but not only, you can start with 5KG hammer and convert all 
thesehits to list_del_init

linux-2.6]# grep list_del drivers/infiniband/ulp/ipoib/*.c | grep neigh
drivers/infiniband/ulp/ipoib/ipoib_cm.c: list_del(&neigh->list);
drivers/infiniband/ulp/ipoib/ipoib_cm.c: list_del(&neigh->list);
drivers/infiniband/ulp/ipoib/ipoib_cm.c: list_del(&neigh->list);
drivers/infiniband/ulp/ipoib/ipoib_main.c: list_del(&neigh->list);
drivers/infiniband/ulp/ipoib/ipoib_main.c: list_del(&neigh->list);
drivers/infiniband/ulp/ipoib/ipoib_main.c: list_del(&neigh->list);
drivers/infiniband/ulp/ipoib/ipoib_main.c: list_del(&neigh->list);
drivers/infiniband/ulp/ipoib/ipoib_main.c: list_del(&neigh->list);
drivers/infiniband/ulp/ipoib/ipoib_main.c: list_del(&neigh->list);

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Jinpu Wang May 20, 2013, 12:46 p.m. UTC | #4
A quick test show the list_corruption warning is gone, after I convert
 all list_del(&neigh->list) to  list_del_list(&neigh->list).

Test is still running, will update status if anything wrong.

Thanks Or.



On Mon, May 20, 2013 at 12:58 PM, Or Gerlitz <ogerlitz@mellanox.com> wrote:
> On 20/05/2013 12:10, Jinpu Wang wrote:
>>
>> which list_del do you mean? in ipoib_cm_tx_start?
>
> yes, but not only, you can start with 5KG hammer and convert all thesehits
> to list_del_init
>
> linux-2.6]# grep list_del drivers/infiniband/ulp/ipoib/*.c | grep neigh
> drivers/infiniband/ulp/ipoib/ipoib_cm.c: list_del(&neigh->list);
> drivers/infiniband/ulp/ipoib/ipoib_cm.c: list_del(&neigh->list);
> drivers/infiniband/ulp/ipoib/ipoib_cm.c: list_del(&neigh->list);
> drivers/infiniband/ulp/ipoib/ipoib_main.c: list_del(&neigh->list);
> drivers/infiniband/ulp/ipoib/ipoib_main.c: list_del(&neigh->list);
> drivers/infiniband/ulp/ipoib/ipoib_main.c: list_del(&neigh->list);
> drivers/infiniband/ulp/ipoib/ipoib_main.c: list_del(&neigh->list);
> drivers/infiniband/ulp/ipoib/ipoib_main.c: list_del(&neigh->list);
> drivers/infiniband/ulp/ipoib/ipoib_main.c: list_del(&neigh->list);
>
Or Gerlitz May 20, 2013, 12:51 p.m. UTC | #5
On 20/05/2013 15:46, Jinpu Wang wrote:
> A quick test show the list_corruption warning is gone, after I convert
>   all list_del(&neigh->list) to  list_del_list(&neigh->list).

yes, but this wasn't your original problem or was it?

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Shlomo Pongratz May 20, 2013, 1:38 p.m. UTC | #6
On 5/20/2013 3:58 PM, Jack Wang wrote:
> I haven't reproduced the original bug we saw in our production 
> environment
> BUG: unable to handle kernel
>   at 0000000000000008
> IP: [<ffffffffa0206c30>] ipoib_cm_tx_reap+0xe0/0x5a0 [ib_ipoib]
>  ...
> RIP: 0010:[<ffffffffa0206c30>] [<ffffffffa0206c30>] 
> ipoib_cm_tx_reap+0xe0/0x5a0 [ib_ipoib]
> RSP: 0018:ffff8825fdcbddb0  EFLAGS: 00010086
> RAX: 0000000000000246 RBX: ffff8807b59c29c0 RCX: 0000000000000000
> RDX: 4400000006000002 RSI: 0000000000000246 RDI: ffff8810026527c0
> RBP: ffff881002652000 R08: 0000000000015360 R09: dead000000200200
> R10: dead000000100100 R11: 0000000000000001 R12: 0000000000000001
> R13: 0000000000000000 R14: ffff8810026523a0 R15: ffff8810026527c0
> FS:  00007f4c9a325700(0000) GS:ffff880807c00000(0000) 
> knlGS:0000000000000000
> CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> CR2: 0000000000000008 CR3: 0000002605e3a000 CR4: 00000000000407f0
> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> Process kworker/u:3 (pid: 61374, threadinfo ffff8825fdcbc000, task 
> ffff8807fd0eafb0)
> Stack:
>  ffff8820043303c0
>  ffff880807d52700
>  ffff8807fd0eafb0
>  ffff8825fdcbdde0
>
>  ffff8810026533b8
>  ffffffffa0039868
>  ffff8825fdcbdde0
>  ffff8805fd549a00
>
>  ffffffff81b9d480
>  ffff8807fd2f4000
>  ffffffffa0206b50
>  0000000000000000
>
> Call Trace:
>  [<ffffffffa0039868>] ? process_req+0xe8/0x1a0 [ib_addr]
>  [<ffffffffa0206b50>] ? ipoib_cm_tx_handler+0x2d0/0x2d0 [ib_ipoib]
>  [<ffffffff81052d64>] ? process_one_work+0x114/0x470
>  [<ffffffff81055033>] ? worker_thread+0x163/0x3e0
>  [<ffffffff81054ed0>] ? manage_workers+0x200/0x200
>  [<ffffffff81054ed0>] ? manage_workers+0x200/0x200
>  [<ffffffff8105963e>] ? kthread+0x9e/0xb0
>  [<ffffffff8167e9e4>] ? kernel_thread_helper+0x4/0x10
>  [<ffffffff810595a0>] ? kthread_freezable_should_stop+0x60/0x60
>  [<ffffffff8167e9e0>] ? gs_change+0x13/0x13
>  ...
>  [<ffffffffa01fec30>] ipoib_cm_tx_reap+0xe0/0x5a0 [ib_ipoib]
>  RSP <ffff881d275f1db0>
> ---[ end trace 38ff082cbc03dd75 ]---
> Kernel panic - not syncing: Fatal exception in interrupt
>
>
>
> , only the A variant of the crash in has been reproduced:
>
> WARNING: at lib/list_debug.c:49 __list_del_entry+0x63/0xd0()
> Hardware name: System Product Name
> list_del corruption, ffff88020dbd3080->next is LIST_POISON1 
> (dead000000100100)
> Modules linked in: ...
> Pid: 16248, comm: iperf Tainted: G        W  3.4.23-pserver+ #76
> Call Trace:
>  <IRQ>  [<ffffffff8103c21f>] warn_slowpath_common+0x7f/0xc0
>  [<ffffffff8103c316>] warn_slowpath_fmt+0x46/0x50
>  [<ffffffff81428563>] ? do_raw_spin_lock+0xd3/0x140
>  [<ffffffff81428883>] __list_del_entry+0x63/0xd0
>  [<ffffffff81428901>] list_del+0x11/0x40
>  [<ffffffffa02f64c5>] ipoib_cm_handle_tx_wc+0x225/0x380 [ib_ipoib]
>  [<ffffffffa02eea44>] ipoib_poll+0x164/0x190 [ib_ipoib]
>  [<ffffffff815d91fd>] net_rx_action+0x13d/0x320
>  [<ffffffff81044f29>] ? __do_softirq+0x89/0x380
>  [<ffffffff81044f98>] __do_softirq+0xf8/0x380
>  [<ffffffff8174632c>] call_softirq+0x1c/0x30
>  <EOI>  [<ffffffff81004305>] do_softirq+0x95/0xd0
>  [<ffffffff815daacc>] ? dev_queue_xmit+0x29c/0xbf0
>  [<ffffffff8104461b>] local_bh_enable+0xeb/0xf0
>  [<ffffffff815daacc>] dev_queue_xmit+0x29c/0xbf0
>  [<ffffffff815da830>] ? ptype_seq_start+0xb0/0xb0
>  [<ffffffff815e0d87>] neigh_connected_output+0xc7/0x110
>  [<ffffffff8109f36d>] ? trace_hardirqs_on+0xd/0x10
>  [<ffffffff81617386>] ip_finish_output2+0x1c6/0x460
>  [<ffffffff8161723a>] ? ip_finish_output2+0x7a/0x460
>  [<ffffffff81619033>] ip_finish_output+0xc3/0x230
>  [<ffffffff81619510>] ip_output+0xa0/0x110
>  [<ffffffff8161764d>] ip_local_out+0x2d/0x90
>  [<ffffffff816176cb>] ip_send_skb+0x1b/0x60
>  [<ffffffff8163f27b>] udp_send_skb+0x10b/0x380
>  [<ffffffff815c3a70>] ? sock_def_wakeup+0x1b0/0x1b0
>  [<ffffffff81616e90>] ? ip_append_page+0x530/0x530
>  [<ffffffff81641462>] udp_sendmsg+0x3b2/0xb50
>  [<ffffffff8173c530>] ? retint_restore_args+0x13/0x13
>  [<ffffffff8164d9a0>] ? inet_create+0x5b0/0x5b0
>  [<ffffffff815c2310>] ? sock_update_classid+0x150/0x2b0
>  [<ffffffff8164dacb>] inet_sendmsg+0x12b/0x240
>  [<ffffffff8164d9a0>] ? inet_create+0x5b0/0x5b0
>  [<ffffffff815c2272>] ? sock_update_classid+0xb2/0x2b0
>  [<ffffffff815c2310>] ? sock_update_classid+0x150/0x2b0
>  [<ffffffff815bda40>] sock_aio_write+0x190/0x1b0
>  [<ffffffff8142214e>] ? trace_hardirqs_on_thunk+0x3a/0x3f
>  [<ffffffff8116e82a>] do_sync_write+0xea/0x130
>  [<ffffffff8109bfdd>] ? trace_hardirqs_off+0xd/0x10
>  [<ffffffff811713d3>] ? fget_light+0x43/0x490
>  [<ffffffff813b14f3>] ? security_file_permission+0x23/0x90
>  [<ffffffff8116ee82>] vfs_write+0x172/0x190
>  [<ffffffff8116ef91>] sys_write+0x51/0x90
>  [<ffffffff81744de9>] system_call_fastpath+0x16/0x1b
> ---[ end trace 66110390802a41db ]---
>
> after apply
> commit fa16ebed31f336e41970f3f0ea9e8279f6be2d27
>   Author: Shlomo Pongratz <shlomop@mellanox.com 
> <mailto:shlomop@mellanox.com>>
>   Date:   Mon Aug 13 14:39:49 2012 +0000
>
>       IB/ipoib: Add missing locking when CM object is deleted
>
> Above warning is gone, but we still see the warning at the begin of 
> this thread.
>
>
>
> 2013/5/20 Or Gerlitz <ogerlitz@mellanox.com 
> <mailto:ogerlitz@mellanox.com>>
>
>     On 20/05/2013 15:46, Jinpu Wang wrote:
>
>         A quick test show the list_corruption warning is gone, after I
>         convert
>           all list_del(&neigh->list) to  list_del_list(&neigh->list).
>
>
>     yes, but this wasn't your original problem or was it?
>
>
>     --
>     To unsubscribe from this list: send the line "unsubscribe
>     linux-rdma" in
>     the body of a message to majordomo@vger.kernel.org
>     <mailto:majordomo@vger.kernel.org>
>     More majordomo info at http://vger.kernel.org/majordomo-info.html
>
>

Hi Jack,

I don't understand what is the current status, that is what do you see 
now after applying the patches.
If you don't get the original bug why did you gave the trace of it? Or 
is it a new trace? It is not clear from your mail.
Please add only the trace of the current issue.

Best regards,

S.P.


--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Jinpu Wang May 20, 2013, 2:36 p.m. UTC | #7
> 
> Hi Jack,
> 
> I don't understand what is the current status, that is what do you see
> now after applying the patches.
> If you don't get the original bug why did you gave the trace of it? Or
> is it a new trace? It is not clear from your mail.
> Please add only the trace of the current issue.
> 
> Best regards,
> 
> S.P.

Hi Shlomo,

Sorry for confusion.
Current list corruption is gone in my preliminary test, after I changed
list_del to list_del_init as Or suggested.

As Or asked for the original bug, so I just want to show him the whole
story.

Regards,
Jack


> 
> 

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Or Gerlitz May 20, 2013, 7 p.m. UTC | #8
On Mon, May 20, 2013 at 5:36 PM, Jack Wang <jinpu.wang@profitbricks.com> wrote:
> Sorry for confusion. Current list corruption is gone in my preliminary test, after I changed
> list_del to list_del_init as Or suggested.
> As Or asked for the original bug, so I just want to show him the whole story.

I am still not clear if  the bug you saw in your production
environment is gone with the list_del_init patch applied, please
clarify.

Or.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Jinpu Wang May 20, 2013, 7:38 p.m. UTC | #9
On 2013?05?20? 21:00, Or Gerlitz wrote:
> On Mon, May 20, 2013 at 5:36 PM, Jack Wang <jinpu.wang@profitbricks.com> wrote:
>> Sorry for confusion. Current list corruption is gone in my preliminary test, after I changed
>> list_del to list_del_init as Or suggested.
>> As Or asked for the original bug, so I just want to show him the whole story.
> 
> I am still not clear if  the bug you saw in your production
> environment is gone with the list_del_init patch applied, please
> clarify.
> 
> Or.
> 
The bug in our production environment is introduced in our backport
about ipoib fixes from mainline, and when we hit that bug we reverted
back to old kernel without the backport patch, and the bug didn't happen
for now.

This list_del_init patch do fix list corruption warning, but it's not
the one we hit in production, the list corruption is reproduced in our
test setup with bug injection patch && iperf -P 50 && mode switch.

Is this clear for you now?

Jack
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Or Gerlitz May 20, 2013, 7:50 p.m. UTC | #10
On Mon, May 20, 2013 at 10:38 PM, Jack Wang <jinpu.wang@profitbricks.com> wrote:
>
> The bug in our production environment is introduced in our backport
> about ipoib fixes from mainline, and when we hit that bug we reverted
> back to old kernel without the backport patch, and the bug didn't happen for now.
>
> This list_del_init patch do fix list corruption warning, but it's not the one we hit in production, the list corruption is reproduced in our test setup with bug injection patch && iperf -P 50 && mode switch.
>
> Is this clear for you now?


NO, you say that the list_del_init patch eliminates the "list
corruption warning", does the "list corruption" is still reproduced in
your test setup even when  the patch is applied?! what's the trace?
and what is the trace you see in your production when using kernel X
(which?) patches with commit Y (which?)

Or.
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Jinpu Wang May 20, 2013, 7:57 p.m. UTC | #11
On 2013?05?20? 21:50, Or Gerlitz wrote:
> On Mon, May 20, 2013 at 10:38 PM, Jack Wang <jinpu.wang@profitbricks.com> wrote:
>>
>> The bug in our production environment is introduced in our backport
>> about ipoib fixes from mainline, and when we hit that bug we reverted
>> back to old kernel without the backport patch, and the bug didn't happen for now.
>>
>> This list_del_init patch do fix list corruption warning, but it's not the one we hit in production, the list corruption is reproduced in our test setup with bug injection patch && iperf -P 50 && mode switch.
>>
>> Is this clear for you now?
> 
> 
> NO, you say that the list_del_init patch eliminates the "list
> corruption warning", does the "list corruption" is still reproduced in
> your test setup even when  the patch is applied?! what's the trace?
> and what is the trace you see in your production when using kernel X
> (which?) patches with commit Y (which?)
> 
> Or.
> 

No, the list_del_init fixed the list corruption warning, no list
corruption anymore.

The trace is in my earlier mail in this thread.

Jack



--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

--- a/drivers/infiniband/ulp/ipoib/ipoib_cm.c
+++ b/drivers/infiniband/ulp/ipoib/ipoib_cm.c
@@ -797,10 +797,12 @@  void ipoib_cm_handle_tx_wc(struct net_device *dev,
struct ib_wc *wc)
            test_bit(IPOIB_FLAG_ADMIN_UP, &priv->flags))
                netif_wake_queue(dev);

-       if (wc->status != IB_WC_SUCCESS &&
-           wc->status != IB_WC_WR_FLUSH_ERR) {
+       if (priv->inject_bug ||
+           (wc->status != IB_WC_SUCCESS &&
+            wc->status != IB_WC_WR_FLUSH_ERR)) {
                struct ipoib_neigh *neigh;

+               priv->inject_bug = 0;
                ipoib_dbg(priv, "failed cm send event "
                           "(status=%d, wrid=%d vend_err %x)\n",