From patchwork Thu Aug 9 13:03:15 2012 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Idan Kedar X-Patchwork-Id: 1300701 Return-Path: X-Original-To: patchwork-linux-nfs@patchwork.kernel.org Delivered-To: patchwork-process-083081@patchwork2.kernel.org Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by patchwork2.kernel.org (Postfix) with ESMTP id 1AAFFDFF7B for ; Thu, 9 Aug 2012 13:04:02 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1758003Ab2HIND7 (ORCPT ); Thu, 9 Aug 2012 09:03:59 -0400 Received: from mail-pb0-f46.google.com ([209.85.160.46]:36781 "EHLO mail-pb0-f46.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1757555Ab2HIND4 (ORCPT ); Thu, 9 Aug 2012 09:03:56 -0400 Received: by pbbrr13 with SMTP id rr13so885582pbb.19 for ; Thu, 09 Aug 2012 06:03:56 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20120113; h=mime-version:x-originating-ip:from:date:message-id:subject:to:cc :content-type:x-gm-message-state; bh=1BqINYga3ElajIGNe43kVa7VHlSK2CIFAVRigGN848s=; b=hoffzPDMPU/TlrCpNd6vvFuVaV6bmRa8huC1D3TNEHYHOj7ZYBXPlu3R0wQNrwd5gy S4sjWWcSeBXfm4Z67Gft+4khuluWEgCYn60m5DCrVauM7ISlcYBVQ0EOLnDJNQuSZqbI d4O83sa90ylBPCo/S2p/YsQRI8ywaupskHy5PdKI4SaFloMoZ4J0kp+4+V1FhSIZWWZQ 8Xw8e8SFjFuGPzTH9a/l+GdIHPag7UnknBvDrFkOtDKyhb3jc+XmPXY1qNfnq67yITIy neBNWo5aRJBX9mlRHVbDCbuwZ83XdQ4TnpFJ9TyBOKCw+ExeuYyrTceUBnKWwtj9xq+L lPGQ== Received: by 10.68.138.169 with SMTP id qr9mr4156870pbb.27.1344517435872; Thu, 09 Aug 2012 06:03:55 -0700 (PDT) MIME-Version: 1.0 Received: by 10.68.242.199 with HTTP; Thu, 9 Aug 2012 06:03:15 -0700 (PDT) X-Originating-IP: [79.183.208.134] From: Idan Kedar Date: Thu, 9 Aug 2012 16:03:15 +0300 Message-ID: Subject: return layout on error, BUG/deadlock To: Boaz Harrosh , NFS list Cc: Benny Halevy X-Gm-Message-State: ALoCoQkHdN6lQRN9H+0L3PP1KDLbLdTsnbIbR+wgQ3hpcX3WnW69kTzRTpNer1Po/8+gh6jxRwVW Sender: linux-nfs-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-nfs@vger.kernel.org Hi, As a result of some experiments, I wanted to see what happens when I inject an error (hard coded) to the object layout driver. the patch is at the bottom of this mail. the reason I did this is because when I inject errors in my modified version of the object layout driver, I get the same BUG Tigran reported about yesterday: nfs4proc.c:6252 : BUG_ON(!list_empty(&lo->plh_segs)); In my modified version (based on kernel 3.3), the bug seems to be that pnfs_ld_write_done calls pnfs_return_layout in the error path, even if there is in-flight I/O. I wanted to see if this is a result of my modifications, so I injected errors to the ORE (on kernel 3.5) and ran Connectathon's basic tests, and got a deadlock AND that BUG: [ 112.659066] ================================= [ 112.659066] [ INFO: inconsistent lock state ] [ 112.659066] 3.5.0-nfsobj+ #35 Not tainted [ 112.659066] --------------------------------- [ 112.659066] inconsistent {IN-SOFTIRQ-W} -> {SOFTIRQ-ON-W} usage. [ 112.659066] kworker/0:2/456 [HC0[0]:SC0[0]:HE1:SE1] takes: [ 112.659066] (&(&objlay->lock)->rlock){+.?...}, at: [] objlayout_encode_layoutreturn+0x6c/0x420 [objlayoutdriver] [ 112.659066] {IN-SOFTIRQ-W} state was registered at: [ 112.659066] [] __lock_acquire+0x5e8/0x1bb0 [ 112.659066] [] lock_acquire+0x92/0x120 [ 112.659066] [] _raw_spin_lock+0x36/0x70 [ 112.659066] [] objlayout_iodone.part.1+0x25/0x56 [objlayoutdriver] [ 112.659066] [] objlayout_write_done+0xcb/0xd0 [objlayoutdriver] [ 112.659066] [] _write_done+0x43/0x60 [objlayoutdriver] [ 112.659066] [] _last_io+0x13/0x20 [libore] [ 112.659066] [] _done_io+0x28/0x30 [libore] [ 112.659066] [] osd_request_async_done+0xd4/0xf0 [libosd] [ 112.659066] [] blk_finish_request+0xa8/0x2c0 [ 112.659066] [] blk_end_bidi_request+0x4f/0x80 [ 112.659066] [] blk_end_request+0x10/0x20 [ 112.659066] [] scsi_io_completion+0xaf/0x670 [ 112.659066] [] scsi_finish_command+0xd1/0x130 [ 112.659066] [] scsi_softirq_done+0x13f/0x160 [ 112.659066] [] blk_done_softirq+0x8c/0xa0 [ 112.659066] [] __do_softirq+0xc8/0x250 [ 112.659066] [] call_softirq+0x1c/0x30 [ 112.659066] [] do_softirq+0xa5/0xe0 [ 112.659066] [] local_bh_enable_ip+0xeb/0x100 [ 112.659066] [] _raw_spin_unlock_bh+0x34/0x40 [ 112.659066] [] release_sock+0x14c/0x190 [ 112.659066] [] tcp_sendpage+0xc5/0x6e0 [ 112.659066] [] inet_sendpage+0xd3/0x120 [ 112.659066] [] iscsi_sw_tcp_pdu_xmit+0x116/0x290 [iscsi_tcp] [ 112.659066] [] iscsi_tcp_task_xmit+0xb0/0x2d0 [ 112.659066] [] iscsi_xmit_task+0x5e/0xb0 [ 112.659066] [] iscsi_xmitworker+0x1f7/0x330 [ 112.659066] [] process_one_work+0x19f/0x530 [ 112.659066] [] worker_thread+0x159/0x340 [ 112.659066] [] kthread+0xb7/0xc0 [ 112.659066] [] kernel_thread_helper+0x4/0x10 [ 112.659066] irq event stamp: 56183 [ 112.659066] hardirqs last enabled at (56183): [] local_bh_enable_ip+0x97/0x100 [ 112.659066] hardirqs last disabled at (56181): [] local_bh_enable_ip+0x44/0x100 [ 112.659066] softirqs last enabled at (56182): [] xprt_prepare_transmit+0x70/0x90 [sunrpc] [ 112.659066] softirqs last disabled at (56180): [] _raw_spin_lock_bh+0x18/0x70 [ 112.659066] [ 112.659066] other info that might help us debug this: [ 112.659066] Possible unsafe locking scenario: [ 112.659066] [ 112.659066] CPU0 [ 112.659066] ---- [ 112.659066] lock(&(&objlay->lock)->rlock); [ 112.659066] [ 112.659066] lock(&(&objlay->lock)->rlock); [ 112.659066] [ 112.659066] *** DEADLOCK *** [ 112.659066] [ 112.659066] 2 locks held by kworker/0:2/456: [ 112.659066] #0: (events){.+.+.+}, at: [] process_one_work+0x13c/0x530 [ 112.659066] #1: ((&wdata->task.u.tk_work)){+.+.+.}, at: [] process_one_work+0x13c/0x530 [ 112.659066] [ 112.659066] stack backtrace: [ 112.659066] Pid: 456, comm: kworker/0:2 Not tainted 3.5.0-nfsobj+ #35 [ 112.659066] Call Trace: [ 112.659066] [] print_usage_bug+0x1f5/0x206 [ 112.659066] [] ? save_stack_trace+0x2f/0x50 [ 112.659066] [] mark_lock+0x295/0x2f0 [ 112.659066] [] ? check_usage_forwards+0x140/0x140 [ 112.659066] [] __lock_acquire+0x64a/0x1bb0 [ 112.659066] [] ? trace_hardirqs_off+0xd/0x10 [ 112.659066] [] ? local_clock+0x6f/0x80 [ 112.659066] [] ? native_sched_clock+0x13/0x80 [ 112.659066] [] ? sched_clock+0x9/0x10 [ 112.659066] [] ? sched_clock_local+0x25/0x90 [ 112.659066] [] ? sched_clock_cpu+0xa8/0x110 [ 112.659066] [] ? objlayout_encode_layoutreturn+0x6c/0x420 [objlayoutdriver] [ 112.659066] [] lock_acquire+0x92/0x120 [ 112.659066] [] ? objlayout_encode_layoutreturn+0x6c/0x420 [objlayoutdriver] [ 112.659066] [] ? sched_clock+0x9/0x10 [ 112.659066] [] ? __copy_nfsS_and_zero_terminate+0x90/0x90 [objlayoutdriver] [ 112.659066] [] _raw_spin_lock+0x36/0x70 [ 112.659066] [] ? objlayout_encode_layoutreturn+0x6c/0x420 [objlayoutdriver] [ 112.659066] [] ? sched_clock+0x9/0x10 [ 112.659066] [] objlayout_encode_layoutreturn+0x6c/0x420 [objlayoutdriver] [ 112.659066] [] ? sched_clock_cpu+0xa8/0x110 [ 112.659066] [] ? trace_hardirqs_off+0xd/0x10 [ 112.659066] [] ? local_clock+0x6f/0x80 [ 112.659066] [] ? nfs4_xdr_enc_layoutreturn+0x11a/0x170 [nfs] [ 112.659066] [] ? xdr_encode_opaque_fixed+0x47/0x80 [sunrpc] [ 112.659066] [] ? __copy_nfsS_and_zero_terminate+0x90/0x90 [objlayoutdriver] [ 112.659066] [] nfs4_xdr_enc_layoutreturn+0x149/0x170 [nfs] [ 112.659066] [] ? local_bh_enable_ip+0x97/0x100 [ 112.659066] [] ? xprt_prepare_transmit+0x70/0x90 [sunrpc] [ 112.659066] [] ? nfs4_xdr_enc_commit+0xe0/0xe0 [nfs] [ 112.659066] [] rpcauth_wrap_req+0x65/0x70 [sunrpc] [ 112.659066] [] call_transmit+0x18e/0x260 [sunrpc] [ 112.659066] [] __rpc_execute+0x70/0x280 [sunrpc] [ 112.659066] [] ? call_connect+0x40/0x40 [sunrpc] [ 112.659066] [] ? call_connect+0x40/0x40 [sunrpc] [ 112.659066] [] ? wake_up_bit+0x2e/0x40 [ 112.659066] [] ? __copy_nfsS_and_zero_terminate+0x90/0x90 [objlayoutdriver] [ 112.659066] [] rpc_execute+0x4f/0xb0 [sunrpc] [ 112.659066] [] ? __copy_nfsS_and_zero_terminate+0x90/0x90 [objlayoutdriver] [ 112.659066] [] rpc_run_task+0x75/0x90 [sunrpc] [ 112.659066] [] nfs4_proc_layoutreturn+0x86/0xb0 [nfs] [ 112.659066] [] _pnfs_return_layout+0x114/0x180 [nfs] [ 112.659066] [] pnfs_ld_write_done+0x67/0xe0 [nfs] [ 112.659066] [] _rpc_write_complete+0x15/0x20 [objlayoutdriver] [ 112.659066] [] process_one_work+0x19f/0x530 [ 112.659066] [] ? process_one_work+0x13c/0x530 [ 112.659066] [] worker_thread+0x159/0x340 [ 112.659066] [] ? manage_workers+0x230/0x230 [ 112.659066] [] kthread+0xb7/0xc0 [ 112.659066] [] ? trace_hardirqs_on_caller+0x105/0x190 [ 112.659066] [] kernel_thread_helper+0x4/0x10 [ 112.659066] [] ? retint_restore_args+0x13/0x13 [ 112.659066] [] ? __init_kthread_worker+0x70/0x70 [ 112.659066] [] ? gs_change+0x13/0x13 [ 112.787051] ------------[ cut here ]------------ [ 112.787799] kernel BUG at fs/nfs/nfs4proc.c:6252! [ 112.787799] invalid opcode: 0000 [#1] SMP DEBUG_PAGEALLOC [ 112.787799] CPU 0 [ 112.787799] Modules linked in:[ 112.787799] nfs_layout_nfsv41_files objlayoutdriver exofs libore osd libosd iscsi_tcp netconsole nfs nfsd fscache lockd auth_rpcgss nfs_acl sunrpc e1000 rtc_cmos serio_raw [last unloaded: scsi_wait_scan] [ 112.787799] Pid: 456, comm: kworker/0:2 Not tainted 3.5.0-nfsobj+ #35 innotek GmbH VirtualBox [ 112.787799] RIP: 0010:[] [] nfs4_layoutreturn_done+0xd5/0xe0 [nfs] [ 112.787799] RSP: 0018:ffff88003c7bdb50 EFLAGS: 00010206 [ 112.787799] RAX: ffff880034480b90 RBX: ffff88003cbb5860 RCX: ffff88003add4ca8 [ 112.787799] RDX: 0000000000000000 RSI: ffff8800354b1288 RDI: 0000000000000246 [ 112.787799] RBP: ffff88003c7bdb70 R08: 0000000000000002 R09: 0000000000000000 [ 112.787799] R10: 0000000000000000 R11: 0000000000000000 R12: ffff88003a782240 [ 112.787799] R13: ffff880034480b68 R14: ffff88003a787138 R15: ffffffffa02551a0 [ 112.787799] FS: 0000000000000000(0000) GS:ffff88003e200000(0000) knlGS:0000000000000000 [ 112.787799] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b [ 112.787799] CR2: 0000003f33c9b640 CR3: 0000000034070000 CR4: 00000000000006f0 [ 112.787799] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 112.787799] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 [ 112.787799] Process kworker/0:2 (pid: 456, threadinfo ffff88003c7bc000, task ffff88003add4620) [ 112.787799] Stack: [ 112.787799] ffff88003a787138 ffff88003a782240 ffff8800347c4858 ffff88003c7bdcf8 [ 112.787799] ffff88003c7bdb90 ffffffffa00391fc ffff88003a787138 ffff88003a782240 [ 112.787799] ffff88003c7bdc00 ffffffffa003a360 000000003c7bdbc0 ffff88003a7822b0 [ 112.787799] Call Trace: [ 112.787799] [] rpc_exit_task+0x2c/0x90 [sunrpc] [ 112.787799] [] __rpc_execute+0x70/0x280 [sunrpc] [ 112.787799] [] ? rpc_async_release+0x20/0x20 [sunrpc] [ 112.787799] [] ? rpc_async_release+0x20/0x20 [sunrpc] [ 112.787799] [] ? wake_up_bit+0x2e/0x40 [ 112.787799] [] ? __copy_nfsS_and_zero_terminate+0x90/0x90 [objlayoutdriver] [ 112.787799] [] rpc_execute+0x4f/0xb0 [sunrpc] [ 112.787799] [] ? __copy_nfsS_and_zero_terminate+0x90/0x90 [objlayoutdriver] [ 112.787799] [] rpc_run_task+0x75/0x90 [sunrpc] [ 112.787799] [] nfs4_proc_layoutreturn+0x86/0xb0 [nfs] [ 112.787799] [] _pnfs_return_layout+0x114/0x180 [nfs] [ 112.787799] [] pnfs_ld_write_done+0x67/0xe0 [nfs] [ 112.787799] [] _rpc_write_complete+0x15/0x20 [objlayoutdriver] [ 112.787799] [] process_one_work+0x19f/0x530 [ 112.787799] [] ? process_one_work+0x13c/0x530 [ 112.787799] [] worker_thread+0x159/0x340 [ 112.787799] [] ? manage_workers+0x230/0x230 [ 112.787799] [] kthread+0xb7/0xc0 [ 112.787799] [] ? trace_hardirqs_on_caller+0x105/0x190 [ 112.787799] [] kernel_thread_helper+0x4/0x10 [ 112.787799] [] ? retint_restore_args+0x13/0x13 [ 112.787799] [] ? __init_kthread_worker+0x70/0x70 [ 112.787799] [] ? gs_change+0x13/0x13 [ 112.787799] Code: c3 0f 1f 44 00 00 48 8d 73 64 ba 01 00 00 00 4c 89 ef e8 ff ab 01 00 eb c8 0f 1f 44 00 00 4c 89 e7 e8 e0 66 ed ff e9 5a ff ff ff <0f> 0b 66 0f 1f 84 00 00 00 00 00 55 48 89 e5 53 48 83 ec 08 66 [ 112.787799] RIP [] nfs4_layoutreturn_done+0xd5/0xe0 [nfs] [ 112.787799] RSP [ 112.872492] ---[ end trace 114822acc0612b2a ]--- My setup: Data server is osd-osc emulator MDS is nfsd-exofs on kernel 3.3 (Benny's kernel) Is there any fix for this? Or is it not an issue and my injection is just wrong and stupid? I've seen some discussions on this in the mailing list, and I saw the patch that removes the BUG, but I'm not sure if this is the right thing to do for object layout. --- fs/exofs/ore.c | 11 ++++++++--- 1 files changed, 8 insertions(+), 3 deletions(-) /* start read offset passed endof file */ diff --git a/fs/exofs/ore.c b/fs/exofs/ore.c index 24a49d4..e9d7b45 100644 --- a/fs/exofs/ore.c +++ b/fs/exofs/ore.c @@ -426,9 +426,14 @@ int ore_check_io(struct ore_io_state *ios, ore_on_dev_error on_dev_error) if (unlikely(!or)) continue; - ret = osd_req_decode_sense(or, &osi); - if (likely(!ret)) - continue; + if (jiffies % 6 == 0) { + osi.osd_err_pri = OSD_ERR_PRI_EIO; + ret = -EIO; + } else { + ret = osd_req_decode_sense(or, &osi); + if (likely(!ret)) + continue; + } if (OSD_ERR_PRI_CLEAR_PAGES == osi.osd_err_pri) {