From patchwork Thu Feb 7 16:41:34 2013
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: "J. Bruce Fields"
X-Patchwork-Id: 2112071
Date: Thu, 7 Feb 2013 11:41:34 -0500
From: "J. Bruce Fields"
To: Yan Burman
Cc: linux-nfs@vger.kernel.org, swise@opengridcomputing.com,
	linux-rdma@vger.kernel.org, Or Gerlitz
Subject: Re: NFS over RDMA crashing
Message-ID: <20130207164134.GK3222@fieldses.org>
References: <51127B3F.2090200@mellanox.com> <20130206222435.GL16417@fieldses.org>
Content-Disposition: inline
In-Reply-To: <20130206222435.GL16417@fieldses.org>
User-Agent: Mutt/1.5.21 (2010-09-15)
Sender: linux-rdma-owner@vger.kernel.org
Precedence: bulk
X-Mailing-List: linux-rdma@vger.kernel.org

On Wed, Feb 06, 2013 at 05:24:35PM -0500, J.
Bruce Fields wrote:
> On Wed, Feb 06, 2013 at 05:48:15PM +0200, Yan Burman wrote:
> > When killing mount command that got stuck:
> > -------------------------------------------
> >
> > BUG: unable to handle kernel paging request at ffff880324dc7ff8
> > IP: [] rdma_read_xdr+0x8bb/0xd40 [svcrdma]
> > PGD 1a0c063 PUD 32f82e063 PMD 32f2fd063 PTE 8000000324dc7161
> > Oops: 0003 [#1] PREEMPT SMP
> > Modules linked in: md5 ib_ipoib xprtrdma svcrdma rdma_cm ib_cm iw_cm
> > ib_addr nfsd exportfs netconsole ip6table_filter ip6_tables
> > iptable_filter ip_tables ebtable_nat nfsv3 nfs_acl ebtables x_tables
> > nfsv4 auth_rpcgss nfs lockd autofs4 sunrpc target_core_iblock
> > target_core_file target_core_pscsi target_core_mod configfs 8021q
> > bridge stp llc ipv6 dm_mirror dm_region_hash dm_log vhost_net
> > macvtap macvlan tun uinput iTCO_wdt iTCO_vendor_support kvm_intel
> > kvm crc32c_intel microcode pcspkr joydev i2c_i801 lpc_ich mfd_core
> > ehci_pci ehci_hcd sg ioatdma ixgbe mdio mlx4_ib ib_sa ib_mad ib_core
> > mlx4_en mlx4_core igb hwmon dca ptp pps_core button dm_mod ext3 jbd
> > sd_mod ata_piix libata uhci_hcd megaraid_sas scsi_mod
> > CPU 6
> > Pid: 4744, comm: nfsd Not tainted 3.8.0-rc5+ #4 Supermicro
> > X8DTH-i/6/iF/6F/X8DTH
> > RIP: 0010:[] [] rdma_read_xdr+0x8bb/0xd40 [svcrdma]
> > RSP: 0018:ffff880324c3dbf8 EFLAGS: 00010297
> > RAX: ffff880324dc8000 RBX: 0000000000000001 RCX: ffff880324dd8428
> > RDX: ffff880324dc7ff8 RSI: ffff880324dd8428 RDI: ffffffff81149618
> > RBP: ffff880324c3dd78 R08: 000060f9c0000860 R09: 0000000000000001
> > R10: ffff880324dd8000 R11: 0000000000000001 R12: ffff8806299dcb10
> > R13: 0000000000000003 R14: 0000000000000001 R15: 0000000000000010
> > FS: 0000000000000000(0000) GS:ffff88063fc00000(0000) knlGS:0000000000000000
> > CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> > CR2: ffff880324dc7ff8 CR3: 0000000001a0b000 CR4: 00000000000007e0
> > DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> > DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> > Process nfsd (pid: 4744, threadinfo ffff880324c3c000, task ffff880330550000)
> > Stack:
> > ffff880324c3dc78 ffff880324c3dcd8 0000000000000282 ffff880631cec000
> > ffff880324dd8000 ffff88062ed33040 0000000124c3dc48 ffff880324dd8000
> > ffff88062ed33058 ffff880630ce2b90 ffff8806299e8000 0000000000000003
> > Call Trace:
> > [] svc_rdma_recvfrom+0x3ee/0xd80 [svcrdma]
> > [] ? try_to_wake_up+0x2f0/0x2f0
> > [] svc_recv+0x3ef/0x4b0 [sunrpc]
> > [] ? nfsd_svc+0x740/0x740 [nfsd]
> > [] nfsd+0xad/0x130 [nfsd]
> > [] ? nfsd_svc+0x740/0x740 [nfsd]
> > [] kthread+0xd6/0xe0
> > [] ? __init_kthread_worker+0x70/0x70
> > [] ret_from_fork+0x7c/0xb0
> > [] ? __init_kthread_worker+0x70/0x70
> > Code: 63 c2 49 8d 8c c2 18 02 00 00 48 39 ce 77 e1 49 8b 82 40 0a 00
> > 00 48 39 c6 0f 84 92 f7 ff ff 90 48 8d 50 f8 49 89 92 40 0a 00 00
> > <48> c7 40 f8 00 00 00 00 49 8b 82 40 0a 00 00 49 3b 82 30 0a 00
> > RIP [] rdma_read_xdr+0x8bb/0xd40 [svcrdma]
> > RSP
> > CR2: ffff880324dc7ff8
> > ---[ end trace 06d0384754e9609a ]---
> >
> >
> > It seems that commit afc59400d6c65bad66d4ad0b2daf879cbff8e23e
> > "nfsd4: cleanup: replace rq_resused count by rq_next_page pointer"
> > is responsible for the crash (it seems to be crashing in
> > net/sunrpc/xprtrdma/svc_rdma_recvfrom.c:527).
> > It may be because I have CONFIG_DEBUG_SET_MODULE_RONX and
> > CONFIG_DEBUG_RODATA enabled. I did not try to disable them yet.
> >
> > When I moved to commit 79f77bf9a4e3dd5ead006b8f17e7c4ff07d8374e I
> > was no longer getting the server crashes,
> > so the rest of my tests were done using that point (it is somewhere
> > in the middle of 3.7.0-rc2).
>
> OK, so this part's clearly my fault--I'll work on a patch, but the
> rdma's use of the ->rq_pages array is pretty confusing.

Does this help?  They must have added this for some reason, but I'm not
seeing how it could have ever done anything....

--b.
---
diff --git a/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c b/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c
index 0ce7552..e8f25ec 100644
--- a/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c
+++ b/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c
@@ -520,13 +520,6 @@ next_sge:
 
 	for (ch_no = 0; &rqstp->rq_pages[ch_no] < rqstp->rq_respages; ch_no++)
 		rqstp->rq_pages[ch_no] = NULL;
 
-	/*
-	 * Detach res pages. If svc_release sees any it will attempt to
-	 * put them.
-	 */
-	while (rqstp->rq_next_page != rqstp->rq_respages)
-		*(--rqstp->rq_next_page) = NULL;
-
 	return err;
 }