From patchwork Tue Jun 23 06:47:55 2015
X-Patchwork-Submitter: Joseph Qi
X-Patchwork-Id: 6659001
Message-ID: <5589011B.9030205@huawei.com>
Date: Tue, 23 Jun 2015 14:47:55 +0800
From: Joseph Qi
To: Zhangguanghui
Cc: "ocfs2-devel@oss.oracle.com"
References: <2015060917590038012350@h3c.com>
In-Reply-To: <2015060917590038012350@h3c.com>
Subject: Re: [Ocfs2-devel] __ocfs2_journal_access review, BUG

Could you please test my fix? It will retry once the SAN recovers.

On 2015/6/9 17:59, Zhangguanghui wrote:
> In __ocfs2_journal_access():
>
> If the LUNs cannot be accessed for some reason (such as a storage network failure), the kernel hits BUG().
>
> When the disk times out, fencing the server (emergency_restart()) fails, and the node can only be recovered by a reset through ILO.
>
> So we have to return the error -EIO and avoid BUG() (panic).
>
> Moreover, can all BUG_ON(!buffer_uptodate(bh)) in the ocfs2 file system be handled in the same way?
>
> Finally, any feedback about this process (positive or negative) would be greatly appreciated.
>
>
> --- journal.c 2015-05-18 00:55:21.000000000 +0800
> +++ journal.c.bk 2015-06-09 17:37:13.531333444 +0800
> @@ -670,7 +670,7 @@
>  		mlog(ML_ERROR, "giving me a buffer that's not uptodate!\n");
>  		mlog(ML_ERROR, "b_blocknr=%llu\n",
>  		     (unsigned long long)bh->b_blocknr);
> -		BUG();
> +		return -EIO;
>  	}
>
>  	/* Set the current transaction information on the ci so
>
>
>
> Jun 9 15:20:23 cvk68 kernel: [76994.822719] (pool,13568,12):__ocfs2_journal_access:664 ERROR: giving me a buffer that's not uptodate!
> Jun 9 15:20:23 cvk68 kernel: [76994.822721] (pool,13568,12):__ocfs2_journal_access:666 ERROR: b_blocknr=33030401
> Jun 9 15:20:23 cvk68 kernel: [76994.822716] Read(10): 28 00 00 00 29 80 00 00 1f 00
> Jun 9 15:20:23 cvk68 kernel: [76994.822729] (ksoftirqd/25,263,25):o2hb_bio_end_io:381 ERROR: IO Error -5
> Jun 9 15:20:23 cvk68 kernel: [76994.822737] ------------[ cut here ]------------
> Jun 9 15:20:23 cvk68 kernel: [76994.822740] (o2hb-771CAAF371,7589,9):o2hb_do_disk_heartbeat:993 ERROR: status = -5
> Jun 9 15:20:23 cvk68 kernel: [76994.822746] Kernel BUG at ffffffffa048b15d [verbose debug info unavailable]
> Jun 9 15:20:23 cvk68 kernel: [76994.822748] invalid opcode: 0000 [#1] SMP
> Jun 9 15:20:23 cvk68 kernel: [76994.822751] sd 13:0:0:0: rejecting I/O to offline device
> Jun 9 15:20:23 cvk68 kernel: [76994.822753] (o2hb-771CAAF371,7589,9):o2hb_bio_end_io:381 ERROR: IO Error -5
> Jun 9 15:20:23 cvk68 kernel: [76994.822755] (o2hb-771CAAF371,7589,9):o2hb_do_disk_heartbeat:993 ERROR: status = -5
> Jun 9 15:20:23 cvk68 kernel: [76994.822751] Modules linked in: ip6table_filter(F) ip6_tables(F) iptable_filter(F) ip_tables(F) ebtable_nat(F) ebtables(F) x_tables(F) ocfs2(OF) quota_tree(F) cls_u32(F) sch_sfq(F) sch_htb(F) drbd(F) lru_cache(F) 8021q(F) mrp(F) garp(F) stp(F) llc(F) vhost_net(F) macvtap(F) macvlan(F) vhost(F) kvm_intel(F) kvm(F) ib_iser(F) rdma_cm(F) ib_cm(F) iw_cm(F) ib_sa(F) ib_mad(F) ib_core(F) ib_addr(F) iscsi_tcp(F) libiscsi_tcp(F) ocfs2_dlmfs(OF) ocfs2_stack_o2cb(OF) ocfs2_dlm(OF) ocfs2_nodemanager(OF) ocfs2_stackglue(OF) configfs(F) openvswitch(OF) libcrc32c(F) gre(F) nfsd(F) nfs_acl(F) auth_rpcgss(F) nfs(F) fscache(F) lockd(F) sunrpc(F) psmouse(F) sb_edac(F) ioatdma(F) edac_core(F) gpio_ich(F) dm_multipath(F) serio_raw(F) scsi_dh(F) dca(F) hpwdt(F) hpilo(F) mac_hid(F) lpc_ich(F) video(F) acpi_power_meter(F) lp(F) parport(F) be2iscsi(F) iscsi_boot_sysfs(F) libiscsi(F) hpsa(F) scsi_transport_iscsi(F) be2net(F) nbd(F) [last unloaded: ipmi_si]
> Jun 9 15:20:23 cvk68 kernel: [76994.822802] CPU: 12 PID: 13568 Comm: pool Tainted: GF O 3.13.6 #1
> Jun 9 15:20:23 cvk68 kernel: [76994.822804] Hardware name: H3C FlexServer B390, BIOS I31 02/10/2014
> Jun 9 15:20:23 cvk68 kernel: [76994.822806] task: ffff880611451810 ti: ffff8802cf8da000 task.ti: ffff8802cf8da000
> Jun 9 15:20:23 cvk68 kernel: [76994.822808] RIP: 0010:[] [] __ocfs2_journal_access+0x30d/0x350 [ocfs2]
> Jun 9 15:20:23 cvk68 kernel: [76994.822832] RSP: 0018:ffff8802cf8dbb78 EFLAGS: 00010292
> Jun 9 15:20:23 cvk68 kernel: [76994.822834] RAX: 0000000000000044 RBX: 1000000000000000 RCX: 000000000000c5c0
> Jun 9 15:20:23 cvk68 kernel: [76994.822836] RDX: 0000000000000082 RSI: 0000000065ee65ea RDI: 0000000000000246
> Jun 9 15:20:23 cvk68 kernel: [76994.822838] RBP: ffff8802cf8dbbf8 R08: ffffffff81ec09a8 R09: ffffffff81ee8f20
> Jun 9 15:20:23 cvk68 kernel: [76994.822840] R10: 0000000000000064 R11: 0000000000017adc R12: ffff880604b31138
> Jun 9 15:20:23 cvk68 kernel: [76994.822842] R13: ffff880611451810 R14: ffff880611451ce0 R15: 0000000000000001
> Jun 9 15:20:23 cvk68 kernel: [76994.822845] FS: 00007f9bcffff700(0000) GS:ffff880c3f880000(0000) knlGS:0000000000000000
> Jun 9 15:20:23 cvk68 kernel: [76994.822847] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> Jun 9 15:20:23 cvk68 kernel: [76994.822849] CR2: 000000000133b7b8 CR3: 000000061168a000 CR4: 00000000001427e0
> Jun 9 15:20:23 cvk68 kernel: [76994.822851] Stack:
> Jun 9 15:20:23 cvk68 kernel: [76994.822852] 0000000001f80101 000000000000000b ffff880c1cc84030 0000000000000000
> Jun 9 15:20:23 cvk68 kernel: [76994.822857] ffffffffa0505430 ffff880c1d183000 ffff880c1cc84030 0000000001f80101
> Jun 9 15:20:23 cvk68 kernel: [76994.822861] 0000000001f80101 00001000a0473010 0000000000000000 ffff880c1dd35000
> Jun 9 15:20:23 cvk68 kernel: [76994.822865] Call Trace:
> Jun 9 15:20:23 cvk68 kernel: [76994.822878] [] ocfs2_journal_access_di+0x18/0x20 [ocfs2]
> Jun 9 15:20:23 cvk68 kernel: [76994.822888] [] ocfs2_write_end_nolock+0x63/0x430 [ocfs2]
> Jun 9 15:20:23 cvk68 kernel: [76994.822897] [] ? ocfs2_write_begin+0x1e2/0x230 [ocfs2]
> Jun 9 15:20:23 cvk68 kernel: [76994.822906] [] ocfs2_write_end+0x26/0x50 [ocfs2]
> Jun 9 15:20:23 cvk68 kernel: [76994.822910] [] generic_file_buffered_write+0x165/0x280
> Jun 9 15:20:23 cvk68 kernel: [76994.822921] [] ocfs2_file_aio_write+0x74f/0x790 [ocfs2]
> Jun 9 15:20:23 cvk68 kernel: [76994.822925] [] do_sync_write+0x5a/0x90
> Jun 9 15:20:23 cvk68 kernel: [76994.822928] [] vfs_write+0xc5/0x1f0
> Jun 9 15:20:23 cvk68 kernel: [76994.822931] [] SyS_write+0x52/0xa0
> Jun 9 15:20:23 cvk68 kernel: [76994.822934] [] system_call_fastpath+0x1a/0x1f
> Jun 9 15:20:23 cvk68 kernel: [76994.822936] Code: 8b 95 fc 02 00 00 48 63 c9 48 89 04 24 41 b9 9a 02 00 00 49 c7 c0 e0 dc 4e a0 4c 89 f6 48 c7 c7 18 a4 4f a0 31 c0 e8 29 09 2c e1 <0f> 0b 65 8b 0c 25 64 b0 00 00 65 48 8b 34 25 c0 c7 00 00 8b 96
> Jun 9 15:20:23 cvk68 kernel: [76994.822961] RIP [] __ocfs2_journal_access+0x30d/0x350 [ocfs2]
>

diff --git a/fs/ocfs2/journal.c b/fs/ocfs2/journal.c
index 8017032..92cc36a 100644
--- a/fs/ocfs2/journal.c
+++ b/fs/ocfs2/journal.c
@@ -670,7 +670,23 @@ static int __ocfs2_journal_access(handle_t *handle,
 		mlog(ML_ERROR, "giving me a buffer that's not uptodate!\n");
 		mlog(ML_ERROR, "b_blocknr=%llu\n",
 		     (unsigned long long)bh->b_blocknr);
-		BUG();
+
+		lock_buffer(bh);
+		/*
+		 * A previous attempt to write this buffer head failed.
+		 * Nothing we can do but to retry the write and hope for
+		 * the best.
+		 */
+		if (buffer_write_io_error(bh) && !buffer_uptodate(bh)) {
+			clear_buffer_write_io_error(bh);
+			set_buffer_uptodate(bh);
+		}
+
+		if (!buffer_uptodate(bh)) {
+			unlock_buffer(bh);
+			return -EIO;
+		}
+		unlock_buffer(bh);
 	}
 
 	/* Set the current transaction information on the ci so
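
For anyone who wants to follow the decision the patch makes without a kernel tree at hand, here is a minimal userspace sketch of it. It is only a model under stated assumptions, not the kernel code: struct model_bh, its two fields and model_journal_access_check() are names made up for illustration, and the lock_buffer()/unlock_buffer() pair is left out because the sketch is single-threaded.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the two buffer_head state bits the patch reads. */
struct model_bh {
	bool uptodate;       /* models buffer_uptodate(bh) */
	bool write_io_error; /* models buffer_write_io_error(bh) */
};

/*
 * Models the branch the patch puts in place of BUG().
 * Locking is omitted (assumption: single-threaded caller).
 * Returns 0 if journal access may proceed, -EIO otherwise.
 */
static int model_journal_access_check(struct model_bh *bh)
{
	assert(bh != NULL);

	/* Normal case: the buffer is uptodate, nothing to recover. */
	if (bh->uptodate)
		return 0;

	/*
	 * An earlier write of this buffer failed: clear the error and
	 * mark the buffer uptodate again so the write can be retried.
	 */
	if (bh->write_io_error) {
		bh->write_io_error = false;
		bh->uptodate = true;
	}

	/* Still not uptodate for some other reason: fail with -EIO. */
	if (!bh->uptodate)
		return -EIO;

	return 0;
}
```

In this model, a buffer that is not uptodate only because of an earlier write error is accepted again, and any other not-uptodate buffer makes the caller see -EIO rather than a panic.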