From patchwork Thu Jan 12 01:45:41 2017
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
X-Patchwork-Submitter: Changwei Ge
X-Patchwork-Id: 9511719
From: Gechangwei
To: Joseph Qi, "ocfs2-devel@oss.oracle.com", "akpm@linux-foundation.org"
Subject: Re: [Ocfs2-devel] [PATCH v2] ocfs2/journal: fix umount hang after flushing journal failure
Date: Thu, 12 Jan 2017 01:45:41 +0000
Message-ID: <63ADC13FD55D6546B7DECE290D39E37342E74B81@H3CMLB12-EX.srv.huawei-3com.com>
References: <63ADC13FD55D6546B7DECE290D39E37342E73E73@H3CMLB12-EX.srv.huawei-3com.com>
 <2e92ef09-08b2-8f34-36c5-bd0a9d4f6578@gmail.com>

Hi Joseph,

At first I tried to split the quoted string in the log message across
separate lines, but I got the following warnings:

WARNING: line over 80 characters
#21: FILE: fs/ocfs2/journal.c:2320:
+				mlog(ML_ERROR, "journal is already abort and cannot be "

WARNING: quoted string split across lines
#22: FILE: fs/ocfs2/journal.c:2321:
+				mlog(ML_ERROR, "journal is already abort and cannot be "
+					"flushed any more. So ignore the pending "

WARNING: quoted string split across lines
#23: FILE: fs/ocfs2/journal.c:2322:
+					"flushed any more. So ignore the pending "
+					"transactions to avoid blocking ocfs2 unmount.\n");

total: 0 errors, 3 warnings, 24 lines checked

So I wrote it the way I did in my last patch, to silence these warnings.

I accept your suggestion to change the log level from ML_KTHREAD to
ML_ERROR. Let's also use the log message you proposed; it should clearly
alert the system administrator.

Does this version look fine to you?
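For readers following the thread, here is a rough user-space sketch of the
idea behind the patch below. It is not kernel code: toy_journal, toy_flush,
toy_commit_on_shutdown and toy_can_shut_down are hypothetical stand-ins, and
the sketch assumes only what the thread states, namely that an aborted
journal can never be flushed again and that shutdown waits for the
pending-transaction count to reach zero.

```c
#include <errno.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for the journal; only two fields are modelled. */
struct toy_journal {
	atomic_int num_trans;	/* plays the role of j_num_trans */
	bool aborted;		/* plays the role of the JBD2 ABORT state */
};

/* Stand-in for the flush step: once aborted, every flush fails. */
static int toy_flush(struct toy_journal *j)
{
	if (j->aborted)
		return -EIO;
	/* A successful flush leaves nothing pending. */
	atomic_store(&j->num_trans, 0);
	return 0;
}

/* One pass of the commit step at shutdown time, with the proposed reset. */
static int toy_commit_on_shutdown(struct toy_journal *j)
{
	int status = toy_flush(j);

	if (status < 0) {
		fprintf(stderr,
			"journal aborted, dropping %d pending transactions\n",
			atomic_load(&j->num_trans));
		/* No later flush can succeed, so clear the counter here. */
		atomic_store(&j->num_trans, 0);
	}
	return status;
}

/* Shutdown may proceed only when nothing is pending. */
static bool toy_can_shut_down(struct toy_journal *j)
{
	return atomic_load(&j->num_trans) == 0;
}
```

Without the reset in toy_commit_on_shutdown(), an aborted journal with a
non-zero counter would keep toy_can_shut_down() false forever, which is the
analogue of the umount hang described in the quoted mail.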
From 686b52ee2f06395c53e36e2c7515c276dc7541fb Mon Sep 17 00:00:00 2001
From: Changwei Ge
Date: Wed, 11 Jan 2017 09:05:35 +0800
Subject: [PATCH v3] fix umount hang after journal flushing failure

Signed-off-by: Changwei Ge
---
 fs/ocfs2/journal.c | 18 ++++++++++++++++++
 1 file changed, 18 insertions(+)

+				 * disk. Set it to ZERO so that umount will
+				 * continue during shutting down journal
+				 */
+				atomic_set(&journal->j_num_trans, 0);
+			}
 		}
 	}
 
-- 
1.7.9.5

Thanks.

Br.
Changwei

On 2017/1/11 11:22, Joseph Qi wrote:
> The patch is malformed...
>
> BTW, I'd prefer an ML_ERROR level log rather than ML_KTHREAD, because a
> journal abort should at least alert the filesystem administrator; it will
> also discard some transactions.
>
> So how about:
>
> if (status < 0) {
> 	mlog(ML_ERROR, "journal is already abort and cannot be "
> 		"flushed any more. So ignore the pending "
> 		"transactions to avoid blocking ocfs2 unmount.\n");
> 	atomic_set(&journal->j_num_trans, 0);
> }
>
> Thanks,
> Joseph
>
> On 17/1/11 10:16, Gechangwei wrote:
>> Hi,
>>
>> As the prior e-mail described, umount would hang after a journal
>> flushing failure.
>>
>> When journal flushing in ocfs2_commit_cache() fails, the subsequent
>> umount may block while shutting down the journal because the
>> transaction count is non-zero.
>>
>> Once jbd2_journal_flush() fails, JBD2 marks the journal as ABORT, and
>> there is no way to recover afterwards.
>>
>> So there is no chance to set the transaction count to zero; thus,
>> shutting down the journal may block.
>>
>> ocfs2_commit_thread()
>>   ocfs2_commit_cache()
>>     jbd2_journal_flush() -> failure puts the journal into ABORT state,
>>     thus, the transaction count will never be set to zero
>>
>> The back trace is cited below:
>>
>> [] kthread_stop+0x4c/0x150
>> [] ocfs2_journal_shutdown+0xa7/0x400 [ocfs2]
>> [] ocfs2_dismount_volume+0xbe/0x4a0 [ocfs2]
>> [] ocfs2_put_super+0x37/0xb0 [ocfs2]
>> [] generic_shutdown_super+0x7e/0x110
>> [] kill_block_super+0x30/0x80
>> [] deactivate_locked_super+0x59/0x90
>> [] deactivate_super+0x4e/0x70
>> [] cleanup_mnt+0x43/0x90
>> [] __cleanup_mnt+0x12/0x20
>> [] task_work_run+0xb7/0xf0
>> [] do_notify_resume+0x8c/0xa0
>> [] int_signal+0x12/0x17
>> [] 0xffffffffffffffff
>>
>> To solve this issue, I propose an improved version of the patch that
>> addresses Andrew's and Joseph's comments.
>>
>> Further comments are welcome as always.
>>
>> From f3315307dd2a30da138383340689a81708134a44 Mon Sep 17 00:00:00 2001
>> From: Changwei Ge
>> Date: Wed, 11 Jan 2017 09:05:35 +0800
>> Subject: [PATCH v2] fix umount hang after journal flushing failure
>>
>> Signed-off-by: Changwei Ge
>> ---
>>  fs/ocfs2/journal.c | 18 ++++++++++++++++++
>>  1 file changed, 18 insertions(+)
>>
>> diff --git a/fs/ocfs2/journal.c b/fs/ocfs2/journal.c
>> index a244f14..9a6b234 100644
>> --- a/fs/ocfs2/journal.c
>> +++ b/fs/ocfs2/journal.c
>> @@ -2315,6 +2315,24 @@ static int ocfs2_commit_thread(void *arg)
>>  			     "commit_thread: %u transactions pending on "
>>  			     "shutdown\n",
>>  			     atomic_read(&journal->j_num_trans));
>> +
>> +			if (status < 0) {
>> +				mlog(ML_KTHREAD,
>> +					"Although %u transactions left,
>> journal must be shut down right now.\n",
>> +					atomic_read(&journal->j_num_trans));
>> +				/*
>> +				 * This may a litte hacky, however, no
>> chance
>> +				 * for ocfs2/journal to decrease this
>> variable
>> +				 * thourgh commit-thread. I have to do so to
>> +				 * avoid umount hang after journal flushing
>> +				 * failure. Since jounral has been
>> marked ABORT
>> +				 * within jbd2_journal_flush, commit
>> cache will
>> +				 * never do any real work to flush
>> journal to
>> +				 * disk.Set t to ZERO so that umount will
>> +				 * continue during shutting downjournal
>> +				 */
>> +				atomic_set(&journal->j_num_trans, 0);
>> +			}
>>  		}
>>  	}
>>
>> --
>> 1.7.9.5
>>
>> Thanks.
>>
>> Br.
>>
>> Changwei
>

diff --git a/fs/ocfs2/journal.c b/fs/ocfs2/journal.c
index a244f14..5f3c862 100644
--- a/fs/ocfs2/journal.c
+++ b/fs/ocfs2/journal.c
@@ -2315,6 +2315,24 @@ static int ocfs2_commit_thread(void *arg)
 			     "commit_thread: %u transactions pending on "
 			     "shutdown\n",
 			     atomic_read(&journal->j_num_trans));
+
+			if (status < 0) {
+				mlog(ML_ERROR, "journal is already abort and cannot be "
+					"flushed any more. So ignore the pending "
+					"transactions to avoid blocking ocfs2 unmount.\n");
+				/*
+				 * This may be a little hacky, but no chance
+				 * for ocfs2/journal to decrease this variable
+				 * through commit-thread. I have to do so to
+				 * avoid umount hang after journal flushing
+				 * failure. Since journal has been marked ABORT
+				 * within jbd2_journal_flush, commit cache will
+				 * never do any real work to flush journal to