From patchwork Sat Sep 10 09:55:35 2016
X-Patchwork-Submitter: Zhen Ren
X-Patchwork-Id: 9324865
From: Eric Ren
To: akpm@linux-foundation.org
Cc: mfasheh@suse.com, ocfs2-devel@oss.oracle.com
Date: Sat, 10 Sep 2016 17:55:35 +0800
Message-Id: <1473501335-12519-1-git-send-email-zren@suse.com>
Subject: [Ocfs2-devel] [PATCH] ocfs2: fix deadlock on mmapped page in
 ocfs2_write_begin_nolock()

The testcase "mmaptruncate" of ocfs2-test deadlocked occasionally.

In this testcase, we create a 2*CLUSTER_SIZE file and mmap() it. Two
processes then repeatedly perform the following operations: one does
memset(mmaped_addr + 2*CLUSTER_SIZE - 1, 'a', 1), while the other calls
ftruncate(fd, 2*CLUSTER_SIZE) and then ftruncate(fd, CLUSTER_SIZE),
again and again.

This is the backtrace when the deadlock happens:

[] __wait_on_bit_lock+0x50/0xa0
[] __lock_page+0xb7/0xc0
[] ? autoremove_wake_function+0x40/0x40
[] ocfs2_write_begin_nolock+0x163f/0x1790 [ocfs2]
[] ? ocfs2_allocate_extend_trans+0x180/0x180 [ocfs2]
[] ocfs2_page_mkwrite+0x1c7/0x2a0 [ocfs2]
[] do_page_mkwrite+0x66/0xc0
[] handle_mm_fault+0x685/0x1350
[] ? __fpu__restore_sig+0x70/0x530
[] __do_page_fault+0x1d8/0x4d0
[] trace_do_page_fault+0x37/0xf0
[] do_async_page_fault+0x19/0x70
[] async_page_fault+0x28/0x30

In ocfs2_write_begin_nolock(), we first grab the pages and then
allocate disk space for this write. If the allocation returns ENOSPC,
ocfs2_try_to_free_truncate_log() is called; if we are lucky enough to
get enough clusters back, which is usually the case, we start over
again. But ocfs2_free_write_ctxt() does not unlock the target page, so
we deadlock when trying to grab the target page a second time.

Fix this issue by unlocking the target page after the first attempt
fails to allocate enough space.

Jan Kara helped me clear up the JBD2 part and suggested the hint for
the root cause.

Signed-off-by: Eric Ren
Reviewed-by: Gang He
---
 fs/ocfs2/aops.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/fs/ocfs2/aops.c b/fs/ocfs2/aops.c
index 98d3654..78d1d67 100644
--- a/fs/ocfs2/aops.c
+++ b/fs/ocfs2/aops.c
@@ -1860,6 +1860,13 @@ out:
 		 */
 		try_free = 0;
 
+		/*
+		 * Unlock mmap_page because the page has been locked when we
+		 * are here.
+		 */
+		if (mmap_page)
+			unlock_page(mmap_page);
+
 		ret1 = ocfs2_try_to_free_truncate_log(osb, clusters_need);
 		if (ret1 == 1)
 			goto try_again;
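
For readers who want to see the retry problem in isolation, here is a minimal user-space sketch of the control flow described in the commit message. It is not kernel code: every toy_* name, the struct layouts, and the apply_fix switch are hypothetical and exist only for illustration. The model assumes a non-recursive page lock and reports -EDEADLK at the point where the real lock_page() would sleep forever.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Toy stand-in for struct page: a single non-recursive lock bit. */
struct toy_page {
	bool locked;
};

/* Toy allocator state: allocation fails until the truncate log is flushed. */
struct toy_alloc {
	bool truncate_log_flushed;
};

/* Returns 0 on success, -EDEADLK where the real lock_page() would hang. */
static int toy_lock_page(struct toy_page *p)
{
	if (p->locked)
		return -EDEADLK;
	p->locked = true;
	return 0;
}

static void toy_unlock_page(struct toy_page *p)
{
	assert(p->locked);
	p->locked = false;
}

static int toy_reserve_clusters(const struct toy_alloc *a)
{
	return a->truncate_log_flushed ? 0 : -ENOSPC;
}

/* Models ocfs2_try_to_free_truncate_log(): returns 1 if space was freed. */
static int toy_try_to_free_truncate_log(struct toy_alloc *a)
{
	a->truncate_log_flushed = true;
	return 1;
}

/* Models the grab / allocate / retry loop for the mmap write path. */
static int toy_write_begin(struct toy_page *mmap_page, struct toy_alloc *a,
			   bool apply_fix)
{
	bool try_free = true;
	int ret;

try_again:
	/* Step 1: grab (lock) the target page. */
	ret = toy_lock_page(mmap_page);
	if (ret)
		return ret;

	/* Step 2: allocate disk space for the write. */
	ret = toy_reserve_clusters(a);
	if (ret == -ENOSPC && try_free) {
		try_free = false;

		/* The target page is still locked here unless we drop it. */
		if (apply_fix)
			toy_unlock_page(mmap_page);

		if (toy_try_to_free_truncate_log(a) == 1)
			goto try_again;
	}
	return ret;
}
```

Calling toy_write_begin() with apply_fix set to false on a fresh page and allocator returns -EDEADLK on the second pass, which corresponds to the __lock_page frame in the backtrace. With apply_fix set to true, the second pass re-locks the page and the allocation succeeds.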