From patchwork Sun Sep 18 04:45:02 2016
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Zhen Ren
X-Patchwork-Id: 9337583
From: Eric Ren
To: akpm@linux-foundation.org
Date: Sun, 18 Sep 2016 12:45:02 +0800
Message-Id: <1474173902-32075-1-git-send-email-zren@suse.com>
X-Mailer: git-send-email 2.6.6
Cc: mfasheh@suse.com, ocfs2-devel@oss.oracle.com
Subject: [Ocfs2-devel]
 [PATCH v2] ocfs2: fix deadlock on mmapped page in ocfs2_write_begin_nolock()

The testcase "mmaptruncate" of ocfs2-test deadlocks occasionally.

In this testcase, we create a 2*CLUSTER_SIZE file and mmap() it; two
processes then repeatedly perform the following operations: one does
memset(mmaped_addr + 2*CLUSTER_SIZE - 1, 'a', 1), while the other calls
ftruncate(fd, 2*CLUSTER_SIZE) and then ftruncate(fd, CLUSTER_SIZE),
again and again.

This is the backtrace when the deadlock happens:

[] __wait_on_bit_lock+0x50/0xa0
[] __lock_page+0xb7/0xc0
[] ? autoremove_wake_function+0x40/0x40
[] ocfs2_write_begin_nolock+0x163f/0x1790 [ocfs2]
[] ? ocfs2_allocate_extend_trans+0x180/0x180 [ocfs2]
[] ocfs2_page_mkwrite+0x1c7/0x2a0 [ocfs2]
[] do_page_mkwrite+0x66/0xc0
[] handle_mm_fault+0x685/0x1350
[] ? __fpu__restore_sig+0x70/0x530
[] __do_page_fault+0x1d8/0x4d0
[] trace_do_page_fault+0x37/0xf0
[] do_async_page_fault+0x19/0x70
[] async_page_fault+0x28/0x30

In ocfs2_write_begin_nolock(), we first grab the pages and then
allocate disk space for this write; ocfs2_try_to_free_truncate_log()
is called if -ENOSPC is returned; if that frees enough clusters, which
is usually the case, we start over again. But ocfs2_free_write_ctxt()
does not unlock the target page, so we deadlock when trying to grab
the target page again.

Also, -ENOMEM might be returned by ocfs2_grab_pages_for_write().
Another deadlock will happen in __do_page_mkwrite() if
ocfs2_page_mkwrite() returns non-VM_FAULT_LOCKED while the target page
is still locked.
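
For illustration only, the retry path described above can be modelled in
user space. This is a minimal sketch under stated assumptions, not the
kernel code: model_page, model_trylock_page(), model_unlock_page() and
model_write_begin() are hypothetical names, the single simulated -ENOSPC
failure is made up for the demo, and a failed trylock stands in for a
lock_page() call that would block forever.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for a page lock: non-recursive, one holder. */
struct model_page {
	bool locked;
};

/* Returns true if the lock was taken; false models "would block forever". */
bool model_trylock_page(struct model_page *p)
{
	if (p->locked)
		return false;
	p->locked = true;
	return true;
}

void model_unlock_page(struct model_page *p)
{
	assert(p->locked);	/* unlocking an unlocked page is a bug */
	p->locked = false;
}

/*
 * Models the retry loop: lock the target page, fail the first
 * allocation with -ENOSPC, then start over. Returns 0 on success
 * (page left locked for the caller) or -EDEADLK if the retry finds
 * the page still locked by the previous attempt.
 */
int model_write_begin(struct model_page *page, bool unlock_on_error)
{
	bool retried = false;

	for (;;) {
		int ret;

		if (!model_trylock_page(page))
			return -EDEADLK;

		/* Assumption for the demo: only the first attempt fails. */
		ret = retried ? 0 : -ENOSPC;
		if (ret == 0)
			return 0;

		/* Error path: the cleanup itself never drops the page lock. */
		if (unlock_on_error)
			model_unlock_page(page);

		retried = true;
	}
}
```

With unlock_on_error set to false the second pass cannot take the lock
and the model reports -EDEADLK; with it set to true the retry succeeds
and the page is returned locked, which is what the caller expects.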

These two errors fail on the same path, so fix them by unlocking the
target page manually before ocfs2_free_write_ctxt().

Jan Kara helped me rule out the JBD2 part and suggested a hint toward
the root cause.

Changes since v1:
1. Also take the ENOMEM error case into consideration.

Signed-off-by: Eric Ren
Reviewed-by: He Gang
Acked-by: Joseph Qi
---
 fs/ocfs2/aops.c | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/fs/ocfs2/aops.c b/fs/ocfs2/aops.c
index 98d3654..bbb4b3e 100644
--- a/fs/ocfs2/aops.c
+++ b/fs/ocfs2/aops.c
@@ -1842,6 +1842,16 @@ int ocfs2_write_begin_nolock(struct address_space *mapping,
 	ocfs2_commit_trans(osb, handle);
 
 out:
+	/*
+	 * The mmapped page won't be unlocked in ocfs2_free_write_ctxt(),
+	 * even in case of error here like ENOSPC and ENOMEM. So, we need
+	 * to unlock the target page manually to prevent deadlocks when
+	 * retrying again on ENOSPC, or when returning non-VM_FAULT_LOCKED
+	 * to VM code.
+	 */
+	if (wc->w_target_locked)
+		unlock_page(mmap_page);
+
 	ocfs2_free_write_ctxt(inode, wc);
 
 	if (data_ac) {