From patchwork Fri Jul 10 11:18:38 2015 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Joseph Qi X-Patchwork-Id: 6765101 Return-Path: X-Original-To: patchwork-ocfs2-devel@patchwork.kernel.org Delivered-To: patchwork-parsemail@patchwork2.web.kernel.org Received: from mail.kernel.org (mail.kernel.org [198.145.29.136]) by patchwork2.web.kernel.org (Postfix) with ESMTP id 0B4E5C05AC for ; Fri, 10 Jul 2015 11:20:12 +0000 (UTC) Received: from mail.kernel.org (localhost [127.0.0.1]) by mail.kernel.org (Postfix) with ESMTP id F3EAC20646 for ; Fri, 10 Jul 2015 11:20:10 +0000 (UTC) Received: from userp1040.oracle.com (userp1040.oracle.com [156.151.31.81]) (using TLSv1.2 with cipher DHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id ABE3320630 for ; Fri, 10 Jul 2015 11:20:09 +0000 (UTC) Received: from aserv0021.oracle.com (aserv0021.oracle.com [141.146.126.233]) by userp1040.oracle.com (Sentrion-MTA-4.3.2/Sentrion-MTA-4.3.2) with ESMTP id t6ABJmN0021422 (version=TLSv1 cipher=DHE-RSA-AES256-SHA bits=256 verify=OK); Fri, 10 Jul 2015 11:19:50 GMT Received: from oss.oracle.com (oss-old-reserved.oracle.com [137.254.22.2]) by aserv0021.oracle.com (8.13.8/8.13.8) with ESMTP id t6ABJl75010225 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NO); Fri, 10 Jul 2015 11:19:47 GMT Received: from localhost ([127.0.0.1] helo=lb-oss.oracle.com) by oss.oracle.com with esmtp (Exim 4.63) (envelope-from ) id 1ZDWLL-0003D8-93; Fri, 10 Jul 2015 04:19:47 -0700 Received: from userv0022.oracle.com ([156.151.31.74]) by oss.oracle.com with esmtp (Exim 4.63) (envelope-from ) id 1ZDWKk-0003CG-0c for ocfs2-devel@oss.oracle.com; Fri, 10 Jul 2015 04:19:10 -0700 Received: from aserp1030.oracle.com (aserp1030.oracle.com [141.146.126.68]) by userv0022.oracle.com (8.13.8/8.13.8) with ESMTP id t6ABJ9XW017387 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=OK) for ; Fri, 10 Jul 2015 11:19:09 GMT Received: from userp2040.oracle.com (userp2040.oracle.com [156.151.31.90]) by aserp1030.oracle.com (Sentrion-MTA-4.3.2/Sentrion-MTA-4.3.2) with ESMTP id t6ABJ84I019697 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NO) for ; Fri, 10 Jul 2015 11:19:08 GMT Received: from pps.filterd (userp2040.oracle.com [127.0.0.1]) by userp2040.oracle.com (8.14.7/8.14.7) with SMTP id t6ABFSht017395 for ; Fri, 10 Jul 2015 11:19:08 GMT Received: from szxga01-in.huawei.com (szxga01-in.huawei.com [58.251.152.64]) by userp2040.oracle.com with ESMTP id 1vjbkr1y03-1 (version=TLSv1/SSLv3 cipher=RC4-SHA bits=128 verify=NOT) for ; Fri, 10 Jul 2015 11:19:07 +0000 Received: from 172.24.2.119 (EHLO szxeml430-hub.china.huawei.com) ([172.24.2.119]) by szxrg01-dlp.huawei.com (MOS 4.3.7-GA FastPath queued) with ESMTP id CRA51616; Fri, 10 Jul 2015 19:18:52 +0800 (CST) Received: from [127.0.0.1] (10.177.24.125) by szxeml430-hub.china.huawei.com (10.82.67.185) with Microsoft SMTP Server id 14.3.158.1; Fri, 10 Jul 2015 19:18:41 +0800 Message-ID: <559FAA0E.8090005@huawei.com> Date: Fri, 10 Jul 2015 19:18:38 +0800 From: Joseph Qi User-Agent: Mozilla/5.0 (Windows NT 6.1; rv:17.0) Gecko/20130801 Thunderbird/17.0.8 MIME-Version: 1.0 To: Andrew Morton X-Originating-IP: [10.177.24.125] X-CFilter-Loop: Reflected X-ServerName: szxga01-in.huawei.com X-Proofpoint-Virus-Version: vendor=nai engine=5700 definitions=7857 signatures=670605 X-Proofpoint-Spam-Details: rule=notspam policy=default score=0 kscore.is_bulkscore=0 kscore.compositescore=1 compositescore=0.9 suspectscore=2 phishscore=0 bulkscore=0 kscore.is_spamscore=0 rbsscore=0.9 spamscore=0 urlsuspectscore=0.9 adultscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=7.0.1-1506180000 definitions=main-1507100167 Cc: Mark Fasheh , "ocfs2-devel@oss.oracle.com" Subject: [Ocfs2-devel] [PATCH 1/2 v2] ocfs2: fix race between dio and recover orphan X-BeenThere: ocfs2-devel@oss.oracle.com X-Mailman-Version: 2.1.9 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Sender: ocfs2-devel-bounces@oss.oracle.com Errors-To: ocfs2-devel-bounces@oss.oracle.com X-Source-IP: aserv0021.oracle.com [141.146.126.233] X-Spam-Status: No, score=-4.8 required=5.0 tests=BAYES_00, RCVD_IN_DNSWL_MED, RP_MATCHES_RCVD, UNPARSEABLE_RELAY autolearn=unavailable version=3.3.1 X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on mail.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP During direct io the inode will be added to orphan first and then deleted from orphan. There is a race window that the orphan entry will be deleted twice and thus trigger the BUG when validating OCFS2_DIO_ORPHANED_FL in ocfs2_del_inode_from_orphan. ocfs2_direct_IO_write ... ocfs2_add_inode_to_orphan >>>>>>>> race window. 1) another node may rm the file and then down, this node take care of orphan recovery and clear flag OCFS2_DIO_ORPHANED_FL. 2) since rw lock is unlocked, it may race with another orphan recovery and append dio. ocfs2_del_inode_from_orphan So take Take inode mutex lock when recover orphans and make rw unlock at the end of aio write in case of append dio. Reported-by: Yiwen Jiang Signed-off-by: Joseph Qi --- fs/ocfs2/aops.c | 9 ++++++--- fs/ocfs2/file.c | 2 +- fs/ocfs2/inode.h | 2 -- fs/ocfs2/journal.c | 8 ++++---- fs/ocfs2/namei.c | 42 +++++++++++++----------------------------- fs/ocfs2/super.c | 2 -- 6 files changed, 24 insertions(+), 41 deletions(-) diff --git a/fs/ocfs2/aops.c b/fs/ocfs2/aops.c index 1a35c61..eb8b05d 100644 --- a/fs/ocfs2/aops.c +++ b/fs/ocfs2/aops.c @@ -627,10 +627,13 @@ static void ocfs2_dio_end_io(struct kiocb *iocb, mutex_unlock(&OCFS2_I(inode)->ip_unaligned_aio); } - ocfs2_iocb_clear_rw_locked(iocb); + /* Let rw unlock to be done later to protect append direct io write */ + if (offset + bytes <= i_size_read(inode)) { + ocfs2_iocb_clear_rw_locked(iocb); - level = ocfs2_iocb_rw_locked_level(iocb); - ocfs2_rw_unlock(inode, level); + level = ocfs2_iocb_rw_locked_level(iocb); + ocfs2_rw_unlock(inode, level); + } } static int ocfs2_releasepage(struct page *page, gfp_t wait) diff --git a/fs/ocfs2/file.c b/fs/ocfs2/file.c index 719f7f4..3403a54 100644 --- a/fs/ocfs2/file.c +++ b/fs/ocfs2/file.c @@ -2410,7 +2410,7 @@ no_sync: unaligned_dio = 0; } - if (unaligned_dio) { + if (unaligned_dio && ocfs2_iocb_is_unaligned_aio(iocb)) { ocfs2_iocb_clear_unaligned_aio(iocb); mutex_unlock(&OCFS2_I(inode)->ip_unaligned_aio); } diff --git a/fs/ocfs2/inode.h b/fs/ocfs2/inode.h index 5e86b24..ca3431e 100644 --- a/fs/ocfs2/inode.h +++ b/fs/ocfs2/inode.h @@ -81,8 +81,6 @@ struct ocfs2_inode_info tid_t i_sync_tid; tid_t i_datasync_tid; - wait_queue_head_t append_dio_wq; - struct dquot *i_dquot[MAXQUOTAS]; }; diff --git a/fs/ocfs2/journal.c b/fs/ocfs2/journal.c index 7c099f7..5e56268 100644 --- a/fs/ocfs2/journal.c +++ b/fs/ocfs2/journal.c @@ -2170,6 +2170,7 @@ static int ocfs2_recover_orphans(struct ocfs2_super *osb, iter = oi->ip_next_orphan; oi->ip_next_orphan = NULL; + mutex_lock(&inode->i_mutex); ret = ocfs2_rw_lock(inode, 1); if (ret < 0) { mlog_errno(ret); @@ -2206,17 +2207,16 @@ static int ocfs2_recover_orphans(struct ocfs2_super *osb, ret = ocfs2_del_inode_from_orphan(osb, inode, di_bh, 0, 0); if (ret) mlog_errno(ret); - - wake_up(&OCFS2_I(inode)->append_dio_wq); } /* else if ORPHAN_NO_NEED_TRUNCATE, do nothing */ unlock_inode: ocfs2_inode_unlock(inode, 1); + brelse(di_bh); + di_bh = NULL; unlock_rw: ocfs2_rw_unlock(inode, 1); next: + mutex_unlock(&inode->i_mutex); iput(inode); - brelse(di_bh); - di_bh = NULL; inode = iter; } diff --git a/fs/ocfs2/namei.c b/fs/ocfs2/namei.c index 6e6abb9..38b3336 100644 --- a/fs/ocfs2/namei.c +++ b/fs/ocfs2/namei.c @@ -2570,27 +2570,6 @@ leave: return status; } -static int ocfs2_dio_orphan_recovered(struct inode *inode) -{ - int ret; - struct buffer_head *di_bh = NULL; - struct ocfs2_dinode *di = NULL; - - ret = ocfs2_inode_lock(inode, &di_bh, 1); - if (ret < 0) { - mlog_errno(ret); - return 0; - } - - di = (struct ocfs2_dinode *) di_bh->b_data; - ret = !(di->i_flags & cpu_to_le32(OCFS2_DIO_ORPHANED_FL)); - ocfs2_inode_unlock(inode, 1); - brelse(di_bh); - - return ret; -} - -#define OCFS2_DIO_ORPHANED_FL_CHECK_INTERVAL 10000 int ocfs2_add_inode_to_orphan(struct ocfs2_super *osb, struct inode *inode) { @@ -2602,7 +2581,6 @@ int ocfs2_add_inode_to_orphan(struct ocfs2_super *osb, handle_t *handle = NULL; struct ocfs2_dinode *di = NULL; -restart: status = ocfs2_inode_lock(inode, &di_bh, 1); if (status < 0) { mlog_errno(status); @@ -2612,15 +2590,21 @@ restart: di = (struct ocfs2_dinode *) di_bh->b_data; /* * Another append dio crashed? - * If so, wait for recovery first. + * If so, manually recover it first. */ if (unlikely(di->i_flags & cpu_to_le32(OCFS2_DIO_ORPHANED_FL))) { - ocfs2_inode_unlock(inode, 1); - brelse(di_bh); - wait_event_interruptible_timeout(OCFS2_I(inode)->append_dio_wq, - ocfs2_dio_orphan_recovered(inode), - msecs_to_jiffies(OCFS2_DIO_ORPHANED_FL_CHECK_INTERVAL)); - goto restart; + status = ocfs2_truncate_file(inode, di_bh, i_size_read(inode)); + if (status < 0) { + if (status != -ENOSPC) + mlog_errno(status); + goto bail_unlock_inode; + } + + status = ocfs2_del_inode_from_orphan(osb, inode, di_bh, 0, 0); + if (status < 0) { + mlog_errno(status); + goto bail_unlock_inode; + } } status = ocfs2_prepare_orphan_dir(osb, &orphan_dir_inode, diff --git a/fs/ocfs2/super.c b/fs/ocfs2/super.c index 403c566..4474ef2 100644 --- a/fs/ocfs2/super.c +++ b/fs/ocfs2/super.c @@ -1746,8 +1746,6 @@ static void ocfs2_inode_init_once(void *data) ocfs2_lock_res_init_once(&oi->ip_inode_lockres); ocfs2_lock_res_init_once(&oi->ip_open_lockres); - init_waitqueue_head(&oi->append_dio_wq); - ocfs2_metadata_cache_init(INODE_CACHE(&oi->vfs_inode), &ocfs2_inode_caching_ops);