From patchwork Mon Jun 9 20:04:04 2014
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Andrew Morton
X-Patchwork-Id: 4323421
To: ocfs2-devel@oss.oracle.com
From: akpm@linux-foundation.org
Date: Mon, 09 Jun 2014 13:04:04 -0700
Message-Id: <20140609200404.B3C6B5A4A39@corp2gmr1-2.hot.corp.google.com>
Cc: mfasheh@suse.com
Subject: [Ocfs2-devel] [patch 5/8] ocfs2: refcount: take inode_lock until write io issued

From: Wengang Wang
Subject: ocfs2: refcount: take inode_lock until write io issued

This patch tries to fix this crash:

 #5 [ffff88003c1cd690] do_invalid_op at ffffffff810166d5
 #6 [ffff88003c1cd730] invalid_op at ffffffff8159b2de
    [exception RIP: ocfs2_direct_IO_get_blocks+359]
    RIP: ffffffffa05dfa27  RSP: ffff88003c1cd7e8  RFLAGS: 00010202
    RAX: 0000000000000000  RBX: ffff88003c1cdaa8  RCX: 0000000000000000
    RDX: 000000000000000c  RSI: ffff880027a95000  RDI: ffff88003c79b540
    RBP: ffff88003c1cd858   R8: 0000000000000000   R9: ffffffff815f6ba0
    R10: 00000000000001c9  R11: 00000000000001c9  R12: ffff88002d271500
    R13: 0000000000000001  R14: 0000000000000000  R15: 0000000000001000
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
 #7 [ffff88003c1cd860] do_direct_IO at ffffffff811cd31b
 #8 [ffff88003c1cd950] direct_IO_iovec at ffffffff811cde9c
 #9 [ffff88003c1cd9b0] do_blockdev_direct_IO at ffffffff811ce764
#10 [ffff88003c1cdb80] __blockdev_direct_IO at ffffffff811ce7cc
#11 [ffff88003c1cdbb0] ocfs2_direct_IO at ffffffffa05df756 [ocfs2]
#12 [ffff88003c1cdbe0] generic_file_direct_write_iter at ffffffff8112f935
#13 [ffff88003c1cdc40] ocfs2_file_write_iter at ffffffffa0600ccc [ocfs2]
#14 [ffff88003c1cdd50] do_aio_write at ffffffff8119126c
#15 [ffff88003c1cddc0] aio_rw_vect_retry at ffffffff811d9bb4
#16 [ffff88003c1cddf0] aio_run_iocb at ffffffff811db880
#17 [ffff88003c1cde30] io_submit_one at ffffffff811dc238
#18 [ffff88003c1cde80] do_io_submit at ffffffff811dc437
#19 [ffff88003c1cdf70] sys_io_submit at ffffffff811dc530
#20 [ffff88003c1cdf80] system_call_fastpath at ffffffff8159a159

It crashes at
	BUG_ON(create && (ext_flags & OCFS2_EXT_REFCOUNTED));
in ocfs2_direct_IO_get_blocks.

ocfs2_direct_IO_get_blocks expects the OCFS2_EXT_REFCOUNTED flag to have
been removed in ocfs2_prepare_inode_for_write() if it was set. But no
cluster lock is held in the window after ocfs2_prepare_inode_for_write()
drops the inode lock and before ocfs2_direct_IO_get_blocks() runs.

It can happen in this case:

Node A (which crashes)                  Node B
------------------------                ---------------------------
ocfs2_file_aio_write
  ocfs2_prepare_inode_for_write
    ocfs2_inode_lock
    ...
    ocfs2_inode_unlock
    #no refcount found
....                                    ocfs2_reflink
                                          ocfs2_inode_lock
                                          ...
                                          ocfs2_inode_unlock
                                          #now, refcount flag set on extent

                                        ...
                                        flush change to disk

  ocfs2_direct_IO_get_blocks
    ocfs2_get_clusters
      #extent map miss
      #buffer_head miss
      read extents from disk
    found refcount flag on extent
    crash..

Fix:
We have to hold the inode_lock until ocfs2_direct_IO_get_blocks has
finished.
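
The hand-off this fix relies on can be sketched outside the kernel. The
following is a minimal, self-contained C sketch, not ocfs2 code:
mock_lock, mock_lock_take(), mock_lock_drop(), prepare_for_write() and
do_write() are hypothetical names standing in for the inode cluster lock,
ocfs2_prepare_inode_for_write() and the write path. It only illustrates
the idea of returning the held lock level through an out-parameter so
that the caller can keep the lock across the I/O step and drop it
afterwards.

```c
#include <assert.h>

/* Hypothetical stand-in for a cluster lock; this is not the ocfs2 API. */
struct mock_lock {
	int held;	/* 1 while some caller holds the lock */
	int level;	/* level it was taken at: 0 = read, 1 = exclusive */
};

static int mock_lock_take(struct mock_lock *l, int level)
{
	assert(!l->held);
	l->held = 1;
	l->level = level;
	return 0;
}

static void mock_lock_drop(struct mock_lock *l, int level)
{
	assert(l->held && l->level == level);
	l->held = 0;
}

/*
 * Prepare step (hypothetical). Takes the lock, runs its checks, and
 * normally drops the lock before returning. If level_out is non-NULL and
 * direct I/O will follow, ownership of the lock passes to the caller.
 */
static int prepare_for_write(struct mock_lock *l, int direct_io,
			     int *level_out)
{
	int level = 0;
	int ret;

	if (level_out)
		*level_out = -1;	/* -1: nothing for the caller to unlock */

	ret = mock_lock_take(l, level);
	if (ret)
		return ret;

	/* ... checks that must stay valid until the I/O is issued ... */

	if (direct_io && level_out && !ret) {
		*level_out = level;	/* caller now owns the lock */
		level = -1;
	}
	if (level >= 0)
		mock_lock_drop(l, level);
	return ret;
}

/* Write path (hypothetical): returns 1 if the lock was held during I/O. */
static int do_write(struct mock_lock *l, int direct_io)
{
	int level = -1;
	int held_during_io;

	if (prepare_for_write(l, direct_io, &level) < 0)
		return -1;

	held_during_io = l->held;	/* stands in for the block-mapping step */

	if (level >= 0)
		mock_lock_drop(l, level);
	return held_during_io;
}
```

With direct_io set, do_write() reports that the lock was still held
during the I/O step; without it, the prepare step drops the lock before
returning, as in the buffered case.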

Signed-off-by: Wengang Wang
Cc: Mark Fasheh
Cc: Joel Becker
Signed-off-by: Andrew Morton
---

 fs/ocfs2/file.c     |   23 ++++++++++++++++++++---
 fs/ocfs2/ocfs2_fs.h |    2 +-
 2 files changed, 21 insertions(+), 4 deletions(-)

diff -puN fs/ocfs2/file.c~refcount-take-inode_lock-until-write-io-issued fs/ocfs2/file.c
--- a/fs/ocfs2/file.c~refcount-take-inode_lock-until-write-io-issued
+++ a/fs/ocfs2/file.c
@@ -2104,13 +2104,16 @@ static int ocfs2_prepare_inode_for_write
 					 size_t count,
 					 int appending,
 					 int *direct_io,
-					 int *has_refcount)
+					 int *has_refcount,
+					 int *meta_level_out)
 {
 	int ret = 0, meta_level = 0;
 	struct dentry *dentry = file->f_path.dentry;
 	struct inode *inode = dentry->d_inode;
 	loff_t saved_pos = 0, end;
 
+	if (meta_level_out)
+		*meta_level_out = -1;
 	/*
 	 * We start with a read level meta lock and only jump to an ex
 	 * if we need to make modifications here.
@@ -2226,6 +2229,15 @@ out_unlock:
 					    saved_pos, appending, count,
 					    direct_io, has_refcount);
 
+	/*
+	 * If direct IO would be done later, we have to keep inode_lock locked.
+	 * Buffer'd IO is fine since the COW work will be done again in
+	 * ocfs2_write_begin.
+	 */
+	if (direct_io && *direct_io && meta_level_out && !ret) {
+		*meta_level_out = meta_level;
+		meta_level = -1;
+	}
 	if (meta_level >= 0)
 		ocfs2_inode_unlock(inode, meta_level);
 
@@ -2251,6 +2263,7 @@ static ssize_t ocfs2_file_aio_write(stru
 	int full_coherency = !(osb->s_mount_opt &
 			       OCFS2_MOUNT_COHERENCY_BUFFERED);
 	int unaligned_dio = 0;
+	int meta_level = -1;
 
 	trace_ocfs2_file_aio_write(inode, file, file->f_path.dentry,
 		(unsigned long long)OCFS2_I(inode)->ip_blkno,
@@ -2310,7 +2323,8 @@ relock:
 	can_do_direct = direct_io;
 	ret = ocfs2_prepare_inode_for_write(file, ppos,
 					    iocb->ki_nbytes, appending,
-					    &can_do_direct, &has_refcount);
+					    &can_do_direct, &has_refcount,
+					    &meta_level);
 	if (ret < 0) {
 		mlog_errno(ret);
 		goto out;
@@ -2434,6 +2448,8 @@ out_sems:
 	if (have_alloc_sem)
 		ocfs2_iocb_clear_sem_locked(iocb);
 
+	if (meta_level >= 0)
+		ocfs2_inode_unlock(inode, meta_level);
 	mutex_unlock(&inode->i_mutex);
 
 	if (written)
@@ -2448,7 +2464,8 @@ static int ocfs2_splice_to_file(struct p
 	int ret;
 
 	ret = ocfs2_prepare_inode_for_write(out, &sd->pos,
-					    sd->total_len, 0, NULL, NULL);
+					    sd->total_len, 0, NULL, NULL,
+					    NULL);
 	if (ret < 0) {
 		mlog_errno(ret);
 		return ret;
diff -puN fs/ocfs2/ocfs2_fs.h~refcount-take-inode_lock-until-write-io-issued fs/ocfs2/ocfs2_fs.h
--- a/fs/ocfs2/ocfs2_fs.h~refcount-take-inode_lock-until-write-io-issued
+++ a/fs/ocfs2/ocfs2_fs.h
@@ -724,7 +724,7 @@ struct ocfs2_dinode {
 	__le64 i_xattr_loc;
 /*80*/	struct ocfs2_block_check i_check;  /* Error checking */
 /*88*/	__le64 i_dx_root;		/* Pointer to dir index root block */
-/*90*/	__le64 i_refcount_loc;
+/*90*/	__le64 i_refcount_loc;		/* Root block of the refcount tree */
 	__le64 i_suballoc_loc;		/* Suballocator block group this
 					   inode belongs to. Only valid
 					   if allocated from a