From patchwork Fri Oct 5 01:23:35 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Dave Chinner X-Patchwork-Id: 10627269 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 9ACB0112B for ; Fri, 5 Oct 2018 01:23:45 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 8C6AF29408 for ; Fri, 5 Oct 2018 01:23:45 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 80C3A2940B; Fri, 5 Oct 2018 01:23:45 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_HI autolearn=ham version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 901FF29407 for ; Fri, 5 Oct 2018 01:23:43 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726676AbeJEIUA (ORCPT ); Fri, 5 Oct 2018 04:20:00 -0400 Received: from ipmailnode02.adl6.internode.on.net ([150.101.137.148]:25268 "EHLO ipmailnode02.adl6.internode.on.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726073AbeJEIUA (ORCPT ); Fri, 5 Oct 2018 04:20:00 -0400 Received: from ppp59-167-129-252.static.internode.on.net (HELO dastard) ([59.167.129.252]) by ipmail02.adl6.internode.on.net with ESMTP; 05 Oct 2018 10:53:39 +0930 Received: from discord.disaster.area ([192.168.1.111]) by dastard with esmtp (Exim 4.80) (envelope-from ) id 1g8EqA-0002u8-FK for linux-xfs@vger.kernel.org; Fri, 05 Oct 2018 11:23:38 +1000 Received: from dave by discord.disaster.area with local (Exim 4.91) (envelope-from ) id 1g8EqA-0000OV-Dl for linux-xfs@vger.kernel.org; Fri, 05 Oct 2018 11:23:38 +1000 From: Dave Chinner To: linux-xfs@vger.kernel.org Subject: [PATCH 1/2] xfs: fix data corruption w/ unaligned dedupe ranges Date: Fri, 5 Oct 2018 11:23:35 +1000 Message-Id: <20181005012336.1418-2-david@fromorbit.com> X-Mailer: git-send-email 2.17.0 In-Reply-To: <20181005012336.1418-1-david@fromorbit.com> References: <20181005012336.1418-1-david@fromorbit.com> Sender: linux-xfs-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-xfs@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP From: Dave Chinner A deduplication data corruption is Exposed by fstests generic/505 on XFS. It is caused by extending the block match range to include the partial EOF block, but then allowing unknown data beyond EOF to be considered a "match" to data in the destination file because the comparison is only made to the end of the source file. This corrupts the destination file when the source extent is shared with it. XFS only supports whole block dedupe, but we still need to appear to support whole file dedupe correctly. Hence if the dedupe request includes the last block of the souce file, don't include it in the actual XFS dedupe operation. If the rest of the range dedupes successfully, then report the partial last block as deduped, too, so that userspace sees it as a successful dedupe rather than return EINVAL because we can't dedupe unaligned blocks. Signed-off-by: Dave Chinner Reviewed-by: Darrick J. Wong Reviewed-by: Christoph Hellwig --- fs/xfs/xfs_reflink.c | 21 +++++++++++++++++++++ 1 file changed, 21 insertions(+) diff --git a/fs/xfs/xfs_reflink.c b/fs/xfs/xfs_reflink.c index 5289e22cb081..6b0da1b80103 100644 --- a/fs/xfs/xfs_reflink.c +++ b/fs/xfs/xfs_reflink.c @@ -1222,6 +1222,19 @@ xfs_iolock_two_inodes_and_break_layout( /* * Link a range of blocks from one file to another. + * + * The VFS allows partial EOF blocks to "match" for dedupe even though it hasn't + * checked that the bytes beyond EOF physically match. Hence we cannot use the + * EOF block in the source dedupe range because it's not a complete block match, + * hence can introduce a corruption into the file that has it's + * block replaced. + * + * Despite this issue, we still need to report that range as successfully + * deduped to avoid confusing userspace with EINVAL errors on completely + * matching file data. The only time that an unaligned length will be passed to + * us is when it spans the EOF block of the source file, so if we simply mask it + * down to be block aligned here the we will dedupe everything but that partial + * EOF block. */ int xfs_reflink_remap_range( @@ -1274,6 +1287,14 @@ xfs_reflink_remap_range( if (ret <= 0) goto out_unlock; + /* + * If the dedupe data matches, chop off the partial EOF block + * from the source file so we don't try to dedupe the partial + * EOF block. + */ + if (is_dedupe) + len &= ~((u64)i_blocksize(inode_in) - 1); + /* Attach dquots to dest inode before changing block map */ ret = xfs_qm_dqattach(dest); if (ret) From patchwork Fri Oct 5 01:23:36 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Dave Chinner X-Patchwork-Id: 10627271 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id A4C2B112B for ; Fri, 5 Oct 2018 01:23:46 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 9775129408 for ; Fri, 5 Oct 2018 01:23:46 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 8BCFE2940C; Fri, 5 Oct 2018 01:23:46 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-7.9 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_HI autolearn=ham version=3.3.1 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 228E729408 for ; Fri, 5 Oct 2018 01:23:46 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726073AbeJEIUE (ORCPT ); Fri, 5 Oct 2018 04:20:04 -0400 Received: from ipmailnode02.adl6.internode.on.net ([150.101.137.148]:18220 "EHLO ipmailnode02.adl6.internode.on.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726795AbeJEIUD (ORCPT ); Fri, 5 Oct 2018 04:20:03 -0400 Received: from ppp59-167-129-252.static.internode.on.net (HELO dastard) ([59.167.129.252]) by ipmail02.adl6.internode.on.net with ESMTP; 05 Oct 2018 10:53:39 +0930 Received: from discord.disaster.area ([192.168.1.111]) by dastard with esmtp (Exim 4.80) (envelope-from ) id 1g8EqA-0002u9-G1 for linux-xfs@vger.kernel.org; Fri, 05 Oct 2018 11:23:38 +1000 Received: from dave by discord.disaster.area with local (Exim 4.91) (envelope-from ) id 1g8EqA-0000OY-Ex for linux-xfs@vger.kernel.org; Fri, 05 Oct 2018 11:23:38 +1000 From: Dave Chinner To: linux-xfs@vger.kernel.org Subject: [PATCH 2/2] xfs: fix data corruption w/ unaligned reflink ranges Date: Fri, 5 Oct 2018 11:23:36 +1000 Message-Id: <20181005012336.1418-3-david@fromorbit.com> X-Mailer: git-send-email 2.17.0 In-Reply-To: <20181005012336.1418-1-david@fromorbit.com> References: <20181005012336.1418-1-david@fromorbit.com> Sender: linux-xfs-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-xfs@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP When reflinking sub-file ranges, a data corruption can occur when the source file range includes a partial EOF block. This shares the unknown data beyond EOF into the second file at a position inside EOF, exposing stale data in the second file. XFS only supports whole block sharing, but we still need to support whole file reflink correctly. Hence if the reflink request includes the last block of the souce file, only proceed with the reflink operation if it lands at or past the destination file's current EOF. If it lands within the destination file EOF, reject the entire request with -EINVAL and make the caller go the hard way. This avoids the data corruption vector, but also avoids disruption of returning EINVAL to userspace for the common case of whole file cloning. Signed-off-by: Dave Chinner Reviewed-by: Darrick J. Wong Reviewed-by: Christoph Hellwig --- fs/xfs/xfs_reflink.c | 39 +++++++++++++++++++++++++++++++-------- 1 file changed, 31 insertions(+), 8 deletions(-) diff --git a/fs/xfs/xfs_reflink.c b/fs/xfs/xfs_reflink.c index 6b0da1b80103..2615271603ce 100644 --- a/fs/xfs/xfs_reflink.c +++ b/fs/xfs/xfs_reflink.c @@ -1229,12 +1229,24 @@ xfs_iolock_two_inodes_and_break_layout( * hence can introduce a corruption into the file that has it's * block replaced. * - * Despite this issue, we still need to report that range as successfully - * deduped to avoid confusing userspace with EINVAL errors on completely - * matching file data. The only time that an unaligned length will be passed to - * us is when it spans the EOF block of the source file, so if we simply mask it - * down to be block aligned here the we will dedupe everything but that partial - * EOF block. + * In similar fashion, the VFS file cloning also allows partial EOF blocks to be + * "block aligned" for the purposes of cloning entire files. + * However, if the source file range + * includes the EOF block and it lands within the existing EOF of the + * destination file, then we can expose stale data from beyond the source file + * EOF in the destination file. + * + * XFs doesn't support partial block sharing, so in both cases we have check + * these cases ourselves. For dedupe, we can simply round the length to dedupe + * down to the previous whole block and ignore the partial EOF block. While this + * means we can't dedupe the last block of a file, this is an acceptible + * tradeoff for simplicity on implementation. + * + * For cloning, we want to share the partial EOF block if it is also the new EOF + * block of the destination file. If the partial EOF blck lies inside the + * existing destination EOF, then we have to abort the clone to avoid exposing + * stale data int eh destination file. Hence we reject these clone attempts with + * -EINVAL in this case. */ int xfs_reflink_remap_range( @@ -1255,6 +1267,7 @@ xfs_reflink_remap_range( xfs_filblks_t fsblen; xfs_extlen_t cowextsize; ssize_t ret; + u64 blkmask = i_blocksize(inode_in) - 1; if (!xfs_sb_version_hasreflink(&mp->m_sb)) return -EOPNOTSUPP; @@ -1292,8 +1305,18 @@ xfs_reflink_remap_range( * from the source file so we don't try to dedupe the partial * EOF block. */ - if (is_dedupe) - len &= ~((u64)i_blocksize(inode_in) - 1); + if (is_dedupe) { + len &= ~blkmask; + } else if (len & blkmask) { + /* + * The user is attempting to share a partial EOF block, + * if it's inside the destination EOF then reject it + */ + if (pos_out + len < i_size_read(inode_out)) { + ret = -EINVAL; + goto out_unlock; + } + } /* Attach dquots to dest inode before changing block map */ ret = xfs_qm_dqattach(dest);