From patchwork Fri Dec 30 22:20:16 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: "Darrick J. Wong"
X-Patchwork-Id: 13085808
Return-Path:
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on
	aws-us-west-2-korg-lkml-1.web.codeaurora.org
Received: from vger.kernel.org (vger.kernel.org [23.128.96.18])
	by smtp.lore.kernel.org (Postfix) with ESMTP id 8B64AC4167B
	for ; Sat, 31 Dec 2022 03:05:07 +0000 (UTC)
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S229906AbiLaDFG (ORCPT ); Fri, 30 Dec 2022 22:05:06 -0500
Received: from lindbergh.monkeyblade.net ([23.128.96.19]:34570 "EHLO
	lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S236290AbiLaDFF (ORCPT ); Fri, 30 Dec 2022 22:05:05 -0500
Received: from dfw.source.kernel.org (dfw.source.kernel.org [139.178.84.217])
	by lindbergh.monkeyblade.net (Postfix) with ESMTPS id BBFC215F27
	for ; Fri, 30 Dec 2022 19:05:04 -0800 (PST)
Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
	(No client certificate requested)
	by dfw.source.kernel.org (Postfix) with ESMTPS id 57C1061D32
	for ; Sat, 31 Dec 2022 03:05:04 +0000 (UTC)
Received: by smtp.kernel.org (Postfix) with ESMTPSA id B4349C433D2;
	Sat, 31 Dec 2022 03:05:03 +0000 (UTC)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org;
	s=k20201202; t=1672455903;
	bh=37QeixoFuMvg3qMvwAcU8CjWjfDkXHQ1uiGM9k7iQx8=;
	h=Subject:From:To:Cc:Date:In-Reply-To:References:From;
	b=Rjc8/WjUjq+y7GFtugATKFLuYQ2BOOpu9aw5fife0W73+Kii6mgiiWqVUjiHUJ1ls
	 24i21oiSmfQZ1djcZJx6G2liXlRF5AAIhV7H2x7SFR/2he1ZHFGhiDjBnArsexd4c7
	 J9JzE1vmRYkIyWvSbd3aN4egNJ7YmJgGr6UpyDp4iENXej437hwviXMcfIu3CT8xSf
	 QPnEuupVUTS4s65Kd8hTu9QtjQSffd1pSrPE6wj6Js/tQrkyLKrYOnH1XWdL1QikUi
	 f5Oh5jnHOC6L+y3wPwuCOZjPEcSyzvzhuHiMyV+MT0Sed1gPa+X+a0zqnoAvkX/2L6
	 7P90iag3tC19A==
Subject: [PATCH 1/3] xfs: enable extent size hints
 for CoW when rtextsize > 1
From: "Darrick J. Wong"
To: djwong@kernel.org, cem@kernel.org
Cc: linux-xfs@vger.kernel.org
Date: Fri, 30 Dec 2022 14:20:16 -0800
Message-ID: <167243881611.735065.16808285137243477595.stgit@magnolia>
In-Reply-To: <167243881598.735065.1487919004054265294.stgit@magnolia>
References: <167243881598.735065.1487919004054265294.stgit@magnolia>
User-Agent: StGit/0.19
MIME-Version: 1.0
Precedence: bulk
List-ID:
X-Mailing-List: linux-xfs@vger.kernel.org

From: Darrick J. Wong

CoW extent size hints are not allowed on filesystems that have large
realtime extents because we only want to perform the minimum required
amount of write-around (aka write amplification) for shared extents.

On filesystems where rtextsize > 1, allocations can only be done in
units of full rt extents, which means that we can only map an entire rt
extent's worth of blocks into the data fork. Hole punch requests become
conversions to unwritten if the request isn't aligned properly.

Because a copy-write fundamentally requires remapping, this means that
we also can only do copy-writes of a full rt extent. This is too
expensive for large hint sizes, since it's all or nothing.

Signed-off-by: Darrick J. Wong
---
 libxfs/xfs_bmap.c | 22 ++++++++++++++++++++++
 1 file changed, 22 insertions(+)

diff --git a/libxfs/xfs_bmap.c b/libxfs/xfs_bmap.c
index d5842e3b4f6..b8fe093f0f3 100644
--- a/libxfs/xfs_bmap.c
+++ b/libxfs/xfs_bmap.c
@@ -6445,6 +6445,28 @@ xfs_get_cowextsz_hint(
 	if (ip->i_diflags2 & XFS_DIFLAG2_COWEXTSIZE)
 		a = ip->i_cowextsize;
 	if (XFS_IS_REALTIME_INODE(ip)) {
+		/*
+		 * For realtime files, the realtime extent is the fundamental
+		 * unit of allocation. This means that data sharing and CoW
+		 * remapping can only be done in those units. For filesystems
+		 * where the extent size is larger than one block, write
+		 * requests that are not aligned to an extent boundary employ
+		 * an unshare-around strategy to ensure that all pages for a
+		 * shared extent are fully dirtied.
+		 *
+		 * Because the remapping alignment requirement applies equally
+		 * to all CoW writes, any regular overwrites that could be
+		 * turned (by a speculative CoW preallocation) into a CoW write
+		 * must either employ this dirty-around strategy, or be smart
+		 * enough to ignore the CoW fork mapping unless the entire
+		 * extent is dirty or becomes shared by writeback time. Doing
+		 * the first would dramatically increase write amplification,
+		 * and the second would require deeper insight into the state
+		 * of the page cache during a writeback request. For now, we
+		 * ignore the hint.
+		 */
+		if (ip->i_mount->m_sb.sb_rextsize > 1)
+			return ip->i_mount->m_sb.sb_rextsize;
 		b = 0;
 		if (ip->i_diflags & XFS_DIFLAG_EXTSIZE)
 			b = ip->i_extsize;

From patchwork Fri Dec 30 22:20:16 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: "Darrick J. Wong"
X-Patchwork-Id: 13085809
Return-Path:
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on
	aws-us-west-2-korg-lkml-1.web.codeaurora.org
Received: from vger.kernel.org (vger.kernel.org [23.128.96.18])
	by smtp.lore.kernel.org (Postfix) with ESMTP id 728D6C4332F
	for ; Sat, 31 Dec 2022 03:05:25 +0000 (UTC)
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S236302AbiLaDFY (ORCPT ); Fri, 30 Dec 2022 22:05:24 -0500
Received: from lindbergh.monkeyblade.net ([23.128.96.19]:34594 "EHLO
	lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S236290AbiLaDFX (ORCPT ); Fri, 30 Dec 2022 22:05:23 -0500
Received: from ams.source.kernel.org (ams.source.kernel.org [145.40.68.75])
	by lindbergh.monkeyblade.net (Postfix) with ESMTPS id D9E7315816
	for ; Fri, 30 Dec 2022 19:05:21 -0800 (PST)
Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
	(No client certificate requested)
	by ams.source.kernel.org (Postfix) with ESMTPS id 84314B81E5B
	for ; Sat, 31 Dec 2022 03:05:20 +0000 (UTC)
Received: by smtp.kernel.org (Postfix) with ESMTPSA id 487B6C433EF;
	Sat, 31 Dec 2022 03:05:19 +0000 (UTC)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org;
	s=k20201202; t=1672455919;
	bh=zPeS1UxEpj1feyJBDz/HvbAlBd1ieN+GIugeL7oTsDQ=;
	h=Subject:From:To:Cc:Date:In-Reply-To:References:From;
	b=GtNHlZhqstXnSD7GM5rB2/KRis3668MVYWSY4lDG6LZBHI51xi3AQlN3X1k4FBWeo
	 OyrwrACbSDIWi0PVaejwR1q6/Ow6NiG0W+WYhEsQddya/eKvY+kU+bJUSsgnCJRc/b
	 ZX3LAww+fEhYZGw6OKINldmT77BalX702wul763Z6vccKAC0JdJl+yCuYhXlpNLO4J
	 IS3itI4zrFwMvXuVxHk4cf3tiygtQhr5iVukhtssEynhIxTydalzuJAdi/ep+KJIuT
	 TAgODSWLqVnQubprNHpvOUHSfxJZgNgRGuvK6W9hD0+PVOAjU4vU87Ejzelu+dvEH/
	 nqSbV6HaFpftA==
Subject: [PATCH 2/3] xfs: fix integer overflow when validating extent size hints
From: "Darrick J. Wong"
To: djwong@kernel.org, cem@kernel.org
Cc: linux-xfs@vger.kernel.org
Date: Fri, 30 Dec 2022 14:20:16 -0800
Message-ID: <167243881625.735065.12011702487699537803.stgit@magnolia>
In-Reply-To: <167243881598.735065.1487919004054265294.stgit@magnolia>
References: <167243881598.735065.1487919004054265294.stgit@magnolia>
User-Agent: StGit/0.19
MIME-Version: 1.0
Precedence: bulk
List-ID:
X-Mailing-List: linux-xfs@vger.kernel.org

From: Darrick J. Wong

Both file extent size hints are stored as 32-bit quantities, in units of
filesystem blocks. As part of validating the hints, we convert these
quantities to bytes to ensure that the hint is congruent with the file's
allocation size.

The maximum possible hint value is 2097151 (aka XFS_MAX_BMBT_EXTLEN).
If the file allocation unit is larger than 2048, the unit conversion
will exceed 32 bits in size, which overflows the uint32_t used to store
the value used in the comparison. This isn't a problem for files on the
data device since the hint will always be a multiple of the block size.
However, this is a problem for realtime files because the rtextent size
can be any integer number of fs blocks, and truncation of upper bits
changes the outcome of division.
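
To make the truncation concrete, here is a minimal standalone sketch in C.
It is not part of this patch: the helper names congruent_in_bytes32() and
congruent_in_blocks() are hypothetical, and the block size of 4096 bytes
(blocklog 12) is an assumption picked for illustration. The first helper
mimics a check that stores the byte conversions in a uint32_t; the second
performs the same comparison in units of blocks.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper mimicking a byte-based check: both quantities are
 * converted to bytes and stored in 32 bits, so any upper bits are lost.
 */
static inline bool congruent_in_bytes32(uint32_t hint_fsb, uint32_t unit_fsb,
					unsigned int blocklog)
{
	uint32_t hint_bytes = (uint32_t)((uint64_t)hint_fsb << blocklog);
	uint32_t unit_bytes = (uint32_t)((uint64_t)unit_fsb << blocklog);

	/* Guard the modulo below; a truncated unit of zero is not handled. */
	assert(unit_bytes != 0);
	return hint_bytes % unit_bytes == 0;
}

/* Hypothetical helper doing the same comparison in units of blocks. */
static inline bool congruent_in_blocks(uint32_t hint_fsb, uint32_t unit_fsb)
{
	assert(unit_fsb != 0);
	return hint_fsb % unit_fsb == 0;
}
```

With a hint of 2097151 blocks and a 7-block allocation unit, the byte-based
helper sees the truncated value 4294963200, which is not a multiple of 28672,
so it rejects a hint that is in fact 7 * 299593 blocks. The block-based helper
accepts it. With a 1-block unit, both helpers agree, which matches the
observation that data device files are unaffected.
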

Eliminate the overflow by performing the congruency check in units of
blocks, not bytes. Otherwise, we get errors like this:

$ truncate -s 500T /tmp/a
$ mkfs.xfs -f -N /tmp/a -d extszinherit=2097151,rtinherit=1 -r extsize=28k
illegal extent size hint 2097151, must be less than 2097151 and a multiple of 7.

Signed-off-by: Darrick J. Wong
---
 libxfs/xfs_inode_buf.c | 20 ++++++--------------
 1 file changed, 6 insertions(+), 14 deletions(-)

diff --git a/libxfs/xfs_inode_buf.c b/libxfs/xfs_inode_buf.c
index ba4df981bd0..866cc187769 100644
--- a/libxfs/xfs_inode_buf.c
+++ b/libxfs/xfs_inode_buf.c
@@ -737,13 +737,11 @@ xfs_inode_validate_extsize(
 	bool				rt_flag;
 	bool				hint_flag;
 	bool				inherit_flag;
-	uint32_t			extsize_bytes;
-	uint32_t			blocksize_bytes;
+	uint32_t			alloc_unit = 1;
 
 	rt_flag = (flags & XFS_DIFLAG_REALTIME);
 	hint_flag = (flags & XFS_DIFLAG_EXTSIZE);
 	inherit_flag = (flags & XFS_DIFLAG_EXTSZINHERIT);
-	extsize_bytes = XFS_FSB_TO_B(mp, extsize);
 
 	/*
 	 * This comment describes a historic gap in this verifier function.
@@ -772,9 +770,7 @@ xfs_inode_validate_extsize(
 	 */
 
 	if (rt_flag)
-		blocksize_bytes = XFS_FSB_TO_B(mp, mp->m_sb.sb_rextsize);
-	else
-		blocksize_bytes = mp->m_sb.sb_blocksize;
+		alloc_unit = mp->m_sb.sb_rextsize;
 
 	if ((hint_flag || inherit_flag) && !(S_ISDIR(mode) || S_ISREG(mode)))
 		return __this_address;
@@ -792,7 +788,7 @@ xfs_inode_validate_extsize(
 	if (mode && !(hint_flag || inherit_flag) && extsize != 0)
 		return __this_address;
 
-	if (extsize_bytes % blocksize_bytes)
+	if (extsize % alloc_unit)
 		return __this_address;
 
 	if (extsize > XFS_MAX_BMBT_EXTLEN)
@@ -827,12 +823,10 @@ xfs_inode_validate_cowextsize(
 {
 	bool				rt_flag;
 	bool				hint_flag;
-	uint32_t			cowextsize_bytes;
-	uint32_t			blocksize_bytes;
+	uint32_t			alloc_unit = 1;
 
 	rt_flag = (flags & XFS_DIFLAG_REALTIME);
 	hint_flag = (flags2 & XFS_DIFLAG2_COWEXTSIZE);
-	cowextsize_bytes = XFS_FSB_TO_B(mp, cowextsize);
 
 	/*
 	 * Similar to extent size hints, a directory can be configured to
@@ -847,9 +841,7 @@ xfs_inode_validate_cowextsize(
 	 */
 
 	if (rt_flag)
-		blocksize_bytes = XFS_FSB_TO_B(mp, mp->m_sb.sb_rextsize);
-	else
-		blocksize_bytes = mp->m_sb.sb_blocksize;
+		alloc_unit = mp->m_sb.sb_rextsize;
 
 	if (hint_flag && !xfs_has_reflink(mp))
 		return __this_address;
@@ -864,7 +856,7 @@ xfs_inode_validate_cowextsize(
 	if (mode && !hint_flag && cowextsize != 0)
 		return __this_address;
 
-	if (cowextsize_bytes % blocksize_bytes)
+	if (cowextsize % alloc_unit)
 		return __this_address;
 
 	if (cowextsize > XFS_MAX_BMBT_EXTLEN)

From patchwork Fri Dec 30 22:20:16 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: "Darrick J. Wong"
X-Patchwork-Id: 13085810
Return-Path:
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on
	aws-us-west-2-korg-lkml-1.web.codeaurora.org
Received: from vger.kernel.org (vger.kernel.org [23.128.96.18])
	by smtp.lore.kernel.org (Postfix) with ESMTP id 9898BC4332F
	for ; Sat, 31 Dec 2022 03:05:47 +0000 (UTC)
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S230132AbiLaDFq (ORCPT ); Fri, 30 Dec 2022 22:05:46 -0500
Received: from lindbergh.monkeyblade.net ([23.128.96.19]:34618 "EHLO
	lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S236267AbiLaDFj (ORCPT ); Fri, 30 Dec 2022 22:05:39 -0500
Received: from ams.source.kernel.org (ams.source.kernel.org [145.40.68.75])
	by lindbergh.monkeyblade.net (Postfix) with ESMTPS id BA3A815F27
	for ; Fri, 30 Dec 2022 19:05:37 -0800 (PST)
Received: from smtp.kernel.org (relay.kernel.org [52.25.139.140])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
	(No client certificate requested)
	by ams.source.kernel.org (Postfix) with ESMTPS id 1ACACB81EA2
	for ; Sat, 31 Dec 2022 03:05:36 +0000 (UTC)
Received: by smtp.kernel.org (Postfix) with ESMTPSA id C624BC433EF;
	Sat, 31 Dec 2022 03:05:34 +0000 (UTC)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org;
	s=k20201202; t=1672455934;
	bh=sjSBYA/knL1WWk0Oudvpjp0psOsDs7f+DmVXuSiNjI0=;
	h=Subject:From:To:Cc:Date:In-Reply-To:References:From;
	b=tSGut9Nfa0kWh7FITamIxy1nHEAsqA/F1VY68h3ah6eco34QZhY8Aw1I0O/i+y+iF
	 h2EKjk3kC6juQiJAdA5KyjZS+tfGKRpqADqWEjTLnk13oEgqmckZhfFzgZ1v+3BQEi
	 kau5mgRWtMX5kabuQmdrYT9s+fywyLS1fqG+1GMTsDDhST9Cdt+YxgjVbq2VNvmqVv
	 lh1rxUBWmZzBPJfupvcyLysZ35g8nIcVcWaKV8HQfCU4RUK177oegjNJiJmRZgWsHn
	 qkAtJnKQOgQGozCb4MpEsU/lAxl/6n1ah6E2HORSuBO5fSTIy0mNOIhyFiNZdu7sqe
	 JcM/NutQRKYvg==
Subject: [PATCH 3/3] mkfs: enable reflink with realtime extent sizes > 1
From: "Darrick J. Wong"
To: djwong@kernel.org, cem@kernel.org
Cc: linux-xfs@vger.kernel.org
Date: Fri, 30 Dec 2022 14:20:16 -0800
Message-ID: <167243881638.735065.16589572258129197530.stgit@magnolia>
In-Reply-To: <167243881598.735065.1487919004054265294.stgit@magnolia>
References: <167243881598.735065.1487919004054265294.stgit@magnolia>
User-Agent: StGit/0.19
MIME-Version: 1.0
Precedence: bulk
List-ID:
X-Mailing-List: linux-xfs@vger.kernel.org

From: Darrick J. Wong

Allow creation of filesystems with reflink enabled and realtime extent
size larger than 1 block.

Signed-off-by: Darrick J. Wong
---
 libxfs/init.c   |  7 -------
 mkfs/xfs_mkfs.c | 37 -------------------------------------
 2 files changed, 44 deletions(-)

diff --git a/libxfs/init.c b/libxfs/init.c
index a4023f78655..c04a30bb829 100644
--- a/libxfs/init.c
+++ b/libxfs/init.c
@@ -448,13 +448,6 @@ rtmount_init(
 	if (mp->m_sb.sb_rblocks == 0)
 		return 0;
 
-	if (xfs_has_reflink(mp) && mp->m_sb.sb_rextsize > 1) {
-		fprintf(stderr,
-	_("%s: Reflink not compatible with realtime extent size > 1. Please try a newer xfsprogs.\n"),
-				progname);
-		return -1;
-	}
-
 	if (mp->m_rtdev_targp->bt_bdev == 0 && !xfs_is_debugger(mp)) {
 		fprintf(stderr, _("%s: filesystem has a realtime subvolume\n"),
 			progname);
diff --git a/mkfs/xfs_mkfs.c b/mkfs/xfs_mkfs.c
index e406fa6a5ea..db828deadfb 100644
--- a/mkfs/xfs_mkfs.c
+++ b/mkfs/xfs_mkfs.c
@@ -2385,24 +2385,6 @@ _("inode btree counters not supported without finobt support\n"));
 	}
 
 	if (cli->xi->rtname) {
-		if (cli->rtextsize && cli->sb_feat.reflink) {
-			if (cli_opt_set(&mopts, M_REFLINK)) {
-				fprintf(stderr,
-_("reflink not supported on realtime devices with rt extent size specified\n"));
-				usage();
-			}
-			cli->sb_feat.reflink = false;
-		}
-		if (cli->blocksize < XFS_MIN_RTEXTSIZE && cli->sb_feat.reflink) {
-			if (cli_opt_set(&mopts, M_REFLINK)) {
-				fprintf(stderr,
-_("reflink not supported on realtime devices with blocksize %d < %d\n"),
-						cli->blocksize,
-						XFS_MIN_RTEXTSIZE);
-				usage();
-			}
-			cli->sb_feat.reflink = false;
-		}
 		if (!cli->sb_feat.rtgroups && cli->sb_feat.reflink) {
 			if (cli_opt_set(&mopts, M_REFLINK) &&
 			    cli_opt_set(&ropts, R_RTGROUPS)) {
@@ -2582,19 +2564,6 @@ validate_rtextsize(
 			usage();
 		}
 		cfg->rtextblocks = (xfs_extlen_t)(rtextbytes >> cfg->blocklog);
-	} else if (cli->sb_feat.reflink && cli->xi->rtname) {
-		/*
-		 * reflink doesn't support rt extent size > 1FSB yet, so set
-		 * an extent size of 1FSB. Make sure we still satisfy the
-		 * minimum rt extent size.
-		 */
-		if (cfg->blocksize < XFS_MIN_RTEXTSIZE) {
-			fprintf(stderr,
-		_("reflink not supported on rt volume with blocksize %d\n"),
-				cfg->blocksize);
-			usage();
-		}
-		cfg->rtextblocks = 1;
 	} else {
 		/*
 		 * If realtime extsize has not been specified by the user,
@@ -2626,12 +2595,6 @@ validate_rtextsize(
 		}
 	}
 	ASSERT(cfg->rtextblocks);
-
-	if (cli->sb_feat.reflink && cfg->rtblocks > 0 && cfg->rtextblocks > 1) {
-		fprintf(stderr,
-_("reflink not supported on realtime with extent sizes > 1\n"));
-		usage();
-	}
 }
 
 /* Validate the incoming extsize hint. */