From patchwork Thu Oct 3 15:09:01 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Darrick J. Wong" X-Patchwork-Id: 13821130 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 1660D1E52C; Thu, 3 Oct 2024 15:09:01 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1727968142; cv=none; b=gY+qq/TQ6GrX5M06hw6ljxKXByYakkJaaZ1dDjj3A3XjPS37l+le+hebgZpHFLdAJJUNRgKPVBAi97PXZTWb5BFzV0ILNrSDFMjHAUgcUh0VtYvPk5ElwOHUgdRDajXWw+6I579QYJG2rdnSlYZ9ijW97d+rXTzZq44jiqcoGrA= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1727968142; c=relaxed/simple; bh=NCjc18fQlwrCcRl2m64OcTUP4xeLduMqkrR3YMLjeSQ=; h=Date:Subject:From:To:Cc:Message-ID:In-Reply-To:References: MIME-Version:Content-Type; b=J8eA9ccggku3UDe4jBzSmzH1P0HpfFIWN/9/VzWCxfzq5qAI4y2IoaDebFH8dbbm+rRx4pWUK5e6Wk6MbUxlLtvXJQh1tJ39eRlUKolwdK+/ZrodCy4FIrxSTqrDoKZlWA1H7z9ZZXQ4WWiqAPvPxo4vq8+sTuv9bhBH1estxYw= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=OqP/QS71; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="OqP/QS71" Received: by smtp.kernel.org (Postfix) with ESMTPSA id 9E44FC4CEC5; Thu, 3 Oct 2024 15:09:01 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1727968141; bh=NCjc18fQlwrCcRl2m64OcTUP4xeLduMqkrR3YMLjeSQ=; h=Date:Subject:From:To:Cc:In-Reply-To:References:From; b=OqP/QS71ze71nh3zmfa47RLHrWp0Hnx1nODSS07rfHE9pyTe8KkNbiqmt7oifktgh OZcIn6QH5FEdYUtiMsUZEM4jFMl3rhcVjJSSWw0cMsp11S19fL69wfmpqsQtwVw5VR S0aiQcDeCuHku/cuSpWtrb26b/GsUXWajcIWp2gKRMS74Tgvf26jaFsH+8J9Ys1qFZ XgjyVyfFBr+5flK7qSEyRMfOMBvG/vZj6FJKBzBuJp5F3xx6h3eRggcLanY31kWmu7 BD8C476i8nThQKH/Yvsvu3WBDe6aUicMGbNAs7M04//gsGtU/BtZWFj1CfgELjolju SKSY4s6i2XPCA== Date: Thu, 03 Oct 2024 08:09:01 -0700 Subject: [PATCH 1/4] xfs: don't allocate COW extents when unsharing a hole From: "Darrick J. Wong" To: willy@infradead.org, brauner@kernel.org, djwong@kernel.org, cem@kernel.org Cc: ruansy.fnst@fujitsu.com, linux-fsdevel@vger.kernel.org, hch@lst.de, linux-xfs@vger.kernel.org Message-ID: <172796813277.1131942.5486112889531210260.stgit@frogsfrogsfrogs> In-Reply-To: <172796813251.1131942.12184885574609980777.stgit@frogsfrogsfrogs> References: <172796813251.1131942.12184885574609980777.stgit@frogsfrogsfrogs> User-Agent: StGit/0.19 Precedence: bulk X-Mailing-List: linux-xfs@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 From: Darrick J. Wong It doesn't make sense to allocate a COW extent when unsharing a hole because holes cannot be shared. Fixes: 1f1397b7218d7 ("xfs: don't allocate into the data fork for an unshare request") Signed-off-by: Darrick J. Wong Reviewed-by: Christoph Hellwig --- fs/xfs/xfs_iomap.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c index 1e11f48814c0..dc0443848485 100644 --- a/fs/xfs/xfs_iomap.c +++ b/fs/xfs/xfs_iomap.c @@ -707,7 +707,7 @@ imap_needs_cow( return false; /* when zeroing we don't have to COW holes or unwritten extents */ - if (flags & IOMAP_ZERO) { + if (flags & (IOMAP_UNSHARE | IOMAP_ZERO)) { if (!nimaps || imap->br_startblock == HOLESTARTBLOCK || imap->br_state == XFS_EXT_UNWRITTEN) From patchwork Thu Oct 3 15:09:16 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Darrick J. Wong" X-Patchwork-Id: 13821131 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id E1AD9823C3; Thu, 3 Oct 2024 15:09:17 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1727968158; cv=none; b=SoLHFi+2zRfDRDRHygQVJvgcfQWpW1yGbxQ5dtwFIExwJqCtTUwSSgcG/7RWSISYRhI/c3D7zZ++c+21HqF9w1WrbYFxvuVBZCMX6g53CzKb9ARxLCyJVw0/GX4tLcguD+LA4uq2UOgJVURiwXQX1wpPAs77bxe/GNA9mLraZUU= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1727968158; c=relaxed/simple; bh=ajmpiaefvtvuMEdWdelFDJpcrfXyWlkxsRWBtS4oBNU=; h=Date:Subject:From:To:Cc:Message-ID:In-Reply-To:References: MIME-Version:Content-Type; b=YeiKCG5EYSkDkpQyqPwaPa/450TxKok7ZqOSEf1VzBNt28IjmWYco427ReAdjIiAw3tOIRffVb4UUegWzLgzW7QDgNUDLDFEetsC6kmcBNK0ssBXWO+4ajUo4QL5rKvIVIETtxRCAHKkdg2Z5jL7SU6urUS/VwUUa9Ug7fvgerk= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=W3UYHQDc; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="W3UYHQDc" Received: by smtp.kernel.org (Postfix) with ESMTPSA id 48583C4CEC5; Thu, 3 Oct 2024 15:09:17 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1727968157; bh=ajmpiaefvtvuMEdWdelFDJpcrfXyWlkxsRWBtS4oBNU=; h=Date:Subject:From:To:Cc:In-Reply-To:References:From; b=W3UYHQDcP9nlpZiLbuWbmGakoiEWTGrr1Ofn1hueY43/sC36QO85ZRMZt1eUEb8CK gWJVeNqvyIDP3dNMs2LzJLpni7nSH2dgjSWEGniYOdJLTxRvALBHsczAkl4UY9yzjt 9ai2f49KBG67qbRXGOO3hEOoewUktoexsm12j8Q28zYOlb5p1/Ct/08f7GkVgozKFL O4NAlvGcC102Wp3ZvPCQwQJmn6p9ZDkzhKooHHyph8hfciKNUvIG0sHQ31UxoMxPhO Od4ZQbEpCvaSbM8vLSDgZAzFslnfg7keYQl5oueBlE9214UMSnfwFR8Nx5jj/Olc4O QGGQqKwucggTA== Date: Thu, 03 Oct 2024 08:09:16 -0700 Subject: [PATCH 2/4] iomap: share iomap_unshare_iter predicate code with fsdax From: "Darrick J. Wong" To: willy@infradead.org, brauner@kernel.org, djwong@kernel.org, cem@kernel.org Cc: ruansy.fnst@fujitsu.com, linux-fsdevel@vger.kernel.org, hch@lst.de, linux-xfs@vger.kernel.org Message-ID: <172796813294.1131942.15762084021076932620.stgit@frogsfrogsfrogs> In-Reply-To: <172796813251.1131942.12184885574609980777.stgit@frogsfrogsfrogs> References: <172796813251.1131942.12184885574609980777.stgit@frogsfrogsfrogs> User-Agent: StGit/0.19 Precedence: bulk X-Mailing-List: linux-xfs@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 From: Darrick J. Wong The predicate code that iomap_unshare_iter uses to decide if it's really needs to unshare a file range mapping should be shared with the fsdax version, because right now they're opencoded and inconsistent. Note that we simplify the predicate logic a bit -- we no longer allow unsharing of inline data mappings, but there aren't any filesystems that allow shared inline data currently. This is a fix in the sense that it should have been ported to fsdax. Fixes: b53fdb215d13 ("iomap: improve shared block detection in iomap_unshare_iter") Signed-off-by: Darrick J. Wong Reviewed-by: Christoph Hellwig --- fs/dax.c | 3 +-- fs/iomap/buffered-io.c | 30 ++++++++++++++++-------------- include/linux/iomap.h | 1 + 3 files changed, 18 insertions(+), 16 deletions(-) diff --git a/fs/dax.c b/fs/dax.c index c62acd2812f8..5064eefb1c1e 100644 --- a/fs/dax.c +++ b/fs/dax.c @@ -1268,8 +1268,7 @@ static s64 dax_unshare_iter(struct iomap_iter *iter) s64 ret = 0; void *daddr = NULL, *saddr = NULL; - /* don't bother with blocks that are not shared to start with */ - if (!(iomap->flags & IOMAP_F_SHARED)) + if (!iomap_want_unshare_iter(iter)) return length; id = dax_read_lock(); diff --git a/fs/iomap/buffered-io.c b/fs/iomap/buffered-io.c index 78ebd265f425..3899169b2cf7 100644 --- a/fs/iomap/buffered-io.c +++ b/fs/iomap/buffered-io.c @@ -1309,19 +1309,12 @@ void iomap_file_buffered_write_punch_delalloc(struct inode *inode, } EXPORT_SYMBOL_GPL(iomap_file_buffered_write_punch_delalloc); -static loff_t iomap_unshare_iter(struct iomap_iter *iter) +bool iomap_want_unshare_iter(const struct iomap_iter *iter) { - struct iomap *iomap = &iter->iomap; - loff_t pos = iter->pos; - loff_t length = iomap_length(iter); - loff_t written = 0; - - /* Don't bother with blocks that are not shared to start with. */ - if (!(iomap->flags & IOMAP_F_SHARED)) - return length; - /* - * Don't bother with delalloc reservations, holes or unwritten extents. + * Don't bother with blocks that are not shared to start with; or + * mappings that cannot be shared, such as inline data, delalloc + * reservations, holes or unwritten extents. * * Note that we use srcmap directly instead of iomap_iter_srcmap as * unsharing requires providing a separate source map, and the presence @@ -1329,9 +1322,18 @@ static loff_t iomap_unshare_iter(struct iomap_iter *iter) * IOMAP_F_SHARED which can be set for any data that goes into the COW * fork for XFS. */ - if (iter->srcmap.type == IOMAP_HOLE || - iter->srcmap.type == IOMAP_DELALLOC || - iter->srcmap.type == IOMAP_UNWRITTEN) + return (iter->iomap.flags & IOMAP_F_SHARED) && + iter->srcmap.type == IOMAP_MAPPED; +} + +static loff_t iomap_unshare_iter(struct iomap_iter *iter) +{ + struct iomap *iomap = &iter->iomap; + loff_t pos = iter->pos; + loff_t length = iomap_length(iter); + loff_t written = 0; + + if (!iomap_want_unshare_iter(iter)) return length; do { diff --git a/include/linux/iomap.h b/include/linux/iomap.h index 4ad12a3c8bae..d8a7fc84348c 100644 --- a/include/linux/iomap.h +++ b/include/linux/iomap.h @@ -267,6 +267,7 @@ void iomap_invalidate_folio(struct folio *folio, size_t offset, size_t len); bool iomap_dirty_folio(struct address_space *mapping, struct folio *folio); int iomap_file_unshare(struct inode *inode, loff_t pos, loff_t len, const struct iomap_ops *ops); +bool iomap_want_unshare_iter(const struct iomap_iter *iter); int iomap_zero_range(struct inode *inode, loff_t pos, loff_t len, bool *did_zero, const struct iomap_ops *ops); int iomap_truncate_page(struct inode *inode, loff_t pos, bool *did_zero, From patchwork Thu Oct 3 15:09:32 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Darrick J. Wong" X-Patchwork-Id: 13821132 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 11921823C3; Thu, 3 Oct 2024 15:09:33 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1727968173; cv=none; b=XCaNvtfeOPxljzW7arp04PAemrw0MWxPIyFtR1Q23nw6l1LurixHZ3jJqlVuvvl8fKpTSnculNm9szdz5OOxdbWYsrqEiilymt9YL+b+tSRWtQ3B8/nMCtAj+eYdVCpA0sT+jD/FUKRQXReLG1/3dssKJ0b1AMACl+r7I9wmyIA= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1727968173; c=relaxed/simple; bh=aGck3uoHkepYDWrYE3ghqnF0P8QpTLJc960nV3wgOyw=; h=Date:Subject:From:To:Cc:Message-ID:In-Reply-To:References: MIME-Version:Content-Type; b=GIlxsJSpaSZ5uffANLH5zLpsx7iJcPubFq7xmkGzZ8x95nAuHO0sWfJ/9o+44y0b7ghoa4k3cx2W+HO3OJ31NyMRk8S1AZsA/DrEB+N9wFqe102EURbk7igVkvuD0umkhGWIobuC5QqGy3NvTgmTEchXxrFluVqXIgAyLaYDoCU= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=ooxUf5mT; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="ooxUf5mT" Received: by smtp.kernel.org (Postfix) with ESMTPSA id DBD5AC4CEC5; Thu, 3 Oct 2024 15:09:32 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1727968172; bh=aGck3uoHkepYDWrYE3ghqnF0P8QpTLJc960nV3wgOyw=; h=Date:Subject:From:To:Cc:In-Reply-To:References:From; b=ooxUf5mTJGczGhrpitaWCOj1O1xgPJOvgVGdG/QSs8oXBfhxGZRqriPCpTXusE1qe MsHvY/s+CV35ZCB0vavWzLgwsrbTwkDlmav9TLpzjg0ZJGHgyM/9mBoDMw2xdzxY3N KLLZnvmWwvChsm/DcqoZ2zs8O6zPV3aspiEUvVPx82O0cQefgM6XGbx58QTlqiT2q+ +kyPqWnH3Q9vZtMyXyI+xps+i1oOgtnDQNRLMj/hbfrHrLqKZf+tklE/FpshWzvT2u uCKNbwjCMtGvrh19bO4+e0cX2eyPsIjaoQdgmimwwl0O/K2KZogqDuM8yhP2qrWVmI AjFgenXnOyJJw== Date: Thu, 03 Oct 2024 08:09:32 -0700 Subject: [PATCH 3/4] fsdax: remove zeroing code from dax_unshare_iter From: "Darrick J. Wong" To: willy@infradead.org, brauner@kernel.org, djwong@kernel.org, cem@kernel.org Cc: ruansy.fnst@fujitsu.com, ruansy.fnst@fujitsu.com, linux-fsdevel@vger.kernel.org, hch@lst.de, linux-xfs@vger.kernel.org Message-ID: <172796813311.1131942.16033376284752798632.stgit@frogsfrogsfrogs> In-Reply-To: <172796813251.1131942.12184885574609980777.stgit@frogsfrogsfrogs> References: <172796813251.1131942.12184885574609980777.stgit@frogsfrogsfrogs> User-Agent: StGit/0.19 Precedence: bulk X-Mailing-List: linux-xfs@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 From: Darrick J. Wong Remove the code in dax_unshare_iter that zeroes the destination memory because it's not necessary. If srcmap is unwritten, we don't have to do anything because that unwritten extent came from the regular file mapping, and unwritten extents cannot be shared. The same applies to holes. Furthermore, zeroing to unshare a mapping is just plain wrong because unsharing means copy on write, and we should be copying data. This is effectively a revert of commit 13dd4e04625f ("fsdax: unshare: zero destination if srcmap is HOLE or UNWRITTEN") Cc: ruansy.fnst@fujitsu.com Signed-off-by: Darrick J. Wong Reviewed-by: Christoph Hellwig --- fs/dax.c | 8 -------- 1 file changed, 8 deletions(-) diff --git a/fs/dax.c b/fs/dax.c index 5064eefb1c1e..9fbbdaa784b4 100644 --- a/fs/dax.c +++ b/fs/dax.c @@ -1276,14 +1276,6 @@ static s64 dax_unshare_iter(struct iomap_iter *iter) if (ret < 0) goto out_unlock; - /* zero the distance if srcmap is HOLE or UNWRITTEN */ - if (srcmap->flags & IOMAP_F_SHARED || srcmap->type == IOMAP_UNWRITTEN) { - memset(daddr, 0, length); - dax_flush(iomap->dax_dev, daddr, length); - ret = length; - goto out_unlock; - } - ret = dax_iomap_direct_access(srcmap, pos, length, &saddr, NULL); if (ret < 0) goto out_unlock; From patchwork Thu Oct 3 15:09:48 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Darrick J. Wong" X-Patchwork-Id: 13821133 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id E7AE51B969; Thu, 3 Oct 2024 15:09:48 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1727968189; cv=none; b=OrunVc/l2vbMZc0Pjt4of2f9RuXVYSgmsFdHJ9lKaah/U9u4erELf1MrreiovtM0eMnKGQ+9U72Q1/Tqkpqle9MvfmqrKtCfiGcrCq7NxsDwbWTcyfwxGOuPn+rvLdniw47nw1ZaJ/6ah9+TKqBdpqHmBBIpdcBmfCt63MxVAqc= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1727968189; c=relaxed/simple; bh=t1erP4051fKxjeO5bhlf4vXNm9QCAZdEkniTTG/tkWU=; h=Date:Subject:From:To:Cc:Message-ID:In-Reply-To:References: MIME-Version:Content-Type; b=cvkXmRsqPu68mzhO1cF2xANpK4xwS76BMTfkAqRnU+CJThBuWVtvmfOVL3JzKRnQcoSvbbALt5i6z2LCqsZtjqnElqmO/fcXAbbfixM6KSELy4m3q/BJ7p8Z4kucDzdx5Q8mFbhxJN0IDiYe9YREyQ0HNBqVLRAPHL3o7PTn4s0= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=b5hrQQIc; arc=none smtp.client-ip=10.30.226.201 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="b5hrQQIc" Received: by smtp.kernel.org (Postfix) with ESMTPSA id 75C88C4CEC7; Thu, 3 Oct 2024 15:09:48 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1727968188; bh=t1erP4051fKxjeO5bhlf4vXNm9QCAZdEkniTTG/tkWU=; h=Date:Subject:From:To:Cc:In-Reply-To:References:From; b=b5hrQQIc8tWiAxg1Dl9aDiFL6H9CqgDR/NzOUCbzp8PhFSRxed4dL4kotgU0kUvs3 KzvijsmaSQ/y1xmRxX8AxU5R39B+rDrZQYw0Ps/yYWlM+u+nAydKFxUVYW9r+o1ZVz MBlRIC0ZFppfw2K88UbOqY/bwg8fmpFRHPX4T86Bo5AKfZ2Ydjkb/UIC2c1EQ7IfBG 9I19KQpcm4/RGxErLe3uQrlPBksjCWZejM2746ByhpdyROZqPfaf62WNT8eqDmISQV Kd2tM8G21oRzwzWLi8gAZgNFGjaZtw/RpDDDYOIspFr2lTK8Pjp5kvVEJaZ9XHfSMR 2lsMz57LzArZQ== Date: Thu, 03 Oct 2024 08:09:48 -0700 Subject: [PATCH 4/4] fsdax: dax_unshare_iter needs to copy entire blocks From: "Darrick J. Wong" To: willy@infradead.org, brauner@kernel.org, djwong@kernel.org, cem@kernel.org Cc: ruansy.fnst@fujitsu.com, ruansy.fnst@fujitsu.com, linux-fsdevel@vger.kernel.org, hch@lst.de, linux-xfs@vger.kernel.org Message-ID: <172796813328.1131942.16777025316348797355.stgit@frogsfrogsfrogs> In-Reply-To: <172796813251.1131942.12184885574609980777.stgit@frogsfrogsfrogs> References: <172796813251.1131942.12184885574609980777.stgit@frogsfrogsfrogs> User-Agent: StGit/0.19 Precedence: bulk X-Mailing-List: linux-xfs@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 From: Darrick J. Wong The code that copies data from srcmap to iomap in dax_unshare_iter is very very broken, which bfoster's recent fsx changes have exposed. If the pos and len passed to dax_file_unshare are not aligned to an fsblock boundary, the iter pos and length in the _iter function will reflect this unalignment. dax_iomap_direct_access always returns a pointer to the start of the kmapped fsdax page, even if its pos argument is in the middle of that page. This is catastrophic for data integrity when iter->pos is not aligned to a page, because daddr/saddr do not point to the same byte in the file as iter->pos. Hence we corrupt user data by copying it to the wrong place. If iter->pos + iomap_length() in the _iter function not aligned to a page, then we fail to copy a full block, and only partially populate the destination block. This is catastrophic for data confidentiality because we expose stale pmem contents. Fix both of these issues by aligning copy_pos/copy_len to a page boundary (remember, this is fsdax so 1 fsblock == 1 base page) so that we always copy full blocks. We're not done yet -- there's no call to invalidate_inode_pages2_range, so programs that have the file range mmap'd will continue accessing the old memory mapping after the file metadata updates have completed. Be careful with the return value -- if the unshare succeeds, we still need to return the number of bytes that the iomap iter thinks we're operating on. Cc: ruansy.fnst@fujitsu.com Fixes: d984648e428b ("fsdax,xfs: port unshare to fsdax") Signed-off-by: Darrick J. Wong Reviewed-by: Christoph Hellwig --- fs/dax.c | 34 +++++++++++++++++++++++++++------- 1 file changed, 27 insertions(+), 7 deletions(-) diff --git a/fs/dax.c b/fs/dax.c index 9fbbdaa784b4..21b47402b3dc 100644 --- a/fs/dax.c +++ b/fs/dax.c @@ -1262,26 +1262,46 @@ static s64 dax_unshare_iter(struct iomap_iter *iter) { struct iomap *iomap = &iter->iomap; const struct iomap *srcmap = iomap_iter_srcmap(iter); - loff_t pos = iter->pos; - loff_t length = iomap_length(iter); + loff_t copy_pos = iter->pos; + u64 copy_len = iomap_length(iter); + u32 mod; int id = 0; s64 ret = 0; void *daddr = NULL, *saddr = NULL; if (!iomap_want_unshare_iter(iter)) - return length; + return iomap_length(iter); + + /* + * Extend the file range to be aligned to fsblock/pagesize, because + * we need to copy entire blocks, not just the byte range specified. + * Invalidate the mapping because we're about to CoW. + */ + mod = offset_in_page(copy_pos); + if (mod) { + copy_len += mod; + copy_pos -= mod; + } + + mod = offset_in_page(copy_pos + copy_len); + if (mod) + copy_len += PAGE_SIZE - mod; + + invalidate_inode_pages2_range(iter->inode->i_mapping, + copy_pos >> PAGE_SHIFT, + (copy_pos + copy_len - 1) >> PAGE_SHIFT); id = dax_read_lock(); - ret = dax_iomap_direct_access(iomap, pos, length, &daddr, NULL); + ret = dax_iomap_direct_access(iomap, copy_pos, copy_len, &daddr, NULL); if (ret < 0) goto out_unlock; - ret = dax_iomap_direct_access(srcmap, pos, length, &saddr, NULL); + ret = dax_iomap_direct_access(srcmap, copy_pos, copy_len, &saddr, NULL); if (ret < 0) goto out_unlock; - if (copy_mc_to_kernel(daddr, saddr, length) == 0) - ret = length; + if (copy_mc_to_kernel(daddr, saddr, copy_len) == 0) + ret = iomap_length(iter); else ret = -EIO;