[RFC,v4,6/8] xfs: correct the truncate blocksize of realtime inode

Message ID 20240529095206.2568162-7-yi.zhang@huaweicloud.com (mailing list archive)
State Superseded, archived
Series iomap/xfs: fix stale data exposure when truncating realtime inodes

Commit Message

Zhang Yi May 29, 2024, 9:52 a.m. UTC
From: Zhang Yi <yi.zhang@huawei.com>

When truncating down a realtime file to an unaligned size on a
filesystem whose sb_rextsize is bigger than one block,
xfs_truncate_page() only zeroes out the tail of the EOF block. This
could expose stale data since commit 943bc0882ceb ("iomap: don't
increase i_size if it's not a write operation").

If we truncate a file that contains a large enough written extent:

     |<    rtext    >|<    rtext    >|
  ...WWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWW
        ^ (new EOF)      ^ old EOF

Since we only zero out the tail of the EOF block, and
xfs_itruncate_extents() only unmaps whole aligned extents, the file ends
up in this state:

     |<    rtext    >|
  ...WWWzWWWWWWWWWWWWW
        ^ new EOF

Then if we do an extending write like this, the blocks left in the
previous tail extent become stale:

     |<    rtext    >|
  ...WWWzSSSSSSSSSSSSS..........WWWWWWWWWWWWWWWWW
        ^ old EOF               ^ append start  ^ new EOF

Fix this by zeroing out the tail allocation unit and also making sure
xfs_itruncate_extents() unmaps rtextsize-aligned extents.

Fixes: 943bc0882ceb ("iomap: don't increase i_size if it's not a write operation")
Reported-by: Chandan Babu R <chandanbabu@kernel.org>
Link: https://lore.kernel.org/linux-xfs/0b92a215-9d9b-3788-4504-a520778953c2@huaweicloud.com
Signed-off-by: Zhang Yi <yi.zhang@huawei.com>
---
 fs/xfs/xfs_inode.c | 3 +++
 fs/xfs/xfs_iops.c  | 2 +-
 2 files changed, 4 insertions(+), 1 deletion(-)
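
The stale range described in the commit message can be modeled with plain
userspace arithmetic. The sketch below is not kernel code: the geometry
(4 KiB blocks, 4 blocks per rt extent) and the names round_up() and
stale_bytes() are assumptions made up for illustration. It assumes, as the
commit message states, that unmapping effectively starts at the next rt
extent boundary, and computes how many bytes stay mapped past the point
where zeroing stopped.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical geometry: 4 KiB blocks, 4 blocks per realtime extent. */
#define BLOCK_SIZE	4096ULL
#define RTEXT_BLOCKS	4ULL
#define ALLOC_UNIT	(BLOCK_SIZE * RTEXT_BLOCKS)

/* Round x up to the next multiple of y. */
static uint64_t round_up(uint64_t x, uint64_t y)
{
	assert(y > 0);
	return (x + y - 1) / y * y;
}

/*
 * Bytes that remain mapped after a truncate to new_size but were never
 * zeroed. zero_gran is the granularity the truncate path zeroes to:
 * BLOCK_SIZE models the old behaviour, ALLOC_UNIT the fixed behaviour.
 */
static uint64_t stale_bytes(uint64_t new_size, uint64_t zero_gran)
{
	uint64_t zero_end = round_up(new_size, zero_gran);
	/* Assumption: only whole rt extents past new_size are unmapped. */
	uint64_t unmap_start = round_up(new_size, ALLOC_UNIT);

	return unmap_start - zero_end;
}
```

With this geometry, stale_bytes(5000, BLOCK_SIZE) is 8192 (two blocks of old
data survive inside the tail rt extent), while stale_bytes(5000, ALLOC_UNIT)
is 0.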

Comments

Christoph Hellwig May 31, 2024, 1:36 p.m. UTC | #1
On Wed, May 29, 2024 at 05:52:04PM +0800, Zhang Yi wrote:
> +	if (xfs_inode_has_bigrtalloc(ip))
> +		first_unmap_block = xfs_rtb_roundup_rtx(mp, first_unmap_block);

Given that first_unmap_block is an xfs_fileoff_t and not an xfs_rtblock_t,
this looks a bit confusing.  I'd suggest just open coding the
arithmetic in xfs_rtb_roundup_rtx.  For future proofing maybe also
use xfs_inode_alloc_unitsize() as in the hunk below instead of hard
coding the rtextsize.  I.e.:

	first_unmap_block = XFS_B_TO_FSB(mp,
		roundup_64(new_size, xfs_inode_alloc_unitsize(ip)));

Zhang Yi June 3, 2024, 2:35 p.m. UTC | #2
On 2024/5/31 21:36, Christoph Hellwig wrote:
> On Wed, May 29, 2024 at 05:52:04PM +0800, Zhang Yi wrote:
>> +	if (xfs_inode_has_bigrtalloc(ip))
>> +		first_unmap_block = xfs_rtb_roundup_rtx(mp, first_unmap_block);
> 
> Given that first_unmap_block is an xfs_fileoff_t and not an xfs_rtblock_t,
> this looks a bit confusing.  I'd suggest just open coding the
> arithmetic in xfs_rtb_roundup_rtx.  For future proofing maybe also
> use xfs_inode_alloc_unitsize() as in the hunk below instead of hard
> coding the rtextsize.  I.e.:
> 
> 	first_unmap_block = XFS_B_TO_FSB(mp,
> 		roundup_64(new_size, xfs_inode_alloc_unitsize(ip)));
> 
Sure, makes sense to me.

Thanks,
Yi.
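
The two forms discussed above (round the block count up to an rt extent
multiple, versus round the byte size up to the allocation unit and then
convert to blocks) should yield the same first_unmap_block whenever the
allocation unit is a whole number of blocks. A minimal userspace sketch of
that equivalence follows. The helpers b_to_fsb() and roundup_u64() are
stand-ins written for this example, modeling XFS_B_TO_FSB() and roundup_64()
as round-up operations; blocksize * rtext_blocks is assumed to be what
xfs_inode_alloc_unitsize() would return for such an inode.

```c
#include <assert.h>
#include <stdint.h>

/* Round x up to the next multiple of y; stand-in for roundup_64(). */
static uint64_t roundup_u64(uint64_t x, uint64_t y)
{
	assert(y > 0);
	return (x + y - 1) / y * y;
}

/* Bytes to blocks, rounding up; stand-in for XFS_B_TO_FSB(). */
static uint64_t b_to_fsb(uint64_t bytes, uint64_t blocksize)
{
	assert(blocksize > 0);
	return (bytes + blocksize - 1) / blocksize;
}

/* Form in the posted patch: convert to blocks, then round up to an rtx. */
static uint64_t first_unmap_posted(uint64_t new_size, uint64_t blocksize,
				   uint64_t rtext_blocks)
{
	return roundup_u64(b_to_fsb(new_size, blocksize), rtext_blocks);
}

/* Suggested form: round bytes up to the allocation unit, then convert. */
static uint64_t first_unmap_suggested(uint64_t new_size, uint64_t blocksize,
				      uint64_t rtext_blocks)
{
	/* Assumed value of the allocation unit, in bytes. */
	uint64_t unit = blocksize * rtext_blocks;

	return b_to_fsb(roundup_u64(new_size, unit), blocksize);
}
```

For a new size of 5000 bytes with 4096-byte blocks and 4 blocks per rt
extent, both helpers return block 4. The suggested form keeps every
intermediate value in bytes or file blocks, so no xfs_rtblock_t helper is
applied to an xfs_fileoff_t.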

Patch

diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c
index 58fb7a5062e1..db35167acef6 100644
--- a/fs/xfs/xfs_inode.c
+++ b/fs/xfs/xfs_inode.c
@@ -35,6 +35,7 @@ 
 #include "xfs_trans_priv.h"
 #include "xfs_log.h"
 #include "xfs_bmap_btree.h"
+#include "xfs_rtbitmap.h"
 #include "xfs_reflink.h"
 #include "xfs_ag.h"
 #include "xfs_log_priv.h"
@@ -1512,6 +1513,8 @@  xfs_itruncate_extents_flags(
 	 * the page cache can't scale that far.
 	 */
 	first_unmap_block = XFS_B_TO_FSB(mp, (xfs_ufsize_t)new_size);
+	if (xfs_inode_has_bigrtalloc(ip))
+		first_unmap_block = xfs_rtb_roundup_rtx(mp, first_unmap_block);
 	if (!xfs_verify_fileoff(mp, first_unmap_block)) {
 		WARN_ON_ONCE(first_unmap_block > XFS_MAX_FILEOFF);
 		return 0;
diff --git a/fs/xfs/xfs_iops.c b/fs/xfs/xfs_iops.c
index d24927075022..ec7b7bdf8825 100644
--- a/fs/xfs/xfs_iops.c
+++ b/fs/xfs/xfs_iops.c
@@ -865,7 +865,7 @@  xfs_setattr_size(
 	 */
 	write_back = newsize > ip->i_disk_size && oldsize != ip->i_disk_size;
 	if (newsize < oldsize) {
-		unsigned int blocksize = i_blocksize(inode);
+		unsigned int blocksize = xfs_inode_alloc_unitsize(ip);
 
 		/*
 		 * Zeroing out the partial EOF block and the rest of the extra