diff mbox series

xfs: allow changing extsize on file

Message ID 20240508170343.2774-1-wen.gang.wang@oracle.com (mailing list archive)
State New
Headers show
Series xfs: allow changing extsize on file | expand

Commit Message

Wengang Wang May 8, 2024, 5:03 p.m. UTC
Hi Dave, this is more a question than a patch.

We are current disallowing the change of extsize on files/dirs if the file/dir
have blocks allocated. That's not that friendly to users. Say somehow the
extsize was set very huge (1GiB), in the following cases, it's not that
convenient:
case 1: the file now extends very little. -- 1GiB extsize leads a waste of
        almost 1GiB.
case 2: when CoW happens, 1GiB is preallocated. 1GiB is now too big for the
        IO pattern, so the huge preallocting and then reclaiming is not necessary
        and that cost extra time especially when the system if fragmented.

In above cases, changing extsize smaller is needed.

In theory, the exthint is a hint for future allocation, I can't connect it
to the blocks which are already allocated to the file/dir.
So the only reason why we disallow that is that there might be some problems if
we allow it.  Well, can we fix the real problem(s) rather than disallowing
extsize changing?

Signed-off-by: Wengang Wang <wen.gang.wang@oracle.com>
---
 fs/xfs/xfs_ioctl.c | 7 +------
 1 file changed, 1 insertion(+), 6 deletions(-)

Comments

Dave Chinner May 11, 2024, 1:56 a.m. UTC | #1
On Wed, May 08, 2024 at 10:03:43AM -0700, Wengang Wang wrote:
> Hi Dave, this is more a question than a patch.
> 
> We are current disallowing the change of extsize on files/dirs if the file/dir
> have blocks allocated. That's not that friendly to users. Say somehow the
> extsize was set very huge (1GiB), in the following cases, it's not that

The first problem is ensuring that "say somehow extsize was set very
huge" doesn't happen in the first place. Then all the other problems
just don't happen.

> convenient:
> case 1: the file now extends very little. -- 1GiB extsize leads a waste of
>         almost 1GiB.
> case 2: when CoW happens, 1GiB is preallocated. 1GiB is now too big for the
>         IO pattern, so the huge preallocting and then reclaiming is not necessary
>         and that cost extra time especially when the system if fragmented.
> 
> In above cases, changing extsize smaller is needed.
> 
> In theory, the exthint is a hint for future allocation,

It's not that simple because future allocation is influenced by past
allocation. e.g. What happens if the new extent size hint is not
aligned with the old one and we now have two different extent
alignments in the file?

What happens if an admin sees this when trying to triage some
other problem and doesn't know that the extent size hint has been
changed? They'll think there is a bug in the filesystem allocator
and report it.

What do we do with that report now? Do we waste hours trying to
reproduce it and fail, maybe never learning that the an extent
size hint change caused the issue? i.e. how do we determine that the
issue is a real allocation alignment bug versus it simply being a
result of "application did something whacky with extent size hints"?

Hence allowing extent size hints to change dynamically basically
makes it impossible to trust that the current extent size hint
defines the alignment for all the extents in the file. And at that
point, we completely lose the ability to triage allocation alignment
issues without an exact reproducer from the reporter...

Now, just disabling extent size hints avoids this issue (i.e. allow
return to zero if extents already exist) because there's now no
alignment restriction at all and nobody is going to care. However,
this creates new issues.

e.g it opens up the possibility that applications will scan existing
files for extent size hints set on them and be able to -override the
admin set alignment hints- used to create the data set.

The admin may have set inheritable extent size hints to ensure
allocation alignment to underlying storage because the applications
don't know about optimal storage alignments (e.g. for PMD alignment
on DAX storage). We don't want applications to be able to disable
these hints because the precise reason they are set is to optimise
storage alignment for better application performance....

IOWs, there are good reasons for not allowing extent size hints to
be overrridden by applications just by clearing/changing the inode
extent size field...

> I can't connect it
> to the blocks which are already allocated to the file/dir.
> So the only reason why we disallow that is that there might be some problems if
> we allow it.  Well, can we fix the real problem(s) rather than disallowing
> extsize changing?

The only reliable way to change extent size hints so allocation
alignment always matches the new extent size hint is to physically
realign the data in the file to the new extent size hint. i.e. do it
through xfs_fsr to "defrag" the file according to the new extent
size hint. Then when we swap the old and new data extents, we also
set the new extent size hint that matches the new data extents.

This extent size hint change is then enabled through a completely
different interface which is not one applications will use in
general operation. Hence it becomes an explicit admin operation,
enabling users to rectify the rare problems you document above
without compromising the existing behaviour of extent size hints for
everyone else.

Cheers,

Dave.
Wengang Wang May 13, 2024, 5:17 p.m. UTC | #2
Thanks Dave for replying. I will think about it.

Wengang

> On May 10, 2024, at 6:56 PM, Dave Chinner <david@fromorbit.com> wrote:
> 
> On Wed, May 08, 2024 at 10:03:43AM -0700, Wengang Wang wrote:
>> Hi Dave, this is more a question than a patch.
>> 
>> We are current disallowing the change of extsize on files/dirs if the file/dir
>> have blocks allocated. That's not that friendly to users. Say somehow the
>> extsize was set very huge (1GiB), in the following cases, it's not that
> 
> The first problem is ensuring that "say somehow extsize was set very
> huge" doesn't happen in the first place. Then all the other problems
> just don't happen.
> 
>> convenient:
>> case 1: the file now extends very little. -- 1GiB extsize leads a waste of
>>        almost 1GiB.
>> case 2: when CoW happens, 1GiB is preallocated. 1GiB is now too big for the
>>        IO pattern, so the huge preallocting and then reclaiming is not necessary
>>        and that cost extra time especially when the system if fragmented.
>> 
>> In above cases, changing extsize smaller is needed.
>> 
>> In theory, the exthint is a hint for future allocation,
> 
> It's not that simple because future allocation is influenced by past
> allocation. e.g. What happens if the new extent size hint is not
> aligned with the old one and we now have two different extent
> alignments in the file?
> 
> What happens if an admin sees this when trying to triage some
> other problem and doesn't know that the extent size hint has been
> changed? They'll think there is a bug in the filesystem allocator
> and report it.
> 
> What do we do with that report now? Do we waste hours trying to
> reproduce it and fail, maybe never learning that the an extent
> size hint change caused the issue? i.e. how do we determine that the
> issue is a real allocation alignment bug versus it simply being a
> result of "application did something whacky with extent size hints"?
> 
> Hence allowing extent size hints to change dynamically basically
> makes it impossible to trust that the current extent size hint
> defines the alignment for all the extents in the file. And at that
> point, we completely lose the ability to triage allocation alignment
> issues without an exact reproducer from the reporter...
> 
> Now, just disabling extent size hints avoids this issue (i.e. allow
> return to zero if extents already exist) because there's now no
> alignment restriction at all and nobody is going to care. However,
> this creates new issues.
> 
> e.g it opens up the possibility that applications will scan existing
> files for extent size hints set on them and be able to -override the
> admin set alignment hints- used to create the data set.
> 
> The admin may have set inheritable extent size hints to ensure
> allocation alignment to underlying storage because the applications
> don't know about optimal storage alignments (e.g. for PMD alignment
> on DAX storage). We don't want applications to be able to disable
> these hints because the precise reason they are set is to optimise
> storage alignment for better application performance....
> 
> IOWs, there are good reasons for not allowing extent size hints to
> be overrridden by applications just by clearing/changing the inode
> extent size field...
> 
>> I can't connect it
>> to the blocks which are already allocated to the file/dir.
>> So the only reason why we disallow that is that there might be some problems if
>> we allow it.  Well, can we fix the real problem(s) rather than disallowing
>> extsize changing?
> 
> The only reliable way to change extent size hints so allocation
> alignment always matches the new extent size hint is to physically
> realign the data in the file to the new extent size hint. i.e. do it
> through xfs_fsr to "defrag" the file according to the new extent
> size hint. Then when we swap the old and new data extents, we also
> set the new extent size hint that matches the new data extents.
> 
> This extent size hint change is then enabled through a completely
> different interface which is not one applications will use in
> general operation. Hence it becomes an explicit admin operation,
> enabling users to rectify the rare problems you document above
> without compromising the existing behaviour of extent size hints for
> everyone else.
> 
> Cheers,
> 
> Dave.
> -- 
> Dave Chinner
> david@fromorbit.com
kernel test robot May 20, 2024, 4:58 a.m. UTC | #3
hi, Wengang Wang,

we noticed "this is more a question than a patch."
but not sure if below report could supply any useful information to you, so
still send FYI what we observed in our tests.


Hello,

kernel test robot noticed "xfstests.xfs.207.fail" on:

commit: ab25dcd364e632586fe0bd3a2cdb194e03f5b484 ("[PATCH] xfs: allow changing extsize on file")
url: https://github.com/intel-lab-lkp/linux/commits/Wengang-Wang/xfs-allow-changing-extsize-on-file/20240509-010818
base: https://git.kernel.org/cgit/fs/xfs/xfs-linux.git for-next
patch link: https://lore.kernel.org/all/20240508170343.2774-1-wen.gang.wang@oracle.com/
patch subject: [PATCH] xfs: allow changing extsize on file

in testcase: xfstests
version: xfstests-x86_64-b26d68da-1_20240517
with following parameters:

	disk: 4HDD
	fs: xfs
	test: xfs-207



compiler: gcc-13
test machine: 4 threads Intel(R) Xeon(R) CPU E3-1225 v5 @ 3.30GHz (Skylake) with 16G memory

(please refer to attached dmesg/kmsg for entire log/backtrace)




If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <oliver.sang@intel.com>
| Closes: https://lore.kernel.org/oe-lkp/202405201028.ca3b53bf-oliver.sang@intel.com

2024-05-17 16:19:24 export TEST_DIR=/fs/sda1
2024-05-17 16:19:24 export TEST_DEV=/dev/sda1
2024-05-17 16:19:24 export FSTYP=xfs
2024-05-17 16:19:24 export SCRATCH_MNT=/fs/scratch
2024-05-17 16:19:24 mkdir /fs/scratch -p
2024-05-17 16:19:24 export SCRATCH_DEV=/dev/sda4
2024-05-17 16:19:24 export SCRATCH_LOGDEV=/dev/sda2
2024-05-17 16:19:24 export SCRATCH_XFS_LIST_METADATA_FIELDS=u3.sfdir3.hdr.parent.i4
2024-05-17 16:19:24 export SCRATCH_XFS_LIST_FUZZ_VERBS=random
2024-05-17 16:19:24 export MKFS_OPTIONS=-mreflink=1
2024-05-17 16:19:24 echo xfs/207
2024-05-17 16:19:24 ./check xfs/207
FSTYP         -- xfs (debug)
PLATFORM      -- Linux/x86_64 lkp-skl-d06 6.9.0-rc4-00248-gab25dcd364e6 #1 SMP PREEMPT_DYNAMIC Fri May 17 14:27:24 CST 2024
MKFS_OPTIONS  -- -f -mreflink=1 /dev/sda4
MOUNT_OPTIONS -- /dev/sda4 /fs/scratch

xfs/207       - output mismatch (see /lkp/benchmarks/xfstests/results//xfs/207.out.bad)
    --- tests/xfs/207.out	2024-05-17 14:38:40.000000000 +0000
    +++ /lkp/benchmarks/xfstests/results//xfs/207.out.bad	2024-05-17 16:20:09.902112701 +0000
    @@ -3,12 +3,11 @@
     Create the original files
     Set extsz and cowextsz on zero byte file
     Set extsz and cowextsz on 1Mbyte file
    -xfs_io: FS_IOC_FSSETXATTR SCRATCH_MNT/test-207/file2: Invalid argument
     Check extsz and cowextsz settings on zero byte file
     [1048576] SCRATCH_MNT/test-207/file1
     [1048576] SCRATCH_MNT/test-207/file1
    ...
    (Run 'diff -u /lkp/benchmarks/xfstests/tests/xfs/207.out /lkp/benchmarks/xfstests/results//xfs/207.out.bad'  to see the entire diff)
Ran: xfs/207
Failures: xfs/207
Failed 1 of 1 tests




The kernel config and materials to reproduce are available at:
https://download.01.org/0day-ci/archive/20240520/202405201028.ca3b53bf-oliver.sang@intel.com
diff mbox series

Patch

diff --git a/fs/xfs/xfs_ioctl.c b/fs/xfs/xfs_ioctl.c
index d0e2cec6210d..b34992d9932f 100644
--- a/fs/xfs/xfs_ioctl.c
+++ b/fs/xfs/xfs_ioctl.c
@@ -1221,8 +1221,7 @@  xfs_ioctl_setattr_get_trans(
 }
 
 /*
- * Validate a proposed extent size hint.  For regular files, the hint can only
- * be changed if no extents are allocated.
+ * Validate a proposed extent size hint.
  */
 static int
 xfs_ioctl_setattr_check_extsize(
@@ -1236,10 +1235,6 @@  xfs_ioctl_setattr_check_extsize(
 	if (!fa->fsx_valid)
 		return 0;
 
-	if (S_ISREG(VFS_I(ip)->i_mode) && ip->i_df.if_nextents &&
-	    XFS_FSB_TO_B(mp, ip->i_extsize) != fa->fsx_extsize)
-		return -EINVAL;
-
 	if (fa->fsx_extsize & mp->m_blockmask)
 		return -EINVAL;