Message ID | 20240508170343.2774-1-wen.gang.wang@oracle.com (mailing list archive) |
---|---|
State | New |
Headers | show |
Series | xfs: allow changing extsize on file | expand |
On Wed, May 08, 2024 at 10:03:43AM -0700, Wengang Wang wrote: > Hi Dave, this is more a question than a patch. > > We are current disallowing the change of extsize on files/dirs if the file/dir > have blocks allocated. That's not that friendly to users. Say somehow the > extsize was set very huge (1GiB), in the following cases, it's not that The first problem is ensuring that "say somehow extsize was set very huge" doesn't happen in the first place. Then all the other problems just don't happen. > convenient: > case 1: the file now extends very little. -- 1GiB extsize leads a waste of > almost 1GiB. > case 2: when CoW happens, 1GiB is preallocated. 1GiB is now too big for the > IO pattern, so the huge preallocting and then reclaiming is not necessary > and that cost extra time especially when the system if fragmented. > > In above cases, changing extsize smaller is needed. > > In theory, the exthint is a hint for future allocation, It's not that simple because future allocation is influenced by past allocation. e.g. What happens if the new extent size hint is not aligned with the old one and we now have two different extent alignments in the file? What happens if an admin sees this when trying to triage some other problem and doesn't know that the extent size hint has been changed? They'll think there is a bug in the filesystem allocator and report it. What do we do with that report now? Do we waste hours trying to reproduce it and fail, maybe never learning that the an extent size hint change caused the issue? i.e. how do we determine that the issue is a real allocation alignment bug versus it simply being a result of "application did something whacky with extent size hints"? Hence allowing extent size hints to change dynamically basically makes it impossible to trust that the current extent size hint defines the alignment for all the extents in the file. And at that point, we completely lose the ability to triage allocation alignment issues without an exact reproducer from the reporter... Now, just disabling extent size hints avoids this issue (i.e. allow return to zero if extents already exist) because there's now no alignment restriction at all and nobody is going to care. However, this creates new issues. e.g it opens up the possibility that applications will scan existing files for extent size hints set on them and be able to -override the admin set alignment hints- used to create the data set. The admin may have set inheritable extent size hints to ensure allocation alignment to underlying storage because the applications don't know about optimal storage alignments (e.g. for PMD alignment on DAX storage). We don't want applications to be able to disable these hints because the precise reason they are set is to optimise storage alignment for better application performance.... IOWs, there are good reasons for not allowing extent size hints to be overrridden by applications just by clearing/changing the inode extent size field... > I can't connect it > to the blocks which are already allocated to the file/dir. > So the only reason why we disallow that is that there might be some problems if > we allow it. Well, can we fix the real problem(s) rather than disallowing > extsize changing? The only reliable way to change extent size hints so allocation alignment always matches the new extent size hint is to physically realign the data in the file to the new extent size hint. i.e. do it through xfs_fsr to "defrag" the file according to the new extent size hint. Then when we swap the old and new data extents, we also set the new extent size hint that matches the new data extents. This extent size hint change is then enabled through a completely different interface which is not one applications will use in general operation. Hence it becomes an explicit admin operation, enabling users to rectify the rare problems you document above without compromising the existing behaviour of extent size hints for everyone else. Cheers, Dave.
Thanks Dave for replying. I will think about it. Wengang > On May 10, 2024, at 6:56 PM, Dave Chinner <david@fromorbit.com> wrote: > > On Wed, May 08, 2024 at 10:03:43AM -0700, Wengang Wang wrote: >> Hi Dave, this is more a question than a patch. >> >> We are current disallowing the change of extsize on files/dirs if the file/dir >> have blocks allocated. That's not that friendly to users. Say somehow the >> extsize was set very huge (1GiB), in the following cases, it's not that > > The first problem is ensuring that "say somehow extsize was set very > huge" doesn't happen in the first place. Then all the other problems > just don't happen. > >> convenient: >> case 1: the file now extends very little. -- 1GiB extsize leads a waste of >> almost 1GiB. >> case 2: when CoW happens, 1GiB is preallocated. 1GiB is now too big for the >> IO pattern, so the huge preallocting and then reclaiming is not necessary >> and that cost extra time especially when the system if fragmented. >> >> In above cases, changing extsize smaller is needed. >> >> In theory, the exthint is a hint for future allocation, > > It's not that simple because future allocation is influenced by past > allocation. e.g. What happens if the new extent size hint is not > aligned with the old one and we now have two different extent > alignments in the file? > > What happens if an admin sees this when trying to triage some > other problem and doesn't know that the extent size hint has been > changed? They'll think there is a bug in the filesystem allocator > and report it. > > What do we do with that report now? Do we waste hours trying to > reproduce it and fail, maybe never learning that the an extent > size hint change caused the issue? i.e. how do we determine that the > issue is a real allocation alignment bug versus it simply being a > result of "application did something whacky with extent size hints"? > > Hence allowing extent size hints to change dynamically basically > makes it impossible to trust that the current extent size hint > defines the alignment for all the extents in the file. And at that > point, we completely lose the ability to triage allocation alignment > issues without an exact reproducer from the reporter... > > Now, just disabling extent size hints avoids this issue (i.e. allow > return to zero if extents already exist) because there's now no > alignment restriction at all and nobody is going to care. However, > this creates new issues. > > e.g it opens up the possibility that applications will scan existing > files for extent size hints set on them and be able to -override the > admin set alignment hints- used to create the data set. > > The admin may have set inheritable extent size hints to ensure > allocation alignment to underlying storage because the applications > don't know about optimal storage alignments (e.g. for PMD alignment > on DAX storage). We don't want applications to be able to disable > these hints because the precise reason they are set is to optimise > storage alignment for better application performance.... > > IOWs, there are good reasons for not allowing extent size hints to > be overrridden by applications just by clearing/changing the inode > extent size field... > >> I can't connect it >> to the blocks which are already allocated to the file/dir. >> So the only reason why we disallow that is that there might be some problems if >> we allow it. Well, can we fix the real problem(s) rather than disallowing >> extsize changing? > > The only reliable way to change extent size hints so allocation > alignment always matches the new extent size hint is to physically > realign the data in the file to the new extent size hint. i.e. do it > through xfs_fsr to "defrag" the file according to the new extent > size hint. Then when we swap the old and new data extents, we also > set the new extent size hint that matches the new data extents. > > This extent size hint change is then enabled through a completely > different interface which is not one applications will use in > general operation. Hence it becomes an explicit admin operation, > enabling users to rectify the rare problems you document above > without compromising the existing behaviour of extent size hints for > everyone else. > > Cheers, > > Dave. > -- > Dave Chinner > david@fromorbit.com
hi, Wengang Wang, we noticed "this is more a question than a patch." but not sure if below report could supply any useful information to you, so still send FYI what we observed in our tests. Hello, kernel test robot noticed "xfstests.xfs.207.fail" on: commit: ab25dcd364e632586fe0bd3a2cdb194e03f5b484 ("[PATCH] xfs: allow changing extsize on file") url: https://github.com/intel-lab-lkp/linux/commits/Wengang-Wang/xfs-allow-changing-extsize-on-file/20240509-010818 base: https://git.kernel.org/cgit/fs/xfs/xfs-linux.git for-next patch link: https://lore.kernel.org/all/20240508170343.2774-1-wen.gang.wang@oracle.com/ patch subject: [PATCH] xfs: allow changing extsize on file in testcase: xfstests version: xfstests-x86_64-b26d68da-1_20240517 with following parameters: disk: 4HDD fs: xfs test: xfs-207 compiler: gcc-13 test machine: 4 threads Intel(R) Xeon(R) CPU E3-1225 v5 @ 3.30GHz (Skylake) with 16G memory (please refer to attached dmesg/kmsg for entire log/backtrace) If you fix the issue in a separate patch/commit (i.e. not just a new version of the same patch/commit), kindly add following tags | Reported-by: kernel test robot <oliver.sang@intel.com> | Closes: https://lore.kernel.org/oe-lkp/202405201028.ca3b53bf-oliver.sang@intel.com 2024-05-17 16:19:24 export TEST_DIR=/fs/sda1 2024-05-17 16:19:24 export TEST_DEV=/dev/sda1 2024-05-17 16:19:24 export FSTYP=xfs 2024-05-17 16:19:24 export SCRATCH_MNT=/fs/scratch 2024-05-17 16:19:24 mkdir /fs/scratch -p 2024-05-17 16:19:24 export SCRATCH_DEV=/dev/sda4 2024-05-17 16:19:24 export SCRATCH_LOGDEV=/dev/sda2 2024-05-17 16:19:24 export SCRATCH_XFS_LIST_METADATA_FIELDS=u3.sfdir3.hdr.parent.i4 2024-05-17 16:19:24 export SCRATCH_XFS_LIST_FUZZ_VERBS=random 2024-05-17 16:19:24 export MKFS_OPTIONS=-mreflink=1 2024-05-17 16:19:24 echo xfs/207 2024-05-17 16:19:24 ./check xfs/207 FSTYP -- xfs (debug) PLATFORM -- Linux/x86_64 lkp-skl-d06 6.9.0-rc4-00248-gab25dcd364e6 #1 SMP PREEMPT_DYNAMIC Fri May 17 14:27:24 CST 2024 MKFS_OPTIONS -- -f -mreflink=1 /dev/sda4 MOUNT_OPTIONS -- /dev/sda4 /fs/scratch xfs/207 - output mismatch (see /lkp/benchmarks/xfstests/results//xfs/207.out.bad) --- tests/xfs/207.out 2024-05-17 14:38:40.000000000 +0000 +++ /lkp/benchmarks/xfstests/results//xfs/207.out.bad 2024-05-17 16:20:09.902112701 +0000 @@ -3,12 +3,11 @@ Create the original files Set extsz and cowextsz on zero byte file Set extsz and cowextsz on 1Mbyte file -xfs_io: FS_IOC_FSSETXATTR SCRATCH_MNT/test-207/file2: Invalid argument Check extsz and cowextsz settings on zero byte file [1048576] SCRATCH_MNT/test-207/file1 [1048576] SCRATCH_MNT/test-207/file1 ... (Run 'diff -u /lkp/benchmarks/xfstests/tests/xfs/207.out /lkp/benchmarks/xfstests/results//xfs/207.out.bad' to see the entire diff) Ran: xfs/207 Failures: xfs/207 Failed 1 of 1 tests The kernel config and materials to reproduce are available at: https://download.01.org/0day-ci/archive/20240520/202405201028.ca3b53bf-oliver.sang@intel.com
diff --git a/fs/xfs/xfs_ioctl.c b/fs/xfs/xfs_ioctl.c index d0e2cec6210d..b34992d9932f 100644 --- a/fs/xfs/xfs_ioctl.c +++ b/fs/xfs/xfs_ioctl.c @@ -1221,8 +1221,7 @@ xfs_ioctl_setattr_get_trans( } /* - * Validate a proposed extent size hint. For regular files, the hint can only - * be changed if no extents are allocated. + * Validate a proposed extent size hint. */ static int xfs_ioctl_setattr_check_extsize( @@ -1236,10 +1235,6 @@ xfs_ioctl_setattr_check_extsize( if (!fa->fsx_valid) return 0; - if (S_ISREG(VFS_I(ip)->i_mode) && ip->i_df.if_nextents && - XFS_FSB_TO_B(mp, ip->i_extsize) != fa->fsx_extsize) - return -EINVAL; - if (fa->fsx_extsize & mp->m_blockmask) return -EINVAL;
Hi Dave, this is more a question than a patch. We are current disallowing the change of extsize on files/dirs if the file/dir have blocks allocated. That's not that friendly to users. Say somehow the extsize was set very huge (1GiB), in the following cases, it's not that convenient: case 1: the file now extends very little. -- 1GiB extsize leads a waste of almost 1GiB. case 2: when CoW happens, 1GiB is preallocated. 1GiB is now too big for the IO pattern, so the huge preallocting and then reclaiming is not necessary and that cost extra time especially when the system if fragmented. In above cases, changing extsize smaller is needed. In theory, the exthint is a hint for future allocation, I can't connect it to the blocks which are already allocated to the file/dir. So the only reason why we disallow that is that there might be some problems if we allow it. Well, can we fix the real problem(s) rather than disallowing extsize changing? Signed-off-by: Wengang Wang <wen.gang.wang@oracle.com> --- fs/xfs/xfs_ioctl.c | 7 +------ 1 file changed, 1 insertion(+), 6 deletions(-)