diff mbox series

[17/18] xfs: remove check for block sizes smaller than PAGE_SIZE

Message ID 20230918110510.66470-18-hare@suse.de (mailing list archive)
State New, archived
Headers show
Series block: update buffer_head for Large-block I/O | expand

Commit Message

Hannes Reinecke Sept. 18, 2023, 11:05 a.m. UTC
We now support block sizes larger than PAGE_SIZE, so this
check is pointless.

Signed-off-by: Hannes Reinecke <hare@suse.de>
---
 fs/xfs/xfs_super.c | 12 ------------
 1 file changed, 12 deletions(-)

Comments

Dave Chinner Sept. 20, 2023, 2:13 a.m. UTC | #1
On Mon, Sep 18, 2023 at 01:05:09PM +0200, Hannes Reinecke wrote:
> We now support block sizes larger than PAGE_SIZE, so this
> check is pointless.
> 
> Signed-off-by: Hannes Reinecke <hare@suse.de>
> ---
>  fs/xfs/xfs_super.c | 12 ------------
>  1 file changed, 12 deletions(-)
> 
> diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c
> index 1f77014c6e1a..67dcdd4dcf2d 100644
> --- a/fs/xfs/xfs_super.c
> +++ b/fs/xfs/xfs_super.c
> @@ -1651,18 +1651,6 @@ xfs_fs_fill_super(
>  		goto out_free_sb;
>  	}
>  
> -	/*
> -	 * Until this is fixed only page-sized or smaller data blocks work.
> -	 */
> -	if (mp->m_sb.sb_blocksize > PAGE_SIZE) {
> -		xfs_warn(mp,
> -		"File system with blocksize %d bytes. "
> -		"Only pagesize (%ld) or less will currently work.",
> -				mp->m_sb.sb_blocksize, PAGE_SIZE);
> -		error = -ENOSYS;
> -		goto out_free_sb;
> -	}

This really needs to be replaced with an EXPERIMENTAL warning -
we're not going to support these LBS configurations until we are
sure it doesn't eat data.

Anyway, smoke tests....

# mkfs.xfs -f -b size=64k /dev/pmem0
meta-data=/dev/pmem0             isize=512    agcount=4, agsize=32768 blks
         =                       sectsz=4096  attr=2, projid32bit=1
         =                       crc=1        finobt=1, sparse=1, rmapbt=0
         =                       reflink=1    bigtime=1 inobtcount=1 nrext64=0
data     =                       bsize=65536  blocks=131072, imaxpct=25
         =                       sunit=0      swidth=0 blks
naming   =version 2              bsize=65536  ascii-ci=0, ftype=1
log      =internal log           bsize=65536  blocks=1024, version=2
         =                       sectsz=4096  sunit=1 blks, lazy-count=1
realtime =none                   extsz=65536  blocks=0, rtextents=0
# mount /dev/pmem0 /mnt/test

Message from syslogd@test3 at Sep 20 11:23:32 ...
 kernel:[   73.521819] XFS: Assertion failed: PAGE_SHIFT >= sbp->sb_blocklog, file: fs/xfs/xfs_mount.c, line: 134

Message from syslogd@test3 at Sep 20 11:23:32 ...
 kernel:[   73.521819] XFS: Assertion failed: PAGE_SHIFT >= sbp->sb_blocklog, file: fs/xfs/xfs_mount.c, line: 134
Segmentation fault
#

Looks like this hasn't been tested with CONFIG_XFS_DEBUG=y. If
that's the case, I expect that none of this actually works... :/

I've attached a patch at the end of the email that allows XFS
filesystems to mount with debug enabled.

Next problem, filesystem created with a 32kB sector size:

# mkfs.xfs -f -b size=64k -s size=32k /dev/pmem0
meta-data=/dev/pmem0             isize=512    agcount=4, agsize=32768 blks
         =                       sectsz=32768 attr=2, projid32bit=1
         =                       crc=1        finobt=1, sparse=1, rmapbt=1
         =                       reflink=1    bigtime=1 inobtcount=1 nrext64=0
data     =                       bsize=65536  blocks=131072, imaxpct=25
         =                       sunit=0      swidth=0 blks
naming   =version 2              bsize=65536  ascii-ci=0, ftype=1
log      =internal log           bsize=65536  blocks=2709, version=2
         =                       sectsz=32768 sunit=1 blks, lazy-count=1
realtime =none                   extsz=65536  blocks=0, rtextents=0
#

and then running xfs_db on it to change the UUID:

# xfs_admin -U generate /dev/pmem0

Results in a kernel panic:

[  132.151886] XFS (pmem0): Mounting V5 Filesystem 3d96f860-2aa2-4e50-970c-134508b7954a
[  132.161673] XFS (pmem0): Ending clean mount
[  175.824015] XFS (pmem0): Unmounting Filesystem 3d96f860-2aa2-4e50-970c-134508b7954a
[  185.759251] Kernel panic - not syncing: stack-protector: Kernel stack is corrupted in: do_mpage_readpage+0x7e5/0x7f0
[  185.766632] CPU: 1 PID: 4383 Comm: xfs_db Not tainted 6.6.0-rc2-dgc+ #1903
[  185.771882] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
[  185.778827] Call Trace:
[  185.780706]  <TASK>
[  185.782318]  dump_stack_lvl+0x37/0x50
[  185.785069]  dump_stack+0x10/0x20
[  185.787557]  panic+0x15b/0x300
[  185.789763]  ? do_mpage_readpage+0x7e5/0x7f0
[  185.792705]  __stack_chk_fail+0x14/0x20
[  185.795183]  do_mpage_readpage+0x7e5/0x7f0
[  185.797826]  ? blkdev_write_begin+0x30/0x30
[  185.800500]  ? blkdev_readahead+0x15/0x20
[  185.802894]  ? read_pages+0x5c/0x230
[  185.805023]  ? page_cache_ra_order+0x2ae/0x310
[  185.807538]  ? ondemand_readahead+0x1f1/0x3a0
[  185.809899]  ? page_cache_async_ra+0x26/0x30
[  185.812175]  ? filemap_get_pages+0x540/0x6d0
[  185.814327]  ? _copy_to_iter+0x65/0x4c0
[  185.816283]  ? filemap_read+0xfc/0x3a0
[  185.818086]  ? __fsnotify_parent+0x107/0x340
[  185.820142]  ? __might_sleep+0x42/0x70
[  185.821854]  ? blkdev_read_iter+0x6d/0x150
[  185.823697]  ? vfs_read+0x1b1/0x300
[  185.825307]  ? __x64_sys_pread64+0x8f/0xc0
[  185.827159]  ? irqentry_exit+0x33/0x40
[  185.828858]  ? do_syscall_64+0x35/0x80
[  185.830485]  ? entry_SYSCALL_64_after_hwframe+0x63/0xcd
[  185.832677]  </TASK>
[  185.834068] Kernel Offset: disabled
[  185.835510] ---[ end Kernel panic - not syncing: stack-protector: Kernel stack is corrupted in: do_mpage_readpage+0x7e5/0x7f0 ]---

Probably related to the block device having a 32kB block size set on
the pmem device by xfs_db when it only really has a 4kB sector size....

Anyway, back to just using 64k fsb, 4k sector.  generic/001 ASSERT
fails immediately with:

[  111.785796] run fstests generic/001 at 2023-09-20 11:50:19
[  113.346797] XFS: Assertion failed: imap.br_startblock != DELAYSTARTBLOCK, file: fs/xfs/xfs_reflink.c, line: 1392
[  113.352512] ------------[ cut here ]------------
[  113.354793] kernel BUG at fs/xfs/xfs_message.c:102!
[  113.358444] invalid opcode: 0000 [#1] PREEMPT SMP
[  113.360769] CPU: 8 PID: 7581 Comm: cp Not tainted 6.6.0-rc2-dgc+ #1903
[  113.364183] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
[  113.369784] RIP: 0010:assfail+0x35/0x40
[  113.372038] Code: c9 48 c7 c2 00 d8 6c 82 48 89 e5 48 89 f1 48 89 fe 48 c7 c7 d8 d3 60 82 e8 a8 fd ff ff 80 3d d1 36 ec 02 00 75 04 0f 0b 5d c3 <0f> 0b 66 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 55 48 63 f6 49 89
[  113.384178] RSP: 0018:ffffc9000962bca0 EFLAGS: 00010202
[  113.387467] RAX: 0000000000000000 RBX: 0000000000000001 RCX: 000000007fffffff
[  113.392326] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffffffff8260d3d8
[  113.397235] RBP: ffffc9000962bca0 R08: 0000000000000000 R09: 000000000000000a
[  113.401449] R10: 000000000000000a R11: 0fffffffffffffff R12: 0000000000000000
[  113.406312] R13: 00000000ffffff8b R14: 0000000000000000 R15: ffff88810de00f00
[  113.410562] FS:  00007f4f6219b500(0000) GS:ffff8885fec00000(0000) knlGS:0000000000000000
[  113.415506] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  113.418688] CR2: 0000564951bfcf50 CR3: 00000005cd6b8005 CR4: 0000000000060ee0
[  113.422825] Call Trace:
[  113.424302]  <TASK>
[  113.425566]  ? show_regs+0x61/0x70
[  113.427444]  ? die+0x37/0x90
[  113.429175]  ? do_trap+0xec/0x100
[  113.431016]  ? do_error_trap+0x6c/0x90
[  113.432951]  ? assfail+0x35/0x40
[  113.434677]  ? exc_invalid_op+0x52/0x70
[  113.436755]  ? assfail+0x35/0x40
[  113.438659]  ? asm_exc_invalid_op+0x1b/0x20
[  113.440864]  ? assfail+0x35/0x40
[  113.442671]  xfs_reflink_remap_blocks+0x197/0x350
[  113.445278]  xfs_file_remap_range+0xf3/0x320
[  113.447504]  do_clone_file_range+0xfe/0x2b0
[  113.449689]  vfs_clone_file_range+0x3f/0x150
[  113.452080]  ioctl_file_clone+0x52/0xa0
[  113.453600]  do_vfs_ioctl+0x485/0x8d0
[  113.455054]  ? selinux_file_ioctl+0x96/0x120
[  113.456637]  ? selinux_file_ioctl+0x96/0x120
[  113.458213]  __x64_sys_ioctl+0x73/0xd0
[  113.459598]  do_syscall_64+0x35/0x80
[  113.460840]  entry_SYSCALL_64_after_hwframe+0x63/0xcd

Problems with unexpected extent types in the reflink remap code.

This is due to the reflink operation finding a delalloc extent where
it should be finding a real extent. this implies the
xfs_flush_unmap_range() call in xfs_reflink_remap_prep() didn't
flush the full data range it was supposed to.
xfs_flush_unmap_range() is supposed to round the range out to:

	rounding = max_t(xfs_off_t, mp->m_sb.sb_blocksize, PAGE_SIZE);

the block size for these cases, so it is LBS aware. So maybe there's
a problem with filemap_write_and_wait_range() and/or
truncate_pagecache_range() when dealing with LBS enabled page cache?

So, yeah, debug XFS builds tell us straight away that important stuff
does not appear to be not working correctly. Debug really needs to
be enabled, otherwise silent data corruption situations like this
can go unnoticed by fstests tests...

-Dave
diff mbox series

Patch

diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c
index 1f77014c6e1a..67dcdd4dcf2d 100644
--- a/fs/xfs/xfs_super.c
+++ b/fs/xfs/xfs_super.c
@@ -1651,18 +1651,6 @@  xfs_fs_fill_super(
 		goto out_free_sb;
 	}
 
-	/*
-	 * Until this is fixed only page-sized or smaller data blocks work.
-	 */
-	if (mp->m_sb.sb_blocksize > PAGE_SIZE) {
-		xfs_warn(mp,
-		"File system with blocksize %d bytes. "
-		"Only pagesize (%ld) or less will currently work.",
-				mp->m_sb.sb_blocksize, PAGE_SIZE);
-		error = -ENOSYS;
-		goto out_free_sb;
-	}
-
 	/* Ensure this filesystem fits in the page cache limits */
 	if (xfs_sb_validate_fsb_count(&mp->m_sb, mp->m_sb.sb_dblocks) ||
 	    xfs_sb_validate_fsb_count(&mp->m_sb, mp->m_sb.sb_rblocks)) {