Message ID | 20170614071133.GD31671@ZenIV.linux.org.uk (mailing list archive) |
---|---|
State | New, archived |
On Wed, 14 Jun 2017, Al Viro wrote:

...
> AFAICS, a conservative approach would be
> 	* reject UFS_42POSTBLFMT for 44bsd ones - it's almost certainly
> *not* one.
> 	* check if fs_maxbsize is equal to frag size; treat that as
> "counts are read from new location and stored both to old and new".
> 44bsd fs_maxbsize != block size => not converted, just use old locations
> for everything. UFS2 => use new locations for everything, don't bother
> with old ones. IOW, something like this (WARNING: completely untested,
> might screw your filesystem) might do.
>
> NOTE: all I have is your image *after* it had counters buggered; I don't
> know the exact sequence of operations that fucked it in your case. One
> way to trigger it is to mount/umount on OpenBSD, then mount/modify/umount
> on Linux, then mount/umount on OpenBSD, then fsck on OpenBSD. This patch
> apparently fixes that, but your reproducer might be something different.
>
> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
> ---
> diff --git a/fs/ufs/super.c b/fs/ufs/super.c
> index d9aa2627c9df..eca838a8b43e 100644
> --- a/fs/ufs/super.c
> +++ b/fs/ufs/super.c
> @@ -480,7 +480,7 @@ static void ufs_setup_cstotal(struct super_block *sb)
>  	usb3 = ubh_get_usb_third(uspi);
>
>  	if ((mtype == UFS_MOUNT_UFSTYPE_44BSD &&
> -	     (usb1->fs_flags & UFS_FLAGS_UPDATED)) ||
> +	     (usb2->fs_un.fs_u2.fs_maxbsize == usb1->fs_bsize)) ||
>  	    mtype == UFS_MOUNT_UFSTYPE_UFS2) {
>  		/*we have statistic in different place, then usual*/
>  		uspi->cs_total.cs_ndir = fs64_to_cpu(sb, usb2->fs_un.fs_u2.cs_ndir);
> @@ -596,9 +596,7 @@ static void ufs_put_cstotal(struct super_block *sb)
>  	usb2 = ubh_get_usb_second(uspi);
>  	usb3 = ubh_get_usb_third(uspi);
>
> -	if ((mtype == UFS_MOUNT_UFSTYPE_44BSD &&
> -	     (usb1->fs_flags & UFS_FLAGS_UPDATED)) ||
> -	    mtype == UFS_MOUNT_UFSTYPE_UFS2) {
> +	if (mtype == UFS_MOUNT_UFSTYPE_UFS2) {
>  		/*we have statistic in different place, then usual*/
>  		usb2->fs_un.fs_u2.cs_ndir =
>  			cpu_to_fs64(sb, uspi->cs_total.cs_ndir);
> @@ -608,16 +606,26 @@ static void ufs_put_cstotal(struct super_block *sb)
>  			cpu_to_fs64(sb, uspi->cs_total.cs_nifree);
>  		usb3->fs_un1.fs_u2.cs_nffree =
>  			cpu_to_fs64(sb, uspi->cs_total.cs_nffree);
> -	} else {
> -		usb1->fs_cstotal.cs_ndir =
> -			cpu_to_fs32(sb, uspi->cs_total.cs_ndir);
> -		usb1->fs_cstotal.cs_nbfree =
> -			cpu_to_fs32(sb, uspi->cs_total.cs_nbfree);
> -		usb1->fs_cstotal.cs_nifree =
> -			cpu_to_fs32(sb, uspi->cs_total.cs_nifree);
> -		usb1->fs_cstotal.cs_nffree =
> -			cpu_to_fs32(sb, uspi->cs_total.cs_nffree);
> +		goto out;
>  	}
> +
> +	if (mtype == UFS_MOUNT_UFSTYPE_44BSD &&
> +	    (usb2->fs_un.fs_u2.fs_maxbsize == usb1->fs_bsize)) {
> +		/* store stats in both old and new places */
> +		usb2->fs_un.fs_u2.cs_ndir =
> +			cpu_to_fs64(sb, uspi->cs_total.cs_ndir);
> +		usb2->fs_un.fs_u2.cs_nbfree =
> +			cpu_to_fs64(sb, uspi->cs_total.cs_nbfree);
> +		usb3->fs_un1.fs_u2.cs_nifree =
> +			cpu_to_fs64(sb, uspi->cs_total.cs_nifree);
> +		usb3->fs_un1.fs_u2.cs_nffree =
> +			cpu_to_fs64(sb, uspi->cs_total.cs_nffree);
> +	}
> +	usb1->fs_cstotal.cs_ndir = cpu_to_fs32(sb, uspi->cs_total.cs_ndir);
> +	usb1->fs_cstotal.cs_nbfree = cpu_to_fs32(sb, uspi->cs_total.cs_nbfree);
> +	usb1->fs_cstotal.cs_nifree = cpu_to_fs32(sb, uspi->cs_total.cs_nifree);
> +	usb1->fs_cstotal.cs_nffree = cpu_to_fs32(sb, uspi->cs_total.cs_nffree);
> +out:
>  	ubh_mark_buffer_dirty(USPI_UBH(uspi));
>  	ufs_print_super_stuff(sb, usb1, usb2, usb3);
>  	UFSD("EXIT\n");
> @@ -997,6 +1005,13 @@ static int ufs_fill_super(struct super_block *sb, void *data, int silent)
>  		flags |= UFS_ST_SUN;
>  	}
>
> +	if ((flags & UFS_ST_MASK) == UFS_ST_44BSD &&
> +	    uspi->s_postblformat == UFS_42POSTBLFMT) {
> +		if (!silent)
> +			pr_err("this is not a 44bsd filesystem");
> +		goto failed;
> +	}
> +
>  	/*
>  	 * Check ufs magic number
>  	 */
>

Al, this patch looks good to me (so far).
I tested all 6 combinations of ufs1 and ufs2 in FreeBSD 11.0, OpenBSD 6.1
and NetBSD 7.1.

For each combination, I do 5 steps:

1) BSD: Make a ufs filesystem
   dd if=/dev/zero to the BSD subpartition
   make (newfs) a ufs (1 or 2) filesystem on the BSD subpartition
2) Linux: Create a subdirectory and make a large file
   mkdir a
   dd if=/dev/zero bs=1M count=3072
3) BSD: Check a ufs filesystem
   fsck -f
4) Linux: Remove the large file and the subdirectory
   rm
   rmdir
5) BSD: Check a ufs filesystem
   fsck -f

Tested-By: Richard Narron <comet.berkeley@gmail.com>
On Wed, Jun 14, 2017 at 08:11:33AM +0100, Al Viro wrote:

> NOTE: all I have is your image *after* it had counters buggered; I don't
> know the exact sequence of operations that fucked it in your case. One
> way to trigger it is to mount/umount on OpenBSD, then mount/modify/umount
> on Linux, then mount/umount on OpenBSD, then fsck on OpenBSD. This patch
> apparently fixes that, but your reproducer might be something different.

FWIW, it seems to work here. Said that, *BSD fsck_ffs is not worth much -
play a bit with redundancy in UFS superblock (starting with fragment
and block sizes, their ratio, logarithms, bitmasks, etc.) and you can
screw at least 10.3 into the ground when mounting an image that passes
their fsck. Sure, anyone who mounts untrusted images is a cretin who
deserves everything they get, fsck or no fsck, but... no complaints from
fsck is not a reliable indicator of image being in good condition and
that's PITA for testing.

Another pile of fun: "reserve ->s_minfree percents of total" logics had
been broken.
	* using hardwired 5% is wrong - especially for ufs2, where it's
not even the default
	* ufs_freespace() returns u64; testing for <= 0 is not doing the
right thing
	* no capability checks before we need them, TYVM...
	* ufs2 needs 64bit uspi->s_dsize (and ->s_size, while we are at it).
64bit variants were even calculated - and never used.
	* while we are at it, doing "multiply the total data frags by
s_minfree and divide by 100" every time we allocate a block is bloody
dumb - that should be calculated once.

We really need to get the sodding tail unpacking moved up from the place
where it's buried - turns out that my doubts about that code managing to
avoid deadlocks had been correct. Long-term we need to move that thing
to iomap-based ->write_iter() and do unpacking there and in truncate().
For now I've slapped together something that is easier to backport -
avoiding ->truncate_mutex when possible and not holding ->s_lock over
ufs_change_blocknr().
Another bug in the same area: ufs_get_locked_page() doesn't guarantee
that buffer_heads are attached (race with vmscan trying to evict the
page in question can end with buffer_heads freed and page left alive
and uptodate). Callers do expect buffer_heads to be there, so we either
need to do create_empty_buffers() in those callers or in ufs_get_locked_page();
I went for the latter for now.

Off-by-one in ufs_truncate_blocks(): the logics when deciding whether
we need to do anything with direct blocks is broken when new size is
within the last direct block. It's better to find the path to the
last byte _not_ to be removed and use that instead of the path to the
beginning of the first block to be freed.

I've pushed fixes for those into vfs.git#ufs-fixes; they do need more
testing before I send a pull request, though.
On Thu, 15 Jun 2017, Al Viro wrote:

> On Wed, Jun 14, 2017 at 08:11:33AM +0100, Al Viro wrote:
> ...
> I've pushed fixes for those into vfs.git#ufs-fixes; they do need more
> testing before I send a pull request, though.

The 8 patches in the ufs-fixes group were applied to Linux 4.12-rc5.
They seem to work fine with the simple testing that I do.

I tested all 3 BSDs, FreeBSD 11.0, OpenBSD 6.1 and NetBSD 7.1 using 2
filesystems, 44bsd (ufs1) and ufs2.
I found no errors doing a Linux mkdir, copy large file, BSD fsck, Linux rm
large file, rmdir and BSD fsck in any of the 6 combinations.

Running "df" on BSD and on Linux now shows matching counts, including the
"Available" counts.

It might be worth testing with ufs filesystems using softdep and/or
journaling. Should the Linux mount command reject such filesystems?

Now that ufs write access is working more or less, we're dangerous.
On Sun, 18 Jun 2017, Al Viro wrote:

> On Sat, Jun 17, 2017 at 03:15:48AM +0100, Al Viro wrote:
>> On Fri, Jun 16, 2017 at 07:29:00AM -0700, Richard Narron wrote:
>>
>>> The 8 patches in the ufs-fixes group were applied to Linux 4.12-rc5.
>>> They seem to work fine with the simple testing that I do.
>>>
>>> I tested all 3 BSDs, FreeBSD 11.0, OpenBSD 6.1 and NetBSD 7.1 using 2
>>> filesystems, 44bsd (ufs1) and ufs2.
>>> I found no errors doing a Linux mkdir, copy large file, BSD fsck, Linux rm
>>> large file, rmdir and BSD fsck in any of the 6 combinations.
>>
>> FWIW, with xfstests I see the following:
>> 	* _require_metadata_journaling needs to be told not to bother with
>> our ufs
>> 	* generic/{409,410,411} lack $MOUNT_OPTIONS in several invocations
>> of mount -t $FSTYP; for ufs it's obviously a problem. Trivial bug in tests,
>> fixing it makes them pass.
>> 	* generic/258 (timestamp wraparound) fails; fs is left intact
>
> Trivially fixed (cast to (signed) in ufs1_read_inode(), similar to what
> other filesystems with 32bit timestamps are doing); ufs2 has no problem
> at all)
>
>> 	* generic/426 (fhandle stuff) fails and buggers the filesystem
>> Everything else passes with no fs corruption that could be detected by
>> fsck.ufs.
>
> Also trivially fixed - it's a self-inflicted wound. Just have zero nlink in
> ufs{1,2}_read_inode() fail with -ESTALE instead of triggering ufs_error().
>
>> As for my immediate plans, I'll look into the two failing tests,
>> but any further active work on ufs will have to wait for the next
>> cycle. It had been a fun couple of weeks, but I have more than
>> enough other stuff to deal with. And I would still very much prefer
>> for somebody to adopt that puppy.
>
> Another piece of fun spotted: the logics for switching between two allocation
> policies when relocating a packed tail that can't be expanded in place had
> been b0rken since typo in 2.4.14.7 - switch back from OPTTIME to OPTSPACE
> had been screwed by this:
> - usb1->fs_optim = SWAB32(UFS_OPTSPACE);
> + usb1->fs_optim = cpu_to_fs32(sb, UFS_OPTTIME);
>
> And fragmentation levels for switching back and forth really ought to be
> calculated at mount time. Another (minor) issue is mentioned in this
> commit message from Kirk McKusick back in 1995:
> 	The threshold for switching from time-space and space-time is too small
> 	when minfree is 5%...so make it stay at space in this case.
> Not that minfree at 5% had been frequently seen - default has never been that
> low (back in 4.2BSD it was 10%, these days it's 8%)
>
> Resulting kernel passes xfstests clean and now I'm definitely done with UFS for
> this cycle. Linus, in case you want to pull that sucker, pull request would
> be as below:
>
> The following changes since commit a8fad984833832d5ca11a9ed64ddc55646da30e3:
>
>   ufs_truncate_blocks(): fix the case when size is in the last direct block (2017-06-15 03:57:46 -0400)
>
> are available in the git repository at:
>
>   git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs.git ufs-fixes
>
> for you to fetch changes up to 77e9ce327d9b607cd6e57c0f4524a654dc59c4b1:
>
>   ufs: fix the logics for tail relocation (2017-06-17 17:22:42 -0400)
>
> ----------------------------------------------------------------
> Al Viro (3):
>       fix signedness of timestamps on ufs1
>       ufs_iget(): fail with -ESTALE on deleted inode
>       ufs: fix the logics for tail relocation
>
>  fs/ufs/balloc.c | 22 ++++++----------------------
>  fs/ufs/inode.c  | 27 +++++++++++----------------
>  fs/ufs/super.c  |  9 +++++++++
>  fs/ufs/ufs_fs.h |  2 ++
>  4 files changed, 28 insertions(+), 32 deletions(-)
>

I just tested these 3 patches along with the earlier 8 patches against
Linux 4.12-rc5 and they look fine.

All 6 test cases look good.

The ufs code is much better now than it was before all these patches.
diff --git a/fs/ufs/super.c b/fs/ufs/super.c
index d9aa2627c9df..eca838a8b43e 100644
--- a/fs/ufs/super.c
+++ b/fs/ufs/super.c
@@ -480,7 +480,7 @@ static void ufs_setup_cstotal(struct super_block *sb)
 	usb3 = ubh_get_usb_third(uspi);
 
 	if ((mtype == UFS_MOUNT_UFSTYPE_44BSD &&
-	     (usb1->fs_flags & UFS_FLAGS_UPDATED)) ||
+	     (usb2->fs_un.fs_u2.fs_maxbsize == usb1->fs_bsize)) ||
 	    mtype == UFS_MOUNT_UFSTYPE_UFS2) {
 		/*we have statistic in different place, then usual*/
 		uspi->cs_total.cs_ndir = fs64_to_cpu(sb, usb2->fs_un.fs_u2.cs_ndir);
@@ -596,9 +596,7 @@ static void ufs_put_cstotal(struct super_block *sb)
 	usb2 = ubh_get_usb_second(uspi);
 	usb3 = ubh_get_usb_third(uspi);
 
-	if ((mtype == UFS_MOUNT_UFSTYPE_44BSD &&
-	     (usb1->fs_flags & UFS_FLAGS_UPDATED)) ||
-	    mtype == UFS_MOUNT_UFSTYPE_UFS2) {
+	if (mtype == UFS_MOUNT_UFSTYPE_UFS2) {
 		/*we have statistic in different place, then usual*/
 		usb2->fs_un.fs_u2.cs_ndir =
 			cpu_to_fs64(sb, uspi->cs_total.cs_ndir);
@@ -608,16 +606,26 @@ static void ufs_put_cstotal(struct super_block *sb)
 			cpu_to_fs64(sb, uspi->cs_total.cs_nifree);
 		usb3->fs_un1.fs_u2.cs_nffree =
 			cpu_to_fs64(sb, uspi->cs_total.cs_nffree);
-	} else {
-		usb1->fs_cstotal.cs_ndir =
-			cpu_to_fs32(sb, uspi->cs_total.cs_ndir);
-		usb1->fs_cstotal.cs_nbfree =
-			cpu_to_fs32(sb, uspi->cs_total.cs_nbfree);
-		usb1->fs_cstotal.cs_nifree =
-			cpu_to_fs32(sb, uspi->cs_total.cs_nifree);
-		usb1->fs_cstotal.cs_nffree =
-			cpu_to_fs32(sb, uspi->cs_total.cs_nffree);
+		goto out;
 	}
+
+	if (mtype == UFS_MOUNT_UFSTYPE_44BSD &&
+	    (usb2->fs_un.fs_u2.fs_maxbsize == usb1->fs_bsize)) {
+		/* store stats in both old and new places */
+		usb2->fs_un.fs_u2.cs_ndir =
+			cpu_to_fs64(sb, uspi->cs_total.cs_ndir);
+		usb2->fs_un.fs_u2.cs_nbfree =
+			cpu_to_fs64(sb, uspi->cs_total.cs_nbfree);
+		usb3->fs_un1.fs_u2.cs_nifree =
+			cpu_to_fs64(sb, uspi->cs_total.cs_nifree);
+		usb3->fs_un1.fs_u2.cs_nffree =
+			cpu_to_fs64(sb, uspi->cs_total.cs_nffree);
+	}
+	usb1->fs_cstotal.cs_ndir = cpu_to_fs32(sb, uspi->cs_total.cs_ndir);
+	usb1->fs_cstotal.cs_nbfree = cpu_to_fs32(sb, uspi->cs_total.cs_nbfree);
+	usb1->fs_cstotal.cs_nifree = cpu_to_fs32(sb, uspi->cs_total.cs_nifree);
+	usb1->fs_cstotal.cs_nffree = cpu_to_fs32(sb, uspi->cs_total.cs_nffree);
+out:
 	ubh_mark_buffer_dirty(USPI_UBH(uspi));
 	ufs_print_super_stuff(sb, usb1, usb2, usb3);
 	UFSD("EXIT\n");
@@ -997,6 +1005,13 @@ static int ufs_fill_super(struct super_block *sb, void *data, int silent)
 		flags |= UFS_ST_SUN;
 	}
 
+	if ((flags & UFS_ST_MASK) == UFS_ST_44BSD &&
+	    uspi->s_postblformat == UFS_42POSTBLFMT) {
+		if (!silent)
+			pr_err("this is not a 44bsd filesystem");
+		goto failed;
+	}
+
 	/*
 	 * Check ufs magic number
 	 */