[git,pull] first batch of ufs fixes

Message ID 20170614071133.GD31671@ZenIV.linux.org.uk
State New, archived

Commit Message

Al Viro June 14, 2017, 7:11 a.m. UTC
On Tue, Jun 13, 2017 at 02:56:23PM -0700, Richard Narron wrote:
> On Tue, 13 Jun 2017, Al Viro wrote:
> 
> > On Mon, Jun 12, 2017 at 05:54:06PM -0700, Richard Narron wrote:
> > 
> > > Earlier today I could not reproduce the OpenBSD 6.1 ufs1 fsck error after
> > > Linux 4.12-rc5 copy of my >2GB file using "cp".
> > > 
> > > But later today I get the error when I copy using your "dd" method...
> > > 
> > > In any case I always get a ufs1 fsck error after the Linux rm and rmdir.
> > 
> > Interesting...  Could you put together an image (starting with zeroing the
> > device before newfs, and ideally with dd from /dev/zero to create files)
> > that would
> > 	a) pass fsck on OpenBSD
> > 	b) after rm on Linux fail the same
> > then convert it to qcow2 and publish?  Or just compress it - all free and
> > data blocks would contain only zeroes, so any kind of compression (gzip,
> > bzip2, whatever) would reduce the size to something more manageable...
> 
> I created a gzip and sent you an email with the link to a UFS1 OpenBSD
> filesystem image.
> 
> I finished simple testing of UFS1 with FreeBSD and NetBSD and found no
> problems except for the differences between "available" blocks in df
> commands.

AFAICS, what happens is a combination of OpenBSD and FreeBSD acting differently
when reading UFS1, and Linux "[PATCH] ufs: make fsck -f happy" getting the
logic wrong.  First of all, on UFS1 the BSDs always duplicate the counts into
the old locations when writing a superblock, UFS_FLAGS_UPDATED or not.  The
Linux implementation writes either only to the new or only to the old
locations.  What's more, on the read side the rules differ between FreeBSD
and OpenBSD.  The former does
	if we hadn't set fs_un.fs_u2.fs_maxbsize to block size
		set it so
		read from old locations (and copy them to new ones)
The latter *always* reads from old locations.  It also sets FS_FLAGS_UPDATED
at the same spot (FreeBSD does it a bit upstream) and has an ifdefed-out "if
the flag is already set, bugger off" check.
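To make the difference concrete, here is a small userspace model of the read and write rules described above. It is not kernel code: struct sb_model, MODEL_FLAGS_UPDATED and every function name are hypothetical, only one counter (ndir) is modeled, and the BSD readers are paraphrases of the pseudocode above rather than the real FreeBSD/OpenBSD sources. The scenario is a BSD mount/umount, a pre-patch Linux write that updates only the new location because the flag is set, and then a second BSD mount.

```c
#include <stdint.h>

/* Hypothetical, simplified model of the UFS1 superblock fields discussed
 * above; the real layouts live in each kernel's own headers. */
struct sb_model {
    int32_t  fs_bsize;     /* block size */
    int32_t  fs_maxbsize;  /* lives where ->opostbl used to be */
    uint32_t fs_flags;
    int32_t  old_ndir;     /* 32-bit count in the "old" location */
    int64_t  new_ndir;     /* 64-bit count in the "new" location */
};

/* Hypothetical stand-in for FS_FLAGS_UPDATED; the bit value is arbitrary. */
#define MODEL_FLAGS_UPDATED 0x1u

/* FreeBSD-style read: convert once (keyed on fs_maxbsize), then trust
 * the new location. */
int64_t freebsd_style_read(struct sb_model *sb)
{
    if (sb->fs_maxbsize != sb->fs_bsize) {
        sb->fs_maxbsize = sb->fs_bsize;
        sb->new_ndir = sb->old_ndir;   /* copy old -> new */
    }
    sb->fs_flags |= MODEL_FLAGS_UPDATED;
    return sb->new_ndir;
}

/* OpenBSD-style read: always take the count from the old location. */
int64_t openbsd_style_read(struct sb_model *sb)
{
    sb->new_ndir = sb->old_ndir;
    sb->fs_flags |= MODEL_FLAGS_UPDATED;
    return sb->new_ndir;
}

/* Both BSDs duplicate the count into the old location on every write. */
void bsd_style_write(struct sb_model *sb)
{
    sb->old_ndir = (int32_t)sb->new_ndir;
}

/* Pre-patch Linux write: only one location, chosen by the flag. */
void old_linux_write(struct sb_model *sb, int64_t ndir)
{
    if (sb->fs_flags & MODEL_FLAGS_UPDATED)
        sb->new_ndir = ndir;           /* old copy is left stale */
    else
        sb->old_ndir = (int32_t)ndir;
}

/* BSD mount/umount, Linux mkdir (count 2 -> 3), BSD mount again.
 * Returns the directory count seen by the second BSD mount. */
int64_t ndir_after_roundtrip(int use_openbsd)
{
    struct sb_model sb = { .fs_bsize = 16384, .fs_maxbsize = 0,
                           .fs_flags = 0, .old_ndir = 2, .new_ndir = 0 };
    int64_t (*bsd_read)(struct sb_model *) =
        use_openbsd ? openbsd_style_read : freebsd_style_read;

    bsd_read(&sb);
    bsd_style_write(&sb);
    old_linux_write(&sb, 3);
    return bsd_read(&sb);
}
```

Under these assumptions ndir_after_roundtrip(0) (FreeBSD-style) returns the 3 written by Linux, while ndir_after_roundtrip(1) (OpenBSD-style) returns the stale 2 from the old location.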

Hell knows...  Using FS_FLAGS_UPDATED as a predicate is wrong, since OpenBSD
fsck clears it whenever it modifies a superblock for any reason.  FWIW, using
fs_maxbsize as an indicator looks like a good idea.  The thing is, it lives
where the first two elements of ->opostbl used to be in filesystems with
->s_postblformat equal to UFS_42POSTBLFMT, which excludes everything created
by 4.4 newfs; in fact, 4.3-Reno is already too recent for that.  Filesystems
created by anything newer will have zeroes in the entire ->opostbl area.

AFAICS, a conservative approach would be
	* reject UFS_42POSTBLFMT for 44bsd ones - it's almost certainly
*not* one.
	* check if fs_maxbsize is equal to block size; treat that as
"counts are read from new location and stored both to old and new".
44bsd fs_maxbsize != block size => not converted, just use old locations
for everything.  UFS2 => use new locations for everything, don't bother
with old ones.  IOW, something like this (WARNING: completely untested,
might screw your filesystem) might do.
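Separately from the kernel patch itself, the rules in the list above can be modeled as a single decision function. This is a sketch: the enums and choose_cs_policy() are hypothetical names, the inputs are already-decoded values rather than on-disk fields, and the ordering (UFS2 first, then the 42POSTBLFMT rejection for 44bsd) simply follows the wording of the list.

```c
#include <stdint.h>

/* Hypothetical, simplified inputs; not the real fs/ufs types. */
enum ufs_kind   { KIND_44BSD, KIND_UFS2 };
enum postbl_fmt { POSTBL_42, POSTBL_DYNAMIC };

/* Where the summary counts are read from and written to. */
enum cs_policy {
    CS_REJECT,      /* refuse the mount */
    CS_OLD_ONLY,    /* 44bsd, never converted: old locations only */
    CS_NEW_AND_OLD, /* converted UFS1: read new, write both */
    CS_NEW_ONLY     /* UFS2: new locations only */
};

enum cs_policy choose_cs_policy(enum ufs_kind kind, enum postbl_fmt fmt,
                                int32_t fs_maxbsize, int32_t fs_bsize)
{
    if (kind == KIND_UFS2)
        return CS_NEW_ONLY;
    /* A 4.2-style postbl format means fs_maxbsize would overlap live
     * ->opostbl data; such a filesystem is almost certainly not 44bsd. */
    if (fmt == POSTBL_42)
        return CS_REJECT;
    /* A converting BSD stores the block size here; an unconverted
     * filesystem still has zeroes in the old ->opostbl area. */
    return fs_maxbsize == fs_bsize ? CS_NEW_AND_OLD : CS_OLD_ONLY;
}
```

Keying on fs_maxbsize rather than the flag means an OpenBSD fsck clearing the flag no longer changes which locations are trusted.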

NOTE: all I have is your image *after* it had counters buggered; I don't
know the exact sequence of operations that fucked it in your case.  One
way to trigger it is to mount/umount on OpenBSD, then mount/modify/umount
on Linux, then mount/umount on OpenBSD, then fsck on OpenBSD.  This patch
apparently fixes that, but your reproducer might be something different.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---

Comments

Richard Narron June 14, 2017, 8:33 p.m. UTC | #1
On Wed, 14 Jun 2017, Al Viro wrote:
...
> AFAICS, a conservative approach would be
> 	* reject UFS_42POSTBLFMT for 44bsd ones - it's almost certainly
> *not* one.
> 	* check if fs_maxbsize is equal to block size; treat that as
> "counts are read from new location and stored both to old and new".
> 44bsd fs_maxbsize != block size => not converted, just use old locations
> for everything.  UFS2 => use new locations for everything, don't bother
> with old ones.  IOW, something like this (WARNING: completely untested,
> might screw your filesystem) might do.
>
> NOTE: all I have is your image *after* it had counters buggered; I don't
> know the exact sequence of operations that fucked it in your case.  One
> way to trigger it is to mount/umount on OpenBSD, then mount/modify/umount
> on Linux, then mount/umount on OpenBSD, then fsck on OpenBSD.  This patch
> apparently fixes that, but your reproducer might be something different.
>
> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
> ---
> ...

Al, this patch looks good to me (so far). I tested all 6 combinations of
ufs1 and ufs2 in FreeBSD 11.0, OpenBSD 6.1 and NetBSD 7.1.

For each combination, I do 5 steps:

   1) BSD: Make a ufs filesystem
      dd if=/dev/zero to the BSD subpartition
      make (newfs) a ufs (1 or 2) filesystem on the BSD subpartition

   2) Linux: Create a subdirectory and make a large file
      mkdir a
      dd if=/dev/zero bs=1M count=3072

   3) BSD: Check a ufs filesystem
      fsck -f

   4) Linux: Remove the large file and the subdirectory
      rm
      rmdir

   5) BSD: Check a ufs filesystem
      fsck -f

Tested-By: Richard Narron <comet.berkeley@gmail.com>

Al Viro June 15, 2017, 8 a.m. UTC | #2
On Wed, Jun 14, 2017 at 08:11:33AM +0100, Al Viro wrote:
> NOTE: all I have is your image *after* it had counters buggered; I don't
> know the exact sequence of operations that fucked it in your case.  One
> way to trigger it is to mount/umount on OpenBSD, then mount/modify/umount
> on Linux, then mount/umount on OpenBSD, then fsck on OpenBSD.  This patch
> apparently fixes that, but your reproducer might be something different.

FWIW, it seems to work here.  That said, *BSD fsck_ffs is not worth much -
play a bit with the redundancy in the UFS superblock (starting with fragment
and block sizes, their ratio, logarithms, bitmasks, etc.) and you can
screw at least FreeBSD 10.3 into the ground by mounting an image that passes
its fsck.  Sure, anyone who mounts untrusted images is a cretin who
deserves everything they get, fsck or no fsck, but... no complaints from
fsck is not a reliable indicator of the image being in good condition, and
that's a PITA for testing.

Another pile of fun: the "reserve ->s_minfree percent of total" logic had
been broken.
	* using a hardwired 5% is wrong - especially for ufs2, where it's
not even the default
	* ufs_freespace() returns u64; testing it for <= 0 doesn't do the
right thing
	* no capability checks before we need them, TYVM...
	* ufs2 needs 64bit uspi->s_dsize (and ->s_size, while we are at it).
64bit variants were even calculated - and never used.
	* while we are at it, doing "multiply the total data frags by
s_minfree and divide by 100" every time we allocate a block is bloody
dumb - that should be calculated once.
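The points in the list above can be illustrated with a small userspace sketch. Everything here is hypothetical (struct reserve_model, freespace(), may_allocate(), deny_privilege() as a stand-in for a capability check); it only shows the shape of the fixes: take minfree from the superblock instead of hardwiring 5%, keep the sizes 64-bit, compute the reserve once, use a signed free-space figure, and defer the privilege check until the unprivileged limit is actually hit.

```c
#include <stdint.h>

/* Hypothetical per-mount state; field names are illustrative only. */
struct reserve_model {
    uint64_t dsize;        /* total data fragments (64-bit for ufs2) */
    uint32_t minfree;      /* reserved percentage, from the superblock */
    uint64_t root_reserve; /* computed once at mount time */
};

/* Done once at mount time instead of on every block allocation. */
void compute_reserve(struct reserve_model *m)
{
    m->root_reserve = m->dsize * m->minfree / 100;
}

/* Signed result: goes negative once privileged allocations have eaten
 * into the reserve. */
int64_t freespace(uint64_t free_frags, const struct reserve_model *m)
{
    return (int64_t)free_frags - (int64_t)m->root_reserve;
}

/* The bug pattern: an unsigned difference wraps around instead of going
 * negative, so "<= 0" only fires at exactly zero. */
int unsigned_test_fires(uint64_t free_frags, uint64_t reserve)
{
    uint64_t fs = free_frags - reserve;
    return fs <= 0;
}

/* Hypothetical stand-in for a capability check that always denies. */
int deny_privilege(void)
{
    return 0;
}

/* Only consult the privilege check when the unprivileged limit is hit. */
int may_allocate(uint64_t free_frags, const struct reserve_model *m,
                 int (*is_privileged)(void))
{
    if (freespace(free_frags, m) > 0)
        return 1;
    return is_privileged();
}
```

With dsize = 1000 and minfree = 8 the reserve is 80 fragments; at 50 free fragments freespace() reports -30, while the unsigned test does not fire because 50 - 80 wraps to a huge value.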

We really need to get the sodding tail unpacking moved up from the place
where it's buried - turns out that my doubts about that code managing to
avoid deadlocks had been correct.  Long-term we need to move that thing
to iomap-based ->write_iter() and do unpacking there and in truncate().
For now I've slapped together something that is easier to backport -
avoiding ->truncate_mutex when possible and not holding ->s_lock over
ufs_change_blocknr().

Another bug in the same area: ufs_get_locked_page() doesn't guarantee
that buffer_heads are attached (race with vmscan trying to evict the
page in question can end with buffer_heads freed and page left alive
and uptodate).  Callers do expect buffer_heads to be there, so we either
need to do create_empty_buffers() in those callers or in ufs_get_locked_page();
I went for the latter for now.

Off-by-one in ufs_truncate_blocks(): the logic for deciding whether
we need to do anything with direct blocks is broken when the new size is
within the last direct block.  It's better to find the path to the
last byte _not_ to be removed and use that instead of the path to the
beginning of the first block to be freed.
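A simplified sketch of the boundary case described above, assuming 12 direct block pointers per inode and a power-of-two block size given as a shift. The function names are hypothetical and this is not the kernel's block-path code; it only shows why keying on the first freed block misses a new size that falls inside the last direct block, while keying on the last kept byte does not.

```c
#include <stdint.h>

#define NDADDR 12  /* direct block pointers in a UFS inode */

/* Index of the first block lying entirely past the new size. */
uint64_t first_freed_block(uint64_t size, unsigned bshift)
{
    return (size + (1ULL << bshift) - 1) >> bshift;
}

/* Index of the block holding the last byte that is kept (size > 0). */
uint64_t last_kept_block(uint64_t size, unsigned bshift)
{
    return (size - 1) >> bshift;
}

/* Old-style test: look at the direct range only if the first freed
 * block is a direct one. A size inside block 11 yields index 12, so
 * the partially kept last direct block is never examined. */
int needs_direct_pass_old(uint64_t size, unsigned bshift)
{
    return first_freed_block(size, bshift) < NDADDR;
}

/* New-style test: look at the direct range whenever the last kept byte
 * (if any) lives in a direct block. */
int needs_direct_pass_new(uint64_t size, unsigned bshift)
{
    return size == 0 || last_kept_block(size, bshift) < NDADDR;
}
```

With 8 KiB blocks (bshift = 13) and a new size of 11 * 8192 + 100 bytes, the old test returns 0 and the new one returns 1; for sizes well inside the direct range both agree.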

I've pushed fixes for those into vfs.git#ufs-fixes; they do need more
testing before I send a pull request, though.

Richard Narron June 16, 2017, 2:29 p.m. UTC | #3
On Thu, 15 Jun 2017, Al Viro wrote:

> On Wed, Jun 14, 2017 at 08:11:33AM +0100, Al Viro wrote:
> ...
> I've pushed fixes for those into vfs.git#ufs-fixes; they do need more
> testing before I send a pull request, though.

The 8 patches in the ufs-fixes group were applied to Linux 4.12-rc5.
They seem to work fine with the simple testing that I do.

I tested all 3 BSDs, FreeBSD 11.0, OpenBSD 6.1 and NetBSD 7.1 using 2 
filesystems, 44bsd (ufs1) and ufs2.
I found no errors doing a Linux mkdir, copy large file, BSD fsck, Linux rm 
large file, rmdir and BSD fsck in any of the 6 combinations.

Doing a "df" on BSD and Linux now gives matching counts, including the
"Available" counts.

It might be worth testing with ufs filesystems using softdep and/or 
journaling.  Should the Linux mount command reject such filesystems?

Now that ufs write access is working more or less, we're dangerous.

Richard Narron June 18, 2017, 8:45 p.m. UTC | #4
On Sun, 18 Jun 2017, Al Viro wrote:

> On Sat, Jun 17, 2017 at 03:15:48AM +0100, Al Viro wrote:
>> On Fri, Jun 16, 2017 at 07:29:00AM -0700, Richard Narron wrote:
>>
>>> The 8 patches in the ufs-fixes group were applied to Linux 4.12-rc5.
>>> They seem to work fine with the simple testing that I do.
>>>
>>> I tested all 3 BSDs, FreeBSD 11.0, OpenBSD 6.1 and NetBSD 7.1 using 2
>>> filesystems, 44bsd (ufs1) and ufs2.
>>> I found no errors doing a Linux mkdir, copy large file, BSD fsck, Linux rm
>>> large file, rmdir and BSD fsck in any of the 6 combinations.
>>
>> FWIW, with xfstests I see the following:
>> 	* _require_metadata_journaling needs to be told not to bother with
>> our ufs
>> 	* generic/{409,410,411} lack $MOUNT_OPTIONS in several invocations
>> of mount -t $FSTYP; for ufs it's obviously a problem.  Trivial bug in tests,
>> fixing it makes them pass.
>> 	* generic/258 (timestamp wraparound) fails; fs is left intact
>
> Trivially fixed (cast to (signed) in ufs1_read_inode(), similar to what
> other filesystems with 32bit timestamps are doing); ufs2 has no problem
> at all.
>
>> 	* generic/426 (fhandle stuff) fails and buggers the filesystem
>> Everything else passes with no fs corruption that could be detected by
>> fsck.ufs.
>
> Also trivially fixed - it's a self-inflicted wound.  Just have zero nlink in
> ufs{1,2}_read_inode() fail with -ESTALE instead of triggering ufs_error().
>
>> As for my immediate plans, I'll look into the two failing tests,
>> but any further active work on ufs will have to wait for the next
>> cycle.  It had been a fun couple of weeks, but I have more than
>> enough other stuff to deal with.  And I would still very much prefer
>> for somebody to adopt that puppy.
>
> Another piece of fun spotted: the logics for switching between two allocation
> policies when relocating a packed tail that can't be expanded in place had
> been b0rken since a typo in 2.4.14.7 - the switch back from OPTTIME to OPTSPACE
> had been screwed by this:
> -               usb1->fs_optim = SWAB32(UFS_OPTSPACE);
> +               usb1->fs_optim = cpu_to_fs32(sb, UFS_OPTTIME);
>
> And fragmentation levels for switching back and forth really ought to be
> calculated at mount time.  Another (minor) issue is mentioned in this
> commit message from Kirk McKusick back in 1995:
>        The threshold for switching from time-space and space-time is too small
>        when minfree is 5%...so make it stay at space in this case.
> Not that minfree at 5% had been frequently seen - the default has never been
> that low (back in 4.2BSD it was 10%, these days it's 8%).
>
> Resulting kernel passes xfstests clean and now I'm definitely done with UFS for
> this cycle.  Linus, in case you want to pull that sucker, pull request would
> be as below:
>
> The following changes since commit a8fad984833832d5ca11a9ed64ddc55646da30e3:
>
>  ufs_truncate_blocks(): fix the case when size is in the last direct block (2017-06-15 03:57:46 -0400)
>
> are available in the git repository at:
>
>  git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs.git ufs-fixes
>
> for you to fetch changes up to 77e9ce327d9b607cd6e57c0f4524a654dc59c4b1:
>
>  ufs: fix the logics for tail relocation (2017-06-17 17:22:42 -0400)
>
> ----------------------------------------------------------------
> Al Viro (3):
>      fix signedness of timestamps on ufs1
>      ufs_iget(): fail with -ESTALE on deleted inode
>      ufs: fix the logics for tail relocation
>
> fs/ufs/balloc.c | 22 ++++++----------------
> fs/ufs/inode.c  | 27 +++++++++++----------------
> fs/ufs/super.c  |  9 +++++++++
> fs/ufs/ufs_fs.h |  2 ++
> 4 files changed, 28 insertions(+), 32 deletions(-)
>
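
The ufs1 timestamp fix quoted above comes down to sign-extending a 32-bit on-disk value instead of zero-extending it. A minimal sketch, with hypothetical function names; it assumes the raw value has already been converted to host byte order, and it relies on the two's-complement result that mainstream compilers give when converting an out-of-range uint32_t to int32_t (C itself leaves that conversion implementation-defined).

```c
#include <stdint.h>

/* Zero-extension: pre-1970 times turn into dates far in the future. */
int64_t ts_as_unsigned(uint32_t raw)
{
    return (int64_t)raw;
}

/* Sign-extension via a 32-bit signed intermediate, matching the
 * "(signed)" cast described in the message. */
int64_t ts_as_signed(uint32_t raw)
{
    return (int64_t)(int32_t)raw;
}
```

For a raw value of 0xFFFFFFFF the unsigned variant yields 4294967295 (year 2106), while the signed one yields -1, one second before the epoch.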

I just tested these 3 patches along with the earlier 8 patches against 
Linux 4.12-rc5 and they look fine. All 6 test cases look good.

The ufs code is much better now than it was before all these patches.
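The OPTSPACE/OPTTIME switching discussed in the quoted message can be sketched with thresholds computed once at mount time. This is a model loosely based on the BSD allocator's policy (switch to time once free fragments drop to half the reserve, switch back once they reach the reserve minus 2%, never leave space optimization when minfree is 5% or less, as in McKusick's 1995 note). The struct and function names are hypothetical, and whether Linux uses exactly these formulas is an assumption.

```c
#include <stdint.h>

enum optim { OPT_SPACE, OPT_TIME };

/* Hypothetical thresholds, computed once at mount time. */
struct optim_thresholds {
    int      space_locked; /* minfree <= 5%: never leave OPT_SPACE */
    uint64_t to_time;      /* SPACE -> TIME when nffree <= this */
    uint64_t to_space;     /* TIME -> SPACE when nffree >= this */
};

struct optim_thresholds compute_thresholds(uint64_t dsize, uint32_t minfree)
{
    struct optim_thresholds t;

    t.space_locked = minfree <= 5;
    t.to_time = dsize * minfree / 200;   /* half of the reserve */
    t.to_space = minfree >= 2 ? dsize * (minfree - 2) / 100 : 0;
    return t;
}

/* Policy to use after looking at the current fragment count. */
enum optim next_optim(enum optim cur, uint64_t nffree,
                      const struct optim_thresholds *t)
{
    switch (cur) {
    case OPT_SPACE:
        if (t->space_locked || nffree > t->to_time)
            return OPT_SPACE;
        return OPT_TIME;
    case OPT_TIME:
        if (nffree < t->to_space)
            return OPT_TIME;
        return OPT_SPACE;  /* the 2.4.14.7 typo stored OPTTIME here */
    }
    return cur;
}
```

With dsize = 100000 and minfree = 8, the switch to time happens at or below 4000 free fragments and the switch back at 6000 or more; the gap between the two thresholds keeps the policy from flapping.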

Patch

diff --git a/fs/ufs/super.c b/fs/ufs/super.c
index d9aa2627c9df..eca838a8b43e 100644
--- a/fs/ufs/super.c
+++ b/fs/ufs/super.c
@@ -480,7 +480,7 @@  static void ufs_setup_cstotal(struct super_block *sb)
 	usb3 = ubh_get_usb_third(uspi);
 
 	if ((mtype == UFS_MOUNT_UFSTYPE_44BSD &&
-	     (usb1->fs_flags & UFS_FLAGS_UPDATED)) ||
+	     (usb2->fs_un.fs_u2.fs_maxbsize == usb1->fs_bsize)) ||
 	    mtype == UFS_MOUNT_UFSTYPE_UFS2) {
 		/*we have statistic in different place, then usual*/
 		uspi->cs_total.cs_ndir = fs64_to_cpu(sb, usb2->fs_un.fs_u2.cs_ndir);
@@ -596,9 +596,7 @@  static void ufs_put_cstotal(struct super_block *sb)
 	usb2 = ubh_get_usb_second(uspi);
 	usb3 = ubh_get_usb_third(uspi);
 
-	if ((mtype == UFS_MOUNT_UFSTYPE_44BSD &&
-	     (usb1->fs_flags & UFS_FLAGS_UPDATED)) ||
-	    mtype == UFS_MOUNT_UFSTYPE_UFS2) {
+	if (mtype == UFS_MOUNT_UFSTYPE_UFS2) {
 		/*we have statistic in different place, then usual*/
 		usb2->fs_un.fs_u2.cs_ndir =
 			cpu_to_fs64(sb, uspi->cs_total.cs_ndir);
@@ -608,16 +606,26 @@  static void ufs_put_cstotal(struct super_block *sb)
 			cpu_to_fs64(sb, uspi->cs_total.cs_nifree);
 		usb3->fs_un1.fs_u2.cs_nffree =
 			cpu_to_fs64(sb, uspi->cs_total.cs_nffree);
-	} else {
-		usb1->fs_cstotal.cs_ndir =
-			cpu_to_fs32(sb, uspi->cs_total.cs_ndir);
-		usb1->fs_cstotal.cs_nbfree =
-			cpu_to_fs32(sb, uspi->cs_total.cs_nbfree);
-		usb1->fs_cstotal.cs_nifree =
-			cpu_to_fs32(sb, uspi->cs_total.cs_nifree);
-		usb1->fs_cstotal.cs_nffree =
-			cpu_to_fs32(sb, uspi->cs_total.cs_nffree);
+		goto out;
 	}
+
+	if (mtype == UFS_MOUNT_UFSTYPE_44BSD &&
+	     (usb2->fs_un.fs_u2.fs_maxbsize == usb1->fs_bsize)) {
+		/* store stats in both old and new places */
+		usb2->fs_un.fs_u2.cs_ndir =
+			cpu_to_fs64(sb, uspi->cs_total.cs_ndir);
+		usb2->fs_un.fs_u2.cs_nbfree =
+			cpu_to_fs64(sb, uspi->cs_total.cs_nbfree);
+		usb3->fs_un1.fs_u2.cs_nifree =
+			cpu_to_fs64(sb, uspi->cs_total.cs_nifree);
+		usb3->fs_un1.fs_u2.cs_nffree =
+			cpu_to_fs64(sb, uspi->cs_total.cs_nffree);
+	}
+	usb1->fs_cstotal.cs_ndir = cpu_to_fs32(sb, uspi->cs_total.cs_ndir);
+	usb1->fs_cstotal.cs_nbfree = cpu_to_fs32(sb, uspi->cs_total.cs_nbfree);
+	usb1->fs_cstotal.cs_nifree = cpu_to_fs32(sb, uspi->cs_total.cs_nifree);
+	usb1->fs_cstotal.cs_nffree = cpu_to_fs32(sb, uspi->cs_total.cs_nffree);
+out:
 	ubh_mark_buffer_dirty(USPI_UBH(uspi));
 	ufs_print_super_stuff(sb, usb1, usb2, usb3);
 	UFSD("EXIT\n");
@@ -997,6 +1005,13 @@  static int ufs_fill_super(struct super_block *sb, void *data, int silent)
 		flags |=  UFS_ST_SUN;
 	}
 
+	if ((flags & UFS_ST_MASK) == UFS_ST_44BSD &&
+	    uspi->s_postblformat == UFS_42POSTBLFMT) {
+		if (!silent)
+			pr_err("this is not a 44bsd filesystem");
+		goto failed;
+	}
+
 	/*
 	 * Check ufs magic number
 	 */