[v3] Btrfs: fix metadata inconsistencies after directory fsync

We can get into inconsistency between inodes and directory entries
after fsyncing a directory. The issue is that while a directory gets
the new dentries persisted in the fsync log and replayed at mount time,
the link count of the inode that directory entries point to doesn't
get updated, staying with an incorrect link count (smaller then the
correct value). This later leads to stale file handle errors when
accessing (including attempt to delete) some of the links if all the
other ones are removed, which also implies impossibility to delete the
parent directories, since the dentries can not be removed.

Another issue is that, unlike xfs and ext4, when fsyncing a directory,
new files aren't logged (their metadata and dentries) nor any child
directories. So this patch fixes this issue too, since it has the same
resolution as the incorrect inode link count issue mentioned before.

This is very easy to reproduce, and the following excerpt from my test
case for xfstests shows how:

  _scratch_mkfs >> $seqres.full 2>&1
  _init_flakey
  _mount_flakey

  # Create our test file and directory.
  touch $SCRATCH_MNT/foo
  mkdir $SCRATCH_MNT/mydir

  # Make sure all metadata is durably persisted.
  sync

  # Add a hard link to 'foo' inside our test directory and fsync only the
  # directory. The btrfs fsync implementation had a bug that caused the new
  # directory entry to be visible after the fsync log replay but, the inode
  # of our file remained with a link count of 1.
  ln $SCRATCH_MNT/foo $SCRATCH_MNT/mydir/foo_2

  # Add a few more links and files created in the current transaction.
  # Just to verify nothing breaks or gives incorrect results.
  ln $SCRATCH_MNT/foo $SCRATCH_MNT/mydir/foo_3
  echo "hello world" > $SCRATCH_MNT/hello
  ln $SCRATCH_MNT/hello $SCRATCH_MNT/mydir/hello_2

  # Add some subdirectories and new files and links to them. This is to verify
  # that after fsyncing our top level directory 'mydir', all the subdirectories
  # and their files/links are registered in the fsync log and exist after the
  # fsync log is replayed - xfs and ext4 give this guarantee.
  mkdir -p $SCRATCH_MNT/mydir/x/y/z
  ln $SCRATCH_MNT/foo $SCRATCH_MNT/mydir/x/y/foo_y_link
  ln $SCRATCH_MNT/foo $SCRATCH_MNT/mydir/x/y/z/foo_z_link
  touch $SCRATCH_MNT/mydir/x/y/z/qwerty

  # Now fsync only our top directory.
  $XFS_IO_PROG -c "fsync" $SCRATCH_MNT/mydir

  # Simulate a crash/power loss.
  _load_flakey_table $FLAKEY_DROP_WRITES
  _unmount_flakey

  _load_flakey_table $FLAKEY_ALLOW_WRITES
  _mount_flakey

  # Remove the first name of our inode. Because of the directory fsync bug, the
  # inode's link count was 1 instead of 5, so removing the 'foo' name ends up
  # deleting the inode and the other names are stale directory entries (and still
  # visible to applications). Attempting to remove or access the remaining
  # dentries pointing to that inode resulted in stale file handle errors, and
  # made it impossible to remove the parent directories since it was impossible
  # for them to become empty.
  echo "file 'foo' link count after log replay: $(stat -c %h $SCRATCH_MNT/foo)"
  rm -f $SCRATCH_MNT/foo

  # Now verify that all files, links and directories created before fsyncing our
  # directory exist after the fsync log was replayed.
  [ -f $SCRATCH_MNT/mydir/foo_2 ] || echo "Link mydir/foo_2 is missing"
  [ -f $SCRATCH_MNT/mydir/foo_3 ] || echo "Link mydir/foo_3 is missing"
  [ -f $SCRATCH_MNT/hello ] || echo "File hello is missing"
  echo "file 'hello' size after log replay: $(stat -c %s $SCRATCH_MNT/hello)"
  [ -f $SCRATCH_MNT/mydir/hello_2 ] || echo "Link mydir/hello_2 is missing"
  [ -f $SCRATCH_MNT/mydir/x/y/foo_y_link ] || \
      echo "Link mydir/x/y/foo_y_link is missing"
  [ -f $SCRATCH_MNT/mydir/x/y/z/foo_z_link ] || \
      echo "Link mydir/x/y/z/foo_z_link is missing"
  [ -f $SCRATCH_MNT/mydir/x/y/z/qwerty ] || \
     echo "File mydir/x/y/z/qwerty is missing"

  # Now remove all files/links, under our test directory 'mydir', and verify we
  # can remove all the directories.
  rm -f $SCRATCH_MNT/mydir/x/y/z/*
  rmdir $SCRATCH_MNT/mydir/x/y/z
  rm -f $SCRATCH_MNT/mydir/x/y/*
  rmdir $SCRATCH_MNT/mydir/x/y
  rmdir $SCRATCH_MNT/mydir/x
  rm -f $SCRATCH_MNT/mydir/*
  rmdir $SCRATCH_MNT/mydir

  # An fsck, run by the fstests framework everytime a test finishes, also detected
  # the inconsistency and printed the following error message:
  #
  # root 5 inode 257 errors 2001, no inode item, link count wrong
  #    unresolved ref dir 258 index 2 namelen 5 name foo_2 filetype 1 errors 4, no inode ref
  #    unresolved ref dir 258 index 3 namelen 5 name foo_3 filetype 1 errors 4, no inode ref

  status=0
  exit

The expected golden output for the test is:

  file 'foo' link count after log replay: 5
  file 'hello' size after log replay: 0

Which is the output after this patch and when running the test against
xfs and ext4. Without this patch, the test's output is:

  file 'foo' link count after log replay: 1
  Link mydir/foo_2 is missing
  Link mydir/foo_3 is missing
  File hello is missing
  stat: cannot stat '/home/fdmanana/btrfs-tests/scratch_1/hello': No such file or directory
  file 'hello' size after log replay:
  Link mydir/hello_2 is missing
  Link mydir/x/y/foo_y_link is missing
  Link mydir/x/y/z/foo_z_link is missing
  File mydir/x/y/z/qwerty is missing
  rmdir: failed to remove '/home/fdmanana/btrfs-tests/scratch_1/mydir/x/y/z': No such file or directory
  rmdir: failed to remove '/home/fdmanana/btrfs-tests/scratch_1/mydir/x/y': No such file or directory
  rmdir: failed to remove '/home/fdmanana/btrfs-tests/scratch_1/mydir/x': No such file or directory
  rm: cannot remove '/home/fdmanana/btrfs-tests/scratch_1/mydir/foo_2': Stale file handle
  rm: cannot remove '/home/fdmanana/btrfs-tests/scratch_1/mydir/foo_3': Stale file handle
  rmdir: failed to remove '/home/fdmanana/btrfs-tests/scratch_1/mydir': Directory not empty

Fsck, without this fix, also complains about the wrong link count:

  root 5 inode 257 errors 2001, no inode item, link count wrong
      unresolved ref dir 258 index 2 namelen 5 name foo_2 filetype 1 errors 4, no inode ref
      unresolved ref dir 258 index 3 namelen 5 name foo_3 filetype 1 errors 4, no inode ref

So fix this by logging the dentries when fsyncing a directory.

A test case for xfstests follows.

Signed-off-by: Filipe Manana <fdmanana@suse.com>
---

V2: Speedup search when there's a gap between dentry indexes, and we reached
    the last slot of a leaf, by using btrfs_next_leaf().

V3: Don't break out of while loop if next_leaf returns 1, there might be
    other child directory inodes to process, so goto next_dir_inode instead.
    This is only for the case where parent directories have an inode number
    larger than the number of some children directory inodes.

 fs/btrfs/file.c     |   5 +-
 fs/btrfs/tree-log.c | 184 ++++++++++++++++++++++++++++++++++++++++++++++++++--
 fs/btrfs/tree-log.h |   2 +
 3 files changed, 185 insertions(+), 6 deletions(-)

[v3] Btrfs: fix metadata inconsistencies after directory fsync

Commit Message

Patch