diff mbox series

[v2,2/2] btrfs: send: fix failures when processing inodes with no links

Message ID 20220811100912.126447-3-bingjingc@synology.com (mailing list archive)
State New, archived
Headers show
Series btrfs: send: fix failures when processing inodes with no links | expand

Commit Message

bingjingc Aug. 11, 2022, 10:09 a.m. UTC
From: BingJing Chang <bingjingc@synology.com>

There is a bug causing send failures when processing an orphan directory
with no links. In commit 46b2f4590aab ("Btrfs: fix send failure when root
has deleted files still open")', the orphan inode issue was addressed. The
send operation fails with a ENOENT error because of any attempts to
generate a path for the inode with a link count of zero. Therefore, in that
patch, sctx->ignore_cur_inode was introduced to be set if the current inode
has a link count of zero for bypassing some unnecessary steps. And a helper
function btrfs_unlink_all_paths() was introduced and called to clean up old
paths found in the parent snapshot. However, not only regular files but
also directories can be orphan inodes. So if the send operation meets an
orphan directory, it will issue a wrong unlink command for that directory
now. Soon the receive operation fails with a EISDIR error. Besides, the
send operation also fails with a ENOENT error later when it tries to
generate a path of it.

Similar example but making an orphan dir for an incremental send:

  $ btrfs subvolume create vol
  $ mkdir vol/dir
  $ touch vol/dir/foo

  $ btrfs subvolume snapshot -r vol snap1
  $ btrfs subvolume snapshot -r vol snap2

  # Turn the second snapshot to RW mode and delete the whole dir while
  # holding an open file descriptor on it.
  $ btrfs property set snap2 ro false
  $ exec 73<snap2/dir
  $ rm -rf snap2/dir

  # Set the second snapshot back to RO mode and do an incremental send.
  $ btrfs property set snap2 ro true
  $ mkdir receive_dir
  $ btrfs send snap2 -p snap1 | btrfs receive receive_dir/
  At subvol snap2
  At snapshot snap2
  ERROR: send ioctl failed with -2: No such file or directory
  ERROR: unlink dir failed. Is a directory

Actually, orphan inodes are more common use cases in cascading backups.
(Please see the illustration below.) In a cascading backup, a user wants
to replicate a couple of snapshots from Machine A to Machine B and from
Machine B to Machine C. Machine B doesn't take any RO snapshots for
sending. All a receiver does is create an RW snapshot of its parent
snapshot, apply the send stream and turn it into RO mode at the end. Even
if all paths of some inodes are deleted in applying the send stream, these
inodes would not be deleted and become orphans after changing the subvolume
from RW to RO. Moreover, orphan inodes can occur not only in send snapshots
but also in parent snapshots because Machine B may do a batch replication
of a couple of snapshots.

An illustration for cascading backups:
Machine A (snapshot {1..n}) --> Machine B --> Machine C

The intuition to solve the problem is to delete all the items of orphan
inodes before using these snapshots for sending. I used to think that the
reasonable timing for doing that is during the ioctl of changing the
subvolume from RW to RO because it sounds good that we will not modify the
fs tree of a RO snapshot anymore. However, attempting to do the orphan
cleanup in the ioctl would be pointless. Because if someone is holding an
open file descriptor on the inode, the reference count of the inode will
never drop to 0. Then iput() cannot trigger eviction, which finally deletes
all the items of it. So we try to extend the original patch to handle
orphans in send/parent snapshots. Here are several cases that need to be
considered:

Case 1: BTRFS_COMPARE_TREE_NEW
       |  send snapshot  | action
 --------------------------------
 nlink |        0        | ignore

In case 1, when we get a BTRFS_COMPARE_TREE_NEW tree comparison result,
it means that a new inode is found in the send snapshot and it doesn't
appear in the parent snapshot. Since this inode has a link count of zero
(It's an orphan and there're no paths for it.), we can leverage
sctx->ignore_cur_inode in the original patch to prevent it from being
created.

Case 2: BTRFS_COMPARE_TREE_DELETED
       | parent snapshot | action
 ----------------------------------
 nlink |        0        | as usual

In case 2, when we get a BTRFS_COMPARE_TREE_DELETED tree comparison
result, it means that the inode only appears in the parent snapshot.
As usual, the send operation will try to delete all its paths. However,
this inode has a link count of zero, so no paths of it will be found. No
deletion operations will be issued. We don't need to change any logic.

Case 3: BTRFS_COMPARE_TREE_CHANGED
           |       | parent snapshot | send snapshot | action
 -----------------------------------------------------------------------
 subcase 1 | nlink |        0        |       0       | ignore
 subcase 2 | nlink |       >0        |       0       | new_gen(deletion)
 subcase 3 | nlink |        0        |      >0       | new_gen(creation)

In case 3, when we get a BTRFS_COMPARE_TREE_CHANGED tree comparison result,
it means that the inode appears in both snapshots. Here're three subcases.

First, if the inode has link counts of zero in both snapshots. Since there
are no paths for this inode in (source/destination) parent snapshots and we
don't care about whether there is also an orphan inode in destination or
not, we can set sctx->ignore_cur_inode on to prevent it from being created.

For the second and the third subcases, if there're paths in one snapshot
and there're no paths in the other snapshot for this inode. We can treat
this inode as a new generation. We can also leverage the logic handling a
new generation of an inode with small adjustments. Then it will delete all
old paths and create a new inode with new attributes and paths only when
there's a positive link count in the send snapshot. In subcase 2, the
send operation only needs to delete all old paths as in the parent
snapshot. But it may require more operations for a directory to remove its
old paths. If a not-empty directory is going to be deleted (because it has
a link count of zero in the send snapshot) but there're files/directories
with bigger inode numbers under it, the send operation will need to rename
it to its orphan name first. After processing and deleting the last item
under this directory, the send operation will check this directory, aka
the parent directory of the last item, again and issue a rmdir operation
to remove it finally. Therefore, we also need to treat inodes with a link
count of zero as if they didn't exist in get_cur_inode_state(), which is
used in process_recorded_refs(). By doing this, when reviewing a directory
with orphan names after the last item under it has been deleted, the send
operation now can properly issue a rmdir operation. Otherwise, without
doing this, the orphan directory with an orphan name would be kept here
at the end due to the existing inode with a link count of zero being found.
In subcase 3, as in case 2, no old paths would be found, so no deletion
operations will be issued. The send operation will only create a new one
for that inode.

Note that subcase 3 is not a common case. That's because it's easy to
reduce the hard links of an inode, but once all valid paths are removed,
there're no valid paths for creating other hard links. The only way to do
that is trying to send an older snapshot after a newer snapshot has been
sent.

Cc: <stable@vger.kernel.org> # 4.9: 46b2f4590aab: Btrfs: fix send
failure when root has deleted files still open
Cc: <stable@vger.kernel.org> # 4.9: 71ecfc133b03: btrfs: send:
introduce recorded_ref_alloc and recorded_ref_free
Cc: <stable@vger.kernel.org> # 4.9: 3aa5bd367fa5: btrfs: send: fix
sending link commands for existing file paths
Cc: <stable@vger.kernel.org> # 4.9: 0d8869fb6b6f8: btrfs: send: always
use the rbtree based inode ref management infrastructure
Reviewed-by: Robbie Ko <robbieko@synology.com>
Signed-off-by: BingJing Chang <bingjingc@synology.com>
---
 fs/btrfs/send.c | 214 +++++++++++++++++++-----------------------------
 1 file changed, 85 insertions(+), 129 deletions(-)

Comments

Filipe Manana Aug. 11, 2022, noon UTC | #1
On Thu, Aug 11, 2022 at 11:09 AM bingjingc <bingjingc@synology.com> wrote:
>
> From: BingJing Chang <bingjingc@synology.com>
>
> There is a bug causing send failures when processing an orphan directory
> with no links. In commit 46b2f4590aab ("Btrfs: fix send failure when root
> has deleted files still open")', the orphan inode issue was addressed. The
> send operation fails with a ENOENT error because of any attempts to
> generate a path for the inode with a link count of zero. Therefore, in that
> patch, sctx->ignore_cur_inode was introduced to be set if the current inode
> has a link count of zero for bypassing some unnecessary steps. And a helper
> function btrfs_unlink_all_paths() was introduced and called to clean up old
> paths found in the parent snapshot. However, not only regular files but
> also directories can be orphan inodes. So if the send operation meets an
> orphan directory, it will issue a wrong unlink command for that directory
> now. Soon the receive operation fails with a EISDIR error. Besides, the
> send operation also fails with a ENOENT error later when it tries to
> generate a path of it.
>
> Similar example but making an orphan dir for an incremental send:
>
>   $ btrfs subvolume create vol
>   $ mkdir vol/dir
>   $ touch vol/dir/foo
>
>   $ btrfs subvolume snapshot -r vol snap1
>   $ btrfs subvolume snapshot -r vol snap2
>
>   # Turn the second snapshot to RW mode and delete the whole dir while
>   # holding an open file descriptor on it.
>   $ btrfs property set snap2 ro false
>   $ exec 73<snap2/dir
>   $ rm -rf snap2/dir
>
>   # Set the second snapshot back to RO mode and do an incremental send.
>   $ btrfs property set snap2 ro true
>   $ mkdir receive_dir
>   $ btrfs send snap2 -p snap1 | btrfs receive receive_dir/
>   At subvol snap2
>   At snapshot snap2
>   ERROR: send ioctl failed with -2: No such file or directory
>   ERROR: unlink dir failed. Is a directory
>
> Actually, orphan inodes are more common use cases in cascading backups.
> (Please see the illustration below.) In a cascading backup, a user wants
> to replicate a couple of snapshots from Machine A to Machine B and from
> Machine B to Machine C. Machine B doesn't take any RO snapshots for
> sending. All a receiver does is create an RW snapshot of its parent
> snapshot, apply the send stream and turn it into RO mode at the end. Even
> if all paths of some inodes are deleted in applying the send stream, these
> inodes would not be deleted and become orphans after changing the subvolume
> from RW to RO. Moreover, orphan inodes can occur not only in send snapshots
> but also in parent snapshots because Machine B may do a batch replication
> of a couple of snapshots.
>
> An illustration for cascading backups:
> Machine A (snapshot {1..n}) --> Machine B --> Machine C
>
> The intuition to solve the problem is to delete all the items of orphan
> inodes before using these snapshots for sending. I used to think that the
> reasonable timing for doing that is during the ioctl of changing the
> subvolume from RW to RO because it sounds good that we will not modify the
> fs tree of a RO snapshot anymore. However, attempting to do the orphan
> cleanup in the ioctl would be pointless. Because if someone is holding an
> open file descriptor on the inode, the reference count of the inode will
> never drop to 0. Then iput() cannot trigger eviction, which finally deletes
> all the items of it. So we try to extend the original patch to handle
> orphans in send/parent snapshots. Here are several cases that need to be
> considered:
>
> Case 1: BTRFS_COMPARE_TREE_NEW
>        |  send snapshot  | action
>  --------------------------------
>  nlink |        0        | ignore
>
> In case 1, when we get a BTRFS_COMPARE_TREE_NEW tree comparison result,
> it means that a new inode is found in the send snapshot and it doesn't
> appear in the parent snapshot. Since this inode has a link count of zero
> (It's an orphan and there're no paths for it.), we can leverage
> sctx->ignore_cur_inode in the original patch to prevent it from being
> created.
>
> Case 2: BTRFS_COMPARE_TREE_DELETED
>        | parent snapshot | action
>  ----------------------------------
>  nlink |        0        | as usual
>
> In case 2, when we get a BTRFS_COMPARE_TREE_DELETED tree comparison
> result, it means that the inode only appears in the parent snapshot.
> As usual, the send operation will try to delete all its paths. However,
> this inode has a link count of zero, so no paths of it will be found. No
> deletion operations will be issued. We don't need to change any logic.
>
> Case 3: BTRFS_COMPARE_TREE_CHANGED
>            |       | parent snapshot | send snapshot | action
>  -----------------------------------------------------------------------
>  subcase 1 | nlink |        0        |       0       | ignore
>  subcase 2 | nlink |       >0        |       0       | new_gen(deletion)
>  subcase 3 | nlink |        0        |      >0       | new_gen(creation)
>
> In case 3, when we get a BTRFS_COMPARE_TREE_CHANGED tree comparison result,
> it means that the inode appears in both snapshots. Here're three subcases.
>
> First, if the inode has link counts of zero in both snapshots. Since there
> are no paths for this inode in (source/destination) parent snapshots and we
> don't care about whether there is also an orphan inode in destination or
> not, we can set sctx->ignore_cur_inode on to prevent it from being created.
>
> For the second and the third subcases, if there're paths in one snapshot
> and there're no paths in the other snapshot for this inode. We can treat
> this inode as a new generation. We can also leverage the logic handling a
> new generation of an inode with small adjustments. Then it will delete all
> old paths and create a new inode with new attributes and paths only when
> there's a positive link count in the send snapshot. In subcase 2, the
> send operation only needs to delete all old paths as in the parent
> snapshot. But it may require more operations for a directory to remove its
> old paths. If a not-empty directory is going to be deleted (because it has
> a link count of zero in the send snapshot) but there're files/directories
> with bigger inode numbers under it, the send operation will need to rename
> it to its orphan name first. After processing and deleting the last item
> under this directory, the send operation will check this directory, aka
> the parent directory of the last item, again and issue a rmdir operation
> to remove it finally. Therefore, we also need to treat inodes with a link
> count of zero as if they didn't exist in get_cur_inode_state(), which is
> used in process_recorded_refs(). By doing this, when reviewing a directory
> with orphan names after the last item under it has been deleted, the send
> operation now can properly issue a rmdir operation. Otherwise, without
> doing this, the orphan directory with an orphan name would be kept here
> at the end due to the existing inode with a link count of zero being found.
> In subcase 3, as in case 2, no old paths would be found, so no deletion
> operations will be issued. The send operation will only create a new one
> for that inode.
>
> Note that subcase 3 is not a common case. That's because it's easy to
> reduce the hard links of an inode, but once all valid paths are removed,
> there're no valid paths for creating other hard links. The only way to do
> that is trying to send an older snapshot after a newer snapshot has been
> sent.
>
> Cc: <stable@vger.kernel.org> # 4.9: 46b2f4590aab: Btrfs: fix send
> failure when root has deleted files still open
> Cc: <stable@vger.kernel.org> # 4.9: 71ecfc133b03: btrfs: send:
> introduce recorded_ref_alloc and recorded_ref_free
> Cc: <stable@vger.kernel.org> # 4.9: 3aa5bd367fa5: btrfs: send: fix
> sending link commands for existing file paths
> Cc: <stable@vger.kernel.org> # 4.9: 0d8869fb6b6f8: btrfs: send: always
> use the rbtree based inode ref management infrastructure

Btw, lines with CC, Fixes, etc, tags should not be broken even if they
are wider than 74 characters.

So, in v1 when I gave you that example of CC stable tags, it wasn't
meant for you to literally copy-paste them.

First I asked if the purpose of the original Fixes tag was to backport
the fix to stable releases.
Was that the intention? You didn't provide an answer about that.

Then I told if that was the case, the proper way would be adding CC
stable tags and listing any
dependencies. I gave those 4 as examples with commits that are fairly
recent and obvious dependencies,
but I also said that probably there's a lot more missing - especially
if we want to backport to as far as 4.9.

Even with just those 4 dependencies, some of those commits are fairly
large, and that may be frowned upon
according to stable backport rules (listed at
https://www.kernel.org/doc/Documentation/process/stable-kernel-rules.rst).
For e.g., patches with over 100 lines changed.

Now, did you actually verify if there were more dependencies? (and test)
And do you really want to go as far as 4.9 (currently the oldest
stable release)?
I seriously doubt that those 4 commits are the only dependencies in
order to be able to cleanly backport to 4.9 and other old branches.

It may be better to backport only to a few younger stable branches, or
just provide later a version of the patch to
apply to each desired stable branch (once the fix is in Linus' tree
and in a -rc release).

If you are not interested in backporting to stable or don't have the
time to verify the dependencies and test, then just remove all the
stable tags.
Just leave a fixes tag:

Fixes: 31db9f7c23fbf7 ("Btrfs: introduce BTRFS_IOC_SEND for btrfs send/receive")

Also, please don't forget to send a test case for fstests, covering as
many cases as possible (not just the example
at the beginning of the changelog).

Thanks.

> Reviewed-by: Robbie Ko <robbieko@synology.com>
> Signed-off-by: BingJing Chang <bingjingc@synology.com>
> ---
>  fs/btrfs/send.c | 214 +++++++++++++++++++-----------------------------
>  1 file changed, 85 insertions(+), 129 deletions(-)
>
> diff --git a/fs/btrfs/send.c b/fs/btrfs/send.c
> index f8d77a33b9b7..6ab1ba66ff4b 100644
> --- a/fs/btrfs/send.c
> +++ b/fs/btrfs/send.c
> @@ -850,6 +850,7 @@ struct btrfs_inode_info {
>         u64 gid;
>         u64 rdev;
>         u64 attr;
> +       u64 nlink;
>  };
>
>  /*
> @@ -888,6 +889,7 @@ static int get_inode_info(struct btrfs_root *root, u64 ino,
>         info->uid = btrfs_inode_uid(path->nodes[0], ii);
>         info->gid = btrfs_inode_gid(path->nodes[0], ii);
>         info->rdev = btrfs_inode_rdev(path->nodes[0], ii);
> +       info->nlink = btrfs_inode_nlink(path->nodes[0], ii);
>         /*
>          * Transfer the unchanged u64 value of btrfs_inode_item::flags, that's
>          * otherwise logically split to 32/32 parts.
> @@ -1652,19 +1654,22 @@ static int get_cur_inode_state(struct send_ctx *sctx, u64 ino, u64 gen)
>         int right_ret;
>         u64 left_gen;
>         u64 right_gen;
> +       struct btrfs_inode_info info;
>
> -       ret = get_inode_gen(sctx->send_root, ino, &left_gen);
> +       ret = get_inode_info(sctx->send_root, ino, &info);
>         if (ret < 0 && ret != -ENOENT)
>                 goto out;
> -       left_ret = ret;
> +       left_ret = (info.nlink == 0) ? -ENOENT : ret;
> +       left_gen = info.gen;
>
>         if (!sctx->parent_root) {
>                 right_ret = -ENOENT;
>         } else {
> -               ret = get_inode_gen(sctx->parent_root, ino, &right_gen);
> +               ret = get_inode_info(sctx->parent_root, ino, &info);
>                 if (ret < 0 && ret != -ENOENT)
>                         goto out;
> -               right_ret = ret;
> +               right_ret = (info.nlink == 0) ? -ENOENT : ret;
> +               right_gen = info.gen;
>         }
>
>         if (!left_ret && !right_ret) {
> @@ -6413,86 +6418,6 @@ static int finish_inode_if_needed(struct send_ctx *sctx, int at_end)
>         return ret;
>  }
>
> -struct parent_paths_ctx {
> -       struct list_head *refs;
> -       struct send_ctx *sctx;
> -};
> -
> -static int record_parent_ref(int num, u64 dir, int index, struct fs_path *name,
> -                            void *ctx)
> -{
> -       struct parent_paths_ctx *ppctx = ctx;
> -
> -       /*
> -        * Pass 0 as the generation for the directory, we don't care about it
> -        * here as we have no new references to add, we just want to delete all
> -        * references for an inode.
> -        */
> -       return record_ref_in_tree(&ppctx->sctx->rbtree_deleted_refs, ppctx->refs,
> -                                 name, dir, 0, ppctx->sctx);
> -}
> -
> -/*
> - * Issue unlink operations for all paths of the current inode found in the
> - * parent snapshot.
> - */
> -static int btrfs_unlink_all_paths(struct send_ctx *sctx)
> -{
> -       LIST_HEAD(deleted_refs);
> -       struct btrfs_path *path;
> -       struct btrfs_root *root = sctx->parent_root;
> -       struct btrfs_key key;
> -       struct btrfs_key found_key;
> -       struct parent_paths_ctx ctx;
> -       int iter_ret = 0;
> -       int ret;
> -
> -       path = alloc_path_for_send();
> -       if (!path)
> -               return -ENOMEM;
> -
> -       key.objectid = sctx->cur_ino;
> -       key.type = BTRFS_INODE_REF_KEY;
> -       key.offset = 0;
> -
> -       ctx.refs = &deleted_refs;
> -       ctx.sctx = sctx;
> -
> -       btrfs_for_each_slot(root, &key, &found_key, path, iter_ret) {
> -               if (found_key.objectid != key.objectid)
> -                       break;
> -               if (found_key.type != key.type &&
> -                   found_key.type != BTRFS_INODE_EXTREF_KEY)
> -                       break;
> -
> -               ret = iterate_inode_ref(root, path, &found_key, 1,
> -                                       record_parent_ref, &ctx);
> -               if (ret < 0)
> -                       goto out;
> -       }
> -       /* Catch error found during iteration */
> -       if (iter_ret < 0) {
> -               ret = iter_ret;
> -               goto out;
> -       }
> -
> -       while (!list_empty(&deleted_refs)) {
> -               struct recorded_ref *ref;
> -
> -               ref = list_first_entry(&deleted_refs, struct recorded_ref, list);
> -               ret = send_unlink(sctx, ref->full_path);
> -               if (ret < 0)
> -                       goto out;
> -               recorded_ref_free(ref);
> -       }
> -       ret = 0;
> -out:
> -       btrfs_free_path(path);
> -       if (ret)
> -               __free_recorded_refs(&deleted_refs);
> -       return ret;
> -}
> -
>  static void close_current_inode(struct send_ctx *sctx)
>  {
>         u64 i_size;
> @@ -6583,25 +6508,37 @@ static int changed_inode(struct send_ctx *sctx,
>          * file descriptor against it or turning a RO snapshot into RW mode,
>          * keep an open file descriptor against a file, delete it and then
>          * turn the snapshot back to RO mode before using it for a send
> -        * operation. So if we find such cases, ignore the inode and all its
> -        * items completely if it's a new inode, or if it's a changed inode
> -        * make sure all its previous paths (from the parent snapshot) are all
> -        * unlinked and all other the inode items are ignored.
> +        * operation. The former is what the receiver operation does.
> +        * Therefore, if we want to send these snapshots soon after they're
> +        * received, we need to handle orphan inodes as well. Moreover,
> +        * orphans can appear not only in the send snapshot but also in the
> +        * parent snapshot. Here are several cases:
> +        *
> +        * Case 1: BTRFS_COMPARE_TREE_NEW
> +        *       |  send snapshot  | action
> +        * --------------------------------
> +        * nlink |        0        | ignore
> +        *
> +        * Case 2: BTRFS_COMPARE_TREE_DELETED
> +        *       | parent snapshot | action
> +        * ----------------------------------
> +        * nlink |        0        | as usual
> +        * Note: No unlinks will be sent because there're no paths for it.
> +        *
> +        * Case 3: BTRFS_COMPARE_TREE_CHANGED
> +        *           |       | parent snapshot | send snapshot | action
> +        * -----------------------------------------------------------------------
> +        * subcase 1 | nlink |        0        |       0       | ignore
> +        * subcase 2 | nlink |       >0        |       0       | new_gen(deletion)
> +        * subcase 3 | nlink |        0        |      >0       | new_gen(creation)
> +        *
>          */
> -       if (result == BTRFS_COMPARE_TREE_NEW ||
> -           result == BTRFS_COMPARE_TREE_CHANGED) {
> -               u32 nlinks;
> -
> -               nlinks = btrfs_inode_nlink(sctx->left_path->nodes[0], left_ii);
> -               if (nlinks == 0) {
> +       if (result == BTRFS_COMPARE_TREE_NEW) {
> +               if (btrfs_inode_nlink(sctx->left_path->nodes[0], left_ii) ==
> +                                     0) {
>                         sctx->ignore_cur_inode = true;
> -                       if (result == BTRFS_COMPARE_TREE_CHANGED)
> -                               ret = btrfs_unlink_all_paths(sctx);
>                         goto out;
>                 }
> -       }
> -
> -       if (result == BTRFS_COMPARE_TREE_NEW) {
>                 sctx->cur_inode_gen = left_gen;
>                 sctx->cur_inode_new = true;
>                 sctx->cur_inode_deleted = false;
> @@ -6622,6 +6559,18 @@ static int changed_inode(struct send_ctx *sctx,
>                 sctx->cur_inode_mode = btrfs_inode_mode(
>                                 sctx->right_path->nodes[0], right_ii);
>         } else if (result == BTRFS_COMPARE_TREE_CHANGED) {
> +               u32 new_nlinks, old_nlinks;
> +
> +               new_nlinks = btrfs_inode_nlink(sctx->left_path->nodes[0],
> +                                              left_ii);
> +               old_nlinks = btrfs_inode_nlink(sctx->right_path->nodes[0],
> +                                              right_ii);
> +               if (new_nlinks == 0 && old_nlinks == 0) {
> +                       sctx->ignore_cur_inode = true;
> +                       goto out;
> +               } else if (new_nlinks == 0 || old_nlinks == 0) {
> +                       sctx->cur_inode_new_gen = 1;
> +               }
>                 /*
>                  * We need to do some special handling in case the inode was
>                  * reported as changed with a changed generation number. This
> @@ -6648,38 +6597,45 @@ static int changed_inode(struct send_ctx *sctx,
>                         /*
>                          * Now process the inode as if it was new.
>                          */
> -                       sctx->cur_inode_gen = left_gen;
> -                       sctx->cur_inode_new = true;
> -                       sctx->cur_inode_deleted = false;
> -                       sctx->cur_inode_size = btrfs_inode_size(
> -                                       sctx->left_path->nodes[0], left_ii);
> -                       sctx->cur_inode_mode = btrfs_inode_mode(
> -                                       sctx->left_path->nodes[0], left_ii);
> -                       sctx->cur_inode_rdev = btrfs_inode_rdev(
> -                                       sctx->left_path->nodes[0], left_ii);
> -                       ret = send_create_inode_if_needed(sctx);
> -                       if (ret < 0)
> -                               goto out;
> +                       if (new_nlinks > 0) {
> +                               sctx->cur_inode_gen = left_gen;
> +                               sctx->cur_inode_new = true;
> +                               sctx->cur_inode_deleted = false;
> +                               sctx->cur_inode_size = btrfs_inode_size(
> +                                               sctx->left_path->nodes[0],
> +                                               left_ii);
> +                               sctx->cur_inode_mode = btrfs_inode_mode(
> +                                               sctx->left_path->nodes[0],
> +                                               left_ii);
> +                               sctx->cur_inode_rdev = btrfs_inode_rdev(
> +                                               sctx->left_path->nodes[0],
> +                                               left_ii);
> +                               ret = send_create_inode_if_needed(sctx);
> +                               if (ret < 0)
> +                                       goto out;
>
> -                       ret = process_all_refs(sctx, BTRFS_COMPARE_TREE_NEW);
> -                       if (ret < 0)
> -                               goto out;
> -                       /*
> -                        * Advance send_progress now as we did not get into
> -                        * process_recorded_refs_if_needed in the new_gen case.
> -                        */
> -                       sctx->send_progress = sctx->cur_ino + 1;
> +                               ret = process_all_refs(sctx,
> +                                               BTRFS_COMPARE_TREE_NEW);
> +                               if (ret < 0)
> +                                       goto out;
> +                               /*
> +                                * Advance send_progress now as we did not get
> +                                * into process_recorded_refs_if_needed in the
> +                                * new_gen case.
> +                                */
> +                               sctx->send_progress = sctx->cur_ino + 1;
>
> -                       /*
> -                        * Now process all extents and xattrs of the inode as if
> -                        * they were all new.
> -                        */
> -                       ret = process_all_extents(sctx);
> -                       if (ret < 0)
> -                               goto out;
> -                       ret = process_all_new_xattrs(sctx);
> -                       if (ret < 0)
> -                               goto out;
> +                               /*
> +                                * Now process all extents and xattrs of the
> +                                * inode as if they were all new.
> +                                */
> +                               ret = process_all_extents(sctx);
> +                               if (ret < 0)
> +                                       goto out;
> +                               ret = process_all_new_xattrs(sctx);
> +                               if (ret < 0)
> +                                       goto out;
> +                       }
>                 } else {
>                         sctx->cur_inode_gen = left_gen;
>                         sctx->cur_inode_new = false;
> --
> 2.37.1
>
bingjing chang Aug. 12, 2022, 2:36 p.m. UTC | #2
Filipe Manana <fdmanana@kernel.org> 於 2022年8月11日 週四 晚上8:00寫道:
>
> On Thu, Aug 11, 2022 at 11:09 AM bingjingc <bingjingc@synology.com> wrote:
> >
> > From: BingJing Chang <bingjingc@synology.com>
> >
> > There is a bug causing send failures when processing an orphan directory
> > with no links. In commit 46b2f4590aab ("Btrfs: fix send failure when root
> > has deleted files still open")', the orphan inode issue was addressed. The
> > send operation fails with a ENOENT error because of any attempts to
> > generate a path for the inode with a link count of zero. Therefore, in that
> > patch, sctx->ignore_cur_inode was introduced to be set if the current inode
> > has a link count of zero for bypassing some unnecessary steps. And a helper
> > function btrfs_unlink_all_paths() was introduced and called to clean up old
> > paths found in the parent snapshot. However, not only regular files but
> > also directories can be orphan inodes. So if the send operation meets an
> > orphan directory, it will issue a wrong unlink command for that directory
> > now. Soon the receive operation fails with a EISDIR error. Besides, the
> > send operation also fails with a ENOENT error later when it tries to
> > generate a path of it.
> >
> > Similar example but making an orphan dir for an incremental send:
> >
> >   $ btrfs subvolume create vol
> >   $ mkdir vol/dir
> >   $ touch vol/dir/foo
> >
> >   $ btrfs subvolume snapshot -r vol snap1
> >   $ btrfs subvolume snapshot -r vol snap2
> >
> >   # Turn the second snapshot to RW mode and delete the whole dir while
> >   # holding an open file descriptor on it.
> >   $ btrfs property set snap2 ro false
> >   $ exec 73<snap2/dir
> >   $ rm -rf snap2/dir
> >
> >   # Set the second snapshot back to RO mode and do an incremental send.
> >   $ btrfs property set snap2 ro true
> >   $ mkdir receive_dir
> >   $ btrfs send snap2 -p snap1 | btrfs receive receive_dir/
> >   At subvol snap2
> >   At snapshot snap2
> >   ERROR: send ioctl failed with -2: No such file or directory
> >   ERROR: unlink dir failed. Is a directory
> >
> > Actually, orphan inodes are more common use cases in cascading backups.
> > (Please see the illustration below.) In a cascading backup, a user wants
> > to replicate a couple of snapshots from Machine A to Machine B and from
> > Machine B to Machine C. Machine B doesn't take any RO snapshots for
> > sending. All a receiver does is create an RW snapshot of its parent
> > snapshot, apply the send stream and turn it into RO mode at the end. Even
> > if all paths of some inodes are deleted in applying the send stream, these
> > inodes would not be deleted and become orphans after changing the subvolume
> > from RW to RO. Moreover, orphan inodes can occur not only in send snapshots
> > but also in parent snapshots because Machine B may do a batch replication
> > of a couple of snapshots.
> >
> > An illustration for cascading backups:
> > Machine A (snapshot {1..n}) --> Machine B --> Machine C
> >
> > The intuition to solve the problem is to delete all the items of orphan
> > inodes before using these snapshots for sending. I used to think that the
> > reasonable timing for doing that is during the ioctl of changing the
> > subvolume from RW to RO because it sounds good that we will not modify the
> > fs tree of a RO snapshot anymore. However, attempting to do the orphan
> > cleanup in the ioctl would be pointless. Because if someone is holding an
> > open file descriptor on the inode, the reference count of the inode will
> > never drop to 0. Then iput() cannot trigger eviction, which finally deletes
> > all the items of it. So we try to extend the original patch to handle
> > orphans in send/parent snapshots. Here are several cases that need to be
> > considered:
> >
> > Case 1: BTRFS_COMPARE_TREE_NEW
> >        |  send snapshot  | action
> >  --------------------------------
> >  nlink |        0        | ignore
> >
> > In case 1, when we get a BTRFS_COMPARE_TREE_NEW tree comparison result,
> > it means that a new inode is found in the send snapshot and it doesn't
> > appear in the parent snapshot. Since this inode has a link count of zero
> > (It's an orphan and there're no paths for it.), we can leverage
> > sctx->ignore_cur_inode in the original patch to prevent it from being
> > created.
> >
> > Case 2: BTRFS_COMPARE_TREE_DELETED
> >        | parent snapshot | action
> >  ----------------------------------
> >  nlink |        0        | as usual
> >
> > In case 2, when we get a BTRFS_COMPARE_TREE_DELETED tree comparison
> > result, it means that the inode only appears in the parent snapshot.
> > As usual, the send operation will try to delete all its paths. However,
> > this inode has a link count of zero, so no paths of it will be found. No
> > deletion operations will be issued. We don't need to change any logic.
> >
> > Case 3: BTRFS_COMPARE_TREE_CHANGED
> >            |       | parent snapshot | send snapshot | action
> >  -----------------------------------------------------------------------
> >  subcase 1 | nlink |        0        |       0       | ignore
> >  subcase 2 | nlink |       >0        |       0       | new_gen(deletion)
> >  subcase 3 | nlink |        0        |      >0       | new_gen(creation)
> >
> > In case 3, when we get a BTRFS_COMPARE_TREE_CHANGED tree comparison result,
> > it means that the inode appears in both snapshots. Here're three subcases.
> >
> > First, if the inode has link counts of zero in both snapshots. Since there
> > are no paths for this inode in (source/destination) parent snapshots and we
> > don't care about whether there is also an orphan inode in destination or
> > not, we can set sctx->ignore_cur_inode on to prevent it from being created.
> >
> > For the second and the third subcases, if there're paths in one snapshot
> > and there're no paths in the other snapshot for this inode. We can treat
> > this inode as a new generation. We can also leverage the logic handling a
> > new generation of an inode with small adjustments. Then it will delete all
> > old paths and create a new inode with new attributes and paths only when
> > there's a positive link count in the send snapshot. In subcase 2, the
> > send operation only needs to delete all old paths as in the parent
> > snapshot. But it may require more operations for a directory to remove its
> > old paths. If a not-empty directory is going to be deleted (because it has
> > a link count of zero in the send snapshot) but there're files/directories
> > with bigger inode numbers under it, the send operation will need to rename
> > it to its orphan name first. After processing and deleting the last item
> > under this directory, the send operation will check this directory, aka
> > the parent directory of the last item, again and issue a rmdir operation
> > to remove it finally. Therefore, we also need to treat inodes with a link
> > count of zero as if they didn't exist in get_cur_inode_state(), which is
> > used in process_recorded_refs(). By doing this, when reviewing a directory
> > with orphan names after the last item under it has been deleted, the send
> > operation now can properly issue a rmdir operation. Otherwise, without
> > doing this, the orphan directory with an orphan name would be kept here
> > at the end due to the existing inode with a link count of zero being found.
> > In subcase 3, as in case 2, no old paths would be found, so no deletion
> > operations will be issued. The send operation will only create a new one
> > for that inode.
> >
> > Note that subcase 3 is not a common case. That's because it's easy to
> > reduce the hard links of an inode, but once all valid paths are removed,
> > there're no valid paths for creating other hard links. The only way to do
> > that is trying to send an older snapshot after a newer snapshot has been
> > sent.
> >
> > Cc: <stable@vger.kernel.org> # 4.9: 46b2f4590aab: Btrfs: fix send
> > failure when root has deleted files still open
> > Cc: <stable@vger.kernel.org> # 4.9: 71ecfc133b03: btrfs: send:
> > introduce recorded_ref_alloc and recorded_ref_free
> > Cc: <stable@vger.kernel.org> # 4.9: 3aa5bd367fa5: btrfs: send: fix
> > sending link commands for existing file paths
> > Cc: <stable@vger.kernel.org> # 4.9: 0d8869fb6b6f8: btrfs: send: always
> > use the rbtree based inode ref management infrastructure
>
> Btw, lines with CC, Fixes, etc, tags should not be broken even if they
> are wider than 74 characters.
>

Okay, thank you for telling me that.

> So, in v1 when I gave you that example of CC stable tags, it wasn't
> meant for you to literally copy-paste them.
>
> First I asked if the purpose of the original Fixes tag was to backport
> the fix to stable releases.
> Was that the intention? You didn't provide an answer about that.
>

Oh, I misunderstood your suggestion. I'm sorry about that.
Our intention is to report this bug and try to provide a reasonable and
acceptable fix for it. Backporting is not our goal.

> Then I told if that was the case, the proper way would be adding CC
> stable tags and listing any
> dependencies. I gave those 4 as examples with commits that are fairly
> recent and obvious dependencies,
> but I also said that probably there's a lot more missing - especially
> if we want to backport to as far as 4.9.
>

> Even with just those 4 dependencies, some of those commits are fairly
> large, and that may be frowned upon
> according to stable backport rules (listed at
> https://www.kernel.org/doc/Documentation/process/stable-kernel-rules.rst).
> For e.g., patches with over 100 lines changed.
>
> Now, did you actually verify if there were more dependencies? (and test)
> And do you really want to go as far as 4.9 (currently the oldest
> stable release)?

No, I didn't. I used to think the CC tag was a very cool feature, which
just putting a few commits lets backport easily when I read your mail,
so I copied and pasted these 4 commits in the beginning of revising
the patch v2. However, I'm wrong.

> I seriously doubt that those 4 commits are the only dependencies in
> order to be able to cleanly backport to 4.9 and other old branches.
>
> It may be better to backport only to a few younger stable branches, or
> just provide later a version of the patch to
> apply to each desired stable branch (once the fix is in Linus' tree
> and in a -rc release).
>
> If you are not interested in backporting to stable or don't have the
> time to verify the dependencies and test, then just remove all the
> stable tags.
> Just leave a fixes tag:
>
> Fixes: 31db9f7c23fbf7 ("Btrfs: introduce BTRFS_IOC_SEND for btrfs send/receive")
>

Since backporting is not our goal. I will just leave the fix tag here.

> Also, please don't forget to send a test case for fstests, covering as
> many cases as possible (not just the example
> at the beginning of the changelog).
>

Okay, I will submit a test case covering all cases.
Because I still need to spend time learning how to use the fssum utility
for the last test case you reviewed, so I will submit the test case later.

Thanks.

> Thanks.
>
> > Reviewed-by: Robbie Ko <robbieko@synology.com>
> > Signed-off-by: BingJing Chang <bingjingc@synology.com>
> > ---
> >  fs/btrfs/send.c | 214 +++++++++++++++++++-----------------------------
> >  1 file changed, 85 insertions(+), 129 deletions(-)
> >
> > diff --git a/fs/btrfs/send.c b/fs/btrfs/send.c
> > index f8d77a33b9b7..6ab1ba66ff4b 100644
> > --- a/fs/btrfs/send.c
> > +++ b/fs/btrfs/send.c
> > @@ -850,6 +850,7 @@ struct btrfs_inode_info {
> >         u64 gid;
> >         u64 rdev;
> >         u64 attr;
> > +       u64 nlink;
> >  };
> >
> >  /*
> > @@ -888,6 +889,7 @@ static int get_inode_info(struct btrfs_root *root, u64 ino,
> >         info->uid = btrfs_inode_uid(path->nodes[0], ii);
> >         info->gid = btrfs_inode_gid(path->nodes[0], ii);
> >         info->rdev = btrfs_inode_rdev(path->nodes[0], ii);
> > +       info->nlink = btrfs_inode_nlink(path->nodes[0], ii);
> >         /*
> >          * Transfer the unchanged u64 value of btrfs_inode_item::flags, that's
> >          * otherwise logically split to 32/32 parts.
> > @@ -1652,19 +1654,22 @@ static int get_cur_inode_state(struct send_ctx *sctx, u64 ino, u64 gen)
> >         int right_ret;
> >         u64 left_gen;
> >         u64 right_gen;
> > +       struct btrfs_inode_info info;
> >
> > -       ret = get_inode_gen(sctx->send_root, ino, &left_gen);
> > +       ret = get_inode_info(sctx->send_root, ino, &info);
> >         if (ret < 0 && ret != -ENOENT)
> >                 goto out;
> > -       left_ret = ret;
> > +       left_ret = (info.nlink == 0) ? -ENOENT : ret;
> > +       left_gen = info.gen;
> >
> >         if (!sctx->parent_root) {
> >                 right_ret = -ENOENT;
> >         } else {
> > -               ret = get_inode_gen(sctx->parent_root, ino, &right_gen);
> > +               ret = get_inode_info(sctx->parent_root, ino, &info);
> >                 if (ret < 0 && ret != -ENOENT)
> >                         goto out;
> > -               right_ret = ret;
> > +               right_ret = (info.nlink == 0) ? -ENOENT : ret;
> > +               right_gen = info.gen;
> >         }
> >
> >         if (!left_ret && !right_ret) {
> > @@ -6413,86 +6418,6 @@ static int finish_inode_if_needed(struct send_ctx *sctx, int at_end)
> >         return ret;
> >  }
> >
> > -struct parent_paths_ctx {
> > -       struct list_head *refs;
> > -       struct send_ctx *sctx;
> > -};
> > -
> > -static int record_parent_ref(int num, u64 dir, int index, struct fs_path *name,
> > -                            void *ctx)
> > -{
> > -       struct parent_paths_ctx *ppctx = ctx;
> > -
> > -       /*
> > -        * Pass 0 as the generation for the directory, we don't care about it
> > -        * here as we have no new references to add, we just want to delete all
> > -        * references for an inode.
> > -        */
> > -       return record_ref_in_tree(&ppctx->sctx->rbtree_deleted_refs, ppctx->refs,
> > -                                 name, dir, 0, ppctx->sctx);
> > -}
> > -
> > -/*
> > - * Issue unlink operations for all paths of the current inode found in the
> > - * parent snapshot.
> > - */
> > -static int btrfs_unlink_all_paths(struct send_ctx *sctx)
> > -{
> > -       LIST_HEAD(deleted_refs);
> > -       struct btrfs_path *path;
> > -       struct btrfs_root *root = sctx->parent_root;
> > -       struct btrfs_key key;
> > -       struct btrfs_key found_key;
> > -       struct parent_paths_ctx ctx;
> > -       int iter_ret = 0;
> > -       int ret;
> > -
> > -       path = alloc_path_for_send();
> > -       if (!path)
> > -               return -ENOMEM;
> > -
> > -       key.objectid = sctx->cur_ino;
> > -       key.type = BTRFS_INODE_REF_KEY;
> > -       key.offset = 0;
> > -
> > -       ctx.refs = &deleted_refs;
> > -       ctx.sctx = sctx;
> > -
> > -       btrfs_for_each_slot(root, &key, &found_key, path, iter_ret) {
> > -               if (found_key.objectid != key.objectid)
> > -                       break;
> > -               if (found_key.type != key.type &&
> > -                   found_key.type != BTRFS_INODE_EXTREF_KEY)
> > -                       break;
> > -
> > -               ret = iterate_inode_ref(root, path, &found_key, 1,
> > -                                       record_parent_ref, &ctx);
> > -               if (ret < 0)
> > -                       goto out;
> > -       }
> > -       /* Catch error found during iteration */
> > -       if (iter_ret < 0) {
> > -               ret = iter_ret;
> > -               goto out;
> > -       }
> > -
> > -       while (!list_empty(&deleted_refs)) {
> > -               struct recorded_ref *ref;
> > -
> > -               ref = list_first_entry(&deleted_refs, struct recorded_ref, list);
> > -               ret = send_unlink(sctx, ref->full_path);
> > -               if (ret < 0)
> > -                       goto out;
> > -               recorded_ref_free(ref);
> > -       }
> > -       ret = 0;
> > -out:
> > -       btrfs_free_path(path);
> > -       if (ret)
> > -               __free_recorded_refs(&deleted_refs);
> > -       return ret;
> > -}
> > -
> >  static void close_current_inode(struct send_ctx *sctx)
> >  {
> >         u64 i_size;
> > @@ -6583,25 +6508,37 @@ static int changed_inode(struct send_ctx *sctx,
> >          * file descriptor against it or turning a RO snapshot into RW mode,
> >          * keep an open file descriptor against a file, delete it and then
> >          * turn the snapshot back to RO mode before using it for a send
> > -        * operation. So if we find such cases, ignore the inode and all its
> > -        * items completely if it's a new inode, or if it's a changed inode
> > -        * make sure all its previous paths (from the parent snapshot) are all
> > -        * unlinked and all other the inode items are ignored.
> > +        * operation. The former is what the receiver operation does.
> > +        * Therefore, if we want to send these snapshots soon after they're
> > +        * received, we need to handle orphan inodes as well. Moreover,
> > +        * orphans can appear not only in the send snapshot but also in the
> > +        * parent snapshot. Here are several cases:
> > +        *
> > +        * Case 1: BTRFS_COMPARE_TREE_NEW
> > +        *       |  send snapshot  | action
> > +        * --------------------------------
> > +        * nlink |        0        | ignore
> > +        *
> > +        * Case 2: BTRFS_COMPARE_TREE_DELETED
> > +        *       | parent snapshot | action
> > +        * ----------------------------------
> > +        * nlink |        0        | as usual
> > +        * Note: No unlinks will be sent because there're no paths for it.
> > +        *
> > +        * Case 3: BTRFS_COMPARE_TREE_CHANGED
> > +        *           |       | parent snapshot | send snapshot | action
> > +        * -----------------------------------------------------------------------
> > +        * subcase 1 | nlink |        0        |       0       | ignore
> > +        * subcase 2 | nlink |       >0        |       0       | new_gen(deletion)
> > +        * subcase 3 | nlink |        0        |      >0       | new_gen(creation)
> > +        *
> >          */
> > -       if (result == BTRFS_COMPARE_TREE_NEW ||
> > -           result == BTRFS_COMPARE_TREE_CHANGED) {
> > -               u32 nlinks;
> > -
> > -               nlinks = btrfs_inode_nlink(sctx->left_path->nodes[0], left_ii);
> > -               if (nlinks == 0) {
> > +       if (result == BTRFS_COMPARE_TREE_NEW) {
> > +               if (btrfs_inode_nlink(sctx->left_path->nodes[0], left_ii) ==
> > +                                     0) {
> >                         sctx->ignore_cur_inode = true;
> > -                       if (result == BTRFS_COMPARE_TREE_CHANGED)
> > -                               ret = btrfs_unlink_all_paths(sctx);
> >                         goto out;
> >                 }
> > -       }
> > -
> > -       if (result == BTRFS_COMPARE_TREE_NEW) {
> >                 sctx->cur_inode_gen = left_gen;
> >                 sctx->cur_inode_new = true;
> >                 sctx->cur_inode_deleted = false;
> > @@ -6622,6 +6559,18 @@ static int changed_inode(struct send_ctx *sctx,
> >                 sctx->cur_inode_mode = btrfs_inode_mode(
> >                                 sctx->right_path->nodes[0], right_ii);
> >         } else if (result == BTRFS_COMPARE_TREE_CHANGED) {
> > +               u32 new_nlinks, old_nlinks;
> > +
> > +               new_nlinks = btrfs_inode_nlink(sctx->left_path->nodes[0],
> > +                                              left_ii);
> > +               old_nlinks = btrfs_inode_nlink(sctx->right_path->nodes[0],
> > +                                              right_ii);
> > +               if (new_nlinks == 0 && old_nlinks == 0) {
> > +                       sctx->ignore_cur_inode = true;
> > +                       goto out;
> > +               } else if (new_nlinks == 0 || old_nlinks == 0) {
> > +                       sctx->cur_inode_new_gen = 1;
> > +               }
> >                 /*
> >                  * We need to do some special handling in case the inode was
> >                  * reported as changed with a changed generation number. This
> > @@ -6648,38 +6597,45 @@ static int changed_inode(struct send_ctx *sctx,
> >                         /*
> >                          * Now process the inode as if it was new.
> >                          */
> > -                       sctx->cur_inode_gen = left_gen;
> > -                       sctx->cur_inode_new = true;
> > -                       sctx->cur_inode_deleted = false;
> > -                       sctx->cur_inode_size = btrfs_inode_size(
> > -                                       sctx->left_path->nodes[0], left_ii);
> > -                       sctx->cur_inode_mode = btrfs_inode_mode(
> > -                                       sctx->left_path->nodes[0], left_ii);
> > -                       sctx->cur_inode_rdev = btrfs_inode_rdev(
> > -                                       sctx->left_path->nodes[0], left_ii);
> > -                       ret = send_create_inode_if_needed(sctx);
> > -                       if (ret < 0)
> > -                               goto out;
> > +                       if (new_nlinks > 0) {
> > +                               sctx->cur_inode_gen = left_gen;
> > +                               sctx->cur_inode_new = true;
> > +                               sctx->cur_inode_deleted = false;
> > +                               sctx->cur_inode_size = btrfs_inode_size(
> > +                                               sctx->left_path->nodes[0],
> > +                                               left_ii);
> > +                               sctx->cur_inode_mode = btrfs_inode_mode(
> > +                                               sctx->left_path->nodes[0],
> > +                                               left_ii);
> > +                               sctx->cur_inode_rdev = btrfs_inode_rdev(
> > +                                               sctx->left_path->nodes[0],
> > +                                               left_ii);
> > +                               ret = send_create_inode_if_needed(sctx);
> > +                               if (ret < 0)
> > +                                       goto out;
> >
> > -                       ret = process_all_refs(sctx, BTRFS_COMPARE_TREE_NEW);
> > -                       if (ret < 0)
> > -                               goto out;
> > -                       /*
> > -                        * Advance send_progress now as we did not get into
> > -                        * process_recorded_refs_if_needed in the new_gen case.
> > -                        */
> > -                       sctx->send_progress = sctx->cur_ino + 1;
> > +                               ret = process_all_refs(sctx,
> > +                                               BTRFS_COMPARE_TREE_NEW);
> > +                               if (ret < 0)
> > +                                       goto out;
> > +                               /*
> > +                                * Advance send_progress now as we did not get
> > +                                * into process_recorded_refs_if_needed in the
> > +                                * new_gen case.
> > +                                */
> > +                               sctx->send_progress = sctx->cur_ino + 1;
> >
> > -                       /*
> > -                        * Now process all extents and xattrs of the inode as if
> > -                        * they were all new.
> > -                        */
> > -                       ret = process_all_extents(sctx);
> > -                       if (ret < 0)
> > -                               goto out;
> > -                       ret = process_all_new_xattrs(sctx);
> > -                       if (ret < 0)
> > -                               goto out;
> > +                               /*
> > +                                * Now process all extents and xattrs of the
> > +                                * inode as if they were all new.
> > +                                */
> > +                               ret = process_all_extents(sctx);
> > +                               if (ret < 0)
> > +                                       goto out;
> > +                               ret = process_all_new_xattrs(sctx);
> > +                               if (ret < 0)
> > +                                       goto out;
> > +                       }
> >                 } else {
> >                         sctx->cur_inode_gen = left_gen;
> >                         sctx->cur_inode_new = false;
> > --
> > 2.37.1
> >
Filipe Manana Aug. 12, 2022, 4:28 p.m. UTC | #3
On Fri, Aug 12, 2022 at 3:36 PM bingjing chang <bxxxjxxg@gmail.com> wrote:
>
> Filipe Manana <fdmanana@kernel.org> 於 2022年8月11日 週四 晚上8:00寫道:
> >
> > On Thu, Aug 11, 2022 at 11:09 AM bingjingc <bingjingc@synology.com> wrote:
> > >
> > > From: BingJing Chang <bingjingc@synology.com>
> > >
> > > There is a bug causing send failures when processing an orphan directory
> > > with no links. In commit 46b2f4590aab ("Btrfs: fix send failure when root
> > > has deleted files still open")', the orphan inode issue was addressed. The
> > > send operation fails with a ENOENT error because of any attempts to
> > > generate a path for the inode with a link count of zero. Therefore, in that
> > > patch, sctx->ignore_cur_inode was introduced to be set if the current inode
> > > has a link count of zero for bypassing some unnecessary steps. And a helper
> > > function btrfs_unlink_all_paths() was introduced and called to clean up old
> > > paths found in the parent snapshot. However, not only regular files but
> > > also directories can be orphan inodes. So if the send operation meets an
> > > orphan directory, it will issue a wrong unlink command for that directory
> > > now. Soon the receive operation fails with a EISDIR error. Besides, the
> > > send operation also fails with a ENOENT error later when it tries to
> > > generate a path of it.
> > >
> > > Similar example but making an orphan dir for an incremental send:
> > >
> > >   $ btrfs subvolume create vol
> > >   $ mkdir vol/dir
> > >   $ touch vol/dir/foo
> > >
> > >   $ btrfs subvolume snapshot -r vol snap1
> > >   $ btrfs subvolume snapshot -r vol snap2
> > >
> > >   # Turn the second snapshot to RW mode and delete the whole dir while
> > >   # holding an open file descriptor on it.
> > >   $ btrfs property set snap2 ro false
> > >   $ exec 73<snap2/dir
> > >   $ rm -rf snap2/dir
> > >
> > >   # Set the second snapshot back to RO mode and do an incremental send.
> > >   $ btrfs property set snap2 ro true
> > >   $ mkdir receive_dir
> > >   $ btrfs send snap2 -p snap1 | btrfs receive receive_dir/
> > >   At subvol snap2
> > >   At snapshot snap2
> > >   ERROR: send ioctl failed with -2: No such file or directory
> > >   ERROR: unlink dir failed. Is a directory
> > >
> > > Actually, orphan inodes are more common use cases in cascading backups.
> > > (Please see the illustration below.) In a cascading backup, a user wants
> > > to replicate a couple of snapshots from Machine A to Machine B and from
> > > Machine B to Machine C. Machine B doesn't take any RO snapshots for
> > > sending. All a receiver does is create an RW snapshot of its parent
> > > snapshot, apply the send stream and turn it into RO mode at the end. Even
> > > if all paths of some inodes are deleted in applying the send stream, these
> > > inodes would not be deleted and become orphans after changing the subvolume
> > > from RW to RO. Moreover, orphan inodes can occur not only in send snapshots
> > > but also in parent snapshots because Machine B may do a batch replication
> > > of a couple of snapshots.
> > >
> > > An illustration for cascading backups:
> > > Machine A (snapshot {1..n}) --> Machine B --> Machine C
> > >
> > > The intuition to solve the problem is to delete all the items of orphan
> > > inodes before using these snapshots for sending. I used to think that the
> > > reasonable timing for doing that is during the ioctl of changing the
> > > subvolume from RW to RO because it sounds good that we will not modify the
> > > fs tree of a RO snapshot anymore. However, attempting to do the orphan
> > > cleanup in the ioctl would be pointless. Because if someone is holding an
> > > open file descriptor on the inode, the reference count of the inode will
> > > never drop to 0. Then iput() cannot trigger eviction, which finally deletes
> > > all the items of it. So we try to extend the original patch to handle
> > > orphans in send/parent snapshots. Here are several cases that need to be
> > > considered:
> > >
> > > Case 1: BTRFS_COMPARE_TREE_NEW
> > >        |  send snapshot  | action
> > >  --------------------------------
> > >  nlink |        0        | ignore
> > >
> > > In case 1, when we get a BTRFS_COMPARE_TREE_NEW tree comparison result,
> > > it means that a new inode is found in the send snapshot and it doesn't
> > > appear in the parent snapshot. Since this inode has a link count of zero
> > > (It's an orphan and there're no paths for it.), we can leverage
> > > sctx->ignore_cur_inode in the original patch to prevent it from being
> > > created.
> > >
> > > Case 2: BTRFS_COMPARE_TREE_DELETED
> > >        | parent snapshot | action
> > >  ----------------------------------
> > >  nlink |        0        | as usual
> > >
> > > In case 2, when we get a BTRFS_COMPARE_TREE_DELETED tree comparison
> > > result, it means that the inode only appears in the parent snapshot.
> > > As usual, the send operation will try to delete all its paths. However,
> > > this inode has a link count of zero, so no paths of it will be found. No
> > > deletion operations will be issued. We don't need to change any logic.
> > >
> > > Case 3: BTRFS_COMPARE_TREE_CHANGED
> > >            |       | parent snapshot | send snapshot | action
> > >  -----------------------------------------------------------------------
> > >  subcase 1 | nlink |        0        |       0       | ignore
> > >  subcase 2 | nlink |       >0        |       0       | new_gen(deletion)
> > >  subcase 3 | nlink |        0        |      >0       | new_gen(creation)
> > >
> > > In case 3, when we get a BTRFS_COMPARE_TREE_CHANGED tree comparison result,
> > > it means that the inode appears in both snapshots. Here're three subcases.
> > >
> > > First, if the inode has link counts of zero in both snapshots. Since there
> > > are no paths for this inode in (source/destination) parent snapshots and we
> > > don't care about whether there is also an orphan inode in destination or
> > > not, we can set sctx->ignore_cur_inode on to prevent it from being created.
> > >
> > > For the second and the third subcases, if there're paths in one snapshot
> > > and there're no paths in the other snapshot for this inode. We can treat
> > > this inode as a new generation. We can also leverage the logic handling a
> > > new generation of an inode with small adjustments. Then it will delete all
> > > old paths and create a new inode with new attributes and paths only when
> > > there's a positive link count in the send snapshot. In subcase 2, the
> > > send operation only needs to delete all old paths as in the parent
> > > snapshot. But it may require more operations for a directory to remove its
> > > old paths. If a not-empty directory is going to be deleted (because it has
> > > a link count of zero in the send snapshot) but there're files/directories
> > > with bigger inode numbers under it, the send operation will need to rename
> > > it to its orphan name first. After processing and deleting the last item
> > > under this directory, the send operation will check this directory, aka
> > > the parent directory of the last item, again and issue a rmdir operation
> > > to remove it finally. Therefore, we also need to treat inodes with a link
> > > count of zero as if they didn't exist in get_cur_inode_state(), which is
> > > used in process_recorded_refs(). By doing this, when reviewing a directory
> > > with orphan names after the last item under it has been deleted, the send
> > > operation now can properly issue a rmdir operation. Otherwise, without
> > > doing this, the orphan directory with an orphan name would be kept here
> > > at the end due to the existing inode with a link count of zero being found.
> > > In subcase 3, as in case 2, no old paths would be found, so no deletion
> > > operations will be issued. The send operation will only create a new one
> > > for that inode.
> > >
> > > Note that subcase 3 is not a common case. That's because it's easy to
> > > reduce the hard links of an inode, but once all valid paths are removed,
> > > there're no valid paths for creating other hard links. The only way to do
> > > that is trying to send an older snapshot after a newer snapshot has been
> > > sent.
> > >
> > > Cc: <stable@vger.kernel.org> # 4.9: 46b2f4590aab: Btrfs: fix send
> > > failure when root has deleted files still open
> > > Cc: <stable@vger.kernel.org> # 4.9: 71ecfc133b03: btrfs: send:
> > > introduce recorded_ref_alloc and recorded_ref_free
> > > Cc: <stable@vger.kernel.org> # 4.9: 3aa5bd367fa5: btrfs: send: fix
> > > sending link commands for existing file paths
> > > Cc: <stable@vger.kernel.org> # 4.9: 0d8869fb6b6f8: btrfs: send: always
> > > use the rbtree based inode ref management infrastructure
> >
> > Btw, lines with CC, Fixes, etc, tags should not be broken even if they
> > are wider than 74 characters.
> >
>
> Okay, thank you for telling me that.
>
> > So, in v1 when I gave you that example of CC stable tags, it wasn't
> > meant for you to literally copy-paste them.
> >
> > First I asked if the purpose of the original Fixes tag was to backport
> > the fix to stable releases.
> > Was that the intention? You didn't provide an answer about that.
> >
>
> Oh, I misunderstood your suggestion. I'm sorry about that.
> Our intention is to report this bug and try to provide a reasonable and
> acceptable fix for it. Backporting is not our goal.
>
> > Then I told if that was the case, the proper way would be adding CC
> > stable tags and listing any
> > dependencies. I gave those 4 as examples with commits that are fairly
> > recent and obvious dependencies,
> > but I also said that probably there's a lot more missing - especially
> > if we want to backport to as far as 4.9.
> >
>
> > Even with just those 4 dependencies, some of those commits are fairly
> > large, and that may be frowned upon
> > according to stable backport rules (listed at
> > https://www.kernel.org/doc/Documentation/process/stable-kernel-rules.rst).
> > For e.g., patches with over 100 lines changed.
> >
> > Now, did you actually verify if there were more dependencies? (and test)
> > And do you really want to go as far as 4.9 (currently the oldest
> > stable release)?
>
> No, I didn't. I used to think the CC tag was a very cool feature, which
> just putting a few commits lets backport easily when I read your mail,
> so I copied and pasted these 4 commits in the beginning of revising
> the patch v2. However, I'm wrong.
>
> > I seriously doubt that those 4 commits are the only dependencies in
> > order to be able to cleanly backport to 4.9 and other old branches.
> >
> > It may be better to backport only to a few younger stable branches, or
> > just provide later a version of the patch to
> > apply to each desired stable branch (once the fix is in Linus' tree
> > and in a -rc release).
> >
> > If you are not interested in backporting to stable or don't have the
> > time to verify the dependencies and test, then just remove all the
> > stable tags.
> > Just leave a fixes tag:
> >
> > Fixes: 31db9f7c23fbf7 ("Btrfs: introduce BTRFS_IOC_SEND for btrfs send/receive")
> >
>
> Since backporting is not our goal. I will just leave the fix tag here.
>
> > Also, please don't forget to send a test case for fstests, covering as
> > many cases as possible (not just the example
> > at the beginning of the changelog).
> >
>
> Okay, I will submit a test case covering all cases.
> Because I still need to spend time learning how to use the fssum utility
> for the last test case you reviewed, so I will submit the test case later.

There's nothing special about fssum in order to use it.

Call it once on the original snapshot/subvolume to generate a digest
based on the tree and
save it to a file. The digest is based on all data and metadata of
every inode accessible through the given path.

After receiving the send stream, call fssum again to compute the
digest on the resulting snapshot and compare it
with the digest that was computed and saved before - if they match
then send/receive produced the correct
results, otherwise it didn't and the test should fail.

There are plenty of send/receive test cases using fssum, just look at
them (btrfs/241 for e.g.).

>
> Thanks.
>
> > Thanks.
> >
> > > Reviewed-by: Robbie Ko <robbieko@synology.com>
> > > Signed-off-by: BingJing Chang <bingjingc@synology.com>
> > > ---
> > >  fs/btrfs/send.c | 214 +++++++++++++++++++-----------------------------
> > >  1 file changed, 85 insertions(+), 129 deletions(-)
> > >
> > > diff --git a/fs/btrfs/send.c b/fs/btrfs/send.c
> > > index f8d77a33b9b7..6ab1ba66ff4b 100644
> > > --- a/fs/btrfs/send.c
> > > +++ b/fs/btrfs/send.c
> > > @@ -850,6 +850,7 @@ struct btrfs_inode_info {
> > >         u64 gid;
> > >         u64 rdev;
> > >         u64 attr;
> > > +       u64 nlink;
> > >  };
> > >
> > >  /*
> > > @@ -888,6 +889,7 @@ static int get_inode_info(struct btrfs_root *root, u64 ino,
> > >         info->uid = btrfs_inode_uid(path->nodes[0], ii);
> > >         info->gid = btrfs_inode_gid(path->nodes[0], ii);
> > >         info->rdev = btrfs_inode_rdev(path->nodes[0], ii);
> > > +       info->nlink = btrfs_inode_nlink(path->nodes[0], ii);
> > >         /*
> > >          * Transfer the unchanged u64 value of btrfs_inode_item::flags, that's
> > >          * otherwise logically split to 32/32 parts.
> > > @@ -1652,19 +1654,22 @@ static int get_cur_inode_state(struct send_ctx *sctx, u64 ino, u64 gen)
> > >         int right_ret;
> > >         u64 left_gen;
> > >         u64 right_gen;
> > > +       struct btrfs_inode_info info;
> > >
> > > -       ret = get_inode_gen(sctx->send_root, ino, &left_gen);
> > > +       ret = get_inode_info(sctx->send_root, ino, &info);
> > >         if (ret < 0 && ret != -ENOENT)
> > >                 goto out;
> > > -       left_ret = ret;
> > > +       left_ret = (info.nlink == 0) ? -ENOENT : ret;
> > > +       left_gen = info.gen;
> > >
> > >         if (!sctx->parent_root) {
> > >                 right_ret = -ENOENT;
> > >         } else {
> > > -               ret = get_inode_gen(sctx->parent_root, ino, &right_gen);
> > > +               ret = get_inode_info(sctx->parent_root, ino, &info);
> > >                 if (ret < 0 && ret != -ENOENT)
> > >                         goto out;
> > > -               right_ret = ret;
> > > +               right_ret = (info.nlink == 0) ? -ENOENT : ret;
> > > +               right_gen = info.gen;
> > >         }
> > >
> > >         if (!left_ret && !right_ret) {
> > > @@ -6413,86 +6418,6 @@ static int finish_inode_if_needed(struct send_ctx *sctx, int at_end)
> > >         return ret;
> > >  }
> > >
> > > -struct parent_paths_ctx {
> > > -       struct list_head *refs;
> > > -       struct send_ctx *sctx;
> > > -};
> > > -
> > > -static int record_parent_ref(int num, u64 dir, int index, struct fs_path *name,
> > > -                            void *ctx)
> > > -{
> > > -       struct parent_paths_ctx *ppctx = ctx;
> > > -
> > > -       /*
> > > -        * Pass 0 as the generation for the directory, we don't care about it
> > > -        * here as we have no new references to add, we just want to delete all
> > > -        * references for an inode.
> > > -        */
> > > -       return record_ref_in_tree(&ppctx->sctx->rbtree_deleted_refs, ppctx->refs,
> > > -                                 name, dir, 0, ppctx->sctx);
> > > -}
> > > -
> > > -/*
> > > - * Issue unlink operations for all paths of the current inode found in the
> > > - * parent snapshot.
> > > - */
> > > -static int btrfs_unlink_all_paths(struct send_ctx *sctx)
> > > -{
> > > -       LIST_HEAD(deleted_refs);
> > > -       struct btrfs_path *path;
> > > -       struct btrfs_root *root = sctx->parent_root;
> > > -       struct btrfs_key key;
> > > -       struct btrfs_key found_key;
> > > -       struct parent_paths_ctx ctx;
> > > -       int iter_ret = 0;
> > > -       int ret;
> > > -
> > > -       path = alloc_path_for_send();
> > > -       if (!path)
> > > -               return -ENOMEM;
> > > -
> > > -       key.objectid = sctx->cur_ino;
> > > -       key.type = BTRFS_INODE_REF_KEY;
> > > -       key.offset = 0;
> > > -
> > > -       ctx.refs = &deleted_refs;
> > > -       ctx.sctx = sctx;
> > > -
> > > -       btrfs_for_each_slot(root, &key, &found_key, path, iter_ret) {
> > > -               if (found_key.objectid != key.objectid)
> > > -                       break;
> > > -               if (found_key.type != key.type &&
> > > -                   found_key.type != BTRFS_INODE_EXTREF_KEY)
> > > -                       break;
> > > -
> > > -               ret = iterate_inode_ref(root, path, &found_key, 1,
> > > -                                       record_parent_ref, &ctx);
> > > -               if (ret < 0)
> > > -                       goto out;
> > > -       }
> > > -       /* Catch error found during iteration */
> > > -       if (iter_ret < 0) {
> > > -               ret = iter_ret;
> > > -               goto out;
> > > -       }
> > > -
> > > -       while (!list_empty(&deleted_refs)) {
> > > -               struct recorded_ref *ref;
> > > -
> > > -               ref = list_first_entry(&deleted_refs, struct recorded_ref, list);
> > > -               ret = send_unlink(sctx, ref->full_path);
> > > -               if (ret < 0)
> > > -                       goto out;
> > > -               recorded_ref_free(ref);
> > > -       }
> > > -       ret = 0;
> > > -out:
> > > -       btrfs_free_path(path);
> > > -       if (ret)
> > > -               __free_recorded_refs(&deleted_refs);
> > > -       return ret;
> > > -}
> > > -
> > >  static void close_current_inode(struct send_ctx *sctx)
> > >  {
> > >         u64 i_size;
> > > @@ -6583,25 +6508,37 @@ static int changed_inode(struct send_ctx *sctx,
> > >          * file descriptor against it or turning a RO snapshot into RW mode,
> > >          * keep an open file descriptor against a file, delete it and then
> > >          * turn the snapshot back to RO mode before using it for a send
> > > -        * operation. So if we find such cases, ignore the inode and all its
> > > -        * items completely if it's a new inode, or if it's a changed inode
> > > -        * make sure all its previous paths (from the parent snapshot) are all
> > > -        * unlinked and all other the inode items are ignored.
> > > +        * operation. The former is what the receiver operation does.
> > > +        * Therefore, if we want to send these snapshots soon after they're
> > > +        * received, we need to handle orphan inodes as well. Moreover,
> > > +        * orphans can appear not only in the send snapshot but also in the
> > > +        * parent snapshot. Here are several cases:
> > > +        *
> > > +        * Case 1: BTRFS_COMPARE_TREE_NEW
> > > +        *       |  send snapshot  | action
> > > +        * --------------------------------
> > > +        * nlink |        0        | ignore
> > > +        *
> > > +        * Case 2: BTRFS_COMPARE_TREE_DELETED
> > > +        *       | parent snapshot | action
> > > +        * ----------------------------------
> > > +        * nlink |        0        | as usual
> > > +        * Note: No unlinks will be sent because there're no paths for it.
> > > +        *
> > > +        * Case 3: BTRFS_COMPARE_TREE_CHANGED
> > > +        *           |       | parent snapshot | send snapshot | action
> > > +        * -----------------------------------------------------------------------
> > > +        * subcase 1 | nlink |        0        |       0       | ignore
> > > +        * subcase 2 | nlink |       >0        |       0       | new_gen(deletion)
> > > +        * subcase 3 | nlink |        0        |      >0       | new_gen(creation)
> > > +        *
> > >          */
> > > -       if (result == BTRFS_COMPARE_TREE_NEW ||
> > > -           result == BTRFS_COMPARE_TREE_CHANGED) {
> > > -               u32 nlinks;
> > > -
> > > -               nlinks = btrfs_inode_nlink(sctx->left_path->nodes[0], left_ii);
> > > -               if (nlinks == 0) {
> > > +       if (result == BTRFS_COMPARE_TREE_NEW) {
> > > +               if (btrfs_inode_nlink(sctx->left_path->nodes[0], left_ii) ==
> > > +                                     0) {
> > >                         sctx->ignore_cur_inode = true;
> > > -                       if (result == BTRFS_COMPARE_TREE_CHANGED)
> > > -                               ret = btrfs_unlink_all_paths(sctx);
> > >                         goto out;
> > >                 }
> > > -       }
> > > -
> > > -       if (result == BTRFS_COMPARE_TREE_NEW) {
> > >                 sctx->cur_inode_gen = left_gen;
> > >                 sctx->cur_inode_new = true;
> > >                 sctx->cur_inode_deleted = false;
> > > @@ -6622,6 +6559,18 @@ static int changed_inode(struct send_ctx *sctx,
> > >                 sctx->cur_inode_mode = btrfs_inode_mode(
> > >                                 sctx->right_path->nodes[0], right_ii);
> > >         } else if (result == BTRFS_COMPARE_TREE_CHANGED) {
> > > +               u32 new_nlinks, old_nlinks;
> > > +
> > > +               new_nlinks = btrfs_inode_nlink(sctx->left_path->nodes[0],
> > > +                                              left_ii);
> > > +               old_nlinks = btrfs_inode_nlink(sctx->right_path->nodes[0],
> > > +                                              right_ii);
> > > +               if (new_nlinks == 0 && old_nlinks == 0) {
> > > +                       sctx->ignore_cur_inode = true;
> > > +                       goto out;
> > > +               } else if (new_nlinks == 0 || old_nlinks == 0) {
> > > +                       sctx->cur_inode_new_gen = 1;
> > > +               }
> > >                 /*
> > >                  * We need to do some special handling in case the inode was
> > >                  * reported as changed with a changed generation number. This
> > > @@ -6648,38 +6597,45 @@ static int changed_inode(struct send_ctx *sctx,
> > >                         /*
> > >                          * Now process the inode as if it was new.
> > >                          */
> > > -                       sctx->cur_inode_gen = left_gen;
> > > -                       sctx->cur_inode_new = true;
> > > -                       sctx->cur_inode_deleted = false;
> > > -                       sctx->cur_inode_size = btrfs_inode_size(
> > > -                                       sctx->left_path->nodes[0], left_ii);
> > > -                       sctx->cur_inode_mode = btrfs_inode_mode(
> > > -                                       sctx->left_path->nodes[0], left_ii);
> > > -                       sctx->cur_inode_rdev = btrfs_inode_rdev(
> > > -                                       sctx->left_path->nodes[0], left_ii);
> > > -                       ret = send_create_inode_if_needed(sctx);
> > > -                       if (ret < 0)
> > > -                               goto out;
> > > +                       if (new_nlinks > 0) {
> > > +                               sctx->cur_inode_gen = left_gen;
> > > +                               sctx->cur_inode_new = true;
> > > +                               sctx->cur_inode_deleted = false;
> > > +                               sctx->cur_inode_size = btrfs_inode_size(
> > > +                                               sctx->left_path->nodes[0],
> > > +                                               left_ii);
> > > +                               sctx->cur_inode_mode = btrfs_inode_mode(
> > > +                                               sctx->left_path->nodes[0],
> > > +                                               left_ii);
> > > +                               sctx->cur_inode_rdev = btrfs_inode_rdev(
> > > +                                               sctx->left_path->nodes[0],
> > > +                                               left_ii);
> > > +                               ret = send_create_inode_if_needed(sctx);
> > > +                               if (ret < 0)
> > > +                                       goto out;
> > >
> > > -                       ret = process_all_refs(sctx, BTRFS_COMPARE_TREE_NEW);
> > > -                       if (ret < 0)
> > > -                               goto out;
> > > -                       /*
> > > -                        * Advance send_progress now as we did not get into
> > > -                        * process_recorded_refs_if_needed in the new_gen case.
> > > -                        */
> > > -                       sctx->send_progress = sctx->cur_ino + 1;
> > > +                               ret = process_all_refs(sctx,
> > > +                                               BTRFS_COMPARE_TREE_NEW);
> > > +                               if (ret < 0)
> > > +                                       goto out;
> > > +                               /*
> > > +                                * Advance send_progress now as we did not get
> > > +                                * into process_recorded_refs_if_needed in the
> > > +                                * new_gen case.
> > > +                                */
> > > +                               sctx->send_progress = sctx->cur_ino + 1;
> > >
> > > -                       /*
> > > -                        * Now process all extents and xattrs of the inode as if
> > > -                        * they were all new.
> > > -                        */
> > > -                       ret = process_all_extents(sctx);
> > > -                       if (ret < 0)
> > > -                               goto out;
> > > -                       ret = process_all_new_xattrs(sctx);
> > > -                       if (ret < 0)
> > > -                               goto out;
> > > +                               /*
> > > +                                * Now process all extents and xattrs of the
> > > +                                * inode as if they were all new.
> > > +                                */
> > > +                               ret = process_all_extents(sctx);
> > > +                               if (ret < 0)
> > > +                                       goto out;
> > > +                               ret = process_all_new_xattrs(sctx);
> > > +                               if (ret < 0)
> > > +                                       goto out;
> > > +                       }
> > >                 } else {
> > >                         sctx->cur_inode_gen = left_gen;
> > >                         sctx->cur_inode_new = false;
> > > --
> > > 2.37.1
> > >
David Sterba Aug. 18, 2022, 1:46 p.m. UTC | #4
On Fri, Aug 12, 2022 at 10:36:38PM +0800, bingjing chang wrote:
> > I seriously doubt that those 4 commits are the only dependencies in
> > order to be able to cleanly backport to 4.9 and other old branches.
> >
> > It may be better to backport only to a few younger stable branches, or
> > just provide later a version of the patch to
> > apply to each desired stable branch (once the fix is in Linus' tree
> > and in a -rc release).
> >
> > If you are not interested in backporting to stable or don't have the
> > time to verify the dependencies and test, then just remove all the
> > stable tags.
> > Just leave a fixes tag:
> >
> > Fixes: 31db9f7c23fbf7 ("Btrfs: introduce BTRFS_IOC_SEND for btrfs send/receive")
> 
> Since backporting is not our goal. I will just leave the fix tag here.

This Fixes: points to the original send patch, so that's not really
useful, otherwise if there's a target stable release where the patches
still apply cleanly, or with minimal conflicts it's sufficient to add a
CC: stable tag it's good to have it.
Filipe Manana Sept. 22, 2022, 10:08 a.m. UTC | #5
,

On Fri, Aug 12, 2022 at 3:36 PM bingjing chang <bxxxjxxg@gmail.com> wrote:
>
> Filipe Manana <fdmanana@kernel.org> 於 2022年8月11日 週四 晚上8:00寫道:
> >
> > On Thu, Aug 11, 2022 at 11:09 AM bingjingc <bingjingc@synology.com> wrote:
> > >
> > > From: BingJing Chang <bingjingc@synology.com>
> > >
> > > There is a bug causing send failures when processing an orphan directory
> > > with no links. In commit 46b2f4590aab ("Btrfs: fix send failure when root
> > > has deleted files still open")', the orphan inode issue was addressed. The
> > > send operation fails with a ENOENT error because of any attempts to
> > > generate a path for the inode with a link count of zero. Therefore, in that
> > > patch, sctx->ignore_cur_inode was introduced to be set if the current inode
> > > has a link count of zero for bypassing some unnecessary steps. And a helper
> > > function btrfs_unlink_all_paths() was introduced and called to clean up old
> > > paths found in the parent snapshot. However, not only regular files but
> > > also directories can be orphan inodes. So if the send operation meets an
> > > orphan directory, it will issue a wrong unlink command for that directory
> > > now. Soon the receive operation fails with a EISDIR error. Besides, the
> > > send operation also fails with a ENOENT error later when it tries to
> > > generate a path of it.
> > >
> > > Similar example but making an orphan dir for an incremental send:
> > >
> > >   $ btrfs subvolume create vol
> > >   $ mkdir vol/dir
> > >   $ touch vol/dir/foo
> > >
> > >   $ btrfs subvolume snapshot -r vol snap1
> > >   $ btrfs subvolume snapshot -r vol snap2
> > >
> > >   # Turn the second snapshot to RW mode and delete the whole dir while
> > >   # holding an open file descriptor on it.
> > >   $ btrfs property set snap2 ro false
> > >   $ exec 73<snap2/dir
> > >   $ rm -rf snap2/dir
> > >
> > >   # Set the second snapshot back to RO mode and do an incremental send.
> > >   $ btrfs property set snap2 ro true
> > >   $ mkdir receive_dir
> > >   $ btrfs send snap2 -p snap1 | btrfs receive receive_dir/
> > >   At subvol snap2
> > >   At snapshot snap2
> > >   ERROR: send ioctl failed with -2: No such file or directory
> > >   ERROR: unlink dir failed. Is a directory
> > >
> > > Actually, orphan inodes are more common use cases in cascading backups.
> > > (Please see the illustration below.) In a cascading backup, a user wants
> > > to replicate a couple of snapshots from Machine A to Machine B and from
> > > Machine B to Machine C. Machine B doesn't take any RO snapshots for
> > > sending. All a receiver does is create an RW snapshot of its parent
> > > snapshot, apply the send stream and turn it into RO mode at the end. Even
> > > if all paths of some inodes are deleted in applying the send stream, these
> > > inodes would not be deleted and become orphans after changing the subvolume
> > > from RW to RO. Moreover, orphan inodes can occur not only in send snapshots
> > > but also in parent snapshots because Machine B may do a batch replication
> > > of a couple of snapshots.
> > >
> > > An illustration for cascading backups:
> > > Machine A (snapshot {1..n}) --> Machine B --> Machine C
> > >
> > > The intuition to solve the problem is to delete all the items of orphan
> > > inodes before using these snapshots for sending. I used to think that the
> > > reasonable timing for doing that is during the ioctl of changing the
> > > subvolume from RW to RO because it sounds good that we will not modify the
> > > fs tree of a RO snapshot anymore. However, attempting to do the orphan
> > > cleanup in the ioctl would be pointless. Because if someone is holding an
> > > open file descriptor on the inode, the reference count of the inode will
> > > never drop to 0. Then iput() cannot trigger eviction, which finally deletes
> > > all the items of it. So we try to extend the original patch to handle
> > > orphans in send/parent snapshots. Here are several cases that need to be
> > > considered:
> > >
> > > Case 1: BTRFS_COMPARE_TREE_NEW
> > >        |  send snapshot  | action
> > >  --------------------------------
> > >  nlink |        0        | ignore
> > >
> > > In case 1, when we get a BTRFS_COMPARE_TREE_NEW tree comparison result,
> > > it means that a new inode is found in the send snapshot and it doesn't
> > > appear in the parent snapshot. Since this inode has a link count of zero
> > > (It's an orphan and there're no paths for it.), we can leverage
> > > sctx->ignore_cur_inode in the original patch to prevent it from being
> > > created.
> > >
> > > Case 2: BTRFS_COMPARE_TREE_DELETED
> > >        | parent snapshot | action
> > >  ----------------------------------
> > >  nlink |        0        | as usual
> > >
> > > In case 2, when we get a BTRFS_COMPARE_TREE_DELETED tree comparison
> > > result, it means that the inode only appears in the parent snapshot.
> > > As usual, the send operation will try to delete all its paths. However,
> > > this inode has a link count of zero, so no paths of it will be found. No
> > > deletion operations will be issued. We don't need to change any logic.
> > >
> > > Case 3: BTRFS_COMPARE_TREE_CHANGED
> > >            |       | parent snapshot | send snapshot | action
> > >  -----------------------------------------------------------------------
> > >  subcase 1 | nlink |        0        |       0       | ignore
> > >  subcase 2 | nlink |       >0        |       0       | new_gen(deletion)
> > >  subcase 3 | nlink |        0        |      >0       | new_gen(creation)
> > >
> > > In case 3, when we get a BTRFS_COMPARE_TREE_CHANGED tree comparison result,
> > > it means that the inode appears in both snapshots. Here're three subcases.
> > >
> > > First, if the inode has link counts of zero in both snapshots. Since there
> > > are no paths for this inode in (source/destination) parent snapshots and we
> > > don't care about whether there is also an orphan inode in destination or
> > > not, we can set sctx->ignore_cur_inode on to prevent it from being created.
> > >
> > > For the second and the third subcases, if there're paths in one snapshot
> > > and there're no paths in the other snapshot for this inode. We can treat
> > > this inode as a new generation. We can also leverage the logic handling a
> > > new generation of an inode with small adjustments. Then it will delete all
> > > old paths and create a new inode with new attributes and paths only when
> > > there's a positive link count in the send snapshot. In subcase 2, the
> > > send operation only needs to delete all old paths as in the parent
> > > snapshot. But it may require more operations for a directory to remove its
> > > old paths. If a not-empty directory is going to be deleted (because it has
> > > a link count of zero in the send snapshot) but there're files/directories
> > > with bigger inode numbers under it, the send operation will need to rename
> > > it to its orphan name first. After processing and deleting the last item
> > > under this directory, the send operation will check this directory, aka
> > > the parent directory of the last item, again and issue a rmdir operation
> > > to remove it finally. Therefore, we also need to treat inodes with a link
> > > count of zero as if they didn't exist in get_cur_inode_state(), which is
> > > used in process_recorded_refs(). By doing this, when reviewing a directory
> > > with orphan names after the last item under it has been deleted, the send
> > > operation now can properly issue a rmdir operation. Otherwise, without
> > > doing this, the orphan directory with an orphan name would be kept here
> > > at the end due to the existing inode with a link count of zero being found.
> > > In subcase 3, as in case 2, no old paths would be found, so no deletion
> > > operations will be issued. The send operation will only create a new one
> > > for that inode.
> > >
> > > Note that subcase 3 is not a common case. That's because it's easy to
> > > reduce the hard links of an inode, but once all valid paths are removed,
> > > there're no valid paths for creating other hard links. The only way to do
> > > that is trying to send an older snapshot after a newer snapshot has been
> > > sent.
> > >
> > > Cc: <stable@vger.kernel.org> # 4.9: 46b2f4590aab: Btrfs: fix send
> > > failure when root has deleted files still open
> > > Cc: <stable@vger.kernel.org> # 4.9: 71ecfc133b03: btrfs: send:
> > > introduce recorded_ref_alloc and recorded_ref_free
> > > Cc: <stable@vger.kernel.org> # 4.9: 3aa5bd367fa5: btrfs: send: fix
> > > sending link commands for existing file paths
> > > Cc: <stable@vger.kernel.org> # 4.9: 0d8869fb6b6f8: btrfs: send: always
> > > use the rbtree based inode ref management infrastructure
> >
> > Btw, lines with CC, Fixes, etc, tags should not be broken even if they
> > are wider than 74 characters.
> >
>
> Okay, thank you for telling me that.
>
> > So, in v1 when I gave you that example of CC stable tags, it wasn't
> > meant for you to literally copy-paste them.
> >
> > First I asked if the purpose of the original Fixes tag was to backport
> > the fix to stable releases.
> > Was that the intention? You didn't provide an answer about that.
> >
>
> Oh, I misunderstood your suggestion. I'm sorry about that.
> Our intention is to report this bug and try to provide a reasonable and
> acceptable fix for it. Backporting is not our goal.
>
> > Then I told if that was the case, the proper way would be adding CC
> > stable tags and listing any
> > dependencies. I gave those 4 as examples with commits that are fairly
> > recent and obvious dependencies,
> > but I also said that probably there's a lot more missing - especially
> > if we want to backport to as far as 4.9.
> >
>
> > Even with just those 4 dependencies, some of those commits are fairly
> > large, and that may be frowned upon
> > according to stable backport rules (listed at
> > https://www.kernel.org/doc/Documentation/process/stable-kernel-rules.rst).
> > For e.g., patches with over 100 lines changed.
> >
> > Now, did you actually verify if there were more dependencies? (and test)
> > And do you really want to go as far as 4.9 (currently the oldest
> > stable release)?
>
> No, I didn't. I used to think the CC tag was a very cool feature, which
> just putting a few commits lets backport easily when I read your mail,
> so I copied and pasted these 4 commits in the beginning of revising
> the patch v2. However, I'm wrong.
>
> > I seriously doubt that those 4 commits are the only dependencies in
> > order to be able to cleanly backport to 4.9 and other old branches.
> >
> > It may be better to backport only to a few younger stable branches, or
> > just provide later a version of the patch to
> > apply to each desired stable branch (once the fix is in Linus' tree
> > and in a -rc release).
> >
> > If you are not interested in backporting to stable or don't have the
> > time to verify the dependencies and test, then just remove all the
> > stable tags.
> > Just leave a fixes tag:
> >
> > Fixes: 31db9f7c23fbf7 ("Btrfs: introduce BTRFS_IOC_SEND for btrfs send/receive")
> >
>
> Since backporting is not our goal. I will just leave the fix tag here.
>
> > Also, please don't forget to send a test case for fstests, covering as
> > many cases as possible (not just the example
> > at the beginning of the changelog).
> >
>
> Okay, I will submit a test case covering all cases.
> Because I still need to spend time learning how to use the fssum utility
> for the last test case you reviewed, so I will submit the test case later.

BingJing, any progress with the test case?
We would love to have that in fstests to help prevent regressions in the future.

Thanks.

>
> Thanks.
>
> > Thanks.
> >
> > > Reviewed-by: Robbie Ko <robbieko@synology.com>
> > > Signed-off-by: BingJing Chang <bingjingc@synology.com>
> > > ---
> > >  fs/btrfs/send.c | 214 +++++++++++++++++++-----------------------------
> > >  1 file changed, 85 insertions(+), 129 deletions(-)
> > >
> > > diff --git a/fs/btrfs/send.c b/fs/btrfs/send.c
> > > index f8d77a33b9b7..6ab1ba66ff4b 100644
> > > --- a/fs/btrfs/send.c
> > > +++ b/fs/btrfs/send.c
> > > @@ -850,6 +850,7 @@ struct btrfs_inode_info {
> > >         u64 gid;
> > >         u64 rdev;
> > >         u64 attr;
> > > +       u64 nlink;
> > >  };
> > >
> > >  /*
> > > @@ -888,6 +889,7 @@ static int get_inode_info(struct btrfs_root *root, u64 ino,
> > >         info->uid = btrfs_inode_uid(path->nodes[0], ii);
> > >         info->gid = btrfs_inode_gid(path->nodes[0], ii);
> > >         info->rdev = btrfs_inode_rdev(path->nodes[0], ii);
> > > +       info->nlink = btrfs_inode_nlink(path->nodes[0], ii);
> > >         /*
> > >          * Transfer the unchanged u64 value of btrfs_inode_item::flags, that's
> > >          * otherwise logically split to 32/32 parts.
> > > @@ -1652,19 +1654,22 @@ static int get_cur_inode_state(struct send_ctx *sctx, u64 ino, u64 gen)
> > >         int right_ret;
> > >         u64 left_gen;
> > >         u64 right_gen;
> > > +       struct btrfs_inode_info info;
> > >
> > > -       ret = get_inode_gen(sctx->send_root, ino, &left_gen);
> > > +       ret = get_inode_info(sctx->send_root, ino, &info);
> > >         if (ret < 0 && ret != -ENOENT)
> > >                 goto out;
> > > -       left_ret = ret;
> > > +       left_ret = (info.nlink == 0) ? -ENOENT : ret;
> > > +       left_gen = info.gen;
> > >
> > >         if (!sctx->parent_root) {
> > >                 right_ret = -ENOENT;
> > >         } else {
> > > -               ret = get_inode_gen(sctx->parent_root, ino, &right_gen);
> > > +               ret = get_inode_info(sctx->parent_root, ino, &info);
> > >                 if (ret < 0 && ret != -ENOENT)
> > >                         goto out;
> > > -               right_ret = ret;
> > > +               right_ret = (info.nlink == 0) ? -ENOENT : ret;
> > > +               right_gen = info.gen;
> > >         }
> > >
> > >         if (!left_ret && !right_ret) {
> > > @@ -6413,86 +6418,6 @@ static int finish_inode_if_needed(struct send_ctx *sctx, int at_end)
> > >         return ret;
> > >  }
> > >
> > > -struct parent_paths_ctx {
> > > -       struct list_head *refs;
> > > -       struct send_ctx *sctx;
> > > -};
> > > -
> > > -static int record_parent_ref(int num, u64 dir, int index, struct fs_path *name,
> > > -                            void *ctx)
> > > -{
> > > -       struct parent_paths_ctx *ppctx = ctx;
> > > -
> > > -       /*
> > > -        * Pass 0 as the generation for the directory, we don't care about it
> > > -        * here as we have no new references to add, we just want to delete all
> > > -        * references for an inode.
> > > -        */
> > > -       return record_ref_in_tree(&ppctx->sctx->rbtree_deleted_refs, ppctx->refs,
> > > -                                 name, dir, 0, ppctx->sctx);
> > > -}
> > > -
> > > -/*
> > > - * Issue unlink operations for all paths of the current inode found in the
> > > - * parent snapshot.
> > > - */
> > > -static int btrfs_unlink_all_paths(struct send_ctx *sctx)
> > > -{
> > > -       LIST_HEAD(deleted_refs);
> > > -       struct btrfs_path *path;
> > > -       struct btrfs_root *root = sctx->parent_root;
> > > -       struct btrfs_key key;
> > > -       struct btrfs_key found_key;
> > > -       struct parent_paths_ctx ctx;
> > > -       int iter_ret = 0;
> > > -       int ret;
> > > -
> > > -       path = alloc_path_for_send();
> > > -       if (!path)
> > > -               return -ENOMEM;
> > > -
> > > -       key.objectid = sctx->cur_ino;
> > > -       key.type = BTRFS_INODE_REF_KEY;
> > > -       key.offset = 0;
> > > -
> > > -       ctx.refs = &deleted_refs;
> > > -       ctx.sctx = sctx;
> > > -
> > > -       btrfs_for_each_slot(root, &key, &found_key, path, iter_ret) {
> > > -               if (found_key.objectid != key.objectid)
> > > -                       break;
> > > -               if (found_key.type != key.type &&
> > > -                   found_key.type != BTRFS_INODE_EXTREF_KEY)
> > > -                       break;
> > > -
> > > -               ret = iterate_inode_ref(root, path, &found_key, 1,
> > > -                                       record_parent_ref, &ctx);
> > > -               if (ret < 0)
> > > -                       goto out;
> > > -       }
> > > -       /* Catch error found during iteration */
> > > -       if (iter_ret < 0) {
> > > -               ret = iter_ret;
> > > -               goto out;
> > > -       }
> > > -
> > > -       while (!list_empty(&deleted_refs)) {
> > > -               struct recorded_ref *ref;
> > > -
> > > -               ref = list_first_entry(&deleted_refs, struct recorded_ref, list);
> > > -               ret = send_unlink(sctx, ref->full_path);
> > > -               if (ret < 0)
> > > -                       goto out;
> > > -               recorded_ref_free(ref);
> > > -       }
> > > -       ret = 0;
> > > -out:
> > > -       btrfs_free_path(path);
> > > -       if (ret)
> > > -               __free_recorded_refs(&deleted_refs);
> > > -       return ret;
> > > -}
> > > -
> > >  static void close_current_inode(struct send_ctx *sctx)
> > >  {
> > >         u64 i_size;
> > > @@ -6583,25 +6508,37 @@ static int changed_inode(struct send_ctx *sctx,
> > >          * file descriptor against it or turning a RO snapshot into RW mode,
> > >          * keep an open file descriptor against a file, delete it and then
> > >          * turn the snapshot back to RO mode before using it for a send
> > > -        * operation. So if we find such cases, ignore the inode and all its
> > > -        * items completely if it's a new inode, or if it's a changed inode
> > > -        * make sure all its previous paths (from the parent snapshot) are all
> > > -        * unlinked and all other the inode items are ignored.
> > > +        * operation. The former is what the receiver operation does.
> > > +        * Therefore, if we want to send these snapshots soon after they're
> > > +        * received, we need to handle orphan inodes as well. Moreover,
> > > +        * orphans can appear not only in the send snapshot but also in the
> > > +        * parent snapshot. Here are several cases:
> > > +        *
> > > +        * Case 1: BTRFS_COMPARE_TREE_NEW
> > > +        *       |  send snapshot  | action
> > > +        * --------------------------------
> > > +        * nlink |        0        | ignore
> > > +        *
> > > +        * Case 2: BTRFS_COMPARE_TREE_DELETED
> > > +        *       | parent snapshot | action
> > > +        * ----------------------------------
> > > +        * nlink |        0        | as usual
> > > +        * Note: No unlinks will be sent because there're no paths for it.
> > > +        *
> > > +        * Case 3: BTRFS_COMPARE_TREE_CHANGED
> > > +        *           |       | parent snapshot | send snapshot | action
> > > +        * -----------------------------------------------------------------------
> > > +        * subcase 1 | nlink |        0        |       0       | ignore
> > > +        * subcase 2 | nlink |       >0        |       0       | new_gen(deletion)
> > > +        * subcase 3 | nlink |        0        |      >0       | new_gen(creation)
> > > +        *
> > >          */
> > > -       if (result == BTRFS_COMPARE_TREE_NEW ||
> > > -           result == BTRFS_COMPARE_TREE_CHANGED) {
> > > -               u32 nlinks;
> > > -
> > > -               nlinks = btrfs_inode_nlink(sctx->left_path->nodes[0], left_ii);
> > > -               if (nlinks == 0) {
> > > +       if (result == BTRFS_COMPARE_TREE_NEW) {
> > > +               if (btrfs_inode_nlink(sctx->left_path->nodes[0], left_ii) ==
> > > +                                     0) {
> > >                         sctx->ignore_cur_inode = true;
> > > -                       if (result == BTRFS_COMPARE_TREE_CHANGED)
> > > -                               ret = btrfs_unlink_all_paths(sctx);
> > >                         goto out;
> > >                 }
> > > -       }
> > > -
> > > -       if (result == BTRFS_COMPARE_TREE_NEW) {
> > >                 sctx->cur_inode_gen = left_gen;
> > >                 sctx->cur_inode_new = true;
> > >                 sctx->cur_inode_deleted = false;
> > > @@ -6622,6 +6559,18 @@ static int changed_inode(struct send_ctx *sctx,
> > >                 sctx->cur_inode_mode = btrfs_inode_mode(
> > >                                 sctx->right_path->nodes[0], right_ii);
> > >         } else if (result == BTRFS_COMPARE_TREE_CHANGED) {
> > > +               u32 new_nlinks, old_nlinks;
> > > +
> > > +               new_nlinks = btrfs_inode_nlink(sctx->left_path->nodes[0],
> > > +                                              left_ii);
> > > +               old_nlinks = btrfs_inode_nlink(sctx->right_path->nodes[0],
> > > +                                              right_ii);
> > > +               if (new_nlinks == 0 && old_nlinks == 0) {
> > > +                       sctx->ignore_cur_inode = true;
> > > +                       goto out;
> > > +               } else if (new_nlinks == 0 || old_nlinks == 0) {
> > > +                       sctx->cur_inode_new_gen = 1;
> > > +               }
> > >                 /*
> > >                  * We need to do some special handling in case the inode was
> > >                  * reported as changed with a changed generation number. This
> > > @@ -6648,38 +6597,45 @@ static int changed_inode(struct send_ctx *sctx,
> > >                         /*
> > >                          * Now process the inode as if it was new.
> > >                          */
> > > -                       sctx->cur_inode_gen = left_gen;
> > > -                       sctx->cur_inode_new = true;
> > > -                       sctx->cur_inode_deleted = false;
> > > -                       sctx->cur_inode_size = btrfs_inode_size(
> > > -                                       sctx->left_path->nodes[0], left_ii);
> > > -                       sctx->cur_inode_mode = btrfs_inode_mode(
> > > -                                       sctx->left_path->nodes[0], left_ii);
> > > -                       sctx->cur_inode_rdev = btrfs_inode_rdev(
> > > -                                       sctx->left_path->nodes[0], left_ii);
> > > -                       ret = send_create_inode_if_needed(sctx);
> > > -                       if (ret < 0)
> > > -                               goto out;
> > > +                       if (new_nlinks > 0) {
> > > +                               sctx->cur_inode_gen = left_gen;
> > > +                               sctx->cur_inode_new = true;
> > > +                               sctx->cur_inode_deleted = false;
> > > +                               sctx->cur_inode_size = btrfs_inode_size(
> > > +                                               sctx->left_path->nodes[0],
> > > +                                               left_ii);
> > > +                               sctx->cur_inode_mode = btrfs_inode_mode(
> > > +                                               sctx->left_path->nodes[0],
> > > +                                               left_ii);
> > > +                               sctx->cur_inode_rdev = btrfs_inode_rdev(
> > > +                                               sctx->left_path->nodes[0],
> > > +                                               left_ii);
> > > +                               ret = send_create_inode_if_needed(sctx);
> > > +                               if (ret < 0)
> > > +                                       goto out;
> > >
> > > -                       ret = process_all_refs(sctx, BTRFS_COMPARE_TREE_NEW);
> > > -                       if (ret < 0)
> > > -                               goto out;
> > > -                       /*
> > > -                        * Advance send_progress now as we did not get into
> > > -                        * process_recorded_refs_if_needed in the new_gen case.
> > > -                        */
> > > -                       sctx->send_progress = sctx->cur_ino + 1;
> > > +                               ret = process_all_refs(sctx,
> > > +                                               BTRFS_COMPARE_TREE_NEW);
> > > +                               if (ret < 0)
> > > +                                       goto out;
> > > +                               /*
> > > +                                * Advance send_progress now as we did not get
> > > +                                * into process_recorded_refs_if_needed in the
> > > +                                * new_gen case.
> > > +                                */
> > > +                               sctx->send_progress = sctx->cur_ino + 1;
> > >
> > > -                       /*
> > > -                        * Now process all extents and xattrs of the inode as if
> > > -                        * they were all new.
> > > -                        */
> > > -                       ret = process_all_extents(sctx);
> > > -                       if (ret < 0)
> > > -                               goto out;
> > > -                       ret = process_all_new_xattrs(sctx);
> > > -                       if (ret < 0)
> > > -                               goto out;
> > > +                               /*
> > > +                                * Now process all extents and xattrs of the
> > > +                                * inode as if they were all new.
> > > +                                */
> > > +                               ret = process_all_extents(sctx);
> > > +                               if (ret < 0)
> > > +                                       goto out;
> > > +                               ret = process_all_new_xattrs(sctx);
> > > +                               if (ret < 0)
> > > +                                       goto out;
> > > +                       }
> > >                 } else {
> > >                         sctx->cur_inode_gen = left_gen;
> > >                         sctx->cur_inode_new = false;
> > > --
> > > 2.37.1
> > >
bingjing chang Sept. 23, 2022, 2:16 p.m. UTC | #6
Filipe Manana <fdmanana@kernel.org> 於 2022年9月22日 週四 下午6:08寫道:
>
> ,
>
> On Fri, Aug 12, 2022 at 3:36 PM bingjing chang <bxxxjxxg@gmail.com> wrote:
> >
> > Filipe Manana <fdmanana@kernel.org> 於 2022年8月11日 週四 晚上8:00寫道:
> > >
> > > On Thu, Aug 11, 2022 at 11:09 AM bingjingc <bingjingc@synology.com> wrote:
> > > >
> > > > From: BingJing Chang <bingjingc@synology.com>
> > > >
> > > > There is a bug causing send failures when processing an orphan directory
> > > > with no links. In commit 46b2f4590aab ("Btrfs: fix send failure when root
> > > > has deleted files still open")', the orphan inode issue was addressed. The
> > > > send operation fails with a ENOENT error because of any attempts to
> > > > generate a path for the inode with a link count of zero. Therefore, in that
> > > > patch, sctx->ignore_cur_inode was introduced to be set if the current inode
> > > > has a link count of zero for bypassing some unnecessary steps. And a helper
> > > > function btrfs_unlink_all_paths() was introduced and called to clean up old
> > > > paths found in the parent snapshot. However, not only regular files but
> > > > also directories can be orphan inodes. So if the send operation meets an
> > > > orphan directory, it will issue a wrong unlink command for that directory
> > > > now. Soon the receive operation fails with a EISDIR error. Besides, the
> > > > send operation also fails with a ENOENT error later when it tries to
> > > > generate a path of it.
> > > >
> > > > Similar example but making an orphan dir for an incremental send:
> > > >
> > > >   $ btrfs subvolume create vol
> > > >   $ mkdir vol/dir
> > > >   $ touch vol/dir/foo
> > > >
> > > >   $ btrfs subvolume snapshot -r vol snap1
> > > >   $ btrfs subvolume snapshot -r vol snap2
> > > >
> > > >   # Turn the second snapshot to RW mode and delete the whole dir while
> > > >   # holding an open file descriptor on it.
> > > >   $ btrfs property set snap2 ro false
> > > >   $ exec 73<snap2/dir
> > > >   $ rm -rf snap2/dir
> > > >
> > > >   # Set the second snapshot back to RO mode and do an incremental send.
> > > >   $ btrfs property set snap2 ro true
> > > >   $ mkdir receive_dir
> > > >   $ btrfs send snap2 -p snap1 | btrfs receive receive_dir/
> > > >   At subvol snap2
> > > >   At snapshot snap2
> > > >   ERROR: send ioctl failed with -2: No such file or directory
> > > >   ERROR: unlink dir failed. Is a directory
> > > >
> > > > Actually, orphan inodes are more common use cases in cascading backups.
> > > > (Please see the illustration below.) In a cascading backup, a user wants
> > > > to replicate a couple of snapshots from Machine A to Machine B and from
> > > > Machine B to Machine C. Machine B doesn't take any RO snapshots for
> > > > sending. All a receiver does is create an RW snapshot of its parent
> > > > snapshot, apply the send stream and turn it into RO mode at the end. Even
> > > > if all paths of some inodes are deleted in applying the send stream, these
> > > > inodes would not be deleted and become orphans after changing the subvolume
> > > > from RW to RO. Moreover, orphan inodes can occur not only in send snapshots
> > > > but also in parent snapshots because Machine B may do a batch replication
> > > > of a couple of snapshots.
> > > >
> > > > An illustration for cascading backups:
> > > > Machine A (snapshot {1..n}) --> Machine B --> Machine C
> > > >
> > > > The intuition to solve the problem is to delete all the items of orphan
> > > > inodes before using these snapshots for sending. I used to think that the
> > > > reasonable timing for doing that is during the ioctl of changing the
> > > > subvolume from RW to RO because it sounds good that we will not modify the
> > > > fs tree of a RO snapshot anymore. However, attempting to do the orphan
> > > > cleanup in the ioctl would be pointless. Because if someone is holding an
> > > > open file descriptor on the inode, the reference count of the inode will
> > > > never drop to 0. Then iput() cannot trigger eviction, which finally deletes
> > > > all the items of it. So we try to extend the original patch to handle
> > > > orphans in send/parent snapshots. Here are several cases that need to be
> > > > considered:
> > > >
> > > > Case 1: BTRFS_COMPARE_TREE_NEW
> > > >        |  send snapshot  | action
> > > >  --------------------------------
> > > >  nlink |        0        | ignore
> > > >
> > > > In case 1, when we get a BTRFS_COMPARE_TREE_NEW tree comparison result,
> > > > it means that a new inode is found in the send snapshot and it doesn't
> > > > appear in the parent snapshot. Since this inode has a link count of zero
> > > > (It's an orphan and there're no paths for it.), we can leverage
> > > > sctx->ignore_cur_inode in the original patch to prevent it from being
> > > > created.
> > > >
> > > > Case 2: BTRFS_COMPARE_TREE_DELETED
> > > >        | parent snapshot | action
> > > >  ----------------------------------
> > > >  nlink |        0        | as usual
> > > >
> > > > In case 2, when we get a BTRFS_COMPARE_TREE_DELETED tree comparison
> > > > result, it means that the inode only appears in the parent snapshot.
> > > > As usual, the send operation will try to delete all its paths. However,
> > > > this inode has a link count of zero, so no paths of it will be found. No
> > > > deletion operations will be issued. We don't need to change any logic.
> > > >
> > > > Case 3: BTRFS_COMPARE_TREE_CHANGED
> > > >            |       | parent snapshot | send snapshot | action
> > > >  -----------------------------------------------------------------------
> > > >  subcase 1 | nlink |        0        |       0       | ignore
> > > >  subcase 2 | nlink |       >0        |       0       | new_gen(deletion)
> > > >  subcase 3 | nlink |        0        |      >0       | new_gen(creation)
> > > >
> > > > In case 3, when we get a BTRFS_COMPARE_TREE_CHANGED tree comparison result,
> > > > it means that the inode appears in both snapshots. Here're three subcases.
> > > >
> > > > First, if the inode has link counts of zero in both snapshots. Since there
> > > > are no paths for this inode in (source/destination) parent snapshots and we
> > > > don't care about whether there is also an orphan inode in destination or
> > > > not, we can set sctx->ignore_cur_inode on to prevent it from being created.
> > > >
> > > > For the second and the third subcases, if there're paths in one snapshot
> > > > and there're no paths in the other snapshot for this inode. We can treat
> > > > this inode as a new generation. We can also leverage the logic handling a
> > > > new generation of an inode with small adjustments. Then it will delete all
> > > > old paths and create a new inode with new attributes and paths only when
> > > > there's a positive link count in the send snapshot. In subcase 2, the
> > > > send operation only needs to delete all old paths as in the parent
> > > > snapshot. But it may require more operations for a directory to remove its
> > > > old paths. If a not-empty directory is going to be deleted (because it has
> > > > a link count of zero in the send snapshot) but there're files/directories
> > > > with bigger inode numbers under it, the send operation will need to rename
> > > > it to its orphan name first. After processing and deleting the last item
> > > > under this directory, the send operation will check this directory, aka
> > > > the parent directory of the last item, again and issue a rmdir operation
> > > > to remove it finally. Therefore, we also need to treat inodes with a link
> > > > count of zero as if they didn't exist in get_cur_inode_state(), which is
> > > > used in process_recorded_refs(). By doing this, when reviewing a directory
> > > > with orphan names after the last item under it has been deleted, the send
> > > > operation now can properly issue a rmdir operation. Otherwise, without
> > > > doing this, the orphan directory with an orphan name would be kept here
> > > > at the end due to the existing inode with a link count of zero being found.
> > > > In subcase 3, as in case 2, no old paths would be found, so no deletion
> > > > operations will be issued. The send operation will only create a new one
> > > > for that inode.
> > > >
> > > > Note that subcase 3 is not a common case. That's because it's easy to
> > > > reduce the hard links of an inode, but once all valid paths are removed,
> > > > there're no valid paths for creating other hard links. The only way to do
> > > > that is trying to send an older snapshot after a newer snapshot has been
> > > > sent.
> > > >
> > > > Cc: <stable@vger.kernel.org> # 4.9: 46b2f4590aab: Btrfs: fix send
> > > > failure when root has deleted files still open
> > > > Cc: <stable@vger.kernel.org> # 4.9: 71ecfc133b03: btrfs: send:
> > > > introduce recorded_ref_alloc and recorded_ref_free
> > > > Cc: <stable@vger.kernel.org> # 4.9: 3aa5bd367fa5: btrfs: send: fix
> > > > sending link commands for existing file paths
> > > > Cc: <stable@vger.kernel.org> # 4.9: 0d8869fb6b6f8: btrfs: send: always
> > > > use the rbtree based inode ref management infrastructure
> > >
> > > Btw, lines with CC, Fixes, etc, tags should not be broken even if they
> > > are wider than 74 characters.
> > >
> >
> > Okay, thank you for telling me that.
> >
> > > So, in v1 when I gave you that example of CC stable tags, it wasn't
> > > meant for you to literally copy-paste them.
> > >
> > > First I asked if the purpose of the original Fixes tag was to backport
> > > the fix to stable releases.
> > > Was that the intention? You didn't provide an answer about that.
> > >
> >
> > Oh, I misunderstood your suggestion. I'm sorry about that.
> > Our intention is to report this bug and try to provide a reasonable and
> > acceptable fix for it. Backporting is not our goal.
> >
> > > Then I told if that was the case, the proper way would be adding CC
> > > stable tags and listing any
> > > dependencies. I gave those 4 as examples with commits that are fairly
> > > recent and obvious dependencies,
> > > but I also said that probably there's a lot more missing - especially
> > > if we want to backport to as far as 4.9.
> > >
> >
> > > Even with just those 4 dependencies, some of those commits are fairly
> > > large, and that may be frowned upon
> > > according to stable backport rules (listed at
> > > https://www.kernel.org/doc/Documentation/process/stable-kernel-rules.rst).
> > > For e.g., patches with over 100 lines changed.
> > >
> > > Now, did you actually verify if there were more dependencies? (and test)
> > > And do you really want to go as far as 4.9 (currently the oldest
> > > stable release)?
> >
> > No, I didn't. I used to think the CC tag was a very cool feature, which
> > just putting a few commits lets backport easily when I read your mail,
> > so I copied and pasted these 4 commits in the beginning of revising
> > the patch v2. However, I'm wrong.
> >
> > > I seriously doubt that those 4 commits are the only dependencies in
> > > order to be able to cleanly backport to 4.9 and other old branches.
> > >
> > > It may be better to backport only to a few younger stable branches, or
> > > just provide later a version of the patch to
> > > apply to each desired stable branch (once the fix is in Linus' tree
> > > and in a -rc release).
> > >
> > > If you are not interested in backporting to stable or don't have the
> > > time to verify the dependencies and test, then just remove all the
> > > stable tags.
> > > Just leave a fixes tag:
> > >
> > > Fixes: 31db9f7c23fbf7 ("Btrfs: introduce BTRFS_IOC_SEND for btrfs send/receive")
> > >
> >
> > Since backporting is not our goal. I will just leave the fix tag here.
> >
> > > Also, please don't forget to send a test case for fstests, covering as
> > > many cases as possible (not just the example
> > > at the beginning of the changelog).
> > >
> >
> > Okay, I will submit a test case covering all cases.
> > Because I still need to spend time learning how to use the fssum utility
> > for the last test case you reviewed, so I will submit the test case later.
>
> BingJing, any progress with the test case?
> We would love to have that in fstests to help prevent regressions in the future.
>
> Thanks.
>

Hi Filipe,

I used to wait the git pull of my patch to the master branch, then I can
attach the commit uuid in tests as before. However, it seems that there're
troubles for the maintainer to pull all of the bug fixes into the new
kernel 6.0 at once.

Okay, I will try to create a test covering the cases this weekend.
Sorry for the inconvenience made.

Thanks.

> >
> > Thanks.
> >
> > > Thanks.
> > >
> > > > Reviewed-by: Robbie Ko <robbieko@synology.com>
> > > > Signed-off-by: BingJing Chang <bingjingc@synology.com>
> > > > ---
> > > >  fs/btrfs/send.c | 214 +++++++++++++++++++-----------------------------
> > > >  1 file changed, 85 insertions(+), 129 deletions(-)
> > > >
> > > > diff --git a/fs/btrfs/send.c b/fs/btrfs/send.c
> > > > index f8d77a33b9b7..6ab1ba66ff4b 100644
> > > > --- a/fs/btrfs/send.c
> > > > +++ b/fs/btrfs/send.c
> > > > @@ -850,6 +850,7 @@ struct btrfs_inode_info {
> > > >         u64 gid;
> > > >         u64 rdev;
> > > >         u64 attr;
> > > > +       u64 nlink;
> > > >  };
> > > >
> > > >  /*
> > > > @@ -888,6 +889,7 @@ static int get_inode_info(struct btrfs_root *root, u64 ino,
> > > >         info->uid = btrfs_inode_uid(path->nodes[0], ii);
> > > >         info->gid = btrfs_inode_gid(path->nodes[0], ii);
> > > >         info->rdev = btrfs_inode_rdev(path->nodes[0], ii);
> > > > +       info->nlink = btrfs_inode_nlink(path->nodes[0], ii);
> > > >         /*
> > > >          * Transfer the unchanged u64 value of btrfs_inode_item::flags, that's
> > > >          * otherwise logically split to 32/32 parts.
> > > > @@ -1652,19 +1654,22 @@ static int get_cur_inode_state(struct send_ctx *sctx, u64 ino, u64 gen)
> > > >         int right_ret;
> > > >         u64 left_gen;
> > > >         u64 right_gen;
> > > > +       struct btrfs_inode_info info;
> > > >
> > > > -       ret = get_inode_gen(sctx->send_root, ino, &left_gen);
> > > > +       ret = get_inode_info(sctx->send_root, ino, &info);
> > > >         if (ret < 0 && ret != -ENOENT)
> > > >                 goto out;
> > > > -       left_ret = ret;
> > > > +       left_ret = (info.nlink == 0) ? -ENOENT : ret;
> > > > +       left_gen = info.gen;
> > > >
> > > >         if (!sctx->parent_root) {
> > > >                 right_ret = -ENOENT;
> > > >         } else {
> > > > -               ret = get_inode_gen(sctx->parent_root, ino, &right_gen);
> > > > +               ret = get_inode_info(sctx->parent_root, ino, &info);
> > > >                 if (ret < 0 && ret != -ENOENT)
> > > >                         goto out;
> > > > -               right_ret = ret;
> > > > +               right_ret = (info.nlink == 0) ? -ENOENT : ret;
> > > > +               right_gen = info.gen;
> > > >         }
> > > >
> > > >         if (!left_ret && !right_ret) {
> > > > @@ -6413,86 +6418,6 @@ static int finish_inode_if_needed(struct send_ctx *sctx, int at_end)
> > > >         return ret;
> > > >  }
> > > >
> > > > -struct parent_paths_ctx {
> > > > -       struct list_head *refs;
> > > > -       struct send_ctx *sctx;
> > > > -};
> > > > -
> > > > -static int record_parent_ref(int num, u64 dir, int index, struct fs_path *name,
> > > > -                            void *ctx)
> > > > -{
> > > > -       struct parent_paths_ctx *ppctx = ctx;
> > > > -
> > > > -       /*
> > > > -        * Pass 0 as the generation for the directory, we don't care about it
> > > > -        * here as we have no new references to add, we just want to delete all
> > > > -        * references for an inode.
> > > > -        */
> > > > -       return record_ref_in_tree(&ppctx->sctx->rbtree_deleted_refs, ppctx->refs,
> > > > -                                 name, dir, 0, ppctx->sctx);
> > > > -}
> > > > -
> > > > -/*
> > > > - * Issue unlink operations for all paths of the current inode found in the
> > > > - * parent snapshot.
> > > > - */
> > > > -static int btrfs_unlink_all_paths(struct send_ctx *sctx)
> > > > -{
> > > > -       LIST_HEAD(deleted_refs);
> > > > -       struct btrfs_path *path;
> > > > -       struct btrfs_root *root = sctx->parent_root;
> > > > -       struct btrfs_key key;
> > > > -       struct btrfs_key found_key;
> > > > -       struct parent_paths_ctx ctx;
> > > > -       int iter_ret = 0;
> > > > -       int ret;
> > > > -
> > > > -       path = alloc_path_for_send();
> > > > -       if (!path)
> > > > -               return -ENOMEM;
> > > > -
> > > > -       key.objectid = sctx->cur_ino;
> > > > -       key.type = BTRFS_INODE_REF_KEY;
> > > > -       key.offset = 0;
> > > > -
> > > > -       ctx.refs = &deleted_refs;
> > > > -       ctx.sctx = sctx;
> > > > -
> > > > -       btrfs_for_each_slot(root, &key, &found_key, path, iter_ret) {
> > > > -               if (found_key.objectid != key.objectid)
> > > > -                       break;
> > > > -               if (found_key.type != key.type &&
> > > > -                   found_key.type != BTRFS_INODE_EXTREF_KEY)
> > > > -                       break;
> > > > -
> > > > -               ret = iterate_inode_ref(root, path, &found_key, 1,
> > > > -                                       record_parent_ref, &ctx);
> > > > -               if (ret < 0)
> > > > -                       goto out;
> > > > -       }
> > > > -       /* Catch error found during iteration */
> > > > -       if (iter_ret < 0) {
> > > > -               ret = iter_ret;
> > > > -               goto out;
> > > > -       }
> > > > -
> > > > -       while (!list_empty(&deleted_refs)) {
> > > > -               struct recorded_ref *ref;
> > > > -
> > > > -               ref = list_first_entry(&deleted_refs, struct recorded_ref, list);
> > > > -               ret = send_unlink(sctx, ref->full_path);
> > > > -               if (ret < 0)
> > > > -                       goto out;
> > > > -               recorded_ref_free(ref);
> > > > -       }
> > > > -       ret = 0;
> > > > -out:
> > > > -       btrfs_free_path(path);
> > > > -       if (ret)
> > > > -               __free_recorded_refs(&deleted_refs);
> > > > -       return ret;
> > > > -}
> > > > -
> > > >  static void close_current_inode(struct send_ctx *sctx)
> > > >  {
> > > >         u64 i_size;
> > > > @@ -6583,25 +6508,37 @@ static int changed_inode(struct send_ctx *sctx,
> > > >          * file descriptor against it or turning a RO snapshot into RW mode,
> > > >          * keep an open file descriptor against a file, delete it and then
> > > >          * turn the snapshot back to RO mode before using it for a send
> > > > -        * operation. So if we find such cases, ignore the inode and all its
> > > > -        * items completely if it's a new inode, or if it's a changed inode
> > > > -        * make sure all its previous paths (from the parent snapshot) are all
> > > > -        * unlinked and all other the inode items are ignored.
> > > > +        * operation. The former is what the receiver operation does.
> > > > +        * Therefore, if we want to send these snapshots soon after they're
> > > > +        * received, we need to handle orphan inodes as well. Moreover,
> > > > +        * orphans can appear not only in the send snapshot but also in the
> > > > +        * parent snapshot. Here are several cases:
> > > > +        *
> > > > +        * Case 1: BTRFS_COMPARE_TREE_NEW
> > > > +        *       |  send snapshot  | action
> > > > +        * --------------------------------
> > > > +        * nlink |        0        | ignore
> > > > +        *
> > > > +        * Case 2: BTRFS_COMPARE_TREE_DELETED
> > > > +        *       | parent snapshot | action
> > > > +        * ----------------------------------
> > > > +        * nlink |        0        | as usual
> > > > +        * Note: No unlinks will be sent because there're no paths for it.
> > > > +        *
> > > > +        * Case 3: BTRFS_COMPARE_TREE_CHANGED
> > > > +        *           |       | parent snapshot | send snapshot | action
> > > > +        * -----------------------------------------------------------------------
> > > > +        * subcase 1 | nlink |        0        |       0       | ignore
> > > > +        * subcase 2 | nlink |       >0        |       0       | new_gen(deletion)
> > > > +        * subcase 3 | nlink |        0        |      >0       | new_gen(creation)
> > > > +        *
> > > >          */
> > > > -       if (result == BTRFS_COMPARE_TREE_NEW ||
> > > > -           result == BTRFS_COMPARE_TREE_CHANGED) {
> > > > -               u32 nlinks;
> > > > -
> > > > -               nlinks = btrfs_inode_nlink(sctx->left_path->nodes[0], left_ii);
> > > > -               if (nlinks == 0) {
> > > > +       if (result == BTRFS_COMPARE_TREE_NEW) {
> > > > +               if (btrfs_inode_nlink(sctx->left_path->nodes[0], left_ii) ==
> > > > +                                     0) {
> > > >                         sctx->ignore_cur_inode = true;
> > > > -                       if (result == BTRFS_COMPARE_TREE_CHANGED)
> > > > -                               ret = btrfs_unlink_all_paths(sctx);
> > > >                         goto out;
> > > >                 }
> > > > -       }
> > > > -
> > > > -       if (result == BTRFS_COMPARE_TREE_NEW) {
> > > >                 sctx->cur_inode_gen = left_gen;
> > > >                 sctx->cur_inode_new = true;
> > > >                 sctx->cur_inode_deleted = false;
> > > > @@ -6622,6 +6559,18 @@ static int changed_inode(struct send_ctx *sctx,
> > > >                 sctx->cur_inode_mode = btrfs_inode_mode(
> > > >                                 sctx->right_path->nodes[0], right_ii);
> > > >         } else if (result == BTRFS_COMPARE_TREE_CHANGED) {
> > > > +               u32 new_nlinks, old_nlinks;
> > > > +
> > > > +               new_nlinks = btrfs_inode_nlink(sctx->left_path->nodes[0],
> > > > +                                              left_ii);
> > > > +               old_nlinks = btrfs_inode_nlink(sctx->right_path->nodes[0],
> > > > +                                              right_ii);
> > > > +               if (new_nlinks == 0 && old_nlinks == 0) {
> > > > +                       sctx->ignore_cur_inode = true;
> > > > +                       goto out;
> > > > +               } else if (new_nlinks == 0 || old_nlinks == 0) {
> > > > +                       sctx->cur_inode_new_gen = 1;
> > > > +               }
> > > >                 /*
> > > >                  * We need to do some special handling in case the inode was
> > > >                  * reported as changed with a changed generation number. This
> > > > @@ -6648,38 +6597,45 @@ static int changed_inode(struct send_ctx *sctx,
> > > >                         /*
> > > >                          * Now process the inode as if it was new.
> > > >                          */
> > > > -                       sctx->cur_inode_gen = left_gen;
> > > > -                       sctx->cur_inode_new = true;
> > > > -                       sctx->cur_inode_deleted = false;
> > > > -                       sctx->cur_inode_size = btrfs_inode_size(
> > > > -                                       sctx->left_path->nodes[0], left_ii);
> > > > -                       sctx->cur_inode_mode = btrfs_inode_mode(
> > > > -                                       sctx->left_path->nodes[0], left_ii);
> > > > -                       sctx->cur_inode_rdev = btrfs_inode_rdev(
> > > > -                                       sctx->left_path->nodes[0], left_ii);
> > > > -                       ret = send_create_inode_if_needed(sctx);
> > > > -                       if (ret < 0)
> > > > -                               goto out;
> > > > +                       if (new_nlinks > 0) {
> > > > +                               sctx->cur_inode_gen = left_gen;
> > > > +                               sctx->cur_inode_new = true;
> > > > +                               sctx->cur_inode_deleted = false;
> > > > +                               sctx->cur_inode_size = btrfs_inode_size(
> > > > +                                               sctx->left_path->nodes[0],
> > > > +                                               left_ii);
> > > > +                               sctx->cur_inode_mode = btrfs_inode_mode(
> > > > +                                               sctx->left_path->nodes[0],
> > > > +                                               left_ii);
> > > > +                               sctx->cur_inode_rdev = btrfs_inode_rdev(
> > > > +                                               sctx->left_path->nodes[0],
> > > > +                                               left_ii);
> > > > +                               ret = send_create_inode_if_needed(sctx);
> > > > +                               if (ret < 0)
> > > > +                                       goto out;
> > > >
> > > > -                       ret = process_all_refs(sctx, BTRFS_COMPARE_TREE_NEW);
> > > > -                       if (ret < 0)
> > > > -                               goto out;
> > > > -                       /*
> > > > -                        * Advance send_progress now as we did not get into
> > > > -                        * process_recorded_refs_if_needed in the new_gen case.
> > > > -                        */
> > > > -                       sctx->send_progress = sctx->cur_ino + 1;
> > > > +                               ret = process_all_refs(sctx,
> > > > +                                               BTRFS_COMPARE_TREE_NEW);
> > > > +                               if (ret < 0)
> > > > +                                       goto out;
> > > > +                               /*
> > > > +                                * Advance send_progress now as we did not get
> > > > +                                * into process_recorded_refs_if_needed in the
> > > > +                                * new_gen case.
> > > > +                                */
> > > > +                               sctx->send_progress = sctx->cur_ino + 1;
> > > >
> > > > -                       /*
> > > > -                        * Now process all extents and xattrs of the inode as if
> > > > -                        * they were all new.
> > > > -                        */
> > > > -                       ret = process_all_extents(sctx);
> > > > -                       if (ret < 0)
> > > > -                               goto out;
> > > > -                       ret = process_all_new_xattrs(sctx);
> > > > -                       if (ret < 0)
> > > > -                               goto out;
> > > > +                               /*
> > > > +                                * Now process all extents and xattrs of the
> > > > +                                * inode as if they were all new.
> > > > +                                */
> > > > +                               ret = process_all_extents(sctx);
> > > > +                               if (ret < 0)
> > > > +                                       goto out;
> > > > +                               ret = process_all_new_xattrs(sctx);
> > > > +                               if (ret < 0)
> > > > +                                       goto out;
> > > > +                       }
> > > >                 } else {
> > > >                         sctx->cur_inode_gen = left_gen;
> > > >                         sctx->cur_inode_new = false;
> > > > --
> > > > 2.37.1
> > > >
Filipe Manana Sept. 23, 2022, 3:32 p.m. UTC | #7
On Fri, Sep 23, 2022 at 3:16 PM bingjing chang <bxxxjxxg@gmail.com> wrote:
>
> Filipe Manana <fdmanana@kernel.org> 於 2022年9月22日 週四 下午6:08寫道:
> >
> > ,
> >
> > On Fri, Aug 12, 2022 at 3:36 PM bingjing chang <bxxxjxxg@gmail.com> wrote:
> > >
> > > Filipe Manana <fdmanana@kernel.org> 於 2022年8月11日 週四 晚上8:00寫道:
> > > >
> > > > On Thu, Aug 11, 2022 at 11:09 AM bingjingc <bingjingc@synology.com> wrote:
> > > > >
> > > > > From: BingJing Chang <bingjingc@synology.com>
> > > > >
> > > > > There is a bug causing send failures when processing an orphan directory
> > > > > with no links. In commit 46b2f4590aab ("Btrfs: fix send failure when root
> > > > > has deleted files still open")', the orphan inode issue was addressed. The
> > > > > send operation fails with a ENOENT error because of any attempts to
> > > > > generate a path for the inode with a link count of zero. Therefore, in that
> > > > > patch, sctx->ignore_cur_inode was introduced to be set if the current inode
> > > > > has a link count of zero for bypassing some unnecessary steps. And a helper
> > > > > function btrfs_unlink_all_paths() was introduced and called to clean up old
> > > > > paths found in the parent snapshot. However, not only regular files but
> > > > > also directories can be orphan inodes. So if the send operation meets an
> > > > > orphan directory, it will issue a wrong unlink command for that directory
> > > > > now. Soon the receive operation fails with a EISDIR error. Besides, the
> > > > > send operation also fails with a ENOENT error later when it tries to
> > > > > generate a path of it.
> > > > >
> > > > > Similar example but making an orphan dir for an incremental send:
> > > > >
> > > > >   $ btrfs subvolume create vol
> > > > >   $ mkdir vol/dir
> > > > >   $ touch vol/dir/foo
> > > > >
> > > > >   $ btrfs subvolume snapshot -r vol snap1
> > > > >   $ btrfs subvolume snapshot -r vol snap2
> > > > >
> > > > >   # Turn the second snapshot to RW mode and delete the whole dir while
> > > > >   # holding an open file descriptor on it.
> > > > >   $ btrfs property set snap2 ro false
> > > > >   $ exec 73<snap2/dir
> > > > >   $ rm -rf snap2/dir
> > > > >
> > > > >   # Set the second snapshot back to RO mode and do an incremental send.
> > > > >   $ btrfs property set snap2 ro true
> > > > >   $ mkdir receive_dir
> > > > >   $ btrfs send snap2 -p snap1 | btrfs receive receive_dir/
> > > > >   At subvol snap2
> > > > >   At snapshot snap2
> > > > >   ERROR: send ioctl failed with -2: No such file or directory
> > > > >   ERROR: unlink dir failed. Is a directory
> > > > >
> > > > > Actually, orphan inodes are more common use cases in cascading backups.
> > > > > (Please see the illustration below.) In a cascading backup, a user wants
> > > > > to replicate a couple of snapshots from Machine A to Machine B and from
> > > > > Machine B to Machine C. Machine B doesn't take any RO snapshots for
> > > > > sending. All a receiver does is create an RW snapshot of its parent
> > > > > snapshot, apply the send stream and turn it into RO mode at the end. Even
> > > > > if all paths of some inodes are deleted in applying the send stream, these
> > > > > inodes would not be deleted and become orphans after changing the subvolume
> > > > > from RW to RO. Moreover, orphan inodes can occur not only in send snapshots
> > > > > but also in parent snapshots because Machine B may do a batch replication
> > > > > of a couple of snapshots.
> > > > >
> > > > > An illustration for cascading backups:
> > > > > Machine A (snapshot {1..n}) --> Machine B --> Machine C
> > > > >
> > > > > The intuition to solve the problem is to delete all the items of orphan
> > > > > inodes before using these snapshots for sending. I used to think that the
> > > > > reasonable timing for doing that is during the ioctl of changing the
> > > > > subvolume from RW to RO because it sounds good that we will not modify the
> > > > > fs tree of a RO snapshot anymore. However, attempting to do the orphan
> > > > > cleanup in the ioctl would be pointless. Because if someone is holding an
> > > > > open file descriptor on the inode, the reference count of the inode will
> > > > > never drop to 0. Then iput() cannot trigger eviction, which finally deletes
> > > > > all the items of it. So we try to extend the original patch to handle
> > > > > orphans in send/parent snapshots. Here are several cases that need to be
> > > > > considered:
> > > > >
> > > > > Case 1: BTRFS_COMPARE_TREE_NEW
> > > > >        |  send snapshot  | action
> > > > >  --------------------------------
> > > > >  nlink |        0        | ignore
> > > > >
> > > > > In case 1, when we get a BTRFS_COMPARE_TREE_NEW tree comparison result,
> > > > > it means that a new inode is found in the send snapshot and it doesn't
> > > > > appear in the parent snapshot. Since this inode has a link count of zero
> > > > > (It's an orphan and there're no paths for it.), we can leverage
> > > > > sctx->ignore_cur_inode in the original patch to prevent it from being
> > > > > created.
> > > > >
> > > > > Case 2: BTRFS_COMPARE_TREE_DELETED
> > > > >        | parent snapshot | action
> > > > >  ----------------------------------
> > > > >  nlink |        0        | as usual
> > > > >
> > > > > In case 2, when we get a BTRFS_COMPARE_TREE_DELETED tree comparison
> > > > > result, it means that the inode only appears in the parent snapshot.
> > > > > As usual, the send operation will try to delete all its paths. However,
> > > > > this inode has a link count of zero, so no paths of it will be found. No
> > > > > deletion operations will be issued. We don't need to change any logic.
> > > > >
> > > > > Case 3: BTRFS_COMPARE_TREE_CHANGED
> > > > >            |       | parent snapshot | send snapshot | action
> > > > >  -----------------------------------------------------------------------
> > > > >  subcase 1 | nlink |        0        |       0       | ignore
> > > > >  subcase 2 | nlink |       >0        |       0       | new_gen(deletion)
> > > > >  subcase 3 | nlink |        0        |      >0       | new_gen(creation)
> > > > >
> > > > > In case 3, when we get a BTRFS_COMPARE_TREE_CHANGED tree comparison result,
> > > > > it means that the inode appears in both snapshots. Here're three subcases.
> > > > >
> > > > > First, if the inode has link counts of zero in both snapshots. Since there
> > > > > are no paths for this inode in (source/destination) parent snapshots and we
> > > > > don't care about whether there is also an orphan inode in destination or
> > > > > not, we can set sctx->ignore_cur_inode on to prevent it from being created.
> > > > >
> > > > > For the second and the third subcases, if there're paths in one snapshot
> > > > > and there're no paths in the other snapshot for this inode. We can treat
> > > > > this inode as a new generation. We can also leverage the logic handling a
> > > > > new generation of an inode with small adjustments. Then it will delete all
> > > > > old paths and create a new inode with new attributes and paths only when
> > > > > there's a positive link count in the send snapshot. In subcase 2, the
> > > > > send operation only needs to delete all old paths as in the parent
> > > > > snapshot. But it may require more operations for a directory to remove its
> > > > > old paths. If a not-empty directory is going to be deleted (because it has
> > > > > a link count of zero in the send snapshot) but there're files/directories
> > > > > with bigger inode numbers under it, the send operation will need to rename
> > > > > it to its orphan name first. After processing and deleting the last item
> > > > > under this directory, the send operation will check this directory, aka
> > > > > the parent directory of the last item, again and issue a rmdir operation
> > > > > to remove it finally. Therefore, we also need to treat inodes with a link
> > > > > count of zero as if they didn't exist in get_cur_inode_state(), which is
> > > > > used in process_recorded_refs(). By doing this, when reviewing a directory
> > > > > with orphan names after the last item under it has been deleted, the send
> > > > > operation now can properly issue a rmdir operation. Otherwise, without
> > > > > doing this, the orphan directory with an orphan name would be kept here
> > > > > at the end due to the existing inode with a link count of zero being found.
> > > > > In subcase 3, as in case 2, no old paths would be found, so no deletion
> > > > > operations will be issued. The send operation will only create a new one
> > > > > for that inode.
> > > > >
> > > > > Note that subcase 3 is not a common case. That's because it's easy to
> > > > > reduce the hard links of an inode, but once all valid paths are removed,
> > > > > there're no valid paths for creating other hard links. The only way to do
> > > > > that is trying to send an older snapshot after a newer snapshot has been
> > > > > sent.
> > > > >
> > > > > Cc: <stable@vger.kernel.org> # 4.9: 46b2f4590aab: Btrfs: fix send
> > > > > failure when root has deleted files still open
> > > > > Cc: <stable@vger.kernel.org> # 4.9: 71ecfc133b03: btrfs: send:
> > > > > introduce recorded_ref_alloc and recorded_ref_free
> > > > > Cc: <stable@vger.kernel.org> # 4.9: 3aa5bd367fa5: btrfs: send: fix
> > > > > sending link commands for existing file paths
> > > > > Cc: <stable@vger.kernel.org> # 4.9: 0d8869fb6b6f8: btrfs: send: always
> > > > > use the rbtree based inode ref management infrastructure
> > > >
> > > > Btw, lines with CC, Fixes, etc, tags should not be broken even if they
> > > > are wider than 74 characters.
> > > >
> > >
> > > Okay, thank you for telling me that.
> > >
> > > > So, in v1 when I gave you that example of CC stable tags, it wasn't
> > > > meant for you to literally copy-paste them.
> > > >
> > > > First I asked if the purpose of the original Fixes tag was to backport
> > > > the fix to stable releases.
> > > > Was that the intention? You didn't provide an answer about that.
> > > >
> > >
> > > Oh, I misunderstood your suggestion. I'm sorry about that.
> > > Our intention is to report this bug and try to provide a reasonable and
> > > acceptable fix for it. Backporting is not our goal.
> > >
> > > > Then I told if that was the case, the proper way would be adding CC
> > > > stable tags and listing any
> > > > dependencies. I gave those 4 as examples with commits that are fairly
> > > > recent and obvious dependencies,
> > > > but I also said that probably there's a lot more missing - especially
> > > > if we want to backport to as far as 4.9.
> > > >
> > >
> > > > Even with just those 4 dependencies, some of those commits are fairly
> > > > large, and that may be frowned upon
> > > > according to stable backport rules (listed at
> > > > https://www.kernel.org/doc/Documentation/process/stable-kernel-rules.rst).
> > > > For e.g., patches with over 100 lines changed.
> > > >
> > > > Now, did you actually verify if there were more dependencies? (and test)
> > > > And do you really want to go as far as 4.9 (currently the oldest
> > > > stable release)?
> > >
> > > No, I didn't. I used to think the CC tag was a very cool feature, which
> > > just putting a few commits lets backport easily when I read your mail,
> > > so I copied and pasted these 4 commits in the beginning of revising
> > > the patch v2. However, I'm wrong.
> > >
> > > > I seriously doubt that those 4 commits are the only dependencies in
> > > > order to be able to cleanly backport to 4.9 and other old branches.
> > > >
> > > > It may be better to backport only to a few younger stable branches, or
> > > > just provide later a version of the patch to
> > > > apply to each desired stable branch (once the fix is in Linus' tree
> > > > and in a -rc release).
> > > >
> > > > If you are not interested in backporting to stable or don't have the
> > > > time to verify the dependencies and test, then just remove all the
> > > > stable tags.
> > > > Just leave a fixes tag:
> > > >
> > > > Fixes: 31db9f7c23fbf7 ("Btrfs: introduce BTRFS_IOC_SEND for btrfs send/receive")
> > > >
> > >
> > > Since backporting is not our goal. I will just leave the fix tag here.
> > >
> > > > Also, please don't forget to send a test case for fstests, covering as
> > > > many cases as possible (not just the example
> > > > at the beginning of the changelog).
> > > >
> > >
> > > Okay, I will submit a test case covering all cases.
> > > Because I still need to spend time learning how to use the fssum utility
> > > for the last test case you reviewed, so I will submit the test case later.
> >
> > BingJing, any progress with the test case?
> > We would love to have that in fstests to help prevent regressions in the future.
> >
> > Thanks.
> >
>
> Hi Filipe,
>
> I used to wait the git pull of my patch to the master branch, then I can
> attach the commit uuid in tests as before. However, it seems that there're
> troubles for the maintainer to pull all of the bug fixes into the new
> kernel 6.0 at once.

Ah, yes, it will only be in 6.1, so it should land in Linus' tree in
about 2 weeks maybe (we are at 6.0-rc6, next week is likely the final
rc7).
If you want to wait for that, it's fine. Just wanted to check if you
needed any help and that it wasn't forgotten.

Thanks!

>
> Okay, I will try to create a test covering the cases this weekend.
> Sorry for the inconvenience made.
>
> Thanks.
>
> > >
> > > Thanks.
> > >
> > > > Thanks.
> > > >
> > > > > Reviewed-by: Robbie Ko <robbieko@synology.com>
> > > > > Signed-off-by: BingJing Chang <bingjingc@synology.com>
> > > > > ---
> > > > >  fs/btrfs/send.c | 214 +++++++++++++++++++-----------------------------
> > > > >  1 file changed, 85 insertions(+), 129 deletions(-)
> > > > >
> > > > > diff --git a/fs/btrfs/send.c b/fs/btrfs/send.c
> > > > > index f8d77a33b9b7..6ab1ba66ff4b 100644
> > > > > --- a/fs/btrfs/send.c
> > > > > +++ b/fs/btrfs/send.c
> > > > > @@ -850,6 +850,7 @@ struct btrfs_inode_info {
> > > > >         u64 gid;
> > > > >         u64 rdev;
> > > > >         u64 attr;
> > > > > +       u64 nlink;
> > > > >  };
> > > > >
> > > > >  /*
> > > > > @@ -888,6 +889,7 @@ static int get_inode_info(struct btrfs_root *root, u64 ino,
> > > > >         info->uid = btrfs_inode_uid(path->nodes[0], ii);
> > > > >         info->gid = btrfs_inode_gid(path->nodes[0], ii);
> > > > >         info->rdev = btrfs_inode_rdev(path->nodes[0], ii);
> > > > > +       info->nlink = btrfs_inode_nlink(path->nodes[0], ii);
> > > > >         /*
> > > > >          * Transfer the unchanged u64 value of btrfs_inode_item::flags, that's
> > > > >          * otherwise logically split to 32/32 parts.
> > > > > @@ -1652,19 +1654,22 @@ static int get_cur_inode_state(struct send_ctx *sctx, u64 ino, u64 gen)
> > > > >         int right_ret;
> > > > >         u64 left_gen;
> > > > >         u64 right_gen;
> > > > > +       struct btrfs_inode_info info;
> > > > >
> > > > > -       ret = get_inode_gen(sctx->send_root, ino, &left_gen);
> > > > > +       ret = get_inode_info(sctx->send_root, ino, &info);
> > > > >         if (ret < 0 && ret != -ENOENT)
> > > > >                 goto out;
> > > > > -       left_ret = ret;
> > > > > +       left_ret = (info.nlink == 0) ? -ENOENT : ret;
> > > > > +       left_gen = info.gen;
> > > > >
> > > > >         if (!sctx->parent_root) {
> > > > >                 right_ret = -ENOENT;
> > > > >         } else {
> > > > > -               ret = get_inode_gen(sctx->parent_root, ino, &right_gen);
> > > > > +               ret = get_inode_info(sctx->parent_root, ino, &info);
> > > > >                 if (ret < 0 && ret != -ENOENT)
> > > > >                         goto out;
> > > > > -               right_ret = ret;
> > > > > +               right_ret = (info.nlink == 0) ? -ENOENT : ret;
> > > > > +               right_gen = info.gen;
> > > > >         }
> > > > >
> > > > >         if (!left_ret && !right_ret) {
> > > > > @@ -6413,86 +6418,6 @@ static int finish_inode_if_needed(struct send_ctx *sctx, int at_end)
> > > > >         return ret;
> > > > >  }
> > > > >
> > > > > -struct parent_paths_ctx {
> > > > > -       struct list_head *refs;
> > > > > -       struct send_ctx *sctx;
> > > > > -};
> > > > > -
> > > > > -static int record_parent_ref(int num, u64 dir, int index, struct fs_path *name,
> > > > > -                            void *ctx)
> > > > > -{
> > > > > -       struct parent_paths_ctx *ppctx = ctx;
> > > > > -
> > > > > -       /*
> > > > > -        * Pass 0 as the generation for the directory, we don't care about it
> > > > > -        * here as we have no new references to add, we just want to delete all
> > > > > -        * references for an inode.
> > > > > -        */
> > > > > -       return record_ref_in_tree(&ppctx->sctx->rbtree_deleted_refs, ppctx->refs,
> > > > > -                                 name, dir, 0, ppctx->sctx);
> > > > > -}
> > > > > -
> > > > > -/*
> > > > > - * Issue unlink operations for all paths of the current inode found in the
> > > > > - * parent snapshot.
> > > > > - */
> > > > > -static int btrfs_unlink_all_paths(struct send_ctx *sctx)
> > > > > -{
> > > > > -       LIST_HEAD(deleted_refs);
> > > > > -       struct btrfs_path *path;
> > > > > -       struct btrfs_root *root = sctx->parent_root;
> > > > > -       struct btrfs_key key;
> > > > > -       struct btrfs_key found_key;
> > > > > -       struct parent_paths_ctx ctx;
> > > > > -       int iter_ret = 0;
> > > > > -       int ret;
> > > > > -
> > > > > -       path = alloc_path_for_send();
> > > > > -       if (!path)
> > > > > -               return -ENOMEM;
> > > > > -
> > > > > -       key.objectid = sctx->cur_ino;
> > > > > -       key.type = BTRFS_INODE_REF_KEY;
> > > > > -       key.offset = 0;
> > > > > -
> > > > > -       ctx.refs = &deleted_refs;
> > > > > -       ctx.sctx = sctx;
> > > > > -
> > > > > -       btrfs_for_each_slot(root, &key, &found_key, path, iter_ret) {
> > > > > -               if (found_key.objectid != key.objectid)
> > > > > -                       break;
> > > > > -               if (found_key.type != key.type &&
> > > > > -                   found_key.type != BTRFS_INODE_EXTREF_KEY)
> > > > > -                       break;
> > > > > -
> > > > > -               ret = iterate_inode_ref(root, path, &found_key, 1,
> > > > > -                                       record_parent_ref, &ctx);
> > > > > -               if (ret < 0)
> > > > > -                       goto out;
> > > > > -       }
> > > > > -       /* Catch error found during iteration */
> > > > > -       if (iter_ret < 0) {
> > > > > -               ret = iter_ret;
> > > > > -               goto out;
> > > > > -       }
> > > > > -
> > > > > -       while (!list_empty(&deleted_refs)) {
> > > > > -               struct recorded_ref *ref;
> > > > > -
> > > > > -               ref = list_first_entry(&deleted_refs, struct recorded_ref, list);
> > > > > -               ret = send_unlink(sctx, ref->full_path);
> > > > > -               if (ret < 0)
> > > > > -                       goto out;
> > > > > -               recorded_ref_free(ref);
> > > > > -       }
> > > > > -       ret = 0;
> > > > > -out:
> > > > > -       btrfs_free_path(path);
> > > > > -       if (ret)
> > > > > -               __free_recorded_refs(&deleted_refs);
> > > > > -       return ret;
> > > > > -}
> > > > > -
> > > > >  static void close_current_inode(struct send_ctx *sctx)
> > > > >  {
> > > > >         u64 i_size;
> > > > > @@ -6583,25 +6508,37 @@ static int changed_inode(struct send_ctx *sctx,
> > > > >          * file descriptor against it or turning a RO snapshot into RW mode,
> > > > >          * keep an open file descriptor against a file, delete it and then
> > > > >          * turn the snapshot back to RO mode before using it for a send
> > > > > -        * operation. So if we find such cases, ignore the inode and all its
> > > > > -        * items completely if it's a new inode, or if it's a changed inode
> > > > > -        * make sure all its previous paths (from the parent snapshot) are all
> > > > > -        * unlinked and all other the inode items are ignored.
> > > > > +        * operation. The former is what the receiver operation does.
> > > > > +        * Therefore, if we want to send these snapshots soon after they're
> > > > > +        * received, we need to handle orphan inodes as well. Moreover,
> > > > > +        * orphans can appear not only in the send snapshot but also in the
> > > > > +        * parent snapshot. Here are several cases:
> > > > > +        *
> > > > > +        * Case 1: BTRFS_COMPARE_TREE_NEW
> > > > > +        *       |  send snapshot  | action
> > > > > +        * --------------------------------
> > > > > +        * nlink |        0        | ignore
> > > > > +        *
> > > > > +        * Case 2: BTRFS_COMPARE_TREE_DELETED
> > > > > +        *       | parent snapshot | action
> > > > > +        * ----------------------------------
> > > > > +        * nlink |        0        | as usual
> > > > > +        * Note: No unlinks will be sent because there're no paths for it.
> > > > > +        *
> > > > > +        * Case 3: BTRFS_COMPARE_TREE_CHANGED
> > > > > +        *           |       | parent snapshot | send snapshot | action
> > > > > +        * -----------------------------------------------------------------------
> > > > > +        * subcase 1 | nlink |        0        |       0       | ignore
> > > > > +        * subcase 2 | nlink |       >0        |       0       | new_gen(deletion)
> > > > > +        * subcase 3 | nlink |        0        |      >0       | new_gen(creation)
> > > > > +        *
> > > > >          */
> > > > > -       if (result == BTRFS_COMPARE_TREE_NEW ||
> > > > > -           result == BTRFS_COMPARE_TREE_CHANGED) {
> > > > > -               u32 nlinks;
> > > > > -
> > > > > -               nlinks = btrfs_inode_nlink(sctx->left_path->nodes[0], left_ii);
> > > > > -               if (nlinks == 0) {
> > > > > +       if (result == BTRFS_COMPARE_TREE_NEW) {
> > > > > +               if (btrfs_inode_nlink(sctx->left_path->nodes[0], left_ii) ==
> > > > > +                                     0) {
> > > > >                         sctx->ignore_cur_inode = true;
> > > > > -                       if (result == BTRFS_COMPARE_TREE_CHANGED)
> > > > > -                               ret = btrfs_unlink_all_paths(sctx);
> > > > >                         goto out;
> > > > >                 }
> > > > > -       }
> > > > > -
> > > > > -       if (result == BTRFS_COMPARE_TREE_NEW) {
> > > > >                 sctx->cur_inode_gen = left_gen;
> > > > >                 sctx->cur_inode_new = true;
> > > > >                 sctx->cur_inode_deleted = false;
> > > > > @@ -6622,6 +6559,18 @@ static int changed_inode(struct send_ctx *sctx,
> > > > >                 sctx->cur_inode_mode = btrfs_inode_mode(
> > > > >                                 sctx->right_path->nodes[0], right_ii);
> > > > >         } else if (result == BTRFS_COMPARE_TREE_CHANGED) {
> > > > > +               u32 new_nlinks, old_nlinks;
> > > > > +
> > > > > +               new_nlinks = btrfs_inode_nlink(sctx->left_path->nodes[0],
> > > > > +                                              left_ii);
> > > > > +               old_nlinks = btrfs_inode_nlink(sctx->right_path->nodes[0],
> > > > > +                                              right_ii);
> > > > > +               if (new_nlinks == 0 && old_nlinks == 0) {
> > > > > +                       sctx->ignore_cur_inode = true;
> > > > > +                       goto out;
> > > > > +               } else if (new_nlinks == 0 || old_nlinks == 0) {
> > > > > +                       sctx->cur_inode_new_gen = 1;
> > > > > +               }
> > > > >                 /*
> > > > >                  * We need to do some special handling in case the inode was
> > > > >                  * reported as changed with a changed generation number. This
> > > > > @@ -6648,38 +6597,45 @@ static int changed_inode(struct send_ctx *sctx,
> > > > >                         /*
> > > > >                          * Now process the inode as if it was new.
> > > > >                          */
> > > > > -                       sctx->cur_inode_gen = left_gen;
> > > > > -                       sctx->cur_inode_new = true;
> > > > > -                       sctx->cur_inode_deleted = false;
> > > > > -                       sctx->cur_inode_size = btrfs_inode_size(
> > > > > -                                       sctx->left_path->nodes[0], left_ii);
> > > > > -                       sctx->cur_inode_mode = btrfs_inode_mode(
> > > > > -                                       sctx->left_path->nodes[0], left_ii);
> > > > > -                       sctx->cur_inode_rdev = btrfs_inode_rdev(
> > > > > -                                       sctx->left_path->nodes[0], left_ii);
> > > > > -                       ret = send_create_inode_if_needed(sctx);
> > > > > -                       if (ret < 0)
> > > > > -                               goto out;
> > > > > +                       if (new_nlinks > 0) {
> > > > > +                               sctx->cur_inode_gen = left_gen;
> > > > > +                               sctx->cur_inode_new = true;
> > > > > +                               sctx->cur_inode_deleted = false;
> > > > > +                               sctx->cur_inode_size = btrfs_inode_size(
> > > > > +                                               sctx->left_path->nodes[0],
> > > > > +                                               left_ii);
> > > > > +                               sctx->cur_inode_mode = btrfs_inode_mode(
> > > > > +                                               sctx->left_path->nodes[0],
> > > > > +                                               left_ii);
> > > > > +                               sctx->cur_inode_rdev = btrfs_inode_rdev(
> > > > > +                                               sctx->left_path->nodes[0],
> > > > > +                                               left_ii);
> > > > > +                               ret = send_create_inode_if_needed(sctx);
> > > > > +                               if (ret < 0)
> > > > > +                                       goto out;
> > > > >
> > > > > -                       ret = process_all_refs(sctx, BTRFS_COMPARE_TREE_NEW);
> > > > > -                       if (ret < 0)
> > > > > -                               goto out;
> > > > > -                       /*
> > > > > -                        * Advance send_progress now as we did not get into
> > > > > -                        * process_recorded_refs_if_needed in the new_gen case.
> > > > > -                        */
> > > > > -                       sctx->send_progress = sctx->cur_ino + 1;
> > > > > +                               ret = process_all_refs(sctx,
> > > > > +                                               BTRFS_COMPARE_TREE_NEW);
> > > > > +                               if (ret < 0)
> > > > > +                                       goto out;
> > > > > +                               /*
> > > > > +                                * Advance send_progress now as we did not get
> > > > > +                                * into process_recorded_refs_if_needed in the
> > > > > +                                * new_gen case.
> > > > > +                                */
> > > > > +                               sctx->send_progress = sctx->cur_ino + 1;
> > > > >
> > > > > -                       /*
> > > > > -                        * Now process all extents and xattrs of the inode as if
> > > > > -                        * they were all new.
> > > > > -                        */
> > > > > -                       ret = process_all_extents(sctx);
> > > > > -                       if (ret < 0)
> > > > > -                               goto out;
> > > > > -                       ret = process_all_new_xattrs(sctx);
> > > > > -                       if (ret < 0)
> > > > > -                               goto out;
> > > > > +                               /*
> > > > > +                                * Now process all extents and xattrs of the
> > > > > +                                * inode as if they were all new.
> > > > > +                                */
> > > > > +                               ret = process_all_extents(sctx);
> > > > > +                               if (ret < 0)
> > > > > +                                       goto out;
> > > > > +                               ret = process_all_new_xattrs(sctx);
> > > > > +                               if (ret < 0)
> > > > > +                                       goto out;
> > > > > +                       }
> > > > >                 } else {
> > > > >                         sctx->cur_inode_gen = left_gen;
> > > > >                         sctx->cur_inode_new = false;
> > > > > --
> > > > > 2.37.1
> > > > >
diff mbox series

Patch

diff --git a/fs/btrfs/send.c b/fs/btrfs/send.c
index f8d77a33b9b7..6ab1ba66ff4b 100644
--- a/fs/btrfs/send.c
+++ b/fs/btrfs/send.c
@@ -850,6 +850,7 @@  struct btrfs_inode_info {
 	u64 gid;
 	u64 rdev;
 	u64 attr;
+	u64 nlink;
 };
 
 /*
@@ -888,6 +889,7 @@  static int get_inode_info(struct btrfs_root *root, u64 ino,
 	info->uid = btrfs_inode_uid(path->nodes[0], ii);
 	info->gid = btrfs_inode_gid(path->nodes[0], ii);
 	info->rdev = btrfs_inode_rdev(path->nodes[0], ii);
+	info->nlink = btrfs_inode_nlink(path->nodes[0], ii);
 	/*
 	 * Transfer the unchanged u64 value of btrfs_inode_item::flags, that's
 	 * otherwise logically split to 32/32 parts.
@@ -1652,19 +1654,22 @@  static int get_cur_inode_state(struct send_ctx *sctx, u64 ino, u64 gen)
 	int right_ret;
 	u64 left_gen;
 	u64 right_gen;
+	struct btrfs_inode_info info;
 
-	ret = get_inode_gen(sctx->send_root, ino, &left_gen);
+	ret = get_inode_info(sctx->send_root, ino, &info);
 	if (ret < 0 && ret != -ENOENT)
 		goto out;
-	left_ret = ret;
+	left_ret = (info.nlink == 0) ? -ENOENT : ret;
+	left_gen = info.gen;
 
 	if (!sctx->parent_root) {
 		right_ret = -ENOENT;
 	} else {
-		ret = get_inode_gen(sctx->parent_root, ino, &right_gen);
+		ret = get_inode_info(sctx->parent_root, ino, &info);
 		if (ret < 0 && ret != -ENOENT)
 			goto out;
-		right_ret = ret;
+		right_ret = (info.nlink == 0) ? -ENOENT : ret;
+		right_gen = info.gen;
 	}
 
 	if (!left_ret && !right_ret) {
@@ -6413,86 +6418,6 @@  static int finish_inode_if_needed(struct send_ctx *sctx, int at_end)
 	return ret;
 }
 
-struct parent_paths_ctx {
-	struct list_head *refs;
-	struct send_ctx *sctx;
-};
-
-static int record_parent_ref(int num, u64 dir, int index, struct fs_path *name,
-			     void *ctx)
-{
-	struct parent_paths_ctx *ppctx = ctx;
-
-	/*
-	 * Pass 0 as the generation for the directory, we don't care about it
-	 * here as we have no new references to add, we just want to delete all
-	 * references for an inode.
-	 */
-	return record_ref_in_tree(&ppctx->sctx->rbtree_deleted_refs, ppctx->refs,
-				  name, dir, 0, ppctx->sctx);
-}
-
-/*
- * Issue unlink operations for all paths of the current inode found in the
- * parent snapshot.
- */
-static int btrfs_unlink_all_paths(struct send_ctx *sctx)
-{
-	LIST_HEAD(deleted_refs);
-	struct btrfs_path *path;
-	struct btrfs_root *root = sctx->parent_root;
-	struct btrfs_key key;
-	struct btrfs_key found_key;
-	struct parent_paths_ctx ctx;
-	int iter_ret = 0;
-	int ret;
-
-	path = alloc_path_for_send();
-	if (!path)
-		return -ENOMEM;
-
-	key.objectid = sctx->cur_ino;
-	key.type = BTRFS_INODE_REF_KEY;
-	key.offset = 0;
-
-	ctx.refs = &deleted_refs;
-	ctx.sctx = sctx;
-
-	btrfs_for_each_slot(root, &key, &found_key, path, iter_ret) {
-		if (found_key.objectid != key.objectid)
-			break;
-		if (found_key.type != key.type &&
-		    found_key.type != BTRFS_INODE_EXTREF_KEY)
-			break;
-
-		ret = iterate_inode_ref(root, path, &found_key, 1,
-					record_parent_ref, &ctx);
-		if (ret < 0)
-			goto out;
-	}
-	/* Catch error found during iteration */
-	if (iter_ret < 0) {
-		ret = iter_ret;
-		goto out;
-	}
-
-	while (!list_empty(&deleted_refs)) {
-		struct recorded_ref *ref;
-
-		ref = list_first_entry(&deleted_refs, struct recorded_ref, list);
-		ret = send_unlink(sctx, ref->full_path);
-		if (ret < 0)
-			goto out;
-		recorded_ref_free(ref);
-	}
-	ret = 0;
-out:
-	btrfs_free_path(path);
-	if (ret)
-		__free_recorded_refs(&deleted_refs);
-	return ret;
-}
-
 static void close_current_inode(struct send_ctx *sctx)
 {
 	u64 i_size;
@@ -6583,25 +6508,37 @@  static int changed_inode(struct send_ctx *sctx,
 	 * file descriptor against it or turning a RO snapshot into RW mode,
 	 * keep an open file descriptor against a file, delete it and then
 	 * turn the snapshot back to RO mode before using it for a send
-	 * operation. So if we find such cases, ignore the inode and all its
-	 * items completely if it's a new inode, or if it's a changed inode
-	 * make sure all its previous paths (from the parent snapshot) are all
-	 * unlinked and all other the inode items are ignored.
+	 * operation. The former is what the receiver operation does.
+	 * Therefore, if we want to send these snapshots soon after they're
+	 * received, we need to handle orphan inodes as well. Moreover,
+	 * orphans can appear not only in the send snapshot but also in the
+	 * parent snapshot. Here are several cases:
+	 *
+	 * Case 1: BTRFS_COMPARE_TREE_NEW
+	 *       |  send snapshot  | action
+	 * --------------------------------
+	 * nlink |        0        | ignore
+	 *
+	 * Case 2: BTRFS_COMPARE_TREE_DELETED
+	 *       | parent snapshot | action
+	 * ----------------------------------
+	 * nlink |        0        | as usual
+	 * Note: No unlinks will be sent because there're no paths for it.
+	 *
+	 * Case 3: BTRFS_COMPARE_TREE_CHANGED
+	 *           |       | parent snapshot | send snapshot | action
+	 * -----------------------------------------------------------------------
+	 * subcase 1 | nlink |        0        |       0       | ignore
+	 * subcase 2 | nlink |       >0        |       0       | new_gen(deletion)
+	 * subcase 3 | nlink |        0        |      >0       | new_gen(creation)
+	 *
 	 */
-	if (result == BTRFS_COMPARE_TREE_NEW ||
-	    result == BTRFS_COMPARE_TREE_CHANGED) {
-		u32 nlinks;
-
-		nlinks = btrfs_inode_nlink(sctx->left_path->nodes[0], left_ii);
-		if (nlinks == 0) {
+	if (result == BTRFS_COMPARE_TREE_NEW) {
+		if (btrfs_inode_nlink(sctx->left_path->nodes[0], left_ii) ==
+				      0) {
 			sctx->ignore_cur_inode = true;
-			if (result == BTRFS_COMPARE_TREE_CHANGED)
-				ret = btrfs_unlink_all_paths(sctx);
 			goto out;
 		}
-	}
-
-	if (result == BTRFS_COMPARE_TREE_NEW) {
 		sctx->cur_inode_gen = left_gen;
 		sctx->cur_inode_new = true;
 		sctx->cur_inode_deleted = false;
@@ -6622,6 +6559,18 @@  static int changed_inode(struct send_ctx *sctx,
 		sctx->cur_inode_mode = btrfs_inode_mode(
 				sctx->right_path->nodes[0], right_ii);
 	} else if (result == BTRFS_COMPARE_TREE_CHANGED) {
+		u32 new_nlinks, old_nlinks;
+
+		new_nlinks = btrfs_inode_nlink(sctx->left_path->nodes[0],
+					       left_ii);
+		old_nlinks = btrfs_inode_nlink(sctx->right_path->nodes[0],
+					       right_ii);
+		if (new_nlinks == 0 && old_nlinks == 0) {
+			sctx->ignore_cur_inode = true;
+			goto out;
+		} else if (new_nlinks == 0 || old_nlinks == 0) {
+			sctx->cur_inode_new_gen = 1;
+		}
 		/*
 		 * We need to do some special handling in case the inode was
 		 * reported as changed with a changed generation number. This
@@ -6648,38 +6597,45 @@  static int changed_inode(struct send_ctx *sctx,
 			/*
 			 * Now process the inode as if it was new.
 			 */
-			sctx->cur_inode_gen = left_gen;
-			sctx->cur_inode_new = true;
-			sctx->cur_inode_deleted = false;
-			sctx->cur_inode_size = btrfs_inode_size(
-					sctx->left_path->nodes[0], left_ii);
-			sctx->cur_inode_mode = btrfs_inode_mode(
-					sctx->left_path->nodes[0], left_ii);
-			sctx->cur_inode_rdev = btrfs_inode_rdev(
-					sctx->left_path->nodes[0], left_ii);
-			ret = send_create_inode_if_needed(sctx);
-			if (ret < 0)
-				goto out;
+			if (new_nlinks > 0) {
+				sctx->cur_inode_gen = left_gen;
+				sctx->cur_inode_new = true;
+				sctx->cur_inode_deleted = false;
+				sctx->cur_inode_size = btrfs_inode_size(
+						sctx->left_path->nodes[0],
+						left_ii);
+				sctx->cur_inode_mode = btrfs_inode_mode(
+						sctx->left_path->nodes[0],
+						left_ii);
+				sctx->cur_inode_rdev = btrfs_inode_rdev(
+						sctx->left_path->nodes[0],
+						left_ii);
+				ret = send_create_inode_if_needed(sctx);
+				if (ret < 0)
+					goto out;
 
-			ret = process_all_refs(sctx, BTRFS_COMPARE_TREE_NEW);
-			if (ret < 0)
-				goto out;
-			/*
-			 * Advance send_progress now as we did not get into
-			 * process_recorded_refs_if_needed in the new_gen case.
-			 */
-			sctx->send_progress = sctx->cur_ino + 1;
+				ret = process_all_refs(sctx,
+						BTRFS_COMPARE_TREE_NEW);
+				if (ret < 0)
+					goto out;
+				/*
+				 * Advance send_progress now as we did not get
+				 * into process_recorded_refs_if_needed in the
+				 * new_gen case.
+				 */
+				sctx->send_progress = sctx->cur_ino + 1;
 
-			/*
-			 * Now process all extents and xattrs of the inode as if
-			 * they were all new.
-			 */
-			ret = process_all_extents(sctx);
-			if (ret < 0)
-				goto out;
-			ret = process_all_new_xattrs(sctx);
-			if (ret < 0)
-				goto out;
+				/*
+				 * Now process all extents and xattrs of the
+				 * inode as if they were all new.
+				 */
+				ret = process_all_extents(sctx);
+				if (ret < 0)
+					goto out;
+				ret = process_all_new_xattrs(sctx);
+				if (ret < 0)
+					goto out;
+			}
 		} else {
 			sctx->cur_inode_gen = left_gen;
 			sctx->cur_inode_new = false;