diff mbox series

[v2] btrfs: fix infinite directory reads

Message ID 1ae6e30a71112e07c727f9e93ff32032051bbce7.1706176168.git.wqu@suse.com (mailing list archive)
State New, archived
Headers show
Series [v2] btrfs: fix infinite directory reads | expand

Commit Message

Qu Wenruo Jan. 25, 2024, 9:50 a.m. UTC
From: Filipe Manana <fdmanana@suse.com>

[ Upstream commit 9b378f6ad48cfa195ed868db9123c09ee7ec5ea2 ]

The readdir implementation currently processes always up to the last index
it finds. This however can result in an infinite loop if the directory has
a large number of entries such that they won't all fit in the given buffer
passed to the readdir callback, that is, dir_emit() returns a non-zero
value. Because in that case readdir() will be called again and if in the
meanwhile new directory entries were added and we still can't put all the
remaining entries in the buffer, we keep repeating this over and over.

The following C program and test script reproduce the problem:

  $ cat /mnt/readdir_prog.c
  #include <sys/types.h>
  #include <dirent.h>
  #include <stdio.h>

  int main(int argc, char *argv[])
  {
    DIR *dir = opendir(".");
    struct dirent *dd;

    while ((dd = readdir(dir))) {
      printf("%s\n", dd->d_name);
      rename(dd->d_name, "TEMPFILE");
      rename("TEMPFILE", dd->d_name);
    }
    closedir(dir);
  }

  $ gcc -o /mnt/readdir_prog /mnt/readdir_prog.c

  $ cat test.sh
  #!/bin/bash

  DEV=/dev/sdi
  MNT=/mnt/sdi

  mkfs.btrfs -f $DEV &> /dev/null
  #mkfs.xfs -f $DEV &> /dev/null
  #mkfs.ext4 -F $DEV &> /dev/null

  mount $DEV $MNT

  mkdir $MNT/testdir
  for ((i = 1; i <= 2000; i++)); do
      echo -n > $MNT/testdir/file_$i
  done

  cd $MNT/testdir
  /mnt/readdir_prog

  cd /mnt

  umount $MNT

This behaviour is surprising to applications and it's unlike ext4, xfs,
tmpfs, vfat and other filesystems, which always finish. In this case where
new entries were added due to renames, some file names may be reported
more than once, but this varies according to each filesystem - for example
ext4 never reported the same file more than once while xfs reports the
first 13 file names twice.

So change our readdir implementation to track the last index number when
opendir() is called and then make readdir() never process beyond that
index number. This gives the same behaviour as ext4.

Reported-by: Rob Landley <rob@landley.net>
Link: https://lore.kernel.org/linux-btrfs/2c8c55ec-04c6-e0dc-9c5c-8c7924778c35@landley.net/
Link: https://bugzilla.kernel.org/show_bug.cgi?id=217681
CC: stable@vger.kernel.org # 5.15
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
[ Resolve a conflict due to member changes in 96d89923fa94 ]
Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/ctree.h         |   1 +
 fs/btrfs/delayed-inode.c |   5 +-
 fs/btrfs/delayed-inode.h |   1 +
 fs/btrfs/inode.c         | 131 +++++++++++++++++++++++----------------
 4 files changed, 84 insertions(+), 54 deletions(-)
---
Changelog:
v2:
- Fix the upstream commit hash

Comments

Filipe Manana Jan. 25, 2024, 10:02 a.m. UTC | #1
On Thu, Jan 25, 2024 at 9:51 AM Qu Wenruo <wqu@suse.com> wrote:
>
> From: Filipe Manana <fdmanana@suse.com>
>
> [ Upstream commit 9b378f6ad48cfa195ed868db9123c09ee7ec5ea2 ]
>
> The readdir implementation currently processes always up to the last index
> it finds. This however can result in an infinite loop if the directory has
> a large number of entries such that they won't all fit in the given buffer
> passed to the readdir callback, that is, dir_emit() returns a non-zero
> value. Because in that case readdir() will be called again and if in the
> meanwhile new directory entries were added and we still can't put all the
> remaining entries in the buffer, we keep repeating this over and over.
>
> The following C program and test script reproduce the problem:
>
>   $ cat /mnt/readdir_prog.c
>   #include <sys/types.h>
>   #include <dirent.h>
>   #include <stdio.h>
>
>   int main(int argc, char *argv[])
>   {
>     DIR *dir = opendir(".");
>     struct dirent *dd;
>
>     while ((dd = readdir(dir))) {
>       printf("%s\n", dd->d_name);
>       rename(dd->d_name, "TEMPFILE");
>       rename("TEMPFILE", dd->d_name);
>     }
>     closedir(dir);
>   }
>
>   $ gcc -o /mnt/readdir_prog /mnt/readdir_prog.c
>
>   $ cat test.sh
>   #!/bin/bash
>
>   DEV=/dev/sdi
>   MNT=/mnt/sdi
>
>   mkfs.btrfs -f $DEV &> /dev/null
>   #mkfs.xfs -f $DEV &> /dev/null
>   #mkfs.ext4 -F $DEV &> /dev/null
>
>   mount $DEV $MNT
>
>   mkdir $MNT/testdir
>   for ((i = 1; i <= 2000; i++)); do
>       echo -n > $MNT/testdir/file_$i
>   done
>
>   cd $MNT/testdir
>   /mnt/readdir_prog
>
>   cd /mnt
>
>   umount $MNT
>
> This behaviour is surprising to applications and it's unlike ext4, xfs,
> tmpfs, vfat and other filesystems, which always finish. In this case where
> new entries were added due to renames, some file names may be reported
> more than once, but this varies according to each filesystem - for example
> ext4 never reported the same file more than once while xfs reports the
> first 13 file names twice.
>
> So change our readdir implementation to track the last index number when
> opendir() is called and then make readdir() never process beyond that
> index number. This gives the same behaviour as ext4.
>
> Reported-by: Rob Landley <rob@landley.net>
> Link: https://lore.kernel.org/linux-btrfs/2c8c55ec-04c6-e0dc-9c5c-8c7924778c35@landley.net/
> Link: https://bugzilla.kernel.org/show_bug.cgi?id=217681
> CC: stable@vger.kernel.org # 5.15
> Signed-off-by: Filipe Manana <fdmanana@suse.com>
> Signed-off-by: David Sterba <dsterba@suse.com>
> [ Resolve a conflict due to member changes in 96d89923fa94 ]
> Signed-off-by: Qu Wenruo <wqu@suse.com>
> ---

Thanks for the backport, and running the corresponding test case from
fstests to verify it's working.

However when backporting a commit, one should also check if there are
fixes for that commit, as they
often introduce regressions or have some other bug - and that's the
case here. We also need to backport
the following 3 commits:

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=357950361cbc6d54fb68ed878265c647384684ae
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=e60aa5da14d01fed8411202dbe4adf6c44bd2a57
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=8e7f82deb0c0386a03b62e30082574347f8b57d5

One regression, the one regarding rewinddir(3), even has a test case
in fstests too (generic/471) and would have been caught
when running the "dir" group tests in fstests:

https://git.kernel.org/pub/scm/fs/xfs/xfstests-dev.git/commit/?h=for-next&id=68b958f5dc4ab13cfd86f7fb82621f9f022b7626

I'll work on making backports of those 3 other patches on top of your
backport, and then send all of them in a series,
including your patch, to make it easier to follow and apply all at once.

Thanks.


>  fs/btrfs/ctree.h         |   1 +
>  fs/btrfs/delayed-inode.c |   5 +-
>  fs/btrfs/delayed-inode.h |   1 +
>  fs/btrfs/inode.c         | 131 +++++++++++++++++++++++----------------
>  4 files changed, 84 insertions(+), 54 deletions(-)
> ---
> Changelog:
> v2:
> - Fix the upstream commit hash
>
> diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
> index 1467bf439cb4..7905f178efa3 100644
> --- a/fs/btrfs/ctree.h
> +++ b/fs/btrfs/ctree.h
> @@ -1361,6 +1361,7 @@ struct btrfs_drop_extents_args {
>
>  struct btrfs_file_private {
>         void *filldir_buf;
> +       u64 last_index;
>  };
>
>
> diff --git a/fs/btrfs/delayed-inode.c b/fs/btrfs/delayed-inode.c
> index fd951aeaeac5..5a98c5da1225 100644
> --- a/fs/btrfs/delayed-inode.c
> +++ b/fs/btrfs/delayed-inode.c
> @@ -1513,6 +1513,7 @@ int btrfs_inode_delayed_dir_index_count(struct btrfs_inode *inode)
>  }
>
>  bool btrfs_readdir_get_delayed_items(struct inode *inode,
> +                                    u64 last_index,
>                                      struct list_head *ins_list,
>                                      struct list_head *del_list)
>  {
> @@ -1532,14 +1533,14 @@ bool btrfs_readdir_get_delayed_items(struct inode *inode,
>
>         mutex_lock(&delayed_node->mutex);
>         item = __btrfs_first_delayed_insertion_item(delayed_node);
> -       while (item) {
> +       while (item && item->key.offset <= last_index) {
>                 refcount_inc(&item->refs);
>                 list_add_tail(&item->readdir_list, ins_list);
>                 item = __btrfs_next_delayed_item(item);
>         }
>
>         item = __btrfs_first_delayed_deletion_item(delayed_node);
> -       while (item) {
> +       while (item && item->key.offset <= last_index) {
>                 refcount_inc(&item->refs);
>                 list_add_tail(&item->readdir_list, del_list);
>                 item = __btrfs_next_delayed_item(item);
> diff --git a/fs/btrfs/delayed-inode.h b/fs/btrfs/delayed-inode.h
> index b2412160c5bc..a9cfce856d2e 100644
> --- a/fs/btrfs/delayed-inode.h
> +++ b/fs/btrfs/delayed-inode.h
> @@ -123,6 +123,7 @@ void btrfs_destroy_delayed_inodes(struct btrfs_fs_info *fs_info);
>
>  /* Used for readdir() */
>  bool btrfs_readdir_get_delayed_items(struct inode *inode,
> +                                    u64 last_index,
>                                      struct list_head *ins_list,
>                                      struct list_head *del_list);
>  void btrfs_readdir_put_delayed_items(struct inode *inode,
> diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
> index 95af29634e55..1df374ce829b 100644
> --- a/fs/btrfs/inode.c
> +++ b/fs/btrfs/inode.c
> @@ -6121,6 +6121,74 @@ static struct dentry *btrfs_lookup(struct inode *dir, struct dentry *dentry,
>         return d_splice_alias(inode, dentry);
>  }
>
> +/*
> + * Find the highest existing sequence number in a directory and then set the
> + * in-memory index_cnt variable to the first free sequence number.
> + */
> +static int btrfs_set_inode_index_count(struct btrfs_inode *inode)
> +{
> +       struct btrfs_root *root = inode->root;
> +       struct btrfs_key key, found_key;
> +       struct btrfs_path *path;
> +       struct extent_buffer *leaf;
> +       int ret;
> +
> +       key.objectid = btrfs_ino(inode);
> +       key.type = BTRFS_DIR_INDEX_KEY;
> +       key.offset = (u64)-1;
> +
> +       path = btrfs_alloc_path();
> +       if (!path)
> +               return -ENOMEM;
> +
> +       ret = btrfs_search_slot(NULL, root, &key, path, 0, 0);
> +       if (ret < 0)
> +               goto out;
> +       /* FIXME: we should be able to handle this */
> +       if (ret == 0)
> +               goto out;
> +       ret = 0;
> +
> +       if (path->slots[0] == 0) {
> +               inode->index_cnt = BTRFS_DIR_START_INDEX;
> +               goto out;
> +       }
> +
> +       path->slots[0]--;
> +
> +       leaf = path->nodes[0];
> +       btrfs_item_key_to_cpu(leaf, &found_key, path->slots[0]);
> +
> +       if (found_key.objectid != btrfs_ino(inode) ||
> +           found_key.type != BTRFS_DIR_INDEX_KEY) {
> +               inode->index_cnt = BTRFS_DIR_START_INDEX;
> +               goto out;
> +       }
> +
> +       inode->index_cnt = found_key.offset + 1;
> +out:
> +       btrfs_free_path(path);
> +       return ret;
> +}
> +
> +static int btrfs_get_dir_last_index(struct btrfs_inode *dir, u64 *index)
> +{
> +       if (dir->index_cnt == (u64)-1) {
> +               int ret;
> +
> +               ret = btrfs_inode_delayed_dir_index_count(dir);
> +               if (ret) {
> +                       ret = btrfs_set_inode_index_count(dir);
> +                       if (ret)
> +                               return ret;
> +               }
> +       }
> +
> +       *index = dir->index_cnt;
> +
> +       return 0;
> +}
> +
>  /*
>   * All this infrastructure exists because dir_emit can fault, and we are holding
>   * the tree lock when doing readdir.  For now just allocate a buffer and copy
> @@ -6133,10 +6201,17 @@ static struct dentry *btrfs_lookup(struct inode *dir, struct dentry *dentry,
>  static int btrfs_opendir(struct inode *inode, struct file *file)
>  {
>         struct btrfs_file_private *private;
> +       u64 last_index;
> +       int ret;
> +
> +       ret = btrfs_get_dir_last_index(BTRFS_I(inode), &last_index);
> +       if (ret)
> +               return ret;
>
>         private = kzalloc(sizeof(struct btrfs_file_private), GFP_KERNEL);
>         if (!private)
>                 return -ENOMEM;
> +       private->last_index = last_index;
>         private->filldir_buf = kzalloc(PAGE_SIZE, GFP_KERNEL);
>         if (!private->filldir_buf) {
>                 kfree(private);
> @@ -6205,7 +6280,8 @@ static int btrfs_real_readdir(struct file *file, struct dir_context *ctx)
>
>         INIT_LIST_HEAD(&ins_list);
>         INIT_LIST_HEAD(&del_list);
> -       put = btrfs_readdir_get_delayed_items(inode, &ins_list, &del_list);
> +       put = btrfs_readdir_get_delayed_items(inode, private->last_index,
> +                                             &ins_list, &del_list);
>
>  again:
>         key.type = BTRFS_DIR_INDEX_KEY;
> @@ -6238,6 +6314,8 @@ static int btrfs_real_readdir(struct file *file, struct dir_context *ctx)
>                         break;
>                 if (found_key.offset < ctx->pos)
>                         goto next;
> +               if (found_key.offset > private->last_index)
> +                       break;
>                 if (btrfs_should_delete_dir_index(&del_list, found_key.offset))
>                         goto next;
>                 di = btrfs_item_ptr(leaf, slot, struct btrfs_dir_item);
> @@ -6371,57 +6449,6 @@ static int btrfs_update_time(struct inode *inode, struct timespec64 *now,
>         return dirty ? btrfs_dirty_inode(inode) : 0;
>  }
>
> -/*
> - * find the highest existing sequence number in a directory
> - * and then set the in-memory index_cnt variable to reflect
> - * free sequence numbers
> - */
> -static int btrfs_set_inode_index_count(struct btrfs_inode *inode)
> -{
> -       struct btrfs_root *root = inode->root;
> -       struct btrfs_key key, found_key;
> -       struct btrfs_path *path;
> -       struct extent_buffer *leaf;
> -       int ret;
> -
> -       key.objectid = btrfs_ino(inode);
> -       key.type = BTRFS_DIR_INDEX_KEY;
> -       key.offset = (u64)-1;
> -
> -       path = btrfs_alloc_path();
> -       if (!path)
> -               return -ENOMEM;
> -
> -       ret = btrfs_search_slot(NULL, root, &key, path, 0, 0);
> -       if (ret < 0)
> -               goto out;
> -       /* FIXME: we should be able to handle this */
> -       if (ret == 0)
> -               goto out;
> -       ret = 0;
> -
> -       if (path->slots[0] == 0) {
> -               inode->index_cnt = BTRFS_DIR_START_INDEX;
> -               goto out;
> -       }
> -
> -       path->slots[0]--;
> -
> -       leaf = path->nodes[0];
> -       btrfs_item_key_to_cpu(leaf, &found_key, path->slots[0]);
> -
> -       if (found_key.objectid != btrfs_ino(inode) ||
> -           found_key.type != BTRFS_DIR_INDEX_KEY) {
> -               inode->index_cnt = BTRFS_DIR_START_INDEX;
> -               goto out;
> -       }
> -
> -       inode->index_cnt = found_key.offset + 1;
> -out:
> -       btrfs_free_path(path);
> -       return ret;
> -}
> -
>  /*
>   * helper to find a free sequence number in a given directory.  This current
>   * code is very simple, later versions will do smarter things in the btree
> --
> 2.43.0
>
>
Eugeniu Rosca Jan. 25, 2024, 11:33 a.m. UTC | #2
Hi Filipe and Qu,

On Thu, Jan 25, 2024 at 10:02:01AM +0000, Filipe Manana wrote:
> On Thu, Jan 25, 2024 at 9:51 AM Qu Wenruo <wqu@suse.com> wrote:
> >
> > From: Filipe Manana <fdmanana@suse.com>
> >
> > [ Upstream commit 9b378f6ad48cfa195ed868db9123c09ee7ec5ea2 ]
> >
> > The readdir implementation currently processes always up to the last index
> > it finds. This however can result in an infinite loop if the directory has

[..]

> Thanks for the backport, and running the corresponding test case from
> fstests to verify it's working.
> 
> However when backporting a commit, one should also check if there are
> fixes for that commit, as they
> often introduce regressions or have some other bug - 

+1. Good to see this best practice applied here.

> and that's the
> case here. We also need to backport
> the following 3 commits:
> 
> https:// git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=357950361cbc6d54fb68ed878265c647384684ae
> https:// git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=e60aa5da14d01fed8411202dbe4adf6c44bd2a57
> https:// git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=8e7f82deb0c0386a03b62e30082574347f8b57d5

Good catch. I get the same list thanks to the reference of the culprit:

$ git log --oneline --grep 9b378f6ad linux/master
8e7f82deb0c038 btrfs: fix race between reading a directory and adding entries to it
e60aa5da14d01f btrfs: refresh dir last index during a rewinddir(3) call
357950361cbc6d btrfs: set last dir index to the current last index when opening dir

> One regression, the one regarding rewinddir(3), even has a test case
> in fstests too (generic/471) and would have been caught
> when running the "dir" group tests in fstests:
> 
> https:// git.kernel.org/pub/scm/fs/xfs/xfstests-dev.git/commit/?h=for-next&id=68b958f5dc4ab13cfd86f7fb82621f9f022b7626
> 
> I'll work on making backports of those 3 other patches on top of your
> backport, and then send all of them in a series,
> including your patch, to make it easier to follow and apply all at once.

Thanks for your support. Looking forward.

BR, Eugeniu
Filipe Manana Jan. 25, 2024, 12:03 p.m. UTC | #3
On Thu, Jan 25, 2024 at 11:34 AM Eugeniu Rosca <erosca@de.adit-jv.com> wrote:
>
> Hi Filipe and Qu,
>
> On Thu, Jan 25, 2024 at 10:02:01AM +0000, Filipe Manana wrote:
> > On Thu, Jan 25, 2024 at 9:51 AM Qu Wenruo <wqu@suse.com> wrote:
> > >
> > > From: Filipe Manana <fdmanana@suse.com>
> > >
> > > [ Upstream commit 9b378f6ad48cfa195ed868db9123c09ee7ec5ea2 ]
> > >
> > > The readdir implementation currently processes always up to the last index
> > > it finds. This however can result in an infinite loop if the directory has
>
> [..]
>
> > Thanks for the backport, and running the corresponding test case from
> > fstests to verify it's working.
> >
> > However when backporting a commit, one should also check if there are
> > fixes for that commit, as they
> > often introduce regressions or have some other bug -
>
> +1. Good to see this best practice applied here.
>
> > and that's the
> > case here. We also need to backport
> > the following 3 commits:
> >
> > https:// git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=357950361cbc6d54fb68ed878265c647384684ae
> > https:// git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=e60aa5da14d01fed8411202dbe4adf6c44bd2a57
> > https:// git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=8e7f82deb0c0386a03b62e30082574347f8b57d5
>
> Good catch. I get the same list thanks to the reference of the culprit:
>
> $ git log --oneline --grep 9b378f6ad linux/master
> 8e7f82deb0c038 btrfs: fix race between reading a directory and adding entries to it
> e60aa5da14d01f btrfs: refresh dir last index during a rewinddir(3) call
> 357950361cbc6d btrfs: set last dir index to the current last index when opening dir
>
> > One regression, the one regarding rewinddir(3), even has a test case
> > in fstests too (generic/471) and would have been caught
> > when running the "dir" group tests in fstests:
> >
> > https:// git.kernel.org/pub/scm/fs/xfs/xfstests-dev.git/commit/?h=for-next&id=68b958f5dc4ab13cfd86f7fb82621f9f022b7626
> >
> > I'll work on making backports of those 3 other patches on top of your
> > backport, and then send all of them in a series,
> > including your patch, to make it easier to follow and apply all at once.
>
> Thanks for your support. Looking forward.

It's here now:
https://lore.kernel.org/linux-btrfs/cover.1706183427.git.fdmanana@suse.com/

>
> BR, Eugeniu
Qu Wenruo Jan. 25, 2024, 11:06 p.m. UTC | #4
On 2024/1/25 20:32, Filipe Manana wrote:
> On Thu, Jan 25, 2024 at 9:51 AM Qu Wenruo <wqu@suse.com> wrote:
>>
>> From: Filipe Manana <fdmanana@suse.com>
>>
>> [ Upstream commit 9b378f6ad48cfa195ed868db9123c09ee7ec5ea2 ]
>>
>> The readdir implementation currently processes always up to the last index
>> it finds. This however can result in an infinite loop if the directory has
>> a large number of entries such that they won't all fit in the given buffer
>> passed to the readdir callback, that is, dir_emit() returns a non-zero
>> value. Because in that case readdir() will be called again and if in the
>> meanwhile new directory entries were added and we still can't put all the
>> remaining entries in the buffer, we keep repeating this over and over.
>>
>> The following C program and test script reproduce the problem:
>>
>>    $ cat /mnt/readdir_prog.c
>>    #include <sys/types.h>
>>    #include <dirent.h>
>>    #include <stdio.h>
>>
>>    int main(int argc, char *argv[])
>>    {
>>      DIR *dir = opendir(".");
>>      struct dirent *dd;
>>
>>      while ((dd = readdir(dir))) {
>>        printf("%s\n", dd->d_name);
>>        rename(dd->d_name, "TEMPFILE");
>>        rename("TEMPFILE", dd->d_name);
>>      }
>>      closedir(dir);
>>    }
>>
>>    $ gcc -o /mnt/readdir_prog /mnt/readdir_prog.c
>>
>>    $ cat test.sh
>>    #!/bin/bash
>>
>>    DEV=/dev/sdi
>>    MNT=/mnt/sdi
>>
>>    mkfs.btrfs -f $DEV &> /dev/null
>>    #mkfs.xfs -f $DEV &> /dev/null
>>    #mkfs.ext4 -F $DEV &> /dev/null
>>
>>    mount $DEV $MNT
>>
>>    mkdir $MNT/testdir
>>    for ((i = 1; i <= 2000; i++)); do
>>        echo -n > $MNT/testdir/file_$i
>>    done
>>
>>    cd $MNT/testdir
>>    /mnt/readdir_prog
>>
>>    cd /mnt
>>
>>    umount $MNT
>>
>> This behaviour is surprising to applications and it's unlike ext4, xfs,
>> tmpfs, vfat and other filesystems, which always finish. In this case where
>> new entries were added due to renames, some file names may be reported
>> more than once, but this varies according to each filesystem - for example
>> ext4 never reported the same file more than once while xfs reports the
>> first 13 file names twice.
>>
>> So change our readdir implementation to track the last index number when
>> opendir() is called and then make readdir() never process beyond that
>> index number. This gives the same behaviour as ext4.
>>
>> Reported-by: Rob Landley <rob@landley.net>
>> Link: https://lore.kernel.org/linux-btrfs/2c8c55ec-04c6-e0dc-9c5c-8c7924778c35@landley.net/
>> Link: https://bugzilla.kernel.org/show_bug.cgi?id=217681
>> CC: stable@vger.kernel.org # 5.15
>> Signed-off-by: Filipe Manana <fdmanana@suse.com>
>> Signed-off-by: David Sterba <dsterba@suse.com>
>> [ Resolve a conflict due to member changes in 96d89923fa94 ]
>> Signed-off-by: Qu Wenruo <wqu@suse.com>
>> ---
>
> Thanks for the backport, and running the corresponding test case from
> fstests to verify it's working.
>
> However when backporting a commit, one should also check if there are
> fixes for that commit, as they
> often introduce regressions or have some other bug - and that's the
> case here. We also need to backport
> the following 3 commits:
>
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=357950361cbc6d54fb68ed878265c647384684ae
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=e60aa5da14d01fed8411202dbe4adf6c44bd2a57
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=8e7f82deb0c0386a03b62e30082574347f8b57d5
>
> One regression, the one regarding rewinddir(3), even has a test case
> in fstests too (generic/471) and would have been caught
> when running the "dir" group tests in fstests:

My bad, I get used to be informed by our internal building system about
missing fixes.

And obviously there is no such automatic systems checking missing fixes
here...
>
> https://git.kernel.org/pub/scm/fs/xfs/xfstests-dev.git/commit/?h=for-next&id=68b958f5dc4ab13cfd86f7fb82621f9f022b7626
>
> I'll work on making backports of those 3 other patches on top of your
> backport, and then send all of them in a series,
> including your patch, to make it easier to follow and apply all at once.

Thanks a lot, that's much appreciated.

Thanks,
Qu
>
> Thanks.
>
>
>>   fs/btrfs/ctree.h         |   1 +
>>   fs/btrfs/delayed-inode.c |   5 +-
>>   fs/btrfs/delayed-inode.h |   1 +
>>   fs/btrfs/inode.c         | 131 +++++++++++++++++++++++----------------
>>   4 files changed, 84 insertions(+), 54 deletions(-)
>> ---
>> Changelog:
>> v2:
>> - Fix the upstream commit hash
>>
>> diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
>> index 1467bf439cb4..7905f178efa3 100644
>> --- a/fs/btrfs/ctree.h
>> +++ b/fs/btrfs/ctree.h
>> @@ -1361,6 +1361,7 @@ struct btrfs_drop_extents_args {
>>
>>   struct btrfs_file_private {
>>          void *filldir_buf;
>> +       u64 last_index;
>>   };
>>
>>
>> diff --git a/fs/btrfs/delayed-inode.c b/fs/btrfs/delayed-inode.c
>> index fd951aeaeac5..5a98c5da1225 100644
>> --- a/fs/btrfs/delayed-inode.c
>> +++ b/fs/btrfs/delayed-inode.c
>> @@ -1513,6 +1513,7 @@ int btrfs_inode_delayed_dir_index_count(struct btrfs_inode *inode)
>>   }
>>
>>   bool btrfs_readdir_get_delayed_items(struct inode *inode,
>> +                                    u64 last_index,
>>                                       struct list_head *ins_list,
>>                                       struct list_head *del_list)
>>   {
>> @@ -1532,14 +1533,14 @@ bool btrfs_readdir_get_delayed_items(struct inode *inode,
>>
>>          mutex_lock(&delayed_node->mutex);
>>          item = __btrfs_first_delayed_insertion_item(delayed_node);
>> -       while (item) {
>> +       while (item && item->key.offset <= last_index) {
>>                  refcount_inc(&item->refs);
>>                  list_add_tail(&item->readdir_list, ins_list);
>>                  item = __btrfs_next_delayed_item(item);
>>          }
>>
>>          item = __btrfs_first_delayed_deletion_item(delayed_node);
>> -       while (item) {
>> +       while (item && item->key.offset <= last_index) {
>>                  refcount_inc(&item->refs);
>>                  list_add_tail(&item->readdir_list, del_list);
>>                  item = __btrfs_next_delayed_item(item);
>> diff --git a/fs/btrfs/delayed-inode.h b/fs/btrfs/delayed-inode.h
>> index b2412160c5bc..a9cfce856d2e 100644
>> --- a/fs/btrfs/delayed-inode.h
>> +++ b/fs/btrfs/delayed-inode.h
>> @@ -123,6 +123,7 @@ void btrfs_destroy_delayed_inodes(struct btrfs_fs_info *fs_info);
>>
>>   /* Used for readdir() */
>>   bool btrfs_readdir_get_delayed_items(struct inode *inode,
>> +                                    u64 last_index,
>>                                       struct list_head *ins_list,
>>                                       struct list_head *del_list);
>>   void btrfs_readdir_put_delayed_items(struct inode *inode,
>> diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
>> index 95af29634e55..1df374ce829b 100644
>> --- a/fs/btrfs/inode.c
>> +++ b/fs/btrfs/inode.c
>> @@ -6121,6 +6121,74 @@ static struct dentry *btrfs_lookup(struct inode *dir, struct dentry *dentry,
>>          return d_splice_alias(inode, dentry);
>>   }
>>
>> +/*
>> + * Find the highest existing sequence number in a directory and then set the
>> + * in-memory index_cnt variable to the first free sequence number.
>> + */
>> +static int btrfs_set_inode_index_count(struct btrfs_inode *inode)
>> +{
>> +       struct btrfs_root *root = inode->root;
>> +       struct btrfs_key key, found_key;
>> +       struct btrfs_path *path;
>> +       struct extent_buffer *leaf;
>> +       int ret;
>> +
>> +       key.objectid = btrfs_ino(inode);
>> +       key.type = BTRFS_DIR_INDEX_KEY;
>> +       key.offset = (u64)-1;
>> +
>> +       path = btrfs_alloc_path();
>> +       if (!path)
>> +               return -ENOMEM;
>> +
>> +       ret = btrfs_search_slot(NULL, root, &key, path, 0, 0);
>> +       if (ret < 0)
>> +               goto out;
>> +       /* FIXME: we should be able to handle this */
>> +       if (ret == 0)
>> +               goto out;
>> +       ret = 0;
>> +
>> +       if (path->slots[0] == 0) {
>> +               inode->index_cnt = BTRFS_DIR_START_INDEX;
>> +               goto out;
>> +       }
>> +
>> +       path->slots[0]--;
>> +
>> +       leaf = path->nodes[0];
>> +       btrfs_item_key_to_cpu(leaf, &found_key, path->slots[0]);
>> +
>> +       if (found_key.objectid != btrfs_ino(inode) ||
>> +           found_key.type != BTRFS_DIR_INDEX_KEY) {
>> +               inode->index_cnt = BTRFS_DIR_START_INDEX;
>> +               goto out;
>> +       }
>> +
>> +       inode->index_cnt = found_key.offset + 1;
>> +out:
>> +       btrfs_free_path(path);
>> +       return ret;
>> +}
>> +
>> +static int btrfs_get_dir_last_index(struct btrfs_inode *dir, u64 *index)
>> +{
>> +       if (dir->index_cnt == (u64)-1) {
>> +               int ret;
>> +
>> +               ret = btrfs_inode_delayed_dir_index_count(dir);
>> +               if (ret) {
>> +                       ret = btrfs_set_inode_index_count(dir);
>> +                       if (ret)
>> +                               return ret;
>> +               }
>> +       }
>> +
>> +       *index = dir->index_cnt;
>> +
>> +       return 0;
>> +}
>> +
>>   /*
>>    * All this infrastructure exists because dir_emit can fault, and we are holding
>>    * the tree lock when doing readdir.  For now just allocate a buffer and copy
>> @@ -6133,10 +6201,17 @@ static struct dentry *btrfs_lookup(struct inode *dir, struct dentry *dentry,
>>   static int btrfs_opendir(struct inode *inode, struct file *file)
>>   {
>>          struct btrfs_file_private *private;
>> +       u64 last_index;
>> +       int ret;
>> +
>> +       ret = btrfs_get_dir_last_index(BTRFS_I(inode), &last_index);
>> +       if (ret)
>> +               return ret;
>>
>>          private = kzalloc(sizeof(struct btrfs_file_private), GFP_KERNEL);
>>          if (!private)
>>                  return -ENOMEM;
>> +       private->last_index = last_index;
>>          private->filldir_buf = kzalloc(PAGE_SIZE, GFP_KERNEL);
>>          if (!private->filldir_buf) {
>>                  kfree(private);
>> @@ -6205,7 +6280,8 @@ static int btrfs_real_readdir(struct file *file, struct dir_context *ctx)
>>
>>          INIT_LIST_HEAD(&ins_list);
>>          INIT_LIST_HEAD(&del_list);
>> -       put = btrfs_readdir_get_delayed_items(inode, &ins_list, &del_list);
>> +       put = btrfs_readdir_get_delayed_items(inode, private->last_index,
>> +                                             &ins_list, &del_list);
>>
>>   again:
>>          key.type = BTRFS_DIR_INDEX_KEY;
>> @@ -6238,6 +6314,8 @@ static int btrfs_real_readdir(struct file *file, struct dir_context *ctx)
>>                          break;
>>                  if (found_key.offset < ctx->pos)
>>                          goto next;
>> +               if (found_key.offset > private->last_index)
>> +                       break;
>>                  if (btrfs_should_delete_dir_index(&del_list, found_key.offset))
>>                          goto next;
>>                  di = btrfs_item_ptr(leaf, slot, struct btrfs_dir_item);
>> @@ -6371,57 +6449,6 @@ static int btrfs_update_time(struct inode *inode, struct timespec64 *now,
>>          return dirty ? btrfs_dirty_inode(inode) : 0;
>>   }
>>
>> -/*
>> - * find the highest existing sequence number in a directory
>> - * and then set the in-memory index_cnt variable to reflect
>> - * free sequence numbers
>> - */
>> -static int btrfs_set_inode_index_count(struct btrfs_inode *inode)
>> -{
>> -       struct btrfs_root *root = inode->root;
>> -       struct btrfs_key key, found_key;
>> -       struct btrfs_path *path;
>> -       struct extent_buffer *leaf;
>> -       int ret;
>> -
>> -       key.objectid = btrfs_ino(inode);
>> -       key.type = BTRFS_DIR_INDEX_KEY;
>> -       key.offset = (u64)-1;
>> -
>> -       path = btrfs_alloc_path();
>> -       if (!path)
>> -               return -ENOMEM;
>> -
>> -       ret = btrfs_search_slot(NULL, root, &key, path, 0, 0);
>> -       if (ret < 0)
>> -               goto out;
>> -       /* FIXME: we should be able to handle this */
>> -       if (ret == 0)
>> -               goto out;
>> -       ret = 0;
>> -
>> -       if (path->slots[0] == 0) {
>> -               inode->index_cnt = BTRFS_DIR_START_INDEX;
>> -               goto out;
>> -       }
>> -
>> -       path->slots[0]--;
>> -
>> -       leaf = path->nodes[0];
>> -       btrfs_item_key_to_cpu(leaf, &found_key, path->slots[0]);
>> -
>> -       if (found_key.objectid != btrfs_ino(inode) ||
>> -           found_key.type != BTRFS_DIR_INDEX_KEY) {
>> -               inode->index_cnt = BTRFS_DIR_START_INDEX;
>> -               goto out;
>> -       }
>> -
>> -       inode->index_cnt = found_key.offset + 1;
>> -out:
>> -       btrfs_free_path(path);
>> -       return ret;
>> -}
>> -
>>   /*
>>    * helper to find a free sequence number in a given directory.  This current
>>    * code is very simple, later versions will do smarter things in the btree
>> --
>> 2.43.0
>>
>>
>
diff mbox series

Patch

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 1467bf439cb4..7905f178efa3 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -1361,6 +1361,7 @@  struct btrfs_drop_extents_args {
 
 struct btrfs_file_private {
 	void *filldir_buf;
+	u64 last_index;
 };
 
 
diff --git a/fs/btrfs/delayed-inode.c b/fs/btrfs/delayed-inode.c
index fd951aeaeac5..5a98c5da1225 100644
--- a/fs/btrfs/delayed-inode.c
+++ b/fs/btrfs/delayed-inode.c
@@ -1513,6 +1513,7 @@  int btrfs_inode_delayed_dir_index_count(struct btrfs_inode *inode)
 }
 
 bool btrfs_readdir_get_delayed_items(struct inode *inode,
+				     u64 last_index,
 				     struct list_head *ins_list,
 				     struct list_head *del_list)
 {
@@ -1532,14 +1533,14 @@  bool btrfs_readdir_get_delayed_items(struct inode *inode,
 
 	mutex_lock(&delayed_node->mutex);
 	item = __btrfs_first_delayed_insertion_item(delayed_node);
-	while (item) {
+	while (item && item->key.offset <= last_index) {
 		refcount_inc(&item->refs);
 		list_add_tail(&item->readdir_list, ins_list);
 		item = __btrfs_next_delayed_item(item);
 	}
 
 	item = __btrfs_first_delayed_deletion_item(delayed_node);
-	while (item) {
+	while (item && item->key.offset <= last_index) {
 		refcount_inc(&item->refs);
 		list_add_tail(&item->readdir_list, del_list);
 		item = __btrfs_next_delayed_item(item);
diff --git a/fs/btrfs/delayed-inode.h b/fs/btrfs/delayed-inode.h
index b2412160c5bc..a9cfce856d2e 100644
--- a/fs/btrfs/delayed-inode.h
+++ b/fs/btrfs/delayed-inode.h
@@ -123,6 +123,7 @@  void btrfs_destroy_delayed_inodes(struct btrfs_fs_info *fs_info);
 
 /* Used for readdir() */
 bool btrfs_readdir_get_delayed_items(struct inode *inode,
+				     u64 last_index,
 				     struct list_head *ins_list,
 				     struct list_head *del_list);
 void btrfs_readdir_put_delayed_items(struct inode *inode,
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 95af29634e55..1df374ce829b 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -6121,6 +6121,74 @@  static struct dentry *btrfs_lookup(struct inode *dir, struct dentry *dentry,
 	return d_splice_alias(inode, dentry);
 }
 
+/*
+ * Find the highest existing sequence number in a directory and then set the
+ * in-memory index_cnt variable to the first free sequence number.
+ */
+static int btrfs_set_inode_index_count(struct btrfs_inode *inode)
+{
+	struct btrfs_root *root = inode->root;
+	struct btrfs_key key, found_key;
+	struct btrfs_path *path;
+	struct extent_buffer *leaf;
+	int ret;
+
+	key.objectid = btrfs_ino(inode);
+	key.type = BTRFS_DIR_INDEX_KEY;
+	key.offset = (u64)-1;
+
+	path = btrfs_alloc_path();
+	if (!path)
+		return -ENOMEM;
+
+	ret = btrfs_search_slot(NULL, root, &key, path, 0, 0);
+	if (ret < 0)
+		goto out;
+	/* FIXME: we should be able to handle this */
+	if (ret == 0)
+		goto out;
+	ret = 0;
+
+	if (path->slots[0] == 0) {
+		inode->index_cnt = BTRFS_DIR_START_INDEX;
+		goto out;
+	}
+
+	path->slots[0]--;
+
+	leaf = path->nodes[0];
+	btrfs_item_key_to_cpu(leaf, &found_key, path->slots[0]);
+
+	if (found_key.objectid != btrfs_ino(inode) ||
+	    found_key.type != BTRFS_DIR_INDEX_KEY) {
+		inode->index_cnt = BTRFS_DIR_START_INDEX;
+		goto out;
+	}
+
+	inode->index_cnt = found_key.offset + 1;
+out:
+	btrfs_free_path(path);
+	return ret;
+}
+
+static int btrfs_get_dir_last_index(struct btrfs_inode *dir, u64 *index)
+{
+	if (dir->index_cnt == (u64)-1) {
+		int ret;
+
+		ret = btrfs_inode_delayed_dir_index_count(dir);
+		if (ret) {
+			ret = btrfs_set_inode_index_count(dir);
+			if (ret)
+				return ret;
+		}
+	}
+
+	*index = dir->index_cnt;
+
+	return 0;
+}
+
 /*
  * All this infrastructure exists because dir_emit can fault, and we are holding
  * the tree lock when doing readdir.  For now just allocate a buffer and copy
@@ -6133,10 +6201,17 @@  static struct dentry *btrfs_lookup(struct inode *dir, struct dentry *dentry,
 static int btrfs_opendir(struct inode *inode, struct file *file)
 {
 	struct btrfs_file_private *private;
+	u64 last_index;
+	int ret;
+
+	ret = btrfs_get_dir_last_index(BTRFS_I(inode), &last_index);
+	if (ret)
+		return ret;
 
 	private = kzalloc(sizeof(struct btrfs_file_private), GFP_KERNEL);
 	if (!private)
 		return -ENOMEM;
+	private->last_index = last_index;
 	private->filldir_buf = kzalloc(PAGE_SIZE, GFP_KERNEL);
 	if (!private->filldir_buf) {
 		kfree(private);
@@ -6205,7 +6280,8 @@  static int btrfs_real_readdir(struct file *file, struct dir_context *ctx)
 
 	INIT_LIST_HEAD(&ins_list);
 	INIT_LIST_HEAD(&del_list);
-	put = btrfs_readdir_get_delayed_items(inode, &ins_list, &del_list);
+	put = btrfs_readdir_get_delayed_items(inode, private->last_index,
+					      &ins_list, &del_list);
 
 again:
 	key.type = BTRFS_DIR_INDEX_KEY;
@@ -6238,6 +6314,8 @@  static int btrfs_real_readdir(struct file *file, struct dir_context *ctx)
 			break;
 		if (found_key.offset < ctx->pos)
 			goto next;
+		if (found_key.offset > private->last_index)
+			break;
 		if (btrfs_should_delete_dir_index(&del_list, found_key.offset))
 			goto next;
 		di = btrfs_item_ptr(leaf, slot, struct btrfs_dir_item);
@@ -6371,57 +6449,6 @@  static int btrfs_update_time(struct inode *inode, struct timespec64 *now,
 	return dirty ? btrfs_dirty_inode(inode) : 0;
 }
 
-/*
- * find the highest existing sequence number in a directory
- * and then set the in-memory index_cnt variable to reflect
- * free sequence numbers
- */
-static int btrfs_set_inode_index_count(struct btrfs_inode *inode)
-{
-	struct btrfs_root *root = inode->root;
-	struct btrfs_key key, found_key;
-	struct btrfs_path *path;
-	struct extent_buffer *leaf;
-	int ret;
-
-	key.objectid = btrfs_ino(inode);
-	key.type = BTRFS_DIR_INDEX_KEY;
-	key.offset = (u64)-1;
-
-	path = btrfs_alloc_path();
-	if (!path)
-		return -ENOMEM;
-
-	ret = btrfs_search_slot(NULL, root, &key, path, 0, 0);
-	if (ret < 0)
-		goto out;
-	/* FIXME: we should be able to handle this */
-	if (ret == 0)
-		goto out;
-	ret = 0;
-
-	if (path->slots[0] == 0) {
-		inode->index_cnt = BTRFS_DIR_START_INDEX;
-		goto out;
-	}
-
-	path->slots[0]--;
-
-	leaf = path->nodes[0];
-	btrfs_item_key_to_cpu(leaf, &found_key, path->slots[0]);
-
-	if (found_key.objectid != btrfs_ino(inode) ||
-	    found_key.type != BTRFS_DIR_INDEX_KEY) {
-		inode->index_cnt = BTRFS_DIR_START_INDEX;
-		goto out;
-	}
-
-	inode->index_cnt = found_key.offset + 1;
-out:
-	btrfs_free_path(path);
-	return ret;
-}
-
 /*
  * helper to find a free sequence number in a given directory.  This current
  * code is very simple, later versions will do smarter things in the btree