libfs: fix infinite directory reads for offset dir

Message ID	20240731043835.1828697-1-yangerkun@huawei.com (mailing list archive)
State	New
Headers	show Return-Path: <owner-linux-mm@kvack.org> From: yangerkun <yangerkun@huawei.com> To: hch@infradead.org, chuck.lever@oracle.com, brauner@kernel.org, viro@zeniv.linux.org.uk, jack@suse.cz Cc: hughd@google.com, zlang@kernel.org, fdmanana@suse.com, linux-fsdevel@vger.kernel.org, linux-mm@kvack.org, yangerkun@huawei.com, yangerkun@huaweicloud.com Subject: [PATCH] libfs: fix infinite directory reads for offset dir Date: Wed, 31 Jul 2024 12:38:35 +0800 Message-Id: <20240731043835.1828697-1-yangerkun@huawei.com> MIME-Version: 1.0 Content-Transfer-Encoding: 8bit Sender: owner-linux-mm@kvack.org Precedence: bulk
Series	libfs: fix infinite directory reads for offset dir \| expand libfs: fix infinite directory reads for offset dir

yangerkun July 31, 2024, 4:38 a.m. UTC

After we switch tmpfs dir operations from simple_dir_operations to
simple_offset_dir_operations, every rename happened will fill new dentry
to dest dir's maple tree(&SHMEM_I(inode)->dir_offsets->mt) with a free
key starting with octx->newx_offset, and then set newx_offset equals to
free key + 1. This will lead to infinite readdir combine with rename
happened at the same time, which fail generic/736 in xfstests(detail show
as below).

1. create 5000 files(1 2 3...) under one dir
2. call readdir(man 3 readdir) once, and get one entry
3. rename(entry, "TEMPFILE"), then rename("TEMPFILE", entry)
4. loop 2~3, until readdir return nothing or we loop too many
   times(tmpfs break test with the second condition)

We choose the same logic what commit 9b378f6ad48cf ("btrfs: fix infinite
directory reads") to fix it, record the last_index when we open dir, and
do not emit the entry which index >= last_index. The file->private_data
now used in offset dir can use directly to do this, and we also update
the last_index when we llseek the dir file.

Fixes: a2e459555c5f ("shmem: stable directory offsets")
Signed-off-by: yangerkun <yangerkun@huawei.com>
---
 fs/libfs.c | 34 +++++++++++++++++++++++-----------
 1 file changed, 23 insertions(+), 11 deletions(-)

Jan Kara July 31, 2024, 11:51 a.m. UTC | #1

On Wed 31-07-24 12:38:35, yangerkun wrote:
> After we switch tmpfs dir operations from simple_dir_operations to
> simple_offset_dir_operations, every rename happened will fill new dentry
> to dest dir's maple tree(&SHMEM_I(inode)->dir_offsets->mt) with a free
> key starting with octx->newx_offset, and then set newx_offset equals to
> free key + 1. This will lead to infinite readdir combine with rename
> happened at the same time, which fail generic/736 in xfstests(detail show
> as below).
> 
> 1. create 5000 files(1 2 3...) under one dir
> 2. call readdir(man 3 readdir) once, and get one entry
> 3. rename(entry, "TEMPFILE"), then rename("TEMPFILE", entry)
> 4. loop 2~3, until readdir return nothing or we loop too many
>    times(tmpfs break test with the second condition)
> 
> We choose the same logic what commit 9b378f6ad48cf ("btrfs: fix infinite
> directory reads") to fix it, record the last_index when we open dir, and
> do not emit the entry which index >= last_index. The file->private_data
> now used in offset dir can use directly to do this, and we also update
> the last_index when we llseek the dir file.

The patch looks good! Just I'm not sure about the llseek part. As far as I
understand it was added due to this sentence in the standard:

"If a file is removed from or added to the directory after the most recent
call to opendir() or rewinddir(), whether a subsequent call to readdir()
returns an entry for that file is unspecified."

So if the offset used in offset_dir_llseek() is 0, then we should update
last_index. But otherwise I'd leave it alone because IMHO it would do more
harm than good.

								Honza

> 
> Fixes: a2e459555c5f ("shmem: stable directory offsets")
> Signed-off-by: yangerkun <yangerkun@huawei.com>
> ---
>  fs/libfs.c | 34 +++++++++++++++++++++++-----------
>  1 file changed, 23 insertions(+), 11 deletions(-)
> 
> diff --git a/fs/libfs.c b/fs/libfs.c
> index 8aa34870449f..38b306738c00 100644
> --- a/fs/libfs.c
> +++ b/fs/libfs.c
> @@ -450,6 +450,14 @@ void simple_offset_destroy(struct offset_ctx *octx)
>  	mtree_destroy(&octx->mt);
>  }
>  
> +static int offset_dir_open(struct inode *inode, struct file *file)
> +{
> +	struct offset_ctx *ctx = inode->i_op->get_offset_ctx(inode);
> +
> +	file->private_data = (void *)ctx->next_offset;
> +	return 0;
> +}
> +
>  /**
>   * offset_dir_llseek - Advance the read position of a directory descriptor
>   * @file: an open directory whose position is to be updated
> @@ -463,6 +471,9 @@ void simple_offset_destroy(struct offset_ctx *octx)
>   */
>  static loff_t offset_dir_llseek(struct file *file, loff_t offset, int whence)
>  {
> +	struct inode *inode = file->f_inode;
> +	struct offset_ctx *ctx = inode->i_op->get_offset_ctx(inode);
> +
>  	switch (whence) {
>  	case SEEK_CUR:
>  		offset += file->f_pos;
> @@ -476,7 +487,7 @@ static loff_t offset_dir_llseek(struct file *file, loff_t offset, int whence)
>  	}
>  
>  	/* In this case, ->private_data is protected by f_pos_lock */
> -	file->private_data = NULL;
> +	file->private_data = (void *)ctx->next_offset;
>  	return vfs_setpos(file, offset, LONG_MAX);
>  }
>  
> @@ -507,7 +518,7 @@ static bool offset_dir_emit(struct dir_context *ctx, struct dentry *dentry)
>  			  inode->i_ino, fs_umode_to_dtype(inode->i_mode));
>  }
>  
> -static void *offset_iterate_dir(struct inode *inode, struct dir_context *ctx)
> +static void offset_iterate_dir(struct inode *inode, struct dir_context *ctx, long last_index)
>  {
>  	struct offset_ctx *octx = inode->i_op->get_offset_ctx(inode);
>  	struct dentry *dentry;
> @@ -515,17 +526,21 @@ static void *offset_iterate_dir(struct inode *inode, struct dir_context *ctx)
>  	while (true) {
>  		dentry = offset_find_next(octx, ctx->pos);
>  		if (!dentry)
> -			return ERR_PTR(-ENOENT);
> +			return;
> +
> +		if (dentry2offset(dentry) >= last_index) {
> +			dput(dentry);
> +			return;
> +		}
>  
>  		if (!offset_dir_emit(ctx, dentry)) {
>  			dput(dentry);
> -			break;
> +			return;
>  		}
>  
>  		ctx->pos = dentry2offset(dentry) + 1;
>  		dput(dentry);
>  	}
> -	return NULL;
>  }
>  
>  /**
> @@ -552,22 +567,19 @@ static void *offset_iterate_dir(struct inode *inode, struct dir_context *ctx)
>  static int offset_readdir(struct file *file, struct dir_context *ctx)
>  {
>  	struct dentry *dir = file->f_path.dentry;
> +	long last_index = (long)file->private_data;
>  
>  	lockdep_assert_held(&d_inode(dir)->i_rwsem);
>  
>  	if (!dir_emit_dots(file, ctx))
>  		return 0;
>  
> -	/* In this case, ->private_data is protected by f_pos_lock */
> -	if (ctx->pos == DIR_OFFSET_MIN)
> -		file->private_data = NULL;
> -	else if (file->private_data == ERR_PTR(-ENOENT))
> -		return 0;
> -	file->private_data = offset_iterate_dir(d_inode(dir), ctx);
> +	offset_iterate_dir(d_inode(dir), ctx, last_index);
>  	return 0;
>  }
>  
>  const struct file_operations simple_offset_dir_operations = {
> +	.open		= offset_dir_open,
>  	.llseek		= offset_dir_llseek,
>  	.iterate_shared	= offset_readdir,
>  	.read		= generic_read_dir,
> -- 
> 2.39.2
>

yangerkun July 31, 2024, 12:51 p.m. UTC | #2

Hi!

在 2024/7/31 19:51, Jan Kara 写道:
> On Wed 31-07-24 12:38:35, yangerkun wrote:
>> After we switch tmpfs dir operations from simple_dir_operations to
>> simple_offset_dir_operations, every rename happened will fill new dentry
>> to dest dir's maple tree(&SHMEM_I(inode)->dir_offsets->mt) with a free
>> key starting with octx->newx_offset, and then set newx_offset equals to
>> free key + 1. This will lead to infinite readdir combine with rename
>> happened at the same time, which fail generic/736 in xfstests(detail show
>> as below).
>>
>> 1. create 5000 files(1 2 3...) under one dir
>> 2. call readdir(man 3 readdir) once, and get one entry
>> 3. rename(entry, "TEMPFILE"), then rename("TEMPFILE", entry)
>> 4. loop 2~3, until readdir return nothing or we loop too many
>>     times(tmpfs break test with the second condition)
>>
>> We choose the same logic what commit 9b378f6ad48cf ("btrfs: fix infinite
>> directory reads") to fix it, record the last_index when we open dir, and
>> do not emit the entry which index >= last_index. The file->private_data
>> now used in offset dir can use directly to do this, and we also update
>> the last_index when we llseek the dir file.
> 
> The patch looks good! Just I'm not sure about the llseek part. As far as I
> understand it was added due to this sentence in the standard:
> 
> "If a file is removed from or added to the directory after the most recent
> call to opendir() or rewinddir(), whether a subsequent call to readdir()
> returns an entry for that file is unspecified."
> 
> So if the offset used in offset_dir_llseek() is 0, then we should update
> last_index. But otherwise I'd leave it alone because IMHO it would do more
> harm than good.

IIUC, what you means is that we should only reset the private_data to
new last_index when we call rewinddir(which will call lseek to set
offset of dir file to 0)?

Yeah, I prefer the logic you describle! Besides, we may also change
btrfs that do the same(e60aa5da14d0 ("btrfs: refresh dir last index
during a rewinddir(3) call")). Filipe, how do you think?

Thanks,
Erkun.

> 								Honza
> 
>>
>> Fixes: a2e459555c5f ("shmem: stable directory offsets")
>> Signed-off-by: yangerkun <yangerkun@huawei.com>
>> ---
>>   fs/libfs.c | 34 +++++++++++++++++++++++-----------
>>   1 file changed, 23 insertions(+), 11 deletions(-)
>>
>> diff --git a/fs/libfs.c b/fs/libfs.c
>> index 8aa34870449f..38b306738c00 100644
>> --- a/fs/libfs.c
>> +++ b/fs/libfs.c
>> @@ -450,6 +450,14 @@ void simple_offset_destroy(struct offset_ctx *octx)
>>   	mtree_destroy(&octx->mt);
>>   }
>>   
>> +static int offset_dir_open(struct inode *inode, struct file *file)
>> +{
>> +	struct offset_ctx *ctx = inode->i_op->get_offset_ctx(inode);
>> +
>> +	file->private_data = (void *)ctx->next_offset;
>> +	return 0;
>> +}
>> +
>>   /**
>>    * offset_dir_llseek - Advance the read position of a directory descriptor
>>    * @file: an open directory whose position is to be updated
>> @@ -463,6 +471,9 @@ void simple_offset_destroy(struct offset_ctx *octx)
>>    */
>>   static loff_t offset_dir_llseek(struct file *file, loff_t offset, int whence)
>>   {
>> +	struct inode *inode = file->f_inode;
>> +	struct offset_ctx *ctx = inode->i_op->get_offset_ctx(inode);
>> +
>>   	switch (whence) {
>>   	case SEEK_CUR:
>>   		offset += file->f_pos;
>> @@ -476,7 +487,7 @@ static loff_t offset_dir_llseek(struct file *file, loff_t offset, int whence)
>>   	}
>>   
>>   	/* In this case, ->private_data is protected by f_pos_lock */
>> -	file->private_data = NULL;
>> +	file->private_data = (void *)ctx->next_offset;
>>   	return vfs_setpos(file, offset, LONG_MAX);
>>   }
>>   
>> @@ -507,7 +518,7 @@ static bool offset_dir_emit(struct dir_context *ctx, struct dentry *dentry)
>>   			  inode->i_ino, fs_umode_to_dtype(inode->i_mode));
>>   }
>>   
>> -static void *offset_iterate_dir(struct inode *inode, struct dir_context *ctx)
>> +static void offset_iterate_dir(struct inode *inode, struct dir_context *ctx, long last_index)
>>   {
>>   	struct offset_ctx *octx = inode->i_op->get_offset_ctx(inode);
>>   	struct dentry *dentry;
>> @@ -515,17 +526,21 @@ static void *offset_iterate_dir(struct inode *inode, struct dir_context *ctx)
>>   	while (true) {
>>   		dentry = offset_find_next(octx, ctx->pos);
>>   		if (!dentry)
>> -			return ERR_PTR(-ENOENT);
>> +			return;
>> +
>> +		if (dentry2offset(dentry) >= last_index) {
>> +			dput(dentry);
>> +			return;
>> +		}
>>   
>>   		if (!offset_dir_emit(ctx, dentry)) {
>>   			dput(dentry);
>> -			break;
>> +			return;
>>   		}
>>   
>>   		ctx->pos = dentry2offset(dentry) + 1;
>>   		dput(dentry);
>>   	}
>> -	return NULL;
>>   }
>>   
>>   /**
>> @@ -552,22 +567,19 @@ static void *offset_iterate_dir(struct inode *inode, struct dir_context *ctx)
>>   static int offset_readdir(struct file *file, struct dir_context *ctx)
>>   {
>>   	struct dentry *dir = file->f_path.dentry;
>> +	long last_index = (long)file->private_data;
>>   
>>   	lockdep_assert_held(&d_inode(dir)->i_rwsem);
>>   
>>   	if (!dir_emit_dots(file, ctx))
>>   		return 0;
>>   
>> -	/* In this case, ->private_data is protected by f_pos_lock */
>> -	if (ctx->pos == DIR_OFFSET_MIN)
>> -		file->private_data = NULL;
>> -	else if (file->private_data == ERR_PTR(-ENOENT))
>> -		return 0;
>> -	file->private_data = offset_iterate_dir(d_inode(dir), ctx);
>> +	offset_iterate_dir(d_inode(dir), ctx, last_index);
>>   	return 0;
>>   }
>>   
>>   const struct file_operations simple_offset_dir_operations = {
>> +	.open		= offset_dir_open,
>>   	.llseek		= offset_dir_llseek,
>>   	.iterate_shared	= offset_readdir,
>>   	.read		= generic_read_dir,
>> -- 
>> 2.39.2
>>

Jan Kara July 31, 2024, 1:04 p.m. UTC | #3

On Wed 31-07-24 20:51:05, yangerkun wrote:
> 在 2024/7/31 19:51, Jan Kara 写道:
> > On Wed 31-07-24 12:38:35, yangerkun wrote:
> > > After we switch tmpfs dir operations from simple_dir_operations to
> > > simple_offset_dir_operations, every rename happened will fill new dentry
> > > to dest dir's maple tree(&SHMEM_I(inode)->dir_offsets->mt) with a free
> > > key starting with octx->newx_offset, and then set newx_offset equals to
> > > free key + 1. This will lead to infinite readdir combine with rename
> > > happened at the same time, which fail generic/736 in xfstests(detail show
> > > as below).
> > > 
> > > 1. create 5000 files(1 2 3...) under one dir
> > > 2. call readdir(man 3 readdir) once, and get one entry
> > > 3. rename(entry, "TEMPFILE"), then rename("TEMPFILE", entry)
> > > 4. loop 2~3, until readdir return nothing or we loop too many
> > >     times(tmpfs break test with the second condition)
> > > 
> > > We choose the same logic what commit 9b378f6ad48cf ("btrfs: fix infinite
> > > directory reads") to fix it, record the last_index when we open dir, and
> > > do not emit the entry which index >= last_index. The file->private_data
> > > now used in offset dir can use directly to do this, and we also update
> > > the last_index when we llseek the dir file.
> > 
> > The patch looks good! Just I'm not sure about the llseek part. As far as I
> > understand it was added due to this sentence in the standard:
> > 
> > "If a file is removed from or added to the directory after the most recent
> > call to opendir() or rewinddir(), whether a subsequent call to readdir()
> > returns an entry for that file is unspecified."
> > 
> > So if the offset used in offset_dir_llseek() is 0, then we should update
> > last_index. But otherwise I'd leave it alone because IMHO it would do more
> > harm than good.
> 
> IIUC, what you means is that we should only reset the private_data to
> new last_index when we call rewinddir(which will call lseek to set
> offset of dir file to 0)?

Yes, exactly. Sorry for being a bit vague.

								Honza

Filipe Manana July 31, 2024, 1:10 p.m. UTC | #4

On Wed, Jul 31, 2024 at 1:51 PM yangerkun <yangerkun@huaweicloud.com> wrote:
>
> Hi!
>
> 在 2024/7/31 19:51, Jan Kara 写道:
> > On Wed 31-07-24 12:38:35, yangerkun wrote:
> >> After we switch tmpfs dir operations from simple_dir_operations to
> >> simple_offset_dir_operations, every rename happened will fill new dentry
> >> to dest dir's maple tree(&SHMEM_I(inode)->dir_offsets->mt) with a free
> >> key starting with octx->newx_offset, and then set newx_offset equals to
> >> free key + 1. This will lead to infinite readdir combine with rename
> >> happened at the same time, which fail generic/736 in xfstests(detail show
> >> as below).
> >>
> >> 1. create 5000 files(1 2 3...) under one dir
> >> 2. call readdir(man 3 readdir) once, and get one entry
> >> 3. rename(entry, "TEMPFILE"), then rename("TEMPFILE", entry)
> >> 4. loop 2~3, until readdir return nothing or we loop too many
> >>     times(tmpfs break test with the second condition)
> >>
> >> We choose the same logic what commit 9b378f6ad48cf ("btrfs: fix infinite
> >> directory reads") to fix it, record the last_index when we open dir, and
> >> do not emit the entry which index >= last_index. The file->private_data
> >> now used in offset dir can use directly to do this, and we also update
> >> the last_index when we llseek the dir file.
> >
> > The patch looks good! Just I'm not sure about the llseek part. As far as I
> > understand it was added due to this sentence in the standard:
> >
> > "If a file is removed from or added to the directory after the most recent
> > call to opendir() or rewinddir(), whether a subsequent call to readdir()
> > returns an entry for that file is unspecified."
> >
> > So if the offset used in offset_dir_llseek() is 0, then we should update
> > last_index. But otherwise I'd leave it alone because IMHO it would do more
> > harm than good.
>
> IIUC, what you means is that we should only reset the private_data to
> new last_index when we call rewinddir(which will call lseek to set
> offset of dir file to 0)?
>
> Yeah, I prefer the logic you describle! Besides, we may also change
> btrfs that do the same(e60aa5da14d0 ("btrfs: refresh dir last index
> during a rewinddir(3) call")). Filipe, how do you think?

What problem does it solve?
The standard doesn't forbid it, and I can't see anything wrong with it.

>
> Thanks,
> Erkun.
>
> >                                                               Honza
> >
> >>
> >> Fixes: a2e459555c5f ("shmem: stable directory offsets")
> >> Signed-off-by: yangerkun <yangerkun@huawei.com>
> >> ---
> >>   fs/libfs.c | 34 +++++++++++++++++++++++-----------
> >>   1 file changed, 23 insertions(+), 11 deletions(-)
> >>
> >> diff --git a/fs/libfs.c b/fs/libfs.c
> >> index 8aa34870449f..38b306738c00 100644
> >> --- a/fs/libfs.c
> >> +++ b/fs/libfs.c
> >> @@ -450,6 +450,14 @@ void simple_offset_destroy(struct offset_ctx *octx)
> >>      mtree_destroy(&octx->mt);
> >>   }
> >>
> >> +static int offset_dir_open(struct inode *inode, struct file *file)
> >> +{
> >> +    struct offset_ctx *ctx = inode->i_op->get_offset_ctx(inode);
> >> +
> >> +    file->private_data = (void *)ctx->next_offset;
> >> +    return 0;
> >> +}
> >> +
> >>   /**
> >>    * offset_dir_llseek - Advance the read position of a directory descriptor
> >>    * @file: an open directory whose position is to be updated
> >> @@ -463,6 +471,9 @@ void simple_offset_destroy(struct offset_ctx *octx)
> >>    */
> >>   static loff_t offset_dir_llseek(struct file *file, loff_t offset, int whence)
> >>   {
> >> +    struct inode *inode = file->f_inode;
> >> +    struct offset_ctx *ctx = inode->i_op->get_offset_ctx(inode);
> >> +
> >>      switch (whence) {
> >>      case SEEK_CUR:
> >>              offset += file->f_pos;
> >> @@ -476,7 +487,7 @@ static loff_t offset_dir_llseek(struct file *file, loff_t offset, int whence)
> >>      }
> >>
> >>      /* In this case, ->private_data is protected by f_pos_lock */
> >> -    file->private_data = NULL;
> >> +    file->private_data = (void *)ctx->next_offset;
> >>      return vfs_setpos(file, offset, LONG_MAX);
> >>   }
> >>
> >> @@ -507,7 +518,7 @@ static bool offset_dir_emit(struct dir_context *ctx, struct dentry *dentry)
> >>                        inode->i_ino, fs_umode_to_dtype(inode->i_mode));
> >>   }
> >>
> >> -static void *offset_iterate_dir(struct inode *inode, struct dir_context *ctx)
> >> +static void offset_iterate_dir(struct inode *inode, struct dir_context *ctx, long last_index)
> >>   {
> >>      struct offset_ctx *octx = inode->i_op->get_offset_ctx(inode);
> >>      struct dentry *dentry;
> >> @@ -515,17 +526,21 @@ static void *offset_iterate_dir(struct inode *inode, struct dir_context *ctx)
> >>      while (true) {
> >>              dentry = offset_find_next(octx, ctx->pos);
> >>              if (!dentry)
> >> -                    return ERR_PTR(-ENOENT);
> >> +                    return;
> >> +
> >> +            if (dentry2offset(dentry) >= last_index) {
> >> +                    dput(dentry);
> >> +                    return;
> >> +            }
> >>
> >>              if (!offset_dir_emit(ctx, dentry)) {
> >>                      dput(dentry);
> >> -                    break;
> >> +                    return;
> >>              }
> >>
> >>              ctx->pos = dentry2offset(dentry) + 1;
> >>              dput(dentry);
> >>      }
> >> -    return NULL;
> >>   }
> >>
> >>   /**
> >> @@ -552,22 +567,19 @@ static void *offset_iterate_dir(struct inode *inode, struct dir_context *ctx)
> >>   static int offset_readdir(struct file *file, struct dir_context *ctx)
> >>   {
> >>      struct dentry *dir = file->f_path.dentry;
> >> +    long last_index = (long)file->private_data;
> >>
> >>      lockdep_assert_held(&d_inode(dir)->i_rwsem);
> >>
> >>      if (!dir_emit_dots(file, ctx))
> >>              return 0;
> >>
> >> -    /* In this case, ->private_data is protected by f_pos_lock */
> >> -    if (ctx->pos == DIR_OFFSET_MIN)
> >> -            file->private_data = NULL;
> >> -    else if (file->private_data == ERR_PTR(-ENOENT))
> >> -            return 0;
> >> -    file->private_data = offset_iterate_dir(d_inode(dir), ctx);
> >> +    offset_iterate_dir(d_inode(dir), ctx, last_index);
> >>      return 0;
> >>   }
> >>
> >>   const struct file_operations simple_offset_dir_operations = {
> >> +    .open           = offset_dir_open,
> >>      .llseek         = offset_dir_llseek,
> >>      .iterate_shared = offset_readdir,
> >>      .read           = generic_read_dir,
> >> --
> >> 2.39.2
> >>
>

Chuck Lever July 31, 2024, 1:44 p.m. UTC | #5

On Wed, Jul 31, 2024 at 12:38:35PM +0800, yangerkun wrote:
> After we switch tmpfs dir operations from simple_dir_operations to
> simple_offset_dir_operations, every rename happened will fill new dentry
> to dest dir's maple tree(&SHMEM_I(inode)->dir_offsets->mt) with a free
> key starting with octx->newx_offset, and then set newx_offset equals to
> free key + 1. This will lead to infinite readdir combine with rename
> happened at the same time, which fail generic/736 in xfstests(detail show
> as below).
> 
> 1. create 5000 files(1 2 3...) under one dir
> 2. call readdir(man 3 readdir) once, and get one entry
> 3. rename(entry, "TEMPFILE"), then rename("TEMPFILE", entry)
> 4. loop 2~3, until readdir return nothing or we loop too many
>    times(tmpfs break test with the second condition)
> 
> We choose the same logic what commit 9b378f6ad48cf ("btrfs: fix infinite
> directory reads") to fix it, record the last_index when we open dir, and
> do not emit the entry which index >= last_index. The file->private_data
> now used in offset dir can use directly to do this, and we also update
> the last_index when we llseek the dir file.
> 
> Fixes: a2e459555c5f ("shmem: stable directory offsets")
> Signed-off-by: yangerkun <yangerkun@huawei.com>

I agree with Jan's down-thread comments about llseek, but other than
that:

Reviewed-by: Chuck Lever <chuck.lever@oracle.com>


> ---
>  fs/libfs.c | 34 +++++++++++++++++++++++-----------
>  1 file changed, 23 insertions(+), 11 deletions(-)
> 
> diff --git a/fs/libfs.c b/fs/libfs.c
> index 8aa34870449f..38b306738c00 100644
> --- a/fs/libfs.c
> +++ b/fs/libfs.c
> @@ -450,6 +450,14 @@ void simple_offset_destroy(struct offset_ctx *octx)
>  	mtree_destroy(&octx->mt);
>  }
>  
> +static int offset_dir_open(struct inode *inode, struct file *file)
> +{
> +	struct offset_ctx *ctx = inode->i_op->get_offset_ctx(inode);
> +
> +	file->private_data = (void *)ctx->next_offset;
> +	return 0;
> +}
> +
>  /**
>   * offset_dir_llseek - Advance the read position of a directory descriptor
>   * @file: an open directory whose position is to be updated
> @@ -463,6 +471,9 @@ void simple_offset_destroy(struct offset_ctx *octx)
>   */
>  static loff_t offset_dir_llseek(struct file *file, loff_t offset, int whence)
>  {
> +	struct inode *inode = file->f_inode;
> +	struct offset_ctx *ctx = inode->i_op->get_offset_ctx(inode);
> +
>  	switch (whence) {
>  	case SEEK_CUR:
>  		offset += file->f_pos;
> @@ -476,7 +487,7 @@ static loff_t offset_dir_llseek(struct file *file, loff_t offset, int whence)
>  	}
>  
>  	/* In this case, ->private_data is protected by f_pos_lock */
> -	file->private_data = NULL;
> +	file->private_data = (void *)ctx->next_offset;
>  	return vfs_setpos(file, offset, LONG_MAX);
>  }
>  
> @@ -507,7 +518,7 @@ static bool offset_dir_emit(struct dir_context *ctx, struct dentry *dentry)
>  			  inode->i_ino, fs_umode_to_dtype(inode->i_mode));
>  }
>  
> -static void *offset_iterate_dir(struct inode *inode, struct dir_context *ctx)
> +static void offset_iterate_dir(struct inode *inode, struct dir_context *ctx, long last_index)
>  {
>  	struct offset_ctx *octx = inode->i_op->get_offset_ctx(inode);
>  	struct dentry *dentry;
> @@ -515,17 +526,21 @@ static void *offset_iterate_dir(struct inode *inode, struct dir_context *ctx)
>  	while (true) {
>  		dentry = offset_find_next(octx, ctx->pos);
>  		if (!dentry)
> -			return ERR_PTR(-ENOENT);
> +			return;
> +
> +		if (dentry2offset(dentry) >= last_index) {
> +			dput(dentry);
> +			return;
> +		}
>  
>  		if (!offset_dir_emit(ctx, dentry)) {
>  			dput(dentry);
> -			break;
> +			return;
>  		}
>  
>  		ctx->pos = dentry2offset(dentry) + 1;
>  		dput(dentry);
>  	}
> -	return NULL;
>  }
>  
>  /**
> @@ -552,22 +567,19 @@ static void *offset_iterate_dir(struct inode *inode, struct dir_context *ctx)
>  static int offset_readdir(struct file *file, struct dir_context *ctx)
>  {
>  	struct dentry *dir = file->f_path.dentry;
> +	long last_index = (long)file->private_data;
>  
>  	lockdep_assert_held(&d_inode(dir)->i_rwsem);
>  
>  	if (!dir_emit_dots(file, ctx))
>  		return 0;
>  
> -	/* In this case, ->private_data is protected by f_pos_lock */
> -	if (ctx->pos == DIR_OFFSET_MIN)
> -		file->private_data = NULL;
> -	else if (file->private_data == ERR_PTR(-ENOENT))
> -		return 0;
> -	file->private_data = offset_iterate_dir(d_inode(dir), ctx);
> +	offset_iterate_dir(d_inode(dir), ctx, last_index);
>  	return 0;
>  }
>  
>  const struct file_operations simple_offset_dir_operations = {
> +	.open		= offset_dir_open,
>  	.llseek		= offset_dir_llseek,
>  	.iterate_shared	= offset_readdir,
>  	.read		= generic_read_dir,
> -- 
> 2.39.2
> 
>

Christian Brauner July 31, 2024, 2:16 p.m. UTC | #6

On Wed, 31 Jul 2024 12:38:35 +0800, yangerkun wrote:
> After we switch tmpfs dir operations from simple_dir_operations to
> simple_offset_dir_operations, every rename happened will fill new dentry
> to dest dir's maple tree(&SHMEM_I(inode)->dir_offsets->mt) with a free
> key starting with octx->newx_offset, and then set newx_offset equals to
> free key + 1. This will lead to infinite readdir combine with rename
> happened at the same time, which fail generic/736 in xfstests(detail show
> as below).
> 
> [...]

@Chuck, @Jan I did the requested change directly. Please check!

---

Applied to the vfs.fixes branch of the vfs/vfs.git tree.
Patches in the vfs.fixes branch should appear in linux-next soon.

Please report any outstanding bugs that were missed during review in a
new review to the original patch series allowing us to drop it.

It's encouraged to provide Acked-bys and Reviewed-bys even though the
patch has now been applied. If possible patch trailers will be updated.

Note that commit hashes shown below are subject to change due to rebase,
trailer updates or similar. If in doubt, please check the listed branch.

tree:   https://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs.git
branch: vfs.fixes

[1/1] libfs: fix infinite directory reads for offset dir
      https://git.kernel.org/vfs/vfs/c/fad90bfe412e

Jan Kara July 31, 2024, 3:37 p.m. UTC | #7

On Wed 31-07-24 16:16:42, Christian Brauner wrote:
> On Wed, 31 Jul 2024 12:38:35 +0800, yangerkun wrote:
> > After we switch tmpfs dir operations from simple_dir_operations to
> > simple_offset_dir_operations, every rename happened will fill new dentry
> > to dest dir's maple tree(&SHMEM_I(inode)->dir_offsets->mt) with a free
> > key starting with octx->newx_offset, and then set newx_offset equals to
> > free key + 1. This will lead to infinite readdir combine with rename
> > happened at the same time, which fail generic/736 in xfstests(detail show
> > as below).
> > 
> > [...]
> 
> @Chuck, @Jan I did the requested change directly. Please check!

Thanks! Looks good. Feel free to add:

Reviewed-by: Jan Kara <jack@suse.cz>

								Honza

> 
> ---
> 
> Applied to the vfs.fixes branch of the vfs/vfs.git tree.
> Patches in the vfs.fixes branch should appear in linux-next soon.
> 
> Please report any outstanding bugs that were missed during review in a
> new review to the original patch series allowing us to drop it.
> 
> It's encouraged to provide Acked-bys and Reviewed-bys even though the
> patch has now been applied. If possible patch trailers will be updated.
> 
> Note that commit hashes shown below are subject to change due to rebase,
> trailer updates or similar. If in doubt, please check the listed branch.
> 
> tree:   https://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs.git
> branch: vfs.fixes
> 
> [1/1] libfs: fix infinite directory reads for offset dir
>       https://git.kernel.org/vfs/vfs/c/fad90bfe412e
>

yangerkun Aug. 1, 2024, 3:15 a.m. UTC | #8

Hi!

在 2024/7/31 21:10, Filipe Manana 写道:
> On Wed, Jul 31, 2024 at 1:51 PM yangerkun <yangerkun@huaweicloud.com> wrote:
>>
>> Hi!
>>
>> 在 2024/7/31 19:51, Jan Kara 写道:
>>> On Wed 31-07-24 12:38:35, yangerkun wrote:
>>>> After we switch tmpfs dir operations from simple_dir_operations to
>>>> simple_offset_dir_operations, every rename happened will fill new dentry
>>>> to dest dir's maple tree(&SHMEM_I(inode)->dir_offsets->mt) with a free
>>>> key starting with octx->newx_offset, and then set newx_offset equals to
>>>> free key + 1. This will lead to infinite readdir combine with rename
>>>> happened at the same time, which fail generic/736 in xfstests(detail show
>>>> as below).
>>>>
>>>> 1. create 5000 files(1 2 3...) under one dir
>>>> 2. call readdir(man 3 readdir) once, and get one entry
>>>> 3. rename(entry, "TEMPFILE"), then rename("TEMPFILE", entry)
>>>> 4. loop 2~3, until readdir return nothing or we loop too many
>>>>      times(tmpfs break test with the second condition)
>>>>
>>>> We choose the same logic what commit 9b378f6ad48cf ("btrfs: fix infinite
>>>> directory reads") to fix it, record the last_index when we open dir, and
>>>> do not emit the entry which index >= last_index. The file->private_data
>>>> now used in offset dir can use directly to do this, and we also update
>>>> the last_index when we llseek the dir file.
>>>
>>> The patch looks good! Just I'm not sure about the llseek part. As far as I
>>> understand it was added due to this sentence in the standard:
>>>
>>> "If a file is removed from or added to the directory after the most recent
>>> call to opendir() or rewinddir(), whether a subsequent call to readdir()
>>> returns an entry for that file is unspecified."
>>>
>>> So if the offset used in offset_dir_llseek() is 0, then we should update
>>> last_index. But otherwise I'd leave it alone because IMHO it would do more
>>> harm than good.
>>
>> IIUC, what you means is that we should only reset the private_data to
>> new last_index when we call rewinddir(which will call lseek to set
>> offset of dir file to 0)?
>>
>> Yeah, I prefer the logic you describle! Besides, we may also change
>> btrfs that do the same(e60aa5da14d0 ("btrfs: refresh dir last index
>> during a rewinddir(3) call")). Filipe, how do you think?
> 
> What problem does it solve?
> The standard doesn't forbid it, and I can't see anything wrong with it.


Yeah, I didn't find any obvious bugs for this behavior. It's a choose
does this seems better, and I think it's also ok to keep btrfs not
change now. Thanks for your reply!

> 
>>
>> Thanks,
>> Erkun.
>>
>>>                                                                Honza
>>>
>>>>
>>>> Fixes: a2e459555c5f ("shmem: stable directory offsets")
>>>> Signed-off-by: yangerkun <yangerkun@huawei.com>
>>>> ---
>>>>    fs/libfs.c | 34 +++++++++++++++++++++++-----------
>>>>    1 file changed, 23 insertions(+), 11 deletions(-)
>>>>
>>>> diff --git a/fs/libfs.c b/fs/libfs.c
>>>> index 8aa34870449f..38b306738c00 100644
>>>> --- a/fs/libfs.c
>>>> +++ b/fs/libfs.c
>>>> @@ -450,6 +450,14 @@ void simple_offset_destroy(struct offset_ctx *octx)
>>>>       mtree_destroy(&octx->mt);
>>>>    }
>>>>
>>>> +static int offset_dir_open(struct inode *inode, struct file *file)
>>>> +{
>>>> +    struct offset_ctx *ctx = inode->i_op->get_offset_ctx(inode);
>>>> +
>>>> +    file->private_data = (void *)ctx->next_offset;
>>>> +    return 0;
>>>> +}
>>>> +
>>>>    /**
>>>>     * offset_dir_llseek - Advance the read position of a directory descriptor
>>>>     * @file: an open directory whose position is to be updated
>>>> @@ -463,6 +471,9 @@ void simple_offset_destroy(struct offset_ctx *octx)
>>>>     */
>>>>    static loff_t offset_dir_llseek(struct file *file, loff_t offset, int whence)
>>>>    {
>>>> +    struct inode *inode = file->f_inode;
>>>> +    struct offset_ctx *ctx = inode->i_op->get_offset_ctx(inode);
>>>> +
>>>>       switch (whence) {
>>>>       case SEEK_CUR:
>>>>               offset += file->f_pos;
>>>> @@ -476,7 +487,7 @@ static loff_t offset_dir_llseek(struct file *file, loff_t offset, int whence)
>>>>       }
>>>>
>>>>       /* In this case, ->private_data is protected by f_pos_lock */
>>>> -    file->private_data = NULL;
>>>> +    file->private_data = (void *)ctx->next_offset;
>>>>       return vfs_setpos(file, offset, LONG_MAX);
>>>>    }
>>>>
>>>> @@ -507,7 +518,7 @@ static bool offset_dir_emit(struct dir_context *ctx, struct dentry *dentry)
>>>>                         inode->i_ino, fs_umode_to_dtype(inode->i_mode));
>>>>    }
>>>>
>>>> -static void *offset_iterate_dir(struct inode *inode, struct dir_context *ctx)
>>>> +static void offset_iterate_dir(struct inode *inode, struct dir_context *ctx, long last_index)
>>>>    {
>>>>       struct offset_ctx *octx = inode->i_op->get_offset_ctx(inode);
>>>>       struct dentry *dentry;
>>>> @@ -515,17 +526,21 @@ static void *offset_iterate_dir(struct inode *inode, struct dir_context *ctx)
>>>>       while (true) {
>>>>               dentry = offset_find_next(octx, ctx->pos);
>>>>               if (!dentry)
>>>> -                    return ERR_PTR(-ENOENT);
>>>> +                    return;
>>>> +
>>>> +            if (dentry2offset(dentry) >= last_index) {
>>>> +                    dput(dentry);
>>>> +                    return;
>>>> +            }
>>>>
>>>>               if (!offset_dir_emit(ctx, dentry)) {
>>>>                       dput(dentry);
>>>> -                    break;
>>>> +                    return;
>>>>               }
>>>>
>>>>               ctx->pos = dentry2offset(dentry) + 1;
>>>>               dput(dentry);
>>>>       }
>>>> -    return NULL;
>>>>    }
>>>>
>>>>    /**
>>>> @@ -552,22 +567,19 @@ static void *offset_iterate_dir(struct inode *inode, struct dir_context *ctx)
>>>>    static int offset_readdir(struct file *file, struct dir_context *ctx)
>>>>    {
>>>>       struct dentry *dir = file->f_path.dentry;
>>>> +    long last_index = (long)file->private_data;
>>>>
>>>>       lockdep_assert_held(&d_inode(dir)->i_rwsem);
>>>>
>>>>       if (!dir_emit_dots(file, ctx))
>>>>               return 0;
>>>>
>>>> -    /* In this case, ->private_data is protected by f_pos_lock */
>>>> -    if (ctx->pos == DIR_OFFSET_MIN)
>>>> -            file->private_data = NULL;
>>>> -    else if (file->private_data == ERR_PTR(-ENOENT))
>>>> -            return 0;
>>>> -    file->private_data = offset_iterate_dir(d_inode(dir), ctx);
>>>> +    offset_iterate_dir(d_inode(dir), ctx, last_index);
>>>>       return 0;
>>>>    }
>>>>
>>>>    const struct file_operations simple_offset_dir_operations = {
>>>> +    .open           = offset_dir_open,
>>>>       .llseek         = offset_dir_llseek,
>>>>       .iterate_shared = offset_readdir,
>>>>       .read           = generic_read_dir,
>>>> --
>>>> 2.39.2
>>>>
>>

yangerkun Aug. 1, 2024, 3:32 a.m. UTC | #9

Hi!

在 2024/7/31 22:16, Christian Brauner 写道:
> On Wed, 31 Jul 2024 12:38:35 +0800, yangerkun wrote:
>> After we switch tmpfs dir operations from simple_dir_operations to
>> simple_offset_dir_operations, every rename happened will fill new dentry
>> to dest dir's maple tree(&SHMEM_I(inode)->dir_offsets->mt) with a free
>> key starting with octx->newx_offset, and then set newx_offset equals to
>> free key + 1. This will lead to infinite readdir combine with rename
>> happened at the same time, which fail generic/736 in xfstests(detail show
>> as below).
>>
>> [...]
> 
> @Chuck, @Jan I did the requested change directly. Please check!

Thanks for applied this patch, the suggestions from Jan and Chuck will
be a separates patch!


> 
> ---
> 
> Applied to the vfs.fixes branch of the vfs/vfs.git tree.
> Patches in the vfs.fixes branch should appear in linux-next soon.
> 
> Please report any outstanding bugs that were missed during review in a
> new review to the original patch series allowing us to drop it.
> 
> It's encouraged to provide Acked-bys and Reviewed-bys even though the
> patch has now been applied. If possible patch trailers will be updated.
> 
> Note that commit hashes shown below are subject to change due to rebase,
> trailer updates or similar. If in doubt, please check the listed branch.
> 
> tree:   https://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs.git
> branch: vfs.fixes
> 
> [1/1] libfs: fix infinite directory reads for offset dir
>        https://git.kernel.org/vfs/vfs/c/fad90bfe412e

Jan Kara Aug. 1, 2024, 1:30 p.m. UTC | #10

On Thu 01-08-24 11:32:25, yangerkun wrote:
> Hi!
> 
> 在 2024/7/31 22:16, Christian Brauner 写道:
> > On Wed, 31 Jul 2024 12:38:35 +0800, yangerkun wrote:
> > > After we switch tmpfs dir operations from simple_dir_operations to
> > > simple_offset_dir_operations, every rename happened will fill new dentry
> > > to dest dir's maple tree(&SHMEM_I(inode)->dir_offsets->mt) with a free
> > > key starting with octx->newx_offset, and then set newx_offset equals to
> > > free key + 1. This will lead to infinite readdir combine with rename
> > > happened at the same time, which fail generic/736 in xfstests(detail show
> > > as below).
> > > 
> > > [...]
> > 
> > @Chuck, @Jan I did the requested change directly. Please check!
> 
> Thanks for applied this patch, the suggestions from Jan and Chuck will
> be a separates patch!

Christian already updated the patch as I've suggested so no need for you
to send anything.

								Honza

yangerkun Aug. 1, 2024, 1:38 p.m. UTC | #11

在 2024/8/1 21:30, Jan Kara 写道:
> On Thu 01-08-24 11:32:25, yangerkun wrote:
>> Hi!
>>
>> 在 2024/7/31 22:16, Christian Brauner 写道:
>>> On Wed, 31 Jul 2024 12:38:35 +0800, yangerkun wrote:
>>>> After we switch tmpfs dir operations from simple_dir_operations to
>>>> simple_offset_dir_operations, every rename happened will fill new dentry
>>>> to dest dir's maple tree(&SHMEM_I(inode)->dir_offsets->mt) with a free
>>>> key starting with octx->newx_offset, and then set newx_offset equals to
>>>> free key + 1. This will lead to infinite readdir combine with rename
>>>> happened at the same time, which fail generic/736 in xfstests(detail show
>>>> as below).
>>>>
>>>> [...]
>>>
>>> @Chuck, @Jan I did the requested change directly. Please check!
>>
>> Thanks for applied this patch, the suggestions from Jan and Chuck will
>> be a separates patch!
> 
> Christian already updated the patch as I've suggested so no need for you
> to send anything.

OK, thanks for that!
> 
> 								Honza
>

libfs: fix infinite directory reads for offset dir

Commit Message

Comments

Patch