fuse: fix illegal access to inode with reused nodeid

Message ID 20210609181158.479781-1-amir73il@gmail.com (mailing list archive)
State New

Commit Message

Amir Goldstein June 9, 2021, 6:11 p.m. UTC
Server responds to LOOKUP and other ops (READDIRPLUS/CREATE/MKNOD/...)
with outarg containing nodeid and generation.

If a fuse inode is found in inode cache with the same nodeid but
different generation, the existing fuse inode should be unhashed and
marked "bad" and a new inode with the new generation should be hashed
instead.

This can happen, for example, with a passthrough fuse filesystem that
returns the real filesystem ino/generation on lookup and where real inode
numbers can get recycled because real files are unlinked outside of the
fuse passthrough filesystem.

With current code, this situation will not be detected and an old fuse
dentry that used to point to an older generation real inode, can be used
to access a completely new inode, which should be accessed only via the
new dentry.

Note that because the FORGET message carries the nodeid w/o generation,
the server should wait to get FORGET counts for the nlookup counts of
the old and reused inodes combined, before it can free the resources
associated with that nodeid.

Link: https://lore.kernel.org/linux-fsdevel/CAOQ4uxgDMGUpK35huwqFYGH_idBB8S6eLiz85o0DDKOyDH4Syg@mail.gmail.com/
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
---

Miklos,

I was able to reproduce this issue with a passthrough fs that stores
ino+generation and uses them to open an fd on lookup.

I extended libfuse's test_syscalls [1] program to demonstrate the issue
described in commit message.

Max, IIUC, you are making a modification to virtiofs-rs that would
result in it being exposed to this bug.  You are welcome to try out the
test and let me know if you can reproduce the issue.

Note that some test_syscalls tests fail with cache enabled, so libfuse's
test_examples.py only runs test_syscalls in the cache-disabled config.

Thanks,
Amir.

[1] https://github.com/amir73il/libfuse/commits/test-reused-inodes

 fs/fuse/dir.c     | 3 ++-
 fs/fuse/fuse_i.h  | 9 +++++++++
 fs/fuse/inode.c   | 4 ++--
 fs/fuse/readdir.c | 7 +++++--
 4 files changed, 18 insertions(+), 5 deletions(-)

Comments

Vivek Goyal June 11, 2021, 4:26 p.m. UTC | #1
On Wed, Jun 09, 2021 at 09:11:58PM +0300, Amir Goldstein wrote:
> Server responds to LOOKUP and other ops (READDIRPLUS/CREATE/MKNOD/...)
> with outarg containing nodeid and generation.
> 
> If a fuse inode is found in inode cache with the same nodeid but
> different generation, the existing fuse inode should be unhashed and
> marked "bad" and a new inode with the new generation should be hashed
> instead.
> 
> This can happen, for example, with passthrough fuse filesystem that
> returns the real filesystem ino/generation on lookup and where real inode
> numbers can get recycled due to real files being unlinked not via the fuse
> passthrough filesystem.
> 
> With current code, this situation will not be detected and an old fuse
> dentry that used to point to an older generation real inode, can be used
> to access a completely new inode, which should be accessed only via the
> new dentry.

Hi Amir,

I am curious how the server gets access to the new inode on the host. If the
server keeps an fd open to the file, then we will continue to refer to the old
unlinked file. In that case the inode number can't be recycled to
begin with, so this situation does not arise.

If server is keeping file handles (like Max's patches) and file gets
recycled and inode number recycled, then I am assuming old inode in
server can't resolve that file handle because that file is gone
and a new file/inode is in place. IOW, I am assuming open_by_handle_at()
should fail in this case.

IOW, IIUC, even if we refer to old inode, server does not have a 
way to provide access to new file (with reused inode number). And
will be forced to return -ESTALE or something like that?  Did I 
miss the point completely?

> 
> Note that because the FORGET message carries the nodeid w/o generation,
> the server should wait to get FORGET counts for the nlookup counts of
> the old and reused inodes combined, before it can free the resources
> associated to that nodeid.

This seems like an odd piece. Wondering if it will make sense to enhance
FORGET message to also send generation number so that server does not
have to keep both the inodes around.

Thanks
Vivek

> 
> Link: https://lore.kernel.org/linux-fsdevel/CAOQ4uxgDMGUpK35huwqFYGH_idBB8S6eLiz85o0DDKOyDH4Syg@mail.gmail.com/
> Signed-off-by: Amir Goldstein <amir73il@gmail.com>
> ---
> 
> Miklos,
> 
> I was able to reproduce this issue with a passthrough fs that stores
> ino+generation and uses them to open an fd on lookup.
> 
> I extended libfuse's test_syscalls [1] program to demonstrate the issue
> described in commit message.
> 
> Max, IIUC, you are making a modification to virtiofs-rs that would
> result in it being exposed to this bug.  You are welcome to try out the
> test and let me know if you can reproduce the issue.
> 
> Note that some test_syscalls tests fail with cache enabled, so libfuse's
> test_examples.py only runs test_syscalls in the cache-disabled config.
> 
> Thanks,
> Amir.
> 
> [1] https://github.com/amir73il/libfuse/commits/test-reused-inodes
> 
>  fs/fuse/dir.c     | 3 ++-
>  fs/fuse/fuse_i.h  | 9 +++++++++
>  fs/fuse/inode.c   | 4 ++--
>  fs/fuse/readdir.c | 7 +++++--
>  4 files changed, 18 insertions(+), 5 deletions(-)
> 
> diff --git a/fs/fuse/dir.c b/fs/fuse/dir.c
> index 1b6c001a7dd1..b06628fd7d8e 100644
> --- a/fs/fuse/dir.c
> +++ b/fs/fuse/dir.c
> @@ -239,7 +239,8 @@ static int fuse_dentry_revalidate(struct dentry *entry, unsigned int flags)
>  		if (!ret) {
>  			fi = get_fuse_inode(inode);
>  			if (outarg.nodeid != get_node_id(inode) ||
> -			    (bool) IS_AUTOMOUNT(inode) != (bool) (outarg.attr.flags & FUSE_ATTR_SUBMOUNT)) {
> +			    fuse_stale_inode(inode, outarg.generation,
> +					     &outarg.attr)) {
>  				fuse_queue_forget(fm->fc, forget,
>  						  outarg.nodeid, 1);
>  				goto invalid;
> diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h
> index 7e463e220053..f1bd28c176a9 100644
> --- a/fs/fuse/fuse_i.h
> +++ b/fs/fuse/fuse_i.h
> @@ -867,6 +867,15 @@ static inline u64 fuse_get_attr_version(struct fuse_conn *fc)
>  	return atomic64_read(&fc->attr_version);
>  }
>  
> +static inline bool fuse_stale_inode(const struct inode *inode, int generation,
> +				    struct fuse_attr *attr)
> +{
> +	return inode->i_generation != generation ||
> +		inode_wrong_type(inode, attr->mode) ||
> +		(bool) IS_AUTOMOUNT(inode) !=
> +		(bool) (attr->flags & FUSE_ATTR_SUBMOUNT);
> +}
> +
>  static inline void fuse_make_bad(struct inode *inode)
>  {
>  	remove_inode_hash(inode);
> diff --git a/fs/fuse/inode.c b/fs/fuse/inode.c
> index 393e36b74dc4..257bb3e1cac8 100644
> --- a/fs/fuse/inode.c
> +++ b/fs/fuse/inode.c
> @@ -350,8 +350,8 @@ struct inode *fuse_iget(struct super_block *sb, u64 nodeid,
>  		inode->i_generation = generation;
>  		fuse_init_inode(inode, attr);
>  		unlock_new_inode(inode);
> -	} else if (inode_wrong_type(inode, attr->mode)) {
> -		/* Inode has changed type, any I/O on the old should fail */
> +	} else if (fuse_stale_inode(inode, generation, attr)) {
> +		/* nodeid was reused, any I/O on the old inode should fail */
>  		fuse_make_bad(inode);
>  		iput(inode);
>  		goto retry;
> diff --git a/fs/fuse/readdir.c b/fs/fuse/readdir.c
> index 277f7041d55a..bc267832310c 100644
> --- a/fs/fuse/readdir.c
> +++ b/fs/fuse/readdir.c
> @@ -200,9 +200,12 @@ static int fuse_direntplus_link(struct file *file,
>  	if (!d_in_lookup(dentry)) {
>  		struct fuse_inode *fi;
>  		inode = d_inode(dentry);
> +		if (inode && get_node_id(inode) != o->nodeid)
> +			inode = NULL;
>  		if (!inode ||
> -		    get_node_id(inode) != o->nodeid ||
> -		    inode_wrong_type(inode, o->attr.mode)) {
> +		    fuse_stale_inode(inode, o->generation, &o->attr)) {
> +			if (inode)
> +				fuse_make_bad(inode);
>  			d_invalidate(dentry);
>  			dput(dentry);
>  			goto retry;
> -- 
> 2.31.1
>
Amir Goldstein June 11, 2021, 5:44 p.m. UTC | #2
On Fri, Jun 11, 2021 at 7:26 PM Vivek Goyal <vgoyal@redhat.com> wrote:
>
> On Wed, Jun 09, 2021 at 09:11:58PM +0300, Amir Goldstein wrote:
> > Server responds to LOOKUP and other ops (READDIRPLUS/CREATE/MKNOD/...)
> > with outarg containing nodeid and generation.
> >
> > If a fuse inode is found in inode cache with the same nodeid but
> > different generation, the existing fuse inode should be unhashed and
> > marked "bad" and a new inode with the new generation should be hashed
> > instead.
> >
> > This can happen, for example, with passthrough fuse filesystem that
> > returns the real filesystem ino/generation on lookup and where real inode
> > numbers can get recycled due to real files being unlinked not via the fuse
> > passthrough filesystem.
> >
> > With current code, this situation will not be detected and an old fuse
> > dentry that used to point to an older generation real inode, can be used
> > to access a completely new inode, which should be accessed only via the
> > new dentry.
>
> Hi Amir,
>
> Curious that how server gets access to new inode on host. If server
> keeps an fd open to file, then we will continue to refer to old
> unlinked file. Well in that case inode number can't be recycled to
> begin with, so this situation does not arise to begin with.
>

Therefore, none of the example fs in libfuse exhibit the bug.

> If server is keeping file handles (like Max's patches) and file gets
> recycled and inode number recycled, then I am assuming old inode in
> server can't resolve that file handle because that file is gone
> and a new file/inode is in place. IOW, I am assuming open_by_handle_at()
> should fail in this case.
>
> IOW, IIUC, even if we refer to old inode, server does not have a
> way to provide access to new file (with reused inode number). And
> will be forced to return -ESTALE or something like that?  Did I
> miss the point completely?
>

Yes :-) It is much simpler than that.
I will explain with an example from the test in link [1]:

test_syscalls has ~50 test cases.
Each test case (or at least some of them) creates a file named testfile.$n;
some test cases truncate the file, some chmod it, and so on.
At the end of each test case the file is closed and unlinked.
This means that the server, if run over ext4/xfs, very likely reuses
the same inode number in many test cases.

Normally, unlinking the testfile will drop the inode refcount to zero
and the kernel will evict the inode and send FORGET before the server
creates another file with the same inode number.

I modified the test to keep an open O_PATH fd of the testfiles
around until the end of the test.
This does not keep the file open on the server, so the real inode
number can and does get reused, but it does keep the inode
with elevated refcount in the kernel, so there is no final FORGET
to the server.
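
The effect of holding an O_PATH fd across an unlink can be observed from plain
userspace. The following is only a minimal sketch on a local filesystem, not
the actual test_syscalls change: the helper name and the /tmp location are
assumptions made for illustration. It shows that the fd still resolves to an
inode after the name is gone, which is what keeps the kernel inode referenced.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Create a temporary file, take an O_PATH reference to it, unlink it,
 * and fstat() the O_PATH fd.  Returns 0 if the fd still resolves to an
 * inode after the unlink, -1 on any error.  On success, *nlink_after
 * receives st_nlink as seen through the fd.
 *
 * The helper name and the /tmp path are hypothetical.
 */
static int opath_survives_unlink(long *nlink_after)
{
	char path[] = "/tmp/testfile.XXXXXX";
	struct stat st;
	int fd, pfd, ret = -1;

	fd = mkstemp(path);
	if (fd < 0)
		return -1;
	close(fd);	/* no regular open file description remains */

	pfd = open(path, O_PATH);
	if (pfd < 0) {
		unlink(path);
		return -1;
	}

	/* The name goes away, but the O_PATH fd still pins the inode. */
	if (unlink(path) == 0 && fstat(pfd, &st) == 0) {
		*nlink_after = (long)st.st_nlink;
		ret = 0;
	}

	close(pfd);
	return ret;
}
```

On a FUSE mount the same sequence leaves the kernel inode alive without any
open file on the server side, so no final FORGET is sent for it.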

Now the server gets a CREATE for the next testfile and it happens
to find a file with an inode number that already exists in the server
with a different generation.

The server has no problem detecting this situation, but what can the
server do about it? If server returns success, the existing kernel
inode will now refer to the new server object.
If the server returns failure, this is a permanent failure.

My filesystem used to free the existing inode object and replace it with
a new one, but the same ino keeps getting FORGET messages from
the old kernel inode, so it needed to remember the old nlookup.

The server can send an invalidate command for the inode, but that
won't make the kernel inode go away nor be marked "bad".

Eventually, at the end of test_syscalls, my modification iterates
over all the O_PATH fds, which correspond to different dentries, most
of them now pointing at the same inode object, and fstat() on most of
those fds returns the same ino/size/mode, which does not match the
file that the O_PATH fd used to refer to. IOW, you get to peek at the
content of a file that is not yours at all.
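
The difference between the old and the patched lookup behaviour can be modeled
in userspace. This is a toy sketch only: the toy_* names, the single hash
chain, and the check_generation switch are assumptions for illustration and are
not kernel code. It checks only the generation part of the staleness test.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Toy stand-in for a kernel inode; not the real struct inode. */
struct toy_inode {
	uint64_t nodeid;
	uint32_t generation;
	bool bad;		/* set once the inode has been unhashed */
	struct toy_inode *next;	/* hash chain, single bucket for brevity */
};

static struct toy_inode *toy_hash;

/* Mirrors only the generation comparison of the staleness check. */
static bool toy_stale(const struct toy_inode *inode, uint32_t generation)
{
	return inode->generation != generation;
}

/* Unhash the inode and mark it bad; existing holders keep their pointer. */
static void toy_make_bad(struct toy_inode *inode)
{
	struct toy_inode **p;

	for (p = &toy_hash; *p; p = &(*p)->next) {
		if (*p == inode) {
			*p = inode->next;
			break;
		}
	}
	inode->next = NULL;
	inode->bad = true;
}

/*
 * Look up (nodeid, generation).  With check_generation == false this
 * behaves like the pre-patch code and returns whatever is hashed under
 * nodeid, even if it belongs to an older generation.
 */
static struct toy_inode *toy_iget(uint64_t nodeid, uint32_t generation,
				  bool check_generation)
{
	struct toy_inode *inode;

	for (inode = toy_hash; inode; inode = inode->next) {
		if (inode->nodeid != nodeid)
			continue;
		if (check_generation && toy_stale(inode, generation)) {
			toy_make_bad(inode);
			break;	/* go on to allocate a fresh inode */
		}
		return inode;
	}

	inode = calloc(1, sizeof(*inode));
	if (!inode)
		return NULL;
	inode->nodeid = nodeid;
	inode->generation = generation;
	inode->next = toy_hash;
	toy_hash = inode;
	return inode;
}
```

With check_generation set, a second lookup of the same nodeid with a new
generation returns a new object and leaves the old one marked bad; without it,
both lookups return the same object, which is the aliasing described above.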

> >
> > Note that because the FORGET message carries the nodeid w/o generation,
> > the server should wait to get FORGET counts for the nlookup counts of
> > the old and reused inodes combined, before it can free the resources
> > associated to that nodeid.
>
> This seems like an odd piece. Wondering if it will make sense to enhance
> FORGET message to also send generation number so that server does not
> have to keep both the inodes around.

The server does not keep both inodes.
The server has a single object which is referenced by ino, because
all protocol messages identify with only ino.

When the underlying fs reuses an inode number, the server will reuse the
inode object as well (freeing all resources that were relevant to the old file),
but same as the underlying filesystem keeps a generation in the inode object,
so does the server.

Regarding nlookup count, I cannot think of a better way to address this,
nor do I see any problem with keeping a balanced count of LOOKUP/FORGET.
The balance should work fine per ino, regardless of generation, as long as
we make sure the fuse kernel driver has a single "live" inode object per ino
at all times (it can have many "bad" inode objects).

Not sure if the above is clear, but the result is that the fuse driver has
several inode objects, one hashed and some unhashed, and when all are finally
evicted, the server nlookup count per ino will level at 0 and the server can
free the inode object.
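
The per-ino balance described above could look roughly like this on the server
side. This is a hedged sketch, not libfuse API and not the code of any real
server: struct srv_node, srv_lookup() and srv_forget() are hypothetical names.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical server-side record, one per nodeid. */
struct srv_node {
	uint64_t nodeid;
	uint32_t generation;	/* generation of the file now behind nodeid */
	uint64_t nlookup;	/* lookups minus forgets, all generations */
};

/* Called whenever a LOOKUP/CREATE/... reply hands nodeid to the kernel. */
static void srv_lookup(struct srv_node *n, uint32_t generation)
{
	if (n->generation != generation) {
		/*
		 * nodeid was reused: per-file resources of the old file
		 * would be released here, but the lookup count is kept,
		 * because FORGET carries no generation.
		 */
		n->generation = generation;
	}
	n->nlookup++;
}

/* Called on FORGET; returns true when the record may be freed. */
static bool srv_forget(struct srv_node *n, uint64_t count)
{
	n->nlookup = count > n->nlookup ? 0 : n->nlookup - count;
	return n->nlookup == 0;
}
```

A lookup of generation 1 followed by a lookup of generation 2 leaves nlookup at
2, and the record only becomes freeable after FORGETs for both the bad and the
live kernel inode have arrived.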

Thanks,
Amir.
Vivek Goyal June 11, 2021, 9:33 p.m. UTC | #3
On Fri, Jun 11, 2021 at 08:44:16PM +0300, Amir Goldstein wrote:
> On Fri, Jun 11, 2021 at 7:26 PM Vivek Goyal <vgoyal@redhat.com> wrote:
> >
> > On Wed, Jun 09, 2021 at 09:11:58PM +0300, Amir Goldstein wrote:
> > > Server responds to LOOKUP and other ops (READDIRPLUS/CREATE/MKNOD/...)
> > > with outarg containing nodeid and generation.
> > >
> > > If a fuse inode is found in inode cache with the same nodeid but
> > > different generation, the existing fuse inode should be unhashed and
> > > marked "bad" and a new inode with the new generation should be hashed
> > > instead.
> > >
> > > This can happen, for example, with passthrough fuse filesystem that
> > > returns the real filesystem ino/generation on lookup and where real inode
> > > numbers can get recycled due to real files being unlinked not via the fuse
> > > passthrough filesystem.
> > >
> > > With current code, this situation will not be detected and an old fuse
> > > dentry that used to point to an older generation real inode, can be used
> > > to access a completely new inode, which should be accessed only via the
> > > new dentry.
> >
> > Hi Amir,
> >
> > Curious that how server gets access to new inode on host. If server
> > keeps an fd open to file, then we will continue to refer to old
> > unlinked file. Well in that case inode number can't be recycled to
> > begin with, so this situation does not arise to begin with.
> >
> 
> Therefore, none of the example fs in libfuse exhibit the bug.
> 
> > If server is keeping file handles (like Max's patches) and file gets
> > recycled and inode number recycled, then I am assuming old inode in
> > server can't resolve that file handle because that file is gone
> > and a new file/inode is in place. IOW, I am assuming open_by_handle_at()
> > should fail in this case.
> >
> > IOW, IIUC, even if we refer to old inode, server does not have a
> > way to provide access to new file (with reused inode number). And
> > will be forced to return -ESTALE or something like that?  Did I
> > miss the point completely?
> >
> 
> Yes :-) it is much more simple than that.
> I will explain with an example from the test in link [1]:
> 
> test_syscalls has ~50 test cases.
> Each test case (or some) create a file named testfile.$n
> some test cases truncate the file, some chmod, whatever.
> At the end of each test case the file is closed and unlinked.
> This means that the server if run over ext4/xfs very likely reuses
> the same inode number in many test cases.
> 
> Normally, unlinking the testfile will drop the inode refcount to zero
> and kernel will evict the inode and send FORGET before the server
> creates another file with the same inode number.
> 
> I modified the test to keep an open O_PATH fd of the testfiles
> around until the end of the test.
> This does not keep the file open on the server, so the real inode
> number can and does get reused, but it does keep the inode
> with elevated refcount in the kernel, so there is no final FORGET
> to the server.
> 
> Now the server gets a CREATE for the next testfile and it happens
> to find a file with an inode number that already exists in the server
> with a different generation.
> 
> The server has no problem detecting this situation, but what can the
> server do about it? If server returns success, the existing kernel
> inode will now refer to the new server object.
> If the server returns failure, this is a permanent failure.

Hi Amir,

Thanks for the detailed explanation. I guess I am beginning to understand
it now.

In the above example, when CREATE comes along and the server detects that
the inode it has in cache has the same inode number but a different generation,
then the problem can be solved if it creates a new inode (and new node
id) and sends back the new inode id instead? But I guess your file
server is using the real inode number as inode id, so you can't do
that, and that's why you are facing the issue?

> 
> My filesystem used to free the existing inode object and replace it with
> a new one, but the same ino will keep getting FORGET messages from
> the old kernel inode, so needed to remember the old nlookup.
> 
> The server can send an invalidate command for the inode, but that
> won't make the kernel inode go away nor be marked "bad".
> 
> Eventually, at the end of the test_syscalls, my modification iterates
> on all the O_PATH fd's, which correspond to different dentries, most
> of them now pointing at the same inode object and fstat() on most of
> those fd's return the same ino/size/mode, which is not a match to the
> file that O_PATH fd used to refer to. IOW, you got to peek at the
> content of a file that is not yours at all.

Got it. So with your invalidation patch, inode (opened with O_PATH)
will be marked bad and if you do fstat() on this, fuse will return
-EIO, instead of stats of new file which reused inode number, right?

What happens in the following scenario?

- You have file open with O_PATH.
- Somebody unlinked the file on server and put a new file which
  reused inode number.
- Now I do fstat(fd). 

I am assuming in this case I will still be able to get stats of the new
file? Or does your server implementation detect that it's not the same
file anymore and return an error instead?

Thanks
Vivek

Amir Goldstein June 11, 2021, 11:13 p.m. UTC | #4
On Sat, Jun 12, 2021 at 12:33 AM Vivek Goyal <vgoyal@redhat.com> wrote:
>
> On Fri, Jun 11, 2021 at 08:44:16PM +0300, Amir Goldstein wrote:
> > On Fri, Jun 11, 2021 at 7:26 PM Vivek Goyal <vgoyal@redhat.com> wrote:
> > >
> > > On Wed, Jun 09, 2021 at 09:11:58PM +0300, Amir Goldstein wrote:
> > > > Server responds to LOOKUP and other ops (READDIRPLUS/CREATE/MKNOD/...)
> > > > with outarg containing nodeid and generation.
> > > >
> > > > If a fuse inode is found in inode cache with the same nodeid but
> > > > different generation, the existing fuse inode should be unhashed and
> > > > marked "bad" and a new inode with the new generation should be hashed
> > > > instead.
> > > >
> > > > This can happen, for example, with passthrough fuse filesystem that
> > > > returns the real filesystem ino/generation on lookup and where real inode
> > > > numbers can get recycled due to real files being unlinked not via the fuse
> > > > passthrough filesystem.
> > > >
> > > > With current code, this situation will not be detected and an old fuse
> > > > dentry that used to point to an older generation real inode, can be used
> > > > to access a completely new inode, which should be accessed only via the
> > > > new dentry.
> > >
> > > Hi Amir,
> > >
> > > Curious that how server gets access to new inode on host. If server
> > > keeps an fd open to file, then we will continue to refer to old
> > > unlinked file. Well in that case inode number can't be recycled to
> > > begin with, so this situation does not arise to begin with.
> > >
> >
> > Therefore, none of the example fs in libfuse exhibit the bug.
> >
> > > If server is keeping file handles (like Max's patches) and file gets
> > > recycled and inode number recycled, then I am assuming old inode in
> > > server can't resolve that file handle because that file is gone
> > > and a new file/inode is in place. IOW, I am assuming open_by_handle_at()
> > > should fail in this case.
> > >
> > > IOW, IIUC, even if we refer to old inode, server does not have a
> > > way to provide access to new file (with reused inode number). And
> > > will be forced to return -ESTALE or something like that?  Did I
> > > miss the point completely?
> > >
> >
> > Yes :-) it is much more simple than that.
> > I will explain with an example from the test in link [1]:
> >
> > test_syscalls has ~50 test cases.
> > Each test case (or some) create a file named testfile.$n
> > some test cases truncate the file, some chmod, whatever.
> > At the end of each test case the file is closed and unlinked.
> > This means that the server if run over ext4/xfs very likely reuses
> > the same inode number in many test cases.
> >
> > Normally, unlinking the testfile will drop the inode refcount to zero
> > and kernel will evict the inode and send FORGET before the server
> > creates another file with the same inode number.
> >
> > I modified the test to keep an open O_PATH fd of the testfiles
> > around until the end of the test.
> > This does not keep the file open on the server, so the real inode
> > number can and does get reused, but it does keep the inode
> > with elevated refcount in the kernel, so there is no final FORGET
> > to the server.
> >
> > Now the server gets a CREATE for the next testfile and it happens
> > to find a file with an inode number that already exists in the server
> > with a different generation.
> >
> > The server has no problem detecting this situation, but what can the
> > server do about it? If server returns success, the existing kernel
> > inode will now refer to the new server object.
> > If the server returns failure, this is a permanent failure.
>
> Hi Amir,
>
> Thanks for the detailed explanation. I guess I am beginning to understand
> it now.
>
> In above example, when CREATE comes along and server detects that
> inode it has in cache has same inode number but different generation,
> then problem can be solved if it creates a new inode and new node
> id) and sends back new inode id instead? But I guess your file
> server is using real inode number as inode id and you can't do
> that and that's why facing the issue?
>

That is correct.
For fs with 32bit ino (ext4) I encode nodeid from ino+generation
so there is no nodeid reuse issue.
Since I need persistent inode numbers it would be impractical to
keep a persistent mapping of real ino to nodeid in a db.
And besides, FUSE protocol returns generation in LOOKUP
response for a reason - this result must not be linked to an existing
inode with previous generation in the FUSE inode cache.
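
One way to build a nodeid from a 32-bit ino and a generation is to pack both
into the 64-bit value. The layout below (generation in the high half) and the
function names are assumptions for illustration; the actual encoding used by
the filesystem discussed here is not shown in this thread.

```c
#include <stdint.h>

/* Pack a 32-bit inode number and a 32-bit generation into one nodeid. */
static uint64_t encode_nodeid(uint32_t ino, uint32_t generation)
{
	return ((uint64_t)generation << 32) | ino;
}

/* Recover the real inode number from a packed nodeid. */
static uint32_t nodeid_ino(uint64_t nodeid)
{
	return (uint32_t)nodeid;
}

/* Recover the generation from a packed nodeid. */
static uint32_t nodeid_generation(uint64_t nodeid)
{
	return (uint32_t)(nodeid >> 32);
}
```

Because a recycled ino gets a different generation, the packed nodeid differs
as well, so the kernel never sees the same nodeid for two different files. A
real server would also have to keep the value clear of the reserved nodeids
(0 is invalid and 1 is FUSE_ROOT_ID), which this sketch does not handle.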

> >
> > My filesystem used to free the existing inode object and replace it with
> > a new one, but the same ino will keep getting FORGET messages from
> > the old kernel inode, so needed to remember the old nlookup.
> >
> > The server can send an invalidate command for the inode, but that
> > won't make the kernel inode go away nor be marked "bad".
> >
> > Eventually, at the end of the test_syscalls, my modification iterates
> > on all the O_PATH fd's, which correspond to different dentries, most
> > of them now pointing at the same inode object and fstat() on most of
> > those fd's return the same ino/size/mode, which is not a match to the
> > file that O_PATH fd used to refer to. IOW, you got to peek at the
> > content of a file that is not yours at all.
>
> Got it. So with your invalidation patch, inode (opened with O_PATH)
> will be marked bad and if you do fstat() on this, fuse will return
> -EIO, instead of stats of new file which reused inode number, right?
>

Yes.

> What happens in following scenario.
>
> - You have file open with O_PATH.
> - Somebody unlinked the file on server and put a new file which
>   reused inode number.
> - Now I do fstat(fd).
>
> I am assuming in this case I will still be able to get stats of new
> file? Or your server implementation detects that its not same
> file anymore and returns an error instead?
>

Yes, in that case, the server doesn't know about the reuse yet,
but when a request comes in with that ino, the server will find
the Inode object, use the stored file handle to try and get an
fd on the real file and will not be able to, because it's a stale file
handle.

The stale Inode object will linger in the server until either:
- O_PATH fd is closed and kernel inode is evicted OR
- The location of the reused inode is found by another LOOKUP
that will reuse the server Inode object with a new file handle

Thanks,
Amir.
Amir Goldstein June 16, 2021, 3:03 p.m. UTC | #5
On Wed, Jun 9, 2021 at 9:12 PM Amir Goldstein <amir73il@gmail.com> wrote:
>
> Server responds to LOOKUP and other ops (READDIRPLUS/CREATE/MKNOD/...)
> with outarg containing nodeid and generation.
>
> If a fuse inode is found in inode cache with the same nodeid but
> different generation, the existing fuse inode should be unhashed and
> marked "bad" and a new inode with the new generation should be hashed
> instead.
>
> This can happen, for example, with passthrough fuse filesystem that
> returns the real filesystem ino/generation on lookup and where real inode
> numbers can get recycled due to real files being unlinked not via the fuse
> passthrough filesystem.
>
> With current code, this situation will not be detected and an old fuse
> dentry that used to point to an older generation real inode, can be used
> to access a completely new inode, which should be accessed only via the
> new dentry.
>
> Note that because the FORGET message carries the nodeid w/o generation,
> the server should wait to get FORGET counts for the nlookup counts of
> the old and reused inodes combined, before it can free the resources
> associated to that nodeid.
>
> Link: https://lore.kernel.org/linux-fsdevel/CAOQ4uxgDMGUpK35huwqFYGH_idBB8S6eLiz85o0DDKOyDH4Syg@mail.gmail.com/
> Signed-off-by: Amir Goldstein <amir73il@gmail.com>
> ---
>
> Miklos,
>
> I was able to reproduce this issue with a passthrough fs that stores
> ino+generation and uses them to open an fd on lookup.
>
> I extended libfuse's test_syscalls [1] program to demonstrate the issue
> described in commit message.
>
> Max, IIUC, you are making a modification to virtiofs-rs that would
> result in it being exposed to this bug.  You are welcome to try out the
> test and let me know if you can reproduce the issue.
>
> Note that some test_syscalls tests fail with cache enabled, so libfuse's
> test_examples.py only runs test_syscalls in the cache-disabled config.
>
> Thanks,
> Amir.
>
> [1] https://github.com/amir73il/libfuse/commits/test-reused-inodes
>

Miklos,

Not sure if you got to look at this already, but I had noticed that the
link above is broken because I deleted the branch after Nikolaus
merged it to upstream libfuse, so here is a new link to the PR [2]
with some more relevant context.

Per request from Nikolaus, I modified the passthrough_hp example
to reuse inodes on last close+unlink, so it now hits the failure in the
new test with upstream kernel and it passes the test with this kernel fix.

Thanks,
Amir.

[2] https://github.com/libfuse/libfuse/pull/612


Nikolaus Rath June 16, 2021, 5:24 p.m. UTC | #6
Hi Amir,

On Wed, 16 Jun 2021, at 16:03, Amir Goldstein wrote:
> Per request from Nikolaus, I modified the passthrough_hp example
> to reuse inodes on last close+unlink, so it now hits the failure in the
> new test with upstream kernel and it passes the test with this kernel fix.
> 
> Thanks,
> Amir.
> 
> [2] https://github.com/libfuse/libfuse/pull/612

Actually, I am no longer sure this was a good idea. Having the libfuse test suite detect problems with the kernel doesn't seem too helpful. I think the test suite should identify problems in libfuse. Currently, having the tests means that users might be hesitant to update to the newer libfuse because of the failing test - when in fact there is nothing wrong with libfuse at all.

I assume the test will start failing on some future kernel (which is why it passed CI), and then start passing again for some kernel after that?

Best,
-Nikolaus
--
GPG Fingerprint: ED31 791B 2C5C 1613 AF38 8B8A D113 FCAC 3C4E 599F

             »Time flies like an arrow, fruit flies like a Banana.«
Amir Goldstein June 16, 2021, 6:25 p.m. UTC | #7
On Wed, Jun 16, 2021 at 8:25 PM Nikolaus Rath <nikolaus@rath.org> wrote:
>
> Hi Amir,
>
> On Wed, 16 Jun 2021, at 16:03, Amir Goldstein wrote:
> > Per request from Nikolaus, I modified the passthrough_hp example
> > to reuse inodes on last close+unlink, so it now hits the failure in the
> > new test with upstream kernel and it passes the test with this kernel fix.
> >
> > Thanks,
> > Amir.
> >
> > [2] https://github.com/libfuse/libfuse/pull/612
>
> Actually, I am no longer sure this was a good idea. Having the libfuse test suite detect problems with the kernel doesn't seem too helpful. I think the test suite should identify problems in libfuse. Currently, having the tests means that users might be hesitant to update to the newer libfuse because of the failing test - when in fact there is nothing wrong with libfuse at all.
>

I suppose you are right.
I could move the test_syscalls test to xfstests, but fuse support for
xfstests is still WIP.

> I assume the test will start failing on some future kernel (which is why it passed CI), and then start passing again for some kernel after that?

I was not aware that it passes CI.
There are no test results available on github.
I am not aware of any specific kernel version where the test should pass,
but the results also depend on the underlying filesystem.
If your underlying filesystem is btrfs, it does not reuse inode numbers
at all, so the test will not fail.

For me the test fails on ext4 and xfs on LTS kernel 5.10.
As I wrote in PR:
"...Fails the modified test_syscalls in this PR on upstream kernel"

If you revert the last commit the test would pass on upstream kernel:
80f2b8b ("passthrough_hp: excercise reusing inode numbers")

We could make behavior of passthrough_hp example depend
on some minimal kernel protocol version or new kernel capability like
FUSE_SETXATTR_EXT if Miklos intends to merge the fix for the coming
kernel release or we could just make that new test optional via pytest option.

After all, regardless of the kernel bug, this adds test coverage that was
missing, so it also covers a possible future regression in libfuse.

Let me know if you want me to implement any of the listed options.

Thanks,
Amir.
Nikolaus Rath June 17, 2021, 7:52 a.m. UTC | #8
On Jun 16 2021, Amir Goldstein <amir73il@gmail.com> wrote:
> On Wed, Jun 16, 2021 at 8:25 PM Nikolaus Rath <nikolaus@rath.org> wrote:
>>
>> Hi Amir,
>>
>> On Wed, 16 Jun 2021, at 16:03, Amir Goldstein wrote:
>> > Per request from Nikolaus, I modified the passthrough_hp example
>> > to reuse inodes on last close+unlink, so it now hits the failure in the
>> > new test with upstream kernel and it passes the test with this kernel fix.
>> >
>> > Thanks,
>> > Amir.
>> >
>> > [2] https://github.com/libfuse/libfuse/pull/612
>>
>> Actually, I am no longer sure this was a good idea. Having the libfuse test suite detect
>> problems with the kernel doesn't seem too helpful. I think the test suite should
>> identify problems in libfuse.  Currently, having the tests means that users might be
>> hesitant to update to the newer libfuse because of the failing test - when in fact there
>> is nothing wrong with libfuse at all.
>>
>
> I suppose you are right.
> I could move the test_syscalls test to xfstests, but fuse support for
> xfstests is still WIP.
>
>> I assume the test will start failing on some future kernel (which is why it passed CI),
>> and then start passing again for some kernel after that?
>
> I was not aware that it passes CI.
> There are no test results available on github.

Arg. Looks like something is broken there. I mistook the absence of
results for a passing result.

> I am not aware of any specific kernel version where the test should pass,
> but the results also depend on the underlying filesystem.
>
> If your underlying filesystem is btrfs, it does not reuse inode numbers
> at all, so the test will not fail.
>
> For me the test fails on ext4 and xfs on LTS kernel 5.10.
> As I wrote in PR:
> "...Fails the modified test_syscalls in this PR on upstream kernel"
>
> If you revert the last commit the test would pass on upstream kernel:
> 80f2b8b ("passthrough_hp: excercise reusing inode numbers")
>
> We could make behavior of passthrough_hp example depend
> on some minimal kernel protocol version or new kernel capability like
> FUSE_SETXATTR_EXT if Miklos intends to merge the fix for the coming
> kernel release or we could just make that new test optional via pytest option.
>
> After all, regardless of the kernel bug, this adds test coverage that was
> missing, so it also covers a possible future regression in libfuse.
>
> Let me know if you want me to implement any of the listed options.

I don't want an old kernel to result in libfuse unit tests failing, but
I think it's a good idea to cover this case in some form.

Would you be able to make the test conditional on a recent enough kernel
version?

Or, if that's too much work, print an error message that explains that
there is a kernel bug but do fail the test?

Best,
-Nikolaus
Amir Goldstein June 17, 2021, 1:51 p.m. UTC | #9
On Thu, Jun 17, 2021 at 10:52 AM Nikolaus Rath <Nikolaus@rath.org> wrote:
>
> On Jun 16 2021, Amir Goldstein <amir73il@gmail.com> wrote:
> > On Wed, Jun 16, 2021 at 8:25 PM Nikolaus Rath <nikolaus@rath.org> wrote:
> >>
> >> Hi Amir,
> >>
> >> On Wed, 16 Jun 2021, at 16:03, Amir Goldstein wrote:
> >> > Per request from Nikolaus, I modified the passthrough_hp example
> >> > to reuse inodes on last close+unlink, so it now hits the failure in the
> >> > new test with upstream kernel and it passes the test with this kernel fix.
> >> >
> >> > Thanks,
> >> > Amir.
> >> >
> >> > [2] https://github.com/libfuse/libfuse/pull/612
> >>
> >> Actually, I am no longer sure this was a good idea. Having the libfuse test suite detect
> >> problems with the kernel doesn't seem too helpful. I think the test suite should
> >> identify problems in libfuse.  Currently, having the tests means that users might be
> >> hesitant to update to the newer libfuse because of the failing test - when in fact there
> >> is nothing wrong with libfuse at all.
> >>
> >
> > I suppose you are right.
> > I could move the test_syscalls test to xfstests, but fuse support for
> > xfstests is still WIP.
> >
> >> I assume the test will start failing on some future kernel (which is why it passed CI),
> >> and then start passing again for some kernel after that?
> >
> > I was not aware that it passes CI.
> > There are no test results available on github.
>
> Arg. Looks like something is broken there. I mistook the absence of
> results for a passing result.
>
> > I am not aware of any specific kernel version where the test should pass,
> > but the results also depend on the underlying filesystem.
> >
> > If your underlying filesystem is btrfs, it does not reuse inode numbers
> > at all, so the test will not fail.
> >
> > For me the test fails on ext4 and xfs on LTS kernel 5.10.
> > As I wrote in PR:
> > "...Fails the modified test_syscalls in this PR on upstream kernel"
> >
> > If you revert the last commit the test would pass on upstream kernel:
> > 80f2b8b ("passthrough_hp: excercise reusing inode numbers")
> >
> > We could make behavior of passthrough_hp example depend
> > on some minimal kernel protocol version or new kernel capability like
> > FUSE_SETXATTR_EXT if Miklos intends to merge the fix for the coming
> > kernel release or we could just make that new test optional via pytest option.
> >
> > After all, regardless of the kernel bug, this adds test coverage that was
> > missing, so it also covers a possible future regression in libfuse.
> >
> > Let me know if you want me to implement any of the listed options.
>
> I don't want an old kernel to result in libfuse unit tests failing, but
> I think it's a good idea to cover this case in some form.
>
> Would you be able to make the test conditional on a recent enough kernel
> version?
>

That looks trivial, like:
def test_write_cache(tmpdir, writeback, output_checker):
    if writeback and LooseVersion(platform.release()) < '3.14':
        pytest.skip('Requires kernel 3.14 or newer')

I'll just wait to see if Miklos takes the kernel fix to v5.13
so I know which version to use in the condition.
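
For reference, a minimal sketch of such a version gate that does not
depend on LooseVersion. The helper names kernel_version() and
kernel_at_least() are made up for this illustration and are not part of
libfuse's test suite. The minimum version is a required argument here
because the release that will carry the fix is not known yet.

```python
import platform
import re


def kernel_version(release=None):
    """Parse (major, minor) from a release string such as '5.10.0-8-amd64'."""
    if release is None:
        release = platform.release()
    match = re.match(r"(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError("unrecognised kernel release: %r" % (release,))
    return int(match.group(1)), int(match.group(2))


def kernel_at_least(minimum, release=None):
    """True if the kernel is at or above `minimum`, given as (major, minor)."""
    # Tuples compare numerically, so (5, 10) is newer than (5, 9);
    # a plain string comparison would get that case wrong.
    return kernel_version(release) >= tuple(minimum)
```

A test could then call pytest.skip() with an explanatory message when
kernel_at_least(...) returns False for whichever version ends up
carrying the fix.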

Thanks,
Amir.
Vivek Goyal June 17, 2021, 9:28 p.m. UTC | #10
On Wed, Jun 09, 2021 at 09:11:58PM +0300, Amir Goldstein wrote:
> Server responds to LOOKUP and other ops (READDIRPLUS/CREATE/MKNOD/...)
> with outarg containing nodeid and generation.
> 
> If a fuse inode is found in inode cache with the same nodeid but
> different generation, the existing fuse inode should be unhashed and
> marked "bad" and a new inode with the new generation should be hashed
> instead.
> 
> This can happen, for example, with passhrough fuse filesystem that
> returns the real filesystem ino/generation on lookup and where real inode
> numbers can get recycled due to real files being unlinked not via the fuse
> passthrough filesystem.

Hi Amir,

Is the code for the filesystem you have written public? If yes, can you
please provide a link?

Is there an API to look up the generation number from the host filesystem?
Or is that something your file server updates when the file handle has
changed?

Thanks
Vivek

> 
> With current code, this situation will not be detected and an old fuse
> dentry that used to point to an older generation real inode, can be used
> to access a completely new inode, which should be accessed only via the
> new dentry.
> 
> Note that because the FORGET message carries the nodeid w/o generation,
> the server should wait to get FORGET counts for the nlookup counts of
> the old and reused inodes combined, before it can free the resources
> associated to that nodeid.
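
To illustrate the FORGET note above, here is a rough sketch of
server-side accounting, assuming the server keeps a single lookup count
per nodeid that is shared by the old and the reused inode. The class and
method names are hypothetical and do not correspond to any libfuse API.

```python
class LookupTable:
    """Per-nodeid lookup counts, shared by every generation of a nodeid."""

    def __init__(self):
        self._nlookup = {}

    def lookup(self, nodeid):
        # Called for each successful reply that hands this nodeid to the
        # kernel, regardless of which generation was reported.
        self._nlookup[nodeid] = self._nlookup.get(nodeid, 0) + 1

    def forget(self, nodeid, count):
        """Apply a FORGET; return True once the nodeid may be released."""
        remaining = self._nlookup.get(nodeid, 0) - count
        if remaining > 0:
            self._nlookup[nodeid] = remaining
            return False
        self._nlookup.pop(nodeid, None)
        return True
```

Because FORGET does not say which generation it refers to, the sketch
releases resources only when the combined count drops to zero.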
> 
> Link: https://lore.kernel.org/linux-fsdevel/CAOQ4uxgDMGUpK35huwqFYGH_idBB8S6eLiz85o0DDKOyDH4Syg@mail.gmail.com/
> Signed-off-by: Amir Goldstein <amir73il@gmail.com>
> ---
> 
> Miklos,
> 
> I was able to reproduce this issue with a passthrough fs that stored
> ino+generation and uses then to open fd on lookup.
> 
> I extended libfuse's test_syscalls [1] program to demonstrate the issue
> described in commit message.
> 
> Max, IIUC, you are making a modification to virtiofs-rs that would
> result is being exposed to this bug.  You are welcome to try out the
> test and let me know if you can reproduce the issue.
> 
> Note that some test_syscalls test fail with cache enabled, so libfuse's
> test_examples.py only runs test_syscalls in cache disabled config.
> 
> Thanks,
> Amir.
> 
> [1] https://github.com/amir73il/libfuse/commits/test-reused-inodes
> 
>  fs/fuse/dir.c     | 3 ++-
>  fs/fuse/fuse_i.h  | 9 +++++++++
>  fs/fuse/inode.c   | 4 ++--
>  fs/fuse/readdir.c | 7 +++++--
>  4 files changed, 18 insertions(+), 5 deletions(-)
> 
> diff --git a/fs/fuse/dir.c b/fs/fuse/dir.c
> index 1b6c001a7dd1..b06628fd7d8e 100644
> --- a/fs/fuse/dir.c
> +++ b/fs/fuse/dir.c
> @@ -239,7 +239,8 @@ static int fuse_dentry_revalidate(struct dentry *entry, unsigned int flags)
>  		if (!ret) {
>  			fi = get_fuse_inode(inode);
>  			if (outarg.nodeid != get_node_id(inode) ||
> -			    (bool) IS_AUTOMOUNT(inode) != (bool) (outarg.attr.flags & FUSE_ATTR_SUBMOUNT)) {
> +			    fuse_stale_inode(inode, outarg.generation,
> +					     &outarg.attr)) {
>  				fuse_queue_forget(fm->fc, forget,
>  						  outarg.nodeid, 1);
>  				goto invalid;
> diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h
> index 7e463e220053..f1bd28c176a9 100644
> --- a/fs/fuse/fuse_i.h
> +++ b/fs/fuse/fuse_i.h
> @@ -867,6 +867,15 @@ static inline u64 fuse_get_attr_version(struct fuse_conn *fc)
>  	return atomic64_read(&fc->attr_version);
>  }
>  
> +static inline bool fuse_stale_inode(const struct inode *inode, int generation,
> +				    struct fuse_attr *attr)
> +{
> +	return inode->i_generation != generation ||
> +		inode_wrong_type(inode, attr->mode) ||
> +		(bool) IS_AUTOMOUNT(inode) !=
> +		(bool) (attr->flags & FUSE_ATTR_SUBMOUNT);
> +}
> +
>  static inline void fuse_make_bad(struct inode *inode)
>  {
>  	remove_inode_hash(inode);
> diff --git a/fs/fuse/inode.c b/fs/fuse/inode.c
> index 393e36b74dc4..257bb3e1cac8 100644
> --- a/fs/fuse/inode.c
> +++ b/fs/fuse/inode.c
> @@ -350,8 +350,8 @@ struct inode *fuse_iget(struct super_block *sb, u64 nodeid,
>  		inode->i_generation = generation;
>  		fuse_init_inode(inode, attr);
>  		unlock_new_inode(inode);
> -	} else if (inode_wrong_type(inode, attr->mode)) {
> -		/* Inode has changed type, any I/O on the old should fail */
> +	} else if (fuse_stale_inode(inode, generation, attr)) {
> +		/* nodeid was reused, any I/O on the old inode should fail */
>  		fuse_make_bad(inode);
>  		iput(inode);
>  		goto retry;
> diff --git a/fs/fuse/readdir.c b/fs/fuse/readdir.c
> index 277f7041d55a..bc267832310c 100644
> --- a/fs/fuse/readdir.c
> +++ b/fs/fuse/readdir.c
> @@ -200,9 +200,12 @@ static int fuse_direntplus_link(struct file *file,
>  	if (!d_in_lookup(dentry)) {
>  		struct fuse_inode *fi;
>  		inode = d_inode(dentry);
> +		if (inode && get_node_id(inode) != o->nodeid)
> +			inode = NULL;
>  		if (!inode ||
> -		    get_node_id(inode) != o->nodeid ||
> -		    inode_wrong_type(inode, o->attr.mode)) {
> +		    fuse_stale_inode(inode, o->generation, &o->attr)) {
> +			if (inode)
> +				fuse_make_bad(inode);
>  			d_invalidate(dentry);
>  			dput(dentry);
>  			goto retry;
> -- 
> 2.31.1
>
Amir Goldstein June 18, 2021, 6:34 a.m. UTC | #11
On Fri, Jun 18, 2021 at 12:28 AM Vivek Goyal <vgoyal@redhat.com> wrote:
>
> On Wed, Jun 09, 2021 at 09:11:58PM +0300, Amir Goldstein wrote:
> > Server responds to LOOKUP and other ops (READDIRPLUS/CREATE/MKNOD/...)
> > with outarg containing nodeid and generation.
> >
> > If a fuse inode is found in inode cache with the same nodeid but
> > different generation, the existing fuse inode should be unhashed and
> > marked "bad" and a new inode with the new generation should be hashed
> > instead.
> >
> > This can happen, for example, with passhrough fuse filesystem that
> > returns the real filesystem ino/generation on lookup and where real inode
> > numbers can get recycled due to real files being unlinked not via the fuse
> > passthrough filesystem.
>
> Hi Amir,
>
> Is the code for the filesystem you have written public? If yes, can you
> please provide a link?

Yes, I provided the link in the discussion about the bug:

[1] https://github.com/amir73il/libfuse/commits/cachegwfs

>
> Is there an API to look up the generation number from the host filesystem? Or

There is FS_IOC_GETVERSION which a few fs implement
(ext4/xfs/btrfs/f2fs), but I don't use it.

> is that something your file server updates when the file handle has
> changed?
>

My filesystem takes ino/gen from the file handle.
The file handle format is filesystem specific, but my filesystem only
supports ext4 and xfs, so it makes use of that to extract
the generation from the file handle.

When the inode number is 32-bit (ext4), the entire file handle is encoded
into the 64-bit nodeid.
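
As an illustration only, packing a 32-bit inode number and a 32-bit
generation into one 64-bit nodeid could look like the sketch below. The
bit layout (generation in the upper half) and the function names are
assumptions made for this example; the layout actually used by the
filesystem linked above may differ.

```python
MASK32 = 0xFFFFFFFF


def pack_nodeid(ino, generation):
    """Combine a 32-bit inode number and generation into a 64-bit value."""
    if not 0 <= ino <= MASK32:
        raise ValueError("inode number does not fit in 32 bits")
    # The generation may be handed over as a signed int; keep its low 32 bits.
    return ((generation & MASK32) << 32) | ino


def unpack_nodeid(nodeid):
    """Return (ino, generation) from a value built by pack_nodeid()."""
    return nodeid & MASK32, (nodeid >> 32) & MASK32
```

With such an encoding a recycled inode number yields a different nodeid
as long as the generation differs, which is why the reuse problem only
shows up when ino and generation are reported separately.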

Thanks,
Amir.
Miklos Szeredi June 21, 2021, 9:27 a.m. UTC | #12
On Wed, 9 Jun 2021 at 20:12, Amir Goldstein <amir73il@gmail.com> wrote:
>
> Server responds to LOOKUP and other ops (READDIRPLUS/CREATE/MKNOD/...)
> with outarg containing nodeid and generation.
>
> If a fuse inode is found in inode cache with the same nodeid but
> different generation, the existing fuse inode should be unhashed and
> marked "bad" and a new inode with the new generation should be hashed
> instead.
>
> This can happen, for example, with passhrough fuse filesystem that
> returns the real filesystem ino/generation on lookup and where real inode
> numbers can get recycled due to real files being unlinked not via the fuse
> passthrough filesystem.
>
> With current code, this situation will not be detected and an old fuse
> dentry that used to point to an older generation real inode, can be used
> to access a completely new inode, which should be accessed only via the
> new dentry.
>
> Note that because the FORGET message carries the nodeid w/o generation,
> the server should wait to get FORGET counts for the nlookup counts of
> the old and reused inodes combined, before it can free the resources
> associated to that nodeid.
>
> Link: https://lore.kernel.org/linux-fsdevel/CAOQ4uxgDMGUpK35huwqFYGH_idBB8S6eLiz85o0DDKOyDH4Syg@mail.gmail.com/
> Signed-off-by: Amir Goldstein <amir73il@gmail.com>
> ---
>
> Miklos,
>
> I was able to reproduce this issue with a passthrough fs that stored
> ino+generation and uses then to open fd on lookup.
>
> I extended libfuse's test_syscalls [1] program to demonstrate the issue
> described in commit message.
>
> Max, IIUC, you are making a modification to virtiofs-rs that would
> result is being exposed to this bug.  You are welcome to try out the
> test and let me know if you can reproduce the issue.
>
> Note that some test_syscalls test fail with cache enabled, so libfuse's
> test_examples.py only runs test_syscalls in cache disabled config.
>
> Thanks,
> Amir.
>
> [1] https://github.com/amir73il/libfuse/commits/test-reused-inodes
>
>  fs/fuse/dir.c     | 3 ++-
>  fs/fuse/fuse_i.h  | 9 +++++++++
>  fs/fuse/inode.c   | 4 ++--
>  fs/fuse/readdir.c | 7 +++++--
>  4 files changed, 18 insertions(+), 5 deletions(-)
>
> diff --git a/fs/fuse/dir.c b/fs/fuse/dir.c
> index 1b6c001a7dd1..b06628fd7d8e 100644
> --- a/fs/fuse/dir.c
> +++ b/fs/fuse/dir.c
> @@ -239,7 +239,8 @@ static int fuse_dentry_revalidate(struct dentry *entry, unsigned int flags)
>                 if (!ret) {
>                         fi = get_fuse_inode(inode);
>                         if (outarg.nodeid != get_node_id(inode) ||
> -                           (bool) IS_AUTOMOUNT(inode) != (bool) (outarg.attr.flags & FUSE_ATTR_SUBMOUNT)) {
> +                           fuse_stale_inode(inode, outarg.generation,
> +                                            &outarg.attr)) {

This changes behavior in the inode_wrong_type() case.  I guess that
was not intended.


>                                 fuse_queue_forget(fm->fc, forget,
>                                                   outarg.nodeid, 1);
>                                 goto invalid;
> diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h
> index 7e463e220053..f1bd28c176a9 100644
> --- a/fs/fuse/fuse_i.h
> +++ b/fs/fuse/fuse_i.h
> @@ -867,6 +867,15 @@ static inline u64 fuse_get_attr_version(struct fuse_conn *fc)
>         return atomic64_read(&fc->attr_version);
>  }
>
> +static inline bool fuse_stale_inode(const struct inode *inode, int generation,
> +                                   struct fuse_attr *attr)
> +{
> +       return inode->i_generation != generation ||
> +               inode_wrong_type(inode, attr->mode) ||
> +               (bool) IS_AUTOMOUNT(inode) !=
> +               (bool) (attr->flags & FUSE_ATTR_SUBMOUNT);
> +}
> +
>  static inline void fuse_make_bad(struct inode *inode)
>  {
>         remove_inode_hash(inode);
> diff --git a/fs/fuse/inode.c b/fs/fuse/inode.c
> index 393e36b74dc4..257bb3e1cac8 100644
> --- a/fs/fuse/inode.c
> +++ b/fs/fuse/inode.c
> @@ -350,8 +350,8 @@ struct inode *fuse_iget(struct super_block *sb, u64 nodeid,
>                 inode->i_generation = generation;
>                 fuse_init_inode(inode, attr);
>                 unlock_new_inode(inode);
> -       } else if (inode_wrong_type(inode, attr->mode)) {
> -               /* Inode has changed type, any I/O on the old should fail */
> +       } else if (fuse_stale_inode(inode, generation, attr)) {

This one adds the automount check.  That should be okay, since
directories must never have aliases and such a beast would error out
anyway later on d_splice_alias.

> +               /* nodeid was reused, any I/O on the old inode should fail */
>                 fuse_make_bad(inode);
>                 iput(inode);
>                 goto retry;
> diff --git a/fs/fuse/readdir.c b/fs/fuse/readdir.c
> index 277f7041d55a..bc267832310c 100644
> --- a/fs/fuse/readdir.c
> +++ b/fs/fuse/readdir.c
> @@ -200,9 +200,12 @@ static int fuse_direntplus_link(struct file *file,
>         if (!d_in_lookup(dentry)) {
>                 struct fuse_inode *fi;
>                 inode = d_inode(dentry);
> +               if (inode && get_node_id(inode) != o->nodeid)
> +                       inode = NULL;
>                 if (!inode ||
> -                   get_node_id(inode) != o->nodeid ||
> -                   inode_wrong_type(inode, o->attr.mode)) {
> +                   fuse_stale_inode(inode, o->generation, &o->attr)) {

Again, the automount check is added. I'm not at all sure how automount and
readdirplus interact. This needs to be looked at (though it's
mostly a separate issue from this patch).

Thanks,
Miklos
Amir Goldstein June 21, 2021, 1:27 p.m. UTC | #13
On Mon, Jun 21, 2021 at 12:27 PM Miklos Szeredi <miklos@szeredi.hu> wrote:
>
> On Wed, 9 Jun 2021 at 20:12, Amir Goldstein <amir73il@gmail.com> wrote:
> >
> > Server responds to LOOKUP and other ops (READDIRPLUS/CREATE/MKNOD/...)
> > with outarg containing nodeid and generation.
> >
> > If a fuse inode is found in inode cache with the same nodeid but
> > different generation, the existing fuse inode should be unhashed and
> > marked "bad" and a new inode with the new generation should be hashed
> > instead.
> >
> > This can happen, for example, with passhrough fuse filesystem that
> > returns the real filesystem ino/generation on lookup and where real inode
> > numbers can get recycled due to real files being unlinked not via the fuse
> > passthrough filesystem.
> >
> > With current code, this situation will not be detected and an old fuse
> > dentry that used to point to an older generation real inode, can be used
> > to access a completely new inode, which should be accessed only via the
> > new dentry.
> >
> > Note that because the FORGET message carries the nodeid w/o generation,
> > the server should wait to get FORGET counts for the nlookup counts of
> > the old and reused inodes combined, before it can free the resources
> > associated to that nodeid.
> >
> > Link: https://lore.kernel.org/linux-fsdevel/CAOQ4uxgDMGUpK35huwqFYGH_idBB8S6eLiz85o0DDKOyDH4Syg@mail.gmail.com/
> > Signed-off-by: Amir Goldstein <amir73il@gmail.com>
> > ---
> >
> > Miklos,
> >
> > I was able to reproduce this issue with a passthrough fs that stored
> > ino+generation and uses then to open fd on lookup.
> >
> > I extended libfuse's test_syscalls [1] program to demonstrate the issue
> > described in commit message.
> >
> > Max, IIUC, you are making a modification to virtiofs-rs that would
> > result is being exposed to this bug.  You are welcome to try out the
> > test and let me know if you can reproduce the issue.
> >
> > Note that some test_syscalls test fail with cache enabled, so libfuse's
> > test_examples.py only runs test_syscalls in cache disabled config.
> >
> > Thanks,
> > Amir.
> >
> > [1] https://github.com/amir73il/libfuse/commits/test-reused-inodes
> >
> >  fs/fuse/dir.c     | 3 ++-
> >  fs/fuse/fuse_i.h  | 9 +++++++++
> >  fs/fuse/inode.c   | 4 ++--
> >  fs/fuse/readdir.c | 7 +++++--
> >  4 files changed, 18 insertions(+), 5 deletions(-)
> >
> > diff --git a/fs/fuse/dir.c b/fs/fuse/dir.c
> > index 1b6c001a7dd1..b06628fd7d8e 100644
> > --- a/fs/fuse/dir.c
> > +++ b/fs/fuse/dir.c
> > @@ -239,7 +239,8 @@ static int fuse_dentry_revalidate(struct dentry *entry, unsigned int flags)
> >                 if (!ret) {
> >                         fi = get_fuse_inode(inode);
> >                         if (outarg.nodeid != get_node_id(inode) ||
> > -                           (bool) IS_AUTOMOUNT(inode) != (bool) (outarg.attr.flags & FUSE_ATTR_SUBMOUNT)) {
> > +                           fuse_stale_inode(inode, outarg.generation,
> > +                                            &outarg.attr)) {
>
> This changes behavior in the inode_wrong_type() case.  I guess that
> was not intended.
>

Right. fixed in v2.

>
> >                                 fuse_queue_forget(fm->fc, forget,
> >                                                   outarg.nodeid, 1);
> >                                 goto invalid;
> > diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h
> > index 7e463e220053..f1bd28c176a9 100644
> > --- a/fs/fuse/fuse_i.h
> > +++ b/fs/fuse/fuse_i.h
> > @@ -867,6 +867,15 @@ static inline u64 fuse_get_attr_version(struct fuse_conn *fc)
> >         return atomic64_read(&fc->attr_version);
> >  }
> >
> > +static inline bool fuse_stale_inode(const struct inode *inode, int generation,
> > +                                   struct fuse_attr *attr)
> > +{
> > +       return inode->i_generation != generation ||
> > +               inode_wrong_type(inode, attr->mode) ||
> > +               (bool) IS_AUTOMOUNT(inode) !=
> > +               (bool) (attr->flags & FUSE_ATTR_SUBMOUNT);
> > +}
> > +
> >  static inline void fuse_make_bad(struct inode *inode)
> >  {
> >         remove_inode_hash(inode);
> > diff --git a/fs/fuse/inode.c b/fs/fuse/inode.c
> > index 393e36b74dc4..257bb3e1cac8 100644
> > --- a/fs/fuse/inode.c
> > +++ b/fs/fuse/inode.c
> > @@ -350,8 +350,8 @@ struct inode *fuse_iget(struct super_block *sb, u64 nodeid,
> >                 inode->i_generation = generation;
> >                 fuse_init_inode(inode, attr);
> >                 unlock_new_inode(inode);
> > -       } else if (inode_wrong_type(inode, attr->mode)) {
> > -               /* Inode has changed type, any I/O on the old should fail */
> > +       } else if (fuse_stale_inode(inode, generation, attr)) {
>
> This one adds the automount check.  That should be okay, since
> directories must never have aliases and such a beast would error out
> anyway later on d_splice_alias.
>
> > +               /* nodeid was reused, any I/O on the old inode should fail */
> >                 fuse_make_bad(inode);
> >                 iput(inode);
> >                 goto retry;
> > diff --git a/fs/fuse/readdir.c b/fs/fuse/readdir.c
> > index 277f7041d55a..bc267832310c 100644
> > --- a/fs/fuse/readdir.c
> > +++ b/fs/fuse/readdir.c
> > @@ -200,9 +200,12 @@ static int fuse_direntplus_link(struct file *file,
> >         if (!d_in_lookup(dentry)) {
> >                 struct fuse_inode *fi;
> >                 inode = d_inode(dentry);
> > +               if (inode && get_node_id(inode) != o->nodeid)
> > +                       inode = NULL;
> >                 if (!inode ||
> > -                   get_node_id(inode) != o->nodeid ||
> > -                   inode_wrong_type(inode, o->attr.mode)) {
> > +                   fuse_stale_inode(inode, o->generation, &o->attr)) {
>
> Again, the automount check is added. I'm not at all sure how automount and
> readdirplus interact. This needs to be looked at (though it's
> mostly a separate issue from this patch).
>

I removed the automount check completely from fuse_stale_inode()
for v2.
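
A userspace model of what the check reduces to once the automount
comparison is dropped, going only by the description in this thread.
This is not the v2 patch itself; the CachedInode class and the function
names are invented for the illustration.

```python
import stat
from dataclasses import dataclass


@dataclass
class CachedInode:
    nodeid: int
    generation: int
    mode: int


def wrong_type(cached, new_mode):
    # Only the file-type bits matter; permission bits may change freely.
    return stat.S_IFMT(cached.mode) != stat.S_IFMT(new_mode)


def is_stale(cached, new_generation, new_mode):
    """True if a reply for the same nodeid describes a different object."""
    return cached.generation != new_generation or wrong_type(cached, new_mode)
```

In this model a cached inode is treated as stale when the server
reports either a new generation or a different file type for the same
nodeid.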

Thanks,
Amir.

Patch

diff --git a/fs/fuse/dir.c b/fs/fuse/dir.c
index 1b6c001a7dd1..b06628fd7d8e 100644
--- a/fs/fuse/dir.c
+++ b/fs/fuse/dir.c
@@ -239,7 +239,8 @@  static int fuse_dentry_revalidate(struct dentry *entry, unsigned int flags)
 		if (!ret) {
 			fi = get_fuse_inode(inode);
 			if (outarg.nodeid != get_node_id(inode) ||
-			    (bool) IS_AUTOMOUNT(inode) != (bool) (outarg.attr.flags & FUSE_ATTR_SUBMOUNT)) {
+			    fuse_stale_inode(inode, outarg.generation,
+					     &outarg.attr)) {
 				fuse_queue_forget(fm->fc, forget,
 						  outarg.nodeid, 1);
 				goto invalid;
diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h
index 7e463e220053..f1bd28c176a9 100644
--- a/fs/fuse/fuse_i.h
+++ b/fs/fuse/fuse_i.h
@@ -867,6 +867,15 @@  static inline u64 fuse_get_attr_version(struct fuse_conn *fc)
 	return atomic64_read(&fc->attr_version);
 }
 
+static inline bool fuse_stale_inode(const struct inode *inode, int generation,
+				    struct fuse_attr *attr)
+{
+	return inode->i_generation != generation ||
+		inode_wrong_type(inode, attr->mode) ||
+		(bool) IS_AUTOMOUNT(inode) !=
+		(bool) (attr->flags & FUSE_ATTR_SUBMOUNT);
+}
+
 static inline void fuse_make_bad(struct inode *inode)
 {
 	remove_inode_hash(inode);
diff --git a/fs/fuse/inode.c b/fs/fuse/inode.c
index 393e36b74dc4..257bb3e1cac8 100644
--- a/fs/fuse/inode.c
+++ b/fs/fuse/inode.c
@@ -350,8 +350,8 @@  struct inode *fuse_iget(struct super_block *sb, u64 nodeid,
 		inode->i_generation = generation;
 		fuse_init_inode(inode, attr);
 		unlock_new_inode(inode);
-	} else if (inode_wrong_type(inode, attr->mode)) {
-		/* Inode has changed type, any I/O on the old should fail */
+	} else if (fuse_stale_inode(inode, generation, attr)) {
+		/* nodeid was reused, any I/O on the old inode should fail */
 		fuse_make_bad(inode);
 		iput(inode);
 		goto retry;
diff --git a/fs/fuse/readdir.c b/fs/fuse/readdir.c
index 277f7041d55a..bc267832310c 100644
--- a/fs/fuse/readdir.c
+++ b/fs/fuse/readdir.c
@@ -200,9 +200,12 @@  static int fuse_direntplus_link(struct file *file,
 	if (!d_in_lookup(dentry)) {
 		struct fuse_inode *fi;
 		inode = d_inode(dentry);
+		if (inode && get_node_id(inode) != o->nodeid)
+			inode = NULL;
 		if (!inode ||
-		    get_node_id(inode) != o->nodeid ||
-		    inode_wrong_type(inode, o->attr.mode)) {
+		    fuse_stale_inode(inode, o->generation, &o->attr)) {
+			if (inode)
+				fuse_make_bad(inode);
 			d_invalidate(dentry);
 			dput(dentry);
 			goto retry;