diff mbox series

knfsd: fix the fallback implementation of the get_name export operation

Message ID 20231228201510.985235-1-trondmy@kernel.org (mailing list archive)
State New
Headers show
Series knfsd: fix the fallback implementation of the get_name export operation | expand

Commit Message

Trond Myklebust Dec. 28, 2023, 8:15 p.m. UTC
From: Trond Myklebust <trond.myklebust@hammerspace.com>

The fallback implementation for the get_name export operation uses
readdir() to try to match the inode number to a filename. That filename
is then used together with lookup_one() to produce a dentry.
A problem arises when we match the '.' or '..' entries, since that
causes lookup_one() to fail. This has sometimes been seen to occur for
filesystems that violate POSIX requirements around uniqueness of inode
numbers, something that is common for snapshot directories.

This patch just ensures that we skip '.' and '..' rather than allowing a
match.

Fixes: 21d8a15ac333 ("lookup_one_len: don't accept . and ..")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
---
 fs/exportfs/expfs.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

Comments

Amir Goldstein Dec. 29, 2023, 5:46 a.m. UTC | #1
[CC: fsdevel, viro]

On Thu, Dec 28, 2023 at 10:22 PM <trondmy@kernel.org> wrote:
>
> From: Trond Myklebust <trond.myklebust@hammerspace.com>
>
> The fallback implementation for the get_name export operation uses
> readdir() to try to match the inode number to a filename. That filename
> is then used together with lookup_one() to produce a dentry.
> A problem arises when we match the '.' or '..' entries, since that
> causes lookup_one() to fail. This has sometimes been seen to occur for
> filesystems that violate POSIX requirements around uniqueness of inode
> numbers, something that is common for snapshot directories.

Ouch. Nasty.

Looks to me like the root cause is "filesystems that violate POSIX
requirements around uniqueness of inode numbers".
This violation can cause any of the parent's children to wrongly match
get_name() not only '.' and '..' and fail the d_inode sanity check after
lookup_one().

I understand why this would be common with parent of snapshot dir,
but the only fs that support snapshots that I know of (btrfs, bcachefs)
do implement ->get_name(), so which filesystem did you encounter
this behavior with? can it be fixed by implementing a snapshot
aware ->get_name()?

>
> This patch just ensures that we skip '.' and '..' rather than allowing a
> match.

I agree that skipping '.' and '..' makes sense, but...

>
> Fixes: 21d8a15ac333 ("lookup_one_len: don't accept . and ..")

...This Fixes is a bit odd to me.
Does the problem go away if the Fixes patch is reverted?
I don't think so, I think you would just hit the d_inode sanity check
after lookup_one() succeeds.
Maybe I did not understand the problem then.

> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
> ---
>  fs/exportfs/expfs.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/fs/exportfs/expfs.c b/fs/exportfs/expfs.c
> index 3ae0154c5680..84af58eaf2ca 100644
> --- a/fs/exportfs/expfs.c
> +++ b/fs/exportfs/expfs.c
> @@ -255,7 +255,9 @@ static bool filldir_one(struct dir_context *ctx, const char *name, int len,
>                 container_of(ctx, struct getdents_callback, ctx);
>
>         buf->sequence++;
> -       if (buf->ino == ino && len <= NAME_MAX) {
> +       /* Ignore the '.' and '..' entries */
> +       if ((len > 2 || name[0] != '.' || (len == 2 && name[1] != '.')) &&

I wish I did not have to review that this condition is correct.
I wish there was a common helper is_dot_or_dotdot() that would be
used here as !is_dot_dotdot(name, len).
I found 3 copies of is_dot_dotdot().
I didn't even try to find how many places have open coded this.

Thanks,
Amir.
Jeffrey Layton Dec. 29, 2023, 1:59 p.m. UTC | #2
On Thu, 2023-12-28 at 15:15 -0500, trondmy@kernel.org wrote:
> From: Trond Myklebust <trond.myklebust@hammerspace.com>
> 
> The fallback implementation for the get_name export operation uses
> readdir() to try to match the inode number to a filename. That filename
> is then used together with lookup_one() to produce a dentry.
> A problem arises when we match the '.' or '..' entries, since that
> causes lookup_one() to fail. This has sometimes been seen to occur for
> filesystems that violate POSIX requirements around uniqueness of inode
> numbers, something that is common for snapshot directories.
> 
> This patch just ensures that we skip '.' and '..' rather than allowing a
> match.
> 
> Fixes: 21d8a15ac333 ("lookup_one_len: don't accept . and ..")
> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
> ---
>  fs/exportfs/expfs.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/fs/exportfs/expfs.c b/fs/exportfs/expfs.c
> index 3ae0154c5680..84af58eaf2ca 100644
> --- a/fs/exportfs/expfs.c
> +++ b/fs/exportfs/expfs.c
> @@ -255,7 +255,9 @@ static bool filldir_one(struct dir_context *ctx, const char *name, int len,
>  		container_of(ctx, struct getdents_callback, ctx);
>  
>  	buf->sequence++;
> -	if (buf->ino == ino && len <= NAME_MAX) {
> +	/* Ignore the '.' and '..' entries */
> +	if ((len > 2 || name[0] != '.' || (len == 2 && name[1] != '.')) &&
> +	    buf->ino == ino && len <= NAME_MAX) {
>  		memcpy(buf->name, name, len);
>  		buf->name[len] = '\0';
>  		buf->found = 1;

Reviewed-by: Jeff Layton <jlayton@kernel.org>
Chuck Lever Dec. 29, 2023, 2:34 p.m. UTC | #3
On Fri, Dec 29, 2023 at 07:46:54AM +0200, Amir Goldstein wrote:
> [CC: fsdevel, viro]

Thanks for picking this up, Amir, and for copying viro/fsdevel. I
was planning to repost this next week when more folks are back, but
this works too.

Trond, if you'd like, I can handle review changes if you don't have
time to follow up.


> On Thu, Dec 28, 2023 at 10:22 PM <trondmy@kernel.org> wrote:
> >
> > From: Trond Myklebust <trond.myklebust@hammerspace.com>
> >
> > The fallback implementation for the get_name export operation uses
> > readdir() to try to match the inode number to a filename. That filename
> > is then used together with lookup_one() to produce a dentry.
> > A problem arises when we match the '.' or '..' entries, since that
> > causes lookup_one() to fail. This has sometimes been seen to occur for
> > filesystems that violate POSIX requirements around uniqueness of inode
> > numbers, something that is common for snapshot directories.
> 
> Ouch. Nasty.
> 
> Looks to me like the root cause is "filesystems that violate POSIX
> requirements around uniqueness of inode numbers".
> This violation can cause any of the parent's children to wrongly match
> get_name() not only '.' and '..' and fail the d_inode sanity check after
> lookup_one().
> 
> I understand why this would be common with parent of snapshot dir,
> but the only fs that support snapshots that I know of (btrfs, bcachefs)
> do implement ->get_name(), so which filesystem did you encounter
> this behavior with? can it be fixed by implementing a snapshot
> aware ->get_name()?
> 
> > This patch just ensures that we skip '.' and '..' rather than allowing a
> > match.
> 
> I agree that skipping '.' and '..' makes sense, but...

Does skipping '.' and '..' make sense for file systems that do
indeed guarantee inode number uniqueness? Given your explanation
here, I'm wondering whether the generic get_name() function is the
right place to address the issue.


> > Fixes: 21d8a15ac333 ("lookup_one_len: don't accept . and ..")
> 
> ...This Fixes is a bit odd to me.

Me too, but I didn't see a more obvious choice. Maybe drop the
specific Fixes: tag in favor of just Cc: stable.


> Does the problem go away if the Fixes patch is reverted?
> I don't think so, I think you would just hit the d_inode sanity check
> after lookup_one() succeeds.
> Maybe I did not understand the problem then.
> 
> > Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
> > ---
> >  fs/exportfs/expfs.c | 4 +++-
> >  1 file changed, 3 insertions(+), 1 deletion(-)
> >
> > diff --git a/fs/exportfs/expfs.c b/fs/exportfs/expfs.c
> > index 3ae0154c5680..84af58eaf2ca 100644
> > --- a/fs/exportfs/expfs.c
> > +++ b/fs/exportfs/expfs.c
> > @@ -255,7 +255,9 @@ static bool filldir_one(struct dir_context *ctx, const char *name, int len,
> >                 container_of(ctx, struct getdents_callback, ctx);
> >
> >         buf->sequence++;
> > -       if (buf->ino == ino && len <= NAME_MAX) {
> > +       /* Ignore the '.' and '..' entries */
> > +       if ((len > 2 || name[0] != '.' || (len == 2 && name[1] != '.')) &&
> 
> I wish I did not have to review that this condition is correct.
> I wish there was a common helper is_dot_or_dotdot() that would be
> used here as !is_dot_dotdot(name, len).
> I found 3 copies of is_dot_dotdot().
> I didn't even try to find how many places have open coded this.
Trond Myklebust Dec. 29, 2023, 3:21 p.m. UTC | #4
On Fri, 2023-12-29 at 07:46 +0200, Amir Goldstein wrote:
> [CC: fsdevel, viro]
> 
> On Thu, Dec 28, 2023 at 10:22 PM <trondmy@kernel.org> wrote:
> > 
> > From: Trond Myklebust <trond.myklebust@hammerspace.com>
> > 
> > The fallback implementation for the get_name export operation uses
> > readdir() to try to match the inode number to a filename. That
> > filename
> > is then used together with lookup_one() to produce a dentry.
> > A problem arises when we match the '.' or '..' entries, since that
> > causes lookup_one() to fail. This has sometimes been seen to occur
> > for
> > filesystems that violate POSIX requirements around uniqueness of
> > inode
> > numbers, something that is common for snapshot directories.
> 
> Ouch. Nasty.
> 
> Looks to me like the root cause is "filesystems that violate POSIX
> requirements around uniqueness of inode numbers".
> This violation can cause any of the parent's children to wrongly
> match
> get_name() not only '.' and '..' and fail the d_inode sanity check
> after
> lookup_one().
> 
> I understand why this would be common with parent of snapshot dir,
> but the only fs that support snapshots that I know of (btrfs,
> bcachefs)
> do implement ->get_name(), so which filesystem did you encounter
> this behavior with? can it be fixed by implementing a snapshot
> aware ->get_name()?

NFS (i.e. re-exporting NFS).

Why do you not want a fix in the generic code?

> 
> > 
> > This patch just ensures that we skip '.' and '..' rather than
> > allowing a
> > match.
> 
> I agree that skipping '.' and '..' makes sense, but...
> 
> > 
> > Fixes: 21d8a15ac333 ("lookup_one_len: don't accept . and ..")
> 
> ...This Fixes is a bit odd to me.
> Does the problem go away if the Fixes patch is reverted?
> I don't think so, I think you would just hit the d_inode sanity check
> after lookup_one() succeeds.
> Maybe I did not understand the problem then.
> 
> > Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
> > ---
> >  fs/exportfs/expfs.c | 4 +++-
> >  1 file changed, 3 insertions(+), 1 deletion(-)
> > 
> > diff --git a/fs/exportfs/expfs.c b/fs/exportfs/expfs.c
> > index 3ae0154c5680..84af58eaf2ca 100644
> > --- a/fs/exportfs/expfs.c
> > +++ b/fs/exportfs/expfs.c
> > @@ -255,7 +255,9 @@ static bool filldir_one(struct dir_context
> > *ctx, const char *name, int len,
> >                 container_of(ctx, struct getdents_callback, ctx);
> > 
> >         buf->sequence++;
> > -       if (buf->ino == ino && len <= NAME_MAX) {
> > +       /* Ignore the '.' and '..' entries */
> > +       if ((len > 2 || name[0] != '.' || (len == 2 && name[1] !=
> > '.')) &&
> 
> I wish I did not have to review that this condition is correct.
> I wish there was a common helper is_dot_or_dotdot() that would be
> used here as !is_dot_dotdot(name, len).
> I found 3 copies of is_dot_dotdot().
> I didn't even try to find how many places have open coded this.
> 
> Thanks,
> Amir.
>
Amir Goldstein Dec. 29, 2023, 5:44 p.m. UTC | #5
On Fri, Dec 29, 2023 at 4:35 PM Chuck Lever <chuck.lever@oracle.com> wrote:
>
> On Fri, Dec 29, 2023 at 07:46:54AM +0200, Amir Goldstein wrote:
> > [CC: fsdevel, viro]
>
> Thanks for picking this up, Amir, and for copying viro/fsdevel. I
> was planning to repost this next week when more folks are back, but
> this works too.
>
> Trond, if you'd like, I can handle review changes if you don't have
> time to follow up.
>
>
> > On Thu, Dec 28, 2023 at 10:22 PM <trondmy@kernel.org> wrote:
> > >
> > > From: Trond Myklebust <trond.myklebust@hammerspace.com>
> > >
> > > The fallback implementation for the get_name export operation uses
> > > readdir() to try to match the inode number to a filename. That filename
> > > is then used together with lookup_one() to produce a dentry.
> > > A problem arises when we match the '.' or '..' entries, since that
> > > causes lookup_one() to fail. This has sometimes been seen to occur for
> > > filesystems that violate POSIX requirements around uniqueness of inode
> > > numbers, something that is common for snapshot directories.
> >
> > Ouch. Nasty.
> >
> > Looks to me like the root cause is "filesystems that violate POSIX
> > requirements around uniqueness of inode numbers".
> > This violation can cause any of the parent's children to wrongly match
> > get_name() not only '.' and '..' and fail the d_inode sanity check after
> > lookup_one().
> >
> > I understand why this would be common with parent of snapshot dir,
> > but the only fs that support snapshots that I know of (btrfs, bcachefs)
> > do implement ->get_name(), so which filesystem did you encounter
> > this behavior with? can it be fixed by implementing a snapshot
> > aware ->get_name()?
> >
> > > This patch just ensures that we skip '.' and '..' rather than allowing a
> > > match.
> >
> > I agree that skipping '.' and '..' makes sense, but...
>
> Does skipping '.' and '..' make sense for file systems that do

It makes sense because if the child's name in its parent would
have been "." or ".." it would have been its own parent or its own
grandparent (ELOOP situation).
IOW, we can safely skip "." and "..", regardless of anything else.

> indeed guarantee inode number uniqueness? Given your explanation
> here, I'm wondering whether the generic get_name() function is the
> right place to address the issue.
>
>
> > > Fixes: 21d8a15ac333 ("lookup_one_len: don't accept . and ..")
> >
> > ...This Fixes is a bit odd to me.
>
> Me too, but I didn't see a more obvious choice. Maybe drop the
> specific Fixes: tag in favor of just Cc: stable.
>

Yeh, that'd be fine.

Thanks,
Amir.
Amir Goldstein Dec. 29, 2023, 5:54 p.m. UTC | #6
On Fri, Dec 29, 2023 at 5:22 PM Trond Myklebust <trondmy@hammerspace.com> wrote:
>
> On Fri, 2023-12-29 at 07:46 +0200, Amir Goldstein wrote:
> > [CC: fsdevel, viro]
> >
> > On Thu, Dec 28, 2023 at 10:22 PM <trondmy@kernel.org> wrote:
> > >
> > > From: Trond Myklebust <trond.myklebust@hammerspace.com>
> > >
> > > The fallback implementation for the get_name export operation uses
> > > readdir() to try to match the inode number to a filename. That
> > > filename
> > > is then used together with lookup_one() to produce a dentry.
> > > A problem arises when we match the '.' or '..' entries, since that
> > > causes lookup_one() to fail. This has sometimes been seen to occur
> > > for
> > > filesystems that violate POSIX requirements around uniqueness of
> > > inode
> > > numbers, something that is common for snapshot directories.
> >
> > Ouch. Nasty.
> >
> > Looks to me like the root cause is "filesystems that violate POSIX
> > requirements around uniqueness of inode numbers".
> > This violation can cause any of the parent's children to wrongly
> > match
> > get_name() not only '.' and '..' and fail the d_inode sanity check
> > after
> > lookup_one().
> >
> > I understand why this would be common with parent of snapshot dir,
> > but the only fs that support snapshots that I know of (btrfs,
> > bcachefs)
> > do implement ->get_name(), so which filesystem did you encounter
> > this behavior with? can it be fixed by implementing a snapshot
> > aware ->get_name()?
>
> NFS (i.e. re-exporting NFS).
>

Ah.

> Why do you not want a fix in the generic code?
>

I do not object to your fix at all.
I only objected to the Fixes tag.
I was just pointing out that this is not a complete solution.
A decode of an NFS (re-exported) file handle could fail if
get_name() iterates the parent of a snapshot root dir
and finds a false match (which is not "." nor "..") before it
finds the snapshot subdir name.

It may be solved by nfs_get_name() which does not stop when
if finds an ino match but checks further, but I don't know nfs re-export
to know what else could be checked.

Anyway, for this patch, without the Fixes tag, feel free to add:
Acked-by: Amir Goldstein <amir73il@gmail.com>

I'd prefer the use of is_dot_dotdot(), but I do not insist.

Thanks and a happy new year!
Amir.
Chuck Lever Dec. 29, 2023, 11:29 p.m. UTC | #7
On Fri, Dec 29, 2023 at 07:44:20PM +0200, Amir Goldstein wrote:
> On Fri, Dec 29, 2023 at 4:35 PM Chuck Lever <chuck.lever@oracle.com> wrote:
> >
> > On Fri, Dec 29, 2023 at 07:46:54AM +0200, Amir Goldstein wrote:
> > > [CC: fsdevel, viro]
> >
> > Thanks for picking this up, Amir, and for copying viro/fsdevel. I
> > was planning to repost this next week when more folks are back, but
> > this works too.
> >
> > Trond, if you'd like, I can handle review changes if you don't have
> > time to follow up.
> >
> >
> > > On Thu, Dec 28, 2023 at 10:22 PM <trondmy@kernel.org> wrote:
> > > >
> > > > From: Trond Myklebust <trond.myklebust@hammerspace.com>
> > > >
> > > > The fallback implementation for the get_name export operation uses
> > > > readdir() to try to match the inode number to a filename. That filename
> > > > is then used together with lookup_one() to produce a dentry.
> > > > A problem arises when we match the '.' or '..' entries, since that
> > > > causes lookup_one() to fail. This has sometimes been seen to occur for
> > > > filesystems that violate POSIX requirements around uniqueness of inode
> > > > numbers, something that is common for snapshot directories.
> > >
> > > Ouch. Nasty.
> > >
> > > Looks to me like the root cause is "filesystems that violate POSIX
> > > requirements around uniqueness of inode numbers".
> > > This violation can cause any of the parent's children to wrongly match
> > > get_name() not only '.' and '..' and fail the d_inode sanity check after
> > > lookup_one().
> > >
> > > I understand why this would be common with parent of snapshot dir,
> > > but the only fs that support snapshots that I know of (btrfs, bcachefs)
> > > do implement ->get_name(), so which filesystem did you encounter
> > > this behavior with? can it be fixed by implementing a snapshot
> > > aware ->get_name()?
> > >
> > > > This patch just ensures that we skip '.' and '..' rather than allowing a
> > > > match.
> > >
> > > I agree that skipping '.' and '..' makes sense, but...
> >
> > Does skipping '.' and '..' make sense for file systems that do
> 
> It makes sense because if the child's name in its parent would
> have been "." or ".." it would have been its own parent or its own
> grandparent (ELOOP situation).
> IOW, we can safely skip "." and "..", regardless of anything else.

This new comment:

+	/* Ignore the '.' and '..' entries */

then seems inadequate to explain why dot and dot-dot are now never
matched. Perhaps the function's documenting comment could expand on
this a little. I'll give it some thought.


> > indeed guarantee inode number uniqueness? Given your explanation
> > here, I'm wondering whether the generic get_name() function is the
> > right place to address the issue.
Trond Myklebust Dec. 29, 2023, 11:49 p.m. UTC | #8
On Fri, 2023-12-29 at 18:29 -0500, Chuck Lever wrote:
> On Fri, Dec 29, 2023 at 07:44:20PM +0200, Amir Goldstein wrote:
> > On Fri, Dec 29, 2023 at 4:35 PM Chuck Lever
> > <chuck.lever@oracle.com> wrote:
> > > 
> > > On Fri, Dec 29, 2023 at 07:46:54AM +0200, Amir Goldstein wrote:
> > > > [CC: fsdevel, viro]
> > > 
> > > Thanks for picking this up, Amir, and for copying viro/fsdevel. I
> > > was planning to repost this next week when more folks are back,
> > > but
> > > this works too.
> > > 
> > > Trond, if you'd like, I can handle review changes if you don't
> > > have
> > > time to follow up.
> > > 
> > > 
> > > > On Thu, Dec 28, 2023 at 10:22 PM <trondmy@kernel.org> wrote:
> > > > > 
> > > > > From: Trond Myklebust <trond.myklebust@hammerspace.com>
> > > > > 
> > > > > The fallback implementation for the get_name export operation
> > > > > uses
> > > > > readdir() to try to match the inode number to a filename.
> > > > > That filename
> > > > > is then used together with lookup_one() to produce a dentry.
> > > > > A problem arises when we match the '.' or '..' entries, since
> > > > > that
> > > > > causes lookup_one() to fail. This has sometimes been seen to
> > > > > occur for
> > > > > filesystems that violate POSIX requirements around uniqueness
> > > > > of inode
> > > > > numbers, something that is common for snapshot directories.
> > > > 
> > > > Ouch. Nasty.
> > > > 
> > > > Looks to me like the root cause is "filesystems that violate
> > > > POSIX
> > > > requirements around uniqueness of inode numbers".
> > > > This violation can cause any of the parent's children to
> > > > wrongly match
> > > > get_name() not only '.' and '..' and fail the d_inode sanity
> > > > check after
> > > > lookup_one().
> > > > 
> > > > I understand why this would be common with parent of snapshot
> > > > dir,
> > > > but the only fs that support snapshots that I know of (btrfs,
> > > > bcachefs)
> > > > do implement ->get_name(), so which filesystem did you
> > > > encounter
> > > > this behavior with? can it be fixed by implementing a snapshot
> > > > aware ->get_name()?
> > > > 
> > > > > This patch just ensures that we skip '.' and '..' rather than
> > > > > allowing a
> > > > > match.
> > > > 
> > > > I agree that skipping '.' and '..' makes sense, but...
> > > 
> > > Does skipping '.' and '..' make sense for file systems that do
> > 
> > It makes sense because if the child's name in its parent would
> > have been "." or ".." it would have been its own parent or its own
> > grandparent (ELOOP situation).
> > IOW, we can safely skip "." and "..", regardless of anything else.
> 
> This new comment:
> 
> +	/* Ignore the '.' and '..' entries */
> 
> then seems inadequate to explain why dot and dot-dot are now never
> matched. Perhaps the function's documenting comment could expand on
> this a little. I'll give it some thought.

The point of this code is to attempt to create a valid path that
connects the inode found by the filehandle to the export point. The
readdir() must determine a valid name for a dentry that is a component
of that path, which is why '.' and '..' can never be acceptable.

This is why I think we should keep the 'Fixes:' line. The commit it
points to explains quite concisely why this patch is needed.

> 
> 
> > > indeed guarantee inode number uniqueness? Given your explanation
> > > here, I'm wondering whether the generic get_name() function is
> > > the
> > > right place to address the issue.
>
Amir Goldstein Dec. 30, 2023, 6:23 a.m. UTC | #9
On Sat, Dec 30, 2023 at 1:50 AM Trond Myklebust <trondmy@hammerspace.com> wrote:
>
> On Fri, 2023-12-29 at 18:29 -0500, Chuck Lever wrote:
> > On Fri, Dec 29, 2023 at 07:44:20PM +0200, Amir Goldstein wrote:
> > > On Fri, Dec 29, 2023 at 4:35 PM Chuck Lever
> > > <chuck.lever@oracle.com> wrote:
> > > >
> > > > On Fri, Dec 29, 2023 at 07:46:54AM +0200, Amir Goldstein wrote:
> > > > > [CC: fsdevel, viro]
> > > >
> > > > Thanks for picking this up, Amir, and for copying viro/fsdevel. I
> > > > was planning to repost this next week when more folks are back,
> > > > but
> > > > this works too.
> > > >
> > > > Trond, if you'd like, I can handle review changes if you don't
> > > > have
> > > > time to follow up.
> > > >
> > > >
> > > > > On Thu, Dec 28, 2023 at 10:22 PM <trondmy@kernel.org> wrote:
> > > > > >
> > > > > > From: Trond Myklebust <trond.myklebust@hammerspace.com>
> > > > > >
> > > > > > The fallback implementation for the get_name export operation
> > > > > > uses
> > > > > > readdir() to try to match the inode number to a filename.
> > > > > > That filename
> > > > > > is then used together with lookup_one() to produce a dentry.
> > > > > > A problem arises when we match the '.' or '..' entries, since
> > > > > > that
> > > > > > causes lookup_one() to fail. This has sometimes been seen to
> > > > > > occur for
> > > > > > filesystems that violate POSIX requirements around uniqueness
> > > > > > of inode
> > > > > > numbers, something that is common for snapshot directories.
> > > > >
> > > > > Ouch. Nasty.
> > > > >
> > > > > Looks to me like the root cause is "filesystems that violate
> > > > > POSIX
> > > > > requirements around uniqueness of inode numbers".
> > > > > This violation can cause any of the parent's children to
> > > > > wrongly match
> > > > > get_name() not only '.' and '..' and fail the d_inode sanity
> > > > > check after
> > > > > lookup_one().
> > > > >
> > > > > I understand why this would be common with parent of snapshot
> > > > > dir,
> > > > > but the only fs that support snapshots that I know of (btrfs,
> > > > > bcachefs)
> > > > > do implement ->get_name(), so which filesystem did you
> > > > > encounter
> > > > > this behavior with? can it be fixed by implementing a snapshot
> > > > > aware ->get_name()?
> > > > >
> > > > > > This patch just ensures that we skip '.' and '..' rather than
> > > > > > allowing a
> > > > > > match.
> > > > >
> > > > > I agree that skipping '.' and '..' makes sense, but...
> > > >
> > > > Does skipping '.' and '..' make sense for file systems that do
> > >
> > > It makes sense because if the child's name in its parent would
> > > have been "." or ".." it would have been its own parent or its own
> > > grandparent (ELOOP situation).
> > > IOW, we can safely skip "." and "..", regardless of anything else.
> >
> > This new comment:
> >
> > +     /* Ignore the '.' and '..' entries */
> >
> > then seems inadequate to explain why dot and dot-dot are now never
> > matched. Perhaps the function's documenting comment could expand on
> > this a little. I'll give it some thought.
>
> The point of this code is to attempt to create a valid path that
> connects the inode found by the filehandle to the export point. The
> readdir() must determine a valid name for a dentry that is a component
> of that path, which is why '.' and '..' can never be acceptable.
>
> This is why I think we should keep the 'Fixes:' line. The commit it
> points to explains quite concisely why this patch is needed.
>

By all means, mention this commit, just not with a fixed tag please.
IIUC, commit 21d8a15ac333 did not introduce a regression that this
patch fixes. Right?
So why insist on abusing Fixes: tag instead of a mention?

Thanks,
Amir.
Trond Myklebust Dec. 30, 2023, 7:36 p.m. UTC | #10
On Sat, 2023-12-30 at 08:23 +0200, Amir Goldstein wrote:
> On Sat, Dec 30, 2023 at 1:50 AM Trond Myklebust
> <trondmy@hammerspace.com> wrote:
> > 
> > On Fri, 2023-12-29 at 18:29 -0500, Chuck Lever wrote:
> > > On Fri, Dec 29, 2023 at 07:44:20PM +0200, Amir Goldstein wrote:
> > > > On Fri, Dec 29, 2023 at 4:35 PM Chuck Lever
> > > > <chuck.lever@oracle.com> wrote:
> > > > > 
> > > > > On Fri, Dec 29, 2023 at 07:46:54AM +0200, Amir Goldstein
> > > > > wrote:
> > > > > > [CC: fsdevel, viro]
> > > > > 
> > > > > Thanks for picking this up, Amir, and for copying
> > > > > viro/fsdevel. I
> > > > > was planning to repost this next week when more folks are
> > > > > back,
> > > > > but
> > > > > this works too.
> > > > > 
> > > > > Trond, if you'd like, I can handle review changes if you
> > > > > don't
> > > > > have
> > > > > time to follow up.
> > > > > 
> > > > > 
> > > > > > On Thu, Dec 28, 2023 at 10:22 PM <trondmy@kernel.org>
> > > > > > wrote:
> > > > > > > 
> > > > > > > From: Trond Myklebust <trond.myklebust@hammerspace.com>
> > > > > > > 
> > > > > > > The fallback implementation for the get_name export
> > > > > > > operation
> > > > > > > uses
> > > > > > > readdir() to try to match the inode number to a filename.
> > > > > > > That filename
> > > > > > > is then used together with lookup_one() to produce a
> > > > > > > dentry.
> > > > > > > A problem arises when we match the '.' or '..' entries,
> > > > > > > since
> > > > > > > that
> > > > > > > causes lookup_one() to fail. This has sometimes been seen
> > > > > > > to
> > > > > > > occur for
> > > > > > > filesystems that violate POSIX requirements around
> > > > > > > uniqueness
> > > > > > > of inode
> > > > > > > numbers, something that is common for snapshot
> > > > > > > directories.
> > > > > > 
> > > > > > Ouch. Nasty.
> > > > > > 
> > > > > > Looks to me like the root cause is "filesystems that
> > > > > > violate
> > > > > > POSIX
> > > > > > requirements around uniqueness of inode numbers".
> > > > > > This violation can cause any of the parent's children to
> > > > > > wrongly match
> > > > > > get_name() not only '.' and '..' and fail the d_inode
> > > > > > sanity
> > > > > > check after
> > > > > > lookup_one().
> > > > > > 
> > > > > > I understand why this would be common with parent of
> > > > > > snapshot
> > > > > > dir,
> > > > > > but the only fs that support snapshots that I know of
> > > > > > (btrfs,
> > > > > > bcachefs)
> > > > > > do implement ->get_name(), so which filesystem did you
> > > > > > encounter
> > > > > > this behavior with? can it be fixed by implementing a
> > > > > > snapshot
> > > > > > aware ->get_name()?
> > > > > > 
> > > > > > > This patch just ensures that we skip '.' and '..' rather
> > > > > > > than
> > > > > > > allowing a
> > > > > > > match.
> > > > > > 
> > > > > > I agree that skipping '.' and '..' makes sense, but...
> > > > > 
> > > > > Does skipping '.' and '..' make sense for file systems that
> > > > > do
> > > > 
> > > > It makes sense because if the child's name in its parent would
> > > > have been "." or ".." it would have been its own parent or its
> > > > own
> > > > grandparent (ELOOP situation).
> > > > IOW, we can safely skip "." and "..", regardless of anything
> > > > else.
> > > 
> > > This new comment:
> > > 
> > > +     /* Ignore the '.' and '..' entries */
> > > 
> > > then seems inadequate to explain why dot and dot-dot are now
> > > never
> > > matched. Perhaps the function's documenting comment could expand
> > > on
> > > this a little. I'll give it some thought.
> > 
> > The point of this code is to attempt to create a valid path that
> > connects the inode found by the filehandle to the export point. The
> > readdir() must determine a valid name for a dentry that is a
> > component
> > of that path, which is why '.' and '..' can never be acceptable.
> > 
> > This is why I think we should keep the 'Fixes:' line. The commit it
> > points to explains quite concisely why this patch is needed.
> > 
> 
> By all means, mention this commit, just not with a fixed tag please.
> IIUC, commit 21d8a15ac333 did not introduce a regression that this
> patch fixes. Right?
> So why insist on abusing Fixes: tag instead of a mention?

I don't see it as being that straightforward.

Prior to commit 21d8a15ac333, the call to lookup_one_len() could return
a dentry (albeit one with an invalid name) depending on whether or not
the filesystem lookup succeeds. Note that knfsd does support a lookup
of "." and "..", as do several other NFS servers.

With commit 21d8a15ac333 applied, however, lookup_one_len()
automatically returns an EACCES error.

So while I agree that there are good reasons for introducing commit
21d8a15ac333, it does change the behaviour in this code path.
Amir Goldstein Dec. 31, 2023, 10:44 a.m. UTC | #11
On Sat, Dec 30, 2023 at 9:36 PM Trond Myklebust <trondmy@hammerspace.com> wrote:
>
> On Sat, 2023-12-30 at 08:23 +0200, Amir Goldstein wrote:
> > On Sat, Dec 30, 2023 at 1:50 AM Trond Myklebust
> > <trondmy@hammerspace.com> wrote:
> > >
> > > On Fri, 2023-12-29 at 18:29 -0500, Chuck Lever wrote:
> > > > On Fri, Dec 29, 2023 at 07:44:20PM +0200, Amir Goldstein wrote:
> > > > > On Fri, Dec 29, 2023 at 4:35 PM Chuck Lever
> > > > > <chuck.lever@oracle.com> wrote:
> > > > > >
> > > > > > On Fri, Dec 29, 2023 at 07:46:54AM +0200, Amir Goldstein
> > > > > > wrote:
> > > > > > > [CC: fsdevel, viro]
> > > > > >
> > > > > > Thanks for picking this up, Amir, and for copying
> > > > > > viro/fsdevel. I
> > > > > > was planning to repost this next week when more folks are
> > > > > > back,
> > > > > > but
> > > > > > this works too.
> > > > > >
> > > > > > Trond, if you'd like, I can handle review changes if you
> > > > > > don't
> > > > > > have
> > > > > > time to follow up.
> > > > > >
> > > > > >
> > > > > > > On Thu, Dec 28, 2023 at 10:22 PM <trondmy@kernel.org>
> > > > > > > wrote:
> > > > > > > >
> > > > > > > > From: Trond Myklebust <trond.myklebust@hammerspace.com>
> > > > > > > >
> > > > > > > > The fallback implementation for the get_name export
> > > > > > > > operation
> > > > > > > > uses
> > > > > > > > readdir() to try to match the inode number to a filename.
> > > > > > > > That filename
> > > > > > > > is then used together with lookup_one() to produce a
> > > > > > > > dentry.
> > > > > > > > A problem arises when we match the '.' or '..' entries,
> > > > > > > > since
> > > > > > > > that
> > > > > > > > causes lookup_one() to fail. This has sometimes been seen
> > > > > > > > to
> > > > > > > > occur for
> > > > > > > > filesystems that violate POSIX requirements around
> > > > > > > > uniqueness
> > > > > > > > of inode
> > > > > > > > numbers, something that is common for snapshot
> > > > > > > > directories.
> > > > > > >
> > > > > > > Ouch. Nasty.
> > > > > > >
> > > > > > > Looks to me like the root cause is "filesystems that
> > > > > > > violate
> > > > > > > POSIX
> > > > > > > requirements around uniqueness of inode numbers".
> > > > > > > This violation can cause any of the parent's children to
> > > > > > > wrongly match
> > > > > > > get_name() not only '.' and '..' and fail the d_inode
> > > > > > > sanity
> > > > > > > check after
> > > > > > > lookup_one().
> > > > > > >
> > > > > > > I understand why this would be common with parent of
> > > > > > > snapshot
> > > > > > > dir,
> > > > > > > but the only fs that support snapshots that I know of
> > > > > > > (btrfs,
> > > > > > > bcachefs)
> > > > > > > do implement ->get_name(), so which filesystem did you
> > > > > > > encounter
> > > > > > > this behavior with? can it be fixed by implementing a
> > > > > > > snapshot
> > > > > > > aware ->get_name()?
> > > > > > >
> > > > > > > > This patch just ensures that we skip '.' and '..' rather
> > > > > > > > than
> > > > > > > > allowing a
> > > > > > > > match.
> > > > > > >
> > > > > > > I agree that skipping '.' and '..' makes sense, but...
> > > > > >
> > > > > > Does skipping '.' and '..' make sense for file systems that
> > > > > > do
> > > > >
> > > > > It makes sense because if the child's name in its parent would
> > > > > have been "." or ".." it would have been its own parent or its
> > > > > own
> > > > > grandparent (ELOOP situation).
> > > > > IOW, we can safely skip "." and "..", regardless of anything
> > > > > else.
> > > >
> > > > This new comment:
> > > >
> > > > +     /* Ignore the '.' and '..' entries */
> > > >
> > > > then seems inadequate to explain why dot and dot-dot are now
> > > > never
> > > > matched. Perhaps the function's documenting comment could expand
> > > > on
> > > > this a little. I'll give it some thought.
> > >
> > > The point of this code is to attempt to create a valid path that
> > > connects the inode found by the filehandle to the export point. The
> > > readdir() must determine a valid name for a dentry that is a
> > > component
> > > of that path, which is why '.' and '..' can never be acceptable.
> > >
> > > This is why I think we should keep the 'Fixes:' line. The commit it
> > > points to explains quite concisely why this patch is needed.
> > >
> >
> > By all means, mention this commit, just not with a fixed tag please.
> > IIUC, commit 21d8a15ac333 did not introduce a regression that this
> > patch fixes. Right?
> > So why insist on abusing Fixes: tag instead of a mention?
>
> I don't see it as being that straightforward.
>
> Prior to commit 21d8a15ac333, the call to lookup_one_len() could return
> a dentry (albeit one with an invalid name) depending on whether or not
> the filesystem lookup succeeds. Note that knfsd does support a lookup
> of "." and "..", as do several other NFS servers.
>
> With commit 21d8a15ac333 applied, however, lookup_one_len()
> automatically returns an EACCES error.
>
> So while I agree that there are good reasons for introducing commit
> 21d8a15ac333, it does change the behaviour in this code path.
>

I feel that we are miscommunicating.
Let me explain how I understand the code and please tell me where I am wrong.

The way I see it, before 21d8a15ac333, exportfs_decode_fh_raw() would
call lookup_one() and may get a dentry (with invalid name), but then the
sanity check following lookup_one() would surely fail, because no fs should
allow a directory to be its own parent/grandparent:

                        if (unlikely(nresult->d_inode != result->d_inode)) {
                                dput(nresult);
                                nresult = ERR_PTR(-ESTALE);
                        }

The way I see it, the only thing that commit 21d8a15ac333 changed in
this code is the return value of exportfs_decode_fh_raw() from -ESTALE
to -EACCES.

exportfs_decode_fh() converts both these errors to -ESTALE and
so does nfsd_set_fh_dentry().

Bottom line, if I am reading the code correctly, commit 21d8a15ac333 did
not change the behaviour for knfsd nor any user visible behavior for
open_by_handle_at() for userspace nfsd.

Your fix is good because:
1. It saves an unneeded call to lookup_one()
2. skipping "." and ".." increases the chance of finding the correct child
    name in the case of non-unique ino

So I have no objection to your fix in generic code, but I do not see
it being a regression fix.

Where are we miscommunicating? What am I missing?

Thanks,
Amir.
diff mbox series

Patch

diff --git a/fs/exportfs/expfs.c b/fs/exportfs/expfs.c
index 3ae0154c5680..84af58eaf2ca 100644
--- a/fs/exportfs/expfs.c
+++ b/fs/exportfs/expfs.c
@@ -255,7 +255,9 @@  static bool filldir_one(struct dir_context *ctx, const char *name, int len,
 		container_of(ctx, struct getdents_callback, ctx);
 
 	buf->sequence++;
-	if (buf->ino == ino && len <= NAME_MAX) {
+	/* Ignore the '.' and '..' entries */
+	if ((len > 2 || name[0] != '.' || (len == 2 && name[1] != '.')) &&
+	    buf->ino == ino && len <= NAME_MAX) {
 		memcpy(buf->name, name, len);
 		buf->name[len] = '\0';
 		buf->found = 1;