
namei: results of d_is_negative() should be checked after dentry revalidation

Message ID 1444412674-3077-1-git-send-email-trond.myklebust@primarydata.com
State New, archived

Commit Message

Trond Myklebust Oct. 9, 2015, 5:44 p.m. UTC
Leandro Awa writes:
After switching to version 4.1.6, our parallelized and distributed workflows now fail consistently with errors of the form:

T34: ./regex.c:39:22: error: config.h: No such file or directory

From our 'git bisect' testing, the following commit appears to be
the likely cause of the behavior we've been seeing: commit 766c4cbfacd8

The issue is that revalidation may cause the dentry to be dropped in NFS
if, say, the client notes that the directory timestamps have changed.

Reported-by: Leandro Awa <lawa@nvidia.com>
Link: https://bugzilla.kernel.org/show_bug.cgi?id=104911
Fixes: 766c4cbfacd8 ("namei: d_is_negative() should be checked...")
Tested-by: Leandro Awa <lawa@nvidia.com>
Cc: stable@vger.kernel.org # v4.1+
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
---
 fs/namei.c | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

Comments

Linus Torvalds Oct. 10, 2015, 12:19 a.m. UTC | #1
On Fri, Oct 9, 2015 at 10:44 AM, Trond Myklebust
<trond.myklebust@primarydata.com> wrote:
>
> The issue is that revalidation may cause the dentry to be dropped in NFS
> if, say, the client notes that the directory timestamps have changed.

Ack.

We've had this bug before, where we returned something other than
-ECHILD while we were doing RCU lookups. See for example commit
97242f99a013 ("link_path_walk(): be careful when failing with
ENOTDIR").

So in general, we should always either (a) verify all sequence points
or (b) return -ECHILD to go into slow mode. The patch seems

However, this thing was explicitly made to be this way by commit
766c4cbfacd8 ("namei: d_is_negative() should be checked before ->d_seq
validation"), so while my gut feel is to consider this fix
ObviouslyCorrect(tm), I will delay it a bit in the hope of getting an ACK
and comment from Al about the patch.

Al?

                  Linus
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Al Viro Oct. 10, 2015, 1:36 a.m. UTC | #2
On Fri, Oct 09, 2015 at 05:19:02PM -0700, Linus Torvalds wrote:

> So in general, we should always either (a) verify all sequence points
> or (b) return -ECHILD to go into slow mode. The patch seems
> 
> However, this thing was explicitly made to be this way by commit
> 766c4cbfacd8 ("namei: d_is_negative() should be checked before ->d_seq
> validation"), so while my gut feel is to consider this fix
> ObviouslyCorrect(tm), I will delay it a bit in the hope of getting an ACK
> and comment from Al about the patch.
> 
> Al?

Umm...  I agree that the current version is wrong and it looks like this
patch is a complete fix.  The only problem is the commit message -
what really happens is that 766c4cbfacd8 got things subtly wrong.
We used to treat d_is_negative() after lookup_fast() as "fail with ENOENT".
That was wrong - checking ->d_flags outside of ->d_seq protection is
unreliable, and failing with a hard error on what should've fallen back to
non-RCU pathname resolution is a bug.

Unfortunately, we'd pulled the test too far up and ran afoul of another
kind of staleness.  The dentry might have been absolutely stable from the
RCU point of view (and we might be on UP, etc.), but stale from the
remote fs point of view.  If ->d_revalidate() returns "it's actually
stale", the dentry gets thrown away and the original code wouldn't even have
looked at its ->d_flags.  What we need is to check ->d_flags where 766c4cbfacd8
does (prior to ->d_seq validation) but only use the result in cases where we
do not discard this dentry outright.

With some explanation along the lines of the above added, consider the patch
ACKed.
Al Viro Oct. 10, 2015, 5:13 p.m. UTC | #3
On Sat, Oct 10, 2015 at 02:36:57AM +0100, Al Viro wrote:
> With some explanation along the lines of the above added, consider the patch
> ACKed.

OK, I've attempted to add an explanation of what's going on; please, pull from
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs.git for-linus

Shortlog:
Trond Myklebust (1):
      namei: results of d_is_negative() should be acted upon only after dentry revalidation

Diffstat:
 fs/namei.c | 11 +++++++++--
 1 file changed, 9 insertions(+), 2 deletions(-)

Linus Torvalds Oct. 10, 2015, 5:19 p.m. UTC | #4
On Sat, Oct 10, 2015 at 10:13 AM, Al Viro <viro@zeniv.linux.org.uk> wrote:
>
> OK, I've attempted to add an explanation of what's going on; please, pull from
..

Heh. I just committed it myself with your emailed explanation, so ..

                  Linus

Patch

diff --git a/fs/namei.c b/fs/namei.c
index 726d211db484..33e9495a3129 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -1558,8 +1558,6 @@ static int lookup_fast(struct nameidata *nd,
 		negative = d_is_negative(dentry);
 		if (read_seqcount_retry(&dentry->d_seq, seq))
 			return -ECHILD;
-		if (negative)
-			return -ENOENT;
 
 		/*
 		 * This sequence count validates that the parent had no
@@ -1580,6 +1578,12 @@ static int lookup_fast(struct nameidata *nd,
 				goto unlazy;
 			}
 		}
+		/*
+		 * Note: do negative dentry check after revalidation in
+		 * case that drops it.
+		 */
+		if (negative)
+			return -ENOENT;
 		path->mnt = mnt;
 		path->dentry = dentry;
 		if (likely(__follow_mount_rcu(nd, path, inode, seqp)))