Message ID | 1444412674-3077-1-git-send-email-trond.myklebust@primarydata.com (mailing list archive) |
---|---|
State | New, archived |
On Fri, Oct 9, 2015 at 10:44 AM, Trond Myklebust <trond.myklebust@primarydata.com> wrote:
>
> The issue is that revalidation may cause the dentry to be dropped in NFS
> if, say, the client notes that the directory timestamps have changed.

Ack. We've had this bug before, where we returned something else than -ECHILD while we were doing RCU lookups. See for example commit 97242f99a013 ("link_path_walk(): be careful when failing with ENOTDIR").

So in general, we should always (a) either verify all sequence points or (b) return -ECHILD to go into slow mode. The patch seems

However, this thing was explicitly made to be this way by commit 766c4cbfacd8 ("namei: d_is_negative() should be checked before ->d_seq validation"), so while my gut feel is to consider this fix ObviouslyCorrect(tm), I will delay it a bit in the hope to get an ACK and comment from Al about the patch.

Al?

Linus

--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
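The rule stated above (either verify every sequence point or return -ECHILD) can be illustrated with a small userspace model. This is only a sketch under stated assumptions: `model_seqcount`, `model_dentry`, `model_unlink` and `model_sample_negative` are hypothetical names invented here, the model is single-threaded, and the "racing writer" is a plain callback. The kernel's real seqcount also uses an odd count to mark a write in progress and needs memory barriers, neither of which is modelled.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical userspace stand-ins; these are not the kernel's types. */
struct model_seqcount {
	unsigned int sequence;	/* bumped by every (simulated) writer */
};

struct model_dentry {
	struct model_seqcount d_seq;
	int negative;		/* 1 if the dentry has no inode */
};

/* Optional hook that plays the role of a writer racing with the reader. */
typedef void (*model_writer)(struct model_dentry *);

static unsigned int model_read_begin(const struct model_seqcount *s)
{
	return s->sequence;
}

static int model_read_retry(const struct model_seqcount *s, unsigned int start)
{
	return s->sequence != start;
}

/* Simulated unlink: the dentry turns negative and the count moves on. */
static void model_unlink(struct model_dentry *d)
{
	d->negative = 1;
	d->d_seq.sequence += 2;
}

/*
 * Sample d->negative under the sequence count. Returns 0 and stores the
 * verified snapshot in *negative, or returns -ECHILD without touching
 * *negative when the snapshot cannot be trusted.
 */
static int model_sample_negative(struct model_dentry *d, model_writer racer,
				 int *negative)
{
	unsigned int seq;
	int snapshot;

	assert(d != NULL && negative != NULL);

	seq = model_read_begin(&d->d_seq);
	snapshot = d->negative;
	if (racer)
		racer(d);	/* the writer sneaks in after the read */
	if (model_read_retry(&d->d_seq, seq))
		return -ECHILD;	/* option (b): ask for the slow path */
	*negative = snapshot;	/* option (a): the snapshot was verified */
	return 0;
}
```

The point of the sketch is that a value read inside the begin/retry window is only used once the retry check has passed. If the check fails, the caller gets -ECHILD rather than a hard error derived from possibly stale data.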
On Fri, Oct 09, 2015 at 05:19:02PM -0700, Linus Torvalds wrote:

> So in general, we should always (a) either verify all sequence points
> or (b) return -ECHILD to go into slow mode. The patch seems
>
> However, this thing was explicitly made to be this way by commit
> 766c4cbfacd8 ("namei: d_is_negative() should be checked before ->d_seq
> validation"), so while my gut feel is to consider this fix
> ObviouslyCorrect(tm), I will delay it a bit in the hope to get an ACK
> and comment from Al about the patch.
>
> Al?

Umm... I agree that the current version is wrong and it looks like this patch is a complete fix. The only problem is the commit message - what really happens is that 766c4cbfacd8 got the things subtly wrong. We used to treat d_is_negative() after lookup_fast() as "fail with ENOENT". That was wrong - checking ->d_flags outside of ->d_seq protection is unreliable and failing with hard error on what should've fallen back to non-RCU pathname resolution is a bug.

Unfortunately, we'd pulled the test too far up and ran afoul of another kind of staleness. Dentry might have been absolutely stable from the RCU point of view (and we might be on UP, etc.), but stale from the remote fs point of view. If ->d_revalidate() returns "it's actually stale", dentry gets thrown away and original code wouldn't even have looked at its ->d_flags. What we need is to check ->d_flags where 766c4cbfacd8 does (prior to ->d_seq validation) but only use the result in cases where we do not discard this dentry outright.

With some explanation along the lines of the above added, consider the patch ACKed.
On Sat, Oct 10, 2015 at 02:36:57AM +0100, Al Viro wrote:
> On Fri, Oct 09, 2015 at 05:19:02PM -0700, Linus Torvalds wrote:
>
> > So in general, we should always (a) either verify all sequence points
> > or (b) return -ECHILD to go into slow mode. The patch seems
> >
> > However, this thing was explicitly made to be this way by commit
> > 766c4cbfacd8 ("namei: d_is_negative() should be checked before ->d_seq
> > validation"), so while my gut feel is to consider this fix
> > ObviouslyCorrect(tm), I will delay it a bit in the hope to get an ACK
> > and comment from Al about the patch.
> >
> > Al?
>
> Umm... I agree that the current version is wrong and it looks like this
> patch is a complete fix. The only problem is the commit message -
> what really happens is that 766c4cbfacd8 got the things subtly wrong.
> We used to treat d_is_negative() after lookup_fast() as "fail with ENOENT".
> That was wrong - checking ->d_flags outside of ->d_seq protection is
> unreliable and failing with hard error on what should've fallen back to
> non-RCU pathname resolution is a bug.
>
> Unfortunately, we'd pulled the test too far up and ran afoul of another
> kind of staleness. Dentry might have been absolutely stable from the
> RCU point of view (and we might be on UP, etc.), but stale from the
> remote fs point of view. If ->d_revalidate() returns "it's actually
> stale", dentry gets thrown away and original code wouldn't even have looked
> at its ->d_flags. What we need is to check ->d_flags where 766c4cbfacd8 does
> (prior to ->d_seq validation) but only use the result in cases where we
> do not discard this dentry outright.
>
> With some explanation along the lines of the above added, consider the patch
> ACKed.
OK, I've attempted to add an explanation of what's going on; please, pull from
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs.git for-linus

Shortlog:
Trond Myklebust (1):
      namei: results of d_is_negative() should be acted upon only after dentry revalidation

Diffstat:
 fs/namei.c | 11 +++++++++--
 1 file changed, 9 insertions(+), 2 deletions(-)
On Sat, Oct 10, 2015 at 10:13 AM, Al Viro <viro@zeniv.linux.org.uk> wrote:
>
> OK, I've attempted to add an explanation of what's going on; please, pull from ..

Heh. I just committed it myself with your emailed explanation, so ..

Linus
diff --git a/fs/namei.c b/fs/namei.c
index 726d211db484..33e9495a3129 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -1558,8 +1558,6 @@ static int lookup_fast(struct nameidata *nd,
 		negative = d_is_negative(dentry);
 		if (read_seqcount_retry(&dentry->d_seq, seq))
 			return -ECHILD;
-		if (negative)
-			return -ENOENT;
 
 		/*
 		 * This sequence count validates that the parent had no
@@ -1580,6 +1578,12 @@ static int lookup_fast(struct nameidata *nd,
 				goto unlazy;
 			}
 		}
+		/*
+		 * Note: do negative dentry check after revalidation in
+		 * case that drops it.
+		 */
+		if (negative)
+			return -ENOENT;
 		path->mnt = mnt;
 		path->dentry = dentry;
 		if (likely(__follow_mount_rcu(nd, path, inode, seqp)))
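
To make the effect of the reordering easier to see, here is a minimal userspace model of the two orderings. It is a sketch, not kernel code: `struct model_lookup`, `MODEL_UNLAZY`, `lookup_before_patch` and `lookup_after_patch` are hypothetical names, the parent sequence check between the two hunks is left out, and the treatment of a non-positive revalidation status as "fall back to the slow path" is an assumption read off the `goto unlazy` context line in the diff.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical summary of what the RCU fast path observes for one dentry. */
struct model_lookup {
	bool negative;		/* d_is_negative() sampled before the d_seq check */
	bool seq_changed;	/* read_seqcount_retry() would report a change */
	bool has_revalidate;	/* the filesystem supplies ->d_revalidate (e.g. NFS) */
	int revalidate_status;	/* > 0: still valid, <= 0: stale or undecided */
};

/* Stand-in for "goto unlazy": leave RCU mode and retry on the slow path. */
#define MODEL_UNLAZY 1

/* Ordering before the patch: the negative result is used before revalidation. */
static int lookup_before_patch(const struct model_lookup *l)
{
	assert(l != NULL);
	if (l->seq_changed)
		return -ECHILD;
	if (l->negative)
		return -ENOENT;		/* hard error, even if the dentry is stale */
	if (l->has_revalidate && l->revalidate_status <= 0)
		return MODEL_UNLAZY;
	return 0;
}

/* Ordering after the patch: the negative result is used only if the dentry survives. */
static int lookup_after_patch(const struct model_lookup *l)
{
	assert(l != NULL);
	if (l->seq_changed)
		return -ECHILD;
	if (l->has_revalidate && l->revalidate_status <= 0)
		return MODEL_UNLAZY;	/* stale dentry is discarded, flags ignored */
	if (l->negative)
		return -ENOENT;
	return 0;
}
```

In this model the two functions differ for exactly one class of input: a dentry that is negative, stable from the RCU point of view, and reported stale by revalidation. The old ordering fails with -ENOENT, while the new ordering falls back to the slow path, which is the behaviour Al's explanation asks for.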