Message ID | 20150801072603.GV17109@ZenIV.linux.org.uk (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Al Viro wrote on Sat, Aug 01, 2015: > And that has turned the check done to an inode that *was* ours at some > point (i.e. fetching it had been followed by checking that ->d_seq had > been still valid) into something completely unprotected. Suppose we > are in lazy mode and somebody had evicted nd->path.dentry after we'd looked > it up and before that check. Sure, its ->d_seq had been bumped by that, > and we would've failed anyway. With ECHILD. Which, unlike ENOTDIR, is > "repeat in non-lazy mode". That sounds like a good find, I was looking at how to claim/protect the entry somehow as well but I just have no idea... > Folks, could you check if this fixes the problems you are seeing? > > diff --git a/fs/namei.c b/fs/namei.c > index ae4e4c1..b16c3a7 100644 > --- a/fs/namei.c > +++ b/fs/namei.c > @@ -1954,7 +1954,11 @@ OK: > continue; > } > } > - if (unlikely(!d_can_lookup(nd->path.dentry))) > + if (unlikely(!d_can_lookup(nd->path.dentry))) { > + if (nd->flags & LOOKUP_RCU) { > + if (unlazy_walk(nd, NULL, 0)) > + return -ECHILD; > + } > return -ENOTDIR; > } > } Unfortunately, still happens for me. I had to adapt a bit because using an old kernel (4bf46a272), will try again with a recent master to doublecheck, but I had a break on the "if (nd->flags & LOOKUP_RCU)" check: - sometimes fails without ever hitting the check. I think this fixes the "ENOTDIR" I had described, but there's at least another way to fail? - When we do hit it, we're into LOOKUP_RCU at this point alright, unlazy_walk fails and we try again without RCU -- can confirm the recovery process goes OK (well, that it went OK at least once) Thanks,
Dominique Martinet wrote on Sat, Aug 01, 2015: > I had to adapt a bit because using an old kernel (4bf46a272), will try > again with a recent master to doublecheck There have been more changes than what I thought, can't seem to reproduce in a while on linus' HEAD with that fix (it fell in that check quite a few times already). I'll let test run all weekend to be safe, but it looks good to me!
On Sat, Aug 1, 2015 at 12:26 AM, Al Viro <viro@zeniv.linux.org.uk> wrote: > > Actually, the shit had hit the fan earlier. Look: in > commit b18825a7c8e37a7cf6abb97a12a6ad71af160de7 > Author: David Howells <dhowells@redhat.com> > Date: Thu Sep 12 19:22:53 2013 +0100 > > VFS: Put a small type field into struct dentry::d_flags > > we have this: > > - if (!can_lookup(nd->inode)) { > + if (!d_is_directory(nd->path.dentry)) { Ahh. That's subtle, yes. Inodes are stable in ways that dentries aren't. And the reason why Dominique bisected it to 4bf46a272647 would seem to be that while dentries aren't really stable, the dentry flags generally don't change. But dentry_iput() changed to actually clear the type when clearing the inode, so that probably added a few cases where it went from "stable in practice" to be more easily triggered. Your patch looks obviously correct, with the slight worry that there might be other cases of this. > there. AFAICS, other places of that sort are not a problem anymore. Al, do you plan a pull request? It would be good to get this into rc5 (tomorrow) regardless of whether there might be other issues lurking too. Linus -- To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Sat, Aug 01, 2015 at 09:09:24AM -0700, Linus Torvalds wrote: > On Sat, Aug 1, 2015 at 12:26 AM, Al Viro <viro@zeniv.linux.org.uk> wrote: > > > > Actually, the shit had hit the fan earlier. Look: in > > commit b18825a7c8e37a7cf6abb97a12a6ad71af160de7 > > Author: David Howells <dhowells@redhat.com> > > Date: Thu Sep 12 19:22:53 2013 +0100 > > > > VFS: Put a small type field into struct dentry::d_flags > > > > we have this: > > > > - if (!can_lookup(nd->inode)) { > > + if (!d_is_directory(nd->path.dentry)) { > > Ahh. That's subtle, yes. Inodes are stable in ways that dentries > aren't. And the reason why Dominique bisected it to 4bf46a272647 would > seem to be that while dentries aren't really stable, the dentry flags > generally don't change. But dentry_iput() changed to actually clear > the type when clearing the inode, so that probably added a few cases > where it went from "stable in practice" to be more easily triggered. > > Your patch looks obviously correct, with the slight worry that there > might be other cases of this. > > > there. AFAICS, other places of that sort are not a problem anymore. > > Al, do you plan a pull request? It would be good to get this into rc5 > (tomorrow) regardless of whether there might be other issues lurking > too. Will do later today. -stable branches will be interesting, though ;-/ Al, still digging through the loads of fun stuff in l-k mailbox... -- To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Sat, Aug 01, 2015 at 09:09:24AM -0700, Linus Torvalds wrote: > Al, do you plan a pull request? It would be good to get this into rc5 > (tomorrow) regardless of whether there might be other issues lurking > too. Please, pull from the usual place: git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs.git for-linus Shortlog: Al Viro (1): link_path_walk(): be careful when failing with ENOTDIR Diffstat: fs/namei.c | 6 +++++- 1 file changed, 5 insertions(+), 1 deletion(-) -- To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Sun, Aug 02, 2015 at 01:14:02AM +0100, Al Viro wrote: > On Sat, Aug 01, 2015 at 09:09:24AM -0700, Linus Torvalds wrote: > > > Al, do you plan a pull request? It would be good to get this into rc5 > > (tomorrow) regardless of whether there might be other issues lurking > > too. > > Please, pull from the usual place: > git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs.git for-linus > > Shortlog: > Al Viro (1): > link_path_walk(): be careful when failing with ENOTDIR > > Diffstat: > fs/namei.c | 6 +++++- > 1 file changed, 5 insertions(+), 1 deletion(-) Diffstat: fs/namei.c | 7 ++++++- 1 file changed, 6 insertions(+), 1 deletion(-) actually. Note to self: git diff HEAD *before* pushing, to verify that no uncommitted typo fixes are sitting around... My apologies. Branch head should be at 97242f9, just to make sure you get the right one... BTW, is there any convenient way to have git itself check things like that? -- To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Sat, Aug 1, 2015 at 5:23 PM, Al Viro <viro@zeniv.linux.org.uk> wrote: > > BTW, is there any convenient way to have git itself check things like > that? I guess you could just do a "pre-push" hook, and have it exit with an error if the tree isn't clean (to fail the push) or at least warn. See "man githooks" for details. Linus -- To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Sat, Aug 1, 2015 at 5:23 PM, Al Viro <viro@zeniv.linux.org.uk> wrote: > > Branch head should be at 97242f9, just to make sure you get > the right one... Ok, merged in my tree. However, looking at this, I'm struck by how all the callers of "link_path_walk()" tend to have very similar patterns wrt error handling. And I'm wondering - wouldn't it be nicer to extend that pattern a bit more, and make the *callers* of link_path_walk() all do if (error) { if (nd->flags & LOOKUP_RCU) { if (unlazy_walk(nd, NULL, 0)) error = -ECHILD; } } and maybe even make that part of terminate_walk() that everybody calls after getting here. Because it's not just that "!d_can_lookup()" case that triggers it, you also have that pattern in the RCU error case for may_lookup(), and get_link(). So why don't we make the rule that *every* single error we get during an RCU walk should do that unlazy_walk() and turn the error into ECHILD on failure. Hmm? We're almost there as-is. Linus -- To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Sat, Aug 01, 2015 at 05:57:44PM -0700, Linus Torvalds wrote: > Because it's not just that "!d_can_lookup()" case that triggers it, > you also have that pattern in the RCU error case for may_lookup(), and > get_link(). It feels like it might make sense to handle that in caller, but... that goes only for cases when we are *NOT* going to continue after successful transition to non-lazy mode. And these two are not of that sort - we do want to continue rather than restart everything from scratch. BTW, unlazy_walk() has too many arguments, all for the sake of one caller (everything except lookup_fast() calls it with (nd, NULL, 0) as arguments) and it might make sense to split the damn thing in two. I have that in a pending pile since the last cycle, but back then you have asked to stop piling them up and let it settle, so I'd postponed that one along with other cleanups... -- To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Sat, Aug 1, 2015 at 6:41 PM, Al Viro <viro@zeniv.linux.org.uk> wrote: > > It feels like it might make sense to handle that in caller, but... > that goes only for cases when we are *NOT* going to continue after > successful transition to non-lazy mode. And these two are not of > that sort - we do want to continue rather than restart everything > from scratch. Ahh. Yes. I didn't notice that they actually don't return an error at all if the unlazy_walk succeeds, but just continue. Ok, so they really are very different. > BTW, unlazy_walk() has too many arguments, all for the sake of one caller > (everything except lookup_fast() calls it with (nd, NULL, 0) as arguments) > and it might make sense to split the damn thing in two. I have that in > a pending pile since the last cycle, but back then you have asked to stop > piling them up and let it settle, so I'd postponed that one along with other > cleanups... Well, I'd not be against continuing cleanups for 4.3... Well, as long as we can make sure 4.2 is solid first, of course. I'd still like to have Hugh verify that the current -git tree works for his load, but obviously that wasn't easily reproducible, so that will presumably take a few days. Dominique seems to at least not see it any more with that patch. Linus -- To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Sat, 1 Aug 2015, Linus Torvalds wrote: > > Well, I'd not be against continuing cleanups for 4.3... Well, as long > as we can make sure 4.2 is solid first, of course. I'd still like to > have Hugh verify that the current -git tree works for his load, but > obviously that wasn't easily reproducible, so that will presumably > take a few days. Dominique seems to at least not see it any more with > that patch. I've had Al's patch under load since this morning, and it's going fine. But I've not actually managed to reproduce the issue for several days now, however bogus a "fix" I've been testing. So certainly don't wait to hear from me: but of course I'll keep on with the loads, and shout if it does go wrong again. (I don't actually understand why the clearing of DCACHE_ENTRY_TYPE in dentry_iput() is not of continuing concern; but don't worry, there's plenty I don't understand - so long as you're both satisfied that it's not a concern, no need to persuade me.) Do we have any idea why a bug introduced in v3.13 should only now stand out, both for Dominique and for me? Has the RCU lookup somehow become much more effective recently? (I don't think symlinks were a big deal for either of us, though the kernel build does use some). Hugh -- To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Sat, Aug 01, 2015 at 09:06:55PM -0700, Hugh Dickins wrote: > (I don't actually understand why the clearing of DCACHE_ENTRY_TYPE in > dentry_iput() is not of continuing concern; but don't worry, there's > plenty I don't understand - so long as you're both satisfied that > it's not a concern, no need to persuade me.) Because before we even get to dentry_iput(), we evict the fucker from hash. And that will do dentry_rcuwalk_invalidate(dentry), which will bump ->d_seq *AFTER* having it unhashed. Now look at __d_lookup_rcu() - there we fetch ->d_seq, then verify that it's still hashed. So having hit dentry_iput() means that everyone who'd found it via RCU lookup will be guaranteed a ->d_seq mismatch. The same goes for things like d_drop() and d_instantiate(). Look for dentry_rcuwalk_invalidate() callers in there... Clearing DCACHE_ENTRY_TYPE there is fine - dentry *is* made negative there, after all. What we want is to have ->d_inode stable at least as long as ->d_seq remains so. -- To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Sat, Aug 1, 2015 at 9:06 PM, Hugh Dickins <hughd@google.com> wrote: > > (I don't actually understand why the clearing of DCACHE_ENTRY_TYPE in > dentry_iput() is not of continuing concern; but don't worry, there's > plenty I don't understand - so long as you're both satisfied that > it's not a concern, no need to persuade me.) So dentry_iput() is only called as the dentry is being thrown away, and is stale. Yes, such a stale dentry can be seen by an RCU lookup, but the RCU lookups should always revalidate things after the lookup, so it shouldn't matter. The problem here was that there was a missing revalidate of the RCU lookup for an error case, so the error that _should_ have been a harmless race that got handled later by the proper validation instead turned into a real user-visible error. But we didn't use to clear the flags in dentry_iput, so before things generally "happened to work" anyway, because this rare error case didn't actually ever trigger in the first place. (And I still don't think we necessarily *should* clear the flags in dentry_iput(), but it really shouldn't be a correctness issue) > Do we have any idea why a bug introduced in v3.13 should only now > stand out, both for Dominique and for me? Has the RCU lookup somehow > become much more effective recently? So I do think that the clearing of the dentry flags exposed a situation that was harder to hit before. The fact that we now do RCU lookups even over symlinks probably does end up widening the possibilities for this happening too, although as you say, that shouldn't be very common during a kernel build. Linus -- To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Sat, 1 Aug 2015, Linus Torvalds wrote: > On Sat, Aug 1, 2015 at 9:06 PM, Hugh Dickins <hughd@google.com> wrote: > > > > (I don't actually understand why the clearing of DCACHE_ENTRY_TYPE in > > dentry_iput() is not of continuing concern; but don't worry, there's > > plenty I don't understand - so long as you're both satisfied that > > it's not a concern, no need to persuade me.) > > So dentry_iput() is only called as the dentry is being thrown away, > and is stale. > > Yes, such a stale dentry can be seen by an RCU lookup, but the RCU > lookups should always revalidate things after the lookup, so it > shouldn't matter. The problem here was that there was a missing > revalidate of the RCU lookup for an error case, so the error that > _should_ have been a harmless race that got handled later by the > proper validation instead turned into a real user-visible error. Thank you both for leading me through that: I really should have rechecked the sequence count invalidation in the source for myself (I had a wrong picture of it in my head), before inserting that parenthesis and taking your time over it; but had been in a hurry to get a response back. > > But we didn't use to clear the flags in dentry_iput, so before things > generally "happened to work" anyway, because this rare error case > didn't actually ever trigger in the first place. > > (And I still don't think we necessarily *should* clear the flags in > dentry_iput(), but it really shouldn't be a correctness issue) > > > Do we have any idea why a bug introduced in v3.13 should only now > > stand out, both for Dominique and for me? Has the RCU lookup somehow > > become much more effective recently? > > So I do think that the clearing of the dentry flags exposed a > situation that was harder to hit before. Right, that does indeed make sense of why it appeared now. I cannot actually report success from yesterday's testing, since it hung after 20 hours for, I believe, the same unrelated reason that I ran into before. I mentioned jbd2 last time, but I doubt that's at fault: it's almost certainly an issue with recent vmscan changes and/or recent loop changes - the business of page reclaim waiting on page writeback has always been tricky and fragile and deadlock-prone, the more so when loop is involved: probably the balance has got shifted slightly by recent changes, I'll look into it (but definitely not rc5 material). Thanks, Hugh -- To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
diff --git a/fs/namei.c b/fs/namei.c index ae4e4c1..b16c3a7 100644 --- a/fs/namei.c +++ b/fs/namei.c @@ -1954,7 +1954,11 @@ OK: continue; } } - if (unlikely(!d_can_lookup(nd->path.dentry))) + if (unlikely(!d_can_lookup(nd->path.dentry))) { + if (nd->flags & LOOKUP_RCU) { + if (unlazy_walk(nd, NULL, 0)) + return -ECHILD; + } return -ENOTDIR; } }