Message ID | 20171211035017.32678-1-tytso@mit.edu (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
On Sun, Dec 10, 2017 at 10:50:17PM -0500, Theodore Ts'o wrote: > CONFIG_LOCKDEP_CROSSRELEASE and CONFIG_LOCKDEP_COMPLETIONS can result > in a large number of false positives because lockdep doesn't > understand how to deal with multiple stacked loop or MD devices. > > Until someone can figure out how to automatically add annotations to > all file system and storage devices --- hopefully without forcing > developers to insert a large number of calls to undocumented lockdep > macros into the the loop device, every single file system, and every > single device-mapper backend --- let's add an option to allow these > lockdep features to be disabled. Otherwise, many file system > developers will disable LOCKDEP entirely since it results in far too > many false positives when trying to use xfstests. Note --- this is going to be in the ext4 git tree because otherwise using Lockdep would greatly interfere with ext4 development --- and I'd prefer not to completely give up on Lockdep as a lost cause. As such, it will appear in linux-next. I'll drop it from the ext4 git branch before I send a pull request to Linus, unless I get an OK for me to merge this patch to mainline via the ext4 git tree. Cheers, - Ted
On Sun, Dec 10, 2017 at 7:50 PM, Theodore Ts'o <tytso@mit.edu> wrote: > CONFIG_LOCKDEP_CROSSRELEASE and CONFIG_LOCKDEP_COMPLETIONS can result > in a large number of false positives because lockdep doesn't > understand how to deal with multiple stacked loop or MD devices. Guys, can we just remove this nasty crud already? It's not working. Give it up. It was complex, it was buggy, it was slow. Now it's causing people to disable lockdep entirely, or play these kinds of games in unrelated trees. It's time to give up on bad debugging, and definitely _not_ enable it by default for PROVE_LOCKING. Ted: in the meantime, don't use PROVE_LOCKING. Just use DEBUG_LOCK_ALLOC instead. Please drop this patch from your tree. Linus
On Mon, Dec 11, 2017 at 01:06:55PM -0800, Linus Torvalds wrote: > On Sun, Dec 10, 2017 at 7:50 PM, Theodore Ts'o <tytso@mit.edu> wrote: > > CONFIG_LOCKDEP_CROSSRELEASE and CONFIG_LOCKDEP_COMPLETIONS can result > > in a large number of false positives because lockdep doesn't > > understand how to deal with multiple stacked loop or MD devices. > > Guys, can we just remove this nasty crud already? > > It's not working. Give it up. It was complex, it was buggy, it was slow. > > Now it's causing people to disable lockdep entirely, or play these > kinds of games in unrelated trees. > > It's time to give up on bad debugging, and definitely _not_ enable it > by default for PROVE_LOCKING. To be fair to Byungchul, I think it *can* be valid for finding some classes of bugs. It's just a disaster for anything to do with storage. I crafted this patch as something something which I thought *could* be a path forward; it disables it by default, and gives a warning about how it could cause a lot of pain for storage developers, but if other kernel devs want to use it to potentially find problem in their networking or wifi drivers --- sure, why not? Just make it be something *optional*. If people really want to make this work for storage, what I think we would need is variants of spin_lock_init(), mutex_init(), etc., which take a struct super or a struct block device, with proper documentation so that people don't have to struggle with undocumented C preprocessor macros where every single time I need to mess with lockdep annotations, I have to try figure out exactly what is a class and subclass. So in fact, what I was really hoping for was that some variant of this patch would end up in the sched tree, and get pushed to you v4.15-rcX patch as a regression fix, and I'd drop it from the ext4 tree. - Ted
On 12/12/2017 6:06 AM, Linus Torvalds wrote: > On Sun, Dec 10, 2017 at 7:50 PM, Theodore Ts'o <tytso@mit.edu> wrote: >> CONFIG_LOCKDEP_CROSSRELEASE and CONFIG_LOCKDEP_COMPLETIONS can result >> in a large number of false positives because lockdep doesn't >> understand how to deal with multiple stacked loop or MD devices. > > Guys, can we just remove this nasty crud already? > > It's not working. Give it up. It was complex, it was buggy, it was slow. Sorry to hear that, but it works well and fortunately it has not been buggy until now, of course, it has been wrongly accused twice though. Furthormore, it's not slow now since it was fixed. You seem to say that w/o foundation but emotionally. The *problem* is false positives, since locks and waiters in kernel are not classified properly, at the moment, which is just a fact that is not related to cross-release stuff at all. IOW, that would be useful once all locks and waiters are classified correctly. It might take time but the classifying is a must-do we have to keep doing. I admit to make it optional for now, but I don't see why you want to remove it entierly. > Now it's causing people to disable lockdep entirely, or play these > kinds of games in unrelated trees. > > It's time to give up on bad debugging, and definitely _not_ enable it > by default for PROVE_LOCKING. > > Ted: in the meantime, don't use PROVE_LOCKING. Just use > DEBUG_LOCK_ALLOC instead. Please drop this patch from your tree. > > Linus >
On Tue, Dec 12, 2017 at 02:20:32PM +0900, Byungchul Park wrote: > > The *problem* is false positives, since locks and waiters in > kernel are not classified properly, at the moment, which is just > a fact that is not related to cross-release stuff at all. IOW, > that would be useful once all locks and waiters are classified > correctly. It might take time but the classifying is a must-do > we have to keep doing. This is the wrong attitude. The reason why LOCKDEP was so powerful was because it automatically classified locks, instead of requiring developers to document the locking hierarchy. Requiring developers to have to document and classified locks --- especially when the d*mned mechanisms for doign so are so primitive and not even documented --- is a complete non-strarter. So, ***NO***, WE do not have to do anything. We can just disable Lockdep instead. You can not and must not transfer responsibility for documenting locks to the subsystem maintainers, as if it is some sort of bug that the locks are not "classified correctly". The fact that your new technique requires clasification is a BUG, and goes against the original design principle of LOCKDEP in the first place. > I admit to make it optional for now, but I don't see why you > want to remove it entierly. So are you willing to take my patch? Or give me permission to keep in the ext4 tree? - Ted
On Tue, Dec 12, 2017 at 08:03:43AM -0500, Theodore Ts'o wrote: > On Tue, Dec 12, 2017 at 02:20:32PM +0900, Byungchul Park wrote: > > The *problem* is false positives, since locks and waiters in > > kernel are not classified properly, at the moment, which is just > > a fact that is not related to cross-release stuff at all. IOW, > > that would be useful once all locks and waiters are classified > > correctly. It might take time but the classifying is a must-do > > we have to keep doing. > > This is the wrong attitude. The reason why LOCKDEP was so powerful > was because it automatically classified locks, instead of requiring > developers to document the locking hierarchy. Requiring developers to > have to document and classified locks --- especially when the d*mned > mechanisms for doign so are so primitive and not even documented --- > is a complete non-strarter. That's not fair. We had to annotate i_mutex nesting, for example, and several other places. crosslock doesn't change anything in this respect, it's just that the case that you hit every damn day as a filesystem developer is something that the normal person almost never does. > So are you willing to take my patch? Or give me permission to keep in > the ext4 tree? He sent a patch earlier ...
On Mon, Dec 11, 2017 at 9:20 PM, Byungchul Park <byungchul.park@lge.com> wrote: > > The *problem* is false positives, since locks and waiters in > kernel are not classified properly So the problem is that those false positives apparently end up being a big deal for the filesystem people. I personally don't think the code itself has to be removed, but I do think that it should never have been added on as part of the generic lock proving, and should always have been a separate config option. I also feel that you dismiss "false positives" much too easily. A false positive is a big problem - because it makes people ignore the real cases (or just disable the functionality entirely). It's why I am very quick to disable compiler warnings that have false positives, for example. Just a couple of "harmless" false positive warnings will poison the real warnings for people because they'll get used to seeing warnings while building, and no longer actually look at them. Linus
On 12/12/2017 10:03 PM, Theodore Ts'o wrote: > On Tue, Dec 12, 2017 at 02:20:32PM +0900, Byungchul Park wrote: >> >> The *problem* is false positives, since locks and waiters in >> kernel are not classified properly, at the moment, which is just >> a fact that is not related to cross-release stuff at all. IOW, >> that would be useful once all locks and waiters are classified >> correctly. It might take time but the classifying is a must-do >> we have to keep doing. > > This is the wrong attitude. The reason why LOCKDEP was so powerful I understand you why you told me like this, but you seem to mis-understand me. It's not a matter of attitudes. I also think any tools should not bother others as far as possible. > was because it automatically classified locks, instead of requiring Right. Original lockdep tries to classify locks automatically, but it can never achieve it with the current way. Conceptually, there are still many places where locks are not classified properly yet, but enough for lockdep to work for now. > developers to document the locking hierarchy. Requiring developers to > have to document and classified locks --- especially when the d*mned I also think requesting it to others instead of solving it internally is bad. Please don't mis-understand me. I just wanted to say that experts of specific domains can do it best. I know you and fs folks have worked on it as enough as original lockdep works, even though many other locks are still left classified unproperly. > mechanisms for doign so are so primitive and not even documented --- > is a complete non-strarter. > > So, ***NO***, WE do not have to do anything. We can just disable Right. I also think the best way is to classify locks more perfectly and automatically w/o bothering others - but we cannot achieve it in current way - relying on caller sites. > Lockdep instead. You can not and must not transfer responsibility for > documenting locks to the subsystem maintainers, as if it is some sort I said it can be done by each sub-system expert best, but it's not about responsibility. I also think we should not transfer respensibility to them. Sorry for confusing you. > of bug that the locks are not "classified correctly". The fact that However, "locks are not classified correctly, yet" is a fact. Of course, it has been just annotated as enough as lockdep works for now. But many things are still left. > your new technique requires clasification is a BUG, and goes against Sorry to say it, but I don't think so, it's not a bug. To classify locks more precisely just becomes required. For *example*, Lockdep automatically classifies locks 30%. Manually we have classified locks 20% more precisely until now. Cross-release requires to classify locks 20% more precisely. However, we still have 30% locks classified unproperly, but it's enough to work and those don't affect lockdep's work. > the original design principle of LOCKDEP in the first place. It's never against the original principle but following exactly same as original principle. >> I admit to make it optional for now, but I don't see why you >> want to remove it entierly. > > So are you willing to take my patch? Or give me permission to keep in > the ext4 tree? I agree with your patch. Or I want another to be taken which makes cross-release optional. Whatever. Thank you for your opinion. > > - Ted >
On 12/13/2017 2:00 AM, Linus Torvalds wrote: > On Mon, Dec 11, 2017 at 9:20 PM, Byungchul Park <byungchul.park@lge.com> wrote: >> >> The *problem* is false positives, since locks and waiters in >> kernel are not classified properly > > So the problem is that those false positives apparently end up being a > big deal for the filesystem people. > > I personally don't think the code itself has to be removed, but I do > think that it should never have been added on as part of the generic > lock proving, and should always have been a separate config option. I admit it. > I also feel that you dismiss "false positives" much too easily. A I don't dismiss the ones easily... Anyway, I mostly agree with your whole opinion. Thanks for replying. > false positive is a big problem - because it makes people ignore the > real cases (or just disable the functionality entirely). > > It's why I am very quick to disable compiler warnings that have false > positives, for example. Just a couple of "harmless" false positive > warnings will poison the real warnings for people because they'll get > used to seeing warnings while building, and no longer actually look at > them. > > Linus >
diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug index 947d3e2ed5c2..acae7ba99c16 100644 --- a/lib/Kconfig.debug +++ b/lib/Kconfig.debug @@ -1099,8 +1099,8 @@ config PROVE_LOCKING select DEBUG_MUTEXES select DEBUG_RT_MUTEXES if RT_MUTEXES select DEBUG_LOCK_ALLOC - select LOCKDEP_CROSSRELEASE - select LOCKDEP_COMPLETIONS + select LOCKDEP_CROSSRELEASE if LOCKDEP_AGGRESSIVE + select LOCKDEP_COMPLETIONS if LOCKDEP_AGGRESSIVE select TRACE_IRQFLAGS default n help @@ -1137,6 +1137,16 @@ config PROVE_LOCKING For more details, see Documentation/locking/lockdep-design.txt. +config LOCKDEP_AGGRESSIVE + bool "Lock debugging: use aggressive techniques" + default n + help + This enables some extremely aggressive lockdep checking that + may result in false positives. + + Say N here if you are planning on using xfstests and don't + want to deal with many false positive test failures. + config LOCKDEP bool depends on DEBUG_KERNEL && TRACE_IRQFLAGS_SUPPORT && STACKTRACE_SUPPORT && LOCKDEP_SUPPORT
CONFIG_LOCKDEP_CROSSRELEASE and CONFIG_LOCKDEP_COMPLETIONS can result in a large number of false positives because lockdep doesn't understand how to deal with multiple stacked loop or MD devices. Until someone can figure out how to automatically add annotations to all file system and storage devices --- hopefully without forcing developers to insert a large number of calls to undocumented lockdep macros into the the loop device, every single file system, and every single device-mapper backend --- let's add an option to allow these lockdep features to be disabled. Otherwise, many file system developers will disable LOCKDEP entirely since it results in far too many false positives when trying to use xfstests. Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: Byungchul Park <byungchul.park@lge.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: kernel-team@lge.com Cc: linux-block@vger.kernel.org Cc: linux-fsdevel@vger.kernel.org Cc: oleg@redhat.com Cc: tj@kernel.org --- lib/Kconfig.debug | 14 ++++++++++++-- 1 file changed, 12 insertions(+), 2 deletions(-)