[v2] kunit: added lockdep support

Message ID	20200812193332.954395-1-urielguajardojr@gmail.com (mailing list archive)
State	New
Headers
Return-Path: <SRS0=PM3n=BW=vger.kernel.org=linux-kselftest-owner@kernel.org>
From: Uriel Guajardo <urielguajardojr@gmail.com>
To: brendanhiggins@google.com, peterz@infradead.org, mingo@redhat.com, will@kernel.org
Cc: linux-kselftest@vger.kernel.org, kunit-dev@googlegroups.com, linux-kernel@vger.kernel.org, urielguajardo@google.com, Uriel Guajardo <urielguajardojr@gmail.com>
Subject: [PATCH v2] kunit: added lockdep support
Date: Wed, 12 Aug 2020 19:33:32 +0000
Message-Id: <20200812193332.954395-1-urielguajardojr@gmail.com>
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
Sender: linux-kselftest-owner@vger.kernel.org
Precedence: bulk

Uriel Guajardo Aug. 12, 2020, 7:33 p.m. UTC

KUnit will fail tests upon observing a lockdep failure. Because lockdep
turns itself off after its first failure, only fail the first test and
warn users to not expect any future failures from lockdep.

Similar to lib/locking-selftest [1], we check if the status of
debug_locks has changed after the execution of a test case. However, we
do not reset lockdep afterwards.

Like the locking selftests, we also fix possible preemption count
corruption from lock bugs.

Depends on kunit: support failure from dynamic analysis tools [2]

[1] https://elixir.bootlin.com/linux/v5.7.12/source/lib/locking-selftest.c#L1137

[2] https://lore.kernel.org/linux-kselftest/20200806174326.3577537-1-urielguajardojr@gmail.com/

Signed-off-by: Uriel Guajardo <urielguajardo@google.com>
---
v2 Changes:
- Removed lockdep_reset

- Added warning to users about lockdep shutting off
---
 lib/kunit/test.c | 27 ++++++++++++++++++++++++++-
 1 file changed, 26 insertions(+), 1 deletion(-)
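
The behaviour described in the commit message can be modelled in userspace. The following is only a minimal sketch, not the patch itself: `debug_locks` is a plain global standing in for the kernel flag, and `struct model_test`, `model_lockdep_splat()`, `model_run_case()` and the two case bodies are hypothetical names invented for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Stand-in for the kernel's debug_locks flag (assumption: a plain global). */
int debug_locks = 1;

/* Hypothetical stand-in for struct kunit: just a name and a failure flag. */
struct model_test {
	const char *name;
	bool failed;
};

/* Model of a lockdep report: lockdep turns itself off on its first failure. */
void model_lockdep_splat(void)
{
	debug_locks = 0;
}

/* Two illustrative test bodies: one clean, one that triggers a splat. */
void model_good_case(void) { }
void model_bad_case(void) { model_lockdep_splat(); }

/*
 * Run one case and fail it only if debug_locks went from on to off while
 * the case was running. Nothing resets debug_locks afterwards.
 */
void model_run_case(struct model_test *t, void (*body)(void))
{
	bool saved_debug_locks = debug_locks;

	assert(t != NULL && body != NULL);
	body();
	if (saved_debug_locks && !debug_locks) {
		t->failed = true;
		printf("%s: lockdep failure, later cases run with lockdep off\n",
		       t->name);
	}
}
```

In this model, running a clean case, then a bad case, then a second bad case fails only the middle one: by the time the third case runs, `debug_locks` is already 0, so no on-to-off transition is observed.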

Alan Maguire Aug. 13, 2020, 9:11 a.m. UTC | #1

On Wed, 12 Aug 2020, Uriel Guajardo wrote:

> KUnit will fail tests upon observing a lockdep failure. Because lockdep
> turns itself off after its first failure, only fail the first test and
> warn users to not expect any future failures from lockdep.
> 
> Similar to lib/locking-selftest [1], we check if the status of
> debug_locks has changed after the execution of a test case. However, we
> do not reset lockdep afterwards.
> 
> Like the locking selftests, we also fix possible preemption count
> corruption from lock bugs.
> 
> Depends on kunit: support failure from dynamic analysis tools [2]
> 
> [1] https://elixir.bootlin.com/linux/v5.7.12/source/lib/locking-selftest.c#L1137
> 
> [2] https://lore.kernel.org/linux-kselftest/20200806174326.3577537-1-urielguajardojr@gmail.com/
> 
> Signed-off-by: Uriel Guajardo <urielguajardo@google.com>
> ---
> v2 Changes:
> - Removed lockdep_reset
> 
> - Added warning to users about lockdep shutting off
> ---
>  lib/kunit/test.c | 27 ++++++++++++++++++++++++++-
>  1 file changed, 26 insertions(+), 1 deletion(-)
> 
> diff --git a/lib/kunit/test.c b/lib/kunit/test.c
> index d8189d827368..7e477482457b 100644
> --- a/lib/kunit/test.c
> +++ b/lib/kunit/test.c
> @@ -11,6 +11,7 @@
>  #include <linux/kref.h>
>  #include <linux/sched/debug.h>
>  #include <linux/sched.h>
> +#include <linux/debug_locks.h>
>  
>  #include "debugfs.h"
>  #include "string-stream.h"
> @@ -22,6 +23,26 @@ void kunit_fail_current_test(void)
>  		kunit_set_failure(current->kunit_test);
>  }
>  
> +static void kunit_check_locking_bugs(struct kunit *test,
> +				     unsigned long saved_preempt_count,
> +				     bool saved_debug_locks)
> +{
> +	preempt_count_set(saved_preempt_count);
> +#ifdef CONFIG_TRACE_IRQFLAGS
> +	if (softirq_count())
> +		current->softirqs_enabled = 0;
> +	else
> +		current->softirqs_enabled = 1;
> +#endif
> +#if IS_ENABLED(CONFIG_LOCKDEP)
> +	if (saved_debug_locks && !debug_locks) {
> +		kunit_set_failure(test);
> +		kunit_warn(test, "Dynamic analysis tool failure from LOCKDEP.");
> +		kunit_warn(test, "Further tests will have LOCKDEP disabled.");
> +	}
> +#endif
> +}

Nit: I could be wrong but the general approach for this sort of
feature is to do conditional compilation combined with "static inline"
definitions to handle the case where the feature isn't enabled. 
Could we tidy this up a bit and haul this stuff out into a
conditionally-compiled (if CONFIG_LOCKDEP) kunit lockdep.c file?
Then in kunit's lockdep.h we'd have

struct kunit_lockdep {
	int preempt_count;
	bool debug_locks;
};

#if IS_ENABLED(CONFIG_LOCKDEP)
void kunit_test_init_lockdep(struct kunit *test,
			     struct kunit_lockdep *lockdep);
void kunit_test_check_lockdep(struct kunit *test,
			      struct kunit_lockdep *lockdep);
#else
static inline void kunit_test_init_lockdep(struct kunit *test,
					   struct kunit_lockdep *lockdep) { }
static inline void kunit_test_check_lockdep(struct kunit *test,
					    struct kunit_lockdep *lockdep) { }
#endif

The test execution code could then call

	struct kunit_lockdep lockdep;

	kunit_test_init_lockdep(test, &lockdep);

	kunit_test_check_lockdep(test, &lockdep);
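
As a rough illustration of the split suggested above, here is a self-contained userspace sketch. Everything in it is an assumption made for the example: `MODEL_LOCKDEP` stands in for CONFIG_LOCKDEP, the `model_*` names are hypothetical, and the two globals replace the kernel's `debug_locks` and preempt count. Handling of the saved preempt count is deliberately left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical switch standing in for CONFIG_LOCKDEP; build with
 * -DMODEL_LOCKDEP=0 to get the empty stubs instead. */
#ifndef MODEL_LOCKDEP
#define MODEL_LOCKDEP 1
#endif

/* Stand-ins for kernel state (assumption: plain globals). */
int debug_locks = 1;
int model_preempt_count;

struct model_kunit {
	bool failed;
};

struct model_kunit_lockdep {
	int preempt_count;
	bool debug_locks;
};

#if MODEL_LOCKDEP
/* Would live in the conditionally compiled lockdep.c. */
void model_test_init_lockdep(struct model_kunit *test,
			     struct model_kunit_lockdep *lockdep)
{
	assert(test != NULL && lockdep != NULL);
	lockdep->preempt_count = model_preempt_count;
	lockdep->debug_locks = debug_locks;
}

void model_test_check_lockdep(struct model_kunit *test,
			      struct model_kunit_lockdep *lockdep)
{
	assert(test != NULL && lockdep != NULL);
	/* Fail only on an on-to-off transition of debug_locks. */
	if (lockdep->debug_locks && !debug_locks)
		test->failed = true;
}
#else
/* Feature disabled: callers compile unchanged, the calls do nothing. */
static inline void model_test_init_lockdep(struct model_kunit *test,
					   struct model_kunit_lockdep *lockdep) { }
static inline void model_test_check_lockdep(struct model_kunit *test,
					    struct model_kunit_lockdep *lockdep) { }
#endif
```

The point of the pattern is that the caller needs no #ifdef of its own: it always declares the state struct and calls both functions, and the disabled build reduces them to nothing.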

If that approach makes sense, we could go a bit further
and we might benefit from a bit more generalization
here.  _If_ the pattern of needing pre- and post- test
actions is sustained across multiple analysis tools,
could we add generic hooks for this? That would allow any
additional dynamic analysis tools to utilize them.  So 
kunit_try_run_case() would then cycle through the registered
pre- hooks prior to running the case and post- hooks after,
failing if any of the latter returned a failure value.

I'm thinking something like

  kunit_register_external_test("lockdep", lockdep_pre, lockdep_post, 
			       &kunit_lockdep);

(or we could define a kunit_external_test struct for
better extensibility).

A void * would be passed to pre/post, in this case it'd
be a pointer to a struct containing the saved preempt
count/debug locks, and the registration could be called during
kunit initialization.  This doesn't need to be done with your
change of course but I wanted to float the idea as in addition
to uncluttering the test case execution code, it might allow
us to build facilities on top of that generic tool support for
situations like "I'd like to see if the test passes absent
any lockdep issues, so I'd like to disable lockdep-based failure".
Such situations are more likely to arise in a world where
kunit+tests are built as modules and run multiple times within
a single system boot admittedly, but worth considering I think.
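
To make the hook idea concrete, here is a small userspace sketch of what such a registry might look like. None of this is existing KUnit API: `struct model_external_test`, `model_register_external_test()`, `model_try_run_case()` and the lockdep hook pair are hypothetical, and `debug_locks` is a plain global standing in for the kernel flag.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_MAX_HOOKS 8

/* Hypothetical shape for the "kunit_external_test struct" idea. */
struct model_external_test {
	const char *name;
	void (*pre)(void *data);	/* runs before the case */
	int (*post)(void *data);	/* runs after; nonzero means failure */
	void *data;			/* opaque state shared by pre and post */
};

static struct model_external_test model_hooks[MODEL_MAX_HOOKS];
static size_t model_nr_hooks;

/* Register a pre/post pair; returns 0 on success, -1 if the table is full. */
int model_register_external_test(const char *name, void (*pre)(void *),
				 int (*post)(void *), void *data)
{
	if (model_nr_hooks >= MODEL_MAX_HOOKS)
		return -1;
	model_hooks[model_nr_hooks++] =
		(struct model_external_test){ name, pre, post, data };
	return 0;
}

/* Run all pre hooks, the case body, then all post hooks.
 * Returns true if no post hook reported a failure. */
bool model_try_run_case(void (*body)(void))
{
	bool ok = true;
	size_t i;

	assert(body != NULL);
	for (i = 0; i < model_nr_hooks; i++)
		if (model_hooks[i].pre)
			model_hooks[i].pre(model_hooks[i].data);
	body();
	for (i = 0; i < model_nr_hooks; i++)
		if (model_hooks[i].post &&
		    model_hooks[i].post(model_hooks[i].data))
			ok = false;
	return ok;
}

/* A lockdep-flavoured hook pair built on the registry. */
int debug_locks = 1;		/* stand-in for the kernel flag */

struct model_lockdep_state {
	bool debug_locks;
};

void model_lockdep_pre(void *data)
{
	struct model_lockdep_state *s = data;

	s->debug_locks = debug_locks;
}

int model_lockdep_post(void *data)
{
	struct model_lockdep_state *s = data;

	return s->debug_locks && !debug_locks;
}

/* Illustrative case bodies. */
void model_good_case(void) { }
void model_bad_case(void) { debug_locks = 0; }
```

With the lockdep pair registered once at initialization, the case runner itself carries no lockdep-specific code, and dropping a tool would amount to not registering its hooks.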

For that we'd need a way to select which dynamic tools kunit
enables (kernel/module parameters or debugfs could do
this), but a generic approach might help that sort of thing.

An external test under this model wouldn't necessarily have to
be external to the area under test; the general criterion for
such things would be "something I want to track across multiple
test case executions".

Again I'm not trying to put you on the hook for any of
the above suggestions (having lockdep support like this is
fantastic!), but I think it'd be good to see if there's a
pattern here we could potentially exploit in other use cases.

Thanks!

Alan

Peter Zijlstra Aug. 13, 2020, 10:36 a.m. UTC | #2

On Wed, Aug 12, 2020 at 07:33:32PM +0000, Uriel Guajardo wrote:
> KUnit will fail tests upon observing a lockdep failure. Because lockdep
> turns itself off after its first failure, only fail the first test and
> warn users to not expect any future failures from lockdep.
> 
> Similar to lib/locking-selftest [1], we check if the status of
> debug_locks has changed after the execution of a test case. However, we
> do not reset lockdep afterwards.
> 
> Like the locking selftests, we also fix possible preemption count
> corruption from lock bugs.

> +static void kunit_check_locking_bugs(struct kunit *test,
> +				     unsigned long saved_preempt_count,
> +				     bool saved_debug_locks)
> +{
> +	preempt_count_set(saved_preempt_count);
> +#ifdef CONFIG_TRACE_IRQFLAGS
> +	if (softirq_count())
> +		current->softirqs_enabled = 0;
> +	else
> +		current->softirqs_enabled = 1;
> +#endif

Urgh, don't silently change these... if they're off that's a hard fail.

	if (DEBUG_LOCKS_WARN_ON(preempt_count() != saved_preempt_count))
		preempt_count_set(saved_preempt_count);

And by using DEBUG_LOCKS_WARN_ON() it will kill IRQ tracing and trigger
the below fail.

> +	if (saved_debug_locks && !debug_locks) {
> +		kunit_set_failure(test);
> +		kunit_warn(test, "Dynamic analysis tool failure from LOCKDEP.");
> +		kunit_warn(test, "Further tests will have LOCKDEP disabled.");
> +	}
> +}
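
The interaction between the suggested check and the existing debug_locks comparison can be modelled in userspace. This is only a sketch under stated assumptions: `model_debug_locks_warn_on()` is a simplified function stand-in for the kernel's DEBUG_LOCKS_WARN_ON() macro (the real macro has more conditions), the two globals replace `debug_locks` and the preempt count, and all `model_*` names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Stand-ins for kernel state (assumption: plain globals). */
int debug_locks = 1;
int model_preempt_count;

struct model_kunit {
	bool failed;
};

/*
 * Simplified model of DEBUG_LOCKS_WARN_ON(): returns whether the condition
 * held, and on a true condition turns debug_locks off, warning only if it
 * was still on.
 */
int model_debug_locks_warn_on(bool cond, const char *expr)
{
	if (!cond)
		return 0;
	if (debug_locks) {
		debug_locks = 0;
		fprintf(stderr, "DEBUG_LOCKS_WARN_ON(%s)\n", expr);
	}
	return 1;
}

/* The check with the suggested change: warn loudly, then restore. */
void model_check_locking_bugs(struct model_kunit *test,
			      int saved_preempt_count,
			      bool saved_debug_locks)
{
	assert(test != NULL);
	if (model_debug_locks_warn_on(model_preempt_count != saved_preempt_count,
				      "preempt_count() != saved_preempt_count"))
		model_preempt_count = saved_preempt_count;

	/* The warning above cleared debug_locks, so this now fires too. */
	if (saved_debug_locks && !debug_locks)
		test->failed = true;
}
```

In this model a preempt count mismatch is no longer repaired silently: the warning clears `debug_locks`, and the comparison that follows marks the test as failed.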

Uriel Guajardo Aug. 13, 2020, 1:15 p.m. UTC | #3

On Thu, Aug 13, 2020 at 5:36 AM <peterz@infradead.org> wrote:
>
> On Wed, Aug 12, 2020 at 07:33:32PM +0000, Uriel Guajardo wrote:
> > KUnit will fail tests upon observing a lockdep failure. Because lockdep
> > turns itself off after its first failure, only fail the first test and
> > warn users to not expect any future failures from lockdep.
> >
> > Similar to lib/locking-selftest [1], we check if the status of
> > debug_locks has changed after the execution of a test case. However, we
> > do not reset lockdep afterwards.
> >
> > Like the locking selftests, we also fix possible preemption count
> > corruption from lock bugs.
>
> > +static void kunit_check_locking_bugs(struct kunit *test,
> > +                                  unsigned long saved_preempt_count,
> > +                                  bool saved_debug_locks)
> > +{
> > +     preempt_count_set(saved_preempt_count);
> > +#ifdef CONFIG_TRACE_IRQFLAGS
> > +     if (softirq_count())
> > +             current->softirqs_enabled = 0;
> > +     else
> > +             current->softirqs_enabled = 1;
> > +#endif
>
> Urgh, don't silently change these... if they're off that's a hard fail.
>
>         if (DEBUG_LOCKS_WARN_ON(preempt_count() != saved_preempt_count))
>                 preempt_count_set(saved_preempt_count);
>
> And by using DEBUG_LOCKS_WARN_ON() it will kill IRQ tracing and trigger
> the below fail.

Hmm, I see. My original assumption was that lock-related bugs that
could corrupt preempt_count would always be caught by lockdep
(resulting in debug_locks already being off). Is this not always true?
In any case, I think it's better to explicitly show the failure
associated with preemption count as you have done, but I'm still
curious.

Also, for further clarification: the check you have made on
preempt_count also covers softirq_count, right? My understanding is
that softirqs are re-{enabled/disabled} due to the corruption of the
preemption count, so no changes should occur if the preemption count
remains the same. If it does change, we've already failed from
DEBUG_LOCKS_WARN_ON.

>
> > +     if (saved_debug_locks && !debug_locks) {
> > +             kunit_set_failure(test);
> > +             kunit_warn(test, "Dynamic analysis tool failure from LOCKDEP.");
> > +             kunit_warn(test, "Further tests will have LOCKDEP disabled.");
> > +     }
> > +}

Uriel Guajardo Aug. 13, 2020, 4:44 p.m. UTC | #4

On Thu, Aug 13, 2020 at 4:11 AM Alan Maguire <alan.maguire@oracle.com> wrote:
>
> On Wed, 12 Aug 2020, Uriel Guajardo wrote:
>
> > KUnit will fail tests upon observing a lockdep failure. Because lockdep
> > turns itself off after its first failure, only fail the first test and
> > warn users to not expect any future failures from lockdep.
> >
> > Similar to lib/locking-selftest [1], we check if the status of
> > debug_locks has changed after the execution of a test case. However, we
> > do not reset lockdep afterwards.
> >
> > Like the locking selftests, we also fix possible preemption count
> > corruption from lock bugs.
> >
> > Depends on kunit: support failure from dynamic analysis tools [2]
> >
> > [1] https://elixir.bootlin.com/linux/v5.7.12/source/lib/locking-selftest.c#L1137
> >
> > [2] https://lore.kernel.org/linux-kselftest/20200806174326.3577537-1-urielguajardojr@gmail.com/
> >
> > Signed-off-by: Uriel Guajardo <urielguajardo@google.com>
> > ---
> > v2 Changes:
> > - Removed lockdep_reset
> >
> > - Added warning to users about lockdep shutting off
> > ---
> >  lib/kunit/test.c | 27 ++++++++++++++++++++++++++-
> >  1 file changed, 26 insertions(+), 1 deletion(-)
> >
> > diff --git a/lib/kunit/test.c b/lib/kunit/test.c
> > index d8189d827368..7e477482457b 100644
> > --- a/lib/kunit/test.c
> > +++ b/lib/kunit/test.c
> > @@ -11,6 +11,7 @@
> >  #include <linux/kref.h>
> >  #include <linux/sched/debug.h>
> >  #include <linux/sched.h>
> > +#include <linux/debug_locks.h>
> >
> >  #include "debugfs.h"
> >  #include "string-stream.h"
> > @@ -22,6 +23,26 @@ void kunit_fail_current_test(void)
> >               kunit_set_failure(current->kunit_test);
> >  }
> >
> > +static void kunit_check_locking_bugs(struct kunit *test,
> > +                                  unsigned long saved_preempt_count,
> > +                                  bool saved_debug_locks)
> > +{
> > +     preempt_count_set(saved_preempt_count);
> > +#ifdef CONFIG_TRACE_IRQFLAGS
> > +     if (softirq_count())
> > +             current->softirqs_enabled = 0;
> > +     else
> > +             current->softirqs_enabled = 1;
> > +#endif
> > +#if IS_ENABLED(CONFIG_LOCKDEP)
> > +     if (saved_debug_locks && !debug_locks) {
> > +             kunit_set_failure(test);
> > +             kunit_warn(test, "Dynamic analysis tool failure from LOCKDEP.");
> > +             kunit_warn(test, "Further tests will have LOCKDEP disabled.");
> > +     }
> > +#endif
> > +}
>
> Nit: I could be wrong but the general approach for this sort of
> feature is to do conditional compilation combined with "static inline"
> definitions to handle the case where the feature isn't enabled.
> Could we tidy this up a bit and haul this stuff out into a
> conditionally-compiled (if CONFIG_LOCKDEP) kunit lockdep.c file?

Sure! Apologies if this isn't convention.

> Then in kunit's lockdep.h we'd have
>
> struct kunit_lockdep {
>         int preempt_count;
>         bool debug_locks;
> };
>
> #if IS_ENABLED(CONFIG_LOCKDEP)
> void kunit_test_init_lockdep(struct kunit *test,
>                              struct kunit_lockdep *lockdep);
> void kunit_test_check_lockdep(struct kunit *test,
>                               struct kunit_lockdep *lockdep);
> #else
> static inline void kunit_test_init_lockdep(struct kunit *test,
>                                            struct kunit_lockdep *lockdep) { }
> static inline void kunit_test_check_lockdep(struct kunit *test,
>                                             struct kunit_lockdep *lockdep) { }
> #endif
>
>
> The test execution code could then call
>
>         struct kunit_lockdep lockdep;
>
>         kunit_test_init_lockdep(test, &lockdep);
>
>         kunit_test_check_lockdep(test, &lockdep);
>

Thanks for these helpful tips. I agree that it'll be cleaner this way.
I'll implement this in the next version of the patch.

> If that approach makes sense, we could go a bit further
> and we might benefit from a bit more generalization
> here.  _If_ the pattern of needing pre- and post- test
> actions is sustained across multiple analysis tools,
> could we add generic hooks for this? That would allow any
> additional dynamic analysis tools to utilize them.  So

I think this is a great idea. Right now I'm a little hesitant to
generalize beyond lockdep, since most analysis tools I've seen don't
seem to require this. For most tools, they fail, they report to KUnit,
then they continue working without us needing to clean state. Perhaps
the generic hooks could prove useful in other ways that I'm not
considering.

In any case, I will go ahead and work on the lockdep-specific hook for
KUnit. If you or anyone else thinks it could be useful in other ways
in the future, we can make it generic!

> kunit_try_run_case() would then cycle through the registered
> pre- hooks prior to running the case and post- hooks after,
> failing if any of the latter returned a failure value.
>
> I'm thinking something like
>
>   kunit_register_external_test("lockdep", lockdep_pre, lockdep_post,
>                                &kunit_lockdep);
>
> (or we could define a kunit_external_test struct for
> better extensibility).
>
> A void * would be passed to pre/post, in this case it'd
> be a pointer to a struct containing the saved preempt
> count/debug locks, and the registration could be called during
> kunit initialization.  This doesn't need to be done with your
> change of course but I wanted to float the idea as in addition
> to uncluttering the test case execution code, it might allow
> us to build facilities on top of that generic tool support for
> situations like "I'd like to see if the test passes absent
> any lockdep issues, so I'd like to disable lockdep-based failure".
> Such situations are more likely to arise in a world where
> kunit+tests are built as modules and run multiple times within
> a single system boot admittedly, but worth considering I think.

Interesting!

>
> For that we'd need a way to select which dynamic tools kunit
> enables (kernel/module parameters or debugfs could do
> this), but a generic approach might help that sort of thing.
>
> An external test under this model wouldn't necessarily have to
> be external to the area under test; the general criterion for
> such things would be "something I want to track across multiple
> test case executions".
>
> Again I'm not trying to put you on the hook for any of
> the above suggestions (having lockdep support like this is
> fantastic!), but I think it'd be good to see if there's a
> pattern here we could potentially exploit in other use cases.

No worries, thanks for putting these suggestions out there.

>
> Thanks!
>
> Alan

Peter Zijlstra Aug. 13, 2020, 6:35 p.m. UTC | #5

On Thu, Aug 13, 2020 at 08:15:27AM -0500, Uriel Guajardo wrote:
> On Thu, Aug 13, 2020 at 5:36 AM <peterz@infradead.org> wrote:
> >
> > On Wed, Aug 12, 2020 at 07:33:32PM +0000, Uriel Guajardo wrote:
> > > KUnit will fail tests upon observing a lockdep failure. Because lockdep
> > > turns itself off after its first failure, only fail the first test and
> > > warn users to not expect any future failures from lockdep.
> > >
> > > Similar to lib/locking-selftest [1], we check if the status of
> > > debug_locks has changed after the execution of a test case. However, we
> > > do not reset lockdep afterwards.
> > >
> > > Like the locking selftests, we also fix possible preemption count
> > > corruption from lock bugs.
> >
> > > +static void kunit_check_locking_bugs(struct kunit *test,
> > > +                                  unsigned long saved_preempt_count,
> > > +                                  bool saved_debug_locks)
> > > +{
> > > +     preempt_count_set(saved_preempt_count);
> > > +#ifdef CONFIG_TRACE_IRQFLAGS
> > > +     if (softirq_count())
> > > +             current->softirqs_enabled = 0;
> > > +     else
> > > +             current->softirqs_enabled = 1;
> > > +#endif
> >
> > Urgh, don't silently change these... if they're off that's a hard fail.
> >
> >         if (DEBUG_LOCKS_WARN_ON(preempt_count() != saved_preempt_count))
> >                 preempt_count_set(saved_preempt_count);
> >
> > And by using DEBUG_LOCKS_WARN_ON() it will kill IRQ tracing and trigger
> > the below fail.
> 
> Hmm, I see. My original assumption was that lock-related bugs that
> could corrupt preempt_count would always be caught by lockdep
> (resulting in debug_locks already being off). Is this not always true?
> In any case, I think it's better to explicitly show the failure
> associated with preemption count as you have done, but I'm still
> curious.

Code could have an unbalanced preempt_disable() unrelated to locks.

> Also, for further clarification: the check you have made on
> preempt_count also covers softirq_count, right? 

Correct.

> My understanding is
> that softirqs are re-{enabled/disabled} due to the corruption of the
> preemption count, so no changes should occur if the preemption count
> remains the same. If it does change, we've already failed from
> DEBUG_LOCKS_WARN_ON.

local_bh_enable() might call into softirq handling if it got raised
while disabled, you'll miss that here. The next interrupt will likely
run the softirq after that.

This is best effort error recovery, you got a splat, all we aim for is
living long enough to get the user to see it.
