diff mbox series

kunit: added lockdep support

Message ID 20200810213257.438861-1-urielguajardojr@gmail.com (mailing list archive)
State New
Headers show
Series kunit: added lockdep support | expand

Commit Message

Uriel Guajardo Aug. 10, 2020, 9:32 p.m. UTC
From: Uriel Guajardo <urielguajardo@google.com>

KUnit tests will now fail if lockdep detects an error during a test
case.

The idea comes from how lib/locking-selftest [1] checks for lock errors: we
first if lock debugging is turned on. If not, an error must have
occurred, so we fail the test and restart lockdep for the next test case.

Like the locking selftests, we also fix possible preemption count
corruption from lock bugs.

Depends on kunit: support failure from dynamic analysis tools [2]

[1] https://elixir.bootlin.com/linux/v5.7.12/source/lib/locking-selftest.c#L1137

[2] https://lore.kernel.org/linux-kselftest/20200806174326.3577537-1-urielguajardojr@gmail.com/

Signed-off-by: Uriel Guajardo <urielguajardo@google.com>
---
 lib/kunit/test.c | 26 +++++++++++++++++++++++++-
 1 file changed, 25 insertions(+), 1 deletion(-)

Comments

Peter Zijlstra Aug. 10, 2020, 9:43 p.m. UTC | #1
On Mon, Aug 10, 2020 at 09:32:57PM +0000, Uriel Guajardo wrote:
> +static inline void kunit_check_locking_bugs(struct kunit *test,
> +					    unsigned long saved_preempt_count)
> +{
> +	preempt_count_set(saved_preempt_count);
> +#ifdef CONFIG_TRACE_IRQFLAGS
> +	if (softirq_count())
> +		current->softirqs_enabled = 0;
> +	else
> +		current->softirqs_enabled = 1;
> +#endif
> +#if IS_ENABLED(CONFIG_LOCKDEP)
> +	local_irq_disable();
> +	if (!debug_locks) {
> +		kunit_set_failure(test);
> +		lockdep_reset();
> +	}
> +	local_irq_enable();
> +#endif
> +}

Unless you can guarantee this runs before SMP brinup, that
lockdep_reset() is terminally broken.
Uriel Guajardo Aug. 11, 2020, 5:03 p.m. UTC | #2
On Mon, Aug 10, 2020 at 4:43 PM Peter Zijlstra <peterz@infradead.org> wrote:
>
> On Mon, Aug 10, 2020 at 09:32:57PM +0000, Uriel Guajardo wrote:
> > +static inline void kunit_check_locking_bugs(struct kunit *test,
> > +                                         unsigned long saved_preempt_count)
> > +{
> > +     preempt_count_set(saved_preempt_count);
> > +#ifdef CONFIG_TRACE_IRQFLAGS
> > +     if (softirq_count())
> > +             current->softirqs_enabled = 0;
> > +     else
> > +             current->softirqs_enabled = 1;
> > +#endif
> > +#if IS_ENABLED(CONFIG_LOCKDEP)
> > +     local_irq_disable();
> > +     if (!debug_locks) {
> > +             kunit_set_failure(test);
> > +             lockdep_reset();
> > +     }
> > +     local_irq_enable();
> > +#endif
> > +}
>
> Unless you can guarantee this runs before SMP brinup, that
> lockdep_reset() is terminally broken.

Good point. KUnit is initialized after SMP is set up, and KUnit can
also be built as a module, so it's not a guarantee that we can make.
Is there any other way to turn lockdep back on after we detect a
failure? It would be ideal if lockdep could still run in the next test
case after a failure in a previous one.

I suppose we could only display the first failure that occurs, similar
to how lockdep does it. But it could also be useful to developers if
they saw failures in subsequent test cases, with the knowledge that
those failures may be unreliable.
Peter Zijlstra Aug. 11, 2020, 7:05 p.m. UTC | #3
On Tue, Aug 11, 2020 at 12:03:51PM -0500, Uriel Guajardo wrote:
> On Mon, Aug 10, 2020 at 4:43 PM Peter Zijlstra <peterz@infradead.org> wrote:
> >
> > On Mon, Aug 10, 2020 at 09:32:57PM +0000, Uriel Guajardo wrote:
> > > +static inline void kunit_check_locking_bugs(struct kunit *test,
> > > +                                         unsigned long saved_preempt_count)
> > > +{
> > > +     preempt_count_set(saved_preempt_count);
> > > +#ifdef CONFIG_TRACE_IRQFLAGS
> > > +     if (softirq_count())
> > > +             current->softirqs_enabled = 0;
> > > +     else
> > > +             current->softirqs_enabled = 1;
> > > +#endif
> > > +#if IS_ENABLED(CONFIG_LOCKDEP)
> > > +     local_irq_disable();
> > > +     if (!debug_locks) {
> > > +             kunit_set_failure(test);
> > > +             lockdep_reset();
> > > +     }
> > > +     local_irq_enable();
> > > +#endif
> > > +}
> >
> > Unless you can guarantee this runs before SMP brinup, that
> > lockdep_reset() is terminally broken.
> 
> Good point. KUnit is initialized after SMP is set up, and KUnit can
> also be built as a module, so it's not a guarantee that we can make.

Even if you could, there's still the question of wether throwing out all
the dependencies learned during boot is a sensible idea.

> Is there any other way to turn lockdep back on after we detect a
> failure? It would be ideal if lockdep could still run in the next test
> case after a failure in a previous one.

Not really; the moment lockdep reports a failure it turns off all
tracking and we instantly loose state.

You'd have to:

 - delete the 'mistaken' dependency from the graph such that we loose
   the cycle, otherwise it will continue to find and report the cycle.

 - put every task through a known empty state which turns the tracking
   back on.

Bart implemented most of what you need for the first item last year or
so, but the remaining bit and the second item would still be a fair
amount of work.

Also, I'm really not sure it's worth it, the kernel should be free of
lock cycles, so just fix one, reboot and continue.

> I suppose we could only display the first failure that occurs, similar
> to how lockdep does it. But it could also be useful to developers if
> they saw failures in subsequent test cases, with the knowledge that
> those failures may be unreliable.

People already struggle with lockdep reports enough; I really don't want
to given them dodgy report to worry about.
Uriel Guajardo Aug. 11, 2020, 10:22 p.m. UTC | #4
On Tue, Aug 11, 2020 at 2:05 PM <peterz@infradead.org> wrote:
>
> On Tue, Aug 11, 2020 at 12:03:51PM -0500, Uriel Guajardo wrote:
> > On Mon, Aug 10, 2020 at 4:43 PM Peter Zijlstra <peterz@infradead.org> wrote:
> > >
> > > On Mon, Aug 10, 2020 at 09:32:57PM +0000, Uriel Guajardo wrote:
> > > > +static inline void kunit_check_locking_bugs(struct kunit *test,
> > > > +                                         unsigned long saved_preempt_count)
> > > > +{
> > > > +     preempt_count_set(saved_preempt_count);
> > > > +#ifdef CONFIG_TRACE_IRQFLAGS
> > > > +     if (softirq_count())
> > > > +             current->softirqs_enabled = 0;
> > > > +     else
> > > > +             current->softirqs_enabled = 1;
> > > > +#endif
> > > > +#if IS_ENABLED(CONFIG_LOCKDEP)
> > > > +     local_irq_disable();
> > > > +     if (!debug_locks) {
> > > > +             kunit_set_failure(test);
> > > > +             lockdep_reset();
> > > > +     }
> > > > +     local_irq_enable();
> > > > +#endif
> > > > +}
> > >
> > > Unless you can guarantee this runs before SMP brinup, that
> > > lockdep_reset() is terminally broken.
> >
> > Good point. KUnit is initialized after SMP is set up, and KUnit can
> > also be built as a module, so it's not a guarantee that we can make.
>
> Even if you could, there's still the question of wether throwing out all
> the dependencies learned during boot is a sensible idea.
>
> > Is there any other way to turn lockdep back on after we detect a
> > failure? It would be ideal if lockdep could still run in the next test
> > case after a failure in a previous one.
>
> Not really; the moment lockdep reports a failure it turns off all
> tracking and we instantly loose state.
>
> You'd have to:
>
>  - delete the 'mistaken' dependency from the graph such that we loose
>    the cycle, otherwise it will continue to find and report the cycle.
>
>  - put every task through a known empty state which turns the tracking
>    back on.
>
> Bart implemented most of what you need for the first item last year or
> so, but the remaining bit and the second item would still be a fair
> amount of work.
>
> Also, I'm really not sure it's worth it, the kernel should be free of
> lock cycles, so just fix one, reboot and continue.
>
> > I suppose we could only display the first failure that occurs, similar
> > to how lockdep does it. But it could also be useful to developers if
> > they saw failures in subsequent test cases, with the knowledge that
> > those failures may be unreliable.
>
> People already struggle with lockdep reports enough; I really don't want
> to given them dodgy report to worry about.

Ah, ok! Fair enough, thanks for the info. Although resetting lockdep
would be nice to have in the future, I think it's enough to only
report the first failure and warn the user that further test cases
will have lockdep disabled. People can then fix the issue and then
re-run it. I'll follow up with a patch that does this.
diff mbox series

Patch

diff --git a/lib/kunit/test.c b/lib/kunit/test.c
index d8189d827368..0838ececa005 100644
--- a/lib/kunit/test.c
+++ b/lib/kunit/test.c
@@ -11,6 +11,8 @@ 
 #include <linux/kref.h>
 #include <linux/sched/debug.h>
 #include <linux/sched.h>
+#include <linux/lockdep.h>
+#include <linux/debug_locks.h>
 
 #include "debugfs.h"
 #include "string-stream.h"
@@ -22,6 +24,26 @@  void kunit_fail_current_test(void)
 		kunit_set_failure(current->kunit_test);
 }
 
+static inline void kunit_check_locking_bugs(struct kunit *test,
+					    unsigned long saved_preempt_count)
+{
+	preempt_count_set(saved_preempt_count);
+#ifdef CONFIG_TRACE_IRQFLAGS
+	if (softirq_count())
+		current->softirqs_enabled = 0;
+	else
+		current->softirqs_enabled = 1;
+#endif
+#if IS_ENABLED(CONFIG_LOCKDEP)
+	local_irq_disable();
+	if (!debug_locks) {
+		kunit_set_failure(test);
+		lockdep_reset();
+	}
+	local_irq_enable();
+#endif
+}
+
 static void kunit_print_tap_version(void)
 {
 	static bool kunit_has_printed_tap_version;
@@ -289,6 +311,7 @@  static void kunit_try_run_case(void *data)
 	struct kunit *test = ctx->test;
 	struct kunit_suite *suite = ctx->suite;
 	struct kunit_case *test_case = ctx->test_case;
+	unsigned long saved_preempt_count = preempt_count();
 
 	current->kunit_test = test;
 
@@ -298,7 +321,8 @@  static void kunit_try_run_case(void *data)
 	 * thread will resume control and handle any necessary clean up.
 	 */
 	kunit_run_case_internal(test, suite, test_case);
-	/* This line may never be reached. */
+	/* These lines may never be reached. */
+	kunit_check_locking_bugs(test, saved_preempt_count);
 	kunit_run_case_cleanup(test, suite);
 }