diff mbox

possible circular locking dependency detected [was: linux-next: Tree for Aug 22]

Message ID 20170823043813.GH11771@tardis (mailing list archive)
State New, archived
Headers show

Commit Message

Boqun Feng Aug. 23, 2017, 4:38 a.m. UTC
On Wed, Aug 23, 2017 at 11:49:51AM +0800, Boqun Feng wrote:
> Hi Byungchul,
> 
> On Wed, Aug 23, 2017 at 09:03:04AM +0900, Byungchul Park wrote:
> > On Tue, Aug 22, 2017 at 09:43:56PM +0000, Bart Van Assche wrote:
> > > On Tue, 2017-08-22 at 19:47 +0900, Sergey Senozhatsky wrote:
> > > > ======================================================
> > > > WARNING: possible circular locking dependency detected
> > > > 4.13.0-rc6-next-20170822-dbg-00020-g39758ed8aae0-dirty #1746 Not tainted
> > > > ------------------------------------------------------
> > > > fsck.ext4/148 is trying to acquire lock:
> > > >  (&bdev->bd_mutex){+.+.}, at: [<ffffffff8116e73e>] __blkdev_put+0x33/0x190
> > > > 
> > > >  but now in release context of a crosslock acquired at the following:
> > > >  ((complete)&wait#2){+.+.}, at: [<ffffffff812159e0>] blk_execute_rq+0xbb/0xda
> > > > 
> > > >  which lock already depends on the new lock.
> > > > 
> 
> I felt this message really misleading, because the deadlock is detected
> at the commit time of "((complete)&wait#2)" rather than the acquisition
> time of "(&bdev->bd_mutex)", so I made the following improvement.
> 
> Thoughts?
> 
> Regards,
> Boqun
> 

While I'm on this one, I think we should also add a case in @check_src
is a cross lock, i.e. we detect cross deadlock at the acquisition time
of the cross lock. How about the following?

Regards,
Boqun

--------------------------------------->8
From: Boqun Feng <boqun.feng@gmail.com>
Date: Wed, 23 Aug 2017 12:12:16 +0800
Subject: [PATCH] lockdep: Print proper scenario if cross deadlock detected at
 acquisition time

For a potential deadlock about CROSSRELEASE as follow:

	P1		P2
	===========	=============
	lock(A)
	lock(X)
			lock(A)
			commit(X)

	A: normal lock, X: cross lock

, we could detect it at two places:

1. commit time:

	We have run P1 first, and have dependency A --> X in graph, and
	then we run P2, and find the deadlock.

2. acquisition time:

	We have run P2 first, and have dependency A --> X, in
	graph(because another P3 may run previously and is acquiring for
	lock X), and then we run P1 and find the deadlock.

In current print_circular_lock_scenario(), for 1) we could print the
right scenario and note that's a deadlock related to CROSSRELEASE,
however for 2) we print the scenario as a normal lockdep deadlock.

It's better to print a proper scenario related to CROSSRELEASE to help
users find their bugs more easily, so improve this.

Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
---
 kernel/locking/lockdep.c | 17 +++++++++++++++++
 1 file changed, 17 insertions(+)

Comments

Sergey Senozhatsky Aug. 23, 2017, 4:46 a.m. UTC | #1
On (08/23/17 12:38), Boqun Feng wrote:
[..]
> diff --git a/kernel/locking/lockdep.c b/kernel/locking/lockdep.c
> index 642fb5362507..a3709e15f609 100644
> --- a/kernel/locking/lockdep.c
> +++ b/kernel/locking/lockdep.c
> @@ -1156,6 +1156,23 @@ print_circular_lock_scenario(struct held_lock *src,
>  		__print_lock_name(target);
>  		printk(KERN_CONT ");\n");

KERN_CONT and "\n" should not be together. "\n" flushes the cont
buffer immediately.

	-ss

>  		printk("\n *** DEADLOCK ***\n\n");
> +	} else if (cross_lock(src->instance)) {
> +		printk(" Possible unsafe locking scenario by crosslock:\n\n");
> +		printk("       CPU0                    CPU1\n");
> +		printk("       ----                    ----\n");
> +		printk("  lock(");
> +		__print_lock_name(target);
> +		printk(KERN_CONT ");\n");
> +		printk("  lock(");
> +		__print_lock_name(source);
> +		printk(KERN_CONT ");\n");
> +		printk("                               lock(");
> +		__print_lock_name(parent == source ? target : parent);
> +		printk(KERN_CONT ");\n");
> +		printk("                               unlock(");
> +		__print_lock_name(source);
> +		printk(KERN_CONT ");\n");
> +		printk("\n *** DEADLOCK ***\n\n");
>  	} else {
>  		printk(" Possible unsafe locking scenario:\n\n");
>  		printk("       CPU0                    CPU1\n");
> -- 
> 2.14.1
>
Boqun Feng Aug. 23, 2017, 5:35 a.m. UTC | #2
On Wed, Aug 23, 2017 at 01:46:48PM +0900, Sergey Senozhatsky wrote:
> On (08/23/17 12:38), Boqun Feng wrote:
> [..]
> > diff --git a/kernel/locking/lockdep.c b/kernel/locking/lockdep.c
> > index 642fb5362507..a3709e15f609 100644
> > --- a/kernel/locking/lockdep.c
> > +++ b/kernel/locking/lockdep.c
> > @@ -1156,6 +1156,23 @@ print_circular_lock_scenario(struct held_lock *src,
> >  		__print_lock_name(target);
> >  		printk(KERN_CONT ");\n");
> 
> KERN_CONT and "\n" should not be together. "\n" flushes the cont
> buffer immediately.
> 

Hmm.. Not quite familiar with printk() stuffs, but I could see several
usages of printk(KERN_CONT "...\n") in kernel.

Did a bit research myself, and I now think the inappropriate use is to
use a KERN_CONT printk *after* another printk ending with a "\n". Am I
missing some recent changes or rules of KERN_CONT?

Regards,
Boqun

> 	-ss
> 
> >  		printk("\n *** DEADLOCK ***\n\n");
> > +	} else if (cross_lock(src->instance)) {
> > +		printk(" Possible unsafe locking scenario by crosslock:\n\n");
> > +		printk("       CPU0                    CPU1\n");
> > +		printk("       ----                    ----\n");
> > +		printk("  lock(");
> > +		__print_lock_name(target);
> > +		printk(KERN_CONT ");\n");
> > +		printk("  lock(");
> > +		__print_lock_name(source);
> > +		printk(KERN_CONT ");\n");
> > +		printk("                               lock(");
> > +		__print_lock_name(parent == source ? target : parent);
> > +		printk(KERN_CONT ");\n");
> > +		printk("                               unlock(");
> > +		__print_lock_name(source);
> > +		printk(KERN_CONT ");\n");
> > +		printk("\n *** DEADLOCK ***\n\n");
> >  	} else {
> >  		printk(" Possible unsafe locking scenario:\n\n");
> >  		printk("       CPU0                    CPU1\n");
> > -- 
> > 2.14.1
> >
Byungchul Park Aug. 23, 2017, 5:44 a.m. UTC | #3
On Wed, Aug 23, 2017 at 12:38:13PM +0800, Boqun Feng wrote:
> From: Boqun Feng <boqun.feng@gmail.com>
> Date: Wed, 23 Aug 2017 12:12:16 +0800
> Subject: [PATCH] lockdep: Print proper scenario if cross deadlock detected at
>  acquisition time
> 
> For a potential deadlock about CROSSRELEASE as follow:
> 
> 	P1		P2
> 	===========	=============
> 	lock(A)
> 	lock(X)
> 			lock(A)
> 			commit(X)
> 
> 	A: normal lock, X: cross lock
> 
> , we could detect it at two places:
> 
> 1. commit time:
> 
> 	We have run P1 first, and have dependency A --> X in graph, and
> 	then we run P2, and find the deadlock.
> 
> 2. acquisition time:
> 
> 	We have run P2 first, and have dependency A --> X, in

X -> A

> 	graph(because another P3 may run previously and is acquiring for

".. another P3 may have run previously and was holding .."
            ^
    Additionally, not only P3 but also P2 like:

	lock(A)
	lock(X)
			lock(X) // I mean it's at _P2_
			lock(A)
			commit(X)

> 	lock X), and then we run P1 and find the deadlock.
>
> In current print_circular_lock_scenario(), for 1) we could print the
> right scenario and note that's a deadlock related to CROSSRELEASE,
> however for 2) we print the scenario as a normal lockdep deadlock.
> 
> It's better to print a proper scenario related to CROSSRELEASE to help
> users find their bugs more easily, so improve this.
> 
> Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
> ---
>  kernel/locking/lockdep.c | 17 +++++++++++++++++
>  1 file changed, 17 insertions(+)
> 
> diff --git a/kernel/locking/lockdep.c b/kernel/locking/lockdep.c
> index 642fb5362507..a3709e15f609 100644
> --- a/kernel/locking/lockdep.c
> +++ b/kernel/locking/lockdep.c
> @@ -1156,6 +1156,23 @@ print_circular_lock_scenario(struct held_lock *src,
>  		__print_lock_name(target);
>  		printk(KERN_CONT ");\n");
>  		printk("\n *** DEADLOCK ***\n\n");
> +	} else if (cross_lock(src->instance)) {
> +		printk(" Possible unsafe locking scenario by crosslock:\n\n");
> +		printk("       CPU0                    CPU1\n");
> +		printk("       ----                    ----\n");
> +		printk("  lock(");
> +		__print_lock_name(target);
> +		printk(KERN_CONT ");\n");
> +		printk("  lock(");
> +		__print_lock_name(source);
> +		printk(KERN_CONT ");\n");
> +		printk("                               lock(");
> +		__print_lock_name(parent == source ? target : parent);
> +		printk(KERN_CONT ");\n");
> +		printk("                               unlock(");
> +		__print_lock_name(source);
> +		printk(KERN_CONT ");\n");
> +		printk("\n *** DEADLOCK ***\n\n");
>  	} else {
>  		printk(" Possible unsafe locking scenario:\n\n");
>  		printk("       CPU0                    CPU1\n");

I need time to be sure if it's correct.
Sergey Senozhatsky Aug. 23, 2017, 5:44 a.m. UTC | #4
On (08/23/17 13:35), Boqun Feng wrote:
[..]
> > >  		printk(KERN_CONT ");\n");
> > 
> > KERN_CONT and "\n" should not be together. "\n" flushes the cont
> > buffer immediately.
> > 
> 
> Hmm.. Not quite familiar with printk() stuffs, but I could see several
> usages of printk(KERN_CONT "...\n") in kernel.
> 
> Did a bit research myself, and I now think the inappropriate use is to
> use a KERN_CONT printk *after* another printk ending with a "\n". Am I
> missing some recent changes or rules of KERN_CONT?

has been this way for quite some time (if not always).

LOG_NEWLINE results in cont_flush(), which log_store() the content
of KERN_CONT buffer.

if we see that supplied message has no \n then we store it in a
dedicated buffer (cont buffer)

	if (!(lflags & LOG_NEWLINE))
		return cont_add();
	
	return log_store();

we flush that buffer (move its content to the kernel log buffer) when
we receive a message with a \n or when printk() from another task/context
interrupts the current cont line and, thus, forces us to flush.

	-ss
Sergey Senozhatsky Aug. 23, 2017, 5:55 a.m. UTC | #5
On (08/23/17 13:35), Boqun Feng wrote:
> > KERN_CONT and "\n" should not be together. "\n" flushes the cont
> > buffer immediately.
> > 
> 
> Hmm.. Not quite familiar with printk() stuffs, but I could see several
> usages of printk(KERN_CONT "...\n") in kernel.
> 
> Did a bit research myself, and I now think the inappropriate use is to
> use a KERN_CONT printk *after* another printk ending with a "\n".

ah... I didn't check __print_lock_name(): it leaves unflushed cont buffer
upon the return. sorry, your code is correct.

	-ss

> > >  		printk("\n *** DEADLOCK ***\n\n");
> > > +	} else if (cross_lock(src->instance)) {
> > > +		printk(" Possible unsafe locking scenario by crosslock:\n\n");
> > > +		printk("       CPU0                    CPU1\n");
> > > +		printk("       ----                    ----\n");
> > > +		printk("  lock(");
> > > +		__print_lock_name(target);
> > > +		printk(KERN_CONT ");\n");
> > > +		printk("  lock(");
> > > +		__print_lock_name(source);
> > > +		printk(KERN_CONT ");\n");
> > > +		printk("                               lock(");
> > > +		__print_lock_name(parent == source ? target : parent);
> > > +		printk(KERN_CONT ");\n");
> > > +		printk("                               unlock(");
> > > +		__print_lock_name(source);
> > > +		printk(KERN_CONT ");\n");
> > > +		printk("\n *** DEADLOCK ***\n\n");
> > >  	} else {
> > >  		printk(" Possible unsafe locking scenario:\n\n");
> > >  		printk("       CPU0                    CPU1\n");
> > > -- 
> > > 2.14.1
> > >
Boqun Feng Aug. 24, 2017, 4:39 a.m. UTC | #6
On Wed, Aug 23, 2017 at 02:55:17PM +0900, Sergey Senozhatsky wrote:
> On (08/23/17 13:35), Boqun Feng wrote:
> > > KERN_CONT and "\n" should not be together. "\n" flushes the cont
> > > buffer immediately.
> > > 
> > 
> > Hmm.. Not quite familiar with printk() stuffs, but I could see several
> > usages of printk(KERN_CONT "...\n") in kernel.
> > 
> > Did a bit research myself, and I now think the inappropriate use is to
> > use a KERN_CONT printk *after* another printk ending with a "\n".
> 
> ah... I didn't check __print_lock_name(): it leaves unflushed cont buffer
> upon the return. sorry, your code is correct.
> 

So means printk(KERN_CON "..."); + printk(KERN_CONT "...\n") is a
correct usage, right? Thanks. Again, not familiar with printk stuff,
glad you can help me go through this ;-)

Regards,
Boqun

> 	-ss
> 
> > > >  		printk("\n *** DEADLOCK ***\n\n");
> > > > +	} else if (cross_lock(src->instance)) {
> > > > +		printk(" Possible unsafe locking scenario by crosslock:\n\n");
> > > > +		printk("       CPU0                    CPU1\n");
> > > > +		printk("       ----                    ----\n");
> > > > +		printk("  lock(");
> > > > +		__print_lock_name(target);
> > > > +		printk(KERN_CONT ");\n");
> > > > +		printk("  lock(");
> > > > +		__print_lock_name(source);
> > > > +		printk(KERN_CONT ");\n");
> > > > +		printk("                               lock(");
> > > > +		__print_lock_name(parent == source ? target : parent);
> > > > +		printk(KERN_CONT ");\n");
> > > > +		printk("                               unlock(");
> > > > +		__print_lock_name(source);
> > > > +		printk(KERN_CONT ");\n");
> > > > +		printk("\n *** DEADLOCK ***\n\n");
> > > >  	} else {
> > > >  		printk(" Possible unsafe locking scenario:\n\n");
> > > >  		printk("       CPU0                    CPU1\n");
> > > > -- 
> > > > 2.14.1
> > > > 
> 
>
Sergey Senozhatsky Aug. 24, 2017, 4:49 a.m. UTC | #7
Hi,

On (08/24/17 12:39), Boqun Feng wrote:
> On Wed, Aug 23, 2017 at 02:55:17PM +0900, Sergey Senozhatsky wrote:
> > On (08/23/17 13:35), Boqun Feng wrote:
> > > > KERN_CONT and "\n" should not be together. "\n" flushes the cont
> > > > buffer immediately.
> > > > 
> > > 
> > > Hmm.. Not quite familiar with printk() stuffs, but I could see several
> > > usages of printk(KERN_CONT "...\n") in kernel.
> > > 
> > > Did a bit research myself, and I now think the inappropriate use is to
> > > use a KERN_CONT printk *after* another printk ending with a "\n".
> > 
> > ah... I didn't check __print_lock_name(): it leaves unflushed cont buffer
> > upon the return. sorry, your code is correct.
> > 
> 
> So means printk(KERN_CON "..."); + printk(KERN_CONT "...\n") is a
> correct usage, right?

well, yes. with one precondition - there should be no printk-s from other
CPUs/tasks in between

	printk(KERN_CON "...");  +  printk(KERN_CONT "...\n")
                               ^^^^^
			here we can have a preliminary flush and broken
			cont line. but it's been this way forever.

	-ss
diff mbox

Patch

diff --git a/kernel/locking/lockdep.c b/kernel/locking/lockdep.c
index 642fb5362507..a3709e15f609 100644
--- a/kernel/locking/lockdep.c
+++ b/kernel/locking/lockdep.c
@@ -1156,6 +1156,23 @@  print_circular_lock_scenario(struct held_lock *src,
 		__print_lock_name(target);
 		printk(KERN_CONT ");\n");
 		printk("\n *** DEADLOCK ***\n\n");
+	} else if (cross_lock(src->instance)) {
+		printk(" Possible unsafe locking scenario by crosslock:\n\n");
+		printk("       CPU0                    CPU1\n");
+		printk("       ----                    ----\n");
+		printk("  lock(");
+		__print_lock_name(target);
+		printk(KERN_CONT ");\n");
+		printk("  lock(");
+		__print_lock_name(source);
+		printk(KERN_CONT ");\n");
+		printk("                               lock(");
+		__print_lock_name(parent == source ? target : parent);
+		printk(KERN_CONT ");\n");
+		printk("                               unlock(");
+		__print_lock_name(source);
+		printk(KERN_CONT ");\n");
+		printk("\n *** DEADLOCK ***\n\n");
 	} else {
 		printk(" Possible unsafe locking scenario:\n\n");
 		printk("       CPU0                    CPU1\n");