mbox series

[v3,00/21] DEPT(Dependency Tracker)

Message ID 1646042220-28952-1-git-send-email-byungchul.park@lge.com (mailing list archive)
Headers show
Series DEPT(Dependency Tracker) | expand

Message

Byungchul Park Feb. 28, 2022, 9:56 a.m. UTC
I didn't want to bother you so I was planning to send the next spin
after making more progress. However, PATCH v2 reports too many false
positives because Dept tracked the bit_wait_table[] wrong way - I
apologize for that. So I decided to send PATCH v3 first before going
further for those who want to run Dept for now.

There might still be some false positives but not overwhelming.

---

Hi Linus and folks,

I've been developing a tool for detecting deadlock possibilities by
tracking wait/event rather than lock(?) acquisition order to try to
cover all synchonization machanisms. It's done on v5.17-rc1 tag.

https://github.com/lgebyungchulpark/linux-dept/commits/dept1.14_on_v5.17-rc1

Benifit:

	0. Works with all lock primitives.
	1. Works with wait_for_completion()/complete().
	2. Works with 'wait' on PG_locked.
	3. Works with 'wait' on PG_writeback.
	4. Works with swait/wakeup.
	5. Works with waitqueue.
	6. Multiple reports are allowed.
	7. Deduplication control on multiple reports.
	8. Withstand false positives thanks to 6.
	9. Easy to tag any wait/event.

Future work:

	0. To make it more stable.
	1. To separates Dept from Lockdep.
	2. To improves performance in terms of time and space.
	3. To use Dept as a dependency engine for Lockdep.
	4. To add any missing tags of wait/event in the kernel.
	5. To deduplicate stack trace.

How to interpret reports:

	1. E(event) in each context cannot be triggered because of the
	   W(wait) that cannot be woken.
	2. The stack trace helping find the problematic code is located
	   in each conext's detail.

Thanks,
Byungchul

---

Changes from v2:

	1. Disable Dept on bit_wait_table[] in sched/wait_bit.c
	   reporting a lot of false positives, which is my fault.
	   Wait/event for bit_wait_table[] should've been tagged in a
	   higher layer for better work, which is a future work.
	   (feedback from Jan Kara)
	2. Disable Dept on crypto_larval's completion to prevent a false
	   positive.

Changes from v1:

	1. Fix coding style and typo. (feedback from Steven)
	2. Distinguish each work context from another in workqueue.
	3. Skip checking lock acquisition with nest_lock, which is about
	   correct lock usage that should be checked by Lockdep.

Changes from RFC:

	1. Prevent adding a wait tag at prepare_to_wait() but __schedule().
	   (feedback from Linus and Matthew)
	2. Use try version at lockdep_acquire_cpus_lock() annotation.
	3. Distinguish each syscall context from another.

Byungchul Park (21):
  llist: Move llist_{head,node} definition to types.h
  dept: Implement Dept(Dependency Tracker)
  dept: Embed Dept data in Lockdep
  dept: Add a API for skipping dependency check temporarily
  dept: Apply Dept to spinlock
  dept: Apply Dept to mutex families
  dept: Apply Dept to rwlock
  dept: Apply Dept to wait_for_completion()/complete()
  dept: Apply Dept to seqlock
  dept: Apply Dept to rwsem
  dept: Add proc knobs to show stats and dependency graph
  dept: Introduce split map concept and new APIs for them
  dept: Apply Dept to wait/event of PG_{locked,writeback}
  dept: Apply SDT to swait
  dept: Apply SDT to wait(waitqueue)
  locking/lockdep, cpu/hotplus: Use a weaker annotation in AP thread
  dept: Distinguish each syscall context from another
  dept: Distinguish each work from another
  dept: Disable Dept within the wait_bit layer by default
  dept: Add nocheck version of init_completion()
  dept: Disable Dept on struct crypto_larval's completion for now

 crypto/api.c                       |    7 +-
 include/linux/completion.h         |   50 +-
 include/linux/dept.h               |  535 +++++++
 include/linux/dept_page.h          |   78 ++
 include/linux/dept_sdt.h           |   62 +
 include/linux/hardirq.h            |    3 +
 include/linux/irqflags.h           |   33 +-
 include/linux/llist.h              |    8 -
 include/linux/lockdep.h            |  158 ++-
 include/linux/lockdep_types.h      |    3 +
 include/linux/mutex.h              |   33 +
 include/linux/page-flags.h         |   45 +-
 include/linux/pagemap.h            |    7 +-
 include/linux/percpu-rwsem.h       |   10 +-
 include/linux/rtmutex.h            |    7 +
 include/linux/rwlock.h             |   52 +
 include/linux/rwlock_api_smp.h     |    8 +-
 include/linux/rwlock_types.h       |    7 +
 include/linux/rwsem.h              |   33 +
 include/linux/sched.h              |    7 +
 include/linux/seqlock.h            |   59 +-
 include/linux/spinlock.h           |   26 +
 include/linux/spinlock_types_raw.h |   13 +
 include/linux/swait.h              |    4 +
 include/linux/types.h              |    8 +
 include/linux/wait.h               |    6 +-
 init/init_task.c                   |    2 +
 init/main.c                        |    4 +
 kernel/Makefile                    |    1 +
 kernel/cpu.c                       |    2 +-
 kernel/dependency/Makefile         |    4 +
 kernel/dependency/dept.c           | 2712 ++++++++++++++++++++++++++++++++++++
 kernel/dependency/dept_hash.h      |   10 +
 kernel/dependency/dept_internal.h  |   26 +
 kernel/dependency/dept_object.h    |   13 +
 kernel/dependency/dept_proc.c      |   92 ++
 kernel/entry/common.c              |    3 +
 kernel/exit.c                      |    1 +
 kernel/fork.c                      |    2 +
 kernel/locking/lockdep.c           |   12 +-
 kernel/module.c                    |    2 +
 kernel/sched/completion.c          |   12 +-
 kernel/sched/core.c                |    3 +
 kernel/sched/swait.c               |   10 +
 kernel/sched/wait.c                |   16 +
 kernel/sched/wait_bit.c            |    5 +-
 kernel/softirq.c                   |    6 +-
 kernel/trace/trace_preemptirq.c    |   19 +-
 kernel/workqueue.c                 |    3 +
 lib/Kconfig.debug                  |   21 +
 mm/filemap.c                       |   68 +
 mm/page_ext.c                      |    5 +
 52 files changed, 4257 insertions(+), 59 deletions(-)
 create mode 100644 include/linux/dept.h
 create mode 100644 include/linux/dept_page.h
 create mode 100644 include/linux/dept_sdt.h
 create mode 100644 kernel/dependency/Makefile
 create mode 100644 kernel/dependency/dept.c
 create mode 100644 kernel/dependency/dept_hash.h
 create mode 100644 kernel/dependency/dept_internal.h
 create mode 100644 kernel/dependency/dept_object.h
 create mode 100644 kernel/dependency/dept_proc.c

Comments

Hyeonggon Yoo March 2, 2022, 4:36 a.m. UTC | #1
On Mon, Feb 28, 2022 at 06:56:39PM +0900, Byungchul Park wrote:
> I didn't want to bother you so I was planning to send the next spin
> after making more progress. However, PATCH v2 reports too many false
> positives because Dept tracked the bit_wait_table[] wrong way - I
> apologize for that. So I decided to send PATCH v3 first before going
> further for those who want to run Dept for now.
> 
> There might still be some false positives but not overwhelming.
>

Hello Byungchul, I'm running DEPT v3 on my system
and I see report below.

Looking at the kmemleak code and comment, I think
kmemleak tried to avoid lockdep recursive warning
but detected by DEPT?

===================================================
DEPT: Circular dependency has been detected.
5.17.0-rc1+ #1 Tainted: G        W
---------------------------------------------------
summary
---------------------------------------------------
*** AA DEADLOCK ***

context A
    [S] __raw_spin_lock_irqsave(&object->lock:0)
    [W] _raw_spin_lock_nested(&object->lock:0)
    [E] spin_unlock(&object->lock:0)

[S]: start of the event context
[W]: the wait blocked
[E]: the event not reachable
---------------------------------------------------
context A's detail
---------------------------------------------------
context A
    [S] __raw_spin_lock_irqsave(&object->lock:0)
    [W] _raw_spin_lock_nested(&object->lock:0)
    [E] spin_unlock(&object->lock:0)

[S] __raw_spin_lock_irqsave(&object->lock:0):
[<ffffffc00810302c>] scan_gray_list+0x84/0x13c
stacktrace:
      dept_ecxt_enter+0x88/0xf4
      _raw_spin_lock_irqsave+0xf0/0x1c4
      scan_gray_list+0x84/0x13c
      kmemleak_scan+0x2d8/0x54c
      kmemleak_scan_thread+0xac/0xd4
      kthread+0xd4/0xe4
      ret_from_fork+0x10/0x20

[E] spin_unlock(&object->lock:0):
[<ffffffc008102ee0>] scan_block+0x60/0x128
---------------------------------------------------
information that might be helpful
---------------------------------------------------
CPU: 1 PID: 38 Comm: kmemleak Tainted: G        W         5.17.0-rc1+ #1
Hardware name: linux,dummy-virt (DT)
Call trace:
 dump_backtrace.part.0+0x9c/0xc4
 show_stack+0x14/0x28
 dump_stack_lvl+0x9c/0xcc
 dump_stack+0x14/0x2c
 print_circle+0x2d4/0x438
 cb_check_dl+0x44/0x70
 bfs+0x60/0x168
 add_dep+0x88/0x11c
 add_wait+0x2d0/0x2dc
 __dept_wait+0x8c/0xa4
 dept_wait+0x6c/0x88
 _raw_spin_lock_nested+0xa8/0x1b0
 scan_block+0xb4/0x128
 scan_gray_list+0xc4/0x13c
 kmemleak_scan+0x2d8/0x54c
 kmemleak_scan_thread+0xac/0xd4
 kthread+0xd4/0xe4
 ret_from_fork+0x10/0x20

> ---
> 
> Hi Linus and folks,
> 
> I've been developing a tool for detecting deadlock possibilities by
> tracking wait/event rather than lock(?) acquisition order to try to
> cover all synchonization machanisms. It's done on v5.17-rc1 tag.
> 
> https://github.com/lgebyungchulpark/linux-dept/commits/dept1.14_on_v5.17-rc1
>
[...]
> Benifit:
> 
> 	0. Works with all lock primitives.
> 	1. Works with wait_for_completion()/complete().
> 	2. Works with 'wait' on PG_locked.
> 	3. Works with 'wait' on PG_writeback.
> 	4. Works with swait/wakeup.
> 	5. Works with waitqueue.
> 	6. Multiple reports are allowed.
> 	7. Deduplication control on multiple reports.
> 	8. Withstand false positives thanks to 6.
> 	9. Easy to tag any wait/event.
> 
> Future work:
> 
> 	0. To make it more stable.
> 	1. To separates Dept from Lockdep.
> 	2. To improves performance in terms of time and space.
> 	3. To use Dept as a dependency engine for Lockdep.
> 	4. To add any missing tags of wait/event in the kernel.
> 	5. To deduplicate stack trace.
> 
> How to interpret reports:
> 
> 	1. E(event) in each context cannot be triggered because of the
> 	   W(wait) that cannot be woken.
> 	2. The stack trace helping find the problematic code is located
> 	   in each conext's detail.
> 
> Thanks,
> Byungchul
> 
> ---
> 
> Changes from v2:
> 
> 	1. Disable Dept on bit_wait_table[] in sched/wait_bit.c
> 	   reporting a lot of false positives, which is my fault.
> 	   Wait/event for bit_wait_table[] should've been tagged in a
> 	   higher layer for better work, which is a future work.
> 	   (feedback from Jan Kara)
> 	2. Disable Dept on crypto_larval's completion to prevent a false
> 	   positive.
> 
> Changes from v1:
> 
> 	1. Fix coding style and typo. (feedback from Steven)
> 	2. Distinguish each work context from another in workqueue.
> 	3. Skip checking lock acquisition with nest_lock, which is about
> 	   correct lock usage that should be checked by Lockdep.
> 
> Changes from RFC:
> 
> 	1. Prevent adding a wait tag at prepare_to_wait() but __schedule().
> 	   (feedback from Linus and Matthew)
> 	2. Use try version at lockdep_acquire_cpus_lock() annotation.
> 	3. Distinguish each syscall context from another.
[ ... ]
Hyeonggon Yoo March 2, 2022, 4:53 a.m. UTC | #2
On Wed, Mar 02, 2022 at 04:36:38AM +0000, Hyeonggon Yoo wrote:
> On Mon, Feb 28, 2022 at 06:56:39PM +0900, Byungchul Park wrote:
> > I didn't want to bother you so I was planning to send the next spin
> > after making more progress. However, PATCH v2 reports too many false
> > positives because Dept tracked the bit_wait_table[] wrong way - I
> > apologize for that. So I decided to send PATCH v3 first before going
> > further for those who want to run Dept for now.
> > 
> > There might still be some false positives but not overwhelming.
> >
> 
> Hello Byungchul, I'm running DEPT v3 on my system
> and I see report below.
> 
> Looking at the kmemleak code and comment, I think
> kmemleak tried to avoid lockdep recursive warning
> but detected by DEPT?
>

Forgot to include another warning caused by DEPT.

And comment below might be useful for debugging:

in kmemleak.c:
  43  * Locks and mutexes are acquired/nested in the following order:
  44  *
  45  *   scan_mutex [-> object->lock] -> kmemleak_lock -> other_object->lock (SINGLE_DEPTH_NESTING)
  46  *
  47  * No kmemleak_lock and object->lock nesting is allowed outside scan_mutex
  48  * regions.

===================================================
DEPT: Circular dependency has been detected.
5.17.0-rc1+ #1 Tainted: G        W        
---------------------------------------------------
summary
---------------------------------------------------
*** DEADLOCK ***

context A
    [S] __raw_spin_lock_irqsave(&object->lock:0)
    [W] __raw_spin_lock_irqsave(kmemleak_lock:0)
    [E] spin_unlock(&object->lock:0)

context B
    [S] __raw_spin_lock_irqsave(kmemleak_lock:0)
    [W] _raw_spin_lock_nested(&object->lock:0)
    [E] spin_unlock(kmemleak_lock:0)

[S]: start of the event context
[W]: the wait blocked
[E]: the event not reachable

---------------------------------------------------
context A's detail
---------------------------------------------------
context A
    [S] __raw_spin_lock_irqsave(&object->lock:0)
    [W] __raw_spin_lock_irqsave(kmemleak_lock:0)
    [E] spin_unlock(&object->lock:0)

[S] __raw_spin_lock_irqsave(&object->lock:0):
[<ffffffc00810302c>] scan_gray_list+0x84/0x13c
stacktrace:
      dept_ecxt_enter+0x88/0xf4
      _raw_spin_lock_irqsave+0xf0/0x1c4
      scan_gray_list+0x84/0x13c
      kmemleak_scan+0x2d8/0x54c
      kmemleak_scan_thread+0xac/0xd4
      kthread+0xd4/0xe4
      ret_from_fork+0x10/0x20

[W] __raw_spin_lock_irqsave(kmemleak_lock:0):
[<ffffffc008102ebc>] scan_block+0x3c/0x128
stacktrace:
      __dept_wait+0x8c/0xa4
      dept_wait+0x6c/0x88
      _raw_spin_lock_irqsave+0xb8/0x1c4
      scan_block+0x3c/0x128
      scan_gray_list+0xc4/0x13c
      kmemleak_scan+0x2d8/0x54c
      kmemleak_scan_thread+0xac/0xd4
      kthread+0xd4/0xe4
      ret_from_fork+0x10/0x20

[E] spin_unlock(&object->lock:0):
[<ffffffc008102ee0>] scan_block+0x60/0x128

---------------------------------------------------
context B's detail
---------------------------------------------------
context B
    [S] __raw_spin_lock_irqsave(kmemleak_lock:0)
    [W] _raw_spin_lock_nested(&object->lock:0)
    [E] spin_unlock(kmemleak_lock:0)

[S] __raw_spin_lock_irqsave(kmemleak_lock:0):
[<ffffffc008102ebc>] scan_block+0x3c/0x128
stacktrace:
      dept_ecxt_enter+0x88/0xf4
      _raw_spin_lock_irqsave+0xf0/0x1c4
      scan_block+0x3c/0x128
      kmemleak_scan+0x19c/0x54c
      kmemleak_scan_thread+0xac/0xd4
      kthread+0xd4/0xe4
      ret_from_fork+0x10/0x20

[W] _raw_spin_lock_nested(&object->lock:0):
[<ffffffc008102f34>] scan_block+0xb4/0x128
stacktrace:
      dept_wait+0x74/0x88
      _raw_spin_lock_nested+0xa8/0x1b0
      scan_block+0xb4/0x128
      kmemleak_scan+0x19c/0x54c
      kmemleak_scan_thread+0xac/0xd4
      kthread+0xd4/0xe4
      ret_from_fork+0x10/0x20
[E] spin_unlock(kmemleak_lock:0):
[<ffffffc008102ee0>] scan_block+0x60/0x128
stacktrace:
      dept_event+0x7c/0xfc
      _raw_spin_unlock_irqrestore+0x8c/0x120
      scan_block+0x60/0x128
      kmemleak_scan+0x19c/0x54c
      kmemleak_scan_thread+0xac/0xd4
      kthread+0xd4/0xe4
      ret_from_fork+0x10/0x20
---------------------------------------------------
information that might be helpful
---------------------------------------------------
CPU: 1 PID: 38 Comm: kmemleak Tainted: G        W         5.17.0-rc1+ #1
Hardware name: linux,dummy-virt (DT)
Call trace:
 dump_backtrace.part.0+0x9c/0xc4
 show_stack+0x14/0x28
 dump_stack_lvl+0x9c/0xcc
 dump_stack+0x14/0x2c
 print_circle+0x2d4/0x438
 cb_check_dl+0x6c/0x70
 bfs+0xc0/0x168
 add_dep+0x88/0x11c
 add_wait+0x2d0/0x2dc
 __dept_wait+0x8c/0xa4
 dept_wait+0x6c/0x88
 _raw_spin_lock_irqsave+0xb8/0x1c4
 scan_block+0x3c/0x128
 scan_gray_list+0xc4/0x13c
 kmemleak_scan+0x2d8/0x54c
 kmemleak_scan_thread+0xac/0xd4
 kthread+0xd4/0xe4
 ret_from_fork+0x10/0x20

> ===================================================
> DEPT: Circular dependency has been detected.
> 5.17.0-rc1+ #1 Tainted: G        W
> ---------------------------------------------------
> summary
> ---------------------------------------------------
> *** AA DEADLOCK ***
> 
> context A
>     [S] __raw_spin_lock_irqsave(&object->lock:0)
>     [W] _raw_spin_lock_nested(&object->lock:0)
>     [E] spin_unlock(&object->lock:0)
> 
> [S]: start of the event context
> [W]: the wait blocked
> [E]: the event not reachable
> ---------------------------------------------------
> context A's detail
> ---------------------------------------------------
> context A
>     [S] __raw_spin_lock_irqsave(&object->lock:0)
>     [W] _raw_spin_lock_nested(&object->lock:0)
>     [E] spin_unlock(&object->lock:0)
> 
> [S] __raw_spin_lock_irqsave(&object->lock:0):
> [<ffffffc00810302c>] scan_gray_list+0x84/0x13c
> stacktrace:
>       dept_ecxt_enter+0x88/0xf4
>       _raw_spin_lock_irqsave+0xf0/0x1c4
>       scan_gray_list+0x84/0x13c
>       kmemleak_scan+0x2d8/0x54c
>       kmemleak_scan_thread+0xac/0xd4
>       kthread+0xd4/0xe4
>       ret_from_fork+0x10/0x20
> 
> [E] spin_unlock(&object->lock:0):
> [<ffffffc008102ee0>] scan_block+0x60/0x128
> ---------------------------------------------------
> information that might be helpful
> ---------------------------------------------------
> CPU: 1 PID: 38 Comm: kmemleak Tainted: G        W         5.17.0-rc1+ #1
> Hardware name: linux,dummy-virt (DT)
> Call trace:
>  dump_backtrace.part.0+0x9c/0xc4
>  show_stack+0x14/0x28
>  dump_stack_lvl+0x9c/0xcc
>  dump_stack+0x14/0x2c
>  print_circle+0x2d4/0x438
>  cb_check_dl+0x44/0x70
>  bfs+0x60/0x168
>  add_dep+0x88/0x11c
>  add_wait+0x2d0/0x2dc
>  __dept_wait+0x8c/0xa4
>  dept_wait+0x6c/0x88
>  _raw_spin_lock_nested+0xa8/0x1b0
>  scan_block+0xb4/0x128
>  scan_gray_list+0xc4/0x13c
>  kmemleak_scan+0x2d8/0x54c
>  kmemleak_scan_thread+0xac/0xd4
>  kthread+0xd4/0xe4
>  ret_from_fork+0x10/0x20
>
[...]

--
Thank you, You are awesome!
Hyeonggon :-)
Byungchul Park March 3, 2022, 12:18 a.m. UTC | #3
On Wed, Mar 02, 2022 at 04:53:41AM +0000, Hyeonggon Yoo wrote:
> On Wed, Mar 02, 2022 at 04:36:38AM +0000, Hyeonggon Yoo wrote:
> > On Mon, Feb 28, 2022 at 06:56:39PM +0900, Byungchul Park wrote:
> > > I didn't want to bother you so I was planning to send the next spin
> > > after making more progress. However, PATCH v2 reports too many false
> > > positives because Dept tracked the bit_wait_table[] wrong way - I
> > > apologize for that. So I decided to send PATCH v3 first before going
> > > further for those who want to run Dept for now.
> > > 
> > > There might still be some false positives but not overwhelming.
> > >
> > 
> > Hello Byungchul, I'm running DEPT v3 on my system
> > and I see report below.
> > 
> > Looking at the kmemleak code and comment, I think
> > kmemleak tried to avoid lockdep recursive warning
> > but detected by DEPT?
> >
> 
> Forgot to include another warning caused by DEPT.
> 
> And comment below might be useful for debugging:
> 
> in kmemleak.c:
>   43  * Locks and mutexes are acquired/nested in the following order:
>   44  *
>   45  *   scan_mutex [-> object->lock] -> kmemleak_lock -> other_object->lock (SINGLE_DEPTH_NESTING)
>   46  *
>   47  * No kmemleak_lock and object->lock nesting is allowed outside scan_mutex
>   48  * regions.
> 
> ===================================================
> DEPT: Circular dependency has been detected.
> 5.17.0-rc1+ #1 Tainted: G        W        
> ---------------------------------------------------
> summary
> ---------------------------------------------------
> *** DEADLOCK ***
> 
> context A
>     [S] __raw_spin_lock_irqsave(&object->lock:0)
>     [W] __raw_spin_lock_irqsave(kmemleak_lock:0)
>     [E] spin_unlock(&object->lock:0)
> 
> context B
>     [S] __raw_spin_lock_irqsave(kmemleak_lock:0)
>     [W] _raw_spin_lock_nested(&object->lock:0)
>     [E] spin_unlock(kmemleak_lock:0)
> 
> [S]: start of the event context
> [W]: the wait blocked
> [E]: the event not reachable

Hi Hyeonggon,

Dept also allows the following scenario when an user guarantees that
each lock instance is different from another at a different depth:

   lock A0 with depth
   lock A1 with depth + 1
   lock A2 with depth + 2
   lock A3 with depth + 3
   (and so on)
   ..
   unlock A3
   unlock A2
   unlock A1
   unlock A0

However, Dept does not allow the following scenario where another lock
class cuts in the dependency chain:

   lock A0 with depth
   lock B
   lock A1 with depth + 1
   lock A2 with depth + 2
   lock A3 with depth + 3
   (and so on)
   ..
   unlock A3
   unlock A2
   unlock A1
   unlock B
   unlock A0

This scenario is clearly problematic. What do you think is going to
happen with another context running the following?

   lock A1 with depth
   lock B
   lock A2 with depth + 1
   lock A3 with depth + 2
   (and so on)
   ..
   unlock A3
   unlock A2
   unlock B
   unlock A1

It's a deadlock. That's why Dept reports this case as a problem. Or am I
missing something?

Thanks,
Byungchul

> ---------------------------------------------------
> context A's detail
> ---------------------------------------------------
> context A
>     [S] __raw_spin_lock_irqsave(&object->lock:0)
>     [W] __raw_spin_lock_irqsave(kmemleak_lock:0)
>     [E] spin_unlock(&object->lock:0)
> 
> [S] __raw_spin_lock_irqsave(&object->lock:0):
> [<ffffffc00810302c>] scan_gray_list+0x84/0x13c
> stacktrace:
>       dept_ecxt_enter+0x88/0xf4
>       _raw_spin_lock_irqsave+0xf0/0x1c4
>       scan_gray_list+0x84/0x13c
>       kmemleak_scan+0x2d8/0x54c
>       kmemleak_scan_thread+0xac/0xd4
>       kthread+0xd4/0xe4
>       ret_from_fork+0x10/0x20
> 
> [W] __raw_spin_lock_irqsave(kmemleak_lock:0):
> [<ffffffc008102ebc>] scan_block+0x3c/0x128
> stacktrace:
>       __dept_wait+0x8c/0xa4
>       dept_wait+0x6c/0x88
>       _raw_spin_lock_irqsave+0xb8/0x1c4
>       scan_block+0x3c/0x128
>       scan_gray_list+0xc4/0x13c
>       kmemleak_scan+0x2d8/0x54c
>       kmemleak_scan_thread+0xac/0xd4
>       kthread+0xd4/0xe4
>       ret_from_fork+0x10/0x20
> 
> [E] spin_unlock(&object->lock:0):
> [<ffffffc008102ee0>] scan_block+0x60/0x128
> 
> ---------------------------------------------------
> context B's detail
> ---------------------------------------------------
> context B
>     [S] __raw_spin_lock_irqsave(kmemleak_lock:0)
>     [W] _raw_spin_lock_nested(&object->lock:0)
>     [E] spin_unlock(kmemleak_lock:0)
> 
> [S] __raw_spin_lock_irqsave(kmemleak_lock:0):
> [<ffffffc008102ebc>] scan_block+0x3c/0x128
> stacktrace:
>       dept_ecxt_enter+0x88/0xf4
>       _raw_spin_lock_irqsave+0xf0/0x1c4
>       scan_block+0x3c/0x128
>       kmemleak_scan+0x19c/0x54c
>       kmemleak_scan_thread+0xac/0xd4
>       kthread+0xd4/0xe4
>       ret_from_fork+0x10/0x20
> 
> [W] _raw_spin_lock_nested(&object->lock:0):
> [<ffffffc008102f34>] scan_block+0xb4/0x128
> stacktrace:
>       dept_wait+0x74/0x88
>       _raw_spin_lock_nested+0xa8/0x1b0
>       scan_block+0xb4/0x128
>       kmemleak_scan+0x19c/0x54c
>       kmemleak_scan_thread+0xac/0xd4
>       kthread+0xd4/0xe4
>       ret_from_fork+0x10/0x20
> [E] spin_unlock(kmemleak_lock:0):
> [<ffffffc008102ee0>] scan_block+0x60/0x128
> stacktrace:
>       dept_event+0x7c/0xfc
>       _raw_spin_unlock_irqrestore+0x8c/0x120
>       scan_block+0x60/0x128
>       kmemleak_scan+0x19c/0x54c
>       kmemleak_scan_thread+0xac/0xd4
>       kthread+0xd4/0xe4
>       ret_from_fork+0x10/0x20
> ---------------------------------------------------
> information that might be helpful
> ---------------------------------------------------
> CPU: 1 PID: 38 Comm: kmemleak Tainted: G        W         5.17.0-rc1+ #1
> Hardware name: linux,dummy-virt (DT)
> Call trace:
>  dump_backtrace.part.0+0x9c/0xc4
>  show_stack+0x14/0x28
>  dump_stack_lvl+0x9c/0xcc
>  dump_stack+0x14/0x2c
>  print_circle+0x2d4/0x438
>  cb_check_dl+0x6c/0x70
>  bfs+0xc0/0x168
>  add_dep+0x88/0x11c
>  add_wait+0x2d0/0x2dc
>  __dept_wait+0x8c/0xa4
>  dept_wait+0x6c/0x88
>  _raw_spin_lock_irqsave+0xb8/0x1c4
>  scan_block+0x3c/0x128
>  scan_gray_list+0xc4/0x13c
>  kmemleak_scan+0x2d8/0x54c
>  kmemleak_scan_thread+0xac/0xd4
>  kthread+0xd4/0xe4
>  ret_from_fork+0x10/0x20
> 
> > ===================================================
> > DEPT: Circular dependency has been detected.
> > 5.17.0-rc1+ #1 Tainted: G        W
> > ---------------------------------------------------
> > summary
> > ---------------------------------------------------
> > *** AA DEADLOCK ***
> > 
> > context A
> >     [S] __raw_spin_lock_irqsave(&object->lock:0)
> >     [W] _raw_spin_lock_nested(&object->lock:0)
> >     [E] spin_unlock(&object->lock:0)
> > 
> > [S]: start of the event context
> > [W]: the wait blocked
> > [E]: the event not reachable
> > ---------------------------------------------------
> > context A's detail
> > ---------------------------------------------------
> > context A
> >     [S] __raw_spin_lock_irqsave(&object->lock:0)
> >     [W] _raw_spin_lock_nested(&object->lock:0)
> >     [E] spin_unlock(&object->lock:0)
> > 
> > [S] __raw_spin_lock_irqsave(&object->lock:0):
> > [<ffffffc00810302c>] scan_gray_list+0x84/0x13c
> > stacktrace:
> >       dept_ecxt_enter+0x88/0xf4
> >       _raw_spin_lock_irqsave+0xf0/0x1c4
> >       scan_gray_list+0x84/0x13c
> >       kmemleak_scan+0x2d8/0x54c
> >       kmemleak_scan_thread+0xac/0xd4
> >       kthread+0xd4/0xe4
> >       ret_from_fork+0x10/0x20
> > 
> > [E] spin_unlock(&object->lock:0):
> > [<ffffffc008102ee0>] scan_block+0x60/0x128
> > ---------------------------------------------------
> > information that might be helpful
> > ---------------------------------------------------
> > CPU: 1 PID: 38 Comm: kmemleak Tainted: G        W         5.17.0-rc1+ #1
> > Hardware name: linux,dummy-virt (DT)
> > Call trace:
> >  dump_backtrace.part.0+0x9c/0xc4
> >  show_stack+0x14/0x28
> >  dump_stack_lvl+0x9c/0xcc
> >  dump_stack+0x14/0x2c
> >  print_circle+0x2d4/0x438
> >  cb_check_dl+0x44/0x70
> >  bfs+0x60/0x168
> >  add_dep+0x88/0x11c
> >  add_wait+0x2d0/0x2dc
> >  __dept_wait+0x8c/0xa4
> >  dept_wait+0x6c/0x88
> >  _raw_spin_lock_nested+0xa8/0x1b0
> >  scan_block+0xb4/0x128
> >  scan_gray_list+0xc4/0x13c
> >  kmemleak_scan+0x2d8/0x54c
> >  kmemleak_scan_thread+0xac/0xd4
> >  kthread+0xd4/0xe4
> >  ret_from_fork+0x10/0x20
> >
> [...]
> 
> --
> Thank you, You are awesome!
> Hyeonggon :-)
Byungchul Park March 3, 2022, 2:22 a.m. UTC | #4
On Wed, Mar 02, 2022 at 04:36:38AM +0000, Hyeonggon Yoo wrote:
> On Mon, Feb 28, 2022 at 06:56:39PM +0900, Byungchul Park wrote:
> > I didn't want to bother you so I was planning to send the next spin
> > after making more progress. However, PATCH v2 reports too many false
> > positives because Dept tracked the bit_wait_table[] wrong way - I
> > apologize for that. So I decided to send PATCH v3 first before going
> > further for those who want to run Dept for now.
> > 
> > There might still be some false positives but not overwhelming.
> >
> 
> Hello Byungchul, I'm running DEPT v3 on my system
> and I see report below.
> 
> Looking at the kmemleak code and comment, I think
> kmemleak tried to avoid lockdep recursive warning
> but detected by DEPT?
> 
> ===================================================
> DEPT: Circular dependency has been detected.
> 5.17.0-rc1+ #1 Tainted: G        W
> ---------------------------------------------------
> summary
> ---------------------------------------------------
> *** AA DEADLOCK ***
> 
> context A
>     [S] __raw_spin_lock_irqsave(&object->lock:0)
>     [W] _raw_spin_lock_nested(&object->lock:0)
>     [E] spin_unlock(&object->lock:0)
> 
> [S]: start of the event context
> [W]: the wait blocked
> [E]: the event not reachable
> ---------------------------------------------------
> context A's detail
> ---------------------------------------------------
> context A
>     [S] __raw_spin_lock_irqsave(&object->lock:0)
>     [W] _raw_spin_lock_nested(&object->lock:0)
>     [E] spin_unlock(&object->lock:0)
> 
> [S] __raw_spin_lock_irqsave(&object->lock:0):
> [<ffffffc00810302c>] scan_gray_list+0x84/0x13c
> stacktrace:
>       dept_ecxt_enter+0x88/0xf4
>       _raw_spin_lock_irqsave+0xf0/0x1c4
>       scan_gray_list+0x84/0x13c
>       kmemleak_scan+0x2d8/0x54c
>       kmemleak_scan_thread+0xac/0xd4
>       kthread+0xd4/0xe4
>       ret_from_fork+0x10/0x20

[W]'s stack trace is missed. But I guess this issue is the same issue of
what you reported following this one. We can discuss this issue on the
other report's thread.

Thanks,
Byunghcul

> [E] spin_unlock(&object->lock:0):
> [<ffffffc008102ee0>] scan_block+0x60/0x128
> ---------------------------------------------------
> information that might be helpful
> ---------------------------------------------------
> CPU: 1 PID: 38 Comm: kmemleak Tainted: G        W         5.17.0-rc1+ #1
> Hardware name: linux,dummy-virt (DT)
> Call trace:
>  dump_backtrace.part.0+0x9c/0xc4
>  show_stack+0x14/0x28
>  dump_stack_lvl+0x9c/0xcc
>  dump_stack+0x14/0x2c
>  print_circle+0x2d4/0x438
>  cb_check_dl+0x44/0x70
>  bfs+0x60/0x168
>  add_dep+0x88/0x11c
>  add_wait+0x2d0/0x2dc
>  __dept_wait+0x8c/0xa4
>  dept_wait+0x6c/0x88
>  _raw_spin_lock_nested+0xa8/0x1b0
>  scan_block+0xb4/0x128
>  scan_gray_list+0xc4/0x13c
>  kmemleak_scan+0x2d8/0x54c
>  kmemleak_scan_thread+0xac/0xd4
>  kthread+0xd4/0xe4
>  ret_from_fork+0x10/0x20
> 
> > ---
> > 
> > Hi Linus and folks,
> > 
> > I've been developing a tool for detecting deadlock possibilities by
> > tracking wait/event rather than lock(?) acquisition order to try to
> > cover all synchonization machanisms. It's done on v5.17-rc1 tag.
> > 
> > https://github.com/lgebyungchulpark/linux-dept/commits/dept1.14_on_v5.17-rc1
> >
> [...]
> > Benifit:
> > 
> > 	0. Works with all lock primitives.
> > 	1. Works with wait_for_completion()/complete().
> > 	2. Works with 'wait' on PG_locked.
> > 	3. Works with 'wait' on PG_writeback.
> > 	4. Works with swait/wakeup.
> > 	5. Works with waitqueue.
> > 	6. Multiple reports are allowed.
> > 	7. Deduplication control on multiple reports.
> > 	8. Withstand false positives thanks to 6.
> > 	9. Easy to tag any wait/event.
> > 
> > Future work:
> > 
> > 	0. To make it more stable.
> > 	1. To separates Dept from Lockdep.
> > 	2. To improves performance in terms of time and space.
> > 	3. To use Dept as a dependency engine for Lockdep.
> > 	4. To add any missing tags of wait/event in the kernel.
> > 	5. To deduplicate stack trace.
> > 
> > How to interpret reports:
> > 
> > 	1. E(event) in each context cannot be triggered because of the
> > 	   W(wait) that cannot be woken.
> > 	2. The stack trace helping find the problematic code is located
> > 	   in each conext's detail.
> > 
> > Thanks,
> > Byungchul
> > 
> > ---
> > 
> > Changes from v2:
> > 
> > 	1. Disable Dept on bit_wait_table[] in sched/wait_bit.c
> > 	   reporting a lot of false positives, which is my fault.
> > 	   Wait/event for bit_wait_table[] should've been tagged in a
> > 	   higher layer for better work, which is a future work.
> > 	   (feedback from Jan Kara)
> > 	2. Disable Dept on crypto_larval's completion to prevent a false
> > 	   positive.
> > 
> > Changes from v1:
> > 
> > 	1. Fix coding style and typo. (feedback from Steven)
> > 	2. Distinguish each work context from another in workqueue.
> > 	3. Skip checking lock acquisition with nest_lock, which is about
> > 	   correct lock usage that should be checked by Lockdep.
> > 
> > Changes from RFC:
> > 
> > 	1. Prevent adding a wait tag at prepare_to_wait() but __schedule().
> > 	   (feedback from Linus and Matthew)
> > 	2. Use try version at lockdep_acquire_cpus_lock() annotation.
> > 	3. Distinguish each syscall context from another.
> [ ... ] 
> 
> -- 
> Thank you, You are awesome!
> Hyeonggon :-)
Hyeonggon Yoo March 3, 2022, 8:03 a.m. UTC | #5
On Thu, Mar 03, 2022 at 09:18:13AM +0900, Byungchul Park wrote:
> On Wed, Mar 02, 2022 at 04:53:41AM +0000, Hyeonggon Yoo wrote:
> > On Wed, Mar 02, 2022 at 04:36:38AM +0000, Hyeonggon Yoo wrote:
> > > On Mon, Feb 28, 2022 at 06:56:39PM +0900, Byungchul Park wrote:
> > > > I didn't want to bother you so I was planning to send the next spin
> > > > after making more progress. However, PATCH v2 reports too many false
> > > > positives because Dept tracked the bit_wait_table[] wrong way - I
> > > > apologize for that. So I decided to send PATCH v3 first before going
> > > > further for those who want to run Dept for now.
> > > > 
> > > > There might still be some false positives but not overwhelming.
> > > >
> > > 
> > > Hello Byungchul, I'm running DEPT v3 on my system
> > > and I see report below.
> > > 
> > > Looking at the kmemleak code and comment, I think
> > > kmemleak tried to avoid lockdep recursive warning
> > > but detected by DEPT?
> > >
> > 
> > Forgot to include another warning caused by DEPT.
> > 
> > And comment below might be useful for debugging:
> > 
> > in kmemleak.c:
> >   43  * Locks and mutexes are acquired/nested in the following order:
> >   44  *
> >   45  *   scan_mutex [-> object->lock] -> kmemleak_lock -> other_object->lock (SINGLE_DEPTH_NESTING)
> >   46  *
> >   47  * No kmemleak_lock and object->lock nesting is allowed outside scan_mutex
> >   48  * regions.
> > 
> > ===================================================
> > DEPT: Circular dependency has been detected.
> > 5.17.0-rc1+ #1 Tainted: G        W        
> > ---------------------------------------------------
> > summary
> > ---------------------------------------------------
> > *** DEADLOCK ***
> > 
> > context A
> >     [S] __raw_spin_lock_irqsave(&object->lock:0)
> >     [W] __raw_spin_lock_irqsave(kmemleak_lock:0)
> >     [E] spin_unlock(&object->lock:0)
> > 
> > context B
> >     [S] __raw_spin_lock_irqsave(kmemleak_lock:0)
> >     [W] _raw_spin_lock_nested(&object->lock:0)
> >     [E] spin_unlock(kmemleak_lock:0)
> > 
> > [S]: start of the event context
> > [W]: the wait blocked
> > [E]: the event not reachable
> 
> Hi Hyeonggon,
> 
> Dept also allows the following scenario when an user guarantees that
> each lock instance is different from another at a different depth:
>
>    lock A0 with depth
>    lock A1 with depth + 1
>    lock A2 with depth + 2
>    lock A3 with depth + 3
>    (and so on)
>    ..
>    unlock A3
>    unlock A2
>    unlock A1
>    unlock A0
> 
> However, Dept does not allow the following scenario where another lock
> class cuts in the dependency chain:
> 
>    lock A0 with depth
>    lock B
>    lock A1 with depth + 1
>    lock A2 with depth + 2
>    lock A3 with depth + 3
>    (and so on)
>    ..
>    unlock A3
>    unlock A2
>    unlock A1
>    unlock B
>    unlock A0
> 
> This scenario is clearly problematic. What do you think is going to
> happen with another context running the following?
>

First of all, I want to say I'm not expert at locking primitives.
I may be wrong.

> >   45  *   scan_mutex [-> object->lock] -> kmemleak_lock -> other_object->lock (SINGLE_DEPTH_NESTING)
> >   46  *
> >   47  * No kmemleak_lock and object->lock nesting is allowed outside scan_mutex
> >   48  * regions.

lock order in kmemleak is described above.

and DEPT detects two cases as deadlock:

1) object->lock -> other_object->lock
2) object->lock -> kmemleak_lock, kmemleak_lock -> other_object->lock

And in kmemleak case, 1) and 2) is not possible because it must hold
scan_mutex first.

I think the author of kmemleak intended lockdep to treat object->lock
and other_object->lock as different class, using raw_spin_lock_nested().

Am I missing something?

Thanks.

>    lock A1 with depth
>    lock B
>    lock A2 with depth + 1
>    lock A3 with depth + 2
>    (and so on)
>    ..
>    unlock A3
>    unlock A2
>    unlock B
>    unlock A1
> 
> It's a deadlock. That's why Dept reports this case as a problem. Or am I
> missing something?
> 
> Thanks,
> Byungchul
> 
> > ---------------------------------------------------
> > context A's detail
> > ---------------------------------------------------
> > context A
> >     [S] __raw_spin_lock_irqsave(&object->lock:0)
> >     [W] __raw_spin_lock_irqsave(kmemleak_lock:0)
> >     [E] spin_unlock(&object->lock:0)
> > 
> > [S] __raw_spin_lock_irqsave(&object->lock:0):
> > [<ffffffc00810302c>] scan_gray_list+0x84/0x13c
> > stacktrace:
> >       dept_ecxt_enter+0x88/0xf4
> >       _raw_spin_lock_irqsave+0xf0/0x1c4
> >       scan_gray_list+0x84/0x13c
> >       kmemleak_scan+0x2d8/0x54c
> >       kmemleak_scan_thread+0xac/0xd4
> >       kthread+0xd4/0xe4
> >       ret_from_fork+0x10/0x20
> > 
> > [W] __raw_spin_lock_irqsave(kmemleak_lock:0):
> > [<ffffffc008102ebc>] scan_block+0x3c/0x128
> > stacktrace:
> >       __dept_wait+0x8c/0xa4
> >       dept_wait+0x6c/0x88
> >       _raw_spin_lock_irqsave+0xb8/0x1c4
> >       scan_block+0x3c/0x128
> >       scan_gray_list+0xc4/0x13c
> >       kmemleak_scan+0x2d8/0x54c
> >       kmemleak_scan_thread+0xac/0xd4
> >       kthread+0xd4/0xe4
> >       ret_from_fork+0x10/0x20
> > 
> > [E] spin_unlock(&object->lock:0):
> > [<ffffffc008102ee0>] scan_block+0x60/0x128
> > 
> > ---------------------------------------------------
> > context B's detail
> > ---------------------------------------------------
> > context B
> >     [S] __raw_spin_lock_irqsave(kmemleak_lock:0)
> >     [W] _raw_spin_lock_nested(&object->lock:0)
> >     [E] spin_unlock(kmemleak_lock:0)
> > 
> > [S] __raw_spin_lock_irqsave(kmemleak_lock:0):
> > [<ffffffc008102ebc>] scan_block+0x3c/0x128
> > stacktrace:
> >       dept_ecxt_enter+0x88/0xf4
> >       _raw_spin_lock_irqsave+0xf0/0x1c4
> >       scan_block+0x3c/0x128
> >       kmemleak_scan+0x19c/0x54c
> >       kmemleak_scan_thread+0xac/0xd4
> >       kthread+0xd4/0xe4
> >       ret_from_fork+0x10/0x20
> > 
> > [W] _raw_spin_lock_nested(&object->lock:0):
> > [<ffffffc008102f34>] scan_block+0xb4/0x128
> > stacktrace:
> >       dept_wait+0x74/0x88
> >       _raw_spin_lock_nested+0xa8/0x1b0
> >       scan_block+0xb4/0x128
> >       kmemleak_scan+0x19c/0x54c
> >       kmemleak_scan_thread+0xac/0xd4
> >       kthread+0xd4/0xe4
> >       ret_from_fork+0x10/0x20
> > [E] spin_unlock(kmemleak_lock:0):
> > [<ffffffc008102ee0>] scan_block+0x60/0x128
> > stacktrace:
> >       dept_event+0x7c/0xfc
> >       _raw_spin_unlock_irqrestore+0x8c/0x120
> >       scan_block+0x60/0x128
> >       kmemleak_scan+0x19c/0x54c
> >       kmemleak_scan_thread+0xac/0xd4
> >       kthread+0xd4/0xe4
> >       ret_from_fork+0x10/0x20
> > ---------------------------------------------------
> > information that might be helpful
> > ---------------------------------------------------
> > CPU: 1 PID: 38 Comm: kmemleak Tainted: G        W         5.17.0-rc1+ #1
> > Hardware name: linux,dummy-virt (DT)
> > Call trace:
> >  dump_backtrace.part.0+0x9c/0xc4
> >  show_stack+0x14/0x28
> >  dump_stack_lvl+0x9c/0xcc
> >  dump_stack+0x14/0x2c
> >  print_circle+0x2d4/0x438
> >  cb_check_dl+0x6c/0x70
> >  bfs+0xc0/0x168
> >  add_dep+0x88/0x11c
> >  add_wait+0x2d0/0x2dc
> >  __dept_wait+0x8c/0xa4
> >  dept_wait+0x6c/0x88
> >  _raw_spin_lock_irqsave+0xb8/0x1c4
> >  scan_block+0x3c/0x128
> >  scan_gray_list+0xc4/0x13c
> >  kmemleak_scan+0x2d8/0x54c
> >  kmemleak_scan_thread+0xac/0xd4
> >  kthread+0xd4/0xe4
> >  ret_from_fork+0x10/0x20
> > 
> > > ===================================================
> > > DEPT: Circular dependency has been detected.
> > > 5.17.0-rc1+ #1 Tainted: G        W
> > > ---------------------------------------------------
> > > summary
> > > ---------------------------------------------------
> > > *** AA DEADLOCK ***
> > > 
> > > context A
> > >     [S] __raw_spin_lock_irqsave(&object->lock:0)
> > >     [W] _raw_spin_lock_nested(&object->lock:0)
> > >     [E] spin_unlock(&object->lock:0)
> > > 
> > > [S]: start of the event context
> > > [W]: the wait blocked
> > > [E]: the event not reachable
> > > ---------------------------------------------------
> > > context A's detail
> > > ---------------------------------------------------
> > > context A
> > >     [S] __raw_spin_lock_irqsave(&object->lock:0)
> > >     [W] _raw_spin_lock_nested(&object->lock:0)
> > >     [E] spin_unlock(&object->lock:0)
> > > 
> > > [S] __raw_spin_lock_irqsave(&object->lock:0):
> > > [<ffffffc00810302c>] scan_gray_list+0x84/0x13c
> > > stacktrace:
> > >       dept_ecxt_enter+0x88/0xf4
> > >       _raw_spin_lock_irqsave+0xf0/0x1c4
> > >       scan_gray_list+0x84/0x13c
> > >       kmemleak_scan+0x2d8/0x54c
> > >       kmemleak_scan_thread+0xac/0xd4
> > >       kthread+0xd4/0xe4
> > >       ret_from_fork+0x10/0x20
> > > 
> > > [E] spin_unlock(&object->lock:0):
> > > [<ffffffc008102ee0>] scan_block+0x60/0x128
> > > ---------------------------------------------------
> > > information that might be helpful
> > > ---------------------------------------------------
> > > CPU: 1 PID: 38 Comm: kmemleak Tainted: G        W         5.17.0-rc1+ #1
> > > Hardware name: linux,dummy-virt (DT)
> > > Call trace:
> > >  dump_backtrace.part.0+0x9c/0xc4
> > >  show_stack+0x14/0x28
> > >  dump_stack_lvl+0x9c/0xcc
> > >  dump_stack+0x14/0x2c
> > >  print_circle+0x2d4/0x438
> > >  cb_check_dl+0x44/0x70
> > >  bfs+0x60/0x168
> > >  add_dep+0x88/0x11c
> > >  add_wait+0x2d0/0x2dc
> > >  __dept_wait+0x8c/0xa4
> > >  dept_wait+0x6c/0x88
> > >  _raw_spin_lock_nested+0xa8/0x1b0
> > >  scan_block+0xb4/0x128
> > >  scan_gray_list+0xc4/0x13c
> > >  kmemleak_scan+0x2d8/0x54c
> > >  kmemleak_scan_thread+0xac/0xd4
> > >  kthread+0xd4/0xe4
> > >  ret_from_fork+0x10/0x20
> > >
> > [...]
> > 
> > --
> > Thank you, You are awesome!
> > Hyeonggon :-)
Byungchul Park March 3, 2022, 9:48 a.m. UTC | #6
On Thu, Mar 03, 2022 at 08:03:21AM +0000, Hyeonggon Yoo wrote:
> On Thu, Mar 03, 2022 at 09:18:13AM +0900, Byungchul Park wrote:
> > Hi Hyeonggon,
> > 
> > Dept also allows the following scenario when an user guarantees that
> > each lock instance is different from another at a different depth:
> >
> >    lock A0 with depth
> >    lock A1 with depth + 1
> >    lock A2 with depth + 2
> >    lock A3 with depth + 3
> >    (and so on)
> >    ..
> >    unlock A3
> >    unlock A2
> >    unlock A1
> >    unlock A0

Look at this. Dept allows object->lock -> other_object->lock (with a
different depth using *_lock_nested()) so won't report it.

> > However, Dept does not allow the following scenario where another lock
> > class cuts in the dependency chain:
> > 
> >    lock A0 with depth
> >    lock B
> >    lock A1 with depth + 1
> >    lock A2 with depth + 2
> >    lock A3 with depth + 3
> >    (and so on)
> >    ..
> >    unlock A3
> >    unlock A2
> >    unlock A1
> >    unlock B
> >    unlock A0
> > 
> > This scenario is clearly problematic. What do you think is going to
> > happen with another context running the following?
> >
> 
> First of all, I want to say I'm not expert at locking primitives.
> I may be wrong.

It's okay. Thanks anyway for your feedback.

> > >   45  *   scan_mutex [-> object->lock] -> kmemleak_lock -> other_object->lock (SINGLE_DEPTH_NESTING)
> > >   46  *
> > >   47  * No kmemleak_lock and object->lock nesting is allowed outside scan_mutex
> > >   48  * regions.
> 
> lock order in kmemleak is described above.
> 
> and DEPT detects two cases as deadlock:
> 
> 1) object->lock -> other_object->lock

It's not a deadlock *IF* two have different depth using *_lock_nested().
Dept also allows this case. So Dept wouldn't report it.

> 2) object->lock -> kmemleak_lock, kmemleak_lock -> other_object->lock

But this usage is risky. I already explained it in the mail you replied
to. I copied it. See the below.

context A
> >    lock A0 with depth
> >    lock B
> >    lock A1 with depth + 1
> >    lock A2 with depth + 2
> >    lock A3 with depth + 3
> >    (and so on)
> >    ..
> >    unlock A3
> >    unlock A2
> >    unlock A1
> >    unlock B
> >    unlock A0

...

context B
> >    lock A1 with depth
> >    lock B
> >    lock A2 with depth + 1
> >    lock A3 with depth + 2
> >    (and so on)
> >    ..
> >    unlock A3
> >    unlock A2
> >    unlock B
> >    unlock A1

where Ax : object->lock, B : kmemleak_lock.

A deadlock might occur if the two contexts run at the same time.

> And in kmemleak case, 1) and 2) is not possible because it must hold
> scan_mutex first.

This is another issue. Let's focus on whether the order is okay for now.

> I think the author of kmemleak intended lockdep to treat object->lock
> and other_object->lock as different class, using raw_spin_lock_nested().

Yes. The author meant to assign a different class according to its depth
using a Lockdep API. Strictly speaking, those are the same class anyway
but we assign a different class to each depth to avoid Lockdep splats
*IF* the user guarantees the nesting lock usage is safe, IOW, guarantees
each lock instance is different at a different depth.

I was fundamentally asking you... so... is the nesting lock usage safe
for real? I hope you distinguish between the safe case and the risky
case when *_lock_nested() is involved. Thoughts?

Thanks,
Byungchul

> Am I missing something?
> 
> Thanks.
> 
> >    lock A1 with depth
> >    lock B
> >    lock A2 with depth + 1
> >    lock A3 with depth + 2
> >    (and so on)
> >    ..
> >    unlock A3
> >    unlock A2
> >    unlock B
> >    unlock A1
> > 
> > It's a deadlock. That's why Dept reports this case as a problem. Or am I
> > missing something?
> > 
> > Thanks,
> > Byungchul
> > 
> > > ---------------------------------------------------
> > > context A's detail
> > > ---------------------------------------------------
> > > context A
> > >     [S] __raw_spin_lock_irqsave(&object->lock:0)
> > >     [W] __raw_spin_lock_irqsave(kmemleak_lock:0)
> > >     [E] spin_unlock(&object->lock:0)
> > > 
> > > [S] __raw_spin_lock_irqsave(&object->lock:0):
> > > [<ffffffc00810302c>] scan_gray_list+0x84/0x13c
> > > stacktrace:
> > >       dept_ecxt_enter+0x88/0xf4
> > >       _raw_spin_lock_irqsave+0xf0/0x1c4
> > >       scan_gray_list+0x84/0x13c
> > >       kmemleak_scan+0x2d8/0x54c
> > >       kmemleak_scan_thread+0xac/0xd4
> > >       kthread+0xd4/0xe4
> > >       ret_from_fork+0x10/0x20
> > > 
> > > [W] __raw_spin_lock_irqsave(kmemleak_lock:0):
> > > [<ffffffc008102ebc>] scan_block+0x3c/0x128
> > > stacktrace:
> > >       __dept_wait+0x8c/0xa4
> > >       dept_wait+0x6c/0x88
> > >       _raw_spin_lock_irqsave+0xb8/0x1c4
> > >       scan_block+0x3c/0x128
> > >       scan_gray_list+0xc4/0x13c
> > >       kmemleak_scan+0x2d8/0x54c
> > >       kmemleak_scan_thread+0xac/0xd4
> > >       kthread+0xd4/0xe4
> > >       ret_from_fork+0x10/0x20
> > > 
> > > [E] spin_unlock(&object->lock:0):
> > > [<ffffffc008102ee0>] scan_block+0x60/0x128
> > > 
> > > ---------------------------------------------------
> > > context B's detail
> > > ---------------------------------------------------
> > > context B
> > >     [S] __raw_spin_lock_irqsave(kmemleak_lock:0)
> > >     [W] _raw_spin_lock_nested(&object->lock:0)
> > >     [E] spin_unlock(kmemleak_lock:0)
> > > 
> > > [S] __raw_spin_lock_irqsave(kmemleak_lock:0):
> > > [<ffffffc008102ebc>] scan_block+0x3c/0x128
> > > stacktrace:
> > >       dept_ecxt_enter+0x88/0xf4
> > >       _raw_spin_lock_irqsave+0xf0/0x1c4
> > >       scan_block+0x3c/0x128
> > >       kmemleak_scan+0x19c/0x54c
> > >       kmemleak_scan_thread+0xac/0xd4
> > >       kthread+0xd4/0xe4
> > >       ret_from_fork+0x10/0x20
> > > 
> > > [W] _raw_spin_lock_nested(&object->lock:0):
> > > [<ffffffc008102f34>] scan_block+0xb4/0x128
> > > stacktrace:
> > >       dept_wait+0x74/0x88
> > >       _raw_spin_lock_nested+0xa8/0x1b0
> > >       scan_block+0xb4/0x128
> > >       kmemleak_scan+0x19c/0x54c
> > >       kmemleak_scan_thread+0xac/0xd4
> > >       kthread+0xd4/0xe4
> > >       ret_from_fork+0x10/0x20
> > > [E] spin_unlock(kmemleak_lock:0):
> > > [<ffffffc008102ee0>] scan_block+0x60/0x128
> > > stacktrace:
> > >       dept_event+0x7c/0xfc
> > >       _raw_spin_unlock_irqrestore+0x8c/0x120
> > >       scan_block+0x60/0x128
> > >       kmemleak_scan+0x19c/0x54c
> > >       kmemleak_scan_thread+0xac/0xd4
> > >       kthread+0xd4/0xe4
> > >       ret_from_fork+0x10/0x20
> > > ---------------------------------------------------
> > > information that might be helpful
> > > ---------------------------------------------------
> > > CPU: 1 PID: 38 Comm: kmemleak Tainted: G        W         5.17.0-rc1+ #1
> > > Hardware name: linux,dummy-virt (DT)
> > > Call trace:
> > >  dump_backtrace.part.0+0x9c/0xc4
> > >  show_stack+0x14/0x28
> > >  dump_stack_lvl+0x9c/0xcc
> > >  dump_stack+0x14/0x2c
> > >  print_circle+0x2d4/0x438
> > >  cb_check_dl+0x6c/0x70
> > >  bfs+0xc0/0x168
> > >  add_dep+0x88/0x11c
> > >  add_wait+0x2d0/0x2dc
> > >  __dept_wait+0x8c/0xa4
> > >  dept_wait+0x6c/0x88
> > >  _raw_spin_lock_irqsave+0xb8/0x1c4
> > >  scan_block+0x3c/0x128
> > >  scan_gray_list+0xc4/0x13c
> > >  kmemleak_scan+0x2d8/0x54c
> > >  kmemleak_scan_thread+0xac/0xd4
> > >  kthread+0xd4/0xe4
> > >  ret_from_fork+0x10/0x20
> > > 
> > > > ===================================================
> > > > DEPT: Circular dependency has been detected.
> > > > 5.17.0-rc1+ #1 Tainted: G        W
> > > > ---------------------------------------------------
> > > > summary
> > > > ---------------------------------------------------
> > > > *** AA DEADLOCK ***
> > > > 
> > > > context A
> > > >     [S] __raw_spin_lock_irqsave(&object->lock:0)
> > > >     [W] _raw_spin_lock_nested(&object->lock:0)
> > > >     [E] spin_unlock(&object->lock:0)
> > > > 
> > > > [S]: start of the event context
> > > > [W]: the wait blocked
> > > > [E]: the event not reachable
> > > > ---------------------------------------------------
> > > > context A's detail
> > > > ---------------------------------------------------
> > > > context A
> > > >     [S] __raw_spin_lock_irqsave(&object->lock:0)
> > > >     [W] _raw_spin_lock_nested(&object->lock:0)
> > > >     [E] spin_unlock(&object->lock:0)
> > > > 
> > > > [S] __raw_spin_lock_irqsave(&object->lock:0):
> > > > [<ffffffc00810302c>] scan_gray_list+0x84/0x13c
> > > > stacktrace:
> > > >       dept_ecxt_enter+0x88/0xf4
> > > >       _raw_spin_lock_irqsave+0xf0/0x1c4
> > > >       scan_gray_list+0x84/0x13c
> > > >       kmemleak_scan+0x2d8/0x54c
> > > >       kmemleak_scan_thread+0xac/0xd4
> > > >       kthread+0xd4/0xe4
> > > >       ret_from_fork+0x10/0x20
> > > > 
> > > > [E] spin_unlock(&object->lock:0):
> > > > [<ffffffc008102ee0>] scan_block+0x60/0x128
> > > > ---------------------------------------------------
> > > > information that might be helpful
> > > > ---------------------------------------------------
> > > > CPU: 1 PID: 38 Comm: kmemleak Tainted: G        W         5.17.0-rc1+ #1
> > > > Hardware name: linux,dummy-virt (DT)
> > > > Call trace:
> > > >  dump_backtrace.part.0+0x9c/0xc4
> > > >  show_stack+0x14/0x28
> > > >  dump_stack_lvl+0x9c/0xcc
> > > >  dump_stack+0x14/0x2c
> > > >  print_circle+0x2d4/0x438
> > > >  cb_check_dl+0x44/0x70
> > > >  bfs+0x60/0x168
> > > >  add_dep+0x88/0x11c
> > > >  add_wait+0x2d0/0x2dc
> > > >  __dept_wait+0x8c/0xa4
> > > >  dept_wait+0x6c/0x88
> > > >  _raw_spin_lock_nested+0xa8/0x1b0
> > > >  scan_block+0xb4/0x128
> > > >  scan_gray_list+0xc4/0x13c
> > > >  kmemleak_scan+0x2d8/0x54c
> > > >  kmemleak_scan_thread+0xac/0xd4
> > > >  kthread+0xd4/0xe4
> > > >  ret_from_fork+0x10/0x20
> > > >
> > > [...]
> > > 
> > > --
> > > Thank you, You are awesome!
> > > Hyeonggon :-)
> 
> -- 
> Thank you, You are awesome!
> Hyeonggon :-)
Hyeonggon Yoo March 3, 2022, 12:38 p.m. UTC | #7
On Thu, Mar 03, 2022 at 06:48:24PM +0900, Byungchul Park wrote:
> On Thu, Mar 03, 2022 at 08:03:21AM +0000, Hyeonggon Yoo wrote:
> > On Thu, Mar 03, 2022 at 09:18:13AM +0900, Byungchul Park wrote:
> > > Hi Hyeonggon,
> > > 
> > > Dept also allows the following scenario when an user guarantees that
> > > each lock instance is different from another at a different depth:
> > >
> > >    lock A0 with depth
> > >    lock A1 with depth + 1
> > >    lock A2 with depth + 2
> > >    lock A3 with depth + 3
> > >    (and so on)
> > >    ..
> > >    unlock A3
> > >    unlock A2
> > >    unlock A1
> > >    unlock A0
> 

[+Cc kmemleak maintainer]

> Look at this. Dept allows object->lock -> other_object->lock (with a
> different depth using *_lock_nested()) so won't report it.
>

No, It did.

S: object->lock ( _raw_spin_lock_irqsave)
W: other_object->lock (_raw_spin_lock_nested)

DEPT reported this as AA deadlock.

===================================================
DEPT: Circular dependency has been detected.
5.17.0-rc1+ #1 Tainted: G        W
---------------------------------------------------
summary
---------------------------------------------------
*** AA DEADLOCK ***

context A
    [S] __raw_spin_lock_irqsave(&object->lock:0)
    [W] _raw_spin_lock_nested(&object->lock:0)
    [E] spin_unlock(&object->lock:0)

[S]: start of the event context
[W]: the wait blocked
[E]: the event not reachable
---------------------------------------------------
context A's detail
---------------------------------------------------
context A
    [S] __raw_spin_lock_irqsave(&object->lock:0)
    [W] _raw_spin_lock_nested(&object->lock:0)
    [E] spin_unlock(&object->lock:0)
---------------------------------------------------
context A's detail
---------------------------------------------------
context A
    [S] __raw_spin_lock_irqsave(&object->lock:0)
    [W] _raw_spin_lock_nested(&object->lock:0)
    [E] spin_unlock(&object->lock:0)

[S] __raw_spin_lock_irqsave(&object->lock:0):
[<ffffffc00810302c>] scan_gray_list+0x84/0x13c
stacktrace:
      dept_ecxt_enter+0x88/0xf4
      _raw_spin_lock_irqsave+0xf0/0x1c4
      scan_gray_list+0x84/0x13c
      kmemleak_scan+0x2d8/0x54c
      kmemleak_scan_thread+0xac/0xd4
      kthread+0xd4/0xe4
      ret_from_fork+0x10/0x20

[W] _raw_spin_lock_nested(&object->lock:0):
[<ffffffc008102f34>] scan_block+0xb4/0x128
stacktrace:
      __dept_wait+0x8c/0xa4
      dept_wait+0x6c/0x88
      _raw_spin_lock_nested+0xa8/0x1b0
      scan_block+0xb4/0x128
      scan_gray_list+0xc4/0x13c
      kmemleak_scan+0x2d8/0x54c
      kmemleak_scan_thread+0xac/0xd4
      kthread+0xd4/0xe4
      ret_from_fork+0x10/0x20

[E] spin_unlock(&object->lock:0):
[<ffffffc008102ee0>] scan_block+0x60/0x128
---------------------------------------------------
information that might be helpful
---------------------------------------------------
CPU: 2 PID: 38 Comm: kmemleak Tainted: G        W         5.17.0-rc1+ #1
Hardware name: linux,dummy-virt (DT)
Call trace:
 dump_backtrace.part.0+0x9c/0xc4
 show_stack+0x14/0x28
 dump_stack_lvl+0x9c/0xcc
 dump_stack+0x14/0x2c
 print_circle+0x2d4/0x438
 cb_check_dl+0x44/0x70
 bfs+0x60/0x168
 add_dep+0x88/0x11c
 add_wait+0x2d0/0x2dc
 __dept_wait+0x8c/0xa4
 dept_wait+0x6c/0x88
 _raw_spin_lock_nested+0xa8/0x1b0
 scan_block+0xb4/0x128
 scan_gray_list+0xc4/0x13c
 kmemleak_scan+0x2d8/0x54c
 kmemleak_scan_thread+0xac/0xd4
 kthread+0xd4/0xe4
 ret_from_fork+0x10/0x20

> > > However, Dept does not allow the following scenario where another lock
> > > class cuts in the dependency chain:
> > > 
> > >    lock A0 with depth
> > >    lock B
> > >    lock A1 with depth + 1
> > >    lock A2 with depth + 2
> > >    lock A3 with depth + 3
> > >    (and so on)
> > >    ..
> > >    unlock A3
> > >    unlock A2
> > >    unlock A1
> > >    unlock B
> > >    unlock A0
> > > 
> > > This scenario is clearly problematic. What do you think is going to
> > > happen with another context running the following?
> > >
> > 
> > First of all, I want to say I'm not expert at locking primitives.
> > I may be wrong.
> 
> It's okay. Thanks anyway for your feedback.
>

Thanks.

> > > >   45  *   scan_mutex [-> object->lock] -> kmemleak_lock -> other_object->lock (SINGLE_DEPTH_NESTING)
> > > >   46  *
> > > >   47  * No kmemleak_lock and object->lock nesting is allowed outside scan_mutex
> > > >   48  * regions.
> > 
> > lock order in kmemleak is described above.
> > 
> > and DEPT detects two cases as deadlock:
> > 
> > 1) object->lock -> other_object->lock
> 
> It's not a deadlock *IF* two have different depth using *_lock_nested().
> Dept also allows this case. So Dept wouldn't report it.
>
> > 2) object->lock -> kmemleak_lock, kmemleak_lock -> other_object->lock
>
> But this usage is risky. I already explained it in the mail you replied
> to. I copied it. See the below.
>

I understand why you said this is risky.
Its lock ordering is not good.

> context A
> > >    lock A0 with depth
> > >    lock B
> > >    lock A1 with depth + 1
> > >    lock A2 with depth + 2
> > >    lock A3 with depth + 3
> > >    (and so on)
> > >    ..
> > >    unlock A3
> > >    unlock A2
> > >    unlock A1
> > >    unlock B
> > >    unlock A0
>
> ...
>
> context B
> > >    lock A1 with depth
> > >    lock B
> > >    lock A2 with depth + 1
> > >    lock A3 with depth + 2
> > >    (and so on)
> > >    ..
> > >    unlock A3
> > >    unlock A2
> > >    unlock B
> > >    unlock A1
> 
> where Ax : object->lock, B : kmemleak_lock.
> 
> A deadlock might occur if the two contexts run at the same time.
>

But I want to say kmemleak is getting things under control. No two contexts
can run at same time.

> > And in kmemleak case, 1) and 2) is not possible because it must hold
> > scan_mutex first.
> 
> This is another issue. Let's focus on whether the order is okay for now.
>

Why is it another issue?

> > I think the author of kmemleak intended lockdep to treat object->lock
> > and other_object->lock as different class, using raw_spin_lock_nested().
> 
> Yes. The author meant to assign a different class according to its depth
> using a Lockdep API. Strictly speaking, those are the same class anyway
> but we assign a different class to each depth to avoid Lockdep splats
> *IF* the user guarantees the nesting lock usage is safe, IOW, guarantees
> each lock instance is different at a different depth.

Then why DEPT reports 1) and 2) as deadlock?
Does DEPT assign same class unlike Lockdep?

> I was fundamentally asking you... so... is the nesting lock usage safe
> for real?

I don't get what the point is. I agree it's not a good lock ordering.
But in kmemleak case, I think kmemleak is getting things under control.