
mmu_notifiers: turn off lockdep around mm_take_all_locks

Message ID 20090707180630.GA8008@amt.cnet (mailing list archive)
State New, archived

Commit Message

Marcelo Tosatti July 7, 2009, 6:06 p.m. UTC
KVM guests with CONFIG_LOCKDEP=y trigger the following warning:

BUG: MAX_LOCK_DEPTH too low!
turning off the locking correctness validator.
Pid: 4624, comm: qemu-system-x86 Not tainted 2.6.31-rc2-03981-g3abaf21
#32
Call Trace:
 [<ffffffff81068bab>] __lock_acquire+0x1559/0x15fc
 [<ffffffff810be4d9>] ? mm_take_all_locks+0x99/0x109
 [<ffffffff810be4d9>] ? mm_take_all_locks+0x99/0x109
 [<ffffffff81068d3c>] lock_acquire+0xee/0x112
 [<ffffffff810be516>] ? mm_take_all_locks+0xd6/0x109
 [<ffffffff81402596>] ? _spin_lock_nest_lock+0x20/0x50
 [<ffffffff814025b7>] _spin_lock_nest_lock+0x41/0x50
 [<ffffffff810be516>] ? mm_take_all_locks+0xd6/0x109
 [<ffffffff810be516>] mm_take_all_locks+0xd6/0x109
 [<ffffffff810d0f76>] do_mmu_notifier_register+0xd4/0x199
 [<ffffffff810d1060>] mmu_notifier_register+0x13/0x15
 [<ffffffffa0107f16>] kvm_dev_ioctl+0x13f/0x30e [kvm]
 [<ffffffff810e6a3a>] vfs_ioctl+0x2f/0x7d
 [<ffffffff810e6fb7>] do_vfs_ioctl+0x4af/0x4ec
 [<ffffffff814030b4>] ? error_exit+0x94/0xb0
 [<ffffffff81401f92>] ? trace_hardirqs_off_thunk+0x3a/0x3c
 [<ffffffff8100bc2d>] ? retint_swapgs+0xe/0x13
 [<ffffffff810e703b>] sys_ioctl+0x47/0x6a
 [<ffffffff811d849c>] ? __up_read+0x1a/0x85
 [<ffffffff8100b1db>] system_call_fastpath+0x16/0x1b

Since mm_take_all_locks takes a gazillion locks.

Is there any way around this other than completely shutting down lockdep?
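
For readers unfamiliar with mm_take_all_locks(): it walks every VMA in the mm
and takes the corresponding rmap lock, releasing nothing until
mm_drop_all_locks(). A simplified sketch, paraphrased from the 2.6.31-era
mm/mmap.c (the duplicate-avoidance and error handling are omitted, so treat
this as approximate rather than the literal upstream code):

	/* Simplified sketch; the caller already holds mm->mmap_sem for write. */
	int mm_take_all_locks(struct mm_struct *mm)
	{
		struct vm_area_struct *vma;

		for (vma = mm->mmap; vma; vma = vma->vm_next)
			if (vma->vm_file && vma->vm_file->f_mapping)
				/* one held lock per file-backed VMA ... */
				spin_lock_nest_lock(
					&vma->vm_file->f_mapping->i_mmap_lock,
					&mm->mmap_sem);

		for (vma = mm->mmap; vma; vma = vma->vm_next)
			if (vma->anon_vma)
				/* ... plus one per anon_vma, all held at once */
				spin_lock_nest_lock(&vma->anon_vma->lock,
						    &mm->mmap_sem);

		return 0;	/* nothing is released until mm_drop_all_locks() */
	}

Each spin_lock_nest_lock() pushes an entry onto lockdep's per-task held-lock
stack, which is capped at MAX_LOCK_DEPTH, so a process with more lockable
VMAs than that trips the warning above.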




Comments

Peter Zijlstra July 7, 2009, 6:15 p.m. UTC | #1
On Tue, 2009-07-07 at 15:06 -0300, Marcelo Tosatti wrote:
> KVM guests with CONFIG_LOCKDEP=y trigger the following warning:
> 
> BUG: MAX_LOCK_DEPTH too low!
> turning off the locking correctness validator.
> Pid: 4624, comm: qemu-system-x86 Not tainted 2.6.31-rc2-03981-g3abaf21
> #32
> Call Trace:
>  [<ffffffff81068bab>] __lock_acquire+0x1559/0x15fc
>  [<ffffffff810be4d9>] ? mm_take_all_locks+0x99/0x109
>  [<ffffffff810be4d9>] ? mm_take_all_locks+0x99/0x109
>  [<ffffffff81068d3c>] lock_acquire+0xee/0x112
>  [<ffffffff810be516>] ? mm_take_all_locks+0xd6/0x109
>  [<ffffffff81402596>] ? _spin_lock_nest_lock+0x20/0x50
>  [<ffffffff814025b7>] _spin_lock_nest_lock+0x41/0x50
>  [<ffffffff810be516>] ? mm_take_all_locks+0xd6/0x109
>  [<ffffffff810be516>] mm_take_all_locks+0xd6/0x109
>  [<ffffffff810d0f76>] do_mmu_notifier_register+0xd4/0x199
>  [<ffffffff810d1060>] mmu_notifier_register+0x13/0x15
>  [<ffffffffa0107f16>] kvm_dev_ioctl+0x13f/0x30e [kvm]
>  [<ffffffff810e6a3a>] vfs_ioctl+0x2f/0x7d
>  [<ffffffff810e6fb7>] do_vfs_ioctl+0x4af/0x4ec
>  [<ffffffff814030b4>] ? error_exit+0x94/0xb0
>  [<ffffffff81401f92>] ? trace_hardirqs_off_thunk+0x3a/0x3c
>  [<ffffffff8100bc2d>] ? retint_swapgs+0xe/0x13
>  [<ffffffff810e703b>] sys_ioctl+0x47/0x6a
>  [<ffffffff811d849c>] ? __up_read+0x1a/0x85
>  [<ffffffff8100b1db>] system_call_fastpath+0x16/0x1b
> 
> Since mm_take_all_locks takes a gazillion locks.
> 
> Is there any way around this other than completely shutting down lockdep?

When we created this the promise was that kvm would only do this on a
fresh mm with only a few vmas, has that changed?
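
The trace in the report suggests this happens on the KVM_CREATE_VM path. The
call sites below are reconstructed from that trace and from memory, so treat
the kvm_create_vm() step in particular as an assumption:

	kvm_dev_ioctl(KVM_CREATE_VM)
	  -> kvm_create_vm()
	     -> mmu_notifier_register(&kvm->mmu_notifier, current->mm)
	        -> do_mmu_notifier_register()
	           -> mm_take_all_locks(current->mm)

The mm being locked is QEMU's own address space, which already has its shared
libraries mapped by the time the ioctl runs.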


Avi Kivity July 7, 2009, 6:18 p.m. UTC | #2
On 07/07/2009 09:15 PM, Peter Zijlstra wrote:
> On Tue, 2009-07-07 at 15:06 -0300, Marcelo Tosatti wrote:
>    
>> KVM guests with CONFIG_LOCKDEP=y trigger the following warning:
>>
>> BUG: MAX_LOCK_DEPTH too low!
>> turning off the locking correctness validator.
>> Pid: 4624, comm: qemu-system-x86 Not tainted 2.6.31-rc2-03981-g3abaf21
>> #32
>> Call Trace:
>>   [<ffffffff81068bab>] __lock_acquire+0x1559/0x15fc
>>   [<ffffffff810be4d9>] ? mm_take_all_locks+0x99/0x109
>>   [<ffffffff810be4d9>] ? mm_take_all_locks+0x99/0x109
>>   [<ffffffff81068d3c>] lock_acquire+0xee/0x112
>>   [<ffffffff810be516>] ? mm_take_all_locks+0xd6/0x109
>>   [<ffffffff81402596>] ? _spin_lock_nest_lock+0x20/0x50
>>   [<ffffffff814025b7>] _spin_lock_nest_lock+0x41/0x50
>>   [<ffffffff810be516>] ? mm_take_all_locks+0xd6/0x109
>>   [<ffffffff810be516>] mm_take_all_locks+0xd6/0x109
>>   [<ffffffff810d0f76>] do_mmu_notifier_register+0xd4/0x199
>>   [<ffffffff810d1060>] mmu_notifier_register+0x13/0x15
>>   [<ffffffffa0107f16>] kvm_dev_ioctl+0x13f/0x30e [kvm]
>>   [<ffffffff810e6a3a>] vfs_ioctl+0x2f/0x7d
>>   [<ffffffff810e6fb7>] do_vfs_ioctl+0x4af/0x4ec
>>   [<ffffffff814030b4>] ? error_exit+0x94/0xb0
>>   [<ffffffff81401f92>] ? trace_hardirqs_off_thunk+0x3a/0x3c
>>   [<ffffffff8100bc2d>] ? retint_swapgs+0xe/0x13
>>   [<ffffffff810e703b>] sys_ioctl+0x47/0x6a
>>   [<ffffffff811d849c>] ? __up_read+0x1a/0x85
>>   [<ffffffff8100b1db>] system_call_fastpath+0x16/0x1b
>>
>> Since mm_take_all_locks takes a gazillion locks.
>>
>> Is there any way around this other than completely shutting down lockdep?
>>      
>
> When we created this the promise was that kvm would only do this on a
> fresh mm with only a few vmas, has that changed?

The number of vmas did increase, but not materially.  We do link with 
more shared libraries though.
Marcelo Tosatti July 7, 2009, 6:37 p.m. UTC | #3
On Tue, Jul 07, 2009 at 09:18:36PM +0300, Avi Kivity wrote:
> On 07/07/2009 09:15 PM, Peter Zijlstra wrote:
>> On Tue, 2009-07-07 at 15:06 -0300, Marcelo Tosatti wrote:
>>    
>>> KVM guests with CONFIG_LOCKDEP=y trigger the following warning:
>>>
>>> BUG: MAX_LOCK_DEPTH too low!
>>> turning off the locking correctness validator.
>>> Pid: 4624, comm: qemu-system-x86 Not tainted 2.6.31-rc2-03981-g3abaf21
>>> #32
>>> Call Trace:
>>>   [<ffffffff81068bab>] __lock_acquire+0x1559/0x15fc
>>>   [<ffffffff810be4d9>] ? mm_take_all_locks+0x99/0x109
>>>   [<ffffffff810be4d9>] ? mm_take_all_locks+0x99/0x109
>>>   [<ffffffff81068d3c>] lock_acquire+0xee/0x112
>>>   [<ffffffff810be516>] ? mm_take_all_locks+0xd6/0x109
>>>   [<ffffffff81402596>] ? _spin_lock_nest_lock+0x20/0x50
>>>   [<ffffffff814025b7>] _spin_lock_nest_lock+0x41/0x50
>>>   [<ffffffff810be516>] ? mm_take_all_locks+0xd6/0x109
>>>   [<ffffffff810be516>] mm_take_all_locks+0xd6/0x109
>>>   [<ffffffff810d0f76>] do_mmu_notifier_register+0xd4/0x199
>>>   [<ffffffff810d1060>] mmu_notifier_register+0x13/0x15
>>>   [<ffffffffa0107f16>] kvm_dev_ioctl+0x13f/0x30e [kvm]
>>>   [<ffffffff810e6a3a>] vfs_ioctl+0x2f/0x7d
>>>   [<ffffffff810e6fb7>] do_vfs_ioctl+0x4af/0x4ec
>>>   [<ffffffff814030b4>] ? error_exit+0x94/0xb0
>>>   [<ffffffff81401f92>] ? trace_hardirqs_off_thunk+0x3a/0x3c
>>>   [<ffffffff8100bc2d>] ? retint_swapgs+0xe/0x13
>>>   [<ffffffff810e703b>] sys_ioctl+0x47/0x6a
>>>   [<ffffffff811d849c>] ? __up_read+0x1a/0x85
>>>   [<ffffffff8100b1db>] system_call_fastpath+0x16/0x1b
>>>
>>> Since mm_take_all_locks takes a gazillion locks.
>>>
>>> Is there any way around this other than completely shutting down lockdep?
>>>      
>>
>> When we created this the promise was that kvm would only do this on a
>> fresh mm with only a few vmas, has that changed?
>
> The number of vmas did increase, but not materially.  We do link with  
> more shared libraries though.

Yeah, see attached /proc/pid/maps just before the ioctl that ends up in 
mmu_notifier_register.

mm_take_all_locks: file_vma=79 anon_vma=40
00400000-005f7000 r-xp 00000000 fe:00 35325942                           /home/marcelo/git/kvm-userspace/qemu/x86_64-softmmu/qemu-system-x86_64
007f7000-007fd000 rw-p 001f7000 fe:00 35325942                           /home/marcelo/git/kvm-userspace/qemu/x86_64-softmmu/qemu-system-x86_64
007fd000-00c2b000 rw-p 00000000 00:00 0                                  [heap]
3694c00000-3694c1d000 r-xp 00000000 fe:00 55902220                       /lib64/ld-2.8.so
3694e1c000-3694e1d000 r--p 0001c000 fe:00 55902220                       /lib64/ld-2.8.so
3694e1d000-3694e1e000 rw-p 0001d000 fe:00 55902220                       /lib64/ld-2.8.so
3695800000-369586b000 r-xp 00000000 fe:00 37007715                       /usr/lib64/libSDL-1.2.so.0.11.2
369586b000-3695a6a000 ---p 0006b000 fe:00 37007715                       /usr/lib64/libSDL-1.2.so.0.11.2
3695a6a000-3695a6d000 rw-p 0006a000 fe:00 37007715                       /usr/lib64/libSDL-1.2.so.0.11.2
3695a6d000-3695a9d000 rw-p 00000000 00:00 0 
3695e00000-3695f62000 r-xp 00000000 fe:00 55902230                       /lib64/libc-2.8.so
3695f62000-3696162000 ---p 00162000 fe:00 55902230                       /lib64/libc-2.8.so
3696162000-3696166000 r--p 00162000 fe:00 55902230                       /lib64/libc-2.8.so
3696166000-3696167000 rw-p 00166000 fe:00 55902230                       /lib64/libc-2.8.so
3696167000-369616c000 rw-p 00000000 00:00 0 
3696200000-3696284000 r-xp 00000000 fe:00 55902232                       /lib64/libm-2.8.so
3696284000-3696483000 ---p 00084000 fe:00 55902232                       /lib64/libm-2.8.so
3696483000-3696484000 r--p 00083000 fe:00 55902232                       /lib64/libm-2.8.so
3696484000-3696485000 rw-p 00084000 fe:00 55902232                       /lib64/libm-2.8.so
3696600000-3696602000 r-xp 00000000 fe:00 55902236                       /lib64/libdl-2.8.so
3696602000-3696802000 ---p 00002000 fe:00 55902236                       /lib64/libdl-2.8.so
3696802000-3696803000 r--p 00002000 fe:00 55902236                       /lib64/libdl-2.8.so
3696803000-3696804000 rw-p 00003000 fe:00 55902236                       /lib64/libdl-2.8.so
3696a00000-3696a16000 r-xp 00000000 fe:00 55902321                       /lib64/libpthread-2.8.so
3696a16000-3696c15000 ---p 00016000 fe:00 55902321                       /lib64/libpthread-2.8.so
3696c15000-3696c16000 r--p 00015000 fe:00 55902321                       /lib64/libpthread-2.8.so
3696c16000-3696c17000 rw-p 00016000 fe:00 55902321                       /lib64/libpthread-2.8.so
3696c17000-3696c1b000 rw-p 00000000 00:00 0 
3697200000-3697215000 r-xp 00000000 fe:00 55902316                       /lib64/libz.so.1.2.3
3697215000-3697414000 ---p 00015000 fe:00 55902316                       /lib64/libz.so.1.2.3
3697414000-3697415000 rw-p 00014000 fe:00 55902316                       /lib64/libz.so.1.2.3
3697e00000-3697e05000 r-xp 00000000 fe:00 37003255                       /usr/lib64/libXdmcp.so.6.0.0
3697e05000-3698004000 ---p 00005000 fe:00 37003255                       /usr/lib64/libXdmcp.so.6.0.0
3698004000-3698005000 rw-p 00004000 fe:00 37003255                       /usr/lib64/libXdmcp.so.6.0.0
3698200000-369821a000 r-xp 00000000 fe:00 37003257                       /usr/lib64/libxcb.so.1.0.0
369821a000-369841a000 ---p 0001a000 fe:00 37003257                       /usr/lib64/libxcb.so.1.0.0
369841a000-369841b000 rw-p 0001a000 fe:00 37003257                       /usr/lib64/libxcb.so.1.0.0
3698600000-3698602000 r-xp 00000000 fe:00 37003253                       /usr/lib64/libXau.so.6.0.0
3698602000-3698801000 ---p 00002000 fe:00 37003253                       /usr/lib64/libXau.so.6.0.0
3698801000-3698802000 rw-p 00001000 fe:00 37003253                       /usr/lib64/libXau.so.6.0.0
3698a00000-3698b06000 r-xp 00000000 fe:00 37003260                       /usr/lib64/libX11.so.6.2.0
3698b06000-3698d05000 ---p 00106000 fe:00 37003260                       /usr/lib64/libX11.so.6.2.0
3698d05000-3698d0b000 rw-p 00105000 fe:00 37003260                       /usr/lib64/libX11.so.6.2.0
3698e00000-3698e01000 r-xp 00000000 fe:00 37003259                       /usr/lib64/libxcb-xlib.so.0.0.0
3698e01000-3699000000 ---p 00001000 fe:00 37003259                       /usr/lib64/libxcb-xlib.so.0.0.0
3699000000-3699001000 rw-p 00000000 fe:00 37003259                       /usr/lib64/libxcb-xlib.so.0.0.0
369f600000-369f611000 r-xp 00000000 fe:00 55902471                       /lib64/libresolv-2.8.so
369f611000-369f811000 ---p 00011000 fe:00 55902471                       /lib64/libresolv-2.8.so
369f811000-369f812000 r--p 00011000 fe:00 55902471                       /lib64/libresolv-2.8.so
369f812000-369f813000 rw-p 00012000 fe:00 55902471                       /lib64/libresolv-2.8.so
369f813000-369f815000 rw-p 00000000 00:00 0 
36a2600000-36a2602000 r-xp 00000000 fe:00 55902548                       /lib64/libutil-2.8.so
36a2602000-36a2801000 ---p 00002000 fe:00 55902548                       /lib64/libutil-2.8.so
36a2801000-36a2802000 r--p 00001000 fe:00 55902548                       /lib64/libutil-2.8.so
36a2802000-36a2803000 rw-p 00002000 fe:00 55902548                       /lib64/libutil-2.8.so
36a4a00000-36a4a09000 r-xp 00000000 fe:00 55902514                       /lib64/libcrypt-2.8.so
36a4a09000-36a4c08000 ---p 00009000 fe:00 55902514                       /lib64/libcrypt-2.8.so
36a4c08000-36a4c09000 r--p 00008000 fe:00 55902514                       /lib64/libcrypt-2.8.so
36a4c09000-36a4c0a000 rw-p 00009000 fe:00 55902514                       /lib64/libcrypt-2.8.so
36a4c0a000-36a4c38000 rw-p 00000000 00:00 0 
36a6200000-36a6221000 r-xp 00000000 fe:00 55902277                       /lib64/libncurses.so.5.6
36a6221000-36a6421000 ---p 00021000 fe:00 55902277                       /lib64/libncurses.so.5.6
36a6421000-36a6422000 rw-p 00021000 fe:00 55902277                       /lib64/libncurses.so.5.6
36aae00000-36aae64000 r-xp 00000000 fe:00 55903029                       /lib64/libgcrypt.so.11.4.3
36aae64000-36ab063000 ---p 00064000 fe:00 55903029                       /lib64/libgcrypt.so.11.4.3
36ab063000-36ab066000 rw-p 00063000 fe:00 55903029                       /lib64/libgcrypt.so.11.4.3
36aba00000-36aba1c000 r-xp 00000000 fe:00 55902546                       /lib64/libtinfo.so.5.6
36aba1c000-36abc1c000 ---p 0001c000 fe:00 55902546                       /lib64/libtinfo.so.5.6
36abc1c000-36abc20000 rw-p 0001c000 fe:00 55902546                       /lib64/libtinfo.so.5.6
36aca00000-36aca19000 r-xp 00000000 fe:00 37003770                       /usr/lib64/libsasl2.so.2.0.22
36aca19000-36acc19000 ---p 00019000 fe:00 37003770                       /usr/lib64/libsasl2.so.2.0.22
36acc19000-36acc1a000 rw-p 00019000 fe:00 37003770                       /usr/lib64/libsasl2.so.2.0.22
3794800000-379487b000 r-xp 00000000 fe:00 3342887                        /usr/lib64/libgnutls.so.13.9.1
379487b000-3794a7a000 ---p 0007b000 fe:00 3342887                        /usr/lib64/libgnutls.so.13.9.1
3794a7a000-3794a85000 rw-p 0007a000 fe:00 3342887                        /usr/lib64/libgnutls.so.13.9.1
3797a00000-3797a07000 r-xp 00000000 fe:00 47317353                       /lib64/librt-2.8.so
3797a07000-3797c07000 ---p 00007000 fe:00 47317353                       /lib64/librt-2.8.so
3797c07000-3797c08000 r--p 00007000 fe:00 47317353                       /lib64/librt-2.8.so
3797c08000-3797c09000 rw-p 00008000 fe:00 47317353                       /lib64/librt-2.8.so
3798200000-3798210000 r-xp 00000000 fe:00 3342886                        /usr/lib64/libtasn1.so.3.0.14
3798210000-379840f000 ---p 00010000 fe:00 3342886                        /usr/lib64/libtasn1.so.3.0.14
379840f000-3798410000 rw-p 0000f000 fe:00 3342886                        /usr/lib64/libtasn1.so.3.0.14
7f7b2d7d5000-7f7b2d7da000 rw-p 00000000 00:00 0 
7f7b2d7da000-7f7b2d7dd000 r-xp 00000000 fe:00 55902991                   /lib64/libgpg-error.so.0.4.0
7f7b2d7dd000-7f7b2d9dc000 ---p 00003000 fe:00 55902991                   /lib64/libgpg-error.so.0.4.0
7f7b2d9dc000-7f7b2d9dd000 rw-p 00002000 fe:00 55902991                   /lib64/libgpg-error.so.0.4.0
7f7b2d9dd000-7f7b2d9e2000 rw-p 00000000 00:00 0 
7f7b2da06000-7f7b2da08000 rw-p 00000000 00:00 0 
7fffe0cdb000-7fffe0cf0000 rw-p 00000000 00:00 0                          [stack]
7fffe0d93000-7fffe0d94000 r-xp 00000000 00:00 0                          [vdso]
ffffffffff600000-ffffffffff601000 r-xp 00000000 00:00 0                  [vsyscall]
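
With lockdep's per-task held-lock table capped at MAX_LOCK_DEPTH (48 in this
era, if memory serves), 79 file-backed mappings plus 40 anon_vmas come to 119
locks held simultaneously, well past what the validator can track, hence the
warning.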
Peter Zijlstra July 7, 2009, 7:04 p.m. UTC | #4
On Tue, 2009-07-07 at 15:37 -0300, Marcelo Tosatti wrote:
> >>>
> >>> Is there any way around this other than completely shutting down lockdep?
> >>>      
> >>
> >> When we created this the promise was that kvm would only do this on a
> >> fresh mm with only a few vmas, has that changed?
> >
> > The number of vmas did increase, but not materially.  We do link with  
> > more shared libraries though.
> 
> Yeah, see attached /proc/pid/maps just before the ioctl that ends up in 
> mmu_notifier_register.
> 
> mm_take_all_locks: file_vma=79 anon_vma=40

Another issue, at about >=256 vmas we'll overflow the preempt count. So
disabling lockdep will only 'fix' this for a short while, until you've
bloated beyond that ;-)

Although you could possibly disable preemption and use
__raw_spin_lock(), that would also side-step the whole lockdep issue,
but it feels like such a horrid hack.

Alternatively we would have to modify the rmap locking, but that would
incur overhead on the regular code paths, so that's probably not worth
the trade-off.

Linus, Ingo, any opinions?

Linus Torvalds July 7, 2009, 7:25 p.m. UTC | #5
On Tue, 7 Jul 2009, Peter Zijlstra wrote:
> 
> Another issue, at about >=256 vmas we'll overflow the preempt count. So
> disabling lockdep will only 'fix' this for a short while, until you've
> bloated beyond that ;-)

We would? 

I don't think so. Sure, we'd "overflow" into the softirq bits, but it's 
all designed to fail very gracefully. Somebody who tests our "status" 
might think we're in softirq context, but that really doesn't matter: we 
still have preemption disabled.
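
For context, the preempt_count layout being discussed looks roughly like the
following (recalled from the 2.6.31-era include/linux/hardirq.h; treat the
exact field widths as an assumption):

	#define PREEMPT_BITS	8	/* preemption-disable depth */
	#define SOFTIRQ_BITS	8	/* softirq nesting count    */
	#define HARDIRQ_BITS	10	/* hardirq nesting count    */

	#define PREEMPT_SHIFT	0
	#define SOFTIRQ_SHIFT	(PREEMPT_SHIFT + PREEMPT_BITS)	/* =  8 */
	#define HARDIRQ_SHIFT	(SOFTIRQ_SHIFT + SOFTIRQ_BITS)	/* = 16 */

Each spin_lock() adds 1 to the 8-bit PREEMPT field, so holding 256 or more
spinlocks carries into the SOFTIRQ field: in_softirq() may then report true
spuriously, but preempt_count() as a whole stays non-zero, so preemption
really does remain disabled.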

> Linus, Ingo, any opinions?

I do think that if lockdep can't handle it, we probably should turn it off 
around it.

I don't think it's broken wrt regular preempt, though.

		Linus

Patch

diff --git a/mm/mmu_notifier.c b/mm/mmu_notifier.c
index 5f4ef02..0c43cae 100644
--- a/mm/mmu_notifier.c
+++ b/mm/mmu_notifier.c
@@ -148,6 +148,8 @@  static int do_mmu_notifier_register(struct mmu_notifier *mn,
 	struct mmu_notifier_mm *mmu_notifier_mm;
 	int ret;
 
+	lockdep_off();
+
 	BUG_ON(atomic_read(&mm->mm_users) <= 0);
 
 	ret = -ENOMEM;
@@ -189,6 +191,7 @@  out_cleanup:
 	kfree(mmu_notifier_mm);
 out:
 	BUG_ON(atomic_read(&mm->mm_users) <= 0);
+	lockdep_on();
 	return ret;
 }
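
For reference, the lockdep_off()/lockdep_on() pair used here bumps a per-task
recursion counter that makes lock_acquire() return early; a rough paraphrase
of the 2.6.31-era kernel/lockdep.c (not the literal code):

	void lockdep_off(void)
	{
		current->lockdep_recursion++;	/* lock_acquire() bails while nonzero */
	}

	void lockdep_on(void)
	{
		current->lockdep_recursion--;
	}

So the patch only blinds lockdep for the current task across
do_mmu_notifier_register(); the locks are still taken and the preempt count
still grows, which is why the >=256-VMA concern raised in the thread is
unaffected by this change.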