diff mbox

usbhid: Fix lockdep unannotated irqs-off warning

Message ID 1443427804-2957-1-git-send-email-sedat.dilek@gmail.com (mailing list archive)
State New, archived
Delegated to: Jiri Kosina
Headers show

Commit Message

Sedat Dilek Sept. 28, 2015, 8:10 a.m. UTC
When compiling Linux v4.2+ and v4.3-rc2+ with a llvmlinux patchset
and CLANG v3.7 I see a BUG line like this:

[   24.705463] BUG: sleeping function called from invalid context at kernel/workqueue.c:2680
[   24.705576] in_atomic(): 0, irqs_disabled(): 1, pid: 1447, name: acpid

After some vital help from workqueue and hid folks it turned out to be
a problem in the hid area.

Jiri encouraged me to look into del_timer-sync()/cancel_work_sync().
So, I disassembled kernel/time/timer.o.
This looked good.

Both functions are called in hid_cancel_delayed_stuff().
Adding a might_sleep() on top of it showed hints towards lockdep and
unannotated irqs-off.

[   23.723807] WARNING: CPU: 0 PID: 1451 at kernel/locking/lockdep.c:3519 check_flags+0x6c/0x1b0()
[   23.723866] DEBUG_LOCKS_WARN_ON(current->hardirqs_enabled)
...
[   23.774719] possible reason: unannotated irqs-off.

Dealing with irqflags-tracing I enabled some helpful kernel-options:

CONFIG_LOCKDEP=y
CONFIG_DEBUG_LOCKDEP=y
CONFIG_TRACING=y
CONFIG_TRACE_IRQFLAGS=y
CONFIG_IRQSOFF_TRACER=y

The file irqflags-tracing.txt was helpful.
Steven Rostedt pointed me to the irqsoff section in ftrace.txt documentation.

Then, I tried to force to turn off the tracing of hardirqs in usbhid_close()
via trace_hardirqs_off() - line by line.
Finally, the spin_unlock_irq() within the if-statement looked like the culprit.

Looking through commits having a similiar issue, I replaced spin_{un}lock_irq()
with spin_{un}lock_bh() as this doesn't need interrupts disabled.

This fixed the issue for me.

Tested against Linux v4.3-rc3.

Still unsure why the same kernel built with GCC v4.9 does not show this.

Thanks to all involved people.

[1] http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/Documentation/irqflags-tracing.txt
[2] http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/Documentation/trace/ftrace.txt#n906

CC: Jiri Kosina <jikos@kernel.org>
CC: linux-input@vger.kernel.org
CC: Tejun Heo <tj@kernel.org>
CC: Lai Jiangshan <jiangshanlai@gmail.com>
CC: Steven Rostedt <rostedt@goodmis.org>
CC: Paul McKenney <paulmck@linux.vnet.ibm.com>

---
 drivers/hid/usbhid/hid-core.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

Comments

Jiri Kosina Sept. 28, 2015, 11:33 a.m. UTC | #1
On Mon, 28 Sep 2015, Sedat Dilek wrote:

> When compiling Linux v4.2+ and v4.3-rc2+ with a llvmlinux patchset
> and CLANG v3.7 I see a BUG line like this:
> 
> [   24.705463] BUG: sleeping function called from invalid context at kernel/workqueue.c:2680
> [   24.705576] in_atomic(): 0, irqs_disabled(): 1, pid: 1447, name: acpid
> 
> After some vital help from workqueue and hid folks it turned out to be
> a problem in the hid area.
> 
> Jiri encouraged me to look into del_timer-sync()/cancel_work_sync().
> So, I disassembled kernel/time/timer.o.
> This looked good.
> 
> Both functions are called in hid_cancel_delayed_stuff().

Yeah, but we're enabling IRQs before calling hid_cancel_delayed_stuff() 
(or, to be more precise, we're restoring original flags, and I don't see 
usbhid_close() being called with IRQs off).

Therefore ...

[ ... snip ... ]
> diff --git a/drivers/hid/usbhid/hid-core.c b/drivers/hid/usbhid/hid-core.c
> index 36712e9f56c2..188f59348ec5 100644
> --- a/drivers/hid/usbhid/hid-core.c
> +++ b/drivers/hid/usbhid/hid-core.c
> @@ -729,16 +729,16 @@ void usbhid_close(struct hid_device *hid)
>  	 * data acquistion due to a resumption we no longer
>  	 * care about
>  	 */
> -	spin_lock_irq(&usbhid->lock);
> +	spin_lock_bh(&usbhid->lock);
>  	if (!--hid->open) {
> -		spin_unlock_irq(&usbhid->lock);
> +		spin_unlock_bh(&usbhid->lock);
>  		hid_cancel_delayed_stuff(usbhid);

I still don't understand how this should be improving anything. I believe 
spin_unlock_irq() should just re-enable interrupts, because we've been 
called with them enabled as well.

Now if you are able to see how usbhid_close() can be called with IRQs 
off, that would be a completely different story. But if that's not the 
case, the warning is bogus, and gcc-compiled kernels are right about not 
issuing it.

But without that, I so far fail to see how this is a correct thing to do.

Thanks,
Sedat Dilek Sept. 29, 2015, 8:40 a.m. UTC | #2
On Mon, Sep 28, 2015 at 1:33 PM, Jiri Kosina <jikos@kernel.org> wrote:
> On Mon, 28 Sep 2015, Sedat Dilek wrote:
>
>> When compiling Linux v4.2+ and v4.3-rc2+ with a llvmlinux patchset
>> and CLANG v3.7 I see a BUG line like this:
>>
>> [   24.705463] BUG: sleeping function called from invalid context at kernel/workqueue.c:2680
>> [   24.705576] in_atomic(): 0, irqs_disabled(): 1, pid: 1447, name: acpid
>>
>> After some vital help from workqueue and hid folks it turned out to be
>> a problem in the hid area.
>>
>> Jiri encouraged me to look into del_timer-sync()/cancel_work_sync().
>> So, I disassembled kernel/time/timer.o.
>> This looked good.
>>
>> Both functions are called in hid_cancel_delayed_stuff().
>

Hi Jiri,

first of all thanky ou for your fast and informative reply.

I am not a LOCKING expert or compiler guru, so a lot of stuff is new
to me and I try to understand things by testing.

> Yeah, but we're enabling IRQs before calling hid_cancel_delayed_stuff()
> (or, to be more precise, we're restoring original flags, and I don't see
> usbhid_close() being called with IRQs off).
>

2nd this issue is really serious for me.
When doing a cut and paste in Firefox, my Xorg restarts and I am
confronted with my LightDM login-manager.
So... not really a working environment.

Did you look at the step-by-step moving of trace_hardirqs_off() and
the corresponding dmesg-logs?
What helps is a trace_hardirqs_off() before spin_unlock_irq() in the
if-statement.
So, yes "IRQs are enabled" but tracing does not like it.

> Therefore ...
>
> [ ... snip ... ]
>> diff --git a/drivers/hid/usbhid/hid-core.c b/drivers/hid/usbhid/hid-core.c
>> index 36712e9f56c2..188f59348ec5 100644
>> --- a/drivers/hid/usbhid/hid-core.c
>> +++ b/drivers/hid/usbhid/hid-core.c
>> @@ -729,16 +729,16 @@ void usbhid_close(struct hid_device *hid)
>>        * data acquistion due to a resumption we no longer
>>        * care about
>>        */
>> -     spin_lock_irq(&usbhid->lock);
>> +     spin_lock_bh(&usbhid->lock);
>>       if (!--hid->open) {
>> -             spin_unlock_irq(&usbhid->lock);
>> +             spin_unlock_bh(&usbhid->lock);
>>               hid_cancel_delayed_stuff(usbhid);
>
> I still don't understand how this should be improving anything. I believe
> spin_unlock_irq() should just re-enable interrupts, because we've been
> called with them enabled as well.
>

Is spin_lock_bh() not an appropriate replacement?
Does it change code beaviour?
Is it important to re-enable IRQs here - before hid_cancel_delayed_stuff()?
Turning hardirqs off seems to make the bug-line go away.

> Now if you are able to see how usbhid_close() can be called with IRQs
> off, that would be a completely different story. But if that's not the
> case, the warning is bogus, and gcc-compiled kernels are right about not
> issuing it.
>

Again, I am new to tracing.
Steven encouraged me to look at the lockdep hints in dmesg - not ftrace [1].

"Actually, if you are looking for where interrupts were disabled last
before triggering the "sleeping function called from invalid context",
lockdep, not ftrace, would be your better bet.

Enable lockdep with CONFIG_PROVE_LOCKING. It will give you better
information about where the last irq was disabled."

Here, I have CONFIG_PROVE_LOCKING=y.

I am doing a new kernel-build with the might_sleep() on top of
hid_cancel_delayed_stuff() which showed some lockdep/irqsoff hints in
dmesg-log.

> But without that, I so far fail to see how this is a correct thing to do.

Again, I dunno why two compiler behave different here.
Unsure if it is a compiler or linux-kernel issue or whatever.

Still fighting...

- Sedat -

[1] http://marc.info/?l=linux-kernel&m=144337272915104&w=2
--
To unsubscribe from this list: send the line "unsubscribe linux-input" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Steven Rostedt Sept. 29, 2015, 1:13 p.m. UTC | #3
On Tue, 29 Sep 2015 11:06:25 +0200
Sedat Dilek <sedat.dilek@gmail.com> wrote:

> > Is spin_lock_bh() not an appropriate replacement?

I agree with Jiri, your patch makes no sense. It's strange that it
would fix anything.

> > Does it change code beaviour?

Yes.

> > Is it important to re-enable IRQs here - before hid_cancel_delayed_stuff()?
> > Turning hardirqs off seems to make the bug-line go away.
> >
> >> Now if you are able to see how usbhid_close() can be called with IRQs
> >> off, that would be a completely different story. But if that's not the
> >> case, the warning is bogus, and gcc-compiled kernels are right about not
> >> issuing it.
> >>
> >
> > Again, I am new to tracing.
> > Steven encouraged me to look at the lockdep hints in dmesg - not ftrace [1].

Because lockdep will show you where interrupts were last disabled (see
below).

> >
> > "Actually, if you are looking for where interrupts were disabled last
> > before triggering the "sleeping function called from invalid context",
> > lockdep, not ftrace, would be your better bet.
> >
> > Enable lockdep with CONFIG_PROVE_LOCKING. It will give you better
> > information about where the last irq was disabled."
> >
> > Here, I have CONFIG_PROVE_LOCKING=y.
> >
> > I am doing a new kernel-build with the might_sleep() on top of
> > hid_cancel_delayed_stuff() which showed some lockdep/irqsoff hints in
> > dmesg-log.
> >
> >> But without that, I so far fail to see how this is a correct thing to do.
> >
> > Again, I dunno why two compiler behave different here.
> > Unsure if it is a compiler or linux-kernel issue or whatever.
> >
> > Still fighting...
> >
> > - Sedat -
> >
> > [1] http://marc.info/?l=linux-kernel&m=144337272915104&w=2
> 
> As promised my dmesg-log (I did not apply above patch but the attached one)...
> 
> [   24.100648] ------------[ cut here ]------------
> [   24.100723] WARNING: CPU: 3 PID: 1400 at
> kernel/locking/lockdep.c:3519 check_flags+0x6c/0x1b0()
> [   24.100833] DEBUG_LOCKS_WARN_ON(current->hardirqs_enabled)
> [   24.100903] Modules linked in: option cdc_ether usbnet usb_wwan
> usbserial i915 arc4 iwldvm mac80211 snd_hda_codec_hdmi
> snd_hda_codec_realtek snd_hda_codec_generic snd_hda_intel
> snd_hda_codec bnep i2c_algo_bit rfcomm snd_hwdep snd_hda_core
> drm_kms_helper snd_pcm parport_pc snd_seq_midi snd_seq_midi_event
> snd_rawmidi uvcvideo videobuf2_vmalloc videobuf2_memops videobuf2_core
> iwlwifi v4l2_common videodev btusb ppdev joydev btrtl btbcm kvm_intel
> cfg80211 kvm btintel bluetooth usb_storage snd_seq syscopyarea
> sysfillrect sysimgblt psmouse fb_sys_fops drm snd_timer snd_seq_device
> snd serio_raw soundcore samsung_laptop video lpc_ich wmi intel_rst
> mac_hid lp parport binfmt_misc hid_generic usbhid hid r8169 mii
> [   24.102047] CPU: 3 PID: 1400 Comm: acpid Not tainted
> 4.3.0-rc3-6-llvmlinux-amd64 #1

Question: Does this happen if you compile with gcc? If not, this could
be a compiler bug.

> [   24.102142] Hardware name: SAMSUNG ELECTRONICS CO., LTD.
> 530U3BI/530U4BI/530U4BH/530U3BI/530U4BI/530U4BH, BIOS 13XK 03/28/2013
> [   24.102281]  ffffffff81c5a6ef ffff8800bc48bc18 0000000000000003
> ffff8800bc48bb88
> [   24.102394]  ffffffff8149cabd ffff8800bc48bbc8 ffffffff810907c4
> 00000009bc48bc58
> [   24.102479]  ffffffff81c4614f ffff880118b001c0 ffff8800ba929000
> 0000000000000dbf
> [   24.102542] Call Trace:
> [   24.102567]  [<ffffffff8149cabd>] dump_stack+0x7d/0xa0
> [   24.102606]  [<ffffffff810907c4>] warn_slowpath_common+0x94/0x120
> [   24.102648]  [<ffffffff81090721>] warn_slowpath_fmt+0x61/0x70
> [   24.102690]  [<ffffffff81929e09>] ? __mutex_lock_common+0x5c9/0x8a0
> [   24.102735]  [<ffffffff810f0b7c>] check_flags+0x6c/0x1b0
> [   24.102772]  [<ffffffff810f3ed4>] lock_is_held+0x44/0xc0
> [   24.102810]  [<ffffffff810cd35e>] ___might_sleep+0x3e/0x2a0
> [   24.102849]  [<ffffffff810cd2af>] __might_sleep+0x4f/0xc0
> [   24.102888]  [<ffffffffa00563d0>] usbhid_close+0x70/0xc0 [usbhid]
> [   24.102933]  [<ffffffffa00394d1>] hidinput_close+0x31/0x40 [hid]
> [   24.104797]  [<ffffffffa00394a0>] ? hidinput_open+0x40/0x40 [hid]
> [   24.107960]  [<ffffffff817440c8>] input_close_device+0x48/0x70
> [   24.111139]  [<ffffffff8174c9a6>] evdev_release+0xd6/0xf0
> [   24.114298]  [<ffffffff81274367>] __fput+0x107/0x240
> [   24.117400]  [<ffffffff812741f6>] ____fput+0x16/0x20
> [   24.120450]  [<ffffffff810b97bc>] task_work_run+0x6c/0xe0
> [   24.123473]  [<ffffffff81003b8a>] prepare_exit_to_usermode+0x13a/0x140
> [   24.126515]  [<ffffffff81003e11>] syscall_return_slowpath+0x281/0x2f0
> [   24.129545]  [<ffffffff81270a45>] ? filp_close+0x65/0x90
> [   24.132500]  [<ffffffff810f02d9>] ? trace_hardirqs_on_caller+0x19/0x290
> [   24.135463]  [<ffffffff81003017>] ? trace_hardirqs_on_thunk+0x17/0x19
> [   24.138400]  [<ffffffff8192d9e2>] int_ret_from_sys_call+0x25/0x9f
> [   24.141364] ---[ end trace 615e247c8083b603 ]---
> [   24.144296] possible reason: unannotated irqs-off.
> [   24.147183] irq event stamp: 3227
> [   24.150067] hardirqs last  enabled at (3227): [<ffffffff8192ccd2>]
> _raw_spin_unlock_irq+0x32/0x60
> [   24.152440] hardirqs last disabled at (3226): [<ffffffff8192cb24>]
> _raw_spin_lock_irq+0x24/0x70

The above two lines is what I wanted to see. The (3227) and (3226) show
the order of events. According to lockdep, hardirqs were last disabled
at _raw_spin_lock_irq, and last enabled at _raw_spin_unlock_irq.
According to lockdep, irqs should be enabled. But apparently, they are
not. Unless something disabled interrupts without lockdep knowing about
it.

-- Steve


> [   24.155429] softirqs last  enabled at (2684): [<ffffffff818b3069>]
> local_bh_enable+0x9/0x20
> [   24.157660] softirqs last disabled at (2682): [<ffffffff818b3049>]
> local_bh_disable+0x9/0x20
> [   24.160601] BUG: sleeping function called from invalid context at
> drivers/hid/usbhid/hid-core.c:1395
> [   24.163592] in_atomic(): 0, irqs_disabled(): 1, pid: 1400, name: acpid
> [   24.165978] INFO: lockdep is turned off.
> [   24.168739] irq event stamp: 3227
> [   24.171280] hardirqs last  enabled at (3227): [<ffffffff8192ccd2>]
> _raw_spin_unlock_irq+0x32/0x60
> [   24.174084] hardirqs last disabled at (3226): [<ffffffff8192cb24>]
> _raw_spin_lock_irq+0x24/0x70
> [   24.177052] softirqs last  enabled at (2684): [<ffffffff818b3069>]
> local_bh_enable+0x9/0x20
> [   24.179976] softirqs last disabled at (2682): [<ffffffff818b3049>]
> local_bh_disable+0x9/0x20
> [   24.182659] CPU: 3 PID: 1400 Comm: acpid Tainted: G        W
> 4.3.0-rc3-6-llvmlinux-amd64 #1
> [   24.184189] Hardware name: SAMSUNG ELECTRONICS CO., LTD.
> 530U3BI/530U4BI/530U4BH/530U3BI/530U4BI/530U4BH, BIOS 13XK 03/28/2013
> [   24.185728]  ffff8800ba929000 0000000000000092 0000000000000000
> ffff8800bc48bc88
> [   24.187213]  ffffffff8149cabd ffff8800bc48bcb8 ffffffff810cd5aa
> ffffffffa005f80c
> [   24.189039]  ffff880118b001c0 0000000000000000 0000000000000573
> ffff8800bc48bcf8
> [   24.191538] Call Trace:
> [   24.193955]  [<ffffffff8149cabd>] dump_stack+0x7d/0xa0
> [   24.196305]  [<ffffffff810cd5aa>] ___might_sleep+0x28a/0x2a0
> [   24.198316]  [<ffffffff810cd2af>] __might_sleep+0x4f/0xc0
> [   24.199798]  [<ffffffffa00563d0>] usbhid_close+0x70/0xc0 [usbhid]
> [   24.201181]  [<ffffffffa00394d1>] hidinput_close+0x31/0x40 [hid]
> [   24.202609]  [<ffffffffa00394a0>] ? hidinput_open+0x40/0x40 [hid]
> [   24.204045]  [<ffffffff817440c8>] input_close_device+0x48/0x70
> [   24.205453]  [<ffffffff8174c9a6>] evdev_release+0xd6/0xf0
> [   24.207352]  [<ffffffff81274367>] __fput+0x107/0x240
> [   24.209681]  [<ffffffff812741f6>] ____fput+0x16/0x20
> [   24.211276]  [<ffffffff810b97bc>] task_work_run+0x6c/0xe0
> [   24.212723]  [<ffffffff81003b8a>] prepare_exit_to_usermode+0x13a/0x140
> [   24.215069]  [<ffffffff81003e11>] syscall_return_slowpath+0x281/0x2f0
> [   24.217423]  [<ffffffff81270a45>] ? filp_close+0x65/0x90
> [   24.218919]  [<ffffffff810f02d9>] ? trace_hardirqs_on_caller+0x19/0x290
> [   24.220271]  [<ffffffff81003017>] ? trace_hardirqs_on_thunk+0x17/0x19
> [   24.222020]  [<ffffffff8192d9e2>] int_ret_from_sys_call+0x25/0x9f
> 
> Do you need more informations?
> 
> - Sedat -

--
To unsubscribe from this list: send the line "unsubscribe linux-input" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/drivers/hid/usbhid/hid-core.c b/drivers/hid/usbhid/hid-core.c
index 36712e9f56c2..188f59348ec5 100644
--- a/drivers/hid/usbhid/hid-core.c
+++ b/drivers/hid/usbhid/hid-core.c
@@ -729,16 +729,16 @@  void usbhid_close(struct hid_device *hid)
 	 * data acquistion due to a resumption we no longer
 	 * care about
 	 */
-	spin_lock_irq(&usbhid->lock);
+	spin_lock_bh(&usbhid->lock);
 	if (!--hid->open) {
-		spin_unlock_irq(&usbhid->lock);
+		spin_unlock_bh(&usbhid->lock);
 		hid_cancel_delayed_stuff(usbhid);
 		if (!(hid->quirks & HID_QUIRK_ALWAYS_POLL)) {
 			usb_kill_urb(usbhid->urbin);
 			usbhid->intf->needs_remote_wakeup = 0;
 		}
 	} else {
-		spin_unlock_irq(&usbhid->lock);
+		spin_unlock_bh(&usbhid->lock);
 	}
 	mutex_unlock(&hid_open_mut);
 }