diff mbox

[v3] hid-sony: Prevent crash when rumble effects are still loaded at USB disconnect

Message ID 575C4C44.4070401@gmail.com (mailing list archive)
State New, archived
Headers show

Commit Message

Cameron Gutman June 11, 2016, 5:37 p.m. UTC
On 06/11/2016 05:00 AM, Manuel Reimer wrote:
> Hello,
> 
> I did some more testing. Now I added printk messages to start and end of ml_effect_timer and to hl_ff_destroy. Result:
> 
> [  513.493511] ml_effect_timer start
> [  513.746964] ml_effect_timer end
> [  515.107003] hid-sony: Sending to uninitialized device failed!
> [  515.333520] hid-sony: Sending to uninitialized device failed!
> [  515.415381] hid-sony: Sending to uninitialized device failed!
> [  520.476860] ml_effect_timer start
> [  520.677003] BUG: unable to handle kernel NULL pointer dereference at 00000000000000d8
> 
> The hid-sony messages are created by my last patch to fix the hid-sony driver. They show that some sending attempts have been cancelled, as the device is about to be destroyed.
> 
> Quite some time after that there in fact is another attempt to call ml_effect_timer, so the timer still was active. Tomorrow I'll add additional printk lines to the hid-sony destroy function to see if this finished executing before this unwanted timer call arrives.
> 
> This also shows that ml_ff_destroy is not the right place to cancel the timer. ml_ff_destroy is called as soon as I exit fftest. It is not called at all on USB disconnect.
> 
> I now guess this can also be reproduced with the xpad driver, but it requires some fiddling with fftest. It took me ten minutes this time to get the bug triggered. I think the way to trigger the bug is to start effect 5 and shortly after that effect 4. With some luck the USB plug is pulled before event 4 is actually started.
> 

Can you try applying the following patch on a clean source tree and see if it resolves your issue?

--
To unsubscribe from this list: send the line "unsubscribe linux-input" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

Cameron Gutman June 11, 2016, 7:15 p.m. UTC | #1
On 06/11/2016 12:37 PM, Cameron Gutman wrote:
> 
> 
> On 06/11/2016 05:00 AM, Manuel Reimer wrote:
>> Hello,
>>
>> I did some more testing. Now I added printk messages to start and end of ml_effect_timer and to hl_ff_destroy. Result:
>>
>> [  513.493511] ml_effect_timer start
>> [  513.746964] ml_effect_timer end
>> [  515.107003] hid-sony: Sending to uninitialized device failed!
>> [  515.333520] hid-sony: Sending to uninitialized device failed!
>> [  515.415381] hid-sony: Sending to uninitialized device failed!
>> [  520.476860] ml_effect_timer start
>> [  520.677003] BUG: unable to handle kernel NULL pointer dereference at 00000000000000d8
>>
>> The hid-sony messages are created by my last patch to fix the hid-sony driver. They show that some sending attempts have been cancelled, as the device is about to be destroyed.
>>
>> Quite some time after that there in fact is another attempt to call ml_effect_timer, so the timer still was active. Tomorrow I'll add additional printk lines to the hid-sony destroy function to see if this finished executing before this unwanted timer call arrives.
>>
>> This also shows that ml_ff_destroy is not the right place to cancel the timer. ml_ff_destroy is called as soon as I exit fftest. It is not called at all on USB disconnect.
>>
>> I now guess this can also be reproduced with the xpad driver, but it requires some fiddling with fftest. It took me ten minutes this time to get the bug triggered. I think the way to trigger the bug is to start effect 5 and shortly after that effect 4. With some luck the USB plug is pulled before event 4 is actually started.
>>
> 
> Can you try applying the following patch on a clean source tree and see if it resolves your issue?
> 
> diff --git a/drivers/input/ff-memless.c b/drivers/input/ff-memless.c
> index fcc6c33..6366e9a 100644
> --- a/drivers/input/ff-memless.c
> +++ b/drivers/input/ff-memless.c
> @@ -501,6 +501,7 @@ static void ml_ff_destroy(struct ff_device *ff)
>  {
>         struct ml_device *ml = ff->private;
>  
> +       del_timer_sync(&ml->timer);
>         kfree(ml->private);
>  }
> 

Sorry, I misremembered what I thought I read in your email - you said you didn't see a call to ml_ff_destroy()
in the disconnect case.

Though that is quite strange to me, as the comments on ff_device indicate that it should be the right spot:

483  * @destroy: called by input core when parent input device is being
484  *      destroyed

The input core will call input_ff_destroy() immediately when an input device's release function is called, which
happens when input_unregister_device() is called. Xpad calls that almost immediately when xpad_disconnect() is
called. The HID case is a bit more complex but seems to make sense too: sony_remove() -> hid_hw_stop() ->
hid_disconnect() -> hidinput_disconnect() -> input_unregister_device().

My concern is that, as we thought before, hid_hw_stop() is being called too late in the teardown process in
the hid-sony module. This means input_ff_destroy() doesn't get called until after all the state needed to
send FF events is already freed, leaving the window where you oops before your ml_ff_destroy() printk fires.

I want to be sure we're on the right track though. Can you please resolve the address in the RIP register to a
line number in hid-sony.c?


Cameron
--
To unsubscribe from this list: send the line "unsubscribe linux-input" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Manuel Reimer June 12, 2016, 10:01 a.m. UTC | #2
On 06/11/2016 09:15 PM, Cameron Gutman wrote:
> Sorry, I misremembered what I thought I read in your email - you said you didn't see a call to ml_ff_destroy()
> in the disconnect case.

I also don't see this call with the xpad driver!

> Though that is quite strange to me, as the comments on ff_device indicate that it should be the right spot:
>
> 483  * @destroy: called by input core when parent input device is being
> 484  *      destroyed

What does "input device is being destroyed" actually mean in this case?

I tried with the xpad driver and as long as fftest is running (it will 
keep running until it detects that the device has been disconnected) 
there is no ml_ff_destroy event. So, if I want, there can be hours 
between actual USB disconnect and ml_ff_destroy.

So what does it mean to the kernel that fftest is keeping an open file 
handle to this input device? Is it really gone if there is at least one 
process still holding a reference to it?


And. It actually seems possible to trigger the bug with the xpad driver. 
I didn't have the crash, but there is a good chance to get it. This is 
what I managed to get:

[  267.006873] ml_effect_timer start
[  267.256399] ml_effect_timer end
[  268.286814] xpad 1-2:1.0: xpad_try_sending_next_out_packet - 
usb_submit_urb failed with result -19
[  268.331112] xpad 1-2:1.0: xpad_try_sending_next_out_packet - 
usb_submit_urb failed with result -19
[  268.384689] xpad 1-2:1.0: xpad_try_sending_next_out_packet - 
usb_submit_urb failed with result -19
[  268.452146] xpad 1-2:1.0: xpad_try_sending_next_out_packet - 
usb_submit_urb failed with result -19
[  273.813537] ml_effect_timer start
[  273.816845] usb ȵ�<: xpad_try_sending_next_out_packet - 
usb_submit_urb failed with result -19
[  274.400668] ml_effect_timer end


Starting with 268.286814 the USB device is gone! That's why the xpad 
driver is throwing errors.

Some time later at 273.813537 (USB device is gone!) there is a new call 
to the timer callback. "Unfortunately" no crash.

> I want to be sure we're on the right track though. Can you please resolve the address in the RIP register to a
> line number in hid-sony.c?

Can you point me to some instructions on how to do this? The only way to 
debug things, I currently know, is the "good old print(f/k)" debugging...

Manuel
--
To unsubscribe from this list: send the line "unsubscribe linux-input" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Manuel Reimer June 14, 2016, 6:31 p.m. UTC | #3
On 06/12/2016 03:56 PM, Antonio Ospite wrote:
> On Sun, 12 Jun 2016 12:01:31 +0200
> Manuel Reimer <mail+linux-input@m-reimer.de> wrote:
>
>> On 06/11/2016 09:15 PM, Cameron Gutman wrote:
> [...]
>>> I want to be sure we're on the right track though. Can you please resolve the address in the RIP register to a
>>> line number in hid-sony.c?
>>
>> Can you point me to some instructions on how to do this? The only way to
>> debug things, I currently know, is the "good old print(f/k)" debugging...
>>
>
> The return addresses are in the trace from the Oops, you can resolve
> them with objdump, see:
> https://ao2.it/en/blog/2010/11/02/bug-hunting-linux-kernel-land-unpretentious-primer
>
> Ciao,
>    Antonio

Thank you for this link and sorry for the late response.

This is my latest crash (abbreviated):

[  183.865163] ml_ff_playback start. Effect: 1 Value: 1
[  184.897065] ml_ff_playback start. Effect: 0 Value: 0
[  184.900069] hid-sony: Sending to uninitialized device failed!
[  184.956432] ml_ff_playback start. Effect: 1 Value: 0
[  184.957933] hid-sony: Sending to uninitialized device failed!
[  185.023015] ml_ff_playback start. Effect: 2 Value: 0
[  185.026143] hid-sony: Sending to uninitialized device failed!
[  190.863508] ml_effect_timer start
[  190.866803] BUG: unable to handle kernel NULL pointer dereference at 
00000000000000d8
[  190.866803] IP: [<ffffffffa037f33d>] sony_play_effect+0x2d/0x90 
[hid_sony]

Also contains some of my printk messages, so we now can actually see why 
the last sending attempts are happening (attempt to stop the effects).

My sony_play_effect is at 0x310 and with 0x2d added we are at 0x33d.

In my case:
      339:	4c 8d 63 04          	lea    0x4(%rbx),%r12
sony_play_effect():
/home/manuel/kernel/4.5.4/src/b/drivers/hid/hid-sony.c:1978
      33d:	88 83 d8 00 00 00    	mov    %al,0xd8(%rbx)
/home/manuel/kernel/4.5.4/src/b/drivers/hid/hid-sony.c:1979
	sc->right = effect->u.rumble.weak_magnitude / 256;
      343:	0f b6 42 13          	movzbl 0x13(%rdx),%eax
sony_schedule_work():
/home/manuel/kernel/4.5.4/src/b/drivers/hid/hid-sony.c:1059


So if I interpret this right, then the source for the crash is the 
assignment of the weak magnitude value to sc->right? Maybe sc already 
freed/zeroed?

Manuel
--
To unsubscribe from this list: send the line "unsubscribe linux-input" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/drivers/input/ff-memless.c b/drivers/input/ff-memless.c
index fcc6c33..6366e9a 100644
--- a/drivers/input/ff-memless.c
+++ b/drivers/input/ff-memless.c
@@ -501,6 +501,7 @@  static void ml_ff_destroy(struct ff_device *ff)
 {
        struct ml_device *ml = ff->private;
 
+       del_timer_sync(&ml->timer);
        kfree(ml->private);
 }