diff mbox

[v2] hid: usbhid: hid-core: fix recursive deadlock

Message ID 1448050742-10777-1-git-send-email-adi@adirat.com (mailing list archive)
State New, archived
Delegated to: Jiri Kosina
Headers show

Commit Message

Adi Ratiu Nov. 20, 2015, 8:19 p.m. UTC
The critical section protected by usbhid->lock in hid_ctrl() is too
big and because of this it causes a recursive deadlock. "Too big" means
the case statement and the call to hid_input_report() do not need to be
protected by the spinlock (no URB operations are done inside them).

The deadlock happens because in certain rare cases drivers try to grab
the lock while handling the ctrl irq which grabs the lock before them
as described above. For example newer wacom tablets like 056a:033c try
to reschedule proximity reads from wacom_intuos_schedule_prox_event()
calling hid_hw_request() -> usbhid_request() -> usbhid_submit_report()
which tries to grab the usbhid lock already held by hid_ctrl().

There are two ways to get out of this deadlock:
    1. Make the drivers work "around" the ctrl critical region, in the
    wacom case for ex. by delaying the scheduling of the proximity read
    request itself to a workqueue.
    2. Shrink the critical region so the usbhid lock protects only the
    instructions which modify usbhid state, calling hid_input_report()
    with the spinlock unlocked, allowing the device driver to grab the
    lock first, finish and then grab the lock afterwards in hid_ctrl().

This patch implements the 2nd solution.

Signed-off-by: Ioan-Adrian Ratiu <adi@adirat.com>
---
 drivers/hid/usbhid/hid-core.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

Comments

Adi Ratiu Nov. 29, 2015, 10:29 a.m. UTC | #1
On Fri, 20 Nov 2015 22:19:02 +0200
Ioan-Adrian Ratiu <adi@adirat.com> wrote:

> The critical section protected by usbhid->lock in hid_ctrl() is too
> big and because of this it causes a recursive deadlock. "Too big" means
> the case statement and the call to hid_input_report() do not need to be
> protected by the spinlock (no URB operations are done inside them).
> 
> The deadlock happens because in certain rare cases drivers try to grab
> the lock while handling the ctrl irq which grabs the lock before them
> as described above. For example newer wacom tablets like 056a:033c try
> to reschedule proximity reads from wacom_intuos_schedule_prox_event()
> calling hid_hw_request() -> usbhid_request() -> usbhid_submit_report()
> which tries to grab the usbhid lock already held by hid_ctrl().
> 
> There are two ways to get out of this deadlock:
>     1. Make the drivers work "around" the ctrl critical region, in the
>     wacom case for ex. by delaying the scheduling of the proximity read
>     request itself to a workqueue.
>     2. Shrink the critical region so the usbhid lock protects only the
>     instructions which modify usbhid state, calling hid_input_report()
>     with the spinlock unlocked, allowing the device driver to grab the
>     lock first, finish and then grab the lock afterwards in hid_ctrl().
> 
> This patch implements the 2nd solution.
> 
> Signed-off-by: Ioan-Adrian Ratiu <adi@adirat.com>
> ---
>  drivers/hid/usbhid/hid-core.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/hid/usbhid/hid-core.c b/drivers/hid/usbhid/hid-core.c
> index 36712e9..5dd426f 100644
> --- a/drivers/hid/usbhid/hid-core.c
> +++ b/drivers/hid/usbhid/hid-core.c
> @@ -477,8 +477,6 @@ static void hid_ctrl(struct urb *urb)
>  	struct usbhid_device *usbhid = hid->driver_data;
>  	int unplug = 0, status = urb->status;
>  
> -	spin_lock(&usbhid->lock);
> -
>  	switch (status) {
>  	case 0:			/* success */
>  		if (usbhid->ctrl[usbhid->ctrltail].dir == USB_DIR_IN)
> @@ -498,6 +496,8 @@ static void hid_ctrl(struct urb *urb)
>  		hid_warn(urb->dev, "ctrl urb status %d received\n", status);
>  	}
>  
> +	spin_lock(&usbhid->lock);
> +
>  	if (unplug) {
>  		usbhid->ctrltail = usbhid->ctrlhead;
>  	} else {

Hello again

Can this please be merged in the 4.4? There are quite a few people who have
their tablets deadlock and don't know how to patch their kernels so are stuck
waiting for a new release.

The severity of this issue is much bigger than I initially thought. Since I've
posted on the wacom mailing list that I have a fix for this deadlock I've
recived lots of email from people complaining of the same problem on a wide
range of tablets.

Some of those people know how to patch a kernel, some found this patch on the
mailing list and tested it and confirmed that it works on the wacom mailing
list (you can verify the deadlock + fix on that mailing list).

Thank you,
Adrian
--
To unsubscribe from this list: send the line "unsubscribe linux-input" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Jiri Kosina Dec. 1, 2015, 4:36 p.m. UTC | #2
On Fri, 20 Nov 2015, Ioan-Adrian Ratiu wrote:

> The critical section protected by usbhid->lock in hid_ctrl() is too
> big and because of this it causes a recursive deadlock. "Too big" means
> the case statement and the call to hid_input_report() do not need to be
> protected by the spinlock (no URB operations are done inside them).
> 
> The deadlock happens because in certain rare cases drivers try to grab
> the lock while handling the ctrl irq which grabs the lock before them
> as described above. For example newer wacom tablets like 056a:033c try
> to reschedule proximity reads from wacom_intuos_schedule_prox_event()
> calling hid_hw_request() -> usbhid_request() -> usbhid_submit_report()
> which tries to grab the usbhid lock already held by hid_ctrl().
> 
> There are two ways to get out of this deadlock:
>     1. Make the drivers work "around" the ctrl critical region, in the
>     wacom case for ex. by delaying the scheduling of the proximity read
>     request itself to a workqueue.
>     2. Shrink the critical region so the usbhid lock protects only the
>     instructions which modify usbhid state, calling hid_input_report()
>     with the spinlock unlocked, allowing the device driver to grab the
>     lock first, finish and then grab the lock afterwards in hid_ctrl().
> 
> This patch implements the 2nd solution.
> 
> Signed-off-by: Ioan-Adrian Ratiu <adi@adirat.com>

Applied to for-4.5/core branch. Thanks,
Ping Cheng Dec. 2, 2015, midnight UTC | #3
Hi Jiri,

On Tue, Dec 1, 2015 at 8:36 AM, Jiri Kosina <jikos@kernel.org> wrote:
> On Fri, 20 Nov 2015, Ioan-Adrian Ratiu wrote:
>
>> The critical section protected by usbhid->lock in hid_ctrl() is too
>> big and because of this it causes a recursive deadlock. "Too big" means
>> the case statement and the call to hid_input_report() do not need to be
>> protected by the spinlock (no URB operations are done inside them).
>>
>> The deadlock happens because in certain rare cases drivers try to grab
>> the lock while handling the ctrl irq which grabs the lock before them
>> as described above. For example newer wacom tablets like 056a:033c try
>> to reschedule proximity reads from wacom_intuos_schedule_prox_event()
>> calling hid_hw_request() -> usbhid_request() -> usbhid_submit_report()
>> which tries to grab the usbhid lock already held by hid_ctrl().
>>
>> There are two ways to get out of this deadlock:
>>     1. Make the drivers work "around" the ctrl critical region, in the
>>     wacom case for ex. by delaying the scheduling of the proximity read
>>     request itself to a workqueue.
>>     2. Shrink the critical region so the usbhid lock protects only the
>>     instructions which modify usbhid state, calling hid_input_report()
>>     with the spinlock unlocked, allowing the device driver to grab the
>>     lock first, finish and then grab the lock afterwards in hid_ctrl().
>>
>> This patch implements the 2nd solution.
>>
>> Signed-off-by: Ioan-Adrian Ratiu <adi@adirat.com>
>
> Applied to for-4.5/core branch. Thanks,

Since wacom_intuos_schedule_prox_event() was introduced on March 5,
2015, the call was first used in kernel 3.19. Shouldn't we back-port
this patch to other stable versions?

Thank you.

Ping
--
To unsubscribe from this list: send the line "unsubscribe linux-input" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Gerecke, Jason Jan. 21, 2016, 1:28 a.m. UTC | #4
Jason
---
Now instead of four in the eights place /
you’ve got three, ‘Cause you added one  /
(That is to say, eight) to the two,     /
But you can’t take seven from three,    /
So you look at the sixty-fours....



On Sun, Nov 29, 2015 at 2:29 AM, Ioan-Adrian Ratiu <adi@adirat.com> wrote:
> On Fri, 20 Nov 2015 22:19:02 +0200
> Ioan-Adrian Ratiu <adi@adirat.com> wrote:
>
>> The critical section protected by usbhid->lock in hid_ctrl() is too
>> big and because of this it causes a recursive deadlock. "Too big" means
>> the case statement and the call to hid_input_report() do not need to be
>> protected by the spinlock (no URB operations are done inside them).
>>
>> The deadlock happens because in certain rare cases drivers try to grab
>> the lock while handling the ctrl irq which grabs the lock before them
>> as described above. For example newer wacom tablets like 056a:033c try
>> to reschedule proximity reads from wacom_intuos_schedule_prox_event()
>> calling hid_hw_request() -> usbhid_request() -> usbhid_submit_report()
>> which tries to grab the usbhid lock already held by hid_ctrl().
>>
>> There are two ways to get out of this deadlock:
>>     1. Make the drivers work "around" the ctrl critical region, in the
>>     wacom case for ex. by delaying the scheduling of the proximity read
>>     request itself to a workqueue.
>>     2. Shrink the critical region so the usbhid lock protects only the
>>     instructions which modify usbhid state, calling hid_input_report()
>>     with the spinlock unlocked, allowing the device driver to grab the
>>     lock first, finish and then grab the lock afterwards in hid_ctrl().
>>
>> This patch implements the 2nd solution.
>>
>> Signed-off-by: Ioan-Adrian Ratiu <adi@adirat.com>
>> ---
>>  drivers/hid/usbhid/hid-core.c | 4 ++--
>>  1 file changed, 2 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/hid/usbhid/hid-core.c b/drivers/hid/usbhid/hid-core.c
>> index 36712e9..5dd426f 100644
>> --- a/drivers/hid/usbhid/hid-core.c
>> +++ b/drivers/hid/usbhid/hid-core.c
>> @@ -477,8 +477,6 @@ static void hid_ctrl(struct urb *urb)
>>       struct usbhid_device *usbhid = hid->driver_data;
>>       int unplug = 0, status = urb->status;
>>
>> -     spin_lock(&usbhid->lock);
>> -
>>       switch (status) {
>>       case 0:                 /* success */
>>               if (usbhid->ctrl[usbhid->ctrltail].dir == USB_DIR_IN)
>> @@ -498,6 +496,8 @@ static void hid_ctrl(struct urb *urb)
>>               hid_warn(urb->dev, "ctrl urb status %d received\n", status);
>>       }
>>
>> +     spin_lock(&usbhid->lock);
>> +
>>       if (unplug) {
>>               usbhid->ctrltail = usbhid->ctrlhead;
>>       } else {
>
> Hello again
>
> Can this please be merged in the 4.4? There are quite a few people who have
> their tablets deadlock and don't know how to patch their kernels so are stuck
> waiting for a new release.
>
> The severity of this issue is much bigger than I initially thought. Since I've
> posted on the wacom mailing list that I have a fix for this deadlock I've
> recived lots of email from people complaining of the same problem on a wide
> range of tablets.
>
> Some of those people know how to patch a kernel, some found this patch on the
> mailing list and tested it and confirmed that it works on the wacom mailing
> list (you can verify the deadlock + fix on that mailing list).
>
> Thank you,
> Adrian


On Tue, Dec 1, 2015 at 4:00 PM, Ping Cheng <pinglinux@gmail.com> wrote:
> Hi Jiri,
>
> Since wacom_intuos_schedule_prox_event() was introduced on March 5,
> 2015, the call was first used in kernel 3.19. Shouldn't we back-port
> this patch to other stable versions?
>
> Thank you.
>
> Ping

I've been prompted to resurrect this question after fielding yet
another question from a user encountering a locked-up system. We
received a surprising number of reports of people encountering this
issue in the past several weeks and that's only going to get worse as
4.4 works its way into distros. Integrating back even just to 4.4
instead of 3.19 would be appreciated since this issue has (so far)
only ever been encountered with the new Intuos tablets that were added
in 4.4...

Jason
---
Now instead of four in the eights place /
you’ve got three, ‘Cause you added one  /
(That is to say, eight) to the two,     /
But you can’t take seven from three,    /
So you look at the sixty-fours....
--
To unsubscribe from this list: send the line "unsubscribe linux-input" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Jiri Kosina Jan. 21, 2016, 9:36 a.m. UTC | #5
On Wed, 20 Jan 2016, Jason Gerecke wrote:

> I've been prompted to resurrect this question after fielding yet another 
> question from a user encountering a locked-up system. We received a 
> surprising number of reports of people encountering this issue in the 
> past several weeks and that's only going to get worse as 4.4 works its 
> way into distros. Integrating back even just to 4.4 instead of 3.19 
> would be appreciated since this issue has (so far) only ever been 
> encountered with the new Intuos tablets that were added in 4.4...

I am in favor of pushing this to -stable. As you guys seem to be the ones 
able to test it with your hardware, would you guys (Ioan-Adrian? Jason?) 
be willing to prepare the backport and submit it once it passes your 
testing?

Thanks,
diff mbox

Patch

diff --git a/drivers/hid/usbhid/hid-core.c b/drivers/hid/usbhid/hid-core.c
index 36712e9..5dd426f 100644
--- a/drivers/hid/usbhid/hid-core.c
+++ b/drivers/hid/usbhid/hid-core.c
@@ -477,8 +477,6 @@  static void hid_ctrl(struct urb *urb)
 	struct usbhid_device *usbhid = hid->driver_data;
 	int unplug = 0, status = urb->status;
 
-	spin_lock(&usbhid->lock);
-
 	switch (status) {
 	case 0:			/* success */
 		if (usbhid->ctrl[usbhid->ctrltail].dir == USB_DIR_IN)
@@ -498,6 +496,8 @@  static void hid_ctrl(struct urb *urb)
 		hid_warn(urb->dev, "ctrl urb status %d received\n", status);
 	}
 
+	spin_lock(&usbhid->lock);
+
 	if (unplug) {
 		usbhid->ctrltail = usbhid->ctrlhead;
 	} else {