xen: avoid deadlock in xenbus driver

Message ID 20170607162412.8432-1-jgross@suse.com (mailing list archive)
State New, archived

Commit Message

Jürgen Groß June 7, 2017, 4:24 p.m. UTC
There has been a report about a deadlock in the xenbus driver:

[  247.979498] ======================================================
[  247.985688] WARNING: possible circular locking dependency detected
[  247.991882] 4.12.0-rc4-00022-gc4b25c0 #575 Not tainted
[  247.997040] ------------------------------------------------------
[  248.003232] xenbus/91 is trying to acquire lock:
[  248.007875]  (&u->msgbuffer_mutex){+.+.+.}, at: [<ffff00000863e904>] xenbus_dev_queue_reply+0x3c/0x230
[  248.017163]
[  248.017163] but task is already holding lock:
[  248.023096]  (xb_write_mutex){+.+...}, at: [<ffff00000863a940>] xenbus_thread+0x5f0/0x798
[  248.031267]
[  248.031267] which lock already depends on the new lock.
[  248.031267]
[  248.039615]
[  248.039615] the existing dependency chain (in reverse order) is:
[  248.047176]
[  248.047176] -> #1 (xb_write_mutex){+.+...}:
[  248.052943]        __lock_acquire+0x1728/0x1778
[  248.057498]        lock_acquire+0xc4/0x288
[  248.061630]        __mutex_lock+0x84/0x868
[  248.065755]        mutex_lock_nested+0x3c/0x50
[  248.070227]        xs_send+0x164/0x1f8
[  248.074015]        xenbus_dev_request_and_reply+0x6c/0x88
[  248.079427]        xenbus_file_write+0x260/0x420
[  248.084073]        __vfs_write+0x48/0x138
[  248.088113]        vfs_write+0xa8/0x1b8
[  248.091983]        SyS_write+0x54/0xb0
[  248.095768]        el0_svc_naked+0x24/0x28
[  248.099897]
[  248.099897] -> #0 (&u->msgbuffer_mutex){+.+.+.}:
[  248.106088]        print_circular_bug+0x80/0x2e0
[  248.110730]        __lock_acquire+0x1768/0x1778
[  248.115288]        lock_acquire+0xc4/0x288
[  248.119417]        __mutex_lock+0x84/0x868
[  248.123545]        mutex_lock_nested+0x3c/0x50
[  248.128016]        xenbus_dev_queue_reply+0x3c/0x230
[  248.133005]        xenbus_thread+0x788/0x798
[  248.137306]        kthread+0x110/0x140
[  248.141087]        ret_from_fork+0x10/0x40

It is rather easy to avoid by dropping xb_write_mutex before calling
xenbus_dev_queue_reply().
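
As an illustration only, the general pattern can be sketched in user space
with pthreads. This is not the kernel code: write_lock and msgbuf_lock are
hypothetical stand-ins for xb_write_mutex and u->msgbuffer_mutex, and
struct request, submit(), queue_reply() and complete_request() are invented
for the example. The request is unlinked while the list lock is held, the
lock is dropped, and only then is the callback that needs the second lock
invoked.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-ins for xb_write_mutex and u->msgbuffer_mutex. */
static pthread_mutex_t write_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t msgbuf_lock = PTHREAD_MUTEX_INITIALIZER;

struct request {
	struct request *next;
	int id;
	void (*cb)(struct request *req);
	int done;           /* set by the callback */
	int lock_was_free;  /* 1 if write_lock was not held during the callback */
};

/* List of outstanding requests, protected by write_lock. */
static struct request *pending;

/* Callback that needs the second lock, in the role of xenbus_dev_queue_reply(). */
static void queue_reply(struct request *req)
{
	/* Probe write_lock: trylock fails with EBUSY if the caller still holds it. */
	if (pthread_mutex_trylock(&write_lock) == 0) {
		req->lock_was_free = 1;
		pthread_mutex_unlock(&write_lock);
	}

	pthread_mutex_lock(&msgbuf_lock);
	req->done = 1;
	pthread_mutex_unlock(&msgbuf_lock);
}

/* Add a request to the pending list. */
static void submit(struct request *req)
{
	pthread_mutex_lock(&write_lock);
	req->next = pending;
	pending = req;
	pthread_mutex_unlock(&write_lock);
}

/*
 * Find the request with the given id, unlink it under write_lock, drop the
 * lock and only then run the callback.
 * Returns 1 if a request was completed, 0 if none matched.
 */
static int complete_request(int id)
{
	struct request **pp, *req = NULL;

	pthread_mutex_lock(&write_lock);
	for (pp = &pending; *pp; pp = &(*pp)->next) {
		if ((*pp)->id == id) {
			req = *pp;
			*pp = req->next;
			req->next = NULL;
			break;
		}
	}
	pthread_mutex_unlock(&write_lock);

	if (!req)
		return 0;

	assert(req->cb != NULL);
	req->cb(req);	/* write_lock is no longer held here */
	return 1;
}
```

Once the request has been removed from the list, no other thread can find it
through the list any more, so the callback can run without the list lock.
That removes the write_lock -> msgbuf_lock ordering reported in the splat.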

Fixes fd8aa9095a95c02dcc35540a263267c29b8fda9d ("xen: optimize xenbus
driver for multiple concurrent xenstore accesses").

Cc: <stable@vger.kernel.org> # 4.11
Reported-by: Andre Przywara <andre.przywara@arm.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
---
 drivers/xen/xenbus/xenbus_comms.c | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

Comments

Jürgen Groß June 8, 2017, 2 p.m. UTC | #1
On 07/06/17 18:24, Juergen Gross wrote:
> There has been a report about a deadlock in the xenbus driver:
> 
> [  247.979498] ======================================================
> [  247.985688] WARNING: possible circular locking dependency detected
> [  247.991882] 4.12.0-rc4-00022-gc4b25c0 #575 Not tainted
> [  247.997040] ------------------------------------------------------
> [  248.003232] xenbus/91 is trying to acquire lock:
> [  248.007875]  (&u->msgbuffer_mutex){+.+.+.}, at: [<ffff00000863e904>]
> xenbus_dev_queue_reply+0x3c/0x230
> [  248.017163]
> [  248.017163] but task is already holding lock:
> [  248.023096]  (xb_write_mutex){+.+...}, at: [<ffff00000863a940>]
> xenbus_thread+0x5f0/0x798
> [  248.031267]
> [  248.031267] which lock already depends on the new lock.
> [  248.031267]
> [  248.039615]
> [  248.039615] the existing dependency chain (in reverse order) is:
> [  248.047176]
> [  248.047176] -> #1 (xb_write_mutex){+.+...}:
> [  248.052943]        __lock_acquire+0x1728/0x1778
> [  248.057498]        lock_acquire+0xc4/0x288
> [  248.061630]        __mutex_lock+0x84/0x868
> [  248.065755]        mutex_lock_nested+0x3c/0x50
> [  248.070227]        xs_send+0x164/0x1f8
> [  248.074015]        xenbus_dev_request_and_reply+0x6c/0x88
> [  248.079427]        xenbus_file_write+0x260/0x420
> [  248.084073]        __vfs_write+0x48/0x138
> [  248.088113]        vfs_write+0xa8/0x1b8
> [  248.091983]        SyS_write+0x54/0xb0
> [  248.095768]        el0_svc_naked+0x24/0x28
> [  248.099897]
> [  248.099897] -> #0 (&u->msgbuffer_mutex){+.+.+.}:
> [  248.106088]        print_circular_bug+0x80/0x2e0
> [  248.110730]        __lock_acquire+0x1768/0x1778
> [  248.115288]        lock_acquire+0xc4/0x288
> [  248.119417]        __mutex_lock+0x84/0x868
> [  248.123545]        mutex_lock_nested+0x3c/0x50
> [  248.128016]        xenbus_dev_queue_reply+0x3c/0x230
> [  248.133005]        xenbus_thread+0x788/0x798
> [  248.137306]        kthread+0x110/0x140
> [  248.141087]        ret_from_fork+0x10/0x40
> 
> It is rather easy to avoid by dropping xb_write_mutex before calling
> xenbus_dev_queue_reply().
> 
> Fixes fd8aa9095a95c02dcc35540a263267c29b8fda9d ("xen: optimize xenbus
> driver for multiple concurrent xenstore accesses").
> 
> Cc: <stable@vger.kernel.org> # 4.11
> Reported-by: Andre Przywara <andre.przywara@arm.com>
> Signed-off-by: Juergen Gross <jgross@suse.com>

While this patch is functionally okay, the resulting code is not
very nice. Will send out V2 soon looking much better.


Juergen

> ---
>  drivers/xen/xenbus/xenbus_comms.c | 7 +++++--
>  1 file changed, 5 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/xen/xenbus/xenbus_comms.c b/drivers/xen/xenbus/xenbus_comms.c
> index 856ada5d39c9..a44bcdbf6533 100644
> --- a/drivers/xen/xenbus/xenbus_comms.c
> +++ b/drivers/xen/xenbus/xenbus_comms.c
> @@ -305,18 +305,21 @@ static int process_msg(void)
>  					req->body = state.body;
>  					req->state = xb_req_state_got_reply;
>  					list_del(&req->list);
> +					mutex_unlock(&xb_write_mutex);
>  					req->cb(req);
>  				} else {
>  					list_del(&req->list);
> +					mutex_unlock(&xb_write_mutex);
>  					kfree(req);
>  				}
>  				err = 0;
>  				break;
>  			}
>  		}
> -		mutex_unlock(&xb_write_mutex);
> -		if (err)
> +		if (err) {
> +			mutex_unlock(&xb_write_mutex);
>  			goto out;
> +		}
>  	}
>  
>  	mutex_unlock(&xs_response_mutex);
>
Andre Przywara June 8, 2017, 2:06 p.m. UTC | #2
Hi Jürgen,

On 08/06/17 15:00, Juergen Gross wrote:
> On 07/06/17 18:24, Juergen Gross wrote:
>> There has been a report about a deadlock in the xenbus driver:
>>
>> [  247.979498] ======================================================
>> [  247.985688] WARNING: possible circular locking dependency detected
>> [  247.991882] 4.12.0-rc4-00022-gc4b25c0 #575 Not tainted
>> [  247.997040] ------------------------------------------------------
>> [  248.003232] xenbus/91 is trying to acquire lock:
>> [  248.007875]  (&u->msgbuffer_mutex){+.+.+.}, at: [<ffff00000863e904>]
>> xenbus_dev_queue_reply+0x3c/0x230
>> [  248.017163]
>> [  248.017163] but task is already holding lock:
>> [  248.023096]  (xb_write_mutex){+.+...}, at: [<ffff00000863a940>]
>> xenbus_thread+0x5f0/0x798
>> [  248.031267]
>> [  248.031267] which lock already depends on the new lock.
>> [  248.031267]
>> [  248.039615]
>> [  248.039615] the existing dependency chain (in reverse order) is:
>> [  248.047176]
>> [  248.047176] -> #1 (xb_write_mutex){+.+...}:
>> [  248.052943]        __lock_acquire+0x1728/0x1778
>> [  248.057498]        lock_acquire+0xc4/0x288
>> [  248.061630]        __mutex_lock+0x84/0x868
>> [  248.065755]        mutex_lock_nested+0x3c/0x50
>> [  248.070227]        xs_send+0x164/0x1f8
>> [  248.074015]        xenbus_dev_request_and_reply+0x6c/0x88
>> [  248.079427]        xenbus_file_write+0x260/0x420
>> [  248.084073]        __vfs_write+0x48/0x138
>> [  248.088113]        vfs_write+0xa8/0x1b8
>> [  248.091983]        SyS_write+0x54/0xb0
>> [  248.095768]        el0_svc_naked+0x24/0x28
>> [  248.099897]
>> [  248.099897] -> #0 (&u->msgbuffer_mutex){+.+.+.}:
>> [  248.106088]        print_circular_bug+0x80/0x2e0
>> [  248.110730]        __lock_acquire+0x1768/0x1778
>> [  248.115288]        lock_acquire+0xc4/0x288
>> [  248.119417]        __mutex_lock+0x84/0x868
>> [  248.123545]        mutex_lock_nested+0x3c/0x50
>> [  248.128016]        xenbus_dev_queue_reply+0x3c/0x230
>> [  248.133005]        xenbus_thread+0x788/0x798
>> [  248.137306]        kthread+0x110/0x140
>> [  248.141087]        ret_from_fork+0x10/0x40
>>
>> It is rather easy to avoid by dropping xb_write_mutex before calling
>> xenbus_dev_queue_reply().
>>
>> Fixes fd8aa9095a95c02dcc35540a263267c29b8fda9d ("xen: optimize xenbus
>> driver for multiple concurrent xenstore accesses").
>>
>> Cc: <stable@vger.kernel.org> # 4.11
>> Reported-by: Andre Przywara <andre.przywara@arm.com>
>> Signed-off-by: Juergen Gross <jgross@suse.com>
> 
> While this patch is functionally okay, the resulting code is not
> very nice. Will send out V2 soon looking much better.

Thanks anyway for the quick reaction! Tomorrow I will check whether I can
reproduce the old problem and then confirm that the patch fixes it. I
think I saw xencommons fail somehow (wrong xen-tools version or using
/bin/sh), then fixed that, retried and saw the splat.

Cheers,
Andre.

> Juergen
> 
>> ---
>>  drivers/xen/xenbus/xenbus_comms.c | 7 +++++--
>>  1 file changed, 5 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/xen/xenbus/xenbus_comms.c b/drivers/xen/xenbus/xenbus_comms.c
>> index 856ada5d39c9..a44bcdbf6533 100644
>> --- a/drivers/xen/xenbus/xenbus_comms.c
>> +++ b/drivers/xen/xenbus/xenbus_comms.c
>> @@ -305,18 +305,21 @@ static int process_msg(void)
>>  					req->body = state.body;
>>  					req->state = xb_req_state_got_reply;
>>  					list_del(&req->list);
>> +					mutex_unlock(&xb_write_mutex);
>>  					req->cb(req);
>>  				} else {
>>  					list_del(&req->list);
>> +					mutex_unlock(&xb_write_mutex);
>>  					kfree(req);
>>  				}
>>  				err = 0;
>>  				break;
>>  			}
>>  		}
>> -		mutex_unlock(&xb_write_mutex);
>> -		if (err)
>> +		if (err) {
>> +			mutex_unlock(&xb_write_mutex);
>>  			goto out;
>> +		}
>>  	}
>>  
>>  	mutex_unlock(&xs_response_mutex);
>>
>
Patch

diff --git a/drivers/xen/xenbus/xenbus_comms.c b/drivers/xen/xenbus/xenbus_comms.c
index 856ada5d39c9..a44bcdbf6533 100644
--- a/drivers/xen/xenbus/xenbus_comms.c
+++ b/drivers/xen/xenbus/xenbus_comms.c
@@ -305,18 +305,21 @@  static int process_msg(void)
 					req->body = state.body;
 					req->state = xb_req_state_got_reply;
 					list_del(&req->list);
+					mutex_unlock(&xb_write_mutex);
 					req->cb(req);
 				} else {
 					list_del(&req->list);
+					mutex_unlock(&xb_write_mutex);
 					kfree(req);
 				}
 				err = 0;
 				break;
 			}
 		}
-		mutex_unlock(&xb_write_mutex);
-		if (err)
+		if (err) {
+			mutex_unlock(&xb_write_mutex);
 			goto out;
+		}
 	}
 
 	mutex_unlock(&xs_response_mutex);