diff mbox series

drm: fix deadlock of syncobj

Message ID 20181019102641.3574-1-david1.zhou@amd.com (mailing list archive)
State New, archived
Series: drm: fix deadlock of syncobj

Commit Message

Chunming Zhou Oct. 19, 2018, 10:26 a.m. UTC
Signed-off-by: Chunming Zhou <david1.zhou@amd.com>
Cc: Daniel Vetter <daniel@ffwll.ch>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Christian König <christian.koenig@amd.com>
---
 drivers/gpu/drm/drm_syncobj.c | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

Comments

Chris Wilson Oct. 19, 2018, 10:50 a.m. UTC | #1
Quoting Chunming Zhou (2018-10-19 11:26:41)
> Signed-off-by: Chunming Zhou <david1.zhou@amd.com>
> Cc: Daniel Vetter <daniel@ffwll.ch>
> Cc: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Christian König <christian.koenig@amd.com>
> ---
>  drivers/gpu/drm/drm_syncobj.c | 7 +++++--
>  1 file changed, 5 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/gpu/drm/drm_syncobj.c b/drivers/gpu/drm/drm_syncobj.c
> index 57bf6006394d..2f3c14cb5156 100644
> --- a/drivers/gpu/drm/drm_syncobj.c
> +++ b/drivers/gpu/drm/drm_syncobj.c
> @@ -344,13 +344,16 @@ void drm_syncobj_replace_fence(struct drm_syncobj *syncobj,
>         drm_syncobj_create_signal_pt(syncobj, fence, pt_value);
>         if (fence) {
>                 struct drm_syncobj_cb *cur, *tmp;
> +               struct list_head cb_list;
> +               INIT_LIST_HEAD(&cb_list);
LIST_HEAD(cb_list); // does both in one

>                 spin_lock(&syncobj->lock);
> -               list_for_each_entry_safe(cur, tmp, &syncobj->cb_list, node) {
> +               list_splice_init(&syncobj->cb_list, &cb_list);

Steal the snapshot of the list under the lock, ok.

> +               spin_unlock(&syncobj->lock);
> +               list_for_each_entry_safe(cur, tmp, &cb_list, node) {
>                         list_del_init(&cur->node);

This races against an external caller of drm_syncobj_remove_callback(). However,
that race looks just fine, but we don't guard against the
struct drm_syncobj_cb itself being freed, leading to all sorts of fun for
an interrupted drm_syncobj_array_wait_timeout.

That kfree seems to undermine the validity of stealing the list.
-Chris
Chunming Zhou Oct. 19, 2018, 11:26 a.m. UTC | #2
On Oct. 19, 2018, at 18:50, Chris Wilson wrote:
> Quoting Chunming Zhou (2018-10-19 11:26:41)
> [snip]
> Races against external caller of drm_syncobj_remove_callback(). However,
> it looks like that race is just fine, but we don't guard against the
> struct drm_syncobj_cb itself being freed, leading to all sort of fun for
> an interrupted drm_syncobj_array_wait_timeout.
Thanks for the quick review. I will use "while (!list_empty()) { e =
list_first_entry(); list_del(e); }" to avoid the deadlock.
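The pop-one-entry loop sketched above can be illustrated with a minimal userspace analogue of the kernel's doubly linked list; the helper names mirror `<linux/list.h>`, but this is only a sketch under that assumption, not the kernel implementation:

```c
#include <assert.h>
#include <stddef.h>

/* Minimal userspace analogue of the kernel's struct list_head. */
struct list_head { struct list_head *prev, *next; };

static void list_init(struct list_head *h) { h->prev = h->next = h; }
static int list_empty(const struct list_head *h) { return h->next == h; }

static void list_add_tail(struct list_head *n, struct list_head *h)
{
	n->prev = h->prev;
	n->next = h;
	h->prev->next = n;
	h->prev = n;
}

static void list_del_init(struct list_head *n)
{
	n->prev->next = n->next;
	n->next->prev = n->prev;
	list_init(n);
}

struct cb { struct list_head node; int id; };

/* The proposed pattern: pop the first entry each iteration instead of
 * walking the list, so a real implementation could drop the lock around
 * each callback invocation without the iterator going stale. */
static int drain_callbacks(struct list_head *cb_list)
{
	int invoked = 0;

	while (!list_empty(cb_list)) {
		struct cb *e = (struct cb *)((char *)cb_list->next -
					     offsetof(struct cb, node));
		/* spin_lock() would be held up to here ... */
		list_del_init(&e->node);
		/* ... and dropped here, before invoking e's callback */
		invoked++;
	}
	return invoked;
}
```

Because each iteration re-reads the list head, entries removed concurrently by drm_syncobj_remove_callback() are simply never seen, which is what makes the pop loop safer than a cursor-based walk.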

will send v2 in one minute.

Regards,
David Zhou
Chunming Zhou Oct. 19, 2018, 12:01 p.m. UTC | #3
On Oct. 19, 2018, at 19:26, zhoucm1 wrote:
> [snip]
> Thanks for the quick review. I will use "while (!list_empty()) { e =
> list_first_entry(); list_del(e); }" to avoid the deadlock.
This still cannot resolve the freeing problem. Do you mind if I change the
spinlock to a mutex?

Thanks,
David Zhou
Christian König Oct. 19, 2018, 12:08 p.m. UTC | #4
On Oct. 19, 2018, at 14:01, zhoucm1 wrote:
> [snip]
> This still cannot resolve the freeing problem. Do you mind if I change the
> spinlock to a mutex?

How does that help?

What you could do is to merge the array of fences into the beginning of 
the signal_pt, e.g. something like this:

struct drm_syncobj_signal_pt {
         struct dma_fence fences[2];
         struct dma_fence_array *fence_array;
         u64    value;
         struct list_head list;
};

This way the drm_syncobj_signal_pt is freed when the fence_array is 
freed. That should be sufficient if we correctly reference count the 
fence_array.
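Christian's lifetime argument can be sketched in userspace; all names here are hypothetical stand-ins for the kernel structures. The point is that embedding the fences in the signal_pt ties the signal_pt's lifetime to the fence reference count, rather than to a separate kfree that can race with waiters:

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-ins for dma_fence / drm_syncobj_signal_pt. */
struct fence { int refcount; };

struct signal_pt {
	struct fence fences[2];		/* embedded, fences[0] at offset 0 */
	unsigned long long value;
};

static struct signal_pt *pt_create(unsigned long long value)
{
	struct signal_pt *pt = calloc(1, sizeof(*pt));

	pt->fences[0].refcount = 1;	/* creator's reference */
	pt->value = value;
	return pt;
}

/* container_of-style recovery: fences[0] sits at offset 0, so a fence
 * pointer is also the signal_pt pointer. */
static struct signal_pt *pt_from_fence(struct fence *f)
{
	return (struct signal_pt *)f;
}

/* Dropping the last reference frees the whole signal_pt, so no waiter
 * is left holding a pointer into memory freed behind its back. */
static int fence_put(struct fence *f)
{
	if (--f->refcount == 0) {
		free(pt_from_fence(f));
		return 1;	/* freed */
	}
	return 0;		/* still referenced */
}
```

A waiter that takes its own fence reference then keeps the containing signal_pt alive for as long as it needs it, which is the "correctly reference count the fence_array" condition above.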

Christian.

Chunming Zhou Oct. 19, 2018, 12:19 p.m. UTC | #5
On Oct. 19, 2018, at 20:08, Koenig, Christian wrote:
> [snip]
> How does that help?
>
> What you could do is to merge the array of fences into the beginning of
> the signal_pt, e.g. something like this:
>
> struct drm_syncobj_signal_pt {
>           struct dma_fence fences[2];
>           struct dma_fence_array *fence_array;
>           u64    value;
>           struct list_head list;
> };
>
> This way the drm_syncobj_signal_pt is freed when the fence_array is
> freed. That should be sufficient if we correctly reference count the
> fence_array.
I'm not sure which problem you mean; the deadlock reason is:
the cb func will call drm_syncobj_search_fence, which needs to grab the
same lock, hence the deadlock.

But whether we steal the list or use "while (!list_empty()) { e =
list_first_entry(); list_del(e); }", both encounter another freeing
problem: the syncobj_cb could be freed when the wait times out.

If we change to a mutex, then we can move the lock taken inside
_search_fence out.
Another way is to add a separate spinlock for the signal_pt_list, rather
than sharing one lock with the cb_list.
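The split-lock idea can be sketched as follows; the structure and field names are hypothetical, and pthread mutexes stand in for kernel spinlocks. With one shared, non-recursive lock the nested search would self-deadlock; with two locks it cannot:

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical sketch: cb_list and signal_pt_list each get their own
 * lock, so a callback running under cb_lock can call a search helper
 * that only needs pt_lock. */
struct syncobj {
	pthread_mutex_t cb_lock;	/* protects the callback list */
	pthread_mutex_t pt_lock;	/* protects the signal point list */
	int pt_count;
};

/* Stand-in for drm_syncobj_search_fence(): touches only pt state. */
static int search_fence(struct syncobj *s)
{
	int n;

	pthread_mutex_lock(&s->pt_lock);
	n = s->pt_count;
	pthread_mutex_unlock(&s->pt_lock);
	return n;
}

/* A callback invoked while cb_lock is held.  With a single shared
 * non-recursive lock the nested search_fence() would deadlock; with two
 * locks the acquisition order is cb_lock -> pt_lock and never cycles. */
static int run_callbacks(struct syncobj *s)
{
	int seen;

	pthread_mutex_lock(&s->cb_lock);
	seen = search_fence(s);		/* takes pt_lock, never cb_lock */
	pthread_mutex_unlock(&s->cb_lock);
	return seen;
}
```

The design only stays deadlock-free if every path takes the two locks in the same order, so the lock split trades one shared lock for a documented ordering rule.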

Regards,
David
Christian König Oct. 19, 2018, 12:54 p.m. UTC | #6
On Oct. 19, 2018, at 14:19, zhoucm1 wrote:
> [snip]
> Another way is to add a separate spinlock for the signal_pt_list, rather
> than sharing one lock with the cb_list.

Well, offhand that sounds like the cleanest approach to me.

Christian.


Patch

diff --git a/drivers/gpu/drm/drm_syncobj.c b/drivers/gpu/drm/drm_syncobj.c
index 57bf6006394d..2f3c14cb5156 100644
--- a/drivers/gpu/drm/drm_syncobj.c
+++ b/drivers/gpu/drm/drm_syncobj.c
@@ -344,13 +344,16 @@  void drm_syncobj_replace_fence(struct drm_syncobj *syncobj,
 	drm_syncobj_create_signal_pt(syncobj, fence, pt_value);
 	if (fence) {
 		struct drm_syncobj_cb *cur, *tmp;
+		struct list_head cb_list;
 
+		INIT_LIST_HEAD(&cb_list);
 		spin_lock(&syncobj->lock);
-		list_for_each_entry_safe(cur, tmp, &syncobj->cb_list, node) {
+		list_splice_init(&syncobj->cb_list, &cb_list);
+		spin_unlock(&syncobj->lock);
+		list_for_each_entry_safe(cur, tmp, &cb_list, node) {
 			list_del_init(&cur->node);
 			cur->func(syncobj, cur);
 		}
-		spin_unlock(&syncobj->lock);
 	}
 }
 EXPORT_SYMBOL(drm_syncobj_replace_fence);
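The patch's core move can be illustrated with a userspace analogue: splice the live cb_list onto a stack-local snapshot under the lock, then walk the snapshot with the lock dropped. Helper names mirror `<linux/list.h>`, but this is a sketch, not the kernel implementation:

```c
#include <assert.h>

/* Userspace analogue of the kernel's doubly linked list. */
struct list_head { struct list_head *prev, *next; };

#define LIST_HEAD_INIT(name) { &(name), &(name) }

static int list_empty(const struct list_head *h) { return h->next == h; }

static void list_add_tail(struct list_head *n, struct list_head *h)
{
	n->prev = h->prev;
	n->next = h;
	h->prev->next = n;
	h->prev = n;
}

/* Move every entry of @src onto the front of @dst and leave @src empty,
 * like the kernel's list_splice_init().  This is the O(1) "steal the
 * snapshot under the lock" step of the patch. */
static void list_splice_init(struct list_head *src, struct list_head *dst)
{
	if (!list_empty(src)) {
		struct list_head *first = src->next, *last = src->prev;

		last->next = dst->next;
		dst->next->prev = last;
		first->prev = dst;
		dst->next = first;
		src->next = src->prev = src;
	}
}

static int list_count(const struct list_head *h)
{
	const struct list_head *p;
	int n = 0;

	for (p = h->next; p != h; p = p->next)
		n++;
	return n;
}
```

After the splice the callbacks run on a private list with the lock dropped, which removes the self-deadlock; as the review notes, it does not by itself protect against a drm_syncobj_cb being freed while on the snapshot.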