Message ID | 20181019102641.3574-1-david1.zhou@amd.com (mailing list archive)
---|---
State | New, archived
Series | drm: fix deadlock of syncobj
Quoting Chunming Zhou (2018-10-19 11:26:41)
> Signed-off-by: Chunming Zhou <david1.zhou@amd.com>
> Cc: Daniel Vetter <daniel@ffwll.ch>
> Cc: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Christian König <christian.koenig@amd.com>
> ---
>  drivers/gpu/drm/drm_syncobj.c | 7 +++++--
>  1 file changed, 5 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/gpu/drm/drm_syncobj.c b/drivers/gpu/drm/drm_syncobj.c
> index 57bf6006394d..2f3c14cb5156 100644
> --- a/drivers/gpu/drm/drm_syncobj.c
> +++ b/drivers/gpu/drm/drm_syncobj.c
> @@ -344,13 +344,16 @@ void drm_syncobj_replace_fence(struct drm_syncobj *syncobj,
>  	drm_syncobj_create_signal_pt(syncobj, fence, pt_value);
>  	if (fence) {
>  		struct drm_syncobj_cb *cur, *tmp;
> +		struct list_head cb_list;
> +		INIT_LIST_HEAD(&cb_list);

LIST_HEAD(cb_list); // does both in one

>  		spin_lock(&syncobj->lock);
> -		list_for_each_entry_safe(cur, tmp, &syncobj->cb_list, node) {
> +		list_splice_init(&syncobj->cb_list, &cb_list);

Steal the snapshot of the list under the lock, ok.

> +		spin_unlock(&syncobj->lock);
> +		list_for_each_entry_safe(cur, tmp, &cb_list, node) {
>  			list_del_init(&cur->node);

Races against an external caller of drm_syncobj_remove_callback(). However,
it looks like that race is just fine, but we don't guard against the
struct drm_syncobj_cb itself being freed, leading to all sorts of fun for
an interrupted drm_syncobj_array_wait_timeout().

That kfree seems to undermine the validity of stealing the list.
-Chris
On 2018-10-19 18:50, Chris Wilson wrote:
> Quoting Chunming Zhou (2018-10-19 11:26:41)
> [...]
>> +		spin_unlock(&syncobj->lock);
>> +		list_for_each_entry_safe(cur, tmp, &cb_list, node) {
>>  			list_del_init(&cur->node);
> Races against an external caller of drm_syncobj_remove_callback(). However,
> it looks like that race is just fine, but we don't guard against the
> struct drm_syncobj_cb itself being freed, leading to all sorts of fun for
> an interrupted drm_syncobj_array_wait_timeout().

Thanks for the quick review. I will use "while (!list_empty()) { e =
list_first_entry(); list_del(e); }" to avoid the deadlock.

I will send v2 in one minute.

Regards,
David Zhou

> That kfree seems to undermine the validity of stealing the list.
> -Chris
On 2018-10-19 19:26, zhoucm1 wrote:
> On 2018-10-19 18:50, Chris Wilson wrote:
>> [...]
>> Races against an external caller of drm_syncobj_remove_callback(). However,
>> it looks like that race is just fine, but we don't guard against the
>> struct drm_syncobj_cb itself being freed, leading to all sorts of fun for
>> an interrupted drm_syncobj_array_wait_timeout().
> Thanks for the quick review. I will use "while (!list_empty()) { e =
> list_first_entry(); list_del(e); }" to avoid the deadlock.

This still cannot resolve the freeing problem. Do you mind if I change
the spinlock to a mutex?

Thanks,
David Zhou

> I will send v2 in one minute.
>
> Regards,
> David Zhou
>> That kfree seems to undermine the validity of stealing the list.
>> -Chris
On 2018-10-19 14:01, zhoucm1 wrote:
> On 2018-10-19 19:26, zhoucm1 wrote:
>> [...]
>> Thanks for the quick review. I will use "while (!list_empty()) { e =
>> list_first_entry(); list_del(e); }" to avoid the deadlock.
> This still cannot resolve the freeing problem. Do you mind if I change
> the spinlock to a mutex?

How does that help?

What you could do is to merge the array of fences into the beginning of
the signal_pt, e.g. something like this:

struct drm_syncobj_signal_pt {
	struct dma_fence fences[2];
	struct dma_fence_array *fence_array;
	u64 value;
	struct list_head list;
};

This way the drm_syncobj_signal_pt is freed when the fence_array is
freed. That should be sufficient if we correctly reference count the
fence_array.

Christian.
On 2018-10-19 20:08, Koenig, Christian wrote:
> On 2018-10-19 14:01, zhoucm1 wrote:
>> [...]
>> This still cannot resolve the freeing problem. Do you mind if I change
>> the spinlock to a mutex?
> How does that help?
>
> What you could do is to merge the array of fences into the beginning of
> the signal_pt, e.g. something like this:
>
> struct drm_syncobj_signal_pt {
> 	struct dma_fence fences[2];
> 	struct dma_fence_array *fence_array;
> 	u64 value;
> 	struct list_head list;
> };
>
> This way the drm_syncobj_signal_pt is freed when the fence_array is
> freed. That should be sufficient if we correctly reference count the
> fence_array.

I'm not sure which problem you mean. The reason for the deadlock is that
the callback function calls drm_syncobj_search_fence(), which needs to
grab the lock that is already held.

But whether we steal the list or use "while (!list_empty()) { e =
list_first_entry(); list_del(e); }", we still run into the freeing
problem: the syncobj_cb could be freed when the wait times out.

If we change to a mutex, we can move the locking out of _search_fence.
Another way is to add a separate spinlock for the signal_pt list instead
of sharing one lock with cb_list.

Regards,
David

> Christian.
On 2018-10-19 14:19, zhoucm1 wrote:
> On 2018-10-19 20:08, Koenig, Christian wrote:
>> [...]
>> This way the drm_syncobj_signal_pt is freed when the fence_array is
>> freed. That should be sufficient if we correctly reference count the
>> fence_array.
> [...]
> If we change to a mutex, we can move the locking out of _search_fence.
> Another way is to add a separate spinlock for the signal_pt list instead
> of sharing one lock with cb_list.

Well, offhand that sounds like the cleanest approach to me.

Christian.

_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel
diff --git a/drivers/gpu/drm/drm_syncobj.c b/drivers/gpu/drm/drm_syncobj.c
index 57bf6006394d..2f3c14cb5156 100644
--- a/drivers/gpu/drm/drm_syncobj.c
+++ b/drivers/gpu/drm/drm_syncobj.c
@@ -344,13 +344,16 @@ void drm_syncobj_replace_fence(struct drm_syncobj *syncobj,
 	drm_syncobj_create_signal_pt(syncobj, fence, pt_value);
 	if (fence) {
 		struct drm_syncobj_cb *cur, *tmp;
+		struct list_head cb_list;
+		INIT_LIST_HEAD(&cb_list);
 
 		spin_lock(&syncobj->lock);
-		list_for_each_entry_safe(cur, tmp, &syncobj->cb_list, node) {
+		list_splice_init(&syncobj->cb_list, &cb_list);
+		spin_unlock(&syncobj->lock);
+		list_for_each_entry_safe(cur, tmp, &cb_list, node) {
 			list_del_init(&cur->node);
 			cur->func(syncobj, cur);
 		}
-		spin_unlock(&syncobj->lock);
 	}
 }
 EXPORT_SYMBOL(drm_syncobj_replace_fence);
Signed-off-by: Chunming Zhou <david1.zhou@amd.com>
Cc: Daniel Vetter <daniel@ffwll.ch>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Christian König <christian.koenig@amd.com>
---
 drivers/gpu/drm/drm_syncobj.c | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)