diff mbox series

drm/syncobj: Deal with signalled fences in transfer.

Message ID 20211207013235.5985-1-bas@basnieuwenhuizen.nl (mailing list archive)
State New, archived
Headers show
Series drm/syncobj: Deal with signalled fences in transfer. | expand

Commit Message

Bas Nieuwenhuizen Dec. 7, 2021, 1:32 a.m. UTC
See the comments in the code. Basically if the seqno is already
signalled then we get a NULL fence. If we then put the NULL fence
in a binary syncobj it counts as unsignalled, making that syncobj
pretty much useless for all expected uses.

Not 100% sure about the transfer to a timeline syncobj but I
believe it is needed there too, as AFAICT the add_point function
assumes the fence isn't NULL.

Fixes: ea569910cbab ("drm/syncobj: add transition iotcls between binary and timeline v2")
Cc: stable@vger.kernel.org
Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
---
 drivers/gpu/drm/drm_syncobj.c | 26 ++++++++++++++++++++++++++
 1 file changed, 26 insertions(+)

Comments

Christian König Dec. 7, 2021, 7:09 a.m. UTC | #1
Am 07.12.21 um 02:32 schrieb Bas Nieuwenhuizen:
> See the comments in the code. Basically if the seqno is already
> signalled then we get a NULL fence. If we then put the NULL fence
> in a binary syncobj it counts as unsignalled, making that syncobj
> pretty much useless for all expected uses.
>
> Not 100% sure about the transfer to a timeline syncobj but I
> believe it is needed there too, as AFAICT the add_point function
> assumes the fence isn't NULL.
>
> Fixes: ea569910cbab ("drm/syncobj: add transition iotcls between binary and timeline v2")
> Cc: stable@vger.kernel.org
> Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>

Reviewed-by: Christian König <christian.koenig@amd.com>

Going to push that to drm-misc-fixes later today if nobody objects in 
the meantime.

> ---
>   drivers/gpu/drm/drm_syncobj.c | 26 ++++++++++++++++++++++++++
>   1 file changed, 26 insertions(+)
>
> diff --git a/drivers/gpu/drm/drm_syncobj.c b/drivers/gpu/drm/drm_syncobj.c
> index fdd2ec87cdd1..eb28a40400d2 100644
> --- a/drivers/gpu/drm/drm_syncobj.c
> +++ b/drivers/gpu/drm/drm_syncobj.c
> @@ -861,6 +861,19 @@ static int drm_syncobj_transfer_to_timeline(struct drm_file *file_private,
>   				     &fence);
>   	if (ret)
>   		goto err;
> +
> +	/* If the requested seqno is already signaled drm_syncobj_find_fence may
> +	 * return a NULL fence. To make sure the recipient gets signalled, use
> +	 * a new fence instead.
> +	 */
> +	if (!fence) {
> +		fence = dma_fence_allocate_private_stub();
> +		if (!fence) {
> +			ret = -ENOMEM;
> +			goto err;
> +		}
> +	}
> +
>   	chain = kzalloc(sizeof(struct dma_fence_chain), GFP_KERNEL);
>   	if (!chain) {
>   		ret = -ENOMEM;
> @@ -890,6 +903,19 @@ drm_syncobj_transfer_to_binary(struct drm_file *file_private,
>   				     args->src_point, args->flags, &fence);
>   	if (ret)
>   		goto err;
> +
> +	/* If the requested seqno is already signaled drm_syncobj_find_fence may
> +	 * return a NULL fence. To make sure the recipient gets signalled, use
> +	 * a new fence instead.
> +	 */
> +	if (!fence) {
> +		fence = dma_fence_allocate_private_stub();
> +		if (!fence) {
> +			ret = -ENOMEM;
> +			goto err;
> +		}
> +	}
> +
>   	drm_syncobj_replace_fence(binary_syncobj, fence);
>   	dma_fence_put(fence);
>   err:
Lionel Landwerlin Dec. 7, 2021, 7:10 a.m. UTC | #2
On 07/12/2021 03:32, Bas Nieuwenhuizen wrote:
> See the comments in the code. Basically if the seqno is already
> signalled then we get a NULL fence. If we then put the NULL fence
> in a binary syncobj it counts as unsignalled, making that syncobj
> pretty much useless for all expected uses.
>
> Not 100% sure about the transfer to a timeline syncobj but I
> believe it is needed there too, as AFAICT the add_point function
> assumes the fence isn't NULL.
>
> Fixes: ea569910cbab ("drm/syncobj: add transition iotcls between binary and timeline v2")
> Cc: stable@vger.kernel.org
> Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
> ---
>   drivers/gpu/drm/drm_syncobj.c | 26 ++++++++++++++++++++++++++
>   1 file changed, 26 insertions(+)
>
> diff --git a/drivers/gpu/drm/drm_syncobj.c b/drivers/gpu/drm/drm_syncobj.c
> index fdd2ec87cdd1..eb28a40400d2 100644
> --- a/drivers/gpu/drm/drm_syncobj.c
> +++ b/drivers/gpu/drm/drm_syncobj.c
> @@ -861,6 +861,19 @@ static int drm_syncobj_transfer_to_timeline(struct drm_file *file_private,
>   				     &fence);
>   	if (ret)
>   		goto err;
> +
> +	/* If the requested seqno is already signaled drm_syncobj_find_fence may
> +	 * return a NULL fence. To make sure the recipient gets signalled, use
> +	 * a new fence instead.
> +	 */
> +	if (!fence) {
> +		fence = dma_fence_allocate_private_stub();
> +		if (!fence) {
> +			ret = -ENOMEM;
> +			goto err;
> +		}
> +	}
> +


Shouldn't we fix drm_syncobj_find_fence() instead?

By returning a stub fence for the timeline case if there isn't one.


Because the same NULL fence check appears missing in amdgpu (and 
probably other drivers).


Also we should have tests for this in IGT.

AMD contributed some tests when this code was written but they never got 
reviewed :(


-Lionel


>   	chain = kzalloc(sizeof(struct dma_fence_chain), GFP_KERNEL);
>   	if (!chain) {
>   		ret = -ENOMEM;
> @@ -890,6 +903,19 @@ drm_syncobj_transfer_to_binary(struct drm_file *file_private,
>   				     args->src_point, args->flags, &fence);
>   	if (ret)
>   		goto err;
> +
> +	/* If the requested seqno is already signaled drm_syncobj_find_fence may
> +	 * return a NULL fence. To make sure the recipient gets signalled, use
> +	 * a new fence instead.
> +	 */
> +	if (!fence) {
> +		fence = dma_fence_allocate_private_stub();
> +		if (!fence) {
> +			ret = -ENOMEM;
> +			goto err;
> +		}
> +	}
> +
>   	drm_syncobj_replace_fence(binary_syncobj, fence);
>   	dma_fence_put(fence);
>   err:
Christian König Dec. 7, 2021, 7:21 a.m. UTC | #3
Am 07.12.21 um 08:10 schrieb Lionel Landwerlin:
> On 07/12/2021 03:32, Bas Nieuwenhuizen wrote:
>> See the comments in the code. Basically if the seqno is already
>> signalled then we get a NULL fence. If we then put the NULL fence
>> in a binary syncobj it counts as unsignalled, making that syncobj
>> pretty much useless for all expected uses.
>>
>> Not 100% sure about the transfer to a timeline syncobj but I
>> believe it is needed there too, as AFAICT the add_point function
>> assumes the fence isn't NULL.
>>
>> Fixes: ea569910cbab ("drm/syncobj: add transition iotcls between 
>> binary and timeline v2")
>> Cc: stable@vger.kernel.org
>> Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
>> ---
>>   drivers/gpu/drm/drm_syncobj.c | 26 ++++++++++++++++++++++++++
>>   1 file changed, 26 insertions(+)
>>
>> diff --git a/drivers/gpu/drm/drm_syncobj.c 
>> b/drivers/gpu/drm/drm_syncobj.c
>> index fdd2ec87cdd1..eb28a40400d2 100644
>> --- a/drivers/gpu/drm/drm_syncobj.c
>> +++ b/drivers/gpu/drm/drm_syncobj.c
>> @@ -861,6 +861,19 @@ static int 
>> drm_syncobj_transfer_to_timeline(struct drm_file *file_private,
>>                        &fence);
>>       if (ret)
>>           goto err;
>> +
>> +    /* If the requested seqno is already signaled 
>> drm_syncobj_find_fence may
>> +     * return a NULL fence. To make sure the recipient gets 
>> signalled, use
>> +     * a new fence instead.
>> +     */
>> +    if (!fence) {
>> +        fence = dma_fence_allocate_private_stub();
>> +        if (!fence) {
>> +            ret = -ENOMEM;
>> +            goto err;
>> +        }
>> +    }
>> +
>
>
> Shouldn't we fix drm_syncobj_find_fence() instead?

Mhm, now that you mention it. Bas, why do you think that 
dma_fence_chain_find_seqno() may return NULL when the fence is already 
signaled?

Double checking the code that should never ever happen.

Regards,
Christian.

>
> By returning a stub fence for the timeline case if there isn't one.
>
>
> Because the same NULL fence check appears missing in amdgpu (and 
> probably other drivers).
>
>
> Also we should have tests for this in IGT.
>
> AMD contributed some tests when this code was written but they never 
> got reviewed :(
>
>
> -Lionel
>
>
>>       chain = kzalloc(sizeof(struct dma_fence_chain), GFP_KERNEL);
>>       if (!chain) {
>>           ret = -ENOMEM;
>> @@ -890,6 +903,19 @@ drm_syncobj_transfer_to_binary(struct drm_file 
>> *file_private,
>>                        args->src_point, args->flags, &fence);
>>       if (ret)
>>           goto err;
>> +
>> +    /* If the requested seqno is already signaled 
>> drm_syncobj_find_fence may
>> +     * return a NULL fence. To make sure the recipient gets 
>> signalled, use
>> +     * a new fence instead.
>> +     */
>> +    if (!fence) {
>> +        fence = dma_fence_allocate_private_stub();
>> +        if (!fence) {
>> +            ret = -ENOMEM;
>> +            goto err;
>> +        }
>> +    }
>> +
>>       drm_syncobj_replace_fence(binary_syncobj, fence);
>>       dma_fence_put(fence);
>>   err:
>
>
Christian König Dec. 7, 2021, 7:22 a.m. UTC | #4
Am 07.12.21 um 08:09 schrieb Christian König:
> Am 07.12.21 um 02:32 schrieb Bas Nieuwenhuizen:
>> See the comments in the code. Basically if the seqno is already
>> signalled then we get a NULL fence. If we then put the NULL fence
>> in a binary syncobj it counts as unsignalled, making that syncobj
>> pretty much useless for all expected uses.
>>
>> Not 100% sure about the transfer to a timeline syncobj but I
>> believe it is needed there too, as AFAICT the add_point function
>> assumes the fence isn't NULL.
>>
>> Fixes: ea569910cbab ("drm/syncobj: add transition iotcls between 
>> binary and timeline v2")
>> Cc: stable@vger.kernel.org
>> Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
>
> Reviewed-by: Christian König <christian.koenig@amd.com>
>
> Going to push that to drm-misc-fixes later today if nobody objects in 
> the meantime.

Need to retreat that rb, Lionel correctly pointed out that this should 
never ever happen.

Christian.

>
>> ---
>>   drivers/gpu/drm/drm_syncobj.c | 26 ++++++++++++++++++++++++++
>>   1 file changed, 26 insertions(+)
>>
>> diff --git a/drivers/gpu/drm/drm_syncobj.c 
>> b/drivers/gpu/drm/drm_syncobj.c
>> index fdd2ec87cdd1..eb28a40400d2 100644
>> --- a/drivers/gpu/drm/drm_syncobj.c
>> +++ b/drivers/gpu/drm/drm_syncobj.c
>> @@ -861,6 +861,19 @@ static int 
>> drm_syncobj_transfer_to_timeline(struct drm_file *file_private,
>>                        &fence);
>>       if (ret)
>>           goto err;
>> +
>> +    /* If the requested seqno is already signaled 
>> drm_syncobj_find_fence may
>> +     * return a NULL fence. To make sure the recipient gets 
>> signalled, use
>> +     * a new fence instead.
>> +     */
>> +    if (!fence) {
>> +        fence = dma_fence_allocate_private_stub();
>> +        if (!fence) {
>> +            ret = -ENOMEM;
>> +            goto err;
>> +        }
>> +    }
>> +
>>       chain = kzalloc(sizeof(struct dma_fence_chain), GFP_KERNEL);
>>       if (!chain) {
>>           ret = -ENOMEM;
>> @@ -890,6 +903,19 @@ drm_syncobj_transfer_to_binary(struct drm_file 
>> *file_private,
>>                        args->src_point, args->flags, &fence);
>>       if (ret)
>>           goto err;
>> +
>> +    /* If the requested seqno is already signaled 
>> drm_syncobj_find_fence may
>> +     * return a NULL fence. To make sure the recipient gets 
>> signalled, use
>> +     * a new fence instead.
>> +     */
>> +    if (!fence) {
>> +        fence = dma_fence_allocate_private_stub();
>> +        if (!fence) {
>> +            ret = -ENOMEM;
>> +            goto err;
>> +        }
>> +    }
>> +
>>       drm_syncobj_replace_fence(binary_syncobj, fence);
>>       dma_fence_put(fence);
>>   err:
>
Bas Nieuwenhuizen Dec. 7, 2021, 10:40 a.m. UTC | #5
On Tue, Dec 7, 2021 at 8:21 AM Christian König <christian.koenig@amd.com> wrote:
>
> Am 07.12.21 um 08:10 schrieb Lionel Landwerlin:
> > On 07/12/2021 03:32, Bas Nieuwenhuizen wrote:
> >> See the comments in the code. Basically if the seqno is already
> >> signalled then we get a NULL fence. If we then put the NULL fence
> >> in a binary syncobj it counts as unsignalled, making that syncobj
> >> pretty much useless for all expected uses.
> >>
> >> Not 100% sure about the transfer to a timeline syncobj but I
> >> believe it is needed there too, as AFAICT the add_point function
> >> assumes the fence isn't NULL.
> >>
> >> Fixes: ea569910cbab ("drm/syncobj: add transition iotcls between
> >> binary and timeline v2")
> >> Cc: stable@vger.kernel.org
> >> Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
> >> ---
> >>   drivers/gpu/drm/drm_syncobj.c | 26 ++++++++++++++++++++++++++
> >>   1 file changed, 26 insertions(+)
> >>
> >> diff --git a/drivers/gpu/drm/drm_syncobj.c
> >> b/drivers/gpu/drm/drm_syncobj.c
> >> index fdd2ec87cdd1..eb28a40400d2 100644
> >> --- a/drivers/gpu/drm/drm_syncobj.c
> >> +++ b/drivers/gpu/drm/drm_syncobj.c
> >> @@ -861,6 +861,19 @@ static int
> >> drm_syncobj_transfer_to_timeline(struct drm_file *file_private,
> >>                        &fence);
> >>       if (ret)
> >>           goto err;
> >> +
> >> +    /* If the requested seqno is already signaled
> >> drm_syncobj_find_fence may
> >> +     * return a NULL fence. To make sure the recipient gets
> >> signalled, use
> >> +     * a new fence instead.
> >> +     */
> >> +    if (!fence) {
> >> +        fence = dma_fence_allocate_private_stub();
> >> +        if (!fence) {
> >> +            ret = -ENOMEM;
> >> +            goto err;
> >> +        }
> >> +    }
> >> +
> >
> >
> > Shouldn't we fix drm_syncobj_find_fence() instead?
>
> Mhm, now that you mention it. Bas, why do you think that
> dma_fence_chain_find_seqno() may return NULL when the fence is already
> signaled?
>
> Double checking the code that should never ever happen.

Well, I tested the patch with
https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14097/diffs?commit_id=d4c5c840f4e3839f9f5c1747a9034eb2b565f5c0
so I'm pretty sure it happens, and this patch fixes  it, though I may
have misidentified what the code should do.

My reading is that the dma_fence_chain_for_each in
dma_fence_chain_find_seqno will never visit a signalled fence (unless
the top one is signalled), as dma_fence_chain_walk will never return a
signalled fence (it only returns on NULL or !signalled).

Happy to move this to drm_syncobj_find_fence.
>
> Regards,
> Christian.
>
> >
> > By returning a stub fence for the timeline case if there isn't one.
> >
> >
> > Because the same NULL fence check appears missing in amdgpu (and
> > probably other drivers).
> >
> >
> > Also we should have tests for this in IGT.
> >
> > AMD contributed some tests when this code was written but they never
> > got reviewed :(
> >
> >
> > -Lionel
> >
> >
> >>       chain = kzalloc(sizeof(struct dma_fence_chain), GFP_KERNEL);
> >>       if (!chain) {
> >>           ret = -ENOMEM;
> >> @@ -890,6 +903,19 @@ drm_syncobj_transfer_to_binary(struct drm_file
> >> *file_private,
> >>                        args->src_point, args->flags, &fence);
> >>       if (ret)
> >>           goto err;
> >> +
> >> +    /* If the requested seqno is already signaled
> >> drm_syncobj_find_fence may
> >> +     * return a NULL fence. To make sure the recipient gets
> >> signalled, use
> >> +     * a new fence instead.
> >> +     */
> >> +    if (!fence) {
> >> +        fence = dma_fence_allocate_private_stub();
> >> +        if (!fence) {
> >> +            ret = -ENOMEM;
> >> +            goto err;
> >> +        }
> >> +    }
> >> +
> >>       drm_syncobj_replace_fence(binary_syncobj, fence);
> >>       dma_fence_put(fence);
> >>   err:
> >
> >
>
Christian König Dec. 7, 2021, 11 a.m. UTC | #6
Am 07.12.21 um 11:40 schrieb Bas Nieuwenhuizen:
> On Tue, Dec 7, 2021 at 8:21 AM Christian König <christian.koenig@amd.com> wrote:
>> Am 07.12.21 um 08:10 schrieb Lionel Landwerlin:
>>> On 07/12/2021 03:32, Bas Nieuwenhuizen wrote:
>>>> See the comments in the code. Basically if the seqno is already
>>>> signalled then we get a NULL fence. If we then put the NULL fence
>>>> in a binary syncobj it counts as unsignalled, making that syncobj
>>>> pretty much useless for all expected uses.
>>>>
>>>> Not 100% sure about the transfer to a timeline syncobj but I
>>>> believe it is needed there too, as AFAICT the add_point function
>>>> assumes the fence isn't NULL.
>>>>
>>>> Fixes: ea569910cbab ("drm/syncobj: add transition iotcls between
>>>> binary and timeline v2")
>>>> Cc: stable@vger.kernel.org
>>>> Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
>>>> ---
>>>>    drivers/gpu/drm/drm_syncobj.c | 26 ++++++++++++++++++++++++++
>>>>    1 file changed, 26 insertions(+)
>>>>
>>>> diff --git a/drivers/gpu/drm/drm_syncobj.c
>>>> b/drivers/gpu/drm/drm_syncobj.c
>>>> index fdd2ec87cdd1..eb28a40400d2 100644
>>>> --- a/drivers/gpu/drm/drm_syncobj.c
>>>> +++ b/drivers/gpu/drm/drm_syncobj.c
>>>> @@ -861,6 +861,19 @@ static int
>>>> drm_syncobj_transfer_to_timeline(struct drm_file *file_private,
>>>>                         &fence);
>>>>        if (ret)
>>>>            goto err;
>>>> +
>>>> +    /* If the requested seqno is already signaled
>>>> drm_syncobj_find_fence may
>>>> +     * return a NULL fence. To make sure the recipient gets
>>>> signalled, use
>>>> +     * a new fence instead.
>>>> +     */
>>>> +    if (!fence) {
>>>> +        fence = dma_fence_allocate_private_stub();
>>>> +        if (!fence) {
>>>> +            ret = -ENOMEM;
>>>> +            goto err;
>>>> +        }
>>>> +    }
>>>> +
>>>
>>> Shouldn't we fix drm_syncobj_find_fence() instead?
>> Mhm, now that you mention it. Bas, why do you think that
>> dma_fence_chain_find_seqno() may return NULL when the fence is already
>> signaled?
>>
>> Double checking the code that should never ever happen.
> Well, I tested the patch with
> https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgitlab.freedesktop.org%2Fmesa%2Fmesa%2F-%2Fmerge_requests%2F14097%2Fdiffs%3Fcommit_id%3Dd4c5c840f4e3839f9f5c1747a9034eb2b565f5c0&amp;data=04%7C01%7Cchristian.koenig%40amd.com%7Cc1ab29fc100842826f5d08d9b96e102a%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637744705383763833%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000&amp;sdata=sXkTJWm%2FWm2xwgLGdepVWAOlqj%2FeArnvmMvnJpQ9YEs%3D&amp;reserved=0
> so I'm pretty sure it happens, and this patch fixes  it, though I may
> have misidentified what the code should do.
>
> My reading is that the dma_fence_chain_for_each in
> dma_fence_chain_find_seqno will never visit a signalled fence (unless
> the top one is signalled), as dma_fence_chain_walk will never return a
> signalled fence (it only returns on NULL or !signalled).

Ah, yes that suddenly makes more sense.

> Happy to move this to drm_syncobj_find_fence.

No, I think that your current patch is fine.

That drm_syncobj_find_fence() only returns NULL when it can't find 
anything !signaled is correct behavior I think.

Going to push your original patch if nobody has any more objections.

But somebody might want to take care of the IGT as well.

Regards,
Christian.

>> Regards,
>> Christian.
>>
>>> By returning a stub fence for the timeline case if there isn't one.
>>>
>>>
>>> Because the same NULL fence check appears missing in amdgpu (and
>>> probably other drivers).
>>>
>>>
>>> Also we should have tests for this in IGT.
>>>
>>> AMD contributed some tests when this code was written but they never
>>> got reviewed :(
>>>
>>>
>>> -Lionel
>>>
>>>
>>>>        chain = kzalloc(sizeof(struct dma_fence_chain), GFP_KERNEL);
>>>>        if (!chain) {
>>>>            ret = -ENOMEM;
>>>> @@ -890,6 +903,19 @@ drm_syncobj_transfer_to_binary(struct drm_file
>>>> *file_private,
>>>>                         args->src_point, args->flags, &fence);
>>>>        if (ret)
>>>>            goto err;
>>>> +
>>>> +    /* If the requested seqno is already signaled
>>>> drm_syncobj_find_fence may
>>>> +     * return a NULL fence. To make sure the recipient gets
>>>> signalled, use
>>>> +     * a new fence instead.
>>>> +     */
>>>> +    if (!fence) {
>>>> +        fence = dma_fence_allocate_private_stub();
>>>> +        if (!fence) {
>>>> +            ret = -ENOMEM;
>>>> +            goto err;
>>>> +        }
>>>> +    }
>>>> +
>>>>        drm_syncobj_replace_fence(binary_syncobj, fence);
>>>>        dma_fence_put(fence);
>>>>    err:
>>>
Lionel Landwerlin Dec. 7, 2021, 11:28 a.m. UTC | #7
On 07/12/2021 13:00, Christian König wrote:
> Am 07.12.21 um 11:40 schrieb Bas Nieuwenhuizen:
>> On Tue, Dec 7, 2021 at 8:21 AM Christian König 
>> <christian.koenig@amd.com> wrote:
>>> Am 07.12.21 um 08:10 schrieb Lionel Landwerlin:
>>>> On 07/12/2021 03:32, Bas Nieuwenhuizen wrote:
>>>>> See the comments in the code. Basically if the seqno is already
>>>>> signalled then we get a NULL fence. If we then put the NULL fence
>>>>> in a binary syncobj it counts as unsignalled, making that syncobj
>>>>> pretty much useless for all expected uses.
>>>>>
>>>>> Not 100% sure about the transfer to a timeline syncobj but I
>>>>> believe it is needed there too, as AFAICT the add_point function
>>>>> assumes the fence isn't NULL.
>>>>>
>>>>> Fixes: ea569910cbab ("drm/syncobj: add transition iotcls between
>>>>> binary and timeline v2")
>>>>> Cc: stable@vger.kernel.org
>>>>> Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
>>>>> ---
>>>>>    drivers/gpu/drm/drm_syncobj.c | 26 ++++++++++++++++++++++++++
>>>>>    1 file changed, 26 insertions(+)
>>>>>
>>>>> diff --git a/drivers/gpu/drm/drm_syncobj.c
>>>>> b/drivers/gpu/drm/drm_syncobj.c
>>>>> index fdd2ec87cdd1..eb28a40400d2 100644
>>>>> --- a/drivers/gpu/drm/drm_syncobj.c
>>>>> +++ b/drivers/gpu/drm/drm_syncobj.c
>>>>> @@ -861,6 +861,19 @@ static int
>>>>> drm_syncobj_transfer_to_timeline(struct drm_file *file_private,
>>>>>                         &fence);
>>>>>        if (ret)
>>>>>            goto err;
>>>>> +
>>>>> +    /* If the requested seqno is already signaled
>>>>> drm_syncobj_find_fence may
>>>>> +     * return a NULL fence. To make sure the recipient gets
>>>>> signalled, use
>>>>> +     * a new fence instead.
>>>>> +     */
>>>>> +    if (!fence) {
>>>>> +        fence = dma_fence_allocate_private_stub();
>>>>> +        if (!fence) {
>>>>> +            ret = -ENOMEM;
>>>>> +            goto err;
>>>>> +        }
>>>>> +    }
>>>>> +
>>>>
>>>> Shouldn't we fix drm_syncobj_find_fence() instead?
>>> Mhm, now that you mention it. Bas, why do you think that
>>> dma_fence_chain_find_seqno() may return NULL when the fence is already
>>> signaled?
>>>
>>> Double checking the code that should never ever happen.
>> Well, I tested the patch with
>> https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgitlab.freedesktop.org%2Fmesa%2Fmesa%2F-%2Fmerge_requests%2F14097%2Fdiffs%3Fcommit_id%3Dd4c5c840f4e3839f9f5c1747a9034eb2b565f5c0&amp;data=04%7C01%7Cchristian.koenig%40amd.com%7Cc1ab29fc100842826f5d08d9b96e102a%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637744705383763833%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000&amp;sdata=sXkTJWm%2FWm2xwgLGdepVWAOlqj%2FeArnvmMvnJpQ9YEs%3D&amp;reserved=0 
>>
>> so I'm pretty sure it happens, and this patch fixes  it, though I may
>> have misidentified what the code should do.
>>
>> My reading is that the dma_fence_chain_for_each in
>> dma_fence_chain_find_seqno will never visit a signalled fence (unless
>> the top one is signalled), as dma_fence_chain_walk will never return a
>> signalled fence (it only returns on NULL or !signalled).
>
> Ah, yes that suddenly makes more sense.
>
>> Happy to move this to drm_syncobj_find_fence.
>
> No, I think that your current patch is fine.
>
> That drm_syncobj_find_fence() only returns NULL when it can't find 
> anything !signaled is correct behavior I think.


We should probably update the docs then :


  * Returns 0 on success or a negative error value on failure. On 
success @fence
  * contains a reference to the fence, which must be released by calling
  * dma_fence_put().


Looking at some of the kernel drivers, it looks like they don't all 
protect themselves against NULL pointers :


https://elixir.bootlin.com/linux/latest/source/drivers/gpu/drm/vc4/vc4_gem.c#L1195

https://elixir.bootlin.com/linux/latest/source/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c#L1020


-Lionel


>
> Going to push your original patch if nobody has any more objections.
>
> But somebody might want to take care of the IGT as well.
>
> Regards,
> Christian.
>
>>> Regards,
>>> Christian.
>>>
>>>> By returning a stub fence for the timeline case if there isn't one.
>>>>
>>>>
>>>> Because the same NULL fence check appears missing in amdgpu (and
>>>> probably other drivers).
>>>>
>>>>
>>>> Also we should have tests for this in IGT.
>>>>
>>>> AMD contributed some tests when this code was written but they never
>>>> got reviewed :(
>>>>
>>>>
>>>> -Lionel
>>>>
>>>>
>>>>>        chain = kzalloc(sizeof(struct dma_fence_chain), GFP_KERNEL);
>>>>>        if (!chain) {
>>>>>            ret = -ENOMEM;
>>>>> @@ -890,6 +903,19 @@ drm_syncobj_transfer_to_binary(struct drm_file
>>>>> *file_private,
>>>>>                         args->src_point, args->flags, &fence);
>>>>>        if (ret)
>>>>>            goto err;
>>>>> +
>>>>> +    /* If the requested seqno is already signaled
>>>>> drm_syncobj_find_fence may
>>>>> +     * return a NULL fence. To make sure the recipient gets
>>>>> signalled, use
>>>>> +     * a new fence instead.
>>>>> +     */
>>>>> +    if (!fence) {
>>>>> +        fence = dma_fence_allocate_private_stub();
>>>>> +        if (!fence) {
>>>>> +            ret = -ENOMEM;
>>>>> +            goto err;
>>>>> +        }
>>>>> +    }
>>>>> +
>>>>>        drm_syncobj_replace_fence(binary_syncobj, fence);
>>>>>        dma_fence_put(fence);
>>>>>    err:
>>>>
>
Bas Nieuwenhuizen Dec. 7, 2021, 11:35 a.m. UTC | #8
On Tue, Dec 7, 2021 at 12:28 PM Lionel Landwerlin
<lionel.g.landwerlin@intel.com> wrote:
>
> On 07/12/2021 13:00, Christian König wrote:
> > Am 07.12.21 um 11:40 schrieb Bas Nieuwenhuizen:
> >> On Tue, Dec 7, 2021 at 8:21 AM Christian König
> >> <christian.koenig@amd.com> wrote:
> >>> Am 07.12.21 um 08:10 schrieb Lionel Landwerlin:
> >>>> On 07/12/2021 03:32, Bas Nieuwenhuizen wrote:
> >>>>> See the comments in the code. Basically if the seqno is already
> >>>>> signalled then we get a NULL fence. If we then put the NULL fence
> >>>>> in a binary syncobj it counts as unsignalled, making that syncobj
> >>>>> pretty much useless for all expected uses.
> >>>>>
> >>>>> Not 100% sure about the transfer to a timeline syncobj but I
> >>>>> believe it is needed there too, as AFAICT the add_point function
> >>>>> assumes the fence isn't NULL.
> >>>>>
> >>>>> Fixes: ea569910cbab ("drm/syncobj: add transition iotcls between
> >>>>> binary and timeline v2")
> >>>>> Cc: stable@vger.kernel.org
> >>>>> Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
> >>>>> ---
> >>>>>    drivers/gpu/drm/drm_syncobj.c | 26 ++++++++++++++++++++++++++
> >>>>>    1 file changed, 26 insertions(+)
> >>>>>
> >>>>> diff --git a/drivers/gpu/drm/drm_syncobj.c
> >>>>> b/drivers/gpu/drm/drm_syncobj.c
> >>>>> index fdd2ec87cdd1..eb28a40400d2 100644
> >>>>> --- a/drivers/gpu/drm/drm_syncobj.c
> >>>>> +++ b/drivers/gpu/drm/drm_syncobj.c
> >>>>> @@ -861,6 +861,19 @@ static int
> >>>>> drm_syncobj_transfer_to_timeline(struct drm_file *file_private,
> >>>>>                         &fence);
> >>>>>        if (ret)
> >>>>>            goto err;
> >>>>> +
> >>>>> +    /* If the requested seqno is already signaled
> >>>>> drm_syncobj_find_fence may
> >>>>> +     * return a NULL fence. To make sure the recipient gets
> >>>>> signalled, use
> >>>>> +     * a new fence instead.
> >>>>> +     */
> >>>>> +    if (!fence) {
> >>>>> +        fence = dma_fence_allocate_private_stub();
> >>>>> +        if (!fence) {
> >>>>> +            ret = -ENOMEM;
> >>>>> +            goto err;
> >>>>> +        }
> >>>>> +    }
> >>>>> +
> >>>>
> >>>> Shouldn't we fix drm_syncobj_find_fence() instead?
> >>> Mhm, now that you mention it. Bas, why do you think that
> >>> dma_fence_chain_find_seqno() may return NULL when the fence is already
> >>> signaled?
> >>>
> >>> Double checking the code that should never ever happen.
> >> Well, I tested the patch with
> >> https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgitlab.freedesktop.org%2Fmesa%2Fmesa%2F-%2Fmerge_requests%2F14097%2Fdiffs%3Fcommit_id%3Dd4c5c840f4e3839f9f5c1747a9034eb2b565f5c0&amp;data=04%7C01%7Cchristian.koenig%40amd.com%7Cc1ab29fc100842826f5d08d9b96e102a%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637744705383763833%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000&amp;sdata=sXkTJWm%2FWm2xwgLGdepVWAOlqj%2FeArnvmMvnJpQ9YEs%3D&amp;reserved=0
> >>
> >> so I'm pretty sure it happens, and this patch fixes  it, though I may
> >> have misidentified what the code should do.
> >>
> >> My reading is that the dma_fence_chain_for_each in
> >> dma_fence_chain_find_seqno will never visit a signalled fence (unless
> >> the top one is signalled), as dma_fence_chain_walk will never return a
> >> signalled fence (it only returns on NULL or !signalled).
> >
> > Ah, yes that suddenly makes more sense.
> >
> >> Happy to move this to drm_syncobj_find_fence.
> >
> > No, I think that your current patch is fine.
> >
> > That drm_syncobj_find_fence() only returns NULL when it can't find
> > anything !signaled is correct behavior I think.
>
>
> We should probably update the docs then :
>
>
>   * Returns 0 on success or a negative error value on failure. On
> success @fence
>   * contains a reference to the fence, which must be released by calling
>   * dma_fence_put().
>
>
> Looking at some of the kernel drivers, it looks like they don't all
> protect themselves against NULL pointers :
>
>
> https://elixir.bootlin.com/linux/latest/source/drivers/gpu/drm/vc4/vc4_gem.c#L1195
>
> https://elixir.bootlin.com/linux/latest/source/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c#L1020

amdgpu handles it here (amdgpu_sync_fence checks for a NULL fence).
But yeah I think it is a bit treacherous, especially as this only
occurs with timeline semaphores.

>
>
> -Lionel
>
>
> >
> > Going to push your original patch if nobody has any more objections.
> >
> > But somebody might want to take care of the IGT as well.
> >
> > Regards,
> > Christian.
> >
> >>> Regards,
> >>> Christian.
> >>>
> >>>> By returning a stub fence for the timeline case if there isn't one.
> >>>>
> >>>>
> >>>> Because the same NULL fence check appears missing in amdgpu (and
> >>>> probably other drivers).
> >>>>
> >>>>
> >>>> Also we should have tests for this in IGT.
> >>>>
> >>>> AMD contributed some tests when this code was written but they never
> >>>> got reviewed :(
> >>>>
> >>>>
> >>>> -Lionel
> >>>>
> >>>>
> >>>>>        chain = kzalloc(sizeof(struct dma_fence_chain), GFP_KERNEL);
> >>>>>        if (!chain) {
> >>>>>            ret = -ENOMEM;
> >>>>> @@ -890,6 +903,19 @@ drm_syncobj_transfer_to_binary(struct drm_file
> >>>>> *file_private,
> >>>>>                         args->src_point, args->flags, &fence);
> >>>>>        if (ret)
> >>>>>            goto err;
> >>>>> +
> >>>>> +    /* If the requested seqno is already signaled
> >>>>> drm_syncobj_find_fence may
> >>>>> +     * return a NULL fence. To make sure the recipient gets
> >>>>> signalled, use
> >>>>> +     * a new fence instead.
> >>>>> +     */
> >>>>> +    if (!fence) {
> >>>>> +        fence = dma_fence_allocate_private_stub();
> >>>>> +        if (!fence) {
> >>>>> +            ret = -ENOMEM;
> >>>>> +            goto err;
> >>>>> +        }
> >>>>> +    }
> >>>>> +
> >>>>>        drm_syncobj_replace_fence(binary_syncobj, fence);
> >>>>>        dma_fence_put(fence);
> >>>>>    err:
> >>>>
> >
>
Christian König Dec. 7, 2021, 11:51 a.m. UTC | #9
Am 07.12.21 um 12:35 schrieb Bas Nieuwenhuizen:
> On Tue, Dec 7, 2021 at 12:28 PM Lionel Landwerlin
> <lionel.g.landwerlin@intel.com> wrote:
>> On 07/12/2021 13:00, Christian König wrote:
>>> Am 07.12.21 um 11:40 schrieb Bas Nieuwenhuizen:
>>>> On Tue, Dec 7, 2021 at 8:21 AM Christian König
>>>> <christian.koenig@amd.com> wrote:
>>>>> Am 07.12.21 um 08:10 schrieb Lionel Landwerlin:
>>>>>> On 07/12/2021 03:32, Bas Nieuwenhuizen wrote:
>>>>>>> See the comments in the code. Basically if the seqno is already
>>>>>>> signalled then we get a NULL fence. If we then put the NULL fence
>>>>>>> in a binary syncobj it counts as unsignalled, making that syncobj
>>>>>>> pretty much useless for all expected uses.
>>>>>>>
>>>>>>> Not 100% sure about the transfer to a timeline syncobj but I
>>>>>>> believe it is needed there too, as AFAICT the add_point function
>>>>>>> assumes the fence isn't NULL.
>>>>>>>
>>>>>>> Fixes: ea569910cbab ("drm/syncobj: add transition iotcls between
>>>>>>> binary and timeline v2")
>>>>>>> Cc: stable@vger.kernel.org
>>>>>>> Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
>>>>>>> ---
>>>>>>>     drivers/gpu/drm/drm_syncobj.c | 26 ++++++++++++++++++++++++++
>>>>>>>     1 file changed, 26 insertions(+)
>>>>>>>
>>>>>>> diff --git a/drivers/gpu/drm/drm_syncobj.c
>>>>>>> b/drivers/gpu/drm/drm_syncobj.c
>>>>>>> index fdd2ec87cdd1..eb28a40400d2 100644
>>>>>>> --- a/drivers/gpu/drm/drm_syncobj.c
>>>>>>> +++ b/drivers/gpu/drm/drm_syncobj.c
>>>>>>> @@ -861,6 +861,19 @@ static int
>>>>>>> drm_syncobj_transfer_to_timeline(struct drm_file *file_private,
>>>>>>>                          &fence);
>>>>>>>         if (ret)
>>>>>>>             goto err;
>>>>>>> +
>>>>>>> +    /* If the requested seqno is already signaled
>>>>>>> drm_syncobj_find_fence may
>>>>>>> +     * return a NULL fence. To make sure the recipient gets
>>>>>>> signalled, use
>>>>>>> +     * a new fence instead.
>>>>>>> +     */
>>>>>>> +    if (!fence) {
>>>>>>> +        fence = dma_fence_allocate_private_stub();
>>>>>>> +        if (!fence) {
>>>>>>> +            ret = -ENOMEM;
>>>>>>> +            goto err;
>>>>>>> +        }
>>>>>>> +    }
>>>>>>> +
>>>>>> Shouldn't we fix drm_syncobj_find_fence() instead?
>>>>> Mhm, now that you mention it. Bas, why do you think that
>>>>> dma_fence_chain_find_seqno() may return NULL when the fence is already
>>>>> signaled?
>>>>>
>>>>> Double checking the code that should never ever happen.
>>>> Well, I tested the patch with
>>>> https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgitlab.freedesktop.org%2Fmesa%2Fmesa%2F-%2Fmerge_requests%2F14097%2Fdiffs%3Fcommit_id%3Dd4c5c840f4e3839f9f5c1747a9034eb2b565f5c0&amp;data=04%7C01%7Cchristian.koenig%40amd.com%7C342b0cbdff7d487630ae08d9b975abd9%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637744738372115823%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000&amp;sdata=5SiZYL1TgLq3ldGy1COOkSasklZWQN%2BxWGXJ1j%2BHSOQ%3D&amp;reserved=0
>>>>
>>>> so I'm pretty sure it happens, and this patch fixes  it, though I may
>>>> have misidentified what the code should do.
>>>>
>>>> My reading is that the dma_fence_chain_for_each in
>>>> dma_fence_chain_find_seqno will never visit a signalled fence (unless
>>>> the top one is signalled), as dma_fence_chain_walk will never return a
>>>> signalled fence (it only returns on NULL or !signalled).
>>> Ah, yes that suddenly makes more sense.
>>>
>>>> Happy to move this to drm_syncobj_find_fence.
>>> No, I think that your current patch is fine.
>>>
>>> That drm_syncobj_find_fence() only returns NULL when it can't find
>>> anything !signaled is correct behavior I think.
>>
>> We should probably update the docs then :
>>
>>
>>    * Returns 0 on success or a negative error value on failure. On
>> success @fence
>>    * contains a reference to the fence, which must be released by calling
>>    * dma_fence_put().
>>
>>
>> Looking at some of the kernel drivers, it looks like they don't all
>> protect themselves against NULL pointers :
>>
>>
>> https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Felixir.bootlin.com%2Flinux%2Flatest%2Fsource%2Fdrivers%2Fgpu%2Fdrm%2Fvc4%2Fvc4_gem.c%23L1195&amp;data=04%7C01%7Cchristian.koenig%40amd.com%7C342b0cbdff7d487630ae08d9b975abd9%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637744738372115823%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000&amp;sdata=LjyWQpIUqqAGgR7ak3CJOJTqf%2FG8QB9BZX542vL25RA%3D&amp;reserved=0
>>
>> https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Felixir.bootlin.com%2Flinux%2Flatest%2Fsource%2Fdrivers%2Fgpu%2Fdrm%2Famd%2Famdgpu%2Famdgpu_cs.c%23L1020&amp;data=04%7C01%7Cchristian.koenig%40amd.com%7C342b0cbdff7d487630ae08d9b975abd9%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637744738372115823%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000&amp;sdata=kk9k8sDNLiOFTIyul79FbhfQ4Y2MdwFA6rT0h46xM40%3D&amp;reserved=0
> amdgpu handles it here (amdgpu_sync_fence checks for a NULL fence).
> But yeah I think it is a bit treacherous, especially as this only
> occurs with timeline semaphores.

Mhm, that's a good point.

While I still think it makes more sense from the design perspective to 
return NULL to distinct that there is nothing to wait for, I see as well 
that this is it not the most defensive approach.

Ok in that case let's move it into drm_syncobj_find_fence() instead.

Just one more thing, the timestamp is now busted anyway (cause the 
original fence is already garbage collected). So we can probably use 
dma_fence_get_stub() instead of dma_fence_allocate_private_stub().

Regards,
Christian.

>
>>
>> -Lionel
>>
>>
>>> Going to push your original patch if nobody has any more objections.
>>>
>>> But somebody might want to take care of the IGT as well.
>>>
>>> Regards,
>>> Christian.
>>>
>>>>> Regards,
>>>>> Christian.
>>>>>
>>>>>> By returning a stub fence for the timeline case if there isn't one.
>>>>>>
>>>>>>
>>>>>> Because the same NULL fence check appears missing in amdgpu (and
>>>>>> probably other drivers).
>>>>>>
>>>>>>
>>>>>> Also we should have tests for this in IGT.
>>>>>>
>>>>>> AMD contributed some tests when this code was written but they never
>>>>>> got reviewed :(
>>>>>>
>>>>>>
>>>>>> -Lionel
>>>>>>
>>>>>>
>>>>>>>         chain = kzalloc(sizeof(struct dma_fence_chain), GFP_KERNEL);
>>>>>>>         if (!chain) {
>>>>>>>             ret = -ENOMEM;
>>>>>>> @@ -890,6 +903,19 @@ drm_syncobj_transfer_to_binary(struct drm_file
>>>>>>> *file_private,
>>>>>>>                          args->src_point, args->flags, &fence);
>>>>>>>         if (ret)
>>>>>>>             goto err;
>>>>>>> +
>>>>>>> +    /* If the requested seqno is already signaled
>>>>>>> drm_syncobj_find_fence may
>>>>>>> +     * return a NULL fence. To make sure the recipient gets
>>>>>>> signalled, use
>>>>>>> +     * a new fence instead.
>>>>>>> +     */
>>>>>>> +    if (!fence) {
>>>>>>> +        fence = dma_fence_allocate_private_stub();
>>>>>>> +        if (!fence) {
>>>>>>> +            ret = -ENOMEM;
>>>>>>> +            goto err;
>>>>>>> +        }
>>>>>>> +    }
>>>>>>> +
>>>>>>>         drm_syncobj_replace_fence(binary_syncobj, fence);
>>>>>>>         dma_fence_put(fence);
>>>>>>>     err:
diff mbox series

Patch

diff --git a/drivers/gpu/drm/drm_syncobj.c b/drivers/gpu/drm/drm_syncobj.c
index fdd2ec87cdd1..eb28a40400d2 100644
--- a/drivers/gpu/drm/drm_syncobj.c
+++ b/drivers/gpu/drm/drm_syncobj.c
@@ -861,6 +861,19 @@  static int drm_syncobj_transfer_to_timeline(struct drm_file *file_private,
 				     &fence);
 	if (ret)
 		goto err;
+
+	/* If the requested seqno is already signaled drm_syncobj_find_fence may
+	 * return a NULL fence. To make sure the recipient gets signalled, use
+	 * a new fence instead.
+	 */
+	if (!fence) {
+		fence = dma_fence_allocate_private_stub();
+		if (!fence) {
+			ret = -ENOMEM;
+			goto err;
+		}
+	}
+
 	chain = kzalloc(sizeof(struct dma_fence_chain), GFP_KERNEL);
 	if (!chain) {
 		ret = -ENOMEM;
@@ -890,6 +903,19 @@  drm_syncobj_transfer_to_binary(struct drm_file *file_private,
 				     args->src_point, args->flags, &fence);
 	if (ret)
 		goto err;
+
+	/* If the requested seqno is already signaled drm_syncobj_find_fence may
+	 * return a NULL fence. To make sure the recipient gets signalled, use
+	 * a new fence instead.
+	 */
+	if (!fence) {
+		fence = dma_fence_allocate_private_stub();
+		if (!fence) {
+			ret = -ENOMEM;
+			goto err;
+		}
+	}
+
 	drm_syncobj_replace_fence(binary_syncobj, fence);
 	dma_fence_put(fence);
 err: