diff mbox series

[2/2] dma-buf: fix dma-fence-chain out of order test

Message ID 20200625123443.19680-2-lionel.g.landwerlin@intel.com (mailing list archive)
State New, archived
Headers show
Series [1/2] Revert "dma-buf: Report signaled links inside dma-fence-chain" | expand

Commit Message

Lionel Landwerlin June 25, 2020, 12:34 p.m. UTC
There was probably a misunderstand on how the dma-fence-chain is
supposed to work or what dma_fence_chain_find_seqno() is supposed to
return.

dma_fence_chain_find_seqno() is here to give us the fence to wait upon
for a particular point in the timeline. The timeline progresses only
when all the points prior to a given number have completed.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: dc2f7e67a28a5c ("dma-buf: Exercise dma-fence-chain under selftests")
---
 drivers/dma-buf/st-dma-fence-chain.c | 43 ++++++++++++++--------------
 1 file changed, 21 insertions(+), 22 deletions(-)

Comments

Christian König June 25, 2020, 12:44 p.m. UTC | #1
Am 25.06.20 um 14:34 schrieb Lionel Landwerlin:
> There was probably a misunderstand on how the dma-fence-chain is
> supposed to work or what dma_fence_chain_find_seqno() is supposed to
> return.
>
> dma_fence_chain_find_seqno() is here to give us the fence to wait upon
> for a particular point in the timeline. The timeline progresses only
> when all the points prior to a given number have completed.
>
> Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
> Fixes: dc2f7e67a28a5c ("dma-buf: Exercise dma-fence-chain under selftests")

Acked-by: Christian König <christian.koenig@amd.com>

> ---
>   drivers/dma-buf/st-dma-fence-chain.c | 43 ++++++++++++++--------------
>   1 file changed, 21 insertions(+), 22 deletions(-)
>
> diff --git a/drivers/dma-buf/st-dma-fence-chain.c b/drivers/dma-buf/st-dma-fence-chain.c
> index 5d45ba7ba3cd..9525f7f56119 100644
> --- a/drivers/dma-buf/st-dma-fence-chain.c
> +++ b/drivers/dma-buf/st-dma-fence-chain.c
> @@ -318,15 +318,16 @@ static int find_out_of_order(void *arg)
>   		goto err;
>   	}
>   
> -	if (fence && fence != fc.chains[1]) {
> +	/*
> +	 * We signaled the middle fence (2) of the 1-2-3 chain. The behavior
> +	 * of the dma-fence-chain is to make us wait for all the fences up to
> +	 * the point we want. Since fence 1 is still not signaled, this what
> +	 * we should get as fence to wait upon (fence 2 being garbage
> +	 * collected during the traversal of the chain).
> +	 */
> +	if (fence != fc.chains[0]) {
>   		pr_err("Incorrect chain-fence.seqno:%lld reported for completed seqno:2\n",
> -		       fence->seqno);
> -
> -		dma_fence_get(fence);
> -		err = dma_fence_chain_find_seqno(&fence, 2);
> -		dma_fence_put(fence);
> -		if (err)
> -			pr_err("Reported %d for finding self!\n", err);
> +		       fence ? fence->seqno : 0);
>   
>   		err = -EINVAL;
>   	}
> @@ -415,20 +416,18 @@ static int __find_race(void *arg)
>   		if (!fence)
>   			goto signal;
>   
> -		err = dma_fence_chain_find_seqno(&fence, seqno);
> -		if (err) {
> -			pr_err("Reported an invalid fence for find-self:%d\n",
> -			       seqno);
> -			dma_fence_put(fence);
> -			break;
> -		}
> -
> -		if (fence->seqno < seqno) {
> -			pr_err("Reported an earlier fence.seqno:%lld for seqno:%d\n",
> -			       fence->seqno, seqno);
> -			err = -EINVAL;
> -			dma_fence_put(fence);
> -			break;
> +		/*
> +		 * We can only find ourselves if we are on fence we were
> +		 * looking for.
> +		 */
> +		if (fence->seqno == seqno) {
> +			err = dma_fence_chain_find_seqno(&fence, seqno);
> +			if (err) {
> +				pr_err("Reported an invalid fence for find-self:%d\n",
> +				       seqno);
> +				dma_fence_put(fence);
> +				break;
> +			}
>   		}
>   
>   		dma_fence_put(fence);
Chris Wilson June 25, 2020, 1:18 p.m. UTC | #2
Quoting Lionel Landwerlin (2020-06-25 13:34:43)
> There was probably a misunderstand on how the dma-fence-chain is
> supposed to work or what dma_fence_chain_find_seqno() is supposed to
> return.
> 
> dma_fence_chain_find_seqno() is here to give us the fence to wait upon
> for a particular point in the timeline. The timeline progresses only
> when all the points prior to a given number have completed.

Hmm, the question was what point is it supposed to wait for.

For the simple chain of [1, 3], does 1 being signaled imply that all
points up to 3 are signaled, or does 3 not being signaled imply that all
points after 1 are not. If that's mentioned already somewhere, my bad.
If not, could you put the answer somewhere.
-Chris
Lionel Landwerlin June 25, 2020, 1:23 p.m. UTC | #3
On 25/06/2020 16:18, Chris Wilson wrote:
> Quoting Lionel Landwerlin (2020-06-25 13:34:43)
>> There was probably a misunderstand on how the dma-fence-chain is
>> supposed to work or what dma_fence_chain_find_seqno() is supposed to
>> return.
>>
>> dma_fence_chain_find_seqno() is here to give us the fence to wait upon
>> for a particular point in the timeline. The timeline progresses only
>> when all the points prior to a given number have completed.
> Hmm, the question was what point is it supposed to wait for.
>
> For the simple chain of [1, 3], does 1 being signaled imply that all
> points up to 3 are signaled, or does 3 not being signaled imply that all
> points after 1 are not. If that's mentioned already somewhere, my bad.
> If not, could you put the answer somewhere.
> -Chris

In [1, 3], if 1 is signaled, the timeline value is 1. And find_seqno(2) 
should return NULL.


In the out_of_order selftest the chain was [1, 2, 3], 2 was signaled and 
the test was expecting no fence to be returned by find_seqno(2).

But we still have to wait on 1 to complete before find_seqno(2) can 
return NULL (as in you don't have to wait on anything).


Hope that answer the question.


-Lionel
Chris Wilson June 25, 2020, 1:47 p.m. UTC | #4
Quoting Lionel Landwerlin (2020-06-25 14:23:25)
> On 25/06/2020 16:18, Chris Wilson wrote:
> > Quoting Lionel Landwerlin (2020-06-25 13:34:43)
> >> There was probably a misunderstand on how the dma-fence-chain is
> >> supposed to work or what dma_fence_chain_find_seqno() is supposed to
> >> return.
> >>
> >> dma_fence_chain_find_seqno() is here to give us the fence to wait upon
> >> for a particular point in the timeline. The timeline progresses only
> >> when all the points prior to a given number have completed.
> > Hmm, the question was what point is it supposed to wait for.
> >
> > For the simple chain of [1, 3], does 1 being signaled imply that all
> > points up to 3 are signaled, or does 3 not being signaled imply that all
> > points after 1 are not. If that's mentioned already somewhere, my bad.
> > If not, could you put the answer somewhere.
> > -Chris
> 
> In [1, 3], if 1 is signaled, the timeline value is 1. And find_seqno(2) 
> should return NULL.
> 
> 
> In the out_of_order selftest the chain was [1, 2, 3], 2 was signaled and 
> the test was expecting no fence to be returned by find_seqno(2).
> 
> But we still have to wait on 1 to complete before find_seqno(2) can 
> return NULL (as in you don't have to wait on anything).

* scratches head

I thought it was meant to be expecting fc.chain[1] to still be present
as the chain at that point was not yet signaled.

Oh well, a mistake compounded. :|
-Chris
Lionel Landwerlin June 25, 2020, 1:56 p.m. UTC | #5
On 25/06/2020 16:47, Chris Wilson wrote:
> Quoting Lionel Landwerlin (2020-06-25 14:23:25)
>> On 25/06/2020 16:18, Chris Wilson wrote:
>>> Quoting Lionel Landwerlin (2020-06-25 13:34:43)
>>>> There was probably a misunderstand on how the dma-fence-chain is
>>>> supposed to work or what dma_fence_chain_find_seqno() is supposed to
>>>> return.
>>>>
>>>> dma_fence_chain_find_seqno() is here to give us the fence to wait upon
>>>> for a particular point in the timeline. The timeline progresses only
>>>> when all the points prior to a given number have completed.
>>> Hmm, the question was what point is it supposed to wait for.
>>>
>>> For the simple chain of [1, 3], does 1 being signaled imply that all
>>> points up to 3 are signaled, or does 3 not being signaled imply that all
>>> points after 1 are not. If that's mentioned already somewhere, my bad.
>>> If not, could you put the answer somewhere.
>>> -Chris
>> In [1, 3], if 1 is signaled, the timeline value is 1. And find_seqno(2)
>> should return NULL.
>>
>>
>> In the out_of_order selftest the chain was [1, 2, 3], 2 was signaled and
>> the test was expecting no fence to be returned by find_seqno(2).
>>
>> But we still have to wait on 1 to complete before find_seqno(2) can
>> return NULL (as in you don't have to wait on anything).
> * scratches head
>
> I thought it was meant to be expecting fc.chain[1] to still be present
> as the chain at that point was not yet signaled.


You're right that the point is not yet signaled.

But it doesn't need to stay in the chain if you can wait on a previous 
point.


chain[1] gets removed as we walk the chain backward in dma_fence_chain_walk.


-Lionel


>
> Oh well, a mistake compounded. :|
> -Chris
Daniel Vetter June 25, 2020, 1:59 p.m. UTC | #6
On Thu, Jun 25, 2020 at 3:23 PM Lionel Landwerlin
<lionel.g.landwerlin@intel.com> wrote:
>
> On 25/06/2020 16:18, Chris Wilson wrote:
> > Quoting Lionel Landwerlin (2020-06-25 13:34:43)
> >> There was probably a misunderstand on how the dma-fence-chain is
> >> supposed to work or what dma_fence_chain_find_seqno() is supposed to
> >> return.
> >>
> >> dma_fence_chain_find_seqno() is here to give us the fence to wait upon
> >> for a particular point in the timeline. The timeline progresses only
> >> when all the points prior to a given number have completed.
> > Hmm, the question was what point is it supposed to wait for.
> >
> > For the simple chain of [1, 3], does 1 being signaled imply that all
> > points up to 3 are signaled, or does 3 not being signaled imply that all
> > points after 1 are not. If that's mentioned already somewhere, my bad.
> > If not, could you put the answer somewhere.
> > -Chris
>
> In [1, 3], if 1 is signaled, the timeline value is 1. And find_seqno(2)
> should return NULL.
>
>
> In the out_of_order selftest the chain was [1, 2, 3], 2 was signaled and
> the test was expecting no fence to be returned by find_seqno(2).
>
> But we still have to wait on 1 to complete before find_seqno(2) can
> return NULL (as in you don't have to wait on anything).
>
>
> Hope that answer the question.

I asked Christian to document why timeline works like this, but I
can't find it in the kerneldoc right now. If it's missing I think we
should fix that and add the explanation, iirc it was around gpu reset
creating too much havoc otherwise.
-Daniel

>
>
> -Lionel
>
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/intel-gfx
Christian König July 2, 2020, 12:57 p.m. UTC | #7
Am 25.06.20 um 15:59 schrieb Daniel Vetter:
> On Thu, Jun 25, 2020 at 3:23 PM Lionel Landwerlin
> <lionel.g.landwerlin@intel.com> wrote:
>> On 25/06/2020 16:18, Chris Wilson wrote:
>>> Quoting Lionel Landwerlin (2020-06-25 13:34:43)
>>>> There was probably a misunderstand on how the dma-fence-chain is
>>>> supposed to work or what dma_fence_chain_find_seqno() is supposed to
>>>> return.
>>>>
>>>> dma_fence_chain_find_seqno() is here to give us the fence to wait upon
>>>> for a particular point in the timeline. The timeline progresses only
>>>> when all the points prior to a given number have completed.
>>> Hmm, the question was what point is it supposed to wait for.
>>>
>>> For the simple chain of [1, 3], does 1 being signaled imply that all
>>> points up to 3 are signaled, or does 3 not being signaled imply that all
>>> points after 1 are not. If that's mentioned already somewhere, my bad.
>>> If not, could you put the answer somewhere.
>>> -Chris
>> In [1, 3], if 1 is signaled, the timeline value is 1. And find_seqno(2)
>> should return NULL.
>>
>>
>> In the out_of_order selftest the chain was [1, 2, 3], 2 was signaled and
>> the test was expecting no fence to be returned by find_seqno(2).
>>
>> But we still have to wait on 1 to complete before find_seqno(2) can
>> return NULL (as in you don't have to wait on anything).
>>
>>
>> Hope that answer the question.
> I asked Christian to document why timeline works like this, but I
> can't find it in the kerneldoc right now. If it's missing I think we
> should fix that and add the explanation, iirc it was around gpu reset
> creating too much havoc otherwise.

I do remember that I wrote a patch to improve the kerneldoc for timeline 
semaphores, but then somebody else came along with an even better 
description.

Unfortunately it looks like neither was ever merged.

Need to dig through my mails,
Christian.

> -Daniel
>
>>
>> -Lionel
>>
>> _______________________________________________
>> Intel-gfx mailing list
>> Intel-gfx@lists.freedesktop.org
>> https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Flists.freedesktop.org%2Fmailman%2Flistinfo%2Fintel-gfx&amp;data=02%7C01%7Cchristian.koenig%40amd.com%7Cfd87640cd9bd422971bf08d8191004d2%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637286903879074805&amp;sdata=M3WGWbuyQKZeGC0J3wEKtgQ1oKYo6GOAMvKU2mU3r%2FM%3D&amp;reserved=0
>
>
diff mbox series

Patch

diff --git a/drivers/dma-buf/st-dma-fence-chain.c b/drivers/dma-buf/st-dma-fence-chain.c
index 5d45ba7ba3cd..9525f7f56119 100644
--- a/drivers/dma-buf/st-dma-fence-chain.c
+++ b/drivers/dma-buf/st-dma-fence-chain.c
@@ -318,15 +318,16 @@  static int find_out_of_order(void *arg)
 		goto err;
 	}
 
-	if (fence && fence != fc.chains[1]) {
+	/*
+	 * We signaled the middle fence (2) of the 1-2-3 chain. The behavior
+	 * of the dma-fence-chain is to make us wait for all the fences up to
+	 * the point we want. Since fence 1 is still not signaled, this what
+	 * we should get as fence to wait upon (fence 2 being garbage
+	 * collected during the traversal of the chain).
+	 */
+	if (fence != fc.chains[0]) {
 		pr_err("Incorrect chain-fence.seqno:%lld reported for completed seqno:2\n",
-		       fence->seqno);
-
-		dma_fence_get(fence);
-		err = dma_fence_chain_find_seqno(&fence, 2);
-		dma_fence_put(fence);
-		if (err)
-			pr_err("Reported %d for finding self!\n", err);
+		       fence ? fence->seqno : 0);
 
 		err = -EINVAL;
 	}
@@ -415,20 +416,18 @@  static int __find_race(void *arg)
 		if (!fence)
 			goto signal;
 
-		err = dma_fence_chain_find_seqno(&fence, seqno);
-		if (err) {
-			pr_err("Reported an invalid fence for find-self:%d\n",
-			       seqno);
-			dma_fence_put(fence);
-			break;
-		}
-
-		if (fence->seqno < seqno) {
-			pr_err("Reported an earlier fence.seqno:%lld for seqno:%d\n",
-			       fence->seqno, seqno);
-			err = -EINVAL;
-			dma_fence_put(fence);
-			break;
+		/*
+		 * We can only find ourselves if we are on fence we were
+		 * looking for.
+		 */
+		if (fence->seqno == seqno) {
+			err = dma_fence_chain_find_seqno(&fence, seqno);
+			if (err) {
+				pr_err("Reported an invalid fence for find-self:%d\n",
+				       seqno);
+				dma_fence_put(fence);
+				break;
+			}
 		}
 
 		dma_fence_put(fence);