[v5,01/10] drm/v3d: Fix the MMU flush order

Message ID	20240829130954.2439316-2-mcanal@igalia.com (mailing list archive)
State	New, archived
Headers	show Return-Path: <dri-devel-bounces@lists.freedesktop.org> From: =?utf-8?q?Ma=C3=ADra_Canal?= <mcanal@igalia.com> To: Melissa Wen <mwen@igalia.com>, Iago Toral <itoral@igalia.com>, Tvrtko Ursulin <tursulin@igalia.com>, Maarten Lankhorst <maarten.lankhorst@linux.intel.com>, Maxime Ripard <mripard@kernel.org>, Thomas Zimmermann <tzimmermann@suse.de>, David Airlie <airlied@gmail.com>, Daniel Vetter <daniel@ffwll.ch> Cc: dri-devel@lists.freedesktop.org, kernel-dev@igalia.com, =?utf-8?q?Ma?= =?utf-8?q?=C3=ADra_Canal?= <mcanal@igalia.com> Subject: [PATCH v5 01/10] drm/v3d: Fix the MMU flush order Date: Thu, 29 Aug 2024 10:05:09 -0300 Message-ID: <20240829130954.2439316-2-mcanal@igalia.com> In-Reply-To: <20240829130954.2439316-1-mcanal@igalia.com> References: <20240829130954.2439316-1-mcanal@igalia.com> MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Precedence: list Errors-To: dri-devel-bounces@lists.freedesktop.org Sender: "dri-devel" <dri-devel-bounces@lists.freedesktop.org>
Series	drm/v3d: Enable Big and Super Pages \| expand [v5,00/10] drm/v3d: Enable Big and Super Pages [v5,01/10] drm/v3d: Fix the MMU flush order [v5,02/10] drm/v3d: Flush the MMU before we supply more memory to the binner [v5,03/10] drm/v3d: Fix return if scheduler initialization fails [v5,04/10] drm/gem: Create a drm_gem_object_init_with_mnt() function [v5,05/10] drm/v3d: Introduce gemfs [v5,06/10] drm/gem: Create shmem GEM object in a given mountpoint [v5,07/10] drm/v3d: Reduce the alignment of the node allocation [v5,08/10] drm/v3d: Support Big/Super Pages when writing out PTEs [v5,09/10] drm/v3d: Use gemfs/THP in BO creation if available [v5,10/10] drm/v3d: Add modparam for turning off Big/Super Pages

Message ID

20240829130954.2439316-2-mcanal@igalia.com (mailing list archive)

State

New, archived

Headers

From: =?utf-8?q?Ma=C3=ADra_Canal?= <mcanal@igalia.com>
To: Melissa Wen <mwen@igalia.com>, Iago Toral <itoral@igalia.com>,
 Tvrtko Ursulin <tursulin@igalia.com>,
 Maarten Lankhorst <maarten.lankhorst@linux.intel.com>,
 Maxime Ripard <mripard@kernel.org>,
 Thomas Zimmermann <tzimmermann@suse.de>, David Airlie <airlied@gmail.com>,
 Daniel Vetter <daniel@ffwll.ch>
Cc: dri-devel@lists.freedesktop.org, kernel-dev@igalia.com, =?utf-8?q?Ma?=
	=?utf-8?q?=C3=ADra_Canal?= <mcanal@igalia.com>
Subject: [PATCH v5 01/10] drm/v3d: Fix the MMU flush order
Date: Thu, 29 Aug 2024 10:05:09 -0300
Message-ID: <20240829130954.2439316-2-mcanal@igalia.com>
In-Reply-To: <20240829130954.2439316-1-mcanal@igalia.com>
References: <20240829130954.2439316-1-mcanal@igalia.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Precedence: list
Errors-To: dri-devel-bounces@lists.freedesktop.org
Sender: "dri-devel" <dri-devel-bounces@lists.freedesktop.org>

Series

drm/v3d: Enable Big and Super Pages | expand

Commit Message

Maíra Canal Aug. 29, 2024, 1:05 p.m. UTC

We must first flush the MMU cache and then, flush the TLB, not the other
way around. Currently, we can see a race condition between the MMU cache
and the TLB when running multiple rendering processes at the same time.
This is evidenced by MMU errors triggered by the IRQ.

Fix the MMU flush order by flushing the MMU cache and then the TLB.

Fixes: 57692c94dcbe ("drm/v3d: Introduce a new DRM driver for Broadcom V3D V3.x+")
Signed-off-by: Maíra Canal <mcanal@igalia.com>
---
 drivers/gpu/drm/v3d/v3d_mmu.c | 29 ++++++++++-------------------
 1 file changed, 10 insertions(+), 19 deletions(-)

Comments

Iago Toral Sept. 4, 2024, 8:24 a.m. UTC | #1

Hi Maira,

El jue, 29-08-2024 a las 10:05 -0300, Maíra Canal escribió:
> We must first flush the MMU cache and then, flush the TLB, not the
> other
> way around. Currently, we can see a race condition between the MMU
> cache
> and the TLB when running multiple rendering processes at the same
> time.
> This is evidenced by MMU errors triggered by the IRQ.
> 
> Fix the MMU flush order by flushing the MMU cache and then the TLB.

the patch is making 2 changes, it is changing the ordering of the
flushes but also the fact that now we wait for the first flush to
commplete before starting the second while the previous version would
start both flushes and then wait for both to complete. The commit log
seems to suggest that the first change is the one that fixes the issue
but I wonder if that is really what is happening.

Also, have you tested keeping the original order of operations but with
interleaved waits like we do here? Either way, I think we probably
should emphasized more in the committ log the fact that we are now
waiting for each flush to complete before starting the other flush.


> 
> Fixes: 57692c94dcbe ("drm/v3d: Introduce a new DRM driver for
> Broadcom V3D V3.x+")
> Signed-off-by: Maíra Canal <mcanal@igalia.com>
> ---
>  drivers/gpu/drm/v3d/v3d_mmu.c | 29 ++++++++++-------------------
>  1 file changed, 10 insertions(+), 19 deletions(-)
> 
> diff --git a/drivers/gpu/drm/v3d/v3d_mmu.c
> b/drivers/gpu/drm/v3d/v3d_mmu.c
> index 14f3af40d6f6..06bb44c7f35d 100644
> --- a/drivers/gpu/drm/v3d/v3d_mmu.c
> +++ b/drivers/gpu/drm/v3d/v3d_mmu.c
> @@ -32,32 +32,23 @@ static int v3d_mmu_flush_all(struct v3d_dev *v3d)
>  {
>  	int ret;
>  
> -	/* Make sure that another flush isn't already running when
> we
> -	 * start this one.
> -	 */
> -	ret = wait_for(!(V3D_READ(V3D_MMU_CTL) &
> -			 V3D_MMU_CTL_TLB_CLEARING), 100);
> -	if (ret)
> -		dev_err(v3d->drm.dev, "TLB clear wait idle pre-wait
> failed\n");
> -

are we certain we can't have a flush in flux when a new one comes in?

> -	V3D_WRITE(V3D_MMU_CTL, V3D_READ(V3D_MMU_CTL) |
> -		  V3D_MMU_CTL_TLB_CLEAR);
> -
> -	V3D_WRITE(V3D_MMUC_CONTROL,
> -		  V3D_MMUC_CONTROL_FLUSH |
> +	V3D_WRITE(V3D_MMUC_CONTROL, V3D_MMUC_CONTROL_FLUSH |
>  		  V3D_MMUC_CONTROL_ENABLE);
>  
> -	ret = wait_for(!(V3D_READ(V3D_MMU_CTL) &
> -			 V3D_MMU_CTL_TLB_CLEARING), 100);
> +	ret = wait_for(!(V3D_READ(V3D_MMUC_CONTROL) &
> +			 V3D_MMUC_CONTROL_FLUSHING), 100);
>  	if (ret) {
> -		dev_err(v3d->drm.dev, "TLB clear wait idle
> failed\n");
> +		dev_err(v3d->drm.dev, "MMUC flush wait idle
> failed\n");
>  		return ret;
>  	}
>  
> -	ret = wait_for(!(V3D_READ(V3D_MMUC_CONTROL) &
> -			 V3D_MMUC_CONTROL_FLUSHING), 100);
> +	V3D_WRITE(V3D_MMU_CTL, V3D_READ(V3D_MMU_CTL) |
> +		  V3D_MMU_CTL_TLB_CLEAR);
> +
> +	ret = wait_for(!(V3D_READ(V3D_MMU_CTL) &
> +			 V3D_MMU_CTL_TLB_CLEARING), 100);
>  	if (ret)
> -		dev_err(v3d->drm.dev, "MMUC flush wait idle
> failed\n");
> +		dev_err(v3d->drm.dev, "TLB clear wait idle
> failed\n");

I'd maybe use "MMU TLB clear wait idle failed", so we can more easily
identify this message comes from the MMU implementation.

Iago

>  
>  	return ret;
>  }

Maíra Canal Sept. 18, 2024, 11:54 a.m. UTC | #2

Hi Iago,

On 9/4/24 05:24, Iago Toral wrote:
> Hi Maira,
> 
> El jue, 29-08-2024 a las 10:05 -0300, Maíra Canal escribió:
>> We must first flush the MMU cache and then, flush the TLB, not the
>> other
>> way around. Currently, we can see a race condition between the MMU
>> cache
>> and the TLB when running multiple rendering processes at the same
>> time.
>> This is evidenced by MMU errors triggered by the IRQ.
>>
>> Fix the MMU flush order by flushing the MMU cache and then the TLB.
> 
> the patch is making 2 changes, it is changing the ordering of the
> flushes but also the fact that now we wait for the first flush to
> commplete before starting the second while the previous version would
> start both flushes and then wait for both to complete. The commit log
> seems to suggest that the first change is the one that fixes the issue
> but I wonder if that is really what is happening.
> 
> Also, have you tested keeping the original order of operations but with
> interleaved waits like we do here? Either way, I think we probably

I just tested keeping the order of the operations but using interleaved
waits and I end up having MMU errors that crash the GPU. Changing the
order of the operations was a deliberated choice due the GPU design and
and waiting for each of those to finish is needed to avoid race
conditions.

> should emphasized more in the committ log the fact that we are now
> waiting for each flush to complete before starting the other flush.
> 

But you are absolutely right. I need to mention it on the commit log.
I'll add this information in the next version.

> 
>>
>> Fixes: 57692c94dcbe ("drm/v3d: Introduce a new DRM driver for
>> Broadcom V3D V3.x+")
>> Signed-off-by: Maíra Canal <mcanal@igalia.com>
>> ---
>>   drivers/gpu/drm/v3d/v3d_mmu.c | 29 ++++++++++-------------------
>>   1 file changed, 10 insertions(+), 19 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/v3d/v3d_mmu.c
>> b/drivers/gpu/drm/v3d/v3d_mmu.c
>> index 14f3af40d6f6..06bb44c7f35d 100644
>> --- a/drivers/gpu/drm/v3d/v3d_mmu.c
>> +++ b/drivers/gpu/drm/v3d/v3d_mmu.c
>> @@ -32,32 +32,23 @@ static int v3d_mmu_flush_all(struct v3d_dev *v3d)
>>   {
>>   	int ret;
>>   
>> -	/* Make sure that another flush isn't already running when
>> we
>> -	 * start this one.
>> -	 */
>> -	ret = wait_for(!(V3D_READ(V3D_MMU_CTL) &
>> -			 V3D_MMU_CTL_TLB_CLEARING), 100);
>> -	if (ret)
>> -		dev_err(v3d->drm.dev, "TLB clear wait idle pre-wait
>> failed\n");
>> -
> 
> are we certain we can't have a flush in flux when a new one comes in?

We can, but it doesn't really matter, as by the end of both requests,
the TLB will be flushed.

Best Regards,
- Maíra

> 
>> -	V3D_WRITE(V3D_MMU_CTL, V3D_READ(V3D_MMU_CTL) |
>> -		  V3D_MMU_CTL_TLB_CLEAR);
>> -
>> -	V3D_WRITE(V3D_MMUC_CONTROL,
>> -		  V3D_MMUC_CONTROL_FLUSH |
>> +	V3D_WRITE(V3D_MMUC_CONTROL, V3D_MMUC_CONTROL_FLUSH |
>>   		  V3D_MMUC_CONTROL_ENABLE);
>>   
>> -	ret = wait_for(!(V3D_READ(V3D_MMU_CTL) &
>> -			 V3D_MMU_CTL_TLB_CLEARING), 100);
>> +	ret = wait_for(!(V3D_READ(V3D_MMUC_CONTROL) &
>> +			 V3D_MMUC_CONTROL_FLUSHING), 100);
>>   	if (ret) {
>> -		dev_err(v3d->drm.dev, "TLB clear wait idle
>> failed\n");
>> +		dev_err(v3d->drm.dev, "MMUC flush wait idle
>> failed\n");
>>   		return ret;
>>   	}
>>   
>> -	ret = wait_for(!(V3D_READ(V3D_MMUC_CONTROL) &
>> -			 V3D_MMUC_CONTROL_FLUSHING), 100);
>> +	V3D_WRITE(V3D_MMU_CTL, V3D_READ(V3D_MMU_CTL) |
>> +		  V3D_MMU_CTL_TLB_CLEAR);
>> +
>> +	ret = wait_for(!(V3D_READ(V3D_MMU_CTL) &
>> +			 V3D_MMU_CTL_TLB_CLEARING), 100);
>>   	if (ret)
>> -		dev_err(v3d->drm.dev, "MMUC flush wait idle
>> failed\n");
>> +		dev_err(v3d->drm.dev, "TLB clear wait idle
>> failed\n");
> 
> I'd maybe use "MMU TLB clear wait idle failed", so we can more easily
> identify this message comes from the MMU implementation.
> 
> Iago
> 
>>   
>>   	return ret;
>>   }
>

diff --git a/drivers/gpu/drm/v3d/v3d_mmu.c b/drivers/gpu/drm/v3d/v3d_mmu.c
index 14f3af40d6f6..06bb44c7f35d 100644
--- a/drivers/gpu/drm/v3d/v3d_mmu.c
+++ b/drivers/gpu/drm/v3d/v3d_mmu.c
@@ -32,32 +32,23 @@  static int v3d_mmu_flush_all(struct v3d_dev *v3d)
 {
 	int ret;
 
-	/* Make sure that another flush isn't already running when we
-	 * start this one.
-	 */
-	ret = wait_for(!(V3D_READ(V3D_MMU_CTL) &
-			 V3D_MMU_CTL_TLB_CLEARING), 100);
-	if (ret)
-		dev_err(v3d->drm.dev, "TLB clear wait idle pre-wait failed\n");
-
-	V3D_WRITE(V3D_MMU_CTL, V3D_READ(V3D_MMU_CTL) |
-		  V3D_MMU_CTL_TLB_CLEAR);
-
-	V3D_WRITE(V3D_MMUC_CONTROL,
-		  V3D_MMUC_CONTROL_FLUSH |
+	V3D_WRITE(V3D_MMUC_CONTROL, V3D_MMUC_CONTROL_FLUSH |
 		  V3D_MMUC_CONTROL_ENABLE);
 
-	ret = wait_for(!(V3D_READ(V3D_MMU_CTL) &
-			 V3D_MMU_CTL_TLB_CLEARING), 100);
+	ret = wait_for(!(V3D_READ(V3D_MMUC_CONTROL) &
+			 V3D_MMUC_CONTROL_FLUSHING), 100);
 	if (ret) {
-		dev_err(v3d->drm.dev, "TLB clear wait idle failed\n");
+		dev_err(v3d->drm.dev, "MMUC flush wait idle failed\n");
 		return ret;
 	}
 
-	ret = wait_for(!(V3D_READ(V3D_MMUC_CONTROL) &
-			 V3D_MMUC_CONTROL_FLUSHING), 100);
+	V3D_WRITE(V3D_MMU_CTL, V3D_READ(V3D_MMU_CTL) |
+		  V3D_MMU_CTL_TLB_CLEAR);
+
+	ret = wait_for(!(V3D_READ(V3D_MMU_CTL) &
+			 V3D_MMU_CTL_TLB_CLEARING), 100);
 	if (ret)
-		dev_err(v3d->drm.dev, "MMUC flush wait idle failed\n");
+		dev_err(v3d->drm.dev, "TLB clear wait idle failed\n");
 
 	return ret;
 }

[v5,01/10] drm/v3d: Fix the MMU flush order

Commit Message

Comments

Patch