[02/10] drm/etnaviv: mmuv2: don't map zero page

Message ID	20181219144546.28224-3-l.stach@pengutronix.de
State	New, archived
Headers
From: Lucas Stach <l.stach@pengutronix.de>
To: etnaviv@lists.freedesktop.org
Cc: patchwork-lst@pengutronix.de, kernel@pengutronix.de, dri-devel@lists.freedesktop.org, Russell King <linux+etnaviv@armlinux.org.uk>
Subject: [PATCH 02/10] drm/etnaviv: mmuv2: don't map zero page
Date: Wed, 19 Dec 2018 15:45:38 +0100
Message-Id: <20181219144546.28224-3-l.stach@pengutronix.de>
In-Reply-To: <20181219144546.28224-1-l.stach@pengutronix.de>
References: <20181219144546.28224-1-l.stach@pengutronix.de>
Series	per-process address spaces for MMUv2
[00/10] per-process address spaces for MMUv2
[01/10] drm/etnaviv: move job context pointer to etnaviv_gem_submit
[02/10] drm/etnaviv: mmuv2: don't map zero page
[03/10] drm/etnaviv: split out cmdbuf mapping into address space
[04/10] drm/etnaviv: share a single cmdbuf suballoc region across all GPUs
[05/10] drm/etnaviv: replace MMU flush marker with flush sequence
[06/10] drm/etnaviv: rework MMU handling
[07/10] drm/etnaviv: split out starting of FE idle loop
[08/10] drm/etnaviv: provide MMU context to etnaviv_gem_mapping_get
[09/10] drm/etnaviv: implement per-process address spaces on MMUv2
[10/10] drm/etnaviv: dump only failing submit

Lucas Stach Dec. 19, 2018, 2:45 p.m. UTC

Keep the page at address 0 as faulting to catch any potential state
setup issues early.

Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
---
 drivers/gpu/drm/etnaviv/etnaviv_iommu_v2.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)
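
A minimal sketch of the range arithmetic this patch changes. It assumes a hypothetical struct va_domain that only mirrors the base/size pair set in etnaviv_iommuv2_domain_alloc(); it is not the driver's code. Presumably the GPU VA allocator is initialized from this range, so with base = SZ_4K and size = 4 GiB - SZ_4K the domain still ends at the 4 GiB boundary, but page 0 (0x0-0xfff) is never handed out and an access such as 0xe40 hits an unmapped page.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Local stand-ins for the kernel's SZ_* constants (same values as <linux/sizes.h>). */
#define SZ_4K 0x00001000ULL
#define SZ_1G 0x40000000ULL

/* Hypothetical model of the base/size pair from etnaviv_iommuv2_domain_alloc(). */
struct va_domain {
	uint64_t base;
	uint64_t size;
};

/* True if the GPU virtual address lies inside the allocatable range. */
static bool va_domain_contains(const struct va_domain *d, uint64_t addr)
{
	return addr >= d->base && addr - d->base < d->size;
}

/* Exclusive end of the range; it must not exceed the 32-bit GPU VA space. */
static uint64_t va_domain_end(const struct va_domain *d)
{
	uint64_t end = d->base + d->size;

	assert(end <= SZ_1G * 4);
	return end;
}

/* The two configurations from the diff: before and after the patch. */
static const struct va_domain before_patch = { .base = 0, .size = SZ_1G * 4 };
static const struct va_domain after_patch = { .base = SZ_4K, .size = SZ_1G * 4 - SZ_4K };
```

For example, va_domain_contains(&before_patch, 0xe40) is true while va_domain_contains(&after_patch, 0xe40) is false, and both ranges end at 0x100000000, so only the first page is given up.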

Guido Günther Dec. 30, 2018, 3:49 p.m. UTC | #1

Hi Lucas,
On Wed, Dec 19, 2018 at 03:45:38PM +0100, Lucas Stach wrote:
> Keep the page at address 0 as faulting to catch any potential state
> setup issues early.

This is a nice idea! But applying this and making mesa hit that page
leads to the process hanging in D state over here on GC7000:

# [  242.726192] INFO: task kworker/u8:2:37 blocked for more than 120 seconds.
[  242.733010]       Not tainted 4.18.0-00129-gce2b21074b41 #504
[  242.738795] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  242.746638] kworker/u8:2    D    0    37      2 0x00000028
[  242.752144] Workqueue: events_unbound commit_work
[  242.756860] Call trace:
[  242.759318]  __switch_to+0x94/0xd0
[  242.762741]  __schedule+0x1c0/0x6b8
[  242.766239]  schedule+0x40/0xa8
[  242.769380]  schedule_timeout+0x2f0/0x428
[  242.773410]  dma_fence_default_wait+0x1cc/0x2b8
[  242.777951]  dma_fence_wait_timeout+0x44/0x1b0
[  242.782403]  drm_atomic_helper_wait_for_fences+0x48/0x108
[  242.787819]  commit_tail+0x30/0x80
[  242.791229]  commit_work+0x20/0x30
[  242.794642]  process_one_work+0x1ec/0x458
[  242.798659]  worker_thread+0x48/0x430
[  242.802331]  kthread+0x130/0x138
[  242.805557]  ret_from_fork+0x10/0x1c

This is in dmesg showing that we hit the first page:

    [   65.907388] etnaviv-gpu 38000000.gpu: MMU fault status 0x00000002
    [   65.913497] etnaviv-gpu 38000000.gpu: MMU 0 fault addr 0x00000e40

Without that patch it's sampling random data from that page but does not hang.

Cheers,
 -- Guido

> 
> Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
> ---
>  drivers/gpu/drm/etnaviv/etnaviv_iommu_v2.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/gpu/drm/etnaviv/etnaviv_iommu_v2.c b/drivers/gpu/drm/etnaviv/etnaviv_iommu_v2.c
> index f1c88d8ad5ba..f794e04be9e6 100644
> --- a/drivers/gpu/drm/etnaviv/etnaviv_iommu_v2.c
> +++ b/drivers/gpu/drm/etnaviv/etnaviv_iommu_v2.c
> @@ -320,8 +320,8 @@ etnaviv_iommuv2_domain_alloc(struct etnaviv_gpu *gpu)
>  	domain = &etnaviv_domain->base;
>  
>  	domain->dev = gpu->dev;
> -	domain->base = 0;
> -	domain->size = (u64)SZ_1G * 4;
> +	domain->base = SZ_4K;
> +	domain->size = (u64)SZ_1G * 4 - SZ_4K;
>  	domain->ops = &etnaviv_iommuv2_ops;
>  
>  	ret = etnaviv_iommuv2_init(etnaviv_domain);
> -- 
> 2.19.1
> 
> _______________________________________________
> etnaviv mailing list
> etnaviv@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/etnaviv

Lucas Stach Jan. 7, 2019, 8:50 a.m. UTC | #2

Hi Guido,

On Sunday, 30.12.2018, 16:49 +0100, Guido Günther wrote:
> Hi Lucas,
> On Wed, Dec 19, 2018 at 03:45:38PM +0100, Lucas Stach wrote:
> > Keep the page at address 0 as faulting to catch any potential state
> > setup issues early.
> 
> This is a nice idea! But applying this and making mesa hit that page
> leads to the process hanging in D state over here on GC7000:
> 
> [..snip..]
> 
> This is in dmesg showing that we hit the first page:
> 
>     [   65.907388] etnaviv-gpu 38000000.gpu: MMU fault status 0x00000002
>     [   65.913497] etnaviv-gpu 38000000.gpu: MMU 0 fault addr 0x00000e40
> 
> Without that patch it's sampling random data from that page but does not hang.

GPU hangs after an MMU fault are expected; more accurately, we
actively request the GPU to stop by setting the exception bit in the
page table.
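
To make the exception bit concrete, here is a toy model of a two-level MMUv2-style walk. It assumes 4 KiB pages, a 1024-entry master table indexed by VA bits 31:22, and 1024-entry slave tables indexed by bits 21:12. The names PTE_PRESENT/PTE_EXCEPTION, their bit positions, and struct toy_mmu are illustrative assumptions, not taken from this thread. Every slave entry starts out as an exception entry, so page 0, which the patched domain never allocates, keeps faulting, as in the "fault addr 0x00000e40" report.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Assumed PTE flag layout, for this model only. */
#define PTE_PRESENT   (1u << 0)
#define PTE_EXCEPTION (1u << 1)
#define PAGE_SHIFT    12
#define ENTRIES       1024u

struct toy_mmu {
	uint32_t *stlb[ENTRIES]; /* slave tables, allocated on demand */
};

/* A fresh slave table: every entry raises an exception when touched. */
static uint32_t *stlb_alloc(void)
{
	uint32_t *t = malloc(ENTRIES * sizeof(*t));

	assert(t);
	for (unsigned int i = 0; i < ENTRIES; i++)
		t[i] = PTE_EXCEPTION;
	return t;
}

/* Map one 4 KiB page: physical page address plus the present bit. */
static void toy_map(struct toy_mmu *mmu, uint32_t va, uint32_t pa)
{
	uint32_t mtlb = va >> 22;
	uint32_t stlb = (va >> PAGE_SHIFT) & (ENTRIES - 1);

	if (!mmu->stlb[mtlb])
		mmu->stlb[mtlb] = stlb_alloc();
	mmu->stlb[mtlb][stlb] = (pa & ~0xfffu) | PTE_PRESENT;
}

/* Translate va; returns false (a fault) if the entry is not present. */
static bool toy_translate(const struct toy_mmu *mmu, uint32_t va, uint32_t *pa)
{
	const uint32_t *t = mmu->stlb[va >> 22];
	uint32_t pte;

	if (!t)
		return false; /* no slave table: treated as a fault in this model */
	pte = t[(va >> PAGE_SHIFT) & (ENTRIES - 1)];
	if (!(pte & PTE_PRESENT))
		return false; /* exception entry: the GPU would stop here */
	*pa = (pte & ~0xfffu) | (va & 0xfffu);
	return true;
}
```

Mapping only VA 0x1000 and then translating 0xe40 fails (page 0 still holds an exception entry), while 0x1234 resolves inside the mapped page.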

A hanging GPU should trigger the scheduler timeout handler, which then
makes sure to get the GPU back into a working state. So if things don't
progress after the fault for you, either the timeout handler is buggy on
GC7000 or the fence signaling is broken somehow. I'll take a look at
this.
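
The recovery path described above can be modeled as a small state machine. A job's hardware fence stays unsignaled while the GPU is stopped on the fault. A watchdog notices that the job has exceeded its timeout, resets the GPU and signals the fence with an error. In this model, that signal is what lets a waiter such as drm_atomic_helper_wait_for_fences() in the trace above make progress. This is a single-threaded sketch: struct toy_fence, watchdog_tick() and the choice of -ETIMEDOUT as the error are hypothetical, not the DRM scheduler's API. If the handler never signals the fence (signal_on_recover = false), the waiter never unblocks, which matches the D-state symptom reported above.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

struct toy_fence {
	bool signaled;
	int error;              /* 0 on success, negative errno on failure */
};

struct toy_job {
	struct toy_fence fence;
	uint64_t submitted_ms;  /* when the job was pushed to the GPU */
};

struct toy_gpu {
	bool hung;              /* stopped on an MMU exception */
	unsigned int resets;
	bool signal_on_recover; /* false models a buggy timeout handler */
};

static void fence_signal(struct toy_fence *f, int error)
{
	f->error = error;
	f->signaled = true;
}

/* Periodic check: recover the GPU if the job overran its timeout. */
static void watchdog_tick(struct toy_gpu *gpu, struct toy_job *job,
			  uint64_t now_ms, uint64_t timeout_ms)
{
	assert(timeout_ms > 0);
	if (job->fence.signaled || now_ms - job->submitted_ms < timeout_ms)
		return;

	gpu->hung = false;      /* stand-in for a full GPU reset */
	gpu->resets++;
	if (gpu->signal_on_recover)
		fence_signal(&job->fence, -ETIMEDOUT);
}

/* Non-blocking stand-in for a fence wait: true once waiters may proceed. */
static bool fence_wait_done(const struct toy_fence *f)
{
	return f->signaled;
}
```

With a timeout of 500 ms, a tick at 100 ms leaves the fence pending, and a tick at 600 ms resets the GPU and releases the waiter. The buggy variant resets the GPU but leaves the waiter blocked forever.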

Regards,
Lucas


Guido Günther Jan. 7, 2019, 9:13 a.m. UTC | #3

Hi,
On Mon, Jan 07, 2019 at 09:50:52AM +0100, Lucas Stach wrote:
> Hi Guido,
> 
> On Sunday, 30.12.2018, 16:49 +0100, Guido Günther wrote:
> > Hi Lucas,
> > On Wed, Dec 19, 2018 at 03:45:38PM +0100, Lucas Stach wrote:
> > > Keep the page at address 0 as faulting to catch any potential state
> > > setup issues early.
> > 
> > This is a nice idea! But applying this and making mesa hit that page
> > leads to the process hanging in D state over here on GC7000:
> > 
> > [..snip..]
> > 
> > This is in dmesg showing that we hit the first page:
> > 
> >     [   65.907388] etnaviv-gpu 38000000.gpu: MMU fault status 0x00000002
> >     [   65.913497] etnaviv-gpu 38000000.gpu: MMU 0 fault addr 0x00000e40
> > 
> > Without that patch it's sampling random data from that page but does not hang.
> 
> GPU hangs after an MMU fault are expected; more accurately, we
> actively request the GPU to stop by setting the exception bit in the
> page table.

Yeah. I put that in to show that this is the cause of the trouble above.

> 
> A hanging GPU should trigger the scheduler timeout handler, which then
> makes sure to get the GPU back into a working state. So if things don't
> progress after the fault for you, either the timeout handler is buggy on
> GC7000 or the fence signaling is broken somehow. I'll take a look at
> this.

This isn't an up-to-date linux-next-based tree yet, so if you're not seeing
this, let me forward-port our stuff to that and report back.

Cheers,
 -- Guido

Lucas Stach Jan. 7, 2019, 3:02 p.m. UTC | #4

On Monday, 07.01.2019, 10:13 +0100, Guido Günther wrote:
> Hi,
> On Mon, Jan 07, 2019 at 09:50:52AM +0100, Lucas Stach wrote:
> > Hi Guido,
> > 
> > On Sunday, 30.12.2018, 16:49 +0100, Guido Günther wrote:
> > > Hi Lucas,
> > > On Wed, Dec 19, 2018 at 03:45:38PM +0100, Lucas Stach wrote:
> > > > Keep the page at address 0 as faulting to catch any potential state
> > > > setup issues early.
> > > 
> > > This is a nice idea! But applying this and making mesa hit that page
> > > leads to the process hanging in D state over here on GC7000:
> > > 
> > > [..snip..]
> > > 
> > > This is in dmesg showing that we hit the first page:
> > > 
> > >     [   65.907388] etnaviv-gpu 38000000.gpu: MMU fault status 0x00000002
> > >     [   65.913497] etnaviv-gpu 38000000.gpu: MMU 0 fault addr 0x00000e40
> > > 
> > > Without that patch it's sampling random data from that page but does not hang.
> > 
> > GPU hangs after an MMU fault are expected; more accurately, we
> > actively request the GPU to stop by setting the exception bit in the
> > page table.
> 
> Yeah. I put that in to show that this is the cause of the trouble above.
> 
> > 
> > A hanging GPU should trigger the scheduler timeout handler, which then
> > makes sure to get the GPU back into a working state. So if things don't
> > progress after the fault for you, either the timeout handler is buggy on
> > GC7000 or the fence signaling is broken somehow. I'll take a look at
> > this.
> 
> This isn't an up-to-date linux-next-based tree yet, so if you're not seeing
> this, let me forward-port our stuff to that and report back.

I've certainly seen the timeout handler working on GC7000, but with the
GC7000 support being relatively lightly tested right now, I wouldn't
bet on us handling all corner cases correctly.

If this is an issue on a recent kernel, I would certainly love to learn
what's going wrong.

Regards,
Lucas

Christian Gmeiner Feb. 1, 2019, 7:57 a.m. UTC | #5

On Wed, 19 Dec 2018 at 15:45, Lucas Stach <l.stach@pengutronix.de> wrote:
>
> Keep the page at address 0 as faulting to catch any potential state
> setup issues early.
>
> Signed-off-by: Lucas Stach <l.stach@pengutronix.de>

I like this idea, but I am unsure about Guido's GC7000 problem.

Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>

> ---
>  drivers/gpu/drm/etnaviv/etnaviv_iommu_v2.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/gpu/drm/etnaviv/etnaviv_iommu_v2.c b/drivers/gpu/drm/etnaviv/etnaviv_iommu_v2.c
> index f1c88d8ad5ba..f794e04be9e6 100644
> --- a/drivers/gpu/drm/etnaviv/etnaviv_iommu_v2.c
> +++ b/drivers/gpu/drm/etnaviv/etnaviv_iommu_v2.c
> @@ -320,8 +320,8 @@ etnaviv_iommuv2_domain_alloc(struct etnaviv_gpu *gpu)
>         domain = &etnaviv_domain->base;
>
>         domain->dev = gpu->dev;
> -       domain->base = 0;
> -       domain->size = (u64)SZ_1G * 4;
> +       domain->base = SZ_4K;
> +       domain->size = (u64)SZ_1G * 4 - SZ_4K;
>         domain->ops = &etnaviv_iommuv2_ops;
>
>         ret = etnaviv_iommuv2_init(etnaviv_domain);
> --
> 2.19.1
>

Guido Günther April 2, 2019, 11:38 a.m. UTC | #6

Hi,
On Mon, Jan 07, 2019 at 04:02:33PM +0100, Lucas Stach wrote:
[..snip..]
> I've certainly seen the timeout handler working on GC7000, but with the
> GC7000 support being relatively lightly tested right now, I wouldn't
> bet on us handling all corner cases correctly.
> 
> If this is an issue on a recent kernel, I would certainly love to learn
> what's going wrong.

I've brought my drm tree more in line with 5.x, and it no longer seems
to hang.
Cheers,
 -- Guido


Guido Günther April 2, 2019, 11:39 a.m. UTC | #7

Hi,
On Wed, Dec 19, 2018 at 03:45:38PM +0100, Lucas Stach wrote:
> Keep the page at address 0 as faulting to catch any potential state
> setup issues early.
> 
> Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
> ---
>  drivers/gpu/drm/etnaviv/etnaviv_iommu_v2.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/gpu/drm/etnaviv/etnaviv_iommu_v2.c b/drivers/gpu/drm/etnaviv/etnaviv_iommu_v2.c
> index f1c88d8ad5ba..f794e04be9e6 100644
> --- a/drivers/gpu/drm/etnaviv/etnaviv_iommu_v2.c
> +++ b/drivers/gpu/drm/etnaviv/etnaviv_iommu_v2.c
> @@ -320,8 +320,8 @@ etnaviv_iommuv2_domain_alloc(struct etnaviv_gpu *gpu)
>  	domain = &etnaviv_domain->base;
>  
>  	domain->dev = gpu->dev;
> -	domain->base = 0;
> -	domain->size = (u64)SZ_1G * 4;
> +	domain->base = SZ_4K;
> +	domain->size = (u64)SZ_1G * 4 - SZ_4K;
>  	domain->ops = &etnaviv_iommuv2_ops;
>  
>  	ret = etnaviv_iommuv2_init(etnaviv_domain);
> -- 

Reviewed-by: Guido Günther <agx@sigxcpu.org>

Cheers and sorry for the extreme delay,
 -- Guido
