drm/i915: Setup all page directories for gen8

Message ID 1425395009-9899-1-git-send-email-mika.kuoppala@intel.com (mailing list archive)
State New, archived

Commit Message

Mika Kuoppala March 3, 2015, 3:03 p.m. UTC
If the mappable size is less than what the full range
of pdps can address, we end up setting pdps for only the
mappable area.

The logical context however needs valid pdp entries.
Prior to commit 06fda602dbca ("drm/i915: Create page table allocators")
we had simply been writing pdp entries with a dma address of zero
instead of valid pdps. This is supposedly bad even if those pdps are
not addressed.

As commit 06fda602dbca ("drm/i915: Create page table allocators")
introduced a more dynamic structure for pdps, we ended up oopsing
when we populated the lrc context. Analyzing this oops revealed
that we have not been writing valid pdps on bsw, as it does the
ppgtt init with a 2GiB limit.

The right thing to do would be to point the pdps/pdes/ptes of the
non-addressable part at the scratch page through a minimal
structure: a pdp whose pde entries all point to the same page,
whose pte entries in turn all point to the scratch page.

But instead of going through that trouble, set up all the pdps
through individual pd pages and pt entries, even for the
non-addressable parts. This way we populate the lrc with valid
pdps, and it gives the dynamic page allocation work a base on
which to introduce code that truncates the page table structure.

The regression of oopsing in init was introduced by
commit 06fda602dbca ("drm/i915: Create page table allocators")

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=89350
Tested-by: Valtteri Rantala <valtteri.rantala@intel.com>
Cc: Michel Thierry <michel.thierry@intel.com>
Cc: Ben Widawsky <benjamin.widawsky@intel.com>
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
---
 drivers/gpu/drm/i915/i915_gem_gtt.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Comments

Ville Syrjälä March 3, 2015, 4:32 p.m. UTC | #1
On Tue, Mar 03, 2015 at 05:03:29PM +0200, Mika Kuoppala wrote:
> If the mappable size is less than what the full range
> of pdps can address, we end up setting pdps for only the
> mappable area.

mappable is not a factor here. The global gtt is 2GiB and we just used
the same size for the ppgtt, which made sense for aliasing ppgtt I
suppose.

> 
> The logical context however needs valid pdp entries.
> Prior to commit 06fda602dbca ("drm/i915: Create page table allocators")
> we had simply been writing pdp entries with a dma address of zero
> instead of valid pdps. This is supposedly bad even if those pdps are
> not addressed.
> 
> As commit 06fda602dbca ("drm/i915: Create page table allocators")
> introduced a more dynamic structure for pdps, we ended up oopsing
> when we populated the lrc context. Analyzing this oops revealed
> that we have not been writing valid pdps on bsw, as it does the
> ppgtt init with a 2GiB limit.
> 
> The right thing to do would be to point the pdps/pdes/ptes of the
> non-addressable part at the scratch page through a minimal
> structure: a pdp whose pde entries all point to the same page,
> whose pte entries in turn all point to the scratch page.
> 
> But instead of going through that trouble, set up all the pdps
> through individual pd pages and pt entries, even for the
> non-addressable parts. This way we populate the lrc with valid
> pdps, and it gives the dynamic page allocation work a base on
> which to introduce code that truncates the page table structure.

This means using an extra 4+MiB of kernel memory per address space. But
I guess the dynamic page table stuff is coming along so it'll get sorted
out eventually.
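
As a rough check of the "4+MiB" figure, the arithmetic can be sketched in C. The constants (4 KiB pages, 512 entries per page, 4 legacy PDPs, 1 GiB covered per PDP) are assumptions taken from the patch context, and `extra_bytes_for_unused_pdps()` is a hypothetical helper, not a driver function.

```c
#include <stdint.h>

/* Assumed values, mirroring the patch context rather than the driver headers. */
#define TOY_PAGE_BYTES      4096ULL
#define TOY_LEGACY_PDPES    4ULL
#define TOY_PDES_PER_PAGE   512ULL
#define TOY_PDP_SPAN        (1ULL << 30)   /* each PDP covers 1 GiB */

/*
 * Bytes of page directory and page table pages that are allocated only
 * because every legacy PDP is populated, for a PPGTT of the given size.
 */
uint64_t extra_bytes_for_unused_pdps(uint64_t size)
{
	uint64_t used_pdps = (size + TOY_PDP_SPAN - 1) / TOY_PDP_SPAN;
	uint64_t extra_pdps, pd_pages, pt_pages;

	if (used_pdps > TOY_LEGACY_PDPES)
		used_pdps = TOY_LEGACY_PDPES;

	extra_pdps = TOY_LEGACY_PDPES - used_pdps;
	pd_pages = extra_pdps;                      /* one page directory per PDP */
	pt_pages = extra_pdps * TOY_PDES_PER_PAGE;  /* 512 page tables per directory */

	return (pd_pages + pt_pages) * TOY_PAGE_BYTES;
}
```

For a 2 GiB PPGTT this gives two extra page directories (8 KiB) plus 1024 extra page tables (4 MiB), which would account for the "+" in "4+MiB".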

But this won't actually prevent the GPU from faulting for >=2GiB
addresses since we leave the extra PTEs zeroed (ie. valid=0). We'd
need to extend the initial .clear_range() to make sure all the new
PTEs point to the scratch page. If we go to the trouble of allocating
the page tables I think we might as well set them up fully.
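
A toy sketch of what pointing the new PTEs at the scratch page might look like. The names (`toy_pte_t`, `toy_pte_encode()`, `toy_pt_clear_to_scratch()`) are hypothetical, and the bit layout (bit 0 = present, bit 1 = writable) is an assumption modelled on x86-style PTEs rather than copied from the driver.

```c
#include <stddef.h>
#include <stdint.h>

#define TOY_PTES_PER_PAGE 512
#define TOY_PTE_PRESENT   (1ULL << 0)   /* assumed valid bit */
#define TOY_PTE_RW        (1ULL << 1)   /* assumed writable bit */

typedef uint64_t toy_pte_t;

/* Build a PTE for a page-aligned address; valid=0 yields a faulting entry. */
toy_pte_t toy_pte_encode(uint64_t addr, int valid)
{
	toy_pte_t pte = valid ? (TOY_PTE_PRESENT | TOY_PTE_RW) : 0;

	return pte | addr;
}

/* Point every entry of one page table at the scratch page. */
void toy_pt_clear_to_scratch(toy_pte_t *pt, uint64_t scratch_addr)
{
	size_t i;

	for (i = 0; i < TOY_PTES_PER_PAGE; i++)
		pt[i] = toy_pte_encode(scratch_addr, 1);
}

/* Returns 1 if any entry would fault (present bit clear), 0 otherwise. */
int toy_pt_has_invalid_entries(const toy_pte_t *pt)
{
	size_t i;

	for (i = 0; i < TOY_PTES_PER_PAGE; i++)
		if (!(pt[i] & TOY_PTE_PRESENT))
			return 1;
	return 0;
}
```

A freshly zeroed table reports invalid entries. After `toy_pt_clear_to_scratch()` every access through it would land on the scratch page instead of faulting.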

Previously when we just left the PDPs zeroed the GPU might or might not
fault depending on what kind of data was in the page at bus address 0.
I've occasionally wondered why the hardware designers didn't use the
normal PTE/PDE encoding for the PDP registers so that you could have a
valid bit already at the top level.

> 
> The regression of oopsing in init was introduced by
> commit 06fda602dbca ("drm/i915: Create page table allocators")
> 
> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=89350
> Tested-by: Valtteri Rantala <valtteri.rantala@intel.com>
> Cc: Michel Thierry <michel.thierry@intel.com>
> Cc: Ben Widawsky <benjamin.widawsky@intel.com>
> Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
> Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
> ---
>  drivers/gpu/drm/i915/i915_gem_gtt.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
> index bd95776..848a821 100644
> --- a/drivers/gpu/drm/i915/i915_gem_gtt.c
> +++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
> @@ -709,7 +709,7 @@ static int gen8_ppgtt_setup_page_tables(struct i915_hw_ppgtt *ppgtt,
>   */
>  static int gen8_ppgtt_init(struct i915_hw_ppgtt *ppgtt, uint64_t size)
>  {
> -	const int max_pdp = DIV_ROUND_UP(size, 1 << 30);
> +	const int max_pdp = GEN8_LEGACY_PDPES;
>  	const int min_pt_pages = GEN8_PDES_PER_PAGE * max_pdp;
>  	int i, j, ret;
>  
> -- 
> 1.9.1
Shuang He March 4, 2015, 10:48 a.m. UTC | #2
Tested-By: PRC QA PRTS (Patch Regression Test System Contact: shuang.he@intel.com)
Task id: 5878
-------------------------------------Summary-------------------------------------
Platform  Delta  drm-intel-nightly  Series Applied
PNV       -9     278/278            269/278
ILK              308/308            308/308
SNB              284/284            284/284
IVB              380/380            380/380
BYT              294/294            294/294
HSW              387/387            387/387
BDW       -1     316/316            315/316
-------------------------------------Detailed-------------------------------------
Platform  Test                                drm-intel-nightly          Series Applied
*PNV  igt_gem_fence_thrash_bo-write-verify-none      NRUN(1)PASS(6)      FAIL(1)PASS(1)
*PNV  igt_gem_fence_thrash_bo-write-verify-x      PASS(7)      FAIL(1)PASS(1)
*PNV  igt_gem_fence_thrash_bo-write-verify-y      NO_RESULT(1)PASS(6)      FAIL(1)PASS(1)
 PNV  igt_gem_userptr_blits_coherency-sync      NO_RESULT(1)CRASH(7)NRUN(1)PASS(8)      CRASH(2)
 PNV  igt_gem_userptr_blits_coherency-unsync      NO_RESULT(1)CRASH(6)PASS(7)      CRASH(2)
*PNV  igt_gem_userptr_blits_forked-unsync-swapping-multifd-mempressure-interruptible      PASS(2)      NRUN(1)PASS(1)
*PNV  igt_gem_userptr_blits_input-checking      PASS(2)      NRUN(1)PASS(1)
 PNV  igt_gem_userptr_blits_minor-unsync-interruptible      DMESG_WARN(1)PASS(5)      DMESG_WARN(1)PASS(1)
 PNV  igt_gem_fence_thrash_bo-write-verify-threaded-none      FAIL(2)CRASH(4)PASS(6)      CRASH(1)PASS(1)
*BDW  igt_gem_gtt_hog      PASS(19)      DMESG_WARN(1)PASS(1)
Note: Pay more attention to lines starting with '*'
Ben Widawsky March 4, 2015, 7:58 p.m. UTC | #3
On Tue, Mar 03, 2015 at 05:03:29PM +0200, Mika Kuoppala wrote:
> diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
> index bd95776..848a821 100644
> --- a/drivers/gpu/drm/i915/i915_gem_gtt.c
> +++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
> @@ -709,7 +709,7 @@ static int gen8_ppgtt_setup_page_tables(struct i915_hw_ppgtt *ppgtt,
>   */
>  static int gen8_ppgtt_init(struct i915_hw_ppgtt *ppgtt, uint64_t size)
>  {
> -	const int max_pdp = DIV_ROUND_UP(size, 1 << 30);
> +	const int max_pdp = GEN8_LEGACY_PDPES;
>  	const int min_pt_pages = GEN8_PDES_PER_PAGE * max_pdp;
>  	int i, j, ret;
>  

FWIW, I think I solved this later in my original series:
http://lists.freedesktop.org/archives/intel-gfx/2014-August/051162.html
Daniel Vetter March 5, 2015, 12:14 p.m. UTC | #4
On Wed, Mar 04, 2015 at 11:58:41AM -0800, Ben Widawsky wrote:
> On Tue, Mar 03, 2015 at 05:03:29PM +0200, Mika Kuoppala wrote:
> > diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
> > index bd95776..848a821 100644
> > --- a/drivers/gpu/drm/i915/i915_gem_gtt.c
> > +++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
> > @@ -709,7 +709,7 @@ static int gen8_ppgtt_setup_page_tables(struct i915_hw_ppgtt *ppgtt,
> >   */
> >  static int gen8_ppgtt_init(struct i915_hw_ppgtt *ppgtt, uint64_t size)
> >  {
> > -	const int max_pdp = DIV_ROUND_UP(size, 1 << 30);
> > +	const int max_pdp = GEN8_LEGACY_PDPES;
> >  	const int min_pt_pages = GEN8_PDES_PER_PAGE * max_pdp;
> >  	int i, j, ret;
> >  
> 
> FWIW, I think I solved this later in my original series:
> http://lists.freedesktop.org/archives/intel-gfx/2014-August/051162.html

Yeah that looks a lot saner I agree. But since dynamic pt alloc
essentially amounts to the same (we'll start out with all zero entries in
the pds) the indirection with the additional scratch_pd doesn't help I
think.
-Daniel
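
For readers following the scratch_pd discussion, here is a toy model of the indirection being debated: one shared scratch page table and one shared scratch page directory, with every unused PDP pointing at the latter. All names (`toy_pt`, `toy_pd`, `toy_ppgtt`, `toy_walk()`) are hypothetical and the structures hold plain pointers instead of DMA addresses, so this is only a sketch of the idea, not of the code in either series.

```c
#include <stdint.h>

#define TOY_ENTRIES       512
#define TOY_LEGACY_PDPES  4

struct toy_pt { uint64_t pte[TOY_ENTRIES]; };          /* page addresses */
struct toy_pd { struct toy_pt *pde[TOY_ENTRIES]; };    /* page tables */
struct toy_ppgtt { struct toy_pd *pdp[TOY_LEGACY_PDPES]; };

/* One shared pair of pages serves every unused part of the address space. */
static struct toy_pt scratch_pt;
static struct toy_pd scratch_pd;

/* All scratch PTEs point at the scratch page, all scratch PDEs at scratch_pt. */
void toy_setup_scratch(uint64_t scratch_page)
{
	int i;

	for (i = 0; i < TOY_ENTRIES; i++) {
		scratch_pt.pte[i] = scratch_page;
		scratch_pd.pde[i] = &scratch_pt;
	}
}

/* Point every PDP from first_unused upwards at the shared scratch directory. */
void toy_fill_unused_pdps(struct toy_ppgtt *ppgtt, int first_unused)
{
	int i;

	for (i = first_unused; i < TOY_LEGACY_PDPES; i++)
		ppgtt->pdp[i] = &scratch_pd;
}

/* Three-level walk: bits 31:30 pick the PDP, 29:21 the PDE, 20:12 the PTE. */
uint64_t toy_walk(const struct toy_ppgtt *ppgtt, uint64_t addr)
{
	unsigned int pdpe = (addr >> 30) & (TOY_LEGACY_PDPES - 1);
	unsigned int pde = (addr >> 21) & (TOY_ENTRIES - 1);
	unsigned int pte = (addr >> 12) & (TOY_ENTRIES - 1);

	return ppgtt->pdp[pdpe]->pde[pde]->pte[pte];
}
```

In this model the unused range costs two pages in total, however many PDPs share them, and any walk above the populated range resolves to the scratch page.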

Patch

diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index bd95776..848a821 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -709,7 +709,7 @@  static int gen8_ppgtt_setup_page_tables(struct i915_hw_ppgtt *ppgtt,
  */
 static int gen8_ppgtt_init(struct i915_hw_ppgtt *ppgtt, uint64_t size)
 {
-	const int max_pdp = DIV_ROUND_UP(size, 1 << 30);
+	const int max_pdp = GEN8_LEGACY_PDPES;
 	const int min_pt_pages = GEN8_PDES_PER_PAGE * max_pdp;
 	int i, j, ret;
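
The effect of the one-line change can be sketched outside the driver as follows. The macro values (4 legacy PDPs, 512 PDEs per page) are assumptions, and the `toy_` functions are hypothetical stand-ins for the two versions of the `max_pdp` initializer.

```c
#include <stdint.h>

/* Assumed values for the constants named in the hunk. */
#define GEN8_LEGACY_PDPES   4
#define GEN8_PDES_PER_PAGE  512
#define TOY_PDP_SPAN        (1ULL << 30)   /* each PDP covers 1 GiB */

/* Before: only as many PDPs as the requested size needs (DIV_ROUND_UP). */
int toy_max_pdp_before(uint64_t size)
{
	return (int)((size + TOY_PDP_SPAN - 1) / TOY_PDP_SPAN);
}

/* After: every legacy PDP is populated, regardless of the requested size. */
int toy_max_pdp_after(uint64_t size)
{
	(void)size;
	return GEN8_LEGACY_PDPES;
}

/* Page tables needed to back max_pdp fully populated page directories. */
int toy_min_pt_pages(int max_pdp)
{
	return GEN8_PDES_PER_PAGE * max_pdp;
}
```

With a 2 GiB size, as used on bsw, the old expression yields 2 PDPs and 1024 page tables, while the patched one yields 4 PDPs and 2048 page tables.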