drm/i915/bdw: Fix GEN8 GTT size calculation

Message ID	1394816286-31200-1-git-send-email-benjamin.widawsky@intel.com (mailing list archive)
State	New, archived
Headers	show Return-Path: <intel-gfx-bounces@lists.freedesktop.org> From: Ben Widawsky <benjamin.widawsky@intel.com> To: Intel GFX <intel-gfx@lists.freedesktop.org> Date: Fri, 14 Mar 2014 09:58:06 -0700 Message-Id: <1394816286-31200-1-git-send-email-benjamin.widawsky@intel.com> Cc: Ben Widawsky <ben@bwidawsk.net>, David Woodhouse <David.Woodhouse@intel.com>, Ben Widawsky <benjamin.widawsky@intel.com> Subject: [Intel-gfx] [PATCH] drm/i915/bdw: Fix GEN8 GTT size calculation Precedence: list MIME-Version: 1.0 Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit Sender: intel-gfx-bounces@lists.freedesktop.org Errors-To: intel-gfx-bounces@lists.freedesktop.org

Ben Widawsky March 14, 2014, 4:58 p.m. UTC

The preliminary HW support check is no longer needed, and the
calculation is simplified while here.

Reported-by: David Woodhouse <David.Woodhouse@intel.com>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
---
 drivers/gpu/drm/i915/i915_gem_gtt.c | 9 +--------
 1 file changed, 1 insertion(+), 8 deletions(-)

Daniel Vetter March 14, 2014, 6:40 p.m. UTC | #1

On Fri, Mar 14, 2014 at 09:58:06AM -0700, Ben Widawsky wrote:
> The preliminary HW support check is no longer needed, and the
> calculation is simplified while here.
> 
> Reported-by: David Woodhouse <David.Woodhouse@intel.com>
> Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
> ---
>  drivers/gpu/drm/i915/i915_gem_gtt.c | 9 +--------
>  1 file changed, 1 insertion(+), 8 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
> index 40a2b36..694112a 100644
> --- a/drivers/gpu/drm/i915/i915_gem_gtt.c
> +++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
> @@ -1250,14 +1250,7 @@ static inline unsigned int gen8_get_total_gtt_size(u16 bdw_gmch_ctl)
>  {
>  	bdw_gmch_ctl >>= BDW_GMCH_GGMS_SHIFT;
>  	bdw_gmch_ctl &= BDW_GMCH_GGMS_MASK;
> -	if (bdw_gmch_ctl)
> -		bdw_gmch_ctl = 1 << bdw_gmch_ctl;
> -	if (bdw_gmch_ctl > 4) {
> -		WARN_ON(!i915_preliminary_hw_support);
> -		return 4<<20;
> -	}
> -
> -	return bdw_gmch_ctl << 20;
> +	return 1 << (bdw_gmch_ctl + 20);

I don't have this in my tree, and it seems to never have existed in
upstream ...

/me is confused

Cheers, Daniel

>  }
>  
>  static inline size_t gen6_get_stolen_size(u16 snb_gmch_ctl)
> -- 
> 1.9.0
> 
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/intel-gfx

Ben Widawsky March 14, 2014, 7:14 p.m. UTC | #2

On Fri, Mar 14, 2014 at 07:40:30PM +0100, Daniel Vetter wrote:
> On Fri, Mar 14, 2014 at 09:58:06AM -0700, Ben Widawsky wrote:
> > The preliminary HW support check is no longer needed, and the
> > calculation is simplified while here.
> > 
> > Reported-by: David Woodhouse <David.Woodhouse@intel.com>
> > Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
> > ---
> >  drivers/gpu/drm/i915/i915_gem_gtt.c | 9 +--------
> >  1 file changed, 1 insertion(+), 8 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
> > index 40a2b36..694112a 100644
> > --- a/drivers/gpu/drm/i915/i915_gem_gtt.c
> > +++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
> > @@ -1250,14 +1250,7 @@ static inline unsigned int gen8_get_total_gtt_size(u16 bdw_gmch_ctl)
> >  {
> >  	bdw_gmch_ctl >>= BDW_GMCH_GGMS_SHIFT;
> >  	bdw_gmch_ctl &= BDW_GMCH_GGMS_MASK;
> > -	if (bdw_gmch_ctl)
> > -		bdw_gmch_ctl = 1 << bdw_gmch_ctl;
> > -	if (bdw_gmch_ctl > 4) {
> > -		WARN_ON(!i915_preliminary_hw_support);
> > -		return 4<<20;
> > -	}
> > -
> > -	return bdw_gmch_ctl << 20;
> > +	return 1 << (bdw_gmch_ctl + 20);
> 
> I don't have this in my tree, and it seems to never have existed in
> upstream ...
> 
> /me is confused
> 
> Cheers, Daniel
> 

This was based off of linus master, since I assumed that is what David
was using. Honestly, I hadn't looked at our Intel trees. Maybe we're
good here already.

Daniel Vetter March 14, 2014, 8:03 p.m. UTC | #3

On Fri, Mar 14, 2014 at 12:14:05PM -0700, Ben Widawsky wrote:
> On Fri, Mar 14, 2014 at 07:40:30PM +0100, Daniel Vetter wrote:
> > On Fri, Mar 14, 2014 at 09:58:06AM -0700, Ben Widawsky wrote:
> > > The preliminary HW support check is no longer needed, and the
> > > calculation is simplified while here.
> > > 
> > > Reported-by: David Woodhouse <David.Woodhouse@intel.com>
> > > Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
> > > ---
> > >  drivers/gpu/drm/i915/i915_gem_gtt.c | 9 +--------
> > >  1 file changed, 1 insertion(+), 8 deletions(-)
> > > 
> > > diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
> > > index 40a2b36..694112a 100644
> > > --- a/drivers/gpu/drm/i915/i915_gem_gtt.c
> > > +++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
> > > @@ -1250,14 +1250,7 @@ static inline unsigned int gen8_get_total_gtt_size(u16 bdw_gmch_ctl)
> > >  {
> > >  	bdw_gmch_ctl >>= BDW_GMCH_GGMS_SHIFT;
> > >  	bdw_gmch_ctl &= BDW_GMCH_GGMS_MASK;
> > > -	if (bdw_gmch_ctl)
> > > -		bdw_gmch_ctl = 1 << bdw_gmch_ctl;
> > > -	if (bdw_gmch_ctl > 4) {
> > > -		WARN_ON(!i915_preliminary_hw_support);
> > > -		return 4<<20;
> > > -	}
> > > -
> > > -	return bdw_gmch_ctl << 20;
> > > +	return 1 << (bdw_gmch_ctl + 20);
> > 
> > I don't have this in my tree, and it seems to never have existed in
> > upstream ...
> > 
> > /me is confused
> > 
> > Cheers, Daniel
> > 
> 
> This was based off of linus master, since I assumed that is what David
> was using. Honestly, I hadn't looked at our Intel trees. Maybe we're
> good here already.

Queued for -next, thanks for the patch.
-Daniel

Daniel Vetter March 14, 2014, 9:56 p.m. UTC | #4

On Fri, Mar 14, 2014 at 8:14 PM, Ben Widawsky <ben@bwidawsk.net> wrote:
> This was based off of linus master, since I assumed that is what David
> was using. Honestly, I hadn't looked at our Intel trees. Maybe we're
> good here already.

Ok, now I was even more confused ... Turns out this was part of
"drm/i915/bdw: Limit GTT to 2GB" which is now reverted in dinq. Hence
why I didn't spot it anywhere.

Maybe poke Jani to queue this up for -fixes? Oh and ignore my merge
notice mail, that's just the other silly confused danvet who escaped
from the chains ...
-Daniel

David Woodhouse March 14, 2014, 10:05 p.m. UTC | #5

On Fri, 2014-03-14 at 12:14 -0700, Ben Widawsky wrote:
> 
> This was based off of linus master, since I assumed that is what David
> was using.

Indeed. Now it's much better. Now that nasty scary warning is gone away
and now I just have a few hundred of these (from your other patch before
which it crashed instead):
[    6.167463] [drm:intel_set_cpu_fifo_underrun_reporting] *ERROR* Interrupt arrived before CRTCs were setup up
... 
[   35.443214] [drm:intel_set_cpu_fifo_underrun_reporting] *ERROR* Interrupt arrived before CRTCs were setup up

... followed by these until I turn it off:
[   38.190001] dmar: DMAR:[DMA Read] Request device [00:02.0] fault addr ade47000 
[   38.190001] DMAR:[fault reason 06] PTE Read access is not set
[   38.204801] dmar: DMAR:[DMA Read] Request device [00:02.0] fault addr add5e000 
[   38.204801] DMAR:[fault reason 06] PTE Read access is not set
[   38.219598] dmar: DMAR:[DMA Read] Request device [00:02.0] fault addr adc75000 
[   38.219598] DMAR:[fault reason 06] PTE Read access is not set

As I noted in private though, there's an RMRR instructing us to set up a
1:1 mapping for the entire ad000000-af7fffff range. An instruction which
we appear to be following. So I have no idea why we're taking faults for
those addresses; that might be a problem in IOMMU code. Will look into
that further...

Daniel Vetter March 14, 2014, 10:10 p.m. UTC | #6

On Fri, Mar 14, 2014 at 11:05 PM, Woodhouse, David
<david.woodhouse@intel.com> wrote:
> On Fri, 2014-03-14 at 12:14 -0700, Ben Widawsky wrote:
>>
>> This was based off of linus master, since I assumed that is what David
>> was using.
>
> Indeed. Now it's much better. Now that nasty scary warning is gone away
> and now I just have a few hundred of these (from your other patch before
> which it crashed instead):
> [    6.167463] [drm:intel_set_cpu_fifo_underrun_reporting] *ERROR* Interrupt arrived before CRTCs were setup up
> ...
> [   35.443214] [drm:intel_set_cpu_fifo_underrun_reporting] *ERROR* Interrupt arrived before CRTCs were setup up

drm-intel-fixes should help:
http://cgit.freedesktop.org/drm-intel/commit/?id=5c673b60a9b3b23486f4eda75c72e91d31d26a2b

> ... followed by these until I turn it off:
> [   38.190001] dmar: DMAR:[DMA Read] Request device [00:02.0] fault addr ade47000
> [   38.190001] DMAR:[fault reason 06] PTE Read access is not set
> [   38.204801] dmar: DMAR:[DMA Read] Request device [00:02.0] fault addr add5e000
> [   38.204801] DMAR:[fault reason 06] PTE Read access is not set
> [   38.219598] dmar: DMAR:[DMA Read] Request device [00:02.0] fault addr adc75000
> [   38.219598] DMAR:[fault reason 06] PTE Read access is not set
>
> As I noted in private though, there's an RMRR instructing us to set up a
> 1:1 mapping for the entire ad000000-af7fffff range. An instruction which
> we appear to be following. So I have no idea why we're taking faults for
> those addresses; that might be a problem in IOMMU code. Will look into
> that further...

I've asked you on private irc whether this range matches/overlaps with
stolen - we know of things blowing up at least on earlier generations
in combination with dmar. Please boot with drm.debug=0xe and scan for
the stolen mem reporting:

[    9.459273] [drm:i915_gem_init_stolen], found 67108864 bytes of
stolen memory at 9ba00000

Your range has roughly an offset/size stolen usually has.

Thanks, Daniel

David Woodhouse March 17, 2014, 4:17 p.m. UTC | #7

On Fri, 2014-03-14 at 23:10 +0100, Daniel Vetter wrote:
> 
> I've asked you on private irc whether this range matches/overlaps with
> stolen - we know of things blowing up at least on earlier generations
> in combination with dmar. Please boot with drm.debug=0xe and scan for
> the stolen mem reporting:

Surely it *has* to be stolen? That's the whole *point* in the RMRR that
the BIOS provides, telling us that the gfx unit is expecting to do DMA
to this range of memory.

If it isn't stolen, it's just being wantonly "borrowed".

Have you *ever* known an RMRR point at memory other than the stolen
range?

I think I have found this problem on the IOMMU side. We usually assume
that RMRRs are for boot-time only, such as USB controllers for the
legacy keyboard/mouse emulation. And a patch sneaked in which
effectively *unmaps* the RMRR regions when you do the first "real"
mapping for the driver. Having fixed that, I think I should no longer
see these DMA faults.

Can't tell though, because my machine is still dying in an endless
stream of
[  199.647850] [drm:intel_set_cpu_fifo_underrun_reporting] *ERROR* Interrupt arrived before CRTCs were setup up

David Woodhouse March 17, 2014, 4:19 p.m. UTC | #8

On Mon, 2014-03-17 at 16:17 +0000, David Woodhouse wrote:
> 
> Can't tell though, because my machine is still dying in an endless
> stream of
> [  199.647850] [drm:intel_set_cpu_fifo_underrun_reporting] *ERROR*
> Interrupt arrived before CRTCs were setup up

Except when the failure mode is different...

[   24.143180] ------------[ cut here ]------------
[   24.150433] WARNING: CPU: 1 PID: 161 at kernel/watchdog.c:245 watchdog_overflow_callback+0x9c/0xd0()
[   24.162945] Watchdog detected hard LOCKUP on cpu 1
[   24.168206] Modules linked in: i915(+) i2c_algo_bit drm_kms_helper drm i2c_core video
[   24.181743] CPU: 1 PID: 161 Comm: systemd-udevd Not tainted 3.14.0-rc5+ #54
[   24.191833] Hardware name: Intel Corporation Broadwell Client platform/WhiteTip Mountain 1, BIOS BDW-E2R1.86C.0053.R02.131213
[   24.208364]  0000000000000009 ffff88014e446c20 ffffffff816b5c48 ffff88014e446c68
[   24.219026]  ffff88014e446c58 ffffffff81085e9d ffff880147cf0000 0000000000000000
[   24.229676]  ffff88014e446d88 0000000000000000 ffff88014e446ef8 ffff88014e446cb8
[   24.240298] Call Trace:
[   24.245251]  <NMI>  [<ffffffff816b5c48>] dump_stack+0x45/0x56
[   24.254028]  [<ffffffff81085e9d>] warn_slowpath_common+0x7d/0xa0
[   24.263071]  [<ffffffff81085f0c>] warn_slowpath_fmt+0x4c/0x50
[   24.271802]  [<ffffffff81125af0>] ? restart_watchdog_hrtimer+0x50/0x50
[   24.281352]  [<ffffffff81125b8c>] watchdog_overflow_callback+0x9c/0xd0
[   24.290979]  [<ffffffff8115d55e>] __perf_event_overflow+0x8e/0x240
[   24.300211]  [<ffffffff81029e68>] ? x86_perf_event_set_period+0xe8/0x150
[   24.310053]  [<ffffffff8115e024>] perf_event_overflow+0x14/0x20
[   24.319014]  [<ffffffff8103123d>] intel_pmu_handle_irq+0x1cd/0x3c0
[   24.328264]  [<ffffffff816be02b>] perf_event_nmi_handler+0x2b/0x50
[   24.337519]  [<ffffffff816bd878>] nmi_handle.isra.3+0x88/0x180
[   24.346369]  [<ffffffff816bdad9>] do_nmi+0x169/0x310
[   24.354215]  [<ffffffff816bceb1>] end_repeat_nmi+0x1e/0x2e
[   24.362660]  [<ffffffff8119ffaf>] ? alloc_vmap_area+0x6f/0x310
[   24.371525]  [<ffffffffa00f1e2b>] ? __gen6_gt_force_wake_mt_get+0x7b/0x150 [i915]
[   24.382215]  [<ffffffffa00f1e2b>] ? __gen6_gt_force_wake_mt_get+0x7b/0x150 [i915]
[   24.392935]  [<ffffffffa00f1e2b>] ? __gen6_gt_force_wake_mt_get+0x7b/0x150 [i915]
[   24.403590]  <<EOE>>  [<ffffffffa00f3c5c>] gen8_write32+0x10c/0x120 [i915]
[   24.413655]  [<ffffffffa00a2ad7>] gen8_gmch_probe+0xe7/0x190 [i915]
[   24.422984]  [<ffffffffa00a3fb2>] i915_gem_gtt_init+0x62/0x200 [i915]
[   24.432504]  [<ffffffffa00889dd>] i915_driver_load+0x43d/0xe10 [i915]
[   24.442000]  [<ffffffffa0026765>] ? drm_get_minor+0x1a5/0x200 [drm]
[   24.451322]  [<ffffffffa002683b>] drm_dev_register+0x7b/0x160 [drm]
[   24.460620]  [<ffffffffa0028cc0>] drm_get_pci_dev+0xa0/0x220 [drm]
[   24.469827]  [<ffffffffa00856fb>] i915_pci_probe+0x3b/0x60 [i915]
[   24.478917]  [<ffffffff813722a5>] local_pci_probe+0x45/0xa0
[   24.487382]  [<ffffffff81373621>] pci_device_probe+0xd1/0x130
[   24.496072]  [<ffffffff8143c5c5>] driver_probe_device+0x125/0x3a0
[   24.505141]  [<ffffffff8143c913>] __driver_attach+0x93/0xa0
[   24.513631]  [<ffffffff8143c880>] ? __device_attach+0x40/0x40
[   24.522301]  [<ffffffff8143a533>] bus_for_each_dev+0x63/0xa0
[   24.530871]  [<ffffffff8143bf7e>] driver_attach+0x1e/0x20
[   24.539114]  [<ffffffff8143bb60>] bus_add_driver+0x180/0x250
[   24.547625]  [<ffffffffa0147000>] ? 0xffffffffa0146fff
[   24.555554]  [<ffffffff8143cf44>] driver_register+0x64/0xf0
[   24.563824]  [<ffffffffa0147000>] ? 0xffffffffa0146fff
[   24.571716]  [<ffffffff81371c4b>] __pci_register_driver+0x4b/0x50
[   24.580701]  [<ffffffffa0028f5a>] drm_pci_init+0x11a/0x130 [drm]
[   24.589447]  [<ffffffffa0147000>] ? 0xffffffffa0146fff
[   24.597330]  [<ffffffffa014706a>] i915_init+0x6a/0x6c [i915]
[   24.605806]  [<ffffffff81002142>] do_one_initcall+0xd2/0x180
[   24.614278]  [<ffffffff810573a3>] ? set_memory_nx+0x43/0x50
[   24.622651]  [<ffffffff810fbe1b>] load_module+0x1e0b/0x25a0
[   24.631014]  [<ffffffff810f77a0>] ? store_uevent+0x40/0x40
[   24.639285]  [<ffffffff810f828a>] ? copy_module_from_fd.isra.47+0x12a/0x190
[   24.649242]  [<ffffffff810fc726>] SyS_finit_module+0x86/0xb0
[   24.657721]  [<ffffffff816c50e9>] system_call_fastpath+0x16/0x1b
[   24.666590] ---[ end trace d4228967fcb7ae4a ]---

Daniel Vetter March 17, 2014, 5:03 p.m. UTC | #9

On Mon, Mar 17, 2014 at 5:17 PM, Woodhouse, David
<david.woodhouse@intel.com> wrote:
> I think I have found this problem on the IOMMU side. We usually assume
> that RMRRs are for boot-time only, such as USB controllers for the
> legacy keyboard/mouse emulation. And a patch sneaked in which
> effectively *unmaps* the RMRR regions when you do the first "real"
> mapping for the driver. Having fixed that, I think I should no longer
> see these DMA faults.

Which commit is this? I've a patch+bug report that stolen + dmar is busted.
-Daniel

drm/i915/bdw: Fix GEN8 GTT size calculation

Commit Message

Comments

Patch