Message ID | 1394816286-31200-1-git-send-email-benjamin.widawsky@intel.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
On Fri, Mar 14, 2014 at 09:58:06AM -0700, Ben Widawsky wrote: > The preliminary HW support check is no longer needed, and the > calculation is simplified while here. > > Reported-by: David Woodhouse <David.Woodhouse@intel.com> > Signed-off-by: Ben Widawsky <ben@bwidawsk.net> > --- > drivers/gpu/drm/i915/i915_gem_gtt.c | 9 +-------- > 1 file changed, 1 insertion(+), 8 deletions(-) > > diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c > index 40a2b36..694112a 100644 > --- a/drivers/gpu/drm/i915/i915_gem_gtt.c > +++ b/drivers/gpu/drm/i915/i915_gem_gtt.c > @@ -1250,14 +1250,7 @@ static inline unsigned int gen8_get_total_gtt_size(u16 bdw_gmch_ctl) > { > bdw_gmch_ctl >>= BDW_GMCH_GGMS_SHIFT; > bdw_gmch_ctl &= BDW_GMCH_GGMS_MASK; > - if (bdw_gmch_ctl) > - bdw_gmch_ctl = 1 << bdw_gmch_ctl; > - if (bdw_gmch_ctl > 4) { > - WARN_ON(!i915_preliminary_hw_support); > - return 4<<20; > - } > - > - return bdw_gmch_ctl << 20; > + return 1 << (bdw_gmch_ctl + 20); I don't have this in my tree, and it seems to never have existed in upstream ... /me is confused Cheers, Daniel > } > > static inline size_t gen6_get_stolen_size(u16 snb_gmch_ctl) > -- > 1.9.0 > > _______________________________________________ > Intel-gfx mailing list > Intel-gfx@lists.freedesktop.org > http://lists.freedesktop.org/mailman/listinfo/intel-gfx
On Fri, Mar 14, 2014 at 07:40:30PM +0100, Daniel Vetter wrote: > On Fri, Mar 14, 2014 at 09:58:06AM -0700, Ben Widawsky wrote: > > The preliminary HW support check is no longer needed, and the > > calculation is simplified while here. > > > > Reported-by: David Woodhouse <David.Woodhouse@intel.com> > > Signed-off-by: Ben Widawsky <ben@bwidawsk.net> > > --- > > drivers/gpu/drm/i915/i915_gem_gtt.c | 9 +-------- > > 1 file changed, 1 insertion(+), 8 deletions(-) > > > > diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c > > index 40a2b36..694112a 100644 > > --- a/drivers/gpu/drm/i915/i915_gem_gtt.c > > +++ b/drivers/gpu/drm/i915/i915_gem_gtt.c > > @@ -1250,14 +1250,7 @@ static inline unsigned int gen8_get_total_gtt_size(u16 bdw_gmch_ctl) > > { > > bdw_gmch_ctl >>= BDW_GMCH_GGMS_SHIFT; > > bdw_gmch_ctl &= BDW_GMCH_GGMS_MASK; > > - if (bdw_gmch_ctl) > > - bdw_gmch_ctl = 1 << bdw_gmch_ctl; > > - if (bdw_gmch_ctl > 4) { > > - WARN_ON(!i915_preliminary_hw_support); > > - return 4<<20; > > - } > > - > > - return bdw_gmch_ctl << 20; > > + return 1 << (bdw_gmch_ctl + 20); > > I don't have this in my tree, and it seems to never have existed in > upstream ... > > /me is confused > > Cheers, Daniel > This was based off of linus master, since I assumed that is what David was using. Honestly, I hadn't looked at our Intel trees. Maybe we're good here already.
On Fri, Mar 14, 2014 at 12:14:05PM -0700, Ben Widawsky wrote: > On Fri, Mar 14, 2014 at 07:40:30PM +0100, Daniel Vetter wrote: > > On Fri, Mar 14, 2014 at 09:58:06AM -0700, Ben Widawsky wrote: > > > The preliminary HW support check is no longer needed, and the > > > calculation is simplified while here. > > > > > > Reported-by: David Woodhouse <David.Woodhouse@intel.com> > > > Signed-off-by: Ben Widawsky <ben@bwidawsk.net> > > > --- > > > drivers/gpu/drm/i915/i915_gem_gtt.c | 9 +-------- > > > 1 file changed, 1 insertion(+), 8 deletions(-) > > > > > > diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c > > > index 40a2b36..694112a 100644 > > > --- a/drivers/gpu/drm/i915/i915_gem_gtt.c > > > +++ b/drivers/gpu/drm/i915/i915_gem_gtt.c > > > @@ -1250,14 +1250,7 @@ static inline unsigned int gen8_get_total_gtt_size(u16 bdw_gmch_ctl) > > > { > > > bdw_gmch_ctl >>= BDW_GMCH_GGMS_SHIFT; > > > bdw_gmch_ctl &= BDW_GMCH_GGMS_MASK; > > > - if (bdw_gmch_ctl) > > > - bdw_gmch_ctl = 1 << bdw_gmch_ctl; > > > - if (bdw_gmch_ctl > 4) { > > > - WARN_ON(!i915_preliminary_hw_support); > > > - return 4<<20; > > > - } > > > - > > > - return bdw_gmch_ctl << 20; > > > + return 1 << (bdw_gmch_ctl + 20); > > > > I don't have this in my tree, and it seems to never have existed in > > upstream ... > > > > /me is confused > > > > Cheers, Daniel > > > > This was based off of linus master, since I assumed that is what David > was using. Honestly, I hadn't looked at our Intel trees. Maybe we're > good here already. Queued for -next, thanks for the patch. -Daniel
On Fri, Mar 14, 2014 at 8:14 PM, Ben Widawsky <ben@bwidawsk.net> wrote: > This was based off of linus master, since I assumed that is what David > was using. Honestly, I hadn't looked at our Intel trees. Maybe we're > good here already. Ok, now I was even more confused ... Turns out this was part of "drm/i915/bdw: Limit GTT to 2GB" which is now reverted in dinq. Hence why I didn't spot it anywhere. Maybe poke Jani to queue this up for -fixes? Oh and ignore my merge notice mail, that's just the other silly confused danvet who escaped from the chains ... -Daniel
On Fri, 2014-03-14 at 12:14 -0700, Ben Widawsky wrote: > > This was based off of linus master, since I assumed that is what David > was using. Indeed. Now it's much better. Now that nasty scary warning is gone away and now I just have a few hundred of these (from your other patch before which it crashed instead): [ 6.167463] [drm:intel_set_cpu_fifo_underrun_reporting] *ERROR* Interrupt arrived before CRTCs were setup up ... [ 35.443214] [drm:intel_set_cpu_fifo_underrun_reporting] *ERROR* Interrupt arrived before CRTCs were setup up ... followed by these until I turn it off: [ 38.190001] dmar: DMAR:[DMA Read] Request device [00:02.0] fault addr ade47000 [ 38.190001] DMAR:[fault reason 06] PTE Read access is not set [ 38.204801] dmar: DMAR:[DMA Read] Request device [00:02.0] fault addr add5e000 [ 38.204801] DMAR:[fault reason 06] PTE Read access is not set [ 38.219598] dmar: DMAR:[DMA Read] Request device [00:02.0] fault addr adc75000 [ 38.219598] DMAR:[fault reason 06] PTE Read access is not set As I noted in private though, there's an RMRR instructing us to set up a 1:1 mapping for the entire ad000000-af7fffff range. An instruction which we appear to be following. So I have no idea why we're taking faults for those addresses; that might be a problem in IOMMU code. Will look into that further...
On Fri, Mar 14, 2014 at 11:05 PM, Woodhouse, David <david.woodhouse@intel.com> wrote: > On Fri, 2014-03-14 at 12:14 -0700, Ben Widawsky wrote: >> >> This was based off of linus master, since I assumed that is what David >> was using. > > Indeed. Now it's much better. Now that nasty scary warning is gone away > and now I just have a few hundred of these (from your other patch before > which it crashed instead): > [ 6.167463] [drm:intel_set_cpu_fifo_underrun_reporting] *ERROR* Interrupt arrived before CRTCs were setup up > ... > [ 35.443214] [drm:intel_set_cpu_fifo_underrun_reporting] *ERROR* Interrupt arrived before CRTCs were setup up drm-intel-fixes should help: http://cgit.freedesktop.org/drm-intel/commit/?id=5c673b60a9b3b23486f4eda75c72e91d31d26a2b > ... followed by these until I turn it off: > [ 38.190001] dmar: DMAR:[DMA Read] Request device [00:02.0] fault addr ade47000 > [ 38.190001] DMAR:[fault reason 06] PTE Read access is not set > [ 38.204801] dmar: DMAR:[DMA Read] Request device [00:02.0] fault addr add5e000 > [ 38.204801] DMAR:[fault reason 06] PTE Read access is not set > [ 38.219598] dmar: DMAR:[DMA Read] Request device [00:02.0] fault addr adc75000 > [ 38.219598] DMAR:[fault reason 06] PTE Read access is not set > > As I noted in private though, there's an RMRR instructing us to set up a > 1:1 mapping for the entire ad000000-af7fffff range. An instruction which > we appear to be following. So I have no idea why we're taking faults for > those addresses; that might be a problem in IOMMU code. Will look into > that further... I've asked you on private irc whether this range matches/overlaps with stolen - we know of things blowing up at least on earlier generations in combination with dmar. Please boot with drm.debug=0xe and scan for the stolen mem reporting: [ 9.459273] [drm:i915_gem_init_stolen], found 67108864 bytes of stolen memory at 9ba00000 Your range has roughly an offset/size stolen usually has. Thanks, Daniel
On Fri, 2014-03-14 at 23:10 +0100, Daniel Vetter wrote: > > I've asked you on private irc whether this range matches/overlaps with > stolen - we know of things blowing up at least on earlier generations > in combination with dmar. Please boot with drm.debug=0xe and scan for > the stolen mem reporting: Surely it *has* to be stolen? That's the whole *point* in the RMRR that the BIOS provides, telling us that the gfx unit is expecting to do DMA to this range of memory. If it isn't stolen, it's just being wantonly "borrowed". Have you *ever* known an RMRR point at memory other than the stolen range? I think I have found this problem on the IOMMU side. We usually assume that RMRRs are for boot-time only, such as USB controllers for the legacy keyboard/mouse emulation. And a patch sneaked in which effectively *unmaps* the RMRR regions when you do the first "real" mapping for the driver. Having fixed that, I think I should no longer see these DMA faults. Can't tell though, because my machine is still dying in an endless stream of [ 199.647850] [drm:intel_set_cpu_fifo_underrun_reporting] *ERROR* Interrupt arrived before CRTCs were setup up
On Mon, 2014-03-17 at 16:17 +0000, David Woodhouse wrote: > > Can't tell though, because my machine is still dying in an endless > stream of > [ 199.647850] [drm:intel_set_cpu_fifo_underrun_reporting] *ERROR* > Interrupt arrived before CRTCs were setup up Except when the failure mode is different... [ 24.143180] ------------[ cut here ]------------ [ 24.150433] WARNING: CPU: 1 PID: 161 at kernel/watchdog.c:245 watchdog_overflow_callback+0x9c/0xd0() [ 24.162945] Watchdog detected hard LOCKUP on cpu 1 [ 24.168206] Modules linked in: i915(+) i2c_algo_bit drm_kms_helper drm i2c_core video [ 24.181743] CPU: 1 PID: 161 Comm: systemd-udevd Not tainted 3.14.0-rc5+ #54 [ 24.191833] Hardware name: Intel Corporation Broadwell Client platform/WhiteTip Mountain 1, BIOS BDW-E2R1.86C.0053.R02.131213 [ 24.208364] 0000000000000009 ffff88014e446c20 ffffffff816b5c48 ffff88014e446c68 [ 24.219026] ffff88014e446c58 ffffffff81085e9d ffff880147cf0000 0000000000000000 [ 24.229676] ffff88014e446d88 0000000000000000 ffff88014e446ef8 ffff88014e446cb8 [ 24.240298] Call Trace: [ 24.245251] <NMI> [<ffffffff816b5c48>] dump_stack+0x45/0x56 [ 24.254028] [<ffffffff81085e9d>] warn_slowpath_common+0x7d/0xa0 [ 24.263071] [<ffffffff81085f0c>] warn_slowpath_fmt+0x4c/0x50 [ 24.271802] [<ffffffff81125af0>] ? restart_watchdog_hrtimer+0x50/0x50 [ 24.281352] [<ffffffff81125b8c>] watchdog_overflow_callback+0x9c/0xd0 [ 24.290979] [<ffffffff8115d55e>] __perf_event_overflow+0x8e/0x240 [ 24.300211] [<ffffffff81029e68>] ? x86_perf_event_set_period+0xe8/0x150 [ 24.310053] [<ffffffff8115e024>] perf_event_overflow+0x14/0x20 [ 24.319014] [<ffffffff8103123d>] intel_pmu_handle_irq+0x1cd/0x3c0 [ 24.328264] [<ffffffff816be02b>] perf_event_nmi_handler+0x2b/0x50 [ 24.337519] [<ffffffff816bd878>] nmi_handle.isra.3+0x88/0x180 [ 24.346369] [<ffffffff816bdad9>] do_nmi+0x169/0x310 [ 24.354215] [<ffffffff816bceb1>] end_repeat_nmi+0x1e/0x2e [ 24.362660] [<ffffffff8119ffaf>] ? alloc_vmap_area+0x6f/0x310 [ 24.371525] [<ffffffffa00f1e2b>] ? __gen6_gt_force_wake_mt_get+0x7b/0x150 [i915] [ 24.382215] [<ffffffffa00f1e2b>] ? __gen6_gt_force_wake_mt_get+0x7b/0x150 [i915] [ 24.392935] [<ffffffffa00f1e2b>] ? __gen6_gt_force_wake_mt_get+0x7b/0x150 [i915] [ 24.403590] <<EOE>> [<ffffffffa00f3c5c>] gen8_write32+0x10c/0x120 [i915] [ 24.413655] [<ffffffffa00a2ad7>] gen8_gmch_probe+0xe7/0x190 [i915] [ 24.422984] [<ffffffffa00a3fb2>] i915_gem_gtt_init+0x62/0x200 [i915] [ 24.432504] [<ffffffffa00889dd>] i915_driver_load+0x43d/0xe10 [i915] [ 24.442000] [<ffffffffa0026765>] ? drm_get_minor+0x1a5/0x200 [drm] [ 24.451322] [<ffffffffa002683b>] drm_dev_register+0x7b/0x160 [drm] [ 24.460620] [<ffffffffa0028cc0>] drm_get_pci_dev+0xa0/0x220 [drm] [ 24.469827] [<ffffffffa00856fb>] i915_pci_probe+0x3b/0x60 [i915] [ 24.478917] [<ffffffff813722a5>] local_pci_probe+0x45/0xa0 [ 24.487382] [<ffffffff81373621>] pci_device_probe+0xd1/0x130 [ 24.496072] [<ffffffff8143c5c5>] driver_probe_device+0x125/0x3a0 [ 24.505141] [<ffffffff8143c913>] __driver_attach+0x93/0xa0 [ 24.513631] [<ffffffff8143c880>] ? __device_attach+0x40/0x40 [ 24.522301] [<ffffffff8143a533>] bus_for_each_dev+0x63/0xa0 [ 24.530871] [<ffffffff8143bf7e>] driver_attach+0x1e/0x20 [ 24.539114] [<ffffffff8143bb60>] bus_add_driver+0x180/0x250 [ 24.547625] [<ffffffffa0147000>] ? 0xffffffffa0146fff [ 24.555554] [<ffffffff8143cf44>] driver_register+0x64/0xf0 [ 24.563824] [<ffffffffa0147000>] ? 0xffffffffa0146fff [ 24.571716] [<ffffffff81371c4b>] __pci_register_driver+0x4b/0x50 [ 24.580701] [<ffffffffa0028f5a>] drm_pci_init+0x11a/0x130 [drm] [ 24.589447] [<ffffffffa0147000>] ? 0xffffffffa0146fff [ 24.597330] [<ffffffffa014706a>] i915_init+0x6a/0x6c [i915] [ 24.605806] [<ffffffff81002142>] do_one_initcall+0xd2/0x180 [ 24.614278] [<ffffffff810573a3>] ? set_memory_nx+0x43/0x50 [ 24.622651] [<ffffffff810fbe1b>] load_module+0x1e0b/0x25a0 [ 24.631014] [<ffffffff810f77a0>] ? store_uevent+0x40/0x40 [ 24.639285] [<ffffffff810f828a>] ? copy_module_from_fd.isra.47+0x12a/0x190 [ 24.649242] [<ffffffff810fc726>] SyS_finit_module+0x86/0xb0 [ 24.657721] [<ffffffff816c50e9>] system_call_fastpath+0x16/0x1b [ 24.666590] ---[ end trace d4228967fcb7ae4a ]---
On Mon, Mar 17, 2014 at 5:17 PM, Woodhouse, David <david.woodhouse@intel.com> wrote: > I think I have found this problem on the IOMMU side. We usually assume > that RMRRs are for boot-time only, such as USB controllers for the > legacy keyboard/mouse emulation. And a patch sneaked in which > effectively *unmaps* the RMRR regions when you do the first "real" > mapping for the driver. Having fixed that, I think I should no longer > see these DMA faults. Which commit is this? I've a patch+bug report that stolen + dmar is busted. -Daniel
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c index 40a2b36..694112a 100644 --- a/drivers/gpu/drm/i915/i915_gem_gtt.c +++ b/drivers/gpu/drm/i915/i915_gem_gtt.c @@ -1250,14 +1250,7 @@ static inline unsigned int gen8_get_total_gtt_size(u16 bdw_gmch_ctl) { bdw_gmch_ctl >>= BDW_GMCH_GGMS_SHIFT; bdw_gmch_ctl &= BDW_GMCH_GGMS_MASK; - if (bdw_gmch_ctl) - bdw_gmch_ctl = 1 << bdw_gmch_ctl; - if (bdw_gmch_ctl > 4) { - WARN_ON(!i915_preliminary_hw_support); - return 4<<20; - } - - return bdw_gmch_ctl << 20; + return 1 << (bdw_gmch_ctl + 20); } static inline size_t gen6_get_stolen_size(u16 snb_gmch_ctl)
The preliminary HW support check is no longer needed, and the calculation is simplified while here. Reported-by: David Woodhouse <David.Woodhouse@intel.com> Signed-off-by: Ben Widawsky <ben@bwidawsk.net> --- drivers/gpu/drm/i915/i915_gem_gtt.c | 9 +-------- 1 file changed, 1 insertion(+), 8 deletions(-)