drm/i915: Init PPGTT before context enable

Message ID 1421774502.13994.200.camel@infradead.org (mailing list archive)
State New, archived

Commit Message

David Woodhouse Jan. 20, 2015, 5:21 p.m. UTC
Commit 82460d972 ("drm/i915: Rework ppgtt init to no require an aliasing
ppgtt") introduced a regression on Broadwell, triggering the following
IOMMU fault at startup:

  vgaarb: device changed decodes: PCI:0000:00:02.0,olddecodes=io+mem,decodes=io+mem:owns=io+mem
  dmar: DRHD: handling fault status reg 2
  dmar: DMAR:[DMA Write] Request device [00:02.0] fault addr 880000
  DMAR:[fault reason 23] Unknown
  fbcon: inteldrmfb (fb0) is primary device

Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
Cc: stable@vger.kernel.org
---
[17:01] <danvet> can you pls submit this as a patch to intel-gfx?
[17:01] <danvet> with the comment above context_enable binned, since that's outdated now anyway

 drivers/gpu/drm/i915/i915_gem.c | 19 ++++++-------------
 1 file changed, 6 insertions(+), 13 deletions(-)

Comments

Daniel Vetter Jan. 21, 2015, 9:35 a.m. UTC | #1
On Tue, Jan 20, 2015 at 05:21:42PM +0000, David Woodhouse wrote:
> Commit 82460d972 ("drm/i915: Rework ppgtt init to no require an aliasing
> ppgtt") introduced a regression on Broadwell, triggering the following
> IOMMU fault at startup:
> 
>   vgaarb: device changed decodes: PCI:0000:00:02.0,olddecodes=io+mem,decodes=io+mem:owns=io+mem
>   dmar: DRHD: handling fault status reg 2
>   dmar: DMAR:[DMA Write] Request device [00:02.0] fault addr 880000
>   DMAR:[fault reason 23] Unknown
>   fbcon: inteldrmfb (fb0) is primary device
> 
> Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
> Cc: stable@vger.kernel.org

I suggested this change to David after staring at the offending patch for
a while. I have no idea or theory whatsoever why this would upset the gpu
less than the other way round. But it seems to work. David promised to
chase hw people a bit more to get a more meaningful answer.

Wrt the comment that this deletes: I've done some digging and afaict
loading context before ppgtt enable was required before our recent
restructuring of the context/ppgtt init code: Before that, context sw setup
(i.e. allocating the default context) and hw setup were smashed together.
Also, the setup of the default context was the bit that actually allocated
the aliasing ppgtt structures, which is the reason for the context before
ppgtt dependency.

Or was, since with all the untangling there's no real dependency any more
(functional, who knows what the hw is doing), so the comment is just
stale.

Jani, can you pls paste my elaboration into the commit message when
applying? With that this is:

Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>

> ---
> [17:01] <danvet> can you pls submit this as a patch to intel-gfx?
> [17:01] <danvet> with the comment above context_enable binned, since that's outdated now anyway
> 
>  drivers/gpu/drm/i915/i915_gem.c | 19 ++++++-------------
>  1 file changed, 6 insertions(+), 13 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index c11603b..b02a3f3 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -4884,25 +4884,18 @@ i915_gem_init_hw(struct drm_device *dev)
>  	for (i = 0; i < NUM_L3_SLICES(dev); i++)
>  		i915_gem_l3_remap(&dev_priv->ring[RCS], i);
>  
> -	/*
> -	 * XXX: Contexts should only be initialized once. Doing a switch to the
> -	 * default context switch however is something we'd like to do after
> -	 * reset or thaw (the latter may not actually be necessary for HW, but
> -	 * goes with our code better). Context switching requires rings (for
> -	 * the do_switch), but before enabling PPGTT. So don't move this.
> -	 */
> -	ret = i915_gem_context_enable(dev_priv);
> +	ret = i915_ppgtt_init_hw(dev);
>  	if (ret && ret != -EIO) {
> -		DRM_ERROR("Context enable failed %d\n", ret);
> +		DRM_ERROR("PPGTT enable failed %d\n", ret);
>  		i915_gem_cleanup_ringbuffer(dev);
> -
> -		return ret;
>  	}
>  
> -	ret = i915_ppgtt_init_hw(dev);
> +	ret = i915_gem_context_enable(dev_priv);
>  	if (ret && ret != -EIO) {
> -		DRM_ERROR("PPGTT enable failed %d\n", ret);
> +		DRM_ERROR("Context enable failed %d\n", ret);
>  		i915_gem_cleanup_ringbuffer(dev);
> +
> +		return ret;
>  	}
>  
>  	return ret;
> -- 
> 2.1.0
> 
> 
> 
> 
> -- 
> David Woodhouse                            Open Source Technology Centre
> David.Woodhouse@intel.com                              Intel Corporation
>
Chris Wilson Jan. 21, 2015, 9:51 a.m. UTC | #2
On Wed, Jan 21, 2015 at 10:35:58AM +0100, Daniel Vetter wrote:
> On Tue, Jan 20, 2015 at 05:21:42PM +0000, David Woodhouse wrote:
> > Commit 82460d972 ("drm/i915: Rework ppgtt init to no require an aliasing
> > ppgtt") introduced a regression on Broadwell, triggering the following
> > IOMMU fault at startup:
> > 
> >   vgaarb: device changed decodes: PCI:0000:00:02.0,olddecodes=io+mem,decodes=io+mem:owns=io+mem
> >   dmar: DRHD: handling fault status reg 2
> >   dmar: DMAR:[DMA Write] Request device [00:02.0] fault addr 880000
> >   DMAR:[fault reason 23] Unknown
> >   fbcon: inteldrmfb (fb0) is primary device
> > 
> > Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
> > Cc: stable@vger.kernel.org
> 
> I suggested this change to David after staring at the offending patch for
> a while. I have no idea or theory whatsoever why this would upset the gpu
> less than the other way round. But it seems to work. David promised to
> chase hw people a bit more to get a more meaningful answer.

The issue is likely the execution of the golden render state batch
concurrently with the flip over to ppgtt. The GPU throws a pagefault and
we get an ERROR reported.

http://patchwork.freedesktop.org/patch/38270/
http://patchwork.freedesktop.org/patch/38269/
-Chris
Jani Nikula Jan. 21, 2015, 4:33 p.m. UTC | #3
On Wed, 21 Jan 2015, Chris Wilson <chris@chris-wilson.co.uk> wrote:
> On Wed, Jan 21, 2015 at 10:35:58AM +0100, Daniel Vetter wrote:
>> On Tue, Jan 20, 2015 at 05:21:42PM +0000, David Woodhouse wrote:
>> > Commit 82460d972 ("drm/i915: Rework ppgtt init to no require an aliasing
>> > ppgtt") introduced a regression on Broadwell, triggering the following
>> > IOMMU fault at startup:
>> > 
>> >   vgaarb: device changed decodes: PCI:0000:00:02.0,olddecodes=io+mem,decodes=io+mem:owns=io+mem
>> >   dmar: DRHD: handling fault status reg 2
>> >   dmar: DMAR:[DMA Write] Request device [00:02.0] fault addr 880000
>> >   DMAR:[fault reason 23] Unknown
>> >   fbcon: inteldrmfb (fb0) is primary device
>> > 
>> > Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
>> > Cc: stable@vger.kernel.org
>> 
>> I suggested this change to David after staring at the offending patch for
>> a while. I have no idea or theory whatsoever why this would upset the gpu
>> less than the other way round. But it seems to work. David promised to
>> chase hw people a bit more to get a more meaningful answer.
>
> The issue is likely the execution of the golden render state batch
> concurrently with the flip over to ppgtt. The GPU throws a pagefault and
> we get an ERROR reported.
>
> http://patchwork.freedesktop.org/patch/38270/
> http://patchwork.freedesktop.org/patch/38269/

Pushed the revert to drm-intel-fixes, it's cc: stable and it's getting
late in the rc's too. Thanks for the patch and review.

BR,
Jani.


> -Chris
>
> -- 
> Chris Wilson, Intel Open Source Technology Centre
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/intel-gfx
Jani Nikula Jan. 21, 2015, 5:56 p.m. UTC | #4
On Wed, 21 Jan 2015, Jani Nikula <jani.nikula@linux.intel.com> wrote:
> On Wed, 21 Jan 2015, Chris Wilson <chris@chris-wilson.co.uk> wrote:
>> On Wed, Jan 21, 2015 at 10:35:58AM +0100, Daniel Vetter wrote:
>>> On Tue, Jan 20, 2015 at 05:21:42PM +0000, David Woodhouse wrote:
>>> > Commit 82460d972 ("drm/i915: Rework ppgtt init to no require an aliasing
>>> > ppgtt") introduced a regression on Broadwell, triggering the following
>>> > IOMMU fault at startup:
>>> > 
>>> >   vgaarb: device changed decodes: PCI:0000:00:02.0,olddecodes=io+mem,decodes=io+mem:owns=io+mem
>>> >   dmar: DRHD: handling fault status reg 2
>>> >   dmar: DMAR:[DMA Write] Request device [00:02.0] fault addr 880000
>>> >   DMAR:[fault reason 23] Unknown
>>> >   fbcon: inteldrmfb (fb0) is primary device
>>> > 
>>> > Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
>>> > Cc: stable@vger.kernel.org
>>> 
>>> I suggested this change to David after staring at the offending patch for
>>> a while. I have no idea or theory whatsoever why this would upset the gpu
>>> less than the other way round. But it seems to work. David promised to
>>> chase hw people a bit more to get a more meaningful answer.
>>
>> The issue is likely the execution of the golden render state batch
>> concurrently with the flip over to ppgtt. The GPU throws a pagefault and
>> we get an ERROR reported.
>>
>> http://patchwork.freedesktop.org/patch/38270/
>> http://patchwork.freedesktop.org/patch/38269/
>
> Pushed the revert to drm-intel-fixes, it's cc: stable and it's getting
> late in the rc's too. Thanks for the patch and review.

s/revert/David's patch/ to be clear.

BR,
Jani.


>
> BR,
> Jani.
>
>
>> -Chris
>>
>> -- 
>> Chris Wilson, Intel Open Source Technology Centre
>
> -- 
> Jani Nikula, Intel Open Source Technology Center

Patch

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index c11603b..b02a3f3 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -4884,25 +4884,18 @@ i915_gem_init_hw(struct drm_device *dev)
 	for (i = 0; i < NUM_L3_SLICES(dev); i++)
 		i915_gem_l3_remap(&dev_priv->ring[RCS], i);
 
-	/*
-	 * XXX: Contexts should only be initialized once. Doing a switch to the
-	 * default context switch however is something we'd like to do after
-	 * reset or thaw (the latter may not actually be necessary for HW, but
-	 * goes with our code better). Context switching requires rings (for
-	 * the do_switch), but before enabling PPGTT. So don't move this.
-	 */
-	ret = i915_gem_context_enable(dev_priv);
+	ret = i915_ppgtt_init_hw(dev);
 	if (ret && ret != -EIO) {
-		DRM_ERROR("Context enable failed %d\n", ret);
+		DRM_ERROR("PPGTT enable failed %d\n", ret);
 		i915_gem_cleanup_ringbuffer(dev);
-
-		return ret;
 	}
 
-	ret = i915_ppgtt_init_hw(dev);
+	ret = i915_gem_context_enable(dev_priv);
 	if (ret && ret != -EIO) {
-		DRM_ERROR("PPGTT enable failed %d\n", ret);
+		DRM_ERROR("Context enable failed %d\n", ret);
 		i915_gem_cleanup_ringbuffer(dev);
+
+		return ret;
 	}
 
 	return ret;
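
To make the resulting control flow easier to follow without applying the diff, here is a minimal userspace sketch of the tail of i915_gem_init_hw() as it reads after the patch. Everything named `fake_*`, the `trace` struct and `init_hw_tail()` are hypothetical stand-ins invented for illustration; they are not the real driver functions and do not touch any hardware. The sketch only records the order in which the steps run and reproduces the error handling shown in the diff.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>

/* Hypothetical step labels; they stand in for the real driver calls. */
enum step { STEP_PPGTT_INIT, STEP_CONTEXT_ENABLE, STEP_CLEANUP_RINGS };

#define MAX_STEPS 8

/* Records the order in which the fake steps were executed. */
struct trace {
	enum step steps[MAX_STEPS];
	int len;
};

/* Fake device: the return values of the two init steps are configurable. */
struct fake_hw {
	int ppgtt_ret;
	int context_ret;
	struct trace trace;
};

static void record(struct fake_hw *hw, enum step s)
{
	assert(hw->trace.len < MAX_STEPS);
	hw->trace.steps[hw->trace.len++] = s;
}

/* Stand-in for i915_ppgtt_init_hw(). */
static int fake_ppgtt_init_hw(struct fake_hw *hw)
{
	record(hw, STEP_PPGTT_INIT);
	return hw->ppgtt_ret;
}

/* Stand-in for i915_gem_context_enable(). */
static int fake_context_enable(struct fake_hw *hw)
{
	record(hw, STEP_CONTEXT_ENABLE);
	return hw->context_ret;
}

/* Stand-in for i915_gem_cleanup_ringbuffer(). */
static void fake_cleanup_ringbuffer(struct fake_hw *hw)
{
	record(hw, STEP_CLEANUP_RINGS);
}

/* Mirrors the post-patch ordering: PPGTT first, then context enable. */
static int init_hw_tail(struct fake_hw *hw)
{
	int ret;

	ret = fake_ppgtt_init_hw(hw);
	if (ret && ret != -EIO) {
		fprintf(stderr, "PPGTT enable failed %d\n", ret);
		fake_cleanup_ringbuffer(hw);
		/* The posted diff has no return in this branch, so the
		 * sketch also continues to the context enable step. */
	}

	ret = fake_context_enable(hw);
	if (ret && ret != -EIO) {
		fprintf(stderr, "Context enable failed %d\n", ret);
		fake_cleanup_ringbuffer(hw);

		return ret;
	}

	return ret;
}
```

Calling `init_hw_tail()` on a zero-initialised `struct fake_hw` and then reading `trace.steps` shows the PPGTT step recorded before the context step, which is the ordering change the patch makes. Setting `ppgtt_ret` or `context_ret` to an error such as `-ENOMEM` exercises the two error branches.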