diff mbox

drm/i915: Initialize HWS page address after GPU reset

Message ID 1433272019-21818-1-git-send-email-arun.siluvery@linux.intel.com (mailing list archive)
State New, archived
Headers show

Commit Message

arun.siluvery@linux.intel.com June 2, 2015, 7:06 p.m. UTC
After GPU reset, HW is losing the address of HWS page in the register.
The page itself is valid except that HW is not aware of its location.

[   64.368623] [drm:gen8_init_common_ring [i915]] *ERROR* HWS Page address = 0x00000000
[   64.368655] [drm:gen8_init_common_ring [i915]] *ERROR* HWS Page address = 0x00000000
[   64.368681] [drm:gen8_init_common_ring [i915]] *ERROR* HWS Page address = 0x00000000
[   64.368704] [drm:gen8_init_common_ring [i915]] *ERROR* HWS Page address = 0x00000000

This patch reloads this value into the register during ring init.

Change-Id: Ibdd1e4645921b4deb02dfc8d0d8e6ba993ce7371
Signed-off-by: Arun Siluvery <arun.siluvery@linux.intel.com>
---
 drivers/gpu/drm/i915/intel_lrc.c | 6 ++++++
 1 file changed, 6 insertions(+)

Comments

Shuang He June 3, 2015, 3:13 a.m. UTC | #1
Tested-By: Intel Graphics QA PRTS (Patch Regression Test System Contact: shuang.he@intel.com)
Task id: 6526
-------------------------------------Summary-------------------------------------
Platform          Delta          drm-intel-nightly          Series Applied
PNV                                  276/276              276/276
ILK                                  303/303              303/303
SNB                 -1              315/315              314/315
IVB                                  343/343              343/343
BYT                                  287/287              287/287
BDW                                  321/321              321/321
-------------------------------------Detailed-------------------------------------
Platform  Test                                drm-intel-nightly          Series Applied
*SNB  igt@pm_rpm@dpms-mode-unset-non-lpsp      PASS(1)      DMESG_WARN(1)
(dmesg patch applied)WARNING:at_drivers/gpu/drm/i915/intel_uncore.c:#assert_device_not_suspended[i915]()@WARNING:.* at .* assert_device_not_suspended+0x
Note: You need to pay more attention to line start with '*'
Ville Syrjälä June 3, 2015, 4:14 p.m. UTC | #2
On Tue, Jun 02, 2015 at 08:06:59PM +0100, Arun Siluvery wrote:
> After GPU reset, HW is losing the address of HWS page in the register.
> The page itself is valid except that HW is not aware of its location.
> 
> [   64.368623] [drm:gen8_init_common_ring [i915]] *ERROR* HWS Page address = 0x00000000
> [   64.368655] [drm:gen8_init_common_ring [i915]] *ERROR* HWS Page address = 0x00000000
> [   64.368681] [drm:gen8_init_common_ring [i915]] *ERROR* HWS Page address = 0x00000000
> [   64.368704] [drm:gen8_init_common_ring [i915]] *ERROR* HWS Page address = 0x00000000
> 
> This patch reloads this value into the register during ring init.
> 
> Change-Id: Ibdd1e4645921b4deb02dfc8d0d8e6ba993ce7371
> Signed-off-by: Arun Siluvery <arun.siluvery@linux.intel.com>
> ---
>  drivers/gpu/drm/i915/intel_lrc.c | 6 ++++++
>  1 file changed, 6 insertions(+)
> 
> diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
> index 0413b8f..73033f5 100644
> --- a/drivers/gpu/drm/i915/intel_lrc.c
> +++ b/drivers/gpu/drm/i915/intel_lrc.c
> @@ -1085,6 +1085,12 @@ static int gen8_init_common_ring(struct intel_engine_cs *ring)
>  	I915_WRITE_IMR(ring, ~(ring->irq_enable_mask | ring->irq_keep_mask));
>  	I915_WRITE(RING_HWSTAM(ring->mmio_base), 0xffffffff);
>  
> +	if (ring->status_page.obj) {
> +		I915_WRITE(RING_HWS_PGA(ring->mmio_base),
> +			   (u32)ring->status_page.gfx_addr);
> +		POSTING_READ(RING_HWS_PGA(ring->mmio_base));
> +	}

I was going to suggest removing the same thing from the
lrc_setup_hardware_status_page(), but after another look it seems we
sometimes call .init_hw() before the context setup. Would be nice to
have a more consistent sequence for init and reset. But anyway the patch
looks OK to me. I verified that we indeed lose this register on GPU
reset.

Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>

> +
>  	I915_WRITE(RING_MODE_GEN7(ring),
>  		   _MASKED_BIT_DISABLE(GFX_REPLAY_MODE) |
>  		   _MASKED_BIT_ENABLE(GFX_RUN_LIST_ENABLE));
> -- 
> 2.3.0
> 
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/intel-gfx
Jani Nikula June 4, 2015, 8:18 a.m. UTC | #3
On Wed, 03 Jun 2015, Ville Syrjälä <ville.syrjala@linux.intel.com> wrote:
> On Tue, Jun 02, 2015 at 08:06:59PM +0100, Arun Siluvery wrote:
>> After GPU reset, HW is losing the address of HWS page in the register.
>> The page itself is valid except that HW is not aware of its location.
>> 
>> [   64.368623] [drm:gen8_init_common_ring [i915]] *ERROR* HWS Page address = 0x00000000
>> [   64.368655] [drm:gen8_init_common_ring [i915]] *ERROR* HWS Page address = 0x00000000
>> [   64.368681] [drm:gen8_init_common_ring [i915]] *ERROR* HWS Page address = 0x00000000
>> [   64.368704] [drm:gen8_init_common_ring [i915]] *ERROR* HWS Page address = 0x00000000
>> 
>> This patch reloads this value into the register during ring init.
>> 
>> Change-Id: Ibdd1e4645921b4deb02dfc8d0d8e6ba993ce7371
>> Signed-off-by: Arun Siluvery <arun.siluvery@linux.intel.com>
>> ---
>>  drivers/gpu/drm/i915/intel_lrc.c | 6 ++++++
>>  1 file changed, 6 insertions(+)
>> 
>> diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
>> index 0413b8f..73033f5 100644
>> --- a/drivers/gpu/drm/i915/intel_lrc.c
>> +++ b/drivers/gpu/drm/i915/intel_lrc.c
>> @@ -1085,6 +1085,12 @@ static int gen8_init_common_ring(struct intel_engine_cs *ring)
>>  	I915_WRITE_IMR(ring, ~(ring->irq_enable_mask | ring->irq_keep_mask));
>>  	I915_WRITE(RING_HWSTAM(ring->mmio_base), 0xffffffff);
>>  
>> +	if (ring->status_page.obj) {
>> +		I915_WRITE(RING_HWS_PGA(ring->mmio_base),
>> +			   (u32)ring->status_page.gfx_addr);
>> +		POSTING_READ(RING_HWS_PGA(ring->mmio_base));
>> +	}
>
> I was going to suggest removing the same thing from the
> lrc_setup_hardware_status_page(), but after another look it seems we
> sometimes call .init_hw() before the context setup. Would be nice to
> have a more consistent sequence for init and reset. But anyway the patch
> looks OK to me. I verified that we indeed lose this register on GPU
> reset.
>
> Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>

Pushed to drm-intel-fixes, thanks for the patch and review.

BR,
Jani.


>
>> +
>>  	I915_WRITE(RING_MODE_GEN7(ring),
>>  		   _MASKED_BIT_DISABLE(GFX_REPLAY_MODE) |
>>  		   _MASKED_BIT_ENABLE(GFX_RUN_LIST_ENABLE));
>> -- 
>> 2.3.0
>> 
>> _______________________________________________
>> Intel-gfx mailing list
>> Intel-gfx@lists.freedesktop.org
>> http://lists.freedesktop.org/mailman/listinfo/intel-gfx
>
> -- 
> Ville Syrjälä
> Intel OTC
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/intel-gfx
Daniel Vetter June 15, 2015, 5:20 a.m. UTC | #4
On Wed, Jun 3, 2015 at 6:14 PM, Ville Syrjälä
<ville.syrjala@linux.intel.com> wrote:
> I was going to suggest removing the same thing from the
> lrc_setup_hardware_status_page(), but after another look it seems we
> sometimes call .init_hw() before the context setup. Would be nice to
> have a more consistent sequence for init and reset. But anyway the patch
> looks OK to me. I verified that we indeed lose this register on GPU
> reset.

Yep, this is a mess. And historically _any_ difference between driver
load and gpu reset (or resume fwiw) has lead to hilarious bugs, so
this difference is really troubling to me. Arun, can you please work
on a patch to unify the setup sequence here, so that both driver load
gpu resets work the same way? By the time we're calling gem_init_hw
the default context should have been created already, and hence we
should be able to write to HWS_PGA in ring->init_hw only.

Also I wonder about resume, where's the HWS_PGA restore for that case?
-Daniel
arun.siluvery@linux.intel.com June 18, 2015, 2:05 p.m. UTC | #5
On 15/06/2015 06:20, Daniel Vetter wrote:
> On Wed, Jun 3, 2015 at 6:14 PM, Ville Syrjälä
> <ville.syrjala@linux.intel.com> wrote:
>> I was going to suggest removing the same thing from the
>> lrc_setup_hardware_status_page(), but after another look it seems we
>> sometimes call .init_hw() before the context setup. Would be nice to
>> have a more consistent sequence for init and reset. But anyway the patch
>> looks OK to me. I verified that we indeed lose this register on GPU
>> reset.
>
> Yep, this is a mess. And historically _any_ difference between driver
> load and gpu reset (or resume fwiw) has lead to hilarious bugs, so
> this difference is really troubling to me. Arun, can you please work
> on a patch to unify the setup sequence here, so that both driver load
> gpu resets work the same way? By the time we're calling gem_init_hw
> the default context should have been created already, and hence we
> should be able to write to HWS_PGA in ring->init_hw only.
>

Hi Daniel,

I think the problem in this case was the code to init HWS page after 
reset was missing for Gen8+. For Gen7 we are doing this as part of 
ring->init_hw.

Gen7:
i915_reset()
+--> i915_gem_init_hw()
+------> ring->init_hw() which is init_render_ring()
+----------> init_ring_common()
+------------> intel_ring_setup_status_page()

Gen8:
i915_reset()
+--> i915_gem_init_hw()
+------> ring->init_hw() which is gen8_init_render_ring()
+--------> gen8_init_common_ring() - I added changes in this function.

We could probably use intel_ring_setup_status_page() for both cases, 
does it have to be Gen7 specific?

> Also I wonder about resume, where's the HWS_PGA restore for that case?
It is covered.

i915_drm_resume()
+-->i915_gem_init_hw

regards
Arun

> -Daniel
>
Daniel Vetter June 18, 2015, 2:53 p.m. UTC | #6
On Thu, Jun 18, 2015 at 03:05:12PM +0100, Siluvery, Arun wrote:
> On 15/06/2015 06:20, Daniel Vetter wrote:
> >On Wed, Jun 3, 2015 at 6:14 PM, Ville Syrjälä
> ><ville.syrjala@linux.intel.com> wrote:
> >>I was going to suggest removing the same thing from the
> >>lrc_setup_hardware_status_page(), but after another look it seems we
> >>sometimes call .init_hw() before the context setup. Would be nice to
> >>have a more consistent sequence for init and reset. But anyway the patch
> >>looks OK to me. I verified that we indeed lose this register on GPU
> >>reset.
> >
> >Yep, this is a mess. And historically _any_ difference between driver
> >load and gpu reset (or resume fwiw) has lead to hilarious bugs, so
> >this difference is really troubling to me. Arun, can you please work
> >on a patch to unify the setup sequence here, so that both driver load
> >gpu resets work the same way? By the time we're calling gem_init_hw
> >the default context should have been created already, and hence we
> >should be able to write to HWS_PGA in ring->init_hw only.
> >
> 
> Hi Daniel,
> 
> I think the problem in this case was the code to init HWS page after reset
> was missing for Gen8+. For Gen7 we are doing this as part of ring->init_hw.
> 
> Gen7:
> i915_reset()
> +--> i915_gem_init_hw()
> +------> ring->init_hw() which is init_render_ring()
> +----------> init_ring_common()
> +------------> intel_ring_setup_status_page()
> 
> Gen8:
> i915_reset()
> +--> i915_gem_init_hw()
> +------> ring->init_hw() which is gen8_init_render_ring()
> +--------> gen8_init_common_ring() - I added changes in this function.
> 
> We could probably use intel_ring_setup_status_page() for both cases, does it
> have to be Gen7 specific?

My concern isn't that we have two functions doing hws setup. My concern is
that we now have 2 callsites for execlist mode doing hws setup, with
slight differences between reset/driver load and resume.

I want one, unconditional call to set up the hws page at exactly the right
place in the setup sequence. That might require some refactoring, I
haven't looked that closely at intel_lrc.c

The usual approach is that gem_init does exclusively software setup, and
gem_init_hw does all the register writes an actual enabling. I think the
hws setup in the driver load code is currently called from gem_init(),
which is the wrong place.

> >Also I wonder about resume, where's the HWS_PGA restore for that case?
> It is covered.
> 
> i915_drm_resume()
> +-->i915_gem_init_hw

Ok, so should be covered with whatever fix we have for gpu reset.

Thanks, Daniel
diff mbox

Patch

diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 0413b8f..73033f5 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -1085,6 +1085,12 @@  static int gen8_init_common_ring(struct intel_engine_cs *ring)
 	I915_WRITE_IMR(ring, ~(ring->irq_enable_mask | ring->irq_keep_mask));
 	I915_WRITE(RING_HWSTAM(ring->mmio_base), 0xffffffff);
 
+	if (ring->status_page.obj) {
+		I915_WRITE(RING_HWS_PGA(ring->mmio_base),
+			   (u32)ring->status_page.gfx_addr);
+		POSTING_READ(RING_HWS_PGA(ring->mmio_base));
+	}
+
 	I915_WRITE(RING_MODE_GEN7(ring),
 		   _MASKED_BIT_DISABLE(GFX_REPLAY_MODE) |
 		   _MASKED_BIT_ENABLE(GFX_RUN_LIST_ENABLE));