diff mbox

drm/i915: Update ring space correctly on lrc context reset

Message ID 1440081299-9814-1-git-send-email-mika.kuoppala@intel.com (mailing list archive)
State New, archived
Headers show

Commit Message

Mika Kuoppala Aug. 20, 2015, 2:34 p.m. UTC
If we leave the last_retired_head to pre-reset value, we might
end up in a situation where intel_ring_space() returns wrong
value on next hardware init.

The recent GuC changes made ringbuffer size much smaller. Thus
the odds grew that we got pre-reset last_retired_head in
a value so that intel_ring_space() returned too small value after
reset for mocs values to fit, triggering the following trace:

[  337.622311] ------------[ cut here ]------------
[  337.622362] WARNING: CPU: 0 PID: 1355 at drivers/gpu/drm/i915/intel_lrc.c:678 intel_logical_ring_begin+0x171/0x1dd [i915]()
[  337.622365] WARN_ON(&target->list == &ring->request_list)
[  337.622368] Modules linked in: snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic snd_hda_intel snd_hda_codec snd_hda_core snd_hwdep snd_pcm_oss snd_mixer_oss snd_pcm snd_seq_dummy x86_pkg_temp_thermal coretemp kvm_intel snd_seq_oss kvm microcode snd_seq_midi asix joydev snd_rawmidi usbnet snd_seq_midi_event input_leds serio_raw snd_seq snd_seq_device snd_timer i915 snd soundcore drm_kms_helper shpchp syscopyarea sysfillrect sysimgblt fb_sys_fops drm wmi battery ipv6 bnep video bluetooth rfkill ac parport_pc button ppdev acpi_cpufreq lp parport sdhci_pci sdhci led_class mmc_core
[  337.622461] CPU: 0 PID: 1355 Comm: kworker/u16:2 Tainted: G     U  W       4.2.0-rc5-drm-intel-nightly-ww33+ #1
[  337.622465] Hardware name: Intel Corporation Skylake Client platform/Skylake Y LPDDR3 RVP3, BIOS SKLSE2R1.R00.X093.B02.1507222151 07/22/2015
[  337.622496] Workqueue: i915-hangcheck i915_hangcheck_elapsed [i915]
[  337.622500]  0000000000000009 ffff8800370838e8 ffffffff81897fb7 0000000000000000
[  337.622508]  ffff880037083938 ffff880037083928 ffffffff810467c3 ffff880037083908
[  337.622515]  ffffffffa02a6aea ffff8800873400c0 ffff88016d79e480 0000000000000000
[  337.622523] Call Trace:
[  337.622533]  [<ffffffff81897fb7>] dump_stack+0x45/0x57
[  337.622540]  [<ffffffff810467c3>] warn_slowpath_common+0xa1/0xbb
[  337.622584]  [<ffffffffa02a6aea>] ? intel_logical_ring_begin+0x171/0x1dd [i915]
[  337.622590]  [<ffffffff81046823>] warn_slowpath_fmt+0x46/0x48
[  337.622632]  [<ffffffffa02a6aea>] intel_logical_ring_begin+0x171/0x1dd [i915]
[  337.622674]  [<ffffffffa02a986d>] intel_rcs_context_init_mocs+0x170/0x2a9 [i915]
[  337.622714]  [<ffffffffa02a6ef5>] ? gen8_emit_flush_render+0x174/0x18f [i915]
[  337.622753]  [<ffffffffa02a77f0>] gen8_init_rcs_context+0x9d/0x1f9 [i915]
[  337.622792]  [<ffffffffa02a7248>] ? intel_logical_ring_reserve_space+0x26/0x2a [i915]
[  337.622827]  [<ffffffffa028ba91>] i915_gem_context_enable+0x26/0x4d [i915]
[  337.622866]  [<ffffffffa029a1b5>] i915_gem_init_hw+0x285/0x371 [i915]
[  337.622892]  [<ffffffffa0268a10>] i915_reset+0xe2/0x132 [i915]
[  337.622920]  [<ffffffffa026cd71>] i915_reset_and_wakeup+0xd3/0x133 [i915]
[  337.622948]  [<ffffffffa02709e2>] i915_handle_error+0x5ab/0x5bd [i915]
[  337.622956]  [<ffffffff810962bf>] ? vprintk_default+0x1d/0x1f
[  337.622962]  [<ffffffff8189398b>] ? printk+0x46/0x48
[  337.622990]  [<ffffffffa0270dbc>] i915_hangcheck_elapsed+0x387/0x3a7 [i915]
[  337.622996]  [<ffffffff8105c889>] process_one_work+0x225/0x409
[  337.623001]  [<ffffffff8105c80a>] ? process_one_work+0x1a6/0x409
[  337.623007]  [<ffffffff8105d2ce>] worker_thread+0x275/0x369
[  337.623013]  [<ffffffff8105d059>] ? cancel_delayed_work_sync+0x15/0x15
[  337.623019]  [<ffffffff81061ec8>] kthread+0xed/0xf5
[  337.623027]  [<ffffffff81061ddb>] ? kthread_create_on_node+0x1ac/0x1ac
[  337.623033]  [<ffffffff818a030f>] ret_from_fork+0x3f/0x70
[  337.623039]  [<ffffffff81061ddb>] ? kthread_create_on_node+0x1ac/0x1ac
[  337.623043] ---[ end trace bbf071dfdac9d9da ]---
[  337.623081] [drm:gen8_init_rcs_context [i915]] *ERROR* MOCS failed to program: expect performance issues.

Setup last_retired_head correctly on context reset and
recalculate free space for ringbuf.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=91634
Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
---
 drivers/gpu/drm/i915/intel_lrc.c | 2 ++
 1 file changed, 2 insertions(+)

Comments

Chris Wilson Aug. 20, 2015, 3:27 p.m. UTC | #1
On Thu, Aug 20, 2015 at 05:34:59PM +0300, Mika Kuoppala wrote:
> If we leave the last_retired_head to pre-reset value, we might
> end up in a situation where intel_ring_space() returns wrong
> value on next hardware init.

http://patchwork.freedesktop.org/patch/46612/
and earlier
-Chris
arun.siluvery@linux.intel.com Sept. 2, 2015, 11:20 a.m. UTC | #2
On 20/08/2015 16:27, Chris Wilson wrote:
> On Thu, Aug 20, 2015 at 05:34:59PM +0300, Mika Kuoppala wrote:
>> If we leave the last_retired_head to pre-reset value, we might
>> end up in a situation where intel_ring_space() returns wrong
>> value on next hardware init.
>
> http://patchwork.freedesktop.org/patch/46612/
> and earlier
> -Chris
>
Hi Chris,

I see the warning even with below batch,

[Intel-gfx] [PATCH 50/70] drm/i915: The argument for postfix is redundant
http://patchwork.freedesktop.org/patch/46601/

the following patch need to be updated as it uses olr,
[Intel-gfx] [PATCH 51/70] drm/i915: Record the position of the start of 
the request
http://patchwork.freedesktop.org/patch/46612/

Do we need some of the previous patches in series as well?

This patch is fixing the issue in the current code, do you think we can 
get this in its current state?

regards
Arun
Jani Nikula Oct. 13, 2015, 1:02 p.m. UTC | #3
On Wed, 02 Sep 2015, Arun Siluvery <arun.siluvery@linux.intel.com> wrote:
> On 20/08/2015 16:27, Chris Wilson wrote:
>> On Thu, Aug 20, 2015 at 05:34:59PM +0300, Mika Kuoppala wrote:
>>> If we leave the last_retired_head to pre-reset value, we might
>>> end up in a situation where intel_ring_space() returns wrong
>>> value on next hardware init.
>>
>> http://patchwork.freedesktop.org/patch/46612/
>> and earlier
>> -Chris
>>
> Hi Chris,
>
> I see the warning even with below batch,
>
> [Intel-gfx] [PATCH 50/70] drm/i915: The argument for postfix is redundant
> http://patchwork.freedesktop.org/patch/46601/
>
> the following patch need to be updated as it uses olr,
> [Intel-gfx] [PATCH 51/70] drm/i915: Record the position of the start of 
> the request
> http://patchwork.freedesktop.org/patch/46612/
>
> Do we need some of the previous patches in series as well?
>
> This patch is fixing the issue in the current code, do you think we can 
> get this in its current state?

Is this patch still valid?

BR,
Jani.
Mika Kuoppala Oct. 23, 2015, 10:39 a.m. UTC | #4
Jani Nikula <jani.nikula@linux.intel.com> writes:

> On Wed, 02 Sep 2015, Arun Siluvery <arun.siluvery@linux.intel.com> wrote:
>> On 20/08/2015 16:27, Chris Wilson wrote:
>>> On Thu, Aug 20, 2015 at 05:34:59PM +0300, Mika Kuoppala wrote:
>>>> If we leave the last_retired_head to pre-reset value, we might
>>>> end up in a situation where intel_ring_space() returns wrong
>>>> value on next hardware init.
>>>
>>> http://patchwork.freedesktop.org/patch/46612/
>>> and earlier
>>> -Chris
>>>
>> Hi Chris,
>>
>> I see the warning even with below batch,
>>
>> [Intel-gfx] [PATCH 50/70] drm/i915: The argument for postfix is redundant
>> http://patchwork.freedesktop.org/patch/46601/
>>
>> the following patch need to be updated as it uses olr,
>> [Intel-gfx] [PATCH 51/70] drm/i915: Record the position of the start of 
>> the request
>> http://patchwork.freedesktop.org/patch/46612/
>>
>> Do we need some of the previous patches in series as well?
>>
>> This patch is fixing the issue in the current code, do you think we can 
>> get this in its current state?
>
> Is this patch still valid?
>

My understanding is that Chris wants more throughout
revamp of the code so that we have a common ringbuffer init.
And then go further and remove the special reset handling in
execlist case.

But argument for this patch is that it fixes a bug in current
nightly and it makes the legacy ring init and execlist ring init
identical how they set/reset the ring space. Until we gain
the generic ringbuffer init code.

Chris?

-Mika

> BR,
> Jani.
>
>
> -- 
> Jani Nikula, Intel Open Source Technology Center
Chris Wilson Oct. 23, 2015, 10:51 a.m. UTC | #5
On Fri, Oct 23, 2015 at 01:39:24PM +0300, Mika Kuoppala wrote:
> Jani Nikula <jani.nikula@linux.intel.com> writes:
> 
> > On Wed, 02 Sep 2015, Arun Siluvery <arun.siluvery@linux.intel.com> wrote:
> >> On 20/08/2015 16:27, Chris Wilson wrote:
> >>> On Thu, Aug 20, 2015 at 05:34:59PM +0300, Mika Kuoppala wrote:
> >>>> If we leave the last_retired_head to pre-reset value, we might
> >>>> end up in a situation where intel_ring_space() returns wrong
> >>>> value on next hardware init.
> >>>
> >>> http://patchwork.freedesktop.org/patch/46612/
> >>> and earlier
> >>> -Chris
> >>>
> >> Hi Chris,
> >>
> >> I see the warning even with below batch,
> >>
> >> [Intel-gfx] [PATCH 50/70] drm/i915: The argument for postfix is redundant
> >> http://patchwork.freedesktop.org/patch/46601/
> >>
> >> the following patch need to be updated as it uses olr,
> >> [Intel-gfx] [PATCH 51/70] drm/i915: Record the position of the start of 
> >> the request
> >> http://patchwork.freedesktop.org/patch/46612/
> >>
> >> Do we need some of the previous patches in series as well?
> >>
> >> This patch is fixing the issue in the current code, do you think we can 
> >> get this in its current state?
> >
> > Is this patch still valid?
> >
> 
> My understanding is that Chris wants more throughout
> revamp of the code so that we have a common ringbuffer init.
> And then go further and remove the special reset handling in
> execlist case.
> 
> But argument for this patch is that it fixes a bug in current
> nightly and it makes the legacy ring init and execlist ring init
> identical how they set/reset the ring space. Until we gain
> the generic ringbuffer init code.

My argument is that we can progress the former with a simple generic patch
here that is agnostic of the legacy/execlists duplication.
-Chris
diff mbox

Patch

diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 60c683e..61c99e9 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -2507,5 +2507,7 @@  void intel_lr_context_reset(struct drm_device *dev,
 
 		ringbuf->head = 0;
 		ringbuf->tail = 0;
+		ringbuf->last_retired_head = -1;
+		intel_ring_update_space(ringbuf);
 	}
 }