Message ID | 1440081299-9814-1-git-send-email-mika.kuoppala@intel.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
On Thu, Aug 20, 2015 at 05:34:59PM +0300, Mika Kuoppala wrote: > If we leave the last_retired_head to pre-reset value, we might > end up in a situation where intel_ring_space() returns wrong > value on next hardware init. http://patchwork.freedesktop.org/patch/46612/ and earlier -Chris
On 20/08/2015 16:27, Chris Wilson wrote: > On Thu, Aug 20, 2015 at 05:34:59PM +0300, Mika Kuoppala wrote: >> If we leave the last_retired_head to pre-reset value, we might >> end up in a situation where intel_ring_space() returns wrong >> value on next hardware init. > > http://patchwork.freedesktop.org/patch/46612/ > and earlier > -Chris > Hi Chris, I see the warning even with below batch, [Intel-gfx] [PATCH 50/70] drm/i915: The argument for postfix is redundant http://patchwork.freedesktop.org/patch/46601/ the following patch need to be updated as it uses olr, [Intel-gfx] [PATCH 51/70] drm/i915: Record the position of the start of the request http://patchwork.freedesktop.org/patch/46612/ Do we need some of the previous patches in series as well? This patch is fixing the issue in the current code, do you think we can get this in its current state? regards Arun
On Wed, 02 Sep 2015, Arun Siluvery <arun.siluvery@linux.intel.com> wrote: > On 20/08/2015 16:27, Chris Wilson wrote: >> On Thu, Aug 20, 2015 at 05:34:59PM +0300, Mika Kuoppala wrote: >>> If we leave the last_retired_head to pre-reset value, we might >>> end up in a situation where intel_ring_space() returns wrong >>> value on next hardware init. >> >> http://patchwork.freedesktop.org/patch/46612/ >> and earlier >> -Chris >> > Hi Chris, > > I see the warning even with below batch, > > [Intel-gfx] [PATCH 50/70] drm/i915: The argument for postfix is redundant > http://patchwork.freedesktop.org/patch/46601/ > > the following patch need to be updated as it uses olr, > [Intel-gfx] [PATCH 51/70] drm/i915: Record the position of the start of > the request > http://patchwork.freedesktop.org/patch/46612/ > > Do we need some of the previous patches in series as well? > > This patch is fixing the issue in the current code, do you think we can > get this in its current state? Is this patch still valid? BR, Jani.
Jani Nikula <jani.nikula@linux.intel.com> writes: > On Wed, 02 Sep 2015, Arun Siluvery <arun.siluvery@linux.intel.com> wrote: >> On 20/08/2015 16:27, Chris Wilson wrote: >>> On Thu, Aug 20, 2015 at 05:34:59PM +0300, Mika Kuoppala wrote: >>>> If we leave the last_retired_head to pre-reset value, we might >>>> end up in a situation where intel_ring_space() returns wrong >>>> value on next hardware init. >>> >>> http://patchwork.freedesktop.org/patch/46612/ >>> and earlier >>> -Chris >>> >> Hi Chris, >> >> I see the warning even with below batch, >> >> [Intel-gfx] [PATCH 50/70] drm/i915: The argument for postfix is redundant >> http://patchwork.freedesktop.org/patch/46601/ >> >> the following patch need to be updated as it uses olr, >> [Intel-gfx] [PATCH 51/70] drm/i915: Record the position of the start of >> the request >> http://patchwork.freedesktop.org/patch/46612/ >> >> Do we need some of the previous patches in series as well? >> >> This patch is fixing the issue in the current code, do you think we can >> get this in its current state? > > Is this patch still valid? > My understanding is that Chris wants more throughout revamp of the code so that we have a common ringbuffer init. And then go further and remove the special reset handling in execlist case. But argument for this patch is that it fixes a bug in current nightly and it makes the legacy ring init and execlist ring init identical how they set/reset the ring space. Until we gain the generic ringbuffer init code. Chris? -Mika > BR, > Jani. > > > -- > Jani Nikula, Intel Open Source Technology Center
On Fri, Oct 23, 2015 at 01:39:24PM +0300, Mika Kuoppala wrote: > Jani Nikula <jani.nikula@linux.intel.com> writes: > > > On Wed, 02 Sep 2015, Arun Siluvery <arun.siluvery@linux.intel.com> wrote: > >> On 20/08/2015 16:27, Chris Wilson wrote: > >>> On Thu, Aug 20, 2015 at 05:34:59PM +0300, Mika Kuoppala wrote: > >>>> If we leave the last_retired_head to pre-reset value, we might > >>>> end up in a situation where intel_ring_space() returns wrong > >>>> value on next hardware init. > >>> > >>> http://patchwork.freedesktop.org/patch/46612/ > >>> and earlier > >>> -Chris > >>> > >> Hi Chris, > >> > >> I see the warning even with below batch, > >> > >> [Intel-gfx] [PATCH 50/70] drm/i915: The argument for postfix is redundant > >> http://patchwork.freedesktop.org/patch/46601/ > >> > >> the following patch need to be updated as it uses olr, > >> [Intel-gfx] [PATCH 51/70] drm/i915: Record the position of the start of > >> the request > >> http://patchwork.freedesktop.org/patch/46612/ > >> > >> Do we need some of the previous patches in series as well? > >> > >> This patch is fixing the issue in the current code, do you think we can > >> get this in its current state? > > > > Is this patch still valid? > > > > My understanding is that Chris wants more throughout > revamp of the code so that we have a common ringbuffer init. > And then go further and remove the special reset handling in > execlist case. > > But argument for this patch is that it fixes a bug in current > nightly and it makes the legacy ring init and execlist ring init > identical how they set/reset the ring space. Until we gain > the generic ringbuffer init code. My argument is that we can progress the former with a simple generic patch here that is agnostic of the legacy/execlists duplication. -Chris
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c index 60c683e..61c99e9 100644 --- a/drivers/gpu/drm/i915/intel_lrc.c +++ b/drivers/gpu/drm/i915/intel_lrc.c @@ -2507,5 +2507,7 @@ void intel_lr_context_reset(struct drm_device *dev, ringbuf->head = 0; ringbuf->tail = 0; + ringbuf->last_retired_head = -1; + intel_ring_update_space(ringbuf); } }
If we leave the last_retired_head to pre-reset value, we might end up in a situation where intel_ring_space() returns wrong value on next hardware init. The recent GuC changes made ringbuffer size much smaller. Thus the odds grew that we got pre-reset last_retired_head in a value so that intel_ring_space() returned too small value after reset for mocs values to fit, triggering the following trace: [ 337.622311] ------------[ cut here ]------------ [ 337.622362] WARNING: CPU: 0 PID: 1355 at drivers/gpu/drm/i915/intel_lrc.c:678 intel_logical_ring_begin+0x171/0x1dd [i915]() [ 337.622365] WARN_ON(&target->list == &ring->request_list) [ 337.622368] Modules linked in: snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic snd_hda_intel snd_hda_codec snd_hda_core snd_hwdep snd_pcm_oss snd_mixer_oss snd_pcm snd_seq_dummy x86_pkg_temp_thermal coretemp kvm_intel snd_seq_oss kvm microcode snd_seq_midi asix joydev snd_rawmidi usbnet snd_seq_midi_event input_leds serio_raw snd_seq snd_seq_device snd_timer i915 snd soundcore drm_kms_helper shpchp syscopyarea sysfillrect sysimgblt fb_sys_fops drm wmi battery ipv6 bnep video bluetooth rfkill ac parport_pc button ppdev acpi_cpufreq lp parport sdhci_pci sdhci led_class mmc_core [ 337.622461] CPU: 0 PID: 1355 Comm: kworker/u16:2 Tainted: G U W 4.2.0-rc5-drm-intel-nightly-ww33+ #1 [ 337.622465] Hardware name: Intel Corporation Skylake Client platform/Skylake Y LPDDR3 RVP3, BIOS SKLSE2R1.R00.X093.B02.1507222151 07/22/2015 [ 337.622496] Workqueue: i915-hangcheck i915_hangcheck_elapsed [i915] [ 337.622500] 0000000000000009 ffff8800370838e8 ffffffff81897fb7 0000000000000000 [ 337.622508] ffff880037083938 ffff880037083928 ffffffff810467c3 ffff880037083908 [ 337.622515] ffffffffa02a6aea ffff8800873400c0 ffff88016d79e480 0000000000000000 [ 337.622523] Call Trace: [ 337.622533] [<ffffffff81897fb7>] dump_stack+0x45/0x57 [ 337.622540] [<ffffffff810467c3>] warn_slowpath_common+0xa1/0xbb [ 337.622584] [<ffffffffa02a6aea>] ? intel_logical_ring_begin+0x171/0x1dd [i915] [ 337.622590] [<ffffffff81046823>] warn_slowpath_fmt+0x46/0x48 [ 337.622632] [<ffffffffa02a6aea>] intel_logical_ring_begin+0x171/0x1dd [i915] [ 337.622674] [<ffffffffa02a986d>] intel_rcs_context_init_mocs+0x170/0x2a9 [i915] [ 337.622714] [<ffffffffa02a6ef5>] ? gen8_emit_flush_render+0x174/0x18f [i915] [ 337.622753] [<ffffffffa02a77f0>] gen8_init_rcs_context+0x9d/0x1f9 [i915] [ 337.622792] [<ffffffffa02a7248>] ? intel_logical_ring_reserve_space+0x26/0x2a [i915] [ 337.622827] [<ffffffffa028ba91>] i915_gem_context_enable+0x26/0x4d [i915] [ 337.622866] [<ffffffffa029a1b5>] i915_gem_init_hw+0x285/0x371 [i915] [ 337.622892] [<ffffffffa0268a10>] i915_reset+0xe2/0x132 [i915] [ 337.622920] [<ffffffffa026cd71>] i915_reset_and_wakeup+0xd3/0x133 [i915] [ 337.622948] [<ffffffffa02709e2>] i915_handle_error+0x5ab/0x5bd [i915] [ 337.622956] [<ffffffff810962bf>] ? vprintk_default+0x1d/0x1f [ 337.622962] [<ffffffff8189398b>] ? printk+0x46/0x48 [ 337.622990] [<ffffffffa0270dbc>] i915_hangcheck_elapsed+0x387/0x3a7 [i915] [ 337.622996] [<ffffffff8105c889>] process_one_work+0x225/0x409 [ 337.623001] [<ffffffff8105c80a>] ? process_one_work+0x1a6/0x409 [ 337.623007] [<ffffffff8105d2ce>] worker_thread+0x275/0x369 [ 337.623013] [<ffffffff8105d059>] ? cancel_delayed_work_sync+0x15/0x15 [ 337.623019] [<ffffffff81061ec8>] kthread+0xed/0xf5 [ 337.623027] [<ffffffff81061ddb>] ? kthread_create_on_node+0x1ac/0x1ac [ 337.623033] [<ffffffff818a030f>] ret_from_fork+0x3f/0x70 [ 337.623039] [<ffffffff81061ddb>] ? kthread_create_on_node+0x1ac/0x1ac [ 337.623043] ---[ end trace bbf071dfdac9d9da ]--- [ 337.623081] [drm:gen8_init_rcs_context [i915]] *ERROR* MOCS failed to program: expect performance issues. Setup last_retired_head correctly on context reset and recalculate free space for ringbuf. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=91634 Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com> --- drivers/gpu/drm/i915/intel_lrc.c | 2 ++ 1 file changed, 2 insertions(+)