Message ID | 20221005175251.3586272-2-imre.deak@intel.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | [1/2] drm/i915: Take display INIT power for GPU reset/restore | expand |
On Wed, Oct 05, 2022 at 08:52:51PM +0300, Imre Deak wrote: > The GPU reset involves a display suspend/resume sequence, but this is > done without suspending/resuming the encoders. The display reset path is there for the old platforms which can't reset the gt stuff separately from the display engine. And the only reason we started to force that codepath on more modern platforms was to make sure it doesn't break all the time. That used to happen quite regularly, but not sure if we even had any pre-g4x hw in CI at the time. I suspect it's probably a mistake to start piling on more code in there just to make it work on really modern hw. The old hw where it actually matters doesn't need any of that code after all. Well, unless we manage to make it just call some simple high level "suspend display + resume display" pair of functions and nothing else. That would probably be nice simplification in general, but iirc currently it's much more ad-hoc than that.
On Wed, Oct 05, 2022 at 10:14:43PM +0300, Ville Syrjälä wrote: > On Wed, Oct 05, 2022 at 08:52:51PM +0300, Imre Deak wrote: > > The GPU reset involves a display suspend/resume sequence, but this is > > done without suspending/resuming the encoders. > > The display reset path is there for the old platforms which > can't reset the gt stuff separately from the display engine. > And the only reason we started to force that codepath on more > modern platforms was to make sure it doesn't break all the time. > That used to happen quite regularly, but not sure if we even had > any pre-g4x hw in CI at the time. > > I suspect it's probably a mistake to start piling on more > code in there just to make it work on really modern hw. > The old hw where it actually matters doesn't need any of > that code after all. Ok, but for the !clobbers_display case the current resume sequence is broken imo. So if this fix is not acceptable how about only restoring modeset_restore_state in this case without reading out the HW state first (to keep some test coverage still) or removing the force_reset_modeset_test? > Well, unless we manage to make it just call some simple high > level "suspend display + resume display" pair of functions > and nothing else. That would probably be nice simplification > in general, but iirc currently it's much more ad-hoc than that. I agree, but I'd say that should be done as a follow-up (just calling the same functions during both system supend/resume and reset I suppose) and have a simpler fix for the current issue. > > -- > Ville Syrjälä > Intel
On Thu, Oct 06, 2022 at 12:01:17AM +0300, Imre Deak wrote: > On Wed, Oct 05, 2022 at 10:14:43PM +0300, Ville Syrjälä wrote: > > On Wed, Oct 05, 2022 at 08:52:51PM +0300, Imre Deak wrote: > > > The GPU reset involves a display suspend/resume sequence, but this is > > > done without suspending/resuming the encoders. > > > > The display reset path is there for the old platforms which > > can't reset the gt stuff separately from the display engine. > > And the only reason we started to force that codepath on more > > modern platforms was to make sure it doesn't break all the time. > > That used to happen quite regularly, but not sure if we even had > > any pre-g4x hw in CI at the time. > > > > I suspect it's probably a mistake to start piling on more > > code in there just to make it work on really modern hw. > > The old hw where it actually matters doesn't need any of > > that code after all. > > Ok, but for the !clobbers_display case the current resume sequence is > broken imo. So if this fix is not acceptable how about only restoring > modeset_restore_state in this case without reading out the HW state > first (to keep some test coverage still) or removing the > force_reset_modeset_test? So the conclusion from our chat was to nuke all the extra junk from the simulated path and leave it with just the commit_duplicated_state(). I think that's still sufficient test of the display vs. reset path since it should still grab the modeset locks and whatnot. Well, sufficient assuming it actually works :)
On Thu, Oct 06, 2022 at 10:24:45PM +0300, Ville Syrjälä wrote: > On Thu, Oct 06, 2022 at 12:01:17AM +0300, Imre Deak wrote: > > On Wed, Oct 05, 2022 at 10:14:43PM +0300, Ville Syrjälä wrote: > > > On Wed, Oct 05, 2022 at 08:52:51PM +0300, Imre Deak wrote: > > > > The GPU reset involves a display suspend/resume sequence, but this is > > > > done without suspending/resuming the encoders. > > > > > > The display reset path is there for the old platforms which > > > can't reset the gt stuff separately from the display engine. > > > And the only reason we started to force that codepath on more > > > modern platforms was to make sure it doesn't break all the time. > > > That used to happen quite regularly, but not sure if we even had > > > any pre-g4x hw in CI at the time. > > > > > > I suspect it's probably a mistake to start piling on more > > > code in there just to make it work on really modern hw. > > > The old hw where it actually matters doesn't need any of > > > that code after all. > > > > Ok, but for the !clobbers_display case the current resume sequence is > > broken imo. So if this fix is not acceptable how about only restoring > > modeset_restore_state in this case without reading out the HW state > > first (to keep some test coverage still) or removing the > > force_reset_modeset_test? > > So the conclusion from our chat was to nuke all the extra > junk from the simulated path and leave it with just the > commit_duplicated_state(). I think that's still sufficient > test of the display vs. reset path since it should still > grab the modeset locks and whatnot. Well, sufficient > assuming it actually works :) Ok, it seems to work at least on ADLP, also fixing the original issue, so can follow up with this.
diff --git a/drivers/gpu/drm/i915/display/intel_display.c b/drivers/gpu/drm/i915/display/intel_display.c index 8d2cb4904f965..48373cce769cd 100644 --- a/drivers/gpu/drm/i915/display/intel_display.c +++ b/drivers/gpu/drm/i915/display/intel_display.c @@ -75,6 +75,7 @@ #include "g4x_dp.h" #include "g4x_hdmi.h" #include "hsw_ips.h" +#include "i915_driver.h" #include "i915_drv.h" #include "i915_utils.h" #include "icl_dsi.h" @@ -881,6 +882,7 @@ void intel_display_prepare_reset(struct drm_i915_private *dev_priv) struct drm_device *dev = &dev_priv->drm; struct drm_modeset_acquire_ctx *ctx = &dev_priv->reset_ctx; struct drm_atomic_state *state; + intel_wakeref_t wakeref; int ret; if (!HAS_DISPLAY(dev_priv)) @@ -937,6 +939,9 @@ void intel_display_prepare_reset(struct drm_i915_private *dev_priv) dev_priv->modeset_restore_state = state; state->acquire_ctx = ctx; + + with_intel_display_power(dev_priv, POWER_DOMAIN_INIT, wakeref) + i915_driver_suspend_encoders(dev_priv); } void intel_display_finish_reset(struct drm_i915_private *i915) @@ -961,6 +966,7 @@ void intel_display_finish_reset(struct drm_i915_private *i915) /* reset doesn't touch the display */ if (!gpu_reset_clobbers_display(i915)) { + drm_mode_config_reset(&i915->drm); /* for testing only restore the display */ ret = __intel_display_resume(i915, state, ctx); if (ret) @@ -972,6 +978,7 @@ void intel_display_finish_reset(struct drm_i915_private *i915) * so need a full re-initialization. */ intel_pps_unlock_regs_wa(i915); + drm_mode_config_reset(&i915->drm); intel_modeset_init_hw(i915); intel_init_clock_gating(i915); intel_hpd_init(i915); diff --git a/drivers/gpu/drm/i915/i915_driver.c b/drivers/gpu/drm/i915/i915_driver.c index fb3826dabe8b6..59c18010e08a3 100644 --- a/drivers/gpu/drm/i915/i915_driver.c +++ b/drivers/gpu/drm/i915/i915_driver.c @@ -1091,18 +1091,24 @@ static void i915_driver_postclose(struct drm_device *dev, struct drm_file *file) i915_gem_flush_free_objects(to_i915(dev)); } -static void intel_suspend_encoders(struct drm_i915_private *dev_priv) +void i915_driver_suspend_encoders(struct drm_i915_private *i915) { - struct drm_device *dev = &dev_priv->drm; struct intel_encoder *encoder; - if (!HAS_DISPLAY(dev_priv)) - return; - - drm_modeset_lock_all(dev); - for_each_intel_encoder(dev, encoder) + for_each_intel_encoder(&i915->drm, encoder) if (encoder->suspend) encoder->suspend(encoder); +} + +static void intel_suspend_encoders(struct drm_i915_private *dev_priv) +{ + struct drm_device *dev = &dev_priv->drm; + + if (!HAS_DISPLAY(dev_priv)) + return; + + drm_modeset_lock_all(dev); + i915_driver_suspend_encoders(dev_priv); drm_modeset_unlock_all(dev); } diff --git a/drivers/gpu/drm/i915/i915_driver.h b/drivers/gpu/drm/i915/i915_driver.h index 44ec543d92cb3..20a66ad9d9ca4 100644 --- a/drivers/gpu/drm/i915/i915_driver.h +++ b/drivers/gpu/drm/i915/i915_driver.h @@ -27,6 +27,8 @@ void i915_driver_shutdown(struct drm_i915_private *i915); int i915_driver_resume_switcheroo(struct drm_i915_private *i915); int i915_driver_suspend_switcheroo(struct drm_i915_private *i915, pm_message_t state); +void i915_driver_suspend_encoders(struct drm_i915_private *i915); + void i915_print_iommu_status(struct drm_i915_private *i915, struct drm_printer *p);
The GPU reset involves a display suspend/resume sequence, but this is done without suspending/resuming the encoders. The encoder HW readout code during resume however assumes that the encoders were suspended/resumed, at least on TypeC platforms where the TC PHYs must be left in a disconnected state during encoder-suspend, and the PHY's TypeC mode must be initialized already during encoder-resume. Fix the above by suspending/resuming the encoders during GPU reset on all platforms, which also fixes the WARN below introduced by commit a82796a2e332 ("drm/i915: Fix TypeC mode initialization during system resume") <4> [319.983309] ------------[ cut here ]------------ <4> [319.983313] i915 0000:00:02.0: drm_WARN_ON(dig_port->tc_link_refcount != 1) <4> [319.983341] WARNING: CPU: 10 PID: 268 at drivers/gpu/drm/i915/display/intel_tc.c:751 intel_tc_port_sanitize_mode+0x239/0x290 [i915] <4> [319.983407] Modules linked in: fuse snd_hda_codec_hdmi i915 x86_pkg_temp_thermal mei_hdcp coretemp wmi_bmof r8153_ecm cdc_ether kvm_intel usbnet r8152 mii kvm prime_numbers snd_hda_intel ttm snd_intel_dspcfg irqbypass drm_buddy e1000e crct10dif_pclmul snd_hda_codec crc32_pclmul drm_display_helper ptp snd_hwdep ghash_clmulni_intel snd_hda_core drm_kms_helper pps_core mei_me syscopyarea video i2c_i801 snd_pcm sysfillrect i2c_smbus sysimgblt mei fb_sys_fops intel_lpss_pci wmi <4> [319.983483] CPU: 10 PID: 268 Comm: kworker/10:1H Not tainted 6.0.0-rc7-CI_DRM_12200-g394e575b57e9+ #1 <4> [319.983486] Hardware name: Intel Corporation Alder Lake Client Platform/AlderLake-P LP5 RVP, BIOS ADLPFWI1.R00.2313.A00.2107301001 07/30/2021 <4> [319.983488] Workqueue: events_highpri heartbeat [i915] <4> [319.983536] RIP: 0010:intel_tc_port_sanitize_mode+0x239/0x290 [i915] <4> [319.983600] Code: 85 d2 75 03 48 8b 17 48 89 14 24 e8 e1 dc 2d e1 48 8b 14 24 48 c7 c1 f8 db 5b a0 48 c7 c7 3e 3c 5e a0 48 89 c6 e8 45 d7 66 e1 <0f> 0b e9 20 fe ff ff 0f 0b 49 c7 c0 8b 3c 5e a0 e9 9e fe ff ff 48 <4> [319.983601] RSP: 0018:ffffc90001617a30 EFLAGS: 00010286 <4> [319.983604] RAX: 0000000000000000 RBX: ffff88811f9d2000 RCX: 0000000000000001 <4> [319.983606] RDX: 0000000080000001 RSI: ffffffff8231e8cd RDI: 00000000ffffffff <4> [319.983607] RBP: ffff888121e98000 R08: 0000000000000000 R09: c0000000ffffc134 <4> [319.983608] R10: 00000000000d6078 R11: ffffc900016178c8 R12: ffff88811f9d3838 <4> [319.983609] R13: ffff88811f9d397d R14: ffff888121e98000 R15: 0000000000000000 <4> [319.983611] FS: 0000000000000000(0000) GS:ffff8882a7300000(0000) knlGS:0000000000000000 <4> [319.983612] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 <4> [319.983613] CR2: 00007fe7397f1e18 CR3: 0000000006612003 CR4: 0000000000770ee0 <4> [319.983615] PKRU: 55555554 <4> [319.983616] Call Trace: <4> [319.983617] <TASK> <4> [319.983621] intel_ddi_sync_state+0x3f/0x90 [i915] <4> [319.983698] intel_modeset_setup_hw_state+0x3a3/0x1440 [i915] <4> [319.983777] ? intel_gt_reset_global+0xeb/0x160 [i915] <4> [319.983839] ? __intel_display_resume+0x15/0xe0 [i915] <4> [319.983909] __intel_display_resume+0x15/0xe0 [i915] <4> [319.983979] intel_display_finish_reset+0x58/0x130 [i915] <4> [319.984048] intel_gt_reset_global+0xf3/0x160 [i915] <4> [319.984107] ? intel_reset_guc.cold.62+0x5d/0x5d [i915] <4> [319.984189] ? 0xffffffff81000000 <4> [319.984192] ? queue_work_node+0x90/0x90 <4> [319.984202] intel_gt_handle_error+0x2c2/0x410 [i915] <4> [319.984267] ? _raw_spin_unlock_irqrestore+0x54/0x70 <4> [319.984271] ? lockdep_hardirqs_on+0xbf/0x140 <4> [319.984276] ? intel_guc_find_hung_context+0x19e/0x1d0 [i915] <4> [319.984352] reset_engine+0x99/0xd0 [i915] <4> [319.984399] ? __drm_printfn_seq_file+0x20/0x20 <4> [319.984406] heartbeat+0x4cd/0x4f0 [i915] <4> [319.984454] process_one_work+0x272/0x5b0 <4> [319.984461] worker_thread+0x37/0x370 <4> [319.984465] ? process_one_work+0x5b0/0x5b0 <4> [319.984467] kthread+0xed/0x120 <4> [319.984470] ? kthread_complete_and_exit+0x20/0x20 <4> [319.984474] ret_from_fork+0x1f/0x30 <4> [319.984484] </TASK> <4> [319.984485] irq event stamp: 36107 <4> [319.984487] hardirqs last enabled at (36113): [<ffffffff811391d6>] __up_console_sem+0x66/0x70 <4> [319.984492] hardirqs last disabled at (36118): [<ffffffff811391bb>] __up_console_sem+0x4b/0x70 <4> [319.984494] softirqs last enabled at (34316): [<ffffffff81e00323>] __do_softirq+0x323/0x48e <4> [319.984497] softirqs last disabled at (34309): [<ffffffff810c16b8>] irq_exit_rcu+0xb8/0xe0 <4> [319.984499] ---[ end trace 0000000000000000 ]--- Closes: https://gitlab.freedesktop.org/drm/intel/issues/7021 Cc: Mika Kahola <mika.kahola@intel.com> Signed-off-by: Imre Deak <imre.deak@intel.com> --- drivers/gpu/drm/i915/display/intel_display.c | 7 +++++++ drivers/gpu/drm/i915/i915_driver.c | 20 +++++++++++++------- drivers/gpu/drm/i915/i915_driver.h | 2 ++ 3 files changed, 22 insertions(+), 7 deletions(-)