Revert "drm/msm/dp: Remove INIT_SETUP delay"

Message ID	ebbcd56ac883d3c3d3024d368fab63d26e02637a@lausen.nl
State	New, archived
Headers	From: "Leonard Lausen" <leonard@lausen.nl>
	Date: Mon, 08 May 2023 01:06:13 +0000
	Subject: [PATCH] Revert "drm/msm/dp: Remove INIT_SETUP delay"
	To: regressions@lists.linux.dev, "Bjorn Andersson" <quic_bjorande@quicinc.com>, "Dmitry Baryshkov" <dmitry.baryshkov@linaro.org>, "Rob Clark" <robdclark@gmail.com>, "Abhinav Kumar" <quic_abhinavk@quicinc.com>, "Stephen Boyd" <swboyd@chromium.org>, "Kuogee Hsieh" <quic_khsieh@quicinc.com>, "Johan Hovold" <johan+linaro@kernel.org>, "Sankeerth Billakanti" <quic_sbillaka@quicinc.com>
	Cc: Sean Paul <sean@poorly.run>, freedreno@lists.freedesktop.org, Nikita Travkin <nikita@trvn.ru>, linux-kernel@vger.kernel.org, dri-devel@lists.freedesktop.org, linux-arm-msm@vger.kernel.org
Series	Revert "drm/msm/dp: Remove INIT_SETUP delay"

Leonard Lausen May 8, 2023, 1:06 a.m. UTC

This reverts commit e17af1c9d861dc177e5b56009bd4f71ace688d97.

Removing the delay of 100 units broke hot plug detection for USB-C displays on
qcom sc7180 lazor devices. Lazor uses mdss for hot plug detection and declares
dp_hot_plug_det in the dts. Other sc7180-based devices, such as aspire1, were
not affected by the regression because they do not rely on mdss and
dp_hot_plug_det for hot plug detection.

Signed-off-by: Leonard Lausen <leonard@lausen.nl>
Tested-by: Leonard Lausen <leonard@lausen.nl> # Trogdor (sc7180)
Suggested-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>

---
 drivers/gpu/drm/msm/dp/dp_display.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Leonard Lausen May 8, 2023, 11:02 a.m. UTC | #1

Abhinav Kumar <quic_abhinavk@quicinc.com> writes:
> On 5/7/2023 7:15 PM, Bjorn Andersson wrote:
>> When booting with the cable connected on my X13s, 100 is long enough for
>> my display to time out and require me to disconnect and reconnect the
>> cable again.
>> 
>> Do we have any idea of why the reduction to 0 is causing an issue when
>> using the internal HPD?
>> 
>> Regards,
>> Bjorn
> Yes, we do know why this is causing an issue. The cleaner patch for this 
> will be posted this week.

Great!

> There is no need to add the 100ms delay back yet.
> 
> thanks for posting this but NAK on this patch till we post the fix this 
> week.
>
> Appreciate a bit of patience till then.

This regression is already part of the 6.3 stable release series. Will
the new patch qualify for inclusion in 6.3.y? Or will it be part of 6.4
and this revert should go into 6.3.y?

Even with this revert, there are additional regressions in 6.3 causing
dpu errors and a blank external display upon suspending and resuming the
system while an external display is connected. Will your new patch also
fix these regressions?

[  275.025497] [drm:dpu_encoder_phys_vid_wait_for_commit_done:488] [dpu error]vblank timeout
[  275.025514] [drm:dpu_kms_wait_for_commit_done:510] [dpu error]wait for commit done returned -110
[  275.064141] [drm:dpu_encoder_frame_done_timeout:2382] [dpu error]enc33 frame done timeout

followed by a kernel panic if any modification to the display settings
is done, such as disabling the external display:

[  341.631287] Hardware name: Google Lazor (rev3 - 8) (DT)
[  341.631290] pstate: 604000c9 (nZCv daIF +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[  341.631296] pc : do_raw_spin_unlock+0xb8/0xc4
[  341.631310] lr : do_raw_spin_unlock+0x78/0xc4
[  341.631315] sp : ffffffc01100b880
[  341.631317] x29: ffffffc01100b880 x28: 0000000000000028 x27: 0000000000000038
[  341.631326] x26: ffffff808c89e180 x25: ffffffef33e39920 x24: 0000000000000000
[  341.631333] x23: ffffffef33e3ca0c x22: 0000000000000002 x21: ffffff808345ded8
[  341.631339] x20: ffffff808345ded0 x19: 000000000000001e x18: 0000000000000000
[  341.631345] x17: 0048000000000460 x16: 0441043b04600438 x15: 04380000089807d0
[  341.631351] x14: 07b0089807800780 x13: 0000000000000068 x12: 0000000000000001
[  341.631357] x11: ffffffef3413bb76 x10: 0000000000000bb0 x9 : ffffffef33e3d6bc
[  341.631363] x8 : ffffff808c89ed90 x7 : ffffff80b1c9f738 x6 : 0000000000000001
[  341.631370] x5 : 0000000000000000 x4 : 0000000000000000 x3 : ffffff808345def0
[  341.631375] x2 : 00000000dead4ead x1 : 0000000000000003 x0 : 0000000000000000
[  341.631383] Kernel panic - not syncing: Asynchronous SError Interrupt
[  341.631386] CPU: 3 PID: 1520 Comm: kwin_wayland Not tainted 6.3.0-stb-cbq+ #2
[  341.631390] Hardware name: Google Lazor (rev3 - 8) (DT)
[  341.631393] Call trace:
[  341.631395]  dump_backtrace+0xc8/0x104
[  341.631402]  show_stack+0x20/0x30
[  341.631407]  dump_stack_lvl+0x48/0x60
[  341.631414]  dump_stack+0x18/0x24
[  341.631419]  panic+0x130/0x2fc
[  341.631425]  nmi_panic+0x54/0x78
[  341.631428]  arm64_serror_panic+0x74/0x80
[  341.631434]  arm64_is_fatal_ras_serror+0x6c/0x8c
[  341.631439]  do_serror+0x48/0x60
[  341.631444]  el1h_64_error_handler+0x30/0x48
[  341.631450]  el1h_64_error+0x68/0x6c
[  341.631455]  do_raw_spin_unlock+0xb8/0xc4
[  341.631460]  _raw_spin_unlock_irq+0x18/0x38
[  341.631466]  __wait_for_common+0xb8/0x154
[  341.631472]  wait_for_completion_timeout+0x28/0x34
[  341.631477]  dp_ctrl_push_idle+0x3c/0x88
[  341.631483]  dp_bridge_disable+0x20/0x2c
[  341.631488]  drm_atomic_bridge_chain_disable+0x8c/0xb8
[  341.631495]  drm_atomic_helper_commit_modeset_disables+0x198/0x450
[  341.631501]  msm_atomic_commit_tail+0x1c8/0x36c
[  341.631507]  commit_tail+0x80/0x108
[  341.631512]  drm_atomic_helper_commit+0x114/0x118
[  341.631516]  drm_atomic_commit+0xb4/0xe0
[  341.631522]  drm_mode_atomic_ioctl+0x6b0/0x890
[  341.631527]  drm_ioctl_kernel+0xe4/0x164
[  341.631534]  drm_ioctl+0x35c/0x3bc
[  341.631539]  vfs_ioctl+0x30/0x50
[  341.631547]  __arm64_sys_ioctl+0x80/0xb4
[  341.631552]  invoke_syscall+0x84/0x11c
[  341.631558]  el0_svc_common.constprop.0+0xc0/0xec
[  341.631563]  do_el0_svc+0x94/0xa4
[  341.631567]  el0_svc+0x2c/0x54
[  341.631570]  el0t_64_sync_handler+0x94/0x100
[  341.631575]  el0t_64_sync+0x194/0x198
[  341.631580] SMP: stopping secondary CPUs
[  341.831615] Kernel Offset: 0x2f2b200000 from 0xffffffc008000000
[  341.831618] PHYS_OFFSET: 0x80000000
[  341.831620] CPU features: 0x400000,61500506,3200720b
[  341.831623] Memory Limit: none

Abhinav Kumar May 9, 2023, 7:15 p.m. UTC | #2

On 5/8/2023 4:30 AM, Dmitry Baryshkov wrote:
> On 08/05/2023 14:02, Leonard Lausen wrote:
>> Abhinav Kumar <quic_abhinavk@quicinc.com> writes:
>>> On 5/7/2023 7:15 PM, Bjorn Andersson wrote:
>>>> When booting with the cable connected on my X13s, 100 is long enough for
>>>> my display to time out and require me to disconnect and reconnect the
>>>> cable again.
>>>>
>>>> Do we have any idea of why the reduction to 0 is causing an issue when
>>>> using the internal HPD?
>>>>
>>>> Regards,
>>>> Bjorn
>>> Yes, we do know why this is causing an issue. The cleaner patch for this
>>> will be posted this week.
>>
>> Great!
>>
>>> There is no need to add the 100ms delay back yet.
>>>
>>> thanks for posting this but NAK on this patch till we post the fix this
>>> week.
>>>
>>> Appreciate a bit of patience till then.
>>
>> This regression is already part of the 6.3 stable release series. Will
>> the new patch qualify for inclusion in 6.3.y? Or will it be part of 6.4
>> and this revert should go into 6.3.y?
> 
> This is a tough situation, as landing a revert will break x13s, as noted 
> by Bjorn. Given that the workaround is known at this moment, I would 
> like to wait for the patch from Abhinav to appear, then we can decide 
> which of the fixes should go to the stable kernel.
> 
>>
>> Even with this revert, there are additional regressions in 6.3 causing
>> dpu errors and blank external display upon suspending and resuming the
>> system while an external display is connected. Will your new patch also
>> fix these regressions?
>>
>> [  275.025497] [drm:dpu_encoder_phys_vid_wait_for_commit_done:488] [dpu error]vblank timeout
>> [  275.025514] [drm:dpu_kms_wait_for_commit_done:510] [dpu error]wait for commit done returned -110
>> [  275.064141] [drm:dpu_encoder_frame_done_timeout:2382] [dpu error]enc33 frame done timeout
>>
>> followed by a kernel panic if any modification to the display settings
>> is done, such as disabling the external display:
> 
> Interesting crash, thank you for the report.
> 

This is a different crash, but the root cause of both issues is the
bridge hpd_enable/disable series.

https://patchwork.freedesktop.org/patch/514414/

As per my discussion with Kuogee, this series breaks the sequence and
logic of the internal HPD.

We are analyzing the issue and the fix internally first, and will post
the fix once we have figured out all the details.


Leonard Lausen May 23, 2023, 2:39 a.m. UTC | #3

Abhinav Kumar <quic_abhinavk@quicinc.com> writes:
>>>> There is no need to add the 100ms delay back yet.
>>>>
>>>> thanks for posting this but NAK on this patch till we post the fix this
>>>> week.
>>>>
>>>> Appreciate a bit of patience till then.
>>>
>>> This regression is already part of the 6.3 stable release series. Will
>>> the new patch qualify for inclusion in 6.3.y? Or will it be part of 6.4
>>> and this revert should go into 6.3.y?
>> 
>> This is a tough situation, as landing a revert will break x13s, as noted 
>> by Bjorn. Given that the workaround is known at this moment, I would 
>> like to wait for the patch from Abhinav to appear, then we can decide 
>> which of the fixes should go to the stable kernel.

I wasn't able to find any new patches, though I may have missed them. Has
a decision been made on how to proceed with this regression? With 6.2 now
EOL, this may be a good moment to decide on the next steps.

>>> [  275.025497] [drm:dpu_encoder_phys_vid_wait_for_commit_done:488] [dpu error]vblank timeout
>>> [  275.025514] [drm:dpu_kms_wait_for_commit_done:510] [dpu error]wait for commit done returned -110
>>> [  275.064141] [drm:dpu_encoder_frame_done_timeout:2382] [dpu error]enc33 frame done timeout
>
> This is a different crash but the root-cause of both the issues is the 
> bridge hpd_enable/disable series.
>
> https://patchwork.freedesktop.org/patch/514414/
>
> This is breaking the sequence and logic of internal hpd as per my 
> discussion with kuogee.
>
> We are analyzing the issue and the fix internally first and once we 
> figure out all the details will post it.

Thank you!

Abhinav Kumar May 23, 2023, 6:56 p.m. UTC | #4

Hi Leonard

On 5/22/2023 7:39 PM, Leonard Lausen wrote:
> Abhinav Kumar <quic_abhinavk@quicinc.com> writes:
>>>>> There is no need to add the 100ms delay back yet.
>>>>>
>>>>> thanks for posting this but NAK on this patch till we post the fix this
>>>>> week.
>>>>>
>>>>> Appreciate a bit of patience till then.
>>>>
>>>> This regression is already part of the 6.3 stable release series. Will
>>>> the new patch qualify for inclusion in 6.3.y? Or will it be part of 6.4
>>>> and this revert should go into 6.3.y?
>>>
>>> This is a tough situation, as landing a revert will break x13s, as noted
>>> by Bjorn. Given that the workaround is known at this moment, I would
>>> like to wait for the patch from Abhinav to appear, then we can decide
>>> which of the fixes should go to the stable kernel.
> 
> I wasn't able to find new patches, though may have missed them. Is there
> a decision yet how to proceed with this regression? 6.2 now being EOL
> may make this a good moment to decide on the next steps.
> 

Yes, the new patch to fix this issue is here:

https://patchwork.freedesktop.org/patch/538601/?series=118148&rev=3

Apologies that you were not CCed on this; when the next version is posted,
I will ask Kuogee to CC you.

Meanwhile, it would be great if you could verify whether it works for you
and provide a Tested-by tag.

>>>> [  275.025497] [drm:dpu_encoder_phys_vid_wait_for_commit_done:488] [dpu error]vblank timeout
>>>> [  275.025514] [drm:dpu_kms_wait_for_commit_done:510] [dpu error]wait for commit done returned -110
>>>> [  275.064141] [drm:dpu_encoder_frame_done_timeout:2382] [dpu error]enc33 frame done timeout
>>
>> This is a different crash but the root-cause of both the issues is the
>> bridge hpd_enable/disable series.
>>
>> https://patchwork.freedesktop.org/patch/514414/
>>
>> This is breaking the sequence and logic of internal hpd as per my
>> discussion with kuogee.
>>
>> We are analyzing the issue and the fix internally first and once we
>> figure out all the details will post it.
> 
> Thank you!

Kuogee Hsieh May 23, 2023, 7:53 p.m. UTC | #5

On 5/23/2023 11:56 AM, Abhinav Kumar wrote:
> Hi Leonard
>
> On 5/22/2023 7:39 PM, Leonard Lausen wrote:
>> Abhinav Kumar <quic_abhinavk@quicinc.com> writes:
>>>>>> There is no need to add the 100ms delay back yet.
>>>>>>
>>>>>> thanks for posting this but NAK on this patch till we post the fix this
>>>>>> week.
>>>>>>
>>>>>> Appreciate a bit of patience till then.
>>>>>
>>>>> This regression is already part of the 6.3 stable release series. Will
>>>>> the new patch qualify for inclusion in 6.3.y? Or will it be part of 6.4
>>>>> and this revert should go into 6.3.y?
>>>>
>>>> This is a tough situation, as landing a revert will break x13s, as noted
>>>> by Bjorn. Given that the workaround is known at this moment, I would
>>>> like to wait for the patch from Abhinav to appear, then we can decide
>>>> which of the fixes should go to the stable kernel.
>>
>> I wasn't able to find new patches, though may have missed them. Is there
>> a decision yet how to proceed with this regression? 6.2 now being EOL
>> may make this a good moment to decide on the next steps.
>>
>
> Yes, the new patch to fix this issue is here
>
> https://patchwork.freedesktop.org/patch/538601/?series=118148&rev=3
>
> Apologies if you were not CCed on this, if a next version is CCed, 
> will ask kuogee to cc you.
>
> Meanwhile, will be great if you can verify if it works for you and 
> provide Tested-by tags.

Hi Leonard,

I have CCed you on the v5 patches.

Would you please verify them?

Thanks,

>
>>>>> [  275.025497] [drm:dpu_encoder_phys_vid_wait_for_commit_done:488] [dpu error]vblank timeout
>>>>> [  275.025514] [drm:dpu_kms_wait_for_commit_done:510] [dpu error]wait for commit done returned -110
>>>>> [  275.064141] [drm:dpu_encoder_frame_done_timeout:2382] [dpu error]enc33 frame done timeout
>>>
>>> This is a different crash but the root-cause of both the issues is the
>>> bridge hpd_enable/disable series.
>>>
>>> https://patchwork.freedesktop.org/patch/514414/
>>>
>>> This is breaking the sequence and logic of internal hpd as per my
>>> discussion with kuogee.
>>>
>>> We are analyzing the issue and the fix internally first and once we
>>> figure out all the details will post it.
>>
>> Thank you!

Leonard Lausen May 24, 2023, 12:58 p.m. UTC | #6

>>>>>> [  275.025497] [drm:dpu_encoder_phys_vid_wait_for_commit_done:488] [dpu error]vblank timeout
>>>>>> [  275.025514] [drm:dpu_kms_wait_for_commit_done:510] [dpu error]wait for commit done returned -110
>>>>>> [  275.064141] [drm:dpu_encoder_frame_done_timeout:2382] [dpu error]enc33 frame done timeout
>>>>
>>>> This is a different crash but the root-cause of both the issues is the
>>>> bridge hpd_enable/disable series.
>>>>
>>>> https://patchwork.freedesktop.org/patch/514414/
>>
>> Yes, the new patch to fix this issue is here
>>
>> https://patchwork.freedesktop.org/patch/538601/?series=118148&rev=3
>>
>> Apologies if you were not CCed on this, if a next version is CCed, 
>> will ask kuogee to cc you.
>>
>> Meanwhile, will be great if you can verify if it works for you and 
>> provide Tested-by tags.
>
> Hi Leonard,
>
> I had  cc you with v5 patches.
>
> Would you please verify it.

Hi Kuogee,

Thank you. I verified that the v6 patch fixes the regression when ported to
6.3.3. One non-fatal issue remains: suspending and resuming the system
while a USB-C DP monitor is connected triggers an error, though the system
recovers within a second without the need to unplug the cable.

[drm:drm_mode_config_helper_resume] *ERROR* Failed to resume (-107)


Below is the dmesg snippet related to the suspend:

[  194.066321] PM: suspend entry (deep)
[  194.178793] Filesystems sync: 0.108 seconds
[  194.184142] LoadPin: firmware pinning-ignored obj="/usr/lib/firmware/qcom/sc7180-trogdor/modem-nolte/qdsp6sw.mbn" pid=3380 cmdline=""
[  194.196934] LoadPin: firmware pinning-ignored obj="/usr/lib/firmware/qcom/sc7180-trogdor/modem-nolte/mba.mbn" pid=3387 cmdline=""
[  194.197320] LoadPin: firmware pinning-ignored obj="/usr/lib/firmware/regulatory.db-debian" pid=3390 cmdline=""
[  194.204128] LoadPin: firmware pinning-ignored obj="/usr/lib/firmware/qcom/venus-5.4/venus.mbn" pid=3380 cmdline=""
[  194.204808] LoadPin: firmware pinning-ignored obj="/usr/lib/firmware/qca/crbtfw32.tlv" pid=3380 cmdline=""
[  194.205058] LoadPin: firmware pinning-ignored obj="/usr/lib/firmware/qca/crnv32.bin" pid=3380 cmdline=""
[  194.253591] Freezing user space processes
[  194.263621] Freezing user space processes completed (elapsed 0.005 seconds)
[  194.270816] OOM killer disabled.
[  194.274165] Freezing remaining freezable tasks
[  194.281253] Freezing remaining freezable tasks completed (elapsed 0.002 seconds)
[  194.288866] printk: Suspending console(s) (use no_console_suspend to debug)
[  194.494479] Disabling non-boot CPUs ...
[  194.497569] psci: CPU1 killed (polled 1 ms)
[  194.501844] psci: CPU2 killed (polled 1 ms)
[  194.506311] psci: CPU3 killed (polled 1 ms)
[  194.510237] psci: CPU4 killed (polled 1 ms)
[  194.512854] psci: CPU5 killed (polled 1 ms)
[  194.516076] psci: CPU6 killed (polled 1 ms)
[  194.518397] psci: CPU7 killed (polled 0 ms)
[  194.520706] Enabling non-boot CPUs ...
[  194.521595] Detected VIPT I-cache on CPU1
[  194.521664] cacheinfo: Unable to detect cache hierarchy for CPU 1
[  194.521678] GICv3: CPU1: found redistributor 100 region 0:0x0000000017a80000
[  194.521743] CPU1: Booted secondary processor 0x0000000100 [0x51df805e]
[  194.522829] CPU1 is up
[  194.523646] Detected VIPT I-cache on CPU2
[  194.523701] cacheinfo: Unable to detect cache hierarchy for CPU 2
[  194.523716] GICv3: CPU2: found redistributor 200 region 0:0x0000000017aa0000
[  194.523775] CPU2: Booted secondary processor 0x0000000200 [0x51df805e]
[  194.524809] CPU2 is up
[  194.525537] Detected VIPT I-cache on CPU3
[  194.525592] cacheinfo: Unable to detect cache hierarchy for CPU 3
[  194.525611] GICv3: CPU3: found redistributor 300 region 0:0x0000000017ac0000
[  194.525668] CPU3: Booted secondary processor 0x0000000300 [0x51df805e]
[  194.526674] CPU3 is up
[  194.527486] Detected VIPT I-cache on CPU4
[  194.527535] cacheinfo: Unable to detect cache hierarchy for CPU 4
[  194.527556] GICv3: CPU4: found redistributor 400 region 0:0x0000000017ae0000
[  194.527612] CPU4: Booted secondary processor 0x0000000400 [0x51df805e]
[  194.528836] CPU4 is up
[  194.529553] Detected VIPT I-cache on CPU5
[  194.529601] cacheinfo: Unable to detect cache hierarchy for CPU 5
[  194.529623] GICv3: CPU5: found redistributor 500 region 0:0x0000000017b00000
[  194.529675] CPU5: Booted secondary processor 0x0000000500 [0x51df805e]
[  194.530986] CPU5 is up
[  194.532280] Detected PIPT I-cache on CPU6
[  194.532307] cacheinfo: Unable to detect cache hierarchy for CPU 6
[  194.532322] GICv3: CPU6: found redistributor 600 region 0:0x0000000017b20000
[  194.532358] CPU6: Booted secondary processor 0x0000000600 [0x51ff804f]
[  194.534434] CPU6 is up
[  194.535408] Detected PIPT I-cache on CPU7
[  194.535445] cacheinfo: Unable to detect cache hierarchy for CPU 7
[  194.535463] GICv3: CPU7: found redistributor 700 region 0:0x0000000017b40000
[  194.535505] CPU7: Booted secondary processor 0x0000000700 [0x51ff804f]
[  194.536281] CPU7 is up
[  195.285023] onboard-usb-hub 1-1: reset high-speed USB device number 2 using xhci-hcd
[  195.541240] onboard-usb-hub 2-1: reset SuperSpeed USB device number 2 using xhci-hcd
[  195.796915] usb 1-1.4: reset high-speed USB device number 22 using xhci-hcd
[  195.972952] usb 2-1.4: reset SuperSpeed USB device number 10 using xhci-hcd
[  196.278492] usb 1-1.4.4: reset high-speed USB device number 24 using xhci-hcd
[  196.468996] usb 1-1.4.2: reset high-speed USB device number 26 using xhci-hcd
[  197.055717] usb 2-1.4.2: reset SuperSpeed USB device number 11 using xhci-hcd
[  197.845110] usb 2-1.4.4: reset SuperSpeed USB device number 12 using xhci-hcd
[  198.235191] [drm:drm_mode_config_helper_resume] *ERROR* Failed to resume (-107)
[  198.528638] OOM killer enabled.
[  198.531866] Restarting tasks ... 
[  198.531994] usb 1-1.4.4.1: USB disconnect, device number 27
[  198.532223] usb 1-1.4.3: USB disconnect, device number 23
[  198.532509] usb 1-1.4.2.1: USB disconnect, device number 29
[  198.534805] r8152-cfgselector 2-1.4.4.2: USB disconnect, device number 13
[  198.535444] done.
[  198.535536] usb 1-1.1: USB disconnect, device number 15
[  198.567811] random: crng reseeded on system resumption
[  198.583431] PM: suspend exit

Kuogee Hsieh May 25, 2023, 5:57 p.m. UTC | #7

On 5/24/2023 5:58 AM, Leonard Lausen wrote:
>>>>>>> [  275.025497] [drm:dpu_encoder_phys_vid_wait_for_commit_done:488] [dpu error]vblank timeout
>>>>>>> [  275.025514] [drm:dpu_kms_wait_for_commit_done:510] [dpu error]wait for commit done returned -110
>>>>>>> [  275.064141] [drm:dpu_encoder_frame_done_timeout:2382] [dpu error]enc33 frame done timeout
>>>>> This is a different crash but the root-cause of both the issues is the
>>>>> bridge hpd_enable/disable series.
>>>>>
>>>>> https://patchwork.freedesktop.org/patch/514414/
>>> Yes, the new patch to fix this issue is here
>>>
>>> https://patchwork.freedesktop.org/patch/538601/?series=118148&rev=3
>>>
>>> Apologies if you were not CCed on this, if a next version is CCed,
>>> will ask kuogee to cc you.
>>>
>>> Meanwhile, will be great if you can verify if it works for you and
>>> provide Tested-by tags.
>> Hi Leonard,
>>
>> I had  cc you with v5 patches.
>>
>> Would you please verify it.
> Hi Kuogee,
>
> thank you. Verified the v6 patch fixes the regression when ported to
> 6.3.3. One non-fatal issue remains: Suspending and resuming the system
> while USB-C DP monitor is connected triggers an error, though the system
> recovers within a second without the need to unplug the cable.
>
> [drm:drm_mode_config_helper_resume] *ERROR* Failed to resume (-107)
>
>
> dmesg snippet related to the suspend below
>
>
> [  197.845110] usb 2-1.4.4: reset SuperSpeed USB device number 12 using xhci-hcd
> [  198.235191] [drm:drm_mode_config_helper_resume] *ERROR* Failed to resume (-107)

Hi Leonard,

I did not see this problem on my setup (Kodiak) during suspend/resume.

I will investigate further on a Trogdor device.

Thanks,


> [  198.528638] OOM killer enabled.
> [  198.531866] Restarting tasks ...
> [  198.531994] usb 1-1.4.4.1: USB disconnect, device number 27
> [  198.532223] usb 1-1.4.3: USB disconnect, device number 23
> [  198.532509] usb 1-1.4.2.1: USB disconnect, device number 29
> [  198.534805] r8152-cfgselector 2-1.4.4.2: USB disconnect, device number 13
> [  198.535444] done.
> [  198.535536] usb 1-1.1: USB disconnect, device number 15
> [  198.567811] random: crng reseeded on system resumption
> [  198.583431] PM: suspend exit

Abhinav Kumar May 25, 2023, 6:10 p.m. UTC | #8

On 5/25/2023 10:57 AM, Kuogee Hsieh wrote:
> 
> On 5/24/2023 5:58 AM, Leonard Lausen wrote:
>>>>>>>> [  275.025497] [drm:dpu_encoder_phys_vid_wait_for_commit_done:488] [dpu error]vblank timeout
>>>>>>>> [  275.025514] [drm:dpu_kms_wait_for_commit_done:510] [dpu error]wait for commit done returned -110
>>>>>>>> [  275.064141] [drm:dpu_encoder_frame_done_timeout:2382] [dpu error]enc33 frame done timeout
>>>>>> This is a different crash but the root-cause of both the issues is the
>>>>>> bridge hpd_enable/disable series.
>>>>>>
>>>>>> https://patchwork.freedesktop.org/patch/514414/
>>>> Yes, the new patch to fix this issue is here
>>>>
>>>> https://patchwork.freedesktop.org/patch/538601/?series=118148&rev=3
>>>>
>>>> Apologies if you were not CCed on this, if a next version is CCed,
>>>> will ask kuogee to cc you.
>>>>
>>>> Meanwhile, will be great if you can verify if it works for you and
>>>> provide Tested-by tags.
>>> Hi Leonard,
>>>
>>> I had  cc you with v5 patches.
>>>
>>> Would you please verify it.
>> Hi Kuogee,
>>
>> thank you. Verified the v6 patch fixes the regression when ported to
>> 6.3.3. One non-fatal issue remains: Suspending and resuming the system
>> while USB-C DP monitor is connected triggers an error, though the system
>> recovers within a second without the need to unplug the cable.
>>
>> [drm:drm_mode_config_helper_resume] *ERROR* Failed to resume (-107)
>>
>>
>> dmesg snippet related to the suspend below
>>
>>
>> [  197.845110] usb 2-1.4.4: reset SuperSpeed USB device number 12 using xhci-hcd
>> [  198.235191] [drm:drm_mode_config_helper_resume] *ERROR* Failed to resume (-107)
> 
> Hi Leonard,
> 
> I did not see this problem at my setup (Kodiak) during suspend/resume.
> 
> Will investigate more on Trogdor device.
> 
> Thanks,
> 

Hi Leonard

Feel free to open a bug for this and assign it to me; we can check it
there and ask for more information if needed.

Thanks

Abhinav

> 
>> [  198.528638] OOM killer enabled.
>> [  198.531866] Restarting tasks ...
>> [  198.531994] usb 1-1.4.4.1: USB disconnect, device number 27
>> [  198.532223] usb 1-1.4.3: USB disconnect, device number 23
>> [  198.532509] usb 1-1.4.2.1: USB disconnect, device number 29
>> [  198.534805] r8152-cfgselector 2-1.4.4.2: USB disconnect, device number 13
>> [  198.535444] done.
>> [  198.535536] usb 1-1.1: USB disconnect, device number 15
>> [  198.567811] random: crng reseeded on system resumption
>> [  198.583431] PM: suspend exit

Abhinav Kumar June 1, 2023, 7:20 p.m. UTC | #9

Hi Leonard

On 5/24/2023 5:58 AM, Leonard Lausen wrote:
>>>>>>> [  275.025497] [drm:dpu_encoder_phys_vid_wait_for_commit_done:488] [dpu error]vblank timeout
>>>>>>> [  275.025514] [drm:dpu_kms_wait_for_commit_done:510] [dpu error]wait for commit done returned -110
>>>>>>> [  275.064141] [drm:dpu_encoder_frame_done_timeout:2382] [dpu error]enc33 frame done timeout
>>>>>
>>>>> This is a different crash but the root-cause of both the issues is the
>>>>> bridge hpd_enable/disable series.
>>>>>
>>>>> https://patchwork.freedesktop.org/patch/514414/
>>>
>>> Yes, the new patch to fix this issue is here
>>>
>>> https://patchwork.freedesktop.org/patch/538601/?series=118148&rev=3
>>>
>>> Apologies if you were not CCed on this, if a next version is CCed,
>>> will ask kuogee to cc you.
>>>
>>> Meanwhile, will be great if you can verify if it works for you and
>>> provide Tested-by tags.
>>
>> Hi Leonard,
>>
>> I had  cc you with v5 patches.
>>
>> Would you please verify it.
> 
> Hi Kuogee,
> 
> thank you. Verified the v6 patch fixes the regression when ported to
> 6.3.3. One non-fatal issue remains: Suspending and resuming the system
> while USB-C DP monitor is connected triggers an error, though the system
> recovers within a second without the need to unplug the cable.
> 
> [drm:drm_mode_config_helper_resume] *ERROR* Failed to resume (-107)
> 

We are not able to recreate this on sc7280 Chromebooks and will need to
check on sc7180. This does not seem directly related to any of the
hotplug changes, though, so it needs to be checked separately. Please feel
free to raise a GitLab bug for this and assign it to me.

Leonard Lausen June 2, 2023, 1:36 a.m. UTC | #10

Hi Abhinav,

June 1, 2023 at 3:20 PM, "Abhinav Kumar" <quic_abhinavk@quicinc.com> wrote:
> > 
> >  [drm:drm_mode_config_helper_resume] *ERROR* Failed to resume (-107)
> > 
> 
> We are not able to recreate this on sc7280 chromebooks , will need to check on sc7180. This does not seem directly related to any of the hotplug changes though so needs to be checked separately. So please feel free to raise a gitlab bug for this and assign to me.

Thank you for checking with sc7280. I created https://gitlab.freedesktop.org/drm/msm/-/issues/25 and CCed you. I've also verified that the error persists with v6.4.0-rc4 + Kuogee's patch (just in case you may have tested on sc7280 with 6.4).

> >  https://patchwork.freedesktop.org/patch/538601/?series=118148&rev=3
> >  Apologies if you were not CCed on this, if a next version is CCed,
> >  will ask kuogee to cc you.
> >  Meanwhile, will be great if you can verify if it works for you and
> >  provide Tested-by tags.

I see Bjorn also tested the patch. As it fixes a serious USB-C DP regression which broke USB-C DP completely on lazor for v6.3, can it be included in an upcoming 6.3.y release?

Thank you
Leonard

Leonard Lausen June 22, 2023, 1:05 p.m. UTC | #11

> > https://patchwork.freedesktop.org/patch/538601/?series=118148&rev=3
> > Apologies if you were not CCed on this, if a next version is CCed,
> > will ask kuogee to cc you.
> > Meanwhile, will be great if you can verify if it works for you and
> > provide Tested-by tags.
> 
> I see Bjorn also tested the patch. As it fixes a serious USB-C DP regression which broke USB-C DP completely on lazor for v6.3, can it be included in upcoming 6.3.y release?

Kuogee's fix has since been committed to drm-tip on 2023-06-08 as a8e981ac2d0eb9dd53a4c173e29ca0c99c88abe2. Since it fixes a serious regression in the 6.3 and 6.4 kernels, can it be included in the stable releases?

Thank you
Leonard
