diff mbox series

[v2,1/2] drm/msm/dpu1: don't choke on disabling the writeback connector

Message ID 20240802-dpu-fix-wb-v2-1-7eac9eb8e895@linaro.org (mailing list archive)
State New, archived
Headers show
Series drm/msm/dpu: two fixes targeting 6.11 | expand

Commit Message

Dmitry Baryshkov Aug. 2, 2024, 7:47 p.m. UTC
During suspend/resume process all connectors are explicitly disabled and
then reenabled. However resume fails because of the connector_status check:

[ 1185.831970] [dpu error]connector not connected 3

It doesn't make sense to check for the Writeback connected status (and
other drivers don't perform such check), so drop the check.

Fixes: 71174f362d67 ("drm/msm/dpu: move writeback's atomic_check to dpu_writeback.c")
Cc: stable@vger.kernel.org
Reported-by: Leonard Lausen <leonard@lausen.nl>
Closes: https://gitlab.freedesktop.org/drm/msm/-/issues/57
Signed-off-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>
---
 drivers/gpu/drm/msm/disp/dpu1/dpu_writeback.c | 3 ---
 1 file changed, 3 deletions(-)

Comments

Leonard Lausen Aug. 5, 2024, 2:27 a.m. UTC | #1
Dear Dmitry,

Thank you for the patch. Unfortunately, the patch triggers a regression with
respect to DRM CRTC state handling. With the patch applied, suspending and
resuming a lazor sc7180 with external display connected, looses CRTC state on
resume and prevents applying a new CRTC state. Without the patch, CRTC state is
preserved across suspend and resume and it remains possible to change CRTC
settings after resume. This means the patch regresses the user experience,
preventing "Night Light" mode to work as expected. I've validated this on
v6.10.2 vs. v6.10.2 with this patch applied.

While the cause for the bug uncovered by this change is likely separate, given
it's impact, would it be prudent to delay the application of this patch until
the related bug is identified and fixed? Otherwise we would be fixing a dmesg
error message "[dpu error]connector not connected 3" that appears to do no harm
but thereby break more critical user visible behavior.

Best regards
Leonard

On 8/2/24 15:47, Dmitry Baryshkov wrote:
> During suspend/resume process all connectors are explicitly disabled and
> then reenabled. However resume fails because of the connector_status check:
> 
> [ 1185.831970] [dpu error]connector not connected 3
> 
> It doesn't make sense to check for the Writeback connected status (and
> other drivers don't perform such check), so drop the check.
> 
> Fixes: 71174f362d67 ("drm/msm/dpu: move writeback's atomic_check to dpu_writeback.c")
> Cc: stable@vger.kernel.org
> Reported-by: Leonard Lausen <leonard@lausen.nl>
> Closes: https://gitlab.freedesktop.org/drm/msm/-/issues/57
> Signed-off-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>
> ---
>  drivers/gpu/drm/msm/disp/dpu1/dpu_writeback.c | 3 ---
>  1 file changed, 3 deletions(-)
> 
> diff --git a/drivers/gpu/drm/msm/disp/dpu1/dpu_writeback.c b/drivers/gpu/drm/msm/disp/dpu1/dpu_writeback.c
> index 16f144cbc0c9..8ff496082902 100644
> --- a/drivers/gpu/drm/msm/disp/dpu1/dpu_writeback.c
> +++ b/drivers/gpu/drm/msm/disp/dpu1/dpu_writeback.c
> @@ -42,9 +42,6 @@ static int dpu_wb_conn_atomic_check(struct drm_connector *connector,
>  	if (!conn_state || !conn_state->connector) {
>  		DPU_ERROR("invalid connector state\n");
>  		return -EINVAL;
> -	} else if (conn_state->connector->status != connector_status_connected) {
> -		DPU_ERROR("connector not connected %d\n", conn_state->connector->status);
> -		return -EINVAL;
>  	}
>  
>  	crtc = conn_state->crtc;
>
Abhinav Kumar Aug. 5, 2024, 7:19 p.m. UTC | #2
On 8/2/2024 12:47 PM, Dmitry Baryshkov wrote:
> During suspend/resume process all connectors are explicitly disabled and
> then reenabled. However resume fails because of the connector_status check:
> 
> [ 1185.831970] [dpu error]connector not connected 3
> 
> It doesn't make sense to check for the Writeback connected status (and
> other drivers don't perform such check), so drop the check.
> 
> Fixes: 71174f362d67 ("drm/msm/dpu: move writeback's atomic_check to dpu_writeback.c")
> Cc: stable@vger.kernel.org
> Reported-by: Leonard Lausen <leonard@lausen.nl>
> Closes: https://gitlab.freedesktop.org/drm/msm/-/issues/57
> Signed-off-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>
> ---
>   drivers/gpu/drm/msm/disp/dpu1/dpu_writeback.c | 3 ---
>   1 file changed, 3 deletions(-)
> 
> diff --git a/drivers/gpu/drm/msm/disp/dpu1/dpu_writeback.c b/drivers/gpu/drm/msm/disp/dpu1/dpu_writeback.c
> index 16f144cbc0c9..8ff496082902 100644
> --- a/drivers/gpu/drm/msm/disp/dpu1/dpu_writeback.c
> +++ b/drivers/gpu/drm/msm/disp/dpu1/dpu_writeback.c
> @@ -42,9 +42,6 @@ static int dpu_wb_conn_atomic_check(struct drm_connector *connector,
>   	if (!conn_state || !conn_state->connector) {
>   		DPU_ERROR("invalid connector state\n");
>   		return -EINVAL;
> -	} else if (conn_state->connector->status != connector_status_connected) {
> -		DPU_ERROR("connector not connected %d\n", conn_state->connector->status);
> -		return -EINVAL;
>   	}

For this issue, do we hit the connector->force = DRM_FORCE_OFF path?

Because otherwise, writeback does not implement .detect() callback today 
so its always connected.

But if that was the case how come this error is only for writeback. Even 
DP has the same connected check in atomic_check()

Change seems fine with me because ideally this seems like a no-op to me 
because writeback connector is assumed to be always connected but the 
issue is missing some details here.

>   
>   	crtc = conn_state->crtc;
>
Dmitry Baryshkov Aug. 7, 2024, 10:44 a.m. UTC | #3
On August 5, 2024 9:27:39 AM GMT+07:00, Leonard Lausen <leonard@lausen.nl> wrote:
>Dear Dmitry,
>
>Thank you for the patch. Unfortunately, the patch triggers a regression with
>respect to DRM CRTC state handling. With the patch applied, suspending and
>resuming a lazor sc7180 with external display connected, looses CRTC state on
>resume and prevents applying a new CRTC state. Without the patch, CRTC state is
>preserved across suspend and resume and it remains possible to change CRTC
>settings after resume. This means the patch regresses the user experience,
>preventing "Night Light" mode to work as expected. I've validated this on
>v6.10.2 vs. v6.10.2 with this patch applied.
>

Could you please clarify, I was under the impression that currently whole suspend/resume is broken, so it's more than a dmesg message.

>While the cause for the bug uncovered by this change is likely separate, given
>it's impact, would it be prudent to delay the application of this patch until
>the related bug is identified and fixed? Otherwise we would be fixing a dmesg
>error message "[dpu error]connector not connected 3" that appears to do no harm
>but thereby break more critical user visible behavior.
>
>Best regards
>Leonard
>
>On 8/2/24 15:47, Dmitry Baryshkov wrote:
>> During suspend/resume process all connectors are explicitly disabled and
>> then reenabled. However resume fails because of the connector_status check:
>> 
>> [ 1185.831970] [dpu error]connector not connected 3
>> 
>> It doesn't make sense to check for the Writeback connected status (and
>> other drivers don't perform such check), so drop the check.
>> 
>> Fixes: 71174f362d67 ("drm/msm/dpu: move writeback's atomic_check to dpu_writeback.c")
>> Cc: stable@vger.kernel.org
>> Reported-by: Leonard Lausen <leonard@lausen.nl>
>> Closes: https://gitlab.freedesktop.org/drm/msm/-/issues/57
>> Signed-off-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>
>> ---
>>  drivers/gpu/drm/msm/disp/dpu1/dpu_writeback.c | 3 ---
>>  1 file changed, 3 deletions(-)
>> 
>> diff --git a/drivers/gpu/drm/msm/disp/dpu1/dpu_writeback.c b/drivers/gpu/drm/msm/disp/dpu1/dpu_writeback.c
>> index 16f144cbc0c9..8ff496082902 100644
>> --- a/drivers/gpu/drm/msm/disp/dpu1/dpu_writeback.c
>> +++ b/drivers/gpu/drm/msm/disp/dpu1/dpu_writeback.c
>> @@ -42,9 +42,6 @@ static int dpu_wb_conn_atomic_check(struct drm_connector *connector,
>>  	if (!conn_state || !conn_state->connector) {
>>  		DPU_ERROR("invalid connector state\n");
>>  		return -EINVAL;
>> -	} else if (conn_state->connector->status != connector_status_connected) {
>> -		DPU_ERROR("connector not connected %d\n", conn_state->connector->status);
>> -		return -EINVAL;
>>  	}
>>  
>>  	crtc = conn_state->crtc;
>> 
>
Dmitry Baryshkov Aug. 7, 2024, 10:46 a.m. UTC | #4
On August 6, 2024 2:19:46 AM GMT+07:00, Abhinav Kumar <quic_abhinavk@quicinc.com> wrote:
>
>
>On 8/2/2024 12:47 PM, Dmitry Baryshkov wrote:
>> During suspend/resume process all connectors are explicitly disabled and
>> then reenabled. However resume fails because of the connector_status check:
>> 
>> [ 1185.831970] [dpu error]connector not connected 3
>> 
>> It doesn't make sense to check for the Writeback connected status (and
>> other drivers don't perform such check), so drop the check.
>> 
>> Fixes: 71174f362d67 ("drm/msm/dpu: move writeback's atomic_check to dpu_writeback.c")
>> Cc: stable@vger.kernel.org
>> Reported-by: Leonard Lausen <leonard@lausen.nl>
>> Closes: https://gitlab.freedesktop.org/drm/msm/-/issues/57
>> Signed-off-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>
>> ---
>>   drivers/gpu/drm/msm/disp/dpu1/dpu_writeback.c | 3 ---
>>   1 file changed, 3 deletions(-)
>> 
>> diff --git a/drivers/gpu/drm/msm/disp/dpu1/dpu_writeback.c b/drivers/gpu/drm/msm/disp/dpu1/dpu_writeback.c
>> index 16f144cbc0c9..8ff496082902 100644
>> --- a/drivers/gpu/drm/msm/disp/dpu1/dpu_writeback.c
>> +++ b/drivers/gpu/drm/msm/disp/dpu1/dpu_writeback.c
>> @@ -42,9 +42,6 @@ static int dpu_wb_conn_atomic_check(struct drm_connector *connector,
>>   	if (!conn_state || !conn_state->connector) {
>>   		DPU_ERROR("invalid connector state\n");
>>   		return -EINVAL;
>> -	} else if (conn_state->connector->status != connector_status_connected) {
>> -		DPU_ERROR("connector not connected %d\n", conn_state->connector->status);
>> -		return -EINVAL;
>>   	}
>
>For this issue, do we hit the connector->force = DRM_FORCE_OFF path?

It was hit during the suspend/resume, so yes, it is a forced off, but by the different means.

>
>Because otherwise, writeback does not implement .detect() callback today so its always connected.

It is undefined/unkown (3), not connected (1)

>
>But if that was the case how come this error is only for writeback. Even DP has the same connected check in atomic_check()
>
>Change seems fine with me because ideally this seems like a no-op to me because writeback connector is assumed to be always connected but the issue is missing some details here.
>
>>     	crtc = conn_state->crtc;
>>
Leonard Lausen Aug. 7, 2024, 3:41 p.m. UTC | #5
On 8/7/24 06:44, Dmitry Baryshkov wrote:> Could you please clarify, I was under the impression that currently whole suspend/resume is broken, so it's more than a dmesg message.

71174f362d67 specifically, or v6.9 more broadly regress in that we get "[dpu
error]connector not connected 3" and "[drm:drm_mode_config_helper_resume]
*ERROR* Failed to resume (-22)" if suspending and resuming the system while
external display is connected over USB-C DP. Suspend and resume itself
still works, and the external display also works after the resume, albeit
perhaps with a small delay due to the dpu error. This is also mentioned in
the issue description of https://gitlab.freedesktop.org/drm/msm/-/issues/57.
So while suspend/resume isn't fully broken, the error is still unexpected and
I thus bisected and identified 71174f362d67 as the first commit to trigger it.
While your patch avoids the dpu/drm error, it triggers issue with the CRTC state,
breaking the CRTC functionality after resume.

Might we be facing a race condition here, which is accidentally exposed by
71174f362d67 but requires a separate fix?
György Kurucz Aug. 30, 2024, 5:36 p.m. UTC | #6
Dear Dmitry,

For context, I have a Lenovo Yoga Slim 7x laptop, and was having issues 
with the display staying black after sleep. As a workaround, I could 
switch to a different VT and back.

> [ 1185.831970] [dpu error]connector not connected 3

I can confirm that I was seeing this exact error message as well.

>   	if (!conn_state || !conn_state->connector) {
>   		DPU_ERROR("invalid connector state\n");
>   		return -EINVAL;
> -	} else if (conn_state->connector->status != connector_status_connected) {
> -		DPU_ERROR("connector not connected %d\n", conn_state->connector->status);
> -		return -EINVAL;
>   	}
>   
>   	crtc = conn_state->crtc;

After applying this patch, the screen now resumes successfully, and the 
errors are gone.

(For future reference, I am using this kernel: 
https://github.com/jhovold/linux/commits/wip/x1e80100-6.11-rc5/, commit 
cc2cb95cc77fec43edd407c993122085fa8c2dd7.)

Tested-by: György Kurucz <me@kuruczgy.com>

Best regards,
György
Leonard Lausen Aug. 31, 2024, 2:53 p.m. UTC | #7
Dear György,

On 8/30/24 13:36, György Kurucz wrote:
> For context, I have a Lenovo Yoga Slim 7x laptop, and was having issues with the display staying black after sleep. As a workaround, I could switch to a different VT and back.

Do you observe this issue on every suspend-resume cycle? On sc7180 trogdor lazor, I do observe the same "display staying black after sleep" in some of the suspend-resume cycles. I have not been able to observe a pattern with respect to when the issue occurs and when it does not. "[drm:drm_mode_config_helper_resume] *ERROR* Failed to resume (-22)" is printed in either case. Switching to a different VT and back works to restore display functionality after those suspend-resume cycles that experience the issue.

On sc7180 lazor, I do observe that this patch deterministically breaks restoring the CRTC state and functionality after resume. Can you please validate if you observe the same on Lenovo Yoga Slim 7x? Specifically, try set Night Light in your desktop environment to "Always On" and observe whether the screen remains in "Night Light" mode after resume. For lazor, "Night Light" is breaks after applying this patch and even manually toggling it off and on after resume does not restore "Night Light" / CRTC functionality.

Best regards
Leonard
Leonard Lausen Aug. 31, 2024, 3:47 p.m. UTC | #8
(Resend as there was an email SPF record issue)

Dear György,

On 8/30/24 13:36, György Kurucz wrote:
> For context, I have a Lenovo Yoga Slim 7x laptop, and was having issues with the display staying black after sleep. As a workaround, I could switch to a different VT and back.

Do you observe this issue on every suspend-resume cycle? On sc7180 trogdor lazor, I do observe the same "display staying black after sleep" in some of the suspend-resume cycles. I have not been able to observe a pattern with respect to when the issue occurs and when it does not. "[drm:drm_mode_config_helper_resume] *ERROR* Failed to resume (-22)" is printed in either case. Switching to a different VT and back works to restore display functionality after those suspend-resume cycles that experience the issue.

On sc7180 lazor, I do observe that this patch deterministically breaks restoring the CRTC state and functionality after resume. Can you please validate if you observe the same on Lenovo Yoga Slim 7x? Specifically, try set Night Light in your desktop environment to "Always On" and observe whether the screen remains in "Night Light" mode after resume. For lazor, "Night Light" is breaks after applying this patch and even manually toggling it off and on after resume does not restore "Night Light" / CRTC functionality.

Best regards
Leonard
György Kurucz Aug. 31, 2024, 6:46 p.m. UTC | #9
Dear Leonard,

> Do you observe this issue on every suspend-resume cycle?

I just did 10 suspend/resume cycles in a row to double check, and 
without this patch the screen never comes back (always have to switch VT 
back-and-forth to bring it back). The

[dpu error]connector not connected 3
[drm:drm_mode_config_helper_resume] *ERROR* Failed to resume (-22)

pair of error messages also consistently appears after all resumes.

Though I think e.g. Rob Clark reported that suspend/resume already works 
properly for him without this patch, so this experience is not universal 
on the Yoga Slim 7x.

> On sc7180 lazor, I do observe that this patch deterministically breaks restoring the CRTC state and functionality after resume. Can you please validate if you observe the same on Lenovo Yoga Slim 7x? Specifically, try set Night Light in your desktop environment to "Always On" and observe whether the screen remains in "Night Light" mode after resume. For lazor, "Night Light" is breaks after applying this patch and even manually toggling it off and on after resume does not restore "Night Light" / CRTC functionality.

Unfortunately I cannot test this, as color temperature adjustments seems 
to be completely non-functional for me in the first place. For color 
temperature adjustment, I use gammastep on my machines, which uses 
wlr_gamma_control_unstable_v1 under the hood. It outputs the following 
warnings:

Warning: Zero outputs support gamma adjustment.
Warning: 1/1 output(s) do not support gamma adjustment.

I haven't dug deeper into the cause yet, based on these it seems that 
wlroots isn't detecting the display as being gamma-adjustable in the 
first place.

Best regards,
György
Leonard Lausen Aug. 31, 2024, 7 p.m. UTC | #10
Dear György,

>> Do you observe this issue on every suspend-resume cycle?
> 
> I just did 10 suspend/resume cycles in a row to double check, and without this patch the screen never comes back (always have to switch VT back-and-forth to bring it back). The
> 
> [dpu error]connector not connected 3
> [drm:drm_mode_config_helper_resume] *ERROR* Failed to resume (-22)
> 
> pair of error messages also consistently appears after all resumes.
> 
> Though I think e.g. Rob Clark reported that suspend/resume already works properly for him without this patch, so this experience is not universal on the Yoga Slim 7x.

Ack. Do you mean that Rob Clark also uses Yoga Slim 7x but does not face the "screen never comes back (always have to switch VT back-and-forth to bring it back)" issue?

>> On sc7180 lazor, I do observe that this patch deterministically breaks restoring the CRTC state and functionality after resume. Can you please validate if you observe the same on Lenovo Yoga Slim 7x? Specifically, try set Night Light in your desktop environment to "Always On" and observe whether the screen remains in "Night Light" mode after resume. For lazor, "Night Light" is breaks after applying this patch and even manually toggling it off and on after resume does not restore "Night Light" / CRTC functionality.
> 
> Unfortunately I cannot test this, as color temperature adjustments seems to be completely non-functional for me in the first place. For color temperature adjustment, I use gammastep on my machines, which uses wlr_gamma_control_unstable_v1 under the hood. It outputs the following warnings:
> 
> Warning: Zero outputs support gamma adjustment.
> Warning: 1/1 output(s) do not support gamma adjustment.
> 
> I haven't dug deeper into the cause yet, based on these it seems that wlroots isn't detecting the display as being gamma-adjustable in the first place.

The cause is simple: Qualcomm SoCs don't implement GAMMA_LUT support. Your desktop environment needs to use Color Transform Matrix (CTM) on ARM/QCom devices. You can refer to https://bugs.kde.org/show_bug.cgi?id=455720 for further details. It would be great if you can validate whether this patch breaks CRTC state (which includes the CTM state) on Yoga Slim 7x, or whether that is specific to the trogdor lazor (Chromebook Acer Spin 513), though it may require you to install KDE. Gnome does not support CTM yet (https://gitlab.gnome.org/GNOME/mutter/-/issues/2318).

Best regards
Leonard
György Kurucz Aug. 31, 2024, 9:50 p.m. UTC | #11
Dear Leonard,

I installed KDE. First, I ran it with the my regular kernel without this 
patch. The first interesting thing I notice is that the screen *does* 
come back after resume. (The error messages are still present though.)

> Ack. Do you mean that Rob Clark also uses Yoga Slim 7x but does not face the "screen never comes back (always have to switch VT back-and-forth to bring it back)" issue?

Yes, at least that's what I gathered from our conversations on IRC. But 
with the above in mind, I now suspect that this comes down to desktop 
environment differences.

> It would be great if you can validate whether this patch breaks CRTC state (which includes the CTM state) on Yoga Slim 7x, or whether that is specific to the trogdor lazor (Chromebook Acer Spin 513), though it may require you to install KDE.

Well "Night Light" seems to be even more broken under KDE. I went into 
System Settings, set it to "Always on night light", and tried to adjust 
the temperature slider. While adjusting the slider, the screen goes 
black, and only comes back after a few seconds. The color temperature 
does not change, no matter what I change the slider to. Afterwards I 
tried with this patch as well, but it produces the exact same behavior.

Best regards,
György
Johan Hovold Nov. 19, 2024, 1:52 p.m. UTC | #12
On Fri, Aug 30, 2024 at 07:36:32PM +0200, György Kurucz wrote:

> For context, I have a Lenovo Yoga Slim 7x laptop, and was having issues 
> with the display staying black after sleep. As a workaround, I could 
> switch to a different VT and back.
> 
> > [ 1185.831970] [dpu error]connector not connected 3
> 
> I can confirm that I was seeing this exact error message as well.
> 
> >   	if (!conn_state || !conn_state->connector) {
> >   		DPU_ERROR("invalid connector state\n");
> >   		return -EINVAL;
> > -	} else if (conn_state->connector->status != connector_status_connected) {
> > -		DPU_ERROR("connector not connected %d\n", conn_state->connector->status);
> > -		return -EINVAL;
> >   	}
> >   
> >   	crtc = conn_state->crtc;
> 
> After applying this patch, the screen now resumes successfully, and the 
> errors are gone.

I'm seeing the same issue as György on the x1e80100 CRD and Lenovo
ThinkPad T14s. Without this patch, the internal display fails to resume
properly (switching VT brings it back) and the following errors are
logged:

	[dpu error]connector not connected 3
	[drm:drm_mode_config_helper_resume [drm_kms_helper]] *ERROR* Failed to resume (-22)

I see the same symptoms with Xorg as well as sway.

Can we please get this fixed and backported as soon as possible?

Even if there are further issues with some "Night Light" functionality
on one machine, keeping this bug as workaround does not seem warranted
given that it breaks basic functionality for users.

The x1e80100 is the only platform I have access to with a writeback
connector, but this regression potentially affects a whole host of older
platforms as well.

Johan
Leonard Lausen Nov. 19, 2024, 2:33 p.m. UTC | #13
Hi Johan,

> I'm seeing the same issue as György on the x1e80100 CRD and Lenovo
> ThinkPad T14s. Without this patch, the internal display fails to resume
> properly (switching VT brings it back) and the following errors are
> logged:
> 
> 	[dpu error]connector not connected 3
> 	[drm:drm_mode_config_helper_resume [drm_kms_helper]] *ERROR* Failed to resume (-22)
> 
> I see the same symptoms with Xorg as well as sway.

The issue of "internal display fails to resume properly (switching VT brings it back)"
also affects sc7180 platform during some resumes. Do you see the issue consistently
during every resume?

> Can we please get this fixed and backported as soon as possible?
> 
> Even if there are further issues with some "Night Light" functionality
> on one machine, keeping this bug as workaround does not seem warranted
> given that it breaks basic functionality for users.

I suspect this is not about "further issues with some 'Night Light' functionality
on one machine", but rather a more fundamental issue or race condition in the qcom
DRM devices stack, that is exposed when applying this patch. With this patch applied
DRM device state is lost after resume and setting the state is no longer possible.
Lots of kernel errors are printed if attempting to set DRM state such as the
Color Transform Matrix, when running a kernel with this patch applied.
Back in July 2024 I tested this patch on top of 6.9.8 and next-20240709,
observing below snippet being logged tens of times:

[drm:_dpu_rm_check_lm_and_get_connected_blks] [dpu error]failed to get dspp on lm 0
[drm:_dpu_rm_make_reservation] [dpu error]unable to find appropriate mixers
[drm:dpu_rm_reserve] [dpu error]failed to reserve hw resources: -119

Full logs are attached at https://gitlab.freedesktop.org/drm/msm/-/issues/58.

> The x1e80100 is the only platform I have access to with a writeback
> connector, but this regression potentially affects a whole host of older
> platforms as well.

Have you attempted setting CTM or other DRM state when running with this patch?

Best regards
Leonard
Johan Hovold Nov. 19, 2024, 3:11 p.m. UTC | #14
On Tue, Nov 19, 2024 at 09:33:26AM -0500, Leonard Lausen wrote:

> > I'm seeing the same issue as György on the x1e80100 CRD and Lenovo
> > ThinkPad T14s. Without this patch, the internal display fails to resume
> > properly (switching VT brings it back) and the following errors are
> > logged:
> > 
> > 	[dpu error]connector not connected 3
> > 	[drm:drm_mode_config_helper_resume [drm_kms_helper]] *ERROR* Failed to resume (-22)
> > 
> > I see the same symptoms with Xorg as well as sway.
> 
> The issue of "internal display fails to resume properly (switching VT brings it back)"
> also affects sc7180 platform during some resumes. Do you see the issue consistently
> during every resume?

Yes, it happens on every suspend cycle here.

I didn't notice the issue initially as fbdev does not seem to be
affected, and I've been running with this patch applied to suppress the
resume errors since it was posted.

> > Can we please get this fixed and backported as soon as possible?
> > 
> > Even if there are further issues with some "Night Light" functionality
> > on one machine, keeping this bug as workaround does not seem warranted
> > given that it breaks basic functionality for users.
> 
> I suspect this is not about "further issues with some 'Night Light' functionality
> on one machine", but rather a more fundamental issue or race condition in the qcom
> DRM devices stack, that is exposed when applying this patch. With this patch applied
> DRM device state is lost after resume and setting the state is no longer possible.
> Lots of kernel errors are printed if attempting to set DRM state such as the
> Color Transform Matrix, when running a kernel with this patch applied.
> Back in July 2024 I tested this patch on top of 6.9.8 and next-20240709,
> observing below snippet being logged tens of times:
> 
> [drm:_dpu_rm_check_lm_and_get_connected_blks] [dpu error]failed to get dspp on lm 0
> [drm:_dpu_rm_make_reservation] [dpu error]unable to find appropriate mixers
> [drm:dpu_rm_reserve] [dpu error]failed to reserve hw resources: -119
> 
> Full logs are attached at https://gitlab.freedesktop.org/drm/msm/-/issues/58.

I would not be surprised if there are further issues here, but we can't
just leave things completely broken as they currently are.

> > The x1e80100 is the only platform I have access to with a writeback
> > connector, but this regression potentially affects a whole host of older
> > platforms as well.
> 
> Have you attempted setting CTM or other DRM state when running with this patch?

Nope, I just want basic suspend to work.

Johan
Leonard Lausen Nov. 20, 2024, 3:02 a.m. UTC | #15
>> The issue of "internal display fails to resume properly (switching VT brings it back)"
>> also affects sc7180 platform during some resumes. Do you see the issue consistently
>> during every resume?
> 
> Yes, it happens on every suspend cycle here.
> 
> I didn't notice the issue initially as fbdev does not seem to be
> affected, and I've been running with this patch applied to suppress the
> resume errors since it was posted.

Ok. Then situation is worse on x1e80100 than sc7180, where the issue only
occurs sporadically.

>>> The x1e80100 is the only platform I have access to with a writeback
>>> connector, but this regression potentially affects a whole host of older
>>> platforms as well.
>>
>> Have you attempted setting CTM or other DRM state when running with this patch?
> 
> Nope, I just want basic suspend to work.

Ok.

Given my previous testing and finding of this patch causing unexplained CRTC
CTM regression was back in July on 6.9.8 and next-20240709, I went ahead and
tested on 6.10.14, 6.11.9 and 6.12 as well. To recall, the problematic behavior
observed with this patch before was that CRTC CTM state would be lost after
suspend with external display attached and re-setting the state was no longer
possible after resume. (The "failed to get dspp on lm 0" etc. errors mentioned
in my earlier email today were not associated with CRTC state issue, but
actually occur with and without this patch. Apologies for misstating, given the
elapsed time):

The finding is that while 6.10.14 with this patch applied still suffers from
that regression, 6.11.9 and 6.12 do not face the CRTC state regression.
Therefore, whatever issue the patch uncovered in older kernels and which
justified not merging it before due to regressing basic CTM functionality, is
now fixed. The patch should be good to merge and backport to 6.11, but from my
perspective should not be backported to older kernels unless the interaction
with the DRM CRTC state issue is understood and an associated fix backported as
well.

I also confirmed that the patch (still) fixes the
"[drm:drm_mode_config_helper_resume] *ERROR* Failed to resume (-22)"
error upon resume.

Tested-by: Leonard Lausen <leonard@lausen.nl> # on sc7180 lazor

Thank you
Leonard
Johan Hovold Nov. 20, 2024, 8:32 a.m. UTC | #16
On Tue, Nov 19, 2024 at 10:02:33PM -0500, Leonard Lausen wrote:

> The finding is that while 6.10.14 with this patch applied still suffers from
> that regression, 6.11.9 and 6.12 do not face the CRTC state regression.
> Therefore, whatever issue the patch uncovered in older kernels and which
> justified not merging it before due to regressing basic CTM functionality, is
> now fixed. The patch should be good to merge and backport to 6.11, but from my
> perspective should not be backported to older kernels unless the interaction
> with the DRM CRTC state issue is understood and an associated fix backported as
> well.

Thanks for testing. The 6.9 and 6.10 stable trees are EOL and backporting
to 6.11 should not cause any trouble then.

Johan
Johan Hovold Nov. 20, 2024, 8:39 a.m. UTC | #17
On Fri, Aug 02, 2024 at 10:47:33PM +0300, Dmitry Baryshkov wrote:
> During suspend/resume process all connectors are explicitly disabled and
> then reenabled. However resume fails because of the connector_status check:
> 
> [ 1185.831970] [dpu error]connector not connected 3

Please also include the follow-on resume error. I'm seeing:

	[dpu error]connector not connected 3
	[drm:drm_mode_config_helper_resume [drm_kms_helper]] *ERROR* Failed to resume (-22)

and say something about that this can prevent displays from being
enabled on resume in some setups (preferably with an explanation why if
you have one).

> It doesn't make sense to check for the Writeback connected status (and
> other drivers don't perform such check), so drop the check.
> 
> Fixes: 71174f362d67 ("drm/msm/dpu: move writeback's atomic_check to dpu_writeback.c")

I noticed that the implementation had this status check also before
71174f362d67 ("drm/msm/dpu: move writeback's atomic_check to
dpu_writeback.c").

Why did this not cause any trouble back then? Or is this not the right
Fixes tag?

> Cc: stable@vger.kernel.org
> Reported-by: Leonard Lausen <leonard@lausen.nl>
> Closes: https://gitlab.freedesktop.org/drm/msm/-/issues/57

Perhaps you can include mine an György's reports here too.

> Signed-off-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>

With the above addressed:

Reviewed-by: Johan Hovold <johan+linaro@kernel.org>
Tested-by: Johan Hovold <johan+linaro@kernel.org>

Johan
Johan Hovold Dec. 6, 2024, 10:40 a.m. UTC | #18
Hi Dmitry,

On Wed, Nov 20, 2024 at 09:39:27AM +0100, Johan Hovold wrote:
> On Fri, Aug 02, 2024 at 10:47:33PM +0300, Dmitry Baryshkov wrote:
> > During suspend/resume process all connectors are explicitly disabled and
> > then reenabled. However resume fails because of the connector_status check:
> > 
> > [ 1185.831970] [dpu error]connector not connected 3
> 
> Please also include the follow-on resume error. I'm seeing:
> 
> 	[dpu error]connector not connected 3
> 	[drm:drm_mode_config_helper_resume [drm_kms_helper]] *ERROR* Failed to resume (-22)
> 
> and say something about that this can prevent displays from being
> enabled on resume in some setups (preferably with an explanation why if
> you have one).
> 
> > It doesn't make sense to check for the Writeback connected status (and
> > other drivers don't perform such check), so drop the check.
> > 
> > Fixes: 71174f362d67 ("drm/msm/dpu: move writeback's atomic_check to dpu_writeback.c")
> 
> I noticed that the implementation had this status check also before
> 71174f362d67 ("drm/msm/dpu: move writeback's atomic_check to
> dpu_writeback.c").
> 
> Why did this not cause any trouble back then? Or is this not the right
> Fixes tag?
> 
> > Cc: stable@vger.kernel.org
> > Reported-by: Leonard Lausen <leonard@lausen.nl>
> > Closes: https://gitlab.freedesktop.org/drm/msm/-/issues/57
> 
> Perhaps you can include mine an György's reports here too.
> 
> > Signed-off-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>
> 
> With the above addressed:
> 
> Reviewed-by: Johan Hovold <johan+linaro@kernel.org>
> Tested-by: Johan Hovold <johan+linaro@kernel.org>

It's been over two weeks and I'm still waiting on a reply from you.

Can you please respin the patch as suggested above so that we can get
this merged ASAP to fix suspend on X1E which has been broken since at
least early summer.

Johan
diff mbox series

Patch

diff --git a/drivers/gpu/drm/msm/disp/dpu1/dpu_writeback.c b/drivers/gpu/drm/msm/disp/dpu1/dpu_writeback.c
index 16f144cbc0c9..8ff496082902 100644
--- a/drivers/gpu/drm/msm/disp/dpu1/dpu_writeback.c
+++ b/drivers/gpu/drm/msm/disp/dpu1/dpu_writeback.c
@@ -42,9 +42,6 @@  static int dpu_wb_conn_atomic_check(struct drm_connector *connector,
 	if (!conn_state || !conn_state->connector) {
 		DPU_ERROR("invalid connector state\n");
 		return -EINVAL;
-	} else if (conn_state->connector->status != connector_status_connected) {
-		DPU_ERROR("connector not connected %d\n", conn_state->connector->status);
-		return -EINVAL;
 	}
 
 	crtc = conn_state->crtc;