diff mbox series

drm/bridge: analogix_dp: Make PSR-disable non-blocking

Message ID 20211020161724.1.I67612ea073c3306c71b46a87be894f79707082df@changeid (mailing list archive)
State New, archived
Headers show
Series drm/bridge: analogix_dp: Make PSR-disable non-blocking | expand

Commit Message

Brian Norris Oct. 20, 2021, 11:17 p.m. UTC
Prior to commit 6c836d965bad ("drm/rockchip: Use the helpers for PSR"),
"PSR disable" used non-blocking analogix_dp_send_psr_spd(). The refactor
accidentally (?) set blocking=true.

This can cause upwards of 60-100ms of unneeded latency when exiting
self-refresh, which can cause very noticeable lag when, say, moving a
cursor.

Presumbaly it's OK to let the display finish exiting refresh in parallel
with clocking out the next video frames, so we shouldn't hold up the
atomic_enable() step. This also brings behavior in line with the
downstream ("mainline-derived") variant of the driver currently deployed
to Chrome OS Rockchip systems.

Tested on a Samsung Chromebook Plus (i.e., Rockchip RK3399 Gru Kevin).

Fixes: 6c836d965bad ("drm/rockchip: Use the helpers for PSR")
Cc: <stable@vger.kernel.org>
Cc: Zain Wang <wzz@rock-chips.com>
Cc: Tomasz Figa <tfiga@chromium.org>
Cc: Heiko Stuebner <heiko@sntech.de>
Cc: Sean Paul <seanpaul@chromium.org>
Signed-off-by: Brian Norris <briannorris@chromium.org>
---
CC list is partially constructed from the commit message of the Fixed
commit

 drivers/gpu/drm/bridge/analogix/analogix_dp_core.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Comments

Sean Paul Oct. 21, 2021, 12:40 a.m. UTC | #1
On Wed, Oct 20, 2021 at 04:17:28PM -0700, Brian Norris wrote:
> Prior to commit 6c836d965bad ("drm/rockchip: Use the helpers for PSR"),
> "PSR disable" used non-blocking analogix_dp_send_psr_spd(). The refactor
> accidentally (?) set blocking=true.

IIRC this wasn't accidental.

The reason it became synchronous was:
 - To avoid racing a subsequent PSR entry (if exit takes a long time)
 - To avoid racing disable/modeset
 - We're not displaying new content while exiting PSR anyways, so there is
   minimal utility in allowing frames to be submitted
 - We're lying to userspace telling them frames are on the screen when we're
   just dropping them on the floor

The actual latency gains from doing this synchronously are minimal since the
panel will display new content as soon as it can regardless of whether the
kernel is blocking. There is likely a perceptual difference, but that's only
because kernel is lying to userspace and skipping frames without consent.

Going back to the first line, it's entirely possible my memory is failing
and this was accidental!

Sean

> 
> This can cause upwards of 60-100ms of unneeded latency when exiting
> self-refresh, which can cause very noticeable lag when, say, moving a
> cursor.
> 
> Presumbaly it's OK to let the display finish exiting refresh in parallel
> with clocking out the next video frames, so we shouldn't hold up the
> atomic_enable() step. This also brings behavior in line with the
> downstream ("mainline-derived") variant of the driver currently deployed
> to Chrome OS Rockchip systems.
> 
> Tested on a Samsung Chromebook Plus (i.e., Rockchip RK3399 Gru Kevin).
> 
> Fixes: 6c836d965bad ("drm/rockchip: Use the helpers for PSR")
> Cc: <stable@vger.kernel.org>
> Cc: Zain Wang <wzz@rock-chips.com>
> Cc: Tomasz Figa <tfiga@chromium.org>
> Cc: Heiko Stuebner <heiko@sntech.de>
> Cc: Sean Paul <seanpaul@chromium.org>
> Signed-off-by: Brian Norris <briannorris@chromium.org>
> ---
> CC list is partially constructed from the commit message of the Fixed
> commit
> 
>  drivers/gpu/drm/bridge/analogix/analogix_dp_core.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/bridge/analogix/analogix_dp_core.c b/drivers/gpu/drm/bridge/analogix/analogix_dp_core.c
> index b7d2e4449cfa..fbe6eb9df310 100644
> --- a/drivers/gpu/drm/bridge/analogix/analogix_dp_core.c
> +++ b/drivers/gpu/drm/bridge/analogix/analogix_dp_core.c
> @@ -1055,7 +1055,7 @@ static int analogix_dp_disable_psr(struct analogix_dp_device *dp)
>  	psr_vsc.db[0] = 0;
>  	psr_vsc.db[1] = 0;
>  
> -	return analogix_dp_send_psr_spd(dp, &psr_vsc, true);
> +	return analogix_dp_send_psr_spd(dp, &psr_vsc, false);
>  }
>  
>  /*
> -- 
> 2.33.0.1079.g6e70778dc9-goog
>
Brian Norris Oct. 21, 2021, 1:23 a.m. UTC | #2
On Wed, Oct 20, 2021 at 5:40 PM Sean Paul <sean@poorly.run> wrote:
> On Wed, Oct 20, 2021 at 04:17:28PM -0700, Brian Norris wrote:
> > Prior to commit 6c836d965bad ("drm/rockchip: Use the helpers for PSR"),
> > "PSR disable" used non-blocking analogix_dp_send_psr_spd(). The refactor
> > accidentally (?) set blocking=true.
>
> IIRC this wasn't accidental.
>
> The reason it became synchronous was:
>  - To avoid racing a subsequent PSR entry (if exit takes a long time)

How did this work pre-commit-6c836d965bad then? I don't see any
provision for avoiding subsequent PSR entry. Or I guess that was
implicitly covered by PSR_FLUSH_TIMEOUT_MS, which means we allowed at
least 100ms between exit/entry each time, which was good enough? And
in the "new" implementation, that turned into a running average that
gets measured on each commit? So we're no longer guaranteed 100ms, and
it's even worse if we cheat the timing measurement?

I'm still not sure that "race" is truly a problem without consulting
some kind of hardware documentation though. It wouldn't surprise me if
these things are cancelable.

>  - To avoid racing disable/modeset
>  - We're not displaying new content while exiting PSR anyways, so there is
>    minimal utility in allowing frames to be submitted
>  - We're lying to userspace telling them frames are on the screen when we're
>    just dropping them on the floor
>
> The actual latency gains from doing this synchronously are minimal since the
> panel will display new content as soon as it can regardless of whether the
> kernel is blocking. There is likely a perceptual difference, but that's only
> because kernel is lying to userspace and skipping frames without consent.

Hmm, you might well be right about some of the first points (I'm still
learning the DRM framework), but I'm a bit skeptical that the
perceptual difference is "only" because we're cheating in some way.
I'm not doing science here, and it's certainly not a blinded test, but
I'm nearly certain this patch cuts out approx 50-80% of the cursor lag
I see without this patch (relative to the current Chrome OS kernel). I
don't see how cheating would produce a smoother cursor movement --
we'd still be dropping frames, and the movement would appear jumpy
somewhere along the way.

In any case, I'm absolutely certain that mainline Linux performs much
much worse with PSR than the current CrOS kernel, but there are some
other potential reasons for that, such as the lack of an
input-notifier [1].

> Going back to the first line, it's entirely possible my memory is failing
> and this was accidental!

Well either way, thanks for the notes. I'll see if I can get anywhere
on proving/disproving that they are relevant, or if they can be worked
around some other way; or perhaps I can regain the lost performance
some other way. It'll be a few days before I get around to that.

Brian

[1] This got locked up in "controversy":
https://patchwork.kernel.org/project/linux-arm-kernel/patch/20180405095000.9756-25-enric.balletbo@collabora.com/
Brian Norris Oct. 21, 2021, 1:41 a.m. UTC | #3
(Dropping Andrzej, because that address keeps bouncing. Does
MAINTAINERS and/or .mailmap need updating?)

Apologies for the double reply here, but I forgot to mention one last
thing for now:

On Wed, Oct 20, 2021 at 5:40 PM Sean Paul <sean@poorly.run> wrote:
>
> On Wed, Oct 20, 2021 at 04:17:28PM -0700, Brian Norris wrote:
> > Prior to commit 6c836d965bad ("drm/rockchip: Use the helpers for PSR"),
> > "PSR disable" used non-blocking analogix_dp_send_psr_spd(). The refactor
> > accidentally (?) set blocking=true.
>
> IIRC this wasn't accidental.

One other tip that made me think it was accidental was that today, the
|blocking| argument to analogix_dp_send_psr_spd() is always true. If
non-blocking support was intentionally dropped, it seemed like you
should have dropped the non-blocking code too. But that's a weak proof
of your intentions :)

Brian
Brian Norris Oct. 29, 2021, 11:10 p.m. UTC | #4
On Wed, Oct 20, 2021 at 06:23:35PM -0700, Brian Norris wrote:
> On Wed, Oct 20, 2021 at 5:40 PM Sean Paul <sean@poorly.run> wrote:
> > The actual latency gains from doing this synchronously are minimal since the
> > panel will display new content as soon as it can regardless of whether the
> > kernel is blocking. There is likely a perceptual difference, but that's only
> > because kernel is lying to userspace and skipping frames without consent.
> 
> Hmm, you might well be right about some of the first points (I'm still
> learning the DRM framework), but I'm a bit skeptical that the
> perceptual difference is "only" because we're cheating in some way.
> I'm not doing science here, and it's certainly not a blinded test, but
> I'm nearly certain this patch cuts out approx 50-80% of the cursor lag
> I see without this patch (relative to the current Chrome OS kernel). I
> don't see how cheating would produce a smoother cursor movement --
> we'd still be dropping frames, and the movement would appear jumpy
> somewhere along the way.

Aha, so I think I found {a,the} reason for some disagreement here:
looking at the eDP PSR spec, I see that while the current implementation
is looking for psr_state==DP_PSR_SINK_INACTIVE to signal PSR-exit
completion, the spec shows an intermediate state
(DP_PSR_SINK_ACTIVE_RESYNC == 4), where among other things, "the Sink
device must display the incoming active frames from the Source device
with no visible glitches and/or artifacts."

And it happens that we move to DP_PSR_SINK_ACTIVE_RESYNC somewhat
quickly (on the order of 20-40ms), while the move to
DP_PSR_SINK_INACTIVE is a good chunk longer (approx 60ms more). So
pre-commit-6c836d965bad might have been cheating a little (we'd claim
we're "done" about 20-40ms too early), but post-commit-6c836d965bad
we're waiting about 60ms too long.

I'll send v2 to make this block for DP_PSR_SINK_ACTIVE_RESYNC ||
DP_PSR_SINK_INACTIVE, which gets much or all of the same latency win,
and I'll try to document the reasons, etc., better.

I'll probably also include a patch to drop the 'blocking' parameter,
since it's unused, and gives the wrong idea about this state machine.

> In any case, I'm absolutely certain that mainline Linux performs much
> much worse with PSR than the current CrOS kernel, but there are some
> other potential reasons for that, such as the lack of an
> input-notifier [1].
...
> [1] This got locked up in "controversy":
> https://patchwork.kernel.org/project/linux-arm-kernel/patch/20180405095000.9756-25-enric.balletbo@collabora.com/

While I'm here: I also played with this a bit, and I still haven't
gotten all the details right, but I don't believe this alone will get
the latency wins we'd like. We still need something like the above.

Brian
diff mbox series

Patch

diff --git a/drivers/gpu/drm/bridge/analogix/analogix_dp_core.c b/drivers/gpu/drm/bridge/analogix/analogix_dp_core.c
index b7d2e4449cfa..fbe6eb9df310 100644
--- a/drivers/gpu/drm/bridge/analogix/analogix_dp_core.c
+++ b/drivers/gpu/drm/bridge/analogix/analogix_dp_core.c
@@ -1055,7 +1055,7 @@  static int analogix_dp_disable_psr(struct analogix_dp_device *dp)
 	psr_vsc.db[0] = 0;
 	psr_vsc.db[1] = 0;
 
-	return analogix_dp_send_psr_spd(dp, &psr_vsc, true);
+	return analogix_dp_send_psr_spd(dp, &psr_vsc, false);
 }
 
 /*