[6/8] drm/bochs: phase 3: provide a custom ->atomic_commit implementation
diff mbox

Message ID 1437049241-11972-7-git-send-email-zhjwpku@gmail.com
State New
Headers show

Commit Message

John Hunter July 16, 2015, 12:20 p.m. UTC
From: Zhao Junwang <zhjwpku@gmail.com>

This supports the asynchronous commits, required for page-flipping
Since it's virtual hw it's ok to commit async stuff right away, we
never have to wait for vblank.

Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Zhao Junwang <zhjwpku@gmail.com>
---
 drivers/gpu/drm/bochs/bochs_mm.c |    9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)

Comments

Pekka Paalanen July 17, 2015, 6:08 a.m. UTC | #1
On Thu, 16 Jul 2015 20:20:39 +0800
John Hunter <zhjwpku@gmail.com> wrote:

> From: Zhao Junwang <zhjwpku@gmail.com>
> 
> This supports the asynchronous commits, required for page-flipping
> Since it's virtual hw it's ok to commit async stuff right away, we
> never have to wait for vblank.

Hi,

in theory, yes. This is what a patch to bochs implemented not too long
ago, so AFAIK you are only replicating the existing behaviour.

However, if userspace doing an async commit (or sync, I suppose) does
not incur any waits in the kernel in e.g. sending the page flip event,
then flip driven programs (e.g. a Wayland compositor, say, Weston)
will be running its rendering loop as a busy-loop, because the kernel
does not throttle it to the (virtual) display refresh rate.

This will cause maximal CPU usage and poor user experience as
everything else needs to fight for CPU time and event dispatch to get
through, like input.

I would hope someone could do a follow-up to implement a refresh cycle
emulation based on a clock. Userspace expects page flips to happen at
most at refresh rate when asking for vblank-synced flips. It's only
natural for userspace to drive its rendering loop based on the vblank
cycle.


Thanks,
pq

> Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
> Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
> Signed-off-by: Zhao Junwang <zhjwpku@gmail.com>
> ---
>  drivers/gpu/drm/bochs/bochs_mm.c |    9 ++++++++-
>  1 file changed, 8 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/bochs/bochs_mm.c b/drivers/gpu/drm/bochs/bochs_mm.c
> index c1d819c..37ac2ca 100644
> --- a/drivers/gpu/drm/bochs/bochs_mm.c
> +++ b/drivers/gpu/drm/bochs/bochs_mm.c
> @@ -545,8 +545,15 @@ bochs_user_framebuffer_create(struct drm_device *dev,
>  	return &bochs_fb->base;
>  }
>  
> +static int bochs_atomic_commit(struct drm_device *dev,
> +			     struct drm_atomic_state *a,
> +			     bool async)
> +{
> +	return drm_atomic_helper_commit(dev, a, false);
> +}
> +
>  const struct drm_mode_config_funcs bochs_mode_funcs = {
>  	.fb_create = bochs_user_framebuffer_create,
>  	.atomic_check = drm_atomic_helper_check,
> -	.atomic_commit = drm_atomic_helper_commit,
> +	.atomic_commit = bochs_atomic_commit,
>  };
John Hunter July 19, 2015, 11:56 p.m. UTC | #2
Hi Pekka,

Thanks for the information, I will talk to my mentor Daniel and try to find
out
what I can do about this.

Cheers,
Zhao Junwang

On Fri, Jul 17, 2015 at 2:08 PM, Pekka Paalanen <ppaalanen@gmail.com> wrote:

> On Thu, 16 Jul 2015 20:20:39 +0800
> John Hunter <zhjwpku@gmail.com> wrote:
>
> > From: Zhao Junwang <zhjwpku@gmail.com>
> >
> > This supports the asynchronous commits, required for page-flipping
> > Since it's virtual hw it's ok to commit async stuff right away, we
> > never have to wait for vblank.
>
> Hi,
>
> in theory, yes. This is what a patch to bochs implemented not too long
> ago, so AFAIK you are only replicating the existing behaviour.
>
> However, if userspace doing an async commit (or sync, I suppose) does
> not incur any waits in the kernel in e.g. sending the page flip event,
> then flip driven programs (e.g. a Wayland compositor, say, Weston)
> will be running its rendering loop as a busy-loop, because the kernel
> does not throttle it to the (virtual) display refresh rate.
>
> This will cause maximal CPU usage and poor user experience as
> everything else needs to fight for CPU time and event dispatch to get
> through, like input.
>
> I would hope someone could do a follow-up to implement a refresh cycle
> emulation based on a clock. Userspace expects page flips to happen at
> most at refresh rate when asking for vblank-synced flips. It's only
> natural for userspace to drive its rendering loop based on the vblank
> cycle.
>
>
> Thanks,
> pq
>
> > Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
> > Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
> > Signed-off-by: Zhao Junwang <zhjwpku@gmail.com>
> > ---
> >  drivers/gpu/drm/bochs/bochs_mm.c |    9 ++++++++-
> >  1 file changed, 8 insertions(+), 1 deletion(-)
> >
> > diff --git a/drivers/gpu/drm/bochs/bochs_mm.c
> b/drivers/gpu/drm/bochs/bochs_mm.c
> > index c1d819c..37ac2ca 100644
> > --- a/drivers/gpu/drm/bochs/bochs_mm.c
> > +++ b/drivers/gpu/drm/bochs/bochs_mm.c
> > @@ -545,8 +545,15 @@ bochs_user_framebuffer_create(struct drm_device
> *dev,
> >       return &bochs_fb->base;
> >  }
> >
> > +static int bochs_atomic_commit(struct drm_device *dev,
> > +                          struct drm_atomic_state *a,
> > +                          bool async)
> > +{
> > +     return drm_atomic_helper_commit(dev, a, false);
> > +}
> > +
> >  const struct drm_mode_config_funcs bochs_mode_funcs = {
> >       .fb_create = bochs_user_framebuffer_create,
> >       .atomic_check = drm_atomic_helper_check,
> > -     .atomic_commit = drm_atomic_helper_commit,
> > +     .atomic_commit = bochs_atomic_commit,
> >  };
>
>
Stéphane Marchesin July 20, 2015, 12:20 a.m. UTC | #3
On Thu, Jul 16, 2015 at 11:08 PM, Pekka Paalanen <ppaalanen@gmail.com> wrote:
>
> On Thu, 16 Jul 2015 20:20:39 +0800
> John Hunter <zhjwpku@gmail.com> wrote:
>
> > From: Zhao Junwang <zhjwpku@gmail.com>
> >
> > This supports the asynchronous commits, required for page-flipping
> > Since it's virtual hw it's ok to commit async stuff right away, we
> > never have to wait for vblank.
>
> Hi,
>
> in theory, yes. This is what a patch to bochs implemented not too long
> ago, so AFAIK you are only replicating the existing behaviour.
>
> However, if userspace doing an async commit (or sync, I suppose) does
> not incur any waits in the kernel in e.g. sending the page flip event,
> then flip driven programs (e.g. a Wayland compositor, say, Weston)
> will be running its rendering loop as a busy-loop, because the kernel
> does not throttle it to the (virtual) display refresh rate.
>
> This will cause maximal CPU usage and poor user experience as
> everything else needs to fight for CPU time and event dispatch to get
> through, like input.
>
> I would hope someone could do a follow-up to implement a refresh cycle
> emulation based on a clock. Userspace expects page flips to happen at
> most at refresh rate when asking for vblank-synced flips. It's only
> natural for userspace to drive its rendering loop based on the vblank
> cycle.


I've been asking myself the same question (for the UDL driver) and I'm
not sure if this policy should go in the kernel. After all, there
could be legitimate reasons for user space to render lots of frames
per second. It seems to me that if user space doesn't want too many
fps, it should just throttle itself.

Stéphane


>
>
>
> Thanks,
> pq
>
> > Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
> > Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
> > Signed-off-by: Zhao Junwang <zhjwpku@gmail.com>
> > ---
> >  drivers/gpu/drm/bochs/bochs_mm.c |    9 ++++++++-
> >  1 file changed, 8 insertions(+), 1 deletion(-)
> >
> > diff --git a/drivers/gpu/drm/bochs/bochs_mm.c b/drivers/gpu/drm/bochs/bochs_mm.c
> > index c1d819c..37ac2ca 100644
> > --- a/drivers/gpu/drm/bochs/bochs_mm.c
> > +++ b/drivers/gpu/drm/bochs/bochs_mm.c
> > @@ -545,8 +545,15 @@ bochs_user_framebuffer_create(struct drm_device *dev,
> >       return &bochs_fb->base;
> >  }
> >
> > +static int bochs_atomic_commit(struct drm_device *dev,
> > +                          struct drm_atomic_state *a,
> > +                          bool async)
> > +{
> > +     return drm_atomic_helper_commit(dev, a, false);
> > +}
> > +
> >  const struct drm_mode_config_funcs bochs_mode_funcs = {
> >       .fb_create = bochs_user_framebuffer_create,
> >       .atomic_check = drm_atomic_helper_check,
> > -     .atomic_commit = drm_atomic_helper_commit,
> > +     .atomic_commit = bochs_atomic_commit,
> >  };
>
> _______________________________________________
> dri-devel mailing list
> dri-devel@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/dri-devel
Pekka Paalanen July 20, 2015, 7:46 a.m. UTC | #4
On Sun, 19 Jul 2015 17:20:32 -0700
Stéphane Marchesin <stephane.marchesin@gmail.com> wrote:

> On Thu, Jul 16, 2015 at 11:08 PM, Pekka Paalanen <ppaalanen@gmail.com> wrote:
> >
> > On Thu, 16 Jul 2015 20:20:39 +0800
> > John Hunter <zhjwpku@gmail.com> wrote:
> >
> > > From: Zhao Junwang <zhjwpku@gmail.com>
> > >
> > > This supports the asynchronous commits, required for page-flipping
> > > Since it's virtual hw it's ok to commit async stuff right away, we
> > > never have to wait for vblank.
> >
> > Hi,
> >
> > in theory, yes. This is what a patch to bochs implemented not too long
> > ago, so AFAIK you are only replicating the existing behaviour.
> >
> > However, if userspace doing an async commit (or sync, I suppose) does
> > not incur any waits in the kernel in e.g. sending the page flip event,
> > then flip driven programs (e.g. a Wayland compositor, say, Weston)
> > will be running its rendering loop as a busy-loop, because the kernel
> > does not throttle it to the (virtual) display refresh rate.
> >
> > This will cause maximal CPU usage and poor user experience as
> > everything else needs to fight for CPU time and event dispatch to get
> > through, like input.
> >
> > I would hope someone could do a follow-up to implement a refresh cycle
> > emulation based on a clock. Userspace expects page flips to happen at
> > most at refresh rate when asking for vblank-synced flips. It's only
> > natural for userspace to drive its rendering loop based on the vblank
> > cycle.
> 
> 
> I've been asking myself the same question (for the UDL driver) and I'm
> not sure if this policy should go in the kernel. After all, there
> could be legitimate reasons for user space to render lots of frames
> per second. It seems to me that if user space doesn't want too many
> fps, it should just throttle itself.

If userspace wants to render lots of frames per second, IMO it should
not be using vblank-synced operations in a way that may throttle it.
The lots of frames use case is already non-working for the majority of
the drivers without DRM_MODE_PAGE_FLIP_ASYNC, right?

The problem here I see is that one DRM driver decides to work different
to other DRM drivers. All real-hardware DRM drivers, when asked to do
vblank-synced update, actually do throttle to the vblank AFAIK. Is it
too much to assume, that the video mode set in a driver (refresh rate)
corresponds to the vblank rate which implicitly delays the completion
of vblank-sync'd operations to at least the next vblank boundary?

I think, if the driver cannot implement proper semantics (which IMO
includes the throttling) for vblank-sync'd operations and it does not
want to fake them with a clock, it should just refuse vblank-synced
operations. That would push the problem to userspace, and it would be
obvious what's going wrong. Naturally, it would break some userspace
programs that expect vblank-synced operations to work, but is that
so much different to the current unfixed situation?


Thanks,
pq
Stéphane Marchesin July 20, 2015, 8:58 a.m. UTC | #5
On Mon, Jul 20, 2015 at 12:46 AM, Pekka Paalanen <ppaalanen@gmail.com> wrote:
> On Sun, 19 Jul 2015 17:20:32 -0700
> Stéphane Marchesin <stephane.marchesin@gmail.com> wrote:
>
>> On Thu, Jul 16, 2015 at 11:08 PM, Pekka Paalanen <ppaalanen@gmail.com> wrote:
>> >
>> > On Thu, 16 Jul 2015 20:20:39 +0800
>> > John Hunter <zhjwpku@gmail.com> wrote:
>> >
>> > > From: Zhao Junwang <zhjwpku@gmail.com>
>> > >
>> > > This supports the asynchronous commits, required for page-flipping
>> > > Since it's virtual hw it's ok to commit async stuff right away, we
>> > > never have to wait for vblank.
>> >
>> > Hi,
>> >
>> > in theory, yes. This is what a patch to bochs implemented not too long
>> > ago, so AFAIK you are only replicating the existing behaviour.
>> >
>> > However, if userspace doing an async commit (or sync, I suppose) does
>> > not incur any waits in the kernel in e.g. sending the page flip event,
>> > then flip driven programs (e.g. a Wayland compositor, say, Weston)
>> > will be running its rendering loop as a busy-loop, because the kernel
>> > does not throttle it to the (virtual) display refresh rate.
>> >
>> > This will cause maximal CPU usage and poor user experience as
>> > everything else needs to fight for CPU time and event dispatch to get
>> > through, like input.
>> >
>> > I would hope someone could do a follow-up to implement a refresh cycle
>> > emulation based on a clock. Userspace expects page flips to happen at
>> > most at refresh rate when asking for vblank-synced flips. It's only
>> > natural for userspace to drive its rendering loop based on the vblank
>> > cycle.
>>
>>
>> I've been asking myself the same question (for the UDL driver) and I'm
>> not sure if this policy should go in the kernel. After all, there
>> could be legitimate reasons for user space to render lots of frames
>> per second. It seems to me that if user space doesn't want too many
>> fps, it should just throttle itself.
>
> If userspace wants to render lots of frames per second, IMO it should
> not be using vblank-synced operations in a way that may throttle it.
> The lots of frames use case is already non-working for the majority of
> the drivers without DRM_MODE_PAGE_FLIP_ASYNC, right?
>
> The problem here I see is that one DRM driver decides to work different
> to other DRM drivers. All real-hardware DRM drivers, when asked to do
> vblank-synced update, actually do throttle to the vblank AFAIK.

udl is an exception here. It is (arguably) real hardware but doesn't throttle.

> Is it
> too much to assume, that the video mode set in a driver (refresh rate)
> corresponds to the vblank rate which implicitly delays the completion
> of vblank-sync'd operations to at least the next vblank boundary?

I think it's wrong to make user space think that a vsynced display
always matches the refresh rate in a world where:

- some displays have variable refresh rates (not just the fancy new
stuff like g-sync, look for lvds_downclock in the intel driver for
example, also consider DSI displays)

- some displays have no refresh rate (the ones we are talking about
here: udl, bochs...)

- you can do partial vsynced updates by just waiting for a specific
scanline range which again breaks the assumption that "vsynced" ==
"refreshes at the monitor rate". In this case there is no visible
tearing (in that sense it is vsynced) but the flip time is not
predictable using the refresh rate.

So I don't think we should perpetuate that problem. And I would like
user space to "see" the actual flip times to enable some kind of
scheduling where possible.

>
> I think, if the driver cannot implement proper semantics (which IMO
> includes the throttling) for vblank-sync'd operations and it does not
> want to fake them with a clock, it should just refuse vblank-synced
> operations.

Yes refusing vsynced flips for these drivers sounds reasonable. But
please let's not bake in another assumption in the API (or rather,
let's try to un-bake it).

Stéphane


> That would push the problem to userspace, and it would be
> obvious what's going wrong. Naturally, it would break some userspace
> programs that expect vblank-synced operations to work, but is that
> so much different to the current unfixed situation?
>
>
> Thanks,
> pq
Pekka Paalanen July 20, 2015, 9:35 a.m. UTC | #6
On Mon, 20 Jul 2015 01:58:33 -0700
Stéphane Marchesin <stephane.marchesin@gmail.com> wrote:

> On Mon, Jul 20, 2015 at 12:46 AM, Pekka Paalanen <ppaalanen@gmail.com> wrote:
> > On Sun, 19 Jul 2015 17:20:32 -0700
> > Stéphane Marchesin <stephane.marchesin@gmail.com> wrote:
> >
> >> On Thu, Jul 16, 2015 at 11:08 PM, Pekka Paalanen <ppaalanen@gmail.com> wrote:
> >> >
> >> > On Thu, 16 Jul 2015 20:20:39 +0800
> >> > John Hunter <zhjwpku@gmail.com> wrote:
> >> >
> >> > > From: Zhao Junwang <zhjwpku@gmail.com>
> >> > >
> >> > > This supports the asynchronous commits, required for page-flipping
> >> > > Since it's virtual hw it's ok to commit async stuff right away, we
> >> > > never have to wait for vblank.
> >> >
> >> > Hi,
> >> >
> >> > in theory, yes. This is what a patch to bochs implemented not too long
> >> > ago, so AFAIK you are only replicating the existing behaviour.
> >> >
> >> > However, if userspace doing an async commit (or sync, I suppose) does
> >> > not incur any waits in the kernel in e.g. sending the page flip event,
> >> > then flip driven programs (e.g. a Wayland compositor, say, Weston)
> >> > will be running its rendering loop as a busy-loop, because the kernel
> >> > does not throttle it to the (virtual) display refresh rate.
> >> >
> >> > This will cause maximal CPU usage and poor user experience as
> >> > everything else needs to fight for CPU time and event dispatch to get
> >> > through, like input.
> >> >
> >> > I would hope someone could do a follow-up to implement a refresh cycle
> >> > emulation based on a clock. Userspace expects page flips to happen at
> >> > most at refresh rate when asking for vblank-synced flips. It's only
> >> > natural for userspace to drive its rendering loop based on the vblank
> >> > cycle.
> >>
> >>
> >> I've been asking myself the same question (for the UDL driver) and I'm
> >> not sure if this policy should go in the kernel. After all, there
> >> could be legitimate reasons for user space to render lots of frames
> >> per second. It seems to me that if user space doesn't want too many
> >> fps, it should just throttle itself.
> >
> > If userspace wants to render lots of frames per second, IMO it should
> > not be using vblank-synced operations in a way that may throttle it.
> > The lots of frames use case is already non-working for the majority of
> > the drivers without DRM_MODE_PAGE_FLIP_ASYNC, right?
> >
> > The problem here I see is that one DRM driver decides to work different
> > to other DRM drivers. All real-hardware DRM drivers, when asked to do
> > vblank-synced update, actually do throttle to the vblank AFAIK.
> 
> udl is an exception here. It is (arguably) real hardware but doesn't throttle.
> 
> > Is it
> > too much to assume, that the video mode set in a driver (refresh rate)
> > corresponds to the vblank rate which implicitly delays the completion
> > of vblank-sync'd operations to at least the next vblank boundary?
> 
> I think it's wrong to make user space think that a vsynced display
> always matches the refresh rate in a world where:
> 
> - some displays have variable refresh rates (not just the fancy new
> stuff like g-sync, look for lvds_downclock in the intel driver for
> example, also consider DSI displays)
> 
> - some displays have no refresh rate (the ones we are talking about
> here: udl, bochs...)

That means that refresh rate in a video mode is bogus. Can userspace
know when the refresh rate is meaningless? I suppose there are two
different cases of meaningless, too: when the driver ignores it as
input argument, and when it is used but has no guarantees for timings.

Assuming it's always meaningless wrt. timings is pretty harsh. E.g. the
Wayland Presentation extension's implementation in Weston uses the
refresh rate to predict the next flip time and hands it out to clients
for scheduling/interpolation purposes.

> - you can do partial vsynced updates by just waiting for a specific
> scanline range which again breaks the assumption that "vsynced" ==
> "refreshes at the monitor rate". In this case there is no visible
> tearing (in that sense it is vsynced) but the flip time is not
> predictable using the refresh rate.

Okay. That also invalidates the design (well, at least the
implementation, and sounds like DRM does not give any tools to allow
implementing it) the Wayland Presentation extension even on "good"
hardware, so nice to realize. I was already suggesting we should
stabilize it since it looks good, but this puts it all back to the
drawing board.

I think it also mostly invalidates the whole scheduling implementation
in Weston.

> So I don't think we should perpetuate that problem. And I would like
> user space to "see" the actual flip times to enable some kind of
> scheduling where possible.
> 
> >
> > I think, if the driver cannot implement proper semantics (which IMO
> > includes the throttling) for vblank-sync'd operations and it does not
> > want to fake them with a clock, it should just refuse vblank-synced
> > operations.
> 
> Yes refusing vsynced flips for these drivers sounds reasonable. But
> please let's not bake in another assumption in the API (or rather,
> let's try to un-bake it).

Could you be more specific on everything, please?

What should drivers do in different situations, what guarantees we do
have, and how does userspace predict the earliest possible flip time?
How do you define flip time to begin with, if it's not tied to the
scanout cycle (vblank)?

How should a compositor schedule eveything, and what can it tell to the
clients about the timings in the immediate future?

You gave me the feeling that everything I thought I knew and relied on
is wrong.

> > That would push the problem to userspace, and it would be
> > obvious what's going wrong. Naturally, it would break some userspace
> > programs that expect vblank-synced operations to work, but is that
> > so much different to the current unfixed situation?

Thanks,
pq
Daniel Vetter July 20, 2015, 2:21 p.m. UTC | #7
On Mon, Jul 20, 2015 at 12:35:48PM +0300, Pekka Paalanen wrote:
> On Mon, 20 Jul 2015 01:58:33 -0700
> Stéphane Marchesin <stephane.marchesin@gmail.com> wrote:
> 
> > On Mon, Jul 20, 2015 at 12:46 AM, Pekka Paalanen <ppaalanen@gmail.com> wrote:
> > > On Sun, 19 Jul 2015 17:20:32 -0700
> > > Stéphane Marchesin <stephane.marchesin@gmail.com> wrote:
> > >
> > >> On Thu, Jul 16, 2015 at 11:08 PM, Pekka Paalanen <ppaalanen@gmail.com> wrote:
> > >> >
> > >> > On Thu, 16 Jul 2015 20:20:39 +0800
> > >> > John Hunter <zhjwpku@gmail.com> wrote:
> > >> >
> > >> > > From: Zhao Junwang <zhjwpku@gmail.com>
> > >> > >
> > >> > > This supports the asynchronous commits, required for page-flipping
> > >> > > Since it's virtual hw it's ok to commit async stuff right away, we
> > >> > > never have to wait for vblank.
> > >> >
> > >> > Hi,
> > >> >
> > >> > in theory, yes. This is what a patch to bochs implemented not too long
> > >> > ago, so AFAIK you are only replicating the existing behaviour.
> > >> >
> > >> > However, if userspace doing an async commit (or sync, I suppose) does
> > >> > not incur any waits in the kernel in e.g. sending the page flip event,
> > >> > then flip driven programs (e.g. a Wayland compositor, say, Weston)
> > >> > will be running its rendering loop as a busy-loop, because the kernel
> > >> > does not throttle it to the (virtual) display refresh rate.
> > >> >
> > >> > This will cause maximal CPU usage and poor user experience as
> > >> > everything else needs to fight for CPU time and event dispatch to get
> > >> > through, like input.
> > >> >
> > >> > I would hope someone could do a follow-up to implement a refresh cycle
> > >> > emulation based on a clock. Userspace expects page flips to happen at
> > >> > most at refresh rate when asking for vblank-synced flips. It's only
> > >> > natural for userspace to drive its rendering loop based on the vblank
> > >> > cycle.
> > >>
> > >>
> > >> I've been asking myself the same question (for the UDL driver) and I'm
> > >> not sure if this policy should go in the kernel. After all, there
> > >> could be legitimate reasons for user space to render lots of frames
> > >> per second. It seems to me that if user space doesn't want too many
> > >> fps, it should just throttle itself.
> > >
> > > If userspace wants to render lots of frames per second, IMO it should
> > > not be using vblank-synced operations in a way that may throttle it.
> > > The lots of frames use case is already non-working for the majority of
> > > the drivers without DRM_MODE_PAGE_FLIP_ASYNC, right?
> > >
> > > The problem here I see is that one DRM driver decides to work different
> > > to other DRM drivers. All real-hardware DRM drivers, when asked to do
> > > vblank-synced update, actually do throttle to the vblank AFAIK.
> > 
> > udl is an exception here. It is (arguably) real hardware but doesn't throttle.
> > 
> > > Is it
> > > too much to assume, that the video mode set in a driver (refresh rate)
> > > corresponds to the vblank rate which implicitly delays the completion
> > > of vblank-sync'd operations to at least the next vblank boundary?
> > 
> > I think it's wrong to make user space think that a vsynced display
> > always matches the refresh rate in a world where:
> > 
> > - some displays have variable refresh rates (not just the fancy new
> > stuff like g-sync, look for lvds_downclock in the intel driver for
> > example, also consider DSI displays)
> > 
> > - some displays have no refresh rate (the ones we are talking about
> > here: udl, bochs...)
> 
> That means that refresh rate in a video mode is bogus. Can userspace
> know when the refresh rate is meaningless? I suppose there are two
> different cases of meaningless, too: when the driver ignores it as
> input argument, and when it is used but has no guarantees for timings.
> 
> Assuming it's always meaningless wrt. timings is pretty harsh. E.g. the
> Wayland Presentation extension's implementation in Weston uses the
> refresh rate to predict the next flip time and hands it out to clients
> for scheduling/interpolation purposes.

We removed lvds_downclock in i915 (was never enabled by default). The new
code is currently edp only, but enabled by default. And it tries _really_
hard to keep up the illusion that the requested vrefresh is the one you
actually have - it upclocks every time userspace changes the screen.

The only exception is when you only ask for vblank events, and the delay
to the next pageflip is longer than the timeout. That can't be fixed right
now because drm_irq.c is the last midlayer needed by modern drivers, but
I'd like to fix it eventually.

There's future plans to allow userspace to explicitly ask for a reduced
vrefresh (e.g. video playback), but then obviously they'll match again
too.

We also have a big pile of testcases in kms_flip which check that the
timestamps are evenly spaced and agree with vrefresh. There's some oddball
bugs around tv-out that we never bothered to fix (since the requested mode
is a fake one and rescaled by the TV port, which also has it's own clock).

Imo aiming for vrefresh to be accurate is good. For gsync and friends I
think we should have an explicit range or flag to make userspace aware of
what's going on.

> > - you can do partial vsynced updates by just waiting for a specific
> > scanline range which again breaks the assumption that "vsynced" ==
> > "refreshes at the monitor rate". In this case there is no visible
> > tearing (in that sense it is vsynced) but the flip time is not
> > predictable using the refresh rate.
> 
> Okay. That also invalidates the design (well, at least the
> implementation, and sounds like DRM does not give any tools to allow
> implementing it) the Wayland Presentation extension even on "good"
> hardware, so nice to realize. I was already suggesting we should
> stabilize it since it looks good, but this puts it all back to the
> drawing board.
> 
> I think it also mostly invalidates the whole scheduling implementation
> in Weston.

Currently scanline waits is just something the intel DDX does to update
dri and video clients without tearing while still essentially doing
unsynced frontbuffer rendering. It sucks down massive amounts of gpu
bandwidth since it fully stalls the rendering (at least on i915 hw).

As long as you have buffer_age and friends you should be able to be as
efficient with pageflips (or better) as with scanline waits.

> > So I don't think we should perpetuate that problem. And I would like
> > user space to "see" the actual flip times to enable some kind of
> > scheduling where possible.
> > 
> > >
> > > I think, if the driver cannot implement proper semantics (which IMO
> > > includes the throttling) for vblank-sync'd operations and it does not
> > > want to fake them with a clock, it should just refuse vblank-synced
> > > operations.
> > 
> > Yes refusing vsynced flips for these drivers sounds reasonable. But
> > please let's not bake in another assumption in the API (or rather,
> > let's try to un-bake it).
> 
> Could you be more specific on everything, please?
> 
> What should drivers do in different situations, what guarantees we do
> have, and how does userspace predict the earliest possible flip time?
> How do you define flip time to begin with, if it's not tied to the
> scanout cycle (vblank)?
> 
> How should a compositor schedule eveything, and what can it tell to the
> clients about the timings in the immediate future?
> 
> You gave me the feeling that everything I thought I knew and relied on
> is wrong.

I guess we either kick out page_flip for all drivers who fake it. And if
that's causing regressions then we probably want to fake it with a timer.
Unpretty, but such is the game of backwards compat forever. But I'm not
sure whether we established that we have a problem already, at least I'm
missing users screaming about udl/bochs & friends.
-Daniel
Stéphane Marchesin July 20, 2015, 5:32 p.m. UTC | #8
On Mon, Jul 20, 2015 at 7:21 AM, Daniel Vetter <daniel@ffwll.ch> wrote:
> On Mon, Jul 20, 2015 at 12:35:48PM +0300, Pekka Paalanen wrote:
>> On Mon, 20 Jul 2015 01:58:33 -0700
>> Stéphane Marchesin <stephane.marchesin@gmail.com> wrote:
>>
>> > On Mon, Jul 20, 2015 at 12:46 AM, Pekka Paalanen <ppaalanen@gmail.com> wrote:
>> > > On Sun, 19 Jul 2015 17:20:32 -0700
>> > > Stéphane Marchesin <stephane.marchesin@gmail.com> wrote:
>> > >
>> > >> On Thu, Jul 16, 2015 at 11:08 PM, Pekka Paalanen <ppaalanen@gmail.com> wrote:
>> > >> >
>> > >> > On Thu, 16 Jul 2015 20:20:39 +0800
>> > >> > John Hunter <zhjwpku@gmail.com> wrote:
>> > >> >
>> > >> > > From: Zhao Junwang <zhjwpku@gmail.com>
>> > >> > >
>> > >> > > This supports the asynchronous commits, required for page-flipping
>> > >> > > Since it's virtual hw it's ok to commit async stuff right away, we
>> > >> > > never have to wait for vblank.
>> > >> >
>> > >> > Hi,
>> > >> >
>> > >> > in theory, yes. This is what a patch to bochs implemented not too long
>> > >> > ago, so AFAIK you are only replicating the existing behaviour.
>> > >> >
>> > >> > However, if userspace doing an async commit (or sync, I suppose) does
>> > >> > not incur any waits in the kernel in e.g. sending the page flip event,
>> > >> > then flip driven programs (e.g. a Wayland compositor, say, Weston)
>> > >> > will be running its rendering loop as a busy-loop, because the kernel
>> > >> > does not throttle it to the (virtual) display refresh rate.
>> > >> >
>> > >> > This will cause maximal CPU usage and poor user experience as
>> > >> > everything else needs to fight for CPU time and event dispatch to get
>> > >> > through, like input.
>> > >> >
>> > >> > I would hope someone could do a follow-up to implement a refresh cycle
>> > >> > emulation based on a clock. Userspace expects page flips to happen at
>> > >> > most at refresh rate when asking for vblank-synced flips. It's only
>> > >> > natural for userspace to drive its rendering loop based on the vblank
>> > >> > cycle.
>> > >>
>> > >>
>> > >> I've been asking myself the same question (for the UDL driver) and I'm
>> > >> not sure if this policy should go in the kernel. After all, there
>> > >> could be legitimate reasons for user space to render lots of frames
>> > >> per second. It seems to me that if user space doesn't want too many
>> > >> fps, it should just throttle itself.
>> > >
>> > > If userspace wants to render lots of frames per second, IMO it should
>> > > not be using vblank-synced operations in a way that may throttle it.
>> > > The lots of frames use case is already non-working for the majority of
>> > > the drivers without DRM_MODE_PAGE_FLIP_ASYNC, right?
>> > >
>> > > The problem here I see is that one DRM driver decides to work different
>> > > to other DRM drivers. All real-hardware DRM drivers, when asked to do
>> > > vblank-synced update, actually do throttle to the vblank AFAIK.
>> >
>> > udl is an exception here. It is (arguably) real hardware but doesn't throttle.
>> >
>> > > Is it
>> > > too much to assume, that the video mode set in a driver (refresh rate)
>> > > corresponds to the vblank rate which implicitly delays the completion
>> > > of vblank-sync'd operations to at least the next vblank boundary?
>> >
>> > I think it's wrong to make user space think that a vsynced display
>> > always matches the refresh rate in a world where:
>> >
>> > - some displays have variable refresh rates (not just the fancy new
>> > stuff like g-sync, look for lvds_downclock in the intel driver for
>> > example, also consider DSI displays)
>> >
>> > - some displays have no refresh rate (the ones we are talking about
>> > here: udl, bochs...)
>>
>> That means that refresh rate in a video mode is bogus. Can userspace
>> know when the refresh rate is meaningless? I suppose there are two
>> different cases of meaningless, too: when the driver ignores it as
>> input argument, and when it is used but has no guarantees for timings.
>>
>> Assuming it's always meaningless wrt. timings is pretty harsh. E.g. the
>> Wayland Presentation extension's implementation in Weston uses the
>> refresh rate to predict the next flip time and hands it out to clients
>> for scheduling/interpolation purposes.
>
> We removed lvds_downclock in i915 (was never enabled by default).

We use/ship it, so there is a value I would say.

> The new
> code is currently edp only, but enabled by default. And it tries _really_
> hard to keep up the illusion that the requested vrefresh is the one you
> actually have - it upclocks every time userspace changes the screen.

Except that there is latency in doing so (it can never be done
instantly), and user space definitely sees it, at least when it queues
the first flip. So the illusion doesn't exist.

>
> The only exception is when you only ask for vblank events, and the delay
> to the next pageflip is longer than the timeout. That can't be fixed right
> now because drm_irq.c is the last midlayer needed by modern drivers, but
> I'd like to fix it eventually.
>
> There's future plans to allow userspace to explicitly ask for a reduced
> vrefresh (e.g. video playback), but then obviously they'll match again
> too.
>
> We also have a big pile of testcases in kms_flip which check that the
> timestamps are evenly spaced and agree with vrefresh. There's some oddball
> bugs around tv-out that we never bothered to fix (since the requested mode
> is a fake one and rescaled by the TV port, which also has it's own clock).
>
> Imo aiming for vrefresh to be accurate is good. For gsync and friends I
> think we should have an explicit range or flag to make userspace aware of
> what's going on.

I think the concept of vrefresh is flawed and not really future-proof
(I gave a few examples in my previous email). I agree we should keep
it as legacy, but we should add something else for the more advanced
cases.

>
>> > - you can do partial vsynced updates by just waiting for a specific
>> > scanline range which again breaks the assumption that "vsynced" ==
>> > "refreshes at the monitor rate". In this case there is no visible
>> > tearing (in that sense it is vsynced) but the flip time is not
>> > predictable using the refresh rate.
>>
>> Okay. That also invalidates the design (well, at least the
>> implementation, and sounds like DRM does not give any tools to allow
>> implementing it) the Wayland Presentation extension even on "good"
>> hardware, so nice to realize. I was already suggesting we should
>> stabilize it since it looks good, but this puts it all back to the
>> drawing board.
>>
>> I think it also mostly invalidates the whole scheduling implementation
>> in Weston.
>
> Currently scanline waits is just something the intel DDX does to update
> dri and video clients without tearing while still essentially doing
> unsynced frontbuffer rendering. It sucks down massive amounts of gpu
> bandwidth since it fully stalls the rendering (at least on i915 hw).

That's an i915 specificity though, if it doesn't work on i915, just
don't expose it on i915...

>
> As long as you have buffer_age and friends you should be able to be as
> efficient with pageflips (or better) as with scanline waits.

Flips and scanline waits are completely orthogonal things though --
there are 4 valid combinations in { wait for a scanline, wait for a
vsync } x { do a flip, do a blit}. And you can be more efficient if
you wait for a scanline; for example if you are using a damage rect at
(x,y) and the "beam" is past scanline 0 but before y, you can still do
your scanline flip right away while you can't with vsynced flips.

>
>> > So I don't think we should perpetuate that problem. And I would like
>> > user space to "see" the actual flip times to enable some kind of
>> > scheduling where possible.
>> >
>> > >
>> > > I think, if the driver cannot implement proper semantics (which IMO
>> > > includes the throttling) for vblank-sync'd operations and it does not
>> > > want to fake them with a clock, it should just refuse vblank-synced
>> > > operations.
>> >
>> > Yes refusing vsynced flips for these drivers sounds reasonable. But
>> > please let's not bake in another assumption in the API (or rather,
>> > let's try to un-bake it).
>>
>> Could you be more specific on everything, please?
>>
>> What should drivers do in different situations, what guarantees we do
>> have, and how does userspace predict the earliest possible flip time?
>> How do you define flip time to begin with, if it's not tied to the
>> scanout cycle (vblank)?
>>
>> How should a compositor schedule eveything, and what can it tell to the
>> clients about the timings in the immediate future?
>>
>> You gave me the feeling that everything I thought I knew and relied on
>> is wrong.
>
> I guess we either kick out page_flip for all drivers who fake it. And if
> that's causing regressions then we probably want to fake it with a timer.
> Unpretty, but such is the game of backwards compat forever. But I'm not
> sure whether we established that we have a problem already, at least I'm
> missing users screaming about udl/bochs & friends.

Yeah I don't think I care about the old interface, it is what it is.
But we should design something which works well for the future use
cases.

Stéphane

> -Daniel
> --
> Daniel Vetter
> Software Engineer, Intel Corporation
> http://blog.ffwll.ch
Pekka Paalanen July 21, 2015, 7:06 a.m. UTC | #9
On Mon, 20 Jul 2015 10:32:31 -0700
Stéphane Marchesin <stephane.marchesin@gmail.com> wrote:

> On Mon, Jul 20, 2015 at 7:21 AM, Daniel Vetter <daniel@ffwll.ch> wrote:
> > On Mon, Jul 20, 2015 at 12:35:48PM +0300, Pekka Paalanen wrote:
> >> On Mon, 20 Jul 2015 01:58:33 -0700
> >> Stéphane Marchesin <stephane.marchesin@gmail.com> wrote:
> >>
> >> > On Mon, Jul 20, 2015 at 12:46 AM, Pekka Paalanen <ppaalanen@gmail.com> wrote:
> >> > > On Sun, 19 Jul 2015 17:20:32 -0700
> >> > > Stéphane Marchesin <stephane.marchesin@gmail.com> wrote:
> >> > >
> >> > >> On Thu, Jul 16, 2015 at 11:08 PM, Pekka Paalanen <ppaalanen@gmail.com> wrote:
> >> > >> >
> >> > >> > On Thu, 16 Jul 2015 20:20:39 +0800
> >> > >> > John Hunter <zhjwpku@gmail.com> wrote:
> >> > >> >
> >> > >> > > From: Zhao Junwang <zhjwpku@gmail.com>
> >> > >> > >
> >> > >> > > This supports the asynchronous commits, required for page-flipping
> >> > >> > > Since it's virtual hw it's ok to commit async stuff right away, we
> >> > >> > > never have to wait for vblank.
> >> > >> >
> >> > >> > Hi,
> >> > >> >
> >> > >> > in theory, yes. This is what a patch to bochs implemented not too long
> >> > >> > ago, so AFAIK you are only replicating the existing behaviour.
> >> > >> >
> >> > >> > However, if userspace doing an async commit (or sync, I suppose) does
> >> > >> > not incur any waits in the kernel in e.g. sending the page flip event,
> >> > >> > then flip driven programs (e.g. a Wayland compositor, say, Weston)
> >> > >> > will be running its rendering loop as a busy-loop, because the kernel
> >> > >> > does not throttle it to the (virtual) display refresh rate.
> >> > >> >
> >> > >> > This will cause maximal CPU usage and poor user experience as
> >> > >> > everything else needs to fight for CPU time and event dispatch to get
> >> > >> > through, like input.
> >> > >> >
> >> > >> > I would hope someone could do a follow-up to implement a refresh cycle
> >> > >> > emulation based on a clock. Userspace expects page flips to happen at
> >> > >> > most at refresh rate when asking for vblank-synced flips. It's only
> >> > >> > natural for userspace to drive its rendering loop based on the vblank
> >> > >> > cycle.
> >> > >>
> >> > >>
> >> > >> I've been asking myself the same question (for the UDL driver) and I'm
> >> > >> not sure if this policy should go in the kernel. After all, there
> >> > >> could be legitimate reasons for user space to render lots of frames
> >> > >> per second. It seems to me that if user space doesn't want too many
> >> > >> fps, it should just throttle itself.
> >> > >
> >> > > If userspace wants to render lots of frames per second, IMO it should
> >> > > not be using vblank-synced operations in a way that may throttle it.
> >> > > The lots of frames use case is already non-working for the majority of
> >> > > the drivers without DRM_MODE_PAGE_FLIP_ASYNC, right?
> >> > >
> >> > > The problem here I see is that one DRM driver decides to work different
> >> > > to other DRM drivers. All real-hardware DRM drivers, when asked to do
> >> > > vblank-synced update, actually do throttle to the vblank AFAIK.
> >> >
> >> > udl is an exception here. It is (arguably) real hardware but doesn't throttle.
> >> >
> >> > > Is it
> >> > > too much to assume, that the video mode set in a driver (refresh rate)
> >> > > corresponds to the vblank rate which implicitly delays the completion
> >> > > of vblank-sync'd operations to at least the next vblank boundary?
> >> >
> >> > I think it's wrong to make user space think that a vsynced display
> >> > always matches the refresh rate in a world where:
> >> >
> >> > - some displays have variable refresh rates (not just the fancy new
> >> > stuff like g-sync, look for lvds_downclock in the intel driver for
> >> > example, also consider DSI displays)
> >> >
> >> > - some displays have no refresh rate (the ones we are talking about
> >> > here: udl, bochs...)

> > Imo aiming for vrefresh to be accurate is good. For gsync and friends I
> > think we should have an explicit range or flag to make userspace aware of
> > what's going on.
> 
> I think the concept of vrefresh is flawed and not really future-proof
> (I gave a few examples in my previous email). I agree we should keep
> it as legacy, but we should add something else for the more advanced
> cases.

Right, so let's add something new for new hardware features and keep
the existing behavior existing.

I suppose the problem is that the existing behavior is not really
documented so we have to resort to screaming users?

If one does not ask for ASYNC with a page flip, does it mean flipping
on the next vblank, or flipping such that it cannot tear but allowing
techiques like scanline waits?

It used to be reasonable to assume a constant refresh rate apart from
explicit mode changes. Should we keep this assumption the default and
add API to say different?

> >> > > I think, if the driver cannot implement proper semantics (which IMO
> >> > > includes the throttling) for vblank-sync'd operations and it does not
> >> > > want to fake them with a clock, it should just refuse vblank-synced
> >> > > operations.
> >> >
> >> > Yes refusing vsynced flips for these drivers sounds reasonable. But
> >> > please let's not bake in another assumption in the API (or rather,
> >> > let's try to un-bake it).
> >>
> >> Could you be more specific on everything, please?
> >>
> >> What should drivers do in different situations, what guarantees we do
> >> have, and how does userspace predict the earliest possible flip time?
> >> How do you define flip time to begin with, if it's not tied to the
> >> scanout cycle (vblank)?
> >>
> >> How should a compositor schedule eveything, and what can it tell to the
> >> clients about the timings in the immediate future?
> >>
> >> You gave me the feeling that everything I thought I knew and relied on
> >> is wrong.
> >
> > I guess we either kick out page_flip for all drivers who fake it. And if
> > that's causing regressions then we probably want to fake it with a timer.
> > Unpretty, but such is the game of backwards compat forever. But I'm not
> > sure whether we established that we have a problem already, at least I'm
> > missing users screaming about udl/bochs & friends.

I suppose not hearing users scream is that X.org is not affected,
because it simply doesn't work in a way it would be affected? Or the
various DDX'en. The combination of affected userspace with drivers like
bochs is even more rare.

I first heard about Weston having an abysmal user experience on
drm/bochs when a co-worker was looking at testing Weston in
QEMU/stdvga, which according to others was supposed to be the best
working QEMU output / DRM KMS driver combination. Might have something
to do with needing it on a virtual ARM cpu.

Weston in a VM has not been too attractive before, because of the
missing EGL platform support for swrast, but that has been fixed
recently. I've also heard users (RebeccaBlackOS) to just revert to
Weston's fbdev backend, when the DRM backend just doesn't work right in
a VM.

So I would say it is a known problem with Weston, but users tend to
just dismiss it rather than start pushing it forward. Whether you count
Weston as an existing user in the first place is up to you, I suppose.

I asked Jasper about Mutter:
< pq> Jasper, trying to find users of the KMS API who would work badly,
      if page flips were signalled always immediately
< Jasper> pq, oh, god, us.
< Jasper> pq, we'd spin ourselves in a paint loop to death

I think there might be two different problems here: a) signalling page
flips immediately, and b) "variable refresh rate" systems / on-demand
updates like g-sync, UDL, etc.

My immediate concern is to outlaw immediate signalling of operations
that are intended to be vblank-synced (those that have
traditionally taken time to complete on real hardware, if you don't
like the definition of vblank-synced).

The problem with variable refresh rate systems is unpredictability,
which is different. Submitting updates will take some time in any case,
so you cannot really loop to death (I hope). There I would be happy to
just know that predictions based on perpetual refresh rate are invalid.
That would be enough for Wayland at first.

Stéphane also raised the concern that scanout downclocking etc. may
cause flips to be quite late from predicted. This is less of a problem,
rendering can also take arbitrary times and software is usually written
to deal with missing deadlines. It's a problem that happens all the
time anyway. As it is a problem with prediction accuracy, we could just
wave it off by setting the imaginary "no constant refresh rate" flag.
IMHO this is a problem that should be solved at another occasion, if
necessary.

> Yeah I don't think I care about the old interface, it is what it is.
> But we should design something which works well for the future use
> cases.

Are you referring to the atomic API as the old thing?

In the end, the question is how paranoid should display servers be
about the timings? Do we need to fix Weston, Mutter and likely others
to deal with page flips being signalled immediately, or is it more
appropriate to fix the DRM drivers to keep up the illusion of providing
what the API appears to suggest?


Thanks,
pq
Daniel Vetter July 21, 2015, 9:02 a.m. UTC | #10
On Tue, Jul 21, 2015 at 10:06:09AM +0300, Pekka Paalanen wrote:
> On Mon, 20 Jul 2015 10:32:31 -0700
> Stéphane Marchesin <stephane.marchesin@gmail.com> wrote:
> 
> > On Mon, Jul 20, 2015 at 7:21 AM, Daniel Vetter <daniel@ffwll.ch> wrote:
> > > On Mon, Jul 20, 2015 at 12:35:48PM +0300, Pekka Paalanen wrote:
> > >> On Mon, 20 Jul 2015 01:58:33 -0700
> > >> Stéphane Marchesin <stephane.marchesin@gmail.com> wrote:
> > >>
> > >> > On Mon, Jul 20, 2015 at 12:46 AM, Pekka Paalanen <ppaalanen@gmail.com> wrote:
> > >> > > On Sun, 19 Jul 2015 17:20:32 -0700
> > >> > > Stéphane Marchesin <stephane.marchesin@gmail.com> wrote:
> > >> > >
> > >> > >> On Thu, Jul 16, 2015 at 11:08 PM, Pekka Paalanen <ppaalanen@gmail.com> wrote:
> > >> > >> >
> > >> > >> > On Thu, 16 Jul 2015 20:20:39 +0800
> > >> > >> > John Hunter <zhjwpku@gmail.com> wrote:
> > >> > >> >
> > >> > >> > > From: Zhao Junwang <zhjwpku@gmail.com>
> > >> > >> > >
> > >> > >> > > This supports the asynchronous commits, required for page-flipping
> > >> > >> > > Since it's virtual hw it's ok to commit async stuff right away, we
> > >> > >> > > never have to wait for vblank.
> > >> > >> >
> > >> > >> > Hi,
> > >> > >> >
> > >> > >> > in theory, yes. This is what a patch to bochs implemented not too long
> > >> > >> > ago, so AFAIK you are only replicating the existing behaviour.
> > >> > >> >
> > >> > >> > However, if userspace doing an async commit (or sync, I suppose) does
> > >> > >> > not incur any waits in the kernel in e.g. sending the page flip event,
> > >> > >> > then flip driven programs (e.g. a Wayland compositor, say, Weston)
> > >> > >> > will be running its rendering loop as a busy-loop, because the kernel
> > >> > >> > does not throttle it to the (virtual) display refresh rate.
> > >> > >> >
> > >> > >> > This will cause maximal CPU usage and poor user experience as
> > >> > >> > everything else needs to fight for CPU time and event dispatch to get
> > >> > >> > through, like input.
> > >> > >> >
> > >> > >> > I would hope someone could do a follow-up to implement a refresh cycle
> > >> > >> > emulation based on a clock. Userspace expects page flips to happen at
> > >> > >> > most at refresh rate when asking for vblank-synced flips. It's only
> > >> > >> > natural for userspace to drive its rendering loop based on the vblank
> > >> > >> > cycle.
> > >> > >>
> > >> > >>
> > >> > >> I've been asking myself the same question (for the UDL driver) and I'm
> > >> > >> not sure if this policy should go in the kernel. After all, there
> > >> > >> could be legitimate reasons for user space to render lots of frames
> > >> > >> per second. It seems to me that if user space doesn't want too many
> > >> > >> fps, it should just throttle itself.
> > >> > >
> > >> > > If userspace wants to render lots of frames per second, IMO it should
> > >> > > not be using vblank-synced operations in a way that may throttle it.
> > >> > > The lots of frames use case is already non-working for the majority of
> > >> > > the drivers without DRM_MODE_PAGE_FLIP_ASYNC, right?
> > >> > >
> > >> > > The problem here I see is that one DRM driver decides to work different
> > >> > > to other DRM drivers. All real-hardware DRM drivers, when asked to do
> > >> > > vblank-synced update, actually do throttle to the vblank AFAIK.
> > >> >
> > >> > udl is an exception here. It is (arguably) real hardware but doesn't throttle.
> > >> >
> > >> > > Is it
> > >> > > too much to assume, that the video mode set in a driver (refresh rate)
> > >> > > corresponds to the vblank rate which implicitly delays the completion
> > >> > > of vblank-sync'd operations to at least the next vblank boundary?
> > >> >
> > >> > I think it's wrong to make user space think that a vsynced display
> > >> > always matches the refresh rate in a world where:
> > >> >
> > >> > - some displays have variable refresh rates (not just the fancy new
> > >> > stuff like g-sync, look for lvds_downclock in the intel driver for
> > >> > example, also consider DSI displays)
> > >> >
> > >> > - some displays have no refresh rate (the ones we are talking about
> > >> > here: udl, bochs...)
> 
> > > Imo aiming for vrefresh to be accurate is good. For gsync and friends I
> > > think we should have an explicit range or flag to make userspace aware of
> > > what's going on.
> > 
> > I think the concept of vrefresh is flawed and not really future-proof
> > (I gave a few examples in my previous email). I agree we should keep
> > it as legacy, but we should add something else for the more advanced
> > cases.
> 
> Right, so let's add something new for new hardware features and keep
> the existing behavior existing.
> 
> I suppose the problem is that the existing behavior is not really
> documented so we have to resort to screaming users?
> 
> If one does not ask for ASYNC with a page flip, does it mean flipping
> on the next vblank, or flipping such that it cannot tear but allowing
> techiques like scanline waits?

Since legacy page_flip is always for the full primary plane you can't do
scanline waits - it covers everything anyway.

> It used to be reasonable to assume a constant refresh rate apart from
> explicit mode changes. Should we keep this assumption the default and
> add API to say different?
> 
> > >> > > I think, if the driver cannot implement proper semantics (which IMO
> > >> > > includes the throttling) for vblank-sync'd operations and it does not
> > >> > > want to fake them with a clock, it should just refuse vblank-synced
> > >> > > operations.
> > >> >
> > >> > Yes refusing vsynced flips for these drivers sounds reasonable. But
> > >> > please let's not bake in another assumption in the API (or rather,
> > >> > let's try to un-bake it).
> > >>
> > >> Could you be more specific on everything, please?
> > >>
> > >> What should drivers do in different situations, what guarantees we do
> > >> have, and how does userspace predict the earliest possible flip time?
> > >> How do you define flip time to begin with, if it's not tied to the
> > >> scanout cycle (vblank)?
> > >>
> > >> How should a compositor schedule eveything, and what can it tell to the
> > >> clients about the timings in the immediate future?
> > >>
> > >> You gave me the feeling that everything I thought I knew and relied on
> > >> is wrong.
> > >
> > > I guess we either kick out page_flip for all drivers who fake it. And if
> > > that's causing regressions then we probably want to fake it with a timer.
> > > Unpretty, but such is the game of backwards compat forever. But I'm not
> > > sure whether we established that we have a problem already, at least I'm
> > > missing users screaming about udl/bochs & friends.
> 
> I suppose not hearing users scream is that X.org is not affected,
> because it simply doesn't work in a way it would be affected? Or the
> various DDX'en. The combination of affected userspace with drivers like
> bochs is even more rare.
> 
> I first heard about Weston having an abysmal user experience on
> drm/bochs when a co-worker was looking at testing Weston in
> QEMU/stdvga, which according to others was supposed to be the best
> working QEMU output / DRM KMS driver combination. Might have something
> to do with needing it on a virtual ARM cpu.
> 
> Weston in a VM has not been too attractive before, because of the
> missing EGL platform support for swrast, but that has been fixed
> recently. I've also heard users (RebeccaBlackOS) to just revert to
> Weston's fbdev backend, when the DRM backend just doesn't work right in
> a VM.
> 
> So I would say it is a known problem with Weston, but users tend to
> just dismiss it rather than start pushing it forward. Whether you count
> Weston as an existing user in the first place is up to you, I suppose.
> 
> I asked Jasper about Mutter:
> < pq> Jasper, trying to find users of the KMS API who would work badly,
>       if page flips were signalled always immediately
> < Jasper> pq, oh, god, us.
> < Jasper> pq, we'd spin ourselves in a paint loop to death
> 
> I think there might be two different problems here: a) signalling page
> flips immediately, and b) "variable refresh rate" systems / on-demand
> updates like g-sync, UDL, etc.
> 
> My immediate concern is to outlaw immediate signalling of operations
> that are intended to be vblank-synced (those that have
> traditionally taken time to complete on real hardware, if you don't
> like the definition of vblank-synced).
> 
> The problem with variable refresh rate systems is unpredictability,
> which is different. Submitting updates will take some time in any case,
> so you cannot really loop to death (I hope). There I would be happy to
> just know that predictions based on perpetual refresh rate are invalid.
> That would be enough for Wayland at first.
> 
> Stéphane also raised the concern that scanout downclocking etc. may
> cause flips to be quite late from predicted. This is less of a problem,
> rendering can also take arbitrary times and software is usually written
> to deal with missing deadlines. It's a problem that happens all the
> time anyway. As it is a problem with prediction accuracy, we could just
> wave it off by setting the imaginary "no constant refresh rate" flag.
> IMHO this is a problem that should be solved at another occasion, if
> necessary.

Downclocking should only be able to delay the very first frame in a
sequence (video playback, animation). Imo we can shrug that off as "clock
skew" ;-)

Imo once someone updates frames regularly vblank timestamps should be
evenly spaced and pageflips not instant (at least by default).

> > Yeah I don't think I care about the old interface, it is what it is.
> > But we should design something which works well for the future use
> > cases.
> 
> Are you referring to the atomic API as the old thing?
> 
> In the end, the question is how paranoid should display servers be
> about the timings? Do we need to fix Weston, Mutter and likely others
> to deal with page flips being signalled immediately, or is it more
> appropriate to fix the DRM drivers to keep up the illusion of providing
> what the API appears to suggest?

I guess they can't assume too much about vblanks (too many drivers don't
even bother with precise/irq-delay-correct timestamps), but I think
assuming that doing a page_flip completion event based renderer shouldn't
result in spinning is reasonable.

I guess for bochs/udl and others we could create a small drm driver which
keeps track of the last vblank ts (we have those already) and suitable
delays the even/timestamp to keep up the illusion. Or we just rip out
pageflip support for those drivers. But if weston&co can't cope with that
that would be worse.
-Daniel
Pekka Paalanen July 21, 2015, 11:22 a.m. UTC | #11
On Tue, 21 Jul 2015 11:02:58 +0200
Daniel Vetter <daniel@ffwll.ch> wrote:

> On Tue, Jul 21, 2015 at 10:06:09AM +0300, Pekka Paalanen wrote:
> > On Mon, 20 Jul 2015 10:32:31 -0700
> > Stéphane Marchesin <stephane.marchesin@gmail.com> wrote:
> > 
> > > On Mon, Jul 20, 2015 at 7:21 AM, Daniel Vetter <daniel@ffwll.ch> wrote:
> > > > On Mon, Jul 20, 2015 at 12:35:48PM +0300, Pekka Paalanen wrote:
> > > >> On Mon, 20 Jul 2015 01:58:33 -0700
> > > >> Stéphane Marchesin <stephane.marchesin@gmail.com> wrote:
> > > >>
> > > >> > On Mon, Jul 20, 2015 at 12:46 AM, Pekka Paalanen <ppaalanen@gmail.com> wrote:
> > > >> > > On Sun, 19 Jul 2015 17:20:32 -0700
> > > >> > > Stéphane Marchesin <stephane.marchesin@gmail.com> wrote:
> > > >> > >
> > > >> > >> On Thu, Jul 16, 2015 at 11:08 PM, Pekka Paalanen <ppaalanen@gmail.com> wrote:
> > > >> > >> >
> > > >> > >> > On Thu, 16 Jul 2015 20:20:39 +0800
> > > >> > >> > John Hunter <zhjwpku@gmail.com> wrote:
> > > >> > >> >
> > > >> > >> > > From: Zhao Junwang <zhjwpku@gmail.com>
> > > >> > >> > >
> > > >> > >> > > This supports the asynchronous commits, required for page-flipping
> > > >> > >> > > Since it's virtual hw it's ok to commit async stuff right away, we
> > > >> > >> > > never have to wait for vblank.
> > > >> > >> >
> > > >> > >> > Hi,
> > > >> > >> >
> > > >> > >> > in theory, yes. This is what a patch to bochs implemented not too long
> > > >> > >> > ago, so AFAIK you are only replicating the existing behaviour.
> > > >> > >> >
> > > >> > >> > However, if userspace doing an async commit (or sync, I suppose) does
> > > >> > >> > not incur any waits in the kernel in e.g. sending the page flip event,
> > > >> > >> > then flip driven programs (e.g. a Wayland compositor, say, Weston)
> > > >> > >> > will be running its rendering loop as a busy-loop, because the kernel
> > > >> > >> > does not throttle it to the (virtual) display refresh rate.
> > > >> > >> >
> > > >> > >> > This will cause maximal CPU usage and poor user experience as
> > > >> > >> > everything else needs to fight for CPU time and event dispatch to get
> > > >> > >> > through, like input.
> > > >> > >> >
> > > >> > >> > I would hope someone could do a follow-up to implement a refresh cycle
> > > >> > >> > emulation based on a clock. Userspace expects page flips to happen at
> > > >> > >> > most at refresh rate when asking for vblank-synced flips. It's only
> > > >> > >> > natural for userspace to drive its rendering loop based on the vblank
> > > >> > >> > cycle.
> > > >> > >>
> > > >> > >>
> > > >> > >> I've been asking myself the same question (for the UDL driver) and I'm
> > > >> > >> not sure if this policy should go in the kernel. After all, there
> > > >> > >> could be legitimate reasons for user space to render lots of frames
> > > >> > >> per second. It seems to me that if user space doesn't want too many
> > > >> > >> fps, it should just throttle itself.
> > > >> > >
> > > >> > > If userspace wants to render lots of frames per second, IMO it should
> > > >> > > not be using vblank-synced operations in a way that may throttle it.
> > > >> > > The lots of frames use case is already non-working for the majority of
> > > >> > > the drivers without DRM_MODE_PAGE_FLIP_ASYNC, right?
> > > >> > >
> > > >> > > The problem here I see is that one DRM driver decides to work different
> > > >> > > to other DRM drivers. All real-hardware DRM drivers, when asked to do
> > > >> > > vblank-synced update, actually do throttle to the vblank AFAIK.
> > > >> >
> > > >> > udl is an exception here. It is (arguably) real hardware but doesn't throttle.
> > > >> >
> > > >> > > Is it
> > > >> > > too much to assume, that the video mode set in a driver (refresh rate)
> > > >> > > corresponds to the vblank rate which implicitly delays the completion
> > > >> > > of vblank-sync'd operations to at least the next vblank boundary?
> > > >> >
> > > >> > I think it's wrong to make user space think that a vsynced display
> > > >> > always matches the refresh rate in a world where:
> > > >> >
> > > >> > - some displays have variable refresh rates (not just the fancy new
> > > >> > stuff like g-sync, look for lvds_downclock in the intel driver for
> > > >> > example, also consider DSI displays)
> > > >> >
> > > >> > - some displays have no refresh rate (the ones we are talking about
> > > >> > here: udl, bochs...)
> > 
> > > > Imo aiming for vrefresh to be accurate is good. For gsync and friends I
> > > > think we should have an explicit range or flag to make userspace aware of
> > > > what's going on.
> > > 
> > > I think the concept of vrefresh is flawed and not really future-proof
> > > (I gave a few examples in my previous email). I agree we should keep
> > > it as legacy, but we should add something else for the more advanced
> > > cases.
> > 
> > Right, so let's add something new for new hardware features and keep
> > the existing behavior existing.
> > 
> > I suppose the problem is that the existing behavior is not really
> > documented so we have to resort to screaming users?
> > 
> > If one does not ask for ASYNC with a page flip, does it mean flipping
> > on the next vblank, or flipping such that it cannot tear but allowing
> > techiques like scanline waits?
> 
> Since legacy page_flip is always for the full primary plane you can't do
> scanline waits - it covers everything anyway.

Ah, nice. Obvious in hindsight, at least if the display server does
not mark dirty regions.

But there seems to be drmModeDirtyFB(), can that not be used with page
flip? If userspace provided dirty regions, would scanline-wait be
possible in theory? If yes, would it be acceptable for drivers to do?

> > Stéphane also raised the concern that scanout downclocking etc. may
> > cause flips to be quite late from predicted. This is less of a problem,
> > rendering can also take arbitrary times and software is usually written
> > to deal with missing deadlines. It's a problem that happens all the
> > time anyway. As it is a problem with prediction accuracy, we could just
> > wave it off by setting the imaginary "no constant refresh rate" flag.
> > IMHO this is a problem that should be solved at another occasion, if
> > necessary.
> 
> Downclocking should only be able to delay the very first frame in a
> sequence (video playback, animation). Imo we can shrug that off as "clock
> skew" ;-)

Indeed, and at least Weston has special repaint loop start logic to
deal with that.

> Imo once someone updates frames regularly vblank timestamps should be
> evenly spaced and pageflips not instant (at least by default).

Glad to hear. :-)

> > > Yeah I don't think I care about the old interface, it is what it is.
> > > But we should design something which works well for the future use
> > > cases.
> > 
> > Are you referring to the atomic API as the old thing?
> > 
> > In the end, the question is how paranoid should display servers be
> > about the timings? Do we need to fix Weston, Mutter and likely others
> > to deal with page flips being signalled immediately, or is it more
> > appropriate to fix the DRM drivers to keep up the illusion of providing
> > what the API appears to suggest?
> 
> I guess they can't assume too much about vblanks (too many drivers don't
> even bother with precise/irq-delay-correct timestamps), but I think
> assuming that doing a page_flip completion event based renderer shouldn't
> result in spinning is reasonable.

Right. Reliability of timestamps is a whole another can.

> I guess for bochs/udl and others we could create a small drm driver which
> keeps track of the last vblank ts (we have those already) and suitable
> delays the even/timestamp to keep up the illusion. Or we just rip out
> pageflip support for those drivers. But if weston&co can't cope with that
> that would be worse.

Now that you ask, I'm not even really sure what the non-page-flip KMS
display update way is, so no, I would say Weston cannot deal with it as
is.

Who's up for the job? :-)


Thanks,
pq
Daniel Vetter July 21, 2015, 1:47 p.m. UTC | #12
On Tue, Jul 21, 2015 at 02:22:03PM +0300, Pekka Paalanen wrote:
> On Tue, 21 Jul 2015 11:02:58 +0200
> Daniel Vetter <daniel@ffwll.ch> wrote:
> 
> > On Tue, Jul 21, 2015 at 10:06:09AM +0300, Pekka Paalanen wrote:
> > > On Mon, 20 Jul 2015 10:32:31 -0700
> > > Stéphane Marchesin <stephane.marchesin@gmail.com> wrote:

> > > If one does not ask for ASYNC with a page flip, does it mean flipping
> > > on the next vblank, or flipping such that it cannot tear but allowing
> > > techiques like scanline waits?
> > 
> > Since legacy page_flip is always for the full primary plane you can't do
> > scanline waits - it covers everything anyway.
> 
> Ah, nice. Obvious in hindsight, at least if the display server does
> not mark dirty regions.
> 
> But there seems to be drmModeDirtyFB(), can that not be used with page
> flip? If userspace provided dirty regions, would scanline-wait be
> possible in theory? If yes, would it be acceptable for drivers to do?

DirtyFB is exclusively for frontbuffer rendering. We could easily add a
dirty-x/y/w/h rectangle to atomic (I don't think a list makes sense, most
hw can only do one), but right now there's no way to do FlipWithDamage.
Definitely something we want to look into (and we should be able to make
it work on skl on at least in some cases).

> > I guess for bochs/udl and others we could create a small drm driver which
> > keeps track of the last vblank ts (we have those already) and suitable
> > delays the even/timestamp to keep up the illusion. Or we just rip out
> > pageflip support for those drivers. But if weston&co can't cope with that
> > that would be worse.
> 
> Now that you ask, I'm not even really sure what the non-page-flip KMS
> display update way is, so no, I would say Weston cannot deal with it as
> is.
> 
> Who's up for the job? :-)

Implement a drm_send_fake_vblank_event in drm_irq.c and then roll it out
shouldnt be too much work really. We can store crtc-private date needed
for this in the drm_vblank structure (just need a timer plus maybe a
high-res timestamp to make the illusion better). Then we could replace the
send_vblank_event call with that helper in bochs/udl/...
-Daniel

Patch
diff mbox

diff --git a/drivers/gpu/drm/bochs/bochs_mm.c b/drivers/gpu/drm/bochs/bochs_mm.c
index c1d819c..37ac2ca 100644
--- a/drivers/gpu/drm/bochs/bochs_mm.c
+++ b/drivers/gpu/drm/bochs/bochs_mm.c
@@ -545,8 +545,15 @@  bochs_user_framebuffer_create(struct drm_device *dev,
 	return &bochs_fb->base;
 }
 
+static int bochs_atomic_commit(struct drm_device *dev,
+			     struct drm_atomic_state *a,
+			     bool async)
+{
+	return drm_atomic_helper_commit(dev, a, false);
+}
+
 const struct drm_mode_config_funcs bochs_mode_funcs = {
 	.fb_create = bochs_user_framebuffer_create,
 	.atomic_check = drm_atomic_helper_check,
-	.atomic_commit = drm_atomic_helper_commit,
+	.atomic_commit = bochs_atomic_commit,
 };