diff mbox series

drm: avoid spurious EBUSY due to nonblocking atomic modesets

Message ID 20200408162403.3616785-1-daniel.vetter@ffwll.ch (mailing list archive)
State New, archived
Headers show
Series drm: avoid spurious EBUSY due to nonblocking atomic modesets | expand

Commit Message

Daniel Vetter April 8, 2020, 4:24 p.m. UTC
When doing an atomic modeset with ALLOW_MODESET drivers are allowed to
pull in arbitrary other resources, including CRTCs (e.g. when
reconfiguring global resources).

But in nonblocking mode userspace has then no idea this happened,
which can lead to spurious EBUSY calls, both:
- when that other CRTC is currently busy doing a page_flip the
  ALLOW_MODESET commit can fail with an EBUSY
- on the other CRTC a normal atomic flip can fail with EBUSY because
  of the additional commit inserted by the kernel without userspace's
  knowledge

For blocking commits this isn't a problem, because everyone else will
just block until all the CRTC are reconfigured. Only thing userspace
can notice is the dropped frames without any reason for why frames got
dropped.

Consensus is that we need new uapi to handle this properly, but no one
has any idea what exactly the new uapi should look like. As a stop-gap
plug this problem by demoting nonblocking commits which might cause
issues by including CRTCs not in the original request to blocking
commits.

v2: Add comments and a WARN_ON to enforce this only when allowed - we
don't want to silently convert page flips into blocking plane updates
just because the driver is buggy.

v3: Fix inverted WARN_ON (Pekka).

References: https://lists.freedesktop.org/archives/dri-devel/2018-July/182281.html
Bugzilla: https://gitlab.freedesktop.org/wayland/weston/issues/24#note_9568
Cc: Daniel Stone <daniel@fooishbar.org>
Cc: Pekka Paalanen <pekka.paalanen@collabora.co.uk>
Cc: stable@vger.kernel.org
Reviewed-by: Daniel Stone <daniels@collabora.com>
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
--
Resending because last attempt failed CI and meanwhile the results are
lost :-/
-Daniel
---
 drivers/gpu/drm/drm_atomic.c | 34 +++++++++++++++++++++++++++++++---
 1 file changed, 31 insertions(+), 3 deletions(-)

Comments

Daniel Stone May 14, 2020, 6:40 a.m. UTC | #1
On Wed, 8 Apr 2020 at 17:24, Daniel Vetter <daniel.vetter@ffwll.ch> wrote:
> Resending because last attempt failed CI and meanwhile the results are
> lost :-/

Did anything happen with this?

Cheers,
Daniel
Daniel Vetter May 14, 2020, 7:08 a.m. UTC | #2
On Thu, May 14, 2020 at 8:42 AM Daniel Stone <daniel@fooishbar.org> wrote:
>
> On Wed, 8 Apr 2020 at 17:24, Daniel Vetter <daniel.vetter@ffwll.ch> wrote:
> > Resending because last attempt failed CI and meanwhile the results are
> > lost :-/
>
> Did anything happen with this?

Nope. There's an igt now that fails with this, and I'm not sure
whether changing the igt is the right idea or not.

I'm kinda now thinking about changing this to instead document under
which exact situations you can get a spurious EBUSY, and enforcing
that in the code with some checks. Essentially only possible if you do
a ALLOW_MODESET | NONBLOCKING on the other crtc. And then tell
userspace you get to eat that. We've been shipping with this for so
long by now that's defacto the uapi anyway :-/

Thoughts? Too horrible?
-Daniel
Daniel Stone May 14, 2020, 7:16 a.m. UTC | #3
Hi,

On Thu, 14 May 2020 at 08:08, Daniel Vetter <daniel.vetter@ffwll.ch> wrote:
> > Did anything happen with this?
>
> Nope. There's an igt now that fails with this, and I'm not sure
> whether changing the igt is the right idea or not.
>
> I'm kinda now thinking about changing this to instead document under
> which exact situations you can get a spurious EBUSY, and enforcing
> that in the code with some checks. Essentially only possible if you do
> a ALLOW_MODESET | NONBLOCKING on the other crtc. And then tell
> userspace you get to eat that. We've been shipping with this for so
> long by now that's defacto the uapi anyway :-/
>
> Thoughts? Too horrible?

I've been trying to avoid that, to be honest. Taking a random delay
because the kernel needs to do global things is fine. But making
userspace either do an expensive/complicated cross-CRTC
synchronisation is less easy; for some compositors, that means
reaching across threads to make sure all CRTCs are quiescent. Either
that, or deferring your ALLOW_MODESET to somewhere else, like an idle
handler, far away from where you were originally trying to do it,
which wouldn't be pleasant. The other option is that we teach people
to ignore EBUSY as random noise which can just sometimes happen and to
try again (when? how often? and you still have cross-CRTC
synchronisation issues), which doesn't scream compositor best practice
to me.

I'd be very much in favour of putting the blocking down in the kernel
at least until the kernel can give us a clear indication to tell us
what's going on, and ideally which other resources need to be dragged
in, in a way which is distinguishable from your compositor having
broken synchronisation.

Cheers,
Daniel
Daniel Vetter May 14, 2020, 7:25 a.m. UTC | #4
On Thu, May 14, 2020 at 9:18 AM Daniel Stone <daniel@fooishbar.org> wrote:
>
> Hi,
>
> On Thu, 14 May 2020 at 08:08, Daniel Vetter <daniel.vetter@ffwll.ch> wrote:
> > > Did anything happen with this?
> >
> > Nope. There's an igt now that fails with this, and I'm not sure
> > whether changing the igt is the right idea or not.
> >
> > I'm kinda now thinking about changing this to instead document under
> > which exact situations you can get a spurious EBUSY, and enforcing
> > that in the code with some checks. Essentially only possible if you do
> > a ALLOW_MODESET | NONBLOCKING on the other crtc. And then tell
> > userspace you get to eat that. We've been shipping with this for so
> > long by now that's defacto the uapi anyway :-/
> >
> > Thoughts? Too horrible?
>
> I've been trying to avoid that, to be honest. Taking a random delay
> because the kernel needs to do global things is fine. But making
> userspace either do an expensive/complicated cross-CRTC
> synchronisation is less easy; for some compositors, that means
> reaching across threads to make sure all CRTCs are quiescent. Either
> that, or deferring your ALLOW_MODESET to somewhere else, like an idle
> handler, far away from where you were originally trying to do it,
> which wouldn't be pleasant. The other option is that we teach people
> to ignore EBUSY as random noise which can just sometimes happen and to
> try again (when? how often? and you still have cross-CRTC
> synchronisation issues), which doesn't scream compositor best practice
> to me.
>
> I'd be very much in favour of putting the blocking down in the kernel
> at least until the kernel can give us a clear indication to tell us
> what's going on, and ideally which other resources need to be dragged
> in, in a way which is distinguishable from your compositor having
> broken synchronisation.

We know, the patch already computes that ... So would be a matter of
exporting that to userspace. We have a mask of all additional crtc
that will get an event and will -EBUSY until that's done.
-Daniel
Daniel Stone May 14, 2020, 7:40 a.m. UTC | #5
On Thu, 14 May 2020 at 08:25, Daniel Vetter <daniel.vetter@ffwll.ch> wrote:
> On Thu, May 14, 2020 at 9:18 AM Daniel Stone <daniel@fooishbar.org> wrote:
> > On Thu, 14 May 2020 at 08:08, Daniel Vetter <daniel.vetter@ffwll.ch> wrote:
> > I'd be very much in favour of putting the blocking down in the kernel
> > at least until the kernel can give us a clear indication to tell us
> > what's going on, and ideally which other resources need to be dragged
> > in, in a way which is distinguishable from your compositor having
> > broken synchronisation.
>
> We know, the patch already computes that ... So would be a matter of
> exporting that to userspace. We have a mask of all additional crtc
> that will get an event and will -EBUSY until that's done.

Yep, but unless and until that happens, could we please get this in?
Given it would require uAPI changes, we'd need to modify all the
compositors to work with the old path (random EBUSY) and the new path
(predictable and obvious), so at least preserving the promise that
per-CRTC updates are really independent would be good.

Cheers,
Daniel
Daniel Vetter May 14, 2020, 12:32 p.m. UTC | #6
On Thu, May 14, 2020 at 08:40:21AM +0100, Daniel Stone wrote:
> On Thu, 14 May 2020 at 08:25, Daniel Vetter <daniel.vetter@ffwll.ch> wrote:
> > On Thu, May 14, 2020 at 9:18 AM Daniel Stone <daniel@fooishbar.org> wrote:
> > > On Thu, 14 May 2020 at 08:08, Daniel Vetter <daniel.vetter@ffwll.ch> wrote:
> > > I'd be very much in favour of putting the blocking down in the kernel
> > > at least until the kernel can give us a clear indication to tell us
> > > what's going on, and ideally which other resources need to be dragged
> > > in, in a way which is distinguishable from your compositor having
> > > broken synchronisation.
> >
> > We know, the patch already computes that ... So would be a matter of
> > exporting that to userspace. We have a mask of all additional crtc
> > that will get an event and will -EBUSY until that's done.
> 
> Yep, but unless and until that happens, could we please get this in?
> Given it would require uAPI changes, we'd need to modify all the
> compositors to work with the old path (random EBUSY) and the new path
> (predictable and obvious), so at least preserving the promise that
> per-CRTC updates are really independent would be good.

I haven't found the time to look at the intel-gfx-ci fail in igt nor
really think about that. Nor care enough to just hammer this ignoring ci,
since I didn't even get around to understand why the igt now fails If
someone else takes this over, happy to see it land.
-Daniel
diff mbox series

Patch

diff --git a/drivers/gpu/drm/drm_atomic.c b/drivers/gpu/drm/drm_atomic.c
index 965173fd0ac2..4f140ff6fb98 100644
--- a/drivers/gpu/drm/drm_atomic.c
+++ b/drivers/gpu/drm/drm_atomic.c
@@ -1362,15 +1362,43 @@  EXPORT_SYMBOL(drm_atomic_commit);
 int drm_atomic_nonblocking_commit(struct drm_atomic_state *state)
 {
 	struct drm_mode_config *config = &state->dev->mode_config;
-	int ret;
+	unsigned requested_crtc = 0;
+	unsigned affected_crtc = 0;
+	struct drm_crtc *crtc;
+	struct drm_crtc_state *crtc_state;
+	bool nonblocking = true;
+	int ret, i;
+
+	/*
+	 * For commits that allow modesets drivers can add other CRTCs to the
+	 * atomic commit, e.g. when they need to reallocate global resources.
+	 *
+	 * But when userspace also requests a nonblocking commit then userspace
+	 * cannot know that the commit affects other CRTCs, which can result in
+	 * spurious EBUSY failures. Until we have better uapi plug this by
+	 * demoting such commits to blocking mode.
+	 */
+	for_each_new_crtc_in_state(state, crtc, crtc_state, i)
+		requested_crtc |= drm_crtc_mask(crtc);
 
 	ret = drm_atomic_check_only(state);
 	if (ret)
 		return ret;
 
-	DRM_DEBUG_ATOMIC("committing %p nonblocking\n", state);
+	for_each_new_crtc_in_state(state, crtc, crtc_state, i)
+		affected_crtc |= drm_crtc_mask(crtc);
+
+	if (affected_crtc != requested_crtc) {
+		/* adding other CRTC is only allowed for modeset commits */
+		WARN_ON(!state->allow_modeset);
+
+		DRM_DEBUG_ATOMIC("demoting %p to blocking mode to avoid EBUSY\n", state);
+		nonblocking = false;
+	} else {
+		DRM_DEBUG_ATOMIC("committing %p nonblocking\n", state);
+	}
 
-	return config->funcs->atomic_commit(state->dev, state, true);
+	return config->funcs->atomic_commit(state->dev, state, nonblocking);
 }
 EXPORT_SYMBOL(drm_atomic_nonblocking_commit);