drm/atomc: Update docs around locking and commit sequencing
diff mbox series

Message ID 20191204100011.859468-1-daniel.vetter@ffwll.ch
State New
Headers show
Series
  • drm/atomc: Update docs around locking and commit sequencing
Related show

Commit Message

Daniel Vetter Dec. 4, 2019, 10 a.m. UTC
Both locking and especially sequencing of nonblocking commits have
evolved a lot. The details are all there, but I noticed that the big
picture and connections have fallen behind a bit. Apply polish.

Motivated by some review discussions with Thierry.

Cc: Thierry Reding <treding@nvidia.com>
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
---
 Documentation/gpu/drm-kms.rst       | 11 ++++++-
 drivers/gpu/drm/drm_atomic.c        | 10 ++++---
 drivers/gpu/drm/drm_atomic_helper.c | 46 ++++++++++++++++++-----------
 include/drm/drm_atomic.h            | 13 ++++++--
 4 files changed, 56 insertions(+), 24 deletions(-)

Comments

Thierry Reding Dec. 5, 2019, 4:02 p.m. UTC | #1
On Wed, Dec 04, 2019 at 11:00:11AM +0100, Daniel Vetter wrote:
> Both locking and especially sequencing of nonblocking commits have
> evolved a lot. The details are all there, but I noticed that the big
> picture and connections have fallen behind a bit. Apply polish.
> 
> Motivated by some review discussions with Thierry.
> 
> Cc: Thierry Reding <treding@nvidia.com>
> Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
> ---
>  Documentation/gpu/drm-kms.rst       | 11 ++++++-
>  drivers/gpu/drm/drm_atomic.c        | 10 ++++---
>  drivers/gpu/drm/drm_atomic_helper.c | 46 ++++++++++++++++++-----------
>  include/drm/drm_atomic.h            | 13 ++++++--
>  4 files changed, 56 insertions(+), 24 deletions(-)

Hi Daniel,

"drm/atomic" in the subject. A couple more minor things I noticed below.

> 
> diff --git a/Documentation/gpu/drm-kms.rst b/Documentation/gpu/drm-kms.rst
> index c68588ce4090..b9330343d1bc 100644
> --- a/Documentation/gpu/drm-kms.rst
> +++ b/Documentation/gpu/drm-kms.rst
> @@ -260,7 +260,8 @@ Taken all together there's two consequences for the atomic design:
>    drm_connector_state <drm_connector_state>` for connectors. These are the only
>    objects with userspace-visible and settable state. For internal state drivers
>    can subclass these structures through embeddeding, or add entirely new state
> -  structures for their globally shared hardware functions.
> +  structures for their globally shared hardware functions, see :c:type:`struct
> +  drm_private_state<drm_private_state>`.
>  
>  - An atomic update is assembled and validated as an entirely free-standing pile
>    of structures within the :c:type:`drm_atomic_state <drm_atomic_state>`
> @@ -269,6 +270,14 @@ Taken all together there's two consequences for the atomic design:
>    to the driver and modeset objects. This way rolling back an update boils down
>    to releasing memory and unreferencing objects like framebuffers.
>  
> +Locking of atomic state structures is internally using :c:type:`struct
> +drm_modeset_lock <drm_modeset_lock>`. As a general rule the locking shouldn't be
> +exposed to drivers, instead the right locks should be automatically acquired by
> +any function that duplicates or peeks into a state, like e.g.
> +:c:func:`drm_atomic_get_crtc_state()`.  Locking only protects the software data
> +structure, ordering of committing state changes to hardware is sequenced using
> +:c:type:`struct drm_crtc_commit <drm_crtc_commit>`.
> +
>  Read on in this chapter, and also in :ref:`drm_atomic_helper` for more detailed
>  coverage of specific topics.
>  
> diff --git a/drivers/gpu/drm/drm_atomic.c b/drivers/gpu/drm/drm_atomic.c
> index a351a9a39530..5b4787e33f0d 100644
> --- a/drivers/gpu/drm/drm_atomic.c
> +++ b/drivers/gpu/drm/drm_atomic.c
> @@ -688,10 +688,12 @@ static void drm_atomic_plane_print_state(struct drm_printer *p,
>   * associated state struct &drm_private_state.
>   *
>   * Similar to userspace-exposed objects, private state structures can be
> - * acquired by calling drm_atomic_get_private_obj_state(). Since this function
> - * does not take care of locking, drivers should wrap it for each type of
> - * private state object they have with the required call to drm_modeset_lock()
> - * for the corresponding &drm_modeset_lock.
> + * acquired by calling drm_atomic_get_private_obj_state(). This also takes care
> + * of locking, hence drivers should not have a need to call drm_modeset_lock()
> + * directly. Sequence of the actual hardware state commit is not handled,
> + * drivers might need to keep track of struct drm_crtc_commit within subclassed
> + * structure of &drm_private_state as necessary, e.g. similar to
> + * &drm_plane_state.commit. See also &drm_atomic_state.fake_commit.
>   *
>   * All private state structures contained in a &drm_atomic_state update can be
>   * iterated using for_each_oldnew_private_obj_in_state(),
> diff --git a/drivers/gpu/drm/drm_atomic_helper.c b/drivers/gpu/drm/drm_atomic_helper.c
> index 711801b9d4f1..10d62f726b22 100644
> --- a/drivers/gpu/drm/drm_atomic_helper.c
> +++ b/drivers/gpu/drm/drm_atomic_helper.c
> @@ -1827,17 +1827,21 @@ EXPORT_SYMBOL(drm_atomic_helper_commit);
>  /**
>   * DOC: implementing nonblocking commit
>   *
> - * Nonblocking atomic commits have to be implemented in the following sequence:
> + * Nonblocking atomic commits should use struct &drm_crtc_commit to sequence
> + * different operations against each another. Locks, especially struct
> + * &drm_modeset_lock, should not be held in worker threads or any other
> + * asynchronous context used to commit the hardware state.
>   *
> - * 1. Run drm_atomic_helper_prepare_planes() first. This is the only function
> - * which commit needs to call which can fail, so we want to run it first and
> + * drm_atomic_helper_commit() implements the recommended sequence for
> + * nonblocking commits, using drm_atomic_helper_setup_commit() internally:
> + *
> + * 1. Run drm_atomic_helper_prepare_planes(). Since this can fail and we
> + * need to propagate out of memory/VRAM errors to userspace, it must be called
>   * synchronously.
>   *
>   * 2. Synchronize with any outstanding nonblocking commit worker threads which
> - * might be affected the new state update. This can be done by either cancelling
> - * or flushing the work items, depending upon whether the driver can deal with
> - * cancelled updates. Note that it is important to ensure that the framebuffer
> - * cleanup is still done when cancelling.
> + * might be affected the new state update. This is handled by

"affected _by_ the new state update"?

> + * drm_atomic_helper_setup_commit().
>   *
>   * Asynchronous workers need to have sufficient parallelism to be able to run
>   * different atomic commits on different CRTCs in parallel. The simplest way to
> @@ -1848,21 +1852,29 @@ EXPORT_SYMBOL(drm_atomic_helper_commit);
>   * must be done as one global operation, and enabling or disabling a CRTC can
>   * take a long time. But even that is not required.
>   *
> + * IMPORTANT: A &drm_atomic_state update for multiple CRTCs is sequenced
> + * against all CRTCs therein. Therefor for atomic state updates which only flip

I think "therefor" has a slightly different meaning than "therefore",
and I think you actually want the latter in this case.

> + * planes the driver must not get the struct &drm_crtc_state of unrelated CRTCs
> + * in its atomic check codee: This would prevent committing of atomic updates to

"code"

> + * multiple CRTCs in parallel. In general, adding additional state structures
> + * should be avoided as much as possible, because this reduces parallism in

"parallelism"

> + * (nonblocking) commits, both due to locking and due to commit sequencing
> + * requirements.
> + *
>   * 3. The software state is updated synchronously with
>   * drm_atomic_helper_swap_state(). Doing this under the protection of all modeset
> - * locks means concurrent callers never see inconsistent state. And doing this
> - * while it's guaranteed that no relevant nonblocking worker runs means that
> - * nonblocking workers do not need grab any locks. Actually they must not grab
> - * locks, for otherwise the work flushing will deadlock.
> + * locks means concurrent callers never see inconsistent state. Note that commit
> + * workers do not hold any locks, their access is only coordinated through

I stumbled across this a couple of times when reading it. I think it
becomes clearer when you replace the comma by a colon:

	Note that commit workers do not hold any locks: their access is only
	coordinated through ordering.

Or perhaps replace the comma with "and"?

Other than that, looks good to me:

Reviewed-by: Thierry Reding <treding@nvidia.com>

> + * ordering. If workers would access state only through the pointers in the
> + * free-standing state objects (currently not the case for any driver) then even
> + * multiple pending commits could be in-flight at the same time.
>   *
>   * 4. Schedule a work item to do all subsequent steps, using the split-out
>   * commit helpers: a) pre-plane commit b) plane commit c) post-plane commit and
>   * then cleaning up the framebuffers after the old framebuffer is no longer
> - * being displayed.
> - *
> - * The above scheme is implemented in the atomic helper libraries in
> - * drm_atomic_helper_commit() using a bunch of helper functions. See
> - * drm_atomic_helper_setup_commit() for a starting point.
> + * being displayed. The scheduled work should synchronize against other workers
> + * using the &drm_crtc_commit infrastructure as needed. See
> + * drm_atomic_helper_setup_commit() for more details.
>   */
>  
>  static int stall_checks(struct drm_crtc *crtc, bool nonblock)
> @@ -2085,7 +2097,7 @@ EXPORT_SYMBOL(drm_atomic_helper_setup_commit);
>   *
>   * This function waits for all preceeding commits that touch the same CRTC as
>   * @old_state to both be committed to the hardware (as signalled by
> - * drm_atomic_helper_commit_hw_done) and executed by the hardware (as signalled
> + * drm_atomic_helper_commit_hw_done()) and executed by the hardware (as signalled
>   * by calling drm_crtc_send_vblank_event() on the &drm_crtc_state.event).
>   *
>   * This is part of the atomic helper support for nonblocking commits, see
> diff --git a/include/drm/drm_atomic.h b/include/drm/drm_atomic.h
> index b6c73fd9f55a..5923819dcd68 100644
> --- a/include/drm/drm_atomic.h
> +++ b/include/drm/drm_atomic.h
> @@ -60,8 +60,8 @@
>   * 	wait for flip_done		<----
>   * 	clean up atomic state
>   *
> - * The important bit to know is that cleanup_done is the terminal event, but the
> - * ordering between flip_done and hw_done is entirely up to the specific driver
> + * The important bit to know is that &cleanup_done is the terminal event, but the
> + * ordering between &flip_done and &hw_done is entirely up to the specific driver
>   * and modeset state change.
>   *
>   * For an implementation of how to use this look at
> @@ -92,6 +92,9 @@ struct drm_crtc_commit {
>  	 * commit is sent to userspace, or when an out-fence is singalled. Note
>  	 * that for most hardware, in most cases this happens after @hw_done is
>  	 * signalled.
> +	 *
> +	 * Completion of this stage is signalled implicitly by calling
> +	 * drm_crtc_send_vblank_event() on &drm_crtc_state.event.
>  	 */
>  	struct completion flip_done;
>  
> @@ -107,6 +110,9 @@ struct drm_crtc_commit {
>  	 * Note that this does not need to include separately reference-counted
>  	 * resources like backing storage buffer pinning, or runtime pm
>  	 * management.
> +	 *
> +	 * Drivers should call drm_atomic_helper_commit_hw_done() to signal
> +	 * completion of this stage.
>  	 */
>  	struct completion hw_done;
>  
> @@ -118,6 +124,9 @@ struct drm_crtc_commit {
>  	 * a vblank wait completed it might be a bit later. This completion is
>  	 * useful to throttle updates and avoid hardware updates getting ahead
>  	 * of the buffer cleanup too much.
> +	 *
> +	 * Drivers should call drm_atomic_helper_commit_cleanup_done() to signal
> +	 * completion of this stage.
>  	 */
>  	struct completion cleanup_done;
>  
> -- 
> 2.24.0
> 
> _______________________________________________
> dri-devel mailing list
> dri-devel@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/dri-devel
Daniel Vetter Dec. 10, 2019, 10:31 a.m. UTC | #2
On Thu, Dec 05, 2019 at 05:02:24PM +0100, Thierry Reding wrote:
> On Wed, Dec 04, 2019 at 11:00:11AM +0100, Daniel Vetter wrote:
> > Both locking and especially sequencing of nonblocking commits have
> > evolved a lot. The details are all there, but I noticed that the big
> > picture and connections have fallen behind a bit. Apply polish.
> > 
> > Motivated by some review discussions with Thierry.
> > 
> > Cc: Thierry Reding <treding@nvidia.com>
> > Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
> > ---
> >  Documentation/gpu/drm-kms.rst       | 11 ++++++-
> >  drivers/gpu/drm/drm_atomic.c        | 10 ++++---
> >  drivers/gpu/drm/drm_atomic_helper.c | 46 ++++++++++++++++++-----------
> >  include/drm/drm_atomic.h            | 13 ++++++--
> >  4 files changed, 56 insertions(+), 24 deletions(-)
> 
> Hi Daniel,
> 
> "drm/atomic" in the subject. A couple more minor things I noticed below.
> 
> > 
> > diff --git a/Documentation/gpu/drm-kms.rst b/Documentation/gpu/drm-kms.rst
> > index c68588ce4090..b9330343d1bc 100644
> > --- a/Documentation/gpu/drm-kms.rst
> > +++ b/Documentation/gpu/drm-kms.rst
> > @@ -260,7 +260,8 @@ Taken all together there's two consequences for the atomic design:
> >    drm_connector_state <drm_connector_state>` for connectors. These are the only
> >    objects with userspace-visible and settable state. For internal state drivers
> >    can subclass these structures through embeddeding, or add entirely new state
> > -  structures for their globally shared hardware functions.
> > +  structures for their globally shared hardware functions, see :c:type:`struct
> > +  drm_private_state<drm_private_state>`.
> >  
> >  - An atomic update is assembled and validated as an entirely free-standing pile
> >    of structures within the :c:type:`drm_atomic_state <drm_atomic_state>`
> > @@ -269,6 +270,14 @@ Taken all together there's two consequences for the atomic design:
> >    to the driver and modeset objects. This way rolling back an update boils down
> >    to releasing memory and unreferencing objects like framebuffers.
> >  
> > +Locking of atomic state structures is internally using :c:type:`struct
> > +drm_modeset_lock <drm_modeset_lock>`. As a general rule the locking shouldn't be
> > +exposed to drivers, instead the right locks should be automatically acquired by
> > +any function that duplicates or peeks into a state, like e.g.
> > +:c:func:`drm_atomic_get_crtc_state()`.  Locking only protects the software data
> > +structure, ordering of committing state changes to hardware is sequenced using
> > +:c:type:`struct drm_crtc_commit <drm_crtc_commit>`.
> > +
> >  Read on in this chapter, and also in :ref:`drm_atomic_helper` for more detailed
> >  coverage of specific topics.
> >  
> > diff --git a/drivers/gpu/drm/drm_atomic.c b/drivers/gpu/drm/drm_atomic.c
> > index a351a9a39530..5b4787e33f0d 100644
> > --- a/drivers/gpu/drm/drm_atomic.c
> > +++ b/drivers/gpu/drm/drm_atomic.c
> > @@ -688,10 +688,12 @@ static void drm_atomic_plane_print_state(struct drm_printer *p,
> >   * associated state struct &drm_private_state.
> >   *
> >   * Similar to userspace-exposed objects, private state structures can be
> > - * acquired by calling drm_atomic_get_private_obj_state(). Since this function
> > - * does not take care of locking, drivers should wrap it for each type of
> > - * private state object they have with the required call to drm_modeset_lock()
> > - * for the corresponding &drm_modeset_lock.
> > + * acquired by calling drm_atomic_get_private_obj_state(). This also takes care
> > + * of locking, hence drivers should not have a need to call drm_modeset_lock()
> > + * directly. Sequence of the actual hardware state commit is not handled,
> > + * drivers might need to keep track of struct drm_crtc_commit within subclassed
> > + * structure of &drm_private_state as necessary, e.g. similar to
> > + * &drm_plane_state.commit. See also &drm_atomic_state.fake_commit.
> >   *
> >   * All private state structures contained in a &drm_atomic_state update can be
> >   * iterated using for_each_oldnew_private_obj_in_state(),
> > diff --git a/drivers/gpu/drm/drm_atomic_helper.c b/drivers/gpu/drm/drm_atomic_helper.c
> > index 711801b9d4f1..10d62f726b22 100644
> > --- a/drivers/gpu/drm/drm_atomic_helper.c
> > +++ b/drivers/gpu/drm/drm_atomic_helper.c
> > @@ -1827,17 +1827,21 @@ EXPORT_SYMBOL(drm_atomic_helper_commit);
> >  /**
> >   * DOC: implementing nonblocking commit
> >   *
> > - * Nonblocking atomic commits have to be implemented in the following sequence:
> > + * Nonblocking atomic commits should use struct &drm_crtc_commit to sequence
> > + * different operations against each another. Locks, especially struct
> > + * &drm_modeset_lock, should not be held in worker threads or any other
> > + * asynchronous context used to commit the hardware state.
> >   *
> > - * 1. Run drm_atomic_helper_prepare_planes() first. This is the only function
> > - * which commit needs to call which can fail, so we want to run it first and
> > + * drm_atomic_helper_commit() implements the recommended sequence for
> > + * nonblocking commits, using drm_atomic_helper_setup_commit() internally:
> > + *
> > + * 1. Run drm_atomic_helper_prepare_planes(). Since this can fail and we
> > + * need to propagate out of memory/VRAM errors to userspace, it must be called
> >   * synchronously.
> >   *
> >   * 2. Synchronize with any outstanding nonblocking commit worker threads which
> > - * might be affected the new state update. This can be done by either cancelling
> > - * or flushing the work items, depending upon whether the driver can deal with
> > - * cancelled updates. Note that it is important to ensure that the framebuffer
> > - * cleanup is still done when cancelling.
> > + * might be affected the new state update. This is handled by
> 
> "affected _by_ the new state update"?
> 
> > + * drm_atomic_helper_setup_commit().
> >   *
> >   * Asynchronous workers need to have sufficient parallelism to be able to run
> >   * different atomic commits on different CRTCs in parallel. The simplest way to
> > @@ -1848,21 +1852,29 @@ EXPORT_SYMBOL(drm_atomic_helper_commit);
> >   * must be done as one global operation, and enabling or disabling a CRTC can
> >   * take a long time. But even that is not required.
> >   *
> > + * IMPORTANT: A &drm_atomic_state update for multiple CRTCs is sequenced
> > + * against all CRTCs therein. Therefor for atomic state updates which only flip
> 
> I think "therefor" has a slightly different meaning than "therefore",
> and I think you actually want the latter in this case.
> 
> > + * planes the driver must not get the struct &drm_crtc_state of unrelated CRTCs
> > + * in its atomic check codee: This would prevent committing of atomic updates to
> 
> "code"
> 
> > + * multiple CRTCs in parallel. In general, adding additional state structures
> > + * should be avoided as much as possible, because this reduces parallism in
> 
> "parallelism"
> 
> > + * (nonblocking) commits, both due to locking and due to commit sequencing
> > + * requirements.
> > + *
> >   * 3. The software state is updated synchronously with
> >   * drm_atomic_helper_swap_state(). Doing this under the protection of all modeset
> > - * locks means concurrent callers never see inconsistent state. And doing this
> > - * while it's guaranteed that no relevant nonblocking worker runs means that
> > - * nonblocking workers do not need grab any locks. Actually they must not grab
> > - * locks, for otherwise the work flushing will deadlock.
> > + * locks means concurrent callers never see inconsistent state. Note that commit
> > + * workers do not hold any locks, their access is only coordinated through
> 
> I stumbled across this a couple of times when reading it. I think it
> becomes clearer when you replace the comma by a colon:
> 
> 	Note that commit workers do not hold any locks: their access is only
> 	coordinated through ordering.
> 
> Or perhaps replace the comma with "and"?
> 
> Other than that, looks good to me:
> 
> Reviewed-by: Thierry Reding <treding@nvidia.com>

All your comments applied and patch merged, thanks for your review.
-Daniel

> 
> > + * ordering. If workers would access state only through the pointers in the
> > + * free-standing state objects (currently not the case for any driver) then even
> > + * multiple pending commits could be in-flight at the same time.
> >   *
> >   * 4. Schedule a work item to do all subsequent steps, using the split-out
> >   * commit helpers: a) pre-plane commit b) plane commit c) post-plane commit and
> >   * then cleaning up the framebuffers after the old framebuffer is no longer
> > - * being displayed.
> > - *
> > - * The above scheme is implemented in the atomic helper libraries in
> > - * drm_atomic_helper_commit() using a bunch of helper functions. See
> > - * drm_atomic_helper_setup_commit() for a starting point.
> > + * being displayed. The scheduled work should synchronize against other workers
> > + * using the &drm_crtc_commit infrastructure as needed. See
> > + * drm_atomic_helper_setup_commit() for more details.
> >   */
> >  
> >  static int stall_checks(struct drm_crtc *crtc, bool nonblock)
> > @@ -2085,7 +2097,7 @@ EXPORT_SYMBOL(drm_atomic_helper_setup_commit);
> >   *
> >   * This function waits for all preceeding commits that touch the same CRTC as
> >   * @old_state to both be committed to the hardware (as signalled by
> > - * drm_atomic_helper_commit_hw_done) and executed by the hardware (as signalled
> > + * drm_atomic_helper_commit_hw_done()) and executed by the hardware (as signalled
> >   * by calling drm_crtc_send_vblank_event() on the &drm_crtc_state.event).
> >   *
> >   * This is part of the atomic helper support for nonblocking commits, see
> > diff --git a/include/drm/drm_atomic.h b/include/drm/drm_atomic.h
> > index b6c73fd9f55a..5923819dcd68 100644
> > --- a/include/drm/drm_atomic.h
> > +++ b/include/drm/drm_atomic.h
> > @@ -60,8 +60,8 @@
> >   * 	wait for flip_done		<----
> >   * 	clean up atomic state
> >   *
> > - * The important bit to know is that cleanup_done is the terminal event, but the
> > - * ordering between flip_done and hw_done is entirely up to the specific driver
> > + * The important bit to know is that &cleanup_done is the terminal event, but the
> > + * ordering between &flip_done and &hw_done is entirely up to the specific driver
> >   * and modeset state change.
> >   *
> >   * For an implementation of how to use this look at
> > @@ -92,6 +92,9 @@ struct drm_crtc_commit {
> >  	 * commit is sent to userspace, or when an out-fence is singalled. Note
> >  	 * that for most hardware, in most cases this happens after @hw_done is
> >  	 * signalled.
> > +	 *
> > +	 * Completion of this stage is signalled implicitly by calling
> > +	 * drm_crtc_send_vblank_event() on &drm_crtc_state.event.
> >  	 */
> >  	struct completion flip_done;
> >  
> > @@ -107,6 +110,9 @@ struct drm_crtc_commit {
> >  	 * Note that this does not need to include separately reference-counted
> >  	 * resources like backing storage buffer pinning, or runtime pm
> >  	 * management.
> > +	 *
> > +	 * Drivers should call drm_atomic_helper_commit_hw_done() to signal
> > +	 * completion of this stage.
> >  	 */
> >  	struct completion hw_done;
> >  
> > @@ -118,6 +124,9 @@ struct drm_crtc_commit {
> >  	 * a vblank wait completed it might be a bit later. This completion is
> >  	 * useful to throttle updates and avoid hardware updates getting ahead
> >  	 * of the buffer cleanup too much.
> > +	 *
> > +	 * Drivers should call drm_atomic_helper_commit_cleanup_done() to signal
> > +	 * completion of this stage.
> >  	 */
> >  	struct completion cleanup_done;
> >  
> > -- 
> > 2.24.0
> > 
> > _______________________________________________
> > dri-devel mailing list
> > dri-devel@lists.freedesktop.org
> > https://lists.freedesktop.org/mailman/listinfo/dri-devel

Patch
diff mbox series

diff --git a/Documentation/gpu/drm-kms.rst b/Documentation/gpu/drm-kms.rst
index c68588ce4090..b9330343d1bc 100644
--- a/Documentation/gpu/drm-kms.rst
+++ b/Documentation/gpu/drm-kms.rst
@@ -260,7 +260,8 @@  Taken all together there's two consequences for the atomic design:
   drm_connector_state <drm_connector_state>` for connectors. These are the only
   objects with userspace-visible and settable state. For internal state drivers
   can subclass these structures through embeddeding, or add entirely new state
-  structures for their globally shared hardware functions.
+  structures for their globally shared hardware functions, see :c:type:`struct
+  drm_private_state<drm_private_state>`.
 
 - An atomic update is assembled and validated as an entirely free-standing pile
   of structures within the :c:type:`drm_atomic_state <drm_atomic_state>`
@@ -269,6 +270,14 @@  Taken all together there's two consequences for the atomic design:
   to the driver and modeset objects. This way rolling back an update boils down
   to releasing memory and unreferencing objects like framebuffers.
 
+Locking of atomic state structures is internally using :c:type:`struct
+drm_modeset_lock <drm_modeset_lock>`. As a general rule the locking shouldn't be
+exposed to drivers, instead the right locks should be automatically acquired by
+any function that duplicates or peeks into a state, like e.g.
+:c:func:`drm_atomic_get_crtc_state()`.  Locking only protects the software data
+structure, ordering of committing state changes to hardware is sequenced using
+:c:type:`struct drm_crtc_commit <drm_crtc_commit>`.
+
 Read on in this chapter, and also in :ref:`drm_atomic_helper` for more detailed
 coverage of specific topics.
 
diff --git a/drivers/gpu/drm/drm_atomic.c b/drivers/gpu/drm/drm_atomic.c
index a351a9a39530..5b4787e33f0d 100644
--- a/drivers/gpu/drm/drm_atomic.c
+++ b/drivers/gpu/drm/drm_atomic.c
@@ -688,10 +688,12 @@  static void drm_atomic_plane_print_state(struct drm_printer *p,
  * associated state struct &drm_private_state.
  *
  * Similar to userspace-exposed objects, private state structures can be
- * acquired by calling drm_atomic_get_private_obj_state(). Since this function
- * does not take care of locking, drivers should wrap it for each type of
- * private state object they have with the required call to drm_modeset_lock()
- * for the corresponding &drm_modeset_lock.
+ * acquired by calling drm_atomic_get_private_obj_state(). This also takes care
+ * of locking, hence drivers should not have a need to call drm_modeset_lock()
+ * directly. Sequence of the actual hardware state commit is not handled,
+ * drivers might need to keep track of struct drm_crtc_commit within subclassed
+ * structure of &drm_private_state as necessary, e.g. similar to
+ * &drm_plane_state.commit. See also &drm_atomic_state.fake_commit.
  *
  * All private state structures contained in a &drm_atomic_state update can be
  * iterated using for_each_oldnew_private_obj_in_state(),
diff --git a/drivers/gpu/drm/drm_atomic_helper.c b/drivers/gpu/drm/drm_atomic_helper.c
index 711801b9d4f1..10d62f726b22 100644
--- a/drivers/gpu/drm/drm_atomic_helper.c
+++ b/drivers/gpu/drm/drm_atomic_helper.c
@@ -1827,17 +1827,21 @@  EXPORT_SYMBOL(drm_atomic_helper_commit);
 /**
  * DOC: implementing nonblocking commit
  *
- * Nonblocking atomic commits have to be implemented in the following sequence:
+ * Nonblocking atomic commits should use struct &drm_crtc_commit to sequence
+ * different operations against each another. Locks, especially struct
+ * &drm_modeset_lock, should not be held in worker threads or any other
+ * asynchronous context used to commit the hardware state.
  *
- * 1. Run drm_atomic_helper_prepare_planes() first. This is the only function
- * which commit needs to call which can fail, so we want to run it first and
+ * drm_atomic_helper_commit() implements the recommended sequence for
+ * nonblocking commits, using drm_atomic_helper_setup_commit() internally:
+ *
+ * 1. Run drm_atomic_helper_prepare_planes(). Since this can fail and we
+ * need to propagate out of memory/VRAM errors to userspace, it must be called
  * synchronously.
  *
  * 2. Synchronize with any outstanding nonblocking commit worker threads which
- * might be affected the new state update. This can be done by either cancelling
- * or flushing the work items, depending upon whether the driver can deal with
- * cancelled updates. Note that it is important to ensure that the framebuffer
- * cleanup is still done when cancelling.
+ * might be affected the new state update. This is handled by
+ * drm_atomic_helper_setup_commit().
  *
  * Asynchronous workers need to have sufficient parallelism to be able to run
  * different atomic commits on different CRTCs in parallel. The simplest way to
@@ -1848,21 +1852,29 @@  EXPORT_SYMBOL(drm_atomic_helper_commit);
  * must be done as one global operation, and enabling or disabling a CRTC can
  * take a long time. But even that is not required.
  *
+ * IMPORTANT: A &drm_atomic_state update for multiple CRTCs is sequenced
+ * against all CRTCs therein. Therefor for atomic state updates which only flip
+ * planes the driver must not get the struct &drm_crtc_state of unrelated CRTCs
+ * in its atomic check codee: This would prevent committing of atomic updates to
+ * multiple CRTCs in parallel. In general, adding additional state structures
+ * should be avoided as much as possible, because this reduces parallism in
+ * (nonblocking) commits, both due to locking and due to commit sequencing
+ * requirements.
+ *
  * 3. The software state is updated synchronously with
  * drm_atomic_helper_swap_state(). Doing this under the protection of all modeset
- * locks means concurrent callers never see inconsistent state. And doing this
- * while it's guaranteed that no relevant nonblocking worker runs means that
- * nonblocking workers do not need grab any locks. Actually they must not grab
- * locks, for otherwise the work flushing will deadlock.
+ * locks means concurrent callers never see inconsistent state. Note that commit
+ * workers do not hold any locks, their access is only coordinated through
+ * ordering. If workers would access state only through the pointers in the
+ * free-standing state objects (currently not the case for any driver) then even
+ * multiple pending commits could be in-flight at the same time.
  *
  * 4. Schedule a work item to do all subsequent steps, using the split-out
  * commit helpers: a) pre-plane commit b) plane commit c) post-plane commit and
  * then cleaning up the framebuffers after the old framebuffer is no longer
- * being displayed.
- *
- * The above scheme is implemented in the atomic helper libraries in
- * drm_atomic_helper_commit() using a bunch of helper functions. See
- * drm_atomic_helper_setup_commit() for a starting point.
+ * being displayed. The scheduled work should synchronize against other workers
+ * using the &drm_crtc_commit infrastructure as needed. See
+ * drm_atomic_helper_setup_commit() for more details.
  */
 
 static int stall_checks(struct drm_crtc *crtc, bool nonblock)
@@ -2085,7 +2097,7 @@  EXPORT_SYMBOL(drm_atomic_helper_setup_commit);
  *
  * This function waits for all preceeding commits that touch the same CRTC as
  * @old_state to both be committed to the hardware (as signalled by
- * drm_atomic_helper_commit_hw_done) and executed by the hardware (as signalled
+ * drm_atomic_helper_commit_hw_done()) and executed by the hardware (as signalled
  * by calling drm_crtc_send_vblank_event() on the &drm_crtc_state.event).
  *
  * This is part of the atomic helper support for nonblocking commits, see
diff --git a/include/drm/drm_atomic.h b/include/drm/drm_atomic.h
index b6c73fd9f55a..5923819dcd68 100644
--- a/include/drm/drm_atomic.h
+++ b/include/drm/drm_atomic.h
@@ -60,8 +60,8 @@ 
  * 	wait for flip_done		<----
  * 	clean up atomic state
  *
- * The important bit to know is that cleanup_done is the terminal event, but the
- * ordering between flip_done and hw_done is entirely up to the specific driver
+ * The important bit to know is that &cleanup_done is the terminal event, but the
+ * ordering between &flip_done and &hw_done is entirely up to the specific driver
  * and modeset state change.
  *
  * For an implementation of how to use this look at
@@ -92,6 +92,9 @@  struct drm_crtc_commit {
 	 * commit is sent to userspace, or when an out-fence is singalled. Note
 	 * that for most hardware, in most cases this happens after @hw_done is
 	 * signalled.
+	 *
+	 * Completion of this stage is signalled implicitly by calling
+	 * drm_crtc_send_vblank_event() on &drm_crtc_state.event.
 	 */
 	struct completion flip_done;
 
@@ -107,6 +110,9 @@  struct drm_crtc_commit {
 	 * Note that this does not need to include separately reference-counted
 	 * resources like backing storage buffer pinning, or runtime pm
 	 * management.
+	 *
+	 * Drivers should call drm_atomic_helper_commit_hw_done() to signal
+	 * completion of this stage.
 	 */
 	struct completion hw_done;
 
@@ -118,6 +124,9 @@  struct drm_crtc_commit {
 	 * a vblank wait completed it might be a bit later. This completion is
 	 * useful to throttle updates and avoid hardware updates getting ahead
 	 * of the buffer cleanup too much.
+	 *
+	 * Drivers should call drm_atomic_helper_commit_cleanup_done() to signal
+	 * completion of this stage.
 	 */
 	struct completion cleanup_done;