[1/1,RFC] drivers/gpu/drm/i915: Documentation for batchbuffer submission

Message ID 1518789862-4006-2-git-send-email-kevin.rogovin@intel.com (mailing list archive)
State New, archived

Commit Message

kevin.rogovin@intel.com Feb. 16, 2018, 2:04 p.m. UTC
From: Kevin Rogovin <kevin.rogovin@intel.com>

Signed-off-by: Kevin Rogovin <kevin.rogovin@intel.com>
---
 Documentation/gpu/i915.rst                 | 109 +++++++++++++++++++++++++++++
 drivers/gpu/drm/i915/i915_gem_execbuffer.c |  10 +++
 2 files changed, 119 insertions(+)

Comments

Joonas Lahtinen Feb. 19, 2018, 1:38 p.m. UTC | #1
These documentation improvements are very welcome; here are a few
comments from me.

Quoting kevin.rogovin@intel.com (2018-02-16 16:04:22)
> +Intel GPU Basics
> +----------------
> +
> +An Intel GPU has multiple engines. There are several engine types.
> +
> +- The RCS engine is for rendering 3D and performing compute; this is named `I915_EXEC_DEFAULT` in user space.

I'd call out the existence of I915_EXEC_RENDER here and introduce I915_EXEC_DEFAULT
on its own line.
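
For reference, the ring selection bits of the execbuffer2 flags field are defined
in include/uapi/drm/i915_drm.h roughly as below (paraphrased from the uapi header,
not part of the patch); I915_EXEC_DEFAULT is simply the value zero, which the
driver resolves to the render engine:

/* Engine selection bits of drm_i915_gem_execbuffer2::flags */
#define I915_EXEC_DEFAULT    (0 << 0)  /* "let the kernel pick", i.e. RCS */
#define I915_EXEC_RENDER     (1 << 0)  /* RCS */
#define I915_EXEC_BSD        (2 << 0)  /* VCS */
#define I915_EXEC_BLT        (3 << 0)  /* BCS */
#define I915_EXEC_VEBOX      (4 << 0)  /* VECS */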

> +- BCS is a blitting (copy) engine; this is named `I915_EXEC_BLT` in user space.
> +- VCS is a video encode and decode engine; this is named `I915_EXEC_BSD` in user space.
> +- VECS is a video enhancement engine; this is named `I915_EXEC_VEBOX` in user space.
> +
> +The Intel GPU family is a family of integrated GPUs using Unified Memory
> +Access. To have the GPU "do work", user space feeds the GPU batchbuffers
> +via one of the ioctls `DRM_IOCTL_I915_GEM_EXECBUFFER`, `DRM_IOCTL_I915_GEM_EXECBUFFER2`
> +or `DRM_IOCTL_I915_GEM_EXECBUFFER2_WR`. Most such batchbuffers will instruct the

I'd also call out DRM_IOCTL_I915_GEM_EXECBUFFER as the legacy submission
method and primarily mention I915_GEM_EXECBUFFER2_WR.
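
As a rough illustration of the execbuffer2 path (a minimal user-space sketch, not
part of the patch; the helper name submit_batch() and its arguments are assumptions,
and error handling, buffer creation and batch contents are omitted):

#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <drm/i915_drm.h>

/* 'fd' is an open i915 DRM file descriptor and 'batch_handle' is a GEM
 * handle whose contents end with MI_BATCH_BUFFER_END.
 */
static int submit_batch(int fd, uint32_t batch_handle, uint32_t batch_len)
{
        struct drm_i915_gem_exec_object2 obj = { .handle = batch_handle };
        struct drm_i915_gem_execbuffer2 execbuf;

        memset(&execbuf, 0, sizeof(execbuf));
        execbuf.buffers_ptr = (uintptr_t)&obj;  /* list of GEM BOs; the last one is the batch */
        execbuf.buffer_count = 1;
        execbuf.batch_len = batch_len;
        execbuf.flags = I915_EXEC_RENDER;       /* target the RCS engine */

        return ioctl(fd, DRM_IOCTL_I915_GEM_EXECBUFFER2_WR, &execbuf);
}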

> +GPU to perform work (for example rendering) and that work needs memory from
> +which to read and memory to which to write. All memory is encapsulated within
> +GEM buffer objects (usually created with the ioctl DRM_IOCTL_I915_GEM_CREATE).
> +An ioctl providing a batchbuffer for the GPU to execute will also list all GEM
> +buffer objects that the batchbuffer reads and/or writes.
> +


In chronological order, maybe first introduce the hardware contexts?
Only then go to PPGTT.
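
A hardware context is created with its own ioctl and then named in the execbuffer2
call; a minimal sketch (the helper names are illustrative, not from the patch) might
look like:

#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <drm/i915_drm.h>

static uint32_t create_hw_context(int fd)
{
        struct drm_i915_gem_context_create create;

        memset(&create, 0, sizeof(create));
        if (ioctl(fd, DRM_IOCTL_I915_GEM_CONTEXT_CREATE, &create))
                return 0;       /* 0 is the id of the always-present default context */

        return create.ctx_id;
}

/* Direct a submission at the context: the id travels in execbuffer2::rsvd1. */
static void set_context(struct drm_i915_gem_execbuffer2 *execbuf, uint32_t ctx_id)
{
        i915_execbuffer2_set_context_id(*execbuf, ctx_id);
}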

> +The GPU has its own memory management and address space. The kernel driver
> +maintains the memory translation table for the GPU. For older GPUs (i.e. those
> +before Gen8), there is a single such global translation table, the global
> +Graphics Translation Table (GTT). For newer generation GPUs each hardware
> +context has its own translation table, called Per-Process Graphics Translation
> +Table (PPGTT). Of important note is that although PPGTT is named per-process, it
> +is actually per hardware context. When user space submits a batchbuffer, the kernel
> +walks the list of GEM buffer objects used by the batchbuffer and guarantees
> +that not only is the memory of each such GEM buffer object resident but it is
> +also present in the (PP)GTT. If the GEM buffer object is not yet placed in
> +the (PP)GTT, then it is given an address. Two consequences of this are that
> +the kernel needs to edit the submitted batchbuffer to write the correct
> +GPU address once a GEM BO has been assigned one, and that the kernel
> +might evict a different GEM BO from the (PP)GTT to make room for
> +another GEM BO.
> +
> +Consequently, the ioctls submitting a batchbuffer for execution also include
> +a list of all locations within buffers that refer to GPU-addresses so that the
> +kernel can edit the buffer correctly. This process is dubbed relocation. The
> +ioctls allow user space to provide what it believes the GPU address will be. If the
> +kernel sees that the address provided by user space is correct, then it skips
> +relocation for that GEM buffer object. In addition, the ioctls report back to what
> +addresses the kernel relocated each GEM buffer object.
> +
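
To illustrate the relocation list mentioned above (a sketch only; the helper name
and its arguments are assumptions, not part of the patch):

#include <stdint.h>
#include <string.h>
#include <drm/i915_drm.h>

/* One entry of the per-object relocation list handed to execbuffer2:
 * 'offset' says where inside the object the GPU address must be written,
 * 'target_handle' names the GEM BO whose address it is, and
 * 'presumed_offset' is user space's guess that lets the kernel skip the fixup.
 */
static void fill_reloc(struct drm_i915_gem_relocation_entry *reloc,
                       uint32_t target_handle, uint64_t offset_in_object,
                       uint64_t presumed_gpu_addr)
{
        memset(reloc, 0, sizeof(*reloc));
        reloc->target_handle = target_handle;
        reloc->offset = offset_in_object;
        reloc->presumed_offset = presumed_gpu_addr;
        reloc->read_domains = I915_GEM_DOMAIN_RENDER;
}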
> +There is also an interface for user space to directly specify the address location
> +of GEM BOs: a feature called soft-pinning, activated within an execbuffer2
> +ioctl by setting the EXEC_OBJECT_PINNED bit. If user space also specifies I915_EXEC_NO_RELOC,
> +then the kernel performs no relocation and user space manages the address
> +space of its PPGTT itself. The advantage of user space handling the address space is
> +that the kernel does far less work and user space can safely assume that
> +the locations of GEM buffer objects in GPU address space do not change.
> +
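
The soft-pinning flags described above would be used along these lines (a sketch,
assuming a full-PPGTT kernel; the helper and variable names are illustrative):

#include <stdint.h>
#include <drm/i915_drm.h>

/* Pin one object at an address chosen by user space and tell the kernel
 * that no relocation is needed for this submission.
 */
static void soft_pin(struct drm_i915_gem_exec_object2 *obj,
                     struct drm_i915_gem_execbuffer2 *execbuf,
                     uint64_t gpu_addr)
{
        obj->offset = gpu_addr;                 /* user-space managed GPU address */
        obj->flags |= EXEC_OBJECT_PINNED;       /* object must stay at 'offset' */
        execbuf->flags |= I915_EXEC_NO_RELOC;   /* kernel performs no relocation */
}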
> +Starting in Gen6, Intel GPUs support hardware contexts. A GPU hardware context
> +represents GPU state that can be saved and restored. When user space uses a hardware
> +context, it does not need to restore the GPU state at the start of each batchbuffer
> +because the kernel directs the GPU to load the state from the hardware context.
> +Hardware contexts allow for much greater isolation between processes that use the GPU.
> +
> +Batchbuffer Submission
> +----------------------
> +
> +Depending on GPU generation, the i915 kernel driver will submit batchbuffers
> +in one of several ways. However, the top-level logic is shared by all
> +methods. The key function, i915_gem_do_execbuffer(), essentially converts
> +the ioctl command to an internal data structure which is then added to a queue
> +that is processed elsewhere to hand the job to the GPU; the details of
> +i915_gem_do_execbuffer() are covered in `Common Code`_.
> +
> +
> +Common Code
> +~~~~~~~~~~~
> +
> +.. kernel-doc:: drivers/gpu/drm/i915/i915_gem_execbuffer.c
> +   :doc: User command execution
> +
> +.. kernel-doc:: drivers/gpu/drm/i915/i915_gem_execbuffer.c
> +   :functions: i915_gem_do_execbuffer

I'm not sure about referring to internal functions as they're bound to
change often. No strong feeling on this, I just see this will be easy to
miss when changing the related code.

> +
> +Batchbuffer Submission Varieties 
> +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> +
> +As stated before, there are several varieties of how batchbuffers are submitted to the GPU;
> +which one is in use is controlled by the function pointer values in the C struct intel_engine_cs
> +(defined in drivers/gpu/drm/i915/intel_ringbuffer.h):
> +
> +- request_alloc
> +- submit_request

Same here. Due to this being in a separate file, I'm not sure if this level
of detail is going to be kept up when changing the actual code?
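
For orientation, the two hooks above have roughly this shape in intel_ringbuffer.h
of this era (paraphrased, not a verbatim excerpt; see the header for the actual
declarations):

struct drm_i915_gem_request;

/* Excerpt-style sketch: each submission backend (legacy ring buffer,
 * execlists, GuC) installs its own implementations of these members
 * of struct intel_engine_cs.
 */
struct intel_engine_cs_excerpt {
        int  (*request_alloc)(struct drm_i915_gem_request *req);
        void (*submit_request)(struct drm_i915_gem_request *req);
        /* ... many other members elided ... */
};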

> +
> +The three varieties of submitting batchbuffers to the GPU are the following.
> +
> +1. Batchbuffers are submitted directly to a ring buffer; this is the most basic way to submit batchbuffers to the GPU and is used for generations strictly before Gen8. When batchbuffers are submitted this older way, their contents are checked via Batchbuffer Parsing, see `Batchbuffer Parsing`_.

Just for editing and reading pleasure, there must be a way of cutting
long lines in lists.

But more importantly, do refer to Command Parser/Parsing as the code uses
cmd parser aka. command parser extensively.

Regards, Joonas
kevin.rogovin@intel.com Feb. 27, 2018, 6:58 a.m. UTC | #2
Hi,

> These documentation improvements are very welcome; here are a few comments from me.

Thank you!

> Quoting kevin.rogovin@intel.com (2018-02-16 16:04:22)
>> +Intel GPU Basics
>> +----------------
>> +
>> +An Intel GPU has multiple engines. There are several engine types.
>> +
>> +- The RCS engine is for rendering 3D and performing compute; this is named `I915_EXEC_DEFAULT` in user space.
>
> I'd call out the existence of I915_EXEC_RENDER here and introduce I915_EXEC_DEFAULT
> on its own line.

I agree; though what exactly is I915_EXEC_DEFAULT supposed to mean? It appears (to me) to be just an alias
for I915_EXEC_RENDER.

>> +- BCS is a blitting (copy) engine; this is named `I915_EXEC_BLT` in user space.
>> +- VCS is a video encode and decode engine; this is named `I915_EXEC_BSD` in user space.
>> +- VECS is a video enhancement engine; this is named `I915_EXEC_VEBOX` in user space.
>> +
>> +The Intel GPU family is a family of integrated GPUs using Unified
>> +Memory Access. To have the GPU "do work", user space feeds the
>> +GPU batchbuffers via one of the ioctls
>> +`DRM_IOCTL_I915_GEM_EXECBUFFER`, `DRM_IOCTL_I915_GEM_EXECBUFFER2` or
>> +`DRM_IOCTL_I915_GEM_EXECBUFFER2_WR`. Most such batchbuffers will
>> +instruct the
>
> I'd also call out DRM_IOCTL_I915_GEM_EXECBUFFER as the legacy submission
> method and primarily mention I915_GEM_EXECBUFFER2_WR.

Agreed.

>> +GPU to perform work (for example rendering) and that work needs
>> +memory from which to read and memory to which to write. All memory is
>> +encapsulated within GEM buffer objects (usually created with the ioctl DRM_IOCTL_I915_GEM_CREATE).
>> +An ioctl providing a batchbuffer for the GPU to execute will also list
>> +all GEM buffer objects that the batchbuffer reads and/or writes.
>> +
>
> In chronological order, maybe first introduce the hardware contexts?
> Only then go to PPGTT.

Sure.

- snip (quoting patch) -

>> +Common Code
>> +~~~~~~~~~~~
>> +
>> +.. kernel-doc:: drivers/gpu/drm/i915/i915_gem_execbuffer.c
>> +   :doc: User command execution
>> +
>> +.. kernel-doc:: drivers/gpu/drm/i915/i915_gem_execbuffer.c
>> +   :functions: i915_gem_do_execbuffer
>
> I'm not sure about referring to internal functions as they're bound to change often. No strong feeling on this, I just see this will be easy to miss when changing the related code.

I can place the text in the source file itself and have the .rst reference it; does this sound good?
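
That would look roughly like a kernel-doc DOC: block kept next to the code and
pulled in from the .rst (the section title below is only an example):

/**
 * DOC: Batchbuffer submission varieties
 *
 * Overview text lives here, next to the code it describes, and is pulled
 * into Documentation/gpu/i915.rst with a ".. kernel-doc::" directive
 * using ":doc: Batchbuffer submission varieties".
 */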

>> +
>> +Batchbuffer Submission Varieties
>> +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>> +
>> +As stated before, there are several varieties of how batchbuffers are
>> +submitted to the GPU; which one is in use is controlled by the function
>> +pointer values in the C struct intel_engine_cs (defined in
>> +drivers/gpu/drm/i915/intel_ringbuffer.h):
>> +
>> +- request_alloc
>> +- submit_request
>
> Same here. Due to this being in a separate file, I'm not sure if this level of detail is going to be kept up when changing the actual code?

I can place the text in the source file as well; one of the goals I have is that the documentation has sufficient
detail that a newcomer can not only get an idea of what the driver is doing, but also of where the code is located.

>> +
>> +The three varieties of submitting batchbuffers to the GPU are the following.
>> +
>> +1. Batchbuffers are submitted directly to a ring buffer; this is the most basic way to submit batchbuffers to the GPU and is used for generations strictly before Gen8. When batchbuffers are submitted this older way, their contents are checked via Batchbuffer Parsing, see `Batchbuffer Parsing`_.
>
> Just for editing and reading pleasure, there must be a way of cutting long lines in lists.

This was something I had trouble with; when I cut the long lines, the produced HTML output would not
render the items as a list. I tried each of the following to cut the long lines:

  1. just cut the long lines to multiple lines

  2. add a \ between line-breaks

  3. cut the long line and add white space to force alignment

Each of these resulted in just one item instead of a list. Any advice/pointers are greatly appreciated.

> But more importantly, do refer to Command Parser/Parsing as the code uses cmd parser aka. command parser extensively.

The item (3) has a link to the section "Batchbuffer Parsing" whose contents are the information on the command parser. Would you
like me to rename the section "Batchbuffer Parsing" to "Command Parser" as well?

And again, thank you for the comments (as this is just an RFC); I will be posting more (hopefully) by the end of the week with
all feedback taken into account and additional content (namely how/when those various function pointers are called, the entire
IRQ dance for submission, and other details such as GuC submission).

-Kevin

Patch

diff --git a/Documentation/gpu/i915.rst b/Documentation/gpu/i915.rst
index 41dc881b00dc..36b3ade85839 100644
--- a/Documentation/gpu/i915.rst
+++ b/Documentation/gpu/i915.rst
@@ -13,6 +13,18 @@  Core Driver Infrastructure
 This section covers core driver infrastructure used by both the display
 and the GEM parts of the driver.
 
+Initialization
+--------------
+
+The real action of initialization for the i915 driver is handled by
+:c:func:`i915_driver_load`; from this function one can see the key
+data (in particular :c:struct:`drm_driver` for GEM) and the entry points
+to the driver from user space.
+
+.. kernel-doc:: drivers/gpu/drm/i915/i915_drv.c
+   :functions: i915_driver_load
+
+
 Runtime Power Management
 ------------------------
 
@@ -249,6 +261,102 @@  Memory Management and Command Submission
 This sections covers all things related to the GEM implementation in the
 i915 driver.
 
+Intel GPU Basics
+----------------
+
+An Intel GPU has multiple engines. There are several engine types.
+
+- The RCS engine is for rendering 3D and performing compute; this is named `I915_EXEC_DEFAULT` in user space.
+- BCS is a blitting (copy) engine; this is named `I915_EXEC_BLT` in user space.
+- VCS is a video encode and decode engine; this is named `I915_EXEC_BSD` in user space.
+- VECS is a video enhancement engine; this is named `I915_EXEC_VEBOX` in user space.
+
+The Intel GPU family is a family of integrated GPUs using Unified Memory
+Access. To have the GPU "do work", user space feeds the GPU batchbuffers
+via one of the ioctls `DRM_IOCTL_I915_GEM_EXECBUFFER`, `DRM_IOCTL_I915_GEM_EXECBUFFER2`
+or `DRM_IOCTL_I915_GEM_EXECBUFFER2_WR`. Most such batchbuffers will instruct the
+GPU to perform work (for example rendering) and that work needs memory from
+which to read and memory to which to write. All memory is encapsulated within
+GEM buffer objects (usually created with the ioctl DRM_IOCTL_I915_GEM_CREATE).
+An ioctl providing a batchbuffer for the GPU to execute will also list all GEM
+buffer objects that the batchbuffer reads and/or writes.
+
+The GPU has its own memory management and address space. The kernel driver
+maintains the memory translation table for the GPU. For older GPUs (i.e. those
+before Gen8), there is a single such global translation table, the global
+Graphics Translation Table (GTT). For newer generation GPUs each hardware
+context has its own translation table, called Per-Process Graphics Translation
+Table (PPGTT). Of important note is that although PPGTT is named per-process, it
+is actually per hardware context. When user space submits a batchbuffer, the kernel
+walks the list of GEM buffer objects used by the batchbuffer and guarantees
+that not only is the memory of each such GEM buffer object resident but it is
+also present in the (PP)GTT. If the GEM buffer object is not yet placed in
+the (PP)GTT, then it is given an address. Two consequences of this are that
+the kernel needs to edit the submitted batchbuffer to write the correct
+GPU address once a GEM BO has been assigned one, and that the kernel
+might evict a different GEM BO from the (PP)GTT to make room for
+another GEM BO.
+
+Consequently, the ioctls submitting a batchbuffer for execution also include
+a list of all locations within buffers that refer to GPU-addresses so that the
+kernel can edit the buffer correctly. This process is dubbed relocation. The
+ioctls allow user space to provide what it believes the GPU address will be. If the
+kernel sees that the address provided by user space is correct, then it skips
+relocation for that GEM buffer object. In addition, the ioctls report back to what
+addresses the kernel relocated each GEM buffer object.
+
+There is also an interface for user space to directly specify the address location
+of GEM BOs: a feature called soft-pinning, activated within an execbuffer2
+ioctl by setting the EXEC_OBJECT_PINNED bit. If user space also specifies I915_EXEC_NO_RELOC,
+then the kernel performs no relocation and user space manages the address
+space of its PPGTT itself. The advantage of user space handling the address space is
+that the kernel does far less work and user space can safely assume that
+the locations of GEM buffer objects in GPU address space do not change.
+
+Starting in Gen6, Intel GPUs support hardware contexts. A GPU hardware context
+represents GPU state that can be saved and restored. When user space uses a hardware
+context, it does not need to restore the GPU state at the start of each batchbuffer
+because the kernel directs the GPU to load the state from the hardware context.
+Hardware contexts allow for much greater isolation between processes that use the GPU.
+
+Batchbuffer Submission
+----------------------
+
+Depending on GPU generation, the i915 kernel driver will submit batchbuffers
+in one of several ways. However, the top-level logic is shared by all
+methods. The key function, i915_gem_do_execbuffer(), essentially converts
+the ioctl command to an internal data structure which is then added to a queue
+that is processed elsewhere to hand the job to the GPU; the details of
+i915_gem_do_execbuffer() are covered in `Common Code`_.
+
+
+Common Code
+~~~~~~~~~~~
+
+.. kernel-doc:: drivers/gpu/drm/i915/i915_gem_execbuffer.c
+   :doc: User command execution
+
+.. kernel-doc:: drivers/gpu/drm/i915/i915_gem_execbuffer.c
+   :functions: i915_gem_do_execbuffer
+
+Batchbuffer Submission Varieties 
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+As stated before, there are several varieties of how batchbuffers are submitted to the GPU;
+which one is in use is controlled by the function pointer values in the C struct intel_engine_cs
+(defined in drivers/gpu/drm/i915/intel_ringbuffer.h):
+
+- request_alloc
+- submit_request
+
+The three varieties of submitting batchbuffers to the GPU are the following.
+
+1. Batchbuffers are submitted directly to a ring buffer; this is the most basic way to submit batchbuffers to the GPU and is used for generations strictly before Gen8. When batchbuffers are submitted this older way, their contents are checked via Batchbuffer Parsing, see `Batchbuffer Parsing`_.
+2. Batchbuffers are submitted via execlists, a feature supported by Gen8 and newer devices; the macro :c:macro:`HAS_EXECLISTS` is used to determine if a GPU supports submitting via execlists, see `Logical Rings, Logical Ring Contexts and Execlists`_.
+3. Batchbuffers are submitted to the GuC, see `GuC`_.
+
+
+
 Batchbuffer Parsing
 -------------------
 
@@ -266,6 +374,7 @@  Batchbuffer Pools
 
 .. kernel-doc:: drivers/gpu/drm/i915/i915_gem_batch_pool.c
    :internal:
 
 Logical Rings, Logical Ring Contexts and Execlists
 --------------------------------------------------
diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
index b15305f2fb76..4a22ae86ceb3 100644
--- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
@@ -2178,6 +2178,16 @@  signal_fence_array(struct i915_execbuffer *eb,
 	}
 }
 
+/**
+ * i915_gem_do_execbuffer() - Batchbuffer submission common implementation
+ *
+ * All ioctls for submitting a batchbuffer reduce to this function.
+ * This function places the batchbuffer to be executed on a submission
+ * queue which will later (via an interrupt calling into the i915 driver)
+ * send the batchbuffer to the GPU.
+ *
+ * Return: 0 on success, error code on failure
+ */
 static int
 i915_gem_do_execbuffer(struct drm_device *dev,
 		       struct drm_file *file,