| Message ID | 1522752747-7836-2-git-send-email-kevin.rogovin@intel.com (mailing list archive) |
| --- | --- |
| State | New, archived |
Quoting kevin.rogovin@intel.com (2018-04-03 13:52:23) > From: Kevin Rogovin <kevin.rogovin@intel.com> > > Add a narration to i915.rst about Intel GEN GPU's: engines, > driver context and relocation. > > Signed-off-by: Kevin Rogovin <kevin.rogovin@intel.com> I'm still bummed by the long lines in the bulleted list, but regardless: Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Regards, Joonas > --- > Documentation/gpu/i915.rst | 116 ++++++++++++++++++++++++++++++++-------- > drivers/gpu/drm/i915/i915_vma.h | 10 ++-- > 2 files changed, 100 insertions(+), 26 deletions(-) > > diff --git a/Documentation/gpu/i915.rst b/Documentation/gpu/i915.rst > index 41dc881b00dc..00f897f67f85 100644 > --- a/Documentation/gpu/i915.rst > +++ b/Documentation/gpu/i915.rst > @@ -249,6 +249,99 @@ Memory Management and Command Submission > This sections covers all things related to the GEM implementation in the > i915 driver. > > +Intel GPU Basics > +---------------- > + > +An Intel GPU has multiple engines. There are several engine types. > + > +- RCS engine is for rendering 3D and performing compute, this is named `I915_EXEC_RENDER` in user space. > +- BCS is a blitting (copy) engine, this is named `I915_EXEC_BLT` in user space. > +- VCS is a video encode and decode engine, this is named `I915_EXEC_BSD` in user space > +- VECS is video enhancement engine, this is named `I915_EXEC_VEBOX` in user space. > +- The enumeration `I915_EXEC_DEFAULT` does not refer to specific engine; instead it is to be used by user space to specify a default rendering engine (for 3D) that may or may not be the same as RCS. > + > +The Intel GPU family is a family of integrated GPU's using Unified > +Memory Access. For having the GPU "do work", user space will feed the > +GPU batch buffers via one of the ioctls `DRM_IOCTL_I915_GEM_EXECBUFFER2` > +or `DRM_IOCTL_I915_GEM_EXECBUFFER2_WR`. Most such batchbuffers will > +instruct the GPU to perform work (for example rendering) and that work > +needs memory from which to read and memory to which to write. All memory > +is encapsulated within GEM buffer objects (usually created with the ioctl > +`DRM_IOCTL_I915_GEM_CREATE`). An ioctl providing a batchbuffer for the GPU > +to create will also list all GEM buffer objects that the batchbuffer reads > +and/or writes. For implementation details of memory management see > +`GEM BO Management Implementation Details`_. > + > +The i915 driver allows user space to create a context via the ioctl > +`DRM_IOCTL_I915_GEM_CONTEXT_CREATE` which is identified by a 32-bit > +integer. Such a context should be veiwed by user-space as -loosely- > +analogous to the idea of a CPU process of an operating system. The i915 > +driver guarantees that commands issued to a fixed context are to be > +executed so that writes of a previously issued command are seen by > +reads of following commands. Actions issued between different contexts > +(even if from the same file descriptor) are NOT given that guarantee > +and the only way to synchornize across contexts (even from the same > +file descriptor) is through the use of fences. At least as far back as > +Gen4, also have that a context carries with it a GPU HW context; > +the HW context is essentially (most of atleast) the state of a GPU. > +In addition to the ordering gaurantees, the kernel will restore GPU > +state via HW context when commands are issued to a context, this saves > +user space the need to restore (most of atleast) the GPU state at the > +start of each batchbuffer. 
The ioctl `DRM_IOCTL_I915_GEM_CONTEXT_CREATE` > +is used by user space to create a hardware context which is identified > +by a 32-bit integer. The non-deprecated ioctls to submit batchbuffer > +work can pass that ID (in the lower bits of drm_i915_gem_execbuffer2::rsvd1) > +to identify what context to use with the command. > + > +The GPU has its own memory management and address space. The kernel > +driver maintains the memory translation table for the GPU. For older > +GPUs (i.e. those before Gen8), there is a single global such translation > +table, a global Graphics Translation Table (GTT). For newer generation > +GPUs each context has its own translation table, called Per-Process > +Graphics Translation Table (PPGTT). Of important note, is that although > +PPGTT is named per-process it is actually per context. When user space > +submits a batchbuffer, the kernel walks the list of GEM buffer objects > +used by the batchbuffer and guarantees that not only is the memory of > +each such GEM buffer object resident but it is also present in the > +(PP)GTT. If the GEM buffer object is not yet placed in the (PP)GTT, > +then it is given an address. Two consequences of this are: the kernel > +needs to edit the batchbuffer submitted to write the correct value of > +the GPU address when a GEM BO is assigned a GPU address and the kernel > +might evict a different GEM BO from the (PP)GTT to make address room > +for another GEM BO. Consequently, the ioctls submitting a batchbuffer > +for execution also include a list of all locations within buffers that > +refer to GPU-addresses so that the kernel can edit the buffer correctly. > +This process is dubbed relocation. > + > +GEM BO Management Implementation Details > +---------------------------------------- > + > +.. kernel-doc:: drivers/gpu/drm/i915/i915_vma.h > + :doc: Virtual Memory Address > + > +Buffer Object Eviction > +---------------------- > + > +This section documents the interface functions for evicting buffer > +objects to make space available in the virtual gpu address spaces. Note > +that this is mostly orthogonal to shrinking buffer objects caches, which > +has the goal to make main memory (shared with the gpu through the > +unified memory architecture) available. > + > +.. kernel-doc:: drivers/gpu/drm/i915/i915_gem_evict.c > + :internal: > + > +Buffer Object Memory Shrinking > +------------------------------ > + > +This section documents the interface function for shrinking memory usage > +of buffer object caches. Shrinking is used to make main memory > +available. Note that this is mostly orthogonal to evicting buffer > +objects, which has the goal to make space in gpu virtual address spaces. > + > +.. kernel-doc:: drivers/gpu/drm/i915/i915_gem_shrinker.c > + :internal: > + > Batchbuffer Parsing > ------------------- > > @@ -312,29 +405,6 @@ Object Tiling IOCTLs > .. kernel-doc:: drivers/gpu/drm/i915/i915_gem_tiling.c > :doc: buffer object tiling > > -Buffer Object Eviction > ----------------------- > - > -This section documents the interface functions for evicting buffer > -objects to make space available in the virtual gpu address spaces. Note > -that this is mostly orthogonal to shrinking buffer objects caches, which > -has the goal to make main memory (shared with the gpu through the > -unified memory architecture) available. > - > -.. 
kernel-doc:: drivers/gpu/drm/i915/i915_gem_evict.c > - :internal: > - > -Buffer Object Memory Shrinking > ------------------------------- > - > -This section documents the interface function for shrinking memory usage > -of buffer object caches. Shrinking is used to make main memory > -available. Note that this is mostly orthogonal to evicting buffer > -objects, which has the goal to make space in gpu virtual address spaces. > - > -.. kernel-doc:: drivers/gpu/drm/i915/i915_gem_shrinker.c > - :internal: > - > GuC > === > > diff --git a/drivers/gpu/drm/i915/i915_vma.h b/drivers/gpu/drm/i915/i915_vma.h > index 8c5022095418..0000f23a7266 100644 > --- a/drivers/gpu/drm/i915/i915_vma.h > +++ b/drivers/gpu/drm/i915/i915_vma.h > @@ -38,9 +38,13 @@ > enum i915_cache_level; > > /** > - * A VMA represents a GEM BO that is bound into an address space. Therefore, a > - * VMA's presence cannot be guaranteed before binding, or after unbinding the > - * object into/from the address space. > + * DOC: Virtual Memory Address > + * > + * An `i915_vma` struct represents a GEM BO that is bound into an address > + * space. Therefore, a VMA's presence cannot be guaranteed before binding, or > + * after unbinding the object into/from the address space. The struct includes > + * the bookkepping details needed for tracking it in all the lists with which > + * it interacts. > * > * To make things as simple as possible (ie. no refcounting), a VMA's lifetime > * will always be <= an objects lifetime. So object refcounting should cover us. > -- > 2.16.2 >
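To make the creation ioctls named in the patch concrete, here is a minimal user-space sketch of allocating a GEM buffer object with `DRM_IOCTL_I915_GEM_CREATE` and creating a context with `DRM_IOCTL_I915_GEM_CONTEXT_CREATE`. It assumes a libdrm environment and an already-open i915 DRM fd; the function name and error handling are illustrative, not part of the patch.

```c
#include <stdio.h>
#include <string.h>
#include <xf86drm.h>   /* drmIoctl() */
#include <i915_drm.h>  /* i915 uapi structures and ioctl numbers */

static int create_bo_and_context(int fd)
{
	struct drm_i915_gem_create create;
	struct drm_i915_gem_context_create ctx;

	/* Allocate a 4 KiB GEM buffer object; the kernel returns a handle. */
	memset(&create, 0, sizeof(create));
	create.size = 4096;
	if (drmIoctl(fd, DRM_IOCTL_I915_GEM_CREATE, &create))
		return -1;

	/* Create a context; ctx_id is the 32-bit identifier the text describes. */
	memset(&ctx, 0, sizeof(ctx));
	if (drmIoctl(fd, DRM_IOCTL_I915_GEM_CONTEXT_CREATE, &ctx))
		return -1;

	printf("GEM BO handle %u, context id %u\n", create.handle, ctx.ctx_id);
	return 0;
}
```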
On Tue, 03 Apr 2018, Joonas Lahtinen <joonas.lahtinen@linux.intel.com> wrote: > Quoting kevin.rogovin@intel.com (2018-04-03 13:52:23) >> From: Kevin Rogovin <kevin.rogovin@intel.com> >> >> Add a narration to i915.rst about Intel GEN GPU's: engines, >> driver context and relocation. >> >> Signed-off-by: Kevin Rogovin <kevin.rogovin@intel.com> > > I'm still bummed by the long lines in the bulleted list, but regardless: Hum, there's no need to do that. Please reflow. BR, Jani. > > Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> > > Regards, Joonas > >> --- >> Documentation/gpu/i915.rst | 116 ++++++++++++++++++++++++++++++++-------- >> drivers/gpu/drm/i915/i915_vma.h | 10 ++-- >> 2 files changed, 100 insertions(+), 26 deletions(-) >> >> diff --git a/Documentation/gpu/i915.rst b/Documentation/gpu/i915.rst >> index 41dc881b00dc..00f897f67f85 100644 >> --- a/Documentation/gpu/i915.rst >> +++ b/Documentation/gpu/i915.rst >> @@ -249,6 +249,99 @@ Memory Management and Command Submission >> This sections covers all things related to the GEM implementation in the >> i915 driver. >> >> +Intel GPU Basics >> +---------------- >> + >> +An Intel GPU has multiple engines. There are several engine types. >> + >> +- RCS engine is for rendering 3D and performing compute, this is named `I915_EXEC_RENDER` in user space. >> +- BCS is a blitting (copy) engine, this is named `I915_EXEC_BLT` in user space. >> +- VCS is a video encode and decode engine, this is named `I915_EXEC_BSD` in user space >> +- VECS is video enhancement engine, this is named `I915_EXEC_VEBOX` in user space. >> +- The enumeration `I915_EXEC_DEFAULT` does not refer to specific engine; instead it is to be used by user space to specify a default rendering engine (for 3D) that may or may not be the same as RCS. >> + >> +The Intel GPU family is a family of integrated GPU's using Unified >> +Memory Access. For having the GPU "do work", user space will feed the >> +GPU batch buffers via one of the ioctls `DRM_IOCTL_I915_GEM_EXECBUFFER2` >> +or `DRM_IOCTL_I915_GEM_EXECBUFFER2_WR`. Most such batchbuffers will >> +instruct the GPU to perform work (for example rendering) and that work >> +needs memory from which to read and memory to which to write. All memory >> +is encapsulated within GEM buffer objects (usually created with the ioctl >> +`DRM_IOCTL_I915_GEM_CREATE`). An ioctl providing a batchbuffer for the GPU >> +to create will also list all GEM buffer objects that the batchbuffer reads >> +and/or writes. For implementation details of memory management see >> +`GEM BO Management Implementation Details`_. >> + >> +The i915 driver allows user space to create a context via the ioctl >> +`DRM_IOCTL_I915_GEM_CONTEXT_CREATE` which is identified by a 32-bit >> +integer. Such a context should be veiwed by user-space as -loosely- >> +analogous to the idea of a CPU process of an operating system. The i915 >> +driver guarantees that commands issued to a fixed context are to be >> +executed so that writes of a previously issued command are seen by >> +reads of following commands. Actions issued between different contexts >> +(even if from the same file descriptor) are NOT given that guarantee >> +and the only way to synchornize across contexts (even from the same >> +file descriptor) is through the use of fences. At least as far back as >> +Gen4, also have that a context carries with it a GPU HW context; >> +the HW context is essentially (most of atleast) the state of a GPU. 
>> +In addition to the ordering gaurantees, the kernel will restore GPU >> +state via HW context when commands are issued to a context, this saves >> +user space the need to restore (most of atleast) the GPU state at the >> +start of each batchbuffer. The ioctl `DRM_IOCTL_I915_GEM_CONTEXT_CREATE` >> +is used by user space to create a hardware context which is identified >> +by a 32-bit integer. The non-deprecated ioctls to submit batchbuffer >> +work can pass that ID (in the lower bits of drm_i915_gem_execbuffer2::rsvd1) >> +to identify what context to use with the command. >> + >> +The GPU has its own memory management and address space. The kernel >> +driver maintains the memory translation table for the GPU. For older >> +GPUs (i.e. those before Gen8), there is a single global such translation >> +table, a global Graphics Translation Table (GTT). For newer generation >> +GPUs each context has its own translation table, called Per-Process >> +Graphics Translation Table (PPGTT). Of important note, is that although >> +PPGTT is named per-process it is actually per context. When user space >> +submits a batchbuffer, the kernel walks the list of GEM buffer objects >> +used by the batchbuffer and guarantees that not only is the memory of >> +each such GEM buffer object resident but it is also present in the >> +(PP)GTT. If the GEM buffer object is not yet placed in the (PP)GTT, >> +then it is given an address. Two consequences of this are: the kernel >> +needs to edit the batchbuffer submitted to write the correct value of >> +the GPU address when a GEM BO is assigned a GPU address and the kernel >> +might evict a different GEM BO from the (PP)GTT to make address room >> +for another GEM BO. Consequently, the ioctls submitting a batchbuffer >> +for execution also include a list of all locations within buffers that >> +refer to GPU-addresses so that the kernel can edit the buffer correctly. >> +This process is dubbed relocation. >> + >> +GEM BO Management Implementation Details >> +---------------------------------------- >> + >> +.. kernel-doc:: drivers/gpu/drm/i915/i915_vma.h >> + :doc: Virtual Memory Address >> + >> +Buffer Object Eviction >> +---------------------- >> + >> +This section documents the interface functions for evicting buffer >> +objects to make space available in the virtual gpu address spaces. Note >> +that this is mostly orthogonal to shrinking buffer objects caches, which >> +has the goal to make main memory (shared with the gpu through the >> +unified memory architecture) available. >> + >> +.. kernel-doc:: drivers/gpu/drm/i915/i915_gem_evict.c >> + :internal: >> + >> +Buffer Object Memory Shrinking >> +------------------------------ >> + >> +This section documents the interface function for shrinking memory usage >> +of buffer object caches. Shrinking is used to make main memory >> +available. Note that this is mostly orthogonal to evicting buffer >> +objects, which has the goal to make space in gpu virtual address spaces. >> + >> +.. kernel-doc:: drivers/gpu/drm/i915/i915_gem_shrinker.c >> + :internal: >> + >> Batchbuffer Parsing >> ------------------- >> >> @@ -312,29 +405,6 @@ Object Tiling IOCTLs >> .. kernel-doc:: drivers/gpu/drm/i915/i915_gem_tiling.c >> :doc: buffer object tiling >> >> -Buffer Object Eviction >> ----------------------- >> - >> -This section documents the interface functions for evicting buffer >> -objects to make space available in the virtual gpu address spaces. 
Note >> -that this is mostly orthogonal to shrinking buffer objects caches, which >> -has the goal to make main memory (shared with the gpu through the >> -unified memory architecture) available. >> - >> -.. kernel-doc:: drivers/gpu/drm/i915/i915_gem_evict.c >> - :internal: >> - >> -Buffer Object Memory Shrinking >> ------------------------------- >> - >> -This section documents the interface function for shrinking memory usage >> -of buffer object caches. Shrinking is used to make main memory >> -available. Note that this is mostly orthogonal to evicting buffer >> -objects, which has the goal to make space in gpu virtual address spaces. >> - >> -.. kernel-doc:: drivers/gpu/drm/i915/i915_gem_shrinker.c >> - :internal: >> - >> GuC >> === >> >> diff --git a/drivers/gpu/drm/i915/i915_vma.h b/drivers/gpu/drm/i915/i915_vma.h >> index 8c5022095418..0000f23a7266 100644 >> --- a/drivers/gpu/drm/i915/i915_vma.h >> +++ b/drivers/gpu/drm/i915/i915_vma.h >> @@ -38,9 +38,13 @@ >> enum i915_cache_level; >> >> /** >> - * A VMA represents a GEM BO that is bound into an address space. Therefore, a >> - * VMA's presence cannot be guaranteed before binding, or after unbinding the >> - * object into/from the address space. >> + * DOC: Virtual Memory Address >> + * >> + * An `i915_vma` struct represents a GEM BO that is bound into an address >> + * space. Therefore, a VMA's presence cannot be guaranteed before binding, or >> + * after unbinding the object into/from the address space. The struct includes >> + * the bookkepping details needed for tracking it in all the lists with which >> + * it interacts. >> * >> * To make things as simple as possible (ie. no refcounting), a VMA's lifetime >> * will always be <= an objects lifetime. So object refcounting should cover us. >> -- >> 2.16.2 >> > _______________________________________________ > Intel-gfx mailing list > Intel-gfx@lists.freedesktop.org > https://lists.freedesktop.org/mailman/listinfo/intel-gfx
kevin.rogovin@intel.com writes: > From: Kevin Rogovin <kevin.rogovin@intel.com> > > Add a narration to i915.rst about Intel GEN GPU's: engines, > driver context and relocation. > > Signed-off-by: Kevin Rogovin <kevin.rogovin@intel.com> > --- > Documentation/gpu/i915.rst | 116 ++++++++++++++++++++++++++++++++-------- > drivers/gpu/drm/i915/i915_vma.h | 10 ++-- > 2 files changed, 100 insertions(+), 26 deletions(-) > > diff --git a/Documentation/gpu/i915.rst b/Documentation/gpu/i915.rst > index 41dc881b00dc..00f897f67f85 100644 > --- a/Documentation/gpu/i915.rst > +++ b/Documentation/gpu/i915.rst > @@ -249,6 +249,99 @@ Memory Management and Command Submission > This sections covers all things related to the GEM implementation in the > i915 driver. > > +Intel GPU Basics > +---------------- > + > +An Intel GPU has multiple engines. There are several engine types. > + > +- RCS engine is for rendering 3D and performing compute, this is named `I915_EXEC_RENDER` in user space. > +- BCS is a blitting (copy) engine, this is named `I915_EXEC_BLT` in user space. > +- VCS is a video encode and decode engine, this is named `I915_EXEC_BSD` in user space > +- VECS is video enhancement engine, this is named `I915_EXEC_VEBOX` in user space. > +- The enumeration `I915_EXEC_DEFAULT` does not refer to specific engine; instead it is to be used by user space to specify a default rendering engine (for 3D) that may or may not be the same as RCS. > + > +The Intel GPU family is a family of integrated GPU's using Unified > +Memory Access. For having the GPU "do work", user space will feed the > +GPU batch buffers via one of the ioctls `DRM_IOCTL_I915_GEM_EXECBUFFER2` > +or `DRM_IOCTL_I915_GEM_EXECBUFFER2_WR`. Most such batchbuffers will > +instruct the GPU to perform work (for example rendering) and that work > +needs memory from which to read and memory to which to write. All memory > +is encapsulated within GEM buffer objects (usually created with the ioctl > +`DRM_IOCTL_I915_GEM_CREATE`). An ioctl providing a batchbuffer for the GPU > +to create will also list all GEM buffer objects that the batchbuffer reads > +and/or writes. For implementation details of memory management see > +`GEM BO Management Implementation Details`_. > + > +The i915 driver allows user space to create a context via the ioctl > +`DRM_IOCTL_I915_GEM_CONTEXT_CREATE` which is identified by a 32-bit > +integer. Such a context should be veiwed by user-space as -loosely- s/veiwed/viewed > +analogous to the idea of a CPU process of an operating system. The i915 > +driver guarantees that commands issued to a fixed context are to be > +executed so that writes of a previously issued command are seen by > +reads of following commands. Actions issued between different contexts > +(even if from the same file descriptor) are NOT given that guarantee > +and the only way to synchornize across contexts (even from the same > +file descriptor) is through the use of fences. At least as far back as > +Gen4, also have that a context carries with it a GPU HW context; > +the HW context is essentially (most of atleast) the state of a GPU. > +In addition to the ordering gaurantees, the kernel will restore GPU s/gaurantees/guarantees -Mika > +state via HW context when commands are issued to a context, this saves > +user space the need to restore (most of atleast) the GPU state at the > +start of each batchbuffer. The ioctl `DRM_IOCTL_I915_GEM_CONTEXT_CREATE` > +is used by user space to create a hardware context which is identified > +by a 32-bit integer. 
The non-deprecated ioctls to submit batchbuffer > +work can pass that ID (in the lower bits of drm_i915_gem_execbuffer2::rsvd1) > +to identify what context to use with the command. > + > +The GPU has its own memory management and address space. The kernel > +driver maintains the memory translation table for the GPU. For older > +GPUs (i.e. those before Gen8), there is a single global such translation > +table, a global Graphics Translation Table (GTT). For newer generation > +GPUs each context has its own translation table, called Per-Process > +Graphics Translation Table (PPGTT). Of important note, is that although > +PPGTT is named per-process it is actually per context. When user space > +submits a batchbuffer, the kernel walks the list of GEM buffer objects > +used by the batchbuffer and guarantees that not only is the memory of > +each such GEM buffer object resident but it is also present in the > +(PP)GTT. If the GEM buffer object is not yet placed in the (PP)GTT, > +then it is given an address. Two consequences of this are: the kernel > +needs to edit the batchbuffer submitted to write the correct value of > +the GPU address when a GEM BO is assigned a GPU address and the kernel > +might evict a different GEM BO from the (PP)GTT to make address room > +for another GEM BO. Consequently, the ioctls submitting a batchbuffer > +for execution also include a list of all locations within buffers that > +refer to GPU-addresses so that the kernel can edit the buffer correctly. > +This process is dubbed relocation. > + > +GEM BO Management Implementation Details > +---------------------------------------- > + > +.. kernel-doc:: drivers/gpu/drm/i915/i915_vma.h > + :doc: Virtual Memory Address > + > +Buffer Object Eviction > +---------------------- > + > +This section documents the interface functions for evicting buffer > +objects to make space available in the virtual gpu address spaces. Note > +that this is mostly orthogonal to shrinking buffer objects caches, which > +has the goal to make main memory (shared with the gpu through the > +unified memory architecture) available. > + > +.. kernel-doc:: drivers/gpu/drm/i915/i915_gem_evict.c > + :internal: > + > +Buffer Object Memory Shrinking > +------------------------------ > + > +This section documents the interface function for shrinking memory usage > +of buffer object caches. Shrinking is used to make main memory > +available. Note that this is mostly orthogonal to evicting buffer > +objects, which has the goal to make space in gpu virtual address spaces. > + > +.. kernel-doc:: drivers/gpu/drm/i915/i915_gem_shrinker.c > + :internal: > + > Batchbuffer Parsing > ------------------- > > @@ -312,29 +405,6 @@ Object Tiling IOCTLs > .. kernel-doc:: drivers/gpu/drm/i915/i915_gem_tiling.c > :doc: buffer object tiling > > -Buffer Object Eviction > ----------------------- > - > -This section documents the interface functions for evicting buffer > -objects to make space available in the virtual gpu address spaces. Note > -that this is mostly orthogonal to shrinking buffer objects caches, which > -has the goal to make main memory (shared with the gpu through the > -unified memory architecture) available. > - > -.. kernel-doc:: drivers/gpu/drm/i915/i915_gem_evict.c > - :internal: > - > -Buffer Object Memory Shrinking > ------------------------------- > - > -This section documents the interface function for shrinking memory usage > -of buffer object caches. Shrinking is used to make main memory > -available. 
Note that this is mostly orthogonal to evicting buffer > -objects, which has the goal to make space in gpu virtual address spaces. > - > -.. kernel-doc:: drivers/gpu/drm/i915/i915_gem_shrinker.c > - :internal: > - > GuC > === > > diff --git a/drivers/gpu/drm/i915/i915_vma.h b/drivers/gpu/drm/i915/i915_vma.h > index 8c5022095418..0000f23a7266 100644 > --- a/drivers/gpu/drm/i915/i915_vma.h > +++ b/drivers/gpu/drm/i915/i915_vma.h > @@ -38,9 +38,13 @@ > enum i915_cache_level; > > /** > - * A VMA represents a GEM BO that is bound into an address space. Therefore, a > - * VMA's presence cannot be guaranteed before binding, or after unbinding the > - * object into/from the address space. > + * DOC: Virtual Memory Address > + * > + * An `i915_vma` struct represents a GEM BO that is bound into an address > + * space. Therefore, a VMA's presence cannot be guaranteed before binding, or > + * after unbinding the object into/from the address space. The struct includes > + * the bookkepping details needed for tracking it in all the lists with which > + * it interacts. > * > * To make things as simple as possible (ie. no refcounting), a VMA's lifetime > * will always be <= an objects lifetime. So object refcounting should cover us. > -- > 2.16.2 > > _______________________________________________ > Intel-gfx mailing list > Intel-gfx@lists.freedesktop.org > https://lists.freedesktop.org/mailman/listinfo/intel-gfx
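The `rsvd1` detail quoted above lends itself to a short sketch: one batchbuffer submitted on the blitter engine under a given context. It assumes `batch` is a GEM BO handle whose contents are valid commands ending in `MI_BATCH_BUFFER_END`, `len` is their length in bytes, and `ctx_id` came from `DRM_IOCTL_I915_GEM_CONTEXT_CREATE`; all names are illustrative.

```c
#include <stdint.h>
#include <string.h>
#include <xf86drm.h>
#include <i915_drm.h>

static int submit_batch(int fd, uint32_t batch, uint32_t len, uint32_t ctx_id)
{
	struct drm_i915_gem_exec_object2 obj;
	struct drm_i915_gem_execbuffer2 execbuf;

	/* The last entry in the object list is the batchbuffer itself. */
	memset(&obj, 0, sizeof(obj));
	obj.handle = batch;

	memset(&execbuf, 0, sizeof(execbuf));
	execbuf.buffers_ptr = (uintptr_t)&obj;
	execbuf.buffer_count = 1;
	execbuf.batch_len = len;
	execbuf.flags = I915_EXEC_BLT;	/* engine selection, per the bulleted list */

	/* The context ID travels in the lower 32 bits of rsvd1. */
	i915_execbuffer2_set_context_id(execbuf, ctx_id);

	return drmIoctl(fd, DRM_IOCTL_I915_GEM_EXECBUFFER2, &execbuf);
}
```

Passing `I915_EXEC_DEFAULT` instead of `I915_EXEC_BLT` would leave the choice to the default rendering engine, as the bulleted list in the patch notes.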
kevin.rogovin@intel.com writes: > From: Kevin Rogovin <kevin.rogovin@intel.com> > > Add a narration to i915.rst about Intel GEN GPU's: engines, > driver context and relocation. > > Signed-off-by: Kevin Rogovin <kevin.rogovin@intel.com> Few typos pointed on a previous mail with those fixed, Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> > --- > Documentation/gpu/i915.rst | 116 ++++++++++++++++++++++++++++++++-------- > drivers/gpu/drm/i915/i915_vma.h | 10 ++-- > 2 files changed, 100 insertions(+), 26 deletions(-) > > diff --git a/Documentation/gpu/i915.rst b/Documentation/gpu/i915.rst > index 41dc881b00dc..00f897f67f85 100644 > --- a/Documentation/gpu/i915.rst > +++ b/Documentation/gpu/i915.rst > @@ -249,6 +249,99 @@ Memory Management and Command Submission > This sections covers all things related to the GEM implementation in the > i915 driver. > > +Intel GPU Basics > +---------------- > + > +An Intel GPU has multiple engines. There are several engine types. > + > +- RCS engine is for rendering 3D and performing compute, this is named `I915_EXEC_RENDER` in user space. > +- BCS is a blitting (copy) engine, this is named `I915_EXEC_BLT` in user space. > +- VCS is a video encode and decode engine, this is named `I915_EXEC_BSD` in user space > +- VECS is video enhancement engine, this is named `I915_EXEC_VEBOX` in user space. > +- The enumeration `I915_EXEC_DEFAULT` does not refer to specific engine; instead it is to be used by user space to specify a default rendering engine (for 3D) that may or may not be the same as RCS. > + > +The Intel GPU family is a family of integrated GPU's using Unified > +Memory Access. For having the GPU "do work", user space will feed the > +GPU batch buffers via one of the ioctls `DRM_IOCTL_I915_GEM_EXECBUFFER2` > +or `DRM_IOCTL_I915_GEM_EXECBUFFER2_WR`. Most such batchbuffers will > +instruct the GPU to perform work (for example rendering) and that work > +needs memory from which to read and memory to which to write. All memory > +is encapsulated within GEM buffer objects (usually created with the ioctl > +`DRM_IOCTL_I915_GEM_CREATE`). An ioctl providing a batchbuffer for the GPU > +to create will also list all GEM buffer objects that the batchbuffer reads > +and/or writes. For implementation details of memory management see > +`GEM BO Management Implementation Details`_. > + > +The i915 driver allows user space to create a context via the ioctl > +`DRM_IOCTL_I915_GEM_CONTEXT_CREATE` which is identified by a 32-bit > +integer. Such a context should be veiwed by user-space as -loosely- > +analogous to the idea of a CPU process of an operating system. The i915 > +driver guarantees that commands issued to a fixed context are to be > +executed so that writes of a previously issued command are seen by > +reads of following commands. Actions issued between different contexts > +(even if from the same file descriptor) are NOT given that guarantee > +and the only way to synchornize across contexts (even from the same > +file descriptor) is through the use of fences. At least as far back as > +Gen4, also have that a context carries with it a GPU HW context; > +the HW context is essentially (most of atleast) the state of a GPU. > +In addition to the ordering gaurantees, the kernel will restore GPU > +state via HW context when commands are issued to a context, this saves > +user space the need to restore (most of atleast) the GPU state at the > +start of each batchbuffer. 
The ioctl `DRM_IOCTL_I915_GEM_CONTEXT_CREATE` > +is used by user space to create a hardware context which is identified > +by a 32-bit integer. The non-deprecated ioctls to submit batchbuffer > +work can pass that ID (in the lower bits of drm_i915_gem_execbuffer2::rsvd1) > +to identify what context to use with the command. > + > +The GPU has its own memory management and address space. The kernel > +driver maintains the memory translation table for the GPU. For older > +GPUs (i.e. those before Gen8), there is a single global such translation > +table, a global Graphics Translation Table (GTT). For newer generation > +GPUs each context has its own translation table, called Per-Process > +Graphics Translation Table (PPGTT). Of important note, is that although > +PPGTT is named per-process it is actually per context. When user space > +submits a batchbuffer, the kernel walks the list of GEM buffer objects > +used by the batchbuffer and guarantees that not only is the memory of > +each such GEM buffer object resident but it is also present in the > +(PP)GTT. If the GEM buffer object is not yet placed in the (PP)GTT, > +then it is given an address. Two consequences of this are: the kernel > +needs to edit the batchbuffer submitted to write the correct value of > +the GPU address when a GEM BO is assigned a GPU address and the kernel > +might evict a different GEM BO from the (PP)GTT to make address room > +for another GEM BO. Consequently, the ioctls submitting a batchbuffer > +for execution also include a list of all locations within buffers that > +refer to GPU-addresses so that the kernel can edit the buffer correctly. > +This process is dubbed relocation. > + > +GEM BO Management Implementation Details > +---------------------------------------- > + > +.. kernel-doc:: drivers/gpu/drm/i915/i915_vma.h > + :doc: Virtual Memory Address > + > +Buffer Object Eviction > +---------------------- > + > +This section documents the interface functions for evicting buffer > +objects to make space available in the virtual gpu address spaces. Note > +that this is mostly orthogonal to shrinking buffer objects caches, which > +has the goal to make main memory (shared with the gpu through the > +unified memory architecture) available. > + > +.. kernel-doc:: drivers/gpu/drm/i915/i915_gem_evict.c > + :internal: > + > +Buffer Object Memory Shrinking > +------------------------------ > + > +This section documents the interface function for shrinking memory usage > +of buffer object caches. Shrinking is used to make main memory > +available. Note that this is mostly orthogonal to evicting buffer > +objects, which has the goal to make space in gpu virtual address spaces. > + > +.. kernel-doc:: drivers/gpu/drm/i915/i915_gem_shrinker.c > + :internal: > + > Batchbuffer Parsing > ------------------- > > @@ -312,29 +405,6 @@ Object Tiling IOCTLs > .. kernel-doc:: drivers/gpu/drm/i915/i915_gem_tiling.c > :doc: buffer object tiling > > -Buffer Object Eviction > ----------------------- > - > -This section documents the interface functions for evicting buffer > -objects to make space available in the virtual gpu address spaces. Note > -that this is mostly orthogonal to shrinking buffer objects caches, which > -has the goal to make main memory (shared with the gpu through the > -unified memory architecture) available. > - > -.. 
kernel-doc:: drivers/gpu/drm/i915/i915_gem_evict.c > - :internal: > - > -Buffer Object Memory Shrinking > ------------------------------- > - > -This section documents the interface function for shrinking memory usage > -of buffer object caches. Shrinking is used to make main memory > -available. Note that this is mostly orthogonal to evicting buffer > -objects, which has the goal to make space in gpu virtual address spaces. > - > -.. kernel-doc:: drivers/gpu/drm/i915/i915_gem_shrinker.c > - :internal: > - > GuC > === > > diff --git a/drivers/gpu/drm/i915/i915_vma.h b/drivers/gpu/drm/i915/i915_vma.h > index 8c5022095418..0000f23a7266 100644 > --- a/drivers/gpu/drm/i915/i915_vma.h > +++ b/drivers/gpu/drm/i915/i915_vma.h > @@ -38,9 +38,13 @@ > enum i915_cache_level; > > /** > - * A VMA represents a GEM BO that is bound into an address space. Therefore, a > - * VMA's presence cannot be guaranteed before binding, or after unbinding the > - * object into/from the address space. > + * DOC: Virtual Memory Address > + * > + * An `i915_vma` struct represents a GEM BO that is bound into an address > + * space. Therefore, a VMA's presence cannot be guaranteed before binding, or > + * after unbinding the object into/from the address space. The struct includes > + * the bookkepping details needed for tracking it in all the lists with which > + * it interacts. > * > * To make things as simple as possible (ie. no refcounting), a VMA's lifetime > * will always be <= an objects lifetime. So object refcounting should cover us. > -- > 2.16.2 > > _______________________________________________ > Intel-gfx mailing list > Intel-gfx@lists.freedesktop.org > https://lists.freedesktop.org/mailman/listinfo/intel-gfx
On 4/3/2018 3:52 AM, kevin.rogovin@intel.com wrote: > From: Kevin Rogovin <kevin.rogovin@intel.com> > > Add a narration to i915.rst about Intel GEN GPU's: engines, > driver context and relocation. > > Signed-off-by: Kevin Rogovin <kevin.rogovin@intel.com> > --- > Documentation/gpu/i915.rst | 116 ++++++++++++++++++++++++++++++++-------- > drivers/gpu/drm/i915/i915_vma.h | 10 ++-- > 2 files changed, 100 insertions(+), 26 deletions(-) > > diff --git a/Documentation/gpu/i915.rst b/Documentation/gpu/i915.rst > index 41dc881b00dc..00f897f67f85 100644 > --- a/Documentation/gpu/i915.rst > +++ b/Documentation/gpu/i915.rst > @@ -249,6 +249,99 @@ Memory Management and Command Submission > This sections covers all things related to the GEM implementation in the > i915 driver. > > +Intel GPU Basics > +---------------- > + > +An Intel GPU has multiple engines. There are several engine types. > + > +- RCS engine is for rendering 3D and performing compute, this is named `I915_EXEC_RENDER` in user space. > +- BCS is a blitting (copy) engine, this is named `I915_EXEC_BLT` in user space. > +- VCS is a video encode and decode engine, this is named `I915_EXEC_BSD` in user space > +- VECS is video enhancement engine, this is named `I915_EXEC_VEBOX` in user space. > +- The enumeration `I915_EXEC_DEFAULT` does not refer to specific engine; instead it is to be used by user space to specify a default rendering engine (for 3D) that may or may not be the same as RCS. > + > +The Intel GPU family is a family of integrated GPU's using Unified > +Memory Access. For having the GPU "do work", user space will feed the > +GPU batch buffers via one of the ioctls `DRM_IOCTL_I915_GEM_EXECBUFFER2` > +or `DRM_IOCTL_I915_GEM_EXECBUFFER2_WR`. Most such batchbuffers will > +instruct the GPU to perform work (for example rendering) and that work > +needs memory from which to read and memory to which to write. All memory > +is encapsulated within GEM buffer objects (usually created with the ioctl > +`DRM_IOCTL_I915_GEM_CREATE`). An ioctl providing a batchbuffer for the GPU > +to create will also list all GEM buffer objects that the batchbuffer reads > +and/or writes. For implementation details of memory management see > +`GEM BO Management Implementation Details`_. > + > +The i915 driver allows user space to create a context via the ioctl > +`DRM_IOCTL_I915_GEM_CONTEXT_CREATE` which is identified by a 32-bit Context is defined here > +integer. Such a context should be veiwed by user-space as -loosely- > +analogous to the idea of a CPU process of an operating system. The i915 > +driver guarantees that commands issued to a fixed context are to be > +executed so that writes of a previously issued command are seen by > +reads of following commands. Actions issued between different contexts > +(even if from the same file descriptor) are NOT given that guarantee > +and the only way to synchornize across contexts (even from the same s/synchornize/synchronize > +file descriptor) is through the use of fences. At least as far back as > +Gen4, also have that a context carries with it a GPU HW context; > +the HW context is essentially (most of atleast) the state of a GPU. > +In addition to the ordering gaurantees, the kernel will restore GPU > +state via HW context when commands are issued to a context, this saves > +user space the need to restore (most of atleast) the GPU state at the > +start of each batchbuffer. 
The ioctl `DRM_IOCTL_I915_GEM_CONTEXT_CREATE` > +is used by user space to create a hardware context which is identified Duplicate of above definition of context? > +by a 32-bit integer. The non-deprecated ioctls to submit batchbuffer > +work can pass that ID (in the lower bits of drm_i915_gem_execbuffer2::rsvd1) > +to identify what context to use with the command. > + > +The GPU has its own memory management and address space. The kernel > +driver maintains the memory translation table for the GPU. For older > +GPUs (i.e. those before Gen8), there is a single global such translation > +table, a global Graphics Translation Table (GTT). For newer generation > +GPUs each context has its own translation table, called Per-Process > +Graphics Translation Table (PPGTT). Of important note, is that although > +PPGTT is named per-process it is actually per context. When user space > +submits a batchbuffer, the kernel walks the list of GEM buffer objects > +used by the batchbuffer and guarantees that not only is the memory of > +each such GEM buffer object resident but it is also present in the > +(PP)GTT. If the GEM buffer object is not yet placed in the (PP)GTT, > +then it is given an address. Two consequences of this are: the kernel > +needs to edit the batchbuffer submitted to write the correct value of > +the GPU address when a GEM BO is assigned a GPU address and the kernel > +might evict a different GEM BO from the (PP)GTT to make address room > +for another GEM BO. Consequently, the ioctls submitting a batchbuffer > +for execution also include a list of all locations within buffers that > +refer to GPU-addresses so that the kernel can edit the buffer correctly. > +This process is dubbed relocation. > + > +GEM BO Management Implementation Details > +---------------------------------------- > + > +.. kernel-doc:: drivers/gpu/drm/i915/i915_vma.h > + :doc: Virtual Memory Address > + > +Buffer Object Eviction > +---------------------- > + > +This section documents the interface functions for evicting buffer > +objects to make space available in the virtual gpu address spaces. Note > +that this is mostly orthogonal to shrinking buffer objects caches, which > +has the goal to make main memory (shared with the gpu through the > +unified memory architecture) available. > + > +.. kernel-doc:: drivers/gpu/drm/i915/i915_gem_evict.c > + :internal: > + > +Buffer Object Memory Shrinking > +------------------------------ > + > +This section documents the interface function for shrinking memory usage > +of buffer object caches. Shrinking is used to make main memory > +available. Note that this is mostly orthogonal to evicting buffer > +objects, which has the goal to make space in gpu virtual address spaces. > + > +.. kernel-doc:: drivers/gpu/drm/i915/i915_gem_shrinker.c > + :internal: > + > Batchbuffer Parsing > ------------------- > > @@ -312,29 +405,6 @@ Object Tiling IOCTLs > .. kernel-doc:: drivers/gpu/drm/i915/i915_gem_tiling.c > :doc: buffer object tiling > > -Buffer Object Eviction > ----------------------- > - > -This section documents the interface functions for evicting buffer > -objects to make space available in the virtual gpu address spaces. Note > -that this is mostly orthogonal to shrinking buffer objects caches, which > -has the goal to make main memory (shared with the gpu through the > -unified memory architecture) available. > - > -.. 
kernel-doc:: drivers/gpu/drm/i915/i915_gem_evict.c > - :internal: > - > -Buffer Object Memory Shrinking > ------------------------------- > - > -This section documents the interface function for shrinking memory usage > -of buffer object caches. Shrinking is used to make main memory > -available. Note that this is mostly orthogonal to evicting buffer > -objects, which has the goal to make space in gpu virtual address spaces. > - > -.. kernel-doc:: drivers/gpu/drm/i915/i915_gem_shrinker.c > - :internal: > - > GuC > === > > diff --git a/drivers/gpu/drm/i915/i915_vma.h b/drivers/gpu/drm/i915/i915_vma.h > index 8c5022095418..0000f23a7266 100644 > --- a/drivers/gpu/drm/i915/i915_vma.h > +++ b/drivers/gpu/drm/i915/i915_vma.h > @@ -38,9 +38,13 @@ > enum i915_cache_level; > > /** > - * A VMA represents a GEM BO that is bound into an address space. Therefore, a > - * VMA's presence cannot be guaranteed before binding, or after unbinding the > - * object into/from the address space. > + * DOC: Virtual Memory Address > + * > + * An `i915_vma` struct represents a GEM BO that is bound into an address > + * space. Therefore, a VMA's presence cannot be guaranteed before binding, or > + * after unbinding the object into/from the address space. The struct includes > + * the bookkepping details needed for tracking it in all the lists with which > + * it interacts. > * > * To make things as simple as possible (ie. no refcounting), a VMA's lifetime > * will always be <= an objects lifetime. So object refcounting should cover us. >
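As a companion to the relocation paragraph quoted above, here is a sketch of how user space describes one such location to the kernel: the entry says the value at byte `offset` inside the batch must hold the GPU address of `target`. `presumed_offset` is user space's guess; if the kernel agrees, it may skip patching the batch. All names are illustrative.

```c
#include <stdint.h>
#include <string.h>
#include <i915_drm.h>

static void add_reloc(struct drm_i915_gem_exec_object2 *batch_obj,
		      struct drm_i915_gem_relocation_entry *reloc,
		      uint32_t target, uint64_t offset, uint64_t presumed)
{
	memset(reloc, 0, sizeof(*reloc));
	reloc->target_handle = target;		/* BO the patched address points at */
	reloc->offset = offset;			/* where in the batch to patch */
	reloc->presumed_offset = presumed;	/* guessed GPU address of target */
	reloc->read_domains = I915_GEM_DOMAIN_RENDER;
	reloc->write_domain = 0;		/* target is only read here */

	/* Attach the relocation list to the batchbuffer's exec object. */
	batch_obj->relocation_count = 1;
	batch_obj->relocs_ptr = (uintptr_t)reloc;
}
```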
diff --git a/Documentation/gpu/i915.rst b/Documentation/gpu/i915.rst
index 41dc881b00dc..00f897f67f85 100644
--- a/Documentation/gpu/i915.rst
+++ b/Documentation/gpu/i915.rst
@@ -249,6 +249,99 @@ Memory Management and Command Submission
 This sections covers all things related to the GEM implementation in the
 i915 driver.
 
+Intel GPU Basics
+----------------
+
+An Intel GPU has multiple engines. There are several engine types.
+
+- RCS engine is for rendering 3D and performing compute, this is named `I915_EXEC_RENDER` in user space.
+- BCS is a blitting (copy) engine, this is named `I915_EXEC_BLT` in user space.
+- VCS is a video encode and decode engine, this is named `I915_EXEC_BSD` in user space
+- VECS is video enhancement engine, this is named `I915_EXEC_VEBOX` in user space.
+- The enumeration `I915_EXEC_DEFAULT` does not refer to specific engine; instead it is to be used by user space to specify a default rendering engine (for 3D) that may or may not be the same as RCS.
+
+The Intel GPU family is a family of integrated GPU's using Unified
+Memory Access. For having the GPU "do work", user space will feed the
+GPU batch buffers via one of the ioctls `DRM_IOCTL_I915_GEM_EXECBUFFER2`
+or `DRM_IOCTL_I915_GEM_EXECBUFFER2_WR`. Most such batchbuffers will
+instruct the GPU to perform work (for example rendering) and that work
+needs memory from which to read and memory to which to write. All memory
+is encapsulated within GEM buffer objects (usually created with the ioctl
+`DRM_IOCTL_I915_GEM_CREATE`). An ioctl providing a batchbuffer for the GPU
+to create will also list all GEM buffer objects that the batchbuffer reads
+and/or writes. For implementation details of memory management see
+`GEM BO Management Implementation Details`_.
+
+The i915 driver allows user space to create a context via the ioctl
+`DRM_IOCTL_I915_GEM_CONTEXT_CREATE` which is identified by a 32-bit
+integer. Such a context should be veiwed by user-space as -loosely-
+analogous to the idea of a CPU process of an operating system. The i915
+driver guarantees that commands issued to a fixed context are to be
+executed so that writes of a previously issued command are seen by
+reads of following commands. Actions issued between different contexts
+(even if from the same file descriptor) are NOT given that guarantee
+and the only way to synchornize across contexts (even from the same
+file descriptor) is through the use of fences. At least as far back as
+Gen4, also have that a context carries with it a GPU HW context;
+the HW context is essentially (most of atleast) the state of a GPU.
+In addition to the ordering gaurantees, the kernel will restore GPU
+state via HW context when commands are issued to a context, this saves
+user space the need to restore (most of atleast) the GPU state at the
+start of each batchbuffer. The ioctl `DRM_IOCTL_I915_GEM_CONTEXT_CREATE`
+is used by user space to create a hardware context which is identified
+by a 32-bit integer. The non-deprecated ioctls to submit batchbuffer
+work can pass that ID (in the lower bits of drm_i915_gem_execbuffer2::rsvd1)
+to identify what context to use with the command.
+
+The GPU has its own memory management and address space. The kernel
+driver maintains the memory translation table for the GPU. For older
+GPUs (i.e. those before Gen8), there is a single global such translation
+table, a global Graphics Translation Table (GTT). For newer generation
+GPUs each context has its own translation table, called Per-Process
+Graphics Translation Table (PPGTT). Of important note, is that although
+PPGTT is named per-process it is actually per context. When user space
+submits a batchbuffer, the kernel walks the list of GEM buffer objects
+used by the batchbuffer and guarantees that not only is the memory of
+each such GEM buffer object resident but it is also present in the
+(PP)GTT. If the GEM buffer object is not yet placed in the (PP)GTT,
+then it is given an address. Two consequences of this are: the kernel
+needs to edit the batchbuffer submitted to write the correct value of
+the GPU address when a GEM BO is assigned a GPU address and the kernel
+might evict a different GEM BO from the (PP)GTT to make address room
+for another GEM BO. Consequently, the ioctls submitting a batchbuffer
+for execution also include a list of all locations within buffers that
+refer to GPU-addresses so that the kernel can edit the buffer correctly.
+This process is dubbed relocation.
+
+GEM BO Management Implementation Details
+----------------------------------------
+
+.. kernel-doc:: drivers/gpu/drm/i915/i915_vma.h
+   :doc: Virtual Memory Address
+
+Buffer Object Eviction
+----------------------
+
+This section documents the interface functions for evicting buffer
+objects to make space available in the virtual gpu address spaces. Note
+that this is mostly orthogonal to shrinking buffer objects caches, which
+has the goal to make main memory (shared with the gpu through the
+unified memory architecture) available.
+
+.. kernel-doc:: drivers/gpu/drm/i915/i915_gem_evict.c
+   :internal:
+
+Buffer Object Memory Shrinking
+------------------------------
+
+This section documents the interface function for shrinking memory usage
+of buffer object caches. Shrinking is used to make main memory
+available. Note that this is mostly orthogonal to evicting buffer
+objects, which has the goal to make space in gpu virtual address spaces.
+
+.. kernel-doc:: drivers/gpu/drm/i915/i915_gem_shrinker.c
+   :internal:
+
 Batchbuffer Parsing
 -------------------
 
@@ -312,29 +405,6 @@ Object Tiling IOCTLs
 .. kernel-doc:: drivers/gpu/drm/i915/i915_gem_tiling.c
    :doc: buffer object tiling
 
-Buffer Object Eviction
-----------------------
-
-This section documents the interface functions for evicting buffer
-objects to make space available in the virtual gpu address spaces. Note
-that this is mostly orthogonal to shrinking buffer objects caches, which
-has the goal to make main memory (shared with the gpu through the
-unified memory architecture) available.
-
-.. kernel-doc:: drivers/gpu/drm/i915/i915_gem_evict.c
-   :internal:
-
-Buffer Object Memory Shrinking
-------------------------------
-
-This section documents the interface function for shrinking memory usage
-of buffer object caches. Shrinking is used to make main memory
-available. Note that this is mostly orthogonal to evicting buffer
-objects, which has the goal to make space in gpu virtual address spaces.
-
-.. kernel-doc:: drivers/gpu/drm/i915/i915_gem_shrinker.c
-   :internal:
-
 GuC
 ===
 
diff --git a/drivers/gpu/drm/i915/i915_vma.h b/drivers/gpu/drm/i915/i915_vma.h
index 8c5022095418..0000f23a7266 100644
--- a/drivers/gpu/drm/i915/i915_vma.h
+++ b/drivers/gpu/drm/i915/i915_vma.h
@@ -38,9 +38,13 @@
 enum i915_cache_level;
 
 /**
- * A VMA represents a GEM BO that is bound into an address space. Therefore, a
- * VMA's presence cannot be guaranteed before binding, or after unbinding the
- * object into/from the address space.
+ * DOC: Virtual Memory Address
+ *
+ * An `i915_vma` struct represents a GEM BO that is bound into an address
+ * space. Therefore, a VMA's presence cannot be guaranteed before binding, or
+ * after unbinding the object into/from the address space. The struct includes
+ * the bookkepping details needed for tracking it in all the lists with which
+ * it interacts.
  *
  * To make things as simple as possible (ie. no refcounting), a VMA's lifetime
  * will always be <= an objects lifetime. So object refcounting should cover us.
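Finally, the patch states that fences are the only way to order work across contexts. A minimal sketch of that, assuming a kernel with explicit fencing support (`I915_EXEC_FENCE_OUT`/`I915_EXEC_FENCE_IN`, which this patch itself does not cover) and two fully populated `drm_i915_gem_execbuffer2` requests on different contexts:

```c
#include <stdint.h>
#include <unistd.h>
#include <xf86drm.h>
#include <i915_drm.h>

static int chain_across_contexts(int fd,
				 struct drm_i915_gem_execbuffer2 *first,
				 struct drm_i915_gem_execbuffer2 *second)
{
	int fence;

	/* Ask for a sync_file fd that signals when the first batch completes;
	 * the _WR variant is needed so the kernel can write rsvd2 back. */
	first->flags |= I915_EXEC_FENCE_OUT;
	if (drmIoctl(fd, DRM_IOCTL_I915_GEM_EXECBUFFER2_WR, first))
		return -1;
	fence = first->rsvd2 >> 32;	/* out-fence fd, upper 32 bits of rsvd2 */

	/* Make the second batch wait on that fence before it executes. */
	second->flags |= I915_EXEC_FENCE_IN;
	second->rsvd2 = (uint32_t)fence; /* in-fence fd, lower 32 bits of rsvd2 */
	if (drmIoctl(fd, DRM_IOCTL_I915_GEM_EXECBUFFER2, second)) {
		close(fence);
		return -1;
	}

	close(fence);
	return 0;
}
```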