From: kevin.rogovin@intel.com
To: intel-gfx@lists.freedesktop.org, joonas.lahtinen@linux.intel.com
Cc: Kevin Rogovin
Date: Fri, 2 Mar 2018 16:09:21 +0200
Message-Id: <1519999761-6605-2-git-send-email-kevin.rogovin@intel.com>
In-Reply-To: <1519999761-6605-1-git-send-email-kevin.rogovin@intel.com>
Subject: [Intel-gfx] [PATCH v2 1/1] i915: additional GEM documentation
X-Patchwork-Id: 10254741

From: Kevin Rogovin

This patch adds overview documentation for the GEM portion of the i915
kernel driver. In addition, it pulls already-written documentation into
i915.rst.

Signed-off-by: Kevin Rogovin
Reviewed-by: Abdiel Janulgue
---
 Documentation/gpu/i915.rst                 | 194 +++++++++++++++++++++++------
 drivers/gpu/drm/i915/i915_gem_execbuffer.c |   3 +-
 drivers/gpu/drm/i915/i915_vma.h            |  11 +-
 drivers/gpu/drm/i915/intel_lrc.c           |   3 +-
 drivers/gpu/drm/i915/intel_ringbuffer.h    |  64 ++++++++++
 5 files changed, 235 insertions(+), 40 deletions(-)

diff --git a/Documentation/gpu/i915.rst b/Documentation/gpu/i915.rst
index 41dc881b00dc..cd23da2793ec 100644
--- a/Documentation/gpu/i915.rst
+++ b/Documentation/gpu/i915.rst
@@ -13,6 +13,18 @@ Core Driver Infrastructure
 This section covers core driver infrastructure used by both the display
 and the GEM parts of the driver.
 
+Initialization
+--------------
+
+The actual work of initializing the i915 driver is done by
+:c:func:`i915_driver_load`; from this function one can see the key data
+structures (in particular :c:type:`drm_driver` for GEM) that hold the
+entry points into the driver from user space.
+
+.. kernel-doc:: drivers/gpu/drm/i915/i915_drv.c
+   :functions: i915_driver_load
+
+
 Runtime Power Management
 ------------------------
 
@@ -243,32 +255,148 @@ Display PLLs
 .. kernel-doc:: drivers/gpu/drm/i915/intel_dpll_mgr.h
    :internal:
 
-Memory Management and Command Submission
-========================================
+GEM: Memory Management and Command Submission
+=============================================
 
 This sections covers all things related to the GEM implementation in the
 i915 driver.
 
-Batchbuffer Parsing
---------------------
+Intel GPU Basics
+----------------
 
-.. kernel-doc:: drivers/gpu/drm/i915/i915_cmd_parser.c
-   :doc: batch buffer command parser
+An Intel GPU has multiple engines of several types. The user space value
+``I915_EXEC_DEFAULT`` is an alias for the user space value
+``I915_EXEC_RENDER``.
+
+- The RCS engine renders 3D and performs compute; it is named ``I915_EXEC_RENDER`` in user space.
+- BCS is a blitting (copy) engine; it is named ``I915_EXEC_BLT`` in user space.
+- VCS is a video encode and decode engine; it is named ``I915_EXEC_BSD`` in user space.
+- VECS is a video enhancement engine; it is named ``I915_EXEC_VEBOX`` in user space.
+
+The Intel GPU family is a family of integrated GPUs using a Unified Memory
+Architecture. To have the GPU do work, user space feeds it batchbuffers
+via one of the ioctls ``DRM_IOCTL_I915_GEM_EXECBUFFER2`` or
+``DRM_IOCTL_I915_GEM_EXECBUFFER2_WR`` (the ioctl ``DRM_IOCTL_I915_GEM_EXECBUFFER``
+is deprecated). Most such batchbuffers instruct the GPU to perform work
+(for example, rendering), and that work needs memory from which to read and
+to which to write.
+All memory is encapsulated within GEM buffer objects (usually created with the ioctl
+``DRM_IOCTL_I915_GEM_CREATE``). An ioctl providing a batchbuffer for the GPU to execute
+also lists all GEM buffer objects that the batchbuffer reads and/or writes. For
+implementation details of memory management, see `GEM BO Management Implementation Details`_.
+
+A GPU pipeline (most notably for the RCS engine) has a great deal of state
+which user space programs via the contents of a batchbuffer. Starting
+with Gen6 (Sandy Bridge), hardware contexts are supported. A hardware context
+encapsulates GPU pipeline state and other portions of GPU state, and it is much more
+efficient for the GPU to load a hardware context than to re-submit commands
+in a batchbuffer to restore state. In addition, using hardware contexts
+provides much better isolation between user space clients. The ioctl
+``DRM_IOCTL_I915_GEM_CONTEXT_CREATE`` is used by user space to create a hardware context,
+which is identified by a 32-bit integer. The non-deprecated ioctls that submit batchbuffer
+work can pass that ID (in the lower 32 bits of ``drm_i915_gem_execbuffer2::rsvd1``) to
+identify which HW context to use with the command. When the kernel submits the
+batchbuffer to be executed by the GPU, it also instructs the GPU to load the HW
+context prior to executing the contents of the batchbuffer.
+
+The GPU has its own memory management and address space. The kernel driver
+maintains the memory translation table for the GPU. For older GPUs (i.e. those
+before Gen8), there is a single global translation table, the global
+Graphics Translation Table (GTT). For newer generation GPUs, each hardware
+context has its own translation table, called the Per-Process Graphics Translation
+Table (PPGTT). Note that although PPGTT is named per-process, it
+is actually per hardware context.
+When user space submits a batchbuffer, the kernel walks the list of GEM
+buffer objects used by the batchbuffer and guarantees not only that the
+memory of each such GEM buffer object is resident, but also that it is
+present in the (PP)GTT. If a GEM buffer object is not yet placed in
+the (PP)GTT, it is given an address. This has two consequences:
+the kernel may need to edit the submitted batchbuffer to write the correct
+GPU address once a GEM BO has been assigned one, and the kernel might evict
+a different GEM BO from the (PP)GTT to make address room for a GEM BO.
+
+Consequently, the ioctls submitting a batchbuffer for execution also include
+a list of all locations within buffers that refer to GPU addresses, so that the
+kernel can edit the buffers correctly. This process is dubbed relocation. The
+ioctls allow user space to supply a presumed GPU address for each object. If the kernel
+sees that the address provided by user space is correct, then it skips performing
+relocation for that GEM buffer object. In addition, the kernel reports back to
+user space the address to which it relocated each GEM buffer object.
+
+There is also an interface for user space to directly specify the GPU address
+of GEM BOs: the soft-pinning feature, enabled for an object within an execbuffer2 ioctl
+by setting its ``EXEC_OBJECT_PINNED`` flag. If user space also specifies ``I915_EXEC_NO_RELOC``,
+then the kernel performs no relocation and user space manages the address
+space of its PPGTT itself. The advantage of user space handling the address space is
+that the kernel does far less work and user space can safely assume that the
+locations of GEM buffer objects in GPU address space do not change.
+
+GEM BO Management Implementation Details
+----------------------------------------
 
-.. kernel-doc:: drivers/gpu/drm/i915/i915_cmd_parser.c
+.. kernel-doc:: drivers/gpu/drm/i915/i915_vma.h
+   :doc: Virtual Memory Address
+
+Buffer Object Eviction
+~~~~~~~~~~~~~~~~~~~~~~
+
+This section documents the interface functions for evicting buffer
+objects to make space available in the virtual gpu address spaces. Note
+that this is mostly orthogonal to shrinking buffer objects caches, which
+has the goal to make main memory (shared with the gpu through the
+unified memory architecture) available.
+
+.. kernel-doc:: drivers/gpu/drm/i915/i915_gem_evict.c
    :internal:
 
-Batchbuffer Pools
-------------------
+Buffer Object Memory Shrinking
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
-.. kernel-doc:: drivers/gpu/drm/i915/i915_gem_batch_pool.c
-   :doc: batch pool
+This section documents the interface function for shrinking memory usage
+of buffer object caches. Shrinking is used to make main memory
+available. Note that this is mostly orthogonal to evicting buffer
+objects, which has the goal to make space in gpu virtual address spaces.
 
-.. kernel-doc:: drivers/gpu/drm/i915/i915_gem_batch_pool.c
+.. kernel-doc:: drivers/gpu/drm/i915/i915_gem_shrinker.c
    :internal:
+
+Batchbuffer Submission
+----------------------
+
+Depending on the GPU generation, the i915 kernel driver submits batchbuffers
+in one of several ways. However, the top-level logic is shared by all
+methods; see `Common: At the bottom`_ and `Common: Processing requests`_
+for details. In addition, the kernel may filter the contents of batchbuffers
+provided by user space. To that end, the i915 driver has a
+`Command Buffer Parser`_ and a pool from which to allocate buffers that hold
+filtered user space batchbuffers; see `Batchbuffer Pools`_.
+
+Common: At the bottom
+~~~~~~~~~~~~~~~~~~~~~
+
+.. kernel-doc:: drivers/gpu/drm/i915/intel_ringbuffer.h
+   :doc: Ringbuffers to submit batchbuffers
+
+Common: Processing requests
+~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+.. kernel-doc:: drivers/gpu/drm/i915/i915_gem_execbuffer.c
+   :doc: User command execution
+
+Batchbuffer Submission Varieties
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+.. kernel-doc:: drivers/gpu/drm/i915/intel_ringbuffer.h
+   :doc: Batchbuffer Submission Backend
+
+There are two ways to submit batchbuffers to the GPU:
+
+1. Batchbuffers are submitted directly to a ring buffer; this is the most basic way to submit batchbuffers to the GPU and is used by generations strictly before Gen8.
+2. Batchbuffers are submitted via execlists, a feature supported by Gen8 and newer devices; the macro :c:macro:`HAS_EXECLISTS` determines whether a GPU supports submission via execlists (see `Logical Rings, Logical Ring Contexts and Execlists`_).
+
 
 Logical Rings, Logical Ring Contexts and Execlists
----------------------------------------------------
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
 .. kernel-doc:: drivers/gpu/drm/i915/intel_lrc.c
    :doc: Logical Rings, Logical Ring Contexts and Execlists
@@ -276,6 +404,24 @@ Logical Rings, Logical Ring Contexts and Execlists
 .. kernel-doc:: drivers/gpu/drm/i915/intel_lrc.c
    :internal:
 
+Command Buffer Parser
+---------------------
+
+.. kernel-doc:: drivers/gpu/drm/i915/i915_cmd_parser.c
+   :doc: batch buffer command parser
+
+.. kernel-doc:: drivers/gpu/drm/i915/i915_cmd_parser.c
+   :internal:
+
+Batchbuffer Pools
+-----------------
+
+.. kernel-doc:: drivers/gpu/drm/i915/i915_gem_batch_pool.c
+   :doc: batch pool
+
+.. kernel-doc:: drivers/gpu/drm/i915/i915_gem_batch_pool.c
+   :internal:
+
 Global GTT views
 ----------------
 
@@ -312,28 +458,6 @@ Object Tiling IOCTLs
 .. kernel-doc:: drivers/gpu/drm/i915/i915_gem_tiling.c
    :doc: buffer object tiling
 
-Buffer Object Eviction
------------------------
-
-This section documents the interface functions for evicting buffer
-objects to make space available in the virtual gpu address spaces. Note
-that this is mostly orthogonal to shrinking buffer objects caches, which
-has the goal to make main memory (shared with the gpu through the
-unified memory architecture) available.
-
-.. kernel-doc:: drivers/gpu/drm/i915/i915_gem_evict.c
-   :internal:
-
-Buffer Object Memory Shrinking
--------------------------------
-
-This section documents the interface function for shrinking memory usage
-of buffer object caches. Shrinking is used to make main memory
-available. Note that this is mostly orthogonal to evicting buffer
-objects, which has the goal to make space in gpu virtual address spaces.
-
-.. kernel-doc:: drivers/gpu/drm/i915/i915_gem_shrinker.c
-   :internal:
 
 GuC
 ===
diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
index 8c170db8495d..6c8b8e2041f1 100644
--- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
@@ -81,7 +81,8 @@ enum {
  * but this remains just a hint as the kernel may choose a new location for
  * any object in the future.
  *
- * Processing an execbuf ioctl is conceptually split up into a few phases.
+ * Processing an execbuf ioctl is handled by i915_gem_do_execbuffer(), which
+ * conceptually splits the processing into a few phases.
  *
  * 1. Validation - Ensure all the pointers, handles and flags are valid.
  * 2. Reservation - Assign GPU address space for every object
diff --git a/drivers/gpu/drm/i915/i915_vma.h b/drivers/gpu/drm/i915/i915_vma.h
index 8c5022095418..d0feb4f9e326 100644
--- a/drivers/gpu/drm/i915/i915_vma.h
+++ b/drivers/gpu/drm/i915/i915_vma.h
@@ -38,13 +38,18 @@ enum i915_cache_level;
 
 
 /**
- * A VMA represents a GEM BO that is bound into an address space. Therefore, a
- * VMA's presence cannot be guaranteed before binding, or after unbinding the
- * object into/from the address space.
+ * DOC: Virtual Memory Address
+ *
+ * An ``i915_vma`` struct represents a GEM BO that is bound into an
+ * address space.
+ * Therefore, a VMA's presence cannot be guaranteed before binding, or after
+ * unbinding the object into/from the address space. The struct includes the
+ * bookkeeping details needed to track it in all the lists it interacts with.
  *
  * To make things as simple as possible (ie. no refcounting), a VMA's lifetime
  * will always be <= an objects lifetime. So object refcounting should cover us.
  */
+
 struct i915_vma {
 	struct drm_mm_node node;
 	struct drm_i915_gem_object *obj;
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 14288743909f..bc4943333090 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -34,7 +34,8 @@
  * Motivation:
  * GEN8 brings an expansion of the HW contexts: "Logical Ring Contexts".
  * These expanded contexts enable a number of new abilities, especially
- * "Execlists" (also implemented in this file).
+ * "Execlists" (also implemented in this file,
+ * drivers/gpu/drm/i915/intel_lrc.c).
  *
  * One of the main differences with the legacy HW contexts is that logical
  * ring contexts incorporate many more things to the context's state, like
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
index bbacf4d0f4cb..390f63479565 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.h
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
@@ -300,6 +300,70 @@ struct intel_engine_execlists {
 
 #define INTEL_ENGINE_CS_MAX_NAME 8
 
+/**
+ * DOC: Ringbuffers to submit batchbuffers
+ *
+ * At the lowest level, submitting work to a GPU engine means adding commands
+ * to a ringbuffer. A ringbuffer in the kernel driver is essentially a location
+ * from which the GPU reads its next command.
+ * To avoid copying a batchbuffer in order to submit it, the GPU has native
+ * hardware support for executing commands that reside in another buffer;
+ * the command to do so is a batchbuffer start, which the i915 driver uses
+ * instead of copying batchbuffers into the ringbuffer. At the very bottom
+ * of the stack, the i915 driver adds the following to submit a batchbuffer:
+ *
+ * 1. Add a batchbuffer start command to the ringbuffer.
+ *    The start command is essentially a token together with the GPU
+ *    address of the batchbuffer to be executed.
+ *
+ * 2. Add a pipeline flush to the ring buffer.
+ *    This is accomplished by the ``emit_flush`` function pointer.
+ *
+ * 3. Add a register write command to the ring buffer.
+ *    This register write stores the request ID,
+ *    ``i915_request::global_seqno``; the i915 kernel driver uses
+ *    the value in the register to know which requests have completed.
+ *
+ * 4. Add a user interrupt command to the ringbuffer.
+ *    This command instructs the GPU to issue an interrupt
+ *    when the command (and the pipeline flush) have completed.
+ */
+
+/**
+ * DOC: Batchbuffer Submission Backend
+ *
+ * The core logic of submitting a batchbuffer for the GPU to execute
+ * is shared across all engines for all GPU generations. Through the use
+ * of function pointers, submission can be customized for different GPU
+ * capabilities. ``struct intel_engine_cs`` has the following member
+ * function pointers, which serve the following purposes during batchbuffer
+ * submission:
+ *
+ * - context_pin
+ *     pins the context and also returns the ``intel_ringbuffer``
+ *     to write to in order to submit a batchbuffer.
+ *
+ * - request_alloc
+ *     is used to reserve space in an ``intel_ringbuffer``
+ *     for submitting a batchbuffer to the GPU.
+ *
+ * - emit_flush
+ *     writes a pipeline flush command to the ring buffer.
+ *
+ * - emit_bb_start
+ *     writes the batchbuffer start command to the ring buffer.
+ *
+ * - emit_breadcrumb
+ *     writes to the ring buffer both the register write of the
+ *     request ID (``i915_request::global_seqno``) and the command to
+ *     issue an interrupt.
+ *
+ * - submit_request
+ *     See the comment on this member in ``intel_engine_cs``, declared
+ *     in intel_ringbuffer.h.
+ *
+ */
+
 struct intel_engine_cs {
 	struct drm_i915_private *i915;
 	char name[INTEL_ENGINE_CS_MAX_NAME];