[v4,2/7] drm/i915: Use batch pools with the command parser

Message ID	1415398927-16572-3-git-send-email-bradley.d.volkin@intel.com (mailing list archive)
State	New, archived
Headers	show Return-Path: <intel-gfx-bounces@lists.freedesktop.org> From: bradley.d.volkin@intel.com To: intel-gfx@lists.freedesktop.org Date: Fri, 7 Nov 2014 14:22:02 -0800 Message-Id: <1415398927-16572-3-git-send-email-bradley.d.volkin@intel.com> In-Reply-To: <1415398927-16572-1-git-send-email-bradley.d.volkin@intel.com> References: <1415398927-16572-1-git-send-email-bradley.d.volkin@intel.com> Subject: [Intel-gfx] [PATCH v4 2/7] drm/i915: Use batch pools with the command parser Precedence: list MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: base64 Errors-To: intel-gfx-bounces@lists.freedesktop.org Sender: "Intel-gfx" <intel-gfx-bounces@lists.freedesktop.org>

Message ID

1415398927-16572-3-git-send-email-bradley.d.volkin@intel.com (mailing list archive)

State

New, archived

Headers

From: bradley.d.volkin@intel.com
To: intel-gfx@lists.freedesktop.org
Date: Fri,  7 Nov 2014 14:22:02 -0800
Message-Id: <1415398927-16572-3-git-send-email-bradley.d.volkin@intel.com>
In-Reply-To: <1415398927-16572-1-git-send-email-bradley.d.volkin@intel.com>
References: <1415398927-16572-1-git-send-email-bradley.d.volkin@intel.com>
Subject: [Intel-gfx] [PATCH v4 2/7] drm/i915: Use batch pools with the
	command parser
Precedence: list
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: base64
Errors-To: intel-gfx-bounces@lists.freedesktop.org
Sender: "Intel-gfx" <intel-gfx-bounces@lists.freedesktop.org>

Commit Message

bradley.d.volkin@intel.com Nov. 7, 2014, 10:22 p.m. UTC

From: Brad Volkin <bradley.d.volkin@intel.com>

This patch sets up all of the tracking and copying necessary to
use batch pools with the command parser and dispatches the copied
(shadow) batch to the hardware.

After this patch, the parser is in 'enabling' mode.

Note that performance takes a hit from the copy in some cases
and will likely need some work. At a rough pass, the memcpy
appears to be the bottleneck. Without having done a deeper
analysis, two ideas that come to mind are:
1) Copy sections of the batch at a time, as they are reached
   by parsing. Might improve cache locality.
2) Copy only up to the userspace-supplied batch length and
   memset the rest of the buffer. Reduces the number of reads.

v2:
- Remove setting the capacity of the pool
- One global pool instead of per-ring pools
- Replace batch_obj with shadow_batch_obj and hook into eb->vmas
- Memset any space in the shadow batch beyond what gets copied
- Rebased on execlist prep refactoring

v3:
- Rebase on chained batch handling
- Squash in setting the secure dispatch flag
- Add a note about the interaction w/secure dispatch pinning
- Check for request->batch_obj == NULL in i915_gem_free_request

v4:
- Fix read domains for shadow_batch_obj
- Remove the set_to_gtt_domain call from i915_parse_cmds
- ggtt_pin/unpin in the parser block to simplify error handling
- Check USES_FULL_PPGTT before setting DISPATCH_SECURE flag
- Remove i915_gem_batch_pool_put calls

Signed-off-by: Brad Volkin <bradley.d.volkin@intel.com>
---
 drivers/gpu/drm/i915/i915_cmd_parser.c     | 79 +++++++++++++++++++++++-------
 drivers/gpu/drm/i915/i915_dma.c            |  1 +
 drivers/gpu/drm/i915/i915_drv.h            |  8 +++
 drivers/gpu/drm/i915/i915_gem.c            |  2 +
 drivers/gpu/drm/i915/i915_gem_execbuffer.c | 49 ++++++++++++++++--
 5 files changed, 117 insertions(+), 22 deletions(-)

Comments

Chris Wilson Nov. 12, 2014, 9:49 a.m. UTC | #1

On Fri, Nov 07, 2014 at 02:22:02PM -0800, bradley.d.volkin@intel.com wrote:
> From: Brad Volkin <bradley.d.volkin@intel.com>
> 
> This patch sets up all of the tracking and copying necessary to
> use batch pools with the command parser and dispatches the copied
> (shadow) batch to the hardware.
> 
> After this patch, the parser is in 'enabling' mode.
> 
> Note that performance takes a hit from the copy in some cases
> and will likely need some work. At a rough pass, the memcpy
> appears to be the bottleneck. Without having done a deeper
> analysis, two ideas that come to mind are:
> 1) Copy sections of the batch at a time, as they are reached
>    by parsing. Might improve cache locality.
> 2) Copy only up to the userspace-supplied batch length and
>    memset the rest of the buffer. Reduces the number of reads.
> 
> v2:
> - Remove setting the capacity of the pool
> - One global pool instead of per-ring pools
> - Replace batch_obj with shadow_batch_obj and hook into eb->vmas
> - Memset any space in the shadow batch beyond what gets copied
> - Rebased on execlist prep refactoring
> 
> v3:
> - Rebase on chained batch handling
> - Squash in setting the secure dispatch flag
> - Add a note about the interaction w/secure dispatch pinning
> - Check for request->batch_obj == NULL in i915_gem_free_request
> 
> v4:
> - Fix read domains for shadow_batch_obj
> - Remove the set_to_gtt_domain call from i915_parse_cmds
> - ggtt_pin/unpin in the parser block to simplify error handling
> - Check USES_FULL_PPGTT before setting DISPATCH_SECURE flag
> - Remove i915_gem_batch_pool_put calls
> 
> Signed-off-by: Brad Volkin <bradley.d.volkin@intel.com>
> ---
>  drivers/gpu/drm/i915/i915_cmd_parser.c     | 79 +++++++++++++++++++++++-------
>  drivers/gpu/drm/i915/i915_dma.c            |  1 +
>  drivers/gpu/drm/i915/i915_drv.h            |  8 +++
>  drivers/gpu/drm/i915/i915_gem.c            |  2 +
>  drivers/gpu/drm/i915/i915_gem_execbuffer.c | 49 ++++++++++++++++--
>  5 files changed, 117 insertions(+), 22 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_cmd_parser.c b/drivers/gpu/drm/i915/i915_cmd_parser.c
> index 809bb95..5a3f4e4 100644
> --- a/drivers/gpu/drm/i915/i915_cmd_parser.c
> +++ b/drivers/gpu/drm/i915/i915_cmd_parser.c
> @@ -838,6 +838,56 @@ finish:
>  	return (u32*)addr;
>  }
>  
> +/* Returns a vmap'd pointer to dest_obj, which the caller must unmap */
> +static u32 *copy_batch(struct drm_i915_gem_object *dest_obj,
> +		       struct drm_i915_gem_object *src_obj)
> +{
> +	int ret = 0;
> +	int needs_clflush = 0;
> +	u32 *src_addr, *dest_addr = NULL;
> +
> +	ret = i915_gem_obj_prepare_shmem_read(src_obj, &needs_clflush);
> +	if (ret) {
> +		DRM_DEBUG_DRIVER("CMD: failed to prep read\n");
> +		return ERR_PTR(ret);
> +	}
> +
> +	src_addr = vmap_batch(src_obj);
> +	if (!src_addr) {
> +		DRM_DEBUG_DRIVER("CMD: Failed to vmap batch\n");
> +		ret = -ENOMEM;
> +		goto unpin_src;
> +	}
> +
> +	if (needs_clflush)
> +		drm_clflush_virt_range((char *)src_addr, src_obj->base.size);
> +
> +	ret = i915_gem_object_set_to_cpu_domain(dest_obj, true);
> +	if (ret) {
> +		DRM_DEBUG_DRIVER("CMD: Failed to set batch CPU domain\n");
> +		goto unmap_src;
> +	}
> +
> +	dest_addr = vmap_batch(dest_obj);

Considering that this does a straightforward copy (if we ignore the
highly dubious memset), any chance of a kmap_atomic(); copy_page();
loop? (Modulo first/last pages which may not be aligned.)
-Chris

diff --git a/drivers/gpu/drm/i915/i915_cmd_parser.c b/drivers/gpu/drm/i915/i915_cmd_parser.c
index 809bb95..5a3f4e4 100644
--- a/drivers/gpu/drm/i915/i915_cmd_parser.c
+++ b/drivers/gpu/drm/i915/i915_cmd_parser.c
@@ -838,6 +838,56 @@  finish:
 	return (u32*)addr;
 }
 
+/* Returns a vmap'd pointer to dest_obj, which the caller must unmap */
+static u32 *copy_batch(struct drm_i915_gem_object *dest_obj,
+		       struct drm_i915_gem_object *src_obj)
+{
+	int ret = 0;
+	int needs_clflush = 0;
+	u32 *src_addr, *dest_addr = NULL;
+
+	ret = i915_gem_obj_prepare_shmem_read(src_obj, &needs_clflush);
+	if (ret) {
+		DRM_DEBUG_DRIVER("CMD: failed to prep read\n");
+		return ERR_PTR(ret);
+	}
+
+	src_addr = vmap_batch(src_obj);
+	if (!src_addr) {
+		DRM_DEBUG_DRIVER("CMD: Failed to vmap batch\n");
+		ret = -ENOMEM;
+		goto unpin_src;
+	}
+
+	if (needs_clflush)
+		drm_clflush_virt_range((char *)src_addr, src_obj->base.size);
+
+	ret = i915_gem_object_set_to_cpu_domain(dest_obj, true);
+	if (ret) {
+		DRM_DEBUG_DRIVER("CMD: Failed to set batch CPU domain\n");
+		goto unmap_src;
+	}
+
+	dest_addr = vmap_batch(dest_obj);
+	if (!dest_addr) {
+		DRM_DEBUG_DRIVER("CMD: Failed to vmap shadow batch\n");
+		ret = -ENOMEM;
+		goto unmap_src;
+	}
+
+	memcpy(dest_addr, src_addr, src_obj->base.size);
+	if (dest_obj->base.size > src_obj->base.size)
+		memset((u8 *)dest_addr + src_obj->base.size, 0,
+		       dest_obj->base.size - src_obj->base.size);
+
+unmap_src:
+	vunmap(src_addr);
+unpin_src:
+	i915_gem_object_unpin_pages(src_obj);
+
+	return ret ? ERR_PTR(ret) : dest_addr;
+}
+
 /**
  * i915_needs_cmd_parser() - should a given ring use software command parsing?
  * @ring: the ring in question
@@ -954,6 +1004,7 @@  static bool check_cmd(const struct intel_engine_cs *ring,
  * i915_parse_cmds() - parse a submitted batch buffer for privilege violations
  * @ring: the ring on which the batch is to execute
  * @batch_obj: the batch buffer in question
+ * @shadow_batch_obj: copy of the batch buffer in question
  * @batch_start_offset: byte offset in the batch at which execution starts
  * @is_master: is the submitting process the drm master?
  *
@@ -965,32 +1016,28 @@  static bool check_cmd(const struct intel_engine_cs *ring,
  */
 int i915_parse_cmds(struct intel_engine_cs *ring,
 		    struct drm_i915_gem_object *batch_obj,
+		    struct drm_i915_gem_object *shadow_batch_obj,
 		    u32 batch_start_offset,
 		    bool is_master)
 {
 	int ret = 0;
 	u32 *cmd, *batch_base, *batch_end;
 	struct drm_i915_cmd_descriptor default_desc = { 0 };
-	int needs_clflush = 0;
 	bool oacontrol_set = false; /* OACONTROL tracking. See check_cmd() */
 
-	ret = i915_gem_obj_prepare_shmem_read(batch_obj, &needs_clflush);
-	if (ret) {
-		DRM_DEBUG_DRIVER("CMD: failed to prep read\n");
-		return ret;
+	batch_base = copy_batch(shadow_batch_obj, batch_obj);
+	if (IS_ERR(batch_base)) {
+		DRM_DEBUG_DRIVER("CMD: Failed to copy batch\n");
+		return PTR_ERR(batch_base);
 	}
 
-	batch_base = vmap_batch(batch_obj);
-	if (!batch_base) {
-		DRM_DEBUG_DRIVER("CMD: Failed to vmap batch\n");
-		i915_gem_object_unpin_pages(batch_obj);
-		return -ENOMEM;
-	}
-
-	if (needs_clflush)
-		drm_clflush_virt_range((char *)batch_base, batch_obj->base.size);
-
 	cmd = batch_base + (batch_start_offset / sizeof(*cmd));
+
+	/*
+	 * We use the source object's size because the shadow object is as
+	 * large or larger and copy_batch() will write MI_NOPs to the extra
+	 * space. Parsing should be faster in some cases this way.
+	 */
 	batch_end = cmd + (batch_obj->base.size / sizeof(*batch_end));
 
 	while (cmd < batch_end) {
@@ -1052,8 +1099,6 @@  int i915_parse_cmds(struct intel_engine_cs *ring,
 
 	vunmap(batch_base);
 
-	i915_gem_object_unpin_pages(batch_obj);
-
 	return ret;
 }
 
diff --git a/drivers/gpu/drm/i915/i915_dma.c b/drivers/gpu/drm/i915/i915_dma.c
index 9a73533..dea0f45 100644
--- a/drivers/gpu/drm/i915/i915_dma.c
+++ b/drivers/gpu/drm/i915/i915_dma.c
@@ -1891,6 +1891,7 @@  int i915_driver_unload(struct drm_device *dev)
 
 		mutex_lock(&dev->struct_mutex);
 		i915_gem_cleanup_ringbuffer(dev);
+		i915_gem_batch_pool_fini(&dev_priv->mm.batch_pool);
 		i915_gem_context_fini(dev);
 		mutex_unlock(&dev->struct_mutex);
 		i915_gem_cleanup_stolen(dev);
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 2955ed9..a1e19dd 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -1153,6 +1153,13 @@  struct i915_gem_mm {
 	 */
 	struct list_head unbound_list;
 
+	/*
+	 * A pool of objects to use as shadow copies of client batch buffers
+	 * when the command parser is enabled. Prevents the client from
+	 * modifying the batch contents after software parsing.
+	 */
+	struct i915_gem_batch_pool batch_pool;
+
 	/** Usable portion of the GTT for GEM */
 	unsigned long stolen_base; /* limited to low memory (32-bit) */
 
@@ -2851,6 +2858,7 @@  void i915_cmd_parser_fini_ring(struct intel_engine_cs *ring);
 bool i915_needs_cmd_parser(struct intel_engine_cs *ring);
 int i915_parse_cmds(struct intel_engine_cs *ring,
 		    struct drm_i915_gem_object *batch_obj,
+		    struct drm_i915_gem_object *shadow_batch_obj,
 		    u32 batch_start_offset,
 		    bool is_master);
 
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 875c151..7a2bf89 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -5034,6 +5034,8 @@  i915_gem_load(struct drm_device *dev)
 	dev_priv->mm.oom_notifier.notifier_call = i915_gem_shrinker_oom;
 	register_oom_notifier(&dev_priv->mm.oom_notifier);
 
+	i915_gem_batch_pool_init(dev, &dev_priv->mm.batch_pool);
+
 	mutex_init(&dev_priv->fb_tracking.lock);
 }
 
diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
index e1ed85a..ce45533 100644
--- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
@@ -1236,6 +1236,8 @@  i915_gem_do_execbuffer(struct drm_device *dev, void *data,
 	struct drm_i915_private *dev_priv = dev->dev_private;
 	struct eb_vmas *eb;
 	struct drm_i915_gem_object *batch_obj;
+	struct drm_i915_gem_object *shadow_batch_obj = NULL;
+	struct drm_i915_gem_exec_object2 shadow_exec_entry;
 	struct intel_engine_cs *ring;
 	struct intel_context *ctx;
 	struct i915_address_space *vm;
@@ -1361,22 +1363,59 @@  i915_gem_do_execbuffer(struct drm_device *dev, void *data,
 	batch_obj->base.pending_read_domains |= I915_GEM_DOMAIN_COMMAND;
 
 	if (i915_needs_cmd_parser(ring)) {
+		shadow_batch_obj =
+			i915_gem_batch_pool_get(&dev_priv->mm.batch_pool,
+						batch_obj->base.size);
+		if (IS_ERR(shadow_batch_obj)) {
+			ret = PTR_ERR(shadow_batch_obj);
+			/* Don't try to clean up the obj in the error path */
+			shadow_batch_obj = NULL;
+			goto err;
+		}
+
+		ret = i915_gem_obj_ggtt_pin(shadow_batch_obj, 4096, 0);
+		if (ret)
+			goto err;
+
 		ret = i915_parse_cmds(ring,
 				      batch_obj,
+				      shadow_batch_obj,
 				      args->batch_start_offset,
 				      file->is_master);
+		i915_gem_object_ggtt_unpin(shadow_batch_obj);
+
 		if (ret) {
 			if (ret != -EACCES)
 				goto err;
 		} else {
+			struct i915_vma *vma;
+
+			memset(&shadow_exec_entry, 0,
+			       sizeof(shadow_exec_entry));
+
+			vma = i915_gem_obj_to_ggtt(shadow_batch_obj);
+			vma->exec_entry = &shadow_exec_entry;
+			drm_gem_object_reference(&shadow_batch_obj->base);
+			list_add_tail(&vma->exec_list, &eb->vmas);
+
+			shadow_batch_obj->base.pending_read_domains =
+				batch_obj->base.pending_read_domains;
+
+			batch_obj = shadow_batch_obj;
+
 			/*
-			 * XXX: Actually do this when enabling batch copy...
+			 * Set the DISPATCH_SECURE bit to remove the NON_SECURE
+			 * bit from MI_BATCH_BUFFER_START commands issued in the
+			 * dispatch_execbuffer implementations. We specifically
+			 * don't want that set when the command parser is
+			 * enabled.
 			 *
-			 * Set the DISPATCH_SECURE bit to remove the NON_SECURE bit
-			 * from MI_BATCH_BUFFER_START commands issued in the
-			 * dispatch_execbuffer implementations. We specifically don't
-			 * want that set when the command parser is enabled.
+			 * FIXME: with aliasing ppgtt, buffers that should only
+			 * be in ggtt still end up in the aliasing ppgtt. remove
+			 * this check when that is fixed.
 			 */
+			if (USES_FULL_PPGTT(dev))
+				flags |= I915_DISPATCH_SECURE;
 		}
 	}

[v4,2/7] drm/i915: Use batch pools with the command parser

Commit Message

Comments

Patch