
[RFC,42/42] HAX drm/i915/lmem: default userspace allocations to LMEM

Message ID 20190214145740.14521-43-matthew.auld@intel.com (mailing list archive)
State New, archived
Series Introduce memory region concept (including device local memory)

Commit Message

Matthew Auld Feb. 14, 2019, 2:57 p.m. UTC
Hack patch to default all userspace allocations to LMEM. Useful for
testing purposes.

Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Abdiel Janulgue <abdiel.janulgue@linux.intel.com>
---
 drivers/gpu/drm/i915/i915_gem.c | 45 +++++++++++++++++++++++++++++++--
 1 file changed, 43 insertions(+), 2 deletions(-)

Comments

Chris Wilson Feb. 14, 2019, 4:13 p.m. UTC | #1
Quoting Matthew Auld (2019-02-14 14:57:40)
> Hack patch to default all userspace allocations to LMEM. Useful for
> testing purposes.
> 
> Signed-off-by: Matthew Auld <matthew.auld@intel.com>
> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
> Cc: Abdiel Janulgue <abdiel.janulgue@linux.intel.com>
> ---
>  drivers/gpu/drm/i915/i915_gem.c | 45 +++++++++++++++++++++++++++++++--
>  1 file changed, 43 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index 3c86909d55b9..bd857f477ef9 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -641,7 +641,8 @@ i915_gem_create(struct drm_file *file,
>                 u32 *handle_p)
>  {
>         struct drm_i915_gem_object *obj;
> -       int ret;
> +       intel_wakeref_t wakeref;
> +       int ret = 0;
>         u32 handle;
>  
>         size = roundup(size, PAGE_SIZE);
> @@ -649,10 +650,50 @@ i915_gem_create(struct drm_file *file,
>                 return -EINVAL;
>  
>         /* Allocate the new object */
> -       obj = i915_gem_object_create(dev_priv, size);
> +       if (HAS_LMEM(dev_priv))
> +               obj = i915_gem_object_create_lmem(dev_priv, size, 0);
> +       else
> +               obj = i915_gem_object_create(dev_priv, size);
>         if (IS_ERR(obj))
>                 return PTR_ERR(obj);
>  
> +       if (i915_gem_object_is_lmem(obj)) {
> +               struct i915_gem_context *ctx;
> +
> +               /* XXX: we should prob use the blitter context for this? */

Or the kernel_context which is setup for emitting without taking
struct_mutex...
-Chris
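
A minimal sketch of what using the kernel context could look like in the hunk below, instead of looking up the per-file default context. This assumes i915_gem_object_clear_blt() keeps the (ctx, obj) signature used in this patch and that dev_priv->kernel_context is the pinned kernel context; whether struct_mutex can really be dropped here depends on how that emission path lands, and error handling is simplified:

	if (i915_gem_object_is_lmem(obj)) {
		/* The kernel context is always present, so no lookup/put. */
		struct i915_gem_context *ctx = dev_priv->kernel_context;

		with_intel_runtime_pm(dev_priv, wakeref)
			ret = i915_gem_object_clear_blt(ctx, obj);

		if (ret) {
			i915_gem_object_put(obj);
			return ret;
		}
	}
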
Chris Wilson Feb. 14, 2019, 4:22 p.m. UTC | #2
Quoting Matthew Auld (2019-02-14 14:57:40)
> Hack patch to default all userspace allocations to LMEM. Useful for
> testing purposes.

One caveat to note is that userspace assumes objects start idle in
.write=CPU. That assumption may very well be put to the test here, and
any breakage could go unnoticed for quite a while...
-Chris
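
For the userspace side of this caveat, the defensive pattern is to not rely on new objects starting in .write=CPU and instead issue an explicit set-domain after creation. A self-contained sketch using the stock i915 uAPI (gem_create_cpu is a hypothetical helper name; minimal error handling):

#include <stdint.h>
#include <xf86drm.h>
#include <i915_drm.h>

/*
 * Create a GEM object and explicitly move it to the CPU write domain
 * before the first CPU write, rather than assuming freshly created
 * objects are already idle in .write=CPU.
 */
static int gem_create_cpu(int fd, uint64_t size, uint32_t *handle)
{
	struct drm_i915_gem_create create = { .size = size };
	struct drm_i915_gem_set_domain sd = { 0 };

	if (drmIoctl(fd, DRM_IOCTL_I915_GEM_CREATE, &create))
		return -1;

	sd.handle = create.handle;
	sd.read_domains = I915_GEM_DOMAIN_CPU;
	sd.write_domain = I915_GEM_DOMAIN_CPU;
	if (drmIoctl(fd, DRM_IOCTL_I915_GEM_SET_DOMAIN, &sd))
		return -1;

	*handle = create.handle;
	return 0;
}
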
Chris Wilson Feb. 18, 2019, 12:44 p.m. UTC | #3
Quoting Chris Wilson (2019-02-14 16:13:18)
> Quoting Matthew Auld (2019-02-14 14:57:40)
> > Hack patch to default all userspace allocations to LMEM. Useful for
> > testing purposes.
> > 
> > [...]
> > +       if (i915_gem_object_is_lmem(obj)) {
> > +               struct i915_gem_context *ctx;
> > +
> > +               /* XXX: we should prob use the blitter context for this? */
> 
> Or the kernel_context which is setup for emitting without taking
> struct_mutex...

Actually, doing this async (i.e. not blocking object creation on the
clear) requires us to be sure that CPU access via mmap has a sync
point. pread and mmap_ggtt do, but mmap_(wb|wc) does not currently, so
we would be reliant on userspace doing a set-domain first in order not
to see stale buffer contents.

Also I have this penciled in to enable buffer pools (e.g. !llc would
dearly love to keep WC pages around before giving them back to the
system).

Hmm,
-Chris
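
To make that sync point concrete: with a WC mmap, userspace would need something like the following before dereferencing the pointer, so the kernel has a place to wait for any pending clear. This is a sketch only; it assumes the kernel accepts I915_GEM_DOMAIN_WC in set-domain, and error handling is minimal:

#include <stdint.h>
#include <sys/mman.h>
#include <xf86drm.h>
#include <i915_drm.h>

/*
 * Map an object write-combined and serialise against any pending GPU
 * work with an explicit set-domain before the CPU touches the pages.
 */
static void *gem_mmap_wc_synced(int fd, uint32_t handle, uint64_t size)
{
	struct drm_i915_gem_mmap arg = { 0 };
	struct drm_i915_gem_set_domain sd = { 0 };

	arg.handle = handle;
	arg.size = size;
	arg.flags = I915_MMAP_WC;
	if (drmIoctl(fd, DRM_IOCTL_I915_GEM_MMAP, &arg))
		return NULL;

	sd.handle = handle;
	sd.read_domains = I915_GEM_DOMAIN_WC;
	sd.write_domain = I915_GEM_DOMAIN_WC;
	if (drmIoctl(fd, DRM_IOCTL_I915_GEM_SET_DOMAIN, &sd)) {
		munmap((void *)(uintptr_t)arg.addr_ptr, size);
		return NULL;
	}

	return (void *)(uintptr_t)arg.addr_ptr;
}
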
Chris Wilson Feb. 19, 2019, 5:44 p.m. UTC | #4
Quoting Chris Wilson (2019-02-14 16:13:18)
> Quoting Matthew Auld (2019-02-14 14:57:40)
> > Hack patch to default all userspace allocations to LMEM. Useful for
> > testing purposes.
> > 
> > [...]
> > +       if (i915_gem_object_is_lmem(obj)) {
> > +               struct i915_gem_context *ctx;
> > +
> > +               /* XXX: we should prob use the blitter context for this? */
> 
> Or the kernel_context which is setup for emitting without taking
> struct_mutex...

Using a single context, be it the kernel or blitter context, should
only be a last resort. We need to defer the clear until an owning HW
context is known so that we can properly queue it in that context's
name, or else the kernel context becomes a global barrier and priority
inversions abound.

This does suggest to me that async pages needs to be in better shape...
-Chris

Patch

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 3c86909d55b9..bd857f477ef9 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -641,7 +641,8 @@  i915_gem_create(struct drm_file *file,
 		u32 *handle_p)
 {
 	struct drm_i915_gem_object *obj;
-	int ret;
+	intel_wakeref_t wakeref;
+	int ret = 0;
 	u32 handle;
 
 	size = roundup(size, PAGE_SIZE);
@@ -649,10 +650,50 @@  i915_gem_create(struct drm_file *file,
 		return -EINVAL;
 
 	/* Allocate the new object */
-	obj = i915_gem_object_create(dev_priv, size);
+	if (HAS_LMEM(dev_priv))
+		obj = i915_gem_object_create_lmem(dev_priv, size, 0);
+	else
+		obj = i915_gem_object_create(dev_priv, size);
 	if (IS_ERR(obj))
 		return PTR_ERR(obj);
 
+	if (i915_gem_object_is_lmem(obj)) {
+		struct i915_gem_context *ctx;
+
+		/* XXX: we should prob use the blitter context for this? */
+		ctx = i915_gem_context_lookup(file->driver_priv,
+					      DEFAULT_CONTEXT_HANDLE);
+		if (!ctx) {
+			i915_gem_object_put(obj);
+			return -ENOENT;
+		}
+
+		/*
+		 * XXX: We really want to move this to get_pages(), but we
+		 * require grabbing the BKL for the blitting operation which is
+		 * annoying. In the pipeline is support for async get_pages()
+		 * which should fit nicely for this. Also note that the actual
+	 * clear should be done async (we currently do an object_wait
+		 * in clear_blt which is pure garbage), we just need to take
+	 * care if userspace opts out of implicit sync for the execbuf, to
+		 * avoid any potential info leak.
+		 */
+
+		mutex_lock(&dev_priv->drm.struct_mutex);
+
+		with_intel_runtime_pm(dev_priv, wakeref)
+			ret = i915_gem_object_clear_blt(ctx, obj);
+
+		i915_gem_context_put(ctx);
+		if (ret) {
+			__i915_gem_object_release_unless_active(obj);
+			mutex_unlock(&dev_priv->drm.struct_mutex);
+			return ret;
+		}
+
+		mutex_unlock(&dev_priv->drm.struct_mutex);
+	}
+
 	ret = drm_gem_handle_create(file, &obj->base, &handle);
 	/* drop reference from allocate - handle holds it now */
 	i915_gem_object_put(obj);