
[v2,03/37] drm/i915/region: support basic eviction

Message ID 20190627205633.1143-4-matthew.auld@intel.com (mailing list archive)
State New, archived
Series Introduce memory region concept (including device local memory)

Commit Message

Matthew Auld June 27, 2019, 8:55 p.m. UTC
Support basic eviction for regions.

Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Abdiel Janulgue <abdiel.janulgue@linux.intel.com>
---
 .../gpu/drm/i915/gem/i915_gem_object_types.h  |  7 ++
 drivers/gpu/drm/i915/i915_gem.c               | 16 ++++
 drivers/gpu/drm/i915/intel_memory_region.c    | 89 ++++++++++++++++++-
 drivers/gpu/drm/i915/intel_memory_region.h    | 10 +++
 .../drm/i915/selftests/intel_memory_region.c  | 73 +++++++++++++++
 drivers/gpu/drm/i915/selftests/mock_region.c  |  1 +
 6 files changed, 192 insertions(+), 4 deletions(-)

Comments

Chris Wilson June 27, 2019, 10:59 p.m. UTC | #1
Quoting Matthew Auld (2019-06-27 21:55:59)
> +int i915_memory_region_evict(struct intel_memory_region *mem,

What type is this again?

> +                            resource_size_t target)
> +{
> +       struct drm_i915_gem_object *obj, *on;
> +       resource_size_t found;
> +       LIST_HEAD(purgeable);
> +       int err;
> +
> +       err = 0;
> +       found = 0;
> +
> +       mutex_lock(&mem->obj_lock);
> +
> +       list_for_each_entry(obj, &mem->purgeable, region_link) {
> +               if (!i915_gem_object_has_pages(obj))
> +                       continue;
> +
> +               if (READ_ONCE(obj->pin_global))
> +                       continue;
> +
> +               if (atomic_read(&obj->bind_count))
> +                       continue;
> +
> +               list_add(&obj->eviction_link, &purgeable);
> +
> +               found += obj->base.size;
> +               if (found >= target)
> +                       goto found;
> +       }
> +
> +       err = -ENOSPC;
> +found:
> +       list_for_each_entry_safe(obj, on, &purgeable, eviction_link) {
> +               if (!err) {
> +                       __i915_gem_object_put_pages(obj, I915_MM_SHRINKER);

How come put_pages is not taking mem->obj_lock to remove the
obj->region_link?

I'm getting fishy vibes.

> +
> +                       mutex_lock_nested(&obj->mm.lock, I915_MM_SHRINKER);

> +                       if (!i915_gem_object_has_pages(obj))
> +                               obj->mm.madv = __I915_MADV_PURGED;

That should be pushed to put_pages() as reason. The unlock/lock is just
asking for trouble.

> +                       mutex_unlock(&obj->mm.lock);
> +               }
> +
> +               list_del(&obj->eviction_link);
> +       }

You will have noticed that a separate eviction_link is superfluous? If
both region_link and eviction_link are only valid underneath obj_lock,
you can list_move(&obj->region_link, &purgeable) in the first pass, and
unwind on error.
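
A rough sketch of that shape for the body of i915_memory_region_evict()
(illustrative only, reusing the patch's names; the actual freeing of
pages in the second pass is elided):

	struct drm_i915_gem_object *obj, *on;
	resource_size_t found = 0;
	LIST_HEAD(purgeable);

	mutex_lock(&mem->obj_lock);

	/* first pass: pull candidates off mem->purgeable via region_link */
	list_for_each_entry_safe(obj, on, &mem->purgeable, region_link) {
		if (!i915_gem_object_has_pages(obj))
			continue;

		list_move(&obj->region_link, &purgeable);
		found += obj->base.size;
		if (found >= target)
			break;
	}

	if (found < target) {
		/* unwind: give everything back to the purgeable list */
		list_splice(&purgeable, &mem->purgeable);
		mutex_unlock(&mem->obj_lock);
		return -ENOSPC;
	}

	/* ... evict, then return the survivors to mem->purgeable ... */
	list_splice(&purgeable, &mem->purgeable);

	mutex_unlock(&mem->obj_lock);
	return 0;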

However, I'm going hmm.

So you keep all objects on the shrink lists even when not allocated. Ho
hum. With a bit more creative locking -- read: careful acquisition of
resources, then dropping the lock before actually evicting -- it should
work out.
-Chris
Daniel Vetter July 30, 2019, 4:26 p.m. UTC | #2
On Thu, Jun 27, 2019 at 09:55:59PM +0100, Matthew Auld wrote:
> Support basic eviction for regions.
> 
> Signed-off-by: Matthew Auld <matthew.auld@intel.com>
> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
> Cc: Abdiel Janulgue <abdiel.janulgue@linux.intel.com>

So from a very high level this looks like it was largely modelled after
i915_gem_shrink.c and not i915_gem_evict.c (our other "make room, we're
running out of stuff" code). Any specific reasons?

I think i915_gem_evict is a lot closer match for what we want for vram (it
started out to manage severely limited GTT on gen2/3/4) after all. With
the complication that we'll have to manage physical memory with multiple
virtual mappings of it on top, so unfortunately we can't just reuse the
locking pattern Chris has come up with in his struct_mutex-removal branch.
But at least conceptually it should be a lot closer.

But I might be entirely off the track with reconstructing how this code
came to be, so please elaborate a bit.

Thanks, Daniel

> ---
>  .../gpu/drm/i915/gem/i915_gem_object_types.h  |  7 ++
>  drivers/gpu/drm/i915/i915_gem.c               | 16 ++++
>  drivers/gpu/drm/i915/intel_memory_region.c    | 89 ++++++++++++++++++-
>  drivers/gpu/drm/i915/intel_memory_region.h    | 10 +++
>  .../drm/i915/selftests/intel_memory_region.c  | 73 +++++++++++++++
>  drivers/gpu/drm/i915/selftests/mock_region.c  |  1 +
>  6 files changed, 192 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object_types.h b/drivers/gpu/drm/i915/gem/i915_gem_object_types.h
> index 8d760e852c4b..87000fc24ab3 100644
> --- a/drivers/gpu/drm/i915/gem/i915_gem_object_types.h
> +++ b/drivers/gpu/drm/i915/gem/i915_gem_object_types.h
> @@ -72,6 +72,13 @@ struct drm_i915_gem_object {
>  	 * List of memory region blocks allocated for this object.
>  	 */
>  	struct list_head blocks;
> +	/**
> +	 * Element within memory_region->objects or memory_region->purgeable if
> +	 * the object is marked as DONTNEED. Access is protected by
> +	 * memory_region->obj_lock.
> +	 */
> +	struct list_head region_link;
> +	struct list_head eviction_link;
>  
>  	struct {
>  		/**
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index db3744b0bc80..85677ae89849 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -1122,6 +1122,22 @@ i915_gem_madvise_ioctl(struct drm_device *dev, void *data,
>  	    !i915_gem_object_has_pages(obj))
>  		i915_gem_object_truncate(obj);
>  
> +	if (obj->memory_region) {
> +		mutex_lock(&obj->memory_region->obj_lock);
> +
> +		switch (obj->mm.madv) {
> +		case I915_MADV_WILLNEED:
> +			list_move(&obj->region_link, &obj->memory_region->objects);
> +			break;
> +		default:
> +			list_move(&obj->region_link,
> +				  &obj->memory_region->purgeable);
> +			break;
> +		}
> +
> +		mutex_unlock(&obj->memory_region->obj_lock);
> +	}
> +
>  	args->retained = obj->mm.madv != __I915_MADV_PURGED;
>  	mutex_unlock(&obj->mm.lock);
>  
> diff --git a/drivers/gpu/drm/i915/intel_memory_region.c b/drivers/gpu/drm/i915/intel_memory_region.c
> index 4c89853a7769..721b47e46492 100644
> --- a/drivers/gpu/drm/i915/intel_memory_region.c
> +++ b/drivers/gpu/drm/i915/intel_memory_region.c
> @@ -6,6 +6,56 @@
>  #include "intel_memory_region.h"
>  #include "i915_drv.h"
>  
> +int i915_memory_region_evict(struct intel_memory_region *mem,
> +			     resource_size_t target)
> +{
> +	struct drm_i915_gem_object *obj, *on;
> +	resource_size_t found;
> +	LIST_HEAD(purgeable);
> +	int err;
> +
> +	err = 0;
> +	found = 0;
> +
> +	mutex_lock(&mem->obj_lock);
> +
> +	list_for_each_entry(obj, &mem->purgeable, region_link) {
> +		if (!i915_gem_object_has_pages(obj))
> +			continue;
> +
> +		if (READ_ONCE(obj->pin_global))
> +			continue;
> +
> +		if (atomic_read(&obj->bind_count))
> +			continue;
> +
> +		list_add(&obj->eviction_link, &purgeable);
> +
> +		found += obj->base.size;
> +		if (found >= target)
> +			goto found;
> +	}
> +
> +	err = -ENOSPC;
> +found:
> +	list_for_each_entry_safe(obj, on, &purgeable, eviction_link) {
> +		if (!err) {
> +			__i915_gem_object_put_pages(obj, I915_MM_SHRINKER);
> +
> +			mutex_lock_nested(&obj->mm.lock, I915_MM_SHRINKER);
> +			if (!i915_gem_object_has_pages(obj))
> +				obj->mm.madv = __I915_MADV_PURGED;
> +			mutex_unlock(&obj->mm.lock);
> +		}
> +
> +		list_del(&obj->eviction_link);
> +	}
> +
> +	mutex_unlock(&mem->obj_lock);
> +
> +	return err;
> +}
> +
>  static void
>  memory_region_free_pages(struct drm_i915_gem_object *obj,
>  			 struct sg_table *pages)
> @@ -70,7 +120,8 @@ i915_memory_region_get_pages_buddy(struct drm_i915_gem_object *obj)
>  		unsigned int order;
>  		u64 block_size;
>  		u64 offset;
> -
> +		bool retry = true;
> +retry:
>  		order = fls(n_pages) - 1;
>  		GEM_BUG_ON(order > mem->mm.max_order);
>  
> @@ -79,9 +130,24 @@ i915_memory_region_get_pages_buddy(struct drm_i915_gem_object *obj)
>  			if (!IS_ERR(block))
>  				break;
>  
> -			/* XXX: some kind of eviction pass, local to the device */
> -			if (!order--)
> -				goto err_free_blocks;
> +			if (!order--) {
> +				resource_size_t target;
> +				int err;
> +
> +				if (!retry)
> +					goto err_free_blocks;
> +
> +				target = n_pages * mem->mm.min_size;
> +
> +				mutex_unlock(&mem->mm_lock);
> +				err = i915_memory_region_evict(mem, target);
> +				mutex_lock(&mem->mm_lock);
> +				if (err)
> +					goto err_free_blocks;
> +
> +				retry = false;
> +				goto retry;
> +			}
>  		} while (1);
>  
>  		n_pages -= BIT(order);
> @@ -136,6 +202,13 @@ void i915_memory_region_release_buddy(struct intel_memory_region *mem)
>  	i915_buddy_fini(&mem->mm);
>  }
>  
> +void i915_gem_object_release_memory_region(struct drm_i915_gem_object *obj)
> +{
> +	mutex_lock(&obj->memory_region->obj_lock);
> +	list_del(&obj->region_link);
> +	mutex_unlock(&obj->memory_region->obj_lock);
> +}
> +
>  struct drm_i915_gem_object *
>  i915_gem_object_create_region(struct intel_memory_region *mem,
>  			      resource_size_t size,
> @@ -164,6 +237,10 @@ i915_gem_object_create_region(struct intel_memory_region *mem,
>  	INIT_LIST_HEAD(&obj->blocks);
>  	obj->memory_region = mem;
>  
> +	mutex_lock(&mem->obj_lock);
> +	list_add(&obj->region_link, &mem->objects);
> +	mutex_unlock(&mem->obj_lock);
> +
>  	return obj;
>  }
>  
> @@ -188,6 +265,10 @@ intel_memory_region_create(struct drm_i915_private *i915,
>  	mem->min_page_size = min_page_size;
>  	mem->ops = ops;
>  
> +	mutex_init(&mem->obj_lock);
> +	INIT_LIST_HEAD(&mem->objects);
> +	INIT_LIST_HEAD(&mem->purgeable);
> +
>  	mutex_init(&mem->mm_lock);
>  
>  	if (ops->init) {
> diff --git a/drivers/gpu/drm/i915/intel_memory_region.h b/drivers/gpu/drm/i915/intel_memory_region.h
> index 8d4736bdde50..bee0c022d295 100644
> --- a/drivers/gpu/drm/i915/intel_memory_region.h
> +++ b/drivers/gpu/drm/i915/intel_memory_region.h
> @@ -80,8 +80,16 @@ struct intel_memory_region {
>  	unsigned int type;
>  	unsigned int instance;
>  	unsigned int id;
> +
> +	/* Protects access to objects and purgeable */
> +	struct mutex obj_lock;
> +	struct list_head objects;
> +	struct list_head purgeable;
>  };
>  
> +int i915_memory_region_evict(struct intel_memory_region *mem,
> +			     resource_size_t target);
> +
>  int i915_memory_region_init_buddy(struct intel_memory_region *mem);
>  void i915_memory_region_release_buddy(struct intel_memory_region *mem);
>  
> @@ -89,6 +97,8 @@ int i915_memory_region_get_pages_buddy(struct drm_i915_gem_object *obj);
>  void i915_memory_region_put_pages_buddy(struct drm_i915_gem_object *obj,
>  					struct sg_table *pages);
>  
> +void i915_gem_object_release_memory_region(struct drm_i915_gem_object *obj);
> +
>  struct intel_memory_region *
>  intel_memory_region_create(struct drm_i915_private *i915,
>  			   resource_size_t start,
> diff --git a/drivers/gpu/drm/i915/selftests/intel_memory_region.c b/drivers/gpu/drm/i915/selftests/intel_memory_region.c
> index c3b160cfd713..ece499869747 100644
> --- a/drivers/gpu/drm/i915/selftests/intel_memory_region.c
> +++ b/drivers/gpu/drm/i915/selftests/intel_memory_region.c
> @@ -76,10 +76,83 @@ static int igt_mock_fill(void *arg)
>  	return err;
>  }
>  
> +static void igt_mark_evictable(struct drm_i915_gem_object *obj)
> +{
> +	i915_gem_object_unpin_pages(obj);
> +	obj->mm.madv = I915_MADV_DONTNEED;
> +	list_move(&obj->region_link, &obj->memory_region->purgeable);
> +}
> +
> +static int igt_mock_evict(void *arg)
> +{
> +	struct intel_memory_region *mem = arg;
> +	struct drm_i915_gem_object *obj;
> +	unsigned long n_objects;
> +	LIST_HEAD(objects);
> +	resource_size_t target;
> +	resource_size_t total;
> +	int err = 0;
> +
> +	target = mem->mm.min_size;
> +	total = resource_size(&mem->region);
> +	n_objects = total / target;
> +
> +	while (n_objects--) {
> +		obj = i915_gem_object_create_region(mem, target, 0);
> +		if (IS_ERR(obj)) {
> +			err = PTR_ERR(obj);
> +			goto err_close_objects;
> +		}
> +
> +		list_add(&obj->st_link, &objects);
> +
> +		err = i915_gem_object_pin_pages(obj);
> +		if (err)
> +			goto err_close_objects;
> +
> +		/*
> +		 * Make half of the region evictable, though do so in a
> +		 * horribly fragmented fashion.
> +		 */
> +		if (n_objects % 2)
> +			igt_mark_evictable(obj);
> +	}
> +
> +	while (target <= total / 2) {
> +		obj = i915_gem_object_create_region(mem, target, 0);
> +		if (IS_ERR(obj)) {
> +			err = PTR_ERR(obj);
> +			goto err_close_objects;
> +		}
> +
> +		list_add(&obj->st_link, &objects);
> +
> +		err = i915_gem_object_pin_pages(obj);
> +		if (err) {
> +			pr_err("failed to evict for target=%pa", &target);
> +			goto err_close_objects;
> +		}
> +
> +		/* Again, half of the region should remain evictable */
> +		igt_mark_evictable(obj);
> +
> +		target <<= 1;
> +	}
> +
> +err_close_objects:
> +	close_objects(&objects);
> +
> +	if (err == -ENOMEM)
> +		err = 0;
> +
> +	return err;
> +}
> +
>  int intel_memory_region_mock_selftests(void)
>  {
>  	static const struct i915_subtest tests[] = {
>  		SUBTEST(igt_mock_fill),
> +		SUBTEST(igt_mock_evict),
>  	};
>  	struct intel_memory_region *mem;
>  	struct drm_i915_private *i915;
> diff --git a/drivers/gpu/drm/i915/selftests/mock_region.c b/drivers/gpu/drm/i915/selftests/mock_region.c
> index cb942a461e9d..80eafdc54927 100644
> --- a/drivers/gpu/drm/i915/selftests/mock_region.c
> +++ b/drivers/gpu/drm/i915/selftests/mock_region.c
> @@ -8,6 +8,7 @@
>  static const struct drm_i915_gem_object_ops mock_region_obj_ops = {
>  	.get_pages = i915_memory_region_get_pages_buddy,
>  	.put_pages = i915_memory_region_put_pages_buddy,
> +	.release = i915_gem_object_release_memory_region,
>  };
>  
>  static struct drm_i915_gem_object *
> -- 
> 2.20.1
> 
Matthew Auld Aug. 15, 2019, 10:48 a.m. UTC | #3
On Tue, 30 Jul 2019 at 17:26, Daniel Vetter <daniel@ffwll.ch> wrote:
>
> On Thu, Jun 27, 2019 at 09:55:59PM +0100, Matthew Auld wrote:
> > Support basic eviction for regions.
> >
> > Signed-off-by: Matthew Auld <matthew.auld@intel.com>
> > Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
> > Cc: Abdiel Janulgue <abdiel.janulgue@linux.intel.com>
>
> So from a very high level this looks like it was largely modelled after
> i915_gem_shrink.c and not i915_gem_evict.c (our other "make room, we're
> running out of stuff" code). Any specific reasons?

IIRC I think it was originally based on the patches that exposed
stolen-memory to userspace from a few years ago.

>
> I think i915_gem_evict is a lot closer match for what we want for vram (it
> started out to manage severely limitted GTT on gen2/3/4) after all. With
> the complication that we'll have to manage physical memory with multiple
> virtual mappings of it on top, so unfortunately we can't just reuse the
> locking patter Chris has come up with in his struct_mutex-removal branch.
> But at least conceptually it should be a lot closer.

When you say make it more like i915_gem_evict, what does that mean?
Are you talking about the eviction roster stuff, or the
placement/locking of the eviction logic, with it being deep down in
get_pages?

>
> But I might be entirely off the track with reconstructing how this code
> came to be, so please elaborate a bit.
>
> Thanks, Daniel
Daniel Vetter Aug. 15, 2019, 2:26 p.m. UTC | #4
On Thu, Aug 15, 2019 at 12:48 PM Matthew Auld
<matthew.william.auld@gmail.com> wrote:
>
> On Tue, 30 Jul 2019 at 17:26, Daniel Vetter <daniel@ffwll.ch> wrote:
> >
> > On Thu, Jun 27, 2019 at 09:55:59PM +0100, Matthew Auld wrote:
> > > Support basic eviction for regions.
> > >
> > > Signed-off-by: Matthew Auld <matthew.auld@intel.com>
> > > Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
> > > Cc: Abdiel Janulgue <abdiel.janulgue@linux.intel.com>
> >
> > So from a very high level this looks like it was largely modelled after
> > i915_gem_shrink.c and not i915_gem_evict.c (our other "make room, we're
> > running out of stuff" code). Any specific reasons?
>
> IIRC I think it was originally based on the patches that exposed
> stolen-memory to userspace from a few years ago.
>
> >
> > I think i915_gem_evict is a lot closer match for what we want for vram (it
> > started out to manage severely limitted GTT on gen2/3/4) after all. With
> > the complication that we'll have to manage physical memory with multiple
> > virtual mappings of it on top, so unfortunately we can't just reuse the
> > locking patter Chris has come up with in his struct_mutex-removal branch.
> > But at least conceptually it should be a lot closer.
>
> When you say make it more like i915_gem_evict, what does that mean?
> Are you talking about the eviction roster stuff, or the
> placement/locking of the eviction logic, with it being deep down in
> get_pages?

So there's kinda two aspects here that I meant.

First is the high-level approach of the shrinker, which is a direct
reflection of core mm low memory handling principles: Core mm just
tries to equally shrink everyone when there's low memory, which is
managed by watermarks, and a few other tricks. This is all only
best-effort, and if multiple threads want a lot of memory at the same
time then it's all going to fail with ENOMEM.

On gpus otoh, and what we do in i915_gem_evict.c for gtt (and very
much needed with the tiny gtt for everything in gen2/3/4/5) is that
when we run out of space, we stall, throw out everyone else, and have
exclusive access to the entire gpu space. Then the next batchbuffer
goes through the same dance. With this you guarantee that if you have
a series of batchbuffers which all need e.g. 60% of lmem, they will
all be able to execute. With the shrinker-style of low-memory handling
eventually you're unlucky, both threads will only get up to 50%, fail
with ENOSPC, and userspace crashes. Which is not good.

The other bit is locking. Since we need to free pages from the
shrinker there are tricky locking rules involved. Worse, we cannot back
off from the shrinker down to e.g. the kmalloc or alloc_pages call
that put us into reclaim. Which means the usual deadlock avoidance
trick of having a slowpath where you first drop all the locks, then
reacquire them in the right order doesn't work - in some cases the
caller of kmalloc or alloc_pages could be holding a lock that we'd
need to unlock first. Hence why the shrinker uses the
best-effort-might-fail solution of trylocks, encoded in shrinker_lock.
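
For illustration, the trylock flavour looks roughly like this
(hypothetical helper built on the names from this patch, and not what
lmem should copy, see below):

	static unsigned long shrink_best_effort(struct intel_memory_region *mem)
	{
		struct drm_i915_gem_object *obj, *on;
		unsigned long freed = 0;

		/* may be called from reclaim: never wait for locks we
		 * might already hold further up the call chain */
		if (!mutex_trylock(&mem->obj_lock))
			return 0;

		list_for_each_entry_safe(obj, on, &mem->purgeable, region_link) {
			if (!mutex_trylock(&obj->mm.lock))
				continue; /* contended: skip, best effort */

			/* ... drop the backing pages here ... */
			freed += obj->base.size;

			mutex_unlock(&obj->mm.lock);
		}

		mutex_unlock(&mem->obj_lock);
		return freed;
	}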

But for lmem we don't have such an excuse, because it's all our own
code. The locking design can (and should!) assume that it can get out
of any deadlock and always acquire all the locks it needs. Without
that you can't achieve the first part about guaranteeing execution of
batches which collectively need more than 100% of lmem, but
individually all fit. As an example if you look at the amdgpu command
submission ioctl, that passes around ttm_operation_ctx which tracks a
few things about locks and other bits, and if they hit a possible
deadlock situation they can unwind the entire CS and restart by taking
the locks in the right order.
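
For reference, that unwind-and-restart pattern with the kernel's
ww_mutex API looks roughly like this (generic sketch, not code from
this series; the caller is assumed to wrap it in ww_acquire_init() /
ww_acquire_fini() and to unlock everything once it is done):

	#include <linux/ww_mutex.h>

	static int lock_all(struct ww_mutex **locks, int count,
			    struct ww_acquire_ctx *ctx)
	{
		struct ww_mutex *contended = NULL;
		int i, err;

	retry:
		for (i = 0; i < count; i++) {
			if (locks[i] == contended) {
				contended = NULL; /* already held via lock_slow() */
				continue;
			}

			err = ww_mutex_lock(locks[i], ctx);
			if (err == -EDEADLK) {
				struct ww_mutex *busy = locks[i];

				/* unwind: drop everything acquired in this pass */
				while (i--)
					ww_mutex_unlock(locks[i]);
				if (contended)
					ww_mutex_unlock(contended);

				/* sleep until the contended lock is ours,
				 * then restart with it held first */
				ww_mutex_lock_slow(busy, ctx);
				contended = busy;
				goto retry;
			} else if (err) {
				/* should not happen here, but be safe */
				while (i--)
					ww_mutex_unlock(locks[i]);
				if (contended)
					ww_mutex_unlock(contended);
				return err;
			}
		}

		ww_acquire_done(ctx);
		return 0;
	}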

I thought I typed that up somewhere, but I guess it got lost ...

Cheers, Daniel

>
> >
> > But I might be entirely off the track with reconstructing how this code
> > came to be, so please elaborate a bit.
> >
> > Thanks, Daniel



--
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch
Daniel Vetter Aug. 15, 2019, 2:34 p.m. UTC | #5
On Thu, Aug 15, 2019 at 4:26 PM Daniel Vetter <daniel@ffwll.ch> wrote:
> On Thu, Aug 15, 2019 at 12:48 PM Matthew Auld
> <matthew.william.auld@gmail.com> wrote:
> >
> > On Tue, 30 Jul 2019 at 17:26, Daniel Vetter <daniel@ffwll.ch> wrote:
> > >
> > > On Thu, Jun 27, 2019 at 09:55:59PM +0100, Matthew Auld wrote:
> > > > Support basic eviction for regions.
> > > >
> > > > Signed-off-by: Matthew Auld <matthew.auld@intel.com>
> > > > Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
> > > > Cc: Abdiel Janulgue <abdiel.janulgue@linux.intel.com>
> > >
> > > So from a very high level this looks like it was largely modelled after
> > > i915_gem_shrink.c and not i915_gem_evict.c (our other "make room, we're
> > > running out of stuff" code). Any specific reasons?
> >
> > IIRC I think it was originally based on the patches that exposed
> > stolen-memory to userspace from a few years ago.

Forgot to add this:

Yeah, I guess those have been best-effort at most. Or at least I
never really looked at them seriously since the open source userspace
never went anywhere.
-Daniel

> > > I think i915_gem_evict is a lot closer match for what we want for vram (it
> > > started out to manage severely limitted GTT on gen2/3/4) after all. With
> > > the complication that we'll have to manage physical memory with multiple
> > > virtual mappings of it on top, so unfortunately we can't just reuse the
> > > locking patter Chris has come up with in his struct_mutex-removal branch.
> > > But at least conceptually it should be a lot closer.
> >
> > When you say make it more like i915_gem_evict, what does that mean?
> > Are you talking about the eviction roster stuff, or the
> > placement/locking of the eviction logic, with it being deep down in
> > get_pages?
>
> So there's kinda two aspects here that I meant.
>
> First is the high-level approach of the shrinker, which is a direct
> reflection of core mm low memory handling principles: Core mm just
> tries to equally shrink everyone when there's low memory, which is
> managed by watermarks, and a few other tricks. This is all only
> best-effort, and if multiple threads want a lot of memory at the same
> time then it's all going to fail with ENOMEM.
>
> On gpus otoh, and what we do in i915_gem_eviction.c for gtt (and very
> much needed with the tiny gtt for everything in gen2/3/4/5) is that
> when we run out of space, we stall, throw out everyone else, and have
> exclusive access to the entire gpu space. Then the next batchbuffer
> goes through the same dance. With this you guarantee that if you have
> a series of batchbuffers which all need e.g. 60% of lmem, they will
> all be able to execute. With the shrinker-style of low-memory handling
> eventually you're unlucky, both threads will only get up to 50%, fail
> with ENOSPC, and userspace crashes. Which is not good.
>
> The other bit is locking. Since we need to free pages from the
> shrinker there's tricky locking rules involved. Worse, we cannot back
> off from the shrinker down to e.g. the kmalloc or alloc_pages called
> that put us into reclaim. Which means the usual deadlock avoidance
> trick of having a slowpath where you first drop all the locks, then
> reacquire them in the right order doesn't work - in some cases the
> caller of kmalloc or alloc_pages could be holding a lock that we'd
> need to unlock first. Hence why the shrinker uses the
> best-effort-might-fail solution of trylocks, encoded in shrinker_lock.
>
> But for lmem we don't have such an excuse, because it's all our own
> code. The locking design can (and should!) assume that it can get out
> of any deadlock and always acquire all the locks it needs. Without
> that you can't achive the first part about guaranteeing execution of
> batches which collectively need more than 100% of lmem, but
> individually all fit. As an example if you look at the amdgpu command
> submission ioctl, that passes around ttm_operation_ctx which tracks a
> few things about locks and other bits, and if they hit a possible
> deadlock situation they can unwind the entire CS and restart by taking
> the locks in the right order.
>
> I thought I typed that up somewhere, but I guess it got lost ...
>
> Cheers, Daniel
>
> >
> > >
> > > But I might be entirely off the track with reconstructing how this code
> > > came to be, so please elaborate a bit.
> > >
> > > Thanks, Daniel
>
>
>
> --
> Daniel Vetter
> Software Engineer, Intel Corporation
> +41 (0) 79 365 57 48 - http://blog.ffwll.ch
Tang, CQ Aug. 15, 2019, 2:57 p.m. UTC | #6
> -----Original Message-----
> From: Intel-gfx [mailto:intel-gfx-bounces@lists.freedesktop.org] On Behalf
> Of Daniel Vetter
> Sent: Thursday, August 15, 2019 7:27 AM
> To: Matthew Auld <matthew.william.auld@gmail.com>
> Cc: Intel Graphics Development <intel-gfx@lists.freedesktop.org>; Auld,
> Matthew <matthew.auld@intel.com>
> Subject: Re: [Intel-gfx] [PATCH v2 03/37] drm/i915/region: support basic
> eviction
> 
> On Thu, Aug 15, 2019 at 12:48 PM Matthew Auld
> <matthew.william.auld@gmail.com> wrote:
> >
> > On Tue, 30 Jul 2019 at 17:26, Daniel Vetter <daniel@ffwll.ch> wrote:
> > >
> > > On Thu, Jun 27, 2019 at 09:55:59PM +0100, Matthew Auld wrote:
> > > > Support basic eviction for regions.
> > > >
> > > > Signed-off-by: Matthew Auld <matthew.auld@intel.com>
> > > > Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
> > > > Cc: Abdiel Janulgue <abdiel.janulgue@linux.intel.com>
> > >
> > > So from a very high level this looks like it was largely modelled
> > > after i915_gem_shrink.c and not i915_gem_evict.c (our other "make
> > > room, we're running out of stuff" code). Any specific reasons?
> >
> > IIRC I think it was originally based on the patches that exposed
> > stolen-memory to userspace from a few years ago.
> >
> > >
> > > I think i915_gem_evict is a lot closer match for what we want for
> > > vram (it started out to manage severely limitted GTT on gen2/3/4)
> > > after all. With the complication that we'll have to manage physical
> > > memory with multiple virtual mappings of it on top, so unfortunately
> > > we can't just reuse the locking patter Chris has come up with in his
> struct_mutex-removal branch.
> > > But at least conceptually it should be a lot closer.
> >
> > When you say make it more like i915_gem_evict, what does that mean?
> > Are you talking about the eviction roster stuff, or the
> > placement/locking of the eviction logic, with it being deep down in
> > get_pages?
> 
> So there's kinda two aspects here that I meant.
> 
> First is the high-level approach of the shrinker, which is a direct reflection of
> core mm low memory handling principles: Core mm just tries to equally
> shrink everyone when there's low memory, which is managed by
> watermarks, and a few other tricks. This is all only best-effort, and if multiple
> threads want a lot of memory at the same time then it's all going to fail with
> ENOMEM.
> 
> On gpus otoh, and what we do in i915_gem_eviction.c for gtt (and very much
> needed with the tiny gtt for everything in gen2/3/4/5) is that when we run
> out of space, we stall, throw out everyone else, and have exclusive access to
> the entire gpu space. Then the next batchbuffer goes through the same
> dance. With this you guarantee that if you have a series of batchbuffers
> which all need e.g. 60% of lmem, they will all be able to execute. With the
> shrinker-style of low-memory handling eventually you're unlucky, both
> threads will only get up to 50%, fail with ENOSPC, and userspace crashes.
> Which is not good.
> 
> The other bit is locking. Since we need to free pages from the shrinker
> there's tricky locking rules involved. Worse, we cannot back off from the
> shrinker down to e.g. the kmalloc or alloc_pages called that put us into
> reclaim. Which means the usual deadlock avoidance trick of having a
> slowpath where you first drop all the locks, then reacquire them in the right
> order doesn't work - in some cases the caller of kmalloc or alloc_pages could
> be holding a lock that we'd need to unlock first. Hence why the shrinker uses
> the best-effort-might-fail solution of trylocks, encoded in shrinker_lock.
> 
> But for lmem we don't have such an excuse, because it's all our own code.
> The locking design can (and should!) assume that it can get out of any
> deadlock and always acquire all the locks it needs. Without that you can't
> achive the first part about guaranteeing execution of batches which
> collectively need more than 100% of lmem, but individually all fit. As an
> example if you look at the amdgpu command submission ioctl, that passes
> around ttm_operation_ctx which tracks a few things about locks and other
> bits, and if they hit a possible deadlock situation they can unwind the entire
> CS and restart by taking the locks in the right order.

Thank you for the explanation.

What does our 'struct_mutex' protect exactly? As an example, I see that when blitter engine functions are called, we hold 'struct_mutex' first.

Can we replace 'struct_mutex' with some fine-grained locks so that we can lock obj->mm.lock first, and then lock these fine-grained locks?

I need some background info about the 'struct_mutex' design.

--CQ

> 
> I thought I typed that up somewhere, but I guess it got lost ...
> 
> Cheers, Daniel
> 
> >
> > >
> > > But I might be entirely off the track with reconstructing how this
> > > code came to be, so please elaborate a bit.
> > >
> > > Thanks, Daniel
> 
> 
> 
> --
> Daniel Vetter
> Software Engineer, Intel Corporation
> +41 (0) 79 365 57 48 - http://blog.ffwll.ch
Chris Wilson Aug. 15, 2019, 3:26 p.m. UTC | #7
Quoting Matthew Auld (2019-08-15 11:48:04)
> On Tue, 30 Jul 2019 at 17:26, Daniel Vetter <daniel@ffwll.ch> wrote:
> >
> > On Thu, Jun 27, 2019 at 09:55:59PM +0100, Matthew Auld wrote:
> > > Support basic eviction for regions.
> > >
> > > Signed-off-by: Matthew Auld <matthew.auld@intel.com>
> > > Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
> > > Cc: Abdiel Janulgue <abdiel.janulgue@linux.intel.com>
> >
> > So from a very high level this looks like it was largely modelled after
> > i915_gem_shrink.c and not i915_gem_evict.c (our other "make room, we're
> > running out of stuff" code). Any specific reasons?
> 
> IIRC I think it was originally based on the patches that exposed
> stolen-memory to userspace from a few years ago.
> 
> >
> > I think i915_gem_evict is a lot closer match for what we want for vram (it
> > started out to manage severely limitted GTT on gen2/3/4) after all. With
> > the complication that we'll have to manage physical memory with multiple
> > virtual mappings of it on top, so unfortunately we can't just reuse the
> > locking patter Chris has come up with in his struct_mutex-removal branch.
> > But at least conceptually it should be a lot closer.
> 
> When you say make it more like i915_gem_evict, what does that mean?
> Are you talking about the eviction roster stuff, or the
> placement/locking of the eviction logic, with it being deep down in
> get_pages?

The biggest difference would be the lack of region coalescing; the
eviction code only tries to free what would result in a successful
allocation. While the order in which objects are put into the scanner is
somewhat relevant, in practice fragmentation effects make the range
search somewhat slow, and we much prefer random replacement -- while
harmful, it is not biased as to whom it harms, and so the overhead is
consistent. However,
since you don't need to find a slot inside a small range within a few
million objects, I would expect LRU or even MRU (recently used objects
in games tend to be more ephemeral and so make good eviction targets, at
least according to John Carmack back in the day) to require fewer major
faults.
https://github.com/ESWAT/john-carmack-plan-archive/blob/master/by_day/johnc_plan_20000307.txt
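
For illustration, LRU ordering could be as simple as keeping
mem->purgeable in access order and evicting from the head (hypothetical
helper, not part of the patch):

	/* Keep mem->purgeable in LRU order by moving an object to the
	 * tail whenever it is used; only meaningful for objects that
	 * are already on the purgeable list. */
	static void region_obj_mark_used(struct drm_i915_gem_object *obj)
	{
		struct intel_memory_region *mem = obj->memory_region;

		mutex_lock(&mem->obj_lock);
		list_move_tail(&obj->region_link, &mem->purgeable);
		mutex_unlock(&mem->obj_lock);
	}

The eviction pass then walks mem->purgeable from the head, so the least
recently used objects are considered first.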

You would need a very similar scanner to keep a journal of the potential
frees from which to track the coalescing (slightly more complicated due
to the disjoint nature of the buddy merges). One suspects that adding
the scanner would shape the buddy_nodes more towards drm_mm_nodes.

This is also a case where real world testing of a thrashing load beats
simulation.  So just make sure the eviction doesn't stall the entire GPU
and submission pipeline and you will be forgiven most transgressions.
-Chris
Daniel Vetter Aug. 15, 2019, 4:20 p.m. UTC | #8
On Thu, Aug 15, 2019 at 4:58 PM Tang, CQ <cq.tang@intel.com> wrote:
>
>
>
> > -----Original Message-----
> > From: Intel-gfx [mailto:intel-gfx-bounces@lists.freedesktop.org] On Behalf
> > Of Daniel Vetter
> > Sent: Thursday, August 15, 2019 7:27 AM
> > To: Matthew Auld <matthew.william.auld@gmail.com>
> > Cc: Intel Graphics Development <intel-gfx@lists.freedesktop.org>; Auld,
> > Matthew <matthew.auld@intel.com>
> > Subject: Re: [Intel-gfx] [PATCH v2 03/37] drm/i915/region: support basic
> > eviction
> >
> > On Thu, Aug 15, 2019 at 12:48 PM Matthew Auld
> > <matthew.william.auld@gmail.com> wrote:
> > >
> > > On Tue, 30 Jul 2019 at 17:26, Daniel Vetter <daniel@ffwll.ch> wrote:
> > > >
> > > > On Thu, Jun 27, 2019 at 09:55:59PM +0100, Matthew Auld wrote:
> > > > > Support basic eviction for regions.
> > > > >
> > > > > Signed-off-by: Matthew Auld <matthew.auld@intel.com>
> > > > > Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
> > > > > Cc: Abdiel Janulgue <abdiel.janulgue@linux.intel.com>
> > > >
> > > > So from a very high level this looks like it was largely modelled
> > > > after i915_gem_shrink.c and not i915_gem_evict.c (our other "make
> > > > room, we're running out of stuff" code). Any specific reasons?
> > >
> > > IIRC I think it was originally based on the patches that exposed
> > > stolen-memory to userspace from a few years ago.
> > >
> > > >
> > > > I think i915_gem_evict is a lot closer match for what we want for
> > > > vram (it started out to manage severely limitted GTT on gen2/3/4)
> > > > after all. With the complication that we'll have to manage physical
> > > > memory with multiple virtual mappings of it on top, so unfortunately
> > > > we can't just reuse the locking patter Chris has come up with in his
> > struct_mutex-removal branch.
> > > > But at least conceptually it should be a lot closer.
> > >
> > > When you say make it more like i915_gem_evict, what does that mean?
> > > Are you talking about the eviction roster stuff, or the
> > > placement/locking of the eviction logic, with it being deep down in
> > > get_pages?
> >
> > So there's kinda two aspects here that I meant.
> >
> > First is the high-level approach of the shrinker, which is a direct reflection of
> > core mm low memory handling principles: Core mm just tries to equally
> > shrink everyone when there's low memory, which is managed by
> > watermarks, and a few other tricks. This is all only best-effort, and if multiple
> > threads want a lot of memory at the same time then it's all going to fail with
> > ENOMEM.
> >
> > On gpus otoh, and what we do in i915_gem_eviction.c for gtt (and very much
> > needed with the tiny gtt for everything in gen2/3/4/5) is that when we run
> > out of space, we stall, throw out everyone else, and have exclusive access to
> > the entire gpu space. Then the next batchbuffer goes through the same
> > dance. With this you guarantee that if you have a series of batchbuffers
> > which all need e.g. 60% of lmem, they will all be able to execute. With the
> > shrinker-style of low-memory handling eventually you're unlucky, both
> > threads will only get up to 50%, fail with ENOSPC, and userspace crashes.
> > Which is not good.
> >
> > The other bit is locking. Since we need to free pages from the shrinker
> > there's tricky locking rules involved. Worse, we cannot back off from the
> > shrinker down to e.g. the kmalloc or alloc_pages called that put us into
> > reclaim. Which means the usual deadlock avoidance trick of having a
> > slowpath where you first drop all the locks, then reacquire them in the right
> > order doesn't work - in some cases the caller of kmalloc or alloc_pages could
> > be holding a lock that we'd need to unlock first. Hence why the shrinker uses
> > the best-effort-might-fail solution of trylocks, encoded in shrinker_lock.
> >
> > But for lmem we don't have such an excuse, because it's all our own code.
> > The locking design can (and should!) assume that it can get out of any
> > deadlock and always acquire all the locks it needs. Without that you can't
> > achive the first part about guaranteeing execution of batches which
> > collectively need more than 100% of lmem, but individually all fit. As an
> > example if you look at the amdgpu command submission ioctl, that passes
> > around ttm_operation_ctx which tracks a few things about locks and other
> > bits, and if they hit a possible deadlock situation they can unwind the entire
> > CS and restart by taking the locks in the right order.
>
> Thank you for the explanation.
>
> What does our 'struct_mutex' protect for exactly?  As example, I see when blitter engine functions are called, we hold 'struct_mutex" first.
>
> Can we replace 'struct_mutex' with some fine-grain locks so that we can lock obj->mm.lock first, and then lock these fine-grain locks?

Sure. With lots of effort.

> I need some background info about 'struct_mutex' design.

There's not really a design behind it; it's just 10+ years of evolution.
-Daniel

> --CQ
>
> >
> > I thought I typed that up somewhere, but I guess it got lost ...
> >
> > Cheers, Daniel
> >
> > >
> > > >
> > > > But I might be entirely off the track with reconstructing how this
> > > > code came to be, so please elaborate a bit.
> > > >
> > > > Thanks, Daniel
> >
> >
> >
> > --
> > Daniel Vetter
> > Software Engineer, Intel Corporation
> > +41 (0) 79 365 57 48 - http://blog.ffwll.ch
Daniel Vetter Aug. 15, 2019, 4:23 p.m. UTC | #9
On Thu, Aug 15, 2019 at 5:26 PM Chris Wilson <chris@chris-wilson.co.uk> wrote:
>
> Quoting Matthew Auld (2019-08-15 11:48:04)
> > On Tue, 30 Jul 2019 at 17:26, Daniel Vetter <daniel@ffwll.ch> wrote:
> > >
> > > On Thu, Jun 27, 2019 at 09:55:59PM +0100, Matthew Auld wrote:
> > > > Support basic eviction for regions.
> > > >
> > > > Signed-off-by: Matthew Auld <matthew.auld@intel.com>
> > > > Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
> > > > Cc: Abdiel Janulgue <abdiel.janulgue@linux.intel.com>
> > >
> > > So from a very high level this looks like it was largely modelled after
> > > i915_gem_shrink.c and not i915_gem_evict.c (our other "make room, we're
> > > running out of stuff" code). Any specific reasons?
> >
> > IIRC I think it was originally based on the patches that exposed
> > stolen-memory to userspace from a few years ago.
> >
> > >
> > > I think i915_gem_evict is a lot closer match for what we want for vram (it
> > > started out to manage severely limitted GTT on gen2/3/4) after all. With
> > > the complication that we'll have to manage physical memory with multiple
> > > virtual mappings of it on top, so unfortunately we can't just reuse the
> > > locking patter Chris has come up with in his struct_mutex-removal branch.
> > > But at least conceptually it should be a lot closer.
> >
> > When you say make it more like i915_gem_evict, what does that mean?
> > Are you talking about the eviction roster stuff, or the
> > placement/locking of the eviction logic, with it being deep down in
> > get_pages?
>
> The biggest difference would be the lack of region coalescing; the
> eviction code only tries to free what would result in a successful
> allocation. With the order being put into the scanner somewhat relevant,
> in practice, fragmentation effects cause the range search to be somewhat
> slow and we much prefer the random replacement -- while harmful, it is
> not biased as to who it harms, and so is consistent overhead. However,
> since you don't need to find a slot inside a small range within a few
> million objects, I would expect LRU or even MRU (recently used objects
> in games tend to be more ephemeral and so made good eviction targets, at
> least according to John Carmack back in the day) to require fewer major
> faults.
> https://github.com/ESWAT/john-carmack-plan-archive/blob/master/by_day/johnc_plan_20000307.txt
>
> You would need a very similar scanner to keep a journal of the potential
> frees from which to track the coalescing (slightly more complicated due
> to the disjoint nature of the buddy merges). One suspects that adding
> the scanner would shape the buddy_nodes more towards drm_mm_nodes.
>
> This is also a case where real world testing of a thrashing load beats
> simulation.  So just make sure the eviction doesn't stall the entire GPU
> and submission pipeline and you will be forgiven most transgressions.

Yeah, the fancy roster is definitely not on the wishlist until we have
all of this optimized already. And even then it's probably better not
to be fancy, since we don't really need a contiguous block for pretty
much anything.
-Daniel
Tang, CQ Aug. 15, 2019, 4:35 p.m. UTC | #10
> -----Original Message-----
> From: Daniel Vetter [mailto:daniel@ffwll.ch]
> Sent: Thursday, August 15, 2019 9:21 AM
> To: Tang, CQ <cq.tang@intel.com>
> Cc: Matthew Auld <matthew.william.auld@gmail.com>; Intel Graphics
> Development <intel-gfx@lists.freedesktop.org>; Auld, Matthew
> <matthew.auld@intel.com>
> Subject: Re: [Intel-gfx] [PATCH v2 03/37] drm/i915/region: support basic
> eviction
> 
> On Thu, Aug 15, 2019 at 4:58 PM Tang, CQ <cq.tang@intel.com> wrote:
> >
> >
> >
> > > -----Original Message-----
> > > From: Intel-gfx [mailto:intel-gfx-bounces@lists.freedesktop.org] On
> > > Behalf Of Daniel Vetter
> > > Sent: Thursday, August 15, 2019 7:27 AM
> > > To: Matthew Auld <matthew.william.auld@gmail.com>
> > > Cc: Intel Graphics Development <intel-gfx@lists.freedesktop.org>;
> > > Auld, Matthew <matthew.auld@intel.com>
> > > Subject: Re: [Intel-gfx] [PATCH v2 03/37] drm/i915/region: support
> > > basic eviction
> > >
> > > On Thu, Aug 15, 2019 at 12:48 PM Matthew Auld
> > > <matthew.william.auld@gmail.com> wrote:
> > > >
> > > > On Tue, 30 Jul 2019 at 17:26, Daniel Vetter <daniel@ffwll.ch> wrote:
> > > > >
> > > > > On Thu, Jun 27, 2019 at 09:55:59PM +0100, Matthew Auld wrote:
> > > > > > Support basic eviction for regions.
> > > > > >
> > > > > > Signed-off-by: Matthew Auld <matthew.auld@intel.com>
> > > > > > Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
> > > > > > Cc: Abdiel Janulgue <abdiel.janulgue@linux.intel.com>
> > > > >
> > > > > So from a very high level this looks like it was largely
> > > > > modelled after i915_gem_shrink.c and not i915_gem_evict.c (our
> > > > > other "make room, we're running out of stuff" code). Any specific
> reasons?
> > > >
> > > > IIRC I think it was originally based on the patches that exposed
> > > > stolen-memory to userspace from a few years ago.
> > > >
> > > > >
> > > > > I think i915_gem_evict is a lot closer match for what we want
> > > > > for vram (it started out to manage severely limitted GTT on
> > > > > gen2/3/4) after all. With the complication that we'll have to
> > > > > manage physical memory with multiple virtual mappings of it on
> > > > > top, so unfortunately we can't just reuse the locking patter
> > > > > Chris has come up with in his
> > > struct_mutex-removal branch.
> > > > > But at least conceptually it should be a lot closer.
> > > >
> > > > When you say make it more like i915_gem_evict, what does that mean?
> > > > Are you talking about the eviction roster stuff, or the
> > > > placement/locking of the eviction logic, with it being deep down
> > > > in get_pages?
> > >
> > > So there's kinda two aspects here that I meant.
> > >
> > > First is the high-level approach of the shrinker, which is a direct
> > > reflection of core mm low memory handling principles: Core mm just
> > > tries to equally shrink everyone when there's low memory, which is
> > > managed by watermarks, and a few other tricks. This is all only
> > > best-effort, and if multiple threads want a lot of memory at the
> > > same time then it's all going to fail with ENOMEM.
> > >
> > > On gpus otoh, and what we do in i915_gem_eviction.c for gtt (and
> > > very much needed with the tiny gtt for everything in gen2/3/4/5) is
> > > that when we run out of space, we stall, throw out everyone else,
> > > and have exclusive access to the entire gpu space. Then the next
> > > batchbuffer goes through the same dance. With this you guarantee
> > > that if you have a series of batchbuffers which all need e.g. 60% of
> > > lmem, they will all be able to execute. With the shrinker-style of
> > > low-memory handling eventually you're unlucky, both threads will only
> get up to 50%, fail with ENOSPC, and userspace crashes.
> > > Which is not good.
> > >
> > > The other bit is locking. Since we need to free pages from the
> > > shrinker there's tricky locking rules involved. Worse, we cannot
> > > back off from the shrinker down to e.g. the kmalloc or alloc_pages
> > > called that put us into reclaim. Which means the usual deadlock
> > > avoidance trick of having a slowpath where you first drop all the
> > > locks, then reacquire them in the right order doesn't work - in some
> > > cases the caller of kmalloc or alloc_pages could be holding a lock
> > > that we'd need to unlock first. Hence why the shrinker uses the best-
> effort-might-fail solution of trylocks, encoded in shrinker_lock.
> > >
> > > But for lmem we don't have such an excuse, because it's all our own code.
> > > The locking design can (and should!) assume that it can get out of
> > > any deadlock and always acquire all the locks it needs. Without that
> > > you can't achive the first part about guaranteeing execution of
> > > batches which collectively need more than 100% of lmem, but
> > > individually all fit. As an example if you look at the amdgpu
> > > command submission ioctl, that passes around ttm_operation_ctx which
> > > tracks a few things about locks and other bits, and if they hit a
> > > possible deadlock situation they can unwind the entire CS and restart by
> taking the locks in the right order.
> >
> > Thank you for the explanation.
> >
> > What does our 'struct_mutex' protect for exactly?  As example, I see when
> blitter engine functions are called, we hold 'struct_mutex" first.
> >
> > Can we replace 'struct_mutex' with some fine-grain locks so that we can
> lock obj->mm.lock first, and then lock these fine-grain locks?
> 
> Sure. With lots of efforts.
> 
> > I need some background info about 'struct_mutex' design.
> 
> There's not really a design behind it, it's just 10+ years of evolution.

Yes, in the old days a big coarse-grained lock was OK. Now, with so many engines in new hardware for new computation workloads, we might need fine-grained locks.

--CQ


> -Daniel
> 
> > --CQ
> >
> > >
> > > I thought I typed that up somewhere, but I guess it got lost ...
> > >
> > > Cheers, Daniel
> > >
> > > >
> > > > >
> > > > > But I might be entirely off the track with reconstructing how
> > > > > this code came to be, so please elaborate a bit.
> > > > >
> > > > > Thanks, Daniel
> > >
> > >
> > >
> > > --
> > > Daniel Vetter
> > > Software Engineer, Intel Corporation
> > > +41 (0) 79 365 57 48 - http://blog.ffwll.ch
> 
> 
> 
> --
> Daniel Vetter
> Software Engineer, Intel Corporation
> +41 (0) 79 365 57 48 - http://blog.ffwll.ch

Patch

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object_types.h b/drivers/gpu/drm/i915/gem/i915_gem_object_types.h
index 8d760e852c4b..87000fc24ab3 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_object_types.h
+++ b/drivers/gpu/drm/i915/gem/i915_gem_object_types.h
@@ -72,6 +72,13 @@  struct drm_i915_gem_object {
 	 * List of memory region blocks allocated for this object.
 	 */
 	struct list_head blocks;
+	/**
+	 * Element within memory_region->objects or memory_region->purgeable if
+	 * the object is marked as DONTNEED. Access is protected by
+	 * memory_region->obj_lock.
+	 */
+	struct list_head region_link;
+	struct list_head eviction_link;
 
 	struct {
 		/**
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index db3744b0bc80..85677ae89849 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -1122,6 +1122,22 @@  i915_gem_madvise_ioctl(struct drm_device *dev, void *data,
 	    !i915_gem_object_has_pages(obj))
 		i915_gem_object_truncate(obj);
 
+	if (obj->memory_region) {
+		mutex_lock(&obj->memory_region->obj_lock);
+
+		switch (obj->mm.madv) {
+		case I915_MADV_WILLNEED:
+			list_move(&obj->region_link, &obj->memory_region->objects);
+			break;
+		default:
+			list_move(&obj->region_link,
+				  &obj->memory_region->purgeable);
+			break;
+		}
+
+		mutex_unlock(&obj->memory_region->obj_lock);
+	}
+
 	args->retained = obj->mm.madv != __I915_MADV_PURGED;
 	mutex_unlock(&obj->mm.lock);
 
diff --git a/drivers/gpu/drm/i915/intel_memory_region.c b/drivers/gpu/drm/i915/intel_memory_region.c
index 4c89853a7769..721b47e46492 100644
--- a/drivers/gpu/drm/i915/intel_memory_region.c
+++ b/drivers/gpu/drm/i915/intel_memory_region.c
@@ -6,6 +6,56 @@ 
 #include "intel_memory_region.h"
 #include "i915_drv.h"
 
+int i915_memory_region_evict(struct intel_memory_region *mem,
+			     resource_size_t target)
+{
+	struct drm_i915_gem_object *obj, *on;
+	resource_size_t found;
+	LIST_HEAD(purgeable);
+	int err;
+
+	err = 0;
+	found = 0;
+
+	mutex_lock(&mem->obj_lock);
+
+	list_for_each_entry(obj, &mem->purgeable, region_link) {
+		if (!i915_gem_object_has_pages(obj))
+			continue;
+
+		if (READ_ONCE(obj->pin_global))
+			continue;
+
+		if (atomic_read(&obj->bind_count))
+			continue;
+
+		list_add(&obj->eviction_link, &purgeable);
+
+		found += obj->base.size;
+		if (found >= target)
+			goto found;
+	}
+
+	err = -ENOSPC;
+found:
+	list_for_each_entry_safe(obj, on, &purgeable, eviction_link) {
+		if (!err) {
+			__i915_gem_object_put_pages(obj, I915_MM_SHRINKER);
+
+			mutex_lock_nested(&obj->mm.lock, I915_MM_SHRINKER);
+			if (!i915_gem_object_has_pages(obj))
+				obj->mm.madv = __I915_MADV_PURGED;
+			mutex_unlock(&obj->mm.lock);
+		}
+
+		list_del(&obj->eviction_link);
+	}
+
+	mutex_unlock(&mem->obj_lock);
+
+	return err;
+}
+
 static void
 memory_region_free_pages(struct drm_i915_gem_object *obj,
 			 struct sg_table *pages)
@@ -70,7 +120,8 @@  i915_memory_region_get_pages_buddy(struct drm_i915_gem_object *obj)
 		unsigned int order;
 		u64 block_size;
 		u64 offset;
-
+		bool retry = true;
+retry:
 		order = fls(n_pages) - 1;
 		GEM_BUG_ON(order > mem->mm.max_order);
 
@@ -79,9 +130,24 @@  i915_memory_region_get_pages_buddy(struct drm_i915_gem_object *obj)
 			if (!IS_ERR(block))
 				break;
 
-			/* XXX: some kind of eviction pass, local to the device */
-			if (!order--)
-				goto err_free_blocks;
+			if (!order--) {
+				resource_size_t target;
+				int err;
+
+				if (!retry)
+					goto err_free_blocks;
+
+				target = n_pages * mem->mm.min_size;
+
+				mutex_unlock(&mem->mm_lock);
+				err = i915_memory_region_evict(mem, target);
+				mutex_lock(&mem->mm_lock);
+				if (err)
+					goto err_free_blocks;
+
+				retry = false;
+				goto retry;
+			}
 		} while (1);
 
 		n_pages -= BIT(order);
@@ -136,6 +202,13 @@  void i915_memory_region_release_buddy(struct intel_memory_region *mem)
 	i915_buddy_fini(&mem->mm);
 }
 
+void i915_gem_object_release_memory_region(struct drm_i915_gem_object *obj)
+{
+	mutex_lock(&obj->memory_region->obj_lock);
+	list_del(&obj->region_link);
+	mutex_unlock(&obj->memory_region->obj_lock);
+}
+
 struct drm_i915_gem_object *
 i915_gem_object_create_region(struct intel_memory_region *mem,
 			      resource_size_t size,
@@ -164,6 +237,10 @@  i915_gem_object_create_region(struct intel_memory_region *mem,
 	INIT_LIST_HEAD(&obj->blocks);
 	obj->memory_region = mem;
 
+	mutex_lock(&mem->obj_lock);
+	list_add(&obj->region_link, &mem->objects);
+	mutex_unlock(&mem->obj_lock);
+
 	return obj;
 }
 
@@ -188,6 +265,10 @@  intel_memory_region_create(struct drm_i915_private *i915,
 	mem->min_page_size = min_page_size;
 	mem->ops = ops;
 
+	mutex_init(&mem->obj_lock);
+	INIT_LIST_HEAD(&mem->objects);
+	INIT_LIST_HEAD(&mem->purgeable);
+
 	mutex_init(&mem->mm_lock);
 
 	if (ops->init) {
diff --git a/drivers/gpu/drm/i915/intel_memory_region.h b/drivers/gpu/drm/i915/intel_memory_region.h
index 8d4736bdde50..bee0c022d295 100644
--- a/drivers/gpu/drm/i915/intel_memory_region.h
+++ b/drivers/gpu/drm/i915/intel_memory_region.h
@@ -80,8 +80,16 @@  struct intel_memory_region {
 	unsigned int type;
 	unsigned int instance;
 	unsigned int id;
+
+	/* Protects access to objects and purgeable */
+	struct mutex obj_lock;
+	struct list_head objects;
+	struct list_head purgeable;
 };
 
+int i915_memory_region_evict(struct intel_memory_region *mem,
+			     resource_size_t target);
+
 int i915_memory_region_init_buddy(struct intel_memory_region *mem);
 void i915_memory_region_release_buddy(struct intel_memory_region *mem);
 
@@ -89,6 +97,8 @@  int i915_memory_region_get_pages_buddy(struct drm_i915_gem_object *obj);
 void i915_memory_region_put_pages_buddy(struct drm_i915_gem_object *obj,
 					struct sg_table *pages);
 
+void i915_gem_object_release_memory_region(struct drm_i915_gem_object *obj);
+
 struct intel_memory_region *
 intel_memory_region_create(struct drm_i915_private *i915,
 			   resource_size_t start,
diff --git a/drivers/gpu/drm/i915/selftests/intel_memory_region.c b/drivers/gpu/drm/i915/selftests/intel_memory_region.c
index c3b160cfd713..ece499869747 100644
--- a/drivers/gpu/drm/i915/selftests/intel_memory_region.c
+++ b/drivers/gpu/drm/i915/selftests/intel_memory_region.c
@@ -76,10 +76,83 @@  static int igt_mock_fill(void *arg)
 	return err;
 }
 
+static void igt_mark_evictable(struct drm_i915_gem_object *obj)
+{
+	i915_gem_object_unpin_pages(obj);
+	obj->mm.madv = I915_MADV_DONTNEED;
+	list_move(&obj->region_link, &obj->memory_region->purgeable);
+}
+
+static int igt_mock_evict(void *arg)
+{
+	struct intel_memory_region *mem = arg;
+	struct drm_i915_gem_object *obj;
+	unsigned long n_objects;
+	LIST_HEAD(objects);
+	resource_size_t target;
+	resource_size_t total;
+	int err = 0;
+
+	target = mem->mm.min_size;
+	total = resource_size(&mem->region);
+	n_objects = total / target;
+
+	while (n_objects--) {
+		obj = i915_gem_object_create_region(mem, target, 0);
+		if (IS_ERR(obj)) {
+			err = PTR_ERR(obj);
+			goto err_close_objects;
+		}
+
+		list_add(&obj->st_link, &objects);
+
+		err = i915_gem_object_pin_pages(obj);
+		if (err)
+			goto err_close_objects;
+
+		/*
+		 * Make half of the region evictable, though do so in a
+		 * horribly fragmented fashion.
+		 */
+		if (n_objects % 2)
+			igt_mark_evictable(obj);
+	}
+
+	while (target <= total / 2) {
+		obj = i915_gem_object_create_region(mem, target, 0);
+		if (IS_ERR(obj)) {
+			err = PTR_ERR(obj);
+			goto err_close_objects;
+		}
+
+		list_add(&obj->st_link, &objects);
+
+		err = i915_gem_object_pin_pages(obj);
+		if (err) {
+			pr_err("failed to evict for target=%pa", &target);
+			goto err_close_objects;
+		}
+
+		/* Again, half of the region should remain evictable */
+		igt_mark_evictable(obj);
+
+		target <<= 1;
+	}
+
+err_close_objects:
+	close_objects(&objects);
+
+	if (err == -ENOMEM)
+		err = 0;
+
+	return err;
+}
+
 int intel_memory_region_mock_selftests(void)
 {
 	static const struct i915_subtest tests[] = {
 		SUBTEST(igt_mock_fill),
+		SUBTEST(igt_mock_evict),
 	};
 	struct intel_memory_region *mem;
 	struct drm_i915_private *i915;
diff --git a/drivers/gpu/drm/i915/selftests/mock_region.c b/drivers/gpu/drm/i915/selftests/mock_region.c
index cb942a461e9d..80eafdc54927 100644
--- a/drivers/gpu/drm/i915/selftests/mock_region.c
+++ b/drivers/gpu/drm/i915/selftests/mock_region.c
@@ -8,6 +8,7 @@ 
 static const struct drm_i915_gem_object_ops mock_region_obj_ops = {
 	.get_pages = i915_memory_region_get_pages_buddy,
 	.put_pages = i915_memory_region_put_pages_buddy,
+	.release = i915_gem_object_release_memory_region,
 };
 
 static struct drm_i915_gem_object *