[01/12] drm/i915/guc: Avoid reclaim locks during reset

Message ID	20190701100502.15639-1-chris@chris-wilson.co.uk (mailing list archive)
State	New, archived
Headers
Return-Path: <intel-gfx-bounces@lists.freedesktop.org>
From: Chris Wilson <chris@chris-wilson.co.uk>
To: intel-gfx@lists.freedesktop.org
Date: Mon, 1 Jul 2019 11:04:51 +0100
Message-Id: <20190701100502.15639-1-chris@chris-wilson.co.uk>
MIME-Version: 1.0
Subject: [Intel-gfx] [PATCH 01/12] drm/i915/guc: Avoid reclaim locks during reset
Precedence: list
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: base64
Errors-To: intel-gfx-bounces@lists.freedesktop.org
Sender: "Intel-gfx" <intel-gfx-bounces@lists.freedesktop.org>
Series
[01/12] drm/i915/guc: Avoid reclaim locks during reset
[02/12] drm/i915: Markup potential lock for i915_active
[03/12] drm/i915: Mark up vma->active as safe for use inside shrinkers
[04/12] drm/i915/execlists: Refactor CSB state machine
[05/12] drm/i915/execlists: Hesitate before slicing
[06/12] drm/i915/selftests: Lock the drm_mm while modifying
[07/12] drm/i915: Teach execbuffer to take the engine wakeref not GT
[08/12] drm/i915/gt: Track timeline activeness in enter/exit
[09/12] drm/i915/gt: Convert timeline tracking to spinlock
[10/12] drm/i915/gt: Guard timeline pinning with its own mutex
[11/12] drm/i915: Protect request retirement with timeline->mutex
[12/12] drm/i915: Replace struct_mutex for batch pool serialisation

Commit Message

Chris Wilson July 1, 2019, 10:04 a.m. UTC

During reset, we must be very selective in which locks we take as most
are tainted by being held across a wait or reclaim (kmalloc) which
implicitly waits. Inside the guc reset path, we reset the ADS to sane
defaults, but must keep it pinned from initialisation to avoid having to
pin it during reset.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
---
 drivers/gpu/drm/i915/intel_guc.h     |  4 ++++
 drivers/gpu/drm/i915/intel_guc_ads.c | 26 +++++++++++++-------------
 2 files changed, 17 insertions(+), 13 deletions(-)
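
As a rough illustration of the pattern the commit message describes, the following is a minimal userspace sketch, not the driver code. All names here (guc_model, ads_blob_model, ADS_DEFAULT_POLICIES, the guc_model_* functions) are hypothetical stand-ins, and calloc() merely models the step that may allocate. The point is only that the mapping is established once at creation, so the reset path can rewrite the defaults without allocating or taking a lock.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-ins for illustration; not the i915 structures. */
#define ADS_DEFAULT_POLICIES 0x1u

struct ads_blob_model {
	unsigned int policies;
};

struct guc_model {
	struct ads_blob_model *ads_blob; /* mapped once, valid until destroy */
};

/* Writes defaults through the long-lived mapping: no allocation, no locks. */
void guc_model_ads_init(struct guc_model *guc)
{
	assert(guc->ads_blob); /* must already be mapped by create */
	guc->ads_blob->policies = ADS_DEFAULT_POLICIES;
}

/* Runs outside reset, so it may allocate; calloc() models the mapping step. */
int guc_model_ads_create(struct guc_model *guc)
{
	struct ads_blob_model *blob = calloc(1, sizeof(*blob));

	if (!blob)
		return -1;

	guc->ads_blob = blob;
	guc_model_ads_init(guc);
	return 0;
}

/* Reset path: only reuses the existing mapping. */
void guc_model_ads_reset(struct guc_model *guc)
{
	guc_model_ads_init(guc);
}

/* Drops the mapping; clearing the pointer is this sketch's own choice. */
void guc_model_ads_destroy(struct guc_model *guc)
{
	free(guc->ads_blob);
	guc->ads_blob = NULL;
}
```

Because the init step can no longer fail in this arrangement, it returns void, which mirrors the signature change to __guc_ads_init() in the patch.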

Comments

Michal Wajdeczko July 1, 2019, 12:36 p.m. UTC | #1

On Mon, 01 Jul 2019 12:04:51 +0200, Chris Wilson  
<chris@chris-wilson.co.uk> wrote:

> During reset, we must be very selective in which locks we take as most
> are tainted by being held across a wait or reclaim (kmalloc) which
> implicitly waits. Inside the guc reset path, we reset the ADS to sane
> defaults, but must keep it pinned from initialisation to avoid having to
> pin it during reset.
>
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>

Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com>

but I'm a little worried about leaving a stale guc->ads_blob below:

> @@ -183,7 +183,7 @@ int intel_guc_ads_create(struct intel_guc *guc)
> void intel_guc_ads_destroy(struct intel_guc *guc)
>  {
> -	i915_vma_unpin_and_release(&guc->ads_vma, 0);
> +	i915_vma_unpin_and_release(&guc->ads_vma, I915_VMA_RELEASE_MAP);
>  }

Maybe there is a way to get the pointer directly from the pinned/mapped vma,
without introducing an extra separate field that might go out of sync?

Michal
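
One way to picture the suggestion above is an accessor that derives the pointer from the vma each time, so there is no second field to keep in sync. This is only a userspace sketch under the assumption that the object remembers its CPU mapping while it is pinned. The types and functions below (obj_model, vma_model, guc_model, guc_model_ads_blob, guc_model_ads_release) are hypothetical and do not correspond to an existing i915 helper.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for illustration; not the i915 structures. */
struct obj_model {
	void *mapping; /* CPU address while the map is pinned, else NULL */
};

struct vma_model {
	struct obj_model *obj;
};

struct guc_model {
	struct vma_model *ads_vma; /* the only field the owner has to track */
};

/* Derive the blob pointer on demand instead of caching it in guc_model. */
void *guc_model_ads_blob(const struct guc_model *guc)
{
	if (!guc->ads_vma || !guc->ads_vma->obj)
		return NULL;

	return guc->ads_vma->obj->mapping;
}

/* Releasing the vma leaves no second pointer behind to go stale. */
void guc_model_ads_release(struct guc_model *guc)
{
	if (guc->ads_vma && guc->ads_vma->obj)
		guc->ads_vma->obj->mapping = NULL;
	guc->ads_vma = NULL;

	assert(guc_model_ads_blob(guc) == NULL);
}
```

The trade-off in this sketch is an extra pointer chase on every access in exchange for a single source of truth.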

Chris Wilson July 1, 2019, 1:48 p.m. UTC | #2

Quoting Michal Wajdeczko (2019-07-01 13:36:28)
> On Mon, 01 Jul 2019 12:04:51 +0200, Chris Wilson  
> <chris@chris-wilson.co.uk> wrote:
> 
> > During reset, we must be very selective in which locks we take as most
> > are tainted by being held across a wait or reclaim (kmalloc) which
> > implicitly waits. Inside the guc reset path, we reset the ADS to sane
> > defaults, but must keep it pinned from initialisation to avoid having to
> > pin it during reset.
> >
> > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> > Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
> 
> Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
> 
> but I'm a little worried about leaving a stale guc->ads_blob below:
> 
> > @@ -183,7 +183,7 @@ int intel_guc_ads_create(struct intel_guc *guc)
> > void intel_guc_ads_destroy(struct intel_guc *guc)
> >  {
> > -     i915_vma_unpin_and_release(&guc->ads_vma, 0);
> > +     i915_vma_unpin_and_release(&guc->ads_vma, I915_VMA_RELEASE_MAP);
> >  }
> 
> Maybe there is a way to get the pointer directly from the pinned/mapped vma,
> without introducing an extra separate field that might go out of sync?

You mean the vaddr? I look at it as your token of ownership: this is the
address I pinned. While you own that pin, it is not allowed to change.

I expect, if we ever start wanting separate concurrent views of the
object, the return from pin_map will be its own little refcount -- or
simply not be cached. So to remind myself, the cache is because vmap is
slow and we use it frequently for cmdparsing.

So we could just transfer ownership of the map entirely to the caller
and leave it to utilities like the buffer cache to retain the map. I
don't think we actually have concurrent users of the maps, but I
wouldn't bet on it.
-Chris
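
The "token of ownership" idea can be sketched as a pin count around a cached mapping. The following is a single-threaded userspace model with hypothetical names (obj_model, obj_model_pin_map, obj_model_unpin_map); calloc() stands in for the slow mapping step, and dropping the mapping on the last unpin is a simplification of this sketch rather than a description of the driver's caching policy.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for illustration; not the i915 object. */
struct obj_model {
	size_t size;            /* bytes to map */
	void *vaddr;            /* cached mapping, NULL when unmapped */
	unsigned int pin_count; /* outstanding owners of the mapping */
	unsigned int map_calls; /* how often the slow path actually ran */
};

/* Returns the mapped address; it cannot change while the caller holds a pin. */
void *obj_model_pin_map(struct obj_model *obj)
{
	if (obj->pin_count == 0) {
		/* Slow path, modelled by calloc(): taken only by the first pinner. */
		obj->vaddr = calloc(1, obj->size);
		if (!obj->vaddr)
			return NULL;
		obj->map_calls++;
	}

	obj->pin_count++;
	return obj->vaddr;
}

/* Gives up one pin; the mapping is torn down when the last owner leaves. */
void obj_model_unpin_map(struct obj_model *obj)
{
	assert(obj->pin_count > 0);

	if (--obj->pin_count == 0) {
		free(obj->vaddr);
		obj->vaddr = NULL;
	}
}
```

In this model a second pinner gets the same address without paying for the slow path again, and neither owner can have the address change underneath it.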

Daniele Ceraolo Spurio July 1, 2019, 6:12 p.m. UTC | #3

On 7/1/19 3:04 AM, Chris Wilson wrote:
> During reset, we must be very selective in which locks we take as most
> are tainted by being held across a wait or reclaim (kmalloc) which
> implicitly waits. Inside the guc reset path, we reset the ADS to sane
> defaults, but must keep it pinned from initialisation to avoid having to
> pin it during reset.
> 
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>

Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>

I'm wondering if we should add an assert when the locks are taken inside 
the reset path to catch similar issues in the future, because they could 
slip through review.

Daniele
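
A sketch of what such an assert might look like, written as a single-threaded userspace model rather than kernel code. Everything here is hypothetical (lock_model, the reclaim_tainted flag, reset_enter/reset_exit, lock_model_allowed); in the kernel this kind of checking would more naturally be expressed through lockdep annotations, which this sketch does not attempt to reproduce.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical lock model: "tainted" means it is held across waits or allocations. */
struct lock_model {
	bool reclaim_tainted;
	bool held;
};

/* Nesting depth of the reset path; a plain counter, so single-threaded only. */
static int reset_depth;

void reset_enter(void)
{
	reset_depth++;
}

void reset_exit(void)
{
	assert(reset_depth > 0);
	reset_depth--;
}

/* A tainted lock is only acceptable outside the reset path. */
bool lock_model_allowed(const struct lock_model *lock)
{
	return reset_depth == 0 || !lock->reclaim_tainted;
}

/* Trips the assert as soon as reset code reaches for a tainted lock. */
void lock_model_acquire(struct lock_model *lock)
{
	assert(lock_model_allowed(lock));
	assert(!lock->held);
	lock->held = true;
}

void lock_model_release(struct lock_model *lock)
{
	assert(lock->held);
	lock->held = false;
}
```

With a check of this shape, a tainted lock taken between reset_enter() and reset_exit() fails immediately at the call site instead of depending on review to notice it.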

> ---
>   drivers/gpu/drm/i915/intel_guc.h     |  4 ++++
>   drivers/gpu/drm/i915/intel_guc_ads.c | 26 +++++++++++++-------------
>   2 files changed, 17 insertions(+), 13 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/intel_guc.h b/drivers/gpu/drm/i915/intel_guc.h
> index d6a75bc3d7f4..d91c96679dbb 100644
> --- a/drivers/gpu/drm/i915/intel_guc.h
> +++ b/drivers/gpu/drm/i915/intel_guc.h
> @@ -35,6 +35,8 @@
>   #include "i915_utils.h"
>   #include "i915_vma.h"
>   
> +struct __guc_ads_blob;
> +
>   struct guc_preempt_work {
>   	struct work_struct work;
>   	struct intel_engine_cs *engine;
> @@ -65,6 +67,8 @@ struct intel_guc {
>   	} interrupts;
>   
>   	struct i915_vma *ads_vma;
> +	struct __guc_ads_blob *ads_blob;
> +
>   	struct i915_vma *stage_desc_pool;
>   	void *stage_desc_pool_vaddr;
>   	struct ida stage_ids;
> diff --git a/drivers/gpu/drm/i915/intel_guc_ads.c b/drivers/gpu/drm/i915/intel_guc_ads.c
> index ecb69fc94218..69859d1e047f 100644
> --- a/drivers/gpu/drm/i915/intel_guc_ads.c
> +++ b/drivers/gpu/drm/i915/intel_guc_ads.c
> @@ -83,18 +83,14 @@ struct __guc_ads_blob {
>   	u8 reg_state_buffer[GUC_S3_SAVE_SPACE_PAGES * PAGE_SIZE];
>   } __packed;
>   
> -static int __guc_ads_init(struct intel_guc *guc)
> +static void __guc_ads_init(struct intel_guc *guc)
>   {
>   	struct drm_i915_private *dev_priv = guc_to_i915(guc);
> -	struct __guc_ads_blob *blob;
> +	struct __guc_ads_blob *blob = guc->ads_blob;
>   	const u32 skipped_size = LRC_PPHWSP_SZ * PAGE_SIZE + LR_HW_CONTEXT_SIZE;
>   	u32 base;
>   	u8 engine_class;
>   
> -	blob = i915_gem_object_pin_map(guc->ads_vma->obj, I915_MAP_WB);
> -	if (IS_ERR(blob))
> -		return PTR_ERR(blob);
> -
>   	/* GuC scheduling policies */
>   	guc_policies_init(&blob->policies);
>   
> @@ -144,9 +140,7 @@ static int __guc_ads_init(struct intel_guc *guc)
>   	blob->ads.gt_system_info = base + ptr_offset(blob, system_info);
>   	blob->ads.clients_info = base + ptr_offset(blob, clients_info);
>   
> -	i915_gem_object_unpin_map(guc->ads_vma->obj);
> -
> -	return 0;
> +	i915_gem_object_flush_map(guc->ads_vma->obj);
>   }
>   
>   /**
> @@ -160,6 +154,7 @@ int intel_guc_ads_create(struct intel_guc *guc)
>   {
>   	const u32 size = PAGE_ALIGN(sizeof(struct __guc_ads_blob));
>   	struct i915_vma *vma;
> +	void *blob;
>   	int ret;
>   
>   	GEM_BUG_ON(guc->ads_vma);
> @@ -168,11 +163,16 @@ int intel_guc_ads_create(struct intel_guc *guc)
>   	if (IS_ERR(vma))
>   		return PTR_ERR(vma);
>   
> +	blob = i915_gem_object_pin_map(vma->obj, I915_MAP_WB);
> +	if (IS_ERR(blob)) {
> +		ret = PTR_ERR(blob);
> +		goto err_vma;
> +	}
> +
>   	guc->ads_vma = vma;
> +	guc->ads_blob = blob;
>   
> -	ret = __guc_ads_init(guc);
> -	if (ret)
> -		goto err_vma;
> +	__guc_ads_init(guc);
>   
>   	return 0;
>   
> @@ -183,7 +183,7 @@ int intel_guc_ads_create(struct intel_guc *guc)
>   
>   void intel_guc_ads_destroy(struct intel_guc *guc)
>   {
> -	i915_vma_unpin_and_release(&guc->ads_vma, 0);
> +	i915_vma_unpin_and_release(&guc->ads_vma, I915_VMA_RELEASE_MAP);
>   }
>   
>   /**
>

diff --git a/drivers/gpu/drm/i915/intel_guc.h b/drivers/gpu/drm/i915/intel_guc.h
index d6a75bc3d7f4..d91c96679dbb 100644
--- a/drivers/gpu/drm/i915/intel_guc.h
+++ b/drivers/gpu/drm/i915/intel_guc.h
@@ -35,6 +35,8 @@ 
 #include "i915_utils.h"
 #include "i915_vma.h"
 
+struct __guc_ads_blob;
+
 struct guc_preempt_work {
 	struct work_struct work;
 	struct intel_engine_cs *engine;
@@ -65,6 +67,8 @@  struct intel_guc {
 	} interrupts;
 
 	struct i915_vma *ads_vma;
+	struct __guc_ads_blob *ads_blob;
+
 	struct i915_vma *stage_desc_pool;
 	void *stage_desc_pool_vaddr;
 	struct ida stage_ids;
diff --git a/drivers/gpu/drm/i915/intel_guc_ads.c b/drivers/gpu/drm/i915/intel_guc_ads.c
index ecb69fc94218..69859d1e047f 100644
--- a/drivers/gpu/drm/i915/intel_guc_ads.c
+++ b/drivers/gpu/drm/i915/intel_guc_ads.c
@@ -83,18 +83,14 @@  struct __guc_ads_blob {
 	u8 reg_state_buffer[GUC_S3_SAVE_SPACE_PAGES * PAGE_SIZE];
 } __packed;
 
-static int __guc_ads_init(struct intel_guc *guc)
+static void __guc_ads_init(struct intel_guc *guc)
 {
 	struct drm_i915_private *dev_priv = guc_to_i915(guc);
-	struct __guc_ads_blob *blob;
+	struct __guc_ads_blob *blob = guc->ads_blob;
 	const u32 skipped_size = LRC_PPHWSP_SZ * PAGE_SIZE + LR_HW_CONTEXT_SIZE;
 	u32 base;
 	u8 engine_class;
 
-	blob = i915_gem_object_pin_map(guc->ads_vma->obj, I915_MAP_WB);
-	if (IS_ERR(blob))
-		return PTR_ERR(blob);
-
 	/* GuC scheduling policies */
 	guc_policies_init(&blob->policies);
 
@@ -144,9 +140,7 @@  static int __guc_ads_init(struct intel_guc *guc)
 	blob->ads.gt_system_info = base + ptr_offset(blob, system_info);
 	blob->ads.clients_info = base + ptr_offset(blob, clients_info);
 
-	i915_gem_object_unpin_map(guc->ads_vma->obj);
-
-	return 0;
+	i915_gem_object_flush_map(guc->ads_vma->obj);
 }
 
 /**
@@ -160,6 +154,7 @@  int intel_guc_ads_create(struct intel_guc *guc)
 {
 	const u32 size = PAGE_ALIGN(sizeof(struct __guc_ads_blob));
 	struct i915_vma *vma;
+	void *blob;
 	int ret;
 
 	GEM_BUG_ON(guc->ads_vma);
@@ -168,11 +163,16 @@  int intel_guc_ads_create(struct intel_guc *guc)
 	if (IS_ERR(vma))
 		return PTR_ERR(vma);
 
+	blob = i915_gem_object_pin_map(vma->obj, I915_MAP_WB);
+	if (IS_ERR(blob)) {
+		ret = PTR_ERR(blob);
+		goto err_vma;
+	}
+
 	guc->ads_vma = vma;
+	guc->ads_blob = blob;
 
-	ret = __guc_ads_init(guc);
-	if (ret)
-		goto err_vma;
+	__guc_ads_init(guc);
 
 	return 0;
 
@@ -183,7 +183,7 @@  int intel_guc_ads_create(struct intel_guc *guc)
 
 void intel_guc_ads_destroy(struct intel_guc *guc)
 {
-	i915_vma_unpin_and_release(&guc->ads_vma, 0);
+	i915_vma_unpin_and_release(&guc->ads_vma, I915_VMA_RELEASE_MAP);
 }
 
 /**
