From patchwork Tue Nov 24 11:42:04 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Chris Wilson X-Patchwork-Id: 11928107 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-16.8 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED,USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id D4BA6C56201 for ; Tue, 24 Nov 2020 11:42:33 +0000 (UTC) Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 31AC5206D8 for ; Tue, 24 Nov 2020 11:42:33 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 31AC5206D8 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=chris-wilson.co.uk Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=intel-gfx-bounces@lists.freedesktop.org Received: from gabe.freedesktop.org (localhost [127.0.0.1]) by gabe.freedesktop.org (Postfix) with ESMTP id 7E45E6E241; Tue, 24 Nov 2020 11:42:32 +0000 (UTC) Received: from fireflyinternet.com (unknown [77.68.26.236]) by gabe.freedesktop.org (Postfix) with ESMTPS id D83076E233 for ; Tue, 24 Nov 2020 11:42:30 +0000 (UTC) X-Default-Received-SPF: pass (skip=forwardok (res=PASS)) x-ip-name=78.156.65.138; Received: from build.alporthouse.com (unverified [78.156.65.138]) by fireflyinternet.com (Firefly Internet (M1)) with ESMTP id 23090921-1500050 for multiple; Tue, 24 Nov 2020 11:42:20 +0000 From: Chris Wilson To: intel-gfx@lists.freedesktop.org Date: Tue, 24 Nov 2020 11:42:04 +0000 Message-Id: <20201124114219.29020-1-chris@chris-wilson.co.uk> X-Mailer: git-send-email 2.20.1 MIME-Version: 1.0 Subject: [Intel-gfx] [PATCH 01/16] drm/i915/gem: Drop free_work for GEM contexts X-BeenThere: intel-gfx@lists.freedesktop.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Intel graphics driver community testing & development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Chris Wilson Errors-To: intel-gfx-bounces@lists.freedesktop.org Sender: "Intel-gfx" The free_list and worker was introduced in commit 5f09a9c8ab6b ("drm/i915: Allow contexts to be unreferenced locklessly"), but subsequently made redundant by the removal of the last sleeping lock in commit 2935ed5339c4 ("drm/i915: Remove logical HW ID"). As we can now free the GEM context immediately from any context, remove the deferral of the free_list v2: Lift removing the context from the global list into close(). Suggested-by: Mika Kuoppala Signed-off-by: Chris Wilson Cc: Mika Kuoppala Cc: Tvrtko Ursulin --- drivers/gpu/drm/i915/gem/i915_gem_context.c | 59 +++---------------- drivers/gpu/drm/i915/gem/i915_gem_context.h | 1 - .../gpu/drm/i915/gem/i915_gem_context_types.h | 1 - drivers/gpu/drm/i915/i915_drv.h | 3 - drivers/gpu/drm/i915/i915_gem.c | 2 - .../gpu/drm/i915/selftests/mock_gem_device.c | 2 - 6 files changed, 8 insertions(+), 60 deletions(-) diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context.c b/drivers/gpu/drm/i915/gem/i915_gem_context.c index a6299da64de4..8493af37eb8a 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_context.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_context.c @@ -333,13 +333,12 @@ static struct i915_gem_engines *default_engines(struct i915_gem_context *ctx) return e; } -static void i915_gem_context_free(struct i915_gem_context *ctx) +void i915_gem_context_release(struct kref *ref) { - GEM_BUG_ON(!i915_gem_context_is_closed(ctx)); + struct i915_gem_context *ctx = container_of(ref, typeof(*ctx), ref); - spin_lock(&ctx->i915->gem.contexts.lock); - list_del(&ctx->link); - spin_unlock(&ctx->i915->gem.contexts.lock); + trace_i915_context_free(ctx); + GEM_BUG_ON(!i915_gem_context_is_closed(ctx)); mutex_destroy(&ctx->engines_mutex); mutex_destroy(&ctx->lut_mutex); @@ -353,37 +352,6 @@ static void i915_gem_context_free(struct i915_gem_context *ctx) kfree_rcu(ctx, rcu); } -static void contexts_free_all(struct llist_node *list) -{ - struct i915_gem_context *ctx, *cn; - - llist_for_each_entry_safe(ctx, cn, list, free_link) - i915_gem_context_free(ctx); -} - -static void contexts_flush_free(struct i915_gem_contexts *gc) -{ - contexts_free_all(llist_del_all(&gc->free_list)); -} - -static void contexts_free_worker(struct work_struct *work) -{ - struct i915_gem_contexts *gc = - container_of(work, typeof(*gc), free_work); - - contexts_flush_free(gc); -} - -void i915_gem_context_release(struct kref *ref) -{ - struct i915_gem_context *ctx = container_of(ref, typeof(*ctx), ref); - struct i915_gem_contexts *gc = &ctx->i915->gem.contexts; - - trace_i915_context_free(ctx); - if (llist_add(&ctx->free_link, &gc->free_list)) - schedule_work(&gc->free_work); -} - static inline struct i915_gem_engines * __context_engines_static(const struct i915_gem_context *ctx) { @@ -632,6 +600,10 @@ static void context_close(struct i915_gem_context *ctx) */ lut_close(ctx); + spin_lock(&ctx->i915->gem.contexts.lock); + list_del(&ctx->link); + spin_unlock(&ctx->i915->gem.contexts.lock); + mutex_unlock(&ctx->mutex); /* @@ -849,9 +821,6 @@ i915_gem_create_context(struct drm_i915_private *i915, unsigned int flags) !HAS_EXECLISTS(i915)) return ERR_PTR(-EINVAL); - /* Reap the stale contexts */ - contexts_flush_free(&i915->gem.contexts); - ctx = __create_context(i915); if (IS_ERR(ctx)) return ctx; @@ -896,9 +865,6 @@ static void init_contexts(struct i915_gem_contexts *gc) { spin_lock_init(&gc->lock); INIT_LIST_HEAD(&gc->list); - - INIT_WORK(&gc->free_work, contexts_free_worker); - init_llist_head(&gc->free_list); } void i915_gem_init__contexts(struct drm_i915_private *i915) @@ -906,12 +872,6 @@ void i915_gem_init__contexts(struct drm_i915_private *i915) init_contexts(&i915->gem.contexts); } -void i915_gem_driver_release__contexts(struct drm_i915_private *i915) -{ - flush_work(&i915->gem.contexts.free_work); - rcu_barrier(); /* and flush the left over RCU frees */ -} - static int gem_context_register(struct i915_gem_context *ctx, struct drm_i915_file_private *fpriv, u32 *id) @@ -985,7 +945,6 @@ int i915_gem_context_open(struct drm_i915_private *i915, void i915_gem_context_close(struct drm_file *file) { struct drm_i915_file_private *file_priv = file->driver_priv; - struct drm_i915_private *i915 = file_priv->dev_priv; struct i915_address_space *vm; struct i915_gem_context *ctx; unsigned long idx; @@ -997,8 +956,6 @@ void i915_gem_context_close(struct drm_file *file) xa_for_each(&file_priv->vm_xa, idx, vm) i915_vm_put(vm); xa_destroy(&file_priv->vm_xa); - - contexts_flush_free(&i915->gem.contexts); } int i915_gem_vm_create_ioctl(struct drm_device *dev, void *data, diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context.h b/drivers/gpu/drm/i915/gem/i915_gem_context.h index a133f92bbedb..b5c908f3f4f2 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_context.h +++ b/drivers/gpu/drm/i915/gem/i915_gem_context.h @@ -110,7 +110,6 @@ i915_gem_context_clear_user_engines(struct i915_gem_context *ctx) /* i915_gem_context.c */ void i915_gem_init__contexts(struct drm_i915_private *i915); -void i915_gem_driver_release__contexts(struct drm_i915_private *i915); int i915_gem_context_open(struct drm_i915_private *i915, struct drm_file *file); diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context_types.h b/drivers/gpu/drm/i915/gem/i915_gem_context_types.h index ae14ca24a11f..1449f54924e0 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_context_types.h +++ b/drivers/gpu/drm/i915/gem/i915_gem_context_types.h @@ -108,7 +108,6 @@ struct i915_gem_context { /** link: place with &drm_i915_private.context_list */ struct list_head link; - struct llist_node free_link; /** * @ref: reference count diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h index 15be8debae54..1a2a1276dd31 100644 --- a/drivers/gpu/drm/i915/i915_drv.h +++ b/drivers/gpu/drm/i915/i915_drv.h @@ -1172,9 +1172,6 @@ struct drm_i915_private { struct i915_gem_contexts { spinlock_t lock; /* locks list */ struct list_head list; - - struct llist_head free_list; - struct work_struct free_work; } contexts; /* diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c index 58276694c848..17a4636ee542 100644 --- a/drivers/gpu/drm/i915/i915_gem.c +++ b/drivers/gpu/drm/i915/i915_gem.c @@ -1207,8 +1207,6 @@ void i915_gem_driver_remove(struct drm_i915_private *dev_priv) void i915_gem_driver_release(struct drm_i915_private *dev_priv) { - i915_gem_driver_release__contexts(dev_priv); - intel_gt_driver_release(&dev_priv->gt); intel_wa_list_free(&dev_priv->gt_wa_list); diff --git a/drivers/gpu/drm/i915/selftests/mock_gem_device.c b/drivers/gpu/drm/i915/selftests/mock_gem_device.c index e946bd2087d8..0188f877cab2 100644 --- a/drivers/gpu/drm/i915/selftests/mock_gem_device.c +++ b/drivers/gpu/drm/i915/selftests/mock_gem_device.c @@ -64,8 +64,6 @@ static void mock_device_release(struct drm_device *dev) mock_device_flush(i915); intel_gt_driver_remove(&i915->gt); - i915_gem_driver_release__contexts(i915); - i915_gem_drain_workqueue(i915); i915_gem_drain_freed_objects(i915); From patchwork Tue Nov 24 11:42:05 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Chris Wilson X-Patchwork-Id: 11928131 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-13.9 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,UNWANTED_LANGUAGE_BODY, URIBL_BLOCKED,USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id DCBC2C6379D for ; Tue, 24 Nov 2020 11:42:51 +0000 (UTC) Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 7DA8D206D8 for ; Tue, 24 Nov 2020 11:42:49 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 7DA8D206D8 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=chris-wilson.co.uk Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=intel-gfx-bounces@lists.freedesktop.org Received: from gabe.freedesktop.org (localhost [127.0.0.1]) by gabe.freedesktop.org (Postfix) with ESMTP id 43C646E2D8; Tue, 24 Nov 2020 11:42:46 +0000 (UTC) Received: from fireflyinternet.com (unknown [77.68.26.236]) by gabe.freedesktop.org (Postfix) with ESMTPS id 4514E6E252 for ; Tue, 24 Nov 2020 11:42:30 +0000 (UTC) X-Default-Received-SPF: pass (skip=forwardok (res=PASS)) x-ip-name=78.156.65.138; Received: from build.alporthouse.com (unverified [78.156.65.138]) by fireflyinternet.com (Firefly Internet (M1)) with ESMTP id 23090922-1500050 for multiple; Tue, 24 Nov 2020 11:42:21 +0000 From: Chris Wilson To: intel-gfx@lists.freedesktop.org Date: Tue, 24 Nov 2020 11:42:05 +0000 Message-Id: <20201124114219.29020-2-chris@chris-wilson.co.uk> X-Mailer: git-send-email 2.20.1 In-Reply-To: <20201124114219.29020-1-chris@chris-wilson.co.uk> References: <20201124114219.29020-1-chris@chris-wilson.co.uk> MIME-Version: 1.0 Subject: [Intel-gfx] [PATCH 02/16] drm/i915/gt: Track the overall awake/busy time X-BeenThere: intel-gfx@lists.freedesktop.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Intel graphics driver community testing & development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Chris Wilson Errors-To: intel-gfx-bounces@lists.freedesktop.org Sender: "Intel-gfx" Since we wake the GT up before executing a request, and go to sleep as soon as it is retired, the GT wake time not only represents how long the device is powered up, but also provides a summary, albeit an overestimate, of the device runtime. v2: s/busy/awake/ Signed-off-by: Chris Wilson --- drivers/gpu/drm/i915/gt/debugfs_gt_pm.c | 5 ++- drivers/gpu/drm/i915/gt/intel_gt_pm.c | 49 ++++++++++++++++++++++++ drivers/gpu/drm/i915/gt/intel_gt_pm.h | 2 + drivers/gpu/drm/i915/gt/intel_gt_types.h | 24 ++++++++++++ drivers/gpu/drm/i915/i915_debugfs.c | 5 ++- drivers/gpu/drm/i915/i915_pmu.c | 6 +++ include/uapi/drm/i915_drm.h | 3 +- 7 files changed, 90 insertions(+), 4 deletions(-) diff --git a/drivers/gpu/drm/i915/gt/debugfs_gt_pm.c b/drivers/gpu/drm/i915/gt/debugfs_gt_pm.c index 174a24553322..8975717ace06 100644 --- a/drivers/gpu/drm/i915/gt/debugfs_gt_pm.c +++ b/drivers/gpu/drm/i915/gt/debugfs_gt_pm.c @@ -11,6 +11,7 @@ #include "i915_drv.h" #include "intel_gt.h" #include "intel_gt_clock_utils.h" +#include "intel_gt_pm.h" #include "intel_llc.h" #include "intel_rc6.h" #include "intel_rps.h" @@ -558,7 +559,9 @@ static int rps_boost_show(struct seq_file *m, void *data) seq_printf(m, "RPS enabled? %s\n", yesno(intel_rps_is_enabled(rps))); seq_printf(m, "RPS active? %s\n", yesno(intel_rps_is_active(rps))); - seq_printf(m, "GPU busy? %s\n", yesno(gt->awake)); + seq_printf(m, "GPU busy? %s, %llums\n", + yesno(gt->awake), + ktime_to_ms(intel_gt_get_awake_time(gt))); seq_printf(m, "Boosts outstanding? %d\n", atomic_read(&rps->num_waiters)); seq_printf(m, "Interactive? %d\n", READ_ONCE(rps->power.interactive)); diff --git a/drivers/gpu/drm/i915/gt/intel_gt_pm.c b/drivers/gpu/drm/i915/gt/intel_gt_pm.c index 274aa0dd7050..c94e8ac884eb 100644 --- a/drivers/gpu/drm/i915/gt/intel_gt_pm.c +++ b/drivers/gpu/drm/i915/gt/intel_gt_pm.c @@ -39,6 +39,28 @@ static void user_forcewake(struct intel_gt *gt, bool suspend) intel_gt_pm_put(gt); } +static void runtime_begin(struct intel_gt *gt) +{ + local_irq_disable(); + write_seqcount_begin(>->stats.lock); + gt->stats.start = ktime_get(); + gt->stats.active = true; + write_seqcount_end(>->stats.lock); + local_irq_enable(); +} + +static void runtime_end(struct intel_gt *gt) +{ + local_irq_disable(); + write_seqcount_begin(>->stats.lock); + gt->stats.active = false; + gt->stats.total = + ktime_add(gt->stats.total, + ktime_sub(ktime_get(), gt->stats.start)); + write_seqcount_end(>->stats.lock); + local_irq_enable(); +} + static int __gt_unpark(struct intel_wakeref *wf) { struct intel_gt *gt = container_of(wf, typeof(*gt), wakeref); @@ -67,6 +89,7 @@ static int __gt_unpark(struct intel_wakeref *wf) i915_pmu_gt_unparked(i915); intel_gt_unpark_requests(gt); + runtime_begin(gt); return 0; } @@ -79,6 +102,7 @@ static int __gt_park(struct intel_wakeref *wf) GT_TRACE(gt, "\n"); + runtime_end(gt); intel_gt_park_requests(gt); i915_vma_parked(gt); @@ -106,6 +130,7 @@ static const struct intel_wakeref_ops wf_ops = { void intel_gt_pm_init_early(struct intel_gt *gt) { intel_wakeref_init(>->wakeref, gt->uncore->rpm, &wf_ops); + seqcount_mutex_init(>->stats.lock, >->wakeref.mutex); } void intel_gt_pm_init(struct intel_gt *gt) @@ -339,6 +364,30 @@ int intel_gt_runtime_resume(struct intel_gt *gt) return intel_uc_runtime_resume(>->uc); } +static ktime_t __intel_gt_get_awake_time(const struct intel_gt *gt) +{ + ktime_t total = gt->stats.total; + + if (gt->stats.active) + total = ktime_add(total, + ktime_sub(ktime_get(), gt->stats.start)); + + return total; +} + +ktime_t intel_gt_get_awake_time(const struct intel_gt *gt) +{ + unsigned int seq; + ktime_t total; + + do { + seq = read_seqcount_begin(>->stats.lock); + total = __intel_gt_get_awake_time(gt); + } while (read_seqcount_retry(>->stats.lock, seq)); + + return total; +} + #if IS_ENABLED(CONFIG_DRM_I915_SELFTEST) #include "selftest_gt_pm.c" #endif diff --git a/drivers/gpu/drm/i915/gt/intel_gt_pm.h b/drivers/gpu/drm/i915/gt/intel_gt_pm.h index 60f0e2fbe55c..63846a856e7e 100644 --- a/drivers/gpu/drm/i915/gt/intel_gt_pm.h +++ b/drivers/gpu/drm/i915/gt/intel_gt_pm.h @@ -58,6 +58,8 @@ int intel_gt_resume(struct intel_gt *gt); void intel_gt_runtime_suspend(struct intel_gt *gt); int intel_gt_runtime_resume(struct intel_gt *gt); +ktime_t intel_gt_get_awake_time(const struct intel_gt *gt); + static inline bool is_mock_gt(const struct intel_gt *gt) { return I915_SELFTEST_ONLY(gt->awake == -ENODEV); diff --git a/drivers/gpu/drm/i915/gt/intel_gt_types.h b/drivers/gpu/drm/i915/gt/intel_gt_types.h index 6d39a4a11bf3..c7bde529feab 100644 --- a/drivers/gpu/drm/i915/gt/intel_gt_types.h +++ b/drivers/gpu/drm/i915/gt/intel_gt_types.h @@ -87,6 +87,30 @@ struct intel_gt { u32 pm_guc_events; + struct { + bool active; + + /** + * @lock: Lock protecting the below fields. + */ + seqcount_mutex_t lock; + + /** + * @total: Total time this engine was busy. + * + * Accumulated time not counting the most recent block in cases + * where engine is currently busy (active > 0). + */ + ktime_t total; + + /** + * @start: Timestamp of the last idle to active transition. + * + * Idle is defined as active == 0, active is active > 0. + */ + ktime_t start; + } stats; + struct intel_engine_cs *engine[I915_NUM_ENGINES]; struct intel_engine_cs *engine_class[MAX_ENGINE_CLASS + 1] [MAX_ENGINE_INSTANCE + 1]; diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c index 263074c2c097..f29487ea4528 100644 --- a/drivers/gpu/drm/i915/i915_debugfs.c +++ b/drivers/gpu/drm/i915/i915_debugfs.c @@ -1314,9 +1314,10 @@ static int i915_engine_info(struct seq_file *m, void *unused) wakeref = intel_runtime_pm_get(&i915->runtime_pm); - seq_printf(m, "GT awake? %s [%d]\n", + seq_printf(m, "GT awake? %s [%d], %llums\n", yesno(i915->gt.awake), - atomic_read(&i915->gt.wakeref.count)); + atomic_read(&i915->gt.wakeref.count), + ktime_to_ms(intel_gt_get_awake_time(&i915->gt))); seq_printf(m, "CS timestamp frequency: %u Hz\n", RUNTIME_INFO(i915)->cs_timestamp_frequency_hz); diff --git a/drivers/gpu/drm/i915/i915_pmu.c b/drivers/gpu/drm/i915/i915_pmu.c index cd786ad12be7..57cd55ceeac6 100644 --- a/drivers/gpu/drm/i915/i915_pmu.c +++ b/drivers/gpu/drm/i915/i915_pmu.c @@ -488,6 +488,8 @@ config_status(struct drm_i915_private *i915, u64 config) if (!HAS_RC6(i915)) return -ENODEV; break; + case I915_PMU_GT_AWAKE: + break; default: return -ENOENT; } @@ -595,6 +597,9 @@ static u64 __i915_pmu_event_read(struct perf_event *event) case I915_PMU_RC6_RESIDENCY: val = get_rc6(&i915->gt); break; + case I915_PMU_GT_AWAKE: + val = ktime_to_ns(intel_gt_get_awake_time(&i915->gt)); + break; } } @@ -898,6 +903,7 @@ create_event_attributes(struct i915_pmu *pmu) __event(I915_PMU_REQUESTED_FREQUENCY, "requested-frequency", "M"), __event(I915_PMU_INTERRUPTS, "interrupts", NULL), __event(I915_PMU_RC6_RESIDENCY, "rc6-residency", "ns"), + __event(I915_PMU_GT_AWAKE, "awake", "ns"), }; static const struct { enum drm_i915_pmu_engine_sample sample; diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h index fa1f3d62f9a6..e3b372eaf232 100644 --- a/include/uapi/drm/i915_drm.h +++ b/include/uapi/drm/i915_drm.h @@ -177,8 +177,9 @@ enum drm_i915_pmu_engine_sample { #define I915_PMU_REQUESTED_FREQUENCY __I915_PMU_OTHER(1) #define I915_PMU_INTERRUPTS __I915_PMU_OTHER(2) #define I915_PMU_RC6_RESIDENCY __I915_PMU_OTHER(3) +#define I915_PMU_GT_AWAKE __I915_PMU_OTHER(4) -#define I915_PMU_LAST I915_PMU_RC6_RESIDENCY +#define I915_PMU_LAST I915_PMU_GT_AWAKE /* Each region is a minimum of 16k, and there are at most 255 of them. */ From patchwork Tue Nov 24 11:42:06 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Chris Wilson X-Patchwork-Id: 11928123 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-16.8 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED,USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 927C9C2D0E4 for ; Tue, 24 Nov 2020 11:42:45 +0000 (UTC) Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 2F1F92075A for ; Tue, 24 Nov 2020 11:42:45 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 2F1F92075A Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=chris-wilson.co.uk Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=intel-gfx-bounces@lists.freedesktop.org Received: from gabe.freedesktop.org (localhost [127.0.0.1]) by gabe.freedesktop.org (Postfix) with ESMTP id A8E1F6E24E; Tue, 24 Nov 2020 11:42:44 +0000 (UTC) Received: from fireflyinternet.com (unknown [77.68.26.236]) by gabe.freedesktop.org (Postfix) with ESMTPS id 218456E233 for ; Tue, 24 Nov 2020 11:42:28 +0000 (UTC) X-Default-Received-SPF: pass (skip=forwardok (res=PASS)) x-ip-name=78.156.65.138; Received: from build.alporthouse.com (unverified [78.156.65.138]) by fireflyinternet.com (Firefly Internet (M1)) with ESMTP id 23090923-1500050 for multiple; Tue, 24 Nov 2020 11:42:21 +0000 From: Chris Wilson To: intel-gfx@lists.freedesktop.org Date: Tue, 24 Nov 2020 11:42:06 +0000 Message-Id: <20201124114219.29020-3-chris@chris-wilson.co.uk> X-Mailer: git-send-email 2.20.1 In-Reply-To: <20201124114219.29020-1-chris@chris-wilson.co.uk> References: <20201124114219.29020-1-chris@chris-wilson.co.uk> MIME-Version: 1.0 Subject: [Intel-gfx] [PATCH 03/16] drm/i915/gt: Protect context lifetime with RCU X-BeenThere: intel-gfx@lists.freedesktop.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Intel graphics driver community testing & development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Chris Wilson Errors-To: intel-gfx-bounces@lists.freedesktop.org Sender: "Intel-gfx" Allow a brief period for continued access to a dead intel_context by deferring the release of the struct until after an RCU grace period. As we are using a dedicated slab cache for the contexts, we can defer the release of the slab pages via RCU, with the caveat that individual structs may be reused from the freelist within an RCU grace period. To handle that, we have to avoid clearing members of the zombie struct. This is required for a later patch to handle locking around virtual requests in the signaler, as those requests may want to move between engines and be destroyed while we are holding b->irq_lock on a physical engine. v2: Drop mutex_reinit(), if we never mark the mutex as destroyed we don't need to reset the debug code, at the loss of having the mutex debug code spot us attempting to destroy a locked mutex. v3: As the intended use will remain strongly referenced counted, with very little inflight access across reuse, drop the ctor. v4: Drop the unrequired change to remove the temporary reference around dropping the active context, and add back some more missing ctor operations. v5: The ctor is back. Tvrtko spotted that ce->signal_lock [introduced later] maybe accessed under RCU and so needs special care not to be reinitialised. v6: Don't mix SLAB_TYPESAFE_BY_RCU and RCU list iteration. Signed-off-by: Chris Wilson Reviewed-by: Tvrtko Ursulin --- drivers/gpu/drm/i915/gt/intel_context.c | 12 +++++++++--- drivers/gpu/drm/i915/gt/intel_context_types.h | 11 ++++++++++- 2 files changed, 19 insertions(+), 4 deletions(-) diff --git a/drivers/gpu/drm/i915/gt/intel_context.c b/drivers/gpu/drm/i915/gt/intel_context.c index 92a3f25c4006..d3a835212167 100644 --- a/drivers/gpu/drm/i915/gt/intel_context.c +++ b/drivers/gpu/drm/i915/gt/intel_context.c @@ -25,11 +25,18 @@ static struct intel_context *intel_context_alloc(void) return kmem_cache_zalloc(global.slab_ce, GFP_KERNEL); } -void intel_context_free(struct intel_context *ce) +static void rcu_context_free(struct rcu_head *rcu) { + struct intel_context *ce = container_of(rcu, typeof(*ce), rcu); + kmem_cache_free(global.slab_ce, ce); } +void intel_context_free(struct intel_context *ce) +{ + call_rcu(&ce->rcu, rcu_context_free); +} + struct intel_context * intel_context_create(struct intel_engine_cs *engine) { @@ -356,8 +363,7 @@ static int __intel_context_active(struct i915_active *active) } void -intel_context_init(struct intel_context *ce, - struct intel_engine_cs *engine) +intel_context_init(struct intel_context *ce, struct intel_engine_cs *engine) { GEM_BUG_ON(!engine->cops); GEM_BUG_ON(!engine->gt->vm); diff --git a/drivers/gpu/drm/i915/gt/intel_context_types.h b/drivers/gpu/drm/i915/gt/intel_context_types.h index 552cb57a2e8c..20cb5835d1c3 100644 --- a/drivers/gpu/drm/i915/gt/intel_context_types.h +++ b/drivers/gpu/drm/i915/gt/intel_context_types.h @@ -44,7 +44,16 @@ struct intel_context_ops { }; struct intel_context { - struct kref ref; + /* + * Note: Some fields may be accessed under RCU. + * + * Unless otherwise noted a field can safely be assumed to be protected + * by strong reference counting. + */ + union { + struct kref ref; /* no kref_get_unless_zero()! */ + struct rcu_head rcu; + }; struct intel_engine_cs *engine; struct intel_engine_cs *inflight; From patchwork Tue Nov 24 11:42:07 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Chris Wilson X-Patchwork-Id: 11928125 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-16.8 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED,USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 3A81FC63798 for ; Tue, 24 Nov 2020 11:42:48 +0000 (UTC) Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id D5985206D8 for ; Tue, 24 Nov 2020 11:42:47 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org D5985206D8 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=chris-wilson.co.uk Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=intel-gfx-bounces@lists.freedesktop.org Received: from gabe.freedesktop.org (localhost [127.0.0.1]) by gabe.freedesktop.org (Postfix) with ESMTP id 89BF36E2C0; Tue, 24 Nov 2020 11:42:45 +0000 (UTC) Received: from fireflyinternet.com (unknown [77.68.26.236]) by gabe.freedesktop.org (Postfix) with ESMTPS id 314A66E24E for ; Tue, 24 Nov 2020 11:42:35 +0000 (UTC) X-Default-Received-SPF: pass (skip=forwardok (res=PASS)) x-ip-name=78.156.65.138; Received: from build.alporthouse.com (unverified [78.156.65.138]) by fireflyinternet.com (Firefly Internet (M1)) with ESMTP id 23090924-1500050 for multiple; Tue, 24 Nov 2020 11:42:21 +0000 From: Chris Wilson To: intel-gfx@lists.freedesktop.org Date: Tue, 24 Nov 2020 11:42:07 +0000 Message-Id: <20201124114219.29020-4-chris@chris-wilson.co.uk> X-Mailer: git-send-email 2.20.1 In-Reply-To: <20201124114219.29020-1-chris@chris-wilson.co.uk> References: <20201124114219.29020-1-chris@chris-wilson.co.uk> MIME-Version: 1.0 Subject: [Intel-gfx] [PATCH 04/16] drm/i915/gt: Split the breadcrumb spinlock between global and contexts X-BeenThere: intel-gfx@lists.freedesktop.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Intel graphics driver community testing & development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Chris Wilson Errors-To: intel-gfx-bounces@lists.freedesktop.org Sender: "Intel-gfx" As we funnel more and more contexts into the breadcrumbs on an engine, the hold time of b->irq_lock grows. As we may then contend with the b->irq_lock during request submission, this increases the burden upon the engine->active.lock and so directly impacts both our execution latency and client latency. If we split the b->irq_lock by introducing a per-context spinlock to manage the signalers within a context, we then only need the b->irq_lock for enabling/disabling the interrupt and can avoid taking the lock for walking the list of contexts within the signal worker. Even with the current setup, this greatly reduces the number of times we have to take and fight for b->irq_lock. Furthermore, this closes the race between enabling the signaling context while it is in the process of being signaled and removed: <4>[ 416.208555] list_add corruption. prev->next should be next (ffff8881951d5910), but was dead000000000100. (prev=ffff8882781bb870). <4>[ 416.208573] WARNING: CPU: 7 PID: 0 at lib/list_debug.c:28 __list_add_valid+0x4d/0x70 <4>[ 416.208575] Modules linked in: i915(+) vgem snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic ledtrig_audio mei_hdcp x86_pkg_temp_thermal coretemp ax88179_178a usbnet mii crct10dif_pclmul snd_intel_dspcfg crc32_pclmul snd_hda_codec snd_hwdep ghash_clmulni_intel snd_hda_core e1000e snd_pcm ptp pps_core mei_me mei prime_numbers intel_lpss_pci [last unloaded: i915] <4>[ 416.208611] CPU: 7 PID: 0 Comm: swapper/7 Tainted: G U 5.8.0-CI-CI_DRM_8852+ #1 <4>[ 416.208614] Hardware name: Intel Corporation Ice Lake Client Platform/IceLake Y LPDDR4x T4 RVP TLC, BIOS ICLSFWR1.R00.3212.A00.1905212112 05/21/2019 <4>[ 416.208627] RIP: 0010:__list_add_valid+0x4d/0x70 <4>[ 416.208631] Code: c3 48 89 d1 48 c7 c7 60 18 33 82 48 89 c2 e8 ea e0 b6 ff 0f 0b 31 c0 c3 48 89 c1 4c 89 c6 48 c7 c7 b0 18 33 82 e8 d3 e0 b6 ff <0f> 0b 31 c0 c3 48 89 f2 4c 89 c1 48 89 fe 48 c7 c7 00 19 33 82 e8 <4>[ 416.208633] RSP: 0018:ffffc90000280e18 EFLAGS: 00010086 <4>[ 416.208636] RAX: 0000000000000000 RBX: ffff888250a44880 RCX: 0000000000000105 <4>[ 416.208639] RDX: 0000000000000105 RSI: ffffffff82320c5b RDI: 00000000ffffffff <4>[ 416.208641] RBP: ffff8882781bb870 R08: 0000000000000000 R09: 0000000000000001 <4>[ 416.208643] R10: 00000000054d2957 R11: 000000006abbd991 R12: ffff8881951d58c8 <4>[ 416.208646] R13: ffff888286073880 R14: ffff888286073848 R15: ffff8881951d5910 <4>[ 416.208669] FS: 0000000000000000(0000) GS:ffff88829c180000(0000) knlGS:0000000000000000 <4>[ 416.208671] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 <4>[ 416.208673] CR2: 0000556231326c48 CR3: 0000000005610001 CR4: 0000000000760ee0 <4>[ 416.208675] PKRU: 55555554 <4>[ 416.208677] Call Trace: <4>[ 416.208679] <4>[ 416.208751] i915_request_enable_breadcrumb+0x278/0x400 [i915] <4>[ 416.208839] __i915_request_submit+0xca/0x2a0 [i915] <4>[ 416.208892] __execlists_submission_tasklet+0x480/0x1830 [i915] <4>[ 416.208942] execlists_submission_tasklet+0xc4/0x130 [i915] <4>[ 416.208947] tasklet_action_common.isra.17+0x6c/0x1c0 <4>[ 416.208954] __do_softirq+0xdf/0x498 <4>[ 416.208960] ? handle_fasteoi_irq+0x150/0x150 <4>[ 416.208964] asm_call_on_stack+0xf/0x20 <4>[ 416.208966] <4>[ 416.208969] do_softirq_own_stack+0xa1/0xc0 <4>[ 416.208972] irq_exit_rcu+0xb5/0xc0 <4>[ 416.208976] common_interrupt+0xf7/0x260 <4>[ 416.208980] asm_common_interrupt+0x1e/0x40 <4>[ 416.208985] RIP: 0010:cpuidle_enter_state+0xb6/0x410 <4>[ 416.208987] Code: 00 31 ff e8 9c 3e 89 ff 80 7c 24 0b 00 74 12 9c 58 f6 c4 02 0f 85 31 03 00 00 31 ff e8 e3 6c 90 ff e8 fe a4 94 ff fb 45 85 ed <0f> 88 c7 02 00 00 49 63 c5 4c 2b 24 24 48 8d 14 40 48 8d 14 90 48 <4>[ 416.208989] RSP: 0018:ffffc90000143e70 EFLAGS: 00000206 <4>[ 416.208991] RAX: 0000000000000007 RBX: ffffe8ffffda8070 RCX: 0000000000000000 <4>[ 416.208993] RDX: 0000000000000000 RSI: ffffffff8238b4ee RDI: ffffffff8233184f <4>[ 416.208995] RBP: ffffffff826b4e00 R08: 0000000000000000 R09: 0000000000000000 <4>[ 416.208997] R10: 0000000000000001 R11: 0000000000000000 R12: 00000060e7f24a8f <4>[ 416.208998] R13: 0000000000000003 R14: 0000000000000003 R15: 0000000000000003 <4>[ 416.209012] cpuidle_enter+0x24/0x40 <4>[ 416.209016] do_idle+0x22f/0x2d0 <4>[ 416.209022] cpu_startup_entry+0x14/0x20 <4>[ 416.209025] start_secondary+0x158/0x1a0 <4>[ 416.209030] secondary_startup_64+0xa4/0xb0 <4>[ 416.209039] irq event stamp: 10186977 <4>[ 416.209042] hardirqs last enabled at (10186976): [] tasklet_action_common.isra.17+0xe3/0x1c0 <4>[ 416.209044] hardirqs last disabled at (10186977): [] _raw_spin_lock_irqsave+0xd/0x50 <4>[ 416.209047] softirqs last enabled at (10186968): [] irq_enter_rcu+0x6a/0x70 <4>[ 416.209049] softirqs last disabled at (10186969): [] asm_call_on_stack+0xf/0x20 <4>[ 416.209317] list_del corruption, ffff8882781bb870->next is LIST_POISON1 (dead000000000100) <4>[ 416.209317] WARNING: CPU: 7 PID: 46 at lib/list_debug.c:47 __list_del_entry_valid+0x4e/0x90 <4>[ 416.209317] Modules linked in: i915(+) vgem snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic ledtrig_audio mei_hdcp x86_pkg_temp_thermal coretemp ax88179_178a usbnet mii crct10dif_pclmul snd_intel_dspcfg crc32_pclmul snd_hda_codec snd_hwdep ghash_clmulni_intel snd_hda_core e1000e snd_pcm ptp pps_core mei_me mei prime_numbers intel_lpss_pci [last unloaded: i915] <4>[ 416.209317] CPU: 7 PID: 46 Comm: ksoftirqd/7 Tainted: G U W 5.8.0-CI-CI_DRM_8852+ #1 <4>[ 416.209317] Hardware name: Intel Corporation Ice Lake Client Platform/IceLake Y LPDDR4x T4 RVP TLC, BIOS ICLSFWR1.R00.3212.A00.1905212112 05/21/2019 <4>[ 416.209317] RIP: 0010:__list_del_entry_valid+0x4e/0x90 <4>[ 416.209317] Code: 2e 48 8b 32 48 39 fe 75 3a 48 8b 50 08 48 39 f2 75 48 b8 01 00 00 00 c3 48 89 fe 48 89 c2 48 c7 c7 38 19 33 82 e8 62 e0 b6 ff <0f> 0b 31 c0 c3 48 89 fe 48 c7 c7 70 19 33 82 e8 4e e0 b6 ff 0f 0b <4>[ 416.209317] RSP: 0018:ffffc90000280de8 EFLAGS: 00010086 <4>[ 416.209317] RAX: 0000000000000000 RBX: ffff8882781bb848 RCX: 0000000000010104 <4>[ 416.209317] RDX: 0000000000010104 RSI: ffffffff8238b4ee RDI: 00000000ffffffff <4>[ 416.209317] RBP: ffff8882781bb880 R08: 0000000000000000 R09: 0000000000000001 <4>[ 416.209317] R10: 000000009fb6666e R11: 00000000feca9427 R12: ffffc90000280e18 <4>[ 416.209317] R13: ffff8881951d5930 R14: dead0000000000d8 R15: ffff8882781bb880 <4>[ 416.209317] FS: 0000000000000000(0000) GS:ffff88829c180000(0000) knlGS:0000000000000000 <4>[ 416.209317] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 <4>[ 416.209317] CR2: 0000556231326c48 CR3: 0000000005610001 CR4: 0000000000760ee0 <4>[ 416.209317] PKRU: 55555554 <4>[ 416.209317] Call Trace: <4>[ 416.209317] <4>[ 416.209317] remove_signaling_context.isra.13+0xd/0x70 [i915] <4>[ 416.209513] signal_irq_work+0x1f7/0x4b0 [i915] This is caused by virtual engines where although we take the breadcrumb lock on each of the active engines, they may be different engines on different requests, It turns out that the b->irq_lock was not a sufficient proxy for the engine->active.lock in the case of more than one request, so introduce an explicit lock around ce->signals. v2: ce->signal_lock is acquired with only RCU protection and so must be treated carefully and not cleared during reallocation. We also then need to confirm that the ce we lock is the same as we found in the breadcrumb list. Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/2276 Fixes: f94343d0a622 ("drm/i915: Remove requirement for holding i915_request.lock for breadcrumbs") Fixes: bda4d4db6dd6 ("drm/i915/gt: Replace intel_engine_transfer_stale_breadcrumbs") Signed-off-by: Chris Wilson Cc: Tvrtko Ursulin --- drivers/gpu/drm/i915/gt/intel_breadcrumbs.c | 168 ++++++++---------- .../gpu/drm/i915/gt/intel_breadcrumbs_types.h | 6 +- drivers/gpu/drm/i915/gt/intel_context.c | 3 +- drivers/gpu/drm/i915/gt/intel_context_types.h | 12 +- drivers/gpu/drm/i915/i915_request.h | 6 +- 5 files changed, 90 insertions(+), 105 deletions(-) diff --git a/drivers/gpu/drm/i915/gt/intel_breadcrumbs.c b/drivers/gpu/drm/i915/gt/intel_breadcrumbs.c index cf6e05ea4d8f..a24cc1ff08a0 100644 --- a/drivers/gpu/drm/i915/gt/intel_breadcrumbs.c +++ b/drivers/gpu/drm/i915/gt/intel_breadcrumbs.c @@ -101,18 +101,37 @@ static void __intel_breadcrumbs_disarm_irq(struct intel_breadcrumbs *b) intel_gt_pm_put_async(b->irq_engine->gt); } +static void intel_breadcrumbs_disarm_irq(struct intel_breadcrumbs *b) +{ + spin_lock(&b->irq_lock); + if (b->irq_armed) + __intel_breadcrumbs_disarm_irq(b); + spin_unlock(&b->irq_lock); +} + static void add_signaling_context(struct intel_breadcrumbs *b, struct intel_context *ce) { - intel_context_get(ce); - list_add_tail(&ce->signal_link, &b->signalers); + lockdep_assert_held(&ce->signal_lock); + + spin_lock(&b->signalers_lock); + list_add_rcu(&ce->signal_link, &b->signalers); + spin_unlock(&b->signalers_lock); } -static void remove_signaling_context(struct intel_breadcrumbs *b, +static bool remove_signaling_context(struct intel_breadcrumbs *b, struct intel_context *ce) { - list_del(&ce->signal_link); - intel_context_put(ce); + lockdep_assert_held(&ce->signal_lock); + + if (!list_empty(&ce->signals)) + return false; + + spin_lock(&b->signalers_lock); + list_del_rcu(&ce->signal_link); + spin_unlock(&b->signalers_lock); + + return true; } static inline bool __request_completed(const struct i915_request *rq) @@ -175,6 +194,8 @@ static void add_retire(struct intel_breadcrumbs *b, struct intel_timeline *tl) static bool __signal_request(struct i915_request *rq) { + GEM_BUG_ON(test_bit(I915_FENCE_FLAG_SIGNAL, &rq->fence.flags)); + if (!__dma_fence_signal(&rq->fence)) { i915_request_put(rq); return false; @@ -195,15 +216,12 @@ static void signal_irq_work(struct irq_work *work) struct intel_breadcrumbs *b = container_of(work, typeof(*b), irq_work); const ktime_t timestamp = ktime_get(); struct llist_node *signal, *sn; - struct intel_context *ce, *cn; - struct list_head *pos, *next; + struct intel_context *ce; signal = NULL; if (unlikely(!llist_empty(&b->signaled_requests))) signal = llist_del_all(&b->signaled_requests); - spin_lock(&b->irq_lock); - /* * Keep the irq armed until the interrupt after all listeners are gone. * @@ -229,47 +247,44 @@ static void signal_irq_work(struct irq_work *work) * interrupt draw less ire from other users of the system and tools * like powertop. */ - if (!signal && b->irq_armed && list_empty(&b->signalers)) - __intel_breadcrumbs_disarm_irq(b); + if (!signal && READ_ONCE(b->irq_armed) && list_empty(&b->signalers)) + intel_breadcrumbs_disarm_irq(b); - list_for_each_entry_safe(ce, cn, &b->signalers, signal_link) { - GEM_BUG_ON(list_empty(&ce->signals)); + rcu_read_lock(); + list_for_each_entry_rcu(ce, &b->signalers, signal_link) { + struct i915_request *rq; - list_for_each_safe(pos, next, &ce->signals) { - struct i915_request *rq = - list_entry(pos, typeof(*rq), signal_link); + list_for_each_entry_rcu(rq, &ce->signals, signal_link) { + bool release; - GEM_BUG_ON(!check_signal_order(ce, rq)); if (!__request_completed(rq)) break; + if (!test_and_clear_bit(I915_FENCE_FLAG_SIGNAL, + &rq->fence.flags)) + break; + /* * Queue for execution after dropping the signaling * spinlock as the callback chain may end up adding * more signalers to the same context or engine. */ - clear_bit(I915_FENCE_FLAG_SIGNAL, &rq->fence.flags); + spin_lock(&ce->signal_lock); + list_del_rcu(&rq->signal_link); + release = remove_signaling_context(b, ce); + spin_unlock(&ce->signal_lock); + if (__signal_request(rq)) /* We own signal_node now, xfer to local list */ signal = slist_add(&rq->signal_node, signal); - } - /* - * We process the list deletion in bulk, only using a list_add - * (not list_move) above but keeping the status of - * rq->signal_link known with the I915_FENCE_FLAG_SIGNAL bit. - */ - if (!list_is_first(pos, &ce->signals)) { - /* Advance the list to the first incomplete request */ - __list_del_many(&ce->signals, pos); - if (&ce->signals == pos) { /* now empty */ + if (release) { add_retire(b, ce->timeline); - remove_signaling_context(b, ce); + intel_context_put(ce); } } } - - spin_unlock(&b->irq_lock); + rcu_read_unlock(); llist_for_each_safe(signal, sn, signal) { struct i915_request *rq = @@ -298,14 +313,15 @@ intel_breadcrumbs_create(struct intel_engine_cs *irq_engine) if (!b) return NULL; - spin_lock_init(&b->irq_lock); + b->irq_engine = irq_engine; + + spin_lock_init(&b->signalers_lock); INIT_LIST_HEAD(&b->signalers); init_llist_head(&b->signaled_requests); + spin_lock_init(&b->irq_lock); init_irq_work(&b->irq_work, signal_irq_work); - b->irq_engine = irq_engine; - return b; } @@ -347,9 +363,9 @@ void intel_breadcrumbs_free(struct intel_breadcrumbs *b) kfree(b); } -static void insert_breadcrumb(struct i915_request *rq, - struct intel_breadcrumbs *b) +static void insert_breadcrumb(struct i915_request *rq) { + struct intel_breadcrumbs *b = READ_ONCE(rq->engine)->breadcrumbs; struct intel_context *ce = rq->context; struct list_head *pos; @@ -371,6 +387,7 @@ static void insert_breadcrumb(struct i915_request *rq, } if (list_empty(&ce->signals)) { + intel_context_get(ce); add_signaling_context(b, ce); pos = &ce->signals; } else { @@ -396,8 +413,9 @@ static void insert_breadcrumb(struct i915_request *rq, break; } } - list_add(&rq->signal_link, pos); + list_add_rcu(&rq->signal_link, pos); GEM_BUG_ON(!check_signal_order(ce, rq)); + GEM_BUG_ON(test_bit(DMA_FENCE_FLAG_SIGNALED_BIT, &rq->fence.flags)); set_bit(I915_FENCE_FLAG_SIGNAL, &rq->fence.flags); /* @@ -410,7 +428,7 @@ static void insert_breadcrumb(struct i915_request *rq, bool i915_request_enable_breadcrumb(struct i915_request *rq) { - struct intel_breadcrumbs *b; + struct intel_context *ce = rq->context; /* Serialises with i915_request_retire() using rq->lock */ if (test_bit(DMA_FENCE_FLAG_SIGNALED_BIT, &rq->fence.flags)) @@ -425,67 +443,30 @@ bool i915_request_enable_breadcrumb(struct i915_request *rq) if (!test_bit(I915_FENCE_FLAG_ACTIVE, &rq->fence.flags)) return true; - /* - * rq->engine is locked by rq->engine->active.lock. That however - * is not known until after rq->engine has been dereferenced and - * the lock acquired. Hence we acquire the lock and then validate - * that rq->engine still matches the lock we hold for it. - * - * Here, we are using the breadcrumb lock as a proxy for the - * rq->engine->active.lock, and we know that since the breadcrumb - * will be serialised within i915_request_submit/i915_request_unsubmit, - * the engine cannot change while active as long as we hold the - * breadcrumb lock on that engine. - * - * From the dma_fence_enable_signaling() path, we are outside of the - * request submit/unsubmit path, and so we must be more careful to - * acquire the right lock. - */ - b = READ_ONCE(rq->engine)->breadcrumbs; - spin_lock(&b->irq_lock); - while (unlikely(b != READ_ONCE(rq->engine)->breadcrumbs)) { - spin_unlock(&b->irq_lock); - b = READ_ONCE(rq->engine)->breadcrumbs; - spin_lock(&b->irq_lock); - } - - /* - * Now that we are finally serialised with request submit/unsubmit, - * [with b->irq_lock] and with i915_request_retire() [via checking - * SIGNALED with rq->lock] confirm the request is indeed active. If - * it is no longer active, the breadcrumb will be attached upon - * i915_request_submit(). - */ + spin_lock(&ce->signal_lock); if (test_bit(I915_FENCE_FLAG_ACTIVE, &rq->fence.flags)) - insert_breadcrumb(rq, b); - - spin_unlock(&b->irq_lock); + insert_breadcrumb(rq); + spin_unlock(&ce->signal_lock); return true; } void i915_request_cancel_breadcrumb(struct i915_request *rq) { - struct intel_breadcrumbs *b = rq->engine->breadcrumbs; + struct intel_context *ce = rq->context; + bool release; - /* - * We must wait for b->irq_lock so that we know the interrupt handler - * has released its reference to the intel_context and has completed - * the DMA_FENCE_FLAG_SIGNALED_BIT/I915_FENCE_FLAG_SIGNAL dance (if - * required). - */ - spin_lock(&b->irq_lock); - if (test_bit(I915_FENCE_FLAG_SIGNAL, &rq->fence.flags)) { - struct intel_context *ce = rq->context; + if (!test_and_clear_bit(I915_FENCE_FLAG_SIGNAL, &rq->fence.flags)) + return; - list_del(&rq->signal_link); - if (list_empty(&ce->signals)) - remove_signaling_context(b, ce); + spin_lock(&ce->signal_lock); + list_del_rcu(&rq->signal_link); + release = remove_signaling_context(rq->engine->breadcrumbs, ce); + spin_unlock(&ce->signal_lock); + if (release) + intel_context_put(ce); - clear_bit(I915_FENCE_FLAG_SIGNAL, &rq->fence.flags); - i915_request_put(rq); - } - spin_unlock(&b->irq_lock); + i915_request_put(rq); } static void print_signals(struct intel_breadcrumbs *b, struct drm_printer *p) @@ -495,18 +476,17 @@ static void print_signals(struct intel_breadcrumbs *b, struct drm_printer *p) drm_printf(p, "Signals:\n"); - spin_lock_irq(&b->irq_lock); - list_for_each_entry(ce, &b->signalers, signal_link) { - list_for_each_entry(rq, &ce->signals, signal_link) { + rcu_read_lock(); + list_for_each_entry_rcu(ce, &b->signalers, signal_link) { + list_for_each_entry_rcu(rq, &ce->signals, signal_link) drm_printf(p, "\t[%llx:%llx%s] @ %dms\n", rq->fence.context, rq->fence.seqno, i915_request_completed(rq) ? "!" : i915_request_started(rq) ? "*" : "", jiffies_to_msecs(jiffies - rq->emitted_jiffies)); - } } - spin_unlock_irq(&b->irq_lock); + rcu_read_unlock(); } void intel_engine_print_breadcrumbs(struct intel_engine_cs *engine, diff --git a/drivers/gpu/drm/i915/gt/intel_breadcrumbs_types.h b/drivers/gpu/drm/i915/gt/intel_breadcrumbs_types.h index 3fa19820b37a..a74bb3062bd8 100644 --- a/drivers/gpu/drm/i915/gt/intel_breadcrumbs_types.h +++ b/drivers/gpu/drm/i915/gt/intel_breadcrumbs_types.h @@ -29,18 +29,16 @@ * the overhead of waking that client is much preferred. */ struct intel_breadcrumbs { - spinlock_t irq_lock; /* protects the lists used in hardirq context */ - /* Not all breadcrumbs are attached to physical HW */ struct intel_engine_cs *irq_engine; + spinlock_t signalers_lock; /* protects the list of signalers */ struct list_head signalers; struct llist_head signaled_requests; + spinlock_t irq_lock; /* protects the interrupt from hardirq context */ struct irq_work irq_work; /* for use from inside irq_lock */ - unsigned int irq_enabled; - bool irq_armed; }; diff --git a/drivers/gpu/drm/i915/gt/intel_context.c b/drivers/gpu/drm/i915/gt/intel_context.c index d3a835212167..349e7fa1488d 100644 --- a/drivers/gpu/drm/i915/gt/intel_context.c +++ b/drivers/gpu/drm/i915/gt/intel_context.c @@ -379,7 +379,8 @@ intel_context_init(struct intel_context *ce, struct intel_engine_cs *engine) ce->vm = i915_vm_get(engine->gt->vm); - INIT_LIST_HEAD(&ce->signal_link); + /* NB ce->signal_link/lock is used under RCU */ + spin_lock_init(&ce->signal_lock); INIT_LIST_HEAD(&ce->signals); mutex_init(&ce->pin_mutex); diff --git a/drivers/gpu/drm/i915/gt/intel_context_types.h b/drivers/gpu/drm/i915/gt/intel_context_types.h index 20cb5835d1c3..52fa9c132746 100644 --- a/drivers/gpu/drm/i915/gt/intel_context_types.h +++ b/drivers/gpu/drm/i915/gt/intel_context_types.h @@ -25,6 +25,7 @@ DECLARE_EWMA(runtime, 3, 8); struct i915_gem_context; struct i915_gem_ww_ctx; struct i915_vma; +struct intel_breadcrumbs; struct intel_context; struct intel_ring; @@ -63,8 +64,15 @@ struct intel_context { struct i915_address_space *vm; struct i915_gem_context __rcu *gem_context; - struct list_head signal_link; - struct list_head signals; + /* + * @signal_lock protects the list of requests that need signaling, + * @signals. While there are any requests that need signaling, + * we add the context to the breadcrumbs worker, and remove it + * upon completion/cancellation of the last request. + */ + struct list_head signal_link; /* Accessed under RCU */ + struct list_head signals; /* Guarded by signal_lock */ + spinlock_t signal_lock; /* protects signals, the list of requests */ struct i915_vma *state; struct intel_ring *ring; diff --git a/drivers/gpu/drm/i915/i915_request.h b/drivers/gpu/drm/i915/i915_request.h index b222f7b46e9c..92e4320c50c4 100644 --- a/drivers/gpu/drm/i915/i915_request.h +++ b/drivers/gpu/drm/i915/i915_request.h @@ -178,10 +178,8 @@ struct i915_request { struct intel_ring *ring; struct intel_timeline __rcu *timeline; - union { - struct list_head signal_link; - struct llist_node signal_node; - }; + struct list_head signal_link; + struct llist_node signal_node; /* * The rcu epoch of when this request was allocated. Used to judiciously From patchwork Tue Nov 24 11:42:08 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Chris Wilson X-Patchwork-Id: 11928127 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-16.8 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED,USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id EE85EC56201 for ; Tue, 24 Nov 2020 11:42:47 +0000 (UTC) Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 943852075A for ; Tue, 24 Nov 2020 11:42:47 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 943852075A Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=chris-wilson.co.uk Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=intel-gfx-bounces@lists.freedesktop.org Received: from gabe.freedesktop.org (localhost [127.0.0.1]) by gabe.freedesktop.org (Postfix) with ESMTP id 80F876E270; Tue, 24 Nov 2020 11:42:45 +0000 (UTC) Received: from fireflyinternet.com (unknown [77.68.26.236]) by gabe.freedesktop.org (Postfix) with ESMTPS id 2EFAD6E24E for ; Tue, 24 Nov 2020 11:42:30 +0000 (UTC) X-Default-Received-SPF: pass (skip=forwardok (res=PASS)) x-ip-name=78.156.65.138; Received: from build.alporthouse.com (unverified [78.156.65.138]) by fireflyinternet.com (Firefly Internet (M1)) with ESMTP id 23090925-1500050 for multiple; Tue, 24 Nov 2020 11:42:21 +0000 From: Chris Wilson To: intel-gfx@lists.freedesktop.org Date: Tue, 24 Nov 2020 11:42:08 +0000 Message-Id: <20201124114219.29020-5-chris@chris-wilson.co.uk> X-Mailer: git-send-email 2.20.1 In-Reply-To: <20201124114219.29020-1-chris@chris-wilson.co.uk> References: <20201124114219.29020-1-chris@chris-wilson.co.uk> MIME-Version: 1.0 Subject: [Intel-gfx] [PATCH 05/16] drm/i915/gt: Move the breadcrumb to the signaler if completed upon cancel X-BeenThere: intel-gfx@lists.freedesktop.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Intel graphics driver community testing & development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Chris Wilson Errors-To: intel-gfx-bounces@lists.freedesktop.org Sender: "Intel-gfx" If while we are cancelling the breadcrumb signaling, we find that the request is already completed, move it to the irq signaler and let it be signaled. Signed-off-by: Chris Wilson --- drivers/gpu/drm/i915/gt/intel_breadcrumbs.c | 20 ++++++++++++++++---- 1 file changed, 16 insertions(+), 4 deletions(-) diff --git a/drivers/gpu/drm/i915/gt/intel_breadcrumbs.c b/drivers/gpu/drm/i915/gt/intel_breadcrumbs.c index a24cc1ff08a0..f5f6feed0fa6 100644 --- a/drivers/gpu/drm/i915/gt/intel_breadcrumbs.c +++ b/drivers/gpu/drm/i915/gt/intel_breadcrumbs.c @@ -363,6 +363,14 @@ void intel_breadcrumbs_free(struct intel_breadcrumbs *b) kfree(b); } +static void irq_signal_request(struct i915_request *rq, + struct intel_breadcrumbs *b) +{ + if (__signal_request(rq) && + llist_add(&rq->signal_node, &b->signaled_requests)) + irq_work_queue(&b->irq_work); +} + static void insert_breadcrumb(struct i915_request *rq) { struct intel_breadcrumbs *b = READ_ONCE(rq->engine)->breadcrumbs; @@ -380,9 +388,7 @@ static void insert_breadcrumb(struct i915_request *rq) * its signal completion. */ if (__request_completed(rq)) { - if (__signal_request(rq) && - llist_add(&rq->signal_node, &b->signaled_requests)) - irq_work_queue(&b->irq_work); + irq_signal_request(rq, b); return; } @@ -453,6 +459,7 @@ bool i915_request_enable_breadcrumb(struct i915_request *rq) void i915_request_cancel_breadcrumb(struct i915_request *rq) { + struct intel_breadcrumbs *b = READ_ONCE(rq->engine)->breadcrumbs; struct intel_context *ce = rq->context; bool release; @@ -461,11 +468,16 @@ void i915_request_cancel_breadcrumb(struct i915_request *rq) spin_lock(&ce->signal_lock); list_del_rcu(&rq->signal_link); - release = remove_signaling_context(rq->engine->breadcrumbs, ce); + release = remove_signaling_context(b, ce); spin_unlock(&ce->signal_lock); if (release) intel_context_put(ce); + if (__request_completed(rq)) { + irq_signal_request(rq, b); + return; + } + i915_request_put(rq); } From patchwork Tue Nov 24 11:42:09 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Chris Wilson X-Patchwork-Id: 11928119 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-16.8 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED,USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 207B3C2D0E4 for ; Tue, 24 Nov 2020 11:42:41 +0000 (UTC) Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id B02E3206D8 for ; Tue, 24 Nov 2020 11:42:40 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org B02E3206D8 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=chris-wilson.co.uk Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=intel-gfx-bounces@lists.freedesktop.org Received: from gabe.freedesktop.org (localhost [127.0.0.1]) by gabe.freedesktop.org (Postfix) with ESMTP id AE25E6E28A; Tue, 24 Nov 2020 11:42:33 +0000 (UTC) Received: from fireflyinternet.com (unknown [77.68.26.236]) by gabe.freedesktop.org (Postfix) with ESMTPS id 2C2426E249 for ; Tue, 24 Nov 2020 11:42:31 +0000 (UTC) X-Default-Received-SPF: pass (skip=forwardok (res=PASS)) x-ip-name=78.156.65.138; Received: from build.alporthouse.com (unverified [78.156.65.138]) by fireflyinternet.com (Firefly Internet (M1)) with ESMTP id 23090927-1500050 for multiple; Tue, 24 Nov 2020 11:42:21 +0000 From: Chris Wilson To: intel-gfx@lists.freedesktop.org Date: Tue, 24 Nov 2020 11:42:09 +0000 Message-Id: <20201124114219.29020-6-chris@chris-wilson.co.uk> X-Mailer: git-send-email 2.20.1 In-Reply-To: <20201124114219.29020-1-chris@chris-wilson.co.uk> References: <20201124114219.29020-1-chris@chris-wilson.co.uk> MIME-Version: 1.0 Subject: [Intel-gfx] [PATCH 06/16] drm/i915/gt: Decouple completed requests on unwind X-BeenThere: intel-gfx@lists.freedesktop.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Intel graphics driver community testing & development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Chris Wilson Errors-To: intel-gfx-bounces@lists.freedesktop.org Sender: "Intel-gfx" Since the introduction of preempt-to-busy, requests can complete in the background, even while they are not on the engine->active.requests list. As such, the engine->active.request list itself is not in strict retirement order, and we have to scan the entire list while unwinding to not miss any. However, if the request is completed we currently leave it on the list [until retirement], but we could just as simply remove it and stop treating it as active. We would only have to then traverse it once while unwinding in quick succession. Signed-off-by: Chris Wilson Reviewed-by: Tvrtko Ursulin --- drivers/gpu/drm/i915/gt/intel_lrc.c | 6 ++++-- drivers/gpu/drm/i915/i915_request.c | 3 ++- 2 files changed, 6 insertions(+), 3 deletions(-) diff --git a/drivers/gpu/drm/i915/gt/intel_lrc.c b/drivers/gpu/drm/i915/gt/intel_lrc.c index 30aa59fb7271..cf11cbac241b 100644 --- a/drivers/gpu/drm/i915/gt/intel_lrc.c +++ b/drivers/gpu/drm/i915/gt/intel_lrc.c @@ -1116,8 +1116,10 @@ __unwind_incomplete_requests(struct intel_engine_cs *engine) list_for_each_entry_safe_reverse(rq, rn, &engine->active.requests, sched.link) { - if (i915_request_completed(rq)) - continue; /* XXX */ + if (i915_request_completed(rq)) { + list_del_init(&rq->sched.link); + continue; + } __i915_request_unsubmit(rq); diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c index 8d7d29c9e375..a9db1376b996 100644 --- a/drivers/gpu/drm/i915/i915_request.c +++ b/drivers/gpu/drm/i915/i915_request.c @@ -321,7 +321,8 @@ bool i915_request_retire(struct i915_request *rq) * after removing the breadcrumb and signaling it, so that we do not * inadvertently attach the breadcrumb to a completed request. */ - remove_from_engine(rq); + if (!list_empty(&rq->sched.link)) + remove_from_engine(rq); GEM_BUG_ON(!llist_empty(&rq->execute_cb)); __list_del_entry(&rq->link); /* poison neither prev/next (RCU walks) */ From patchwork Tue Nov 24 11:42:10 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Chris Wilson X-Patchwork-Id: 11928121 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-16.8 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED,USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id C40EEC64E7A for ; Tue, 24 Nov 2020 11:42:46 +0000 (UTC) Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 45809206D8 for ; Tue, 24 Nov 2020 11:42:46 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 45809206D8 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=chris-wilson.co.uk Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=intel-gfx-bounces@lists.freedesktop.org Received: from gabe.freedesktop.org (localhost [127.0.0.1]) by gabe.freedesktop.org (Postfix) with ESMTP id 58DBE6E28B; Tue, 24 Nov 2020 11:42:45 +0000 (UTC) Received: from fireflyinternet.com (unknown [77.68.26.236]) by gabe.freedesktop.org (Postfix) with ESMTPS id 035596E233 for ; Tue, 24 Nov 2020 11:42:29 +0000 (UTC) X-Default-Received-SPF: pass (skip=forwardok (res=PASS)) x-ip-name=78.156.65.138; Received: from build.alporthouse.com (unverified [78.156.65.138]) by fireflyinternet.com (Firefly Internet (M1)) with ESMTP id 23090928-1500050 for multiple; Tue, 24 Nov 2020 11:42:21 +0000 From: Chris Wilson To: intel-gfx@lists.freedesktop.org Date: Tue, 24 Nov 2020 11:42:10 +0000 Message-Id: <20201124114219.29020-7-chris@chris-wilson.co.uk> X-Mailer: git-send-email 2.20.1 In-Reply-To: <20201124114219.29020-1-chris@chris-wilson.co.uk> References: <20201124114219.29020-1-chris@chris-wilson.co.uk> MIME-Version: 1.0 Subject: [Intel-gfx] [PATCH 07/16] drm/i915/gt: Check for a completed last request once X-BeenThere: intel-gfx@lists.freedesktop.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Intel graphics driver community testing & development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Chris Wilson Errors-To: intel-gfx-bounces@lists.freedesktop.org Sender: "Intel-gfx" Pull the repeated check for the last active request being completed to a single spot, when deciding whether or not execlist preemption is required. Signed-off-by: Chris Wilson --- drivers/gpu/drm/i915/gt/intel_lrc.c | 15 ++++----------- 1 file changed, 4 insertions(+), 11 deletions(-) diff --git a/drivers/gpu/drm/i915/gt/intel_lrc.c b/drivers/gpu/drm/i915/gt/intel_lrc.c index cf11cbac241b..43703efb36d1 100644 --- a/drivers/gpu/drm/i915/gt/intel_lrc.c +++ b/drivers/gpu/drm/i915/gt/intel_lrc.c @@ -2141,12 +2141,9 @@ static void execlists_dequeue(struct intel_engine_cs *engine) */ if ((last = *active)) { - if (need_preempt(engine, last, rb)) { - if (i915_request_completed(last)) { - tasklet_hi_schedule(&execlists->tasklet); - return; - } - + if (i915_request_completed(last)) { + goto check_secondary; + } else if (need_preempt(engine, last, rb)) { ENGINE_TRACE(engine, "preempting last=%llx:%lld, prio=%d, hint=%d\n", last->fence.context, @@ -2174,11 +2171,6 @@ static void execlists_dequeue(struct intel_engine_cs *engine) last = NULL; } else if (need_timeslice(engine, last, rb) && timeslice_expired(execlists, last)) { - if (i915_request_completed(last)) { - tasklet_hi_schedule(&execlists->tasklet); - return; - } - ENGINE_TRACE(engine, "expired last=%llx:%lld, prio=%d, hint=%d, yield?=%s\n", last->fence.context, @@ -2214,6 +2206,7 @@ static void execlists_dequeue(struct intel_engine_cs *engine) * we hopefully coalesce several updates into a single * submission. */ +check_secondary: if (!list_is_last(&last->sched.link, &engine->active.requests)) { /* From patchwork Tue Nov 24 11:42:11 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Chris Wilson X-Patchwork-Id: 11928135 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-16.8 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED,USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 74E52C63798 for ; Tue, 24 Nov 2020 11:42:54 +0000 (UTC) Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 13B56206D8 for ; Tue, 24 Nov 2020 11:42:54 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 13B56206D8 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=chris-wilson.co.uk Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=intel-gfx-bounces@lists.freedesktop.org Received: from gabe.freedesktop.org (localhost [127.0.0.1]) by gabe.freedesktop.org (Postfix) with ESMTP id 89ACA6E233; Tue, 24 Nov 2020 11:42:53 +0000 (UTC) Received: from fireflyinternet.com (unknown [77.68.26.236]) by gabe.freedesktop.org (Postfix) with ESMTPS id 6CA396E2E1 for ; Tue, 24 Nov 2020 11:42:52 +0000 (UTC) X-Default-Received-SPF: pass (skip=forwardok (res=PASS)) x-ip-name=78.156.65.138; Received: from build.alporthouse.com (unverified [78.156.65.138]) by fireflyinternet.com (Firefly Internet (M1)) with ESMTP id 23090929-1500050 for multiple; Tue, 24 Nov 2020 11:42:22 +0000 From: Chris Wilson To: intel-gfx@lists.freedesktop.org Date: Tue, 24 Nov 2020 11:42:11 +0000 Message-Id: <20201124114219.29020-8-chris@chris-wilson.co.uk> X-Mailer: git-send-email 2.20.1 In-Reply-To: <20201124114219.29020-1-chris@chris-wilson.co.uk> References: <20201124114219.29020-1-chris@chris-wilson.co.uk> MIME-Version: 1.0 Subject: [Intel-gfx] [PATCH 08/16] drm/i915/gt: Replace direct submit with direct call to tasklet X-BeenThere: intel-gfx@lists.freedesktop.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Intel graphics driver community testing & development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Chris Wilson Errors-To: intel-gfx-bounces@lists.freedesktop.org Sender: "Intel-gfx" Rather than having special case code for opportunistically calling process_csb() and performing a direct submit while holding the engine spinlock for submitting the request, simply call the tasklet directly. This allows us to retain the direct submission path, including the CS draining to allow fast/immediate submissions, without requiring any duplicated code paths, and most importantly greatly simplifying the control flow by removing reentrancy. This will enable us to close a few races in the virtual engines in the next few patches. The trickiest part here is to ensure that paired operations (such as schedule_in/schedule_out) remain under consistent locking domains, e.g. when pulled outside of the engine->active.lock v2: Use bh kicking, see commit 3c53776e29f8 ("Mark HI and TASKLET softirq synchronous"). v3: Update engine-reset to be tasklet aware Signed-off-by: Chris Wilson Reviewed-by: Mika Kuoppala --- drivers/gpu/drm/i915/gt/intel_engine_cs.c | 35 +++-- drivers/gpu/drm/i915/gt/intel_engine_pm.c | 2 +- drivers/gpu/drm/i915/gt/intel_engine_types.h | 3 +- drivers/gpu/drm/i915/gt/intel_lrc.c | 134 +++++++----------- drivers/gpu/drm/i915/gt/intel_reset.c | 60 +++++--- drivers/gpu/drm/i915/gt/intel_reset.h | 2 + drivers/gpu/drm/i915/gt/selftest_context.c | 2 +- drivers/gpu/drm/i915/gt/selftest_hangcheck.c | 7 +- drivers/gpu/drm/i915/gt/selftest_lrc.c | 27 ++-- drivers/gpu/drm/i915/gt/selftest_reset.c | 8 +- drivers/gpu/drm/i915/i915_request.c | 12 +- drivers/gpu/drm/i915/i915_request.h | 1 + drivers/gpu/drm/i915/i915_scheduler.c | 4 - drivers/gpu/drm/i915/selftests/i915_request.c | 6 +- drivers/gpu/drm/i915/selftests/igt_spinner.c | 3 + 15 files changed, 158 insertions(+), 148 deletions(-) diff --git a/drivers/gpu/drm/i915/gt/intel_engine_cs.c b/drivers/gpu/drm/i915/gt/intel_engine_cs.c index d4e988b2816a..2ed03b88ec12 100644 --- a/drivers/gpu/drm/i915/gt/intel_engine_cs.c +++ b/drivers/gpu/drm/i915/gt/intel_engine_cs.c @@ -1002,32 +1002,39 @@ static unsigned long stop_timeout(const struct intel_engine_cs *engine) return READ_ONCE(engine->props.stop_timeout_ms); } -int intel_engine_stop_cs(struct intel_engine_cs *engine) +static int __intel_engine_stop_cs(struct intel_engine_cs *engine, + int fast_timeout_us, + int slow_timeout_ms) { struct intel_uncore *uncore = engine->uncore; - const u32 base = engine->mmio_base; - const i915_reg_t mode = RING_MI_MODE(base); + const i915_reg_t mode = RING_MI_MODE(engine->mmio_base); int err; + intel_uncore_write_fw(uncore, mode, _MASKED_BIT_ENABLE(STOP_RING)); + err = __intel_wait_for_register_fw(engine->uncore, mode, + MODE_IDLE, MODE_IDLE, + fast_timeout_us, + slow_timeout_ms, + NULL); + + /* A final mmio read to let GPU writes be hopefully flushed to memory */ + intel_uncore_posting_read_fw(uncore, mode); + return err; +} + +int intel_engine_stop_cs(struct intel_engine_cs *engine) +{ + int err = 0; + if (INTEL_GEN(engine->i915) < 3) return -ENODEV; ENGINE_TRACE(engine, "\n"); - - intel_uncore_write_fw(uncore, mode, _MASKED_BIT_ENABLE(STOP_RING)); - - err = 0; - if (__intel_wait_for_register_fw(uncore, - mode, MODE_IDLE, MODE_IDLE, - 1000, stop_timeout(engine), - NULL)) { + if (__intel_engine_stop_cs(engine, 1000, stop_timeout(engine))) { ENGINE_TRACE(engine, "timed out on STOP_RING -> IDLE\n"); err = -ETIMEDOUT; } - /* A final mmio read to let GPU writes be hopefully flushed to memory */ - intel_uncore_posting_read_fw(uncore, mode); - return err; } diff --git a/drivers/gpu/drm/i915/gt/intel_engine_pm.c b/drivers/gpu/drm/i915/gt/intel_engine_pm.c index 499b09cb4acf..99574378047f 100644 --- a/drivers/gpu/drm/i915/gt/intel_engine_pm.c +++ b/drivers/gpu/drm/i915/gt/intel_engine_pm.c @@ -136,7 +136,7 @@ __queue_and_release_pm(struct i915_request *rq, list_add_tail(&tl->link, &timelines->active_list); /* Hand the request over to HW and so engine_retire() */ - __i915_request_queue(rq, NULL); + __i915_request_queue_bh(rq); /* Let new submissions commence (and maybe retire this timeline) */ __intel_wakeref_defer_park(&engine->wakeref); diff --git a/drivers/gpu/drm/i915/gt/intel_engine_types.h b/drivers/gpu/drm/i915/gt/intel_engine_types.h index ee6312601c56..e71eef157231 100644 --- a/drivers/gpu/drm/i915/gt/intel_engine_types.h +++ b/drivers/gpu/drm/i915/gt/intel_engine_types.h @@ -183,7 +183,8 @@ struct intel_engine_execlists { * Reserve the upper 16b for tracking internal errors. */ u32 error_interrupt; -#define ERROR_CSB BIT(31) +#define ERROR_CSB BIT(31) +#define ERROR_PREEMPT BIT(30) /** * @reset_ccid: Active CCID [EXECLISTS_STATUS_HI] at the time of reset diff --git a/drivers/gpu/drm/i915/gt/intel_lrc.c b/drivers/gpu/drm/i915/gt/intel_lrc.c index 43703efb36d1..49a80f940e73 100644 --- a/drivers/gpu/drm/i915/gt/intel_lrc.c +++ b/drivers/gpu/drm/i915/gt/intel_lrc.c @@ -1378,8 +1378,7 @@ __execlists_schedule_in(struct i915_request *rq) return engine; } -static inline struct i915_request * -execlists_schedule_in(struct i915_request *rq, int idx) +static inline void execlists_schedule_in(struct i915_request *rq, int idx) { struct intel_context * const ce = rq->context; struct intel_engine_cs *old; @@ -1396,7 +1395,6 @@ execlists_schedule_in(struct i915_request *rq, int idx) } while (!try_cmpxchg(&ce->inflight, &old, ptr_inc(old))); GEM_BUG_ON(intel_context_inflight(ce) != rq->engine); - return i915_request_get(rq); } static void kick_siblings(struct i915_request *rq, struct intel_context *ce) @@ -2071,8 +2069,7 @@ static void execlists_dequeue(struct intel_engine_cs *engine) struct intel_engine_execlists * const execlists = &engine->execlists; struct i915_request **port = execlists->pending; struct i915_request ** const last_port = port + execlists->port_mask; - struct i915_request * const *active; - struct i915_request *last; + struct i915_request *last = *execlists->active; struct rb_node *rb; bool submit = false; @@ -2098,6 +2095,8 @@ static void execlists_dequeue(struct intel_engine_cs *engine) * and context switches) submission. */ + spin_lock(&engine->active.lock); + for (rb = rb_first_cached(&execlists->virtual); rb; ) { struct virtual_engine *ve = rb_entry(rb, typeof(*ve), nodes[engine->id].rb); @@ -2125,10 +2124,7 @@ static void execlists_dequeue(struct intel_engine_cs *engine) * the active context to interject the preemption request, * i.e. we will retrigger preemption following the ack in case * of trouble. - */ - active = READ_ONCE(execlists->active); - - /* + * * In theory we can skip over completed contexts that have not * yet been processed by events (as those events are in flight): * @@ -2140,7 +2136,7 @@ static void execlists_dequeue(struct intel_engine_cs *engine) * completed and barf. */ - if ((last = *active)) { + if (last) { if (i915_request_completed(last)) { goto check_secondary; } else if (need_preempt(engine, last, rb)) { @@ -2213,6 +2209,7 @@ static void execlists_dequeue(struct intel_engine_cs *engine) * Even if ELSP[1] is occupied and not worthy * of timeslices, our queue might be. */ + spin_unlock(&engine->active.lock); start_timeslice(engine, queue_prio(execlists)); return; } @@ -2248,6 +2245,7 @@ static void execlists_dequeue(struct intel_engine_cs *engine) if (last && !can_merge_rq(last, rq)) { spin_unlock(&ve->base.active.lock); + spin_unlock(&engine->active.lock); start_timeslice(engine, rq_prio(rq)); return; /* leave this for another sibling */ } @@ -2365,8 +2363,7 @@ static void execlists_dequeue(struct intel_engine_cs *engine) if (__i915_request_submit(rq)) { if (!merge) { - *port = execlists_schedule_in(last, port - execlists->pending); - port++; + *port++ = i915_request_get(last); last = NULL; } @@ -2385,8 +2382,9 @@ static void execlists_dequeue(struct intel_engine_cs *engine) rb_erase_cached(&p->node, &execlists->queue); i915_priolist_free(p); } - done: + *port++ = i915_request_get(last); + /* * Here be a bit of magic! Or sleight-of-hand, whichever you prefer. * @@ -2404,36 +2402,43 @@ static void execlists_dequeue(struct intel_engine_cs *engine) * interrupt for secondary ports). */ execlists->queue_priority_hint = queue_prio(execlists); + spin_unlock(&engine->active.lock); if (submit) { - *port = execlists_schedule_in(last, port - execlists->pending); - execlists->switch_priority_hint = - switch_prio(engine, *execlists->pending); - /* * Skip if we ended up with exactly the same set of requests, * e.g. trying to timeslice a pair of ordered contexts */ - if (!memcmp(active, execlists->pending, - (port - execlists->pending + 1) * sizeof(*port))) { - do - execlists_schedule_out(fetch_and_zero(port)); - while (port-- != execlists->pending); - + if (!memcmp(execlists->active, + execlists->pending, + (port - execlists->pending) * sizeof(*port))) goto skip_submit; - } - clear_ports(port + 1, last_port - port); + + *port = NULL; + while (port-- != execlists->pending) + execlists_schedule_in(*port, port - execlists->pending); + + execlists->switch_priority_hint = + switch_prio(engine, *execlists->pending); WRITE_ONCE(execlists->yield, -1); - set_preempt_timeout(engine, *active); + set_preempt_timeout(engine, *execlists->active); execlists_submit_ports(engine); } else { start_timeslice(engine, execlists->queue_priority_hint); skip_submit: ring_set_paused(engine, 0); + *execlists->pending = NULL; } } +static void execlists_dequeue_irq(struct intel_engine_cs *engine) +{ + local_irq_disable(); /* Suspend interrupts across request submission */ + execlists_dequeue(engine); + local_irq_enable(); /* flush irq_work (e.g. breadcrumb enabling) */ +} + static void cancel_port_requests(struct intel_engine_execlists * const execlists) { @@ -2771,16 +2776,6 @@ static void process_csb(struct intel_engine_cs *engine) invalidate_csb_entries(&buf[0], &buf[num_entries - 1]); } -static void __execlists_submission_tasklet(struct intel_engine_cs *const engine) -{ - lockdep_assert_held(&engine->active.lock); - if (!READ_ONCE(engine->execlists.pending[0])) { - rcu_read_lock(); /* protect peeking at execlists->active */ - execlists_dequeue(engine); - rcu_read_unlock(); - } -} - static void __execlists_hold(struct i915_request *rq) { LIST_HEAD(list); @@ -3169,7 +3164,7 @@ static bool preempt_timeout(const struct intel_engine_cs *const engine) if (!timer_expired(t)) return false; - return READ_ONCE(engine->execlists.pending[0]); + return engine->execlists.pending[0]; } /* @@ -3179,10 +3174,12 @@ static bool preempt_timeout(const struct intel_engine_cs *const engine) static void execlists_submission_tasklet(unsigned long data) { struct intel_engine_cs * const engine = (struct intel_engine_cs *)data; - bool timeout = preempt_timeout(engine); process_csb(engine); + if (unlikely(preempt_timeout(engine))) + engine->execlists.error_interrupt |= ERROR_PREEMPT; + if (unlikely(READ_ONCE(engine->execlists.error_interrupt))) { const char *msg; @@ -3191,6 +3188,8 @@ static void execlists_submission_tasklet(unsigned long data) msg = "CS error"; /* thrown by a user payload */ else if (engine->execlists.error_interrupt & ERROR_CSB) msg = "invalid CSB event"; + else if (engine->execlists.error_interrupt & ERROR_PREEMPT) + msg = "preemption time out"; else msg = "internal error"; @@ -3198,17 +3197,8 @@ static void execlists_submission_tasklet(unsigned long data) execlists_reset(engine, msg); } - if (!READ_ONCE(engine->execlists.pending[0]) || timeout) { - unsigned long flags; - - spin_lock_irqsave(&engine->active.lock, flags); - __execlists_submission_tasklet(engine); - spin_unlock_irqrestore(&engine->active.lock, flags); - - /* Recheck after serialising with direct-submission */ - if (unlikely(timeout && preempt_timeout(engine))) - execlists_reset(engine, "preemption time out"); - } + if (!engine->execlists.pending[0]) + execlists_dequeue_irq(engine); } static void __execlists_kick(struct intel_engine_execlists *execlists) @@ -3239,26 +3229,16 @@ static void queue_request(struct intel_engine_cs *engine, set_bit(I915_FENCE_FLAG_PQUEUE, &rq->fence.flags); } -static void __submit_queue_imm(struct intel_engine_cs *engine) -{ - struct intel_engine_execlists * const execlists = &engine->execlists; - - if (reset_in_progress(execlists)) - return; /* defer until we restart the engine following reset */ - - __execlists_submission_tasklet(engine); -} - -static void submit_queue(struct intel_engine_cs *engine, +static bool submit_queue(struct intel_engine_cs *engine, const struct i915_request *rq) { struct intel_engine_execlists *execlists = &engine->execlists; if (rq_prio(rq) <= execlists->queue_priority_hint) - return; + return false; execlists->queue_priority_hint = rq_prio(rq); - __submit_queue_imm(engine); + return true; } static bool ancestor_on_hold(const struct intel_engine_cs *engine, @@ -3268,25 +3248,11 @@ static bool ancestor_on_hold(const struct intel_engine_cs *engine, return !list_empty(&engine->active.hold) && hold_request(rq); } -static void flush_csb(struct intel_engine_cs *engine) -{ - struct intel_engine_execlists *el = &engine->execlists; - - if (READ_ONCE(el->pending[0]) && tasklet_trylock(&el->tasklet)) { - if (!reset_in_progress(el)) - process_csb(engine); - tasklet_unlock(&el->tasklet); - } -} - static void execlists_submit_request(struct i915_request *request) { struct intel_engine_cs *engine = request->engine; unsigned long flags; - /* Hopefully we clear execlists->pending[] to let us through */ - flush_csb(engine); - /* Will be called from irq-context when using foreign fences. */ spin_lock_irqsave(&engine->active.lock, flags); @@ -3300,7 +3266,8 @@ static void execlists_submit_request(struct i915_request *request) GEM_BUG_ON(RB_EMPTY_ROOT(&engine->execlists.queue.rb_root)); GEM_BUG_ON(list_empty(&request->sched.link)); - submit_queue(engine, request); + if (submit_queue(engine, request)) + __execlists_kick(&engine->execlists); } spin_unlock_irqrestore(&engine->active.lock, flags); @@ -4214,7 +4181,6 @@ static int execlists_resume(struct intel_engine_cs *engine) static void execlists_reset_prepare(struct intel_engine_cs *engine) { struct intel_engine_execlists * const execlists = &engine->execlists; - unsigned long flags; ENGINE_TRACE(engine, "depth<-%d\n", atomic_read(&execlists->tasklet.count)); @@ -4231,10 +4197,6 @@ static void execlists_reset_prepare(struct intel_engine_cs *engine) __tasklet_disable_sync_once(&execlists->tasklet); GEM_BUG_ON(!reset_in_progress(execlists)); - /* And flush any current direct submission. */ - spin_lock_irqsave(&engine->active.lock, flags); - spin_unlock_irqrestore(&engine->active.lock, flags); - /* * We stop engines, otherwise we might get failed reset and a * dead gpu (on elk). Also as modern gpu as kbl can suffer @@ -4479,12 +4441,12 @@ static void execlists_reset_finish(struct intel_engine_cs *engine) * to sleep before we restart and reload a context. */ GEM_BUG_ON(!reset_in_progress(execlists)); - if (!RB_EMPTY_ROOT(&execlists->queue.rb_root)) - execlists->tasklet.func(execlists->tasklet.data); + GEM_BUG_ON(engine->execlists.pending[0]); + /* And kick in case we missed a new request submission. */ if (__tasklet_enable(&execlists->tasklet)) - /* And kick in case we missed a new request submission. */ - tasklet_hi_schedule(&execlists->tasklet); + __execlists_kick(execlists); + ENGINE_TRACE(engine, "depth->%d\n", atomic_read(&execlists->tasklet.count)); } diff --git a/drivers/gpu/drm/i915/gt/intel_reset.c b/drivers/gpu/drm/i915/gt/intel_reset.c index 3654c955e6be..77997114b551 100644 --- a/drivers/gpu/drm/i915/gt/intel_reset.c +++ b/drivers/gpu/drm/i915/gt/intel_reset.c @@ -40,20 +40,19 @@ static void rmw_clear_fw(struct intel_uncore *uncore, i915_reg_t reg, u32 clr) intel_uncore_rmw_fw(uncore, reg, clr, 0); } -static void engine_skip_context(struct i915_request *rq) +static void skip_context(struct i915_request *rq) { - struct intel_engine_cs *engine = rq->engine; struct intel_context *hung_ctx = rq->context; - if (!i915_request_is_active(rq)) - return; + list_for_each_entry_from_rcu(rq, &hung_ctx->timeline->requests, link) { + if (!i915_request_is_active(rq)) + return; - lockdep_assert_held(&engine->active.lock); - list_for_each_entry_continue(rq, &engine->active.requests, sched.link) if (rq->context == hung_ctx) { i915_request_set_error_once(rq, -EIO); __i915_request_skip(rq); } + } } static void client_mark_guilty(struct i915_gem_context *ctx, bool banned) @@ -160,7 +159,7 @@ void __i915_request_reset(struct i915_request *rq, bool guilty) i915_request_set_error_once(rq, -EIO); __i915_request_skip(rq); if (mark_guilty(rq)) - engine_skip_context(rq); + skip_context(rq); } else { i915_request_set_error_once(rq, -EAGAIN); mark_innocent(rq); @@ -754,8 +753,10 @@ static int gt_reset(struct intel_gt *gt, intel_engine_mask_t stalled_mask) if (err) return err; + local_bh_disable(); for_each_engine(engine, gt, id) __intel_engine_reset(engine, stalled_mask & engine->mask); + local_bh_enable(); intel_ggtt_restore_fences(gt->ggtt); @@ -833,9 +834,11 @@ static void __intel_gt_set_wedged(struct intel_gt *gt) set_bit(I915_WEDGED, >->reset.flags); /* Mark all executing requests as skipped */ + local_bh_disable(); for_each_engine(engine, gt, id) if (engine->reset.cancel) engine->reset.cancel(engine); + local_bh_enable(); reset_finish(gt, awake); @@ -1110,20 +1113,7 @@ static inline int intel_gt_reset_engine(struct intel_engine_cs *engine) return __intel_gt_reset(engine->gt, engine->mask); } -/** - * intel_engine_reset - reset GPU engine to recover from a hang - * @engine: engine to reset - * @msg: reason for GPU reset; or NULL for no drm_notice() - * - * Reset a specific GPU engine. Useful if a hang is detected. - * Returns zero on successful reset or otherwise an error code. - * - * Procedure is: - * - identifies the request that caused the hang and it is dropped - * - reset engine (which will force the engine to idle) - * - re-init/configure engine - */ -int intel_engine_reset(struct intel_engine_cs *engine, const char *msg) +int __intel_engine_reset_bh(struct intel_engine_cs *engine, const char *msg) { struct intel_gt *gt = engine->gt; bool uses_guc = intel_engine_in_guc_submission_mode(engine); @@ -1174,6 +1164,30 @@ int intel_engine_reset(struct intel_engine_cs *engine, const char *msg) return ret; } +/** + * intel_engine_reset - reset GPU engine to recover from a hang + * @engine: engine to reset + * @msg: reason for GPU reset; or NULL for no drm_notice() + * + * Reset a specific GPU engine. Useful if a hang is detected. + * Returns zero on successful reset or otherwise an error code. + * + * Procedure is: + * - identifies the request that caused the hang and it is dropped + * - reset engine (which will force the engine to idle) + * - re-init/configure engine + */ +int intel_engine_reset(struct intel_engine_cs *engine, const char *msg) +{ + int err; + + local_bh_disable(); + err = __intel_engine_reset_bh(engine, msg); + local_bh_enable(); + + return err; +} + static void intel_gt_reset_global(struct intel_gt *gt, u32 engine_mask, const char *reason) @@ -1260,18 +1274,20 @@ void intel_gt_handle_error(struct intel_gt *gt, * single reset fails. */ if (intel_has_reset_engine(gt) && !intel_gt_is_wedged(gt)) { + local_bh_disable(); for_each_engine_masked(engine, gt, engine_mask, tmp) { BUILD_BUG_ON(I915_RESET_MODESET >= I915_RESET_ENGINE); if (test_and_set_bit(I915_RESET_ENGINE + engine->id, >->reset.flags)) continue; - if (intel_engine_reset(engine, msg) == 0) + if (__intel_engine_reset_bh(engine, msg) == 0) engine_mask &= ~engine->mask; clear_and_wake_up_bit(I915_RESET_ENGINE + engine->id, >->reset.flags); } + local_bh_enable(); } if (!engine_mask) diff --git a/drivers/gpu/drm/i915/gt/intel_reset.h b/drivers/gpu/drm/i915/gt/intel_reset.h index a0eec7c11c0c..7dbf5cc8a333 100644 --- a/drivers/gpu/drm/i915/gt/intel_reset.h +++ b/drivers/gpu/drm/i915/gt/intel_reset.h @@ -34,6 +34,8 @@ void intel_gt_reset(struct intel_gt *gt, const char *reason); int intel_engine_reset(struct intel_engine_cs *engine, const char *reason); +int __intel_engine_reset_bh(struct intel_engine_cs *engine, + const char *reason); void __i915_request_reset(struct i915_request *rq, bool guilty); diff --git a/drivers/gpu/drm/i915/gt/selftest_context.c b/drivers/gpu/drm/i915/gt/selftest_context.c index 1f4020e906a8..db738d400168 100644 --- a/drivers/gpu/drm/i915/gt/selftest_context.c +++ b/drivers/gpu/drm/i915/gt/selftest_context.c @@ -25,7 +25,7 @@ static int request_sync(struct i915_request *rq) /* Opencode i915_request_add() so we can keep the timeline locked. */ __i915_request_commit(rq); rq->sched.attr.priority = I915_PRIORITY_BARRIER; - __i915_request_queue(rq, NULL); + __i915_request_queue_bh(rq); timeout = i915_request_wait(rq, 0, HZ / 10); if (timeout < 0) diff --git a/drivers/gpu/drm/i915/gt/selftest_hangcheck.c b/drivers/gpu/drm/i915/gt/selftest_hangcheck.c index fb5ebf930ab2..c28d1fcad673 100644 --- a/drivers/gpu/drm/i915/gt/selftest_hangcheck.c +++ b/drivers/gpu/drm/i915/gt/selftest_hangcheck.c @@ -1576,12 +1576,17 @@ static int __igt_atomic_reset_engine(struct intel_engine_cs *engine, engine->name, mode, p->name); tasklet_disable(t); + if (strcmp(p->name, "softirq")) + local_bh_disable(); p->critical_section_begin(); - err = intel_engine_reset(engine, NULL); + err = __intel_engine_reset_bh(engine, NULL); p->critical_section_end(); + if (strcmp(p->name, "softirq")) + local_bh_enable(); tasklet_enable(t); + tasklet_hi_schedule(t); if (err) pr_err("i915_reset_engine(%s:%s) failed under %s\n", diff --git a/drivers/gpu/drm/i915/gt/selftest_lrc.c b/drivers/gpu/drm/i915/gt/selftest_lrc.c index 95d41c01d0e0..37cb51c3f4f6 100644 --- a/drivers/gpu/drm/i915/gt/selftest_lrc.c +++ b/drivers/gpu/drm/i915/gt/selftest_lrc.c @@ -623,8 +623,10 @@ static int live_hold_reset(void *arg) /* We have our request executing, now remove it and reset */ + local_bh_disable(); if (test_and_set_bit(I915_RESET_ENGINE + id, >->reset.flags)) { + local_bh_enable(); intel_gt_set_wedged(gt); err = -EBUSY; goto out; @@ -638,12 +640,13 @@ static int live_hold_reset(void *arg) execlists_hold(engine, rq); GEM_BUG_ON(!i915_request_on_hold(rq)); - intel_engine_reset(engine, NULL); + __intel_engine_reset_bh(engine, NULL); GEM_BUG_ON(rq->fence.error != -EIO); tasklet_enable(&engine->execlists.tasklet); clear_and_wake_up_bit(I915_RESET_ENGINE + id, >->reset.flags); + local_bh_enable(); /* Check that we do not resubmit the held request */ if (!i915_request_wait(rq, 0, HZ / 5)) { @@ -4569,8 +4572,10 @@ static int reset_virtual_engine(struct intel_gt *gt, GEM_BUG_ON(engine == ve->engine); /* Take ownership of the reset and tasklet */ + local_bh_disable(); if (test_and_set_bit(I915_RESET_ENGINE + engine->id, >->reset.flags)) { + local_bh_enable(); intel_gt_set_wedged(gt); err = -EBUSY; goto out_heartbeat; @@ -4590,12 +4595,13 @@ static int reset_virtual_engine(struct intel_gt *gt, execlists_hold(engine, rq); GEM_BUG_ON(!i915_request_on_hold(rq)); - intel_engine_reset(engine, NULL); + __intel_engine_reset_bh(engine, NULL); GEM_BUG_ON(rq->fence.error != -EIO); /* Release our grasp on the engine, letting CS flow again */ tasklet_enable(&engine->execlists.tasklet); clear_and_wake_up_bit(I915_RESET_ENGINE + engine->id, >->reset.flags); + local_bh_enable(); /* Check that we do not resubmit the held request */ i915_request_get(rq); @@ -6242,16 +6248,17 @@ static void garbage_reset(struct intel_engine_cs *engine, const unsigned int bit = I915_RESET_ENGINE + engine->id; unsigned long *lock = &engine->gt->reset.flags; - if (test_and_set_bit(bit, lock)) - return; - - tasklet_disable(&engine->execlists.tasklet); + local_bh_disable(); + if (!test_and_set_bit(bit, lock)) { + tasklet_disable(&engine->execlists.tasklet); - if (!rq->fence.error) - intel_engine_reset(engine, NULL); + if (!rq->fence.error) + __intel_engine_reset_bh(engine, NULL); - tasklet_enable(&engine->execlists.tasklet); - clear_and_wake_up_bit(bit, lock); + tasklet_enable(&engine->execlists.tasklet); + clear_and_wake_up_bit(bit, lock); + } + local_bh_enable(); } static struct i915_request *garbage(struct intel_context *ce, diff --git a/drivers/gpu/drm/i915/gt/selftest_reset.c b/drivers/gpu/drm/i915/gt/selftest_reset.c index ef5aeebbeeb0..4dbd5bc840c3 100644 --- a/drivers/gpu/drm/i915/gt/selftest_reset.c +++ b/drivers/gpu/drm/i915/gt/selftest_reset.c @@ -326,11 +326,16 @@ static int igt_atomic_engine_reset(void *arg) for (p = igt_atomic_phases; p->name; p++) { GEM_TRACE("intel_engine_reset(%s) under %s\n", engine->name, p->name); + if (strcmp(p->name, "softirq")) + local_bh_disable(); p->critical_section_begin(); - err = intel_engine_reset(engine, NULL); + err = __intel_engine_reset_bh(engine, NULL); p->critical_section_end(); + if (strcmp(p->name, "softirq")) + local_bh_enable(); + if (err) { pr_err("intel_engine_reset(%s) failed under %s\n", engine->name, p->name); @@ -340,6 +345,7 @@ static int igt_atomic_engine_reset(void *arg) intel_engine_pm_put(engine); tasklet_enable(&engine->execlists.tasklet); + tasklet_hi_schedule(&engine->execlists.tasklet); if (err) break; } diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c index a9db1376b996..e4dad3aa69ff 100644 --- a/drivers/gpu/drm/i915/i915_request.c +++ b/drivers/gpu/drm/i915/i915_request.c @@ -1583,6 +1583,12 @@ struct i915_request *__i915_request_commit(struct i915_request *rq) return __i915_request_add_to_timeline(rq); } +void __i915_request_queue_bh(struct i915_request *rq) +{ + i915_sw_fence_commit(&rq->semaphore); + i915_sw_fence_commit(&rq->submit); +} + void __i915_request_queue(struct i915_request *rq, const struct i915_sched_attr *attr) { @@ -1599,8 +1605,10 @@ void __i915_request_queue(struct i915_request *rq, */ if (attr && rq->engine->schedule) rq->engine->schedule(rq, attr); - i915_sw_fence_commit(&rq->semaphore); - i915_sw_fence_commit(&rq->submit); + + local_bh_disable(); + __i915_request_queue_bh(rq); + local_bh_enable(); /* kick tasklets */ } void i915_request_add(struct i915_request *rq) diff --git a/drivers/gpu/drm/i915/i915_request.h b/drivers/gpu/drm/i915/i915_request.h index 92e4320c50c4..c528ab33c9bd 100644 --- a/drivers/gpu/drm/i915/i915_request.h +++ b/drivers/gpu/drm/i915/i915_request.h @@ -315,6 +315,7 @@ void __i915_request_skip(struct i915_request *rq); struct i915_request *__i915_request_commit(struct i915_request *request); void __i915_request_queue(struct i915_request *rq, const struct i915_sched_attr *attr); +void __i915_request_queue_bh(struct i915_request *rq); bool i915_request_retire(struct i915_request *rq); void i915_request_retire_upto(struct i915_request *rq); diff --git a/drivers/gpu/drm/i915/i915_scheduler.c b/drivers/gpu/drm/i915/i915_scheduler.c index b9cf9931ebd7..318e359bf5c3 100644 --- a/drivers/gpu/drm/i915/i915_scheduler.c +++ b/drivers/gpu/drm/i915/i915_scheduler.c @@ -458,14 +458,10 @@ int i915_sched_node_add_dependency(struct i915_sched_node *node, if (!dep) return -ENOMEM; - local_bh_disable(); - if (!__i915_sched_node_add_dependency(node, signal, dep, flags | I915_DEPENDENCY_ALLOC)) i915_dependency_free(dep); - local_bh_enable(); /* kick submission tasklet */ - return 0; } diff --git a/drivers/gpu/drm/i915/selftests/i915_request.c b/drivers/gpu/drm/i915/selftests/i915_request.c index e424a6d1a68c..b8c5920d1ff3 100644 --- a/drivers/gpu/drm/i915/selftests/i915_request.c +++ b/drivers/gpu/drm/i915/selftests/i915_request.c @@ -1932,9 +1932,7 @@ static int measure_inter_request(struct intel_context *ce) intel_ring_advance(rq, cs); i915_request_add(rq); } - local_bh_disable(); i915_sw_fence_commit(submit); - local_bh_enable(); intel_engine_flush_submission(ce->engine); heap_fence_put(submit); @@ -2220,11 +2218,9 @@ static int measure_completion(struct intel_context *ce) intel_ring_advance(rq, cs); dma_fence_add_callback(&rq->fence, &cb.base, signal_cb); - - local_bh_disable(); i915_request_add(rq); - local_bh_enable(); + intel_engine_flush_submission(ce->engine); if (wait_for(READ_ONCE(sema[i]) == -1, 50)) { err = -EIO; goto err; diff --git a/drivers/gpu/drm/i915/selftests/igt_spinner.c b/drivers/gpu/drm/i915/selftests/igt_spinner.c index ec0ecb4e4ca6..e09ce8067b9c 100644 --- a/drivers/gpu/drm/i915/selftests/igt_spinner.c +++ b/drivers/gpu/drm/i915/selftests/igt_spinner.c @@ -219,6 +219,9 @@ void igt_spinner_fini(struct igt_spinner *spin) bool igt_wait_for_spinner(struct igt_spinner *spin, struct i915_request *rq) { + if (i915_request_is_ready(rq)) + intel_engine_flush_submission(rq->engine); + return !(wait_for_us(i915_seqno_passed(hws_seqno(spin, rq), rq->fence.seqno), 100) && From patchwork Tue Nov 24 11:42:12 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Chris Wilson X-Patchwork-Id: 11928129 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-16.8 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED,USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id DF156C56202 for ; Tue, 24 Nov 2020 11:42:48 +0000 (UTC) Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 83E69206D8 for ; Tue, 24 Nov 2020 11:42:48 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 83E69206D8 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=chris-wilson.co.uk Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=intel-gfx-bounces@lists.freedesktop.org Received: from gabe.freedesktop.org (localhost [127.0.0.1]) by gabe.freedesktop.org (Postfix) with ESMTP id 2F0996E2C8; Tue, 24 Nov 2020 11:42:46 +0000 (UTC) Received: from fireflyinternet.com (unknown [77.68.26.236]) by gabe.freedesktop.org (Postfix) with ESMTPS id 1F99F6E241 for ; Tue, 24 Nov 2020 11:42:29 +0000 (UTC) X-Default-Received-SPF: pass (skip=forwardok (res=PASS)) x-ip-name=78.156.65.138; Received: from build.alporthouse.com (unverified [78.156.65.138]) by fireflyinternet.com (Firefly Internet (M1)) with ESMTP id 23090931-1500050 for multiple; Tue, 24 Nov 2020 11:42:22 +0000 From: Chris Wilson To: intel-gfx@lists.freedesktop.org Date: Tue, 24 Nov 2020 11:42:12 +0000 Message-Id: <20201124114219.29020-9-chris@chris-wilson.co.uk> X-Mailer: git-send-email 2.20.1 In-Reply-To: <20201124114219.29020-1-chris@chris-wilson.co.uk> References: <20201124114219.29020-1-chris@chris-wilson.co.uk> MIME-Version: 1.0 Subject: [Intel-gfx] [PATCH 09/16] drm/i915/gt: ce->inflight updates are now serialised X-BeenThere: intel-gfx@lists.freedesktop.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Intel graphics driver community testing & development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Chris Wilson Errors-To: intel-gfx-bounces@lists.freedesktop.org Sender: "Intel-gfx" Since schedule-in and schedule-out are now both always under the tasklet bitlock, we can reduce the individual atomic operations to simple instructions and worry less. This notably eliminates the race observed with intel_context_inflight in __engine_unpark(). Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/2583 Signed-off-by: Chris Wilson Reviewed-by: Tvrtko Ursulin --- drivers/gpu/drm/i915/gt/intel_lrc.c | 52 ++++++++++++++--------------- 1 file changed, 25 insertions(+), 27 deletions(-) diff --git a/drivers/gpu/drm/i915/gt/intel_lrc.c b/drivers/gpu/drm/i915/gt/intel_lrc.c index 49a80f940e73..3a6a9539b54e 100644 --- a/drivers/gpu/drm/i915/gt/intel_lrc.c +++ b/drivers/gpu/drm/i915/gt/intel_lrc.c @@ -1358,11 +1358,11 @@ __execlists_schedule_in(struct i915_request *rq) ce->lrc.ccid = ce->tag; } else { /* We don't need a strict matching tag, just different values */ - unsigned int tag = ffs(READ_ONCE(engine->context_tag)); + unsigned int tag = __ffs(engine->context_tag); - GEM_BUG_ON(tag == 0 || tag >= BITS_PER_LONG); - clear_bit(tag - 1, &engine->context_tag); - ce->lrc.ccid = tag << (GEN11_SW_CTX_ID_SHIFT - 32); + GEM_BUG_ON(tag >= BITS_PER_LONG); + __clear_bit(tag, &engine->context_tag); + ce->lrc.ccid = (1 + tag) << (GEN11_SW_CTX_ID_SHIFT - 32); BUILD_BUG_ON(BITS_PER_LONG > GEN12_MAX_CONTEXT_HW_ID); } @@ -1375,6 +1375,8 @@ __execlists_schedule_in(struct i915_request *rq) execlists_context_status_change(rq, INTEL_CONTEXT_SCHEDULE_IN); intel_engine_context_in(engine); + CE_TRACE(ce, "schedule-in, ccid:%x\n", ce->lrc.ccid); + return engine; } @@ -1386,13 +1388,10 @@ static inline void execlists_schedule_in(struct i915_request *rq, int idx) GEM_BUG_ON(!intel_engine_pm_is_awake(rq->engine)); trace_i915_request_in(rq, idx); - old = READ_ONCE(ce->inflight); - do { - if (!old) { - WRITE_ONCE(ce->inflight, __execlists_schedule_in(rq)); - break; - } - } while (!try_cmpxchg(&ce->inflight, &old, ptr_inc(old))); + old = ce->inflight; + if (!old) + old = __execlists_schedule_in(rq); + WRITE_ONCE(ce->inflight, ptr_inc(old)); GEM_BUG_ON(intel_context_inflight(ce) != rq->engine); } @@ -1406,12 +1405,11 @@ static void kick_siblings(struct i915_request *rq, struct intel_context *ce) tasklet_hi_schedule(&ve->base.execlists.tasklet); } -static inline void -__execlists_schedule_out(struct i915_request *rq, - struct intel_engine_cs * const engine, - unsigned int ccid) +static inline void __execlists_schedule_out(struct i915_request *rq) { struct intel_context * const ce = rq->context; + struct intel_engine_cs * const engine = rq->engine; + unsigned int ccid; /* * NB process_csb() is not under the engine->active.lock and hence @@ -1419,6 +1417,8 @@ __execlists_schedule_out(struct i915_request *rq, * refrain from doing non-trivial work here. */ + CE_TRACE(ce, "schedule-out, ccid:%x\n", ce->lrc.ccid); + if (IS_ENABLED(CONFIG_DRM_I915_DEBUG_GEM)) execlists_check_context(ce, engine, "after"); @@ -1430,12 +1430,13 @@ __execlists_schedule_out(struct i915_request *rq, i915_request_completed(rq)) intel_engine_add_retire(engine, ce->timeline); + ccid = ce->lrc.ccid; ccid >>= GEN11_SW_CTX_ID_SHIFT - 32; ccid &= GEN12_MAX_CONTEXT_HW_ID; if (ccid < BITS_PER_LONG) { GEM_BUG_ON(ccid == 0); GEM_BUG_ON(test_bit(ccid - 1, &engine->context_tag)); - set_bit(ccid - 1, &engine->context_tag); + __set_bit(ccid - 1, &engine->context_tag); } intel_context_update_runtime(ce); @@ -1456,26 +1457,23 @@ __execlists_schedule_out(struct i915_request *rq, */ if (ce->engine != engine) kick_siblings(rq, ce); - - intel_context_put(ce); } static inline void execlists_schedule_out(struct i915_request *rq) { struct intel_context * const ce = rq->context; - struct intel_engine_cs *cur, *old; - u32 ccid; trace_i915_request_out(rq); - ccid = rq->context->lrc.ccid; - old = READ_ONCE(ce->inflight); - do - cur = ptr_unmask_bits(old, 2) ? ptr_dec(old) : NULL; - while (!try_cmpxchg(&ce->inflight, &old, cur)); - if (!cur) - __execlists_schedule_out(rq, old, ccid); + GEM_BUG_ON(!ce->inflight); + ce->inflight = ptr_dec(ce->inflight); + if (!intel_context_inflight_count(ce)) { + GEM_BUG_ON(ce->inflight != rq->engine); + __execlists_schedule_out(rq); + WRITE_ONCE(ce->inflight, NULL); + intel_context_put(ce); + } i915_request_put(rq); } From patchwork Tue Nov 24 11:42:13 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Chris Wilson X-Patchwork-Id: 11928115 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-16.8 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED,USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 2B81CC6379D for ; Tue, 24 Nov 2020 11:42:39 +0000 (UTC) Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id BD0A8206D8 for ; Tue, 24 Nov 2020 11:42:38 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org BD0A8206D8 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=chris-wilson.co.uk Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=intel-gfx-bounces@lists.freedesktop.org Received: from gabe.freedesktop.org (localhost [127.0.0.1]) by gabe.freedesktop.org (Postfix) with ESMTP id 8D4396E288; Tue, 24 Nov 2020 11:42:33 +0000 (UTC) Received: from fireflyinternet.com (unknown [77.68.26.236]) by gabe.freedesktop.org (Postfix) with ESMTPS id 281FE6E249 for ; Tue, 24 Nov 2020 11:42:30 +0000 (UTC) X-Default-Received-SPF: pass (skip=forwardok (res=PASS)) x-ip-name=78.156.65.138; Received: from build.alporthouse.com (unverified [78.156.65.138]) by fireflyinternet.com (Firefly Internet (M1)) with ESMTP id 23090932-1500050 for multiple; Tue, 24 Nov 2020 11:42:22 +0000 From: Chris Wilson To: intel-gfx@lists.freedesktop.org Date: Tue, 24 Nov 2020 11:42:13 +0000 Message-Id: <20201124114219.29020-10-chris@chris-wilson.co.uk> X-Mailer: git-send-email 2.20.1 In-Reply-To: <20201124114219.29020-1-chris@chris-wilson.co.uk> References: <20201124114219.29020-1-chris@chris-wilson.co.uk> MIME-Version: 1.0 Subject: [Intel-gfx] [PATCH 10/16] drm/i915/gt: Use virtual_engine during execlists_dequeue X-BeenThere: intel-gfx@lists.freedesktop.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Intel graphics driver community testing & development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Chris Wilson Errors-To: intel-gfx-bounces@lists.freedesktop.org Sender: "Intel-gfx" Rather than going back and forth between the rb_node entry and the virtual_engine type, store the ve local and reuse it. As the container_of conversion from rb_node to virtual_engine requires a variable offset, performing that conversion just once shaves off a bit of code. v2: Keep a single virtual engine lookup, for typical use. Signed-off-by: Chris Wilson Cc: Tvrtko Ursulin --- drivers/gpu/drm/i915/gt/intel_lrc.c | 239 ++++++++++++---------------- 1 file changed, 105 insertions(+), 134 deletions(-) diff --git a/drivers/gpu/drm/i915/gt/intel_lrc.c b/drivers/gpu/drm/i915/gt/intel_lrc.c index 3a6a9539b54e..a4f6e446f188 100644 --- a/drivers/gpu/drm/i915/gt/intel_lrc.c +++ b/drivers/gpu/drm/i915/gt/intel_lrc.c @@ -454,9 +454,15 @@ static int queue_prio(const struct intel_engine_execlists *execlists) return ((p->priority + 1) << I915_USER_PRIORITY_SHIFT) - ffs(p->used); } +static int virtual_prio(const struct intel_engine_execlists *el) +{ + struct rb_node *rb = rb_first_cached(&el->virtual); + + return rb ? rb_entry(rb, struct ve_node, rb)->prio : INT_MIN; +} + static inline bool need_preempt(const struct intel_engine_cs *engine, - const struct i915_request *rq, - struct rb_node *rb) + const struct i915_request *rq) { int last_prio; @@ -493,25 +499,6 @@ static inline bool need_preempt(const struct intel_engine_cs *engine, rq_prio(list_next_entry(rq, sched.link)) > last_prio) return true; - if (rb) { - struct virtual_engine *ve = - rb_entry(rb, typeof(*ve), nodes[engine->id].rb); - bool preempt = false; - - if (engine == ve->siblings[0]) { /* only preempt one sibling */ - struct i915_request *next; - - rcu_read_lock(); - next = READ_ONCE(ve->request); - if (next) - preempt = rq_prio(next) > last_prio; - rcu_read_unlock(); - } - - if (preempt) - return preempt; - } - /* * If the inflight context did not trigger the preemption, then maybe * it was the set of queued requests? Pick the highest priority in @@ -522,7 +509,8 @@ static inline bool need_preempt(const struct intel_engine_cs *engine, * ELSP[0] or ELSP[1] as, thanks again to PI, if it was the same * context, it's priority would not exceed ELSP[0] aka last_prio. */ - return queue_prio(&engine->execlists) > last_prio; + return max(virtual_prio(&engine->execlists), + queue_prio(&engine->execlists)) > last_prio; } __maybe_unused static inline bool @@ -1808,6 +1796,35 @@ static bool virtual_matches(const struct virtual_engine *ve, return true; } +static struct virtual_engine * +first_virtual_engine(struct intel_engine_cs *engine) +{ + struct intel_engine_execlists *el = &engine->execlists; + struct rb_node *rb = rb_first_cached(&el->virtual); + + while (rb) { + struct virtual_engine *ve = + rb_entry(rb, typeof(*ve), nodes[engine->id].rb); + struct i915_request *rq = READ_ONCE(ve->request); + + /* lazily cleanup after another engine handled rq */ + if (!rq) { + rb_erase_cached(rb, &el->virtual); + RB_CLEAR_NODE(rb); + rb = rb_first_cached(&el->virtual); + continue; + } + + if (!virtual_matches(ve, rq, engine)) { + rb = rb_next(rb); + continue; + } + return ve; + } + + return NULL; +} + static void virtual_xfer_context(struct virtual_engine *ve, struct intel_engine_cs *engine) { @@ -1896,32 +1913,15 @@ static void defer_active(struct intel_engine_cs *engine) static bool need_timeslice(const struct intel_engine_cs *engine, - const struct i915_request *rq, - const struct rb_node *rb) + const struct i915_request *rq) { int hint; if (!intel_engine_has_timeslices(engine)) return false; - hint = engine->execlists.queue_priority_hint; - - if (rb) { - const struct virtual_engine *ve = - rb_entry(rb, typeof(*ve), nodes[engine->id].rb); - const struct intel_engine_cs *inflight = - intel_context_inflight(&ve->context); - - if (!inflight || inflight == engine) { - struct i915_request *next; - - rcu_read_lock(); - next = READ_ONCE(ve->request); - if (next) - hint = max(hint, rq_prio(next)); - rcu_read_unlock(); - } - } + hint = max(engine->execlists.queue_priority_hint, + virtual_prio(&engine->execlists)); if (!list_is_last(&rq->sched.link, &engine->active.requests)) hint = max(hint, rq_prio(list_next_entry(rq, sched.link))); @@ -2068,6 +2068,7 @@ static void execlists_dequeue(struct intel_engine_cs *engine) struct i915_request **port = execlists->pending; struct i915_request ** const last_port = port + execlists->port_mask; struct i915_request *last = *execlists->active; + struct virtual_engine *ve; struct rb_node *rb; bool submit = false; @@ -2095,26 +2096,6 @@ static void execlists_dequeue(struct intel_engine_cs *engine) spin_lock(&engine->active.lock); - for (rb = rb_first_cached(&execlists->virtual); rb; ) { - struct virtual_engine *ve = - rb_entry(rb, typeof(*ve), nodes[engine->id].rb); - struct i915_request *rq = READ_ONCE(ve->request); - - if (!rq) { /* lazily cleanup after another engine handled rq */ - rb_erase_cached(rb, &execlists->virtual); - RB_CLEAR_NODE(rb); - rb = rb_first_cached(&execlists->virtual); - continue; - } - - if (!virtual_matches(ve, rq, engine)) { - rb = rb_next(rb); - continue; - } - - break; - } - /* * If the queue is higher priority than the last * request in the currently active context, submit afresh. @@ -2137,7 +2118,7 @@ static void execlists_dequeue(struct intel_engine_cs *engine) if (last) { if (i915_request_completed(last)) { goto check_secondary; - } else if (need_preempt(engine, last, rb)) { + } else if (need_preempt(engine, last)) { ENGINE_TRACE(engine, "preempting last=%llx:%lld, prio=%d, hint=%d\n", last->fence.context, @@ -2163,7 +2144,7 @@ static void execlists_dequeue(struct intel_engine_cs *engine) __unwind_incomplete_requests(engine); last = NULL; - } else if (need_timeslice(engine, last, rb) && + } else if (need_timeslice(engine, last) && timeslice_expired(execlists, last)) { ENGINE_TRACE(engine, "expired last=%llx:%lld, prio=%d, hint=%d, yield?=%s\n", @@ -2214,96 +2195,86 @@ static void execlists_dequeue(struct intel_engine_cs *engine) } } - while (rb) { /* XXX virtual is always taking precedence */ - struct virtual_engine *ve = - rb_entry(rb, typeof(*ve), nodes[engine->id].rb); + /* XXX virtual is always taking precedence */ + while ((ve = first_virtual_engine(engine))) { struct i915_request *rq; spin_lock(&ve->base.active.lock); rq = ve->request; - if (unlikely(!rq)) { /* lost the race to a sibling */ - spin_unlock(&ve->base.active.lock); - rb_erase_cached(rb, &execlists->virtual); - RB_CLEAR_NODE(rb); - rb = rb_first_cached(&execlists->virtual); - continue; - } + if (unlikely(!rq)) /* lost the race to a sibling */ + goto unlock; - GEM_BUG_ON(rq != ve->request); GEM_BUG_ON(rq->engine != &ve->base); GEM_BUG_ON(rq->context != &ve->context); - if (rq_prio(rq) >= queue_prio(execlists)) { - if (!virtual_matches(ve, rq, engine)) { - spin_unlock(&ve->base.active.lock); - rb = rb_next(rb); - continue; - } + if (unlikely(rq_prio(rq) < queue_prio(execlists))) { + spin_unlock(&ve->base.active.lock); + break; + } - if (last && !can_merge_rq(last, rq)) { - spin_unlock(&ve->base.active.lock); - spin_unlock(&engine->active.lock); - start_timeslice(engine, rq_prio(rq)); - return; /* leave this for another sibling */ - } + GEM_BUG_ON(!virtual_matches(ve, rq, engine)); - ENGINE_TRACE(engine, - "virtual rq=%llx:%lld%s, new engine? %s\n", - rq->fence.context, - rq->fence.seqno, - i915_request_completed(rq) ? "!" : - i915_request_started(rq) ? "*" : - "", - yesno(engine != ve->siblings[0])); - - WRITE_ONCE(ve->request, NULL); - WRITE_ONCE(ve->base.execlists.queue_priority_hint, - INT_MIN); - rb_erase_cached(rb, &execlists->virtual); - RB_CLEAR_NODE(rb); + if (last && !can_merge_rq(last, rq)) { + spin_unlock(&ve->base.active.lock); + spin_unlock(&engine->active.lock); + start_timeslice(engine, rq_prio(rq)); + return; /* leave this for another sibling */ + } - GEM_BUG_ON(!(rq->execution_mask & engine->mask)); - WRITE_ONCE(rq->engine, engine); + ENGINE_TRACE(engine, + "virtual rq=%llx:%lld%s, new engine? %s\n", + rq->fence.context, + rq->fence.seqno, + i915_request_completed(rq) ? "!" : + i915_request_started(rq) ? "*" : + "", + yesno(engine != ve->siblings[0])); - if (__i915_request_submit(rq)) { - /* - * Only after we confirm that we will submit - * this request (i.e. it has not already - * completed), do we want to update the context. - * - * This serves two purposes. It avoids - * unnecessary work if we are resubmitting an - * already completed request after timeslicing. - * But more importantly, it prevents us altering - * ve->siblings[] on an idle context, where - * we may be using ve->siblings[] in - * virtual_context_enter / virtual_context_exit. - */ - virtual_xfer_context(ve, engine); - GEM_BUG_ON(ve->siblings[0] != engine); + WRITE_ONCE(ve->request, NULL); + WRITE_ONCE(ve->base.execlists.queue_priority_hint, INT_MIN); - submit = true; - last = rq; - } - i915_request_put(rq); + rb = &ve->nodes[engine->id].rb; + rb_erase_cached(rb, &execlists->virtual); + RB_CLEAR_NODE(rb); + + GEM_BUG_ON(!(rq->execution_mask & engine->mask)); + WRITE_ONCE(rq->engine, engine); + if (__i915_request_submit(rq)) { /* - * Hmm, we have a bunch of virtual engine requests, - * but the first one was already completed (thanks - * preempt-to-busy!). Keep looking at the veng queue - * until we have no more relevant requests (i.e. - * the normal submit queue has higher priority). + * Only after we confirm that we will submit + * this request (i.e. it has not already + * completed), do we want to update the context. + * + * This serves two purposes. It avoids + * unnecessary work if we are resubmitting an + * already completed request after timeslicing. + * But more importantly, it prevents us altering + * ve->siblings[] on an idle context, where + * we may be using ve->siblings[] in + * virtual_context_enter / virtual_context_exit. */ - if (!submit) { - spin_unlock(&ve->base.active.lock); - rb = rb_first_cached(&execlists->virtual); - continue; - } + virtual_xfer_context(ve, engine); + GEM_BUG_ON(ve->siblings[0] != engine); + + submit = true; + last = rq; } + i915_request_put(rq); +unlock: spin_unlock(&ve->base.active.lock); - break; + + /* + * Hmm, we have a bunch of virtual engine requests, + * but the first one was already completed (thanks + * preempt-to-busy!). Keep looking at the veng queue + * until we have no more relevant requests (i.e. + * the normal submit queue has higher priority). + */ + if (submit) + break; } while ((rb = rb_first_cached(&execlists->queue))) { From patchwork Tue Nov 24 11:42:14 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Chris Wilson X-Patchwork-Id: 11928113 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-16.8 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED,USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id EA2E4C56202 for ; Tue, 24 Nov 2020 11:42:37 +0000 (UTC) Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 945A0206D8 for ; Tue, 24 Nov 2020 11:42:37 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 945A0206D8 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=chris-wilson.co.uk Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=intel-gfx-bounces@lists.freedesktop.org Received: from gabe.freedesktop.org (localhost [127.0.0.1]) by gabe.freedesktop.org (Postfix) with ESMTP id 4E4D26E25A; Tue, 24 Nov 2020 11:42:33 +0000 (UTC) Received: from fireflyinternet.com (unknown [77.68.26.236]) by gabe.freedesktop.org (Postfix) with ESMTPS id 367EE6E24E for ; Tue, 24 Nov 2020 11:42:31 +0000 (UTC) X-Default-Received-SPF: pass (skip=forwardok (res=PASS)) x-ip-name=78.156.65.138; Received: from build.alporthouse.com (unverified [78.156.65.138]) by fireflyinternet.com (Firefly Internet (M1)) with ESMTP id 23090933-1500050 for multiple; Tue, 24 Nov 2020 11:42:22 +0000 From: Chris Wilson To: intel-gfx@lists.freedesktop.org Date: Tue, 24 Nov 2020 11:42:14 +0000 Message-Id: <20201124114219.29020-11-chris@chris-wilson.co.uk> X-Mailer: git-send-email 2.20.1 In-Reply-To: <20201124114219.29020-1-chris@chris-wilson.co.uk> References: <20201124114219.29020-1-chris@chris-wilson.co.uk> MIME-Version: 1.0 Subject: [Intel-gfx] [PATCH 11/16] drm/i915/gt: Decouple inflight virtual engines X-BeenThere: intel-gfx@lists.freedesktop.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Intel graphics driver community testing & development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Chris Wilson Errors-To: intel-gfx-bounces@lists.freedesktop.org Sender: "Intel-gfx" Once a virtual engine has been bound to a sibling, it will remain bound until we finally schedule out the last active request. We can not rebind the context to a new sibling while it is inflight as the context save will conflict, hence we wait. As we cannot then use any other sibliing while the context is inflight, only kick the bound sibling while it inflight and upon scheduling out the kick the rest (so that we can swap engines on timeslicing if the previously bound engine becomes oversubscribed). Signed-off-by: Chris Wilson --- drivers/gpu/drm/i915/gt/intel_lrc.c | 28 ++++++++++++---------------- 1 file changed, 12 insertions(+), 16 deletions(-) diff --git a/drivers/gpu/drm/i915/gt/intel_lrc.c b/drivers/gpu/drm/i915/gt/intel_lrc.c index a4f6e446f188..575a5b7d5324 100644 --- a/drivers/gpu/drm/i915/gt/intel_lrc.c +++ b/drivers/gpu/drm/i915/gt/intel_lrc.c @@ -1387,9 +1387,8 @@ static inline void execlists_schedule_in(struct i915_request *rq, int idx) static void kick_siblings(struct i915_request *rq, struct intel_context *ce) { struct virtual_engine *ve = container_of(ce, typeof(*ve), context); - struct i915_request *next = READ_ONCE(ve->request); - if (next == rq || (next && next->execution_mask & ~rq->execution_mask)) + if (READ_ONCE(ve->request)) tasklet_hi_schedule(&ve->base.execlists.tasklet); } @@ -1808,17 +1807,13 @@ first_virtual_engine(struct intel_engine_cs *engine) struct i915_request *rq = READ_ONCE(ve->request); /* lazily cleanup after another engine handled rq */ - if (!rq) { + if (!rq || !virtual_matches(ve, rq, engine)) { rb_erase_cached(rb, &el->virtual); RB_CLEAR_NODE(rb); rb = rb_first_cached(&el->virtual); continue; } - if (!virtual_matches(ve, rq, engine)) { - rb = rb_next(rb); - continue; - } return ve; } @@ -5591,7 +5586,6 @@ static void virtual_submission_tasklet(unsigned long data) if (unlikely(!mask)) return; - local_irq_disable(); for (n = 0; n < ve->num_siblings; n++) { struct intel_engine_cs *sibling = READ_ONCE(ve->siblings[n]); struct ve_node * const node = &ve->nodes[sibling->id]; @@ -5601,20 +5595,19 @@ static void virtual_submission_tasklet(unsigned long data) if (!READ_ONCE(ve->request)) break; /* already handled by a sibling's tasklet */ + spin_lock_irq(&sibling->active.lock); + if (unlikely(!(mask & sibling->mask))) { if (!RB_EMPTY_NODE(&node->rb)) { - spin_lock(&sibling->active.lock); rb_erase_cached(&node->rb, &sibling->execlists.virtual); RB_CLEAR_NODE(&node->rb); - spin_unlock(&sibling->active.lock); } - continue; - } - spin_lock(&sibling->active.lock); + goto unlock_engine; + } - if (!RB_EMPTY_NODE(&node->rb)) { + if (unlikely(!RB_EMPTY_NODE(&node->rb))) { /* * Cheat and avoid rebalancing the tree if we can * reuse this node in situ. @@ -5654,9 +5647,12 @@ static void virtual_submission_tasklet(unsigned long data) if (first && prio > sibling->execlists.queue_priority_hint) tasklet_hi_schedule(&sibling->execlists.tasklet); - spin_unlock(&sibling->active.lock); +unlock_engine: + spin_unlock_irq(&sibling->active.lock); + + if (intel_context_inflight(&ve->context)) + break; } - local_irq_enable(); } static void virtual_submit_request(struct i915_request *rq) From patchwork Tue Nov 24 11:42:15 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Chris Wilson X-Patchwork-Id: 11928109 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-16.8 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED,USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id A99CBC63798 for ; Tue, 24 Nov 2020 11:42:35 +0000 (UTC) Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 27976206D8 for ; Tue, 24 Nov 2020 11:42:35 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 27976206D8 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=chris-wilson.co.uk Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=intel-gfx-bounces@lists.freedesktop.org Received: from gabe.freedesktop.org (localhost [127.0.0.1]) by gabe.freedesktop.org (Postfix) with ESMTP id D2DAC6E249; Tue, 24 Nov 2020 11:42:32 +0000 (UTC) Received: from fireflyinternet.com (unknown [77.68.26.236]) by gabe.freedesktop.org (Postfix) with ESMTPS id 14A896E23D for ; Tue, 24 Nov 2020 11:42:29 +0000 (UTC) X-Default-Received-SPF: pass (skip=forwardok (res=PASS)) x-ip-name=78.156.65.138; Received: from build.alporthouse.com (unverified [78.156.65.138]) by fireflyinternet.com (Firefly Internet (M1)) with ESMTP id 23090934-1500050 for multiple; Tue, 24 Nov 2020 11:42:22 +0000 From: Chris Wilson To: intel-gfx@lists.freedesktop.org Date: Tue, 24 Nov 2020 11:42:15 +0000 Message-Id: <20201124114219.29020-12-chris@chris-wilson.co.uk> X-Mailer: git-send-email 2.20.1 In-Reply-To: <20201124114219.29020-1-chris@chris-wilson.co.uk> References: <20201124114219.29020-1-chris@chris-wilson.co.uk> MIME-Version: 1.0 Subject: [Intel-gfx] [PATCH 12/16] drm/i915/gt: Defer schedule_out until after the next dequeue X-BeenThere: intel-gfx@lists.freedesktop.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Intel graphics driver community testing & development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Chris Wilson Errors-To: intel-gfx-bounces@lists.freedesktop.org Sender: "Intel-gfx" Inside schedule_out, we do extra work upon idling the context, such as updating the runtime, kicking off retires, kicking virtual engines. However, if we are in a series of processing single requests per contexts, we may find ourselves scheduling out the context, only to immediately schedule it back in during dequeue. This is just extra work that we can avoid if we keep the context marked as inflight across the dequeue. This becomes more significant later on for minimising virtual engine misses. Signed-off-by: Chris Wilson --- drivers/gpu/drm/i915/gt/intel_context_types.h | 4 +- drivers/gpu/drm/i915/gt/intel_lrc.c | 123 ++++++++++++------ 2 files changed, 82 insertions(+), 45 deletions(-) diff --git a/drivers/gpu/drm/i915/gt/intel_context_types.h b/drivers/gpu/drm/i915/gt/intel_context_types.h index 52fa9c132746..10830eeab0f7 100644 --- a/drivers/gpu/drm/i915/gt/intel_context_types.h +++ b/drivers/gpu/drm/i915/gt/intel_context_types.h @@ -58,8 +58,8 @@ struct intel_context { struct intel_engine_cs *engine; struct intel_engine_cs *inflight; -#define intel_context_inflight(ce) ptr_mask_bits(READ_ONCE((ce)->inflight), 2) -#define intel_context_inflight_count(ce) ptr_unmask_bits(READ_ONCE((ce)->inflight), 2) +#define intel_context_inflight(ce) ptr_mask_bits(READ_ONCE((ce)->inflight), 3) +#define intel_context_inflight_count(ce) ptr_unmask_bits(READ_ONCE((ce)->inflight), 3) struct i915_address_space *vm; struct i915_gem_context __rcu *gem_context; diff --git a/drivers/gpu/drm/i915/gt/intel_lrc.c b/drivers/gpu/drm/i915/gt/intel_lrc.c index 575a5b7d5324..56e899739dd7 100644 --- a/drivers/gpu/drm/i915/gt/intel_lrc.c +++ b/drivers/gpu/drm/i915/gt/intel_lrc.c @@ -2044,19 +2044,6 @@ static void set_preempt_timeout(struct intel_engine_cs *engine, active_preempt_timeout(engine, rq)); } -static inline void clear_ports(struct i915_request **ports, int count) -{ - memset_p((void **)ports, NULL, count); -} - -static inline void -copy_ports(struct i915_request **dst, struct i915_request **src, int count) -{ - /* A memcpy_p() would be very useful here! */ - while (count--) - WRITE_ONCE(*dst++, *src++); /* avoid write tearing */ -} - static void execlists_dequeue(struct intel_engine_cs *engine) { struct intel_engine_execlists * const execlists = &engine->execlists; @@ -2392,6 +2379,8 @@ static void execlists_dequeue(struct intel_engine_cs *engine) start_timeslice(engine, execlists->queue_priority_hint); skip_submit: ring_set_paused(engine, 0); + while (port-- != execlists->pending) + i915_request_put(*port); *execlists->pending = NULL; } } @@ -2403,22 +2392,38 @@ static void execlists_dequeue_irq(struct intel_engine_cs *engine) local_irq_enable(); /* flush irq_work (e.g. breadcrumb enabling) */ } -static void -cancel_port_requests(struct intel_engine_execlists * const execlists) +static inline void clear_ports(struct i915_request **ports, int count) +{ + memset_p((void **)ports, NULL, count); +} + +static inline void +copy_ports(struct i915_request **dst, struct i915_request **src, int count) +{ + /* A memcpy_p() would be very useful here! */ + while (count--) + WRITE_ONCE(*dst++, *src++); /* avoid write tearing */ +} + +static struct i915_request ** +cancel_port_requests(struct intel_engine_execlists * const execlists, + struct i915_request **inactive) { struct i915_request * const *port; for (port = execlists->pending; *port; port++) - execlists_schedule_out(*port); + *inactive++ = *port; clear_ports(execlists->pending, ARRAY_SIZE(execlists->pending)); /* Mark the end of active before we overwrite *active */ for (port = xchg(&execlists->active, execlists->pending); *port; port++) - execlists_schedule_out(*port); + *inactive++ = *port; clear_ports(execlists->inflight, ARRAY_SIZE(execlists->inflight)); smp_wmb(); /* complete the seqlock for execlists_active() */ WRITE_ONCE(execlists->active, execlists->inflight); + + return inactive; } static inline void @@ -2546,7 +2551,8 @@ csb_read(const struct intel_engine_cs *engine, u64 * const csb) return entry; } -static void process_csb(struct intel_engine_cs *engine) +static struct i915_request ** +process_csb(struct intel_engine_cs *engine, struct i915_request **inactive) { struct intel_engine_execlists * const execlists = &engine->execlists; u64 * const buf = execlists->csb_status; @@ -2575,7 +2581,7 @@ static void process_csb(struct intel_engine_cs *engine) head = execlists->csb_head; tail = READ_ONCE(*execlists->csb_write); if (unlikely(head == tail)) - return; + return inactive; /* * We will consume all events from HW, or at least pretend to. @@ -2655,7 +2661,7 @@ static void process_csb(struct intel_engine_cs *engine) /* cancel old inflight, prepare for switch */ trace_ports(execlists, "preempted", old); while (*old) - execlists_schedule_out(*old++); + *inactive++ = *old++; /* switch pending to inflight */ GEM_BUG_ON(!assert_pending_valid(execlists, "promote")); @@ -2717,7 +2723,7 @@ static void process_csb(struct intel_engine_cs *engine) regs[CTX_RING_TAIL]); } - execlists_schedule_out(*execlists->active++); + *inactive++ = *execlists->active++; GEM_BUG_ON(execlists->active - execlists->inflight > execlists_num_ports(execlists)); @@ -2738,6 +2744,15 @@ static void process_csb(struct intel_engine_cs *engine) * invalidation before. */ invalidate_csb_entries(&buf[0], &buf[num_entries - 1]); + + return inactive; +} + +static void post_process_csb(struct i915_request **port, + struct i915_request **last) +{ + while (port != last) + execlists_schedule_out(*port++); } static void __execlists_hold(struct i915_request *rq) @@ -3010,8 +3025,8 @@ active_context(struct intel_engine_cs *engine, u32 ccid) for (port = el->active; (rq = *port); port++) { if (rq->context->lrc.ccid == ccid) { ENGINE_TRACE(engine, - "ccid found at active:%zd\n", - port - el->active); + "ccid:%x found at active:%zd\n", + ccid, port - el->active); return rq; } } @@ -3019,8 +3034,8 @@ active_context(struct intel_engine_cs *engine, u32 ccid) for (port = el->pending; (rq = *port); port++) { if (rq->context->lrc.ccid == ccid) { ENGINE_TRACE(engine, - "ccid found at pending:%zd\n", - port - el->pending); + "ccid:%x found at pending:%zd\n", + ccid, port - el->pending); return rq; } } @@ -3138,8 +3153,11 @@ static bool preempt_timeout(const struct intel_engine_cs *const engine) static void execlists_submission_tasklet(unsigned long data) { struct intel_engine_cs * const engine = (struct intel_engine_cs *)data; + struct i915_request *post[2 * EXECLIST_MAX_PORTS]; + struct i915_request **inactive; - process_csb(engine); + inactive = process_csb(engine, post); + GEM_BUG_ON(inactive - post > ARRAY_SIZE(post)); if (unlikely(preempt_timeout(engine))) engine->execlists.error_interrupt |= ERROR_PREEMPT; @@ -3163,6 +3181,8 @@ static void execlists_submission_tasklet(unsigned long data) if (!engine->execlists.pending[0]) execlists_dequeue_irq(engine); + + post_process_csb(post, inactive); } static void __execlists_kick(struct intel_engine_execlists *execlists) @@ -4108,8 +4128,6 @@ static void enable_execlists(struct intel_engine_cs *engine) ENGINE_POSTING_READ(engine, RING_HWS_PGA); enable_error_interrupt(engine); - - engine->context_tag = GENMASK(BITS_PER_LONG - 2, 0); } static bool unexpected_starting_state(struct intel_engine_cs *engine) @@ -4198,22 +4216,29 @@ static void __execlists_reset_reg_state(const struct intel_context *ce, __reset_stop_ring(regs, engine); } -static void __execlists_reset(struct intel_engine_cs *engine, bool stalled) +static struct i915_request **reset_csb(struct intel_engine_cs *engine, + struct i915_request **inactive) { struct intel_engine_execlists * const execlists = &engine->execlists; - struct intel_context *ce; - struct i915_request *rq; - u32 head; mb(); /* paranoia: read the CSB pointers from after the reset */ clflush(execlists->csb_write); mb(); - process_csb(engine); /* drain preemption events */ + inactive = process_csb(engine, inactive); /* drain preemption events */ /* Following the reset, we need to reload the CSB read/write pointers */ reset_csb_pointers(engine); + return inactive; +} + +static void execlists_reset_active(struct intel_engine_cs *engine, bool stalled) +{ + struct intel_context *ce; + struct i915_request *rq; + u32 head; + /* * Save the currently executing context, even if we completed * its request, it was still running at the time of the @@ -4221,7 +4246,7 @@ static void __execlists_reset(struct intel_engine_cs *engine, bool stalled) */ rq = active_context(engine, engine->execlists.reset_ccid); if (!rq) - goto unwind; + return; ce = rq->context; GEM_BUG_ON(!i915_vma_is_pinned(ce->state)); @@ -4284,11 +4309,20 @@ static void __execlists_reset(struct intel_engine_cs *engine, bool stalled) __execlists_reset_reg_state(ce, engine); __execlists_update_reg_state(ce, engine, head); ce->lrc.desc |= CTX_DESC_FORCE_RESTORE; /* paranoid: GPU was reset! */ +} -unwind: - /* Push back any incomplete requests for replay after the reset. */ - cancel_port_requests(execlists); - __unwind_incomplete_requests(engine); +static void execlists_reset_csb(struct intel_engine_cs *engine, bool stalled) +{ + struct intel_engine_execlists * const execlists = &engine->execlists; + struct i915_request *post[2 * EXECLIST_MAX_PORTS]; + struct i915_request **inactive; + + inactive = reset_csb(engine, post); + + execlists_reset_active(engine, true); + + inactive = cancel_port_requests(execlists, inactive); + post_process_csb(post, inactive); } static void execlists_reset_rewind(struct intel_engine_cs *engine, bool stalled) @@ -4297,10 +4331,12 @@ static void execlists_reset_rewind(struct intel_engine_cs *engine, bool stalled) ENGINE_TRACE(engine, "\n"); - spin_lock_irqsave(&engine->active.lock, flags); - - __execlists_reset(engine, stalled); + /* Process the csb, find the guilty context and throw away */ + execlists_reset_csb(engine, stalled); + /* Push back any incomplete requests for replay after the reset. */ + spin_lock_irqsave(&engine->active.lock, flags); + __unwind_incomplete_requests(engine); spin_unlock_irqrestore(&engine->active.lock, flags); } @@ -4335,9 +4371,9 @@ static void execlists_reset_cancel(struct intel_engine_cs *engine) * submission's irq state, we also wish to remind ourselves that * it is irq state.) */ - spin_lock_irqsave(&engine->active.lock, flags); + execlists_reset_csb(engine, true); - __execlists_reset(engine, true); + spin_lock_irqsave(&engine->active.lock, flags); /* Mark all executing requests as skipped. */ list_for_each_entry(rq, &engine->active.requests, sched.link) @@ -5163,6 +5199,7 @@ int intel_execlists_submission_setup(struct intel_engine_cs *engine) else execlists->csb_size = GEN11_CSB_ENTRIES; + engine->context_tag = GENMASK(BITS_PER_LONG - 2, 0); if (INTEL_GEN(engine->i915) >= 11) { execlists->ccid |= engine->instance << (GEN11_ENGINE_INSTANCE_SHIFT - 32); execlists->ccid |= engine->class << (GEN11_ENGINE_CLASS_SHIFT - 32); From patchwork Tue Nov 24 11:42:16 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Chris Wilson X-Patchwork-Id: 11928111 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-16.8 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED,USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id E9D9AC2D0E4 for ; Tue, 24 Nov 2020 11:42:36 +0000 (UTC) Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 94899206D8 for ; Tue, 24 Nov 2020 11:42:36 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 94899206D8 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=chris-wilson.co.uk Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=intel-gfx-bounces@lists.freedesktop.org Received: from gabe.freedesktop.org (localhost [127.0.0.1]) by gabe.freedesktop.org (Postfix) with ESMTP id 179206E252; Tue, 24 Nov 2020 11:42:33 +0000 (UTC) Received: from fireflyinternet.com (unknown [77.68.26.236]) by gabe.freedesktop.org (Postfix) with ESMTPS id 4FCFD6E25A for ; Tue, 24 Nov 2020 11:42:31 +0000 (UTC) X-Default-Received-SPF: pass (skip=forwardok (res=PASS)) x-ip-name=78.156.65.138; Received: from build.alporthouse.com (unverified [78.156.65.138]) by fireflyinternet.com (Firefly Internet (M1)) with ESMTP id 23090935-1500050 for multiple; Tue, 24 Nov 2020 11:42:22 +0000 From: Chris Wilson To: intel-gfx@lists.freedesktop.org Date: Tue, 24 Nov 2020 11:42:16 +0000 Message-Id: <20201124114219.29020-13-chris@chris-wilson.co.uk> X-Mailer: git-send-email 2.20.1 In-Reply-To: <20201124114219.29020-1-chris@chris-wilson.co.uk> References: <20201124114219.29020-1-chris@chris-wilson.co.uk> MIME-Version: 1.0 Subject: [Intel-gfx] [PATCH 13/16] drm/i915/gt: Remove virtual breadcrumb before transfer X-BeenThere: intel-gfx@lists.freedesktop.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Intel graphics driver community testing & development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Chris Wilson Errors-To: intel-gfx-bounces@lists.freedesktop.org Sender: "Intel-gfx" The issue with stale virtual breadcrumbs remain. Now we have the problem that if the irq-signaler is still referencing the stale breadcrumb as we transfer it to a new sibling, the list becomes spaghetti. This is a very small window, but that doesn't stop it being hit infrequently. To prevent the lists being tangled (the iterator starting on one engine's b->signalers but walking onto another list), always decouple the virtual breadcrumb on schedule-out and make sure that the walker has stepped out of the lists. Signed-off-by: Chris Wilson --- drivers/gpu/drm/i915/gt/intel_breadcrumbs.c | 5 +++-- drivers/gpu/drm/i915/gt/intel_lrc.c | 15 +++++++++++++++ 2 files changed, 18 insertions(+), 2 deletions(-) diff --git a/drivers/gpu/drm/i915/gt/intel_breadcrumbs.c b/drivers/gpu/drm/i915/gt/intel_breadcrumbs.c index f5f6feed0fa6..5b84f51931d9 100644 --- a/drivers/gpu/drm/i915/gt/intel_breadcrumbs.c +++ b/drivers/gpu/drm/i915/gt/intel_breadcrumbs.c @@ -461,15 +461,16 @@ void i915_request_cancel_breadcrumb(struct i915_request *rq) { struct intel_breadcrumbs *b = READ_ONCE(rq->engine)->breadcrumbs; struct intel_context *ce = rq->context; + unsigned long flags; bool release; if (!test_and_clear_bit(I915_FENCE_FLAG_SIGNAL, &rq->fence.flags)) return; - spin_lock(&ce->signal_lock); + spin_lock_irqsave(&ce->signal_lock, flags); list_del_rcu(&rq->signal_link); release = remove_signaling_context(b, ce); - spin_unlock(&ce->signal_lock); + spin_unlock_irqrestore(&ce->signal_lock, flags); if (release) intel_context_put(ce); diff --git a/drivers/gpu/drm/i915/gt/intel_lrc.c b/drivers/gpu/drm/i915/gt/intel_lrc.c index 56e899739dd7..3b9a95240b10 100644 --- a/drivers/gpu/drm/i915/gt/intel_lrc.c +++ b/drivers/gpu/drm/i915/gt/intel_lrc.c @@ -1387,6 +1387,21 @@ static inline void execlists_schedule_in(struct i915_request *rq, int idx) static void kick_siblings(struct i915_request *rq, struct intel_context *ce) { struct virtual_engine *ve = container_of(ce, typeof(*ve), context); + struct intel_engine_cs *engine = rq->engine; + + /* Flush concurrent rcu iterators in signal_irq_work */ + if (test_bit(DMA_FENCE_FLAG_ENABLE_SIGNAL_BIT, &rq->fence.flags)) { + /* + * After this point, the rq may be transferred to a new + * sibling, so before we clear ce->inflight make sure that + * the context has been removed from the b->signalers and + * furthermore we need to make sure that the concurrent + * iterator in signal_irq_work is no longer following + * ce->signal_link. + */ + i915_request_cancel_breadcrumb(rq); + irq_work_sync(&engine->breadcrumbs->irq_work); + } if (READ_ONCE(ve->request)) tasklet_hi_schedule(&ve->base.execlists.tasklet); From patchwork Tue Nov 24 11:42:17 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Chris Wilson X-Patchwork-Id: 11928133 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-16.8 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED,USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id B2267C64E7A for ; Tue, 24 Nov 2020 11:42:50 +0000 (UTC) Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 505B32075A for ; Tue, 24 Nov 2020 11:42:50 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 505B32075A Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=chris-wilson.co.uk Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=intel-gfx-bounces@lists.freedesktop.org Received: from gabe.freedesktop.org (localhost [127.0.0.1]) by gabe.freedesktop.org (Postfix) with ESMTP id 3426F6E2D1; Tue, 24 Nov 2020 11:42:46 +0000 (UTC) Received: from fireflyinternet.com (unknown [77.68.26.236]) by gabe.freedesktop.org (Postfix) with ESMTPS id 27D616E241 for ; Tue, 24 Nov 2020 11:42:31 +0000 (UTC) X-Default-Received-SPF: pass (skip=forwardok (res=PASS)) x-ip-name=78.156.65.138; Received: from build.alporthouse.com (unverified [78.156.65.138]) by fireflyinternet.com (Firefly Internet (M1)) with ESMTP id 23090936-1500050 for multiple; Tue, 24 Nov 2020 11:42:23 +0000 From: Chris Wilson To: intel-gfx@lists.freedesktop.org Date: Tue, 24 Nov 2020 11:42:17 +0000 Message-Id: <20201124114219.29020-14-chris@chris-wilson.co.uk> X-Mailer: git-send-email 2.20.1 In-Reply-To: <20201124114219.29020-1-chris@chris-wilson.co.uk> References: <20201124114219.29020-1-chris@chris-wilson.co.uk> MIME-Version: 1.0 Subject: [Intel-gfx] [PATCH 14/16] drm/i915/gt: Shrink the critical section for irq signaling X-BeenThere: intel-gfx@lists.freedesktop.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Intel graphics driver community testing & development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Chris Wilson Errors-To: intel-gfx-bounces@lists.freedesktop.org Sender: "Intel-gfx" Let's only wait for the list iterator when decoupling the virtual breadcrumb, as the signaling of all the requests may take a long time, during which we do not want to keep the tasklet spinning. Signed-off-by: Chris Wilson --- drivers/gpu/drm/i915/gt/intel_breadcrumbs.c | 2 ++ drivers/gpu/drm/i915/gt/intel_breadcrumbs_types.h | 1 + drivers/gpu/drm/i915/gt/intel_lrc.c | 3 ++- 3 files changed, 5 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/i915/gt/intel_breadcrumbs.c b/drivers/gpu/drm/i915/gt/intel_breadcrumbs.c index 5b84f51931d9..3e69d6c3b197 100644 --- a/drivers/gpu/drm/i915/gt/intel_breadcrumbs.c +++ b/drivers/gpu/drm/i915/gt/intel_breadcrumbs.c @@ -251,6 +251,7 @@ static void signal_irq_work(struct irq_work *work) intel_breadcrumbs_disarm_irq(b); rcu_read_lock(); + atomic_inc(&b->signaler_active); list_for_each_entry_rcu(ce, &b->signalers, signal_link) { struct i915_request *rq; @@ -284,6 +285,7 @@ static void signal_irq_work(struct irq_work *work) } } } + atomic_dec(&b->signaler_active); rcu_read_unlock(); llist_for_each_safe(signal, sn, signal) { diff --git a/drivers/gpu/drm/i915/gt/intel_breadcrumbs_types.h b/drivers/gpu/drm/i915/gt/intel_breadcrumbs_types.h index a74bb3062bd8..f672053d694d 100644 --- a/drivers/gpu/drm/i915/gt/intel_breadcrumbs_types.h +++ b/drivers/gpu/drm/i915/gt/intel_breadcrumbs_types.h @@ -35,6 +35,7 @@ struct intel_breadcrumbs { spinlock_t signalers_lock; /* protects the list of signalers */ struct list_head signalers; struct llist_head signaled_requests; + atomic_t signaler_active; spinlock_t irq_lock; /* protects the interrupt from hardirq context */ struct irq_work irq_work; /* for use from inside irq_lock */ diff --git a/drivers/gpu/drm/i915/gt/intel_lrc.c b/drivers/gpu/drm/i915/gt/intel_lrc.c index 3b9a95240b10..9247c03acdd0 100644 --- a/drivers/gpu/drm/i915/gt/intel_lrc.c +++ b/drivers/gpu/drm/i915/gt/intel_lrc.c @@ -1400,7 +1400,8 @@ static void kick_siblings(struct i915_request *rq, struct intel_context *ce) * ce->signal_link. */ i915_request_cancel_breadcrumb(rq); - irq_work_sync(&engine->breadcrumbs->irq_work); + while (atomic_read(&engine->breadcrumbs->signaler_active)) + cpu_relax(); } if (READ_ONCE(ve->request)) From patchwork Tue Nov 24 11:42:18 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Chris Wilson X-Patchwork-Id: 11928105 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-16.8 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED,USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id B76F7C2D0E4 for ; Tue, 24 Nov 2020 11:42:32 +0000 (UTC) Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id D7BCF206D8 for ; Tue, 24 Nov 2020 11:42:31 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org D7BCF206D8 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=chris-wilson.co.uk Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=intel-gfx-bounces@lists.freedesktop.org Received: from gabe.freedesktop.org (localhost [127.0.0.1]) by gabe.freedesktop.org (Postfix) with ESMTP id E542F6E23D; Tue, 24 Nov 2020 11:42:30 +0000 (UTC) Received: from fireflyinternet.com (unknown [77.68.26.236]) by gabe.freedesktop.org (Postfix) with ESMTPS id 219946E23D for ; Tue, 24 Nov 2020 11:42:28 +0000 (UTC) X-Default-Received-SPF: pass (skip=forwardok (res=PASS)) x-ip-name=78.156.65.138; Received: from build.alporthouse.com (unverified [78.156.65.138]) by fireflyinternet.com (Firefly Internet (M1)) with ESMTP id 23090937-1500050 for multiple; Tue, 24 Nov 2020 11:42:23 +0000 From: Chris Wilson To: intel-gfx@lists.freedesktop.org Date: Tue, 24 Nov 2020 11:42:18 +0000 Message-Id: <20201124114219.29020-15-chris@chris-wilson.co.uk> X-Mailer: git-send-email 2.20.1 In-Reply-To: <20201124114219.29020-1-chris@chris-wilson.co.uk> References: <20201124114219.29020-1-chris@chris-wilson.co.uk> MIME-Version: 1.0 Subject: [Intel-gfx] [PATCH 15/16] drm/i915/gt: Resubmit the virtual engine on schedule-out X-BeenThere: intel-gfx@lists.freedesktop.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Intel graphics driver community testing & development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Chris Wilson Errors-To: intel-gfx-bounces@lists.freedesktop.org Sender: "Intel-gfx" Having recognised that we do not change the sibling until we schedule out, we can then defer the decision to resubmit the virtual engine from the unwind of the active queue to scheduling out of the virtual context. This improves our resilence in virtual engine scheduling, and should eliminate the rare cases of gem_exec_balance failing. By keeping the unwind order intact on the local engine, we can preserve data dependency ordering while doing a preempt-to-busy pass until we have determined the new ELSP. This means that if we try to timeslice between a virtual engine and a data-dependent ordinary request, the pair will maintain their relative ordering and we will avoid the resubmission, cancelling the timeslicing until further change. The dilemma though is that we then may end up in a situation where the 'demotion' of the virtual request to an ordinary request in the engine queue results in filling the ELSP[] with virtual requests instead of spreading the load across the engines. To compensate for this, we mark each virtual request and refuse to resubmit a virtual request in the secondary ELSP slots, thus forcing subsequent virtual requests to be scheduled out after timeslicing. By delaying the decision until we schedule out, we will avoid unnecessary resubmission. Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/2079 Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/2098 Signed-off-by: Chris Wilson Cc: Tvrtko Ursulin --- drivers/gpu/drm/i915/gt/intel_lrc.c | 122 +++++++++++++++---------- drivers/gpu/drm/i915/gt/selftest_lrc.c | 2 +- 2 files changed, 77 insertions(+), 47 deletions(-) diff --git a/drivers/gpu/drm/i915/gt/intel_lrc.c b/drivers/gpu/drm/i915/gt/intel_lrc.c index 9247c03acdd0..9e9a87e86c5f 100644 --- a/drivers/gpu/drm/i915/gt/intel_lrc.c +++ b/drivers/gpu/drm/i915/gt/intel_lrc.c @@ -1111,38 +1111,23 @@ __unwind_incomplete_requests(struct intel_engine_cs *engine) __i915_request_unsubmit(rq); - /* - * Push the request back into the queue for later resubmission. - * If this request is not native to this physical engine (i.e. - * it came from a virtual source), push it back onto the virtual - * engine so that it can be moved across onto another physical - * engine as load dictates. - */ - if (likely(rq->execution_mask == engine->mask)) { - GEM_BUG_ON(rq_prio(rq) == I915_PRIORITY_INVALID); - if (rq_prio(rq) != prio) { - prio = rq_prio(rq); - pl = i915_sched_lookup_priolist(engine, prio); - } - GEM_BUG_ON(RB_EMPTY_ROOT(&engine->execlists.queue.rb_root)); - - list_move(&rq->sched.link, pl); - set_bit(I915_FENCE_FLAG_PQUEUE, &rq->fence.flags); + GEM_BUG_ON(rq_prio(rq) == I915_PRIORITY_INVALID); + if (rq_prio(rq) != prio) { + prio = rq_prio(rq); + pl = i915_sched_lookup_priolist(engine, prio); + } + GEM_BUG_ON(RB_EMPTY_ROOT(&engine->execlists.queue.rb_root)); - /* Check in case we rollback so far we wrap [size/2] */ - if (intel_ring_direction(rq->ring, - rq->tail, - rq->ring->tail + 8) > 0) - rq->context->lrc.desc |= CTX_DESC_FORCE_RESTORE; + list_move(&rq->sched.link, pl); + set_bit(I915_FENCE_FLAG_PQUEUE, &rq->fence.flags); - active = rq; - } else { - struct intel_engine_cs *owner = rq->context->engine; + /* Check in case we rollback so far we wrap [size/2] */ + if (intel_ring_direction(rq->ring, + rq->tail, + rq->ring->tail + 8) > 0) + rq->context->lrc.desc |= CTX_DESC_FORCE_RESTORE; - WRITE_ONCE(rq->engine, owner); - owner->submit_request(rq); - active = NULL; - } + active = rq; } return active; @@ -1384,6 +1369,20 @@ static inline void execlists_schedule_in(struct i915_request *rq, int idx) GEM_BUG_ON(intel_context_inflight(ce) != rq->engine); } +static void +resubmit_virtual_request(struct i915_request *rq, struct virtual_engine *ve) +{ + struct intel_engine_cs *engine = rq->engine; + + spin_lock_irq(&engine->active.lock); + + clear_bit(I915_FENCE_FLAG_PQUEUE, &rq->fence.flags); + WRITE_ONCE(rq->engine, &ve->base); + ve->base.submit_request(rq); + + spin_unlock_irq(&engine->active.lock); +} + static void kick_siblings(struct i915_request *rq, struct intel_context *ce) { struct virtual_engine *ve = container_of(ce, typeof(*ve), context); @@ -1404,6 +1403,16 @@ static void kick_siblings(struct i915_request *rq, struct intel_context *ce) cpu_relax(); } + /* + * This engine is now too busy to run this virtual request, so + * see if we can find an alternative engine for it to execute on. + * Once a request has become bonded to this engine, we treat it the + * same as other native request. + */ + if (i915_request_in_priority_queue(rq) && + rq->execution_mask != engine->mask) + resubmit_virtual_request(rq, ve); + if (READ_ONCE(ve->request)) tasklet_hi_schedule(&ve->base.execlists.tasklet); } @@ -1648,6 +1657,20 @@ assert_pending_valid(const struct intel_engine_execlists *execlists, } sentinel = i915_request_has_sentinel(rq); + /* + * We want virtual requests to only be in the first slot so + * that they are never stuck behind a hog and can be immediately + * transferred onto the next idle engine. + */ + if (rq->execution_mask != engine->mask && + port != execlists->pending) { + GEM_TRACE_ERR("%s: virtual engine:%llx not in prime position[%zd]\n", + engine->name, + ce->timeline->fence_context, + port - execlists->pending); + return false; + } + /* Hold tightly onto the lock to prevent concurrent retires! */ if (!spin_trylock_irqsave(&rq->lock, flags)) continue; @@ -2314,6 +2337,15 @@ static void execlists_dequeue(struct intel_engine_cs *engine) if (i915_request_has_sentinel(last)) goto done; + /* + * We avoid submitting virtual requests into + * the secondary ports so that we can migrate + * the request immediately to another engine + * rather than wait for the primary request. + */ + if (rq->execution_mask != engine->mask) + goto done; + /* * If GVT overrides us we only ever submit * port[0], leaving port[1] empty. Note that we @@ -5711,7 +5743,6 @@ static void virtual_submission_tasklet(unsigned long data) static void virtual_submit_request(struct i915_request *rq) { struct virtual_engine *ve = to_virtual_engine(rq->engine); - struct i915_request *old; unsigned long flags; ENGINE_TRACE(&ve->base, "rq=%llx:%lld\n", @@ -5722,28 +5753,27 @@ static void virtual_submit_request(struct i915_request *rq) spin_lock_irqsave(&ve->base.active.lock, flags); - old = ve->request; - if (old) { /* background completion event from preempt-to-busy */ - GEM_BUG_ON(!i915_request_completed(old)); - __i915_request_submit(old); - i915_request_put(old); - } - + /* By the time we resubmit a request, it may be completed */ if (i915_request_completed(rq)) { __i915_request_submit(rq); + goto unlock; + } - ve->base.execlists.queue_priority_hint = INT_MIN; - ve->request = NULL; - } else { - ve->base.execlists.queue_priority_hint = rq_prio(rq); - ve->request = i915_request_get(rq); + if (ve->request) { /* background completion from preempt-to-busy */ + GEM_BUG_ON(!i915_request_completed(ve->request)); + __i915_request_submit(ve->request); + i915_request_put(ve->request); + } - GEM_BUG_ON(!list_empty(virtual_queue(ve))); - list_move_tail(&rq->sched.link, virtual_queue(ve)); + ve->base.execlists.queue_priority_hint = rq_prio(rq); + ve->request = i915_request_get(rq); - tasklet_hi_schedule(&ve->base.execlists.tasklet); - } + GEM_BUG_ON(!list_empty(virtual_queue(ve))); + list_move_tail(&rq->sched.link, virtual_queue(ve)); + tasklet_hi_schedule(&ve->base.execlists.tasklet); + +unlock: spin_unlock_irqrestore(&ve->base.active.lock, flags); } diff --git a/drivers/gpu/drm/i915/gt/selftest_lrc.c b/drivers/gpu/drm/i915/gt/selftest_lrc.c index 37cb51c3f4f6..fbbd8343d7f6 100644 --- a/drivers/gpu/drm/i915/gt/selftest_lrc.c +++ b/drivers/gpu/drm/i915/gt/selftest_lrc.c @@ -4589,7 +4589,7 @@ static int reset_virtual_engine(struct intel_gt *gt, spin_lock_irq(&engine->active.lock); __unwind_incomplete_requests(engine); spin_unlock_irq(&engine->active.lock); - GEM_BUG_ON(rq->engine != ve->engine); + GEM_BUG_ON(rq->engine != engine); /* Reset the engine while keeping our active request on hold */ execlists_hold(engine, rq); From patchwork Tue Nov 24 11:42:19 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Chris Wilson X-Patchwork-Id: 11928117 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-16.8 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED,USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 3E159C56201 for ; Tue, 24 Nov 2020 11:42:40 +0000 (UTC) Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id BD9FB206D8 for ; Tue, 24 Nov 2020 11:42:39 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org BD9FB206D8 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=chris-wilson.co.uk Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=intel-gfx-bounces@lists.freedesktop.org Received: from gabe.freedesktop.org (localhost [127.0.0.1]) by gabe.freedesktop.org (Postfix) with ESMTP id 807C96E284; Tue, 24 Nov 2020 11:42:33 +0000 (UTC) Received: from fireflyinternet.com (unknown [77.68.26.236]) by gabe.freedesktop.org (Postfix) with ESMTPS id 51B146E270 for ; Tue, 24 Nov 2020 11:42:31 +0000 (UTC) X-Default-Received-SPF: pass (skip=forwardok (res=PASS)) x-ip-name=78.156.65.138; Received: from build.alporthouse.com (unverified [78.156.65.138]) by fireflyinternet.com (Firefly Internet (M1)) with ESMTP id 23090938-1500050 for multiple; Tue, 24 Nov 2020 11:42:23 +0000 From: Chris Wilson To: intel-gfx@lists.freedesktop.org Date: Tue, 24 Nov 2020 11:42:19 +0000 Message-Id: <20201124114219.29020-16-chris@chris-wilson.co.uk> X-Mailer: git-send-email 2.20.1 In-Reply-To: <20201124114219.29020-1-chris@chris-wilson.co.uk> References: <20201124114219.29020-1-chris@chris-wilson.co.uk> MIME-Version: 1.0 Subject: [Intel-gfx] [PATCH 16/16] drm/i915/gt: Simplify virtual engine handling for execlists_hold() X-BeenThere: intel-gfx@lists.freedesktop.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Intel graphics driver community testing & development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Chris Wilson Errors-To: intel-gfx-bounces@lists.freedesktop.org Sender: "Intel-gfx" Now that the tasklet completely controls scheduling of the requests, and we postpone scheduling out the old requests, we can keep a hanging virtual request bound to the engine on which it hung, and remove it from te queue. On release, it will be returned to the same engine and remain in its queue until it is scheduled; after which point it will become eligible for transfer to a sibling. Instead, we could opt to resubmit the request along the virtual engine on unhold, making it eligible for load balancing immediately -- but that seems like a pointless optimisation for a hanging context. Signed-off-by: Chris Wilson --- drivers/gpu/drm/i915/gt/intel_lrc.c | 29 ----------------------------- 1 file changed, 29 deletions(-) diff --git a/drivers/gpu/drm/i915/gt/intel_lrc.c b/drivers/gpu/drm/i915/gt/intel_lrc.c index 9e9a87e86c5f..fd2c18da71a9 100644 --- a/drivers/gpu/drm/i915/gt/intel_lrc.c +++ b/drivers/gpu/drm/i915/gt/intel_lrc.c @@ -2852,35 +2852,6 @@ static bool execlists_hold(struct intel_engine_cs *engine, goto unlock; } - if (rq->engine != engine) { /* preempted virtual engine */ - struct virtual_engine *ve = to_virtual_engine(rq->engine); - - /* - * intel_context_inflight() is only protected by virtue - * of process_csb() being called only by the tasklet (or - * directly from inside reset while the tasklet is suspended). - * Assert that neither of those are allowed to run while we - * poke at the request queues. - */ - GEM_BUG_ON(!reset_in_progress(&engine->execlists)); - - /* - * An unsubmitted request along a virtual engine will - * remain on the active (this) engine until we are able - * to process the context switch away (and so mark the - * context as no longer in flight). That cannot have happened - * yet, otherwise we would not be hanging! - */ - spin_lock(&ve->base.active.lock); - GEM_BUG_ON(intel_context_inflight(rq->context) != engine); - GEM_BUG_ON(ve->request != rq); - ve->request = NULL; - spin_unlock(&ve->base.active.lock); - i915_request_put(rq); - - rq->engine = engine; - } - /* * Transfer this request onto the hold queue to prevent it * being resumbitted to HW (and potentially completed) before we have