From patchwork Mon Jun 10 07:20:59 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Chris Wilson X-Patchwork-Id: 10984189 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id D98B81932 for ; Mon, 10 Jun 2019 07:21:58 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id C210628816 for ; Mon, 10 Jun 2019 07:21:58 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id B6E2528842; Mon, 10 Jun 2019 07:21:58 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-5.2 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_MED autolearn=ham version=3.3.1 Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177]) (using TLSv1.2 with cipher DHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.wl.linuxfoundation.org (Postfix) with ESMTPS id 4275E2881C for ; Mon, 10 Jun 2019 07:21:57 +0000 (UTC) Received: from gabe.freedesktop.org (localhost [127.0.0.1]) by gabe.freedesktop.org (Postfix) with ESMTP id 093C88912A; Mon, 10 Jun 2019 07:21:56 +0000 (UTC) X-Original-To: intel-gfx@lists.freedesktop.org Delivered-To: intel-gfx@lists.freedesktop.org Received: from fireflyinternet.com (mail.fireflyinternet.com [109.228.58.192]) by gabe.freedesktop.org (Postfix) with ESMTPS id 6CA3389125 for ; Mon, 10 Jun 2019 07:21:50 +0000 (UTC) X-Default-Received-SPF: pass (skip=forwardok (res=PASS)) x-ip-name=78.156.65.138; Received: from haswell.alporthouse.com (unverified [78.156.65.138]) by fireflyinternet.com (Firefly Internet (M1)) with ESMTP id 16848354-1500050 for multiple; Mon, 10 Jun 2019 08:21:27 +0100 From: Chris Wilson To: intel-gfx@lists.freedesktop.org Date: Mon, 10 Jun 2019 08:20:59 +0100 Message-Id: <20190610072126.6355-2-chris@chris-wilson.co.uk> X-Mailer: git-send-email 2.20.1 In-Reply-To: <20190610072126.6355-1-chris@chris-wilson.co.uk> References: <20190610072126.6355-1-chris@chris-wilson.co.uk> MIME-Version: 1.0 Subject: [Intel-gfx] [PATCH 01/28] drm/i915: Move fence register tracking from i915->mm to ggtt X-BeenThere: intel-gfx@lists.freedesktop.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: Intel graphics driver community testing & development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: intel-gfx-bounces@lists.freedesktop.org Sender: "Intel-gfx" X-Virus-Scanned: ClamAV using ClamSMTP As the fence registers only apply to regions inside the GGTT, it makes more sense that we track these as part of the i915_ggtt and not the general mm. In the next patch, we will then pull the register locking underneath the i915_ggtt.mutex.
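Roughly, the layout this moves towards is sketched below. The types and names (marked with a _sketch suffix) are simplified, hypothetical stand-ins and not the structures touched by the patch; the point is only that fence and userfault state now hangs off the GGTT rather than the device-wide mm.

#include <linux/list.h>
#include <linux/types.h>

#define MAX_NUM_FENCES_SKETCH 32

struct i915_vma;

/* Stand-in for a single hardware fence register slot. */
struct fence_reg_sketch {
	struct list_head link;	/* position in the GGTT's fence LRU */
	struct i915_vma *vma;	/* GGTT binding currently using this fence */
	int id;
	int pin_count;
};

/*
 * Everything that only makes sense for GGTT ranges lives in the GGTT
 * itself: the fence registers, their LRU, and the userspace GGTT mmaps
 * that must be revoked on suspend/reset.
 */
struct ggtt_sketch {
	unsigned int num_fences;
	struct fence_reg_sketch fence_regs[MAX_NUM_FENCES_SKETCH];
	struct list_head fence_list;
	struct list_head userfault_list;
};

static void ggtt_init_fences_sketch(struct ggtt_sketch *ggtt,
				    unsigned int num_fences)
{
	unsigned int i;

	INIT_LIST_HEAD(&ggtt->fence_list);
	INIT_LIST_HEAD(&ggtt->userfault_list);

	for (i = 0; i < num_fences; i++) {
		struct fence_reg_sketch *fence = &ggtt->fence_regs[i];

		fence->id = i;
		fence->vma = NULL;
		fence->pin_count = 0;
		list_add_tail(&fence->link, &ggtt->fence_list);
	}

	ggtt->num_fences = num_fences;
}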
Signed-off-by: Chris Wilson Reviewed-by: Mika Kuoppala --- drivers/gpu/drm/i915/gem/i915_gem_mman.c | 4 +- drivers/gpu/drm/i915/gem/i915_gem_pm.c | 2 +- drivers/gpu/drm/i915/gt/intel_reset.c | 6 +- drivers/gpu/drm/i915/gvt/aperture_gm.c | 7 +- drivers/gpu/drm/i915/gvt/gvt.h | 4 +- drivers/gpu/drm/i915/i915_debugfs.c | 42 +++++------ drivers/gpu/drm/i915/i915_drv.c | 3 +- drivers/gpu/drm/i915/i915_drv.h | 28 -------- drivers/gpu/drm/i915/i915_gem.c | 52 +++----------- drivers/gpu/drm/i915/i915_gem_fence_reg.c | 85 +++++++++++++++++------ drivers/gpu/drm/i915/i915_gem_fence_reg.h | 19 ++++- drivers/gpu/drm/i915/i915_gem_gtt.c | 2 + drivers/gpu/drm/i915/i915_gem_gtt.h | 14 +++- drivers/gpu/drm/i915/i915_gpu_error.c | 6 +- drivers/gpu/drm/i915/i915_vma.h | 2 +- 15 files changed, 144 insertions(+), 132 deletions(-) diff --git a/drivers/gpu/drm/i915/gem/i915_gem_mman.c b/drivers/gpu/drm/i915/gem/i915_gem_mman.c index c7b9b34de01b..a8b8b9c281f1 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_mman.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_mman.c @@ -310,9 +310,9 @@ vm_fault_t i915_gem_fault(struct vm_fault *vmf) /* Mark as being mmapped into userspace for later revocation */ assert_rpm_wakelock_held(i915); if (!i915_vma_set_userfault(vma) && !obj->userfault_count++) - list_add(&obj->userfault_link, &i915->mm.userfault_list); + list_add(&obj->userfault_link, &i915->ggtt.userfault_list); if (CONFIG_DRM_I915_USERFAULT_AUTOSUSPEND) - intel_wakeref_auto(&i915->mm.userfault_wakeref, + intel_wakeref_auto(&i915->ggtt.userfault_wakeref, msecs_to_jiffies_timeout(CONFIG_DRM_I915_USERFAULT_AUTOSUSPEND)); GEM_BUG_ON(!obj->userfault_count); diff --git a/drivers/gpu/drm/i915/gem/i915_gem_pm.c b/drivers/gpu/drm/i915/gem/i915_gem_pm.c index f40f13c0b8b7..6d6064fb2bf5 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_pm.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_pm.c @@ -126,7 +126,7 @@ void i915_gem_suspend(struct drm_i915_private *i915) { GEM_TRACE("\n"); - intel_wakeref_auto(&i915->mm.userfault_wakeref, 0); + intel_wakeref_auto(&i915->ggtt.userfault_wakeref, 0); flush_workqueue(i915->wq); mutex_lock(&i915->drm.struct_mutex); diff --git a/drivers/gpu/drm/i915/gt/intel_reset.c b/drivers/gpu/drm/i915/gt/intel_reset.c index a6ecfdc735c4..1e93cf6eede4 100644 --- a/drivers/gpu/drm/i915/gt/intel_reset.c +++ b/drivers/gpu/drm/i915/gt/intel_reset.c @@ -695,19 +695,19 @@ static void revoke_mmaps(struct drm_i915_private *i915) { int i; - for (i = 0; i < i915->num_fence_regs; i++) { + for (i = 0; i < i915->ggtt.num_fences; i++) { struct drm_vma_offset_node *node; struct i915_vma *vma; u64 vma_offset; - vma = READ_ONCE(i915->fence_regs[i].vma); + vma = READ_ONCE(i915->ggtt.fence_regs[i].vma); if (!vma) continue; if (!i915_vma_has_userfault(vma)) continue; - GEM_BUG_ON(vma->fence != &i915->fence_regs[i]); + GEM_BUG_ON(vma->fence != &i915->ggtt.fence_regs[i]); node = &vma->obj->base.vma_node; vma_offset = vma->ggtt_view.partial.offset << PAGE_SHIFT; unmap_mapping_range(i915->drm.anon_inode->i_mapping, diff --git a/drivers/gpu/drm/i915/gvt/aperture_gm.c b/drivers/gpu/drm/i915/gvt/aperture_gm.c index 1fa2f65c3cd1..4098902bfaeb 100644 --- a/drivers/gpu/drm/i915/gvt/aperture_gm.c +++ b/drivers/gpu/drm/i915/gvt/aperture_gm.c @@ -35,6 +35,7 @@ */ #include "i915_drv.h" +#include "i915_gem_fence_reg.h" #include "gvt.h" static int alloc_gm(struct intel_vgpu *vgpu, bool high_gm) @@ -128,7 +129,7 @@ void intel_vgpu_write_fence(struct intel_vgpu *vgpu, { struct intel_gvt *gvt = vgpu->gvt; struct drm_i915_private *dev_priv = gvt->dev_priv; - 
struct drm_i915_fence_reg *reg; + struct i915_fence_reg *reg; i915_reg_t fence_reg_lo, fence_reg_hi; assert_rpm_wakelock_held(dev_priv); @@ -163,7 +164,7 @@ static void free_vgpu_fence(struct intel_vgpu *vgpu) { struct intel_gvt *gvt = vgpu->gvt; struct drm_i915_private *dev_priv = gvt->dev_priv; - struct drm_i915_fence_reg *reg; + struct i915_fence_reg *reg; u32 i; if (WARN_ON(!vgpu_fence_sz(vgpu))) @@ -187,7 +188,7 @@ static int alloc_vgpu_fence(struct intel_vgpu *vgpu) { struct intel_gvt *gvt = vgpu->gvt; struct drm_i915_private *dev_priv = gvt->dev_priv; - struct drm_i915_fence_reg *reg; + struct i915_fence_reg *reg; int i; intel_runtime_pm_get(dev_priv); diff --git a/drivers/gpu/drm/i915/gvt/gvt.h b/drivers/gpu/drm/i915/gvt/gvt.h index b54f2bdc13a4..dfd10cf82b65 100644 --- a/drivers/gpu/drm/i915/gvt/gvt.h +++ b/drivers/gpu/drm/i915/gvt/gvt.h @@ -87,7 +87,7 @@ struct intel_vgpu_gm { /* Fences owned by a vGPU */ struct intel_vgpu_fence { - struct drm_i915_fence_reg *regs[INTEL_GVT_MAX_NUM_FENCES]; + struct i915_fence_reg *regs[INTEL_GVT_MAX_NUM_FENCES]; u32 base; u32 size; }; @@ -390,7 +390,7 @@ int intel_gvt_load_firmware(struct intel_gvt *gvt); #define gvt_hidden_gmadr_end(gvt) (gvt_hidden_gmadr_base(gvt) \ + gvt_hidden_sz(gvt) - 1) -#define gvt_fence_sz(gvt) (gvt->dev_priv->num_fence_regs) +#define gvt_fence_sz(gvt) ((gvt)->dev_priv->ggtt.num_fences) /* Aperture/GM space definitions for vGPU */ #define vgpu_aperture_offset(vgpu) ((vgpu)->gm.low_gm_node.start) diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c index f212241a2758..331b2f478c48 100644 --- a/drivers/gpu/drm/i915/i915_debugfs.c +++ b/drivers/gpu/drm/i915/i915_debugfs.c @@ -156,8 +156,6 @@ describe_obj(struct seq_file *m, struct drm_i915_gem_object *obj) unsigned int frontbuffer_bits; int pin_count = 0; - lockdep_assert_held(&obj->base.dev->struct_mutex); - seq_printf(m, "%pK: %c%c%c%c%c %8zdKiB %02x %02x %s%s%s", &obj->base, get_active_flag(obj), @@ -173,17 +171,17 @@ describe_obj(struct seq_file *m, struct drm_i915_gem_object *obj) obj->mm.madv == I915_MADV_DONTNEED ? " purgeable" : ""); if (obj->base.name) seq_printf(m, " (name: %d)", obj->base.name); - list_for_each_entry(vma, &obj->vma.list, obj_link) { - if (i915_vma_is_pinned(vma)) - pin_count++; - } - seq_printf(m, " (pinned x %d)", pin_count); - if (obj->pin_global) - seq_printf(m, " (global)"); + + spin_lock(&obj->vma.lock); list_for_each_entry(vma, &obj->vma.list, obj_link) { if (!drm_mm_node_allocated(&vma->node)) continue; + spin_unlock(&obj->vma.lock); + + if (i915_vma_is_pinned(vma)) + pin_count++; + seq_printf(m, " (%sgtt offset: %08llx, size: %08llx, pages: %s", i915_vma_is_ggtt(vma) ? "g" : "pp", vma->node.start, vma->node.size, @@ -234,9 +232,16 @@ describe_obj(struct seq_file *m, struct drm_i915_gem_object *obj) vma->fence->id, i915_active_request_isset(&vma->last_fence) ? 
"*" : ""); seq_puts(m, ")"); + + spin_lock(&obj->vma.lock); } + spin_unlock(&obj->vma.lock); + + seq_printf(m, " (pinned x %d)", pin_count); if (obj->stolen) seq_printf(m, " (stolen: %08llx)", obj->stolen->start); + if (obj->pin_global) + seq_printf(m, " (global)"); engine = i915_gem_object_last_write_engine(obj); if (engine) @@ -870,28 +875,25 @@ static int i915_interrupt_info(struct seq_file *m, void *data) static int i915_gem_fence_regs_info(struct seq_file *m, void *data) { - struct drm_i915_private *dev_priv = node_to_i915(m->private); - struct drm_device *dev = &dev_priv->drm; - int i, ret; + struct drm_i915_private *i915 = node_to_i915(m->private); + unsigned int i; - ret = mutex_lock_interruptible(&dev->struct_mutex); - if (ret) - return ret; + seq_printf(m, "Total fences = %d\n", i915->ggtt.num_fences); - seq_printf(m, "Total fences = %d\n", dev_priv->num_fence_regs); - for (i = 0; i < dev_priv->num_fence_regs; i++) { - struct i915_vma *vma = dev_priv->fence_regs[i].vma; + rcu_read_lock(); + for (i = 0; i < i915->ggtt.num_fences; i++) { + struct i915_vma *vma = i915->ggtt.fence_regs[i].vma; seq_printf(m, "Fence %d, pin count = %d, object = ", - i, dev_priv->fence_regs[i].pin_count); + i, i915->ggtt.fence_regs[i].pin_count); if (!vma) seq_puts(m, "unused"); else describe_obj(m, vma->obj); seq_putc(m, '\n'); } + rcu_read_unlock(); - mutex_unlock(&dev->struct_mutex); return 0; } diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c index 1af6751e1b36..06a76d49ecd0 100644 --- a/drivers/gpu/drm/i915/i915_drv.c +++ b/drivers/gpu/drm/i915/i915_drv.c @@ -350,7 +350,7 @@ static int i915_getparam_ioctl(struct drm_device *dev, void *data, value = pdev->revision; break; case I915_PARAM_NUM_FENCES_AVAIL: - value = dev_priv->num_fence_regs; + value = dev_priv->ggtt.num_fences; break; case I915_PARAM_HAS_OVERLAY: value = dev_priv->overlay ? 1 : 0; @@ -1625,7 +1625,6 @@ static int i915_driver_init_hw(struct drm_i915_private *dev_priv) intel_uncore_sanitize(dev_priv); intel_gt_init_workarounds(dev_priv); - i915_gem_load_init_fences(dev_priv); /* On the 945G/GM, the chipset reports the MSI capability on the * integrated graphics even though the support isn't actually there diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h index 82e55c65289a..9e6eced477e7 100644 --- a/drivers/gpu/drm/i915/i915_drv.h +++ b/drivers/gpu/drm/i915/i915_drv.h @@ -761,14 +761,6 @@ struct i915_gem_mm { */ struct list_head purge_list; - /** List of all objects in gtt_space, currently mmaped by userspace. - * All objects within this list must also be on bound_list. - */ - struct list_head userfault_list; - - /* Manual runtime pm autosuspend delay for user GGTT mmaps */ - struct intel_wakeref_auto userfault_wakeref; - /** * List of objects which are pending destruction. */ @@ -798,9 +790,6 @@ struct i915_gem_mm { struct notifier_block vmap_notifier; struct shrinker shrinker; - /** LRU list of objects with fence regs on them. 
*/ - struct list_head fence_list; - /** * Workqueue to fault in userptr pages, flushed by the execbuf * when required but otherwise left to userspace to try again @@ -1489,9 +1478,6 @@ struct drm_i915_private { /* protects panel power sequencer state */ struct mutex pps_mutex; - struct drm_i915_fence_reg fence_regs[I915_MAX_NUM_FENCES]; /* assume 965 */ - int num_fence_regs; /* 8 on pre-965, 16 otherwise */ - unsigned int fsb_freq, mem_freq, is_ddr3; unsigned int skl_preferred_vco_freq; unsigned int max_cdclk_freq; @@ -2542,7 +2528,6 @@ void i915_gem_cleanup_userptr(struct drm_i915_private *dev_priv); void i915_gem_sanitize(struct drm_i915_private *i915); int i915_gem_init_early(struct drm_i915_private *dev_priv); void i915_gem_cleanup_early(struct drm_i915_private *dev_priv); -void i915_gem_load_init_fences(struct drm_i915_private *dev_priv); int i915_gem_freeze(struct drm_i915_private *dev_priv); int i915_gem_freeze_late(struct drm_i915_private *dev_priv); @@ -2668,19 +2653,6 @@ i915_vm_to_ppgtt(struct i915_address_space *vm) return container_of(vm, struct i915_hw_ppgtt, vm); } -/* i915_gem_fence_reg.c */ -struct drm_i915_fence_reg * -i915_reserve_fence(struct drm_i915_private *dev_priv); -void i915_unreserve_fence(struct drm_i915_fence_reg *fence); - -void i915_gem_restore_fences(struct drm_i915_private *dev_priv); - -void i915_gem_detect_bit_6_swizzle(struct drm_i915_private *dev_priv); -void i915_gem_object_do_bit_17_swizzle(struct drm_i915_gem_object *obj, - struct sg_table *pages); -void i915_gem_object_save_bit_17_swizzle(struct drm_i915_gem_object *obj, - struct sg_table *pages); - static inline struct i915_gem_context * __i915_gem_context_lookup_rcu(struct drm_i915_file_private *file_priv, u32 id) { diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c index 9f2e213c6046..99427d8b9266 100644 --- a/drivers/gpu/drm/i915/i915_gem.c +++ b/drivers/gpu/drm/i915/i915_gem.c @@ -883,7 +883,7 @@ i915_gem_sw_finish_ioctl(struct drm_device *dev, void *data, return 0; } -void i915_gem_runtime_suspend(struct drm_i915_private *dev_priv) +void i915_gem_runtime_suspend(struct drm_i915_private *i915) { struct drm_i915_gem_object *obj, *on; int i; @@ -896,17 +896,19 @@ void i915_gem_runtime_suspend(struct drm_i915_private *dev_priv) */ list_for_each_entry_safe(obj, on, - &dev_priv->mm.userfault_list, userfault_link) + &i915->ggtt.userfault_list, userfault_link) __i915_gem_object_release_mmap(obj); - /* The fence will be lost when the device powers down. If any were + /* + * The fence will be lost when the device powers down. If any were * in use by hardware (i.e. they are pinned), we should not be powering * down! All other fences will be reacquired by the user upon waking. */ - for (i = 0; i < dev_priv->num_fence_regs; i++) { - struct drm_i915_fence_reg *reg = &dev_priv->fence_regs[i]; + for (i = 0; i < i915->ggtt.num_fences; i++) { + struct i915_fence_reg *reg = &i915->ggtt.fence_regs[i]; - /* Ideally we want to assert that the fence register is not + /* + * Ideally we want to assert that the fence register is not * live at this point (i.e. 
that no piece of code will be * trying to write through fence + GTT, as that both violates * our tracking of activity and associated locking/barriers, @@ -1684,7 +1686,7 @@ void i915_gem_fini_hw(struct drm_i915_private *dev_priv) { GEM_BUG_ON(dev_priv->gt.awake); - intel_wakeref_auto_fini(&dev_priv->mm.userfault_wakeref); + intel_wakeref_auto_fini(&dev_priv->ggtt.userfault_wakeref); i915_gem_suspend_late(dev_priv); intel_disable_gt_powersave(dev_priv); @@ -1726,38 +1728,6 @@ void i915_gem_init_mmio(struct drm_i915_private *i915) i915_gem_sanitize(i915); } -void -i915_gem_load_init_fences(struct drm_i915_private *dev_priv) -{ - int i; - - if (INTEL_GEN(dev_priv) >= 7 && !IS_VALLEYVIEW(dev_priv) && - !IS_CHERRYVIEW(dev_priv)) - dev_priv->num_fence_regs = 32; - else if (INTEL_GEN(dev_priv) >= 4 || - IS_I945G(dev_priv) || IS_I945GM(dev_priv) || - IS_G33(dev_priv) || IS_PINEVIEW(dev_priv)) - dev_priv->num_fence_regs = 16; - else - dev_priv->num_fence_regs = 8; - - if (intel_vgpu_active(dev_priv)) - dev_priv->num_fence_regs = - I915_READ(vgtif_reg(avail_rs.fence_num)); - - /* Initialize fence registers to zero */ - for (i = 0; i < dev_priv->num_fence_regs; i++) { - struct drm_i915_fence_reg *fence = &dev_priv->fence_regs[i]; - - fence->i915 = dev_priv; - fence->id = i; - list_add_tail(&fence->link, &dev_priv->mm.fence_list); - } - i915_gem_restore_fences(dev_priv); - - i915_gem_detect_bit_6_swizzle(dev_priv); -} - static void i915_gem_init__mm(struct drm_i915_private *i915) { spin_lock_init(&i915->mm.obj_lock); @@ -1768,10 +1738,6 @@ static void i915_gem_init__mm(struct drm_i915_private *i915) INIT_LIST_HEAD(&i915->mm.purge_list); INIT_LIST_HEAD(&i915->mm.unbound_list); INIT_LIST_HEAD(&i915->mm.bound_list); - INIT_LIST_HEAD(&i915->mm.fence_list); - - INIT_LIST_HEAD(&i915->mm.userfault_list); - intel_wakeref_auto_init(&i915->mm.userfault_wakeref, i915); i915_gem_init__objects(i915); } diff --git a/drivers/gpu/drm/i915/i915_gem_fence_reg.c b/drivers/gpu/drm/i915/i915_gem_fence_reg.c index 10aa6e350bfa..4ba3726556a4 100644 --- a/drivers/gpu/drm/i915/i915_gem_fence_reg.c +++ b/drivers/gpu/drm/i915/i915_gem_fence_reg.c @@ -25,6 +25,7 @@ #include "i915_drv.h" #include "i915_scatterlist.h" +#include "i915_vgpu.h" /** * DOC: fence register handling @@ -58,7 +59,7 @@ #define pipelined 0 -static void i965_write_fence_reg(struct drm_i915_fence_reg *fence, +static void i965_write_fence_reg(struct i915_fence_reg *fence, struct i915_vma *vma) { i915_reg_t fence_reg_lo, fence_reg_hi; @@ -115,7 +116,7 @@ static void i965_write_fence_reg(struct drm_i915_fence_reg *fence, } } -static void i915_write_fence_reg(struct drm_i915_fence_reg *fence, +static void i915_write_fence_reg(struct i915_fence_reg *fence, struct i915_vma *vma) { u32 val; @@ -155,7 +156,7 @@ static void i915_write_fence_reg(struct drm_i915_fence_reg *fence, } } -static void i830_write_fence_reg(struct drm_i915_fence_reg *fence, +static void i830_write_fence_reg(struct i915_fence_reg *fence, struct i915_vma *vma) { u32 val; @@ -187,7 +188,7 @@ static void i830_write_fence_reg(struct drm_i915_fence_reg *fence, } } -static void fence_write(struct drm_i915_fence_reg *fence, +static void fence_write(struct i915_fence_reg *fence, struct i915_vma *vma) { /* @@ -211,7 +212,7 @@ static void fence_write(struct drm_i915_fence_reg *fence, fence->dirty = false; } -static int fence_update(struct drm_i915_fence_reg *fence, +static int fence_update(struct i915_fence_reg *fence, struct i915_vma *vma) { intel_wakeref_t wakeref; @@ -256,7 +257,7 @@ static int 
fence_update(struct drm_i915_fence_reg *fence, old->fence = NULL; } - list_move(&fence->link, &fence->i915->mm.fence_list); + list_move(&fence->link, &fence->i915->ggtt.fence_list); } /* @@ -280,7 +281,7 @@ static int fence_update(struct drm_i915_fence_reg *fence, if (vma) { vma->fence = fence; - list_move_tail(&fence->link, &fence->i915->mm.fence_list); + list_move_tail(&fence->link, &fence->i915->ggtt.fence_list); } intel_runtime_pm_put(fence->i915, wakeref); @@ -300,7 +301,7 @@ static int fence_update(struct drm_i915_fence_reg *fence, */ int i915_vma_put_fence(struct i915_vma *vma) { - struct drm_i915_fence_reg *fence = vma->fence; + struct i915_fence_reg *fence = vma->fence; if (!fence) return 0; @@ -311,11 +312,11 @@ int i915_vma_put_fence(struct i915_vma *vma) return fence_update(fence, NULL); } -static struct drm_i915_fence_reg *fence_find(struct drm_i915_private *i915) +static struct i915_fence_reg *fence_find(struct drm_i915_private *i915) { - struct drm_i915_fence_reg *fence; + struct i915_fence_reg *fence; - list_for_each_entry(fence, &i915->mm.fence_list, link) { + list_for_each_entry(fence, &i915->ggtt.fence_list, link) { GEM_BUG_ON(fence->vma && fence->vma->fence != fence); if (fence->pin_count) @@ -352,7 +353,7 @@ static struct drm_i915_fence_reg *fence_find(struct drm_i915_private *i915) int i915_vma_pin_fence(struct i915_vma *vma) { - struct drm_i915_fence_reg *fence; + struct i915_fence_reg *fence; struct i915_vma *set = i915_gem_object_is_tiled(vma->obj) ? vma : NULL; int err; @@ -369,7 +370,7 @@ i915_vma_pin_fence(struct i915_vma *vma) fence->pin_count++; if (!fence->dirty) { list_move_tail(&fence->link, - &fence->i915->mm.fence_list); + &fence->i915->ggtt.fence_list); return 0; } } else if (set) { @@ -404,10 +405,10 @@ i915_vma_pin_fence(struct i915_vma *vma) * This function walks the fence regs looking for a free one and remove * it from the fence_list. It is used to reserve fence for vGPU to use. */ -struct drm_i915_fence_reg * +struct i915_fence_reg * i915_reserve_fence(struct drm_i915_private *i915) { - struct drm_i915_fence_reg *fence; + struct i915_fence_reg *fence; int count; int ret; @@ -415,7 +416,7 @@ i915_reserve_fence(struct drm_i915_private *i915) /* Keep at least one fence available for the display engine. */ count = 0; - list_for_each_entry(fence, &i915->mm.fence_list, link) + list_for_each_entry(fence, &i915->ggtt.fence_list, link) count += !fence->pin_count; if (count <= 1) return ERR_PTR(-ENOSPC); @@ -441,11 +442,11 @@ i915_reserve_fence(struct drm_i915_private *i915) * * This function add a reserved fence register from vGPU to the fence_list. */ -void i915_unreserve_fence(struct drm_i915_fence_reg *fence) +void i915_unreserve_fence(struct i915_fence_reg *fence) { lockdep_assert_held(&fence->i915->drm.struct_mutex); - list_add(&fence->link, &fence->i915->mm.fence_list); + list_add(&fence->link, &fence->i915->ggtt.fence_list); } /** @@ -461,8 +462,8 @@ void i915_gem_restore_fences(struct drm_i915_private *i915) int i; rcu_read_lock(); /* keep obj alive as we dereference */ - for (i = 0; i < i915->num_fence_regs; i++) { - struct drm_i915_fence_reg *reg = &i915->fence_regs[i]; + for (i = 0; i < i915->ggtt.num_fences; i++) { + struct i915_fence_reg *reg = &i915->ggtt.fence_regs[i]; struct i915_vma *vma = READ_ONCE(reg->vma); GEM_BUG_ON(vma && vma->fence != reg); @@ -534,8 +535,8 @@ void i915_gem_restore_fences(struct drm_i915_private *i915) * Detects bit 6 swizzling of address lookup between IGD access and CPU * access through main memory. 
*/ -void -i915_gem_detect_bit_6_swizzle(struct drm_i915_private *i915) +static void +detect_bit_6_swizzle(struct drm_i915_private *i915) { struct intel_uncore *uncore = &i915->uncore; u32 swizzle_x = I915_BIT_6_SWIZZLE_UNKNOWN; @@ -798,3 +799,43 @@ i915_gem_object_save_bit_17_swizzle(struct drm_i915_gem_object *obj, i++; } } + +void +i915_ggtt_init_fences(struct i915_ggtt *ggtt) +{ + struct drm_i915_private *i915 = ggtt->vm.i915; + int num_fences; + int i; + + INIT_LIST_HEAD(&ggtt->fence_list); + INIT_LIST_HEAD(&ggtt->userfault_list); + intel_wakeref_auto_init(&ggtt->userfault_wakeref, i915); + + detect_bit_6_swizzle(i915); + + if (INTEL_GEN(i915) >= 7 && + !(IS_VALLEYVIEW(i915) || IS_CHERRYVIEW(i915))) + num_fences = 32; + else if (INTEL_GEN(i915) >= 4 || + IS_I945G(i915) || IS_I945GM(i915) || + IS_G33(i915) || IS_PINEVIEW(i915)) + num_fences = 16; + else + num_fences = 8; + + if (intel_vgpu_active(i915)) + num_fences = intel_uncore_read(&i915->uncore, + vgtif_reg(avail_rs.fence_num)); + + /* Initialize fence registers to zero */ + for (i = 0; i < num_fences; i++) { + struct i915_fence_reg *fence = &ggtt->fence_regs[i]; + + fence->i915 = i915; + fence->id = i; + list_add_tail(&fence->link, &ggtt->fence_list); + } + ggtt->num_fences = num_fences; + + i915_gem_restore_fences(i915); +} diff --git a/drivers/gpu/drm/i915/i915_gem_fence_reg.h b/drivers/gpu/drm/i915/i915_gem_fence_reg.h index 09dcaf14121b..d2da98828179 100644 --- a/drivers/gpu/drm/i915/i915_gem_fence_reg.h +++ b/drivers/gpu/drm/i915/i915_gem_fence_reg.h @@ -26,13 +26,17 @@ #define __I915_FENCE_REG_H__ #include +#include +struct drm_i915_gem_object; struct drm_i915_private; +struct i915_ggtt; struct i915_vma; +struct sg_table; #define I965_FENCE_PAGE 4096UL -struct drm_i915_fence_reg { +struct i915_fence_reg { struct list_head link; struct drm_i915_private *i915; struct i915_vma *vma; @@ -49,4 +53,17 @@ struct drm_i915_fence_reg { bool dirty; }; +/* i915_gem_fence_reg.c */ +struct i915_fence_reg *i915_reserve_fence(struct drm_i915_private *i915); +void i915_unreserve_fence(struct i915_fence_reg *fence); + +void i915_gem_restore_fences(struct drm_i915_private *i915); + +void i915_gem_object_do_bit_17_swizzle(struct drm_i915_gem_object *obj, + struct sg_table *pages); +void i915_gem_object_save_bit_17_swizzle(struct drm_i915_gem_object *obj, + struct sg_table *pages); + +void i915_ggtt_init_fences(struct i915_ggtt *ggtt); + #endif diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c index 87be9c1b6021..acc3cb7cb219 100644 --- a/drivers/gpu/drm/i915/i915_gem_gtt.c +++ b/drivers/gpu/drm/i915/i915_gem_gtt.c @@ -3600,6 +3600,8 @@ int i915_ggtt_init_hw(struct drm_i915_private *dev_priv) ggtt->mtrr = arch_phys_wc_add(ggtt->gmadr.start, ggtt->mappable_end); + i915_ggtt_init_fences(ggtt); + /* * Initialise stolen early so that we may reserve preallocated * objects for the BIOS to KMS transition. 
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.h b/drivers/gpu/drm/i915/i915_gem_gtt.h index 12856f9dd1d1..ee396938de10 100644 --- a/drivers/gpu/drm/i915/i915_gem_gtt.h +++ b/drivers/gpu/drm/i915/i915_gem_gtt.h @@ -39,6 +39,7 @@ #include #include "gt/intel_reset.h" +#include "i915_gem_fence_reg.h" #include "i915_request.h" #include "i915_scatterlist.h" #include "i915_selftest.h" @@ -61,7 +62,6 @@ #define I915_MAX_NUM_FENCE_BITS 6 struct drm_i915_file_private; -struct drm_i915_fence_reg; struct drm_i915_gem_object; struct i915_vma; @@ -406,6 +406,18 @@ struct i915_ggtt { u32 pin_bias; + unsigned int num_fences; + struct i915_fence_reg fence_regs[I915_MAX_NUM_FENCES]; + struct list_head fence_list; + + /** List of all objects in gtt_space, currently mmaped by userspace. + * All objects within this list must also be on bound_list. + */ + struct list_head userfault_list; + + /* Manual runtime pm autosuspend delay for user GGTT mmaps */ + struct intel_wakeref_auto userfault_wakeref; + struct drm_mm_node error_capture; struct drm_mm_node uc_fw; }; diff --git a/drivers/gpu/drm/i915/i915_gpu_error.c b/drivers/gpu/drm/i915/i915_gpu_error.c index 193a93857d99..8dc727cbfe68 100644 --- a/drivers/gpu/drm/i915/i915_gpu_error.c +++ b/drivers/gpu/drm/i915/i915_gpu_error.c @@ -1126,13 +1126,13 @@ static void gem_record_fences(struct i915_gpu_state *error) int i; if (INTEL_GEN(dev_priv) >= 6) { - for (i = 0; i < dev_priv->num_fence_regs; i++) + for (i = 0; i < dev_priv->ggtt.num_fences; i++) error->fence[i] = I915_READ64(FENCE_REG_GEN6_LO(i)); } else if (INTEL_GEN(dev_priv) >= 4) { - for (i = 0; i < dev_priv->num_fence_regs; i++) + for (i = 0; i < dev_priv->ggtt.num_fences; i++) error->fence[i] = I915_READ64(FENCE_REG_965_LO(i)); } else { - for (i = 0; i < dev_priv->num_fence_regs; i++) + for (i = 0; i < dev_priv->ggtt.num_fences; i++) error->fence[i] = I915_READ(FENCE_REG(i)); } error->nfence = i; diff --git a/drivers/gpu/drm/i915/i915_vma.h b/drivers/gpu/drm/i915/i915_vma.h index 0c57ab4fed5d..4b769db649bf 100644 --- a/drivers/gpu/drm/i915/i915_vma.h +++ b/drivers/gpu/drm/i915/i915_vma.h @@ -54,7 +54,7 @@ struct i915_vma { struct drm_i915_gem_object *obj; struct i915_address_space *vm; const struct i915_vma_ops *ops; - struct drm_i915_fence_reg *fence; + struct i915_fence_reg *fence; struct reservation_object *resv; /** Alias of obj->resv */ struct sg_table *pages; void __iomem *iomap; From patchwork Mon Jun 10 07:21:00 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Chris Wilson X-Patchwork-Id: 10984191 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 500771580 for ; Mon, 10 Jun 2019 07:21:59 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 3A1F62881C for ; Mon, 10 Jun 2019 07:21:59 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 2E6EB28842; Mon, 10 Jun 2019 07:21:59 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-5.2 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_MED autolearn=ham version=3.3.1 Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177]) (using TLSv1.2 with cipher DHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by 
mail.wl.linuxfoundation.org (Postfix) with ESMTPS id 958052883C for ; Mon, 10 Jun 2019 07:21:58 +0000 (UTC) Received: from gabe.freedesktop.org (localhost [127.0.0.1]) by gabe.freedesktop.org (Postfix) with ESMTP id 455A88913D; Mon, 10 Jun 2019 07:21:56 +0000 (UTC) X-Original-To: intel-gfx@lists.freedesktop.org Delivered-To: intel-gfx@lists.freedesktop.org Received: from fireflyinternet.com (mail.fireflyinternet.com [109.228.58.192]) by gabe.freedesktop.org (Postfix) with ESMTPS id 6F8BC8912A for ; Mon, 10 Jun 2019 07:21:50 +0000 (UTC) X-Default-Received-SPF: pass (skip=forwardok (res=PASS)) x-ip-name=78.156.65.138; Received: from haswell.alporthouse.com (unverified [78.156.65.138]) by fireflyinternet.com (Firefly Internet (M1)) with ESMTP id 16848355-1500050 for multiple; Mon, 10 Jun 2019 08:21:27 +0100 From: Chris Wilson To: intel-gfx@lists.freedesktop.org Date: Mon, 10 Jun 2019 08:21:00 +0100 Message-Id: <20190610072126.6355-3-chris@chris-wilson.co.uk> X-Mailer: git-send-email 2.20.1 In-Reply-To: <20190610072126.6355-1-chris@chris-wilson.co.uk> References: <20190610072126.6355-1-chris@chris-wilson.co.uk> MIME-Version: 1.0 Subject: [Intel-gfx] [PATCH 02/28] drm/i915: Track ggtt fence reservations under its own mutex X-BeenThere: intel-gfx@lists.freedesktop.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: Intel graphics driver community testing & development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: intel-gfx-bounces@lists.freedesktop.org Sender: "Intel-gfx" X-Virus-Scanned: ClamAV using ClamSMTP We can reduce the locking for fence registers from the dev->struct_mutex to a local mutex. We could introduce a mutex for the sole purpose of tracking the fence acquisition, except there is a little bit of overlap with the fault tracking, so use the i915_ggtt.mutex as it covers both. 
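The shape of the locking is roughly as sketched below, using hypothetical _sketch names rather than the driver's real types: the fence LRU is only ever walked with the local mutex held, while the pin count becomes an atomic so that dropping a pin needs no lock at all.

#include <linux/atomic.h>
#include <linux/err.h>
#include <linux/list.h>
#include <linux/mutex.h>

struct fence_sketch {
	struct list_head link;	/* position in the fence LRU */
	atomic_t pin_count;	/* may be read/dropped without the mutex */
};

struct ggtt_fences_sketch {
	struct mutex mutex;		/* stand-in for i915_ggtt.vm.mutex */
	struct list_head fence_list;	/* LRU, least recently used first */
};

/* Walk the LRU for an unpinned fence; only called with the mutex held. */
static struct fence_sketch *fence_find_sketch(struct ggtt_fences_sketch *ggtt)
{
	struct fence_sketch *fence;

	list_for_each_entry(fence, &ggtt->fence_list, link) {
		if (atomic_read(&fence->pin_count))
			continue;

		return fence;
	}

	return ERR_PTR(-ENOSPC);
}

static struct fence_sketch *fence_pin_sketch(struct ggtt_fences_sketch *ggtt)
{
	struct fence_sketch *fence;
	int err;

	err = mutex_lock_interruptible(&ggtt->mutex);
	if (err)
		return ERR_PTR(err);

	fence = fence_find_sketch(ggtt);
	if (!IS_ERR(fence)) {
		atomic_inc(&fence->pin_count);
		/* Bump to the most-recently-used end of the LRU. */
		list_move_tail(&fence->link, &ggtt->fence_list);
	}

	mutex_unlock(&ggtt->mutex);
	return fence;
}

/* Unpinning never touches the LRU, so it needs no lock. */
static void fence_unpin_sketch(struct fence_sketch *fence)
{
	atomic_dec(&fence->pin_count);
}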
Signed-off-by: Chris Wilson --- drivers/gpu/drm/i915/gt/selftest_hangcheck.c | 7 ++ drivers/gpu/drm/i915/i915_debugfs.c | 5 +- drivers/gpu/drm/i915/i915_gem_fence_reg.c | 81 ++++++++++++++------ drivers/gpu/drm/i915/i915_gem_fence_reg.h | 2 +- drivers/gpu/drm/i915/i915_gem_gtt.h | 1 + drivers/gpu/drm/i915/i915_vma.h | 4 +- 6 files changed, 70 insertions(+), 30 deletions(-) diff --git a/drivers/gpu/drm/i915/gt/selftest_hangcheck.c b/drivers/gpu/drm/i915/gt/selftest_hangcheck.c index 3be67e561c26..127faef8d8c2 100644 --- a/drivers/gpu/drm/i915/gt/selftest_hangcheck.c +++ b/drivers/gpu/drm/i915/gt/selftest_hangcheck.c @@ -1166,7 +1166,14 @@ static int evict_fence(void *data) goto out_unlock; } + err = i915_vma_pin(arg->vma, 0, 0, PIN_GLOBAL | PIN_MAPPABLE); + if (err) { + pr_err("Unable to pin vma for Y-tiled fence; err:%d\n", err); + goto out_unlock; + } + err = i915_vma_pin_fence(arg->vma); + i915_vma_unpin(arg->vma); if (err) { pr_err("Unable to pin Y-tiled fence; err:%d\n", err); goto out_unlock; diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c index 331b2f478c48..b0f4c3638d21 100644 --- a/drivers/gpu/drm/i915/i915_debugfs.c +++ b/drivers/gpu/drm/i915/i915_debugfs.c @@ -882,10 +882,11 @@ static int i915_gem_fence_regs_info(struct seq_file *m, void *data) rcu_read_lock(); for (i = 0; i < i915->ggtt.num_fences; i++) { - struct i915_vma *vma = i915->ggtt.fence_regs[i].vma; + struct i915_fence_reg *reg = &i915->ggtt.fence_regs[i]; + struct i915_vma *vma = reg->vma; seq_printf(m, "Fence %d, pin count = %d, object = ", - i, i915->ggtt.fence_regs[i].pin_count); + i, atomic_read(®->pin_count)); if (!vma) seq_puts(m, "unused"); else diff --git a/drivers/gpu/drm/i915/i915_gem_fence_reg.c b/drivers/gpu/drm/i915/i915_gem_fence_reg.c index 4ba3726556a4..d13be3b0e91d 100644 --- a/drivers/gpu/drm/i915/i915_gem_fence_reg.c +++ b/drivers/gpu/drm/i915/i915_gem_fence_reg.c @@ -301,15 +301,24 @@ static int fence_update(struct i915_fence_reg *fence, */ int i915_vma_put_fence(struct i915_vma *vma) { + struct i915_ggtt *ggtt = i915_vm_to_ggtt(vma->vm); struct i915_fence_reg *fence = vma->fence; + int err; if (!fence) return 0; - if (fence->pin_count) + if (atomic_read(&fence->pin_count)) return -EBUSY; - return fence_update(fence, NULL); + err = mutex_lock_interruptible(&ggtt->vm.mutex); + if (err) + return err; + + err = fence_update(fence, NULL); + mutex_unlock(&ggtt->vm.mutex); + + return err; } static struct i915_fence_reg *fence_find(struct drm_i915_private *i915) @@ -319,7 +328,7 @@ static struct i915_fence_reg *fence_find(struct drm_i915_private *i915) list_for_each_entry(fence, &i915->ggtt.fence_list, link) { GEM_BUG_ON(fence->vma && fence->vma->fence != fence); - if (fence->pin_count) + if (atomic_read(&fence->pin_count)) continue; return fence; @@ -353,6 +362,7 @@ static struct i915_fence_reg *fence_find(struct drm_i915_private *i915) int i915_vma_pin_fence(struct i915_vma *vma) { + struct i915_ggtt *ggtt = i915_vm_to_ggtt(vma->vm); struct i915_fence_reg *fence; struct i915_vma *set = i915_gem_object_is_tiled(vma->obj) ? vma : NULL; int err; @@ -361,27 +371,34 @@ i915_vma_pin_fence(struct i915_vma *vma) * Note that we revoke fences on runtime suspend. Therefore the user * must keep the device awake whilst using the fence. 
*/ - assert_rpm_wakelock_held(vma->vm->i915); + assert_rpm_wakelock_held(ggtt->vm.i915); + GEM_BUG_ON(!i915_vma_is_pinned(vma)); + + err = mutex_lock_interruptible(&ggtt->vm.mutex); + if (err) + return err; /* Just update our place in the LRU if our fence is getting reused. */ if (vma->fence) { fence = vma->fence; GEM_BUG_ON(fence->vma != vma); - fence->pin_count++; + atomic_inc(&fence->pin_count); if (!fence->dirty) { - list_move_tail(&fence->link, - &fence->i915->ggtt.fence_list); - return 0; + list_move_tail(&fence->link, &ggtt->fence_list); + goto unlock; } } else if (set) { fence = fence_find(vma->vm->i915); - if (IS_ERR(fence)) - return PTR_ERR(fence); + if (IS_ERR(fence)) { + err = PTR_ERR(fence); + goto unlock; + } - GEM_BUG_ON(fence->pin_count); - fence->pin_count++; - } else - return 0; + GEM_BUG_ON(atomic_read(&fence->pin_count)); + atomic_inc(&fence->pin_count); + } else { + goto unlock; + } err = fence_update(fence, set); if (err) @@ -391,10 +408,12 @@ i915_vma_pin_fence(struct i915_vma *vma) GEM_BUG_ON(vma->fence != (set ? fence : NULL)); if (set) - return 0; + goto unlock; out_unpin: - fence->pin_count--; + atomic_dec(&fence->pin_count); +unlock: + mutex_unlock(&ggtt->vm.mutex); return err; } @@ -412,28 +431,38 @@ i915_reserve_fence(struct drm_i915_private *i915) int count; int ret; - lockdep_assert_held(&i915->drm.struct_mutex); + mutex_lock(&i915->ggtt.vm.mutex); /* Keep at least one fence available for the display engine. */ count = 0; list_for_each_entry(fence, &i915->ggtt.fence_list, link) - count += !fence->pin_count; - if (count <= 1) - return ERR_PTR(-ENOSPC); + count += !atomic_read(&fence->pin_count); + if (count <= 1) { + ret = -ENOSPC; + goto err; + } fence = fence_find(i915); - if (IS_ERR(fence)) - return fence; + if (IS_ERR(fence)) { + ret = PTR_ERR(fence); + goto err; + } if (fence->vma) { /* Force-remove fence from VMA */ ret = fence_update(fence, NULL); if (ret) - return ERR_PTR(ret); + goto err; } list_del(&fence->link); + mutex_unlock(&i915->ggtt.vm.mutex); + return fence; + +err: + mutex_unlock(&i915->ggtt.vm.mutex); + return ERR_PTR(ret); } /** @@ -444,9 +473,11 @@ i915_reserve_fence(struct drm_i915_private *i915) */ void i915_unreserve_fence(struct i915_fence_reg *fence) { - lockdep_assert_held(&fence->i915->drm.struct_mutex); + struct i915_ggtt *ggtt = &fence->i915->ggtt; - list_add(&fence->link, &fence->i915->ggtt.fence_list); + mutex_lock(&ggtt->vm.mutex); + list_add(&fence->link, &ggtt->fence_list); + mutex_unlock(&ggtt->vm.mutex); } /** diff --git a/drivers/gpu/drm/i915/i915_gem_fence_reg.h b/drivers/gpu/drm/i915/i915_gem_fence_reg.h index d2da98828179..d7c6ebf789c1 100644 --- a/drivers/gpu/drm/i915/i915_gem_fence_reg.h +++ b/drivers/gpu/drm/i915/i915_gem_fence_reg.h @@ -40,7 +40,7 @@ struct i915_fence_reg { struct list_head link; struct drm_i915_private *i915; struct i915_vma *vma; - int pin_count; + atomic_t pin_count; int id; /** * Whether the tiling parameters for the currently diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.h b/drivers/gpu/drm/i915/i915_gem_gtt.h index ee396938de10..5f155bf183bb 100644 --- a/drivers/gpu/drm/i915/i915_gem_gtt.h +++ b/drivers/gpu/drm/i915/i915_gem_gtt.h @@ -36,6 +36,7 @@ #include #include +#include #include #include "gt/intel_reset.h" diff --git a/drivers/gpu/drm/i915/i915_vma.h b/drivers/gpu/drm/i915/i915_vma.h index 4b769db649bf..908118ade441 100644 --- a/drivers/gpu/drm/i915/i915_vma.h +++ b/drivers/gpu/drm/i915/i915_vma.h @@ -419,8 +419,8 @@ int __must_check i915_vma_put_fence(struct i915_vma *vma); static 
inline void __i915_vma_unpin_fence(struct i915_vma *vma) { - GEM_BUG_ON(vma->fence->pin_count <= 0); - vma->fence->pin_count--; + GEM_BUG_ON(atomic_read(&vma->fence->pin_count) <= 0); + atomic_dec(&vma->fence->pin_count); } /** From patchwork Mon Jun 10 07:21:01 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Chris Wilson X-Patchwork-Id: 10984195 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 669C51932 for ; Mon, 10 Jun 2019 07:22:00 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 4CF6428816 for ; Mon, 10 Jun 2019 07:22:00 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 4143C2883C; Mon, 10 Jun 2019 07:22:00 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-5.2 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_MED autolearn=ham version=3.3.1 Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177]) (using TLSv1.2 with cipher DHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.wl.linuxfoundation.org (Postfix) with ESMTPS id EDDC128816 for ; Mon, 10 Jun 2019 07:21:58 +0000 (UTC) Received: from gabe.freedesktop.org (localhost [127.0.0.1]) by gabe.freedesktop.org (Postfix) with ESMTP id 1D79F89123; Mon, 10 Jun 2019 07:21:56 +0000 (UTC) X-Original-To: intel-gfx@lists.freedesktop.org Delivered-To: intel-gfx@lists.freedesktop.org Received: from fireflyinternet.com (mail.fireflyinternet.com [109.228.58.192]) by gabe.freedesktop.org (Postfix) with ESMTPS id 78EF789125 for ; Mon, 10 Jun 2019 07:21:51 +0000 (UTC) X-Default-Received-SPF: pass (skip=forwardok (res=PASS)) x-ip-name=78.156.65.138; Received: from haswell.alporthouse.com (unverified [78.156.65.138]) by fireflyinternet.com (Firefly Internet (M1)) with ESMTP id 16848356-1500050 for multiple; Mon, 10 Jun 2019 08:21:27 +0100 From: Chris Wilson To: intel-gfx@lists.freedesktop.org Date: Mon, 10 Jun 2019 08:21:01 +0100 Message-Id: <20190610072126.6355-4-chris@chris-wilson.co.uk> X-Mailer: git-send-email 2.20.1 In-Reply-To: <20190610072126.6355-1-chris@chris-wilson.co.uk> References: <20190610072126.6355-1-chris@chris-wilson.co.uk> MIME-Version: 1.0 Subject: [Intel-gfx] [PATCH 03/28] drm/i915: Combine unbound/bound list tracking for objects X-BeenThere: intel-gfx@lists.freedesktop.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: Intel graphics driver community testing & development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: intel-gfx-bounces@lists.freedesktop.org Sender: "Intel-gfx" X-Virus-Scanned: ClamAV using ClamSMTP With async binding, we don't want to manage a bound/unbound list as we may end up running before we even acquire the pages. All that is required is keeping track of shrinkable objects, so reduce it to the minimum list. 
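The resulting bookkeeping is roughly the sketch below, with hypothetical _sketch names and heavily trimmed types rather than the driver code: binding only bumps an atomic counter, and list membership reflects nothing but shrinkability or purgeability.

#include <linux/atomic.h>
#include <linux/list.h>
#include <linux/spinlock.h>
#include <linux/types.h>

struct obj_sketch {
	struct list_head link;		/* on either shrink_list or purge_list */
	atomic_t bind_count;		/* how many VMAs currently bind this object */
	atomic_t pages_pin_count;
};

struct mm_sketch {
	spinlock_t obj_lock;		/* protects the two lists below */
	struct list_head shrink_list;	/* objects with pages we may reclaim */
	struct list_head purge_list;	/* purgeable objects, reclaimed first */
};

/* Binding no longer moves the object between bound/unbound lists. */
static void vma_bind_sketch(struct obj_sketch *obj)
{
	atomic_inc(&obj->bind_count);
}

static void vma_unbind_sketch(struct obj_sketch *obj)
{
	atomic_dec(&obj->bind_count);
}

/* List membership only changes when the madvise state changes. */
static void obj_set_willneed_sketch(struct mm_sketch *mm,
				    struct obj_sketch *obj, bool willneed)
{
	spin_lock(&mm->obj_lock);
	list_move_tail(&obj->link,
		       willneed ? &mm->shrink_list : &mm->purge_list);
	spin_unlock(&mm->obj_lock);
}

/* Reclaimable only if nothing beyond the bindings pins the pages. */
static bool can_release_pages_sketch(struct obj_sketch *obj)
{
	return atomic_read(&obj->pages_pin_count) <=
	       atomic_read(&obj->bind_count);
}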
Signed-off-by: Chris Wilson Reviewed-by: Matthew Auld --- drivers/gpu/drm/i915/gem/i915_gem_domain.c | 12 +- drivers/gpu/drm/i915/gem/i915_gem_object.c | 5 +- .../gpu/drm/i915/gem/i915_gem_object_types.h | 2 +- drivers/gpu/drm/i915/gem/i915_gem_pages.c | 13 +- drivers/gpu/drm/i915/gem/i915_gem_pm.c | 3 +- drivers/gpu/drm/i915/gem/i915_gem_shrinker.c | 30 ++- drivers/gpu/drm/i915/gem/i915_gem_stolen.c | 4 +- drivers/gpu/drm/i915/i915_debugfs.c | 189 +----------------- drivers/gpu/drm/i915/i915_drv.h | 14 +- drivers/gpu/drm/i915/i915_gem.c | 30 ++- drivers/gpu/drm/i915/i915_vma.c | 30 +-- .../gpu/drm/i915/selftests/i915_gem_evict.c | 16 +- drivers/gpu/drm/i915/selftests/i915_gem_gtt.c | 2 +- 13 files changed, 75 insertions(+), 275 deletions(-) diff --git a/drivers/gpu/drm/i915/gem/i915_gem_domain.c b/drivers/gpu/drm/i915/gem/i915_gem_domain.c index e5deae62681f..6115109a2810 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_domain.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_domain.c @@ -219,7 +219,7 @@ int i915_gem_object_set_cache_level(struct drm_i915_gem_object *obj, * rewrite the PTE in the belief that doing so tramples upon less * state and so involves less work. */ - if (obj->bind_count) { + if (atomic_read(&obj->bind_count)) { /* Before we change the PTE, the GPU must not be accessing it. * If we wait upon the object, we know that all the bound * VMA are no longer active. @@ -475,14 +475,10 @@ static void i915_gem_object_bump_inactive_ggtt(struct drm_i915_gem_object *obj) } mutex_unlock(&i915->ggtt.vm.mutex); - if (i915_gem_object_is_shrinkable(obj) && - obj->mm.madv == I915_MADV_WILLNEED) { - struct list_head *list; - + if (i915_gem_object_is_shrinkable(obj)) { spin_lock(&i915->mm.obj_lock); - list = obj->bind_count ? - &i915->mm.bound_list : &i915->mm.unbound_list; - list_move_tail(&obj->mm.link, list); + if (obj->mm.madv == I915_MADV_WILLNEED) + list_move_tail(&obj->mm.link, &i915->mm.shrink_list); spin_unlock(&i915->mm.obj_lock); } } diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object.c b/drivers/gpu/drm/i915/gem/i915_gem_object.c index a0bc8f7ab780..7a07e726ec83 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_object.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_object.c @@ -214,7 +214,7 @@ static void __i915_gem_free_objects(struct drm_i915_private *i915, mutex_unlock(&i915->drm.struct_mutex); - GEM_BUG_ON(obj->bind_count); + GEM_BUG_ON(atomic_read(&obj->bind_count)); GEM_BUG_ON(obj->userfault_count); GEM_BUG_ON(atomic_read(&obj->frontbuffer_bits)); GEM_BUG_ON(!list_empty(&obj->lut_list)); @@ -329,7 +329,8 @@ void i915_gem_free_object(struct drm_gem_object *gem_obj) obj->mm.madv = I915_MADV_DONTNEED; - if (i915_gem_object_has_pages(obj)) { + if (i915_gem_object_has_pages(obj) && + i915_gem_object_is_shrinkable(obj)) { spin_lock(&i915->mm.obj_lock); list_move_tail(&obj->mm.link, &i915->mm.purge_list); spin_unlock(&i915->mm.obj_lock); diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object_types.h b/drivers/gpu/drm/i915/gem/i915_gem_object_types.h index 9c161ba73558..5b05698619ce 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_object_types.h +++ b/drivers/gpu/drm/i915/gem/i915_gem_object_types.h @@ -155,7 +155,7 @@ struct drm_i915_gem_object { #define STRIDE_MASK (~TILING_MASK) /** Count of VMA actually bound by this object */ - unsigned int bind_count; + atomic_t bind_count; unsigned int active_count; /** Count of how many global VMA are currently pinned for use by HW */ unsigned int pin_global; diff --git a/drivers/gpu/drm/i915/gem/i915_gem_pages.c 
b/drivers/gpu/drm/i915/gem/i915_gem_pages.c index 7e64fd6bc19b..7868dd48d931 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_pages.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_pages.c @@ -57,10 +57,19 @@ void __i915_gem_object_set_pages(struct drm_i915_gem_object *obj, GEM_BUG_ON(!HAS_PAGE_SIZES(i915, obj->mm.page_sizes.sg)); if (i915_gem_object_is_shrinkable(obj)) { + struct list_head *list; + spin_lock(&i915->mm.obj_lock); + i915->mm.shrink_count++; i915->mm.shrink_memory += obj->base.size; - list_add(&obj->mm.link, &i915->mm.unbound_list); + + if (obj->mm.madv != I915_MADV_WILLNEED) + list = &i915->mm.purge_list; + else + list = &i915->mm.shrink_list; + list_add_tail(&obj->mm.link, list); + spin_unlock(&i915->mm.obj_lock); } } @@ -185,7 +194,7 @@ int __i915_gem_object_put_pages(struct drm_i915_gem_object *obj, if (i915_gem_object_has_pinned_pages(obj)) return -EBUSY; - GEM_BUG_ON(obj->bind_count); + GEM_BUG_ON(atomic_read(&obj->bind_count)); /* May be called by shrinker from within get_pages() (on another bo) */ mutex_lock_nested(&obj->mm.lock, subclass); diff --git a/drivers/gpu/drm/i915/gem/i915_gem_pm.c b/drivers/gpu/drm/i915/gem/i915_gem_pm.c index 6d6064fb2bf5..a1add5e6b658 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_pm.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_pm.c @@ -162,8 +162,7 @@ void i915_gem_suspend_late(struct drm_i915_private *i915) { struct drm_i915_gem_object *obj; struct list_head *phases[] = { - &i915->mm.unbound_list, - &i915->mm.bound_list, + &i915->mm.shrink_list, &i915->mm.purge_list, NULL }, **phase; diff --git a/drivers/gpu/drm/i915/gem/i915_gem_shrinker.c b/drivers/gpu/drm/i915/gem/i915_gem_shrinker.c index d71e630c6fb8..1e7f48db7b3e 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_shrinker.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_shrinker.c @@ -69,7 +69,7 @@ static bool can_release_pages(struct drm_i915_gem_object *obj) * to the GPU, simply unbinding from the GPU is not going to succeed * in releasing our pin count on the pages themselves. */ - if (atomic_read(&obj->mm.pages_pin_count) > obj->bind_count) + if (atomic_read(&obj->mm.pages_pin_count) > atomic_read(&obj->bind_count)) return false; /* If any vma are "permanently" pinned, it will prevent us from @@ -145,8 +145,10 @@ i915_gem_shrink(struct drm_i915_private *i915, unsigned int bit; } phases[] = { { &i915->mm.purge_list, ~0u }, - { &i915->mm.unbound_list, I915_SHRINK_UNBOUND }, - { &i915->mm.bound_list, I915_SHRINK_BOUND }, + { + &i915->mm.shrink_list, + I915_SHRINK_BOUND | I915_SHRINK_UNBOUND + }, { NULL, 0 }, }, *phase; intel_wakeref_t wakeref = 0; @@ -237,7 +239,7 @@ i915_gem_shrink(struct drm_i915_private *i915, continue; if (!(flags & I915_SHRINK_BOUND) && - READ_ONCE(obj->bind_count)) + atomic_read(&obj->bind_count)) continue; if (!can_release_pages(obj)) @@ -377,7 +379,7 @@ i915_gem_shrinker_oom(struct notifier_block *nb, unsigned long event, void *ptr) struct drm_i915_private *i915 = container_of(nb, struct drm_i915_private, mm.oom_notifier); struct drm_i915_gem_object *obj; - unsigned long unevictable, bound, unbound, freed_pages; + unsigned long unevictable, available, freed_pages; intel_wakeref_t wakeref; freed_pages = 0; @@ -391,26 +393,20 @@ i915_gem_shrinker_oom(struct notifier_block *nb, unsigned long event, void *ptr) * assert that there are no objects with pinned pages that are not * being pointed to by hardware. 
*/ - unbound = bound = unevictable = 0; + available = unevictable = 0; spin_lock(&i915->mm.obj_lock); - list_for_each_entry(obj, &i915->mm.unbound_list, mm.link) { + list_for_each_entry(obj, &i915->mm.shrink_list, mm.link) { if (!can_release_pages(obj)) unevictable += obj->base.size >> PAGE_SHIFT; else - unbound += obj->base.size >> PAGE_SHIFT; - } - list_for_each_entry(obj, &i915->mm.bound_list, mm.link) { - if (!can_release_pages(obj)) - unevictable += obj->base.size >> PAGE_SHIFT; - else - bound += obj->base.size >> PAGE_SHIFT; + available += obj->base.size >> PAGE_SHIFT; } spin_unlock(&i915->mm.obj_lock); - if (freed_pages || unbound || bound) + if (freed_pages || available) pr_info("Purging GPU memory, %lu pages freed, " - "%lu pages still pinned.\n", - freed_pages, unevictable); + "%lu pages still pinned, %lu pages left available.\n", + freed_pages, unevictable, available); *(unsigned long *)ptr += freed_pages; return NOTIFY_DONE; diff --git a/drivers/gpu/drm/i915/gem/i915_gem_stolen.c b/drivers/gpu/drm/i915/gem/i915_gem_stolen.c index 84d4f549eb21..c9b5e6cd940d 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_stolen.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_stolen.c @@ -689,10 +689,8 @@ i915_gem_object_create_stolen_for_preallocated(struct drm_i915_private *dev_priv list_move_tail(&vma->vm_link, &ggtt->vm.bound_list); mutex_unlock(&ggtt->vm.mutex); - spin_lock(&dev_priv->mm.obj_lock); GEM_BUG_ON(i915_gem_object_is_shrinkable(obj)); - obj->bind_count++; - spin_unlock(&dev_priv->mm.obj_lock); + atomic_inc(&obj->bind_count); return obj; diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c index b0f4c3638d21..326a56a97247 100644 --- a/drivers/gpu/drm/i915/i915_debugfs.c +++ b/drivers/gpu/drm/i915/i915_debugfs.c @@ -104,19 +104,6 @@ static char get_pin_mapped_flag(struct drm_i915_gem_object *obj) return obj->mm.mapping ? 
'M' : ' '; } -static u64 i915_gem_obj_total_ggtt_size(struct drm_i915_gem_object *obj) -{ - u64 size = 0; - struct i915_vma *vma; - - for_each_ggtt_vma(vma, obj) { - if (drm_mm_node_allocated(&vma->node)) - size += vma->node.size; - } - - return size; -} - static const char * stringify_page_sizes(unsigned int page_sizes, char *buf, size_t len) { @@ -252,83 +239,6 @@ describe_obj(struct seq_file *m, struct drm_i915_gem_object *obj) seq_printf(m, " (frontbuffer: 0x%03x)", frontbuffer_bits); } -static int obj_rank_by_stolen(const void *A, const void *B) -{ - const struct drm_i915_gem_object *a = - *(const struct drm_i915_gem_object **)A; - const struct drm_i915_gem_object *b = - *(const struct drm_i915_gem_object **)B; - - if (a->stolen->start < b->stolen->start) - return -1; - if (a->stolen->start > b->stolen->start) - return 1; - return 0; -} - -static int i915_gem_stolen_list_info(struct seq_file *m, void *data) -{ - struct drm_i915_private *dev_priv = node_to_i915(m->private); - struct drm_device *dev = &dev_priv->drm; - struct drm_i915_gem_object **objects; - struct drm_i915_gem_object *obj; - u64 total_obj_size, total_gtt_size; - unsigned long total, count, n; - int ret; - - total = READ_ONCE(dev_priv->mm.shrink_count); - objects = kvmalloc_array(total, sizeof(*objects), GFP_KERNEL); - if (!objects) - return -ENOMEM; - - ret = mutex_lock_interruptible(&dev->struct_mutex); - if (ret) - goto out; - - total_obj_size = total_gtt_size = count = 0; - - spin_lock(&dev_priv->mm.obj_lock); - list_for_each_entry(obj, &dev_priv->mm.bound_list, mm.link) { - if (count == total) - break; - - if (obj->stolen == NULL) - continue; - - objects[count++] = obj; - total_obj_size += obj->base.size; - total_gtt_size += i915_gem_obj_total_ggtt_size(obj); - - } - list_for_each_entry(obj, &dev_priv->mm.unbound_list, mm.link) { - if (count == total) - break; - - if (obj->stolen == NULL) - continue; - - objects[count++] = obj; - total_obj_size += obj->base.size; - } - spin_unlock(&dev_priv->mm.obj_lock); - - sort(objects, count, sizeof(*objects), obj_rank_by_stolen, NULL); - - seq_puts(m, "Stolen:\n"); - for (n = 0; n < count; n++) { - seq_puts(m, " "); - describe_obj(m, objects[n]); - seq_putc(m, '\n'); - } - seq_printf(m, "Total %lu objects, %llu bytes, %llu GTT size\n", - count, total_obj_size, total_gtt_size); - - mutex_unlock(&dev->struct_mutex); -out: - kvfree(objects); - return ret; -} - struct file_stats { struct i915_address_space *vm; unsigned long count; @@ -348,7 +258,7 @@ static int per_file_stats(int id, void *ptr, void *data) stats->count++; stats->total += obj->base.size; - if (!obj->bind_count) + if (!atomic_read(&obj->bind_count)) stats->unbound += obj->base.size; if (obj->base.name || obj->base.dma_buf) stats->shared += obj->base.size; @@ -455,104 +365,22 @@ static void print_context_stats(struct seq_file *m, static int i915_gem_object_info(struct seq_file *m, void *data) { - struct drm_i915_private *dev_priv = node_to_i915(m->private); - struct drm_device *dev = &dev_priv->drm; - struct i915_ggtt *ggtt = &dev_priv->ggtt; - u32 count, mapped_count, purgeable_count, dpy_count, huge_count; - u64 size, mapped_size, purgeable_size, dpy_size, huge_size; - struct drm_i915_gem_object *obj; - unsigned int page_sizes = 0; - char buf[80]; + struct drm_i915_private *i915 = node_to_i915(m->private); int ret; seq_printf(m, "%u shrinkable objects, %llu bytes\n", - dev_priv->mm.shrink_count, - dev_priv->mm.shrink_memory); - - size = count = 0; - mapped_size = mapped_count = 0; - purgeable_size = 
purgeable_count = 0; - huge_size = huge_count = 0; - - spin_lock(&dev_priv->mm.obj_lock); - list_for_each_entry(obj, &dev_priv->mm.unbound_list, mm.link) { - size += obj->base.size; - ++count; - - if (obj->mm.madv == I915_MADV_DONTNEED) { - purgeable_size += obj->base.size; - ++purgeable_count; - } - - if (obj->mm.mapping) { - mapped_count++; - mapped_size += obj->base.size; - } - - if (obj->mm.page_sizes.sg > I915_GTT_PAGE_SIZE) { - huge_count++; - huge_size += obj->base.size; - page_sizes |= obj->mm.page_sizes.sg; - } - } - seq_printf(m, "%u unbound objects, %llu bytes\n", count, size); - - size = count = dpy_size = dpy_count = 0; - list_for_each_entry(obj, &dev_priv->mm.bound_list, mm.link) { - size += obj->base.size; - ++count; - - if (obj->pin_global) { - dpy_size += obj->base.size; - ++dpy_count; - } - - if (obj->mm.madv == I915_MADV_DONTNEED) { - purgeable_size += obj->base.size; - ++purgeable_count; - } - - if (obj->mm.mapping) { - mapped_count++; - mapped_size += obj->base.size; - } - - if (obj->mm.page_sizes.sg > I915_GTT_PAGE_SIZE) { - huge_count++; - huge_size += obj->base.size; - page_sizes |= obj->mm.page_sizes.sg; - } - } - spin_unlock(&dev_priv->mm.obj_lock); - - seq_printf(m, "%u bound objects, %llu bytes\n", - count, size); - seq_printf(m, "%u purgeable objects, %llu bytes\n", - purgeable_count, purgeable_size); - seq_printf(m, "%u mapped objects, %llu bytes\n", - mapped_count, mapped_size); - seq_printf(m, "%u huge-paged objects (%s) %llu bytes\n", - huge_count, - stringify_page_sizes(page_sizes, buf, sizeof(buf)), - huge_size); - seq_printf(m, "%u display objects (globally pinned), %llu bytes\n", - dpy_count, dpy_size); - - seq_printf(m, "%llu [%pa] gtt total\n", - ggtt->vm.total, &ggtt->mappable_end); - seq_printf(m, "Supported page sizes: %s\n", - stringify_page_sizes(INTEL_INFO(dev_priv)->page_sizes, - buf, sizeof(buf))); + i915->mm.shrink_count, + i915->mm.shrink_memory); seq_putc(m, '\n'); - ret = mutex_lock_interruptible(&dev->struct_mutex); + ret = mutex_lock_interruptible(&i915->drm.struct_mutex); if (ret) return ret; - print_batch_pool_stats(m, dev_priv); - print_context_stats(m, dev_priv); - mutex_unlock(&dev->struct_mutex); + print_batch_pool_stats(m, i915); + print_context_stats(m, i915); + mutex_unlock(&i915->drm.struct_mutex); return 0; } @@ -4536,7 +4364,6 @@ static const struct file_operations i915_fifo_underrun_reset_ops = { static const struct drm_info_list i915_debugfs_list[] = { {"i915_capabilities", i915_capabilities, 0}, {"i915_gem_objects", i915_gem_object_info, 0}, - {"i915_gem_stolen", i915_gem_stolen_list_info }, {"i915_gem_fence_regs", i915_gem_fence_regs_info, 0}, {"i915_gem_interrupt", i915_interrupt_info, 0}, {"i915_gem_batch_pool", i915_gem_batch_pool_info, 0}, diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h index 9e6eced477e7..2beadccbd913 100644 --- a/drivers/gpu/drm/i915/i915_drv.h +++ b/drivers/gpu/drm/i915/i915_drv.h @@ -747,19 +747,15 @@ struct i915_gem_mm { /* Protects bound_list/unbound_list and #drm_i915_gem_object.mm.link */ spinlock_t obj_lock; - /** List of all objects in gtt_space. Used to restore gtt - * mappings on resume */ - struct list_head bound_list; /** - * List of objects which are not bound to the GTT (thus - * are idle and not used by the GPU). These objects may or may - * not actually have any pages attached. + * List of objects which are purgeable. */ - struct list_head unbound_list; + struct list_head purge_list; + /** - * List of objects which are purgeable. May be active. 
+ * List of objects which have allocated pages and are shrinkable. */ - struct list_head purge_list; + struct list_head shrink_list; /** * List of objects which are pending destruction. diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c index 99427d8b9266..0aeca42cb061 100644 --- a/drivers/gpu/drm/i915/i915_gem.c +++ b/drivers/gpu/drm/i915/i915_gem.c @@ -1143,10 +1143,8 @@ i915_gem_madvise_ioctl(struct drm_device *dev, void *data, spin_lock(&i915->mm.obj_lock); if (obj->mm.madv != I915_MADV_WILLNEED) list = &i915->mm.purge_list; - else if (obj->bind_count) - list = &i915->mm.bound_list; else - list = &i915->mm.unbound_list; + list = &i915->mm.shrink_list; list_move_tail(&obj->mm.link, list); spin_unlock(&i915->mm.obj_lock); } @@ -1736,8 +1734,7 @@ static void i915_gem_init__mm(struct drm_i915_private *i915) init_llist_head(&i915->mm.free_list); INIT_LIST_HEAD(&i915->mm.purge_list); - INIT_LIST_HEAD(&i915->mm.unbound_list); - INIT_LIST_HEAD(&i915->mm.bound_list); + INIT_LIST_HEAD(&i915->mm.shrink_list); i915_gem_init__objects(i915); } @@ -1796,11 +1793,7 @@ int i915_gem_freeze(struct drm_i915_private *dev_priv) int i915_gem_freeze_late(struct drm_i915_private *i915) { struct drm_i915_gem_object *obj; - struct list_head *phases[] = { - &i915->mm.unbound_list, - &i915->mm.bound_list, - NULL - }, **phase; + intel_wakeref_t wakeref; /* * Called just before we write the hibernation image. @@ -1817,17 +1810,18 @@ int i915_gem_freeze_late(struct drm_i915_private *i915) * the objects as well, see i915_gem_freeze() */ - i915_gem_shrink(i915, -1UL, NULL, I915_SHRINK_UNBOUND); + wakeref = intel_runtime_pm_get(i915); + + i915_gem_shrink(i915, -1UL, NULL, ~0); i915_gem_drain_freed_objects(i915); - for (phase = phases; *phase; phase++) { - list_for_each_entry(obj, *phase, mm.link) { - i915_gem_object_lock(obj); - WARN_ON(i915_gem_object_set_to_cpu_domain(obj, true)); - i915_gem_object_unlock(obj); - } + list_for_each_entry(obj, &i915->mm.shrink_list, mm.link) { + i915_gem_object_lock(obj); + WARN_ON(i915_gem_object_set_to_cpu_domain(obj, true)); + i915_gem_object_unlock(obj); } - GEM_BUG_ON(!list_empty(&i915->mm.purge_list)); + + intel_runtime_pm_put(i915, wakeref); return 0; } diff --git a/drivers/gpu/drm/i915/i915_vma.c b/drivers/gpu/drm/i915/i915_vma.c index f6ac8394da77..a3cb08f602f9 100644 --- a/drivers/gpu/drm/i915/i915_vma.c +++ b/drivers/gpu/drm/i915/i915_vma.c @@ -82,8 +82,7 @@ static void obj_bump_mru(struct drm_i915_gem_object *obj) struct drm_i915_private *i915 = to_i915(obj->base.dev); spin_lock(&i915->mm.obj_lock); - if (obj->bind_count) - list_move_tail(&obj->mm.link, &i915->mm.bound_list); + list_move_tail(&obj->mm.link, &i915->mm.shrink_list); spin_unlock(&i915->mm.obj_lock); obj->mm.dirty = true; /* be paranoid */ @@ -535,7 +534,7 @@ static void assert_bind_count(const struct drm_i915_gem_object *obj) * assume that no else is pinning the pages, but as a rough assertion * that we will not run into problems later, this will do!) 
*/ - GEM_BUG_ON(atomic_read(&obj->mm.pages_pin_count) < obj->bind_count); + GEM_BUG_ON(atomic_read(&obj->mm.pages_pin_count) < atomic_read(&obj->bind_count)); } /** @@ -677,17 +676,8 @@ i915_vma_insert(struct i915_vma *vma, u64 size, u64 alignment, u64 flags) mutex_unlock(&vma->vm->mutex); if (vma->obj) { - struct drm_i915_gem_object *obj = vma->obj; - - spin_lock(&dev_priv->mm.obj_lock); - - if (i915_gem_object_is_shrinkable(obj)) - list_move_tail(&obj->mm.link, &dev_priv->mm.bound_list); - - obj->bind_count++; - assert_bind_count(obj); - - spin_unlock(&dev_priv->mm.obj_lock); + atomic_inc(&vma->obj->bind_count); + assert_bind_count(vma->obj); } return 0; @@ -703,8 +693,6 @@ i915_vma_insert(struct i915_vma *vma, u64 size, u64 alignment, u64 flags) static void i915_vma_remove(struct i915_vma *vma) { - struct drm_i915_private *i915 = vma->vm->i915; - GEM_BUG_ON(!drm_mm_node_allocated(&vma->node)); GEM_BUG_ON(vma->flags & (I915_VMA_GLOBAL_BIND | I915_VMA_LOCAL_BIND)); @@ -722,15 +710,7 @@ i915_vma_remove(struct i915_vma *vma) if (vma->obj) { struct drm_i915_gem_object *obj = vma->obj; - spin_lock(&i915->mm.obj_lock); - - GEM_BUG_ON(obj->bind_count == 0); - if (--obj->bind_count == 0 && - i915_gem_object_is_shrinkable(obj) && - obj->mm.madv == I915_MADV_WILLNEED) - list_move_tail(&obj->mm.link, &i915->mm.unbound_list); - - spin_unlock(&i915->mm.obj_lock); + atomic_dec(&obj->bind_count); /* * And finally now the object is completely decoupled from this diff --git a/drivers/gpu/drm/i915/selftests/i915_gem_evict.c b/drivers/gpu/drm/i915/selftests/i915_gem_evict.c index 1d8235303edf..71c1363ad536 100644 --- a/drivers/gpu/drm/i915/selftests/i915_gem_evict.c +++ b/drivers/gpu/drm/i915/selftests/i915_gem_evict.c @@ -67,20 +67,24 @@ static int populate_ggtt(struct drm_i915_private *i915, count++; } + bound = 0; unbound = 0; - list_for_each_entry(obj, &i915->mm.unbound_list, mm.link) - if (obj->mm.quirked) + list_for_each_entry(obj, objects, st_link) { + GEM_BUG_ON(!obj->mm.quirked); + + if (atomic_read(&obj->bind_count)) + bound++; + else unbound++; + } + GEM_BUG_ON(bound + unbound != count); + if (unbound) { pr_err("%s: Found %lu objects unbound, expected %u!\n", __func__, unbound, 0); return -EINVAL; } - bound = 0; - list_for_each_entry(obj, &i915->mm.bound_list, mm.link) - if (obj->mm.quirked) - bound++; if (bound != count) { pr_err("%s: Found %lu objects bound, expected %lu!\n", __func__, bound, count); diff --git a/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c b/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c index f1e95eaf6923..dda8b9c79c37 100644 --- a/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c +++ b/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c @@ -1233,7 +1233,7 @@ static void track_vma_bind(struct i915_vma *vma) { struct drm_i915_gem_object *obj = vma->obj; - obj->bind_count++; /* track for eviction later */ + atomic_inc(&obj->bind_count); /* track for eviction later */ __i915_gem_object_pin_pages(obj); vma->pages = obj->mm.pages; From patchwork Mon Jun 10 07:21:02 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Chris Wilson X-Patchwork-Id: 10984203 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 0062814E5 for ; Mon, 10 Jun 2019 07:23:06 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id DC7B6286C2 for ; Mon, 10 
Jun 2019 07:23:05 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id D0F6A28816; Mon, 10 Jun 2019 07:23:05 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-5.2 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_MED autolearn=ham version=3.3.1 Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177]) (using TLSv1.2 with cipher DHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.wl.linuxfoundation.org (Postfix) with ESMTPS id 4754E286C2 for ; Mon, 10 Jun 2019 07:23:05 +0000 (UTC) Received: from gabe.freedesktop.org (localhost [127.0.0.1]) by gabe.freedesktop.org (Postfix) with ESMTP id DDCFD89137; Mon, 10 Jun 2019 07:23:04 +0000 (UTC) X-Original-To: intel-gfx@lists.freedesktop.org Delivered-To: intel-gfx@lists.freedesktop.org Received: from fireflyinternet.com (mail.fireflyinternet.com [109.228.58.192]) by gabe.freedesktop.org (Postfix) with ESMTPS id F37FB89137 for ; Mon, 10 Jun 2019 07:23:03 +0000 (UTC) X-Default-Received-SPF: pass (skip=forwardok (res=PASS)) x-ip-name=78.156.65.138; Received: from haswell.alporthouse.com (unverified [78.156.65.138]) by fireflyinternet.com (Firefly Internet (M1)) with ESMTP id 16848357-1500050 for multiple; Mon, 10 Jun 2019 08:21:28 +0100 From: Chris Wilson To: intel-gfx@lists.freedesktop.org Date: Mon, 10 Jun 2019 08:21:02 +0100 Message-Id: <20190610072126.6355-5-chris@chris-wilson.co.uk> X-Mailer: git-send-email 2.20.1 In-Reply-To: <20190610072126.6355-1-chris@chris-wilson.co.uk> References: <20190610072126.6355-1-chris@chris-wilson.co.uk> MIME-Version: 1.0 Subject: [Intel-gfx] [PATCH 04/28] drm/i915: Promote i915->mm.obj_lock to be irqsafe X-BeenThere: intel-gfx@lists.freedesktop.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: Intel graphics driver community testing & development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: intel-gfx-bounces@lists.freedesktop.org Sender: "Intel-gfx" X-Virus-Scanned: ClamAV using ClamSMTP The intent is to be able to update the mm.lists from inside an irqsoff section (e.g. from a softirq rcu workqueue), ergo we need to make the mm.obj_lock irqsafe.
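In outline this is the standard irqsave conversion: every plain spin_lock()/spin_unlock() pair on the lock becomes spin_lock_irqsave()/spin_unlock_irqrestore(), so the lock may also be taken from an irqs-off path. A minimal sketch of that pattern, using invented names rather than the driver's own structures:

    #include <linux/list.h>
    #include <linux/spinlock.h>

    /* Illustrative stand-ins only; not the driver's real structures. */
    struct demo_mm {
            spinlock_t obj_lock;            /* protects the list below */
            struct list_head shrink_list;
    };

    /*
     * The irqsafe form saves the local interrupt state before taking the
     * lock and restores it afterwards, so the same lock can also be taken
     * from an irqs-off context (e.g. a softirq) without deadlocking.
     */
    static void demo_move_to_shrink_list(struct demo_mm *mm,
                                         struct list_head *link)
    {
            unsigned long flags;

            spin_lock_irqsave(&mm->obj_lock, flags);
            list_move_tail(link, &mm->shrink_list);
            spin_unlock_irqrestore(&mm->obj_lock, flags);
    }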
Fixes: 3b4fa9640ccd ("drm/i915: Track the purgeable objects on a separate eviction list") Signed-off-by: Chris Wilson Cc: Joonas Lahtinen Cc: Matthew Auld Reviewed-by: Matthew Auld --- drivers/gpu/drm/i915/gem/i915_gem_domain.c | 8 ++++++-- drivers/gpu/drm/i915/gem/i915_gem_object.c | 12 ++++++++---- drivers/gpu/drm/i915/gem/i915_gem_pages.c | 13 +++++++++---- drivers/gpu/drm/i915/gem/i915_gem_shrinker.c | 14 ++++++++------ drivers/gpu/drm/i915/i915_gem.c | 8 ++++++-- drivers/gpu/drm/i915/i915_vma.c | 7 +++++-- 6 files changed, 42 insertions(+), 20 deletions(-) diff --git a/drivers/gpu/drm/i915/gem/i915_gem_domain.c b/drivers/gpu/drm/i915/gem/i915_gem_domain.c index 6115109a2810..bd180ef46aeb 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_domain.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_domain.c @@ -476,10 +476,14 @@ static void i915_gem_object_bump_inactive_ggtt(struct drm_i915_gem_object *obj) mutex_unlock(&i915->ggtt.vm.mutex); if (i915_gem_object_is_shrinkable(obj)) { - spin_lock(&i915->mm.obj_lock); + unsigned long flags; + + spin_lock_irqsave(&i915->mm.obj_lock, flags); + if (obj->mm.madv == I915_MADV_WILLNEED) list_move_tail(&obj->mm.link, &i915->mm.shrink_list); - spin_unlock(&i915->mm.obj_lock); + + spin_unlock_irqrestore(&i915->mm.obj_lock, flags); } } diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object.c b/drivers/gpu/drm/i915/gem/i915_gem_object.c index 7a07e726ec83..03725ca42cc7 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_object.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_object.c @@ -207,9 +207,11 @@ static void __i915_gem_free_objects(struct drm_i915_private *i915, */ if (i915_gem_object_has_pages(obj) && i915_gem_object_is_shrinkable(obj)) { - spin_lock(&i915->mm.obj_lock); + unsigned long flags; + + spin_lock_irqsave(&i915->mm.obj_lock, flags); list_del_init(&obj->mm.link); - spin_unlock(&i915->mm.obj_lock); + spin_unlock_irqrestore(&i915->mm.obj_lock, flags); } mutex_unlock(&i915->drm.struct_mutex); @@ -331,9 +333,11 @@ void i915_gem_free_object(struct drm_gem_object *gem_obj) if (i915_gem_object_has_pages(obj) && i915_gem_object_is_shrinkable(obj)) { - spin_lock(&i915->mm.obj_lock); + unsigned long flags; + + spin_lock_irqsave(&i915->mm.obj_lock, flags); list_move_tail(&obj->mm.link, &i915->mm.purge_list); - spin_unlock(&i915->mm.obj_lock); + spin_unlock_irqrestore(&i915->mm.obj_lock, flags); } } diff --git a/drivers/gpu/drm/i915/gem/i915_gem_pages.c b/drivers/gpu/drm/i915/gem/i915_gem_pages.c index 7868dd48d931..b36ad269f4ea 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_pages.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_pages.c @@ -58,8 +58,9 @@ void __i915_gem_object_set_pages(struct drm_i915_gem_object *obj, if (i915_gem_object_is_shrinkable(obj)) { struct list_head *list; + unsigned long flags; - spin_lock(&i915->mm.obj_lock); + spin_lock_irqsave(&i915->mm.obj_lock, flags); i915->mm.shrink_count++; i915->mm.shrink_memory += obj->base.size; @@ -70,7 +71,7 @@ void __i915_gem_object_set_pages(struct drm_i915_gem_object *obj, list = &i915->mm.shrink_list; list_add_tail(&obj->mm.link, list); - spin_unlock(&i915->mm.obj_lock); + spin_unlock_irqrestore(&i915->mm.obj_lock, flags); } } @@ -160,11 +161,15 @@ __i915_gem_object_unset_pages(struct drm_i915_gem_object *obj) return pages; if (i915_gem_object_is_shrinkable(obj)) { - spin_lock(&i915->mm.obj_lock); + unsigned long flags; + + spin_lock_irqsave(&i915->mm.obj_lock, flags); + list_del(&obj->mm.link); i915->mm.shrink_count--; i915->mm.shrink_memory -= obj->base.size; - spin_unlock(&i915->mm.obj_lock); + + 
spin_unlock_irqrestore(&i915->mm.obj_lock, flags); } if (obj->mm.mapping) { diff --git a/drivers/gpu/drm/i915/gem/i915_gem_shrinker.c b/drivers/gpu/drm/i915/gem/i915_gem_shrinker.c index 1e7f48db7b3e..88e63afd1d3d 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_shrinker.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_shrinker.c @@ -209,6 +209,7 @@ i915_gem_shrink(struct drm_i915_private *i915, for (phase = phases; phase->list; phase++) { struct list_head still_in_list; struct drm_i915_gem_object *obj; + unsigned long flags; if ((flags & phase->bit) == 0) continue; @@ -222,7 +223,7 @@ i915_gem_shrink(struct drm_i915_private *i915, * to be able to shrink their pages, so they remain on * the unbound/bound list until actually freed. */ - spin_lock(&i915->mm.obj_lock); + spin_lock_irqsave(&i915->mm.obj_lock, flags); while (count < target && (obj = list_first_entry_or_null(phase->list, typeof(*obj), @@ -245,7 +246,7 @@ i915_gem_shrink(struct drm_i915_private *i915, if (!can_release_pages(obj)) continue; - spin_unlock(&i915->mm.obj_lock); + spin_unlock_irqrestore(&i915->mm.obj_lock, flags); if (unsafe_drop_pages(obj)) { /* May arrive from get_pages on another bo */ @@ -259,10 +260,10 @@ i915_gem_shrink(struct drm_i915_private *i915, } scanned += obj->base.size >> PAGE_SHIFT; - spin_lock(&i915->mm.obj_lock); + spin_lock_irqsave(&i915->mm.obj_lock, flags); } list_splice_tail(&still_in_list, phase->list); - spin_unlock(&i915->mm.obj_lock); + spin_unlock_irqrestore(&i915->mm.obj_lock, flags); } if (flags & I915_SHRINK_BOUND) @@ -381,6 +382,7 @@ i915_gem_shrinker_oom(struct notifier_block *nb, unsigned long event, void *ptr) struct drm_i915_gem_object *obj; unsigned long unevictable, available, freed_pages; intel_wakeref_t wakeref; + unsigned long flags; freed_pages = 0; with_intel_runtime_pm(i915, wakeref) @@ -394,14 +396,14 @@ i915_gem_shrinker_oom(struct notifier_block *nb, unsigned long event, void *ptr) * being pointed to by hardware. 
*/ available = unevictable = 0; - spin_lock(&i915->mm.obj_lock); + spin_lock_irqsave(&i915->mm.obj_lock, flags); list_for_each_entry(obj, &i915->mm.shrink_list, mm.link) { if (!can_release_pages(obj)) unevictable += obj->base.size >> PAGE_SHIFT; else available += obj->base.size >> PAGE_SHIFT; } - spin_unlock(&i915->mm.obj_lock); + spin_unlock_irqrestore(&i915->mm.obj_lock, flags); if (freed_pages || available) pr_info("Purging GPU memory, %lu pages freed, " diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c index 0aeca42cb061..25f6a2c3139e 100644 --- a/drivers/gpu/drm/i915/i915_gem.c +++ b/drivers/gpu/drm/i915/i915_gem.c @@ -1140,13 +1140,17 @@ i915_gem_madvise_ioctl(struct drm_device *dev, void *data, struct list_head *list; if (i915_gem_object_is_shrinkable(obj)) { - spin_lock(&i915->mm.obj_lock); + unsigned long flags; + + spin_lock_irqsave(&i915->mm.obj_lock, flags); + if (obj->mm.madv != I915_MADV_WILLNEED) list = &i915->mm.purge_list; else list = &i915->mm.shrink_list; list_move_tail(&obj->mm.link, list); - spin_unlock(&i915->mm.obj_lock); + + spin_unlock_irqrestore(&i915->mm.obj_lock, flags); } } diff --git a/drivers/gpu/drm/i915/i915_vma.c b/drivers/gpu/drm/i915/i915_vma.c index a3cb08f602f9..5c075cd6f9fc 100644 --- a/drivers/gpu/drm/i915/i915_vma.c +++ b/drivers/gpu/drm/i915/i915_vma.c @@ -80,10 +80,13 @@ static void vma_print_allocator(struct i915_vma *vma, const char *reason) static void obj_bump_mru(struct drm_i915_gem_object *obj) { struct drm_i915_private *i915 = to_i915(obj->base.dev); + unsigned long flags; + + spin_lock_irqsave(&i915->mm.obj_lock, flags); - spin_lock(&i915->mm.obj_lock); list_move_tail(&obj->mm.link, &i915->mm.shrink_list); - spin_unlock(&i915->mm.obj_lock); + + spin_unlock_irqrestore(&i915->mm.obj_lock, flags); obj->mm.dirty = true; /* be paranoid */ } From patchwork Mon Jun 10 07:21:03 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Chris Wilson X-Patchwork-Id: 10984185 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id AB0251932 for ; Mon, 10 Jun 2019 07:21:57 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 94F912883B for ; Mon, 10 Jun 2019 07:21:57 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 8583A28843; Mon, 10 Jun 2019 07:21:57 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-5.2 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_MED autolearn=ham version=3.3.1 Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177]) (using TLSv1.2 with cipher DHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.wl.linuxfoundation.org (Postfix) with ESMTPS id F17852883B for ; Mon, 10 Jun 2019 07:21:55 +0000 (UTC) Received: from gabe.freedesktop.org (localhost [127.0.0.1]) by gabe.freedesktop.org (Postfix) with ESMTP id 5592389122; Mon, 10 Jun 2019 07:21:55 +0000 (UTC) X-Original-To: intel-gfx@lists.freedesktop.org Delivered-To: intel-gfx@lists.freedesktop.org Received: from fireflyinternet.com (mail.fireflyinternet.com [109.228.58.192]) by gabe.freedesktop.org (Postfix) with ESMTPS id 0064C89130 for ; Mon, 10 Jun 2019 07:21:50 +0000 (UTC) X-Default-Received-SPF: pass 
(skip=forwardok (res=PASS)) x-ip-name=78.156.65.138; Received: from haswell.alporthouse.com (unverified [78.156.65.138]) by fireflyinternet.com (Firefly Internet (M1)) with ESMTP id 16848358-1500050 for multiple; Mon, 10 Jun 2019 08:21:28 +0100 From: Chris Wilson To: intel-gfx@lists.freedesktop.org Date: Mon, 10 Jun 2019 08:21:03 +0100 Message-Id: <20190610072126.6355-6-chris@chris-wilson.co.uk> X-Mailer: git-send-email 2.20.1 In-Reply-To: <20190610072126.6355-1-chris@chris-wilson.co.uk> References: <20190610072126.6355-1-chris@chris-wilson.co.uk> MIME-Version: 1.0 Subject: [Intel-gfx] [PATCH 05/28] drm/i915: Make the semaphore saturation mask global X-BeenThere: intel-gfx@lists.freedesktop.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: Intel graphics driver community testing & development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Dmitry Ermilov Errors-To: intel-gfx-bounces@lists.freedesktop.org Sender: "Intel-gfx" X-Virus-Scanned: ClamAV using ClamSMTP The idea behind keeping the saturation mask local to a context backfired spectacularly. The premise with the local mask was that we would be more proactive in attempting to use semaphores after each time the context idled, and that all new contexts would attempt to use semaphores ignoring the current state of the system. This turns out to be horribly optimistic. If the system state is still oversaturated and the existing workloads have all stopped using semaphores, the new workloads would attempt to use semaphores and be deprioritised behind real work. The new contexts would not switch off using semaphores until their initial batch of low priority work had completed. Given a sufficient backlog of work at equal user priority, this would completely starve the new work of any GPU time. To compensate, remove the local tracking in favour of keeping it as global state on the engine -- once the system is saturated and semaphores are disabled, everyone stops attempting to use semaphores until the system is idle again. One of the reasons for preferring local context tracking was that it worked with virtual engines, so for switching to global state we could either do a complete check of all the virtual siblings or simply disable semaphores for those requests. This takes the simpler approach of disabling semaphores on virtual engines. The downside is that the decision that the engine is saturated is a local measure -- we are only checking whether or not this context was scheduled in a timely fashion; it may be legitimately delayed due to user priorities. We still have the same dilemma, though: we do not want to employ the semaphore poll unless it will be used.
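Roughly, the mechanism is a per-engine mask that is set whenever a semaphore turns out to have been signalled before its waiter ever ran, and that is consulted before emitting any new busywait; it is cleared only when the engine idles. A simplified sketch of that idea, with invented types and names rather than the driver's actual ones:

    #include <linux/types.h>

    /* Illustrative stand-ins only; field and function names are invented. */
    struct demo_engine {
            u32 saturated;  /* engines whose semaphores fired too late */
    };

    struct demo_request {
            struct demo_engine *engine;
            u32 semaphores;                 /* engines we busywait upon */
            bool semaphore_signaled_late;
    };

    /* On submission: if the semaphore wait was already satisfied before we
     * got to run, the busywait was wasted effort, so record that on the
     * engine rather than on the context. */
    static void demo_mark_saturated(struct demo_request *rq)
    {
            if (rq->semaphores && rq->semaphore_signaled_late)
                    rq->engine->saturated |= rq->semaphores;
    }

    /* Before emitting a new busywait: skip it while the engine is marked
     * saturated; the mask is cleared again only when the engine parks. */
    static bool demo_already_busywaiting(const struct demo_request *rq)
    {
            return rq->semaphores | rq->engine->saturated;
    }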
Fixes: ca6e56f654e7 ("drm/i915: Disable semaphore busywaits on saturated systems") Signed-off-by: Chris Wilson Cc: Tvrtko Ursulin Cc: Dmitry Rogozhkin Cc: Dmitry Ermilov --- drivers/gpu/drm/i915/gt/intel_context.c | 2 -- drivers/gpu/drm/i915/gt/intel_context_types.h | 2 -- drivers/gpu/drm/i915/gt/intel_engine_pm.c | 2 ++ drivers/gpu/drm/i915/gt/intel_engine_types.h | 2 ++ drivers/gpu/drm/i915/gt/intel_lrc.c | 2 +- drivers/gpu/drm/i915/i915_request.c | 4 ++-- 6 files changed, 7 insertions(+), 7 deletions(-) diff --git a/drivers/gpu/drm/i915/gt/intel_context.c b/drivers/gpu/drm/i915/gt/intel_context.c index c78ec0b58e77..7e2b18ddda19 100644 --- a/drivers/gpu/drm/i915/gt/intel_context.c +++ b/drivers/gpu/drm/i915/gt/intel_context.c @@ -118,7 +118,6 @@ intel_context_init(struct intel_context *ce, ce->engine = engine; ce->ops = engine->cops; ce->sseu = engine->sseu; - ce->saturated = 0; INIT_LIST_HEAD(&ce->signal_link); INIT_LIST_HEAD(&ce->signals); @@ -161,7 +160,6 @@ void intel_context_enter_engine(struct intel_context *ce) void intel_context_exit_engine(struct intel_context *ce) { - ce->saturated = 0; intel_engine_pm_put(ce->engine); } diff --git a/drivers/gpu/drm/i915/gt/intel_context_types.h b/drivers/gpu/drm/i915/gt/intel_context_types.h index 825fcf0ac9c4..e47d5b7af5d4 100644 --- a/drivers/gpu/drm/i915/gt/intel_context_types.h +++ b/drivers/gpu/drm/i915/gt/intel_context_types.h @@ -53,8 +53,6 @@ struct intel_context { atomic_t pin_count; struct mutex pin_mutex; /* guards pinning and associated on-gpuing */ - intel_engine_mask_t saturated; /* submitting semaphores too late? */ - /** * active_tracker: Active tracker for the external rq activity * on this intel_context object. diff --git a/drivers/gpu/drm/i915/gt/intel_engine_pm.c b/drivers/gpu/drm/i915/gt/intel_engine_pm.c index ccf034764741..7bffa0c6f5a2 100644 --- a/drivers/gpu/drm/i915/gt/intel_engine_pm.c +++ b/drivers/gpu/drm/i915/gt/intel_engine_pm.c @@ -98,6 +98,8 @@ static int __engine_park(struct intel_wakeref *wf) struct intel_engine_cs *engine = container_of(wf, typeof(*engine), wakeref); + engine->saturated = 0; + /* * If one and only one request is completed between pm events, * we know that we are inside the kernel context and it is diff --git a/drivers/gpu/drm/i915/gt/intel_engine_types.h b/drivers/gpu/drm/i915/gt/intel_engine_types.h index 01223864237a..d2dff9056ea0 100644 --- a/drivers/gpu/drm/i915/gt/intel_engine_types.h +++ b/drivers/gpu/drm/i915/gt/intel_engine_types.h @@ -292,6 +292,8 @@ struct intel_engine_cs { struct intel_context *kernel_context; /* pinned */ struct intel_context *preempt_context; /* pinned; optional */ + intel_engine_mask_t saturated; /* submitting semaphores too late? 
*/ + unsigned long serial; unsigned long wakeref_serial; diff --git a/drivers/gpu/drm/i915/gt/intel_lrc.c b/drivers/gpu/drm/i915/gt/intel_lrc.c index 497ac036c4a9..62220cf9d281 100644 --- a/drivers/gpu/drm/i915/gt/intel_lrc.c +++ b/drivers/gpu/drm/i915/gt/intel_lrc.c @@ -3144,7 +3144,6 @@ static void virtual_context_exit(struct intel_context *ce) struct virtual_engine *ve = container_of(ce, typeof(*ve), context); unsigned int n; - ce->saturated = 0; for (n = 0; n < ve->num_siblings; n++) intel_engine_pm_put(ve->siblings[n]); } @@ -3339,6 +3338,7 @@ intel_execlists_create_virtual(struct i915_gem_context *ctx, ve->base.uabi_class = I915_ENGINE_CLASS_INVALID; ve->base.instance = I915_ENGINE_CLASS_INVALID_VIRTUAL; ve->base.flags = I915_ENGINE_IS_VIRTUAL; + ve->base.saturated = ALL_ENGINES; snprintf(ve->base.name, sizeof(ve->base.name), "virtual"); diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c index da1e6984a8cc..8100ccf04c0b 100644 --- a/drivers/gpu/drm/i915/i915_request.c +++ b/drivers/gpu/drm/i915/i915_request.c @@ -459,7 +459,7 @@ void __i915_request_submit(struct i915_request *request) */ if (request->sched.semaphores && i915_sw_fence_signaled(&request->semaphore)) - request->hw_context->saturated |= request->sched.semaphores; + engine->saturated |= request->sched.semaphores; /* We may be recursing from the signal callback of another i915 fence */ spin_lock_nested(&request->lock, SINGLE_DEPTH_NESTING); @@ -839,7 +839,7 @@ already_busywaiting(struct i915_request *rq) * * See the are-we-too-late? check in __i915_request_submit(). */ - return rq->sched.semaphores | rq->hw_context->saturated; + return rq->sched.semaphores | rq->engine->saturated; } static int From patchwork Mon Jun 10 07:21:04 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Chris Wilson X-Patchwork-Id: 10984197 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id E3F8F1902 for ; Mon, 10 Jun 2019 07:22:01 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id C740228816 for ; Mon, 10 Jun 2019 07:22:01 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id BB9B12883C; Mon, 10 Jun 2019 07:22:01 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-5.2 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_MED autolearn=ham version=3.3.1 Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177]) (using TLSv1.2 with cipher DHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.wl.linuxfoundation.org (Postfix) with ESMTPS id 6275728816 for ; Mon, 10 Jun 2019 07:22:00 +0000 (UTC) Received: from gabe.freedesktop.org (localhost [127.0.0.1]) by gabe.freedesktop.org (Postfix) with ESMTP id B644F89145; Mon, 10 Jun 2019 07:21:57 +0000 (UTC) X-Original-To: intel-gfx@lists.freedesktop.org Delivered-To: intel-gfx@lists.freedesktop.org Received: from fireflyinternet.com (mail.fireflyinternet.com [109.228.58.192]) by gabe.freedesktop.org (Postfix) with ESMTPS id 9269989132 for ; Mon, 10 Jun 2019 07:21:51 +0000 (UTC) X-Default-Received-SPF: pass (skip=forwardok (res=PASS)) x-ip-name=78.156.65.138; Received: from haswell.alporthouse.com (unverified [78.156.65.138]) by 
fireflyinternet.com (Firefly Internet (M1)) with ESMTP id 16848359-1500050 for multiple; Mon, 10 Jun 2019 08:21:28 +0100 From: Chris Wilson To: intel-gfx@lists.freedesktop.org Date: Mon, 10 Jun 2019 08:21:04 +0100 Message-Id: <20190610072126.6355-7-chris@chris-wilson.co.uk> X-Mailer: git-send-email 2.20.1 In-Reply-To: <20190610072126.6355-1-chris@chris-wilson.co.uk> References: <20190610072126.6355-1-chris@chris-wilson.co.uk> MIME-Version: 1.0 Subject: [Intel-gfx] [PATCH 06/28] drm/i915: Keep contexts pinned until after the next kernel context switch X-BeenThere: intel-gfx@lists.freedesktop.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: Intel graphics driver community testing & development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: intel-gfx-bounces@lists.freedesktop.org Sender: "Intel-gfx" X-Virus-Scanned: ClamAV using ClamSMTP We need to keep the context image pinned in memory until after the GPU has finished writing into it. Since it continues to write as we signal the final breadcrumb, we need to keep it pinned until the request after it is complete. Currently we know the order in which requests execute on each engine, and so to remove that presumption we need to identify a request/context-switch we know must occur after our completion. Any request queued after the signal must imply a context switch, for simplicity we use a fresh request from the kernel context. Signed-off-by: Chris Wilson --- drivers/gpu/drm/i915/gem/i915_gem_context.c | 24 ++---- drivers/gpu/drm/i915/gem/i915_gem_context.h | 1 - drivers/gpu/drm/i915/gem/i915_gem_pm.c | 20 ++++- drivers/gpu/drm/i915/gt/intel_context.c | 80 ++++++++++++++++--- drivers/gpu/drm/i915/gt/intel_context.h | 3 + drivers/gpu/drm/i915/gt/intel_context_types.h | 6 +- drivers/gpu/drm/i915/gt/intel_engine.h | 2 - drivers/gpu/drm/i915/gt/intel_engine_cs.c | 23 +----- drivers/gpu/drm/i915/gt/intel_engine_pm.c | 2 + drivers/gpu/drm/i915/gt/intel_engine_types.h | 13 +-- drivers/gpu/drm/i915/gt/intel_lrc.c | 62 ++------------ drivers/gpu/drm/i915/gt/intel_ringbuffer.c | 44 +--------- drivers/gpu/drm/i915/gt/mock_engine.c | 11 +-- drivers/gpu/drm/i915/i915_active.c | 80 ++++++++++++++++++- drivers/gpu/drm/i915/i915_active.h | 5 ++ drivers/gpu/drm/i915/i915_active_types.h | 3 + drivers/gpu/drm/i915/i915_gem.c | 4 - drivers/gpu/drm/i915/i915_request.c | 15 ---- .../gpu/drm/i915/selftests/mock_gem_device.c | 1 - 19 files changed, 214 insertions(+), 185 deletions(-) diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context.c b/drivers/gpu/drm/i915/gem/i915_gem_context.c index dd9aa77e38ae..1fdae3a62cef 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_context.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_context.c @@ -691,17 +691,6 @@ int i915_gem_contexts_init(struct drm_i915_private *dev_priv) return 0; } -void i915_gem_contexts_lost(struct drm_i915_private *dev_priv) -{ - struct intel_engine_cs *engine; - enum intel_engine_id id; - - lockdep_assert_held(&dev_priv->drm.struct_mutex); - - for_each_engine(engine, dev_priv, id) - intel_engine_lost_context(engine); -} - void i915_gem_contexts_fini(struct drm_i915_private *i915) { lockdep_assert_held(&i915->drm.struct_mutex); @@ -1199,10 +1188,6 @@ gen8_modify_rpcs(struct intel_context *ce, struct intel_sseu sseu) if (ret) goto out_add; - ret = gen8_emit_rpcs_config(rq, ce, sseu); - if (ret) - goto out_add; - /* * Guarantee context image and the timeline remains pinned until the * modifying request is retired by setting the ce activity tracker. 
@@ -1210,9 +1195,12 @@ gen8_modify_rpcs(struct intel_context *ce, struct intel_sseu sseu) * But we only need to take one pin on the account of it. Or in other * words transfer the pinned ce object to tracked active request. */ - if (!i915_active_request_isset(&ce->active_tracker)) - __intel_context_pin(ce); - __i915_active_request_set(&ce->active_tracker, rq); + GEM_BUG_ON(i915_active_is_idle(&ce->active)); + ret = i915_active_ref(&ce->active, rq->fence.context, rq); + if (ret) + goto out_add; + + ret = gen8_emit_rpcs_config(rq, ce, sseu); out_add: i915_request_add(rq); diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context.h b/drivers/gpu/drm/i915/gem/i915_gem_context.h index 630392c77e48..9691dd062f72 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_context.h +++ b/drivers/gpu/drm/i915/gem/i915_gem_context.h @@ -134,7 +134,6 @@ static inline bool i915_gem_context_is_kernel(struct i915_gem_context *ctx) /* i915_gem_context.c */ int __must_check i915_gem_contexts_init(struct drm_i915_private *dev_priv); -void i915_gem_contexts_lost(struct drm_i915_private *dev_priv); void i915_gem_contexts_fini(struct drm_i915_private *dev_priv); int i915_gem_context_open(struct drm_i915_private *i915, diff --git a/drivers/gpu/drm/i915/gem/i915_gem_pm.c b/drivers/gpu/drm/i915/gem/i915_gem_pm.c index a1add5e6b658..b0f37621de9f 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_pm.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_pm.c @@ -10,6 +10,22 @@ #include "i915_drv.h" #include "i915_globals.h" +static void call_idle_barriers(struct intel_engine_cs *engine) +{ + struct llist_node *node, *next; + + llist_for_each_safe(node, next, llist_del_all(&engine->barrier_tasks)) { + struct i915_active_request *active = + container_of((struct list_head *)node, + typeof(*active), link); + + INIT_LIST_HEAD(&active->link); + RCU_INIT_POINTER(active->request, NULL); + + active->retire(active, NULL); + } +} + static void i915_gem_park(struct drm_i915_private *i915) { struct intel_engine_cs *engine; @@ -17,8 +33,10 @@ static void i915_gem_park(struct drm_i915_private *i915) lockdep_assert_held(&i915->drm.struct_mutex); - for_each_engine(engine, i915, id) + for_each_engine(engine, i915, id) { + call_idle_barriers(engine); /* cleanup after wedging */ i915_gem_batch_pool_fini(&engine->batch_pool); + } i915_timelines_park(i915); i915_vma_parked(i915); diff --git a/drivers/gpu/drm/i915/gt/intel_context.c b/drivers/gpu/drm/i915/gt/intel_context.c index 7e2b18ddda19..0102f6bb62ec 100644 --- a/drivers/gpu/drm/i915/gt/intel_context.c +++ b/drivers/gpu/drm/i915/gt/intel_context.c @@ -61,7 +61,6 @@ int __intel_context_do_pin(struct intel_context *ce) i915_gem_context_get(ce->gem_context); /* for ctx->ppgtt */ - intel_context_get(ce); smp_mb__before_atomic(); /* flush pin before it is visible */ } @@ -89,20 +88,45 @@ void intel_context_unpin(struct intel_context *ce) ce->ops->unpin(ce); i915_gem_context_put(ce->gem_context); - intel_context_put(ce); + intel_context_inactive(ce); } mutex_unlock(&ce->pin_mutex); intel_context_put(ce); } -static void intel_context_retire(struct i915_active_request *active, - struct i915_request *rq) +static int __context_pin_state(struct i915_vma *vma, unsigned long flags) { - struct intel_context *ce = - container_of(active, typeof(*ce), active_tracker); + int err; - intel_context_unpin(ce); + err = i915_vma_pin(vma, 0, 0, flags | PIN_GLOBAL); + if (err) + return err; + + /* + * And mark it as a globally pinned object to let the shrinker know + * it cannot reclaim the object until we release it. 
+ */ + vma->obj->pin_global++; + vma->obj->mm.dirty = true; + + return 0; +} + +static void __context_unpin_state(struct i915_vma *vma) +{ + vma->obj->pin_global--; + __i915_vma_unpin(vma); +} + +static void intel_context_retire(struct i915_active *active) +{ + struct intel_context *ce = container_of(active, typeof(*ce), active); + + if (ce->state) + __context_unpin_state(ce->state); + + intel_context_put(ce); } void @@ -124,8 +148,46 @@ intel_context_init(struct intel_context *ce, mutex_init(&ce->pin_mutex); - i915_active_request_init(&ce->active_tracker, - NULL, intel_context_retire); + i915_active_init(ctx->i915, &ce->active, intel_context_retire); +} + +int intel_context_active(struct intel_context *ce, unsigned long flags) +{ + int err; + + if (!i915_active_acquire(&ce->active)) + return 0; + + intel_context_get(ce); + + if (!ce->state) + return 0; + + err = __context_pin_state(ce->state, flags); + if (err) { + i915_active_cancel(&ce->active); + intel_context_put(ce); + return err; + } + + /* Preallocate tracking nodes */ + if (!i915_gem_context_is_kernel(ce->gem_context)) { + err = i915_active_acquire_preallocate_barrier(&ce->active, + ce->engine); + if (err) { + i915_active_release(&ce->active); + return err; + } + } + + return 0; +} + +void intel_context_inactive(struct intel_context *ce) +{ + /* Nodes preallocated in intel_context_active() */ + i915_active_acquire_barrier(&ce->active); + i915_active_release(&ce->active); } static void i915_global_context_shrink(void) diff --git a/drivers/gpu/drm/i915/gt/intel_context.h b/drivers/gpu/drm/i915/gt/intel_context.h index 63392c88cd98..e71629f7c2e0 100644 --- a/drivers/gpu/drm/i915/gt/intel_context.h +++ b/drivers/gpu/drm/i915/gt/intel_context.h @@ -102,6 +102,9 @@ static inline void intel_context_exit(struct intel_context *ce) ce->ops->exit(ce); } +int intel_context_active(struct intel_context *ce, unsigned long flags); +void intel_context_inactive(struct intel_context *ce); + static inline struct intel_context *intel_context_get(struct intel_context *ce) { kref_get(&ce->ref); diff --git a/drivers/gpu/drm/i915/gt/intel_context_types.h b/drivers/gpu/drm/i915/gt/intel_context_types.h index e47d5b7af5d4..08049ee91cee 100644 --- a/drivers/gpu/drm/i915/gt/intel_context_types.h +++ b/drivers/gpu/drm/i915/gt/intel_context_types.h @@ -54,10 +54,10 @@ struct intel_context { struct mutex pin_mutex; /* guards pinning and associated on-gpuing */ /** - * active_tracker: Active tracker for the external rq activity - * on this intel_context object. + * active: Active tracker for the rq activity (inc. external) on this + * intel_context object. 
*/ - struct i915_active_request active_tracker; + struct i915_active active; const struct intel_context_ops *ops; diff --git a/drivers/gpu/drm/i915/gt/intel_engine.h b/drivers/gpu/drm/i915/gt/intel_engine.h index 201bbd2a4faf..b9fd88f21609 100644 --- a/drivers/gpu/drm/i915/gt/intel_engine.h +++ b/drivers/gpu/drm/i915/gt/intel_engine.h @@ -466,8 +466,6 @@ static inline void intel_engine_reset(struct intel_engine_cs *engine, bool intel_engine_is_idle(struct intel_engine_cs *engine); bool intel_engines_are_idle(struct drm_i915_private *dev_priv); -void intel_engine_lost_context(struct intel_engine_cs *engine); - void intel_engines_reset_default_submission(struct drm_i915_private *i915); unsigned int intel_engines_has_context_isolation(struct drm_i915_private *i915); diff --git a/drivers/gpu/drm/i915/gt/intel_engine_cs.c b/drivers/gpu/drm/i915/gt/intel_engine_cs.c index 6b838948ba24..98b217621475 100644 --- a/drivers/gpu/drm/i915/gt/intel_engine_cs.c +++ b/drivers/gpu/drm/i915/gt/intel_engine_cs.c @@ -611,6 +611,8 @@ static int intel_engine_setup_common(struct intel_engine_cs *engine) { int err; + init_llist_head(&engine->barrier_tasks); + err = init_status_page(engine); if (err) return err; @@ -870,6 +872,7 @@ void intel_engine_cleanup_common(struct intel_engine_cs *engine) if (engine->preempt_context) intel_context_unpin(engine->preempt_context); intel_context_unpin(engine->kernel_context); + GEM_BUG_ON(!llist_empty(&engine->barrier_tasks)); i915_timeline_fini(&engine->timeline); @@ -1200,26 +1203,6 @@ void intel_engines_reset_default_submission(struct drm_i915_private *i915) engine->set_default_submission(engine); } -/** - * intel_engine_lost_context: called when the GPU is reset into unknown state - * @engine: the engine - * - * We have either reset the GPU or otherwise about to lose state tracking of - * the current GPU logical state (e.g. suspend). On next use, it is therefore - * imperative that we make no presumptions about the current state and load - * from scratch. - */ -void intel_engine_lost_context(struct intel_engine_cs *engine) -{ - struct intel_context *ce; - - lockdep_assert_held(&engine->i915->drm.struct_mutex); - - ce = fetch_and_zero(&engine->last_retired_context); - if (ce) - intel_context_unpin(ce); -} - bool intel_engine_can_store_dword(struct intel_engine_cs *engine) { switch (INTEL_GEN(engine->i915)) { diff --git a/drivers/gpu/drm/i915/gt/intel_engine_pm.c b/drivers/gpu/drm/i915/gt/intel_engine_pm.c index 7bffa0c6f5a2..25287140fb61 100644 --- a/drivers/gpu/drm/i915/gt/intel_engine_pm.c +++ b/drivers/gpu/drm/i915/gt/intel_engine_pm.c @@ -88,6 +88,8 @@ static bool switch_to_kernel_context(struct intel_engine_cs *engine) /* Check again on the next retirement. 
*/ engine->wakeref_serial = engine->serial + 1; + + i915_request_add_barriers(rq); __i915_request_commit(rq); return false; diff --git a/drivers/gpu/drm/i915/gt/intel_engine_types.h b/drivers/gpu/drm/i915/gt/intel_engine_types.h index d2dff9056ea0..52f4987a57e5 100644 --- a/drivers/gpu/drm/i915/gt/intel_engine_types.h +++ b/drivers/gpu/drm/i915/gt/intel_engine_types.h @@ -11,6 +11,7 @@ #include #include #include +#include #include #include "i915_gem.h" @@ -288,6 +289,7 @@ struct intel_engine_cs { struct intel_ring *buffer; struct i915_timeline timeline; + struct llist_head barrier_tasks; struct intel_context *kernel_context; /* pinned */ struct intel_context *preempt_context; /* pinned; optional */ @@ -437,17 +439,6 @@ struct intel_engine_cs { struct intel_engine_execlists execlists; - /* Contexts are pinned whilst they are active on the GPU. The last - * context executed remains active whilst the GPU is idle - the - * switch away and write to the context object only occurs on the - * next execution. Contexts are only unpinned on retirement of the - * following request ensuring that we can always write to the object - * on the context switch even after idling. Across suspend, we switch - * to the kernel context and trash it as the save may not happen - * before the hardware is powered down. - */ - struct intel_context *last_retired_context; - /* status_notifier: list of callbacks for context-switch changes */ struct atomic_notifier_head context_status_notifier; diff --git a/drivers/gpu/drm/i915/gt/intel_lrc.c b/drivers/gpu/drm/i915/gt/intel_lrc.c index 62220cf9d281..b39fd3e79f51 100644 --- a/drivers/gpu/drm/i915/gt/intel_lrc.c +++ b/drivers/gpu/drm/i915/gt/intel_lrc.c @@ -1422,60 +1422,11 @@ static void execlists_context_destroy(struct kref *kref) intel_context_free(ce); } -static int __context_pin(struct i915_vma *vma) -{ - unsigned int flags; - int err; - - flags = PIN_GLOBAL | PIN_HIGH; - flags |= PIN_OFFSET_BIAS | i915_ggtt_pin_bias(vma); - - err = i915_vma_pin(vma, 0, 0, flags); - if (err) - return err; - - vma->obj->pin_global++; - vma->obj->mm.dirty = true; - - return 0; -} - -static void __context_unpin(struct i915_vma *vma) -{ - vma->obj->pin_global--; - __i915_vma_unpin(vma); -} - static void execlists_context_unpin(struct intel_context *ce) { - struct intel_engine_cs *engine; - - /* - * The tasklet may still be using a pointer to our state, via an - * old request. However, since we know we only unpin the context - * on retirement of the following request, we know that the last - * request referencing us will have had a completion CS interrupt. - * If we see that it is still active, it means that the tasklet hasn't - * had the chance to run yet; let it run before we teardown the - * reference it may use. 
- */ - engine = READ_ONCE(ce->inflight); - if (unlikely(engine)) { - unsigned long flags; - - spin_lock_irqsave(&engine->timeline.lock, flags); - process_csb(engine); - spin_unlock_irqrestore(&engine->timeline.lock, flags); - - GEM_BUG_ON(READ_ONCE(ce->inflight)); - } - i915_gem_context_unpin_hw_id(ce->gem_context); - - intel_ring_unpin(ce->ring); - i915_gem_object_unpin_map(ce->state->obj); - __context_unpin(ce->state); + intel_ring_unpin(ce->ring); } static void @@ -1512,7 +1463,10 @@ __execlists_context_pin(struct intel_context *ce, goto err; GEM_BUG_ON(!ce->state); - ret = __context_pin(ce->state); + ret = intel_context_active(ce, + engine->i915->ggtt.pin_bias | + PIN_OFFSET_BIAS | + PIN_HIGH); if (ret) goto err; @@ -1521,7 +1475,7 @@ __execlists_context_pin(struct intel_context *ce, I915_MAP_OVERRIDE); if (IS_ERR(vaddr)) { ret = PTR_ERR(vaddr); - goto unpin_vma; + goto unpin_active; } ret = intel_ring_pin(ce->ring); @@ -1542,8 +1496,8 @@ __execlists_context_pin(struct intel_context *ce, intel_ring_unpin(ce->ring); unpin_map: i915_gem_object_unpin_map(ce->state->obj); -unpin_vma: - __context_unpin(ce->state); +unpin_active: + intel_context_inactive(ce); err: return ret; } diff --git a/drivers/gpu/drm/i915/gt/intel_ringbuffer.c b/drivers/gpu/drm/i915/gt/intel_ringbuffer.c index ff58d658e3e2..fc596adc637a 100644 --- a/drivers/gpu/drm/i915/gt/intel_ringbuffer.c +++ b/drivers/gpu/drm/i915/gt/intel_ringbuffer.c @@ -1349,45 +1349,9 @@ static void __context_unpin_ppgtt(struct i915_gem_context *ctx) gen6_ppgtt_unpin(ppgtt); } -static int __context_pin(struct intel_context *ce) -{ - struct i915_vma *vma; - int err; - - vma = ce->state; - if (!vma) - return 0; - - err = i915_vma_pin(vma, 0, 0, PIN_GLOBAL | PIN_HIGH); - if (err) - return err; - - /* - * And mark is as a globally pinned object to let the shrinker know - * it cannot reclaim the object until we release it. 
- */ - vma->obj->pin_global++; - vma->obj->mm.dirty = true; - - return 0; -} - -static void __context_unpin(struct intel_context *ce) -{ - struct i915_vma *vma; - - vma = ce->state; - if (!vma) - return; - - vma->obj->pin_global--; - i915_vma_unpin(vma); -} - static void ring_context_unpin(struct intel_context *ce) { __context_unpin_ppgtt(ce->gem_context); - __context_unpin(ce); } static struct i915_vma * @@ -1477,18 +1441,18 @@ static int ring_context_pin(struct intel_context *ce) ce->state = vma; } - err = __context_pin(ce); + err = intel_context_active(ce, PIN_HIGH); if (err) return err; err = __context_pin_ppgtt(ce->gem_context); if (err) - goto err_unpin; + goto err_active; return 0; -err_unpin: - __context_unpin(ce); +err_active: + intel_context_inactive(ce); return err; } diff --git a/drivers/gpu/drm/i915/gt/mock_engine.c b/drivers/gpu/drm/i915/gt/mock_engine.c index 6d7562769eb2..b7675ef18523 100644 --- a/drivers/gpu/drm/i915/gt/mock_engine.c +++ b/drivers/gpu/drm/i915/gt/mock_engine.c @@ -146,12 +146,18 @@ static void mock_context_destroy(struct kref *ref) static int mock_context_pin(struct intel_context *ce) { + int ret; + if (!ce->ring) { ce->ring = mock_ring(ce->engine); if (!ce->ring) return -ENOMEM; } + ret = intel_context_active(ce, PIN_HIGH); + if (ret) + return ret; + mock_timeline_pin(ce->ring->timeline); return 0; } @@ -328,14 +334,9 @@ void mock_engine_free(struct intel_engine_cs *engine) { struct mock_engine *mock = container_of(engine, typeof(*mock), base); - struct intel_context *ce; GEM_BUG_ON(timer_pending(&mock->hw_delay)); - ce = fetch_and_zero(&engine->last_retired_context); - if (ce) - intel_context_unpin(ce); - intel_context_unpin(engine->kernel_context); intel_engine_fini_breadcrumbs(engine); diff --git a/drivers/gpu/drm/i915/i915_active.c b/drivers/gpu/drm/i915/i915_active.c index 863ae12707ba..100e40afc9d6 100644 --- a/drivers/gpu/drm/i915/i915_active.c +++ b/drivers/gpu/drm/i915/i915_active.c @@ -100,7 +100,7 @@ active_instance(struct i915_active *ref, u64 idx) parent = *p; node = rb_entry(parent, struct active_node, node); - if (node->timeline == idx) + if (node->timeline == idx && !IS_ERR(node->base.request)) goto replace; if (node->timeline < idx) @@ -157,6 +157,7 @@ void i915_active_init(struct drm_i915_private *i915, ref->retire = retire; ref->tree = RB_ROOT; i915_active_request_init(&ref->last, NULL, last_retire); + init_llist_head(&ref->barriers); ref->count = 0; } @@ -263,6 +264,83 @@ void i915_active_fini(struct i915_active *ref) } #endif +int i915_active_acquire_preallocate_barrier(struct i915_active *ref, + struct intel_engine_cs *engine) +{ + struct drm_i915_private *i915 = engine->i915; + unsigned long tmp; + int err = 0; + + GEM_BUG_ON(!engine->mask); + for_each_engine_masked(engine, i915, engine->mask, tmp) { + struct intel_context *kctx = engine->kernel_context; + struct active_node *node; + + node = kmem_cache_alloc(global.slab_cache, GFP_KERNEL); + if (unlikely(!node)) { + err = -ENOMEM; + break; + } + + i915_active_request_init(&node->base, + (void *)engine, node_retire); + node->timeline = kctx->ring->timeline->fence_context; + node->ref = ref; + ref->count++; + + llist_add((struct llist_node *)&node->base.link, + &ref->barriers); + } + + return err; +} + +void i915_active_acquire_barrier(struct i915_active *ref) +{ + struct llist_node *pos, *next; + + i915_active_acquire(ref); + + llist_for_each_safe(pos, next, llist_del_all(&ref->barriers)) { + struct intel_engine_cs *engine; + struct active_node *node; + struct rb_node **p, *parent; 
+ + node = container_of((struct list_head *)pos, + typeof(*node), base.link); + + engine = (void *)rcu_access_pointer(node->base.request); + RCU_INIT_POINTER(node->base.request, ERR_PTR(-EAGAIN)); + + parent = NULL; + p = &ref->tree.rb_node; + while (*p) { + parent = *p; + if (rb_entry(parent, + struct active_node, + node)->timeline < node->timeline) + p = &parent->rb_right; + else + p = &parent->rb_left; + } + rb_link_node(&node->node, parent, p); + rb_insert_color(&node->node, &ref->tree); + + llist_add((struct llist_node *)&node->base.link, + &engine->barrier_tasks); + } + i915_active_release(ref); +} + +void i915_request_add_barriers(struct i915_request *rq) +{ + struct intel_engine_cs *engine = rq->engine; + struct llist_node *node, *next; + + llist_for_each_safe(node, next, llist_del_all(&engine->barrier_tasks)) + list_add_tail((struct list_head *)node, &rq->active_list); +} + int i915_active_request_set(struct i915_active_request *active, struct i915_request *rq) { diff --git a/drivers/gpu/drm/i915/i915_active.h b/drivers/gpu/drm/i915/i915_active.h index 7d758719ce39..d55d37673944 100644 --- a/drivers/gpu/drm/i915/i915_active.h +++ b/drivers/gpu/drm/i915/i915_active.h @@ -406,4 +406,9 @@ void i915_active_fini(struct i915_active *ref); static inline void i915_active_fini(struct i915_active *ref) { } #endif +int i915_active_acquire_preallocate_barrier(struct i915_active *ref, + struct intel_engine_cs *engine); +void i915_active_acquire_barrier(struct i915_active *ref); +void i915_request_add_barriers(struct i915_request *rq); + #endif /* _I915_ACTIVE_H_ */ diff --git a/drivers/gpu/drm/i915/i915_active_types.h b/drivers/gpu/drm/i915/i915_active_types.h index b679253b53a5..c025991b9233 100644 --- a/drivers/gpu/drm/i915/i915_active_types.h +++ b/drivers/gpu/drm/i915/i915_active_types.h @@ -7,6 +7,7 @@ #ifndef _I915_ACTIVE_TYPES_H_ #define _I915_ACTIVE_TYPES_H_ +#include #include #include @@ -31,6 +32,8 @@ struct i915_active { unsigned int count; void (*retire)(struct i915_active *ref); + + struct llist_head barriers; }; #endif /* _I915_ACTIVE_TYPES_H_ */ diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c index 25f6a2c3139e..8303a702d9fe 100644 --- a/drivers/gpu/drm/i915/i915_gem.c +++ b/drivers/gpu/drm/i915/i915_gem.c @@ -1197,10 +1197,6 @@ void i915_gem_sanitize(struct drm_i915_private *i915) intel_uncore_forcewake_put(&i915->uncore, FORCEWAKE_ALL); intel_runtime_pm_put(i915, wakeref); - - mutex_lock(&i915->drm.struct_mutex); - i915_gem_contexts_lost(i915); - mutex_unlock(&i915->drm.struct_mutex); } void i915_gem_init_swizzling(struct drm_i915_private *dev_priv) diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c index 8100ccf04c0b..be830f0ea76d 100644 --- a/drivers/gpu/drm/i915/i915_request.c +++ b/drivers/gpu/drm/i915/i915_request.c @@ -213,18 +213,6 @@ static void __retire_engine_request(struct intel_engine_cs *engine, spin_unlock(&rq->lock); local_irq_enable(); - - /* - * The backing object for the context is done after switching to the - * *next* context. Therefore we cannot retire the previous context until - * the next context has already started running. However, since we - * cannot take the required locks at i915_request_submit() we - * defer the unpinning of the active context to now, retirement of - * the subsequent request. 
- */ - if (engine->last_retired_context) - intel_context_unpin(engine->last_retired_context); - engine->last_retired_context = rq->hw_context; } static void __retire_engine_upto(struct intel_engine_cs *engine, @@ -759,9 +747,6 @@ __i915_request_create(struct intel_context *ce, gfp_t gfp) rq->infix = rq->ring->emit; /* end of header; start of user payload */ - /* Keep a second pin for the dual retirement along engine and ring */ - __intel_context_pin(ce); - intel_context_mark_active(ce); return rq; diff --git a/drivers/gpu/drm/i915/selftests/mock_gem_device.c b/drivers/gpu/drm/i915/selftests/mock_gem_device.c index b7f3fbb4ae89..a96d0c012d46 100644 --- a/drivers/gpu/drm/i915/selftests/mock_gem_device.c +++ b/drivers/gpu/drm/i915/selftests/mock_gem_device.c @@ -56,7 +56,6 @@ static void mock_device_release(struct drm_device *dev) mutex_lock(&i915->drm.struct_mutex); mock_device_flush(i915); - i915_gem_contexts_lost(i915); mutex_unlock(&i915->drm.struct_mutex); flush_work(&i915->gem.idle_work); From patchwork Mon Jun 10 07:21:05 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Chris Wilson X-Patchwork-Id: 10984193 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 343221580 for ; Mon, 10 Jun 2019 07:22:00 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 1CD302881C for ; Mon, 10 Jun 2019 07:22:00 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 1119828842; Mon, 10 Jun 2019 07:22:00 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-5.2 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_MED autolearn=ham version=3.3.1 Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177]) (using TLSv1.2 with cipher DHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.wl.linuxfoundation.org (Postfix) with ESMTPS id 986732881C for ; Mon, 10 Jun 2019 07:21:59 +0000 (UTC) Received: from gabe.freedesktop.org (localhost [127.0.0.1]) by gabe.freedesktop.org (Postfix) with ESMTP id A202E89143; Mon, 10 Jun 2019 07:21:56 +0000 (UTC) X-Original-To: intel-gfx@lists.freedesktop.org Delivered-To: intel-gfx@lists.freedesktop.org Received: from fireflyinternet.com (mail.fireflyinternet.com [109.228.58.192]) by gabe.freedesktop.org (Postfix) with ESMTPS id EE11D8912F for ; Mon, 10 Jun 2019 07:21:50 +0000 (UTC) X-Default-Received-SPF: pass (skip=forwardok (res=PASS)) x-ip-name=78.156.65.138; Received: from haswell.alporthouse.com (unverified [78.156.65.138]) by fireflyinternet.com (Firefly Internet (M1)) with ESMTP id 16848360-1500050 for multiple; Mon, 10 Jun 2019 08:21:28 +0100 From: Chris Wilson To: intel-gfx@lists.freedesktop.org Date: Mon, 10 Jun 2019 08:21:05 +0100 Message-Id: <20190610072126.6355-8-chris@chris-wilson.co.uk> X-Mailer: git-send-email 2.20.1 In-Reply-To: <20190610072126.6355-1-chris@chris-wilson.co.uk> References: <20190610072126.6355-1-chris@chris-wilson.co.uk> MIME-Version: 1.0 Subject: [Intel-gfx] [PATCH 07/28] drm/i915: Stop retiring along engine X-BeenThere: intel-gfx@lists.freedesktop.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: Intel graphics driver community testing & development List-Unsubscribe: , List-Archive: List-Post: 
List-Help: List-Subscribe: , Errors-To: intel-gfx-bounces@lists.freedesktop.org Sender: "Intel-gfx" X-Virus-Scanned: ClamAV using ClamSMTP We no longer track the execution order along the engine and so no longer need to enforce ordering of retire along the engine. Signed-off-by: Chris Wilson --- drivers/gpu/drm/i915/i915_request.c | 128 +++++++++++----------------- 1 file changed, 52 insertions(+), 76 deletions(-) diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c index be830f0ea76d..5423ec9eaf72 100644 --- a/drivers/gpu/drm/i915/i915_request.c +++ b/drivers/gpu/drm/i915/i915_request.c @@ -183,72 +183,23 @@ static void free_capture_list(struct i915_request *request) } } -static void __retire_engine_request(struct intel_engine_cs *engine, - struct i915_request *rq) -{ - GEM_TRACE("%s(%s) fence %llx:%lld, current %d\n", - __func__, engine->name, - rq->fence.context, rq->fence.seqno, - hwsp_seqno(rq)); - - GEM_BUG_ON(!i915_request_completed(rq)); - - local_irq_disable(); - - spin_lock(&engine->timeline.lock); - GEM_BUG_ON(!list_is_first(&rq->link, &engine->timeline.requests)); - list_del_init(&rq->link); - spin_unlock(&engine->timeline.lock); - - spin_lock(&rq->lock); - i915_request_mark_complete(rq); - if (!i915_request_signaled(rq)) - dma_fence_signal_locked(&rq->fence); - if (test_bit(DMA_FENCE_FLAG_ENABLE_SIGNAL_BIT, &rq->fence.flags)) - i915_request_cancel_breadcrumb(rq); - if (rq->waitboost) { - GEM_BUG_ON(!atomic_read(&rq->i915->gt_pm.rps.num_waiters)); - atomic_dec(&rq->i915->gt_pm.rps.num_waiters); - } - spin_unlock(&rq->lock); - - local_irq_enable(); -} - -static void __retire_engine_upto(struct intel_engine_cs *engine, - struct i915_request *rq) -{ - struct i915_request *tmp; - - if (list_empty(&rq->link)) - return; - - do { - tmp = list_first_entry(&engine->timeline.requests, - typeof(*tmp), link); - - GEM_BUG_ON(tmp->engine != engine); - __retire_engine_request(engine, tmp); - } while (tmp != rq); -} - -static void i915_request_retire(struct i915_request *request) +static bool i915_request_retire(struct i915_request *rq) { struct i915_active_request *active, *next; - GEM_TRACE("%s fence %llx:%lld, current %d\n", - request->engine->name, - request->fence.context, request->fence.seqno, - hwsp_seqno(request)); + lockdep_assert_held(&rq->i915->drm.struct_mutex); + if (!i915_request_completed(rq)) + return false; - lockdep_assert_held(&request->i915->drm.struct_mutex); - GEM_BUG_ON(!i915_sw_fence_signaled(&request->submit)); - GEM_BUG_ON(!i915_request_completed(request)); + GEM_TRACE("%s fence %llx:%lld, current %d\n", + rq->engine->name, + rq->fence.context, rq->fence.seqno, + hwsp_seqno(rq)); - trace_i915_request_retire(request); + GEM_BUG_ON(!i915_sw_fence_signaled(&rq->submit)); + trace_i915_request_retire(rq); - advance_ring(request); - free_capture_list(request); + advance_ring(rq); /* * Walk through the active list, calling retire on each. This allows @@ -260,7 +211,7 @@ static void i915_request_retire(struct i915_request *request) * pass along the auxiliary information (to avoid dereferencing * the node after the callback). 
*/ - list_for_each_entry_safe(active, next, &request->active_list, link) { + list_for_each_entry_safe(active, next, &rq->active_list, link) { /* * In microbenchmarks or focusing upon time inside the kernel, * we may spend an inordinate amount of time simply handling @@ -276,18 +227,39 @@ static void i915_request_retire(struct i915_request *request) INIT_LIST_HEAD(&active->link); RCU_INIT_POINTER(active->request, NULL); - active->retire(active, request); + active->retire(active, rq); + } + + local_irq_disable(); + + spin_lock(&rq->engine->timeline.lock); + list_del(&rq->link); + spin_unlock(&rq->engine->timeline.lock); + + spin_lock(&rq->lock); + i915_request_mark_complete(rq); + if (!i915_request_signaled(rq)) + dma_fence_signal_locked(&rq->fence); + if (test_bit(DMA_FENCE_FLAG_ENABLE_SIGNAL_BIT, &rq->fence.flags)) + i915_request_cancel_breadcrumb(rq); + if (rq->waitboost) { + GEM_BUG_ON(!atomic_read(&rq->i915->gt_pm.rps.num_waiters)); + atomic_dec(&rq->i915->gt_pm.rps.num_waiters); } + spin_unlock(&rq->lock); + + local_irq_enable(); - i915_request_remove_from_client(request); + intel_context_exit(rq->hw_context); + intel_context_unpin(rq->hw_context); - __retire_engine_upto(request->engine, request); + i915_request_remove_from_client(rq); - intel_context_exit(request->hw_context); - intel_context_unpin(request->hw_context); + free_capture_list(rq); + i915_sched_node_fini(&rq->sched); + i915_request_put(rq); - i915_sched_node_fini(&request->sched); - i915_request_put(request); + return true; } void i915_request_retire_upto(struct i915_request *rq) @@ -309,9 +281,7 @@ void i915_request_retire_upto(struct i915_request *rq) do { tmp = list_first_entry(&ring->request_list, typeof(*tmp), ring_link); - - i915_request_retire(tmp); - } while (tmp != rq); + } while (i915_request_retire(tmp) && tmp != rq); } static void irq_execute_cb(struct irq_work *wrk) @@ -600,12 +570,9 @@ static void ring_retire_requests(struct intel_ring *ring) { struct i915_request *rq, *rn; - list_for_each_entry_safe(rq, rn, &ring->request_list, ring_link) { - if (!i915_request_completed(rq)) + list_for_each_entry_safe(rq, rn, &ring->request_list, ring_link) + if (!i915_request_retire(rq)) break; - - i915_request_retire(rq); - } } static noinline struct i915_request * @@ -620,6 +587,15 @@ request_alloc_slow(struct intel_context *ce, gfp_t gfp) if (!gfpflags_allow_blocking(gfp)) goto out; + /* Move our oldest request to the slab-cache (if not in use!) 
*/ + rq = list_first_entry(&ring->request_list, typeof(*rq), ring_link); + i915_request_retire(rq); + + rq = kmem_cache_alloc(global.slab_requests, + gfp | __GFP_RETRY_MAYFAIL | __GFP_NOWARN); + if (rq) + return rq; + /* Ratelimit ourselves to prevent oom from malicious clients */ rq = list_last_entry(&ring->request_list, typeof(*rq), ring_link); cond_synchronize_rcu(rq->rcustate); From patchwork Mon Jun 10 07:21:06 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Chris Wilson X-Patchwork-Id: 10984183 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 1AEB31902 for ; Mon, 10 Jun 2019 07:21:57 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 026692881C for ; Mon, 10 Jun 2019 07:21:57 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id EA95228843; Mon, 10 Jun 2019 07:21:56 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-5.2 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_MED autolearn=ham version=3.3.1 Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177]) (using TLSv1.2 with cipher DHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.wl.linuxfoundation.org (Postfix) with ESMTPS id DA77A2881C for ; Mon, 10 Jun 2019 07:21:54 +0000 (UTC) Received: from gabe.freedesktop.org (localhost [127.0.0.1]) by gabe.freedesktop.org (Postfix) with ESMTP id 53EC089134; Mon, 10 Jun 2019 07:21:52 +0000 (UTC) X-Original-To: intel-gfx@lists.freedesktop.org Delivered-To: intel-gfx@lists.freedesktop.org Received: from fireflyinternet.com (mail.fireflyinternet.com [109.228.58.192]) by gabe.freedesktop.org (Postfix) with ESMTPS id 9484C89126 for ; Mon, 10 Jun 2019 07:21:48 +0000 (UTC) X-Default-Received-SPF: pass (skip=forwardok (res=PASS)) x-ip-name=78.156.65.138; Received: from haswell.alporthouse.com (unverified [78.156.65.138]) by fireflyinternet.com (Firefly Internet (M1)) with ESMTP id 16848361-1500050 for multiple; Mon, 10 Jun 2019 08:21:28 +0100 From: Chris Wilson To: intel-gfx@lists.freedesktop.org Date: Mon, 10 Jun 2019 08:21:06 +0100 Message-Id: <20190610072126.6355-9-chris@chris-wilson.co.uk> X-Mailer: git-send-email 2.20.1 In-Reply-To: <20190610072126.6355-1-chris@chris-wilson.co.uk> References: <20190610072126.6355-1-chris@chris-wilson.co.uk> MIME-Version: 1.0 Subject: [Intel-gfx] [PATCH 08/28] drm/i915: Replace engine->timeline with a plain list X-BeenThere: intel-gfx@lists.freedesktop.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: Intel graphics driver community testing & development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: intel-gfx-bounces@lists.freedesktop.org Sender: "Intel-gfx" X-Virus-Scanned: ClamAV using ClamSMTP To continue the onslaught of removing the assumption of a global execution ordering, another casualty is the engine->timeline. Without an actual timeline to track, it is overkill and we can replace it with a much less grand plain list. We still need a list of requests inflight, for the simple purpose of finding inflight requests (for retiring, resetting, preemption etc). 
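Purely as an illustration of the shape of that replacement (a standalone userspace sketch with made-up names; the pthread mutex and the <sys/queue.h> tailq below only stand in for engine->active.lock and engine->active.requests, and this is not the driver's actual retirement path), it boils down to a lock plus an ordered list that inflight requests are added to on submission and dropped from once they are done:

/*
 * Standalone sketch (hypothetical names, userspace C, not kernel code):
 * an "engine" keeps only a lock and a plain list of in-flight
 * "requests" in submission order; nothing resembling a global
 * timeline is implied.
 */
#include <pthread.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/queue.h>

struct request {
	unsigned long long seqno;
	bool completed;
	TAILQ_ENTRY(request) link;	/* stands in for rq->sched.link */
};

struct engine {
	pthread_mutex_t lock;		/* stands in for engine->active.lock */
	TAILQ_HEAD(, request) requests;	/* stands in for engine->active.requests */
};

static void engine_init_active(struct engine *e)
{
	pthread_mutex_init(&e->lock, NULL);
	TAILQ_INIT(&e->requests);
}

static void engine_submit(struct engine *e, struct request *rq)
{
	pthread_mutex_lock(&e->lock);
	TAILQ_INSERT_TAIL(&e->requests, rq, link);	/* submission order only */
	pthread_mutex_unlock(&e->lock);
}

/* Drop completed requests from the oldest end; stop at the first busy one. */
static void engine_retire(struct engine *e)
{
	pthread_mutex_lock(&e->lock);
	for (;;) {
		struct request *rq = TAILQ_FIRST(&e->requests);

		if (!rq || !rq->completed)
			break;
		TAILQ_REMOVE(&e->requests, rq, link);
		printf("retired request %llu\n", rq->seqno);
		free(rq);
	}
	pthread_mutex_unlock(&e->lock);
}

int main(void)
{
	struct engine e;
	unsigned long long i;

	engine_init_active(&e);
	for (i = 1; i <= 3; i++) {
		struct request *rq = calloc(1, sizeof(*rq));

		if (!rq)
			return 1;
		rq->seqno = i;
		rq->completed = (i != 3);	/* pretend the last one is still running */
		engine_submit(&e, rq);
	}
	engine_retire(&e);	/* drops 1 and 2, leaves 3 on the list */
	return 0;
}

Anything that needs the inflight set (retiring, reset, preemption, debug dumps) can then simply walk that per-engine list under its lock, which is all the patch below relies on.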
Signed-off-by: Chris Wilson --- drivers/gpu/drm/i915/gt/intel_engine.h | 6 ++ drivers/gpu/drm/i915/gt/intel_engine_cs.c | 62 ++++++------ drivers/gpu/drm/i915/gt/intel_engine_types.h | 6 +- drivers/gpu/drm/i915/gt/intel_lrc.c | 95 ++++++++++--------- drivers/gpu/drm/i915/gt/intel_reset.c | 10 +- drivers/gpu/drm/i915/gt/intel_ringbuffer.c | 15 ++- drivers/gpu/drm/i915/gt/mock_engine.c | 18 ++-- drivers/gpu/drm/i915/i915_gpu_error.c | 5 +- drivers/gpu/drm/i915/i915_request.c | 43 +++------ drivers/gpu/drm/i915/i915_request.h | 2 +- drivers/gpu/drm/i915/i915_scheduler.c | 38 ++++---- drivers/gpu/drm/i915/i915_timeline.c | 1 - drivers/gpu/drm/i915/i915_timeline.h | 19 ---- drivers/gpu/drm/i915/i915_timeline_types.h | 4 - drivers/gpu/drm/i915/intel_guc_submission.c | 16 ++-- .../gpu/drm/i915/selftests/mock_timeline.c | 1 - 16 files changed, 153 insertions(+), 188 deletions(-) diff --git a/drivers/gpu/drm/i915/gt/intel_engine.h b/drivers/gpu/drm/i915/gt/intel_engine.h index b9fd88f21609..6be607e9c084 100644 --- a/drivers/gpu/drm/i915/gt/intel_engine.h +++ b/drivers/gpu/drm/i915/gt/intel_engine.h @@ -564,4 +564,10 @@ static inline bool inject_preempt_hang(struct intel_engine_execlists *execlists) #endif +void intel_engine_init_active(struct intel_engine_cs *engine, + unsigned int subclass); +#define ENGINE_PHYSICAL 0 +#define ENGINE_MOCK 1 +#define ENGINE_VIRTUAL 2 + #endif /* _INTEL_RINGBUFFER_H_ */ diff --git a/drivers/gpu/drm/i915/gt/intel_engine_cs.c b/drivers/gpu/drm/i915/gt/intel_engine_cs.c index 98b217621475..0287c3b094a2 100644 --- a/drivers/gpu/drm/i915/gt/intel_engine_cs.c +++ b/drivers/gpu/drm/i915/gt/intel_engine_cs.c @@ -617,14 +617,7 @@ static int intel_engine_setup_common(struct intel_engine_cs *engine) if (err) return err; - err = i915_timeline_init(engine->i915, - &engine->timeline, - engine->status_page.vma); - if (err) - goto err_hwsp; - - i915_timeline_set_subclass(&engine->timeline, TIMELINE_ENGINE); - + intel_engine_init_active(engine, ENGINE_PHYSICAL); intel_engine_init_breadcrumbs(engine); intel_engine_init_execlists(engine); intel_engine_init_hangcheck(engine); @@ -637,10 +630,6 @@ static int intel_engine_setup_common(struct intel_engine_cs *engine) intel_sseu_from_device_info(&RUNTIME_INFO(engine->i915)->sseu); return 0; - -err_hwsp: - cleanup_status_page(engine); - return err; } /** @@ -797,6 +786,27 @@ static int pin_context(struct i915_gem_context *ctx, return 0; } +void +intel_engine_init_active(struct intel_engine_cs *engine, unsigned int subclass) +{ + INIT_LIST_HEAD(&engine->active.requests); + + spin_lock_init(&engine->active.lock); + lockdep_set_subclass(&engine->active.lock, subclass); + + /* + * Due to an interesting quirk in lockdep's internal debug tracking, + * after setting a subclass we must ensure the lock is used. Otherwise, + * nr_unused_locks is incremented once too often. + */ +#ifdef CONFIG_DEBUG_LOCK_ALLOC + local_irq_disable(); + lock_map_acquire(&engine->active.lock.dep_map); + lock_map_release(&engine->active.lock.dep_map); + local_irq_enable(); +#endif +} + /** * intel_engines_init_common - initialize cengine state which might require hw access * @engine: Engine to initialize. 
@@ -860,6 +870,8 @@ int intel_engine_init_common(struct intel_engine_cs *engine) */ void intel_engine_cleanup_common(struct intel_engine_cs *engine) { + GEM_BUG_ON(!list_empty(&engine->active.requests)); + cleanup_status_page(engine); intel_engine_fini_breadcrumbs(engine); @@ -874,8 +886,6 @@ void intel_engine_cleanup_common(struct intel_engine_cs *engine) intel_context_unpin(engine->kernel_context); GEM_BUG_ON(!llist_empty(&engine->barrier_tasks)); - i915_timeline_fini(&engine->timeline); - intel_wa_list_free(&engine->ctx_wa_list); intel_wa_list_free(&engine->wa_list); intel_wa_list_free(&engine->whitelist); @@ -1481,16 +1491,6 @@ void intel_engine_dump(struct intel_engine_cs *engine, drm_printf(m, "\tRequests:\n"); - rq = list_first_entry(&engine->timeline.requests, - struct i915_request, link); - if (&rq->link != &engine->timeline.requests) - print_request(m, rq, "\t\tfirst "); - - rq = list_last_entry(&engine->timeline.requests, - struct i915_request, link); - if (&rq->link != &engine->timeline.requests) - print_request(m, rq, "\t\tlast "); - rq = intel_engine_find_active_request(engine); if (rq) { print_request(m, rq, "\t\tactive "); @@ -1571,7 +1571,7 @@ int intel_enable_engine_stats(struct intel_engine_cs *engine) if (!intel_engine_supports_stats(engine)) return -ENODEV; - spin_lock_irqsave(&engine->timeline.lock, flags); + spin_lock_irqsave(&engine->active.lock, flags); write_seqlock(&engine->stats.lock); if (unlikely(engine->stats.enabled == ~0)) { @@ -1597,7 +1597,7 @@ int intel_enable_engine_stats(struct intel_engine_cs *engine) unlock: write_sequnlock(&engine->stats.lock); - spin_unlock_irqrestore(&engine->timeline.lock, flags); + spin_unlock_irqrestore(&engine->active.lock, flags); return err; } @@ -1682,22 +1682,22 @@ intel_engine_find_active_request(struct intel_engine_cs *engine) * At all other times, we must assume the GPU is still running, but * we only care about the snapshot of this moment. */ - spin_lock_irqsave(&engine->timeline.lock, flags); - list_for_each_entry(request, &engine->timeline.requests, link) { + spin_lock_irqsave(&engine->active.lock, flags); + list_for_each_entry(request, &engine->active.requests, sched.link) { if (i915_request_completed(request)) continue; if (!i915_request_started(request)) - break; + continue; /* More than one preemptible request may match! */ if (!match_ring(request)) - break; + continue; active = request; break; } - spin_unlock_irqrestore(&engine->timeline.lock, flags); + spin_unlock_irqrestore(&engine->active.lock, flags); return active; } diff --git a/drivers/gpu/drm/i915/gt/intel_engine_types.h b/drivers/gpu/drm/i915/gt/intel_engine_types.h index 52f4987a57e5..868b220214f8 100644 --- a/drivers/gpu/drm/i915/gt/intel_engine_types.h +++ b/drivers/gpu/drm/i915/gt/intel_engine_types.h @@ -288,7 +288,11 @@ struct intel_engine_cs { struct intel_ring *buffer; - struct i915_timeline timeline; + struct { + spinlock_t lock; + struct list_head requests; + } active; + struct llist_head barrier_tasks; struct intel_context *kernel_context; /* pinned */ diff --git a/drivers/gpu/drm/i915/gt/intel_lrc.c b/drivers/gpu/drm/i915/gt/intel_lrc.c index b39fd3e79f51..6879f065ae27 100644 --- a/drivers/gpu/drm/i915/gt/intel_lrc.c +++ b/drivers/gpu/drm/i915/gt/intel_lrc.c @@ -298,8 +298,8 @@ static inline bool need_preempt(const struct intel_engine_cs *engine, * Check against the first request in ELSP[1], it will, thanks to the * power of PI, be the highest priority of that context. 
*/ - if (!list_is_last(&rq->link, &engine->timeline.requests) && - rq_prio(list_next_entry(rq, link)) > last_prio) + if (!list_is_last(&rq->sched.link, &engine->active.requests) && + rq_prio(list_next_entry(rq, sched.link)) > last_prio) return true; if (rb) { @@ -434,11 +434,11 @@ __unwind_incomplete_requests(struct intel_engine_cs *engine) struct list_head *uninitialized_var(pl); int prio = I915_PRIORITY_INVALID; - lockdep_assert_held(&engine->timeline.lock); + lockdep_assert_held(&engine->active.lock); list_for_each_entry_safe_reverse(rq, rn, - &engine->timeline.requests, - link) { + &engine->active.requests, + sched.link) { struct intel_engine_cs *owner; if (i915_request_completed(rq)) @@ -465,7 +465,7 @@ __unwind_incomplete_requests(struct intel_engine_cs *engine) } GEM_BUG_ON(RB_EMPTY_ROOT(&engine->execlists.queue.rb_root)); - list_add(&rq->sched.link, pl); + list_move(&rq->sched.link, pl); active = rq; } else { rq->engine = owner; @@ -933,11 +933,11 @@ static void execlists_dequeue(struct intel_engine_cs *engine) rb_entry(rb, typeof(*ve), nodes[engine->id].rb); struct i915_request *rq; - spin_lock(&ve->base.timeline.lock); + spin_lock(&ve->base.active.lock); rq = ve->request; if (unlikely(!rq)) { /* lost the race to a sibling */ - spin_unlock(&ve->base.timeline.lock); + spin_unlock(&ve->base.active.lock); rb_erase_cached(rb, &execlists->virtual); RB_CLEAR_NODE(rb); rb = rb_first_cached(&execlists->virtual); @@ -950,13 +950,13 @@ static void execlists_dequeue(struct intel_engine_cs *engine) if (rq_prio(rq) >= queue_prio(execlists)) { if (!virtual_matches(ve, rq, engine)) { - spin_unlock(&ve->base.timeline.lock); + spin_unlock(&ve->base.active.lock); rb = rb_next(rb); continue; } if (last && !can_merge_rq(last, rq)) { - spin_unlock(&ve->base.timeline.lock); + spin_unlock(&ve->base.active.lock); return; /* leave this rq for another engine */ } @@ -1011,7 +1011,7 @@ static void execlists_dequeue(struct intel_engine_cs *engine) last = rq; } - spin_unlock(&ve->base.timeline.lock); + spin_unlock(&ve->base.active.lock); break; } @@ -1068,8 +1068,6 @@ static void execlists_dequeue(struct intel_engine_cs *engine) GEM_BUG_ON(port_isset(port)); } - list_del_init(&rq->sched.link); - __i915_request_submit(rq); trace_i915_request_in(rq, port_index(port, execlists)); @@ -1170,7 +1168,7 @@ static void process_csb(struct intel_engine_cs *engine) const u8 num_entries = execlists->csb_size; u8 head, tail; - lockdep_assert_held(&engine->timeline.lock); + lockdep_assert_held(&engine->active.lock); /* * Note that csb_write, csb_status may be either in HWSP or mmio. 
@@ -1330,7 +1328,7 @@ static void process_csb(struct intel_engine_cs *engine) static void __execlists_submission_tasklet(struct intel_engine_cs *const engine) { - lockdep_assert_held(&engine->timeline.lock); + lockdep_assert_held(&engine->active.lock); process_csb(engine); if (!execlists_is_active(&engine->execlists, EXECLISTS_ACTIVE_PREEMPT)) @@ -1351,15 +1349,16 @@ static void execlists_submission_tasklet(unsigned long data) !!intel_wakeref_active(&engine->wakeref), engine->execlists.active); - spin_lock_irqsave(&engine->timeline.lock, flags); + spin_lock_irqsave(&engine->active.lock, flags); __execlists_submission_tasklet(engine); - spin_unlock_irqrestore(&engine->timeline.lock, flags); + spin_unlock_irqrestore(&engine->active.lock, flags); } static void queue_request(struct intel_engine_cs *engine, struct i915_sched_node *node, int prio) { + GEM_BUG_ON(!list_empty(&node->link)); list_add_tail(&node->link, i915_sched_lookup_priolist(engine, prio)); } @@ -1390,7 +1389,7 @@ static void execlists_submit_request(struct i915_request *request) unsigned long flags; /* Will be called from irq-context when using foreign fences. */ - spin_lock_irqsave(&engine->timeline.lock, flags); + spin_lock_irqsave(&engine->active.lock, flags); queue_request(engine, &request->sched, rq_prio(request)); @@ -1399,7 +1398,7 @@ static void execlists_submit_request(struct i915_request *request) submit_queue(engine, rq_prio(request)); - spin_unlock_irqrestore(&engine->timeline.lock, flags); + spin_unlock_irqrestore(&engine->active.lock, flags); } static void __execlists_context_fini(struct intel_context *ce) @@ -2049,8 +2048,8 @@ static void execlists_reset_prepare(struct intel_engine_cs *engine) intel_engine_stop_cs(engine); /* And flush any current direct submission. */ - spin_lock_irqsave(&engine->timeline.lock, flags); - spin_unlock_irqrestore(&engine->timeline.lock, flags); + spin_lock_irqsave(&engine->active.lock, flags); + spin_unlock_irqrestore(&engine->active.lock, flags); } static bool lrc_regs_ok(const struct i915_request *rq) @@ -2093,11 +2092,11 @@ static void reset_csb_pointers(struct intel_engine_execlists *execlists) static struct i915_request *active_request(struct i915_request *rq) { - const struct list_head * const list = &rq->engine->timeline.requests; + const struct list_head * const list = &rq->engine->active.requests; const struct intel_context * const context = rq->hw_context; struct i915_request *active = NULL; - list_for_each_entry_from_reverse(rq, list, link) { + list_for_each_entry_from_reverse(rq, list, sched.link) { if (i915_request_completed(rq)) break; @@ -2214,11 +2213,11 @@ static void execlists_reset(struct intel_engine_cs *engine, bool stalled) GEM_TRACE("%s\n", engine->name); - spin_lock_irqsave(&engine->timeline.lock, flags); + spin_lock_irqsave(&engine->active.lock, flags); __execlists_reset(engine, stalled); - spin_unlock_irqrestore(&engine->timeline.lock, flags); + spin_unlock_irqrestore(&engine->active.lock, flags); } static void nop_submission_tasklet(unsigned long data) @@ -2249,12 +2248,12 @@ static void execlists_cancel_requests(struct intel_engine_cs *engine) * submission's irq state, we also wish to remind ourselves that * it is irq state.) */ - spin_lock_irqsave(&engine->timeline.lock, flags); + spin_lock_irqsave(&engine->active.lock, flags); __execlists_reset(engine, true); /* Mark all executing requests as skipped. 
*/ - list_for_each_entry(rq, &engine->timeline.requests, link) { + list_for_each_entry(rq, &engine->active.requests, sched.link) { if (!i915_request_signaled(rq)) dma_fence_set_error(&rq->fence, -EIO); @@ -2285,7 +2284,7 @@ static void execlists_cancel_requests(struct intel_engine_cs *engine) rb_erase_cached(rb, &execlists->virtual); RB_CLEAR_NODE(rb); - spin_lock(&ve->base.timeline.lock); + spin_lock(&ve->base.active.lock); if (ve->request) { ve->request->engine = engine; __i915_request_submit(ve->request); @@ -2294,7 +2293,7 @@ static void execlists_cancel_requests(struct intel_engine_cs *engine) ve->base.execlists.queue_priority_hint = INT_MIN; ve->request = NULL; } - spin_unlock(&ve->base.timeline.lock); + spin_unlock(&ve->base.active.lock); } /* Remaining _unready_ requests will be nop'ed when submitted */ @@ -2306,7 +2305,7 @@ static void execlists_cancel_requests(struct intel_engine_cs *engine) GEM_BUG_ON(__tasklet_is_enabled(&execlists->tasklet)); execlists->tasklet.func = nop_submission_tasklet; - spin_unlock_irqrestore(&engine->timeline.lock, flags); + spin_unlock_irqrestore(&engine->active.lock, flags); } static void execlists_reset_finish(struct intel_engine_cs *engine) @@ -3009,12 +3008,18 @@ static int execlists_context_deferred_alloc(struct intel_context *ce, return ret; } +static struct list_head *virtual_queue(struct virtual_engine *ve) +{ + return &ve->base.execlists.default_priolist.requests[0]; +} + static void virtual_context_destroy(struct kref *kref) { struct virtual_engine *ve = container_of(kref, typeof(*ve), context.ref); unsigned int n; + GEM_BUG_ON(!list_empty(virtual_queue(ve))); GEM_BUG_ON(ve->request); GEM_BUG_ON(ve->context.inflight); @@ -3025,13 +3030,13 @@ static void virtual_context_destroy(struct kref *kref) if (RB_EMPTY_NODE(node)) continue; - spin_lock_irq(&sibling->timeline.lock); + spin_lock_irq(&sibling->active.lock); /* Detachment is lazily performed in the execlists tasklet */ if (!RB_EMPTY_NODE(node)) rb_erase_cached(node, &sibling->execlists.virtual); - spin_unlock_irq(&sibling->timeline.lock); + spin_unlock_irq(&sibling->active.lock); } GEM_BUG_ON(__tasklet_is_scheduled(&ve->base.execlists.tasklet)); @@ -3039,8 +3044,6 @@ static void virtual_context_destroy(struct kref *kref) __execlists_context_fini(&ve->context); kfree(ve->bonds); - - i915_timeline_fini(&ve->base.timeline); kfree(ve); } @@ -3159,16 +3162,16 @@ static void virtual_submission_tasklet(unsigned long data) if (unlikely(!(mask & sibling->mask))) { if (!RB_EMPTY_NODE(&node->rb)) { - spin_lock(&sibling->timeline.lock); + spin_lock(&sibling->active.lock); rb_erase_cached(&node->rb, &sibling->execlists.virtual); RB_CLEAR_NODE(&node->rb); - spin_unlock(&sibling->timeline.lock); + spin_unlock(&sibling->active.lock); } continue; } - spin_lock(&sibling->timeline.lock); + spin_lock(&sibling->active.lock); if (!RB_EMPTY_NODE(&node->rb)) { /* @@ -3212,7 +3215,7 @@ static void virtual_submission_tasklet(unsigned long data) tasklet_hi_schedule(&sibling->execlists.tasklet); } - spin_unlock(&sibling->timeline.lock); + spin_unlock(&sibling->active.lock); } local_irq_enable(); } @@ -3229,9 +3232,13 @@ static void virtual_submit_request(struct i915_request *rq) GEM_BUG_ON(ve->base.submit_request != virtual_submit_request); GEM_BUG_ON(ve->request); + GEM_BUG_ON(!list_empty(virtual_queue(ve))); + ve->base.execlists.queue_priority_hint = rq_prio(rq); WRITE_ONCE(ve->request, rq); + list_move_tail(&rq->sched.link, virtual_queue(ve)); + tasklet_schedule(&ve->base.execlists.tasklet); } @@ -3296,10 +3303,7 
@@ intel_execlists_create_virtual(struct i915_gem_context *ctx, snprintf(ve->base.name, sizeof(ve->base.name), "virtual"); - err = i915_timeline_init(ctx->i915, &ve->base.timeline, NULL); - if (err) - goto err_put; - i915_timeline_set_subclass(&ve->base.timeline, TIMELINE_VIRTUAL); + intel_engine_init_active(&ve->base, ENGINE_VIRTUAL); intel_engine_init_execlists(&ve->base); @@ -3310,6 +3314,7 @@ intel_execlists_create_virtual(struct i915_gem_context *ctx, ve->base.submit_request = virtual_submit_request; ve->base.bond_execute = virtual_bond_execute; + INIT_LIST_HEAD(virtual_queue(ve)); ve->base.execlists.queue_priority_hint = INT_MIN; tasklet_init(&ve->base.execlists.tasklet, virtual_submission_tasklet, @@ -3464,11 +3469,11 @@ void intel_execlists_show_requests(struct intel_engine_cs *engine, unsigned int count; struct rb_node *rb; - spin_lock_irqsave(&engine->timeline.lock, flags); + spin_lock_irqsave(&engine->active.lock, flags); last = NULL; count = 0; - list_for_each_entry(rq, &engine->timeline.requests, link) { + list_for_each_entry(rq, &engine->active.requests, sched.link) { if (count++ < max - 1) show_request(m, rq, "\t\tE "); else @@ -3531,7 +3536,7 @@ void intel_execlists_show_requests(struct intel_engine_cs *engine, show_request(m, last, "\t\tV "); } - spin_unlock_irqrestore(&engine->timeline.lock, flags); + spin_unlock_irqrestore(&engine->active.lock, flags); } void intel_lr_context_reset(struct intel_engine_cs *engine, diff --git a/drivers/gpu/drm/i915/gt/intel_reset.c b/drivers/gpu/drm/i915/gt/intel_reset.c index 1e93cf6eede4..1f831fe759a5 100644 --- a/drivers/gpu/drm/i915/gt/intel_reset.c +++ b/drivers/gpu/drm/i915/gt/intel_reset.c @@ -49,12 +49,12 @@ static void engine_skip_context(struct i915_request *rq) struct intel_engine_cs *engine = rq->engine; struct i915_gem_context *hung_ctx = rq->gem_context; - lockdep_assert_held(&engine->timeline.lock); + lockdep_assert_held(&engine->active.lock); if (!i915_request_is_active(rq)) return; - list_for_each_entry_continue(rq, &engine->timeline.requests, link) + list_for_each_entry_continue(rq, &engine->active.requests, sched.link) if (rq->gem_context == hung_ctx) i915_request_skip(rq, -EIO); } @@ -130,7 +130,7 @@ void i915_reset_request(struct i915_request *rq, bool guilty) rq->fence.seqno, yesno(guilty)); - lockdep_assert_held(&rq->engine->timeline.lock); + lockdep_assert_held(&rq->engine->active.lock); GEM_BUG_ON(i915_request_completed(rq)); if (guilty) { @@ -785,10 +785,10 @@ static void nop_submit_request(struct i915_request *request) engine->name, request->fence.context, request->fence.seqno); dma_fence_set_error(&request->fence, -EIO); - spin_lock_irqsave(&engine->timeline.lock, flags); + spin_lock_irqsave(&engine->active.lock, flags); __i915_request_submit(request); i915_request_mark_complete(request); - spin_unlock_irqrestore(&engine->timeline.lock, flags); + spin_unlock_irqrestore(&engine->active.lock, flags); intel_engine_queue_breadcrumbs(engine); } diff --git a/drivers/gpu/drm/i915/gt/intel_ringbuffer.c b/drivers/gpu/drm/i915/gt/intel_ringbuffer.c index fc596adc637a..93b0893c736b 100644 --- a/drivers/gpu/drm/i915/gt/intel_ringbuffer.c +++ b/drivers/gpu/drm/i915/gt/intel_ringbuffer.c @@ -730,14 +730,13 @@ static void reset_prepare(struct intel_engine_cs *engine) static void reset_ring(struct intel_engine_cs *engine, bool stalled) { - struct i915_timeline *tl = &engine->timeline; struct i915_request *pos, *rq; unsigned long flags; u32 head; rq = NULL; - spin_lock_irqsave(&tl->lock, flags); - list_for_each_entry(pos, 
&tl->requests, link) { + spin_lock_irqsave(&engine->active.lock, flags); + list_for_each_entry(pos, &engine->active.requests, sched.link) { if (!i915_request_completed(pos)) { rq = pos; break; @@ -791,7 +790,7 @@ static void reset_ring(struct intel_engine_cs *engine, bool stalled) } engine->buffer->head = intel_ring_wrap(engine->buffer, head); - spin_unlock_irqrestore(&tl->lock, flags); + spin_unlock_irqrestore(&engine->active.lock, flags); } static void reset_finish(struct intel_engine_cs *engine) @@ -877,10 +876,10 @@ static void cancel_requests(struct intel_engine_cs *engine) struct i915_request *request; unsigned long flags; - spin_lock_irqsave(&engine->timeline.lock, flags); + spin_lock_irqsave(&engine->active.lock, flags); /* Mark all submitted requests as skipped. */ - list_for_each_entry(request, &engine->timeline.requests, link) { + list_for_each_entry(request, &engine->active.requests, sched.link) { if (!i915_request_signaled(request)) dma_fence_set_error(&request->fence, -EIO); @@ -889,7 +888,7 @@ static void cancel_requests(struct intel_engine_cs *engine) /* Remaining _unready_ requests will be nop'ed when submitted */ - spin_unlock_irqrestore(&engine->timeline.lock, flags); + spin_unlock_irqrestore(&engine->active.lock, flags); } static void i9xx_submit_request(struct i915_request *request) @@ -1267,8 +1266,6 @@ intel_engine_create_ring(struct intel_engine_cs *engine, GEM_BUG_ON(!is_power_of_2(size)); GEM_BUG_ON(RING_CTL_SIZE(size) & ~RING_NR_PAGES); - GEM_BUG_ON(timeline == &engine->timeline); - lockdep_assert_held(&engine->i915->drm.struct_mutex); ring = kzalloc(sizeof(*ring), GFP_KERNEL); if (!ring) diff --git a/drivers/gpu/drm/i915/gt/mock_engine.c b/drivers/gpu/drm/i915/gt/mock_engine.c index b7675ef18523..00c666d3e652 100644 --- a/drivers/gpu/drm/i915/gt/mock_engine.c +++ b/drivers/gpu/drm/i915/gt/mock_engine.c @@ -229,17 +229,17 @@ static void mock_cancel_requests(struct intel_engine_cs *engine) struct i915_request *request; unsigned long flags; - spin_lock_irqsave(&engine->timeline.lock, flags); + spin_lock_irqsave(&engine->active.lock, flags); /* Mark all submitted requests as skipped. 
*/ - list_for_each_entry(request, &engine->timeline.requests, sched.link) { + list_for_each_entry(request, &engine->active.requests, sched.link) { if (!i915_request_signaled(request)) dma_fence_set_error(&request->fence, -EIO); i915_request_mark_complete(request); } - spin_unlock_irqrestore(&engine->timeline.lock, flags); + spin_unlock_irqrestore(&engine->active.lock, flags); } struct intel_engine_cs *mock_engine(struct drm_i915_private *i915, @@ -285,28 +285,23 @@ int mock_engine_init(struct intel_engine_cs *engine) struct drm_i915_private *i915 = engine->i915; int err; + intel_engine_init_active(engine, ENGINE_MOCK); intel_engine_init_breadcrumbs(engine); intel_engine_init_execlists(engine); intel_engine_init__pm(engine); - if (i915_timeline_init(i915, &engine->timeline, NULL)) - goto err_breadcrumbs; - i915_timeline_set_subclass(&engine->timeline, TIMELINE_ENGINE); - engine->kernel_context = i915_gem_context_get_engine(i915->kernel_context, engine->id); if (IS_ERR(engine->kernel_context)) - goto err_timeline; + goto err_breadcrumbs; err = intel_context_pin(engine->kernel_context); intel_context_put(engine->kernel_context); if (err) - goto err_timeline; + goto err_breadcrumbs; return 0; -err_timeline: - i915_timeline_fini(&engine->timeline); err_breadcrumbs: intel_engine_fini_breadcrumbs(engine); return -ENOMEM; @@ -340,7 +335,6 @@ void mock_engine_free(struct intel_engine_cs *engine) intel_context_unpin(engine->kernel_context); intel_engine_fini_breadcrumbs(engine); - i915_timeline_fini(&engine->timeline); kfree(engine); } diff --git a/drivers/gpu/drm/i915/i915_gpu_error.c b/drivers/gpu/drm/i915/i915_gpu_error.c index 8dc727cbfe68..4a8e7ee19916 100644 --- a/drivers/gpu/drm/i915/i915_gpu_error.c +++ b/drivers/gpu/drm/i915/i915_gpu_error.c @@ -1269,7 +1269,7 @@ static void engine_record_requests(struct intel_engine_cs *engine, count = 0; request = first; - list_for_each_entry_from(request, &engine->timeline.requests, link) + list_for_each_entry_from(request, &engine->active.requests, sched.link) count++; if (!count) return; @@ -1282,7 +1282,8 @@ static void engine_record_requests(struct intel_engine_cs *engine, count = 0; request = first; - list_for_each_entry_from(request, &engine->timeline.requests, link) { + list_for_each_entry_from(request, + &engine->active.requests, sched.link) { if (count >= ee->num_requests) { /* * If the ring request list was changed in diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c index 5423ec9eaf72..72d920dcb31a 100644 --- a/drivers/gpu/drm/i915/i915_request.c +++ b/drivers/gpu/drm/i915/i915_request.c @@ -232,9 +232,9 @@ static bool i915_request_retire(struct i915_request *rq) local_irq_disable(); - spin_lock(&rq->engine->timeline.lock); - list_del(&rq->link); - spin_unlock(&rq->engine->timeline.lock); + spin_lock(&rq->engine->active.lock); + list_del(&rq->sched.link); + spin_unlock(&rq->engine->active.lock); spin_lock(&rq->lock); i915_request_mark_complete(rq); @@ -254,6 +254,7 @@ static bool i915_request_retire(struct i915_request *rq) intel_context_unpin(rq->hw_context); i915_request_remove_from_client(rq); + list_del(&rq->link); free_capture_list(rq); i915_sched_node_fini(&rq->sched); @@ -373,28 +374,17 @@ __i915_request_await_execution(struct i915_request *rq, return 0; } -static void move_to_timeline(struct i915_request *request, - struct i915_timeline *timeline) -{ - GEM_BUG_ON(request->timeline == &request->engine->timeline); - lockdep_assert_held(&request->engine->timeline.lock); - - 
spin_lock(&request->timeline->lock); - list_move_tail(&request->link, &timeline->requests); - spin_unlock(&request->timeline->lock); -} - void __i915_request_submit(struct i915_request *request) { struct intel_engine_cs *engine = request->engine; - GEM_TRACE("%s fence %llx:%lld -> current %d\n", + GEM_TRACE("%s fence %llx:%lld, current %d\n", engine->name, request->fence.context, request->fence.seqno, hwsp_seqno(request)); GEM_BUG_ON(!irqs_disabled()); - lockdep_assert_held(&engine->timeline.lock); + lockdep_assert_held(&engine->active.lock); if (i915_gem_context_is_banned(request->gem_context)) i915_request_skip(request, -EIO); @@ -422,6 +412,8 @@ void __i915_request_submit(struct i915_request *request) /* We may be recursing from the signal callback of another i915 fence */ spin_lock_nested(&request->lock, SINGLE_DEPTH_NESTING); + list_move_tail(&request->sched.link, &engine->active.requests); + GEM_BUG_ON(test_bit(I915_FENCE_FLAG_ACTIVE, &request->fence.flags)); set_bit(I915_FENCE_FLAG_ACTIVE, &request->fence.flags); @@ -437,9 +429,6 @@ void __i915_request_submit(struct i915_request *request) engine->emit_fini_breadcrumb(request, request->ring->vaddr + request->postfix); - /* Transfer from per-context onto the global per-engine timeline */ - move_to_timeline(request, &engine->timeline); - engine->serial++; trace_i915_request_execute(request); @@ -451,11 +440,11 @@ void i915_request_submit(struct i915_request *request) unsigned long flags; /* Will be called from irq-context when using foreign fences. */ - spin_lock_irqsave(&engine->timeline.lock, flags); + spin_lock_irqsave(&engine->active.lock, flags); __i915_request_submit(request); - spin_unlock_irqrestore(&engine->timeline.lock, flags); + spin_unlock_irqrestore(&engine->active.lock, flags); } void __i915_request_unsubmit(struct i915_request *request) @@ -468,7 +457,7 @@ void __i915_request_unsubmit(struct i915_request *request) hwsp_seqno(request)); GEM_BUG_ON(!irqs_disabled()); - lockdep_assert_held(&engine->timeline.lock); + lockdep_assert_held(&engine->active.lock); /* * Only unwind in reverse order, required so that the per-context list @@ -486,9 +475,6 @@ void __i915_request_unsubmit(struct i915_request *request) spin_unlock(&request->lock); - /* Transfer back from the global per-engine timeline to per-context */ - move_to_timeline(request, request->timeline); - /* We've already spun, don't charge on resubmitting. */ if (request->sched.semaphores && i915_request_started(request)) { request->sched.attr.priority |= I915_PRIORITY_NOSEMAPHORE; @@ -510,11 +496,11 @@ void i915_request_unsubmit(struct i915_request *request) unsigned long flags; /* Will be called from irq-context when using foreign fences. 
*/ - spin_lock_irqsave(&engine->timeline.lock, flags); + spin_lock_irqsave(&engine->active.lock, flags); __i915_request_unsubmit(request); - spin_unlock_irqrestore(&engine->timeline.lock, flags); + spin_unlock_irqrestore(&engine->active.lock, flags); } static int __i915_sw_fence_call @@ -669,7 +655,6 @@ __i915_request_create(struct intel_context *ce, gfp_t gfp) rq->engine = ce->engine; rq->ring = ce->ring; rq->timeline = tl; - GEM_BUG_ON(rq->timeline == &ce->engine->timeline); rq->hwsp_seqno = tl->hwsp_seqno; rq->hwsp_cacheline = tl->hwsp_cacheline; rq->rcustate = get_state_synchronize_rcu(); /* acts as smp_mb() */ @@ -1134,9 +1119,7 @@ __i915_request_add_to_timeline(struct i915_request *rq) 0); } - spin_lock_irq(&timeline->lock); list_add_tail(&rq->link, &timeline->requests); - spin_unlock_irq(&timeline->lock); /* * Make sure that no request gazumped us - if it was allocated after diff --git a/drivers/gpu/drm/i915/i915_request.h b/drivers/gpu/drm/i915/i915_request.h index c9f7d07991c8..edbbdfec24ab 100644 --- a/drivers/gpu/drm/i915/i915_request.h +++ b/drivers/gpu/drm/i915/i915_request.h @@ -217,7 +217,7 @@ struct i915_request { bool waitboost; - /** engine->request_list entry for this request */ + /** timeline->request entry for this request */ struct list_head link; /** ring->request_list entry for this request */ diff --git a/drivers/gpu/drm/i915/i915_scheduler.c b/drivers/gpu/drm/i915/i915_scheduler.c index 78ceb56d7801..2e9b38bdc33c 100644 --- a/drivers/gpu/drm/i915/i915_scheduler.c +++ b/drivers/gpu/drm/i915/i915_scheduler.c @@ -77,7 +77,7 @@ i915_sched_lookup_priolist(struct intel_engine_cs *engine, int prio) bool first = true; int idx, i; - lockdep_assert_held(&engine->timeline.lock); + lockdep_assert_held(&engine->active.lock); assert_priolists(execlists); /* buckets sorted from highest [in slot 0] to lowest priority */ @@ -162,9 +162,9 @@ sched_lock_engine(const struct i915_sched_node *node, * check that the rq still belongs to the newly locked engine. */ while (locked != (engine = READ_ONCE(rq->engine))) { - spin_unlock(&locked->timeline.lock); + spin_unlock(&locked->active.lock); memset(cache, 0, sizeof(*cache)); - spin_lock(&engine->timeline.lock); + spin_lock(&engine->active.lock); locked = engine; } @@ -189,7 +189,7 @@ static void kick_submission(struct intel_engine_cs *engine, int prio) * tasklet, i.e. we have not change the priority queue * sufficiently to oust the running context. 
*/ - if (inflight && !i915_scheduler_need_preempt(prio, rq_prio(inflight))) + if (!inflight || !i915_scheduler_need_preempt(prio, rq_prio(inflight))) return; tasklet_hi_schedule(&engine->execlists.tasklet); @@ -278,7 +278,7 @@ static void __i915_schedule(struct i915_sched_node *node, memset(&cache, 0, sizeof(cache)); engine = node_to_request(node)->engine; - spin_lock(&engine->timeline.lock); + spin_lock(&engine->active.lock); /* Fifo and depth-first replacement ensure our deps execute before us */ engine = sched_lock_engine(node, engine, &cache); @@ -287,7 +287,7 @@ static void __i915_schedule(struct i915_sched_node *node, node = dep->signaler; engine = sched_lock_engine(node, engine, &cache); - lockdep_assert_held(&engine->timeline.lock); + lockdep_assert_held(&engine->active.lock); /* Recheck after acquiring the engine->timeline.lock */ if (prio <= node->attr.priority || node_signaled(node)) @@ -296,14 +296,8 @@ static void __i915_schedule(struct i915_sched_node *node, GEM_BUG_ON(node_to_request(node)->engine != engine); node->attr.priority = prio; - if (!list_empty(&node->link)) { - GEM_BUG_ON(intel_engine_is_virtual(engine)); - if (!cache.priolist) - cache.priolist = - i915_sched_lookup_priolist(engine, - prio); - list_move_tail(&node->link, cache.priolist); - } else { + + if (list_empty(&node->link)) { /* * If the request is not in the priolist queue because * it is not yet runnable, then it doesn't contribute @@ -312,8 +306,16 @@ static void __i915_schedule(struct i915_sched_node *node, * queue; but in that case we may still need to reorder * the inflight requests. */ - if (!i915_sw_fence_done(&node_to_request(node)->submit)) - continue; + continue; + } + + if (!intel_engine_is_virtual(engine) && + !i915_request_is_active(node_to_request(node))) { + if (!cache.priolist) + cache.priolist = + i915_sched_lookup_priolist(engine, + prio); + list_move_tail(&node->link, cache.priolist); } if (prio <= engine->execlists.queue_priority_hint) @@ -325,7 +327,7 @@ static void __i915_schedule(struct i915_sched_node *node, kick_submission(engine, prio); } - spin_unlock(&engine->timeline.lock); + spin_unlock(&engine->active.lock); } void i915_schedule(struct i915_request *rq, const struct i915_sched_attr *attr) @@ -439,8 +441,6 @@ void i915_sched_node_fini(struct i915_sched_node *node) { struct i915_dependency *dep, *tmp; - GEM_BUG_ON(!list_empty(&node->link)); - spin_lock_irq(&schedule_lock); /* diff --git a/drivers/gpu/drm/i915/i915_timeline.c b/drivers/gpu/drm/i915/i915_timeline.c index 000e1a9b6750..c311ce9c6f9d 100644 --- a/drivers/gpu/drm/i915/i915_timeline.c +++ b/drivers/gpu/drm/i915/i915_timeline.c @@ -251,7 +251,6 @@ int i915_timeline_init(struct drm_i915_private *i915, timeline->fence_context = dma_fence_context_alloc(1); - spin_lock_init(&timeline->lock); mutex_init(&timeline->mutex); INIT_ACTIVE_REQUEST(&timeline->last_request); diff --git a/drivers/gpu/drm/i915/i915_timeline.h b/drivers/gpu/drm/i915/i915_timeline.h index 27668a1a69a3..36e5e5a65155 100644 --- a/drivers/gpu/drm/i915/i915_timeline.h +++ b/drivers/gpu/drm/i915/i915_timeline.h @@ -36,25 +36,6 @@ int i915_timeline_init(struct drm_i915_private *i915, struct i915_vma *hwsp); void i915_timeline_fini(struct i915_timeline *tl); -static inline void -i915_timeline_set_subclass(struct i915_timeline *timeline, - unsigned int subclass) -{ - lockdep_set_subclass(&timeline->lock, subclass); - - /* - * Due to an interesting quirk in lockdep's internal debug tracking, - * after setting a subclass we must ensure the lock is used. 
Otherwise, - * nr_unused_locks is incremented once too often. - */ -#ifdef CONFIG_DEBUG_LOCK_ALLOC - local_irq_disable(); - lock_map_acquire(&timeline->lock.dep_map); - lock_map_release(&timeline->lock.dep_map); - local_irq_enable(); -#endif -} - struct i915_timeline * i915_timeline_create(struct drm_i915_private *i915, struct i915_vma *global_hwsp); diff --git a/drivers/gpu/drm/i915/i915_timeline_types.h b/drivers/gpu/drm/i915/i915_timeline_types.h index 1688705f4a2b..fce5cb4f1090 100644 --- a/drivers/gpu/drm/i915/i915_timeline_types.h +++ b/drivers/gpu/drm/i915/i915_timeline_types.h @@ -23,10 +23,6 @@ struct i915_timeline { u64 fence_context; u32 seqno; - spinlock_t lock; -#define TIMELINE_CLIENT 0 /* default subclass */ -#define TIMELINE_ENGINE 1 -#define TIMELINE_VIRTUAL 2 struct mutex mutex; /* protects the flow of requests */ unsigned int pin_count; diff --git a/drivers/gpu/drm/i915/intel_guc_submission.c b/drivers/gpu/drm/i915/intel_guc_submission.c index 89592ef778b8..928121f06054 100644 --- a/drivers/gpu/drm/i915/intel_guc_submission.c +++ b/drivers/gpu/drm/i915/intel_guc_submission.c @@ -740,7 +740,7 @@ static bool __guc_dequeue(struct intel_engine_cs *engine) bool submit = false; struct rb_node *rb; - lockdep_assert_held(&engine->timeline.lock); + lockdep_assert_held(&engine->active.lock); if (port_isset(port)) { if (intel_engine_has_preemption(engine)) { @@ -822,7 +822,7 @@ static void guc_submission_tasklet(unsigned long data) struct i915_request *rq; unsigned long flags; - spin_lock_irqsave(&engine->timeline.lock, flags); + spin_lock_irqsave(&engine->active.lock, flags); rq = port_request(port); while (rq && i915_request_completed(rq)) { @@ -847,7 +847,7 @@ static void guc_submission_tasklet(unsigned long data) if (!execlists_is_active(execlists, EXECLISTS_ACTIVE_PREEMPT)) guc_dequeue(engine); - spin_unlock_irqrestore(&engine->timeline.lock, flags); + spin_unlock_irqrestore(&engine->active.lock, flags); } static void guc_reset_prepare(struct intel_engine_cs *engine) @@ -884,7 +884,7 @@ static void guc_reset(struct intel_engine_cs *engine, bool stalled) struct i915_request *rq; unsigned long flags; - spin_lock_irqsave(&engine->timeline.lock, flags); + spin_lock_irqsave(&engine->active.lock, flags); execlists_cancel_port_requests(execlists); @@ -900,7 +900,7 @@ static void guc_reset(struct intel_engine_cs *engine, bool stalled) intel_lr_context_reset(engine, rq->hw_context, rq->head, stalled); out_unlock: - spin_unlock_irqrestore(&engine->timeline.lock, flags); + spin_unlock_irqrestore(&engine->active.lock, flags); } static void guc_cancel_requests(struct intel_engine_cs *engine) @@ -926,13 +926,13 @@ static void guc_cancel_requests(struct intel_engine_cs *engine) * submission's irq state, we also wish to remind ourselves that * it is irq state.) */ - spin_lock_irqsave(&engine->timeline.lock, flags); + spin_lock_irqsave(&engine->active.lock, flags); /* Cancel the requests on the HW and clear the ELSP tracker. */ execlists_cancel_port_requests(execlists); /* Mark all executing requests as skipped. 
*/ - list_for_each_entry(rq, &engine->timeline.requests, link) { + list_for_each_entry(rq, &engine->active.requests, sched.link) { if (!i915_request_signaled(rq)) dma_fence_set_error(&rq->fence, -EIO); @@ -961,7 +961,7 @@ static void guc_cancel_requests(struct intel_engine_cs *engine) execlists->queue = RB_ROOT_CACHED; GEM_BUG_ON(port_isset(execlists->port)); - spin_unlock_irqrestore(&engine->timeline.lock, flags); + spin_unlock_irqrestore(&engine->active.lock, flags); } static void guc_reset_finish(struct intel_engine_cs *engine) diff --git a/drivers/gpu/drm/i915/selftests/mock_timeline.c b/drivers/gpu/drm/i915/selftests/mock_timeline.c index e084476469ef..65b52be23d42 100644 --- a/drivers/gpu/drm/i915/selftests/mock_timeline.c +++ b/drivers/gpu/drm/i915/selftests/mock_timeline.c @@ -13,7 +13,6 @@ void mock_timeline_init(struct i915_timeline *timeline, u64 context) timeline->i915 = NULL; timeline->fence_context = context; - spin_lock_init(&timeline->lock); mutex_init(&timeline->mutex); INIT_ACTIVE_REQUEST(&timeline->last_request); From patchwork Mon Jun 10 07:21:07 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Chris Wilson X-Patchwork-Id: 10984179 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 5B5241932 for ; Mon, 10 Jun 2019 07:21:53 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 451D02881C for ; Mon, 10 Jun 2019 07:21:53 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 3729D28832; Mon, 10 Jun 2019 07:21:53 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-5.2 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_MED autolearn=ham version=3.3.1 Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177]) (using TLSv1.2 with cipher DHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.wl.linuxfoundation.org (Postfix) with ESMTPS id CD91228816 for ; Mon, 10 Jun 2019 07:21:52 +0000 (UTC) Received: from gabe.freedesktop.org (localhost [127.0.0.1]) by gabe.freedesktop.org (Postfix) with ESMTP id B9B6F8912B; Mon, 10 Jun 2019 07:21:50 +0000 (UTC) X-Original-To: intel-gfx@lists.freedesktop.org Delivered-To: intel-gfx@lists.freedesktop.org Received: from fireflyinternet.com (mail.fireflyinternet.com [109.228.58.192]) by gabe.freedesktop.org (Postfix) with ESMTPS id BA94A89124 for ; Mon, 10 Jun 2019 07:21:47 +0000 (UTC) X-Default-Received-SPF: pass (skip=forwardok (res=PASS)) x-ip-name=78.156.65.138; Received: from haswell.alporthouse.com (unverified [78.156.65.138]) by fireflyinternet.com (Firefly Internet (M1)) with ESMTP id 16848362-1500050 for multiple; Mon, 10 Jun 2019 08:21:29 +0100 From: Chris Wilson To: intel-gfx@lists.freedesktop.org Date: Mon, 10 Jun 2019 08:21:07 +0100 Message-Id: <20190610072126.6355-10-chris@chris-wilson.co.uk> X-Mailer: git-send-email 2.20.1 In-Reply-To: <20190610072126.6355-1-chris@chris-wilson.co.uk> References: <20190610072126.6355-1-chris@chris-wilson.co.uk> MIME-Version: 1.0 Subject: [Intel-gfx] [PATCH 09/28] drm/i915: Flush the execution-callbacks on retiring X-BeenThere: intel-gfx@lists.freedesktop.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: Intel graphics driver community testing & 
development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: intel-gfx-bounces@lists.freedesktop.org Sender: "Intel-gfx" X-Virus-Scanned: ClamAV using ClamSMTP In the unlikely case the request completes while we regard it as not even executing on the GPU (see the next patch!), we have to flush any pending execution callbacks at retirement and ensure that we do not add any more. Signed-off-by: Chris Wilson --- drivers/gpu/drm/i915/i915_request.c | 93 +++++++++++++++-------------- 1 file changed, 49 insertions(+), 44 deletions(-) diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c index 72d920dcb31a..26bd45f11dc5 100644 --- a/drivers/gpu/drm/i915/i915_request.c +++ b/drivers/gpu/drm/i915/i915_request.c @@ -119,6 +119,50 @@ const struct dma_fence_ops i915_fence_ops = { .release = i915_fence_release, }; +static void irq_execute_cb(struct irq_work *wrk) +{ + struct execute_cb *cb = container_of(wrk, typeof(*cb), work); + + i915_sw_fence_complete(cb->fence); + kmem_cache_free(global.slab_execute_cbs, cb); +} + +static void irq_execute_cb_hook(struct irq_work *wrk) +{ + struct execute_cb *cb = container_of(wrk, typeof(*cb), work); + + cb->hook(container_of(cb->fence, struct i915_request, submit), + &cb->signal->fence); + i915_request_put(cb->signal); + + irq_execute_cb(wrk); +} + +static void __notify_execute_cb(struct i915_request *rq) +{ + struct execute_cb *cb; + + lockdep_assert_held(&rq->lock); + + if (list_empty(&rq->execute_cb)) + return; + + list_for_each_entry(cb, &rq->execute_cb, link) + irq_work_queue(&cb->work); + + /* + * XXX Rollback on __i915_request_unsubmit() + * + * In the future, perhaps when we have an active time-slicing scheduler, + * it will be interesting to unsubmit parallel execution and remove + * busywaits from the GPU until their master is restarted. This is + * quite hairy, we have to carefully rollback the fence and do a + * preempt-to-idle cycle on the target engine, all the while the + * master execute_cb may refire. 
+ */ + INIT_LIST_HEAD(&rq->execute_cb); +} + static inline void i915_request_remove_from_client(struct i915_request *request) { @@ -246,6 +290,11 @@ static bool i915_request_retire(struct i915_request *rq) GEM_BUG_ON(!atomic_read(&rq->i915->gt_pm.rps.num_waiters)); atomic_dec(&rq->i915->gt_pm.rps.num_waiters); } + if (!test_bit(I915_FENCE_FLAG_ACTIVE, &rq->fence.flags)) { + set_bit(I915_FENCE_FLAG_ACTIVE, &rq->fence.flags); + __notify_execute_cb(rq); + } + GEM_BUG_ON(!list_empty(&rq->execute_cb)); spin_unlock(&rq->lock); local_irq_enable(); @@ -285,50 +334,6 @@ void i915_request_retire_upto(struct i915_request *rq) } while (i915_request_retire(tmp) && tmp != rq); } -static void irq_execute_cb(struct irq_work *wrk) -{ - struct execute_cb *cb = container_of(wrk, typeof(*cb), work); - - i915_sw_fence_complete(cb->fence); - kmem_cache_free(global.slab_execute_cbs, cb); -} - -static void irq_execute_cb_hook(struct irq_work *wrk) -{ - struct execute_cb *cb = container_of(wrk, typeof(*cb), work); - - cb->hook(container_of(cb->fence, struct i915_request, submit), - &cb->signal->fence); - i915_request_put(cb->signal); - - irq_execute_cb(wrk); -} - -static void __notify_execute_cb(struct i915_request *rq) -{ - struct execute_cb *cb; - - lockdep_assert_held(&rq->lock); - - if (list_empty(&rq->execute_cb)) - return; - - list_for_each_entry(cb, &rq->execute_cb, link) - irq_work_queue(&cb->work); - - /* - * XXX Rollback on __i915_request_unsubmit() - * - * In the future, perhaps when we have an active time-slicing scheduler, - * it will be interesting to unsubmit parallel execution and remove - * busywaits from the GPU until their master is restarted. This is - * quite hairy, we have to carefully rollback the fence and do a - * preempt-to-idle cycle on the target engine, all the while the - * master execute_cb may refire. 
- */ - INIT_LIST_HEAD(&rq->execute_cb); -} - static int __i915_request_await_execution(struct i915_request *rq, struct i915_request *signal, From patchwork Mon Jun 10 07:21:08 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Chris Wilson X-Patchwork-Id: 10984187 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 74F001580 for ; Mon, 10 Jun 2019 07:21:58 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 5887A28816 for ; Mon, 10 Jun 2019 07:21:58 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 4C89F28842; Mon, 10 Jun 2019 07:21:58 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-5.2 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_MED autolearn=ham version=3.3.1 Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177]) (using TLSv1.2 with cipher DHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.wl.linuxfoundation.org (Postfix) with ESMTPS id 063D428816 for ; Mon, 10 Jun 2019 07:21:54 +0000 (UTC) Received: from gabe.freedesktop.org (localhost [127.0.0.1]) by gabe.freedesktop.org (Postfix) with ESMTP id 294F589131; Mon, 10 Jun 2019 07:21:51 +0000 (UTC) X-Original-To: intel-gfx@lists.freedesktop.org Delivered-To: intel-gfx@lists.freedesktop.org Received: from fireflyinternet.com (mail.fireflyinternet.com [109.228.58.192]) by gabe.freedesktop.org (Postfix) with ESMTPS id E0C3289113 for ; Mon, 10 Jun 2019 07:21:46 +0000 (UTC) X-Default-Received-SPF: pass (skip=forwardok (res=PASS)) x-ip-name=78.156.65.138; Received: from haswell.alporthouse.com (unverified [78.156.65.138]) by fireflyinternet.com (Firefly Internet (M1)) with ESMTP id 16848363-1500050 for multiple; Mon, 10 Jun 2019 08:21:29 +0100 From: Chris Wilson To: intel-gfx@lists.freedesktop.org Date: Mon, 10 Jun 2019 08:21:08 +0100 Message-Id: <20190610072126.6355-11-chris@chris-wilson.co.uk> X-Mailer: git-send-email 2.20.1 In-Reply-To: <20190610072126.6355-1-chris@chris-wilson.co.uk> References: <20190610072126.6355-1-chris@chris-wilson.co.uk> MIME-Version: 1.0 Subject: [Intel-gfx] [PATCH 10/28] drm/i915/execlists: Preempt-to-busy X-BeenThere: intel-gfx@lists.freedesktop.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: Intel graphics driver community testing & development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: intel-gfx-bounces@lists.freedesktop.org Sender: "Intel-gfx" X-Virus-Scanned: ClamAV using ClamSMTP When using a global seqno, we required a precise stop-the-world event to handle preemption and unwind the global seqno counter. To accomplish this, we would preempt to a special out-of-band context and wait for the machine to report that it was idle. Given an idle machine, we could very precisely see which requests had completed and which we needed to feed back into the run queue. However, now that we have scrapped the global seqno, we no longer need to precisely unwind the global counter and only track requests by their per-context seqno. 
This allows us to loosely unwind inflight requests while scheduling a preemption, with the enormous caveat that the requests we put back on the run queue are still _inflight_ (until the preemption request is complete). This makes request tracking much more messy, as at any point then we can see a completed request that we believe is not currently scheduled for execution. We also have to be careful not to rewind RING_TAIL past RING_HEAD on preempting to the running context, and for this we use a semaphore to prevent completion of the request before continuing. To accomplish this feat, we change how we track requests scheduled to the HW. Instead of appending our requests onto a single list as we submit, we track each submission to ELSP as its own block. Then upon receiving the CS preemption event, we promote the pending block to the inflight block (discarding what was previously being tracked). As normal CS completion events arrive, we then remove stale entries from the inflight tracker. Signed-off-by: Chris Wilson --- drivers/gpu/drm/i915/gem/i915_gem_context.c | 2 +- drivers/gpu/drm/i915/gt/intel_context_types.h | 5 + drivers/gpu/drm/i915/gt/intel_engine.h | 61 +- drivers/gpu/drm/i915/gt/intel_engine_cs.c | 61 +- drivers/gpu/drm/i915/gt/intel_engine_types.h | 52 +- drivers/gpu/drm/i915/gt/intel_lrc.c | 671 ++++++++---------- drivers/gpu/drm/i915/i915_gpu_error.c | 19 +- drivers/gpu/drm/i915/i915_request.c | 6 + drivers/gpu/drm/i915/i915_request.h | 1 + drivers/gpu/drm/i915/i915_scheduler.c | 3 +- drivers/gpu/drm/i915/i915_utils.h | 12 + drivers/gpu/drm/i915/intel_guc_submission.c | 175 ++--- drivers/gpu/drm/i915/selftests/i915_request.c | 8 +- 13 files changed, 465 insertions(+), 611 deletions(-) diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context.c b/drivers/gpu/drm/i915/gem/i915_gem_context.c index 1fdae3a62cef..f09e3abe695a 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_context.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_context.c @@ -645,7 +645,7 @@ static void init_contexts(struct drm_i915_private *i915) static bool needs_preempt_context(struct drm_i915_private *i915) { - return HAS_EXECLISTS(i915); + return USES_GUC_SUBMISSION(i915); } int i915_gem_contexts_init(struct drm_i915_private *dev_priv) diff --git a/drivers/gpu/drm/i915/gt/intel_context_types.h b/drivers/gpu/drm/i915/gt/intel_context_types.h index 08049ee91cee..4c0e211c715d 100644 --- a/drivers/gpu/drm/i915/gt/intel_context_types.h +++ b/drivers/gpu/drm/i915/gt/intel_context_types.h @@ -13,6 +13,7 @@ #include #include "i915_active_types.h" +#include "i915_utils.h" #include "intel_engine_types.h" #include "intel_sseu.h" @@ -38,6 +39,10 @@ struct intel_context { struct i915_gem_context *gem_context; struct intel_engine_cs *engine; struct intel_engine_cs *inflight; +#define intel_context_inflight(ce) ptr_mask_bits((ce)->inflight, 2) +#define intel_context_inflight_count(ce) ptr_unmask_bits((ce)->inflight, 2) +#define intel_context_inflight_inc(ce) ptr_count_inc(&(ce)->inflight) +#define intel_context_inflight_dec(ce) ptr_count_dec(&(ce)->inflight) struct list_head signal_link; struct list_head signals; diff --git a/drivers/gpu/drm/i915/gt/intel_engine.h b/drivers/gpu/drm/i915/gt/intel_engine.h index 6be607e9c084..b798fbdd03b8 100644 --- a/drivers/gpu/drm/i915/gt/intel_engine.h +++ b/drivers/gpu/drm/i915/gt/intel_engine.h @@ -124,71 +124,26 @@ hangcheck_action_to_str(const enum intel_engine_hangcheck_action a) void intel_engines_set_scheduler_caps(struct drm_i915_private *i915); -static inline void -execlists_set_active(struct 
intel_engine_execlists *execlists, - unsigned int bit) -{ - __set_bit(bit, (unsigned long *)&execlists->active); -} - -static inline bool -execlists_set_active_once(struct intel_engine_execlists *execlists, - unsigned int bit) -{ - return !__test_and_set_bit(bit, (unsigned long *)&execlists->active); -} - -static inline void -execlists_clear_active(struct intel_engine_execlists *execlists, - unsigned int bit) -{ - __clear_bit(bit, (unsigned long *)&execlists->active); -} - -static inline void -execlists_clear_all_active(struct intel_engine_execlists *execlists) +static inline unsigned int +execlists_num_ports(const struct intel_engine_execlists * const execlists) { - execlists->active = 0; + return execlists->port_mask + 1; } -static inline bool -execlists_is_active(const struct intel_engine_execlists *execlists, - unsigned int bit) +static inline struct i915_request * +execlists_active(const struct intel_engine_execlists *execlists) { - return test_bit(bit, (unsigned long *)&execlists->active); + GEM_BUG_ON(execlists->active - execlists->inflight > + execlists_num_ports(execlists)); + return READ_ONCE(*execlists->active); } -void execlists_user_begin(struct intel_engine_execlists *execlists, - const struct execlist_port *port); -void execlists_user_end(struct intel_engine_execlists *execlists); - void execlists_cancel_port_requests(struct intel_engine_execlists * const execlists); struct i915_request * execlists_unwind_incomplete_requests(struct intel_engine_execlists *execlists); -static inline unsigned int -execlists_num_ports(const struct intel_engine_execlists * const execlists) -{ - return execlists->port_mask + 1; -} - -static inline struct execlist_port * -execlists_port_complete(struct intel_engine_execlists * const execlists, - struct execlist_port * const port) -{ - const unsigned int m = execlists->port_mask; - - GEM_BUG_ON(port_index(port, execlists) != 0); - GEM_BUG_ON(!execlists_is_active(execlists, EXECLISTS_ACTIVE_USER)); - - memmove(port, port + 1, m * sizeof(struct execlist_port)); - memset(port + m, 0, sizeof(struct execlist_port)); - - return port; -} - static inline u32 intel_read_status_page(const struct intel_engine_cs *engine, int reg) { diff --git a/drivers/gpu/drm/i915/gt/intel_engine_cs.c b/drivers/gpu/drm/i915/gt/intel_engine_cs.c index 0287c3b094a2..19202d0a0d7d 100644 --- a/drivers/gpu/drm/i915/gt/intel_engine_cs.c +++ b/drivers/gpu/drm/i915/gt/intel_engine_cs.c @@ -508,6 +508,10 @@ void intel_engine_init_execlists(struct intel_engine_cs *engine) GEM_BUG_ON(!is_power_of_2(execlists_num_ports(execlists))); GEM_BUG_ON(execlists_num_ports(execlists) > EXECLIST_MAX_PORTS); + memset(execlists->pending, 0, sizeof(execlists->pending)); + execlists->active = + memset(execlists->inflight, 0, sizeof(execlists->inflight)); + execlists->queue_priority_hint = INT_MIN; execlists->queue = RB_ROOT_CACHED; } @@ -1151,7 +1155,7 @@ bool intel_engine_is_idle(struct intel_engine_cs *engine) return true; /* Waiting to drain ELSP? 
*/ - if (READ_ONCE(engine->execlists.active)) { + if (execlists_active(&engine->execlists)) { struct tasklet_struct *t = &engine->execlists.tasklet; synchronize_hardirq(engine->i915->drm.irq); @@ -1168,7 +1172,7 @@ bool intel_engine_is_idle(struct intel_engine_cs *engine) /* Otherwise flush the tasklet if it was on another cpu */ tasklet_unlock_wait(t); - if (READ_ONCE(engine->execlists.active)) + if (execlists_active(&engine->execlists)) return false; } @@ -1365,6 +1369,7 @@ static void intel_engine_print_registers(const struct intel_engine_cs *engine, } if (HAS_EXECLISTS(dev_priv)) { + struct i915_request * const *port, *rq; const u32 *hws = &engine->status_page.addr[I915_HWS_CSB_BUF0_INDEX]; const u8 num_entries = execlists->csb_size; @@ -1397,26 +1402,28 @@ static void intel_engine_print_registers(const struct intel_engine_cs *engine, } rcu_read_lock(); - for (idx = 0; idx < execlists_num_ports(execlists); idx++) { - struct i915_request *rq; - unsigned int count; - - rq = port_unpack(&execlists->port[idx], &count); - if (rq) { - char hdr[80]; - - snprintf(hdr, sizeof(hdr), - "\t\tELSP[%d] count=%d, ring:{start:%08x, hwsp:%08x, seqno:%08x}, rq: ", - idx, count, - i915_ggtt_offset(rq->ring->vma), - rq->timeline->hwsp_offset, - hwsp_seqno(rq)); - print_request(m, rq, hdr); - } else { - drm_printf(m, "\t\tELSP[%d] idle\n", idx); - } + for (port = execlists->active; (rq = *port); port++) { + char hdr[80]; + + snprintf(hdr, sizeof(hdr), + "\t\tActive[%d] ring:{start:%08x, hwsp:%08x, seqno:%08x}, rq: ", + (int)(port - execlists->active), + i915_ggtt_offset(rq->ring->vma), + rq->timeline->hwsp_offset, + hwsp_seqno(rq)); + print_request(m, rq, hdr); + } + for (port = execlists->pending; (rq = *port); port++) { + char hdr[80]; + + snprintf(hdr, sizeof(hdr), + "\t\tPending[%d] ring:{start:%08x, hwsp:%08x, seqno:%08x}, rq: ", + (int)(port - execlists->pending), + i915_ggtt_offset(rq->ring->vma), + rq->timeline->hwsp_offset, + hwsp_seqno(rq)); + print_request(m, rq, hdr); } - drm_printf(m, "\t\tHW active? 0x%x\n", execlists->active); rcu_read_unlock(); } else if (INTEL_GEN(dev_priv) > 6) { drm_printf(m, "\tPP_DIR_BASE: 0x%08x\n", @@ -1580,15 +1587,19 @@ int intel_enable_engine_stats(struct intel_engine_cs *engine) } if (engine->stats.enabled++ == 0) { - const struct execlist_port *port = execlists->port; - unsigned int num_ports = execlists_num_ports(execlists); + struct i915_request * const *port; + struct i915_request *rq; engine->stats.enabled_at = ktime_get(); /* XXX submission method oblivious? */ - while (num_ports-- && port_isset(port)) { + for (port = execlists->active; (rq = *port); port++) engine->stats.active++; - port++; + + for (port = execlists->pending; (rq = *port); port++) { + /* Exclude any contexts already counted in active */ + if (intel_context_inflight_count(rq->hw_context) == 1) + engine->stats.active++; } if (engine->stats.active) diff --git a/drivers/gpu/drm/i915/gt/intel_engine_types.h b/drivers/gpu/drm/i915/gt/intel_engine_types.h index 868b220214f8..ce0ade3f5cad 100644 --- a/drivers/gpu/drm/i915/gt/intel_engine_types.h +++ b/drivers/gpu/drm/i915/gt/intel_engine_types.h @@ -160,51 +160,10 @@ struct intel_engine_execlists { */ u32 __iomem *ctrl_reg; - /** - * @port: execlist port states - * - * For each hardware ELSP (ExecList Submission Port) we keep - * track of the last request and the number of times we submitted - * that port to hw. We then count the number of times the hw reports - * a context completion or preemption. 
As only one context can - * be active on hw, we limit resubmission of context to port[0]. This - * is called Lite Restore, of the context. - */ - struct execlist_port { - /** - * @request_count: combined request and submission count - */ - struct i915_request *request_count; -#define EXECLIST_COUNT_BITS 2 -#define port_request(p) ptr_mask_bits((p)->request_count, EXECLIST_COUNT_BITS) -#define port_count(p) ptr_unmask_bits((p)->request_count, EXECLIST_COUNT_BITS) -#define port_pack(rq, count) ptr_pack_bits(rq, count, EXECLIST_COUNT_BITS) -#define port_unpack(p, count) ptr_unpack_bits((p)->request_count, count, EXECLIST_COUNT_BITS) -#define port_set(p, packed) ((p)->request_count = (packed)) -#define port_isset(p) ((p)->request_count) -#define port_index(p, execlists) ((p) - (execlists)->port) - - /** - * @context_id: context ID for port - */ - GEM_DEBUG_DECL(u32 context_id); - #define EXECLIST_MAX_PORTS 2 - } port[EXECLIST_MAX_PORTS]; - - /** - * @active: is the HW active? We consider the HW as active after - * submitting any context for execution and until we have seen the - * last context completion event. After that, we do not expect any - * more events until we submit, and so can park the HW. - * - * As we have a small number of different sources from which we feed - * the HW, we track the state of each inside a single bitfield. - */ - unsigned int active; -#define EXECLISTS_ACTIVE_USER 0 -#define EXECLISTS_ACTIVE_PREEMPT 1 -#define EXECLISTS_ACTIVE_HWACK 2 + struct i915_request * const *active; + struct i915_request *inflight[EXECLIST_MAX_PORTS + 1 /* sentinel */]; + struct i915_request *pending[EXECLIST_MAX_PORTS + 1]; /** * @port_mask: number of execlist ports - 1 @@ -245,11 +204,6 @@ struct intel_engine_execlists { */ u32 *csb_status; - /** - * @preempt_complete_status: expected CSB upon completing preemption - */ - u32 preempt_complete_status; - /** * @csb_size: context status buffer FIFO size */ diff --git a/drivers/gpu/drm/i915/gt/intel_lrc.c b/drivers/gpu/drm/i915/gt/intel_lrc.c index 6879f065ae27..a57ca97bc17a 100644 --- a/drivers/gpu/drm/i915/gt/intel_lrc.c +++ b/drivers/gpu/drm/i915/gt/intel_lrc.c @@ -161,6 +161,8 @@ #define GEN8_CTX_STATUS_COMPLETED_MASK \ (GEN8_CTX_STATUS_COMPLETE | GEN8_CTX_STATUS_PREEMPTED) +#define CTX_DESC_FORCE_RESTORE BIT_ULL(2) + /* Typical size of the average request (2 pipecontrols and a MI_BB) */ #define EXECLISTS_REQUEST_SIZE 64 /* bytes */ #define WA_TAIL_DWORDS 2 @@ -221,6 +223,14 @@ static void execlists_init_reg_state(u32 *reg_state, struct intel_engine_cs *engine, struct intel_ring *ring); +static inline u32 intel_hws_preempt_address(struct intel_engine_cs *engine) +{ + return (i915_ggtt_offset(engine->status_page.vma) + + I915_GEM_HWS_PREEMPT_ADDR); +} + +#define ring_pause(E) ((E)->status_page.addr[I915_GEM_HWS_PREEMPT]) + static inline struct i915_priolist *to_priolist(struct rb_node *rb) { return rb_entry(rb, struct i915_priolist, node); @@ -271,12 +281,6 @@ static inline bool need_preempt(const struct intel_engine_cs *engine, { int last_prio; - if (!engine->preempt_context) - return false; - - if (i915_request_completed(rq)) - return false; - /* * Check if the current priority hint merits a preemption attempt. * @@ -338,9 +342,6 @@ __maybe_unused static inline bool assert_priority_queue(const struct i915_request *prev, const struct i915_request *next) { - const struct intel_engine_execlists *execlists = - &prev->engine->execlists; - /* * Without preemption, the prev may refer to the still active element * which we refuse to let go. 
@@ -348,7 +349,7 @@ assert_priority_queue(const struct i915_request *prev, * Even with preemption, there are times when we think it is better not * to preempt and leave an ostensibly lower priority request in flight. */ - if (port_request(execlists->port) == prev) + if (i915_request_is_active(prev)) return true; return rq_prio(prev) >= rq_prio(next); @@ -442,13 +443,11 @@ __unwind_incomplete_requests(struct intel_engine_cs *engine) struct intel_engine_cs *owner; if (i915_request_completed(rq)) - break; + continue; /* XXX */ __i915_request_unsubmit(rq); unwind_wa_tail(rq); - GEM_BUG_ON(rq->hw_context->inflight); - /* * Push the request back into the queue for later resubmission. * If this request is not native to this physical engine (i.e. @@ -500,32 +499,32 @@ execlists_context_status_change(struct i915_request *rq, unsigned long status) status, rq); } -inline void -execlists_user_begin(struct intel_engine_execlists *execlists, - const struct execlist_port *port) +static inline struct i915_request * +execlists_schedule_in(struct i915_request *rq, int idx) { - execlists_set_active_once(execlists, EXECLISTS_ACTIVE_USER); -} + struct intel_context *ce = rq->hw_context; + int count; -inline void -execlists_user_end(struct intel_engine_execlists *execlists) -{ - execlists_clear_active(execlists, EXECLISTS_ACTIVE_USER); -} + trace_i915_request_in(rq, idx); -static inline void -execlists_context_schedule_in(struct i915_request *rq) -{ - GEM_BUG_ON(rq->hw_context->inflight); + count = intel_context_inflight_count(ce); + if (!count) { + intel_context_get(ce); + ce->inflight = rq->engine; + + execlists_context_status_change(rq, INTEL_CONTEXT_SCHEDULE_IN); + intel_engine_context_in(ce->inflight); + } + + intel_context_inflight_inc(ce); + GEM_BUG_ON(intel_context_inflight(ce) != rq->engine); - execlists_context_status_change(rq, INTEL_CONTEXT_SCHEDULE_IN); - intel_engine_context_in(rq->engine); - rq->hw_context->inflight = rq->engine; + return i915_request_get(rq); } -static void kick_siblings(struct i915_request *rq) +static void kick_siblings(struct i915_request *rq, struct intel_context *ce) { - struct virtual_engine *ve = to_virtual_engine(rq->hw_context->engine); + struct virtual_engine *ve = container_of(ce, typeof(*ve), context); struct i915_request *next = READ_ONCE(ve->request); if (next && next->execution_mask & ~rq->execution_mask) @@ -533,29 +532,42 @@ static void kick_siblings(struct i915_request *rq) } static inline void -execlists_context_schedule_out(struct i915_request *rq, unsigned long status) +execlists_schedule_out(struct i915_request *rq) { - rq->hw_context->inflight = NULL; - intel_engine_context_out(rq->engine); - execlists_context_status_change(rq, status); + struct intel_context *ce = rq->hw_context; + + GEM_BUG_ON(!intel_context_inflight_count(ce)); + trace_i915_request_out(rq); - /* - * If this is part of a virtual engine, its next request may have - * been blocked waiting for access to the active context. We have - * to kick all the siblings again in case we need to switch (e.g. - * the next request is not runnable on this engine). Hopefully, - * we will already have submitted the next request before the - * tasklet runs and do not need to rebuild each virtual tree - * and kick everyone again. 
- */ - if (rq->engine != rq->hw_context->engine) - kick_siblings(rq); + intel_context_inflight_dec(ce); + if (!intel_context_inflight_count(ce)) { + intel_engine_context_out(ce->inflight); + execlists_context_status_change(rq, INTEL_CONTEXT_SCHEDULE_OUT); + + ce->inflight = NULL; + intel_context_put(ce); + + /* + * If this is part of a virtual engine, its next request may + * have been blocked waiting for access to the active context. + * We have to kick all the siblings again in case we need to + * switch (e.g. the next request is not runnable on this + * engine). Hopefully, we will already have submitted the next + * request before the tasklet runs and do not need to rebuild + * each virtual tree and kick everyone again. + */ + if (rq->engine != ce->engine) + kick_siblings(rq, ce); + } + + i915_request_put(rq); } -static u64 execlists_update_context(struct i915_request *rq) +static u64 execlists_update_context(const struct i915_request *rq) { struct intel_context *ce = rq->hw_context; + u64 desc; ce->lrc_reg_state[CTX_RING_TAIL + 1] = intel_ring_set_tail(rq->ring, rq->tail); @@ -576,7 +588,11 @@ static u64 execlists_update_context(struct i915_request *rq) * wmb). */ mb(); - return ce->lrc_desc; + + desc = ce->lrc_desc; + ce->lrc_desc &= ~CTX_DESC_FORCE_RESTORE; + + return desc; } static inline void write_desc(struct intel_engine_execlists *execlists, u64 desc, u32 port) @@ -590,12 +606,54 @@ static inline void write_desc(struct intel_engine_execlists *execlists, u64 desc } } +static __maybe_unused void +trace_ports(const struct intel_engine_execlists *execlists, + const char *msg, + struct i915_request * const *ports) +{ + const struct intel_engine_cs *engine = + container_of(execlists, typeof(*engine), execlists); + + GEM_TRACE("%s: %s { %llx:%lld%s, %llx:%lld }\n", + engine->name, msg, + ports[0]->fence.context, + ports[0]->fence.seqno, + i915_request_completed(ports[0]) ? "!" : + i915_request_started(ports[0]) ? "*" : + "", + ports[1] ? ports[1]->fence.context : 0, + ports[1] ? ports[1]->fence.seqno : 0); +} + +static __maybe_unused bool +assert_pending_valid(const struct intel_engine_execlists *execlists, + const char *msg) +{ + struct i915_request * const *port, *rq; + struct intel_context *ce = NULL; + + trace_ports(execlists, msg, execlists->pending); + + if (execlists->pending[execlists_num_ports(execlists)]) + return false; + + for (port = execlists->pending; (rq = *port); port++) { + if (ce == rq->hw_context) + return false; + + ce = rq->hw_context; + } + + return ce; +} + static void execlists_submit_ports(struct intel_engine_cs *engine) { struct intel_engine_execlists *execlists = &engine->execlists; - struct execlist_port *port = execlists->port; unsigned int n; + GEM_BUG_ON(!assert_pending_valid(execlists, "submit")); + /* * We can skip acquiring intel_runtime_pm_get() here as it was taken * on our behalf by the request (see i915_gem_mark_busy()) and it will @@ -613,38 +671,16 @@ static void execlists_submit_ports(struct intel_engine_cs *engine) * of elsq entries, keep this in mind before changing the loop below. 
*/ for (n = execlists_num_ports(execlists); n--; ) { - struct i915_request *rq; - unsigned int count; - u64 desc; + struct i915_request *rq = execlists->pending[n]; - rq = port_unpack(&port[n], &count); - if (rq) { - GEM_BUG_ON(count > !n); - if (!count++) - execlists_context_schedule_in(rq); - port_set(&port[n], port_pack(rq, count)); - desc = execlists_update_context(rq); - GEM_DEBUG_EXEC(port[n].context_id = upper_32_bits(desc)); - - GEM_TRACE("%s in[%d]: ctx=%d.%d, fence %llx:%lld (current %d), prio=%d\n", - engine->name, n, - port[n].context_id, count, - rq->fence.context, rq->fence.seqno, - hwsp_seqno(rq), - rq_prio(rq)); - } else { - GEM_BUG_ON(!n); - desc = 0; - } - - write_desc(execlists, desc, n); + write_desc(execlists, + rq ? execlists_update_context(rq) : 0, + n); } /* we need to manually load the submit queue */ if (execlists->ctrl_reg) writel(EL_CTRL_LOAD, execlists->ctrl_reg); - - execlists_clear_active(execlists, EXECLISTS_ACTIVE_HWACK); } static bool ctx_single_port_submission(const struct intel_context *ce) @@ -668,6 +704,7 @@ static bool can_merge_ctx(const struct intel_context *prev, static bool can_merge_rq(const struct i915_request *prev, const struct i915_request *next) { + GEM_BUG_ON(prev == next); GEM_BUG_ON(!assert_priority_queue(prev, next)); if (!can_merge_ctx(prev->hw_context, next->hw_context)) @@ -676,58 +713,6 @@ static bool can_merge_rq(const struct i915_request *prev, return true; } -static void port_assign(struct execlist_port *port, struct i915_request *rq) -{ - GEM_BUG_ON(rq == port_request(port)); - - if (port_isset(port)) - i915_request_put(port_request(port)); - - port_set(port, port_pack(i915_request_get(rq), port_count(port))); -} - -static void inject_preempt_context(struct intel_engine_cs *engine) -{ - struct intel_engine_execlists *execlists = &engine->execlists; - struct intel_context *ce = engine->preempt_context; - unsigned int n; - - GEM_BUG_ON(execlists->preempt_complete_status != - upper_32_bits(ce->lrc_desc)); - - /* - * Switch to our empty preempt context so - * the state of the GPU is known (idle). - */ - GEM_TRACE("%s\n", engine->name); - for (n = execlists_num_ports(execlists); --n; ) - write_desc(execlists, 0, n); - - write_desc(execlists, ce->lrc_desc, n); - - /* we need to manually load the submit queue */ - if (execlists->ctrl_reg) - writel(EL_CTRL_LOAD, execlists->ctrl_reg); - - execlists_clear_active(execlists, EXECLISTS_ACTIVE_HWACK); - execlists_set_active(execlists, EXECLISTS_ACTIVE_PREEMPT); - - (void)I915_SELFTEST_ONLY(execlists->preempt_hang.count++); -} - -static void complete_preempt_context(struct intel_engine_execlists *execlists) -{ - GEM_BUG_ON(!execlists_is_active(execlists, EXECLISTS_ACTIVE_PREEMPT)); - - if (inject_preempt_hang(execlists)) - return; - - execlists_cancel_port_requests(execlists); - __unwind_incomplete_requests(container_of(execlists, - struct intel_engine_cs, - execlists)); -} - static void virtual_update_register_offsets(u32 *regs, struct intel_engine_cs *engine) { @@ -792,7 +777,7 @@ static bool virtual_matches(const struct virtual_engine *ve, * we reuse the register offsets). This is a very small * hystersis on the greedy seelction algorithm. 
*/ - inflight = READ_ONCE(ve->context.inflight); + inflight = intel_context_inflight(&ve->context); if (inflight && inflight != engine) return false; @@ -815,13 +800,23 @@ static void virtual_xfer_breadcrumbs(struct virtual_engine *ve, spin_unlock(&old->breadcrumbs.irq_lock); } +static struct i915_request * +last_active(const struct intel_engine_execlists *execlists) +{ + struct i915_request * const *last = execlists->active; + + while (*last && i915_request_completed(*last)) + last++; + + return *last; +} + static void execlists_dequeue(struct intel_engine_cs *engine) { struct intel_engine_execlists * const execlists = &engine->execlists; - struct execlist_port *port = execlists->port; - const struct execlist_port * const last_port = - &execlists->port[execlists->port_mask]; - struct i915_request *last = port_request(port); + struct i915_request **port = execlists->pending; + struct i915_request ** const last_port = port + execlists->port_mask; + struct i915_request *last; struct rb_node *rb; bool submit = false; @@ -867,65 +862,72 @@ static void execlists_dequeue(struct intel_engine_cs *engine) break; } + /* + * If the queue is higher priority than the last + * request in the currently active context, submit afresh. + * We will resubmit again afterwards in case we need to split + * the active context to interject the preemption request, + * i.e. we will retrigger preemption following the ack in case + * of trouble. + */ + last = last_active(execlists); if (last) { - /* - * Don't resubmit or switch until all outstanding - * preemptions (lite-restore) are seen. Then we - * know the next preemption status we see corresponds - * to this ELSP update. - */ - GEM_BUG_ON(!execlists_is_active(execlists, - EXECLISTS_ACTIVE_USER)); - GEM_BUG_ON(!port_count(&port[0])); - - /* - * If we write to ELSP a second time before the HW has had - * a chance to respond to the previous write, we can confuse - * the HW and hit "undefined behaviour". After writing to ELSP, - * we must then wait until we see a context-switch event from - * the HW to indicate that it has had a chance to respond. - */ - if (!execlists_is_active(execlists, EXECLISTS_ACTIVE_HWACK)) - return; - if (need_preempt(engine, last, rb)) { - inject_preempt_context(engine); - return; - } + GEM_TRACE("%s: preempting last=%llx:%lld, prio=%d, hint=%d\n", + engine->name, + last->fence.context, + last->fence.seqno, + last->sched.attr.priority, + execlists->queue_priority_hint); + /* + * Don't let the RING_HEAD advance past the breadcrumb + * as we unwind (and until we resubmit) so that we do + * not accidentally tell it to go backwards. + */ + ring_pause(engine) = 1; - /* - * In theory, we could coalesce more requests onto - * the second port (the first port is active, with - * no preemptions pending). However, that means we - * then have to deal with the possible lite-restore - * of the second port (as we submit the ELSP, there - * may be a context-switch) but also we may complete - * the resubmission before the context-switch. Ergo, - * coalescing onto the second port will cause a - * preemption event, but we cannot predict whether - * that will affect port[0] or port[1]. - * - * If the second port is already active, we can wait - * until the next context-switch before contemplating - * new requests. The GPU will be busy and we should be - * able to resubmit the new ELSP before it idles, - * avoiding pipeline bubbles (momentary pauses where - * the driver is unable to keep up the supply of new - * work). 
However, we have to double check that the - * priorities of the ports haven't been switch. - */ - if (port_count(&port[1])) - return; + /* + * Note that we have not stopped the GPU at this point, + * so we are unwinding the incomplete requests as they + * remain inflight and so by the time we do complete + * the preemption, some of the unwound requests may + * complete! + */ + __unwind_incomplete_requests(engine); - /* - * WaIdleLiteRestore:bdw,skl - * Apply the wa NOOPs to prevent - * ring:HEAD == rq:TAIL as we resubmit the - * request. See gen8_emit_fini_breadcrumb() for - * where we prepare the padding after the - * end of the request. - */ - last->tail = last->wa_tail; + /* + * If we need to return to the preempted context, we + * need to skip the lite-restore and force it to + * reload the RING_TAIL. Otherwise, the HW has a + * tendency to ignore us rewinding the TAIL to the + * end of an earlier request. + */ + last->hw_context->lrc_desc |= CTX_DESC_FORCE_RESTORE; + last = NULL; + } else { + /* + * Otherwise if we already have a request pending + * for execution after the current one, we can + * just wait until the next CS event before + * queuing more. In either case we will force a + * lite-restore preemption event, but if we wait + * we hopefully coalesce several updates into a single + * submission. + */ + if (!list_is_last(&last->sched.link, + &engine->active.requests)) + return; + + /* + * WaIdleLiteRestore:bdw,skl + * Apply the wa NOOPs to prevent + * ring:HEAD == rq:TAIL as we resubmit the + * request. See gen8_emit_fini_breadcrumb() for + * where we prepare the padding after the + * end of the request. + */ + last->tail = last->wa_tail; + } } while (rb) { /* XXX virtual is always taking precedence */ @@ -955,9 +957,24 @@ static void execlists_dequeue(struct intel_engine_cs *engine) continue; } + if (i915_request_completed(rq)) { + ve->request = NULL; + ve->base.execlists.queue_priority_hint = INT_MIN; + rb_erase_cached(rb, &execlists->virtual); + RB_CLEAR_NODE(rb); + + rq->engine = engine; + __i915_request_submit(rq); + + spin_unlock(&ve->base.active.lock); + + rb = rb_first_cached(&execlists->virtual); + continue; + } + if (last && !can_merge_rq(last, rq)) { spin_unlock(&ve->base.active.lock); - return; /* leave this rq for another engine */ + return; /* leave this for another */ } GEM_TRACE("%s: virtual rq=%llx:%lld%s, new engine? %s\n", @@ -1006,9 +1023,10 @@ static void execlists_dequeue(struct intel_engine_cs *engine) } __i915_request_submit(rq); - trace_i915_request_in(rq, port_index(port, execlists)); - submit = true; - last = rq; + if (!i915_request_completed(rq)) { + submit = true; + last = rq; + } } spin_unlock(&ve->base.active.lock); @@ -1021,6 +1039,9 @@ static void execlists_dequeue(struct intel_engine_cs *engine) int i; priolist_for_each_request_consume(rq, rn, p, i) { + if (i915_request_completed(rq)) + goto skip; + /* * Can we combine this request with the current port? 
* It has to be the same context/ringbuffer and not @@ -1060,19 +1081,14 @@ static void execlists_dequeue(struct intel_engine_cs *engine) ctx_single_port_submission(rq->hw_context)) goto done; - - if (submit) - port_assign(port, last); + *port = execlists_schedule_in(last, port - execlists->pending); port++; - - GEM_BUG_ON(port_isset(port)); } - __i915_request_submit(rq); - trace_i915_request_in(rq, port_index(port, execlists)); - last = rq; submit = true; +skip: + __i915_request_submit(rq); } rb_erase_cached(&p->node, &execlists->queue); @@ -1097,54 +1113,30 @@ static void execlists_dequeue(struct intel_engine_cs *engine) * interrupt for secondary ports). */ execlists->queue_priority_hint = queue_prio(execlists); + GEM_TRACE("%s: queue_priority_hint:%d, submit:%s\n", + engine->name, execlists->queue_priority_hint, + yesno(submit)); if (submit) { - port_assign(port, last); + *port = execlists_schedule_in(last, port - execlists->pending); + memset(port + 1, 0, (last_port - port) * sizeof(*port)); execlists_submit_ports(engine); } - - /* We must always keep the beast fed if we have work piled up */ - GEM_BUG_ON(rb_first_cached(&execlists->queue) && - !port_isset(execlists->port)); - - /* Re-evaluate the executing context setup after each preemptive kick */ - if (last) - execlists_user_begin(execlists, execlists->port); - - /* If the engine is now idle, so should be the flag; and vice versa. */ - GEM_BUG_ON(execlists_is_active(&engine->execlists, - EXECLISTS_ACTIVE_USER) == - !port_isset(engine->execlists.port)); } void execlists_cancel_port_requests(struct intel_engine_execlists * const execlists) { - struct execlist_port *port = execlists->port; - unsigned int num_ports = execlists_num_ports(execlists); - - while (num_ports-- && port_isset(port)) { - struct i915_request *rq = port_request(port); - - GEM_TRACE("%s:port%u fence %llx:%lld, (current %d)\n", - rq->engine->name, - (unsigned int)(port - execlists->port), - rq->fence.context, rq->fence.seqno, - hwsp_seqno(rq)); + struct i915_request * const *port, *rq; - GEM_BUG_ON(!execlists->active); - execlists_context_schedule_out(rq, - i915_request_completed(rq) ? - INTEL_CONTEXT_SCHEDULE_OUT : - INTEL_CONTEXT_SCHEDULE_PREEMPTED); + for (port = execlists->pending; (rq = *port); port++) + execlists_schedule_out(rq); + memset(execlists->pending, 0, sizeof(execlists->pending)); - i915_request_put(rq); - - memset(port, 0, sizeof(*port)); - port++; - } - - execlists_clear_all_active(execlists); + for (port = execlists->active; (rq = *port); port++) + execlists_schedule_out(rq); + execlists->active = + memset(execlists->inflight, 0, sizeof(execlists->inflight)); } static inline void @@ -1163,7 +1155,6 @@ reset_in_progress(const struct intel_engine_execlists *execlists) static void process_csb(struct intel_engine_cs *engine) { struct intel_engine_execlists * const execlists = &engine->execlists; - struct execlist_port *port = execlists->port; const u32 * const buf = execlists->csb_status; const u8 num_entries = execlists->csb_size; u8 head, tail; @@ -1197,9 +1188,7 @@ static void process_csb(struct intel_engine_cs *engine) rmb(); do { - struct i915_request *rq; unsigned int status; - unsigned int count; if (++head == num_entries) head = 0; @@ -1222,68 +1211,37 @@ static void process_csb(struct intel_engine_cs *engine) * status notifier. 
*/ - GEM_TRACE("%s csb[%d]: status=0x%08x:0x%08x, active=0x%x\n", + GEM_TRACE("%s csb[%d]: status=0x%08x:0x%08x\n", engine->name, head, - buf[2 * head + 0], buf[2 * head + 1], - execlists->active); + buf[2 * head + 0], buf[2 * head + 1]); status = buf[2 * head]; - if (status & (GEN8_CTX_STATUS_IDLE_ACTIVE | - GEN8_CTX_STATUS_PREEMPTED)) - execlists_set_active(execlists, - EXECLISTS_ACTIVE_HWACK); - if (status & GEN8_CTX_STATUS_ACTIVE_IDLE) - execlists_clear_active(execlists, - EXECLISTS_ACTIVE_HWACK); - - if (!(status & GEN8_CTX_STATUS_COMPLETED_MASK)) - continue; - - /* We should never get a COMPLETED | IDLE_ACTIVE! */ - GEM_BUG_ON(status & GEN8_CTX_STATUS_IDLE_ACTIVE); + if (status & GEN8_CTX_STATUS_IDLE_ACTIVE) { +promote: + GEM_BUG_ON(!assert_pending_valid(execlists, "promote")); + execlists->active = + memcpy(execlists->inflight, + execlists->pending, + execlists_num_ports(execlists) * + sizeof(*execlists->pending)); + execlists->pending[0] = NULL; - if (status & GEN8_CTX_STATUS_COMPLETE && - buf[2*head + 1] == execlists->preempt_complete_status) { - GEM_TRACE("%s preempt-idle\n", engine->name); - complete_preempt_context(execlists); - continue; - } + if (!inject_preempt_hang(execlists)) + ring_pause(engine) = 0; + } else if (status & GEN8_CTX_STATUS_PREEMPTED) { + struct i915_request * const *port = execlists->active; - if (status & GEN8_CTX_STATUS_PREEMPTED && - execlists_is_active(execlists, - EXECLISTS_ACTIVE_PREEMPT)) - continue; + trace_ports(execlists, "preempted", execlists->active); - GEM_BUG_ON(!execlists_is_active(execlists, - EXECLISTS_ACTIVE_USER)); + while (*port) + execlists_schedule_out(*port++); - rq = port_unpack(port, &count); - GEM_TRACE("%s out[0]: ctx=%d.%d, fence %llx:%lld (current %d), prio=%d\n", - engine->name, - port->context_id, count, - rq ? rq->fence.context : 0, - rq ? rq->fence.seqno : 0, - rq ? hwsp_seqno(rq) : 0, - rq ? rq_prio(rq) : 0); + goto promote; + } else if (*execlists->active) { + struct i915_request *rq = *execlists->active++; - /* Check the context/desc id for this event matches */ - GEM_DEBUG_BUG_ON(buf[2 * head + 1] != port->context_id); - - GEM_BUG_ON(count == 0); - if (--count == 0) { - /* - * On the final event corresponding to the - * submission of this context, we expect either - * an element-switch event or a completion - * event (and on completion, the active-idle - * marker). No more preemptions, lite-restore - * or otherwise. - */ - GEM_BUG_ON(status & GEN8_CTX_STATUS_PREEMPTED); - GEM_BUG_ON(port_isset(&port[1]) && - !(status & GEN8_CTX_STATUS_ELEMENT_SWITCH)); - GEM_BUG_ON(!port_isset(&port[1]) && - !(status & GEN8_CTX_STATUS_ACTIVE_IDLE)); + trace_ports(execlists, "completed", + execlists->active - 1); /* * We rely on the hardware being strongly @@ -1292,21 +1250,10 @@ static void process_csb(struct intel_engine_cs *engine) * user interrupt and CSB is processed. 
*/ GEM_BUG_ON(!i915_request_completed(rq)); + execlists_schedule_out(rq); - execlists_context_schedule_out(rq, - INTEL_CONTEXT_SCHEDULE_OUT); - i915_request_put(rq); - - GEM_TRACE("%s completed ctx=%d\n", - engine->name, port->context_id); - - port = execlists_port_complete(execlists, port); - if (port_isset(port)) - execlists_user_begin(execlists, port); - else - execlists_user_end(execlists); - } else { - port_set(port, port_pack(rq, count)); + GEM_BUG_ON(execlists->active - execlists->inflight > + execlists_num_ports(execlists)); } } while (head != tail); @@ -1331,7 +1278,7 @@ static void __execlists_submission_tasklet(struct intel_engine_cs *const engine) lockdep_assert_held(&engine->active.lock); process_csb(engine); - if (!execlists_is_active(&engine->execlists, EXECLISTS_ACTIVE_PREEMPT)) + if (!engine->execlists.pending[0]) execlists_dequeue(engine); } @@ -1344,11 +1291,6 @@ static void execlists_submission_tasklet(unsigned long data) struct intel_engine_cs * const engine = (struct intel_engine_cs *)data; unsigned long flags; - GEM_TRACE("%s awake?=%d, active=%x\n", - engine->name, - !!intel_wakeref_active(&engine->wakeref), - engine->execlists.active); - spin_lock_irqsave(&engine->active.lock, flags); __execlists_submission_tasklet(engine); spin_unlock_irqrestore(&engine->active.lock, flags); @@ -1375,12 +1317,16 @@ static void __submit_queue_imm(struct intel_engine_cs *engine) tasklet_hi_schedule(&execlists->tasklet); } -static void submit_queue(struct intel_engine_cs *engine, int prio) +static void submit_queue(struct intel_engine_cs *engine, + const struct i915_request *rq) { - if (prio > engine->execlists.queue_priority_hint) { - engine->execlists.queue_priority_hint = prio; - __submit_queue_imm(engine); - } + struct intel_engine_execlists *execlists = &engine->execlists; + + if (rq_prio(rq) <= execlists->queue_priority_hint) + return; + + execlists->queue_priority_hint = rq_prio(rq); + __submit_queue_imm(engine); } static void execlists_submit_request(struct i915_request *request) @@ -1396,7 +1342,7 @@ static void execlists_submit_request(struct i915_request *request) GEM_BUG_ON(RB_EMPTY_ROOT(&engine->execlists.queue.rb_root)); GEM_BUG_ON(list_empty(&request->sched.link)); - submit_queue(engine, rq_prio(request)); + submit_queue(engine, request); spin_unlock_irqrestore(&engine->active.lock, flags); } @@ -2052,27 +1998,13 @@ static void execlists_reset_prepare(struct intel_engine_cs *engine) spin_unlock_irqrestore(&engine->active.lock, flags); } -static bool lrc_regs_ok(const struct i915_request *rq) -{ - const struct intel_ring *ring = rq->ring; - const u32 *regs = rq->hw_context->lrc_reg_state; - - /* Quick spot check for the common signs of context corruption */ - - if (regs[CTX_RING_BUFFER_CONTROL + 1] != - (RING_CTL_SIZE(ring->size) | RING_VALID)) - return false; - - if (regs[CTX_RING_BUFFER_START + 1] != i915_ggtt_offset(ring->vma)) - return false; - - return true; -} - -static void reset_csb_pointers(struct intel_engine_execlists *execlists) +static void reset_csb_pointers(struct intel_engine_cs *engine) { + struct intel_engine_execlists * const execlists = &engine->execlists; const unsigned int reset_value = execlists->csb_size - 1; + ring_pause(engine) = 0; + /* * After a reset, the HW starts writing into CSB entry [0]. 
We * therefore have to set our HEAD pointer back one entry so that @@ -2119,18 +2051,19 @@ static void __execlists_reset(struct intel_engine_cs *engine, bool stalled) process_csb(engine); /* drain preemption events */ /* Following the reset, we need to reload the CSB read/write pointers */ - reset_csb_pointers(&engine->execlists); + reset_csb_pointers(engine); /* * Save the currently executing context, even if we completed * its request, it was still running at the time of the * reset and will have been clobbered. */ - if (!port_isset(execlists->port)) - goto out_clear; + rq = execlists_active(execlists); + if (!rq) + return; - rq = port_request(execlists->port); ce = rq->hw_context; + rq = active_request(rq); /* * Catch up with any missed context-switch interrupts. @@ -2143,9 +2076,12 @@ static void __execlists_reset(struct intel_engine_cs *engine, bool stalled) */ execlists_cancel_port_requests(execlists); - rq = active_request(rq); - if (!rq) + if (!rq) { + ce->ring->head = ce->ring->tail; goto out_replay; + } + + ce->ring->head = intel_ring_wrap(ce->ring, rq->head); /* * If this request hasn't started yet, e.g. it is waiting on a @@ -2159,7 +2095,7 @@ static void __execlists_reset(struct intel_engine_cs *engine, bool stalled) * Otherwise, if we have not started yet, the request should replay * perfectly and we do not need to flag the result as being erroneous. */ - if (!i915_request_started(rq) && lrc_regs_ok(rq)) + if (!i915_request_started(rq)) goto out_replay; /* @@ -2174,7 +2110,7 @@ static void __execlists_reset(struct intel_engine_cs *engine, bool stalled) * image back to the expected values to skip over the guilty request. */ i915_reset_request(rq, stalled); - if (!stalled && lrc_regs_ok(rq)) + if (!stalled) goto out_replay; /* @@ -2194,17 +2130,13 @@ static void __execlists_reset(struct intel_engine_cs *engine, bool stalled) execlists_init_reg_state(regs, ce, engine, ce->ring); out_replay: - /* Rerun the request; its payload has been neutered (if guilty). */ - ce->ring->head = - rq ? intel_ring_wrap(ce->ring, rq->head) : ce->ring->tail; + GEM_TRACE("%s replay {head:%04x, tail:%04x\n", + engine->name, ce->ring->head, ce->ring->tail); intel_ring_update_space(ce->ring); __execlists_update_reg_state(ce, engine); /* Push back any incomplete requests for replay after the reset. 
*/ __unwind_incomplete_requests(engine); - -out_clear: - execlists_clear_all_active(execlists); } static void execlists_reset(struct intel_engine_cs *engine, bool stalled) @@ -2300,7 +2232,6 @@ static void execlists_cancel_requests(struct intel_engine_cs *engine) execlists->queue_priority_hint = INT_MIN; execlists->queue = RB_ROOT_CACHED; - GEM_BUG_ON(port_isset(execlists->port)); GEM_BUG_ON(__tasklet_is_enabled(&execlists->tasklet)); execlists->tasklet.func = nop_submission_tasklet; @@ -2518,15 +2449,29 @@ static u32 *gen8_emit_wa_tail(struct i915_request *request, u32 *cs) return cs; } +static u32 *emit_preempt_busywait(struct i915_request *request, u32 *cs) +{ + *cs++ = MI_SEMAPHORE_WAIT | + MI_SEMAPHORE_GLOBAL_GTT | + MI_SEMAPHORE_POLL | + MI_SEMAPHORE_SAD_EQ_SDD; + *cs++ = 0; + *cs++ = intel_hws_preempt_address(request->engine); + *cs++ = 0; + + return cs; +} + static u32 *gen8_emit_fini_breadcrumb(struct i915_request *request, u32 *cs) { cs = gen8_emit_ggtt_write(cs, request->fence.seqno, request->timeline->hwsp_offset, 0); - *cs++ = MI_USER_INTERRUPT; + *cs++ = MI_ARB_ON_OFF | MI_ARB_ENABLE; + cs = emit_preempt_busywait(request, cs); request->tail = intel_ring_offset(request, cs); assert_ring_tail_valid(request->ring, request->tail); @@ -2547,9 +2492,10 @@ static u32 *gen8_emit_fini_breadcrumb_rcs(struct i915_request *request, u32 *cs) PIPE_CONTROL_FLUSH_ENABLE | PIPE_CONTROL_CS_STALL, 0); - *cs++ = MI_USER_INTERRUPT; + *cs++ = MI_ARB_ON_OFF | MI_ARB_ENABLE; + cs = emit_preempt_busywait(request, cs); request->tail = intel_ring_offset(request, cs); assert_ring_tail_valid(request->ring, request->tail); @@ -2598,8 +2544,7 @@ void intel_execlists_set_default_submission(struct intel_engine_cs *engine) engine->flags |= I915_ENGINE_SUPPORTS_STATS; if (!intel_vgpu_active(engine->i915)) engine->flags |= I915_ENGINE_HAS_SEMAPHORES; - if (engine->preempt_context && - HAS_LOGICAL_RING_PREEMPTION(engine->i915)) + if (HAS_LOGICAL_RING_PREEMPTION(engine->i915)) engine->flags |= I915_ENGINE_HAS_PREEMPTION; } @@ -2722,11 +2667,6 @@ int intel_execlists_submission_init(struct intel_engine_cs *engine) i915_mmio_reg_offset(RING_ELSP(base)); } - execlists->preempt_complete_status = ~0u; - if (engine->preempt_context) - execlists->preempt_complete_status = - upper_32_bits(engine->preempt_context->lrc_desc); - execlists->csb_status = &engine->status_page.addr[I915_HWS_CSB_BUF0_INDEX]; @@ -2738,7 +2678,7 @@ int intel_execlists_submission_init(struct intel_engine_cs *engine) else execlists->csb_size = GEN11_CSB_ENTRIES; - reset_csb_pointers(execlists); + reset_csb_pointers(engine); return 0; } @@ -2921,11 +2861,6 @@ populate_lr_context(struct intel_context *ce, if (!engine->default_state) regs[CTX_CONTEXT_CONTROL + 1] |= _MASKED_BIT_ENABLE(CTX_CTRL_ENGINE_CTX_RESTORE_INHIBIT); - if (ce->gem_context == engine->i915->preempt_context && - INTEL_GEN(engine->i915) < 11) - regs[CTX_CONTEXT_CONTROL + 1] |= - _MASKED_BIT_ENABLE(CTX_CTRL_ENGINE_CTX_RESTORE_INHIBIT | - CTX_CTRL_ENGINE_CTX_SAVE_INHIBIT); ret = 0; err_unpin_ctx: diff --git a/drivers/gpu/drm/i915/i915_gpu_error.c b/drivers/gpu/drm/i915/i915_gpu_error.c index 4a8e7ee19916..4139b9762b44 100644 --- a/drivers/gpu/drm/i915/i915_gpu_error.c +++ b/drivers/gpu/drm/i915/i915_gpu_error.c @@ -1241,10 +1241,10 @@ static void error_record_engine_registers(struct i915_gpu_state *error, } } -static void record_request(struct i915_request *request, +static void record_request(const struct i915_request *request, struct drm_i915_error_request *erq) { - struct 
i915_gem_context *ctx = request->gem_context; + const struct i915_gem_context *ctx = request->gem_context; erq->flags = request->fence.flags; erq->context = request->fence.context; @@ -1308,20 +1308,15 @@ static void engine_record_requests(struct intel_engine_cs *engine, ee->num_requests = count; } -static void error_record_engine_execlists(struct intel_engine_cs *engine, +static void error_record_engine_execlists(const struct intel_engine_cs *engine, struct drm_i915_error_engine *ee) { const struct intel_engine_execlists * const execlists = &engine->execlists; - unsigned int n; + struct i915_request * const *port = execlists->active; + unsigned int n = 0; - for (n = 0; n < execlists_num_ports(execlists); n++) { - struct i915_request *rq = port_request(&execlists->port[n]); - - if (!rq) - break; - - record_request(rq, &ee->execlist[n]); - } + while (*port) + record_request(*port++, &ee->execlist[n++]); ee->num_ports = n; } diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c index 26bd45f11dc5..c71edd6ea873 100644 --- a/drivers/gpu/drm/i915/i915_request.c +++ b/drivers/gpu/drm/i915/i915_request.c @@ -276,6 +276,12 @@ static bool i915_request_retire(struct i915_request *rq) local_irq_disable(); + /* + * We only loosely track inflight requests across preemption, + * and so we may find ourselves attempting to retire a _completed_ + * request that we have removed from the HW and put back on a run + * queue. + */ spin_lock(&rq->engine->active.lock); list_del(&rq->sched.link); spin_unlock(&rq->engine->active.lock); diff --git a/drivers/gpu/drm/i915/i915_request.h b/drivers/gpu/drm/i915/i915_request.h index edbbdfec24ab..bebc1e9b4a5e 100644 --- a/drivers/gpu/drm/i915/i915_request.h +++ b/drivers/gpu/drm/i915/i915_request.h @@ -28,6 +28,7 @@ #include #include +#include "gt/intel_context_types.h" #include "gt/intel_engine_types.h" #include "i915_gem.h" diff --git a/drivers/gpu/drm/i915/i915_scheduler.c b/drivers/gpu/drm/i915/i915_scheduler.c index 2e9b38bdc33c..b1ba3e65cd52 100644 --- a/drivers/gpu/drm/i915/i915_scheduler.c +++ b/drivers/gpu/drm/i915/i915_scheduler.c @@ -179,8 +179,7 @@ static inline int rq_prio(const struct i915_request *rq) static void kick_submission(struct intel_engine_cs *engine, int prio) { - const struct i915_request *inflight = - port_request(engine->execlists.port); + const struct i915_request *inflight = *engine->execlists.active; /* * If we are already the currently executing context, don't diff --git a/drivers/gpu/drm/i915/i915_utils.h b/drivers/gpu/drm/i915/i915_utils.h index 2987219a6300..4920ff9aba62 100644 --- a/drivers/gpu/drm/i915/i915_utils.h +++ b/drivers/gpu/drm/i915/i915_utils.h @@ -131,6 +131,18 @@ __check_struct_size(size_t base, size_t arr, size_t count, size_t *size) ((typeof(ptr))((unsigned long)(ptr) | __bits)); \ }) +#define ptr_count_dec(p_ptr) do { \ + typeof(p_ptr) __p = (p_ptr); \ + unsigned long __v = (unsigned long)(*__p); \ + *__p = (typeof(*p_ptr))(--__v); \ +} while (0) + +#define ptr_count_inc(p_ptr) do { \ + typeof(p_ptr) __p = (p_ptr); \ + unsigned long __v = (unsigned long)(*__p); \ + *__p = (typeof(*p_ptr))(++__v); \ +} while (0) + #define page_mask_bits(ptr) ptr_mask_bits(ptr, PAGE_SHIFT) #define page_unmask_bits(ptr) ptr_unmask_bits(ptr, PAGE_SHIFT) #define page_pack_bits(ptr, bits) ptr_pack_bits(ptr, bits, PAGE_SHIFT) diff --git a/drivers/gpu/drm/i915/intel_guc_submission.c b/drivers/gpu/drm/i915/intel_guc_submission.c index 928121f06054..6b6413d88b0a 100644 --- 
a/drivers/gpu/drm/i915/intel_guc_submission.c +++ b/drivers/gpu/drm/i915/intel_guc_submission.c @@ -32,7 +32,11 @@ #include "intel_guc_submission.h" #include "i915_drv.h" -#define GUC_PREEMPT_FINISHED 0x1 +enum { + GUC_PREEMPT_NONE = 0, + GUC_PREEMPT_INPROGRESS, + GUC_PREEMPT_FINISHED, +}; #define GUC_PREEMPT_BREADCRUMB_DWORDS 0x8 #define GUC_PREEMPT_BREADCRUMB_BYTES \ (sizeof(u32) * GUC_PREEMPT_BREADCRUMB_DWORDS) @@ -537,15 +541,11 @@ static void guc_add_request(struct intel_guc *guc, struct i915_request *rq) u32 ctx_desc = lower_32_bits(rq->hw_context->lrc_desc); u32 ring_tail = intel_ring_set_tail(rq->ring, rq->tail) / sizeof(u64); - spin_lock(&client->wq_lock); - guc_wq_item_append(client, engine->guc_id, ctx_desc, ring_tail, rq->fence.seqno); guc_ring_doorbell(client); client->submissions[engine->id] += 1; - - spin_unlock(&client->wq_lock); } /* @@ -631,8 +631,9 @@ static void inject_preempt_context(struct work_struct *work) data[6] = intel_guc_ggtt_offset(guc, guc->shared_data); if (WARN_ON(intel_guc_send(guc, data, ARRAY_SIZE(data)))) { - execlists_clear_active(&engine->execlists, - EXECLISTS_ACTIVE_PREEMPT); + intel_write_status_page(engine, + I915_GEM_HWS_PREEMPT, + GUC_PREEMPT_NONE); tasklet_schedule(&engine->execlists.tasklet); } @@ -672,8 +673,6 @@ static void complete_preempt_context(struct intel_engine_cs *engine) { struct intel_engine_execlists *execlists = &engine->execlists; - GEM_BUG_ON(!execlists_is_active(execlists, EXECLISTS_ACTIVE_PREEMPT)); - if (inject_preempt_hang(execlists)) return; @@ -681,89 +680,90 @@ static void complete_preempt_context(struct intel_engine_cs *engine) execlists_unwind_incomplete_requests(execlists); wait_for_guc_preempt_report(engine); - intel_write_status_page(engine, I915_GEM_HWS_PREEMPT, 0); + intel_write_status_page(engine, I915_GEM_HWS_PREEMPT, GUC_PREEMPT_NONE); } -/** - * guc_submit() - Submit commands through GuC - * @engine: engine associated with the commands - * - * The only error here arises if the doorbell hardware isn't functioning - * as expected, which really shouln't happen. 
- */ -static void guc_submit(struct intel_engine_cs *engine) +static void guc_submit(struct intel_engine_cs *engine, + struct i915_request **out, + struct i915_request **end) { struct intel_guc *guc = &engine->i915->guc; - struct intel_engine_execlists * const execlists = &engine->execlists; - struct execlist_port *port = execlists->port; - unsigned int n; + struct intel_guc_client *client = guc->execbuf_client; - for (n = 0; n < execlists_num_ports(execlists); n++) { - struct i915_request *rq; - unsigned int count; + spin_lock(&client->wq_lock); - rq = port_unpack(&port[n], &count); - if (rq && count == 0) { - port_set(&port[n], port_pack(rq, ++count)); + do { + struct i915_request *rq = *out++; - flush_ggtt_writes(rq->ring->vma); + flush_ggtt_writes(rq->ring->vma); + guc_add_request(guc, rq); + } while (out != end); - guc_add_request(guc, rq); - } - } + spin_unlock(&client->wq_lock); } -static void port_assign(struct execlist_port *port, struct i915_request *rq) +static inline int rq_prio(const struct i915_request *rq) { - GEM_BUG_ON(port_isset(port)); - - port_set(port, i915_request_get(rq)); + return rq->sched.attr.priority | __NO_PREEMPTION; } -static inline int rq_prio(const struct i915_request *rq) +static struct i915_request *schedule_in(struct i915_request *rq, int idx) { - return rq->sched.attr.priority; + trace_i915_request_in(rq, idx); + + if (!rq->hw_context->inflight) + rq->hw_context->inflight = rq->engine; + intel_context_inflight_inc(rq->hw_context); + + return i915_request_get(rq); } -static inline int port_prio(const struct execlist_port *port) +static void schedule_out(struct i915_request *rq) { - return rq_prio(port_request(port)) | __NO_PREEMPTION; + trace_i915_request_out(rq); + + intel_context_inflight_dec(rq->hw_context); + if (!intel_context_inflight_count(rq->hw_context)) + rq->hw_context->inflight = NULL; + + i915_request_put(rq); } -static bool __guc_dequeue(struct intel_engine_cs *engine) +static void __guc_dequeue(struct intel_engine_cs *engine) { struct intel_engine_execlists * const execlists = &engine->execlists; - struct execlist_port *port = execlists->port; - struct i915_request *last = NULL; - const struct execlist_port * const last_port = - &execlists->port[execlists->port_mask]; + struct i915_request **first = execlists->inflight; + struct i915_request ** const last_port = first + execlists->port_mask; + struct i915_request *last = first[0]; + struct i915_request **port; bool submit = false; struct rb_node *rb; lockdep_assert_held(&engine->active.lock); - if (port_isset(port)) { + if (last) { if (intel_engine_has_preemption(engine)) { struct guc_preempt_work *preempt_work = &engine->i915->guc.preempt_work[engine->id]; int prio = execlists->queue_priority_hint; - if (i915_scheduler_need_preempt(prio, - port_prio(port))) { - execlists_set_active(execlists, - EXECLISTS_ACTIVE_PREEMPT); + if (i915_scheduler_need_preempt(prio, rq_prio(last))) { + intel_write_status_page(engine, + I915_GEM_HWS_PREEMPT, + GUC_PREEMPT_INPROGRESS); queue_work(engine->i915->guc.preempt_wq, &preempt_work->work); - return false; + return; } } - port++; - if (port_isset(port)) - return false; + if (*++first) + return; + + last = NULL; } - GEM_BUG_ON(port_isset(port)); + port = first; while ((rb = rb_first_cached(&execlists->queue))) { struct i915_priolist *p = to_priolist(rb); struct i915_request *rq, *rn; @@ -774,18 +774,15 @@ static bool __guc_dequeue(struct intel_engine_cs *engine) if (port == last_port) goto done; - if (submit) - port_assign(port, last); + *port = 
schedule_in(last, + port - execlists->inflight); port++; } list_del_init(&rq->sched.link); - __i915_request_submit(rq); - trace_i915_request_in(rq, port_index(port, execlists)); - - last = rq; submit = true; + last = rq; } rb_erase_cached(&p->node, &execlists->queue); @@ -794,58 +791,41 @@ static bool __guc_dequeue(struct intel_engine_cs *engine) done: execlists->queue_priority_hint = rb ? to_priolist(rb)->priority : INT_MIN; - if (submit) - port_assign(port, last); - if (last) - execlists_user_begin(execlists, execlists->port); - - /* We must always keep the beast fed if we have work piled up */ - GEM_BUG_ON(port_isset(execlists->port) && - !execlists_is_active(execlists, EXECLISTS_ACTIVE_USER)); - GEM_BUG_ON(rb_first_cached(&execlists->queue) && - !port_isset(execlists->port)); - - return submit; -} - -static void guc_dequeue(struct intel_engine_cs *engine) -{ - if (__guc_dequeue(engine)) - guc_submit(engine); + if (submit) { + *port = schedule_in(last, port - execlists->inflight); + *++port = NULL; + guc_submit(engine, first, port); + } + execlists->active = execlists->inflight; } static void guc_submission_tasklet(unsigned long data) { struct intel_engine_cs * const engine = (struct intel_engine_cs *)data; struct intel_engine_execlists * const execlists = &engine->execlists; - struct execlist_port *port = execlists->port; - struct i915_request *rq; + struct i915_request **port, *rq; unsigned long flags; spin_lock_irqsave(&engine->active.lock, flags); - rq = port_request(port); - while (rq && i915_request_completed(rq)) { - trace_i915_request_out(rq); - i915_request_put(rq); + for (port = execlists->inflight; (rq = *port); port++) { + if (!i915_request_completed(rq)) + break; - port = execlists_port_complete(execlists, port); - if (port_isset(port)) { - execlists_user_begin(execlists, port); - rq = port_request(port); - } else { - execlists_user_end(execlists); - rq = NULL; - } + schedule_out(rq); + } + if (port != execlists->inflight) { + int idx = port - execlists->inflight; + int rem = ARRAY_SIZE(execlists->inflight) - idx; + memmove(execlists->inflight, port, rem * sizeof(*port)); } - if (execlists_is_active(execlists, EXECLISTS_ACTIVE_PREEMPT) && - intel_read_status_page(engine, I915_GEM_HWS_PREEMPT) == + if (intel_read_status_page(engine, I915_GEM_HWS_PREEMPT) == GUC_PREEMPT_FINISHED) complete_preempt_context(engine); - if (!execlists_is_active(execlists, EXECLISTS_ACTIVE_PREEMPT)) - guc_dequeue(engine); + if (!intel_read_status_page(engine, I915_GEM_HWS_PREEMPT)) + __guc_dequeue(engine); spin_unlock_irqrestore(&engine->active.lock, flags); } @@ -959,7 +939,6 @@ static void guc_cancel_requests(struct intel_engine_cs *engine) execlists->queue_priority_hint = INT_MIN; execlists->queue = RB_ROOT_CACHED; - GEM_BUG_ON(port_isset(execlists->port)); spin_unlock_irqrestore(&engine->active.lock, flags); } @@ -1422,7 +1401,7 @@ int intel_guc_submission_enable(struct intel_guc *guc) * and it is guaranteed that it will remove the work item from the * queue before our request is completed. 
*/ - BUILD_BUG_ON(ARRAY_SIZE(engine->execlists.port) * + BUILD_BUG_ON(ARRAY_SIZE(engine->execlists.inflight) * sizeof(struct guc_wq_item) * I915_NUM_ENGINES > GUC_WQ_SIZE); diff --git a/drivers/gpu/drm/i915/selftests/i915_request.c b/drivers/gpu/drm/i915/selftests/i915_request.c index dfaa5bc52ecc..36ff8421c1a0 100644 --- a/drivers/gpu/drm/i915/selftests/i915_request.c +++ b/drivers/gpu/drm/i915/selftests/i915_request.c @@ -366,13 +366,15 @@ static int __igt_breadcrumbs_smoketest(void *arg) if (!wait_event_timeout(wait->wait, i915_sw_fence_done(wait), - HZ / 2)) { + 5 * HZ)) { struct i915_request *rq = requests[count - 1]; - pr_err("waiting for %d fences (last %llx:%lld) on %s timed out!\n", - count, + pr_err("waiting for %d/%d fences (last %llx:%lld) on %s timed out!\n", + atomic_read(&wait->pending), count, rq->fence.context, rq->fence.seqno, t->engine->name); + GEM_TRACE_DUMP(); + i915_gem_set_wedged(t->engine->i915); GEM_BUG_ON(!i915_request_completed(rq)); i915_sw_fence_wait(wait); From patchwork Mon Jun 10 07:21:09 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Chris Wilson X-Patchwork-Id: 10984175 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 6E3371932 for ; Mon, 10 Jun 2019 07:21:50 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 572DE286C2 for ; Mon, 10 Jun 2019 07:21:50 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 4B71428832; Mon, 10 Jun 2019 07:21:50 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-5.2 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_MED autolearn=ham version=3.3.1 Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177]) (using TLSv1.2 with cipher DHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.wl.linuxfoundation.org (Postfix) with ESMTPS id 71A13286C2 for ; Mon, 10 Jun 2019 07:21:49 +0000 (UTC) Received: from gabe.freedesktop.org (localhost [127.0.0.1]) by gabe.freedesktop.org (Postfix) with ESMTP id EB50989113; Mon, 10 Jun 2019 07:21:47 +0000 (UTC) X-Original-To: intel-gfx@lists.freedesktop.org Delivered-To: intel-gfx@lists.freedesktop.org Received: from fireflyinternet.com (mail.fireflyinternet.com [109.228.58.192]) by gabe.freedesktop.org (Postfix) with ESMTPS id 15DC289119 for ; Mon, 10 Jun 2019 07:21:45 +0000 (UTC) X-Default-Received-SPF: pass (skip=forwardok (res=PASS)) x-ip-name=78.156.65.138; Received: from haswell.alporthouse.com (unverified [78.156.65.138]) by fireflyinternet.com (Firefly Internet (M1)) with ESMTP id 16848364-1500050 for multiple; Mon, 10 Jun 2019 08:21:29 +0100 From: Chris Wilson To: intel-gfx@lists.freedesktop.org Date: Mon, 10 Jun 2019 08:21:09 +0100 Message-Id: <20190610072126.6355-12-chris@chris-wilson.co.uk> X-Mailer: git-send-email 2.20.1 In-Reply-To: <20190610072126.6355-1-chris@chris-wilson.co.uk> References: <20190610072126.6355-1-chris@chris-wilson.co.uk> MIME-Version: 1.0 Subject: [Intel-gfx] [PATCH 11/28] drm/i915/execlists: Minimalistic timeslicing X-BeenThere: intel-gfx@lists.freedesktop.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: Intel graphics driver community testing & development List-Unsubscribe: , List-Archive: List-Post: 
List-Help: List-Subscribe: , Errors-To: intel-gfx-bounces@lists.freedesktop.org Sender: "Intel-gfx" X-Virus-Scanned: ClamAV using ClamSMTP If we have multiple contexts of equal priority pending execution, activate a timer to demote the currently executing context in favour of the next in the queue when that timeslice expires. This enforces fairness between contexts (so long as they allow preemption -- forced preemption, in the future, will kick those who do not obey) and allows us to avoid userspace blocking forward progress with e.g. unbounded MI_SEMAPHORE_WAIT. For the starting point here, we use the jiffie as our timeslice so that we should be reasonably efficient wrt frequent CPU wakeups. Testcase: igt/gem_exec_scheduler/semaphore-resolve Signed-off-by: Chris Wilson --- drivers/gpu/drm/i915/gt/intel_engine_types.h | 6 + drivers/gpu/drm/i915/gt/intel_lrc.c | 111 +++++++++ drivers/gpu/drm/i915/gt/selftest_lrc.c | 223 +++++++++++++++++++ drivers/gpu/drm/i915/i915_scheduler.c | 1 + drivers/gpu/drm/i915/i915_scheduler_types.h | 1 + 5 files changed, 342 insertions(+) diff --git a/drivers/gpu/drm/i915/gt/intel_engine_types.h b/drivers/gpu/drm/i915/gt/intel_engine_types.h index ce0ade3f5cad..1cbe10a0fec7 100644 --- a/drivers/gpu/drm/i915/gt/intel_engine_types.h +++ b/drivers/gpu/drm/i915/gt/intel_engine_types.h @@ -12,6 +12,7 @@ #include #include #include +#include #include #include "i915_gem.h" @@ -137,6 +138,11 @@ struct intel_engine_execlists { */ struct tasklet_struct tasklet; + /** + * @timer: kick the current context if its timeslice expires + */ + struct timer_list timer; + /** * @default_priolist: priority list for I915_PRIORITY_NORMAL */ diff --git a/drivers/gpu/drm/i915/gt/intel_lrc.c b/drivers/gpu/drm/i915/gt/intel_lrc.c index a57ca97bc17a..83b5738815aa 100644 --- a/drivers/gpu/drm/i915/gt/intel_lrc.c +++ b/drivers/gpu/drm/i915/gt/intel_lrc.c @@ -255,6 +255,7 @@ static int effective_prio(const struct i915_request *rq) prio |= I915_PRIORITY_NOSEMAPHORE; /* Restrict mere WAIT boosts from triggering preemption */ + BUILD_BUG_ON(__NO_PREEMPTION & ~I915_PRIORITY_MASK); /* only internal */ return prio | __NO_PREEMPTION; } @@ -811,6 +812,81 @@ last_active(const struct intel_engine_execlists *execlists) return *last; } +static void +defer_request(struct i915_request * const rq, struct list_head * const pl) +{ + struct i915_dependency *p; + + /* + * We want to move the interrupted request to the back of + * the round-robin list (i.e. its priority level), but + * in doing so, we must then move all requests that were in + * flight and were waiting for the interrupted request to + * be run after it again. + */ + list_move_tail(&rq->sched.link, pl); + + list_for_each_entry(p, &rq->sched.waiters_list, wait_link) { + struct i915_request *w = + container_of(p->waiter, typeof(*w), sched); + + /* Leave semaphores spinning on the other engines */ + if (w->engine != rq->engine) + continue; + + /* No waiter should start before the active request completed */ + GEM_BUG_ON(i915_request_started(w)); + + GEM_BUG_ON(rq_prio(w) > rq_prio(rq)); + if (rq_prio(w) < rq_prio(rq)) + continue; + + if (list_empty(&w->sched.link)) + continue; /* Not yet submitted; unready */ + + /* + * This should be very shallow as it is limited by the + * number of requests that can fit in a ring (<64) and + * the number of contexts that can be in flight on this + * engine. 
+ */ + defer_request(w, pl); + } +} + +static void defer_active(struct intel_engine_cs *engine) +{ + struct i915_request *rq; + + rq = __unwind_incomplete_requests(engine); + if (!rq) + return; + + defer_request(rq, i915_sched_lookup_priolist(engine, rq_prio(rq))); +} + +static bool +need_timeslice(struct intel_engine_cs *engine, const struct i915_request *rq) +{ + int hint; + + if (list_is_last(&rq->sched.link, &engine->active.requests)) + return false; + + hint = max(rq_prio(list_next_entry(rq, sched.link)), + engine->execlists.queue_priority_hint); + + return hint >= rq_prio(rq); +} + +static bool +enable_timeslice(struct intel_engine_cs *engine) +{ + struct i915_request *last = last_active(&engine->execlists); + + return last && need_timeslice(engine, last); +} + static void execlists_dequeue(struct intel_engine_cs *engine) { struct intel_engine_execlists * const execlists = &engine->execlists; @@ -904,6 +980,27 @@ static void execlists_dequeue(struct intel_engine_cs *engine) */ last->hw_context->lrc_desc |= CTX_DESC_FORCE_RESTORE; last = NULL; + } else if (need_timeslice(engine, last) && + !timer_pending(&engine->execlists.timer)) { + GEM_TRACE("%s: expired last=%llx:%lld, prio=%d, hint=%d\n", + engine->name, + last->fence.context, + last->fence.seqno, + last->sched.attr.priority, + execlists->queue_priority_hint); + + ring_pause(engine) = 1; + defer_active(engine); + + /* + * Unlike for preemption, if we rewind and continue + * executing the same context as previously active, + * the order of execution will remain the same and + * the tail will only advance. We do not need to + * force a full context restore, as a lite-restore + * is sufficient to resample the monotonic TAIL. + */ + last = NULL; } else { /* * Otherwise if we already have a request pending @@ -1226,6 +1323,9 @@ static void process_csb(struct intel_engine_cs *engine) sizeof(*execlists->pending)); execlists->pending[0] = NULL; + if (enable_timeslice(engine)) + mod_timer(&execlists->timer, jiffies + 1); + if (!inject_preempt_hang(execlists)) ring_pause(engine) = 0; } else if (status & GEN8_CTX_STATUS_PREEMPTED) { @@ -1296,6 +1396,15 @@ static void execlists_submission_tasklet(unsigned long data) spin_unlock_irqrestore(&engine->active.lock, flags); } +static void execlists_submission_timer(struct timer_list *timer) +{ + struct intel_engine_cs *engine = + from_timer(engine, timer, execlists.timer); + + /* Kick the tasklet for some interrupt coalescing and reset handling */ + tasklet_hi_schedule(&engine->execlists.tasklet); +} + static void queue_request(struct intel_engine_cs *engine, struct i915_sched_node *node, int prio) @@ -2524,6 +2633,7 @@ static int gen8_init_rcs_context(struct i915_request *rq) static void execlists_park(struct intel_engine_cs *engine) { + del_timer_sync(&engine->execlists.timer); intel_engine_park(engine); } @@ -2621,6 +2731,7 @@ int intel_execlists_submission_setup(struct intel_engine_cs *engine) tasklet_init(&engine->execlists.tasklet, execlists_submission_tasklet, (unsigned long)engine); + timer_setup(&engine->execlists.timer, execlists_submission_timer, 0); logical_ring_default_vfuncs(engine); logical_ring_default_irqs(engine); diff --git a/drivers/gpu/drm/i915/gt/selftest_lrc.c b/drivers/gpu/drm/i915/gt/selftest_lrc.c index 06593254b7d6..338111d690ac 100644 --- a/drivers/gpu/drm/i915/gt/selftest_lrc.c +++ b/drivers/gpu/drm/i915/gt/selftest_lrc.c @@ -79,6 +79,225 @@ static int live_sanitycheck(void *arg) return err; } +static int +emit_semaphore_chain(struct i915_request *rq, struct i915_vma 
*vma, int idx) +{ + u32 *cs; + + cs = intel_ring_begin(rq, 10); + if (IS_ERR(cs)) + return PTR_ERR(cs); + + *cs++ = MI_ARB_ON_OFF | MI_ARB_ENABLE; + + *cs++ = MI_SEMAPHORE_WAIT | + MI_SEMAPHORE_GLOBAL_GTT | + MI_SEMAPHORE_POLL | + MI_SEMAPHORE_SAD_NEQ_SDD; + *cs++ = 0; + *cs++ = i915_ggtt_offset(vma) + 4 * idx; + *cs++ = 0; + + if (idx > 0) { + *cs++ = MI_STORE_DWORD_IMM_GEN4 | MI_USE_GGTT; + *cs++ = i915_ggtt_offset(vma) + 4 * (idx - 1); + *cs++ = 0; + *cs++ = 1; + } else { + *cs++ = MI_NOOP; + *cs++ = MI_NOOP; + *cs++ = MI_NOOP; + *cs++ = MI_NOOP; + } + + *cs++ = MI_ARB_ON_OFF | MI_ARB_DISABLE; + + intel_ring_advance(rq, cs); + return 0; +} + +static struct i915_request * +semaphore_queue(struct intel_engine_cs *engine, struct i915_vma *vma, int idx) +{ + struct i915_gem_context *ctx; + struct i915_request *rq; + int err; + + ctx = kernel_context(engine->i915); + if (!ctx) + return ERR_PTR(-ENOMEM); + + rq = igt_request_alloc(ctx, engine); + if (IS_ERR(rq)) + goto out_ctx; + + err = emit_semaphore_chain(rq, vma, idx); + i915_request_add(rq); + if (err) + rq = ERR_PTR(err); + +out_ctx: + kernel_context_close(ctx); + return rq; +} + +static int +release_queue(struct intel_engine_cs *engine, + struct i915_vma *vma, + int idx) +{ + struct i915_sched_attr attr = { + .priority = I915_USER_PRIORITY(I915_PRIORITY_MAX), + }; + struct i915_request *rq; + u32 *cs; + + rq = i915_request_create(engine->kernel_context); + if (IS_ERR(rq)) + return PTR_ERR(rq); + + cs = intel_ring_begin(rq, 4); + if (IS_ERR(cs)) { + i915_request_add(rq); + return PTR_ERR(cs); + } + + *cs++ = MI_STORE_DWORD_IMM_GEN4 | MI_USE_GGTT; + *cs++ = i915_ggtt_offset(vma) + 4 * (idx - 1); + *cs++ = 0; + *cs++ = 1; + + intel_ring_advance(rq, cs); + i915_request_add(rq); + + engine->schedule(rq, &attr); + + return 0; +} + +static int +slice_semaphore_queue(struct intel_engine_cs *outer, + struct i915_vma *vma, + int count) +{ + struct intel_engine_cs *engine; + struct i915_request *head; + enum intel_engine_id id; + int err, i, n = 0; + + head = semaphore_queue(outer, vma, n++); + if (IS_ERR(head)) + return PTR_ERR(head); + + i915_request_get(head); + for_each_engine(engine, outer->i915, id) { + for (i = 0; i < count; i++) { + struct i915_request *rq; + + rq = semaphore_queue(engine, vma, n++); + if (IS_ERR(rq)) { + err = PTR_ERR(rq); + goto out; + } + } + } + + err = release_queue(outer, vma, n); + if (err) + goto out; + + if (i915_request_wait(head, + I915_WAIT_LOCKED, + 2 * RUNTIME_INFO(outer->i915)->num_engines * (count + 2) * (count + 3)) < 0) { + pr_err("Failed to slice along semaphore chain of length (%d, %d)!\n", + count, n); + GEM_TRACE_DUMP(); + i915_gem_set_wedged(outer->i915); + err = -EIO; + } + +out: + i915_request_put(head); + return err; +} + +static int live_timeslice_preempt(void *arg) +{ + struct drm_i915_private *i915 = arg; + struct drm_i915_gem_object *obj; + intel_wakeref_t wakeref; + struct i915_vma *vma; + void *vaddr; + int err = 0; + int count; + + /* + * If a request takes too long, we would like to give other users + * a fair go on the GPU. In particular, users may create batches + * that wait upon external input, where that input may even be + * supplied by another GPU job. To avoid blocking forever, we + * need to preempt the current task and replace it with another + * ready task. 
+ */ + + mutex_lock(&i915->drm.struct_mutex); + wakeref = intel_runtime_pm_get(i915); + + obj = i915_gem_object_create_internal(i915, PAGE_SIZE); + if (IS_ERR(obj)) { + err = PTR_ERR(obj); + goto err_unlock; + } + + vma = i915_vma_instance(obj, &i915->ggtt.vm, NULL); + if (IS_ERR(vma)) { + err = PTR_ERR(vma); + goto err_obj; + } + + vaddr = i915_gem_object_pin_map(obj, I915_MAP_WC); + if (IS_ERR(vaddr)) { + err = PTR_ERR(vaddr); + goto err_obj; + } + + err = i915_vma_pin(vma, 0, 0, PIN_GLOBAL); + if (err) + goto err_map; + + for_each_prime_number_from(count, 1, 16) { + struct intel_engine_cs *engine; + enum intel_engine_id id; + + for_each_engine(engine, i915, id) { + memset(vaddr, 0, PAGE_SIZE); + + err = slice_semaphore_queue(engine, vma, count); + if (err) + goto err_pin; + + if (igt_flush_test(i915, I915_WAIT_LOCKED)) { + err = -EIO; + goto err_pin; + } + } + } + +err_pin: + i915_vma_unpin(vma); +err_map: + i915_gem_object_unpin_map(obj); +err_obj: + i915_gem_object_put(obj); +err_unlock: + if (igt_flush_test(i915, I915_WAIT_LOCKED)) + err = -EIO; + intel_runtime_pm_put(i915, wakeref); + mutex_unlock(&i915->drm.struct_mutex); + + return err; +} + static int live_busywait_preempt(void *arg) { struct drm_i915_private *i915 = arg; @@ -398,6 +617,9 @@ static int live_late_preempt(void *arg) if (!ctx_lo) goto err_ctx_hi; + /* Make sure ctx_lo stays before ctx_hi until we trigger preemption. */ + ctx_lo->sched.priority = I915_USER_PRIORITY(1); + for_each_engine(engine, i915, id) { struct igt_live_test t; struct i915_request *rq; @@ -1818,6 +2040,7 @@ int intel_execlists_live_selftests(struct drm_i915_private *i915) { static const struct i915_subtest tests[] = { SUBTEST(live_sanitycheck), + SUBTEST(live_timeslice_preempt), SUBTEST(live_busywait_preempt), SUBTEST(live_preempt), SUBTEST(live_late_preempt), diff --git a/drivers/gpu/drm/i915/i915_scheduler.c b/drivers/gpu/drm/i915/i915_scheduler.c index b1ba3e65cd52..0bd452e851d8 100644 --- a/drivers/gpu/drm/i915/i915_scheduler.c +++ b/drivers/gpu/drm/i915/i915_scheduler.c @@ -394,6 +394,7 @@ bool __i915_sched_node_add_dependency(struct i915_sched_node *node, list_add(&dep->wait_link, &signal->waiters_list); list_add(&dep->signal_link, &node->signalers_list); dep->signaler = signal; + dep->waiter = node; dep->flags = flags; /* Keep track of whether anyone on this chain has a semaphore */ diff --git a/drivers/gpu/drm/i915/i915_scheduler_types.h b/drivers/gpu/drm/i915/i915_scheduler_types.h index 3e309631bd0b..aad81acba9dc 100644 --- a/drivers/gpu/drm/i915/i915_scheduler_types.h +++ b/drivers/gpu/drm/i915/i915_scheduler_types.h @@ -62,6 +62,7 @@ struct i915_sched_node { struct i915_dependency { struct i915_sched_node *signaler; + struct i915_sched_node *waiter; struct list_head signal_link; struct list_head wait_link; struct list_head dfs_link; From patchwork Mon Jun 10 07:21:10 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Chris Wilson X-Patchwork-Id: 10984181 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id C6A216C5 for ; Mon, 10 Jun 2019 07:21:53 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id ADC5128816 for ; Mon, 10 Jun 2019 07:21:53 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id A234F28832; Mon, 10 Jun 2019 07:21:53 +0000 (UTC) 
X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-5.2 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_MED autolearn=ham version=3.3.1 Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177]) (using TLSv1.2 with cipher DHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.wl.linuxfoundation.org (Postfix) with ESMTPS id 3F56F28816 for ; Mon, 10 Jun 2019 07:21:53 +0000 (UTC) Received: from gabe.freedesktop.org (localhost [127.0.0.1]) by gabe.freedesktop.org (Postfix) with ESMTP id E68F58912D; Mon, 10 Jun 2019 07:21:50 +0000 (UTC) X-Original-To: intel-gfx@lists.freedesktop.org Delivered-To: intel-gfx@lists.freedesktop.org Received: from fireflyinternet.com (mail.fireflyinternet.com [109.228.58.192]) by gabe.freedesktop.org (Postfix) with ESMTPS id 3B90B8910B for ; Mon, 10 Jun 2019 07:21:45 +0000 (UTC) X-Default-Received-SPF: pass (skip=forwardok (res=PASS)) x-ip-name=78.156.65.138; Received: from haswell.alporthouse.com (unverified [78.156.65.138]) by fireflyinternet.com (Firefly Internet (M1)) with ESMTP id 16848365-1500050 for multiple; Mon, 10 Jun 2019 08:21:29 +0100 From: Chris Wilson To: intel-gfx@lists.freedesktop.org Date: Mon, 10 Jun 2019 08:21:10 +0100 Message-Id: <20190610072126.6355-13-chris@chris-wilson.co.uk> X-Mailer: git-send-email 2.20.1 In-Reply-To: <20190610072126.6355-1-chris@chris-wilson.co.uk> References: <20190610072126.6355-1-chris@chris-wilson.co.uk> MIME-Version: 1.0 Subject: [Intel-gfx] [PATCH 12/28] drm/i915/execlists: Force preemption X-BeenThere: intel-gfx@lists.freedesktop.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: Intel graphics driver community testing & development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: intel-gfx-bounces@lists.freedesktop.org Sender: "Intel-gfx" X-Virus-Scanned: ClamAV using ClamSMTP If the preempted context takes too long to relinquish control, e.g. it is stuck inside a shader with arbitration disabled, evict that context with an engine reset. This ensures that preemptions are reasonably responsive, providing a tighter QoS for the more important context at the cost of flagging unresponsive contexts more frequently (i.e. instead of using an ~10s hangcheck, we now evict at ~10ms). The challenge lies in picking a timeout that can be reasonably serviced by HW for typical workloads, balancing the existing clients against the need for responsiveness. Signed-off-by: Chris Wilson Cc: Mika Kuoppala Cc: Tvrtko Ursulin --- drivers/gpu/drm/i915/Kconfig.profile | 12 +++++++ drivers/gpu/drm/i915/gt/intel_lrc.c | 49 ++++++++++++++++++++++++++-- 2 files changed, 58 insertions(+), 3 deletions(-) diff --git a/drivers/gpu/drm/i915/Kconfig.profile b/drivers/gpu/drm/i915/Kconfig.profile index 4fd1ea639d0f..613b753cb27a 100644 --- a/drivers/gpu/drm/i915/Kconfig.profile +++ b/drivers/gpu/drm/i915/Kconfig.profile @@ -25,3 +25,15 @@ config DRM_I915_SPIN_REQUEST May be 0 to disable the initial spin. In practice, we estimate the cost of enabling the interrupt (if currently disabled) to be a few microseconds. + +config DRM_I915_PREEMPT_TIMEOUT + int "Preempt timeout (ms)" + default 10 # milliseconds + help + How long to wait (in milliseconds) for a preemption event to occur + when submitting a new context via execlists.
If the current context + does not hit an arbitration point and yield to HW before the timer + expires, the HW will be reset to allow the more important context + to execute. + + May be 0 to disable the timeout. diff --git a/drivers/gpu/drm/i915/gt/intel_lrc.c b/drivers/gpu/drm/i915/gt/intel_lrc.c index 83b5738815aa..40246ebd223f 100644 --- a/drivers/gpu/drm/i915/gt/intel_lrc.c +++ b/drivers/gpu/drm/i915/gt/intel_lrc.c @@ -1218,6 +1218,11 @@ static void execlists_dequeue(struct intel_engine_cs *engine) *port = execlists_schedule_in(last, port - execlists->pending); memset(port + 1, 0, (last_port - port) * sizeof(*port)); execlists_submit_ports(engine); + + if (CONFIG_DRM_I915_PREEMPT_TIMEOUT) { + mod_timer(&execlists->timer, + jiffies + msecs_to_jiffies_timeout(CONFIG_DRM_I915_PREEMPT_TIMEOUT)); + } } } @@ -1373,13 +1378,48 @@ static void process_csb(struct intel_engine_cs *engine) invalidate_csb_entries(&buf[0], &buf[num_entries - 1]); } -static void __execlists_submission_tasklet(struct intel_engine_cs *const engine) +static bool __execlists_submission_tasklet(struct intel_engine_cs *const engine) { lockdep_assert_held(&engine->active.lock); process_csb(engine); - if (!engine->execlists.pending[0]) + if (!engine->execlists.pending[0]) { execlists_dequeue(engine); + return true; + } + + return false; +} + +static void preempt_reset(struct intel_engine_cs *engine) +{ + const unsigned int bit = I915_RESET_ENGINE + engine->id; + unsigned long *lock = &engine->i915->gpu_error.flags; + + if (test_and_set_bit(bit, lock)) + return; + + tasklet_disable_nosync(&engine->execlists.tasklet); + spin_unlock(&engine->active.lock); + + i915_reset_engine(engine, "preemption time out"); + + spin_lock(&engine->active.lock); + tasklet_enable(&engine->execlists.tasklet); + + clear_bit(bit, lock); + wake_up_bit(lock, bit); +} + +static bool preempt_timeout(struct intel_engine_cs *const engine) +{ + if (!CONFIG_DRM_I915_PREEMPT_TIMEOUT) + return false; + + if (!intel_engine_has_preemption(engine)) + return false; + + return !timer_pending(&engine->execlists.timer); } /* @@ -1392,7 +1432,10 @@ static void execlists_submission_tasklet(unsigned long data) unsigned long flags; spin_lock_irqsave(&engine->active.lock, flags); - __execlists_submission_tasklet(engine); + + if (!__execlists_submission_tasklet(engine) && preempt_timeout(engine)) + preempt_reset(engine); + spin_unlock_irqrestore(&engine->active.lock, flags); } From patchwork Mon Jun 10 07:21:11 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Chris Wilson X-Patchwork-Id: 10984171 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 9DE5B6C5 for ; Mon, 10 Jun 2019 07:21:48 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 87056286C2 for ; Mon, 10 Jun 2019 07:21:48 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 7B8D628816; Mon, 10 Jun 2019 07:21:48 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-5.2 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_MED autolearn=ham version=3.3.1 Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177]) (using TLSv1.2 with cipher DHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate 
requested) by mail.wl.linuxfoundation.org (Postfix) with ESMTPS id 33777286C2 for ; Mon, 10 Jun 2019 07:21:48 +0000 (UTC) Received: from gabe.freedesktop.org (localhost [127.0.0.1]) by gabe.freedesktop.org (Postfix) with ESMTP id E84CB89119; Mon, 10 Jun 2019 07:21:46 +0000 (UTC) X-Original-To: intel-gfx@lists.freedesktop.org Delivered-To: intel-gfx@lists.freedesktop.org Received: from fireflyinternet.com (mail.fireflyinternet.com [109.228.58.192]) by gabe.freedesktop.org (Postfix) with ESMTPS id 604BB89113 for ; Mon, 10 Jun 2019 07:21:44 +0000 (UTC) X-Default-Received-SPF: pass (skip=forwardok (res=PASS)) x-ip-name=78.156.65.138; Received: from haswell.alporthouse.com (unverified [78.156.65.138]) by fireflyinternet.com (Firefly Internet (M1)) with ESMTP id 16848366-1500050 for multiple; Mon, 10 Jun 2019 08:21:29 +0100 From: Chris Wilson To: intel-gfx@lists.freedesktop.org Date: Mon, 10 Jun 2019 08:21:11 +0100 Message-Id: <20190610072126.6355-14-chris@chris-wilson.co.uk> X-Mailer: git-send-email 2.20.1 In-Reply-To: <20190610072126.6355-1-chris@chris-wilson.co.uk> References: <20190610072126.6355-1-chris@chris-wilson.co.uk> MIME-Version: 1.0 Subject: [Intel-gfx] [PATCH 13/28] drm/i915: Use forced preemptions in preference over hangcheck X-BeenThere: intel-gfx@lists.freedesktop.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: Intel graphics driver community testing & development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: intel-gfx-bounces@lists.freedesktop.org Sender: "Intel-gfx" X-Virus-Scanned: ClamAV using ClamSMTP How well does this work in practice? It means that unless someone else is attempting to run we do not reset infinite loops. Maybe that is a good thing. Opens: * This sacrifices error capture. Maybe make that an opt-in with a watchdog. 
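
For reference only, and not part of the patch itself: the change below skips the periodic hangcheck whenever the scheduler advertises preemption support, relying instead on the forced-preemption reset to police runaway contexts. Userspace can query that same capability bit through GETPARAM; a minimal sketch follows (the render-node path and the include path are assumptions, error handling is simplified):

#include <fcntl.h>
#include <stdio.h>
#include <sys/ioctl.h>
#include <drm/i915_drm.h>	/* header location may differ with your libdrm/kernel headers */

int main(void)
{
	int caps = 0;
	struct drm_i915_getparam gp = {
		.param = I915_PARAM_HAS_SCHEDULER,
		.value = &caps,
	};
	int fd = open("/dev/dri/renderD128", O_RDWR); /* assumed device node */

	if (fd < 0 || ioctl(fd, DRM_IOCTL_I915_GETPARAM, &gp) < 0)
		return 1;

	/* I915_SCHEDULER_CAP_PREEMPTION set => hangcheck is no longer queued on unpark */
	printf("preemption %ssupported\n",
	       caps & I915_SCHEDULER_CAP_PREEMPTION ? "" : "not ");
	return 0;
}
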
Signed-off-by: Chris Wilson Cc: Mika Kuoppala Cc: Tvrtko Ursulin Cc: Joonas Lahtinen --- drivers/gpu/drm/i915/gt/intel_gt_pm.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/i915/gt/intel_gt_pm.c b/drivers/gpu/drm/i915/gt/intel_gt_pm.c index ae7155f0e063..9a4e58d637ee 100644 --- a/drivers/gpu/drm/i915/gt/intel_gt_pm.c +++ b/drivers/gpu/drm/i915/gt/intel_gt_pm.c @@ -43,7 +43,8 @@ static int intel_gt_unpark(struct intel_wakeref *wf) i915_pmu_gt_unparked(i915); - i915_queue_hangcheck(i915); + if (!(i915->caps.scheduler & I915_SCHEDULER_CAP_PREEMPTION)) + i915_queue_hangcheck(i915); pm_notify(i915, INTEL_GT_UNPARK); From patchwork Mon Jun 10 07:21:12 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Chris Wilson X-Patchwork-Id: 10984173 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 53A931902 for ; Mon, 10 Jun 2019 07:21:50 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 3B56F28812 for ; Mon, 10 Jun 2019 07:21:50 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 3028E28832; Mon, 10 Jun 2019 07:21:50 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-5.2 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_MED autolearn=ham version=3.3.1 Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177]) (using TLSv1.2 with cipher DHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.wl.linuxfoundation.org (Postfix) with ESMTPS id ED02D28812 for ; Mon, 10 Jun 2019 07:21:49 +0000 (UTC) Received: from gabe.freedesktop.org (localhost [127.0.0.1]) by gabe.freedesktop.org (Postfix) with ESMTP id 43FD089127; Mon, 10 Jun 2019 07:21:48 +0000 (UTC) X-Original-To: intel-gfx@lists.freedesktop.org Delivered-To: intel-gfx@lists.freedesktop.org Received: from fireflyinternet.com (mail.fireflyinternet.com [109.228.58.192]) by gabe.freedesktop.org (Postfix) with ESMTPS id 846D289119 for ; Mon, 10 Jun 2019 07:21:43 +0000 (UTC) X-Default-Received-SPF: pass (skip=forwardok (res=PASS)) x-ip-name=78.156.65.138; Received: from haswell.alporthouse.com (unverified [78.156.65.138]) by fireflyinternet.com (Firefly Internet (M1)) with ESMTP id 16848367-1500050 for multiple; Mon, 10 Jun 2019 08:21:29 +0100 From: Chris Wilson To: intel-gfx@lists.freedesktop.org Date: Mon, 10 Jun 2019 08:21:12 +0100 Message-Id: <20190610072126.6355-15-chris@chris-wilson.co.uk> X-Mailer: git-send-email 2.20.1 In-Reply-To: <20190610072126.6355-1-chris@chris-wilson.co.uk> References: <20190610072126.6355-1-chris@chris-wilson.co.uk> MIME-Version: 1.0 Subject: [Intel-gfx] [PATCH 14/28] drm/i915: Add a label for config DRM_I915_SPIN_REQUEST X-BeenThere: intel-gfx@lists.freedesktop.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: Intel graphics driver community testing & development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: intel-gfx-bounces@lists.freedesktop.org Sender: "Intel-gfx" X-Virus-Scanned: ClamAV using ClamSMTP If we don't give it a label, it does not appear as a configuration option. 
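
To illustrate the point (the real change is in the diff below): a Kconfig int symbol declared without a prompt string is never offered in menuconfig and silently keeps its default, so attaching a quoted label to the type is what turns it into a user-visible, tunable option. A sketch of the resulting entry:

config DRM_I915_SPIN_REQUEST
	int "Busywait for request completion (us)"	# the quoted label makes the symbol visible
	default 5 # microseconds, used when the option is left untouched
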
Signed-off-by: Chris Wilson Cc: Mika Kuoppala Reviewed-by: Mika Kuoppala --- drivers/gpu/drm/i915/Kconfig.profile | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/gpu/drm/i915/Kconfig.profile b/drivers/gpu/drm/i915/Kconfig.profile index 613b753cb27a..8273d3baafe4 100644 --- a/drivers/gpu/drm/i915/Kconfig.profile +++ b/drivers/gpu/drm/i915/Kconfig.profile @@ -13,7 +13,7 @@ config DRM_I915_USERFAULT_AUTOSUSPEND runtime pm autosuspend delay tunable. config DRM_I915_SPIN_REQUEST - int + int "Busywait for request completion (us)" default 5 # microseconds help Before sleeping waiting for a request (GPU operation) to complete, From patchwork Mon Jun 10 07:21:13 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Chris Wilson X-Patchwork-Id: 10984177 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id A63881580 for ; Mon, 10 Jun 2019 07:21:52 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 8E94528816 for ; Mon, 10 Jun 2019 07:21:52 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 82D352881C; Mon, 10 Jun 2019 07:21:52 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-5.2 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_MED autolearn=ham version=3.3.1 Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177]) (using TLSv1.2 with cipher DHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.wl.linuxfoundation.org (Postfix) with ESMTPS id A7BD8286C2 for ; Mon, 10 Jun 2019 07:21:51 +0000 (UTC) Received: from gabe.freedesktop.org (localhost [127.0.0.1]) by gabe.freedesktop.org (Postfix) with ESMTP id 6E52289129; Mon, 10 Jun 2019 07:21:50 +0000 (UTC) X-Original-To: intel-gfx@lists.freedesktop.org Delivered-To: intel-gfx@lists.freedesktop.org Received: from fireflyinternet.com (mail.fireflyinternet.com [109.228.58.192]) by gabe.freedesktop.org (Postfix) with ESMTPS id AA6F189113 for ; Mon, 10 Jun 2019 07:21:42 +0000 (UTC) X-Default-Received-SPF: pass (skip=forwardok (res=PASS)) x-ip-name=78.156.65.138; Received: from haswell.alporthouse.com (unverified [78.156.65.138]) by fireflyinternet.com (Firefly Internet (M1)) with ESMTP id 16848368-1500050 for multiple; Mon, 10 Jun 2019 08:21:30 +0100 From: Chris Wilson To: intel-gfx@lists.freedesktop.org Date: Mon, 10 Jun 2019 08:21:13 +0100 Message-Id: <20190610072126.6355-16-chris@chris-wilson.co.uk> X-Mailer: git-send-email 2.20.1 In-Reply-To: <20190610072126.6355-1-chris@chris-wilson.co.uk> References: <20190610072126.6355-1-chris@chris-wilson.co.uk> MIME-Version: 1.0 Subject: [Intel-gfx] [PATCH 15/28] drm/i915: Throw away the active object retirement complexity X-BeenThere: intel-gfx@lists.freedesktop.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: Intel graphics driver community testing & development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: intel-gfx-bounces@lists.freedesktop.org Sender: "Intel-gfx" X-Virus-Scanned: ClamAV using ClamSMTP Remove the accumulated optimisations that we have for i915_vma_retire and reduce it to the bare essential of tracking the active object reference. 
This allows us to only use atomic operations, and so we will be able to avoid the struct_mutex requirement. The principal loss here is the shrinker MRU bumping, so now if we have to shrink, we will do so in much more random order and are more likely to try and shrink recently used objects. That is a nuisance, but shrinking active objects is a second step we try to avoid and will always be a system-wide performance issue. The other loss here is in the automatic pruning of the reservation_object when idling. This is not as large an issue as upon reservation_object introduction as now adding new fences into the object replaces already signaled fences, keeping the array compact. But we do lose the auto-expiration of stale fences and unused arrays. That may be a noticeable problem for which we need to re-implement autopruning. Signed-off-by: Chris Wilson --- drivers/gpu/drm/i915/gem/i915_gem_object.c | 1 - drivers/gpu/drm/i915/gem/i915_gem_object.h | 6 -- .../gpu/drm/i915/gem/i915_gem_object_types.h | 1 - drivers/gpu/drm/i915/gem/i915_gem_shrinker.c | 5 +- .../drm/i915/gem/selftests/i915_gem_mman.c | 1 - drivers/gpu/drm/i915/gt/intel_lrc.c | 4 +- drivers/gpu/drm/i915/gt/intel_ringbuffer.c | 1 - drivers/gpu/drm/i915/gt/selftest_hangcheck.c | 32 +++++------ drivers/gpu/drm/i915/i915_debugfs.c | 8 +-- drivers/gpu/drm/i915/i915_gem_batch_pool.c | 42 ++++++-------- drivers/gpu/drm/i915/i915_vma.c | 56 ++++--------------- 11 files changed, 47 insertions(+), 110 deletions(-) diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object.c b/drivers/gpu/drm/i915/gem/i915_gem_object.c index 03725ca42cc7..291a8f27d85a 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_object.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_object.c @@ -189,7 +189,6 @@ static void __i915_gem_free_objects(struct drm_i915_private *i915, mutex_lock(&i915->drm.struct_mutex); - GEM_BUG_ON(i915_gem_object_is_active(obj)); list_for_each_entry_safe(vma, vn, &obj->vma.list, obj_link) { GEM_BUG_ON(i915_vma_is_active(vma)); vma->flags &= ~I915_VMA_PIN_MASK; diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object.h b/drivers/gpu/drm/i915/gem/i915_gem_object.h index 7cb1871d7128..454bfb498001 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_object.h +++ b/drivers/gpu/drm/i915/gem/i915_gem_object.h @@ -158,12 +158,6 @@ i915_gem_object_needs_async_cancel(const struct drm_i915_gem_object *obj) return obj->ops->flags & I915_GEM_OBJECT_ASYNC_CANCEL; } -static inline bool -i915_gem_object_is_active(const struct drm_i915_gem_object *obj) -{ - return READ_ONCE(obj->active_count); -} - static inline bool i915_gem_object_is_framebuffer(const struct drm_i915_gem_object *obj) { diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object_types.h b/drivers/gpu/drm/i915/gem/i915_gem_object_types.h index 5b05698619ce..c299fed2c6b1 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_object_types.h +++ b/drivers/gpu/drm/i915/gem/i915_gem_object_types.h @@ -156,7 +156,6 @@ struct drm_i915_gem_object { /** Count of VMA actually bound by this object */ atomic_t bind_count; - unsigned int active_count; /** Count of how many global VMA are currently pinned for use by HW */ unsigned int pin_global; diff --git a/drivers/gpu/drm/i915/gem/i915_gem_shrinker.c b/drivers/gpu/drm/i915/gem/i915_gem_shrinker.c index 88e63afd1d3d..48451110e736 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_shrinker.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_shrinker.c @@ -235,8 +235,9 @@ i915_gem_shrink(struct drm_i915_private *i915, continue; if (!(flags & I915_SHRINK_ACTIVE) && - (i915_gem_object_is_active(obj) || -
i915_gem_object_is_framebuffer(obj))) + (i915_gem_object_is_framebuffer(obj) || + reservation_object_test_signaled_rcu(obj->resv, + true))) continue; if (!(flags & I915_SHRINK_BOUND) && diff --git a/drivers/gpu/drm/i915/gem/selftests/i915_gem_mman.c b/drivers/gpu/drm/i915/gem/selftests/i915_gem_mman.c index b92809418729..b0ba1680ede6 100644 --- a/drivers/gpu/drm/i915/gem/selftests/i915_gem_mman.c +++ b/drivers/gpu/drm/i915/gem/selftests/i915_gem_mman.c @@ -476,7 +476,6 @@ static int igt_mmap_offset_exhaustion(void *arg) } /* NB we rely on the _active_ reference to access obj now */ - GEM_BUG_ON(!i915_gem_object_is_active(obj)); err = create_mmap_offset(obj); if (err) { pr_err("[loop %d] create_mmap_offset failed with err=%d\n", diff --git a/drivers/gpu/drm/i915/gt/intel_lrc.c b/drivers/gpu/drm/i915/gt/intel_lrc.c index 40246ebd223f..0ab3d1377e92 100644 --- a/drivers/gpu/drm/i915/gt/intel_lrc.c +++ b/drivers/gpu/drm/i915/gt/intel_lrc.c @@ -1502,9 +1502,7 @@ static void execlists_submit_request(struct i915_request *request) static void __execlists_context_fini(struct intel_context *ce) { intel_ring_put(ce->ring); - - GEM_BUG_ON(i915_gem_object_is_active(ce->state->obj)); - i915_gem_object_put(ce->state->obj); + i915_vma_put(ce->state); } static void execlists_context_destroy(struct kref *kref) diff --git a/drivers/gpu/drm/i915/gt/intel_ringbuffer.c b/drivers/gpu/drm/i915/gt/intel_ringbuffer.c index 93b0893c736b..aa4c9e82138e 100644 --- a/drivers/gpu/drm/i915/gt/intel_ringbuffer.c +++ b/drivers/gpu/drm/i915/gt/intel_ringbuffer.c @@ -1309,7 +1309,6 @@ void intel_ring_free(struct kref *ref) static void __ring_context_fini(struct intel_context *ce) { - GEM_BUG_ON(i915_gem_object_is_active(ce->state->obj)); i915_gem_object_put(ce->state->obj); } diff --git a/drivers/gpu/drm/i915/gt/selftest_hangcheck.c b/drivers/gpu/drm/i915/gt/selftest_hangcheck.c index 127faef8d8c2..c6016398c7e9 100644 --- a/drivers/gpu/drm/i915/gt/selftest_hangcheck.c +++ b/drivers/gpu/drm/i915/gt/selftest_hangcheck.c @@ -130,33 +130,29 @@ hang_create_request(struct hang *h, struct intel_engine_cs *engine) struct drm_i915_private *i915 = h->i915; struct i915_address_space *vm = h->ctx->ppgtt ? 
&h->ctx->ppgtt->vm : &i915->ggtt.vm; + struct drm_i915_gem_object *obj; struct i915_request *rq = NULL; struct i915_vma *hws, *vma; unsigned int flags; + void *vaddr; u32 *batch; int err; - if (i915_gem_object_is_active(h->obj)) { - struct drm_i915_gem_object *obj; - void *vaddr; - - obj = i915_gem_object_create_internal(h->i915, PAGE_SIZE); - if (IS_ERR(obj)) - return ERR_CAST(obj); + obj = i915_gem_object_create_internal(h->i915, PAGE_SIZE); + if (IS_ERR(obj)) + return ERR_CAST(obj); - vaddr = i915_gem_object_pin_map(obj, - i915_coherent_map_type(h->i915)); - if (IS_ERR(vaddr)) { - i915_gem_object_put(obj); - return ERR_CAST(vaddr); - } + vaddr = i915_gem_object_pin_map(obj, i915_coherent_map_type(h->i915)); + if (IS_ERR(vaddr)) { + i915_gem_object_put(obj); + return ERR_CAST(vaddr); + } - i915_gem_object_unpin_map(h->obj); - i915_gem_object_put(h->obj); + i915_gem_object_unpin_map(h->obj); + i915_gem_object_put(h->obj); - h->obj = obj; - h->batch = vaddr; - } + h->obj = obj; + h->batch = vaddr; vma = i915_vma_instance(h->obj, vm, NULL); if (IS_ERR(vma)) diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c index 326a56a97247..fec5fdeca07f 100644 --- a/drivers/gpu/drm/i915/i915_debugfs.c +++ b/drivers/gpu/drm/i915/i915_debugfs.c @@ -74,11 +74,6 @@ static int i915_capabilities(struct seq_file *m, void *data) return 0; } -static char get_active_flag(struct drm_i915_gem_object *obj) -{ - return i915_gem_object_is_active(obj) ? '*' : ' '; -} - static char get_pin_flag(struct drm_i915_gem_object *obj) { return obj->pin_global ? 'p' : ' '; @@ -143,9 +138,8 @@ describe_obj(struct seq_file *m, struct drm_i915_gem_object *obj) unsigned int frontbuffer_bits; int pin_count = 0; - seq_printf(m, "%pK: %c%c%c%c%c %8zdKiB %02x %02x %s%s%s", + seq_printf(m, "%pK: %c%c%c%c %8zdKiB %02x %02x %s%s%s", &obj->base, - get_active_flag(obj), get_pin_flag(obj), get_tiling_flag(obj), get_global_flag(obj), diff --git a/drivers/gpu/drm/i915/i915_gem_batch_pool.c b/drivers/gpu/drm/i915/i915_gem_batch_pool.c index 56adfdcaed3e..1b7595e2ac21 100644 --- a/drivers/gpu/drm/i915/i915_gem_batch_pool.c +++ b/drivers/gpu/drm/i915/i915_gem_batch_pool.c @@ -94,34 +94,26 @@ i915_gem_batch_pool_get(struct i915_gem_batch_pool *pool, list = &pool->cache_list[n]; list_for_each_entry(obj, list, batch_pool_link) { + struct reservation_object *resv = obj->resv; + /* The batches are strictly LRU ordered */ - if (i915_gem_object_is_active(obj)) { - struct reservation_object *resv = obj->resv; - - if (!reservation_object_test_signaled_rcu(resv, true)) - break; - - i915_retire_requests(pool->engine->i915); - GEM_BUG_ON(i915_gem_object_is_active(obj)); - - /* - * The object is now idle, clear the array of shared - * fences before we add a new request. Although, we - * remain on the same engine, we may be on a different - * timeline and so may continually grow the array, - * trapping a reference to all the old fences, rather - * than replace the existing fence. - */ - if (rcu_access_pointer(resv->fence)) { - reservation_object_lock(resv, NULL); - reservation_object_add_excl_fence(resv, NULL); - reservation_object_unlock(resv); - } + if (!reservation_object_test_signaled_rcu(resv, true)) + break; + + /* + * The object is now idle, clear the array of shared + * fences before we add a new request. Although, we + * remain on the same engine, we may be on a different + * timeline and so may continually grow the array, + * trapping a reference to all the old fences, rather + * than replace the existing fence. 
+ */ + if (rcu_access_pointer(resv->fence)) { + reservation_object_lock(resv, NULL); + reservation_object_add_excl_fence(resv, NULL); + reservation_object_unlock(resv); } - GEM_BUG_ON(!reservation_object_test_signaled_rcu(obj->resv, - true)); - if (obj->base.size >= size) goto found; } diff --git a/drivers/gpu/drm/i915/i915_vma.c b/drivers/gpu/drm/i915/i915_vma.c index 5c075cd6f9fc..6e04b4489ad2 100644 --- a/drivers/gpu/drm/i915/i915_vma.c +++ b/drivers/gpu/drm/i915/i915_vma.c @@ -77,45 +77,11 @@ static void vma_print_allocator(struct i915_vma *vma, const char *reason) #endif -static void obj_bump_mru(struct drm_i915_gem_object *obj) -{ - struct drm_i915_private *i915 = to_i915(obj->base.dev); - unsigned long flags; - - spin_lock_irqsave(&i915->mm.obj_lock, flags); - - list_move_tail(&obj->mm.link, &i915->mm.shrink_list); - - spin_unlock_irqrestore(&i915->mm.obj_lock, flags); - - obj->mm.dirty = true; /* be paranoid */ -} - static void __i915_vma_retire(struct i915_active *ref) { struct i915_vma *vma = container_of(ref, typeof(*vma), active); - struct drm_i915_gem_object *obj = vma->obj; - - GEM_BUG_ON(!i915_gem_object_is_active(obj)); - if (--obj->active_count) - return; - - /* Prune the shared fence arrays iff completely idle (inc. external) */ - if (reservation_object_trylock(obj->resv)) { - if (reservation_object_test_signaled_rcu(obj->resv, true)) - reservation_object_add_excl_fence(obj->resv, NULL); - reservation_object_unlock(obj->resv); - } - - /* - * Bump our place on the bound list to keep it roughly in LRU order - * so that we don't steal from recently used but inactive objects - * (unless we are forced to ofc!) - */ - if (i915_gem_object_is_shrinkable(obj)) - obj_bump_mru(obj); - i915_gem_object_put(obj); /* and drop the active reference */ + i915_vma_put(vma); } static struct i915_vma * @@ -923,6 +889,7 @@ int i915_vma_move_to_active(struct i915_vma *vma, unsigned int flags) { struct drm_i915_gem_object *obj = vma->obj; + int err; assert_vma_held(vma); assert_object_held(obj); @@ -936,17 +903,13 @@ int i915_vma_move_to_active(struct i915_vma *vma, * add the active reference first and queue for it to be dropped * *last*. 
*/ - if (!vma->active.count && !obj->active_count++) - i915_gem_object_get(obj); /* once more for the active ref */ - - if (unlikely(i915_active_ref(&vma->active, rq->fence.context, rq))) { - if (!vma->active.count && !--obj->active_count) - i915_gem_object_put(obj); - return -ENOMEM; - } + if (i915_active_acquire(&vma->active)) + i915_vma_get(vma); - GEM_BUG_ON(!i915_vma_is_active(vma)); - GEM_BUG_ON(!obj->active_count); + err = i915_active_ref(&vma->active, rq->fence.context, rq); + i915_active_release(&vma->active); + if (err) + return err; obj->write_domain = 0; if (flags & EXEC_OBJECT_WRITE) { @@ -958,11 +921,14 @@ int i915_vma_move_to_active(struct i915_vma *vma, obj->read_domains = 0; } obj->read_domains |= I915_GEM_GPU_DOMAINS; + obj->mm.dirty = true; if (flags & EXEC_OBJECT_NEEDS_FENCE) __i915_active_request_set(&vma->last_fence, rq); export_fence(vma, rq, flags); + + GEM_BUG_ON(!i915_vma_is_active(vma)); return 0; } From patchwork Mon Jun 10 07:21:14 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Chris Wilson X-Patchwork-Id: 10984147 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id A1D7C1580 for ; Mon, 10 Jun 2019 07:21:37 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 8939E28812 for ; Mon, 10 Jun 2019 07:21:37 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 7D16E28816; Mon, 10 Jun 2019 07:21:37 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-5.2 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_MED autolearn=ham version=3.3.1 Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177]) (using TLSv1.2 with cipher DHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.wl.linuxfoundation.org (Postfix) with ESMTPS id 3302A286C2 for ; Mon, 10 Jun 2019 07:21:36 +0000 (UTC) Received: from gabe.freedesktop.org (localhost [127.0.0.1]) by gabe.freedesktop.org (Postfix) with ESMTP id 6DAEC89101; Mon, 10 Jun 2019 07:21:34 +0000 (UTC) X-Original-To: intel-gfx@lists.freedesktop.org Delivered-To: intel-gfx@lists.freedesktop.org Received: from fireflyinternet.com (mail.fireflyinternet.com [109.228.58.192]) by gabe.freedesktop.org (Postfix) with ESMTPS id 62BD088FA4 for ; Mon, 10 Jun 2019 07:21:31 +0000 (UTC) X-Default-Received-SPF: pass (skip=forwardok (res=PASS)) x-ip-name=78.156.65.138; Received: from haswell.alporthouse.com (unverified [78.156.65.138]) by fireflyinternet.com (Firefly Internet (M1)) with ESMTP id 16848369-1500050 for multiple; Mon, 10 Jun 2019 08:21:30 +0100 From: Chris Wilson To: intel-gfx@lists.freedesktop.org Date: Mon, 10 Jun 2019 08:21:14 +0100 Message-Id: <20190610072126.6355-17-chris@chris-wilson.co.uk> X-Mailer: git-send-email 2.20.1 In-Reply-To: <20190610072126.6355-1-chris@chris-wilson.co.uk> References: <20190610072126.6355-1-chris@chris-wilson.co.uk> MIME-Version: 1.0 Subject: [Intel-gfx] [PATCH 16/28] drm/i915: Provide an i915_active.acquire callback X-BeenThere: intel-gfx@lists.freedesktop.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: Intel graphics driver community testing & development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: 
intel-gfx-bounces@lists.freedesktop.org Sender: "Intel-gfx" X-Virus-Scanned: ClamAV using ClamSMTP If we introduce a callback for i915_active that is only called the first time we use the i915_active and is symmetrically paired with the i915_active.retire callback, we can replace the open-coded and non-atomic implementations -- which will be very fragile (i.e. broken) upon removing the struct_mutex serialisation. Signed-off-by: Chris Wilson --- drivers/gpu/drm/i915/gem/i915_gem_context.c | 8 +- drivers/gpu/drm/i915/gt/intel_context.c | 82 ++++--- drivers/gpu/drm/i915/gt/intel_context.h | 14 +- drivers/gpu/drm/i915/gt/intel_lrc.c | 6 +- drivers/gpu/drm/i915/gt/intel_ringbuffer.c | 2 +- drivers/gpu/drm/i915/gt/mock_engine.c | 2 +- drivers/gpu/drm/i915/i915_active.c | 215 ++++++++++--------- drivers/gpu/drm/i915/i915_active.h | 25 +-- drivers/gpu/drm/i915/i915_active_types.h | 10 +- drivers/gpu/drm/i915/i915_gem_gtt.c | 2 +- drivers/gpu/drm/i915/i915_timeline.c | 16 +- drivers/gpu/drm/i915/i915_vma.c | 15 +- drivers/gpu/drm/i915/selftests/i915_active.c | 12 +- 13 files changed, 219 insertions(+), 190 deletions(-) diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context.c b/drivers/gpu/drm/i915/gem/i915_gem_context.c index f09e3abe695a..837cad233cc6 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_context.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_context.c @@ -922,8 +922,12 @@ static int context_barrier_task(struct i915_gem_context *ctx, if (!cb) return -ENOMEM; - i915_active_init(i915, &cb->base, cb_retire); - i915_active_acquire(&cb->base); + i915_active_init(i915, &cb->base, NULL, cb_retire); + err = i915_active_acquire(&cb->base); + if (err) { + kfree(cb); + return err; + } for_each_gem_engine(ce, i915_gem_context_lock_engines(ctx), it) { struct i915_request *rq; diff --git a/drivers/gpu/drm/i915/gt/intel_context.c b/drivers/gpu/drm/i915/gt/intel_context.c index 0102f6bb62ec..b9fea31cf9ec 100644 --- a/drivers/gpu/drm/i915/gt/intel_context.c +++ b/drivers/gpu/drm/i915/gt/intel_context.c @@ -95,11 +95,14 @@ void intel_context_unpin(struct intel_context *ce) intel_context_put(ce); } -static int __context_pin_state(struct i915_vma *vma, unsigned long flags) +static int __context_pin_state(struct i915_vma *vma) { + u64 flags; int err; - err = i915_vma_pin(vma, 0, 0, flags | PIN_GLOBAL); + flags = PIN_HIGH | PIN_GLOBAL; + flags |= i915_vm_to_ggtt(vma->vm)->pin_bias | PIN_OFFSET_BIAS; + err = i915_vma_pin(vma, 0, 0, flags); if (err) return err; @@ -119,7 +122,7 @@ static void __context_unpin_state(struct i915_vma *vma) __i915_vma_unpin(vma); } -static void intel_context_retire(struct i915_active *active) +static void __intel_context_retire(struct i915_active *active) { struct intel_context *ce = container_of(active, typeof(*ce), active); @@ -129,65 +132,58 @@ static void intel_context_retire(struct i915_active *active) intel_context_put(ce); } -void -intel_context_init(struct intel_context *ce, - struct i915_gem_context *ctx, - struct intel_engine_cs *engine) -{ - GEM_BUG_ON(!engine->cops); - - kref_init(&ce->ref); - - ce->gem_context = ctx; - ce->engine = engine; - ce->ops = engine->cops; - ce->sseu = engine->sseu; - - INIT_LIST_HEAD(&ce->signal_link); - INIT_LIST_HEAD(&ce->signals); - - mutex_init(&ce->pin_mutex); - - i915_active_init(ctx->i915, &ce->active, intel_context_retire); -} - -int intel_context_active(struct intel_context *ce, unsigned long flags) +static int __intel_context_active(struct i915_active *active) { + struct intel_context *ce = container_of(active, typeof(*ce), active); int 
err; - if (!i915_active_acquire(&ce->active)) - return 0; - intel_context_get(ce); if (!ce->state) return 0; - err = __context_pin_state(ce->state, flags); - if (err) { - i915_active_cancel(&ce->active); - intel_context_put(ce); - return err; - } + err = __context_pin_state(ce->state); + if (err) + goto err_put; /* Preallocate tracking nodes */ if (!i915_gem_context_is_kernel(ce->gem_context)) { err = i915_active_acquire_preallocate_barrier(&ce->active, ce->engine); - if (err) { - i915_active_release(&ce->active); - return err; - } + if (err) + goto err_unpin; } return 0; + +err_unpin: + __context_unpin_state(ce->state); +err_put: + intel_context_put(ce); + return err; } -void intel_context_inactive(struct intel_context *ce) +void +intel_context_init(struct intel_context *ce, + struct i915_gem_context *ctx, + struct intel_engine_cs *engine) { - /* Nodes preallocated in intel_context_active() */ - i915_active_acquire_barrier(&ce->active); - i915_active_release(&ce->active); + GEM_BUG_ON(!engine->cops); + + kref_init(&ce->ref); + + ce->gem_context = ctx; + ce->engine = engine; + ce->ops = engine->cops; + ce->sseu = engine->sseu; + + INIT_LIST_HEAD(&ce->signal_link); + INIT_LIST_HEAD(&ce->signals); + + mutex_init(&ce->pin_mutex); + + i915_active_init(ctx->i915, &ce->active, + __intel_context_active, __intel_context_retire); } static void i915_global_context_shrink(void) diff --git a/drivers/gpu/drm/i915/gt/intel_context.h b/drivers/gpu/drm/i915/gt/intel_context.h index e71629f7c2e0..156f4a9c9269 100644 --- a/drivers/gpu/drm/i915/gt/intel_context.h +++ b/drivers/gpu/drm/i915/gt/intel_context.h @@ -9,6 +9,7 @@ #include +#include "i915_active.h" #include "intel_context_types.h" #include "intel_engine_types.h" @@ -102,8 +103,17 @@ static inline void intel_context_exit(struct intel_context *ce) ce->ops->exit(ce); } -int intel_context_active(struct intel_context *ce, unsigned long flags); -void intel_context_inactive(struct intel_context *ce); +static inline int intel_context_active(struct intel_context *ce) +{ + return i915_active_acquire(&ce->active); +} + +static inline void intel_context_inactive(struct intel_context *ce) +{ + /* Nodes preallocated in intel_context_active() */ + i915_active_acquire_barrier(&ce->active); + i915_active_release(&ce->active); +} static inline struct intel_context *intel_context_get(struct intel_context *ce) { diff --git a/drivers/gpu/drm/i915/gt/intel_lrc.c b/drivers/gpu/drm/i915/gt/intel_lrc.c index 0ab3d1377e92..826fe77158a9 100644 --- a/drivers/gpu/drm/i915/gt/intel_lrc.c +++ b/drivers/gpu/drm/i915/gt/intel_lrc.c @@ -1558,12 +1558,10 @@ __execlists_context_pin(struct intel_context *ce, goto err; GEM_BUG_ON(!ce->state); - ret = intel_context_active(ce, - engine->i915->ggtt.pin_bias | - PIN_OFFSET_BIAS | - PIN_HIGH); + ret = intel_context_active(ce); if (ret) goto err; + GEM_BUG_ON(!i915_vma_is_pinned(ce->state)); vaddr = i915_gem_object_pin_map(ce->state->obj, i915_coherent_map_type(engine->i915) | diff --git a/drivers/gpu/drm/i915/gt/intel_ringbuffer.c b/drivers/gpu/drm/i915/gt/intel_ringbuffer.c index aa4c9e82138e..69718961aae8 100644 --- a/drivers/gpu/drm/i915/gt/intel_ringbuffer.c +++ b/drivers/gpu/drm/i915/gt/intel_ringbuffer.c @@ -1437,7 +1437,7 @@ static int ring_context_pin(struct intel_context *ce) ce->state = vma; } - err = intel_context_active(ce, PIN_HIGH); + err = intel_context_active(ce); if (err) return err; diff --git a/drivers/gpu/drm/i915/gt/mock_engine.c b/drivers/gpu/drm/i915/gt/mock_engine.c index 00c666d3e652..d88dcd21ea6c 100644 --- 
a/drivers/gpu/drm/i915/gt/mock_engine.c +++ b/drivers/gpu/drm/i915/gt/mock_engine.c @@ -154,7 +154,7 @@ static int mock_context_pin(struct intel_context *ce) return -ENOMEM; } - ret = intel_context_active(ce, PIN_HIGH); + ret = intel_context_active(ce); if (ret) return ret; diff --git a/drivers/gpu/drm/i915/i915_active.c b/drivers/gpu/drm/i915/i915_active.c index 100e40afc9d6..4888ef745c6b 100644 --- a/drivers/gpu/drm/i915/i915_active.c +++ b/drivers/gpu/drm/i915/i915_active.c @@ -30,48 +30,49 @@ struct active_node { }; static void -__active_park(struct i915_active *ref) +active_retire(struct i915_active *ref) { + struct rb_root root = RB_ROOT; struct active_node *it, *n; + bool retire = false; - rbtree_postorder_for_each_entry_safe(it, n, &ref->tree, node) { - GEM_BUG_ON(i915_active_request_isset(&it->base)); - kmem_cache_free(global.slab_cache, it); + GEM_BUG_ON(!atomic_read(&ref->count)); + if (atomic_add_unless(&ref->count, -1, 1)) + return; + + /* One active may be flushed from inside the acquire of another */ + mutex_lock_nested(&ref->mutex, SINGLE_DEPTH_NESTING); + + /* return the unused nodes to our slabcache -- flushing the allocator */ + if (atomic_dec_and_test(&ref->count)) { + root = ref->tree; + ref->tree = RB_ROOT; + ref->cache = NULL; + retire = true; } - ref->tree = RB_ROOT; -} -static void -__active_retire(struct i915_active *ref) -{ - GEM_BUG_ON(!ref->count); - if (--ref->count) - return; + mutex_unlock(&ref->mutex); - /* return the unused nodes to our slabcache */ - __active_park(ref); + if (retire) + ref->retire(ref); - ref->retire(ref); + rbtree_postorder_for_each_entry_safe(it, n, &root, node) { + GEM_BUG_ON(i915_active_request_isset(&it->base)); + kmem_cache_free(global.slab_cache, it); + } } static void node_retire(struct i915_active_request *base, struct i915_request *rq) { - __active_retire(container_of(base, struct active_node, base)->ref); -} - -static void -last_retire(struct i915_active_request *base, struct i915_request *rq) -{ - __active_retire(container_of(base, struct i915_active, last)); + active_retire(container_of(base, struct active_node, base)->ref); } static struct i915_active_request * active_instance(struct i915_active *ref, u64 idx) { - struct active_node *node; + struct active_node *node, *prealloc; struct rb_node **p, *parent; - struct i915_request *old; /* * We track the most recently used timeline to skip a rbtree search @@ -79,20 +80,17 @@ active_instance(struct i915_active *ref, u64 idx) * at all. We can reuse the last slot if it is empty, that is * after the previous activity has been retired, or if it matches the * current timeline. - * - * Note that we allow the timeline to be active simultaneously in - * the rbtree and the last cache. We do this to avoid having - * to search and replace the rbtree element for a new timeline, with - * the cost being that we must be aware that the ref may be retired - * twice for the same timeline (as the older rbtree element will be - * retired before the new request added to last). 
*/ - old = i915_active_request_raw(&ref->last, BKL(ref)); - if (!old || old->fence.context == idx) - goto out; + node = READ_ONCE(ref->cache); + if (node && node->timeline == idx) + return &node->base; + + /* Preallocate a replacement, just in case */ + prealloc = kmem_cache_alloc(global.slab_cache, GFP_KERNEL); + if (!prealloc) + return NULL; - /* Move the currently active fence into the rbtree */ - idx = old->fence.context; + mutex_lock(&ref->mutex); parent = NULL; p = &ref->tree.rb_node; @@ -100,8 +98,10 @@ active_instance(struct i915_active *ref, u64 idx) parent = *p; node = rb_entry(parent, struct active_node, node); - if (node->timeline == idx && !IS_ERR(node->base.request)) - goto replace; + if (node->timeline == idx && !IS_ERR(node->base.request)) { + kmem_cache_free(global.slab_cache, prealloc); + goto out; + } if (node->timeline < idx) p = &parent->rb_right; @@ -109,17 +109,7 @@ active_instance(struct i915_active *ref, u64 idx) p = &parent->rb_left; } - node = kmem_cache_alloc(global.slab_cache, GFP_KERNEL); - - /* kmalloc may retire the ref->last (thanks shrinker)! */ - if (unlikely(!i915_active_request_raw(&ref->last, BKL(ref)))) { - kmem_cache_free(global.slab_cache, node); - goto out; - } - - if (unlikely(!node)) - return ERR_PTR(-ENOMEM); - + node = prealloc; i915_active_request_init(&node->base, NULL, node_retire); node->ref = ref; node->timeline = idx; @@ -127,38 +117,27 @@ active_instance(struct i915_active *ref, u64 idx) rb_link_node(&node->node, parent, p); rb_insert_color(&node->node, &ref->tree); -replace: - /* - * Overwrite the previous active slot in the rbtree with last, - * leaving last zeroed. If the previous slot is still active, - * we must be careful as we now only expect to receive one retire - * callback not two, and so much undo the active counting for the - * overwritten slot. 
- */ - if (i915_active_request_isset(&node->base)) { - /* Retire ourselves from the old rq->active_list */ - __list_del_entry(&node->base.link); - ref->count--; - GEM_BUG_ON(!ref->count); - } - GEM_BUG_ON(list_empty(&ref->last.link)); - list_replace_init(&ref->last.link, &node->base.link); - node->base.request = fetch_and_zero(&ref->last.request); - out: - return &ref->last; + ref->cache = node; + mutex_unlock(&ref->mutex); + + return &node->base; } -void i915_active_init(struct drm_i915_private *i915, - struct i915_active *ref, - void (*retire)(struct i915_active *ref)) +void __i915_active_init(struct drm_i915_private *i915, + struct i915_active *ref, + int (*active)(struct i915_active *ref), + void (*retire)(struct i915_active *ref), + struct lock_class_key *key) { ref->i915 = i915; + ref->active = active; ref->retire = retire; ref->tree = RB_ROOT; - i915_active_request_init(&ref->last, NULL, last_retire); + ref->cache = NULL; init_llist_head(&ref->barriers); - ref->count = 0; + atomic_set(&ref->count, 0); + __mutex_init(&ref->mutex, "i915_active", key); } int i915_active_ref(struct i915_active *ref, @@ -166,60 +145,80 @@ int i915_active_ref(struct i915_active *ref, struct i915_request *rq) { struct i915_active_request *active; - int err = 0; + int err; /* Prevent reaping in case we malloc/wait while building the tree */ - i915_active_acquire(ref); + err = i915_active_acquire(ref); + if (err) + return err; active = active_instance(ref, timeline); - if (IS_ERR(active)) { - err = PTR_ERR(active); + if (!active) { + err = -ENOMEM; goto out; } if (!i915_active_request_isset(active)) - ref->count++; + atomic_inc(&ref->count); __i915_active_request_set(active, rq); - GEM_BUG_ON(!ref->count); out: i915_active_release(ref); return err; } -bool i915_active_acquire(struct i915_active *ref) +int i915_active_acquire(struct i915_active *ref) { - lockdep_assert_held(BKL(ref)); - return !ref->count++; + int err; + + if (atomic_add_unless(&ref->count, 1, 0)) + return 0; + + err = mutex_lock_interruptible(&ref->mutex); + if (err) + return err; + + if (!atomic_read(&ref->count) && ref->active) + err = ref->active(ref); + if (!err) + atomic_inc(&ref->count); + + mutex_unlock(&ref->mutex); + + return err; } void i915_active_release(struct i915_active *ref) { - lockdep_assert_held(BKL(ref)); - __active_retire(ref); + active_retire(ref); } int i915_active_wait(struct i915_active *ref) { struct active_node *it, *n; - int ret = 0; + int err; - if (i915_active_acquire(ref)) - goto out_release; + if (RB_EMPTY_ROOT(&ref->tree)) + return 0; + + err = i915_active_acquire(ref); /* Avoid retiring ourselves */ + if (err) + return err; - ret = i915_active_request_retire(&ref->last, BKL(ref)); - if (ret) - goto out_release; + err = mutex_lock_interruptible(&ref->mutex); + if (err) + goto out; rbtree_postorder_for_each_entry_safe(it, n, &ref->tree, node) { - ret = i915_active_request_retire(&it->base, BKL(ref)); - if (ret) + err = i915_active_request_retire(&it->base, BKL(ref)); + if (err) break; } + mutex_unlock(&ref->mutex); -out_release: +out: i915_active_release(ref); - return ret; + return err; } int i915_request_await_active_request(struct i915_request *rq, @@ -234,23 +233,24 @@ int i915_request_await_active_request(struct i915_request *rq, int i915_request_await_active(struct i915_request *rq, struct i915_active *ref) { struct active_node *it, *n; - int err = 0; + int err; - /* await allocates and so we need to avoid hitting the shrinker */ - if (i915_active_acquire(ref)) - goto out; /* was idle */ + if 
(RB_EMPTY_ROOT(&ref->tree)) + return 0; - err = i915_request_await_active_request(rq, &ref->last); + /* await allocates and so we need to avoid hitting the shrinker */ + err = i915_active_acquire(ref); if (err) - goto out; + return err; + mutex_lock(&ref->mutex); rbtree_postorder_for_each_entry_safe(it, n, &ref->tree, node) { err = i915_request_await_active_request(rq, &it->base); if (err) - goto out; + break; } + mutex_unlock(&ref->mutex); -out: i915_active_release(ref); return err; } @@ -258,9 +258,9 @@ int i915_request_await_active(struct i915_request *rq, struct i915_active *ref) #if IS_ENABLED(CONFIG_DRM_I915_DEBUG_GEM) void i915_active_fini(struct i915_active *ref) { - GEM_BUG_ON(i915_active_request_isset(&ref->last)); GEM_BUG_ON(!RB_EMPTY_ROOT(&ref->tree)); - GEM_BUG_ON(ref->count); + GEM_BUG_ON(atomic_read(&ref->count)); + mutex_destroy(&ref->mutex); } #endif @@ -286,7 +286,7 @@ int i915_active_acquire_preallocate_barrier(struct i915_active *ref, (void *)engine, node_retire); node->timeline = kctx->ring->timeline->fence_context; node->ref = ref; - ref->count++; + atomic_inc(&ref->count); llist_add((struct llist_node *)&node->base.link, &ref->barriers); @@ -299,8 +299,9 @@ void i915_active_acquire_barrier(struct i915_active *ref) { struct llist_node *pos, *next; - i915_active_acquire(ref); + GEM_BUG_ON(i915_active_is_idle(ref)); + mutex_lock_nested(&ref->mutex, SINGLE_DEPTH_NESTING); llist_for_each_safe(pos, next, llist_del_all(&ref->barriers)) { struct intel_engine_cs *engine; struct active_node *node; @@ -329,7 +330,7 @@ void i915_active_acquire_barrier(struct i915_active *ref) llist_add((struct llist_node *)&node->base.link, &engine->barrier_tasks); } - i915_active_release(ref); + mutex_unlock(&ref->mutex); } void i915_request_add_barriers(struct i915_request *rq) diff --git a/drivers/gpu/drm/i915/i915_active.h b/drivers/gpu/drm/i915/i915_active.h index d55d37673944..6fa9263ca407 100644 --- a/drivers/gpu/drm/i915/i915_active.h +++ b/drivers/gpu/drm/i915/i915_active.h @@ -369,9 +369,16 @@ i915_active_request_retire(struct i915_active_request *active, * synchronisation. 
*/ -void i915_active_init(struct drm_i915_private *i915, - struct i915_active *ref, - void (*retire)(struct i915_active *ref)); +void __i915_active_init(struct drm_i915_private *i915, + struct i915_active *ref, + int (*active)(struct i915_active *ref), + void (*retire)(struct i915_active *ref), + struct lock_class_key *key); +#define i915_active_init(i915, ref, active, retire) do { \ + static struct lock_class_key __key; \ + \ + __i915_active_init(i915, ref, active, retire, &__key); \ +} while (0) int i915_active_ref(struct i915_active *ref, u64 timeline, @@ -384,20 +391,14 @@ int i915_request_await_active(struct i915_request *rq, int i915_request_await_active_request(struct i915_request *rq, struct i915_active_request *active); -bool i915_active_acquire(struct i915_active *ref); - -static inline void i915_active_cancel(struct i915_active *ref) -{ - GEM_BUG_ON(ref->count != 1); - ref->count = 0; -} - +int i915_active_acquire(struct i915_active *ref); void i915_active_release(struct i915_active *ref); +void __i915_active_release_nested(struct i915_active *ref, int subclass); static inline bool i915_active_is_idle(const struct i915_active *ref) { - return !ref->count; + return !atomic_read(&ref->count); } #if IS_ENABLED(CONFIG_DRM_I915_DEBUG_GEM) diff --git a/drivers/gpu/drm/i915/i915_active_types.h b/drivers/gpu/drm/i915/i915_active_types.h index c025991b9233..5b0a3024ce24 100644 --- a/drivers/gpu/drm/i915/i915_active_types.h +++ b/drivers/gpu/drm/i915/i915_active_types.h @@ -7,7 +7,9 @@ #ifndef _I915_ACTIVE_TYPES_H_ #define _I915_ACTIVE_TYPES_H_ +#include #include +#include #include #include @@ -24,13 +26,17 @@ struct i915_active_request { i915_active_retire_fn retire; }; +struct active_node; + struct i915_active { struct drm_i915_private *i915; + struct active_node *cache; struct rb_root tree; - struct i915_active_request last; - unsigned int count; + struct mutex mutex; + atomic_t count; + int (*active)(struct i915_active *ref); void (*retire)(struct i915_active *ref); struct llist_head barriers; diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c index acc3cb7cb219..b939d6a11aaf 100644 --- a/drivers/gpu/drm/i915/i915_gem_gtt.c +++ b/drivers/gpu/drm/i915/i915_gem_gtt.c @@ -2062,7 +2062,7 @@ static struct i915_vma *pd_vma_create(struct gen6_hw_ppgtt *ppgtt, int size) if (!vma) return ERR_PTR(-ENOMEM); - i915_active_init(i915, &vma->active, NULL); + i915_active_init(i915, &vma->active, NULL, NULL); INIT_ACTIVE_REQUEST(&vma->last_fence); vma->vm = &ggtt->vm; diff --git a/drivers/gpu/drm/i915/i915_timeline.c b/drivers/gpu/drm/i915/i915_timeline.c index c311ce9c6f9d..e2f9336bab83 100644 --- a/drivers/gpu/drm/i915/i915_timeline.c +++ b/drivers/gpu/drm/i915/i915_timeline.c @@ -148,6 +148,15 @@ static void __cacheline_retire(struct i915_active *active) __idle_cacheline_free(cl); } +static int __cacheline_active(struct i915_active *active) +{ + struct i915_timeline_cacheline *cl = + container_of(active, typeof(*cl), active); + + __i915_vma_pin(cl->hwsp->vma); + return 0; +} + static struct i915_timeline_cacheline * cacheline_alloc(struct i915_timeline_hwsp *hwsp, unsigned int cacheline) { @@ -170,15 +179,16 @@ cacheline_alloc(struct i915_timeline_hwsp *hwsp, unsigned int cacheline) cl->hwsp = hwsp; cl->vaddr = page_pack_bits(vaddr, cacheline); - i915_active_init(hwsp_to_i915(hwsp), &cl->active, __cacheline_retire); + i915_active_init(hwsp_to_i915(hwsp), &cl->active, + __cacheline_active, __cacheline_retire); return cl; } static void cacheline_acquire(struct 
i915_timeline_cacheline *cl) { - if (cl && i915_active_acquire(&cl->active)) - __i915_vma_pin(cl->hwsp->vma); + if (cl) + i915_active_acquire(&cl->active); } static void cacheline_release(struct i915_timeline_cacheline *cl) diff --git a/drivers/gpu/drm/i915/i915_vma.c b/drivers/gpu/drm/i915/i915_vma.c index 6e04b4489ad2..be15f0e0c6eb 100644 --- a/drivers/gpu/drm/i915/i915_vma.c +++ b/drivers/gpu/drm/i915/i915_vma.c @@ -77,6 +77,14 @@ static void vma_print_allocator(struct i915_vma *vma, const char *reason) #endif +static int __i915_vma_active(struct i915_active *ref) +{ + struct i915_vma *vma = container_of(ref, typeof(*vma), active); + + i915_vma_get(vma); + return 0; +} + static void __i915_vma_retire(struct i915_active *ref) { struct i915_vma *vma = container_of(ref, typeof(*vma), active); @@ -106,7 +114,8 @@ vma_create(struct drm_i915_gem_object *obj, vma->size = obj->base.size; vma->display_alignment = I915_GTT_MIN_ALIGNMENT; - i915_active_init(vm->i915, &vma->active, __i915_vma_retire); + i915_active_init(vm->i915, &vma->active, + __i915_vma_active, __i915_vma_retire); INIT_ACTIVE_REQUEST(&vma->last_fence); INIT_LIST_HEAD(&vma->closed_link); @@ -903,11 +912,7 @@ int i915_vma_move_to_active(struct i915_vma *vma, * add the active reference first and queue for it to be dropped * *last*. */ - if (i915_active_acquire(&vma->active)) - i915_vma_get(vma); - err = i915_active_ref(&vma->active, rq->fence.context, rq); - i915_active_release(&vma->active); if (err) return err; diff --git a/drivers/gpu/drm/i915/selftests/i915_active.c b/drivers/gpu/drm/i915/selftests/i915_active.c index cc1ca4be1a00..3b3ca5658122 100644 --- a/drivers/gpu/drm/i915/selftests/i915_active.c +++ b/drivers/gpu/drm/i915/selftests/i915_active.c @@ -36,14 +36,12 @@ static int __live_active_setup(struct drm_i915_private *i915, if (!submit) return -ENOMEM; - i915_active_init(i915, &active->base, __live_active_retire); + i915_active_init(i915, &active->base, NULL, __live_active_retire); active->retired = false; - if (!i915_active_acquire(&active->base)) { - pr_err("First i915_active_acquire should report being idle\n"); - err = -EINVAL; + err = i915_active_acquire(&active->base); + if (err) goto out; - } for_each_engine(engine, i915, id) { struct i915_request *rq; @@ -74,9 +72,9 @@ static int __live_active_setup(struct drm_i915_private *i915, pr_err("i915_active retired before submission!\n"); err = -EINVAL; } - if (active->base.count != count) { + if (atomic_read(&active->base.count) != count) { pr_err("i915_active not tracking all requests, found %d, expected %d\n", - active->base.count, count); + atomic_read(&active->base.count), count); err = -EINVAL; } From patchwork Mon Jun 10 07:21:15 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Chris Wilson X-Patchwork-Id: 10984169 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 7CDD91902 for ; Mon, 10 Jun 2019 07:21:47 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 6498D286C2 for ; Mon, 10 Jun 2019 07:21:47 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 592B628816; Mon, 10 Jun 2019 07:21:47 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-5.2 required=2.0 
tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_MED autolearn=ham version=3.3.1 Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177]) (using TLSv1.2 with cipher DHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.wl.linuxfoundation.org (Postfix) with ESMTPS id 0158C286C2 for ; Mon, 10 Jun 2019 07:21:47 +0000 (UTC) Received: from gabe.freedesktop.org (localhost [127.0.0.1]) by gabe.freedesktop.org (Postfix) with ESMTP id 953F689117; Mon, 10 Jun 2019 07:21:44 +0000 (UTC) X-Original-To: intel-gfx@lists.freedesktop.org Delivered-To: intel-gfx@lists.freedesktop.org Received: from fireflyinternet.com (mail.fireflyinternet.com [109.228.58.192]) by gabe.freedesktop.org (Postfix) with ESMTPS id D19568910E for ; Mon, 10 Jun 2019 07:21:41 +0000 (UTC) X-Default-Received-SPF: pass (skip=forwardok (res=PASS)) x-ip-name=78.156.65.138; Received: from haswell.alporthouse.com (unverified [78.156.65.138]) by fireflyinternet.com (Firefly Internet (M1)) with ESMTP id 16848370-1500050 for multiple; Mon, 10 Jun 2019 08:21:30 +0100 From: Chris Wilson To: intel-gfx@lists.freedesktop.org Date: Mon, 10 Jun 2019 08:21:15 +0100 Message-Id: <20190610072126.6355-18-chris@chris-wilson.co.uk> X-Mailer: git-send-email 2.20.1 In-Reply-To: <20190610072126.6355-1-chris@chris-wilson.co.uk> References: <20190610072126.6355-1-chris@chris-wilson.co.uk> MIME-Version: 1.0 Subject: [Intel-gfx] [PATCH 17/28] drm/i915: Push the i915_active.retire into a worker X-BeenThere: intel-gfx@lists.freedesktop.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: Intel graphics driver community testing & development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: intel-gfx-bounces@lists.freedesktop.org Sender: "Intel-gfx" X-Virus-Scanned: ClamAV using ClamSMTP As we need to use a mutex to serialise i915_active activation (because we want to allow the callback to sleep), we need to push the i915_active.retire into a worker callback in case we need to retire from an atomic context.
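To illustrate the shape of that outside the driver, here is a minimal sketch of the defer-to-worker pattern built from plain kernel primitives. The names (struct busy_tracker, busy_tracker_put() and so on) are invented for the example and are not the i915_active API, but the trylock-or-queue_work logic mirrors the active_retire()/active_work() pair in the diff below:

#include <linux/atomic.h>
#include <linux/kernel.h>
#include <linux/lockdep.h>
#include <linux/mutex.h>
#include <linux/workqueue.h>

struct busy_tracker {
	atomic_t count;
	struct mutex mutex;		/* lets the retire callback sleep */
	struct work_struct work;
	void (*retire)(struct busy_tracker *t);
};

/* Final-reference work, called with t->mutex held; drops the lock. */
static void __busy_retire(struct busy_tracker *t)
{
	lockdep_assert_held(&t->mutex);

	if (atomic_dec_and_test(&t->count))
		t->retire(t);	/* may sleep: we are in process context */

	mutex_unlock(&t->mutex);
}

static void busy_retire_work(struct work_struct *wrk)
{
	struct busy_tracker *t = container_of(wrk, typeof(*t), work);

	/* Recheck: someone may have taken a new reference meanwhile. */
	if (atomic_add_unless(&t->count, -1, 1))
		return;

	mutex_lock(&t->mutex);
	__busy_retire(t);
}

/* Drop a reference; the last one runs the retire callback. */
static void busy_tracker_put(struct busy_tracker *t)
{
	if (atomic_add_unless(&t->count, -1, 1))
		return;

	/*
	 * We may be called from the fence-signaling (atomic) path where
	 * sleeping on the mutex is forbidden: punt to a worker instead.
	 */
	if (!mutex_trylock(&t->mutex)) {
		queue_work(system_unbound_wq, &t->work);
		return;
	}

	__busy_retire(t);
}

At init time the sketch would pair this with mutex_init(&t->mutex) and INIT_WORK(&t->work, busy_retire_work), much as __i915_active_init() wires up active_work() below.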
Signed-off-by: Chris Wilson --- drivers/gpu/drm/i915/i915_active.c | 38 +++++++++++++++++++----- drivers/gpu/drm/i915/i915_active_types.h | 3 ++ drivers/gpu/drm/i915/i915_vma.c | 1 + 3 files changed, 35 insertions(+), 7 deletions(-) diff --git a/drivers/gpu/drm/i915/i915_active.c b/drivers/gpu/drm/i915/i915_active.c index 4888ef745c6b..f7ffa6e7bd9a 100644 --- a/drivers/gpu/drm/i915/i915_active.c +++ b/drivers/gpu/drm/i915/i915_active.c @@ -30,18 +30,13 @@ struct active_node { }; static void -active_retire(struct i915_active *ref) +__active_retire(struct i915_active *ref) { struct rb_root root = RB_ROOT; struct active_node *it, *n; bool retire = false; - GEM_BUG_ON(!atomic_read(&ref->count)); - if (atomic_add_unless(&ref->count, -1, 1)) - return; - - /* One active may be flushed from inside the acquire of another */ - mutex_lock_nested(&ref->mutex, SINGLE_DEPTH_NESTING); + lockdep_assert_held(&ref->mutex); /* return the unused nodes to our slabcache -- flushing the allocator */ if (atomic_dec_and_test(&ref->count)) { @@ -62,6 +57,34 @@ active_retire(struct i915_active *ref) } } +static void +active_work(struct work_struct *wrk) +{ + struct i915_active *ref = container_of(wrk, typeof(*ref), work); + + if (atomic_add_unless(&ref->count, -1, 1)) + return; + + mutex_lock(&ref->mutex); + __active_retire(ref); +} + +static void +active_retire(struct i915_active *ref) +{ + GEM_BUG_ON(!atomic_read(&ref->count)); + if (atomic_add_unless(&ref->count, -1, 1)) + return; + + /* If we are inside interrupt context (fence signaling), defer */ + if (!mutex_trylock(&ref->mutex)) { + queue_work(system_unbound_wq, &ref->work); + return; + } + + __active_retire(ref); +} + static void node_retire(struct i915_active_request *base, struct i915_request *rq) { @@ -138,6 +161,7 @@ void __i915_active_init(struct drm_i915_private *i915, init_llist_head(&ref->barriers); atomic_set(&ref->count, 0); __mutex_init(&ref->mutex, "i915_active", key); + INIT_WORK(&ref->work, active_work); } int i915_active_ref(struct i915_active *ref, diff --git a/drivers/gpu/drm/i915/i915_active_types.h b/drivers/gpu/drm/i915/i915_active_types.h index 5b0a3024ce24..06acdffe0f6d 100644 --- a/drivers/gpu/drm/i915/i915_active_types.h +++ b/drivers/gpu/drm/i915/i915_active_types.h @@ -12,6 +12,7 @@ #include #include #include +#include struct drm_i915_private; struct i915_active_request; @@ -39,6 +40,8 @@ struct i915_active { int (*active)(struct i915_active *ref); void (*retire)(struct i915_active *ref); + struct work_struct work; + struct llist_head barriers; }; diff --git a/drivers/gpu/drm/i915/i915_vma.c b/drivers/gpu/drm/i915/i915_vma.c index be15f0e0c6eb..393575bfb5ec 100644 --- a/drivers/gpu/drm/i915/i915_vma.c +++ b/drivers/gpu/drm/i915/i915_vma.c @@ -975,6 +975,7 @@ int i915_vma_unbind(struct i915_vma *vma) if (ret) return ret; } + flush_work(&vma->active.work); GEM_BUG_ON(i915_vma_is_active(vma)); if (i915_vma_is_pinned(vma)) { From patchwork Mon Jun 10 07:21:16 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Chris Wilson X-Patchwork-Id: 10984163 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 0444E1902 for ; Mon, 10 Jun 2019 07:21:46 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id E27C9286C2 for ; Mon, 10 Jun 2019 07:21:45 +0000 (UTC) Received: by 
mail.wl.linuxfoundation.org (Postfix, from userid 486) id D75CD2881C; Mon, 10 Jun 2019 07:21:45 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-5.2 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_MED autolearn=ham version=3.3.1 Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177]) (using TLSv1.2 with cipher DHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.wl.linuxfoundation.org (Postfix) with ESMTPS id 347BD286C2 for ; Mon, 10 Jun 2019 07:21:45 +0000 (UTC) Received: from gabe.freedesktop.org (localhost [127.0.0.1]) by gabe.freedesktop.org (Postfix) with ESMTP id 07ABF89115; Mon, 10 Jun 2019 07:21:43 +0000 (UTC) X-Original-To: intel-gfx@lists.freedesktop.org Delivered-To: intel-gfx@lists.freedesktop.org Received: from fireflyinternet.com (mail.fireflyinternet.com [109.228.58.192]) by gabe.freedesktop.org (Postfix) with ESMTPS id F3C9C89109 for ; Mon, 10 Jun 2019 07:21:40 +0000 (UTC) X-Default-Received-SPF: pass (skip=forwardok (res=PASS)) x-ip-name=78.156.65.138; Received: from haswell.alporthouse.com (unverified [78.156.65.138]) by fireflyinternet.com (Firefly Internet (M1)) with ESMTP id 16848371-1500050 for multiple; Mon, 10 Jun 2019 08:21:30 +0100 From: Chris Wilson To: intel-gfx@lists.freedesktop.org Date: Mon, 10 Jun 2019 08:21:16 +0100 Message-Id: <20190610072126.6355-19-chris@chris-wilson.co.uk> X-Mailer: git-send-email 2.20.1 In-Reply-To: <20190610072126.6355-1-chris@chris-wilson.co.uk> References: <20190610072126.6355-1-chris@chris-wilson.co.uk> MIME-Version: 1.0 Subject: [Intel-gfx] [PATCH 18/28] drm/i915/overlay: Switch to using i915_active tracking X-BeenThere: intel-gfx@lists.freedesktop.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: Intel graphics driver community testing & development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: intel-gfx-bounces@lists.freedesktop.org Sender: "Intel-gfx" X-Virus-Scanned: ClamAV using ClamSMTP Remove the raw i915_active_request tracking in favour of the higher level i915_active tracking for the sole purpose of making the lockless transition easier in later patches. Signed-off-by: Chris Wilson --- drivers/gpu/drm/i915/i915_active.h | 19 ---- drivers/gpu/drm/i915/intel_overlay.c | 130 +++++++++++++-------------- 2 files changed, 64 insertions(+), 85 deletions(-) diff --git a/drivers/gpu/drm/i915/i915_active.h b/drivers/gpu/drm/i915/i915_active.h index 6fa9263ca407..bdec4f81b6e8 100644 --- a/drivers/gpu/drm/i915/i915_active.h +++ b/drivers/gpu/drm/i915/i915_active.h @@ -89,25 +89,6 @@ int __must_check i915_active_request_set(struct i915_active_request *active, struct i915_request *rq); -/** - * i915_active_request_set_retire_fn - updates the retirement callback - * @active - the active tracker - * @fn - the routine called when the request is retired - * @mutex - struct_mutex used to guard retirements - * - * i915_active_request_set_retire_fn() updates the function pointer that - * is called when the final request associated with the @active tracker - * is retired. 
- */ -static inline void -i915_active_request_set_retire_fn(struct i915_active_request *active, - i915_active_retire_fn fn, - struct mutex *mutex) -{ - lockdep_assert_held(mutex); - active->retire = fn ?: i915_active_retire_noop; -} - /** * i915_active_request_raw - return the active request * @active - the active tracker diff --git a/drivers/gpu/drm/i915/intel_overlay.c b/drivers/gpu/drm/i915/intel_overlay.c index a2ac06a08715..55da4802426d 100644 --- a/drivers/gpu/drm/i915/intel_overlay.c +++ b/drivers/gpu/drm/i915/intel_overlay.c @@ -190,7 +190,8 @@ struct intel_overlay { struct overlay_registers __iomem *regs; u32 flip_addr; /* flip handling */ - struct i915_active_request last_flip; + struct i915_active last_flip; + void (*flip_complete)(struct intel_overlay *ovl); }; static void i830_overlay_clock_gating(struct drm_i915_private *dev_priv, @@ -216,32 +217,26 @@ static void i830_overlay_clock_gating(struct drm_i915_private *dev_priv, PCI_DEVFN(0, 0), I830_CLOCK_GATE, val); } -static void intel_overlay_submit_request(struct intel_overlay *overlay, - struct i915_request *rq, - i915_active_retire_fn retire) +static struct i915_request * +alloc_request(struct intel_overlay *overlay, void (*fn)(struct intel_overlay *)) { - GEM_BUG_ON(i915_active_request_peek(&overlay->last_flip, - &overlay->i915->drm.struct_mutex)); - i915_active_request_set_retire_fn(&overlay->last_flip, retire, - &overlay->i915->drm.struct_mutex); - __i915_active_request_set(&overlay->last_flip, rq); - i915_request_add(rq); -} + struct intel_engine_cs *engine = overlay->i915->engine[RCS0]; + struct i915_request *rq; + int err; -static int intel_overlay_do_wait_request(struct intel_overlay *overlay, - struct i915_request *rq, - i915_active_retire_fn retire) -{ - intel_overlay_submit_request(overlay, rq, retire); - return i915_active_request_retire(&overlay->last_flip, - &overlay->i915->drm.struct_mutex); -} + overlay->flip_complete = fn; -static struct i915_request *alloc_request(struct intel_overlay *overlay) -{ - struct intel_engine_cs *engine = overlay->i915->engine[RCS0]; + rq = i915_request_create(engine->kernel_context); + if (IS_ERR(rq)) + return rq; + + err = i915_active_ref(&overlay->last_flip, rq->fence.context, rq); + if (err) { + i915_request_add(rq); + return ERR_PTR(err); + } - return i915_request_create(engine->kernel_context); + return rq; } /* overlay needs to be disable in OCMD reg */ @@ -253,7 +248,7 @@ static int intel_overlay_on(struct intel_overlay *overlay) WARN_ON(overlay->active); - rq = alloc_request(overlay); + rq = alloc_request(overlay, NULL); if (IS_ERR(rq)) return PTR_ERR(rq); @@ -274,7 +269,9 @@ static int intel_overlay_on(struct intel_overlay *overlay) *cs++ = MI_NOOP; intel_ring_advance(rq, cs); - return intel_overlay_do_wait_request(overlay, rq, NULL); + i915_request_add(rq); + + return i915_active_wait(&overlay->last_flip); } static void intel_overlay_flip_prepare(struct intel_overlay *overlay, @@ -318,7 +315,7 @@ static int intel_overlay_continue(struct intel_overlay *overlay, if (tmp & (1 << 17)) DRM_DEBUG("overlay underrun, DOVSTA: %x\n", tmp); - rq = alloc_request(overlay); + rq = alloc_request(overlay, NULL); if (IS_ERR(rq)) return PTR_ERR(rq); @@ -333,8 +330,7 @@ static int intel_overlay_continue(struct intel_overlay *overlay, intel_ring_advance(rq, cs); intel_overlay_flip_prepare(overlay, vma); - - intel_overlay_submit_request(overlay, rq, NULL); + i915_request_add(rq); return 0; } @@ -355,20 +351,13 @@ static void intel_overlay_release_old_vma(struct intel_overlay *overlay) } 
static void -intel_overlay_release_old_vid_tail(struct i915_active_request *active, - struct i915_request *rq) +intel_overlay_release_old_vid_tail(struct intel_overlay *overlay) { - struct intel_overlay *overlay = - container_of(active, typeof(*overlay), last_flip); - intel_overlay_release_old_vma(overlay); } -static void intel_overlay_off_tail(struct i915_active_request *active, - struct i915_request *rq) +static void intel_overlay_off_tail(struct intel_overlay *overlay) { - struct intel_overlay *overlay = - container_of(active, typeof(*overlay), last_flip); struct drm_i915_private *dev_priv = overlay->i915; intel_overlay_release_old_vma(overlay); @@ -381,6 +370,16 @@ static void intel_overlay_off_tail(struct i915_active_request *active, i830_overlay_clock_gating(dev_priv, true); } +static void +intel_overlay_last_flip_retire(struct i915_active *active) +{ + struct intel_overlay *overlay = + container_of(active, typeof(*overlay), last_flip); + + if (overlay->flip_complete) + overlay->flip_complete(overlay); +} + /* overlay needs to be disabled in OCMD reg */ static int intel_overlay_off(struct intel_overlay *overlay) { @@ -395,7 +394,7 @@ static int intel_overlay_off(struct intel_overlay *overlay) * of the hw. Do it in both cases */ flip_addr |= OFC_UPDATE; - rq = alloc_request(overlay); + rq = alloc_request(overlay, intel_overlay_off_tail); if (IS_ERR(rq)) return PTR_ERR(rq); @@ -418,17 +417,16 @@ static int intel_overlay_off(struct intel_overlay *overlay) intel_ring_advance(rq, cs); intel_overlay_flip_prepare(overlay, NULL); + i915_request_add(rq); - return intel_overlay_do_wait_request(overlay, rq, - intel_overlay_off_tail); + return i915_active_wait(&overlay->last_flip); } /* recover from an interruption due to a signal * We have to be careful not to repeat work forever an make forward progess. */ static int intel_overlay_recover_from_interrupt(struct intel_overlay *overlay) { - return i915_active_request_retire(&overlay->last_flip, - &overlay->i915->drm.struct_mutex); + return i915_active_wait(&overlay->last_flip); } /* Wait for pending overlay flip and release old frame. @@ -438,43 +436,40 @@ static int intel_overlay_recover_from_interrupt(struct intel_overlay *overlay) static int intel_overlay_release_old_vid(struct intel_overlay *overlay) { struct drm_i915_private *dev_priv = overlay->i915; + struct i915_request *rq; u32 *cs; - int ret; lockdep_assert_held(&dev_priv->drm.struct_mutex); - /* Only wait if there is actually an old frame to release to + /* + * Only wait if there is actually an old frame to release to * guarantee forward progress. 
*/ if (!overlay->old_vma) return 0; - if (I915_READ(GEN2_ISR) & I915_OVERLAY_PLANE_FLIP_PENDING_INTERRUPT) { - /* synchronous slowpath */ - struct i915_request *rq; + if (!(I915_READ(GEN2_ISR) & I915_OVERLAY_PLANE_FLIP_PENDING_INTERRUPT)) { + intel_overlay_release_old_vid_tail(overlay); + return 0; + } - rq = alloc_request(overlay); - if (IS_ERR(rq)) - return PTR_ERR(rq); + rq = alloc_request(overlay, intel_overlay_release_old_vid_tail); + if (IS_ERR(rq)) + return PTR_ERR(rq); - cs = intel_ring_begin(rq, 2); - if (IS_ERR(cs)) { - i915_request_add(rq); - return PTR_ERR(cs); - } + cs = intel_ring_begin(rq, 2); + if (IS_ERR(cs)) { + i915_request_add(rq); + return PTR_ERR(cs); + } - *cs++ = MI_WAIT_FOR_EVENT | MI_WAIT_FOR_OVERLAY_FLIP; - *cs++ = MI_NOOP; - intel_ring_advance(rq, cs); + *cs++ = MI_WAIT_FOR_EVENT | MI_WAIT_FOR_OVERLAY_FLIP; + *cs++ = MI_NOOP; + intel_ring_advance(rq, cs); - ret = intel_overlay_do_wait_request(overlay, rq, - intel_overlay_release_old_vid_tail); - if (ret) - return ret; - } else - intel_overlay_release_old_vid_tail(&overlay->last_flip, NULL); + i915_request_add(rq); - return 0; + return i915_active_wait(&overlay->last_flip); } void intel_overlay_reset(struct drm_i915_private *dev_priv) @@ -1371,7 +1366,9 @@ void intel_overlay_setup(struct drm_i915_private *dev_priv) overlay->contrast = 75; overlay->saturation = 146; - INIT_ACTIVE_REQUEST(&overlay->last_flip); + i915_active_init(dev_priv, + &overlay->last_flip, + NULL, intel_overlay_last_flip_retire); ret = get_registers(overlay, OVERLAY_NEEDS_PHYSICAL(dev_priv)); if (ret) @@ -1413,6 +1410,7 @@ void intel_overlay_cleanup(struct drm_i915_private *dev_priv) WARN_ON(overlay->active); i915_gem_object_put(overlay->reg_bo); + i915_active_fini(&overlay->last_flip); kfree(overlay); } From patchwork Mon Jun 10 07:21:17 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Chris Wilson X-Patchwork-Id: 10984157 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 215281902 for ; Mon, 10 Jun 2019 07:21:42 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 08545286C2 for ; Mon, 10 Jun 2019 07:21:42 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id F0CB728812; Mon, 10 Jun 2019 07:21:41 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-5.2 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_MED autolearn=ham version=3.3.1 Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177]) (using TLSv1.2 with cipher DHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.wl.linuxfoundation.org (Postfix) with ESMTPS id 8D98528816 for ; Mon, 10 Jun 2019 07:21:41 +0000 (UTC) Received: from gabe.freedesktop.org (localhost [127.0.0.1]) by gabe.freedesktop.org (Postfix) with ESMTP id 19FBA8910A; Mon, 10 Jun 2019 07:21:40 +0000 (UTC) X-Original-To: intel-gfx@lists.freedesktop.org Delivered-To: intel-gfx@lists.freedesktop.org Received: from fireflyinternet.com (mail.fireflyinternet.com [109.228.58.192]) by gabe.freedesktop.org (Postfix) with ESMTPS id 61CC188FA4 for ; Mon, 10 Jun 2019 07:21:38 +0000 (UTC) X-Default-Received-SPF: pass (skip=forwardok (res=PASS)) x-ip-name=78.156.65.138; Received: from 
haswell.alporthouse.com (unverified [78.156.65.138]) by fireflyinternet.com (Firefly Internet (M1)) with ESMTP id 16848372-1500050 for multiple; Mon, 10 Jun 2019 08:21:30 +0100 From: Chris Wilson To: intel-gfx@lists.freedesktop.org Date: Mon, 10 Jun 2019 08:21:17 +0100 Message-Id: <20190610072126.6355-20-chris@chris-wilson.co.uk> X-Mailer: git-send-email 2.20.1 In-Reply-To: <20190610072126.6355-1-chris@chris-wilson.co.uk> References: <20190610072126.6355-1-chris@chris-wilson.co.uk> MIME-Version: 1.0 Subject: [Intel-gfx] [PATCH 19/28] drm/i915: Forgo last_fence active request tracking X-BeenThere: intel-gfx@lists.freedesktop.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: Intel graphics driver community testing & development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: intel-gfx-bounces@lists.freedesktop.org Sender: "Intel-gfx" X-Virus-Scanned: ClamAV using ClamSMTP We were using the last_fence to track the last request that used this vma in a way that might be interpreted by a fence register, and we forced ourselves to wait for this request before modifying any fence register that overlapped our vma. Due to the requirement that we need to track any XY_BLT command, linear or tiled, this in effect meant that we had to track the vma for its active lifespan anyway, so we can forgo the explicit last_fence tracking and just use the whole vma->active. Another solution would be to pipeline the register updates, which would help resolve some long running stalls for gen3 (but only gen 2 and 3!) Signed-off-by: Chris Wilson --- drivers/gpu/drm/i915/i915_debugfs.c | 4 +--- drivers/gpu/drm/i915/i915_gem_fence_reg.c | 6 ++---- drivers/gpu/drm/i915/i915_gem_gtt.c | 1 - drivers/gpu/drm/i915/i915_vma.c | 13 ------------- drivers/gpu/drm/i915/i915_vma.h | 1 - 5 files changed, 3 insertions(+), 22 deletions(-) diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c index fec5fdeca07f..f6de8f99e7bb 100644 --- a/drivers/gpu/drm/i915/i915_debugfs.c +++ b/drivers/gpu/drm/i915/i915_debugfs.c @@ -209,9 +209,7 @@ describe_obj(struct seq_file *m, struct drm_i915_gem_object *obj) } } if (vma->fence) - seq_printf(m, " , fence: %d%s", - vma->fence->id, - i915_active_request_isset(&vma->last_fence) ? 
"*" : ""); + seq_printf(m, " , fence: %d", vma->fence->id); seq_puts(m, ")"); spin_lock(&obj->vma.lock); diff --git a/drivers/gpu/drm/i915/i915_gem_fence_reg.c b/drivers/gpu/drm/i915/i915_gem_fence_reg.c index d13be3b0e91d..543c5a47cc79 100644 --- a/drivers/gpu/drm/i915/i915_gem_fence_reg.c +++ b/drivers/gpu/drm/i915/i915_gem_fence_reg.c @@ -230,16 +230,14 @@ static int fence_update(struct i915_fence_reg *fence, i915_gem_object_get_tiling(vma->obj))) return -EINVAL; - ret = i915_active_request_retire(&vma->last_fence, - &vma->obj->base.dev->struct_mutex); + ret = i915_active_wait(&vma->active); if (ret) return ret; } old = xchg(&fence->vma, NULL); if (old) { - ret = i915_active_request_retire(&old->last_fence, - &old->obj->base.dev->struct_mutex); + ret = i915_active_wait(&old->active); if (ret) { fence->vma = old; return ret; diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c index b939d6a11aaf..05fef1d3579d 100644 --- a/drivers/gpu/drm/i915/i915_gem_gtt.c +++ b/drivers/gpu/drm/i915/i915_gem_gtt.c @@ -2063,7 +2063,6 @@ static struct i915_vma *pd_vma_create(struct gen6_hw_ppgtt *ppgtt, int size) return ERR_PTR(-ENOMEM); i915_active_init(i915, &vma->active, NULL, NULL); - INIT_ACTIVE_REQUEST(&vma->last_fence); vma->vm = &ggtt->vm; vma->ops = &pd_vma_ops; diff --git a/drivers/gpu/drm/i915/i915_vma.c b/drivers/gpu/drm/i915/i915_vma.c index 393575bfb5ec..54694ca871da 100644 --- a/drivers/gpu/drm/i915/i915_vma.c +++ b/drivers/gpu/drm/i915/i915_vma.c @@ -116,7 +116,6 @@ vma_create(struct drm_i915_gem_object *obj, i915_active_init(vm->i915, &vma->active, __i915_vma_active, __i915_vma_retire); - INIT_ACTIVE_REQUEST(&vma->last_fence); INIT_LIST_HEAD(&vma->closed_link); @@ -791,8 +790,6 @@ static void __i915_vma_destroy(struct i915_vma *vma) GEM_BUG_ON(vma->node.allocated); GEM_BUG_ON(vma->fence); - GEM_BUG_ON(i915_active_request_isset(&vma->last_fence)); - mutex_lock(&vma->vm->mutex); list_del(&vma->vm_link); mutex_unlock(&vma->vm->mutex); @@ -928,9 +925,6 @@ int i915_vma_move_to_active(struct i915_vma *vma, obj->read_domains |= I915_GEM_GPU_DOMAINS; obj->mm.dirty = true; - if (flags & EXEC_OBJECT_NEEDS_FENCE) - __i915_active_request_set(&vma->last_fence, rq); - export_fence(vma, rq, flags); GEM_BUG_ON(!i915_vma_is_active(vma)); @@ -963,14 +957,7 @@ int i915_vma_unbind(struct i915_vma *vma) * before we are finished). */ __i915_vma_pin(vma); - ret = i915_active_wait(&vma->active); - if (ret) - goto unpin; - - ret = i915_active_request_retire(&vma->last_fence, - &vma->vm->i915->drm.struct_mutex); -unpin: __i915_vma_unpin(vma); if (ret) return ret; diff --git a/drivers/gpu/drm/i915/i915_vma.h b/drivers/gpu/drm/i915/i915_vma.h index 908118ade441..71088ff4ad59 100644 --- a/drivers/gpu/drm/i915/i915_vma.h +++ b/drivers/gpu/drm/i915/i915_vma.h @@ -111,7 +111,6 @@ struct i915_vma { #define I915_VMA_GGTT_WRITE BIT(14) struct i915_active active; - struct i915_active_request last_fence; /** * Support different GGTT views into the same object. 
From patchwork Mon Jun 10 07:21:18 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Chris Wilson X-Patchwork-Id: 10984165 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 7DEF11580 for ; Mon, 10 Jun 2019 07:21:46 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 64BDE286C2 for ; Mon, 10 Jun 2019 07:21:46 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 595742881C; Mon, 10 Jun 2019 07:21:46 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-5.2 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_MED autolearn=ham version=3.3.1 Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177]) (using TLSv1.2 with cipher DHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.wl.linuxfoundation.org (Postfix) with ESMTPS id 2AEBC28816 for ; Mon, 10 Jun 2019 07:21:44 +0000 (UTC) Received: from gabe.freedesktop.org (localhost [127.0.0.1]) by gabe.freedesktop.org (Postfix) with ESMTP id D81418910E; Mon, 10 Jun 2019 07:21:42 +0000 (UTC) X-Original-To: intel-gfx@lists.freedesktop.org Delivered-To: intel-gfx@lists.freedesktop.org Received: from fireflyinternet.com (mail.fireflyinternet.com [109.228.58.192]) by gabe.freedesktop.org (Postfix) with ESMTPS id 21C038910B for ; Mon, 10 Jun 2019 07:21:39 +0000 (UTC) X-Default-Received-SPF: pass (skip=forwardok (res=PASS)) x-ip-name=78.156.65.138; Received: from haswell.alporthouse.com (unverified [78.156.65.138]) by fireflyinternet.com (Firefly Internet (M1)) with ESMTP id 16848373-1500050 for multiple; Mon, 10 Jun 2019 08:21:30 +0100 From: Chris Wilson To: intel-gfx@lists.freedesktop.org Date: Mon, 10 Jun 2019 08:21:18 +0100 Message-Id: <20190610072126.6355-21-chris@chris-wilson.co.uk> X-Mailer: git-send-email 2.20.1 In-Reply-To: <20190610072126.6355-1-chris@chris-wilson.co.uk> References: <20190610072126.6355-1-chris@chris-wilson.co.uk> MIME-Version: 1.0 Subject: [Intel-gfx] [PATCH 20/28] drm/i915: Extract intel_frontbuffer active tracking X-BeenThere: intel-gfx@lists.freedesktop.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: Intel graphics driver community testing & development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: intel-gfx-bounces@lists.freedesktop.org Sender: "Intel-gfx" X-Virus-Scanned: ClamAV using ClamSMTP Move the active tracking for the frontbuffer operations out of the i915_gem_object and into its own first class (refcounted) object. In the process of detangling, we switch from low level request tracking to the easier i915_active -- with the plan that this avoids any potential atomic callbacks as the frontbuffer tracking wishes to sleep as it flushes. 
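The heart of the new object is a refcounted tracker that is looked up, or created and published, under a single spinlock, and unpublished again on the final reference drop. A rough sketch of that lookup-or-create pattern with invented stand-in names (struct fb_owner and struct fb_tracker are placeholders, not the intel_frontbuffer API itself):

#include <linux/kernel.h>
#include <linux/kref.h>
#include <linux/slab.h>
#include <linux/spinlock.h>

struct fb_tracker;

struct fb_owner {			/* stand-in for the GEM object */
	spinlock_t lock;		/* publishes ->tracker */
	struct fb_tracker *tracker;
};

struct fb_tracker {
	struct kref ref;
	struct fb_owner *owner;
};

static void fb_tracker_release(struct kref *ref)
{
	struct fb_tracker *t = container_of(ref, typeof(*t), ref);

	/* kref_put_lock() acquired owner->lock for us on the final put. */
	t->owner->tracker = NULL;
	spin_unlock(&t->owner->lock);

	kfree(t);
}

static struct fb_tracker *fb_tracker_get(struct fb_owner *o)
{
	struct fb_tracker *t;

	/* Fast path: reuse the tracker already published on the owner. */
	spin_lock(&o->lock);
	t = o->tracker;
	if (t)
		kref_get(&t->ref);
	spin_unlock(&o->lock);
	if (t)
		return t;

	t = kmalloc(sizeof(*t), GFP_KERNEL);
	if (!t)
		return NULL;

	kref_init(&t->ref);
	t->owner = o;

	/* Someone may have raced us; keep whichever was installed first. */
	spin_lock(&o->lock);
	if (o->tracker) {
		kfree(t);
		t = o->tracker;
		kref_get(&t->ref);
	} else {
		o->tracker = t;
	}
	spin_unlock(&o->lock);

	return t;
}

static void fb_tracker_put(struct fb_tracker *t)
{
	kref_put_lock(&t->ref, fb_tracker_release, &t->owner->lock);
}

The same shape appears as intel_frontbuffer_get()/intel_frontbuffer_put() in the diff below, with the extra twist that the tracker also owns an i915_active whose activate and retire callbacks take and drop a kref so the tracker stays alive while a flush is pending.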
Signed-off-by: Chris Wilson --- drivers/gpu/drm/i915/gem/i915_gem_clflush.c | 2 +- drivers/gpu/drm/i915/gem/i915_gem_domain.c | 14 +- drivers/gpu/drm/i915/gem/i915_gem_mman.c | 4 - drivers/gpu/drm/i915/gem/i915_gem_object.c | 28 +- drivers/gpu/drm/i915/gem/i915_gem_object.h | 2 +- .../gpu/drm/i915/gem/i915_gem_object_types.h | 8 +- drivers/gpu/drm/i915/i915_debugfs.c | 5 - drivers/gpu/drm/i915/i915_drv.h | 4 - drivers/gpu/drm/i915/i915_gem.c | 47 +--- drivers/gpu/drm/i915/i915_vma.c | 6 +- drivers/gpu/drm/i915/intel_display.c | 67 +++-- drivers/gpu/drm/i915/intel_drv.h | 1 + drivers/gpu/drm/i915/intel_fbdev.c | 22 +- drivers/gpu/drm/i915/intel_frontbuffer.c | 244 +++++++++++++----- drivers/gpu/drm/i915/intel_frontbuffer.h | 70 +++-- drivers/gpu/drm/i915/intel_overlay.c | 8 +- 16 files changed, 294 insertions(+), 238 deletions(-) diff --git a/drivers/gpu/drm/i915/gem/i915_gem_clflush.c b/drivers/gpu/drm/i915/gem/i915_gem_clflush.c index 537aa2337cc8..54b447539430 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_clflush.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_clflush.c @@ -47,7 +47,7 @@ static void __i915_do_clflush(struct drm_i915_gem_object *obj) { GEM_BUG_ON(!i915_gem_object_has_pages(obj)); drm_clflush_sg(obj->mm.pages); - intel_fb_obj_flush(obj, ORIGIN_CPU); + intel_frontbuffer_flush(obj->frontbuffer, ORIGIN_CPU); } static void i915_clflush_work(struct work_struct *work) diff --git a/drivers/gpu/drm/i915/gem/i915_gem_domain.c b/drivers/gpu/drm/i915/gem/i915_gem_domain.c index bd180ef46aeb..7e6ed767348c 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_domain.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_domain.c @@ -550,13 +550,6 @@ i915_gem_object_set_to_cpu_domain(struct drm_i915_gem_object *obj, bool write) return 0; } -static inline enum fb_op_origin -fb_write_origin(struct drm_i915_gem_object *obj, unsigned int domain) -{ - return (domain == I915_GEM_DOMAIN_GTT ? - obj->frontbuffer_ggtt_origin : ORIGIN_CPU); -} - /** * Called when user space prepares to use an object with the CPU, either * through the mmap ioctl's mapping or a GTT mapping. @@ -660,9 +653,8 @@ i915_gem_set_domain_ioctl(struct drm_device *dev, void *data, i915_gem_object_unlock(obj); - if (write_domain != 0) - intel_fb_obj_invalidate(obj, - fb_write_origin(obj, write_domain)); + if (write_domain) + intel_frontbuffer_invalidate(obj->frontbuffer, ORIGIN_CPU); out_unpin: i915_gem_object_unpin_pages(obj); @@ -782,7 +774,7 @@ int i915_gem_object_prepare_write(struct drm_i915_gem_object *obj, } out: - intel_fb_obj_invalidate(obj, ORIGIN_CPU); + intel_frontbuffer_invalidate(obj->frontbuffer, ORIGIN_CPU); obj->mm.dirty = true; /* return with the pages pinned */ return 0; diff --git a/drivers/gpu/drm/i915/gem/i915_gem_mman.c b/drivers/gpu/drm/i915/gem/i915_gem_mman.c index a8b8b9c281f1..49cf9ad97bfc 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_mman.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_mman.c @@ -99,9 +99,6 @@ i915_gem_mmap_ioctl(struct drm_device *dev, void *data, up_write(&mm->mmap_sem); if (IS_ERR_VALUE(addr)) goto err; - - /* This may race, but that's ok, it only gets set */ - WRITE_ONCE(obj->frontbuffer_ggtt_origin, ORIGIN_CPU); } i915_gem_object_put(obj); @@ -280,7 +277,6 @@ vm_fault_t i915_gem_fault(struct vm_fault *vmf) * Userspace is now writing through an untracked VMA, abandon * all hope that the hardware is able to track future writes. 
*/ - obj->frontbuffer_ggtt_origin = ORIGIN_CPU; vma = i915_gem_object_ggtt_pin(obj, &view, 0, 0, flags); if (IS_ERR(vma) && !view.type) { diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object.c b/drivers/gpu/drm/i915/gem/i915_gem_object.c index 291a8f27d85a..5f75f69687e8 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_object.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_object.c @@ -44,16 +44,6 @@ void i915_gem_object_free(struct drm_i915_gem_object *obj) return kmem_cache_free(global.slab_objects, obj); } -static void -frontbuffer_retire(struct i915_active_request *active, - struct i915_request *request) -{ - struct drm_i915_gem_object *obj = - container_of(active, typeof(*obj), frontbuffer_write); - - intel_fb_obj_flush(obj, ORIGIN_CS); -} - void i915_gem_object_init(struct drm_i915_gem_object *obj, const struct drm_i915_gem_object_ops *ops) { @@ -72,10 +62,6 @@ void i915_gem_object_init(struct drm_i915_gem_object *obj, reservation_object_init(&obj->__builtin_resv); obj->resv = &obj->__builtin_resv; - obj->frontbuffer_ggtt_origin = ORIGIN_GTT; - i915_active_request_init(&obj->frontbuffer_write, - NULL, frontbuffer_retire); - obj->mm.madv = I915_MADV_WILLNEED; INIT_RADIX_TREE(&obj->mm.get_page.radix, GFP_KERNEL | __GFP_NOWARN); mutex_init(&obj->mm.get_page.lock); @@ -217,7 +203,6 @@ static void __i915_gem_free_objects(struct drm_i915_private *i915, GEM_BUG_ON(atomic_read(&obj->bind_count)); GEM_BUG_ON(obj->userfault_count); - GEM_BUG_ON(atomic_read(&obj->frontbuffer_bits)); GEM_BUG_ON(!list_empty(&obj->lut_list)); if (obj->ops->release) @@ -322,6 +307,8 @@ void i915_gem_free_object(struct drm_gem_object *gem_obj) { struct drm_i915_gem_object *obj = to_intel_bo(gem_obj); + GEM_BUG_ON(i915_gem_object_is_framebuffer(obj)); + if (obj->mm.quirked) __i915_gem_object_unpin_pages(obj); @@ -350,13 +337,6 @@ void i915_gem_free_object(struct drm_gem_object *gem_obj) call_rcu(&obj->rcu, __i915_gem_free_object_rcu); } -static inline enum fb_op_origin -fb_write_origin(struct drm_i915_gem_object *obj, unsigned int domain) -{ - return (domain == I915_GEM_DOMAIN_GTT ? 
- obj->frontbuffer_ggtt_origin : ORIGIN_CPU); -} - static bool gpu_write_needs_clflush(struct drm_i915_gem_object *obj) { return !(obj->cache_level == I915_CACHE_NONE || @@ -378,9 +358,7 @@ i915_gem_object_flush_write_domain(struct drm_i915_gem_object *obj, switch (obj->write_domain) { case I915_GEM_DOMAIN_GTT: i915_gem_flush_ggtt_writes(dev_priv); - - intel_fb_obj_flush(obj, - fb_write_origin(obj, I915_GEM_DOMAIN_GTT)); + intel_frontbuffer_flush(obj->frontbuffer, ORIGIN_CPU); for_each_ggtt_vma(vma, obj) { if (vma->iomap) diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object.h b/drivers/gpu/drm/i915/gem/i915_gem_object.h index 454bfb498001..67d70d144bd9 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_object.h +++ b/drivers/gpu/drm/i915/gem/i915_gem_object.h @@ -161,7 +161,7 @@ i915_gem_object_needs_async_cancel(const struct drm_i915_gem_object *obj) static inline bool i915_gem_object_is_framebuffer(const struct drm_i915_gem_object *obj) { - return READ_ONCE(obj->framebuffer_references); + return READ_ONCE(obj->frontbuffer); } static inline unsigned int diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object_types.h b/drivers/gpu/drm/i915/gem/i915_gem_object_types.h index c299fed2c6b1..21bfb7bd0f57 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_object_types.h +++ b/drivers/gpu/drm/i915/gem/i915_gem_object_types.h @@ -15,6 +15,7 @@ #include "i915_selftest.h" struct drm_i915_gem_object; +struct intel_fronbuffer; /* * struct i915_lut_handle tracks the fast lookups from handle to vma used @@ -144,9 +145,7 @@ struct drm_i915_gem_object { */ u16 write_domain; - atomic_t frontbuffer_bits; - unsigned int frontbuffer_ggtt_origin; /* write once */ - struct i915_active_request frontbuffer_write; + struct intel_frontbuffer *frontbuffer; /** Current tiling stride for the object, if it's tiled. */ unsigned int tiling_and_stride; @@ -239,9 +238,6 @@ struct drm_i915_gem_object { */ struct reservation_object *resv; - /** References from framebuffers, locks out tiling changes. */ - unsigned int framebuffer_references; - /** Record of address bit 17 of each page at last unbind. 
*/ unsigned long *bit_17; diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c index f6de8f99e7bb..a2462d6ef565 100644 --- a/drivers/gpu/drm/i915/i915_debugfs.c +++ b/drivers/gpu/drm/i915/i915_debugfs.c @@ -135,7 +135,6 @@ describe_obj(struct seq_file *m, struct drm_i915_gem_object *obj) struct drm_i915_private *dev_priv = to_i915(obj->base.dev); struct intel_engine_cs *engine; struct i915_vma *vma; - unsigned int frontbuffer_bits; int pin_count = 0; seq_printf(m, "%pK: %c%c%c%c %8zdKiB %02x %02x %s%s%s", @@ -225,10 +224,6 @@ describe_obj(struct seq_file *m, struct drm_i915_gem_object *obj) engine = i915_gem_object_last_write_engine(obj); if (engine) seq_printf(m, " (%s)", engine->name); - - frontbuffer_bits = atomic_read(&obj->frontbuffer_bits); - if (frontbuffer_bits) - seq_printf(m, " (frontbuffer: 0x%03x)", frontbuffer_bits); } struct file_stats { diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h index 2beadccbd913..8944b65af138 100644 --- a/drivers/gpu/drm/i915/i915_drv.h +++ b/drivers/gpu/drm/i915/i915_drv.h @@ -2588,10 +2588,6 @@ int i915_gem_mmap_gtt(struct drm_file *file_priv, struct drm_device *dev, u32 handle, u64 *offset); int i915_gem_mmap_gtt_version(void); -void i915_gem_track_fb(struct drm_i915_gem_object *old, - struct drm_i915_gem_object *new, - unsigned frontbuffer_bits); - int __must_check i915_gem_set_global_seqno(struct drm_device *dev, u32 seqno); static inline bool __i915_wedged(struct i915_gpu_error *error) diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c index 8303a702d9fe..e097f7fcce6f 100644 --- a/drivers/gpu/drm/i915/i915_gem.c +++ b/drivers/gpu/drm/i915/i915_gem.c @@ -133,17 +133,19 @@ i915_gem_phys_pwrite(struct drm_i915_gem_object *obj, void *vaddr = obj->phys_handle->vaddr + args->offset; char __user *user_data = u64_to_user_ptr(args->data_ptr); - /* We manually control the domain here and pretend that it + /* + * We manually control the domain here and pretend that it * remains coherent i.e. in the GTT domain, like shmem_pwrite. 
*/ - intel_fb_obj_invalidate(obj, ORIGIN_CPU); + intel_frontbuffer_invalidate(obj->frontbuffer, ORIGIN_CPU); + if (copy_from_user(vaddr, user_data, args->size)) return -EFAULT; drm_clflush_virt_range(vaddr, args->size); i915_gem_chipset_flush(to_i915(obj->base.dev)); - intel_fb_obj_flush(obj, ORIGIN_CPU); + intel_frontbuffer_flush(obj->frontbuffer, ORIGIN_CPU); return 0; } @@ -629,7 +631,7 @@ i915_gem_gtt_pwrite_fast(struct drm_i915_gem_object *obj, goto out_unpin; } - intel_fb_obj_invalidate(obj, ORIGIN_CPU); + intel_frontbuffer_invalidate(obj->frontbuffer, ORIGIN_CPU); user_data = u64_to_user_ptr(args->data_ptr); offset = args->offset; @@ -670,7 +672,7 @@ i915_gem_gtt_pwrite_fast(struct drm_i915_gem_object *obj, user_data += page_length; offset += page_length; } - intel_fb_obj_flush(obj, ORIGIN_CPU); + intel_frontbuffer_flush(obj->frontbuffer, ORIGIN_CPU); i915_gem_object_unlock_fence(obj, fence); out_unpin: @@ -763,7 +765,7 @@ i915_gem_shmem_pwrite(struct drm_i915_gem_object *obj, offset = 0; } - intel_fb_obj_flush(obj, ORIGIN_CPU); + intel_frontbuffer_flush(obj->frontbuffer, ORIGIN_CPU); i915_gem_object_unlock_fence(obj, fence); return ret; @@ -1869,39 +1871,6 @@ int i915_gem_open(struct drm_i915_private *i915, struct drm_file *file) return ret; } -/** - * i915_gem_track_fb - update frontbuffer tracking - * @old: current GEM buffer for the frontbuffer slots - * @new: new GEM buffer for the frontbuffer slots - * @frontbuffer_bits: bitmask of frontbuffer slots - * - * This updates the frontbuffer tracking bits @frontbuffer_bits by clearing them - * from @old and setting them in @new. Both @old and @new can be NULL. - */ -void i915_gem_track_fb(struct drm_i915_gem_object *old, - struct drm_i915_gem_object *new, - unsigned frontbuffer_bits) -{ - /* Control of individual bits within the mask are guarded by - * the owning plane->mutex, i.e. we can never see concurrent - * manipulation of individual bits. But since the bitfield as a whole - * is updated using RMW, we need to use atomics in order to update - * the bits. 
- */ - BUILD_BUG_ON(INTEL_FRONTBUFFER_BITS_PER_PIPE * I915_MAX_PIPES > - BITS_PER_TYPE(atomic_t)); - - if (old) { - WARN_ON(!(atomic_read(&old->frontbuffer_bits) & frontbuffer_bits)); - atomic_andnot(frontbuffer_bits, &old->frontbuffer_bits); - } - - if (new) { - WARN_ON(atomic_read(&new->frontbuffer_bits) & frontbuffer_bits); - atomic_or(frontbuffer_bits, &new->frontbuffer_bits); - } -} - #if IS_ENABLED(CONFIG_DRM_I915_SELFTEST) #include "selftests/mock_gem_device.c" #include "selftests/i915_gem.c" diff --git a/drivers/gpu/drm/i915/i915_vma.c b/drivers/gpu/drm/i915/i915_vma.c index 54694ca871da..5bc58d0baf73 100644 --- a/drivers/gpu/drm/i915/i915_vma.c +++ b/drivers/gpu/drm/i915/i915_vma.c @@ -917,8 +917,10 @@ int i915_vma_move_to_active(struct i915_vma *vma, if (flags & EXEC_OBJECT_WRITE) { obj->write_domain = I915_GEM_DOMAIN_RENDER; - if (intel_fb_obj_invalidate(obj, ORIGIN_CS)) - __i915_active_request_set(&obj->frontbuffer_write, rq); + if (intel_frontbuffer_invalidate(obj->frontbuffer, ORIGIN_CS)) + i915_active_ref(&obj->frontbuffer->write, + rq->fence.context, + rq); obj->read_domains = 0; } diff --git a/drivers/gpu/drm/i915/intel_display.c b/drivers/gpu/drm/i915/intel_display.c index 62fa573f90e8..3a3298812ad8 100644 --- a/drivers/gpu/drm/i915/intel_display.c +++ b/drivers/gpu/drm/i915/intel_display.c @@ -3047,12 +3047,13 @@ intel_alloc_initial_plane_obj(struct intel_crtc *crtc, { struct drm_device *dev = crtc->base.dev; struct drm_i915_private *dev_priv = to_i915(dev); - struct drm_i915_gem_object *obj = NULL; struct drm_mode_fb_cmd2 mode_cmd = { 0 }; struct drm_framebuffer *fb = &plane_config->fb->base; u32 base_aligned = round_down(plane_config->base, PAGE_SIZE); u32 size_aligned = round_up(plane_config->base + plane_config->size, PAGE_SIZE); + struct drm_i915_gem_object *obj; + bool ret = false; size_aligned -= base_aligned; @@ -3094,7 +3095,7 @@ intel_alloc_initial_plane_obj(struct intel_crtc *crtc, break; default: MISSING_CASE(plane_config->tiling); - return false; + goto out; } mode_cmd.pixel_format = fb->format->format; @@ -3106,16 +3107,15 @@ intel_alloc_initial_plane_obj(struct intel_crtc *crtc, if (intel_framebuffer_init(to_intel_framebuffer(fb), obj, &mode_cmd)) { DRM_DEBUG_KMS("intel fb init failed\n"); - goto out_unref_obj; + goto out; } DRM_DEBUG_KMS("initial plane fb obj %p\n", obj); - return true; - -out_unref_obj: + ret = true; +out: i915_gem_object_put(obj); - return false; + return ret; } static void @@ -3172,6 +3172,12 @@ static void intel_plane_disable_noatomic(struct intel_crtc *crtc, intel_disable_plane(plane, crtc_state); } +static struct intel_frontbuffer * +to_intel_frontbuffer(struct drm_framebuffer *fb) +{ + return fb ? 
to_intel_framebuffer(fb)->frontbuffer : NULL; +} + static void intel_find_initial_plane_obj(struct intel_crtc *intel_crtc, struct intel_initial_plane_config *plane_config) @@ -3179,7 +3185,6 @@ intel_find_initial_plane_obj(struct intel_crtc *intel_crtc, struct drm_device *dev = intel_crtc->base.dev; struct drm_i915_private *dev_priv = to_i915(dev); struct drm_crtc *c; - struct drm_i915_gem_object *obj; struct drm_plane *primary = intel_crtc->base.primary; struct drm_plane_state *plane_state = primary->state; struct intel_plane *intel_plane = to_intel_plane(primary); @@ -3255,8 +3260,7 @@ intel_find_initial_plane_obj(struct intel_crtc *intel_crtc, return; } - obj = intel_fb_obj(fb); - intel_fb_obj_flush(obj, ORIGIN_DIRTYFB); + intel_frontbuffer_flush(to_intel_frontbuffer(fb), ORIGIN_DIRTYFB); plane_state->src_x = 0; plane_state->src_y = 0; @@ -3271,14 +3275,14 @@ intel_find_initial_plane_obj(struct intel_crtc *intel_crtc, intel_state->base.src = drm_plane_state_src(plane_state); intel_state->base.dst = drm_plane_state_dest(plane_state); - if (i915_gem_object_is_tiled(obj)) + if (plane_config->tiling) dev_priv->preserve_bios_swizzle = true; plane_state->fb = fb; plane_state->crtc = &intel_crtc->base; atomic_or(to_intel_plane(primary)->frontbuffer_bit, - &obj->frontbuffer_bits); + &to_intel_frontbuffer(fb)->bits); } static int skl_max_plane_width(const struct drm_framebuffer *fb, @@ -13908,9 +13912,9 @@ static void intel_atomic_track_fbs(struct drm_atomic_state *state) int i; for_each_oldnew_plane_in_state(state, plane, old_plane_state, new_plane_state, i) - i915_gem_track_fb(intel_fb_obj(old_plane_state->fb), - intel_fb_obj(new_plane_state->fb), - to_intel_plane(plane)->frontbuffer_bit); + intel_frontbuffer_track(to_intel_frontbuffer(old_plane_state->fb), + to_intel_frontbuffer(new_plane_state->fb), + to_intel_plane(plane)->frontbuffer_bit); } /** @@ -14220,7 +14224,7 @@ intel_prepare_plane_fb(struct drm_plane *plane, return ret; fb_obj_bump_render_priority(obj); - intel_fb_obj_flush(obj, ORIGIN_DIRTYFB); + intel_frontbuffer_flush(obj->frontbuffer, ORIGIN_DIRTYFB); if (!new_state->fence) { /* implicit fencing */ struct dma_fence *fence; @@ -14487,13 +14491,12 @@ intel_legacy_cursor_update(struct drm_plane *plane, struct drm_modeset_acquire_ctx *ctx) { struct drm_i915_private *dev_priv = to_i915(crtc->dev); - int ret; struct drm_plane_state *old_plane_state, *new_plane_state; struct intel_plane *intel_plane = to_intel_plane(plane); - struct drm_framebuffer *old_fb; struct intel_crtc_state *crtc_state = to_intel_crtc_state(crtc->state); struct intel_crtc_state *new_crtc_state; + int ret; /* * When crtc is inactive or there is a modeset pending, @@ -14561,11 +14564,10 @@ intel_legacy_cursor_update(struct drm_plane *plane, if (ret) goto out_unlock; - intel_fb_obj_flush(intel_fb_obj(fb), ORIGIN_FLIP); - - old_fb = old_plane_state->fb; - i915_gem_track_fb(intel_fb_obj(old_fb), intel_fb_obj(fb), - intel_plane->frontbuffer_bit); + intel_frontbuffer_flush(to_intel_frontbuffer(fb), ORIGIN_FLIP); + intel_frontbuffer_track(to_intel_frontbuffer(old_plane_state->fb), + to_intel_frontbuffer(fb), + intel_plane->frontbuffer_bit); /* Swap plane state */ plane->state = new_plane_state; @@ -15255,15 +15257,9 @@ static void intel_setup_outputs(struct drm_i915_private *dev_priv) static void intel_user_framebuffer_destroy(struct drm_framebuffer *fb) { struct intel_framebuffer *intel_fb = to_intel_framebuffer(fb); - struct drm_i915_gem_object *obj = intel_fb_obj(fb); drm_framebuffer_cleanup(fb); - - 
i915_gem_object_lock(obj); - WARN_ON(!obj->framebuffer_references--); - i915_gem_object_unlock(obj); - - i915_gem_object_put(obj); + intel_frontbuffer_put(intel_fb->frontbuffer); kfree(intel_fb); } @@ -15291,7 +15287,7 @@ static int intel_user_framebuffer_dirty(struct drm_framebuffer *fb, struct drm_i915_gem_object *obj = intel_fb_obj(fb); i915_gem_object_flush_if_display(obj); - intel_fb_obj_flush(obj, ORIGIN_DIRTYFB); + intel_frontbuffer_flush(to_intel_frontbuffer(fb), ORIGIN_DIRTYFB); return 0; } @@ -15313,8 +15309,11 @@ static int intel_framebuffer_init(struct intel_framebuffer *intel_fb, int ret = -EINVAL; int i; + intel_fb->frontbuffer = intel_frontbuffer_get(obj); + if (!intel_fb->frontbuffer) + return -ENOMEM; + i915_gem_object_lock(obj); - obj->framebuffer_references++; tiling = i915_gem_object_get_tiling(obj); stride = i915_gem_object_get_stride(obj); i915_gem_object_unlock(obj); @@ -15431,9 +15430,7 @@ static int intel_framebuffer_init(struct intel_framebuffer *intel_fb, return 0; err: - i915_gem_object_lock(obj); - obj->framebuffer_references--; - i915_gem_object_unlock(obj); + intel_frontbuffer_put(intel_fb->frontbuffer); return ret; } diff --git a/drivers/gpu/drm/i915/intel_drv.h b/drivers/gpu/drm/i915/intel_drv.h index d0aeb383024a..f8ed9d5ab479 100644 --- a/drivers/gpu/drm/i915/intel_drv.h +++ b/drivers/gpu/drm/i915/intel_drv.h @@ -69,6 +69,7 @@ enum intel_output_type { struct intel_framebuffer { struct drm_framebuffer base; + struct intel_frontbuffer *frontbuffer; struct intel_rotation_info rot_info; /* for each plane in the normal GTT view */ diff --git a/drivers/gpu/drm/i915/intel_fbdev.c b/drivers/gpu/drm/i915/intel_fbdev.c index 0d3a6fa674e6..c7c11d1842af 100644 --- a/drivers/gpu/drm/i915/intel_fbdev.c +++ b/drivers/gpu/drm/i915/intel_fbdev.c @@ -47,13 +47,14 @@ #include "intel_fbdev.h" #include "intel_frontbuffer.h" -static void intel_fbdev_invalidate(struct intel_fbdev *ifbdev) +static struct intel_frontbuffer *to_frontbuffer(struct intel_fbdev *ifbdev) { - struct drm_i915_gem_object *obj = intel_fb_obj(&ifbdev->fb->base); - unsigned int origin = - ifbdev->vma_flags & PLANE_HAS_FENCE ? ORIGIN_GTT : ORIGIN_CPU; + return ifbdev->fb->frontbuffer; +} - intel_fb_obj_invalidate(obj, origin); +static void intel_fbdev_invalidate(struct intel_fbdev *ifbdev) +{ + intel_frontbuffer_invalidate(to_frontbuffer(ifbdev), ORIGIN_CPU); } static int intel_fbdev_set_par(struct fb_info *info) @@ -180,7 +181,6 @@ static int intelfb_create(struct drm_fb_helper *helper, const struct i915_ggtt_view view = { .type = I915_GGTT_VIEW_NORMAL, }; - struct drm_framebuffer *fb; intel_wakeref_t wakeref; struct fb_info *info; struct i915_vma *vma; @@ -226,8 +226,7 @@ static int intelfb_create(struct drm_fb_helper *helper, goto out_unlock; } - fb = &ifbdev->fb->base; - intel_fb_obj_flush(intel_fb_obj(fb), ORIGIN_DIRTYFB); + intel_frontbuffer_flush(to_frontbuffer(ifbdev), ORIGIN_DIRTYFB); info = drm_fb_helper_alloc_fbi(helper); if (IS_ERR(info)) { @@ -236,7 +235,7 @@ static int intelfb_create(struct drm_fb_helper *helper, goto out_unpin; } - ifbdev->helper.fb = fb; + ifbdev->helper.fb = &ifbdev->fb->base; info->fbops = &intelfb_ops; @@ -262,13 +261,14 @@ static int intelfb_create(struct drm_fb_helper *helper, * If the object is stolen however, it will be full of whatever * garbage was left in there. 
*/ - if (intel_fb_obj(fb)->stolen && !prealloc) + if (vma->obj->stolen && !prealloc) memset_io(info->screen_base, 0, info->screen_size); /* Use default scratch pixmap (info->pixmap.flags = FB_PIXMAP_SYSTEM) */ DRM_DEBUG_KMS("allocated %dx%d fb: 0x%08x\n", - fb->width, fb->height, i915_ggtt_offset(vma)); + ifbdev->fb->base.width, ifbdev->fb->base.height, + i915_ggtt_offset(vma)); ifbdev->vma = vma; ifbdev->vma_flags = flags; diff --git a/drivers/gpu/drm/i915/intel_frontbuffer.c b/drivers/gpu/drm/i915/intel_frontbuffer.c index aa34e33b6087..4fcec413f405 100644 --- a/drivers/gpu/drm/i915/intel_frontbuffer.c +++ b/drivers/gpu/drm/i915/intel_frontbuffer.c @@ -68,28 +68,9 @@ #include "intel_frontbuffer.h" #include "intel_psr.h" -void __intel_fb_obj_invalidate(struct drm_i915_gem_object *obj, - enum fb_op_origin origin, - unsigned int frontbuffer_bits) -{ - struct drm_i915_private *dev_priv = to_i915(obj->base.dev); - - if (origin == ORIGIN_CS) { - spin_lock(&dev_priv->fb_tracking.lock); - dev_priv->fb_tracking.busy_bits |= frontbuffer_bits; - dev_priv->fb_tracking.flip_bits &= ~frontbuffer_bits; - spin_unlock(&dev_priv->fb_tracking.lock); - } - - might_sleep(); - intel_psr_invalidate(dev_priv, frontbuffer_bits, origin); - intel_edp_drrs_invalidate(dev_priv, frontbuffer_bits); - intel_fbc_invalidate(dev_priv, frontbuffer_bits, origin); -} - /** - * intel_frontbuffer_flush - flush frontbuffer - * @dev_priv: i915 device + * frontbuffer_flush - flush frontbuffer + * @i915: i915 device * @frontbuffer_bits: frontbuffer plane tracking bits * @origin: which operation caused the flush * @@ -99,45 +80,27 @@ void __intel_fb_obj_invalidate(struct drm_i915_gem_object *obj, * * Can be called without any locks held. */ -static void intel_frontbuffer_flush(struct drm_i915_private *dev_priv, - unsigned frontbuffer_bits, - enum fb_op_origin origin) +static void frontbuffer_flush(struct drm_i915_private *i915, + unsigned int frontbuffer_bits, + enum fb_op_origin origin) { /* Delay flushing when rings are still busy.*/ - spin_lock(&dev_priv->fb_tracking.lock); - frontbuffer_bits &= ~dev_priv->fb_tracking.busy_bits; - spin_unlock(&dev_priv->fb_tracking.lock); + spin_lock(&i915->fb_tracking.lock); + frontbuffer_bits &= ~i915->fb_tracking.busy_bits; + spin_unlock(&i915->fb_tracking.lock); if (!frontbuffer_bits) return; might_sleep(); - intel_edp_drrs_flush(dev_priv, frontbuffer_bits); - intel_psr_flush(dev_priv, frontbuffer_bits, origin); - intel_fbc_flush(dev_priv, frontbuffer_bits, origin); -} - -void __intel_fb_obj_flush(struct drm_i915_gem_object *obj, - enum fb_op_origin origin, - unsigned int frontbuffer_bits) -{ - struct drm_i915_private *dev_priv = to_i915(obj->base.dev); - - if (origin == ORIGIN_CS) { - spin_lock(&dev_priv->fb_tracking.lock); - /* Filter out new bits since rendering started. */ - frontbuffer_bits &= dev_priv->fb_tracking.busy_bits; - dev_priv->fb_tracking.busy_bits &= ~frontbuffer_bits; - spin_unlock(&dev_priv->fb_tracking.lock); - } - - if (frontbuffer_bits) - intel_frontbuffer_flush(dev_priv, frontbuffer_bits, origin); + intel_edp_drrs_flush(i915, frontbuffer_bits); + intel_psr_flush(i915, frontbuffer_bits, origin); + intel_fbc_flush(i915, frontbuffer_bits, origin); } /** * intel_frontbuffer_flip_prepare - prepare asynchronous frontbuffer flip - * @dev_priv: i915 device + * @i915: i915 device * @frontbuffer_bits: frontbuffer plane tracking bits * * This function gets called after scheduling a flip on @obj. 
The actual @@ -147,19 +110,19 @@ void __intel_fb_obj_flush(struct drm_i915_gem_object *obj, * * Can be called without any locks held. */ -void intel_frontbuffer_flip_prepare(struct drm_i915_private *dev_priv, +void intel_frontbuffer_flip_prepare(struct drm_i915_private *i915, unsigned frontbuffer_bits) { - spin_lock(&dev_priv->fb_tracking.lock); - dev_priv->fb_tracking.flip_bits |= frontbuffer_bits; + spin_lock(&i915->fb_tracking.lock); + i915->fb_tracking.flip_bits |= frontbuffer_bits; /* Remove stale busy bits due to the old buffer. */ - dev_priv->fb_tracking.busy_bits &= ~frontbuffer_bits; - spin_unlock(&dev_priv->fb_tracking.lock); + i915->fb_tracking.busy_bits &= ~frontbuffer_bits; + spin_unlock(&i915->fb_tracking.lock); } /** * intel_frontbuffer_flip_complete - complete asynchronous frontbuffer flip - * @dev_priv: i915 device + * @i915: i915 device * @frontbuffer_bits: frontbuffer plane tracking bits * * This function gets called after the flip has been latched and will complete @@ -167,23 +130,22 @@ void intel_frontbuffer_flip_prepare(struct drm_i915_private *dev_priv, * * Can be called without any locks held. */ -void intel_frontbuffer_flip_complete(struct drm_i915_private *dev_priv, +void intel_frontbuffer_flip_complete(struct drm_i915_private *i915, unsigned frontbuffer_bits) { - spin_lock(&dev_priv->fb_tracking.lock); + spin_lock(&i915->fb_tracking.lock); /* Mask any cancelled flips. */ - frontbuffer_bits &= dev_priv->fb_tracking.flip_bits; - dev_priv->fb_tracking.flip_bits &= ~frontbuffer_bits; - spin_unlock(&dev_priv->fb_tracking.lock); + frontbuffer_bits &= i915->fb_tracking.flip_bits; + i915->fb_tracking.flip_bits &= ~frontbuffer_bits; + spin_unlock(&i915->fb_tracking.lock); if (frontbuffer_bits) - intel_frontbuffer_flush(dev_priv, - frontbuffer_bits, ORIGIN_FLIP); + frontbuffer_flush(i915, frontbuffer_bits, ORIGIN_FLIP); } /** * intel_frontbuffer_flip - synchronous frontbuffer flip - * @dev_priv: i915 device + * @i915: i915 device * @frontbuffer_bits: frontbuffer plane tracking bits * * This function gets called after scheduling a flip on @obj. This is for @@ -192,13 +154,159 @@ void intel_frontbuffer_flip_complete(struct drm_i915_private *dev_priv, * * Can be called without any locks held. */ -void intel_frontbuffer_flip(struct drm_i915_private *dev_priv, +void intel_frontbuffer_flip(struct drm_i915_private *i915, unsigned frontbuffer_bits) { - spin_lock(&dev_priv->fb_tracking.lock); + spin_lock(&i915->fb_tracking.lock); /* Remove stale busy bits due to the old buffer. 
*/ - dev_priv->fb_tracking.busy_bits &= ~frontbuffer_bits; - spin_unlock(&dev_priv->fb_tracking.lock); + i915->fb_tracking.busy_bits &= ~frontbuffer_bits; + spin_unlock(&i915->fb_tracking.lock); + + frontbuffer_flush(i915, frontbuffer_bits, ORIGIN_FLIP); +} + +void __intel_fb_invalidate(struct intel_frontbuffer *fb, + enum fb_op_origin origin, + unsigned int frontbuffer_bits) +{ + struct drm_i915_private *i915 = to_i915(fb->obj->base.dev); + + if (origin == ORIGIN_CS) { + spin_lock(&i915->fb_tracking.lock); + i915->fb_tracking.busy_bits |= frontbuffer_bits; + i915->fb_tracking.flip_bits &= ~frontbuffer_bits; + spin_unlock(&i915->fb_tracking.lock); + } + + might_sleep(); + intel_psr_invalidate(i915, frontbuffer_bits, origin); + intel_edp_drrs_invalidate(i915, frontbuffer_bits); + intel_fbc_invalidate(i915, frontbuffer_bits, origin); +} + +void __intel_fb_flush(struct intel_frontbuffer *fb, + enum fb_op_origin origin, + unsigned int frontbuffer_bits) +{ + struct drm_i915_private *i915 = to_i915(fb->obj->base.dev); + + if (origin == ORIGIN_CS) { + spin_lock(&i915->fb_tracking.lock); + /* Filter out new bits since rendering started. */ + frontbuffer_bits &= i915->fb_tracking.busy_bits; + i915->fb_tracking.busy_bits &= ~frontbuffer_bits; + spin_unlock(&i915->fb_tracking.lock); + } + + if (frontbuffer_bits) + frontbuffer_flush(i915, frontbuffer_bits, origin); +} + +static int frontbuffer_active(struct i915_active *ref) +{ + struct intel_frontbuffer *fb = + container_of(ref, typeof(*fb), write); + + kref_get(&fb->ref); + return 0; +} + +static void frontbuffer_retire(struct i915_active *ref) +{ + struct intel_frontbuffer *fb = + container_of(ref, typeof(*fb), write); + + intel_frontbuffer_flush(fb, ORIGIN_CS); + intel_frontbuffer_put(fb); +} + +static void frontbuffer_release(struct kref *ref) +{ + struct intel_frontbuffer *fb = + container_of(ref, typeof(*fb), ref); + + fb->obj->frontbuffer = NULL; + spin_unlock(&to_i915(fb->obj->base.dev)->fb_tracking.lock); + + i915_gem_object_put(fb->obj); + kfree(fb); +} + +struct intel_frontbuffer * +intel_frontbuffer_get(struct drm_i915_gem_object *obj) +{ + struct drm_i915_private *i915 = to_i915(obj->base.dev); + struct intel_frontbuffer *fb; + + spin_lock(&i915->fb_tracking.lock); + fb = obj->frontbuffer; + if (fb) + kref_get(&fb->ref); + spin_unlock(&i915->fb_tracking.lock); + if (fb) + return fb; + + fb = kmalloc(sizeof(*fb), GFP_KERNEL); + if (!fb) + return NULL; - intel_frontbuffer_flush(dev_priv, frontbuffer_bits, ORIGIN_FLIP); + fb->obj = obj; + kref_init(&fb->ref); + atomic_set(&fb->bits, 0); + i915_active_init(i915, &fb->write, + frontbuffer_active, frontbuffer_retire); + + spin_lock(&i915->fb_tracking.lock); + if (obj->frontbuffer) { + kfree(fb); + fb = obj->frontbuffer; + kref_get(&fb->ref); + } else { + i915_gem_object_get(obj); + obj->frontbuffer = fb; + } + spin_unlock(&i915->fb_tracking.lock); + + return fb; +} + +void intel_frontbuffer_put(struct intel_frontbuffer *fb) +{ + struct drm_i915_private *i915 = to_i915(fb->obj->base.dev); + + kref_put_lock(&fb->ref, frontbuffer_release, &i915->fb_tracking.lock); +} + +/** + * intel_frontbuffer_track - update frontbuffer tracking + * @old: current GEM buffer for the frontbuffer slots + * @new: new GEM buffer for the frontbuffer slots + * @frontbuffer_bits: bitmask of frontbuffer slots + * + * This updates the frontbuffer tracking bits @frontbuffer_bits by clearing them + * from @old and setting them in @new. Both @old and @new can be NULL. 
+ */ +void intel_frontbuffer_track(struct intel_frontbuffer *old, + struct intel_frontbuffer *new, + unsigned int frontbuffer_bits) +{ + /* + * Control of individual bits within the mask are guarded by + * the owning plane->mutex, i.e. we can never see concurrent + * manipulation of individual bits. But since the bitfield as a whole + * is updated using RMW, we need to use atomics in order to update + * the bits. + */ + BUILD_BUG_ON(INTEL_FRONTBUFFER_BITS_PER_PIPE * I915_MAX_PIPES > + BITS_PER_TYPE(atomic_t)); + + if (old) { + WARN_ON(!(atomic_read(&old->bits) & frontbuffer_bits)); + atomic_andnot(frontbuffer_bits, &old->bits); + } + + if (new) { + WARN_ON(atomic_read(&new->bits) & frontbuffer_bits); + atomic_or(frontbuffer_bits, &new->bits); + } } diff --git a/drivers/gpu/drm/i915/intel_frontbuffer.h b/drivers/gpu/drm/i915/intel_frontbuffer.h index 5727320c8084..6d869c2c1f83 100644 --- a/drivers/gpu/drm/i915/intel_frontbuffer.h +++ b/drivers/gpu/drm/i915/intel_frontbuffer.h @@ -24,7 +24,10 @@ #ifndef __INTEL_FRONTBUFFER_H__ #define __INTEL_FRONTBUFFER_H__ -#include "gem/i915_gem_object.h" +#include +#include + +#include "i915_active.h" struct drm_i915_private; struct drm_i915_gem_object; @@ -37,23 +40,30 @@ enum fb_op_origin { ORIGIN_DIRTYFB, }; -void intel_frontbuffer_flip_prepare(struct drm_i915_private *dev_priv, +struct intel_frontbuffer { + struct kref ref; + atomic_t bits; + struct i915_active write; + struct drm_i915_gem_object *obj; +}; + +void intel_frontbuffer_flip_prepare(struct drm_i915_private *i915, unsigned frontbuffer_bits); -void intel_frontbuffer_flip_complete(struct drm_i915_private *dev_priv, +void intel_frontbuffer_flip_complete(struct drm_i915_private *i915, unsigned frontbuffer_bits); -void intel_frontbuffer_flip(struct drm_i915_private *dev_priv, +void intel_frontbuffer_flip(struct drm_i915_private *i915, unsigned frontbuffer_bits); -void __intel_fb_obj_invalidate(struct drm_i915_gem_object *obj, - enum fb_op_origin origin, - unsigned int frontbuffer_bits); -void __intel_fb_obj_flush(struct drm_i915_gem_object *obj, - enum fb_op_origin origin, - unsigned int frontbuffer_bits); +struct intel_frontbuffer * +intel_frontbuffer_get(struct drm_i915_gem_object *obj); + +void __intel_fb_invalidate(struct intel_frontbuffer *fb, + enum fb_op_origin origin, + unsigned int frontbuffer_bits); /** - * intel_fb_obj_invalidate - invalidate frontbuffer object - * @obj: GEM object to invalidate + * intel_frontbuffer_invalidate - invalidate frontbuffer object + * @fb: GEM object to invalidate * @origin: which operation caused the invalidation * * This function gets called every time rendering on the given object starts and @@ -62,37 +72,53 @@ void __intel_fb_obj_flush(struct drm_i915_gem_object *obj, * until the rendering completes or a flip on this frontbuffer plane is * scheduled. 
*/ -static inline bool intel_fb_obj_invalidate(struct drm_i915_gem_object *obj, - enum fb_op_origin origin) +static inline bool intel_frontbuffer_invalidate(struct intel_frontbuffer *fb, + enum fb_op_origin origin) { unsigned int frontbuffer_bits; - frontbuffer_bits = atomic_read(&obj->frontbuffer_bits); + if (!fb) + return false; + + frontbuffer_bits = atomic_read(&fb->bits); if (!frontbuffer_bits) return false; - __intel_fb_obj_invalidate(obj, origin, frontbuffer_bits); + __intel_fb_invalidate(fb, origin, frontbuffer_bits); return true; } +void __intel_fb_flush(struct intel_frontbuffer *fb, + enum fb_op_origin origin, + unsigned int frontbuffer_bits); + /** - * intel_fb_obj_flush - flush frontbuffer object - * @obj: GEM object to flush + * intel_frontbuffer_flush - flush frontbuffer object + * @fb: GEM object to flush * @origin: which operation caused the flush * * This function gets called every time rendering on the given object has * completed and frontbuffer caching can be started again. */ -static inline void intel_fb_obj_flush(struct drm_i915_gem_object *obj, - enum fb_op_origin origin) +static inline void intel_frontbuffer_flush(struct intel_frontbuffer *fb, + enum fb_op_origin origin) { unsigned int frontbuffer_bits; - frontbuffer_bits = atomic_read(&obj->frontbuffer_bits); + if (!fb) + return; + + frontbuffer_bits = atomic_read(&fb->bits); if (!frontbuffer_bits) return; - __intel_fb_obj_flush(obj, origin, frontbuffer_bits); + __intel_fb_flush(fb, origin, frontbuffer_bits); } +void intel_frontbuffer_track(struct intel_frontbuffer *old, + struct intel_frontbuffer *new, + unsigned int frontbuffer_bits); + +void intel_frontbuffer_put(struct intel_frontbuffer *fb); + #endif /* __INTEL_FRONTBUFFER_H__ */ diff --git a/drivers/gpu/drm/i915/intel_overlay.c b/drivers/gpu/drm/i915/intel_overlay.c index 55da4802426d..38495df60f7b 100644 --- a/drivers/gpu/drm/i915/intel_overlay.c +++ b/drivers/gpu/drm/i915/intel_overlay.c @@ -281,9 +281,9 @@ static void intel_overlay_flip_prepare(struct intel_overlay *overlay, WARN_ON(overlay->old_vma); - i915_gem_track_fb(overlay->vma ? overlay->vma->obj : NULL, - vma ? vma->obj : NULL, - INTEL_FRONTBUFFER_OVERLAY(pipe)); + intel_frontbuffer_track(overlay->vma ? overlay->vma->obj->frontbuffer : NULL, + vma ? 
vma->obj->frontbuffer : NULL, + INTEL_FRONTBUFFER_OVERLAY(pipe)); intel_frontbuffer_flip_prepare(overlay->i915, INTEL_FRONTBUFFER_OVERLAY(pipe)); @@ -768,7 +768,7 @@ static int intel_overlay_do_put_image(struct intel_overlay *overlay, ret = PTR_ERR(vma); goto out_pin_section; } - intel_fb_obj_flush(new_bo, ORIGIN_DIRTYFB); + intel_frontbuffer_flush(new_bo->frontbuffer, ORIGIN_DIRTYFB); ret = i915_vma_put_fence(vma); if (ret)

From patchwork Mon Jun 10 07:21:19 2019
From: Chris Wilson
To: intel-gfx@lists.freedesktop.org
Date: Mon, 10 Jun 2019 08:21:19 +0100
Message-Id: <20190610072126.6355-22-chris@chris-wilson.co.uk>
In-Reply-To: <20190610072126.6355-1-chris@chris-wilson.co.uk>
References: <20190610072126.6355-1-chris@chris-wilson.co.uk>
Subject: [Intel-gfx] [PATCH 21/28] drm/i915: Coordinate i915_active with its own mutex

Forgo the struct_mutex serialisation for i915_active, and interpose its own mutex handling for active/retire. This is a multi-layered sleight-of-hand. First, we had to ensure that no active/retire callbacks accidentally inverted the mutex ordering rules, nor assumed that they were themselves serialised by struct_mutex. More challenging though, is the rule over updating elements of the active rbtree.
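As a rough illustration of that callback contract (this sketch is not taken from the series itself; struct foo and the foo_* helpers are made-up names, modelled on the frontbuffer_active/frontbuffer_retire pair earlier in the series), an owner of an i915_active under the new rules looks approximately like the following. The important point is that both callbacks run from i915_active's own mutex/fence-callback context, so they can neither take struct_mutex nor assume it is held:

#include <linux/kref.h>
#include <linux/slab.h>

#include "i915_active.h"
#include "i915_request.h"

struct foo {
	struct kref ref;
	struct i915_active active;
	/* state that must stay alive while requests still use it */
};

static void foo_release(struct kref *kref)
{
	kfree(container_of(kref, struct foo, ref));
}

static int foo_active(struct i915_active *ref)
{
	struct foo *foo = container_of(ref, typeof(*foo), active);

	/* first user: hold a reference for as long as the tracker is busy */
	kref_get(&foo->ref);
	return 0;
}

static void foo_retire(struct i915_active *ref)
{
	struct foo *foo = container_of(ref, typeof(*foo), active);

	/* last tracked fence signalled: drop the reference from foo_active() */
	kref_put(&foo->ref, foo_release);
}

static void foo_init(struct foo *foo)
{
	kref_init(&foo->ref);
	/* note: no drm_i915_private argument after this patch */
	i915_active_init(&foo->active, foo_active, foo_retire);
}

static int foo_track(struct foo *foo, struct i915_request *rq)
{
	/* keep foo busy until rq's fence signals; nodes are keyed by timeline */
	return i915_active_ref(&foo->active, rq->fence.context, &rq->fence);
}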
Instead of the whole i915_active now being serialised by struct_mutex, allocations/rotations of the tree are serialised by the i915_active.mutex and individual nodes are serialised by the caller using the i915_timeline.mutex (we need to use nested spinlocks to interact with the dma_fence callback lists). The pain point here is that instead of a single mutex around execbuf, we now have to take a mutex for active tracker (one for each vma, context, etc) and a couple of spinlocks for each fence update. The improvement in fine grained locking allowing for multiple concurrent clients (eventually!) should be worth it in typical loads. Signed-off-by: Chris Wilson --- drivers/gpu/drm/i915/gem/i915_gem_context.c | 12 +- .../gpu/drm/i915/gem/i915_gem_execbuffer.c | 2 +- .../gpu/drm/i915/gem/i915_gem_object_types.h | 2 + drivers/gpu/drm/i915/gem/i915_gem_pm.c | 9 +- drivers/gpu/drm/i915/gt/intel_context.c | 2 +- drivers/gpu/drm/i915/gt/intel_reset.c | 10 +- drivers/gpu/drm/i915/gt/selftest_lrc.c | 5 +- drivers/gpu/drm/i915/gvt/scheduler.c | 3 - drivers/gpu/drm/i915/i915_active.c | 165 +++++++------- drivers/gpu/drm/i915/i915_active.h | 204 +++++------------- drivers/gpu/drm/i915/i915_active_types.h | 17 +- drivers/gpu/drm/i915/i915_gem.c | 50 +++-- drivers/gpu/drm/i915/i915_gem_gtt.c | 2 +- drivers/gpu/drm/i915/i915_request.c | 56 +---- drivers/gpu/drm/i915/i915_request.h | 1 - drivers/gpu/drm/i915/i915_timeline.c | 9 +- drivers/gpu/drm/i915/i915_timeline_types.h | 2 +- drivers/gpu/drm/i915/i915_vma.c | 36 +--- drivers/gpu/drm/i915/intel_frontbuffer.c | 3 +- drivers/gpu/drm/i915/intel_overlay.c | 6 +- drivers/gpu/drm/i915/selftests/i915_active.c | 5 +- .../gpu/drm/i915/selftests/mock_timeline.c | 2 +- 22 files changed, 229 insertions(+), 374 deletions(-) diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context.c b/drivers/gpu/drm/i915/gem/i915_gem_context.c index 837cad233cc6..2e684f8151d9 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_context.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_context.c @@ -909,20 +909,18 @@ static int context_barrier_task(struct i915_gem_context *ctx, void (*task)(void *data), void *data) { - struct drm_i915_private *i915 = ctx->i915; struct context_barrier_task *cb; struct i915_gem_engines_iter it; struct intel_context *ce; int err = 0; - lockdep_assert_held(&i915->drm.struct_mutex); GEM_BUG_ON(!task); cb = kmalloc(sizeof(*cb), GFP_KERNEL); if (!cb) return -ENOMEM; - i915_active_init(i915, &cb->base, NULL, cb_retire); + i915_active_init(&cb->base, NULL, cb_retire); err = i915_active_acquire(&cb->base); if (err) { kfree(cb); @@ -954,7 +952,9 @@ static int context_barrier_task(struct i915_gem_context *ctx, if (emit) err = emit(rq, data); if (err == 0) - err = i915_active_ref(&cb->base, rq->fence.context, rq); + err = i915_active_ref(&cb->base, + rq->fence.context, + &rq->fence); i915_request_add(rq); if (err) @@ -1188,7 +1188,7 @@ gen8_modify_rpcs(struct intel_context *ce, struct intel_sseu sseu) return PTR_ERR(rq); /* Queue this switch after all other activity by this context. */ - ret = i915_active_request_set(&ce->ring->timeline->last_request, rq); + ret = i915_active_fence_set(&ce->ring->timeline->last_request, rq); if (ret) goto out_add; @@ -1200,7 +1200,7 @@ gen8_modify_rpcs(struct intel_context *ce, struct intel_sseu sseu) * words transfer the pinned ce object to tracked active request. 
*/ GEM_BUG_ON(i915_active_is_idle(&ce->active)); - ret = i915_active_ref(&ce->active, rq->fence.context, rq); + ret = i915_active_ref(&ce->active, rq->fence.context, &rq->fence); if (ret) goto out_add; diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c index 2c4f3229361d..6193c81ebbed 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c @@ -1316,7 +1316,7 @@ relocate_entry(struct i915_vma *vma, if (!eb->reloc_cache.vaddr && (DBG_FORCE_RELOC == FORCE_GPU_RELOC || - !reservation_object_test_signaled_rcu(vma->resv, true))) { + i915_vma_is_active(vma))) { const unsigned int gen = eb->reloc_cache.gen; unsigned int len; u32 *batch; diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object_types.h b/drivers/gpu/drm/i915/gem/i915_gem_object_types.h index 21bfb7bd0f57..e87fca4d8194 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_object_types.h +++ b/drivers/gpu/drm/i915/gem/i915_gem_object_types.h @@ -11,6 +11,8 @@ #include +#include + #include "i915_active.h" #include "i915_selftest.h" diff --git a/drivers/gpu/drm/i915/gem/i915_gem_pm.c b/drivers/gpu/drm/i915/gem/i915_gem_pm.c index b0f37621de9f..283092ac13fe 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_pm.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_pm.c @@ -15,14 +15,11 @@ static void call_idle_barriers(struct intel_engine_cs *engine) struct llist_node *node, *next; llist_for_each_safe(node, next, llist_del_all(&engine->barrier_tasks)) { - struct i915_active_request *active = + struct i915_active_fence *active = container_of((struct list_head *)node, - typeof(*active), link); + typeof(*active), cb.node); - INIT_LIST_HEAD(&active->link); - RCU_INIT_POINTER(active->request, NULL); - - active->retire(active, NULL); + active->cb.func(NULL, &active->cb); } } diff --git a/drivers/gpu/drm/i915/gt/intel_context.c b/drivers/gpu/drm/i915/gt/intel_context.c index b9fea31cf9ec..a32698f7645f 100644 --- a/drivers/gpu/drm/i915/gt/intel_context.c +++ b/drivers/gpu/drm/i915/gt/intel_context.c @@ -182,7 +182,7 @@ intel_context_init(struct intel_context *ce, mutex_init(&ce->pin_mutex); - i915_active_init(ctx->i915, &ce->active, + i915_active_init(&ce->active, __intel_context_active, __intel_context_retire); } diff --git a/drivers/gpu/drm/i915/gt/intel_reset.c b/drivers/gpu/drm/i915/gt/intel_reset.c index 1f831fe759a5..c5a772889ea5 100644 --- a/drivers/gpu/drm/i915/gt/intel_reset.c +++ b/drivers/gpu/drm/i915/gt/intel_reset.c @@ -881,10 +881,10 @@ static bool __i915_gem_unset_wedged(struct drm_i915_private *i915) */ mutex_lock(&i915->gt.timelines.mutex); list_for_each_entry(tl, &i915->gt.timelines.active_list, link) { - struct i915_request *rq; + struct dma_fence *fence; - rq = i915_active_request_get_unlocked(&tl->last_request); - if (!rq) + fence = i915_active_fence_get(&tl->last_request); + if (!fence) continue; /* @@ -894,8 +894,8 @@ static bool __i915_gem_unset_wedged(struct drm_i915_private *i915) * (I915_FENCE_TIMEOUT) so this wait should not be unbounded * in the worst case. 
*/ - dma_fence_default_wait(&rq->fence, false, MAX_SCHEDULE_TIMEOUT); - i915_request_put(rq); + dma_fence_default_wait(fence, false, MAX_SCHEDULE_TIMEOUT); + dma_fence_put(fence); } mutex_unlock(&i915->gt.timelines.mutex); diff --git a/drivers/gpu/drm/i915/gt/selftest_lrc.c b/drivers/gpu/drm/i915/gt/selftest_lrc.c index 338111d690ac..9fc03a400685 100644 --- a/drivers/gpu/drm/i915/gt/selftest_lrc.c +++ b/drivers/gpu/drm/i915/gt/selftest_lrc.c @@ -854,7 +854,6 @@ static struct i915_request *dummy_request(struct intel_engine_cs *engine) if (!rq) return NULL; - INIT_LIST_HEAD(&rq->active_list); rq->engine = engine; i915_sched_node_init(&rq->sched); @@ -945,8 +944,8 @@ static int live_suppress_wait_preempt(void *arg) } /* Disable NEWCLIENT promotion */ - __i915_active_request_set(&rq[i]->timeline->last_request, - dummy); + __i915_active_fence_set(&rq[i]->timeline->last_request, + &dummy->fence); i915_request_add(rq[i]); } diff --git a/drivers/gpu/drm/i915/gvt/scheduler.c b/drivers/gpu/drm/i915/gvt/scheduler.c index 6cd72dd96a4b..31752b7ebff5 100644 --- a/drivers/gpu/drm/i915/gvt/scheduler.c +++ b/drivers/gpu/drm/i915/gvt/scheduler.c @@ -391,11 +391,8 @@ intel_gvt_workload_req_alloc(struct intel_vgpu_workload *workload) { struct intel_vgpu *vgpu = workload->vgpu; struct intel_vgpu_submission *s = &vgpu->submission; - struct drm_i915_private *dev_priv = vgpu->gvt->dev_priv; struct i915_request *rq; - lockdep_assert_held(&dev_priv->drm.struct_mutex); - if (workload->req) return 0; diff --git a/drivers/gpu/drm/i915/i915_active.c b/drivers/gpu/drm/i915/i915_active.c index f7ffa6e7bd9a..65a9ad2fc9c7 100644 --- a/drivers/gpu/drm/i915/i915_active.c +++ b/drivers/gpu/drm/i915/i915_active.c @@ -8,8 +8,6 @@ #include "i915_active.h" #include "i915_globals.h" -#define BKL(ref) (&(ref)->i915->drm.struct_mutex) - /* * Active refs memory management * @@ -23,7 +21,7 @@ static struct i915_global_active { } global; struct active_node { - struct i915_active_request base; + struct i915_active_fence base; struct i915_active *ref; struct rb_node node; u64 timeline; @@ -32,11 +30,12 @@ struct active_node { static void __active_retire(struct i915_active *ref) { - struct rb_root root = RB_ROOT; struct active_node *it, *n; + struct rb_root root; bool retire = false; lockdep_assert_held(&ref->mutex); + GEM_BUG_ON(i915_active_is_idle(ref)); /* return the unused nodes to our slabcache -- flushing the allocator */ if (atomic_dec_and_test(&ref->count)) { @@ -47,12 +46,13 @@ __active_retire(struct i915_active *ref) } mutex_unlock(&ref->mutex); + if (!retire) + return; - if (retire) - ref->retire(ref); + ref->retire(ref); rbtree_postorder_for_each_entry_safe(it, n, &root, node) { - GEM_BUG_ON(i915_active_request_isset(&it->base)); + GEM_BUG_ON(i915_active_fence_isset(&it->base)); kmem_cache_free(global.slab_cache, it); } } @@ -86,12 +86,13 @@ active_retire(struct i915_active *ref) } static void -node_retire(struct i915_active_request *base, struct i915_request *rq) +node_retire(struct dma_fence *fence, struct dma_fence_cb *cb) { - active_retire(container_of(base, struct active_node, base)->ref); + i915_active_fence_cb(fence, cb); + active_retire(container_of(cb, struct active_node, base.cb)->ref); } -static struct i915_active_request * +static struct i915_active_fence * active_instance(struct i915_active *ref, u64 idx) { struct active_node *node, *prealloc; @@ -114,6 +115,7 @@ active_instance(struct i915_active *ref, u64 idx) return NULL; mutex_lock(&ref->mutex); + GEM_BUG_ON(i915_active_is_idle(ref)); parent = NULL; p = 
&ref->tree.rb_node; @@ -121,7 +123,7 @@ active_instance(struct i915_active *ref, u64 idx) parent = *p; node = rb_entry(parent, struct active_node, node); - if (node->timeline == idx && !IS_ERR(node->base.request)) { + if (node->timeline == idx && !IS_ERR(node->base.fence)) { kmem_cache_free(global.slab_cache, prealloc); goto out; } @@ -133,7 +135,8 @@ active_instance(struct i915_active *ref, u64 idx) } node = prealloc; - i915_active_request_init(&node->base, NULL, node_retire); + RCU_INIT_POINTER(node->base.fence, NULL); + node->base.cb.func = node_retire; node->ref = ref; node->timeline = idx; @@ -147,13 +150,11 @@ active_instance(struct i915_active *ref, u64 idx) return &node->base; } -void __i915_active_init(struct drm_i915_private *i915, - struct i915_active *ref, +void __i915_active_init(struct i915_active *ref, int (*active)(struct i915_active *ref), void (*retire)(struct i915_active *ref), struct lock_class_key *key) { - ref->i915 = i915; ref->active = active; ref->retire = retire; ref->tree = RB_ROOT; @@ -166,9 +167,9 @@ void __i915_active_init(struct drm_i915_private *i915, int i915_active_ref(struct i915_active *ref, u64 timeline, - struct i915_request *rq) + struct dma_fence *fence) { - struct i915_active_request *active; + struct i915_active_fence *active; int err; /* Prevent reaping in case we malloc/wait while building the tree */ @@ -182,9 +183,9 @@ int i915_active_ref(struct i915_active *ref, goto out; } - if (!i915_active_request_isset(active)) + GEM_BUG_ON(!atomic_read(&ref->count)); + if (!__i915_active_fence_set(active, fence)) atomic_inc(&ref->count); - __i915_active_request_set(active, rq); out: i915_active_release(ref); @@ -222,60 +223,33 @@ int i915_active_wait(struct i915_active *ref) struct active_node *it, *n; int err; + might_sleep(); if (RB_EMPTY_ROOT(&ref->tree)) return 0; - err = i915_active_acquire(ref); /* Avoid retiring ourselves */ + err = mutex_lock_interruptible(&ref->mutex); if (err) return err; - err = mutex_lock_interruptible(&ref->mutex); - if (err) - goto out; + if (!atomic_add_unless(&ref->count, 1, 0)) + goto unlock; rbtree_postorder_for_each_entry_safe(it, n, &ref->tree, node) { - err = i915_active_request_retire(&it->base, BKL(ref)); - if (err) - break; - } - mutex_unlock(&ref->mutex); + struct dma_fence *fence; -out: - i915_active_release(ref); - return err; -} - -int i915_request_await_active_request(struct i915_request *rq, - struct i915_active_request *active) -{ - struct i915_request *barrier = - i915_active_request_raw(active, &rq->i915->drm.struct_mutex); - - return barrier ? 
i915_request_await_dma_fence(rq, &barrier->fence) : 0; -} + fence = i915_active_fence_get(&it->base); + if (!fence) + continue; -int i915_request_await_active(struct i915_request *rq, struct i915_active *ref) -{ - struct active_node *it, *n; - int err; - - if (RB_EMPTY_ROOT(&ref->tree)) - return 0; - - /* await allocates and so we need to avoid hitting the shrinker */ - err = i915_active_acquire(ref); - if (err) - return err; - - mutex_lock(&ref->mutex); - rbtree_postorder_for_each_entry_safe(it, n, &ref->tree, node) { - err = i915_request_await_active_request(rq, &it->base); + err = dma_fence_wait(fence, true); + dma_fence_put(fence); if (err) break; } - mutex_unlock(&ref->mutex); + __active_retire(ref); - i915_active_release(ref); +unlock: + mutex_unlock(&ref->mutex); return err; } @@ -306,13 +280,13 @@ int i915_active_acquire_preallocate_barrier(struct i915_active *ref, break; } - i915_active_request_init(&node->base, - (void *)engine, node_retire); + RCU_INIT_POINTER(node->base.fence, (void *)engine); + node->base.cb.func = node_retire; node->timeline = kctx->ring->timeline->fence_context; node->ref = ref; atomic_inc(&ref->count); - llist_add((struct llist_node *)&node->base.link, + llist_add((struct llist_node *)&node->base.cb.node, &ref->barriers); } @@ -332,10 +306,10 @@ void i915_active_acquire_barrier(struct i915_active *ref) struct rb_node **p, *parent; node = container_of((struct list_head *)pos, - typeof(*node), base.link); + typeof(*node), base.cb.node); - engine = (void *)rcu_access_pointer(node->base.request); - RCU_INIT_POINTER(node->base.request, ERR_PTR(-EAGAIN)); + engine = (void *)rcu_access_pointer(node->base.fence); + RCU_INIT_POINTER(node->base.fence, ERR_PTR(-EAGAIN)); parent = NULL; p = &ref->tree.rb_node; @@ -351,7 +325,7 @@ void i915_active_acquire_barrier(struct i915_active *ref) rb_link_node(&node->node, parent, p); rb_insert_color(&node->node, &ref->tree); - llist_add((struct llist_node *)&node->base.link, + llist_add((struct llist_node *)&node->base.cb.node, &engine->barrier_tasks); } mutex_unlock(&ref->mutex); @@ -361,29 +335,66 @@ void i915_request_add_barriers(struct i915_request *rq) { struct intel_engine_cs *engine = rq->engine; struct llist_node *node, *next; + unsigned long flags; + + GEM_BUG_ON(intel_engine_is_virtual(engine)); + node = llist_del_all(&engine->barrier_tasks); + if (!node) + return; - llist_for_each_safe(node, next, llist_del_all(&engine->barrier_tasks)) - list_add_tail((struct list_head *)node, &rq->active_list); + spin_lock_irqsave(&rq->lock, flags); + llist_for_each_safe(node, next, node) + list_add_tail((struct list_head *)node, &rq->fence.cb_list); + spin_unlock_irqrestore(&rq->lock, flags); } -int i915_active_request_set(struct i915_active_request *active, - struct i915_request *rq) +struct dma_fence * +__i915_active_fence_set(struct i915_active_fence *active, + struct dma_fence *fence) { + struct dma_fence *old; + unsigned long flags; + + spin_lock_irqsave(fence->lock, flags); + GEM_BUG_ON(test_bit(DMA_FENCE_FLAG_SIGNALED_BIT, &fence->flags)); + + old = rcu_dereference_protected(active->fence, 1); + if (old) { + spin_lock_nested(old->lock, SINGLE_DEPTH_NESTING); + __list_del_entry(&active->cb.node); + spin_unlock(old->lock); + } + + rcu_assign_pointer(active->fence, fence); + list_add_tail(&active->cb.node, &fence->cb_list); + + spin_unlock_irqrestore(fence->lock, flags); + + return old; +} + +int i915_active_fence_set(struct i915_active_fence *active, + struct i915_request *rq) +{ + struct dma_fence *fence; int err; /* Must 
maintain ordering wrt previous active requests */ - err = i915_request_await_active_request(rq, active); - if (err) - return err; + fence = i915_active_fence_get(active); + if (fence) { + err = i915_request_await_dma_fence(rq, fence); + dma_fence_put(fence); + if (err) + return err; + } - __i915_active_request_set(active, rq); + __i915_active_fence_set(active, &rq->fence); return 0; } -void i915_active_retire_noop(struct i915_active_request *active, - struct i915_request *request) +void i915_active_noop(struct dma_fence *fence, struct dma_fence_cb *cb) { - /* Space left intentionally blank */ + i915_active_fence_cb(fence, cb); } #if IS_ENABLED(CONFIG_DRM_I915_SELFTEST) diff --git a/drivers/gpu/drm/i915/i915_active.h b/drivers/gpu/drm/i915/i915_active.h index bdec4f81b6e8..21a9ce6fa461 100644 --- a/drivers/gpu/drm/i915/i915_active.h +++ b/drivers/gpu/drm/i915/i915_active.h @@ -10,7 +10,9 @@ #include #include "i915_active_types.h" -#include "i915_request.h" + +struct i915_request; +struct intel_engine_cs; /* * We treat requests as fences. This is not be to confused with our @@ -28,128 +30,62 @@ * write access so that we can perform concurrent read operations between * the CPU and GPU engines, as well as waiting for all rendering to * complete, or waiting for the last GPU user of a "fence register". The - * object then embeds a #i915_active_request to track the most recent (in + * object then embeds a #i915_active_fence to track the most recent (in * retirement order) request relevant for the desired mode of access. - * The #i915_active_request is updated with i915_active_request_set() to + * The #i915_active_fence is updated with i915_active_fence_set() to * track the most recent fence request, typically this is done as part of * i915_vma_move_to_active(). * - * When the #i915_active_request completes (is retired), it will + * When the #i915_active_fence completes (is retired), it will * signal its completion to the owner through a callback as well as mark - * itself as idle (i915_active_request.request == NULL). The owner + * itself as idle (i915_active_fence.request == NULL). The owner * can then perform any action, such as delayed freeing of an active * resource including itself. */ -void i915_active_retire_noop(struct i915_active_request *active, - struct i915_request *request); +void i915_active_noop(struct dma_fence *fence, struct dma_fence_cb *cb); /** - * i915_active_request_init - prepares the activity tracker for use + * i915_active_fence_init - prepares the activity tracker for use * @active - the active tracker * @rq - initial request to track, can be NULL * @func - a callback when then the tracker is retired (becomes idle), * can be NULL * - * i915_active_request_init() prepares the embedded @active struct for use as + * i915_active_fence_init() prepares the embedded @active struct for use as * an activity tracker, that is for tracking the last known active request * associated with it. When the last request becomes idle, when it is retired * after completion, the optional callback @func is invoked. 
*/ static inline void -i915_active_request_init(struct i915_active_request *active, - struct i915_request *rq, - i915_active_retire_fn retire) +i915_active_fence_init(struct i915_active_fence *active, + struct dma_fence *fence, + dma_fence_func_t fn) { - RCU_INIT_POINTER(active->request, rq); - INIT_LIST_HEAD(&active->link); - active->retire = retire ?: i915_active_retire_noop; + RCU_INIT_POINTER(active->fence, fence); + active->cb.func = fn ?: i915_active_noop; } -#define INIT_ACTIVE_REQUEST(name) i915_active_request_init((name), NULL, NULL) +#define INIT_ACTIVE_FENCE(A) i915_active_fence_init(A, NULL, NULL) -/** - * i915_active_request_set - updates the tracker to watch the current request - * @active - the active tracker - * @request - the request to watch - * - * __i915_active_request_set() watches the given @request for completion. Whilst - * that @request is busy, the @active reports busy. When that @request is - * retired, the @active tracker is updated to report idle. - */ -static inline void -__i915_active_request_set(struct i915_active_request *active, - struct i915_request *request) -{ - list_move(&active->link, &request->active_list); - rcu_assign_pointer(active->request, request); -} +struct dma_fence * +__i915_active_fence_set(struct i915_active_fence *active, + struct dma_fence *fence); int __must_check -i915_active_request_set(struct i915_active_request *active, - struct i915_request *rq); - -/** - * i915_active_request_raw - return the active request - * @active - the active tracker - * - * i915_active_request_raw() returns the current request being tracked, or NULL. - * It does not obtain a reference on the request for the caller, so the caller - * must hold struct_mutex. - */ -static inline struct i915_request * -i915_active_request_raw(const struct i915_active_request *active, - struct mutex *mutex) -{ - return rcu_dereference_protected(active->request, - lockdep_is_held(mutex)); -} +i915_active_fence_set(struct i915_active_fence *active, + struct i915_request *rq); /** - * i915_active_request_peek - report the active request being monitored + * __i915_active_fence_get_rcu - return a reference to the active request * @active - the active tracker * - * i915_active_request_peek() returns the current request being tracked if - * still active, or NULL. It does not obtain a reference on the request - * for the caller, so the caller must hold struct_mutex. - */ -static inline struct i915_request * -i915_active_request_peek(const struct i915_active_request *active, - struct mutex *mutex) -{ - struct i915_request *request; - - request = i915_active_request_raw(active, mutex); - if (!request || i915_request_completed(request)) - return NULL; - - return request; -} - -/** - * i915_active_request_get - return a reference to the active request - * @active - the active tracker - * - * i915_active_request_get() returns a reference to the active request, or NULL - * if the active tracker is idle. The caller must hold struct_mutex. - */ -static inline struct i915_request * -i915_active_request_get(const struct i915_active_request *active, - struct mutex *mutex) -{ - return i915_request_get(i915_active_request_peek(active, mutex)); -} - -/** - * __i915_active_request_get_rcu - return a reference to the active request - * @active - the active tracker - * - * __i915_active_request_get() returns a reference to the active request, + * __i915_active_fence_get() returns a reference to the active request, * or NULL if the active tracker is idle. 
The caller must hold the RCU read * lock, but the returned pointer is safe to use outside of RCU. */ -static inline struct i915_request * -__i915_active_request_get_rcu(const struct i915_active_request *active) +static inline struct dma_fence * +__i915_active_fence_get_rcu(const struct i915_active_fence *active) { /* * Performing a lockless retrieval of the active request is super @@ -198,10 +134,11 @@ __i915_active_request_get_rcu(const struct i915_active_request *active) * See i915_request_alloc(). */ do { - struct i915_request *request; + struct dma_fence *fence; - request = rcu_dereference(active->request); - if (!request || i915_request_completed(request)) + fence = rcu_dereference(active->fence); + if (!fence || + test_bit(DMA_FENCE_FLAG_SIGNALED_BIT, &fence->flags)) return NULL; /* @@ -218,7 +155,7 @@ __i915_active_request_get_rcu(const struct i915_active_request *active) */ barrier(); - request = i915_request_get_rcu(request); + fence = dma_fence_get_rcu(fence); /* * What stops the following rcu_access_pointer() from occurring @@ -247,81 +184,58 @@ __i915_active_request_get_rcu(const struct i915_active_request *active) * The corresponding write barrier is part of * rcu_assign_pointer(). */ - if (!request || request == rcu_access_pointer(active->request)) - return rcu_pointer_handoff(request); + if (!fence || fence == rcu_access_pointer(active->fence)) + return rcu_pointer_handoff(fence); - i915_request_put(request); + dma_fence_put(fence); } while (1); } /** - * i915_active_request_get_unlocked - return a reference to the active request + * i915_active_fence_get - return a reference to the active request * @active - the active tracker * - * i915_active_request_get_unlocked() returns a reference to the active request, + * i915_active_fence_get() returns a reference to the active request, * or NULL if the active tracker is idle. The reference is obtained under RCU, * so no locking is required by the caller. * - * The reference should be freed with i915_request_put(). + * The reference should be freed with dma_fence_put(). */ -static inline struct i915_request * -i915_active_request_get_unlocked(const struct i915_active_request *active) +static inline struct dma_fence * +i915_active_fence_get(const struct i915_active_fence *active) { - struct i915_request *request; + struct dma_fence *fence; rcu_read_lock(); - request = __i915_active_request_get_rcu(active); + fence = __i915_active_fence_get_rcu(active); rcu_read_unlock(); - return request; + return fence; } /** - * i915_active_request_isset - report whether the active tracker is assigned + * i915_active_fence_isset - report whether the active tracker is assigned * @active - the active tracker * - * i915_active_request_isset() returns true if the active tracker is currently + * i915_active_fence_isset() returns true if the active tracker is currently * assigned to a request. Due to the lazy retiring, that request may be idle * and this may report stale information. */ static inline bool -i915_active_request_isset(const struct i915_active_request *active) +i915_active_fence_isset(const struct i915_active_fence *active) { - return rcu_access_pointer(active->request); + return rcu_access_pointer(active->fence); } -/** - * i915_active_request_retire - waits until the request is retired - * @active - the active request on which to wait - * - * i915_active_request_retire() waits until the request is completed, - * and then ensures that at least the retirement handler for this - * @active tracker is called before returning. 
If the @active - * tracker is idle, the function returns immediately. - */ -static inline int __must_check -i915_active_request_retire(struct i915_active_request *active, - struct mutex *mutex) +static inline void +i915_active_fence_cb(struct dma_fence *fence, + struct dma_fence_cb *cb) { - struct i915_request *request; - long ret; - - request = i915_active_request_raw(active, mutex); - if (!request) - return 0; - - ret = i915_request_wait(request, - I915_WAIT_INTERRUPTIBLE | I915_WAIT_LOCKED, - MAX_SCHEDULE_TIMEOUT); - if (ret < 0) - return ret; - - list_del_init(&active->link); - RCU_INIT_POINTER(active->request, NULL); - - active->retire(active, request); + struct i915_active_fence *active = + container_of(cb, typeof(*active), cb); - return 0; + //GEM_BUG_ON(rcu_access_pointer(active->fence) != fence); + RCU_INIT_POINTER(active->fence, NULL); } /* @@ -350,31 +264,29 @@ i915_active_request_retire(struct i915_active_request *active, * synchronisation. */ -void __i915_active_init(struct drm_i915_private *i915, - struct i915_active *ref, +void __i915_active_init(struct i915_active *ref, int (*active)(struct i915_active *ref), void (*retire)(struct i915_active *ref), struct lock_class_key *key); -#define i915_active_init(i915, ref, active, retire) do { \ +#define i915_active_init(ref, active, retire) do { \ static struct lock_class_key __key; \ \ - __i915_active_init(i915, ref, active, retire, &__key); \ + __i915_active_init(ref, active, retire, &__key); \ } while (0) int i915_active_ref(struct i915_active *ref, u64 timeline, - struct i915_request *rq); + struct dma_fence *fence); int i915_active_wait(struct i915_active *ref); int i915_request_await_active(struct i915_request *rq, struct i915_active *ref); -int i915_request_await_active_request(struct i915_request *rq, - struct i915_active_request *active); +int i915_request_await_active_fence(struct i915_request *rq, + struct i915_active_fence *active); int i915_active_acquire(struct i915_active *ref); void i915_active_release(struct i915_active *ref); -void __i915_active_release_nested(struct i915_active *ref, int subclass); static inline bool i915_active_is_idle(const struct i915_active *ref) diff --git a/drivers/gpu/drm/i915/i915_active_types.h b/drivers/gpu/drm/i915/i915_active_types.h index 06acdffe0f6d..9519b6523801 100644 --- a/drivers/gpu/drm/i915/i915_active_types.h +++ b/drivers/gpu/drm/i915/i915_active_types.h @@ -8,30 +8,21 @@ #define _I915_ACTIVE_TYPES_H_ #include +#include #include #include #include #include #include -struct drm_i915_private; -struct i915_active_request; -struct i915_request; - -typedef void (*i915_active_retire_fn)(struct i915_active_request *, - struct i915_request *); - -struct i915_active_request { - struct i915_request __rcu *request; - struct list_head link; - i915_active_retire_fn retire; +struct i915_active_fence { + struct dma_fence __rcu *fence; + struct dma_fence_cb cb; }; struct active_node; struct i915_active { - struct drm_i915_private *i915; - struct active_node *cache; struct rb_root tree; struct mutex mutex; diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c index e097f7fcce6f..d74fcddd863e 100644 --- a/drivers/gpu/drm/i915/i915_gem.c +++ b/drivers/gpu/drm/i915/i915_gem.c @@ -951,34 +951,38 @@ wait_for_timelines(struct drm_i915_private *i915, mutex_lock(>->mutex); list_for_each_entry(tl, >->active_list, link) { - struct i915_request *rq; + struct dma_fence *fence; - rq = i915_active_request_get_unlocked(&tl->last_request); - if (!rq) + fence = 
i915_active_fence_get(&tl->last_request); + if (!fence) continue; - mutex_unlock(>->mutex); - - /* - * "Race-to-idle". - * - * Switching to the kernel context is often used a synchronous - * step prior to idling, e.g. in suspend for flushing all - * current operations to memory before sleeping. These we - * want to complete as quickly as possible to avoid prolonged - * stalls, so allow the gpu to boost to maximum clocks. - */ - if (flags & I915_WAIT_FOR_IDLE_BOOST) - gen6_rps_boost(rq); + if (!dma_fence_is_i915(fence)) { + timeout = dma_fence_wait_timeout(fence, + flags & I915_WAIT_INTERRUPTIBLE, + timeout); + } else { + struct i915_request *rq = to_request(fence); + + /* + * "Race-to-idle". + * + * Switching to the kernel context is often used as + * a synchronous step prior to idling, e.g. in suspend + * for flushing all current operations to memory before + * sleeping. These we want to complete as quickly as + * possible to avoid prolonged stalls, so allow the gpu + * to boost to maximum clocks. + */ + if (flags & I915_WAIT_FOR_IDLE_BOOST) + gen6_rps_boost(rq); + + timeout = i915_request_wait(rq, flags, timeout); + } - timeout = i915_request_wait(rq, flags, timeout); - i915_request_put(rq); + dma_fence_put(fence); if (timeout < 0) - return timeout; - - /* restart after reacquiring the lock */ - mutex_lock(>->mutex); - tl = list_entry(>->active_list, typeof(*tl), link); + break; } mutex_unlock(>->mutex); diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c index 05fef1d3579d..60f754292a64 100644 --- a/drivers/gpu/drm/i915/i915_gem_gtt.c +++ b/drivers/gpu/drm/i915/i915_gem_gtt.c @@ -2062,7 +2062,7 @@ static struct i915_vma *pd_vma_create(struct gen6_hw_ppgtt *ppgtt, int size) if (!vma) return ERR_PTR(-ENOMEM); - i915_active_init(i915, &vma->active, NULL, NULL); + i915_active_init(&vma->active, NULL, NULL); vma->vm = &ggtt->vm; vma->ops = &pd_vma_ops; diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c index c71edd6ea873..cbdf71e80616 100644 --- a/drivers/gpu/drm/i915/i915_request.c +++ b/drivers/gpu/drm/i915/i915_request.c @@ -229,9 +229,8 @@ static void free_capture_list(struct i915_request *request) static bool i915_request_retire(struct i915_request *rq) { - struct i915_active_request *active, *next; + lockdep_assert_held(&rq->timeline->mutex); - lockdep_assert_held(&rq->i915->drm.struct_mutex); if (!i915_request_completed(rq)) return false; @@ -245,35 +244,6 @@ static bool i915_request_retire(struct i915_request *rq) advance_ring(rq); - /* - * Walk through the active list, calling retire on each. This allows - * objects to track their GPU activity and mark themselves as idle - * when their *last* active request is completed (updating state - * tracking lists for eviction, active references for GEM, etc). - * - * As the ->retire() may free the node, we decouple it first and - * pass along the auxiliary information (to avoid dereferencing - * the node after the callback). - */ - list_for_each_entry_safe(active, next, &rq->active_list, link) { - /* - * In microbenchmarks or focusing upon time inside the kernel, - * we may spend an inordinate amount of time simply handling - * the retirement of requests and processing their callbacks. - * Of which, this loop itself is particularly hot due to the - * cache misses when jumping around the list of - * i915_active_request. So we try to keep this loop as - * streamlined as possible and also prefetch the next - * i915_active_request to try and hide the likely cache miss. 
- */ - prefetchw(next); - - INIT_LIST_HEAD(&active->link); - RCU_INIT_POINTER(active->request, NULL); - - active->retire(active, rq); - } - local_irq_disable(); /* @@ -288,8 +258,7 @@ static bool i915_request_retire(struct i915_request *rq) spin_lock(&rq->lock); i915_request_mark_complete(rq); - if (!i915_request_signaled(rq)) - dma_fence_signal_locked(&rq->fence); + dma_fence_signal_locked(&rq->fence); if (test_bit(DMA_FENCE_FLAG_ENABLE_SIGNAL_BIT, &rq->fence.flags)) i915_request_cancel_breadcrumb(rq); if (rq->waitboost) { @@ -328,11 +297,9 @@ void i915_request_retire_upto(struct i915_request *rq) rq->fence.context, rq->fence.seqno, hwsp_seqno(rq)); - lockdep_assert_held(&rq->i915->drm.struct_mutex); + lockdep_assert_held(&rq->timeline->mutex); GEM_BUG_ON(!i915_request_completed(rq)); - - if (list_empty(&rq->ring_link)) - return; + GEM_BUG_ON(list_empty(&rq->ring_link)); do { tmp = list_first_entry(&ring->request_list, @@ -567,6 +534,7 @@ static void ring_retire_requests(struct intel_ring *ring) { struct i915_request *rq, *rn; + lockdep_assert_held(&ring->timeline->mutex); list_for_each_entry_safe(rq, rn, &ring->request_list, ring_link) if (!i915_request_retire(rq)) break; @@ -687,7 +655,6 @@ __i915_request_create(struct intel_context *ce, gfp_t gfp) rq->waitboost = false; rq->execution_mask = ALL_ENGINES; - INIT_LIST_HEAD(&rq->active_list); INIT_LIST_HEAD(&rq->execute_cb); /* @@ -726,7 +693,6 @@ __i915_request_create(struct intel_context *ce, gfp_t gfp) ce->ring->emit = rq->head; /* Make sure we didn't add ourselves to external state before freeing */ - GEM_BUG_ON(!list_empty(&rq->active_list)); GEM_BUG_ON(!list_empty(&rq->sched.signalers_list)); GEM_BUG_ON(!list_empty(&rq->sched.waiters_list)); @@ -1113,7 +1079,8 @@ __i915_request_add_to_timeline(struct i915_request *rq) * precludes optimising to use semaphores serialisation of a single * timeline across engines. */ - prev = rcu_dereference_protected(timeline->last_request.request, 1); + prev = to_request(__i915_active_fence_set(&timeline->last_request, + &rq->fence)); if (prev && !i915_request_completed(prev)) { if (is_power_of_2(prev->engine->mask | rq->engine->mask)) i915_sw_fence_await_sw_fence(&rq->submit, @@ -1138,7 +1105,6 @@ __i915_request_add_to_timeline(struct i915_request *rq) * us, the timeline will hold its seqno which is later than ours. */ GEM_BUG_ON(timeline->seqno != rq->fence.seqno); - __i915_active_request_set(&timeline->last_request, rq); return prev; } @@ -1369,10 +1335,6 @@ static void request_wait_wake(struct dma_fence *fence, struct dma_fence_cb *cb) * maximum of @timeout jiffies (with MAX_SCHEDULE_TIMEOUT implying an * unbounded wait). * - * If the caller holds the struct_mutex, the caller must pass I915_WAIT_LOCKED - * in via the flags, and vice versa if the struct_mutex is not held, the caller - * must not specify that the wait is locked. - * * Returns the remaining time (in jiffies) if the request completed, which may * be zero or -ETIME if the request is unfinished after the timeout expires. * May return -EINTR is called with I915_WAIT_INTERRUPTIBLE and a signal is @@ -1482,7 +1444,11 @@ bool i915_retire_requests(struct drm_i915_private *i915) list_for_each_entry_safe(ring, tmp, &i915->gt.active_rings, active_link) { intel_ring_get(ring); /* last rq holds reference! 
*/ + mutex_lock(&ring->timeline->mutex); + ring_retire_requests(ring); + + mutex_unlock(&ring->timeline->mutex); intel_ring_put(ring); } diff --git a/drivers/gpu/drm/i915/i915_request.h b/drivers/gpu/drm/i915/i915_request.h index bebc1e9b4a5e..8277cff0df70 100644 --- a/drivers/gpu/drm/i915/i915_request.h +++ b/drivers/gpu/drm/i915/i915_request.h @@ -211,7 +211,6 @@ struct i915_request { * on the active_list (of their final request). */ struct i915_capture_list *capture_list; - struct list_head active_list; /** Time at which this request was emitted, in jiffies. */ unsigned long emitted_jiffies; diff --git a/drivers/gpu/drm/i915/i915_timeline.c b/drivers/gpu/drm/i915/i915_timeline.c index e2f9336bab83..4ffb61991768 100644 --- a/drivers/gpu/drm/i915/i915_timeline.c +++ b/drivers/gpu/drm/i915/i915_timeline.c @@ -179,8 +179,7 @@ cacheline_alloc(struct i915_timeline_hwsp *hwsp, unsigned int cacheline) cl->hwsp = hwsp; cl->vaddr = page_pack_bits(vaddr, cacheline); - i915_active_init(hwsp_to_i915(hwsp), &cl->active, - __cacheline_active, __cacheline_retire); + i915_active_init(&cl->active, __cacheline_active, __cacheline_retire); return cl; } @@ -263,7 +262,7 @@ int i915_timeline_init(struct drm_i915_private *i915, mutex_init(&timeline->mutex); - INIT_ACTIVE_REQUEST(&timeline->last_request); + INIT_ACTIVE_FENCE(&timeline->last_request); INIT_LIST_HEAD(&timeline->requests); i915_syncmap_init(&timeline->sync); @@ -463,7 +462,7 @@ __i915_timeline_get_seqno(struct i915_timeline *tl, * all writes into the cacheline from previous requests are complete. */ err = i915_active_ref(&tl->hwsp_cacheline->active, - tl->fence_context, rq); + tl->fence_context, &rq->fence); if (err) goto err_cacheline; @@ -514,7 +513,7 @@ int i915_timeline_get_seqno(struct i915_timeline *tl, static int cacheline_ref(struct i915_timeline_cacheline *cl, struct i915_request *rq) { - return i915_active_ref(&cl->active, rq->fence.context, rq); + return i915_active_ref(&cl->active, rq->fence.context, &rq->fence); } int i915_timeline_read_hwsp(struct i915_request *from, diff --git a/drivers/gpu/drm/i915/i915_timeline_types.h b/drivers/gpu/drm/i915/i915_timeline_types.h index fce5cb4f1090..5af6d185d70c 100644 --- a/drivers/gpu/drm/i915/i915_timeline_types.h +++ b/drivers/gpu/drm/i915/i915_timeline_types.h @@ -45,7 +45,7 @@ struct i915_timeline { * the request using i915_active_request_get_request_rcu(), or hold the * struct_mutex. */ - struct i915_active_request last_request; + struct i915_active_fence last_request; /** * We track the most recent seqno that we wait on in every context so diff --git a/drivers/gpu/drm/i915/i915_vma.c b/drivers/gpu/drm/i915/i915_vma.c index 5bc58d0baf73..b8b8a9ca1ac7 100644 --- a/drivers/gpu/drm/i915/i915_vma.c +++ b/drivers/gpu/drm/i915/i915_vma.c @@ -114,8 +114,7 @@ vma_create(struct drm_i915_gem_object *obj, vma->size = obj->base.size; vma->display_alignment = I915_GTT_MIN_ALIGNMENT; - i915_active_init(vm->i915, &vma->active, - __i915_vma_active, __i915_vma_retire); + i915_active_init(&vma->active, __i915_vma_active, __i915_vma_retire); INIT_LIST_HEAD(&vma->closed_link); @@ -909,7 +908,7 @@ int i915_vma_move_to_active(struct i915_vma *vma, * add the active reference first and queue for it to be dropped * *last*. 
*/ - err = i915_active_ref(&vma->active, rq->fence.context, rq); + err = i915_active_ref(&vma->active, rq->fence.context, &rq->fence); if (err) return err; @@ -920,7 +919,7 @@ int i915_vma_move_to_active(struct i915_vma *vma, if (intel_frontbuffer_invalidate(obj->frontbuffer, ORIGIN_CS)) i915_active_ref(&obj->frontbuffer->write, rq->fence.context, - rq); + &rq->fence); obj->read_domains = 0; } @@ -939,31 +938,10 @@ int i915_vma_unbind(struct i915_vma *vma) lockdep_assert_held(&vma->vm->i915->drm.struct_mutex); - /* - * First wait upon any activity as retiring the request may - * have side-effects such as unpinning or even unbinding this vma. - */ - might_sleep(); - if (i915_vma_is_active(vma)) { - /* - * When a closed VMA is retired, it is unbound - eek. - * In order to prevent it from being recursively closed, - * take a pin on the vma so that the second unbind is - * aborted. - * - * Even more scary is that the retire callback may free - * the object (last active vma). To prevent the explosion - * we defer the actual object free to a worker that can - * only proceed once it acquires the struct_mutex (which - * we currently hold, therefore it cannot free this object - * before we are finished). - */ - __i915_vma_pin(vma); - ret = i915_active_wait(&vma->active); - __i915_vma_unpin(vma); - if (ret) - return ret; - } + ret = i915_active_wait(&vma->active); + if (ret) + return ret; + flush_work(&vma->active.work); GEM_BUG_ON(i915_vma_is_active(vma)); diff --git a/drivers/gpu/drm/i915/intel_frontbuffer.c b/drivers/gpu/drm/i915/intel_frontbuffer.c index 4fcec413f405..e8a1bb20f548 100644 --- a/drivers/gpu/drm/i915/intel_frontbuffer.c +++ b/drivers/gpu/drm/i915/intel_frontbuffer.c @@ -253,8 +253,7 @@ intel_frontbuffer_get(struct drm_i915_gem_object *obj) fb->obj = obj; kref_init(&fb->ref); atomic_set(&fb->bits, 0); - i915_active_init(i915, &fb->write, - frontbuffer_active, frontbuffer_retire); + i915_active_init(&fb->write, frontbuffer_active, frontbuffer_retire); spin_lock(&i915->fb_tracking.lock); if (obj->frontbuffer) { diff --git a/drivers/gpu/drm/i915/intel_overlay.c b/drivers/gpu/drm/i915/intel_overlay.c index 38495df60f7b..dc7b66c94f74 100644 --- a/drivers/gpu/drm/i915/intel_overlay.c +++ b/drivers/gpu/drm/i915/intel_overlay.c @@ -230,7 +230,8 @@ alloc_request(struct intel_overlay *overlay, void (*fn)(struct intel_overlay *)) if (IS_ERR(rq)) return rq; - err = i915_active_ref(&overlay->last_flip, rq->fence.context, rq); + err = i915_active_ref(&overlay->last_flip, + rq->fence.context, &rq->fence); if (err) { i915_request_add(rq); return ERR_PTR(err); @@ -1366,8 +1367,7 @@ void intel_overlay_setup(struct drm_i915_private *dev_priv) overlay->contrast = 75; overlay->saturation = 146; - i915_active_init(dev_priv, - &overlay->last_flip, + i915_active_init(&overlay->last_flip, NULL, intel_overlay_last_flip_retire); ret = get_registers(overlay, OVERLAY_NEEDS_PHYSICAL(dev_priv)); diff --git a/drivers/gpu/drm/i915/selftests/i915_active.c b/drivers/gpu/drm/i915/selftests/i915_active.c index 3b3ca5658122..376ea04350b1 100644 --- a/drivers/gpu/drm/i915/selftests/i915_active.c +++ b/drivers/gpu/drm/i915/selftests/i915_active.c @@ -36,7 +36,7 @@ static int __live_active_setup(struct drm_i915_private *i915, if (!submit) return -ENOMEM; - i915_active_init(i915, &active->base, NULL, __live_active_retire); + i915_active_init(&active->base, NULL, __live_active_retire); active->retired = false; err = i915_active_acquire(&active->base); @@ -57,7 +57,8 @@ static int __live_active_setup(struct drm_i915_private 
*i915, GFP_KERNEL); if (err >= 0) err = i915_active_ref(&active->base, - rq->fence.context, rq); + rq->fence.context, + &rq->fence); i915_request_add(rq); if (err) { pr_err("Failed to track active ref!\n"); diff --git a/drivers/gpu/drm/i915/selftests/mock_timeline.c b/drivers/gpu/drm/i915/selftests/mock_timeline.c index 65b52be23d42..024de718f66f 100644 --- a/drivers/gpu/drm/i915/selftests/mock_timeline.c +++ b/drivers/gpu/drm/i915/selftests/mock_timeline.c @@ -15,7 +15,7 @@ void mock_timeline_init(struct i915_timeline *timeline, u64 context) mutex_init(&timeline->mutex); - INIT_ACTIVE_REQUEST(&timeline->last_request); + INIT_ACTIVE_FENCE(&timeline->last_request); INIT_LIST_HEAD(&timeline->requests); i915_syncmap_init(&timeline->sync); From patchwork Mon Jun 10 07:21:20 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Chris Wilson X-Patchwork-Id: 10984167 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id BD0E91902 for ; Mon, 10 Jun 2019 07:21:46 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id A6C52286C2 for ; Mon, 10 Jun 2019 07:21:46 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 9B60E28816; Mon, 10 Jun 2019 07:21:46 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-5.2 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_MED autolearn=ham version=3.3.1 Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177]) (using TLSv1.2 with cipher DHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.wl.linuxfoundation.org (Postfix) with ESMTPS id 2AEC928812 for ; Mon, 10 Jun 2019 07:21:46 +0000 (UTC) Received: from gabe.freedesktop.org (localhost [127.0.0.1]) by gabe.freedesktop.org (Postfix) with ESMTP id B9C788911F; Mon, 10 Jun 2019 07:21:43 +0000 (UTC) X-Original-To: intel-gfx@lists.freedesktop.org Delivered-To: intel-gfx@lists.freedesktop.org Received: from fireflyinternet.com (mail.fireflyinternet.com [109.228.58.192]) by gabe.freedesktop.org (Postfix) with ESMTPS id 88F8C8910B for ; Mon, 10 Jun 2019 07:21:37 +0000 (UTC) X-Default-Received-SPF: pass (skip=forwardok (res=PASS)) x-ip-name=78.156.65.138; Received: from haswell.alporthouse.com (unverified [78.156.65.138]) by fireflyinternet.com (Firefly Internet (M1)) with ESMTP id 16848375-1500050 for multiple; Mon, 10 Jun 2019 08:21:31 +0100 From: Chris Wilson To: intel-gfx@lists.freedesktop.org Date: Mon, 10 Jun 2019 08:21:20 +0100 Message-Id: <20190610072126.6355-23-chris@chris-wilson.co.uk> X-Mailer: git-send-email 2.20.1 In-Reply-To: <20190610072126.6355-1-chris@chris-wilson.co.uk> References: <20190610072126.6355-1-chris@chris-wilson.co.uk> MIME-Version: 1.0 Subject: [Intel-gfx] [PATCH 22/28] drm/i915: Only track bound elements of the GTT X-BeenThere: intel-gfx@lists.freedesktop.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: Intel graphics driver community testing & development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: intel-gfx-bounces@lists.freedesktop.org Sender: "Intel-gfx" X-Virus-Scanned: ClamAV using ClamSMTP Signed-off-by: Chris Wilson --- drivers/gpu/drm/i915/gem/i915_gem_stolen.c | 2 +- drivers/gpu/drm/i915/i915_gem_gtt.c 
| 20 +++---------------- drivers/gpu/drm/i915/i915_gem_gtt.h | 5 ----- drivers/gpu/drm/i915/i915_vma.c | 12 ++--------- drivers/gpu/drm/i915/selftests/i915_gem_gtt.c | 2 +- 5 files changed, 7 insertions(+), 34 deletions(-) diff --git a/drivers/gpu/drm/i915/gem/i915_gem_stolen.c b/drivers/gpu/drm/i915/gem/i915_gem_stolen.c index c9b5e6cd940d..65ba85655582 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_stolen.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_stolen.c @@ -686,7 +686,7 @@ i915_gem_object_create_stolen_for_preallocated(struct drm_i915_private *dev_priv __i915_vma_set_map_and_fenceable(vma); mutex_lock(&ggtt->vm.mutex); - list_move_tail(&vma->vm_link, &ggtt->vm.bound_list); + list_add_tail(&vma->vm_link, &ggtt->vm.bound_list); mutex_unlock(&ggtt->vm.mutex); GEM_BUG_ON(i915_gem_object_is_shrinkable(obj)); diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c index 60f754292a64..429817c1d2b8 100644 --- a/drivers/gpu/drm/i915/i915_gem_gtt.c +++ b/drivers/gpu/drm/i915/i915_gem_gtt.c @@ -498,7 +498,6 @@ static void i915_address_space_init(struct i915_address_space *vm, int subclass) stash_init(&vm->free_pages); - INIT_LIST_HEAD(&vm->unbound_list); INIT_LIST_HEAD(&vm->bound_list); } @@ -2076,10 +2075,6 @@ static struct i915_vma *pd_vma_create(struct gen6_hw_ppgtt *ppgtt, int size) INIT_LIST_HEAD(&vma->obj_link); INIT_LIST_HEAD(&vma->closed_link); - mutex_lock(&vma->vm->mutex); - list_add(&vma->vm_link, &vma->vm->unbound_list); - mutex_unlock(&vma->vm->mutex); - return vma; } @@ -2256,19 +2251,11 @@ i915_ppgtt_create(struct drm_i915_private *i915) static void ppgtt_destroy_vma(struct i915_address_space *vm) { - struct list_head *phases[] = { - &vm->bound_list, - &vm->unbound_list, - NULL, - }, **phase; + struct i915_vma *vma, *vn; vm->closed = true; - for (phase = phases; *phase; phase++) { - struct i915_vma *vma, *vn; - - list_for_each_entry_safe(vma, vn, *phase, vm_link) - i915_vma_destroy(vma); - } + list_for_each_entry_safe(vma, vn, &vm->bound_list, vm_link) + i915_vma_destroy(vma); } void i915_ppgtt_release(struct kref *kref) @@ -2281,7 +2268,6 @@ void i915_ppgtt_release(struct kref *kref) ppgtt_destroy_vma(&ppgtt->vm); GEM_BUG_ON(!list_empty(&ppgtt->vm.bound_list)); - GEM_BUG_ON(!list_empty(&ppgtt->vm.unbound_list)); ppgtt->vm.cleanup(&ppgtt->vm); i915_address_space_fini(&ppgtt->vm); diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.h b/drivers/gpu/drm/i915/i915_gem_gtt.h index 5f155bf183bb..939f2f13ef7d 100644 --- a/drivers/gpu/drm/i915/i915_gem_gtt.h +++ b/drivers/gpu/drm/i915/i915_gem_gtt.h @@ -327,11 +327,6 @@ struct i915_address_space { */ struct list_head bound_list; - /** - * List of vma that are not unbound. 
- */ - struct list_head unbound_list; - struct pagestash free_pages; /* Global GTT */ diff --git a/drivers/gpu/drm/i915/i915_vma.c b/drivers/gpu/drm/i915/i915_vma.c index b8b8a9ca1ac7..2d7af763b928 100644 --- a/drivers/gpu/drm/i915/i915_vma.c +++ b/drivers/gpu/drm/i915/i915_vma.c @@ -207,10 +207,6 @@ vma_create(struct drm_i915_gem_object *obj, spin_unlock(&obj->vma.lock); - mutex_lock(&vm->mutex); - list_add(&vma->vm_link, &vm->unbound_list); - mutex_unlock(&vm->mutex); - return vma; err_vma: @@ -648,7 +644,7 @@ i915_vma_insert(struct i915_vma *vma, u64 size, u64 alignment, u64 flags) GEM_BUG_ON(!i915_gem_valid_gtt_space(vma, cache_level)); mutex_lock(&vma->vm->mutex); - list_move_tail(&vma->vm_link, &vma->vm->bound_list); + list_add_tail(&vma->vm_link, &vma->vm->bound_list); mutex_unlock(&vma->vm->mutex); if (vma->obj) { @@ -676,7 +672,7 @@ i915_vma_remove(struct i915_vma *vma) mutex_lock(&vma->vm->mutex); drm_mm_remove_node(&vma->node); - list_move_tail(&vma->vm_link, &vma->vm->unbound_list); + list_del(&vma->vm_link); mutex_unlock(&vma->vm->mutex); /* @@ -789,10 +785,6 @@ static void __i915_vma_destroy(struct i915_vma *vma) GEM_BUG_ON(vma->node.allocated); GEM_BUG_ON(vma->fence); - mutex_lock(&vma->vm->mutex); - list_del(&vma->vm_link); - mutex_unlock(&vma->vm->mutex); - if (vma->obj) { struct drm_i915_gem_object *obj = vma->obj; diff --git a/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c b/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c index dda8b9c79c37..cb7193c9bc77 100644 --- a/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c +++ b/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c @@ -1239,7 +1239,7 @@ static void track_vma_bind(struct i915_vma *vma) vma->pages = obj->mm.pages; mutex_lock(&vma->vm->mutex); - list_move_tail(&vma->vm_link, &vma->vm->bound_list); + list_add_tail(&vma->vm_link, &vma->vm->bound_list); mutex_unlock(&vma->vm->mutex); } From patchwork Mon Jun 10 07:21:21 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Chris Wilson X-Patchwork-Id: 10984145 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id E2ABE6C5 for ; Mon, 10 Jun 2019 07:21:35 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id CC9CE286C2 for ; Mon, 10 Jun 2019 07:21:35 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id C0B7E28816; Mon, 10 Jun 2019 07:21:35 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-5.2 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_MED autolearn=ham version=3.3.1 Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177]) (using TLSv1.2 with cipher DHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.wl.linuxfoundation.org (Postfix) with ESMTPS id 4617C28812 for ; Mon, 10 Jun 2019 07:21:35 +0000 (UTC) Received: from gabe.freedesktop.org (localhost [127.0.0.1]) by gabe.freedesktop.org (Postfix) with ESMTP id 5497089100; Mon, 10 Jun 2019 07:21:34 +0000 (UTC) X-Original-To: intel-gfx@lists.freedesktop.org Delivered-To: intel-gfx@lists.freedesktop.org Received: from fireflyinternet.com (mail.fireflyinternet.com [109.228.58.192]) by gabe.freedesktop.org (Postfix) with ESMTPS id 5B0A188FA4 for ; Mon, 10 Jun 2019 07:21:32 +0000 (UTC) 
X-Default-Received-SPF: pass (skip=forwardok (res=PASS)) x-ip-name=78.156.65.138; Received: from haswell.alporthouse.com (unverified [78.156.65.138]) by fireflyinternet.com (Firefly Internet (M1)) with ESMTP id 16848376-1500050 for multiple; Mon, 10 Jun 2019 08:21:31 +0100 From: Chris Wilson To: intel-gfx@lists.freedesktop.org Date: Mon, 10 Jun 2019 08:21:21 +0100 Message-Id: <20190610072126.6355-24-chris@chris-wilson.co.uk> X-Mailer: git-send-email 2.20.1 In-Reply-To: <20190610072126.6355-1-chris@chris-wilson.co.uk> References: <20190610072126.6355-1-chris@chris-wilson.co.uk> MIME-Version: 1.0 Subject: [Intel-gfx] [PATCH 23/28] drm/i915: Propagate fence errors X-BeenThere: intel-gfx@lists.freedesktop.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: Intel graphics driver community testing & development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Matthew Auld Errors-To: intel-gfx-bounces@lists.freedesktop.org Sender: "Intel-gfx" X-Virus-Scanned: ClamAV using ClamSMTP Errors spread like wildfire, and must eventually be returned to the user. They need to be captured and passed along the flow of fences, infecting each in turn with the existing error, until finally they fall out of a user visible result. Signed-off-by: Chris Wilson Reviewed-by: Matthew Auld --- drivers/gpu/drm/i915/i915_request.c | 8 +++++++ drivers/gpu/drm/i915/i915_sw_fence.c | 23 +++++++++++++++---- drivers/gpu/drm/i915/i915_sw_fence.h | 7 ++++++ drivers/gpu/drm/i915/selftests/lib_sw_fence.c | 1 + 4 files changed, 34 insertions(+), 5 deletions(-) diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c index cbdf71e80616..595c29ec56b7 100644 --- a/drivers/gpu/drm/i915/i915_request.c +++ b/drivers/gpu/drm/i915/i915_request.c @@ -490,6 +490,10 @@ submit_notify(struct i915_sw_fence *fence, enum i915_sw_fence_notify state) switch (state) { case FENCE_COMPLETE: trace_i915_request_submit(request); + + if (unlikely(fence->error)) + i915_request_skip(request, fence->error); + /* * We need to serialize use of the submit_request() callback * with its hotplugging performed during an emergency @@ -1040,6 +1044,9 @@ void i915_request_skip(struct i915_request *rq, int error) GEM_BUG_ON(!IS_ERR_VALUE((long)error)); dma_fence_set_error(&rq->fence, error); + if (rq->infix == rq->postfix) + return; + /* * As this request likely depends on state from the lost * context, clear out all the user operations leaving the @@ -1051,6 +1058,7 @@ void i915_request_skip(struct i915_request *rq, int error) head = 0; } memset(vaddr + head, 0, rq->postfix - head); + rq->infix = rq->postfix; } static struct i915_request * diff --git a/drivers/gpu/drm/i915/i915_sw_fence.c b/drivers/gpu/drm/i915/i915_sw_fence.c index 5387aafd3424..dedacafc9442 100644 --- a/drivers/gpu/drm/i915/i915_sw_fence.c +++ b/drivers/gpu/drm/i915/i915_sw_fence.c @@ -157,8 +157,11 @@ static void __i915_sw_fence_wake_up_all(struct i915_sw_fence *fence, LIST_HEAD(extra); do { - list_for_each_entry_safe(pos, next, &x->head, entry) - pos->func(pos, TASK_NORMAL, 0, &extra); + list_for_each_entry_safe(pos, next, &x->head, entry) { + pos->func(pos, + TASK_NORMAL, fence->error, + &extra); + } if (list_empty(&extra)) break; @@ -219,6 +222,8 @@ void __i915_sw_fence_init(struct i915_sw_fence *fence, __init_waitqueue_head(&fence->wait, name, key); atomic_set(&fence->pending, 1); + fence->error = 0; + fence->flags = (unsigned long)fn; } @@ -230,6 +235,8 @@ void i915_sw_fence_commit(struct i915_sw_fence *fence) static int 
i915_sw_fence_wake(wait_queue_entry_t *wq, unsigned mode, int flags, void *key) { + i915_sw_fence_set_error_once(wq->private, flags); + list_del(&wq->entry); __i915_sw_fence_complete(wq->private, key); @@ -302,8 +309,10 @@ static int __i915_sw_fence_await_sw_fence(struct i915_sw_fence *fence, debug_fence_assert(fence); might_sleep_if(gfpflags_allow_blocking(gfp)); - if (i915_sw_fence_done(signaler)) + if (i915_sw_fence_done(signaler)) { + i915_sw_fence_set_error_once(fence, signaler->error); return 0; + } debug_fence_assert(signaler); @@ -319,6 +328,7 @@ static int __i915_sw_fence_await_sw_fence(struct i915_sw_fence *fence, return -ENOMEM; i915_sw_fence_wait(signaler); + i915_sw_fence_set_error_once(fence, signaler->error); return 0; } @@ -337,7 +347,7 @@ static int __i915_sw_fence_await_sw_fence(struct i915_sw_fence *fence, __add_wait_queue_entry_tail(&signaler->wait, wq); pending = 1; } else { - i915_sw_fence_wake(wq, 0, 0, NULL); + i915_sw_fence_wake(wq, 0, signaler->error, NULL); pending = 0; } spin_unlock_irqrestore(&signaler->wait.lock, flags); @@ -372,6 +382,7 @@ static void dma_i915_sw_fence_wake(struct dma_fence *dma, { struct i915_sw_dma_fence_cb *cb = container_of(data, typeof(*cb), base); + i915_sw_fence_set_error_once(cb->fence, dma->error); i915_sw_fence_complete(cb->fence); kfree(cb); } @@ -391,6 +402,7 @@ static void timer_i915_sw_fence_wake(struct timer_list *t) cb->dma->seqno, i915_sw_fence_debug_hint(fence)); + i915_sw_fence_set_error_once(fence, -ETIMEDOUT); i915_sw_fence_complete(fence); } @@ -480,6 +492,7 @@ static void __dma_i915_sw_fence_wake(struct dma_fence *dma, { struct i915_sw_dma_fence_cb *cb = container_of(data, typeof(*cb), base); + i915_sw_fence_set_error_once(cb->fence, dma->error); i915_sw_fence_complete(cb->fence); } @@ -501,7 +514,7 @@ int __i915_sw_fence_await_dma_fence(struct i915_sw_fence *fence, if (ret == 0) { ret = 1; } else { - i915_sw_fence_complete(fence); + __dma_i915_sw_fence_wake(dma, &cb->base); if (ret == -ENOENT) /* fence already signaled */ ret = 0; } diff --git a/drivers/gpu/drm/i915/i915_sw_fence.h b/drivers/gpu/drm/i915/i915_sw_fence.h index 9cb5c3b307a6..518cbaad9bea 100644 --- a/drivers/gpu/drm/i915/i915_sw_fence.h +++ b/drivers/gpu/drm/i915/i915_sw_fence.h @@ -22,6 +22,7 @@ struct i915_sw_fence { wait_queue_head_t wait; unsigned long flags; atomic_t pending; + int error; }; #define I915_SW_FENCE_CHECKED_BIT 0 /* used internally for DAG checking */ @@ -106,4 +107,10 @@ static inline void i915_sw_fence_wait(struct i915_sw_fence *fence) wait_event(fence->wait, i915_sw_fence_done(fence)); } +static inline void +i915_sw_fence_set_error_once(struct i915_sw_fence *fence, int error) +{ + cmpxchg(&fence->error, 0, error); +} + #endif /* _I915_SW_FENCE_H_ */ diff --git a/drivers/gpu/drm/i915/selftests/lib_sw_fence.c b/drivers/gpu/drm/i915/selftests/lib_sw_fence.c index b976c12817c5..080b90b63d16 100644 --- a/drivers/gpu/drm/i915/selftests/lib_sw_fence.c +++ b/drivers/gpu/drm/i915/selftests/lib_sw_fence.c @@ -40,6 +40,7 @@ void __onstack_fence_init(struct i915_sw_fence *fence, __init_waitqueue_head(&fence->wait, name, key); atomic_set(&fence->pending, 1); + fence->error = 0; fence->flags = (unsigned long)nop_fence_notify; } From patchwork Mon Jun 10 07:21:22 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Chris Wilson X-Patchwork-Id: 10984153 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by 
pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id E63911580 for ; Mon, 10 Jun 2019 07:21:40 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id CF817286C2 for ; Mon, 10 Jun 2019 07:21:40 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id C43922881C; Mon, 10 Jun 2019 07:21:40 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-5.2 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_MED autolearn=ham version=3.3.1 Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177]) (using TLSv1.2 with cipher DHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.wl.linuxfoundation.org (Postfix) with ESMTPS id 4CC06286C2 for ; Mon, 10 Jun 2019 07:21:40 +0000 (UTC) Received: from gabe.freedesktop.org (localhost [127.0.0.1]) by gabe.freedesktop.org (Postfix) with ESMTP id 0716E88FA4; Mon, 10 Jun 2019 07:21:39 +0000 (UTC) X-Original-To: intel-gfx@lists.freedesktop.org Delivered-To: intel-gfx@lists.freedesktop.org Received: from fireflyinternet.com (mail.fireflyinternet.com [109.228.58.192]) by gabe.freedesktop.org (Postfix) with ESMTPS id ADA088910A for ; Mon, 10 Jun 2019 07:21:36 +0000 (UTC) X-Default-Received-SPF: pass (skip=forwardok (res=PASS)) x-ip-name=78.156.65.138; Received: from haswell.alporthouse.com (unverified [78.156.65.138]) by fireflyinternet.com (Firefly Internet (M1)) with ESMTP id 16848377-1500050 for multiple; Mon, 10 Jun 2019 08:21:31 +0100 From: Chris Wilson To: intel-gfx@lists.freedesktop.org Date: Mon, 10 Jun 2019 08:21:22 +0100 Message-Id: <20190610072126.6355-25-chris@chris-wilson.co.uk> X-Mailer: git-send-email 2.20.1 In-Reply-To: <20190610072126.6355-1-chris@chris-wilson.co.uk> References: <20190610072126.6355-1-chris@chris-wilson.co.uk> MIME-Version: 1.0 Subject: [Intel-gfx] [PATCH 24/28] drm/i915: Allow page pinning to be in the background X-BeenThere: intel-gfx@lists.freedesktop.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: Intel graphics driver community testing & development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Matthew Auld Errors-To: intel-gfx-bounces@lists.freedesktop.org Sender: "Intel-gfx" X-Virus-Scanned: ClamAV using ClamSMTP Assume that pages may be pinned in a background task and use a completion event to synchronise with callers that must access the pages immediately. 
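
For illustration, a minimal sketch of the completion-based pattern described above, using made-up demo_* names rather than the driver's own i915_gem_object helpers: the async path only takes the pin, while callers that need the pages immediately wait on the completion and then check whether the background fill published an error.

/*
 * Illustrative sketch only, not the driver code: the demo_* names are
 * invented for this example; the patch applies the same shape to
 * drm_i915_gem_object and its obj->mm state.
 */
#include <linux/atomic.h>
#include <linux/completion.h>
#include <linux/err.h>
#include <linux/scatterlist.h>

struct demo_obj {
	atomic_t pages_pin_count;
	struct sg_table *pages;		/* NULL, ERR_PTR() or valid */
	struct completion completion;	/* signalled once ->pages is final */
};

static void demo_obj_init(struct demo_obj *obj)
{
	atomic_set(&obj->pages_pin_count, 0);
	obj->pages = NULL;
	init_completion(&obj->completion);
}

/* Background worker publishes its result, which may be an ERR_PTR(). */
static void demo_set_pages(struct demo_obj *obj, struct sg_table *pages)
{
	obj->pages = pages;
	complete_all(&obj->completion);
}

/* Take a pin without waiting; real code would also kick the worker. */
static int demo_pin_pages_async(struct demo_obj *obj)
{
	atomic_inc(&obj->pages_pin_count);
	return 0;
}

/* Callers that must touch the pages now wait for the background fill. */
static int demo_pin_pages(struct demo_obj *obj)
{
	int err;

	err = demo_pin_pages_async(obj);
	if (err)
		return err;

	err = wait_for_completion_interruptible(&obj->completion);
	if (!err && IS_ERR(obj->pages))
		err = PTR_ERR(obj->pages);
	if (err)
		atomic_dec(&obj->pages_pin_count);	/* drop the pin on failure */

	return err;
}

complete_all() rather than complete() releases every waiter already queued when the pages arrive, which matches the patch reinitialising the completion only when the pages are finally put.
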
Signed-off-by: Chris Wilson Reviewed-by: Matthew Auld --- drivers/gpu/drm/i915/gem/i915_gem_object.c | 1 + drivers/gpu/drm/i915/gem/i915_gem_object.h | 7 +-- .../gpu/drm/i915/gem/i915_gem_object_types.h | 3 ++ drivers/gpu/drm/i915/gem/i915_gem_pages.c | 53 +++++++++++++++---- 4 files changed, 52 insertions(+), 12 deletions(-) diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object.c b/drivers/gpu/drm/i915/gem/i915_gem_object.c index 5f75f69687e8..ee69cd7948c0 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_object.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_object.c @@ -65,6 +65,7 @@ void i915_gem_object_init(struct drm_i915_gem_object *obj, obj->mm.madv = I915_MADV_WILLNEED; INIT_RADIX_TREE(&obj->mm.get_page.radix, GFP_KERNEL | __GFP_NOWARN); mutex_init(&obj->mm.get_page.lock); + init_completion(&obj->mm.completion); } /** diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object.h b/drivers/gpu/drm/i915/gem/i915_gem_object.h index 67d70d144bd9..7ea8013d108f 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_object.h +++ b/drivers/gpu/drm/i915/gem/i915_gem_object.h @@ -234,7 +234,7 @@ int ____i915_gem_object_get_pages(struct drm_i915_gem_object *obj); int __i915_gem_object_get_pages(struct drm_i915_gem_object *obj); static inline int __must_check -i915_gem_object_pin_pages(struct drm_i915_gem_object *obj) +i915_gem_object_pin_pages_async(struct drm_i915_gem_object *obj) { might_lock(&obj->mm.lock); @@ -244,6 +244,9 @@ i915_gem_object_pin_pages(struct drm_i915_gem_object *obj) return __i915_gem_object_get_pages(obj); } +int __must_check +i915_gem_object_pin_pages(struct drm_i915_gem_object *obj); + static inline bool i915_gem_object_has_pages(struct drm_i915_gem_object *obj) { @@ -267,9 +270,7 @@ i915_gem_object_has_pinned_pages(struct drm_i915_gem_object *obj) static inline void __i915_gem_object_unpin_pages(struct drm_i915_gem_object *obj) { - GEM_BUG_ON(!i915_gem_object_has_pages(obj)); GEM_BUG_ON(!i915_gem_object_has_pinned_pages(obj)); - atomic_dec(&obj->mm.pages_pin_count); } diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object_types.h b/drivers/gpu/drm/i915/gem/i915_gem_object_types.h index e87fca4d8194..8f61d7a93078 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_object_types.h +++ b/drivers/gpu/drm/i915/gem/i915_gem_object_types.h @@ -7,6 +7,7 @@ #ifndef __I915_GEM_OBJECT_TYPES_H__ #define __I915_GEM_OBJECT_TYPES_H__ +#include #include #include @@ -210,6 +211,8 @@ struct drm_i915_gem_object { */ struct list_head link; + struct completion completion; + /** * Advice: are the backing pages purgeable? 
*/ diff --git a/drivers/gpu/drm/i915/gem/i915_gem_pages.c b/drivers/gpu/drm/i915/gem/i915_gem_pages.c index b36ad269f4ea..6bec301cee79 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_pages.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_pages.c @@ -73,21 +73,18 @@ void __i915_gem_object_set_pages(struct drm_i915_gem_object *obj, spin_unlock_irqrestore(&i915->mm.obj_lock, flags); } + + complete_all(&obj->mm.completion); } int ____i915_gem_object_get_pages(struct drm_i915_gem_object *obj) { - int err; - if (unlikely(obj->mm.madv != I915_MADV_WILLNEED)) { DRM_DEBUG("Attempting to obtain a purgeable object\n"); return -EFAULT; } - err = obj->ops->get_pages(obj); - GEM_BUG_ON(!err && !i915_gem_object_has_pages(obj)); - - return err; + return obj->ops->get_pages(obj); } /* Ensure that the associated pages are gathered from the backing storage @@ -105,7 +102,7 @@ int __i915_gem_object_get_pages(struct drm_i915_gem_object *obj) if (err) return err; - if (unlikely(!i915_gem_object_has_pages(obj))) { + if (!obj->mm.pages) { GEM_BUG_ON(i915_gem_object_has_pinned_pages(obj)); err = ____i915_gem_object_get_pages(obj); @@ -121,6 +118,32 @@ int __i915_gem_object_get_pages(struct drm_i915_gem_object *obj) return err; } +int i915_gem_object_pin_pages(struct drm_i915_gem_object *obj) +{ + int err; + + err = i915_gem_object_pin_pages_async(obj); + if (err) + return err; + + err = wait_for_completion_interruptible(&obj->mm.completion); + if (err) + goto err_unpin; + + if (IS_ERR(obj->mm.pages)) { + err = PTR_ERR(obj->mm.pages); + goto err_unpin; + } + + GEM_BUG_ON(!i915_gem_object_has_pages(obj)); + return 0; + +err_unpin: + GEM_BUG_ON(!i915_gem_object_has_pinned_pages(obj)); + atomic_dec(&obj->mm.pages_pin_count); + return err; +} + /* Immediately discard the backing storage */ void i915_gem_object_truncate(struct drm_i915_gem_object *obj) { @@ -201,6 +224,9 @@ int __i915_gem_object_put_pages(struct drm_i915_gem_object *obj, GEM_BUG_ON(atomic_read(&obj->bind_count)); + if (obj->mm.pages == ERR_PTR(-EAGAIN)) + wait_for_completion(&obj->mm.completion); + /* May be called by shrinker from within get_pages() (on another bo) */ mutex_lock_nested(&obj->mm.lock, subclass); if (unlikely(atomic_read(&obj->mm.pages_pin_count))) { @@ -227,6 +253,7 @@ int __i915_gem_object_put_pages(struct drm_i915_gem_object *obj, if (!IS_ERR(pages)) obj->ops->put_pages(obj, pages); + reinit_completion(&obj->mm.completion); err = 0; unlock: mutex_unlock(&obj->mm.lock); @@ -304,7 +331,7 @@ void *i915_gem_object_pin_map(struct drm_i915_gem_object *obj, type &= ~I915_MAP_OVERRIDE; if (!atomic_inc_not_zero(&obj->mm.pages_pin_count)) { - if (unlikely(!i915_gem_object_has_pages(obj))) { + if (!obj->mm.pages) { GEM_BUG_ON(i915_gem_object_has_pinned_pages(obj)); err = ____i915_gem_object_get_pages(obj); @@ -316,7 +343,6 @@ void *i915_gem_object_pin_map(struct drm_i915_gem_object *obj, atomic_inc(&obj->mm.pages_pin_count); pinned = false; } - GEM_BUG_ON(!i915_gem_object_has_pages(obj)); ptr = page_unpack_bits(obj->mm.mapping, &has_type); if (ptr && has_type != type) { @@ -334,6 +360,15 @@ void *i915_gem_object_pin_map(struct drm_i915_gem_object *obj, } if (!ptr) { + err = wait_for_completion_interruptible(&obj->mm.completion); + if (err) + goto err_unpin; + + if (IS_ERR(obj->mm.pages)) { + err = PTR_ERR(obj->mm.pages); + goto err_unpin; + } + ptr = i915_gem_object_map(obj, type); if (!ptr) { err = -ENOMEM; From patchwork Mon Jun 10 07:21:23 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit 
X-Patchwork-Submitter: Chris Wilson X-Patchwork-Id: 10984159 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 92F406C5 for ; Mon, 10 Jun 2019 07:21:44 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 786DD286C2 for ; Mon, 10 Jun 2019 07:21:44 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 6CB1628832; Mon, 10 Jun 2019 07:21:44 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-5.2 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_MED autolearn=ham version=3.3.1 Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177]) (using TLSv1.2 with cipher DHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.wl.linuxfoundation.org (Postfix) with ESMTPS id 26A5D286C2 for ; Mon, 10 Jun 2019 07:21:42 +0000 (UTC) Received: from gabe.freedesktop.org (localhost [127.0.0.1]) by gabe.freedesktop.org (Postfix) with ESMTP id 554AF8910D; Mon, 10 Jun 2019 07:21:40 +0000 (UTC) X-Original-To: intel-gfx@lists.freedesktop.org Delivered-To: intel-gfx@lists.freedesktop.org Received: from fireflyinternet.com (mail.fireflyinternet.com [109.228.58.192]) by gabe.freedesktop.org (Postfix) with ESMTPS id D513988FA4 for ; Mon, 10 Jun 2019 07:21:35 +0000 (UTC) X-Default-Received-SPF: pass (skip=forwardok (res=PASS)) x-ip-name=78.156.65.138; Received: from haswell.alporthouse.com (unverified [78.156.65.138]) by fireflyinternet.com (Firefly Internet (M1)) with ESMTP id 16848378-1500050 for multiple; Mon, 10 Jun 2019 08:21:31 +0100 From: Chris Wilson To: intel-gfx@lists.freedesktop.org Date: Mon, 10 Jun 2019 08:21:23 +0100 Message-Id: <20190610072126.6355-26-chris@chris-wilson.co.uk> X-Mailer: git-send-email 2.20.1 In-Reply-To: <20190610072126.6355-1-chris@chris-wilson.co.uk> References: <20190610072126.6355-1-chris@chris-wilson.co.uk> MIME-Version: 1.0 Subject: [Intel-gfx] [PATCH 25/28] drm/i915: Pull kref into i915_address_space X-BeenThere: intel-gfx@lists.freedesktop.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: Intel graphics driver community testing & development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Matthew Auld Errors-To: intel-gfx-bounces@lists.freedesktop.org Sender: "Intel-gfx" X-Virus-Scanned: ClamAV using ClamSMTP Make the kref common to both derived structs (i915_ggtt and i915_ppgtt) so that we can safely reference count an abstract ctx->vm address space. 
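
For illustration, the refcount-in-base-struct shape this moves to, sketched with made-up demo_* names (the patch applies it to i915_address_space, i915_ggtt and i915_hw_ppgtt): the kref lives in the common base, get/put operate on the base type, and downcasts go through container_of with the embedded base required to sit at offset zero.

/*
 * Illustrative sketch only, not the driver code: demo_vm / demo_ppgtt
 * stand in for i915_address_space / i915_hw_ppgtt.
 */
#include <linux/kernel.h>	/* container_of(), BUILD_BUG_ON() */
#include <linux/kref.h>
#include <linux/slab.h>
#include <linux/stddef.h>	/* offsetof() */
#include <linux/types.h>

struct demo_vm {
	struct kref ref;			/* shared by every derived space */
	void (*cleanup)(struct demo_vm *vm);
};

struct demo_ppgtt {
	struct demo_vm vm;			/* embedded base, kept first */
	u32 pd_dirty_engines;
};

static void demo_vm_init(struct demo_vm *vm, void (*cleanup)(struct demo_vm *))
{
	kref_init(&vm->ref);
	vm->cleanup = cleanup;
}

static struct demo_vm *demo_vm_get(struct demo_vm *vm)
{
	kref_get(&vm->ref);
	return vm;
}

static void demo_vm_release(struct kref *kref)
{
	struct demo_vm *vm = container_of(kref, struct demo_vm, ref);

	vm->cleanup(vm);
	kfree(vm);
}

static void demo_vm_put(struct demo_vm *vm)
{
	kref_put(&vm->ref, demo_vm_release);
}

/* Downcast, legal only for spaces known to be a ppgtt. */
static struct demo_ppgtt *demo_vm_to_ppgtt(struct demo_vm *vm)
{
	BUILD_BUG_ON(offsetof(struct demo_ppgtt, vm));
	return container_of(vm, struct demo_ppgtt, vm);
}

Keeping the embedded vm at offset zero (checked with BUILD_BUG_ON, as the patch does for both derived structs) makes the up- and downcasts free pointer conversions while still letting callers hold and refcount the abstract address space.
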
Signed-off-by: Chris Wilson Reviewed-by: Matthew Auld --- .../gpu/drm/i915/gem/i915_gem_client_blt.c | 4 +- drivers/gpu/drm/i915/gem/i915_gem_context.c | 132 +++++++++--------- .../gpu/drm/i915/gem/i915_gem_context_types.h | 6 +- .../gpu/drm/i915/gem/i915_gem_execbuffer.c | 4 +- .../gpu/drm/i915/gem/i915_gem_object_blt.c | 4 +- drivers/gpu/drm/i915/gem/i915_gem_userptr.c | 6 +- .../gpu/drm/i915/gem/selftests/huge_pages.c | 26 ++-- .../drm/i915/gem/selftests/i915_gem_context.c | 36 +++-- .../gpu/drm/i915/gem/selftests/mock_context.c | 3 +- drivers/gpu/drm/i915/gt/intel_lrc.c | 9 +- drivers/gpu/drm/i915/gt/intel_ringbuffer.c | 26 ++-- drivers/gpu/drm/i915/gt/selftest_hangcheck.c | 7 +- drivers/gpu/drm/i915/gt/selftest_lrc.c | 2 +- .../gpu/drm/i915/gt/selftest_workarounds.c | 10 +- drivers/gpu/drm/i915/gvt/scheduler.c | 8 +- drivers/gpu/drm/i915/i915_debugfs.c | 2 +- drivers/gpu/drm/i915/i915_drv.h | 6 - drivers/gpu/drm/i915/i915_gem_gtt.c | 33 ++--- drivers/gpu/drm/i915/i915_gem_gtt.h | 27 ++-- drivers/gpu/drm/i915/i915_gpu_error.c | 2 +- drivers/gpu/drm/i915/i915_trace.h | 2 +- drivers/gpu/drm/i915/selftests/i915_gem_gtt.c | 10 +- drivers/gpu/drm/i915/selftests/i915_request.c | 3 +- drivers/gpu/drm/i915/selftests/i915_vma.c | 4 +- drivers/gpu/drm/i915/selftests/igt_spinner.c | 5 +- drivers/gpu/drm/i915/selftests/mock_gtt.c | 1 - 26 files changed, 187 insertions(+), 191 deletions(-) diff --git a/drivers/gpu/drm/i915/gem/i915_gem_client_blt.c b/drivers/gpu/drm/i915/gem/i915_gem_client_blt.c index 4899ca1dd76c..f253ec5765ad 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_client_blt.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_client_blt.c @@ -250,13 +250,11 @@ int i915_gem_schedule_fill_pages_blt(struct drm_i915_gem_object *obj, { struct drm_i915_private *i915 = to_i915(obj->base.dev); struct i915_gem_context *ctx = ce->gem_context; - struct i915_address_space *vm; + struct i915_address_space *vm = ctx->vm ?: &i915->ggtt.vm; struct clear_pages_work *work; struct i915_sleeve *sleeve; int err; - vm = ctx->ppgtt ? 
&ctx->ppgtt->vm : &i915->ggtt.vm; - sleeve = create_sleeve(vm, obj, pages, page_sizes); if (IS_ERR(sleeve)) return PTR_ERR(sleeve); diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context.c b/drivers/gpu/drm/i915/gem/i915_gem_context.c index 2e684f8151d9..342407ec0119 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_context.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_context.c @@ -309,7 +309,8 @@ static void i915_gem_context_free(struct i915_gem_context *ctx) GEM_BUG_ON(!i915_gem_context_is_closed(ctx)); release_hw_id(ctx); - i915_ppgtt_put(ctx->ppgtt); + if (ctx->vm) + i915_vm_put(ctx->vm); free_engines(rcu_access_pointer(ctx->engines)); mutex_destroy(&ctx->engines_mutex); @@ -397,7 +398,7 @@ static void context_close(struct i915_gem_context *ctx) } static u32 default_desc_template(const struct drm_i915_private *i915, - const struct i915_hw_ppgtt *ppgtt) + const struct i915_address_space *vm) { u32 address_mode; u32 desc; @@ -405,7 +406,7 @@ static u32 default_desc_template(const struct drm_i915_private *i915, desc = GEN8_CTX_VALID | GEN8_CTX_PRIVILEGE; address_mode = INTEL_LEGACY_32B_CONTEXT; - if (ppgtt && i915_vm_is_4lvl(&ppgtt->vm)) + if (vm && i915_vm_is_4lvl(vm)) address_mode = INTEL_LEGACY_64B_CONTEXT; desc |= address_mode << GEN8_CTX_ADDRESSING_MODE_SHIFT; @@ -421,7 +422,7 @@ static u32 default_desc_template(const struct drm_i915_private *i915, } static struct i915_gem_context * -__create_context(struct drm_i915_private *dev_priv) +__create_context(struct drm_i915_private *i915) { struct i915_gem_context *ctx; struct i915_gem_engines *e; @@ -433,8 +434,8 @@ __create_context(struct drm_i915_private *dev_priv) return ERR_PTR(-ENOMEM); kref_init(&ctx->ref); - list_add_tail(&ctx->link, &dev_priv->contexts.list); - ctx->i915 = dev_priv; + list_add_tail(&ctx->link, &i915->contexts.list); + ctx->i915 = i915; ctx->sched.priority = I915_USER_PRIORITY(I915_PRIORITY_NORMAL); mutex_init(&ctx->mutex); @@ -452,14 +453,14 @@ __create_context(struct drm_i915_private *dev_priv) /* NB: Mark all slices as needing a remap so that when the context first * loads it will restore whatever remap state already exists. If there * is no remap info, it will be a NOP. 
*/ - ctx->remap_slice = ALL_L3_SLICES(dev_priv); + ctx->remap_slice = ALL_L3_SLICES(i915); i915_gem_context_set_bannable(ctx); i915_gem_context_set_recoverable(ctx); ctx->ring_size = 4 * PAGE_SIZE; ctx->desc_template = - default_desc_template(dev_priv, dev_priv->mm.aliasing_ppgtt); + default_desc_template(i915, &i915->mm.aliasing_ppgtt->vm); for (i = 0; i < ARRAY_SIZE(ctx->hang_timestamp); i++) ctx->hang_timestamp[i] = jiffies - CONTEXT_FAST_HANG_JIFFIES; @@ -471,26 +472,26 @@ __create_context(struct drm_i915_private *dev_priv) return ERR_PTR(err); } -static struct i915_hw_ppgtt * -__set_ppgtt(struct i915_gem_context *ctx, struct i915_hw_ppgtt *ppgtt) +static struct i915_address_space * +__set_ppgtt(struct i915_gem_context *ctx, struct i915_address_space *vm) { - struct i915_hw_ppgtt *old = ctx->ppgtt; + struct i915_address_space *old = ctx->vm; - ctx->ppgtt = i915_ppgtt_get(ppgtt); - ctx->desc_template = default_desc_template(ctx->i915, ppgtt); + ctx->vm = i915_vm_get(vm); + ctx->desc_template = default_desc_template(ctx->i915, vm); return old; } static void __assign_ppgtt(struct i915_gem_context *ctx, - struct i915_hw_ppgtt *ppgtt) + struct i915_address_space *vm) { - if (ppgtt == ctx->ppgtt) + if (vm == ctx->vm) return; - ppgtt = __set_ppgtt(ctx, ppgtt); - if (ppgtt) - i915_ppgtt_put(ppgtt); + vm = __set_ppgtt(ctx, vm); + if (vm) + i915_vm_put(vm); } static struct i915_gem_context * @@ -522,8 +523,8 @@ i915_gem_create_context(struct drm_i915_private *dev_priv, unsigned int flags) return ERR_CAST(ppgtt); } - __assign_ppgtt(ctx, ppgtt); - i915_ppgtt_put(ppgtt); + __assign_ppgtt(ctx, &ppgtt->vm); + i915_vm_put(&ppgtt->vm); } if (flags & I915_CONTEXT_CREATE_FLAGS_SINGLE_TIMELINE) { @@ -712,7 +713,7 @@ static int context_idr_cleanup(int id, void *p, void *data) static int vm_idr_cleanup(int id, void *p, void *data) { - i915_ppgtt_put(p); + i915_vm_put(p); return 0; } @@ -722,8 +723,8 @@ static int gem_context_register(struct i915_gem_context *ctx, int ret; ctx->file_priv = fpriv; - if (ctx->ppgtt) - ctx->ppgtt->vm.file = fpriv; + if (ctx->vm) + ctx->vm->file = fpriv; ctx->pid = get_task_pid(current, PIDTYPE_PID); ctx->name = kasprintf(GFP_KERNEL, "%s[%d]", @@ -833,7 +834,7 @@ int i915_gem_vm_create_ioctl(struct drm_device *dev, void *data, if (err) goto err_put; - err = idr_alloc(&file_priv->vm_idr, ppgtt, 0, 0, GFP_KERNEL); + err = idr_alloc(&file_priv->vm_idr, &ppgtt->vm, 0, 0, GFP_KERNEL); if (err < 0) goto err_unlock; @@ -847,7 +848,7 @@ int i915_gem_vm_create_ioctl(struct drm_device *dev, void *data, err_unlock: mutex_unlock(&file_priv->vm_idr_lock); err_put: - i915_ppgtt_put(ppgtt); + i915_vm_put(&ppgtt->vm); return err; } @@ -856,7 +857,7 @@ int i915_gem_vm_destroy_ioctl(struct drm_device *dev, void *data, { struct drm_i915_file_private *file_priv = file->driver_priv; struct drm_i915_gem_vm_control *args = data; - struct i915_hw_ppgtt *ppgtt; + struct i915_address_space *vm; int err; u32 id; @@ -874,13 +875,13 @@ int i915_gem_vm_destroy_ioctl(struct drm_device *dev, void *data, if (err) return err; - ppgtt = idr_remove(&file_priv->vm_idr, id); + vm = idr_remove(&file_priv->vm_idr, id); mutex_unlock(&file_priv->vm_idr_lock); - if (!ppgtt) + if (!vm) return -ENOENT; - i915_ppgtt_put(ppgtt); + i915_vm_put(vm); return 0; } @@ -974,10 +975,10 @@ static int get_ppgtt(struct drm_i915_file_private *file_priv, struct i915_gem_context *ctx, struct drm_i915_gem_context_param *args) { - struct i915_hw_ppgtt *ppgtt; + struct i915_address_space *vm; int ret; - if (!ctx->ppgtt) + if (!ctx->vm) 
return -ENODEV; /* XXX rcu acquire? */ @@ -985,19 +986,19 @@ static int get_ppgtt(struct drm_i915_file_private *file_priv, if (ret) return ret; - ppgtt = i915_ppgtt_get(ctx->ppgtt); + vm = i915_vm_get(ctx->vm); mutex_unlock(&ctx->i915->drm.struct_mutex); ret = mutex_lock_interruptible(&file_priv->vm_idr_lock); if (ret) goto err_put; - ret = idr_alloc(&file_priv->vm_idr, ppgtt, 0, 0, GFP_KERNEL); + ret = idr_alloc(&file_priv->vm_idr, vm, 0, 0, GFP_KERNEL); GEM_BUG_ON(!ret); if (ret < 0) goto err_unlock; - i915_ppgtt_get(ppgtt); + i915_vm_get(vm); args->size = 0; args->value = ret; @@ -1006,29 +1007,30 @@ static int get_ppgtt(struct drm_i915_file_private *file_priv, err_unlock: mutex_unlock(&file_priv->vm_idr_lock); err_put: - i915_ppgtt_put(ppgtt); + i915_vm_put(vm); return ret; } static void set_ppgtt_barrier(void *data) { - struct i915_hw_ppgtt *old = data; + struct i915_address_space *old = data; - if (INTEL_GEN(old->vm.i915) < 8) - gen6_ppgtt_unpin_all(old); + if (INTEL_GEN(old->i915) < 8) + gen6_ppgtt_unpin_all(i915_vm_to_ppgtt(old)); - i915_ppgtt_put(old); + i915_vm_put(old); } static int emit_ppgtt_update(struct i915_request *rq, void *data) { - struct i915_hw_ppgtt *ppgtt = rq->gem_context->ppgtt; + struct i915_address_space *vm = rq->gem_context->vm; struct intel_engine_cs *engine = rq->engine; u32 base = engine->mmio_base; u32 *cs; int i; - if (i915_vm_is_4lvl(&ppgtt->vm)) { + if (i915_vm_is_4lvl(vm)) { + struct i915_hw_ppgtt *ppgtt = i915_vm_to_ppgtt(vm); const dma_addr_t pd_daddr = px_dma(&ppgtt->pml4); cs = intel_ring_begin(rq, 6); @@ -1045,6 +1047,8 @@ static int emit_ppgtt_update(struct i915_request *rq, void *data) *cs++ = MI_NOOP; intel_ring_advance(rq, cs); } else if (HAS_LOGICAL_RING_CONTEXTS(engine->i915)) { + struct i915_hw_ppgtt *ppgtt = i915_vm_to_ppgtt(vm); + cs = intel_ring_begin(rq, 4 * GEN8_3LVL_PDPES + 2); if (IS_ERR(cs)) return PTR_ERR(cs); @@ -1062,7 +1066,7 @@ static int emit_ppgtt_update(struct i915_request *rq, void *data) intel_ring_advance(rq, cs); } else { /* ppGTT is not part of the legacy context image */ - gen6_ppgtt_pin(ppgtt); + gen6_ppgtt_pin(i915_vm_to_ppgtt(vm)); } return 0; @@ -1080,13 +1084,13 @@ static int set_ppgtt(struct drm_i915_file_private *file_priv, struct i915_gem_context *ctx, struct drm_i915_gem_context_param *args) { - struct i915_hw_ppgtt *ppgtt, *old; + struct i915_address_space *vm, *old; int err; if (args->size) return -EINVAL; - if (!ctx->ppgtt) + if (!ctx->vm) return -ENODEV; if (upper_32_bits(args->value)) @@ -1096,18 +1100,18 @@ static int set_ppgtt(struct drm_i915_file_private *file_priv, if (err) return err; - ppgtt = idr_find(&file_priv->vm_idr, args->value); - if (ppgtt) - i915_ppgtt_get(ppgtt); + vm = idr_find(&file_priv->vm_idr, args->value); + if (vm) + i915_vm_get(vm); mutex_unlock(&file_priv->vm_idr_lock); - if (!ppgtt) + if (!vm) return -ENOENT; err = mutex_lock_interruptible(&ctx->i915->drm.struct_mutex); if (err) goto out; - if (ppgtt == ctx->ppgtt) + if (vm == ctx->vm) goto unlock; /* Teardown the existing obj:vma cache, it will have to be rebuilt. 
*/ @@ -1115,7 +1119,7 @@ static int set_ppgtt(struct drm_i915_file_private *file_priv, lut_close(ctx); mutex_unlock(&ctx->mutex); - old = __set_ppgtt(ctx, ppgtt); + old = __set_ppgtt(ctx, vm); /* * We need to flush any requests using the current ppgtt before @@ -1128,16 +1132,16 @@ static int set_ppgtt(struct drm_i915_file_private *file_priv, set_ppgtt_barrier, old); if (err) { - ctx->ppgtt = old; + ctx->vm = old; ctx->desc_template = default_desc_template(ctx->i915, old); - i915_ppgtt_put(ppgtt); + i915_vm_put(vm); } unlock: mutex_unlock(&ctx->i915->drm.struct_mutex); out: - i915_ppgtt_put(ppgtt); + i915_vm_put(vm); return err; } @@ -2025,15 +2029,15 @@ static int clone_timeline(struct i915_gem_context *dst, static int clone_vm(struct i915_gem_context *dst, struct i915_gem_context *src) { - struct i915_hw_ppgtt *ppgtt; + struct i915_address_space *vm; rcu_read_lock(); do { - ppgtt = READ_ONCE(src->ppgtt); - if (!ppgtt) + vm = READ_ONCE(src->vm); + if (!vm) break; - if (!kref_get_unless_zero(&ppgtt->ref)) + if (!kref_get_unless_zero(&vm->ref)) continue; /* @@ -2051,16 +2055,16 @@ static int clone_vm(struct i915_gem_context *dst, * it cannot be reallocated elsewhere. */ - if (ppgtt == READ_ONCE(src->ppgtt)) + if (vm == READ_ONCE(src->vm)) break; - i915_ppgtt_put(ppgtt); + i915_vm_put(vm); } while (1); rcu_read_unlock(); - if (ppgtt) { - __assign_ppgtt(dst, ppgtt); - i915_ppgtt_put(ppgtt); + if (vm) { + __assign_ppgtt(dst, vm); + i915_vm_put(vm); } return 0; @@ -2285,8 +2289,8 @@ int i915_gem_context_getparam_ioctl(struct drm_device *dev, void *data, case I915_CONTEXT_PARAM_GTT_SIZE: args->size = 0; - if (ctx->ppgtt) - args->value = ctx->ppgtt->vm.total; + if (ctx->vm) + args->value = ctx->vm->total; else if (to_i915(dev)->mm.aliasing_ppgtt) args->value = to_i915(dev)->mm.aliasing_ppgtt->vm.total; else diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context_types.h b/drivers/gpu/drm/i915/gem/i915_gem_context_types.h index 3db7448b9732..cc513410eeef 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_context_types.h +++ b/drivers/gpu/drm/i915/gem/i915_gem_context_types.h @@ -25,7 +25,7 @@ struct pid; struct drm_i915_private; struct drm_i915_file_private; -struct i915_hw_ppgtt; +struct i915_address_space; struct i915_timeline; struct intel_ring; @@ -80,7 +80,7 @@ struct i915_gem_context { struct i915_timeline *timeline; /** - * @ppgtt: unique address space (GTT) + * @vm: unique address space (GTT) * * In full-ppgtt mode, each context has its own address space ensuring * complete seperation of one client from all others. @@ -88,7 +88,7 @@ struct i915_gem_context { * In other modes, this is a NULL pointer with the expectation that * the caller uses the shared global GTT. 
*/ - struct i915_hw_ppgtt *ppgtt; + struct i915_address_space *vm; /** * @pid: process id of creator diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c index 6193c81ebbed..eeb4c6cdb01d 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c @@ -723,8 +723,8 @@ static int eb_select_context(struct i915_execbuffer *eb) return -ENOENT; eb->gem_context = ctx; - if (ctx->ppgtt) { - eb->vm = &ctx->ppgtt->vm; + if (ctx->vm) { + eb->vm = ctx->vm; eb->invalid_flags |= EXEC_OBJECT_NEEDS_GTT; } else { eb->vm = &eb->i915->ggtt.vm; diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object_blt.c b/drivers/gpu/drm/i915/gem/i915_gem_object_blt.c index fc8ee7ef3d69..cb42e3a312e2 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_object_blt.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_object_blt.c @@ -49,14 +49,12 @@ int i915_gem_object_fill_blt(struct drm_i915_gem_object *obj, { struct drm_i915_private *i915 = to_i915(obj->base.dev); struct i915_gem_context *ctx = ce->gem_context; - struct i915_address_space *vm; + struct i915_address_space *vm = ctx->vm ?: &i915->ggtt.vm; struct i915_request *rq; struct i915_vma *vma; int err; /* XXX: ce->vm please */ - vm = ctx->ppgtt ? &ctx->ppgtt->vm : &i915->ggtt.vm; - vma = i915_vma_instance(obj, vm, NULL); if (IS_ERR(vma)) return PTR_ERR(vma); diff --git a/drivers/gpu/drm/i915/gem/i915_gem_userptr.c b/drivers/gpu/drm/i915/gem/i915_gem_userptr.c index cfa990edb351..528b61678334 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_userptr.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_userptr.c @@ -768,14 +768,14 @@ i915_gem_userptr_ioctl(struct drm_device *dev, return -EFAULT; if (args->flags & I915_USERPTR_READ_ONLY) { - struct i915_hw_ppgtt *ppgtt; + struct i915_address_space *vm; /* * On almost all of the older hw, we cannot tell the GPU that * a page is readonly. */ - ppgtt = dev_priv->kernel_context->ppgtt; - if (!ppgtt || !ppgtt->vm.has_read_only) + vm = dev_priv->kernel_context->vm; + if (!vm || !vm->has_read_only) return -ENODEV; } diff --git a/drivers/gpu/drm/i915/gem/selftests/huge_pages.c b/drivers/gpu/drm/i915/gem/selftests/huge_pages.c index ec2985c0a92e..232d5cf4396c 100644 --- a/drivers/gpu/drm/i915/gem/selftests/huge_pages.c +++ b/drivers/gpu/drm/i915/gem/selftests/huge_pages.c @@ -1038,8 +1038,7 @@ static int __igt_write_huge(struct i915_gem_context *ctx, u32 dword, u32 val) { struct drm_i915_private *i915 = to_i915(obj->base.dev); - struct i915_address_space *vm = - ctx->ppgtt ? &ctx->ppgtt->vm : &i915->ggtt.vm; + struct i915_address_space *vm = ctx->vm ?: &i915->ggtt.vm; unsigned int flags = PIN_USER | PIN_OFFSET_FIXED; struct i915_vma *vma; int err; @@ -1092,8 +1091,7 @@ static int igt_write_huge(struct i915_gem_context *ctx, struct drm_i915_gem_object *obj) { struct drm_i915_private *i915 = to_i915(obj->base.dev); - struct i915_address_space *vm = - ctx->ppgtt ? 
&ctx->ppgtt->vm : &i915->ggtt.vm; + struct i915_address_space *vm = ctx->vm ?: &i915->ggtt.vm; static struct intel_engine_cs *engines[I915_NUM_ENGINES]; struct intel_engine_cs *engine; I915_RND_STATE(prng); @@ -1419,7 +1417,7 @@ static int igt_ppgtt_pin_update(void *arg) struct i915_gem_context *ctx = arg; struct drm_i915_private *dev_priv = ctx->i915; unsigned long supported = INTEL_INFO(dev_priv)->page_sizes; - struct i915_hw_ppgtt *ppgtt = ctx->ppgtt; + struct i915_address_space *vm = ctx->vm; struct drm_i915_gem_object *obj; struct i915_vma *vma; unsigned int flags = PIN_USER | PIN_OFFSET_FIXED; @@ -1434,7 +1432,7 @@ static int igt_ppgtt_pin_update(void *arg) * huge-gtt-pages. */ - if (!ppgtt || !i915_vm_is_4lvl(&ppgtt->vm)) { + if (!vm || !i915_vm_is_4lvl(vm)) { pr_info("48b PPGTT not supported, skipping\n"); return 0; } @@ -1449,7 +1447,7 @@ static int igt_ppgtt_pin_update(void *arg) if (IS_ERR(obj)) return PTR_ERR(obj); - vma = i915_vma_instance(obj, &ppgtt->vm, NULL); + vma = i915_vma_instance(obj, vm, NULL); if (IS_ERR(vma)) { err = PTR_ERR(vma); goto out_put; @@ -1503,7 +1501,7 @@ static int igt_ppgtt_pin_update(void *arg) if (IS_ERR(obj)) return PTR_ERR(obj); - vma = i915_vma_instance(obj, &ppgtt->vm, NULL); + vma = i915_vma_instance(obj, vm, NULL); if (IS_ERR(vma)) { err = PTR_ERR(vma); goto out_put; @@ -1541,8 +1539,7 @@ static int igt_tmpfs_fallback(void *arg) struct i915_gem_context *ctx = arg; struct drm_i915_private *i915 = ctx->i915; struct vfsmount *gemfs = i915->mm.gemfs; - struct i915_address_space *vm = - ctx->ppgtt ? &ctx->ppgtt->vm : &i915->ggtt.vm; + struct i915_address_space *vm = ctx->vm ?: &i915->ggtt.vm; struct drm_i915_gem_object *obj; struct i915_vma *vma; u32 *vaddr; @@ -1599,8 +1596,7 @@ static int igt_shrink_thp(void *arg) { struct i915_gem_context *ctx = arg; struct drm_i915_private *i915 = ctx->i915; - struct i915_address_space *vm = - ctx->ppgtt ? &ctx->ppgtt->vm : &i915->ggtt.vm; + struct i915_address_space *vm = ctx->vm ?: &i915->ggtt.vm; struct drm_i915_gem_object *obj; struct i915_vma *vma; unsigned int flags = PIN_USER; @@ -1721,7 +1717,7 @@ int i915_gem_huge_page_mock_selftests(void) err = i915_subtests(tests, ppgtt); out_close: - i915_ppgtt_put(ppgtt); + i915_vm_put(&ppgtt->vm); out_unlock: mutex_unlock(&dev_priv->drm.struct_mutex); @@ -1766,8 +1762,8 @@ int i915_gem_huge_page_live_selftests(struct drm_i915_private *dev_priv) goto out_unlock; } - if (ctx->ppgtt) - ctx->ppgtt->vm.scrub_64K = true; + if (ctx->vm) + ctx->vm->scrub_64K = true; err = i915_subtests(tests, ctx); diff --git a/drivers/gpu/drm/i915/gem/selftests/i915_gem_context.c b/drivers/gpu/drm/i915/gem/selftests/i915_gem_context.c index 41105f6ed206..74b0e5871c4b 100644 --- a/drivers/gpu/drm/i915/gem/selftests/i915_gem_context.c +++ b/drivers/gpu/drm/i915/gem/selftests/i915_gem_context.c @@ -248,8 +248,7 @@ static int gpu_fill(struct drm_i915_gem_object *obj, unsigned int dw) { struct drm_i915_private *i915 = to_i915(obj->base.dev); - struct i915_address_space *vm = - ctx->ppgtt ? &ctx->ppgtt->vm : &i915->ggtt.vm; + struct i915_address_space *vm = ctx->vm ?: &i915->ggtt.vm; struct i915_request *rq; struct i915_vma *vma; struct i915_vma *batch; @@ -438,8 +437,7 @@ create_test_object(struct i915_gem_context *ctx, struct list_head *objects) { struct drm_i915_gem_object *obj; - struct i915_address_space *vm = - ctx->ppgtt ? 
&ctx->ppgtt->vm : &ctx->i915->ggtt.vm; + struct i915_address_space *vm = ctx->vm ?: &ctx->i915->ggtt.vm; u64 size; int err; @@ -541,7 +539,7 @@ static int igt_ctx_exec(void *arg) pr_err("Failed to fill dword %lu [%lu/%lu] with gpu (%s) in ctx %u [full-ppgtt? %s], err=%d\n", ndwords, dw, max_dwords(obj), engine->name, ctx->hw_id, - yesno(!!ctx->ppgtt), err); + yesno(!!ctx->vm), err); goto out_unlock; } @@ -612,7 +610,7 @@ static int igt_shared_ctx_exec(void *arg) goto out_unlock; } - if (!parent->ppgtt) { /* not full-ppgtt; nothing to share */ + if (!parent->vm) { /* not full-ppgtt; nothing to share */ err = 0; goto out_unlock; } @@ -643,7 +641,7 @@ static int igt_shared_ctx_exec(void *arg) goto out_test; } - __assign_ppgtt(ctx, parent->ppgtt); + __assign_ppgtt(ctx, parent->vm); if (!obj) { obj = create_test_object(parent, file, &objects); @@ -661,7 +659,7 @@ static int igt_shared_ctx_exec(void *arg) pr_err("Failed to fill dword %lu [%lu/%lu] with gpu (%s) in ctx %u [full-ppgtt? %s], err=%d\n", ndwords, dw, max_dwords(obj), engine->name, ctx->hw_id, - yesno(!!ctx->ppgtt), err); + yesno(!!ctx->vm), err); kernel_context_close(ctx); goto out_test; } @@ -758,7 +756,7 @@ emit_rpcs_query(struct drm_i915_gem_object *obj, GEM_BUG_ON(!intel_engine_can_store_dword(ce->engine)); - vma = i915_vma_instance(obj, &ce->gem_context->ppgtt->vm, NULL); + vma = i915_vma_instance(obj, ce->gem_context->vm, NULL); if (IS_ERR(vma)) return PTR_ERR(vma); @@ -1176,8 +1174,8 @@ static int igt_ctx_readonly(void *arg) { struct drm_i915_private *i915 = arg; struct drm_i915_gem_object *obj = NULL; + struct i915_address_space *vm; struct i915_gem_context *ctx; - struct i915_hw_ppgtt *ppgtt; unsigned long idx, ndwords, dw; struct igt_live_test t; struct drm_file *file; @@ -1208,8 +1206,8 @@ static int igt_ctx_readonly(void *arg) goto out_unlock; } - ppgtt = ctx->ppgtt ?: i915->mm.aliasing_ppgtt; - if (!ppgtt || !ppgtt->vm.has_read_only) { + vm = ctx->vm ?: &i915->mm.aliasing_ppgtt->vm; + if (!vm || !vm->has_read_only) { err = 0; goto out_unlock; } @@ -1244,7 +1242,7 @@ static int igt_ctx_readonly(void *arg) pr_err("Failed to fill dword %lu [%lu/%lu] with gpu (%s) in ctx %u [full-ppgtt? 
%s], err=%d\n", ndwords, dw, max_dwords(obj), engine->name, ctx->hw_id, - yesno(!!ctx->ppgtt), err); + yesno(!!ctx->vm), err); goto out_unlock; } @@ -1288,7 +1286,7 @@ static int igt_ctx_readonly(void *arg) static int check_scratch(struct i915_gem_context *ctx, u64 offset) { struct drm_mm_node *node = - __drm_mm_interval_first(&ctx->ppgtt->vm.mm, + __drm_mm_interval_first(&ctx->vm->mm, offset, offset + sizeof(u32) - 1); if (!node || node->start > offset) return 0; @@ -1336,7 +1334,7 @@ static int write_to_scratch(struct i915_gem_context *ctx, __i915_gem_object_flush_map(obj, 0, 64); i915_gem_object_unpin_map(obj); - vma = i915_vma_instance(obj, &ctx->ppgtt->vm, NULL); + vma = i915_vma_instance(obj, ctx->vm, NULL); if (IS_ERR(vma)) { err = PTR_ERR(vma); goto err; @@ -1433,7 +1431,7 @@ static int read_from_scratch(struct i915_gem_context *ctx, i915_gem_object_flush_map(obj); i915_gem_object_unpin_map(obj); - vma = i915_vma_instance(obj, &ctx->ppgtt->vm, NULL); + vma = i915_vma_instance(obj, ctx->vm, NULL); if (IS_ERR(vma)) { err = PTR_ERR(vma); goto err; @@ -1542,11 +1540,11 @@ static int igt_vm_isolation(void *arg) } /* We can only test vm isolation, if the vm are distinct */ - if (ctx_a->ppgtt == ctx_b->ppgtt) + if (ctx_a->vm == ctx_b->vm) goto out_unlock; - vm_total = ctx_a->ppgtt->vm.total; - GEM_BUG_ON(ctx_b->ppgtt->vm.total != vm_total); + vm_total = ctx_a->vm->total; + GEM_BUG_ON(ctx_b->vm->total != vm_total); vm_total -= I915_GTT_PAGE_SIZE; wakeref = intel_runtime_pm_get(i915); diff --git a/drivers/gpu/drm/i915/gem/selftests/mock_context.c b/drivers/gpu/drm/i915/gem/selftests/mock_context.c index 6578f2f6c3f8..82371c60d4aa 100644 --- a/drivers/gpu/drm/i915/gem/selftests/mock_context.c +++ b/drivers/gpu/drm/i915/gem/selftests/mock_context.c @@ -48,7 +48,8 @@ mock_context(struct drm_i915_private *i915, if (!ppgtt) goto err_put; - __set_ppgtt(ctx, ppgtt); + __set_ppgtt(ctx, &ppgtt->vm); + i915_vm_put(&ppgtt->vm); } return ctx; diff --git a/drivers/gpu/drm/i915/gt/intel_lrc.c b/drivers/gpu/drm/i915/gt/intel_lrc.c index 826fe77158a9..59ddd02d1149 100644 --- a/drivers/gpu/drm/i915/gt/intel_lrc.c +++ b/drivers/gpu/drm/i915/gt/intel_lrc.c @@ -1551,7 +1551,7 @@ __execlists_context_pin(struct intel_context *ce, void *vaddr; int ret; - GEM_BUG_ON(!ce->gem_context->ppgtt); + GEM_BUG_ON(!ce->gem_context->vm); ret = execlists_context_deferred_alloc(ce, engine); if (ret) @@ -1668,7 +1668,8 @@ static int gen8_emit_init_breadcrumb(struct i915_request *rq) static int emit_pdps(struct i915_request *rq) { const struct intel_engine_cs * const engine = rq->engine; - struct i915_hw_ppgtt * const ppgtt = rq->gem_context->ppgtt; + struct i915_hw_ppgtt * const ppgtt = + i915_vm_to_ppgtt(rq->gem_context->vm); int err, i; u32 *cs; @@ -1741,7 +1742,7 @@ static int execlists_request_alloc(struct i915_request *request) */ /* Unconditionally invalidate GPU caches and TLBs. 
*/ - if (i915_vm_is_4lvl(&request->gem_context->ppgtt->vm)) + if (i915_vm_is_4lvl(request->gem_context->vm)) ret = request->engine->emit_flush(request, EMIT_INVALIDATE); else ret = emit_pdps(request); @@ -2867,7 +2868,7 @@ static void execlists_init_reg_state(u32 *regs, struct intel_engine_cs *engine, struct intel_ring *ring) { - struct i915_hw_ppgtt *ppgtt = ce->gem_context->ppgtt; + struct i915_hw_ppgtt *ppgtt = i915_vm_to_ppgtt(ce->gem_context->vm); bool rcs = engine->class == RENDER_CLASS; u32 base = engine->mmio_base; diff --git a/drivers/gpu/drm/i915/gt/intel_ringbuffer.c b/drivers/gpu/drm/i915/gt/intel_ringbuffer.c index 69718961aae8..d96bb679c2b0 100644 --- a/drivers/gpu/drm/i915/gt/intel_ringbuffer.c +++ b/drivers/gpu/drm/i915/gt/intel_ringbuffer.c @@ -1326,23 +1326,23 @@ static void ring_context_destroy(struct kref *ref) static int __context_pin_ppgtt(struct i915_gem_context *ctx) { - struct i915_hw_ppgtt *ppgtt; + struct i915_address_space *vm; int err = 0; - ppgtt = ctx->ppgtt ?: ctx->i915->mm.aliasing_ppgtt; - if (ppgtt) - err = gen6_ppgtt_pin(ppgtt); + vm = ctx->vm ?: &ctx->i915->mm.aliasing_ppgtt->vm; + if (vm) + err = gen6_ppgtt_pin(i915_vm_to_ppgtt((vm))); return err; } static void __context_unpin_ppgtt(struct i915_gem_context *ctx) { - struct i915_hw_ppgtt *ppgtt; + struct i915_address_space *vm; - ppgtt = ctx->ppgtt ?: ctx->i915->mm.aliasing_ppgtt; - if (ppgtt) - gen6_ppgtt_unpin(ppgtt); + vm = ctx->vm ?: &ctx->i915->mm.aliasing_ppgtt->vm; + if (vm) + gen6_ppgtt_unpin(i915_vm_to_ppgtt(vm)); } static void ring_context_unpin(struct intel_context *ce) @@ -1664,14 +1664,16 @@ static int switch_context(struct i915_request *rq) { struct intel_engine_cs *engine = rq->engine; struct i915_gem_context *ctx = rq->gem_context; - struct i915_hw_ppgtt *ppgtt = ctx->ppgtt ?: rq->i915->mm.aliasing_ppgtt; + struct i915_address_space *vm = + ctx->vm ?: &rq->i915->mm.aliasing_ppgtt->vm; unsigned int unwind_mm = 0; u32 hw_flags = 0; int ret, i; GEM_BUG_ON(HAS_EXECLISTS(rq->i915)); - if (ppgtt) { + if (vm) { + struct i915_hw_ppgtt *ppgtt = i915_vm_to_ppgtt(vm); int loops; /* @@ -1718,7 +1720,7 @@ static int switch_context(struct i915_request *rq) goto err_mm; } - if (ppgtt) { + if (vm) { ret = engine->emit_flush(rq, EMIT_INVALIDATE); if (ret) goto err_mm; @@ -1761,7 +1763,7 @@ static int switch_context(struct i915_request *rq) err_mm: if (unwind_mm) - ppgtt->pd_dirty_engines |= unwind_mm; + i915_vm_to_ppgtt(vm)->pd_dirty_engines |= unwind_mm; err: return ret; } diff --git a/drivers/gpu/drm/i915/gt/selftest_hangcheck.c b/drivers/gpu/drm/i915/gt/selftest_hangcheck.c index c6016398c7e9..458c6559312d 100644 --- a/drivers/gpu/drm/i915/gt/selftest_hangcheck.c +++ b/drivers/gpu/drm/i915/gt/selftest_hangcheck.c @@ -128,8 +128,7 @@ static struct i915_request * hang_create_request(struct hang *h, struct intel_engine_cs *engine) { struct drm_i915_private *i915 = h->i915; - struct i915_address_space *vm = - h->ctx->ppgtt ? 
&h->ctx->ppgtt->vm : &i915->ggtt.vm; + struct i915_address_space *vm = h->ctx->vm ?: &i915->ggtt.vm; struct drm_i915_gem_object *obj; struct i915_request *rq = NULL; struct i915_vma *hws, *vma; @@ -1357,8 +1356,8 @@ static int igt_reset_evict_ppgtt(void *arg) } err = 0; - if (ctx->ppgtt) /* aliasing == global gtt locking, covered above */ - err = __igt_reset_evict_vma(i915, &ctx->ppgtt->vm, + if (ctx->vm) /* aliasing == global gtt locking, covered above */ + err = __igt_reset_evict_vma(i915, ctx->vm, evict_vma, EXEC_OBJECT_WRITE); out: diff --git a/drivers/gpu/drm/i915/gt/selftest_lrc.c b/drivers/gpu/drm/i915/gt/selftest_lrc.c index 9fc03a400685..77fc988f1886 100644 --- a/drivers/gpu/drm/i915/gt/selftest_lrc.c +++ b/drivers/gpu/drm/i915/gt/selftest_lrc.c @@ -1311,7 +1311,7 @@ static int smoke_submit(struct preempt_smoke *smoke, int err = 0; if (batch) { - vma = i915_vma_instance(batch, &ctx->ppgtt->vm, NULL); + vma = i915_vma_instance(batch, ctx->vm, NULL); if (IS_ERR(vma)) return PTR_ERR(vma); diff --git a/drivers/gpu/drm/i915/gt/selftest_workarounds.c b/drivers/gpu/drm/i915/gt/selftest_workarounds.c index 2cb1519fde42..c8d335d63f9c 100644 --- a/drivers/gpu/drm/i915/gt/selftest_workarounds.c +++ b/drivers/gpu/drm/i915/gt/selftest_workarounds.c @@ -358,7 +358,7 @@ static struct i915_vma *create_batch(struct i915_gem_context *ctx) if (IS_ERR(obj)) return ERR_CAST(obj); - vma = i915_vma_instance(obj, &ctx->ppgtt->vm, NULL); + vma = i915_vma_instance(obj, ctx->vm, NULL); if (IS_ERR(vma)) { err = PTR_ERR(vma); goto err_obj; @@ -442,7 +442,7 @@ static int check_dirty_whitelist(struct i915_gem_context *ctx, int err = 0, i, v; u32 *cs, *results; - scratch = create_scratch(&ctx->ppgtt->vm, 2 * ARRAY_SIZE(values) + 1); + scratch = create_scratch(ctx->vm, 2 * ARRAY_SIZE(values) + 1); if (IS_ERR(scratch)) return PTR_ERR(scratch); @@ -925,7 +925,7 @@ static int live_isolated_whitelist(void *arg) if (!intel_engines_has_context_isolation(i915)) return 0; - if (!i915->kernel_context->ppgtt) + if (!i915->kernel_context->vm) return 0; for (i = 0; i < ARRAY_SIZE(client); i++) { @@ -937,14 +937,14 @@ static int live_isolated_whitelist(void *arg) goto err; } - client[i].scratch[0] = create_scratch(&c->ppgtt->vm, 1024); + client[i].scratch[0] = create_scratch(c->vm, 1024); if (IS_ERR(client[i].scratch[0])) { err = PTR_ERR(client[i].scratch[0]); kernel_context_close(c); goto err; } - client[i].scratch[1] = create_scratch(&c->ppgtt->vm, 1024); + client[i].scratch[1] = create_scratch(c->vm, 1024); if (IS_ERR(client[i].scratch[1])) { err = PTR_ERR(client[i].scratch[1]); i915_vma_unpin_and_release(&client[i].scratch[0], 0); diff --git a/drivers/gpu/drm/i915/gvt/scheduler.c b/drivers/gpu/drm/i915/gvt/scheduler.c index 31752b7ebff5..e1efff5e6424 100644 --- a/drivers/gpu/drm/i915/gvt/scheduler.c +++ b/drivers/gpu/drm/i915/gvt/scheduler.c @@ -368,7 +368,7 @@ static int set_context_ppgtt_from_shadow(struct intel_vgpu_workload *workload, struct i915_gem_context *ctx) { struct intel_vgpu_mm *mm = workload->shadow_mm; - struct i915_hw_ppgtt *ppgtt = ctx->ppgtt; + struct i915_hw_ppgtt *ppgtt = i915_vm_to_ppgtt(ctx->vm); int i = 0; if (mm->type != INTEL_GVT_MM_PPGTT || !mm->ppgtt_mm.shadowed) @@ -1148,7 +1148,7 @@ void intel_vgpu_clean_submission(struct intel_vgpu *vgpu) intel_vgpu_select_submission_ops(vgpu, ALL_ENGINES, 0); - i915_context_ppgtt_root_restore(s, s->shadow[0]->gem_context->ppgtt); + i915_context_ppgtt_root_restore(s, i915_vm_to_ppgtt(s->shadow[0]->gem_context->vm)); for_each_engine(engine, 
vgpu->gvt->dev_priv, id) intel_context_unpin(s->shadow[id]); @@ -1213,7 +1213,7 @@ int intel_vgpu_setup_submission(struct intel_vgpu *vgpu) if (IS_ERR(ctx)) return PTR_ERR(ctx); - i915_context_ppgtt_root_save(s, ctx->ppgtt); + i915_context_ppgtt_root_save(s, i915_vm_to_ppgtt(ctx->vm)); for_each_engine(engine, vgpu->gvt->dev_priv, i) { struct intel_context *ce; @@ -1256,7 +1256,7 @@ int intel_vgpu_setup_submission(struct intel_vgpu *vgpu) return 0; out_shadow_ctx: - i915_context_ppgtt_root_restore(s, ctx->ppgtt); + i915_context_ppgtt_root_restore(s, i915_vm_to_ppgtt(ctx->vm)); for_each_engine(engine, vgpu->gvt->dev_priv, i) { if (IS_ERR(s->shadow[i])) break; diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c index a2462d6ef565..e2d6b8dff4ab 100644 --- a/drivers/gpu/drm/i915/i915_debugfs.c +++ b/drivers/gpu/drm/i915/i915_debugfs.c @@ -328,7 +328,7 @@ static void print_context_stats(struct seq_file *m, i915_gem_context_unlock_engines(ctx); if (!IS_ERR_OR_NULL(ctx->file_priv)) { - struct file_stats stats = { .vm = &ctx->ppgtt->vm, }; + struct file_stats stats = { .vm = ctx->vm, }; struct drm_file *file = ctx->file_priv->file; struct task_struct *task; char name[80]; diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h index 8944b65af138..a043f920694e 100644 --- a/drivers/gpu/drm/i915/i915_drv.h +++ b/drivers/gpu/drm/i915/i915_drv.h @@ -2639,12 +2639,6 @@ struct drm_gem_object *i915_gem_prime_import(struct drm_device *dev, struct dma_buf *i915_gem_prime_export(struct drm_device *dev, struct drm_gem_object *gem_obj, int flags); -static inline struct i915_hw_ppgtt * -i915_vm_to_ppgtt(struct i915_address_space *vm) -{ - return container_of(vm, struct i915_hw_ppgtt, vm); -} - static inline struct i915_gem_context * __i915_gem_context_lookup_rcu(struct drm_i915_file_private *file_priv, u32 id) { diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c index 429817c1d2b8..fd8e68c55585 100644 --- a/drivers/gpu/drm/i915/i915_gem_gtt.c +++ b/drivers/gpu/drm/i915/i915_gem_gtt.c @@ -483,6 +483,8 @@ static void vm_free_page(struct i915_address_space *vm, struct page *page) static void i915_address_space_init(struct i915_address_space *vm, int subclass) { + kref_init(&vm->ref); + /* * The vm->mutex must be reclaim safe (for use in the shrinker). 
* Do a dummy acquire now under fs_reclaim so that any allocation @@ -1229,9 +1231,8 @@ static int gen8_init_scratch(struct i915_address_space *vm) */ if (vm->has_read_only && vm->i915->kernel_context && - vm->i915->kernel_context->ppgtt) { - struct i915_address_space *clone = - &vm->i915->kernel_context->ppgtt->vm; + vm->i915->kernel_context->vm) { + struct i915_address_space *clone = vm->i915->kernel_context->vm; GEM_BUG_ON(!clone->has_read_only); @@ -1590,8 +1591,6 @@ static int gen8_preallocate_top_level_pdp(struct i915_hw_ppgtt *ppgtt) static void ppgtt_init(struct drm_i915_private *i915, struct i915_hw_ppgtt *ppgtt) { - kref_init(&ppgtt->ref); - ppgtt->vm.i915 = i915; ppgtt->vm.dma = &i915->drm.pdev->dev; ppgtt->vm.total = BIT_ULL(INTEL_INFO(i915)->ppgtt_size); @@ -2258,20 +2257,22 @@ static void ppgtt_destroy_vma(struct i915_address_space *vm) i915_vma_destroy(vma); } -void i915_ppgtt_release(struct kref *kref) +void i915_vm_release(struct kref *kref) { - struct i915_hw_ppgtt *ppgtt = - container_of(kref, struct i915_hw_ppgtt, ref); + struct i915_address_space *vm = + container_of(kref, struct i915_address_space, ref); - trace_i915_ppgtt_release(&ppgtt->vm); + GEM_BUG_ON(i915_is_ggtt(vm)); + trace_i915_ppgtt_release(vm); - ppgtt_destroy_vma(&ppgtt->vm); + ppgtt_destroy_vma(vm); - GEM_BUG_ON(!list_empty(&ppgtt->vm.bound_list)); + GEM_BUG_ON(!list_empty(&vm->bound_list)); - ppgtt->vm.cleanup(&ppgtt->vm); - i915_address_space_fini(&ppgtt->vm); - kfree(ppgtt); + vm->cleanup(vm); + i915_address_space_fini(vm); + + kfree(vm); } /* Certain Gen5 chipsets require require idling the GPU before @@ -2836,7 +2837,7 @@ static int init_aliasing_ppgtt(struct drm_i915_private *i915) return 0; err_ppgtt: - i915_ppgtt_put(ppgtt); + i915_vm_put(&ppgtt->vm); return err; } @@ -2849,7 +2850,7 @@ static void fini_aliasing_ppgtt(struct drm_i915_private *i915) if (!ppgtt) return; - i915_ppgtt_put(ppgtt); + i915_vm_put(&ppgtt->vm); ggtt->vm.vma_ops.bind_vma = ggtt_bind_vma; ggtt->vm.vma_ops.unbind_vma = ggtt_unbind_vma; diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.h b/drivers/gpu/drm/i915/i915_gem_gtt.h index 939f2f13ef7d..f56b19eb79c4 100644 --- a/drivers/gpu/drm/i915/i915_gem_gtt.h +++ b/drivers/gpu/drm/i915/i915_gem_gtt.h @@ -294,6 +294,8 @@ struct pagestash { }; struct i915_address_space { + struct kref ref; + struct drm_mm mm; struct drm_i915_private *i915; struct device *dma; @@ -420,7 +422,6 @@ struct i915_ggtt { struct i915_hw_ppgtt { struct i915_address_space vm; - struct kref ref; intel_engine_mask_t pd_dirty_engines; union { @@ -590,10 +591,19 @@ i915_page_dir_dma_addr(const struct i915_hw_ppgtt *ppgtt, const unsigned n) static inline struct i915_ggtt * i915_vm_to_ggtt(struct i915_address_space *vm) { + BUILD_BUG_ON(offsetof(struct i915_ggtt, vm)); GEM_BUG_ON(!i915_is_ggtt(vm)); return container_of(vm, struct i915_ggtt, vm); } +static inline struct i915_hw_ppgtt * +i915_vm_to_ppgtt(struct i915_address_space *vm) +{ + BUILD_BUG_ON(offsetof(struct i915_hw_ppgtt, vm)); + GEM_BUG_ON(i915_is_ggtt(vm)); + return container_of(vm, struct i915_hw_ppgtt, vm); +} + #define INTEL_MAX_PPAT_ENTRIES 8 #define INTEL_PPAT_PERFECT_MATCH (~0U) @@ -636,18 +646,19 @@ void i915_ggtt_cleanup_hw(struct drm_i915_private *dev_priv); int i915_ppgtt_init_hw(struct drm_i915_private *dev_priv); struct i915_hw_ppgtt *i915_ppgtt_create(struct drm_i915_private *dev_priv); -void i915_ppgtt_release(struct kref *kref); -static inline struct i915_hw_ppgtt *i915_ppgtt_get(struct i915_hw_ppgtt *ppgtt) +static inline struct 
i915_address_space * +i915_vm_get(struct i915_address_space *vm) { - kref_get(&ppgtt->ref); - return ppgtt; + kref_get(&vm->ref); + return vm; } -static inline void i915_ppgtt_put(struct i915_hw_ppgtt *ppgtt) +void i915_vm_release(struct kref *kref); + +static inline void i915_vm_put(struct i915_address_space *vm) { - if (ppgtt) - kref_put(&ppgtt->ref, i915_ppgtt_release); + kref_put(&vm->ref, i915_vm_release); } int gen6_ppgtt_pin(struct i915_hw_ppgtt *base); diff --git a/drivers/gpu/drm/i915/i915_gpu_error.c b/drivers/gpu/drm/i915/i915_gpu_error.c index 4139b9762b44..fba95bb0bc6a 100644 --- a/drivers/gpu/drm/i915/i915_gpu_error.c +++ b/drivers/gpu/drm/i915/i915_gpu_error.c @@ -1421,7 +1421,7 @@ static void gem_record_rings(struct i915_gpu_state *error) struct i915_gem_context *ctx = request->gem_context; struct intel_ring *ring; - ee->vm = ctx->ppgtt ? &ctx->ppgtt->vm : &ggtt->vm; + ee->vm = ctx->vm ?: &ggtt->vm; record_context(&ee->context, ctx); diff --git a/drivers/gpu/drm/i915/i915_trace.h b/drivers/gpu/drm/i915/i915_trace.h index 83b389e34b50..5c8cfaa70d72 100644 --- a/drivers/gpu/drm/i915/i915_trace.h +++ b/drivers/gpu/drm/i915/i915_trace.h @@ -977,7 +977,7 @@ DECLARE_EVENT_CLASS(i915_context, __entry->dev = ctx->i915->drm.primary->index; __entry->ctx = ctx; __entry->hw_id = ctx->hw_id; - __entry->vm = ctx->ppgtt ? &ctx->ppgtt->vm : NULL; + __entry->vm = ctx->vm; ), TP_printk("dev=%u, ctx=%p, ctx_vm=%p, hw_id=%u", diff --git a/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c b/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c index cb7193c9bc77..e90e5e2ca042 100644 --- a/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c +++ b/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c @@ -209,7 +209,7 @@ static int igt_ppgtt_alloc(void *arg) err_ppgtt_cleanup: mutex_lock(&dev_priv->drm.struct_mutex); - i915_ppgtt_put(ppgtt); + i915_vm_put(&ppgtt->vm); mutex_unlock(&dev_priv->drm.struct_mutex); return err; } @@ -1021,7 +1021,7 @@ static int exercise_ppgtt(struct drm_i915_private *dev_priv, err = func(dev_priv, &ppgtt->vm, 0, ppgtt->vm.total, end_time); - i915_ppgtt_put(ppgtt); + i915_vm_put(&ppgtt->vm); out_unlock: mutex_unlock(&dev_priv->drm.struct_mutex); @@ -1251,7 +1251,6 @@ static int exercise_mock(struct drm_i915_private *i915, { const u64 limit = totalram_pages() << PAGE_SHIFT; struct i915_gem_context *ctx; - struct i915_hw_ppgtt *ppgtt; IGT_TIMEOUT(end_time); int err; @@ -1259,10 +1258,7 @@ static int exercise_mock(struct drm_i915_private *i915, if (!ctx) return -ENOMEM; - ppgtt = ctx->ppgtt; - GEM_BUG_ON(!ppgtt); - - err = func(i915, &ppgtt->vm, 0, min(ppgtt->vm.total, limit), end_time); + err = func(i915, ctx->vm, 0, min(ctx->vm->total, limit), end_time); mock_context_close(ctx); return err; diff --git a/drivers/gpu/drm/i915/selftests/i915_request.c b/drivers/gpu/drm/i915/selftests/i915_request.c index 36ff8421c1a0..1bd1ee95cf8c 100644 --- a/drivers/gpu/drm/i915/selftests/i915_request.c +++ b/drivers/gpu/drm/i915/selftests/i915_request.c @@ -756,8 +756,7 @@ static int live_empty_request(void *arg) static struct i915_vma *recursive_batch(struct drm_i915_private *i915) { struct i915_gem_context *ctx = i915->kernel_context; - struct i915_address_space *vm = - ctx->ppgtt ? 
&ctx->ppgtt->vm : &i915->ggtt.vm; + struct i915_address_space *vm = ctx->vm ?: &i915->ggtt.vm; struct drm_i915_gem_object *obj; const int gen = INTEL_GEN(i915); struct i915_vma *vma; diff --git a/drivers/gpu/drm/i915/selftests/i915_vma.c b/drivers/gpu/drm/i915/selftests/i915_vma.c index 6f3e41c0cb3f..a166d9405a94 100644 --- a/drivers/gpu/drm/i915/selftests/i915_vma.c +++ b/drivers/gpu/drm/i915/selftests/i915_vma.c @@ -38,7 +38,7 @@ static bool assert_vma(struct i915_vma *vma, { bool ok = true; - if (vma->vm != &ctx->ppgtt->vm) { + if (vma->vm != ctx->vm) { pr_err("VMA created with wrong VM\n"); ok = false; } @@ -113,7 +113,7 @@ static int create_vmas(struct drm_i915_private *i915, list_for_each_entry(obj, objects, st_link) { for (pinned = 0; pinned <= 1; pinned++) { list_for_each_entry(ctx, contexts, link) { - struct i915_address_space *vm = &ctx->ppgtt->vm; + struct i915_address_space *vm = ctx->vm; struct i915_vma *vma; int err; diff --git a/drivers/gpu/drm/i915/selftests/igt_spinner.c b/drivers/gpu/drm/i915/selftests/igt_spinner.c index 3ea77c0ca678..1e59b543cf27 100644 --- a/drivers/gpu/drm/i915/selftests/igt_spinner.c +++ b/drivers/gpu/drm/i915/selftests/igt_spinner.c @@ -89,17 +89,16 @@ igt_spinner_create_request(struct igt_spinner *spin, struct intel_engine_cs *engine, u32 arbitration_command) { - struct i915_address_space *vm = &ctx->ppgtt->vm; struct i915_request *rq = NULL; struct i915_vma *hws, *vma; u32 *batch; int err; - vma = i915_vma_instance(spin->obj, vm, NULL); + vma = i915_vma_instance(spin->obj, ctx->vm, NULL); if (IS_ERR(vma)) return ERR_CAST(vma); - hws = i915_vma_instance(spin->hws, vm, NULL); + hws = i915_vma_instance(spin->hws, ctx->vm, NULL); if (IS_ERR(hws)) return ERR_CAST(hws); diff --git a/drivers/gpu/drm/i915/selftests/mock_gtt.c b/drivers/gpu/drm/i915/selftests/mock_gtt.c index cd83929fde8e..9e61c2f06cc9 100644 --- a/drivers/gpu/drm/i915/selftests/mock_gtt.c +++ b/drivers/gpu/drm/i915/selftests/mock_gtt.c @@ -65,7 +65,6 @@ mock_ppgtt(struct drm_i915_private *i915, if (!ppgtt) return NULL; - kref_init(&ppgtt->ref); ppgtt->vm.i915 = i915; ppgtt->vm.total = round_down(U64_MAX, PAGE_SIZE); ppgtt->vm.file = ERR_PTR(-ENODEV); From patchwork Mon Jun 10 07:21:24 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Chris Wilson X-Patchwork-Id: 10984149 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 0AE0C1902 for ; Mon, 10 Jun 2019 07:21:39 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id E4EE2286C2 for ; Mon, 10 Jun 2019 07:21:38 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id D47252881C; Mon, 10 Jun 2019 07:21:38 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-5.2 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_MED autolearn=ham version=3.3.1 Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177]) (using TLSv1.2 with cipher DHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.wl.linuxfoundation.org (Postfix) with ESMTPS id 82C6C286C2 for ; Mon, 10 Jun 2019 07:21:37 +0000 (UTC) Received: from gabe.freedesktop.org (localhost [127.0.0.1]) by gabe.freedesktop.org (Postfix) with ESMTP id 22E9B890FF; Mon, 10 
Jun 2019 07:21:36 +0000 (UTC) X-Original-To: intel-gfx@lists.freedesktop.org Delivered-To: intel-gfx@lists.freedesktop.org Received: from fireflyinternet.com (mail.fireflyinternet.com [109.228.58.192]) by gabe.freedesktop.org (Postfix) with ESMTPS id 2F9D1890FF for ; Mon, 10 Jun 2019 07:21:34 +0000 (UTC) X-Default-Received-SPF: pass (skip=forwardok (res=PASS)) x-ip-name=78.156.65.138; Received: from haswell.alporthouse.com (unverified [78.156.65.138]) by fireflyinternet.com (Firefly Internet (M1)) with ESMTP id 16848379-1500050 for multiple; Mon, 10 Jun 2019 08:21:31 +0100 From: Chris Wilson To: intel-gfx@lists.freedesktop.org Date: Mon, 10 Jun 2019 08:21:24 +0100 Message-Id: <20190610072126.6355-27-chris@chris-wilson.co.uk> X-Mailer: git-send-email 2.20.1 In-Reply-To: <20190610072126.6355-1-chris@chris-wilson.co.uk> References: <20190610072126.6355-1-chris@chris-wilson.co.uk> MIME-Version: 1.0 Subject: [Intel-gfx] [PATCH 26/28] drm/i915: Rename i915_hw_ppgtt to i915_ppgtt X-BeenThere: intel-gfx@lists.freedesktop.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: Intel graphics driver community testing & development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Matthew Auld Errors-To: intel-gfx-bounces@lists.freedesktop.org Sender: "Intel-gfx" X-Virus-Scanned: ClamAV using ClamSMTP Keeping the _hw_ in there does not help to distinguish it from its brethren i915_ggtt, so drop it. Signed-off-by: Chris Wilson Reviewed-by: Matthew Auld --- drivers/gpu/drm/i915/gem/i915_gem_context.c | 8 +- .../gpu/drm/i915/gem/selftests/huge_pages.c | 12 +-- .../gpu/drm/i915/gem/selftests/mock_context.c | 2 +- drivers/gpu/drm/i915/gt/intel_lrc.c | 4 +- drivers/gpu/drm/i915/gt/intel_ringbuffer.c | 5 +- drivers/gpu/drm/i915/gvt/scheduler.c | 6 +- drivers/gpu/drm/i915/i915_drv.h | 2 +- drivers/gpu/drm/i915/i915_gem_gtt.c | 78 +++++++++---------- drivers/gpu/drm/i915/i915_gem_gtt.h | 28 +++---- drivers/gpu/drm/i915/selftests/i915_gem_gtt.c | 6 +- drivers/gpu/drm/i915/selftests/mock_gtt.c | 6 +- drivers/gpu/drm/i915/selftests/mock_gtt.h | 4 +- 12 files changed, 78 insertions(+), 83 deletions(-) diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context.c b/drivers/gpu/drm/i915/gem/i915_gem_context.c index 342407ec0119..2e3e47bb1d11 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_context.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_context.c @@ -513,7 +513,7 @@ i915_gem_create_context(struct drm_i915_private *dev_priv, unsigned int flags) return ctx; if (HAS_FULL_PPGTT(dev_priv)) { - struct i915_hw_ppgtt *ppgtt; + struct i915_ppgtt *ppgtt; ppgtt = i915_ppgtt_create(dev_priv); if (IS_ERR(ppgtt)) { @@ -807,7 +807,7 @@ int i915_gem_vm_create_ioctl(struct drm_device *dev, void *data, struct drm_i915_private *i915 = to_i915(dev); struct drm_i915_gem_vm_control *args = data; struct drm_i915_file_private *file_priv = file->driver_priv; - struct i915_hw_ppgtt *ppgtt; + struct i915_ppgtt *ppgtt; int err; if (!HAS_FULL_PPGTT(i915)) @@ -1030,7 +1030,7 @@ static int emit_ppgtt_update(struct i915_request *rq, void *data) int i; if (i915_vm_is_4lvl(vm)) { - struct i915_hw_ppgtt *ppgtt = i915_vm_to_ppgtt(vm); + struct i915_ppgtt *ppgtt = i915_vm_to_ppgtt(vm); const dma_addr_t pd_daddr = px_dma(&ppgtt->pml4); cs = intel_ring_begin(rq, 6); @@ -1047,7 +1047,7 @@ static int emit_ppgtt_update(struct i915_request *rq, void *data) *cs++ = MI_NOOP; intel_ring_advance(rq, cs); } else if (HAS_LOGICAL_RING_CONTEXTS(engine->i915)) { - struct i915_hw_ppgtt *ppgtt = i915_vm_to_ppgtt(vm); + struct i915_ppgtt *ppgtt = 
i915_vm_to_ppgtt(vm); cs = intel_ring_begin(rq, 4 * GEN8_3LVL_PDPES + 2); if (IS_ERR(cs)) diff --git a/drivers/gpu/drm/i915/gem/selftests/huge_pages.c b/drivers/gpu/drm/i915/gem/selftests/huge_pages.c index 232d5cf4396c..73e667b31cc4 100644 --- a/drivers/gpu/drm/i915/gem/selftests/huge_pages.c +++ b/drivers/gpu/drm/i915/gem/selftests/huge_pages.c @@ -368,7 +368,7 @@ static int igt_check_page_sizes(struct i915_vma *vma) static int igt_mock_exhaust_device_supported_pages(void *arg) { - struct i915_hw_ppgtt *ppgtt = arg; + struct i915_ppgtt *ppgtt = arg; struct drm_i915_private *i915 = ppgtt->vm.i915; unsigned int saved_mask = INTEL_INFO(i915)->page_sizes; struct drm_i915_gem_object *obj; @@ -447,7 +447,7 @@ static int igt_mock_exhaust_device_supported_pages(void *arg) static int igt_mock_ppgtt_misaligned_dma(void *arg) { - struct i915_hw_ppgtt *ppgtt = arg; + struct i915_ppgtt *ppgtt = arg; struct drm_i915_private *i915 = ppgtt->vm.i915; unsigned long supported = INTEL_INFO(i915)->page_sizes; struct drm_i915_gem_object *obj; @@ -575,7 +575,7 @@ static int igt_mock_ppgtt_misaligned_dma(void *arg) } static void close_object_list(struct list_head *objects, - struct i915_hw_ppgtt *ppgtt) + struct i915_ppgtt *ppgtt) { struct drm_i915_gem_object *obj, *on; @@ -595,7 +595,7 @@ static void close_object_list(struct list_head *objects, static int igt_mock_ppgtt_huge_fill(void *arg) { - struct i915_hw_ppgtt *ppgtt = arg; + struct i915_ppgtt *ppgtt = arg; struct drm_i915_private *i915 = ppgtt->vm.i915; unsigned long max_pages = ppgtt->vm.total >> PAGE_SHIFT; unsigned long page_num; @@ -716,7 +716,7 @@ static int igt_mock_ppgtt_huge_fill(void *arg) static int igt_mock_ppgtt_64K(void *arg) { - struct i915_hw_ppgtt *ppgtt = arg; + struct i915_ppgtt *ppgtt = arg; struct drm_i915_private *i915 = ppgtt->vm.i915; struct drm_i915_gem_object *obj; const struct object_info { @@ -1683,7 +1683,7 @@ int i915_gem_huge_page_mock_selftests(void) SUBTEST(igt_mock_ppgtt_64K), }; struct drm_i915_private *dev_priv; - struct i915_hw_ppgtt *ppgtt; + struct i915_ppgtt *ppgtt; int err; dev_priv = mock_gem_device(); diff --git a/drivers/gpu/drm/i915/gem/selftests/mock_context.c b/drivers/gpu/drm/i915/gem/selftests/mock_context.c index 82371c60d4aa..be8974ccff24 100644 --- a/drivers/gpu/drm/i915/gem/selftests/mock_context.c +++ b/drivers/gpu/drm/i915/gem/selftests/mock_context.c @@ -38,7 +38,7 @@ mock_context(struct drm_i915_private *i915, goto err_engines; if (name) { - struct i915_hw_ppgtt *ppgtt; + struct i915_ppgtt *ppgtt; ctx->name = kstrdup(name, GFP_KERNEL); if (!ctx->name) diff --git a/drivers/gpu/drm/i915/gt/intel_lrc.c b/drivers/gpu/drm/i915/gt/intel_lrc.c index 59ddd02d1149..42d1a941c547 100644 --- a/drivers/gpu/drm/i915/gt/intel_lrc.c +++ b/drivers/gpu/drm/i915/gt/intel_lrc.c @@ -1668,7 +1668,7 @@ static int gen8_emit_init_breadcrumb(struct i915_request *rq) static int emit_pdps(struct i915_request *rq) { const struct intel_engine_cs * const engine = rq->engine; - struct i915_hw_ppgtt * const ppgtt = + struct i915_ppgtt * const ppgtt = i915_vm_to_ppgtt(rq->gem_context->vm); int err, i; u32 *cs; @@ -2868,7 +2868,7 @@ static void execlists_init_reg_state(u32 *regs, struct intel_engine_cs *engine, struct intel_ring *ring) { - struct i915_hw_ppgtt *ppgtt = i915_vm_to_ppgtt(ce->gem_context->vm); + struct i915_ppgtt *ppgtt = i915_vm_to_ppgtt(ce->gem_context->vm); bool rcs = engine->class == RENDER_CLASS; u32 base = engine->mmio_base; diff --git a/drivers/gpu/drm/i915/gt/intel_ringbuffer.c 
b/drivers/gpu/drm/i915/gt/intel_ringbuffer.c index d96bb679c2b0..f4750207c393 100644 --- a/drivers/gpu/drm/i915/gt/intel_ringbuffer.c +++ b/drivers/gpu/drm/i915/gt/intel_ringbuffer.c @@ -1468,8 +1468,7 @@ static const struct intel_context_ops ring_context_ops = { .destroy = ring_context_destroy, }; -static int load_pd_dir(struct i915_request *rq, - const struct i915_hw_ppgtt *ppgtt) +static int load_pd_dir(struct i915_request *rq, const struct i915_ppgtt *ppgtt) { const struct intel_engine_cs * const engine = rq->engine; u32 *cs; @@ -1673,7 +1672,7 @@ static int switch_context(struct i915_request *rq) GEM_BUG_ON(HAS_EXECLISTS(rq->i915)); if (vm) { - struct i915_hw_ppgtt *ppgtt = i915_vm_to_ppgtt(vm); + struct i915_ppgtt *ppgtt = i915_vm_to_ppgtt(vm); int loops; /* diff --git a/drivers/gpu/drm/i915/gvt/scheduler.c b/drivers/gpu/drm/i915/gvt/scheduler.c index e1efff5e6424..c36f904a6d54 100644 --- a/drivers/gpu/drm/i915/gvt/scheduler.c +++ b/drivers/gpu/drm/i915/gvt/scheduler.c @@ -368,7 +368,7 @@ static int set_context_ppgtt_from_shadow(struct intel_vgpu_workload *workload, struct i915_gem_context *ctx) { struct intel_vgpu_mm *mm = workload->shadow_mm; - struct i915_hw_ppgtt *ppgtt = i915_vm_to_ppgtt(ctx->vm); + struct i915_ppgtt *ppgtt = i915_vm_to_ppgtt(ctx->vm); int i = 0; if (mm->type != INTEL_GVT_MM_PPGTT || !mm->ppgtt_mm.shadowed) @@ -1120,7 +1120,7 @@ int intel_gvt_init_workload_scheduler(struct intel_gvt *gvt) static void i915_context_ppgtt_root_restore(struct intel_vgpu_submission *s, - struct i915_hw_ppgtt *ppgtt) + struct i915_ppgtt *ppgtt) { int i; @@ -1178,7 +1178,7 @@ void intel_vgpu_reset_submission(struct intel_vgpu *vgpu, static void i915_context_ppgtt_root_save(struct intel_vgpu_submission *s, - struct i915_hw_ppgtt *ppgtt) + struct i915_ppgtt *ppgtt) { int i; diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h index a043f920694e..1855fe02347b 100644 --- a/drivers/gpu/drm/i915/i915_drv.h +++ b/drivers/gpu/drm/i915/i915_drv.h @@ -780,7 +780,7 @@ struct i915_gem_mm { struct vfsmount *gemfs; /** PPGTT used for aliasing the PPGTT with the GTT */ - struct i915_hw_ppgtt *aliasing_ppgtt; + struct i915_ppgtt *aliasing_ppgtt; struct notifier_block oom_notifier; struct notifier_block vmap_notifier; diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c index fd8e68c55585..1a157bd43c2c 100644 --- a/drivers/gpu/drm/i915/i915_gem_gtt.c +++ b/drivers/gpu/drm/i915/i915_gem_gtt.c @@ -802,7 +802,7 @@ static void gen8_initialize_pml4(struct i915_address_space *vm, * context switching/execlist queuing code takes extra steps * to ensure that tlbs are flushed. 
*/ -static void mark_tlbs_dirty(struct i915_hw_ppgtt *ppgtt) +static void mark_tlbs_dirty(struct i915_ppgtt *ppgtt) { ppgtt->pd_dirty_engines = ALL_ENGINES; } @@ -943,7 +943,7 @@ static void gen8_ppgtt_set_pml4e(struct i915_pml4 *pml4, static void gen8_ppgtt_clear_4lvl(struct i915_address_space *vm, u64 start, u64 length) { - struct i915_hw_ppgtt *ppgtt = i915_vm_to_ppgtt(vm); + struct i915_ppgtt *ppgtt = i915_vm_to_ppgtt(vm); struct i915_pml4 *pml4 = &ppgtt->pml4; struct i915_page_directory_pointer *pdp; unsigned int pml4e; @@ -996,7 +996,7 @@ static __always_inline struct gen8_insert_pte gen8_insert_pte(u64 start) } static __always_inline bool -gen8_ppgtt_insert_pte_entries(struct i915_hw_ppgtt *ppgtt, +gen8_ppgtt_insert_pte_entries(struct i915_ppgtt *ppgtt, struct i915_page_directory_pointer *pdp, struct sgt_dma *iter, struct gen8_insert_pte *idx, @@ -1057,7 +1057,7 @@ static void gen8_ppgtt_insert_3lvl(struct i915_address_space *vm, enum i915_cache_level cache_level, u32 flags) { - struct i915_hw_ppgtt *ppgtt = i915_vm_to_ppgtt(vm); + struct i915_ppgtt *ppgtt = i915_vm_to_ppgtt(vm); struct sgt_dma iter = sgt_dma(vma); struct gen8_insert_pte idx = gen8_insert_pte(vma->node.start); @@ -1191,7 +1191,7 @@ static void gen8_ppgtt_insert_4lvl(struct i915_address_space *vm, enum i915_cache_level cache_level, u32 flags) { - struct i915_hw_ppgtt *ppgtt = i915_vm_to_ppgtt(vm); + struct i915_ppgtt *ppgtt = i915_vm_to_ppgtt(vm); struct sgt_dma iter = sgt_dma(vma); struct i915_page_directory_pointer **pdps = ppgtt->pml4.pdps; @@ -1290,7 +1290,7 @@ static int gen8_init_scratch(struct i915_address_space *vm) return ret; } -static int gen8_ppgtt_notify_vgt(struct i915_hw_ppgtt *ppgtt, bool create) +static int gen8_ppgtt_notify_vgt(struct i915_ppgtt *ppgtt, bool create) { struct i915_address_space *vm = &ppgtt->vm; struct drm_i915_private *dev_priv = vm->i915; @@ -1351,7 +1351,7 @@ static void gen8_ppgtt_cleanup_3lvl(struct i915_address_space *vm, free_pdp(vm, pdp); } -static void gen8_ppgtt_cleanup_4lvl(struct i915_hw_ppgtt *ppgtt) +static void gen8_ppgtt_cleanup_4lvl(struct i915_ppgtt *ppgtt) { int i; @@ -1368,7 +1368,7 @@ static void gen8_ppgtt_cleanup_4lvl(struct i915_hw_ppgtt *ppgtt) static void gen8_ppgtt_cleanup(struct i915_address_space *vm) { struct drm_i915_private *i915 = vm->i915; - struct i915_hw_ppgtt *ppgtt = i915_vm_to_ppgtt(vm); + struct i915_ppgtt *ppgtt = i915_vm_to_ppgtt(vm); if (intel_vgpu_active(i915)) gen8_ppgtt_notify_vgt(ppgtt, false); @@ -1500,7 +1500,7 @@ static int gen8_ppgtt_alloc_3lvl(struct i915_address_space *vm, static int gen8_ppgtt_alloc_4lvl(struct i915_address_space *vm, u64 start, u64 length) { - struct i915_hw_ppgtt *ppgtt = i915_vm_to_ppgtt(vm); + struct i915_ppgtt *ppgtt = i915_vm_to_ppgtt(vm); struct i915_pml4 *pml4 = &ppgtt->pml4; struct i915_page_directory_pointer *pdp; u64 from = start; @@ -1556,7 +1556,7 @@ static int gen8_ppgtt_alloc_4lvl(struct i915_address_space *vm, return -ENOMEM; } -static int gen8_preallocate_top_level_pdp(struct i915_hw_ppgtt *ppgtt) +static int gen8_preallocate_top_level_pdp(struct i915_ppgtt *ppgtt) { struct i915_address_space *vm = &ppgtt->vm; struct i915_page_directory_pointer *pdp = &ppgtt->pdp; @@ -1589,7 +1589,7 @@ static int gen8_preallocate_top_level_pdp(struct i915_hw_ppgtt *ppgtt) } static void ppgtt_init(struct drm_i915_private *i915, - struct i915_hw_ppgtt *ppgtt) + struct i915_ppgtt *ppgtt) { ppgtt->vm.i915 = i915; ppgtt->vm.dma = &i915->drm.pdev->dev; @@ -1610,9 +1610,9 @@ static void ppgtt_init(struct drm_i915_private 
*i915, * space. * */ -static struct i915_hw_ppgtt *gen8_ppgtt_create(struct drm_i915_private *i915) +static struct i915_ppgtt *gen8_ppgtt_create(struct drm_i915_private *i915) { - struct i915_hw_ppgtt *ppgtt; + struct i915_ppgtt *ppgtt; int err; ppgtt = kzalloc(sizeof(*ppgtt), GFP_KERNEL); @@ -1682,7 +1682,7 @@ static struct i915_hw_ppgtt *gen8_ppgtt_create(struct drm_i915_private *i915) } /* Write pde (index) from the page directory @pd to the page table @pt */ -static inline void gen6_write_pde(const struct gen6_hw_ppgtt *ppgtt, +static inline void gen6_write_pde(const struct gen6_ppgtt *ppgtt, const unsigned int pde, const struct i915_page_table *pt) { @@ -1739,7 +1739,7 @@ static void gen6_ppgtt_enable(struct drm_i915_private *dev_priv) static void gen6_ppgtt_clear_range(struct i915_address_space *vm, u64 start, u64 length) { - struct gen6_hw_ppgtt *ppgtt = to_gen6_ppgtt(i915_vm_to_ppgtt(vm)); + struct gen6_ppgtt *ppgtt = to_gen6_ppgtt(i915_vm_to_ppgtt(vm)); unsigned int first_entry = start / I915_GTT_PAGE_SIZE; unsigned int pde = first_entry / GEN6_PTES; unsigned int pte = first_entry % GEN6_PTES; @@ -1779,7 +1779,7 @@ static void gen6_ppgtt_insert_entries(struct i915_address_space *vm, enum i915_cache_level cache_level, u32 flags) { - struct i915_hw_ppgtt *ppgtt = i915_vm_to_ppgtt(vm); + struct i915_ppgtt *ppgtt = i915_vm_to_ppgtt(vm); unsigned first_entry = vma->node.start / I915_GTT_PAGE_SIZE; unsigned act_pt = first_entry / GEN6_PTES; unsigned act_pte = first_entry % GEN6_PTES; @@ -1817,7 +1817,7 @@ static void gen6_ppgtt_insert_entries(struct i915_address_space *vm, static int gen6_alloc_va_range(struct i915_address_space *vm, u64 start, u64 length) { - struct gen6_hw_ppgtt *ppgtt = to_gen6_ppgtt(i915_vm_to_ppgtt(vm)); + struct gen6_ppgtt *ppgtt = to_gen6_ppgtt(i915_vm_to_ppgtt(vm)); struct i915_page_table *pt; intel_wakeref_t wakeref; u64 from = start; @@ -1877,7 +1877,7 @@ static int gen6_alloc_va_range(struct i915_address_space *vm, return -ENOMEM; } -static int gen6_ppgtt_init_scratch(struct gen6_hw_ppgtt *ppgtt) +static int gen6_ppgtt_init_scratch(struct gen6_ppgtt *ppgtt) { struct i915_address_space * const vm = &ppgtt->base.vm; struct i915_page_table *unused; @@ -1912,7 +1912,7 @@ static void gen6_ppgtt_free_scratch(struct i915_address_space *vm) cleanup_scratch_page(vm); } -static void gen6_ppgtt_free_pd(struct gen6_hw_ppgtt *ppgtt) +static void gen6_ppgtt_free_pd(struct gen6_ppgtt *ppgtt) { struct i915_page_table *pt; u32 pde; @@ -1970,7 +1970,7 @@ static const struct i915_vma_ops nop_vma_ops = { static void gen6_ppgtt_cleanup(struct i915_address_space *vm) { - struct gen6_hw_ppgtt *ppgtt = to_gen6_ppgtt(i915_vm_to_ppgtt(vm)); + struct gen6_ppgtt *ppgtt = to_gen6_ppgtt(i915_vm_to_ppgtt(vm)); struct gen6_ppgtt_cleanup_work *work = ppgtt->work; /* FIXME remove the struct_mutex to bring the locking under control */ @@ -2001,7 +2001,7 @@ static int pd_vma_bind(struct i915_vma *vma, u32 unused) { struct i915_ggtt *ggtt = i915_vm_to_ggtt(vma->vm); - struct gen6_hw_ppgtt *ppgtt = vma->private; + struct gen6_ppgtt *ppgtt = vma->private; u32 ggtt_offset = i915_ggtt_offset(vma) / I915_GTT_PAGE_SIZE; struct i915_page_table *pt; unsigned int pde; @@ -2020,7 +2020,7 @@ static int pd_vma_bind(struct i915_vma *vma, static void pd_vma_unbind(struct i915_vma *vma) { - struct gen6_hw_ppgtt *ppgtt = vma->private; + struct gen6_ppgtt *ppgtt = vma->private; struct i915_page_table * const scratch_pt = ppgtt->base.vm.scratch_pt; struct i915_page_table *pt; unsigned int pde; @@ -2047,7 +2047,7 
@@ static const struct i915_vma_ops pd_vma_ops = { .unbind_vma = pd_vma_unbind, }; -static struct i915_vma *pd_vma_create(struct gen6_hw_ppgtt *ppgtt, int size) +static struct i915_vma *pd_vma_create(struct gen6_ppgtt *ppgtt, int size) { struct drm_i915_private *i915 = ppgtt->base.vm.i915; struct i915_ggtt *ggtt = &i915->ggtt; @@ -2077,9 +2077,9 @@ static struct i915_vma *pd_vma_create(struct gen6_hw_ppgtt *ppgtt, int size) return vma; } -int gen6_ppgtt_pin(struct i915_hw_ppgtt *base) +int gen6_ppgtt_pin(struct i915_ppgtt *base) { - struct gen6_hw_ppgtt *ppgtt = to_gen6_ppgtt(base); + struct gen6_ppgtt *ppgtt = to_gen6_ppgtt(base); int err; GEM_BUG_ON(ppgtt->base.vm.closed); @@ -2111,9 +2111,9 @@ int gen6_ppgtt_pin(struct i915_hw_ppgtt *base) return err; } -void gen6_ppgtt_unpin(struct i915_hw_ppgtt *base) +void gen6_ppgtt_unpin(struct i915_ppgtt *base) { - struct gen6_hw_ppgtt *ppgtt = to_gen6_ppgtt(base); + struct gen6_ppgtt *ppgtt = to_gen6_ppgtt(base); GEM_BUG_ON(!ppgtt->pin_count); if (--ppgtt->pin_count) @@ -2122,9 +2122,9 @@ void gen6_ppgtt_unpin(struct i915_hw_ppgtt *base) i915_vma_unpin(ppgtt->vma); } -void gen6_ppgtt_unpin_all(struct i915_hw_ppgtt *base) +void gen6_ppgtt_unpin_all(struct i915_ppgtt *base) { - struct gen6_hw_ppgtt *ppgtt = to_gen6_ppgtt(base); + struct gen6_ppgtt *ppgtt = to_gen6_ppgtt(base); if (!ppgtt->pin_count) return; @@ -2133,10 +2133,10 @@ void gen6_ppgtt_unpin_all(struct i915_hw_ppgtt *base) i915_vma_unpin(ppgtt->vma); } -static struct i915_hw_ppgtt *gen6_ppgtt_create(struct drm_i915_private *i915) +static struct i915_ppgtt *gen6_ppgtt_create(struct drm_i915_private *i915) { struct i915_ggtt * const ggtt = &i915->ggtt; - struct gen6_hw_ppgtt *ppgtt; + struct gen6_ppgtt *ppgtt; int err; ppgtt = kzalloc(sizeof(*ppgtt), GFP_KERNEL); @@ -2225,8 +2225,8 @@ int i915_ppgtt_init_hw(struct drm_i915_private *dev_priv) return 0; } -static struct i915_hw_ppgtt * -__hw_ppgtt_create(struct drm_i915_private *i915) +static struct i915_ppgtt * +__ppgtt_create(struct drm_i915_private *i915) { if (INTEL_GEN(i915) < 8) return gen6_ppgtt_create(i915); @@ -2234,12 +2234,12 @@ __hw_ppgtt_create(struct drm_i915_private *i915) return gen8_ppgtt_create(i915); } -struct i915_hw_ppgtt * +struct i915_ppgtt * i915_ppgtt_create(struct drm_i915_private *i915) { - struct i915_hw_ppgtt *ppgtt; + struct i915_ppgtt *ppgtt; - ppgtt = __hw_ppgtt_create(i915); + ppgtt = __ppgtt_create(i915); if (IS_ERR(ppgtt)) return ppgtt; @@ -2705,7 +2705,7 @@ static int aliasing_gtt_bind_vma(struct i915_vma *vma, pte_flags |= PTE_READ_ONLY; if (flags & I915_VMA_LOCAL_BIND) { - struct i915_hw_ppgtt *appgtt = i915->mm.aliasing_ppgtt; + struct i915_ppgtt *appgtt = i915->mm.aliasing_ppgtt; if (!(vma->flags & I915_VMA_LOCAL_BIND)) { ret = appgtt->vm.allocate_va_range(&appgtt->vm, @@ -2804,7 +2804,7 @@ static void i915_gtt_color_adjust(const struct drm_mm_node *node, static int init_aliasing_ppgtt(struct drm_i915_private *i915) { struct i915_ggtt *ggtt = &i915->ggtt; - struct i915_hw_ppgtt *ppgtt; + struct i915_ppgtt *ppgtt; int err; ppgtt = i915_ppgtt_create(i915); @@ -2844,7 +2844,7 @@ static int init_aliasing_ppgtt(struct drm_i915_private *i915) static void fini_aliasing_ppgtt(struct drm_i915_private *i915) { struct i915_ggtt *ggtt = &i915->ggtt; - struct i915_hw_ppgtt *ppgtt; + struct i915_ppgtt *ppgtt; ppgtt = fetch_and_zero(&i915->mm.aliasing_ppgtt); if (!ppgtt) diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.h b/drivers/gpu/drm/i915/i915_gem_gtt.h index f56b19eb79c4..0e9926b32408 100644 --- 
a/drivers/gpu/drm/i915/i915_gem_gtt.h +++ b/drivers/gpu/drm/i915/i915_gem_gtt.h @@ -420,7 +420,7 @@ struct i915_ggtt { struct drm_mm_node uc_fw; }; -struct i915_hw_ppgtt { +struct i915_ppgtt { struct i915_address_space vm; intel_engine_mask_t pd_dirty_engines; @@ -431,8 +431,8 @@ struct i915_hw_ppgtt { }; }; -struct gen6_hw_ppgtt { - struct i915_hw_ppgtt base; +struct gen6_ppgtt { + struct i915_ppgtt base; struct i915_vma *vma; gen6_pte_t __iomem *pd_addr; @@ -443,11 +443,11 @@ struct gen6_hw_ppgtt { struct gen6_ppgtt_cleanup_work *work; }; -#define __to_gen6_ppgtt(base) container_of(base, struct gen6_hw_ppgtt, base) +#define __to_gen6_ppgtt(base) container_of(base, struct gen6_ppgtt, base) -static inline struct gen6_hw_ppgtt *to_gen6_ppgtt(struct i915_hw_ppgtt *base) +static inline struct gen6_ppgtt *to_gen6_ppgtt(struct i915_ppgtt *base) { - BUILD_BUG_ON(offsetof(struct gen6_hw_ppgtt, base)); + BUILD_BUG_ON(offsetof(struct gen6_ppgtt, base)); return __to_gen6_ppgtt(base); } @@ -583,7 +583,7 @@ static inline u64 gen8_pte_count(u64 address, u64 length) } static inline dma_addr_t -i915_page_dir_dma_addr(const struct i915_hw_ppgtt *ppgtt, const unsigned n) +i915_page_dir_dma_addr(const struct i915_ppgtt *ppgtt, const unsigned int n) { return px_dma(ppgtt->pdp.page_directory[n]); } @@ -596,12 +596,12 @@ i915_vm_to_ggtt(struct i915_address_space *vm) return container_of(vm, struct i915_ggtt, vm); } -static inline struct i915_hw_ppgtt * +static inline struct i915_ppgtt * i915_vm_to_ppgtt(struct i915_address_space *vm) { - BUILD_BUG_ON(offsetof(struct i915_hw_ppgtt, vm)); + BUILD_BUG_ON(offsetof(struct i915_ppgtt, vm)); GEM_BUG_ON(i915_is_ggtt(vm)); - return container_of(vm, struct i915_hw_ppgtt, vm); + return container_of(vm, struct i915_ppgtt, vm); } #define INTEL_MAX_PPAT_ENTRIES 8 @@ -645,7 +645,7 @@ void i915_ggtt_cleanup_hw(struct drm_i915_private *dev_priv); int i915_ppgtt_init_hw(struct drm_i915_private *dev_priv); -struct i915_hw_ppgtt *i915_ppgtt_create(struct drm_i915_private *dev_priv); +struct i915_ppgtt *i915_ppgtt_create(struct drm_i915_private *dev_priv); static inline struct i915_address_space * i915_vm_get(struct i915_address_space *vm) @@ -661,9 +661,9 @@ static inline void i915_vm_put(struct i915_address_space *vm) kref_put(&vm->ref, i915_vm_release); } -int gen6_ppgtt_pin(struct i915_hw_ppgtt *base); -void gen6_ppgtt_unpin(struct i915_hw_ppgtt *base); -void gen6_ppgtt_unpin_all(struct i915_hw_ppgtt *base); +int gen6_ppgtt_pin(struct i915_ppgtt *base); +void gen6_ppgtt_unpin(struct i915_ppgtt *base); +void gen6_ppgtt_unpin_all(struct i915_ppgtt *base); void i915_check_and_clear_faults(struct drm_i915_private *dev_priv); void i915_gem_suspend_gtt_mappings(struct drm_i915_private *dev_priv); diff --git a/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c b/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c index e90e5e2ca042..70c8bc626300 100644 --- a/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c +++ b/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c @@ -148,7 +148,7 @@ fake_dma_object(struct drm_i915_private *i915, u64 size) static int igt_ppgtt_alloc(void *arg) { struct drm_i915_private *dev_priv = arg; - struct i915_hw_ppgtt *ppgtt; + struct i915_ppgtt *ppgtt; u64 size, last, limit; int err = 0; @@ -157,7 +157,7 @@ static int igt_ppgtt_alloc(void *arg) if (!HAS_PPGTT(dev_priv)) return 0; - ppgtt = __hw_ppgtt_create(dev_priv); + ppgtt = __ppgtt_create(dev_priv); if (IS_ERR(ppgtt)) return PTR_ERR(ppgtt); @@ -999,7 +999,7 @@ static int exercise_ppgtt(struct drm_i915_private *dev_priv, 
unsigned long end_time)) { struct drm_file *file; - struct i915_hw_ppgtt *ppgtt; + struct i915_ppgtt *ppgtt; IGT_TIMEOUT(end_time); int err; diff --git a/drivers/gpu/drm/i915/selftests/mock_gtt.c b/drivers/gpu/drm/i915/selftests/mock_gtt.c index 9e61c2f06cc9..f625c307a406 100644 --- a/drivers/gpu/drm/i915/selftests/mock_gtt.c +++ b/drivers/gpu/drm/i915/selftests/mock_gtt.c @@ -55,11 +55,9 @@ static void mock_cleanup(struct i915_address_space *vm) { } -struct i915_hw_ppgtt * -mock_ppgtt(struct drm_i915_private *i915, - const char *name) +struct i915_ppgtt *mock_ppgtt(struct drm_i915_private *i915, const char *name) { - struct i915_hw_ppgtt *ppgtt; + struct i915_ppgtt *ppgtt; ppgtt = kzalloc(sizeof(*ppgtt), GFP_KERNEL); if (!ppgtt) diff --git a/drivers/gpu/drm/i915/selftests/mock_gtt.h b/drivers/gpu/drm/i915/selftests/mock_gtt.h index 40d544bde1d5..3387393286de 100644 --- a/drivers/gpu/drm/i915/selftests/mock_gtt.h +++ b/drivers/gpu/drm/i915/selftests/mock_gtt.h @@ -28,8 +28,6 @@ void mock_init_ggtt(struct drm_i915_private *i915, struct i915_ggtt *ggtt); void mock_fini_ggtt(struct i915_ggtt *ggtt); -struct i915_hw_ppgtt * -mock_ppgtt(struct drm_i915_private *i915, - const char *name); +struct i915_ppgtt *mock_ppgtt(struct drm_i915_private *i915, const char *name); #endif /* !__MOCK_GTT_H */ From patchwork Mon Jun 10 07:21:25 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Chris Wilson X-Patchwork-Id: 10984151 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 27D7D6C5 for ; Mon, 10 Jun 2019 07:21:40 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 11780286C2 for ; Mon, 10 Jun 2019 07:21:40 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 061EA2881C; Mon, 10 Jun 2019 07:21:40 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-5.2 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_MED autolearn=ham version=3.3.1 Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177]) (using TLSv1.2 with cipher DHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.wl.linuxfoundation.org (Postfix) with ESMTPS id 14AE5286C2 for ; Mon, 10 Jun 2019 07:21:39 +0000 (UTC) Received: from gabe.freedesktop.org (localhost [127.0.0.1]) by gabe.freedesktop.org (Postfix) with ESMTP id 36F1489105; Mon, 10 Jun 2019 07:21:36 +0000 (UTC) X-Original-To: intel-gfx@lists.freedesktop.org Delivered-To: intel-gfx@lists.freedesktop.org Received: from fireflyinternet.com (mail.fireflyinternet.com [109.228.58.192]) by gabe.freedesktop.org (Postfix) with ESMTPS id 0A93288FA4 for ; Mon, 10 Jun 2019 07:21:34 +0000 (UTC) X-Default-Received-SPF: pass (skip=forwardok (res=PASS)) x-ip-name=78.156.65.138; Received: from haswell.alporthouse.com (unverified [78.156.65.138]) by fireflyinternet.com (Firefly Internet (M1)) with ESMTP id 16848380-1500050 for multiple; Mon, 10 Jun 2019 08:21:31 +0100 From: Chris Wilson To: intel-gfx@lists.freedesktop.org Date: Mon, 10 Jun 2019 08:21:25 +0100 Message-Id: <20190610072126.6355-28-chris@chris-wilson.co.uk> X-Mailer: git-send-email 2.20.1 In-Reply-To: <20190610072126.6355-1-chris@chris-wilson.co.uk> References: 
<20190610072126.6355-1-chris@chris-wilson.co.uk> MIME-Version: 1.0 Subject: [Intel-gfx] [PATCH 27/28] drm/i915: Allow vma binding to occur asynchronously X-BeenThere: intel-gfx@lists.freedesktop.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: Intel graphics driver community testing & development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: intel-gfx-bounces@lists.freedesktop.org Sender: "Intel-gfx" X-Virus-Scanned: ClamAV using ClamSMTP If we let pages be allocated asynchronously, we also then want to push the binding process into an asynchronous task. Make it so, utilising the recent improvements to fence error tracking and struct_mutex reduction. Signed-off-by: Chris Wilson --- .../gpu/drm/i915/gem/i915_gem_execbuffer.c | 16 +- .../drm/i915/gem/selftests/i915_gem_context.c | 4 + drivers/gpu/drm/i915/i915_gem_gtt.c | 28 ++- drivers/gpu/drm/i915/i915_vma.c | 160 +++++++++++++++--- drivers/gpu/drm/i915/i915_vma.h | 14 ++ drivers/gpu/drm/i915/selftests/i915_vma.c | 4 +- 6 files changed, 189 insertions(+), 37 deletions(-) diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c index eeb4c6cdb01d..ccb089420830 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c @@ -580,6 +580,15 @@ static int eb_reserve_vma(const struct i915_execbuffer *eb, u64 pin_flags; int err; + /* + * If we load the pages asynchronously, then the user *must* + * obey the reservation_object and not bypass waiting on it. + * On the positive side, if the vma is not yet bound (no pages!), + * then it should not have any annoying implicit fences. + */ + if (exec_flags & EXEC_OBJECT_ASYNC && !vma->pages) + *vma->exec_flags &= ~EXEC_OBJECT_ASYNC; + pin_flags = PIN_USER | PIN_NONBLOCK; if (exec_flags & EXEC_OBJECT_NEEDS_GTT) pin_flags |= PIN_GLOBAL; @@ -1187,7 +1196,7 @@ static int reloc_move_to_gpu(struct i915_request *rq, struct i915_vma *vma) obj->write_domain = 0; err = i915_request_await_object(rq, vma->obj, true); - if (err == 0) + if (!err) err = i915_vma_move_to_active(vma, rq, EXEC_OBJECT_WRITE); i915_vma_unlock(vma); @@ -1245,8 +1254,9 @@ static int __reloc_gpu_alloc(struct i915_execbuffer *eb, goto skip_request; i915_vma_lock(batch); - GEM_BUG_ON(!reservation_object_test_signaled_rcu(batch->resv, true)); - err = i915_vma_move_to_active(batch, rq, 0); + err = i915_request_await_object(rq, batch->obj, false); + if (err == 0) + err = i915_vma_move_to_active(batch, rq, 0); i915_vma_unlock(batch); if (err) goto skip_request; diff --git a/drivers/gpu/drm/i915/gem/selftests/i915_gem_context.c b/drivers/gpu/drm/i915/gem/selftests/i915_gem_context.c index 74b0e5871c4b..e8a438d0b8a2 100644 --- a/drivers/gpu/drm/i915/gem/selftests/i915_gem_context.c +++ b/drivers/gpu/drm/i915/gem/selftests/i915_gem_context.c @@ -295,6 +295,10 @@ static int gpu_fill(struct drm_i915_gem_object *obj, goto err_batch; } + err = i915_request_await_object(rq, batch->obj, false); + if (err) + goto err_request; + flags = 0; if (INTEL_GEN(vm->i915) <= 5) flags |= I915_DISPATCH_SECURE; diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c index 1a157bd43c2c..bfca8a4a88e2 100644 --- a/drivers/gpu/drm/i915/i915_gem_gtt.c +++ b/drivers/gpu/drm/i915/i915_gem_gtt.c @@ -30,6 +30,7 @@ #include #include #include +#include #include @@ -139,14 +140,14 @@ static inline void i915_ggtt_invalidate(struct drm_i915_private *i915) static int ppgtt_bind_vma(struct i915_vma *vma, enum 
i915_cache_level cache_level, - u32 unused) + u32 flags) { + struct i915_address_space *vm = vma->vm; u32 pte_flags; int err; - if (!(vma->flags & I915_VMA_LOCAL_BIND)) { - err = vma->vm->allocate_va_range(vma->vm, - vma->node.start, vma->size); + if (flags & I915_VMA_ALLOC_BIND) { + err = vm->allocate_va_range(vm, vma->node.start, vma->size); if (err) return err; } @@ -156,20 +157,26 @@ static int ppgtt_bind_vma(struct i915_vma *vma, if (i915_gem_object_is_readonly(vma->obj)) pte_flags |= PTE_READ_ONLY; - vma->vm->insert_entries(vma->vm, vma, cache_level, pte_flags); + vm->insert_entries(vm, vma, cache_level, pte_flags); return 0; } static void ppgtt_unbind_vma(struct i915_vma *vma) { - vma->vm->clear_range(vma->vm, vma->node.start, vma->size); + struct i915_address_space *vm = vma->vm; + + vm->clear_range(vm, vma->node.start, vma->size); } static int ppgtt_set_pages(struct i915_vma *vma) { GEM_BUG_ON(vma->pages); + wait_for_completion(&vma->obj->mm.completion); + if (IS_ERR(vma->obj->mm.pages)) + return PTR_ERR(vma->obj->mm.pages); + vma->pages = vma->obj->mm.pages; vma->page_sizes = vma->obj->mm.page_sizes; @@ -2707,7 +2714,7 @@ static int aliasing_gtt_bind_vma(struct i915_vma *vma, if (flags & I915_VMA_LOCAL_BIND) { struct i915_ppgtt *appgtt = i915->mm.aliasing_ppgtt; - if (!(vma->flags & I915_VMA_LOCAL_BIND)) { + if (flags & I915_VMA_ALLOC_BIND) { ret = appgtt->vm.allocate_va_range(&appgtt->vm, vma->node.start, vma->size); @@ -3896,13 +3903,18 @@ i915_get_ggtt_vma_pages(struct i915_vma *vma) { int ret; - /* The vma->pages are only valid within the lifespan of the borrowed + /* + * The vma->pages are only valid within the lifespan of the borrowed * obj->mm.pages. When the obj->mm.pages sg_table is regenerated, so * must be the vma->pages. A simple rule is that vma->pages must only * be accessed when the obj->mm.pages are pinned. 
*/ GEM_BUG_ON(!i915_gem_object_has_pinned_pages(vma->obj)); + wait_for_completion(&vma->obj->mm.completion); + if (IS_ERR(vma->obj->mm.pages)) + return PTR_ERR(vma->obj->mm.pages); + switch (vma->ggtt_view.type) { default: GEM_BUG_ON(vma->ggtt_view.type); diff --git a/drivers/gpu/drm/i915/i915_vma.c b/drivers/gpu/drm/i915/i915_vma.c index 2d7af763b928..89079630d0af 100644 --- a/drivers/gpu/drm/i915/i915_vma.c +++ b/drivers/gpu/drm/i915/i915_vma.c @@ -92,6 +92,70 @@ static void __i915_vma_retire(struct i915_active *ref) i915_vma_put(vma); } +static int __vma_bind(struct i915_vma *vma, + unsigned int cache_level, + unsigned int flags) +{ + int err; + + if (!vma->pages) { + err = vma->ops->set_pages(vma); + if (err) + return err; + } + + return vma->ops->bind_vma(vma, cache_level, flags); +} + +static void do_async_bind(struct work_struct *work) +{ + struct i915_vma *vma = container_of(work, typeof(*vma), async.work); + struct i915_address_space *vm = vma->vm; + int err; + + if (vma->async.dma->error) + goto out; + + err = __vma_bind(vma, vma->async.cache_level, vma->async.flags); + if (err) + dma_fence_set_error(vma->async.dma, err); + +out: + dma_fence_signal(vma->async.dma); + + i915_vma_unpin(vma); + i915_vma_put(vma); + i915_vm_put(vm); +} + +static void __queue_async_bind(struct dma_fence *f, struct dma_fence_cb *cb) +{ + struct i915_vma *vma = container_of(cb, typeof(*vma), async.cb); + + if (f->error) + dma_fence_set_error(vma->async.dma, f->error); + + INIT_WORK(&vma->async.work, do_async_bind); + queue_work(system_unbound_wq, &vma->async.work); +} + +static const char *async_bind_driver_name(struct dma_fence *fence) +{ + return DRIVER_NAME; +} + +static const char *async_bind_timeline_name(struct dma_fence *fence) +{ + return "bind"; +} + +static const struct dma_fence_ops async_bind_ops = { + .get_driver_name = async_bind_driver_name, + .get_timeline_name = async_bind_timeline_name, +}; + +static DEFINE_SPINLOCK(async_lock); + static struct i915_vma * vma_create(struct drm_i915_gem_object *obj, struct i915_address_space *vm, @@ -276,6 +340,54 @@ i915_vma_instance(struct drm_i915_gem_object *obj, return vma; } +static int queue_async_bind(struct i915_vma *vma, + enum i915_cache_level cache_level, + u32 flags) +{ + bool ready = true; + + /* We are not allowed to shrink inside vm->mutex! */ + vma->async.dma = kmalloc(sizeof(*vma->async.dma), + GFP_NOWAIT | __GFP_NOWARN); + if (!vma->async.dma) + return -ENOMEM; + + dma_fence_init(vma->async.dma, + &async_bind_ops, + &async_lock, + vma->vm->i915->mm.unordered_timeline, + 0); + + /* XXX find and avoid allocations under reservation_object locks */ + if (!i915_vma_trylock(vma)) { + kfree(fetch_and_zero(&vma->async.dma)); + return -EAGAIN; + } + + if (rcu_access_pointer(vma->resv->fence_excl)) { /* async pages */ + struct dma_fence *f = reservation_object_get_excl(vma->resv); + + if (!dma_fence_add_callback(f, + &vma->async.cb, + __queue_async_bind)) + ready = false; + } + reservation_object_add_excl_fence(vma->resv, vma->async.dma); + i915_vma_unlock(vma); + + i915_vm_get(vma->vm); + i915_vma_get(vma); + __i915_vma_pin(vma); /* avoid being shrunk */ + + vma->async.cache_level = cache_level; + vma->async.flags = flags; + + if (ready) + __queue_async_bind(vma->async.dma, &vma->async.cb); + + return 0; +} + /** * i915_vma_bind - Sets up PTEs for an VMA in it's corresponding address space. 
* @vma: VMA to map @@ -293,17 +405,12 @@ int i915_vma_bind(struct i915_vma *vma, enum i915_cache_level cache_level, u32 vma_flags; int ret; + GEM_BUG_ON(!flags); GEM_BUG_ON(!drm_mm_node_allocated(&vma->node)); GEM_BUG_ON(vma->size > vma->node.size); - - if (GEM_DEBUG_WARN_ON(range_overflows(vma->node.start, - vma->node.size, - vma->vm->total))) - return -ENODEV; - - if (GEM_DEBUG_WARN_ON(!flags)) - return -EINVAL; - + GEM_BUG_ON(range_overflows(vma->node.start, + vma->node.size, + vma->vm->total)); bind_flags = 0; if (flags & PIN_GLOBAL) bind_flags |= I915_VMA_GLOBAL_BIND; @@ -318,14 +425,18 @@ int i915_vma_bind(struct i915_vma *vma, enum i915_cache_level cache_level, if (bind_flags == 0) return 0; - GEM_BUG_ON(!vma->pages); + if ((bind_flags & ~vma_flags) & I915_VMA_LOCAL_BIND) + bind_flags |= I915_VMA_ALLOC_BIND; trace_i915_vma_bind(vma, bind_flags); - ret = vma->ops->bind_vma(vma, cache_level, bind_flags); + if (bind_flags & I915_VMA_ALLOC_BIND) + ret = queue_async_bind(vma, cache_level, bind_flags); + else + ret = __vma_bind(vma, cache_level, bind_flags); if (ret) return ret; - vma->flags |= bind_flags; + vma->flags |= bind_flags & ~I915_VMA_ALLOC_BIND; return 0; } @@ -569,7 +680,7 @@ i915_vma_insert(struct i915_vma *vma, u64 size, u64 alignment, u64 flags) } if (vma->obj) { - ret = i915_gem_object_pin_pages(vma->obj); + ret = i915_gem_object_pin_pages_async(vma->obj); if (ret) return ret; @@ -578,25 +689,19 @@ i915_vma_insert(struct i915_vma *vma, u64 size, u64 alignment, u64 flags) cache_level = 0; } - GEM_BUG_ON(vma->pages); - - ret = vma->ops->set_pages(vma); - if (ret) - goto err_unpin; - if (flags & PIN_OFFSET_FIXED) { u64 offset = flags & PIN_OFFSET_MASK; if (!IS_ALIGNED(offset, alignment) || range_overflows(offset, size, end)) { ret = -EINVAL; - goto err_clear; + goto err_unpin; } ret = i915_gem_gtt_reserve(vma->vm, &vma->node, size, offset, cache_level, flags); if (ret) - goto err_clear; + goto err_unpin; } else { /* * We only support huge gtt pages through the 48b PPGTT, @@ -635,7 +740,7 @@ i915_vma_insert(struct i915_vma *vma, u64 size, u64 alignment, u64 flags) size, alignment, cache_level, start, end, flags); if (ret) - goto err_clear; + goto err_unpin; GEM_BUG_ON(vma->node.start < start); GEM_BUG_ON(vma->node.start + vma->node.size > end); @@ -654,8 +759,6 @@ i915_vma_insert(struct i915_vma *vma, u64 size, u64 alignment, u64 flags) return 0; -err_clear: - vma->ops->clear_pages(vma); err_unpin: if (vma->obj) i915_gem_object_unpin_pages(vma->obj); @@ -790,7 +893,7 @@ static void __i915_vma_destroy(struct i915_vma *vma) spin_lock(&obj->vma.lock); list_del(&vma->obj_link); - rb_erase(&vma->obj_node, &vma->obj->vma.tree); + rb_erase(&vma->obj_node, &obj->vma.tree); spin_unlock(&obj->vma.lock); } @@ -930,6 +1033,11 @@ int i915_vma_unbind(struct i915_vma *vma) lockdep_assert_held(&vma->vm->i915->drm.struct_mutex); + if (vma->async.dma && + dma_fence_wait_timeout(vma->async.dma, true, + MAX_SCHEDULE_TIMEOUT) < 0) + return -EINTR; + ret = i915_active_wait(&vma->active); if (ret) return ret; @@ -975,6 +1083,8 @@ int i915_vma_unbind(struct i915_vma *vma) } vma->flags &= ~(I915_VMA_GLOBAL_BIND | I915_VMA_LOCAL_BIND); + dma_fence_put(fetch_and_zero(&vma->async.dma)); + i915_vma_remove(vma); return 0; diff --git a/drivers/gpu/drm/i915/i915_vma.h b/drivers/gpu/drm/i915/i915_vma.h index 71088ff4ad59..67e43f5d01f6 100644 --- a/drivers/gpu/drm/i915/i915_vma.h +++ b/drivers/gpu/drm/i915/i915_vma.h @@ -103,6 +103,7 @@ struct i915_vma { #define I915_VMA_GLOBAL_BIND BIT(9) #define 
I915_VMA_LOCAL_BIND BIT(10) #define I915_VMA_BIND_MASK (I915_VMA_GLOBAL_BIND | I915_VMA_LOCAL_BIND | I915_VMA_PIN_OVERFLOW) +#define I915_VMA_ALLOC_BIND I915_VMA_PIN_OVERFLOW /* not stored */ #define I915_VMA_GGTT BIT(11) #define I915_VMA_CAN_FENCE BIT(12) @@ -143,6 +144,14 @@ struct i915_vma { unsigned int *exec_flags; struct hlist_node exec_node; u32 exec_handle; + + struct i915_vma_async_bind { + struct dma_fence *dma; + struct dma_fence_cb cb; + struct work_struct work; + unsigned int cache_level; + unsigned int flags; + } async; }; struct i915_vma * @@ -305,6 +314,11 @@ static inline void i915_vma_lock(struct i915_vma *vma) reservation_object_lock(vma->resv, NULL); } +static inline bool i915_vma_trylock(struct i915_vma *vma) +{ + return reservation_object_trylock(vma->resv); +} + static inline void i915_vma_unlock(struct i915_vma *vma) { reservation_object_unlock(vma->resv); diff --git a/drivers/gpu/drm/i915/selftests/i915_vma.c b/drivers/gpu/drm/i915/selftests/i915_vma.c index a166d9405a94..615ac485c731 100644 --- a/drivers/gpu/drm/i915/selftests/i915_vma.c +++ b/drivers/gpu/drm/i915/selftests/i915_vma.c @@ -204,8 +204,10 @@ static int igt_vma_create(void *arg) mock_context_close(ctx); } - list_for_each_entry_safe(obj, on, &objects, st_link) + list_for_each_entry_safe(obj, on, &objects, st_link) { + i915_gem_object_wait(obj, I915_WAIT_ALL, MAX_SCHEDULE_TIMEOUT); i915_gem_object_put(obj); + } return err; } From patchwork Mon Jun 10 07:21:26 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Chris Wilson X-Patchwork-Id: 10984155 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id ED6211580 for ; Mon, 10 Jun 2019 07:21:41 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id D0E08286C2 for ; Mon, 10 Jun 2019 07:21:41 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id C4DE82881C; Mon, 10 Jun 2019 07:21:41 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-5.2 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_MED autolearn=ham version=3.3.1 Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177]) (using TLSv1.2 with cipher DHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.wl.linuxfoundation.org (Postfix) with ESMTPS id 5CC1A28812 for ; Mon, 10 Jun 2019 07:21:38 +0000 (UTC) Received: from gabe.freedesktop.org (localhost [127.0.0.1]) by gabe.freedesktop.org (Postfix) with ESMTP id 5907389107; Mon, 10 Jun 2019 07:21:36 +0000 (UTC) X-Original-To: intel-gfx@lists.freedesktop.org Delivered-To: intel-gfx@lists.freedesktop.org Received: from fireflyinternet.com (mail.fireflyinternet.com [109.228.58.192]) by gabe.freedesktop.org (Postfix) with ESMTPS id 325AA88FA4 for ; Mon, 10 Jun 2019 07:21:33 +0000 (UTC) X-Default-Received-SPF: pass (skip=forwardok (res=PASS)) x-ip-name=78.156.65.138; Received: from haswell.alporthouse.com (unverified [78.156.65.138]) by fireflyinternet.com (Firefly Internet (M1)) with ESMTP id 16848381-1500050 for multiple; Mon, 10 Jun 2019 08:21:32 +0100 From: Chris Wilson To: intel-gfx@lists.freedesktop.org Date: Mon, 10 Jun 2019 08:21:26 +0100 Message-Id: <20190610072126.6355-29-chris@chris-wilson.co.uk> X-Mailer: 
git-send-email 2.20.1 In-Reply-To: <20190610072126.6355-1-chris@chris-wilson.co.uk> References: <20190610072126.6355-1-chris@chris-wilson.co.uk> MIME-Version: 1.0 Subject: [Intel-gfx] [PATCH 28/28] drm/i915: Use vm->mutex for serialising GTT insertion X-BeenThere: intel-gfx@lists.freedesktop.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: Intel graphics driver community testing & development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: intel-gfx-bounces@lists.freedesktop.org Sender: "Intel-gfx" X-Virus-Scanned: ClamAV using ClamSMTP Serialising insertion into each of the global GTT and ppGTT accounts for a large chunk of the current struct_mutex serialisation requirements. (Note that it is not just the drm_mm / gtt management itself being serialised, but the pin count and various flags.) Previously, the main blocker for replacing this mutex was the reset handling, but with the advent of "lockless" resets, we can freely take the vm->mutex and block waiting for the GPU (without fear of deadlock if the GPU hangs). We also proscribe allocations underneath vm->mutex, allowing us to take the mutex inside the shrinker, avoiding the recursive struct_mutex we previously used. Signed-off-by: Chris Wilson --- .../gpu/drm/i915/gem/i915_gem_client_blt.c | 7 +- drivers/gpu/drm/i915/gem/i915_gem_domain.c | 33 ++--- .../gpu/drm/i915/gem/i915_gem_execbuffer.c | 5 +- drivers/gpu/drm/i915/gem/i915_gem_mman.c | 27 ++-- drivers/gpu/drm/i915/gem/i915_gem_object.c | 11 +- drivers/gpu/drm/i915/gem/i915_gem_object.h | 3 +- .../gpu/drm/i915/gem/i915_gem_object_types.h | 2 +- drivers/gpu/drm/i915/gem/i915_gem_shrinker.c | 93 +------------ drivers/gpu/drm/i915/gem/i915_gem_stolen.c | 5 +- drivers/gpu/drm/i915/gem/i915_gem_tiling.c | 60 +++++---- drivers/gpu/drm/i915/gem/i915_gem_userptr.c | 28 +--- .../drm/i915/gem/selftests/i915_gem_mman.c | 2 + drivers/gpu/drm/i915/gt/intel_context.c | 5 +- drivers/gpu/drm/i915/gt/intel_ringbuffer.c | 4 +- drivers/gpu/drm/i915/gvt/aperture_gm.c | 22 ++-- drivers/gpu/drm/i915/i915_debugfs.c | 10 +- drivers/gpu/drm/i915/i915_gem.c | 123 +++++++++++------- drivers/gpu/drm/i915/i915_gem_evict.c | 28 +--- drivers/gpu/drm/i915/i915_gem_fence_reg.c | 71 +++++----- drivers/gpu/drm/i915/i915_gem_gtt.c | 85 ++---------- drivers/gpu/drm/i915/i915_gem_gtt.h | 3 - drivers/gpu/drm/i915/i915_perf.c | 20 +-- drivers/gpu/drm/i915/i915_vma.c | 77 ++++++----- drivers/gpu/drm/i915/i915_vma.h | 61 +++------ drivers/gpu/drm/i915/intel_display.c | 32 +---- drivers/gpu/drm/i915/intel_fbdev.c | 8 +- drivers/gpu/drm/i915/intel_guc.c | 1 + drivers/gpu/drm/i915/intel_overlay.c | 12 +- drivers/gpu/drm/i915/selftests/i915_vma.c | 2 + 29 files changed, 301 insertions(+), 539 deletions(-) diff --git a/drivers/gpu/drm/i915/gem/i915_gem_client_blt.c b/drivers/gpu/drm/i915/gem/i915_gem_client_blt.c index f253ec5765ad..453f50182f55 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_client_blt.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_client_blt.c @@ -154,7 +154,6 @@ static void clear_pages_dma_fence_cb(struct dma_fence *fence, static void clear_pages_worker(struct work_struct *work) { struct clear_pages_work *w = container_of(work, typeof(*w), work); - struct drm_i915_private *i915 = w->ce->gem_context->i915; struct drm_i915_gem_object *obj = w->sleeve->obj; struct i915_vma *vma = w->sleeve->vma; struct i915_request *rq; @@ -170,11 +169,9 @@ static void clear_pages_worker(struct work_struct *work) obj->cache_dirty = false; } - /* XXX: we need to kill this */ -
mutex_lock(&i915->drm.struct_mutex); err = i915_vma_pin(vma, 0, 0, PIN_USER); if (unlikely(err)) - goto out_unlock; + goto out_signal; rq = i915_request_create(w->ce); if (IS_ERR(rq)) { @@ -210,8 +207,6 @@ static void clear_pages_worker(struct work_struct *work) i915_request_add(rq); out_unpin: i915_vma_unpin(vma); -out_unlock: - mutex_unlock(&i915->drm.struct_mutex); out_signal: if (unlikely(err)) { dma_fence_set_error(&w->dma, err); diff --git a/drivers/gpu/drm/i915/gem/i915_gem_domain.c b/drivers/gpu/drm/i915/gem/i915_gem_domain.c index 7e6ed767348c..56aade8fb702 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_domain.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_domain.c @@ -26,7 +26,7 @@ static void __i915_gem_object_flush_for_display(struct drm_i915_gem_object *obj) void i915_gem_object_flush_if_display(struct drm_i915_gem_object *obj) { - if (!READ_ONCE(obj->pin_global)) + if (!atomic_read(&obj->pin_global)) return; i915_gem_object_lock(obj); @@ -369,16 +369,11 @@ int i915_gem_set_caching_ioctl(struct drm_device *dev, void *data, if (ret) goto out; - ret = mutex_lock_interruptible(&i915->drm.struct_mutex); - if (ret) - goto out; - ret = i915_gem_object_lock_interruptible(obj); if (ret == 0) { ret = i915_gem_object_set_cache_level(obj, level); i915_gem_object_unlock(obj); } - mutex_unlock(&i915->drm.struct_mutex); out: i915_gem_object_put(obj); @@ -405,7 +400,7 @@ i915_gem_object_pin_to_display_plane(struct drm_i915_gem_object *obj, /* Mark the global pin early so that we account for the * display coherency whilst setting up the cache domains. */ - obj->pin_global++; + atomic_inc(&obj->pin_global); /* The display engine is not coherent with the LLC cache on gen6. As * a result, we make sure that the pinning that is about to occur is @@ -455,25 +450,27 @@ i915_gem_object_pin_to_display_plane(struct drm_i915_gem_object *obj, return vma; err_unpin_global: - obj->pin_global--; + atomic_dec(&obj->pin_global); return vma; } static void i915_gem_object_bump_inactive_ggtt(struct drm_i915_gem_object *obj) { struct drm_i915_private *i915 = to_i915(obj->base.dev); - struct i915_vma *vma; GEM_BUG_ON(!i915_gem_object_has_pinned_pages(obj)); - mutex_lock(&i915->ggtt.vm.mutex); - for_each_ggtt_vma(vma, obj) { - if (!drm_mm_node_allocated(&vma->node)) - continue; + if (mutex_trylock(&i915->ggtt.vm.mutex)) { + struct i915_vma *vma; - list_move_tail(&vma->vm_link, &vma->vm->bound_list); + for_each_ggtt_vma(vma, obj) { + if (!drm_mm_node_allocated(&vma->node)) + continue; + + list_move_tail(&vma->vm_link, &vma->vm->bound_list); + } + mutex_unlock(&i915->ggtt.vm.mutex); } - mutex_unlock(&i915->ggtt.vm.mutex); if (i915_gem_object_is_shrinkable(obj)) { unsigned long flags; @@ -492,12 +489,10 @@ i915_gem_object_unpin_from_display_plane(struct i915_vma *vma) { struct drm_i915_gem_object *obj = vma->obj; - assert_object_held(obj); - - if (WARN_ON(obj->pin_global == 0)) + if (GEM_WARN_ON(!atomic_read(&obj->pin_global))) return; - if (--obj->pin_global == 0) + if (atomic_dec_and_test(&obj->pin_global)) vma->display_alignment = I915_GTT_MIN_ALIGNMENT; /* Bump the LRU to try and avoid premature eviction whilst flipping */ diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c index ccb089420830..cdc1bd285c62 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c @@ -547,8 +547,11 @@ eb_add_vma(struct i915_execbuffer *eb, eb_unreserve_vma(vma, vma->exec_flags); list_add_tail(&vma->exec_link, &eb->unbound); - 
if (drm_mm_node_allocated(&vma->node)) + if (drm_mm_node_allocated(&vma->node)) { + mutex_lock(&vma->vm->mutex); err = i915_vma_unbind(vma); + mutex_unlock(&vma->vm->mutex); + } if (unlikely(err)) vma->exec_flags = NULL; } diff --git a/drivers/gpu/drm/i915/gem/i915_gem_mman.c b/drivers/gpu/drm/i915/gem/i915_gem_mman.c index 49cf9ad97bfc..9fa0569afb50 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_mman.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_mman.c @@ -248,16 +248,6 @@ vm_fault_t i915_gem_fault(struct vm_fault *vmf) goto err_rpm; } - ret = i915_mutex_lock_interruptible(dev); - if (ret) - goto err_reset; - - /* Access to snoopable pages through the GTT is incoherent. */ - if (obj->cache_level != I915_CACHE_NONE && !HAS_LLC(i915)) { - ret = -EFAULT; - goto err_unlock; - } - /* Now pin it into the GTT as needed */ vma = i915_gem_object_ggtt_pin(obj, NULL, 0, 0, PIN_MAPPABLE | @@ -287,13 +277,19 @@ vm_fault_t i915_gem_fault(struct vm_fault *vmf) } if (IS_ERR(vma)) { ret = PTR_ERR(vma); - goto err_unlock; + goto err_reset; } - ret = i915_vma_pin_fence(vma); + assert_rpm_wakelock_held(i915); + + ret = mutex_lock_interruptible(&vma->vm->mutex); if (ret) goto err_unpin; + ret = __i915_vma_pin_fence(vma); + if (ret) + goto err_unlock; + /* Finally, remap it using the new GTT offset */ ret = remap_io_mapping(area, area->vm_start + (vma->ggtt_view.partial.offset << PAGE_SHIFT), @@ -304,7 +300,6 @@ vm_fault_t i915_gem_fault(struct vm_fault *vmf) goto err_fence; /* Mark as being mmapped into userspace for later revocation */ - assert_rpm_wakelock_held(i915); if (!i915_vma_set_userfault(vma) && !obj->userfault_count++) list_add(&obj->userfault_link, &i915->ggtt.userfault_list); if (CONFIG_DRM_I915_USERFAULT_AUTOSUSPEND) @@ -316,10 +311,10 @@ vm_fault_t i915_gem_fault(struct vm_fault *vmf) err_fence: i915_vma_unpin_fence(vma); +err_unlock: + mutex_unlock(&vma->vm->mutex); err_unpin: __i915_vma_unpin(vma); -err_unlock: - mutex_unlock(&dev->struct_mutex); err_reset: i915_reset_unlock(i915, srcu); err_rpm: @@ -405,7 +400,7 @@ void i915_gem_object_release_mmap(struct drm_i915_gem_object *obj) * requirement that operations to the GGTT be made holding the RPM * wakeref. 
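/*
 * Illustrative sketch of the reworked fault-path ordering above; the
 * helper name is invented, the calls are the ones this patch introduces.
 * The GGTT vma is pinned first, the fence is then acquired with
 * vm->mutex held via the new __i915_vma_pin_fence(), and the PTEs are
 * written before the fence pin and the mutex are released again.
 */
static int example_fault_fence(struct i915_vma *vma)
{
	int err;

	err = mutex_lock_interruptible(&vma->vm->mutex);
	if (err)
		return err;

	err = __i915_vma_pin_fence(vma);	/* requires vm->mutex */
	if (err == 0) {
		/* ... remap_io_mapping() and userfault tracking go here ... */
		i915_vma_unpin_fence(vma);
	}

	mutex_unlock(&vma->vm->mutex);
	return err;
}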
*/ - lockdep_assert_held(&i915->drm.struct_mutex); + lockdep_assert_held(&i915->ggtt.vm.mutex); wakeref = intel_runtime_pm_get(i915); if (!obj->userfault_count) diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object.c b/drivers/gpu/drm/i915/gem/i915_gem_object.c index ee69cd7948c0..a62ba5de56b4 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_object.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_object.c @@ -174,12 +174,15 @@ static void __i915_gem_free_objects(struct drm_i915_private *i915, trace_i915_gem_object_destroy(obj); - mutex_lock(&i915->drm.struct_mutex); - list_for_each_entry_safe(vma, vn, &obj->vma.list, obj_link) { + struct i915_address_space *vm = vma->vm; + GEM_BUG_ON(i915_vma_is_active(vma)); - vma->flags &= ~I915_VMA_PIN_MASK; + atomic_set(&vma->pin_count, 0); + + mutex_lock(&vm->mutex); i915_vma_destroy(vma); + mutex_unlock(&vm->mutex); } GEM_BUG_ON(!list_empty(&obj->vma.list)); GEM_BUG_ON(!RB_EMPTY_ROOT(&obj->vma.tree)); @@ -200,8 +203,6 @@ static void __i915_gem_free_objects(struct drm_i915_private *i915, spin_unlock_irqrestore(&i915->mm.obj_lock, flags); } - mutex_unlock(&i915->drm.struct_mutex); - GEM_BUG_ON(atomic_read(&obj->bind_count)); GEM_BUG_ON(obj->userfault_count); GEM_BUG_ON(!list_empty(&obj->lut_list)); diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object.h b/drivers/gpu/drm/i915/gem/i915_gem_object.h index 7ea8013d108f..0fc54924b33a 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_object.h +++ b/drivers/gpu/drm/i915/gem/i915_gem_object.h @@ -403,7 +403,8 @@ static inline bool cpu_write_needs_clflush(struct drm_i915_gem_object *obj) if (!(obj->cache_coherent & I915_BO_CACHE_COHERENT_FOR_WRITE)) return true; - return obj->pin_global; /* currently in use by HW, keep flushed */ + /* Currently in use by HW? Keep flushed. */ + return atomic_read(&obj->pin_global); } static inline void __start_cpu_write(struct drm_i915_gem_object *obj) diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object_types.h b/drivers/gpu/drm/i915/gem/i915_gem_object_types.h index 8f61d7a93078..f792953b8a71 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_object_types.h +++ b/drivers/gpu/drm/i915/gem/i915_gem_object_types.h @@ -159,7 +159,7 @@ struct drm_i915_gem_object { /** Count of VMA actually bound by this object */ atomic_t bind_count; /** Count of how many global VMA are currently pinned for use by HW */ - unsigned int pin_global; + atomic_t pin_global; struct { struct mutex lock; /* protects the pages and their use */ diff --git a/drivers/gpu/drm/i915/gem/i915_gem_shrinker.c b/drivers/gpu/drm/i915/gem/i915_gem_shrinker.c index 48451110e736..4eeef6225a71 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_shrinker.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_shrinker.c @@ -16,40 +16,6 @@ #include "i915_trace.h" -static bool shrinker_lock(struct drm_i915_private *i915, - unsigned int flags, - bool *unlock) -{ - struct mutex *m = &i915->drm.struct_mutex; - - switch (mutex_trylock_recursive(m)) { - case MUTEX_TRYLOCK_RECURSIVE: - *unlock = false; - return true; - - case MUTEX_TRYLOCK_FAILED: - *unlock = false; - if (flags & I915_SHRINK_ACTIVE && - mutex_lock_killable_nested(m, I915_MM_SHRINKER) == 0) - *unlock = true; - return *unlock; - - case MUTEX_TRYLOCK_SUCCESS: - *unlock = true; - return true; - } - - BUG(); -} - -static void shrinker_unlock(struct drm_i915_private *i915, bool unlock) -{ - if (!unlock) - return; - - mutex_unlock(&i915->drm.struct_mutex); -} - static bool swap_available(void) { return get_nr_swap_pages() > 0; @@ -78,7 +44,7 @@ static bool can_release_pages(struct drm_i915_gem_object 
*obj) * To simplify the scan, and to avoid walking the list of vma under the * object, we just check the count of its permanently pinned. */ - if (READ_ONCE(obj->pin_global)) + if (atomic_read(&obj->pin_global)) return false; /* We can only return physical pages to the system if we can either @@ -154,10 +120,6 @@ i915_gem_shrink(struct drm_i915_private *i915, intel_wakeref_t wakeref = 0; unsigned long count = 0; unsigned long scanned = 0; - bool unlock; - - if (!shrinker_lock(i915, flags, &unlock)) - return 0; /* * When shrinking the active list, also consider active contexts. @@ -174,7 +136,6 @@ i915_gem_shrink(struct drm_i915_private *i915, MAX_SCHEDULE_TIMEOUT); trace_i915_gem_shrink(i915, target, flags); - i915_retire_requests(i915); /* * Unbinding of objects will require HW access; Let us not wake the @@ -216,13 +177,6 @@ i915_gem_shrink(struct drm_i915_private *i915, INIT_LIST_HEAD(&still_in_list); - /* - * We serialize our access to unreferenced objects through - * the use of the struct_mutex. While the objects are not - * yet freed (due to RCU then a workqueue) we still want - * to be able to shrink their pages, so they remain on - * the unbound/bound list until actually freed. - */ spin_lock_irqsave(&i915->mm.obj_lock, flags); while (count < target && (obj = list_first_entry_or_null(phase->list, @@ -270,10 +224,6 @@ i915_gem_shrink(struct drm_i915_private *i915, if (flags & I915_SHRINK_BOUND) intel_runtime_pm_put(i915, wakeref); - i915_retire_requests(i915); - - shrinker_unlock(i915, unlock); - if (nr_scanned) *nr_scanned += scanned; return count; @@ -343,13 +293,9 @@ i915_gem_shrinker_scan(struct shrinker *shrinker, struct shrink_control *sc) struct drm_i915_private *i915 = container_of(shrinker, struct drm_i915_private, mm.shrinker); unsigned long freed; - bool unlock; sc->nr_scanned = 0; - if (!shrinker_lock(i915, 0, &unlock)) - return SHRINK_STOP; - freed = i915_gem_shrink(i915, sc->nr_to_scan, &sc->nr_scanned, @@ -370,8 +316,6 @@ i915_gem_shrinker_scan(struct shrinker *shrinker, struct shrink_control *sc) } } - shrinker_unlock(i915, unlock); - return sc->nr_scanned ? 
freed : SHRINK_STOP; } @@ -423,16 +367,12 @@ i915_gem_shrinker_vmap(struct notifier_block *nb, unsigned long event, void *ptr struct i915_vma *vma, *next; unsigned long freed_pages = 0; intel_wakeref_t wakeref; - bool unlock; - - if (!shrinker_lock(i915, 0, &unlock)) - return NOTIFY_DONE; /* Force everything onto the inactive lists */ if (i915_gem_wait_for_idle(i915, I915_WAIT_LOCKED, MAX_SCHEDULE_TIMEOUT)) - goto out; + return NOTIFY_DONE; with_intel_runtime_pm(i915, wakeref) freed_pages += i915_gem_shrink(i915, -1UL, NULL, @@ -449,16 +389,11 @@ i915_gem_shrinker_vmap(struct notifier_block *nb, unsigned long event, void *ptr if (!vma->iomap || i915_vma_is_active(vma)) continue; - mutex_unlock(&i915->ggtt.vm.mutex); if (i915_vma_unbind(vma) == 0) freed_pages += count; - mutex_lock(&i915->ggtt.vm.mutex); } mutex_unlock(&i915->ggtt.vm.mutex); -out: - shrinker_unlock(i915, unlock); - *(unsigned long *)ptr += freed_pages; return NOTIFY_DONE; } @@ -500,37 +435,13 @@ void i915_gem_shrinker_unregister(struct drm_i915_private *i915) void i915_gem_shrinker_taints_mutex(struct drm_i915_private *i915, struct mutex *mutex) { - bool unlock = false; - if (!IS_ENABLED(CONFIG_LOCKDEP)) return; - if (!lockdep_is_held_type(&i915->drm.struct_mutex, -1)) { - mutex_acquire(&i915->drm.struct_mutex.dep_map, - I915_MM_NORMAL, 0, _RET_IP_); - unlock = true; - } - fs_reclaim_acquire(GFP_KERNEL); - /* - * As we invariably rely on the struct_mutex within the shrinker, - * but have a complicated recursion dance, taint all the mutexes used - * within the shrinker with the struct_mutex. For completeness, we - * taint with all subclass of struct_mutex, even though we should - * only need tainting by I915_MM_NORMAL to catch possible ABBA - * deadlocks from using struct_mutex inside @mutex. - */ - mutex_acquire(&i915->drm.struct_mutex.dep_map, - I915_MM_SHRINKER, 0, _RET_IP_); - mutex_acquire(&mutex->dep_map, 0, 0, _RET_IP_); mutex_release(&mutex->dep_map, 0, _RET_IP_); - mutex_release(&i915->drm.struct_mutex.dep_map, 0, _RET_IP_); - fs_reclaim_release(GFP_KERNEL); - - if (unlock) - mutex_release(&i915->drm.struct_mutex.dep_map, 0, _RET_IP_); } diff --git a/drivers/gpu/drm/i915/gem/i915_gem_stolen.c b/drivers/gpu/drm/i915/gem/i915_gem_stolen.c index 65ba85655582..2990ef3604aa 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_stolen.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_stolen.c @@ -618,8 +618,6 @@ i915_gem_object_create_stolen_for_preallocated(struct drm_i915_private *dev_priv if (!drm_mm_initialized(&dev_priv->mm.stolen)) return NULL; - lockdep_assert_held(&dev_priv->drm.struct_mutex); - DRM_DEBUG_DRIVER("creating preallocated stolen object: stolen_offset=%pa, gtt_offset=%pa, size=%pa\n", &stolen_offset, >t_offset, &size); @@ -671,10 +669,12 @@ i915_gem_object_create_stolen_for_preallocated(struct drm_i915_private *dev_priv * setting up the GTT space. The actual reservation will occur * later. 
*/ + mutex_lock(&ggtt->vm.mutex); ret = i915_gem_gtt_reserve(&ggtt->vm, &vma->node, size, gtt_offset, obj->cache_level, 0); if (ret) { + mutex_unlock(&ggtt->vm.mutex); DRM_DEBUG_DRIVER("failed to allocate stolen GTT space\n"); goto err_pages; } @@ -685,7 +685,6 @@ i915_gem_object_create_stolen_for_preallocated(struct drm_i915_private *dev_priv vma->flags |= I915_VMA_GLOBAL_BIND; __i915_vma_set_map_and_fenceable(vma); - mutex_lock(&ggtt->vm.mutex); list_add_tail(&vma->vm_link, &ggtt->vm.bound_list); mutex_unlock(&ggtt->vm.mutex); diff --git a/drivers/gpu/drm/i915/gem/i915_gem_tiling.c b/drivers/gpu/drm/i915/gem/i915_gem_tiling.c index ca0c2f451742..f919dbb36edb 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_tiling.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_tiling.c @@ -212,15 +212,18 @@ i915_gem_object_set_tiling(struct drm_i915_gem_object *obj, GEM_BUG_ON(!i915_tiling_ok(obj, tiling, stride)); GEM_BUG_ON(!stride ^ (tiling == I915_TILING_NONE)); - lockdep_assert_held(&i915->drm.struct_mutex); if ((tiling | stride) == obj->tiling_and_stride) return 0; - if (i915_gem_object_is_framebuffer(obj)) + i915_gem_object_lock(obj); + if (i915_gem_object_is_framebuffer(obj)) { + i915_gem_object_unlock(obj); return -EBUSY; + } - /* We need to rebind the object if its current allocation + /* + * We need to rebind the object if its current allocation * no longer meets the alignment restrictions for its new * tiling mode. Otherwise we can just leave it alone, but * need to ensure that any fence register is updated before @@ -233,17 +236,37 @@ i915_gem_object_set_tiling(struct drm_i915_gem_object *obj, * whilst executing a fenced command for an untiled object. */ - err = i915_gem_object_fence_prepare(obj, tiling, stride); - if (err) + err = mutex_lock_interruptible(&i915->ggtt.vm.mutex); + if (err) { + i915_gem_object_unlock(obj); return err; + } - i915_gem_object_lock(obj); - if (i915_gem_object_is_framebuffer(obj)) { + err = i915_gem_object_fence_prepare(obj, tiling, stride); + if (err) { + mutex_unlock(&i915->ggtt.vm.mutex); i915_gem_object_unlock(obj); - return -EBUSY; + return err; } - /* If the memory has unknown (i.e. varying) swizzling, we pin the + for_each_ggtt_vma(vma, obj) { + vma->fence_size = + i915_gem_fence_size(i915, vma->size, tiling, stride); + vma->fence_alignment = + i915_gem_fence_alignment(i915, + vma->size, tiling, stride); + + if (vma->fence) + vma->fence->dirty = true; + } + + /* Force the fence to be reacquired for GTT access */ + i915_gem_object_release_mmap(obj); + + mutex_unlock(&i915->ggtt.vm.mutex); + + /* + * If the memory has unknown (i.e. varying) swizzling, we pin the * pages to prevent them being swapped out and causing corruption * due to the change in swizzling. 
*/ @@ -264,23 +287,9 @@ i915_gem_object_set_tiling(struct drm_i915_gem_object *obj, } mutex_unlock(&obj->mm.lock); - for_each_ggtt_vma(vma, obj) { - vma->fence_size = - i915_gem_fence_size(i915, vma->size, tiling, stride); - vma->fence_alignment = - i915_gem_fence_alignment(i915, - vma->size, tiling, stride); - - if (vma->fence) - vma->fence->dirty = true; - } - obj->tiling_and_stride = tiling | stride; i915_gem_object_unlock(obj); - /* Force the fence to be reacquired for GTT access */ - i915_gem_object_release_mmap(obj); - /* Try to preallocate memory required to save swizzling on put-pages */ if (i915_gem_object_needs_bit17_swizzle(obj)) { if (!obj->bit_17) { @@ -364,12 +373,7 @@ i915_gem_set_tiling_ioctl(struct drm_device *dev, void *data, } } - err = mutex_lock_interruptible(&dev->struct_mutex); - if (err) - goto err; - err = i915_gem_object_set_tiling(obj, args->tiling_mode, args->stride); - mutex_unlock(&dev->struct_mutex); /* We have to maintain this existing ABI... */ args->stride = i915_gem_object_get_stride(obj); diff --git a/drivers/gpu/drm/i915/gem/i915_gem_userptr.c b/drivers/gpu/drm/i915/gem/i915_gem_userptr.c index 528b61678334..f093deaeb5c0 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_userptr.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_userptr.c @@ -93,7 +93,6 @@ userptr_mn_invalidate_range_start(struct mmu_notifier *_mn, struct i915_mmu_notifier *mn = container_of(_mn, struct i915_mmu_notifier, mn); struct interval_tree_node *it; - struct mutex *unlock = NULL; unsigned long end; int ret = 0; @@ -130,32 +129,12 @@ userptr_mn_invalidate_range_start(struct mmu_notifier *_mn, } spin_unlock(&mn->lock); - if (!unlock) { - unlock = &mn->mm->i915->drm.struct_mutex; - - switch (mutex_trylock_recursive(unlock)) { - default: - case MUTEX_TRYLOCK_FAILED: - if (mutex_lock_killable_nested(unlock, I915_MM_SHRINKER)) { - i915_gem_object_put(obj); - return -EINTR; - } - /* fall through */ - case MUTEX_TRYLOCK_SUCCESS: - break; - - case MUTEX_TRYLOCK_RECURSIVE: - unlock = ERR_PTR(-EEXIST); - break; - } - } - ret = i915_gem_object_unbind(obj); if (ret == 0) ret = __i915_gem_object_put_pages(obj, I915_MM_SHRINKER); i915_gem_object_put(obj); if (ret) - goto unlock; + return ret; spin_lock(&mn->lock); @@ -168,12 +147,7 @@ userptr_mn_invalidate_range_start(struct mmu_notifier *_mn, } spin_unlock(&mn->lock); -unlock: - if (!IS_ERR_OR_NULL(unlock)) - mutex_unlock(unlock); - return ret; - } static const struct mmu_notifier_ops i915_gem_userptr_notifier = { diff --git a/drivers/gpu/drm/i915/gem/selftests/i915_gem_mman.c b/drivers/gpu/drm/i915/gem/selftests/i915_gem_mman.c index b0ba1680ede6..fbbb89051c21 100644 --- a/drivers/gpu/drm/i915/gem/selftests/i915_gem_mman.c +++ b/drivers/gpu/drm/i915/gem/selftests/i915_gem_mman.c @@ -168,7 +168,9 @@ static int check_partial_mapping(struct drm_i915_gem_object *obj, if (err) return err; + mutex_lock(&to_i915(obj->base.dev)->ggtt.vm.mutex); i915_vma_destroy(vma); + mutex_unlock(&to_i915(obj->base.dev)->ggtt.vm.mutex); } return 0; diff --git a/drivers/gpu/drm/i915/gt/intel_context.c b/drivers/gpu/drm/i915/gt/intel_context.c index a32698f7645f..e9b86787e8ad 100644 --- a/drivers/gpu/drm/i915/gt/intel_context.c +++ b/drivers/gpu/drm/i915/gt/intel_context.c @@ -110,7 +110,7 @@ static int __context_pin_state(struct i915_vma *vma) * And mark it as a globally pinned object to let the shrinker know * it cannot reclaim the object until we release it. 
*/ - vma->obj->pin_global++; + atomic_inc(&vma->obj->pin_global); vma->obj->mm.dirty = true; return 0; @@ -118,7 +118,8 @@ static int __context_pin_state(struct i915_vma *vma) static void __context_unpin_state(struct i915_vma *vma) { - vma->obj->pin_global--; + GEM_BUG_ON(!atomic_read(&vma->obj->pin_global)); + atomic_dec(&vma->obj->pin_global); __i915_vma_unpin(vma); } diff --git a/drivers/gpu/drm/i915/gt/intel_ringbuffer.c b/drivers/gpu/drm/i915/gt/intel_ringbuffer.c index f4750207c393..2b4485d918f1 100644 --- a/drivers/gpu/drm/i915/gt/intel_ringbuffer.c +++ b/drivers/gpu/drm/i915/gt/intel_ringbuffer.c @@ -1183,7 +1183,7 @@ int intel_ring_pin(struct intel_ring *ring) goto unpin_ring; } - vma->obj->pin_global++; + atomic_inc(&vma->obj->pin_global); ring->vaddr = addr; return 0; @@ -1219,7 +1219,7 @@ void intel_ring_unpin(struct intel_ring *ring) i915_gem_object_unpin_map(ring->vma->obj); ring->vaddr = NULL; - ring->vma->obj->pin_global--; + atomic_dec(&ring->vma->obj->pin_global); i915_vma_unpin(ring->vma); i915_timeline_unpin(ring->timeline); diff --git a/drivers/gpu/drm/i915/gvt/aperture_gm.c b/drivers/gpu/drm/i915/gvt/aperture_gm.c index 4098902bfaeb..a2d7a1df820f 100644 --- a/drivers/gpu/drm/i915/gvt/aperture_gm.c +++ b/drivers/gpu/drm/i915/gvt/aperture_gm.c @@ -61,14 +61,14 @@ static int alloc_gm(struct intel_vgpu *vgpu, bool high_gm) flags = PIN_MAPPABLE; } - mutex_lock(&dev_priv->drm.struct_mutex); + mutex_lock(&dev_priv->ggtt.vm.mutex); mmio_hw_access_pre(dev_priv); ret = i915_gem_gtt_insert(&dev_priv->ggtt.vm, node, size, I915_GTT_PAGE_SIZE, I915_COLOR_UNEVICTABLE, start, end, flags); mmio_hw_access_post(dev_priv); - mutex_unlock(&dev_priv->drm.struct_mutex); + mutex_unlock(&dev_priv->ggtt.vm.mutex); if (ret) gvt_err("fail to alloc %s gm space from host\n", high_gm ?
"high" : "low"); @@ -98,9 +98,9 @@ static int alloc_vgpu_gm(struct intel_vgpu *vgpu) return 0; out_free_aperture: - mutex_lock(&dev_priv->drm.struct_mutex); + mutex_lock(&dev_priv->ggtt.vm.mutex); drm_mm_remove_node(&vgpu->gm.low_gm_node); - mutex_unlock(&dev_priv->drm.struct_mutex); + mutex_unlock(&dev_priv->ggtt.vm.mutex); return ret; } @@ -108,10 +108,10 @@ static void free_vgpu_gm(struct intel_vgpu *vgpu) { struct drm_i915_private *dev_priv = vgpu->gvt->dev_priv; - mutex_lock(&dev_priv->drm.struct_mutex); + mutex_lock(&dev_priv->ggtt.vm.mutex); drm_mm_remove_node(&vgpu->gm.low_gm_node); drm_mm_remove_node(&vgpu->gm.high_gm_node); - mutex_unlock(&dev_priv->drm.struct_mutex); + mutex_unlock(&dev_priv->ggtt.vm.mutex); } /** @@ -172,14 +172,14 @@ static void free_vgpu_fence(struct intel_vgpu *vgpu) intel_runtime_pm_get(dev_priv); - mutex_lock(&dev_priv->drm.struct_mutex); + mutex_lock(&dev_priv->ggtt.vm.mutex); _clear_vgpu_fence(vgpu); for (i = 0; i < vgpu_fence_sz(vgpu); i++) { reg = vgpu->fence.regs[i]; i915_unreserve_fence(reg); vgpu->fence.regs[i] = NULL; } - mutex_unlock(&dev_priv->drm.struct_mutex); + mutex_unlock(&dev_priv->ggtt.vm.mutex); intel_runtime_pm_put_unchecked(dev_priv); } @@ -194,7 +194,7 @@ static int alloc_vgpu_fence(struct intel_vgpu *vgpu) intel_runtime_pm_get(dev_priv); /* Request fences from host */ - mutex_lock(&dev_priv->drm.struct_mutex); + mutex_lock(&dev_priv->ggtt.vm.mutex); for (i = 0; i < vgpu_fence_sz(vgpu); i++) { reg = i915_reserve_fence(dev_priv); @@ -206,7 +206,7 @@ static int alloc_vgpu_fence(struct intel_vgpu *vgpu) _clear_vgpu_fence(vgpu); - mutex_unlock(&dev_priv->drm.struct_mutex); + mutex_unlock(&dev_priv->ggtt.vm.mutex); intel_runtime_pm_put_unchecked(dev_priv); return 0; out_free_fence: @@ -219,7 +219,7 @@ static int alloc_vgpu_fence(struct intel_vgpu *vgpu) i915_unreserve_fence(reg); vgpu->fence.regs[i] = NULL; } - mutex_unlock(&dev_priv->drm.struct_mutex); + mutex_unlock(&dev_priv->ggtt.vm.mutex); intel_runtime_pm_put_unchecked(dev_priv); return -ENOSPC; } diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c index e2d6b8dff4ab..3e78b5d0d5a6 100644 --- a/drivers/gpu/drm/i915/i915_debugfs.c +++ b/drivers/gpu/drm/i915/i915_debugfs.c @@ -76,7 +76,7 @@ static int i915_capabilities(struct seq_file *m, void *data) static char get_pin_flag(struct drm_i915_gem_object *obj) { - return obj->pin_global ? 'p' : ' '; + return atomic_read(&obj->pin_global) ? 
'p' : ' '; } static char get_tiling_flag(struct drm_i915_gem_object *obj) @@ -218,7 +218,7 @@ describe_obj(struct seq_file *m, struct drm_i915_gem_object *obj) seq_printf(m, " (pinned x %d)", pin_count); if (obj->stolen) seq_printf(m, " (stolen: %08llx)", obj->stolen->start); - if (obj->pin_global) + if (atomic_read(&obj->pin_global)) seq_printf(m, " (global)"); engine = i915_gem_object_last_write_engine(obj); @@ -1602,11 +1602,6 @@ static int i915_gem_framebuffer_info(struct seq_file *m, void *data) struct drm_device *dev = &dev_priv->drm; struct intel_framebuffer *fbdev_fb = NULL; struct drm_framebuffer *drm_fb; - int ret; - - ret = mutex_lock_interruptible(&dev->struct_mutex); - if (ret) - return ret; #ifdef CONFIG_DRM_FBDEV_EMULATION if (dev_priv->fbdev && dev_priv->fbdev->helper.fb) { @@ -1641,7 +1636,6 @@ static int i915_gem_framebuffer_info(struct seq_file *m, void *data) seq_putc(m, '\n'); } mutex_unlock(&dev->mode_config.fb_lock); - mutex_unlock(&dev->struct_mutex); return 0; } diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c index d74fcddd863e..fde995b05c81 100644 --- a/drivers/gpu/drm/i915/i915_gem.c +++ b/drivers/gpu/drm/i915/i915_gem.c @@ -60,20 +60,31 @@ #include "intel_pm.h" static int -insert_mappable_node(struct i915_ggtt *ggtt, - struct drm_mm_node *node, u32 size) +insert_mappable_node(struct i915_ggtt *ggtt, struct drm_mm_node *node, u32 size) { + int ret; + memset(node, 0, sizeof(*node)); - return drm_mm_insert_node_in_range(&ggtt->vm.mm, node, - size, 0, I915_COLOR_UNEVICTABLE, - 0, ggtt->mappable_end, - DRM_MM_INSERT_LOW); + + ret = mutex_lock_interruptible(&ggtt->vm.mutex); + if (ret) + return ret; + + ret = drm_mm_insert_node_in_range(&ggtt->vm.mm, node, + size, 0, I915_COLOR_UNEVICTABLE, + 0, ggtt->mappable_end, + DRM_MM_INSERT_LOW); + mutex_unlock(&ggtt->vm.mutex); + + return ret; } static void -remove_mappable_node(struct drm_mm_node *node) +remove_mappable_node(struct i915_ggtt *ggtt, struct drm_mm_node *node) { + mutex_lock(&ggtt->vm.mutex); drm_mm_remove_node(node); + mutex_unlock(&ggtt->vm.mutex); } int @@ -84,8 +95,11 @@ i915_gem_get_aperture_ioctl(struct drm_device *dev, void *data, struct drm_i915_gem_get_aperture *args = data; struct i915_vma *vma; u64 pinned; + int ret; - mutex_lock(&ggtt->vm.mutex); + ret = mutex_lock_interruptible(&ggtt->vm.mutex); + if (ret) + return ret; pinned = ggtt->vm.reserved; list_for_each_entry(vma, &ggtt->vm.bound_list, vm_link) @@ -106,8 +120,6 @@ int i915_gem_object_unbind(struct drm_i915_gem_object *obj) LIST_HEAD(still_in_list); int ret = 0; - lockdep_assert_held(&obj->base.dev->struct_mutex); - spin_lock(&obj->vma.lock); while (!ret && (vma = list_first_entry_or_null(&obj->vma.list, struct i915_vma, @@ -115,7 +127,11 @@ int i915_gem_object_unbind(struct drm_i915_gem_object *obj) list_move_tail(&vma->obj_link, &still_in_list); spin_unlock(&obj->vma.lock); - ret = i915_vma_unbind(vma); + ret = mutex_lock_interruptible(&vma->vm->mutex); + if (!ret) { + ret = i915_vma_unbind(vma); + mutex_unlock(&vma->vm->mutex); + } spin_lock(&obj->vma.lock); } @@ -371,10 +387,6 @@ i915_gem_gtt_pread(struct drm_i915_gem_object *obj, u64 remain, offset; int ret; - ret = mutex_lock_interruptible(&i915->drm.struct_mutex); - if (ret) - return ret; - wakeref = intel_runtime_pm_get(i915); vma = i915_gem_object_ggtt_pin(obj, NULL, 0, 0, PIN_MAPPABLE | @@ -383,7 +395,12 @@ i915_gem_gtt_pread(struct drm_i915_gem_object *obj, if (!IS_ERR(vma)) { node.start = i915_ggtt_offset(vma); node.allocated = false; - ret = 
i915_vma_put_fence(vma); + + ret = mutex_lock_interruptible(&vma->vm->mutex); + if (!ret) { + ret = i915_vma_put_fence(vma); + mutex_unlock(&vma->vm->mutex); + } if (ret) { i915_vma_unpin(vma); vma = ERR_PTR(ret); @@ -392,12 +409,10 @@ i915_gem_gtt_pread(struct drm_i915_gem_object *obj, if (IS_ERR(vma)) { ret = insert_mappable_node(ggtt, &node, PAGE_SIZE); if (ret) - goto out_unlock; + goto out_rpm; GEM_BUG_ON(!node.allocated); } - mutex_unlock(&i915->drm.struct_mutex); - ret = i915_gem_object_lock_interruptible(obj); if (ret) goto out_unpin; @@ -453,17 +468,15 @@ i915_gem_gtt_pread(struct drm_i915_gem_object *obj, i915_gem_object_unlock_fence(obj, fence); out_unpin: - mutex_lock(&i915->drm.struct_mutex); if (node.allocated) { wmb(); ggtt->vm.clear_range(&ggtt->vm, node.start, node.size); - remove_mappable_node(&node); + remove_mappable_node(ggtt, &node); } else { i915_vma_unpin(vma); } -out_unlock: +out_rpm: intel_runtime_pm_put(i915, wakeref); - mutex_unlock(&i915->drm.struct_mutex); return ret; } @@ -570,10 +583,6 @@ i915_gem_gtt_pwrite_fast(struct drm_i915_gem_object *obj, void __user *user_data; int ret; - ret = mutex_lock_interruptible(&i915->drm.struct_mutex); - if (ret) - return ret; - if (i915_gem_object_has_struct_page(obj)) { /* * Avoid waking the device up if we can fallback, as @@ -583,10 +592,8 @@ i915_gem_gtt_pwrite_fast(struct drm_i915_gem_object *obj, * using the cache bypass of indirect GGTT access. */ wakeref = intel_runtime_pm_get_if_in_use(i915); - if (!wakeref) { - ret = -EFAULT; - goto out_unlock; - } + if (!wakeref) + return -EFAULT; } else { /* No backing pages, no fallback, we must force GGTT access */ wakeref = intel_runtime_pm_get(i915); @@ -599,7 +606,12 @@ i915_gem_gtt_pwrite_fast(struct drm_i915_gem_object *obj, if (!IS_ERR(vma)) { node.start = i915_ggtt_offset(vma); node.allocated = false; - ret = i915_vma_put_fence(vma); + + ret = mutex_lock_interruptible(&vma->vm->mutex); + if (!ret) { + ret = i915_vma_put_fence(vma); + mutex_unlock(&vma->vm->mutex); + } if (ret) { i915_vma_unpin(vma); vma = ERR_PTR(ret); @@ -612,8 +624,6 @@ i915_gem_gtt_pwrite_fast(struct drm_i915_gem_object *obj, GEM_BUG_ON(!node.allocated); } - mutex_unlock(&i915->drm.struct_mutex); - ret = i915_gem_object_lock_interruptible(obj); if (ret) goto out_unpin; @@ -676,18 +686,15 @@ i915_gem_gtt_pwrite_fast(struct drm_i915_gem_object *obj, i915_gem_object_unlock_fence(obj, fence); out_unpin: - mutex_lock(&i915->drm.struct_mutex); if (node.allocated) { wmb(); ggtt->vm.clear_range(&ggtt->vm, node.start, node.size); - remove_mappable_node(&node); + remove_mappable_node(ggtt, &node); } else { i915_vma_unpin(vma); } out_rpm: intel_runtime_pm_put(i915, wakeref); -out_unlock: - mutex_unlock(&i915->drm.struct_mutex); return ret; } @@ -1032,8 +1039,6 @@ i915_gem_object_ggtt_pin(struct drm_i915_gem_object *obj, struct i915_vma *vma; int ret; - lockdep_assert_held(&obj->base.dev->struct_mutex); - if (flags & PIN_MAPPABLE && (!view || view->type == I915_GGTT_VIEW_NORMAL)) { /* If the required space is larger than the available @@ -1070,14 +1075,28 @@ i915_gem_object_ggtt_pin(struct drm_i915_gem_object *obj, if (IS_ERR(vma)) return vma; + ret = i915_gem_object_pin_pages(obj); + if (ret) + return ERR_PTR(ret); + + ret = mutex_lock_interruptible(&vm->mutex); + if (ret) { + vma = ERR_PTR(ret); + goto unpin; + } + if (i915_vma_misplaced(vma, size, alignment, flags)) { if (flags & PIN_NONBLOCK) { - if (i915_vma_is_pinned(vma) || i915_vma_is_active(vma)) - return ERR_PTR(-ENOSPC); + if (i915_vma_is_pinned(vma) 
|| i915_vma_is_active(vma)) { + vma = ERR_PTR(-ENOSPC); + goto unlock; + } if (flags & PIN_MAPPABLE && - vma->fence_size > dev_priv->ggtt.mappable_end / 2) - return ERR_PTR(-ENOSPC); + vma->fence_size > dev_priv->ggtt.mappable_end / 2) { + vma = ERR_PTR(-ENOSPC); + goto unlock; + } } WARN(i915_vma_is_pinned(vma), @@ -1088,14 +1107,20 @@ i915_gem_object_ggtt_pin(struct drm_i915_gem_object *obj, !!(flags & PIN_MAPPABLE), i915_vma_is_map_and_fenceable(vma)); ret = i915_vma_unbind(vma); - if (ret) - return ERR_PTR(ret); + if (ret) { + vma = ERR_PTR(ret); + goto unlock; + } } - ret = i915_vma_pin(vma, size, alignment, flags | PIN_GLOBAL); + ret = __i915_vma_do_pin(vma, size, alignment, flags | PIN_GLOBAL); if (ret) - return ERR_PTR(ret); + vma = ERR_PTR(ret); +unlock: + mutex_unlock(&vm->mutex); +unpin: + i915_gem_object_unpin_pages(obj); return vma; } @@ -1394,7 +1419,9 @@ static int __intel_engines_record_defaults(struct drm_i915_private *i915) * from the GTT to prevent such accidents and reclaim the * space. */ + mutex_lock(&state->vm->mutex); err = i915_vma_unbind(state); + mutex_unlock(&state->vm->mutex); if (err) goto err_active; diff --git a/drivers/gpu/drm/i915/i915_gem_evict.c b/drivers/gpu/drm/i915/i915_gem_evict.c index a5783c4cb98b..0202f58d8bb5 100644 --- a/drivers/gpu/drm/i915/i915_gem_evict.c +++ b/drivers/gpu/drm/i915/i915_gem_evict.c @@ -48,8 +48,7 @@ static int ggtt_flush(struct drm_i915_private *i915) * bound by their active reference. */ return i915_gem_wait_for_idle(i915, - I915_WAIT_INTERRUPTIBLE | - I915_WAIT_LOCKED, + I915_WAIT_INTERRUPTIBLE, MAX_SCHEDULE_TIMEOUT); } @@ -108,7 +107,7 @@ i915_gem_evict_something(struct i915_address_space *vm, struct i915_vma *active; int ret; - lockdep_assert_held(&vm->i915->drm.struct_mutex); + lockdep_assert_held(&vm->mutex); trace_i915_gem_evict(vm, min_size, alignment, flags); /* @@ -131,15 +130,6 @@ i915_gem_evict_something(struct i915_address_space *vm, min_size, alignment, cache_level, start, end, mode); - /* - * Retire before we search the active list. Although we have - * reasonable accuracy in our retirement lists, we may have - * a stray pin (preventing eviction) that can only be resolved by - * retiring. - */ - if (!(flags & PIN_NONBLOCK)) - i915_retire_requests(dev_priv); - search_again: active = NULL; INIT_LIST_HEAD(&eviction_list); @@ -273,20 +263,12 @@ int i915_gem_evict_for_node(struct i915_address_space *vm, bool check_color; int ret = 0; - lockdep_assert_held(&vm->i915->drm.struct_mutex); + lockdep_assert_held(&vm->mutex); GEM_BUG_ON(!IS_ALIGNED(start, I915_GTT_PAGE_SIZE)); GEM_BUG_ON(!IS_ALIGNED(end, I915_GTT_PAGE_SIZE)); trace_i915_gem_evict_node(vm, target, flags); - /* Retire before we search the active list. Although we have - * reasonable accuracy in our retirement lists, we may have - * a stray pin (preventing eviction) that can only be resolved by - * retiring. - */ - if (!(flags & PIN_NONBLOCK)) - i915_retire_requests(vm->i915); - check_color = vm->mm.color_adjust; if (check_color) { /* Expand search to cover neighbouring guard pages (or lack!) 
*/ @@ -384,7 +366,7 @@ int i915_gem_evict_vm(struct i915_address_space *vm) struct i915_vma *vma, *next; int ret; - lockdep_assert_held(&vm->i915->drm.struct_mutex); + lockdep_assert_held(&vm->mutex); trace_i915_gem_evict_vm(vm); /* Switch back to the default context in order to unpin @@ -399,7 +381,6 @@ int i915_gem_evict_vm(struct i915_address_space *vm) } INIT_LIST_HEAD(&eviction_list); - mutex_lock(&vm->mutex); list_for_each_entry(vma, &vm->bound_list, vm_link) { if (i915_vma_is_pinned(vma)) continue; @@ -407,7 +388,6 @@ int i915_gem_evict_vm(struct i915_address_space *vm) __i915_vma_pin(vma); list_add(&vma->evict_link, &eviction_list); } - mutex_unlock(&vm->mutex); ret = 0; list_for_each_entry_safe(vma, next, &eviction_list, evict_link) { diff --git a/drivers/gpu/drm/i915/i915_gem_fence_reg.c b/drivers/gpu/drm/i915/i915_gem_fence_reg.c index 543c5a47cc79..57f1ca4cc9fb 100644 --- a/drivers/gpu/drm/i915/i915_gem_fence_reg.c +++ b/drivers/gpu/drm/i915/i915_gem_fence_reg.c @@ -339,26 +339,8 @@ static struct i915_fence_reg *fence_find(struct drm_i915_private *i915) return ERR_PTR(-EDEADLK); } -/** - * i915_vma_pin_fence - set up fencing for a vma - * @vma: vma to map through a fence reg - * - * When mapping objects through the GTT, userspace wants to be able to write - * to them without having to worry about swizzling if the object is tiled. - * This function walks the fence regs looking for a free one for @obj, - * stealing one if it can't find any. - * - * It then sets up the reg based on the object's properties: address, pitch - * and tiling format. - * - * For an untiled surface, this removes any existing fence. - * - * Returns: - * - * 0 on success, negative error code on failure. - */ int -i915_vma_pin_fence(struct i915_vma *vma) +__i915_vma_pin_fence(struct i915_vma *vma) { struct i915_ggtt *ggtt = i915_vm_to_ggtt(vma->vm); struct i915_fence_reg *fence; @@ -370,12 +352,9 @@ i915_vma_pin_fence(struct i915_vma *vma) * must keep the device awake whilst using the fence. */ assert_rpm_wakelock_held(ggtt->vm.i915); + lockdep_assert_held(&ggtt->vm.mutex); GEM_BUG_ON(!i915_vma_is_pinned(vma)); - err = mutex_lock_interruptible(&ggtt->vm.mutex); - if (err) - return err; - /* Just update our place in the LRU if our fence is getting reused. */ if (vma->fence) { fence = vma->fence; @@ -383,19 +362,17 @@ i915_vma_pin_fence(struct i915_vma *vma) atomic_inc(&fence->pin_count); if (!fence->dirty) { list_move_tail(&fence->link, &ggtt->fence_list); - goto unlock; + return 0; } } else if (set) { fence = fence_find(vma->vm->i915); - if (IS_ERR(fence)) { - err = PTR_ERR(fence); - goto unlock; - } + if (IS_ERR(fence)) + return PTR_ERR(fence); GEM_BUG_ON(atomic_read(&fence->pin_count)); atomic_inc(&fence->pin_count); } else { - goto unlock; + return 0; } err = fence_update(fence, set); @@ -406,12 +383,44 @@ i915_vma_pin_fence(struct i915_vma *vma) GEM_BUG_ON(vma->fence != (set ? fence : NULL)); if (set) - goto unlock; + return 0; out_unpin: atomic_dec(&fence->pin_count); -unlock: + return err; +} + +/** + * i915_vma_pin_fence - set up fencing for a vma + * @vma: vma to map through a fence reg + * + * When mapping objects through the GTT, userspace wants to be able to write + * to them without having to worry about swizzling if the object is tiled. + * This function walks the fence regs looking for a free one for @obj, + * stealing one if it can't find any. + * + * It then sets up the reg based on the object's properties: address, pitch + * and tiling format. 
+ * + * For an untiled surface, this removes any existing fence. + * + * Returns: + * + * 0 on success, negative error code on failure. + */ +int +i915_vma_pin_fence(struct i915_vma *vma) +{ + struct i915_ggtt *ggtt = i915_vm_to_ggtt(vma->vm); + int err; + + err = mutex_lock_interruptible(&ggtt->vm.mutex); + if (err) + return err; + + err = __i915_vma_pin_fence(vma); mutex_unlock(&ggtt->vm.mutex); + return err; } diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c index bfca8a4a88e2..f4dbe4199120 100644 --- a/drivers/gpu/drm/i915/i915_gem_gtt.c +++ b/drivers/gpu/drm/i915/i915_gem_gtt.c @@ -1929,62 +1929,15 @@ static void gen6_ppgtt_free_pd(struct gen6_ppgtt *ppgtt) free_pt(&ppgtt->base.vm, pt); } -struct gen6_ppgtt_cleanup_work { - struct work_struct base; - struct i915_vma *vma; -}; - -static void gen6_ppgtt_cleanup_work(struct work_struct *wrk) -{ - struct gen6_ppgtt_cleanup_work *work = - container_of(wrk, typeof(*work), base); - /* Side note, vma->vm is the GGTT not the ppgtt we just destroyed! */ - struct drm_i915_private *i915 = work->vma->vm->i915; - - mutex_lock(&i915->drm.struct_mutex); - i915_vma_destroy(work->vma); - mutex_unlock(&i915->drm.struct_mutex); - - kfree(work); -} - -static int nop_set_pages(struct i915_vma *vma) -{ - return -ENODEV; -} - -static void nop_clear_pages(struct i915_vma *vma) -{ -} - -static int nop_bind(struct i915_vma *vma, - enum i915_cache_level cache_level, - u32 unused) -{ - return -ENODEV; -} - -static void nop_unbind(struct i915_vma *vma) -{ -} - -static const struct i915_vma_ops nop_vma_ops = { - .set_pages = nop_set_pages, - .clear_pages = nop_clear_pages, - .bind_vma = nop_bind, - .unbind_vma = nop_unbind, -}; - static void gen6_ppgtt_cleanup(struct i915_address_space *vm) { struct gen6_ppgtt *ppgtt = to_gen6_ppgtt(i915_vm_to_ppgtt(vm)); - struct gen6_ppgtt_cleanup_work *work = ppgtt->work; + struct i915_address_space *ggtt = ppgtt->vma->vm; - /* FIXME remove the struct_mutex to bring the locking under control */ - INIT_WORK(&work->base, gen6_ppgtt_cleanup_work); - work->vma = ppgtt->vma; - work->vma->ops = &nop_vma_ops; - schedule_work(&work->base); + mutex_lock(&ggtt->mutex); + GEM_BUG_ON(ppgtt->vma->vm != ggtt); + i915_vma_destroy(ppgtt->vma); + mutex_unlock(&ggtt->mutex); gen6_ppgtt_free_pd(ppgtt); gen6_ppgtt_free_scratch(vm); @@ -2159,15 +2112,9 @@ static struct i915_ppgtt *gen6_ppgtt_create(struct drm_i915_private *i915) ppgtt->base.vm.pte_encode = ggtt->vm.pte_encode; - ppgtt->work = kmalloc(sizeof(*ppgtt->work), GFP_KERNEL); - if (!ppgtt->work) { - err = -ENOMEM; - goto err_free; - } - err = gen6_ppgtt_init_scratch(ppgtt); if (err) - goto err_work; + goto err_free; ppgtt->vma = pd_vma_create(ppgtt, GEN6_PD_SIZE); if (IS_ERR(ppgtt->vma)) { @@ -2179,8 +2126,6 @@ static struct i915_ppgtt *gen6_ppgtt_create(struct drm_i915_private *i915) err_scratch: gen6_ppgtt_free_scratch(&ppgtt->base.vm); -err_work: - kfree(ppgtt->work); err_free: kfree(ppgtt); return ERR_PTR(err); @@ -2272,7 +2217,9 @@ void i915_vm_release(struct kref *kref) GEM_BUG_ON(i915_is_ggtt(vm)); trace_i915_ppgtt_release(vm); + mutex_lock(&vm->mutex); ppgtt_destroy_vma(vm); + mutex_unlock(&vm->mutex); GEM_BUG_ON(!list_empty(&vm->bound_list)); @@ -2944,11 +2891,12 @@ void i915_ggtt_cleanup_hw(struct drm_i915_private *dev_priv) ggtt->vm.closed = true; - mutex_lock(&dev_priv->drm.struct_mutex); fini_aliasing_ppgtt(dev_priv); + mutex_lock(&ggtt->vm.mutex); list_for_each_entry_safe(vma, vn, &ggtt->vm.bound_list, vm_link) 
WARN_ON(i915_vma_unbind(vma)); + mutex_unlock(&ggtt->vm.mutex); if (drm_mm_node_allocated(&ggtt->error_capture)) drm_mm_remove_node(&ggtt->error_capture); @@ -2968,8 +2916,6 @@ void i915_ggtt_cleanup_hw(struct drm_i915_private *dev_priv) __pagevec_release(pvec); } - mutex_unlock(&dev_priv->drm.struct_mutex); - arch_phys_wc_del(ggtt->mtrr); io_mapping_fini(&ggtt->iomap); @@ -3572,7 +3518,6 @@ int i915_ggtt_init_hw(struct drm_i915_private *dev_priv) * beyond the end of the batch buffer, across the page boundary, * and beyond the end of the GTT if we do not provide a guard. */ - mutex_lock(&dev_priv->drm.struct_mutex); i915_address_space_init(&ggtt->vm, VM_CLASS_GGTT); ggtt->vm.is_ggtt = true; @@ -3582,7 +3527,6 @@ int i915_ggtt_init_hw(struct drm_i915_private *dev_priv) if (!HAS_LLC(dev_priv) && !HAS_PPGTT(dev_priv)) ggtt->vm.mm.color_adjust = i915_gtt_color_adjust; - mutex_unlock(&dev_priv->drm.struct_mutex); if (!io_mapping_init_wc(&dev_priv->ggtt.iomap, dev_priv->ggtt.gmadr.start, @@ -3661,10 +3605,8 @@ void i915_gem_restore_gtt_mappings(struct drm_i915_private *dev_priv) if (!(vma->flags & I915_VMA_GLOBAL_BIND)) continue; - mutex_unlock(&ggtt->vm.mutex); - if (!i915_vma_unbind(vma)) - goto lock; + continue; WARN_ON(i915_vma_bind(vma, obj ? obj->cache_level : 0, @@ -3674,9 +3616,6 @@ void i915_gem_restore_gtt_mappings(struct drm_i915_private *dev_priv) WARN_ON(i915_gem_object_set_to_gtt_domain(obj, false)); i915_gem_object_unlock(obj); } - -lock: - mutex_lock(&ggtt->vm.mutex); } ggtt->vm.closed = false; @@ -4073,7 +4012,7 @@ int i915_gem_gtt_insert(struct i915_address_space *vm, u64 offset; int err; - lockdep_assert_held(&vm->i915->drm.struct_mutex); + lockdep_assert_held(&vm->mutex); GEM_BUG_ON(!size); GEM_BUG_ON(!IS_ALIGNED(size, I915_GTT_PAGE_SIZE)); GEM_BUG_ON(alignment && !is_power_of_2(alignment)); diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.h b/drivers/gpu/drm/i915/i915_gem_gtt.h index 0e9926b32408..806ba2816a69 100644 --- a/drivers/gpu/drm/i915/i915_gem_gtt.h +++ b/drivers/gpu/drm/i915/i915_gem_gtt.h @@ -439,8 +439,6 @@ struct gen6_ppgtt { unsigned int pin_count; bool scan_for_unused_pt; - - struct gen6_ppgtt_cleanup_work *work; }; #define __to_gen6_ppgtt(base) container_of(base, struct gen6_ppgtt, base) @@ -694,7 +692,6 @@ int i915_gem_gtt_insert(struct i915_address_space *vm, #define PIN_OFFSET_BIAS BIT_ULL(6) #define PIN_OFFSET_FIXED BIT_ULL(7) -#define PIN_MBZ BIT_ULL(8) /* I915_VMA_PIN_OVERFLOW */ #define PIN_GLOBAL BIT_ULL(9) /* I915_VMA_GLOBAL_BIND */ #define PIN_USER BIT_ULL(10) /* I915_VMA_LOCAL_BIND */ #define PIN_UPDATE BIT_ULL(11) diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c index 2e33a9b4eae7..517c3a16f1cd 100644 --- a/drivers/gpu/drm/i915/i915_perf.c +++ b/drivers/gpu/drm/i915/i915_perf.c @@ -1347,13 +1347,9 @@ static void oa_put_render_ctx_id(struct i915_perf_stream *stream) static void free_oa_buffer(struct drm_i915_private *i915) { - mutex_lock(&i915->drm.struct_mutex); - i915_vma_unpin_and_release(&i915->perf.oa.oa_buffer.vma, I915_VMA_RELEASE_MAP); - mutex_unlock(&i915->drm.struct_mutex); - i915->perf.oa.oa_buffer.vaddr = NULL; } @@ -1505,19 +1501,12 @@ static int alloc_oa_buffer(struct drm_i915_private *dev_priv) if (WARN_ON(dev_priv->perf.oa.oa_buffer.vma)) return -ENODEV; - ret = i915_mutex_lock_interruptible(&dev_priv->drm); - if (ret) - return ret; - BUILD_BUG_ON_NOT_POWER_OF_2(OA_BUFFER_SIZE); BUILD_BUG_ON(OA_BUFFER_SIZE < SZ_128K || OA_BUFFER_SIZE > SZ_16M); bo = i915_gem_object_create_shmem(dev_priv, 
OA_BUFFER_SIZE); - if (IS_ERR(bo)) { - DRM_ERROR("Failed to allocate OA buffer\n"); - ret = PTR_ERR(bo); - goto unlock; - } + if (IS_ERR(bo)) + return PTR_ERR(bo); i915_gem_object_set_cache_coherency(bo, I915_CACHE_LLC); @@ -1539,8 +1528,7 @@ static int alloc_oa_buffer(struct drm_i915_private *dev_priv) DRM_DEBUG_DRIVER("OA Buffer initialized, gtt offset = 0x%x, vaddr = %p\n", i915_ggtt_offset(dev_priv->perf.oa.oa_buffer.vma), dev_priv->perf.oa.oa_buffer.vaddr); - - goto unlock; + return 0; err_unpin: __i915_vma_unpin(vma); @@ -1551,8 +1539,6 @@ static int alloc_oa_buffer(struct drm_i915_private *dev_priv) dev_priv->perf.oa.oa_buffer.vaddr = NULL; dev_priv->perf.oa.oa_buffer.vma = NULL; -unlock: - mutex_unlock(&dev_priv->drm.struct_mutex); return ret; } diff --git a/drivers/gpu/drm/i915/i915_vma.c b/drivers/gpu/drm/i915/i915_vma.c index 89079630d0af..68b45d3bba9b 100644 --- a/drivers/gpu/drm/i915/i915_vma.c +++ b/drivers/gpu/drm/i915/i915_vma.c @@ -314,8 +314,6 @@ vma_lookup(struct drm_i915_gem_object *obj, * Once created, the VMA is kept until either the object is freed, or the * address space is closed. * - * Must be called with struct_mutex held. - * * Returns the vma, or an error pointer. */ struct i915_vma * @@ -448,7 +446,8 @@ void __iomem *i915_vma_pin_iomap(struct i915_vma *vma) /* Access through the GTT requires the device to be awake. */ assert_rpm_wakelock_held(vma->vm->i915); - lockdep_assert_held(&vma->vm->i915->drm.struct_mutex); + mutex_lock(&vma->vm->mutex); + if (WARN_ON(!i915_vma_is_map_and_fenceable(vma))) { err = -ENODEV; goto err; @@ -472,16 +471,19 @@ void __iomem *i915_vma_pin_iomap(struct i915_vma *vma) __i915_vma_pin(vma); - err = i915_vma_pin_fence(vma); + err = __i915_vma_pin_fence(vma); if (err) goto err_unpin; i915_vma_set_ggtt_write(vma); + mutex_unlock(&vma->vm->mutex); + return ptr; err_unpin: __i915_vma_unpin(vma); err: + mutex_unlock(&vma->vm->mutex); return IO_ERR_PTR(err); } @@ -497,8 +499,6 @@ void i915_vma_flush_writes(struct i915_vma *vma) void i915_vma_unpin_iomap(struct i915_vma *vma) { - lockdep_assert_held(&vma->vm->i915->drm.struct_mutex); - GEM_BUG_ON(vma->iomap == NULL); i915_vma_flush_writes(vma); @@ -680,10 +680,7 @@ i915_vma_insert(struct i915_vma *vma, u64 size, u64 alignment, u64 flags) } if (vma->obj) { - ret = i915_gem_object_pin_pages_async(vma->obj); - if (ret) - return ret; - + __i915_gem_object_pin_pages(vma->obj); cache_level = vma->obj->cache_level; } else { cache_level = 0; @@ -748,9 +745,7 @@ i915_vma_insert(struct i915_vma *vma, u64 size, u64 alignment, u64 flags) GEM_BUG_ON(!drm_mm_node_allocated(&vma->node)); GEM_BUG_ON(!i915_gem_valid_gtt_space(vma, cache_level)); - mutex_lock(&vma->vm->mutex); list_add_tail(&vma->vm_link, &vma->vm->bound_list); - mutex_unlock(&vma->vm->mutex); if (vma->obj) { atomic_inc(&vma->obj->bind_count); @@ -773,10 +768,8 @@ i915_vma_remove(struct i915_vma *vma) vma->ops->clear_pages(vma); - mutex_lock(&vma->vm->mutex); drm_mm_remove_node(&vma->node); list_del(&vma->vm_link); - mutex_unlock(&vma->vm->mutex); /* * Since the unbound list is global, only move to that list if @@ -797,25 +790,20 @@ i915_vma_remove(struct i915_vma *vma) } } -int __i915_vma_do_pin(struct i915_vma *vma, - u64 size, u64 alignment, u64 flags) +int __i915_vma_do_pin(struct i915_vma *vma, u64 size, u64 alignment, u64 flags) { const unsigned int bound = vma->flags; int ret; - lockdep_assert_held(&vma->vm->i915->drm.struct_mutex); - GEM_BUG_ON((flags & (PIN_GLOBAL | PIN_USER)) == 0); - GEM_BUG_ON((flags & PIN_GLOBAL) && 
!i915_vma_is_ggtt(vma)); + lockdep_assert_held(&vma->vm->mutex); - if (WARN_ON(bound & I915_VMA_PIN_OVERFLOW)) { - ret = -EBUSY; - goto err_unpin; - } + if (atomic_read(&vma->pin_count)) + goto out; if ((bound & I915_VMA_BIND_MASK) == 0) { ret = i915_vma_insert(vma, size, alignment, flags); if (ret) - goto err_unpin; + return ret; } GEM_BUG_ON(!drm_mm_node_allocated(&vma->node)); @@ -829,6 +817,9 @@ int __i915_vma_do_pin(struct i915_vma *vma, __i915_vma_set_map_and_fenceable(vma); GEM_BUG_ON(i915_vma_misplaced(vma, size, alignment, flags)); + +out: + atomic_inc(&vma->pin_count); return 0; err_remove: @@ -837,11 +828,34 @@ int __i915_vma_do_pin(struct i915_vma *vma, GEM_BUG_ON(vma->pages); GEM_BUG_ON(vma->flags & I915_VMA_BIND_MASK); } -err_unpin: - __i915_vma_unpin(vma); return ret; } +int i915_vma_do_pin(struct i915_vma *vma, u64 size, u64 alignment, u64 flags) +{ + int err; + + GEM_BUG_ON((flags & (PIN_GLOBAL | PIN_USER)) == 0); + GEM_BUG_ON((flags & PIN_GLOBAL) && !i915_vma_is_ggtt(vma)); + + if (vma->obj) { + err = i915_gem_object_pin_pages_async(vma->obj); + if (err) + return err; + } + + err = mutex_lock_interruptible(&vma->vm->mutex); + if (!err) { + err = __i915_vma_do_pin(vma, size, alignment, flags); + mutex_unlock(&vma->vm->mutex); + } + + if (vma->obj) + i915_gem_object_unpin_pages(vma->obj); + + return err; +} + void i915_vma_close(struct i915_vma *vma) { struct drm_i915_private *i915 = vma->vm->i915; @@ -904,7 +918,7 @@ static void __i915_vma_destroy(struct i915_vma *vma) void i915_vma_destroy(struct i915_vma *vma) { - lockdep_assert_held(&vma->vm->i915->drm.struct_mutex); + lockdep_assert_held(&vma->vm->mutex); GEM_BUG_ON(i915_vma_is_pinned(vma)); @@ -922,10 +936,14 @@ void i915_vma_parked(struct drm_i915_private *i915) spin_lock_irq(&i915->gt.closed_lock); list_for_each_entry_safe(vma, next, &i915->gt.closed_vma, closed_link) { + struct i915_address_space *vm = vma->vm; + list_del_init(&vma->closed_link); spin_unlock_irq(&i915->gt.closed_lock); + mutex_lock(&vm->mutex); i915_vma_destroy(vma); + mutex_unlock(&vm->mutex); spin_lock_irq(&i915->gt.closed_lock); } @@ -948,7 +966,8 @@ void i915_vma_revoke_mmap(struct i915_vma *vma) struct drm_vma_offset_node *node = &vma->obj->base.vma_node; u64 vma_offset; - lockdep_assert_held(&vma->vm->i915->drm.struct_mutex); + lockdep_assert_held(&vma->vm->mutex); + GEM_BUG_ON(!i915_vma_is_ggtt(vma)); if (!i915_vma_has_userfault(vma)) return; @@ -1031,7 +1050,7 @@ int i915_vma_unbind(struct i915_vma *vma) { int ret; - lockdep_assert_held(&vma->vm->i915->drm.struct_mutex); + lockdep_assert_held(&vma->vm->mutex); if (vma->async.dma && dma_fence_wait_timeout(vma->async.dma, true, diff --git a/drivers/gpu/drm/i915/i915_vma.h b/drivers/gpu/drm/i915/i915_vma.h index 67e43f5d01f6..675472c44a4b 100644 --- a/drivers/gpu/drm/i915/i915_vma.h +++ b/drivers/gpu/drm/i915/i915_vma.h @@ -72,38 +72,14 @@ struct i915_vma { * that exist in the ctx->handle_vmas LUT for this vma. */ atomic_t open_count; + atomic_t pin_count; unsigned long flags; - /** - * How many users have pinned this object in GTT space. - * - * This is a tightly bound, fairly small number of users, so we - * stuff inside the flags field so that we can both check for overflow - * and detect a no-op i915_vma_pin() in a single check, while also - * pinning the vma. - * - * The worst case display setup would have the same vma pinned for - * use on each plane on each crtc, while also building the next atomic - * state and holding a pin for the length of the cleanup queue. 
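/*
 * Illustrative sketch of the pin scheme that replaces the flags-based
 * counter documented in the comment being removed here: pin_count is now
 * a separate atomic_t, so an already-bound vma is pinned without taking
 * any lock, and only the slow path (insert and bind) enters vm->mutex
 * via i915_vma_do_pin(). The wrapper name is invented; the calls mirror
 * i915_vma_pin() as reworked by this patch.
 */
static int example_pin(struct i915_vma *vma, u64 size, u64 align, u64 flags)
{
	/* Fast path: succeeds only if the vma already holds a pin. */
	if (atomic_add_unless(&vma->pin_count, 1, 0))
		return 0;

	/* Slow path: may insert, bind and take vma->vm->mutex. */
	return i915_vma_do_pin(vma, size, align, flags);
}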
In the - * future, the flip queue may be increased from 1. - * Estimated worst case: 3 [qlen] * 4 [max crtcs] * 7 [max planes] = 84 - * - * For GEM, the number of concurrent users for pwrite/pread is - * unbounded. For execbuffer, it is currently one but will in future - * be extended to allow multiple clients to pin vma concurrently. - * - * We also use suballocated pages, with each suballocation claiming - * its own pin on the shared vma. At present, this is limited to - * exclusive cachelines of a single page, so a maximum of 64 possible - * users. - */ -#define I915_VMA_PIN_MASK 0xff -#define I915_VMA_PIN_OVERFLOW BIT(8) /** Flags and address space this VMA is bound to */ #define I915_VMA_GLOBAL_BIND BIT(9) #define I915_VMA_LOCAL_BIND BIT(10) -#define I915_VMA_BIND_MASK (I915_VMA_GLOBAL_BIND | I915_VMA_LOCAL_BIND | I915_VMA_PIN_OVERFLOW) -#define I915_VMA_ALLOC_BIND I915_VMA_PIN_OVERFLOW /* not stored */ +#define I915_VMA_BIND_MASK (I915_VMA_GLOBAL_BIND | I915_VMA_LOCAL_BIND) +#define I915_VMA_ALLOC_BIND BIT(0) /* not stored */ #define I915_VMA_GGTT BIT(11) #define I915_VMA_CAN_FENCE BIT(12) @@ -324,30 +300,27 @@ static inline void i915_vma_unlock(struct i915_vma *vma) reservation_object_unlock(vma->resv); } -int __i915_vma_do_pin(struct i915_vma *vma, - u64 size, u64 alignment, u64 flags); +int __i915_vma_do_pin(struct i915_vma *vma, u64 size, u64 alignment, u64 flags); +int i915_vma_do_pin(struct i915_vma *vma, u64 size, u64 alignment, u64 flags); + static inline int __must_check i915_vma_pin(struct i915_vma *vma, u64 size, u64 alignment, u64 flags) { - BUILD_BUG_ON(PIN_MBZ != I915_VMA_PIN_OVERFLOW); BUILD_BUG_ON(PIN_GLOBAL != I915_VMA_GLOBAL_BIND); BUILD_BUG_ON(PIN_USER != I915_VMA_LOCAL_BIND); - /* Pin early to prevent the shrinker/eviction logic from destroying - * our vma as we insert and bind. - */ - if (likely(((++vma->flags ^ flags) & I915_VMA_BIND_MASK) == 0)) { + if (atomic_add_unless(&vma->pin_count, 1, 0)) { GEM_BUG_ON(!drm_mm_node_allocated(&vma->node)); GEM_BUG_ON(i915_vma_misplaced(vma, size, alignment, flags)); return 0; } - return __i915_vma_do_pin(vma, size, alignment, flags); + return i915_vma_do_pin(vma, size, alignment, flags); } static inline int i915_vma_pin_count(const struct i915_vma *vma) { - return vma->flags & I915_VMA_PIN_MASK; + return atomic_read(&vma->pin_count); } static inline bool i915_vma_is_pinned(const struct i915_vma *vma) @@ -357,19 +330,18 @@ static inline bool i915_vma_is_pinned(const struct i915_vma *vma) static inline void __i915_vma_pin(struct i915_vma *vma) { - vma->flags++; - GEM_BUG_ON(vma->flags & I915_VMA_PIN_OVERFLOW); + atomic_inc(&vma->pin_count); } static inline void __i915_vma_unpin(struct i915_vma *vma) { - vma->flags--; + atomic_dec(&vma->pin_count); } static inline void i915_vma_unpin(struct i915_vma *vma) { - GEM_BUG_ON(!i915_vma_is_pinned(vma)); GEM_BUG_ON(!drm_mm_node_allocated(&vma->node)); + GEM_BUG_ON(!atomic_read(&vma->pin_count)); __i915_vma_unpin(vma); } @@ -388,8 +360,6 @@ static inline bool i915_vma_is_bound(const struct i915_vma *vma, * the caller must call i915_vma_unpin_iomap to relinquish the pinning * after the iomapping is no longer required. * - * Callers must hold the struct_mutex. - * * Returns a valid iomapped pointer or ERR_PTR. */ void __iomem *i915_vma_pin_iomap(struct i915_vma *vma); @@ -401,8 +371,8 @@ void __iomem *i915_vma_pin_iomap(struct i915_vma *vma); * * Unpins the previously iomapped VMA from i915_vma_pin_iomap(). * - * Callers must hold the struct_mutex. 
This function is only valid to be - * called on a VMA previously iomapped by the caller with i915_vma_pin_iomap(). + * This function is only valid to be called on a VMA previously iomapped + * by the caller with i915_vma_pin_iomap(). */ void i915_vma_unpin_iomap(struct i915_vma *vma); @@ -412,6 +382,8 @@ static inline struct page *i915_vma_first_page(struct i915_vma *vma) return sg_page(vma->pages->sgl); } +int __i915_vma_pin_fence(struct i915_vma *vma); + /** * i915_vma_pin_fence - pin fencing state * @vma: vma to pin fencing for @@ -447,7 +419,6 @@ static inline void __i915_vma_unpin_fence(struct i915_vma *vma) static inline void i915_vma_unpin_fence(struct i915_vma *vma) { - /* lockdep_assert_held(&vma->vm->i915->drm.struct_mutex); */ if (vma->fence) __i915_vma_unpin_fence(vma); } diff --git a/drivers/gpu/drm/i915/intel_display.c b/drivers/gpu/drm/i915/intel_display.c index 3a3298812ad8..aa94f4499dfe 100644 --- a/drivers/gpu/drm/i915/intel_display.c +++ b/drivers/gpu/drm/i915/intel_display.c @@ -2093,8 +2093,6 @@ intel_pin_and_fence_fb_obj(struct drm_framebuffer *fb, unsigned int pinctl; u32 alignment; - WARN_ON(!mutex_is_locked(&dev->struct_mutex)); - alignment = intel_surf_alignment(fb, 0); /* Note that the w/a also requires 64 PTE of padding following the @@ -2175,13 +2173,9 @@ intel_pin_and_fence_fb_obj(struct drm_framebuffer *fb, void intel_unpin_fb_vma(struct i915_vma *vma, unsigned long flags) { - lockdep_assert_held(&vma->vm->i915->drm.struct_mutex); - - i915_gem_object_lock(vma->obj); if (flags & PLANE_HAS_FENCE) i915_vma_unpin_fence(vma); i915_gem_object_unpin_from_display_plane(vma); - i915_gem_object_unlock(vma->obj); i915_vma_put(vma); } @@ -3077,12 +3071,10 @@ intel_alloc_initial_plane_obj(struct intel_crtc *crtc, return false; } - mutex_lock(&dev->struct_mutex); obj = i915_gem_object_create_stolen_for_preallocated(dev_priv, base_aligned, base_aligned, size_aligned); - mutex_unlock(&dev->struct_mutex); if (!obj) return false; @@ -3244,13 +3236,11 @@ intel_find_initial_plane_obj(struct intel_crtc *intel_crtc, intel_state->color_plane[0].stride = intel_fb_pitch(fb, 0, intel_state->base.rotation); - mutex_lock(&dev->struct_mutex); intel_state->vma = intel_pin_and_fence_fb_obj(fb, &intel_state->view, intel_plane_uses_fence(intel_state), &intel_state->flags); - mutex_unlock(&dev->struct_mutex); if (IS_ERR(intel_state->vma)) { DRM_ERROR("failed to pin boot fb on pipe %d: %li\n", intel_crtc->pipe, PTR_ERR(intel_state->vma)); @@ -14152,8 +14142,6 @@ static void fb_obj_bump_render_priority(struct drm_i915_gem_object *obj) * bits. Some older platforms need special physical address handling for * cursor planes. * - * Must be called with struct_mutex held. - * * Returns 0 on success, negative error code on failure. */ int @@ -14210,15 +14198,8 @@ intel_prepare_plane_fb(struct drm_plane *plane, if (ret) return ret; - ret = mutex_lock_interruptible(&dev_priv->drm.struct_mutex); - if (ret) { - i915_gem_object_unpin_pages(obj); - return ret; - } - ret = intel_plane_pin_fb(to_intel_plane_state(new_state)); - mutex_unlock(&dev_priv->drm.struct_mutex); i915_gem_object_unpin_pages(obj); if (ret) return ret; @@ -14267,8 +14248,6 @@ intel_prepare_plane_fb(struct drm_plane *plane, * @old_state: the state from the previous modeset * * Cleans up a framebuffer that has just been removed from a plane. - * - * Must be called with struct_mutex held. 
  */
 void
 intel_cleanup_plane_fb(struct drm_plane *plane,
@@ -14284,9 +14263,7 @@ intel_cleanup_plane_fb(struct drm_plane *plane,
 	}
 
 	/* Should only be called after a successful intel_prepare_plane_fb()! */
-	mutex_lock(&dev_priv->drm.struct_mutex);
 	intel_plane_unpin_fb(to_intel_plane_state(old_state));
-	mutex_unlock(&dev_priv->drm.struct_mutex);
 }
 
 int
@@ -14490,7 +14467,6 @@ intel_legacy_cursor_update(struct drm_plane *plane,
 			   u32 src_w, u32 src_h,
 			   struct drm_modeset_acquire_ctx *ctx)
 {
-	struct drm_i915_private *dev_priv = to_i915(crtc->dev);
 	struct drm_plane_state *old_plane_state, *new_plane_state;
 	struct intel_plane *intel_plane = to_intel_plane(plane);
 	struct intel_crtc_state *crtc_state =
@@ -14556,13 +14532,9 @@ intel_legacy_cursor_update(struct drm_plane *plane,
 	if (ret)
 		goto out_free;
 
-	ret = mutex_lock_interruptible(&dev_priv->drm.struct_mutex);
-	if (ret)
-		goto out_free;
-
 	ret = intel_plane_pin_fb(to_intel_plane_state(new_plane_state));
 	if (ret)
-		goto out_unlock;
+		goto out_free;
 
 	intel_frontbuffer_flush(to_intel_frontbuffer(fb), ORIGIN_FLIP);
 	intel_frontbuffer_track(to_intel_frontbuffer(old_plane_state->fb),
@@ -14592,8 +14564,6 @@ intel_legacy_cursor_update(struct drm_plane *plane,
 
 	intel_plane_unpin_fb(to_intel_plane_state(old_plane_state));
 
-out_unlock:
-	mutex_unlock(&dev_priv->drm.struct_mutex);
 out_free:
 	if (new_crtc_state)
 		intel_crtc_destroy_state(crtc, &new_crtc_state->base);
diff --git a/drivers/gpu/drm/i915/intel_fbdev.c b/drivers/gpu/drm/i915/intel_fbdev.c
index c7c11d1842af..1464157d0ffe 100644
--- a/drivers/gpu/drm/i915/intel_fbdev.c
+++ b/drivers/gpu/drm/i915/intel_fbdev.c
@@ -212,7 +212,6 @@ static int intelfb_create(struct drm_fb_helper *helper,
 		sizes->fb_height = intel_fb->base.height;
 	}
 
-	mutex_lock(&dev->struct_mutex);
 	wakeref = intel_runtime_pm_get(dev_priv);
 
 	/* Pin the GGTT vma for our access via info->screen_base.
@@ -273,7 +272,6 @@ static int intelfb_create(struct drm_fb_helper *helper,
 	ifbdev->vma_flags = flags;
 
 	intel_runtime_pm_put(dev_priv, wakeref);
-	mutex_unlock(&dev->struct_mutex);
 	vga_switcheroo_client_fb_set(pdev, info);
 	return 0;
 
@@ -281,7 +279,6 @@ static int intelfb_create(struct drm_fb_helper *helper,
 	intel_unpin_fb_vma(vma, flags);
 out_unlock:
 	intel_runtime_pm_put(dev_priv, wakeref);
-	mutex_unlock(&dev->struct_mutex);
 	return ret;
 }
 
@@ -298,11 +295,8 @@ static void intel_fbdev_destroy(struct intel_fbdev *ifbdev)
 
 	drm_fb_helper_fini(&ifbdev->helper);
 
-	if (ifbdev->vma) {
-		mutex_lock(&ifbdev->helper.dev->struct_mutex);
+	if (ifbdev->vma)
 		intel_unpin_fb_vma(ifbdev->vma, ifbdev->vma_flags);
-		mutex_unlock(&ifbdev->helper.dev->struct_mutex);
-	}
 
 	if (ifbdev->fb)
 		drm_framebuffer_remove(&ifbdev->fb->base);
diff --git a/drivers/gpu/drm/i915/intel_guc.c b/drivers/gpu/drm/i915/intel_guc.c
index 43232242d167..4a0e7691090f 100644
--- a/drivers/gpu/drm/i915/intel_guc.c
+++ b/drivers/gpu/drm/i915/intel_guc.c
@@ -673,6 +673,7 @@ struct i915_vma *intel_guc_allocate_vma(struct intel_guc *guc, u32 size)
 		goto err;
 
 	flags = PIN_GLOBAL | PIN_OFFSET_BIAS | i915_ggtt_pin_bias(vma);
+
 	ret = i915_vma_pin(vma, 0, 0, flags);
 	if (ret) {
 		vma = ERR_PTR(ret);
diff --git a/drivers/gpu/drm/i915/intel_overlay.c b/drivers/gpu/drm/i915/intel_overlay.c
index dc7b66c94f74..cec765ca3614 100644
--- a/drivers/gpu/drm/i915/intel_overlay.c
+++ b/drivers/gpu/drm/i915/intel_overlay.c
@@ -752,7 +752,6 @@ static int intel_overlay_do_put_image(struct intel_overlay *overlay,
 	struct i915_vma *vma;
 	int ret, tmp_width;
 
-	lockdep_assert_held(&dev_priv->drm.struct_mutex);
 	WARN_ON(!drm_modeset_is_locked(&dev_priv->drm.mode_config.connection_mutex));
 
 	ret = intel_overlay_release_old_vid(overlay);
@@ -1308,15 +1307,11 @@ static int get_registers(struct intel_overlay *overlay, bool use_phys)
 	struct i915_vma *vma;
 	int err;
 
-	mutex_lock(&i915->drm.struct_mutex);
-
 	obj = i915_gem_object_create_stolen(i915, PAGE_SIZE);
 	if (obj == NULL)
 		obj = i915_gem_object_create_internal(i915, PAGE_SIZE);
-	if (IS_ERR(obj)) {
-		err = PTR_ERR(obj);
-		goto err_unlock;
-	}
+	if (IS_ERR(obj))
+		return PTR_ERR(obj);
 
 	vma = i915_gem_object_ggtt_pin(obj, NULL, 0, 0, PIN_MAPPABLE);
 	if (IS_ERR(vma)) {
@@ -1337,13 +1332,10 @@ static int get_registers(struct intel_overlay *overlay, bool use_phys)
 	}
 
 	overlay->reg_bo = obj;
-	mutex_unlock(&i915->drm.struct_mutex);
 	return 0;
 
 err_put_bo:
 	i915_gem_object_put(obj);
-err_unlock:
-	mutex_unlock(&i915->drm.struct_mutex);
 	return err;
 }
 
diff --git a/drivers/gpu/drm/i915/selftests/i915_vma.c b/drivers/gpu/drm/i915/selftests/i915_vma.c
index 615ac485c731..eeb0450c3f84 100644
--- a/drivers/gpu/drm/i915/selftests/i915_vma.c
+++ b/drivers/gpu/drm/i915/selftests/i915_vma.c
@@ -337,7 +337,9 @@ static int igt_vma_pin1(void *arg)
 
 		if (!err) {
 			i915_vma_unpin(vma);
+			mutex_lock(&ggtt->vm.mutex);
			err = i915_vma_unbind(vma);
+			mutex_unlock(&ggtt->vm.mutex);
			if (err) {
				pr_err("Failed to unbind single page from GGTT, err=%d\n", err);
				goto out;
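
The i915_vma.h hunks above replace the pin count that was packed into vma->flags with a dedicated atomic counter, so the common pin path no longer depends on struct_mutex: the fast path only bumps the count while it is already non-zero (i.e. the vma is known to be bound), and the zero case falls back to a slow path that can take a lock and bind first. The standalone userspace sketch below illustrates that fast-path/slow-path split only; the demo_* names and the plain pthread mutex are invented for this note and are not the driver's actual implementation.

/*
 * Illustrative sketch: a lockless "pin only if already bound" fast path
 * with a locked slow path, in the spirit of atomic_add_unless(&count, 1, 0).
 * All demo_* names are made up for this example.
 */
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>

struct demo_vma {
	atomic_int pin_count;	/* stand-in for the vma pin count */
	pthread_mutex_t lock;	/* stand-in for the vm mutex */
	bool bound;
};

/* Increment pin_count unless it is zero; returns false if it was zero. */
static bool demo_pin_unless_zero(struct demo_vma *vma)
{
	int old = atomic_load(&vma->pin_count);

	while (old != 0) {
		/* On failure, 'old' is refreshed and the zero check repeats. */
		if (atomic_compare_exchange_weak(&vma->pin_count, &old, old + 1))
			return true;
	}
	return false;	/* count was zero: caller must take the slow path */
}

static int demo_pin(struct demo_vma *vma)
{
	if (demo_pin_unless_zero(vma))
		return 0;	/* fast path: already bound and pinned */

	pthread_mutex_lock(&vma->lock);
	if (!vma->bound)
		vma->bound = true;	/* stand-in for insert + bind */
	atomic_fetch_add(&vma->pin_count, 1);	/* first pin taken under the lock */
	pthread_mutex_unlock(&vma->lock);
	return 0;
}

static void demo_unpin(struct demo_vma *vma)
{
	atomic_fetch_sub(&vma->pin_count, 1);
}

int main(void)
{
	struct demo_vma vma = { .lock = PTHREAD_MUTEX_INITIALIZER };

	demo_pin(&vma);		/* slow path: binds and takes the first pin */
	demo_pin(&vma);		/* fast path: atomic increment only */
	printf("bound=%d pin_count=%d\n", vma.bound, atomic_load(&vma.pin_count));
	demo_unpin(&vma);
	demo_unpin(&vma);
	return 0;
}

The property the sketch relies on is that the fast path can never move the count from 0 to 1, so binding and the first pin always happen under whatever lock the slow path takes, which is what allows the callers in this series to drop struct_mutex.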