From patchwork Sat Aug 24 00:20:22 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Daniele Ceraolo Spurio X-Patchwork-Id: 11112557 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 13A10112C for ; Sat, 24 Aug 2019 00:21:02 +0000 (UTC) Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id EE45421726 for ; Sat, 24 Aug 2019 00:21:00 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org EE45421726 Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=intel.com Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=intel-gfx-bounces@lists.freedesktop.org Received: from gabe.freedesktop.org (localhost [127.0.0.1]) by gabe.freedesktop.org (Postfix) with ESMTP id 510586ED70; Sat, 24 Aug 2019 00:20:59 +0000 (UTC) X-Original-To: intel-gfx@lists.freedesktop.org Delivered-To: intel-gfx@lists.freedesktop.org Received: from mga05.intel.com (mga05.intel.com [192.55.52.43]) by gabe.freedesktop.org (Postfix) with ESMTPS id 3E6BA6ED70 for ; Sat, 24 Aug 2019 00:20:57 +0000 (UTC) X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from fmsmga008.fm.intel.com ([10.253.24.58]) by fmsmga105.fm.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 23 Aug 2019 17:20:56 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.64,422,1559545200"; d="scan'208";a="179309186" Received: from dceraolo-linux.fm.intel.com ([10.1.27.145]) by fmsmga008.fm.intel.com with ESMTP; 23 Aug 2019 17:20:56 -0700 From: Daniele Ceraolo Spurio To: intel-gfx@lists.freedesktop.org Date: Fri, 23 Aug 2019 17:20:22 -0700 Message-Id: <20190824002022.27636-1-daniele.ceraolospurio@intel.com> X-Mailer: git-send-email 2.23.0 MIME-Version: 1.0 Subject: [Intel-gfx] [PATCH] drm/i915: use a separate context for gpu relocs X-BeenThere: intel-gfx@lists.freedesktop.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: Intel graphics driver community testing & development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: intel-gfx-bounces@lists.freedesktop.org Sender: "Intel-gfx" The CS pre-parser can pre-fetch commands across memory sync points and starting from gen12 it is able to pre-fetch across BB_START and BB_END boundaries as well, so when we emit gpu relocs the pre-parser might fetch the target location of the reloc before the memory write lands. The parser can't pre-fetch across the ctx switch, so we use a separate context to guarantee that the memory is syncronized before the parser can get to it. Suggested-by: Chris Wilson Signed-off-by: Daniele Ceraolo Spurio Cc: Chris Wilson --- .../gpu/drm/i915/gem/i915_gem_execbuffer.c | 27 ++++++++++++++++++- 1 file changed, 26 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c index b5f6937369ea..d27201c654e9 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c @@ -252,6 +252,7 @@ struct i915_execbuffer { bool has_fence : 1; bool needs_unfenced : 1; + struct intel_context *ce; struct i915_request *rq; u32 *rq_cmd; unsigned int rq_size; @@ -903,6 +904,7 @@ static void reloc_cache_init(struct reloc_cache *cache, cache->has_fence = cache->gen < 4; cache->needs_unfenced = INTEL_INFO(i915)->unfenced_needs_alignment; cache->node.allocated = false; + cache->ce = NULL; cache->rq = NULL; cache->rq_size = 0; } @@ -943,6 +945,7 @@ static void reloc_gpu_flush(struct reloc_cache *cache) static void reloc_cache_reset(struct reloc_cache *cache) { void *vaddr; + struct intel_context *ce; if (cache->rq) reloc_gpu_flush(cache); @@ -973,6 +976,10 @@ static void reloc_cache_reset(struct reloc_cache *cache) } } + ce = fetch_and_zero(&cache->ce); + if (ce) + intel_context_put(ce); + cache->vaddr = 0; cache->page = -1; } @@ -1168,7 +1175,7 @@ static int __reloc_gpu_alloc(struct i915_execbuffer *eb, if (err) goto err_unmap; - rq = i915_request_create(eb->context); + rq = intel_context_create_request(cache->ce); if (IS_ERR(rq)) { err = PTR_ERR(rq); goto err_unpin; @@ -1239,6 +1246,24 @@ static u32 *reloc_gpu(struct i915_execbuffer *eb, if (!intel_engine_can_store_dword(eb->engine)) return ERR_PTR(-ENODEV); + /* + * The CS pre-parser can pre-fetch commands across memory sync + * points and starting gen12 it is able to pre-fetch across + * BB_START and BB_END boundaries (within the same context). We + * use a separate context to guarantee that the reloc writes + * land before the parser gets to the target memory location. + */ + if (!cache->ce) { + struct intel_context *ce; + + ce = intel_context_create(eb->context->gem_context, + eb->engine); + if (IS_ERR(ce)) + return ERR_CAST(ce); + + cache->ce = ce; /* released in reloc_cache_reset */ + } + err = __reloc_gpu_alloc(eb, vma, len); if (unlikely(err)) return ERR_PTR(err);