From patchwork Fri Oct 7 09:46:34 2016
X-Patchwork-Submitter: Chris Wilson
X-Patchwork-Id: 9365905
From: Chris Wilson
To: intel-gfx@lists.freedesktop.org
Date: Fri, 7 Oct 2016 10:46:34 +0100
Message-Id: <20161007094635.28319-42-chris@chris-wilson.co.uk>
In-Reply-To: <20161007094635.28319-1-chris@chris-wilson.co.uk>
References: <20161007094635.28319-1-chris@chris-wilson.co.uk>
Subject: [Intel-gfx] [PATCH 41/42] drm/i915: Enable userspace to opt-out of implicit fencing
List-Id: Intel graphics driver community testing & development

Userspace is faced with a dilemma. The kernel requires implicit fencing
to manage resource usage (we must always wait for the GPU to finish
before releasing its PTE) and for third parties. However, userspace may
wish to avoid this serialisation if it is using explicit fencing between
parties and wants more fine-grained access to buffers (e.g. it may
partition the buffer between uses and track fences on ranges rather than
the implicit fences tracking the whole object). It follows that
userspace needs a mechanism to avoid the kernel's serialisation on its
implicit fences before execbuf execution.

The next question is whether this is an object, execbuf or context flag.
Hybrid users (such as using explicit EGL_ANDROID_native_sync fencing on
shared winsys buffers, but implicit fencing on internal surfaces)
require a per-object level flag. Given that this flag needs to be set
only once for the lifetime of the object, this reduces the convenience
of having an execbuf or context level flag (and avoids having multiple
pieces of uABI controlling the same feature).

Incorrect use of this flag will result in rendering corruption and GPU
hangs - but will not result in use-after-free or similar resource
tracking issues.

Serious caveat: write ordering is not strictly correct after setting
this flag on a render target on multiple engines. This affects all
subsequent GEM operations (execbuf, set-domain, pread) and shared
dma-buf operations. A fix is possible - but costly (both in terms of
further ABI changes and runtime overhead).

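For illustration only (not part of the patch): a minimal, self-contained
sketch of how the per-object flag is expected to behave. The flag values
are copied from the i915_drm.h hunk in this patch; the struct and the
sketch_* helpers are hypothetical stand-ins invented for this example and
do not exist in the kernel or in libdrm.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Flag values as defined by this patch in include/uapi/drm/i915_drm.h. */
#define EXEC_OBJECT_PAD_TO_SIZE (1 << 5)
#define EXEC_OBJECT_ASYNC       (1 << 6)
#define __EXEC_OBJECT_UNKNOWN_FLAGS -(EXEC_OBJECT_ASYNC << 1)

/*
 * Hypothetical stand-in for struct drm_i915_gem_exec_object2, reduced to
 * the two fields this sketch needs.
 */
struct sketch_exec_object {
	uint32_t handle;
	uint64_t flags;
};

/*
 * Models the "remaining bits are MBZ" rule: with this patch applied,
 * every bit above EXEC_OBJECT_ASYNC is still treated as unknown.
 */
static int sketch_flags_valid(uint64_t flags)
{
	return (flags & (uint64_t)__EXEC_OBJECT_UNKNOWN_FLAGS) == 0;
}

/*
 * Models the loop in i915_gem_execbuffer_move_to_gpu(): objects marked
 * EXEC_OBJECT_ASYNC are skipped, all others are awaited implicitly.
 * Returns how many objects would still be waited upon.
 */
static size_t sketch_count_implicit_waits(const struct sketch_exec_object *objs,
					  size_t count)
{
	size_t waits = 0;
	size_t i;

	assert(objs != NULL || count == 0);

	for (i = 0; i < count; i++) {
		if (objs[i].flags & EXEC_OBJECT_ASYNC)
			continue; /* client opted out; no implicit wait */
		waits++;
	}

	return waits;
}
```

A real client would first check I915_PARAM_HAS_EXEC_ASYNC through the
getparam ioctl and set the flag only on objects it orders explicitly
itself; everything else keeps the implicit fencing described above.
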
Testcase: igt/gem_exec_async
Signed-off-by: Chris Wilson
Reviewed-by: Joonas Lahtinen
---
 drivers/gpu/drm/i915/i915_drv.c            |  1 +
 drivers/gpu/drm/i915/i915_gem_execbuffer.c |  3 +++
 include/uapi/drm/i915_drm.h                | 27 ++++++++++++++++++++++++++-
 3 files changed, 30 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c
index cd00b021bdfb..bf61e3c4caa3 100644
--- a/drivers/gpu/drm/i915/i915_drv.c
+++ b/drivers/gpu/drm/i915/i915_drv.c
@@ -332,6 +332,7 @@ static int i915_getparam(struct drm_device *dev, void *data,
 	case I915_PARAM_HAS_EXEC_HANDLE_LUT:
 	case I915_PARAM_HAS_COHERENT_PHYS_GTT:
 	case I915_PARAM_HAS_EXEC_SOFTPIN:
+	case I915_PARAM_HAS_EXEC_ASYNC:
 		/* For the time being all of these are always true;
 		 * if some supported hardware does not have one of these
 		 * features this value needs to be provided from
diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
index b4865bcc8a3e..1fde95dc2e4b 100644
--- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
@@ -1112,6 +1112,9 @@ i915_gem_execbuffer_move_to_gpu(struct drm_i915_gem_request *req,
 	list_for_each_entry(vma, vmas, exec_list) {
 		struct drm_i915_gem_object *obj = vma->obj;
 
+		if (vma->exec_entry->flags & EXEC_OBJECT_ASYNC)
+			continue;
+
 		ret = i915_gem_request_await_object
 			(req, obj, obj->base.pending_write_domain);
 		if (ret)
diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
index 03725fe89859..a2fa511b46b3 100644
--- a/include/uapi/drm/i915_drm.h
+++ b/include/uapi/drm/i915_drm.h
@@ -388,6 +388,10 @@ typedef struct drm_i915_irq_wait {
 #define I915_PARAM_HAS_POOLED_EU	 38
 #define I915_PARAM_MIN_EU_IN_POOL	 39
 #define I915_PARAM_MMAP_GTT_VERSION	 40
+/* Query whether DRM_I915_GEM_EXECBUFFER2 supports the ability to opt-out of
+ * synchronisation with implicit fencing on individual objects.
+ */
+#define I915_PARAM_HAS_EXEC_ASYNC	 41
 
 typedef struct drm_i915_getparam {
 	__s32 param;
@@ -729,8 +733,29 @@ struct drm_i915_gem_exec_object2 {
 #define EXEC_OBJECT_SUPPORTS_48B_ADDRESS (1<<3)
 #define EXEC_OBJECT_PINNED	 (1<<4)
 #define EXEC_OBJECT_PAD_TO_SIZE	 (1<<5)
+/* The kernel implicitly tracks GPU activity on all GEM objects, and
+ * synchronises operations with outstanding rendering. This includes
+ * rendering on other devices if exported via dma-buf. However, sometimes
+ * this tracking is too coarse and the user knows better. For example,
+ * if the object is split into non-overlapping ranges shared between different
+ * clients or engines (i.e. suballocating objects), the implicit tracking
+ * by kernel assumes that each operation affects the whole object rather
+ * than an individual range, causing needless synchronisation between clients.
+ * The kernel will also forgo any CPU cache flushes prior to rendering from
+ * the object as the client is expected to be also handling such domain
+ * tracking.
+ *
+ * The kernel maintains the implicit tracking in order to manage resources
+ * used by the GPU - this flag only disables the synchronisation prior to
+ * rendering with this object in this execbuf.
+ *
+ * Opting out of implicit synchronisation requires the user to do its own
+ * explicit tracking to avoid rendering corruption. See, for example,
+ * I915_PARAM_HAS_EXEC_FENCE to order execbufs and execute them asynchronously.
+ */
+#define EXEC_OBJECT_ASYNC		(1<<6)
 /* All remaining bits are MBZ and RESERVED FOR FUTURE USE */
-#define __EXEC_OBJECT_UNKNOWN_FLAGS -(EXEC_OBJECT_PAD_TO_SIZE<<1)
+#define __EXEC_OBJECT_UNKNOWN_FLAGS -(EXEC_OBJECT_ASYNC<<1)
 	__u64 flags;
 
 	union {