From patchwork Wed Aug 28 14:33:18 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Lionel Landwerlin X-Patchwork-Id: 11119271 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 344861395 for ; Wed, 28 Aug 2019 14:33:50 +0000 (UTC) Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 1D2762080F for ; Wed, 28 Aug 2019 14:33:50 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 1D2762080F Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=intel.com Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=intel-gfx-bounces@lists.freedesktop.org Received: from gabe.freedesktop.org (localhost [127.0.0.1]) by gabe.freedesktop.org (Postfix) with ESMTP id B27B389CD4; Wed, 28 Aug 2019 14:33:47 +0000 (UTC) X-Original-To: intel-gfx@lists.freedesktop.org Delivered-To: intel-gfx@lists.freedesktop.org Received: from mga01.intel.com (mga01.intel.com [192.55.52.88]) by gabe.freedesktop.org (Postfix) with ESMTPS id B65C389CD4 for ; Wed, 28 Aug 2019 14:33:46 +0000 (UTC) X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from fmsmga005.fm.intel.com ([10.253.24.32]) by fmsmga101.fm.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 28 Aug 2019 07:33:33 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.64,441,1559545200"; d="scan'208";a="380419385" Received: from cmartin2-mobl3.ger.corp.intel.com (HELO delly.ger.corp.intel.com) ([10.252.38.61]) by fmsmga005.fm.intel.com with ESMTP; 28 Aug 2019 07:33:31 -0700 From: Lionel Landwerlin To: intel-gfx@lists.freedesktop.org Date: Wed, 28 Aug 2019 17:33:18 +0300 Message-Id: <20190828143327.7965-2-lionel.g.landwerlin@intel.com> X-Mailer: git-send-email 2.23.0 In-Reply-To: <20190828143327.7965-1-lionel.g.landwerlin@intel.com> References: <20190828143327.7965-1-lionel.g.landwerlin@intel.com> MIME-Version: 1.0 Subject: [Intel-gfx] [PATCH v11 01/10] drm/syncobj: add sideband payload X-BeenThere: intel-gfx@lists.freedesktop.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: Intel graphics driver community testing & development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: David Zhou , Christian Koenig Errors-To: intel-gfx-bounces@lists.freedesktop.org Sender: "Intel-gfx" The Vulkan timeline semaphores allow signaling to happen on the point of the timeline without all of the its dependencies to be created. The current 2 implementations (AMD/Intel) of the Vulkan spec on top of the Linux kernel are using a thread to wait on the dependencies of a given point to materialize and delay actual submission to the kernel driver until the wait completes. If a binary semaphore is submitted for signaling along the side of a timeline semaphore waiting for completion that means that the drm syncobj associated with that binary semaphore will not have a DMA fence associated with it by the time vkQueueSubmit() returns. This and the fact that a binary semaphore can be signaled and unsignaled as before its DMA fences materialize mean that we cannot just rely on the fence within the syncobj but we also need a sideband payload verifying that the fence in the syncobj matches the last submission from the Vulkan API point of view. This change adds a sideband payload that is incremented with signaled syncobj when vkQueueSubmit() is called. The next vkQueueSubmit() waiting on a the syncobj will read the sideband payload and wait for a fence chain element with a seqno superior or equal to the sideband payload value to be added into the fence chain and use that fence to trigger the submission on the kernel driver. v2: Use a separate ioctl to get/set the sideband value (Christian) v3: Use 2 ioctls for get/set (Christian) v4: Use a single new ioctl v5: a bunch of blattant mistakes Store payload atomically (Chris) v6: Only touch atomic value once (Jason) Signed-off-by: Lionel Landwerlin Reviewed-by: David Zhou (v5) Cc: Christian Koenig Cc: Jason Ekstrand Cc: David(ChunMing) Zhou Signed-off-by: Lionel Landwerlin Reviewed-by: David Zhou (v5) --- drivers/gpu/drm/drm_internal.h | 2 ++ drivers/gpu/drm/drm_ioctl.c | 3 ++ drivers/gpu/drm/drm_syncobj.c | 59 +++++++++++++++++++++++++++++++++- include/drm/drm_syncobj.h | 9 ++++++ include/uapi/drm/drm.h | 17 ++++++++++ 5 files changed, 89 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/drm_internal.h b/drivers/gpu/drm/drm_internal.h index 51a2055c8f18..e297dfd85019 100644 --- a/drivers/gpu/drm/drm_internal.h +++ b/drivers/gpu/drm/drm_internal.h @@ -208,6 +208,8 @@ int drm_syncobj_timeline_signal_ioctl(struct drm_device *dev, void *data, struct drm_file *file_private); int drm_syncobj_query_ioctl(struct drm_device *dev, void *data, struct drm_file *file_private); +int drm_syncobj_binary_ioctl(struct drm_device *dev, void *data, + struct drm_file *file_private); /* drm_framebuffer.c */ void drm_framebuffer_print_info(struct drm_printer *p, unsigned int indent, diff --git a/drivers/gpu/drm/drm_ioctl.c b/drivers/gpu/drm/drm_ioctl.c index f675a3bb2c88..644d0bc800a4 100644 --- a/drivers/gpu/drm/drm_ioctl.c +++ b/drivers/gpu/drm/drm_ioctl.c @@ -703,6 +703,9 @@ static const struct drm_ioctl_desc drm_ioctls[] = { DRM_RENDER_ALLOW), DRM_IOCTL_DEF(DRM_IOCTL_SYNCOBJ_QUERY, drm_syncobj_query_ioctl, DRM_RENDER_ALLOW), + DRM_IOCTL_DEF(DRM_IOCTL_SYNCOBJ_BINARY, drm_syncobj_binary_ioctl, + DRM_RENDER_ALLOW), + DRM_IOCTL_DEF(DRM_IOCTL_CRTC_GET_SEQUENCE, drm_crtc_get_sequence_ioctl, 0), DRM_IOCTL_DEF(DRM_IOCTL_CRTC_QUEUE_SEQUENCE, drm_crtc_queue_sequence_ioctl, 0), DRM_IOCTL_DEF(DRM_IOCTL_MODE_CREATE_LEASE, drm_mode_create_lease_ioctl, DRM_MASTER), diff --git a/drivers/gpu/drm/drm_syncobj.c b/drivers/gpu/drm/drm_syncobj.c index 4b5c7b0ed714..732310b2b367 100644 --- a/drivers/gpu/drm/drm_syncobj.c +++ b/drivers/gpu/drm/drm_syncobj.c @@ -1224,8 +1224,10 @@ drm_syncobj_reset_ioctl(struct drm_device *dev, void *data, if (ret < 0) return ret; - for (i = 0; i < args->count_handles; i++) + for (i = 0; i < args->count_handles; i++) { drm_syncobj_replace_fence(syncobjs[i], NULL); + atomic64_set(&syncobjs[i]->binary_payload, 0); + } drm_syncobj_array_free(syncobjs, args->count_handles); @@ -1395,6 +1397,61 @@ int drm_syncobj_query_ioctl(struct drm_device *dev, void *data, if (ret) break; } + + drm_syncobj_array_free(syncobjs, args->count_handles); + + return ret; +} + +int drm_syncobj_binary_ioctl(struct drm_device *dev, void *data, + struct drm_file *file_private) +{ + struct drm_syncobj_binary_array *args = data; + struct drm_syncobj **syncobjs; + u32 __user *access_flags = u64_to_user_ptr(args->access_flags); + u64 __user *values = u64_to_user_ptr(args->values); + u32 i; + int ret; + + if (!drm_core_check_feature(dev, DRIVER_SYNCOBJ_TIMELINE)) + return -EOPNOTSUPP; + + if (args->pad != 0) + return -EINVAL; + + if (args->count_handles == 0) + return -EINVAL; + + ret = drm_syncobj_array_find(file_private, + u64_to_user_ptr(args->handles), + args->count_handles, + &syncobjs); + if (ret < 0) + return ret; + + for (i = 0; i < args->count_handles; i++) { + u32 flags; + u64 value; + + if (get_user(flags, &access_flags[i])) { + ret = -EFAULT; + break; + } + + if (flags & DRM_SYNCOBJ_BINARY_VALUE_INC) + value = atomic64_inc_return(&syncobjs[i]->binary_payload) - 1; + else if (flags & DRM_SYNCOBJ_BINARY_VALUE_READ) + value = atomic64_read(&syncobjs[i]->binary_payload); + + if (flags & DRM_SYNCOBJ_BINARY_VALUE_READ) { + if (put_user(value, &values[i])) { + ret = -EFAULT; + break; + } + } + + } + drm_syncobj_array_free(syncobjs, args->count_handles); return ret; diff --git a/include/drm/drm_syncobj.h b/include/drm/drm_syncobj.h index 6cf7243a1dc5..aa76cb3f9107 100644 --- a/include/drm/drm_syncobj.h +++ b/include/drm/drm_syncobj.h @@ -61,6 +61,15 @@ struct drm_syncobj { * @file: A file backing for this syncobj. */ struct file *file; + /** + * @binary_payload: A 64bit payload for binary syncobjs. + * + * We use the payload value to wait on binary syncobj fences to + * materialize. It is a reservation mechanism for the signaler to + * express that at some point in the future a dma fence with the same + * seqno will be put into the syncobj. + */ + atomic64_t binary_payload; }; void drm_syncobj_free(struct kref *kref); diff --git a/include/uapi/drm/drm.h b/include/uapi/drm/drm.h index 8a5b2f8f8eb9..78a0a413b788 100644 --- a/include/uapi/drm/drm.h +++ b/include/uapi/drm/drm.h @@ -785,6 +785,22 @@ struct drm_syncobj_timeline_array { __u32 pad; }; +struct drm_syncobj_binary_array { + /* A pointer to an array of u32 syncobj handles. */ + __u64 handles; + /* A pointer to an array of u32 access flags for each handle. */ + __u64 access_flags; + /* The binary value of a syncobj is read before it is incremented. */ +#define DRM_SYNCOBJ_BINARY_VALUE_READ (1u << 0) +#define DRM_SYNCOBJ_BINARY_VALUE_INC (1u << 1) + /* A pointer to an array of u64 values written to by the kernel if the + * handle is flagged for reading. + */ + __u64 values; + /* The length of the 3 arrays above. */ + __u32 count_handles; + __u32 pad; +}; /* Query current scanout sequence number */ struct drm_crtc_get_sequence { @@ -946,6 +962,7 @@ extern "C" { #define DRM_IOCTL_SYNCOBJ_QUERY DRM_IOWR(0xCB, struct drm_syncobj_timeline_array) #define DRM_IOCTL_SYNCOBJ_TRANSFER DRM_IOWR(0xCC, struct drm_syncobj_transfer) #define DRM_IOCTL_SYNCOBJ_TIMELINE_SIGNAL DRM_IOWR(0xCD, struct drm_syncobj_timeline_array) +#define DRM_IOCTL_SYNCOBJ_BINARY DRM_IOWR(0xCE, struct drm_syncobj_binary_array) /** * Device specific ioctls should only be in their respective headers From patchwork Wed Aug 28 14:33:19 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Lionel Landwerlin X-Patchwork-Id: 11119277 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id B3836112C for ; Wed, 28 Aug 2019 14:33:57 +0000 (UTC) Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 9A7A82080F for ; Wed, 28 Aug 2019 14:33:57 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 9A7A82080F Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=intel.com Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=intel-gfx-bounces@lists.freedesktop.org Received: from gabe.freedesktop.org (localhost [127.0.0.1]) by gabe.freedesktop.org (Postfix) with ESMTP id 8789089CDE; Wed, 28 Aug 2019 14:33:54 +0000 (UTC) X-Original-To: intel-gfx@lists.freedesktop.org Delivered-To: intel-gfx@lists.freedesktop.org Received: from mga01.intel.com (mga01.intel.com [192.55.52.88]) by gabe.freedesktop.org (Postfix) with ESMTPS id D2BC889CBA for ; Wed, 28 Aug 2019 14:33:46 +0000 (UTC) X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from fmsmga005.fm.intel.com ([10.253.24.32]) by fmsmga101.fm.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 28 Aug 2019 07:33:34 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.64,441,1559545200"; d="scan'208";a="380419392" Received: from cmartin2-mobl3.ger.corp.intel.com (HELO delly.ger.corp.intel.com) ([10.252.38.61]) by fmsmga005.fm.intel.com with ESMTP; 28 Aug 2019 07:33:33 -0700 From: Lionel Landwerlin To: intel-gfx@lists.freedesktop.org Date: Wed, 28 Aug 2019 17:33:19 +0300 Message-Id: <20190828143327.7965-3-lionel.g.landwerlin@intel.com> X-Mailer: git-send-email 2.23.0 In-Reply-To: <20190828143327.7965-1-lionel.g.landwerlin@intel.com> References: <20190828143327.7965-1-lionel.g.landwerlin@intel.com> MIME-Version: 1.0 Subject: [Intel-gfx] [PATCH v11 02/10] drm/i915: introduce a mechanism to extend execbuf2 X-BeenThere: intel-gfx@lists.freedesktop.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: Intel graphics driver community testing & development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: intel-gfx-bounces@lists.freedesktop.org Sender: "Intel-gfx" We're planning to use this for a couple of new feature where we need to provide additional parameters to execbuf. v2: Check for invalid flags in execbuffer2 (Lionel) v3: Rename I915_EXEC_EXT -> I915_EXEC_USE_EXTENSIONS (Chris) Signed-off-by: Lionel Landwerlin Reviewed-by: Chris Wilson (v1) --- .../gpu/drm/i915/gem/i915_gem_execbuffer.c | 39 ++++++++++++++++++- include/uapi/drm/i915_drm.h | 26 +++++++++++-- 2 files changed, 61 insertions(+), 4 deletions(-) diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c index 27dbcb508055..4f5fd946ab28 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c @@ -25,6 +25,7 @@ #include "i915_gem_context.h" #include "i915_gem_ioctls.h" #include "i915_trace.h" +#include "i915_user_extensions.h" enum { FORCE_CPU_RELOC = 1, @@ -272,6 +273,10 @@ struct i915_execbuffer { */ int lut_size; struct hlist_head *buckets; /** ht for relocation handles */ + + struct { + u64 flags; /** Available extensions parameters */ + } extensions; }; #define exec_entry(EB, VMA) (&(EB)->exec[(VMA)->exec_flags - (EB)->flags]) @@ -1940,7 +1945,8 @@ static bool i915_gem_check_execbuffer(struct drm_i915_gem_execbuffer2 *exec) return false; /* Kernel clipping was a DRI1 misfeature */ - if (!(exec->flags & I915_EXEC_FENCE_ARRAY)) { + if (!(exec->flags & (I915_EXEC_FENCE_ARRAY | + I915_EXEC_USE_EXTENSIONS))) { if (exec->num_cliprects || exec->cliprects_ptr) return false; } @@ -2442,6 +2448,33 @@ signal_fence_array(struct i915_execbuffer *eb, } } +static const i915_user_extension_fn execbuf_extensions[] = { +}; + +static int +parse_execbuf2_extensions(struct drm_i915_gem_execbuffer2 *args, + struct i915_execbuffer *eb) +{ + eb->extensions.flags = 0; + + if (!(args->flags & I915_EXEC_USE_EXTENSIONS)) + return 0; + + /* The execbuf2 extension mechanism reuses cliprects_ptr. So we cannot + * have another flag also using it at the same time. + */ + if (eb->args->flags & I915_EXEC_FENCE_ARRAY) + return -EINVAL; + + if (args->num_cliprects != 0) + return -EINVAL; + + return i915_user_extensions(u64_to_user_ptr(args->cliprects_ptr), + execbuf_extensions, + ARRAY_SIZE(execbuf_extensions), + eb); +} + static int i915_gem_do_execbuffer(struct drm_device *dev, struct drm_file *file, @@ -2488,6 +2521,10 @@ i915_gem_do_execbuffer(struct drm_device *dev, if (args->flags & I915_EXEC_IS_PINNED) eb.batch_flags |= I915_DISPATCH_PINNED; + err = parse_execbuf2_extensions(args, &eb); + if (err) + return err; + if (args->flags & I915_EXEC_FENCE_IN) { in_fence = sync_file_get_fence(lower_32_bits(args->rsvd2)); if (!in_fence) diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h index 469dc512cca3..0a99c26730e1 100644 --- a/include/uapi/drm/i915_drm.h +++ b/include/uapi/drm/i915_drm.h @@ -1007,6 +1007,10 @@ struct drm_i915_gem_exec_fence { __u32 flags; }; +enum drm_i915_gem_execbuffer_ext { + DRM_I915_GEM_EXECBUFFER_EXT_MAX /* non-ABI */ +}; + struct drm_i915_gem_execbuffer2 { /** * List of gem_exec_object2 structs @@ -1023,8 +1027,15 @@ struct drm_i915_gem_execbuffer2 { __u32 num_cliprects; /** * This is a struct drm_clip_rect *cliprects if I915_EXEC_FENCE_ARRAY - * is not set. If I915_EXEC_FENCE_ARRAY is set, then this is a - * struct drm_i915_gem_exec_fence *fences. + * & I915_EXEC_USE_EXTENSIONS are not set. + * + * If I915_EXEC_FENCE_ARRAY is set, then this is a pointer to an array + * of struct drm_i915_gem_exec_fence and num_cliprects is the length + * of the array. + * + * If I915_EXEC_USE_EXTENSIONS is set, then this is a pointer to a + * single struct drm_i915_gem_base_execbuffer_ext and num_cliprects is + * 0. */ __u64 cliprects_ptr; #define I915_EXEC_RING_MASK (0x3f) @@ -1142,7 +1153,16 @@ struct drm_i915_gem_execbuffer2 { */ #define I915_EXEC_FENCE_SUBMIT (1 << 20) -#define __I915_EXEC_UNKNOWN_FLAGS (-(I915_EXEC_FENCE_SUBMIT << 1)) +/* + * Setting I915_EXEC_USE_EXTENSIONS implies that + * drm_i915_gem_execbuffer2.cliprects_ptr is treated as a pointer to an linked + * list of i915_user_extension. Each i915_user_extension node is the base of a + * larger structure. The list of supported structures are listed in the + * drm_i915_gem_execbuffer_ext enum. + */ +#define I915_EXEC_USE_EXTENSIONS (1 << 21) + +#define __I915_EXEC_UNKNOWN_FLAGS (-(I915_EXEC_USE_EXTENSIONS<<1)) #define I915_EXEC_CONTEXT_ID_MASK (0xffffffff) #define i915_execbuffer2_set_context_id(eb2, context) \ From patchwork Wed Aug 28 14:33:20 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Lionel Landwerlin X-Patchwork-Id: 11119289 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id B53F6112C for ; Wed, 28 Aug 2019 14:34:02 +0000 (UTC) Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 9D4E12080F for ; Wed, 28 Aug 2019 14:34:02 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 9D4E12080F Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=intel.com Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=intel-gfx-bounces@lists.freedesktop.org Received: from gabe.freedesktop.org (localhost [127.0.0.1]) by gabe.freedesktop.org (Postfix) with ESMTP id 0A4BD89CDF; Wed, 28 Aug 2019 14:33:55 +0000 (UTC) X-Original-To: intel-gfx@lists.freedesktop.org Delivered-To: intel-gfx@lists.freedesktop.org Received: from mga01.intel.com (mga01.intel.com [192.55.52.88]) by gabe.freedesktop.org (Postfix) with ESMTPS id 6DBD289CD4 for ; Wed, 28 Aug 2019 14:33:47 +0000 (UTC) X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from fmsmga005.fm.intel.com ([10.253.24.32]) by fmsmga101.fm.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 28 Aug 2019 07:33:36 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.64,441,1559545200"; d="scan'208";a="380419403" Received: from cmartin2-mobl3.ger.corp.intel.com (HELO delly.ger.corp.intel.com) ([10.252.38.61]) by fmsmga005.fm.intel.com with ESMTP; 28 Aug 2019 07:33:34 -0700 From: Lionel Landwerlin To: intel-gfx@lists.freedesktop.org Date: Wed, 28 Aug 2019 17:33:20 +0300 Message-Id: <20190828143327.7965-4-lionel.g.landwerlin@intel.com> X-Mailer: git-send-email 2.23.0 In-Reply-To: <20190828143327.7965-1-lionel.g.landwerlin@intel.com> References: <20190828143327.7965-1-lionel.g.landwerlin@intel.com> MIME-Version: 1.0 Subject: [Intel-gfx] [PATCH v11 03/10] drm/i915: add syncobj timeline support X-BeenThere: intel-gfx@lists.freedesktop.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: Intel graphics driver community testing & development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: intel-gfx-bounces@lists.freedesktop.org Sender: "Intel-gfx" Introduces a new parameters to execbuf so that we can specify syncobj handles as well as timeline points. v2: Reuse i915_user_extension_fn v3: Check that the chained extension is only present once (Chris) v4: Check that dma_fence_chain_find_seqno returns a non NULL fence (Lionel) v5: Use BIT_ULL (Chris) v6: Fix issue with already signaled timeline points, dma_fence_chain_find_seqno() setting fence to NULL (Chris) v7: Report ENOENT with invalid syncobj handle (Lionel) v8: Check for out of order timeline point insertion (Chris) v9: After explanations on https://lists.freedesktop.org/archives/dri-devel/2019-August/229287.html drop the ordering check from v8 (Lionel) v10: Set first extension enum item to 1 (Jason) Signed-off-by: Lionel Landwerlin --- .../gpu/drm/i915/gem/i915_gem_execbuffer.c | 307 ++++++++++++++---- drivers/gpu/drm/i915/i915_drv.c | 3 +- drivers/gpu/drm/i915/i915_getparam.c | 1 + include/uapi/drm/i915_drm.h | 39 +++ 4 files changed, 293 insertions(+), 57 deletions(-) diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c index 4f5fd946ab28..46ad8d9642d1 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c @@ -214,6 +214,13 @@ enum { * the batchbuffer in trusted mode, otherwise the ioctl is rejected. */ +struct i915_eb_fences { + struct drm_syncobj *syncobj; /* Use with ptr_mask_bits() */ + struct dma_fence *dma_fence; + u64 value; + struct dma_fence_chain *chain_fence; +}; + struct i915_execbuffer { struct drm_i915_private *i915; /** i915 backpointer */ struct drm_file *file; /** per-file lookup tables and limits */ @@ -276,6 +283,7 @@ struct i915_execbuffer { struct { u64 flags; /** Available extensions parameters */ + struct drm_i915_gem_execbuffer_ext_timeline_fences timeline_fences; } extensions; }; @@ -2320,67 +2328,217 @@ eb_pin_engine(struct i915_execbuffer *eb, } static void -__free_fence_array(struct drm_syncobj **fences, unsigned int n) +__free_fence_array(struct i915_eb_fences *fences, unsigned int n) { - while (n--) - drm_syncobj_put(ptr_mask_bits(fences[n], 2)); + while (n--) { + drm_syncobj_put(ptr_mask_bits(fences[n].syncobj, 2)); + dma_fence_put(fences[n].dma_fence); + kfree(fences[n].chain_fence); + } kvfree(fences); } -static struct drm_syncobj ** -get_fence_array(struct drm_i915_gem_execbuffer2 *args, - struct drm_file *file) +static struct i915_eb_fences * +get_timeline_fence_array(struct i915_execbuffer *eb, int *out_n_fences) +{ + struct drm_i915_gem_execbuffer_ext_timeline_fences *timeline_fences = + &eb->extensions.timeline_fences; + struct drm_i915_gem_exec_fence __user *user_fences; + struct i915_eb_fences *fences; + u64 __user *user_values; + u64 num_fences, num_user_fences = timeline_fences->fence_count; + unsigned long n; + int err; + + /* Check multiplication overflow for access_ok() and kvmalloc_array() */ + BUILD_BUG_ON(sizeof(size_t) > sizeof(unsigned long)); + if (num_user_fences > min_t(unsigned long, + ULONG_MAX / sizeof(*user_fences), + SIZE_MAX / sizeof(*fences))) + return ERR_PTR(-EINVAL); + + user_fences = u64_to_user_ptr(timeline_fences->handles_ptr); + if (!access_ok(user_fences, num_user_fences * sizeof(*user_fences))) + return ERR_PTR(-EFAULT); + + user_values = u64_to_user_ptr(timeline_fences->values_ptr); + if (!access_ok(user_values, num_user_fences * sizeof(*user_values))) + return ERR_PTR(-EFAULT); + + fences = kvmalloc_array(num_user_fences, sizeof(*fences), + __GFP_NOWARN | GFP_KERNEL); + if (!fences) + return ERR_PTR(-ENOMEM); + + BUILD_BUG_ON(~(ARCH_KMALLOC_MINALIGN - 1) & + ~__I915_EXEC_FENCE_UNKNOWN_FLAGS); + + for (n = 0, num_fences = 0; n < timeline_fences->fence_count; n++) { + struct drm_i915_gem_exec_fence user_fence; + struct drm_syncobj *syncobj; + struct dma_fence *fence = NULL; + u64 point; + + if (__copy_from_user(&user_fence, user_fences++, sizeof(user_fence))) { + err = -EFAULT; + goto err; + } + + if (user_fence.flags & __I915_EXEC_FENCE_UNKNOWN_FLAGS) { + err = -EINVAL; + goto err; + } + + if (__get_user(point, user_values++)) { + err = -EFAULT; + goto err; + } + + syncobj = drm_syncobj_find(eb->file, user_fence.handle); + if (!syncobj) { + DRM_DEBUG("Invalid syncobj handle provided\n"); + err = -ENOENT; + goto err; + } + + if (user_fence.flags & I915_EXEC_FENCE_WAIT) { + fence = drm_syncobj_fence_get(syncobj); + if (!fence) { + DRM_DEBUG("Syncobj handle has no fence\n"); + drm_syncobj_put(syncobj); + err = -EINVAL; + goto err; + } + + err = dma_fence_chain_find_seqno(&fence, point); + if (err) { + DRM_DEBUG("Syncobj handle missing requested point %llu\n", point); + drm_syncobj_put(syncobj); + goto err; + } + + /* A point might have been signaled already and + * garbage collected from the timeline. In this case + * just ignore the point and carry on. + */ + if (!fence) { + drm_syncobj_put(syncobj); + continue; + } + } + + /* + * For timeline syncobjs we need to preallocate chains for + * later signaling. + */ + if (point != 0 && user_fence.flags & I915_EXEC_FENCE_SIGNAL) { + /* + * Waiting and signaling the same point (when point != + * 0) would break the timeline. + */ + if (user_fence.flags & I915_EXEC_FENCE_WAIT) { + DRM_DEBUG("Tring to wait & signal the same timeline point.\n"); + err = -EINVAL; + drm_syncobj_put(syncobj); + goto err; + } + + fences[num_fences].chain_fence = + kmalloc(sizeof(*fences[num_fences].chain_fence), + GFP_KERNEL); + if (!fences[num_fences].chain_fence) { + drm_syncobj_put(syncobj); + err = -ENOMEM; + DRM_DEBUG("Unable to alloc chain_fence\n"); + goto err; + } + } else { + fences[num_fences].chain_fence = NULL; + } + + fences[num_fences].syncobj = ptr_pack_bits(syncobj, user_fence.flags, 2); + fences[num_fences].dma_fence = fence; + fences[num_fences].value = point; + num_fences++; + } + + *out_n_fences = num_fences; + + return fences; + +err: + __free_fence_array(fences, num_fences); + return ERR_PTR(err); +} + +static struct i915_eb_fences * +get_legacy_fence_array(struct i915_execbuffer *eb, + int *out_n_fences) { - const unsigned long nfences = args->num_cliprects; + struct drm_i915_gem_execbuffer2 *args = eb->args; struct drm_i915_gem_exec_fence __user *user; - struct drm_syncobj **fences; + struct i915_eb_fences *fences; + const u32 num_fences = args->num_cliprects; unsigned long n; int err; - if (!(args->flags & I915_EXEC_FENCE_ARRAY)) - return NULL; + *out_n_fences = num_fences; /* Check multiplication overflow for access_ok() and kvmalloc_array() */ BUILD_BUG_ON(sizeof(size_t) > sizeof(unsigned long)); - if (nfences > min_t(unsigned long, - ULONG_MAX / sizeof(*user), - SIZE_MAX / sizeof(*fences))) + if (*out_n_fences > min_t(unsigned long, + ULONG_MAX / sizeof(*user), + SIZE_MAX / sizeof(*fences))) return ERR_PTR(-EINVAL); user = u64_to_user_ptr(args->cliprects_ptr); - if (!access_ok(user, nfences * sizeof(*user))) + if (!access_ok(user, *out_n_fences * sizeof(*user))) return ERR_PTR(-EFAULT); - fences = kvmalloc_array(nfences, sizeof(*fences), + fences = kvmalloc_array(*out_n_fences, sizeof(*fences), __GFP_NOWARN | GFP_KERNEL); if (!fences) return ERR_PTR(-ENOMEM); - for (n = 0; n < nfences; n++) { - struct drm_i915_gem_exec_fence fence; + for (n = 0; n < *out_n_fences; n++) { + struct drm_i915_gem_exec_fence user_fence; struct drm_syncobj *syncobj; + struct dma_fence *fence = NULL; - if (__copy_from_user(&fence, user++, sizeof(fence))) { + if (__copy_from_user(&user_fence, user++, sizeof(user_fence))) { err = -EFAULT; goto err; } - if (fence.flags & __I915_EXEC_FENCE_UNKNOWN_FLAGS) { + if (user_fence.flags & __I915_EXEC_FENCE_UNKNOWN_FLAGS) { err = -EINVAL; goto err; } - syncobj = drm_syncobj_find(file, fence.handle); + syncobj = drm_syncobj_find(eb->file, user_fence.handle); if (!syncobj) { DRM_DEBUG("Invalid syncobj handle provided\n"); err = -ENOENT; goto err; } + if (user_fence.flags & I915_EXEC_FENCE_WAIT) { + fence = drm_syncobj_fence_get(syncobj); + if (!fence) { + DRM_DEBUG("Syncobj handle has no fence\n"); + drm_syncobj_put(syncobj); + err = -EINVAL; + goto err; + } + } + BUILD_BUG_ON(~(ARCH_KMALLOC_MINALIGN - 1) & ~__I915_EXEC_FENCE_UNKNOWN_FLAGS); - fences[n] = ptr_pack_bits(syncobj, fence.flags, 2); + fences[n].syncobj = ptr_pack_bits(syncobj, user_fence.flags, 2); + fences[n].dma_fence = fence; + fences[n].value = 0; + fences[n].chain_fence = NULL; } return fences; @@ -2390,37 +2548,44 @@ get_fence_array(struct drm_i915_gem_execbuffer2 *args, return ERR_PTR(err); } +static struct i915_eb_fences * +get_fence_array(struct i915_execbuffer *eb, int *out_n_fences) +{ + if (eb->args->flags & I915_EXEC_FENCE_ARRAY) + return get_legacy_fence_array(eb, out_n_fences); + + if (eb->extensions.flags & BIT_ULL(DRM_I915_GEM_EXECBUFFER_EXT_TIMELINE_FENCES)) + return get_timeline_fence_array(eb, out_n_fences); + + *out_n_fences = 0; + return NULL; +} + static void -put_fence_array(struct drm_i915_gem_execbuffer2 *args, - struct drm_syncobj **fences) +put_fence_array(struct i915_eb_fences *fences, int nfences) { if (fences) - __free_fence_array(fences, args->num_cliprects); + __free_fence_array(fences, nfences); } static int await_fence_array(struct i915_execbuffer *eb, - struct drm_syncobj **fences) + struct i915_eb_fences *fences, + int nfences) { - const unsigned int nfences = eb->args->num_cliprects; unsigned int n; int err; for (n = 0; n < nfences; n++) { struct drm_syncobj *syncobj; - struct dma_fence *fence; unsigned int flags; - syncobj = ptr_unpack_bits(fences[n], &flags, 2); + syncobj = ptr_unpack_bits(fences[n].syncobj, &flags, 2); if (!(flags & I915_EXEC_FENCE_WAIT)) continue; - fence = drm_syncobj_fence_get(syncobj); - if (!fence) - return -EINVAL; - - err = i915_request_await_dma_fence(eb->request, fence); - dma_fence_put(fence); + err = i915_request_await_dma_fence(eb->request, + fences[n].dma_fence); if (err < 0) return err; } @@ -2430,9 +2595,9 @@ await_fence_array(struct i915_execbuffer *eb, static void signal_fence_array(struct i915_execbuffer *eb, - struct drm_syncobj **fences) + struct i915_eb_fences *fences, + int nfences) { - const unsigned int nfences = eb->args->num_cliprects; struct dma_fence * const fence = &eb->request->fence; unsigned int n; @@ -2440,15 +2605,46 @@ signal_fence_array(struct i915_execbuffer *eb, struct drm_syncobj *syncobj; unsigned int flags; - syncobj = ptr_unpack_bits(fences[n], &flags, 2); + syncobj = ptr_unpack_bits(fences[n].syncobj, &flags, 2); if (!(flags & I915_EXEC_FENCE_SIGNAL)) continue; - drm_syncobj_replace_fence(syncobj, fence); + if (fences[n].chain_fence) { + drm_syncobj_add_point(syncobj, fences[n].chain_fence, + fence, fences[n].value); + /* + * The chain's ownership is transfered to the + * timeline. + */ + fences[n].chain_fence = NULL; + } else { + drm_syncobj_replace_fence(syncobj, fence); + } } } +static int parse_timeline_fences(struct i915_user_extension __user *ext, void *data) +{ + struct i915_execbuffer *eb = data; + + /* Timeline fences are incompatible with the fence array flag. */ + if (eb->args->flags & I915_EXEC_FENCE_ARRAY) + return -EINVAL; + + if (eb->extensions.flags & BIT_ULL(DRM_I915_GEM_EXECBUFFER_EXT_TIMELINE_FENCES)) + return -EINVAL; + + if (copy_from_user(&eb->extensions.timeline_fences, ext, + sizeof(eb->extensions.timeline_fences))) + return -EFAULT; + + eb->extensions.flags |= BIT_ULL(DRM_I915_GEM_EXECBUFFER_EXT_TIMELINE_FENCES); + + return 0; +} + static const i915_user_extension_fn execbuf_extensions[] = { + [DRM_I915_GEM_EXECBUFFER_EXT_TIMELINE_FENCES] = parse_timeline_fences, }; static int @@ -2479,14 +2675,15 @@ static int i915_gem_do_execbuffer(struct drm_device *dev, struct drm_file *file, struct drm_i915_gem_execbuffer2 *args, - struct drm_i915_gem_exec_object2 *exec, - struct drm_syncobj **fences) + struct drm_i915_gem_exec_object2 *exec) { struct i915_execbuffer eb; struct dma_fence *in_fence = NULL; struct dma_fence *exec_fence = NULL; struct sync_file *out_fence = NULL; + struct i915_eb_fences *fences = NULL; int out_fence_fd = -1; + int nfences = 0; int err; BUILD_BUG_ON(__EXEC_INTERNAL_FLAGS & ~__I915_EXEC_ILLEGAL_FLAGS); @@ -2525,10 +2722,16 @@ i915_gem_do_execbuffer(struct drm_device *dev, if (err) return err; + fences = get_fence_array(&eb, &nfences); + if (IS_ERR(fences)) + return PTR_ERR(fences); + if (args->flags & I915_EXEC_FENCE_IN) { in_fence = sync_file_get_fence(lower_32_bits(args->rsvd2)); - if (!in_fence) - return -EINVAL; + if (!in_fence) { + err = -EINVAL; + goto err_fences; + } } if (args->flags & I915_EXEC_FENCE_SUBMIT) { @@ -2673,7 +2876,7 @@ i915_gem_do_execbuffer(struct drm_device *dev, } if (fences) { - err = await_fence_array(&eb, fences); + err = await_fence_array(&eb, fences, nfences); if (err) goto err_request; } @@ -2704,7 +2907,7 @@ i915_gem_do_execbuffer(struct drm_device *dev, i915_request_add(eb.request); if (fences) - signal_fence_array(&eb, fences); + signal_fence_array(&eb, fences, nfences); if (out_fence) { if (err == 0) { @@ -2739,6 +2942,8 @@ i915_gem_do_execbuffer(struct drm_device *dev, dma_fence_put(exec_fence); err_in_fence: dma_fence_put(in_fence); +err_fences: + put_fence_array(fences, nfences); return err; } @@ -2832,7 +3037,7 @@ i915_gem_execbuffer_ioctl(struct drm_device *dev, void *data, exec2_list[i].flags = 0; } - err = i915_gem_do_execbuffer(dev, file, &exec2, exec2_list, NULL); + err = i915_gem_do_execbuffer(dev, file, &exec2, exec2_list); if (exec2.flags & __EXEC_HAS_RELOC) { struct drm_i915_gem_exec_object __user *user_exec_list = u64_to_user_ptr(args->buffers_ptr); @@ -2863,7 +3068,6 @@ i915_gem_execbuffer2_ioctl(struct drm_device *dev, void *data, { struct drm_i915_gem_execbuffer2 *args = data; struct drm_i915_gem_exec_object2 *exec2_list; - struct drm_syncobj **fences = NULL; const size_t count = args->buffer_count; int err; @@ -2891,15 +3095,7 @@ i915_gem_execbuffer2_ioctl(struct drm_device *dev, void *data, return -EFAULT; } - if (args->flags & I915_EXEC_FENCE_ARRAY) { - fences = get_fence_array(args, file); - if (IS_ERR(fences)) { - kvfree(exec2_list); - return PTR_ERR(fences); - } - } - - err = i915_gem_do_execbuffer(dev, file, args, exec2_list, fences); + err = i915_gem_do_execbuffer(dev, file, args, exec2_list); /* * Now that we have begun execution of the batchbuffer, we ignore @@ -2939,7 +3135,6 @@ end:; } args->flags &= ~__I915_EXEC_UNKNOWN_FLAGS; - put_fence_array(args, fences); kvfree(exec2_list); return err; } diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c index bec25942d77d..4e21858fe110 100644 --- a/drivers/gpu/drm/i915/i915_drv.c +++ b/drivers/gpu/drm/i915/i915_drv.c @@ -2820,7 +2820,8 @@ static struct drm_driver driver = { */ .driver_features = DRIVER_GEM | - DRIVER_RENDER | DRIVER_MODESET | DRIVER_ATOMIC | DRIVER_SYNCOBJ, + DRIVER_RENDER | DRIVER_MODESET | DRIVER_ATOMIC | DRIVER_SYNCOBJ | + DRIVER_SYNCOBJ_TIMELINE, .release = i915_driver_release, .open = i915_driver_open, .lastclose = i915_driver_lastclose, diff --git a/drivers/gpu/drm/i915/i915_getparam.c b/drivers/gpu/drm/i915/i915_getparam.c index 5d9101376a3d..da6faa84e5b8 100644 --- a/drivers/gpu/drm/i915/i915_getparam.c +++ b/drivers/gpu/drm/i915/i915_getparam.c @@ -130,6 +130,7 @@ int i915_getparam_ioctl(struct drm_device *dev, void *data, case I915_PARAM_HAS_EXEC_BATCH_FIRST: case I915_PARAM_HAS_EXEC_FENCE_ARRAY: case I915_PARAM_HAS_EXEC_SUBMIT_FENCE: + case I915_PARAM_HAS_EXEC_TIMELINE_FENCES: /* For the time being all of these are always true; * if some supported hardware does not have one of these * features this value needs to be provided from diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h index 0a99c26730e1..3d031e81648b 100644 --- a/include/uapi/drm/i915_drm.h +++ b/include/uapi/drm/i915_drm.h @@ -611,6 +611,13 @@ typedef struct drm_i915_irq_wait { * See I915_EXEC_FENCE_OUT and I915_EXEC_FENCE_SUBMIT. */ #define I915_PARAM_HAS_EXEC_SUBMIT_FENCE 53 + +/* Query whether DRM_I915_GEM_EXECBUFFER2 supports supplying an array of + * timeline syncobj through drm_i915_gem_execbuf_ext_timeline_fences. See + * I915_EXEC_USE_EXTENSIONS. + */ +#define I915_PARAM_HAS_EXEC_TIMELINE_FENCES 54 + /* Must be kept compact -- no holes and well documented */ typedef struct drm_i915_getparam { @@ -1008,9 +1015,41 @@ struct drm_i915_gem_exec_fence { }; enum drm_i915_gem_execbuffer_ext { + /** + * See drm_i915_gem_execbuf_ext_timeline_fences. + */ + DRM_I915_GEM_EXECBUFFER_EXT_TIMELINE_FENCES = 1, + DRM_I915_GEM_EXECBUFFER_EXT_MAX /* non-ABI */ }; +/** + * This structure describes an array of drm_syncobj and associated points for + * timeline variants of drm_syncobj. It is invalid to append this structure to + * the execbuf if I915_EXEC_FENCE_ARRAY is set. + */ +struct drm_i915_gem_execbuffer_ext_timeline_fences { + struct i915_user_extension base; + + /** + * Number of element in the handles_ptr & value_ptr arrays. + */ + __u64 fence_count; + + /** + * Pointer to an array of struct drm_i915_gem_exec_fence of length + * fence_count. + */ + __u64 handles_ptr; + + /** + * Pointer to an array of u64 values of length fence_count. Values + * must be 0 for a binary drm_syncobj. A Value of 0 for a timeline + * drm_syncobj is invalid as it turns a drm_syncobj into a binary one. + */ + __u64 values_ptr; +}; + struct drm_i915_gem_execbuffer2 { /** * List of gem_exec_object2 structs From patchwork Wed Aug 28 14:33:21 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Lionel Landwerlin X-Patchwork-Id: 11119275 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id ADD041395 for ; Wed, 28 Aug 2019 14:33:56 +0000 (UTC) Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 965E82080F for ; Wed, 28 Aug 2019 14:33:56 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 965E82080F Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=intel.com Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=intel-gfx-bounces@lists.freedesktop.org Received: from gabe.freedesktop.org (localhost [127.0.0.1]) by gabe.freedesktop.org (Postfix) with ESMTP id 4048B89CD9; Wed, 28 Aug 2019 14:33:54 +0000 (UTC) X-Original-To: intel-gfx@lists.freedesktop.org Delivered-To: intel-gfx@lists.freedesktop.org Received: from mga01.intel.com (mga01.intel.com [192.55.52.88]) by gabe.freedesktop.org (Postfix) with ESMTPS id 4117389CBA for ; Wed, 28 Aug 2019 14:33:47 +0000 (UTC) X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from fmsmga005.fm.intel.com ([10.253.24.32]) by fmsmga101.fm.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 28 Aug 2019 07:33:37 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.64,441,1559545200"; d="scan'208";a="380419411" Received: from cmartin2-mobl3.ger.corp.intel.com (HELO delly.ger.corp.intel.com) ([10.252.38.61]) by fmsmga005.fm.intel.com with ESMTP; 28 Aug 2019 07:33:36 -0700 From: Lionel Landwerlin To: intel-gfx@lists.freedesktop.org Date: Wed, 28 Aug 2019 17:33:21 +0300 Message-Id: <20190828143327.7965-5-lionel.g.landwerlin@intel.com> X-Mailer: git-send-email 2.23.0 In-Reply-To: <20190828143327.7965-1-lionel.g.landwerlin@intel.com> References: <20190828143327.7965-1-lionel.g.landwerlin@intel.com> MIME-Version: 1.0 Subject: [Intel-gfx] [PATCH v11 04/10] drm/i915/perf: introduce a versioning of the i915-perf uapi X-BeenThere: intel-gfx@lists.freedesktop.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: Intel graphics driver community testing & development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: intel-gfx-bounces@lists.freedesktop.org Sender: "Intel-gfx" Reporting this version will help application figure out what level of the support the running kernel provides. v2: Add i915_perf_ioctl_version() (Chris) Signed-off-by: Lionel Landwerlin Reviewed-by: Chris Wilson --- drivers/gpu/drm/i915/i915_getparam.c | 4 ++++ drivers/gpu/drm/i915/i915_perf.c | 10 ++++++++++ drivers/gpu/drm/i915/i915_perf.h | 1 + include/uapi/drm/i915_drm.h | 20 ++++++++++++++++++++ 4 files changed, 35 insertions(+) diff --git a/drivers/gpu/drm/i915/i915_getparam.c b/drivers/gpu/drm/i915/i915_getparam.c index da6faa84e5b8..bd41cc5ce906 100644 --- a/drivers/gpu/drm/i915/i915_getparam.c +++ b/drivers/gpu/drm/i915/i915_getparam.c @@ -5,6 +5,7 @@ #include "gt/intel_engine_user.h" #include "i915_drv.h" +#include "i915_perf.h" int i915_getparam_ioctl(struct drm_device *dev, void *data, struct drm_file *file_priv) @@ -157,6 +158,9 @@ int i915_getparam_ioctl(struct drm_device *dev, void *data, case I915_PARAM_MMAP_GTT_COHERENT: value = INTEL_INFO(i915)->has_coherent_ggtt; break; + case I915_PARAM_PERF_REVISION: + value = i915_perf_ioctl_version(); + break; default: DRM_DEBUG("Unknown parameter %d\n", param->param); return -EINVAL; diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c index 2c9f46e12622..47fb6f6f2065 100644 --- a/drivers/gpu/drm/i915/i915_perf.c +++ b/drivers/gpu/drm/i915/i915_perf.c @@ -3701,3 +3701,13 @@ void i915_perf_fini(struct drm_i915_private *dev_priv) dev_priv->perf.initialized = false; } + +/** + * i915_perf_ioctl_version - Version of the i915-perf subsystem + * + * This version number is used by userspace to detect available features. + */ +int i915_perf_ioctl_version(void) +{ + return 1; +} diff --git a/drivers/gpu/drm/i915/i915_perf.h b/drivers/gpu/drm/i915/i915_perf.h index a412b16d9ffc..95549de65212 100644 --- a/drivers/gpu/drm/i915/i915_perf.h +++ b/drivers/gpu/drm/i915/i915_perf.h @@ -18,6 +18,7 @@ void i915_perf_init(struct drm_i915_private *i915); void i915_perf_fini(struct drm_i915_private *i915); void i915_perf_register(struct drm_i915_private *i915); void i915_perf_unregister(struct drm_i915_private *i915); +int i915_perf_ioctl_version(void); int i915_perf_open_ioctl(struct drm_device *dev, void *data, struct drm_file *file); diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h index 3d031e81648b..e98c9a7baa91 100644 --- a/include/uapi/drm/i915_drm.h +++ b/include/uapi/drm/i915_drm.h @@ -618,6 +618,12 @@ typedef struct drm_i915_irq_wait { */ #define I915_PARAM_HAS_EXEC_TIMELINE_FENCES 54 +/* + * Revision of the i915-perf uAPI. The value returned helps determine what + * i915-perf features are available. See drm_i915_perf_property_id. + */ +#define I915_PARAM_PERF_REVISION 55 + /* Must be kept compact -- no holes and well documented */ typedef struct drm_i915_getparam { @@ -1903,23 +1909,31 @@ enum drm_i915_perf_property_id { * Open the stream for a specific context handle (as used with * execbuffer2). A stream opened for a specific context this way * won't typically require root privileges. + * + * This property is available in perf revision 1. */ DRM_I915_PERF_PROP_CTX_HANDLE = 1, /** * A value of 1 requests the inclusion of raw OA unit reports as * part of stream samples. + * + * This property is available in perf revision 1. */ DRM_I915_PERF_PROP_SAMPLE_OA, /** * The value specifies which set of OA unit metrics should be * be configured, defining the contents of any OA unit reports. + * + * This property is available in perf revision 1. */ DRM_I915_PERF_PROP_OA_METRICS_SET, /** * The value specifies the size and layout of OA unit reports. + * + * This property is available in perf revision 1. */ DRM_I915_PERF_PROP_OA_FORMAT, @@ -1929,6 +1943,8 @@ enum drm_i915_perf_property_id { * from this exponent as follows: * * 80ns * 2^(period_exponent + 1) + * + * This property is available in perf revision 1. */ DRM_I915_PERF_PROP_OA_EXPONENT, @@ -1960,6 +1976,8 @@ struct drm_i915_perf_open_param { * to close and re-open a stream with the same configuration. * * It's undefined whether any pending data for the stream will be lost. + * + * This ioctl is available in perf revision 1. */ #define I915_PERF_IOCTL_ENABLE _IO('i', 0x0) @@ -1967,6 +1985,8 @@ struct drm_i915_perf_open_param { * Disable data capture for a stream. * * It is an error to try and read a stream that is disabled. + * + * This ioctl is available in perf revision 1. */ #define I915_PERF_IOCTL_DISABLE _IO('i', 0x1) From patchwork Wed Aug 28 14:33:22 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Lionel Landwerlin X-Patchwork-Id: 11119281 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 8E5B2112C for ; Wed, 28 Aug 2019 14:33:59 +0000 (UTC) Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 76DAE2080F for ; Wed, 28 Aug 2019 14:33:59 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 76DAE2080F Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=intel.com Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=intel-gfx-bounces@lists.freedesktop.org Received: from gabe.freedesktop.org (localhost [127.0.0.1]) by gabe.freedesktop.org (Postfix) with ESMTP id 957ED89CF4; Wed, 28 Aug 2019 14:33:55 +0000 (UTC) X-Original-To: intel-gfx@lists.freedesktop.org Delivered-To: intel-gfx@lists.freedesktop.org Received: from mga01.intel.com (mga01.intel.com [192.55.52.88]) by gabe.freedesktop.org (Postfix) with ESMTPS id 8772D89CBA for ; Wed, 28 Aug 2019 14:33:47 +0000 (UTC) X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from fmsmga005.fm.intel.com ([10.253.24.32]) by fmsmga101.fm.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 28 Aug 2019 07:33:39 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.64,441,1559545200"; d="scan'208";a="380419417" Received: from cmartin2-mobl3.ger.corp.intel.com (HELO delly.ger.corp.intel.com) ([10.252.38.61]) by fmsmga005.fm.intel.com with ESMTP; 28 Aug 2019 07:33:37 -0700 From: Lionel Landwerlin To: intel-gfx@lists.freedesktop.org Date: Wed, 28 Aug 2019 17:33:22 +0300 Message-Id: <20190828143327.7965-6-lionel.g.landwerlin@intel.com> X-Mailer: git-send-email 2.23.0 In-Reply-To: <20190828143327.7965-1-lionel.g.landwerlin@intel.com> References: <20190828143327.7965-1-lionel.g.landwerlin@intel.com> MIME-Version: 1.0 Subject: [Intel-gfx] [PATCH v11 05/10] drm/i915/perf: allow for CS OA configs to be created lazily X-BeenThere: intel-gfx@lists.freedesktop.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: Intel graphics driver community testing & development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: intel-gfx-bounces@lists.freedesktop.org Sender: "Intel-gfx" Here we introduce a mechanism by which the execbuf part of the i915 driver will be able to request that a batch buffer containing the programming for a particular OA config be created. We'll execute these OA configuration buffers right before executing a set of userspace commands so that a particular user batchbuffer be executed with a given OA configuration. This mechanism essentially allows the userspace driver to go through several OA configuration without having to open/close the i915/perf stream. v2: No need for locking on object OA config object creation (Chris) Flush cpu mapping of OA config (Chris) v3: Properly deal with the perf_metric lock (Chris/Lionel) v4: Fix oa config unref/put when not found (Lionel) v5: Allocate BOs for configurations on the stream instead of globally (Lionel) Signed-off-by: Lionel Landwerlin Reviewed-by: Chris Wilson (v4) --- drivers/gpu/drm/i915/gt/intel_gpu_commands.h | 1 + drivers/gpu/drm/i915/i915_drv.h | 14 +- drivers/gpu/drm/i915/i915_perf.c | 241 ++++++++++++++++--- drivers/gpu/drm/i915/i915_perf.h | 22 ++ 4 files changed, 237 insertions(+), 41 deletions(-) diff --git a/drivers/gpu/drm/i915/gt/intel_gpu_commands.h b/drivers/gpu/drm/i915/gt/intel_gpu_commands.h index 86e00a2db8a4..a7f1377a54a2 100644 --- a/drivers/gpu/drm/i915/gt/intel_gpu_commands.h +++ b/drivers/gpu/drm/i915/gt/intel_gpu_commands.h @@ -133,6 +133,7 @@ */ #define MI_LOAD_REGISTER_IMM(x) MI_INSTR(0x22, 2*(x)-1) #define MI_LRI_FORCE_POSTED (1<<12) +#define MI_LOAD_REGISTER_IMM_MAX_REGS (126) #define MI_STORE_REGISTER_MEM MI_INSTR(0x24, 1) #define MI_STORE_REGISTER_MEM_GEN8 MI_INSTR(0x24, 2) #define MI_SRM_LRM_GLOBAL_GTT (1<<22) diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h index b42651a387d9..d8a1e842fb48 100644 --- a/drivers/gpu/drm/i915/i915_drv.h +++ b/drivers/gpu/drm/i915/i915_drv.h @@ -989,6 +989,8 @@ struct i915_oa_reg { }; struct i915_oa_config { + struct drm_i915_private *i915; + char uuid[UUID_STRING_LEN + 1]; int id; @@ -1003,7 +1005,7 @@ struct i915_oa_config { struct attribute *attrs[2]; struct device_attribute sysfs_metric_id; - atomic_t ref_count; + struct kref ref; }; struct i915_perf_stream; @@ -1130,6 +1132,12 @@ struct i915_perf_stream { */ struct i915_oa_config *oa_config; + /** + * @oa_config_bos: A list of struct i915_oa_config_bo allocated lazily + * each time @oa_config changes. + */ + struct list_head oa_config_bos; + /** * The OA context specific information. */ @@ -1660,8 +1668,8 @@ struct drm_i915_private { struct mutex metrics_lock; /* - * List of dynamic configurations, you need to hold - * dev_priv->perf.metrics_lock to access it. + * List of dynamic configurations (struct i915_oa_config), you + * need to hold dev_priv->perf.metrics_lock to access it. */ struct idr metrics_idr; diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c index 47fb6f6f2065..0385abce7baa 100644 --- a/drivers/gpu/drm/i915/i915_perf.c +++ b/drivers/gpu/drm/i915/i915_perf.c @@ -365,11 +365,19 @@ struct perf_open_properties { int oa_period_exponent; }; +struct i915_oa_config_bo { + struct list_head link; + + struct i915_oa_config *oa_config; + struct drm_i915_gem_object *bo; +}; + static enum hrtimer_restart oa_poll_check_timer_cb(struct hrtimer *hrtimer); -static void free_oa_config(struct drm_i915_private *dev_priv, - struct i915_oa_config *oa_config) +void i915_oa_config_release(struct kref *ref) { + struct i915_oa_config *oa_config = container_of(ref, typeof(*oa_config), ref); + if (!PTR_ERR(oa_config->flex_regs)) kfree(oa_config->flex_regs); if (!PTR_ERR(oa_config->b_counter_regs)) @@ -379,40 +387,173 @@ static void free_oa_config(struct drm_i915_private *dev_priv, kfree(oa_config); } -static void put_oa_config(struct drm_i915_private *dev_priv, - struct i915_oa_config *oa_config) +static u32 *write_cs_mi_lri(u32 *cs, const struct i915_oa_reg *reg_data, u32 n_regs) { - if (!atomic_dec_and_test(&oa_config->ref_count)) - return; + u32 i; + + for (i = 0; i < n_regs; i++) { + if ((i % MI_LOAD_REGISTER_IMM_MAX_REGS) == 0) { + u32 n_lri = min(n_regs - i, + (u32) MI_LOAD_REGISTER_IMM_MAX_REGS); + + *cs++ = MI_LOAD_REGISTER_IMM(n_lri); + } + *cs++ = i915_mmio_reg_offset(reg_data[i].addr); + *cs++ = reg_data[i].value; + } - free_oa_config(dev_priv, oa_config); + return cs; } -static int get_oa_config(struct drm_i915_private *dev_priv, - int metrics_set, - struct i915_oa_config **out_config) +static struct i915_oa_config_bo* alloc_oa_config_buffer(struct drm_i915_private *i915, + struct i915_oa_config *oa_config) { - int ret; + struct i915_oa_config_bo *oa_bo; + size_t config_length = 0; + u32 *cs; + int err; + + oa_bo = kzalloc(sizeof(*oa_bo), GFP_KERNEL); + if (!oa_bo) + return ERR_PTR(-ENOMEM); + + oa_bo->oa_config = i915_oa_config_get(oa_config); + + if (oa_config->mux_regs_len > 0) { + config_length += DIV_ROUND_UP(oa_config->mux_regs_len, + MI_LOAD_REGISTER_IMM_MAX_REGS) * 4; + config_length += oa_config->mux_regs_len * 8; + } + if (oa_config->b_counter_regs_len > 0) { + config_length += DIV_ROUND_UP(oa_config->b_counter_regs_len, + MI_LOAD_REGISTER_IMM_MAX_REGS) * 4; + config_length += oa_config->b_counter_regs_len * 8; + } + if (oa_config->flex_regs_len > 0) { + config_length += DIV_ROUND_UP(oa_config->flex_regs_len, + MI_LOAD_REGISTER_IMM_MAX_REGS) * 4; + config_length += oa_config->flex_regs_len * 8; + } + config_length += 4; /* MI_BATCH_BUFFER_END */ + config_length = ALIGN(config_length, I915_GTT_PAGE_SIZE); + + oa_bo->bo = i915_gem_object_create_shmem(i915, config_length); + if (IS_ERR(oa_bo->bo)) { + err = PTR_ERR(oa_bo->bo); + goto err_oa_config; + } + + cs = i915_gem_object_pin_map(oa_bo->bo, I915_MAP_WB); + if (IS_ERR(cs)) { + err = PTR_ERR(cs); + goto err_oa_bo; + } + + cs = write_cs_mi_lri(cs, oa_config->mux_regs, oa_config->mux_regs_len); + cs = write_cs_mi_lri(cs, oa_config->b_counter_regs, oa_config->b_counter_regs_len); + cs = write_cs_mi_lri(cs, oa_config->flex_regs, oa_config->flex_regs_len); + + *cs++ = MI_BATCH_BUFFER_END; + + i915_gem_object_flush_map(oa_bo->bo); + i915_gem_object_unpin_map(oa_bo->bo); + + return oa_bo; + +err_oa_bo: + i915_gem_object_put(oa_bo->bo); +err_oa_config: + i915_oa_config_put(oa_bo->oa_config); + kfree(oa_bo); + + return ERR_PTR(err); +} + +int i915_perf_get_oa_config(struct drm_i915_private *i915, + int metrics_set, + struct i915_oa_config **out_config) +{ + struct i915_oa_config *oa_config; + int err; + + if (!i915->perf.initialized) + return -ENODEV; + + err = mutex_lock_interruptible(&i915->perf.metrics_lock); + if (err) + return err; if (metrics_set == 1) { - *out_config = &dev_priv->perf.test_config; - atomic_inc(&dev_priv->perf.test_config.ref_count); - return 0; + oa_config = &i915->perf.test_config; + } else { + oa_config = idr_find(&i915->perf.metrics_idr, metrics_set); + if (!oa_config) + err = -EINVAL; } - ret = mutex_lock_interruptible(&dev_priv->perf.metrics_lock); - if (ret) - return ret; + if (!err) + *out_config = i915_oa_config_get(oa_config); - *out_config = idr_find(&dev_priv->perf.metrics_idr, metrics_set); - if (!*out_config) - ret = -EINVAL; - else - atomic_inc(&(*out_config)->ref_count); + mutex_unlock(&i915->perf.metrics_lock); - mutex_unlock(&dev_priv->perf.metrics_lock); + return err; +} - return ret; +int i915_perf_get_oa_config_and_bo(struct i915_perf_stream *stream, + int metrics_set, + struct i915_oa_config **out_config, + struct drm_i915_gem_object **out_obj) +{ + struct drm_i915_private *i915 = stream->dev_priv; + struct i915_oa_config *oa_config; + int err = 0; + + if (!i915->perf.initialized) + return -ENODEV; + + err = i915_perf_get_oa_config(i915, metrics_set, &oa_config); + if (err) + return err; + + if (out_config) + *out_config = oa_config; + + if (out_obj) { + struct i915_oa_config_bo *oa_bo = NULL, *oa_bo_iter; + + /* Look for the buffer in the already allocated BOs attached + * to the stream. + */ + list_for_each_entry(oa_bo_iter, &stream->oa_config_bos, link) { + if (oa_bo_iter->oa_config == oa_config && + memcmp(oa_bo_iter->oa_config->uuid, + oa_config->uuid, + sizeof(oa_config->uuid)) == 0) { + oa_bo = oa_bo_iter; + break; + } + } + + if (!oa_bo) { + oa_bo = alloc_oa_config_buffer(i915, oa_config); + if (IS_ERR(oa_bo)) { + err = PTR_ERR(oa_bo); + goto err; + } + + list_add(&oa_bo->link, &stream->oa_config_bos); + } + + *out_obj = i915_gem_object_get(oa_bo->bo); + } + +err: + if (err) { + i915_oa_config_put(oa_config); + *out_config = NULL; + } + + return err; } static u32 gen8_oa_hw_tail_read(struct i915_perf_stream *stream) @@ -1360,6 +1501,19 @@ free_oa_buffer(struct i915_perf_stream *stream) stream->oa_buffer.vaddr = NULL; } +static void +free_oa_configs(struct i915_perf_stream *stream) +{ + struct i915_oa_config_bo *oa_bo, *tmp; + + i915_oa_config_put(stream->oa_config); + list_for_each_entry_safe(oa_bo, tmp, &stream->oa_config_bos, link) { + list_del(&oa_bo->link); + i915_oa_config_put(oa_bo->oa_config); + i915_gem_object_put(oa_bo->bo); + } +} + static void i915_oa_stream_destroy(struct i915_perf_stream *stream) { struct drm_i915_private *dev_priv = stream->dev_priv; @@ -1383,7 +1537,7 @@ static void i915_oa_stream_destroy(struct i915_perf_stream *stream) if (stream->ctx) oa_put_render_ctx_id(stream); - put_oa_config(dev_priv, stream->oa_config); + free_oa_configs(stream); if (dev_priv->perf.spurious_report_rs.missed) { DRM_NOTE("%d spurious OA report notices suppressed due to ratelimiting\n", @@ -2230,7 +2384,8 @@ static int i915_oa_stream_init(struct i915_perf_stream *stream, } } - ret = get_oa_config(dev_priv, props->metrics_set, &stream->oa_config); + ret = i915_perf_get_oa_config(dev_priv, props->metrics_set, + &stream->oa_config); if (ret) { DRM_DEBUG("Invalid OA config id=%i\n", props->metrics_set); goto err_config; @@ -2268,6 +2423,8 @@ static int i915_oa_stream_init(struct i915_perf_stream *stream, goto err_enable; } + DRM_DEBUG("opening stream oa config uuid=%s\n", stream->oa_config->uuid); + mutex_unlock(&dev_priv->drm.struct_mutex); hrtimer_init(&stream->poll_check_timer, @@ -2287,7 +2444,7 @@ static int i915_oa_stream_init(struct i915_perf_stream *stream, free_oa_buffer(stream); err_oa_buf_alloc: - put_oa_config(dev_priv, stream->oa_config); + i915_oa_config_put(stream->oa_config); intel_uncore_forcewake_put(&dev_priv->uncore, FORCEWAKE_ALL); intel_runtime_pm_put(&dev_priv->runtime_pm, stream->wakeref); @@ -2655,7 +2812,9 @@ static int i915_perf_release(struct inode *inode, struct file *file) struct drm_i915_private *dev_priv = stream->dev_priv; mutex_lock(&dev_priv->perf.lock); + i915_perf_destroy_locked(stream); + mutex_unlock(&dev_priv->perf.lock); /* Release the reference the perf stream kept on the driver. */ @@ -2764,6 +2923,7 @@ i915_perf_open_ioctl_locked(struct drm_i915_private *dev_priv, goto err_ctx; } + INIT_LIST_HEAD(&stream->oa_config_bos); stream->dev_priv = dev_priv; stream->ctx = specific_ctx; @@ -3091,7 +3251,8 @@ void i915_perf_register(struct drm_i915_private *dev_priv) if (ret) goto sysfs_error; - atomic_set(&dev_priv->perf.test_config.ref_count, 1); + dev_priv->perf.test_config.i915 = dev_priv; + kref_init(&dev_priv->perf.test_config.ref); goto exit; @@ -3347,7 +3508,8 @@ int i915_perf_add_config_ioctl(struct drm_device *dev, void *data, return -ENOMEM; } - atomic_set(&oa_config->ref_count, 1); + oa_config->i915 = dev_priv; + kref_init(&oa_config->ref); if (!uuid_is_valid(args->uuid)) { DRM_DEBUG("Invalid uuid format for OA config\n"); @@ -3446,7 +3608,7 @@ int i915_perf_add_config_ioctl(struct drm_device *dev, void *data, sysfs_err: mutex_unlock(&dev_priv->perf.metrics_lock); reg_err: - put_oa_config(dev_priv, oa_config); + i915_oa_config_put(oa_config); DRM_DEBUG("Failed to add new OA config\n"); return err; } @@ -3482,13 +3644,13 @@ int i915_perf_remove_config_ioctl(struct drm_device *dev, void *data, ret = mutex_lock_interruptible(&dev_priv->perf.metrics_lock); if (ret) - goto lock_err; + return ret; oa_config = idr_find(&dev_priv->perf.metrics_idr, *arg); if (!oa_config) { DRM_DEBUG("Failed to remove unknown OA config\n"); ret = -ENOENT; - goto config_err; + goto err_unlock; } GEM_BUG_ON(*arg != oa_config->id); @@ -3498,13 +3660,16 @@ int i915_perf_remove_config_ioctl(struct drm_device *dev, void *data, idr_remove(&dev_priv->perf.metrics_idr, *arg); + mutex_unlock(&dev_priv->perf.metrics_lock); + DRM_DEBUG("Removed config %s id=%i\n", oa_config->uuid, oa_config->id); - put_oa_config(dev_priv, oa_config); + i915_oa_config_put(oa_config); -config_err: + return 0; + +err_unlock: mutex_unlock(&dev_priv->perf.metrics_lock); -lock_err: return ret; } @@ -3641,6 +3806,7 @@ void i915_perf_init(struct drm_i915_private *dev_priv) if (dev_priv->perf.ops.enable_metric_set) { INIT_LIST_HEAD(&dev_priv->perf.streams); + mutex_init(&dev_priv->perf.lock); oa_sample_rate_hard_limit = 1000 * @@ -3675,10 +3841,9 @@ void i915_perf_init(struct drm_i915_private *dev_priv) static int destroy_config(int id, void *p, void *data) { - struct drm_i915_private *dev_priv = data; struct i915_oa_config *oa_config = p; - put_oa_config(dev_priv, oa_config); + i915_oa_config_put(oa_config); return 0; } @@ -3692,7 +3857,7 @@ void i915_perf_fini(struct drm_i915_private *dev_priv) if (!dev_priv->perf.initialized) return; - idr_for_each(&dev_priv->perf.metrics_idr, destroy_config, dev_priv); + idr_for_each(&dev_priv->perf.metrics_idr, destroy_config, NULL); idr_destroy(&dev_priv->perf.metrics_idr); unregister_sysctl_table(dev_priv->perf.sysctl_header); diff --git a/drivers/gpu/drm/i915/i915_perf.h b/drivers/gpu/drm/i915/i915_perf.h index 95549de65212..d62980c49d42 100644 --- a/drivers/gpu/drm/i915/i915_perf.h +++ b/drivers/gpu/drm/i915/i915_perf.h @@ -29,5 +29,27 @@ int i915_perf_remove_config_ioctl(struct drm_device *dev, void *data, void i915_oa_init_reg_state(struct intel_engine_cs *engine, struct intel_context *ce, u32 *reg_state); +int i915_perf_get_oa_config(struct drm_i915_private *i915, + int metrics_set, + struct i915_oa_config **out_config); +int i915_perf_get_oa_config_and_bo(struct i915_perf_stream *stream, + int metrics_set, + struct i915_oa_config **out_config, + struct drm_i915_gem_object **out_obj); +void i915_oa_config_release(struct kref *ref); + +static inline struct i915_oa_config *i915_oa_config_get(struct i915_oa_config *oa_config) +{ + kref_get(&oa_config->ref); + return oa_config; +} + +static inline void i915_oa_config_put(struct i915_oa_config *oa_config) +{ + if (!oa_config) + return; + + kref_put(&oa_config->ref, i915_oa_config_release); +} #endif /* __I915_PERF_H__ */ From patchwork Wed Aug 28 14:33:23 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Lionel Landwerlin X-Patchwork-Id: 11119283 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 1FAB11395 for ; Wed, 28 Aug 2019 14:34:01 +0000 (UTC) Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 085032080F for ; Wed, 28 Aug 2019 14:34:01 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 085032080F Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=intel.com Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=intel-gfx-bounces@lists.freedesktop.org Received: from gabe.freedesktop.org (localhost [127.0.0.1]) by gabe.freedesktop.org (Postfix) with ESMTP id F1D9289CE2; Wed, 28 Aug 2019 14:33:55 +0000 (UTC) X-Original-To: intel-gfx@lists.freedesktop.org Delivered-To: intel-gfx@lists.freedesktop.org Received: from mga01.intel.com (mga01.intel.com [192.55.52.88]) by gabe.freedesktop.org (Postfix) with ESMTPS id 9470B89CD8 for ; Wed, 28 Aug 2019 14:33:47 +0000 (UTC) X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from fmsmga005.fm.intel.com ([10.253.24.32]) by fmsmga101.fm.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 28 Aug 2019 07:33:40 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.64,441,1559545200"; d="scan'208";a="380419421" Received: from cmartin2-mobl3.ger.corp.intel.com (HELO delly.ger.corp.intel.com) ([10.252.38.61]) by fmsmga005.fm.intel.com with ESMTP; 28 Aug 2019 07:33:39 -0700 From: Lionel Landwerlin To: intel-gfx@lists.freedesktop.org Date: Wed, 28 Aug 2019 17:33:23 +0300 Message-Id: <20190828143327.7965-7-lionel.g.landwerlin@intel.com> X-Mailer: git-send-email 2.23.0 In-Reply-To: <20190828143327.7965-1-lionel.g.landwerlin@intel.com> References: <20190828143327.7965-1-lionel.g.landwerlin@intel.com> MIME-Version: 1.0 Subject: [Intel-gfx] [PATCH v11 06/10] drm/i915/perf: implement active wait for noa configurations X-BeenThere: intel-gfx@lists.freedesktop.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: Intel graphics driver community testing & development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: intel-gfx-bounces@lists.freedesktop.org Sender: "Intel-gfx" NOA configuration take some amount of time to apply. That amount of time depends on the size of the GT. There is no documented time for this. For example, past experimentations with powergating configuration changes seem to indicate a 60~70us delay. We go with 500us as default for now which should be over the required amount of time (according to HW architects). v2: Don't forget to save/restore registers used for the wait (Chris) v3: Name used CS_GPR registers (Chris) Fix compile issue due to rebase (Lionel) v4: Fix save/restore helpers (Umesh) v5: Move noa_wait from drm_i915_private to i915_perf_stream (Lionel) Signed-off-by: Lionel Landwerlin Reviewed-by: Chris Wilson (v4) --- drivers/gpu/drm/i915/gt/intel_gpu_commands.h | 24 ++ drivers/gpu/drm/i915/gt/intel_gt_types.h | 5 + drivers/gpu/drm/i915/i915_debugfs.c | 31 +++ drivers/gpu/drm/i915/i915_drv.h | 8 + drivers/gpu/drm/i915/i915_perf.c | 234 ++++++++++++++++++- drivers/gpu/drm/i915/i915_reg.h | 4 +- 6 files changed, 302 insertions(+), 4 deletions(-) diff --git a/drivers/gpu/drm/i915/gt/intel_gpu_commands.h b/drivers/gpu/drm/i915/gt/intel_gpu_commands.h index a7f1377a54a2..f133f8dbacb1 100644 --- a/drivers/gpu/drm/i915/gt/intel_gpu_commands.h +++ b/drivers/gpu/drm/i915/gt/intel_gpu_commands.h @@ -158,6 +158,7 @@ #define MI_BATCH_GTT (2<<6) /* aliased with (1<<7) on gen4 */ #define MI_BATCH_BUFFER_START_GEN8 MI_INSTR(0x31, 1) #define MI_BATCH_RESOURCE_STREAMER (1<<10) +#define MI_BATCH_PREDICATE (1 << 15) /* HSW+ on RCS only*/ /* * 3D instructions used by the kernel @@ -236,6 +237,29 @@ #define PIPE_CONTROL_DEPTH_CACHE_FLUSH (1<<0) #define PIPE_CONTROL_GLOBAL_GTT (1<<2) /* in addr dword */ +#define MI_MATH(x) MI_INSTR(0x1a, (x)-1) +#define MI_ALU_OP(op, src1, src2) (((op) << 20) | ((src1) << 10) | (src2)) +/* operands */ +#define MI_ALU_OP_NOOP 0 +#define MI_ALU_OP_LOAD 128 +#define MI_ALU_OP_LOADINV 1152 +#define MI_ALU_OP_LOAD0 129 +#define MI_ALU_OP_LOAD1 1153 +#define MI_ALU_OP_ADD 256 +#define MI_ALU_OP_SUB 257 +#define MI_ALU_OP_AND 258 +#define MI_ALU_OP_OR 259 +#define MI_ALU_OP_XOR 260 +#define MI_ALU_OP_STORE 384 +#define MI_ALU_OP_STOREINV 1408 +/* sources */ +#define MI_ALU_SRC_REG(x) (x) /* 0 -> 15 */ +#define MI_ALU_SRC_SRCA 32 +#define MI_ALU_SRC_SRCB 33 +#define MI_ALU_SRC_ACCU 49 +#define MI_ALU_SRC_ZF 50 +#define MI_ALU_SRC_CF 51 + /* * Commands used only by the command parser */ diff --git a/drivers/gpu/drm/i915/gt/intel_gt_types.h b/drivers/gpu/drm/i915/gt/intel_gt_types.h index dc295c196d11..f752b6cf9ea1 100644 --- a/drivers/gpu/drm/i915/gt/intel_gt_types.h +++ b/drivers/gpu/drm/i915/gt/intel_gt_types.h @@ -97,6 +97,11 @@ enum intel_gt_scratch_field { /* 8 bytes */ INTEL_GT_SCRATCH_FIELD_COHERENTL3_WA = 256, + /* 6 * 8 bytes */ + INTEL_GT_SCRATCH_FIELD_PERF_CS_GPR = 2048, + + /* 4 bytes */ + INTEL_GT_SCRATCH_FIELD_PERF_PREDICATE_RESULT_1 = 2096, }; #endif /* __INTEL_GT_TYPES_H__ */ diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c index 5c1a2b1e7d34..5633efc22070 100644 --- a/drivers/gpu/drm/i915/i915_debugfs.c +++ b/drivers/gpu/drm/i915/i915_debugfs.c @@ -3578,6 +3578,36 @@ DEFINE_SIMPLE_ATTRIBUTE(i915_wedged_fops, i915_wedged_get, i915_wedged_set, "%llu\n"); +static int +i915_perf_noa_delay_set(void *data, u64 val) +{ + struct drm_i915_private *i915 = data; + + /* This would lead to infinite waits as we're doing timestamp + * difference on the CS with only 32bits. + */ + if (val > ((1ul << 32) - 1) * RUNTIME_INFO(i915)->cs_timestamp_frequency_khz) + return -EINVAL; + + atomic64_set(&i915->perf.noa_programming_delay, val); + return 0; +} + +static int +i915_perf_noa_delay_get(void *data, u64 *val) +{ + struct drm_i915_private *i915 = data; + + *val = atomic64_read(&i915->perf.noa_programming_delay); + return 0; +} + +DEFINE_SIMPLE_ATTRIBUTE(i915_perf_noa_delay_fops, + i915_perf_noa_delay_get, + i915_perf_noa_delay_set, + "%llu\n"); + + #define DROP_UNBOUND BIT(0) #define DROP_BOUND BIT(1) #define DROP_RETIRE BIT(2) @@ -4354,6 +4384,7 @@ static const struct i915_debugfs_files { const char *name; const struct file_operations *fops; } i915_debugfs_files[] = { + {"i915_perf_noa_delay", &i915_perf_noa_delay_fops}, {"i915_wedged", &i915_wedged_fops}, {"i915_cache_sharing", &i915_cache_sharing_fops}, {"i915_gem_drop_caches", &i915_drop_caches_fops}, diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h index d8a1e842fb48..c452bf6872e7 100644 --- a/drivers/gpu/drm/i915/i915_drv.h +++ b/drivers/gpu/drm/i915/i915_drv.h @@ -1217,6 +1217,12 @@ struct i915_perf_stream { */ u32 head; } oa_buffer; + + /** + * A batch buffer doing a wait on the GPU for the NOA logic to be + * reprogrammed. + */ + struct i915_vma *noa_wait; }; /** @@ -1709,6 +1715,8 @@ struct drm_i915_private { struct i915_oa_ops ops; const struct i915_oa_format *oa_formats; + + atomic64_t noa_programming_delay; } perf; /* Abstract the submission mechanism (legacy ringbuffer or execlists) away */ diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c index 0385abce7baa..dc3f3170764b 100644 --- a/drivers/gpu/drm/i915/i915_perf.c +++ b/drivers/gpu/drm/i915/i915_perf.c @@ -197,6 +197,7 @@ #include "gem/i915_gem_context.h" #include "gem/i915_gem_pm.h" +#include "gt/intel_gt.h" #include "gt/intel_lrc_reg.h" #include "i915_drv.h" @@ -406,6 +407,7 @@ static u32 *write_cs_mi_lri(u32 *cs, const struct i915_oa_reg *reg_data, u32 n_r } static struct i915_oa_config_bo* alloc_oa_config_buffer(struct drm_i915_private *i915, + struct i915_vma *noa_wait, struct i915_oa_config *oa_config) { struct i915_oa_config_bo *oa_bo; @@ -434,7 +436,7 @@ static struct i915_oa_config_bo* alloc_oa_config_buffer(struct drm_i915_private MI_LOAD_REGISTER_IMM_MAX_REGS) * 4; config_length += oa_config->flex_regs_len * 8; } - config_length += 4; /* MI_BATCH_BUFFER_END */ + config_length += 12; /* MI_BATCH_BUFFER_START into noa_wait loop */ config_length = ALIGN(config_length, I915_GTT_PAGE_SIZE); oa_bo->bo = i915_gem_object_create_shmem(i915, config_length); @@ -453,7 +455,12 @@ static struct i915_oa_config_bo* alloc_oa_config_buffer(struct drm_i915_private cs = write_cs_mi_lri(cs, oa_config->b_counter_regs, oa_config->b_counter_regs_len); cs = write_cs_mi_lri(cs, oa_config->flex_regs, oa_config->flex_regs_len); - *cs++ = MI_BATCH_BUFFER_END; + + /* Jump into the NOA wait busy loop. */ + *cs++ = (INTEL_GEN(i915) < 8 ? + MI_BATCH_BUFFER_START : MI_BATCH_BUFFER_START_GEN8); + *cs++ = i915_ggtt_offset(noa_wait); + *cs++ = 0; i915_gem_object_flush_map(oa_bo->bo); i915_gem_object_unpin_map(oa_bo->bo); @@ -535,7 +542,9 @@ int i915_perf_get_oa_config_and_bo(struct i915_perf_stream *stream, } if (!oa_bo) { - oa_bo = alloc_oa_config_buffer(i915, oa_config); + oa_bo = alloc_oa_config_buffer(i915, + stream->noa_wait, + oa_config); if (IS_ERR(oa_bo)) { err = PTR_ERR(oa_bo); goto err; @@ -1501,6 +1510,16 @@ free_oa_buffer(struct i915_perf_stream *stream) stream->oa_buffer.vaddr = NULL; } +static void +free_noa_wait(struct i915_perf_stream *stream) +{ + struct drm_i915_private *i915 = stream->dev_priv; + + mutex_lock(&i915->drm.struct_mutex); + i915_vma_unpin_and_release(&stream->noa_wait, 0); + mutex_unlock(&i915->drm.struct_mutex); +} + static void free_oa_configs(struct i915_perf_stream *stream) { @@ -1530,6 +1549,7 @@ static void i915_oa_stream_destroy(struct i915_perf_stream *stream) mutex_unlock(&dev_priv->drm.struct_mutex); free_oa_buffer(stream); + free_noa_wait(stream); intel_uncore_forcewake_put(&dev_priv->uncore, FORCEWAKE_ALL); intel_runtime_pm_put(&dev_priv->runtime_pm, stream->wakeref); @@ -1716,6 +1736,202 @@ static int alloc_oa_buffer(struct i915_perf_stream *stream) return ret; } +static u32 *save_restore_register(struct drm_i915_private *i915, u32 *cs, + bool save, i915_reg_t reg, u32 offset, + u32 dword_count) +{ + uint32_t d; + + for (d = 0; d < dword_count; d++) { + if (save) { + *cs++ = INTEL_GEN(i915) >= 8 ? + MI_STORE_REGISTER_MEM_GEN8 : + MI_STORE_REGISTER_MEM; + } else { + *cs++ = INTEL_GEN(i915) >= 8 ? + MI_LOAD_REGISTER_MEM_GEN8 : + MI_LOAD_REGISTER_MEM; + } + *cs++ = i915_mmio_reg_offset(reg) + 4 * d; + *cs++ = intel_gt_scratch_offset(&i915->gt, offset) + 4 * d; + *cs++ = 0; + } + + return cs; +} + +static int alloc_noa_wait(struct i915_perf_stream *stream) +{ + struct drm_i915_private *i915 = stream->dev_priv; + struct drm_i915_gem_object *bo; + struct i915_vma *vma; + const u64 delay_ticks = 0xffffffffffffffff - + DIV64_U64_ROUND_UP( + atomic64_read(&i915->perf.noa_programming_delay) * + RUNTIME_INFO(i915)->cs_timestamp_frequency_khz, + 1000000ull); + u32 *batch, *ts0, *cs, *jump; + int ret, i; + enum { START_TS, NOW_TS, DELTA_TS, JUMP_PREDICATE, DELTA_TARGET, N_CS_GPR }; + + bo = i915_gem_object_create_internal(i915, 4096); + if (IS_ERR(bo)) { + DRM_ERROR("Failed to allocate NOA wait batchbuffer\n"); + return PTR_ERR(bo); + } + + /* + * We pin in GGTT because we jump into this buffer now because + * multiple OA config BOs will have a jump to this address and it + * needs to be fixed during the lifetime of the i915/perf stream. + */ + vma = i915_gem_object_ggtt_pin(bo, NULL, 0, 4096, 0); + if (IS_ERR(vma)) { + ret = PTR_ERR(vma); + goto err_unref; + } + + batch = cs = i915_gem_object_pin_map(bo, I915_MAP_WB); + if (IS_ERR(batch)) { + ret = PTR_ERR(batch); + goto err_unpin; + } + + /* Save registers. */ + for (i = 0; i < N_CS_GPR; i++) { + cs = save_restore_register( + i915, cs, true /* save */, HSW_CS_GPR(i), + INTEL_GT_SCRATCH_FIELD_PERF_CS_GPR + 8 * i, 2); + } + cs = save_restore_register( + i915, cs, true /* save */, MI_PREDICATE_RESULT_1, + INTEL_GT_SCRATCH_FIELD_PERF_PREDICATE_RESULT_1, 1); + + /* First timestamp snapshot location. */ + ts0 = cs; + + /* + * Initial snapshot of the timestamp register to implement the wait. + * We work with 32b values, so clear out the top 32b bits of the + * register because the ALU works 64bits. + */ + *cs++ = MI_LOAD_REGISTER_IMM(1); + *cs++ = i915_mmio_reg_offset(HSW_CS_GPR(START_TS)) + 4; + *cs++ = 0; + *cs++ = MI_LOAD_REGISTER_REG | (3 - 2); + *cs++ = i915_mmio_reg_offset(RING_TIMESTAMP(RENDER_RING_BASE)); + *cs++ = i915_mmio_reg_offset(HSW_CS_GPR(START_TS)); + + /* + * This is the location we're going to jump back into until the + * required amount of time has passed. + */ + jump = cs; + + /* + * Take another snapshot of the timestamp register. Take care to clear + * up the top 32bits of CS_GPR(1) as we're using it for other + * operations below. + */ + *cs++ = MI_LOAD_REGISTER_IMM(1); + *cs++ = i915_mmio_reg_offset(HSW_CS_GPR(NOW_TS)) + 4; + *cs++ = 0; + *cs++ = MI_LOAD_REGISTER_REG | (3 - 2); + *cs++ = i915_mmio_reg_offset(RING_TIMESTAMP(RENDER_RING_BASE)); + *cs++ = i915_mmio_reg_offset(HSW_CS_GPR(NOW_TS)); + + /* + * Do a diff between the 2 timestamps and store the result back into + * CS_GPR(1). + */ + *cs++ = MI_MATH(5); + *cs++ = MI_ALU_OP(MI_ALU_OP_LOAD, MI_ALU_SRC_SRCA, MI_ALU_SRC_REG(NOW_TS)); + *cs++ = MI_ALU_OP(MI_ALU_OP_LOAD, MI_ALU_SRC_SRCB, MI_ALU_SRC_REG(START_TS)); + *cs++ = MI_ALU_OP(MI_ALU_OP_SUB, 0, 0); + *cs++ = MI_ALU_OP(MI_ALU_OP_STORE, MI_ALU_SRC_REG(DELTA_TS), MI_ALU_SRC_ACCU); + *cs++ = MI_ALU_OP(MI_ALU_OP_STORE, MI_ALU_SRC_REG(JUMP_PREDICATE), MI_ALU_SRC_CF); + + /* + * Transfer the carry flag (set to 1 if ts1 < ts0, meaning the + * timestamp have rolled over the 32bits) into the predicate register + * to be used for the predicated jump. + */ + *cs++ = MI_LOAD_REGISTER_REG | (3 - 2); + *cs++ = i915_mmio_reg_offset(HSW_CS_GPR(JUMP_PREDICATE)); + *cs++ = i915_mmio_reg_offset(MI_PREDICATE_RESULT_1); + + /* Restart from the beginning if we had timestamps roll over. */ + *cs++ = (INTEL_GEN(i915) < 8 ? + MI_BATCH_BUFFER_START : MI_BATCH_BUFFER_START_GEN8) | + MI_BATCH_PREDICATE; + *cs++ = i915_ggtt_offset(vma) + (ts0 - batch) * 4; + *cs++ = 0; + + /* + * Now add the diff between to previous timestamps and add it to : + * (((1 * << 64) - 1) - delay_ns) + * + * When the Carry Flag contains 1 this means the elapsed time is + * longer than the expected delay, and we can exit the wait loop. + */ + *cs++ = MI_LOAD_REGISTER_IMM(2); + *cs++ = i915_mmio_reg_offset(HSW_CS_GPR(DELTA_TARGET)); + *cs++ = lower_32_bits(delay_ticks); + *cs++ = i915_mmio_reg_offset(HSW_CS_GPR(DELTA_TARGET)) + 4; + *cs++ = upper_32_bits(delay_ticks); + + *cs++ = MI_MATH(4); + *cs++ = MI_ALU_OP(MI_ALU_OP_LOAD, MI_ALU_SRC_SRCA, MI_ALU_SRC_REG(DELTA_TS)); + *cs++ = MI_ALU_OP(MI_ALU_OP_LOAD, MI_ALU_SRC_SRCB, MI_ALU_SRC_REG(DELTA_TARGET)); + *cs++ = MI_ALU_OP(MI_ALU_OP_ADD, 0, 0); + *cs++ = MI_ALU_OP(MI_ALU_OP_STOREINV, MI_ALU_SRC_REG(JUMP_PREDICATE), MI_ALU_SRC_CF); + + /* + * Transfer the result into the predicate register to be used for the + * predicated jump. + */ + *cs++ = MI_LOAD_REGISTER_REG | (3 - 2); + *cs++ = i915_mmio_reg_offset(HSW_CS_GPR(JUMP_PREDICATE)); + *cs++ = i915_mmio_reg_offset(MI_PREDICATE_RESULT_1); + + /* Predicate the jump. */ + *cs++ = (INTEL_GEN(i915) < 8 ? + MI_BATCH_BUFFER_START : MI_BATCH_BUFFER_START_GEN8) | + MI_BATCH_PREDICATE; + *cs++ = i915_ggtt_offset(vma) + (jump - batch) * 4; + *cs++ = 0; + + /* Restore registers. */ + for (i = 0; i < N_CS_GPR; i++) { + cs = save_restore_register( + i915, cs, false /* save */, HSW_CS_GPR(i), + INTEL_GT_SCRATCH_FIELD_PERF_CS_GPR + 8 * i, 2); + } + cs = save_restore_register( + i915, cs, false /* save */, MI_PREDICATE_RESULT_1, + INTEL_GT_SCRATCH_FIELD_PERF_PREDICATE_RESULT_1, 1); + + /* And return to the ring. */ + *cs++ = MI_BATCH_BUFFER_END; + + GEM_BUG_ON((cs - batch) > (PAGE_SIZE / sizeof(*batch))); + + i915_gem_object_flush_map(bo); + i915_gem_object_unpin_map(bo); + + stream->noa_wait = vma; + + return 0; + +err_unpin: + __i915_vma_unpin(vma); + +err_unref: + i915_gem_object_put(bo); + + return ret; +} + static void config_oa_regs(struct drm_i915_private *dev_priv, const struct i915_oa_reg *regs, u32 n_regs) @@ -2384,6 +2600,12 @@ static int i915_oa_stream_init(struct i915_perf_stream *stream, } } + ret = alloc_noa_wait(stream); + if (ret) { + DRM_DEBUG("Unable to allocate NOA wait batch buffer\n"); + goto err_noa_wait_alloc; + } + ret = i915_perf_get_oa_config(dev_priv, props->metrics_set, &stream->oa_config); if (ret) { @@ -2450,6 +2672,9 @@ static int i915_oa_stream_init(struct i915_perf_stream *stream, intel_runtime_pm_put(&dev_priv->runtime_pm, stream->wakeref); err_config: + free_noa_wait(stream); + +err_noa_wait_alloc: if (stream->ctx) oa_put_render_ctx_id(stream); @@ -3835,6 +4060,9 @@ void i915_perf_init(struct drm_i915_private *dev_priv) ratelimit_set_flags(&dev_priv->perf.spurious_report_rs, RATELIMIT_MSG_ON_RELEASE); + atomic64_set(&dev_priv->perf.noa_programming_delay, + 500 * 1000 /* 500us */); + dev_priv->perf.initialized = true; } } diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h index 02e1ef10c47e..cac043bc9ff6 100644 --- a/drivers/gpu/drm/i915/i915_reg.h +++ b/drivers/gpu/drm/i915/i915_reg.h @@ -545,7 +545,9 @@ static inline bool i915_mmio_reg_valid(i915_reg_t reg) #define MI_PREDICATE_SRC0_UDW _MMIO(0x2400 + 4) #define MI_PREDICATE_SRC1 _MMIO(0x2408) #define MI_PREDICATE_SRC1_UDW _MMIO(0x2408 + 4) - +#define MI_PREDICATE_DATA _MMIO(0x2410) +#define MI_PREDICATE_RESULT _MMIO(0x2418) +#define MI_PREDICATE_RESULT_1 _MMIO(0x241c) #define MI_PREDICATE_RESULT_2 _MMIO(0x2214) #define LOWER_SLICE_ENABLED (1 << 0) #define LOWER_SLICE_DISABLED (0 << 0) From patchwork Wed Aug 28 14:33:24 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Lionel Landwerlin X-Patchwork-Id: 11119279 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id A8AEA112C for ; Wed, 28 Aug 2019 14:33:58 +0000 (UTC) Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 913E52080F for ; Wed, 28 Aug 2019 14:33:58 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 913E52080F Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=intel.com Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=intel-gfx-bounces@lists.freedesktop.org Received: from gabe.freedesktop.org (localhost [127.0.0.1]) by gabe.freedesktop.org (Postfix) with ESMTP id EFACE89CDD; Wed, 28 Aug 2019 14:33:54 +0000 (UTC) X-Original-To: intel-gfx@lists.freedesktop.org Delivered-To: intel-gfx@lists.freedesktop.org Received: from mga01.intel.com (mga01.intel.com [192.55.52.88]) by gabe.freedesktop.org (Postfix) with ESMTPS id B3D1589CD9 for ; Wed, 28 Aug 2019 14:33:47 +0000 (UTC) X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from fmsmga005.fm.intel.com ([10.253.24.32]) by fmsmga101.fm.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 28 Aug 2019 07:33:41 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.64,441,1559545200"; d="scan'208";a="380419430" Received: from cmartin2-mobl3.ger.corp.intel.com (HELO delly.ger.corp.intel.com) ([10.252.38.61]) by fmsmga005.fm.intel.com with ESMTP; 28 Aug 2019 07:33:40 -0700 From: Lionel Landwerlin To: intel-gfx@lists.freedesktop.org Date: Wed, 28 Aug 2019 17:33:24 +0300 Message-Id: <20190828143327.7965-8-lionel.g.landwerlin@intel.com> X-Mailer: git-send-email 2.23.0 In-Reply-To: <20190828143327.7965-1-lionel.g.landwerlin@intel.com> References: <20190828143327.7965-1-lionel.g.landwerlin@intel.com> MIME-Version: 1.0 Subject: [Intel-gfx] [PATCH v11 07/10] drm/i915: add a new perf configuration execbuf parameter X-BeenThere: intel-gfx@lists.freedesktop.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: Intel graphics driver community testing & development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: intel-gfx-bounces@lists.freedesktop.org Sender: "Intel-gfx" We want the ability to dispatch a set of command buffer to the hardware, each with a different OA configuration. To achieve this, we reuse a couple of fields from the execbuf2 struct (I CAN HAZ execbuf3?) to notify what OA configuration should be used for a batch buffer. This requires the process making the execbuf with this flag to also own the perf fd at the time of execbuf. v2: Add a emit_oa_config() vfunc in the intel_engine_cs (Chris) Move oa_config vma to active (Chris) v3: Don't drop the lock for engine lookup (Chris) Move OA config vma to active before writing the ringbuffer (Chris) v4: Reuse i915_user_extension_fn Serialize requests with OA config updates v5: Check that the chained extension is only present once (Chris) Unpin oa_vma in main path (Chris) v6: Use BIT_ULL (Chris) v7: Hold drm.struct_mutex when serializing the request with OA config (Chris) Signed-off-by: Lionel Landwerlin --- .../gpu/drm/i915/gem/i915_gem_execbuffer.c | 133 +++++++++++++++++- drivers/gpu/drm/i915/gt/intel_engine_cs.c | 4 + drivers/gpu/drm/i915/gt/intel_engine_types.h | 9 ++ drivers/gpu/drm/i915/gt/intel_lrc.c | 4 +- drivers/gpu/drm/i915/gt/intel_ringbuffer.c | 4 +- drivers/gpu/drm/i915/i915_getparam.c | 4 + drivers/gpu/drm/i915/i915_perf.c | 2 - include/uapi/drm/i915_drm.h | 39 +++++ 8 files changed, 194 insertions(+), 5 deletions(-) diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c index 46ad8d9642d1..bdecd893cd61 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c @@ -24,6 +24,7 @@ #include "i915_gem_clflush.h" #include "i915_gem_context.h" #include "i915_gem_ioctls.h" +#include "i915_perf.h" #include "i915_trace.h" #include "i915_user_extensions.h" @@ -284,7 +285,12 @@ struct i915_execbuffer { struct { u64 flags; /** Available extensions parameters */ struct drm_i915_gem_execbuffer_ext_timeline_fences timeline_fences; + struct drm_i915_gem_execbuffer_ext_perf perf_config; } extensions; + + struct i915_oa_config *oa_config; /** HW configuration for OA, NULL is not needed. */ + struct drm_i915_gem_object *oa_bo; + struct i915_vma *oa_vma; }; #define exec_entry(EB, VMA) (&(EB)->exec[(VMA)->exec_flags - (EB)->flags]) @@ -1152,6 +1158,42 @@ static int reloc_move_to_gpu(struct i915_request *rq, struct i915_vma *vma) return err; } + +static int +get_execbuf_oa_config(struct i915_execbuffer *eb) +{ + struct file *perf_file; + int err = 0; + + eb->oa_config = NULL; + eb->oa_vma = NULL; + eb->oa_bo = NULL; + + if ((eb->extensions.flags & BIT_ULL(DRM_I915_GEM_EXECBUFFER_EXT_PERF)) == 0) + return 0; + + perf_file = fget(eb->extensions.perf_config.perf_fd); + if (!perf_file) { + err = -EINVAL; + goto out; + } + + if (perf_file->private_data != eb->i915->perf.exclusive_stream) + err = -EINVAL; + + if (!err) { + err = i915_perf_get_oa_config_and_bo( + eb->i915->perf.exclusive_stream, + eb->extensions.perf_config.oa_config, + &eb->oa_config, &eb->oa_bo); + } + + fput(perf_file); + +out: + return err; +} + static int __reloc_gpu_alloc(struct i915_execbuffer *eb, struct i915_vma *vma, unsigned int len) @@ -2051,6 +2093,42 @@ add_to_client(struct i915_request *rq, struct drm_file *file) spin_unlock(&file_priv->mm.lock); } +static int eb_oa_config(struct i915_execbuffer *eb) +{ + int ret; + + if (!eb->oa_config) + return 0; + + lockdep_assert_held(&eb->i915->drm.struct_mutex); /* oa_config */ + + ret = i915_active_request_set(&eb->engine->last_oa_config, + eb->request); + if (ret) + return ret; + + /* + * If the config hasn't changed, skip reconfiguring the HW (this is + * subject to a delay we want to avoid has much as possible). + */ + if (eb->oa_config == eb->i915->perf.exclusive_stream->oa_config) + return 0; + + ret = i915_vma_move_to_active(eb->oa_vma, eb->request, 0); + if (ret) + return ret; + + ret = eb->engine->emit_bb_start(eb->request, + eb->oa_vma->node.start, + 0, I915_DISPATCH_SECURE); + if (ret) + return ret; + + swap(eb->oa_config, eb->i915->perf.exclusive_stream->oa_config); + + return 0; +} + static int eb_submit(struct i915_execbuffer *eb) { int err; @@ -2077,6 +2155,10 @@ static int eb_submit(struct i915_execbuffer *eb) return err; } + err = eb_oa_config(eb); + if (err) + return err; + err = eb->engine->emit_bb_start(eb->request, eb->batch->node.start + eb->batch_start_offset, @@ -2643,8 +2725,25 @@ static int parse_timeline_fences(struct i915_user_extension __user *ext, void *d return 0; } +static int parse_perf_config(struct i915_user_extension __user *ext, void *data) +{ + struct i915_execbuffer *eb = data; + + if (eb->extensions.flags & BIT_ULL(DRM_I915_GEM_EXECBUFFER_EXT_PERF)) + return -EINVAL; + + if (copy_from_user(&eb->extensions.perf_config, ext, + sizeof(eb->extensions.perf_config))) + return -EFAULT; + + eb->extensions.flags |= BIT_ULL(DRM_I915_GEM_EXECBUFFER_EXT_PERF); + + return 0; +} + static const i915_user_extension_fn execbuf_extensions[] = { [DRM_I915_GEM_EXECBUFFER_EXT_TIMELINE_FENCES] = parse_timeline_fences, + [DRM_I915_GEM_EXECBUFFER_EXT_PERF] = parse_perf_config, }; static int @@ -2755,10 +2854,14 @@ i915_gem_do_execbuffer(struct drm_device *dev, } } - err = eb_create(&eb); + err = get_execbuf_oa_config(&eb); if (err) goto err_out_fence; + err = eb_create(&eb); + if (err) + goto err_oa_config; + GEM_BUG_ON(!eb.lut_size); err = eb_select_context(&eb); @@ -2769,6 +2872,13 @@ i915_gem_do_execbuffer(struct drm_device *dev, if (unlikely(err)) goto err_context; + if (eb.extensions.flags & BIT_ULL(DRM_I915_GEM_EXECBUFFER_EXT_PERF)) { + if (!intel_engine_has_oa(eb.engine)) { + err = -ENODEV; + goto err_engine; + } + } + err = i915_mutex_lock_interruptible(dev); if (err) goto err_engine; @@ -2889,6 +2999,20 @@ i915_gem_do_execbuffer(struct drm_device *dev, } } + if (eb.extensions.flags & BIT_ULL(DRM_I915_GEM_EXECBUFFER_EXT_PERF)) { + eb.oa_vma = i915_vma_instance(eb.oa_bo, + &eb.engine->i915->ggtt.vm, NULL); + if (unlikely(IS_ERR(eb.oa_vma))) { + err = PTR_ERR(eb.oa_vma); + eb.oa_vma = NULL; + goto err_request; + } + + err = i915_vma_pin(eb.oa_vma, 0, 0, PIN_GLOBAL); + if (err) + goto err_request; + } + /* * Whilst this request exists, batch_obj will be on the * active_list, and so will hold the active reference. Only when this @@ -2935,6 +3059,13 @@ i915_gem_do_execbuffer(struct drm_device *dev, i915_gem_context_put(eb.gem_context); err_destroy: eb_destroy(&eb); +err_oa_config: + if (eb.oa_config) { + i915_gem_object_put(eb.oa_bo); + i915_oa_config_put(eb.oa_config); + } + if (eb.oa_vma) + i915_vma_unpin(eb.oa_vma); err_out_fence: if (out_fence_fd != -1) put_unused_fd(out_fence_fd); diff --git a/drivers/gpu/drm/i915/gt/intel_engine_cs.c b/drivers/gpu/drm/i915/gt/intel_engine_cs.c index 17006d50b63f..f65375a26532 100644 --- a/drivers/gpu/drm/i915/gt/intel_engine_cs.c +++ b/drivers/gpu/drm/i915/gt/intel_engine_cs.c @@ -786,6 +786,10 @@ int intel_engine_init_common(struct intel_engine_cs *engine) engine->emit_fini_breadcrumb_dw = ret; + + INIT_ACTIVE_REQUEST(&engine->last_oa_config, + &engine->i915->drm.struct_mutex); + return 0; err_unpin: diff --git a/drivers/gpu/drm/i915/gt/intel_engine_types.h b/drivers/gpu/drm/i915/gt/intel_engine_types.h index 15e02cb58a67..c62bdb464a06 100644 --- a/drivers/gpu/drm/i915/gt/intel_engine_types.h +++ b/drivers/gpu/drm/i915/gt/intel_engine_types.h @@ -399,6 +399,8 @@ struct intel_engine_cs { struct i915_wa_list wa_list; struct i915_wa_list whitelist; + struct i915_active_request last_oa_config; + u32 irq_keep_mask; /* always keep these interrupts */ u32 irq_enable_mask; /* bitmask to enable ring interrupt */ void (*irq_enable)(struct intel_engine_cs *engine); @@ -481,6 +483,7 @@ struct intel_engine_cs { #define I915_ENGINE_HAS_SEMAPHORES BIT(3) #define I915_ENGINE_NEEDS_BREADCRUMB_TASKLET BIT(4) #define I915_ENGINE_IS_VIRTUAL BIT(5) +#define I915_ENGINE_HAS_OA BIT(6) unsigned int flags; /* @@ -576,6 +579,12 @@ intel_engine_is_virtual(const struct intel_engine_cs *engine) return engine->flags & I915_ENGINE_IS_VIRTUAL; } +static inline bool +intel_engine_has_oa(const struct intel_engine_cs *engine) +{ + return engine->flags & I915_ENGINE_HAS_OA; +} + #define instdone_has_slice(dev_priv___, sseu___, slice___) \ ((IS_GEN(dev_priv___, 7) ? 1 : ((sseu___)->slice_mask)) & BIT(slice___)) diff --git a/drivers/gpu/drm/i915/gt/intel_lrc.c b/drivers/gpu/drm/i915/gt/intel_lrc.c index a141e9e37bf7..709b08f973c5 100644 --- a/drivers/gpu/drm/i915/gt/intel_lrc.c +++ b/drivers/gpu/drm/i915/gt/intel_lrc.c @@ -3082,8 +3082,10 @@ int intel_execlists_submission_setup(struct intel_engine_cs *engine) logical_ring_default_vfuncs(engine); logical_ring_default_irqs(engine); - if (engine->class == RENDER_CLASS) + if (engine->class == RENDER_CLASS) { rcs_submission_override(engine); + engine->flags |= I915_ENGINE_HAS_OA; + } return 0; } diff --git a/drivers/gpu/drm/i915/gt/intel_ringbuffer.c b/drivers/gpu/drm/i915/gt/intel_ringbuffer.c index 601c16239fdf..6b06f64ffa23 100644 --- a/drivers/gpu/drm/i915/gt/intel_ringbuffer.c +++ b/drivers/gpu/drm/i915/gt/intel_ringbuffer.c @@ -2250,8 +2250,10 @@ static void setup_rcs(struct intel_engine_cs *engine) engine->irq_enable_mask = I915_USER_INTERRUPT; } - if (IS_HASWELL(i915)) + if (IS_HASWELL(i915)) { engine->emit_bb_start = hsw_emit_bb_start; + engine->flags |= I915_ENGINE_HAS_OA; + } engine->resume = rcs_resume; } diff --git a/drivers/gpu/drm/i915/i915_getparam.c b/drivers/gpu/drm/i915/i915_getparam.c index bd41cc5ce906..39d4c2c2e0f4 100644 --- a/drivers/gpu/drm/i915/i915_getparam.c +++ b/drivers/gpu/drm/i915/i915_getparam.c @@ -161,6 +161,10 @@ int i915_getparam_ioctl(struct drm_device *dev, void *data, case I915_PARAM_PERF_REVISION: value = i915_perf_ioctl_version(); break; + case I915_PARAM_HAS_EXEC_PERF_CONFIG: + /* Obviously requires perf support. */ + value = i915->perf.initialized; + break; default: DRM_DEBUG("Unknown parameter %d\n", param->param); return -EINVAL; diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c index dc3f3170764b..b9e76378b0c2 100644 --- a/drivers/gpu/drm/i915/i915_perf.c +++ b/drivers/gpu/drm/i915/i915_perf.c @@ -2666,8 +2666,6 @@ static int i915_oa_stream_init(struct i915_perf_stream *stream, free_oa_buffer(stream); err_oa_buf_alloc: - i915_oa_config_put(stream->oa_config); - intel_uncore_forcewake_put(&dev_priv->uncore, FORCEWAKE_ALL); intel_runtime_pm_put(&dev_priv->runtime_pm, stream->wakeref); diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h index e98c9a7baa91..3166c9ca85f3 100644 --- a/include/uapi/drm/i915_drm.h +++ b/include/uapi/drm/i915_drm.h @@ -624,6 +624,16 @@ typedef struct drm_i915_irq_wait { */ #define I915_PARAM_PERF_REVISION 55 +/* + * Request an i915/perf performance configuration change before running the + * commands given in an execbuf. + * + * Performance configuration ID and the file descriptor of the i915 perf + * stream are given through drm_i915_gem_execbuffer_ext_perf. See + * I915_EXEC_EXT. + */ +#define I915_PARAM_HAS_EXEC_PERF_CONFIG 56 + /* Must be kept compact -- no holes and well documented */ typedef struct drm_i915_getparam { @@ -1026,6 +1036,12 @@ enum drm_i915_gem_execbuffer_ext { */ DRM_I915_GEM_EXECBUFFER_EXT_TIMELINE_FENCES = 1, + /** + * This identifier is associated with + * drm_i915_gem_execbuffer_perf_ext. + */ + DRM_I915_GEM_EXECBUFFER_EXT_PERF, + DRM_I915_GEM_EXECBUFFER_EXT_MAX /* non-ABI */ }; @@ -1056,6 +1072,29 @@ struct drm_i915_gem_execbuffer_ext_timeline_fences { __u64 values_ptr; }; +struct drm_i915_gem_execbuffer_ext_perf { + struct i915_user_extension base; + + /** + * Performance file descriptor returned by DRM_IOCTL_I915_PERF_OPEN. + * This is used to identify that the application requesting a HW + * performance configuration change actually has a right to do so + * because it also has access the i915-perf stream. + */ + __s32 perf_fd; + + /** + * Unused for now. Must be cleared to zero. + */ + __u32 pad; + + /** + * OA configuration ID to switch to before executing the commands + * associated to the execbuf. + */ + __u64 oa_config; +}; + struct drm_i915_gem_execbuffer2 { /** * List of gem_exec_object2 structs From patchwork Wed Aug 28 14:33:25 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Lionel Landwerlin X-Patchwork-Id: 11119287 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id ECEE4112C for ; Wed, 28 Aug 2019 14:34:01 +0000 (UTC) Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id D54C72080F for ; Wed, 28 Aug 2019 14:34:01 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org D54C72080F Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=intel.com Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=intel-gfx-bounces@lists.freedesktop.org Received: from gabe.freedesktop.org (localhost [127.0.0.1]) by gabe.freedesktop.org (Postfix) with ESMTP id C920D89CE0; Wed, 28 Aug 2019 14:33:55 +0000 (UTC) X-Original-To: intel-gfx@lists.freedesktop.org Delivered-To: intel-gfx@lists.freedesktop.org Received: from mga01.intel.com (mga01.intel.com [192.55.52.88]) by gabe.freedesktop.org (Postfix) with ESMTPS id 4D02689CDD for ; Wed, 28 Aug 2019 14:33:48 +0000 (UTC) X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from fmsmga005.fm.intel.com ([10.253.24.32]) by fmsmga101.fm.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 28 Aug 2019 07:33:43 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.64,441,1559545200"; d="scan'208";a="380419441" Received: from cmartin2-mobl3.ger.corp.intel.com (HELO delly.ger.corp.intel.com) ([10.252.38.61]) by fmsmga005.fm.intel.com with ESMTP; 28 Aug 2019 07:33:42 -0700 From: Lionel Landwerlin To: intel-gfx@lists.freedesktop.org Date: Wed, 28 Aug 2019 17:33:25 +0300 Message-Id: <20190828143327.7965-9-lionel.g.landwerlin@intel.com> X-Mailer: git-send-email 2.23.0 In-Reply-To: <20190828143327.7965-1-lionel.g.landwerlin@intel.com> References: <20190828143327.7965-1-lionel.g.landwerlin@intel.com> MIME-Version: 1.0 Subject: [Intel-gfx] [PATCH v11 08/10] drm/i915/perf: allow holding preemption on filtered ctx X-BeenThere: intel-gfx@lists.freedesktop.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: Intel graphics driver community testing & development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: intel-gfx-bounces@lists.freedesktop.org Sender: "Intel-gfx" We would like to make use of perf in Vulkan. The Vulkan API is much lower level than OpenGL, with applications directly exposed to the concept of command buffers (pretty much equivalent to our batch buffers). In Vulkan, queries are always limited in scope to a command buffer. In OpenGL, the lack of command buffer concept meant that queries' duration could span multiple command buffers. With that restriction gone in Vulkan, we would like to simplify measuring performance just by measuring the deltas between the counter snapshots written by 2 MI_RECORD_PERF_COUNT commands, rather than the more complex scheme we currently have in the GL driver, using 2 MI_RECORD_PERF_COUNT commands and doing some post processing on the stream of OA reports, coming from the global OA buffer, to remove any unrelated deltas in between the 2 MI_RECORD_PERF_COUNT. Disabling preemption only apply to a single context with which want to query performance counters for and is considered a privileged operation, by default protected by CAP_SYS_ADMIN. It is possible to enable it for a normal user by disabling the paranoid stream setting. v2: Store preemption setting in intel_context (Chris) v3: Use priorities to avoid preemption rather than the HW mechanism v4: Just modify the port priority reporting function Signed-off-by: Lionel Landwerlin Reviewed-by: Chris Wilson --- .../gpu/drm/i915/gem/i915_gem_execbuffer.c | 8 +++++ drivers/gpu/drm/i915/i915_drv.h | 8 +++++ drivers/gpu/drm/i915/i915_perf.c | 31 +++++++++++++++++-- include/uapi/drm/i915_drm.h | 11 +++++++ 4 files changed, 55 insertions(+), 3 deletions(-) diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c index bdecd893cd61..5b1349ea135c 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c @@ -2107,6 +2107,14 @@ static int eb_oa_config(struct i915_execbuffer *eb) if (ret) return ret; + /* + * If the perf stream was opened with hold preemption, flag the + * request properly so that the priority of the request is bumped once + * it reaches the execlist ports. + */ + if (eb->i915->perf.exclusive_stream->hold_preemption) + eb->request->flags |= I915_REQUEST_NOPREEMPT; + /* * If the config hasn't changed, skip reconfiguring the HW (this is * subject to a delay we want to avoid has much as possible). diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h index c452bf6872e7..ec609c022e5e 100644 --- a/drivers/gpu/drm/i915/i915_drv.h +++ b/drivers/gpu/drm/i915/i915_drv.h @@ -1121,6 +1121,14 @@ struct i915_perf_stream { */ bool enabled; + /** + * @hold_preemption: Whether preemption is put on hold for command + * submissions done on the @ctx. This is useful for some drivers that + * cannot easily post process the OA buffer context to subtract delta + * of performance counters not associated with @ctx. + */ + bool hold_preemption; + /** * @ops: The callbacks providing the implementation of this specific * type of configured stream. diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c index b9e76378b0c2..1eeabf0bbafc 100644 --- a/drivers/gpu/drm/i915/i915_perf.c +++ b/drivers/gpu/drm/i915/i915_perf.c @@ -343,6 +343,8 @@ static const struct i915_oa_format gen8_plus_oa_formats[I915_OA_FORMAT_MAX] = { * struct perf_open_properties - for validated properties given to open a stream * @sample_flags: `DRM_I915_PERF_PROP_SAMPLE_*` properties are tracked as flags * @single_context: Whether a single or all gpu contexts should be monitored + * @hold_preemption: Whether the preemption is disabled for the filtered + * context * @ctx_handle: A gem ctx handle for use with @single_context * @metrics_set: An ID for an OA unit metric set advertised via sysfs * @oa_format: An OA unit HW report format @@ -357,6 +359,7 @@ struct perf_open_properties { u32 sample_flags; u64 single_context:1; + u64 hold_preemption:1; u64 ctx_handle; /* OA sampling state */ @@ -2585,6 +2588,8 @@ static int i915_oa_stream_init(struct i915_perf_stream *stream, if (WARN_ON(stream->oa_buffer.format_size == 0)) return -EINVAL; + stream->hold_preemption = props->hold_preemption; + stream->oa_buffer.format = dev_priv->perf.oa_formats[props->oa_format].format; @@ -3111,6 +3116,15 @@ i915_perf_open_ioctl_locked(struct drm_i915_private *dev_priv, } } + if (props->hold_preemption) { + if (!props->single_context) { + DRM_DEBUG("preemption disable with no context\n"); + ret = -EINVAL; + goto err; + } + privileged_op = true; + } + /* * On Haswell the OA unit supports clock gating off for a specific * context and in this mode there's no visibility of metrics for the @@ -3125,8 +3139,9 @@ i915_perf_open_ioctl_locked(struct drm_i915_private *dev_priv, * MI_REPORT_PERF_COUNT commands and so consider it a privileged op to * enable the OA unit by default. */ - if (IS_HASWELL(dev_priv) && specific_ctx) + if (IS_HASWELL(dev_priv) && specific_ctx && !props->hold_preemption) { privileged_op = false; + } /* Similar to perf's kernel.perf_paranoid_cpu sysctl option * we check a dev.i915.perf_stream_paranoid sysctl option @@ -3135,7 +3150,7 @@ i915_perf_open_ioctl_locked(struct drm_i915_private *dev_priv, */ if (privileged_op && i915_perf_stream_paranoid && !capable(CAP_SYS_ADMIN)) { - DRM_DEBUG("Insufficient privileges to open system-wide i915 perf stream\n"); + DRM_DEBUG("Insufficient privileges to open i915 perf stream\n"); ret = -EACCES; goto err_ctx; } @@ -3333,6 +3348,9 @@ static int read_properties_unlocked(struct drm_i915_private *dev_priv, props->oa_periodic = true; props->oa_period_exponent = value; break; + case DRM_I915_PERF_PROP_HOLD_PREEMPTION: + props->hold_preemption = !!value; + break; case DRM_I915_PERF_PROP_MAX: MISSING_CASE(id); return -EINVAL; @@ -4100,5 +4118,12 @@ void i915_perf_fini(struct drm_i915_private *dev_priv) */ int i915_perf_ioctl_version(void) { - return 1; + /* 1: initial version + * + * 2: Add DRM_I915_PERF_PROP_HOLD_PREEMPTION parameter to hold + * preemption on a particular context so that performance data is + * accessible from a delta of MI_RPC reports without looking at the + * OA buffer. + */ + return 2; } diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h index 3166c9ca85f3..5850d68327ec 100644 --- a/include/uapi/drm/i915_drm.h +++ b/include/uapi/drm/i915_drm.h @@ -1987,6 +1987,17 @@ enum drm_i915_perf_property_id { */ DRM_I915_PERF_PROP_OA_EXPONENT, + /** + * Specifying this property is only valid when specify a context to + * filter with DRM_I915_PERF_PROP_CTX_HANDLE. Specifying this property + * will hold preemption of the particular context we want to gather + * performance data about. The execbuf2 submissions must include a + * drm_i915_gem_execbuffer_ext_perf parameter for this to apply. + * + * This property is available in perf revision 2. + */ + DRM_I915_PERF_PROP_HOLD_PREEMPTION, + DRM_I915_PERF_PROP_MAX /* non-ABI */ }; From patchwork Wed Aug 28 14:33:26 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Lionel Landwerlin X-Patchwork-Id: 11119285 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 4393218B7 for ; Wed, 28 Aug 2019 14:34:01 +0000 (UTC) Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 2C8D02080F for ; Wed, 28 Aug 2019 14:34:00 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 2C8D02080F Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=intel.com Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=intel-gfx-bounces@lists.freedesktop.org Received: from gabe.freedesktop.org (localhost [127.0.0.1]) by gabe.freedesktop.org (Postfix) with ESMTP id 6D90C89CF2; Wed, 28 Aug 2019 14:33:55 +0000 (UTC) X-Original-To: intel-gfx@lists.freedesktop.org Delivered-To: intel-gfx@lists.freedesktop.org Received: from mga01.intel.com (mga01.intel.com [192.55.52.88]) by gabe.freedesktop.org (Postfix) with ESMTPS id 6D2C589CBA for ; Wed, 28 Aug 2019 14:33:48 +0000 (UTC) X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from fmsmga005.fm.intel.com ([10.253.24.32]) by fmsmga101.fm.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 28 Aug 2019 07:33:44 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.64,441,1559545200"; d="scan'208";a="380419448" Received: from cmartin2-mobl3.ger.corp.intel.com (HELO delly.ger.corp.intel.com) ([10.252.38.61]) by fmsmga005.fm.intel.com with ESMTP; 28 Aug 2019 07:33:43 -0700 From: Lionel Landwerlin To: intel-gfx@lists.freedesktop.org Date: Wed, 28 Aug 2019 17:33:26 +0300 Message-Id: <20190828143327.7965-10-lionel.g.landwerlin@intel.com> X-Mailer: git-send-email 2.23.0 In-Reply-To: <20190828143327.7965-1-lionel.g.landwerlin@intel.com> References: <20190828143327.7965-1-lionel.g.landwerlin@intel.com> MIME-Version: 1.0 Subject: [Intel-gfx] [PATCH v11 09/10] drm/i915/perf: execute OA configuration from command stream X-BeenThere: intel-gfx@lists.freedesktop.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: Intel graphics driver community testing & development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: intel-gfx-bounces@lists.freedesktop.org Sender: "Intel-gfx" We haven't run into issues with programming the global OA/NOA registers configuration from CPU so far, but HW engineers actually recommend doing this from the command streamer. On TGL in particular one of the clock domain in which some of that programming goes might not be powered when we poke things from the CPU. Since we have a command buffer prepared for the execbuffer side of things, we can reuse that approach here too. This also allows us to significantly reduce the amount of time we hold the main lock. v2: Drop the global lock as much as possible v3: Take global lock to pin global v4: Create i915 request in emit_oa_config() to avoid deadlocks (Lionel) Signed-off-by: Lionel Landwerlin --- drivers/gpu/drm/i915/i915_drv.h | 12 +++ drivers/gpu/drm/i915/i915_perf.c | 146 +++++++++++++++++++------------ 2 files changed, 104 insertions(+), 54 deletions(-) diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h index ec609c022e5e..f94de001201d 100644 --- a/drivers/gpu/drm/i915/i915_drv.h +++ b/drivers/gpu/drm/i915/i915_drv.h @@ -1094,6 +1094,18 @@ struct i915_perf_stream { */ intel_wakeref_t wakeref; + /** + * @initial_config_rq: First request run at the opening of the i915 + * perf stream to configure the HW. Should be NULL after the perf + * stream has been opened successfully. + */ + struct i915_request *initial_config_rq; + + /** + * @initial_oa_config_bo: First OA configuration BO to be run. + */ + struct drm_i915_gem_object *initial_oa_config_bo; + /** * @sample_flags: Flags representing the `DRM_I915_PERF_PROP_SAMPLE_*` * properties given when opening a stream, representing the contents diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c index 1eeabf0bbafc..29eba8c3c792 100644 --- a/drivers/gpu/drm/i915/i915_perf.c +++ b/drivers/gpu/drm/i915/i915_perf.c @@ -1783,6 +1783,10 @@ static int alloc_noa_wait(struct i915_perf_stream *stream) return PTR_ERR(bo); } + ret = i915_mutex_lock_interruptible(&i915->drm); + if (ret) + goto err_unref; + /* * We pin in GGTT because we jump into this buffer now because * multiple OA config BOs will have a jump to this address and it @@ -1790,10 +1794,13 @@ static int alloc_noa_wait(struct i915_perf_stream *stream) */ vma = i915_gem_object_ggtt_pin(bo, NULL, 0, 4096, 0); if (IS_ERR(vma)) { + mutex_unlock(&i915->drm.struct_mutex); ret = PTR_ERR(vma); goto err_unref; } + mutex_unlock(&i915->drm.struct_mutex); + batch = cs = i915_gem_object_pin_map(bo, I915_MAP_WB); if (IS_ERR(batch)) { ret = PTR_ERR(batch); @@ -1927,7 +1934,9 @@ static int alloc_noa_wait(struct i915_perf_stream *stream) return 0; err_unpin: - __i915_vma_unpin(vma); + mutex_lock(&i915->drm.struct_mutex); + i915_vma_unpin_and_release(&vma, 0); + mutex_unlock(&i915->drm.struct_mutex); err_unref: i915_gem_object_put(bo); @@ -1935,50 +1944,71 @@ static int alloc_noa_wait(struct i915_perf_stream *stream) return ret; } -static void config_oa_regs(struct drm_i915_private *dev_priv, - const struct i915_oa_reg *regs, - u32 n_regs) +static int emit_oa_config(struct drm_i915_private *i915, + struct i915_perf_stream *stream) { - u32 i; + struct i915_request *rq; + struct i915_vma *vma; + u32 *cs; + int err; - for (i = 0; i < n_regs; i++) { - const struct i915_oa_reg *reg = regs + i; + lockdep_assert_held(&i915->drm.struct_mutex); - I915_WRITE(reg->addr, reg->value); + rq = i915_request_create(i915->engine[RCS0]->kernel_context); + if (IS_ERR(rq)) + return PTR_ERR(rq); + + err = i915_active_request_set(&i915->engine[RCS0]->last_oa_config, + rq); + if (err) + goto err_add_request; + + vma = i915_vma_instance(stream->initial_oa_config_bo, + &i915->ggtt.vm, NULL); + if (unlikely(IS_ERR(vma))) { + err = PTR_ERR(vma); + goto err_add_request; } -} -static void delay_after_mux(void) -{ - /* - * It apparently takes a fairly long time for a new MUX - * configuration to be be applied after these register writes. - * This delay duration was derived empirically based on the - * render_basic config but hopefully it covers the maximum - * configuration latency. - * - * As a fallback, the checks in _append_oa_reports() to skip - * invalid OA reports do also seem to work to discard reports - * generated before this config has completed - albeit not - * silently. - * - * Unfortunately this is essentially a magic number, since we - * don't currently know of a reliable mechanism for predicting - * how long the MUX config will take to apply and besides - * seeing invalid reports we don't know of a reliable way to - * explicitly check that the MUX config has landed. - * - * It's even possible we've miss characterized the underlying - * problem - it just seems like the simplest explanation why - * a delay at this location would mitigate any invalid reports. - */ - usleep_range(15000, 20000); + err = i915_vma_pin(vma, 0, 0, PIN_GLOBAL); + if (err) + goto err_add_request; + + err = i915_vma_move_to_active(vma, rq, 0); + if (err) + goto err_vma_unpin; + + cs = intel_ring_begin(rq, INTEL_GEN(i915) >= 8 ? 4 : 2); + if (IS_ERR(cs)) { + err = PTR_ERR(cs); + goto err_vma_unpin; + } + + if (INTEL_GEN(i915) > 8) { + *cs++ = MI_BATCH_BUFFER_START_GEN8; + *cs++ = lower_32_bits(vma->node.start); + *cs++ = upper_32_bits(vma->node.start); + *cs++ = MI_NOOP; + } else { + *cs++ = MI_BATCH_BUFFER_START; + *cs++ = vma->node.start; + } + + intel_ring_advance(rq, cs); + + stream->initial_config_rq = i915_request_get(rq); + +err_vma_unpin: + i915_vma_unpin(vma); +err_add_request: + i915_request_add(rq); + + return err; } static int hsw_enable_metric_set(struct i915_perf_stream *stream) { struct drm_i915_private *dev_priv = stream->dev_priv; - const struct i915_oa_config *oa_config = stream->oa_config; /* * PRM: @@ -1995,13 +2025,7 @@ static int hsw_enable_metric_set(struct i915_perf_stream *stream) I915_WRITE(GEN6_UCGCTL1, (I915_READ(GEN6_UCGCTL1) | GEN6_CSUNIT_CLOCK_GATE_DISABLE)); - config_oa_regs(dev_priv, oa_config->mux_regs, oa_config->mux_regs_len); - delay_after_mux(); - - config_oa_regs(dev_priv, oa_config->b_counter_regs, - oa_config->b_counter_regs_len); - - return 0; + return emit_oa_config(dev_priv, stream); } static void hsw_disable_metric_set(struct i915_perf_stream *stream) @@ -2360,13 +2384,7 @@ static int gen8_enable_metric_set(struct i915_perf_stream *stream) if (ret) return ret; - config_oa_regs(dev_priv, oa_config->mux_regs, oa_config->mux_regs_len); - delay_after_mux(); - - config_oa_regs(dev_priv, oa_config->b_counter_regs, - oa_config->b_counter_regs_len); - - return 0; + return emit_oa_config(dev_priv, stream); } static void gen8_disable_metric_set(struct i915_perf_stream *stream) @@ -2542,6 +2560,7 @@ static int i915_oa_stream_init(struct i915_perf_stream *stream, { struct drm_i915_private *dev_priv = stream->dev_priv; int format_size; + long timeout; int ret; /* If the sysfs metrics/ directory wasn't registered for some @@ -2611,8 +2630,9 @@ static int i915_oa_stream_init(struct i915_perf_stream *stream, goto err_noa_wait_alloc; } - ret = i915_perf_get_oa_config(dev_priv, props->metrics_set, - &stream->oa_config); + ret = i915_perf_get_oa_config_and_bo(stream, props->metrics_set, + &stream->oa_config, + &stream->initial_oa_config_bo); if (ret) { DRM_DEBUG("Invalid OA config id=%i\n", props->metrics_set); goto err_config; @@ -2637,22 +2657,34 @@ static int i915_oa_stream_init(struct i915_perf_stream *stream, if (ret) goto err_oa_buf_alloc; + stream->ops = &i915_oa_stream_ops; + ret = i915_mutex_lock_interruptible(&dev_priv->drm); if (ret) goto err_lock; - stream->ops = &i915_oa_stream_ops; dev_priv->perf.exclusive_stream = stream; ret = dev_priv->perf.ops.enable_metric_set(stream); + mutex_unlock(&dev_priv->drm.struct_mutex); if (ret) { DRM_DEBUG("Unable to enable metric set\n"); goto err_enable; } - DRM_DEBUG("opening stream oa config uuid=%s\n", stream->oa_config->uuid); + timeout = i915_request_wait(stream->initial_config_rq, + I915_WAIT_INTERRUPTIBLE, + MAX_SCHEDULE_TIMEOUT); + i915_request_put(stream->initial_config_rq); + i915_gem_object_put(stream->initial_oa_config_bo); + stream->initial_config_rq = NULL; + stream->initial_oa_config_bo = NULL; - mutex_unlock(&dev_priv->drm.struct_mutex); + ret = timeout < 0 ? timeout : 0; + if (ret) + goto err_enable; + + DRM_DEBUG("opening stream oa config uuid=%s\n", stream->oa_config->uuid); hrtimer_init(&stream->poll_check_timer, CLOCK_MONOTONIC, HRTIMER_MODE_REL); @@ -2663,6 +2695,7 @@ static int i915_oa_stream_init(struct i915_perf_stream *stream, return 0; err_enable: + mutex_lock(&dev_priv->drm.struct_mutex); dev_priv->perf.exclusive_stream = NULL; dev_priv->perf.ops.disable_metric_set(stream); mutex_unlock(&dev_priv->drm.struct_mutex); @@ -2674,6 +2707,11 @@ static int i915_oa_stream_init(struct i915_perf_stream *stream, intel_uncore_forcewake_put(&dev_priv->uncore, FORCEWAKE_ALL); intel_runtime_pm_put(&dev_priv->runtime_pm, stream->wakeref); + free_oa_configs(stream); + + i915_gem_object_put(stream->initial_oa_config_bo); + i915_request_put(stream->initial_config_rq); + err_config: free_noa_wait(stream); From patchwork Wed Aug 28 14:33:27 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Lionel Landwerlin X-Patchwork-Id: 11119273 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 7CF1A112C for ; Wed, 28 Aug 2019 14:33:54 +0000 (UTC) Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 657252080F for ; Wed, 28 Aug 2019 14:33:54 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 657252080F Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=intel.com Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=intel-gfx-bounces@lists.freedesktop.org Received: from gabe.freedesktop.org (localhost [127.0.0.1]) by gabe.freedesktop.org (Postfix) with ESMTP id DF72089CBA; Wed, 28 Aug 2019 14:33:53 +0000 (UTC) X-Original-To: intel-gfx@lists.freedesktop.org Delivered-To: intel-gfx@lists.freedesktop.org Received: from mga01.intel.com (mga01.intel.com [192.55.52.88]) by gabe.freedesktop.org (Postfix) with ESMTPS id 1F10689CBA for ; Wed, 28 Aug 2019 14:33:49 +0000 (UTC) X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from fmsmga005.fm.intel.com ([10.253.24.32]) by fmsmga101.fm.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 28 Aug 2019 07:33:46 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.64,441,1559545200"; d="scan'208";a="380419456" Received: from cmartin2-mobl3.ger.corp.intel.com (HELO delly.ger.corp.intel.com) ([10.252.38.61]) by fmsmga005.fm.intel.com with ESMTP; 28 Aug 2019 07:33:44 -0700 From: Lionel Landwerlin To: intel-gfx@lists.freedesktop.org Date: Wed, 28 Aug 2019 17:33:27 +0300 Message-Id: <20190828143327.7965-11-lionel.g.landwerlin@intel.com> X-Mailer: git-send-email 2.23.0 In-Reply-To: <20190828143327.7965-1-lionel.g.landwerlin@intel.com> References: <20190828143327.7965-1-lionel.g.landwerlin@intel.com> MIME-Version: 1.0 Subject: [Intel-gfx] [PATCH v11 10/10] drm/i915: add support for perf configuration queries X-BeenThere: intel-gfx@lists.freedesktop.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: Intel graphics driver community testing & development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: intel-gfx-bounces@lists.freedesktop.org Sender: "Intel-gfx" Listing configurations at the moment is supported only through sysfs. This might cause issues for applications wanting to list configurations from a container where sysfs isn't available. This change adds a way to query the number of configurations and their content through the i915 query uAPI. v2: Fix sparse warnings (Lionel) Add support to query configuration using uuid (Lionel) v3: Fix some inconsistency in uapi header (Lionel) Fix unlocking when not locked issue (Lionel) Add debug messages (Lionel) v4: Fix missing unlock (Dan) v5: Drop lock when copying config content to userspace (Chris) v6: Drop lock when copying config list to userspace (Chris) Fix deadlock when calling i915_perf_get_oa_config() under perf.metrics_lock (Lionel) Add i915_oa_config_get() (Chris) Signed-off-by: Lionel Landwerlin Reviewed-by: Chris Wilson --- drivers/gpu/drm/i915/i915_drv.h | 6 + drivers/gpu/drm/i915/i915_perf.c | 3 + drivers/gpu/drm/i915/i915_query.c | 283 ++++++++++++++++++++++++++++++ include/uapi/drm/i915_drm.h | 65 ++++++- 4 files changed, 354 insertions(+), 3 deletions(-) diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h index f94de001201d..fc8e4ae05f2f 100644 --- a/drivers/gpu/drm/i915/i915_drv.h +++ b/drivers/gpu/drm/i915/i915_drv.h @@ -1699,6 +1699,12 @@ struct drm_i915_private { */ struct idr metrics_idr; + /* + * Number of dynamic configurations, you need to hold + * dev_priv->perf.metrics_lock to access it. + */ + u32 n_metrics; + /* * Lock associated with anything below within this structure * except exclusive_stream. diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c index 29eba8c3c792..79c22209cbde 100644 --- a/drivers/gpu/drm/i915/i915_perf.c +++ b/drivers/gpu/drm/i915/i915_perf.c @@ -3878,6 +3878,8 @@ int i915_perf_add_config_ioctl(struct drm_device *dev, void *data, goto sysfs_err; } + dev_priv->perf.n_metrics++; + mutex_unlock(&dev_priv->perf.metrics_lock); DRM_DEBUG("Added config %s id=%i\n", oa_config->uuid, oa_config->id); @@ -3938,6 +3940,7 @@ int i915_perf_remove_config_ioctl(struct drm_device *dev, void *data, &oa_config->sysfs_metric); idr_remove(&dev_priv->perf.metrics_idr, *arg); + dev_priv->perf.n_metrics--; mutex_unlock(&dev_priv->perf.metrics_lock); diff --git a/drivers/gpu/drm/i915/i915_query.c b/drivers/gpu/drm/i915/i915_query.c index abac5042da2b..89b2821be4a0 100644 --- a/drivers/gpu/drm/i915/i915_query.c +++ b/drivers/gpu/drm/i915/i915_query.c @@ -7,6 +7,7 @@ #include #include "i915_drv.h" +#include "i915_perf.h" #include "i915_query.h" #include @@ -140,10 +141,292 @@ query_engine_info(struct drm_i915_private *i915, return len; } +static int can_copy_perf_config_registers_or_number(u32 user_n_regs, + u64 user_regs_ptr, + u32 kernel_n_regs) +{ + /* + * We'll just put the number of registers, and won't copy the + * register. + */ + if (user_n_regs == 0) + return 0; + + if (user_n_regs < kernel_n_regs) + return -EINVAL; + + if (!access_ok(u64_to_user_ptr(user_regs_ptr), + 2 * sizeof(u32) * kernel_n_regs)) + return -EFAULT; + + return 0; +} + +static int copy_perf_config_registers_or_number(const struct i915_oa_reg *kernel_regs, + u32 kernel_n_regs, + u64 user_regs_ptr, + u32 *user_n_regs) +{ + u32 r; + + if (*user_n_regs == 0) { + *user_n_regs = kernel_n_regs; + return 0; + } + + *user_n_regs = kernel_n_regs; + + for (r = 0; r < kernel_n_regs; r++) { + u32 __user *user_reg_ptr = + u64_to_user_ptr(user_regs_ptr + sizeof(u32) * r * 2); + u32 __user *user_val_ptr = + u64_to_user_ptr(user_regs_ptr + sizeof(u32) * r * 2 + + sizeof(u32)); + int ret; + + ret = __put_user(i915_mmio_reg_offset(kernel_regs[r].addr), + user_reg_ptr); + if (ret) + return -EFAULT; + + ret = __put_user(kernel_regs[r].value, user_val_ptr); + if (ret) + return -EFAULT; + } + + return 0; +} + +static int query_perf_config_data(struct drm_i915_private *i915, + struct drm_i915_query_item *query_item, + bool use_uuid) +{ + struct drm_i915_query_perf_config __user *user_query_config_ptr = + u64_to_user_ptr(query_item->data_ptr); + struct drm_i915_perf_oa_config __user *user_config_ptr = + u64_to_user_ptr(query_item->data_ptr + + sizeof(struct drm_i915_query_perf_config)); + struct drm_i915_perf_oa_config user_config; + struct i915_oa_config *oa_config = NULL; + char uuid[UUID_STRING_LEN + 1]; + u64 config_id; + u32 flags, total_size; + int ret; + + if (!i915->perf.initialized) + return -ENODEV; + + total_size = sizeof(struct drm_i915_query_perf_config) + + sizeof(struct drm_i915_perf_oa_config); + + if (query_item->length == 0) + return total_size; + + if (query_item->length < total_size) { + DRM_DEBUG("Invalid query config data item size=%u expected=%u\n", + query_item->length, total_size); + return -EINVAL; + } + + if (!access_ok(user_query_config_ptr, total_size)) + return -EFAULT; + + if (__get_user(flags, &user_query_config_ptr->flags)) + return -EFAULT; + + if (flags != 0) + return -EINVAL; + + if (use_uuid) { + BUILD_BUG_ON(sizeof(user_query_config_ptr->uuid) >= sizeof(uuid)); + + memset(&uuid, 0, sizeof(uuid)); + if (__copy_from_user(uuid, user_query_config_ptr->uuid, + sizeof(user_query_config_ptr->uuid))) + return -EFAULT; + } else { + if (__get_user(config_id, &user_query_config_ptr->config)) { + return -EFAULT; + } + } + + if (use_uuid) { + struct i915_oa_config *tmp; + int id; + + ret = mutex_lock_interruptible(&i915->perf.metrics_lock); + if (ret) + return ret; + + idr_for_each_entry(&i915->perf.metrics_idr, tmp, id) { + if (!strcmp(tmp->uuid, uuid)) { + oa_config = i915_oa_config_get(tmp); + break; + } + } + + mutex_unlock(&i915->perf.metrics_lock); + } else { + ret = i915_perf_get_oa_config(i915, config_id, &oa_config); + } + + if (ret || !oa_config) + return -ENOENT; + + if (__copy_from_user(&user_config, user_config_ptr, + sizeof(user_config))) { + ret = -EFAULT; + goto out; + } + + ret = can_copy_perf_config_registers_or_number(user_config.n_boolean_regs, + user_config.boolean_regs_ptr, + oa_config->b_counter_regs_len); + if (ret) + goto out; + + ret = can_copy_perf_config_registers_or_number(user_config.n_flex_regs, + user_config.flex_regs_ptr, + oa_config->flex_regs_len); + if (ret) + goto out; + + ret = can_copy_perf_config_registers_or_number(user_config.n_mux_regs, + user_config.mux_regs_ptr, + oa_config->mux_regs_len); + if (ret) + goto out; + + ret = copy_perf_config_registers_or_number(oa_config->b_counter_regs, + oa_config->b_counter_regs_len, + user_config.boolean_regs_ptr, + &user_config.n_boolean_regs); + if (ret) + goto out; + + ret = copy_perf_config_registers_or_number(oa_config->flex_regs, + oa_config->flex_regs_len, + user_config.flex_regs_ptr, + &user_config.n_flex_regs); + if (ret) + goto out; + + ret = copy_perf_config_registers_or_number(oa_config->mux_regs, + oa_config->mux_regs_len, + user_config.mux_regs_ptr, + &user_config.n_mux_regs); + if (ret) + goto out; + + memcpy(user_config.uuid, oa_config->uuid, sizeof(user_config.uuid)); + + if (__copy_to_user(user_config_ptr, &user_config, + sizeof(user_config))) { + ret = -EFAULT; + goto out; + } + + ret = total_size; + +out: + i915_oa_config_put(oa_config); + + return ret; +} + +static int query_perf_config_list(struct drm_i915_private *i915, + struct drm_i915_query_item *query_item) +{ + struct drm_i915_query_perf_config __user *user_query_config_ptr = + u64_to_user_ptr(query_item->data_ptr); + struct i915_oa_config *oa_config; + u32 flags, total_size; + u64 i, n_configs, *oa_config_ids; + int ret, id; + + if (!i915->perf.initialized) + return -ENODEV; + + /* Count the default test configuration */ + n_configs = i915->perf.n_metrics + 1; + total_size = sizeof(struct drm_i915_query_perf_config) + + sizeof(u64) * n_configs; + + if (query_item->length == 0) + return total_size; + + if (query_item->length < total_size) { + DRM_DEBUG("Invalid query config list item size=%u expected=%u\n", + query_item->length, total_size); + return -EINVAL; + } + + if (!access_ok(user_query_config_ptr, total_size)) + return -EFAULT; + + if (__get_user(flags, &user_query_config_ptr->flags)) + return -EFAULT; + + if (flags != 0) + return -EINVAL; + + if (__put_user(n_configs, &user_query_config_ptr->config)) + return -EFAULT; + + ret = mutex_lock_interruptible(&i915->perf.metrics_lock); + if (ret) + return ret; + + /* Count the configs. */ + n_configs = 1; + idr_for_each_entry(&i915->perf.metrics_idr, oa_config, id) + n_configs++; + + oa_config_ids = + kmalloc_array(n_configs, sizeof(*oa_config_ids), GFP_KERNEL); + if (!oa_config_ids) { + mutex_unlock(&i915->perf.metrics_lock); + return -ENOMEM; + } + + i = 0; + oa_config_ids[i++] = 1ull; + idr_for_each_entry(&i915->perf.metrics_idr, oa_config, id) + oa_config_ids[i++] = id; + + mutex_unlock(&i915->perf.metrics_lock); + + ret = copy_to_user(u64_to_user_ptr(query_item->data_ptr + + sizeof(struct drm_i915_query_perf_config)), + oa_config_ids, + n_configs * sizeof(*oa_config_ids)); + kfree(oa_config_ids); + if (ret) + return ret; + + return total_size; +} + +static int query_perf_config(struct drm_i915_private *i915, + struct drm_i915_query_item *query_item) +{ + switch (query_item->flags) { + case DRM_I915_QUERY_PERF_CONFIG_LIST: + return query_perf_config_list(i915, query_item); + case DRM_I915_QUERY_PERF_CONFIG_DATA_FOR_UUID: + return query_perf_config_data(i915, query_item, true); + case DRM_I915_QUERY_PERF_CONFIG_DATA_FOR_ID: + return query_perf_config_data(i915, query_item, false); + default: + return -EINVAL; + } +} + static int (* const i915_query_funcs[])(struct drm_i915_private *dev_priv, struct drm_i915_query_item *query_item) = { query_topology_info, query_engine_info, + query_perf_config, }; int i915_query_ioctl(struct drm_device *dev, void *data, struct drm_file *file) diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h index 5850d68327ec..2e7215989df2 100644 --- a/include/uapi/drm/i915_drm.h +++ b/include/uapi/drm/i915_drm.h @@ -1037,8 +1037,7 @@ enum drm_i915_gem_execbuffer_ext { DRM_I915_GEM_EXECBUFFER_EXT_TIMELINE_FENCES = 1, /** - * This identifier is associated with - * drm_i915_gem_execbuffer_perf_ext. + * See drm_i915_gem_execbuffer_perf_ext. */ DRM_I915_GEM_EXECBUFFER_EXT_PERF, @@ -2113,6 +2112,7 @@ struct drm_i915_query_item { __u64 query_id; #define DRM_I915_QUERY_TOPOLOGY_INFO 1 #define DRM_I915_QUERY_ENGINE_INFO 2 +#define DRM_I915_QUERY_PERF_CONFIG 3 /* Must be kept compact -- no holes and well documented */ /* @@ -2124,9 +2124,18 @@ struct drm_i915_query_item { __s32 length; /* - * Unused for now. Must be cleared to zero. + * When query_id == DRM_I915_QUERY_TOPOLOGY_INFO, must be 0. + * + * When query_id == DRM_I915_QUERY_PERF_CONFIG, must be one of the + * following : + * - DRM_I915_QUERY_PERF_CONFIG_LIST + * - DRM_I915_QUERY_PERF_CONFIG_DATA_FOR_UUID + * - DRM_I915_QUERY_PERF_CONFIG_FOR_UUID */ __u32 flags; +#define DRM_I915_QUERY_PERF_CONFIG_LIST 1 +#define DRM_I915_QUERY_PERF_CONFIG_DATA_FOR_UUID 2 +#define DRM_I915_QUERY_PERF_CONFIG_DATA_FOR_ID 3 /* * Data will be written at the location pointed by data_ptr when the @@ -2252,6 +2261,56 @@ struct drm_i915_query_engine_info { struct drm_i915_engine_info engines[]; }; +/* + * Data written by the kernel with query DRM_I915_QUERY_PERF_CONFIG. + */ +struct drm_i915_query_perf_config { + union { + /* + * When query_item.flags == DRM_I915_QUERY_PERF_CONFIG_LIST, i915 sets + * this fields to the number of configurations available. + */ + __u64 n_configs; + + /* + * When query_id == DRM_I915_QUERY_PERF_CONFIG_DATA_FOR_ID, + * i915 will use the value in this field as configuration + * identifier to decide what data to write into config_ptr. + */ + __u64 config; + + /* + * When query_id == DRM_I915_QUERY_PERF_CONFIG_DATA_FOR_UUID, + * i915 will use the value in this field as configuration + * identifier to decide what data to write into config_ptr. + * + * String formatted like "%08x-%04x-%04x-%04x-%012x" + */ + char uuid[36]; + }; + + /* + * Unused for now. Must be cleared to zero. + */ + __u32 flags; + + /* + * When query_item.flags == DRM_I915_QUERY_PERF_CONFIG_LIST, i915 will + * write an array of __u64 of configuration identifiers. + * + * When query_item.flags == DRM_I915_QUERY_PERF_CONFIG_DATA, i915 will + * write a struct drm_i915_perf_oa_config. If the following fields of + * drm_i915_perf_oa_config are set not set to 0, i915 will write into + * the associated pointers the values of submitted when the + * configuration was created : + * + * - n_mux_regs + * - n_boolean_regs + * - n_flex_regs + */ + __u8 data[]; +}; + #if defined(__cplusplus) } #endif