From patchwork Wed Dec 19 14:37:44 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Lionel Landwerlin X-Patchwork-Id: 10737353 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 5E363746 for ; Wed, 19 Dec 2018 14:38:03 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 4D1E92873F for ; Wed, 19 Dec 2018 14:38:03 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 3F0FA287BE; Wed, 19 Dec 2018 14:38:03 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-5.2 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_MED autolearn=ham version=3.3.1 Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177]) (using TLSv1.2 with cipher DHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.wl.linuxfoundation.org (Postfix) with ESMTPS id 6D0BF2873F for ; Wed, 19 Dec 2018 14:38:02 +0000 (UTC) Received: from gabe.freedesktop.org (localhost [127.0.0.1]) by gabe.freedesktop.org (Postfix) with ESMTP id 0D3676EFEA; Wed, 19 Dec 2018 14:38:00 +0000 (UTC) X-Original-To: intel-gfx@lists.freedesktop.org Delivered-To: intel-gfx@lists.freedesktop.org Received: from mga06.intel.com (mga06.intel.com [134.134.136.31]) by gabe.freedesktop.org (Postfix) with ESMTPS id 7982E6EFE4 for ; Wed, 19 Dec 2018 14:37:53 +0000 (UTC) X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from fmsmga002.fm.intel.com ([10.253.24.26]) by orsmga104.jf.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 19 Dec 2018 06:37:53 -0800 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.56,373,1539673200"; d="scan'208";a="127343428" Received: from delly.ld.intel.com ([10.103.238.204]) by fmsmga002.fm.intel.com with ESMTP; 19 Dec 2018 06:37:52 -0800 From: Lionel Landwerlin To: intel-gfx@lists.freedesktop.org Date: Wed, 19 Dec 2018 14:37:44 +0000 Message-Id: <20181219143747.8997-2-lionel.g.landwerlin@intel.com> X-Mailer: git-send-email 2.20.1 In-Reply-To: <20181219143747.8997-1-lionel.g.landwerlin@intel.com> References: <20181219143747.8997-1-lionel.g.landwerlin@intel.com> MIME-Version: 1.0 Subject: [Intel-gfx] [RFC 1/4] drm/i915/perf: rework aging tail workaround X-BeenThere: intel-gfx@lists.freedesktop.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: Intel graphics driver community testing & development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: intel-gfx-bounces@lists.freedesktop.org Sender: "Intel-gfx" X-Virus-Scanned: ClamAV using ClamSMTP We're about to introduce an options to open the perf stream, giving the user ability to configure how often it wants the kernel to pull the OA registers for available data. Right now the workaround against the OA tail pointer race condition requires at least twice the internal kernel pulling timer to make any data available. This changes introduce checks on the OA data written into the circular buffer to make as much data as possible available on the first iteration of the pulling timer. Signed-off-by: Lionel Landwerlin --- drivers/gpu/drm/i915/i915_drv.h | 32 +++--- drivers/gpu/drm/i915/i915_perf.c | 162 +++++++++++++++---------------- 2 files changed, 95 insertions(+), 99 deletions(-) diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h index 0689e67c966e..09c4b5de141b 100644 --- a/drivers/gpu/drm/i915/i915_drv.h +++ b/drivers/gpu/drm/i915/i915_drv.h @@ -1845,6 +1845,12 @@ struct drm_i915_private { */ struct ratelimit_state spurious_report_rs; + /** + * For rate limiting any notifications of tail pointer + * race. + */ + struct ratelimit_state tail_pointer_race; + bool periodic; int period_exponent; @@ -1885,23 +1891,11 @@ struct drm_i915_private { spinlock_t ptr_lock; /** - * One 'aging' tail pointer and one 'aged' - * tail pointer ready to used for reading. - * - * Initial values of 0xffffffff are invalid - * and imply that an update is required - * (and should be ignored by an attempted - * read) - */ - struct { - u32 offset; - } tails[2]; - - /** - * Index for the aged tail ready to read() - * data up to. + * The last HW tail reported by HW. The data + * might not have made it to memory yet + * though. */ - unsigned int aged_tail_idx; + u32 aging_tail; /** * A monotonic timestamp for when the current @@ -1920,6 +1914,12 @@ struct drm_i915_private { * data to userspace. */ u32 head; + + /** + * The last tail verified tail that can be + * read by userspace. + */ + u32 tail; } oa_buffer; u32 gen7_latched_oastatus1; diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c index 4529edfdcfc8..d54142f1cff4 100644 --- a/drivers/gpu/drm/i915/i915_perf.c +++ b/drivers/gpu/drm/i915/i915_perf.c @@ -451,8 +451,7 @@ static bool oa_buffer_check_unlocked(struct drm_i915_private *dev_priv) { int report_size = dev_priv->perf.oa.oa_buffer.format_size; unsigned long flags; - unsigned int aged_idx; - u32 head, hw_tail, aged_tail, aging_tail; + u32 hw_tail; u64 now; /* We have to consider the (unlikely) possibility that read() errors @@ -461,16 +460,6 @@ static bool oa_buffer_check_unlocked(struct drm_i915_private *dev_priv) */ spin_lock_irqsave(&dev_priv->perf.oa.oa_buffer.ptr_lock, flags); - /* NB: The head we observe here might effectively be a little out of - * date (between head and tails[aged_idx].offset if there is currently - * a read() in progress. - */ - head = dev_priv->perf.oa.oa_buffer.head; - - aged_idx = dev_priv->perf.oa.oa_buffer.aged_tail_idx; - aged_tail = dev_priv->perf.oa.oa_buffer.tails[aged_idx].offset; - aging_tail = dev_priv->perf.oa.oa_buffer.tails[!aged_idx].offset; - hw_tail = dev_priv->perf.oa.ops.oa_hw_tail_read(dev_priv); /* The tail pointer increases in 64 byte increments, @@ -480,63 +469,76 @@ static bool oa_buffer_check_unlocked(struct drm_i915_private *dev_priv) now = ktime_get_mono_fast_ns(); - /* Update the aged tail - * - * Flip the tail pointer available for read()s once the aging tail is - * old enough to trust that the corresponding data will be visible to - * the CPU... - * - * Do this before updating the aging pointer in case we may be able to - * immediately start aging a new pointer too (if new data has become - * available) without needing to wait for a later hrtimer callback. - */ - if (aging_tail != INVALID_TAIL_PTR && - ((now - dev_priv->perf.oa.oa_buffer.aging_timestamp) > - OA_TAIL_MARGIN_NSEC)) { - - aged_idx ^= 1; - dev_priv->perf.oa.oa_buffer.aged_tail_idx = aged_idx; + if (hw_tail == dev_priv->perf.oa.oa_buffer.aging_tail) { + /* If the HW tail hasn't move since the last check and the HW + * tail has been aging for long enough, declare it the new + * tail. + */ + if ((now - dev_priv->perf.oa.oa_buffer.aging_timestamp) > + OA_TAIL_MARGIN_NSEC) { + dev_priv->perf.oa.oa_buffer.tail = + dev_priv->perf.oa.oa_buffer.aging_tail; + } + } else { + u32 gtt_offset = i915_ggtt_offset(dev_priv->perf.oa.oa_buffer.vma); + u32 head, tail, landed_report_heads; - aged_tail = aging_tail; + /* NB: The head we observe here might effectively be a little out of + * date (between head and tails[aged_idx].offset if there is currently + * a read() in progress. + */ + head = dev_priv->perf.oa.oa_buffer.head - gtt_offset; - /* Mark that we need a new pointer to start aging... */ - dev_priv->perf.oa.oa_buffer.tails[!aged_idx].offset = INVALID_TAIL_PTR; - aging_tail = INVALID_TAIL_PTR; - } + hw_tail -= gtt_offset; + tail = hw_tail; - /* Update the aging tail - * - * We throttle aging tail updates until we have a new tail that - * represents >= one report more data than is already available for - * reading. This ensures there will be enough data for a successful - * read once this new pointer has aged and ensures we will give the new - * pointer time to age. - */ - if (aging_tail == INVALID_TAIL_PTR && - (aged_tail == INVALID_TAIL_PTR || - OA_TAKEN(hw_tail, aged_tail) >= report_size)) { - struct i915_vma *vma = dev_priv->perf.oa.oa_buffer.vma; - u32 gtt_offset = i915_ggtt_offset(vma); - - /* Be paranoid and do a bounds check on the pointer read back - * from hardware, just in case some spurious hardware condition - * could put the tail out of bounds... + /* Walk the stream backward until we find at least 2 reports + * with dword 0 & 1 not at 0. Since the circular buffer + * pointers progress by increments of 64 bytes and that + * reports can be up to 256 bytes long, we can't tell whether + * a report has fully landed in memory before the first 2 + * dwords of the following report have effectively landed. + * + * This is assuming that chunks of memory the OA unit writes + * to land in the order they were written to. + * If not : (╯°□°)╯︵ ┻━┻ */ - if (hw_tail >= gtt_offset && - hw_tail < (gtt_offset + OA_BUFFER_SIZE)) { - dev_priv->perf.oa.oa_buffer.tails[!aged_idx].offset = - aging_tail = hw_tail; - dev_priv->perf.oa.oa_buffer.aging_timestamp = now; - } else { - DRM_ERROR("Ignoring spurious out of range OA buffer tail pointer = %u\n", - hw_tail); + landed_report_heads = 0; + while (OA_TAKEN(tail, head) >= report_size) { + u32 previous_tail = (tail - report_size) & (OA_BUFFER_SIZE - 1); + u8 *report = dev_priv->perf.oa.oa_buffer.vaddr + previous_tail; + u32 *report32 = (void *) report; + + /* Head of the report indicated by the HW tail register has + * indeed landed into memory. + */ + if (report32[0] != 0 || report[1] != 0) { + landed_report_heads++; + + if (landed_report_heads >= 2) + break; + } + + tail = previous_tail; } + + if (abs(tail - hw_tail) >= (2 * report_size)) { + if (__ratelimit(&dev_priv->perf.oa.tail_pointer_race)) { + DRM_NOTE("unlanded report(s) head=0x%x " + "tail=0x%x hw_tail=0x%x\n", + head, tail, hw_tail); + } + } + + dev_priv->perf.oa.oa_buffer.tail = gtt_offset + tail; + dev_priv->perf.oa.oa_buffer.aging_tail = gtt_offset + hw_tail; + dev_priv->perf.oa.oa_buffer.aging_timestamp = now; } spin_unlock_irqrestore(&dev_priv->perf.oa.oa_buffer.ptr_lock, flags); - return aged_tail == INVALID_TAIL_PTR ? - false : OA_TAKEN(aged_tail, head) >= report_size; + return OA_TAKEN(dev_priv->perf.oa.oa_buffer.tail, + dev_priv->perf.oa.oa_buffer.head) >= report_size; } /** @@ -655,7 +657,6 @@ static int gen8_append_oa_reports(struct i915_perf_stream *stream, u32 mask = (OA_BUFFER_SIZE - 1); size_t start_offset = *offset; unsigned long flags; - unsigned int aged_tail_idx; u32 head, tail; u32 taken; int ret = 0; @@ -666,8 +667,7 @@ static int gen8_append_oa_reports(struct i915_perf_stream *stream, spin_lock_irqsave(&dev_priv->perf.oa.oa_buffer.ptr_lock, flags); head = dev_priv->perf.oa.oa_buffer.head; - aged_tail_idx = dev_priv->perf.oa.oa_buffer.aged_tail_idx; - tail = dev_priv->perf.oa.oa_buffer.tails[aged_tail_idx].offset; + tail = dev_priv->perf.oa.oa_buffer.tail; spin_unlock_irqrestore(&dev_priv->perf.oa.oa_buffer.ptr_lock, flags); @@ -698,7 +698,6 @@ static int gen8_append_oa_reports(struct i915_perf_stream *stream, head, tail)) return -EIO; - for (/* none */; (taken = OA_TAKEN(tail, head)); head = (head + report_size) & mask) { @@ -806,13 +805,10 @@ static int gen8_append_oa_reports(struct i915_perf_stream *stream, } /* - * The above reason field sanity check is based on - * the assumption that the OA buffer is initially - * zeroed and we reset the field after copying so the - * check is still meaningful once old reports start - * being overwritten. + * Clear out the first 2 dword as a mean to detect unlanded + * reports. */ - report32[0] = 0; + report32[0] = report32[1] = 0; } if (start_offset != *offset) { @@ -944,7 +940,6 @@ static int gen7_append_oa_reports(struct i915_perf_stream *stream, u32 mask = (OA_BUFFER_SIZE - 1); size_t start_offset = *offset; unsigned long flags; - unsigned int aged_tail_idx; u32 head, tail; u32 taken; int ret = 0; @@ -955,8 +950,7 @@ static int gen7_append_oa_reports(struct i915_perf_stream *stream, spin_lock_irqsave(&dev_priv->perf.oa.oa_buffer.ptr_lock, flags); head = dev_priv->perf.oa.oa_buffer.head; - aged_tail_idx = dev_priv->perf.oa.oa_buffer.aged_tail_idx; - tail = dev_priv->perf.oa.oa_buffer.tails[aged_tail_idx].offset; + tail = dev_priv->perf.oa.oa_buffer.tail; spin_unlock_irqrestore(&dev_priv->perf.oa.oa_buffer.ptr_lock, flags); @@ -1020,13 +1014,10 @@ static int gen7_append_oa_reports(struct i915_perf_stream *stream, if (ret) break; - /* The above report-id field sanity check is based on - * the assumption that the OA buffer is initially - * zeroed and we reset the field after copying so the - * check is still meaningful once old reports start - * being overwritten. + /* Clear out the first 2 dwords as a mean to detect unlanded + * reports. */ - report32[0] = 0; + report32[0] = report32[1] = 0; } if (start_offset != *offset) { @@ -1397,8 +1388,8 @@ static void gen7_init_oa_buffer(struct drm_i915_private *dev_priv) I915_WRITE(GEN7_OASTATUS1, gtt_offset | OABUFFER_SIZE_16M); /* tail */ /* Mark that we need updated tail pointers to read from... */ - dev_priv->perf.oa.oa_buffer.tails[0].offset = INVALID_TAIL_PTR; - dev_priv->perf.oa.oa_buffer.tails[1].offset = INVALID_TAIL_PTR; + dev_priv->perf.oa.oa_buffer.aging_tail = + dev_priv->perf.oa.oa_buffer.tail = gtt_offset; spin_unlock_irqrestore(&dev_priv->perf.oa.oa_buffer.ptr_lock, flags); @@ -1453,8 +1444,8 @@ static void gen8_init_oa_buffer(struct drm_i915_private *dev_priv) I915_WRITE(GEN8_OATAILPTR, gtt_offset & GEN8_OATAILPTR_MASK); /* Mark that we need updated tail pointers to read from... */ - dev_priv->perf.oa.oa_buffer.tails[0].offset = INVALID_TAIL_PTR; - dev_priv->perf.oa.oa_buffer.tails[1].offset = INVALID_TAIL_PTR; + dev_priv->perf.oa.oa_buffer.aging_tail = + dev_priv->perf.oa.oa_buffer.tail = gtt_offset; /* * Reset state used to recognise context switches, affecting which @@ -2043,6 +2034,11 @@ static int i915_oa_stream_init(struct i915_perf_stream *stream, ratelimit_set_flags(&dev_priv->perf.oa.spurious_report_rs, RATELIMIT_MSG_ON_RELEASE); + ratelimit_state_init(&dev_priv->perf.oa.tail_pointer_race, + 5 * HZ, 10); + ratelimit_set_flags(&dev_priv->perf.oa.tail_pointer_race, + RATELIMIT_MSG_ON_RELEASE); + stream->sample_size = sizeof(struct drm_i915_perf_record_header); format_size = dev_priv->perf.oa.oa_formats[props->oa_format].size; From patchwork Wed Dec 19 14:37:45 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Lionel Landwerlin X-Patchwork-Id: 10737347 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 6ED9E746 for ; Wed, 19 Dec 2018 14:37:58 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 5E5902871C for ; Wed, 19 Dec 2018 14:37:58 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 4E9BE28756; Wed, 19 Dec 2018 14:37:58 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-5.2 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_MED autolearn=ham version=3.3.1 Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177]) (using TLSv1.2 with cipher DHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.wl.linuxfoundation.org (Postfix) with ESMTPS id D490A2871C for ; Wed, 19 Dec 2018 14:37:57 +0000 (UTC) Received: from gabe.freedesktop.org (localhost [127.0.0.1]) by gabe.freedesktop.org (Postfix) with ESMTP id 22DD86EFE5; Wed, 19 Dec 2018 14:37:57 +0000 (UTC) X-Original-To: intel-gfx@lists.freedesktop.org Delivered-To: intel-gfx@lists.freedesktop.org Received: from mga06.intel.com (mga06.intel.com [134.134.136.31]) by gabe.freedesktop.org (Postfix) with ESMTPS id 7040B6EFE5 for ; Wed, 19 Dec 2018 14:37:54 +0000 (UTC) X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from fmsmga002.fm.intel.com ([10.253.24.26]) by orsmga104.jf.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 19 Dec 2018 06:37:54 -0800 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.56,373,1539673200"; d="scan'208";a="127343438" Received: from delly.ld.intel.com ([10.103.238.204]) by fmsmga002.fm.intel.com with ESMTP; 19 Dec 2018 06:37:53 -0800 From: Lionel Landwerlin To: intel-gfx@lists.freedesktop.org Date: Wed, 19 Dec 2018 14:37:45 +0000 Message-Id: <20181219143747.8997-3-lionel.g.landwerlin@intel.com> X-Mailer: git-send-email 2.20.1 In-Reply-To: <20181219143747.8997-1-lionel.g.landwerlin@intel.com> References: <20181219143747.8997-1-lionel.g.landwerlin@intel.com> MIME-Version: 1.0 Subject: [Intel-gfx] [RFC 2/4] drm/i915/perf: add new open param to configure polling of OA buffer X-BeenThere: intel-gfx@lists.freedesktop.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: Intel graphics driver community testing & development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: intel-gfx-bounces@lists.freedesktop.org Sender: "Intel-gfx" X-Virus-Scanned: ClamAV using ClamSMTP This new parameter let's the application choose how often the OA buffer should be checked on the CPU side for data availability. Longer polling period tend to reduce CPU overhead if the application does not care about somewhat real time data collection. Signed-off-by: Lionel Landwerlin --- drivers/gpu/drm/i915/i915_drv.h | 6 ++++++ drivers/gpu/drm/i915/i915_perf.c | 27 +++++++++++++++++++++------ include/uapi/drm/i915_drm.h | 7 +++++++ 3 files changed, 34 insertions(+), 6 deletions(-) diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h index 09c4b5de141b..3669ed0a6376 100644 --- a/drivers/gpu/drm/i915/i915_drv.h +++ b/drivers/gpu/drm/i915/i915_drv.h @@ -1348,6 +1348,12 @@ struct i915_perf_stream { * @oa_config: The OA configuration used by the stream. */ struct i915_oa_config *oa_config; + + /** + * @poll_oa_period: The period in nanoseconds at which the OA + * buffer should be checked for available data. + */ + u64 poll_oa_period; }; /** diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c index d54142f1cff4..a8df58ec7d75 100644 --- a/drivers/gpu/drm/i915/i915_perf.c +++ b/drivers/gpu/drm/i915/i915_perf.c @@ -264,11 +264,11 @@ #define OA_TAIL_MARGIN_NSEC 100000ULL #define INVALID_TAIL_PTR 0xffffffff -/* frequency for checking whether the OA unit has written new reports to the - * circular OA buffer... +/* The default frequency for checking whether the OA unit has written new + * reports to the circular OA buffer... */ -#define POLL_FREQUENCY 200 -#define POLL_PERIOD (NSEC_PER_SEC / POLL_FREQUENCY) +#define DEFAULT_POLL_FREQUENCY 200 +#define DEFAULT_POLL_PERIOD (NSEC_PER_SEC / DEFAULT_POLL_FREQUENCY) /* for sysctl proc_dointvec_minmax of dev.i915.perf_stream_paranoid */ static int zero; @@ -345,6 +345,8 @@ static const struct i915_oa_format gen8_plus_oa_formats[I915_OA_FORMAT_MAX] = { * @oa_format: An OA unit HW report format * @oa_periodic: Whether to enable periodic OA unit sampling * @oa_period_exponent: The OA unit sampling period is derived from this + * @poll_oa_period: The period at which the CPU will check for OA data + * availability * * As read_properties_unlocked() enumerates and validates the properties given * to open a stream of metrics the configuration is built up in the structure @@ -361,6 +363,7 @@ struct perf_open_properties { int oa_format; bool oa_periodic; int oa_period_exponent; + u64 poll_oa_period; }; static void free_oa_config(struct drm_i915_private *dev_priv, @@ -1902,7 +1905,7 @@ static void i915_oa_stream_enable(struct i915_perf_stream *stream) if (dev_priv->perf.oa.periodic) hrtimer_start(&dev_priv->perf.oa.poll_check_timer, - ns_to_ktime(POLL_PERIOD), + ns_to_ktime(stream->poll_oa_period), HRTIMER_MODE_REL_PINNED); } @@ -2266,13 +2269,15 @@ static enum hrtimer_restart oa_poll_check_timer_cb(struct hrtimer *hrtimer) struct drm_i915_private *dev_priv = container_of(hrtimer, typeof(*dev_priv), perf.oa.poll_check_timer); + struct i915_perf_stream *stream = dev_priv->perf.oa.exclusive_stream; if (oa_buffer_check_unlocked(dev_priv)) { dev_priv->perf.oa.pollin = true; wake_up(&dev_priv->perf.oa.poll_wq); } - hrtimer_forward_now(hrtimer, ns_to_ktime(POLL_PERIOD)); + hrtimer_forward_now(hrtimer, + ns_to_ktime(stream->poll_oa_period)); return HRTIMER_RESTART; } @@ -2593,6 +2598,7 @@ i915_perf_open_ioctl_locked(struct drm_i915_private *dev_priv, stream->dev_priv = dev_priv; stream->ctx = specific_ctx; + stream->poll_oa_period = props->poll_oa_period; ret = i915_oa_stream_init(stream, param, props); if (ret) @@ -2669,6 +2675,7 @@ static int read_properties_unlocked(struct drm_i915_private *dev_priv, u32 i; memset(props, 0, sizeof(struct perf_open_properties)); + props->poll_oa_period = DEFAULT_POLL_PERIOD; if (!n_props) { DRM_DEBUG("No i915 perf properties given\n"); @@ -2772,6 +2779,14 @@ static int read_properties_unlocked(struct drm_i915_private *dev_priv, props->oa_periodic = true; props->oa_period_exponent = value; break; + case DRM_I915_PERF_PROP_POLL_OA_DELAY: + if (value < 100000 /* 100us */) { + DRM_DEBUG("OA availability timer too small (%lluns < 100us)\n", + value); + return -EINVAL; + } + props->poll_oa_period = value; + break; case DRM_I915_PERF_PROP_MAX: MISSING_CASE(id); return -EINVAL; diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h index 298b2e197744..7191236ad7f2 100644 --- a/include/uapi/drm/i915_drm.h +++ b/include/uapi/drm/i915_drm.h @@ -1540,6 +1540,13 @@ enum drm_i915_perf_property_id { */ DRM_I915_PERF_PROP_OA_EXPONENT, + /** + * Specifying this property sets up a hrtimer in nanoseconds at which + * the i915 driver will check the OA buffer for available data. Values + * below 100 microseconds are not allowed. + */ + DRM_I915_PERF_PROP_POLL_OA_DELAY, + DRM_I915_PERF_PROP_MAX /* non-ABI */ }; From patchwork Wed Dec 19 14:37:46 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Lionel Landwerlin X-Patchwork-Id: 10737351 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 5D5BC13AD for ; Wed, 19 Dec 2018 14:38:02 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 4ABD42873F for ; Wed, 19 Dec 2018 14:38:02 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 3B6F1287ED; Wed, 19 Dec 2018 14:38:02 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-5.2 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_MED autolearn=ham version=3.3.1 Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177]) (using TLSv1.2 with cipher DHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.wl.linuxfoundation.org (Postfix) with ESMTPS id 7FC222873F for ; Wed, 19 Dec 2018 14:38:01 +0000 (UTC) Received: from gabe.freedesktop.org (localhost [127.0.0.1]) by gabe.freedesktop.org (Postfix) with ESMTP id A8B756EFE7; Wed, 19 Dec 2018 14:37:59 +0000 (UTC) X-Original-To: intel-gfx@lists.freedesktop.org Delivered-To: intel-gfx@lists.freedesktop.org Received: from mga06.intel.com (mga06.intel.com [134.134.136.31]) by gabe.freedesktop.org (Postfix) with ESMTPS id 661CA6EFE5 for ; Wed, 19 Dec 2018 14:37:55 +0000 (UTC) X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from fmsmga002.fm.intel.com ([10.253.24.26]) by orsmga104.jf.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 19 Dec 2018 06:37:55 -0800 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.56,373,1539673200"; d="scan'208";a="127343445" Received: from delly.ld.intel.com ([10.103.238.204]) by fmsmga002.fm.intel.com with ESMTP; 19 Dec 2018 06:37:54 -0800 From: Lionel Landwerlin To: intel-gfx@lists.freedesktop.org Date: Wed, 19 Dec 2018 14:37:46 +0000 Message-Id: <20181219143747.8997-4-lionel.g.landwerlin@intel.com> X-Mailer: git-send-email 2.20.1 In-Reply-To: <20181219143747.8997-1-lionel.g.landwerlin@intel.com> References: <20181219143747.8997-1-lionel.g.landwerlin@intel.com> MIME-Version: 1.0 Subject: [Intel-gfx] [RFC 3/4] drm/i915/perf: handle interrupts from the OA unit X-BeenThere: intel-gfx@lists.freedesktop.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: Intel graphics driver community testing & development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: intel-gfx-bounces@lists.freedesktop.org Sender: "Intel-gfx" X-Virus-Scanned: ClamAV using ClamSMTP The OA unit can notify that its circular buffer is half full through an interrupt and we would like to give the application the ability to make use of this interrupt to get rid of CPU checks on the OA buffer. This change wires up the interrupt to the i915-perf stream and leaves it ignored for now. Signed-off-by: Lionel Landwerlin --- drivers/gpu/drm/i915/i915_drv.h | 21 +++++++++++++ drivers/gpu/drm/i915/i915_irq.c | 39 ++++++++++++++++++++----- drivers/gpu/drm/i915/i915_perf.c | 26 +++++++++++++++++ drivers/gpu/drm/i915/i915_reg.h | 7 +++++ drivers/gpu/drm/i915/intel_ringbuffer.c | 2 ++ 5 files changed, 88 insertions(+), 7 deletions(-) diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h index 3669ed0a6376..014294e33b40 100644 --- a/drivers/gpu/drm/i915/i915_drv.h +++ b/drivers/gpu/drm/i915/i915_drv.h @@ -1354,6 +1354,12 @@ struct i915_perf_stream { * buffer should be checked for available data. */ u64 poll_oa_period; + + /** + * @oa_interrupt_monitor: Whether the stream will be notified by OA + * interrupts. + */ + bool oa_interrupt_monitor; }; /** @@ -1845,6 +1851,21 @@ struct drm_i915_private { wait_queue_head_t poll_wq; bool pollin; + /** + * Atomic counter incremented by the interrupt + * handling code for each OA half full interrupt + * received. + */ + atomic64_t half_full_count; + + /** + * Copy of the atomic half_full_count that was last + * processed in the i915-perf driver. If both counters + * differ, there is data available to read in the OA + * buffer. + */ + u64 half_full_count_last; + /** * For rate limiting any notifications of spurious * invalid OA reports diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c index d447d7d508f4..b1098118f71e 100644 --- a/drivers/gpu/drm/i915/i915_irq.c +++ b/drivers/gpu/drm/i915/i915_irq.c @@ -1222,6 +1222,12 @@ static void notify_ring(struct intel_engine_cs *engine) trace_intel_engine_notify(engine, wait); } +static void notify_perfmon_buffer_half_full(struct drm_i915_private *i915) +{ + atomic64_inc(&i915->perf.oa.half_full_count); + wake_up_all(&i915->perf.oa.poll_wq); +} + static void vlv_c0_read(struct drm_i915_private *dev_priv, struct intel_rps_ei *ei) { @@ -1486,6 +1492,9 @@ static void snb_gt_irq_handler(struct drm_i915_private *dev_priv, GT_RENDER_CS_MASTER_ERROR_INTERRUPT)) DRM_DEBUG("Command parser error, gt_iir 0x%08x\n", gt_iir); + if (gt_iir & GT_PERFMON_BUFFER_HALF_FULL_INTERRUPT) + notify_perfmon_buffer_half_full(dev_priv); + if (gt_iir & GT_PARITY_ERROR(dev_priv)) ivybridge_parity_error_irq_handler(dev_priv, gt_iir); } @@ -1507,6 +1516,12 @@ gen8_cs_irq_handler(struct intel_engine_cs *engine, u32 iir) tasklet_hi_schedule(&engine->execlists.tasklet); } +static void gen8_perfmon_handler(struct drm_i915_private *i915, u32 iir) +{ + if (iir & GEN8_GT_PERFMON_BUFFER_HALF_FULL_INTERRUPT) + notify_perfmon_buffer_half_full(i915); +} + static void gen8_gt_irq_ack(struct drm_i915_private *i915, u32 master_ctl, u32 gt_iir[4]) { @@ -1516,6 +1531,7 @@ static void gen8_gt_irq_ack(struct drm_i915_private *i915, GEN8_GT_BCS_IRQ | \ GEN8_GT_VCS1_IRQ | \ GEN8_GT_VCS2_IRQ | \ + GEN8_GT_WDBOX_OACS_IRQ | \ GEN8_GT_VECS_IRQ | \ GEN8_GT_PM_IRQ | \ GEN8_GT_GUC_IRQ) @@ -1538,7 +1554,7 @@ static void gen8_gt_irq_ack(struct drm_i915_private *i915, raw_reg_write(regs, GEN8_GT_IIR(2), gt_iir[2]); } - if (master_ctl & GEN8_GT_VECS_IRQ) { + if (master_ctl & (GEN8_GT_VECS_IRQ | GEN8_GT_WDBOX_OACS_IRQ)) { gt_iir[3] = raw_reg_read(regs, GEN8_GT_IIR(3)); if (likely(gt_iir[3])) raw_reg_write(regs, GEN8_GT_IIR(3), gt_iir[3]); @@ -1562,9 +1578,11 @@ static void gen8_gt_irq_handler(struct drm_i915_private *i915, gt_iir[1] >> GEN8_VCS2_IRQ_SHIFT); } - if (master_ctl & GEN8_GT_VECS_IRQ) { + if (master_ctl & (GEN8_GT_VECS_IRQ | GEN8_GT_WDBOX_OACS_IRQ)) { gen8_cs_irq_handler(i915->engine[VECS], gt_iir[3] >> GEN8_VECS_IRQ_SHIFT); + gen8_perfmon_handler(i915, + gt_iir[3] >> GEN8_WD_IRQ_SHIFT); } if (master_ctl & (GEN8_GT_PM_IRQ | GEN8_GT_GUC_IRQ)) { @@ -3018,6 +3036,8 @@ gen11_other_irq_handler(struct drm_i915_private * const i915, { if (instance == OTHER_GTPM_INSTANCE) return gen6_rps_irq_handler(i915, iir); + if (instance == OTHER_WDOAPERF_INSTANCE) + return gen8_perfmon_handler(i915, iir); WARN_ONCE(1, "unhandled other interrupt instance=0x%x, iir=0x%x\n", instance, iir); @@ -4051,6 +4071,10 @@ static void gen5_gt_irq_postinstall(struct drm_device *dev) gt_irqs |= GT_BLT_USER_INTERRUPT | GT_BSD_USER_INTERRUPT; } + /* We only expose the i915/perf interface on HSW+. */ + if (IS_HASWELL(dev_priv)) + gt_irqs |= GT_PERFMON_BUFFER_HALF_FULL_INTERRUPT; + GEN3_IRQ_INIT(GT, dev_priv->gt_irq_mask, gt_irqs); if (INTEL_GEN(dev_priv) >= 6) { @@ -4180,7 +4204,8 @@ static void gen8_gt_irq_postinstall(struct drm_i915_private *dev_priv) GT_CONTEXT_SWITCH_INTERRUPT << GEN8_VCS2_IRQ_SHIFT, 0, GT_RENDER_USER_INTERRUPT << GEN8_VECS_IRQ_SHIFT | - GT_CONTEXT_SWITCH_INTERRUPT << GEN8_VECS_IRQ_SHIFT + GT_CONTEXT_SWITCH_INTERRUPT << GEN8_VECS_IRQ_SHIFT | + GEN8_GT_PERFMON_BUFFER_HALF_FULL_INTERRUPT << GEN8_WD_IRQ_SHIFT }; if (HAS_L3_DPF(dev_priv)) @@ -4302,12 +4327,12 @@ static void gen11_gt_irq_postinstall(struct drm_i915_private *dev_priv) /* * RPS interrupts will get enabled/disabled on demand when RPS itself - * is enabled/disabled. + * is enabled/disabled, just enable the OA interrupt for now. */ - dev_priv->pm_ier = 0x0; + dev_priv->pm_ier = GEN8_GT_PERFMON_BUFFER_HALF_FULL_INTERRUPT; dev_priv->pm_imr = ~dev_priv->pm_ier; - I915_WRITE(GEN11_GPM_WGBOXPERF_INTR_ENABLE, 0); - I915_WRITE(GEN11_GPM_WGBOXPERF_INTR_MASK, ~0); + I915_WRITE(GEN11_GPM_WGBOXPERF_INTR_ENABLE, dev_priv->pm_ier); + I915_WRITE(GEN11_GPM_WGBOXPERF_INTR_MASK, dev_priv->pm_imr); } static void icp_irq_postinstall(struct drm_device *dev) diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c index a8df58ec7d75..21b05437123d 100644 --- a/drivers/gpu/drm/i915/i915_perf.c +++ b/drivers/gpu/drm/i915/i915_perf.c @@ -347,6 +347,7 @@ static const struct i915_oa_format gen8_plus_oa_formats[I915_OA_FORMAT_MAX] = { * @oa_period_exponent: The OA unit sampling period is derived from this * @poll_oa_period: The period at which the CPU will check for OA data * availability + * @oa_interrupt_monitor: Whether we should monitor the OA interrupt. * * As read_properties_unlocked() enumerates and validates the properties given * to open a stream of metrics the configuration is built up in the structure @@ -364,6 +365,7 @@ struct perf_open_properties { bool oa_periodic; int oa_period_exponent; u64 poll_oa_period; + bool oa_interrupt_monitor; }; static void free_oa_config(struct drm_i915_private *dev_priv, @@ -1852,6 +1854,13 @@ static void gen7_oa_enable(struct i915_perf_stream *stream) */ gen7_init_oa_buffer(dev_priv); + if (stream->oa_interrupt_monitor) { + spin_lock(&dev_priv->irq_lock); + gen5_enable_gt_irq(dev_priv, + GT_PERFMON_BUFFER_HALF_FULL_INTERRUPT); + spin_unlock(&dev_priv->irq_lock); + } + I915_WRITE(GEN7_OACONTROL, (ctx_id & GEN7_OACONTROL_CTX_MASK) | (period_exponent << @@ -1878,6 +1887,9 @@ static void gen8_oa_enable(struct i915_perf_stream *stream) */ gen8_init_oa_buffer(dev_priv); + if (stream->oa_interrupt_monitor) + I915_WRITE(GEN8_OA_IMR, ~GEN8_OA_IMR_MASK_INTR); + /* * Note: we don't rely on the hardware to perform single context * filtering and instead filter on the cpu based on the context-id @@ -1901,6 +1913,10 @@ static void i915_oa_stream_enable(struct i915_perf_stream *stream) { struct drm_i915_private *dev_priv = stream->dev_priv; + dev_priv->perf.oa.half_full_count_last = 0; + atomic64_set(&dev_priv->perf.oa.half_full_count, + dev_priv->perf.oa.half_full_count_last); + dev_priv->perf.oa.ops.oa_enable(stream); if (dev_priv->perf.oa.periodic) @@ -1913,6 +1929,13 @@ static void gen7_oa_disable(struct i915_perf_stream *stream) { struct drm_i915_private *dev_priv = stream->dev_priv; + if (stream->oa_interrupt_monitor) { + spin_lock(&dev_priv->irq_lock); + gen5_disable_gt_irq(dev_priv, + GT_PERFMON_BUFFER_HALF_FULL_INTERRUPT); + spin_unlock(&dev_priv->irq_lock); + } + I915_WRITE(GEN7_OACONTROL, 0); if (intel_wait_for_register(dev_priv, GEN7_OACONTROL, GEN7_OACONTROL_ENABLE, 0, @@ -1924,6 +1947,8 @@ static void gen8_oa_disable(struct i915_perf_stream *stream) { struct drm_i915_private *dev_priv = stream->dev_priv; + I915_WRITE(GEN8_OA_IMR, 0xffffffff); + I915_WRITE(GEN8_OACONTROL, 0); if (intel_wait_for_register(dev_priv, GEN8_OACONTROL, GEN8_OA_COUNTER_ENABLE, 0, @@ -2599,6 +2624,7 @@ i915_perf_open_ioctl_locked(struct drm_i915_private *dev_priv, stream->dev_priv = dev_priv; stream->ctx = specific_ctx; stream->poll_oa_period = props->poll_oa_period; + stream->oa_interrupt_monitor = props->oa_interrupt_monitor; ret = i915_oa_stream_init(stream, param, props); if (ret) diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h index 0a7d60509ca7..c30d4cb6decd 100644 --- a/drivers/gpu/drm/i915/i915_reg.h +++ b/drivers/gpu/drm/i915/i915_reg.h @@ -223,6 +223,7 @@ static inline bool i915_mmio_reg_valid(i915_reg_t reg) #define MAX_ENGINE_CLASS 4 #define OTHER_GTPM_INSTANCE 1 +#define OTHER_WDOAPERF_INSTANCE 2 #define MAX_ENGINE_INSTANCE 3 /* PCI config space */ @@ -617,6 +618,9 @@ static inline bool i915_mmio_reg_valid(i915_reg_t reg) #define OABUFFER_SIZE_8M (6 << 3) #define OABUFFER_SIZE_16M (7 << 3) +#define GEN8_OA_IMR _MMIO(0x2b20) +#define GEN8_OA_IMR_MASK_INTR (1 << 28) + /* * Flexible, Aggregate EU Counter Registers. * Note: these aren't contiguous @@ -2868,7 +2872,9 @@ enum i915_power_well_id { #define GT_BLT_USER_INTERRUPT (1 << 22) #define GT_BSD_CS_ERROR_INTERRUPT (1 << 15) #define GT_BSD_USER_INTERRUPT (1 << 12) +#define GEN8_GT_PERFMON_BUFFER_HALF_FULL_INTERRUPT (1 << 12) /* bdw+ */ #define GT_RENDER_L3_PARITY_ERROR_INTERRUPT_S1 (1 << 11) /* hsw+; rsvd on snb, ivb, vlv */ +#define GT_PERFMON_BUFFER_HALF_FULL_INTERRUPT (1 << 9) /* ivb+ but only used on hsw+ */ #define GT_CONTEXT_SWITCH_INTERRUPT (1 << 8) #define GT_RENDER_L3_PARITY_ERROR_INTERRUPT (1 << 5) /* !snb */ #define GT_RENDER_PIPECTL_NOTIFY_INTERRUPT (1 << 4) @@ -7160,6 +7166,7 @@ enum { #define GEN8_DE_PIPE_B_IRQ (1 << 17) #define GEN8_DE_PIPE_A_IRQ (1 << 16) #define GEN8_DE_PIPE_IRQ(pipe) (1 << (16 + (pipe))) +#define GEN8_GT_WDBOX_OACS_IRQ (1 << 7) #define GEN8_GT_VECS_IRQ (1 << 6) #define GEN8_GT_GUC_IRQ (1 << 5) #define GEN8_GT_PM_IRQ (1 << 4) diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c index 02f6a9b81083..42f7157636c7 100644 --- a/drivers/gpu/drm/i915/intel_ringbuffer.c +++ b/drivers/gpu/drm/i915/intel_ringbuffer.c @@ -2277,6 +2277,8 @@ int intel_init_render_ring_buffer(struct intel_engine_cs *engine) if (HAS_L3_DPF(dev_priv)) engine->irq_keep_mask = GT_RENDER_L3_PARITY_ERROR_INTERRUPT; + if (IS_HASWELL(dev_priv)) + engine->irq_keep_mask |= GT_PERFMON_BUFFER_HALF_FULL_INTERRUPT; engine->irq_enable_mask = GT_RENDER_USER_INTERRUPT; From patchwork Wed Dec 19 14:37:47 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Lionel Landwerlin X-Patchwork-Id: 10737349 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id A88E313AD for ; Wed, 19 Dec 2018 14:38:00 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 98CBB2873F for ; Wed, 19 Dec 2018 14:38:00 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 8D634287BE; Wed, 19 Dec 2018 14:38:00 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-5.2 required=2.0 tests=BAYES_00,MAILING_LIST_MULTI, RCVD_IN_DNSWL_MED autolearn=ham version=3.3.1 Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177]) (using TLSv1.2 with cipher DHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.wl.linuxfoundation.org (Postfix) with ESMTPS id 001AC2873F for ; Wed, 19 Dec 2018 14:37:59 +0000 (UTC) Received: from gabe.freedesktop.org (localhost [127.0.0.1]) by gabe.freedesktop.org (Postfix) with ESMTP id 51C416EFE6; Wed, 19 Dec 2018 14:37:59 +0000 (UTC) X-Original-To: intel-gfx@lists.freedesktop.org Delivered-To: intel-gfx@lists.freedesktop.org Received: from mga06.intel.com (mga06.intel.com [134.134.136.31]) by gabe.freedesktop.org (Postfix) with ESMTPS id 6E0556EFE5 for ; Wed, 19 Dec 2018 14:37:56 +0000 (UTC) X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from fmsmga002.fm.intel.com ([10.253.24.26]) by orsmga104.jf.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 19 Dec 2018 06:37:56 -0800 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.56,373,1539673200"; d="scan'208";a="127343455" Received: from delly.ld.intel.com ([10.103.238.204]) by fmsmga002.fm.intel.com with ESMTP; 19 Dec 2018 06:37:55 -0800 From: Lionel Landwerlin To: intel-gfx@lists.freedesktop.org Date: Wed, 19 Dec 2018 14:37:47 +0000 Message-Id: <20181219143747.8997-5-lionel.g.landwerlin@intel.com> X-Mailer: git-send-email 2.20.1 In-Reply-To: <20181219143747.8997-1-lionel.g.landwerlin@intel.com> References: <20181219143747.8997-1-lionel.g.landwerlin@intel.com> MIME-Version: 1.0 Subject: [Intel-gfx] [RFC 4/4] drm/i915/perf: add interrupt enabling parameter X-BeenThere: intel-gfx@lists.freedesktop.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: Intel graphics driver community testing & development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: intel-gfx-bounces@lists.freedesktop.org Sender: "Intel-gfx" X-Virus-Scanned: ClamAV using ClamSMTP This let's the application choose to be driven by the interrupt mechanism of the HW. In conjuction with long periods for checks for the availability of data on the CPU, this can reduce the CPU load when doing capture of OA data. Signed-off-by: Lionel Landwerlin --- drivers/gpu/drm/i915/i915_perf.c | 58 ++++++++++++++++++++++++++++---- include/uapi/drm/i915_drm.h | 8 +++++ 2 files changed, 59 insertions(+), 7 deletions(-) diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c index 21b05437123d..0a1c7694d3dc 100644 --- a/drivers/gpu/drm/i915/i915_perf.c +++ b/drivers/gpu/drm/i915/i915_perf.c @@ -252,7 +252,7 @@ * indicates that an updated tail pointer is needed. * * Most of the implementation details for this workaround are in - * oa_buffer_check_unlocked() and _append_oa_reports() + * oa_buffer_check() and _append_oa_reports() * * Note for posterity: previously the driver used to define an effective tail * pointer that lagged the real pointer by a 'tail margin' measured in bytes @@ -428,9 +428,11 @@ static u32 gen7_oa_hw_tail_read(struct drm_i915_private *dev_priv) return oastatus1 & GEN7_OASTATUS1_TAIL_MASK; } + /** - * oa_buffer_check_unlocked - check for data and update tail ptr state + * oa_buffer_check - check for data and update tail ptr state * @dev_priv: i915 device instance + * @lock: whether to take the oa_buffer spin lock * * This is either called via fops (for blocking reads in user ctx) or the poll * check hrtimer (atomic ctx) to check the OA buffer tail pointer and check @@ -452,8 +454,9 @@ static u32 gen7_oa_hw_tail_read(struct drm_i915_private *dev_priv) * * Returns: %true if the OA buffer contains data, else %false */ -static bool oa_buffer_check_unlocked(struct drm_i915_private *dev_priv) +static bool oa_buffer_check(struct drm_i915_private *dev_priv, bool lock) { + u64 half_full_count = atomic64_read(&dev_priv->perf.oa.half_full_count); int report_size = dev_priv->perf.oa.oa_buffer.format_size; unsigned long flags; u32 hw_tail; @@ -463,7 +466,8 @@ static bool oa_buffer_check_unlocked(struct drm_i915_private *dev_priv) * could result in an OA buffer reset which might reset the head, * tails[] and aged_tail state. */ - spin_lock_irqsave(&dev_priv->perf.oa.oa_buffer.ptr_lock, flags); + if (lock) + spin_lock_irqsave(&dev_priv->perf.oa.oa_buffer.ptr_lock, flags); hw_tail = dev_priv->perf.oa.ops.oa_hw_tail_read(dev_priv); @@ -540,7 +544,10 @@ static bool oa_buffer_check_unlocked(struct drm_i915_private *dev_priv) dev_priv->perf.oa.oa_buffer.aging_timestamp = now; } - spin_unlock_irqrestore(&dev_priv->perf.oa.oa_buffer.ptr_lock, flags); + dev_priv->perf.oa.half_full_count_last = half_full_count; + + if (lock) + spin_unlock_irqrestore(&dev_priv->perf.oa.oa_buffer.ptr_lock, flags); return OA_TAKEN(dev_priv->perf.oa.oa_buffer.tail, dev_priv->perf.oa.oa_buffer.head) >= report_size; @@ -671,6 +678,16 @@ static int gen8_append_oa_reports(struct i915_perf_stream *stream, spin_lock_irqsave(&dev_priv->perf.oa.oa_buffer.ptr_lock, flags); + /* + * Opportunisticly checks for half buffer interrupt if requested by + * the user. + */ + if (stream->oa_interrupt_monitor && + (dev_priv->perf.oa.half_full_count_last != + atomic64_read(&dev_priv->perf.oa.half_full_count))) { + dev_priv->perf.oa.pollin = oa_buffer_check(dev_priv, false); + } + head = dev_priv->perf.oa.oa_buffer.head; tail = dev_priv->perf.oa.oa_buffer.tail; @@ -954,6 +971,16 @@ static int gen7_append_oa_reports(struct i915_perf_stream *stream, spin_lock_irqsave(&dev_priv->perf.oa.oa_buffer.ptr_lock, flags); + /* + * Opportunisticly checks for half buffer interrupt if requested by + * the user. + */ + if (stream->oa_interrupt_monitor && + (dev_priv->perf.oa.half_full_count_last != + atomic64_read(&dev_priv->perf.oa.half_full_count))) { + dev_priv->perf.oa.pollin = oa_buffer_check(dev_priv, false); + } + head = dev_priv->perf.oa.oa_buffer.head; tail = dev_priv->perf.oa.oa_buffer.tail; @@ -1151,7 +1178,7 @@ static int i915_oa_wait_unlocked(struct i915_perf_stream *stream) return -EIO; return wait_event_interruptible(dev_priv->perf.oa.poll_wq, - oa_buffer_check_unlocked(dev_priv)); + oa_buffer_check(dev_priv, true)); } /** @@ -1970,6 +1997,10 @@ static void i915_oa_stream_disable(struct i915_perf_stream *stream) dev_priv->perf.oa.ops.oa_disable(stream); + dev_priv->perf.oa.half_full_count_last = 0; + atomic64_set(&dev_priv->perf.oa.half_full_count, + dev_priv->perf.oa.half_full_count_last); + if (dev_priv->perf.oa.periodic) hrtimer_cancel(&dev_priv->perf.oa.poll_check_timer); } @@ -2296,7 +2327,7 @@ static enum hrtimer_restart oa_poll_check_timer_cb(struct hrtimer *hrtimer) perf.oa.poll_check_timer); struct i915_perf_stream *stream = dev_priv->perf.oa.exclusive_stream; - if (oa_buffer_check_unlocked(dev_priv)) { + if (oa_buffer_check(dev_priv, true)) { dev_priv->perf.oa.pollin = true; wake_up(&dev_priv->perf.oa.poll_wq); } @@ -2332,6 +2363,16 @@ static __poll_t i915_perf_poll_locked(struct drm_i915_private *dev_priv, stream->ops->poll_wait(stream, file, wait); + /* + * Only check the half buffer full notifications if requested by the + * user. + */ + if (stream->oa_interrupt_monitor && + (dev_priv->perf.oa.half_full_count_last != + atomic64_read(&dev_priv->perf.oa.half_full_count))) { + dev_priv->perf.oa.pollin = oa_buffer_check(dev_priv, true); + } + /* Note: we don't explicitly check whether there's something to read * here since this path may be very hot depending on what else * userspace is polling, or on the timeout in use. We rely solely on @@ -2813,6 +2854,9 @@ static int read_properties_unlocked(struct drm_i915_private *dev_priv, } props->poll_oa_period = value; break; + case DRM_I915_PERF_PROP_OA_ENABLE_INTERRUPT: + props->oa_interrupt_monitor = value != 0; + break; case DRM_I915_PERF_PROP_MAX: MISSING_CASE(id); return -EINVAL; diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h index 7191236ad7f2..cad32d4f46d2 100644 --- a/include/uapi/drm/i915_drm.h +++ b/include/uapi/drm/i915_drm.h @@ -1547,6 +1547,14 @@ enum drm_i915_perf_property_id { */ DRM_I915_PERF_PROP_POLL_OA_DELAY, + /** + * Specifying this property sets up the interrupt mechanism for the OA + * buffer in i915. This option in conjuction with a long polling delay + * for avaibility of OA data can reduce CPU load significantly if you + * do not care about OA data being read as soon as it's available. + */ + DRM_I915_PERF_PROP_OA_ENABLE_INTERRUPT, + DRM_I915_PERF_PROP_MAX /* non-ABI */ };